Shot / scene detection

Definition

Finding the cuts in a video so that each shot or scene can be encoded and measured independently. A shot is the natural unit for per-shot encoding because frames within one shot share similar motion and detail; per-shot scores also reveal the worst shots that a mean score hides.

Shot (or scene) detection finds the cuts in a video, that is, the points where one continuous shot ends and the next begins, so that each shot can be encoded and measured on its own. It rests on a simple assumption: frames within a single shot share spatio-temporal characteristics (similar motion and detail), which makes a shot the natural unit to optimize. In a per-shot encoding pipeline, detection supplies the boundaries the optimizer needs to build a convex hull per shot and choose the best resolution and quality setting for each. It also serves measurement directly: scoring per shot exposes the worst stretches that a single mean VMAF over a whole title quietly absorbs, which is the same reason teams read a low percentile rather than the average. The catch is that detection is imperfect: missed or false cuts misalign the units. With a full-reference metric, a shot boundary that does not match between source and encode can corrupt the comparison, so the cuts must be consistent on both sides.
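
As a rough illustration of the idea, the sketch below splits a clip at cuts and compares per-shot scores with the whole-title mean. It is a minimal sketch, not a production detector. The frame-difference values, the 0.5 threshold, the per-frame quality scores, and all function names are hypothetical and chosen only for this example; real detectors use more robust signals than a single threshold.

```python
from statistics import mean

def detect_cuts(frame_diffs, threshold):
    """Return the frame indices at which a new shot starts.

    frame_diffs[i] is an assumed dissimilarity between frame i and
    frame i + 1 (for example, a normalized histogram difference).
    """
    return [i + 1 for i, d in enumerate(frame_diffs) if d > threshold]

def split_into_shots(num_frames, cuts):
    """Turn cut indices into half-open (start, end) frame ranges."""
    starts = [0] + list(cuts)
    ends = list(cuts) + [num_frames]
    return list(zip(starts, ends))

def per_shot_means(frame_scores, shots):
    """Average the per-frame quality scores inside each shot."""
    return [mean(frame_scores[start:end]) for start, end in shots]

# Hypothetical data: 8 frames, so 7 frame-to-frame differences.
frame_diffs = [0.02, 0.03, 0.91, 0.04, 0.02, 0.88, 0.03]
# Hypothetical per-frame quality scores (VMAF-like, 0 to 100).
frame_scores = [95, 94, 96, 62, 60, 58, 93, 95]

cuts = detect_cuts(frame_diffs, threshold=0.5)
# The same boundaries are applied to every per-frame series, so the
# units stay aligned between source and encode.
shots = split_into_shots(len(frame_scores), cuts)
shot_scores = per_shot_means(frame_scores, shots)

print("cuts:", cuts)
print("shots:", shots)
print("per-shot means:", shot_scores)
print("whole-title mean:", mean(frame_scores))
print("worst shot:", min(shot_scores))
```

With this made-up data the whole-title mean is about 81.6, while the middle shot averages 60, which is the kind of gap a single mean conceals.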

Also known as

scene detection