Per-shot encoding

Per-shot (or per-scene) encoding takes per-title encoding one level deeper: instead of one ladder for the whole title, it chooses the best resolution and quality setting for each shot, because a quiet dialogue scene and an explosion have very different complexity. The technique grew out of Netflix's Dynamic Optimizer (2018), which builds a rate-distortion convex hull per shot and picks the resolution and quantization that meet a target bitrate while maximizing the shot's VMAF, then stitches the shots together. Measured at equal VMAF it cut bitrate by roughly 28% for x264, 34% for x265, and 38% for VP9 versus fixed-quality encoding, about 30% on top of the roughly 20% per-title alone delivers. The catch is cost: optimizing each shot means encoding it many times, raising the encode count by more than two orders of magnitude per title, so it is an economic call, worth it for heavily watched content and hard to justify for a long-tail catalog. Shot detection supplies the cut points it needs.

Per-shot encoding

Related terms

Per-title encoding

Shot / scene detection

Convex hull