End-to-end learned compression

End-to-end learned compression is the research frontier where the entire video encoder and decoder are built as a single neural network and trained end-to-end. Traditional codecs (H.264, HEVC, AV1, VVC) are hand-designed hybrids of mathematical components (prediction, transform, quantization, entropy coding), each refined separately over decades. Learned compression replaces all of those with neural network layers, and the training process jointly optimises everything against a single objective that balances bitrate against reconstruction quality. The encoder learns its own prediction, its own transform, and its own entropy model; the decoder learns to invert all of that.
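To make the idea of one joint objective concrete, here is a minimal sketch of a rate-distortion loss of the form rate + lam * distortion, which is a common way to express such a target in the learned-compression literature. Everything in it is an illustrative assumption rather than any particular codec's method: the "encoder" is a single gain factor instead of a neural network, the entropy model is a fixed discretised Gaussian, and the names toy_codec, estimated_bits and rate_distortion_loss are hypothetical.

```python
import math


def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def estimated_bits(symbols, mu=0.0, sigma=1.0):
    """Estimated code length, in bits, of integer symbols under a
    discretised Gaussian entropy model (assumed fixed, not learned)."""
    total = 0.0
    for s in symbols:
        # Probability mass of the integer bin [s - 0.5, s + 0.5].
        p = gaussian_cdf(s + 0.5, mu, sigma) - gaussian_cdf(s - 0.5, mu, sigma)
        total += -math.log2(max(p, 1e-12))  # floor avoids log2(0)
    return total


def toy_codec(source, gain):
    """Stand-in for a learned encoder/decoder pair: scale, round, unscale."""
    symbols = [round(gain * x) for x in source]   # "encode" + quantise
    reconstruction = [s / gain for s in symbols]  # "decode"
    return symbols, reconstruction


def rate_distortion_loss(source, gain, lam):
    """Return (loss, rate, distortion) with loss = rate + lam * distortion."""
    symbols, reconstruction = toy_codec(source, gain)
    rate = estimated_bits(symbols)
    distortion = sum((x - y) ** 2 for x, y in zip(source, reconstruction)) / len(source)
    return rate + lam * distortion, rate, distortion


if __name__ == "__main__":
    samples = [0.3, -1.2, 2.5, 0.7]
    for gain in (1.0, 4.0):
        loss, rate, distortion = rate_distortion_loss(samples, gain, lam=100.0)
        print(f"gain={gain}: rate={rate:.2f} bits, mse={distortion:.5f}, loss={loss:.2f}")
```

A larger gain gives finer quantisation, so distortion falls while the estimated bit cost rises; the weight lam sets where the trade-off lands. In a real learned codec, the gain would be replaced by network weights, the entropy model would itself be learned, and training would minimise this kind of loss by gradient descent.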

The promise is large. Without the constraints of human-designed building blocks, the network can discover compression strategies tuned exactly to the kind of content and quality target you train for. Research systems, including Google's HiFiC image codec and various academic neural video codecs, already match or beat traditional codecs on perceptual quality at the same bitrate. The gap is widest at extremely low bitrates, where traditional codecs produce ugly blocking and learned ones produce blurry but natural-looking output. As of 2026, MPAI is standardising an "EEV" (End-to-End Video coding) framework specifically for this category.

For a product team in 2026, end-to-end learned compression is research, not production. Decode times are still 10–100× slower than hardware-accelerated traditional codecs; hardware acceleration is essentially nonexistent on consumer devices; standards are not yet ratified; and the output, while impressive on perceptual metrics, sometimes hallucinates details that weren't in the source (faces that look "right" but aren't the original face). The technology will likely become commercially relevant in the 2027–2030 timeframe as hardware decoders ship and standards stabilise. It is worth watching, but not yet worth deploying outside niche experimental applications.

Related terms

Neural codec

AI in encoding

AV2