Published 2026-05-16 · 14 min read · By Nikolay Sapunov, CEO at Fora Soft

Why this matters

If you operate a streaming service, an OTT platform, an e-learning library, or a video-conferencing product, your CDN bill scales linearly with bitrate and your viewer experience scales non-linearly with quality. Every percentage point a new codec saves at the same quality is real money. But you cannot plan around vendor marketing: a slide claiming "70 percent better than AV1" needs to be read against the theoretical maximum any codec can ever achieve, otherwise you size your encoding farm for a number that will never arrive. This article gives you the tools to read those claims — what Shannon's limit actually is, what it does and doesn't bound, where conventional and neural codecs sit on the curve in 2026, and how to estimate, for your own content and your own quality target, how much room is left.

What Shannon actually said in 1948

Claude Shannon's 1948 paper A Mathematical Theory of Communication did two related things. First, he defined a number called entropy, written H, that measures the average information content of a source: the average number of bits per symbol that the source produces, given how its symbols are distributed [1]. The phrase "average information content" is doing real work here. If a source always emits the same symbol, its entropy is zero: the sequence is fully predictable, so describing it costs zero bits per symbol on average. If a source emits one of 256 symbols with equal probability, independently each time, its entropy is 8 bits per symbol, and every symbol is maximally surprising.
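The two examples in that paragraph can be checked directly. The sketch below assumes a memoryless source described by a list of symbol probabilities; the function name and the four-symbol "skewed" source are illustrative, not taken from Shannon's paper.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

constant_source = [1.0]               # always emits the same symbol
uniform_256 = [1 / 256] * 256         # 256 equally likely symbols
skewed = [0.5, 0.25, 0.125, 0.125]    # hypothetical four-symbol source

print(entropy_bits(constant_source))  # 0.0
print(entropy_bits(uniform_256))      # 8.0
print(entropy_bits(skewed))           # 1.75
```

The skewed source shows the general case: four symbols would cost 2 bits each with a fixed-length code, but the uneven distribution lowers the floor to 1.75 bits per symbol.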

Second, Shannon proved the source coding theorem: no lossless code can use fewer than H bits per symbol on average, and any compression scheme that does use fewer must, on average, fail to reconstruct the source [2]. You can get arbitrarily close to H (modern entropy coders like arithmetic coding and CABAC sit within a fraction of a percent of the limit), but you cannot go beneath it. That is the lossless floor.

A useful analogy. A locked filing cabinet has a key, and the key has a minimum number of bits required to describe its shape. You can package the key in a smaller envelope by folding it cleverly, but the key itself is exactly N bits long. You cannot shrink the key — only the envelope.

For video, the lossless floor matters in archival and medical imaging, where lossless formats like FFV1 and the lossless mode of JPEG XL operate (ProRes 4444, often grouped with them, is visually lossless rather than mathematically lossless), but it is not where consumer codecs live. Consumer codecs are lossy, which means they get to throw information away, and that opens a different door.

The lossy door: rate-distortion theory

In the same body of work, Shannon defined a second limit that does apply to consumer video. The rate-distortion function R(D) gives the smallest possible bitrate R at which a source can be encoded such that the average reconstruction error does not exceed a target distortion D [3]. R(D) is a curve, not a single number. As you allow more distortion (D rises), the required bitrate (R) falls. As you demand near-perfect reconstruction (D approaches zero), R climbs toward the entropy H.
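One of the few sources with a closed-form R(D) makes the shape of the curve concrete. The sketch below assumes a memoryless binary source that emits 1 with probability p, with distortion measured as the fraction of wrong bits (Hamming distortion); for that textbook case R(D) = H(p) - H(D) for D below min(p, 1 - p), and zero beyond. Video is nothing like this source, so treat it only as an illustration of the shape.

```python
import math

def binary_entropy(p):
    """Entropy of a biased coin, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def rate_distortion_bernoulli(p, d):
    """R(D) in bits per symbol for a Bernoulli(p) source under Hamming distortion."""
    if d >= min(p, 1.0 - p):
        return 0.0  # this much error is reachable without sending any bits
    return binary_entropy(p) - binary_entropy(d)

# Fair-coin source: the curve starts at the entropy (1 bit) and falls to zero.
for d in (0.0, 0.05, 0.1, 0.25, 0.5):
    print(d, round(rate_distortion_bernoulli(0.5, d), 3))
```

At D = 0 the function returns the entropy of the source, which is the lossless floor from the previous section; every tolerated error after that buys a lower rate.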

The shape of the R(D) curve depends on three things. First, what the source actually looks like — a low-motion talking-head shot has a different curve from a 4K sports broadcast, and a piece of computer-generated animation has yet another. Second, how you measure distortion — mean squared error gives one curve, structural similarity gives another, VMAF gives a third. Third, the statistical model of the source — a richer model, capturing temporal and spatial correlations, produces a tighter (lower) R(D) curve.

R(D) is the right object to think about when you compare codecs. Every codec, conventional or neural, can be plotted as a point or a curve in (R, D) space. The Shannon limit is the lower-left envelope: the curve below which no codec can go for that particular source and that particular distortion measure. Generational progress in codecs is the slow walk of the codec curve down toward the Shannon limit.

[Figure: two-dimensional plot with bitrate on the x-axis (low to high) and quality on the y-axis (low to high). The theoretical Shannon rate-distortion limit is drawn as a curve hugging the upper-left region. Five codec generations are plotted as labelled curves at increasing proximity to the limit: H.264 (2003), HEVC (2013), AV1 (2018), VVC (2020), and DCVC-FM (2024). Annotation: "Distance to the Shannon limit shrinks each generation, but the relative gap closes more slowly than the absolute gap."]

Figure 1. Codec progress reads as a walk toward the rate-distortion floor. Each generation closes part of the remaining gap, but the easy gains are behind us.

Where conventional codecs sit on the curve in 2026

The most-cited generational milestones in commercial codecs measure relative bitrate savings using the Bjøntegaard delta rate, abbreviated BD-rate. BD-rate gives the average bitrate change between two codecs at the same objective quality, integrated across a range of quality points [4]. The numbers below are BD-rate savings against H.264, measured on standard JVET test sets, mostly at HD and UHD resolutions.
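The idea behind BD-rate can be sketched in a few lines. This is a simplified version, not Bjøntegaard's exact procedure: the original method fits a cubic polynomial to log-bitrate as a function of quality, whereas this sketch interpolates linearly between measured points so that it needs only the standard library. The function names and the sample points are hypothetical, and each curve is assumed to have distinct quality values.

```python
import math

def log_rate_at(quality, curve):
    """Interpolate log(bitrate) at `quality`; `curve` is [(quality, bitrate), ...] sorted by quality."""
    for (q0, r0), (q1, r1) in zip(curve, curve[1:]):
        if q0 <= quality <= q1:
            t = (quality - q0) / (q1 - q0)
            return math.log(r0) + t * (math.log(r1) - math.log(r0))
    raise ValueError("quality lies outside the measured curve")

def bd_rate_percent(anchor, test, steps=1000):
    """Average bitrate change of `test` vs `anchor` at equal quality, in percent.

    Both arguments are lists of (bitrate, quality) measurements.
    A negative result means `test` needs fewer bits.
    """
    a = sorted((q, r) for r, q in anchor)
    b = sorted((q, r) for r, q in test)
    lo = max(a[0][0], b[0][0])    # compare only over the shared quality range
    hi = min(a[-1][0], b[-1][0])
    if hi <= lo:
        raise ValueError("the two curves share no quality range")
    total = 0.0
    for i in range(steps):        # midpoint-rule average of the log-rate gap
        q = lo + (i + 0.5) * (hi - lo) / steps
        total += log_rate_at(q, b) - log_rate_at(q, a)
    return (math.exp(total / steps) - 1.0) * 100.0

# Hypothetical measurements: (bitrate in kbps, quality score such as PSNR in dB).
anchor_points = [(1000, 30.0), (2000, 34.0), (4000, 38.0), (8000, 42.0)]
test_points = [(500, 30.0), (1000, 34.0), (2000, 38.0), (4000, 42.0)]
print(round(bd_rate_percent(anchor_points, test_points), 1))  # -50.0
```

Averaging in the log domain is what makes the result a relative saving: a codec that needs half the bits at every quality level scores -50 percent regardless of whether the clip runs at 1 Mbps or 8 Mbps.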

HEVC (2013) buys roughly 50 percent over H.264 on standardised test content, with the win larger on UHD than HD because HEVC's larger coding tree units handle high-resolution flat areas more efficiently [5]. AV1 (2018) lands at roughly 30 percent over HEVC on the same content, or about 65 percent over H.264 [6]. VVC (2020) measures about 40 to 50 percent over HEVC on UHD content, which compounds to roughly 70 to 75 percent over H.264 [7]. At 8K resolution, measurements published in MDPI Electronics in 2024 put VVC at 78 percent savings, AV1 at 63 percent, and HEVC at 53 percent, all relative to H.264 [8].

AV2, finalised in late 2025, adds another 30 to 40 percent over AV1 in AOMedia's reported results [9]. That compounds to roughly 77 percent over H.264, almost identical to VVC. AOMedia and the JVET project (which produces VVC and the experimental ECM extensions) are now within a few percentage points of each other on most content.
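The "compounds to" arithmetic in the last two paragraphs is multiplication of the bitrate that remains after each step, not addition of the savings. A minimal sketch follows; the 35 percent figure for AV2 is an assumed midpoint of the 30 to 40 percent range quoted above.

```python
def compound_savings(*step_savings):
    """Total bitrate saving after applying each fractional saving in turn."""
    remaining = 1.0
    for saving in step_savings:
        remaining *= 1.0 - saving
    return 1.0 - remaining

hevc = compound_savings(0.50)               # H.264 -> HEVC
av1 = compound_savings(0.50, 0.30)          # ... -> AV1
vvc = compound_savings(0.50, 0.50)          # HEVC -> VVC, upper end of 40-50%
av2 = compound_savings(0.50, 0.30, 0.35)    # ... -> AV2, assumed 35% midpoint

for name, total in (("HEVC", hevc), ("AV1", av1), ("VVC", vvc), ("AV2", av2)):
    print(name, round(total * 100))         # 50, 65, 75, 77
```

Each 30 percent step is taken from an already reduced bitrate, which is why two of them add only about 27 points to HEVC's 50.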

A useful pattern emerges. H.264 to HEVC cut the bitrate roughly in half. HEVC to AV1 cut it by a further 30 percent or so, and AV1 to AV2 by roughly 30 percent again. Because each cut applies to an already smaller bitrate, the absolute gain shrinks each generation: about 50 points of the original H.264 bitrate, then about 15, then about 12. More importantly, the gap that remains to the limit shrinks too, which means each future generation has less room to claim. The Shannon limit for natural video, on conventional distortion measures, is not infinitely far away.

Codec        | Year | BD-rate vs H.264 | Notes
-------------|------|------------------|--------------------------------------------------
H.264 / AVC  | 2003 | 0% (baseline)    | Workhorse of the internet
HEVC / H.265 | 2013 | -50%             | Patent thicket slowed adoption
AV1          | 2018 | -65%             | Royalty-free; 30% of Netflix viewing in 2025
VVC / H.266  | 2020 | -75%             | Best conventional standard; weak market traction
AV2          | 2025 | -77%             | ~30% over AV1; AOMedia finalised late 2025

The pattern is the diminishing-returns curve of any maturing technology. We are not at the floor — there is still room, especially on high-motion and high-resolution content — but the room is thinner each cycle.

The new dimension: perception

For a long time, the codec community measured progress along the (R, D) axes using objective distortion measures like PSNR or SSIM. The dirty secret of those measures is that they correlate poorly with what a human viewer actually sees. A picture can have low MSE and look terrible (waxy, over-smoothed faces) or high MSE and look great (realistic film grain, sharp textures). Netflix's VMAF was a major step forward because it correlates better with subjective scores, but it is still a per-frame regression onto a fixed observer model.
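A toy example shows why MSE-based scores cannot tell different kinds of error apart. The sketch below uses a hypothetical 16-pixel flat grey "image" and two degraded versions: one with a small error on every pixel, one with a single large error. The pixel values are made up for illustration.

```python
import math

def mse(reference, degraded):
    """Mean squared error between two equal-length pixel lists."""
    return sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)

def psnr_db(reference, degraded, peak=255):
    """PSNR in dB for 8-bit samples: 10 * log10(peak^2 / MSE)."""
    error = mse(reference, degraded)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)

original = [100] * 16
uniform_shift = [102] * 16            # every pixel off by 2
single_spike = [100] * 15 + [108]     # one pixel off by 8, the rest exact

print(mse(original, uniform_shift), mse(original, single_spike))  # 4.0 4.0
print(psnr_db(original, uniform_shift) == psnr_db(original, single_spike))  # True
```

Both versions score the same PSNR (about 42.1 dB), although one is a uniform brightness shift and the other is an isolated artefact. The metric sees only the total squared error, not where it lands or what it looks like.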

In 2018 and 2019, Yochai Blau and Tomer Michaeli published a series of papers that re-framed the problem in a way that has become central to the field [10]. They showed that distortion and perceptual quality are mathematically at odds: at a fixed bitrate, you cannot simultaneously minimise the mean squared error and the divergence between the distribution of reconstructed images and the distribution of natural images. The two objectives trade off, and the trade-off has a precise rate-distortion-perception (RDP) function [11].

The practical consequence: at a given bitrate, the picture can be optimised to look correct ("plausible image from the same distribution") at the cost of being slightly wrong ("each pixel differs from the original more than necessary"). Or it can be optimised to be pixel-accurate, at the cost of looking subtly degraded. The two cannot both be minimised. Modern neural codecs, which can be trained against any differentiable loss, have started to exploit this — they sit on a different part of the RDP surface than conventional codecs do.
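For readers who want the formal statement, the RDP function from Blau and Michaeli's 2019 paper can be written as follows. The notation is a common one rather than a quotation: X is the source, X-hat the reconstruction, Delta a per-sample distortion such as squared error, and d a divergence between the source distribution and the reconstruction distribution.

```latex
R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{subject to} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, \, p_{\hat{X}}\big) \le P
```

Dropping the second constraint (letting P go to infinity) recovers Shannon's R(D). Tightening it, at a fixed D, can only raise the required rate or leave it unchanged, which is the trade-off described above.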

[Figure: 3D concept diagram with three axes: bitrate (R), distortion (D), and perceptual divergence (P). A surface shows the rate-distortion-perception trade-off, with two highlighted regions: "conventional codec target" (low R, low D, high P) and "neural codec target" (low R, higher D, low P). Annotation: "You can be pixel-accurate or perceptually convincing at a given bitrate, not both."]

Figure 2. The Blau-Michaeli rate-distortion-perception surface. Conventional codecs optimise distortion; neural codecs can shift along the perception axis at the same bitrate.

Where neural codecs stand in 2026

A neural video codec replaces some or all of the hand-designed blocks in a conventional encoder — transform, prediction, entropy coder — with a learned model. The learned model is trained directly on the rate-distortion (or rate-distortion-perception) objective using stochastic gradient descent, which means it can exploit statistical regularities in natural video that hand-coded blocks miss.

Microsoft Research's DCVC family is the most-cited line of work. DCVC-DC (2023) was the first neural video codec to beat VVC's reference implementation VTM on standard JVET test sets under intra-period 32. DCVC-FM (2024), with feature modulation and multi-scale temporal context, beat VTM by 25.5 percent in BD-rate on the same test set under the more demanding intra-period -1 setting (a single intra frame at the start, then unlimited inter prediction) [12]. DCVC-RT (CVPR 2025) is the first real-time neural codec: it reaches roughly 21 percent BD-rate gain over VVC at framerates that match conventional encoders [13].

A cross-codec benchmark, posted to arXiv in 2024, evaluated DCVC-FM, DCVC-DC, ECM (the JVET experimental codec that sits above VVC), and AVM (AOM's experimental codec that became AV2). Measured by VMAF on 4K content, DCVC-FM achieves bitrate savings of more than 8 percent over AVM, comparable to ECM, and over 37 percent over HEVC [14]. Measured with learned perceptual metrics such as LPIPS or FID, neural codecs pull further ahead.

The catch in 2026 is the same catch neural codecs have had since they appeared: compute. A neural decoder requires a GPU or a substantial NPU, runs at orders-of-magnitude higher power than a hardware HEVC decoder, and is not yet present in any consumer device's silicon. DCVC-RT's "real-time" benchmarks run on an NVIDIA RTX 4090 — a 450-watt desktop card. That is fine for VOD pre-encoding (where you spend GPU once and recoup it on every play) and for cloud transcoding, but it is not yet a delivery codec.

How far is the Shannon limit, really?

A useful way to estimate is to compare the best current codec to a lower bound on R(D) computed from the source. For natural video at moderate compression rates, several independent estimates land in similar territory. Theoretical work on Gaussian models suggests that the remaining gap from VVC and DCVC-FM to the unconstrained R(D) bound is in the single-digit percentage range at typical streaming bitrates, and somewhat larger at very-low-bitrate operating points [15]. In other words, for the workhorse 1080p and 4K rates where most of the internet's viewing happens, the structural floor is close enough that no more than one or two more big generational steps are likely on conventional metrics.

Two important caveats. First, that floor is not a single number — it depends on the distortion measure. On PSNR, the floor is close. On perceptual measures like VMAF or LPIPS, the floor is lower (you can be further from the original pixel by pixel and still look identical), and there is more room. Second, the RDP surface introduces a third axis — perception — and the floor there is harder to estimate because the "perceptual divergence" objective has many possible definitions, each with its own bound.

The practical implication is that the question "how much more compression is left" splits into three:

  • On PSNR and conventional MSE: small single-digit gains over VVC and AV2. The room is close to gone.
  • On perceptual measures like VMAF and LPIPS: 20 to 40 percent more room, mostly accessible only to neural codecs trained on those losses.
  • On the RDP perception axis: large remaining room, but the question shifts from "how few bits" to "how natural does the reconstruction look", which is a different objective altogether.

A 70 percent compression-improvement headline in 2026 is almost always measured on a perceptual axis that conventional codecs were never optimised for. That is real, since neural codecs really do produce more natural-looking video at the same bitrate, but it is not the same kind of number as the roughly 30 percent over HEVC that AV1 posted in conventional BD-rate comparisons.

[Figure: horizontal bar chart titled "BD-rate savings vs H.264 across codec generations (2026 measurements)". Bars: H.264 0% (baseline reference), HEVC -50%, AV1 -65%, VVC -75%, AV2 -77%, DCVC-FM -82% (perceptual). Bars are colour-coded by family: conventional codecs in blue, AOM open codecs in green, neural codec in purple. A dashed red line marks an estimated "Shannon limit zone" at roughly -90 to -92%. Footnote: "Numbers are averages across standard JVET test sets, mostly 1080p/4K; perceptual savings measured on VMAF."]

Figure 3. Each generation closes part of the remaining gap. The dashed zone is a rough estimate of the Shannon floor on natural video at common streaming bitrates.

A common mistake: treating "Shannon limit" as a single number

The most expensive misconception in this area is the assumption that there is one Shannon limit you can quote for "video", the way there is one speed of light. There is not. The rate-distortion function R(D) depends on:

  • The source. A 4K sports stream has a different R(D) curve from a low-motion lecture video. Animated content has a curve different from both.
  • The distortion measure. PSNR, SSIM, VMAF, and LPIPS each yield different curves on the same source.
  • The statistical model. A simple block-IID Gaussian model gives one R(D); a temporally-correlated mixture model gives a much lower R(D); a deep autoregressive prior gives lower still.
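The effect of the statistical model is easiest to see at the lossless end of the curve. The sketch below uses a hypothetical binary source whose symbols tend to repeat (a two-state Markov chain) and compares the bound an IID model would report with the bound from a model that knows about the memory. The transition probabilities are made up for illustration.

```python
import math

def binary_entropy(p):
    """Entropy of a biased coin, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def iid_model_bound(p01, p10):
    """Bits per symbol if the source is modelled as IID with the chain's symbol frequencies."""
    p_one = p01 / (p01 + p10)   # stationary probability of emitting 1
    return binary_entropy(p_one)

def markov_model_bound(p01, p10):
    """Entropy rate of the two-state chain: the bound once the memory is modelled."""
    pi_zero = p10 / (p01 + p10)
    pi_one = p01 / (p01 + p10)
    return pi_zero * binary_entropy(p01) + pi_one * binary_entropy(p10)

# The symbol flips with probability 0.1 and repeats with probability 0.9.
print(round(iid_model_bound(0.1, 0.1), 3))     # 1.0
print(round(markov_model_bound(0.1, 0.1), 3))  # 0.469
```

The source is the same in both lines; only the model changed, and the apparent floor dropped by more than half. Video has far richer spatial and temporal correlation than this chain, which is why any quoted "limit" is only as good as the model behind it.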

So the right way to read a vendor claim is: "vs which codec, on which content, with which quality metric, and at which bitrate operating point?" A claim that reads "30 percent better than AV1" with no further context is approximately a coin flip: for some content and some metric it will be true, for some it will not. The Bjøntegaard methodology was invented to standardise this conversation, and the JVET common test conditions and AOM's CTC define exactly which sequences and which configurations are used in academic comparisons [16]. If a vendor claim doesn't reference one of these test conditions, the claim is marketing, not measurement.

A worked example: the Shannon-limit budget for a 4K SDR catalogue

To anchor the numbers, imagine a streaming service with a 10,000-hour 4K SDR catalogue, encoded today in HEVC Main 10 at an average 12 Mbps. The catalogue weighs 10,000 × 3,600 × 12,000,000 ÷ 8 = 54 TB. CDN egress at 10x monthly catalogue plays is 540 TB.

Switching to AV1 at the same quality saves roughly 30 percent over HEVC on this kind of content: new bitrate around 8.4 Mbps, catalogue around 38 TB, egress around 380 TB. At a typical 2026 CDN price of about 0.01 dollars per GB on a tier-1 provider, that is (540 - 380) × 1,000 × 0.01 = 1,600 dollars per month, or 19,200 dollars per year saved on a relatively small library.
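The arithmetic above is easy to rerun with your own catalogue size, bitrate, and CDN price. The sketch below uses the article's figures as inputs; the function names are illustrative, and decimal units (1 TB = 1,000 GB) are assumed throughout.

```python
def catalogue_terabytes(hours, mbps):
    """Storage for `hours` of video at an average bitrate of `mbps`, in decimal TB."""
    bits = hours * 3600 * mbps * 1_000_000
    return bits / 8 / 1e12

def monthly_egress_dollars(catalogue_tb, plays_multiple, dollars_per_gb):
    """CDN cost when monthly egress equals `plays_multiple` times the catalogue size."""
    return catalogue_tb * plays_multiple * 1000 * dollars_per_gb

hevc_tb = catalogue_terabytes(10_000, 12.0)        # 54.0 TB
av1_tb = catalogue_terabytes(10_000, 12.0 * 0.70)  # about 37.8 TB at 30% savings

hevc_cost = monthly_egress_dollars(hevc_tb, 10, 0.01)
av1_cost = monthly_egress_dollars(av1_tb, 10, 0.01)

print(round(hevc_tb, 1), round(av1_tb, 1))         # 54.0 37.8
print(round(hevc_cost - av1_cost))                 # 1620
```

The unrounded saving is about 1,620 dollars per month; the 1,600 figure in the text comes from rounding egress to 380 TB before subtracting.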

Switching to a future best conventional codec, AV2 or VVC, buys roughly another 30 percent on top of AV1, going by the figures above: about 5.9 Mbps, around 265 TB of egress, and perhaps another 1,100 dollars per month. Switching beyond that to a future neural codec trained on perceptual losses could buy another 15 to 20 percent, but only at the price of decoder cost: until consumer silicon ships a neural decoder, the savings are theoretical for delivery and only realised in cloud transcoding.

The arithmetic is the punchline. The remaining structural compression headroom on conventional codecs, on a real CDN budget, is a few thousand dollars per month per 10,000 hours of catalogue. That is enough to justify a codec swap every five or six years, but it is not the kind of number that funds a multi-year R&D program by itself. Neural codecs reset the math because they unlock a different axis (perception) and a different kind of saving, but they also reset the compute cost.

Where Fora Soft fits in

We build video-streaming, OTT, e-learning, telemedicine, video-conferencing, and AR/VR systems where codec-efficiency decisions show up in the line items every quarter. The framing we apply to product teams is the same one this article ends on: separate the question "how few bits do we need for our content at our quality bar" from the question "how few bits is possible in principle". The first answer is your engineering target; the second tells you when to stop expecting more from each generation. In OTT and e-learning deployments we routinely measure first-party R(D) curves on customer content rather than rely on standardised test sets, because the wrong baseline assumption can mis-size a CDN contract by 20 percent in either direction.

A pitfall to avoid: optimising the wrong objective

Engineers who already work in codecs sometimes spend optimisation budget pushing PSNR higher at operating points where PSNR no longer correlates with perceived quality. The classic symptom is a per-title encoding ladder that scores well on objective metrics but has waxy faces, smeared textures, and unnaturally smooth gradients on subjective review. The fix is to pick the metric (or the combination of metrics, including a perceptual one like VMAF and a learned-perceptual one like LPIPS) that matches your audience's actual judgement, and optimise for that. Otherwise the codec keeps moving down an R(D) curve defined on the wrong distortion measure, and the picture looks worse as the bitrate drops.

What to read next


  • Talk to a video engineer — book a 30-minute call to model the rate-distortion budget on your real content.
  • See our case studies — Fora Soft's streaming, OTT, and conferencing projects (2005-present).
  • Book a scoping call — discuss codec strategy and migration timing for your platform.

References


  1. Shannon, C. E. (1948). "A Mathematical Theory of Communication." Bell System Technical Journal. The founding paper of information theory; defines entropy and proves the source coding theorem. https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf. Accessed 2026-05-16. 

  2. Shannon's source coding theorem — Wikipedia overview of the lossless compression bound at the entropy rate. https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem. Accessed 2026-05-16. 

  3. Rate-distortion theory — Wikipedia. Definition of R(D), the smallest rate at which a source can be encoded subject to a distortion constraint. https://en.wikipedia.org/wiki/Rate%E2%80%93distortion_theory. Accessed 2026-05-16. 

  4. Bjøntegaard, G. (2001). "Calculation of average PSNR differences between RD-curves." VCEG-M33. The Bjøntegaard delta methodology that codec teams use to compare two codecs at the same quality. https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/VCEG-M33.doc. Accessed 2026-05-16. 

  5. Sullivan, G. J. et al. (2012). "Overview of the High Efficiency Video Coding (HEVC) Standard." IEEE Transactions on Circuits and Systems for Video Technology. Reports ~50% BD-rate savings of HEVC over H.264 on JVET test sets. https://ieeexplore.ieee.org/document/6316136. Accessed 2026-05-16. 

  6. Chen, Y. et al. (2020). "An Overview of Coding Tools in AV1: the First Video Codec from the Alliance for Open Media." APSIPA Transactions on Signal and Information Processing. AV1 averages ~30% BD-rate gain over HEVC on random-access configurations. https://www.cambridge.org/core/journals/apsipa-transactions-on-signal-and-information-processing/article/overview-of-coding-tools-in-av1-the-first-video-codec-from-the-alliance-for-open-media/. Accessed 2026-05-16. 

  7. Bross, B. et al. (2021). "Overview of the Versatile Video Coding (VVC) Standard and its Applications." IEEE Transactions on Circuits and Systems for Video Technology. VVC reports ~40-50% BD-rate gain over HEVC on UHD content. https://ieeexplore.ieee.org/document/9503377. Accessed 2026-05-16. 

  8. Performance Comparison of VVC, AV1, HEVC, and AVC for High Resolutions (2024). MDPI Electronics. Reports BD-rate savings at 8K: VVC -78%, AV1 -63%, HEVC -53% vs H.264. https://www.mdpi.com/2079-9292/13/5/953. Accessed 2026-05-16. 

  9. AOMedia AV2 open video codec release nears, delivers around 40% bandwidth reduction (2025-11). CNX Software. AOMedia's reported AV2 efficiency over AV1; YUV-PSNR 28.6%, VMAF 32.6% gains. https://www.cnx-software.com/2025/11/21/aomedia-av2-open-video-codec-release-nears-delivers-around-40-bandwidth-reduction/. Accessed 2026-05-16. 

  10. Blau, Y., Michaeli, T. (2018). "The Perception-Distortion Tradeoff." CVPR 2018. The original paper proving that distortion and perceptual quality are mathematically at odds. https://arxiv.org/abs/1711.06077. Accessed 2026-05-16. 

  11. Blau, Y., Michaeli, T. (2019). "Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff." ICML 2019. Introduces the RDP function as a generalisation of Shannon's R(D). https://arxiv.org/abs/1901.07821. Accessed 2026-05-16. 

  12. Li, J., Li, B., Lu, Y. (2024). "Neural Video Compression with Feature Modulation." CVPR 2024. DCVC-FM beats VTM (the VVC reference) by 25.5% in BD-rate under intra-period -1. https://arxiv.org/abs/2402.17414. Accessed 2026-05-16. 

  13. Jia, Z. et al. (2025). "Towards Practical Real-Time Neural Video Compression." CVPR 2025. DCVC-RT delivers ~21% BD-rate gain over VVC at real-time framerates. https://dcvccodec.github.io/. Accessed 2026-05-16. 

  14. Benchmarking Conventional and Learned Video Codecs (2024). arXiv preprint. Cross-codec comparison of DCVC-FM, ECM, AVM, and AV1 on VMAF. https://arxiv.org/abs/2408.05042. Accessed 2026-05-16. 

  15. Liu, J., Lu, G., Wang, Y. (2025). "Advances in Neural Video Compression: A Review and Benchmarking." Preprints.org. Reviews how close current codecs sit to the theoretical R(D) bound on natural video. https://www.preprints.org/manuscript/202604.0035. Accessed 2026-05-16. 

  16. JVET Common Test Conditions and AOMedia Common Test Conditions — the standardised test sets and configurations used to make BD-rate comparisons meaningful. https://www.itu.int/en/ITU-T/studygroups/2017-2020/16/Pages/video/jvet.aspx. Accessed 2026-05-16.