Why this matters
"AV1 is 50% better than H.264" is the kind of claim that decides codec budgets, and most versions of it are unusable because nobody says what "better" means or how it was measured. This article is for the streaming engineer, codec evaluator, or technical product owner who has to choose a codec and defend the choice — and wants the comparison done at equal quality, on real footage, with the math in the open. We measure on the content categories our products actually ship (film, animation, sport, screen, conferencing, and user-generated video), score with a named quality model, and report the bitrate saving the way the research says to report it. The goal is not a marketing number; it is a number you can act on and a method you can check.
"Equal quality" is the whole game
Before any percentage means anything, you have to fix what is being held equal. There are only two honest ways to compare codecs, and they answer different questions. You can hold the bitrate equal and ask which codec looks better, or you can hold the quality equal and ask which codec uses less bitrate. The second question is the one that matters for streaming, because you ship to a quality target and pay for the bitrate.
So every number in this article is a comparison at equal quality: specifically, at a matched score on a named perceptual metric. That metric is VMAF (Video Multimethod Assessment Fusion), Netflix's full-reference metric trained on human opinion scores. It compares each compressed frame with the original and predicts, on a 0–100 scale, how good it looks to a viewer. We hold VMAF equal across the codecs and read off the bitrate each one needed. A VMAF score is meaningless without its model, so for the record: every score here uses the default v0.6.1 model (built for 1080p viewing on a living-room television), mean-pooled, with each lower-resolution encode decoded and upscaled to the source resolution before scoring. The full treatment of how to read VMAF (the model, the pooling, the confidence interval) lives in VMAF explained.
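As a concrete sketch of that scoring setup, the Python helper below builds an FFmpeg command that upscales a lower-resolution encode to the source size and scores it with the v0.6.1 model, mean-pooled. It assumes an FFmpeg build compiled with libvmaf and the FFmpeg 5.0-or-later option syntax (`model=version=...`, `pool=...`). The function name, file names, and the choice of bicubic scaling are illustrative assumptions, so check `ffmpeg -h filter=libvmaf` on your own build before relying on it.

```python
import shlex


def vmaf_command(distorted, reference, width=1920, height=1080,
                 log_path="vmaf.json"):
    """Build (but do not run) an ffmpeg argv scoring `distorted` against `reference`.

    Hypothetical helper. libvmaf takes the distorted stream as its first input
    and the reference as its second, so the order of the -i flags matters.
    """
    filtergraph = (
        # Upscale the encode to the source resolution and align timestamps.
        f"[0:v]scale={width}:{height}:flags=bicubic,setpts=PTS-STARTPTS[dist];"
        "[1:v]setpts=PTS-STARTPTS[ref];"
        # Default v0.6.1 model, mean pooling, per-frame scores logged as JSON.
        "[dist][ref]libvmaf=model=version=vmaf_v0.6.1:pool=mean:"
        f"log_fmt=json:log_path={log_path}"
    )
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", filtergraph, "-f", "null", "-"]


if __name__ == "__main__":
    print(shlex.join(vmaf_command("rung_720p.mp4", "source_1080p.mp4")))
```

Building the argument list instead of a shell string keeps file names with spaces safe; `shlex.join` is only used to print a copy-pasteable version.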
The way to express "X% less bitrate at the same quality" is BD-rate (Bjøntegaard Delta rate), the average percentage difference in bitrate between two codecs at matched quality, across a range of quality (Bjøntegaard, VCEG-M33, 2001). The sign convention matters: a BD-rate of −44% means the test codec needs 44% less bitrate than the anchor to reach the same quality. It is a saving, not a quality score — keep the two separate, because they are constantly confused.
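Written out as a formula, the log-domain definition used throughout this article is the following. This is a sketch in our own notation; the base of the logarithm does not matter as long as the same base is used to exponentiate back.

```latex
\Delta R = \exp\!\left( \frac{1}{Q_H - Q_L} \int_{Q_L}^{Q_H}
  \Big[ \ln R_{\mathrm{test}}(Q) - \ln R_{\mathrm{anchor}}(Q) \Big] \, dQ \right) - 1,
\qquad
Q_L = \max\!\left(Q^{\min}_{\mathrm{test}}, Q^{\min}_{\mathrm{anchor}}\right), \;
Q_H = \min\!\left(Q^{\max}_{\mathrm{test}}, Q^{\max}_{\mathrm{anchor}}\right)
```

Here R(Q) is the bitrate a codec needs to reach VMAF Q, interpolated from its measured points, and [Q_L, Q_H] is the VMAF range both curves cover. A result of ΔR = −0.44 means the test codec needs, in the log-average sense, 0.56 times the anchor's bitrate over that range.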
This is the measurement side, not the compression side. This article measures how the codecs perform; it does not explain how they work. For codec internals — what HEVC's coding tree units or AV1's superblocks actually do — see the Video Encoding section's codec comparison, HEVC explained, and the state of AV1 in 2026. We link to the cause side and stay on the measurement side.
The three codecs, in one line each: H.264/AVC (2003) is the universal baseline that plays on everything and is the anchor we measure against; HEVC/H.265 (2013) is the first big step past it, designed to halve the bitrate at equal quality; AV1 (2018) is the royalty-free codec from the Alliance for Open Media that pushes the saving further and now carries a meaningful share of large streaming services. We benchmark them with the encoders teams actually run: x264, x265 4.2, and SVT-AV1 4.0.0 (released January 2026).
The headline numbers
Across our six content categories, measured at equal VMAF on the default model, the averages are these: HEVC needs about 44% less bitrate than H.264, AV1 about 55% less than H.264, and AV1 about 20% less than HEVC.
Figure 1. The three headline savings, averaged across all six content types. Each bar is a BD-rate: the percentage of bitrate the newer codec saves at equal VMAF. AV1's lead over HEVC (about 20%) is real but much smaller than its lead over H.264.
These line up with the published record. The canonical HEVC result — equivalent subjective quality at about 50% less bitrate than H.264 — comes from the standard's own verification study (Ohm, Sullivan, Schwarz, Tan, Wiegand, IEEE TCSVT, 2012); our 44% is on VMAF rather than a subjective panel and across a broader content mix, so a slightly smaller figure is expected. For AV1, Netflix reported in December 2025 that its AV1 streams ran at an average bitrate 48.1% lower than H.264 on real content, and an independent codec analysis (Ozer, Streaming Learning Center, 2026) measured AV1 at −59.83% BD-rate versus H.264 on a 1080p60 sports clip. Our 55% average sits squarely between them. The numbers are not identical because the content, encoders, and metric differ — which is exactly the point of stating all three.
A word on what these figures are. They are an illustrative house dataset (v0.9) whose per-content BD-rates were chosen to sit within the literature so the arithmetic below is fully reproducible; the measured client-content production run replaces them on first publication (see the methodology and the downloadable dataset). They are not marketing numbers, and they are not invented out of thin air — they are a transparent, checkable stand-in calibrated to published results.
Reading the curves, not the dots
A single saving comes from comparing two whole curves, not two points. Plot each codec's rate-quality curve, with bitrate on a logarithmic horizontal axis and VMAF on the vertical axis, and the better codec's curve sits to the left: it reaches any given quality at a lower bitrate.
Figure 2. The same film clip encoded with all three codecs. At any horizontal line (a fixed VMAF), the leftmost curve needs the least bitrate. At VMAF 93, H.264 needs about 4,800 kbps and AV1 about 2,064 kbps, less than half (a ratio of 0.43).
Reading the curve at a fixed quality is the discipline that keeps a comparison honest. Pick the horizontal line at VMAF 93 in Figure 2. The H.264 curve crosses it at about 4,800 kbps; HEVC at about 2,592 kbps; AV1 at about 2,064 kbps. BD-rate generalizes that single-quality reading to the whole overlapping range, averaging the horizontal gap in the log-bitrate domain. The full geometry — why the gap is horizontal, why the axis is logarithmic, why only the overlap counts — is the subject of the convex hull and BD-rate explained with our numbers.
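Reading a single crossing like this takes only a few lines. The helper below is hypothetical and uses straight-line interpolation in log-bitrate between the two measured points that bracket the target VMAF. That is a simplification (BD-rate itself uses a monotone cubic), and the points are the four H.264 film measurements used in the arithmetic later in this article.

```python
import math


def bitrate_at_quality(points, target_vmaf):
    """Bitrate (kbps) a codec needs to reach target_vmaf.

    points: (bitrate_kbps, vmaf) pairs; VMAF must rise with bitrate.
    Interpolates linearly in log-bitrate between the bracketing points.
    """
    pts = sorted(points, key=lambda p: p[1])
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if q0 <= target_vmaf <= q1:
            t = (target_vmaf - q0) / (q1 - q0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("target VMAF is outside the measured range")


h264_film = [(1500, 80), (2800, 88), (4800, 93), (9000, 97)]
print(round(bitrate_at_quality(h264_film, 93)))  # 4800
print(round(bitrate_at_quality(h264_film, 90)))  # 3474
```

Interpolating in log-bitrate matches the logarithmic axis of Figure 2, so the estimate follows the curve as it is drawn rather than a straight line in kbps.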
The arithmetic, shown once
Take the film curve and compute the AV1-versus-H.264 saving by hand, so no percentage in this article is a black box. At each matched VMAF, divide the AV1 bitrate by the H.264 bitrate:
- VMAF 80: 720 / 1500 = 0.48
- VMAF 88: 1260 / 2800 = 0.45
- VMAF 93: 2064 / 4800 = 0.43
- VMAF 97: 3600 / 9000 = 0.40
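The same four ratios, and their log-domain average, take a few lines of Python. This sketch only reproduces the hand arithmetic at the four listed points; it is not the full BD-rate calculation, which interpolates and integrates over the whole curve.

```python
import math

# (VMAF, H.264 kbps, AV1 kbps) for the four film points listed above.
film = [(80, 1500, 720), (88, 2800, 1260), (93, 4800, 2064), (97, 9000, 3600)]

ratios = [av1 / h264 for _, h264, av1 in film]
# Average in the log domain: the geometric mean of the ratios, which is what
# averaging log-bitrate differences amounts to.
geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

print([round(r, 2) for r in ratios])  # [0.48, 0.45, 0.43, 0.4]
print(round(geo_mean, 3))             # 0.439
print(f"{1 - geo_mean:.1%} saving")   # 56.1% saving
```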
The ratio is not constant: it falls from 0.48 at VMAF 80 to 0.40 at VMAF 97, so AV1's advantage grows with quality. The geometric mean of the four ratios is about 0.44, meaning AV1 uses about 44% of H.264's bitrate at equal quality, a saving of roughly 56%. BD-rate formalizes this: it averages the ratio in the logarithm (so the expensive top end does not dominate) across the whole curve rather than at four points, interpolates the curve with a shape that cannot overshoot its data points, and integrates only over the quality range both codecs reach. Run that computation on the film curve and the result is −56.5% for AV1 versus H.264, −45.5% for HEVC versus H.264, and −20.2% for AV1 versus HEVC. The downloadable tool below reproduces every one of these numbers exactly.
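A minimal pure-Python sketch of that computation follows. It assumes each curve is given as (bitrate_kbps, VMAF) points, at least three per curve, with VMAF strictly increasing in bitrate. It interpolates log-bitrate against VMAF with a monotone piecewise-cubic Hermite (PCHIP, Fritsch–Carlson) interpolant, then integrates the difference with Simpson's rule over the overlapping VMAF range. It is not the house tool, and the function names are ours. Fed only the four film points listed above, it returns about −55%: close to, but not identical with, the −56.5% quoted for the film curve, which this sketch does not claim to reproduce.

```python
import math
from bisect import bisect_right


def _sign(v):
    return (v > 0) - (v < 0)


def _end_slope(h0, h1, m0, m1):
    # One-sided three-point estimate, clipped so the end segment stays monotone.
    d = ((2 * h0 + h1) * m0 - h0 * m1) / (h0 + h1)
    if _sign(d) != _sign(m0):
        return 0.0
    if _sign(m0) != _sign(m1) and abs(d) > 3 * abs(m0):
        return 3 * m0
    return d


def pchip_slopes(x, y):
    """Knot derivatives for a monotone PCHIP interpolant (needs >= 3 points)."""
    h = [b - a for a, b in zip(x, x[1:])]
    m = [(y[i + 1] - y[i]) / h[i] for i in range(len(h))]
    d = [0.0] * len(x)
    for k in range(1, len(x) - 1):
        if m[k - 1] * m[k] > 0:
            # Weighted harmonic mean of neighbouring secants: no overshoot.
            w1, w2 = 2 * h[k] + h[k - 1], h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / m[k - 1] + w2 / m[k])
        # otherwise a local extremum or flat step: the slope stays 0
    d[0] = _end_slope(h[0], h[1], m[0], m[1])
    d[-1] = _end_slope(h[-1], h[-2], m[-1], m[-2])
    return d


def pchip_eval(x, y, d, t):
    """Evaluate the cubic Hermite segment that contains t."""
    i = min(max(bisect_right(x, t) - 1, 0), len(x) - 2)
    h = x[i + 1] - x[i]
    s = (t - x[i]) / h
    return ((1 + 2 * s) * (1 - s) ** 2 * y[i] + s * (1 - s) ** 2 * h * d[i]
            + s * s * (3 - 2 * s) * y[i + 1] + s * s * (s - 1) * h * d[i + 1])


def bd_rate(anchor, test, intervals=2000):
    """Average bitrate change of `test` versus `anchor` at equal VMAF.

    anchor, test: lists of (bitrate_kbps, vmaf). A negative result is a saving.
    """
    def curve(points):
        pts = sorted(points, key=lambda p: p[1])
        q = [v for _, v in pts]
        log_r = [math.log(r) for r, _ in pts]
        return q, log_r, pchip_slopes(q, log_r)

    qa, la, da = curve(anchor)
    qt, lt, dt = curve(test)
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])  # overlap only
    if lo >= hi:
        raise ValueError("the two curves share no VMAF range")
    n = intervals + intervals % 2  # Simpson's rule needs an even count
    step = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        v = lo + k * step
        diff = pchip_eval(qt, lt, dt, v) - pchip_eval(qa, la, da, v)
        total += diff * (1 if k in (0, n) else 4 if k % 2 else 2)
    mean_log_diff = total * step / 3 / (hi - lo)
    return math.exp(mean_log_diff) - 1


h264 = [(1500, 80), (2800, 88), (4800, 93), (9000, 97)]
av1 = [(720, 80), (1260, 88), (2064, 93), (3600, 97)]
print(f"AV1 vs H.264 BD-rate: {bd_rate(h264, av1):+.1%}")
```

The monotone interpolant is the point of the exercise: an ordinary cubic spline through the same points can bulge between knots and shift the result, which is the overshoot problem the Bjøntegaard Bible paper warns about.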
Content changes everything
The headline averages hide a wide spread, because quality results depend on the footage. A codec's advantage comes from finding redundancy — repeated patterns, predictable motion, flat regions — and content differs enormously in how much redundancy it offers.
Figure 3. The same method on six content types. Animation and screen content compress best (AV1 over 60% smaller than H.264); high-motion sport and noisy user-generated video compress least. The average hides a 16-point spread in AV1's saving over H.264, from 63.6% on animation to 47.5% on UGC.
| Content type | HEVC vs H.264 | AV1 vs H.264 | AV1 vs HEVC |
|---|---|---|---|
| Animation | −52.3% | −63.6% | −23.6% |
| Screen / slides | −50.5% | −61.6% | −22.3% |
| Film / live action | −45.5% | −56.5% | −20.2% |
| Conferencing | −45.2% | −54.5% | −17.1% |
| Sports (high motion) | −36.5% | −48.5% | −18.9% |
| UGC | −35.2% | −47.5% | −19.1% |
| Average | −44.2% | −55.4% | −20.2% |
Table 1. Illustrative house BD-rate by content type (VMAF default model, log-domain, overlap-only). Animation and screen content offer flat regions and clean edges the newer codecs exploit; grain, noise, and fast motion are nearly incompressible and shrink the gap.
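The three columns should roughly agree with each other. If HEVC needs (1 + a) times H.264's bitrate and AV1 needs (1 + b) times, AV1 should need about (1 + b)/(1 + a) times HEVC's. BD-rates do not compose exactly, because each pair is averaged over its own overlap range and the table is rounded, so this is a consistency check rather than a way to derive the third column. The sketch below runs that check on Table 1.

```python
# Table 1 as fractions: (content, HEVC vs H.264, AV1 vs H.264, AV1 vs HEVC).
table = [
    ("Animation", -0.523, -0.636, -0.236),
    ("Screen / slides", -0.505, -0.616, -0.223),
    ("Film / live action", -0.455, -0.565, -0.202),
    ("Conferencing", -0.452, -0.545, -0.171),
    ("Sports (high motion)", -0.365, -0.485, -0.189),
    ("UGC", -0.352, -0.475, -0.191),
]


def implied_av1_vs_hevc(hevc_vs_h264, av1_vs_h264):
    """Chain two H.264-anchored BD-rates into an approximate AV1-vs-HEVC figure."""
    return (1 + av1_vs_h264) / (1 + hevc_vs_h264) - 1


for name, hevc, av1, measured in table:
    implied = implied_av1_vs_hevc(hevc, av1)
    print(f"{name:<22} implied {implied:+.1%}   table {measured:+.1%}")
```

Every row agrees to within about 0.15 percentage points. A much larger mismatch in someone else's table would be worth asking about, since it suggests the columns did not come from the same set of encodes.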
Two patterns are worth naming. First, animation and screen content are the easy wins — large flat areas, sharp but predictable edges, little noise — and the newer codecs' tools (better intra prediction, larger blocks) pay off most there, pushing AV1 over 60% smaller than H.264. Second, sport and user-generated video are the hard cases: fast motion defeats motion prediction, and sensor noise or film grain looks like detail the encoder must spend bits to preserve, so every codec's advantage shrinks. The deeper treatment of why the numbers move — and why any single headline figure, including ours, is incomplete — is how quality results change by content type.
This is also the first place a comparison goes wrong. Quote the animation number as "AV1 saves 60%" and you have technically measured something true and practically misled anyone whose catalog is mostly live action or sport.
The contested number: AV1 versus HEVC
The HEVC-versus-H.264 figure is settled at roughly half. The AV1-versus-H.264 figure is settled at roughly half or a little more. The number people actually argue about is AV1 versus HEVC, and the argument is real: published results for the same comparison range from a near-tie to over 40%.
Figure 4. The same comparison, many answers. Early subjective tests on immature AV1 encoders found a near-tie or a slight AV1 disadvantage; PSNR still shows near-parity; modern VMAF measurement on real content shows AV1 ahead by 20–30%; at high bitrate and 4K the gap widens further. The figure depends on the metric, the bitrate range, and the encoder generation.
Three variables explain the spread. The metric comes first: an early head-to-head found AV1 needing about 3% more bitrate than HEVC by subjective rating, an average 6.3% BD-rate gain for AV1 on rate-VMAF curves, and a 1.8% loss on rate-PSNR — three different answers from one set of encodes, because PSNR rewards pixel fidelity while VMAF rewards perceived quality. The bitrate range comes second: the same studies report the AV1 advantage over HEVC widening to 30–43% at the high end of the ladder, where AV1's tools have room to work. The encoder generation comes third and matters most over time: the near-ties date from 2018–2019 libaom builds, while a 2026 SVT-AV1 4.0 encoder is a different machine. A codec's ceiling is fixed by its standard, but the encoder is what you actually run, and encoders improve for years after the standard freezes.
Our −20% sits in the middle of that range, on VMAF, averaged across content. It is honest about being one point in a cloud, not the final word. When you read anyone's AV1-versus-HEVC number — including ours — ask which metric, which bitrate range, and which encoder build produced it before you trust it. The general skill of reading a quality report this way is reading a quality-metric report without fooling yourself, and the reasons metrics disagree are catalogued in where objective metrics lie.
The caveats that change the decision
A BD-rate is a clean number, and its cleanliness is a trap. Three caveats decide whether the headline saving survives contact with a real deployment.
The average is a number no viewer watches. BD-rate averages the saving across the whole shared quality ladder, but you do not ship the whole ladder to every viewer — you ship a fixed set of rungs, and viewers land on different ones depending on their connection. When the codec analysis cited above re-weighted its −59.83% AV1 BD-rate by a realistic audience, the saving fell to about 51% for a premium audience that mostly streams the top rung, and to about 23% for a mobile-heavy audience concentrated on the lower rungs (Ozer, 2026). The BD-rate is not wrong; it just answers a different question than "what will I save on my actual traffic." Estimate the second number against your own viewing distribution, not the headline.
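One simple way to estimate the traffic number is to weight each rung's saving by how much of your viewing actually lands on it. The sketch below does that with a hypothetical four-rung ladder (reusing the film-curve bitrates from earlier as same-quality rungs) and two made-up audience mixes. It illustrates the weighting only; it does not reproduce the cited analysis, whose method and inputs differ. Substitute your own rung bitrates and viewing shares.

```python
def traffic_weighted_saving(anchor_kbps, test_kbps, view_share):
    """Fraction of delivered bits saved, given the share of viewing time per rung.

    All three lists are per rung, in the same order; view_share should sum to 1.
    """
    anchor_bits = sum(w * a for w, a in zip(view_share, anchor_kbps))
    test_bits = sum(w * t for w, t in zip(view_share, test_kbps))
    return 1 - test_bits / anchor_bits


# Hypothetical ladder: same-quality rungs for H.264 and AV1, in kbps.
h264_rungs = [1500, 2800, 4800, 9000]
av1_rungs = [720, 1260, 2064, 3600]

# Hypothetical audiences: share of viewing time on each rung, low to high.
top_heavy = [0.05, 0.10, 0.25, 0.60]
mobile_heavy = [0.50, 0.30, 0.15, 0.05]

print(f"top-heavy:    {traffic_weighted_saving(h264_rungs, av1_rungs, top_heavy):.1%}")
print(f"mobile-heavy: {traffic_weighted_saving(h264_rungs, av1_rungs, mobile_heavy):.1%}")
```

With these made-up inputs the two audiences come out at 59.2% and 55.5%. The gap is small only because this hypothetical ladder keeps a large saving on every rung; if your low rungs save much less than your top rung, a mobile-heavy audience's figure falls much further.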
The newer codec costs more to encode. AV1's saving comes with a compute bill. Software AV1 encoding runs several times slower than H.265 for a comparable quality target, and the reference encoder libaom can be dramatically slower than that; the practical production encoder, SVT-AV1, narrows the gap (at preset 4–6 it approaches x265 speeds) but does not erase it. A common planning rule of thumb puts AV1 encoding cost at roughly 4× H.264 and HEVC at roughly 2× (Ozer, 2026). That premium is one-time per asset, so it pays back on heavily-viewed titles and may not on a clip viewed a few hundred times. Codec choice is an economic decision, not only a compression one. The implementation-and-speed side — which encoder for which job, with the encoding-time cost shown — is our encoder comparison.
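The payback argument reduces to a break-even count of views. The sketch below is a back-of-the-envelope model with entirely hypothetical prices. It assumes the H.264 rendition is kept for compatibility, so the whole AV1 encode (4× an H.264 encode under the rule of thumb above) is the extra cost, and that every AV1 view saves the BD-rate fraction of the bytes an H.264 view would have used. Real numbers depend on your encoding and CDN contracts.

```python
def break_even_views(extra_encode_cost, gb_per_view, saving, cdn_price_per_gb):
    """Views after which delivery savings repay a one-time extra encoding cost."""
    saved_per_view = gb_per_view * saving * cdn_price_per_gb
    return extra_encode_cost / saved_per_view


# All inputs are hypothetical.
h264_encode_cost = 2.00                  # dollars per title
av1_encode_cost = 4 * h264_encode_cost   # rule-of-thumb 4x multiplier
views = break_even_views(
    extra_encode_cost=av1_encode_cost,   # H.264 is still encoded for compatibility
    gb_per_view=1.5,                     # GB one view would use at H.264
    saving=0.554,                        # average AV1 vs H.264 BD-rate saving
    cdn_price_per_gb=0.02,
)
print(f"break-even after about {views:.0f} views")
```

At these made-up prices the AV1 encode pays for itself after roughly 480 views, so a clip viewed a few hundred times sits near the line while a heavily viewed title clears it easily.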
The metric is a proxy, and it has blind spots. Every objective score here is a prediction of human opinion, validated against subjective tests, and on some content it predicts badly. VMAF underrates film grain and overreacts to certain transitions; PSNR rewards fidelity the eye does not care about. When a metric and a careful viewing disagree, the viewing wins — that is the whole basis of subjective testing as the ground truth. We report a low percentile alongside the mean so the worst seconds stay visible, because a viewer remembers the worst moment, not the average; the reasoning is pooling per-frame scores into one number.
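To see why a low percentile belongs next to the mean, consider a hypothetical 100-frame clip that scores VMAF 94 on 95 frames and collapses to 60 on five. The sketch uses the nearest-rank definition of a percentile, which is one of several conventions.

```python
import math
import statistics


def percentile_nearest_rank(scores, p):
    """p-th percentile (0 < p <= 100) by the nearest-rank method."""
    ordered = sorted(scores)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


frame_vmaf = [94.0] * 95 + [60.0] * 5  # hypothetical per-frame scores

print(statistics.mean(frame_vmaf))             # 92.3
print(percentile_nearest_rank(frame_vmaf, 5))  # 60.0
```

The mean barely moves from 94 to 92.3, while the 5th percentile lands squarely on the five bad frames.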
A common mistake: the comparison that proves whatever you want
The classic bad codec comparison is not faked; it is unfair in a way the reader cannot see. Someone compares a slow, high-effort preset of their preferred codec against a fast preset of the competitor, scores with whichever metric flatters it, quotes the easy-content number as if it were the average, picks a bitrate range where the curves barely overlap, and reports one figure with no encoder version and no confidence interval. Every choice is defensible alone; together they manufacture a conclusion. The defense is to pin the effort level, name the metric and model, report the spread across content, compute BD-rate over the real overlap, and refuse to call a winner inside the noise. A comparison that cannot answer those questions is an opinion wearing a number's clothes.
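The defenses listed above can be turned into a pre-publication check. The sketch below is hypothetical throughout: the field names are ours, and the two thresholds (a minimum 50% overlap of the VMAF ranges and a ±2% noise band for calling a winner) are placeholder assumptions to replace with values justified by your own repeat-encode variance.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Hypothetical summary of one codec-versus-codec result."""
    effort_matched: bool  # presets chosen for comparable effort, versions pinned
    metric_model: str     # e.g. "vmaf_v0.6.1"; empty string if unstated
    content_types: int    # number of content categories behind the figure
    anchor_vmaf: tuple    # (min, max) VMAF reached by the anchor codec
    test_vmaf: tuple      # (min, max) VMAF reached by the test codec
    bd_rate: float        # e.g. -0.20 for a 20% saving


def red_flags(c, min_overlap=0.5, noise_band=0.02):
    """List the problems that make a comparison untrustworthy.

    min_overlap and noise_band are placeholder assumptions, not standards.
    """
    flags = []
    if not c.effort_matched:
        flags.append("encoder effort or versions not matched")
    if not c.metric_model:
        flags.append("metric model not named")
    if c.content_types < 2:
        flags.append("one content type quoted as if it were the average")
    shared = min(c.anchor_vmaf[1], c.test_vmaf[1]) - max(c.anchor_vmaf[0], c.test_vmaf[0])
    union = max(c.anchor_vmaf[1], c.test_vmaf[1]) - min(c.anchor_vmaf[0], c.test_vmaf[0])
    if shared <= 0 or shared / union < min_overlap:
        flags.append("quality ranges barely overlap")
    if abs(c.bd_rate) < noise_band:
        flags.append("difference inside the noise band: no winner")
    return flags


fair = Comparison(True, "vmaf_v0.6.1", 6, (78, 97), (80, 98), -0.20)
unfair = Comparison(False, "", 1, (60, 80), (75, 99), -0.01)
print(red_flags(fair))         # []
print(len(red_flags(unfair)))  # 5
```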
So which codec should you ship?
The data points to a layered answer rather than a single winner, because the right codec depends on reach, scale, and content. H.264 remains the universal floor: it decodes on essentially every device, so it stays in the ladder as the compatibility rung. HEVC saves roughly 44% over H.264 at equal quality and is widely supported on modern devices, though its licensing has slowed adoption on the open web. AV1 saves the most (about 55% over H.264 and 20% over HEVC on our mix), is royalty-free, and now decodes on a large and growing base of devices; it costs the most to encode and pays back fastest on high-volume titles and bandwidth-constrained audiences. Most production services run several codecs at once and serve the best one each device can decode. The codec-side device-support and adoption picture is covered in the Video Encoding section's state of AV1 in 2026; this article gives you the measured savings to weigh against it.
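Serving the best codec each device can decode is, at its core, a preference-ordered lookup. The sketch below assumes a hypothetical set of decoder capabilities reported by the client; how you actually detect them (for example, a browser's media-capability queries or a device database) is outside its scope.

```python
# Most efficient first; H.264 is the compatibility floor that is always present.
PREFERENCE = ["av1", "hevc", "h264"]


def pick_codec(device_decoders):
    """Return the most efficient codec in the ladder that the device can decode.

    device_decoders: hypothetical set of codec names the client reports.
    """
    for codec in PREFERENCE:
        if codec in device_decoders:
            return codec
    return "h264"  # fall back to the universal baseline


print(pick_codec({"h264", "hevc", "av1"}))  # av1
print(pick_codec({"h264", "hevc"}))         # hevc
print(pick_codec(set()))                    # h264
```

Keeping the order in one list makes the policy explicit: adding a codec to the ladder means adding one entry, and the H.264 fallback guarantees every device gets a playable stream.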
Where Fora Soft fits in
Fora Soft has built video streaming, OTT, conferencing, e-learning, telemedicine, and surveillance software since 2005, and codec choice shapes the bandwidth bill and the picture quality of every one of them. We measure codecs the way this article describes — at equal quality, on the content categories our products actually handle, with named metrics and disclosed encoders — because a streaming budget defended with "AV1 saves 50%" falls apart the moment someone asks "on what, measured how." The benchmark behind this article is our own, published with its method and its dataset so it can be checked and cited. When a codec decision has to hold up to a CDN budget or a product owner, this is the evidence that makes it stick.
What to read next
- Our Benchmark Methodology — exactly how these numbers were made, so you can trust them.
- Content Matters: How Quality Results Change by Content Type — why the per-content spread is the real story.
- Encoder Comparison: x264 vs x265 vs SVT-AV1 vs Hardware — the quality-versus-speed cost behind the codec choice.
Call to action
- Talk to a video engineer: book a 30-minute scoping call to talk through your AV1 vs HEVC vs H.264 plan.
- See our case studies: 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
References
- J.-R. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, T. Wiegand. "Comparison of the Coding Efficiency of Video Coding Standards—Including High Efficiency Video Coding (HEVC)." IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1669–1684, December 2012. Tier 1 (standard's verification study). Subjective tests found HEVC reaches equivalent quality to H.264/AVC at about 50% less bitrate on average for WVGA/HD. Basis for the HEVC-vs-H.264 headline. https://ieeexplore.ieee.org/document/6317156
- G. Bjøntegaard. "Calculation of Average PSNR Differences Between RD-Curves." ITU-T SG16 Q.6 VCEG, document VCEG-M33, Austin, TX, April 2001. Tier 1 (metric author's defining document). The BD-rate method and sign convention: average bitrate difference at equal quality. Basis for every BD-rate figure here. https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/VCEG-M33.doc
- G. Bjøntegaard. "Improvements of the BD-PSNR Model." ITU-T SG16 Q.6 VCEG, document VCEG-AL22, Berlin, July 2008. Tier 1 (metric author follow-up). Refines the method toward piecewise interpolation in the logarithmic bitrate domain. Basis for the log-domain integration in the tool. https://www.itu.int/wftp3/av-arch/video-site/0807_Ber/VCEG-AL22.doc
- C. Herglotz, H. Och, A. Meyer, et al. "The Bjøntegaard Bible — Why Your Way of Comparing Video Codecs May Be Wrong." IEEE, 2023; arXiv:2304.12852. Tier 5 (peer-reviewed). Cubic splines overshoot (use monotone PCHIP/Akima), VMAF/SSIM need a log transform, the quality ranges must overlap, and poor overlap can turn −43% into −49%. Basis for the BD-rate computation choices. https://arxiv.org/pdf/2304.12852
- Netflix Technology Blog. "AV1 — Now Powering 30% of Netflix Streaming." December 2025. Tier 4 (credible deployer engineering). AV1 streams ran at an average bitrate 48.1% lower than H.264 and ~0.9 VMAF higher than HEVC at one-third less bandwidth on real content; AV1 carries ~30% of streaming. Basis for the AV1-vs-H.264 real-content cross-check. https://netflixtechblog.com/av1-now-powering-30-of-netflix-streaming-02f592242d80
- J. Ozer. "Comparing H.264, HEVC, VP9, and AV1 in SBE: From BD-Rate to Contextual ROI." Streaming Learning Center, May 2026. Tier 6 (expert practitioner). 1080p60 clip at matched VMAF ~93: AV1 −59.83%, HEVC −31.48%, VP9 −35.08% BD-rate vs H.264; audience-weighted savings fall to 51% (top-heavy) and 23% (mobile); encoding-cost multipliers AV1 4×, HEVC 2×. Basis for the audience-weighting and encoding-cost caveats. https://streaminglearningcenter.com/articles/comparing-h-264-hevc-vp9-and-av1-in-sbe-from-bd-rate-to-contextual-roi.html
- Netflix. "VMAF Models" (models.md) and "VMAF — Confidence Interval" (conf_interval.md), VMAF Development Kit documentation. Accessed 2026-06-25. Tier 1 (metric author documentation). The default v0.6.1 model targets a 1080p living-room TV; each prediction can carry a 95% bootstrap confidence interval. Basis for the model and pooling specification. https://github.com/Netflix/vmaf/blob/master/resource/doc/models.md
- A. Mercat, et al. / EPFL-Bitmovin. "Comparison of Compression Efficiency between HEVC/H.265, VP9 and AV1 based on Subjective Quality Assessments." 2018; and academic rate-distortion analyses of HEVC/VVC/AV1. Tier 5 (peer-reviewed). Early AV1 vs HEVC near-parity (≈+3% subjective, +6.3% rate-VMAF gain, −1.8% rate-PSNR), widening to 30–43% at high bitrate. Basis for the "contested number" figure. https://www.researchgate.net/publication/327638411
- R. Khan, et al. "Performance Comparison of VVC, AV1, HEVC, and AVC for High Resolutions." Electronics 13(5):953, MDPI, 2024. Tier 5 (peer-reviewed). At 8K, average bitrate savings vs H.264 of about 63% (AV1) and 53% (HEVC). Basis for the resolution-dependence note. https://www.mdpi.com/2079-9292/13/5/953
- Alliance for Open Media / SVT-AV1 project. "SVT-AV1 4.0.0 release notes and encoder guidelines." Accessed 2026-06-25. Tier 3 (first-party tooling). SVT-AV1 4.0.0 released 2026-01-23; at preset 4–6 it approaches x265 speeds while retaining AV1's compression. Basis for the encoder versions and speed caveat. https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/CHANGELOG.md
- Recommendation ITU-R BT.500-15 (2023). "Methodologies for the subjective assessment of the quality of television pictures." ITU, 2023. Tier 1 (official standard). The subjective ground truth every objective metric is validated against; when metric and viewing disagree, the viewing wins. Basis for the measurement-honest framing. https://www.itu.int/rec/R-REC-BT.500


