Why this matters
Every "AV1 is 50% better" headline is a BD-rate, and most of them are unreadable because nobody says what was held equal, what metric was used, or over what range the average was taken. This article is for the streaming engineer, codec evaluator, or technical product owner who has to read those numbers — or produce one — and wants to know exactly what the percentage means and where it can mislead. BD-rate decides codec budgets and CDN bills, so a wrong or misread number sends a team optimizing for the wrong thing. We explain the metric from first principles, show the arithmetic out loud, and ground every step in our own published benchmark data so you can reproduce each figure. By the end you should be able to compute a BD-rate, read someone else's, and spot the five ways it is routinely misreported.
The one-sentence definition
Start with the plain-language version, then we will earn every word of it. The number that tells you how much less bitrate one codec needs to match another codec's quality, averaged across a range of quality, is called BD-rate — short for Bjøntegaard Delta rate, after Gisle Bjøntegaard, who defined it for the video-standards community in 2001 (Bjøntegaard, VCEG-M33, 2001).
Three parts of that sentence carry all the weight. "Less bitrate" makes BD-rate a measure of efficiency, not quality. "To match another codec's quality" means everything is compared at equal quality, never at equal bitrate. And "averaged across a range" means a BD-rate is never a reading at a single bitrate — it is the average gap between two curves.
The sign convention trips up newcomers, so fix it now: a negative BD-rate is good for the test codec. −44% means the test codec needs 44% less bitrate than the anchor to reach the same quality. A positive BD-rate means it needs more. Throughout this article the anchor is H.264, and HEVC and AV1 are the test codecs, so their BD-rates are negative.
One more guardrail before the math. BD-rate is a bitrate saving at equal quality — it is not itself a quality metric like VMAF or PSNR, even though it is computed from one. Keep "how much do I save" and "how good does it look" in separate boxes; the most common BD-rate error is letting them merge.
A picture first: two curves and the gap between them
To measure a codec you encode the same source at several bitrates and score each result with a quality metric, then plot quality against bitrate. The resulting line — quality rising as you spend more bits — is called a rate-quality curve (or rate-distortion curve). The full geometry of these curves is the subject of the convex hull; here we only need two of them, one per codec.
The quality axis uses VMAF (Video Multimethod Assessment Fusion), the 0–100 perceptual metric Netflix trained on human opinion scores. A VMAF number is meaningless without its model, so for the record every score here uses the default v0.6.1 model, mean-pooled, with each encode decoded and upscaled to the source resolution before scoring. How to read VMAF properly (the model, the pooling, the confidence interval) is covered in VMAF explained.
Figure 1. BD-rate is the average horizontal distance between two rate-quality curves, measured at equal quality and averaged in the log-bitrate domain over the range both codecs reach.
Now put both codecs on one chart, as in Figure 1. Pick any quality level — say VMAF 93 — and draw a horizontal line. It crosses the anchor curve at one bitrate and the test curve at a lower one. That horizontal distance is the saving at that quality. BD-rate is simply that horizontal gap averaged over every quality level both codecs reach. The gap is horizontal because we read it at fixed quality; the saving is a difference in bitrate, not in quality.
The cleanest possible example: half the bitrate
Before any real data, walk through the simplest case so the arithmetic is transparent. Suppose the test codec hits the same VMAF as the anchor at exactly half the bitrate, at every quality on the ladder. Intuitively the saving is 50%. Watch how the formula gets there.
BD-rate does not average bitrates directly; it averages the logarithm of bitrate. The reason is that bitrate spans orders of magnitude — a 14,000 kbps top rung would otherwise drown out a 700 kbps bottom rung — and the logarithm turns "half the bitrate" into the same distance everywhere on the scale (Bjøntegaard, VCEG-M33, 2001). So the tool computes the average difference in log-bitrate, then converts it back to a percentage.
At every matched quality, the bitrate ratio is 0.5 (test ÷ anchor).
Average log-bitrate difference = log10(0.5) = −0.30103
Convert back to a percentage = 10^(−0.30103) − 1 = 0.5 − 1 = −0.50
BD-rate = −50.0%
The result is exactly −50%, matching intuition. The lesson is structural: BD-rate is a ratio of bitrates expressed as a percentage, computed in the log domain and converted back. The same machine handles uneven, realistic curves — it just averages a changing gap instead of a constant one.
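The arithmetic above can be checked in a few lines. This is a minimal sketch, not the full method: it averages the log gap over a handful of hypothetical ladder rungs instead of integrating a fitted curve, and the bitrates are made-up values chosen only so the ratio is 0.5 everywhere.

```python
import math

# Hypothetical matched-quality bitrates in kbps: the test codec needs
# exactly half the anchor's bitrate on every rung.
anchor_kbps = [700, 1400, 2800, 5600, 14000]
test_kbps = [b / 2 for b in anchor_kbps]

# Average the gap in the log domain, then convert back to a percentage.
log_gaps = [math.log10(t / a) for t, a in zip(test_kbps, anchor_kbps)]
avg_log_gap = sum(log_gaps) / len(log_gaps)
bd_rate = 10 ** avg_log_gap - 1

print(f"average log10 gap: {avg_log_gap:.5f}")  # -0.30103
print(f"BD-rate: {bd_rate:.1%}")                # -50.0%

# With an uneven gap the same steps yield the geometric mean of the ratios,
# not the arithmetic mean.
uneven_ratios = [0.4, 0.5, 0.6]
uneven = 10 ** (sum(math.log10(r) for r in uneven_ratios) / len(uneven_ratios)) - 1
print(f"uneven ratios: {uneven:.1%}")
```

Because the average is taken on logarithms, a 14,000 kbps rung and a 700 kbps rung contribute the same log gap when their ratios match, which is the point of working in the log domain.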
How BD-rate is actually computed
Real curves are not parallel, so the gap changes along the ladder and we need a repeatable recipe. The modern method, used by the Alliance for Open Media and the standards committees, has five steps (AOM/IETF Daede et al., 2019; Bjøntegaard, VCEG-M33, 2001).
Figure 2. The five-step BD-rate recipe. Each step removes a way the average could be distorted: too few points, the high-bitrate end dominating, a wiggly fit, or a non-overlapping quality range.
Step 1 — measure at least four rate-quality points per codec. Bjøntegaard's original method fit a cubic polynomial through four points (VCEG-M33, 2001). Fewer than four gives only a rough estimate; more points shrink the error, and for tricky metrics like VMAF the error roughly halves when you add points (Herglotz et al., 2023).
Step 2 — take the logarithm of bitrate. This is the log-domain step from the worked example, so the expensive top rungs do not dominate the average.
Step 3 — fit a shape-preserving curve through each codec's points. The original single cubic can overshoot — wiggle above or below the real data between points — and invent quality that was never measured. The current practice fits a monotone piecewise-cubic interpolant (PCHIP) that cannot overshoot, or the closely related Akima spline (AOM/IETF, 2019; Herglotz et al., 2023). Our toolkit uses monotone PCHIP.
Step 4 — integrate the gap, but only over the overlap. Two codecs rarely cover the exact same quality range. BD-rate is only defined where both curves exist — the overlapping quality range — so the method integrates the horizontal gap across that overlap and ignores quality levels only one codec reaches (AOM/IETF, 2019). Integrate over a range one codec never reached and the number is fiction.
Step 5 — convert the average log-gap back to a percentage, exactly as in the −50% example: 10^(avg log-gap) − 1.
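The five steps can be sketched in plain Python. This is an illustration under stated assumptions, not our toolkit: piecewise-linear interpolation stands in for the monotone PCHIP fit in step 3 (it also cannot overshoot, but it is less smooth), the function names `interp` and `bd_rate` are hypothetical, and the sample curves are made up. A production version would more likely use a PCHIP routine such as SciPy's `PchipInterpolator`.

```python
import math

def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the measured range")

def bd_rate(anchor, test, samples=1000):
    """BD-rate of test vs anchor; each is a list of (kbps, quality) points."""
    if min(len(anchor), len(test)) < 4:                  # step 1
        raise ValueError("need at least four points per codec")
    curves = []
    for points in (anchor, test):
        pts = sorted(points, key=lambda p: p[1])         # order by quality
        quality = [q for _, q in pts]
        log_rate = [math.log10(r) for r, _ in pts]       # step 2
        curves.append((quality, log_rate))               # step 3 (linear stand-in)
    (qa, ra), (qt, rt) = curves
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])      # step 4: overlap only
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    width = hi - lo
    grid = [min(lo + width * i / samples, hi) for i in range(samples + 1)]
    gaps = [interp(q, qt, rt) - interp(q, qa, ra) for q in grid]
    # Trapezoidal rule on a uniform grid, divided by the width, is the mean gap.
    avg_log_gap = (sum(gaps) - (gaps[0] + gaps[-1]) / 2) / samples
    return 10 ** avg_log_gap - 1                         # step 5

# Hypothetical curves: the test codec reaches each VMAF at half the bitrate.
anchor = [(1000, 80), (2000, 86), (4000, 92), (8000, 97)]
test = [(500, 80), (1000, 86), (2000, 92), (4000, 97)]
print(f"{bd_rate(anchor, test):.1%}")  # -50.0%
```

The sketch assumes quality rises strictly with bitrate on each curve; repeated quality values would need to be merged before the fit. Swapping the arguments reads the same gap from the other side, so the half-bitrate case becomes +100%.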
One refinement matters for perceptual metrics. VMAF and SSIM saturate near the top of their scale: the difference between 97 and 99 represents far more bits than the difference between 80 and 82, but the raw numbers look equally spaced. Fitting a curve to raw VMAF therefore distorts the high end. The fix recommended by the research is to apply a log transform to the quality values — −10·log10(1 − VMAF/100) — before the fit, de-saturating the scale (Herglotz et al., 2023). Every "our numbers" figure below uses this transform; it is what makes our table reproduce to the decimal.
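A quick way to see what the transform does is to apply it to the two intervals mentioned above. The helper name `desaturate` is hypothetical; the formula is the one quoted in the paragraph.

```python
import math

def desaturate(vmaf):
    """Map a VMAF score in [0, 100) onto a dB-like scale.

    A score of exactly 100 has no finite image and raises ValueError.
    """
    return -10 * math.log10(1 - vmaf / 100)

for low, high in [(80, 82), (97, 99)]:
    step = desaturate(high) - desaturate(low)
    print(f"VMAF {low} -> {high}: {step:.2f} transformed units")
# VMAF 80 -> 82: 0.46 transformed units
# VMAF 97 -> 99: 4.77 transformed units
```

Two raw VMAF points near the ceiling stretch to roughly ten times the distance of two points at 80, so the fit no longer treats the expensive top of the ladder as if it were as cheap as the middle.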
BD-rate, BD-PSNR, and reading the gap the other way
The same two curves can be read two ways, and the names are worth keeping straight. Read the gap horizontally — bitrate difference at equal quality — and you get BD-rate, the bitrate saving. Read it vertically — quality difference at equal bitrate — and you get a BD-quality number: BD-PSNR if the metric is PSNR, BD-VMAF if it is VMAF. Bjøntegaard's 2001 note defined both (the paper's title is literally about average PSNR differences).
| | BD-rate | BD-PSNR / BD-quality |
|---|---|---|
| What is held equal | Quality | Bitrate |
| What is measured | Bitrate difference (%) | Quality difference (dB or VMAF points) |
| Reads the gap | Horizontally | Vertically |
| Good result for test codec | Negative (less bitrate) | Positive (more quality) |
| Best used for | Bandwidth and storage budgets | Quick "how much better does it look" check |
Most streaming teams report BD-rate because bitrate is the budget — it maps directly to CDN cost and to what the viewer's connection must carry. BD-PSNR answers a different question and is harder to act on, because a "2 dB better" gain means different things at different bitrates. When a comparison quotes only a quality gain, ask for the bitrate saving; it is the number that touches the bill.
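To make the vertical reading concrete, here is a minimal sketch of a BD-quality number: the quality gap at equal bitrate, averaged over the log-bitrate range both codecs share. It is an assumption-laden illustration: linear interpolation stands in for a proper curve fit, the names `interp` and `bd_quality` are hypothetical, and the points are made up.

```python
import math

def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the measured range")

def bd_quality(anchor, test, samples=1000):
    """Mean quality gap (test minus anchor) at equal bitrate."""
    curves = []
    for points in (anchor, test):
        pts = sorted(points)                             # order by bitrate
        curves.append(([math.log10(r) for r, _ in pts], [q for _, q in pts]))
    (ra, qa), (rt, qt) = curves
    lo, hi = max(ra[0], rt[0]), min(ra[-1], rt[-1])      # shared bitrate range
    if hi <= lo:
        raise ValueError("bitrate ranges do not overlap")
    width = hi - lo
    grid = [min(lo + width * i / samples, hi) for i in range(samples + 1)]
    gaps = [interp(r, rt, qt) - interp(r, ra, qa) for r in grid]
    # Trapezoidal mean of the vertical gap on a uniform log-bitrate grid.
    return (sum(gaps) - (gaps[0] + gaps[-1]) / 2) / samples

# Hypothetical (kbps, VMAF) points: the test codec scores 3 points higher
# at every bitrate.
anchor = [(1000, 70), (2000, 80), (4000, 88), (8000, 94)]
test = [(1000, 73), (2000, 83), (4000, 91), (8000, 97)]
print(f"BD-VMAF: {bd_quality(anchor, test):+.1f}")  # +3.0
```

The sign flips relative to BD-rate: here a positive result favors the test codec, as in the table above.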
Our numbers
Now the concrete part. We measured rate-quality curves for H.264 (x264), HEVC (x265 4.2), and AV1 (SVT-AV1 4.0.0) on six content categories our products actually ship — animation, screen/slides, film, conferencing, sports, and user-generated video — and computed BD-rate with the method above. The full method, including encoder settings and provenance, is documented in our benchmark methodology, and the headline codec story is codec comparison on real content.
Figure 3. AV1 vs H.264 BD-rate by content type. The single −55% average hides a 16-point spread: animation and screen content compress best, grain and fast motion worst.
Here is the worked reading for the film curve, so you can trace one number end to end. At VMAF 93, our H.264 encode needed about 4,800 kbps; the AV1 encode reached the same VMAF at about 2,064 kbps. The ratio is 2,064 ÷ 4,800 = 0.43, a 57% saving at that single quality. BD-rate generalizes that reading across the whole overlapping VMAF range (80 to 97), in the log domain, with the saturation transform applied. Run the computation and the film result is −56.5% for AV1 vs H.264, −45.5% for HEVC vs H.264, and −20.2% for AV1 vs HEVC. The downloadable calculator reproduces every one of these exactly.
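The single-quality reading is simple enough to verify directly. The two bitrates are the ones quoted above for VMAF 93; the variable names are only for illustration.

```python
import math

anchor_kbps = 4800  # H.264 at VMAF 93
test_kbps = 2064    # AV1 at VMAF 93

ratio = test_kbps / anchor_kbps
saving = 1 - ratio
log_gap = math.log10(ratio)  # what this one point contributes in the log domain

print(f"ratio {ratio:.2f}, saving {saving:.0%}")  # ratio 0.43, saving 57%
print(f"log10 gap {log_gap:.4f}")
```

This is one point on the curve. The BD-rate for film differs slightly from 57% because it averages the log gap over the whole overlapping range, not just VMAF 93.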
| Content type | HEVC vs H.264 | AV1 vs H.264 | AV1 vs HEVC |
|---|---|---|---|
| Animation | −52.3% | −63.6% | −23.6% |
| Screen / slides | −50.5% | −61.6% | −22.3% |
| Film / live action | −45.5% | −56.5% | −20.2% |
| Conferencing | −45.2% | −54.5% | −17.1% |
| Sports (high motion) | −36.5% | −48.5% | −18.9% |
| UGC | −35.2% | −47.5% | −19.1% |
| Average | −44.2% | −55.4% | −20.2% |
Table 1. BD-rate by content type (VMAF default v0.6.1 model, log-domain, overlap-only). The averages are the headline numbers; the spread is the real story.
Two things to take from the table. First, the headline averages — HEVC −44.2%, AV1 −55.4%, AV1 over HEVC −20.2% — sit inside the published record: HEVC was designed to roughly halve H.264's bitrate (Ohm et al., IEEE TCSVT, 2012), and Netflix reported AV1 streams running about 48% below H.264 on real content (Netflix, 2025). Second, the average is an average: the AV1-over-H.264 saving swings from −47.5% on noisy user video to −63.6% on animation, because flat regions and clean edges compress far better than grain and fast motion. Why the numbers move by content is the subject of content matters.
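The Average row and the spread can be recomputed from the six content rows of Table 1. This sketch assumes the Average row is a plain arithmetic mean of the per-content percentages, which is what the published figures are consistent with.

```python
# Per-content BD-rates from Table 1, in percent:
# (HEVC vs H.264, AV1 vs H.264, AV1 vs HEVC)
table = {
    "Animation": (-52.3, -63.6, -23.6),
    "Screen / slides": (-50.5, -61.6, -22.3),
    "Film / live action": (-45.5, -56.5, -20.2),
    "Conferencing": (-45.2, -54.5, -17.1),
    "Sports (high motion)": (-36.5, -48.5, -18.9),
    "UGC": (-35.2, -47.5, -19.1),
}
columns = ("HEVC vs H.264", "AV1 vs H.264", "AV1 vs HEVC")

summary = {}
for i, name in enumerate(columns):
    values = [row[i] for row in table.values()]
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    summary[name] = (mean, spread)
    print(f"{name}: mean {mean:.1f}%, spread {spread:.1f} points")
# HEVC vs H.264: mean -44.2%, spread 17.1 points
# AV1 vs H.264: mean -55.4%, spread 16.1 points
# AV1 vs HEVC: mean -20.2%, spread 6.5 points
```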
A word on what these figures are. They are an illustrative house dataset (v0.9): the per-content BD-rates were chosen to sit within the literature so that the arithmetic here is fully reproducible, and a measured production run on client content will replace them on first publication. They are a transparent, checkable stand-in, not a marketing number and not invented out of thin air.
Where Fora Soft fits in
Fora Soft has built video streaming, OTT, conferencing, e-learning, telemedicine, and surveillance software since 2005, and a codec decision changes the bandwidth bill of every one of them. We report savings as BD-rate, at equal quality, on the content categories our products actually handle, with the metric model and encoder versions disclosed — because "AV1 saves 50%" collapses the moment someone asks "on what, measured how." The benchmark behind these numbers is published with its method and dataset so it can be checked and cited, which is the only kind of number worth defending to a CDN budget.
Five ways BD-rate is misreported
A BD-rate is a clean number, and its cleanliness is a trap. Each of the five errors below is individually defensible; together they manufacture whatever conclusion the author wanted.
Figure 4. The five ways a BD-rate misleads, and the fix for each. Every fix is part of an honest benchmark method.
1. No metric or model named. "AV1 is 30% better" with no metric is unreadable: −20% on VMAF, a near-tie on PSNR, and a different figure on a subjective panel can all come from one set of encodes. Fix: state the metric and model every time (we use VMAF default v0.6.1).
2. Raw VMAF, not log-transformed. Because VMAF saturates near 100, fitting the raw scale overweights the top of the ladder. On our film curve the AV1-vs-H.264 figure shifts from −55.4% (raw) to −56.5% (log-transformed) — small here, larger on curves that crowd the ceiling. Fix: log-transform saturating metrics before the fit (Herglotz et al., 2023).
3. A thin quality overlap. If the two curves barely share a quality range, the average is taken over a sliver and is unstable. The Bjøntegaard Bible reports a case where poor overlap turned a −43% result into −49% (Herglotz et al., 2023). Fix: add operating points so both codecs span a common range, and report the overlap.
4. One content type quoted as the average. Quote the −63.6% animation number as "AV1's saving" and you have overstated the catalogue average (−55.4%) by eight points. Fix: report the spread across content and a catalogue-weighted average for your own mix.
5. A number reported inside the method's noise. BD-rate has its own error — up to several percentage points for VMAF and cross-codec comparisons (Herglotz et al., 2023). A −20.2% AV1-over-HEVC result and a −22% result are the same answer. Fix: do not crown a winner on a difference smaller than the method's uncertainty.
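Fixes 4 and 5 lend themselves to a short sketch. The traffic mix below is entirely hypothetical (shares of bits delivered with the anchor codec, summing to 1), and the 5-point uncertainty is an assumption taken from the upper bound quoted for VMAF comparisons; substitute your own catalogue and your own error estimate.

```python
# AV1 vs H.264 BD-rates from Table 1, in percent.
av1_vs_h264 = {
    "Animation": -63.6,
    "Screen / slides": -61.6,
    "Film / live action": -56.5,
    "Conferencing": -54.5,
    "Sports (high motion)": -48.5,
    "UGC": -47.5,
}

# Hypothetical catalogue mix: share of anchor-codec traffic per content type.
mix = {
    "Animation": 0.05,
    "Screen / slides": 0.05,
    "Film / live action": 0.30,
    "Conferencing": 0.10,
    "Sports (high motion)": 0.35,
    "UGC": 0.15,
}

weighted = sum(av1_vs_h264[name] * share for name, share in mix.items())
print(f"catalogue-weighted BD-rate: {weighted:.1f}%")

def distinguishable(a, b, uncertainty_pp):
    """True only if two BD-rates differ by more than the method's noise."""
    return abs(a - b) > uncertainty_pp

# Assumed uncertainty of 5 percentage points for a VMAF comparison.
print(distinguishable(-20.2, -22.0, 5.0))  # False
```

A sports-heavy mix like this one pulls the weighted figure below the unweighted −55.4% average, which is why the headline number should never be applied to a catalogue without checking what the catalogue contains.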
The common thread is a familiar one in quality work: a single number is a summary, never the full picture. The general skill of reading a metric report without being fooled is covered in reading a quality-metric report, and the broader catalogue of where metrics mislead is in where objective metrics lie.
What to read next
- Codec comparison on real content: H.264 vs HEVC vs AV1 — where these BD-rates come from.
- Content matters: how quality results change by content type — why the average hides the spread.
- Our benchmark methodology — the rules every Block 7 number obeys.
Call to action
- Talk to a video engineer: book a 30-minute scoping call to talk through your BD-rate plan.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
References
- G. Bjøntegaard. "Calculation of Average PSNR Differences Between RD-Curves." ITU-T SG16 Q.6 VCEG, document VCEG-M33, Austin, TX, April 2001. Tier 1 (metric author's defining document). The BD-rate/BD-PSNR method, the sign convention, the cubic fit through four points, and the log-bitrate axis. Basis for the definition and every BD figure. https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/VCEG-M33.doc
- G. Bjøntegaard. "Improvements of the BD-PSNR model." ITU-T SG16 Q.6 VCEG, document VCEG-AI11 / VCEG-AL22, Berlin, July 2008. Tier 1 (metric author). The refinement of the original method (piecewise fitting and integration improvements) that later implementations build on. https://www.itu.int/wftp3/av-arch/video-site/0807_Ber/VCEG-AI11.zip
- T. Daede, A. Norkin, I. Brailovskiy. "Video Codec Testing and Quality Measurement." IETF Internet-Draft draft-ietf-netvc-testing-09, 2019 (the AOM/NETVC common-testing method). Tier 1 (first-party tooling / common test conditions). The operational BD-rate recipe: ≥4 points, log-rate, monotone PCHIP fit, overlap-only integration with ≥1000 trapezoidal samples, percent difference reference→test; valid for PSNR, SSIM, and VMAF. Basis for the five-step recipe. https://datatracker.ietf.org/doc/html/draft-ietf-netvc-testing
- C. Herglotz, M. Kränzler, A. Kaup, et al. "The Bjøntegaard Bible — Why Your Way of Comparing Video Codecs May Be Wrong." IEEE Transactions on Image Processing, 2023; arXiv:2304.12852. Tier 5 (peer-reviewed). Single cubics overshoot (use monotone PCHIP or Akima); log-transform saturating metrics (SSIM/VMAF); quality ranges must overlap; BD error is below 0.5 pp for PSNR but up to ~5 pp for VMAF/cross-codec, halved by more points; the −43% vs −49% poor-overlap example. Basis for the computation choices and pitfalls. https://arxiv.org/abs/2304.12852
- Netflix / VMAF project. "VMAF - Video Multimethod Assessment Fusion" (model documentation and models.md). Netflix Technology, accessed 2026. Tier 1 (metric author's implementation). The default v0.6.1 model, scale 0–100, training on subjective scores, and the saturation behavior that motivates the log transform. https://github.com/Netflix/vmaf
- J.-R. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, T. Wiegand. "Comparison of the Coding Efficiency of Video Coding Standards — Including High Efficiency Video Coding (HEVC)." IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 2012. Tier 5 (peer-reviewed). HEVC's ~50% bitrate reduction at equal subjective quality vs H.264: the cross-check for our −44.2% HEVC average. https://ieeexplore.ieee.org/document/6316136
- J. Ozer. "Comparing H.264, HEVC, VP9, and AV1: From BD-Rate to Contextual ROI." Streaming Learning Center, 2026. Tier 6 (expert practitioner). AV1 −59.83% BD-rate vs H.264 on a 1080p60 clip at matched VMAF; audience-weighted savings fall to ~51%/~23%; the average-no-viewer-watches caveat. Cross-check for the AV1 figure and the per-audience caveat. https://streaminglearningcenter.com/
- Alliance for Open Media. "AV1 Common Test Conditions" (AOM CTC); and JVET, "Common Test Conditions and Evaluation Procedures," JVET-J1010, 2018. Tier 1 (common test conditions). The published-recipe discipline BD-rate comparisons rely on for reproducibility. https://aomedia.org/
- tbr/bjontegaard_etro and the bjontegaard PyPI package: reference implementations of BD-rate (VCEG-M33-compliant at four points; PCHIP/Akima/cubic options). Tier 3 (tooling). Independent implementations to cross-check our calculator. https://github.com/tbr/bjontegaard_etro
- ITU-R BT.500-15 (2023), "Methodologies for the subjective assessment of the quality of television images." Tier 1 (standard). The reminder that a careful subjective test is ground truth; BD-rate built on an objective metric inherits that metric's blind spots. https://www.itu.int/rec/R-REC-BT.500
Where lower-tier sources disagreed with the standard, we followed the standard: the per-content percentages from practitioner blogs (Ozer; Netflix posts) are cross-checks, not the method — the method follows Bjøntegaard, the AOM/IETF testing draft, and the Bjøntegaard Bible's corrections.


