Why this matters
If you have ever argued about whether a new encoder is "better", shipped an encode that looked fine on your monitor but stuttered for users, or been asked to justify the cost of a quality program, this article is the one you forward to the person holding the budget. It is written for the streaming or encoding lead, the QA engineer, and the technical product owner who already sense that measuring quality is the right thing to do but need to express the value in money, not in decibels. The job it does is simple: connect each measurement activity to a cash outcome (lower content delivery network cost, protected engagement revenue, faster and more accountable decisions) so the program funds itself. It is the practical follow-on to the article on what video quality measurement is, and it sets up the production-side articles on quality gates and streaming QoE.
"It looks fine to me" is not a measurement
Most teams that ship video judge quality the same way: someone senior watches a clip on a good monitor and signs off. That works right up until it doesn't. The reviewer's screen is not the viewer's phone on a train, one reviewer cannot watch the whole catalog, and "looks fine" cannot be compared, logged, or put in a contract. An opinion is a single sample with no error bar and no audit trail.
A measurement is different in three ways that all turn into money. It is repeatable, so two encoders or two settings can be compared on the exact same footing. It is scalable, so a script can check ten thousand assets while your reviewer sleeps. And it is defensible, so when someone disputes the result you can point to a method, a number, and a confidence interval instead of a memory. Those three properties — repeatable, scalable, defensible — are the entire reason a number beats a vote, and each one underwrites one of the business cases below.
Figure 1. The three properties that make a number worth money — repeatable, scalable, defensible — are exactly what an opinion lacks.
Figure 2. One defensible quality number feeds three places where the money is: the delivery bill, viewer engagement, and the speed and accountability of your decisions.
Case 1: lower bitrate at the same quality cuts the delivery bill
This is the case that pays for the whole program, so it goes first. The core idea: the bits you send cost money to deliver, and measurement is how you safely send fewer of them.
Start with the metric that makes this possible. VMAF — Video Multimethod Assessment Fusion, Netflix's perceptual quality metric on a 0-to-100 scale, trained to predict human opinion — lets you ask a precise question: can I lower the bitrate of this encode and keep the same delivered quality? Without a metric, that question has no safe answer, so teams over-send bits "to be safe." With a metric, you find the lowest bitrate that still hits your quality target and stop there. Netflix's own per-title optimization, where each title gets a bitrate ladder tuned to its content instead of one fixed ladder, reported roughly a 20% average bitrate reduction at the same quality, and its later per-shot "Dynamic Optimizer" work reported VMAF-matched reductions of 28% for x264, 34% for x265, and 38% for libvpx-VP9. Those are measurement results — the savings exist because the quality was measured and shown to hold.
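The "lowest bitrate that still hits your quality target" step can be sketched in a few lines. This is a minimal illustration, not Netflix's method: the function name and the measurement values are hypothetical, and it assumes you have already measured VMAF for a handful of candidate bitrates of the same rendition with the same model and pooling.

```python
def lowest_bitrate_meeting_target(measurements, target_vmaf):
    """Return the (kbps, vmaf) pair with the lowest bitrate whose score
    meets the target, or None if no candidate reaches it.

    measurements: iterable of (bitrate_kbps, pooled_vmaf) pairs.
    """
    passing = [(kbps, score) for kbps, score in measurements if score >= target_vmaf]
    # Tuples compare by their first element, so min() picks the lowest bitrate.
    return min(passing, default=None)


# Hypothetical measurements for one 1080p rendition (illustrative values only).
candidates = [(3000, 90.1), (3500, 91.8), (4000, 93.0), (4500, 93.9), (5000, 94.6)]
print(lowest_bitrate_meeting_target(candidates, target_vmaf=93.0))  # (4000, 93.0)
```

Returning `None` when nothing passes is deliberate: a missing answer should stop the pipeline instead of silently falling back to a lower-quality encode.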
Here is the arithmetic, shown out loud, because this is where the budget conversation is won.
Suppose your 1080p rendition currently ships at 5.0 Mbps to reach a target of VMAF 93 (default model, harmonic-mean pooled across frames). You run a measurement program, adopt per-title optimization, and your own measurements confirm the same VMAF 93 now holds at 4.0 Mbps, a 20% reduction in line with the per-title figure above. Convert the bitrate into delivered data per viewing-hour:
Data per hour = bitrate (Mbit/s) × 3,600 s ÷ 8 bits/byte
Before: 5.0 × 3,600 ÷ 8 = 2,250 MB = 2.25 GB per viewing-hour
After: 4.0 × 3,600 ÷ 8 = 1,800 MB = 1.80 GB per viewing-hour
Saving: 0.45 GB per viewing-hour (exactly the 20%)
Now multiply by volume and a delivery price. Say this rendition serves 10 million viewing-hours per month, and your content delivery network bills egress at $0.02 per GB (commodity video CDN rates in 2026 run roughly $0.01–0.04/GB at volume):
Bytes saved = 10,000,000 hours × 0.45 GB = 4,500,000 GB per month
Money saved = 4,500,000 GB × $0.02/GB = $90,000 per month
Annualised = $90,000 × 12 = $1,080,000, or $1.08 million per year
Just over a million dollars a year, from a measurement that told you the quality did not move. Halve the CDN rate to $0.01/GB and it is still $540,000 a year. The percentage is what your measurement program earns; the dollar figure just scales with how much video you push. And the saving compounds with every other benefit: a 4.0 Mbps stream also starts faster and rebuffers less than a 5.0 Mbps one on a constrained connection, which feeds directly into Case 2.
Figure 3. The same quality at a lower bitrate is money. The 20% reduction flows through data-per-hour, monthly volume, and CDN price to about $1.08M a year.
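The arithmetic above can be reproduced with a short script so you can substitute your own figures. This is a minimal sketch: the function names are hypothetical, it uses decimal units (1 GB = 1,000 MB) as the worked example does, and it assumes the reduction has already been verified at equal quality.

```python
def gb_per_viewing_hour(bitrate_mbps):
    """Delivered data per hour: Mbit/s x 3,600 s / 8 bits per byte, then MB -> GB."""
    return bitrate_mbps * 3600 / 8 / 1000


def monthly_saving_usd(baseline_mbps, reduction, viewing_hours, usd_per_gb):
    """Monthly CDN saving for a fractional bitrate reduction (0.20 means 20%)."""
    gb_saved_per_hour = gb_per_viewing_hour(baseline_mbps) * reduction
    return gb_saved_per_hour * viewing_hours * usd_per_gb


# The worked example: 5.0 Mbps baseline, 20% reduction,
# 10 million viewing-hours per month, $0.02 per GB.
monthly = monthly_saving_usd(5.0, 0.20, 10_000_000, 0.02)
print(f"${monthly:,.0f} per month")      # $90,000 per month
print(f"${monthly * 12:,.0f} per year")  # $1,080,000 per year
```

Changing `usd_per_gb` to 0.01 reproduces the $540,000-a-year figure, which shows that the result is linear in both price and volume.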
Common mistake: quoting a saving you never verified. "We cut bitrate 20%" is only good news if you measured that the quality held — at a named metric, model, and pooling method, ideally confirmed against a few human eyes. Cut the bits without measuring and you may be shipping a quality regression that customers feel and you cannot see on your monitor. The bitrate saving and the quality check are one move, not two.
Case 2: a quality gate catches a regression before your users do
The second case is insurance, and the math runs the other way: instead of money you save, it is money you avoid losing. A quality gate is an automatic check in your encoding pipeline that refuses to publish an encode whose measured quality falls below a threshold — the video equivalent of a unit test that fails the build. Its value comes from a brutal asymmetry: a bad encode caught before release costs one re-run, while the same bad encode shipped to your whole audience costs engagement across every viewer who hits it.
That cost is not hypothetical, and it has been measured on a very large scale. A study of 23 million video views from 6.7 million viewers on Akamai's network (Krishnan and Sitaraman, 2012) established a causal link between stream quality and viewer behavior, and the numbers are sobering. Viewers begin to abandon a video once startup takes longer than two seconds, and each additional second of startup delay raised the abandonment rate by 5.8%. A viewer who hit rebuffering equal to just 1% of the video's duration watched 5% less of the video than a comparable viewer with no stalls. And a viewer who experienced a playback failure was 2.32% less likely to return to the same site within a week. More recent industry analysis tells the same story from the revenue side: Conviva's 2025 report, drawn from 223 million sessions, found that poor digital experience pushed 39% of consumers to cancel a subscription and 50% to switch providers, while a great experience brought 93% back within a week.
Tie that back to the gate. A quality regression usually does not announce itself — a codec update changes a default, a mezzanine file arrives subtly broken, a new ladder rung is misconfigured — and the damaged encode looks plausible until a viewer on a real device meets a stall or a smeared frame. Without measurement, you learn about it from a support ticket or a churn number weeks later. With a quality gate, the encode never ships: the pipeline measures it, sees it under threshold, and stops. You are paying the price of one re-encode to avoid the abandonment of every viewer who would have hit the bad asset. For a popular title that is the cheapest insurance you will ever buy.
Figure 4. The same defect is cheap on the left of the gate and expensive on the right. Startup and rebuffering penalties are measured viewer behavior, not estimates.
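The gate logic described above can be sketched as a pooled score compared against a threshold. This is an illustration under stated assumptions, not a production pipeline: the function names are hypothetical, the per-frame scores are assumed to come from a VMAF run you have already performed, and the harmonic mean uses a +1 offset (so a zero-score frame cannot divide by zero), which is my understanding of the common VMAF convention and should be checked against your tooling.

```python
def harmonic_mean_pool(frame_scores):
    """Pool per-frame scores with an offset harmonic mean.

    Assumption: the +1 offset follows the usual VMAF convention; verify
    against the tool that produced your scores before relying on it.
    """
    if not frame_scores:
        raise ValueError("no frame scores to pool")
    return len(frame_scores) / sum(1.0 / (s + 1.0) for s in frame_scores) - 1.0


def passes_quality_gate(frame_scores, threshold=93.0):
    """True if the pooled score meets the threshold; False means block and re-run."""
    return harmonic_mean_pool(frame_scores) >= threshold


# Hypothetical per-frame scores: a clean encode, and one with a single broken frame.
clean = [95.0] * 10
damaged = [95.0] * 9 + [20.0]
print(passes_quality_gate(clean))    # True
print(passes_quality_gate(damaged))  # False
```

The damaged clip has an arithmetic mean of 87.5 but pools to roughly 70 here, because the harmonic mean weights bad frames more heavily. That sensitivity to short defects is one reason to pick the pooling method deliberately and name it next to the threshold.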
Case 3: a defensible metric ends the "which encoder is better" argument
The third case is about decision speed, and it is the one engineers feel most. Every team that ships video has had the argument: a new encoder, codec, or setting is proposed, two people eyeball some clips, and they disagree — because they watched different content, on different screens, at different bitrates, trusting different instincts. The argument is unresolvable because it has no common unit.
A measurement supplies the unit. The standard one for "is encoder A better than encoder B" is BD-rate — the Bjontegaard Delta rate, defined by Gisle Bjontegaard in 2001 (ITU-T VCEG document VCEG-M33) — which reports the average bitrate difference between two encoders at equal quality, as a single percentage. "SVT-AV1 gives a BD-rate of −30% versus x264 on this content" means it reaches the same quality at 30% lower bitrate, averaged across the operating range. That is a number you can reproduce, put in a table, and defend in a review. It does not care whose monitor is nicer.
The business value is twofold. First, speed: a debate that used to take a week of subjective back-and-forth becomes an overnight measurement run with a ranked answer in the morning. Second, accountability: when a codec vendor or an internal team claims a 40% improvement, a defensible metric lets you verify it on your content instead of taking the claim on faith, and vendor claims, measured on the vendor's favorite clips, routinely shrink on real catalogs. One reproducible BD-rate or VMAF comparison, run apples-to-apples, replaces a month of opinion and a procurement leap of faith. (Keep the comparison honest: a metric number is only valid at the same resolution, frames, reference, model, and pooling; see the article on where objective metrics lie.)
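To show what "average bitrate difference at equal quality" means mechanically, here is a simplified BD-rate sketch. It is not the original VCEG-M33 procedure: the original fits a cubic polynomial to each rate-quality curve, whereas this version swaps in piecewise-linear interpolation of log-bitrate so it needs only the standard library. The function names and sample points are hypothetical, and each curve is assumed to have strictly increasing quality values.

```python
import math


def _interp_log_rate(points, quality):
    """Linearly interpolate log10(bitrate) at a quality; points are (quality, kbps), sorted."""
    for (q0, r0), (q1, r1) in zip(points, points[1:]):
        if q0 <= quality <= q1:
            t = (quality - q0) / (q1 - q0)
            return math.log10(r0) + t * (math.log10(r1) - math.log10(r0))
    raise ValueError("quality outside the measured range")


def bd_rate_linear(anchor, test, steps=1000):
    """Approximate BD-rate in percent; negative means `test` needs fewer bits.

    anchor, test: lists of (bitrate_kbps, quality) pairs for each encoder.
    Simplification: piecewise-linear curves instead of the original cubic fit.
    """
    a = sorted((q, r) for r, q in anchor)
    b = sorted((q, r) for r, q in test)
    lo, hi = max(a[0][0], b[0][0]), min(a[-1][0], b[-1][0])
    if hi <= lo:
        raise ValueError("the two curves share no quality range")
    # Average the log-rate gap over the shared quality range (midpoint sampling).
    total = 0.0
    for i in range(steps):
        q = lo + (hi - lo) * (i + 0.5) / steps
        total += _interp_log_rate(b, q) - _interp_log_rate(a, q)
    return (10 ** (total / steps) - 1) * 100


# Hypothetical curves: the test encoder reaches each quality at 70% of the anchor bitrate.
anchor = [(1000, 80.0), (2000, 88.0), (4000, 93.0), (8000, 96.0)]
test = [(700, 80.0), (1400, 88.0), (2800, 93.0), (5600, 96.0)]
print(round(bd_rate_linear(anchor, test), 2))  # -30.0
```

For published comparisons, use an established BD-rate implementation with the curve fit you intend to report; the sketch only makes the definition concrete.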
The four business cases, side by side
The three cases above plus production monitoring give four distinct places where measurement turns into money. In keeping with this article's insistence on honest measurement, the last column names the trap in each: the blind spot you inherit if you trust the number without checking it.
| Business case | What you measure | The decision it informs | Where the money shows up | Where it lies (the honesty check) |
|---|---|---|---|---|
| Cut delivery cost | Quality (VMAF) vs bitrate at equal quality | How low can the bitrate go and hold quality | Smaller CDN egress bill; faster startup | "Equal quality" must be verified at a named model/pooling, not assumed |
| Catch regressions | Quality of every encode against a threshold | Ship this encode, or block and re-run | Avoided abandonment and churn on bad assets | A gate only catches what the metric can see — temporal and banding artifacts hide |
| End the encoder argument | BD-rate / VMAF, encoder A vs B, apples-to-apples | Which encoder, codec, or setting to adopt | Faster decisions; verified vendor claims | Only valid at matched resolution, frames, reference, model, pooling |
| Prove delivered quality | No-reference QoE in production (startup, rebuffer, score) | Where to spend on infrastructure; SLA proof | Protected engagement; defensible SLAs | No-reference scores are trend signals, not absolute ground truth |
Table 1. Four measurement activities, four cash outcomes, four blind spots. The right-hand column is the difference between a measurement program and measurement theater.
The cost of not measuring
It helps to state the null case plainly, because "do nothing" is never free. A team that does not measure quality over-sends bitrate to feel safe (paying Case 1 in reverse, every month), discovers regressions from churn dashboards instead of gates (paying Case 2 at full retail), and settles encoder decisions by seniority instead of evidence (paying Case 3 in slow, unaccountable choices). None of these show up as a line labeled "we didn't measure." They show up as a CDN bill that is higher than it needs to be, a retention curve with unexplained dips, and an encoding roadmap that moves at the speed of argument. Measurement converts those invisible, recurring losses into visible, one-time costs you control.
Common mistake: optimizing for the metric instead of the viewer. Once a number is attached to money, the temptation is to chase the number. But a metric is a proxy validated against human opinion, and you can inflate some metrics — sharpening and contrast tricks can lift a VMAF score without improving real quality, which is exactly why the VMAF-NEG no-enhancement-gain model exists. The number is the means; the viewer's experience is the end. When the metric and a careful human check disagree, the human is the ground truth — the metric just failed on that content.
Where Fora Soft fits in
Fora Soft has built video software since 2005 — streaming, OTT and internet TV, video conferencing, e-learning, telemedicine, and surveillance — and every one of those products lives or dies on delivered quality at a controlled cost. The discipline we bring is the one this article argues for: we attach a defensible number to the quality question before we make an encoding or delivery decision, so a bitrate reduction is a measured saving and not a gamble, and a "this encoder is better" claim is a reproducible BD-rate and not a hunch. When the master is on disk we measure with full-reference VMAF at a named model and pooling; when it is a live conferencing or surveillance feed with no master, we monitor no-reference signals and treat them as trends. Our benchmark methodology documents exactly how each number was produced, so the savings we claim are ones a client can audit.
To make the first case concrete for your own numbers, we built a small bitrate-savings calculator — feed it your baseline bitrate, the measured reduction at equal quality, your monthly viewing-hours, and your CDN rate, and it prints the monthly and annual saving using the same arithmetic as Figure 3.
What to read next
- What is video quality measurement, and why it is harder than it looks
- Quality gates in CI/CD
- Streaming QoE: the metrics that predict whether a viewer stays
Call to action
- Talk to a video engineer: book a 30-minute scoping call to talk through your business case for measuring video quality.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
References
- S. S. Krishnan and R. K. Sitaraman, "Video Stream Quality Impacts Viewer Behavior: Inferring Causality Using Quasi-Experimental Designs," Proceedings of the ACM Internet Measurement Conference (IMC 2012); extended version in IEEE/ACM Transactions on Networking, 2013. Tier 5 (peer-reviewed, Akamai/UMass). Source for the 2-second startup threshold, +5.8% abandonment per added second, the 1%-rebuffer → 5%-less-play result, and the 2.32% repeat-visit penalty, measured on 23M views / 6.7M viewers. https://people.cs.umass.edu/~ramesh/Site/HOME_files/imc208-krishnan.pdf
- G. Bjontegaard, "Calculation of Average PSNR Differences between RD-curves," ITU-T SG16/Q6 VCEG, 13th meeting, Austin, TX, document VCEG-M33, April 2001. Tier 1 (defining document). Defines BD-rate as the average bitrate difference between two encoders at equal (objective) quality. https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/VCEG-M33.doc
- Netflix Technology Blog, "Per-Title Encode Optimization," 2015. Tier 4 (credible deployer). Reported ~20% average bitrate reduction at the same quality from per-title bitrate ladders. https://netflixtechblog.com/per-title-encode-optimization-7e99442b62a2
- Netflix Technology Blog, "Optimized shot-based encodes: Now Streaming!", 2018. Tier 4 (credible deployer). Reported VMAF-matched bitrate reductions of 28% (x264), 34% (x265), and 38% (libvpx-VP9) from per-shot Dynamic Optimization. https://netflixtechblog.com/optimized-shot-based-encodes-now-streaming-4b9464204830
- Netflix / VMAF project, VMAF documentation and repository, accessed 2026-06-22. Tier 3 (metric author / first-party). VMAF definition, 0–100 scale, default/phone/4K models, and pooling guidance. https://github.com/Netflix/vmaf
- Recommendation ITU-T P.1401 (01/2020), Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models. International Telecommunication Union. Tier 1. The procedure for validating any objective metric against subjective MOS (PCC, SROCC, RMSE) — why a metric is "defensible." https://www.itu.int/rec/T-REC-P.1401
- Recommendation ITU-T P.910 (2023), Subjective video quality assessment methods for multimedia applications. International Telecommunication Union. Tier 1. The subjective methods that are the ground truth every objective metric is validated against. https://www.itu.int/rec/T-REC-P.910
- Conviva, 2025 State of Digital Experience Report (223 million sessions). Tier 4 (industry analytics vendor). Current-year figures on how poor experience drives cancellation, switching, and lost return visits. https://www.conviva.ai/2025-state-of-digital-experience-report/
- Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity (SSIM)," IEEE Transactions on Image Processing, vol. 13, no. 4, 2004. Tier 1 (metric author). Cited for the principle that an objective metric is a proxy validated against the eye. https://ece.uwaterloo.ca/~z70wang/publications/ssim.html
- Industry CDN pricing benchmarks (multiple providers), 2026: commodity video egress ≈ $0.01–0.04/GB at volume; hyperscaler origin egress ≈ $0.08–0.09/GB. Tier 6 (orientation only; the worked example uses $0.02/GB as an illustrative mid-volume rate, not a quoted price). Used solely to scale the illustrative arithmetic.


