Why this matters
If you encode video at scale, the quality target is the one decision that sets both how good your service looks and how much it costs to run — and most teams never write it down, so it drifts. This article is for the streaming or encoding lead, the platform engineer, and the technical product owner who has to defend a quality bar to a CFO on one side and a perfectionist video engineer on the other. It turns "make it look good" into a number you can measure, gate on, and budget against. It is the operational companion to the business case for measuring quality: that article argues why quality is worth measuring; this one tells you which number to aim at and what it will cost.
Stop targeting a bitrate. Target a quality number.
For most of streaming history, an encoding ladder was a list of bitrates: 1080p at 5 Mbps, 720p at 3 Mbps, and so on, the same fixed rungs for every title in the catalog. The problem with a fixed bitrate is that it buys wildly different quality depending on the content. A flat animated cartoon at 5 Mbps looks pristine; a confetti-and-crowd sports shot at the same 5 Mbps blocks up and smears, because the hard content needs far more bits to reach the same picture. A bitrate is an input. Quality is the output you actually care about. Targeting the input and hoping the output lands is backwards.
The fix is to target the output directly. Pick a perceptual quality score, a number from a metric trained to predict what people rate, and let the encoder spend whatever bitrate each title needs to hit it. The metric used here is VMAF (Video Multi-method Assessment Fusion), Netflix's perceptual metric scored 0–100, where higher means closer to the original to a human eye (see VMAF explained). The cartoon hits the target at a low bitrate and the sports clip at a high one, and both look the same to the viewer. That is the whole idea behind per-title and per-shot encoding, covered from the measurement side in per-title and per-shot encoding and as a geometry in the convex hull. Here we focus on the number those methods aim at: the target itself, and the budget it implies.
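A minimal sketch of targeting the output rather than the input. It assumes each title has already been test-encoded at a few bitrates and scored, giving (bitrate, mean VMAF) pairs. The measurements below are invented for illustration, and `bitrate_for_target` is a hypothetical helper, not part of any encoder or of VMAF. It interpolates linearly between the two measured points that bracket the target, which is only a rough stand-in for a real rate-quality curve.

```python
from typing import Optional, Sequence, Tuple


def bitrate_for_target(
    points: Sequence[Tuple[float, float]], target_vmaf: float
) -> Optional[float]:
    """Lowest bitrate (Mbps) at which the measured curve reaches target_vmaf.

    points: (bitrate_mbps, mean_vmaf) pairs from test encodes of one title.
    Returns None if even the highest measured bitrate misses the target.
    """
    pts = sorted(points)
    if not pts:
        return None
    # The cheapest test encode may already meet the target.
    if pts[0][1] >= target_vmaf:
        return pts[0][0]
    # Find the first pair of neighbouring points that brackets the target.
    for (b0, q0), (b1, q1) in zip(pts, pts[1:]):
        if q0 < target_vmaf <= q1:
            return b0 + (b1 - b0) * (target_vmaf - q0) / (q1 - q0)
    return None


# Illustrative measurements only (not real data).
cartoon = [(1.0, 90.0), (2.0, 96.0), (5.0, 99.0)]
sports = [(2.0, 80.0), (5.0, 90.0), (8.0, 96.0)]

print(f"cartoon: {bitrate_for_target(cartoon, 95.0):.2f} Mbps")
print(f"sports:  {bitrate_for_target(sports, 95.0):.2f} Mbps")
```

With these made-up points the same target of 95 lands near 1.8 Mbps for the cartoon and 7.5 Mbps for the sports clip: one quality number, two very different bitrates.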
A quick guard before we pick a number. A VMAF score means nothing on its own — it is only defined together with the model (which screen it predicts for), the pooling (how the per-frame scores were combined into one number; see pooling per-frame scores), and the content it was measured on. "Target VMAF 95" is shorthand for "target a mean VMAF of 95, default 1080p model, on this content, with the low percentile checked." Keep that full sentence in mind every time the short version appears below.
Where quality goes invisible: the transparency target
Start at the top of the ladder, because it is the most expensive rung to deliver and often the most watched. How good does the best rung need to be? The answer is bounded by human vision: past a certain quality, more bits buy nothing a viewer can detect. That ceiling is called transparency — the point where the compressed video is, to the eye, the same as the original.
Two pieces of published research put a number on it, and they agree. Reza Rassool of RealNetworks correlated VMAF against formal subjective scores and found that a service encoding to about VMAF 93 could be confident of serving most of its audience content that is "either indistinguishable from original or with noticeable but not annoying distortion" — a mean opinion score between 4 and 5 (Rassool, VMAF Reproducibility, IEEE, 2017). A separate study from RheinMain University and the streaming provider Joyn, testing on a 4K television, found that VMAF 95 delivered a file "on average subjectively indistinguishable from the original video signal" (Kah et al., SPIE 11842-38, 2021). So the working rule for premium content on a big screen is a top-rung target of 93 to 95 VMAF. Below 93 the best viewers start to notice; above 95 you are spending bits on quality nobody can see.
How much is "a point or two" of VMAF worth in human terms? Netflix gave the field a ruler in 2017: about 6 VMAF points equals one just-noticeable difference (JND) — a gap most viewers notice more than half the time (Netflix, via Ozer, 2017). The Joyn study refined the small end: viewers could not tell apart two encodes within 2 VMAF points of each other, and only began to see a difference at 3 points and up. So a target of VMAF 98 sits 3 points above transparency — half a JND — which is, by definition, invisible to most people. That gap is the first place a quality budget leaks.
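The two rulers above can be written down as a small conversion, shown here as a sketch. The constants come from the figures cited in the text (6 points per JND, no visible difference within 2 points); the function names and the wording of the three bands are assumptions made for this example.

```python
VMAF_PER_JND = 6.0      # Netflix guidance, as reported by Ozer (2017)
INVISIBLE_WITHIN = 2.0  # Kah et al. (2021): no visible difference within 2 points


def gap_in_jnd(score_a: float, score_b: float) -> float:
    """Express the distance between two VMAF scores in just-noticeable differences."""
    return abs(score_a - score_b) / VMAF_PER_JND


def describe_gap(score_a: float, score_b: float) -> str:
    """Rough label for a VMAF gap; the band names are this sketch's own wording."""
    gap = abs(score_a - score_b)
    if gap <= INVISIBLE_WITHIN:
        return "within 2 points: not distinguishable"
    if gap < VMAF_PER_JND:
        return "under one JND: may be noticed by some viewers"
    return "one JND or more: noticed by most viewers"


print(gap_in_jnd(98, 95))    # 0.5
print(describe_gap(96, 95))
print(describe_gap(95, 89))
```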
Figure 1. Picture quality climbs steeply with bitrate, then flattens. Transparency sits near VMAF 95 on a living-room screen: below it, more bits are visible; above it, the curve is flat and the extra bitrate is wasted budget. One JND is about 6 VMAF points; a difference under 2 points is invisible.
The number changes with the screen: target per platform
Transparency is not one number, because vision is not one condition. A small screen held at arm's length hides impairments that a 65-inch television across a living room reveals. VMAF builds this directly into its models, each trained on people watching a specific kind of screen (Netflix, VMAF models documentation):
- The default model predicts quality on a 1080p HDTV at three times the screen height — the living-room case, and the strictest.
- The phone model (invoked with --phone-model) predicts quality on a cellular screen viewed at whatever distance the viewer finds comfortable; because the screen is small and relatively far, the same encode scores higher on the phone model than on the default.
- The 4K model predicts a 4K television viewed close, at 1.5 times the screen height, the distance at which 4K detail becomes appreciable (Netflix, VMAF models documentation; viewing distances per ITU-R BT.2022).
The practical consequence is money. Because the phone model scores the same file higher, you can reach the same perceived target at a lower bitrate for mobile viewers — the artifacts that would fail the bar on a TV are simply invisible on the phone. A single ladder built for the living room over-delivers to every phone in your audience. So the right target is a small table, not one row.
| Platform | VMAF model | Top-rung target | What it measures | Where it lies / blind spot |
|---|---|---|---|---|
| Living-room TV | Default 1080p (3H) | 93–95 | Quality at the strictest common viewing condition | Over-delivers to phones; blind to chroma/banding before VMAF v1 |
| 4K TV | 4K model (1.5H) | 93–95 | Quality of 4K detail seen up close | Needs a true 4K source; punishing on grain and fine texture |
| Phone / mobile | Phone model | 90–93 (scores run higher) | Quality on a small screen at comfortable distance | Hides real artifacts a TV would show; never use it to ship to TVs |
| Laptop / desktop | Default 1080p | 92–95 | Mid-size screen, mid distance | Between TV and phone; pick by your analytics, not by guess |
Table 1. Quality targets are per-platform because each VMAF model predicts a different screen. The "where it lies" column matters: the phone model's optimism is a feature for mobile and a trap if you ever score a TV encode with it.
A warning that belongs next to this table. The phone model's higher scores are not free quality — they are a statement about a small screen. Score a television encode with the phone model and you will pass video that visibly blocks up in the living room. Match the model to the device every time; the model is part of the target, not a tuning knob.
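One way to make "the model is part of the target" hard to forget is to store the two together and refuse a score measured with the wrong model. The sketch below encodes Table 1 as data. The platform keys and model labels are hypothetical names chosen for this example, not VMAF model file names, and `check_top_rung` is an illustrative helper rather than part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformTarget:
    model: str           # label for the VMAF model (hypothetical labels)
    top_rung_min: float  # lower edge of the top-rung target band
    top_rung_max: float  # upper edge; above this, bits are likely wasted


# Values copied from Table 1.
TARGETS = {
    "tv": PlatformTarget("default-1080p", 93, 95),
    "tv-4k": PlatformTarget("4k", 93, 95),
    "phone": PlatformTarget("phone", 90, 93),
    "desktop": PlatformTarget("default-1080p", 92, 95),
}


def check_top_rung(platform: str, model_used: str, mean_vmaf: float) -> str:
    """Compare a top-rung score with its platform target, rejecting model mismatches."""
    target = TARGETS[platform]
    if model_used != target.model:
        raise ValueError(
            f"{platform} must be scored with the {target.model} model, "
            f"not {model_used}"
        )
    if mean_vmaf < target.top_rung_min:
        return "below target"
    if mean_vmaf > target.top_rung_max:
        return "above target"
    return "on target"


print(check_top_rung("tv", "default-1080p", 94.2))
print(check_top_rung("phone", "phone", 95.5))
```

Scoring a TV encode with the phone model raises an error here instead of quietly passing.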
Figure 3. Each VMAF model predicts a different screen, so the target is per-platform. The phone reaches the same perceived bar for fewer bits — and a single living-room ladder over-delivers (and overspends) on every mobile viewer.
The floor: the lowest quality you will still ship
A ladder has a bottom as well as a top. The floor is the lowest-quality rung you will serve before you would rather show nothing. The Joyn study measured this too, as an acceptance rate — the share of viewers who rate a stream good enough to keep watching. Acceptance crossed 50% at about VMAF 55 for a free service and VMAF 70 for a paid one: below those scores, more than half the audience finds the picture unacceptable (Kah et al., SPIE, 2021). Paying customers expect more, so their floor is higher.
In practice most services set a rung or two below even that floor, and on purpose. The reasoning is the one universal truth of adaptive streaming: a viewer on a collapsing connection would rather watch a soft, low-quality picture than stare at a buffering spinner. A rung at VMAF 40 is not "acceptable" by the survey, but it keeps the stream alive through a tunnel or a crowded stadium, and a live stream that keeps playing beats a sharp one that stalls. The floor from the acceptance survey is where quality stops being good; the actual bottom rung is where playback stops being possible. Set both deliberately.
From a target to a budget: what the number costs
A target becomes a budget the moment you multiply it by traffic. The chain is short and every link is arithmetic you can check by hand: a quality target implies a bitrate (per title, via the convex hull); a bitrate implies bytes per viewing-hour; bytes times viewing-hours times your egress price is a monthly bill. Bitrate is a direct multiplier on delivery cost — streaming at 6 Mbps costs exactly twice the egress of streaming at 3 Mbps for the same minutes — and in a media service the egress line is typically the largest single cost (Backblaze, 2024).
Here is the leak from targeting above transparency, worked out. Take a premium 1080p title where transparency sits at VMAF 95. Left at a fixed high bitrate, the encoder delivers VMAF 98 at 8.0 Mbps; but per-title measurement shows VMAF 95 is reached at 5.0 Mbps for this content. The 3 extra VMAF points are half a JND — invisible. Convert the two bitrates to bytes per hour:
GB per hour = Mbps × 3600 s ÷ 8 bits/byte ÷ 1000
at 8.0 Mbps: 8.0 × 3600 ÷ 8 ÷ 1000 = 3.60 GB/hour
at 5.0 Mbps: 5.0 × 3600 ÷ 8 ÷ 1000 = 2.25 GB/hour
wasted: 1.35 GB/hour
Now put it at scale. At 100 million viewing-hours a month and a blended egress price of $0.02 per GB:
cost at 8.0 Mbps: 3.60 GB × 100,000,000 × $0.02 = $7,200,000 / month
cost at 5.0 Mbps: 2.25 GB × 100,000,000 × $0.02 = $4,500,000 / month
wasted on invisible quality: $2,700,000 / month (38% of the bill)
Two-point-seven million dollars a month, spent on three VMAF points no viewer can see. That is the case for a written transparency target stated as plainly as a budget line. The quality-target & budget planner shipped with this article runs this calculation — and the two below — from your own numbers; pass --demo to reproduce the figures here exactly.
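The chain from bitrate to monthly bill is short enough to sketch in a few lines. This is not the planner mentioned above; the function names are made up for the example, and the inputs are the illustrative figures from this section (8.0 versus 5.0 Mbps, 100 million viewing-hours, $0.02 per GB).

```python
def gb_per_hour(mbps: float) -> float:
    """Convert a bitrate in Mbps to decimal gigabytes per viewing-hour."""
    return mbps * 3600 / 8 / 1000


def monthly_egress_cost(mbps: float, viewing_hours: float, price_per_gb: float) -> float:
    """Monthly delivery cost for one bitrate at a given traffic volume."""
    return gb_per_hour(mbps) * viewing_hours * price_per_gb


HOURS = 100_000_000  # viewing-hours per month (figure from the text)
PRICE = 0.02         # blended egress price in $/GB (figure from the text)

cost_fixed = monthly_egress_cost(8.0, HOURS, PRICE)   # fixed bitrate, VMAF 98
cost_target = monthly_egress_cost(5.0, HOURS, PRICE)  # per-title, VMAF 95
waste = cost_fixed - cost_target

print(f"at 8.0 Mbps: ${cost_fixed:,.0f} / month")
print(f"at 5.0 Mbps: ${cost_target:,.0f} / month")
print(f"wasted:      ${waste:,.0f} / month ({waste / cost_fixed:.1%} of the bill)")
```

The share printed is 37.5%, which the text rounds to 38%.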
Figure 2. A quality target becomes a quality budget in four steps. The same chain run at a target above transparency shows the waste: three invisible VMAF points cost $2.7M a month at 100M viewing-hours and $0.02/GB.
The rung budget: space the ladder so no one sees a step
The top and the floor set the range; the rungs in between are their own budget decision. As a viewer's bandwidth rises and falls, the player switches between rungs. If two neighbouring rungs are far apart in quality, the switch is visible — a sudden softening or sharpening that draws the eye. If they are close, the switch is invisible and playback feels stable.
How close is close enough? The Joyn study's 2-VMAF finding gives the rule: keep neighbouring rungs within about 2 VMAF points and viewers cannot tell when the player switches (Kah et al., SPIE, 2021). That turns rung count into arithmetic. For a paid large-screen ladder with a floor at VMAF 70 and a ceiling at VMAF 95, spacing of 2 needs (95 − 70) ÷ 2 + 1 = 13.5, so 13 to 14 rungs; a free service with a floor at 55 needs (95 − 55) ÷ 2 + 1 = 21. (The planner rounds up to guarantee no gap exceeds the spacing, so it reports 14 and 21.) Few services ship ladders that long, because every rung is another encode to produce and store. So most loosen the spacing toward one JND, 6 VMAF, accepting an occasional visible switch in exchange for fewer than half the encodes:
paid ladder, floor 70 to ceiling 95
spacing 2 VMAF (invisible switching): ~14 rungs — smooth, but 14 encodes to store
spacing 6 VMAF (1 JND per step): ~6 rungs — cheaper, occasional visible step
That trade — switching smoothness against encode-and-storage cost — is the rung budget. There is no universal right answer; there is the answer your storage bill and your viewers can both live with. Tight spacing for a flagship paid tier, looser for a free one, is a defensible split.
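The rung arithmetic can be sketched as follows. The rounding-up rule matches the one described for the planner, but this is an independent illustration with hypothetical function names. Spreading the rungs evenly between floor and ceiling is an assumption of the sketch; a real ladder would place rungs from measured rate-quality data.

```python
import math
from typing import List


def rung_count(floor_vmaf: float, ceiling_vmaf: float, spacing: float) -> int:
    """Fewest rungs such that no neighbouring pair is more than `spacing` apart."""
    if spacing <= 0 or ceiling_vmaf < floor_vmaf:
        raise ValueError("need spacing > 0 and ceiling >= floor")
    return math.ceil((ceiling_vmaf - floor_vmaf) / spacing) + 1


def rung_targets(floor_vmaf: float, ceiling_vmaf: float, spacing: float) -> List[float]:
    """Evenly spaced VMAF targets from floor to ceiling (an assumed layout)."""
    n = rung_count(floor_vmaf, ceiling_vmaf, spacing)
    step = (ceiling_vmaf - floor_vmaf) / (n - 1) if n > 1 else 0.0
    return [floor_vmaf + i * step for i in range(n)]


print(rung_count(70, 95, 2))  # paid ladder, invisible switching: 14
print(rung_count(55, 95, 2))  # free ladder, invisible switching: 21
print(rung_count(70, 95, 6))  # paid ladder, one JND per step: 6
print([round(t, 1) for t in rung_targets(70, 95, 6)])
```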
Per-platform budgeting: the phone-model saving
The per-platform target from Table 1 is also a per-platform budget, and the phone model is where it pays off. Because a mobile audience reaches the same perceived quality at a lower bitrate, splitting the target by device takes real money off the egress bill. Work it through for a service with 50 million viewing-hours a month, a 60% phone / 40% TV split, a 6.0 Mbps TV top rung, and $0.02/GB egress. Suppose measurement shows the phone target is reached 30% cheaper — at 4.2 Mbps — for this content:
one TV-grade ladder for everyone:
6.0 Mbps × all 50M hours → $2,700,000 / month
per-platform targets:
phones 4.2 Mbps × 30M hours (60%) → $1,134,000
TVs 6.0 Mbps × 20M hours (40%) → $1,080,000
total → $2,214,000 / month
saving from splitting the target by device: $486,000 / month (18%)
Same perceived quality for every viewer, eighteen percent off the delivery bill — because the phones were being over-served by a ladder built for the living room. The 30% figure is illustrative; you must measure it for your own content, since the phone-model gain depends on the material. But the structure holds for any catalog: a single living-room ladder quietly overspends on every mobile viewer.
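The split can be checked with a short sketch. It uses the section's own illustrative inputs (50 million hours, 60/40 phone/TV, 6.0 Mbps TV top rung, a 30% lower phone bitrate, $0.02 per GB); the function names are hypothetical, and the 30% figure remains an assumption you would replace with your own measurement.

```python
from typing import Iterable, Tuple


def gb_per_hour(mbps: float) -> float:
    """Convert a bitrate in Mbps to decimal gigabytes per viewing-hour."""
    return mbps * 3600 / 8 / 1000


def monthly_cost(mbps: float, hours: float, price_per_gb: float) -> float:
    return gb_per_hour(mbps) * hours * price_per_gb


def split_cost(
    total_hours: float, price_per_gb: float, segments: Iterable[Tuple[float, float]]
) -> float:
    """Total cost when each (share_of_hours, mbps) segment gets its own top rung."""
    segments = list(segments)
    if abs(sum(share for share, _ in segments) - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    return sum(
        monthly_cost(mbps, share * total_hours, price_per_gb)
        for share, mbps in segments
    )


HOURS = 50_000_000
PRICE = 0.02
tv_mbps = 6.0
phone_mbps = tv_mbps * (1 - 0.30)  # 4.2 Mbps; the 30% saving is illustrative

single = monthly_cost(tv_mbps, HOURS, PRICE)
split = split_cost(HOURS, PRICE, [(0.60, phone_mbps), (0.40, tv_mbps)])
saving = single - split

print(f"single ladder: ${single:,.0f} / month")
print(f"per-platform:  ${split:,.0f} / month")
print(f"saving:        ${saving:,.0f} / month ({saving / single:.0%})")
```

Because the saving scales with the phone share and the phone-model gain, both are worth re-measuring whenever the audience mix or the catalog changes.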
Writing a target the business and the engineers both accept
A quality target is a negotiated number, not a discovered one, and it fails when only one side owns it. The engineers, left alone, drift toward VMAF 98 "to be safe" and burn the budget on invisible quality. The finance side, left alone, drives the target down until churn climbs and nobody can say why. A target survives because both sides can see the same trade.
The way to get there is to express the target in both languages at once. State it as a perceptual number with a citation (top rung at VMAF 95, the transparency point from Kah et al.; floor at VMAF 70, the 50%-acceptance point for paid). Then state the same target as a budget line (this top rung costs $X per million viewing-hours; moving it from 95 to 98 adds $Y for no visible gain; moving it from 95 to 92 saves $Z but crosses into territory the best viewers notice). The general ROI argument behind that conversion lives in the business case for measuring quality; the per-platform targets here are how you make it operational. Once the target is written down with both a perceptual and a financial figure, it stops drifting — and it becomes the threshold you can enforce automatically in a CI/CD quality gate.
Figure 4. The whole decision on one page: screen → model → top rung (transparency 93–95) → floor (70 paid / 55 free, plus a survival rung) → rung spacing (2 invisible / 6 cheaper) → a written target that is both a perceptual number and a budget line.
A common mistake: targeting the metric instead of the viewer
The failure that wastes the most money is treating the VMAF number as the goal rather than as a proxy for the viewer. Three versions of it recur, and all three come from forgetting that the metric is a stand-in the eye can overrule:
- The first is targeting above transparency "for headroom" — pushing for VMAF 97 or 98 because higher feels safer. Above the transparency line the curve is flat: you are buying bytes, not quality.
- The second is targeting the mean and ignoring the low percentile. A title can average VMAF 95 while one dark, high-motion scene sits at VMAF 70, and that one scene is what the viewer remembers; always read the 1st or 5th percentile next to the mean (see pooling per-frame scores).
- The third is optimizing so hard for VMAF that you encode for the metric — sharpening or processing in ways that lift the score while the picture looks worse, exactly what VMAF's blind spots invite (see where objective metrics lie).
The defence against all three is the same: the target is a number, but the ground truth is a properly run subjective test (ITU-R BT.500-15, 2023). When the metric and a careful viewing disagree, the viewer wins, and the target is recalibrated — not the other way around.
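The second mistake, a healthy mean hiding a bad scene, is easy to check mechanically. The sketch below assumes you already have per-frame VMAF scores for a title. The nearest-rank percentile, the 5th-percentile choice, and the low-percentile floor of 80 are assumptions made for this example, not thresholds from the cited studies, and the frame scores are invented.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty sequence."""
    if not values:
        raise ValueError("no scores")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_title(
    frame_scores: Sequence[float],
    mean_target: float,
    low_floor: float,
    low_pct: float = 5,
) -> Dict[str, object]:
    """Pass only if both the mean and the low percentile clear their thresholds."""
    mean = statistics.fmean(frame_scores)
    low = percentile(frame_scores, low_pct)
    return {"mean": mean, "low": low, "passed": mean >= mean_target and low >= low_floor}


# Invented example: mostly clean frames plus one hard scene.
scores = [96.5] * 950 + [70.0] * 50

result = check_title(scores, mean_target=95, low_floor=80)  # floor of 80 is hypothetical
print(f"mean {result['mean']:.2f}, 5th percentile {result['low']:.1f}, "
      f"passed: {result['passed']}")
```

Here the mean clears 95 while the 5th percentile sits at 70, so the title fails the check even though the headline number looks fine.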
Where Fora Soft fits in
Fora Soft has built video streaming, OTT, conferencing, e-learning, surveillance, and telemedicine systems since 2005, and on every one of them the quality target is the decision that ties picture quality to running cost. We help teams set the target the way this article describes — a per-platform VMAF number anchored to a real subjective test, then converted into a delivery budget the business can read — and wire it into the pipeline as an enforced gate rather than a hope. Where a project needs evidence rather than rules of thumb, our own measured benchmarks (see our benchmark methodology) give the rate-quality data behind a target for a given codec and content type. The aim is never the highest possible number; it is the lowest target that no viewer can tell from perfect.
What to read next
- The business case for measuring quality — the ROI argument behind the target.
- Quality gates in CI/CD — turning the target into an automated threshold.
- The convex hull: optimal bitrate-resolution points — the geometry that turns a target into a bitrate.
Call to action
- Talk to a video engineer — book a 30-minute scoping call to talk through your VMAF target plan.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
References
- Andreas Kah, Christopher Friedrich, Thomas Rusert, Christoph Burgmair, Wolfgang Ruppel, Matthias Narroschke. "Fundamental relationships between subjective quality, user acceptance, and the VMAF metric for a quality-based bit rate ladder design for over-the-top video streaming services." SPIE 11842-38 (Applications of Digital Image Processing XLIV), 2021. Tier 5. Transparency ≈ VMAF 95; no visible difference within 2 VMAF; acceptance floors ≈55 (free) / ≈70 (paid); ladder-rung formula. https://www.hs-rm.de/fileadmin/user_upload/SPIE_11842-38_HSRM.pdf
- Reza Rassool. "VMAF Reproducibility: Validating a Perceptual Practical Video Quality Metric." IEEE BMSB, 2017. Tier 5. VMAF ≈93 maps to MOS 4–5 ("indistinguishable, or noticeable but not annoying"). https://realnetworks.com/sites/default/files/vmaf_reproducibility_ieee.pdf
- Netflix. "VMAF — Models documentation" (Netflix/vmaf, resource/doc/models.md), accessed 2026-06-24. Tier 1 (metric-author primary). The default 1080p/3H model, the --phone-model flag, and the 4K/1.5H model; the same encode scores higher on the phone model. https://github.com/Netflix/vmaf/blob/master/resource/doc/models.md
- Netflix Technology Blog. "Toward a Practical Perceptual Video Quality Metric" (Li, Aaron, Katsavounidis, Moorthy, Manohara), 2016. Tier 4. The defining VMAF post; basis for VMAF's training scale and meaning. https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652
- Jan Ozer. "Finding the Just Noticeable Difference with Netflix VMAF." Streaming Learning Center, 2017. Tier 6. Records Netflix's guidance that 6 VMAF points ≈ 1 JND, and that VMAF is relative to its source. https://streaminglearningcenter.com/codecs/finding-the-just-noticeable-difference-with-netflix-vmaf.html
- Jan Ozer. "Identifying the Top Rung of a Bitrate Ladder." OTTVerse, 2022. Tier 6. Synthesis of the Rassool and Kah papers (target 93–95) and measured top rungs: premium 95–96, UGC 84–92. https://ottverse.com/top-rung-of-encoding-bitrate-ladder-abr-video-streaming/
- Netflix Technology Blog (C. Bampis, Z. Li, K. Swanson, et al.). "VMAF v1: Good Is Not Good Enough." June 2026. Tier 4. The current VMAF generation: new 1080p/phone/4K models adding chroma (modified SpEED-QA) and banding (CAMBI) awareness and a motion threshold — closing blind spots that mattered for targets. https://netflixtechblog.com/vmaf-v1-good-is-not-good-enough-60d7e4244ea8
- Recommendation ITU-R BT.500-15. "Methodologies for the subjective assessment of the quality of television pictures." International Telecommunication Union, 2023. Tier 1. The subjective-assessment ground truth that any VMAF target is ultimately validated against. https://www.itu.int/rec/R-REC-BT.500
- Recommendation ITU-R BT.2022. "General viewing conditions for subjective assessment of quality of SDTV and HDTV television pictures on flat panel displays." International Telecommunication Union, 2012. Tier 1. The viewing-distance basis (3H for 1080p, 1.5H for 4K) behind the per-device VMAF models. https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.2022-0-201208-W!!PDF-E.pdf
- Meta Engineering. "How Facebook encodes your videos." 2021. Tier 4. A production cost/benefit model that sets per-asset quality effort by predicted watch time — a real-world per-content quality-target policy. https://engineering.fb.com/2021/04/05/video-engineering/how-facebook-encodes-your-videos/
- Backblaze. "CDN Bandwidth Fees: Costs, Factors, and How to Save." 2024. Tier 6. Egress-pricing context: bitrate as a direct multiplier on delivery cost and egress as the dominant media-bill line. https://www.backblaze.com/blog/cdn-bandwidth-fees-what-you-need-to-know/


