Why this matters
If you build bitrate ladders, the convex hull is the geometry that decides which resolution belongs at each rung — and getting it right is the difference between a 540p rung that looks clean and a 1080p rung that blocks up at the same bitrate. This article is for the streaming or encoding lead, the platform engineer, or the QA engineer who has heard "put each rung on the convex hull" and wants to actually understand the curve, build it from real measurements, and avoid the one mistake that quietly corrupts it. It is the measurement-side companion to per-title encoding: per-title tells you to fix a quality target and let the bitrate float; the convex hull tells you which resolution to use once you know the bitrate. Get the hull wrong and every ladder you build from it is wrong in the same way.
The question the hull answers
Start with a question that sounds like it has an obvious answer, and does not. You are about to encode a rung of your bitrate ladder at, say, 1.5 Mbps. What resolution should that rung be?
The tempting answer is "the highest you can" — surely more pixels is more picture. The real answer is "it depends on the bitrate," and the dependence is strong enough to flip the decision. A resolution is the pixel grid the frame is encoded at — 1920×1080 (1080p), 1280×720 (720p), 960×540 (540p), and so on. A bitrate is how many bits per second you spend to send it. The thing people forget is that those bits have to be spread across every pixel, and 1080p has four times as many pixels as 540p. At a low bitrate, the 1080p encode is trying to describe four times the detail with the same handful of bits, so it runs out and falls back on coarse approximations the viewer sees as blocking and smearing. The 540p encode, with a quarter of the pixels to feed, spends its bits comfortably, produces a clean small frame, and the player scales that clean frame up to fill the screen. Up to a point, the clean-and-upscaled small frame looks better than the starved big one.
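The "same bits, more pixels" arithmetic can be made concrete with a small sketch. It assumes 30 frames per second (a figure not stated in this article) and ignores everything a real encoder does with prediction and motion, so treat the numbers as a rough budget rather than a quality estimate.

```python
# Rough bit budget per pixel per frame at a fixed bitrate.
# Assumption: 30 fps; real encoders do not spend bits uniformly per pixel.
RESOLUTIONS = {"1080p": (1920, 1080), "720p": (1280, 720), "540p": (960, 540)}


def bits_per_pixel(bitrate_bps: float, width: int, height: int, fps: float = 30.0) -> float:
    """Average number of bits available for each pixel of each frame."""
    return bitrate_bps / (width * height * fps)


# The 1.5 Mbps rung from the question above.
budget = {name: bits_per_pixel(1_500_000, w, h) for name, (w, h) in RESOLUTIONS.items()}
for name, bpp in budget.items():
    print(f"{name}: {bpp:.4f} bits per pixel per frame")
```

At the same bitrate the 540p encode has exactly four times the per-pixel budget of the 1080p encode, which is the whole reason the smaller frame can stay clean where the larger one starves.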
So the answer to "which resolution at 1.5 Mbps?" is a measured result, not a guess. Measure quality at several resolutions across a range of bitrates, and you find that each resolution wins inside its own bitrate band and loses outside it. The convex hull is the tool that captures "whichever resolution is winning right now" as one continuous line. Before we build it, we need to be precise about what a convex hull actually is, because the name comes from geometry, not from video.
What a convex hull is, in plain terms
Forget video for a moment. Take a handful of points scattered on a page — pins stuck in a corkboard. Now stretch a rubber band so it surrounds all of them and let go. The rubber band snaps to the outermost pins and forms a taut loop; every pin is either on the loop or inside it. That loop is the convex hull of the points: the smallest convex shape that contains them all (de Berg et al., Computational Geometry, 3rd ed., 2008).
Two properties define it. It is convex — pick any two points inside it and the straight line between them stays inside; the boundary never caves inward. And it is minimal — it is the tightest such shape; no smaller convex shape contains every point. The rubber-band picture captures both: a stretched band is always taut (convex) and always the smallest loop that still holds every pin (minimal).
For video we do not need the whole loop, only its top edge. We are going to plot points where going up means better quality, so the only points we care about are the ones along the upper boundary, the ones nothing beats. That top edge is what people mean when they say "the convex hull" in encoding: the upper-left frontier of the best points. A point that sits below the frontier is dominated (some other encode is at least as cheap and at least as good), and a dominated encode is never the right choice. Economists call the set of non-dominated points a Pareto frontier; the encoding world calls it the convex hull. Strictly, the two differ slightly: the upper convex hull is the part of the Pareto frontier that also bulges outward, so a non-dominated point can still sit just under the straight line joining its two neighbours and fall off the hull. In practice the terms are used interchangeably, and the idea is the same: keep only the choices nothing strictly beats.
Figure 1. The convex hull is the rubber band around a set of points — the smallest convex shape containing them all. In encoding we keep only the upper frontier: the points that no other point beats on both bitrate and quality. Points below the frontier are "dominated" and never worth using.
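The dominance test is simple enough to write down directly. The sketch below is a minimal illustration, not a production filter: the function names and the sample points are hypothetical, and the quadratic loop is fine for the few dozen points a measurement grid produces.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (bitrate_kbps, quality)


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as cheap and at least as good as b, and is not b itself."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates, sorted by bitrate."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))


# Hypothetical measurements, for illustration only.
points = [(400, 78), (400, 60), (800, 88), (800, 76), (1500, 94), (1500, 90), (3000, 97)]
frontier = non_dominated(points)
print(frontier)
```

Here (800, 76) is dropped because (400, 78) is both cheaper and better; every point that survives costs more than the one before it and scores higher.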
Why the resolution curves cross
Now bring the points back to video. For one piece of content, encode it at one resolution across a range of bitrates and measure the quality of each encode. Plot bitrate on the horizontal axis and a quality score — say VMAF (Video Multi-method Assessment Fusion, Netflix's perceptual metric scored 0–100, where higher means closer to the original to a human eye; see VMAF explained) — on the vertical axis. You get one rising curve: more bits, better picture, with the gains flattening at the top as the encode approaches visual transparency and extra bits buy nothing.
Do that for several resolutions and you get several curves — and they cross. This is the heart of the matter. At low bitrates the low-resolution curves sit on top: a 540p frame encoded at 800 kbps is clean, and clean-upscaled beats the blocky 1080p encode that 800 kbps produces. At high bitrates the high-resolution curves take over: once 1080p has enough bits to render its detail without artifacts, it shows more real picture than any upscaled smaller frame can. Somewhere in between, the curves intersect. That intersection is the crossover bitrate — the bitrate at which it stops being worth staying at the lower resolution and starts being worth stepping up.
The crossovers are real and measurable, not theoretical. Fraunhofer FOKUS, reproducing the Netflix method with FFmpeg, found that at around 500 kbps the 480p and 720p encodes delivered better PSNR than 1080p of the same content, even after upscaling for the comparison (Fraunhofer FOKUS Video-Dev, 2019). On a different clip, an analysis of the "Dolls" sequence put the 540p-to-1080p crossover at roughly 2.0 Mbps: below 2.0 Mbps, 540p scored higher VMAF; above it, 1080p won (Optimal Transcoding Resolution Prediction, arXiv, 2024). The exact crossover bitrate depends entirely on the content — a flat cartoon and a grainy crowd shot cross at very different places — which is the whole reason the hull has to be measured per asset rather than assumed.
Figure 2. One rate-quality curve per resolution, and they cross. At low bitrate the smaller frames (clean, then upscaled) win; at high bitrate the larger frames pull ahead once they have bits to spend. The bitrate where two curves meet is the crossover — where it becomes worth stepping up a resolution.
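Given two measured curves sampled at the same bitrates, the crossover can be estimated by interpolating where their difference changes sign. This is a minimal sketch with hypothetical sample values; it interpolates linearly in bitrate, which is crude because rate-quality curves are closer to straight on a logarithmic bitrate axis, so a denser grid near the crossover gives a better estimate.

```python
from typing import List, Optional


def crossover_bitrate(
    bitrates: List[float], low_res: List[float], high_res: List[float]
) -> Optional[float]:
    """Bitrate at which high_res first overtakes low_res.

    The three lists have equal length and bitrates are ascending. Returns None
    if the higher resolution never overtakes inside the measured range.
    """
    diffs = [h - l for h, l in zip(high_res, low_res)]
    for i in range(1, len(diffs)):
        if diffs[i - 1] < 0 <= diffs[i]:
            # Linear interpolation of the zero crossing of (high - low).
            t = -diffs[i - 1] / (diffs[i] - diffs[i - 1])
            return bitrates[i - 1] + t * (bitrates[i] - bitrates[i - 1])
    return None


# Hypothetical VMAF samples for one asset, for illustration only.
bitrates = [500, 1000, 2000, 4000]
vmaf_540p = [80, 88, 92, 94]
vmaf_1080p = [70, 84, 94, 98]
xover = crossover_bitrate(bitrates, vmaf_540p, vmaf_1080p)
print(f"540p to 1080p crossover near {xover:.0f} kbps")
```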
Building the convex hull
The hull is the upper envelope of those crossing curves: at every bitrate, take whichever resolution's curve is highest, and trace that "best of all resolutions" line from left to right. Where you follow the 360p curve, then the 540p curve, then the 720p curve, then the 1080p curve, the switches happen exactly at the crossover bitrates. The result is a single curve made of the winning segments of every resolution — the convex hull. Each point you would ever put on a bitrate ladder lives on this line; nothing below it is worth encoding.
Here is the procedure Netflix described when it introduced per-title encoding in 2015, stated as steps you can run.
Step 1 — Encode a grid. Pick a set of resolutions and a set of quality or bitrate settings, and encode the content at every combination. This is deliberately brute force. Fraunhofer's reproduction used seven resolutions and twelve constant-rate-factor values — 84 test encodes for one piece of content (Fraunhofer FOKUS, 2019). Netflix's own per-shot system encodes "hundreds of combinations of resolution and data rate" per source (Netflix Technology Blog, 2018). The constant-rate-factor (CRF) setting holds quality roughly steady and lets bitrate vary, which spreads your points naturally along the bitrate axis.
Step 2 — Measure each encode against the source. Score every encode with a quality metric — VMAF or, in the original demonstration, PSNR (peak signal-to-noise ratio, the number that compares a compressed frame to the original pixel by pixel, in decibels, where higher is closer; see PSNR explained). This step contains the one trap that ruins more convex hulls than anything else, so it gets its own section below.
Step 3 — Drop the dominated points and keep the frontier. Among all the (bitrate, quality) points from the grid, discard any point that another point beats on both axes — cheaper and better. What survives is the upper frontier: the convex hull. Plotted, it is the smooth line hugging the best points of every resolution.
Step 4 — Read the resolution off the hull. For any bitrate you want a rung at, find where it lands on the hull and use that resolution. Below the first crossover, that is your lowest resolution; past the last crossover, your highest. The hull has told you the answer to the opening question — which resolution at this bitrate — for every bitrate at once.
Figure 3. The convex hull (bold) is the upper envelope of the resolution curves — the best achievable quality at each bitrate. The green points are the non-dominated winners; the hull follows 360p, then 540p, then 720p, then 1080p, switching at each crossover. Every sensible ladder rung sits on this line.
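Steps 3 and 4 can be sketched in a few lines once the grid has been measured. The version below uses a standard monotone-chain pass to keep the upper convex hull of the (bitrate, quality) points, then reads a resolution off it. The grid values are hypothetical, and `resolution_at` encodes one simple reading of Step 4 (take the highest hull point that fits within the target bitrate); other policies are possible.

```python
from typing import Dict, List, Tuple

Encode = Tuple[float, float, str]  # (bitrate_kbps, quality, resolution label)


def _cross(o: Encode, a: Encode, b: Encode) -> float:
    """Positive when o -> a -> b turns counter-clockwise (a lies under the line o-b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(encodes: List[Encode]) -> List[Encode]:
    """Upper convex hull of the measured points, left to right."""
    best: Dict[float, Encode] = {}
    for e in encodes:  # keep the best-scoring encode at each distinct bitrate
        if e[0] not in best or e[1] > best[e[0]][1]:
            best[e[0]] = e
    hull: List[Encode] = []
    for e in sorted(best.values()):
        # Pop while the last hull point lies on or below the line to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], e) >= 0:
            hull.pop()
        hull.append(e)
    if not hull:
        return []
    # Drop any tail where extra bitrate no longer raises quality.
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[: peak + 1]


def resolution_at(hull: List[Encode], bitrate_kbps: float) -> str:
    """Resolution of the highest hull point that fits within the bitrate."""
    affordable = [e for e in hull if e[0] <= bitrate_kbps]
    return (affordable[-1] if affordable else hull[0])[2]


# Hypothetical CRF-style grid: three encodes per resolution, for illustration only.
grid = [
    (350, 74, "360p"), (700, 84, "360p"), (1400, 88, "360p"),
    (500, 76, "540p"), (1000, 90, "540p"), (2000, 93, "540p"),
    (900, 80, "720p"), (1800, 94, "720p"), (3600, 96.5, "720p"),
    (1600, 85, "1080p"), (3200, 97, "1080p"), (6400, 99, "1080p"),
]
hull = upper_hull(grid)
print([(b, label) for b, _, label in hull])
print(resolution_at(hull, 1500))
```

With this grid, six of the twelve encodes survive, and the hull walks from 360p through 540p and 720p to 1080p as bitrate rises.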
The measurement trap that corrupts the hull
The convex hull is only as honest as the numbers you build it from, and there is one measurement mistake that silently poisons it. Both PSNR and VMAF are full-reference metrics: they compare a distorted frame to the pristine original, pixel against pixel. That comparison only works when the two frames are the same size. But the whole point of the hull is to compare encodes at different resolutions — a 540p encode against a 1080p source. The frames are not the same size, so the metric cannot run directly.
The fix is a required step, not an optional one: before scoring a lower-resolution encode, upscale the decoded frame back to the source resolution, then compare the upscaled frame to the source. This mirrors what actually happens in the home — a 540p stream is decoded and scaled up to fill a 1080p or 4K screen — so scoring the upscaled frame measures what the viewer truly sees (Fraunhofer FOKUS, 2019). Skip the upscale and you are comparing a small frame to a big one, which either crashes the metric or, worse, produces numbers that look plausible but are meaningless. A hull built on un-upscaled scores is not a weaker hull; it is a wrong one.
The upscaling filter is itself a decision that moves the hull. Netflix recommends bicubic upsampling for this measurement, which is also FFmpeg's default; Lanczos is another common choice (Netflix Technology Blog; Fraunhofer FOKUS, 2019). The choice is not cosmetic: one study found the 1080p-versus-lower-resolution crossover sitting near 320 kbps with one upscaler and near 1220 kbps with another on the same content (Cross-resolution VMAF analysis, arXiv, 2024). A different filter shifts the crossover bitrates, which shifts which resolution wins in a band, which changes the ladder. Pick one upscaler, write it down, and use it for every encode you compare — apples to apples or nothing.
A minimal, current-style FFmpeg measurement of a 540p encode against a 1080p source, upscaling first, looks like this:
```bash
# Decode the 540p encode, upscale to the 1080p source size with bicubic,
# then score VMAF against the original. State the model and the scaler.
ffmpeg -i encode_540p.mp4 -i source_1080p.y4m \
  -lavfi "[0:v]scale=1920:1080:flags=bicubic[up];[up][1:v]libvmaf=model=version=vmaf_v0.6.1" \
  -f null -
```
The libvmaf filter syntax has changed across FFmpeg releases — confirm the exact form for your version — but the principle is fixed: scale the smaller frame up to the source size, name the model, then score. The deeper tooling walk-through lives in measuring quality with FFmpeg and libvmaf.
Figure 4. The required step before scoring across resolutions. Decode the lower-resolution encode, upscale it to the source size with a fixed filter (bicubic by default), then run the full-reference metric against the source. Scoring frames of different sizes directly produces meaningless numbers and a corrupted hull.
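One way to keep the upscaler and model constant across a whole grid is to generate every measurement command from a single place. The sketch below only builds the argument list; it does not run FFmpeg. It reuses the filter form from the command above, so the same caveat applies: confirm the libvmaf syntax for your FFmpeg version. The function name and constants are hypothetical.

```python
from typing import List

SCALER = "bicubic"          # one upscaler, fixed for every encode in the grid
VMAF_MODEL = "vmaf_v0.6.1"  # the model named in the command above


def vmaf_command(encode_path: str, source_path: str, src_w: int, src_h: int) -> List[str]:
    """Argument list that upscales the encode to the source size, then scores VMAF."""
    graph = (
        f"[0:v]scale={src_w}:{src_h}:flags={SCALER}[up];"
        f"[up][1:v]libvmaf=model=version={VMAF_MODEL}"
    )
    return ["ffmpeg", "-i", encode_path, "-i", source_path, "-lavfi", graph, "-f", "null", "-"]


cmd = vmaf_command("encode_540p.mp4", "source_1080p.y4m", 1920, 1080)
print(" ".join(cmd))
```

Because the scaler and model live in two constants, changing either one changes every measurement together, which is the apples-to-apples property the hull depends on.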
A worked example: reading a ladder off the hull
Numbers make it concrete. Suppose you have measured one asset at four resolutions and, after upscaling each encode to the 1080p source and scoring VMAF, you have these points (bitrates in kbps, VMAF on the default 1080p model):
| Bitrate (kbps) | 360p | 540p | 720p | 1080p | Hull winner |
|---|---|---|---|---|---|
| 400 | 78 | 74 | 68 | 60 | 360p (78) |
| 800 | 86 | 88 | 84 | 76 | 540p (88) |
| 1500 | 90 | 93 | 94 | 90 | 720p (94) |
| 3000 | 92 | 95 | 96.5 | 97 | 1080p (97) |
| 6000 | 92.5 | 96 | 97.5 | 99 | 1080p (99) |
Read the "Hull winner" column top to bottom and you can see the hull walk up the resolutions: 360p owns the lowest rung, 540p the next, 720p the middle, and 1080p the top two. The crossovers are visible too: 360p hands off to 540p somewhere between 400 and 800 kbps, 540p hands off to 720p between 800 and 1500 kbps, and 720p hands off to 1080p between 1500 and 3000 kbps. If you now want a four-rung ladder, you do not pick resolutions by habit; you read them off the hull: 400 kbps → 360p, 800 kbps → 540p, 1500 kbps → 720p, 3000 kbps → 1080p. Notice the 3000 kbps rung: 1080p (97) beats 720p (96.5) by only half a VMAF point, so for a title where storage or compute is tight you might keep 720p there and lose almost nothing. The hull shows you the trade; it does not make it for you.
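The table is small enough to check by machine. This sketch copies the table's values, picks the winner at each bitrate, and reports how far ahead of the runner-up it is, which is the number to look at before deciding whether a step up in resolution is worth its cost. The helper names are hypothetical.

```python
from typing import Dict

# bitrate_kbps -> {resolution: VMAF}, copied from the table above
TABLE: Dict[int, Dict[str, float]] = {
    400:  {"360p": 78,   "540p": 74, "720p": 68,   "1080p": 60},
    800:  {"360p": 86,   "540p": 88, "720p": 84,   "1080p": 76},
    1500: {"360p": 90,   "540p": 93, "720p": 94,   "1080p": 90},
    3000: {"360p": 92,   "540p": 95, "720p": 96.5, "1080p": 97},
    6000: {"360p": 92.5, "540p": 96, "720p": 97.5, "1080p": 99},
}


def winner(scores: Dict[str, float]) -> str:
    """Resolution with the highest score at this bitrate."""
    return max(scores, key=scores.get)


def margin(scores: Dict[str, float]) -> float:
    """Lead of the winner over the runner-up, in VMAF points."""
    top, second = sorted(scores.values(), reverse=True)[:2]
    return top - second


winners = {bitrate: winner(scores) for bitrate, scores in TABLE.items()}
margins = {bitrate: margin(scores) for bitrate, scores in TABLE.items()}
for bitrate in TABLE:
    print(f"{bitrate} kbps: {winners[bitrate]} (lead {margins[bitrate]} VMAF)")
```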
The companion convex-hull plotter does exactly this from your own measurements: feed it a grid of (resolution, bitrate, quality) points and it computes the hull, marks the crossover bitrates, draws the rate-quality plot with the hull overlaid as an SVG, and — given two hulls — computes the BD-rate between them. It reproduces this table in its --demo.
Why it is a convex hull: the rate-distortion view
The plain-language version above is enough to build a ladder. For the reader who wants the layer underneath, here is why the winning frontier is convex and not just any upper edge — and why the word "convex" is doing real work.
Compression lives in rate-distortion space: every encode is a trade between rate (bits) and distortion (how far the result is from the original — the inverse of quality). Decades of coding theory frame the encoder's job as minimizing a combined cost, written J = D + λR — distortion plus a weight λ times rate (Sullivan and Wiegand, Rate-Distortion Optimization for Video Compression, IEEE Signal Processing Magazine, 1998). The multiplier λ (lambda) sets the exchange rate between bits and quality: a small λ says "bits are cheap, chase quality," a large λ says "bits are precious, economize." Sweep λ and you trace out operating points, and the set of points reachable this way is precisely the convex hull of all achievable (rate, distortion) points.
This is the punchline that justifies the name. In rate-distortion space lower is better, because the vertical axis is distortion rather than quality, so the picture is flipped relative to the rate-quality plots above and the hull is now the lower boundary. Optimizing the D + λR cost can only ever land you on that convex hull: points that dip below it (a better-than-hull trade) do not exist, and points above it (a dominated, wasteful trade) are never chosen because some λ gives you a strictly better one. The hull is the frontier of the physically possible. A useful consequence falls out of it: at the optimal ladder, every rung sits where the hull's local slope matches its λ, so the best rungs share a common "bang per bit." Netflix's per-shot optimizer uses exactly this: it picks shot encodes of approximately equal slope in rate-distortion space and stitches them into a whole-video encode along a trellis (Netflix Technology Blog, Dynamic Optimizer, 2018). You do not need the calculus to build a ladder, but it explains why the hull, and only the hull, is where the good encodes live.
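The claim that sweeping λ only ever lands on the hull is easy to check numerically. The sketch below takes the five hull winners from the worked example, adds one hypothetical point, (1000, 89), that is not dominated but sits just under the line between its neighbours, and minimizes J = D + λR over a range of λ. Treating distortion as 100 minus VMAF is an assumption made for illustration; it is not how an encoder measures distortion internally.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]  # (bitrate_kbps, VMAF)

HULL_POINTS: List[Point] = [(400, 78), (800, 88), (1500, 94), (3000, 97), (6000, 99)]
OFF_HULL: Point = (1000, 89)  # hypothetical: not dominated, but below the hull line
CANDIDATES: List[Point] = HULL_POINTS + [OFF_HULL]


def best_for_lambda(points: List[Point], lam: float) -> Point:
    """Operating point that minimizes J = D + lam * R, with D = 100 - VMAF (assumed)."""
    return min(points, key=lambda p: (100 - p[1]) + lam * p[0])


# Sweep lambda over several orders of magnitude, from 1e-5 up to about 0.8.
lambdas = [10 ** (k / 10) for k in range(-50, 0)]
chosen: Set[Point] = {best_for_lambda(CANDIDATES, lam) for lam in lambdas}
print(sorted(chosen))
```

Every hull point is selected for some λ, and the off-hull point never is: choosing it would require λ to be at most 0.005 (to beat the 800 kbps point) and at least 0.01 (to beat the 1500 kbps point) at once.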
The hull is the math behind per-title encoding
The convex hull and per-title encoding are the same idea seen from two angles. Per-title encoding says: stop using one fixed ladder for every video; give each title the ladder its content actually needs. The convex hull is how you find that ladder. Building the per-title ladder is exactly Steps 1–4 above — encode a grid, measure it, take the hull, read the rungs — done once per asset. The hull is where "this title needs these resolutions at these bitrates" comes from.
Per-shot encoding pushes the same machinery one level finer. A two-hour film is not uniformly hard: a dark dialogue scene and a confetti explosion want different resolutions at the same bitrate. Per-shot encoding computes a convex hull for each shot — each run of similar frames — and then combines the per-shot choices into a whole-title encode, again using the equal-slope rule to decide how many bits each shot deserves (Netflix Technology Blog, Dynamic Optimizer, 2018). The payoff is real: Netflix reported around 25% BD-rate savings for multi-shot videos from shot-based optimization, and roughly 28% (x264), 34% (x265), and 38% (VP9) bitrate savings at equal VMAF versus fixed-quality encoding (Katsavounidis and Guo, SPIE, 2018). The cost is that per-shot multiplies the number of encodes by two orders of magnitude — you are building hundreds of little hulls instead of one. The codec-side mechanics of how those encodes are produced and assembled into a ladder live in the Video Encoding section; this article stays on the measurement side — how the hull is found and read.
The hull and BD-rate: the same curve, two uses
The convex hull is also the curve that BD-rate is computed on, which is why this article and BD-rate explained are companions. BD-rate (Bjøntegaard Delta rate) is the standard way to say "codec A needs X% less bitrate than codec B for the same quality." It is computed by taking each codec's rate-quality curve (its convex hull), interpolating between the measured points with bitrate on a logarithmic scale (the original method fits a cubic polynomial), and measuring the average horizontal gap between the two curves across the quality range they share (Bjøntegaard, Calculation of average PSNR differences between RD-curves, ITU-T VCEG-M33, 2001). The gap, converted back from the logarithmic scale and expressed as a percentage of bitrate, is the BD-rate.
Two things follow. First, BD-rate is only as trustworthy as the hulls it compares: if one hull was built with un-upscaled scores or a different upscaler, the BD-rate between them is fiction. Second, BD-rate is a saving at equal quality, not a quality score — keep it distinct from VMAF or PSNR, which are quality scores. The convex hull is the shared foundation: build it correctly once, and you can both read a ladder off it and compare two of them with BD-rate. The rate-quality curve that underlies both is treated as its own visualization topic in visualizing quality.
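To show the shape of the calculation, here is a simplified BD-rate sketch. It is not the reference method: Bjøntegaard's calculation fits a cubic polynomial to log-bitrate as a function of quality, whereas this version interpolates linearly between measured points and averages sampled gaps, which keeps it dependency-free. The function names are hypothetical, and the curves reuse the worked example's hull with a made-up second codec that needs 20% less bitrate everywhere.

```python
import math
from typing import List, Tuple

Curve = List[Tuple[float, float]]  # (bitrate_kbps, quality), quality strictly increasing


def _log_rate_at(curve: Curve, q: float) -> float:
    """log(bitrate) needed to reach quality q, by linear interpolation."""
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q0 <= q <= q1:
            t = (q - q0) / (q1 - q0)
            return math.log(r0) + t * (math.log(r1) - math.log(r0))
    raise ValueError("quality outside the curve's range")


def bd_rate(anchor: Curve, test: Curve, samples: int = 200) -> float:
    """Average bitrate change (%) of test versus anchor over the shared quality range."""
    lo = max(anchor[0][1], test[0][1])
    hi = min(anchor[-1][1], test[-1][1])
    # Evenly spaced quality levels, clamped so rounding cannot step past hi.
    qs = [min(hi, lo + (hi - lo) * i / samples) for i in range(samples + 1)]
    mean_gap = sum(_log_rate_at(test, q) - _log_rate_at(anchor, q) for q in qs) / len(qs)
    return (math.exp(mean_gap) - 1.0) * 100.0


anchor = [(400, 78), (800, 88), (1500, 94), (3000, 97), (6000, 99)]
test = [(rate * 0.8, quality) for rate, quality in anchor]  # hypothetical second codec
result = bd_rate(anchor, test)
print(f"BD-rate: {result:.1f}%")
```

A negative result means the test curve reaches the same quality with less bitrate. Averaging in the logarithmic domain is what makes a uniform 20% saving come out as 20% regardless of where on the curve it is measured.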
Common mistakes that corrupt a convex hull
The hull is a precise object, and a handful of mistakes quietly break it. Naming them is the difference between a ladder you can trust and one that looks principled but is not.
Scoring across resolutions without upscaling. The trap from the measurement section, repeated because it is the most common and most damaging. A full-reference metric needs both frames at the same size; compare a 540p encode to a 1080p source without upscaling first and the numbers are meaningless. Every cross-resolution point on the hull must come from an upscaled-to-source comparison.
Changing the upscaler mid-grid. The hull is only valid if every point was measured the same way. Mixing bicubic for some encodes and Lanczos for others shifts crossover bitrates between points that should be comparable. Pick one filter and use it throughout.
Treating a VMAF hull and a PSNR hull as interchangeable. The hull's shape depends on the metric. PSNR rewards pixel fidelity; VMAF rewards perceived quality; they crown different resolutions at the same bitrate. And a VMAF hull depends on which VMAF model you used — the default television model, the phone model, the 4K model all draw different hulls because they weight detail differently. State the metric and model with every hull, and match the model to how the content is watched (see VMAF in depth). Remember what these metrics are: both are proxies for the human eye, validated against subjective scores gathered under controlled viewing conditions (ITU-R BT.500-15, 2023). A convex hull is therefore a model of what viewers perceive, not perception itself — where a hull and a careful viewing disagree, the viewing is the ground truth and the metric is the proxy that missed on that content.
Trusting a hull built on mean scores. Each point on the hull is usually a mean quality over all frames, and a mean can hide a bad stretch. A hull that says "540p at 800 kbps scores VMAF 88" might be averaging a clean talking-head scene with a smeared action beat. Check the low percentile, not just the mean, before you commit a rung to the ladder (see pooling per-frame scores).
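The gap between a mean and a low percentile is easy to see on synthetic numbers. In this sketch the per-frame scores are invented so that the mean comes out at 88: ninety clean frames and ten badly smeared ones. The nearest-rank percentile used here is one common definition; other interpolation rules give slightly different values.

```python
import math
import statistics
from typing import List


def low_percentile(scores: List[float], pct: float = 5.0) -> float:
    """Nearest-rank percentile: the score below which about pct% of frames fall."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic per-frame VMAF: 90 clean frames and 10 smeared ones (illustration only).
frame_scores = [95.0] * 90 + [25.0] * 10
mean_score = statistics.mean(frame_scores)
p5_score = low_percentile(frame_scores, 5)
print(f"mean {mean_score:.1f}, 5th percentile {p5_score:.1f}")
```

The mean reports a respectable 88 while the 5th percentile sits at 25, which is the stretch a viewer would actually notice.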
Forgetting the hull is per-content and dated. There is no universal convex hull. The hull is specific to this asset, this codec, this encoder version, and these settings; a new encoder release or a different codec redraws it. A hull is a measurement with a date and a configuration attached, not a constant — which is also why the broader catalogue of where objective metrics lie applies to every point on it.
The cost of the hull, and how teams cut it
The honest weakness of the convex hull is its price. Building it the brute-force way means encoding the same content dozens or hundreds of times — every resolution at every quality point — then measuring all of them. For a large catalogue, or for per-shot where the count multiplies again, that is a serious compute bill. It is affordable for a title watched tens of millions of times and hard to justify for a long-tail asset watched a handful of times; the trade is encoding compute now against bandwidth saved over the title's lifetime of views.
This is an active research area, and the direction is clear: predict the hull instead of measuring all of it. Machine-learning methods estimate a content's convex hull from features of the video, skipping most of the brute-force encodes with little loss in rate-quality performance; a 2025 survey catalogues the family of convex-hull-prediction methods now in use (Telili et al., Convex Hull Prediction Methods for Bitrate Ladder Construction, ACM TOMM, 2025). A lighter, dependency-free version of the same instinct — reduce the grid, keep the points most likely to land on the hull — is what production teams reach for first. The hull stays the goal; what changes is how much encoding you spend to find it.
Where Fora Soft fits in
Fora Soft has built video software since 2005 — streaming and OTT, video conferencing, e-learning, telemedicine, and surveillance — and when a client's bandwidth bill or picture quality is the problem, the convex hull is one of the first things we measure. We build it the disciplined way: a measurement grid per asset, every lower-resolution encode upscaled to the source before scoring, one named upscaler and one named VMAF model held constant, and the low percentile checked before a rung goes on the ladder. For an OTT or e-learning catalogue that turns into bandwidth and storage saved; for a live conferencing or surveillance product, where there is no pristine reference to score against, the same thinking moves to no-reference methods and a different part of the pipeline. Our benchmark methodology applies the same measure-then-decide discipline to our own codec tests, so the ladders we recommend rest on hulls we actually measured, dated and reproducible.
What to read next
- Per-Title and Per-Shot Encoding, and How Quality Drives Them
- BD-Rate Explained, With Our Numbers
- Setting a Quality Target and a Quality Budget
References
- Netflix Technology Blog (A. Aaron, Z. Li, A. Norkin, et al.), "Per-Title Encode Optimization," 2015 (with the accompanying IEEE paper, 2016). Tier 4 (vendor engineering blog — the work that introduced per-title encoding and the convex-hull-of-resolutions method to streaming). Source of the brute-force grid (encode each source at hundreds of resolution/bitrate combinations), the convex-hull construction, and the ~20% per-title saving. Basis for the build procedure and the per-title connection. https://netflixtechblog.com/per-title-encode-optimization-7e99442b62a2
- Netflix Technology Blog / I. Katsavounidis, "Dynamic optimizer — a perceptual video encoding optimization framework," 2018. Tier 4 (vendor engineering blog — the defining work for per-shot/shot-based convex-hull optimization). Source of the per-shot convex hull, the equal-slope/Trellis combination of shot encodes, the ~25% BD-rate saving for multi-shot videos, and the two-orders-of-magnitude encode-count cost. Basis for the per-shot and rate-distortion sections. https://netflixtechblog.com/dynamic-optimizer-a-perceptual-video-encoding-optimization-framework-e19f1e3a277f
- I. Katsavounidis and L. Guo, "Video codec comparison using the dynamic optimizer framework," SPIE Applications of Digital Image Processing XLI, 2018 (see also arXiv:1808.03898). Tier 5 (peer-reviewed). Source of the per-shot bitrate-saving figures at equal VMAF: ~28% (x264), ~34% (x265), ~38% (VP9). Basis for the per-shot savings numbers. https://arxiv.org/pdf/1808.03898
- G. Bjøntegaard, "Calculation of average PSNR differences between RD-curves," ITU-T VCEG document VCEG-M33, 2001. Tier 1 (metric-author defining work). Defines BD-rate as the average bitrate difference between two rate-distortion (convex-hull) curves at equal quality, computed by interpolating the curves and measuring the area between them. Basis for the BD-rate section. https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/VCEG-M33.doc
- VMAF — Video Multi-method Assessment Fusion (repository and documentation), Netflix (Z. Li, C. Bampis, et al.), with the Netflix Tech Blog VMAF posts. Tier 1 (metric-author defining work). The perceptual full-reference metric used to score the grid and build the hull; source of the model families (default / phone / 4K) and the bicubic-upsampling recommendation for cross-resolution scoring. Basis for the metric choice and the model caveat. https://github.com/Netflix/vmaf
- G. J. Sullivan and T. Wiegand, "Rate-Distortion Optimization for Video Compression," IEEE Signal Processing Magazine, 1998. Tier 5 (peer-reviewed, foundational). Source of the J = D + λR Lagrangian formulation and the result that rate-distortion optimization reaches only points on the convex hull. Basis for the rate-distortion / "why convex" section. https://ieeexplore.ieee.org/document/733497
- Fraunhofer FOKUS Video-Dev (D. Silhavy), "Per-Title Encoding," 2019. Tier 6 (institutional engineering blog). The clearest public FFmpeg reproduction of the Netflix method: the 7-resolution × 12-CRF grid (84 encodes), the mandatory upscale-to-source step before PSNR/VMAF, the observation that lower resolutions beat 1080p below ~500 kbps, and the convex-hull plot. Orientation for the build steps and the measurement trap. https://websites.fraunhofer.de/video-dev/per-title-encoding/
- A. Telili, W. Hamidouche, et al., "Convex Hull Prediction Methods for Bitrate Ladder Construction: Design, Evaluation, and Comparison," ACM Transactions on Multimedia Computing, Communications, and Applications, 2025 (arXiv:2310.15163). Tier 5 (peer-reviewed survey). Source of the current state of machine-learning convex-hull prediction that avoids exhaustive brute-force encoding. Basis for the "cost of the hull" section. https://arxiv.org/html/2310.15163
- "Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder Estimation," arXiv, 2024. Tier 5 (peer-reviewed). Source of the worked crossover example (540p-to-1080p crossover near 2.0 Mbps for the "Dolls" sequence) and the per-title resolution-prediction framing. Basis for the concrete crossover figure. https://arxiv.org/html/2401.04405v1
- M. de Berg, O. Cheong, M. van Kreveld, M. Overmars, "Computational Geometry: Algorithms and Applications," 3rd ed., Springer, 2008. Tier 5 (standard textbook). Source of the convex-hull definition (smallest convex set containing a point set) and the convexity/minimality properties. Basis for the "what a convex hull is" section. https://link.springer.com/book/10.1007/978-3-540-77974-2
- Recommendation ITU-R BT.500-15, "Methodologies for the subjective assessment of the quality of television pictures," International Telecommunication Union, 2023. Tier 1 (official standard). The subjective-assessment methodology that PSNR and VMAF are ultimately validated against; cited for the measurement-honest point that a convex hull built from objective scores is a model of perceived quality, with a properly run subjective test as the ground truth. Basis for the "VMAF hull vs PSNR hull" caveat and the proxy framing. https://www.itu.int/rec/R-REC-BT.500


