Why this matters
If you measure video quality, you will eventually try to run a number where it cannot work — asking for a VMAF score on a webcam feed with no master, or comparing two clips that are silently misaligned by a few frames. Both produce confident numbers that mean nothing. This article gives you the one mental check that prevents most of those errors: name your reference setup before you name your metric. It is the foundation for the metric deep-dives in Block 2 and for the hard real-world case of no-reference quality for live and UGC, and it is the difference between a measurement you can defend and a screenshot you can only argue about.
The question that decides everything
All video quality metrics are built to answer "how good does this look?", but they split on a more basic question first: what do you have to compare the impaired video against?
Call the clean, uncompressed source the reference — the pristine original before any encoding, scaling, or transmission touched it. The impaired video is what came out the other end. Quality measurement is, at heart, the gap between those two. The catch is that you do not always have the reference at the place and time you want to measure. In an encoder lab the master file is sitting on disk; on a phone playing a live stream, the master never left the broadcaster's building, and the only video that exists on that phone is the already-compressed one.
How much of the reference you can get your hands on sorts every metric into one of three families. A full-reference metric needs the whole original. A reduced-reference metric needs only a small fingerprint of it. A no-reference metric needs none of it. These are not three flavors of the same tool — they are three different jobs, with different accuracy, different costs, and different places they belong. Get the family wrong and the most accurate metric in the world will hand you nonsense.
Figure 1. The one question that sorts every metric: how much of the original do you still have where you are measuring?
Full-reference: you have the original
A full-reference metric (often written FR) compares the impaired video against the complete pristine original, frame for frame and pixel for pixel. It is the comfortable, accurate case, and it is where the famous metrics live.
Think of it as proofreading a copy against the original document with both pages side by side. You can point to every difference because you can see exactly what the text was supposed to say. PSNR (Peak Signal-to-Noise Ratio, raw pixel error in decibels), SSIM (Structural Similarity Index, a 0-to-1 structural comparison), MS-SSIM (its multi-scale extension), and VMAF (Netflix's perception-trained 0-to-100 metric) are all full-reference. So are the older ITU-standardized models: ITU-T J.144 (2004) and ITU-R BT.1683 (2004) defined full-reference measurement for standard-definition television, and ITU-T J.247 (2008) did the same for multimedia. Each needs the master to compute a score.
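To make "raw pixel error in decibels" concrete, here is a minimal sketch of PSNR computed as 10·log10(peak² / MSE). It assumes 8-bit samples (peak 255) and frames given as nested lists of luma values; the function name `psnr` and the sample pixel values are illustrative, not taken from any particular tool.

```python
import math


def psnr(reference, impaired, peak=255):
    """PSNR in dB between two equally sized frames given as 2-D lists of pixel values."""
    if len(reference) != len(impaired) or any(
        len(r) != len(d) for r, d in zip(reference, impaired)
    ):
        raise ValueError("frames must have the same dimensions")
    squared_error = sum(
        (r - d) ** 2
        for ref_row, imp_row in zip(reference, impaired)
        for r, d in zip(ref_row, imp_row)
    )
    pixel_count = sum(len(row) for row in reference)
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical frames: there is no error to measure
    return 10 * math.log10(peak ** 2 / mse)


# Illustrative 2x2 frames: the impaired copy is off by 1 to 3 levels per pixel.
reference = [[100, 110], [120, 130]]
impaired = [[101, 108], [121, 133]]
print(f"{psnr(reference, impaired):.2f} dB")
```

Note that the function cannot even be called without the `reference` argument, which is the full-reference requirement in its simplest form.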
Because it can see exactly what was lost, a full-reference metric is the most accurate family — it generally correlates best with what humans rate. That is why it owns the lab. You reach for full-reference whenever the master is on disk: comparing encoders, tuning settings, building a regression suite, running a quality gate in CI/CD, or scoring a video-on-demand library before it ships.
Full-reference's hidden catch: alignment
There is a requirement that newcomers skip and then spend a day debugging. A full-reference metric assumes the two videos are aligned: the same frame in each, at the same resolution, with the same brightness scaling. Compare frame 100 of the impaired video against frame 98 of the original — a tiny temporal shift — and the metric reports a quality collapse that the eye would never see. The same happens with a one-pixel spatial shift or a small luminance offset.
This is why the people who build these tools treat calibration as a first-class step. The U.S. National Telecommunications and Information Administration (NTIA), whose VQM model became ITU-T J.144, warns that PSNR in particular "is very sensitive to calibration errors," and ITU-T standardized a special exhaustive-search PSNR (ITU-T J.340) precisely to find the ideal spatial shift, temporal shift, and luminance gain before scoring. The rule for you is simple: a full-reference number is only valid once the two videos are aligned. An unaligned full-reference measurement is not a low score, it is a broken one.
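The effect of a temporal offset, and the idea of searching for the best alignment before scoring, can be sketched on a synthetic clip. This is only an illustration of the principle: it searches temporal shifts alone, whereas the text above notes that ITU-T J.340 also covers spatial shift and luminance gain. The function names, the `max_shift` parameter, and the synthetic frames are all assumptions made for this example.

```python
import math


def frame_mse(a, b):
    """Mean squared error between two frames given as flat lists of pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def sequence_psnr(reference, impaired, shift, peak=255):
    """PSNR over all frames where impaired[i] is compared with reference[i + shift]."""
    pairs = [
        (reference[i + shift], impaired[i])
        for i in range(len(impaired))
        if 0 <= i + shift < len(reference)
    ]
    if not pairs:
        raise ValueError("shift leaves no overlapping frames")
    mse = sum(frame_mse(r, d) for r, d in pairs) / len(pairs)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def best_temporal_shift(reference, impaired, max_shift=3):
    """Try every shift in [-max_shift, max_shift] and keep the one with the highest PSNR."""
    return max(
        range(-max_shift, max_shift + 1),
        key=lambda s: sequence_psnr(reference, impaired, s),
    )


# Synthetic clip: 12 flat frames of 4 pixels whose brightness rises by 10 per frame.
reference = [[20 + 10 * i] * 4 for i in range(12)]
# The "impaired" clip lost its first 2 frames, and every pixel is off by 1 level.
impaired = [[p + 1 for p in frame] for frame in reference[2:]]

naive = sequence_psnr(reference, impaired, shift=0)
shift = best_temporal_shift(reference, impaired)
aligned = sequence_psnr(reference, impaired, shift)
print(f"unaligned PSNR: {naive:.1f} dB")
print(f"best shift: {shift} frames, aligned PSNR: {aligned:.1f} dB")
```

The impairment itself is tiny (one level per pixel), yet the unaligned comparison scores far lower than the aligned one. The gap comes entirely from comparing the wrong frames, not from the encoder.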
Figure 2. Full-reference assumes alignment. A few frames or pixels of offset turn a valid score into a meaningless one.
Reduced-reference: you have a fingerprint
A reduced-reference metric (RR) is the middle ground. It does not carry the whole original — only a compact set of features extracted from it, a few numbers small enough to travel alongside the stream through a side channel. At the far end, the metric extracts the same features from the impaired video and compares the two fingerprints.
The analogy is a tamper-evident seal. You do not ship the original document to the recipient; you ship a short checksum of its key properties, and the recipient checks their copy against that. You lose the ability to see every difference, but you gain the ability to measure quality somewhere the full original could never reach.
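A toy version of the fingerprint idea might look like the sketch below. The two features used here (mean brightness and a crude horizontal edge energy) are stand-ins chosen for brevity; they are not the features of any standardized reduced-reference model, and the function names are hypothetical.

```python
def fingerprint(frame):
    """Reduce a frame (2-D list of luma values) to two numbers: mean level and edge energy."""
    pixels = [p for row in frame for p in row]
    mean_level = sum(pixels) / len(pixels)
    # Edge energy: average absolute difference between horizontal neighbours.
    diffs = [abs(row[x + 1] - row[x]) for row in frame for x in range(len(row) - 1)]
    edge_energy = sum(diffs) / len(diffs)
    return (mean_level, edge_energy)


def fingerprint_distance(sent, measured):
    """Sum of absolute feature differences; 0 means the fingerprints match exactly."""
    return sum(abs(a - b) for a, b in zip(sent, measured))


# Sender side: extract the fingerprint from the pristine frame and ship only that.
original = [[10, 200, 10, 200], [200, 10, 200, 10]]
sent = fingerprint(original)

# Receiver side: a softened copy keeps the average brightness but loses edge contrast.
blurred = [[60, 150, 60, 150], [150, 60, 150, 60]]
print(fingerprint_distance(sent, fingerprint(original)))  # 0.0
print(fingerprint_distance(sent, fingerprint(blurred)))   # 100.0
```

Only the two numbers in `sent` cross the side channel, not the eight pixels of the frame. The receiver can tell that edge contrast was lost, but it cannot say which pixels changed.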
This idea has a serious pedigree. The NTIA invented a reduced-reference paradigm built on low-bandwidth perceptual features — edges and motion energy — that could be communicated through a broadcast network. It was standardized as the Fast Low Bandwidth Model in ITU-T J.249 (2010), with ITU-T J.246 (2008) defining reduced-reference measurement for multimedia over cable and ITU-T J.143 (2000) setting out the user requirements for in-service perceptual measurement. The promise was monitoring quality in service — at a set-top box or a network probe — where you will never have the master but can afford to receive a trickle of reference features.
Why reduced-reference is rare in practice
If it sounds clever, it is — and you will still almost never use it. The reason is logistical, and the NTIA states it plainly: "most companies are not able to connect the measurement point with the originating video supply," yet that connection is exactly what a reduced-reference deployment needs to deliver the reference features to the far end. Building and maintaining that side channel rarely pays off. In practice, teams use full-reference metrics in the lab where the master is local, and no-reference metrics in production where it is not — and the middle is squeezed out. The NTIA itself notes that all current implementations of its low-bandwidth model run in full-reference mode, and it has discontinued reduced-reference research. Reduced-reference matters to understand the taxonomy and to read older broadcast literature; it is not where you will spend your time.
No-reference: you have only the impaired video
A no-reference metric (NR), also called blind, gets only the impaired video and must judge its quality with nothing to compare against. It is the human ability to glance at a photo and say "that looks compressed" without ever seeing the original — turned into an algorithm.
This is the hard case, and it is also the most common one in the real world. Live broadcast, a video call, a security camera, a clip a user just uploaded: none of them has a pristine master, because the camera feed is already the only version that exists. If you want to measure quality there, no-reference is not the convenient option, it is the only option. That is why the articles on no-reference quality for live and UGC and on monitoring quality in production lean on it entirely.
No-reference metrics come in three broad styles, and the differences matter when you choose one.
The first style is natural-scene-statistics metrics. They model what undistorted images statistically look like, then measure how far the impaired video has drifted from that model. NIQE (Natural Image Quality Evaluator) and BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator) are the classic examples. Both are fast and cheap enough for real-time use. NIQE is built without any human-rated training data, whereas BRISQUE is trained on human opinion scores.
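Both NIQE and BRISQUE start from the same building block: mean-subtracted, contrast-normalized (MSCN) coefficients of the luma image, whose distribution is then compared with what natural images produce. The sketch below computes only that first step, and it simplifies: it uses a plain square window where the published methods use a Gaussian-weighted one, and it stops before any statistical model is fitted. The function names and the tiny test images are assumptions for illustration.

```python
import math


def mscn(image, radius=1, c=1.0):
    """MSCN coefficients: (pixel - local mean) / (local std + c), square local window."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            std = math.sqrt(sum((p - mean) ** 2 for p in window) / len(window))
            row.append((image[y][x] - mean) / (std + c))
        out.append(row)
    return out


def peak_magnitude(coeffs):
    """Largest absolute coefficient, used here as a crude one-number summary."""
    return max(abs(v) for row in coeffs for v in row)


textured = [[10, 50, 10, 50], [50, 10, 50, 10], [10, 50, 10, 50], [50, 10, 50, 10]]
flat = [[30] * 4 for _ in range(4)]  # all detail removed, e.g. by extreme blur

print(f"textured: {peak_magnitude(mscn(textured)):.2f}")
print(f"flat:     {peak_magnitude(mscn(flat)):.2f}")
```

A real metric summarizes the full distribution of these coefficients rather than a single maximum, but the principle is the same: the statistics of the impaired picture alone carry a signal, with no reference involved.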
The second style is learned metrics — deep neural networks trained on large sets of videos with human ratings. This is the active frontier: recent work includes a no-reference version of VMAF that uses a convolutional network to predict the full-reference VMAF score from pixels alone, and Google's UVQ for user-generated content. These close much of the accuracy gap with full-reference on the content they were trained on, at higher compute cost.
The third style does not look at pixels at all. Bitstream and parametric metrics read the encoded stream's metadata — bitrate, resolution, codec, frame types, quantization — and predict quality from that. ITU-T P.1203 (2017) predicts the quality of a streaming session this way, and ITU-T P.1204.3 (2020) is a bitstream-based no-reference model for resolutions up to 4K. Because they skip decoding, they are light enough to run on every stream in a content delivery network.
The honest caveat is accuracy. With no original to anchor against, a no-reference metric is the hardest to trust: its accuracy varies widely by content type, and a clean-looking but low-quality source can fool it. Treat a no-reference score as a useful signal for trends and alerting, not as ground truth — and never compare no-reference scores across very different content or devices as if they shared one scale.
A second axis: pixels vs the bitstream
The full/reduced/no-reference split is about how much of the reference you have. A second, independent axis is about what the metric reads: the decoded pixels, or the encoded bitstream. Most classic metrics (PSNR, SSIM, VMAF) are pixel-based — they decode the video and look at the picture. Bitstream and parametric models read the compressed file's parameters instead, and hybrid models do both.
The ITU-T P.1204 (2020) family is the cleanest illustration, because it covers three different combinations of the two axes: P.1204.3 is a bitstream-based no-reference model, P.1204.4 is a pixel-based full-reference/reduced-reference model, and P.1204.5 is a hybrid no-reference model that combines metadata and pixels. Keeping the two axes separate stops a common confusion: "no-reference" does not mean "low-tech." A hybrid no-reference model can be sophisticated and accurate; it simply does its work without the master.
Figure 3. Two axes, not one. Reference availability is separate from whether a metric reads pixels or the bitstream.
How much accuracy you trade
The three setups form a ladder. With everything to compare against, full-reference metrics generally track human opinion best; with a fingerprint, reduced-reference gives up some accuracy for reach; with nothing, no-reference is the hardest to get right. Quality-assessment surveys consistently report this ordering, and standards bodies measure it directly: an objective metric is graded by how well its scores correlate with human Mean Opinion Scores, using the Pearson Correlation Coefficient (PCC) and Spearman rank correlation, per ITU-T P.1401 (2020).
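The two correlation measures named above are simple to compute. The sketch below implements Pearson correlation and Spearman rank correlation (Pearson applied to ranks, with tied values sharing an average rank) using only the standard library. The scores and MOS values are made-up numbers for illustration, not results from any test database, and the full P.1401 procedure covers more than this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


# Made-up numbers for illustration only: metric scores and MOS for six clips.
metric_scores = [35.0, 52.0, 61.0, 74.0, 88.0, 95.0]
mos = [1.8, 2.6, 3.1, 3.5, 4.4, 4.3]
print(f"PCC   = {pearson(metric_scores, mos):.3f}")
print(f"SROCC = {spearman(metric_scores, mos):.3f}")
```

Pearson rewards a linear relationship between scores and MOS, while Spearman only asks whether the metric ranks the clips in the same order as the viewers did. In this toy data the metric swaps the top two clips, which lowers the rank correlation even though the scores are otherwise nearly linear in MOS.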
A worked sense of the numbers helps, with one firm caveat: correlation figures are only comparable on the same test database, so treat these as illustrative of the ordering, not as fixed grades. On modern streaming content, a well-run full-reference VMAF measurement commonly reaches a PCC around 0.95 against MOS. Strong no-reference models trail but have closed much of the gap — the hybrid no-reference ITU-T P.1204.5 model, for example, reported a Pearson correlation near 0.93 in its validation. The lesson is not a leaderboard; it is that you pay for missing reference information in accuracy, and the price has been falling as learned models improve.
| Setup | What it can compare against | Example metrics & standards | What it measures well | Where it lies (blind spot) | Typical use |
|---|---|---|---|---|---|
| Full-reference (FR) | The complete pristine original | PSNR, SSIM, MS-SSIM, VMAF; ITU-T J.144, J.247 | Exact, perception-correlated loss vs the master | Useless with no master; broken if misaligned | Encoder tests, regression suites, VOD QC, CI/CD gates |
| Reduced-reference (RR) | A compact feature fingerprint of the original | NTIA Fast Low Bandwidth Model; ITU-T J.246, J.249 | In-service quality where the full master cannot travel | Needs a side channel to the source; rarely deployed | Broadcast/transmission monitoring (mostly historical) |
| No-reference (NR / blind) | Nothing — the impaired video alone | NIQE, BRISQUE, NR-VMAF, UVQ; ITU-T P.1203, P.1204.3 | The only option when no original exists | Hardest to trust; varies by content; foolable | Live, UGC, surveillance, production monitoring |
Table 1. The three setups at a glance. The metric you may use is decided by the reference column — pick the row that matches what you actually have, then read its blind spot.
Common mistake: quoting a full-reference metric where there is no reference. "Our live stream scored VMAF 92" is a category error — VMAF is full-reference, and a live capture has no master to reference. Either you are secretly scoring against a local recording (fine, but say so) or the number is meaningless. Before quoting any score, state the setup: do you have the original, a fingerprint, or nothing? Then quote a metric from that row.
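One way to turn this check into a habit is a small guard in your reporting script. The sketch below is hypothetical: the `METRIC_FAMILY` table copies a few rows from Table 1, and `check_claim` is an illustrative helper, not part of any library.

```python
# Metric families as listed in Table 1 (subset, for illustration).
METRIC_FAMILY = {
    "PSNR": "FR", "SSIM": "FR", "MS-SSIM": "FR", "VMAF": "FR",
    "NIQE": "NR", "BRISQUE": "NR", "UVQ": "NR",
}


def check_claim(metric: str, have_reference: bool) -> str:
    """Flag a full-reference metric that is quoted for content with no reference."""
    family = METRIC_FAMILY[metric]
    if family == "FR" and not have_reference:
        return (
            f"{metric} is full-reference: state what it was scored against, "
            "or pick an NR metric"
        )
    return f"{metric} ({family}) fits this setup"


print(check_claim("VMAF", have_reference=False))  # the live-stream category error
print(check_claim("VMAF", have_reference=True))
print(check_claim("NIQE", have_reference=False))
```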
Common mistake: an unaligned full-reference score. A spatial shift, a few frames of temporal offset, or a brightness change will make PSNR, SSIM, or VMAF report a quality cliff that no viewer would perceive. If a full-reference score looks shockingly bad, suspect alignment before you suspect the encoder. Calibrate first, score second.
How this shows up in your tools
For full-reference work, the tool you will reach for most is FFmpeg, whose libvmaf, ssim, and psnr filters all take two inputs — the impaired video and the reference — because they are full-reference by construction. A minimal current invocation looks like this:
```bash
# Full-reference: both inputs required (distorted first, reference second).
# FFmpeg with libvmaf enabled; computes VMAF, and also SSIM and PSNR in one pass.
ffmpeg -i distorted.mp4 -i reference.mp4 \
  -lavfi "[0:v][1:v]libvmaf=feature=name=psnr|name=float_ssim:log_fmt=json:log_path=out.json" \
  -f null -
```
Notice the shape of the command: it will not run without both files. That is the full-reference contract made concrete. The deep treatment of this tooling — model selection, pooling, and reading the JSON — lives in measuring quality with FFmpeg and libvmaf; the encoder-side quick version is in Video Encoding's FFmpeg cheat sheet. For the no-reference case there is no second input to give: you run a blind metric on the single impaired file, which is the toolbox covered in open-source no-reference metric tools.
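As a first look at the resulting log, a short script can pull out the mean score and the worst frame. This sketch assumes the JSON has a top-level `frames` list whose entries carry `frameNum` and a `metrics` object with a `vmaf` key, which is the layout recent libvmaf versions write as far as we know; check it against your own `out.json` before relying on it. The sample text is hand-written to match that assumed layout and is not real measurement output.

```python
import json


def summarize_vmaf_log(text):
    """Return (mean VMAF, worst-frame VMAF, worst frame number) from a libvmaf JSON log."""
    log = json.loads(text)
    scores = [(f["metrics"]["vmaf"], f["frameNum"]) for f in log["frames"]]
    worst_score, worst_frame = min(scores)
    mean_score = sum(s for s, _ in scores) / len(scores)
    return mean_score, worst_score, worst_frame


# Hand-written sample in the assumed layout, not real measurement output.
sample = """
{"frames": [
  {"frameNum": 0, "metrics": {"vmaf": 94.0}},
  {"frameNum": 1, "metrics": {"vmaf": 71.5}},
  {"frameNum": 2, "metrics": {"vmaf": 92.5}}
]}
"""
mean_score, worst_score, worst_frame = summarize_vmaf_log(sample)
print(f"mean {mean_score:.1f}, worst {worst_score:.1f} at frame {worst_frame}")
```

Looking at the worst frame as well as the mean is a deliberate choice here: a single badly scored frame is often the first hint of the alignment problem described earlier.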
To make the choice fast at your desk, we built a one-page reference-setup cheat sheet: the decision question, the three rows, the metrics that belong to each, and the two pitfalls above. Download the reference-setup cheat sheet (PDF) and keep it next to your encoder.
Where Fora Soft fits in
Fora Soft has built video software since 2005 — streaming, WebRTC conferencing, OTT, e-learning, telemedicine, and surveillance — and these products span both ends of this taxonomy. When we compare encoders or set a per-title quality target, we have the master and use full-reference VMAF, aligned and pooled deliberately. When we monitor a live conferencing or surveillance feed, there is no master, so we measure with no-reference signals and treat them as trend indicators, not absolute truth. The discipline we keep is the one this article argues for: name the reference setup first, then choose the metric — and never let a full-reference number masquerade as a verdict on content that had no reference. Our benchmark methodology documents exactly which setup produced each number.
What to read next
- PSNR explained: the metric everyone starts with
- No-reference quality for live and UGC
- What a metric can and cannot tell you
Call to action
- Talk to a video engineer: book a 30-minute scoping call to talk through your full-reference, reduced-reference, or no-reference measurement plan.
- See our case studies: 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the reference-setup cheat sheet: a one-page decision aid with the question that sorts every metric, the FR/RR/NR rows with their example metrics and tools, and the two common pitfalls (a full-reference metric quoted where there is no master; unaligned full-reference scores).
References
- Recommendation ITU-T P.1401 (01/2020), Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models. International Telecommunication Union. Tier 1. Defines PCC, SROCC, and RMSE and the procedure for validating objective metrics against subjective scores. https://www.itu.int/rec/T-REC-P.1401
- Recommendation ITU-T J.247 (08/2008), Objective perceptual multimedia video quality measurement in the presence of a full reference. International Telecommunication Union. Tier 1. Standardized full-reference multimedia video quality models. https://www.itu.int/rec/T-REC-J.247
- Recommendation ITU-T J.246 (08/2008), Perceptual visual quality measurement techniques for multimedia services over digital cable television networks in the presence of a reduced bandwidth reference. International Telecommunication Union. Tier 1. Defines reduced-reference measurement using low-bandwidth side information. https://www.itu.int/rec/T-REC-J.246
- Recommendation ITU-T J.249 (01/2010), Perceptual video quality measurement techniques for digital cable television in the presence of a reduced reference. International Telecommunication Union. Tier 1. Standardized the NTIA Fast Low Bandwidth reduced-reference model. https://www.itu.int/rec/T-REC-J.249
- Recommendation ITU-T J.144 (03/2004), Objective perceptual video quality measurement techniques for digital cable television in the presence of a full reference, and ITU-R BT.1683 (2004). International Telecommunication Union. Tier 1. Standardized the NTIA General Model (VQM) as a full-reference model. https://www.itu.int/rec/T-REC-J.144
- Recommendation ITU-T P.1204 series (01/2020): P.1204.3 (bitstream, no-reference), P.1204.4 (pixel-based, full-/reduced-reference), P.1204.5 (hybrid, no-reference). International Telecommunication Union. Tier 1. Multi-model standard for video quality assessment up to 4K. https://www.itu.int/rec/T-REC-P.1204
- Recommendation ITU-T P.1203 (10/2017), Parametric bitstream-based quality assessment of progressive download and adaptive audiovisual streaming services over reliable transport. International Telecommunication Union. Tier 1. Bitstream/parametric (no-reference) session-quality model. https://www.itu.int/rec/T-REC-P.1203
- NTIA/ITS, Video Quality Model (VQM) — Frequently Asked Questions, and the NTIA General and Fast Low Bandwidth Models. National Telecommunications and Information Administration, Institute for Telecommunication Sciences. Tier 3 (standards-author tooling). Source for the reduced-reference paradigm, the calibration sensitivity of PSNR, and the in-service deployment limitation of RR metrics. https://its.ntia.gov/research/qoe/video-quality-research/white-papers-obsolete/software/vqm-faq/
- Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004. Tier 1 (metric-author). The defining SSIM paper, a full-reference metric. https://ece.uwaterloo.ca/~z70wang/publications/ssim.html
- Netflix, VMAF documentation and repository (full-reference perceptual metric), accessed 2026-06-22. Tier 3 (first-party / metric-author). VMAF is full-reference by construction (model and reference required). https://github.com/Netflix/vmaf
- FFmpeg, libvmaf, ssim, and psnr filter documentation, accessed 2026-06-22. Tier 3 (first-party tooling). The full-reference filters each take two inputs (distorted + reference). https://ffmpeg.org/ffmpeg-filters.html#libvmaf
- R. R. R. Rao et al., "Bitstream-based Model Standard for 4K/UHD: ITU-T P.1204.3 — Model Details, Evaluation, Analysis and Open-Source Implementation," and the ITU-T P.1204 multi-model evaluation. Tier 5 (peer-reviewed). Reported correlations for the bitstream and hybrid models. https://github.com/Telecommunication-Telemedia-Assessment/bitstream_based_models


