Learning course · Updated June 2026

Video quality, measured: PSNR, SSIM, VMAF & benchmarks

How delivered video quality is actually measured — the discipline one step downstream of encoding and streaming. PSNR, SSIM, and VMAF from first principles, subjective testing that survives scrutiny, a labelled artifact gallery, production QC gates, streaming QoE, and Fora Soft's own benchmarks on real content. A vendor-neutral, measurement-honest course from Fora Soft engineers.

Every metric has a blind spot, and we name it. Every claim is tied to a named source and year — the SSIM paper (Wang et al., 2004), the Netflix VMAF docs, ITU-T P.910 / P.808 / P.1204, ITU-R BT.500, and Bjøntegaard's BD-rate. A single number is a summary, not the truth — this course teaches you to read it honestly.

8 chapters · 64 articles · 120+ glossary terms · ~24 hrs total reading

Outcomes

What you'll be able to ship.

Eight blocks that take you from "the picture looks fine to me" to a defensible quality number. By the end, you can choose the right metric, run a subjective test that holds up, diagnose any artifact, gate a pipeline on quality, and read a benchmark without fooling yourself.

Choose the right metric for the job

PSNR, SSIM, MS-SSIM, and VMAF from first principles — what each measures, where each lies, and how to read a score without fooling yourself. Plus VMAF-NEG when the score can be gamed.

Run a subjective test that holds up

MOS and DMOS, the ACR/DCR/PC methodologies and the ITU recommendations (P.910, BT.500, P.808), test design, the statistics, and the failure gallery — so the result survives scrutiny.

Diagnose any artifact on sight

A labelled field guide — blocking, banding, ringing, blur, judder, color and streaming artifacts — and the capstone skill: tracing an artifact back to the pipeline stage that caused it.

Gate a pipeline on quality

Turn measurement into an automated QC gate — quality targets and budgets, per-title and the convex hull, CI/CD gates, regression testing against golden references, and monitoring at scale.

Measure the viewer's experience

Streaming QoE done right — rebuffering ratio, startup time, the switching penalty, player-side metrics, and how picture metrics and delivery metrics combine into one view of experience.

Benchmark and tool up

Read and reproduce a codec benchmark with BD-rate on real content, then wire measurement into your workflow with FFmpeg and libvmaf — commands, a reusable script, and the pitfalls.

Pick a path

Three routes through video quality

The same 64 articles, ordered for what you actually need to do this quarter.

Path A · 8 hrs

Quality from first principles

Why a number beats an opinion, and the metrics that produce it. QoE vs QoS, the reference taxonomy, then PSNR, SSIM, and VMAF explained in full — math once, worked example, blind spots named.

What video quality measurement is, and why it's hard

Subjective vs objective; full / no-reference

PSNR explained — the baseline metric

SSIM explained — structure, not just error

VMAF explained — and where it lies

20 articles

Path B · 9.5 hrs

Test, diagnose, and gate

The practitioner's middle. Run a human-rating test that survives scrutiny, identify any artifact on sight and trace it to its cause, then turn quality into an automated CI gate.

Subjective testing — MOS, ACR/DCR, ITU methods

The artifact gallery — banding, blocking, ringing

Tracing an artifact to its pipeline cause

Automated quality gates in CI/CD

Per-title and the convex hull, quality-first

25 articles

Path C · 7 hrs

Measure delivery, benchmark, tool up

The operator's edge. Measure the viewer's experience, read and reproduce a codec benchmark with BD-rate, and wire the whole thing up with FFmpeg and libvmaf.

Streaming QoE — rebuffering, startup, switching

No-reference quality for live and UGC

Our benchmark methodology

Codec comparison on real content + BD-rate

Measuring with FFmpeg + libvmaf

19 articles

Syllabus

The full course in eight chapters

Every chapter is self-contained. Read in order, or jump straight to the block you need — from why we measure to the tools that do it.

Why Measure Video Quality

Why a number beats an opinion: what video quality measurement (VQM) is and why it's harder than it looks, where quality is lost in a pipeline, subjective vs objective, QoE vs QoS, the full/reduced/no-reference taxonomy, and the border with Video Encoding.

Beginner · 8 articles · ~3 hrs

Objective Metrics

The heart of the course. Each metric from first principles: PSNR, SSIM, MS-SSIM, VMAF (+ models, VMAF-NEG, confidence intervals), P.1204/AVQT, validation against human scores, pooling, the blind-spot catalogue, choosing a metric, and reading a report.

Beginner · 12 articles · ~5 hrs

Subjective Testing

The ground truth — MOS/DMOS and the scales, ACR/DCR/PC and the ITU recommendations (P.910, BT.500), test design and execution, the statistics, crowdsourced testing (P.808), and the failure gallery.

Intermediate · 8 articles · ~3 hrs

The Artifact Gallery

The visual reference: a labelled field guide to blocking, banding, ringing, blur, judder, color artifacts, and streaming-specific artifacts, plus tracing an artifact back to its pipeline cause.

Intermediate · 9 articles · ~3.5 hrs

Production QC

Quality measurement as an automated gate — the QC reference architecture, per-title/per-shot from the quality-target angle, the convex hull, quality budgets, CI/CD gates, regression testing, monitoring at scale, and the stakeholder report.

Intermediate · 8 articles · ~3 hrs

Streaming QoE

Measuring the viewer experience — rebuffering ratio, startup time / TTFF, the bitrate-switching trade-off, player-side metrics, connecting picture metrics to perceived QoE, and no-reference quality for live/UGC.

Intermediate · 7 articles · ~2.5 hrs

Fora Soft Benchmarks

Fora Soft's own measurements on real cases: the methodology, codec comparison (H.264/HEVC/AV1, BD-rate), encoder comparison, per-content-type results.

Advanced · 5 articles · ~2 hrs

Tools

The hands-on block — the tooling landscape, FFmpeg + libvmaf (the workhorse, with commands and a reusable script), VQMT and dedicated tools, commercial suites, open-source no-reference tools, CI integration, and visualization.

Advanced · 7 articles · ~2.5 hrs

Quality, measured on real content

Fora Soft has built video products since 2005 and validates every streaming and encoding decision against quality numbers: PSNR, SSIM, VMAF, and our own benchmarks.

Featured

Where to start.

Hand-picked deep dives — the three metric anchors everyone searches, plus the benchmark work that makes this section worth citing.

Reference

The vocabulary of video quality

120+ terms with crisp, cited definitions, aliases, and links to the deep dives. From PSNR, SSIM, and VMAF to MOS, BD-rate, banding, and the convex hull — the full A–Z of video-quality measurement is one click away.

PSNR

Peak Signal-to-Noise Ratio. The baseline full-reference metric: the ratio of the squared peak signal value to the mean squared error against a reference, expressed in decibels. Simple and fast, but a weak match for human perception. It is the metric every comparison starts from.

SSIM

Structural Similarity Index (Wang et al., 2004). Compares luminance, contrast, and structure in a sliding window instead of raw pixel error, so it tracks perceived quality better than PSNR. Scores top out at 1 for identical images and in practice fall between 0 and 1.

VMAF

Video Multi-Method Assessment Fusion (Netflix). A machine-learned metric that fuses several quality features and is trained against human scores; the de-facto modern metric, with models for phone, 4K, and the no-enhancement-gain VMAF-NEG variant.

MOS

Mean Opinion Score. The average of human quality ratings on a fixed scale (usually 1–5) — the ground truth every objective metric is validated against. DMOS is its differential form.

BD-rate

Bjøntegaard Delta rate. The standard way to express how much bitrate one encoder or codec saves relative to another at equal quality: a single percentage that summarizes the gap between two rate-quality curves.

Banding

The visible stair-stepping in what should be a smooth gradient (sky, shadow), caused by too few code values — a classic compression artifact that PSNR and SSIM often miss but viewers always see.
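The banding blind spot can be illustrated with a toy example. The sketch below is an illustration under simplified assumptions, not a real encoder: it treats a one-dimensional 8-bit ramp as a "smooth gradient" and truncates it to 6 bits, a stand-in for whatever pipeline stage reduced the number of code values. The helper name `quantize` is hypothetical.

```python
import math

def quantize(samples, bits):
    """Truncate 8-bit samples to `bits` bits of precision by dropping low bits."""
    step = 1 << (8 - bits)
    return [(s // step) * step for s in samples]

# A smooth 8-bit ramp: every code value from 0 to 255 appears once.
gradient = list(range(256))

# Keep only 6 bits: neighbouring values collapse into flat steps.
banded = quantize(gradient, 6)

mse = sum((a - b) ** 2 for a, b in zip(gradient, banded)) / len(gradient)
psnr_db = 10 * math.log10(255 ** 2 / mse)

print("distinct levels before:", len(set(gradient)))
print("distinct levels after: ", len(set(banded)))
print("PSNR (dB):", round(psnr_db, 2))
```

The ramp loses three quarters of its distinct levels, yet the per-sample error stays small, so PSNR stays high. That is the general shape of the problem: the error is tiny in magnitude but arranged as visible steps.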

Written and maintained by

The author.

Nikolay Sapunov

CEO at Fora Soft

Leads a software studio specialising in video-centric products — streaming and OTT platforms, WebRTC apps, encoding pipelines, computer vision, and AI-driven video tools. Writes this course so video engineers can reason honestly about quality: what PSNR, SSIM, and VMAF really measure, how to run a subjective test that holds up, how to gate a pipeline on quality, and how to read a benchmark without fooling themselves.

FAQ

Frequently asked questions.

What is video quality measurement?

Video quality measurement assigns a defensible number to how good a video looks, instead of relying on opinion. It splits into objective metrics — algorithms like PSNR, SSIM, and VMAF that compare a processed video to a reference — and subjective testing, where humans rate quality to produce a Mean Opinion Score. The objective metrics are validated against the human scores. It sits downstream of encoding and streaming, and it is how teams prove a change helped, not hurt.
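As a rough sketch of how the two halves meet, the snippet below computes a Mean Opinion Score with a confidence interval for one clip, then checks how well a metric tracks MOS across several clips. All ratings and scores are made-up illustration data, the function names are hypothetical, and the 1.96 multiplier assumes a normal approximation. A real study would follow the statistics prescribed by the ITU recommendation it is run under.

```python
import math
import statistics

def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width

def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical 1-5 ratings from five viewers for a single clip.
mos, ci = mos_with_ci([4, 4, 5, 3, 4])
print(f"MOS = {mos:.2f} +/- {ci:.2f}")

# Hypothetical per-clip MOS values and the metric scores for the same clips.
clip_mos = [1.8, 2.6, 3.4, 4.1, 4.6]
clip_metric = [35.0, 52.0, 70.0, 84.0, 93.0]
print("correlation:", round(pearson(clip_metric, clip_mos), 3))
```

A high correlation on one small set says little on its own. Validation in practice uses many clips, several content types, and usually a rank correlation alongside the linear one.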

What is PSNR?

PSNR (Peak Signal-to-Noise Ratio) is the baseline full-reference metric. It expresses, in decibels, the ratio between the squared peak signal value and the mean squared error against a reference. Higher is better, and values above roughly 40 dB usually look good. It is simple, fast, and universally supported, so every comparison starts with it. But it correlates only loosely with perception and misses artifacts like banding, so it is rarely the metric you finish with.
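The definition fits in a few lines. This is a minimal sketch that works on flat lists of sample values and assumes 8-bit content (peak of 255). The sample values are invented, and a real tool would compute this per frame and per plane before pooling.

```python
import math

def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no error to measure
    return 10 * math.log10(peak ** 2 / mse)

# Eight invented luma samples and a lightly distorted copy.
ref = [52, 55, 61, 66, 70, 61, 64, 73]
dist = [54, 55, 60, 66, 71, 63, 64, 70]
print(round(psnr(ref, dist), 2), "dB")
```

Because the score depends only on the size of the error and not on where or how it appears, two very different-looking distortions can land on the same PSNR.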

What is the difference between PSNR, SSIM, and VMAF?

All three are full-reference picture-quality metrics of increasing sophistication. PSNR measures raw pixel error in decibels — fast but a weak match for the eye. SSIM compares luminance, contrast, and structure in a sliding window, tracking perception better, scored 0 to 1. VMAF is machine-learned, fusing several features and trained against human ratings, so it usually correlates best. Use PSNR for a sanity check, SSIM for structure, VMAF for a perception-aligned score — and know each blind spot.
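To make the SSIM part concrete, here is a sketch of the SSIM formula from Wang et al. (2004) applied to a single window of samples. It is a simplification: it assumes one flat, unweighted window and the commonly used constants K1 = 0.01 and K2 = 0.03, whereas production implementations slide a weighted window over the frame and average the results. The function name `ssim_window` is hypothetical.

```python
def ssim_window(x, y, peak=255, k1=0.01, k2=0.03):
    """SSIM for one window: combines luminance, contrast, and structure terms."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("windows must match and hold at least two samples")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * peak) ** 2  # stabilises the luminance term
    c2 = (k2 * peak) ** 2  # stabilises the contrast/structure term
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

window = [52, 55, 61, 66, 70, 61, 64, 73]
flattened = [63] * 8  # same rough brightness, all structure removed

print(ssim_window(window, window))     # identical windows score 1 (up to rounding)
print(ssim_window(window, flattened))  # structure loss pulls the score down
```

The flattened window keeps nearly the same mean, so the luminance term barely moves. The drop comes from the contrast and structure terms, which is exactly what PSNR has no notion of.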

What is VMAF-NEG?

VMAF-NEG (No Enhancement Gain) is a VMAF variant designed to resist gaming. Standard VMAF can be inflated by sharpening or contrast tricks that raise the score without improving fidelity — output that looks better than the reference rather than closer to it. VMAF-NEG removes that gain, reporting how faithfully the output matches the source. Use it when comparing encoders or settings and you need a score a post-filter cannot juice.

What is BD-rate?

BD-rate (Bjøntegaard Delta rate) is the standard way to express how much bitrate one codec or encoder saves relative to another at equal quality. Instead of comparing single points, it averages the bitrate gap between two rate-quality curves over their shared quality range and reports one percentage. For example, AV1 giving 30% BD-rate savings over H.264 means equal quality at 30% less bitrate on average. It can be computed against PSNR, SSIM, or VMAF, and it is the headline number in every serious codec comparison.
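One way to sketch the calculation, assuming the classic four-point setup: fit a cubic through each curve with log bitrate as a function of quality, integrate both over the shared quality range, and convert the average log difference to a percentage. The rate-quality points below are invented, the function names are hypothetical, and many current tools use piecewise-cubic interpolation instead, so treat this as an illustration of the idea, not a reference implementation.

```python
import math

def _cubic_through(xs, ys, x):
    """Evaluate the unique cubic through four (x, y) points (Lagrange form)."""
    total = 0.0
    for i in range(len(xs)):
        term = ys[i]
        for j in range(len(xs)):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

def _integrate(xs, ys, lo, hi):
    """Integrate that cubic over [lo, hi]; Simpson's rule is exact for cubics."""
    mid = (lo + hi) / 2
    return (hi - lo) / 6 * (
        _cubic_through(xs, ys, lo)
        + 4 * _cubic_through(xs, ys, mid)
        + _cubic_through(xs, ys, hi)
    )

def bd_rate(anchor, test):
    """BD-rate in percent from four (bitrate, quality) points per curve.

    Negative means `test` needs less bitrate than `anchor` at equal quality.
    Quality values within one curve must be distinct.
    """
    if len(anchor) != 4 or len(test) != 4:
        raise ValueError("this sketch expects exactly four points per curve")
    q_a = [q for _, q in anchor]
    q_t = [q for _, q in test]
    log_a = [math.log10(r) for r, _ in anchor]
    log_t = [math.log10(r) for r, _ in test]
    lo = max(min(q_a), min(q_t))
    hi = min(max(q_a), max(q_t))
    if hi <= lo:
        raise ValueError("the two curves share no quality range")
    avg_log_diff = (
        _integrate(q_t, log_t, lo, hi) - _integrate(q_a, log_a, lo, hi)
    ) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100

# Invented points: (bitrate in kbps, quality score).
anchor = [(1000, 80.0), (2000, 88.0), (4000, 93.0), (8000, 96.0)]
test = [(500, 80.0), (1000, 88.0), (2000, 93.0), (4000, 96.0)]
print(round(bd_rate(anchor, test), 1), "%")
```

In this constructed case the test curve reaches every quality level at half the anchor's bitrate, so the result is a 50% saving, reported as a negative number. Real curves rarely line up this neatly, which is why the overlap range and the interpolation method belong in any benchmark write-up.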

How do you measure video quality with FFmpeg?

FFmpeg computes PSNR and SSIM with its built-in psnr and ssim filters, and VMAF through the libvmaf filter, which requires an FFmpeg build configured with --enable-libvmaf. You pass the processed video and the reference, make sure they are aligned and at the same resolution and frame rate, choose the right VMAF model, and read the pooled score from the log. The common pitfalls are mismatched scaling, frame misalignment, and the wrong model. Get those right and FFmpeg is the everyday workhorse.
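A sketch of what such an invocation can look like, built as an argument list in Python so it can be inspected before it is run. The file names, the 1920x1080 target, and the helper name `vmaf_command` are assumptions for illustration. The filter graph scales both inputs to one resolution and resets their timestamps, with the distorted video as the first libvmaf input and the reference as the second. It leaves the VMAF model at the build's default and does not normalise frame rate, both of which a real workflow would need to decide explicitly.

```python
def vmaf_command(distorted, reference, width=1920, height=1080,
                 log_path="vmaf.json"):
    """Build an ffmpeg argument list that scores `distorted` against `reference`."""
    graph = (
        # Bring both inputs to the same resolution and reset timestamps
        # so frames are compared like for like.
        f"[0:v]scale={width}:{height}:flags=bicubic,setpts=PTS-STARTPTS[dist];"
        f"[1:v]scale={width}:{height}:flags=bicubic,setpts=PTS-STARTPTS[ref];"
        # libvmaf takes the distorted video first and the reference second.
        f"[dist][ref]libvmaf=log_fmt=json:log_path={log_path}"
    )
    return [
        "ffmpeg",
        "-i", distorted,
        "-i", reference,
        "-lavfi", graph,
        "-f", "null", "-",  # discard the video output; only the log matters
    ]

cmd = vmaf_command("encoded.mp4", "source.mp4")
print(" ".join(cmd))
# To run it: subprocess.run(cmd, check=True), on a build that includes libvmaf.
```

Swapping `libvmaf=...` for `psnr` or `ssim` in the last stage of the graph gives the other two metrics with the same input handling, which keeps the scaling and alignment choices identical across all three scores.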

Need to measure video quality, not just understand it?

Fora Soft has built real-time video, audio, and AI products since 2005 — WebRTC, LiveKit, generative pipelines, and AI agents at scale. Tell us what you’re building and we’ll send a real engineer your way.

Where to start.

Streaming QoE: The Metrics That Predict Whether a Viewer Stays

Streaming-Specific Artifacts: Switching, Freezing, and Tiling

Designing a Subjective Test That Survives Scrutiny

How Objective Metrics Are Validated Against Human Scores

Choosing the Right Metric for the Job

Where Quality Is Actually Lost in a Video Pipeline