Learning course · Updated June 2026

Video quality, measured: PSNR, SSIM, VMAF & benchmarks

How delivered video quality is actually measured — the discipline one step downstream of encoding and streaming. PSNR, SSIM, and VMAF from first principles, subjective testing that survives scrutiny, a labelled artifact gallery, production QC gates, streaming QoE, and Fora Soft's own benchmarks on real content. A vendor-neutral, measurement-honest course from Fora Soft engineers.

Every metric has a blind spot, and we name it. Every claim is tied to a named source and year — the SSIM paper (Wang et al., 2004), the Netflix VMAF docs, ITU-T P.910 / P.808 / P.1204, ITU-R BT.500, and Bjøntegaard's BD-rate. A single number is a summary, not the truth — this course teaches you to read it honestly.

8 chapters · 64 articles · 120+ glossary terms · ~24 hrs total reading

Outcomes

What you'll be able to ship.

Eight blocks that take you from "the picture looks fine to me" to a defensible quality number. By the end, you can choose the right metric, run a subjective test that holds up, diagnose any artifact, gate a pipeline on quality, and read a benchmark without fooling yourself.

01

Choose the right metric for the job

PSNR, SSIM, MS-SSIM, and VMAF from first principles — what each measures, where each lies, and how to read a score without fooling yourself. Plus VMAF-NEG when the score can be gamed.

02

Run a subjective test that holds up

MOS and DMOS, the ACR/DCR/PC methodologies and the ITU recommendations (P.910, BT.500, P.808), test design, the statistics, and the failure gallery — so the result survives scrutiny.

03

Diagnose any artifact on sight

A labelled field guide — blocking, banding, ringing, blur, judder, color and streaming artifacts — and the capstone skill: tracing an artifact back to the pipeline stage that caused it.

04

Gate a pipeline on quality

Turn measurement into an automated QC gate — quality targets and budgets, per-title and the convex hull, CI/CD gates, regression testing against golden references, and monitoring at scale.

05

Measure the viewer's experience

Streaming QoE done right — rebuffering ratio, startup time, the switching penalty, player-side metrics, and how picture metrics and delivery metrics combine into one view of experience.

06

Benchmark and tool up

Read and reproduce a codec benchmark with BD-rate on real content, then wire measurement into your workflow with FFmpeg and libvmaf — commands, a reusable script, and the pitfalls.

Syllabus

The full course in eight chapters

Every chapter is self-contained. Read in order, or jump straight to the block you need — from why we measure to the tools that do it.

01

Why Measure Video Quality

Why a number beats an opinion — what VQM is and why it's harder than it looks, where quality is lost in a pipeline, subjective vs objective, QoE vs QoS, the full/reduced/no-reference taxonomy, and the border with Video Encoding.

Beginner · 8 articles · ~3 hrs

02

Objective Metrics

The core of the course. Each metric from first principles: PSNR, SSIM, MS-SSIM, VMAF (plus models, VMAF-NEG, and confidence intervals), P.1204/AVQT, validation against human scores, pooling, the blind-spot catalogue, choosing a metric, and reading a report.

Beginner · 12 articles · ~5 hrs

03

Subjective Testing

The ground truth — MOS/DMOS and the scales, ACR/DCR/PC and the ITU recommendations (P.910, BT.500), test design and execution, the statistics, crowdsourced testing (P.808), and the failure gallery.

Intermediate · 8 articles · ~3 hrs

04

The Artifact Gallery

The visual reference: a labelled field guide to blocking, banding, ringing, blur, judder, color artifacts, and streaming-specific artifacts, plus tracing an artifact back to its pipeline cause.

Intermediate · 9 articles · ~3.5 hrs

05

Production QC

Quality measurement as an automated gate — the QC reference architecture, per-title/per-shot from the quality-target angle, the convex hull, quality budgets, CI/CD gates, regression testing, monitoring at scale, and the stakeholder report.

Intermediate · 8 articles · ~3 hrs

06

Streaming QoE

Measuring the viewer experience — rebuffering ratio, startup time / TTFF, the bitrate-switching trade-off, player-side metrics, connecting picture metrics to perceived QoE, and no-reference quality for live/UGC.

Intermediate · 7 articles · ~2.5 hrs

07

Fora Soft Benchmarks

Fora Soft's own measurements on real cases: the methodology, codec comparison (H.264/HEVC/AV1, BD-rate), encoder comparison, per-content-type results.

Advanced · 5 articles · ~2 hrs

08

Tools

The hands-on block — the tooling landscape, FFmpeg + libvmaf (the workhorse, with commands and a reusable script), VQMT and dedicated tools, commercial suites, open-source no-reference tools, CI integration, and visualization.

Advanced · 7 articles · ~2.5 hrs

Quality, measured on real content

Fora Soft validates every streaming and encoding decision against quality numbers — PSNR, SSIM, VMAF, and our own benchmarks since 2005.

Reference

The vocabulary of video quality

120+ terms with crisp, cited definitions, aliases, and links to the deep dives. From PSNR, SSIM, and VMAF to MOS, BD-rate, banding, and the convex hull — the full A–Z of video-quality measurement is one click away.

PSNR

Peak Signal-to-Noise Ratio. The baseline full-reference metric: the ratio, in decibels, of the squared peak signal value to the mean-squared error against a reference. Simple and fast, but a weak match for human perception. It is the metric every comparison starts from.

SSIM

Structural Similarity Index (Wang et al., 2004). Compares luminance, contrast, and structure in a sliding window instead of pixel error, so it tracks perceived quality better than PSNR. Scored 0–1 in practice, with 1 meaning identical to the reference.

VMAF

Video Multi-Method Assessment Fusion (Netflix). A machine-learned metric that fuses several quality features and is trained against human scores; the de-facto modern metric, with models for phone, 4K, and the no-enhancement-gain VMAF-NEG variant.

MOS

Mean Opinion Score. The average of human quality ratings on a fixed scale (usually 1–5) — the ground truth every objective metric is validated against. DMOS is its differential form.

BD-rate

Bjøntegaard Delta rate. The standard way to express how much bitrate one encoder or codec saves relative to another at equal quality: a single percentage that summarizes the average gap between two rate-quality curves.

Banding

The visible stair-stepping in what should be a smooth gradient (sky, shadow), caused by too few code values. A classic compression artifact that PSNR and SSIM often miss but viewers readily notice.

Written and maintained by

The author.

Nikolay Sapunov, CEO at Fora Soft

Leads a software studio specialising in video-centric products — streaming and OTT platforms, WebRTC apps, encoding pipelines, computer vision, and AI-driven video tools. Writes this course so video engineers can reason honestly about quality: what PSNR, SSIM, and VMAF really measure, how to run a subjective test that holds up, how to gate a pipeline on quality, and how to read a benchmark without fooling themselves.

FAQ

Frequently asked questions.

What is video quality measurement?

Video quality measurement assigns a defensible number to how good a video looks, instead of relying on opinion. It splits into objective metrics — algorithms like PSNR, SSIM, and VMAF that compare a processed video to a reference — and subjective testing, where humans rate quality to produce a Mean Opinion Score. The objective metrics are validated against the human scores. It sits downstream of encoding and streaming, and it is how teams prove a change helped, not hurt.
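As a rough illustration of the subjective side, the sketch below turns one clip's ratings into a Mean Opinion Score with a 95% confidence interval. The ratings are made up, the function name is hypothetical, and the normal approximation (z = 1.96) is an assumption; a small panel would usually call for a Student's t multiplier instead.

```python
import math
import statistics

def mos_with_ci(ratings, z=1.96):
    """Return (mos, half_width) for one clip's ratings on a 1-5 scale.

    z=1.96 assumes a normal approximation for a 95% interval; this is an
    illustrative choice, not a prescription from any ITU recommendation.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mos = statistics.mean(ratings)
    sd = statistics.stdev(ratings)            # sample standard deviation
    half_width = z * sd / math.sqrt(len(ratings))
    return mos, half_width

# Made-up ratings from eight viewers for a single clip.
ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mos, half_width = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean is what lets a reader judge whether two clips really differ or merely drew different viewers.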

What is PSNR?

PSNR (Peak Signal-to-Noise Ratio) is the baseline full-reference metric. It expresses, in decibels, the ratio between the squared peak signal value and the mean-squared error against a reference. Higher is better, and values above roughly 40 dB usually look good. It is simple, fast, and universally supported, so every comparison starts with it, but it correlates only loosely with perception and misses artifacts like banding. It is rarely the metric you finish with.
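The definition is short enough to write out. The sketch below computes PSNR over two flat lists of 8-bit sample values; the sample values are made up, and a real tool would run this per frame and per plane before pooling, which this example does not attempt.

```python
import math

def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equal-length sequences of sample values.

    peak=255 assumes 8-bit samples; use 1023 for 10-bit content.
    """
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf                      # identical signals
    return 10 * math.log10(peak ** 2 / mse)

# Made-up 8-bit luma samples: squared errors 4, 0, 1, 1 give MSE = 1.5.
ref = [52, 55, 61, 59]
dist = [54, 55, 60, 58]
score = psnr(ref, dist)
print(f"{score:.2f} dB")
```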

What is the difference between PSNR, SSIM, and VMAF?

All three are full-reference picture-quality metrics of increasing sophistication. PSNR measures raw pixel error in decibels — fast but a weak match for the eye. SSIM compares luminance, contrast, and structure in a sliding window, tracking perception better, scored 0 to 1. VMAF is machine-learned, fusing several features and trained against human ratings, so it usually correlates best. Use PSNR for a sanity check, SSIM for structure, VMAF for a perception-aligned score — and know each blind spot.
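A toy comparison shows why the metrics disagree. The sketch below applies the SSIM formula from Wang et al. (2004) once over a whole four-sample signal, which is a deliberate simplification: the published metric uses a sliding, Gaussian-weighted window and averages the local scores. The sample values are made up. Both distortions have the same mean-squared error, so PSNR cannot tell them apart, yet single-window SSIM rates the uniform brightness shift above the structural change.

```python
def mean(xs):
    return sum(xs) / len(xs)

def mse(x, y):
    return mean([(a - b) ** 2 for a, b in zip(x, y)])

def ssim_single_window(x, y, peak=255):
    """SSIM formula applied once over the whole signal (no sliding window).

    Uses population variance/covariance for brevity; the constants
    C1 = (0.01*peak)^2 and C2 = (0.03*peak)^2 follow the 2004 paper.
    """
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = mean(x), mean(y)
    vx = mean([(a - mx) ** 2 for a in x])
    vy = mean([(b - my) ** 2 for b in y])
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

ref = [50, 100, 150, 200]
shifted = [60, 110, 160, 210]   # uniform +10 brightness shift
reshaped = [60, 90, 160, 190]   # same error energy, but structure altered

ssim_shifted = ssim_single_window(ref, shifted)
ssim_reshaped = ssim_single_window(ref, reshaped)
print(mse(ref, shifted), mse(ref, reshaped))          # equal MSE, so equal PSNR
print(round(ssim_shifted, 3), round(ssim_reshaped, 3))
```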

What is VMAF-NEG?

VMAF-NEG (No Enhancement Gain) is a VMAF variant designed to resist gaming. Standard VMAF can be inflated by sharpening or contrast tricks that raise the score without improving fidelity — output that looks better than the reference rather than closer to it. VMAF-NEG removes that gain, reporting how faithfully the output matches the source. Use it when comparing encoders or settings and you need a score a post-filter cannot juice.

What is BD-rate?

BD-rate (Bjøntegaard Delta rate) is the standard way to express how much bitrate one codec or encoder saves relative to another at equal quality. Instead of comparing single points, it averages the gap between two rate-quality curves over their shared quality range and reports it as one percentage. For example, AV1 giving 30% BD-rate savings over H.264 means equal quality at, on average, 30% less bitrate. It can be computed against PSNR, SSIM, or VMAF, and it is the headline number in every serious codec comparison.
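The idea can be sketched in a few lines. Note the swap: Bjøntegaard's method fits a cubic polynomial to log-bitrate as a function of quality, while the version below uses piecewise-linear interpolation to stay dependency-free, so its numbers will differ slightly from a standard BD-rate tool. The function names and the rate-quality points are hypothetical. A negative result means the test encoder needs less bitrate than the anchor.

```python
import math

def _interp(x, xs, ys):
    """Linear interpolation of y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the curve")

def bd_rate_linear(anchor, test, steps=1000):
    """Average bitrate difference (%) of `test` vs `anchor` at equal quality.

    Each curve is a list of (bitrate, quality) points with distinct
    quality values. Simplified: linear interpolation, not a cubic fit.
    """
    anchor = sorted(anchor, key=lambda p: p[1])
    test = sorted(test, key=lambda p: p[1])
    qa, la = [q for _, q in anchor], [math.log10(r) for r, _ in anchor]
    qt, lt = [q for _, q in test], [math.log10(r) for r, _ in test]
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])
    if hi <= lo:
        raise ValueError("curves share no quality range")
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):                  # midpoint rule over shared range
        q = lo + (k + 0.5) * h
        total += _interp(q, qt, lt) - _interp(q, qa, la)
    return (10 ** (total / steps) - 1) * 100

# Hypothetical (kbps, VMAF) points: the test curve is the anchor at 0.7x bitrate.
anchor = [(1000, 80), (2000, 88), (4000, 93), (8000, 96)]
test = [(700, 80), (1400, 88), (2800, 93), (5600, 96)]
result = bd_rate_linear(anchor, test)
print(f"{result:.1f}%")
```

Working in log-bitrate is the design choice that matters here: it makes a constant percentage saving show up as a constant gap between the curves.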

How do you measure video quality with FFmpeg?

FFmpeg computes PSNR and SSIM directly with its built-in psnr and ssim filters, and VMAF through the libvmaf filter (available when FFmpeg is built with libvmaf support). You pass the processed video and the reference, make sure they are aligned and at the same resolution and frame rate, choose the right VMAF model, and read the pooled score from the log. The common pitfalls are mismatched scaling, frame misalignment, and the wrong model. Get those right and FFmpeg is the everyday workhorse.
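A minimal sketch of how those commands can be assembled from a script follows. The helper names and file names (encoded.mp4, source.mp4, vmaf.json) are hypothetical, and the example only builds the argument list rather than running it. It assumes an FFmpeg build configured with --enable-libvmaf, passes no model option so the filter's default model applies, and leaves scaling and frame alignment to you.

```python
def vmaf_command(distorted, reference, log_path="vmaf.json"):
    """Build an FFmpeg command that scores `distorted` against `reference`."""
    # libvmaf treats the first input as the distorted ("main") video
    # and the second input as the reference.
    filtergraph = f"[0:v][1:v]libvmaf=log_fmt=json:log_path={log_path}"
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", filtergraph, "-f", "null", "-"]

def simple_metric_command(distorted, reference, metric):
    """Same idea for the built-in psnr and ssim filters."""
    if metric not in ("psnr", "ssim"):
        raise ValueError("metric must be 'psnr' or 'ssim'")
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", f"[0:v][1:v]{metric}", "-f", "null", "-"]

cmd = vmaf_command("encoded.mp4", "source.mp4")
print(" ".join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems when file names contain spaces.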

Need to measure video quality, not just understand it?

Fora Soft has built real-time video, audio, and AI products since 2005 — WebRTC, LiveKit, generative pipelines, and AI agents at scale. Tell us what you’re building and we’ll send a real engineer your way.

Specialist software house for video, real-time and AI products. Founded 2005. 50 in-house engineers.

+1 (914) 775-5855
New York · USA
© Fora Soft, 2005–2026