SSIM (Structural Similarity Index) is a video quality metric designed to fix PSNR's biggest weakness: PSNR measures raw pixel error, but human viewers respond to structure. SSIM compares each frame of an encode with the matching frame of its reference along three perceptually meaningful axes: luminance (overall brightness), contrast (how strongly light and dark areas differ) and structure (spatial patterns and textures). It returns a score between -1 and 1, where 1.0 means identical. For video, the per-frame scores are usually averaged into a single number. As a rough rule of thumb, scores above 0.95 look essentially flawless to most viewers, 0.90–0.95 is good, and below about 0.85 artefacts start to become visible.
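For reference, this is the usual way the three comparisons are written for a pair of co-located image windows x and y. Here μ is the local mean, σ² the local variance, σ_xy the covariance between the two windows, L the pixel dynamic range (255 for 8-bit video), and C1, C2, C3 small constants that keep the ratios stable in flat, dark regions. The three terms are multiplied with equal weight. The score is computed over small sliding windows and averaged across the frame.

```latex
l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad
c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad
s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}

\mathrm{SSIM}(x,y) = l(x,y)\,c(x,y)\,s(x,y)
  = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
         {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
  \quad \text{with } C_3 = C_2/2

C_1 = (0.01\,L)^2, \qquad C_2 = (0.03\,L)^2
```

Setting C3 = C2/2 lets the contrast and structure terms cancel into the single covariance term on the right, which is the form most implementations compute.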
What makes SSIM useful is that it correlates much better than PSNR with how people actually perceive quality. Blur is penalised. Light noise is penalised less heavily, because viewers tolerate noise better than blur. Posterisation pulls the score down clearly. For straightforward A/B comparisons of encodes ("did the new encoder make things look worse?"), SSIM gives an answer that is much closer to what viewers would say.
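An A/B check like this can be sketched in a few lines. The sketch assumes you already have one SSIM score per frame for each encode, both measured against the same reference, for example parsed from your measurement tool's per-frame log. The helper `compare_encodes` and the 0.005 tolerance are hypothetical choices for illustration, not a standard. The helper also reports the single worst frame, because a clip-wide mean can hide a short stretch of badly damaged frames.

```python
from statistics import mean


def compare_encodes(baseline, candidate, tolerance=0.005):
    """Compare per-frame SSIM scores of two encodes of the same source.

    `tolerance` is a hypothetical threshold: the candidate counts as a
    regression only if its mean SSIM falls more than this below baseline.
    """
    if len(baseline) != len(candidate):
        raise ValueError("encodes must cover the same frames")
    # Per-frame change in SSIM (negative means the candidate got worse).
    deltas = [c - b for b, c in zip(baseline, candidate)]
    worst = min(range(len(deltas)), key=lambda i: deltas[i])
    return {
        "baseline_mean": mean(baseline),
        "candidate_mean": mean(candidate),
        "worst_frame": worst,
        "worst_drop": -deltas[worst],
        "regressed": mean(candidate) < mean(baseline) - tolerance,
    }


# Illustrative made-up scores for a three-frame clip.
report = compare_encodes([0.97, 0.96, 0.95], [0.97, 0.93, 0.95])
print(report["regressed"], report["worst_frame"])  # prints: True 1
```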
The catch: SSIM is sensitive to small geometric shifts. A frame shifted by a single pixel can score badly against an otherwise identical reference, even though no viewer would notice the difference. SSIM also misses more sophisticated cues that viewers respond to, such as motion smoothness, scene-cut visibility and banding on gradients. It is a clear step up from PSNR but a step below VMAF. The practical recommendation in 2026: use SSIM as your day-to-day quality dial while tuning encodes, and treat VMAF as the authoritative final check when bandwidth or quality decisions are on the line.
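The shift sensitivity is easy to reproduce. This minimal pure-Python sketch computes mean SSIM over sliding 7×7 windows with uniform (box) weighting. That window choice is an assumption made for brevity; the original formulation uses an 11×11 Gaussian window. The test frame is a synthetic 32×32 random texture, chosen because fine detail makes the effect obvious. The helpers `ssim_window` and `mean_ssim` are hypothetical, not a library API.

```python
import random


def ssim_window(x, y, data_range=255.0):
    """SSIM of one window, given two equal-length lists of pixel values."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return num / den


def mean_ssim(ref, dist, win=7, data_range=255.0):
    """Average SSIM over every win x win window of two same-size 2D frames."""
    h, w = len(ref), len(ref[0])
    scores = []
    for top in range(h - win + 1):
        for left in range(w - win + 1):
            # Flatten the window from each frame in the same pixel order.
            rows = range(top, top + win)
            cols = range(left, left + win)
            px = [ref[r][c] for r in rows for c in cols]
            py = [dist[r][c] for r in rows for c in cols]
            scores.append(ssim_window(px, py, data_range))
    return sum(scores) / len(scores)


rng = random.Random(0)  # fixed seed so the demo is repeatable
size = 32
# Synthetic high-detail frame: independent random 8-bit pixels.
frame = [[rng.randint(0, 255) for _ in range(size)] for _ in range(size)]

# Same frame shifted right by one pixel (left column replicated).
shifted = [[row[0]] + row[:-1] for row in frame]

# Same frame with tiny additive noise (+/-2 levels), clipped to 0..255.
noisy = [[min(255, max(0, p + rng.randint(-2, 2))) for p in row] for row in frame]

print(f"identical:  {mean_ssim(frame, frame):.3f}")
print(f"1-px shift: {mean_ssim(frame, shifted):.3f}")
print(f"+/-2 noise: {mean_ssim(frame, noisy):.3f}")
```

On real footage, neighbouring pixels are strongly correlated, so a one-pixel shift usually costs less than it does on this random texture. The ordering still holds, though: an invisible shift is scored far worse than invisible noise.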

