Video Blur and Detail Loss: Causes & Metrics

Why this matters

If your encodes look soft, you are losing the detail that makes a picture feel expensive — the texture of skin and fabric, the legibility of small text, the crispness of a logo edge — and you are often losing it for a reason you can fix. Blur is the quiet artifact: it rarely looks "broken" the way blocking or banding does, so it ships unnoticed, and then the picture just looks cheap without anyone being able to say why. This article is for a video engineer, encoding lead, or QA engineer who can see that a render is soft but wants to know exactly where the detail went, whether the softness is a fault or a choice, and which metric will (and will not) catch it. Get this right and you stop over-compressing the frames that needed the bits and stop "fixing" softness the director asked for.

What blur and detail loss look like

Start with the look, because two related things travel under this heading and it helps to keep them straight.

Blur is what happens to edges. A sharp edge is a fast jump from one brightness to another — black text on a white page goes from dark to light in a pixel or two. Blur stretches that jump out: instead of snapping across one or two pixels, the transition ramps gently across many, so the edge looks soft and the boundary becomes hard to place. The wider the ramp, the blurrier the edge.

Detail loss is what happens to texture. Fine, busy patterns — the weave of a jacket, blades of grass, film grain, the pores in skin — live in the picture's high spatial frequencies, the rapid pixel-to-pixel variations that carry small detail. When those are flattened, the texture smooths into a waxy, plasticky surface that looks clean in a still and lifeless in motion. Edges spreading and texture flattening are two faces of the same loss: the high-frequency content is gone.

It helps to place this beside its neighbours in the artifact field guide. Blocking adds a visible grid; ringing adds halos beside edges. Blur is the opposite kind of damage — it does not add a pattern, it removes detail. That is why it is the easiest artifact to miss: nothing is wrong with the picture except that something is no longer there.

Figure 1. What blur is. A sharp edge crosses in one or two pixels and the frame is rich in high spatial frequencies (left). Blur stretches the edge across many pixels and attenuates the high frequencies (right); the fine texture that lived there is gone.

Where the detail goes

Three different stages in a video pipeline remove high-frequency detail, and they often stack. Knowing which one softened your picture tells you which knob to turn.

Quantization at the encoder. A codec rewrites each block of the frame with a transform — the Discrete Cosine Transform (DCT) or a similar one — that describes the block as a stack of wave patterns, from a flat base tone up to fine, rapid ripples. Fine detail and sharp edges live in the high-frequency waves. Quantization then divides each wave's strength by a step size and rounds, and the small high-frequency coefficients round straight to zero. This is the same mechanism that produces blocking and ringing, but the visible result here is different: with the high frequencies gone, the decoder rebuilds the block from only its coarse, low-frequency parts, and a block built from coarse parts is a soft block. Push the quantizer harder to save bits and you soften the picture. The cause-side detail of how the codec chooses that step size lives in the Video Encoding section; here we only need the result — high frequencies discarded, detail lost.
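
To make the rounding concrete, here is a minimal sketch on a single 8-sample row rather than a real two-dimensional codec block. The step size of 200 is a hypothetical, deliberately exaggerated value, and a single uniform step is an assumption; real encoders use per-frequency steps and rate control. The point is only to show a hard edge coming back as a ramp once the small high-frequency coefficients round to zero.

```python
import math

def dct(block):
    """Orthonormal DCT-II of a 1-D block."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def quantize(coeffs, step):
    """Divide by the step, round, and scale back, as the decoder would see it."""
    return [round(c / step) * step for c in coeffs]

# A hard edge inside one block: dark (16) to bright (235) in a single pixel.
edge = [16, 16, 16, 16, 235, 235, 235, 235]

coarse = quantize(dct(edge), step=200)       # hypothetical, very coarse step
kept = sum(1 for c in coarse if c != 0)      # coefficients that survived
soft = [round(v) for v in idct(coarse)]      # what the decoder rebuilds

print("surviving coefficients:", kept, "of", len(edge))
print("original:", edge)
print("rebuilt: ", soft)
```

Only the two lowest-frequency coefficients survive this step size, so the rebuilt row climbs gradually across the whole block instead of jumping between two neighbouring pixels.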

Downscaling in the resolution ladder. Most streaming sends several resolutions and lets the player pick one. When a 1080p master is encoded at a lower rung — say 540p — the picture is shrunk, and shrinking discards samples. Walk the arithmetic. A 1920×1080 frame holds 1920 × 1080 = 2,073,600 pixels. Scale it to 960×540 and it holds 960 × 540 = 518,400 pixels — exactly one quarter, so three out of every four samples are gone. The limit on the finest detail a picture can represent (the Nyquist limit) halves in each direction, so anything finer than a roughly four-original-pixel cycle can no longer be stored. When the player scales that 540p back up to fill a 1080p screen, the missing detail does not come back — upscaling can only interpolate, not reinvent. A perfectly clean 540p encode still looks soft on a big screen, and no encoder setting changes that, because the detail was discarded before the encoder ever saw it. Choosing the right rung is the job of the convex hull.
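
The arithmetic above can be checked in a few lines. This is a small sketch; the function name and the returned keys are illustrative, not part of any tool mentioned here.

```python
def rung_stats(master=(1920, 1080), rung=(960, 540)):
    """Compare a ladder rung with its master in pixel count and finest storable detail."""
    master_pixels = master[0] * master[1]
    rung_pixels = rung[0] * rung[1]
    scale = master[0] / rung[0]  # linear downscale factor
    return {
        "master_pixels": master_pixels,
        "rung_pixels": rung_pixels,
        "fraction_kept": rung_pixels / master_pixels,
        # The finest cycle any grid can hold is 2 of its own pixels (Nyquist),
        # expressed here in master pixels.
        "finest_cycle_in_master_pixels": 2 * scale,
    }

stats = rung_stats()
print(stats)
```

For the 1080p to 540p case this reports 2,073,600 and 518,400 pixels, a kept fraction of 0.25, and a finest cycle of 4 master pixels.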

Denoising and pre-filtering. Before encoding, pipelines often run a denoise or low-pass filter to remove sensor noise and grain — partly to clean the image, partly because noise is expensive to compress. A denoiser cannot perfectly separate "noise" from "fine detail," because to the filter they look the same: both are high-frequency. Turn it up too far and it strips the texture along with the grain, and the result is the waxy, over-smoothed look. This softening happens before the codec, so it is invisible to any metric that compares the encode to the filtered source — the reference itself is already soft.

Figure 2. Where the detail goes. Three pipeline stages remove high-frequency detail and often stack: a denoise/low-pass pre-filter, downscaling to a lower rung, and quantization of high-frequency coefficients at the encoder. Upscaling at the display cannot bring back what earlier stages discarded.

The hard problem: intentional softness vs artifact

Here is what makes blur different from every other artifact in this gallery, and why it deserves its own article. A blurry region is not always a fault. Filmmakers blur on purpose, constantly, and that softness is the look they want.

A shallow depth of field throws the background out of focus to make the subject pop — the creamy, blurred backdrop behind a portrait is called bokeh, and it is a sign of an expensive lens, not a cheap encode. Motion blur is the streak a fast-moving object leaves within a single exposure; remove it and motion looks stuttery and unnatural. A soft-focus look is a deliberate aesthetic, used for mood or to flatter a face. In all of these, the picture is genuinely, intentionally soft, and "fixing" it would wreck the shot.

A blind, no-reference metric — one that judges a single frame with no pristine original to compare against — cannot tell any of this apart from compression blur. To the math, a blurry region is a blurry region: low high-frequency energy, wide edges, whatever the cause. This is the central honesty of measuring blur: the number tells you how soft, never whether the softness belongs there. (For why no-reference is the only option on live and user-generated content, see full-, reduced-, and no-reference metrics.)

So the practical workflow is not "measure blur, fail the soft frames." It is: measure sharpness across the clip, look at where the softness sits and how it behaves, and only then judge. A few questions separate intent from artifact most of the time. Is the whole frame soft, or only part of it? Localized softness that hugs a subject and leaves the background sharp (or vice versa) is almost always depth of field, not compression — quantization blur is broad and indifferent to what is in the shot. Does the softness appear or worsen only at low bitrate, and clear up when you raise it? That is the encoder. Does a sharper version exist at a higher rung or in the master? Then the soft rung lost detail to downscaling. Is the softness constant and tied to the camera work? Likely intentional.

Figure 3. Intentional or artifact? Localized softness that tracks a subject is usually depth of field or motion blur; softness that covers the frame, worsens at low bitrate, or disappears at a higher rung is a compression or downscaling artifact. The metric measures how soft; this judgment is yours.
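
The triage questions can be written down as a small helper. This is a hypothetical sketch: the function name, the argument names, and the order in which the questions are asked are all assumptions, and the answers still have to come from a person looking at the clip.

```python
def triage_softness(worsens_at_low_bitrate, sharper_at_higher_rung,
                    localized, tied_to_camera_work):
    """Map yes/no answers about a soft shot to the most likely cause.

    Measured evidence (bitrate and rung comparisons) is checked first,
    which is an assumed ordering rather than a rule from the text.
    """
    if worsens_at_low_bitrate:
        return "encoder quantization"
    if sharper_at_higher_rung:
        return "downscaling loss"
    if localized:
        return "likely intentional (depth of field or motion blur)"
    if tied_to_camera_work:
        return "likely intentional (camera work)"
    return "undetermined, review by eye"

# Example: softness that hugs the subject and does not change with bitrate.
verdict = triage_softness(
    worsens_at_low_bitrate=False,
    sharper_at_higher_rung=False,
    localized=True,
    tied_to_camera_work=True,
)
print(verdict)
```

The helper only records the judgment; it does not replace it, because none of its inputs can be read off a sharpness score.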

How the metrics react, and where they lie

Now the good news, and it is genuine: of all the artifacts in this gallery, blur is the one the full-reference metrics handle best.

The reason is that blur is real, broadband pixel error. When detail is removed, a great many pixels across the frame differ from the original — not by the near-invisible single code value of banding, but by substantial amounts spread over the whole picture. So the pixel-error metric reacts: the number that compares a compressed frame to the original pixel by pixel, called PSNR (Peak Signal-to-Noise Ratio, measured in decibels), drops clearly when a frame is blurred. The metric that compares structural patterns, SSIM (Structural Similarity, on a 0–1 scale), drops too, because its contrast and structure terms both fall when edges soften and texture flattens.
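
A toy example shows the pixel-error reaction. The sketch below assumes an 8-bit peak of 255, a made-up one-dimensional textured row, and a simple box blur standing in for detail loss; it is not how a production tool computes PSNR over full frames.

```python
import math

def psnr(reference, distorted, peak=255):
    """Peak Signal-to-Noise Ratio in dB between two equal-length sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def box_blur(signal, radius):
    """Moving average; the window is simply truncated at the borders."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

# Hypothetical textured row: a fine repeating pattern, 32 samples long.
row = [60, 120, 180, 120] * 8

mild = box_blur(row, radius=1)
heavy = box_blur(row, radius=3)

print("PSNR, mild blur :", round(psnr(row, mild), 1), "dB")
print("PSNR, heavy blur:", round(psnr(row, heavy), 1), "dB")
```

The heavier blur flattens more of the pattern, so its mean squared error is larger and its PSNR is lower than the mild blur's.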

And VMAF, Netflix's fused perceptual metric (Video Multimethod Assessment Fusion, on a 0–100 scale), does best of all, because one of its three core features was built for exactly this. Alongside a motion feature, VMAF fuses a detail-fidelity feature called VIF (Visual Information Fidelity) with ADM, the Additive Distortion Metric, whose former name says it all: the Detail Loss Metric (DLM). As Netflix's own documentation describes it, the Detail Loss Metric measures "the loss of details which affects the content visibility." When a video engineer worries that an encode is soft, VMAF is already watching for it with a feature named after the problem. (For how VMAF fuses its features, see VMAF explained.)

So blur is not where the full-reference metrics are blind. But they still lie about it in three specific ways, and all three are about reference and context rather than whether the error registers.

The first and biggest: a metric cannot see a low-resolution source. Encode that 540p rung well and measure it against its own 540p reference, and PSNR and VMAF read high — the encode is faithful to what it was given. But what it was given was already soft. On a 1080p screen the viewer sees a soft picture while the score says "excellent," because the score never compared against the 1080p detail that was discarded upstream. This is the classic trap of judging a rung against its own downscaled reference instead of the original master.

The second: no-reference sharpness is blind to intent, as the previous section laid out. A blind metric flags bokeh and a soft-focus close-up as "blurred" exactly as it flags a starved encode.

The third is the inverse trap, and it connects straight back to ringing: sharpening can fake detail and fool the eye. Running a sharpening filter exaggerates the contrast right at each edge, which raises acutance — the perceived crispness of an edge — and can make a frame look sharper and even nudge some no-reference sharpness scores up. But sharpening invents no real detail; it just adds an overshoot beside the edge, which is the seed of ringing. Chase apparent sharpness too far and you trade soft-but-clean for crisp-but-ringing. Blur and ringing sit at opposite ends of the same dial.
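
The overshoot is easy to see on a single edge. This is a minimal sketch of a one-dimensional unsharp mask with a 3-tap blur and an assumed strength of 1.0; real sharpening filters differ in kernel and strength.

```python
def unsharp(signal, amount=1.0):
    """Unsharp mask: add back the difference between the signal and a 3-tap blur."""
    out = []
    last = len(signal) - 1
    for i, value in enumerate(signal):
        left = signal[max(i - 1, 0)]       # replicate the border sample
        right = signal[min(i + 1, last)]
        blurred = (left + value + right) / 3
        out.append(value + amount * (value - blurred))
    return out

edge = [50, 50, 50, 50, 200, 200, 200, 200]
sharpened = unsharp(edge)

print("original: ", edge)
print("sharpened:", sharpened)
# → sharpened: [50.0, 50.0, 50.0, 0.0, 250.0, 200.0, 200.0, 200.0]
```

The jump across the edge grows from 150 to 250 levels, which reads as crisper, but the samples beside the edge now dip to 0 and peak at 250. No new detail was created; the extra contrast is the overshoot that becomes visible as a halo.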

| Metric | What it measures | Reference needed | Where it lies on blur and detail loss |
| --- | --- | --- | --- |
| PSNR | Mean pixel error (dB) | Full-reference | Reacts well, because blur is broadband error; but reads high against a downscaled reference even when the picture is soft on screen |
| SSIM | Structural similarity (0–1) | Full-reference | Contrast and structure terms drop with softening; still scored against whatever reference you give it |
| VMAF (+ DLM/VIF) | Fused perceptual score (0–100) | Full-reference | Best of the group, because the Detail Loss Metric feature is built for this; still blind to a low-res source measured against itself |
| Laplacian variance | Edge/detail energy | No-reference | Cheap sharpness proxy; very content- and noise-dependent, so only valid when comparing renditions of the same shot |
| Edge width / CPBD | How far edges spread | No-reference | Built for blur; works on live/UGC, but cannot tell intentional softness (bokeh, soft focus) from an artifact |

Table 1. How six ways of measuring treat blur and detail loss. The full-reference metrics see blur well — it is real error — and VMAF carries a feature named for it. The honest gaps are a low-resolution source measured against its own soft reference, and no-reference metrics being blind to whether the softness was intended.

Figure 4. How metrics treat blur. Full-reference PSNR, SSIM, and VMAF all drop when a frame softens, VMAF most reliably, because its Detail Loss Metric feature is built for it. Where they lie: a low-resolution rung scored against its own soft reference reads "excellent," and a no-reference metric flags intentional bokeh as blur.

Common mistake: scoring a low-resolution rung against its own downscaled reference. The expensive error here is encoding the 540p or 720p rung, measuring it against a reference at that same low resolution, seeing PSNR or VMAF read "excellent," and shipping it as if the picture were sharp. It is sharp relative to a soft target. On a large screen the viewer sees the softness the score never measured. Measure perceived quality against the high-resolution master: upscale the rung to the master's resolution before scoring (libvmaf compares frames of equal size, so the scaling step has to happen ahead of it, for example in FFmpeg's filter graph), so the detail lost to downscaling actually counts against the score. A second pitfall: do not "fix" softness with a sharpening filter and call it a quality win. You have raised acutance and seeded ringing, not recovered detail.
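
One way to score a rung against the master is to upscale it inside an FFmpeg filter graph before the VMAF comparison. The sketch below only builds the command line. It assumes an FFmpeg build with the libvmaf filter enabled, uses hypothetical file names, picks bicubic scaling as an arbitrary choice, and assumes both files share frame rate and frame alignment.

```python
def vmaf_against_master_cmd(rung_path, master_path, master_size=(1920, 1080)):
    """Build an FFmpeg command that upscales the rung, then scores it against the master.

    The first input is the distorted rung and the second is the reference,
    which is the input order the libvmaf filter expects.
    """
    width, height = master_size
    graph = (
        f"[0:v]scale={width}:{height}:flags=bicubic[up];"  # bring the rung to master size
        "[up][1:v]libvmaf"                                  # compare it with the master
    )
    return ["ffmpeg", "-i", rung_path, "-i", master_path,
            "-lavfi", graph, "-f", "null", "-"]

# Hypothetical file names, for illustration only.
cmd = vmaf_against_master_cmd("rung_540p.mp4", "master_1080p.mp4")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd)`. Because the reference here is the master, any detail the downscale discarded shows up as a lower score instead of being hidden.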

How to measure sharpness without a reference

On live streams, video calls, and user-generated clips there is no pristine original, so the full-reference metrics above do not apply and you need a no-reference sharpness measure. A handful of classic ones, in rising order of how well they match the eye:

The simplest is the variance of the Laplacian — run an operator that responds to rapid changes, then take its variance; a crisp frame scores high, a soft one low (Pertuz et al., 2013). It is one line of code and genuinely useful for autofocus or for flagging the softest frame in a batch, but it is so sensitive to content and noise that an absolute value means little — only the comparison between two renditions of the same shot is trustworthy.
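
A dependency-free sketch of the idea follows, assuming a frame given as a list of rows of luma values and a 4-neighbour Laplacian kernel. With OpenCV the same measure is commonly written as `cv2.Laplacian(gray, cv2.CV_64F).var()`. The two tiny test frames are made up for illustration.

```python
def laplacian_variance(frame):
    """Variance of the 4-neighbour Laplacian over the interior of a 2-D luma frame."""
    responses = []
    for y in range(1, len(frame) - 1):
        for x in range(1, len(frame[0]) - 1):
            responses.append(
                frame[y - 1][x] + frame[y + 1][x]
                + frame[y][x - 1] + frame[y][x + 1]
                - 4 * frame[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

# Two renditions of the same hypothetical shot: a hard vertical edge and a soft ramp.
sharp_frame = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(8)]
soft_frame = [[0, 36, 73, 109, 146, 182, 219, 255] for _ in range(8)]

print("sharp:", laplacian_variance(sharp_frame))
print("soft: ", laplacian_variance(soft_frame))
```

The sharp rendition scores far higher than the soft one. Only that ordering is meaningful; the absolute values would change completely with different content or added noise.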

The edge-width family measures the artifact directly: find strong edges and measure how many pixels each transition takes to climb, because blur is exactly what widens that climb (Marziliano et al., 2002). The Just Noticeable Blur (JNB) metric refined this with a perceptual threshold — below a certain edge width, which depends on local contrast, the eye does not register the blur at all (Ferzli and Karam, 2009) — and CPBD (Cumulative Probability of Blur Detection) pooled per-edge blur probabilities into a single score that tracks human judgments well on Gaussian blur and compressed images (Narvekar and Karam, 2011).
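
The edge-width idea can be sketched on a single row of pixels. This is a simplification in the spirit of the edge-width family, not the published algorithm: it treats every monotonic run with enough contrast as an edge instead of running an edge detector, and the `min_step` contrast threshold is an assumed value.

```python
def edge_widths(row, min_step=20):
    """Width, in pixel steps, of each monotonic transition with enough contrast."""
    widths = []
    i = 0
    while i < len(row) - 1:
        diff = row[i + 1] - row[i]
        if diff == 0:
            i += 1
            continue
        sign = 1 if diff > 0 else -1
        start = i
        # Follow the transition until it flattens or reverses (the local extremum).
        while i < len(row) - 1 and (row[i + 1] - row[i]) * sign > 0:
            i += 1
        if abs(row[i] - row[start]) >= min_step:
            widths.append(i - start)
    return widths

def mean_edge_width(row, min_step=20):
    widths = edge_widths(row, min_step)
    return sum(widths) / len(widths) if widths else 0.0

sharp_row = [16, 16, 16, 16, 235, 235, 235, 235]
soft_row = [16, 16, 43, 98, 153, 208, 235, 235]

print("sharp:", mean_edge_width(sharp_row))  # → sharp: 1.0
print("soft: ", mean_edge_width(soft_row))   # → soft:  5.0
```

Both rows cover the same brightness range; the soft one simply takes five pixel steps to get there instead of one, and that distance is the measurement.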

A particularly elegant approach is the Crete-Roffet perceptual blur metric (2007): since people struggle to see the difference between a blurred image and the same image blurred again, you re-blur the frame and measure how much its local variation drops. A sharp frame loses a lot of variation to the re-blur; an already-soft frame barely changes — so the size of the drop is an inverse measure of sharpness, needing no reference at all.
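
A reduced version of the re-blur idea fits in a few lines. This sketch works on one row in the horizontal direction only, uses a 9-tap moving average as the re-blur, and returns a score where values near 0 mean sharp and values near 1 mean already blurred. The border handling and the value returned for a flat row are assumptions, and a full implementation would process whole frames in both directions.

```python
def reblur_score(row, taps=9):
    """Blur score for one row: near 0 is sharp, near 1 is already blurred."""
    half = taps // 2
    last = len(row) - 1
    # Re-blur with a moving average, replicating the border samples.
    blurred = [
        sum(row[min(max(j, 0), last)] for j in range(i - half, i + half + 1)) / taps
        for i in range(len(row))
    ]
    # Neighbour-to-neighbour variation before and after the re-blur.
    d_orig = [abs(row[i] - row[i - 1]) for i in range(1, len(row))]
    d_blur = [abs(blurred[i] - blurred[i - 1]) for i in range(1, len(row))]
    # Variation the re-blur removed, clamped so it is never negative.
    removed = [max(0.0, a - b) for a, b in zip(d_orig, d_blur)]
    total = sum(d_orig)
    if total == 0:
        return 1.0  # assumed convention: a flat row has no detail left to lose
    return (total - sum(removed)) / total

sharp_row = [0] * 8 + [255] * 8
soft_row = [0, 0, 0, 0, 32, 64, 96, 128, 159, 191, 223, 255, 255, 255, 255, 255]

print("sharp:", round(reblur_score(sharp_row), 2))
print("soft: ", round(reblur_score(soft_row), 2))
```

The hard edge loses most of its variation to the re-blur and scores low; the ramp was already smooth, loses little, and scores much closer to 1.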

On the optics side, the camera world measures sharpness as MTF50 and acutance from a slanted-edge test chart (the ISO 12233 method): MTF50 is the spatial frequency at which contrast falls to half, and acutance weights the response by the eye's contrast sensitivity, which is why it tracks perceived sharpness better than MTF50 alone.

The companion tool below puts three of these in one place. Download the no-reference blur & detail-loss detector — a small Python script that scores a single frame with the Crete-Roffet perceptual blur measure, the mean edge width, and the Laplacian variance, and, given two renditions of the same shot, reports which one kept more high-frequency detail. In its self-test it reads a crisp synthetic frame at perceptual-blur 0.24 with ~2.3-pixel edges, and the same frame after low-pass filtering at 0.93 with ~9.2-pixel edges — roughly four times wider — which is the edge-spreading of Figure 1 turned into a number. It is a triage tool, not a calibrated metric, and it cannot tell intentional softness from an artifact — so confirm every flag by eye.

Where Fora Soft fits in

Fora Soft has built video software since 2005 — streaming, WebRTC conferencing, e-learning, OTT, telemedicine, and surveillance — and detail loss is the artifact whose stakes change most by vertical. In a telemedicine stream, softness can erase the texture of a skin lesion a clinician needs to see; in surveillance, it blurs the license plate or face that is the whole point of the recording; in e-learning, it turns small slide text illegible. We treat it as a measurement problem with a clear method: never judge a downscaled rung against its own soft reference — measure it against the master so lost detail counts — and on reference-free live and UGC paths, score edge width and high-frequency energy to flag soft frames, then confirm by eye whether the softness was intended before acting. The fixes follow the cause: more bits where detail lives, a denoiser tuned to keep texture, and the right resolution rung for the screen.

Call to action

Talk to a video engineer — book a 30-minute scoping call to talk through your video blur plan.
See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.

References

S. Li, F. Zhang, L. Ma, and K. Ngan, "Image Quality Assessment by Separately Evaluating Detail Losses and Additive Impairments," IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 935–949, Oct. 2011. Tier 1 (metric-author defining work). Defines the Detail Loss Metric (DLM) — separately measuring the loss of detail that affects content visibility from additive impairments — which VMAF adopts (renamed ADM) as a core feature. Basis for the metric-behavior section and Table 1. https://ieeexplore.ieee.org/document/5765502
VMAF Features (Netflix/vmaf, resource/doc/features.md), Netflix, accessed 2026-06-24. Tier 3 (first-party tooling / metric owner's documentation). States that VMAF's three core features are VIF, Motion2, and ADM, that "ADM was previously named Detail Loss Metric (DLM)," and that DLM measures "the loss of details which affects the content visibility." Basis for the claim that VMAF carries a detail-loss feature. https://github.com/Netflix/vmaf/blob/master/resource/doc/features.md
H. R. Sheikh and A. C. Bovik, "Image Information and Visual Quality," IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430–444, 2006. Tier 1 (metric-author defining work). Defines Visual Information Fidelity (VIF), the detail-fidelity feature VMAF fuses alongside the Detail Loss Metric. Basis for the VIF reference in the metric-behavior section. https://ieeexplore.ieee.org/document/1576816
F. Crété-Roffet, T. Dolmiere, P. Ladret, and M. Nicolas, "The Blur Effect: Perception and Estimation with a New No-Reference Perceptual Blur Metric," SPIE Electronic Imaging, Human Vision and Electronic Imaging XII, 2007. Tier 1 (metric-author defining work). Defines the re-blur perceptual blur metric: humans cannot perceive the difference between a blurred image and the same image re-blurred, so the drop in local variation after re-blurring is an inverse measure of sharpness. Basis for the Crete-Roffet description and the companion detector. https://hal.science/hal-00232709/document
P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi, "A No-Reference Perceptual Blur Metric," Proc. IEEE International Conference on Image Processing (ICIP), vol. 3, pp. 57–60, 2002. Tier 1 (metric-author defining work). Defines the edge-width blur metric: blur spreads edges, so average edge width (distance between the local minimum and maximum around each edge) measures blur. Basis for the edge-width description and the companion detector. https://infoscience.epfl.ch/record/87069
R. Ferzli and L. J. Karam, "A No-Reference Objective Image Sharpness Metric Based on the Notion of Just Noticeable Blur (JNB)," IEEE Transactions on Image Processing, vol. 18, no. 4, pp. 717–728, Apr. 2009. Tier 1 (metric-author defining work). Introduces the just-noticeable-blur threshold — a contrast-dependent edge width below which blur is imperceptible. Basis for the JNB paragraph. https://ieeexplore.ieee.org/document/4799375
N. D. Narvekar and L. J. Karam, "A No-Reference Image Blur Metric Based on the Cumulative Probability of Blur Detection (CPBD)," IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2678–2683, Sept. 2011. Tier 1 (metric-author defining work). Pools per-edge blur-detection probabilities into the CPBD score, which tracks human judgments on Gaussian-blurred and JPEG2000-compressed images. Basis for the CPBD paragraph. https://ieeexplore.ieee.org/document/5739529
S. Pertuz, D. Puig, and M. A. Garcia, "Analysis of Focus Measure Operators for Shape-from-Focus," Pattern Recognition, vol. 46, no. 5, pp. 1415–1432, 2013. Tier 5 (peer-reviewed). Surveys focus-measure operators including the variance of the Laplacian; documents that Laplacian-based operators perform best at low noise but are the most noise-sensitive. Basis for the Laplacian-variance caveat. https://www.sciencedirect.com/science/article/abs/pii/S0031320312004736
Sharpness — MTF (Modulation Transfer Function) and SFR, Imatest documentation, accessed 2026-06-24. Tier 3 (first-party tooling) / references ISO 12233. Defines MTF50 (the spatial frequency at 50% response) and acutance (the SFR weighted by the human contrast sensitivity function), and the ISO 12233 slanted-edge method. Basis for the MTF50/acutance paragraph. https://www.imatest.com/imaging/sharpness/
M. Yuen and H. R. Wu, "A Survey of Hybrid MC/DPCM/DCT Video Coding Distortions," Signal Processing, vol. 70, no. 3, pp. 247–278, 1998. Tier 5 (foundational peer-reviewed). The classic taxonomy of block-DCT coding artifacts; attributes blurring and detail loss to coarse quantization of high-frequency coefficients. Basis for the quantization-cause framing. https://www.sciencedirect.com/science/article/abs/pii/S0165168498001285
"Acutance," Wikipedia, accessed 2026-06-24. Tier 6 (orientation). Summarizes acutance as perceived edge sharpness and the way edge contrast enhancement (sharpening) raises it without adding real detail. Orientation only; the optics facts are cited to Imatest/ISO 12233 above. https://en.wikipedia.org/wiki/Acutance

Related glossary terms

Blur, VMAF, Ringing, Quantization, PSNR, Fidelity, Blocking, SSIM