Why this matters
If your content has hard edges (text, captions, logos, line art, charts, screen recordings, license plates, subtitles), ringing and mosquito noise are the artifacts most likely to make a "clean" encode look cheap, because the damage clusters exactly where the eye is already looking. They are also the artifacts that best expose the weakness of a single quality number: the score is averaged over the whole frame, and the ringing lives in the few percent of pixels next to the edges, so a mean-pooled VMAF can read fine while the title card shimmers. This article is for a video engineer, encoding lead, or QA engineer who can see the halos around text but wants to know exactly why they form, why a frame score under-reports them, and which codec features and measurement habits actually catch them. Get this right and your screen-content and graphics-heavy encodes stop embarrassing you in the one place viewers notice first.
What ringing and mosquito noise look like
Start with the look, because the two artifacts are the same physics seen at two different timescales.
Ringing is a set of faint, repeating ripples that run parallel to a sharp edge, fading as they move away from it — a high-contrast border seems to echo into the flat area beside it. Around black text on a white background it shows as a grey haze hugging each letter; around a bright logo on a dark backdrop it shows as ghost outlines stepping outward. The name is literal: the picture signal "rings" beside the edge the way a struck bell keeps sounding after the strike. Ringing is a spatial artifact — you can see it in a single paused frame.
Mosquito noise is ringing that will not hold still. When the edge is moving — a caption scrolling, a logo bug animating, a person's shoulder crossing a clean wall — the ripple pattern is recomputed slightly differently on every frame, so instead of a fixed halo you get a cloud of small specks that flicker and crawl around the edge. It earns its name honestly: the busy, shifting dots look like a swarm of mosquitoes hovering around the object. Mosquito noise is a temporal artifact — it is far more visible when the video plays than when you pause it, which is exactly why a frame-by-frame check can walk right past it.
It helps to separate these from their neighbours in the artifact field guide. Blocking is the block grid laid over the whole picture; banding is the staircase in smooth gradients. Ringing and mosquito noise are different: they cling to edges, and they are loudest where a hard edge sits on an otherwise calm surface.
Where they come from: the Gibbs phenomenon
To see why an edge rings, follow what the codec does to it. The mechanism is the same quantization that produces blocking, but its effect on an edge is different and worth tracing once.
A codec splits each frame into blocks and rewrites every block with the Discrete Cosine Transform (DCT): it describes the block as a sum of wave patterns, from a flat base tone up through finer and finer ripples. A smooth patch is mostly the low-frequency waves. A sharp edge is the opposite: a clean step from dark to light needs the high-frequency waves to snap the transition into place, so the edge's energy spreads across many coefficients, all the way up to the block's highest frequencies.
Now comes the bit-saving step. Quantization divides each coefficient's strength by a step size and rounds, and it rounds the small high-frequency coefficients straight to zero. That is fine for a smooth block, which barely used them. For the edge block it is the whole problem: the decoder has to rebuild a sharp step from a truncated set of frequencies, and a step rebuilt from too few frequencies cannot stay flat on either side. It overshoots just past the edge, dips back, overshoots again less, and oscillates its way to calm — the ripples you see as ringing.
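The truncation described above can be reproduced on a single 8-sample block. This is a minimal sketch, assuming a plain orthonormal DCT-II and a crude "zero the upper half of the coefficients" rule as a stand-in for real quantization; the helper name `dct_matrix` and the sample values 16 and 235 are illustrative, not taken from any codec.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an n x n matrix (each row is one wave pattern)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    x = np.arange(n).reshape(1, -1)   # sample index
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)        # the flat (DC) row has its own scale
    return m

N = 8
D = dct_matrix(N)

# A hard dark-to-light step in the middle of the block (hypothetical values).
edge = np.array([16, 16, 16, 16, 235, 235, 235, 235], dtype=float)

coeffs = D @ edge        # forward transform
kept = coeffs.copy()
kept[4:] = 0.0           # assumption: the four highest frequencies round to zero
rebuilt = D.T @ kept     # inverse transform (D is orthonormal)

overshoot = rebuilt.max() - edge.max()    # how far the bright side shoots past 235
undershoot = edge.min() - rebuilt.min()   # how far the dark side dips below 16

print("rebuilt:", np.round(rebuilt, 1))
print("overshoot:", round(overshoot, 1), "undershoot:", round(undershoot, 1))
```

Both values come out positive: the rebuilt block no longer stays flat on either side of the step, which is the ripple the paragraph describes. A real encoder quantizes with a step size instead of a hard cut-off, so the exact numbers will differ.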
This is not a codec bug; it is a law of Fourier reconstruction called the Gibbs phenomenon. Reconstruct any sharp jump from a limited band of frequencies and the result overshoots the jump by a fixed proportion no matter how many frequencies you keep. The textbook figure is striking: the overshoot settles at about 9% of the jump height (precisely ≈8.95%), and adding frequencies makes the ripples narrower and tighter to the edge but never shorter. Take a mid-contrast edge that steps by 128 code values; the Gibbs overshoot is about 0.0895 × 128 ≈ 11 code values of ghost beside the edge. A black-to-white text edge jumps further and overshoots harder — which is why text rings so visibly.
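The "fixed proportion" claim is easy to check numerically. The sketch below sums the Fourier series of a -1/+1 square wave (a jump of 2) and evaluates it at its first peak next to the jump, which for the sum of the first n odd harmonics sits at x = π/(2n). The function name `gibbs_overshoot` is illustrative.

```python
import math

def gibbs_overshoot(n_terms: int) -> float:
    """Overshoot of the n-term Fourier sum of a -1/+1 square wave,
    expressed as a fraction of the jump height (which is 2)."""
    x_peak = math.pi / (2 * n_terms)   # first maximum beside the jump
    peak = (4 / math.pi) * sum(
        math.sin((2 * k - 1) * x_peak) / (2 * k - 1)
        for k in range(1, n_terms + 1)
    )
    return (peak - 1.0) / 2.0

for n in (10, 100, 1000):
    print(n, "terms -> overshoot fraction", round(gibbs_overshoot(n), 4))

# Applied to the article's 128-code-value edge:
print("ghost beside a 128-step edge:", round(gibbs_overshoot(1000) * 128, 1))
```

Adding terms moves the peak closer to the edge (x_peak shrinks) but the printed fraction stays close to 0.09 throughout.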
One distinction the literature is careful about, and you should be too. The first bump — the signal shooting past the edge value — is overshoot; the decaying ripples that follow it are ringing. They usually arrive together and most people call the whole thing "ringing", but the difference matters for one fix we will meet later: a little overshoot is actually how sharpening makes edges look crisp.
Figure 1. How ringing is born. A sharp edge (left) carries its energy in high-frequency DCT coefficients; quantization rounds those away (middle); the decoder rebuilds the edge from too few frequencies, so it overshoots by ~9% and rings (right). More frequencies make the ripples tighter, never shorter — the Gibbs phenomenon.
Why mosquito noise moves
Ringing becomes mosquito noise the moment the edge moves, and the reason is that the codec re-solves the edge from scratch on every frame.
Between one frame and the next, a moving edge sits at a slightly different sub-pixel position, falls across the block grid differently, and is predicted from a different reference patch by motion compensation. Each of those changes hands the quantizer a slightly different set of high-frequency coefficients to round, so the overshoot-and-ripple pattern lands in a slightly different place each time. The eye integrates these jittering halos over time and reads them as a swarm of crawling specks — luminance and colour fluctuating frame to frame in the smoothly textured zone hugging the edge. That temporal flicker is the signature of mosquito noise, and it is why it is classed as a moving artifact while plain ringing is a still one.
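A toy version of that frame-to-frame re-solving: the sketch below slides a 1-D edge one sample per frame across a fixed 8-sample block grid, quantizes each frame's DCT coefficients with a uniform step, and measures how much the coding error at each pixel changes over time. It assumes no motion compensation and a single hypothetical quantizer step of 40, so it only illustrates the grid-alignment part of the mechanism; `dct_matrix`, `code_blocks`, and `moving_edge` are illustrative names.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def code_blocks(signal: np.ndarray, q_step: float, block: int = 8) -> np.ndarray:
    """Transform, uniformly quantize, and rebuild a 1-D signal block by block."""
    D = dct_matrix(block)
    out = np.empty_like(signal, dtype=float)
    for start in range(0, len(signal), block):
        coeffs = D @ signal[start:start + block]
        coeffs = np.round(coeffs / q_step) * q_step   # quantize, then dequantize
        out[start:start + block] = D.T @ coeffs
    return out

def moving_edge(pos: int, length: int = 16, lo: float = 16.0, hi: float = 235.0) -> np.ndarray:
    """A dark-to-light step whose edge sits at sample index `pos`."""
    sig = np.full(length, lo)
    sig[pos:] = hi
    return sig

# Three consecutive "frames": the edge moves one sample per frame.
frames = [moving_edge(p) for p in (4, 5, 6)]
errors = np.array([code_blocks(f, q_step=40.0) - f for f in frames])

# Per-pixel standard deviation of the coding error over time: a crude flicker map.
flicker = errors.std(axis=0)
print(np.round(flicker, 1))
```

The flicker map is non-zero in the block the edge travels through and zero in the flat block beside it, which matches the picture of a halo that changes shape every frame while the calm area stays put.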
Two practical consequences follow. First, mosquito noise is worst on high-contrast moving edges over calm backgrounds — scrolling white subtitles, an animated logo, a dark railing panning against a bright sky — because the ripples have a quiet surface to flicker against. Second, because the artifact is defined by its change over time, any measurement that looks at one frame in isolation is structurally blind to it. Hold that thought for the metrics section.
Figure 2. Two timescales of one artifact. Ringing is a fixed halo you see in a paused frame (top). Mosquito noise is the same ripple recomputed every frame as the edge moves, so it shimmers and crawls (bottom, three frames) — visible in motion, easy to miss when paused.
Where they hurt most
Ringing and mosquito noise are picky about content, and knowing where they bite tells you where to look.
They are loudest on synthetic, high-contrast edges: text and captions, station logos and watermarks, user-interface chrome in a screen recording, cartoon and anime line art, charts and slides in an e-learning capture, and the crisp boundaries of graphics overlaid on video. Camera-shot footage of soft natural scenes rings far less, because real edges are already a little soft and there is texture nearby to mask the ripples. The artifact needs two things to be obvious — a hard edge and a calm surface beside it — and screen content supplies both in abundance. This is the artifact that makes "we'll just compress the webinar" go wrong.
Colour makes it worse in one specific way. Most delivery pipelines store colour at reduced resolution (chroma subsampling, typically 4:2:0, where each chroma sample covers a 2×2 block of luma pixels), so any ripple in the colour channels is stretched over a wider band when the decoder upsamples it. Ringing on a coloured edge can therefore show as fringes of the wrong colour beside it: a magenta or cyan ghost along a saturated border. The cause-side detail of why chroma is subsampled lives in the Video Encoding section's color article; here we just note that subsampled colour rings in colour.
How modern codecs fight ringing: in-loop filters
Because ringing is baked into how block transforms handle edges, every modern codec ships dedicated machinery to clean it up before the frame is shown — and, importantly, before the frame is stored as a reference for the next one. These are in-loop filters: normative, built into the codec, applied inside the encoding loop. Three of them matter for edges.
The deblocking filter comes first. Each block is transformed and quantized on its own, so the error shows up as visible seams where neighbouring blocks meet, and deblocking smooths those seams. Its main job is blocking, but because ringing often rides along a block boundary, a good deblocking pass softens some of it too. H.264, HEVC, and AV1 all carry one.
HEVC adds a second filter aimed more squarely at ringing: Sample Adaptive Offset (SAO). After deblocking, SAO classifies each pixel and adds a small correcting offset to whole groups of them. Its edge-offset mode looks at how each pixel compares to its neighbours along a chosen direction and nudges the overshoots and undershoots back toward where the edge should be — which is precisely the ringing that high-frequency quantization left behind. SAO exists in the H.265/HEVC standard largely to attenuate the ringing produced by coarse quantization of the high-frequency coefficients at strong edges.
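The edge-offset idea can be sketched in one dimension. This is a simplified illustration, not the normative HEVC process: it classifies each sample against its two neighbours along a single direction (local minimum, two kinds of corner, local maximum) and adds a per-category offset. In the real codec the encoder chooses the direction and the offsets and signals them in the bitstream; the offset values below are hypothetical.

```python
from typing import Dict, List

def sao_edge_category(a: int, c: int, b: int) -> int:
    """Classify sample c against neighbours a and b along one direction."""
    if c < a and c < b:
        return 1                      # local minimum (undershoot)
    if (c < a and c == b) or (c == a and c < b):
        return 2                      # corner on the low side of an edge
    if (c > a and c == b) or (c == a and c > b):
        return 3                      # corner on the high side of an edge
    if c > a and c > b:
        return 4                      # local maximum (overshoot)
    return 0                          # monotonic or flat: leave alone

def apply_edge_offset(row: List[int], offsets: Dict[int, int]) -> List[int]:
    """Add the category's offset to every interior sample.
    Categories are computed from the unmodified input row."""
    out = list(row)
    for i in range(1, len(row) - 1):
        cat = sao_edge_category(row[i - 1], row[i], row[i + 1])
        out[i] = row[i] + offsets.get(cat, 0)
    return out

# A decoded edge with a dip (16) before the step and a bump (204) after it.
row = [20, 20, 16, 20, 200, 204, 200, 200]
offsets = {1: 3, 2: 1, 3: -1, 4: -3}   # hypothetical: raise minima, lower maxima

print(apply_edge_offset(row, offsets))
```

The dip and the bump are pulled back toward the flat levels while the two samples on the step itself fall into category 0 and are untouched, which is why the filter can attenuate ringing without softening the edge.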
AV1 goes furthest with the Constrained Directional Enhancement Filter (CDEF) — the cleanest example of a deringing filter built to remove the ripple without smearing the edge that caused it. CDEF first works out the direction of the edge in each block, choosing one of eight orientations, then runs a constrained low-pass filter primarily along that edge and weakly across it. The "constrained" part is the trick: a per-tap constraint function ignores neighbouring pixels that differ too much from the centre, so the filter cleans the ripples in the calm zone beside the edge but leaves the edge itself sharp. Because it follows edges instead of blurring in all directions, CDEF removes ringing and DCT "basis noise" while keeping the crisp transition intact. (AV1 then has an optional loop-restoration stage — Wiener and self-guided filters — for general clean-up.) The full filter is documented by its authors, Midtskogen and Valin, in the CDEF paper.
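The per-tap constraint is the part worth seeing in code. The sketch below is modelled on the kind of constraint function the CDEF paper describes; the exact arithmetic (a threshold reduced by a right-shifted magnitude) and the parameter names `strength` and `damping` are assumptions that should be checked against the AV1 specification before being relied on.

```python
def constrain(diff: int, strength: int, damping: int) -> int:
    """Limit how much a neighbouring pixel may pull on the centre pixel.

    diff     : neighbour value minus centre value
    strength : filter strength; small differences up to this size pass through
    damping  : controls how quickly larger differences are suppressed
    """
    if strength == 0:
        return 0
    mag = abs(diff)
    # damping - floor(log2(strength)), never negative
    shift = max(0, damping - (strength.bit_length() - 1))
    limited = max(0, min(mag, strength - (mag >> shift)))
    return limited if diff >= 0 else -limited

# Small differences (ripple in a calm zone) are passed to the low-pass filter...
print(constrain(2, strength=4, damping=6))     # small positive difference
print(constrain(-3, strength=4, damping=6))    # small negative difference
# ...while a large difference (the real edge) contributes nothing.
print(constrain(100, strength=4, damping=6))
```

Because a neighbour across a strong edge contributes zero, the low-pass taps average only pixels that already look like the centre pixel, so the ripple is smoothed and the transition is left alone.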
Figure 3. How codecs de‑ring inside the loop. Deblocking softens block seams; HEVC's Sample Adaptive Offset nudges edge over/undershoots back; AV1's CDEF detects the edge direction (1 of 8) and filters along it with a constrained low-pass, cleaning the ripple without dulling the edge.
On older content, or when you receive an already-damaged file, a post-processing deringing filter is the fallback. FFmpeg's legacy libpostproc carries one (pp=dr, the MPEG-4-era deringer), though it is being retired and was only ever meant for old MPEG formats; modern pipelines lean on the in-loop filters above and on simply giving edges more bits. The encode-side levers are the obvious ones: raise the bitrate or lower the quantizer so fewer high-frequency coefficients are discarded; keep the in-loop filters enabled, because turning them off to "preserve sharpness" avoids a little softness at the cost of a lot of ringing; and use a codec with the right tools, since screen-content and graphics-heavy material benefits from the newer codecs' dedicated edge handling. The detail-loss trade-off of filtering too hard is the subject of Blur and detail loss.
How the metrics react, and where they lie
Here is where ringing behaves differently from its sibling banding, and the difference is instructive.
Ringing is real pixel error. The ripples beside the edge genuinely differ from the original pixels, so unlike banding — whose one-code-value error is nearly invisible to the math — ringing does register in PSNR, SSIM, and VMAF. SSIM's structural term reacts to the disrupted edge; VMAF's detail-loss and fidelity features react to the added ripple. So the first thing to say is honest: the full-reference metrics are not blind to ringing the way they are to banding.
But ringing hides in two other ways, and both are about where the error sits rather than whether it exists.
The first is spatial pooling. Ringing is concentrated in the thin band of pixels next to edges — often a few percent of the frame — while the rest of the picture is clean. A frame score averages the error over every pixel, so the loud edge error gets diluted by the quiet majority. Walk the arithmetic. Suppose the ringing band covers about 5% of the frame with a mean squared error of 100 (an RMS error around 10 code values, consistent with that ~11-value Gibbs overshoot decaying), and the other 95% is near-perfect:
Frame MSE = 0.05 × 100 + 0.95 × 0 = 5
Frame PSNR = 10 · log10(255² / 5)
= 10 · log10(65025 / 5)
= 10 · log10(13005)
= 41.1 dB → reads "very good"
Edge-band PSNR = 10 · log10(255² / 100)
= 10 · log10(650.25)
= 28.1 dB → reads "visibly degraded"
A 13 dB gap between the frame and the edges it is built from. The number is not lying about the average; it is lying about the experience, because the viewer's eye is on the 28 dB edges, not the 41 dB average. This is the same trap covered in pooling per-frame scores into one number: a mean hides a localized defect.
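The same arithmetic in a few lines of Python, so the band fraction and error level can be varied. The inputs (5% of the frame, MSE of 100 in the band, 0 elsewhere) are the article's illustrative assumptions, not measurements.

```python
import math

def psnr(mse: float, peak: float = 255.0) -> float:
    """PSNR in dB for 8-bit content, given a mean squared error."""
    return 10.0 * math.log10(peak ** 2 / mse)

band_fraction = 0.05   # share of pixels in the ringing band (assumed)
band_mse = 100.0       # mean squared error inside the band (assumed)
rest_mse = 0.0         # the remaining pixels are treated as perfect

frame_mse = band_fraction * band_mse + (1.0 - band_fraction) * rest_mse

frame_db = psnr(frame_mse)
band_db = psnr(band_mse)

print(f"frame PSNR     : {frame_db:.1f} dB")
print(f"edge-band PSNR : {band_db:.1f} dB")
print(f"gap            : {frame_db - band_db:.1f} dB")
```

When the rest of the frame is error-free, the gap depends only on the band fraction: it equals 10 · log10(1 / band_fraction), so a band covering 5% of the frame always hides about 13 dB.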
The second is time. Mosquito noise is defined by flicker, and the popular full-reference metrics score one frame against its reference and then mean-pool across frames. A handful of frames where the swarm is bad get averaged in with many where it is mild, and the temporal crawl — the thing the viewer actually notices — never enters the number at all. A per-frame metric has no axis for "this shimmers."
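A small illustration of how pooling changes the verdict. The per-frame scores are made up for the example: 90 clean frames and 10 frames where the swarm is bad.

```python
import numpy as np

# Hypothetical per-frame quality scores on a 0-100 scale.
scores = np.array([96.0] * 90 + [70.0] * 10)

mean_score = scores.mean()            # what a mean-pooled report shows
p5_score = np.percentile(scores, 5)   # low-percentile pooling
worst = scores.min()                  # worst single frame

print(f"mean           : {mean_score:.1f}")
print(f"5th percentile : {p5_score:.1f}")
print(f"minimum        : {worst:.1f}")
```

The mean stays in the nineties while the 5th percentile and the minimum land on the bad frames. Neither pooled number measures flicker itself; low-percentile pooling only stops the bad frames from being averaged away.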
The strongest evidence that metrics under-value this artifact comes from the people who built the filter to remove it. When AV1's CDEF deringing was tested, it produced a statistically significant subjective improvement on half the test clips — the kind of gain normally worth 5–10% in coding efficiency — while the objective improvement in PSNR and SSIM was only about 1%. As the filter's authors put it, the visual improvements that motivate deringing are mostly outside the evaluation ability of primitive objective tools like PSNR or SSIM. When removing an artifact barely moves the metric but clearly helps viewers, the metric is under-counting the artifact.
| Metric | What it measures | Reference needed | Where it lies on ringing & mosquito noise |
|---|---|---|---|
| PSNR | Mean pixel error (dB) | Full-reference | Sees the ripple energy, but whole-frame averaging dilutes edge-localized ringing; no temporal axis for mosquito flicker |
| SSIM | Structural similarity (0–1) | Full-reference | Reacts to disrupted edge structure; still per-frame and window-averaged, so localized rings are diluted |
| VMAF | Fused perceptual score (0–100) | Full-reference | Detail-loss features react to ringing better than PSNR; default model is spatial, so mosquito's temporal crawl is under-weighted |
| VMAF (temporal pooling) | Per-frame VMAF, percentile-pooled | Full-reference | Min/low-percentile pooling surfaces bad-frame ringing a mean would hide; still no true flicker measure |
| No-reference edge-busyness | Ripple energy beside edges; frame-to-frame change | No-reference | Built for this: scores the near-edge band and its temporal jitter; the right tool for live, UGC, and screen content |
Table 1. How five ways of measuring treat ringing and mosquito noise. The full-reference metrics see ringing (it is real pixel error) but dilute it by averaging over space and time; mosquito noise's temporal flicker needs either percentile pooling or a no-reference edge-busyness detector. Read the edge regions, not just the frame mean.
Figure 4. The dilution trap. Ringing concentrates in ~5% of pixels next to edges (28 dB there), while the calm 95% is near-perfect; mean-pooling the frame reports a reassuring 41 dB. The defect the viewer sees is averaged away. Score the edge band and the motion, not just the frame.
Common mistake: signing off screen content on a mean-pooled frame score. The expensive error here is reading "VMAF 95, ship it" on a webinar, a slide capture, or a logo-heavy promo, where every defect is clustered on text and graphic edges that make up a few percent of the frame. The mean averages the shimmer away, and the artifact reaches the one content type — sharp synthetic edges — where viewers are most sensitive to it. When the content has hard edges, measure the edge regions specifically, use low-percentile (not mean) temporal pooling so bad frames cannot hide, and watch a moving clip rather than a still — mosquito noise only appears in motion. A second pitfall sits at the other end: over-sharpening that adds halo on purpose can raise apparent sharpness while seeding ringing the metric won't flag, so do not chase acutance past the point the edge starts to echo.
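One way to "measure the edge regions specifically" is to build a mask around strong edges in the reference and score only the pixels inside it. The sketch below is a minimal version under stated assumptions: a gradient-magnitude threshold of 50 and a band radius of 2 pixels are arbitrary illustrative choices, the function names are hypothetical, and the test image is synthetic.

```python
import numpy as np

def edge_band_mask(ref: np.ndarray, grad_thresh: float = 50.0, radius: int = 2) -> np.ndarray:
    """Boolean mask of pixels within `radius` of a strong edge in the reference."""
    gy, gx = np.gradient(ref.astype(np.float64))
    edges = np.hypot(gx, gy) > grad_thresh
    # Dilate the edge map with a square window by OR-ing shifted copies.
    padded = np.pad(edges, radius, mode="constant", constant_values=False)
    h, w = edges.shape
    band = np.zeros_like(edges)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            band |= padded[dy:dy + h, dx:dx + w]
    return band

def psnr(ref: np.ndarray, dist: np.ndarray, mask=None, peak: float = 255.0) -> float:
    """PSNR over the whole frame, or only over the masked pixels."""
    err = (ref.astype(np.float64) - dist.astype(np.float64)) ** 2
    mse = err[mask].mean() if mask is not None else err.mean()
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic frame: a vertical dark-to-light edge, with ripple added beside it.
ref = np.full((64, 64), 16.0)
ref[:, 32:] = 235.0
dist = ref.copy()
dist[:, 29:31] += 10.0   # halo on the dark side
dist[:, 33:35] -= 10.0   # halo on the bright side

band = edge_band_mask(ref)
frame_psnr = psnr(ref, dist)
band_psnr = psnr(ref, dist, mask=band)
print(f"frame: {frame_psnr:.1f} dB, edge band: {band_psnr:.1f} dB")
```

Reporting the edge-band figure next to the frame figure makes the dilution visible: the band score drops well below the frame score even though both describe the same encode.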
A note on acutance: when a little overshoot is wanted
One subtlety keeps deringing honest. The first part of the artifact — the overshoot right at the edge — is the same mechanism sharpening filters use deliberately: a small overshoot increases acutance, the apparent sharpness of an edge, by exaggerating the contrast across it. This is why an unsharp mask "pops" an image, and why some JPEG encoders deliberately let the edge overshoot and then clip it to mask ringing. The line between "crisp" and "ringing" is therefore perceptual, not absolute: a touch of overshoot reads as sharp, a train of decaying ripples reads as a defect. When you tune a deringing filter or a sharpener, you are choosing where on that line to sit — which is one more reason the eye, not the metric, is the final judge here.
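The deliberate overshoot is easy to see on a 1-D edge. This is a minimal unsharp-mask sketch, assuming a three-tap blur and an amount of 1.0; both are illustrative choices and `unsharp_1d` is a hypothetical helper.

```python
import numpy as np

def unsharp_1d(signal: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Sharpen by adding back the difference between the signal and a blurred copy."""
    kernel = np.array([0.25, 0.5, 0.25])
    padded = np.pad(signal, 1, mode="edge")            # repeat border samples
    blurred = np.convolve(padded, kernel, mode="valid")
    return signal + amount * (signal - blurred)

edge = np.array([0, 0, 0, 0, 100, 100, 100, 100], dtype=float)
sharp = unsharp_1d(edge)
print(sharp)
```

The output dips below 0 just before the step and rises above 100 just after it: one overshoot on each side and no trailing ripples. That single bump is what reads as "crisp"; a decaying train of further bumps is what reads as ringing.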
Where Fora Soft fits in
Fora Soft has built video software since 2005 — streaming, WebRTC conferencing, e-learning, OTT, telemedicine, and surveillance — and ringing is the artifact our work runs into most because so much of it is edge-heavy: shared slides and screens in e-learning and conferencing, captions and channel logos in OTT, and the text that matters most in surveillance, like license plates and timestamps. We treat it as a measurement problem with a known shape: find the strong edges, score the ripple in the band beside them and how much it flickers frame to frame, and never sign an edge-heavy encode off on a mean-pooled frame number that averages the shimmer away. The fixes then follow the cause — more bits where edges live, the right in-loop filters left on, the right codec for screen content — and where it helps a decision we point to our own benchmark data so you can check the method rather than take our word.
References
- Gibbs' Phenomenon, MIT 18.03SC Differential Equations (OpenCourseWare notes), Massachusetts Institute of Technology, 2011. Tier 1 (foundational primary mathematics). Establishes that the Fourier reconstruction of a jump discontinuity overshoots by a fixed ≈8.95% (~9%) of the jump regardless of the number of terms; the basis for the ringing mechanism and the 9% figure. https://ocw.mit.edu/courses/18-03sc-differential-equations-fall-2011/05cce833730ffd3c39f420a41ad82fd6_MIT18_03SCF11_s22_7text.pdf
- S. Midtskogen and J.-M. Valin, "The AV1 Constrained Directional Enhancement Filter (CDEF)," IEEE ICASSP, 2018 (arXiv:1602.05975). Tier 1 (filter-author defining work). Defines CDEF: an eight-direction search plus a constrained low-pass filter run along the edge that removes ringing and DCT basis noise without blurring the edge. Basis for the CDEF section and Figure 3. https://arxiv.org/abs/1602.05975
- C.-M. Fu, E. Alshina, A. Alshin, Y.-W. Huang, C.-Y. Chen, C.-Y. Tsai, et al., "Sample Adaptive Offset in the HEVC Standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1755–1764, 2012. Tier 1 (standard-author defining work). Specifies SAO's edge-offset and band-offset modes and their role in reducing ringing from high-frequency quantization. Basis for the SAO paragraph. https://ieeexplore.ieee.org/document/6324412
- G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, 2012. Tier 1 (standard overview by its editors). Names deblocking and SAO as HEVC's normative in-loop filters and their purpose. Basis for the in-loop-filter framing. https://ieeexplore.ieee.org/document/6316136
- P. List, A. Joch, J. Lainema, G. Bjøntegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 614–619, 2003. Tier 1 (filter-author defining work). Defines the H.264/AVC in-loop deblocking filter; basis for the deblocking description. https://ieeexplore.ieee.org/document/1218065
- C. Montgomery, "AV1: next generation video — The Constrained Directional Enhancement Filter," Mozilla Hacks, 2018. Tier 4 (credible deployer engineering blog). Explains normative vs non-normative and in-loop filters, the deblocking/CDEF/loop-restoration stack, and reports CDEF's ~5–10% subjective gain vs ~1% objective gain — the basis for the metric-blindness evidence. https://hacks.mozilla.org/2018/06/av1-next-generation-video-the-constrained-directional-enhancement-filter/
- "Mosquito noise in MPEG-compressed video: test patterns and metrics," NIST/ITS publication, 2000. Tier 5 (peer-reviewed/institutional). Characterizes mosquito noise as temporal edge distortion (fluctuating luminance/chrominance around high-contrast edges) and proposes test patterns and metrics. Basis for the temporal definition of mosquito noise. https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=28452
- M. Yuen and H. R. Wu, "A survey of hybrid MC/DPCM/DCT video coding distortions," Signal Processing, vol. 70, no. 3, pp. 247–278, 1998. Tier 5 (foundational peer-reviewed). The classic taxonomy of block-DCT coding artifacts, attributing ringing and mosquito noise to coarse quantization of high-frequency coefficients at edges. Basis for the artifact taxonomy. https://www.sciencedirect.com/science/article/abs/pii/S0165168498001285
- "Ringing artifacts," Wikipedia, accessed 2026-06-24. Tier 6 (orientation). Summarizes the signal-processing definition, the overshoot-vs-ringing distinction, the acutance trade-off, and the mozjpeg overshoot-and-clip deringing trick. Orientation only; primary facts cited to the sources above. https://en.wikipedia.org/wiki/Ringing_artifacts
- L. Liu et al., "PEA265: Perceptual Assessment of Video Compression Artifacts," arXiv:1903.00473, 2019. Tier 5 (peer-reviewed/institutional). A large video database with per-artifact perceptual labels (blockiness, blurriness, ringing, mosquito, jerkiness), supporting the claim that generic metrics under-capture localized and temporal edge artifacts. https://arxiv.org/abs/1903.00473
- pp filter, FFmpeg Filters Documentation (libpostproc deblock/dering subfilters), FFmpeg project, accessed 2026-06-24. Tier 3 (first-party tooling). Documents the legacy pp=dr deringing post-filter for older MPEG-family formats and its deprecation status. Basis for the post-processing fallback. https://ffmpeg.org/ffmpeg-filters.html#pp


