Published 2026-05-17 · 12 min read · By Nikolay Sapunov, CEO at Fora Soft
Why this matters
If you have ever watched a low-bitrate video stream and seen the tiles of an 8×8 grid pop into view across someone's face, you have looked at the failure mode that in-loop filtering exists to suppress. A product manager who knows that every modern decoder runs a chain of filters inside the coding loop, after quantisation, will stop assuming that a "softer" stream is a worse stream: it may simply be the deblocking filter doing its job. A streaming engineer who can tell SAO from ALF can read an encoder log without guessing. A founder who understands why AV1 chose CDEF rather than HEVC's SAO will recognise why AV1 hardware decoders ship with a different on-chip filter pipeline than HEVC ones. This article walks through each filter in turn, shows the arithmetic on a worked example, and ends with a comparison of which codec ships which filters and what each contributes to the final bitrate.
What "in-loop" really means
Every modern codec is built around a hybrid architecture: predict the next block from neighbours or from a previous frame, transform the residual, quantise the coefficients, and entropy-code the result. The decoder reverses those steps to reconstruct each block. Reconstruction is approximate because quantisation throws information away on purpose. The approximation error shows up at block edges as visible discontinuities, around sharp transitions as faint ripples (ringing), and across flat regions as low-amplitude noise or small shifts in brightness.
Two paths exist to clean that up. Post-filtering runs after the decoder is done and never affects the next frame's prediction. It is allowed to be aggressive because nothing depends on its output. In-loop filtering, by contrast, runs inside the prediction loop. The filtered frame, not the raw reconstruction, is the reference that both encoder and decoder use when they predict the next P or B frame. That is why every decoder must run the same filter chain as the encoder, in the same order and bit-exactly: even a one-level mismatch in a single pixel becomes a drift that grows with every motion-compensated frame.
In-loop filtering pays for itself because a cleaner reference frame gives better prediction for the frames that follow. Their residuals shrink, fewer bits are needed to code them, and the saving more than justifies the extra decoding work. Published measurements since H.264 put the gain at roughly 5–10% bitrate at the same PSNR from the deblocking filter alone, with a further 1–6% from each of SAO, ALF, and CDEF where present.
Figure 1. The in-loop filter sits between the reconstructed block and the reference-frame buffer. Both the encoder and every decoder run the same filter chain in the same order; otherwise the reference frames diverge and the stream visibly degrades.
The deblocking filter — smoothing the grid
The first filter every modern codec applies is the deblocking filter. The artefact it targets is the blocking artefact: a visible step in pixel values at the edge between two coded blocks, caused by independent quantisation of each block. At low bitrates the step can be 8–20 luma levels tall and it traces out the block grid across the frame.
The deblocking filter inspects each horizontal and vertical block edge and decides, edge by edge, how strongly to smooth across it. The decision uses three inputs: the boundary strength (was either side intra-coded? does either side carry non-zero transform coefficients? do the two sides use different reference frames or noticeably different motion vectors?), the activity on each side (a textured region tolerates a larger step than a flat one), and the quantisation parameter (coarser quantisation produces larger steps, so the filter's thresholds rise with QP). The filter itself is short and symmetric. H.264's normal filter modifies up to 2 pixels on each side of an edge and its strong filter, used on intra macroblock edges, up to 3; HEVC keeps the 3-pixel maximum but moves deblocking onto an 8×8 grid with simpler decisions; VVC adds long filters that modify up to 7 pixels per side where two large blocks meet.
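The boundary-strength part of that decision can be sketched in Python. This is a simplified, HEVC-style version assuming one motion vector per side; the `EdgeSide` record and the function names are hypothetical, the full rules in the H.265 specification also compare the number of motion vectors and handle bi-prediction, and the QP-dependent thresholds come from spec tables that are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class EdgeSide:
    """Hypothetical summary of one side of a block edge (one MV per side)."""
    intra: bool
    nonzero_coeffs: bool     # does this side's transform block carry coefficients?
    ref_pic: int             # index of the reference picture it predicts from
    mv: tuple[int, int]      # motion vector in quarter-sample units


def boundary_strength(p: EdgeSide, q: EdgeSide, transform_edge: bool) -> int:
    """Simplified HEVC-style boundary strength: 2 is strongest, 0 skips the edge."""
    if p.intra or q.intra:
        return 2
    if transform_edge and (p.nonzero_coeffs or q.nonzero_coeffs):
        return 1
    if p.ref_pic != q.ref_pic:
        return 1
    # 4 quarter-sample units = one full luma sample of motion difference.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0
```

An edge with a strength of 0 is skipped entirely; in HEVC, chroma edges are filtered only where the strength is 2.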
A worked example. Suppose two adjacent 4×4 luma blocks on a flat surface meet at a vertical edge. Along one row, the four pixels on the left read 148, 150, 151, 152 and the four on the right read 162, 161, 160, 159. That is a 10-level step at the boundary, visible on a high-quality monitor. For illustration, take a simplified symmetric kernel that pulls the two pixels straddling the edge toward each other with 3:1 weights (the normative H.264 and HEVC filters use different weights and clip every change to a QP-dependent limit):
- new left-boundary pixel = round((152 × 3 + 162 × 1) / 4) = round(154.5) = 155
- new right-boundary pixel = round((152 × 1 + 162 × 3) / 4) = round(159.5) = 160
The 10-level step becomes a 5-level transition, and a full filter also nudges the next pixel on each side, by a smaller amount, in the same direction. The result is a visually continuous edge rather than a step. Critically, the filter is conservative: it backs off when either side has high activity or when the step is too large to be a coding artefact, so it does not smooth a real texture edge into a smear.
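That arithmetic can be checked in Python. `simple_deblock()` is the illustrative 3:1 kernel, which on the row above yields 155 on the left and 160 on the right. `hevc_normal_filter()` is a simplified rendering of HEVC's "normal" luma filter, which computes a correction Δ from the samples nearest the edge and clips it to a threshold tC. In the specification tC comes from a table indexed by QP and boundary strength; here it is simply an input, the value 6 used below is a hypothetical stand-in, and the strong filter and the per-edge on/off decisions are omitted.

```python
def simple_deblock(p0: int, q0: int) -> tuple[int, int]:
    """Illustrative 3:1 kernel across one edge; not a normative codec filter.
    (x + 2) >> 2 divides by 4 and rounds to nearest, ties upward."""
    return (3 * p0 + q0 + 2) >> 2, (p0 + 3 * q0 + 2) >> 2


def clip3(lo: int, hi: int, x: int) -> int:
    return max(lo, min(hi, x))


def hevc_normal_filter(p: list[int], q: list[int], tc: int, max_val: int = 255):
    """HEVC-style 'normal' luma deblocking of one row across an edge.

    p = [p0, p1, p2] and q = [q0, q1, q2], with p0 and q0 touching the edge.
    tc is the QP- and bS-dependent clipping limit, taken here as an input.
    Returns ((p0', p1'), (q0', q1')).
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    delta = (9 * (q0 - p0) - 3 * (q1 - p1) + 8) >> 4
    if abs(delta) >= tc * 10:
        return (p0, p1), (q0, q1)  # step too large: treat it as a real edge
    delta = clip3(-tc, tc, delta)
    # Second sample on each side moves by at most tc >> 1. The spec applies this
    # step only when that side is smooth enough; this sketch always applies it.
    dp = clip3(-(tc >> 1), tc >> 1, (((p2 + p0 + 1) >> 1) - p1 + delta) >> 1)
    dq = clip3(-(tc >> 1), tc >> 1, (((q2 + q0 + 1) >> 1) - q1 - delta) >> 1)
    return ((clip3(0, max_val, p0 + delta), clip3(0, max_val, p1 + dp)),
            (clip3(0, max_val, q0 - delta), clip3(0, max_val, q1 + dq)))
```

With tC = 6, `hevc_normal_filter([152, 151, 150], [162, 161, 160], tc=6)` turns the row into 148, 150, 153, 156 | 158, 159, 160, 159. The step across the edge drops from 10 levels to 2, and no sample moves by more than tC (or tC/2 for the second sample from the edge). That cap is what keeps the real filter from smearing true edges.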
The deblocking filter is among the cheapest filters in the chain. It touches only the pixels next to a block edge, the kernel is short, and the decisions are local. Its share of total decode cycles varies with the implementation, and the bitrate saving sits in the 5–10% range. It is the textbook good trade of modern video coding.
Figure 2. The deblocking filter smooths the step at a coded-block boundary without disturbing the texture on either side. Inputs to the decision are the boundary strength, the activity on each side, and the quantisation parameter.
Sample-Adaptive Offset (SAO) — small corrections everywhere
After deblocking, HEVC and VVC apply Sample-Adaptive Offset, abbreviated SAO. The artefact SAO targets is the residual quantisation error that the deblocking filter leaves behind — the small DC shift in flat regions, and the ringing halo around sharp transitions where the transform's high-frequency basis functions have been over-quantised.
SAO works one CTU at a time. For each CTU, the encoder picks Edge Offset (EO), Band Offset (BO), or no offset at all, and signals four small offsets in the bitstream. Edge Offset compares every pixel with its two neighbours along a chosen direction (horizontal, vertical, 135°, or 45°) and puts it into one of five categories: a local minimum, a concave corner, a convex corner, a local maximum, or none of these (flat or monotonic). The encoder signals one offset for each of the four non-zero categories, typically ±1 or ±2 luma levels, which the decoder adds to every pixel in that category; category-0 pixels are left alone. HEVC fixes the sign by category (positive for minima and concave corners, negative for convex corners and maxima), so Edge Offset can only smooth. Band Offset divides the full sample range into 32 equal bands and signals offsets for four consecutive bands.
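The Edge Offset classification fits in a few lines of Python. The category rules below follow HEVC's EO definition; the horizontal-only direction, the function names, and the offset values in the usage example are illustrative assumptions.

```python
def eo_category(a: int, c: int, b: int) -> int:
    """HEVC SAO edge-offset category of sample c, given its two neighbours a and b
    along the chosen direction. Category 0 receives no offset."""
    if c < a and c < b:
        return 1  # local minimum
    if (c < a and c == b) or (c == a and c < b):
        return 2  # concave corner
    if (c > a and c == b) or (c == a and c > b):
        return 3  # convex corner
    if c > a and c > b:
        return 4  # local maximum
    return 0      # flat or monotonic


def apply_edge_offset(row: list[int], offsets: dict[int, int], max_val: int = 255) -> list[int]:
    """Horizontal EO on one row. Categories are computed from the unmodified input;
    the two end samples lack a neighbour and are left as they are."""
    out = list(row)
    for i in range(1, len(row) - 1):
        cat = eo_category(row[i - 1], row[i], row[i + 1])
        if cat:
            out[i] = max(0, min(max_val, row[i] + offsets.get(cat, 0)))
    return out
```

For example, `apply_edge_offset([100, 98, 100, 100, 103, 100], {1: 2, 2: 1, 3: -1, 4: -2})` returns `[100, 100, 99, 101, 101, 100]`: the dip is filled, the spike is flattened, and both corners move one level toward their neighbours.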
A worked example. Suppose a CTU contains a flat patch whose original mean luma is 132, and quantisation left a 2-level negative DC shift: the reconstructed mean is 130. The encoder spots this in its rate-distortion search, sets the SAO mode for this CTU to Band Offset, picks the four-band group starting at band 16 (in 8-bit video each band is 8 levels wide, so the group covers luma 128–159 and the patch sits in band 16), and signals offsets of +2. The decoder adds +2 to every pixel in those bands, restoring the mean to 132 without touching pixels outside the group. The signalling cost is roughly 30–60 bits per CTU.
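Band Offset is even simpler. The sketch below, assuming 8-bit samples by default, computes each sample's band as its top five bits, adds four signalled offsets to the four-band group starting at a chosen band, and picks an offset the way an encoder might in a distortion-only search: as the rounded mean error between original and reconstruction. The function names are hypothetical, and a real encoder also weighs the bits each choice costs.

```python
def band_index(sample: int, bit_depth: int = 8) -> int:
    """SAO splits the sample range into 32 equal bands: the band is the top 5 bits."""
    return sample >> (bit_depth - 5)


def apply_band_offset(samples: list[int], band_position: int, offsets: list[int],
                      bit_depth: int = 8) -> list[int]:
    """Add offsets[k] to samples in band band_position + k, for k = 0..3."""
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = band_index(s, bit_depth) - band_position
        if 0 <= k < 4:
            s = max(0, min(max_val, s + offsets[k]))
        out.append(s)
    return out


def best_offset(orig: list[int], recon: list[int]) -> int:
    """Distortion-only choice for one band: the rounded mean error.
    (A real encoder also weighs the bits each offset costs.)"""
    return round(sum(o - r for o, r in zip(orig, recon)) / len(orig))
```

On the patch from the example, `apply_band_offset([129, 130, 131, 130, 200], 16, [2, 2, 2, 2])` lifts the four patch samples by 2 and leaves the sample at 200 (band 25) untouched.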
SAO is cheap to encode and cheap to decode. The encoder runs a small rate-distortion search per CTU, typically over 5–8 alternatives, and the decoder needs only a comparison or band lookup and an integer add per pixel. In HEVC, SAO saves 1–3% of the bitrate over deblocking alone at the same PSNR, and somewhat more on content with smooth gradients and sharp edges (animation, motion graphics).
Figure 3. SAO's two modes. Edge Offset adjusts pixels that fall into one of four neighbour-relative categories along a chosen direction; Band Offset adjusts pixels that fall into four consecutive bands of the luma range.
The Adaptive Loop Filter (ALF) — Wiener filtering inside the codec
VVC adds a third filter on top of deblocking and SAO: the Adaptive Loop Filter, ALF. The artefact ALF targets is what is left over after SAO — the small-amplitude high-frequency error scattered throughout the reconstructed frame, the kind of error that no per-CTU offset can address because it has zero mean.
ALF is a Wiener filter, a classical signal-processing tool that finds the linear filter which, on average, minimises the squared error between the filtered reconstruction and the original frame. Only the encoder has the original, so it estimates the coefficients by comparing the reconstruction with the original; the coefficients are quantised and shipped in the bitstream, and the decoder applies the same coefficients to its reconstruction.
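To make the Wiener idea concrete, here is a minimal least-squares estimator written without external libraries. It is a 1-D, three-tap illustration rather than VVC's 2-D diamond with symmetric coefficients: it accumulates the normal equations R·w = p from reconstruction windows and the original, solves them, and quantises the result to fixed point. The 7 fractional bits in `quantise()` are an assumption for illustration, and all function names are hypothetical.

```python
import random


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small dense system a·x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wiener_coefficients(recon: list[float], orig: list[float], taps: int = 3) -> list[float]:
    """Least-squares 1-D FIR filter that maps recon towards orig.

    Accumulates the normal equations R·w = p, where R is the autocorrelation of
    the reconstruction windows and p their cross-correlation with the original.
    """
    half = taps // 2
    r = [[0.0] * taps for _ in range(taps)]
    p = [0.0] * taps
    for i in range(half, len(recon) - half):
        window = recon[i - half:i + half + 1]
        for j in range(taps):
            p[j] += window[j] * orig[i]
            for k in range(taps):
                r[j][k] += window[j] * window[k]
    return solve(r, p)


def quantise(coeffs: list[float], frac_bits: int = 7) -> list[int]:
    """Fixed-point coefficients for transmission (7 fractional bits assumed)."""
    return [round(c * (1 << frac_bits)) for c in coeffs]


if __name__ == "__main__":
    rng = random.Random(1)
    recon = [float(rng.randint(0, 255)) for _ in range(200)]
    # Synthetic "original": an exact 1-2-1 blur of the reconstruction, so the
    # estimator should recover the blur's weights (0.25, 0.5, 0.25).
    orig = recon[:]
    for i in range(1, len(recon) - 1):
        orig[i] = 0.25 * recon[i - 1] + 0.5 * recon[i] + 0.25 * recon[i + 1]
    print(quantise(wiener_coefficients(recon, orig)))  # [32, 64, 32]
```

The synthetic check is deliberately noise-free so the recovered weights are exact. With real content, the "original" is the source frame and the solution is the filter that best cancels the coding error on average.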
VVC's ALF uses a 7×7 diamond-shaped luma filter and a 5×5 diamond-shaped chroma filter. The "adaptive" part is the classification: each 4×4 luma sub-block is sorted into one of 25 classes by its local gradient direction and activity, and each class is mapped to one filter from the set chosen for the CTU. The classification is computed from reconstructed samples, so encoder and decoder derive it identically and it costs no bits. In effect, a flat sub-block gets a smoothing filter, and a sub-block dominated by a vertical edge gets a filter that smooths along the edge rather than across it; for other orientations the decoder transposes or flips the coefficients instead of storing separately rotated filters. The signalling cost is small: a few bits per CTU to pick a filter set, plus the coefficients themselves, which are carried in adaptation parameter sets and can be reused across frames.
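The gradient-based classification can be sketched as follows. It follows the general shape of VVC's scheme (1-D Laplacians in four directions, a direction class D from 0 to 4, an activity class from 0 to 4, and a final class 5·D + activity), but the ratio thresholds of 2 and 4.5, the activity thresholds, and the full-resolution gradient sums are simplifying assumptions; the normative process subsamples the gradients and uses its own activity mapping.

```python
def laplacian_sums(px: list[list[int]]) -> tuple[int, int, int, int]:
    """Sum 1-D Laplacians over the interior of px (px includes a 1-sample border)."""
    gh = gv = gd0 = gd1 = 0
    for y in range(1, len(px) - 1):
        for x in range(1, len(px[0]) - 1):
            c2 = 2 * px[y][x]
            gh += abs(c2 - px[y][x - 1] - px[y][x + 1])           # along x
            gv += abs(c2 - px[y - 1][x] - px[y + 1][x])           # along y
            gd0 += abs(c2 - px[y - 1][x - 1] - px[y + 1][x + 1])  # top-left to bottom-right
            gd1 += abs(c2 - px[y - 1][x + 1] - px[y + 1][x - 1])  # top-right to bottom-left
    return gh, gv, gd0, gd1


def direction(gh: int, gv: int, gd0: int, gd1: int, t1: float = 2.0, t2: float = 4.5) -> int:
    hv_max, hv_min = max(gh, gv), min(gh, gv)
    d_max, d_min = max(gd0, gd1), min(gd0, gd1)
    if hv_max <= t1 * hv_min and d_max <= t1 * d_min:
        return 0  # no dominant direction
    if hv_max * d_min > d_max * hv_min:  # compare the two ratios without dividing
        return 2 if hv_max > t2 * hv_min else 1  # horizontal/vertical structure
    return 4 if d_max > t2 * d_min else 3         # diagonal structure


def alf_class(px: list[list[int]], activity_thresholds=(4, 16, 48, 128)) -> int:
    gh, gv, gd0, gd1 = laplacian_sums(px)
    d = direction(gh, gv, gd0, gd1)
    n = (len(px) - 2) * (len(px[0]) - 2)
    mean_activity = (gh + gv) / n
    a_hat = sum(mean_activity >= t for t in activity_thresholds)  # 0..4
    return 5 * d + a_hat
```

Note that D = 2 covers both strong horizontal and strong vertical structure. VVC tells those apart by transposing or flipping the filter coefficients, not by adding more classes.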
VVC also adds a Cross-Component ALF (CC-ALF), which uses the luma channel as an additional input when filtering the chroma channels. The intuition is that texture in luma usually corresponds to chroma activity, and a luma-driven filter can recover chroma detail that has been lost to chroma sub-sampling and quantisation. CC-ALF buys roughly 0.5–1% additional chroma bitrate saving in VVC at typical streaming QPs.
The combined effect of ALF and CC-ALF in VVC is a 3–6% bitrate saving on top of deblocking and SAO at the same PSNR, with a heavier decoder cost — Karczewicz et al. (2021) report that ALF accounts for roughly 5–8% of the VVC decoder runtime, the most expensive of any single filter in the chain. AVS3 ships a similar Wiener-based ALF; AV1 does not.
CDEF — AV1's directional deringing filter
AV1 never adopted SAO and went a different way. After deblocking, AV1 applies the Constrained Directional Enhancement Filter (CDEF), developed by Steinar Midtskogen at Cisco and Jean-Marc Valin at Mozilla and refined inside the AOMedia coding effort. CDEF targets the ringing halo around sharp edges and the coding noise that deblocking leaves along slanted edges and textures.
CDEF runs on every 8×8 block. For each block, the decoder estimates the dominant edge direction from the reconstructed pixels, choosing one of eight directions spaced 22.5° apart; the encoder runs the same search, so the direction costs no bits. What the encoder does signal is a primary strength for taps that run along that direction and a secondary strength for taps placed 45° off it, chosen per 64×64 area from a short per-frame list of presets. Both sets of taps smooth, but each tap's contribution passes through a non-linear constraint that shrinks it to zero when the neighbouring pixel differs too much from the centre pixel, so real edges survive. A second pair of strengths handles chroma. The combined effect is to preserve sharpness on real edges while suppressing the ringing that comes from coarsely quantised high-frequency coefficients.
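The piece that lets CDEF smooth ringing without blurring real edges is the non-linear `constrain()` function defined in the AV1 specification, reproduced below together with a one-dimensional, primary-taps-only filter. The sketch assumes the samples have already been read along the detected direction. The tap weights (4, 2) are assumed to be one of AV1's primary weight pairs, `cdef_primary_1d` is a hypothetical name, and the direction search, secondary taps, chroma handling, and final clamping to the range of the input samples are left out.

```python
def floor_log2(x: int) -> int:
    return x.bit_length() - 1


def constrain(diff: int, threshold: int, damping: int) -> int:
    """CDEF's tap limiter, following the AV1 spec's constrain() function.
    Small differences pass through; large ones (likely real edges) shrink to 0."""
    if threshold == 0:
        return 0
    shift = max(0, damping - floor_log2(threshold))
    magnitude = min(abs(diff), max(0, threshold - (abs(diff) >> shift)))
    return magnitude if diff >= 0 else -magnitude


PRIMARY_TAPS = (4, 2)  # weights at distance 1 and 2 along the direction (assumed)


def cdef_primary_1d(line: list[int], i: int, strength: int, damping: int) -> int:
    """Filter line[i] using only primary taps, where `line` already holds
    samples read along the detected direction."""
    x = line[i]
    total = 0
    for dist, weight in enumerate(PRIMARY_TAPS, start=1):
        for j in (i - dist, i + dist):
            total += weight * constrain(line[j] - x, strength, damping)
    # Divide by 16, rounding to nearest with ties away from zero.
    return x + ((8 + total - int(total < 0)) >> 4)
```

With a strength of 4 and a damping of 5, a 4-level ripple (`[100, 100, 104, 100, 100]`) is pulled 3 levels toward its neighbours, to 101. A 100-level edge (`[100, 100, 100, 200, 200]`) passes through untouched because `constrain()` zeroes every tap that crosses it.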
The CDEF design is deliberately simple: eight directions, sixteen primary strengths, four secondary strengths, and no trained coefficients. The filter is a fixed kernel applied with chosen strengths, not an adaptive Wiener filter. That simplicity is the point: it keeps CDEF affordable in real-time software and hardware decoders while delivering a 2–4% bitrate saving on natural content. The Midtskogen/Valin 2018 paper reports CDEF beating SAO by an average of 0.85% in BD-rate measured on PSNR-Y over the AOMedia test set, and by more on the perceptual VMAF metric, which is what convinced AOMedia to ship it rather than an SAO-style filter.
AV1 also applies a Loop Restoration filter after CDEF, with two sub-modes: a Wiener-style filter and a self-guided filter. Loop Restoration is closer in spirit to VVC's ALF but is simpler — at most a 7-tap Wiener filter or a guided-image filter with two parameters. Its contribution is roughly 1–2% bitrate saving on top of CDEF. AV2 is expected to refine all three AV1 in-loop filters with neural-network alternatives currently in the experimental tracks.
Figure 4. The in-loop filter chain in four major codecs. The order is fixed in each codec's spec; changing it would break decoder synchronisation. The total bitrate contribution of the chain sits between 8% and 14% depending on content and codec.
A worked end-to-end example
A 1080p HEVC encode of a slow-motion football clip runs at 4 Mbps with QP 28. Inside one CTU on the green pitch, the reconstructed luma values are flat at 137; the original was flat at 140. The deblocking filter does nothing because there is no block-edge step. The SAO encoder runs its per-CTU search, picks Band Offset with the four-band group starting at band 16 (luma 128–159 in 8-bit video, which contains 137), finds the pixels 3 levels too low, signals four offsets of +3, and pays 36 bits for the signalling. The decoder applies +3 to every pixel in those bands, restoring the patch to 140. If ALF were active (it would be in VVC, not HEVC), it would add a small high-frequency correction on top, paying roughly 60–100 bits per frame for the filter coefficients. The whole filter chain takes the patch from a visibly dim 137 back to the original 140, and the encoder's next P-frame prediction now starts from the corrected reference. Without the filters, every subsequent P-frame would predict from 137 and either accept the 3-level error (and propagate it) or spend extra residual bits to correct it. Because every later frame that predicts from this patch inherits the correction, the one-off 36-bit cost is recovered quickly.
A comparative table
The story across the five codecs that dominate 2026 is one of more filters and more compression at a slowly rising decoder cost.
| Codec (year) | Deblocking | SAO | ALF | CDEF | Loop Restoration | Total filter-chain saving (typical) | Filter share of decode cost |
|---|---|---|---|---|---|---|---|
| H.264 / AVC (2003) | yes | no | no | no | no | ~5–8% | ~3% |
| HEVC (2013) | yes | yes | no | no | no | ~7–11% | ~5% |
| AV1 (2018) | yes | no | no | yes | yes | ~9–13% | ~6% |
| VVC (2020) | yes | yes | yes (+CC-ALF) | no | no | ~10–14% | ~8–10% |
| AVS3 (2019) | yes | yes | yes | no | no | ~9–12% | ~7% |
Two observations. First, every codec since H.264 has kept the deblocking filter; it is the cheapest gain in the chain and nobody has found a reason to retire it. Second, the choice of what to put after deblocking is one of the clearest architectural signatures of a codec: HEVC picked an offset-based correction (SAO), AV1 picked a directional deringing filter (CDEF), VVC added an encoder-estimated Wiener filter (ALF) on top of SAO, and every choice traces back to the priorities of the standards body that designed the codec.
A common pitfall — deblocking turned off in transcode
The most common in-loop-filter bug in a transcoding pipeline is a deblocking filter that was disabled on the command line and then forgotten. x264 and x265 accept --no-deblock (FFmpeg can pass the same switch through -x264-params or -x265-params), and SVT-AV1 accepts --enable-dlf 0. Note that x264's --deblock 0:0 does not disable anything: it sets the strength offsets to their defaults. Engineers reach for these switches during debugging to see the raw block-grid output, then forget to remove them before pushing the encoder config to production.
The visible symptom is subtle. The encoder runs faster, the output passes basic playability checks, and the first VMAF measurement on a high-bitrate test clip looks roughly similar to the deblocking-on baseline. But on low-bitrate content, blocks march across every flat region as soon as motion enters the frame. Customer-facing dashboards see a steady stream of "video is pixelated" complaints, and the bug is hard to find because the encoder log does not flag it as an error.
The fix is a contract test in the CI pipeline: encode a reference clip with the production config, parse the output, and assert that the deblocking control syntax elements hold the expected values (disable_deblocking_filter_idc in H.264 slice headers; pps_deblocking_filter_disabled_flag and slice_deblocking_filter_disabled_flag in HEVC; loop_filter_level[0] and loop_filter_level[1] in the AV1 frame header, where both being zero turns luma deblocking off). We have added a one-line ffprobe check to every customer pipeline we maintain; it costs a millisecond per encode and has caught the same bug twice across three customers in two years.
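A sketch of such a check in Python follows. Instead of ffprobe, it uses FFmpeg's trace_headers bitstream filter, which logs parsed header syntax elements. The regular expression assumes that each logged element ends in "<name> <bits> = <value>", which should be verified against the FFmpeg build your CI uses. The function names are hypothetical. The HEVC check is deliberately conservative: it also flags a PPS that defaults to deblocking off even if individual slices override it.

```python
import re
import subprocess

# ASSUMPTION: each syntax element logged by FFmpeg's trace_headers filter ends
# in "<name> <bits> = <value>". Verify against the FFmpeg build in your CI.
TRACE_RE = re.compile(r"(\w+(?:\[\d+\])*)\s+[01]+\s*=\s*(-?\d+)\s*$")


def trace_syntax(path: str) -> dict[str, list[int]]:
    """Collect every logged value of every header syntax element in a file."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", path, "-an", "-c:v", "copy",
         "-bsf:v", "trace_headers", "-f", "null", "-"],
        capture_output=True, text=True, check=True,
    )
    values: dict[str, list[int]] = {}
    for line in result.stderr.splitlines():
        match = TRACE_RE.search(line)
        if match:
            values.setdefault(match.group(1), []).append(int(match.group(2)))
    return values


def deblocking_disabled(codec: str, values: dict[str, list[int]]) -> bool:
    """True if any slice or frame in the trace switches luma deblocking off."""
    if codec == "h264":
        # 0 = on, 1 = off, 2 = on except across slice boundaries.
        return 1 in values.get("disable_deblocking_filter_idc", [])
    if codec == "hevc":
        # Conservative: a PPS defaulting to "off" is flagged even if slices override it.
        return (1 in values.get("pps_deblocking_filter_disabled_flag", [])
                or 1 in values.get("slice_deblocking_filter_disabled_flag", []))
    if codec == "av1":
        # Levels [0] and [1] are coded as a pair; both zero skips luma deblocking.
        lvl0 = values.get("loop_filter_level[0]", [])
        lvl1 = values.get("loop_filter_level[1]", [])
        return any(a == 0 and b == 0 for a, b in zip(lvl0, lvl1))
    raise ValueError(f"unsupported codec: {codec}")


def assert_deblocking_on(path: str, codec: str) -> None:
    """Hypothetical CI contract test for one encoded reference clip."""
    values = trace_syntax(path)
    if not values:
        raise RuntimeError("no syntax elements parsed; check the trace format")
    if deblocking_disabled(codec, values):
        raise AssertionError(f"{path}: deblocking is disabled in the bitstream")
```

Failing loudly when nothing is parsed matters here. A silent parser mismatch would otherwise make the test pass on every encode.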
Where Fora Soft fits in
We ship encoder configurations and bitstream-analysis tools across video conferencing, OTT, surveillance, e-learning, telemedicine, and AR/VR pipelines. Every pipeline we maintain has an in-loop-filter audit in its CI suite — deblocking is on, SAO is enabled where the standard permits, ALF coefficients are present in VVC streams, and AV1 streams have CDEF strengths non-zero. When a customer complains about visible blocks on a 720p mobile feed, the first thing we check is the deblocking filter state in the latest encode. Twenty years of bitstream debugging across H.263, MPEG-2, MPEG-4 Part 2, H.264, HEVC, VP8, VP9, and AV1 have taught us that the in-loop filter chain is where small misconfigurations produce large customer pain — which is why the audit costs us a millisecond and saves us a support ticket every other week.
What to read next
- Quantization: where quality is lost — the upstream stage that creates the artefacts in-loop filters clean up.
- H.265 / HEVC explained — the codec that introduced SAO alongside deblocking.
- AV1: the new internet standard and where it stands in 2026 — the codec that replaced SAO with CDEF.
Talk to us / See our work / Download
- Talk to a video engineer — bring us an encoder config and we will audit the in-loop filter chain with you.
- See our case studies — two decades of video conferencing, OTT, surveillance, and telemedicine pipelines.
- Download the In-Loop Filter Tuning Cheat Sheet — one page covering deblocking, SAO, ALF, CDEF, and Loop Restoration side by side, with the encoder flags that turn each one on or off and the debugging hints we use every day.
References
- Norkin, A., Bjøntegaard, G., Fuldseth, A., Narroschke, M., Ikeda, M., Andersson, K., Zhou, M., and Van der Auwera, G. "HEVC Deblocking Filter." IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1746–1754, 2012. https://ieeexplore.ieee.org/document/6324411 — Authoritative description of HEVC's deblocking design and its 5–10% bitrate gain over H.264 deblocking.
- Fu, C.-M., Alshina, E., Alshin, A., Huang, Y.-W., Chen, C.-Y., Tsai, C.-Y., Hsu, C.-W., Lei, S.-M., Park, J.-H., and Han, W.-J. "Sample Adaptive Offset in the HEVC Standard." IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1755–1764, 2012. https://ieeexplore.ieee.org/document/6324412 — Original SAO paper covering Edge Offset and Band Offset modes.
- Karczewicz, M., Hu, N., Taquet, J., Chen, C.-Y., Misra, K., Andersson, K., Yin, P., Lu, T., François, E., and Chen, J. "VVC In-Loop Filters." IEEE Transactions on Circuits and Systems for Video Technology, 31(10), 3907–3925, 2021. https://ieeexplore.ieee.org/document/9399502 — VVC's three-filter chain (deblocking, SAO, ALF/CC-ALF) with per-filter complexity and gain analysis.
- Midtskogen, S., and Valin, J.-M. "The AV1 Constrained Directional Enhancement Filter (CDEF)." IEEE International Conference on Acoustics, Speech and Signal Processing, 2018. https://arxiv.org/abs/1602.05975 — CDEF design, eight-direction estimation, primary/secondary strengths, and comparison against SAO.
- Han, J., Li, B., Mukherjee, D., et al. "A Technical Overview of AV1." Proceedings of the IEEE, 109(9), 1435–1462, 2021. https://arxiv.org/pdf/2008.06091 — AV1's full filter chain (deblocking, CDEF, Loop Restoration) and its overall contribution to AV1 compression efficiency.
- ITU-T Recommendation H.264 | ISO/IEC 14496-10. Advanced Video Coding for Generic Audiovisual Services, edition 14, 2023. https://www.itu.int/rec/T-REC-H.264 — Authoritative spec for H.264's deblocking filter, including the boundary-strength tables and QP-dependent thresholds.
- ITU-T Recommendation H.265 | ISO/IEC 23008-2. High Efficiency Video Coding, edition 8, 2024. https://www.itu.int/rec/T-REC-H.265 — HEVC's deblocking and SAO normative descriptions.
- ITU-T Recommendation H.266 | ISO/IEC 23090-3. Versatile Video Coding, edition 3, 2024. https://www.itu.int/rec/T-REC-H.266 — VVC's deblocking, SAO, ALF, and CC-ALF normative descriptions.
- AOMedia. AV1 Bitstream & Decoding Process Specification, version 1.0.0 errata 1, 2019. https://aomediacodec.github.io/av1-spec/av1-spec.pdf — Authoritative description of AV1's deblocking, CDEF, and Loop Restoration filters.
- Bross, B., Wang, Y.-K., Ye, Y., Liu, S., Chen, J., Sullivan, G. J., and Ohm, J.-R. "Overview of the Versatile Video Coding (VVC) Standard and its Applications." IEEE Transactions on Circuits and Systems for Video Technology, 31(10), 3736–3764, 2021. https://ieeexplore.ieee.org/document/9503377 — Codec-wide overview citing the filter-chain contributions to VVC's ~50% BD-rate savings over HEVC.


