Macroblocking and the DCT Grid: Cause, Filter, Metric

Why this matters

If you encode or QC video, blocking is the artifact a client will screenshot and email you, because it is the one a non-engineer can see and name. Knowing exactly where it is born — the block-transform grid and its quantizer — tells you which knob actually fixes it, and which "fix" only hides it behind softness. This article is written for a video engineer, encoding lead, or QA engineer who can see the tiles but wants to measure them, trace them to a setting, and choose a metric that does not quietly pass them. It is the first deep dive in the artifact gallery because blocking is the clearest window into how every block-based codec works and where it fails.

What blocking actually is

Start with the look. Blocking is the breakup of a picture into visible squares or rectangles, as if the image were reassembled from coarse mosaic tiles. On a face it flattens the cheeks into facets; on a sky it lays down a checkerboard; on motion it makes whole tiles jump a beat behind the rest of the frame. The common name macroblocking comes from the macroblock, the 16×16-pixel unit older codecs processed as a whole, but the squares you see are not always macroblocks: they are the codec's processing grid, whatever its block size, showing through the picture.

The key idea, and the reason a field guide can exist at all, is that this grid is not random. It is a fixed lattice the encoder imposed on the frame before it compressed anything. When the artifact appears, it appears on that lattice — straight horizontal and vertical seams at regular intervals — which is exactly what lets your eye (and a metric) recognize it instantly. A cloud of random noise could be anything; a clean grid of seams can only be one thing.

It helps to picture compression as tiling a wall with small square tiles, each grouted separately. Lay them carefully and the wall reads as one surface. Rush the job and each tile sets at a slightly different height, so the light catches every grout line and the grid jumps out. Blocking is those grout lines: the tiles (blocks) were processed independently, and at low bitrate they no longer meet flush.

Where the grid comes from: block-transform coding

Every mainstream codec, including H.264/AVC, HEVC, and AV1, is a block-transform coder. It chops each frame into blocks and compresses each block on its own with a Discrete Cosine Transform (DCT) or a close integer approximation of it, a math step that rewrites the block's pixels as a sum of wave patterns, from a flat "average brightness" wave up to fine ripples. Smooth blocks need only the first few waves; detailed blocks need many. Representing the block as waves is what makes the next step, throwing some away, possible.

That next step is quantization: the encoder divides each wave's strength (its DCT coefficient) by a step size and rounds to the nearest whole number, which deletes the small high-frequency waves entirely. Quantization is where the bits are saved and where the loss happens. The coarser the step — the lower the target bitrate — the more waves get rounded to zero, and the more each block collapses toward a single flat tone.

Here is the mechanism in one sentence: because the DCT is computed on each block independently and never models the correlation across block boundaries, two neighboring blocks can round to two different flat tones, and the picture gets a visible step exactly on the seam between them. Yuen and Wu's classic taxonomy of coding distortions named this the defining cause of the blocking effect, and Wang, Bovik, and Evans modeled the blocky picture precisely as a clean image with an added "blocky signal" living on the grid.
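The mechanism can be reproduced on two toy blocks. The sketch below is a minimal illustration under stated assumptions, not any codec's actual transform path: it uses a floating-point orthonormal 8×8 DCT, a single uniform quantizer step for every coefficient, and no prediction. The helper names (dct_1d, apply_2d, code_block) are hypothetical.

```python
import math

N = 8  # transform block size; a single 8x8 DCT per block is an assumption

def dct_1d(v):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt((1 if k == 0 else 2) / N)
        out.append(scale * sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                               for n in range(N)))
    return out

def idct_1d(c):
    """Inverse of dct_1d."""
    out = []
    for n in range(N):
        out.append(sum(math.sqrt((1 if k == 0 else 2) / N) * c[k]
                       * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                       for k in range(N)))
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def apply_2d(transform, block):
    """Apply a 1-D transform along rows, then along columns."""
    rows = [transform(r) for r in block]
    return transpose([transform(c) for c in transpose(rows)])

def code_block(block, step):
    """Transform, quantize with one uniform step, dequantize, inverse-transform."""
    coeffs = apply_2d(dct_1d, block)
    dequantized = [[round(c / step) * step for c in row] for row in coeffs]
    return [[round(p) for p in row] for row in apply_2d(idct_1d, dequantized)]

# A gentle horizontal ramp, 16 pixels wide: the true step at the block edge is 1.
left = [[120 + x for x in range(8)] for _ in range(8)]
right = [[128 + x for x in range(8)] for _ in range(8)]

for step in (1, 64):
    l, r = code_block(left, step), code_block(right, step)
    seam = r[0][0] - l[0][7]
    print(f"step={step:>2}: left edge pixel {l[0][7]}, right edge pixel {r[0][0]}, seam {seam}")
```

With the fine step the ramp survives and the edge step stays at 1. With the coarse step every wave except the average is rounded to zero, the two blocks collapse to flat 120 and flat 128, and the edge step grows to 8: each block was rounded without any knowledge of its neighbor.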

A note on grids, because the vocabulary trips people up. The macroblock (16×16 in H.264) and its successors — HEVC's Coding Tree Units up to 64×64, AV1's superblocks up to 128×128 — are the large units the codec splits a frame into for prediction. Inside them, the DCT runs on smaller transform blocks (4×4 and 8×8 in H.264; 4×4 up to 32×32 in HEVC; up to 64×64 in AV1). The seams you see can fall on either grid, which is why the practical term stays the generic "block" rather than "macroblock" — the modern grid is variable, not a fixed 16×16.

Figure 1. How the grid is born. Each block is transformed and quantized on its own; coarse rounding flattens each to a slightly different tone, and the mismatch shows as a seam on the lattice.

Why it shows up at low bitrate, and where it hides

Blocking is a bitrate-starvation artifact. At a generous bitrate the quantizer step is small, most DCT waves survive, and adjacent blocks reconstruct close enough that the seam stays below the eye's threshold. Starve the encode and the step grows: in the limit, only each block's average brightness (its DC coefficient) survives, every block becomes a flat square, and the frame turns into a literal mosaic of solid tiles. The encoder-side mechanics of quantization and rate control — how the quantizer step is set, and the short version of how to read the metrics while tuning it — belong to the Video Encoding section's quality-metrics overview; here we stay on recognizing and measuring the result.

Where the grid shows first is the part worth remembering, because it tells you where to look. Blocking is most visible in smooth, low-detail regions — a clear sky, a studio wall, a slow gradient — where a small step between two flat tones has nothing to hide behind, and the eye locks onto the straight seam. It is least visible in busy texture — foliage, gravel, hair — where the surrounding detail masks the boundary. This is texture masking, and it is why a QC pass for blocking should start on the flattest part of the frame, not the busiest.

The arithmetic of why this matters for measurement is worth doing once out loud, because it exposes a trap. Take an 8-bit picture, where each pixel runs 0–255 and the peak value is 255. Suppose a flat patch that was a uniform 128 gets reconstructed as two adjacent blocks: the left block lands at 125, the right at 131. Every pixel is off by 3, so the mean squared error is 3² = 9, and the Peak Signal-to-Noise Ratio — the raw pixel-error metric, in decibels — is:

PSNR = 10 · log10(255² / MSE)
     = 10 · log10(65025 / 9)
     = 10 · log10(7225)
     = 38.6 dB

A PSNR of 38.6 dB reads as a clean encode by any rule of thumb. Yet a six-code-value step running straight down the middle of a flat field is plainly visible — the eye is built to find exactly that kind of straight luminance edge. That gap between "38.6 dB, looks fine" and a seam you can point at is the whole reason blocking needs its own attention in a measurement workflow.
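The same arithmetic can be checked in a few lines. This is a small sketch of the worked example only: the 16-pixel row and the psnr helper are illustrative assumptions, and the second "scattered" layout is added here purely to show that PSNR depends on the size of the error, not on where it lands.

```python
import math

def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

width = 16
reference = [128] * width

# Two flat blocks that missed in opposite directions: one straight seam in the middle.
seam_layout = [125] * (width // 2) + [131] * (width // 2)

# The same per-pixel errors, alternating pixel by pixel instead of forming a seam.
scattered_layout = [125, 131] * (width // 2)

seam_step = seam_layout[width // 2] - seam_layout[width // 2 - 1]
print(f"seam layout:      {psnr(reference, seam_layout):.1f} dB, edge step {seam_step}")
print(f"scattered layout: {psnr(reference, scattered_layout):.1f} dB")
```

Both layouts score 38.6 dB because both have a mean squared error of 9. The metric has no term for the fact that one of them lines its error up as a six-code-value edge.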

The deblocking filter that fights it

Modern codecs do not leave the seams alone. Built into the H.264/AVC and HEVC standards is a normative in-loop deblocking filter: after a frame is reconstructed, the codec scans the block boundaries and smooths the ones that look like artificial steps rather than real edges in the scene. "In-loop" is the important word — the filtered frame goes back into the prediction loop and becomes the reference for later frames, so encoder and decoder must apply the identical filter or they drift apart. This is different from a post-processing deblock you might apply only at display, which never touches the prediction loop.

The filter is deliberately cautious so it does not erase real detail. In H.264, it runs along every 4×4 or 8×8 transform-block edge and decides, per edge, how hard to smooth using a boundary strength from 0 (leave it alone) to 4 (strongest), set by whether the blocks were intra-coded, whether they carry coded coefficients, and whether their motion vectors or reference frames differ. Thresholds tied to the quantizer then check how steep the luminance step is, so a large step that is more likely a real scene edge is left unfiltered. At most it adjusts three pixels on each side of a seam. HEVC simplifies this to an 8×8 deblocking grid with three strength levels, then adds a second in-loop stage, Sample Adaptive Offset (SAO), that nudges pixel values per region to claw back fidelity. AV1 goes further still, chaining three optional in-loop stages: a deblocking filter, then the Constrained Directional Enhancement Filter (CDEF) that targets ringing, then a loop-restoration filter, with the strength selectable per superblock.
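The per-edge decision in H.264 can be sketched as two small functions. This is a simplified illustration, assuming progressive frames and inter blocks with a single reference picture; the Block fields and function names are hypothetical, and the alpha and beta values in the usage lines are made-up numbers (in the standard they are looked up from tables indexed by the quantizer). In this sketch the strength comes from coding mode, coded coefficients, and motion, while the size of the luminance step is checked by a separate threshold test.

```python
from dataclasses import dataclass

@dataclass
class Block:
    intra: bool        # block was intra-coded
    has_coeffs: bool   # block carries non-zero transform coefficients
    ref_frame: int     # index of the reference picture (inter blocks)
    mv: tuple          # motion vector (x, y) in quarter-sample units

def boundary_strength(p: Block, q: Block, on_macroblock_edge: bool) -> int:
    """Simplified H.264-style boundary strength for the edge between p and q."""
    if p.intra or q.intra:
        return 4 if on_macroblock_edge else 3
    if p.has_coeffs or q.has_coeffs:
        return 2
    if p.ref_frame != q.ref_frame:
        return 1
    # A motion difference of one full sample (4 quarter-samples) or more.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0

def edge_is_filtered(bs, p1, p0, q0, q1, alpha, beta):
    """Filter only when the step is small enough to be a coding artifact.

    p1, p0 are the two samples left of the edge (p0 nearest); q0, q1 the two right.
    """
    return (bs > 0
            and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)

intra = Block(intra=True, has_coeffs=True, ref_frame=0, mv=(0, 0))
skip_a = Block(intra=False, has_coeffs=False, ref_frame=0, mv=(0, 0))
skip_b = Block(intra=False, has_coeffs=False, ref_frame=0, mv=(1, 0))

print(boundary_strength(intra, skip_a, on_macroblock_edge=True))    # intra at MB edge
print(boundary_strength(skip_a, skip_b, on_macroblock_edge=False))  # near-identical motion
print(edge_is_filtered(2, 125, 125, 131, 131, alpha=10, beta=4))    # small flat-field step
print(edge_is_filtered(2, 60, 60, 200, 200, alpha=10, beta=4))      # large step, likely a real edge
```

The design point is the second function: a six-code-value step on a flat field passes the thresholds and gets smoothed, while a 140-code-value step is treated as scene content and left alone.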

The deblocking filter earns its keep. The team that designed the H.264 filter reported that enabling it cut bitrate by up to about 9% at equal quality, or equivalently raised quality at equal bitrate — a large saving for one in-loop tool. That payoff comes with a side effect you have already seen without naming it: when the encoder runs out of bits, the deblocking filter smooths the unavoidable error into softness instead of seams. This is the central trade of low-bitrate video — blocking is traded for blur — and it is why heavily compressed modern streams look mushy where 2005-era video would have looked tiled. The blur side of that trade gets its own treatment in Blur and detail loss.

Two operational levers follow from this. On the encode side, x264 and x265 expose the in-loop filter through a --deblock control that takes two strength offsets (alpha:beta in x264, tC:beta in x265) and a --no-deblock switch; turning the filter off to "preserve sharpness" is a common way to ship visible tiles, so leave it on unless you have measured a reason not to. On already-damaged footage you cannot re-encode cleanly, a post-processing deblock (FFmpeg's deblock filter, or the older pp/spp post-processors) can take the edge off existing blocking at the cost of detail.

# Re-encode with the in-loop deblocking filter left on (x264 default), slightly weakened.
# Inside -x264-params, ':' separates options, so the two offsets are joined with a comma.
ffmpeg -i input.mov -c:v libx264 -crf 23 -x264-params "deblock=-1,-1" output.mp4

# Post-process visible blocking on footage you cannot re-encode from source
ffmpeg -i blocky_input.mp4 -vf "deblock=filter=strong:block=8" -c:v libx264 -crf 18 deblocked.mp4

Figure 2. The deblocking filter. It smooths artificial seams in-loop (H.264 by boundary strength, HEVC adding SAO, AV1 adding CDEF and loop restoration), trading visible blocking for mild softness.

How to measure blocking, and where the metrics lie

Blocking is the friendliest artifact for the popular full-reference metrics, because it is exactly what they are built to see: a spatial, structural difference between the reference and the encode, sitting in the luma channel where the metrics put most of their weight. SSIM (Structural Similarity, a 0–1 structural match) and VMAF (Video Multimethod Assessment Fusion, Netflix's 0–100 perception-trained score) both track blocking reasonably well, and even PSNR moves in the right direction. So unlike banding or judder — the metrics' true blind spots, catalogued in Where objective metrics lie — blocking will generally register.

The lie is subtler: it is a matter of weight, not blindness. As the worked example showed, PSNR scores a visible seam as a small pixel error, so a blocky encode can post a respectable PSNR while looking obviously tiled on the flat regions. PSNR measures pixel distance, not whether the error landed on a straight line the eye hunts for. Lean on a perceptual metric (SSIM or VMAF) when blocking is the suspect, and never sign off on PSNR alone — the full case against pixel-error metrics is in PSNR explained.

There is also a measurement setup the full-reference metrics simply cannot serve: live streams and user-generated content, where there is no pristine original to compare against. For that you need a no-reference blockiness metric — one that detects the grid from the encoded frame alone. The research has a clean answer. Wang, Bovik, and Evans showed that you can find blocking blindly by taking the differences across each row and column and running an FFT: a blocky picture plants sharp peaks in the power spectrum at the block-grid frequency (N/8 and its harmonics for an 8-pixel grid), while a clean picture has none. The size of those peaks is the blockiness measure. Earlier work by Wu and Yuen (the Generalized Block-edge Impairment Metric, GBIM) and the later PSNR-B, which adds a blocking-effect term to PSNR, attack the same problem from the pixel side.

Metric	What it measures	Reference needed	Where it lies on blocking
PSNR	Mean pixel error in dB	Full-reference	Detects it, but under-weights a visible seam as small error
SSIM	Structural similarity, 0–1	Full-reference	Tracks blocking well; one number hides which region is hit
VMAF	Fused perceptual score, 0–100	Full-reference	Tracks blocking well; report the model and confidence interval
PSNR-B	PSNR plus a blocking-effect term	Full-reference	Built for blocking; still needs the pristine original
Blind FFT (Wang-Bovik)	Power at the block-grid frequency	No-reference	Works on live/UGC; can false-trigger on real grid-like texture
GBIM	Weighted block-edge difference	No-reference	Can over-count a real edge that lands on the grid

Table 1. Six ways to put a number on blocking. The full-reference metrics see it; the no-reference ones are what you reach for on live and UGC where no original exists. Pair the number with a look at the flat regions, where blocking shows first.

Figure 3. Metric behavior on blocking. The full-reference metrics detect it (PSNR under-weighting it); the no-reference detectors are the ones that work on live and UGC. The verdict is carried by both color and column, so it does not depend on color alone.

The blind FFT approach is worth one more sentence because it is so visual. When you plot the power spectrum of a blocky frame's row-and-column differences, the block grid announces itself as a spike at one specific frequency — the inverse of the block size — riding above the smooth curve of the natural image. That spike is the artifact, expressed as a number. We ship a small no-reference detector built on this idea below.

Figure 4. The blocking signature. In the frequency domain, the block grid is a spike at the grid frequency (N/8) and its harmonics, present in the blocky frame and absent in the clean one. This is what a no-reference detector keys on.
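The signature can be demonstrated with a few lines of standard-library Python. This is an illustrative sketch of the idea only: it is not the detector referred to in this article and not the published Wang-Bovik algorithm. It looks at horizontal differences only, assumes the block size and grid alignment are known, uses a plain DFT, and scores a frame by the share of spectral energy sitting on the grid frequency and its harmonics. All function names are hypothetical, and the test frames are synthetic.

```python
import cmath
import math

def column_difference_profile(frame):
    """Mean absolute difference between horizontally adjacent pixels, per column gap."""
    height, width = len(frame), len(frame[0])
    return [sum(abs(row[x + 1] - row[x]) for row in frame) / height
            for x in range(width - 1)]

def grid_energy_fraction(frame, block=8):
    """Share of the profile's non-DC spectral energy at the block-grid harmonics."""
    profile = column_difference_profile(frame)
    n = len(profile) - len(profile) % block      # trim to a whole number of blocks
    mean = sum(profile[:n]) / n
    centred = [p - mean for p in profile[:n]]
    total = n * sum(c * c for c in centred)      # Parseval: sum of |X_k|^2 over all bins
    if total == 0:
        return 0.0                               # perfectly uniform profile, no grid
    grid = 0.0
    for harmonic in range(1, block):
        k = harmonic * n // block                # bin of the grid frequency or a harmonic
        x_k = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                  for i, c in enumerate(centred))
        grid += abs(x_k) ** 2
    return grid / total

def ramp(width=64, height=16):
    """A clean, slow horizontal gradient."""
    return [[100 + x for x in range(width)] for _ in range(height)]

def flatten_blocks(frame, block=8):
    """Crude stand-in for DC-only coding: each run of `block` pixels becomes its mean."""
    out = []
    for row in frame:
        flat = []
        for start in range(0, len(row), block):
            run = row[start:start + block]
            flat.extend([sum(run) // len(run)] * len(run))
        out.append(flat)
    return out

clean = ramp()
blocky = flatten_blocks(clean)
print(f"clean ramp:  {grid_energy_fraction(clean):.2f}")
print(f"blocky ramp: {grid_energy_fraction(blocky):.2f}")
```

On this synthetic pair the clean ramp scores 0 and the flattened ramp scores close to 1, because every bit of variation in its difference profile repeats every 8 pixels. Real footage sits between those extremes, and a practical detector would also scan columns, handle unknown grid offsets, and calibrate a threshold on known-good material.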

Common mistake: turning off the deblocking filter to "keep it sharp." When an encode looks soft, the tempting fix is to disable the in-loop deblocking filter so detail survives. On easy content at a healthy bitrate that can help marginally; at a starved bitrate it simply swaps mild softness for hard, visible tiles — the worse of the two artifacts, and the one clients notice. The deblocking filter is normative and bitrate-efficient for a reason; the real fix for softness-or-blocking is more bits, a better preset, or a lower resolution on the convex hull, not switching the filter off. And never declare an encode clean on PSNR alone — confirm the flat regions by eye or with a blockiness metric.

Where Fora Soft fits in

Fora Soft has shipped video software since 2005 — streaming, WebRTC conferencing, OTT, e-learning, telemedicine, and surveillance — and blocking is the complaint that reaches us most often, because it is the one an end user can see and describe. A blocky surveillance feed is usually a bitrate or rate-control question at the camera or the transcoder, while blocking that appears only after a network dip is a delivery question, and the two are fixed in different places. We treat blocking as a measurement problem first: locate the grid, confirm it on the flat regions, decide whether the in-loop filter is doing its job, and only then change a setting. Where it helps a decision, we point to our own benchmark data so you can check the method rather than take our word.


References

ITU-T Recommendation H.264 (V14, 08/2021), Advanced video coding for generic audiovisual services. International Telecommunication Union. Tier 1 (primary standard). Defines block-based transform coding and the normative in-loop deblocking filter, including the boundary-strength decision (Bs 0–4) operating on 4×4/8×8 transform-block edges. https://www.itu.int/rec/T-REC-H.264
ITU-T Recommendation H.265 (V9, 09/2023), High efficiency video coding. International Telecommunication Union. Tier 1 (primary standard). Defines the HEVC 8×8 deblocking grid with three boundary-strength levels and the Sample Adaptive Offset (SAO) in-loop stage applied per Coding Tree Unit. https://www.itu.int/rec/T-REC-H.265
P. de Rivaz and J. Haughton, AV1 Bitstream & Decoding Process Specification, version 1.0.0 with Errata 1, Alliance for Open Media, 2019. Tier 1 (primary specification). Defines AV1's three optional in-loop filter stages — deblocking, the Constrained Directional Enhancement Filter (CDEF), and loop restoration — applied in sequence with per-superblock control. https://aomediacodec.github.io/av1-spec/av1-spec.pdf
Z. Wang, A. C. Bovik, and B. L. Evans, "Blind Measurement of Blocking Artifacts in Images," Proc. IEEE International Conference on Image Processing (ICIP), Vancouver, 2000, vol. 3, pp. 981–984. Tier 1 (metric-author, peer-reviewed). Models a blocky image as a clean image plus an added blocky signal; detects the grid blindly via FFT peaks at the block-grid frequency (N/8 and harmonics). Basis for the no-reference detector shipped with this article. https://live.ece.utexas.edu/publications/2000/zw_icip_2000_blindblock.pdf
P. List, A. Joch, J. Lainema, G. Bjøntegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 614–619, July 2003. Tier 5 (peer-reviewed; designers of the H.264 filter). Reports bit-rate savings of up to about 9% at equal subjective quality from the in-loop deblocking filter. https://ieeexplore.ieee.org/document/1218210
M. Yuen and H. R. Wu, "A survey of hybrid MC/DPCM/DCT video coding distortions," Signal Processing, vol. 70, no. 3, pp. 247–278, 1998. Tier 5 (peer-reviewed, foundational). The classic taxonomy of block-DCT coding artifacts, naming independent block quantization as the cause of blocking. https://www.sciencedirect.com/science/article/abs/pii/S0165168498001285
H. R. Wu and M. Yuen, "A generalized block-edge impairment metric for video coding," IEEE Signal Processing Letters, vol. 4, no. 11, pp. 317–320, Nov. 1997. Tier 5 (peer-reviewed). The GBIM no-reference blockiness measure based on weighted block-edge differences. https://ieeexplore.ieee.org/document/641398
C. Yim and A. C. Bovik, "Quality Assessment of Deblocked Images," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 88–98, Jan. 2011. Tier 5 (peer-reviewed). Defines PSNR-B, adding a blocking-effect factor to PSNR to better track blocking and deblocking quality. https://ieeexplore.ieee.org/document/5535179
FFmpeg, deblock — FFmpeg Filters Documentation, accessed 2026-06-24. Tier 3 (first-party tooling). Documents the deblock post-processing filter (weak/strong, default block size 8, alpha default 0.098, beta/gamma/delta 0.05). https://ffmpeg.org/ffmpeg-filters.html#deblock
A. Unterweger, "Compression Artifacts in Modern Video Coding and State-of-the-Art Means of Compensation," in Multimedia Networking and Coding, IGI Global, 2013. Tier 5 (peer-reviewed). Survey of H.264/HEVC-era artifacts and the in-loop tools (deblocking, SAO) that compensate for them. https://www.wavelab.at/papers/Unterweger13a.pdf

Related glossary terms

Blocking

PSNR

Deblocking filter

FFmpeg

Quantization

Macroblock

Blur

VMAF