Published 2026-05-17 · 14 min read · By Nikolay Sapunov, CEO at Fora Soft

Why this matters

Reordering, scanning, and run-length coding sit between the parts of the codec everyone talks about — quantization on one side, entropy coding on the other — and they decide whether the bits saved at quantization actually reach the file. A product manager who knows that a "32-coefficient block with two non-zero values" can ship in fewer than twenty bits stops fearing the term "high QP" and starts asking the right questions about bitrate. A founder who can explain why HEVC files are 30–50% smaller than H.264 at the same visual quality, partly because HEVC encodes the position of the last non-zero coefficient once instead of walking past every zero, understands one of the structural advantages of newer codecs. A streaming engineer who can read a hex dump of an MPEG-2 transport stream and recognise the EOB symbol can debug a bad encoder in an afternoon instead of a week.

What "reordering" actually does

After quantization, each transform block is a small two-dimensional grid of integers — typically 4×4, 8×8, 16×16, or 32×32 — in which most cells are zero and the few non-zero cells cluster in the top-left corner. The top-left corner holds low-frequency energy (the average brightness and gentle gradients) and survives quantization; the bottom-right corner holds high-frequency energy (fine textures and edges) and is mostly wiped out.

The entropy coder that follows works on a one-dimensional stream of symbols, not on a two-dimensional grid. So the encoder must pick an order in which to read the cells, write them out as a list, and hand that list to the entropy stage. The order is the scan order; the act of converting the grid into a list is reordering; the way the list is then packaged for the entropy coder — usually as pairs of (number of leading zeros, next non-zero value) — is run-length coding.

A two-by-three layout. On the left, a 4x4 grid of small numbers shows non-zero values clustered in the top-left corner and zeros filling the rest. An arrow labelled 'scan' points right to a one-dimensional list reading the cells in zig-zag order, with non-zero values at the start and a long tail of zeros. A second arrow labelled 'run-length code' points to a compact pair-list ending in an 'EOB' symbol, much shorter than the raw list.

Figure 1. From two-dimensional grid to one-dimensional symbol stream. The scan turns the grid into a list; the run-length coder turns the list into a handful of (run, level) pairs plus an end marker.

The whole point of the scan order is to put the surviving non-zero coefficients at the front of the list and pack all the zeros into a tail at the end. If the scan succeeds at that, the entropy coder can send the front of the list with normal codewords and dispose of the entire tail with one symbol — the "end of block" marker. If the scan picks a bad order, the non-zero values are sprinkled among the zeros, the runs are short, and the bitstream balloons.
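The effect of a good versus a bad order can be checked with a few lines of Python. This is a minimal sketch, not codec code: the block values, and the helper names `raster_order`, `zigzag_order`, and `last_nonzero_index`, are illustrative assumptions. It compares a plain row-by-row reading with the diagonal zig-zag and reports where the last non-zero value lands in each list.

```python
def raster_order(n):
    """Row-by-row reading order: a deliberately poor scan for comparison."""
    return [(r, c) for r in range(n) for c in range(n)]

def zigzag_order(n):
    """Classical zig-zag: walk the anti-diagonals, alternating direction."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Odd diagonals are walked with the row increasing,
    # even diagonals with the row decreasing.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )

def last_nonzero_index(block, order):
    """Index in the 1-D list of the final non-zero value (-1 if all zero)."""
    values = [block[r][c] for r, c in order]
    return max((i for i, v in enumerate(values) if v != 0), default=-1)

# Hypothetical quantized block: three survivors near the top-left corner.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][2], block[2][0] = 12, 3, -2

raster_last = last_nonzero_index(block, raster_order(8))
zigzag_last = last_nonzero_index(block, zigzag_order(8))
print(raster_last, zigzag_last)  # 16 5
```

With the raster order the zero tail starts after position 16; with the zig-zag it starts after position 5, so the end-of-block marker can be sent much earlier.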

A short history — why zig-zag, and from where

The zig-zag scan is older than any video codec on the market today. The description that later standards cite is Wen-Hsiung Chen and William Pratt's "Scene Adaptive Coder", published in IEEE Transactions on Communications in 1984 and building on adaptive transform coding work that Chen had published in 1977. The coder combined a DCT, scalar quantisation, and Huffman codes for transform coefficients. Chen and Pratt noticed that natural images, after DCT and quantisation, concentrated energy near the DC coefficient, and that scanning the coefficients in a zig-zag order lengthened the runs of zeros and reduced the number of Huffman codewords the decoder had to read.

In 1992 the JPEG standard adopted that same scan pattern for 8×8 blocks, citing Chen's 1977 and 1984 papers. The H.261 video standard (1988) had already used the pattern for its 8×8 DCT. MPEG-1 (1993) inherited it. MPEG-2 (1995) added an alternate scan for interlaced content. H.264 / AVC (2003) used a zig-zag scan for progressive 4×4 and 8×8 blocks and a "field scan" for interlaced ones. HEVC (2013) replaced the zig-zag with a diagonal walk over 4×4 sub-blocks. AV1 (2018) and VVC (2020) kept the diagonal walk and added shape-specific scans for rectangular and one-dimensional transforms.

A horizontal timeline running from 1977 on the left to 2020 on the right, with seven milestone cards above and below the axis. The cards read: 1977, Chen and Pratt 'Scene Adaptive Coder' first publishes the zig-zag scan; 1992, JPEG adopts zig-zag for 8x8 blocks; 1995, MPEG-2 adds an alternate scan for interlaced video; 2003, H.264 uses zig-zag for 4x4 and 8x8 blocks plus a field scan for interlaced; 2013, HEVC replaces zig-zag with a 4x4 sub-block diagonal scan and a signalled last-coefficient position; 2018, AV1 chooses a scan per transform shape; 2020, VVC keeps the diagonal walk and couples it to dependent quantization.

Figure 2. Forty-three years of coefficient scanning, from the first published zig-zag in 1977 to the dependent-quantization-aware scan in VVC.

Through all that change the underlying intuition has not moved: walk the grid in a direction that follows lines of equal frequency, because lines of equal frequency hold cells of similar magnitude, and similar magnitudes give long runs.

The classical zig-zag — how it walks an 8×8 block

The classical JPEG / H.261 / MPEG-2 zig-zag walks an 8×8 block in 64 steps. It starts at the DC coefficient in the top-left corner, moves one cell to the right, then walks down and to the left along the diagonal, then one cell down, then walks up and to the right along the next diagonal, and so on. The path looks like a single hand-drawn zigzag from the top-left to the bottom-right corner.

The numbers below show the order in which the 64 cells are visited (0 is the DC; 63 is the highest-frequency AC):

 0   1   5   6  14  15  27  28
 2   4   7  13  16  26  29  42
 3   8  12  17  25  30  41  43
 9  11  18  24  31  40  44  53
10  19  23  32  39  45  52  54
20  22  33  38  46  51  55  60
21  34  37  47  50  56  59  61
35  36  48  49  57  58  62  63

The key property is that the scan index k grows with distance from the DC corner: cells on the same anti-diagonal (the same value of row + column) are visited together, and each new diagonal lies one step further out. The diagonals of an 8×8 block approximate lines of equal radial frequency, and a DCT applied to a smooth natural-image patch leaves coefficients of similar magnitude along each diagonal, with magnitudes shrinking from one diagonal to the next. So if you read the 64 cells in this order, the average magnitude falls smoothly from index 0 to index 63, and once the encoder hits a tail of zeros it is very unlikely to find another non-zero value before the end.

H.264 uses the same idea on its smaller 4×4 transform:

 0   1   5   6
 2   4   7  12
 3   8  11  13
 9  10  14  15

When an 8×8 transform is selected (in High profile), H.264 uses a 64-position scan very similar to the JPEG one. For interlaced content, H.264 ships a "field scan" that walks the columns first: interlaced fields have different vertical-versus-horizontal frequency statistics, and a column-major walk packs the zeros better.
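Both index tables above can be regenerated from one rule: sort the cells by anti-diagonal and alternate the walking direction on each diagonal. The sketch below is one way to write that rule. The function names are illustrative, not taken from any codec source, and it covers only the frame zig-zag, not the H.264 field scan.

```python
def zigzag_order(n):
    """Return the (row, col) cells of an n x n block in zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal d = row + col. On odd diagonals the row
    # increases along the walk, on even diagonals it decreases.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )

def zigzag_index_grid(n):
    """Return an n x n grid whose cell holds its position in the scan."""
    grid = [[0] * n for _ in range(n)]
    for k, (r, c) in enumerate(zigzag_order(n)):
        grid[r][c] = k
    return grid

grid4 = zigzag_index_grid(4)
grid8 = zigzag_index_grid(8)
for row in grid8:
    print(" ".join(f"{k:2d}" for k in row))
```

Printing `grid8` reproduces the 8×8 table, and `grid4` matches the 4×4 table shown for H.264.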

A 4x4 grid labelled 'H.264 zig-zag (4x4 block)' on the left, with cells numbered in zig-zag order from 0 in the top-left to 15 in the bottom-right and a curved arrow tracing the path. An 8x8 grid labelled 'JPEG / MPEG-2 / H.264 (8x8 block)' on the right, with cells numbered 0 to 63 and a longer arrow tracing the same diagonal-zigzag pattern. A footer note reads 'The path always starts at the DC corner and follows lines of equal frequency.'

Figure 3. The classical zig-zag scan on the 4×4 and 8×8 transform blocks. The same path serves JPEG (1992), H.261 (1988), MPEG-1 / MPEG-2 (1993–1995), and H.264 (2003).

Run-length coding — turning a list of mostly zeros into a handful of pairs

Once the scan has produced a one-dimensional list, the encoder packages it as a sequence of (run, level) pairs. Run is the number of zeros that precede the next non-zero value; level is that non-zero value. After the last non-zero value the encoder emits an end-of-block symbol (EOB) that tells the decoder "everything else in this block is zero".

A worked example. Suppose the 8×8 transform of a flat-ish patch, after quantization, looks like this:

 12   0   3   0   0   0   0   0
  0   0   0   0   0   0   0   0
 -2   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

The zig-zag reads off the values in the order 12, 0, 0, -2, 0, 3, 0, 0, … and 56 more zeros. The first cell — the DC — is usually coded separately by predicting it from the neighbouring block's DC and sending the difference, so the AC stream starts at the second position. The AC values, in scan order, are:

0, 0, -2, 0, 3, followed by 58 zeros (63 AC values in total)

The run-length coder packages this as:

(run=2, level=-2)   # two zeros, then -2
(run=1, level=3)    # one zero, then 3
(EOB)               # everything else is zero

Three symbols cover 63 cells. A naive coder that sent the 63 values one by one would emit 63 codewords. Measured in symbols, this stage alone shrinks the block by a factor of 21, before the entropy coder even sees the data.
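The worked example can be reproduced end to end in a short sketch. It assumes the classical zig-zag described earlier and represents EOB as a plain string; the function names are illustrative and do not correspond to any codec's API.

```python
EOB = "EOB"  # stand-in for the end-of-block symbol

def zigzag_order(n):
    """(row, col) cells of an n x n block in classical zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )

def run_level_encode(ac):
    """Turn a list of AC values into (run, level) pairs plus EOB."""
    symbols, run = [], 0
    for value in ac:
        if value == 0:
            run += 1              # count zeros before the next survivor
        else:
            symbols.append((run, value))
            run = 0
    symbols.append(EOB)           # trailing zeros are implied by EOB
    return symbols

def run_level_decode(symbols, length):
    """Rebuild the AC list; everything after EOB is zero."""
    out = []
    for symbol in symbols:
        if symbol == EOB:
            break
        run, level = symbol
        out.extend([0] * run)
        out.append(level)
    return out + [0] * (length - len(out))

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][2], block[2][0] = 12, 3, -2

scanned = [block[r][c] for r, c in zigzag_order(8)]
dc, ac = scanned[0], scanned[1:]   # DC is coded separately
symbols = run_level_encode(ac)
print(symbols)  # [(2, -2), (1, 3), 'EOB']
```

Decoding with `run_level_decode(symbols, 63)` returns the original 63 AC values, which shows that the three symbols lose nothing.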

The DC coefficient — the 12 at position (0, 0) — is handled separately. Its value is very likely to be close to the DC of the neighbouring block, so the encoder transmits the difference (DPCM) rather than the absolute value. The difference goes to its own VLC table, distinct from the AC table.
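A minimal sketch of the DC differencing step follows. The DC values, the initial predictor of 0, and the function names are assumptions made for illustration; real codecs define their own predictor reset rules and VLC tables.

```python
def dc_dpcm_encode(dc_values, predictor=0):
    """Replace each DC value with its difference from the previous one."""
    diffs = []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc            # the next block predicts from this one
    return diffs

def dc_dpcm_decode(diffs, predictor=0):
    """Invert dc_dpcm_encode by accumulating the differences."""
    values = []
    for diff in diffs:
        predictor += diff
        values.append(predictor)
    return values

# Hypothetical DC values of four neighbouring blocks in a flat region.
dc_values = [12, 13, 13, 11]
diffs = dc_dpcm_encode(dc_values)
print(diffs)  # [12, 1, 0, -2]
```

After the first block the differences stay close to zero, which is what lets a short-codeword table do well on them.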

In MPEG-2, each (run, level) pair maps to a variable-length codeword from a fixed Huffman-style table; the most common pairs, such as (0, 1), (0, -1), (1, 1), and (0, 2), get the shortest codewords, as short as two or three bits. EOB itself is a 2-bit code. The whole 8×8 block above might fit in 15–20 bits of bitstream, against 512 raw bits for an uncompressed 8×8 patch of 8-bit samples.

A pipeline diagram with four labelled boxes from left to right. Box 1, 'Quantized 8x8 grid', shows the small block with values 12, 3, -2 and zeros. Box 2, 'Zig-zag scan', shows the same values written out as a horizontal list 12, 0, 0, -2, 0, 3, 0...0 with a curved arrow indicating the scan path. Box 3, 'Run-level pairs', shows the list collapsed into (DC=12) (run=2, level=-2) (run=1, level=3) (EOB). Box 4, 'Variable-length codes', shows each pair replaced by a short bitstring like 0010, 110, 1011, 10 totalling 17 bits. A footer note reads '63 AC cells coded in 17 bits.'

Figure 4. The full reordering and run-length pipeline. Sixty-three quantized coefficients become three symbols become seventeen bits.

The asymmetry of the rule is what makes it work. A block that is genuinely sparse — one or two non-zero coefficients — gets one or two pair-symbols plus EOB and fits in a dozen bits. A block that is genuinely dense — many non-zero coefficients spread through the scan — gets many pair-symbols and the cost grows. The encoder pays for what it sends, and quantization decided what it sends.

Why HEVC threw away the zig-zag

HEVC ships with much larger transform blocks than H.264: square blocks from 4×4 up to 32×32. A 32×32 zig-zag would have to step through 1,024 cells, with the last non-zero coefficient potentially sitting deep inside the tail. The 1990s rule "walk every cell, emit EOB at the end" loses its appeal when "every cell" means a thousand of them.

HEVC's designers replaced the global walk with two changes. First, the transform block is divided into 4×4 sub-blocks. The scan walks the sub-blocks diagonally — top-right to bottom-left along each diagonal — and inside each sub-block walks the 16 cells in the same diagonal direction. Second, instead of letting the decoder discover EOB by reading until it sees the symbol, the encoder transmits the (x, y) coordinates of the last non-zero coefficient up front, so the decoder knows in advance how many cells of the scan it needs to read.

Together, those two changes mean that a 32×32 block with non-zero values only in the top-left 8×8 region costs roughly the same as an 8×8 block — the encoder sends the last-coefficient position, the decoder reads only that many positions, and the rest of the 1,024 cells is skipped wholesale.
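The claim can be checked with a sketch of the two-level scan. It lists positions in forward order, starting at the DC, with each anti-diagonal walked from bottom-left to top-right; the coefficients themselves are coded in the reverse direction, starting from the signalled last position. The function names and the test block are illustrative assumptions, and none of the entropy coding is modelled.

```python
def up_right_diagonal(n):
    """Forward diagonal order over an n x n grid of (x, y) cells."""
    cells = [(x, y) for y in range(n) for x in range(n)]
    # Anti-diagonals outward from (0, 0); within one diagonal, x increases.
    return sorted(cells, key=lambda p: (p[0] + p[1], p[0]))

def sub_block_scan(size):
    """Two-level scan: 4x4 sub-blocks diagonally, then cells diagonally."""
    order = []
    for sx, sy in up_right_diagonal(size // 4):
        for x, y in up_right_diagonal(4):
            order.append((4 * sx + x, 4 * sy + y))
    return order

def last_position(block):
    """Return ((x, y) of the last non-zero cell, scan positions to read)."""
    order = sub_block_scan(len(block))
    last = max((i for i, (x, y) in enumerate(order) if block[y][x] != 0),
               default=-1)
    if last < 0:
        return None, 0
    return order[last], last + 1

def make_block(size):
    """Hypothetical block with three survivors near the DC corner."""
    block = [[0] * size for _ in range(size)]
    block[0][0], block[0][2], block[2][0] = 12, 3, -2
    return block

print(last_position(make_block(8)))   # ((2, 0), 6)
print(last_position(make_block(32)))  # ((2, 0), 6)
```

The 32×32 block needs the same six scan positions as the 8×8 block; the other 1,018 cells never enter the coefficient loop.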

A further mechanism goes a level deeper. For each 4×4 sub-block, HEVC also transmits a coded sub-block flag; a value of zero says "this entire sub-block is zero, skip it". An encoder that quantizes a 16×16 transform aggressively often ends up with several 4×4 sub-blocks that are entirely zero and only a few sub-blocks near the top-left corner with survivors. The coded-sub-block flag turns each of those zero sub-blocks into a single bit.
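As a rough illustration, the flag for each sub-block is simply "does this 4×4 region contain any non-zero value". The sketch below computes that property for every sub-block of a hypothetical 16×16 block. It ignores the cases where a real bitstream infers the flag instead of sending it, and the function name is an assumption.

```python
def coded_sub_block_flags(block):
    """Return a grid of 0/1 flags, one per 4x4 sub-block (row-major)."""
    n = len(block)
    return [
        [
            int(any(block[y][x] != 0
                    for y in range(sy, sy + 4)
                    for x in range(sx, sx + 4)))
            for sx in range(0, n, 4)
        ]
        for sy in range(0, n, 4)
    ]

# Hypothetical 16x16 block after aggressive quantization.
block = [[0] * 16 for _ in range(16)]
block[0][0], block[1][0] = 9, -1   # survivors in the top-left sub-block
block[0][5] = 2                    # one survivor in the sub-block to its right

flags = coded_sub_block_flags(block)
zero_sub_blocks = sum(row.count(0) for row in flags)
print(flags[0], zero_sub_blocks)  # [1, 1, 0, 0] 14
```

Fourteen of the sixteen sub-blocks here carry no coefficients at all, so none of their 224 cells needs a per-coefficient symbol.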

A 16x16 grid divided into sixteen 4x4 sub-blocks with thick borders. The top-left sub-block is shaded and contains a few small non-zero numbers; the next two sub-blocks along the diagonal are shaded lightly with one or two non-zero numbers each; the remaining thirteen sub-blocks are unshaded and labelled 'coded_sub_block_flag = 0 (skip)'. A diagonal arrow runs from the top-right of the grid through the sub-blocks toward the bottom-left, illustrating the inter-sub-block scan order. A second short diagonal arrow inside the top-left sub-block illustrates the intra-sub-block scan. A footer note reads 'HEVC encodes the position of the last non-zero coefficient up front; everything past it is skipped.'

Figure 5. HEVC's two-level scan. The encoder walks 4×4 sub-blocks diagonally and walks cells diagonally inside each sub-block, and an explicit last-coefficient position lets the decoder skip the tail wholesale.

HEVC also offers three scan patterns per sub-block (diagonal, horizontal, and vertical) and, for small intra-coded blocks (4×4 and 8×8), picks one based on the intra-prediction direction of the block. A block predicted vertically, from the row above, has most of its surviving energy in horizontal rows, so a horizontal scan packs zeros better. A block predicted horizontally, from the column to the left, flips the pattern. The compression savings of the right scan choice over a fixed scan are small (roughly 0.5–1.0% bitrate at the same quality), but they cost nothing at decode time, so HEVC ships them.
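The selection rule can be sketched as a small lookup. The mode ranges below (6 to 14 for near-horizontal prediction, 22 to 30 for near-vertical) and the restriction to 4×4 and 8×8 luma blocks are written from memory of the HEVC design and should be checked against the standard; the function name and its parameters are hypothetical.

```python
def pick_scan(intra_mode, log2_size, is_intra=True):
    """Choose a coefficient scan for a luma transform block.

    intra_mode: HEVC intra prediction mode, 0..34
                (0 planar, 1 DC, 10 horizontal, 26 vertical).
    log2_size:  2 for 4x4, 3 for 8x8, 4 for 16x16, 5 for 32x32.
    """
    if is_intra and log2_size in (2, 3):
        if 6 <= intra_mode <= 14:     # near-horizontal prediction
            return "vertical"
        if 22 <= intra_mode <= 30:    # near-vertical prediction
            return "horizontal"
    return "diagonal"                 # default for everything else

print(pick_scan(26, 2))  # horizontal
print(pick_scan(10, 3))  # vertical
print(pick_scan(26, 4))  # diagonal
```

The scan is derived from information the decoder already has, so no extra bits are spent signalling it.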

AV1 and VVC — scan patterns per transform shape

AV1, released by AOMedia in 2018, took the per-block scan idea further. Where HEVC offers three patterns, AV1 ships scan tables for every supported transform shape — square 4×4, 8×8, 16×16, 32×32, 64×64, rectangular variants like 4×8 and 16×32, and one-dimensional transforms. For 4×4 the 2-D scan is a zig-zag; for 1-D vertical transforms a column scan; for 1-D horizontal transforms a row scan; for larger 2-D blocks a diagonal walk.

AV1 also reverses the direction of coefficient coding. The encoder transmits the position of the last non-zero coefficient (called eob) first, then walks the scan in reverse order from that position back toward the DC, coding lower magnitude bits in one pass and higher magnitude bits in a second pass. The reverse walk lets later coefficients (which tend to be smaller) condition the context model used for earlier coefficients (which tend to be larger), tightening the entropy coding.

VVC, finalised in 2020, keeps HEVC's diagonal scan over 4×4 sub-blocks and pairs it with two tools. First, the scan is coupled to dependent quantization: a trellis-coded quantiser that switches between two scalar quantisers under a four-state machine, in which the next state depends on the parity of the previous coefficient level in scan order. The context model for the current coefficient's significance flag depends on its trellis state, on its diagonal position (the sum d = x + y), and on the sum of partially reconstructed neighbouring coefficients in a small template around the current position. Second, VVC keeps sign data hiding, a tool inherited from HEVC, where the sign of the first non-zero coefficient in a sub-block is inferred from the parity of the sum of the absolute values in that sub-block, saving one sign bit in each sub-block where it applies.
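The parity-driven state update can be sketched in a few lines. The transition table below, with four states and two scalar quantisers labelled Q0 and Q1, is written from memory of the published VVC design (reference 8), so treat the exact entries as an assumption to verify against the standard. The function name and the example levels are hypothetical.

```python
# Assumed transition table: next_state = STATE_TRANSITION[state][parity].
STATE_TRANSITION = [(0, 2), (2, 0), (1, 3), (3, 1)]

def quantizer_sequence(levels_in_coding_order):
    """Return which quantiser (Q0 or Q1) applies to each coefficient."""
    state = 0
    used = []
    for level in levels_in_coding_order:
        # States 0 and 1 select Q0; states 2 and 3 select Q1.
        used.append("Q0" if state < 2 else "Q1")
        # The parity of the level just coded drives the next state.
        state = STATE_TRANSITION[state][abs(level) & 1]
    return used

levels = [0, 1, 2, 1, 0]
print(quantizer_sequence(levels))  # ['Q0', 'Q0', 'Q1', 'Q0', 'Q0']
```

Because the decoder sees the same levels in the same order, it can replay the state machine without any extra signalling.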

The combined effect is small per block — a fraction of a percent of bitrate — but it adds up over millions of blocks in a movie.
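The sign data hiding rule mentioned above reduces to a parity check. The sketch below assumes the HEVC convention as recalled here: an odd sum of absolute levels means the hidden sign is negative, and hiding applies only when the first and last non-zero positions in the sub-block are at least four scan positions apart. Both details, and the function name, should be treated as assumptions to check against the standard.

```python
def hidden_sign(abs_levels):
    """Infer the sign of the first non-zero coefficient in a sub-block.

    abs_levels: absolute coefficient levels in forward scan order.
    Returns +1 or -1, or None when hiding does not apply and the
    sign would be sent explicitly.
    """
    nonzero = [i for i, a in enumerate(abs_levels) if a != 0]
    # Assumed condition: first and last survivors at least 4 positions apart.
    if not nonzero or nonzero[-1] - nonzero[0] < 4:
        return None
    return -1 if sum(abs_levels) % 2 else 1

print(hidden_sign([3, 0, 1, 0, 0, 2] + [0] * 10))  # 1
print(hidden_sign([3, 0, 1, 0, 0, 1] + [0] * 10))  # -1
print(hidden_sign([3, 1] + [0] * 14))              # None
```

When the parity does not match the sign the encoder wants, the encoder adjusts one level by one step, trading a small distortion for the saved bit.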

A comparative table

The same conceptual stage shows up in every codec, but the implementation has drifted a long way from Chen and Pratt's original scan.

Codec (year) | Block sizes | Scan pattern | Last-coefficient signal | Run-length packaging | Sub-block flag
JPEG (1992) | 8×8 only | Single zig-zag | None; read until EOB | (run, level) + EOB | No
H.261 (1988) | 8×8 only | Single zig-zag | None; read until EOB | (run, level) + EOB | No
MPEG-2 (1995) | 8×8 only | Zig-zag (frame) + alternate (field) | None; read until EOB | (run, level) + EOB | No
H.264 (2003) | 4×4, 8×8 | Zig-zag (frame) + field scan | None; coded inside CAVLC / CABAC | (total_zeros, run_before, level) | No
HEVC (2013) | 4×4 to 32×32 | Diagonal walk over 4×4 sub-blocks; H/V/D per intra mode | (x, y) of last non-zero coefficient | Significance map + level bypass passes | Yes (coded_sub_block_flag)
AV1 (2018) | 4×4 to 64×64, rectangular, 1-D | Per-transform-shape table | eob index transmitted first | Reverse-scan multi-pass coding | Implicit via eob
VVC (2020) | 4×4 to 64×64, rectangular | Diagonal walk + dependent-quantization context | (x, y) of last non-zero coefficient | Multi-pass coding + sign data hiding | Yes (coded_sub_block_flag)

The pattern is monotone in one direction: every newer codec sends fewer bits to skip the zero parts of the block. The price is decoder complexity — a VVC decoder doing per-coefficient context modelling on a trellis-quantized 64×64 block does much more work per pixel than a JPEG decoder reading a fixed Huffman table.

A common pitfall — the EOB symbol that never arrives

One of the easier ways to break a video file is to corrupt a single bit inside a run-length-coded block. The decoder reads the bitstream cell by cell, looking for EOB; if a level is mis-parsed and EOB never arrives, the decoder will keep reading past the end of the block and into the start of the next one, where everything is misaligned. The result is a smear of garbage that propagates until the next slice header re-synchronises the parser.
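A defensive parser can at least refuse to run off the end of the block. The sketch below works on already-parsed symbols, not on raw bits, and uses a plain string for EOB; the function name, the error messages, and the assumption that every block ends with an explicit EOB are illustrative choices, not taken from any decoder.

```python
EOB = "EOB"  # stand-in for the end-of-block symbol

def decode_ac(symbols, block_size=63):
    """Decode (run, level) pairs, failing loudly on an overrun."""
    out = []
    for symbol in symbols:
        if symbol == EOB:
            # Everything after EOB is zero.
            return out + [0] * (block_size - len(out))
        run, level = symbol
        if len(out) + run + 1 > block_size:
            # A corrupt run would write past the last coefficient.
            raise ValueError("run-level pair overruns the block")
        out.extend([0] * run)
        out.append(level)
    raise ValueError("symbols ended before EOB")

good = decode_ac([(2, -2), (1, 3), EOB])
print(good[:6])  # [0, 0, -2, 0, 3, 0]
```

Raising an error at the first overrun lets the caller conceal the block and resynchronise, instead of silently feeding misaligned data to the next block.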

This is one reason newer codecs send an explicit last-coefficient position. HEVC codes the last position as a pair of short values, x and y (syntax elements last_sig_coeff_x_prefix and last_sig_coeff_y_prefix, with suffixes for larger blocks), typically four or five bits each, with their own context models, and they put a hard upper bound on how much of the block the decoder reads. A bit error inside a sub-block can still corrupt a few coefficients, but the coefficient loop itself cannot run past the signalled position into the next block.

If you are debugging an MPEG-2 stream that produces "blocky garbage from the centre of the frame to the right edge", look for a missing or mis-parsed EOB. If you are debugging an HEVC stream that produces "blocky garbage in a single 16×16 region that does not propagate", look for a bit error inside a sub-block's coefficient list; the last-coefficient signal contained the damage.

Where Fora Soft fits in

We build the streaming and decoding pipelines behind video-conferencing, OTT, surveillance, e-learning, and telemedicine products, and we read a lot of bitstreams in production. When a customer's H.264 decoder is faulting on a specific class of frames, the trail usually leads back to one of the stages described above — a quantizer running too hot for a CABAC context that was tuned for lower QPs, a CAVLC parser that mis-counts trailing ones, or a fragmented MP4 in which the sub-block flags of an HEVC frame survived but the (x, y) of the last coefficient did not. Understanding the bit-level grammar of the residual block is what separates a debugging session that closes in an afternoon from one that closes in a week.

References

  1. Chen, W.-H., and Pratt, W. K. "Scene Adaptive Coder." IEEE Transactions on Communications, COM-32(3), 225–232, 1984. The foundational paper that introduced the zig-zag scan for transform-coded images.
  2. Reznik, Y. A. "Origins of the Zigzag Scan in Transform-Based Picture Coding." Applied Digital Image Processing XLVII, SPIE, 2024. http://m.reznik.org/papers/ADIP2024_Origins_of_Zigzag_Scan.pdf
  3. ITU-T Recommendation H.262 | ISO/IEC 13818-2. Generic Coding of Moving Pictures and Associated Audio Information: Video (MPEG-2 Video), edition 3, 2013.
  4. ITU-T Recommendation H.264 | ISO/IEC 14496-10. Advanced Video Coding for Generic Audiovisual Services, edition 14, 2023.
  5. Richardson, I. E. G. "H.264/AVC Context Adaptive Variable Length Coding (CAVLC)." Vcodex BV technical note. https://www.vcodex.com/h264avc-context-adaptive-variable-length-coding
  6. Sze, V., and Budagavi, M. "Transform Coefficient Coding in HEVC." IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1765–1777, 2012. https://dspace.mit.edu/bitstream/handle/1721.1/100315/hevc_cabac_chapter.pdf
  7. Han, J., Li, B., Mukherjee, D., et al. "A Technical Overview of AV1." Proceedings of the IEEE, 109(9), 2021. https://arxiv.org/pdf/2008.06091
  8. Schwarz, H., Coban, M., Karczewicz, M., et al. "Quantization and Entropy Coding in the VVC Standard." IEEE Transactions on Circuits and Systems for Video Technology, 31(10), 2021. http://www.ecodis.de/video/schwarz_tcsvt_vvc_rev2.pdf
  9. AOMedia. AV1 Bitstream & Decoding Process Specification, version 1.0.0 errata 1, January 2019. https://aomediacodec.github.io/av1-spec/av1-spec.pdf
  10. Golomb, S. W. "Run-length encodings." IEEE Transactions on Information Theory, 12(3), 399–401, 1966. The original paper on the family of codes that wrap most modern run-length packaging.