Published 2026-05-17 · 18 min read · By Nikolay Sapunov, CEO at Fora Soft

Why this matters

Entropy coding is one of those topics where a five-minute conversation with the right person saves a project a year of confusion. A product manager who can tell CAVLC from CABAC stops asking "why is the H.264 file twice the size on Baseline profile" — the answer is that Baseline forbids CABAC. A founder who knows that AV1 ships a non-binary arithmetic coder borrowed from the Daala research project understands why AV1 hardware decoders are harder to build than HEVC ones and why the rollout has been slower than expected. A streaming engineer who can read the difference between a CABAC re-init at a slice header and an AV1 frame_context_idx switch can debug a corrupted live stream in an hour instead of a week. This article is the long-form version of that five-minute conversation. We start from Shannon and Huffman, build up to CABAC, walk through AV1's range coder, and end with a clean comparison table you can take into your next codec selection meeting.

What "entropy coding" actually means

Every preceding stage of a video encoder — prediction, transform, quantization, reordering and run-length packaging — produces a sequence of symbols. Symbols include motion-vector differences, prediction modes, run-level pairs, coded-block flags, the position of the last non-zero coefficient, and a hundred other small integers that describe how the next block should be reconstructed. Entropy coding takes that symbol sequence and writes it into the bitstream using as few bits as possible without losing any information.

The key insight, due to Claude Shannon in 1948, is that the minimum average number of bits required to send a symbol is the entropy of its probability distribution: H = -Σ p(x) log₂ p(x), measured in bits per symbol. A perfectly fair coin needs 1 bit per flip. A coin that lands heads 90% of the time needs only 0.47 bits per flip — but only if the encoder is allowed to use fractional bits and amortise them across many flips. A binary code that uses a whole number of bits per symbol cannot get below 1 bit per coin flip, no matter how skewed the coin.
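The coin figures above can be checked in a few lines. This is a minimal Python sketch of Shannon's formula; the function name entropy_bits is ours, chosen for illustration.

```python
from math import log2

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    # Zero-probability outcomes contribute nothing, so skip them.
    return -sum(p * log2(p) for p in probs if p > 0)

fair = entropy_bits([0.5, 0.5])
skewed = entropy_bits([0.9, 0.1])
print(f"fair coin:  {fair:.3f} bits per flip")    # 1.000
print(f"90/10 coin: {skewed:.3f} bits per flip")  # 0.469
```

The gap between 0.469 and the 1 bit a whole-bit code must spend is the fractional-bit problem that the rest of the article is about.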

Entropy coders solve that fractional-bit problem in two different ways. Variable-length codes (Huffman, Exp-Golomb, CAVLC) approximate the Shannon limit with a fixed table of codewords, each codeword being some integer number of bits. They are simple, fast, and parallel-friendly, and they pay a small efficiency penalty — typically 3–10% above the entropy. Arithmetic codes (CABAC, AV1's range coder) treat the whole message as a single fractional number between zero and one and accumulate bits incrementally as that interval shrinks. They get within 1% of the entropy in practice — at the cost of an internal state machine that updates after every coded bit.
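To make the whole-bit penalty concrete, the sketch below builds Huffman codeword lengths for a small distribution and compares the average length with the entropy. The distribution and the helper name huffman_lengths are illustrative assumptions, not taken from any codec.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Return the Huffman codeword length (in bits) for each symbol."""
    # Heap items: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol inside them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

probs = [0.6, 0.2, 0.1, 0.1]
lengths = huffman_lengths(probs)
average = sum(p * n for p, n in zip(probs, lengths))
entropy = -sum(p * log2(p) for p in probs)
print(lengths)                       # [1, 2, 3, 3]
print(f"{average:.3f} bits/symbol vs entropy {entropy:.3f}")
```

For a two-symbol alphabet such as the 90/10 coin, the same function returns one bit for each outcome, which is exactly the case where a variable-length code cannot follow the entropy down.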

A two-panel diagram. Top panel labelled 'Variable-length code (Huffman / CAVLC)' shows a probability distribution as bars on the left, a table mapping each symbol to a short binary codeword in the middle, and a concatenated bitstream on the right. Bottom panel labelled 'Arithmetic code (CABAC / range coder)' shows the same probability distribution on the left, the unit interval 0..1 in the middle subdivided into intervals proportional to symbol probabilities and progressively narrowed by each symbol, and a single fractional number written as a bitstring on the right. A footer note reads 'Huffman pays whole-bit overhead per symbol; arithmetic coding pays fractional bits across a message.'

Figure 1. The two families of entropy coder. Variable-length codes pick a whole-bit codeword per symbol; arithmetic codes track a single fractional number and accumulate bits as the interval shrinks.

A short history

Entropy coding predates video by decades. David Huffman published his algorithm for optimal prefix codes in 1952 as a course assignment at MIT, and it became the default lossless compression for everything from PKZIP to JPEG. Arithmetic coding was invented independently by Jorma Rissanen at IBM (1976) and by Richard Pasco in his PhD thesis (1976), and it was patented heavily through the 1980s — which is why JPEG used Huffman and not arithmetic by default, even though the spec defined both. IBM's Q-Coder (1988) and the QM-Coder (1993) made arithmetic coding practical on the hardware of the time.

CABAC arrived in H.264/AVC in 2003, designed by Detlev Marpe, Heiko Schwarz, and Thomas Wiegand at Fraunhofer HHI. It combined binary arithmetic coding with context modelling: the idea that the probability of the next bit depends on the bits already seen in the local neighbourhood. CABAC delivered roughly 9–14% better compression than H.264's CAVLC at the same quality, at roughly 1.5× the decode complexity. HEVC kept CABAC and dropped CAVLC entirely. AV1 (2018) replaced the binary arithmetic coder with a non-binary, multi-symbol range coder ported from the Daala research codec, which codes a multi-valued symbol in a single step and adapts probabilities per symbol rather than per frame. VVC (2020) stayed with binary CABAC but reworked the probability estimator to use multiple parallel "rate" tracks tuned for different bit-rate regimes.

A horizontal timeline from 1948 to 2020 with milestone cards alternating above and below the axis. Cards: 1948 Shannon publishes 'A Mathematical Theory of Communication'; 1952 Huffman publishes optimal prefix codes; 1976 Rissanen and Pasco independently invent arithmetic coding; 1988 IBM Q-Coder makes arithmetic coding practical; 1993 JPEG ships with Huffman default and optional arithmetic; 2003 H.264 introduces CABAC; 2013 HEVC drops CAVLC and uses CABAC only; 2018 AV1 ships a Daala-derived multi-symbol range coder; 2020 VVC keeps CABAC and adds multi-rate probability estimation.

Figure 2. Seventy-two years of entropy coding, from Shannon to VVC.

The pattern is one of slow, monotonic improvement. Every codec since H.264 has been arithmetic-only; every new arithmetic coder has added a context-modelling trick that takes the bitrate a percent or two closer to the entropy of the source.

How CAVLC works — variable-length coding with a twist

CAVLC, short for Context-Adaptive Variable-Length Coding, is the only entropy coder permitted in the H.264 Baseline and Extended profiles. Main, High, and the 10-bit profiles allow it as an alternative to CABAC, but there it is mostly used when the decoder is a battery-powered device (a 2007-era smartphone, a low-end set-top box) or when the encoder cares more about decode speed than about file size.

CAVLC is not just Huffman. It is a small family of variable-length codewords whose choice depends on the context — specifically, on what the encoder just coded. The encoder transmits a coefficient block in five steps, each using a different VLC table chosen by the running state. Below, we walk through each step on a worked example.

Assume the zig-zag scan of one 4×4 luma block produced this list of coefficients, in order from DC outwards:

0, 3, 0, 1, -1, -1, 0, 1, 0, ...  (rest are zeros)

The non-zero values are at scan positions 1, 3, 4, 5, 7. There are 5 non-zero coefficients in total. Four of them have absolute value 1, but CAVLC gives special treatment to at most three: the last three in scan order (the values -1, -1, 1 at positions 4, 5, 7), which sit immediately before the run of zeros at the end of the scan. These are called the trailing ones.

Step 1 — coeff_token. The encoder transmits a single codeword that encodes two numbers at once: the total number of non-zero coefficients (here, 5) and the number of trailing ones (here, 3, which is also the most the codeword can signal). The codeword is read from one of four tables; the table is picked using the average number of non-zero coefficients in the two neighbouring 4×4 blocks above and to the left. Common neighbour states get a small (1–6 bit) codeword; rare states get a larger one.

Step 2 — trailing ones signs. For each trailing one, the encoder transmits one bit: 0 for positive, 1 for negative. Three trailing ones → three bits.

Step 3 — non-trailing levels. The remaining non-zero coefficients (here, the two values 1 and 3, taken in reverse scan order) are coded as signed integers using one of seven VLC tables. The starting table is picked by an initial rule and shifts up as the magnitude of each successive level grows. A small first level keeps the encoder on a table tuned for small numbers; a large first level pushes it onto a table that codes large numbers cheaply at the price of larger codewords for small ones. This adaptation is the "context" in CAVLC: the table itself is the state.

Step 4 — total_zeros. The encoder transmits the total number of zeros that precede the last non-zero coefficient in scan order (here, 3: the zeros at scan positions 0, 2, 6). The VLC table is fixed and indexed by the number of non-zero coefficients from Step 1.

Step 5 — run_before. For each non-zero coefficient (working backwards from the last one), the encoder transmits the number of zeros that immediately precede it. The VLC table for run_before is indexed by the number of zeros still unaccounted for, so the encoder runs out of "zero budget" quickly and the codewords get cheaper. Nothing is sent for the first coefficient in scan order, because its run is whatever zeros remain.
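The five steps can be traced on the example block with a short Python sketch. It derives the values of the syntax elements only; it does not emit the actual codewords, because the VLC tables from the standard are not reproduced here. The function names are ours, and the table-selection thresholds reflect our reading of the H.264 rule for the case where both neighbouring blocks are available.

```python
def coeff_token_table(n_above, n_left):
    """Pick one of the four coeff_token tables from the neighbours' counts."""
    n_c = (n_above + n_left + 1) >> 1      # rounded average of the two counts
    if n_c < 2:
        return 0
    if n_c < 4:
        return 1
    if n_c < 8:
        return 2
    return 3                                # the fixed-length table

def cavlc_elements(scan):
    """Derive the CAVLC syntax-element values for one zig-zag-scanned block."""
    positions = [i for i, c in enumerate(scan) if c != 0]
    reversed_levels = [scan[i] for i in reversed(positions)]

    # Steps 1 and 2: count up to three trailing +/-1 values, keep their signs.
    trailing_ones = 0
    for level in reversed_levels:
        if abs(level) != 1 or trailing_ones == 3:
            break
        trailing_ones += 1
    signs = [0 if c > 0 else 1 for c in reversed_levels[:trailing_ones]]

    # Step 3: everything that is left, still in reverse scan order.
    levels = reversed_levels[trailing_ones:]

    # Step 4: zeros located before the last non-zero coefficient.
    total_zeros = positions[-1] + 1 - len(positions) if positions else 0

    # Step 5: zeros directly before each coefficient, last one first.
    # The first coefficient in scan order is implied, and coding stops
    # as soon as no zeros remain.
    run_before, zeros_left = [], total_zeros
    rev = positions[::-1]
    for later, earlier in zip(rev, rev[1:]):
        if zeros_left == 0:
            break
        run = later - earlier - 1
        run_before.append(run)
        zeros_left -= run

    return {
        "total_coeff": len(positions),
        "trailing_ones": trailing_ones,
        "trailing_signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "run_before": run_before,
    }

block = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8
print(cavlc_elements(block))
```

For the example block this yields total_coeff 5, trailing_ones 3 with signs 0, 1, 1, levels 1 then 3, total_zeros 3, and run_before values 1, 0, 0, 1.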

Add up the bits: a 5-coefficient 4×4 block typically codes in 22–28 bits. The cost is dominated by the levels of the larger coefficients; the run-coding overhead is small because the tables are tightly tuned.

The cleverness of CAVLC is that it uses the freshly-coded data as the context for the next step. The encoder does not need to consult global state, the decoder does not need to maintain it, and the whole block can be parsed in five short bursts of table lookups. That is why CAVLC was the default for hardware decoders in the 2003–2010 era — it parses in a few cycles per block and parallelises across blocks.

A flowchart with five sequential rectangles labelled coeff_token, trailing-ones signs, non-trailing levels, total_zeros, and run_before. Below each rectangle is a one-line note describing what gets emitted. Arrows connect the boxes left to right. A side panel lists the example coefficients and shows the bits emitted at each stage adding up to roughly 24 bits. A footer note reads 'CAVLC uses up to nine VLC tables; the choice depends on recent symbols.'

Figure 3. CAVLC codes a 4×4 block in five short steps. The choice of VLC table at each step depends on what the encoder just coded, which is what makes it "context adaptive".

How CABAC works — binary arithmetic coding with a brain

CABAC, short for Context-Adaptive Binary Arithmetic Coding, sits at the heart of H.264 High profile, all of HEVC, and all of VVC. It is the most studied entropy coder in modern video and it has three moving parts: binarisation, context modelling, and binary arithmetic coding. They run as a small pipeline on every symbol.

Binarisation turns every input symbol (a motion vector difference, a prediction mode, a coefficient value) into a string of bits called bins. Some bins are coded with a fixed bypass mode that costs exactly one bit each (used when the symbol is genuinely random, like the trailing magnitude bits of a large coefficient). Most bins go through the full arithmetic-coding path. The binarisation rule for each symbol is fixed by the standard: an Exp-Golomb-based scheme for motion vector differences, truncated unary for prediction modes, and so on. The point of binarisation is to reduce a many-valued alphabet (which would need a complicated arithmetic coder) to a two-valued alphabet (which needs only a simple one).
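Two of the binarisation families named above are easy to sketch. The Python below shows truncated unary and order-0 Exp-Golomb in its familiar zero-prefixed form, with the usual signed mapping. It illustrates the idea only: the bin strings each standard actually prescribes for a given syntax element differ in detail, and the function names are ours.

```python
def truncated_unary(value, c_max):
    """`value` ones, then a terminating zero that is dropped at c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in 0..c_max")
    return "1" * value + ("" if value == c_max else "0")

def exp_golomb(value):
    """Order-0 Exp-Golomb: leading zeros, then value + 1 in binary."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code

def signed_exp_golomb(value):
    """Map 0, 1, -1, 2, -2, ... onto 0, 1, 2, 3, 4, ... and then code it."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb(code_num)

print(truncated_unary(3, 5))    # 1110
print(truncated_unary(5, 5))    # 11111
print(exp_golomb(3))            # 00100
print(signed_exp_golomb(-2))    # 00101
```

Each character of the returned string is one bin; in CABAC every such bin is then either bypass-coded or handed to a context model.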

Context modelling picks a probability model for each bin based on its position in the symbol and on recently-coded neighbours. HEVC defines roughly 130 context models for the luma transform-coefficient bins alone, plus a few hundred more for prediction modes, motion data, and flags. Each context is a small piece of state — typically two numbers describing the current estimate of P(bin = 1). The encoder picks the right context for the current bin, reads its probability, and hands the (bin, probability) pair to the arithmetic engine. The decoder mirrors the choice. Picking the right context is what separates CABAC from generic arithmetic coding — a context that says "the bin we are about to code is the significance flag of a coefficient at diagonal position 3 in a 4×4 sub-block whose top-left neighbour has 2 non-zero coefficients" predicts P(bin = 1) far more accurately than a single global probability would.
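The value of a good context can be shown without a real codec. The sketch below measures the ideal code length (the sum of -log₂ p over all bins) for a run-heavy bin sequence, first with one global probability and then with a context chosen by the previous bin. The frequency-count estimator is a deliberate stand-in for CABAC's state machine, and the test sequence is synthetic.

```python
from math import log2

def ideal_cost_bits(bins, context_of):
    """Sum of -log2(p) using one adaptive frequency count per context."""
    counts = {}                      # context -> [zeros seen, ones seen]
    bits = 0.0
    for i, b in enumerate(bins):
        ctx = context_of(bins, i)
        n = counts.setdefault(ctx, [1, 1])    # start from a uniform prior
        bits += -log2(n[b] / (n[0] + n[1]))   # cost under current estimate
        n[b] += 1                              # adapt after coding the bin
    return bits

# Synthetic data: runs of eight equal bins, so neighbours are informative.
bins = ([0] * 8 + [1] * 8) * 8

global_cost = ideal_cost_bits(bins, lambda seq, i: 0)
neighbour_cost = ideal_cost_bits(bins, lambda seq, i: seq[i - 1] if i else 0)

print(f"single global model:  {global_cost:.1f} bits for {len(bins)} bins")
print(f"previous-bin context: {neighbour_cost:.1f} bits for {len(bins)} bins")
```

The global model sees zeros and ones equally often and spends about one bit per bin; the previous-bin context learns that a bin usually repeats its neighbour and spends far less.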

Binary arithmetic coding maintains a single interval — a low and a high value — that represents the message coded so far. When the encoder emits a bin, it splits the current interval in two pieces sized by the probability of 0 and 1, picks the piece corresponding to the actual bin, and shrinks the interval to that piece. As the interval shrinks past 0.5 in either direction, the leading bit is fixed and the encoder writes it to the bitstream, then doubles the interval and continues. A symbol with probability 0.9 shrinks the interval by 10%, which corresponds to writing only 0.15 bits on average — far less than the 1-bit minimum of a variable-length code. The probability estimate for each context is updated after every bin: a state machine moves "up" toward more confident predictions when the predicted bin arrives, and "down" otherwise. In H.264 / HEVC the state machine has 64 states; in VVC the estimator runs two parallel tracks tuned for different rate regimes.
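The interval-narrowing step can be sketched with exact fractions. This is an idealised model under stated assumptions: the probability of a zero bin is fixed rather than adaptive, and there is no renormalisation or bit output, only the final interval, whose width gives the code length. Real CABAC uses a table-driven integer engine instead.

```python
from fractions import Fraction
from math import log2

def encode(bins, p0):
    """Narrow [low, low + width) once per bin; p0 is P(bin = 0)."""
    low, width = Fraction(0), Fraction(1)
    for b in bins:
        split = width * p0           # size of the sub-interval for a 0 bin
        if b == 0:
            width = split
        else:
            low, width = low + split, width - split
    return low, width

def decode(code, count, p0):
    """Recover `count` bins from any value inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(count):
        split = width * p0
        if code < low + split:
            out.append(0)
            width = split
        else:
            out.append(1)
            low, width = low + split, width - split
    return out

p0 = Fraction(9, 10)
likely = [0] * 10
low, width = encode(likely, p0)
print(f"{-log2(float(width)):.2f} bits for ten likely bins")   # 1.52

mixed = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
low, width = encode(mixed, p0)
assert decode(low + width / 2, len(mixed), p0) == mixed
```

Ten bins that each had probability 0.9 cost about 1.52 bits in total, or 0.152 bits each, which is the fractional cost described above.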

The combined effect is striking. CABAC codes a typical H.264 stream with roughly 9–14% fewer bits than CAVLC would at the same visual quality, and HEVC's CABAC saves roughly a further 6–8% over the CABAC of H.264 High profile, because HEVC's context model is finer-grained. The throughput cost is real: CABAC decoders process bins sequentially, with each bin's context lookup depending on the bin before it, and that serial dependency is the main reason HEVC hardware decoders run at lower MPixels-per-second than CAVLC decoders of the same silicon area.

A horizontal pipeline with three rectangles labelled binarisation, context modelling, and binary arithmetic coding. The first box shows an input symbol (mv_diff = -7) being turned into a string of bins (1, 1, 1, 0). The middle box shows each bin tagged with a context model index (ctx 18, 19, 20, 21) drawn from a table of 'context models'. The right box shows the interval (0, 1) being shrunk by four operations down to (0.62, 0.65), with the bitstream pieces written underneath the interval line. A footer note reads 'CABAC = three stages: bin, context, range.'

Figure 4. CABAC's three-stage pipeline. Every symbol is first binarised, then each bin is tagged with a context, then the arithmetic engine shrinks an interval and writes bits when the interval crosses a halving boundary.

The cost of CABAC is its serial throughput. Every bin's context lookup depends on the bin before it; the context's state update depends on the actual bin value; and the arithmetic engine's interval update depends on both. A single bin requires roughly 10–20 cycles on a typical decoder, and a 4K HEVC stream carries millions of bins per second. The 2012 paper by Sze and Budagavi on HEVC throughput reports that the HEVC CABAC design reduces the maximum number of context-coded bins by 8× and the line-buffer memory by 20× relative to H.264 CABAC, precisely because the standard was tuned to make hardware decoders feasible at 4K.

AV1's range coder — non-binary, per-symbol adaptive

AV1's entropy stage looks different from CABAC at every level. The arithmetic engine is a range coder: an arithmetic-coding variant that operates on integer intervals rather than fractional ones, which makes it easier to implement in fixed-point hardware. The "binary" part is dropped: AV1 codes symbols of up to 16 alphabet values directly, without first binarising them. The probability estimator updates per symbol, not per frame. The moment the decoder reads a transform coefficient, the probability of the next coefficient is recomputed.
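A multi-symbol coder on integer intervals can be sketched compactly if Python's arbitrary-precision integers stand in for the renormalisation logic of a real range coder. The code below is that kind of simplification, not AV1's engine: frequencies are fixed, scaled to 32768, and all assumed non-zero, and the starting range is chosen large enough that every division is exact.

```python
TOTAL = 1 << 15          # frequencies are scaled so that they sum to 32768

def cumulative(freqs):
    """Running totals: cdf[s] is the start of symbol s's slice."""
    cdf = [0]
    for f in freqs:
        cdf.append(cdf[-1] + f)
    assert cdf[-1] == TOTAL
    return cdf

def encode(symbols, freqs):
    """One interval update per symbol, whatever the alphabet size."""
    cdf = cumulative(freqs)
    low, rng = 0, TOTAL ** len(symbols)
    for s in symbols:
        step = rng // TOTAL          # exact: rng is still a multiple of TOTAL
        low += step * cdf[s]
        rng = step * freqs[s]
    return low, rng

def decode(code, count, freqs):
    """Invert encode() given any integer in [low, low + rng)."""
    cdf = cumulative(freqs)
    low, rng = 0, TOTAL ** count
    out = []
    for _ in range(count):
        step = rng // TOTAL
        target = (code - low) // step
        s = max(i for i in range(len(freqs)) if cdf[i] <= target)
        out.append(s)
        low += step * cdf[s]
        rng = step * freqs[s]
    return out

freqs = [16384, 8192, 4096, 2048, 2048]
message = [2, 0, 0, 1, 4, 0, 3]
low, rng = encode(message, freqs)
assert decode(low, len(message), freqs) == message
print((TOTAL ** len(message) // rng).bit_length() - 1, "bits")   # 16 bits
```

Seven symbols take seven engine calls here; a binary coder would first expand them into a longer string of bins and make one call per bin.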

The arithmetic engine is borrowed from Daala, an experimental codec developed at Mozilla (later xiph.org) by Timothy Terriberry, Jean-Marc Valin, and others in the early 2010s. Daala had three innovations that AV1 adopted: a non-binary arithmetic coder that codes one of up to 16 symbols per call, per-symbol probability adaptation, and a multiplication-free implementation that uses shifts and table lookups. The combined effect, according to the 2021 Proceedings of the IEEE technical overview by Han et al., is roughly equivalent to coding four binary bins in parallel — which means an AV1 decoder can run the entropy engine at the same throughput as HEVC's CABAC while reading four times the alphabet.

Per-symbol adaptation is the second big change. In H.264 / HEVC, the probability state of each context is initialised at the start of each slice from a default table and then updates as the slice is decoded. AV1 keeps a similar model but updates the probability after every coded symbol, using a small fixed step size. The result is that a long, statistically homogeneous frame finishes the entropy coding very close to the empirical distribution of its own symbols — closer than CABAC, which is anchored to the slice-start initialisation table.

AV1's bitstream also supports multiple frame context tables, indexed by frame_context_idx. A typical encoder keeps four or eight contexts in memory and switches between them based on the previous frame's content, which helps when the codec is encoding a scene change or a temporal layer with sharply different statistics. The cost is in the decoder: hardware AV1 decoders must hold these context tables in fast on-chip memory and switch between them on cue, which is one of the reasons hardware AV1 decoders were slow to ship even after the spec stabilised.

A worked example. Suppose the encoder is coding a transform coefficient from an alphabet of {0, 1, 2, 3, 4}. The current probabilities, scaled to 32768 (AV1's fixed-point base), are {16384, 8192, 4096, 2048, 2048}. The encoder picks coefficient 2 (probability 4096/32768 = 12.5%). The arithmetic engine narrows the range to the slice corresponding to symbol 2, which costs 3 bits (-log₂ 0.125), and then nudges the probability table: P(2) climbs by about 100, the other entries shrink by about 25 each. The next coefficient in the same context starts from a slightly different distribution.
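The worked example can be reproduced with a simplified update rule. In the sketch below every other entry gives up a 1/256 share of its probability and the coded symbol collects the total; the rate shift of 8 is a hypothetical value picked so the steps land near the figures quoted above. AV1's actual update operates on cumulative distributions with a rate that varies by context, so treat this as an illustration of the principle only.

```python
from math import log2

TOTAL = 32768

def symbol_cost_bits(probs, sym):
    """Ideal cost of coding `sym` under the current table."""
    return -log2(probs[sym] / TOTAL)

def adapt(probs, sym, rate_shift=8):
    """Move probability mass toward `sym`; the table total is preserved."""
    shrink = [0 if i == sym else p >> rate_shift for i, p in enumerate(probs)]
    updated = [p - d for p, d in zip(probs, shrink)]
    updated[sym] += sum(shrink)
    return updated

probs = [16384, 8192, 4096, 2048, 2048]
print(symbol_cost_bits(probs, 2))        # 3.0
after = adapt(probs, 2)
print(after)                             # [16320, 8160, 4208, 2040, 2040]
print(round(symbol_cost_bits(after, 2), 3))
```

With this rule P(2) climbs by 112 and the others drop by 64, 32, 8, and 8, so a second occurrence of symbol 2 in the same context is already slightly cheaper than 3 bits.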

The compression gain over HEVC's CABAC depends entirely on what the encoder is doing with motion estimation, transform selection, and quantisation. In an apples-to-apples test reported in the AV1 overview paper, AV1's entropy stage contributes roughly 3–5% bitrate savings over HEVC's CABAC at the same quality. That is on top of the bigger savings from AV1's larger block sizes, more transform shapes, and adaptive quantisation — but it is a clean piece of evidence that the engine itself is materially more efficient.

A diagram comparing CABAC and AV1 range coder side by side. Left half labelled 'CABAC (H.264 / HEVC / VVC)': input symbol of value 7 turns into 4 bins; each bin queries one of ~250 context models; arithmetic engine processes 4 bins serially; output is roughly 2 bits of bitstream. Right half labelled 'AV1 range coder (Daala-derived)': input symbol of value 7 from an alphabet of 16 values; one of ~1000 context tables holds a 16-bin probability distribution; arithmetic engine processes 1 symbol; output is roughly 1.7 bits of bitstream; the probability table updates after this single call. A footer note reads 'AV1 codes the same information in fewer engine calls.'

Figure 5. CABAC vs AV1's range coder. CABAC processes binary bins through a context lookup; AV1 processes multi-valued symbols directly and adapts the table after every call.

A common pitfall — context model leakage across slices

The most common entropy-coding bug in a streaming pipeline is a probability state that survives a boundary it should not. Slice headers in HEVC, frame headers in AV1, and CABAC re-initialisation flags in H.264 all exist to re-seed the entropy engine at points where the decoder might join the stream mid-flight — at a tune-in event in live streaming, at a seek event in VOD, or at a random access point in a transport stream.

If the encoder forgets to re-seed, the decoder picks up the stream with the wrong probability state. The first few symbols decode wrong because the encoder's interval no longer matches the decoder's interval. The errors propagate through the entire slice, producing the famous "blocky garbage from one point to the end of the slice" effect that anyone who has worked on live HEVC has seen at least once.

The fix is a careful audit of the encoder's IDR/CRA/BLA frame production and a corresponding audit of the decoder's slice-header parsing. In AV1 the relevant flag is error_resilient_mode, which forces the decoder to use the default context tables instead of the inherited ones. In HEVC it is the slice's entry_point_offsets and the CABAC re-init at every slice. In H.264 it is cabac_init_idc at the slice header. Get those right and the entropy stage becomes invisible. Get them wrong and the stream looks broken in ways that are very hard to diagnose without bit-level tooling.
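The failure mode is easy to reproduce in miniature. The sketch below uses a toy adaptive binary arithmetic coder built on exact fractions (not any real codec's engine) and two "slices". A decoder that tunes in at the second slice with a fresh model recovers it only if the encoder also re-seeded its model at the boundary. All class and function names are illustrative.

```python
from fractions import Fraction

class Model:
    """Adaptive estimate of P(bin = 0) kept as running counts."""
    def __init__(self):
        self.counts = [1, 1]
    def p0(self):
        return Fraction(self.counts[0], sum(self.counts))
    def update(self, b):
        self.counts[b] += 1

def narrow(low, width, b, model):
    """Shared interval update, so encoder and decoder stay in lockstep."""
    split = width * model.p0()
    model.update(b)
    return (low, split) if b == 0 else (low + split, width - split)

def encode_slice(bins, model):
    low, width = Fraction(0), Fraction(1)
    for b in bins:
        low, width = narrow(low, width, b, model)
    return low + width / 2           # any value inside the final interval

def decode_slice(code, count, model):
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(count):
        b = 0 if code < low + width * model.p0() else 1
        low, width = narrow(low, width, b, model)
        out.append(b)
    return out

slice1 = [0] * 12
slice2 = [0, 1, 1, 0, 1, 0, 0, 1]

# Correct encoder: the model is re-seeded at the slice boundary.
good = encode_slice(slice2, Model())

# Buggy encoder: the state adapted on slice 1 leaks into slice 2.
leaky = Model()
encode_slice(slice1, leaky)
bad = encode_slice(slice2, leaky)

# A decoder joining at slice 2 always starts from the default state.
print(decode_slice(good, len(slice2), Model()) == slice2)   # True
print(decode_slice(bad, len(slice2), Model()) == slice2)    # False
```

In the leaky case the very first bin already decodes wrong, and every later decision is made against a mismatched interval, which is the miniature version of garbage running to the end of the slice.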

A comparative table

Across the five codecs that dominate 2026, the entropy stage tells a clear story of more compression at the cost of more decoder work.

| Codec (year) | Primary entropy coder | Alphabet | Context models | Adaptation | Bits saved vs. reference (entropy stage only) |
|---|---|---|---|---|---|
| MPEG-2 (1995) | Static Huffman + DPCM | Per-symbol VLC | None (fixed tables) | None | baseline |
| H.264 / AVC Baseline (2003) | CAVLC | Per-symbol VLC | 5 tables | Table choice from neighbour state | ~5% over MPEG-2 |
| H.264 / AVC High (2003) | CABAC | Binary bins | ~460 models | 64-state estimator, per-slice | ~9–14% over CAVLC |
| HEVC (2013) | CABAC | Binary bins | ~500 models, finer template | 64-state estimator, per-slice | ~6–8% over H.264 CABAC |
| AV1 (2018) | Range coder (Daala) | Up to 16 per symbol | ~1000 tables | Per-symbol, multiple frame contexts | ~3–5% over HEVC CABAC |
| VVC (2020) | CABAC | Binary bins | ~700 models | Multi-rate estimator, dependent-Q context | ~2% over HEVC CABAC |

Two observations land from this table. First, the gains have been getting smaller: the easy bits were extracted in 2003, and every codec since has chased single-digit percentages. Second, only AV1 broke the binary-arithmetic-coding mould. VVC stayed with binary CABAC because its committee judged that the marginal compression gain of going non-binary did not justify the hardware-design disruption. Both choices are defensible; both are now in production.

Where Fora Soft fits in

We build the conferencing, OTT, surveillance, e-learning, and telemedicine pipelines around H.264, HEVC, and increasingly AV1, and we read entropy-coded bitstreams on every project. When a customer's live HEVC feed shows blocky garbage that starts at the same point in every GOP, the first place we look is the CABAC re-initialisation at the slice header. When a customer's AV1 file decodes correctly in one player and falls apart in another, the trail usually leads to frame_context_idx and a player that has not implemented the full set of probability tables. Twenty years of bitstream debugging across H.263, MPEG-2, MPEG-4 Part 2, H.264, HEVC, VP8, VP9, and AV1 have taught us that the entropy stage is the last place a bug should live and the first place a debugger should look.

References

  1. Shannon, C. E. "A Mathematical Theory of Communication." Bell System Technical Journal, 27, 379–423, 1948. The foundational paper that defines entropy and bounds the achievable compression of any lossless code.
  2. Huffman, D. A. "A Method for the Construction of Minimum-Redundancy Codes." Proceedings of the IRE, 40(9), 1098–1101, 1952. The original algorithm for optimal prefix codes that underlies every variable-length entropy coder.
  3. Rissanen, J. J. "Generalized Kraft Inequality and Arithmetic Coding." IBM Journal of Research and Development, 20(3), 198–203, 1976. One of the two independent inventions of practical arithmetic coding.
  4. Marpe, D., Schwarz, H., and Wiegand, T. "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard." IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 620–636, 2003. The original CABAC paper.
  5. Richardson, I. E. G. "H.264/AVC Context Adaptive Variable Length Coding (CAVLC)." Vcodex BV technical note. https://www.vcodex.com/h264avc-context-adaptive-variable-length-coding
  6. Sze, V., and Budagavi, M. "High Throughput CABAC Entropy Coding in HEVC." IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1778–1791, 2012. https://dspace.mit.edu/bitstream/handle/1721.1/100315/hevc_cabac_chapter.pdf
  7. Han, J., Li, B., Mukherjee, D., et al. "A Technical Overview of AV1." Proceedings of the IEEE, 109(9), 1435–1462, 2021. https://arxiv.org/pdf/2008.06091
  8. Valin, J.-M., Terriberry, T. B., Egge, N., et al. "Daala: Building A Next-Generation Video Codec From Unconventional Technology." IEEE International Workshop on Multimedia Signal Processing, 2016. https://arxiv.org/pdf/1608.01947
  9. Schwarz, H., Coban, M., Karczewicz, M., et al. "Quantization and Entropy Coding in the VVC Standard." IEEE Transactions on Circuits and Systems for Video Technology, 31(10), 3891–3906, 2021. http://www.ecodis.de/video/schwarz_tcsvt_vvc_rev2.pdf
  10. Im, S.-K., et al. "Dynamic estimator selection for double-bit-range estimation in VVC CABAC entropy coding." IET Image Processing, 2024. https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/ipr2.12980
  11. ITU-T Recommendation H.264 | ISO/IEC 14496-10. Advanced Video Coding for Generic Audiovisual Services, edition 14, 2023. https://www.itu.int/rec/T-REC-H.264
  12. ITU-T Recommendation H.265 | ISO/IEC 23008-2. High Efficiency Video Coding, edition 8, 2024. https://www.itu.int/rec/T-REC-H.265
  13. AOMedia. AV1 Bitstream & Decoding Process Specification, version 1.0.0 errata 1, January 2019. https://aomediacodec.github.io/av1-spec/av1-spec.pdf