Arithmetic coding

Arithmetic coding is a lossless entropy-coding technique that turns a stream of symbols into a bit sequence whose length comes very close to the entropy lower bound. Unlike Huffman coding, which assigns a whole number of bits to each symbol, arithmetic coding can spend a fractional number of bits on a symbol, so it does not lose space rounding every codeword up to the next whole bit. A symbol with probability p ideally costs -log2(p) bits, which is well under one bit when p is close to 1. Over the very large number of symbols in a typical video frame, the savings add up to several percent.
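To make the fractional-bit point concrete, here is a minimal sketch comparing the ideal cost of a symbol with the one-bit floor of a prefix code such as Huffman. The two-symbol source and its probabilities (0.95 and 0.05) are illustrative assumptions, not figures from any codec.

```python
import math

def ideal_bits(p):
    """Information content, in bits, of a symbol with probability p."""
    return -math.log2(p)

def entropy(probs):
    """Average ideal cost per symbol for a memoryless source."""
    return sum(p * ideal_bits(p) for p in probs if p > 0)

# Hypothetical, heavily skewed binary source (assumed for illustration).
probs = [0.95, 0.05]

h = entropy(probs)

# A prefix code on a two-symbol alphabet cannot do better than one
# whole bit per symbol, however skewed the probabilities are.
huffman_bits = 1.0

if __name__ == "__main__":
    print(f"ideal cost of the frequent symbol: {ideal_bits(probs[0]):.3f} bits")
    print(f"ideal cost of the rare symbol:     {ideal_bits(probs[1]):.3f} bits")
    print(f"entropy (arithmetic coding target): {h:.3f} bits/symbol")
    print(f"Huffman on single symbols:          {huffman_bits:.3f} bits/symbol")
```

For a source this skewed, the gap between the entropy and one bit per symbol is large. On less skewed sources, or when Huffman codes are built over blocks of symbols, the gap is smaller.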

The intuition: arithmetic coding represents the entire input message as a single number in the interval from 0 to 1, and each symbol narrows the current range to a sub-range whose width is proportional to that symbol's probability. Frequent symbols barely shrink the range (costing almost no bits); rare symbols shrink it dramatically (costing many bits). The final number is written out in just enough binary digits to uniquely identify the chosen range. Practical coders do not use floating-point arithmetic for this; they track the range with fixed-precision integers and emit bits as soon as they are settled. Combined with context-adaptive probability estimates that update as the encoder scans the frame, this gives coding that adapts to local statistics as it goes.
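The interval narrowing described above can be sketched in a few lines. This is a teaching sketch under stated assumptions, not how a production codec works: it uses exact fractions instead of fixed-precision integers, a static (non-adaptive) model, and a hypothetical two-symbol alphabet. The names build_intervals, encode and decode are chosen here for illustration.

```python
from fractions import Fraction
import math

def build_intervals(probs):
    """Give each symbol its own [low, high) slice of [0, 1)."""
    intervals, low = {}, Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def encode(message, probs):
    """Narrow [0, 1) once per symbol and return the final (low, high)."""
    intervals = build_intervals(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = intervals[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high

def decode(value, length, probs):
    """Recover `length` symbols from any number inside the final range."""
    intervals = build_intervals(probs)
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        width = high - low
        scaled = (value - low) / width  # position within the current range
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= scaled < s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(out)

# Assumed model: 'a' is frequent, 'b' is rare.
probs = {"a": Fraction(9, 10), "b": Fraction(1, 10)}
low, high = encode("aaab", probs)
width = high - low

if __name__ == "__main__":
    print("final range:", low, "to", high)
    print("ideal cost in bits:", -math.log2(float(width)))
    print("decoded:", decode((low + high) / 2, 4, probs))
```

Each 'a' multiplies the width of the range by 9/10, while the single 'b' multiplies it by 1/10, so the rare symbol accounts for most of the final cost. The decoder is told the message length here; real formats signal the end of data in other ways.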

For a product team, arithmetic coding is the mathematical backbone of CABAC (H.264, HEVC) and of the Daala-derived variant daala_bool (AV1), which codes multi-symbol alphabets rather than only binary decisions. In H.264, it is typically credited with a bitrate saving on the order of 10–15 % compared with the variable-length alternative, CAVLC. The cost is computational: arithmetic coding is inherently serial (each symbol depends on the running state left by the previous one), so it is hard to parallelise and is often the bottleneck in low-power decoders. For an executive summary: arithmetic coding is a large part of why the same video at the same visual quality is meaningfully smaller in H.264 Main profile than in Baseline profile. Baseline is limited to CAVLC, while Main can use CABAC, although Main also adds other tools such as B-frames, so the entropy coder is not the whole difference.

Related terms

CABAC

Huffman coding

Entropy coding