Transform coding is the step that turns the prediction residual into a form that compresses well. After the encoder predicts each block of pixels and subtracts that prediction from the actual pixels, the small leftover residual is run through a transform, usually the DCT (discrete cosine transform) or one of its relatives, to re-express it in the frequency domain. The point: instead of dozens of small numbers scattered across the block, you get a few significant numbers (the low-frequency coefficients) plus a tail of values close to zero. The transform itself saves no bits; it creates the structure that the later stages, quantization and entropy coding, can squeeze hard.
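To make the "few important numbers plus a tail near zero" claim concrete, here is a minimal sketch in plain Python. It assumes an 8x8 block, an orthonormal DCT-II, and a made-up smooth residual; the helper names (`dct_matrix`, `dct2`, `idct2`) are illustrative and not taken from any codec.

```python
import math

N = 8  # illustrative block size; real codecs use several sizes


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix (row k = k-th cosine basis)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain-Python matrix product, adequate for tiny blocks."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """Separable 2D DCT: transform the columns, then the rows."""
    d = dct_matrix(len(block))
    return matmul(matmul(d, block), transpose(d))


def idct2(coeffs):
    """Inverse 2D DCT; the matrix is orthonormal, so its inverse is its transpose."""
    d = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(d), coeffs), d)


# Made-up smooth residual (a gentle ramp), standing in for real prediction error.
residual = [[2 + 0.5 * x + 0.25 * y for x in range(N)] for y in range(N)]
coeffs = dct2(residual)

# Share of total energy held by the four lowest-frequency coefficients.
total_energy = sum(c * c for row in coeffs for c in row)
low_energy = sum(coeffs[v][u] ** 2 for v in range(2) for u in range(2))
low_freq_share = low_energy / total_energy
print(f"energy share in the 2x2 low-frequency corner: {low_freq_share:.4f}")
```

For smooth input like this, nearly all of the energy lands in the top-left corner of the coefficient block. The transform is also lossless on its own: `idct2(dct2(residual))` recovers the residual up to floating-point error.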

The intuition. Natural visual content (faces, landscapes, lighting) is dominated by smooth low-frequency variation, the broad shape of things, and carries comparatively little energy in high-frequency detail such as fine textures and sharp edges. The transform separates the two: each output coefficient measures how much of one spatial frequency the block contains. Once they are separated, the encoder can keep the important low-frequency coefficients almost intact and quantize the high-frequency ones coarsely or discard them, exploiting the fact that the human eye is far less sensitive to the loss of fine high-frequency detail. Modern codecs offer a menu of transforms (AV1, for example, includes DCT, ADST, identity, and flipped variants), and the encoder picks whichever one best matches each block, typically by comparing the rate-distortion cost of the candidates.
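The keep-the-low, discard-the-high idea can be sketched as frequency-dependent quantization. The example below is a toy under stated assumptions: an 8x8 orthonormal DCT-II, a made-up block (smooth ramp plus a faint checkerboard texture), and a hypothetical `step_size` rule that grows with frequency. Real codecs use their own quantization matrices and rate control.

```python
import math

N = 8  # illustrative block size


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def step_size(u, v):
    """Hypothetical quantizer step: coarser as the frequency index u + v rises."""
    return 1 + (u + v)


D = dct_matrix(N)
DT = transpose(D)

# Made-up block: smooth ramp (low frequency) plus faint checkerboard (high frequency).
block = [[4 * x + 2 * y + 0.5 * (-1) ** (x + y) for x in range(N)]
         for y in range(N)]

coeffs = matmul(matmul(D, block), DT)  # forward 2D DCT

# Quantize: divide by the step and round. Small high-frequency values become 0.
levels = [[round(coeffs[v][u] / step_size(u, v)) for u in range(N)]
          for v in range(N)]

# Dequantize and invert, as a decoder would.
dequant = [[levels[v][u] * step_size(u, v) for u in range(N)] for v in range(N)]
recon = matmul(matmul(DT, dequant), D)

nonzero = sum(1 for row in levels for q in row if q != 0)
rmse = math.sqrt(sum((block[y][x] - recon[y][x]) ** 2
                     for y in range(N) for x in range(N)) / (N * N))
print(f"nonzero levels: {nonzero} of {N * N}, reconstruction RMSE: {rmse:.3f}")
```

Only a handful of quantized levels survive, and the checkerboard texture is the part that gets thrown away. Because the transform is orthonormal, the pixel-domain error can never exceed half the largest step size, which is why coarse steps are reserved for the frequencies the eye notices least.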

For a product team, transform coding is invisible plumbing that has been at the heart of image and video compression since early standards such as H.261 for video and JPEG (1992) for still images. There is nothing to tune directly. It is a large part of why a 4K HDR file at 15 Mbps can look essentially identical to its uncompressed master at roughly 6 Gbps: the transform concentrates the visually important information into a small number of coefficients, which quantization and entropy coding then represent in very few bits. Modern codecs (HEVC, AV1, VVC) keep refining which transforms are available and how the encoder chooses among them, and each refinement contributes a few more percent of bitrate savings.
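The 6 Gbps figure can be sanity-checked with back-of-envelope arithmetic. The parameters below are assumptions, not stated in the text: 3840x2160 resolution, 10 bits per sample, three samples per pixel (no chroma subsampling), and 24 frames per second.

```python
# Assumed master format (not specified in the text above).
width, height = 3840, 2160
bits_per_sample = 10       # typical for HDR
samples_per_pixel = 3      # full-resolution colour, i.e. 4:4:4
frames_per_second = 24

uncompressed_bps = (width * height * bits_per_sample
                    * samples_per_pixel * frames_per_second)
compressed_bps = 15_000_000  # the 15 Mbps delivery bitrate

ratio = uncompressed_bps / compressed_bps
print(f"uncompressed: {uncompressed_bps / 1e9:.2f} Gbps")  # 5.97 Gbps
print(f"compression ratio: about {ratio:.0f}:1")           # about 398:1
```

Under these assumptions the whole encoding chain achieves a ratio of roughly 400:1. Transform coding is one contributor alongside prediction, quantization, and entropy coding.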