The DCT (Discrete Cosine Transform) is the mathematical step at the heart of almost every lossy image and video codec. JPEG uses it, MPEG-2 used it, H.264 uses a close integer variant, and HEVC and AV1 use refined versions. Intuitively, it takes a small block of pixels (say, 8×8) and re-describes it not as 64 individual values, but as a weighted sum of 64 standard cosine patterns, ranging from flat and smooth to finely rippled.
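
To make the "sum of patterns" idea concrete, here is a minimal sketch in plain Python, assuming the orthonormal DCT-II (the variant JPEG's 8×8 transform is based on). The helper names `dct_matrix`, `dct2` and `idct2` are illustrative, and real codecs use fast integer implementations rather than matrix products.

```python
import math

N = 8  # block size used by JPEG; newer codecs support several sizes

def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row k is the cosine pattern of frequency k."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """Forward 2D DCT: C * block * C^T (columns first, then rows)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2D DCT: C^T * coeffs * C, valid because C is orthogonal."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

# Made-up test block: a horizontal brightness ramp, identical on every row.
ramp = [[16.0 * x for x in range(N)] for _ in range(N)]

coeffs = dct2(ramp)      # 64 pattern weights instead of 64 pixel values
restored = idct2(coeffs)  # the transform is invertible

# Print the first two rows of weights; only the first row is non-trivial.
for row in coeffs[:2]:
    print(" ".join(f"{c:8.2f}" for c in row))
```

Because the ramp does not change vertically, every weight outside the first row comes out as zero (up to floating-point rounding), and `idct2` recovers the original pixels: the DCT only re-describes the block, it does not yet discard anything.
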
Why this is useful: for most natural images, the first few patterns (the broad, smooth ones) carry the bulk of the information, while the high-frequency patterns (fine textures, sharp edges) contribute much less. So after a DCT, you get a few important numbers and many tiny ones close to zero, which is perfect for compression. The transform itself discards nothing; the savings come from the next step, where you quantize the tiny coefficients aggressively, often all the way to zero (the eye won't notice), and keep the big ones at higher precision. This can cut the number of values that still carry information by 5–10× before any entropy coding even starts, and the long runs of zeros are exactly what the entropy coder compresses best.
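
The effect can be checked on a toy block. The sketch below assumes an orthonormal 2D DCT-II written out directly from its definition, a made-up smooth gradient block, and a single uniform step size `STEP`; none of these values come from a real codec.

```python
import math

N = 8

def dct2(block):
    """Orthonormal 2D DCT-II of an N x N block (direct, unoptimized formula)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    def basis(k, x):
        return math.cos(math.pi * (2 * x + 1) * k / (2 * N))

    return [[scale(u) * scale(v) * sum(block[y][x] * basis(u, y) * basis(v, x)
                                       for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]

# Hypothetical smooth block: a gentle diagonal gradient, as in sky or skin.
block = [[100.0 + 4 * x + 2 * y for x in range(N)] for y in range(N)]
coeffs = dct2(block)

# Energy compaction: share of squared magnitude in the 2x2 low-frequency corner.
total = sum(value * value for row in coeffs for value in row)
low = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
print(f"energy in 4 of 64 coefficients: {low / total:.4%}")

# Uniform quantization with an assumed step: divide, round, keep the integers.
STEP = 16
quantized = [[round(value / STEP) for value in row] for row in coeffs]
nonzero = sum(1 for row in quantized for q in row if q != 0)
print(f"nonzero coefficients after quantization: {nonzero} of {N * N}")
```

On a block this smooth, nearly all of the energy sits in the low-frequency corner (dominated by the average-brightness term) and only a handful of quantized coefficients remain nonzero. Textured or noisy blocks compact far less well, which is why the achievable reduction depends on content.
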
That last point (the eye won't notice) is the whole reason DCT-based codecs feel "right". Human vision is much more sensitive to low-frequency information (the overall shape of a face) than to high-frequency information (individual pores). The DCT cleanly separates the two, letting the encoder spend bits on what the viewer perceives and save them on what the viewer doesn't. Newer codecs such as AV1, HEVC and VVC use integer approximations of the DCT and add complementary sine-based transforms (ADST in AV1, DST variants in HEVC and VVC) to handle blocks where a pure DCT isn't optimal, but the core principle has been the same since JPEG in 1992.
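
One common way to turn this perceptual asymmetry into bit savings is to quantize each frequency with its own step size, coarser for higher frequencies. The sketch below assumes a purely hypothetical step rule, `step_for`, that grows linearly with the frequency index; real codecs ship tuned tables or let the encoder choose them, and these numbers are not taken from any standard.

```python
N = 8

def step_for(u, v, base=8, slope=6):
    """Hypothetical quantization step that grows with the frequency index u + v."""
    return base + slope * (u + v)

def quantize(coeffs):
    """Divide each coefficient by its own step and round to an integer level."""
    return [[round(coeffs[u][v] / step_for(u, v)) for v in range(N)]
            for u in range(N)]

def dequantize(levels):
    """Decoder side: multiply each level back by the same step."""
    return [[levels[u][v] * step_for(u, v) for v in range(N)]
            for u in range(N)]

# Give every frequency the same amplitude to isolate the effect of the steps.
coeffs = [[30.0] * N for _ in range(N)]

levels = quantize(coeffs)
restored = dequantize(levels)
error = [[abs(restored[u][v] - coeffs[u][v]) for v in range(N)]
         for u in range(N)]

print("lowest frequency :", levels[0][0], "error", error[0][0])
print("highest frequency:", levels[7][7], "error", error[7][7])
```

With identical input amplitudes, the lowest frequency is reproduced almost exactly while the highest one is dropped to zero: the error is deliberately pushed into the part of the spectrum where viewers are least likely to see it.
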

