Spatial redundancy is the simple fact that pixels next to each other usually look alike. A patch of blue sky is mostly the same blue; a brick wall is mostly the same shade of brown; a face has smooth regions where neighbouring pixels barely differ. Storing every pixel individually would be wasteful — you'd be writing the same number over and over.

Video codecs exploit this in two main ways within each frame. First, intra-prediction: the encoder predicts each block of pixels from the already-coded pixels just above and to the left of it, then stores only the small difference (the residual) between the prediction and the actual content. Second, the DCT (discrete cosine transform): instead of storing the values directly, the encoder stores how they vary across the block. A flat region becomes one strong number plus many near-zero ones, which compresses to almost nothing.
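To make both ideas concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular codec is implemented: the pixel values, the block sizes, and the helper names (dct_2d, predict_from_left, residual) are all made up for this example, and real encoders choose among many prediction modes and use fast integer transforms.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        # Normalisation factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    def basis(i, k):
        # Cosine basis function k sampled at pixel position i.
        return math.cos((2 * i + 1) * k * math.pi / (2 * n))

    return [
        [
            scale(u) * scale(v) * sum(
                block[r][c] * basis(r, u) * basis(c, v)
                for r in range(n)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def predict_from_left(left_column, width):
    """Horizontal intra-prediction: repeat each left-neighbour pixel across its row."""
    return [[pixel] * width for pixel in left_column]


def residual(block, prediction):
    """Per-pixel difference between the actual block and its prediction."""
    return [
        [actual - guess for actual, guess in zip(row, pred_row)]
        for row, pred_row in zip(block, prediction)
    ]


# A flat 8x8 patch of "sky": every pixel has the same (hypothetical) value.
flat_block = [[120] * 8 for _ in range(8)]
flat_coeffs = dct_2d(flat_block)
near_zero = sum(1 for row in flat_coeffs for c in row if abs(c) < 1e-6)
print(round(flat_coeffs[0][0], 3), near_zero)

# A nearly flat 4x4 block whose already-coded left neighbours are all 120.
block = [
    [121, 120, 119, 120],
    [120, 120, 121, 120],
    [119, 120, 120, 121],
    [120, 121, 120, 120],
]
left_neighbours = [120, 120, 120, 120]
res = residual(block, predict_from_left(left_neighbours, 4))
largest_residual = max(abs(v) for row in res for v in row)
print(largest_residual)
```

For the flat patch, the transform yields a single large coefficient (about 960 here) and 63 coefficients that are effectively zero. For the nearly flat block, no residual value is larger than 1 in magnitude, so the encoder only has to store tiny numbers instead of sixteen values near 120.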

This is half of why video files can be so small. The other half is temporal redundancy: the fact that consecutive frames also look alike. Spatial redundancy is what lets a single still image compress (JPEG works on the same principle); temporal redundancy is what lets video compress dramatically more than a sequence of JPEGs. For a business reader: it's not magic, it's just the encoder noticing patterns the human eye and brain take for granted.