Temporal redundancy is the fact that consecutive frames in a video look almost the same. A talking head moves slightly, the background doesn't move at all, a soccer player runs across the field but the grass behind them stays unchanged. Frame after frame, most of the picture is repeated information. Exploiting that is the single biggest reason video compresses so much better than just storing thousands of still images.
Codecs exploit this through inter prediction and motion estimation. Instead of storing the full content of each frame, the encoder describes it as a small change relative to one or two earlier frames (or, with B-frames, later ones as well). For most blocks it just says "look at the previous frame, take this 16×16 patch, shift it 3 pixels right and 2 pixels down, and copy it here". That instruction takes a handful of bits (plus a small correction, the residual, wherever the copy is not exact) instead of the thousands it would take to store the same patch from scratch. The encoder stores a full picture (an I-frame) only once every few seconds; everything in between is described as a delta.
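To make the "shift this patch and copy it" idea concrete, here is a minimal sketch of block matching in plain Python. It is an illustration under stated assumptions, not how any particular codec does it: frames are lists of rows of 8-bit grayscale values, the search is exhaustive over a small window, and the cost is the sum of absolute differences (SAD). The names `sad` and `best_motion_vector` are made up for this example; real encoders use much faster searches, sub-pixel precision, and rate-aware cost functions.

```python
import random


def sad(prev, cur, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `prev` at (bx + dx, by + dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(prev, cur, bx, by, n=16, search=4):
    """Exhaustive search over a +/- `search` pixel window.

    Returns (dx, dy, cost): the offset into the previous frame whose
    block best matches the current block, and the SAD of that match.
    """
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the previous frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(prev, cur, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Hypothetical test data: a random 48x48 frame, and a second frame whose
# content has moved 3 pixels right and 2 pixels down (wrapping at the edges).
random.seed(0)
SIZE = 48
prev = [[random.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]
cur = [[prev[(y - 2) % SIZE][(x - 3) % SIZE] for x in range(SIZE)]
       for y in range(SIZE)]

# The vector points from the current block back to its source in the
# previous frame, so content that moved right/down yields negative offsets.
mv = best_motion_vector(prev, cur, bx=16, by=16)
print(mv)  # (-3, -2, 0)
```

The match here is perfect (cost 0) because the second frame is an exact shifted copy. In that case the block is fully described by two small integers, whereas the raw 16×16 patch at 8 bits per pixel would take 2,048 bits. With real footage the best match is rarely exact, and the encoder also codes the leftover difference.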
The implications for product are direct. Static content compresses extremely well: a talking-head webinar can sit comfortably at 1 Mbps in 1080p. Motion-heavy content compresses badly: a snowstorm, a sparkling water surface, or confetti falling on a stage all destroy temporal redundancy and need 3–5× the bitrate to look acceptable. This is why sports streaming is consistently more expensive per viewer than scripted drama: with fast action and constant camera movement, far less of each frame matches the one before.
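As a rough back-of-the-envelope check on what those bitrates mean for delivery volume, the sketch below converts a bitrate into data delivered per viewer-hour. It assumes a constant video bitrate and decimal units (1 Mbps = 1,000,000 bits per second, 1 GB = 1,000,000,000 bytes), and it ignores audio, container overhead, and adaptive-bitrate switching. The function name `gigabytes_per_hour` is made up for this example.

```python
def gigabytes_per_hour(mbps):
    """Data delivered in one hour at a constant bitrate, in decimal GB."""
    bits = mbps * 1_000_000 * 3600   # bits sent in 3600 seconds
    return bits / 8 / 1_000_000_000  # bits -> bytes -> gigabytes


talking_head_mbps = 1.0  # the 1080p webinar figure from the text
print(f"static:  {gigabytes_per_hour(talking_head_mbps):.2f} GB/hour")

# The 3-5x range quoted for motion-heavy content.
for factor in (3, 5):
    gb = gigabytes_per_hour(talking_head_mbps * factor)
    print(f"{factor}x rate: {gb:.2f} GB/hour")
```

At 1 Mbps a viewer pulls about 0.45 GB per hour; at 3–5× that rate the same hour costs roughly 1.35 to 2.25 GB, and delivery volume scales linearly with the number of viewers.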

