Inter-prediction is the trick that lets video compress dramatically better than a sequence of still images: instead of coding each frame from scratch, code it as a difference from earlier (and sometimes later) frames. In most natural video, the change from one frame to the next is small — a face moves a few pixels, a background stays still, a car shifts slightly to the left. Inter-prediction expresses that change in a few bytes and skips re-describing the unchanged parts entirely.
Mechanically, the encoder splits each frame into small blocks and, for each block, searches a reference frame for the patch that looks most similar. It records a motion vector (in effect, "move this block 3 pixels right and 2 down from where it was in the previous frame") plus a small residual to correct anything the move didn't capture exactly. That's it: an entire moving face might become a few dozen motion vectors plus minor fixups, instead of every pixel stored explicitly.
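The block search described above can be sketched in a few lines of Python. This is a toy full-search matcher under stated assumptions, not how any particular encoder implements it: frames are plain lists of integer luma rows, similarity is the sum of absolute differences (SAD), and the function names, block size, and search range are all illustrative choices. Note that the vector is stored as the offset from the current block to its match in the reference, so content that moved 3 right and 2 down shows up as (-3, -2).

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n=4, search=3):
    """Try every offset within +/- `search` pixels; return ((dx, dy), cost)."""
    h, w = len(ref), len(ref[0])
    best = ((0, 0), sad(cur, ref, bx, by, 0, 0, n))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            inside_x = 0 <= bx + dx and bx + dx + n <= w
            inside_y = 0 <= by + dy and by + dy + n <= h
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best[1]:
                best = ((dx, dy), cost)
    return best


def residual(cur, ref, bx, by, mv, n=4):
    """Per-pixel correction left over after applying the motion vector."""
    dx, dy = mv
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]


# Synthetic 12 x 12 reference frame, and a current frame whose content has
# moved 3 pixels right and 2 pixels down (uncovered pixels are set to 0).
ref_frame = [[7 * x + 13 * y for x in range(12)] for y in range(12)]
cur_frame = [
    [ref_frame[y - 2][x - 3] if y >= 2 and x >= 3 else 0 for x in range(12)]
    for y in range(12)
]

mv, cost = best_motion_vector(cur_frame, ref_frame, bx=5, by=5)
res = residual(cur_frame, ref_frame, 5, 5, mv)
print(mv, cost)
```

On this synthetic input the search finds an exact match, so the residual block is all zeros. Production encoders avoid the exhaustive scan with faster search patterns and choose vectors by rate-distortion cost rather than raw SAD.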
Inter-prediction is the single biggest reason video files are so much smaller than the equivalent stack of JPEGs. Its limit is content where temporal redundancy breaks down: scene cuts, sparkles, water, snow, fast camera pans. In those cases prediction from the reference saves little: the encoder spends more bits on the residual, codes the affected blocks with intra-prediction, or, at a scene cut, inserts an I-frame, paying the storage cost but starting fresh. Modern codecs (HEVC, AV1, VVC) extend the basic idea with multiple reference frames, sub-pixel-precision motion vectors, and bidirectional (B-frame) prediction.
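The fallback to an I-frame at a scene cut can be illustrated with a simple heuristic. The sketch below is an assumption-laden toy, not any codec's actual decision logic: it compares two frames by mean absolute pixel difference and picks a frame type against a hypothetical threshold (`cut_threshold`, chosen arbitrarily here). Real encoders typically base this choice on estimated coding cost and lookahead rather than a raw pixel difference.

```python
def mean_abs_diff(a, b):
    """Average absolute per-pixel difference between two equal-sized frames."""
    h, w = len(a), len(a[0])
    total = sum(abs(a[y][x] - b[y][x]) for y in range(h) for x in range(w))
    return total / (h * w)


def choose_frame_type(prev, cur, cut_threshold=40.0):
    """Return "I" when the frames differ so much that prediction is unlikely
    to pay off, otherwise "P". The threshold is a hypothetical tuning value."""
    return "I" if mean_abs_diff(prev, cur) > cut_threshold else "P"


# Three synthetic 8 x 8 frames: a flat grey frame, the same frame with one
# pixel changed, and a much brighter frame standing in for a scene cut.
still = [[100] * 8 for _ in range(8)]
nudged = [row[:] for row in still]
nudged[3][4] = 110
cut = [[200] * 8 for _ in range(8)]

small_change = choose_frame_type(still, nudged)
scene_cut = choose_frame_type(still, cut)
print(small_change, scene_cut)
```

Here the nearly identical pair is kept as a predicted frame, while the abrupt change is coded as a fresh I-frame.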

