Prediction is the encoder's best guess at what each block of pixels should look like, made before the block itself is coded. It comes in two flavours: intra-prediction guesses from the already-decoded pixels just above and to the left in the same frame; inter-prediction guesses from previously decoded frames, using motion vectors to point at the matching region. The encoder then subtracts the guess from the real content to get a small residual, and only the residual is transformed, quantized and stored. The decoder can form the same guess from data it has already decoded, so adding the residual back recovers the block.
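The predict, subtract, reconstruct loop can be sketched in a few lines of Python. This is a toy illustration, not any codec's actual algorithm: the helper names (`predict_horizontal`, `residual`, `reconstruct`) and the 4x4 sample values are made up for the example, the predictor is a simple "copy the left neighbour across the row" intra mode, and the transform and quantization steps are left out so the round trip is lossless.

```python
def predict_horizontal(left_column, width):
    """Toy intra 'horizontal' mode: copy each left neighbour across its row."""
    return [[left] * width for left in left_column]


def residual(block, prediction):
    """What the encoder actually codes: real pixels minus the guess."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(block, prediction)]


def reconstruct(prediction, resid):
    """What the decoder does: the same guess plus the transmitted residual."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prediction, resid)]


# Already-decoded pixels immediately to the left of the block (hypothetical values).
left_column = [100, 102, 104, 106]

# The real content of the 4x4 block being coded (hypothetical values).
block = [
    [100, 100, 101, 101],
    [102, 102, 103, 103],
    [104, 105, 105, 105],
    [106, 106, 106, 107],
]

prediction = predict_horizontal(left_column, width=4)
resid = residual(block, prediction)
decoded = reconstruct(prediction, resid)

print(resid)             # small values near zero instead of values near 100
print(decoded == block)  # the decoder recovers the block exactly
```

The point of the sketch is the size of the numbers: the block holds values around 100, while the residual holds only zeros and ones, which is far cheaper to code.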
Prediction is the single biggest reason modern video compresses well. A good prediction means the residual is mostly zeros, which compress to almost nothing in the entropy stage. A bad prediction means the residual is essentially the original content, which compresses poorly. Codec generations measure their progress largely by how good their predictions are: better directional modes, more reference frames, sub-pixel-precision motion vectors, multiple-hypothesis prediction (combining two guesses), and increasingly sophisticated intra-prediction modes (AV1 offers 56 directional intra angles; H.264 had 9 intra modes for 4×4 luma blocks, only 8 of them directional).
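The gap between a good and a bad prediction is easy to see in a toy experiment. The sketch below rests on several assumptions that are not part of any real codec: frames are small grids of random "texture", the motion is a pure horizontal shift with wrap-around, `zlib` stands in for the entropy stage, and the names `residual_bytes` and `compressed_size` are hypothetical helpers. It compares the compressed size of the raw frame, of the residual with a wrong motion vector, and of the residual with the right one.

```python
import random
import zlib

WIDTH, HEIGHT, SHIFT = 64, 64, 3
rng = random.Random(0)  # fixed seed so the run is repeatable

# Previous (reference) frame: random texture, a hard case for raw compression.
previous = [[rng.randrange(256) for _ in range(WIDTH)] for _ in range(HEIGHT)]

# Current frame: the same texture moved SHIFT pixels to the right (wrapping at
# the edge), plus one pixel per row changed by +1 to mimic small real changes.
current = [[previous[y][(x - SHIFT) % WIDTH] for x in range(WIDTH)]
           for y in range(HEIGHT)]
for y in range(HEIGHT):
    x = (7 * y) % WIDTH
    current[y][x] = (current[y][x] + 1) % 256


def residual_bytes(dx):
    """Residual of `current` against `previous` displaced by dx, stored mod 256."""
    return bytes(
        (current[y][x] - previous[y][(x - dx) % WIDTH]) % 256
        for y in range(HEIGHT)
        for x in range(WIDTH)
    )


def compressed_size(data):
    """Stand-in for the entropy stage: size in bytes after zlib compression."""
    return len(zlib.compress(data, 9))


raw_size = compressed_size(bytes(p for row in current for p in row))
bad_size = compressed_size(residual_bytes(0))       # wrong motion vector
good_size = compressed_size(residual_bytes(SHIFT))  # correct motion vector

print("raw frame:     ", raw_size)
print("bad residual:  ", bad_size)
print("good residual: ", good_size)
```

With the wrong motion vector the residual is as noisy as the frame itself, so it compresses no better than the raw pixels; with the right one the residual is almost entirely zeros and shrinks to a small fraction of that size.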
For a product team, prediction is the central idea that makes modern video economically feasible. Without prediction, every frame would be encoded like a JPEG and a 90-minute movie would weigh in at tens of gigabytes. With prediction, the same movie fits in 2–3 GB at comparable quality. Codec improvement claims ("30% better than the previous generation") almost always trace back to improvements in prediction. If you remember just one concept about how modern video compression works, prediction is the one.

