Motion estimation is the encoder's search for where each part of the current frame "was" in a previously encoded reference frame. For every block of pixels (typically 4×4 up to 64×64, depending on the codec), the encoder hunts through nearby positions in the reference frame, finds the patch that looks most similar, and records the offset as a motion vector. It is usually the most computationally expensive step in video encoding, often 50–80% of total CPU time, and the single biggest lever on compression efficiency.
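To make the block search concrete, here is a minimal sketch of exhaustive block matching in plain Python. It assumes grayscale frames stored as lists of rows and uses the sum of absolute differences (SAD) as the similarity measure; the function names, the tiny 4×4 block, and the synthetic test frame are all illustrative choices, not taken from any real encoder.

```python
# Exhaustive (full-search) block matching on grayscale frames stored as
# lists of rows. All names here are illustrative, not from a real encoder.

def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        cur_row = cur[cy + dy]
        ref_row = ref[ry + dy]
        for dx in range(size):
            total += abs(cur_row[cx + dx] - ref_row[rx + dx])
    return total


def full_search(cur, ref, cx, cy, size=4, radius=3):
    """Return ((mvx, mvy), cost) for the block whose top-left is (cx, cy)."""
    height, width = len(ref), len(ref[0])
    # Start from the zero vector, then try every offset in the window.
    best_mv, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, size)
    for mvy in range(-radius, radius + 1):
        for mvx in range(-radius, radius + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that would read outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost


def make_frame(width, height, shift_x=0, shift_y=0):
    """Synthetic frame; the shift moves the content right and down."""
    return [[((x - shift_x) * 7 + (y - shift_y) * 13) % 251
             for x in range(width)] for y in range(height)]


reference = make_frame(16, 16)
current = make_frame(16, 16, shift_x=2, shift_y=1)  # content moved by (+2, +1)
mv, cost = full_search(current, reference, cx=6, cy=6)
print(mv, cost)  # (-2, -1) 0
```

The vector points from the block in the current frame back to its match in the reference frame, so content that moved right and down yields a negative offset. A production encoder would weigh the cost of coding the vector alongside the SAD and would not test every offset, but the inner comparison has the same shape.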
Why this step is hard: a high-quality encoder doesn't just check a few obvious positions; it tries dozens or hundreds of candidate offsets per block, sometimes at sub-pixel precision (a quarter or an eighth of a pixel), to find the best match. A 1080p frame holds roughly 8,000 blocks of 16×16 pixels, so at 30 fps that is about 240,000 blocks per second and, at dozens to hundreds of candidates each, several million to tens of millions of block comparisons per second. A faster x264 preset might do a quick diamond search around an initial guess; slower presets use progressively wider and more thorough search patterns, up to an exhaustive search over a wide radius at the slowest settings. The slower search finds better matches and can shrink the file by roughly 10–20%, at the cost of taking 5–10× as long to encode.
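The gap between a quick search and an exhaustive one can be sketched by counting how many candidate positions each evaluates. The code below is a simplified illustration under stated assumptions: a greedy "small diamond" that only steps to the best of four neighbours, a smooth synthetic frame, and hypothetical function names. It is not how x264 implements its search patterns.

```python
# Compare an exhaustive search with a small-diamond search on a smooth
# synthetic frame. Names and parameters are illustrative only.

def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
               for dy in range(size) for dx in range(size))


def in_bounds(ref, rx, ry, size):
    """True if a size x size block at (rx, ry) lies inside the frame."""
    return 0 <= rx and 0 <= ry and rx + size <= len(ref[0]) and ry + size <= len(ref)


def full_search(cur, ref, cx, cy, size, radius):
    """Test every offset in the window. Returns (mv, cost, evaluations)."""
    best_mv, best_cost, evals = None, None, 0
    for my in range(-radius, radius + 1):
        for mx in range(-radius, radius + 1):
            if not in_bounds(ref, cx + mx, cy + my, size):
                continue
            cost = sad(cur, ref, cx, cy, cx + mx, cy + my, size)
            evals += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost, evals


def diamond_search(cur, ref, cx, cy, size, radius, start=(0, 0)):
    """Greedy small-diamond descent. Returns (mv, cost, evaluations)."""
    mv = start
    cost = sad(cur, ref, cx, cy, cx + mv[0], cy + mv[1], size)
    evals = 1
    while True:
        best_mv, best_cost = mv, cost
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            mx, my = mv[0] + dx, mv[1] + dy
            if max(abs(mx), abs(my)) > radius:
                continue
            if not in_bounds(ref, cx + mx, cy + my, size):
                continue
            c = sad(cur, ref, cx, cy, cx + mx, cy + my, size)
            evals += 1
            if c < best_cost:
                best_mv, best_cost = (mx, my), c
        if best_mv == mv:  # no neighbour improved: stop at this local minimum
            return mv, cost, evals
        mv, cost = best_mv, best_cost


def smooth_frame(width, height, shift_x=0, shift_y=0):
    """Smooth synthetic intensity surface, optionally shifted right/down."""
    return [[(x - shift_x) ** 2 + (y - shift_y) ** 2 for x in range(width)]
            for y in range(height)]


reference = smooth_frame(32, 32)
current = smooth_frame(32, 32, shift_x=2, shift_y=1)
full = full_search(current, reference, cx=12, cy=12, size=8, radius=7)
fast = diamond_search(current, reference, cx=12, cy=12, size=8, radius=7)
print("full   ", full)  # ((-2, -1), 0, 225)
print("diamond", fast)  # ((-2, -1), 0, 17)
```

On this deliberately smooth frame both searches land on the same vector, and the diamond gets there with 17 evaluations instead of 225. On real, textured content a greedy descent can stall in a local minimum, which is the quality an exhaustive or wider search buys back with the extra work.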
The practical takeaway: motion estimation is where the encoder preset cashes out (for example -preset slow for x264, -preset slower for x265, or -preset 4 for SVT-AV1, in ffmpeg's option syntax). Slow presets buy quality and bandwidth savings by doing more searching here; fast presets save encoding time by cutting the search short. For VOD, where you encode once and serve millions of times, slower presets pay off enormously. For live streams, where every frame must encode in real time, fast presets are mandatory and hardware encoders often take over.
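As a rough illustration of how the preset choice shows up in practice, the sketch below builds ffmpeg argument lists for a VOD-style and a live-style encode without running them. It assumes an ffmpeg build with libx264, libx265 and libsvtav1 enabled; the file names, the CRF values, and the helper function are placeholders for illustration, not recommendations.

```python
# Build (but do not run) ffmpeg argument lists for different scenarios.
# File names and CRF values are placeholders chosen for illustration.

def encode_command(src, dst, encoder, preset, crf=None):
    """Return an ffmpeg argv list selecting an encoder and a preset."""
    cmd = ["ffmpeg", "-i", src, "-c:v", encoder, "-preset", str(preset)]
    if crf is not None:
        # Constant-quality mode; omitted when the caller gives no value.
        cmd += ["-crf", str(crf)]
    cmd.append(dst)
    return cmd


# Encode once, serve many times: spend CPU on a slower preset.
vod_x264 = encode_command("input.mp4", "vod_x264.mp4", "libx264", "slow", crf=23)
vod_x265 = encode_command("input.mp4", "vod_x265.mp4", "libx265", "slower", crf=28)
vod_av1 = encode_command("input.mp4", "vod_av1.mkv", "libsvtav1", 4)

# Must keep up with real time: pick a fast preset instead.
live_x264 = encode_command("input.mp4", "live_x264.mp4", "libx264", "veryfast", crf=23)

for cmd in (vod_x264, vod_x265, vod_av1, live_x264):
    print(" ".join(cmd))
```

Encoding the same clip with the slow and the fast command and comparing file size and wall-clock time is a direct way to see the trade-off described above on your own material.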

