The motion feature is the temporal input that VMAF adds to its otherwise per-frame, spatial features. It measures how much the picture is moving by computing the temporal difference between neighbouring frames. In the open-source VMAF package this is the mean absolute difference between co-located pixels on the luma (brightness) channel, that is, how much each pixel changes from one frame to the next at the same location. Motion matters perceptually because the eye tolerates more distortion when the scene moves fast: a still frame is scrutinised, a fast pan is not. Feeding motion into the trained model lets it calibrate how harshly to judge the spatial damage that VIF and the detail-loss metric detect, raising or lowering its tolerance with the pace of the action. Motion is one of the three elementary features in the classic VMAF (v0) fusion. Its limitations mirror VMAF's. It belongs to a full-reference metric. Its co-located difference is also a coarse measure, because it compares pixels at fixed positions rather than following objects as they move. This coarseness is part of why the 2026 v1 high-frame-rate variants use a wider five-frame motion window for fast content.
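The co-located difference described above can be sketched in a few lines. This is a minimal illustration, assuming each frame is given as a plain 2-D list of luma values of identical size. The function names `mean_abs_diff` and `motion_scores` are hypothetical, and the convention that the first frame scores 0 is an assumption for this sketch. The real libvmaf implementation includes further processing steps not shown here, such as smoothing the frames before differencing.

```python
from typing import List, Sequence

# A luma plane: rows of pixel values (hypothetical representation for this sketch).
Frame = List[List[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute difference between co-located luma pixels of two frames."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frames must have the same dimensions")
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            # Compare the pixel at the same (row, column) position in both frames.
            total += abs(pa - pb)
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


def motion_scores(frames: Sequence[Frame]) -> List[float]:
    """Per-frame motion as the difference to the previous frame.

    Assumption for this sketch: the first frame has no predecessor, so it scores 0.
    """
    if not frames:
        return []
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        scores.append(mean_abs_diff(prev, cur))
    return scores
```

For example, for three 2x2 frames in which one pixel jumps from 0 to 4 and then holds still, the scores are `[0.0, 1.0, 0.0]`. The change of 4 is spread over four pixels, and the static third frame registers no motion. The example also shows the coarseness noted above: a measure like this cannot tell an object moving across the frame from one flickering in place.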

