Frame sampling picks a subset of frames, either evenly spaced or concentrated where motion happens, so that a vision-language model (VLM) sees the gist of a clip without paying to process every frame. The sampling rate is a central accuracy-versus-cost knob in video understanding.
Definition
Choosing a subset of a video's frames to feed a model instead of all of them, to cut compute cost and token count while keeping enough information to understand the clip.
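The two strategies named in this entry, evenly spaced frames and frames where motion happens, can be sketched in a few lines. This is only an illustrative sketch, not a reference implementation: the function names `uniform_frame_indices` and `motion_frame_indices`, the midpoint rule, and the mean-absolute-difference motion score are all assumptions made for the example, and frames are modeled as flat sequences of pixel intensities rather than decoded video.

```python
from typing import Sequence


def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return up to num_samples evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Fewer frames than requested samples: keep every frame.
        return list(range(total_frames))
    # Split the clip into num_samples equal segments and take each midpoint.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def motion_frame_indices(
    frames: Sequence[Sequence[float]], threshold: float
) -> list[int]:
    """Keep frame 0, then every frame that differs enough from the last kept one.

    "Differs enough" is an assumed heuristic: the mean absolute pixel
    difference against the most recently kept frame exceeds threshold.
    Frames are assumed to be non-empty and of equal length.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        ref = frames[kept[-1]]
        diff = sum(abs(a - b) for a, b in zip(frames[i], ref)) / len(ref)
        if diff > threshold:
            kept.append(i)
    return kept
```

Raising `num_samples` or lowering `threshold` passes more frames to the model, which is the accuracy-versus-cost trade-off described above: more frames mean more tokens and compute, fewer frames risk missing a short event.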
Also known as
frame selection