A frame is one picture in the stream; play many per second and the eye sees motion. Vision models usually process frames one at a time, so how many frames you analyse per second directly sets accuracy, latency, and compute bill.