Lip-sync is the alignment of audio and video so that the sound a viewer hears lines up with the picture, most visibly speech with mouth movement. The ear and eye tolerate a small offset, and the tolerance is asymmetric: ITU-R BT.1359 puts the detectability thresholds at roughly 45 ms when audio leads the picture and roughly 125 ms when it lags, and production guidelines (EBU R37) are tighter still. Maintaining sync means carrying timing through the whole chain (capture, encode, multiplex, deliver, decode, render), and it breaks in characteristic places: separate audio and video pipelines, post-decode buffering, and clock drift over long sessions. It is among the failures users notice and complain about most readily.
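To make the thresholds concrete, here is a minimal Python sketch that classifies a measured audio/video offset against the BT.1359 detectability figures quoted above and estimates how long uncorrected clock drift takes to reach one of them. The function names, the sign convention (positive means audio lags the picture), and the 50 ppm drift figure used in the example are assumptions chosen for illustration; they are not taken from either standard.

```python
# Detectability thresholds quoted in the text from ITU-R BT.1359 (milliseconds).
AUDIO_LEAD_LIMIT_MS = 45.0   # audio ahead of the picture
AUDIO_LAG_LIMIT_MS = 125.0   # audio behind the picture


def av_offset_ms(audio_time_s: float, video_time_s: float) -> float:
    """Signed offset between the presentation times of the same event.

    Assumed convention: positive means the audio is presented after
    the picture (audio lags); negative means the audio leads.
    """
    return (audio_time_s - video_time_s) * 1000.0


def classify_offset(offset_ms: float) -> str:
    """Return 'audio_leads', 'audio_lags', or 'in_sync' for a signed offset."""
    if offset_ms < -AUDIO_LEAD_LIMIT_MS:
        return "audio_leads"
    if offset_ms > AUDIO_LAG_LIMIT_MS:
        return "audio_lags"
    return "in_sync"


def seconds_until_detectable(drift_ppm: float, limit_ms: float) -> float:
    """Time for a constant clock drift (parts per million) to accumulate
    limit_ms of offset, starting from perfect sync.

    Derivation: limit_ms / 1000 seconds of error, divided by a drift
    rate of drift_ppm / 1e6, simplifies to limit_ms * 1000 / drift_ppm.
    """
    if drift_ppm <= 0:
        raise ValueError("drift_ppm must be positive")
    return limit_ms * 1000.0 / drift_ppm


if __name__ == "__main__":
    # Audio presented 100 ms after the matching frame: a lag inside the limit.
    print(classify_offset(av_offset_ms(10.100, 10.000)))
    # Audio presented 60 ms before the matching frame: a detectable lead.
    print(classify_offset(-60.0))
    # Hypothetical 50 ppm drift between the audio and video clocks.
    print(seconds_until_detectable(50.0, AUDIO_LEAD_LIMIT_MS))
```

Under these assumptions, a hypothetical 50 ppm mismatch between the audio and video clocks would reach the 45 ms lead threshold after about 15 minutes, which is why drift tends to show up only in long sessions unless the player periodically resynchronizes.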