Latency in clinical video is the glass-to-glass delay — the time from light entering one person's camera to the corresponding image appearing on the other person's screen. It is not a single number from one source but an accumulation along the whole pipeline: capturing the frame, encoding it, sending it across the network, holding it briefly in the receiver's jitter buffer to smooth out timing variation, decoding it, and finally rendering it. The same applies to audio in parallel. Because every stage adds milliseconds, latency is best understood as a budget you spend hop by hop.
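
To make the budget idea concrete, here is a minimal sketch in Python. The stage names follow the pipeline described above, but every millisecond figure and both function names are illustrative assumptions, not measurements or recommended targets.

```python
# Hypothetical one-way budget per pipeline stage, in milliseconds.
# All figures are illustrative assumptions, not measurements.
STAGE_BUDGET_MS = {
    "capture": 20,
    "encode": 15,
    "network": 60,
    "jitter_buffer": 40,
    "decode": 10,
    "render": 15,
}


def total_latency_ms(stages):
    """Glass-to-glass delay is the sum of every stage's contribution."""
    return sum(stages.values())


def overruns_ms(measured, budget):
    """Return each stage whose measured delay exceeds its budget, with the overrun."""
    return {
        name: measured.get(name, 0) - limit
        for name, limit in budget.items()
        if measured.get(name, 0) > limit
    }


# Example: a call where only the network hop is slow.
measured = dict(STAGE_BUDGET_MS, network=140)
print(total_latency_ms(STAGE_BUDGET_MS))        # budgeted total
print(total_latency_ms(measured))               # measured total
print(overruns_ms(measured, STAGE_BUDGET_MS))   # which stage spent too much
```

Keeping the budget per stage, rather than as one total, is what lets a team see which hop overspent when the overall number goes wrong.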

Latency matters to a telemedicine team because conversation is exquisitely sensitive to delay, and the breakdown is gradual rather than sudden. At around 200 ms of one-way delay, people begin to talk over each other slightly and the rhythm of dialogue stiffens. Beyond roughly 300 to 500 ms one-way, natural turn-taking breaks down: each side waits, then both speak at once, then both stop. That awkwardness erodes the clinical rapport that makes a remote consultation feel like real care.

The practical discipline is to budget latency stage by stage and, crucially, to measure it in production rather than trusting lab conditions. Averages are dangerously reassuring because they hide the tail: the patient on satellite internet, the rural cellular link, the overloaded hospital Wi-Fi, where one-way delay can be several times the median. The common mistake is optimizing for the average user and reporting a healthy mean latency while the small group of high-latency patients quietly has unusable conversations and never reports it as a 'latency' problem; they just say the call 'felt weird', or they stop using telehealth.
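
A small sketch of why the mean misleads, again in Python. The sample data is invented for illustration, the `percentile` and `summarize` helpers are hypothetical names, and the 300 ms threshold is simply the lower bound of the breakdown range mentioned earlier.

```python
import math
from statistics import mean, median


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples, threshold_ms=300):
    """Report the mean alongside the tail so that slow calls stay visible."""
    return {
        "mean": mean(samples),
        "median": median(samples),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "share_over_threshold": sum(s > threshold_ms for s in samples) / len(samples),
    }


# Invented data: 90 calls on good links, 10 on a very slow link (one-way ms).
one_way_ms = [120] * 90 + [700] * 10
print(summarize(one_way_ms))
```

With this made-up data the mean comes out at 178 ms, which looks healthy, while the 95th percentile is 700 ms and one call in ten is past the point where turn-taking breaks down. Reporting the share of calls over a threshold puts those patients in the dashboard even when they never file a complaint.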