A jitter buffer is the receiver-side queue that absorbs jitter — the variation in how evenly packets arrive. Packets sent at a steady cadence rarely arrive at a steady cadence; the network spaces them irregularly, some bunching up and some lagging. If the receiver played them out the instant each arrived, audio would stutter and warble. The jitter buffer holds incoming packets for a brief, controlled interval so that playback can proceed at a smooth, even rhythm despite the uneven arrival underneath.
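
To make the hold-then-play idea concrete, here is a minimal sketch of a fixed-depth buffer's playout schedule. The 20 ms frame cadence, the function name `late_packets`, and the arrival times are illustrative assumptions, not taken from any particular implementation.

```python
FRAME_MS = 20  # assumed cadence: one 20 ms audio frame per packet


def late_packets(arrivals_ms, buffer_ms):
    """Return the sequence numbers that miss their playout deadline.

    arrivals_ms[i] is the arrival time of packet i, which was sent at
    i * FRAME_MS. Playback starts buffer_ms after the first packet
    arrives and then consumes one packet every FRAME_MS.
    """
    start = arrivals_ms[0] + buffer_ms
    late = []
    for seq, arrival in enumerate(arrivals_ms):
        deadline = start + seq * FRAME_MS
        if arrival > deadline:  # arrived after its slot was played
            late.append(seq)
    return late


# Hypothetical arrivals: sent every 20 ms, delivered unevenly.
arrivals = [5, 27, 70, 72, 85, 130, 131, 150]
```

With these made-up arrivals, `late_packets(arrivals, 0)` flags six of the eight packets, 20 ms of buffering leaves two late, and 40 ms leaves none. The smoothing comes entirely from starting playback later.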
The buffer's depth is purchased latency: every millisecond of buffering added to smooth playback is a millisecond added to the end-to-end (mouth-to-ear) delay of the conversation. That is the central tension, because the same delay that makes audio smooth also makes turn-taking harder. Modern implementations resolve it adaptively rather than with a fixed size; NetEq, the audio jitter buffer in WebRTC, is the well-known example. They grow the buffer when the network turns turbulent and shrink it when conditions calm, and they stretch or compress the audio almost imperceptibly so the buffer can change depth or recover from drift without an audible glitch.
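
The grow-and-shrink behaviour can be sketched with a simple adaptive target. This is not NetEq's algorithm, whose estimator is more elaborate. It borrows the smoothed interarrival jitter estimate from RFC 3550 and scales it into a target depth. The class name, the `safety` multiplier, and the 20 to 200 ms clamp are assumptions chosen for illustration.

```python
class AdaptiveTarget:
    """Track a jitter estimate and derive a target buffer depth (ms)."""

    def __init__(self, min_ms=20.0, max_ms=200.0, safety=3.0):
        self.jitter_ms = 0.0   # smoothed interarrival jitter
        self.min_ms = min_ms   # assumed floor for the buffer depth
        self.max_ms = max_ms   # assumed ceiling for the buffer depth
        self.safety = safety   # assumed headroom multiplier
        self._prev = None      # (send_ms, recv_ms) of the previous packet

    def on_packet(self, send_ms, recv_ms):
        """Update the estimate with one packet and return the new target."""
        if self._prev is not None:
            prev_send, prev_recv = self._prev
            # Difference between receive spacing and send spacing.
            d = (recv_ms - prev_recv) - (send_ms - prev_send)
            # RFC 3550 estimator: J += (|D| - J) / 16
            self.jitter_ms += (abs(d) - self.jitter_ms) / 16.0
        self._prev = (send_ms, recv_ms)
        return self.target_ms()

    def target_ms(self):
        """Target depth: scaled jitter, clamped to [min_ms, max_ms]."""
        return max(self.min_ms, min(self.max_ms, self.safety * self.jitter_ms))
```

Fed packets with a constant transit delay, the target sits at the floor. When transit delay starts swinging, the target climbs, and it decays back to the floor once arrivals steady again. A real buffer would pair a target like this with time-stretching so the depth can change without dropping or repeating audio.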
For a telemedicine product the practical implication is that smoothness and responsiveness are in direct tension, and the jitter buffer is where that trade is made automatically, many times a second. The common mistake is reaching for 'just buffer more' as a fix for choppy audio. It does smooth the choppiness, but it does so by spending latency, and past a point the call becomes smooth but laggy: clinicians and patients talk over each other, which on a clinical call is a failure in its own right rather than a solution.
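
One way to see where "past a point" falls is to treat the buffer as one line item in a one-way delay budget. The sketch below assumes a 150 ms budget, the commonly cited ITU-T G.114 guideline for comfortable conversation. The function names and the network and codec figures in the usage note are hypothetical, and the right budget for a given product is a judgment call.

```python
# Assumed one-way delay budget (ITU-T G.114 guideline), not a product spec.
ONE_WAY_BUDGET_MS = 150


def max_buffer_ms(network_ms, codec_ms, budget_ms=ONE_WAY_BUDGET_MS):
    """Largest jitter-buffer depth that keeps one-way delay within budget."""
    return max(0, budget_ms - network_ms - codec_ms)


def is_laggy(network_ms, codec_ms, buffer_ms, budget_ms=ONE_WAY_BUDGET_MS):
    """True when total one-way delay exceeds the budget."""
    return network_ms + codec_ms + buffer_ms > budget_ms
```

For a hypothetical call with 60 ms of network delay and 30 ms of codec and device delay, `max_buffer_ms(60, 30)` leaves 60 ms for the jitter buffer. A 40 ms buffer stays within budget, while a 120 ms buffer would be smooth but over it.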

