Live Captions (SFU-Side ASR Fan-Out) Engineering Cheat Sheet

One-page reference covering: what a live caption is (timed, speaker-labelled WebVTT-style cues); the SFU-side ASR fan-out pattern (tap audio at the SFU, VAD-gate recognition, fan caption text out over the data channel); the client-vs-server cost math ($13.86 for naive client-side ASR vs $0.69 for SFU fan-out on a 1-hour, 30-person call); partial-vs-final result handling; caption delivery over RFC 8831 data channels; the WCAG Success Criterion 1.2.4 Captions (Live), Level AA requirement and the DOJ ADA Title II April 2026 compliance deadline; and a build-vs-buy checklist (mediasoup/Janus/Jitsi vs LiveKit; Deepgram/AssemblyAI/Whisper).
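VAD gating keeps silent participants off the ASR bill: the SFU forwards a speaker's audio to the recognizer only while speech is present. The sketch below uses a plain RMS energy threshold with a hangover counter as a stand-in for a production VAD model; the frame format (lists of normalized float samples), the 0.02 threshold and the 3-frame hangover are illustrative assumptions, not tuned values.

```python
"""Energy-threshold VAD gate: decide which audio frames the SFU forwards to ASR.

A simple RMS threshold with hangover stands in for a real VAD model here;
threshold and hangover values are illustrative assumptions, not tuned.
"""
import math
from typing import Iterable, Iterator, Sequence


def rms(frame: Sequence[float]) -> float:
    # Root-mean-square energy of one frame of normalized PCM samples (-1.0..1.0).
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def vad_gate(frames: Iterable[Sequence[float]], threshold: float = 0.02,
             hangover: int = 3) -> Iterator[bool]:
    """Yield True for each frame that should be sent to the recognizer.

    The gate opens on any frame at or above `threshold` and stays open for
    `hangover` further frames so trailing word endings are not clipped.
    """
    remaining = 0
    for frame in frames:
        if rms(frame) >= threshold:
            remaining = hangover  # speech: (re)arm the hangover counter
            yield True
        elif remaining > 0:
            remaining -= 1        # quiet, but still inside the hangover window
            yield True
        else:
            yield False           # silence: do not spend ASR minutes on it
```

Only frames marked True would be streamed to the ASR provider; the hangover trades a little extra billed audio for fewer clipped word endings.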
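The client-vs-server gap comes from what scales with headcount. A rough cost model, assuming (hypothetically) that the naive design keeps one always-on recognition stream per client for the whole call, while the SFU design bills only VAD-detected speech, once per speaker. The per-minute rate and call shape below are placeholders, not vendor prices, and are not meant to reproduce the $13.86/$0.69 figures above.

```python
"""Back-of-envelope ASR cost model: per-client recognition vs SFU-side fan-out.

All rates and call shapes here are hypothetical placeholders, not vendor prices.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    participants: int       # people in the call
    minutes: float          # wall-clock call length
    speech_minutes: float   # VAD-detected speech summed over all speakers (assumed)


def per_client_cost(call: Call, usd_per_min: float) -> float:
    # Naive model (assumption): every client keeps its own recognition
    # stream open for the whole call, speech or silence.
    return call.participants * call.minutes * usd_per_min


def fanout_cost(call: Call, usd_per_min: float) -> float:
    # Fan-out model: the SFU sends each speaker's audio to ASR once, only while
    # VAD reports speech; the caption text is broadcast to every listener, so
    # cost does not grow with the number of listeners.
    return call.speech_minutes * usd_per_min


def savings_ratio(call: Call) -> float:
    # The per-minute rate cancels out, so the ratio depends only on call shape.
    return (call.participants * call.minutes) / call.speech_minutes


if __name__ == "__main__":
    call = Call(participants=30, minutes=60, speech_minutes=45)
    rate = 0.01  # hypothetical $/audio-minute
    print(f"per-client: ${per_client_cost(call, rate):.2f}")
    print(f"fan-out:    ${fanout_cost(call, rate):.2f}")
    print(f"ratio:      {savings_ratio(call):.1f}x")
```

Under these assumptions the price per minute drops out of the comparison: the saving is participants × call minutes divided by billed speech minutes, which is why VAD gating and single-transcription fan-out compound.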
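Partials are provisional hypotheses that the recognizer keeps revising, so a receiver should overwrite the speaker's current partial rather than append to it, and only finals should become committed cues. A minimal sketch of that bookkeeping, assuming a hypothetical result dict shape and JSON message schema (neither is taken from a specific ASR vendor or SFU), which also renders finals as WebVTT cues using `<v>` voice spans for speaker labels:

```python
"""Partial-vs-final caption handling with WebVTT-style cue output.

Hypothetical input shape from the SFU's ASR worker:
{"speaker": "Alice", "text": "...", "start": 1.2, "end": 2.9, "is_final": False}
"""
import json
from dataclasses import dataclass, field


def _vtt_escape(text: str) -> str:
    # WebVTT cue text must escape '&' and '<'; '>' is escaped too so "-->" cannot appear.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def vtt_timestamp(seconds: float) -> str:
    # WebVTT cue timing format: hh:mm:ss.ttt
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


@dataclass
class CaptionState:
    partials: dict = field(default_factory=dict)  # speaker -> latest partial text
    cues: list = field(default_factory=list)      # committed final results, in order

    def on_result(self, result: dict) -> str:
        """Apply one ASR result; return the JSON message to fan out on the data channel."""
        speaker = result["speaker"]
        if result["is_final"]:
            # A final supersedes the speaker's pending partial and becomes a cue.
            self.partials.pop(speaker, None)
            self.cues.append(result)
            kind = "final"
        else:
            # Partials replace, never append: each one is a full revised hypothesis.
            self.partials[speaker] = result["text"]
            kind = "partial"
        return json.dumps({"type": kind, "speaker": speaker, "text": result["text"]})

    def to_webvtt(self) -> str:
        # Only finals are written out; partials are display-only.
        lines = ["WEBVTT", ""]
        for cue in self.cues:
            lines.append(f"{vtt_timestamp(cue['start'])} --> {vtt_timestamp(cue['end'])}")
            lines.append(f"<v {_vtt_escape(cue['speaker'])}>{_vtt_escape(cue['text'])}")
            lines.append("")
        return "\n".join(lines)
```

Keeping partials out of the WebVTT output means a saved transcript contains only stable text, while live viewers still see low-latency in-progress captions from the partial messages.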
