How WebRTC actually works end-to-end at production scale. The topology decision (P2P, SFU, MCU, hybrid, broadcast) every architect is asked to answer on the first call. The 12 components every production WebRTC system needs. The build-vs-SDK economics that decide whether a custom stack or Twilio / Agora / LiveKit Cloud is the right call. Written from the platforms we have shipped: Nucleus (600M+ call minutes per month), Worldcast Live (sub-second latency at 10,000 concurrent), StreamLayer (NBC / CBS / Chelsea FC / Sony Music interactive sports), and BrainCert (500M+ classroom minutes per year).
WebRTC is the browser-native standard for real-time peer-to-peer media — four primitives (getUserMedia, RTCPeerConnection, MediaStream, RTCDataChannel) plus ICE / STUN / TURN for NAT traversal. WebRTC architecture is the system-design pattern for shipping that standard at production scale.
Every production WebRTC system has 12 components surrounding the browser primitives: signaling, a media gateway (SFU / MCU / P2P), TURN servers, recording pipeline, transcoding, CDN distribution, observability, billing, end-to-end encryption, compliance tooling, guardrails, and deployment. Skipping any of them is technical debt that surfaces in the first month of production traffic.
P2P, SFU, MCU: the topology decision. P2P is direct browser-to-browser with the lowest latency, but it stops at ~4 participants because each peer uploads a separate copy of its stream to every other peer: per-peer upload grows O(N) and the total number of streams across the room grows O(N²). An SFU (Selective Forwarding Unit; mediasoup, LiveKit, Janus, Pion, Kurento) receives each peer’s stream once and forwards it selectively; upload bandwidth at the peer becomes O(1), so SFUs scale to 10,000+ participants per cluster. An MCU mixes peer streams server-side into a single composite stream, which is useful for legacy SIP interop but costly because the server transcodes everything. Most modern stacks ship hybrid: P2P for 1:1, SFU for groups, MoQ or LL-HLS for broadcast above ~10K viewers.
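The scaling difference between the three topologies is easiest to see by counting streams. The sketch below is a simplified model, not a capacity planner: it assumes one media stream per participant and ignores simulcast layers, audio/video separation, and any cap the SFU places on forwarded streams. The function names are illustrative.

```python
def per_peer_streams(topology: str, n: int) -> dict:
    """Streams each peer sends ("up") and receives ("down") in a room of n peers."""
    if n < 2:
        raise ValueError("a call needs at least 2 participants")
    if topology == "p2p":
        # Full mesh: one copy sent to, and one stream received from, every other peer.
        return {"up": n - 1, "down": n - 1}
    if topology == "sfu":
        # Upload once; the SFU forwards the other peers' streams unmodified.
        return {"up": 1, "down": n - 1}
    if topology == "mcu":
        # Upload once; receive a single server-mixed composite.
        return {"up": 1, "down": 1}
    raise ValueError(f"unknown topology: {topology}")


def mesh_total_streams(n: int) -> int:
    """Total one-way streams across a full mesh: n * (n - 1), i.e. O(N^2)."""
    return n * (n - 1)
```

In this model a 4-person mesh already carries 12 one-way streams, while an SFU keeps every peer at a single upload. In practice an SFU also trims the download side by forwarding only active speakers or lower simulcast layers.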
How you scale WebRTC depends on the participant tier. Below 100 participants, a single SFU node handles it. 100–1K, simulcast + SVC gates the publisher count and serves receive-only viewers cheaply. 1K–10K, cascade SFUs across regions or hybrid with LL-HLS broadcast. Above 10K, WebRTC becomes the talent layer; LL-HLS or MoQ (Media over QUIC, the emerging 2026 standard) handles broadcast fan-out. Fora Soft has shipped at every tier — Nucleus runs 600M+ call minutes per month; Worldcast Live runs HD broadcast at 10K concurrent with sub-second latency; StreamLayer powers interactive sports for NBC, CBS, Chelsea FC, and Sony Music.
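The tiers above can be written down as a small lookup. This is a minimal sketch of the decision as stated in the text; how the exact boundary values (100, 1,000, 10,000) are assigned is an assumption, and the function name is illustrative.

```python
def scaling_strategy(participants: int) -> str:
    """Map a participant count to the scaling approach described for its tier."""
    if participants < 1:
        raise ValueError("participants must be positive")
    if participants < 100:
        return "single SFU node"
    if participants <= 1_000:
        return "simulcast + SVC with a gated publisher count and receive-only viewers"
    if participants <= 10_000:
        return "cascaded SFUs across regions, or SFU + LL-HLS hybrid"
    # Above 10K, WebRTC carries only the talent layer.
    return "WebRTC talent layer with LL-HLS or MoQ broadcast fan-out"
```

A real picker would also weigh the ratio of publishers to viewers and the latency target, not the headcount alone.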
Four shapes of WebRTC stack dominate the 2026 landscape. Each gets a different topology, a different vendor stack, and a different scale ceiling.
WebRTC + SIP-based video / audio calls with CRM and ERP integration. AI phone agents handle voice intake. SOC 2 + GDPR + HIPAA on the receipts. P2P for 1:1, SFU for groups.
Interactive overlays (polls, predictions, in-stream betting), watch parties, real-time stats. WebRTC for the talent layer plus LL-HLS / HLS for broadcast fan-out. Interactive viewers watch 33% longer than passive.
HD concert streaming with multichannel audio (5 channels), 1.5 Gb/s bitrate, full-duplex two-way streaming for remote performers playing together in real time. 0.4–0.5 s latency at 10K concurrent.
WebRTC + HTML5 virtual classrooms at production scale. 500M+ classroom minutes per year, 99.995% uptime, native HD virtual classrooms, interactive whiteboard, server-side recording, proprietary DRM.
Telehealth and business communication at 600M+ call minutes per month. HD broadcast at 10K concurrent with sub-second latency. Interactive sports streaming for major networks. The world’s first WebRTC + HTML5 virtual classroom LMS at 500M+ classroom minutes per year. Four production builds running today across four very different scale shapes.
Secure on-premise Slack alternative, serving 300,000+ customers on a national network processing 2 billion+ phone calls annually. Nucleus serves 5,000+ businesses with AI phone agents handling 600M+ call minutes monthly. Featured in Financial Post and PRWeb.
The WebRTC architecture combines real-time WebRTC calls with SIP for cellular and landline bridging: chat-to-SMS, voice-to-voice AI translation, video and audio calls, and CRM and ERP integration. The platform is SOC 2, GDPR, and HIPAA compliant.
HD concert streaming platform — one of the first to achieve sub-second latency (0.4–0.5s) at scale, serving 10,000 simultaneous viewers. Custom WebRTC + Kurento architecture with multichannel audio (5 channels), 1.5 Gb/s bitrate for true HD quality, and dynamic video quality adjustment for poor connections.
Full-duplex two-way streaming lets performers in different locations play together in real time. The white-label Multiple Venue Streaming (MVS) plugin syncs live streams across multiple websites simultaneously.
Interactive sports streaming SaaS used by NBC, CBS, Red Bull, Live Nation, Chelsea FC, Coca-Cola, and Sony Music. Powered live engagement at Lollapalooza and Jay-Z’s Made In America festival. Raised $14.1M across 7 rounds ($8M Series A in 2022 led by Las Vegas Sands). Featured on NASDAQ and Gaming America.
Platforms implementing the full StreamLayer stack report 60–100% revenue uplift over basic ad breaks, with effective CPM rates reaching $50–80+. Interactive viewers watch 33% longer than passive viewers — ~2x increases in dwell time, ad revenue, and subscriber growth.
The world’s first WebRTC + HTML5 virtual classroom LMS. Bootstrapped to $3M annual revenue and 100K+ customers — a 12-person team outcompeting VC-backed giants across the $400B+ e-learning market. 500M+ real-time classroom minutes delivered across 10 worldwide datacenters with 99.995% guaranteed uptime. 4 Brandon Hall Awards (Triple Bronze 2021).
Full compliance stack: SOC 2 Type I & II, ISO/IEC 27001:2022, HIPAA, GDPR, PCI DSS, CCPA, NIST SP 800-171. Proprietary QuantumKey DRM encryption shipped at no additional cost to customers. Fora Soft built the product from the ground up; 2+ year ongoing partnership.
Three architectural paths for shipping a real-time video product. None is universally correct. The right choice is a function of usage volume, customization depth, compliance scope, and whether the platform is the product or supports the product.
A custom build wins when: real-time video is the product; concurrency clears 1,000 simultaneous; the platform needs custom server-side routing logic; the industry is regulated (HIPAA, SOC 2, GDPR); the experience is brand-embedded; the product is multi-tenant SaaS; or the build needs specific media routing (MoQ delivery, multi-region cascade, simulcast tuning by network condition). You own all the IP.
Cost shape: $5K–$40K to build over months, plus $500–$2K per month in operations, depending on usage. Break-even on per-minute economics typically falls around 8M participant-minutes per month. Archetypes: Nucleus, Worldcast Live, BrainCert, StreamLayer.
An off-the-shelf SDK wins when: usage is under 5M participant-minutes per month, call patterns are standard, there is no custom server-side routing, there are no regulatory or data-residency constraints, and there is no in-house engineering capacity to operate multi-region WebRTC. Twilio Video, Agora, LiveKit Cloud, Daily, and Vonage all do this well.
Cost shape: $0.004–$0.06 per participant-minute (Twilio band); the rate is flat, so total cost scales linearly with usage. No upfront build cost. Vendor lock-in. Most SDKs don’t offer BAAs.
A hybrid wins when: the runtime fits an SDK but the differentiator (custom turn-taking, regional routing, voice cloning, embedded experience) needs custom code. This is the most common 2026 pattern in regulated SaaS plays.
Cost shape: SDK subscription + $–$0K for the custom layer + monthly maintenance. Designed to migrate to full custom if usage scales past ~8M participant-minutes.
Cost ranges are 2026-indicative. Implementation specifics — concurrent participant targets, compliance scope, recording requirements, codec mix, multi-region cascade depth — typically dominate the spread within each tier.
Custom WebRTC architecture costs $5K–$40K to build over months. SDK alternatives cost $0.004–$0.06 per participant-minute with vendor lock-in. The break-even point typically falls around 8M participant-minutes per month: below it, SDK wins; above it, custom wins.
An SFU (Selective Forwarding Unit) is the media routing topology used by every production WebRTC group call above ~4 participants. Each peer uploads a single encoded stream to the SFU; the SFU forwards copies selectively to the other peers without decoding or re-encoding. This keeps upload bandwidth at the peer O(1) regardless of room size. SFU vendors include mediasoup, LiveKit, Janus, Pion, ion-sfu, Kurento, and Jitsi Videobridge. Modern SFUs add simulcast and SVC so the SFU can drop quality tiers per receiver based on network conditions.
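Per-receiver quality selection can be sketched as picking the highest simulcast layer that fits a receiver's estimated bandwidth. Everything specific here is an assumption for illustration: the three-layer ladder, its bitrates, the `rid` labels, and the 0.8 headroom factor are hypothetical, and real SFUs use their own bandwidth estimators and switching rules.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Layer:
    rid: str   # simulcast layer label (hypothetical naming)
    kbps: int  # target bitrate of the layer (hypothetical values)


# Hypothetical three-layer ladder: quarter, half, and full resolution.
LADDER = (Layer("q", 150), Layer("h", 500), Layer("f", 1500))


def pick_layer(estimated_kbps: float,
               ladder: Sequence[Layer] = LADDER,
               headroom: float = 0.8) -> Layer:
    """Return the highest-bitrate layer that fits within the receiver's budget."""
    budget = estimated_kbps * headroom  # leave room for audio and estimate error
    fitting = [layer for layer in ladder if layer.kbps <= budget]
    if fitting:
        return max(fitting, key=lambda layer: layer.kbps)
    # Nothing fits: fall back to the lowest layer rather than sending no video.
    return min(ladder, key=lambda layer: layer.kbps)
```

With this ladder, a receiver estimated at 2,000 kb/s gets the full layer, one at 1,000 kb/s gets the half layer, and one at 100 kb/s falls back to the quarter layer.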
P2P (peer-to-peer) connects peers directly via RTCPeerConnection, with no server in the media path. Every peer uploads to every other peer, so per-peer upload grows O(N) and total streams across the room grow O(N²), which works for 1:1 and stops being viable above ~4 participants. SFU routes streams through a server: each peer uploads once, the server forwards. Upload bandwidth at the peer becomes O(1), which lets SFU scale to 10,000+ participants. Most production stacks ship hybrid topology: P2P for 1:1 legs, SFU for groups, selected dynamically per session.
SFU and MCU are both server-side topologies, but they route media differently. SFU forwards streams without transcoding, which keeps latency low (150–300 ms) but consumes bandwidth at the SFU egress. MCU mixes peer streams server-side into a single composite stream, which means higher latency (250–500 ms with transcoding) and lower bandwidth at each peer, but a very compute-heavy server. In 2026, SFU is the default for almost every production WebRTC build. MCU is reserved for legacy SIP interop and broadcast composite output where a single mixed stream is required.
Scale depends on the participant tier. Below 100 participants, a single SFU node handles it. 100–1K, simulcast + SVC gates the publisher count and serves receive-only viewers cheaply. 1K–10K, cascade SFUs across regions or move to a hybrid SFU + LL-HLS architecture. Above 10K, WebRTC becomes the talent layer; LL-HLS or MoQ (Media over QUIC, the emerging 2026 standard) handles broadcast fan-out. The Topology Decision Picker above walks through each tier with vendor stack, latency budget, and failure mode.
It depends on participant count and use case. P2P for 1:1 (sub-250 ms latency, no server). SFU for groups from ~4 up to 10,000 participants (the default in 2026). MCU only for legacy SIP interop or when a single composite output stream is required. Hybrid (P2P + SFU) for products where call patterns vary: consumer apps, telehealth, customer-support video. Broadcast (LL-HLS / MoQ) above 10,000 viewers, with WebRTC remaining the interactive talent layer.
WebRTC uses ICE (Interactive Connectivity Establishment), STUN (Session Traversal Utilities for NAT), and TURN (Traversal Using Relays around NAT). ICE gathers candidate addresses from each peer. STUN punches direct connections through most NATs. TURN relays through a server when STUN cannot — and 18–35% of cellular connections relay through TURN in practice. Production deployments run TURN servers in 3+ regions with anycast routing to minimize relay latency.
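On the server side, this usually comes down to handing each client an `iceServers` list with short-lived TURN credentials. The sketch below uses the shared-secret scheme from the "REST API for access to TURN services" draft, which coturn supports in its `use-auth-secret` mode: the username is `<expiry-timestamp>:<user-id>` and the credential is the base64-encoded HMAC-SHA1 of that username. The hostnames, ports, and function names are placeholders, not values from this article.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple


def turn_credentials(shared_secret: str, user_id: str,
                     ttl_s: int = 3600,
                     now: Optional[float] = None) -> Tuple[str, str]:
    """Build time-limited TURN credentials from a secret shared with the TURN server."""
    expiry = int((time.time() if now is None else now) + ttl_s)
    username = f"{expiry}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


def ice_servers(shared_secret: str, user_id: str,
                now: Optional[float] = None) -> list:
    """Return an iceServers list shaped for the browser's RTCPeerConnection config."""
    username, credential = turn_credentials(shared_secret, user_id, now=now)
    return [
        # STUN: lets the client discover its public address (hypothetical host).
        {"urls": ["stun:stun.example.com:3478"]},
        # TURN: relay fallback over UDP, plus TLS over TCP for restrictive networks.
        {
            "urls": [
                "turn:turn.example.com:3478?transport=udp",
                "turns:turn.example.com:5349?transport=tcp",
            ],
            "username": username,
            "credential": credential,
        },
    ]
```

The signaling server would serialize this list to JSON and the client would pass it as `iceServers` when constructing its `RTCPeerConnection`. Short TTLs limit the damage if a credential leaks.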
Glass-to-glass latency by topology: P2P direct 80–250 ms; SFU region-local 150–300 ms; SFU cross-region 250–500 ms; MCU 250–500 ms with transcoding; LL-HLS broadcast 2–4 seconds; MoQ broadcast 200–500 ms (emerging). For real-time interactive use cases (telehealth, customer-support video, conferencing) the target is sub-300 ms; for interactive broadcast (sports interactive layer) sub-500 ms; for near-real-time broadcast (LL-HLS sports streaming) 2–4 seconds.
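The ranges above can be turned into a quick filter for a latency target. This is a minimal sketch that uses only the figures quoted in this section; treating a topology as qualifying when its upper bound is at or under the target is an assumption, and the names are illustrative.

```python
from typing import List

# Glass-to-glass latency ranges in milliseconds, as quoted in the text.
LATENCY_MS = {
    "P2P direct": (80, 250),
    "SFU region-local": (150, 300),
    "SFU cross-region": (250, 500),
    "MCU": (250, 500),
    "LL-HLS broadcast": (2000, 4000),
    "MoQ broadcast": (200, 500),
}


def topologies_within(target_ms: int) -> List[str]:
    """Topologies whose upper-bound latency stays at or under the target."""
    return [name for name, (_, high) in LATENCY_MS.items() if high <= target_ms]
```

For a 300 ms interactive target this returns P2P direct and region-local SFU; relaxing the target to 500 ms adds cross-region SFU, MCU, and MoQ broadcast.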
All five are production-grade foundations for an SFU tier. Choice depends on team skills and product needs. mediasoup (Node.js-based, lowest-level) for teams that want fine-grained control. LiveKit (Go-based, opinionated, Cloud + self-host) for teams that want the fastest time-to-MVP and the AI-agent framework on top. Janus (C-based, plugin-driven, mature) for legacy SIP / VoIP integration. Pion (Go, a library rather than a server) for teams building custom servers from primitives. Kurento (C++ media server with Java and JavaScript client APIs, mature, used by Nucleus and Worldcast Live) for teams that need a media server with built-in transcoding and recording. Fora Soft has shipped production builds on all five.
Server-side recording, pulled from the SFU media path, never from the client browser (browser MediaRecorder drops participants when tabs close). The SFU emits an event when a publisher joins or leaves; a recording orchestrator spawns an FFmpeg process that subscribes to the SFU as a participant and writes HLS / DASH segments to S3-compatible storage with chunked uploads. Transcoding happens server-side with GPU tiers above 100K minutes / month. Plan for ~1 GB / hour / room at 720p, more for 1080p.
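The ~1 GB / hour / room planning figure translates directly into a storage estimate. The sketch below is back-of-envelope arithmetic only: it assumes decimal gigabytes (10^9 bytes), a constant rate, and the 720p figure from the text as the default; the function names are illustrative.

```python
def recording_storage_gb(rooms: int, hours_per_room: float,
                         gb_per_room_hour: float = 1.0) -> float:
    """Estimated recording storage in GB for a number of rooms over a period."""
    return rooms * hours_per_room * gb_per_room_hour


def implied_bitrate_mbps(gb_per_hour: float) -> float:
    """Average bitrate implied by a GB-per-hour figure (decimal GB, megabits/s)."""
    bits_per_hour = gb_per_hour * 8e9
    return bits_per_hour / 3600 / 1e6
```

For example, 50 rooms recording 20 hours each in a month need about 1,000 GB, and 1 GB per hour corresponds to an average of roughly 2.2 Mb/s per room recording.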
Four layers. (1) Signed BAA with infrastructure providers (AWS, GCP, Azure all offer BAAs; many WebRTC SDKs do not). (2) Encryption in transit (DTLS-SRTP is built into WebRTC) and at rest (encrypted recording storage). (3) Full audit logging of every join, leave, recording access, and admin action. (4) Access controls: RBAC, MFA, automatic session termination. Fora Soft has shipped HIPAA-compliant WebRTC platforms including Nucleus (SOC 2 + GDPR + HIPAA) and BrainCert (full compliance stack). Most SaaS WebRTC SDKs do not offer BAAs, so HIPAA almost always pushes toward custom or self-hosted deployments.
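For the audit-logging layer, one simple shape is an append-only stream of JSON lines, one per audited action. This is a minimal sketch under stated assumptions: the field names and the `audit_line` helper are hypothetical, and a real deployment would add tamper-evident storage, retention policy, and access controls on the log itself.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# The four event types named in the text.
AUDITED_ACTIONS = {"join", "leave", "recording_access", "admin_action"}


@dataclass(frozen=True)
class AuditEvent:
    actor_id: str
    action: str
    room_id: str
    at: str  # ISO 8601 timestamp in UTC


def audit_line(actor_id: str, action: str, room_id: str,
               at: Optional[datetime] = None) -> str:
    """Serialize one audit event as a JSON line, rejecting unknown actions."""
    if action not in AUDITED_ACTIONS:
        raise ValueError(f"unaudited action: {action}")
    timestamp = (at or datetime.now(timezone.utc)).isoformat()
    event = AuditEvent(actor_id, action, room_id, timestamp)
    return json.dumps(asdict(event), sort_keys=True)
```

Each line would be appended to write-once storage, so that every join, leave, recording access, and admin action can be reconstructed later.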
A custom WebRTC platform costs $5K–$40K to build over 1–3 months, then $500–$2K per month to operate, depending on usage. Twilio Video costs $0.004–$0.06 per participant-minute; Agora $0.99 per 1,000 HD participant-minutes; LiveKit Cloud $0.40–$2.00 per agent-hour. Break-even is around 8M participant-minutes per month: below it, an SDK is cheaper; above it, custom wins on per-minute economics. Add regulatory constraints and custom often wins even below the break-even, because most SDKs do not flexibly contract around HIPAA, GDPR EU-only data, or FedRAMP.
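The per-minute side of this comparison is simple arithmetic. The sketch below computes an SDK bill from the quoted rates and the break-even volume for a given custom cost. The custom monthly cost is entirely a caller-supplied assumption (amortized build, operations, and engineering time); the article does not break that total down, and the function names are illustrative.

```python
def sdk_monthly_cost(participant_minutes: int, rate_per_minute: float) -> float:
    """Monthly SDK bill at a flat per-participant-minute rate."""
    return participant_minutes * rate_per_minute


def break_even_minutes(custom_monthly_cost: float, rate_per_minute: float) -> float:
    """Monthly volume at which the SDK bill equals the custom stack's total monthly cost."""
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    return custom_monthly_cost / rate_per_minute
```

At 8M participant-minutes per month, the quoted $0.004–$0.06 band works out to roughly $32,000–$480,000 per month. Read the other way, an 8M-minute break-even at the low end of the band corresponds to an all-in custom cost of about $32,000 per month.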
Media over QUIC (MoQ) is a delivery protocol, currently being standardized at the IETF, for sub-500 ms latency at one-to-many scale. It uses QUIC (the HTTP/3 transport) with publish / subscribe semantics. MoQ is not a WebRTC replacement for interactive group calls; it targets the gap between WebRTC (interactive, sub-300 ms, up to ~10K participants) and HLS (broadcast, 6–30 seconds, millions of viewers). Cloudflare, Twitch, and Meta have shipped early MoQ deployments. For new builds expecting >100K simultaneous viewers, MoQ is the right thing to test in 2026.
Each piece below extends one slice of this — the LiveKit agent runtime, the multimodal cross-cluster, the translation architecture, the e-learning live-class layer, the commercial path to commissioning a build, or the deeper blog dive on P2P vs MCU vs SFU.
If you are scoping a real-time video or audio platform and want a second opinion on the topology, the SFU vendor, the latency budget, the compliance approach, or the build-vs-SDK threshold — write us. A senior engineer with shipped WebRTC platforms in production replies within 24 hours.