WebRTC Delivery: From Peer-to-Peer to Scaled Distribution

Why This Matters

If you ship a product where the viewer must react to what they see in less than a second — live sports betting, interactive game shows, auction floors, telemedicine consultations, e-learning with hand-raise, live shopping, remote surveillance, low-latency drone control — HLS and DASH cannot help you. Even Low-Latency HLS, the most aggressive HTTP-based protocol shipping today, sits between 1.5 and 3 seconds of glass-to-glass latency in production. WebRTC delivery sits between 200 and 500 ms in the same conditions, and it is the only protocol that supports interactive use cases at internet scale. The engineering trade is steep — WebRTC scales linearly with viewers instead of logarithmically with edge caches, the server cost per viewer is 10–100× higher than HLS, and the player and infrastructure complexity are substantial. This article explains the trade so that your product, architecture, and finance teams can pick the right protocol for the right job, and it shows the specific seams — simulcast, SVC, cascading SFUs, WHEP egress, the WebRTC-to-HLS bridge — where the system either earns its cost or wastes it.

What WebRTC was designed for, and what changed

WebRTC was specified between 2011 and 2021 to let two browsers exchange audio, video, and data directly across the public internet with the lowest practical latency. The IETF published the first stable umbrella document, RFC 8825 (Overview: Real-Time Protocols for Browser-Based Applications), in January 2021, alongside the rest of the RFC 8826–8866 family that nails down the security model (RFC 8826), transport (RFC 8835), data channels (RFC 8831), and SDP usage for ICE (RFC 8839); RTCP feedback for congestion control is specified separately in RFC 8888. The W3C webrtc JavaScript API specification reached Candidate Recommendation in 2021 and was last refreshed as a Candidate Recommendation Snapshot on 2024-11-26.

The original use case was a 1:1 audio or video call between two browser tabs. Two peers each gather their network candidates with Interactive Connectivity Establishment (ICE), exchange Session Description Protocol (SDP) offers and answers via an application-specific signalling channel, perform a DTLS handshake to derive shared keys, and from then on send RTP packets encrypted with Secure RTP (SRTP) directly to each other over UDP. End-to-end latency in this setup is dominated by the round-trip time between the two peers — typically 50–200 ms on the public internet — and the encoder/decoder, jitter buffer, and renderer pipelines add another 100–200 ms on top. The total budget is 200–500 ms, which is what every WebRTC vendor still quotes today.

Two structural facts about that design dictate everything that follows. First, every WebRTC session is a real-time stateful relationship between exactly two endpoints — there is no notion of an HTTP cache in the protocol, no segment, no manifest, nothing a CDN can hold and serve to many clients without active media-server software. Second, every WebRTC connection consumes server-side resources for as long as the viewer is watching — CPU for SRTP encryption, memory for the jitter buffer, and a UDP socket that some firewall along the path is willing to keep open. These two facts are what make WebRTC delivery to many viewers fundamentally different — economically and architecturally — from HLS delivery to many viewers.

From two peers to many — the three topologies

When the industry wanted to use WebRTC for one-to-many delivery rather than 1:1 calls, three topologies emerged. Each has a different cost profile, a different latency profile, and a different operational complexity.

Mesh

The first topology is mesh: every participant in the session opens a direct WebRTC connection to every other participant. With N participants, the network carries N × (N−1) / 2 peer connections and N × (N−1) one-way media streams; every publisher uploads N−1 copies of its own stream; every viewer downloads N−1 streams. Mesh has the lowest latency of any topology (one network hop, no media-server processing in the middle) and the lowest server cost (no media server at all, only the signalling and TURN fallback). It also collapses past about 4–6 participants: the uplink at each participant melts under the upload load, and the CPU melts under N−1 concurrent decode pipelines. Mesh is the right architecture for tiny calls (browser-to-browser video chat with two or three peers), and it is the wrong architecture for everything else. Nobody ships mesh for delivery: it is a 1:few topology, not a 1:many topology.
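The mesh arithmetic above can be checked with a short sketch. It counts each peer-to-peer link once and each one-way media stream separately; the function name and the 1,500 kbps example bitrate are illustrative assumptions, not figures from any vendor.

```python
def mesh_load(participants: int, stream_kbps: float) -> dict:
    """Session-wide and per-participant load in a full mesh (illustrative)."""
    if participants < 2:
        raise ValueError("a mesh needs at least two participants")
    peers = participants - 1
    return {
        # Each pair of participants shares one bidirectional peer connection.
        "connections": participants * peers // 2,
        # Every participant sends its stream to every other participant.
        "one_way_streams": participants * peers,
        # Uplink grows linearly with the number of peers.
        "uplink_kbps_per_peer": peers * stream_kbps,
        # Each participant decodes one stream per remote peer.
        "decodes_per_peer": peers,
    }


for n in (2, 4, 6, 10):
    print(n, mesh_load(n, stream_kbps=1500))
```

At ten participants each peer would need a 13.5 Mbps uplink and nine concurrent decoders in this model, which is why mesh stays a 1:few topology.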

MCU — Multipoint Conferencing Unit

The second topology is the Multipoint Conferencing Unit — MCU. An MCU is a media server that takes every participant's audio and video, decodes it, mixes the audio into a single combined track and composes the video into a single combined picture (the classic Brady-Bunch grid), then re-encodes the mix and sends one stream out to every viewer. MCUs were the standard architecture for conferencing in the H.323 / SIP era because they let endpoints with limited CPU receive a single low-bitrate stream. They are almost gone today for two reasons. First, the decode-mix-encode pipeline adds 100–300 ms of latency per hop, which destroys the WebRTC advantage. Second, the CPU cost of decoding every incoming stream and re-encoding the composed output is enormous — a single 1080p mix at 30 fps costs about 1.5–3.0 CPU cores on a modern server, all the time, for one stream. MCUs survive only in tightly controlled corporate conferencing where the decoder-on-the-server model is the only way to reach legacy endpoints; nobody uses them for internet-scale WebRTC delivery.

SFU — Selective Forwarding Unit

The third topology is the Selective Forwarding Unit — SFU. An SFU is a media server that receives every publisher's RTP stream, does not decode it, and forwards the same RTP packets to each subscribed viewer over a separate WebRTC connection. The SFU adds about 20–80 ms of latency per hop — far less than an MCU — and uses an order of magnitude less CPU because it never decodes or re-encodes. The trade is that each viewer must run a full WebRTC decoder, but every modern device (browser, iOS, Android, smart TV with WebView, set-top box) can do that. The SFU is the topology every WebRTC delivery vendor in 2026 uses — mediasoup, Janus, LiveKit Server, Jitsi Videobridge, Pion, ion-sfu, the Cloudflare Stream WebRTC backend, the Dolby OptiView platform, and the AWS IVS Real-Time service all build on it.

The single most important thing to understand about an SFU for delivery is that each viewer's downstream is its own real WebRTC connection. The SFU is not a CDN edge — it cannot serve the same byte stream to many clients in parallel from a cache. Each subscriber maintains a separate DTLS session, a separate set of ICE candidates, a separate jitter-buffer feedback loop, and a separate congestion-control loop. The SFU spends CPU and memory on each one. This is the cost shape we will return to several times in this article.

How the SFU actually scales — simulcast and SVC

A single SFU running on a single machine can typically serve a few hundred to a few thousand viewers, depending on encoding parameters and machine size. Two techniques — simulcast and Scalable Video Coding (SVC) — let one SFU serve a much larger range of viewer bandwidths from a single publisher stream without re-encoding.

Simulcast

In simulcast, the broadcaster encodes the same video at two or three resolutions and bitrates at the same time and sends all of them to the SFU as separate RTP streams in the same WebRTC session. A typical simulcast configuration for live streaming is:

High:     1080p @ 2.5 Mbps
Medium:    540p @ 0.8 Mbps
Low:       180p @ 0.15 Mbps

The SFU then forwards exactly one of those three streams to each subscriber, based on the subscriber's last reported bandwidth and the decoder's reported decoding budget. A viewer on a fibre connection gets the 1080p stream; a viewer on a phone over LTE gets the 540p stream; a viewer on a flaky 3G link gets the 180p stream. The forwarding decision can change for any individual viewer at any time, mid-stream, without re-encoding: the SFU simply switches which of the three streams it puts on that viewer's downstream. This is the WebRTC equivalent of adaptive-bitrate (ABR) selection in HLS or DASH, except it happens server-side, takes effect at the next keyframe of the target rendition, and stays within the same WebRTC session.

The cost is borne by the broadcaster: encoding three versions instead of one consumes roughly 2.0–2.5× the CPU of single-rendition encoding (the encoder shares some early-stage computations across renditions). The benefit is borne by the viewer: every viewer gets the right rendition for their network and device, and the SFU's downstream-bandwidth load to each viewer is matched to what that viewer can actually consume.
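The per-viewer forwarding decision can be sketched as a small selection function. This is a minimal illustration using the three renditions listed above; the `headroom` safety factor and the function and field names are assumptions, and a real SFU would also wait for a keyframe before switching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int
    kbps: int


# The three simulcast renditions from the configuration above.
LADDER = [
    Rendition("high", 1080, 2500),
    Rendition("medium", 540, 800),
    Rendition("low", 180, 150),
]


def pick_rendition(estimated_kbps: float, max_decode_height: int,
                   headroom: float = 0.85) -> Rendition:
    """Pick the highest-bitrate rendition that fits bandwidth and decoder limits.

    headroom is an assumed safety margin so the stream does not fill the
    whole bandwidth estimate.
    """
    usable = estimated_kbps * headroom
    fits = [r for r in LADDER
            if r.kbps <= usable and r.height <= max_decode_height]
    if fits:
        return max(fits, key=lambda r: r.kbps)
    # Nothing fits: fall back to the lowest rendition rather than sending nothing.
    return min(LADDER, key=lambda r: r.kbps)


print(pick_rendition(10_000, 1080).name)  # ample bandwidth
print(pick_rendition(1_200, 1080).name)   # mid-range cellular
print(pick_rendition(100, 1080).name)     # very poor link
```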

SVC — Scalable Video Coding

Scalable Video Coding (SVC) is the more elegant solution. The broadcaster encodes one stream, but the stream is organised into multiple layers: typically three temporal layers (the same picture at 7.5, 15, and 30 fps), and two or three spatial layers (the same picture at 360p, 720p, and 1080p). Each layer is a subset of the stream's packets that the SFU can drop or keep on a per-frame basis. To send a viewer the 360p-at-15-fps version, the SFU forwards the base spatial layer and one temporal-enhancement layer and drops the rest. To upgrade that viewer to 1080p-at-30-fps when their bandwidth recovers, the SFU starts forwarding more layers. The viewer's decoder reconstructs the highest-quality picture from whichever layers arrive.

SVC has two practical advantages over simulcast: the broadcaster only encodes one stream instead of three (lower CPU on the publisher), and the SFU can switch renditions on every frame with no glitch (no need for a new IDR-frame to be available in the destination rendition). The disadvantage is that the codec must support SVC. VP9 has supported SVC since 2014 (Google has used VP9-SVC inside Meet for years). AV1 supports SVC as a first-class feature and is the codec the industry is converging on for WebRTC delivery in 2026. H.264 supports SVC only via the optional Annex G extension, which most encoder hardware does not implement, so H.264-based stacks fall back to simulcast.
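The layer-dropping rule can be sketched as a filter on spatial and temporal layer indices. This sketch assumes a simple dependency structure in which each layer depends only on lower-numbered layers, and it uses the 360p/720p/1080p and 7.5/15/30 fps layers from the text; the function names are illustrative, and real codecs carry these indices in codec-specific RTP metadata that is not modelled here.

```python
from typing import Tuple

# Layer index -> what that layer adds, taken from the example in the text.
SPATIAL = {0: 360, 1: 720, 2: 1080}      # picture height in pixels
TEMPORAL = {0: 7.5, 1: 15.0, 2: 30.0}    # frames per second


def target_layers(max_height: int, max_fps: float) -> Tuple[int, int]:
    """Highest spatial and temporal layer indices a viewer should receive."""
    sid = max((s for s, h in SPATIAL.items() if h <= max_height), default=0)
    tid = max((t for t, f in TEMPORAL.items() if f <= max_fps), default=0)
    return sid, tid


def should_forward(packet_sid: int, packet_tid: int,
                   target: Tuple[int, int]) -> bool:
    """Forward a packet only if it belongs to a layer at or below the target."""
    return packet_sid <= target[0] and packet_tid <= target[1]


viewer = target_layers(max_height=360, max_fps=15)
print(viewer)                        # base spatial layer, one temporal enhancement
print(should_forward(0, 1, viewer))  # kept
print(should_forward(1, 0, viewer))  # dropped: higher spatial layer
```

Raising the target later only changes the tuple; no packet already sent has to be re-encoded.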

The combination of simulcast (or SVC) plus an SFU is what makes WebRTC delivery scale in the width dimension — one publisher feeding heterogeneous viewers. It does not, by itself, scale in the count dimension — one publisher feeding many viewers. That is what cascading is for.

Cascading SFUs — how WebRTC reaches a million viewers

A single SFU on a single machine handles, in well-tuned production deployments, a few hundred to a few thousand concurrent WebRTC subscriptions before either CPU or downstream bandwidth saturates. To serve more viewers than one machine can hold, the industry uses cascading SFUs — a tree of SFUs where the root SFU pulls the broadcaster's stream and forwards it to a small number of child SFUs, each of which fans out to its own set of viewers, and which may themselves have grandchild SFUs in front of them.

The math is what makes this work. If each SFU handles 1,000 downstream viewers, a two-tier tree with 1 root and 100 leaf SFUs handles 100,000 viewers; a three-tier tree with 1 root, 100 mid-tier, and 100 leaves on each mid-tier handles 10 million. The hop count from the broadcaster to the most distant viewer is 3, and each hop adds 20–80 ms of latency, so the total added latency from cascading is about 60–240 ms — small enough that the end-to-end budget stays under 500 ms in most production deployments.
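The fan-out arithmetic can be reproduced in a few lines. This is a sketch under the same assumptions as the paragraph above (1,000 viewers per leaf SFU, 20–80 ms added per SFU hop); the function names are illustrative.

```python
from typing import List, Tuple


def cascade_capacity(viewers_per_leaf: int, fanout_per_tier: List[int]) -> int:
    """Viewers served by a tree whose root fans out through the given tiers."""
    leaves = 1
    for fanout in fanout_per_tier:
        leaves *= fanout
    return leaves * viewers_per_leaf


def cascade_latency_ms(sfu_hops: int,
                       per_hop_ms: Tuple[int, int] = (20, 80)) -> Tuple[int, int]:
    """Added latency range from traversing the given number of SFU hops."""
    return sfu_hops * per_hop_ms[0], sfu_hops * per_hop_ms[1]


print(cascade_capacity(1_000, [100]))        # root + 100 leaves
print(cascade_capacity(1_000, [100, 100]))   # root + 100 mid-tier + 100 leaves each
print(cascade_latency_ms(3))                 # three SFU hops
```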

Cascading has two engineering wrinkles that vendors solve differently. The first is who pulls from whom: every cascading deployment must define how a leaf SFU discovers its parent (so it can request the stream) and how the root SFU knows about all the children (so it does not over-provision). LiveKit's published architecture uses a mesh of SFUs with peer-to-peer routing via Redis; Cloudflare Stream uses Durable Objects on its global network to pin a stream to one origin and have edge points pull on demand; Dolby OptiView Real-time Streaming (formerly Millicast) runs a proprietary CDN-shaped overlay. Each pattern is a working answer to the same problem.

The second wrinkle is failure isolation. If one child SFU dies mid-broadcast, the viewers under it must reconnect to a sibling without disrupting the rest of the tree. The patterns here are well-known from CDN engineering — health checks, fast reroute, viewer-side reconnection with exponential backoff — and every WebRTC delivery vendor implements some version of them.
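The viewer-side reconnection pattern can be sketched as a capped exponential backoff with jitter. The base delay, the cap, and the choice of "full jitter" are assumptions for illustration; real players tune these per deployment. Randomising each delay matters here because every viewer under a failed SFU loses its connection at the same moment.

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base_s: float = 0.25, cap_s: float = 8.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delays (seconds) to wait before each reconnection attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)] so that
    thousands of viewers dropped by the same SFU do not all retry at once.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


print([round(d, 3) for d in backoff_delays(6, rng=random.Random(42))])
```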

The reason the world's largest WebRTC deployments still cap somewhere between a few hundred thousand and a few million concurrent viewers is not the tree shape, which scales arbitrarily, but the cost per viewer and the operational complexity of running that many media servers. We will look at the cost in a moment.

WHEP — the standardised egress for WebRTC delivery

For most of WebRTC's first decade, every vendor invented its own signalling protocol — the side channel that exchanges the SDP offer and answer before the WebRTC connection comes up. Cloudflare's signalling was different from LiveKit's, which was different from Janus's, which was different from mediasoup's, which was different from Dolby's. Every player had to be re-implemented for every vendor's stack. RTMP did not have this problem — the protocol shipped with its own signalling — and the industry knew it was a real adoption blocker for WebRTC delivery.

WHIP — WebRTC-HTTP Ingestion Protocol, IETF RFC 9725, published March 2025 — was the first standardised signalling protocol for WebRTC. It defines a single HTTP POST that carries the publisher's SDP offer; the server replies with the SDP answer in the body; the WebRTC connection comes up. WHIP solves the publisher side of the signalling problem — it is the protocol an encoder uses to push a stream into a server.

WHEP — WebRTC-HTTP Egress Protocol — solves the viewer side. The latest revision at time of writing is draft-ietf-wish-whep-03, published March 2026. The protocol mirrors WHIP exactly: one HTTP POST from the viewer carries the SDP offer; the server replies with the SDP answer; the WebRTC connection comes up; the SFU starts forwarding RTP. Once the connection is established, WHEP is out of the loop — transport is plain WebRTC. WHEP is signalling only. The protocol is currently an Internet-Draft, subject to revision, but the design is stable and every major vendor (Cloudflare Stream, Dolby OptiView, Mux Live, LiveKit, Ant Media, OvenMediaEngine) has shipped a production WHEP endpoint.
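The single POST described above can be sketched with the Python standard library. This only builds the request and does not send it; the endpoint URL, the bearer token, and the truncated SDP string are placeholders, and the `application/sdp` content type follows the WHIP convention that WHEP mirrors. In a real player the SDP offer would come from the WebRTC stack.

```python
import urllib.request
from typing import Optional


def build_whep_request(endpoint_url: str, sdp_offer: str,
                       bearer_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST that carries the viewer's SDP offer."""
    request = urllib.request.Request(
        endpoint_url,
        data=sdp_offer.encode("utf-8"),
        method="POST",
    )
    # The offer travels as raw SDP, not JSON.
    request.add_header("Content-Type", "application/sdp")
    if bearer_token:
        # Authentication is deployment-specific; a bearer token is one common choice.
        request.add_header("Authorization", f"Bearer {bearer_token}")
    return request


# Hypothetical endpoint and placeholder SDP, for illustration only.
req = build_whep_request("https://example.com/whep/my-stream", "v=0\r\n", "secret")
print(req.get_method(), req.full_url)
```

In the WHIP pattern the server answers with 201 Created, the SDP answer in the body, and a Location header naming the session resource; sending that request and applying the answer to the peer connection is left out of this sketch.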

WHEP is the reason a WebRTC delivery player in 2026 can switch between back-ends almost as easily as an HLS player switches between origins. A single open-source player (the WHEP-compatible portions of hls.js, the LiveKit RoomConnection SDK in WHEP mode, OvenPlayer, THEOplayer's WebRTC mode) can play from any conformant WHEP server. This is the standardisation that finally unlocks WebRTC for delivery at the same player-portability level the HTTP-based protocols have enjoyed since 2010.

[Image: end-to-end WebRTC delivery architecture. The broadcaster pushes one stream to an ingest SFU via WHIP; a cascading tree of regional SFUs fans out; each leaf SFU serves thousands of viewers over separate WebRTC connections negotiated by WHEP; a parallel WebRTC-to-HLS bridge feeds a CDN for the broadcast-tail audience.]

Figure 1. A production WebRTC delivery architecture in 2026. The broadcaster encodes once and ingests with WHIP to a regional origin SFU. The origin replicates the RTP stream to a tier of regional SFUs over a private backbone; each regional SFU fans out to a tier of edge SFUs, which serve viewers over WebRTC negotiated by WHEP. A side branch decodes the RTP stream once and re-packages it as LL-HLS for the broadcast-tail audience that does not need sub-second latency. The same architecture serves both the 200–500 ms interactive layer and the 2–6 second mass-broadcast layer from a single ingest.

The latency budget — where the 200–500 ms goes

The headline latency number for WebRTC delivery is 200–500 ms glass-to-glass. Walking through where each piece of that budget goes makes it concrete and shows where vendors are still finding wins.

Take a representative configuration: a broadcaster in Berlin streaming to a viewer in San Francisco via a cascading SFU tree with one root SFU in Frankfurt, one mid-tier SFU in Washington, and one edge SFU in San Jose.

Capture and encode (broadcaster):              30–60 ms
Network: Berlin → Frankfurt (RTP):              5–15 ms
SFU forwarding (Frankfurt):                    10–30 ms
Network: Frankfurt → Washington (private):     90–110 ms
SFU forwarding (Washington):                   10–30 ms
Network: Washington → San Jose (private):      55–70 ms
SFU forwarding (San Jose):                     10–30 ms
Network: San Jose → viewer (public, last mile): 5–25 ms
Decode and render (viewer):                    30–60 ms
Jitter buffer (variable):                      30–80 ms
                                              -----------
Total:                                        275–510 ms
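The totals in the budget can be re-derived by summing the low and high ends of each stage. The sketch below copies the figures from the table; the function names are illustrative.

```python
# (stage, low_ms, high_ms), copied from the budget table above.
BUDGET_MS = [
    ("Capture and encode (broadcaster)", 30, 60),
    ("Network: Berlin -> Frankfurt", 5, 15),
    ("SFU forwarding (Frankfurt)", 10, 30),
    ("Network: Frankfurt -> Washington", 90, 110),
    ("SFU forwarding (Washington)", 10, 30),
    ("Network: Washington -> San Jose", 55, 70),
    ("SFU forwarding (San Jose)", 10, 30),
    ("Network: San Jose -> viewer", 5, 25),
    ("Decode and render (viewer)", 30, 60),
    ("Jitter buffer", 30, 80),
]


def total_budget(stages):
    """Sum the best-case and worst-case latency across all stages."""
    return sum(low for _, low, _ in stages), sum(high for _, _, high in stages)


def largest_stage(stages):
    """Name of the stage with the largest worst-case contribution."""
    return max(stages, key=lambda stage: stage[2])[0]


low, high = total_budget(BUDGET_MS)
print(f"Total: {low}-{high} ms; largest stage: {largest_stage(BUDGET_MS)}")
```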

Two budgets dominate. The first is the long inter-region network leg between Frankfurt and Washington — this is physics, fibre-optic transit speed across the Atlantic, and there is no protocol on earth that beats it without putting a satellite in the middle. Every WebRTC delivery vendor mitigates it by running their backbone on private peering with major IXPs, not the public internet — that is what shaves 30–50 ms off the same path compared to a default route.

The second budget is the jitter buffer at the viewer, which the WebRTC stack adjusts dynamically based on how much loss and reordering it sees on the last-mile path. On a clean Wi-Fi connection, the jitter buffer can run as low as 30 ms; on flaky cellular, the WebRTC stack will pull it up to 80 ms or more to avoid frame freezes. The trade is brutal: every 10 ms of jitter buffer is 10 ms of extra latency that the viewer cannot recover from later in the path.

The encoder and decoder paths can be shrunk by giving the broadcaster a hardware encoder (NVENC, QuickSync, VideoToolbox), but they cannot be eliminated. The SFU forwarding latency is roughly the time to receive a UDP packet, run it through the RTP header rewrites that cascading requires, and send it out again — modern SFUs running on bare metal sit at the low end of the 10–30 ms range; SFUs running in a busy Kubernetes pod sit at the high end.

The end-to-end number is between 275 and 510 ms in this example, depending on conditions. For the same broadcaster reaching the same viewer over LL-HLS, the comparable budget — including a 2-second target buffer at the player — is between 1,500 and 3,000 ms. The WebRTC delivery path is between 3× and 10× faster end-to-end. That is what people are buying.

The cost — why WebRTC delivery is the expensive option

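WebRTC delivery costs more than HLS delivery, per viewer-hour, by a wide margin: per-viewer server compute differs by roughly two orders of magnitude, and the all-in bill in the worked example below comes out about 7× higher. This is a function of how the two protocols use server resources, not a function of vendor pricing. Two specific numbers explain the gap.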

The first is server compute per viewer. An HLS edge serving cached segments to a viewer consumes effectively zero compute per viewer once the segment is in cache — the edge is doing what an HTTP cache does, which is what HTTP caches were designed for. An SFU serving a WebRTC viewer maintains a DTLS session, runs SRTP encryption on every outgoing RTP packet, processes RTCP feedback, runs a congestion-control loop, and re-routes packets through the cascading tree. The CPU cost per concurrent viewer on a typical SFU is roughly 1–3 mCPU continuous (out of 1,000 mCPU = one full core) — a single machine with 32 cores serves on the order of 10,000–30,000 concurrent viewers. The same machine acting as an HLS edge serves several million.
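The per-machine capacity figure follows directly from the per-viewer CPU cost. A minimal sketch, assuming the 1–3 mCPU range quoted above and no headroom for spikes (a production deployment would reserve some); the function names are illustrative.

```python
import math


def viewers_per_machine(cores: int, mcpu_per_viewer: float) -> int:
    """Concurrent viewers one machine can hold if CPU is the only limit."""
    return int(cores * 1000 / mcpu_per_viewer)


def machines_needed(viewers: int, cores: int, mcpu_per_viewer: float) -> int:
    """Machines required for an audience, rounding up to whole machines."""
    return math.ceil(viewers / viewers_per_machine(cores, mcpu_per_viewer))


for mcpu in (1, 3):
    print(mcpu, viewers_per_machine(32, mcpu), machines_needed(100_000, 32, mcpu))
```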

The second is egress bandwidth from a private network. An HLS or LL-HLS edge serves bytes from a CDN's private edge into the last mile. The CDN's wholesale cost is dominated by transit and peering — typically $0.01–$0.05 per GB at the volumes the major streaming services run at. A WebRTC SFU egress also pushes bytes into the last mile, but because the SFU needs to maintain a real-time UDP connection to each viewer through their NAT, the egress runs through capacity provisioned for low jitter and low loss, which costs more — typically $0.05–$0.15 per GB. The bandwidth that cascades between SFUs runs on the same private backbone as a CDN's, so that part is no more expensive than HLS.

The compounded effect on a representative live event — say, 100,000 concurrent viewers for a 1-hour broadcast at an average of 1.5 Mbps per viewer — looks like this:

LL-HLS path:
  Total bytes:     100,000 × 1.5 Mbit/s × 3,600 s / 8 = 67.5 TB
  Edge egress:     67.5 TB × $0.02/GB                 = $1,350
  Origin compute:  Negligible (cache hit ratio > 99%)
                                                       --------
  Total:                                                $1,350

WebRTC delivery path:
  Total bytes:     67.5 TB                             same
  Edge egress:     67.5 TB × $0.10/GB                 = $6,750
  SFU compute:     100,000 viewers × 1 hour × ~$0.0005/min × 60 = $3,000
                                                       --------
  Total:                                                $9,750
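The two totals above can be reproduced with a small cost model, which also makes it easy to try other audience sizes. The rates are the ones used in the worked example ($0.02/GB, $0.10/GB, about $0.0005 per viewer-minute of SFU compute); the function names are illustrative, and the model ignores origin compute, TURN, and the bridge.

```python
def total_gb(viewers: int, mbps_per_viewer: float, hours: float) -> float:
    """Total delivered volume in decimal gigabytes."""
    bits = viewers * mbps_per_viewer * 1e6 * hours * 3600
    return bits / 8 / 1e9


def ll_hls_cost(viewers: int, mbps: float, hours: float,
                usd_per_gb: float = 0.02) -> float:
    """Edge egress only; origin compute is treated as negligible."""
    return total_gb(viewers, mbps, hours) * usd_per_gb


def webrtc_cost(viewers: int, mbps: float, hours: float,
                usd_per_gb: float = 0.10,
                usd_per_viewer_minute: float = 0.0005) -> float:
    """Edge egress plus per-viewer SFU compute."""
    egress = total_gb(viewers, mbps, hours) * usd_per_gb
    compute = viewers * hours * 60 * usd_per_viewer_minute
    return egress + compute


for audience in (1_000, 100_000):
    hls = ll_hls_cost(audience, 1.5, 1)
    rtc = webrtc_cost(audience, 1.5, 1)
    print(f"{audience:>7} viewers: LL-HLS ${hls:,.2f}  WebRTC ${rtc:,.2f}  ratio {rtc / hls:.1f}x")
```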

WebRTC delivery costs roughly 7× more than LL-HLS for the same audience at this scale. The number gets worse as the audience grows, because LL-HLS continues to scale on edge caching while WebRTC compute scales linearly, and it gets better as the audience shrinks (a 1,000-viewer interactive show pays under $100 on either path at these rates, in which case the WebRTC latency advantage is essentially free). The break-even point in most production stacks sits around 5,000–10,000 concurrent viewers: below that, WebRTC's cost penalty is small enough to be worth paying for the latency win; above that, vendors usually mix WebRTC (for the interactive tier) with LL-HLS or DASH (for the broadcast tail).

The hybrid stack — WebRTC for the front row, LL-HLS for the back

The architectural pattern every serious WebRTC delivery platform ships in 2026 is the hybrid stack: WebRTC delivery for the interactive top layer of the audience, LL-HLS or DASH for the broadcast tail. The picture is roughly this:

                        +-----------------+
                        |   Broadcaster   |
                        +--------+--------+
                                 |  WHIP / RTMP
                                 v
                        +--------+--------+
                        |   Origin SFU    |
                        +--+---------+----+
                           |         |
              WebRTC RTP   |         |  RTP → decode → HLS pkg
                           v         v
              +------------+--+   +--+--------------+
              |  Cascading   |   | WebRTC → HLS    |
              |  SFU tree    |   |   bridge        |
              +------+-------+   +-------+---------+
                     |                   |
            WebRTC + WHEP            LL-HLS + CDN
                     |                   |
                     v                   v
              [ 5K viewers ]       [ 5M viewers ]
              (200–500 ms)         (1.5–3 sec)

The bridge piece — a server that takes the WebRTC RTP stream, decodes it once, packages it as CMAF, and emits HLS or DASH — runs once per stream, not once per viewer. The decode-once cost is real (1.0–2.5 CPU cores per rendition continuously while the bridge runs), but the output then scales with HTTP edge caching like any other LL-HLS stream. The viewer's experience differs by tier: viewers who picked "Interactive" or who triggered an interactive feature like chat or hand-raise get routed to the WebRTC tier and pay for sub-500 ms latency in the form of higher infrastructure cost; viewers in the broadcast tail get LL-HLS at much lower cost per viewer, with 2–3 second latency, which is good enough to follow the show.

This is the architecture Cloudflare Stream, AWS IVS Real-Time + IVS Channel, LiveKit (via its multistreaming-to-RTMP feature), and Dolby OptiView all ship. The dividing line between tiers is product, not engineering — every vendor surface lets you decide whether a given viewer joins the WebRTC tier or the broadcast tier, often based on whether the user has clicked anything that needs interactivity.
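The tier decision itself is usually a few lines of product logic. The sketch below is one hypothetical rule, not any vendor's API: a viewer joins the WebRTC tier only if they need interactivity and the interactive tier still has room, and otherwise falls back to LL-HLS. The `webrtc_seats_free` cap is an assumed cost-control knob.

```python
def assign_tier(needs_interactivity: bool, webrtc_seats_free: int) -> str:
    """Route a viewer to the interactive or the broadcast tier (hypothetical rule)."""
    if needs_interactivity and webrtc_seats_free > 0:
        return "webrtc"
    # Passive viewers, and interactive viewers beyond the cap, get the cheaper tier.
    return "ll-hls"


print(assign_tier(True, 120))    # clicked hand-raise, seats available
print(assign_tier(False, 120))   # just watching
print(assign_tier(True, 0))      # interactive tier is full
```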

What WebRTC delivery does not do well

Three known limitations make WebRTC delivery the wrong tool for some workloads even when its latency advantage looks attractive.

Smart TV and set-top box reach is uneven. Browsers handle WebRTC natively. iOS and Android handle it natively. The 2026 reality on smart TVs is patchier: Samsung Tizen, LG webOS, and Vidaa all ship WebRTC stacks in their built-in browsers, but the implementations are 2–3 years behind Chrome and frequently miss the codec profiles the SFU is sending. Roku BrightScript has no WebRTC stack at all. Apple TV's native player does not speak WebRTC, and tvOS does not expose WKWebView to apps, so you need a custom tvOS app built on a native WebRTC SDK. If your audience is dominated by smart-TV and set-top viewers, WebRTC delivery is a much harder fit than LL-HLS, which every smart TV supports.

DRM is a hard problem. Common Encryption with cbcs is the standard DRM mode for HLS and DASH (and we covered it in detail in the CMAF article and the Common Encryption article). WebRTC does not have a native equivalent — SRTP encryption protects the media in transit between the SFU and the viewer's WebRTC stack, but once the stack hands the frame to the renderer, the bytes are decrypted in browser memory and are accessible to a determined attacker. Vendors who need DRM-equivalent protection on WebRTC delivery either re-encrypt at the application layer (which the player must support, and which is non-standard) or accept that the protocol is wrong for high-value content. Most production WebRTC delivery deployments serve user-generated or interactive content rather than premium licensed catalogue.

Captions, multi-audio, and trick play are weak. HLS and DASH have a mature stack for caption rendering (WebVTT, IMSC), multiple audio renditions (Dolby Atmos, multilingual), and trick-play modes (seek, scrub, DVR). WebRTC's model is real-time-by-design: there is no manifest to enumerate alternates, no segment structure to seek inside, no spec for caption frames. Vendors patch this with custom data-channel streams for captions, which the player must handle out-of-band, and they patch the multi-audio problem with multiple RTP audio streams in the same WebRTC session, which works but is fiddly. If your product needs broadcast-grade accessibility and a feature-rich viewer experience, LL-HLS or DASH is the cheaper engineering path.

The full feature-by-feature comparison is the subject of the protocol comparison matrix article; the short version is that WebRTC delivery is purpose-built for sub-second interactive use cases and is consistently the wrong choice for everything else.

Figure 2. The 2026 delivery-protocol landscape on a latency-versus-scale plane. WebRTC delivery occupies the sub-second corner up to roughly a million concurrent viewers; LL-HLS and LL-DASH dominate from 2-second latency upward at billion-viewer scale; HESP slots between them with a 400 ms target at lower scale; Media over QUIC is the emerging protocol that aims to combine WebRTC-like latency with HLS-like fan-out economics.

A short tour of the 2026 vendor landscape

The platforms shipping WebRTC delivery in 2026 cluster into three groups.

The cloud platforms. Cloudflare Stream, AWS IVS Real-Time (Stages), and Mux Live each ship managed WebRTC delivery with WHIP ingest, WHEP egress, and a hybrid LL-HLS tail. Cloudflare's pitch is the largest edge footprint (300+ cities, points-of-presence within 50 ms of 95% of the internet's users), Workers and Durable Objects for routing logic, and aggressive sub-second latency in browser. AWS IVS Real-Time uses the Twitch-derived stack and emphasises interactive Stages with up to 12 hosts and "beyond 25,000" concurrent viewers per stage. Mux Live focuses on developer ergonomics and ships open standards (WHIP / WHEP) by default.

The specialist real-time platforms. Dolby OptiView Real-time Streaming (formerly Millicast / Dolby.io) is the oldest pure-play WebRTC delivery service, advertising 500 ms latency to audiences of 60,000+. LiveKit is the leading open-source platform, with a horizontally-scaling SFU mesh and millions-of-viewers claims on its Cloud service. Both Dolby and LiveKit publish substantial engineering content on how their cascading-SFU stacks are built.

The self-hosted stacks. Ant Media Server, OvenMediaEngine, Janus Gateway, mediasoup, Pion, and the older Jitsi Videobridge all let teams run their own SFU. The trade is that the operator is responsible for cascading, NAT traversal, TURN provisioning, monitoring, and scaling — which is a real engineering investment, but it is the right choice for teams that need the data-residency, compliance, or cost control that a managed service does not give.

The boundary between these groups is not sharp — LiveKit Cloud is a managed offering on top of an open stack, Cloudflare Stream uses pieces that look like an open SFU under proprietary routing, and Ant Media offers both self-hosted and cloud SKUs. What matters for an architecture choice is not the vendor name but the answer to a small list of questions: Is the latency target really sub-second, or would 2–3 seconds do? How many concurrent viewers, peak? Do they need DRM? Do they need smart-TV reach? Are they willing to operate WebRTC infrastructure, or do they want a managed service? Those answers route the design into one of the three groups in less than an hour.

Common mistakes (the "we wish we'd known" list)

Treating WebRTC like a CDN. The most expensive mistake is treating a WebRTC SFU as if it were a CDN edge — provisioning for "viewer count × average bitrate" instead of "viewer count × per-connection CPU". A 100k-viewer event in LL-HLS sits on the edge cache at near-zero compute cost. The same 100k-viewer event over WebRTC needs roughly 3–10 SFU machines actively running, not just bandwidth. Teams that draw the architecture as if the SFU is a cache discover the cost gap on the bill.

Assuming the broadcaster can simulcast. Simulcast multiplies the publisher's encode cost. A mid-range laptop encoding 1080p H.264 with a software encoder can handle one rendition; ask it for three simulcast renditions and the CPU saturates and the framerate drops to single digits. Production WebRTC delivery either runs simulcast on the broadcaster's hardware encoder (NVENC on a desktop, or a hardware capture card), or it runs SVC with AV1 (which is one stream and lighter on the publisher), or it does the multi-rendition split server-side, which costs CPU at the SFU instead.

Skipping TURN. WebRTC's UDP-first design works great until the viewer is on a corporate network that blocks UDP entirely. Without a TURN relay configured (ideally one reachable over TCP or TLS on port 443), those viewers fail to connect with no useful error message. Every production stack needs TURN provisioned, sized for the percentage of viewers that fall back to it (typically 10–20% of corporate viewers, 1–3% of consumer viewers), and monitored separately. TURN bandwidth is also more expensive than direct WebRTC bandwidth because relayed media crosses the TURN server's egress in addition to the SFU's; budget accordingly. The full mechanics live in the NAT, STUN, TURN, ICE article.
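TURN sizing is a one-line multiplication, but it is worth writing down so the fallback percentage is an explicit input. The sketch below uses the fallback ranges quoted above and an assumed 1.5 Mbps average bitrate; the function name is illustrative.

```python
def turn_relay_mbps(viewers: int, fallback_fraction: float,
                    mbps_per_viewer: float) -> float:
    """Aggregate bandwidth the TURN tier must relay at peak."""
    if not 0.0 <= fallback_fraction <= 1.0:
        raise ValueError("fallback_fraction must be between 0 and 1")
    return viewers * fallback_fraction * mbps_per_viewer


# 100,000 consumer viewers at 1-3% fallback, then 5,000 corporate viewers at 10-20%.
print(turn_relay_mbps(100_000, 0.01, 1.5), turn_relay_mbps(100_000, 0.03, 1.5))
print(turn_relay_mbps(5_000, 0.10, 1.5), turn_relay_mbps(5_000, 0.20, 1.5))
```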

Picking codecs the SFU cannot route. Not every SFU supports every codec. Most SFUs handle H.264 baseline and main profile and VP8 by default; H.265 / HEVC is rarely supported in WebRTC SFUs because of the patent licensing; AV1 is supported in modern SFUs (LiveKit, mediasoup as of 2024, Cloudflare Stream) but is not yet universal. Picking a codec that the viewer's browser supports but the SFU cannot route results in the SFU dropping the stream entirely. Test the codec-and-rendition matrix end-to-end before shipping.
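One way to catch this before shipping is to compute the codec intersection for every publisher, SFU, and viewer combination in the test matrix. The sketch below is illustrative: the codec lists are made-up examples, not the capabilities of any particular product.

```python
from typing import List, Set


def routable_codecs(publisher: List[str], sfu: Set[str],
                    viewer: Set[str]) -> List[str]:
    """Codecs all three parties support, in the publisher's preference order."""
    return [codec for codec in publisher if codec in sfu and codec in viewer]


# Example capability sets (assumptions for illustration only).
publisher_prefs = ["AV1", "H.265", "H.264", "VP8"]
sfu_support = {"H.264", "VP8", "AV1"}
old_tv_browser = {"H.264", "VP8"}
new_desktop_browser = {"AV1", "H.264", "VP8", "VP9"}

print(routable_codecs(publisher_prefs, sfu_support, new_desktop_browser))
print(routable_codecs(publisher_prefs, sfu_support, old_tv_browser))
print(routable_codecs(["H.265"], sfu_support, new_desktop_browser))  # nothing to route
```

An empty result for any cell of the matrix is the failure described above, found in a test run rather than in production.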

Underestimating the player work. A WebRTC delivery player is not a drop-in replacement for an HLS player. It must speak WHEP, handle the SDP exchange, manage the ICE candidates, recover from network interruptions in real time (not just buffer-and-retry like an HLS player), and render frames the moment they arrive rather than after a buffer fills. A team that has only ever shipped HLS players underestimates this work by a factor of two or three. The cleanest pattern in 2026 is to use a vendor SDK (LiveKit, Cloudflare Stream WebRTC SDK, Dolby OptiView SDK) for the WebRTC layer and integrate it with the rest of the player UI rather than building from RTCPeerConnection upward.
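The real-time recovery requirement is a good illustration of the extra work. The sketch below reduces it to a pure decision function over the standard ICE connection states; the action names and the 2-second grace period are assumptions for illustration, and a production player (or a vendor SDK) handles many more cases.

```typescript
// The ICE connection states a browser peer connection can report.
type IceState = "new" | "checking" | "connected" | "completed" | "disconnected" | "failed" | "closed";

// Hypothetical actions a player could take in response.
type RecoveryAction = "none" | "wait" | "restart-ice" | "reconnect";

function recoveryAction(state: IceState, msInState: number, graceMs: number = 2000): RecoveryAction {
  switch (state) {
    case "new":
    case "checking":
    case "connected":
    case "completed":
      return "none";
    case "disconnected":
      // Short blips often heal on their own; past the grace period, act.
      return msInState < graceMs ? "wait" : "restart-ice";
    case "failed":
      return "restart-ice";
    case "closed":
      // The connection is gone; start a fresh session.
      return "reconnect";
  }
}

console.log(recoveryAction("disconnected", 500), recoveryAction("disconnected", 2500)); // wait restart-ice
```

In a browser this would be driven from the `iceconnectionstatechange` event, with "restart-ice" leading to `pc.restartIce()` plus a new offer sent to the server, and "reconnect" leading to a fresh WHEP request. An HLS player has no equivalent of this loop, which is part of why the effort is underestimated.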

Where Fora Soft fits in

Fora Soft has shipped WebRTC-based real-time video systems since 2011, across all five of the verticals where sub-second latency actually changes the product: video conferencing (the original WebRTC use case), live broadcast with interactive layers (game shows, auctions, live shopping), telemedicine (clinician-patient consultations where the delay must feel like a phone call), e-learning (live classes with hand-raise and shared whiteboard), and video surveillance (operator-to-camera control where every second of lag is a missed event). We have built on mediasoup, LiveKit, Janus, and proprietary SFUs in different projects; we have shipped WHIP and WHEP endpoints; and we have wired the WebRTC-to-LL-HLS bridge for production hybrid stacks. The architecture choices in this article are the ones we make in scoping calls every week.

Call to action

Talk to a streaming engineer: book a 30-minute scoping call to talk through your WebRTC streaming plan.
See our case studies: 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
Download the WebRTC Delivery Decision Worksheet (2026): a one-page printable worksheet for the seven architecture questions every team should answer before committing to WebRTC delivery: latency target, peak concurrent viewers, device coverage, DRM requirement, hybrid-stack tolerance,….

References

IETF RFC 8825 — Overview: Real-Time Protocols for Browser-Based Applications. H. Alvestrand, January 2021. The umbrella applicability statement for the WebRTC protocol family. <https://datatracker.ietf.org/doc/html/rfc8825>
IETF RFC 9725 — WebRTC-HTTP Ingestion Protocol (WHIP). S. Garcia Murillo, A. Gouaillard, March 2025. Standards Track. The first standardised WebRTC signalling protocol. <https://www.rfc-editor.org/rfc/rfc9725.html>
IETF draft-ietf-wish-whep-03 — WebRTC-HTTP Egress Protocol (WHEP). S. Garcia Murillo, C. Chen, March 2026. Internet-Draft — subject to revision before RFC publication. The viewer-side mirror of WHIP. <https://datatracker.ietf.org/doc/html/draft-ietf-wish-whep-03>
W3C webrtc Candidate Recommendation Snapshot — WebRTC: Real-Time Communication in Browsers. C. Jennings, F. Castelli, H. Boström, J.-I. Bruaroey. Snapshot 2024-11-26. The JavaScript API specification. <https://www.w3.org/TR/webrtc/>
IETF RFC 3550 — RTP: A Transport Protocol for Real-Time Applications. H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, July 2003. The base RTP protocol every SFU forwards. <https://www.rfc-editor.org/rfc/rfc3550>
IETF RFC 5764 — DTLS Extension to Establish Keys for the Secure Real-time Transport Protocol (SRTP). D. McGrew, E. Rescorla, May 2010. The DTLS-SRTP key agreement that protects every WebRTC media leg. <https://www.rfc-editor.org/rfc/rfc5764>
Cloudflare blog (2022-09) — WebRTC live streaming to unlimited viewers, with sub-second latency. The original architectural description of Cloudflare Stream's WebRTC delivery, including Workers, Durable Objects, and WHIP/WHEP. <https://blog.cloudflare.com/webrtc-whip-whep-cloudflare-stream/>
Cloudflare Stream documentation — WebRTC beta. The current production documentation for Cloudflare Stream's WHIP/WHEP endpoints. <https://developers.cloudflare.com/stream/webrtc-beta/>
LiveKit engineering blog — How we built a globally distributed mesh network to scale WebRTC. The published architecture of LiveKit Cloud's cascading SFU topology, peer-to-peer routing via Redis, and sub-100 ms backbone claims. <https://blog.livekit.io/scaling-webrtc-with-distributed-mesh/>
Dolby OptiView documentation — Real-time Streaming overview. The architecture of the Dolby OptiView (formerly Millicast / Dolby.io) cascading WebRTC CDN, including the 500 ms / 60k viewers claim. <https://optiview.dolby.com/docs/millicast/>
AWS IVS Real-Time Streaming developer guide — What is Amazon IVS Real-Time Streaming? The architecture of AWS IVS Stages, including the 12-host and 25k-viewer ceilings and the Composition + Channel hybrid stack. <https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/what-is.html>
mediasoup documentation — The open-source SFU's architecture, simulcast and SVC support, and production tuning notes referenced for the 500–800 video-participants-per-node figure. <https://mediasoup.org/documentation/>
Bitmovin Video Developer Report 2025/26 — The annual industry survey covering protocol mix, WebRTC adoption, and codec deployment across 700+ surveyed streaming engineers. <https://bitmovin.com/video-developer-report/>

Where sources conflict, this article follows the standards documents (tier 1) and notes where a claim rests on lower-tier sources (vendor blogs, Internet-Drafts). WHEP is currently an Internet-Draft (draft-ietf-wish-whep-03); the protocol design is stable across vendor implementations, but the document is subject to change before RFC publication and should be re-verified against the latest draft at time of build.

Related glossary terms

WebRTC delivery (egress)

mediasoup

Cache hit ratio (CHR)

Trick play

Jitsi Videobridge

Congestion control

Live streaming

BrightScript