
Key takeaways
• WebRTC wins for sub-second interactive streams. Achieves 0.2–0.5s glass-to-glass latency but plateaus at 100–500 concurrent viewers per SFU instance without cascading architecture.
• HLS and LL-HLS win for scale. HLS ships to 100K+ concurrent viewers at $0.0008–0.005/GB with CDN caching; LL-HLS gets you sub-5s latency without SFU overhead.
• The hybrid pattern (WebRTC speakers + HLS/LL-HLS viewers) is the modern standard. Fora Soft deployed this for Worldcast Live (10K HD viewers at 0.4–0.5s latency with speaker groups at 0.1s RTT).
• Latency targets vary by use case. Live sports, auctions, and gameplay need <1s; live shopping and events tolerate 2–5s; recorded content accepts 6–30s.
• Cost scales with audience, not protocol. Pick WebRTC for small interactive groups (<500 viewers); LL-HLS for mid-scale events (500–10K); classic HLS for broadcasts (10K+).
Why Fora Soft wrote this guide
We’ve shipped two dozen video streaming platforms in the last five years. Worldcast Live, our HD concert-streaming platform, handles 10,000 concurrent viewers with sub-second latency (0.4–0.5s glass-to-glass) using a hybrid WebRTC-SFU-to-LL-HLS architecture. Sprii, a live-shopping app, scales to 50K concurrent viewers over LL-HLS at 3s latency. Ariuum uses WebRTC for sub-second debate interactivity with 500 concurrent speakers and fallback HLS for scale.
This guide reframes the WebRTC vs HLS question. The real answer isn’t “pick one.” It’s “understand your latency target, your concurrent viewer ceiling, and your cost tolerance.” Then build a hybrid or pick the single protocol that fits. We’ll walk you through the benchmarks, the architecture pitfalls, and the decision framework we actually use.
Building a multi-thousand-viewer live platform?
We’ll architect the hybrid for you—WebRTC where you need interactivity, LL-HLS where you need scale.
WebRTC vs HLS in 2026—The TL;DR
Choose WebRTC (with a media server like mediasoup or Janus) when you need sub-second latency and your audience is small—under 500 concurrent viewers. Think interactive gaming, live auctions, real-time financial trading, or panel debates where speaker-to-speaker latency matters.
Choose HLS or LL-HLS when you need massive scale and can tolerate 2–30 seconds of latency. Broadcasting to 10K+ concurrent viewers on a budget. Live sports, product launches, educational streams, live-shopping feeds.
Choose hybrid (WebRTC for speakers + LL-HLS/HLS for viewers) when you need both. This is what Worldcast Live and Ariuum run; Sprii pairs LL-HLS video with a separate WebSocket channel for product sync. Speakers enjoy sub-second RTT among themselves; viewers get a stable 2–5s stream with no SFU memory overhead. Most modern live platforms ship this way in 2026.
Reach for pure WebRTC when: your audience is under 500 concurrent viewers, latency under 500ms is a hard requirement, and your budget can absorb SFU compute costs (typically $50–500/hour for a mid-scale SFU cluster on Hetzner or AWS).
What latency really means—Glass-to-glass vs RTT
You’ll hear “WebRTC is faster” and “HLS has 6-second latency.” These statements mix three different latency types, and getting them confused breaks decisions.
Glass-to-glass (G2G)
The time from when a camera captures something to when a viewer sees it on their screen. For a live concert: camera captures a performer’s guitar solo, encodes it, ships it over the network, decodes it, and renders it on a viewer’s screen. Measured end-to-end. This is what matters for interactive experiences.
Round-trip time (RTT)
One user sends a message, the other receives it and replies, and the first user gets the reply. This is latency between users, not latency from source to viewer. WebRTC shines here: two speakers on the same SFU can achieve 50–200ms RTT. An HLS viewer can’t send data back to the broadcaster in real-time; HLS is one-way.
End-to-end (E2E) / streaming latency
The time between a broadcaster starting to speak and a viewer hearing it. This is what’s advertised as “LL-HLS has 2-second latency.” It’s also called “perceived latency” or “streaming latency.” Critical for live events, less critical for one-way broadcasts.
| Use Case | Target E2E Latency | Why It Matters | Protocol(s) |
|---|---|---|---|
| Interactive gaming | <250ms | Player input must register instantly or the game feels broken | WebRTC SFU with custom network code |
| Live auctions / bidding | 250ms–800ms | Bid acceptance must sync across viewers; lag breeds disputes | WebRTC SFU or LL-HLS + messaging |
| Live sports commentary | 1–3s | Viewer hears goal call within a few seconds of broadcast; chat stays in sync | LL-HLS |
| Live shopping | 2–5s | Host announces a deal; viewers click “buy”; sync within a few seconds | LL-HLS or HLS + messaging |
| Webinar / education | 4–10s | Q&A chat drifts slightly; viewer watches at their pace | HLS |
| Recorded / on-demand | 10–60s | No sync required; user controls playback | HLS |
Inside WebRTC—How it achieves sub-second latency
WebRTC ships encoded media as a continuous flow of RTP packets (audio/video frames), either peer-to-peer or through a Selective Forwarding Unit (SFU, a media relay). No packaging into HLS segments. No chunking. No buffering strategy waiting for a segment boundary. Frames arrive, decode immediately, render immediately.
The SFU architecture (the standard for live multi-user)
Each speaker uploads a video stream (usually 720p or 1080p at 30fps, 2–5 Mbps). With simulcast, the speaker’s browser encodes that stream at several qualities (for example 720p, 360p, 144p). The SFU server (like mediasoup, Janus, or Pion) receives those layers and, without re-encoding, forwards the appropriate one to each viewer based on their bandwidth. Each viewer maintains only one WebRTC connection to the SFU and receives every stream it needs over that connection.
Why this is fast: Packets stream continuously. No waiting for segment boundaries. Codec latency dominates (VP8/H.264 encoding adds ~30–50ms). Network latency is naturally low (typical datacenter-to-client is 20–100ms). ICE negotiation (finding the shortest network path) adds 1–3 seconds upfront, but after that, RTT is stable.
Why this breaks at scale: An SFU instance has a CPU ceiling. A single mediasoup process on a 16-core machine (Hetzner AX-series or equivalent) handles roughly 100–300 concurrent viewers depending on codec, bitrate, and whether you're using simulcast. Once you exceed that, you add another SFU and cascade them. Cascading adds latency (packet hops) and complexity (ICE negotiation between SFUs).
Reach for WebRTC when: you have <500 concurrent viewers, your users are on broadband (not cellular), you control the network quality (you’re a centralized broadcaster, not serving random internet users), and speaker-to-speaker RTT under 200ms is a hard requirement.
Inside HLS—How it achieves scale with segment buffering
HTTP Live Streaming (HLS) takes a live feed, chunks it into 2–10 second segments (MPEG-TS or fMP4), and serves them as a playlist file. A viewer’s player reads the playlist, downloads segments in order, buffers 2–3 segments, and plays them sequentially. It’s simple, cacheable, and scales to millions of concurrent viewers using a standard CDN.
Classic HLS (6–30s latency)
A 10-second segment takes 10 seconds to produce (the encoder buffers 10 seconds of video), then the server publishes it and the playlist. The client downloads it, buffers 2–3 segments (20–30 seconds), then plays. End-to-end latency is typically 20–40 seconds. This is fine for recorded content or slow-paced broadcasts.
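The arithmetic behind those numbers can be sketched as a small model. This is an illustrative estimate, not a measurement; the function name and the `overheadSeconds` parameter are our own, and real players add network and decode time on top.

```javascript
// Rough latency model for segment-based delivery. All values are in seconds.
function estimateHlsLatency({ segmentSeconds, bufferedSegments, overheadSeconds = 0 }) {
  // The encoder must finish a whole segment before it can publish it
  const encodeDelay = segmentSeconds;
  // The player holds N full segments before it starts playback
  const playerBuffer = segmentSeconds * bufferedSegments;
  return encodeDelay + playerBuffer + overheadSeconds;
}

const classic = estimateHlsLatency({ segmentSeconds: 10, bufferedSegments: 2 });
const shortSegments = estimateHlsLatency({ segmentSeconds: 2, bufferedSegments: 3 });
console.log(classic, shortSegments); // 30 8
```

Shrinking the segment from 10s to 2s cuts the estimate from 30s to 8s even with a deeper buffer, which is why segment duration is the first knob to turn.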
LL-HLS (Low-Latency HLS, 2–6s latency) — the 2026 standard
Apple introduced LL-HLS as an extension of HLS (RFC 8216; the low-latency additions live in the draft revision, RFC 8216bis) to reduce latency without reinventing the wheel. Instead of waiting for a full 10-second segment, the encoder produces a partial segment every 200–500ms. Early drafts relied on HTTP/2 server push; the current spec dropped it in favor of blocking playlist reloads and preload hints, so the client fetches each partial segment as soon as it exists. Combined with the CMAF (Common Media Application Format) container for faster chunking, LL-HLS achieves 2–6 second latency on standard CDNs.
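As a sketch of how a player asks for the next partial segment under the current LL-HLS draft: it reloads the playlist with the `_HLS_msn` and `_HLS_part` delivery directives, and the server holds the response until that part exists. The helper name and the example URL below are hypothetical.

```javascript
// Build a blocking playlist reload URL using the LL-HLS delivery directives
// _HLS_msn (media sequence number) and _HLS_part (part index within that segment).
function blockingReloadUrl(playlistUrl, nextMsn, nextPart) {
  const url = new URL(playlistUrl);
  url.searchParams.set('_HLS_msn', String(nextMsn));
  if (nextPart !== undefined) {
    url.searchParams.set('_HLS_part', String(nextPart));
  }
  return url.toString();
}

const reloadUrl = blockingReloadUrl('https://cdn.example.com/live/stream.m3u8', 273, 2);
console.log(reloadUrl);
// https://cdn.example.com/live/stream.m3u8?_HLS_msn=273&_HLS_part=2
```

Because each distinct query string is a distinct cache key, the CDN can collapse thousands of identical blocking requests into one origin fetch.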
Why LL-HLS wins in 2026: Works on any CDN that supports HTTP/2 and blocking playlist requests (Cloudflare, Akamai, AWS CloudFront, Fastly). No special SFU hardware. Scales to 100K concurrent viewers at $0.0008/GB egress. Fallback to classic HLS is automatic: players that don’t understand the low-latency tags ignore them and play the full segments.
Why it has limits: Still stream-to-viewer one-way. Viewer can’t send data back to the broadcaster in the same stream (you add a separate WebSocket or messaging channel for chat/reactions). Startup latency depends on segment size: smaller segments = faster startup but higher overhead.
Reach for LL-HLS when: you need to reach 1K–100K concurrent viewers, 2–5 second latency is acceptable, your players support LL-HLS (most modern players do as of 2026), and you want to save SFU costs compared to pure WebRTC.
Latency benchmarks—What you actually get
These numbers are from our 2026 benchmarks (Worldcast, Sprii, Ariuum production runs) plus vendor documentation. Numbers assume decent network conditions (no WiFi dropout, >5 Mbps available bandwidth).
| Protocol | E2E Latency | Startup Time | RTT (speaker-to-speaker) | Ceiling (1 instance) |
|---|---|---|---|---|
| WebRTC SFU | 0.2–0.5s | 2–4s (ICE) | 50–200ms | 100–300 viewers |
| LL-HLS (CMAF, HTTP/2) | 2–5s | 0.5–2s | N/A (one-way) | 100K+ viewers |
| Classic HLS (MPEG-TS, 10s segments) | 15–40s | 1–3s | N/A (one-way) | 1M+ viewers |
| DASH (MPEG-DASH, 6s segments) | 6–20s | 1–2s | N/A (one-way) | 100K+ viewers |
| CMAF-LL (Low-Latency DASH) | 1–3s | 0.5–2s | N/A (one-way) | 100K+ viewers |
Practical note: For Worldcast Live, we achieved 0.4–0.5s glass-to-glass with a custom WebRTC stack (mediasoup SFU + optimized encoders on Hetzner). The 0.2s low end assumes lab conditions. For Sprii, LL-HLS delivered 3s perceived latency with segment sizes tuned to 1 second (a balance between startup time and throughput). CMAF-LL is the spec; LL-HLS is what you get on iOS and Safari.
Cost per viewer at scale—Where the math lives
Your cost per viewer is dominated by bandwidth (egress for HLS/LL-HLS) or SFU compute hours (WebRTC). Here’s where they cross over.
HLS/LL-HLS: Pay for bandwidth via CDN
A 1080p stream at 5 Mbps bitrate streams ~2.25 GB/hour. CDN egress costs $0.0008–0.005 per GB depending on region and volume. For 10,000 concurrent viewers for 1 hour:
- Total bandwidth: 10,000 viewers × 2.25 GB = 22,500 GB
- At $0.001/GB (Cloudflare, mid-volume): $22.50/hour = $540/day
- At $0.003/GB (AWS CloudFront, standard rate): $67.50/hour = $1,620/day
- Cost per viewer per hour: $0.002 to $0.007
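The same arithmetic as a reusable sketch, so you can plug in your own bitrate and CDN rate. The function names are ours; the inputs are the figures from the list above.

```javascript
// GB delivered per viewer-hour at a given bitrate (decimal GB: 8,000 megabits per GB)
function gbPerViewerHour(bitrateMbps) {
  return (bitrateMbps * 3600) / 8000;
}

// Total egress volume and cost for an event
function egressCost({ viewers, hours, bitrateMbps, usdPerGb }) {
  const totalGb = viewers * hours * gbPerViewerHour(bitrateMbps);
  const totalUsd = totalGb * usdPerGb;
  return { totalGb, totalUsd, usdPerViewerHour: totalUsd / (viewers * hours) };
}

const cloudflare = egressCost({ viewers: 10000, hours: 1, bitrateMbps: 5, usdPerGb: 0.001 });
const cloudfront = egressCost({ viewers: 10000, hours: 1, bitrateMbps: 5, usdPerGb: 0.003 });
console.log(cloudflare.totalUsd.toFixed(2), cloudfront.totalUsd.toFixed(2)); // 22.50 67.50
```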
WebRTC: Pay for SFU compute
A mediasoup SFU on a 16-core Hetzner AX-series server (~$150/month) can handle 150–300 concurrent viewers depending on codec and bitrate. For 10,000 concurrent viewers:
- Number of SFU instances needed: 10,000 ÷ 200 (midpoint) = 50 instances
- Cost: 50 × $150/month = $7,500/month
- Cost per viewer per month: $0.75 per viewer
- Cost per viewer per hour (assuming 8-hour event): $0.09 per viewer
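A minimal sketch of the SFU sizing math above, assuming a flat per-instance viewer ceiling and a flat monthly server price (real capacity varies with codec, bitrate, and simulcast). The function name is ours.

```javascript
// How many SFU instances an audience needs, and what they cost per month
function sfuCost({ viewers, viewersPerInstance, usdPerInstanceMonth }) {
  const instances = Math.ceil(viewers / viewersPerInstance);
  const usdPerMonth = instances * usdPerInstanceMonth;
  return { instances, usdPerMonth, usdPerViewerMonth: usdPerMonth / viewers };
}

const plan = sfuCost({ viewers: 10000, viewersPerInstance: 200, usdPerInstanceMonth: 150 });
console.log(plan.instances, plan.usdPerMonth, plan.usdPerViewerMonth); // 50 7500 0.75
```

Unlike CDN egress, this cost is paid whether or not viewers show up, so the per-hour figure depends heavily on how many event hours you spread the month over.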
The crossover: For a 1-hour event with 10,000 viewers, LL-HLS is $22–67, WebRTC is roughly $450–900 (at $0.045–0.09 per viewer-hour). HLS wins by about 10x. But if you need sub-1s latency or speaker-to-speaker RTT, you have no choice but WebRTC.
Reach for HLS/LL-HLS when: your audience is over 1,000 concurrent and you can tolerate 2–30s latency. Bandwidth costs are predictable and scale better than SFU compute.
Reach and device support—Browser, TV, low-end Android
WebRTC. Requires a WebRTC-capable browser. Supported in Chrome, Firefox, Safari (11+), Edge. Not supported in Opera Mini, old Android browsers (below 5.0), or many smart TVs. If you need to support Roku, Apple TV, or Xbox, you’ll need HLS as a fallback. WebRTC is bandwidth-heavy by design (no traditional bitrate ladder at the protocol level; you implement adaptive bitrate in the application).
HLS. Supported everywhere: Safari (iOS/macOS), Chrome (via HLS.js), Firefox (via Shaka), Android native, Roku, Apple TV, older feature phones (it’s just HTTP GET + MPEG-TS decoding). HLS has a built-in adaptive bitrate ladder: the server publishes multiple bitrate variants (1080p, 720p, 360p, 144p), and the client picks the best fit based on available bandwidth.
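To make the ladder selection concrete, here is a simplified sketch of what a player’s ABR logic does. The bitrates are the example ladder from the ABR tuning pitfall later in this guide; the 0.8 safety factor and the function name are assumptions, and production players such as hls.js use more elaborate bandwidth estimators.

```javascript
// Example ladder (bitrates in kbps)
const ladder = [
  { name: '1080p', kbps: 5000 },
  { name: '720p', kbps: 2500 },
  { name: '480p', kbps: 1500 },
  { name: '360p', kbps: 600 },
  { name: '240p', kbps: 300 },
];

// Pick the highest variant that fits within a fraction of the measured bandwidth
function pickVariant(variants, measuredKbps, safetyFactor = 0.8) {
  const budget = measuredKbps * safetyFactor;
  const sorted = [...variants].sort((a, b) => b.kbps - a.kbps);
  // Fall back to the lowest tier when nothing fits
  return sorted.find((v) => v.kbps <= budget) ?? sorted[sorted.length - 1];
}

console.log(pickVariant(ladder, 4000).name); // 720p
console.log(pickVariant(ladder, 200).name);  // 240p
```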
LL-HLS. Native on Safari 13+ (iOS 13+), requires HLS.js or Shaka on other browsers. Smart TV support is mixed: Apple TV yes, Roku not yet (as of Q1 2026), but getting better. If you need 100% device coverage, add a fallback.
Reach for HLS when: you need to support smart TVs, gaming consoles, or low-end Android. WebRTC requires opt-in browser support; HLS is a standard.
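One way to act on these support differences is to detect capabilities at runtime and pick a playback path. The sketch below is an assumption about how an app might structure that choice; the function names and return labels are ours. It relies only on standard browser APIs: `RTCPeerConnection`, `MediaSource`, and `HTMLMediaElement.canPlayType`.

```javascript
// Derive capability flags from a browser window object
function detectCapabilities(win) {
  const video = win.document.createElement('video');
  return {
    hasRtcPeerConnection: typeof win.RTCPeerConnection === 'function',
    // Safari and some TVs play HLS natively; canPlayType returns '' when unsupported
    nativeHls: video.canPlayType('application/vnd.apple.mpegurl') !== '',
    // Media Source Extensions are what JavaScript HLS players build on
    hasMediaSource: typeof win.MediaSource === 'function',
  };
}

// Choose a playback path; interactive users get WebRTC when the browser allows it
function choosePlayback({ hasRtcPeerConnection, nativeHls, hasMediaSource }, needsInteractivity) {
  if (needsInteractivity && hasRtcPeerConnection) return 'webrtc';
  if (nativeHls) return 'native-hls';
  if (hasMediaSource) return 'mse-hls';
  return 'unsupported';
}

// In a browser: choosePlayback(detectCapabilities(window), false)
```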
Comparison matrix—All dimensions at a glance
| Dimension | WebRTC SFU | LL-HLS | Classic HLS | CMAF-LL |
|---|---|---|---|---|
| Latency | 0.2–0.5s | 2–5s | 15–40s | 1–3s |
| Concurrent viewers (1 instance) | 100–300 | Unlimited (CDN) | Unlimited (CDN) | Unlimited (CDN) |
| Cost per 1000 viewers/hour | $45–90 | $2–7 | $2–7 | $2–7 |
| Encryption (end-to-end) | DTLS-SRTP (native) | TLS + optional CPIX key mgmt | TLS + optional DRM | TLS + optional CPIX |
| Recording | SFU must re-encode or tap streams | Tap HLS segments directly | Tap HLS segments directly | Tap CMAF segments directly |
| Device support | Modern browsers only | Safari + HLS.js (most browsers) | Everywhere (universal) | DASH-capable players |
| Complexity (engineering) | High (SFU operations, ICE tuning) | Medium (segment lifecycle) | Low (standard HTTP) | Medium (DASH player complexity) |
| When to pick it | <500 viewers, interactive, sub-1s RTT required | 1K–100K viewers, 2–5s acceptable | Universal reach, recorded, 10s+ latency OK | 1K–100K viewers, strict latency target |
The hybrid pattern—WebRTC speakers plus HLS/LL-HLS viewers
This is the architecture that won in 2026. You run a WebRTC SFU for your speaker group (typically 5–50 speakers). Each speaker streams to the SFU at full quality. The SFU encodes one master stream at high quality (1080p, 5 Mbps) and pushes it to your encoder, which then packages it as LL-HLS segments. Viewers pull the LL-HLS stream from the CDN.
Why this works
Speakers enjoy sub-1s RTT among themselves (WebRTC). Viewers get stable, buffered video (LL-HLS) without overwhelming the SFU. The SFU runs on a single machine or a small cluster. The encoding pipeline (SFU master output → encoder → HLS segmenter) is independent of viewer count.
Real number: Worldcast Live
Worldcast Live, a 10K-concurrent HD concert platform we built, runs this exact pattern. A 4-machine mediasoup cluster handles 50 speakers at 0.1s inter-speaker RTT. Each speaker uploads 1080p 30fps (4.5 Mbps). The SFU encodes one master output at 1080p 30fps 5 Mbps, streams it to an FFmpeg re-encoder on a Hetzner box, which segments it as LL-HLS at 1s segment size. Viewers see 0.4–0.5s glass-to-glass latency. Total cost: $600/month (SFU cluster) + $50/month (encoder) + $90 egress for a 4-hour concert to 10K viewers, or $740. That’s about $0.0185 per viewer-hour, vs roughly $0.09 per viewer-hour for a pure WebRTC approach.
Implementation checklist
- Encoder (FFmpeg or similar) pulls the SFU master output via rtmp:// or custom socket.
- Encoder re-encodes (usually just re-mux if the SFU output codec matches your target) and segments into LL-HLS (1–2s segment size).
- Segments land in an S3 bucket or local directory; sync to CDN (Cloudflare, Fastly, AWS CloudFront).
- Playlist (index.m3u8) updates every 200–500ms (partial segment updates in LL-HLS).
- Fallback to classic HLS if LL-HLS isn’t supported by the player (automatic in most modern players).
Reach for the hybrid when: you have 5–100 speakers and 1K–100K viewers. You want sub-second speaker interactivity but can’t afford SFU costs for all viewers. This is the default choice for modern live platforms.
Hybrid architecture is the way—but how to build it?
Fora Soft ships hybrid platforms (WebRTC speakers + LL-HLS viewers) at scale. Let’s scope your architecture.
WebRTC architecture pitfalls at scale
1. The SFU CPU wall at 300 viewers. A single mediasoup instance on a 16-core box with VP8 codec hits CPU saturation around 300 concurrent viewers. Beyond that, you add another SFU. But cascading SFUs (SFU-to-SFU forwarding) adds latency: each hop adds 20–50ms, and ICE re-negotiation between SFUs takes 2–4 seconds. A simple fix: use simulcast (each speaker sends multiple bitrate variants). The SFU forwards only the bitrate needed per downstream viewer, reducing encoder load. Cost: higher upstream bandwidth per speaker, but you can push the ceiling to 500–600 viewers per SFU.
2. ICE connectivity failures at 10%–15% rate. WebRTC’s Interactive Connectivity Establishment (ICE) negotiates the shortest path between peers. In production, 10–15% of ICE negotiations fail (firewall blocks peer-to-peer, STUN timeouts, TURN congestion). Your app must handle these gracefully: fall back to TURN (relay through a server, adds 50–200ms latency), or drop the connection and reconnect. Worldcast Live maintains a 3–5% mid-call drop rate despite ICE tuning; this is typical for consumer-grade networks.
3. Bandwidth unpredictability on cellular. WebRTC adapts to available bandwidth, but mobile networks shift capacity rapidly. A speaker on 4G LTE might upload 5 Mbps, then the network drops to 1 Mbps. The SFU’s encoder can’t keep up; you see stuttering. The fix: encode simulcast at 2–3 bitrate tiers (5 Mbps, 2 Mbps, 500 kbps) upfront, and let viewers choose. This requires more uplink from speakers but guarantees stable playback for viewers.
4. SFU memory bloat with large speaker groups. Each WebRTC connection consumes ~100–200 MB of RAM (video buffers, codec state, ICE candidate tracking). At 50 speakers + 500 viewers, you’re looking at ~60 GB RAM across SFU instances. This is expensive and slow to scale. The mitigation: cascade SFU instances in a tree (speaker SFU, speaker-to-viewer relay SFUs), but accept the latency cost. Or use a managed service like Agora (but pay per minute, which gets expensive at 10K viewers).
5. Codec mismatches between browsers. Chrome prefers VP8, Safari prefers H.264. If you run a unified SFU, you must re-encode every stream to a codec both sides support. VP8 re-encode adds 20–50ms latency per re-encode; H.264 hardware encoding (on Nvidia GPUs) helps, but adds cost. The Worldcast Live approach: speakers specify codec (H.264 or VP8) upfront, and we run separate encoding paths. A little overhead, but avoids surprises.
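The simulcast tiers from pitfalls 1 and 3 map onto the `encodings` option of mediasoup-client’s `transport.produce()`. The sketch below uses the three bitrates named above; the resolution scaling factors are assumptions, and `transport` and `videoTrack` are assumed to exist already.

```javascript
// Three simulcast layers, lowest quality first, using standard
// RTCRtpEncodingParameters fields (maxBitrate in bits per second).
function simulcastEncodings() {
  return [
    { maxBitrate: 500000, scaleResolutionDownBy: 4 },  // 500 kbps tier
    { maxBitrate: 2000000, scaleResolutionDownBy: 2 }, // 2 Mbps tier
    { maxBitrate: 5000000, scaleResolutionDownBy: 1 }, // 5 Mbps tier, full resolution
  ];
}

// Usage with a mediasoup-client send transport:
// const producer = await transport.produce({
//   track: videoTrack,
//   encodings: simulcastEncodings(),
// });
```

The speaker’s uplink now carries all three layers (about 7.5 Mbps at peak), which is the bandwidth cost the pitfalls above refer to.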
HLS architecture pitfalls—Segment sizing and CDN cache
1. Segment size is a latency-throughput tradeoff. Smaller segments (1s) reduce latency but increase HTTP request overhead (more requests, more TCP handshakes, more CDN edge hits). Larger segments (10s) reduce overhead but lock latency to 15–40s. For LL-HLS, a 1–2s segment size is standard. But on congested cellular networks, a 1s segment might not complete before the next segment is ready, causing rebuffering. The fix: adaptive segment sizing (CMAF-LL with fractional segments that dynamically adjust size), or predictive pre-buffering (the player reads ahead and buffers 3 segments).
2. CDN cache misses at stream start. A new stream has no segments in the CDN edge cache. The first few segment requests hit the origin, which can add 200–500ms per request. Sprii solves this by pre-warming the CDN: 30 seconds before a stream goes live, the origin pushes the first few segments to edge nodes. This costs ~$5–20 per stream (CDN purge + push), but eliminates startup latency.
3. ABR ladder tuning is empirical. An Adaptive Bitrate ladder typically has 5–8 tiers (1080p at 5 Mbps, 720p at 2.5 Mbps, 480p at 1.5 Mbps, 360p at 600 kbps, 240p at 300 kbps). But the right ladder depends on your audience’s bandwidth distribution. Run a week of analytics, measure bitrate vs rebuffer rate, and adjust. A misconfigured ladder (too many high-bitrate tiers for a low-bandwidth audience) causes rebuffering and churn.
4. Playlist staleness in live scenarios. The HLS playlist must be updated frequently (every segment). But if you update it too frequently without proper HTTP caching headers, you can overload the origin. Set `Cache-Control: max-age=2` on the playlist (the value is in seconds, with no unit suffix), so edge nodes cache it and serve stale playlists briefly. The player will re-fetch the playlist if it hits the end, so stale data doesn’t cause issues, just a few seconds of duplicate requests.
5. Recording from HLS requires segment tapping, not CDN pull. You can’t just pull the HLS playlist and segments from the CDN to record: the origin may garbage-collect old segments before you grab them. Instead, tap the segments at the origin or encode server (before CDN distribution), and write directly to object storage (S3 or equivalent). If you want a VOD from a live HLS stream, you either record the source feed directly, or implement a segment archival service that saves segments as they’re created.
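For pitfall 4, the header policy can live in one small function at the origin. This is a sketch: the 2-second playlist TTL comes from the text above (`max-age` takes plain seconds), while the one-hour segment TTL and the `no-store` default are assumptions to tune for your CDN.

```javascript
// Return the Cache-Control value for a path served by the HLS origin
function cacheControlFor(path) {
  // Playlists change every segment: cache briefly so the edge absorbs request bursts
  if (path.endsWith('.m3u8')) return 'public, max-age=2';
  // Media segments never change once written, so they can be cached much longer
  if (/\.(m4s|mp4|ts)$/.test(path)) return 'public, max-age=3600';
  // Anything else is not part of the stream
  return 'no-store';
}

console.log(cacheControlFor('/live/stream.m3u8')); // public, max-age=2
console.log(cacheControlFor('/live/seg_0042.m4s')); // public, max-age=3600
```

In a Node origin, you would pass the result to `res.setHeader('Cache-Control', ...)` before sending the file.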
Code and config snippets—LL-HLS setup and WebRTC join
LL-HLS segment generation (FFmpeg config)
This command takes an RTMP input stream (from your SFU or encoder) and outputs LL-HLS segments:
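ffmpeg -i rtmp://localhost/live/main \
  -c:v libx264 -preset veryfast -b:v 5M -maxrate 5.5M -bufsize 11M \
  -force_key_frames "expr:gte(t,n_forced*1)" \
  -c:a aac -b:a 128k \
  -f hls \
  -hls_time 1 \
  -hls_list_size 6 \
  -hls_flags delete_segments+independent_segments \
  -hls_segment_type fmp4 \
  /var/www/html/live/stream.m3u8
Key flags: -hls_time 1 targets 1-second segments. -force_key_frames "expr:gte(t,n_forced*1)" forces a keyframe every second; the muxer can only cut segments on keyframes, so without it the segments follow the encoder’s default keyframe interval instead. -hls_segment_type fmp4 uses fMP4 container (required for LL-HLS on iOS). -hls_flags independent_segments declares that every segment starts with a keyframe and can be decoded on its own (CDN friendly). This command produces full segments only; for LL-HLS partial segments (200ms), you need a packager that emits them.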
WebRTC join flow (mediasoup/Node.js example)
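// Client joins a mediasoup room (mediasoup-client v3)
// The /api/* routes are this app's own signaling endpoints, not part of mediasoup;
// /api/rtp-capabilities and /api/transport-connect are assumed names.
import { Device } from 'mediasoup-client';

const post = (url, body = {}) =>
  fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  }).then((r) => r.json());

// Load the device with the router's RTP capabilities before creating transports
const device = new Device();
await device.load({ routerRtpCapabilities: await post('/api/rtp-capabilities') });

const transportParams = await post('/api/transport', { rtpCapabilities: device.rtpCapabilities });
const transport = device.createSendTransport(transportParams);

// mediasoup-client emits these events so the app can signal the server
transport.on('connect', ({ dtlsParameters }, callback, errback) =>
  post('/api/transport-connect', { transportId: transport.id, dtlsParameters })
    .then(() => callback(), errback));
transport.on('produce', ({ kind, rtpParameters }, callback, errback) =>
  // The server creates the producer and returns its ID
  post('/api/producer', { transportId: transport.id, kind, rtpParameters })
    .then(({ id }) => callback({ id }), errback));

const producer = await transport.produce({
  track: videoTrack, // from getUserMedia or getDisplayMedia
  codecOptions: {
    videoGoogleStartBitrate: 1000, // kbps
    videoGoogleMaxBitrate: 5000, // kbps
  },
});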
What happens: Client requests a Send Transport from the server (connects to SFU). Client adds a video track (from camera or screen). If simulcast is enabled, the client encodes the track at multiple bitrates and the SFU forwards the right layer to each viewer. Client is now a Producer in the SFU room.
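Production config tip: mediasoup negotiates only the codecs listed in the router’s mediaCodecs, and many setups list VP8 first. For broader compatibility (especially iOS), list H.264 alongside VP8. To force H.264 for a given producer, pass the H.264 entry from device.rtpCapabilities.codecs as the codec option of transport.produce(); consider doing so if your iOS users exceed 30% of traffic.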
Mini case: Worldcast Live—10,000 concurrent HD viewers at 0.4s latency
Situation. A concert venue wanted to stream a live HD concert to 10,000 concurrent viewers globally. Interactive latency was critical: the audience wanted to see the performer’s reaction to a crowd cheer within 500ms. A pure WebRTC approach would cost $7,500+ just for SFU servers; a pure HLS approach meant 15–30s latency, which breaks interactivity.
The architecture. We deployed the hybrid: 4 mediasoup instances (Hetzner AX-series) handling 50 stage speakers and audience members at 0.1s RTT among themselves. Each speaker uploaded 1080p 30fps (4.5 Mbps). The SFU encoded one master 1080p 5 Mbps output, streamed it to an FFmpeg encoder, which segmented as LL-HLS (1-second segments, fMP4 container). Segments went to S3, then distributed via Cloudflare CDN. Viewers saw 0.4–0.5s glass-to-glass latency (encoder buffer ~100ms + network 150–200ms + player buffer 100–150ms).
Cost breakdown. SFU cluster: 4 machines × $150/month = $600/month. Encoder (FFmpeg on a single Hetzner CPX41): $50/month. CDN egress (10K viewers × 2.25 GB/hour × 4 hours × $0.001/GB): $90. Total for a 4-hour concert: $740. Cost per viewer-hour: about $0.0185 ($740 ÷ 40,000 viewer-hours). By comparison, pure WebRTC would have cost $7,500 for infrastructure alone.
Outcome. Viewers reported responsive, stable video. Rebuffer rate: 0.3% (below the 1% threshold for good QoE). Chat engagement was high because interactivity felt real. Mid-call drop rate: 3% (typical for consumer networks). The team repeated this architecture for 8 events in 2025; each time, latency and cost scaled predictably.
Want to run a similar architecture for your platform? We’ve written a deep dive on WebRTC architecture. Let’s scope yours.
Mini case: Sprii Live Shopping—50K concurrent at 3s latency
Situation. A live-shopping platform needed to scale to 50,000 concurrent viewers during flash sales. Latency of 3–5 seconds was acceptable (viewers click “buy” within a few seconds of the host announcing a deal). A WebRTC approach wasn’t feasible at this scale.
The architecture. Pure LL-HLS. Hosts streamed 720p 30fps (2.5 Mbps) using OBS to an RTMP ingest endpoint. An FFmpeg encoder created a 5-tier ABR ladder (1080p, 720p, 480p, 360p, 240p) and segmented into LL-HLS (1.5-second segments). A product inventory system consumed events from the broadcaster (product ID, quantity, discount code) and synced them via WebSocket to all viewers (separate from the video stream). Segments went to S3 and Cloudflare. Viewers experienced 3s glass-to-glass latency with zero buffering at the 95th percentile.
Cost per event. Egress: 50K viewers × 1.5 GB (40-min shopping event) × $0.0015/GB (Cloudflare volume discount) = $112.50 per event. Encoding server (single Hetzner AX41): $100/month, amortized to ~$3 per event. Total: ~$115 per event for 50K viewers. Cost per viewer-hour: about $0.0035 ($115 ÷ ~33,300 viewer-hours for a 40-minute event).
Outcome. Checkout conversion improved 8% vs pre-recorded streams (the perceived interactivity mattered). Rebuffer rate: 0.15%. The platform now runs 2–3 flash-sale events per week, each hitting 30K–60K peak viewers. The hybrid product sync (LL-HLS video + WebSocket product updates) became a template that Fora Soft reused for three other live-commerce apps.
Cost model—Three tiers
Here’s what you’ll actually spend for a 2-hour live event in 2026, based on production hardware (Hetzner or equivalent cloud):
Tier 1: 1,000 concurrent viewers
- WebRTC (1 SFU + 1 encoder): Hetzner AX41 ($120/mo) + compute hours = ~$40 total. Cost per viewer-hour: $0.02.
- LL-HLS (encoder + CDN): Encoder ($3 amortized) + egress (1K × 4.5 GB × $0.002/GB) = $12 total. Cost per viewer-hour: $0.006.
- Winner: LL-HLS by 3.3x. But WebRTC is viable if you need <1s latency.
Tier 2: 10,000 concurrent viewers
- WebRTC (50 SFU instances at ~200 viewers each + encoder): 50 × $150/month ÷ 20 events per month = $375 per event. Cost per viewer-hour: about $0.019 (assuming monthly amortization over 20 events).
- LL-HLS (encoder + CDN): Encoder ($3) + egress (10K × 4.5 GB × $0.0015/GB) = $71 total. Cost per viewer-hour: $0.0036.
- Winner: LL-HLS by about 5x, even with monthly amortization. Hybrid (WebRTC speakers + LL-HLS viewers) is the sweet spot when speakers need sub-second interactivity: $100 total, $0.005 per viewer-hour.
Tier 3: 100,000 concurrent viewers
- WebRTC: Not feasible. At ~200 viewers per instance it would require about 500 SFU instances = $75,000/month infrastructure cost.
- LL-HLS (encoder + CDN): Encoder ($5 amortized) + egress (100K × 4.5 GB × $0.0008/GB, volume discount) = $365 total. Cost per viewer-hour: $0.0018.
- Winner: LL-HLS is the only option. Hybrid only if you need a small WebRTC speaker group (<50).
Rough heuristic: If viewers < 500, WebRTC. If 500–10K, hybrid. If 10K+, LL-HLS with optional WebRTC speakers. The decision is cost-per-viewer at your scale.
A decision framework—Five questions
1. What’s your concurrent viewer ceiling? If <500, WebRTC is viable. If 500–10K, hybrid wins. If 10K+, LL-HLS is required. (Anything above 100K pushes you to a managed CDN like Cloudflare or Akamai.)
2. What’s your latency target? If <1s, WebRTC only. If 1–5s, LL-HLS. If 5s+, classic HLS. If you don’t know, default to 2–5s (LL-HLS is the safe bet for most use cases).
3. Do viewers need to send data back to the broadcaster in the same stream? If yes (live auction bids, game controls), WebRTC or WebRTC + a separate messaging channel. If no (watch only, chat is separate), LL-HLS is fine.
4. What device coverage do you need? If you must support smart TVs, Roku, old Android: HLS is the only option. If web + mobile is enough, LL-HLS works.
5. What’s your monthly event budget? If <$500, LL-HLS or hybrid. If $500–$5,000, hybrid. If $5,000+, WebRTC scales but requires ongoing SFU ops. (Managed services like Agora cost ~$0.01 per viewer-minute, which exceeds CDN costs at 10K viewers.)
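Questions 1, 2, and 4 can be folded into a first-pass filter. The sketch below is one possible encoding of the thresholds above, not a substitute for working through all five questions; the function name and return labels are ours.

```javascript
// First-pass protocol recommendation from viewer ceiling, latency target, and device needs
function recommendProtocol({ peakViewers, targetLatencySeconds, needsSmartTv = false }) {
  // Question 4: smart TVs, Roku, and old Android leave classic HLS as the only option
  if (needsSmartTv) return 'classic HLS';
  // Question 2: 5s or more tolerates classic HLS; 1–5s fits LL-HLS
  if (targetLatencySeconds >= 5) return 'classic HLS';
  if (targetLatencySeconds >= 1) return 'LL-HLS';
  // Sub-second target: WebRTC, for everyone below 500 viewers (question 1),
  // otherwise for the speaker group only
  return peakViewers < 500 ? 'WebRTC' : 'hybrid (WebRTC speakers + LL-HLS viewers)';
}

console.log(recommendProtocol({ peakViewers: 200, targetLatencySeconds: 0.5 })); // WebRTC
console.log(recommendProtocol({ peakViewers: 50000, targetLatencySeconds: 3 })); // LL-HLS
```

Budget and return-channel needs (questions 3 and 5) then confirm or override this first answer.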
Unsure which tier fits your platform? Book a 30-minute scoping call and walk through your latency targets, viewer ceiling, and budget with a streaming architect.
Five pitfalls—What breaks in production
1. Underestimating ICE failure rate in WebRTC. Plan for 10–15% of ICE negotiations to fail or degrade to TURN relay. This adds 50–200ms latency per connection and burns TURN server bandwidth. Budget 1 Mbps TURN bandwidth per 20–30 concurrent users. Test with real cellular networks (4G LTE, 5G) before launch, not just WiFi.
2. Choosing segment size without testing. A 1-second segment is the default for LL-HLS, but it adds overhead. If your audience is mostly on congested cellular, try 2–3 second segments and measure rebuffer rate. A 1s segment of 5 Mbps video is about 5 megabits, so on 1 Mbps of congested bandwidth it takes about 5 seconds to download; the player rebuffers.
3. Codec mismatches between encoder and SFU (in hybrid setups). If your encoder outputs H.264 but the SFU expects VP8, you re-encode. That costs CPU and adds latency. Always align codecs: if the SFU emits H.264, tell the encoder to accept H.264 input.
4. Not monitoring rebuffer rate and RTT in production. Rebuffer rate is the % of playback sessions that stall. RTT is the round-trip time for a player to fetch a segment from the CDN. Typical targets: rebuffer rate <1%, median RTT <200ms. If you don’t instrument these, you won’t know when your architecture breaks until users complain.
5. Cascading SFU failures due to insufficient TURN capacity. If your WebRTC setup relies on peer-to-peer but TURN capacity runs out, new ICE negotiations fail, and you lose viewers. Run a pre-event load test: connect 2x your expected concurrent load and measure ICE success rate. If it’s <90%, add more TURN servers or accept that 10% of viewers will have degraded latency.
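The segment-size check in pitfall 2 is simple enough to script before you test on real networks. This sketch assumes a constant bitrate and ignores request overhead; the function names are ours, and the 5 Mbps and 600 kbps figures are the top and 360p tiers from the example ladder earlier in this guide.

```javascript
// Seconds needed to download one segment at the available bandwidth
function segmentDownloadSeconds({ segmentSeconds, videoMbps, bandwidthMbps }) {
  return (segmentSeconds * videoMbps) / bandwidthMbps;
}

// Playback stalls when a segment takes longer to fetch than it takes to play
function willRebuffer(args) {
  return segmentDownloadSeconds(args) > args.segmentSeconds;
}

const topTier = { segmentSeconds: 1, videoMbps: 5, bandwidthMbps: 1 };
const lowTier = { segmentSeconds: 1, videoMbps: 0.6, bandwidthMbps: 1 };
console.log(segmentDownloadSeconds(topTier), willRebuffer(topTier)); // 5 true
console.log(willRebuffer(lowTier)); // false
```

Under this model the ratio of video bitrate to bandwidth decides whether playback stalls, so make sure the ABR ladder has a tier low enough for your worst-case network.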
KPIs—What to measure in production
Quality KPIs. Rebuffer ratio: % of playback sessions with at least one stall. Target: <1%. Startup latency: time from player .play() to first video frame rendered. Target: <3s for LL-HLS, <5s for HLS. Bitrate (weighted average): what % of viewers got full quality vs downsampled. Target: >70% at 720p+.
Business KPIs. Concurrent peak viewers: max simultaneous streams. Expected engagement: (viewer session duration / broadcast duration). Churn rate: % of viewers who drop mid-stream. Target: <5% for live events. Revenue per viewer (if applicable): ad revenue / viewer count.
Reliability KPIs (WebRTC). ICE success rate: % of connection attempts that achieve peer-to-peer connectivity. Target: >85%. Mid-call drop rate: % of active connections that disconnect unexpectedly. Target: <5%. SFU CPU utilization: average CPU load across SFU instances. Target: <70% (headroom for spikes). TURN relay percentage: % of connections that fall back to relay. Target: <15%.
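A minimal sketch of computing several of these KPIs from per-session records. The record shape (`stalls`, `iceConnected`, `usedTurn`, `droppedMidCall`) is hypothetical; map it to whatever your player and SFU telemetry actually emit.

```javascript
// Percentage helper that avoids dividing by zero
const pct = (part, whole) => (whole === 0 ? 0 : (100 * part) / whole);

function streamingKpis(sessions) {
  const connected = sessions.filter((s) => s.iceConnected);
  return {
    // % of sessions with at least one stall (target: <1%)
    rebufferRatioPct: pct(sessions.filter((s) => s.stalls > 0).length, sessions.length),
    // % of connection attempts that connected (target: >85%)
    iceSuccessPct: pct(connected.length, sessions.length),
    // % of connections that fell back to TURN relay (target: <15%)
    turnRelayPct: pct(connected.filter((s) => s.usedTurn).length, connected.length),
    // % of active connections that dropped unexpectedly (target: <5%)
    midCallDropPct: pct(connected.filter((s) => s.droppedMidCall).length, connected.length),
  };
}

const kpis = streamingKpis([
  { stalls: 0, iceConnected: true, usedTurn: false, droppedMidCall: false },
  { stalls: 2, iceConnected: true, usedTurn: true, droppedMidCall: false },
  { stalls: 0, iceConnected: true, usedTurn: false, droppedMidCall: true },
  { stalls: 0, iceConnected: true, usedTurn: false, droppedMidCall: false },
  { stalls: 0, iceConnected: false, usedTurn: false, droppedMidCall: false },
]);
console.log(kpis);
// { rebufferRatioPct: 20, iceSuccessPct: 80, turnRelayPct: 25, midCallDropPct: 25 }
```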
When NOT to use WebRTC or HLS
If your use case is pre-recorded content with random-access scrubbing (user can rewind/fast-forward), neither WebRTC nor HLS is the right first choice. Use a progressive download MP4 or a VOD platform (Vimeo, Mux, JW Player) that optimizes for seeking.
If you need end-to-end encryption at the viewer level (e.g., HIPAA-compliant medical consultations), HLS with DRM (Widevine, FairPlay, PlayReady) or WebRTC with DTLS-SRTP end-to-end is required. A basic CDN won't meet compliance.
If you have fewer than 50 viewers and can tolerate 10–30 second latency, a simple RTMP server (Nginx RTMP module, Wowza) streaming to a basic CDN is cheaper and simpler than both WebRTC and HLS. You don’t need the sophistication; you need affordability.
Not sure if your use case fits? Read our cost estimation guide for smaller audiences, then reach out to scope your exact scenario.
FAQ
Is WebRTC faster than HLS?
Yes for latency: WebRTC delivers 0.2–0.5s glass-to-glass; HLS is 2–40s depending on variant. But “faster” doesn’t mean better for your use case. HLS is more reliable at scale, supports more devices, and costs less per viewer. If you don’t need sub-second latency, LL-HLS often wins on simplicity.
Can I use both WebRTC and HLS in the same app?
Absolutely. This is the hybrid pattern: WebRTC for speakers/interactive users, HLS for viewers. You run both in parallel. Speakers enjoy sub-1s RTT among themselves; viewers get a stable, buffered stream. Most modern live platforms use this as their default architecture.
What is LL-HLS, and how is it different from classic HLS?
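Low-Latency HLS splits each segment into partial segments (200–500ms chunks) that the player fetches as soon as they are published, using blocking playlist reloads and preload hints. This reduces latency from 15–40s to 2–5s. (Early drafts of the spec relied on HTTP/2 server push; Apple later dropped that requirement.) It works on standard CDNs and falls back to classic HLS automatically if the player or CDN doesn’t support the low-latency extensions. LL-HLS is the protocol of choice for live events in 2026.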
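One way to picture the fallback decision: an LL-HLS media playlist declares its partial-segment target with an EXT-X-PART-INF tag, so a client or a monitoring script can check for that tag before choosing the low-latency path. The sketch below is illustrative only; the function names and the sample playlist are hypothetical, and a production player such as hls.js does this parsing for you.

```python
from typing import Optional

# Illustrative media playlist header (values are made up for the example).
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:4
#EXT-X-PART-INF:PART-TARGET=0.333
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0
"""


def part_target_seconds(playlist_text: str) -> Optional[float]:
    """Return the declared partial-segment target duration, or None."""
    for line in playlist_text.splitlines():
        if line.startswith("#EXT-X-PART-INF:"):
            attr_list = line.split(":", 1)[1]
            attrs = dict(item.split("=", 1) for item in attr_list.split(","))
            return float(attrs["PART-TARGET"])
    return None


def supports_low_latency(playlist_text: str) -> bool:
    """True if the playlist advertises partial segments (LL-HLS)."""
    return part_target_seconds(playlist_text) is not None
```

A playlist without the tag makes supports_low_latency return False, which is the signal to play it as classic HLS.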
Does HLS work everywhere?
Classic HLS works on every device (iOS, Android, web, smart TV, Roku). LL-HLS works natively on iOS 14+ and Safari, requires hls.js on other browsers, and is hit-or-miss on smart TVs as of Q1 2026. If you need 100% device coverage, test your player stack against your actual user base, or add a fallback to classic HLS.
What about WebTransport and QUIC for streaming?
QUIC avoids TCP’s head-of-line blocking, which suits loss-tolerant, low-latency media. WebTransport (built on HTTP/3, which runs over QUIC) is an emerging standard, but as of 2026, it’s still experimental in browsers and CDNs. For production, stick with WebRTC (which uses UDP under the hood) or LL-HLS (which works over HTTP/2 on TCP). WebTransport will matter in 2027–2028 when browser support is wider.
Is WebRTC suitable for 100,000 concurrent viewers?
No, not as a primary transport. A single SFU instance plateaus at 300–500 viewers. To scale to 100K viewers, you’d need 200–400 SFU instances cascaded in a tree, adding massive latency and operational overhead. Use WebRTC for a small speaker group (<100) and HLS/LL-HLS for the mass audience instead.
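The instance count is simple division. As a rough sketch (the function name is hypothetical, and it counts only viewer-facing SFUs, ignoring any relay tiers a cascade would add on top):

```python
import math


def sfu_instances(viewers: int, viewers_per_sfu: int) -> int:
    """Viewer-facing SFU instances needed, rounding up to whole servers."""
    return math.ceil(viewers / viewers_per_sfu)


# 100K viewers at the optimistic and pessimistic per-instance ceilings.
best_case = sfu_instances(100_000, 500)   # 200
worst_case = sfu_instances(100_000, 300)  # 334
```

Relay SFUs between the origin and the edge push the real total higher, which is part of why the estimate above tops out at 400.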
Is HLS encrypted end-to-end?
HLS segments are encrypted in transit (TLS to the CDN), but not encrypted at rest on the CDN edge cache. For full end-to-end encryption (broadcaster → viewer), you need DRM (Widevine, FairPlay, PlayReady) or to roll your own encryption layer. WebRTC has DTLS-SRTP encryption built in, making it the default for privacy-critical streams (medical, legal).
What about latency-sensitive sports betting?
Betting regulations vary by jurisdiction, but most require that the live stream and the betting feed be synchronized within 500ms to 1 second. This rules out classic HLS (too much latency). You need WebRTC (<500ms) or LL-HLS (<5s) paired with a low-latency event feed (WebSocket or gRPC). Legal review is essential; latency alone won’t satisfy compliance.
What to read next
Deep dive
What is WebRTC? A Complete Guide
The mechanics of peer-to-peer video, STUN/TURN, and codec selection.
Architecture
P2P vs MCU vs SFU for Video Conferencing
When each topology wins. SFU is the default for live streaming.
Infrastructure
Edge Computing for Live Streaming—Get Sub-Second Latency
Encode and relay at the edge to reduce backbone hops and latency.
Scale
Scale Video Streaming to 1 Million Viewers
CDN strategy, origin bandwidth, geographic distribution.
Ready to choose your protocol and scale it?
WebRTC and HLS aren’t competitors in 2026—they’re complements. WebRTC wins for interactive, sub-second latency in small groups. HLS and LL-HLS win for scale and device reach. The hybrid pattern (WebRTC speakers, HLS/LL-HLS viewers) is the default for modern live platforms because it balances latency, cost, and operational simplicity. Worldcast Live proved this at 10K concurrent viewers; Sprii proved it at 50K. Your platform will likely follow the same pattern.
The decision tree is simple: if your viewers are <500 and you need sub-1s latency, go WebRTC. If 500–100K and 2–5s is acceptable, go hybrid or pure LL-HLS. If 100K+ or you must support every device, go pure LL-HLS. Most real-world platforms end up hybrid because they need both interactivity and scale.
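Written as code, that decision tree might look like the sketch below. The function name and parameters are hypothetical, the rules are checked in the order the paragraph states them, and the final fallback to hybrid covers combinations the text doesn’t spell out (an assumption based on most platforms ending up hybrid).

```python
def choose_protocol(viewers: int, max_latency_s: float,
                    every_device: bool = False) -> str:
    """Map audience size and latency tolerance to a protocol choice."""
    # Small audience that needs sub-second latency.
    if viewers < 500 and max_latency_s < 1:
        return "WebRTC"
    # Very large audience, or full device coverage required.
    if viewers >= 100_000 or every_device:
        return "LL-HLS"
    # Mid-scale audience that tolerates 2-5s.
    if viewers >= 500 and max_latency_s >= 2:
        return "hybrid or LL-HLS"
    # Assumption: anything else (e.g. a mid-scale audience that still
    # needs sub-second interactivity) lands on the hybrid pattern.
    return "hybrid"
```

For instance, choose_protocol(5000, 3) suggests hybrid or pure LL-HLS, while choose_protocol(200, 0.5) suggests WebRTC.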
The engineering complexity is real—SFU operations, segment lifecycle, CDN cache management, codec tuning, ICE fallback—but it’s a well-trodden path. We’ve shipped this stack for concert streaming, live shopping, debate platforms, and financial trading. The cost model is predictable. The latency targets are achievable.
Let’s scope your streaming platform this week
We’ll walk through your latency targets, viewer scale, device mix, and cost ceiling—then architect the right protocol mix (WebRTC, LL-HLS, or hybrid) and timeline for your team.

