Picking a Delivery Protocol in 2026: A Decision Tree

Why This Matters

If you have read every article in Block 4 of this section, you now know what each delivery protocol does, where it came from, and how it works. What the deep-dives do not tell you is which one to pick — because the answer is almost never one. The cost of getting this wrong is not theoretical. Picking WebRTC for a million-viewer broadcast tail blows up your cloud bill by a factor of 10. Picking HLS for a live auction loses you the auction. Picking MoQ in production today buys you a quarter of engineering R&D before you ship a single byte to a viewer. This article is the canonical Fora Soft decision tree — the same tree we walk every new architecture conversation through, and the same tree we publish here so a product manager, a founder, or a streaming engineer can run it without us. Pair it with the protocol comparison matrix, the hybrid stacks article, and the gated PDF at the bottom of this page, and you have everything you need to defend a delivery-protocol decision in front of a CTO, a CFO, and a head of product in the same meeting.

How To Use This Article

The article has four parts. The first walks the seven inputs to the decision and explains, in plain language, what each input actually measures. The second presents the decision tree itself — an eight-node, top-down walk that takes the seven inputs and returns a recommended stack. The third walks eight worked scenarios from the audits we run most often: a sports OTT, a live shopping platform, a telemedicine consultation, a stadium contribution feed, an e-learning broadcast, a corporate town hall, a multiplayer game stream, and a low-bandwidth field-reporter setup. The fourth is a "pitfalls" section that lists the eight mistakes we see in almost every architecture review.

Read the seven inputs first, then jump to whichever worked scenario looks closest to what you are building. Use the decision tree as the reusable scaffolding for any scenario the worked examples do not cover. The gated PDF at the bottom of the page is a poster-grade printout of the same tree, sized for an architecture-review-room wall.

Figure 1. The eight-node delivery-protocol decision tree. Each node is a single yes/no or one-of-three question. The recommended stack at every leaf is a combination, not a single protocol.

The Seven Inputs

Every delivery-protocol decision is the answer to the same seven questions, asked in the same order. The order matters because each subsequent input narrows the answer space the previous one opened. Skip an input and the recommendation either over-engineers (you ship WebRTC where LL-HLS would have worked) or under-engineers (you ship HLS where the audience expected sub-second feedback). The seven inputs are defined below.

Input 1 — Latency target

The single most important number. Latency in streaming means glass-to-glass latency: the time between light hitting the camera lens at the source and light leaving the viewer's screen showing the same frame. This includes capture, encoding, contribution to the origin server, packaging, distribution over a CDN, the player's buffer, decoding, and rendering. The glass-to-glass latency article walks the math; here you only need the bands.

Pick the band, not a number. Engineering for "sub-2-second latency" is fundamentally different from engineering for "sub-500-millisecond latency" — the first is a tuning problem on an HTTP-based stack, the second forces you onto UDP. The bands we use in this tree:

Sub-second (under 1 second) — interactive use cases: live auctions, real-time betting, live shopping with two-way audio, telemedicine, video conferencing. Forces WebRTC or a real-time UDP-based variant.
Low-latency (1–4 seconds) — broadcast use cases that need to feel "live" but tolerate a small lag: live sports, live news, esports, live concerts. LL-HLS and LL-DASH territory.
Standard-latency (4–15 seconds) — broadcast where the absolute latency does not matter, only that the stream plays smoothly: 24/7 channels, replays, time-zoned re-broadcasts. Classic HLS or DASH.
VOD (no latency target) — the file is already on the server; you are choosing for cost and device coverage only. HLS or DASH over CDN with the cheapest manifest format your device coverage supports.

The single most common mistake we see is teams quoting a sub-second target out of habit when the actual use case sits in the 2–5 second band. A passive sports broadcast does not need sub-second; the viewer has no way to tell whether a goal is shown 0.5 or 3.5 seconds after it was scored. Asking "what happens to the viewer if we add another second?" usually moves the target up by 2–3 seconds and unlocks LL-HLS over a CDN as the answer, which cuts the per-viewer cost by an order of magnitude.
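
The banding above can be sketched as a small classifier. This is a minimal illustration, not part of any library: the function name `latency_band` and the handling of the band edges (exactly 4 seconds, and anything looser than 15 seconds) are assumptions made for the example.

```python
def latency_band(target_seconds):
    """Map a glass-to-glass latency target (seconds) to one of the four bands.

    Pass None for VOD, which has no latency target.
    """
    if target_seconds is None:
        return "vod"
    if target_seconds <= 0:
        raise ValueError("latency target must be positive")
    if target_seconds < 1:
        return "sub-second"
    if target_seconds <= 4:
        # Assumption: exactly 4 s counts as low-latency; the bands meet at 4.
        return "low-latency"
    # Assumption: targets looser than 15 s still use the classic HLS/DASH stack.
    return "standard-latency"


# The "what happens if we add another second?" question, applied to a 0.8 s target:
print(latency_band(0.8))        # sub-second
print(latency_band(0.8 + 1.0))  # low-latency
```

Moving the target from 0.8 to 1.8 seconds changes the band, and with it the whole family of candidate protocols, which is why the band matters more than the exact number.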

Input 2 — Peak concurrent viewers

The number of viewers watching the same stream at the same time, at peak. Note three subtleties.

First, peak matters far more than average. A live event with 100 average viewers and 100,000 peak viewers is a 100,000-viewer architecture, not a 100-viewer architecture. Pick infrastructure for peak; cost-optimise the off-peak case separately.

Second, concurrent on the same stream matters more than the platform's total user count. A 10-million-user platform with no event larger than 5,000 concurrent viewers is a 5,000-viewer architecture per stream. WebRTC scales fine to 5,000 with a couple of well-placed SFUs. The same platform with one breakthrough event at 500,000 concurrent forces a CDN-based stack for that event.

Third, per stream matters for protocol choice; aggregate matters for capacity planning. The decision tree branches on peak concurrent viewers per stream. The bands:

Tens (1–100) — small meetings, intimate broadcasts, one-to-one. WebRTC peer-to-peer works.
Hundreds (100–1,000) — webinars, small classrooms, internal town halls. WebRTC with a single SFU works; LL-HLS over a CDN also works.
Thousands (1,000–10,000) — large webinars, mid-size live shopping. WebRTC with cascaded SFUs starts to get expensive; LL-HLS over a CDN becomes the cost-efficient default.
Tens of thousands (10,000–100,000) — large e-learning broadcasts, mid-size live sports. LL-HLS over a CDN is the default; WebRTC + WHEP is still possible but the SFU cost dominates.
Hundreds of thousands+ (100,000+) — major live sports, breakthrough live events, national broadcasts. LL-HLS or HLS over a multi-CDN is the only protocol shape that does not collapse.
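
A short sketch of the same banding, sized on the peak of a single stream rather than its average. The helper name `audience_band` is hypothetical, and because the bands above share their boundary values, the sketch assumes a boundary value belongs to the larger band (so 100,000 lands in "hundreds of thousands+").

```python
from bisect import bisect_right

# Upper edges of the first four bands; anything at or above the last edge
# falls into the open-ended top band.
_EDGES = [100, 1_000, 10_000, 100_000]
_BANDS = [
    "tens",
    "hundreds",
    "thousands",
    "tens of thousands",
    "hundreds of thousands+",
]


def audience_band(concurrent_samples):
    """Band one stream by its peak concurrency, ignoring the average."""
    peak = max(concurrent_samples)
    if peak < 1:
        raise ValueError("a stream needs at least one viewer")
    return _BANDS[bisect_right(_EDGES, peak)]


# A stream that idles at 100 viewers but spikes once is sized for the spike.
print(audience_band([100, 120, 100_000, 90]))  # hundreds of thousands+
```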

Input 3 — Device coverage requirement

Which devices do your viewers actually use? The answer drives protocol choice because not every protocol plays on every device.

The lattice you must memorise:

Apple devices (iOS, iPadOS, macOS Safari) — native HLS only. DASH cannot play natively. WebRTC plays in Safari, but with quirks that engineers without WebRTC scars routinely underestimate. If your audience is meaningfully Apple, HLS or LL-HLS is in the stack.
Non-Apple browsers (Chrome, Firefox, Edge, Opera, Brave) — HLS via hls.js, DASH via dash.js or Shaka, WebRTC natively. Both HLS and DASH are options; the choice is operational, not technical.
Smart TVs (Tizen, webOS, Roku, Android TV, Fire TV, Vidaa) — varies. Tizen and webOS prefer DASH; Roku prefers HLS; Android TV does both. The smart-TV players article maps the matrix in detail. For Smart TV reach, ship both HLS and DASH off the same CMAF source.
Embedded devices (set-top boxes, in-vehicle screens, surveillance NVRs) — vendor-specific. Validate per device. The default safe bet is HLS over HTTP/1.1, because it is the combination an embedded player is most likely to support.
Native mobile apps (iOS app, Android app) — your choice of player SDK. ExoPlayer on Android does HLS, DASH, and SmoothStreaming; AVPlayer on iOS does HLS only without a third-party library. Most apps ship HLS for both platforms for simplicity.

Two practical rules. First, if your audience is mixed, ship HLS — it is the only protocol that plays everywhere with no second variant. LL-HLS is HLS with extensions, so the same statement holds. Second, if you need WebRTC for any reason, you almost certainly need a non-WebRTC fallback for the long tail of devices that either do not support WebRTC well (older Smart TVs, set-top boxes) or where bandwidth-limited viewers benefit from the lower-bitrate rungs that a segmented protocol can serve.
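
The first rule can be checked mechanically by intersecting what each device class plays. The sketch below is illustrative: the `PLAYABLE` table is a simplified reading of the lattice above, the class names are made up for the example, and Smart TVs and embedded devices are left out because they need per-device validation.

```python
# Formats each device class can play, per the lattice above (simplified).
PLAYABLE = {
    "apple_browser": {"HLS"},          # native HLS only
    "other_browser": {"HLS", "DASH"},  # hls.js, dash.js or Shaka
    "ios_app": {"HLS"},                # AVPlayer without a third-party library
    "android_app": {"HLS", "DASH"},    # ExoPlayer (SmoothStreaming omitted)
}


def common_formats(device_classes):
    """Return the segmented formats that every listed device class can play."""
    classes = list(device_classes)
    if not classes:
        raise ValueError("at least one device class is required")
    return set.intersection(*(PLAYABLE[c] for c in classes))


print(common_formats(["apple_browser", "other_browser"]))  # {'HLS'}
```

As soon as an Apple class appears in the list, the intersection collapses to HLS, which is the "mixed audience ships HLS" rule in code form.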

Input 4 — Interactivity model

Does the viewer need to send anything back? The answer falls into one of three cases, and the implications are large.

Passive viewing — the viewer only receives. The protocol can be one-way, the CDN does most of the work, and the entire architecture is HTTP-based. HLS, LL-HLS, DASH, LL-DASH, HESP, and MoQ all fit.
Two-way audio / video — the viewer sends audio, video, or both back. The protocol must be bidirectional and real-time, which means WebRTC.
Two-way data only (chat, reactions, polls) — the video is one-way, but the viewer's interactions are real-time. This is the most common case for live shopping, live events with chat, and second-screen experiences. Solution: one-way video over LL-HLS or LL-DASH, plus a separate real-time data channel (WebSockets, WebRTC data channels, or a low-latency pub-sub). Do not pick WebRTC for the video just because the chat needs to be real-time — the chat does not affect the video protocol.

The classic mistake is treating "real-time chat" as a reason to ship WebRTC for the video. The chat is a separate channel with separate engineering. We have seen teams blow up their cloud bill by an order of magnitude because they conflated "real-time" with "video real-time".
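
One way to keep the two channels separate in a design document is to plan them as independent fields. The sketch below is a minimal illustration; the function name `plan_channels` and the three model labels are assumptions for the example.

```python
def plan_channels(interactivity):
    """Return separate video and data-channel choices for an interactivity model."""
    if interactivity == "passive":
        return {"video": "HTTP-based segmented delivery", "data": None}
    if interactivity == "two_way_av":
        return {"video": "WebRTC", "data": None}
    if interactivity == "two_way_data":
        # Real-time chat does not change the video protocol.
        return {
            "video": "LL-HLS or LL-DASH",
            "data": "WebSockets, WebRTC data channels, or pub-sub",
        }
    raise ValueError(f"unknown interactivity model: {interactivity!r}")


plan = plan_channels("two_way_data")
print(plan["video"])  # LL-HLS or LL-DASH
```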

Input 5 — Monetisation model

How does the stream make money? This determines what your delivery protocol must support.

Subscription (SVOD) — paid access only. You need DRM (digital rights management) on the player to prevent unauthorised redistribution. DRM rules out raw WebRTC delivery because WebRTC's encryption layer (SRTP) is not a DRM, and the three commercial DRMs (FairPlay, Widevine, PlayReady) are built around HLS and DASH with Common Encryption (CENC) packaging. See the DRM 101 article.
Ad-supported (AVOD or FAST) — free to watch, monetised by ads. You need a delivery protocol with a strong ad-insertion story. Server-side ad insertion (SSAI) is solidly supported on HLS and DASH; the SSAI article walks the spec. WebRTC has no native ad-insertion story; teams that need to monetise WebRTC streams with ads typically run a parallel HLS rendition for the broadcast tail and reserve WebRTC for the interactive top tier.
Transactional (TVOD) — pay per view. Same DRM story as SVOD. Same protocol implications.
Brand / free (no monetisation) — no DRM, no ad insertion. Any protocol works.

The summary rule: if you need DRM, HLS or DASH are in the stack. If you need ad insertion, HLS or DASH are in the stack. WebRTC is the answer for the interactive layer, almost never for the monetised layer.
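
The summary rule can be expressed as a check on a proposed stack. This is a sketch under stated assumptions: the names `REQUIREMENTS`, `SEGMENTED` and `check_monetisation` are hypothetical, and a stack is modelled simply as a set of protocol names.

```python
# What each monetisation model needs from the delivery layer.
REQUIREMENTS = {
    "svod": {"DRM"},
    "tvod": {"DRM"},
    "avod": {"ad insertion"},
    "fast": {"ad insertion"},
    "brand": set(),
}

# Protocols the article treats as carrying DRM and ad insertion.
SEGMENTED = {"HLS", "LL-HLS", "DASH", "LL-DASH"}


def check_monetisation(stack, models):
    """Return the requirements the stack cannot meet (empty set means OK)."""
    needed = set().union(*(REQUIREMENTS[m] for m in models))
    if needed and not (set(stack) & SEGMENTED):
        return needed
    return set()


print(check_monetisation({"WebRTC"}, ["svod"]))            # {'DRM'}
print(check_monetisation({"WebRTC", "LL-HLS"}, ["svod"]))  # set()
```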

Input 6 — Security and compliance posture

Who watches the stream and what happens if it leaks?

Public broadcast — no access control, no leak concern. Skip this input.
Authenticated access — the viewer must log in. Solution: signed URLs from the origin to the CDN edge, an authenticated session at the player. Works with every protocol on this list. The token authentication article covers the mechanics.
DRM-protected — see Input 5. HLS or DASH with CENC and the three DRMs.
Geo-restricted — viewers in some countries can watch, others cannot. CDN-based protocols (HLS, DASH) handle this at the edge via the CDN's geo-IP database. WebRTC handles this at the SFU or via authenticated signalling. The geo-blocking article walks the details.
Forensic watermarking required — high-value live sports, big-budget studio content. Forensic watermarking requires either server-side variant generation or client-side overlay; both are well-supported on HLS and DASH (see the forensic watermarking article). Forensic watermarking on WebRTC is technically possible but operationally unusual and is not a 2026 default.
HIPAA / regulated medical — telemedicine. The protocol itself is not the regulated artifact; the recording, transmission, and storage are. WebRTC with DTLS-SRTP is the default for live consultations because media is always encrypted in transit. Note that when an SFU is in the path, DTLS-SRTP terminates at the SFU, so the encryption is hop-by-hop rather than end-to-end. Recordings get re-encrypted at rest with the customer's KMS keys. The WebRTC security article covers the encryption story.

Input 7 — Team capability and operational budget

The most under-discussed input in every architecture document we audit, and the one that most often drives the final decision. The protocol your team can keep running at 3 AM is more important than the protocol that wins on paper.

Streaming-native team (≥ 3 engineers, 1+ year streaming experience) — any protocol on the list is a candidate. Pick on technical merit.
General backend team (1–2 engineers, no streaming background) — HLS or DASH over a managed CDN. The operational story is HTTP — your team already knows HTTP. WebRTC, MoQ, and SRT all require streaming-specific operational expertise to run reliably.
No engineering team (you are using a managed platform end-to-end) — your platform vendor's protocol. The choice is the vendor, not the protocol.
MoQ or HESP in production today — requires a dedicated streaming engineering function. MoQ is pre-RFC and ships with operational gaps that vendor engineers fill on first deployments; HESP requires a commercial player SDK from the HESP Alliance. Neither is a "throw it over the fence" technology in 2026.

The budget question pairs with the team question. A WebRTC stack with cascaded SFUs across three regions costs an order of magnitude more per concurrent viewer than an LL-HLS stack on a CDN. If your budget is "as little as possible per viewer", LL-HLS over a CDN is almost always the answer regardless of every other input. If your budget is "whatever it takes to ship the experience the product team described", the protocol that ships the experience wins.

Figure 2. The seven inputs to the delivery-protocol decision, arranged in the order the tree asks them. Each input's bands determine which branches of the tree apply.

The Decision Tree

The tree has eight nodes. Each node asks a single question. The path from the root to a leaf gives you a recommended stack. The leaves are explicit stacks — not single protocols — because every real product ships a stack.

Node 1 — What is the latency target?

The first question because it eliminates the most options the fastest.

Sub-second (under 1 second) → go to Node 4 (interactivity model). You are on a WebRTC path; the only question is what shape.
Low-latency (1–4 seconds) → go to Node 2 (audience scale). You are on an LL-HLS / LL-DASH path; the audience size determines whether a single-CDN or multi-CDN stack is right.
Standard-latency (4–15 seconds) → go to Node 2 (audience scale). You are on a classic HLS / DASH path.
VOD only → ship HLS + DASH from a CDN, with CMAF as the underlying packaging. The decision tree ends here for VOD; the deep choices are at the packaging deep-dive and the CDN economics article. Done.

Node 2 — How many peak concurrent viewers per stream?

If you took the LL-HLS/LL-DASH or HLS/DASH branch from Node 1:

Under 10,000 → single CDN, single-tier origin. Ship LL-HLS for the LL branch, HLS for the standard branch. Go to Node 3 (device coverage).
10,000 to 100,000 → single CDN with origin shielding, one origin replica per region. Ship the same protocols. Go to Node 3.
Over 100,000 → multi-CDN with content steering. Ship the same protocols. Go to Node 3.

If you took the WebRTC branch from Node 1, you are on a different path; jump to Node 4.
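
Node 2 for the HLS/DASH branches reduces to two thresholds. A minimal sketch, with the function name `delivery_topology` chosen for illustration:

```python
def delivery_topology(peak_concurrent):
    """Map peak concurrent viewers per stream to the Node 2 CDN topology."""
    if peak_concurrent < 1:
        raise ValueError("peak concurrency must be at least 1")
    if peak_concurrent < 10_000:
        return "single CDN, single-tier origin"
    if peak_concurrent <= 100_000:
        return "single CDN with origin shielding, one origin replica per region"
    return "multi-CDN with content steering"


print(delivery_topology(8_000))    # single CDN, single-tier origin
print(delivery_topology(500_000))  # multi-CDN with content steering
```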

Node 3 — What is the device coverage requirement?

For the HLS/DASH branches:

Apple-only → HLS or LL-HLS only. Skip DASH; you save the packaging step on one rendition path.
Non-Apple browsers only → DASH or LL-DASH (often cheaper than HLS at scale because the manifest is one XML file rather than one per-rendition M3U8).
Mixed (Apple + non-Apple) → both HLS and DASH off the same CMAF source. The CMAF article walks the packaging.
Smart TV reach required → both HLS and DASH off CMAF, plus a TV-side validation pass per major OS.
Embedded devices in scope → HLS over HTTP/1.1 as the fallback rendition.

Go to Node 5.

Node 4 — What is the interactivity model? (WebRTC branch)

If you took the sub-second branch:

Two-way audio / video → WebRTC with an SFU. SFU choice from the SFU comparison article. For ingest, WebRTC native or WHIP per RFC 9725.
One-to-many broadcast, sub-second → WebRTC with WHEP for egress per draft-ietf-wish-whep-03, plus an SFU mesh. Scales to tens of thousands per stream with multi-region SFU cascading.
One-to-many broadcast, with a non-interactive tail → hybrid stack. WebRTC + WHEP for the sub-second tier (the small interactive audience), LL-HLS over a CDN for the broadcast tail (the large passive audience). See the hybrid stacks article.
Data-only interactivity (chat, polls) → you do not need WebRTC for the video. Go back to Node 1 and pick the LL-HLS branch; add a separate real-time data channel for the chat.

Go to Node 5, unless the data-only answer has already sent you back to Node 1.

Node 5 — What is the monetisation model?

For any branch that landed here:

Subscription, transactional, or any DRM requirement → confirm HLS or DASH in the stack. WebRTC alone does not carry DRM. If you are on a WebRTC-only path and need DRM, redesign: WebRTC for the interactive layer, HLS or DASH for the monetised broadcast layer.
Ad-supported (SSAI or CSAI) → confirm HLS or DASH in the stack. The ad-insertion ecosystem is concentrated on these protocols.
Brand / free → any stack works; no constraint added here.

Go to Node 6.

Node 6 — Security and compliance

For any branch:

Geo-restriction needed → CDN-based stacks (HLS, DASH, LL-HLS, LL-DASH) handle this at the edge. WebRTC needs application-layer geo-blocking at the SFU or in signalling.
Forensic watermarking → HLS or DASH with A/B variant streaming.
HIPAA / regulated medical → WebRTC with DTLS-SRTP for live; encrypted-at-rest with customer KMS for recordings.

Go to Node 7.

Node 7 — Team capability and budget

For any branch:

Streaming-native team, healthy budget → ship the technically-best stack from Nodes 1–6. The tree's recommendation is the final answer.
General backend team → simplify the stack. Drop the second protocol if you can (e.g. if you were going to ship both HLS and DASH, pick the one that covers 90% of your audience and ship one). Use a managed CDN, a managed origin, and a managed transcoder.
No team (managed platform) → the platform's protocol stack. The choice is the platform; the tree has already given you the criteria to evaluate the platform on.
Budget-constrained → drop the most expensive component. If WebRTC was in the stack for sub-second interactivity, downgrade to LL-HLS at 2–4 seconds and remove the WebRTC layer. The cost saving is large; the user-experience cost is bounded.

Go to Node 8.
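
The two simplification moves at Node 7 (drop the second segmented format, and trade WebRTC for LL-HLS when the budget is tight) can be sketched as a transformation over a list of protocol names. The function names, the team labels and the `dominant_family` parameter are assumptions for the example.

```python
def family(protocol):
    """Group LL- and classic variants under one format family."""
    if protocol.endswith("HLS"):
        return "HLS"
    if protocol.endswith("DASH"):
        return "DASH"
    return protocol


def apply_node7(protocols, team, budget_constrained=False, dominant_family="HLS"):
    """Simplify a protocol list for the team and budget that will run it."""
    if dominant_family not in ("HLS", "DASH"):
        raise ValueError("dominant_family must be 'HLS' or 'DASH'")
    result = list(protocols)
    families = {family(p) for p in result}
    # General backend team: keep only the format that covers most of the audience.
    if team == "general_backend" and {"HLS", "DASH"} <= families:
        dropped = "DASH" if dominant_family == "HLS" else "HLS"
        result = [p for p in result if family(p) != dropped]
    # Budget-constrained: remove WebRTC and fall back to LL-HLS at 2-4 seconds.
    if budget_constrained and "WebRTC" in result:
        result = [p for p in result if p != "WebRTC"]
        if "LL-HLS" not in result:
            result.append("LL-HLS")
    return result


print(apply_node7(["LL-HLS", "LL-DASH"], "general_backend"))  # ['LL-HLS']
```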

Node 8 — Future-proofing

The optional final node. For any branch:

MoQ pilot in scope → run MoQ in parallel with the production stack for a 6–12 month evaluation. Do not replace the production stack with MoQ in 2026; the Media over QUIC article explains why pre-RFC technology in production is a quarter-to-quarter risk. The May 2026 draft of draft-ietf-moq-transport is the current document of record; expect 1–2 more drafts before the working group ships an RFC. Cloudflare, Meta, and Google have early production deployments; their public commentary is the best benchmark of where the protocol is reliable.
HESP evaluation in scope → license a player SDK from the HESP Alliance, evaluate against the same LL-HLS stack you would otherwise ship, decide on the 100–400 ms latency gain vs the player-SDK licensing cost.
No future-proofing layer required → ship the stack from Node 7. Plan to revisit in 12 months.

The tree ends. The path you walked from Node 1 to Node 8 is the recommended stack.
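
For readers who prefer code to diagrams, here is a compact sketch of the walk. It is deliberately partial: it covers Nodes 1 to 5 and the format-dropping rule from Node 7, skips Nodes 6 and 8, and does not model the hybrid case where one product has two latency tiers. The `Inputs` fields and the returned strings are assumptions for the example, and the Node 7 step assumes HLS is the format that covers most of the audience.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Inputs:
    latency_s: Optional[float]  # glass-to-glass target; None means VOD
    peak_viewers: int           # peak concurrent viewers per stream
    devices: Set[str]           # subset of {"apple", "non_apple", "smart_tv", "embedded"}
    interactivity: str          # "passive", "two_way_av" or "data_only"
    needs_drm: bool = False
    needs_ads: bool = False
    general_backend_team: bool = False


def recommend_stack(inp: Inputs) -> List[str]:
    # Node 1: VOD ends the walk immediately.
    if inp.latency_s is None:
        return ["HLS", "DASH", "CMAF packaging", "CDN"]

    stack: List[str] = []
    sub_second = inp.latency_s < 1

    # Node 4: data-only interactivity is sent back to the LL-HLS branch.
    if sub_second and inp.interactivity != "data_only":
        if inp.interactivity == "two_way_av":
            stack.append("WebRTC with an SFU")
        else:
            stack.append("WebRTC + WHEP with an SFU mesh")
    else:
        prefix = "LL-" if inp.latency_s <= 4 else ""
        # Node 2: CDN topology from peak concurrency.
        if inp.peak_viewers < 10_000:
            topology = "single CDN, single-tier origin"
        elif inp.peak_viewers <= 100_000:
            topology = "single CDN with origin shielding"
        else:
            topology = "multi-CDN with content steering"
        # Node 3: packaging formats from device coverage.
        if inp.devices == {"apple"}:
            formats = ["HLS"]
        elif inp.devices == {"non_apple"}:
            formats = ["DASH"]
        else:
            formats = ["HLS", "DASH"]
        # Node 7 (partial): a general backend team drops the second format.
        if inp.general_backend_team and len(formats) == 2:
            formats = ["HLS"]  # assumption: HLS covers about 90% of the audience
        stack += [prefix + f for f in formats]
        if len(formats) == 2:
            stack.append("common CMAF source")
        if "smart_tv" in inp.devices:
            stack.append("TV validation pass per major OS")
        if "embedded" in inp.devices:
            stack.append("HLS over HTTP/1.1 fallback rendition")
        stack.append(topology)
        if inp.interactivity == "data_only":
            stack.append("separate real-time data channel")

    # Node 5: DRM and ad insertion need HLS or DASH somewhere in the stack.
    has_segmented = any(s.endswith(("HLS", "DASH")) for s in stack)
    if (inp.needs_drm or inp.needs_ads) and not has_segmented:
        stack.append("HLS or DASH layer for the monetised tier")
    return stack


# A 30,000-student lecture with chat, run by a general backend team.
lecture = Inputs(3, 30_000, {"apple", "non_apple", "smart_tv"}, "data_only",
                 general_backend_team=True)
print(recommend_stack(lecture))
```

The output is a list of stack components rather than a single protocol name, which mirrors the point that every leaf of the tree is a combination.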

Eight Worked Scenarios

The decision tree is the scaffold. Worked examples are how you internalise it. Here are eight scenarios we audit most often, with the path walked and the resulting stack.

Scenario A — Live sports OTT, 500,000 peak concurrent

Inputs: latency 3–5 seconds (low-latency, but not sub-second — the viewer cannot tell whether the goal was 1 or 3 seconds ago), 500,000 peak concurrent, mixed device coverage (Apple + non-Apple browsers + Smart TVs), passive viewing, ad-supported plus subscription tiers, geo-restricted, streaming-native team, healthy budget.

Path: Node 1 → low-latency. Node 2 → over 100,000, so multi-CDN. Node 3 → mixed, so HLS + DASH off CMAF, plus Smart TV validation. Node 5 → DRM in scope (subscription tier), SSAI in scope (ad tier), so confirm HLS + DASH. Node 6 → geo-restriction at CDN edge. Node 7 → streaming-native, ship the technical stack. Node 8 → no MoQ pilot in production yet; revisit in 12 months.

Stack: LL-HLS + LL-DASH off a common CMAF source, served over multi-CDN with content steering, FairPlay (Apple) + Widevine + PlayReady DRM, SSAI for ad insertion, geo-IP at the CDN edge. Origin: clustered with origin shielding and per-region replicas.

Why not WebRTC: the user-experience cost of 3 seconds vs sub-second is zero for a passive sports viewer; the infrastructure cost is 10–20× higher for WebRTC at this scale. The arithmetic is in the streaming cost economics article.

Scenario B — Live shopping with two-way audio, 5,000 peak concurrent

Inputs: latency sub-second for the host's interactive call-ins (a viewer says "show me the back of the bag" and expects a real response), 5,000 peak concurrent on the broadcast, mixed device coverage, two-way audio (the viewer occasionally joins the call), ad-supported (sponsored products in the broadcast), no DRM, public broadcast, streaming-native team.

Path: Node 1 → mixed — sub-second for the interactive caller, low-latency for the broadcast tail. This is the canonical hybrid stack case. Node 4 → one-to-many broadcast with a non-interactive tail. Node 5 → no DRM, ads OK on the LL-HLS tail. Node 7 → ship the technical stack.

Stack: WebRTC + WHEP for the interactive tier (the host + the small set of viewers who join the call), LL-HLS over a CDN for the broadcast tail (the 5,000 passive viewers), plus a separate WebSocket channel for chat. Cross-tier sync: the LL-HLS stream is ~2 seconds behind the WebRTC tier, so chat messages from the broadcast tail react to what the host said about 2 seconds earlier. Acceptable for live shopping. See the hybrid stacks article.

Why not WebRTC for the whole tail: 5,000 concurrent WebRTC viewers cost roughly 10× more than 5,000 LL-HLS viewers on a CDN. The viewer who is not on the call cannot tell that one tier is at 200 ms and the other at 2 seconds.

Scenario C — Telemedicine consultation, 1-on-1 to small group

Inputs: latency sub-second (the doctor and patient are talking), 2–6 concurrent participants, mixed device coverage, two-way audio + video, no monetisation per call (paid via a separate flow), HIPAA compliance, recording required for archive, streaming-native team for the platform vendor.

Path: Node 1 → sub-second, WebRTC branch. Node 4 → two-way audio/video. Node 6 → HIPAA-regulated. Node 7 → vendor-team for the platform; the customer integrates a hosted service.

Stack: WebRTC with DTLS-SRTP, a LiveKit or mediasoup SFU, TURN servers for NAT traversal, recordings written to encrypted-at-rest object storage with the customer's KMS. No HLS or DASH layer.

Why no broadcast tier: there is no broadcast — this is a 1-on-1 or small-group consultation. The architecture is purely WebRTC.

Scenario D — Stadium contribution feed to broadcast tower

Inputs: this is contribution, not delivery, but teams run it through the delivery tree all the time. Latency sub-second (the broadcaster needs the feed live), 1 viewer (the broadcaster's ingest), a lossy network that needs a packet-loss-tolerant protocol (4G/5G from the stadium), encryption required.

Path: the tree is for delivery. For contribution, see the ingest decision tree article. The short answer: SRT for the cellular contribution leg, with RIST as the broadcaster-grade alternative. RTMPS only if the camera does not speak SRT.

Mentioned here only because we get the question via the delivery tree at least quarterly. The two trees compose: a stadium feed contributed over SRT, transcoded at the origin, delivered as LL-HLS to viewers.

Scenario E — E-learning broadcast, 30,000 students, classroom + recording

Inputs: latency 2–4 seconds (students need the lecture in sync with the chat; sub-second is overkill), 30,000 peak concurrent at exam-week, mixed device coverage (laptops + phones + occasional Smart TV), passive viewing for the broadcast, chat for the interactivity layer, no monetisation per stream (institutional subscription), authenticated access, recording for VOD library after the live broadcast, general-backend team (the university IT team is not a streaming-native team).

Path: Node 1 → low-latency. Node 2 → 30,000 concurrent, single CDN with origin shielding. Node 3 → mixed device coverage, HLS + DASH off CMAF. Node 4 → data-only interactivity for the chat — the video stays one-way. Node 5 → no per-stream monetisation. Node 6 → authenticated access via signed URLs. Node 7 → general backend team, simplify. Drop DASH if 90% of the students are on Apple + Chrome devices that play HLS well. Use a managed CDN and a managed transcoder. Node 8 → no future-proofing.

Stack: LL-HLS only (drop DASH for simplicity), over a managed CDN with origin shielding, signed-URL authentication, a separate real-time chat channel via WebSockets, a post-broadcast VOD recording packaged as classic HLS for the library.

Why not the full HLS + DASH stack: the general-backend team will struggle to keep two formats in sync; the operational cost of debugging "the DASH player on Smart TVs is one rendition behind the HLS player on iPhones at 3 AM" is real. Pick one format and cover 90% of the audience well.

Scenario F — Corporate town hall, 8,000 employees, internal network

Inputs: latency 5–10 seconds (the CEO is talking; no one needs sub-second), 8,000 peak concurrent on the corporate WAN + remote employees, mixed device coverage (mostly Windows + Chrome), passive viewing with Q&A via chat, no monetisation, authenticated access only, general-backend team running the IT stack.

Path: Node 1 → standard-latency. Node 2 → under 10,000, single CDN or internal eCDN. Node 3 → Chrome-heavy, DASH works fine. Node 5 → no monetisation. Node 6 → authenticated only. Node 7 → general backend, simplify.

Stack: classic DASH over a managed eCDN (enterprise CDN — peer-assisted delivery for corporate networks), SSO-based authentication via the corporate IdP, a WebSocket chat channel for Q&A. No LL-HLS; standard latency is fine. No WebRTC; the Q&A is text.

Why not HLS: Apple penetration on the corporate fleet is low; DASH covers Chrome/Edge well and is the more common enterprise default. The eCDN cuts WAN bandwidth by 70–90% for large internal broadcasts; the CDN article covers the math.

Scenario G — Multiplayer game stream with overlay interactions, 50,000 peak

Inputs: latency sub-second for the players' interactive layer, 1–4 seconds for the spectators, 50,000 spectators at peak (occasional breakthrough event), mixed device coverage, two-way data interactivity for the players (the spectators can vote on in-stream events), no DRM, ad-supported on the spectator tier, streaming-native team, healthy budget.

Path: Node 1 → mixed: sub-second for players, low-latency for spectators. Hybrid stack. Node 4 → one-to-many broadcast with sub-second top tier. Node 5 → ad-supported, confirm HLS or DASH for the spectator tier. Node 7 → streaming-native, ship the technical stack. Node 8 → MoQ pilot for the spectator tier is a candidate.

Stack: WebRTC + WHEP for the players (sub-second), LL-HLS for the spectator tier (1–4 seconds), SSAI on the spectator tier, a real-time data channel (WebSockets or WebRTC data channels) for the voting overlay. Optional MoQ pilot for the spectator tier, run in parallel with LL-HLS for 6–12 months.

Why a MoQ pilot: spectator audiences for game streams are the cleanest fit for MoQ's "low latency at HLS scale" promise. The 2026 risk is the protocol's pre-RFC status — run it as a parallel pilot, never as the only spectator tier.

Scenario H — Low-bandwidth field reporter contribution + local viewing

Inputs: latency 2–4 seconds for the viewers, 200 peak concurrent viewers in the reporter's region, mobile-heavy device coverage, passive viewing, brand / no monetisation, authenticated viewing only, general-backend team.

Path: the contribution side is again on the ingest tree; the answer there is SRT or WHIP over the cellular network. For delivery: Node 1 → low-latency. Node 2 → under 10,000, single CDN. Node 3 → mixed mobile. Node 5 → no monetisation. Node 7 → general backend, simplify.

Stack: LL-HLS only, mobile-optimised bitrate ladder (5 rungs from 240p to 720p), managed CDN, signed-URL authentication.

Why not WebRTC: the latency target is 2–4 seconds and there is no interactivity. WebRTC adds operational cost without delivering UX value.

Figure 3. Eight worked scenarios with the recommended stack for each, colour-coded by dominant protocol family.

Eight Common Mistakes

Every architecture review surfaces the same eight mistakes. Most architecture documents make at least two of them. The skill is recognising the mistake fast enough to redirect before the team has shipped a quarter's worth of code against the wrong protocol.

Mistake 1 — Picking WebRTC for a passive broadcast

Symptom: the architecture document says "we picked WebRTC for low latency". The use case is a passive sports broadcast at 100,000+ concurrent viewers.

Why it is wrong: WebRTC's per-session cost is dominated by the SFU and the bandwidth-estimation logic for each connection. At 100,000 concurrent viewers, the cost is 10–20× a CDN-cached LL-HLS stack. The viewer cannot tell the difference between 300 ms and 3 seconds for a passive broadcast.

Redirect: run the streaming cost economics calculator on both stacks at the actual peak concurrent number. Show the team the gap.
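
If the calculator is not to hand, the comparison can be roughed out in a few lines. This is a sketch, not the calculator itself: it assumes cost scales linearly with viewer-hours, and the unit costs passed in the example are placeholders in arbitrary units, not real prices. Substitute your own vendor quotes.

```python
def compare_costs(peak_viewers, hours, webrtc_per_viewer_hour, cdn_per_viewer_hour):
    """Compare total delivery cost of a WebRTC stack and a CDN-cached stack.

    Assumption: cost is linear in viewer-hours at the peak audience size.
    """
    viewer_hours = peak_viewers * hours
    webrtc_total = viewer_hours * webrtc_per_viewer_hour
    cdn_total = viewer_hours * cdn_per_viewer_hour
    return {
        "webrtc": webrtc_total,
        "ll_hls": cdn_total,
        "ratio": webrtc_total / cdn_total,
    }


# Placeholder unit costs (15 vs 1, arbitrary units) for a 2-hour event.
result = compare_costs(100_000, 2, webrtc_per_viewer_hour=15, cdn_per_viewer_hour=1)
print(result["ratio"])  # 15.0
```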

Mistake 2 — Picking HLS for a sub-second interactive use case

Symptom: the team picks HLS or LL-HLS for a live auction or a real-time betting stream because "the rest of the company uses HLS".

Why it is wrong: classic HLS sits in the 4–15 second band of this tree, and even LL-HLS's 1–2 second floor is too high for the genuinely interactive use cases. A bid or bet that arrives 2 seconds late has already lost.

Redirect: measure the latency budget explicitly. If the use case has a deadline of 500 ms or less for round-trip, WebRTC is the only fit.
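
Measuring the budget explicitly means summing every stage between the lens and the screen and comparing the total with the deadline. The sketch below treats the deadline as a glass-to-glass budget, which is an assumption; the per-stage figures are illustrative placeholders, not measurements of any real stack.

```python
def check_latency_budget(stages_ms, deadline_ms):
    """Sum per-stage latencies and report whether they fit the deadline."""
    total = sum(stages_ms.values())
    largest = max(stages_ms, key=stages_ms.get)
    return {"total_ms": total, "fits": total <= deadline_ms, "largest_stage": largest}


# Illustrative figures only, for a segmented HTTP-based pipeline.
example_stages = {
    "capture": 30,
    "encoding": 200,
    "contribution": 100,
    "packaging": 500,
    "cdn_distribution": 200,
    "player_buffer": 1000,
    "decoding": 30,
    "rendering": 20,
}

report = check_latency_budget(example_stages, deadline_ms=500)
print(report)  # {'total_ms': 2080, 'fits': False, 'largest_stage': 'player_buffer'}
```

Naming the largest stage is the useful part: it shows whether the gap can be closed by tuning or whether the stack has to change.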

Mistake 3 — Picking MoQ in production today

Symptom: the architecture says "we picked MoQ for future-proofing".

Why it is wrong: MoQ is pre-RFC. The current draft (draft-ietf-moq-transport published 2026-05-01) is the working group's document of record but is subject to change before publication as an RFC. Vendor SDKs are early. Operational tooling is sparse. CDN support is limited to QUIC-aware CDNs.

Redirect: run MoQ as a parallel pilot alongside a production LL-HLS or WebRTC stack. Do not let MoQ be the only path to the viewer in 2026.

Mistake 4 — Conflating chat-real-time with video-real-time

Symptom: the team picks WebRTC for video because the product needs real-time chat.

Why it is wrong: chat and video are separate channels with separate engineering. Real-time chat is a WebSocket or a WebRTC data channel — it does not require WebRTC for the video.

Redirect: split the channels. Pick the video protocol on its own merits; pick the chat protocol on its own merits. The two cost models add, not multiply.

Mistake 5 — Ignoring the device coverage matrix

Symptom: the architecture picks DASH only, then discovers two months in that Safari users on iPhone cannot play the stream.

Why it is wrong: Apple devices play HLS natively, not DASH. A DASH-only stack excludes the entire Apple ecosystem on the browser side.

Redirect: validate the device coverage matrix before picking the protocol. If Apple is in the audience, HLS or LL-HLS is in the stack.
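A coverage check of this kind can be automated before the protocol decision is signed off. The following is a rough sketch under stated assumptions: the platform names and the contents of the matrix are placeholders for illustration (only the Apple row mirrors the claim made above), and a real matrix should come from your own device testing.

```python
# Hypothetical playback matrix: platform -> protocols a viewer there can play.
# Only the Apple row reflects this article's claim; the rest are placeholders.
PLAYABLE = {
    "safari_iphone": {"hls"},
    "chrome_android": {"hls", "dash"},
    "chrome_desktop": {"hls", "dash"},
}


def uncovered_platforms(stack: set[str], audience: list[str]) -> list[str]:
    """List audience platforms that cannot play any protocol in the stack."""
    return [platform for platform in audience if not (PLAYABLE[platform] & stack)]


audience = list(PLAYABLE)
print("DASH only:", uncovered_platforms({"dash"}, audience))
print("DASH + HLS:", uncovered_platforms({"dash", "hls"}, audience))
```

With the placeholder matrix, the DASH-only stack leaves the iPhone Safari row uncovered, and adding HLS to the stack closes the gap.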

Mistake 6 — Forgetting DRM in the protocol choice

Symptom: the team picks WebRTC for delivery, then realises in month 6 that the studios require Widevine or FairPlay DRM.

Why it is wrong: WebRTC does not carry the three commercial DRMs (Widevine, FairPlay, and PlayReady). They are built around HLS and DASH with CENC packaging.

Redirect: ask the monetisation team about DRM requirements at Node 5 of the tree, before the protocol decision is final.

Mistake 7 — Under-estimating multi-CDN at scale

Symptom: the team ships single-CDN at 100,000+ concurrent, then has a regional outage and loses 30% of the audience for an hour.

Why it is wrong: a single CDN at six-figure concurrency is an availability risk. Multi-CDN with content steering is the standard answer at that scale, and it is worth evaluating from about 50,000 concurrent viewers upward.

Redirect: at Node 2 of the tree, anything over 100,000 is multi-CDN territory by default. The cost is 10–20% above single-CDN; the availability gain is large.

Mistake 8 — Picking the protocol the team likes rather than the protocol the use case demands

Symptom: the team has WebRTC engineers, so every architecture document includes WebRTC.

Why it is wrong: the protocol the team is most comfortable with is not automatically the right protocol for the use case. WebRTC engineers will architect WebRTC stacks; HLS engineers will architect HLS stacks. The use case must drive the protocol, not the other way round.

Redirect: walk the tree blind to team capability for Nodes 1–6. Add team capability only at Node 7. Make team capability a constraint on simplification, not a driver of the protocol choice.

A Worked Numeric Example — Per-Viewer Cost At Scale

The single most under-modelled number in any architecture decision is per-viewer cost at peak. Let us do the math for one canonical scenario: 100,000 peak concurrent viewers on a sports OTT, average bitrate 4 Mbps (a 1080p H.264 stream), broadcast duration 2 hours.

Bandwidth per viewer-hour:

Bitrate × seconds-in-an-hour ÷ bits-per-byte
4,000,000 bits/s × 3,600 s/hr ÷ 8 bits/byte
= 1,800,000,000 bytes/hour
= 1.8 GB/hour per viewer

Total bandwidth for the 2-hour broadcast:

100,000 viewers × 1.8 GB/hr × 2 hours
= 360,000 GB
= 360 TB

Per-viewer cost on an LL-HLS-over-CDN stack at a 2026-typical CDN rate of $0.012 per GB at a 500 TB monthly commit tier:

360 TB × 1000 GB/TB × $0.012/GB
= $4,320 for the broadcast
÷ 100,000 viewers
= $0.043 per viewer for the 2-hour broadcast

Per-viewer cost on a WebRTC-over-SFU stack at a 2026-typical $0.40/hour per concurrent participant on a managed SFU (LiveKit Cloud, Mux Real-Time, Daily, similar):

100,000 viewers × $0.40/hr × 2 hours
= $80,000 for the broadcast
÷ 100,000 viewers
= $0.80 per viewer for the 2-hour broadcast

The ratio: $0.80 ÷ $0.043 ≈ 19× higher per viewer on WebRTC. The arithmetic is brutal at scale. For a passive sports broadcast where the viewer cannot perceive the latency difference between 300 ms and 3 seconds, the WebRTC stack wastes about $76,000 per broadcast ($80,000 − $4,320). Over a season of 30 broadcasts, that is roughly $2.3 million in cloud bills with no user-experience gain.
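The same arithmetic is easy to rerun for your own peak numbers. The script below is a small sketch of the model used in this example, not a pricing tool: the class and function names are hypothetical, the two rates are the 2026-typical figures quoted above, and it assumes a constant bitrate, decimal gigabytes, and every viewer watching for the full duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Broadcast:
    peak_viewers: int
    bitrate_bps: int       # average delivered bitrate, bits per second
    duration_hours: float


def gb_per_viewer_hour(bitrate_bps: int) -> float:
    """Decimal gigabytes one viewer pulls per hour at a constant bitrate."""
    return bitrate_bps * 3600 / 8 / 1e9


def cdn_cost(b: Broadcast, usd_per_gb: float) -> float:
    """Egress-priced stack (e.g. LL-HLS over a CDN): pay per gigabyte."""
    total_gb = b.peak_viewers * gb_per_viewer_hour(b.bitrate_bps) * b.duration_hours
    return total_gb * usd_per_gb


def sfu_cost(b: Broadcast, usd_per_participant_hour: float) -> float:
    """Session-priced stack (e.g. a managed SFU): pay per participant-hour."""
    return b.peak_viewers * usd_per_participant_hour * b.duration_hours


# Inputs from the worked example; both rates are the article's assumptions.
sports_ott = Broadcast(peak_viewers=100_000, bitrate_bps=4_000_000, duration_hours=2)
cdn = cdn_cost(sports_ott, usd_per_gb=0.012)
sfu = sfu_cost(sports_ott, usd_per_participant_hour=0.40)

print(f"CDN: ${cdn:,.0f} total, ${cdn / sports_ott.peak_viewers:.3f} per viewer")
print(f"SFU: ${sfu:,.0f} total, ${sfu / sports_ott.peak_viewers:.3f} per viewer")
print(f"Ratio: {sfu / cdn:.1f}x, gap ${sfu - cdn:,.0f} per broadcast")
```

Run with the example inputs, the unrounded ratio comes out near 18.5×, which is the ≈ 19× quoted above once the per-viewer figures are rounded. Changing `peak_viewers` or the two rates is usually enough to show a team where its own crossover sits.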

This is the math behind every "stop, do not ship WebRTC for the broadcast tier" recommendation we make in audits. The decision tree's Node 1 → low-latency branch exists precisely to keep this mistake from being made.

Where Fora Soft Fits In

We have shipped streaming, WebRTC, OTT, conferencing, telemedicine, e-learning, and surveillance stacks since 2005 across 239+ projects. The decision tree in this article is the same tree we walk every new architecture conversation through — for live-shopping platforms in the 5,000-concurrent tier, telemedicine providers under HIPAA, e-learning broadcasters at exam-week peaks, and OTT operators at the six-figure-concurrent broadcast tier. We have audited stacks that picked WebRTC for the broadcast tail (and helped migrate them to LL-HLS, cutting cloud spend by an order of magnitude). We have audited HLS-only stacks for live auctions (and helped them migrate to WebRTC + LL-HLS hybrid, reclaiming the auction-window latency). The pattern in both audits is the same: the protocol fits the use case, or the cloud bill or the user experience eventually demands a rework. The tree exists to avoid the rework.

Call to action

Talk to a streaming engineer — book a 30-minute scoping call to talk through your delivery-protocol decision.
See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
Download the Delivery protocol decision worksheet — A printable, single-page poster of the eight-node decision tree, sized for an architecture-review-room wall.

References

IETF RFC 8216 — HTTP Live Streaming (R. Pantos, W. May, August 2017). The canonical HLS specification. Defines the M3U8 playlist format, the segment structure, the EXT-X-* tags. https://www.rfc-editor.org/rfc/rfc8216

Apple HLS Authoring Specification for Apple Devices, revision 2025-09. The controlling document for LL-HLS extensions to RFC 8216 (parts, preload hints, blocking playlist reloads, rendition reports). Apple removed HTTP/2 server push from the spec in the 2023 revision; older articles describing push as required are out of date. https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices

ISO/IEC 23009-1:2022 — Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats. The canonical DASH specification. (Paywalled at the ISO catalogue; DASH-IF Implementation Guidelines mirror the normative behaviour for free.) https://www.iso.org/standard/83314.html

ISO/IEC 23000-19:2024 — Information technology — Multimedia application format (MPEG-A) — Part 19: Common media application format (CMAF) for segmented media. The CMAF specification. https://www.iso.org/standard/85623.html

IETF RFC 9725 — WebRTC-HTTP Ingestion Protocol (WHIP) (S. Murillo, A. Gouaillard, March 2025). The standardised HTTP-based ingest protocol for WebRTC, supported by encoders and platforms including OBS, Larix, FFmpeg, Cloudflare, Dolby, and AWS for interoperable WebRTC contribution. https://www.rfc-editor.org/rfc/rfc9725

draft-ietf-wish-whep-03 — WebRTC-HTTP Egress Protocol (WHEP) (2026). The egress companion to WHIP, still an IETF Internet-Draft. Subject to change before RFC publication. https://datatracker.ietf.org/doc/html/draft-ietf-wish-whep-03

draft-ietf-moq-transport (published 2026-05-01) — Media over QUIC Transport (MOQT). The working group's current draft of record, with intended status Standards Track and expiration 2026-11-02. Defines a publish/subscribe protocol over QUIC and WebTransport with streams, datagrams, priorities, and partial reliability, operating point-to-point and through relays. Subject to revision before RFC publication; expect 1–2 more drafts before the working group ships an RFC. https://datatracker.ietf.org/doc/draft-ietf-moq-transport/

draft-theo-hesp-06 — High Efficiency Streaming Protocol (HESP) (THEO Technologies / Dolby OptiView). The IETF Internet-Draft for HESP. Production deployments use commercial player SDKs from the HESP Alliance. https://datatracker.ietf.org/doc/draft-theo-hesp/

draft-sharabayko-srt — Secure Reliable Transport (SRT) Protocol (M. Sharabayko, M. Sharabayko, J. Kim). The IETF Internet-Draft for SRT, the dominant open contribution protocol over the public internet. https://datatracker.ietf.org/doc/draft-sharabayko-srt/

W3C WebRTC 1.0: Real-time Communication Between Browsers, W3C Recommendation. The browser-side WebRTC API specification. https://www.w3.org/TR/webrtc/

IETF RFC 8825 — Overview: Real-Time Protocols for Browser-Based Applications. The umbrella RFC for the WebRTC protocol family (RFC 8825 through RFC 8866 cover the full stack). https://www.rfc-editor.org/rfc/rfc8825

IETF RFC 9000 — QUIC: A UDP-Based Multiplexed and Secure Transport (J. Iyengar, M. Thomson, May 2021). The QUIC specification that underpins MoQ. https://www.rfc-editor.org/rfc/rfc9000

IETF RFC 9114 — HTTP/3 (M. Bishop, June 2022). The HTTP/3 specification, mapping HTTP onto QUIC. https://www.rfc-editor.org/rfc/rfc9114

DASH-IF Implementation Guidelines: Low-Latency Modes for DASH (DASH Industry Forum). The de-facto production profile for LL-DASH with CMAF chunked transfer. https://dashif.org/guidelines/

Cloudflare blog — Media over QUIC production deployments (2025–2026). First-party engineering posts from a moq-transport co-editor's organisation, covering the operational state of MoQ in production pilots. https://blog.cloudflare.com/

Bitmovin Video Developer Report 2025. Industry-wide adoption numbers for streaming protocols, DRM systems, and CDN strategies. https://bitmovin.com/video-developer-report-2025

Conviva State of Streaming 2025. Aggregated QoE benchmarks across the major streaming platforms, including by-protocol failure modes and latency distributions. https://www.conviva.com/state-of-streaming/

Apple HLS Authoring Specification — Change Log (the revision history Apple ships at the top of the spec). The source for the September 2023 removal of HTTP/2 push from LL-HLS. https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices

IETF moq working group charter. The charter that defines what MoQ is and is not, and what the working group is committed to delivering before RFC publication. https://datatracker.ietf.org/wg/moq/about/

Fora Soft internal architecture-review notes (2023–2026). Anonymised audit findings across 60+ live-streaming, OTT, conferencing, telemedicine, and live-shopping engagements. Source of the eight worked scenarios and the eight common-mistake patterns. Not externally cited.