Why This Matters
Most "we picked HLS" decisions are not real architecture decisions; they are summaries of the delivery leg of a stack that also carries RTMP on ingest, WebRTC for a moderator screen, and a sidecar WebSocket for chat. If you are a product manager, founder, or operations lead making a build-vs-buy call on a live-video product, you need to know that the answer is almost never one protocol — it is a stack — and the cost, latency, and scale profile of that stack depends on which legs you chose and how they hand off to each other. This article is the canonical Block 4 reference on hybrid stacks: what "hybrid" means in this context, which combinations are common in 2026, why they exist, how they hand off internally, and where they break. It complements the comparison-matrix article (4.13) and the decision-tree article (4.14) — the matrix gives you the per-protocol data; this article gives you the architecture that combines them.
What "Hybrid Stack" Actually Means
A streaming pipeline has four jobs to do: capture, contribute, deliver, and play. Each job can in principle use a different protocol, and in practice it almost always does. The term hybrid stack names the everyday case: more than one protocol live on the wire inside the same product.
The reason hybrid stacks are normal — not a sign of indecision — is the push–pull split the section's foundations article covers in detail. The job called contribution (also ingest) is the upstream leg from an encoder in a venue, a studio, or a phone to your media server. The job called distribution (also delivery or egress) is the downstream leg from the media server to every viewer. They have completely different requirements: contribution is one-to-one and tolerates a few hundred milliseconds of buffering; distribution is one-to-many and has to spread bytes over a public CDN at internet scale. A protocol optimised for one is rarely the right tool for the other.
This is why almost every architecture diagram you read in 2026 has at least two protocols. RTMPS or SRT carries the contribution leg from OBS or a hardware encoder up to the origin. HLS, LL-HLS, DASH, LL-DASH, WebRTC, or WHEP carries the distribution leg from the origin out to viewers. Inside the origin, the stream is repackaged once into a format both sides can read — typically CMAF fragmented MP4 segments. The architecture you ship is the choice of those two or three protocols and the repackaging in the middle, not the choice of "one" protocol.
The Four Patterns You Will Actually See
The combinatorial space is large, but four hybrid patterns cover more than ninety per cent of live-video products in 2026. Each pattern exists because a different combination of latency, scale, and cost is what the product needs.
Pattern A — Interactive top, broadcast tail (live shopping, auctions, town halls)
The use case: a host on camera, a handful of guests or buyers who join on camera, and a large passive audience that watches and types in chat. The latency requirement splits in two — the host and guests need sub-second round trip so the conversation does not collapse, and the passive audience needs the stream to land within two to four seconds so the latency does not break the chat experience.
The architecture: WebRTC or WHEP carries the interactive ring of host and guests; LL-HLS or LL-DASH carries the broadcast tail. Both legs feed off the same CMAF segments inside the origin, so the audio and video are bit-identical across the two surfaces — only the transport differs.
Why teams pick this stack: Mux's published case study with the Lyvecom shoppable-video platform makes the cost-of-latency argument concrete. When stream-to-chat lag passes ten seconds, conversion rates fall measurably, and the buyer experience stops feeling like a live event. Sub-second is overkill for the passive ninety per cent of viewers and would cost roughly three to ten times more than HTTP-cacheable delivery (the range used in this article's cost arithmetic); pure LL-HLS is too slow for the host and guests. Hybrid is the only stack that hits both targets within the same cost ceiling.
Pattern B — Broadcast main, interactive moderator (sports, news, premium live)
The use case: a one-to-many broadcast where the producer in the control room needs a sub-second preview of the on-air feed in order to make cut decisions, run a moderator panel, or sanity-check what is shipping to viewers. The viewer audience can ride LL-HLS at two to three seconds; the control room cannot.
The architecture: LL-HLS or LL-DASH for the public viewer tier; a private WHEP endpoint for the producer's preview surface. Same origin, same CMAF segments, two delivery formats — one of them deliberately not cached at the CDN.
Why teams pick this stack: building a parallel preview path on plain HLS adds an extra two-to-three-second blind spot to the control room, which is operationally unacceptable for live news or live sport. A second WebRTC publishing leg out of the origin is a small, bounded engineering cost — the producer audience is in the tens, not the millions — and it removes the blind spot for the people making the on-air decisions.
Pattern C — Real-time core, recording-and-replay tail (conferencing, telemedicine, e-learning)
The use case: a meeting, consultation, or class is run end-to-end on WebRTC because every participant is interactive. After the session, the recording has to be available as a normal on-demand video — clip it, share it, watch it on a phone three weeks later — and that means HLS or DASH.
The architecture: an SFU (Selective Forwarding Unit, the WebRTC topology covered in 8.4) carries the meeting. A side recorder subscribes to the SFU's tracks, composes them into a single mixed stream, and writes out fragmented MP4 plus an HLS manifest. After the session ends, the manifest goes to a CDN and the recording plays back from any browser, phone, or smart TV.
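The last step of that recorder, writing the HLS manifest for a finished session, can be sketched in a few lines. This is a minimal illustration rather than a production packager: the function name `build_vod_playlist`, the `init.mp4` initialisation segment, and the `seg_*.m4s` file names are assumptions made for the example, and the segment durations are invented. The playlist tags themselves are standard HLS media-playlist tags from RFC 8216.

```python
import math
from typing import List, Tuple


def build_vod_playlist(init_uri: str, segments: List[Tuple[str, float]]) -> str:
    """Return an HLS media playlist for a finished fMP4 recording.

    `segments` is a list of (uri, duration_in_seconds) pairs in playback order.
    """
    if not segments:
        raise ValueError("a recording needs at least one segment")

    # Rounding up keeps the integer target duration at or above every segment.
    target = max(math.ceil(duration) for _, duration in segments)

    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        f'#EXT-X-MAP:URI="{init_uri}"',  # fMP4 initialisation segment
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")  # marks the recording as complete
    return "\n".join(lines) + "\n"


# Hypothetical three-segment recording.
playlist = build_vod_playlist(
    "init.mp4",
    [("seg_00000.m4s", 4.0), ("seg_00001.m4s", 4.0), ("seg_00002.m4s", 2.48)],
)
print(playlist)
```

Once the playlist ends with `#EXT-X-ENDLIST`, the manifest and its segments are static files, which is what lets the replay be served from commodity HTTP storage and a CDN.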
Why teams pick this stack: WebRTC is the right tool for the live meeting (the conferencing-grade SFU article covers why); HLS is the right tool for the replay, because the cost of playback over a long tail (months or years after the session) on commodity HTTP is roughly two orders of magnitude lower than keeping a WebRTC media server alive for every replay. The protocols are not in competition here — they are doing different jobs at different points in time.
Pattern D — MoQ trialled in parallel with LL-HLS (future-leaning live)
The use case: a team that wants to be ready for the Media over QUIC transition without making MoQ a single point of failure today. Ship the proven LL-HLS stack to the entire audience and run a small percentage of traffic over MoQ as a parallel A/B.
The architecture: encoder pushes RTMPS or WHIP to the origin; origin produces both a CMAF + LL-HLS manifest and a parallel MoQ relay subscription. Players that opt in to MoQ (typically via a feature flag) connect over QUIC; everyone else lands on LL-HLS. The metrics dashboard records glass-to-glass latency, startup time, and rebuffering ratio separately for each leg.
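One way to picture the feature flag and the per-leg dashboard is the sketch below. It assumes a hash-based percentage rollout, which is a common flagging technique but not something the MoQ drafts prescribe; the 5 per cent share, the `delivery_leg` function, the `LegMetrics` class, and the sample latency values are all hypothetical.

```python
import hashlib
from collections import defaultdict
from statistics import mean

MOQ_ROLLOUT_PERCENT = 5  # hypothetical share of viewers sent to the MoQ leg


def delivery_leg(viewer_id: str, rollout_percent: int = MOQ_ROLLOUT_PERCENT) -> str:
    """Deterministically assign a viewer to the 'moq' or 'll-hls' leg."""
    digest = hashlib.sha256(viewer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "moq" if bucket < rollout_percent else "ll-hls"


class LegMetrics:
    """Keeps playback samples separate for each delivery leg."""

    def __init__(self) -> None:
        self._samples = defaultdict(list)

    def record(self, leg: str, metric: str, value: float) -> None:
        self._samples[(leg, metric)].append(value)

    def average(self, leg: str, metric: str) -> float:
        return mean(self._samples[(leg, metric)])


metrics = LegMetrics()
for viewer, latency_s in [("alice", 0.6), ("bob", 2.4), ("carol", 2.9)]:
    # Invented glass-to-glass samples, tagged with the leg the viewer landed on.
    metrics.record(delivery_leg(viewer), "glass_to_glass_s", latency_s)

share = sum(delivery_leg(f"viewer-{i}") == "moq" for i in range(10_000)) / 10_000
print(f"share of viewers on MoQ: {share:.1%}")
```

Hashing the viewer ID keeps each viewer on the same leg across sessions, so startup time and rebuffering ratio can be compared between two stable cohorts rather than between random page loads.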
Why teams pick this stack: as of draft-ietf-moq-transport-17 (January 2026), MoQ is still working its way through the IETF and is not yet an RFC; production deployments today carry interoperability risk. Running MoQ in parallel with LL-HLS lets the team build the operational muscle (relays, player code, observability) while the load-bearing path remains the proven HTTP-based stack. The first NAB-class MoQ interop with eleven vendors happened in 2026; "MoQ-only" is a 2027–2028 conversation for most products, but "MoQ alongside LL-HLS" is a conversation to have in 2026.
How The Handoff Inside The Origin Actually Works
The interesting engineering in a hybrid stack is not on either end — it is in the middle, at the moment the contribution stream becomes one or more delivery streams. Doing this well is what separates a stack that holds up at a million viewers from one that falls over at fifty thousand.
The discipline is to package once, deliver many. The origin transcodes the incoming RTMPS, SRT, or WHIP stream into a CMAF fragmented MP4 source. From that single source, the origin emits an HLS manifest, a DASH manifest, and (if needed) a WHEP endpoint that re-wraps the same media into a WebRTC delivery session. The audio and video bytes are identical on every leg — only the manifest format, the transport, and the framing differ. This is the design CMAF was created for, and it is the reason CMAF is on the inside of every modern hybrid stack even when CMAF is never named on the customer-facing side.
What keeps costs down is that the broadcast tail (HLS, LL-HLS, DASH, LL-DASH) is HTTP, which means every segment the origin emits is cacheable at the CDN. The interactive tier (WebRTC, WHEP) is per-session and never caches, but it is also a small audience by design. Mixing these intentionally (small interactive audience, large cached audience, same media source) is the architecture that gives you sub-second responsiveness for the host and CDN-grade economics for the crowd.
The Cost Arithmetic — Why Hybrid Is Cheaper Than Pure WebRTC At Scale
The case for hybrid stacks is not aesthetic; it is dollars. Work the numbers for a typical live-shopping event with one host, twenty interactive buyers on camera, and fifty thousand passive viewers in chat.
The interactive tier, twenty-one camera participants (the host plus twenty buyers), runs on a WebRTC SFU. Each participant uploads roughly 2 Mbps and downloads a similar bitrate for each other participant they watch. The audience for this tier is twenty-one people; the cost is whatever the SFU minute-rate is for twenty-one participants over the length of the event.
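To see how small the interactive tier stays, here is a rough bandwidth sketch for the SFU. It assumes the simplest case: every participant subscribes to every other participant at the full 2 Mbps, with no simulcast layers or paused tracks. Real deployments usually do better than this. The function name `sfu_bandwidth_mbps` is made up for the illustration.

```python
def sfu_bandwidth_mbps(participants: int, bitrate_mbps: float) -> dict:
    """Upper-bound SFU bandwidth when everyone watches everyone at full bitrate."""
    others = participants - 1
    return {
        # Each participant publishes one stream up to the SFU.
        "sfu_ingress": participants * bitrate_mbps,
        # Each participant receives one stream per other participant.
        "per_participant_download": others * bitrate_mbps,
        # The SFU forwards every stream to every other participant.
        "sfu_egress": participants * others * bitrate_mbps,
    }


print(sfu_bandwidth_mbps(21, 2.0))
# {'sfu_ingress': 42.0, 'per_participant_download': 40.0, 'sfu_egress': 840.0}
```

Even under this worst-case assumption the SFU forwards under 1 Gbps for the whole interactive ring, which is the scale the per-session price is being paid on.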
The broadcast tail — fifty thousand passive viewers — runs on LL-HLS over CMAF at, say, 4 Mbps per viewer in 1080p. The arithmetic:
Aggregate egress bandwidth:
50,000 viewers × 4 Mbps = 200,000 Mbps = 200 Gbps
Data delivered per hour:
200 Gbps × 3,600 s ÷ 8 bits per byte = 90,000 GB per hour
CDN cost on a commodity tier-1 HTTP contract, at $0.010 per GB delivered:
90,000 GB × $0.010 = $900 per hour
Now suppose the same audience watched the same stream on pure WebRTC, with no LL-HLS fallback. A WebRTC media server tier would handle fifty thousand concurrent egress sessions instead of fifty thousand cacheable HTTP fetches. That media server fleet typically costs three to ten times more than HTTP-cached delivery, because the vendor has to provision per-session capacity rather than amortise across cached objects. The same hour, at the same bitrate, for the same audience: roughly $2,700 to $9,000 instead of $900. Sub-second latency for fifty thousand people who type at most a few times per hour means paying up to ten times the price for a feature ninety per cent of them will not use.
The hybrid stack splits the audience by need: the twenty-one people who genuinely need sub-second latency pay the sub-second price; the fifty thousand who need two-second latency pay the two-second price. That split is the architecture's entire economic justification.
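The arithmetic in this section is easy to re-run with your own audience size, bitrate, and CDN rate. The sketch below reproduces the figures from the text; the function names are invented for the example, and the three-to-ten-times multiplier is the article's rule of thumb, not a quoted vendor price.

```python
def egress_gb(viewers: int, bitrate_mbps: float, hours: float = 1.0) -> float:
    """Total data delivered, in gigabytes (decimal units)."""
    gigabits = viewers * bitrate_mbps / 1000 * 3600 * hours  # Mbps to Gb over the event
    return gigabits / 8                                       # bits to bytes


def cdn_cost_usd(viewers: int, bitrate_mbps: float, usd_per_gb: float, hours: float = 1.0) -> float:
    """Cost of serving the audience from an HTTP CDN at a flat per-GB rate."""
    return egress_gb(viewers, bitrate_mbps, hours) * usd_per_gb


# Figures from the text: 50,000 viewers, 4 Mbps, $0.010 per GB, one hour.
hls_cost = cdn_cost_usd(50_000, 4.0, 0.010)

# Rule-of-thumb range for serving the same audience per-session over WebRTC.
WEBRTC_MULTIPLIER_LOW, WEBRTC_MULTIPLIER_HIGH = 3, 10
webrtc_low = hls_cost * WEBRTC_MULTIPLIER_LOW
webrtc_high = hls_cost * WEBRTC_MULTIPLIER_HIGH

print(f"Egress: {egress_gb(50_000, 4.0):,.0f} GB per hour")
print(f"LL-HLS via CDN: ${hls_cost:,.0f} per hour")
print(f"Pure WebRTC: ${webrtc_low:,.0f} to ${webrtc_high:,.0f} per hour")
```

The hybrid bill is the LL-HLS figure plus the SFU charge for the twenty-one interactive participants, which depends on the vendor's per-minute rate and is left out here for that reason.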
A Common Mistake — Treating The Stack As Monolithic
The single most common architecture failure in this space is shipping the live-shopping product (Pattern A) on pure WebRTC because "we want low latency for everyone", running the bill up by an order of magnitude, and then telling the board the streaming spend is out of control.
The failure pattern: someone reads that WebRTC is "the lowest latency protocol" and decides that every viewer-minute on the stream should be paid for at sub-second rates. The mistake is conceptual: the latency requirement is not a constant across the audience; it is set per role. The host and the buyers on camera are in the conversation; they need sub-second. The fifty thousand viewers typing emoji in chat are watching the conversation; they need the stream to land before the chat scrolls, not before the host blinks.
The fix is to design the architecture as a function of role, not a function of audience size. The interactive tier — buyers, panelists, moderators, the producer — gets the protocol that buys sub-second responsiveness for the cost of per-session capacity. The broadcast tier — everyone else — gets the protocol that buys CDN economics. Both tiers see the same content. The product feels live to everyone; the bill is paid for by the people who genuinely need real-time, not by the long tail.
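"A function of role" can be taken almost literally. The sketch below assigns each role a latency budget and derives the delivery tier from the budget alone, so audience size never enters the decision. The specific budget values (0.5 seconds for interactive roles, 4 seconds for viewers) and the one-second threshold are illustrative assumptions consistent with the sub-second and two-to-four-second targets in this article.

```python
from enum import Enum


class Role(Enum):
    HOST = "host"
    ON_CAMERA_BUYER = "on_camera_buyer"
    MODERATOR = "moderator"
    PRODUCER = "producer"
    VIEWER = "viewer"


# Illustrative latency budgets in seconds, keyed by role.
LATENCY_BUDGET_S = {
    Role.HOST: 0.5,
    Role.ON_CAMERA_BUYER: 0.5,
    Role.MODERATOR: 0.5,
    Role.PRODUCER: 0.5,
    Role.VIEWER: 4.0,
}

SUB_SECOND_THRESHOLD_S = 1.0


def delivery_tier(role: Role) -> str:
    """Pick the tier from the role's latency budget, not from audience size."""
    if LATENCY_BUDGET_S[role] < SUB_SECOND_THRESHOLD_S:
        return "webrtc"   # per-session capacity, sub-second responsiveness
    return "ll-hls"       # HTTP-cacheable, CDN economics


for role in Role:
    print(f"{role.value}: {delivery_tier(role)}")
```

Adding a new role means adding one budget entry; the tier follows from it without touching the rest of the stack.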
Where Fora Soft Fits In
We have shipped hybrid stacks across every vertical the section covers — live-shopping platforms with a WebRTC interactive ring on top of an LL-HLS broadcast tail, OTT and Internet-TV builds with a CMAF-packaged origin feeding both HLS and DASH manifests, telemedicine and e-learning systems with WebRTC for the live consultation or class plus an HLS recording for replay, and video-surveillance platforms with WHEP for the operator preview and HLS for the multi-viewer review wall. The pattern that keeps showing up is the same as the one in this article: package once, deliver many, and split the audience by role. The cost ceiling and the responsiveness floor are both architecture choices, not protocol choices.
A Worked Example — Re-Architecting A 2018 Pure-WebRTC Build To A Hybrid Stack
Make the upgrade concrete. Suppose you inherit a 2018-vintage live-event product that ships everything — host, buyers, audience — on a single WebRTC SFU. The product works at five hundred concurrent viewers. At five thousand, the SFU costs double; at fifty thousand, the SFU costs would consume the entire revenue line.
Step one — split the audience. Identify the smallest set of users who genuinely need sub-second latency: in a live-shopping product, that is the host, the on-camera buyers, the moderators, and (if relevant) the producer. Everyone else is audience, not participant.
Step two — add a CMAF + LL-HLS surface. At the origin, subscribe to the host's video track from the SFU and re-encode it into a CMAF fragmented MP4 source. Emit an LL-HLS manifest from that source. Deploy the manifest behind a commodity HTTP CDN. The audience now plays from LL-HLS, not from the SFU.
Step three — keep the chat real-time. The chat service was almost certainly already on WebSocket; leave it there. The only audio and video that need to be sub-second are the on-camera people; the chat does not have a latency budget tied to the SFU.
Step four — keep WebRTC for the interactive ring. Do not delete the SFU. The host and the on-camera buyers still publish and subscribe through it. The architecture is now a hybrid stack: WebRTC for the interactive ring, LL-HLS for the broadcast tail, WebSocket for chat.
Step five — measure the bill. The SFU fleet now serves the host plus the on-camera buyers — typically twenty to fifty concurrent publishers and subscribers — instead of fifty thousand. The LL-HLS fleet serves the fifty thousand on commodity HTTP. The change is a six-to-ten-times reduction in egress cost at the same audience and the same product experience.
This is the canonical 2026 modernisation pattern for any live-event product that started life on pure WebRTC and ran into the cost ceiling: split by role, package once, deliver many.
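The effect of the five steps on the SFU can be sketched as a before-and-after session count. The roster below uses the article's example numbers (one host, twenty on-camera buyers, fifty thousand viewers); the role names, the transport labels, and the helper functions are hypothetical.

```python
from collections import Counter

INTERACTIVE_ROLES = {"host", "buyer", "moderator", "producer"}


def connection_plan(role: str) -> dict:
    """Transports a user connects to after the re-architecture."""
    media = "webrtc-sfu" if role in INTERACTIVE_ROLES else "ll-hls-cdn"
    return {"media": media, "chat": "websocket"}  # chat stays on WebSocket for everyone


def sfu_sessions(roster: Counter, hybrid: bool) -> int:
    """Number of users whose media is served by the SFU."""
    if not hybrid:
        return sum(roster.values())  # 2018 build: everyone is on the SFU
    return sum(
        count
        for role, count in roster.items()
        if connection_plan(role)["media"] == "webrtc-sfu"
    )


roster = Counter({"host": 1, "buyer": 20, "viewer": 50_000})
before = sfu_sessions(roster, hybrid=False)
after = sfu_sessions(roster, hybrid=True)
print(f"SFU sessions before: {before:,}")  # 50,021
print(f"SFU sessions after:  {after:,}")   # 21
```

The viewers have not left the product; they have moved to a leg where their traffic is cached, which is where the egress saving in step five comes from.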
Comparison — When To Use Which Hybrid Pattern
The four patterns above map to four product profiles. Use this table when you need to pick a starting architecture for a new build.
| Product profile | Interactive tier | Broadcast tier | Replay tier | Pattern |
|---|---|---|---|---|
| Live shopping, auctions, town halls | WebRTC / WHEP for host + on-camera buyers | LL-HLS or LL-DASH for passive audience | Optional HLS VOD | Pattern A |
| Premium sports, live news | Private WHEP for control room / moderator preview | LL-HLS or LL-DASH for viewer tier | HLS VOD highlights | Pattern B |
| Conferencing, telemedicine, e-learning | WebRTC SFU end-to-end | None (every viewer is a participant) | HLS recording for replay | Pattern C |
| Future-leaning live (sports, real-time) | WebRTC or WHEP for interactive tier (optional) | LL-HLS for the load-bearing path; MoQ in parallel A/B | HLS VOD or MoQ recorded objects | Pattern D |
| OTT live linear (channels, FAST) | None (no interactive tier by design) | HLS for Apple / iOS, DASH for Android / TV, CMAF inside | HLS VOD | Hybrid HLS + DASH, no WebRTC |
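For teams that want the table as a first-pass rule in code, the sketch below maps a handful of yes-or-no product traits to a starting pattern. The trait names and, in particular, the order in which the traits are checked are assumptions made for this example; a real product may match more than one row and need a combination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductProfile:
    everyone_interactive: bool = False   # every viewer is also a participant
    moq_trial: bool = False              # MoQ run in parallel as an A/B
    on_camera_guests: bool = False       # host plus guests, with a passive audience
    control_room_preview: bool = False   # producer needs a sub-second preview


def starting_pattern(profile: ProductProfile) -> str:
    """Return the table row to start from; the check order is an assumption."""
    if profile.everyone_interactive:
        return "Pattern C"
    if profile.moq_trial:
        return "Pattern D"
    if profile.on_camera_guests:
        return "Pattern A"
    if profile.control_room_preview:
        return "Pattern B"
    return "Hybrid HLS + DASH, no WebRTC"


print(starting_pattern(ProductProfile(on_camera_guests=True)))       # Pattern A
print(starting_pattern(ProductProfile(control_room_preview=True)))   # Pattern B
print(starting_pattern(ProductProfile()))                            # Hybrid HLS + DASH, no WebRTC
```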
What To Read Next
- Picking a delivery protocol in 2026: a decision tree — the article that takes the patterns here and turns them into a per-product recommendation.
- Protocol comparison matrix — the per-protocol data that explains why each leg of a hybrid stack lands on the protocol it does.
- CMAF — the packaging format that unified HLS and DASH — the format that makes "package once, deliver many" practical.
Talk To Us / See Our Work / Download
- Talk to a streaming engineer — book a 30-minute scoping call to map your product's latency and audience profile onto a hybrid stack.
- See our case studies — production hybrid stacks across live shopping, OTT, telemedicine, e-learning, surveillance.
- Download the Hybrid stack architecture pattern card — a one-page PDF that summarises the four canonical hybrid patterns and the role-split logic.
References
- IETF RFC 9725, WebRTC-HTTP Ingestion Protocol (WHIP), March 2025. Canonical specification for the WebRTC ingest leg used in Pattern A and Pattern D contribution. https://www.rfc-editor.org/rfc/rfc9725
- IETF draft-ietf-wish-whep (latest revision, 2026), WebRTC-HTTP Egress Protocol (WHEP). Specification for the WebRTC egress leg used in the interactive tier of Patterns A and B. https://datatracker.ietf.org/doc/draft-ietf-wish-whep/
- IETF RFC 8216, HTTP Live Streaming, August 2017. The HLS base specification (controlling document for the broadcast tail in every pattern except pure Pattern C). https://www.rfc-editor.org/rfc/rfc8216
- Apple, HLS Authoring Specification for Apple Devices, revision 2025-09. Apple's normative layer on top of RFC 8216, including the LL-HLS extensions used in Patterns A, B, and D. https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices
- ISO/IEC 23009-1:2022, Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats. The DASH base standard. Used on the broadcast tail of any stack that targets Android and smart TV alongside Apple.
- ISO/IEC 23000-19:2024, Common Media Application Format (CMAF) for segmented media. The packaging format on the inside of every modern hybrid stack — what makes "package once, deliver many" possible.
- IETF draft-ietf-moq-transport-17, Media over QUIC Transport, January 2026. The Media over QUIC base specification cited for Pattern D; Internet-Draft, subject to revision before publication as an RFC. https://datatracker.ietf.org/doc/draft-ietf-moq-transport/
- Cloudflare, Cloudflare Stream documentation: Stream live video. Public reference for an RTMPS / SRT / WHIP ingest plus HLS / DASH / WHEP playback hybrid. https://developers.cloudflare.com/stream/stream-live/
- Cloudflare blog, WebRTC live streaming to unlimited viewers, with sub-second latency. Production-deployer source for the WHIP-WHEP egress half of Pattern A and Pattern B. https://blog.cloudflare.com/webrtc-whip-whep-cloudflare-stream/
- Mux, It's all smooth shopping from here: how low latency spikes sales in live shoppable videos (Lyvecom case study). Production deployer source for the latency-to-conversion argument used to justify Pattern A. https://www.mux.com/case-studies/lyvecom
- LiveKit blog, A tale of two protocols: comparing WebRTC against HLS for live streaming. Practitioner discussion of the WebRTC vs HLS trade-off that drives every hybrid stack. https://blog.livekit.io/webrtc-vs-hls-livestreaming/
- Flussonic, Low-Latency WebRTC Streaming: Real-Time Video at Scale. Vendor engineering source on the per-session cost shape of WebRTC at the broadcast tail. https://flussonic.com/blog/article/low-latency-webrtc-streaming
- Ant Media, Video Streaming Protocols (2026 Update). Vendor engineering source on multi-protocol output pipelines (single RTMP / SRT ingest, simultaneous WebRTC / LL-HLS / CMAF delivery). https://antmedia.io/streaming-protocols/
- Wowza blog, What Is Media Over QUIC (MoQ)?. Vendor engineering source on the MoQ-plus-LL-HLS hybrid in Pattern D. https://www.wowza.com/blog/what-is-media-over-quic-moq-and-why-are-people-talking-about-it


