The "Switching Protocols" Reality: Hybrid Stacks

Why This Matters

Most "we picked HLS" decisions are not real architecture decisions; they are summaries of the delivery leg of a stack that also carries RTMP on ingest, WebRTC for a moderator screen, and a sidecar WebSocket for chat. If you are a product manager, founder, or operations lead making a build-vs-buy call on a live-video product, you need to know that the answer is almost never one protocol — it is a stack — and the cost, latency, and scale profile of that stack depends on which legs you chose and how they hand off to each other. This article is the canonical Block 4 reference on hybrid stacks: what "hybrid" means in this context, which combinations are common in 2026, why they exist, how they hand off internally, and where they break. It complements the comparison-matrix article (4.13) and the decision-tree article (4.14) — the matrix gives you the per-protocol data; this article gives you the architecture that combines them.

What "Hybrid Stack" Actually Means

A streaming pipeline has four jobs to do: capture, contribute, deliver, and play. Each job can in principle use a different protocol, and in practice it almost always does. The term hybrid stack names the everyday case: more than one protocol live on the wire inside the same product.

The reason hybrid stacks are normal — not a sign of indecision — is the push–pull split the section's foundations article covers in detail. The job called contribution (also ingest) is the upstream leg from an encoder in a venue, a studio, or a phone to your media server. The job called distribution (also delivery or egress) is the downstream leg from the media server to every viewer. They have completely different requirements: contribution is one-to-one and tolerates a few hundred milliseconds of buffering; distribution is one-to-many and has to spread bytes over a public CDN at internet scale. A protocol optimised for one is rarely the right tool for the other.

This is why almost every architecture diagram you read in 2026 has at least two protocols. RTMPS or SRT carries the contribution leg from OBS or a hardware encoder up to the origin. HLS, LL-HLS, DASH, LL-DASH, WebRTC, or WHEP carries the distribution leg from the origin out to viewers. Inside the origin, the stream is repackaged once into a format both sides can read — typically CMAF fragmented MP4 segments. The architecture you ship is the choice of those two or three protocols and the repackaging in the middle, not the choice of "one" protocol.

Figure 1. The canonical 2026 hybrid stack. One contribution leg, two delivery legs, one shared CMAF packaging in the middle. The architecture is the combination, not any single line on the diagram.

The Four Patterns You Will Actually See

The combinatorial space is large, but four hybrid patterns cover more than ninety per cent of live-video products in 2026. Each pattern exists because a different combination of latency, scale, and cost is what the product needs.

Pattern A — Interactive top, broadcast tail (live shopping, auctions, town halls)

The use case: a host on camera, a handful of guests or buyers who join on camera, and a large passive audience that watches and types in chat. The latency requirement splits in two — the host and guests need sub-second round trip so the conversation does not collapse, and the passive audience needs the stream to land within two to four seconds so the latency does not break the chat experience.

The architecture: WebRTC or WHEP carries the interactive ring of host and guests; LL-HLS or LL-DASH carries the broadcast tail. Both legs are fed from the same source at the origin, so the two surfaces carry the same programme; only the transport and packetisation differ.

Why teams pick this stack: Mux's published case study with the Lyvecom shoppable-video platform makes the cost-of-latency argument concrete. When stream-to-chat lag passes ten seconds, conversion rates fall measurably, and the buyer experience stops feeling like a live event. Sub-second delivery is overkill for the passive majority of viewers and would cost several times more than HTTP-cacheable delivery (the cost arithmetic later in this article puts the gap at three to ten times); pure LL-HLS is too slow for the host and guests. The hybrid stack meets both latency targets while keeping the large audience on cacheable HTTP.

Pattern B — Broadcast main, interactive moderator (sports, news, premium live)

The use case: a one-to-many broadcast where the producer in the control room needs a sub-second preview of the on-air feed in order to make cut decisions, run a moderator panel, or sanity-check what is shipping to viewers. The viewer audience can ride LL-HLS at two to three seconds; the control room cannot.

The architecture: LL-HLS or LL-DASH for the public viewer tier; a private WHEP endpoint for the producer's preview surface. Same origin, same CMAF segments, two delivery formats — one of them deliberately not cached at the CDN.

Why teams pick this stack: building a parallel preview path on plain HLS adds an extra two-to-three-second blind spot to the control room, which is operationally unacceptable for live news or live sport. A second WebRTC publishing leg out of the origin is a small, bounded engineering cost — the producer audience is in the tens, not the millions — and it removes the blind spot for the people making the on-air decisions.

Pattern C — Real-time core, recording-and-replay tail (conferencing, telemedicine, e-learning)

The use case: a meeting, consultation, or class is run end-to-end on WebRTC because every participant is interactive. After the session, the recording has to be available as a normal on-demand video — clip it, share it, watch it on a phone three weeks later — and that means HLS or DASH.

The architecture: an SFU (Selective Forwarding Unit, the WebRTC topology covered in 8.4) carries the meeting. A side recorder subscribes to the SFU's tracks, composes them into a single mixed stream, and writes out fragmented MP4 plus an HLS manifest. After the session ends, the manifest goes to a CDN and the recording plays back from any browser, phone, or smart TV.
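The packaging step at the end of that recorder can be sketched as follows. This is a minimal illustration that assumes FFmpeg is the packager and that the recorder has already composed the session into one mixed file; the article does not name a tool, and the function name, file names, and six-second segment length are hypothetical.

```python
from typing import List


def hls_vod_command(mixed_input: str, out_dir: str, segment_seconds: int = 6) -> List[str]:
    """Build an FFmpeg argv that packages one mixed recording as fMP4 HLS VOD.

    Assumption: `mixed_input` is the already-composed recording. The paths
    and the segment length are illustrative placeholders.
    """
    return [
        "ffmpeg",
        "-i", mixed_input,
        "-c:v", "libx264",                   # H.264 video
        "-c:a", "aac",                       # AAC audio
        "-f", "hls",                         # FFmpeg's HLS muxer
        "-hls_time", str(segment_seconds),   # target segment duration in seconds
        "-hls_playlist_type", "vod",         # finished recording, not a live window
        "-hls_segment_type", "fmp4",         # fragmented MP4 segments, not MPEG-TS
        "-hls_fmp4_init_filename", "init.mp4",
        "-hls_segment_filename", f"{out_dir}/seg_%05d.m4s",
        f"{out_dir}/recording.m3u8",
    ]


command = hls_vod_command("mixed.mkv", "out")
print(" ".join(command))
```

Passing the returned list to `subprocess.run(command, check=True)` would run the packaging on a machine with FFmpeg installed; the resulting directory is what gets uploaded behind the CDN.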

Why teams pick this stack: WebRTC is the right tool for the live meeting (the conferencing-grade SFU article covers why); HLS is the right tool for the replay, because the cost of playback over a long tail (months or years after the session) on commodity HTTP is roughly two orders of magnitude lower than keeping a WebRTC media server alive for every replay. The protocols are not in competition here — they are doing different jobs at different points in time.

Pattern D — MoQ trialled in parallel with LL-HLS (future-leaning live)

The use case: a team that wants to be ready for the Media over QUIC transition without making MoQ a single point of failure today. Ship the proven LL-HLS stack to the entire audience and run a small percentage of traffic over MoQ as a parallel A/B.

The architecture: encoder pushes RTMPS or WHIP to the origin; origin produces both a CMAF + LL-HLS manifest and a parallel MoQ relay subscription. Players that opt in to MoQ (typically via a feature flag) connect over QUIC; everyone else lands on LL-HLS. The metrics dashboard records glass-to-glass latency, startup time, and rebuffering ratio separately for each leg.
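One way to implement the opt-in split is deterministic bucketing on a viewer identifier, so the same viewer always lands on the same leg and the per-leg metrics stay comparable. The sketch below is an assumption about how such a flag could work, not a description of any particular player or vendor; `delivery_leg`, `moq_percent`, and `client_supports_moq` are hypothetical names.

```python
import hashlib


def delivery_leg(viewer_id: str, moq_percent: float, client_supports_moq: bool = True) -> str:
    """Return "moq" for a stable slice of viewers and "ll-hls" for everyone else.

    Assumption: `viewer_id` is any stable identifier (account id, device id).
    `moq_percent` is the share of traffic to trial on MoQ, from 0 to 100.
    """
    if not client_supports_moq or moq_percent <= 0:
        return "ll-hls"  # the proven path stays the default
    digest = hashlib.sha256(viewer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # stable bucket in 0..9999
    return "moq" if bucket < moq_percent * 100 else "ll-hls"


# Same viewer, same answer on every call; raising the percentage only adds viewers.
print(delivery_leg("viewer-123", moq_percent=5.0))
```

Because the bucket depends only on the identifier, raising the percentage from 5 to 10 keeps every existing MoQ viewer on MoQ and adds new ones, which keeps the A/B cohorts stable over time.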

Why teams pick this stack: as of draft-ietf-moq-transport-17 (January 2026), MoQ is still working its way through the IETF and is not yet an RFC, so production deployments today carry interoperability risk. Running MoQ in parallel with LL-HLS lets the team build the operational muscle (relays, player code, observability) while the load-bearing path remains the proven HTTP-based stack. The first NAB-class MoQ interop with eleven vendors happened in 2026; "MoQ-only" is a 2027–2028 conversation for most products, but "MoQ alongside LL-HLS" is a conversation for today.

How The Handoff Inside The Origin Actually Works

The interesting engineering in a hybrid stack is not on either end — it is in the middle, at the moment the contribution stream becomes one or more delivery streams. Doing this well is what separates a stack that holds up at a million viewers from one that falls over at fifty thousand.

The discipline is to package once, deliver many. The origin transcodes the incoming RTMPS, SRT, or WHIP stream into a CMAF fragmented MP4 source. From that single source, the origin emits an HLS manifest, a DASH manifest, and (if needed) a WHEP endpoint that re-wraps the same media into a WebRTC delivery session. The encoded media is the same on every leg; only the manifest format, the transport, and the framing differ. On the WHEP leg the video is re-packetised into RTP, and the audio may need a transcode where browsers expect Opus rather than AAC. This is the design CMAF was created for, and it is the reason CMAF is on the inside of every modern hybrid stack even when CMAF is never named on the customer-facing side.
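To illustrate the idea, the sketch below renders a plain HLS media playlist over a list of CMAF segments that already exist. It is a minimal example under stated assumptions: the `CmafSegment` type, the file names, and the two-second durations are hypothetical, and it deliberately omits the LL-HLS partial-segment tags. A DASH manifest generated from the same list would point at the same segment files, which is the whole point of packaging once.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CmafSegment:
    uri: str           # segment file name, shared by every manifest
    duration_s: float  # media duration of the segment in seconds


def render_hls_media_playlist(init_uri: str, segments: Sequence[CmafSegment],
                              ended: bool = False) -> str:
    """Render an HLS media playlist that references existing fMP4 segments."""
    target = max(math.ceil(s.duration_s) for s in segments)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        f'#EXT-X-MAP:URI="{init_uri}"',  # fMP4 initialization segment
    ]
    for seg in segments:
        lines.append(f"#EXTINF:{seg.duration_s:.3f},")
        lines.append(seg.uri)
    if ended:
        lines.append("#EXT-X-ENDLIST")  # only once the stream has finished
    return "\n".join(lines) + "\n"


segments = [CmafSegment("seg_00000.m4s", 2.0), CmafSegment("seg_00001.m4s", 2.0)]
print(render_hls_media_playlist("init.mp4", segments))
```

The manifest is a few hundred bytes of text; the media bytes are written once and referenced by name, so adding a second manifest format adds no encoding or storage cost.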

What keeps costs down is that the broadcast tail (HLS, LL-HLS, DASH, LL-DASH) is HTTP, which means every byte the origin emits for it is cacheable at the CDN. The interactive tier (WebRTC, WHEP) is per-session and never caches, but it is also a small audience by design. Mixing these intentionally (small interactive audience, large cached audience, same media source) is the architecture that gives you sub-second responsiveness for the host and CDN-grade economics for the crowd.

[Figure 2 diagram: two origin patterns side by side. Left, "Separate encodes" (wrong): one encoder per delivery protocol, so three encoders feed three independent delivery surfaces, with three sets of bitstream copies, CDN traffic, and bills. Right, "Package once, deliver many" (correct): a single encoder feeds one CMAF fragmented MP4 packager, which fans out to an HLS manifest, a DASH manifest, and a WHEP endpoint. Cost ratio: approximately 3 to 1 in favour of the right pattern. CMAF was designed so the same fragmented MP4 segments serve both HLS and DASH, removing the duplicate-encoding cost from the hybrid stack.]

Figure 2. The single most important pattern in any hybrid stack — one encode, one CMAF source, many delivery surfaces. Building three separate encode-to-delivery pipelines is the most common way teams turn a hybrid stack into a runaway cloud bill.

The Cost Arithmetic — Why Hybrid Is Cheaper Than Pure WebRTC At Scale

The case for hybrid stacks is not aesthetic; it is dollars. Work the numbers for a typical live-shopping event with one host, twenty interactive buyers on camera, and fifty thousand passive viewers in chat.

The interactive tier (the host plus twenty on-camera buyers, twenty-one participants in total) runs on a WebRTC SFU. Each participant uploads roughly 2 Mbps and downloads a similar bitrate for each other participant they watch. The audience for this tier is twenty-one people; the cost is whatever the SFU per-minute rate is for twenty-one participants over the length of the event.

The broadcast tail — fifty thousand passive viewers — runs on LL-HLS over CMAF at, say, 4 Mbps per viewer in 1080p. The arithmetic:

Aggregate egress bandwidth:
  50,000 viewers × 4 Mbps = 200,000 Mbps = 200 Gbps

CDN cost on a commodity tier-1 HTTP contract:
  At $0.010 per GB delivered:
  200 Gbps × 3,600 s ÷ 8 bits/byte = 90,000 GB per hour
  90,000 GB × $0.010 = $900 per hour
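
The same arithmetic can be written as a small calculator so the inputs are easy to vary. This is a sketch using the figures from the text (50,000 viewers, 4 Mbps, $0.010 per GB, decimal gigabytes); the function names are illustrative, and a real CDN contract may bill by tier or region.

```python
def egress_gb(viewers: int, mbps_per_viewer: float, hours: float) -> float:
    """Total data delivered, in decimal gigabytes."""
    megabits = viewers * mbps_per_viewer * hours * 3600  # Mbps x seconds
    return megabits / 8 / 1000                           # megabits -> megabytes -> gigabytes


def cdn_cost_usd(viewers: int, mbps_per_viewer: float, hours: float, usd_per_gb: float) -> float:
    """Delivery cost at a flat per-gigabyte rate."""
    return egress_gb(viewers, mbps_per_viewer, hours) * usd_per_gb


gb = egress_gb(viewers=50_000, mbps_per_viewer=4, hours=1)
usd = cdn_cost_usd(viewers=50_000, mbps_per_viewer=4, hours=1, usd_per_gb=0.010)
print(f"{gb:,.0f} GB delivered, ${usd:,.2f} for the hour")
```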

Now suppose the same audience watched the same stream on pure WebRTC, with no LL-HLS fallback. A WebRTC media server tier then handles fifty thousand concurrent egress sessions instead of fifty thousand cacheable HTTP fetches. That fleet typically costs three to ten times more than HTTP-cached delivery, because the vendor has to provision per-session capacity rather than amortise across cached objects. The same hour, at the same bitrate, for the same audience: roughly $2,700 to $9,000 instead of $900. Sub-second latency for fifty thousand people who type at most a few times per hour means paying up to ten times the price for a feature the vast majority of them will not use.

The hybrid stack splits the audience by need: the twenty-one people who genuinely need sub-second latency pay the sub-second price; the fifty thousand who need two-second latency pay the two-second price. That split is the architecture's entire economic justification.
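The comparison can be sketched in a few lines. The three-to-ten-times multiplier and the $900 CDN figure come from the text; the interactive-tier cost is left as an input because it depends on the SFU vendor's rate, and the $50 used in the example is a placeholder, not a quoted price.

```python
from typing import Tuple


def pure_webrtc_cost_range(cdn_usd: float, low: float = 3.0, high: float = 10.0) -> Tuple[float, float]:
    """Cost range if the whole audience moved from cached HTTP to per-session WebRTC."""
    return (cdn_usd * low, cdn_usd * high)


def hybrid_cost_usd(cdn_usd: float, interactive_usd: float) -> float:
    """Hybrid bill: cached broadcast tail plus the small interactive tier."""
    return cdn_usd + interactive_usd


cdn_usd = 900.0          # one hour of LL-HLS delivery, from the arithmetic above
interactive_usd = 50.0   # placeholder for the 21-participant SFU hour (assumption)

low, high = pure_webrtc_cost_range(cdn_usd)
print(f"pure WebRTC: ${low:,.0f} to ${high:,.0f} per hour")
print(f"hybrid:      ${hybrid_cost_usd(cdn_usd, interactive_usd):,.0f} per hour")
```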

A Common Mistake — Treating The Stack As Monolithic

The single most common architecture failure in this space is shipping the live-shopping product (Pattern A) on pure WebRTC because "we want low latency for everyone", running the bill up by an order of magnitude, and then telling the board the streaming spend is out of control.

The failure pattern: someone reads that WebRTC is "the lowest latency protocol", and treats every minute every viewer is on the stream as a place to pay for sub-second latency. The mistake is conceptual: latency is not a constant per audience — it is per role. The host and the buyers on camera are in the conversation; they need sub-second. The fifty thousand viewers typing emoji in chat are watching the conversation; they need the stream to land before the chat scrolls, not before the host blinks.

The fix is to design the architecture as a function of role, not a function of audience size. The interactive tier — buyers, panelists, moderators, the producer — gets the protocol that buys sub-second responsiveness for the cost of per-session capacity. The broadcast tier — everyone else — gets the protocol that buys CDN economics. Both tiers see the same content. The product feels live to everyone; the bill is paid for by the people who genuinely need real-time, not by the long tail.
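Expressed as code, "a function of role" is close to literal. The sketch below assumes a fixed set of roles taken from the paragraph above; the enum, the tier labels, and the function names are illustrative rather than any product's API.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class Role(Enum):
    HOST = "host"
    BUYER = "buyer"
    PANELIST = "panelist"
    MODERATOR = "moderator"
    PRODUCER = "producer"
    VIEWER = "viewer"


# Roles that take part in the conversation and therefore need sub-second latency.
INTERACTIVE_ROLES = frozenset(
    {Role.HOST, Role.BUYER, Role.PANELIST, Role.MODERATOR, Role.PRODUCER}
)


def tier_for(role: Role) -> str:
    """Pick the delivery tier from the role, never from the audience size."""
    return "webrtc" if role in INTERACTIVE_ROLES else "ll-hls"


def split_by_tier(roles: Iterable[Role]) -> Counter:
    """Count how many sessions land on each tier."""
    return Counter(tier_for(role) for role in roles)


event = [Role.HOST] + [Role.BUYER] * 20 + [Role.VIEWER] * 50_000
print(split_by_tier(event))
```

For the live-shopping event used earlier, the split puts twenty-one sessions on the per-session tier and fifty thousand on the cached tier, which is exactly the cost shape the article argues for.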

Where Fora Soft Fits In

We have shipped hybrid stacks across every vertical the section covers — live-shopping platforms with a WebRTC interactive ring on top of an LL-HLS broadcast tail, OTT and Internet-TV builds with a CMAF-packaged origin feeding both HLS and DASH manifests, telemedicine and e-learning systems with WebRTC for the live consultation or class plus an HLS recording for replay, and video-surveillance platforms with WHEP for the operator preview and HLS for the multi-viewer review wall. The pattern that keeps showing up is the same as the one in this article: package once, deliver many, and split the audience by role. The cost ceiling and the responsiveness floor are both architecture choices, not protocol choices.

A Worked Example — Re-Architecting A 2018 Pure-WebRTC Build To A Hybrid Stack

To make the upgrade concrete, suppose you inherit a 2018-vintage live-event product that ships everything (host, buyers, audience) on a single WebRTC SFU. The product works at five hundred concurrent viewers. Because SFU capacity is provisioned per session, the bill climbs with every added viewer: at five thousand it is already a multiple of the original, and at fifty thousand the SFU costs would consume the entire revenue line.

Step one — split the audience. Identify the smallest set of users who genuinely need sub-second latency: in a live-shopping product, that is the host, the on-camera buyers, the moderators, and (if relevant) the producer. Everyone else is audience, not participant.

Step two — add a CMAF + LL-HLS surface. At the origin, subscribe to the host's video track from the SFU and re-encode it into a CMAF fragmented MP4 source. Emit an LL-HLS manifest from that source. Deploy the manifest behind a commodity HTTP CDN. The audience now plays from LL-HLS, not from the SFU.

Step three — keep the chat real-time. The chat service was almost certainly already on WebSocket; leave it there. The only audio and video that need to be sub-second are the on-camera people; the chat does not have a latency budget tied to the SFU.

Step four — keep WebRTC for the interactive ring. Do not delete the SFU. The host and the on-camera buyers still publish and subscribe through it. The architecture is now a hybrid stack: WebRTC for the interactive ring, LL-HLS for the broadcast tail, WebSocket for chat.

Step five — measure the bill. The SFU fleet now serves the host plus the on-camera buyers — typically twenty to fifty concurrent publishers and subscribers — instead of fifty thousand. The LL-HLS fleet serves the fifty thousand on commodity HTTP. The change is a six-to-ten-times reduction in egress cost at the same audience and the same product experience.

This is the canonical 2026 modernisation pattern for any live-event product that started life on pure WebRTC and ran into the cost ceiling: split by role, package once, deliver many.

Comparison — When To Use Which Hybrid Pattern

The four patterns above map to four product profiles. Use this table when you need to pick a starting architecture for a new build.

Product profile	Interactive tier	Broadcast tier	Replay tier	Pattern
Live shopping, auctions, town halls	WebRTC / WHEP for host + on-camera buyers	LL-HLS or LL-DASH for passive audience	Optional HLS VOD	Pattern A
Premium sports, live news	Private WHEP for control room / moderator preview	LL-HLS or LL-DASH for viewer tier	HLS VOD highlights	Pattern B
Conferencing, telemedicine, e-learning	WebRTC SFU end-to-end	None (every viewer is a participant)	HLS recording for replay	Pattern C
Future-leaning live (sports, real-time)	WebRTC or WHEP for interactive tier (optional)	LL-HLS for the load-bearing path; MoQ in parallel A/B	HLS VOD or MoQ recorded objects	Pattern D
OTT live linear (channels, FAST)	None (no interactive tier by design)	HLS for Apple / iOS, DASH for Android / TV, CMAF inside	HLS VOD	Hybrid HLS + DASH, no WebRTC

The row people miss most often is the bottom one: even an OTT linear channel that has no interactive tier at all is a hybrid stack, because Apple devices need an HLS manifest and Android plus most smart TVs prefer a DASH manifest. The CMAF packaging in the middle is what lets you serve both without re-encoding.
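
A manifest selector for that bottom row can be as small as a lookup. The sketch below is an assumption about how a playback API might choose; the platform labels, the manifest file names, the example URL, and the HLS fallback for unknown platforms are all hypothetical.

```python
APPLE_PLATFORMS = frozenset({"ios", "ipados", "tvos", "safari-macos"})
DASH_PLATFORMS = frozenset({"android", "android-tv", "smart-tv"})


def manifest_format(platform: str) -> str:
    """Return "hls" or "dash" for a client platform label."""
    key = platform.strip().lower()
    if key in APPLE_PLATFORMS:
        return "hls"
    if key in DASH_PLATFORMS:
        return "dash"
    return "hls"  # assumption: fall back to HLS when the platform is unknown


def manifest_url(base_url: str, platform: str) -> str:
    """Both manifests sit next to the same CMAF segments; only the file name differs."""
    name = {"hls": "master.m3u8", "dash": "manifest.mpd"}[manifest_format(platform)]
    return f"{base_url.rstrip('/')}/{name}"


print(manifest_url("https://cdn.example.com/channel-1/", "iOS"))
print(manifest_url("https://cdn.example.com/channel-1/", "android-tv"))
```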


References

IETF RFC 9725, WebRTC-HTTP Ingestion Protocol (WHIP), March 2025. Canonical specification for the WebRTC ingest leg used in Pattern A and Pattern D contribution. https://www.rfc-editor.org/rfc/rfc9725
IETF draft-ietf-wish-whep (latest revision, 2026), WebRTC-HTTP Egress Protocol (WHEP). Specification for the WebRTC egress leg used in the interactive tier of Patterns A and B. https://datatracker.ietf.org/doc/draft-ietf-wish-whep/
IETF RFC 8216, HTTP Live Streaming, August 2017. The HLS base specification (controlling document for the broadcast tail in every pattern except pure Pattern C). https://www.rfc-editor.org/rfc/rfc8216
Apple, HLS Authoring Specification for Apple Devices, revision 2025-09. Apple's normative layer on top of RFC 8216, including the LL-HLS extensions used in Patterns A, B, and D. https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices
ISO/IEC 23009-1:2022, Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats. The DASH base standard. Used on the broadcast tail of any stack that targets Android and smart TV alongside Apple.
ISO/IEC 23000-19:2024, Common Media Application Format (CMAF) for segmented media. The packaging format on the inside of every modern hybrid stack — what makes "package once, deliver many" possible.
IETF draft-ietf-moq-transport-17, Media over QUIC Transport, January 2026. The Media over QUIC base specification cited for Pattern D; Internet-Draft, subject to revision before publication as an RFC. https://datatracker.ietf.org/doc/draft-ietf-moq-transport/
Cloudflare, Cloudflare Stream documentation: Stream live video. Public reference for an RTMPS / SRT / WHIP ingest plus HLS / DASH / WHEP playback hybrid. https://developers.cloudflare.com/stream/stream-live/
Cloudflare blog, WebRTC live streaming to unlimited viewers, with sub-second latency. Production-deployer source for the WHIP-WHEP egress half of Pattern A and Pattern B. https://blog.cloudflare.com/webrtc-whip-whep-cloudflare-stream/
Mux, It's all smooth shopping from here: how low latency spikes sales in live shoppable videos (Lyvecom case study). Production deployer source for the latency-to-conversion argument used to justify Pattern A. https://www.mux.com/case-studies/lyvecom
LiveKit blog, A tale of two protocols: comparing WebRTC against HLS for live streaming. Practitioner discussion of the WebRTC vs HLS trade-off that drives every hybrid stack. https://blog.livekit.io/webrtc-vs-hls-livestreaming/
Flussonic, Low-Latency WebRTC Streaming: Real-Time Video at Scale. Vendor engineering source on the per-session cost shape of WebRTC at the broadcast tail. https://flussonic.com/blog/article/low-latency-webrtc-streaming
Ant Media, Video Streaming Protocols (2026 Update). Vendor engineering source on multi-protocol output pipelines (single RTMP / SRT ingest, simultaneous WebRTC / LL-HLS / CMAF delivery). https://antmedia.io/streaming-protocols/
Wowza blog, What Is Media Over QUIC (MoQ)?. Vendor engineering source on the MoQ-plus-LL-HLS hybrid in Pattern D. https://www.wowza.com/blog/what-is-media-over-quic-moq-and-why-are-people-talking-about-it
