If you've already read our overview of QUIC and MoQ for product teams, you know the business case: sub-second latency at broadcast scale, one protocol in place of the RTMP/HLS/WebRTC patchwork, Cloudflare's global relay network live since August 2025.
This piece is for the engineers who have to build on it. We'll walk through the protocol stack layer by layer, trace how objects actually move from publisher to subscriber, look at where congestion is handled and why that matters in production, and cover the engineering challenges you'll hit before you ship — including the ones the spec doesn't warn you about.
Key Takeaways
- MoQ is three layers, not one. Transport (QUIC / WebTransport), pub/sub signaling (MoQT), and the streaming format (WARP, hang, custom) evolve independently. A change at one layer does not ripple through the others.
- The data model (tracks → groups → objects) is what enables both fast joins and graceful congestion handling. Subscribers join at group boundaries. Relays make priority calls without parsing media payloads.
- Fan-out scale comes from the relay model. One upstream subscription serves unlimited downstream viewers — no SFU cluster required.
- The spec is still draft. Cloudflare deployed against draft-07. IETF milestones run into 2027. Abstract your transport integration from day one.
- FETCH (catch-up / VOD) is not in current relay implementations. SUBSCRIBE (live) is. Plan accordingly.
- Safari does not support WebTransport. Build the fallback from the start, not after launch.
- Observability tooling is immature. Budget more time on logging and tracing than you'd spend on WebRTC or HLS.
The Three-Layer Protocol Stack
MoQ looks like a single protocol from the outside, but internally it is three layers with cleanly separated jobs. Getting this mental model right early saves a lot of architecture confusion later.
Figure 1: MoQ splits transport, session, and streaming format into three layers over UDP.
Layer 1 — Transport: QUIC and WebTransport
QUIC (RFC 9000) runs over UDP. The property that matters most for media is independent streams: a dropped packet on one stream — audio, say — has no effect on a parallel video stream. On TCP, that same packet drop blocks every byte behind it on the entire connection. That's head-of-line blocking, and it's why RTMP streams stutter when an encoder hiccups.
QUIC also handles: 0-RTT connection resumption so returning viewers start instantly, seamless migration when a device switches from Wi-Fi to cellular mid-stream, and mandatory TLS 1.3 on every connection — no opt-in required.
For browser clients, MoQ uses WebTransport, the W3C API that exposes QUIC's multiplexed streams and unreliable datagrams directly to JavaScript. This is what RTMP and SRT never had: the same protocol handles encoder ingest and browser viewer delivery, end to end, with no transcoding gateway in the middle.
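Here is a rough sketch of what Layer 1 looks like from the browser side. The relay host and `/moq` path are placeholders, and the MoQT handshake that would follow is omitted; only the `WebTransport` constructor and `ready` promise are real browser API.

```typescript
// Browser-only API, declared here so the sketch type-checks outside a browser.
declare class WebTransport {
  constructor(url: string);
  ready: Promise<void>;
  close(): void;
}

function relayUrl(host: string, path = "/moq"): string {
  // WebTransport requires an https:// URL; QUIC and TLS 1.3 are implied.
  return `https://${host}${path}`;
}

async function connect(host: string): Promise<WebTransport> {
  const wt = new WebTransport(relayUrl(host));
  await wt.ready; // QUIC handshake complete (0-RTT when resuming a session)
  return wt;      // incoming unidirectional streams would carry MoQ objects
}
```

From here, the MoQT session layer takes over on the opened streams; the transport itself never learns what it is carrying.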
Layer 2 — MoQT: The Pub/Sub Session
MoQT defines the session handshake (SETUP), the control messages (ANNOUNCE, SUBSCRIBE, PUBLISH), and the data hierarchy. It is intentionally media-agnostic. Relays speak MoQT. They do not speak H.264 or Opus or anything codec-specific.
This is the key architectural decision the IETF working group made. Rule 1 of the moq-dev reference implementation puts it plainly: "The CDN MUST NOT know anything about your application, media codecs, or even the available tracks." The relay is dumb by design. That is what makes end-to-end encryption possible at the media layer.
Layer 3 — Streaming Format: WARP, hang, or Custom
Media-specific logic — codec negotiation, manifest structure, containers, catalogs — lives here. WARP (draft-ietf-moq-warp) is the IETF's standard format. hang is the practical, open-source format from the moq-dev project. If you control both publisher and subscriber, you can define your own format and ship custom container structures without touching the transport layer beneath.
This separation is why two different organizations can build MoQ clients that interoperate at the transport layer while running entirely different codec strategies.
The Data Model: Tracks → Groups → Objects
MoQ organizes everything into a three-level hierarchy. This structure is what enables both low-latency joins and intelligent congestion handling.
Figure 2: Tracks are named streams. Groups are independently decodable chunks. Subgroups carry priority.
Tracks are named streams: video-1080p, audio-english, captions-fr. Subscribers request specific tracks by name. Publishers announce track namespaces. Relays route SUBSCRIBE messages to the right source.
Groups are independently decodable chunks within a track. For video, a group maps to a GOP — a group of pictures starting with a keyframe. New subscribers can join at any group boundary; no partial decode, no waiting for the next keyframe. For a viewer joining mid-stream, that group boundary is the join point.
Objects are the individual packets on the wire. Each belongs to a track and carries a position within a group. Relays forward objects without parsing the payload.
The practical consequence: your relay infrastructure makes forwarding decisions — drop this, prioritize that — based on object position and group metadata, never by inspecting the media itself. Codec changes don't require relay changes.
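The hierarchy above can be sketched in a few lines of TypeScript. The shapes and field names are illustrative, not the spec's wire format:

```typescript
// Illustrative model of the MoQ data hierarchy: tracks -> groups -> objects.
interface MoqObject {
  group: number;       // which group this object belongs to
  id: number;          // position within the group
  payload: Uint8Array; // opaque to relays; never parsed for forwarding decisions
}

interface Track {
  name: string;                        // e.g. "video-1080p", "audio-english"
  groups: Map<number, MoqObject[]>;    // each group starts independently decodable
}

// A late joiner starts at the newest group boundary (a keyframe, for video),
// so playback begins cleanly with no partial decode.
function joinPoint(track: Track): number | undefined {
  const groups = [...track.groups.keys()];
  return groups.length ? Math.max(...groups) : undefined;
}
```

Note that `joinPoint` needs only group numbers, never payload bytes, which is exactly the property that lets relays stay codec-agnostic.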
How Data Moves: The ANNOUNCE / SUBSCRIBE Flow
Here is what actually happens between publisher and subscriber through a relay chain, step by step.
Figure 3: One upstream subscription feeds every downstream viewer on the same relay.
1. Publisher connects and announces
The publisher connects to its nearest relay and sends an ANNOUNCE message with its track namespace. The relay registers this namespace in a shared control plane. In Cloudflare's deployment, Durable Objects hold that distributed state — a publisher announcing to a relay in London makes the namespace discoverable to a subscriber hitting a relay in Singapore.
2. Subscriber connects and subscribes
The subscriber connects to its local relay and sends a SUBSCRIBE for a specific track name. The relay queries the control plane, finds the source, and forwards the SUBSCRIBE upstream toward the publisher's relay.
3. Path established, objects flow
With the subscription path established, the publisher starts sending objects. They flow publisher → relay A → relay B → subscriber. If a second subscriber on relay B asks for the same track, relay B serves them from the existing upstream subscription — no new connection back to the publisher. This is the fan-out model: one upstream subscription, unlimited downstream viewers through the relay hierarchy.
Newer drafts also introduce a push-based PUBLISH flow: the publisher sends PUBLISH, the relay replies PUBLISH_OK, and objects can start flowing before any subscriber asks for them. That's useful for ingest scenarios where you want media ready at the edge the instant the first viewer connects.
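The fan-out bookkeeping the flow above relies on can be sketched roughly like this. This is illustrative, not how moq-relay is actually implemented:

```typescript
// Sketch of relay fan-out: one upstream subscription per track,
// reused by every later downstream subscriber.
class Relay {
  private upstream = new Set<string>();                 // tracks already subscribed upstream
  private downstream = new Map<string, Set<string>>();  // track name -> subscriber ids

  // Returns true only when this subscribe required a NEW upstream subscription.
  subscribe(track: string, subscriberId: string): boolean {
    let subs = this.downstream.get(track);
    if (!subs) this.downstream.set(track, (subs = new Set<string>()));
    subs.add(subscriberId);
    if (this.upstream.has(track)) return false; // served from the existing upstream sub
    this.upstream.add(track);                   // forward SUBSCRIBE toward the publisher
    return true;
  }

  fanOut(track: string): number {
    return this.downstream.get(track)?.size ?? 0;
  }
}
```

The publisher-side cost is constant regardless of audience size; only the first subscriber per relay triggers upstream work.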
Congestion Handling: Where MoQ Earns Its Keep
Degradation under pressure is where MoQ's transport foundations matter most in production.
Figure 4: Under pressure, higher-numbered subgroups drop first. The base layer keeps playing — no rebuffer.
MoQ uses subgroups — subdivisions within a group that map directly to individual QUIC streams. All objects within a subgroup are delivered in order on that stream. Subgroup numbering encodes priority: lower number means higher priority.
With layered video encoding (SVC), a typical mapping looks like this:
- Subgroup 0 — base layer (the one that keeps you on-air)
- Subgroup 1 — enhancement to 720p
- Subgroup 2 — enhancement to 1080p
When a relay detects congestion, it drops objects from higher-numbered subgroups first. Viewers see reduced quality, not rebuffering. The base layer keeps playing.
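The drop policy reduces to a small pure function. The "layers affordable" budget here is our simplification of whatever congestion signal a real relay derives from QUIC:

```typescript
// Sketch of congestion shedding: drop objects from higher-numbered
// (lower-priority) subgroups first. Budget semantics are illustrative.
interface QueuedObject {
  subgroup: number; // 0 = base layer, 1 = 720p enhancement, 2 = 1080p enhancement
  bytes: number;
}

// With budget for N layers, subgroups 0..N-1 survive.
// Subgroup 0 (the base layer) is never dropped: reduced quality, not rebuffering.
function shed(queue: QueuedObject[], layersAffordable: number): QueuedObject[] {
  const keep = Math.max(1, layersAffordable);
  return queue.filter((o) => o.subgroup < keep);
}
```

The decision uses only the subgroup number carried in object metadata, so the same policy works unchanged across codecs.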
For live edge catch-up, the spec supports descending group order: a viewer who falls behind receives the newest group first, potentially skipping intermediate groups to return to live. That's a meaningful UX improvement over HLS, where a buffering viewer has no good options except wait.
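Descending group order is also easy to sketch. The function name and signature are ours, not the spec's:

```typescript
// Sketch of live-edge catch-up: a lagging viewer receives the newest
// available group first and may skip the intermediates to return to live.
function catchUpOrder(availableGroups: number[], lastPlayed: number): number[] {
  return availableGroups
    .filter((g) => g > lastPlayed)
    .sort((a, b) => b - a); // newest first; a player chasing live may discard the rest
}
```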
One honest note: the IETF working group describes optimal strategies for these features as "an open research question." The mechanisms are well-specified; how to tune them for specific use cases (interactive broadcast, gaming, large events) is still being worked out empirically.
MoQ vs. WebRTC vs. LL-HLS: The Engineering Decision Matrix
The business comparison is in our overview article. This table focuses on the architectural decisions that affect your engineering choices.
| Dimension | MoQ | WebRTC | LL-HLS |
|---|---|---|---|
| Transport | QUIC / WebTransport | SRTP over UDP + ICE | HTTPS / TCP |
| End-to-end latency | Sub-second (~300-500 ms) | Sub-second (100-300 ms) | 3-10 s typical |
| Scale model | Relay fan-out, one upstream sub serves N | SFU cluster, N-squared mesh for peers | CDN HTTP chunking |
| Browser coverage | Chrome / Edge / Opera; Firefox partial; Safari no | Universal | Universal |
| Two-way conversation | Supported, but not the sweet spot | Best-in-class under ~50 participants | Not applicable |
| Spec maturity | IETF draft, 2027 milestone | Mature, standard | Mature, standard |
| Congestion UX | Graceful subgroup drop, keeps base layer | Bandwidth adaptation, SVC optional | Rebuffer or step down bitrate |
| DVR / catch-up | FETCH is spec, not yet in relays | Not in spec | Native HTTP range |
| Protocol count to ship | One (MoQ end-to-end) | One, plus signaling | HLS + LL-HLS + fallback |
| Sweet spot | Sub-second at fan-out scale | Interactive conversation | Broadcast with 3 s+ budget |
The honest read: WebRTC wins for two-way conversational video under ~50 participants. LL-HLS wins where latency above ~3 s is acceptable and CDN cost optimization matters. MoQ wins when you need sub-second latency at fan-out scale with a single architecture.
The Real Engineering Challenges
1. The spec is a moving target
Cloudflare shipped their relay against draft-07, which became the de facto interop target across open-source implementations (moq-dev, Meta's Moxygen, Norsk, Vindral). IETF milestones run into 2027. Draft changes are real and will keep happening.
Mitigation: abstract your MoQ integration behind a thin transport interface. When the spec updates, you swap the transport layer without touching the application logic above it.
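One minimal version of that abstraction, with a loopback implementation you can test against. The interface shape is our suggestion, not a moq-dev API:

```typescript
// Application code depends only on this narrow interface;
// draft-version churn stays inside its implementations.
interface MediaTransport {
  subscribe(track: string, onObject: (payload: Uint8Array) => void): void;
  publish(track: string, payload: Uint8Array): void;
  close(): void;
}

// In-process loopback, useful for tests. A real implementation
// would wrap MoQT over WebTransport behind the same three methods.
class LoopbackTransport implements MediaTransport {
  private handlers = new Map<string, ((p: Uint8Array) => void)[]>();

  subscribe(track: string, onObject: (p: Uint8Array) => void): void {
    const hs = this.handlers.get(track) ?? [];
    hs.push(onObject);
    this.handlers.set(track, hs);
  }

  publish(track: string, payload: Uint8Array): void {
    for (const h of this.handlers.get(track) ?? []) h(payload);
  }

  close(): void {
    this.handlers.clear();
  }
}
```

When draft-08 changes a control message, you update one class; the player, the encoder glue, and every test against `LoopbackTransport` stay untouched.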
2. WebTransport browser support has a gap
Chrome, Edge, and Opera support WebTransport. Firefox is intermittent. Safari does not, as of early 2026. For consumer apps that need universal browser coverage, you need a fallback path — typically WebSocket or WebRTC for Safari clients.
Build the fallback from the start, not as an afterthought. Users do not care which protocol served their stream.
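A minimal sketch of that detection, factored as a pure function so the policy is testable outside a browser. The fallback ordering is one reasonable choice, not a standard:

```typescript
type TransportChoice = "webtransport" | "webrtc" | "websocket";

// Capability flags in, transport choice out. In a browser you would derive them as:
//   { webTransport: "WebTransport" in globalThis, webRTC: "RTCPeerConnection" in globalThis }
function pickTransport(caps: { webTransport: boolean; webRTC: boolean }): TransportChoice {
  if (caps.webTransport) return "webtransport"; // Chrome / Edge / Opera today
  if (caps.webRTC) return "webrtc";             // Safari: low-latency media path
  return "websocket";                           // last resort, data-channel delivery
}
```

Pair this with a transport abstraction and the Safari path becomes a second implementation, not a second application.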
3. FETCH isn't widely available yet
The spec distinguishes SUBSCRIBE (live content, receive objects from now forward) from FETCH (past content, retrieve cached objects). Current relay implementations, including Cloudflare's, implement SUBSCRIBE only. If you're building DVR-style catch-up, on-demand replay, or VOD delivery over MoQ, plan around this constraint. It's on roadmaps, not in production.
4. Distributed relay state management
When a publisher announces a namespace to one relay node, other relay nodes need to discover that to route subscriber requests correctly. This is a distributed state problem. Cloudflare solves it with Durable Objects. If you run your own relay infrastructure, this is the hardest part of the deployment — not the protocol itself.
5. Observability tooling is early
WebRTC ships with years of built-in browser tooling: chrome://webrtc-internals, RTCP stats APIs, RTP packet inspection. MoQ has none of that yet. Plan for significantly more time on logging, tracing, and debugging infrastructure than you would budget for a mature protocol.
What You Can Build on MoQ Today
The Cloudflare relay network is in free tech preview at any scale. The moq-dev library provides Rust and TypeScript implementations. A live demo at moq.dev/publish shows real sub-second browser streaming with no special setup.
Live events with interactive latency at broadcast scale. Sports, auctions, concerts — use cases where WebRTC's SFU costs become prohibitive above a few thousand viewers, but HLS's 3-30 s delay kills the product experience. MoQ's relay fan-out handles this at CDN scale without a private SFU cluster.
Large virtual classrooms. WebRTC's N-squared topology collapses past ~50 simultaneous participants in a mesh. With MoQ, a relay serves hundreds of class members at sub-second latency with the same architecture as a 10-viewer private session.
Real-time data alongside media, in one session. MoQ is generic, not just video. Sports overlays, live captions, trading data, chat — all can run as separate named tracks in the same MoQ session. No separate WebSocket channel alongside your video stream.
Modern ingest pipelines without RTMP. The PUBLISH flow gives you native QUIC ingest from encoder to relay, with no RTMP-to-HLS transcoding gateway in the middle. One protocol, start to finish.
Weighing MoQ against WebRTC or LL-HLS for your project?
We've shipped real-time media for telehealth, concert streaming, and e-learning at tens of thousands of concurrent viewers. We'll tell you straight whether MoQ fits — and if it doesn't, what does.
Our Experience in Real-Time Media
We've been building real-time media systems for over a decade — HIPAA-compliant telehealth video at 1,500-patient scale, concert streaming platforms with 10,000 simultaneous viewers at under one second end-to-end, and live e-learning that holds up to 2,000 concurrent students through exam peaks without rebuffering.
The pattern across all of them: transport-layer decisions made early determine what's possible years later. Teams that picked WebRTC when they only needed two-party video end up building private CDN infrastructure when they scale. Teams that picked HLS because it was simple end up patching Low-Latency HLS extensions that fight the protocol's design.
MoQ changes the calculus. When we evaluate it for a project, three questions decide fit:
- Does the use case genuinely require sub-second latency and fan-out scale, simultaneously?
- Is browser-native delivery required without a transcoding gateway?
- Can the team accept spec-draft risk with proper abstraction?
If all three are yes, MoQ is the right foundation. If the first answer is no, LL-HLS on a proven CDN is simpler and lower-risk for most broadcast scenarios today.
FAQ
What does MoQ add over raw QUIC for media delivery?
Raw QUIC gives you a fast, reliable, multiplexed transport. MoQ adds the pub/sub session model, named track discovery, relay fan-out, object delivery semantics, and congestion-aware prioritization. Without MoQ you'd build all of that yourself on top of QUIC — which is essentially what early WebRTC SFU builders did on top of UDP.
Can I run my own MoQ relay, or must I use Cloudflare's?
You can run moq-relay from the moq-dev repository — it's an open-source, clusterable Rust binary. The engineering challenge is the distributed state management: relays need a shared control plane to route SUBSCRIBE requests to the right ANNOUNCE source. Cloudflare solves this with Durable Objects. For self-hosted deployments you need to solve that problem yourself, which is non-trivial at scale.
How do I handle the Safari gap in production?
The standard approach is capability detection at connection time: attempt WebTransport, fall back to WebSocket (for data channels) or WebRTC (for low-latency video) if unsupported. Keep media delivery logic behind an abstraction that swaps the transport without changing your application layer.
Does MoQ support end-to-end encryption?
Yes, and it's more cleanly supported than in WebRTC. Because relays forward opaque objects without parsing media payloads, the media layer can be E2E encrypted between publisher and subscriber. Relays access only the object metadata needed for routing and prioritization — they can't read media content.
What's the realistic path from prototype to production?
A local dev setup with moq-dev takes under 40 minutes. A prototype against Cloudflare's relay — browser publisher, browser subscriber, sub-second latency — is achievable in 2+ days. Production deployment requires a WebTransport fallback strategy, observability infrastructure, relay deployment (or Cloudflare integration with proper auth), and spec-tracking to handle draft updates. Plan 4-8 weeks for a production-grade integration depending on your existing media pipeline.
How does MoQ interact with existing RTMP ingest infrastructure?
Short answer: you replace it eventually, not immediately. The moq-cli and moq-mux tools in moq-dev include fMP4/CMAF and HLS muxers that can bridge existing media pipelines into MoQ broadcasts. For teams with mature encoder infrastructure on RTMP today, start by adding MoQ as a parallel output path before cutting over fully.
Which teams should wait for the spec to stabilize?
Teams building consumer-facing products where a breaking transport change would require a coordinated multi-team release. Teams with strict enterprise change-management processes. Teams whose primary latency constraint is already satisfied by LL-HLS. If you're building infrastructure for others to build on, waiting for draft finalization reduces downstream churn.
Next Steps
If you're at the evaluation stage and trying to decide whether MoQ belongs in your architecture, the business case article covers the metrics and decision criteria. If you've got a specific architecture question, a prototype you'd like a second set of eyes on, or you're weighing MoQ against WebRTC or LL-HLS for a real project — we're glad to think through it with you.
Get a quick initial quote for your MoQ build
Use the calculator for a ballpark, or tell us your idea directly. We reply fast and give straight feedback.