Blog: Building Applications with Media over QUIC: Architecture, Challenges, and Solutions

If you've already read our overview of QUIC and MoQ for product teams, you know the business case. Sub-second latency at broadcast scale, a single protocol replacing the RTMP/HLS/WebRTC patchwork, Cloudflare's global relay network live since August 2025.

This article is for devs who want to build on it. We'll cover the full protocol stack, how data actually moves from publisher to subscriber through relay infrastructure, where congestion handling happens and why it matters, and the real challenges you'll hit before you ship, including the ones the spec doesn't warn you about.

Key Takeaways

  • MoQ is three layers, not one. Transport (QUIC/WebTransport), pub/sub signaling (MoQT), and streaming format (WARP/hang/custom) are independent. Changes at one layer don't ripple through the others.
  • The data model (tracks, groups, objects) is what enables both fast joins and graceful congestion degradation. Subscribers join at group boundaries. Relays make priority decisions without parsing media payloads.
  • Fan-out scale comes from the relay model. One upstream subscription serves unlimited downstream viewers. No SFU cluster required.
  • The spec is draft. Cloudflare deployed against draft-07. IETF milestones run through 2027. Abstract your transport integration.
  • FETCH (catch-up/VOD) isn't in current relay implementations. SUBSCRIBE (live) is. Plan accordingly.
  • Safari doesn't support WebTransport. Build your fallback from the start, not after launch.
  • Observability tooling is immature. Budget more time on logging and tracing than you'd spend on WebRTC or HLS.

What the Protocol Stack Actually Looks Like

MoQ is three distinct layers, each with a specific job. Getting this mental model right early saves a lot of architecture confusion later.

Layer 1 – Transport: QUIC and WebTransport

QUIC (RFC 9000) runs over UDP. The property that matters most for media is independent streams: a dropped packet on one stream, say, an audio track, has no effect on a parallel video stream. On TCP, that same packet drop blocks every byte behind it on the entire connection. That's head-of-line blocking, and it's why RTMP streams stutter when your encoder hiccups.

QUIC also handles: 0-RTT connection resumption (returning viewers start instantly), seamless migration when a device switches from Wi-Fi to cellular mid-stream, and mandatory TLS 1.3 encryption on every connection – no opt-in required.

For browser clients, MoQ uses WebTransport – the W3C API that exposes QUIC's multiplexed streams and unreliable datagrams directly to JavaScript. This is what RTMP and SRT never had. The same protocol handles encoder ingest and browser viewer delivery, end to end, no transcoding gateway in the middle.

Layer 2 – MoQT: The Pub/Sub Signaling Layer

MoQT defines the session handshake (SETUP), control messages (ANNOUNCE, SUBSCRIBE, PUBLISH), and the data hierarchy. It is intentionally media-agnostic. Relays speak MoQT. They do not speak H.264 or Opus or anything codec-specific.

This is the key architectural decision the IETF made: Rule 1 of the moq-dev reference implementation puts it plainly – "The CDN MUST NOT know anything about your application, media codecs, or even the available tracks." The relay is dumb by design. That's what makes E2E encryption possible at the media layer.

Layer 3 – Streaming Format: WARP, hang, or Custom

Media-specific logic lives here: codec negotiation, manifest structure, containers, and the catalog. WARP (draft-ietf-moq-warp) is the IETF's standard format. hang is the practical open-source format from the moq-dev project. If you control both publisher and subscriber, you can define your own format and ship custom container structures without touching the transport layer beneath.

This separation is why two different organizations can build MoQ clients that interoperate at the transport layer while running entirely different codec strategies.

The Data Model: Tracks, Groups, Objects

MoQ organizes everything in a three-level hierarchy. This structure is what enables both low-latency joins and intelligent congestion handling.

Tracks are named streams: "video-1080p", "audio-english", "captions-fr". Subscribers request specific tracks by name. Publishers announce track namespaces. Relays route SUBSCRIBE messages to the right source.

Groups are independently decodable chunks within a track. For video, a group maps to a GOP (Group of Pictures) starting with a keyframe. New subscribers can join at any group boundary – no partial decode, no waiting for the next keyframe. For a viewer joining mid-stream, this is the join point.

Objects are the individual packets on the wire. Each belongs to a track and carries a position within a group. Relays forward objects without parsing their payload.

The practical consequence: your relay infrastructure makes forwarding decisions (drop this, prioritize that) based on object position and group metadata, not by inspecting the media itself. Codec changes don't require relay changes.
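To make the hierarchy concrete, here is a minimal sketch of the track/group/object model and a join-at-group-boundary lookup. The type and function names are illustrative, not from the spec; the point is that a relay can pick a join point from metadata alone, never touching the payload.

```typescript
// Hypothetical model of the three-level hierarchy.
interface MoqObject {
  trackName: string;   // e.g. "video-1080p"
  groupId: number;     // group = independently decodable chunk (a GOP for video)
  objectId: number;    // position within the group
  payload: Uint8Array; // opaque to relays -- never parsed for forwarding
}

// A new subscriber must start at a group boundary (the keyframe), never
// mid-group. Given the objects a relay has buffered, find the most recent
// group boundary to hand to a joining viewer.
function latestJoinPoint(
  buffered: MoqObject[]
): { groupId: number; objectId: number } | null {
  let latestGroup = -1;
  for (const obj of buffered) {
    if (obj.groupId > latestGroup) latestGroup = obj.groupId;
  }
  return latestGroup < 0 ? null : { groupId: latestGroup, objectId: 0 };
}

const buffered: MoqObject[] = [
  { trackName: "video-1080p", groupId: 41, objectId: 0, payload: new Uint8Array() },
  { trackName: "video-1080p", groupId: 41, objectId: 1, payload: new Uint8Array() },
  { trackName: "video-1080p", groupId: 42, objectId: 0, payload: new Uint8Array() },
];
const join = latestJoinPoint(buffered);
// join -> { groupId: 42, objectId: 0 }: the viewer starts at the newest keyframe
```

Nothing in `latestJoinPoint` inspects `payload` — that's the property the whole relay model rests on.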

How Data Actually Moves: The ANNOUNCE/SUBSCRIBE Flow

Here's what happens from publisher to subscriber through a relay chain, step by step.

1. Publisher connects and announces

The publisher connects to its nearest relay and sends an ANNOUNCE message with its track namespace. The relay registers this namespace in a shared control plane. In Cloudflare's deployment, Durable Objects handle this distributed state: a publisher announcing to a relay in London makes that namespace discoverable to a subscriber connecting through a relay in Singapore.

2. Subscriber connects and subscribes

The subscriber connects to its local relay and sends a SUBSCRIBE message for a specific track name. The relay queries the control plane, finds the source, and forwards the SUBSCRIBE upstream toward the publisher's relay.

3. Path established, objects flow

With the subscription path established, the publisher starts sending objects. They flow publisher → relay A → relay B → subscriber. If a second subscriber on relay B requests the same track, relay B serves them from the existing upstream subscription. No new connection to the publisher required. This is the fan-out model – one upstream subscription serves unlimited downstream viewers through the relay hierarchy.
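The fan-out property can be sketched in a few lines. This is a toy in-memory model, not a relay implementation — the names (`Relay`, `subscribe`, `fanOut`) are hypothetical — but it shows the invariant: one upstream subscription per track, regardless of how many downstream viewers attach.

```typescript
// Illustrative relay fan-out: one upstream subscription per track,
// any number of downstream subscribers served from it.
class Relay {
  private upstream = new Map<string, string>();        // track -> upstream source
  private downstream = new Map<string, Set<string>>(); // track -> subscriber ids
  upstreamSubscriptions = 0;

  subscribe(track: string, subscriberId: string, source: string): void {
    if (!this.downstream.has(track)) {
      // First subscriber for this track: open ONE upstream subscription.
      this.upstream.set(track, source);
      this.upstreamSubscriptions++;
      this.downstream.set(track, new Set());
    }
    // Every later subscriber reuses the existing upstream path.
    this.downstream.get(track)!.add(subscriberId);
  }

  fanOut(track: string, objectId: number): string[] {
    // Forward one incoming object to every downstream subscriber.
    return [...(this.downstream.get(track) ?? [])].map(
      (sub) => `object ${objectId} -> ${sub}`
    );
  }
}

const relayB = new Relay();
relayB.subscribe("video-1080p", "viewer-1", "relay-A");
relayB.subscribe("video-1080p", "viewer-2", "relay-A"); // no new upstream connection
// relayB.upstreamSubscriptions === 1; fanOut delivers each object to both viewers
```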

There's also a PUBLISH flow in newer spec drafts: a push-based model where the publisher sends a PUBLISH message and the relay's PUBLISH_OK confirms it will accept objects. This is useful for ingest scenarios: media is available at the relay the instant the first subscriber connects, without waiting for a SUBSCRIBE request to establish the path first.

Congestion Handling: Where MoQ Earns Its Keep

The degradation-under-pressure story is where MoQ's transport foundations matter most in production.

MoQ uses subgroups – subdivisions within a group that map directly to individual QUIC streams. All objects within a subgroup are delivered in order on that stream. Subgroup numbering encodes priority: lower number means higher priority.

With layered video encoding (SVC):

Subgroup 0: Base layer (360p)     → must deliver
Subgroup 1: Enhancement to 720p  → deliver if bandwidth allows  
Subgroup 2: Enhancement to 1080p → first to drop under congestion

When a relay detects congestion, it drops objects from higher-numbered subgroups first. Viewers see reduced quality, not rebuffering. The base layer keeps playing.
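The drop policy above is simple enough to sketch directly. This is a schematic model under an assumed per-interval byte budget (real relays work with QUIC stream priorities and pacing, not a single number), but the ordering logic is the same: send low-numbered subgroups first, drop from the top.

```typescript
// Congestion-aware dropping: lower subgroup number = higher priority,
// so under pressure the highest-numbered subgroups are dropped first.
interface QueuedObject { subgroup: number; bytes: number }

function applyBudget(queue: QueuedObject[], budgetBytes: number): QueuedObject[] {
  // Send in priority order (subgroup 0 first) until the budget runs out.
  const byPriority = [...queue].sort((a, b) => a.subgroup - b.subgroup);
  const sent: QueuedObject[] = [];
  let used = 0;
  for (const obj of byPriority) {
    if (used + obj.bytes > budgetBytes) break; // everything lower-priority is dropped
    sent.push(obj);
    used += obj.bytes;
  }
  return sent;
}

const queue = [
  { subgroup: 0, bytes: 50 },  // base layer (360p): must deliver
  { subgroup: 1, bytes: 40 },  // enhancement to 720p
  { subgroup: 2, bytes: 60 },  // enhancement to 1080p: first to drop
];
const sent = applyBudget(queue, 100);
// sent contains subgroups 0 and 1; the 1080p enhancement is dropped,
// so the viewer sees 720p instead of a rebuffer
```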

For live edge catch-up, the spec supports descending group order: a viewer who falls behind receives the newest group first, potentially skipping intermediate groups to return to live. This is a meaningful UX improvement over HLS, where a buffering viewer has no good options except wait.
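Descending group order reduces to a one-line ordering rule. A hedged sketch, assuming the client tracks the last group it played:

```typescript
// Live-edge catch-up: a viewer who fell behind receives the newest group
// first instead of replaying the backlog in order.
function catchUpOrder(availableGroups: number[], lastPlayed: number): number[] {
  return availableGroups
    .filter((g) => g > lastPlayed)
    .sort((a, b) => b - a); // newest first; intermediate groups may be skipped
}

// Viewer stalled at group 10 while groups 11-14 arrived:
const order = catchUpOrder([10, 11, 12, 13, 14], 10);
// order starts with 14: jump straight back to live, backfill only if useful
```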

One honest note: the IETF working group describes optimal strategies for these features as "an open research question." The mechanisms are well-specified. How to tune them for specific use cases (interactive broadcast, gaming, large-scale events) is still being worked out empirically.

MoQ vs. WebRTC vs. HLS: The Technical Decision Matrix

The business comparison is in our overview article. Here the focus is on the architectural trade-offs that affect your engineering choices.

The honest read: WebRTC wins for two-way conversational video under ~50 participants. LL-HLS wins where latency above ~3s is acceptable and CDN cost optimization matters. MoQ wins when you need sub-second latency at fan-out scale with a single architecture.

The Real Engineering Challenges

1. The spec is a moving target

Cloudflare shipped their relay against draft-07, which became the de facto interoperability target across open-source implementations (moq-dev, Meta's Moxygen, Norsk, Vindral). The IETF milestones run through 2027. Draft changes are real.

Mitigation: Abstract your MoQ integration behind a thin transport interface. When the spec updates, you swap the transport layer without touching application logic above it.

2. WebTransport browser support has a gap

Chrome, Edge, and Opera support WebTransport. Firefox support is partial. Safari does not support it at all, as of early 2026. For consumer applications that need universal browser coverage, you need a fallback path, typically WebSocket or WebRTC for Safari clients.

Build the fallback from the start, not as an afterthought. Users don't care which protocol served their stream.
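Capability detection is the usual first step. A minimal sketch — the fallback choice here ("websocket") is an assumption, and a real client would also probe for WebRTC:

```typescript
// Pick a transport at connection time: try WebTransport, otherwise fall back.
type TransportChoice = "webtransport" | "websocket";

function pickTransport(): TransportChoice {
  // globalThis check works in browsers and in runtimes without WebTransport.
  return typeof (globalThis as any).WebTransport !== "undefined"
    ? "webtransport"
    : "websocket";
}

const chosen = pickTransport();
// Chrome/Edge: "webtransport". Safari: "websocket". Behind a MediaTransport-style
// abstraction, the application layer never sees the difference.
```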

3. FETCH isn't widely available yet

The spec distinguishes SUBSCRIBE (live content → receive objects from now forward) from FETCH (past content → retrieve cached objects). Current relay implementations, including Cloudflare's, implement SUBSCRIBE only. If you're building DVR-style catch-up, on-demand replay, or VOD delivery over MoQ, plan around this constraint. It's on roadmaps, not in production.
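One way to plan around the gap is an explicit routing decision at request time. A hedged sketch — the function and the HLS-VOD fallback are illustrative, not prescribed by the spec:

```typescript
// SUBSCRIBE covers live delivery only; FETCH (past objects) isn't in relay
// implementations yet, so requests for past content must go elsewhere.
type Delivery =
  | { via: "moq-subscribe" }
  | { via: "hls-vod"; reason: string };

function routeRequest(liveEdgeGroup: number, requestedGroup: number): Delivery {
  if (requestedGroup >= liveEdgeGroup) {
    return { via: "moq-subscribe" }; // live: objects from now forward
  }
  // Past content would need FETCH; until relays support it, fall back.
  return { via: "hls-vod", reason: "FETCH not yet available on relays" };
}

const live = routeRequest(100, 100);  // live edge -> MoQ SUBSCRIBE
const replay = routeRequest(100, 40); // DVR replay -> the VOD path
```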

4. Distributed relay state management

When a publisher announces a namespace to one relay node, other relay nodes need to discover that to route subscriber requests correctly. This is a distributed state problem. Cloudflare solves it with Durable Objects. If you run your own relay infrastructure, this is the hardest part of the deployment, not the protocol itself.

5. Observability tooling is early

WebRTC ships with years of built-in browser tooling: chrome://webrtc-internals, RTCP stats APIs, RTP packet inspection. MoQ has none of this yet. Plan for significantly more time on logging, tracing, and debugging infrastructure than you'd budget for a mature protocol.

What You Can Build on MoQ Today

The Cloudflare relay network is in free tech preview at any scale. The moq-dev library provides Rust and TypeScript implementations. A live demo at https://moq.dev/publish/ shows real sub-second browser streaming with no special setup.

Live events with interactive latency at broadcast scale. Sports, auctions, concerts – use cases where WebRTC's SFU costs become prohibitive above a few thousand viewers, but HLS's 3-30s delay kills the product experience. MoQ's relay fan-out handles this at CDN scale without a private SFU cluster.

Large virtual classrooms. WebRTC's N-squared topology collapses past ~50 simultaneous participants in a mesh. With MoQ, a relay serves hundreds of class members at sub-second latency with the same architecture as a 10-viewer private session.

Real-time data alongside media, in one session. MoQ is generic, not just video. Sports overlays, live captions, trading data, chat can all run as separate named tracks in the same MoQ session. No separate WebSocket channel alongside your video stream.

Modern ingest pipelines without RTMP. The PUBLISH flow gives you native QUIC ingest from encoder to relay, no RTMP-to-HLS transcoding gateway in the middle. One protocol, start to finish.

Our Expertise in This Space

We've been building real-time media systems for over a decade – across telehealth platforms handling HIPAA-compliant video at 1,500 patient scale, concert streaming platforms with 10,000 simultaneous viewers at under one second of end-to-end delay, and live e-learning systems that scale to 2,000 concurrent students without rebuffering during exam peaks.

The pattern across all of them: transport layer decisions made early determine what's possible years later. Teams that picked WebRTC when they only needed two-party video found themselves building private CDN infrastructure when they scaled. Teams that picked HLS because it was simple found themselves patching Low-Latency HLS extensions that fight against the protocol's design.

MoQ changes the calculus. When we evaluate it for a project, three questions determine fit:

  • Does the use case genuinely require sub-second latency and fan-out scale at the same time?
  • Is browser-native delivery required without a transcoding gateway?
  • Can the team accept spec-draft risk with proper abstraction?

If all three are yes, MoQ is the right foundation. If the first answer is no, LL-HLS on a proven CDN is simpler and lower-risk for most broadcast scenarios today.

FAQ

What does MoQ add over raw QUIC for media delivery? 

Raw QUIC gives you a fast, reliable, multiplexed transport. MoQ adds the pub/sub session model, named track discovery, relay fan-out, object delivery semantics, and congestion-aware prioritization. Without MoQ, you'd build all of that yourself on top of QUIC, which is essentially what early WebRTC SFU builders did on top of UDP.

Can I run my own MoQ relay, or must I use Cloudflare's? 

You can run moq-relay from the moq-dev repository – it's an open-source, clusterable Rust binary. The engineering challenge is the distributed state management: relays need a shared control plane to route SUBSCRIBE requests to the right ANNOUNCE source. Cloudflare solves this with Durable Objects. For self-hosted deployments, you need to solve this problem yourself, which is non-trivial at scale.

How do I handle the Safari gap in production? 

The standard approach is capability detection at connection time: attempt WebTransport, fall back to WebSocket (for data channels) or WebRTC (for low-latency video) if unsupported. Keep media delivery logic behind an abstraction that swaps the transport without changing your application layer.

Does MoQ support end-to-end encryption? 

Yes, and it's more cleanly supported than in WebRTC. Because relays forward opaque objects without parsing media payloads, the media layer can be E2E encrypted between publisher and subscriber. Relays access only the object metadata needed for routing and prioritization decisions, they can't read media content.

What's the realistic path from prototype to production? 

A local dev setup with moq-dev takes under 40 minutes. A prototype against Cloudflare's relay (browser publisher, browser subscriber, sub-second latency) is achievable in a few days. Production deployment requires: a WebTransport fallback strategy, observability infrastructure, relay deployment or Cloudflare integration with proper auth, and spec-tracking to handle draft updates. Plan 4-8 weeks for a production-grade integration, depending on your existing media pipeline.

How does MoQ interact with existing RTMP ingest infrastructure? 

Short answer: you replace it eventually, not immediately. The moq-cli and moq-mux tools in the moq-dev repo include fMP4/CMAF and HLS muxers that can bridge existing media pipelines into MoQ broadcasts. For teams with mature encoder infrastructure using RTMP today, start by adding MoQ as a parallel output path before cutting over fully.

Which teams should wait for the spec to stabilize? 

Teams building consumer-facing products where a breaking transport change would require a coordinated multi-team release. Teams with strict enterprise change management processes. Teams whose primary latency constraint is satisfied by LL-HLS today. If you're building infrastructure for others to build on, waiting for draft finalization reduces downstream churn.

Next Steps

If you're at the evaluation stage, deciding whether MoQ belongs in your architecture, the business case article covers the metrics and decision criteria. If you've got a specific architecture question, a prototype you want a second set of eyes on, or you're weighing MoQ against WebRTC or LL-HLS for a real project, we're glad to think through it with you.

Ready to Start Your Project?

Tell us your idea via WhatsApp or email. We reply fast and give straight feedback.

💬 Chat on WhatsApp ✉️ Send Email

Or use the calculator for a quick initial quote.

📊 Get Instant Quote