Why This Matters

For well over a decade, low-latency live and CDN-scale broadcast lived in two separate worlds. WebRTC delivered roughly 300 ms glass-to-glass, but every viewer cost a session on a Selective Forwarding Unit, the relays were stateful, and the CDN was a separate cost line. HLS and DASH delivered to billions of viewers cheaply, but the latency floor, even with LL-HLS or chunked CMAF, was 1.5 to 3 seconds in production. Engineers writing live sports apps, betting platforms, auctions, second-screen experiences, live commerce, and interactive entertainment all hit the same wall: pick latency or scale.

MoQ is the most serious attempt yet to delete that trade-off. It does so by stealing the publish/subscribe model from WebRTC SFUs, putting it directly on top of QUIC (the same transport HTTP/3 uses), and then letting any cache-friendly intermediary along the path act as a relay. The relays are stateless about who is watching but stateful about what is being delivered, which is exactly the inversion that lets a CDN edge fan out to millions of viewers without paying per-viewer signalling cost. The 2026 deployments aren't theoretical: Cloudflare's first MoQ CDN ships open-source moq-rs relays on every server in its data centers across 330+ cities; Bitmovin's player feeds from that network in a published demonstration; Norsk and Wowza ship origin encoders that publish CMAF chunks straight into MoQ tracks.

This article is the canonical reference for the engineers, founders, and product leads who keep hearing "MoQ" in 2026 pitches and need the precise picture: what MoQ actually is on the wire, why the data model looks the way it does, what's standardised vs still moving, who has shipped what, and the brutally honest question of which production workloads should adopt it today versus wait. We will cite draft-ietf-moq-transport-17 and the May 2026 revision by section number, name every vendor's exact 2026 capability, and finish with a decision framework that does not assume MoQ is right for everyone.

What MoQ Actually Is — In One Page

MoQ is an application-layer publish/subscribe protocol for media that runs over QUIC (RFC 9000) or WebTransport (a W3C / IETF stack that exposes QUIC streams to browsers through an HTTP/3-style upgrade). The protocol is being standardised at the IETF in the Media over QUIC working group, whose primary document is draft-ietf-moq-transport. The working group also produces a small family of companion drafts: draft-ietf-moq-msf (MOQT Streaming Format, the catalog and timing layer), draft-ietf-moq-cmsf (CMSF, how to put CMAF-packaged media into MSF), and draft-ietf-moq-secure-objects (end-to-end object encryption). The current published moq-transport revision is -17 (2 March 2026), with revision -latest published 1 May 2026 and intended-status Standards Track. None of the documents is yet an RFC; each becomes one only after it clears Working Group Last Call, IESG review, and the RFC Editor's AUTH48 stage.

Mechanically, MoQ borrows two ideas from the streaming world and one from networking. From WebRTC it borrows the publish/subscribe model: a publisher (an encoder, an origin packager, a browser running getUserMedia) announces a track by name, and subscribers (viewers, downstream relays) request that track and receive its objects. From HLS and DASH it borrows the idea of independently decodable chunks of media addressed by structure rather than by byte offset — except that in MoQ those chunks are first-class protocol objects, not files in a manifest. From QUIC it borrows the idea that a single connection carries many independent streams with their own flow control and head-of-line independence, plus datagrams for messages that should not be retransmitted at all.

The hierarchy that organises everything is exact and worth memorising: a track is a named stream of media (think "1080p video of channel 47", or "audio for caller Bob"); a group is the smallest unit of media at which a subscriber can independently join, typically corresponding to a closed GOP or a CMAF segment (think "2 seconds of GOP starting at PTS 41200"); a subgroup is a subdivision of a group that maps directly onto one underlying QUIC stream, carrying objects with shared priority and ordering (think "the layer-1 base-layer slice of this GOP"); and an object is one indivisible piece of media payload, such as a CMAF chunk, an audio frame, or a metadata blob (think "frame 7 of the GOP, an inter-coded P-frame, 1.4 KB").

A new subscriber joins at the latest group boundary, fetches that group's objects in order through one or more subgroup QUIC streams, and immediately starts decoding — because the group is independently decodable by construction. There is no manifest fetch, no segment list, no DNS round-trip to a different host; the same QUIC connection that carries the live edge carries the join. That is the architectural difference that lets MoQ aim at sub-500 ms latency over CDN-scaled relay networks: the cost of joining is one round-trip, not a chain of fetches.

The wire transport is QUIC. The application stack above it is intentionally thin — no JSON manifests, no XML, no DRM-specific framing in the base protocol. Everything specific to "how do you actually carry CMAF over MoQ" lives in the streaming format layer (MSF + CMSF). Everything specific to encryption beyond the QUIC channel lives in draft-ietf-moq-secure-objects. Everything specific to multi-relay topology lives in informational companion drafts. The base moq-transport document is deliberately one transport, one publish/subscribe model, one data hierarchy.

[Figure 1 image: a four-layer architecture diagram of the MoQ stack, bottom to top, with QUIC transport (RFC 9000) as the bottom layer.]

Figure 1. The MoQ stack from QUIC up to application. The base transport is one document; the streaming format layer and end-to-end secure objects sit alongside it as separate companion drafts. Together they describe how CMAF live edge becomes MoQ on the wire.

The Data Model — Tracks, Groups, Subgroups, Objects

This is the part of MoQ that engineers who have lived inside RTP or CMAF often need to read twice. The model is small but unfamiliar, so we will define each term precisely and then walk a single live sample through it.

A track is a named, ordered sequence of media. In MoQ-land a name is not a file path; it is a structured pair of a namespace (a tuple of byte strings, often interpreted as a hierarchical identifier the way a DNS name is) and a track name (one byte string). For a sports broadcaster the namespace might be ("acme-sports", "live", "match-2026-05-21-arsenal-spurs") and the track names might be video-1080p, video-720p, audio-en, audio-es, captions-en. The publisher announces the namespace through an ANNOUNCE control message; the subscriber requests individual tracks under it through a SUBSCRIBE (or in the newer push flow, the publisher initiates with PUBLISH and the relay responds PUBLISH_OK).

A group is the smallest unit a subscriber can independently start at. Internally a group is a sequence of objects with strictly increasing object IDs, and the group as a whole is independently decodable from the start — which for video means it begins with a key frame, for audio means it begins on a configuration-stable frame, and for metadata means it is application-defined. In a CMAF-over-MoQ workflow (the CMSF profile), every CMAF segment maps to exactly one group. Group boundaries are the join points: a player switching variants, a viewer joining the stream, a relay re-establishing a connection — they all land on the next group boundary.

A subgroup is a subdivision of a group. The defining property of a subgroup is that all the objects in it share one underlying QUIC stream, which means they share QUIC's in-order delivery, retransmission, and priority semantics. Layered video codecs use subgroups to separate base-layer and enhancement-layer slices; many implementations use them to separate per-frame data from per-frame metadata; the MOQtail academic prototype (ACM MMSys 2026) tested subgroup splits as a way to drop enhancement layers under congestion without dropping base layers. A simple H.264 stream with no SVC typically puts all of a group's objects into a single subgroup; a complex SVC stream uses several subgroups per group.
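As a rough illustration of why layers are split into subgroups, the sketch below picks which subgroups to keep sending when bandwidth shrinks. The class, the bitrates, and the convention that a lower priority number means "more important" are all assumptions made for this example; the actual priority encoding should be taken from the transport draft.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubgroupInfo:
    label: str
    priority: int  # assumption for this sketch: lower number = more important
    kbps: int      # hypothetical bitrate this layer needs


def layers_to_send(subgroups: List[SubgroupInfo], budget_kbps: int) -> List[str]:
    """Keep the most important layers that fit in the budget; drop the rest."""
    sent: List[str] = []
    remaining = budget_kbps
    for sg in sorted(subgroups, key=lambda s: s.priority):
        if sg.kbps > remaining:
            # An enhancement layer is useless without the layers below it,
            # so stop at the first layer that does not fit.
            break
        sent.append(sg.label)
        remaining -= sg.kbps
    return sent


svc = [
    SubgroupInfo("enhancement-1080p", priority=2, kbps=3000),
    SubgroupInfo("base-360p", priority=0, kbps=800),
    SubgroupInfo("enhancement-720p", priority=1, kbps=1500),
]
```

Because each layer rides its own QUIC stream, dropping "enhancement-1080p" here means resetting or deprioritising one stream while the base layer keeps flowing untouched.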

An object is one indivisible piece of payload. The protocol does not interpret object payload at all — that is the job of the streaming-format layer. From moq-transport's perspective an object is "a chunk of bytes with an Object ID, optional priority, optional extensions". In the CMSF profile, each object's payload contains "at least one Movie Fragment Box (moof) followed by a Media Data Box (mdat)" — i.e., one CMAF chunk. In a raw-RTP-style profile (under discussion in the working group but not the production path), each object could carry a single H.264 NAL or a single Opus packet.
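The four definitions above can be written down as plain data structures. This is a minimal sketch only: the class and field names (TrackName, MoqObject, and so on) are illustrative assumptions, not identifiers from draft-ietf-moq-transport, and the payload is kept opaque as the text describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrackName:
    """Full track name: a namespace tuple plus one track name."""
    namespace: Tuple[bytes, ...]
    name: bytes


@dataclass
class MoqObject:
    """One indivisible payload; the transport never inspects `payload`."""
    object_id: int
    payload: bytes


@dataclass
class Subgroup:
    """Objects that share one QUIC stream, so they arrive in order."""
    subgroup_id: int
    objects: List[MoqObject] = field(default_factory=list)

    def append(self, obj: MoqObject) -> None:
        # This sketch enforces strictly increasing object IDs per subgroup.
        if self.objects and obj.object_id <= self.objects[-1].object_id:
            raise ValueError("object IDs must be strictly increasing")
        self.objects.append(obj)


@dataclass
class Group:
    """Independently decodable join point, e.g. one closed GOP."""
    group_id: int
    subgroups: Dict[int, Subgroup] = field(default_factory=dict)


@dataclass
class Track:
    name: TrackName
    groups: Dict[int, Group] = field(default_factory=dict)


# Build group 7 of a video track with one subgroup and three objects.
track = Track(TrackName((b"acme-sports", b"live"), b"video-1080p"))
group = track.groups.setdefault(7, Group(group_id=7))
stream = group.subgroups.setdefault(0, Subgroup(subgroup_id=0))
for i in range(3):
    stream.append(MoqObject(object_id=i, payload=b"moof+mdat bytes"))
```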

To make the model concrete, here is one live 50 fps 1080p stream walked through MoQ end to end:

  • The encoder produces a 2-second closed GOP, so every 100 frames a new group begins. The publisher gives the track the namespace ("acme-sports","live","match-2026-05-21") and the track name video-1080p. It announces the namespace and starts publishing.
  • For each GOP the publisher opens a new group. The group ID is monotonically increasing — group 0, group 1, group 2 — and conventionally tied to the GOP's wallclock time.
  • Inside group 7 (say, the GOP starting at presentation timestamp 14 seconds into the show) the publisher opens one subgroup, which under the hood is one QUIC stream. The CMAF packager hands the publisher 100 chunks: chunk 0 is a key-frame chunk (IDR + moof+mdat), chunks 1 through 99 are P/B-frame chunks. The publisher writes each chunk as one MoQ object with object IDs 0 through 99 onto that QUIC stream.
  • A relay subscribed to that track receives the objects as they arrive, caches each one in memory (and optionally on disk for VOD-style replay), and forwards them onto every subscriber QUIC stream that has asked for this group. Because every subgroup's objects share one QUIC stream, in-order delivery is automatic for the subscriber; because different subgroups, and different tracks such as audio and video, ride different QUIC streams, head-of-line blocking across modalities is gone.
  • A new viewer joining at presentation timestamp 14.7 seconds subscribes to the track with a "latest group" filter. The relay starts them at group 7, object 0 — they get the IDR and start decoding immediately. The cost is one RTT to subscribe, plus the time for object 0 to traverse the relay-to-viewer leg.

That walkthrough is the entire foundational rhythm of MoQ. Everything else — the priority knobs, the partial-reliability options, the SUBSCRIBE_UPDATE messages that change the live edge, the FETCH messages for catalogue-style retrieval — is policy on top of "publish a track, organise media into groups, address objects, let relays cache them".
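The walkthrough reduces to a little integer arithmetic. The sketch below assumes the same 50 fps, 2-second-GOP, one-object-per-frame stream, with group 0 starting at presentation time zero; the function names are hypothetical.

```python
from typing import Tuple

FPS = 50
GOP_SECONDS = 2
FRAMES_PER_GROUP = FPS * GOP_SECONDS  # 100 objects per group


def locate(pts_seconds: float) -> Tuple[int, int]:
    """Return (group_id, object_id) of the frame shown at pts_seconds."""
    # round(..., 6) guards against float error before truncating to a frame index
    frame_index = int(round(pts_seconds * FPS, 6))
    return divmod(frame_index, FRAMES_PER_GROUP)


def join_point(pts_seconds: float) -> Tuple[int, int]:
    """A new subscriber starts at object 0 of the group currently being published."""
    group_id, _ = locate(pts_seconds)
    return group_id, 0
```

For the viewer in the walkthrough, locate(14.7) is group 7, object 35 (the live edge), while join_point(14.7) is group 7, object 0 (the IDR the relay serves first).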

[Figure 2 image: a horizontal diagram of the MoQ data model.]

Figure 2. The MoQ data model. A track is the named stream a subscriber asks for; a group is what they can independently start at; a subgroup maps to one QUIC stream; an object is one chunk of media bytes. CMAF chunks become objects; GOP boundaries become group boundaries.

Streams, Datagrams, And Why MoQ Picked QUIC

A reader familiar with HLS or DASH has every right to ask: why QUIC? CMAF over HTTP/1.1 with chunked transfer encoding already gets close to one-second latency. The answer is in the three QUIC features that no TCP-based protocol can match.

First, independent streams without head-of-line blocking. A single QUIC connection can carry a very large number of bidirectional and unidirectional streams concurrently, bounded in practice by the stream limits each peer advertises. Loss on one stream blocks only that stream; the other streams keep flowing. For media this is exactly the property an SFU-style fan-out wants: a slow viewer who falls behind on the high-bitrate video track does not block the audio track, the captions, or any other viewer's streams. The MoQ design uses one subgroup per QUIC stream precisely so that the cancel-and-priority semantics of QUIC streams apply directly to media units that share a fate.

Second, datagrams. QUIC supports an unreliable datagram extension (RFC 9221) that delivers application messages on the same connection as streams but skips retransmission. MoQ's "datagram forwarding preference" tells the transport "this object is too time-sensitive to wait for a retransmit — if the network drops it, drop it from the application too". The classic use case is audio at very low latency, where a 20 ms lost packet is better dropped than recovered with a 200 ms retransmit; the working group also discusses datagrams for telemetry, signalling, and metadata that has its own application-level reliability.
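The "better dropped than recovered" reasoning can be expressed as a rule of thumb. The function below is a hypothetical helper, not part of any MoQ API, and it assumes a simplified loss model: first delivery takes half an RTT, and a lost packet needs roughly one more full RTT to be detected and resent.

```python
def forwarding_preference(deadline_ms: float, rtt_ms: float) -> str:
    """Return "datagram" if a retransmit would miss the deadline, else "stream".

    Assumed model for this sketch: one-way delay is rtt/2, and recovery of a
    lost packet adds about one full RTT on top of that.
    """
    recovered_arrival_ms = 0.5 * rtt_ms + rtt_ms
    return "datagram" if recovered_arrival_ms > deadline_ms else "stream"
```

With an 80 ms RTT, an audio frame that must play within 60 ms gets "datagram" (a recovered copy would arrive after about 120 ms), while a video chunk with a 250 ms playout deadline gets "stream".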

Third, 0-RTT and 1-RTT connection establishment. QUIC's cryptographic handshake folds TLS into the transport handshake — the first RTT establishes a connection and completes TLS 1.3 in parallel, and a returning client can send 0-RTT data on the first packet. A new subscriber joining a MoQ stream pays roughly half the connection-setup cost of an equivalent HLS-over-HTTPS join over TCP+TLS.
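The "roughly half" claim follows from counting round trips before the first application byte can be sent. The sketch below uses the standard handshake counts (one RTT for TCP, one for TLS 1.3, one combined RTT for QUIC); the dictionary keys and function name are made up for this example.

```python
SETUP_RTTS = {
    "tcp+tls1.3": 2,  # TCP handshake, then TLS 1.3 handshake
    "quic-1rtt": 1,   # transport and TLS 1.3 handshakes combined
    "quic-0rtt": 0,   # resumed session: data rides in the first flight
}


def setup_delay_ms(mode: str, rtt_ms: float) -> float:
    """Time spent on connection setup before the first request can be sent."""
    return SETUP_RTTS[mode] * rtt_ms
```

On a 40 ms path, setup costs 80 ms over TCP+TLS 1.3, 40 ms for a fresh QUIC connection, and nothing for a resumed one.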

There is also a fourth, less-discussed reason: the transport is the same one HTTP/3 uses. That means a MoQ deployment shares its connection-level investments — congestion control, ECN, BBR experiments, anti-amplification protections — with the entire HTTP/3 ecosystem. Cloudflare's moq-rs is built directly on the same quiche and tokio-quiche QUIC stack that serves Cloudflare's HTTP/3 traffic. That code path has had years of production hardening across petabytes of traffic.

The trade-off — and it is honest to name it — is that QUIC requires UDP, and a meaningful fraction of corporate firewalls still block UDP on ports other than 53 and 123. The MoQ working group's pragmatic answer is WebTransport: a W3C / IETF specification that lets a browser open a QUIC connection through an HTTP/3 upgrade, which itself can fall back to HTTP/2 over TCP when UDP is blocked. WebTransport-based MoQ trades some performance for reach. Browser-facing deployments in 2026 typically support both QUIC-direct and WebTransport paths and let the client pick.

The Control Plane — SUBSCRIBE, ANNOUNCE, PUBLISH, FETCH

Above the data plane sits a small set of bidirectional control messages on a dedicated QUIC stream, exchanged between every adjacent pair of MoQ endpoints (publisher↔relay, relay↔relay, relay↔subscriber).

The original "pull" model is subscribe-driven: a subscriber sends SUBSCRIBE with a namespace + track name + a filter (latest group, specific group range, absolute start), and the publisher responds with SUBSCRIBE_OK plus a stream of media objects. SUBSCRIBE_UPDATE lets the subscriber narrow the window; UNSUBSCRIBE tears the subscription down. Relays maintain a subscription forest: each downstream subscriber's subscription rolls up into the relay's single upstream subscription to the publisher, which is the mechanism behind subscribe-as-aggregation.
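The three filter styles named above can be modelled as a small window check. This is an illustrative sketch: the class name, the string labels for the filter kinds, and the inclusive end of the range are assumptions for the example, not the wire encoding used by the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubscribeFilter:
    kind: str                        # "latest_group", "absolute_start" or "group_range"
    start_group: Optional[int] = None
    end_group: Optional[int] = None  # inclusive; only used by "group_range"


def resolve_start(f: SubscribeFilter, live_group: int) -> int:
    """First group the relay should deliver for this subscription."""
    if f.kind == "latest_group":
        return live_group
    if f.kind in ("absolute_start", "group_range"):
        if f.start_group is None:
            raise ValueError("start_group is required for this filter kind")
        return f.start_group
    raise ValueError(f"unknown filter kind: {f.kind}")


def in_window(f: SubscribeFilter, live_group: int, group_id: int) -> bool:
    """True if group_id falls inside the subscription window."""
    if group_id < resolve_start(f, live_group):
        return False
    if f.kind == "group_range" and f.end_group is not None:
        return group_id <= f.end_group
    return True
```

A relay can evaluate in_window once per group rather than once per object, since every object in a group shares the same group ID.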

The newer "push" model added in the recent draft revisions is publish-driven: a publisher sends PUBLISH to the relay before any subscriber has asked, the relay responds PUBLISH_OK, and objects begin flowing into the relay's cache so that the first subscriber arrives to a warm cache. This is the model that production deployments, especially live event broadcasting, tend to prefer, because the publisher pushes data into the relay without waiting for the first viewer to arrive.

ANNOUNCE lets a publisher tell a relay "I own this namespace; expect track names under it" without yet publishing any data; the relay can then advertise the namespace to upstream relays so subscriptions can be routed. FETCH is a recent addition for catalogue-style retrieval — give me the past N groups of this track — and is what makes VOD-style replay over MoQ possible without re-architecting around HLS-style manifests. GOAWAY lets a relay or publisher gracefully shed connections.
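One way a relay could use announcements to route subscriptions is a namespace table keyed by tuple prefix. The sketch below is an assumption-laden illustration: the class name, the string session identifiers, and the longest-prefix-match rule are choices made for this example, and the draft's exact matching rules should be checked before relying on them.

```python
from typing import Dict, Optional, Tuple

Namespace = Tuple[bytes, ...]


class NamespaceTable:
    """Maps announced namespaces to the session that announced them."""

    def __init__(self) -> None:
        self._owners: Dict[Namespace, str] = {}

    def announce(self, namespace: Namespace, session: str) -> None:
        self._owners[namespace] = session

    def route(self, namespace: Namespace) -> Optional[str]:
        """Longest-prefix match: the most specific announcement wins."""
        for length in range(len(namespace), 0, -1):
            owner = self._owners.get(namespace[:length])
            if owner is not None:
                return owner
        return None


table = NamespaceTable()
table.announce((b"acme-sports", b"live"), "publisher-london")
```

A subscription for a track under (b"acme-sports", b"live", b"match-1") would then be forwarded to "publisher-london", while an unknown namespace returns None and the relay can reject the request.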

The control plane is intentionally small and intentionally on its own QUIC stream so that control messages cannot block media objects (and vice versa). The MoQ working group has spent multiple draft revisions tightening which messages are mandatory vs optional, and how relays should respond when a subscriber asks for a namespace the relay does not own — convergence on these questions is the main reason the draft has gone through 17+ revisions.

Where The 500 ms Latency Number Comes From

The "sub-500 ms broadcast at million-viewer scale" headline is not marketing; it is a budget you can write out. Take a live 50 fps stream with a 2-second GOP, encoded with one CMAF chunk per frame, published into a relay tree that is two hops deep on the way to the viewer. The latency budget, with numbers, looks like this:

  • Encoder. One frame arrives every 20 ms. The CMAF packager writes the chunk (moof+mdat) as soon as the encoder emits the slice. Modern hardware encoders close a chunk in 10 to 30 ms after the frame finishes encoding. Total: 30 to 50 ms from frame capture to chunk closure.
  • Publish to ingest relay. The publisher writes the chunk as one MoQ object on the live subgroup's QUIC stream. The ingest relay receives it within one RTT plus the object's wire time. On a fibre uplink that is 5 to 20 ms.
  • Relay to relay. A two-hop relay tree adds one more inter-relay leg, typically 5 to 50 ms over a backbone path. Cloudflare's deployment in 330+ cities puts most viewers within one or two hops of the encoder.
  • Relay to viewer. The edge relay forwards the object onto every subscriber stream that wants this group. Edge-to-viewer is 10 to 100 ms depending on access network.
  • Player jitter buffer. The player holds a small jitter buffer so that one delayed object does not cause a stall. MoQ players tuned aggressively for low latency ship with 100 to 250 ms buffers, much smaller than the 1 to 3 second HLS equivalent, because MoQ's per-stream priority and datagram drop make stall risk structurally lower.

Summing the budget on a same-region path: 30 + 10 + 20 + 30 + 150 = roughly 240 ms. Across regions with a transatlantic hop somewhere in the relay tree: add roughly 100 ms per transatlantic RTT, so 340 to 440 ms. The "sub-500 ms" figure is achievable on real production deployments today; the "400 ms" reported on the Cloudflare + Bitmovin NAB demo is consistent with that arithmetic.
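The same arithmetic as a short script, using the per-leg values picked in the sum above. The dictionary keys and function name are illustrative; the numbers are the ones quoted in the text.

```python
BUDGET_MS = {
    "encoder": 30,
    "publish_to_ingest": 10,
    "relay_to_relay": 20,
    "relay_to_viewer": 30,
    "jitter_buffer": 150,
}
TRANSATLANTIC_RTT_MS = 100


def glass_to_glass_ms(transatlantic_rtts: int = 0) -> int:
    """Sum the latency budget, adding 100 ms per transatlantic round trip."""
    return sum(BUDGET_MS.values()) + transatlantic_rtts * TRANSATLANTIC_RTT_MS
```

Calling glass_to_glass_ms() gives the same-region figure of 240 ms; passing 1 or 2 gives the 340 ms and 440 ms cross-region bounds.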

Compare with WebRTC delivery on the same content. WebRTC achieves about 300 ms glass-to-glass on a clean same-region path, but every viewer needs an SFU session — the cost is one RTT per viewer on the signalling side, plus the SFU's per-viewer state. MoQ achieves a comparable number with the relays being stateless about viewers. Compare with LL-HLS: the LL-HLS spec floor is 2 seconds for the join (one part-fetch + one partial-segment + the player's 3-part buffer); the in-production median is 3 to 5 seconds. MoQ is mechanically lower because the join cost is one round-trip to the nearest relay rather than a manifest fetch + a segment-aligned wait.

[Figure 3 image: a horizontal stacked bar chart comparing glass-to-glass latency budgets across four delivery models for the same 1080p 50 fps stream on a clean same-region path. Each bar is composed of coloured segments labelled encoder, publish, network, jitter buffer, with the total annotated at the right end.]

Figure 3. The four sub-second-to-ten-second delivery models on a single latency budget chart. MoQ earns its 400 ms slot through one-RTT subscribes and small-jitter buffer playback; the relay model means that latency is preserved while scaling to million-viewer fan-out.

The Relay — Where MoQ's Architecture Beats Both WebRTC And HLS

The relay is the single design choice that justifies the protocol's existence. Stripping the abstraction, a MoQ relay is a process that:

  1. Accepts QUIC connections from publishers and downstream subscribers.
  2. Maintains a set of subscriptions — both its own subscriptions to upstream relays or publishers, and the subscriptions of its own downstream subscribers.
  3. Aggregates: when N downstream subscribers all want the same track, the relay maintains exactly one upstream subscription to the publisher and fans the inbound objects out to all N downstreams.
  4. Caches: when an object arrives, the relay holds it in memory (and optionally on disk) so that a new subscriber that joins one or two RTTs later gets a warm hit instead of waiting for the next object from upstream.
  5. Re-prioritises: the relay maps the upstream object's priority onto the downstream QUIC stream so that local congestion control decisions reflect each downstream's needs.

Read those five properties carefully. Aggregation and caching (properties 3 and 4) are what an HLS CDN edge does, mechanically. Live subscription state and per-downstream re-prioritisation (properties 2 and 5) are what a WebRTC SFU does. MoQ is the first protocol whose relay does both jobs in one process. That is the engineering claim that gets MoQ taken seriously: a single uniform middlebox handles both the "fan out cheaply to ten million viewers" CDN job and the "preserve sub-second latency with priority and partial reliability" SFU job.
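The aggregation and caching properties from the list above fit in a few dozen lines. This is a toy in-memory sketch, not how moq-rs is implemented: there is no QUIC, the callback-based API is invented for the example, priority mapping is omitted, and the cache is a simple bounded queue per track.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Obj = Tuple[int, int, bytes]  # (group_id, object_id, payload)
Deliver = Callable[[Obj], None]


class Relay:
    def __init__(self, subscribe_upstream: Callable[[str], None], cache_size: int = 256):
        self._subscribe_upstream = subscribe_upstream
        self._downstreams: Dict[str, List[Deliver]] = defaultdict(list)
        self._cache: Dict[str, Deque[Obj]] = defaultdict(lambda: deque(maxlen=cache_size))

    def subscribe(self, track: str, deliver: Deliver) -> None:
        if track not in self._downstreams:
            # Aggregation: only the first downstream triggers an upstream subscription.
            self._subscribe_upstream(track)
        self._downstreams[track].append(deliver)
        # Caching: replay the newest cached group so a late joiner starts warm.
        cached = self._cache[track]
        if cached:
            latest_group = cached[-1][0]
            for obj in cached:
                if obj[0] == latest_group:
                    deliver(obj)

    def on_upstream_object(self, track: str, obj: Obj) -> None:
        self._cache[track].append(obj)
        for deliver in self._downstreams.get(track, []):
            deliver(obj)  # fan-out: one inbound object, N outbound copies


upstream_calls: List[str] = []
relay = Relay(upstream_calls.append)
viewer_a: List[Obj] = []
viewer_b: List[Obj] = []
relay.subscribe("video-1080p", viewer_a.append)
relay.on_upstream_object("video-1080p", (7, 0, b"idr"))
relay.on_upstream_object("video-1080p", (7, 1, b"p"))
relay.subscribe("video-1080p", viewer_b.append)  # late joiner
relay.on_upstream_object("video-1080p", (7, 2, b"p"))
```

Both viewers end up with objects 0 through 2 of group 7, and the upstream publisher sees exactly one subscription regardless of how many viewers attach.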

In Cloudflare's deployment the relay is moq-rs, the open-source Rust implementation Cloudflare ships under the MIT/Apache-2.0 license. The relays run on every server in every Cloudflare datacenter, the same machines that serve HTTP/3 and Cloudflare Workers, and use Durable Objects to coordinate cross-region state (a publisher announcing in London makes the namespace discoverable to a subscriber hitting the relay in Singapore). The first MoQ CDN, announced by Cloudflare in mid-2025 and ramped through 2025 and early 2026, runs across 330+ cities. Norsk's NAB 2026 demo published into that relay network and served sub-second latency to viewers in multiple regions through it.

Other production relay implementations exist. Red5 ships a managed MoQ relay through its CacheFly partnership (announced February 2026). Wowza demonstrated a Wowza Streaming Engine origin publishing CMAF chunks via CMSF directly into a relay-acting-as-CDN at NAB 2026, with the relay being either Wowza's own or a partner's. Multiple academic prototypes (MOQtail, the DiVA evaluation paper) implement relays in Rust and Go for benchmarking. The implementation cost of a basic relay is not large: moq-rs's relay binary is a few thousand lines of Rust on top of quiche. The complexity is in the operational hardening: congestion control across hundreds of downstream subscribers, fair-queueing under congestion, and denial-of-service resistance (which has its own dedicated individual draft, draft-englishm-moq-relay-dos).

[Figure 4 image: a topology diagram with three columns showing how the same media goes through three different distribution models. Left column: WebRTC fan-out, with one Publisher at the top connected to one SFU in the middle, which connects to five Viewers at the bottom.]

Figure 4. WebRTC fan-out, HLS fan-out, and MoQ fan-out on the same content. MoQ's relay is the first middlebox shape that combines WebRTC's priority + low-latency behaviour with HLS's stateless-cache-per-viewer scale.

Common Mistake — "MoQ Replaces WebRTC"

A surprisingly large fraction of 2026 vendor talk implies MoQ will replace WebRTC. It will not, and the working group itself does not claim that. The two protocols solve different problems and excel at different things.

WebRTC is designed for bidirectional, peer-driven, browser-native, sub-300-ms two-way media where every participant is both a sender and a receiver. Video calling, conferencing, voice chat, peer-to-peer screen sharing — these are WebRTC's territory and will remain so, because WebRTC's signalling, NAT traversal, echo cancellation, jitter buffer, and bandwidth estimation are all tuned for a small number of peers exchanging both directions of media in real time. MoQ has no bidirectional peer model: it is publish-and-subscribe, one direction per track, designed for one-to-many or few-to-many distribution.

MoQ is designed for one-to-many or one-to-very-many sub-second broadcast where the senders are a small number of encoders and the receivers are thousands to millions of viewers. Live sports, live betting, live commerce, live music, interactive game streams, second-screen experiences — these are MoQ's territory, the cases where WebRTC's per-viewer SFU session was always the wrong shape and HLS's segment-aligned latency was always too slow.

The most honest framing of the relationship in 2026: MoQ is a complement to WebRTC, not a replacement. Many real production stacks will use both — WebRTC for the contribution leg (the streamer's webcam into the platform) and MoQ for the distribution leg (the platform's encoded output out to the audience). Several NAB 2026 demonstrations explicitly showed this hybrid: a WebRTC publisher feeding a MoQ relay tree, with the relay performing the protocol translation at the edge.

What's Actually Standardised, And What's Still Moving

This is the question every CTO asks first and the question vendor pitches answer evasively. Here is the precise May 2026 picture, by document.

draft-ietf-moq-transport. The base transport. Revision -17 published 2 March 2026; revision -latest published 1 May 2026, intended status Standards Track. The data model (tracks, groups, subgroups, objects), the control messages (SUBSCRIBE, ANNOUNCE, PUBLISH, FETCH, UNSUBSCRIBE, GOAWAY), and the QUIC/WebTransport binding are all converged. The working group is debating residual issues — exact format of object metadata extensions, the boundaries between SUBSCRIBE and PUBLISH flows, how to express priority across relays. Working Group Last Call has not yet been called; once it is, RFC publication is six to twelve months out. Treat the wire format as stable enough to build against; treat the absolute field numbers as still potentially changing.

draft-ietf-moq-msf. The streaming-format layer. Revision -00 published in early 2026, the first MSF working-group draft. Defines the catalog (how a publisher advertises track metadata), the timeline (how groups relate to wallclock), and ABR switching at the streaming-format level. Still early; expect significant evolution before RFC.

draft-ietf-moq-cmsf (and the earlier draft-wilaw-moq-cmsf). The CMAF-over-MSF binding. Defines exactly how a CMAF init segment becomes the catalog's initData, how a CMAF fragment (moof+mdat) becomes a single MoQ object, and how segment / GOP boundaries map to group boundaries. This is the document a CMAF-shop's engineers will read to understand "how do I convert my existing pipeline to MoQ". The LOCMAF effort (Low Overhead CMAF for MOQ, hosted at locmaf.dev) is a related industry initiative that overlaps with CMSF on the same questions.

draft-ietf-moq-secure-objects. Revision -00, working group draft. End-to-end encryption of objects independent of the QUIC channel encryption — so that a relay can forward objects it cannot decrypt. This matters for use cases where the relay operator is not trusted with the plaintext (e.g., when the relay is a third-party CDN edge but the content has a strict DRM model). Early; not yet implemented in production deployments.

draft-englishm-moq-relay-dos and other informational drafts. Companion documents on operational topics — denial of service protection, relay-network topology guidance, AI-agent track binding. Not standards-track; useful reading for relay operators.

The honest bottom line on standardisation: the base transport is converged enough that the production deployments shipped in 2026 are tracking the spec closely and will only need minor changes to follow the final RFC. The streaming format layer is earlier and will move more. Anyone building a MoQ product in 2026 should follow the moq-wg mailing list and pin to a known draft revision in their code, not "the latest".

The 2026 Deployment Map — Who Has Shipped What

The vendor table below summarises the May 2026 MoQ coverage across the platforms a 2026 streaming product is likely to evaluate. Three columns matter: origin/encoder (can your contribution-side pipeline produce a MoQ track), relay/CDN (can a network forward MoQ at scale), and player (can the viewer's device decode and render the stream). The "draft tracked" column names the moq-transport revision the vendor's public material cites; mismatches mean interoperability tests are required.

Vendor / Platform | MoQ origin | MoQ relay / CDN | MoQ player | Draft tracked (May 2026)
Cloudflare | Reference (moq-rs publisher tools) | Yes (first MoQ CDN, 330+ cities) | moq-js (web), open-source | -17 / -latest
Bitmovin | Yes (encoder + packager) | Via Cloudflare integration | Bitmovin Player (MoQ profile) | -17
Wowza | Wowza Streaming Engine origin | Via partner relay (incl. Cloudflare) | Demo player at NAB 2026 | -17
Norsk (id3as) | Yes (Norsk origin) | Via Cloudflare relay network | Demo player at NAB 2026 | -17
Ant Media | Ant Media Server publishes MoQ alongside WebRTC | Self-hosted or auto-scaled | Ant Media web player | -17
Broadpeak | BkS400 packager | Partner relay (Oracle OCI demo) | Via partner | -17
Oracle | Yes (Oracle Video at the Edge on OCI) | OCI-hosted relay | Via partner | -17
AWS | Demonstrated at NAB 2026 | Demonstrated at NAB 2026 | Via partner | -17
Red5 | Yes (Red5 Pro publisher) | Via CacheFly partnership (announced Feb 2026) | Red5 Pro web player | -17
CacheFly | Via Red5 partnership | Yes (global MoQ CDN) | n/a | -17
Synamedia | Demonstrated at NAB 2026 | Synamedia delivery platform | n/a | -17
Nomad Media | Demonstrated at NAB 2026 | Via partner | n/a | -17
nanocosmos | nanoStream origin (since IBC 2025) | Via partner | nanocosmos H5Live player | -16/-17
THEO Technologies | Roadmap; not yet GA | n/a | THEOplayer MoQ profile (roadmap) | tracking
Mux | Not yet announced | Not yet announced | Not yet announced | tracking
Akamai | No first-party MoQ packager | UDP / WebTransport passthrough | No native | tracking
Apple | No native MoQ | n/a | No native iOS/Safari player | tracking

Two things stand out from this list. First, the origin side has the broadest vendor coverage — most encoder vendors had a working MoQ origin by NAB 2026, because the work is mostly "publish CMAF chunks as MoQ objects" and that is mechanically straightforward once a QUIC stack is in hand. Second, the relay side is dominated by Cloudflare; the second tier (CacheFly, Oracle OCI, vendor-self-hosted relays at Norsk/Ant Media/Wowza) is real but small. The player side is the most fragmented: there is no native browser support yet (no MoQ-aware video element), so every player today is a JavaScript engine driving MSE or WebTransport, and every vendor's player is incompatible with every other vendor's at the manifest / catalog layer until MSF and CMSF stabilise.

The practical effect of this map: a 2026 MoQ deployment is feasible if (a) your encoder vendor is on the list, (b) Cloudflare is acceptable as the relay tier or you operate your own relays, and (c) your audience accepts a vendor-specific MoQ-capable player. The "all-MoQ end-to-end" deployment is for a defined viewer population (your own app, your own embed, your own betting platform). Mixed deployments where MoQ is one of several distribution legs (MoQ for the interactive viewers, LL-HLS for the rest) are the dominant production pattern.

Use Cases — Where MoQ Actually Justifies The Switch

The cluster of use cases where MoQ is genuinely better than the alternatives is narrower than the marketing suggests but real. The honest framing: MoQ wins where the workload combines broadcast scale with sub-second latency requirements that LL-HLS cannot reach and where WebRTC's per-viewer SFU model is too expensive. In rough order of how clean the case is:

Live sports betting and iGaming. Viewer counts in the thousands to millions; latency tolerance under 500 ms because the betting market is moving against the picture; willingness to pay for any architecture that delivers the latency reliably. Multiple NAB 2026 demonstrations targeted this case explicitly.

Live commerce and shopping streams. Sub-second is the difference between "viewer sees the live host hold up the item and clicks the buy button" and "viewer sees a stale frame and clicks too late". Audience sizes are large (thousands per stream); latency is the differentiating product feature.

Live auctions. A bid placed during the closing seconds of a live auction needs to be visible to the auctioneer within the bidding window, which is sub-second. Audience sizes vary but the latency-to-scale ratio is in MoQ's territory.

Second-screen interactive experiences. A companion app that synchronises a betting prompt, a poll, or a stat overlay to the live broadcast needs sub-second to feel synchronised. Audience sizes are broadcast-scale.

Large-scale interactive game streams and concerts. When chat or audience reactions need to feel like part of the show.

Cloud gaming and remote production. Where MoQ's per-stream priority and partial reliability are a closer match for the workload than WebRTC's SFU model.

Where MoQ does not justify the switch in 2026:

  • Standard VOD playback. Latency does not matter; HLS and DASH are cheaper, more interoperable, more cacheable, and supported on every device.
  • Linear OTT broadcasting where 3 to 6 seconds is acceptable. LL-HLS and LL-DASH already work; replacing them with MoQ adds operational complexity for no headline win.
  • Two-way video calling and conferencing. WebRTC is the right answer and will remain so.
  • Workloads where the audience cannot install a custom player. No native browser support means a vendor JavaScript player; if the deployment context cannot ship one, MoQ is out.
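
The two lists above can be condensed into a rough triage function. This is a minimal sketch, not a definitive rule set: the `Workload` fields, the `triage` name, and both thresholds are illustrative assumptions read off the prose (3 seconds as the point where LL-HLS is good enough, 1,000 viewers as the start of "broadcast scale"), and the fallback to WebRTC when MoQ is ruled out is our assumption as well.

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative thresholds (assumptions), chosen to match the prose above.
LL_HLS_OK_MS = 3000        # "3 to 6 seconds is acceptable"
BROADCAST_VIEWERS = 1000   # "thousands to millions"


class Choice(Enum):
    HLS_DASH = "HLS or DASH"
    LL_HLS = "LL-HLS or LL-DASH"
    WEBRTC = "WebRTC"
    MOQ = "MoQ"


@dataclass(frozen=True)
class Workload:
    live: bool                    # False for VOD playback
    two_way: bool                 # calling / conferencing
    latency_budget_ms: int        # glass-to-glass budget the product needs
    peak_viewers: int             # concurrent viewers per stream
    can_ship_custom_player: bool  # no native browser MoQ support yet


def triage(w: Workload) -> Choice:
    """Return the first delivery option the workload does not rule out."""
    if not w.live:
        return Choice.HLS_DASH    # latency does not matter for VOD
    if w.two_way:
        return Choice.WEBRTC      # conferencing stays on WebRTC
    if w.latency_budget_ms >= LL_HLS_OK_MS:
        return Choice.LL_HLS      # existing low-latency HTTP already works
    if w.can_ship_custom_player and w.peak_viewers >= BROADCAST_VIEWERS:
        return Choice.MOQ         # tight latency budget at broadcast scale
    # Assumption: tight latency without scale or without a custom player
    # falls back to WebRTC delivery.
    return Choice.WEBRTC


if __name__ == "__main__":
    betting = Workload(True, False, 500, 100_000, True)
    print(triage(betting).value)
```

The ordering matters: the disqualifying conditions are checked before the latency and scale test, which mirrors the point that MoQ has to earn its place against an installed base.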

The Honest Counter-Argument

It is worth taking seriously the contrary case, which has been argued publicly by Tsahi Levent-Levi (BlogGeek.me) among others. The strongest version of the argument runs: WebRTC won because it had a new use case (in-browser video calling) that did not exist before the protocol, so there was no prior solution to dislodge. MoQ is being proposed for use cases (live broadcast, sub-second delivery) that already have solutions (LL-HLS, WebRTC delivery, HESP); the win has to be measured against an installed base, not against nothing. Five months into 2026, vendor announcements have multiplied, but production deployments (customer-bearing, traffic-bearing, with the business depending on MoQ) are still rare and modest in scale.

That is a fair point and worth holding alongside the technical merits. The counter-counter-argument is that the integrated relay+CDN architecture is structurally lower-cost than running a WebRTC SFU at the same scale, which means as deployments mature the economics should favour MoQ; and the standardisation work has gone faster than most working groups (17+ revisions in roughly three years, with a steady cadence). But the honest position in May 2026 is: MoQ is the most interesting protocol in the section, and the case for it is strong enough that every serious streaming team should have read the draft, run the moq-rs tutorial, and understood the architecture — and any team for whom a 500 ms latency floor would unlock product features should be running a POC. The case for replacing a working LL-HLS or WebRTC deployment today, with no specific latency-driven product feature on the line, is much weaker.

Figure 5 (two-column decision tree). When to pick MoQ in 2026. The product-feature question (sub-500 ms latency at scale) is the only one that justifies the switch; the technology-curiosity question is not the same question.

A Numeric Example — Sizing A MoQ Deployment

To make the architecture concrete, consider a worked example: a live sports betting platform broadcasting one event to 100,000 simultaneous viewers, with a ladder of three video bitrates (1080p at 6 Mbps, 720p at 3 Mbps, 480p at 1.5 Mbps) plus stereo audio (128 kbps), a 2-second GOP, and 50 fps.

Per-viewer bandwidth (a typical ABR mix favouring the middle rung): 60% on 720p + 30% on 1080p + 10% on 480p, plus 128 kbps of audio. Average per-viewer bitrate: 0.6 × 3 + 0.3 × 6 + 0.1 × 1.5 + 0.128 = 3.878 Mbps ≈ 3.88 Mbps. Total egress at the relay edge: 100,000 × 3.88 Mbps = 388 Gbps.

On a WebRTC SFU model, the publisher writes once into the SFU layer; the SFU layer fans out at 388 Gbps but maintains 100,000 stateful sessions. Each session has signalling overhead, ICE, DTLS, an SRTP context, and jitter buffer state: call it 5 KB per session worst-case, or 500 MB of relay memory.

On an HLS CDN model, the publisher writes once into the origin; the CDN fans out at 388 Gbps via stateless caches. Each viewer's session state is essentially zero (a TCP connection with a few KB of state). The latency floor, however, is 3 to 5 seconds.

On a MoQ relay model, the publisher writes once into the relay tree; the relay tree fans out at 388 Gbps via subscription aggregation; each downstream subscription's relay state is small (a control-stream pointer and the object metadata table for the in-flight window: call it 200 bytes per subscription, or 20 MB of relay memory across the 100,000 viewers). Latency target: 400 ms.

The MoQ model is roughly 25× more memory-efficient than the SFU model (20 MB vs 500 MB across 100,000 sessions), achieves an HLS-comparable scaling profile, and lands within 100 ms of the WebRTC latency. That is the architectural arithmetic behind MoQ's pitch. The egress bandwidth is the same in all three; the difference is the cost of the middlebox doing the fan-out, and the latency floor of the resulting delivery.
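
The arithmetic in this example is easy to reproduce and to rerun with your own ladder. The sketch below uses only the inputs stated in the example; the function and constant names are illustrative, and the per-session state sizes are the same back-of-envelope assumptions as in the text (decimal units, 1 KB = 1,000 bytes), not measurements.

```python
# Inputs from the worked example; names are illustrative.
VIEWERS = 100_000
AUDIO_MBPS = 0.128

# Ladder rung -> (share of viewers, video bitrate in Mbps)
ABR_MIX = {
    "1080p": (0.30, 6.0),
    "720p": (0.60, 3.0),
    "480p": (0.10, 1.5),
}

# Assumed relay state per session, in bytes (back-of-envelope figures).
SFU_STATE_BYTES = 5_000
MOQ_STATE_BYTES = 200


def per_viewer_mbps(mix, audio_mbps):
    """Weighted average video bitrate plus audio, in Mbps."""
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("ABR shares must sum to 1")
    video = sum(share * mbps for share, mbps in mix.values())
    return video + audio_mbps


def egress_gbps(viewers, mbps):
    """Total fan-out bandwidth at the edge, in Gbps."""
    return viewers * mbps / 1000.0


def state_mb(viewers, bytes_per_session):
    """Aggregate relay memory for session state, in MB."""
    return viewers * bytes_per_session / 1_000_000


if __name__ == "__main__":
    rate = per_viewer_mbps(ABR_MIX, AUDIO_MBPS)
    sfu = state_mb(VIEWERS, SFU_STATE_BYTES)
    moq = state_mb(VIEWERS, MOQ_STATE_BYTES)
    print(f"per-viewer bitrate: {rate:.3f} Mbps")
    print(f"edge egress:        {egress_gbps(VIEWERS, rate):.0f} Gbps")
    print(f"SFU session state:  {sfu:.0f} MB")
    print(f"MoQ session state:  {moq:.0f} MB")
    print(f"memory ratio:       {sfu / moq:.0f}x")
```

Changing the ABR mix moves the egress figure but leaves the memory ratio untouched, since that ratio depends only on the two per-session state sizes.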

Where Fora Soft Fits In

Fora Soft has built video infrastructure since 2005 — 239+ shipped projects across WebRTC, conferencing, OTT, telemedicine, e-learning, surveillance, and AR/VR. Our 2026 work increasingly sits at the boundary between WebRTC contribution and modern HTTP-based distribution, which is exactly the territory MoQ targets. We are tracking the moq-transport draft revisions, running internal POCs on moq-rs for low-latency interactive video, and advising clients in iGaming, live shopping, and second-screen sports on whether to wait for the RFC or move to a vendor-tracked production stack now. The decision is usually about how tightly the product feature requires sub-second delivery; we have seen workloads where the answer is "wait", workloads where the answer is "ship now on Cloudflare + Bitmovin + the vendor player", and workloads where the answer is "build the bridge — WebRTC for contribution, MoQ for distribution, LL-HLS as the long-tail fallback".

Common Mistake — Treating MoQ Like HLS

A surprisingly common engineering pitfall in early 2026 implementations: writing a MoQ origin that publishes one object per CMAF segment and one group per segment, then asking "why does the latency look like HLS?" The mistake is the granularity. In HLS the segment is the atomic addressable unit; in MoQ the object is, and each CMAF chunk inside a segment becomes one object. A 2-second segment at 50 fps, chunked at one frame per chunk, gives you 100 objects per group, which is the granularity at which MoQ's per-object priority, per-object drop, and per-object cache hit actually do work. If you collapse all 100 chunks into one object you get HLS's segment-aligned latency back. The CMSF draft is explicit about this: "the payload of each Object must contain at least one Movie Fragment Box (moof) followed by a Media Data Box (mdat)". At least one is the floor, and for low latency one chunk per object is the right answer, not all of them.
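
The difference between the two mappings can be sketched in a few lines. This is an illustration of the granularity argument only, not a MoQ publisher: `MoqObject`, `chunk_per_object`, `segment_per_object`, and `fill_delay_ms` are hypothetical names, the payloads are placeholder bytes instead of real moof+mdat boxes, and the delay model assumes one frame per CMAF chunk.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class MoqObject:
    group_id: int    # one group per GOP / CMAF segment
    object_id: int   # position of the object inside its group
    payload: bytes   # placeholder for moof+mdat bytes


def chunk_per_object(segments: List[List[bytes]]) -> Iterator[MoqObject]:
    """Fine-grained mapping: every CMAF chunk becomes its own object."""
    for group_id, chunks in enumerate(segments):
        for object_id, chunk in enumerate(chunks):
            yield MoqObject(group_id, object_id, chunk)


def segment_per_object(segments: List[List[bytes]]) -> Iterator[MoqObject]:
    """The pitfall: all chunks of a segment collapsed into one object."""
    for group_id, chunks in enumerate(segments):
        yield MoqObject(group_id, 0, b"".join(chunks))


def fill_delay_ms(chunks_per_object: int, fps: int) -> float:
    """Time the origin waits to fill one object (one frame per chunk)."""
    return chunks_per_object * 1000.0 / fps


if __name__ == "__main__":
    # Two 2-second segments at 50 fps, i.e. 100 one-frame chunks each.
    segments = [[bytes([i % 256]) for i in range(100)] for _ in range(2)]
    fine = list(chunk_per_object(segments))
    coarse = list(segment_per_object(segments))
    print(len(fine), "objects, fill delay", fill_delay_ms(1, 50), "ms")
    print(len(coarse), "objects, fill delay", fill_delay_ms(100, 50), "ms")
```

Both mappings carry identical bytes in the same group structure; only the object boundary moves, and with it the earliest moment the origin can hand anything to the relay.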

What To Read Next

  • Talk to a streaming engineer — book a 30-minute scoping call with our team on MoQ POC design, vendor selection, and migration sequencing.
  • See our case studies — read how Fora Soft has built low-latency live and interactive streaming for iGaming, e-learning, telemedicine, and OTT clients since 2005.
  • Download the MoQ Readiness Checklist, a one-page printable list of the architectural, vendor, and operational questions to answer before kicking off a MoQ pilot.

References

  1. draft-ietf-moq-transport-17. Media over QUIC Transport. IETF MOQ Working Group. Published 2 March 2026; intended-status Standards Track; expires 3 September 2026. Tier 1 (official IETF working-group draft). Primary source for the data model (tracks/groups/subgroups/objects, §3), the control messages (SUBSCRIBE/ANNOUNCE/PUBLISH/FETCH), the QUIC and WebTransport bindings, and the priority/forwarding-preference semantics.
  2. draft-ietf-moq-transport-latest. Media over QUIC Transport. IETF MOQ Working Group. Published 1 May 2026; intended-status Standards Track; expires 2 November 2026. Tier 1. Used to verify which changes between -17 and -latest are wire-format affecting; called out in the article as "still moving" where the working group is mid-discussion.
  3. draft-ietf-moq-msf-00. MOQT Streaming Format. IETF MOQ Working Group. Published 2026; working-group draft. Tier 1. Source for the catalog, timeline, and ABR-switching layer above the transport.
  4. draft-wilaw-moq-cmsf and the chartered successor draft-ietf-moq-cmsf-00. CMSF: a CMAF compliant implementation of MOQT Streaming Format. IETF MOQ Working Group. Tier 1. Primary source for the CMAF init = catalog initData / CMAF moof+mdat = Object / GOP boundary = Group mapping.
  5. draft-ietf-moq-secure-objects-00. End-to-End Secure Objects for Media over QUIC Transport. IETF MOQ Working Group. Tier 1. Source for the end-to-end encryption layer that allows relay forwarding without plaintext access.
  6. RFC 9000. QUIC: A UDP-Based Multiplexed and Secure Transport. J. Iyengar, M. Thomson (Ed.). May 2021. Tier 1. Cited for the streams, datagrams, and 0-RTT/1-RTT handshake properties MoQ depends on.
  7. RFC 9114. HTTP/3. M. Bishop (Ed.). June 2022. Tier 1. Cited for the WebTransport upgrade path and the shared HTTP/3 + MoQ runtime.
  8. RFC 9221. An Unreliable Datagram Extension to QUIC. T. Pauly, E. Kinnear, D. Schinazi. March 2022. Tier 1. Source for MoQ's "datagram" forwarding preference.
  9. W3C WebTransport specification. Tier 1. Cited for the browser-facing fallback path.
  10. "MoQ: Refactoring the Internet's real-time media stack." Cloudflare blog, 2025. Tier 3 (first-party engineering blog from spec contributor). Cited for Cloudflare's moq-rs relay architecture and the 330+ city deployment.
  11. "The First MoQ CDN: Cloudflare." moq.dev blog, 2025/2026. Tier 3. Cited for the "first MoQ CDN" claim and the publisher-relay-subscriber walk-through.
  12. "Media over QUIC (MoQ) with Bitmovin and Cloudflare." Bitmovin blog, 2026. Tier 4 (vendor production deployment). Cited for the joint Bitmovin + Cloudflare playback demonstration and the NAB 2026 interop. We followed the spec's wire format where the blog's wording was less precise.
  13. "What Is MOQ (Media over QUIC) and Why It Matters." Red5 blog, 2026. Tier 4. Cited for the Red5 / CacheFly partnership and the NAB 2026 demo summary.
  14. "MOQ Debut Proved to Be a Chart Topper at NAB 2026." Red5 blog, April 2026. Tier 4. Cited for the NAB 2026 vendor list (Ant Media, AWS, Bitmovin, Broadpeak, CacheFly, Cloudflare, Nomad Media, Oracle, Norsk, Synamedia, Red5).
  15. "NAB 2026: Oracle and partners showcase MoQ-based streaming ecosystem." Sports Video Group, 19 April 2026. Tier 4. Cited for the Oracle + Bitmovin + Broadpeak joint demonstration.
  16. "Joint Demos at NAB Show 2026 Power Live Streaming, Monetization and Security at Scale." Broadpeak blog, April 2026. Tier 4. Cited for the Broadpeak packaging + Oracle Video at the Edge + Bitmovin playback demo.
  17. "Wowza Demos AI Workflows at NAB Show 2026." Wowza blog, April 2026. Tier 4. Cited for the Wowza Streaming Engine + Cloudflare MoQ demonstration with CMAF/CMSF over MoQT.
  18. "MOQ is lacking a compelling adoption reason." BlogGeek.me, 2026. Tier 4. Cited as the counter-argument source in "The Honest Counter-Argument" section; we deliberately included a sceptical perspective to balance the vendor pieces.
  19. "What Is Media Over QUIC (MoQ)? How It Works + Why It Matters." Wowza blog, 2026. Tier 4. Cross-checked for the hierarchical data model description; primary source for our wording remained the IETF draft (§3).
  20. "MOQtail: Open-Source, IETF-Compliant MOQT Protocol Libraries." Proceedings of the ACM Multimedia Systems Conference 2026 (MMSys 2026). Tier 5 (peer-reviewed academic paper). Cited for the subgroup-as-priority-layer prototype results.
  21. "Evaluating Media over QUIC (MoQ) for Low-Latency." DiVA Portal, 2026. Tier 5. Used as an independent verification of the latency-budget arithmetic in "Where The 500 ms Latency Number Comes From".
  22. "WebRTC vs. MoQ by Use Case." webrtcHacks, 2026. Tier 3. Used to cross-check the "complement not replacement" framing.
  23. cloudflare/moq-rs. Rust implementation of the IETF MoQ Transport protocol. GitHub repository. Tier 2 (reference implementation from a working-group contributor). Used as the authoritative source for the relay implementation behaviour where the draft was ambiguous.
  24. IETF MOQ Working Group charter and minutes. Tier 1. Used to confirm the working group's chartered scope and the in-progress vs adopted document status.