Why This Matters
If you are scoping a video product, sizing a streaming budget, briefing engineers, or comparing protocols on a vendor's slide, the answer to every important question begins with "is this live, VOD, or near-live?". The choice between HLS at 20-second latency and WebRTC at 300 milliseconds is not a preference — it is a consequence of the deadline. The same is true of bitrate ladders, CDN cache policies, DRM licence economics, and the failure modes you will spend the next year debugging. We wrote this for the smart non-technical reader first; a senior streaming engineer should still respect every fact. By the last paragraph you will be able to tell which of the three problems any given streaming pitch is actually solving — and where its bill comes from.
Three deadlines, one pipeline
In "What is video delivery, and why is it harder than serving a JPEG?" we drew the five-stage pipeline that every streaming product walks through: capture, encode, package, deliver, play. What changes between live, VOD, and near-live is not the stages. It is the deadline attached to each stage, and therefore which trade-offs the engineering team is allowed to make.
The deadline is the single most useful lens. It explains why a 30-minute Netflix episode and a 30-minute Twitch stream feel like the same product to a viewer and look like utterly different problems on the inside.
In VOD, no part of the pipeline races a clock. The file exists before anyone presses play, so the encoder can spend hours on a single asset, the packager can pre-cut every segment, the content delivery network — abbreviated CDN, a chain of servers placed close to viewers — can replicate everything to its edges at leisure, and the player can buffer aggressively. The constraint is cost, not time.
In live, every stage is on a clock. The encoder is racing the camera, the packager is cutting segments that did not exist a second ago, the CDN has to propagate fresh segments before they expire, and the player has to keep a buffer that is thin enough to feel "live" but thick enough to absorb a sneezing Wi-Fi router. The constraint is latency, and lower latency is paid for in scale, cost, or resilience, sometimes in all three.
In near-live, the content is being produced live, but a deliberate, fixed delay is inserted between capture and viewer — typically 5 to 90 seconds — so that the system gets most of live's freshness with most of VOD's resilience. The constraint is the chosen delay; the engineering job is to honour it precisely and predictably, not to minimise it.
A note on the words
The vocabulary in this space is a small war over precision. Three definitions are worth pinning down before we go further, because vendors use them loosely and the loose use is where overpaying happens.
Video on demand (VOD), also called "on-demand streaming" and sometimes "non-linear" when referring to broadcast, is the delivery of content that is fully encoded, packaged, and stored before the viewer requests it. The viewer can start, pause, seek backward, seek forward, and resume at will. Examples: a Netflix series, a recorded webinar, a YouTube upload, a university lecture archive.
Live streaming is the delivery of content that is being produced while it is being watched. The publisher's camera and the viewer's screen are coupled in time; the only question is how tightly. Examples: a Twitch broadcast, a sports match, a live news feed, a video call.
Near-live is live content with a deliberately inserted delay. The term is industry shorthand — there is no ISO definition — but in practice it covers any pipeline where the producer intends a 5- to 90-second gap between capture and viewer playback. Examples: sports streams synchronised to a betting window, live shopping channels with moderator review, live news with a profanity-and-graphics buffer, a corporate town hall with a translation delay.
There is one more term worth knowing because it gets confused with all three. Linear streaming refers to channels that play a fixed programming schedule, the way traditional television does: viewers tune in to whatever is on right now, with no ability to seek backward beyond a DVR window. Linear can be live (a live news channel) or near-live (a sports channel with a five-second betting delay) or even a 24/7 loop of pre-recorded VOD. It is a packaging concept, not a deadline concept, and we treat it as orthogonal to the three categories above.
The deadline determines the protocol
The biggest practical consequence of picking live, VOD, or near-live is what protocol family you ship on, and therefore which engineers, which licences, and which monthly CDN bill you will be paying. The table below collapses two decades of streaming history into a single picture; later articles unpack each row.
| Category | Typical glass-to-glass latency | Typical protocols (2026) | Player buffer | What it optimises for |
|---|---|---|---|---|
| VOD | not measured | HLS, DASH, CMAF over HTTP | 20–60 s | Cost, quality, reliability |
| Near-live (broadcast-grade) | 5–30 s | HLS, DASH, CMAF | 6–18 s | Predictable delay, resilience |
| Live (standard) | 6–30 s | HLS, DASH (3–6 s segments) | 6–30 s | Reach, scale |
| Live (low-latency HTTP) | 2–5 s | LL-HLS, LL-DASH, CMAF chunked | 2–6 s | Latency without leaving CDNs |
| Live (sub-second) | 0.2–1 s | WebRTC, Media over QUIC | 0.05–0.5 s | Interactivity |
| Live (broadcast contribution) | 0.05–1 s | SRT, RIST, NDI, ST 2110 | small | Reliability over public internet |
Each row is its own article in this section. We cover HLS in depth, LL-HLS in depth, WebRTC delivery from peer-to-peer to scale, SRT, and the delivery-protocol family tree. The glass-to-glass numbers above come from the spec ranges defined in IETF RFC 8216 (HLS), in Apple's HLS Authoring Specification (revision 2025-09) for LL-HLS, in ISO/IEC 23009-1:2022 for DASH, and from current production deployments by Netflix, Twitch, Mux, and Cloudflare; the WebRTC range is bounded below by the W3C WebRTC Recommendation and the RFC 8825–8866 family.
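As a rough illustration of how the table can be read, the sketch below encodes its latency ranges as data and returns every row whose range contains a target glass-to-glass latency. The names `Row`, `LATENCY_TABLE`, and `candidate_rows` are hypothetical, and the ranges are copied from the table rather than from any specification.

```python
from typing import NamedTuple


class Row(NamedTuple):
    category: str
    min_latency_s: float
    max_latency_s: float
    protocols: tuple[str, ...]


# Ranges copied from the table above. VOD is left out because its
# glass-to-glass latency is "not measured".
LATENCY_TABLE = [
    Row("Near-live (broadcast-grade)", 5.0, 30.0, ("HLS", "DASH", "CMAF")),
    Row("Live (standard)", 6.0, 30.0, ("HLS", "DASH")),
    Row("Live (low-latency HTTP)", 2.0, 5.0, ("LL-HLS", "LL-DASH", "CMAF chunked")),
    Row("Live (sub-second)", 0.2, 1.0, ("WebRTC", "Media over QUIC")),
    Row("Live (broadcast contribution)", 0.05, 1.0, ("SRT", "RIST", "NDI", "ST 2110")),
]


def candidate_rows(target_latency_s: float) -> list[str]:
    """Return the categories whose typical latency range contains the target."""
    return [
        row.category
        for row in LATENCY_TABLE
        if row.min_latency_s <= target_latency_s <= row.max_latency_s
    ]


print(candidate_rows(12.0))
print(candidate_rows(0.4))
```

A 12-second target matches both the near-live and the standard live rows, which is the overlap the table itself shows. A 1.5-second target matches nothing, because the table leaves a gap between 1 and 2 seconds.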
VOD in plain terms
VOD is the cheapest streaming you can ship at scale, and almost every engineering lever points the same way: pre-do the work.
The encoder gets a full file and unlimited time. A modern VOD encoder will sweep multiple bitrate ladders — a list of versions of the same video at increasing bitrates, called rungs — at speeds chosen for quality, not for the wall clock. A two-hour film can be analysed scene by scene, encoded once at high quality, and stored forever; the same encoding budget would be impossible on a live event. We dig into ladder design in Building a bitrate ladder: classic Netflix ladder, per-title, per-shot.
The packager and the CDN get the same gift. Segments — short, self-contained chunks of video, typically 2 to 6 seconds long — are cut once, stored once, and copied to CDN edges that serve them to viewers from a city nearby. A single VOD asset can sit on a CDN edge for weeks; once the cache is warm, the cache hit ratio — the share of requests answered without a trip back to the origin server — routinely sits at 95–99% for VOD platforms according to vendor data from Akamai and Cloudflare. The higher that ratio is, the cheaper every byte of delivery becomes.
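To make the effect of the cache hit ratio concrete, here is a minimal sketch that estimates how much traffic falls through to the origin. It assumes the hit ratio applies uniformly to bytes as well as to requests, which is a simplification; the function name `origin_traffic_gb` and the 1,000 GB figure are hypothetical.

```python
def origin_traffic_gb(edge_egress_gb: float, cache_hit_ratio: float) -> float:
    """Estimate the traffic the origin must serve for a given edge egress.

    Assumes the hit ratio is the same for bytes as for requests.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return edge_egress_gb * (1.0 - cache_hit_ratio)


# Compare the two ends of the 95–99% range quoted above.
for ratio in (0.95, 0.99):
    print(f"hit ratio {ratio:.0%}: {origin_traffic_gb(1000.0, ratio):.0f} GB from origin")
```

Under these assumptions, moving from 95% to 99% cuts origin traffic by a factor of five, which is why a few percentage points of hit ratio matter so much.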
The player has the same freedom. Because there is no live deadline, the player can buffer ten, twenty, or sixty seconds ahead. Big buffers absorb jitter (the variation in arrival time of consecutive packets) and almost eliminate visible stalls on a healthy network. A startup delay of 1–2 seconds is acceptable for VOD; the same 1–2 seconds in a live sports app would make the product unusable for a chunk of its audience.
The arithmetic of VOD economics is friendly. Consider a service with a 1,000-title catalogue, each title 90 minutes at an average rendition of 4 megabits per second. Replicating one title to one CDN region costs:
4,000,000 bits/sec × 5,400 sec ÷ 8 = 2.7 gigabytes per rendition
× 6 rungs (240p, 360p, 540p, 720p, 1080p, 4K) = ~16 GB per title
× 1,000 titles = 16 TB of storage per region
Storing sixteen terabytes costs roughly as much as a couple of mid-range cloud-storage subscriptions. The actual money in VOD goes to egress (the bandwidth bill paid each time a viewer downloads a segment from an edge), not to encoding or storage. Once the cache is warm, the marginal cost of one additional viewer in a region where the asset already sits is tiny. We dig into the dollars in CDN cost economics: 95th-percentile, commit, overage, transit.
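The storage arithmetic above can be reproduced in a few lines. This is a sketch under the same simplification the text makes: every one of the six rungs is counted at the 4 Mbit/s average, and decimal units are used (1 GB = 10^9 bytes). The function names are hypothetical.

```python
def rendition_size_gb(bitrate_bps: float, duration_s: float) -> float:
    """Size of one rendition in decimal gigabytes."""
    return bitrate_bps * duration_s / 8 / 1e9


def catalogue_size_tb(titles: int, rungs: int, bitrate_bps: float, duration_s: float) -> float:
    """Storage for a whole catalogue in one region, in decimal terabytes."""
    per_title_gb = rendition_size_gb(bitrate_bps, duration_s) * rungs
    return per_title_gb * titles / 1000


per_rendition = rendition_size_gb(4_000_000, 90 * 60)
per_region = catalogue_size_tb(titles=1000, rungs=6, bitrate_bps=4_000_000, duration_s=90 * 60)
print(f"{per_rendition:.1f} GB per rendition, {per_region:.1f} TB per region")
```

A real ladder would use a different bitrate per rung, so the per-title figure would come out lower for the small rungs and much higher for 4K.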
Pitfall: treating VOD as "live with no clock". The biggest mistake we see in early-stage projects is reusing the same low-latency live infrastructure to serve VOD, or vice versa. A live origin running CMAF chunked transfer with 1-second chunks will burn unnecessary HTTP overhead serving a Netflix-style catalogue; a VOD-style 6-second segment with a 30-second buffer cannot be retrofitted into a betting product. Pick the architecture for the deadline you actually have.
Live in plain terms
In live, the camera sets the deadline. Every downstream stage gets the bytes a fixed time later and must do its job before the viewer's player wants to draw the next frame. The cheap shortcuts that make VOD profitable — pre-encoding, pre-packaging, long edge TTL, deep buffers — are simply not available.
Latency is the headline number, and it is built from a budget. Glass-to-glass latency — the time between something happening in front of the lens and a viewer's eyes seeing it — is the sum of the time each pipeline stage costs. A typical 2026 classic HLS workflow looks something like this:
Capture 5–20 ms
Encode 50–500 ms
Packager 200 ms – 4 s (segment cut-up dominates)
Origin/CDN 50–500 ms (propagation, edge cache miss)
Player buffer 6–30 s (the dominant cost)
TOTAL ~6 – 35 s
The single largest contributor is the player's buffer. The buffer is not optional: it is the only thing that hides network jitter from the eye. Pulling the buffer down to one second yields LL-HLS or LL-DASH, with glass-to-glass landing around 2–5 seconds. Pulling it down further requires leaving HTTP entirely — WebRTC, Media over QUIC — and rebuilding a chunk of the pipeline around UDP-based transports. We unpack the budget in Latency, glass-to-glass, end-to-end.
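The budget above is a plain sum of per-stage ranges, which the following sketch makes explicit. The stage names and the dictionary layout are illustrative; the numbers are the ones listed in the classic HLS budget.

```python
# (minimum, maximum) cost of each stage in seconds, from the budget above.
CLASSIC_HLS_BUDGET_S = {
    "capture": (0.005, 0.020),
    "encode": (0.050, 0.500),
    "packager": (0.200, 4.0),
    "origin_cdn": (0.050, 0.500),
    "player_buffer": (6.0, 30.0),
}


def total_range(budget: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the best-case and worst-case cost of every stage."""
    best = sum(low for low, _ in budget.values())
    worst = sum(high for _, high in budget.values())
    return best, worst


def dominant_stage(budget: dict[str, tuple[float, float]]) -> str:
    """Return the stage with the largest worst-case cost."""
    return max(budget, key=lambda stage: budget[stage][1])


best, worst = total_range(CLASSIC_HLS_BUDGET_S)
print(f"glass-to-glass: {best:.1f} to {worst:.1f} s, dominated by {dominant_stage(CLASSIC_HLS_BUDGET_S)}")
```

Swapping in a smaller player-buffer range is the quickest way to see how much of the total the other four stages leave on the table.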
The CDN economics flip too. Live segments expire fast. A segment from 30 seconds ago is no longer of interest to almost any viewer — they have already moved on with the program. Cache hit ratios for live edges sit well below VOD: vendor reports from CDN providers put live cache hit ratios at 70–90%, and the gap costs real money at scale. Live also stresses cache invalidation, because the manifest — the small text file the player reads to learn which segments to fetch — is rewritten every few seconds. Edge configurations that work great for VOD will silently break for live unless the TTL on the manifest is set differently from the TTL on the segments themselves.
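One way to picture the split between manifest TTL and segment TTL is a small function that picks a `Cache-Control` header by file type. This is a sketch, not a recommended policy: the half-segment manifest TTL, the one-hour segment TTL, the suffix lists, and the function name `cache_control_for` are all assumptions chosen for illustration.

```python
from pathlib import PurePosixPath

MANIFEST_SUFFIXES = {".m3u8", ".mpd"}        # HLS playlists and DASH MPDs
SEGMENT_SUFFIXES = {".ts", ".m4s", ".mp4"}   # common media segment extensions


def cache_control_for(path: str, segment_duration_s: float = 6.0) -> str:
    """Pick a Cache-Control value for a live origin response (illustrative values)."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in MANIFEST_SUFFIXES:
        # The manifest is rewritten every few seconds, so keep its TTL short.
        # Half a segment duration is an assumed policy, floored at one second.
        ttl = max(1, int(segment_duration_s / 2))
        return f"public, max-age={ttl}"
    if suffix in SEGMENT_SUFFIXES:
        # A published segment does not change, so a longer TTL is safe here.
        return "public, max-age=3600"
    return "no-store"


print(cache_control_for("/live/channel1/index.m3u8"))
print(cache_control_for("/live/channel1/seg_001.m4s"))
```

The point of the split is that a stale manifest hides new segments from the player, while a stale segment is still the correct segment.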
Scale concentrates rather than spreads. A 1-million-viewer live event has all million viewers requesting roughly the same segments at roughly the same moment. The CDN's edge has to be able to serve those concurrent requests; the origin has to have published the segment just in time; and the network in between has to push the traffic. Compare with a VOD library where a million daily viewers spread their requests across a thousand titles and 24 hours — different segments, different times, different edges.
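A back-of-the-envelope comparison shows how different the two load shapes are. The sketch assumes each live viewer fetches one segment per segment duration, that VOD sessions are spread evenly over the day and over the catalogue, and that a VOD session lasts 90 minutes; the 6-second segment and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def live_segment_requests_per_s(concurrent_viewers: int, segment_duration_s: float) -> float:
    """Request rate when every viewer fetches each new segment once."""
    return concurrent_viewers / segment_duration_s


def vod_avg_concurrent_per_title(daily_viewers: int, session_s: float, titles: int) -> float:
    """Average simultaneous viewers per title if sessions are spread evenly."""
    return daily_viewers * session_s / SECONDS_PER_DAY / titles


live_rate = live_segment_requests_per_s(1_000_000, 6.0)
vod_concurrency = vod_avg_concurrent_per_title(1_000_000, 90 * 60, 1000)
print(f"live: about {live_rate:,.0f} requests/s, all for the same segment")
print(f"VOD: about {vod_concurrency:.1f} concurrent viewers per title, at different positions")
```

Real VOD traffic has evening peaks and popular titles, so the even spread understates the worst case, but the orders of magnitude still differ sharply.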
The viewer's experience model is also different. Live's failure mode is a stall or a quality drop right when the goal happens; VOD's failure mode is a quality drop you can wait out. Live audiences are far less forgiving, and live engineering teams obsess about quality of experience, abbreviated QoE, in ways VOD teams do not.
Near-live: the underused middle
Near-live is live with the player buffer left intentionally long, plus (in most production systems) a programmable delay inserted on the encoder side. The result is the best of both worlds for a wide class of products that do not actually need sub-second latency.
The reasons to introduce a delay are practical:
- Betting and trading windows. A sports betting house needs every customer to see the same play within a few seconds of each other and after the bookmaker has had time to close the betting window. Industry-published latency targets sit around 5–10 seconds for in-play wagering, per Stats Perform's 2026 Super Bowl Latency Report and Dolby's OptiView guidance; tighter than that produces arbitrage between fast and slow viewers, looser than that kills the bet.
- Moderation. Live shopping channels, live auctions, and live talent programmes insert a 5–30-second moderation buffer so that a moderator can pull a stream that turns inappropriate. The delay is the moderation tool.
- Live news and broadcast graphics. Broadcast cues, on-screen graphics, closed captions, and overlay branding need a head start. A 10–60-second delay gives the graphics rig time to overlay scores, lower thirds, and translations on the live feed before viewers see it.
- Synchronisation across very different paths. When the same content has to reach satellite TV, terrestrial DTT, IPTV, and OTT viewers within a few seconds of each other, the slowest path sets the budget for all of them. Near-live makes that synchronisation possible by holding the fast paths back.
The engineering story is the friendliest of the three categories. Near-live ships on the same HLS/DASH/CMAF pipeline as VOD and standard live; the only difference is the buffer policy and the inserted delay. The protocol is HTTP-based, so the CDN economics are close to VOD's. The cache hit ratio is closer to VOD than to low-latency live, because the larger buffer means a single segment is requested by many viewers across a wide time window. The player is allowed a deep buffer, so the failure mode is forgiving.
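The inserted delay can be pictured as a queue that holds each segment until it is old enough to publish. The sketch below is a minimal in-memory model of that idea, not a description of any particular encoder or packager; `Segment`, `DelayLine`, and the 12-second delay are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Segment:
    sequence: int
    captured_at: float  # seconds on the capture clock


class DelayLine:
    """Hold segments until a fixed delay has elapsed since capture."""

    def __init__(self, delay_s: float) -> None:
        if delay_s < 0:
            raise ValueError("delay_s must not be negative")
        self.delay_s = delay_s
        self._held: deque[Segment] = deque()

    def push(self, segment: Segment) -> None:
        """Accept a freshly captured segment; segments must arrive in order."""
        self._held.append(segment)

    def release(self, now: float) -> list[Segment]:
        """Return, in order, every held segment that is at least delay_s old."""
        ready = []
        while self._held and now - self._held[0].captured_at >= self.delay_s:
            ready.append(self._held.popleft())
        return ready


line = DelayLine(delay_s=12.0)
for i in range(3):
    line.push(Segment(sequence=i, captured_at=2.0 * i))
print([s.sequence for s in line.release(now=14.0)])
```

Because the release decision depends only on the capture timestamp and the configured delay, every viewer path fed from the same line sees the same offset from the event.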
We see near-live as the most underused category in early-stage product scopes. Founders frequently ask for "live, the lowest latency possible" because they think low-latency is impressive; what they actually need, once we count the verticals they serve, is a predictable 5- to 15-second near-live stream that costs a fraction of a sub-second WebRTC build. Near-live is also where most television-style products live: when you watch a live sports channel on a connected TV, you are almost always seeing near-live with a 10–45-second delay.
A worked example: one match, three products
Consider the same source — a 90-minute football match — packaged for three different products. The numbers below are typical for a 2026 production stack and are the kind we plan against when scoping engagements.
| Stage | Product A: VOD highlight reel | Product B: Near-live broadcast feed | Product C: Sub-second betting overlay |
|---|---|---|---|
| Encoder budget | 30 min processing, 7-rung ladder | Real-time, 6-rung ladder, CMAF chunked | Real-time, 3-rung ladder, SVC |
| Segment / chunk length | 6 s | 2 s | 50 ms (WebRTC RTP packet stream) |
| Player buffer | 30 s | 8 s | 80 ms |
| Glass-to-glass latency | not measured (recorded) | 12 s (deliberate) | 0.4 s |
| Protocol | HLS over HTTP | LL-HLS over HTTP | WebRTC over UDP |
| Distribution | CDN, ~99% cache hit | CDN, ~92% cache hit | SFU regional bridges, no CDN cache |
| Audience model | Long tail across days | Concurrent peak during match | Concurrent peak, limited to bettors |
A streaming product almost never ships with one of these three in isolation. The football operator above is likely to ship all three at once: a near-live feed for general fans, a sub-second WebRTC feed for the betting product, and a VOD highlights catalogue after the whistle.
What changes inside the engineering team
The deadline category also shapes the team. We have run engagements in all three; the shape of the engineering work is recognisably different.
VOD-shaped projects spend their time on catalogue, codecs, and cost. Where does the ladder sit? What is the per-title encoding policy? How much do we pay per terabyte at the 95th percentile? How do we balance H.264 reach against AV1 bitrate savings? The work is repeatable, predictable, and rewards careful measurement.
Live-shaped projects spend their time on the chain of clocks and the failure modes. What is the contribution protocol from the venue? How do we recover from a 200-millisecond network glitch in the encoder-to-origin hop? How do we keep the manifest fresh at the edge without invalidating the segment cache? How do we observe latency in production with low overhead? Live work rewards instrumentation and ruthless on-call rotations.
Near-live projects look like live with a calmer rhythm. The on-call burden is lower because the buffer hides more, but the team still owns a clock — the targeted delay must be precise and predictable, because betting, moderation, and broadcast all depend on knowing exactly where every viewer is on the timeline. We have learned to ship near-live products with explicit timecodes embedded in every segment, so support engineers can answer "what did the viewer see at 14:32:07" without guessing.
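Answering "what did the viewer see at 14:32:07" reduces to a timestamp lookup once each segment carries its capture time. In HLS, one standard carrier for that timestamp is the `EXT-X-PROGRAM-DATE-TIME` playlist tag from RFC 8216. The sketch below assumes the timecodes have already been parsed into a sorted list and that the viewer sits exactly at the configured delay; `segment_seen_at` and the sample timeline are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone
from typing import Optional

# (segment start on the capture clock, duration in seconds, URI)
SegmentEntry = tuple[datetime, float, str]


def segment_seen_at(
    segments: list[SegmentEntry], viewer_clock: datetime, delay_s: float
) -> Optional[str]:
    """Return the URI of the segment on screen at viewer_clock, or None.

    segments must be sorted by start time. The content instant is the
    viewer's wall-clock time minus the inserted delay.
    """
    content_instant = viewer_clock - timedelta(seconds=delay_s)
    starts = [start for start, _, _ in segments]
    index = bisect_right(starts, content_instant) - 1
    if index < 0:
        return None  # before the first known segment
    start, duration_s, uri = segments[index]
    if content_instant < start + timedelta(seconds=duration_s):
        return uri
    return None  # after the last known segment, or inside a gap


first = datetime(2026, 1, 1, 14, 32, 0, tzinfo=timezone.utc)
timeline = [(first + timedelta(seconds=2 * i), 2.0, f"seg_{i}.m4s") for i in range(10)]
viewer_time = datetime(2026, 1, 1, 14, 32, 19, tzinfo=timezone.utc)
print(segment_seen_at(timeline, viewer_time, delay_s=12.0))
```

In production the per-viewer delay drifts with buffer state, so a support tool would normally take the player's reported position rather than assume the nominal delay.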
The mistake we see most often in early-stage products is over-fitting to one shape and then asking the second product line to live inside it. A team that learned to ship VOD will, by default, design live like VOD with a tighter buffer; a team that learned WebRTC will, by default, ship VOD on a real-time pipeline that costs five times what it should. The article you are reading is the first defence against that mistake: pick the shape first, then pick the stack.
Where Fora Soft fits in
We have shipped all three flavours of streaming for clients across video conferencing, OTT, e-learning, telemedicine, and live broadcast. Our live e-learning stacks routinely run near-live with a 6–15-second delay so an instructor can manage the chat lane without missing student questions; our OTT engagements ship CMAF live-and-VOD pipelines with shared CDN economics; and our telemedicine and surveillance products use WebRTC with sub-second latency where a clinician's reaction time or an operator's monitoring loop sets the budget. The choice between architectures is the conversation we have on day one; the second meeting is about which combination — almost always more than one — the product actually needs.
What to read next
- What is video delivery, and why is it harder than serving a JPEG? — the pipeline these three categories share.
- Latency, glass-to-glass, end-to-end — the budget that decides which category you can afford.
- The delivery protocol family tree — how HLS, DASH, LL-HLS, WebRTC and Media over QUIC relate.
Talk to us · See our work · Download
- Talk to a streaming engineer about whether your product is live, VOD, near-live, or all three.
- See our case studies in OTT, e-learning, telemedicine, and live broadcast.
- Download the Live vs VOD vs near-live decision card — one page, three categories, the deadline-driven trade-offs and the protocol families that fit each.
References
- IETF RFC 8216 — HTTP Live Streaming, August 2017. The canonical HLS protocol description; segment, manifest, and Media Sequence Number semantics.
- Apple HLS Authoring Specification for Apple Devices, revision 2025-09. The normative source for LL-HLS partial segments, EXT-X-PART, preload hints, and blocking playlist reload. Apple removed HTTP/2 push from LL-HLS in the September 2023 revision; this 2025 revision still reflects that decision.
- ISO/IEC 23009-1:2022 — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats, 2022. The base DASH specification; MPD, periods, adaptation sets, representations. (Normative text paywalled; cited from the ISO catalogue page and DASH-IF's open implementation guidelines.)
- ISO/IEC 23000-19:2024 — Common Media Application Format (CMAF) for segmented media, 2024. The container format that unified HLS and DASH; CMAF chunked transfer is the basis for LL-HLS and LL-DASH.
- W3C WebRTC 1.0: Real-Time Communication Between Browsers, W3C Recommendation. The browser-side WebRTC API; bounds the sub-second live latency category. Read alongside the IETF RFC 8825–8866 family.
- IETF RFC 9000 — QUIC: A UDP-Based Multiplexed and Secure Transport, May 2021. The transport underpinning HTTP/3 and Media over QUIC.
- Mux — An update on Low Latency HLS live streaming. Engineering blog post by an LL-HLS implementer; a production-deployer perspective on LL-HLS segment, part, and rendition-report behaviour against the spec.
- Akamai — Cache Hit Ratio: The Key Metric for Happier Users and Lower Expenses. Vendor reference for the 95–99% cache-hit benchmark for healthy VOD workloads and the lower live numbers.
- Dolby OptiView — Streaming Latency: What is It and When Does It Matter? and The Latency Debate in Live Sports: Consistency vs. Speed. Industry blog posts from a production-deployer that situate the 5–10 second target for live sports betting.
- Bitmovin — Video Developer Report 2025. Survey data on codec, protocol, and ABR adoption used to anchor the 2026 production-stack ranges in the protocol table.


