Why this matters
Latency is one of the few streaming numbers a non-technical viewer notices immediately and judges harshly: for live sport, news, auctions, betting, and any "watch-along," being thirty seconds behind real time is not a quality nuance, it is a broken product, because a push notification, a text from a friend, or a cheer through the wall spoils the moment before the screen catches up. But low latency is not free, and the most expensive mistake teams make is treating "make it real-time" as a single switch instead of a budget with line items and a bill. This article is for the founder, product manager, or streaming engineer who needs to decide how low to go, brief an encoder and CDN vendor on the right target, and understand why chasing the last second can quietly raise rebuffering and CDN cost for everyone. By the end you will be able to read a latency budget line by line, explain why plain HLS is slow and how LL-HLS and LL-DASH speed it up, and put a sensible number on the latency your product actually needs — neither carelessly high nor expensively low.
A one-minute refresher: where the seconds come from
This article builds on "How a CDN delivers video" and on the packaging mechanics in "Packaging: CMAF, HLS, and DASH from one mezzanine"; here is the one idea you need in front of you.
The delay a viewer feels has a name. Glass-to-glass latency is the gap, measured in seconds, between the moment a camera lens captures something (the first glass) and the moment it appears on the viewer's screen (the second glass). It is not one delay but a chain of small ones added together: the camera and capture buffer, the encoder that compresses the picture, the packager that wraps it into deliverable files, the contribution link that carries it to your platform, the content delivery network (the worldwide fleet of caching servers, or CDN, that holds copies of your video near viewers) that fans it out, the player buffer (the few seconds of video a player holds in reserve so a network hiccup does not freeze the picture), and the screen's own refresh.
Here is the fact that organizes everything below: in that chain, almost every multi-second number on a spec sheet comes from one box — the player buffer. Add up the encoder, packager, contribution, CDN, and decode and you are rarely past a second of unavoidable work. The player buffer is the term that swings the total from a quarter of a second to twenty-five, because the buffer is a deliberate cushion, and the size of the cushion is mostly a choice. Low-latency streaming is, almost entirely, the engineering of a smaller cushion that still absorbs the bumps.
Figure 1. The latency budget. Capture, encode, package, and CDN add up to roughly a second across every mode; the player buffer is the swing term that takes the total from ~25 seconds (plain HLS) to ~2–5 seconds (low-latency) to sub-second (WebRTC).
Why plain HLS and DASH are slow on purpose
To make video survive the open internet, HTTP streaming — delivering video as ordinary web files over the same protocol that serves web pages — chops the stream into segments, short files of a few seconds each, and writes a manifest (a playlist that lists the segments in order). The player downloads the manifest, then fetches segments one after another. This is the design that let streaming ride on cheap, ordinary CDNs and reach millions of viewers — covered in how a CDN delivers video.
The slowness is baked into a safety rule. To avoid freezing when the network wobbles, a player keeps a reserve of already-downloaded video before it starts playing, and the standards specify how much reserve. HTTP Live Streaming (HLS), defined by Apple in IETF RFC 8216 and its second edition, keeps the player at least three Target Durations from the end of the live playlist. The second edition expresses this through a value called HOLD-BACK: the spec says this value "MUST be at least three times the Target Duration," and absent any value the player assumes exactly that. MPEG-DASH deployments use the same rule of thumb: the DASH-IF benchmark configuration sets the player buffer at "3 times maximum/target segment duration." Three segments of cushion is what keeps playback smooth; it is also exactly what leaves plain HTTP streaming so far behind live.
Walk the arithmetic out loud, because the size of the number is the whole problem:
Standard live-edge rule: start ≈ 3 × segment duration behind live
(HLS HOLD-BACK ≥ 3 × Target Duration; DASH player buffer ≈ 3 × segment)
6-second segments: 3 × 6 s = 18 s behind live (before any other delay)
10-second segments: 3 × 10 s = 30 s behind live
Add encoder + packager + contribution + CDN + decode (~1–2 s)
→ a real plain-HLS/DASH live stream lands ~20–30 s behind reality.
That is why Apple's own early guidance recommended ten-second segments and why, as the DASH-IF report puts it bluntly, the legacy settings — "10 seconds segments, buffer 2–3 segments at the client" — "immediately result beyond 30 seconds end-to-end latency." Plain HTTP streaming was, in Apple's words, never meant to be a low-latency live system. The obvious fix — just make the segments tiny — does not work, because shorter segments mean more files, more requests, worse compression, and a higher chance of stalls. The low-latency standards needed a way to deliver pieces smaller than a segment without paying the small-segment tax. That way is the CMAF chunk.
Figure 2. The three-segment rule. A normal player holds about three segments before it plays, so six-second segments put the viewer ~18 seconds behind live, and ten-second segments ~30 seconds — before any network delay is counted.
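The three-segment arithmetic above can be reproduced in a few lines. This is a minimal sketch, not a player implementation: the function name `live_edge_delay` and its parameters are hypothetical, and the 1 to 2 second pipeline allowance is the article's own rough figure for encoder, packager, contribution, CDN, and decode.

```python
def live_edge_delay(segment_s: float, segments_held: int = 3,
                    pipeline_s: float = 0.0) -> float:
    """Seconds behind live for a player that holds back `segments_held` segments.

    `pipeline_s` is everything that is not player hold-back (encode, package,
    contribution, CDN, decode); it is an assumption, not a measured value.
    """
    if segment_s <= 0 or segments_held < 0 or pipeline_s < 0:
        raise ValueError("durations must be positive and counts non-negative")
    return segments_held * segment_s + pipeline_s


for segment_s in (6.0, 10.0):
    hold_back = live_edge_delay(segment_s)
    low = live_edge_delay(segment_s, pipeline_s=1.0)
    high = live_edge_delay(segment_s, pipeline_s=2.0)
    print(f"{segment_s:g} s segments: {hold_back:g} s hold-back, "
          f"~{low:g}-{high:g} s with pipeline delay")
```

Changing `segments_held` shows how much of the total is a choice rather than physics: the pipeline term stays fixed while the hold-back scales with segment length.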
The one trick both standards share: the CMAF chunk
Here is the idea that makes both LL-HLS and LL-DASH work, and it is the single most important thing to understand in this article. A segment does not have to be delivered as one finished file. The Common Media Application Format (CMAF), standardized as ISO/IEC 23000-19, defines a smaller unit called a CMAF chunk: one or more frames of video packaged in a self-contained little box (technically a moof header plus an mdat data pair) that can be read and decoded on its own, before the rest of its parent segment even exists.
That changes the timing completely. Instead of the encoder holding a whole six-second segment until the last frame is in, then handing over one big file, the packager emits a stream of tiny chunks — often around 200 milliseconds each — the instant each is encoded. The transport that carries them is HTTP chunked transfer encoding, a standard feature of HTTP that lets a server start sending a response body before it knows the full length and keep appending to it. So the server begins sending the segment's first chunk while it is still producing the segment's last chunk, and the player at the live edge consumes chunks as they arrive instead of waiting for a closed file. The whole-segment wait — the dominant delay in plain HTTP streaming — disappears, while the segments themselves stay a comfortable few seconds long, so compression efficiency and cache behaviour barely change.
Two things follow, and they matter for the rest of the article. First, the CMAF chunk is the shared building block: LL-HLS and LL-DASH are two manifests and two sets of signalling wrapped around the same chunked media. You encode and chunk once, and package the result for both. This is the low-latency cousin of the "encrypt once, package for both" idea in packaging: CMAF, HLS, and DASH. Second, the chunk is exactly where the cost shows up: more, smaller pieces mean more requests and a lower cache-hit ratio, which we return to when we count the bill.
Figure 3. The shared building block. The encoder emits ~200 ms CMAF chunks as it produces them; HTTP chunked transfer ships each chunk before the segment is finished; the same chunks feed both the LL-HLS and the LL-DASH manifest.
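To make the moof plus mdat pairing concrete, the sketch below walks the top-level boxes of a partially written segment and reports which chunks are already complete. It assumes the standard ISO base media box header (a 4-byte big-endian size followed by a 4-byte type, with a 64-bit size when the 32-bit field is 1). The helper names and the synthetic payloads are illustrative only; the payloads are filler bytes, not decodable media, and optional boxes that may precede a moof are simply skipped.

```python
import struct
from typing import Iterator, List, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each complete top-level box in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box runs to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # truncated or malformed: stop instead of guessing
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


def complete_chunks(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) for every moof+mdat pair fully present."""
    chunks = []
    moof_offset = None
    for box_type, offset, size in iter_boxes(data):
        if box_type == "moof":
            moof_offset = offset
        elif box_type == "mdat" and moof_offset is not None:
            chunks.append((moof_offset, offset + size - moof_offset))
            moof_offset = None
    return chunks


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a synthetic box for the demo (payload is filler, not real media)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# One finished chunk, then a moof whose mdat has not been written yet.
partial_segment = (make_box(b"moof", b"\x00" * 16)
                   + make_box(b"mdat", b"\x01" * 100)
                   + make_box(b"moof", b"\x00" * 16))
print(complete_chunks(partial_segment))  # [(0, 132)]
```

The point of the example is the timing: the first 132 bytes are deliverable while the rest of the segment is still being produced.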
LL-HLS: parts, preload hints, and a playlist that waits
Low-Latency HLS, added to HLS in its second edition (the ongoing revision of RFC 8216), exposes those chunks to the player as Partial Segments, usually just called parts. In the spec's own words, partial segments "provide a parallel channel for distributing media at the live edge," where the media "is divided into a larger number of smaller pieces, such as CMAF Chunks," so each part "can be packaged, published, and added to the Media Playlist much earlier than its Parent Segment." A part is tagged EXT-X-PART; the playlist advertises the part length with a PART-TARGET value. The viewer is allowed to play much closer to live: where the full-segment rule held the player three segments back, the low-latency rule, PART-HOLD-BACK, must be "at least twice the Part Target Duration" and "SHOULD be at least three times" — so with 200 ms parts the live edge is now well under a second of held-back media, not eighteen.
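The PART-HOLD-BACK rule quoted above is easy to check mechanically when reviewing a vendor's playlist. The following is a rough sketch under stated assumptions: the playlist text is a made-up sample, the function names are hypothetical, and the attribute parsing handles only the simple `KEY=value` and `KEY="quoted"` forms. It reads PART-TARGET from the EXT-X-PART-INF tag and PART-HOLD-BACK from EXT-X-SERVER-CONTROL, and uses `Decimal` so that 0.6 compared against 3 × 0.2 is not tripped up by binary floating point.

```python
import re
from decimal import Decimal

ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def tag_attributes(playlist: str, tag: str) -> dict:
    """Return the attribute list of the first `tag` line, or {} if absent."""
    prefix = f"#{tag}:"
    for line in playlist.splitlines():
        if line.startswith(prefix):
            pairs = ATTR_RE.findall(line[len(prefix):])
            return {key: value.strip('"') for key, value in pairs}
    return {}


def check_part_hold_back(playlist: str) -> str:
    """Classify PART-HOLD-BACK against the 2x (MUST) and 3x (SHOULD) rules."""
    part_inf = tag_attributes(playlist, "EXT-X-PART-INF")
    control = tag_attributes(playlist, "EXT-X-SERVER-CONTROL")
    if "PART-TARGET" not in part_inf or "PART-HOLD-BACK" not in control:
        return "missing"
    part_target = Decimal(part_inf["PART-TARGET"])
    hold_back = Decimal(control["PART-HOLD-BACK"])
    if hold_back < 2 * part_target:
        return "invalid"
    if hold_back < 3 * part_target:
        return "below-recommended"
    return "ok"


def sample_playlist(hold_back: str) -> str:
    """Hypothetical playlist header with 200 ms parts, for illustration only."""
    return "\n".join([
        "#EXTM3U",
        "#EXT-X-TARGETDURATION:6",
        f"#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK={hold_back}",
        "#EXT-X-PART-INF:PART-TARGET=0.2",
    ])


for value in ("0.3", "0.5", "0.6"):
    print(value, check_part_hold_back(sample_playlist(value)))
```

With 200 ms parts, 0.3 s fails the two-times floor, 0.5 s is legal but under the recommended three times, and 0.6 s meets the recommendation.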
Three smaller mechanisms make that fast without melting the server, and you do not need to implement them to brief a vendor — you need to know they exist. Preload hints (EXT-X-PRELOAD-HINT) let the playlist tell the player the URL of the next part before it exists, so the player's request is already in flight when the part lands. Blocking Playlist Reload lets the player ask for a playlist it has not seen yet — it appends _HLS_msn (the media sequence number) and _HLS_part parameters to the request — and the server holds the response open until that newer playlist is ready, instead of making the player poll over and over. A server advertises this with CAN-BLOCK-RELOAD=YES. And Delta Updates and Rendition Reports keep the constant playlist refreshes small and let a player that switches quality keep its low-latency position. For the frame-by-frame mechanics of each tag, the Video Streaming section's LL-HLS deep dive goes deeper than we will here; at the product level the point is simple: parts carry the media, preload hints and blocking reload remove the round trips, and the viewer lands two to five seconds behind live.
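A blocking playlist reload is, on the wire, just a playlist URL with two extra query parameters. The sketch below shows one way a client might build that URL; the function name, the example host, and the `token` parameter are assumptions for illustration, and a real player would only send such a request when the server has advertised CAN-BLOCK-RELOAD=YES.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DIRECTIVES = ("_HLS_msn", "_HLS_part")


def blocking_reload_url(playlist_url: str, msn: int,
                        part: Optional[int] = None) -> str:
    """Ask for the playlist that contains media sequence number `msn`
    (and, optionally, part index `part` of that segment)."""
    if msn < 0 or (part is not None and part < 0):
        raise ValueError("msn and part must be non-negative")
    scheme, netloc, path, query, fragment = urlsplit(playlist_url)
    # Keep unrelated parameters, drop any stale directives from a prior request.
    params = [(key, value)
              for key, value in parse_qsl(query, keep_blank_values=True)
              if key not in DIRECTIVES]
    params.append(("_HLS_msn", str(msn)))
    if part is not None:
        params.append(("_HLS_part", str(part)))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


# Hypothetical URL: request the playlist update that includes part 2 of segment 273.
print(blocking_reload_url("https://example.com/live/video.m3u8?token=abc", 273, 2))
```

The server holds this request open until the named part exists, which is what replaces repeated polling.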
LL-DASH: chunked segments and a manifest that declares its target
Low-Latency DASH reaches the same place from the DASH side. MPEG-DASH (ISO/IEC 23009-1) added a low-latency mode that signals to the player that a segment "can be accessed earlier than full availability at the server" — that is, it can be fetched and played as chunked, progressive data rather than as a closed file. The media is the same chunked CMAF; the difference is in the manifest, the DASH MPD (Media Presentation Description). The MPD carries a couple of low-latency signals the player reads: an availabilityTimeOffset that tells the player a segment is available earlier than its nominal time (so it can start fetching mid-segment), and a ServiceDescription element that states the operator's target latency — the live position the player should aim for and hold. Delivery again rides HTTP chunked transfer from the packager to the CDN edge and on to the player.
The practical upshot is that LL-DASH and LL-HLS are far more alike than the protocol names suggest. Both ride ~200 ms CMAF chunks; both use HTTP chunked transfer; both keep ordinary segments a few seconds long so caching and compression stay healthy; both land in the two-to-five-second range. They differ in the manifest and in the small print of how the player chases the live edge — LL-HLS leans on blocking playlist reload and preload hints, LL-DASH on the MPD's availability offset and a target-latency the player servo-controls toward. The honest engineering reality of 2026 is that you usually ship both, because device support is split: Apple's ecosystem (Safari, iOS, tvOS) is the LL-HLS world, while many smart-TV, Android, and web players run DASH — the same client-coverage split covered in the OTT client matrix. The protocol-level comparison lives in the Video Streaming low-latency article; the platform decision — encode and chunk once, package for both — is the one this section owns.
Figure 4. Two manifests, one media. LL-HLS uses parts, preload hints, and blocking playlist reload; LL-DASH uses chunked segments, an availability offset, and a declared target latency. Both ride the same CMAF chunks and both land at ~2–5 seconds.
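For the DASH side, the two MPD signals described above can be read with the standard library's XML parser. This is an illustrative sketch: the MPD below is a hand-written, heavily trimmed sample rather than output from any real packager, the function name is hypothetical, and it assumes the ServiceDescription sits at MPD level with a Latency target expressed in milliseconds and availabilityTimeOffset in seconds on a SegmentTemplate.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical MPD: 6 s segments made of 200 ms chunks, 3 s target latency.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic">
  <ServiceDescription id="0">
    <Latency target="3000" min="2000" max="6000"/>
  </ServiceDescription>
  <Period id="1" start="PT0S">
    <AdaptationSet mimeType="video/mp4">
      <SegmentTemplate timescale="1000" duration="6000"
                       availabilityTimeOffset="5.8"
                       availabilityTimeComplete="false"
                       media="video-$Number$.m4s"
                       initialization="video-init.mp4"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def low_latency_signals(mpd_xml: str) -> dict:
    """Extract the target latency and availability offset, if present."""
    root = ET.fromstring(mpd_xml)
    latency = root.find("mpd:ServiceDescription/mpd:Latency", MPD_NS)
    template = root.find(
        ".//mpd:SegmentTemplate[@availabilityTimeOffset]", MPD_NS)
    target_ms = latency.get("target") if latency is not None else None
    return {
        "target_latency_s": int(target_ms) / 1000 if target_ms else None,
        "availability_time_offset_s": (
            float(template.get("availabilityTimeOffset"))
            if template is not None else None),
    }


print(low_latency_signals(SAMPLE_MPD))
```

In this sample the 5.8 s offset on a 6 s segment means the segment URL becomes requestable once the first 200 ms chunk exists, and the player is asked to hold roughly 3 s behind live.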
A worked latency budget: where every second goes
Numbers make this concrete. Below is an illustrative glass-to-glass budget for the same live feed delivered three ways: plain HLS/DASH, low-latency (LL-HLS/LL-DASH), and WebRTC, the real-time protocol used for sub-second interaction. The capture, encode, package, contribution, CDN, and decode lines barely move between the three; the player buffer is the line that decides the product.
| Latency line item | Plain HLS/DASH | Low-latency (LL-HLS/LL-DASH) | WebRTC |
|---|---|---|---|
| Capture + camera buffer | ~0.1 s | ~0.1 s | ~0.1 s |
| Encoder | ~0.2 s | ~0.2 s | ~0.15 s |
| Packager (segment vs chunk) | ~6 s (waits for whole segment) | ~0.2 s (emits each chunk) | n/a |
| Contribution + CDN | ~0.3 s | ~0.3 s | ~0.2 s |
| Player buffer (the swing term) | ~12–22 s (≈ 3 segments) | ~1–3 s (a few chunks) | ~0.1 s |
| Decode + render | ~0.1 s | ~0.1 s | ~0.1 s |
| Glass-to-glass total | ~20–30 s | ~2–5 s | ~0.5–0.7 s |
Table 1. An illustrative latency budget for one feed delivered three ways. Everything except the player buffer (and, for plain HTTP, the whole-segment packager wait) is roughly constant; the buffer is where the seconds are won or lost. WebRTC wins on latency but does not ride ordinary CDNs the way HLS/DASH do, which is why it is the choice for interaction, not for mass one-to-many scale — see the cross-link below. Figures are illustrative; measure your own pipeline.
Read the table as a set of levers. The packager line collapses from six seconds to a fifth of a second the moment you switch from whole-segment delivery to chunked CMAF — that is the CMAF-chunk win. The player-buffer line falls from three segments to a few chunks the moment the player is allowed to sit at the live edge (PART-HOLD-BACK in LL-HLS, the target latency in LL-DASH) — that is the low-latency-mode win. The other lines are physics and processing you cannot wish away; note that the contribution and CDN lines carry the distance tax from geography, regions, and global delivery, so a viewer on another continent starts the low-latency game with a worse hand.
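Because the budget is a sum of line items, it is worth keeping it in a form that can be recomputed whenever one line changes. The sketch below encodes the itemized values from Table 1 as (low, high) pairs in seconds and adds them up; the dictionary layout and function names are hypothetical, and the WebRTC packager entry is set to zero to stand in for "n/a". Summing the lines as written gives roughly 19 to 29 s, 2 to 4 s, and about 0.65 s.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (low, high) in seconds

# Illustrative values copied from Table 1; measure your own pipeline.
BUDGET: Dict[str, Dict[str, Range]] = {
    "plain": {
        "capture": (0.1, 0.1), "encoder": (0.2, 0.2), "packager": (6.0, 6.0),
        "contribution_cdn": (0.3, 0.3), "player_buffer": (12.0, 22.0),
        "decode_render": (0.1, 0.1),
    },
    "low_latency": {
        "capture": (0.1, 0.1), "encoder": (0.2, 0.2), "packager": (0.2, 0.2),
        "contribution_cdn": (0.3, 0.3), "player_buffer": (1.0, 3.0),
        "decode_render": (0.1, 0.1),
    },
    "webrtc": {
        "capture": (0.1, 0.1), "encoder": (0.15, 0.15), "packager": (0.0, 0.0),
        "contribution_cdn": (0.2, 0.2), "player_buffer": (0.1, 0.1),
        "decode_render": (0.1, 0.1),
    },
}


def total(items: Dict[str, Range]) -> Range:
    """Sum the low and high ends of every line item, rounded to 10 ms."""
    low = round(sum(lo for lo, _ in items.values()), 2)
    high = round(sum(hi for _, hi in items.values()), 2)
    return low, high


def buffer_share(items: Dict[str, Range]) -> float:
    """Fraction of the high-end total spent in the player buffer."""
    return items["player_buffer"][1] / total(items)[1]


for mode, items in BUDGET.items():
    low, high = total(items)
    print(f"{mode}: {low}-{high} s, buffer share {buffer_share(items):.0%}")
```

Editing a single entry, for example the `packager` line when moving from whole segments to chunks, shows immediately how much of the total that lever is worth.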
What low latency costs: the trade-off the specs admit
Now the part the vendor demos skip. Lower latency is bought, and the currency is stability. The HLS specification states the trade-off in plain terms: "A shorter Target Duration reduces latency but also reduces available buffer, handicaps adaption and increases delivery overhead, increasing the likelihood of playback stall. A longer Hold Back can mitigate the playback stall likelihood, but increases the latency." The DASH-IF report says the same thing from the other side: "the further behind live the player chooses to play, the more stable the delivery system is, which leads to antagonistic demands on any production system of low latency and stability." Low latency and reliability pull in opposite directions; you are choosing a point on a line, not flipping a switch.
The cost shows up in three places, and a platform owner should price all three. First, the safety cushion is thinner: a player sitting one second behind live has one second to ride out a bandwidth dip before it stalls, where a player sitting twenty seconds back has twenty. Push the latency too low for the audience's real network and you trade a slow start for a spinner mid-stream — usually the worse experience. Second, the request rate rises and the cache-hit ratio falls: instead of one request per six-second segment, the player now makes many small requests per second for parts or chunks, and tiny, frequently-changing objects are harder for a CDN to cache than big stable ones. Lower cache efficiency means more traffic reaches the origin, which ties straight to the edge-caching and cache-key design work and the origin and origin shielding you need so the origin survives. Third, the CDN and origin bill goes up: more requests and lower offload mean more origin egress and more edge transactions — the line items in CDN cost engineering — and a live premiere makes all of it spike at once, the subject of live event delivery and the premiere spike.
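Two of those costs can be estimated on the back of an envelope. The sketch below is a simplified model, not a capacity plan: it counts one media request per segment or per part for a single rendition, ignores playlist reloads and audio, and treats the buffer as draining at a constant rate during a bandwidth dip. The function names and the 100,000-viewer audience are hypothetical.

```python
import math


def media_requests_per_second(object_duration_s: float, viewers: int = 1) -> float:
    """Media requests per second if each viewer fetches one object per duration."""
    if object_duration_s <= 0 or viewers < 0:
        raise ValueError("duration must be positive, viewers non-negative")
    return viewers / object_duration_s


def seconds_until_stall(buffer_s: float, throughput_ratio: float) -> float:
    """How long a buffer lasts when throughput falls to a fraction of the bitrate.

    The buffer drains (1 - throughput_ratio) seconds of media per wall-clock
    second; at a ratio of 1.0 or more it never empties.
    """
    if buffer_s < 0 or throughput_ratio < 0:
        raise ValueError("buffer and ratio must be non-negative")
    if throughput_ratio >= 1.0:
        return math.inf
    return buffer_s / (1.0 - throughput_ratio)


viewers = 100_000  # hypothetical audience size
segments = media_requests_per_second(6.0, viewers)
parts = media_requests_per_second(0.2, viewers)
print(f"6 s segments: {segments:,.0f} req/s; 200 ms parts: {parts:,.0f} req/s")

for buffer_s in (1.0, 20.0):
    print(f"{buffer_s:g} s buffer at half throughput lasts "
          f"{seconds_until_stall(buffer_s, 0.5):g} s")
```

Under these assumptions, moving from 6-second segments to 200 ms parts multiplies media requests by thirty, and a one-second cushion survives a half-rate dip for two seconds where a twenty-second cushion survives forty.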
So the question is not "how low can we go" but "how low do we need to go," and the honest answer depends on the product, not the technology. A movie or a back-catalogue SVOD title has no live reference point at all — latency is irrelevant, so spend the budget on quality and stability instead. A news or talk channel is comfortable at five to ten seconds. Live sport, auctions, live shopping, and betting genuinely need the two-to-five-second low-latency range, because a delayed reaction breaks the product. And truly interactive, sub-second use cases — a video call, an auctioneer taking live bids, a watch-party where people talk back — are usually not an LL-HLS/LL-DASH job at all but a WebRTC one, a different architecture that trades CDN-scale for real-time interaction; we cover where that line falls in the Video Streaming low-latency article. Choosing the right tier is a scaling decision as much as a latency one, which is why it sits next to scaling and concurrency.
Figure 5. How low do you need to go? Latency need rises from on-demand video (no target) through news (5–10 s) to live sport, auctions, and betting (2–5 s via LL-HLS/LL-DASH); true interaction (sub-second) is a WebRTC job. Lower latency costs stability and CDN budget — match the tier to the product.
Common mistakes that make low latency backfire
Most low-latency failures are not bugs in the standards; they are a target chosen without a budget.
The headline error is chasing a latency number the audience's network cannot hold — setting a one-second live edge for a mobile audience on variable connections, so the thin buffer empties on the first dip and the viewer trades a tolerable slow start for an intolerable mid-stream freeze. Its cousin is shrinking segments instead of chunking them — dropping to one- or two-second segments to cut latency the crude way, which multiplies requests, wrecks compression, and lowers cache-hit ratio without the clean win that CMAF chunks give. A third is forgetting the origin and cache consequences: turning on low latency without revisiting cache keys, origin shielding, and surge capacity, so the extra request load quietly melts the origin during the first popular live event. A fourth is measuring latency at the office — confirming two seconds on the studio LAN and shipping it, when a real viewer two continents away, carrying the distance tax from geography and global delivery, never sees that number. A fifth is paying for low latency where the product does not need it — engineering a sub-five-second pipeline for a movie catalogue, spending stability and CDN budget to win seconds no viewer will ever notice. And the quiet sixth is shipping only one protocol: building LL-HLS and skipping LL-DASH (or the reverse) and silently leaving half the device matrix on a slow fallback or no stream at all — the coverage gap that the OTT client matrix exists to close.
Where Fora Soft fits in
Low latency is a scale-and-cost decision before it is a protocol one: the real question is not "can we make it real-time" but "how many seconds does this product actually need, what does each second below the comfort line cost in stalls and CDN bills, and which devices must we reach to deliver it." Fora Soft has built video streaming, OTT/Internet TV, live event, WebRTC, and interactive video software since 2005, across 625+ shipped projects for 400+ clients, and that work runs straight through this layer: setting a latency target from the product rather than the hype, chunking once and packaging for both LL-HLS and LL-DASH so the whole device matrix is covered, tuning the player's live-edge hold-back against the audience's real networks, and — because low latency raises request load — pairing it with the cache, origin-shield, and surge-capacity work that keeps the platform stable when a premiere spikes. When a media company needs live that is genuinely a few seconds behind reality on every screen and still affordable at scale, that budget-first engineering is the capability we bring.
What to read next
- Packaging: CMAF, HLS, and DASH from one mezzanine
- Live event delivery and the premiere spike
- Quality of experience: startup time and rebuffering
Call to action
- Talk to a streaming engineer: book a 30-minute scoping call to talk through your low-latency HLS plan.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the Low-Latency Readiness & Latency-Budget Worksheet — A one-page worksheet to size a low-latency live delivery target before turning it on: pick a latency tier from the product (VOD none / news 5–10 s / sport-betting 2–5 s / interactive sub-second WebRTC), walk the glass-to-glass budget….
References
- HTTP Live Streaming 2nd Edition (draft-pantos-hls-rfc8216bis). IETF (R. Pantos, Ed.), 2026 draft revising RFC 8216. Defines Partial Segments ("such as CMAF Chunks"), EXT-X-PART / PART-TARGET, EXT-X-SERVER-CONTROL with PART-HOLD-BACK (≥ 2×, SHOULD ≥ 3× Part Target Duration) and CAN-BLOCK-RELOAD, Preload Hints, Blocking Playlist Reload, and the explicit latency-vs-stability statement quoted in the body. Tier 1 (format specification). https://datatracker.ietf.org/doc/draft-pantos-hls-rfc8216bis/ (accessed 2026-06-16)
- RFC 8216: HTTP Live Streaming (HLS). IETF (R. Pantos, Ed.), August 2017. The original HLS manifest-plus-segments model and the live-playlist reload behaviour; the HOLD-BACK ≥ 3 × Target Duration live-edge rule that makes plain HLS ~3 segments behind live. Tier 1 (format specification). https://www.rfc-editor.org/rfc/rfc8216 (accessed 2026-06-16)
- ISO/IEC 23000-19: Common Media Application Format (CMAF). ISO/IEC. Defines the CMAF chunk: one or more frames in a moof + mdat pair, independently decodable, deliverable before its parent segment is complete; the shared building block of both LL-HLS parts and LL-DASH chunked segments. Tier 1 (format specification). https://www.iso.org/standard/85623.html (accessed 2026-06-16; confirm current edition before publish)
- ISO/IEC 23009-1: Dynamic Adaptive Streaming over HTTP (MPEG-DASH). ISO/IEC, 2022 edition. The MPD model and the low-latency mode that signals a segment "can be accessed earlier than full availability" (chunked delivery), plus availabilityTimeOffset and the ServiceDescription target-latency signal. Tier 1 (format specification). https://www.iso.org/standard/83314.html (accessed 2026-06-16)
- DASH-IF / DVB Report on Low-Latency Live Service with DASH. DASH Industry Forum, 2017. The latency-budget decomposition (encoder, upload, edge retrieval, player retrieval, hold-back, buffer), the benchmark "player buffer = 3× target segment duration," the "antagonistic demands of low latency and stability" framing, and the legacy-HLS 30-second analysis. Tier 2 (issuing-body guidance). https://dashif.org/docs/Report%20on%20Low%20Latency%20DASH.pdf (accessed 2026-06-16)
- DASH-IF Low-Latency Modes for DASH (Implementation Guidelines / Community Review). DASH Industry Forum. The interoperability guidance pairing CMAF chunked encoding with HTTP chunked transfer "from ingest into the packager up to the CDN edge," and the encoder chunk-duration configuration. Tier 2 (issuing-body interop guidance). https://dashif.org/docs/CR-Low-Latency-Live-r8.pdf (accessed 2026-06-16; confirm latest revision before publish)
- Discover and Reduce Latency with HLS: Blocking Preload Hints & Blocking Playlist Reload (WWDC20). Apple, 2020. First-party walkthrough of how preload hints and blocking playlist reload remove round trips at the live edge so a viewer starts 2–5 seconds behind live. Tier 3 (first-party engineering). https://developer.apple.com/videos/play/wwdc2020/10229/ (accessed 2026-06-16)
- Low-Latency Live Streaming Developer's Guide: LL-HLS, WebRTC, and CMAF. Mux, 2026. Orientation source for the typical latency tiers (plain ~20–30 s, low-latency ~2–5 s, WebRTC ~0.2–0.5 s) and the player-buffer-as-swing-term framing; used for orientation, not as a spec citation. Tier 6 (educational). https://www.mux.com/articles/low-latency-live-streaming-developers-guide-ll-hls-webrtc-cmaf (accessed 2026-06-16)
- W3C Media Source Extensions™ (MSE). W3C Recommendation. The browser API by which a web player feeds buffered media segments/chunks to the video element; the mechanism under the player-buffer line in the budget. Tier 1 (W3C Recommendation). https://www.w3.org/TR/media-source-2/ (accessed 2026-06-16)
Source note (per §4.3.2): the protocol mechanics trace to tier-1 standards — LL-HLS parts, preload hints, blocking reload, and the latency-vs-stability statement to HLS 2nd Edition (ref 1) and the base model to RFC 8216 (ref 2); the chunk to CMAF ISO/IEC 23000-19 (ref 3); the DASH low-latency mode and MPD signals to ISO/IEC 23009-1 (ref 4). The budget decomposition and the low-latency/stability trade-off are cited to the DASH-IF report (ref 5) and the DASH-IF interop guidance (ref 6). Apple WWDC (ref 7) is first-party engineering. The latency-tier numbers (ref 8) are vendor/educational orientation, labelled as such; where popular "just shrink the segments" advice conflicts with the spec, the article follows the spec (chunk, do not shrink) and says why.


