Why this matters
If you are a founder, product manager, or first-time streaming CTO, the CDN is where your platform's scale and your platform's margin are decided at the same time, and it is easy to treat it as a black box you simply switch on. That is the expensive mistake. A CDN that offloads 99% of your bytes and one that offloads 90% look identical to a viewer, but the second one pulls ten times as much traffic from your origin (10% of delivered bytes instead of 1%) and can quietly double the bill that grows every month your audience does. This article gives you the mental model to tell the difference: what origin and edge actually are, how a request gets routed to a cache near the viewer, why video segments cache far better than a typical web page, and how to read a delivery setup and spot the leaks. By the end you will be able to ask the three questions that decide delivery cost: what is our offload ratio, how long do we cache each kind of file, and is anything in our setup forcing the cache to keep separate copies it should be sharing.
Two problems a CDN solves at once: distance and repetition
Start with the problem, because the CDN is the answer to two problems wearing one coat.
The first problem is distance. Your video files live somewhere — a server or storage bucket in, say, Northern Virginia. A viewer in Tokyo who asks that server directly has to send every request and receive every byte across the Pacific. The round trip for a single request over that distance is roughly 100 to 150 milliseconds, and streaming a video is not one request — it is a steady stream of requests for one small chunk of video after another, each one waiting on the last. Distance turns into delay, delay turns into the spinning loader, and the viewer leaves. The fix is to keep a copy of the video on a server in Tokyo, so the Tokyo viewer's request travels a few miles instead of a few thousand, and the round trip drops from ~120 milliseconds to single digits.
The second problem is repetition at scale. A popular title is not watched once; it is watched by a hundred thousand people, often around the same time. If every one of those viewers pulls every second of video from your one origin server, that server has to send the same bytes a hundred thousand times, and you pay for every copy. The fix is the same copy in Tokyo: once it is there, every Tokyo viewer is served from it, and your origin sends those bytes once to fill the local copy, not once per viewer.
A CDN is a global network of servers — the company's own machines sitting in data centers and internet exchange points around the world — that exists to solve both problems together. It holds copies of your content close to viewers (fixing distance) and serves the many requests for popular content from those local copies (fixing repetition), so your origin is touched as rarely as possible. The phrase to hold onto is close and shared: content kept close to the viewer, and one local copy shared by many viewers.
Figure 1. The request journey. DNS and Anycast steer each viewer to the nearest edge. A cache hit is served on the spot; only a miss travels inward — through a shield — to the origin, and only once.
Origin and edge: the two ends of the delivery path
Two words do most of the work in this article, so define them precisely.
The origin is the one authoritative source of your video — the server or cloud storage that holds the real, master copy of every file. There is conceptually one origin (it may be replicated for safety, but it is the single source of truth). It is where a file exists before any viewer has ever requested it, and it is the only place a CDN can get a file it does not already have.
The edge is the set of CDN servers spread around the world that hold temporary copies and do the actual talking to viewers. Each cluster of edge servers in one location is called a Point of Presence (PoP) — literally, the CDN's presence at that point on the map. A large CDN has hundreds of PoPs across dozens of countries. When we say a viewer is "served from the edge," we mean the nearest PoP handed them the bytes without bothering the origin.
Between those two ends sits the whole game. Every byte a viewer receives took one of two paths: it came from the edge (cheap, fast, close) or it came all the way from the origin (expensive, slow, far). The job of CDN engineering is to make the first path the overwhelming default and the second path a rare exception. Everything below is about how that happens and how you measure whether it is happening.
How a request finds the nearest edge
A viewer's player does not know where the PoPs are. So how does its request land on the closest one? Two mechanisms work together, and you do not need to operate them, only to recognize them.
The first is DNS-based routing. When a player wants a video file, it first looks up the address for your streaming hostname (something like video.yourservice.com). The CDN controls the answer to that lookup, and it answers with the address of an edge near the viewer rather than the address of your origin. A common form is GeoDNS, where the CDN reads the rough geographic location implied by the request and returns a PoP in that region. This is the CDN inserting itself into the very first step, so the player connects to the edge from the start.
The second is Anycast routing. Many CDNs give a whole group of edge servers the same network address and let the internet's own routing deliver the request to whichever one is closest in network terms, which typically means the shortest routing path rather than the shortest distance on a map. The viewer's request is addressed to one IP, and the network hands it to the nearest location answering on that IP. Where a CDN uses both mechanisms, DNS picks the right region and Anycast picks the nearest edge inside it; some CDNs lean mainly on one of the two. Either way, the viewer's player reliably ends up talking to a nearby edge without anyone configuring the viewer's device.
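To make the GeoDNS idea concrete, here is a minimal sketch that picks the PoP closest to a viewer by great-circle distance. It is an illustration only: the PoP names and coordinates are hypothetical, and a real CDN resolver would also weigh IP geolocation accuracy, PoP health, and load, none of which is modeled here.

```python
import math

# Hypothetical PoP list: names and coordinates are illustrative,
# not any real CDN's footprint.
POPS = {
    "tokyo": (35.68, 139.69),
    "frankfurt": (50.11, 8.68),
    "ashburn": (39.04, -77.49),
}


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_pop(viewer_location, pops=POPS):
    """Return the name of the PoP geographically closest to the viewer."""
    return min(pops, key=lambda name: great_circle_km(viewer_location, pops[name]))
```

With these assumed locations, a viewer in Osaka resolves to the Tokyo PoP and a viewer in Paris to Frankfurt. Anycast makes the equivalent choice inside the network's routing layer rather than in application code, so there is nothing comparable to write for it.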
The practical takeaway for a platform owner: you point your streaming hostname at the CDN, and the CDN handles getting each viewer to a good edge. Your job is not the routing — it is making sure that when the request arrives, the edge already has the file. That is caching, and it is where the money is.
Why video caches so well: small, identical, unchanging segments
Here is the fact that makes streaming economics work, and it traces straight back to how modern video is packaged.
Adaptive streaming formats — HTTP Live Streaming (HLS, defined in IETF RFC 8216) and MPEG-DASH (ISO/IEC 23009-1) — do not deliver a video as one giant file. They chop each quality version of a title into a long row of small segments, each holding a few seconds of video, and they publish a small text index called a manifest that lists those segments in order. The player reads the manifest, then fetches segment after segment over ordinary HTTP, the same protocol that delivers web pages. We cover how those files are built in packaging: CMAF, HLS, and DASH from one mezzanine, and the protocol mechanics themselves — how HLS and DASH manifests and segments are structured — in our Video Streaming section's HLS vs DASH explainer. Here, what matters is what segments do to a cache.
Three properties make a segment close to the perfect thing to cache. First, a segment is small — a few seconds, not a few hours — so an edge can hold many of them and serve each one fast. Second, a segment is immutable: once written, the bytes of segment number 47 of a given rendition never change. A cache loves content that never changes, because it can keep it and reuse it without ever asking "is this still current?" Third, and most important, the segment has the same URL for every viewer. When ten thousand people watch the same film at the same quality, all ten thousand players request the exact same seg_1080p_00047.m4s from the exact same address. The edge fetches it from origin once, stores it, and serves the other 9,999 requests from that one stored copy.
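The "fetch once, serve many" behavior can be shown with a toy edge cache keyed by URL. This is a sketch under simplifying assumptions: the `EdgeCache` class and `fake_origin` function are invented for illustration, and eviction, TTLs, and concurrent requests are deliberately ignored.

```python
class EdgeCache:
    """Toy edge cache keyed by URL; counts hits and misses."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            self.hits += 1
        else:
            # First request for this URL: go to origin once and keep the copy.
            self.misses += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


origin_fetches = []


def fake_origin(url):
    """Stand-in for the origin; records every fetch it has to serve."""
    origin_fetches.append(url)
    return b"\x00" * 1024  # placeholder segment bytes


edge = EdgeCache(fake_origin)
for _ in range(10_000):
    edge.get("/title/seg_1080p_00047.m4s")

print(edge.misses, edge.hits, len(origin_fetches))
```

Because all ten thousand requests share one URL, the origin is asked exactly once and the other 9,999 requests are hits.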
This is the deep reason a CDN can offload almost all of a popular title's traffic. A web page personalized for each user is hard to cache because every copy is different. A video segment is the opposite — it is the same bytes for everyone, so a single cached copy does enormous work. It also explains why "package once" matters upstream: if your packaging emits one shared set of segments (the modern CMAF approach) rather than separate files per device family, every viewer of a rendition converges on the same URL and the cache holds one copy instead of two. Packaging decisions made weeks earlier show up here as your cache-hit ratio.
Figure 2. The first request for a segment misses and fetches once from origin; the edge stores the copy, and every later request for that identical URL is a cache hit. One copy serves the crowd.
Cache hit, cache miss, and the number that decides your margin
Now the vocabulary that runs every delivery conversation.
When a viewer's request arrives at an edge and the file is already there, that is a cache hit — served immediately, locally, free of any origin traffic. When the file is not there, that is a cache miss, and the edge has to fetch it from the origin (or from an inner cache layer) before it can answer. Hits are what you want; misses cost you latency and origin traffic.
The headline metric is the cache-hit ratio (CHR) — the share of requests served as hits. If 95 out of 100 requests are hits, your CHR is 95%. It sounds simple, and it hides a trap that is worth understanding because it is the difference between thinking your delivery is healthy and knowing it is.
The trap is that cache-hit ratio counts requests, and not all requests are the same size. A request for a tiny manifest and a request for a large high-bitrate segment both count as one request, but they move wildly different numbers of bytes, and bytes are what you pay for. For video, the metric that actually tracks cost is the offload ratio (also called origin offload), which Fastly defines precisely as "the ratio of bytes served to end users that were cached inside the CDN (not fetched from the origin), over total bytes served to end users." An offload of 100% means every byte came from the CDN and your origin sent nothing. Cache-hit ratio counts how often you hit; offload ratio counts how much of your data volume you kept off the origin. For a streaming platform, the second is the one tied to the bill.
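The gap between the two metrics is easy to see on a small, made-up request log. The sketch below assumes each log entry records a byte count and whether it was a hit; the field names and the sample numbers are illustrative, not a real CDN log format.

```python
def cache_hit_ratio(requests):
    """Share of requests served from cache (counts requests, ignores size)."""
    hits = sum(1 for r in requests if r["hit"])
    return hits / len(requests)


def offload_ratio(requests):
    """Share of delivered bytes served from cache (the cost-relevant number)."""
    total_bytes = sum(r["bytes"] for r in requests)
    cached_bytes = sum(r["bytes"] for r in requests if r["hit"])
    return cached_bytes / total_bytes


# Illustrative log: many tiny manifest hits, a few large segments half missed.
log = (
    [{"bytes": 2_000, "hit": True} for _ in range(90)]        # manifests
    + [{"bytes": 4_000_000, "hit": True} for _ in range(5)]   # segment hits
    + [{"bytes": 4_000_000, "hit": False} for _ in range(5)]  # segment misses
)

chr_value = cache_hit_ratio(log)
offload_value = offload_ratio(log)
print(f"CHR {chr_value:.1%}, offload {offload_value:.1%}")
```

On this log the request-based CHR is 95% while the byte-based offload is only about 50%, because the misses all landed on the large files.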
And here is the insight that surprises almost everyone, because the arithmetic is not linear. Moving your cache-hit ratio from 90% to 95% does not shave off "5%." It cuts your miss rate from 10% to 5%, which halves the traffic hitting your origin. Going from 95% to 99% cuts the miss rate from 5% to 1%, cutting origin load by another factor of five. Measured against what the origin still has to carry, each of the last few points is worth far more than any one of the first ninety, which is exactly why streaming engineers obsess over them and why a one-percentage-point improvement in cache-hit ratio can save a service at Netflix's scale petabytes of origin traffic a year.
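The non-linearity falls out of a one-line ratio of miss rates. The helper names below are made up for this sketch, and the function treats hit ratio and offload as interchangeable, which only holds when requests are of similar size.

```python
def origin_share(hit_ratio):
    """Fraction of traffic that still reaches the origin."""
    return 1.0 - hit_ratio


def origin_reduction_factor(before, after):
    """How many times smaller origin traffic becomes when the ratio improves."""
    return origin_share(before) / origin_share(after)


for before, after in [(0.90, 0.95), (0.95, 0.99)]:
    factor = origin_reduction_factor(before, after)
    print(f"{before:.0%} -> {after:.0%}: origin traffic divided by {factor:.1f}")
```

The two steps come out at roughly 2 and 5, matching the halving and the factor-of-five described above.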
The arithmetic, out loud
Numbers make this concrete, so walk the math the way a viewer's bytes actually flow.
Take a mid-size service: 100,000 active viewers, each watching about 10 hours a month, at an average delivered bitrate of 5 megabits per second (a reasonable blend across phones, web, and TVs on an adaptive ladder). First, the bytes delivered to one viewer in a month:
per viewer = 5 Mbit/s × 10 h × 3,600 s/h ÷ 8 bits/byte
= 5 × 36,000 ÷ 8
= 22,500 MB
= 22.5 GB per viewer per month
Across the whole audience:
total delivered = 22.5 GB × 100,000 viewers
= 2,250,000 GB
≈ 2.25 PB (petabytes) per month
That 2.25 PB is what the edge sends to viewers, and the cost of those bytes is your CDN bill — the recurring egress charge we break down fully in CDN cost engineering: egress, commits, and the 95th percentile. But notice what the offload ratio controls: the slice of that 2.25 PB your origin has to send to keep the edges fed.
origin egress = total delivered × (1 − offload ratio)
at 90% offload: 2.25 PB × 0.10 = 225,000 GB from origin
at 95% offload: 2.25 PB × 0.05 = 112,500 GB from origin
Five points of offload — 90% to 95% — removed 112,500 GB of monthly origin traffic, exactly halving it. If your origin is a cloud bucket or server that charges on the order of $0.08 per GB to send bytes out to the CDN, that slice alone is:
saving = 112,500 GB × $0.08/GB ≈ $9,000 per month ≈ $108,000 per year
…and that is before counting the origin compute and bandwidth capacity you no longer have to provision for the doubled load. The point is not the exact dollar figure, which depends on your origin and your CDN deal; the point is that a number most teams never look at — the last few points of offload — is worth six figures a year on a mid-size service, and more as you grow. Offload is the lever; everything else in this article is about how to pull it.
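The worked example can be replayed in a few lines so you can substitute your own audience, bitrate, and origin price. The inputs are the ones used above (5 Mbit/s, 10 hours, 100,000 viewers, $0.08 per GB); the function names are invented for this sketch, and decimal units (1 GB = 1,000 MB) are assumed throughout.

```python
def delivered_gb_per_viewer(bitrate_mbps, hours_per_month):
    """Monthly bytes delivered to one viewer, in decimal gigabytes."""
    megabits = bitrate_mbps * hours_per_month * 3600
    megabytes = megabits / 8
    return megabytes / 1000


def origin_egress_gb(total_delivered_gb, offload_ratio):
    """Bytes the origin must send to keep the edges fed."""
    return total_delivered_gb * (1 - offload_ratio)


per_viewer_gb = delivered_gb_per_viewer(bitrate_mbps=5, hours_per_month=10)
total_gb = per_viewer_gb * 100_000

at_90 = origin_egress_gb(total_gb, 0.90)
at_95 = origin_egress_gb(total_gb, 0.95)

# Assumed origin egress price; replace with your own contract rate.
PRICE_PER_GB_USD = 0.08
monthly_saving = (at_90 - at_95) * PRICE_PER_GB_USD

print(f"per viewer: {per_viewer_gb:.1f} GB, total: {total_gb:,.0f} GB")
print(f"origin egress at 90%: {at_90:,.0f} GB, at 95%: {at_95:,.0f} GB")
print(f"saving: ${monthly_saving:,.0f}/month, ${monthly_saving * 12:,.0f}/year")
```

Changing only the offload arguments shows how quickly the origin slice shrinks as the ratio approaches 100%.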
Figure 4. Origin egress for the worked 2.25 PB example as the offload ratio climbs. Each step up the curve removes a larger share of origin traffic — the 90%→95% step alone halves it.
Time-to-live: how long the edge keeps a copy
A cache cannot keep every copy forever, and for some files it must not. The rule that governs how long an edge holds a file before checking back is its time-to-live (TTL), and it is set by the same HTTP caching machinery that runs the whole web — standardized in IETF RFC 9111.
The mechanism is a small instruction your origin attaches to each file, the Cache-Control header. In RFC 9111's own terms, a CDN edge is a shared cache — "a cache that stores responses for reuse by more than one user" — and the directive that tells a shared cache how long a file stays fresh is s-maxage (RFC 9111 §5.2.2.10), with the more familiar max-age as the fallback. When the freshness time is up, the cached copy is "stale" and the edge revalidates it with the origin before serving it again. So your origin is not at the CDN's mercy; it dictates caching behavior file by file through these headers.
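As a rough illustration of the precedence just described, the sketch below reads a Cache-Control value and returns the freshness lifetime a shared cache would apply: `s-maxage` if present, otherwise `max-age`. It is deliberately simplified. The function name is an assumption of this example, and directives that can forbid shared caching (such as `no-store` or `private`), the `Expires` header, and the `Age` header are all ignored.

```python
def shared_cache_ttl(cache_control):
    """Freshness lifetime in seconds for a shared cache, or None if unspecified.

    Simplified: only s-maxage and max-age are considered.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')

    # s-maxage takes precedence over max-age for shared caches.
    for name in ("s-maxage", "max-age"):
        value = directives.get(name, "")
        if value.isdigit():
            return int(value)
    return None


print(shared_cache_ttl("public, max-age=60, s-maxage=86400"))
print(shared_cache_ttl("max-age=3"))
```

The first call yields 86400 because the shared-cache directive wins; the second falls back to `max-age` and yields 3.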
Video splits cleanly into two kinds of file with opposite needs, and getting the split right is one of the highest-impact things you can do.
Segments should be cached for a long time. A media segment is immutable — segment 47 is always segment 47 — so you can safely tell the edge to hold it for days or longer. Long segment TTLs are what drive your offload ratio up, because a segment fetched once keeps serving viewers for as long as the title is popular.
Manifests depend on whether the stream is live or on-demand. For video-on-demand, the manifest is fixed once published, so it too can be cached for a long time. For live, the manifest is rewritten every few seconds as new segments are produced, so caching it too long is actively harmful: an edge serving a stale live manifest hands players a list that is missing the newest segments, and playback stalls. The rule of thumb is to cache a live manifest for no more than about half its target segment duration — a six-second-segment live stream wants its manifest cached for roughly two to three seconds, fresh enough to always list the latest segment.
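The segment and manifest split can be written down as a small policy function that an origin might use when it sets response headers. The specific TTL values here are assumptions chosen for illustration (seven days for segments, one day for VOD manifests); only the "about half the target duration" rule for live manifests comes from the text above.

```python
# Assumed values for illustration; tune to your own catalog and CDN.
SEGMENT_TTL_S = 7 * 24 * 3600
VOD_MANIFEST_TTL_S = 24 * 3600


def edge_ttl_seconds(kind, live=False, target_duration_s=6):
    """Return the edge TTL in seconds for a file of the given kind."""
    if kind == "segment":
        return SEGMENT_TTL_S  # immutable, so cache for a long time
    if kind == "manifest":
        if live:
            # Rule of thumb: no more than about half the target duration.
            return max(1, target_duration_s // 2)
        return VOD_MANIFEST_TTL_S
    raise ValueError(f"unknown file kind: {kind!r}")


def cache_control_header(kind, live=False, target_duration_s=6):
    """Build a Cache-Control value carrying the shared-cache TTL."""
    return f"public, s-maxage={edge_ttl_seconds(kind, live, target_duration_s)}"


print(cache_control_header("segment"))
print(cache_control_header("manifest", live=True, target_duration_s=6))
```

For a six-second live stream this yields `s-maxage=3` on the manifest and a week on each segment, which are the opposite policies the section describes.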
This is why "cache everything aggressively" is wrong for live and "cache nothing to be safe" is wrong for everything: the segment and the live manifest need opposite policies. The mechanism that decides which cached copy answers a request — the cache key, and the signed or tokenized URLs that gate access — is its own topic, covered in edge caching, cache keys, and tokenized URLs.
Figure 3. Segments are immutable and cache for a long time; a VOD manifest is fixed and caches long; a live manifest changes every few seconds and must cache only briefly. Same platform, opposite policies.
When the edge misses: shields, tiers, and the premiere stampede
A cache miss has to be filled from somewhere, and the naive answer — every edge in the world goes straight to your origin on a miss — is exactly how a live premiere takes your platform down.
Picture a new episode dropping at 8 p.m. to a global audience. In the first second, edges in dozens of regions all discover they do not yet have segment 1, and all of them turn around and ask the origin for it at once. That synchronized flood is the thundering herd, and an unprotected origin can buckle under it precisely when the most people are watching.
The defense is a layer between the edges and the origin called an origin shield (or mid-tier cache, or tiered caching). Instead of every edge talking to the origin, all edges in a region funnel their misses through one designated shield cache, and the shield talks to the origin. The shield adds one trick that changes everything: request collapsing. When a hundred edges ask the shield for segment 1 in the same instant and the shield does not have it yet, the shield sends one request to the origin, waits, and answers all hundred edges from the single response. The origin sees one fetch where it would otherwise have seen a hundred — reported to cut origin requests by 90% or more during a stampede. The shield also raises your offload ratio in normal operation, because two edges that each miss can both be served by the shield without two origin trips, which is part of why a simple edge-only cache-hit number can understate how much work the CDN is really doing.
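Request collapsing is easiest to understand as "one in-flight fetch per URL, shared by everyone waiting." The sketch below models it with `asyncio`; the `Shield` class and the simulated origin are hypothetical, and real shields also handle errors, TTLs, and streaming of partial responses, which are left out here.

```python
import asyncio


class Shield:
    """Toy shield cache that collapses concurrent misses for the same URL."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}
        self._in_flight = {}  # url -> task already fetching that url

    async def get(self, url):
        if url in self._cache:
            return self._cache[url]
        task = self._in_flight.get(url)
        if task is None:
            # First miss for this URL: start the single origin fetch.
            task = asyncio.create_task(self._fill(url))
            self._in_flight[url] = task
        # Every caller, first or later, waits on the same fetch.
        return await task

    async def _fill(self, url):
        try:
            body = await self._fetch(url)
            self._cache[url] = body
            return body
        finally:
            self._in_flight.pop(url, None)


async def demo():
    calls = 0

    async def origin(url):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated origin latency
        return b"segment-1"

    shield = Shield(origin)
    bodies = await asyncio.gather(
        *(shield.get("/ep1/seg_00001.m4s") for _ in range(100))
    )
    return calls, len(bodies)


origin_calls, answered = asyncio.run(demo())
print(origin_calls, answered)
```

A hundred simultaneous edge requests are all answered, yet the simulated origin is called once.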
This is the architecture that lets a platform survive a premiere it cannot re-run, and it has its own depth: see origin and origin shielding for the design, and live event delivery and the premiere spike for the operational playbook. When one CDN is not enough — for resilience or to negotiate egress — you run several behind a selection layer, the multi-CDN architecture of the next article, whose protocol-level switching internals live in our Video Streaming multi-CDN deep dive.
There is a still-closer tier worth knowing about because it shows where this logic ends. The largest streamers push caches all the way inside internet service providers: Netflix's Open Connect program places its own cache appliances in ISP networks, filled with content during off-peak hours, so a title crosses into the ISP once and is then served to every subscriber on that network from inside it. It is the same principle as a regional PoP — closer and shared — taken to its physical limit. Most platforms will never build this; you rent a commercial CDN that has done the equivalent. But it is the clearest illustration of the whole article's idea: the further you push the shared copy toward the viewer, the less the origin ever has to do.
Reading a CDN setup: a feature checklist
When you evaluate a CDN or read your own configuration, these are the capabilities that decide delivery cost and resilience for video. The "Matters for video because…" column is the translation from feature to consequence.
| Capability | What it is | Matters for video because… |
|---|---|---|
| Global PoP footprint | Edge locations in the regions your viewers are | Distance is latency; a viewer far from any PoP buffers regardless of your offload |
| Origin shield / tiered cache | A mid-tier cache that collapses misses | Protects the origin from the premiere stampede and lifts offload |
| Long segment TTLs honored | Edge keeps immutable segments for days | The main driver of a high offload ratio |
| Per-file cache control | Different TTLs for segments vs live manifests | Live needs short manifest TTL; segments need long — one policy can't fit both |
| Byte/offload reporting | Visibility into bytes offloaded, not just request CHR | Request CHR hides the cost picture for large files |
| Tokenized URL support | Signed URLs the cache still keys correctly | Access control without destroying cacheability |
| Predictable egress pricing | A pricing model you can model and commit to | Egress is the dominant recurring cost; surprises break margin |
Table 1. A capability is only useful if it is switched on and configured for video. The most common failure is not a missing feature — it is a present feature left at a web-page default that quietly suppresses the offload ratio.
A common mistake: the cache that is barely caching
The most expensive delivery error is invisible from the viewer's seat: a platform whose offload ratio sits at 70% when it should be 97%, paying for ten times the origin traffic it needs to (30% of delivered bytes instead of 3%), forever, while every video plays fine.
The usual cause is a cache key polluted by per-viewer noise. If your segment URLs carry a query string that differs by viewer — a session token, an analytics ID, a cache-buster appended by a player — and the CDN includes that query string in the cache key, then the edge treats seg_00047.m4s?user=alice and seg_00047.m4s?user=bob as two different files and stores a separate copy for each viewer. The one property that made video cache beautifully — the same URL for everyone — is destroyed, and your offload collapses toward zero on exactly your most-watched content. The fix is to strip viewer-specific query parameters from the cache key so every viewer of a rendition converges on one cached object; the deeper treatment is in edge caching, cache keys, and tokenized URLs.
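The fix can be sketched as a cache-key function that drops viewer-specific query parameters before the lookup. The parameter names in `PER_VIEWER_PARAMS` are hypothetical examples, and on a real CDN this normalization is expressed in the vendor's own configuration rather than in application code. Stripping a token from the key is separate from validating it: access checks would still need to happen before the cached object is served.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical per-viewer parameter names; use the ones your players append.
PER_VIEWER_PARAMS = frozenset({"user", "session", "token", "analytics_id", "cb"})


def cache_key(url, ignored=PER_VIEWER_PARAMS):
    """Build a cache key from host, path, and only the shared query parameters."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    )
    query = urlencode(kept)
    key = f"{parts.netloc}{parts.path}"
    return f"{key}?{query}" if query else key


alice = cache_key("https://video.yourservice.com/seg_00047.m4s?user=alice")
bob = cache_key("https://video.yourservice.com/seg_00047.m4s?user=bob")
print(alice == bob, alice)
```

Both viewers now map to the same key, so the edge stores one object for the segment instead of one per viewer. Sorting the remaining parameters also keeps differently ordered but equivalent URLs from splitting the cache.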
Three siblings travel with it. The first is TTLs left at web defaults — short freshness windows meant for changing HTML applied to immutable segments, so the edge needlessly rechecks the origin and offload sags; set long s-maxage on segments deliberately. The second is caching the live manifest as long as a segment, which stalls live playback by serving a stale segment list; cache the live manifest for seconds, not hours. The third is a single origin with no shield, which works in testing and falls over at the first real premiere; put a shield in front before you need it, not after. None of these throws an error. All of them show up only in the bill and in the outage report — which is why the offload ratio deserves a dashboard of its own.
Where Fora Soft fits in
Delivery is where a streaming platform's recurring cost is won or lost, and the difference between an offload ratio of 90% and 98% is the difference between a CDN bill that grows with your audience and one that grows twice as fast — long before any viewer notices a thing. Fora Soft has built video streaming, OTT/Internet TV, e-learning, telemedicine, and video surveillance software since 2005, across 625+ shipped projects for 400+ clients, and that work centers on exactly this kind of scale-and-cost engineering: designing origin and cache architectures so segments cache long, manifests cache correctly for live and VOD, the cache key stays clean, and a shield stands between a premiere and your origin. When a media company needs a delivery setup whose economics survive a real, growing audience across regions, that origin-and-cache engineering is the capability we bring.
What to read next
- CDN cost engineering: egress, commits, and the 95th percentile
- Multi-CDN architecture and orchestration
- Origin and origin shielding
Call to action
- Talk to a streaming engineer: book a 30-minute scoping call to talk through your video CDN delivery plan.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the CDN Delivery Readiness Checklist — A one-page checklist to audit your video delivery before you sign a CDN contract: confirm a global PoP footprint for your regions, set long segment TTLs and short live-manifest TTLs, keep the cache key free of per-viewer query strings,….
References
- RFC 9111 — HTTP Caching — IETF. Defines the shared-cache model a CDN edge operates under (§1.2 terminology: a "shared cache" stores responses for reuse by more than one user), freshness-lifetime calculation (§4.2.1), and the s-maxage (§5.2.2.10) and max-age (§5.2.2.1) response directives that set how long an edge keeps a file. Tier 1 (official standard). https://www.rfc-editor.org/rfc/rfc9111.html (accessed 2026-06-16)
- RFC 8216 — HTTP Live Streaming (HLS) — IETF. The manifest (multivariant/media playlist) that lists media segment URIs fetched over HTTP — the segmentation that makes video cacheable. Tier 1. https://www.rfc-editor.org/rfc/rfc8216 (accessed 2026-06-16)
- ISO/IEC 23009-1 — Dynamic Adaptive Streaming over HTTP (MPEG-DASH) — ISO/IEC. The MPD that indexes the same HTTP-delivered segments for non-Apple devices; the second adaptive format an edge caches identically across viewers. Tier 1. https://www.iso.org/standard/83314.html (accessed 2026-06-16)
- Origin Offload: A measure of CDN efficiency for reducing egress cost — Fastly (Monique Barbanson et al., 2024). The precise definition of origin offload ("the ratio of bytes served to end users that were cached inside the CDN … over total bytes served"); the 90%→95% cache-hit-ratio change halving origin load; why request-based CHR understates byte savings and shielding effects. Tier 4 (vendor engineering). https://www.fastly.com/blog/origin-offload-a-measure-of-cdn-efficiency-for-reducing-egress-cost (accessed 2026-06-16)
- Cache Hit Ratio: The Key Metric for Happier Users and Lower Expenses — Akamai. Media-streaming cache-hit-ratio benchmarks and the petabyte-scale value of single-percentage-point gains for large catalogs. Tier 4 (vendor engineering). https://www.akamai.com/blog/edge/the-key-metric-for-happier-users (accessed 2026-06-16)
- Using CloudFront Origin Shield to protect your origin — Amazon Web Services. How a centralized mid-tier cache increases cache-hit ratio and collapses simultaneous requests for the same object across regions; origin→CloudFront transfer is free. Tier 4 (vendor engineering). https://aws.amazon.com/blogs/networking-and-content-delivery/using-cloudfront-origin-shield-to-protect-your-origin-in-a-multi-cdn-deployment/ (accessed 2026-06-16)
- Amazon CloudFront Pricing — Amazon Web Services. Tiered data-transfer-out model (US/EU from ~$0.085/GB for the first 10 TB, declining with volume; higher in APAC and South America; first 1 TB/month free). Tier 4 (vendor pricing — dated, re-verify). https://aws.amazon.com/cloudfront/pricing/ (accessed 2026-06-16)
- Origin Offload (definition and limits) — ioRiver. Vendor-neutral explanation that a well-configured CDN offloads 90–99% of traffic for popular VOD catalogs, and that byte-based offload is the right cost lens for large files. Tier 5 (institutional). https://www.ioriver.io/terms/cdn-offload (accessed 2026-06-16)
- Netflix Open Connect — Overview — Netflix. How ISP-embedded cache appliances are filled off-peak and serve every subscriber on a network from inside it — the "closer and shared" principle at its physical limit. Tier 3 (first-party). https://openconnect.netflix.com/Open-Connect-Overview.pdf (accessed 2026-06-16)
- Streaming configuration guidelines — Fastly Documentation. Practice for caching segments with long TTLs while keeping live manifests fresh (cache a live playlist for a fraction of its target duration); cache-key handling for segmented media. Tier 4 (vendor engineering). https://www.fastly.com/documentation/guides/full-site-delivery/video/streaming-configuration-guidelines/ (accessed 2026-06-16)
- MPEG-DASH & HLS Segment Length — Bitmovin. The relationship between segment duration, request count, and cache efficiency that underpins the segment/manifest TTL split. Tier 4 (vendor engineering). https://bitmovin.com/blog/mpeg-dash-hls-segment-length/ (accessed 2026-06-16)
Source note (per §4.3.2): the caching mechanism — the shared-cache model, freshness lifetime, and the s-maxage/max-age directives that govern an edge — traces to the tier-1 HTTP caching standard RFC 9111 (ref 1), and the segmentation that makes video cacheable traces to the tier-1 HLS and DASH standards (refs 2–3). The offload-ratio definition, the 90%→95% halving, cache-hit benchmarks, origin-shield request collapsing, and CDN pricing are first-party engineering and vendor sources (refs 4–8, 10–11), labelled in-text and dated; the Netflix Open Connect example is first-party (ref 9). No lower-tier source overrode a standard; where popular articles conflate cache-hit ratio with cost savings, the article follows the byte-based offload framing the vendor engineering sources establish.


