Why This Matters

If a streaming product spends real money, half of it is the CDN, and if a streaming product fails in production, the CDN is almost always either the first or the second thing in the incident timeline. Despite that, most teams treat the CDN as a black box bought from a sales engineer and run by someone else. That is how a CDN bill triples in a quarter, and why an architect cannot answer the question "why did the stream rebuffer in Brazil last Tuesday at 21:00 local?". This article gives a non-technical reader enough mechanics to plan and supervise the work, and gives an engineering reader the exact layer-by-layer model (origin, shield, mid-tier, edge, last mile) that the rest of Block 6 builds on. By the end you should be able to draw the diagram from memory, name the metric to track at each layer, and explain to a CFO why an origin shield can cut an origin egress bill more than tenfold.

What a CDN Is, in One Paragraph

A Content Delivery Network is a globally distributed fleet of caching servers that copies your content closer to your viewers and serves it from those copies rather than from your origin. The CDN does five jobs at once: it shortens the network distance between content and viewer; it absorbs request load that your origin could never handle alone; it adds resilience by serving from many machines instead of one; it enforces edge-level policy like signed URLs and geo-blocking; and it gives the operator a single billing surface for what would otherwise be hundreds of co-location contracts. The whole construction is essentially one trick — cache things in many places — applied at scale, with enough surrounding machinery to keep the caches fresh, accurate, and accountable.

For a non-technical reader the cleanest analogy is a chain of corner stores. The origin is a single warehouse on the other side of the city. The edge is a corner store in every neighbourhood. When a customer wants a bottle of water, the corner store hands them one from its own shelf — fast, cheap, no warehouse trip. Behind the corner store sits a regional distribution centre that restocks dozens of corner stores from one warehouse run, so the warehouse itself sees a handful of bulk orders rather than thousands of individual ones. That distribution centre is the origin shield. Replace bottles of water with HLS segments and you have a streaming CDN.

The Streaming-Specific Cache Hierarchy

Most engineers first meet a CDN as a static-site accelerator: one origin, one cache layer at the edge, done. A streaming CDN is built differently. It has more layers, the layers are tuned for video traffic, and the metric the operator watches at each layer is the cache hit ratio — the percentage of requests served from the cache without going further up the hierarchy. A two-layer CDN with an edge hit ratio of 85% sends 15% of requests to the origin; a four-layer streaming CDN with the same edge hit ratio absorbs almost all of that 15% in the intermediate layers and lets perhaps 0.5% reach the origin. The arithmetic is what makes a streaming product affordable.

Read top to bottom, the canonical chain has five hops. The player is on the viewer's device. The edge POP — a Point of Presence — is the nearest CDN server to that viewer, usually in the same metropolitan area; an Akamai-scale operator runs more than 4,200 such locations as of Q2 2026, while Cloudflare's published 2026 count is 330+ cities. The mid-tier cache, or regional cache, sits behind a group of edges and holds the long tail of less-popular content the individual edges could not justify storing. The origin shield is a single regional cache layer — one chosen Region inside the CDN provider's network — through which every request from every edge that misses the regional layer must funnel before it is allowed to reach the origin. The origin is the operator's own packager or storage system, the only place in the world where the canonical copy of every segment lives.

Figure 1. A streaming CDN's five-layer cache hierarchy, from origin through shield, mid-tier, and edge POP to the viewer's player. Each layer absorbs requests so that the origin sees a tiny fraction of what the viewers send. Numbers shown are illustrative production values for a popular live event.

The shape of that hierarchy is what separates a streaming CDN from a static-site CDN. A blog post can survive a single edge cache; a 100,000-viewer live concert cannot. The reason is the time pattern of streaming requests: every viewer is asking for the same handful of segments at the same time, every 2 to 6 seconds, for hours on end. The hierarchy turns "100,000 viewers ask for segment 4172" into "1 origin request, 4 shield requests, 25 mid-tier requests, and 100,000 edge-served responses". Without the hierarchy the origin would be asked 100,000 times for the same file. With the hierarchy it is asked once.

The Five Jobs of the CDN, in Plain Numbers

Each job the CDN does maps to a number on the operator's dashboard.

The first job is shortening the network distance. The shorter the path between a viewer and the bytes they request, the faster the first byte arrives and the higher the bitrate the player will dare to ask for. Time to First Byte (TTFB) drops from between 300 and 800 milliseconds for an origin on another continent to under 50 milliseconds for a hit at a nearby edge. That single change is the largest contributor to a fast startup time.

The second job is absorbing request load. A live event at a one-segment-every-two-seconds cadence and 100,000 concurrent viewers asks for 50,000 segment fetches per second. A single origin web server handles roughly 1,000 to 5,000 requests per second depending on its hardware and the work each request does, which is only 2 to 10% of that load. The CDN must therefore absorb at least 90 to 98% of the load before it touches the origin. A four-layer hierarchy with an end-to-end cache hit ratio of 99.9% sends 50 requests per second to the origin, which is comfortable for a single server.
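The load arithmetic in the paragraph above can be sketched in a few lines of Python. The function names are illustrative and not part of any CDN API; the inputs are the figures quoted in the text.

```python
def segment_requests_per_second(viewers: int, segment_seconds: float) -> float:
    """Each viewer fetches one segment per segment duration."""
    return viewers / segment_seconds


def origin_requests_per_second(total_rps: float, end_to_end_hit_ratio: float) -> float:
    """Requests that miss every cache layer and reach the origin."""
    return total_rps * (1.0 - end_to_end_hit_ratio)


# Figures from the text: 100,000 viewers, 2-second segments, 99.9% end-to-end hit ratio.
total_rps = segment_requests_per_second(100_000, 2.0)
origin_rps = origin_requests_per_second(total_rps, 0.999)

print(f"viewer-side load: {total_rps:,.0f} requests/s")
print(f"origin-side load: {origin_rps:,.0f} requests/s")
```

Changing the segment duration or the hit ratio shows quickly how sensitive the origin load is to each input.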

The third job is resilience. A single-origin streaming service is one fibre cut from being offline. A CDN with 4,200 edge locations is not. Even when the origin itself is unreachable, edges with valid cached content keep serving — many CDNs offer a "stale-while-revalidate" or "stale-if-error" policy that lets edges return slightly outdated content for a short window rather than failing.

The fourth job is edge-level policy enforcement. Signed URLs, token authentication, geo-restriction, viewer-IP allowlists, and rate limiting all happen at the edge, close to the viewer, before the request travels into the operator's network. The reasoning: a geo-restriction rule that runs at the origin costs you the full origin round-trip even for blocked requests; a geo-restriction rule that runs at the edge blocks the request in microseconds without crossing an ocean.
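As a rough illustration of an edge-side policy check, the sketch below verifies an expiring HMAC-signed URL. The query parameter names (expires, sig), the signing string, and the shared secret are assumptions made for this example; each commercial CDN defines its own token format.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SECRET = b"demo-secret"  # hypothetical shared secret, for illustration only


def sign(path: str, expires: int, secret: bytes = SECRET) -> str:
    """Return a hex HMAC-SHA256 over the path and the expiry timestamp."""
    message = f"{path}:{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(url: str, now: Optional[float] = None, secret: bytes = SECRET) -> bool:
    """Accept the request only if the signature matches and has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False  # missing or malformed token
    current = time.time() if now is None else now
    if current > expires:
        return False  # token expired
    expected = sign(parts.path, expires, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode(), expected.encode())


path = "/live/seg-4172.m4s"
expiry = 2_000
url = f"https://edge.example.com{path}?expires={expiry}&sig={sign(path, expiry)}"
print(verify(url, now=1_000))  # valid before expiry
print(verify(url, now=3_000))  # rejected after expiry
```

The check needs only the URL and a shared secret, so it can run at the edge without a round trip to the origin.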

The fifth job is billing consolidation. Without a CDN, a global streaming product would need transit contracts with dozens of ISPs in dozens of countries, peering agreements with the eyeball networks, physical co-location in every region, and a finance team large enough to argue with all of them. With a CDN, one bill replaces all of that. The trade-off is per-gigabyte pricing that includes the CDN operator's margin — typically $0.002 to $0.085 per gigabyte in 2026 — but the operational simplification is usually worth several engineers a year.

Cache Hit Ratio Is the Metric

A streaming engineer who tracks one CDN number tracks cache hit ratio. The formula is straightforward:

Cache Hit Ratio (CHR) = cache_hits / (cache_hits + cache_misses)

A cache hit is a request the cache served from its local storage. A cache miss is a request the cache forwarded upstream because the content was not present or had expired. The ratio is computed per layer — edge CHR, mid-tier CHR, shield CHR — and the operator's dashboard plots each one separately.
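A minimal sketch of the per-layer calculation, assuming each layer exports hit and miss counters over the same time window. The LayerCounters class and the sample counts are illustrative, not taken from any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class LayerCounters:
    """Hit and miss counts for one cache layer over one time window."""
    name: str
    hits: int
    misses: int

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        # An idle layer has no meaningful ratio; report 0.0 rather than divide by zero.
        return self.hits / total if total else 0.0


# Illustrative one-second sample: each layer's misses become the next layer's requests.
layers = [
    LayerCounters("edge", hits=47_500, misses=2_500),
    LayerCounters("mid-tier", hits=2_300, misses=200),
    LayerCounters("shield", hits=190, misses=10),
]

for layer in layers:
    print(f"{layer.name:<8} CHR = {layer.hit_ratio:.1%}")
```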

The benchmarks come straight from the industry. A healthy streaming CDN sits between 95% and 99% on its edge tier. A cache hit ratio consistently below 80% is a sign that something is misconfigured — almost always a cache key that includes a per-user query parameter or session token, fragmenting what should be a shared cache into millions of single-viewer copies. Netflix has engineered its Open Connect appliances to a 99%+ cache hit ratio for popular titles; the public engineering result they published a few years ago measured a 20% improvement in video start times and near-zero mid-stream rebuffering events as a direct consequence of the hit-ratio work.

Here is the math for a single live event. A 100,000-viewer broadcast issues 50,000 segment fetches per second. At a 95% edge hit ratio, 2,500 requests per second escape the edge tier to mid-tier. At a mid-tier hit ratio of 92% on those escaped requests, 200 per second escape to the shield. At a shield hit ratio of 95% on those, 10 per second reach the origin. End to end the origin sees 10 requests per second out of 50,000, a 5,000× reduction. Lose the shield and the origin sees 200 per second, twentyfold more and often the difference between an idle origin and a melted one.
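The cascade above is a running product of miss rates. The sketch below reproduces it, assuming each layer's hit ratio is measured on the requests that reach that layer; the function name is illustrative.

```python
from typing import List, Tuple


def cascade(requests_per_second: float,
            layers: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return the request rate that escapes each layer, in order."""
    remaining = requests_per_second
    escaped = []
    for name, hit_ratio in layers:
        remaining *= (1.0 - hit_ratio)  # only misses travel upstream
        escaped.append((name, remaining))
    return escaped


with_shield = cascade(50_000, [("edge", 0.95), ("mid-tier", 0.92), ("shield", 0.95)])
without_shield = cascade(50_000, [("edge", 0.95), ("mid-tier", 0.92)])

for name, rate in with_shield:
    print(f"escaping {name:<8}: {rate:,.0f} requests/s")

origin_with = with_shield[-1][1]
origin_without = without_shield[-1][1]
print(f"origin load without the shield is {origin_without / origin_with:.0f}x higher")
```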

There are four common causes of a low hit ratio, each with its own fix.

  1. Cache keys include user-specific data. Strip session tokens and auth headers out of the cache key; verify them at the edge with a separate signed-URL check rather than including the signature in the key.
  2. Per-viewer manifest customisation. Server-side ad insertion (SSAI) personalises the manifest per viewer and explodes the cache key space; the fix is server-guided ad insertion (SGAI), which keeps the content manifest and segments shared and cacheable and personalises only the ad breaks, which the player resolves separately.
  3. Short TTLs on VOD. A Cache-Control: max-age=60 on a movie that has not changed in three years is wasteful; set days or weeks.
  4. Frequent invalidations. Use versioned filenames (segment-00042-v2.m4s) rather than purging by URL when you re-encode; the new version populates as it is requested.
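The first cause is the easiest to show in code. The sketch below derives a shared cache key by dropping per-viewer query parameters from the URL. The parameter names in PER_VIEWER_PARAMS are assumptions for the example; a real deployment would configure the equivalent rule in its CDN's own cache-key settings.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical per-viewer parameters that must not fragment the cache.
PER_VIEWER_PARAMS = {"token", "sig", "expires", "session_id", "user_id"}


def cache_key(url: str) -> str:
    """Build a cache key from host, path, and only the shared query parameters."""
    parts = urlsplit(url)
    shared = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in PER_VIEWER_PARAMS
    )
    key = parts.netloc.lower() + parts.path
    query = urlencode(shared)
    return f"{key}?{query}" if query else key


viewer_a = "https://cdn.example.com/live/seg-4172.m4s?token=aaa&session_id=1"
viewer_b = "https://cdn.example.com/live/seg-4172.m4s?token=bbb&session_id=2"

# Both viewers map to one cached object instead of two.
print(cache_key(viewer_a))
print(cache_key(viewer_a) == cache_key(viewer_b))
```

Sorting the remaining parameters also stops the same request from fragmenting the cache when clients send them in a different order.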

Figure 2. Request collapsing at each layer. A thundering herd of 100,000 concurrent viewers becomes 10 origin requests after the cache hierarchy and request-collapsing logic do their work.

How Live Differs from VOD at the CDN

Video-on-demand and live streaming look identical to the player — both fetch HLS or DASH segments over HTTPS — but the CDN treats them differently. A VOD title is written once and read for years; segments rarely change; cache TTLs run for days or weeks; cache hit ratios at the edge routinely reach 98–99% on popular titles. A live event is written second by second; segments exist for a few minutes before they slide out of the DVR window; cache TTLs run as short as 2 to 6 seconds matching the segment duration. The edge cache is being repopulated constantly; the question is not whether a particular segment is cached, but whether it is cached by the time the first viewer asks for it.
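The TTL contrast can be expressed as a small policy function. This is a sketch under stated assumptions: the one-week VOD lifetime and the rule of matching the live TTL to the segment duration follow the ranges in the paragraph above, and the function name and content categories are illustrative.

```python
VOD_MAX_AGE_SECONDS = 7 * 24 * 3600  # assumed one-week TTL for unchanging VOD segments


def cache_control(kind: str, segment_seconds: int = 4) -> str:
    """Return a Cache-Control header value for a segment response."""
    if kind == "vod":
        # Written once, read for years: cache for days or weeks.
        return f"public, max-age={VOD_MAX_AGE_SECONDS}"
    if kind == "live":
        # Short-lived content: TTL tracks the segment duration.
        return f"public, max-age={segment_seconds}"
    raise ValueError(f"unknown content kind: {kind!r}")


print(cache_control("vod"))
print(cache_control("live", segment_seconds=2))
```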

Because of that timing, live streaming uses two extra techniques that VOD rarely needs.

Request collapsing, also called request coalescing or "collapse forwarding", is the technique that solves the thundering herd. When 10,000 viewers ask the edge for segment 4172 at the same moment and the edge has not yet cached it, a CDN with request collapsing sends one upstream fetch and parks the other 9,999 requests in a queue, releasing them all when the response comes back. Cloudflare, AWS CloudFront, Akamai, Fastly, and Varnish-based custom shields all implement this. The official Cloudflare engineering write-up reports request coalescing reduces origin requests by more than 90% during stampedes, and Varnish's documentation gives the same headline figure. Without collapsing, the start of a live event is a guaranteed origin meltdown; with it, the start is uneventful.
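A toy model of request collapsing, written with asyncio. It is a sketch of the idea, not any CDN's implementation: the CollapsingCache class, the simulated origin, and the 10 ms fetch delay are all assumptions for the example.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class CollapsingCache:
    """Cache that sends one upstream fetch per key, however many callers wait."""

    def __init__(self, fetch_upstream: Callable[[str], Awaitable[bytes]]):
        self._fetch_upstream = fetch_upstream
        self._cache: Dict[str, bytes] = {}
        self._in_flight: Dict[str, asyncio.Task] = {}
        self.upstream_fetches = 0

    async def get(self, key: str) -> bytes:
        if key in self._cache:
            return self._cache[key]  # plain cache hit
        task = self._in_flight.get(key)
        if task is None:
            # First miss for this key: start the single upstream fetch.
            task = asyncio.create_task(self._fill(key))
            self._in_flight[key] = task
        # Every other caller parks here until that one fetch completes.
        return await asyncio.shield(task)

    async def _fill(self, key: str) -> bytes:
        self.upstream_fetches += 1
        try:
            body = await self._fetch_upstream(key)
            self._cache[key] = body
            return body
        finally:
            self._in_flight.pop(key, None)


async def demo(viewers: int = 10_000):
    async def slow_origin(key: str) -> bytes:
        await asyncio.sleep(0.01)  # simulated origin latency
        return b"segment-bytes:" + key.encode()

    edge = CollapsingCache(slow_origin)
    bodies = await asyncio.gather(*(edge.get("seg-4172") for _ in range(viewers)))
    return edge.upstream_fetches, len(bodies)


fetches, served = asyncio.run(demo())
print(f"{served} viewers served with {fetches} upstream fetch(es)")
```

The in-flight table is the whole trick: a miss on a key that is already being fetched waits on the existing fetch instead of starting a new one.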

Origin shielding, the second technique, is what we describe in detail in article 6.2; here it is enough to know that an origin shield is a single chosen Region that funnels every upstream request from every edge through one cache, increasing the likelihood of a cache hit before the request escapes to the origin. AWS CloudFront's official documentation states explicitly that Origin Shield "is ideal for workloads with viewers that are spread across different geographical regions or workloads that involve just-in-time packaging for video streaming, on-the-fly image handling, or similar processes". Activating Origin Shield on a busy live stream is one of the largest cost-reduction levers a streaming team has.

VOD also benefits from both techniques but tolerates their absence; live does not.

Numbers from Real Production: a $180,000 Egress Bill Becomes $14,400

A worked example, drawn from a class of projects we have shipped. A regional sports streamer carries 50,000 concurrent viewers on Saturday evenings at an average video bitrate of 3 Mbps. Segments are 4 seconds long, so each viewer asks for one HLS segment every 4 seconds: 50,000 / 4 = 12,500 segment fetches per second.

First, run without an origin shield, on a CDN configured with just edge POPs in front of an origin. The edge cache hit ratio measures 88%; the live nature of the content keeps it below 95% no matter how the rest is tuned. 12,500 fetches per second × 12% miss rate = 1,500 fetches per second hitting the origin directly. At 1,500 requests per second the origin's edge proxy collapses many of them, but at the bandwidth level the origin uploads 4.5 Gbps of segment data to the CDN, roughly 2 PB of egress per month under that workload pattern. At AWS's standard $0.09 per GB EC2 egress price into the CDN, that is about $180,000 per month. Most teams negotiate this down, but in raw arithmetic it is the number.

Now activate Origin Shield in a single Region, say eu-west-1, and configure the CDN so every edge miss funnels through the shield first. The shield's cache hit ratio on the misses it receives runs at roughly 92% for the same workload. 1,500 origin-bound fetches per second × 8% shield-miss rate = 120 fetches per second hitting the origin directly. The origin now uploads 360 Mbps of segment data instead of 4.5 Gbps, so origin egress drops by a factor of 12.5. The CDN bill stays the same; the origin egress bill drops from $180,000 to about $14,400. The shield itself is cheap, typically a flat regional fee or a per-GB charge several times lower than transit egress. Net saving in this worked example: more than $150,000 a month, for a configuration that takes one engineer one afternoon to roll out.
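The shield arithmetic can be checked with a short script. It takes the monthly egress volume (about 2 PB) and the $0.09 per GB price as given in the text rather than deriving them, and it ignores the shield's own fee; the function and field names are illustrative.

```python
def shield_savings(viewers: int, segment_seconds: float, edge_hit: float,
                   shield_hit: float, baseline_egress_gb: float,
                   price_per_gb: float) -> dict:
    """Compare origin request rate and egress cost with and without a shield."""
    edge_rps = viewers / segment_seconds
    origin_rps_no_shield = edge_rps * (1.0 - edge_hit)
    origin_rps_with_shield = origin_rps_no_shield * (1.0 - shield_hit)
    cost_no_shield = baseline_egress_gb * price_per_gb
    # The shield removes the same share of bytes as it does of requests.
    cost_with_shield = baseline_egress_gb * (1.0 - shield_hit) * price_per_gb
    return {
        "edge_rps": edge_rps,
        "origin_rps_no_shield": origin_rps_no_shield,
        "origin_rps_with_shield": origin_rps_with_shield,
        "cost_no_shield": cost_no_shield,
        "cost_with_shield": cost_with_shield,
        "monthly_saving": cost_no_shield - cost_with_shield,
    }


result = shield_savings(
    viewers=50_000,
    segment_seconds=4.0,
    edge_hit=0.88,
    shield_hit=0.92,
    baseline_egress_gb=2_000_000,  # about 2 PB per month, as quoted in the text
    price_per_gb=0.09,
)

for name, value in result.items():
    print(f"{name:<24} {value:>12,.0f}")
```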

Numbers above are illustrative of a class of workload, not a specific customer; the underlying ratios are consistent across the streaming projects we audit. The arithmetic is the point: the cache hierarchy is not a nice-to-have layer over the origin, it is the only reason streaming is affordable.

Figure 3. Worked example: the same 50,000-viewer live event with and without origin shield. Origin egress drops more than tenfold, and the CDN-side spend is functionally unchanged.

Common Pitfall — Treating the CDN Like a Black Box

The single most expensive mistake we see across streaming projects is the team that buys a CDN, points its DNS at the vendor, and walks away — never inspecting the cache hit ratio, never enabling origin shield, never instrumenting per-region throughput. The CDN sales engineer set up a sensible default for the first call; nobody touched it after that.

Six months later the streaming product is paying three to five times more than it should, performance in one region is half what it is in another, and the team has no idea why. The fix is always the same: read the four dashboards the CDN already exposes — request volume per edge group, cache hit ratio per layer, origin egress bytes, per-region TTFB — and act on what they show. Every commercial CDN exposes these as a matter of course; the failure is on the operator side for never opening them.

The corollary: every streaming team needs at least one engineer who owns the CDN configuration, reads the dashboards weekly, and is on the hook for the cache hit ratio number. Without that owner the configuration drifts, the bill grows, and the product gets blamed for performance problems the network is causing.

The CDN Market Landscape in 2026

A short tour, enough to hold a conversation with vendors. Akamai still operates the largest single-operator footprint on the public internet (345,000+ edge servers across 4,200+ locations as of Q2 2026, per Akamai's own count) and dominates premium broadcast streaming where the customer pays for white-glove operations. Akamai's published share of the global CDN market is approximately 20%. AWS CloudFront is the default choice for AWS-native streaming stacks and ships well-integrated Origin Shield with Region-level controls. Cloudflare runs in 330+ cities with strong free-tier onboarding and the lowest per-GB list prices among the major commercial CDNs. Fastly posts the lowest absolute latencies in North America and Western Europe and is the favourite of teams that need programmable VCL at the edge. Google Media CDN, the streaming-specific surface on Google Cloud, combines Google's eyeball-network peering with origin-shield-like tiering and is the typical choice for YouTube-adjacent workloads. Below the top tier, Bunny.net, CacheFly, BlazingCDN, CDN77, KeyCDN, Gcore (formerly G-Core Labs), and 5centsCDN all serve streaming traffic at competitive per-GB prices, often with a strong regional advantage.

Two specialised models deserve mention. Embedded ISP appliances (Netflix Open Connect, Google Global Cache for YouTube, and the caches Akamai places with its Accelerated Network Partners) sit physically inside ISP networks, serve only the operator's content, and bypass the public internet entirely for the last hop. Netflix's embedded Open Connect Appliances (OCAs) "have the same capabilities as the OCAs that we use in our 60+ global data centres" per Netflix's published partner documentation, and client traffic is steered to them using BGP announcements coordinated with the ISP. The model is the gold standard for performance and the floor for cost, since bandwidth that never traverses transit is free at the margin, but it requires being big enough that ISPs accept a custom appliance from you, which is essentially Netflix, YouTube, Disney+, and the Big Tech CDN operators themselves.

Multi-CDN is the second model and what most large streaming platforms actually run. Two or more CDNs, with a steering layer that routes traffic between them per region, per rendition, or per session. We dedicate articles 6.4 and 6.5 to it because the operational story is substantial.

Where Fora Soft Fits In

Fora Soft has been building video infrastructure since 2005, and the CDN layer touches every project we ship: live streaming for sports and esports operators, OTT and Internet TV platforms, video conferencing and telemedicine systems that bridge WebRTC peers out to a broadcast audience, e-learning platforms that mix VOD lectures with live workshops, and AR/VR experiences with their tighter latency budgets. We work across the major commercial CDNs, the regional players where they outperform the giants, and the embedded-cache deployments where customer scale justifies them. The work we do is rarely "pick a CDN" in isolation — it is the cache hierarchy design, the cost model, the multi-CDN steering, and the operator-side instrumentation that makes a streaming product affordable to run at scale.

References

  1. AWS, "Use Amazon CloudFront Origin Shield", AWS CloudFront Developer Guide, accessed 2026-05-24. — Origin Shield positioning and use-case statement for streaming and JIT packaging.
  2. AWS, "Amazon CloudFront for Media — Best Practices for Streaming Media Delivery", AWS whitepaper, accessed 2026-05-24. — Reference architecture for the cache hierarchy and shield positioning.
  3. Akamai, "Reference Architecture — Content Delivery at the Edge", accessed 2026-05-24. — Tiered distribution model, 345,000+ edge server / 4,200+ location footprint counts (Q2 2026).
  4. Cloudflare Engineering, "Live video just got more live: Introducing Concurrent Streaming Acceleration", Cloudflare blog. — Request coalescing and chunked transfer behaviour at the edge for live video; the >90% origin-request reduction figure.
  5. Netflix, "Open Connect — Overview", Netflix Open Connect documentation, accessed 2026-05-24. — Embedded OCA model, BGP-driven steering, popularity-driven nightly fill, 99%+ cache hit ratio engineering posture for popular titles.
  6. Netflix Open Connect Partner Help Center, "Requirements for deploying embedded appliances", accessed 2026-05-24. — Embedded vs in-data-centre OCA capability parity statement.
  7. Fastly, "Time to first byte (TTFB)", Fastly Learning Center, accessed 2026-05-24. — TTFB definition, the order-of-magnitude reduction from edge caching, and impact on player startup time.
  8. Varnish Software, "Request coalescing and other reasons to use Varnish as origin shield", accessed 2026-05-24. — Mechanics of request coalescing in a streaming origin shield and the order-of-magnitude origin-load reduction figure.
  9. OTTVerse, "CDN Request Collapsing and the Thundering Herds Problem Simplified", accessed 2026-05-24. — Thundering-herd framing and a practitioner-level walkthrough of collapse forwarding for live streaming.
  10. Akamai blog, "Cache Hit Ratio: The Key Metric for Happier Users and Lower Expenses", accessed 2026-05-24. — Cache hit ratio as the single most actionable CDN metric for streaming operators.
  11. Conviva, "State of Streaming" 2026 report, in-house industry benchmark referenced in cost-economics article (6.8). — Industry-wide CDN performance and QoE benchmarks; full discussion in article 9.9 / 9.10.
  12. Streaming Video Technology Alliance (SVTA), "Open Caching" specification work, accessed 2026-05-24. — Industry working group on standardising multi-CDN and ISP-cache interfaces, contextual background to the embedded-cache and multi-CDN sections.