Live Origin Architecture: Clustered, Single, Replicated, JIT

Why this matters

When a live stream stops, viewers do not file a polite support ticket — they close the tab, tweet a screenshot, and the CFO asks why the watch-party page went dark during the goal. The origin is where most of those incidents are born and where almost all of them are prevented. The reader for this article is the head of streaming choosing between Wowza on one EC2 instance and a multi-region MediaPackage deployment, the founder being told "we need a redundant origin" without knowing what that costs, and the engineer translating a five-nines uptime target into specific boxes on an architecture diagram. We assume zero prior knowledge of origin internals — by the end you will know what every term in this paragraph means and what shape of origin your product actually needs.

What a live origin actually does, in plain language

An encoder is the box that takes the camera's raw video and turns it into a small number of compressed renditions — 1080p, 720p, 480p, 360p, and so on — that fit through the internet. A packager wraps those renditions into the small files that streaming protocols expect: HTTP Live Streaming segments, MPEG-DASH segments, CMAF chunks. An origin is the HTTP server that holds those segments and the manifest that lists them, and that answers every request from the content delivery network for the next piece of the stream.

In a live workflow, the origin's job sounds simple — serve files over HTTP — and is, in fact, the hardest distributed-systems problem in the pipeline. The files are being written, in real time, while millions of players are reading from them. The manifest changes every few seconds. A segment that did not exist 800 milliseconds ago suddenly must exist, must be the right length, must reference the right encryption key, and must be reachable from every edge node the content delivery network operates. If the origin fails for thirty seconds, the manifest goes stale, players run out of buffered segments, the rebuffer counter spikes, and the stream goes black. There is no "go back and re-render" option — live is live.

The origin is also where the contribution timeline is canonicalised. Every encoder, every packager, every player downstream agrees on what "second 47 of the show" means because the origin says so. If you have two encoders running in parallel for redundancy, the origin is the place that picks one of them as authoritative at any given moment and ignores the other. If you fail over from primary to backup encoder, the origin is what hides the seam from the player. This is more than file serving: it is the live equivalent of a database write-ahead log, with the same constraints around durability, ordering, and consistency.

The four shapes of a live origin

There are four canonical shapes a live origin takes in production, in roughly increasing order of scale, cost, and reliability. The shapes are not exclusive — a product often grows through them as audience and stakes increase. Pick a shape that fits today's scale plus the next eighteen months, not the shape that fits the audience you hope to have in five years.

The four shapes are:

Single-server origin. One process, usually one server, doing ingest, packaging, and origin serving for one or a small number of streams. The Wowza-on-an-EC2-instance pattern.
Packager + origin pair. A dedicated packager produces segments; a separate origin server, often a tuned object store or HTTP cache, serves them. The Bitmovin-encoder-plus-Unified-Origin pattern.
Just-in-time origin cluster. A cluster of identical origin nodes that package each segment on request, transforming a single canonical media store into HLS, DASH, CMAF, or whatever the requesting player wants. The Unified Streaming and AWS Elemental MediaPackage v1 pattern.
Cross-region replicated cluster. Two or more independent origin clusters, in different geographic regions, fed by independent encoder pipelines, with the content delivery network or a steering layer in front choosing between them per request. The AWS Elemental MediaPackage v2 with cross-region failover pattern, and the architectural shape the broadcasters reach for when a single goal-line cut cannot be lost.

The remainder of this article unpacks each shape — what it solves, what it does not, what it costs, and where it sits on the reliability curve.

A four-quadrant diagram showing the four origin architectures stacked by complexity and reliability: single server at the bottom, packager plus origin pair, just-in-time origin cluster, and cross-region replicated cluster at the top, each annotated with its typical audience size and failure tolerance

Figure 1. The four canonical live origin architectures, ordered by the failure they are designed to survive.

Shape 1 — The single-server origin

The simplest live origin is a single server running a streaming engine that handles everything — receiving the contribution stream from the encoder, transcoding into renditions if needed, packaging into HLS and DASH, and serving the segments over HTTP. The classic stack is Wowza Streaming Engine on a single virtual machine, or Nimble Streamer on a small VPS, or an open-source equivalent like SRS or MediaMTX.

This shape is correct for a surprisingly large set of products. A 200-viewer training webinar, a parish-church live stream, an internal company all-hands, a one-off conference keynote with the audience pre-registered — all of these survive perfectly on one box. The contribution stream comes in over RTMP or SRT, the engine transmuxes to HLS, the player fetches segments directly from the same server. There is no content delivery network in the path because the audience is small enough that one well-provisioned server can serve all of it. Total cost: about $50 to $300 per month for the virtual machine, the streaming engine licence, and the bandwidth.

The single-server origin also makes sense as the ingest half of a larger system. A pattern that has been documented for over a decade — including in the Wowza-plus-Nimble Streamer integration guides — is to run Wowza Streaming Engine as a small origin that receives RTMP, transcodes if needed, and produces HLS, while a fleet of Nimble Streamer edge boxes pull from it and serve the audience. The Wowza side handles the heavy lift (ingest, transcoding, packaging); the Nimble side handles the cheap, high-volume part (HTTP delivery). It is a "one origin, many edges" pattern, and the origin is still a single box.

The limits are obvious. One server is one failure domain. If the hard disk dies, the network card flaps, the kernel panics, the data centre loses power, the licence expires — the stream stops. There is no warm spare automatically taking over. Hitless software upgrades are also impossible, because there is nothing to roll over to. And the box's bandwidth budget caps the audience hard: a 100 Mbps virtual machine can serve about 200 concurrent viewers of a 0.5 Mbps stream, and not one viewer more.
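The bandwidth ceiling is plain division. Below is a minimal sketch. It assumes the server's egress link is the only bottleneck. `max_direct_viewers` and its `headroom` parameter (the share of the link you are willing to fill) are illustrative names and are not part of any streaming engine.

```python
import math


def max_direct_viewers(link_mbps: float, stream_mbps: float, headroom: float = 1.0) -> int:
    """Upper bound on concurrent viewers one box can serve without a CDN.

    link_mbps   -- usable egress bandwidth of the server
    stream_mbps -- bitrate each viewer pulls (one rendition)
    headroom    -- fraction of the link you are willing to fill (assumption)
    """
    if stream_mbps <= 0 or not 0 < headroom <= 1:
        raise ValueError("stream_mbps must be > 0 and headroom in (0, 1]")
    # Floor, because a fractional viewer cannot be served.
    return math.floor(link_mbps * headroom / stream_mbps)


print(max_direct_viewers(100, 0.5))                # 200, the figure above
print(max_direct_viewers(100, 0.5, headroom=0.5))  # 100 with 50% headroom
```

In practice you would set `headroom` below 1.0 to leave room for manifest polling, TCP retransmits, and bursty segment fetches.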

The single-server origin is the right answer when:

the audience is small enough that one machine can serve it directly, or
the product is in a pre-production phase where reliability matters less than time-to-market, or
the stream is non-critical (an internal demo, a hobby project, a low-stakes community broadcast).

For anything where a one-hour outage costs more than a junior engineer's day rate, you need at least two servers, which means moving to shape 2.

Shape 2 — The packager + origin pair

Separating the packager from the origin is the first architectural move every serious live product makes. The packager is the process that takes the encoder's renditions and produces the segment files and the manifest. The origin is the HTTP server that serves them. Splitting the two means each can scale, fail, and be replaced independently.

In a typical packager-plus-origin pair, a software packager runs on its own host. The packager might be Unified Streaming's Unified Packager, Bitmovin's Live Packager, the open-source Shaka Packager, or Nimble Streamer's packaging mode. It consumes the encoder's CMAF or RTMP feed, writes segment files into shared storage (an S3 bucket, an NFS export, an in-memory cache), and updates the manifest. A separate origin tier reads from that storage and serves the content delivery network's edge nodes. That tier can be nginx tuned for streaming, Apache Traffic Server, an S3 bucket served directly as a CloudFront origin, or another tuned HTTP daemon.

The first reliability win is independent failure. If the packager crashes, the origin keeps serving the segments already in storage for the next thirty seconds or so. That is usually long enough for the packager to restart, or for a standby packager (if you run one) to take over. If one origin crashes, the packager keeps writing new segments, and a load balancer routes traffic to the surviving origin instance. Neither failure mode takes the whole stream down.

The second win is horizontal scaling on the origin side. The packager's CPU work is bounded by the encoder bitrate, which is fixed. A 6 Mbps total ABR ladder produces about 750 kilobytes per second of new segment data regardless of audience size. The origin's work, on the other hand, scales roughly linearly with the number of viewers whose requests reach it. Putting the two in separate processes lets you scale the origin out to a fleet of identical HTTP servers behind a load balancer while keeping the packager small.
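The asymmetry is easy to see in numbers. Below is a minimal sketch. It assumes a hypothetical four-rung ladder that sums to 6 Mbps and viewers who average 2 Mbps each. The packager's output stays at 750,000 bytes per second whatever the audience, while direct-to-origin egress grows with every viewer.

```python
def packager_write_bytes_per_s(ladder_kbps: list[int]) -> int:
    """New segment data written per second: every rung, whatever the audience."""
    return sum(ladder_kbps) * 1000 // 8


def direct_origin_egress_mbps(viewers: int, avg_viewer_kbps: int) -> float:
    """Origin egress if every viewer fetched from it directly (no CDN)."""
    return viewers * avg_viewer_kbps / 1000


# Hypothetical four-rung ladder totalling 6 Mbps; viewers average 2 Mbps.
ladder = [3000, 1500, 900, 600]
for viewers in (10, 1_000, 100_000):
    print(viewers, packager_write_bytes_per_s(ladder),
          direct_origin_egress_mbps(viewers, 2000))
```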

The shape is also what makes CMAF ingest practical. The DASH-IF Live Media Ingest Protocol — published in version 1.0 in 2019, version 1.1 in 2022, and version 1.2 on February 28, 2024 — defines two interfaces for how an encoder hands off live media to a downstream packager or origin. Interface 1 uses fragmented MPEG-4, the same container that CMAF specifies, sent as a long-running HTTP POST from the encoder to the packager. Interface 2 uses the already-packaged DASH and HLS forms. Both are HTTP-POST-based, both support redundancy by sending the same stream from two encoders to two packagers, and both place the packager-plus-origin pair as the canonical landing point for the contribution traffic. If you are building anything that talks CMAF ingest, the shape you are building is shape 2 or above — there is no shape-1 way to implement it cleanly.
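One common way to keep a single POST open is HTTP/1.1 chunked transfer coding. With it, each fragment goes on the wire as soon as the encoder finishes it, so the total upload size never has to be known in advance. Below is a minimal sketch of that framing only. It is not an implementation of the DASH-IF ingest protocol, which also specifies URL paths, track handling, and error behaviour.

```python
from typing import Iterable, Iterator


def chunked_body(fragments: Iterable[bytes]) -> Iterator[bytes]:
    """Frame CMAF fragments as HTTP/1.1 chunked transfer coding.

    Each fragment (for example one moof+mdat pair) becomes one chunk:
    hex length, CRLF, payload, CRLF. A zero-length chunk ends the POST.
    """
    for fragment in fragments:
        if not fragment:
            continue  # an empty chunk would terminate the body early
        yield b"%X\r\n" % len(fragment) + fragment + b"\r\n"
    yield b"0\r\n\r\n"


fragments = [b"moof", b"mdat12"]  # stand-ins for real fragment bytes
print(b"".join(chunked_body(fragments)))
# b'4\r\nmoof\r\n6\r\nmdat12\r\n0\r\n\r\n'
```

In Python you rarely write this framing by hand. `http.client.HTTPConnection.request("POST", url, body=generator)` chunk-encodes an iterable body automatically when no `Content-Length` header is set, so you would pass it the raw fragments rather than the output above.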

The cost is a step up, but not a large one. Three servers (one packager and two origins behind a load balancer) come in at around $400 to $1,500 per month on a major cloud. Add the packager licence if you are not using the open-source Shaka Packager. The reliability improvement over shape 1 is large: recovery from a single-host failure is measured in seconds rather than in the time it takes a human to notice and respond.

The shape is the right answer when:

the audience has grown beyond what one server can deliver, and a content delivery network now sits in the path absorbing most of the load,
the team needs to do rolling restarts of packager or origin without taking the stream down, or
the stream needs to go out in HLS and DASH simultaneously and the packager is producing one canonical CMAF source that both protocols consume.

The next step up is when one packager is no longer enough.

Shape 3 — The just-in-time origin cluster

A just-in-time origin — abbreviated JIT origin, and sometimes called a smart origin or dynamic packager — is the design that wins when a single product has to serve the same live stream in many output formats (HLS-TS for legacy iOS, fMP4 HLS for everything else, MPEG-DASH for Android and web, low-latency CMAF for the latency-sensitive variant) and to many edge configurations (signed URLs per region, DRM keys per device class, language tracks per market). Pre-packaging every combination wastes storage and CPU; packaging on demand, from a single canonical media store, only when a request actually arrives is what JIT origin means.

The canonical implementation is Unified Streaming's Unified Origin, which has been shipping since 2010 and remains a reference for the pattern. Unified Origin runs as a web-server module (Apache httpd in Unified Streaming's standard live deployments). For each HTTP request for a manifest or a segment, it does four things. It identifies the underlying canonical CMAF or fragmented-MP4 track in storage. It packages that track just-in-time into the requested format (HLS m3u8, DASH MPD, MSS, fMP4 segment, TS segment). It applies the right encryption keys. Then it serves the result. A single canonical source can fan out into dozens of derived output formats with no pre-packaging step.
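The core idea fits in a few lines. The store holds one canonical list of fragments, and the requested file extension selects a renderer at request time. Below is a minimal sketch with a hypothetical URL scheme and store layout. It does not reproduce Unified Origin's URL conventions or internals. The DASH renderer emits only the `SegmentTimeline` element, not a complete MPD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One segment of the canonical track in the media store."""
    sequence: int
    start: int      # presentation time in timescale ticks
    duration: int   # duration in timescale ticks


TIMESCALE = 1000  # assumption: millisecond ticks

# Hypothetical canonical store: channel -> current live window of fragments.
STORE = {
    "ch1": [Fragment(100, 200_000, 2000), Fragment(101, 202_000, 2000)],
}


def render_hls(frags: list[Fragment]) -> str:
    target = max(-(-f.duration // TIMESCALE) for f in frags)  # ceil to seconds
    lines = ["#EXTM3U", "#EXT-X-VERSION:7", f"#EXT-X-TARGETDURATION:{target}",
             f"#EXT-X-MEDIA-SEQUENCE:{frags[0].sequence}", '#EXT-X-MAP:URI="init.mp4"']
    for f in frags:
        lines += [f"#EXTINF:{f.duration / TIMESCALE:.3f},", f"seg-{f.sequence}.m4s"]
    return "\n".join(lines) + "\n"


def render_dash_timeline(frags: list[Fragment]) -> str:
    # Only the SegmentTimeline element; a real MPD wraps it in Period/AdaptationSet.
    entries = "".join(f'<S t="{f.start}" d="{f.duration}"/>' for f in frags)
    return f"<SegmentTimeline>{entries}</SegmentTimeline>"


RENDERERS = {"m3u8": render_hls, "mpd": render_dash_timeline}


def handle(path: str) -> str:
    """Package on request: /live/<channel>/manifest.<ext> (hypothetical URL scheme)."""
    _, live, channel, name = path.split("/")
    ext = name.rsplit(".", 1)[-1]
    if live != "live" or channel not in STORE or ext not in RENDERERS:
        raise KeyError(path)
    # Same canonical fragments, different output format per request.
    return RENDERERS[ext](STORE[channel])


print(handle("/live/ch1/manifest.m3u8"))
print(handle("/live/ch1/manifest.mpd"))
```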

AWS Elemental MediaPackage v1 works on the same principle, as did Microsoft Azure Media Services: the encoder produces one fragmented input, and the origin produces many output formats. The economics are compelling at scale. A live channel that needs to be available in seven output formats no longer multiplies its packager CPU by seven. The origin packages the requested format only when an edge node misses cache and asks for it. The content delivery network's cache in front of the origin absorbs most of the load, and the origin's CPU is spent only on the long tail of unique requests.
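It is easier to see why CPU does not multiply by the number of formats in code. Below is a toy sketch. It assumes a single cache tier with a fixed TTL, and a stand-in `jit_origin` function takes the place of real packaging. Thirty thousand requests across three formats cost the origin three packaging calls.

```python
import time
from typing import Callable


class EdgeCache:
    """Toy CDN tier: serves from cache until TTL expires, else asks the origin."""

    def __init__(self, origin: Callable[[str], bytes], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.origin, self.ttl_s, self.clock = origin, ttl_s, clock
        self.entries: dict[str, tuple[float, bytes]] = {}
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        hit = self.entries.get(url)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]
        self.misses += 1                 # only misses cost origin CPU
        body = self.origin(url)          # JIT packaging happens here
        self.entries[url] = (now, body)
        return body


packaging_calls = 0


def jit_origin(url: str) -> bytes:
    """Stand-in for on-demand packaging; counts how often it runs."""
    global packaging_calls
    packaging_calls += 1
    return f"packaged {url}".encode()


# Frozen clock keeps the demo deterministic; real code would use the default.
cache = EdgeCache(jit_origin, ttl_s=60, clock=lambda: 0.0)
urls = ["/ch1/seg-100.ts", "/ch1/seg-100.m4s", "/ch1/seg-100.mp4"]  # hypothetical
for _ in range(10_000):
    for url in urls:
        cache.get(url)
print(packaging_calls)  # 3
```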

JIT origin is also where multi-tenant live origins start to make sense. One origin cluster can serve hundreds of independent live channels. The per-channel cost is bounded by the storage of the canonical source plus the metadata for the active live window, typically a few minutes of segments held in memory or a fast cache. Adding the 101st channel does not double the CPU; it adds a few percent. Platforms like Mux, Bitmovin Live, and AWS Elemental MediaPackage exploit this to offer "thousands of concurrent channels" on a shared platform, at unit-economic prices a single-tenant deployment cannot match. Microsoft Azure Media Services did the same until Microsoft retired it in June 2024.

The architecture inside a JIT origin cluster is worth a look. A small number of ingest nodes receive the contribution streams via CMAF ingest, RTMP, or SRT, and write incoming media into a shared media store — sometimes a distributed cache like memcached, sometimes a fast key-value store, sometimes an object store like S3 with a thin caching layer in front. A larger number of packaging nodes behind a load balancer answer HTTP requests for manifests and segments, reading from the media store and producing the requested format. All nodes are stateless with respect to a specific channel, so any node can serve any channel — channel-to-node affinity is handled by the load balancer or the consistent-hashing layer in front.
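A consistent-hashing layer lets any node serve any channel while keeping each channel's live window warm on one node. Below is a minimal sketch. SHA-1 placement and 100 virtual nodes per packaging node are arbitrary choices here. The useful property is that removing a node moves only the channels that node owned.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash().
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes for channel-to-node affinity."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self.ring: list[tuple[int, str]] = []
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self.ring = [entry for entry in self.ring if entry[1] != node]

    def node_for(self, channel: str) -> str:
        # First virtual node clockwise from the channel's hash, wrapping around.
        idx = bisect.bisect(self.ring, (_hash(channel), ""))
        return self.ring[idx % len(self.ring)][1]


ring = HashRing(["pkg-a", "pkg-b", "pkg-c"])
before = {f"channel-{n}": ring.node_for(f"channel-{n}") for n in range(1000)}
ring.remove("pkg-b")
moved = [c for c, node in before.items() if ring.node_for(c) != node]
print(len(moved), "channels moved; all previously on pkg-b:",
      all(before[c] == "pkg-b" for c in moved))
```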

Netflix's own live origin, written up on the Netflix Tech Blog in December 2025, is a custom take on this pattern. The live origin sits between Netflix's cloud encoding pipeline and the Open Connect content delivery network. It ingests the live stream, persists segments in a distributed key-value store on top of Apache Cassandra, and serves Open Connect's edge appliances. Netflix faced a specific scaling problem: driving read throughput in the hundreds of gigabits per second directly against Cassandra would have degraded the write path. The fix was a write-through cache built on EVCache (Netflix's memcached-backed distributed cache), with a chunking protocol that splits multi-megabyte segment values into smaller chunks. Almost all read load is served from cache, write load lands cleanly on Cassandra, and read throughput scales past 200 Gbps without affecting the write path. The specific design reflects Netflix's unusual scale. The principles behind it apply to every JIT origin design: separate the read path from the write path, cache aggressively, chunk large blobs, and isolate write contention.
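The chunk-and-cache principle can be sketched in a few lines. This is not Netflix's protocol. The key scheme (`<segment>:<index>` plus a `:meta` count entry), the 1 MiB default chunk size, and the dict-backed stand-ins for Cassandra and EVCache are all assumptions for illustration.

```python
import math

CHUNK_SIZE = 1 << 20  # hypothetical 1 MiB chunk size, not Netflix's value


class ChunkedWriteThroughStore:
    """Sketch of chunked, write-through segment storage.

    `durable` stands in for the write-optimised store (Cassandra in Netflix's
    design) and `cache` for the read tier (EVCache). Both are plain dicts here.
    """

    def __init__(self, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.durable: dict[str, bytes] = {}
        self.cache: dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        count = math.ceil(len(blob) / self.chunk_size)
        items = {f"{key}:meta": str(count).encode()}
        for i in range(count):
            items[f"{key}:{i}"] = blob[i * self.chunk_size:(i + 1) * self.chunk_size]
        self.durable.update(items)  # write path: durable store first
        self.cache.update(items)    # then populate cache so reads avoid durable

    def _read(self, k: str) -> bytes:
        if k in self.cache:
            return self.cache[k]
        value = self.durable[k]     # cache miss: fall back, then repopulate
        self.cache[k] = value
        return value

    def get(self, key: str) -> bytes:
        count = int(self._read(f"{key}:meta"))
        return b"".join(self._read(f"{key}:{i}") for i in range(count))


store = ChunkedWriteThroughStore(chunk_size=4)
store.put("seg-1", b"abcdefghij")   # stored as 3 chunks: abcd, efgh, ij
print(store.get("seg-1"))
```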

The cost of a JIT origin cluster is structurally higher than shape 2 because there are more components to keep healthy, but the per-channel cost falls sharply as you add channels. A three-node JIT origin cluster on a major cloud runs around $1,500 to $4,000 per month before you have ingested a single channel; the marginal cost of each additional channel is a few dollars in storage and a few dollars in CPU. For a product running fewer than ten concurrent live channels, JIT origin is overkill. For a product running fifty or more, it is the default.

A schematic of a just-in-time origin cluster showing a small number of ingest nodes feeding a shared media store and a fleet of packaging nodes behind a load balancer producing HLS, DASH, CMAF, and MSS outputs on demand

Figure 2. The just-in-time origin cluster — one canonical source, many output formats packaged on demand.

Shape 4 — The cross-region replicated cluster

The fourth shape — replicated multi-region origin — is what you build when "the stream went black for thirty seconds" is itself an incident report worth writing. It is the architecture you reach for when you carry a championship final, a presidential debate, a stock-exchange announcement, an enterprise earnings call, a paid concert, a religious holiday service — anything where a regional cloud outage during the broadcast is unacceptable.

The shape duplicates everything that matters across two or more geographic regions, runs both copies in parallel, and lets the content delivery network or a steering layer choose between them per request. Two independent encoder pipelines push to two independent packager-plus-origin (or JIT origin) clusters in different cloud regions. The contribution side runs dual encoding, ideally with the two encoders frame-synchronised via the same source-clock signal or via the CMAF ingest specification's timeline anchoring so their segment boundaries line up. Each cluster believes it is the only one in the world.

The architectural example almost every video architect studies is AWS Elemental MediaPackage v2 with cross-region failover. It launched in June 2024 as an evolution of the platform's older same-region two-input redundancy. The model has two layers of redundancy. At the input redundancy layer, MediaPackage v2 accepts a primary and a secondary CMAF ingest stream from two independent encoders within a single region. The service runs both ingest endpoints on redundant instances across multiple AWS availability zones and picks one as the active source. If segments stop arriving from the primary, or arrive late, it automatically fails over to the secondary. At the cross-region failover layer, a customer runs an entire MediaPackage v2 deployment in a primary region (say, us-east-1) and a parallel deployment in a secondary region (say, us-west-2), each fed by its own encoder pair. The customer then configures the content delivery network (CloudFront, Akamai, Fastly, or another) to act on the force-endpoint-error signal that MediaPackage emits. That signal tells the CDN the primary origin's stream is stale, and the CDN routes fetches to the secondary origin instead.

The mechanism that makes the cross-region failover transparent at the player level is worth slowing down on. The MediaPackage v2 origin returns HTTP error codes (the force-endpoint-error configuration is the customer's tunable for which conditions trigger this) that the upstream content delivery network is configured to interpret as "this origin is no longer healthy — try the next one". The content delivery network's origin-failover logic switches over without the player knowing, because the URL the player is fetching has not changed; only the underlying origin behind that URL has. The manifest the secondary origin returns is timeline-aligned with the primary (because both encoders pushed CMAF ingest streams with the same time base), so the player continues to fetch the next-expected segment without a discontinuity tag.
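The CDN side of this handshake reduces to a per-request rule. Below is a minimal sketch. The failover status set shown is an assumption: CloudFront's origin groups, for example, let you choose which status codes trigger failover. Check the MediaPackage documentation for the status its force-endpoint-error setting actually returns.

```python
from typing import Callable

# Assumption: which statuses trigger failover is configurable on real CDNs;
# this set is illustrative, not a recommendation or a vendor default.
FAILOVER_STATUSES = {404, 500, 502, 503, 504}

Fetch = Callable[[str], tuple[int, bytes]]  # path -> (status, body)


def fetch_with_failover(path: str, primary: Fetch, secondary: Fetch) -> tuple[int, bytes, str]:
    """Per-request origin failover: same path, different origin behind it."""
    status, body = primary(path)
    if status not in FAILOVER_STATUSES:
        return status, body, "primary"
    status, body = secondary(path)
    return status, body, "secondary"


def stale_primary(path: str) -> tuple[int, bytes]:
    return 404, b""  # e.g. an origin configured to error on a stale manifest


def healthy_secondary(path: str) -> tuple[int, bytes]:
    return 200, b"#EXTM3U\n"


print(fetch_with_failover("/live/ch1/index.m3u8", stale_primary, healthy_secondary))
# (200, b'#EXTM3U\n', 'secondary')
```

The player's URL (`path`) never changes; only the function answering it does, which is why the switch is invisible to the player.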

The principle is general — multi-CDN deployments with origin steering use the same shape, and the AWS Labs reference architecture aws-clustered-video-streams packages the pattern into deployable infrastructure-as-code. The idea is the same regardless of vendor: redundant origins, time-aligned content, a smart layer in front of them that decides which one is authoritative per request, and a recovery time bounded by the steering layer's polling interval rather than by a human operator's reaction time.

The cost of a cross-region replicated cluster is large. You are paying for two of everything — two encoder pipelines, two origin clusters, two contribution networks, two sets of operational staff watching the dashboards. The bandwidth bill doubles on the contribution side because both encoders are streaming continuously even though only one is being read at a time. A realistic budget for an end-to-end cross-region replicated live workflow on AWS is in the range of $4,000 to $25,000 per month before content delivery network egress, depending on channel count and the size of the encoder pipeline. The break-even with a single-region deployment is not about absolute viewer count — it is about what one minute of outage during the broadcast actually costs you.

The shape is the right answer when:

the stream carries revenue (paid event, ad-supported tier, sponsorship commitments that pay out per minute of uptime),
the audience is global enough that a single-region origin imposes uncomfortable round-trip-time penalties on viewers far from that region,
the brand or contractual SLA cannot tolerate even short outages, or
regulatory pressure — broadcast licences, election coverage, financial-services rules — makes "the system was down" a finding instead of a footnote.

Numeric example: sizing for a 100,000-viewer live event

Walk through the math for a single concrete case. The product is a paid e-sports finals stream with 100,000 expected concurrent viewers, an average bitrate of 3 Mbps across the ABR ladder (1080p60 top rendition, six ladder rungs), and a two-hour broadcast window.

Egress bandwidth: 100,000 viewers × 3 Mbps = 300,000 Mbps = 300 Gbps peak.
Total bytes delivered: 300 Gbps × 7,200 seconds × 0.7 utilisation factor (because not all viewers are on the top rung the whole time) = ~1.51 Pb of egress, or about 189 TB.
Manifest fetches: each player polls the manifest every 2 seconds on average, which is 7,200 ÷ 2 = 3,600 polls per viewer over the two-hour event. 100,000 viewers × 3,600 polls per viewer = 360 million manifest requests over the event.
Segment fetches: a chunked CMAF ladder with 1-second segments, fetched as six requests per second per viewer (about 167 ms of media per request), produces 100,000 × 6 × 7,200 = 4.32 billion segment requests over two hours if no cache existed.

With a content delivery network in front of the origin and a 99% cache hit ratio, the origin sees roughly 1% of those requests: about 43 million segment requests and 3.6 million manifest requests over the event. That is roughly 6,500 requests per second at the origin layer. On top of that, a sustained 1 to 3 Gbps of segment data flows from the origin to the content delivery network's first cache tier.

A well-tuned shape-2 packager-plus-origin pair can plausibly absorb that load, since these are small, cacheable objects. The question is whether you trust that single deployment to be your only one for a paid 100,000-viewer event. The answer in 2026, for almost every team, is no: you build shape 4 (cross-region replicated cluster) and pay the doubled infrastructure cost as event insurance.

The shape-4 budget for one event might break down as follows: $1,500 in MediaLive encoding (two pipelines), $800 in MediaPackage origin time (two regions), about $9,450 in CloudFront egress at $0.05/GB blended (189 TB), and $200 in operational tooling. Total: roughly $12,000 in cloud costs for the event. Ticket revenue could reach $1,000,000 for 100,000 paid viewers at $10 a ticket, or about $8,300 per minute over two hours. The economic case for the replicated cluster is overwhelming once revenue per minute crosses about $100/minute, which on a paid event is essentially always.
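The arithmetic is worth keeping in a script so it can be rerun when an input changes. Below is a minimal sketch that computes the traffic and budget figures from the example's stated inputs: 3 Mbps average, 0.7 utilisation factor, 2-second manifest polling, six media requests per viewer per second, 99% hit ratio, $0.05/GB blended egress, and the fixed line items above. `EventInputs` and `size_event` are illustrative names. The budget line items are the article's estimates, not prices from any vendor calculator.

```python
from dataclasses import dataclass, field


@dataclass
class EventInputs:
    viewers: int = 100_000
    avg_mbps: float = 3.0
    duration_s: int = 7_200
    utilisation: float = 0.7           # share of peak bandwidth sustained on average
    manifest_poll_s: float = 2.0
    media_requests_per_s: float = 6.0  # per viewer, as in the example
    cdn_hit_ratio: float = 0.99
    usd_per_gb: float = 0.05           # blended egress price
    fixed_usd: dict = field(default_factory=lambda: {
        "encoding_two_pipelines": 1_500, "origin_two_regions": 800, "tooling": 200})
    revenue_usd: float = 1_000_000


def size_event(e: EventInputs) -> dict:
    peak_gbps = e.viewers * e.avg_mbps / 1_000
    egress_tb = peak_gbps * e.duration_s * e.utilisation / 8 / 1_000  # Gb -> GB -> TB
    manifest_reqs = e.viewers * e.duration_s / e.manifest_poll_s
    media_reqs = e.viewers * e.media_requests_per_s * e.duration_s
    miss = 1 - e.cdn_hit_ratio
    origin_reqs = (manifest_reqs + media_reqs) * miss
    egress_usd = egress_tb * 1_000 * e.usd_per_gb   # decimal TB -> GB
    total_usd = egress_usd + sum(e.fixed_usd.values())
    return {
        "peak_gbps": peak_gbps,
        "egress_tb": egress_tb,
        "manifest_requests": manifest_reqs,
        "media_requests": media_reqs,
        "origin_rps": origin_reqs / e.duration_s,
        "origin_peak_gbps": peak_gbps * miss,
        "egress_usd": egress_usd,
        "total_usd": total_usd,
        "revenue_per_min": e.revenue_usd / (e.duration_s / 60),
    }


for name, value in size_event(EventInputs()).items():
    print(f"{name:>18}: {value:,.1f}")
```

Changing one field, for example `cdn_hit_ratio=0.95`, shows how quickly the origin request rate grows when caching degrades.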

Common pitfall: building one origin and "adding redundancy later"

The single most expensive architectural mistake in live origins is shipping a single-region single-origin deployment, growing the audience, and trying to retrofit redundancy after the first outage. The retrofit is much harder than the green-field build, for two reasons.

First, the encoders must produce identical, time-aligned outputs for the failover to be transparent. Two encoders that drift by 200 milliseconds will produce manifests with different segment boundary times, and the player will see a discontinuity when the content delivery network switches between origins — a black frame, a brief rebuffer, an audio pop. Fixing this means re-architecting the encoder side to use a common timing source (GPS-locked, PTP, or the CMAF ingest specification's timeline anchoring), which usually means upgrading or replacing the encoder. Doing this work under outage pressure, on a system that is already in production, is far worse than building it in from day one.
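A pre-event check can catch the drift before it becomes a visible seam. Below is a minimal sketch. It assumes you can collect each encoder's segment start times in a common millisecond time base, for example by parsing the two origins' manifests. The default tolerance is zero because the failover depends on both encoders producing the same segment boundary times.

```python
def boundary_offsets_ms(a: list[int], b: list[int]) -> list[int]:
    """Pairwise offsets between two encoders' segment start times (ms)."""
    if len(a) != len(b):
        raise ValueError("compare the same number of segments")
    return [y - x for x, y in zip(a, b)]


def is_failover_safe(a: list[int], b: list[int], tolerance_ms: int = 0) -> bool:
    # Any non-zero tolerance is an assumption; identical boundaries are the goal.
    return all(abs(d) <= tolerance_ms for d in boundary_offsets_ms(a, b))


primary = [0, 2_000, 4_000, 6_000]
locked = [0, 2_000, 4_000, 6_000]
drifted = [200, 2_200, 4_200, 6_200]   # the 200 ms drift described above
print(is_failover_safe(primary, locked))   # True
print(is_failover_safe(primary, drifted))  # False
```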

Second, the content delivery network and the manifest must be configured to support origin failover from the start. Several content delivery networks support multi-origin configurations natively; some require custom origin steering logic; some require Lambda@Edge or a similar compute-at-the-edge function to inspect the origin's response codes and choose between origins. Each of these is a project of its own. If your initial deployment chose a content delivery network without multi-origin support, your retrofit might include changing content delivery network too — and content delivery network migrations are large, multi-month projects in their own right.

The lesson: if the stream you are building will ever matter enough to need cross-region redundancy, build the timing discipline, the manifest discipline, and the content delivery network discipline in from day one — even if you only deploy a single origin until the audience justifies the second region. The cost of building those disciplines into the encoder, packager, and origin from the start is small. The cost of retrofitting them is enormous.

A second common pitfall is trusting a single cloud region's availability zone model to substitute for cross-region redundancy. Multi-AZ deployments inside a single region — three packager nodes spread across three availability zones in us-east-1, for example — are valuable and standard. But the production-grade outages that take a streaming product down are usually region-wide control-plane failures, regional networking incidents, or DNS-level events. Multi-AZ does not protect against any of those. Cross-region does. If the threat model includes "the region went down for two hours", a multi-AZ deployment does not solve it; only a multi-region deployment does.

A third pitfall is assuming the content delivery network's caching layer fully insulates the origin from request volume. It usually does, with three common exceptions. A manifest cache miss can cascade into a manifest stampede at the start of a peak event. A rare segment encryption-key event can invalidate a large portion of the cache. A configuration change can accidentally lower the manifest's cache TTL from 2 seconds to 0, so the origin sees 100% of requests for a minute before someone notices. Origins must be sized for the cache-misses-during-incident case, not the cache-hits-during-steady-state case. The rule of thumb is to size the origin for at least 10x its steady-state request rate, so a moderate cache failure does not topple it. At a 99% steady-state hit ratio, 10x headroom absorbs a drop to 90%. A complete bypass, such as the zero-TTL case, multiplies origin load by 100.
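Origin load is proportional to the miss ratio, so the load multiplier during an incident is the ratio of the two miss ratios. Below is a minimal sketch with illustrative hit ratios.

```python
def origin_load_multiplier(steady_hit_ratio: float, degraded_hit_ratio: float) -> float:
    """How many times steady-state origin load arrives when the hit ratio drops."""
    if not 0 <= degraded_hit_ratio <= steady_hit_ratio < 1:
        raise ValueError("expect 0 <= degraded <= steady < 1")
    return (1 - degraded_hit_ratio) / (1 - steady_hit_ratio)


for degraded in (0.99, 0.95, 0.90, 0.0):
    print(degraded, round(origin_load_multiplier(0.99, degraded), 1))
```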

Comparison table

Shape	Audience	Recovery time	Typical cost ($/mo)	Failure tolerance
1. Single-server	< 500 viewers	Minutes to hours (manual)	$50–$300	None
2. Packager + origin pair	< 50,000 viewers	Seconds (automatic)	$400–$1,500	Single host failure
3. JIT origin cluster	< 500,000 viewers, many channels	Sub-second (automatic)	$1,500–$4,000 + per-channel	Single availability-zone failure
4. Cross-region replicated	Any audience, broadcast-critical	Sub-second to seconds (automatic)	$4,000–$25,000+	Single-region cloud outage

The progression is not a ladder you must climb in order. Pick the shape that your audience size or your reliability requirement demands, whichever calls for more, and check it against your operational budget. A 200-viewer internal stream that the CEO will hate to see fail might justify shape 2, even though that is over-engineered for the audience size. A 50,000-viewer non-critical influencer broadcast might do fine on shape 3 and skip shape 4, because its revenue per minute does not justify the second region.

How AWS Elemental MediaPackage's two-pipeline ingest actually works in production

The question I am asked most often on this topic is: "How does the two-pipeline ingest in MediaPackage actually pick which input is authoritative?" Four behaviours matter in production, and each is worth a paragraph.

Active and passive ingest. When you configure a MediaPackage channel with two ingest endpoints, each endpoint accepts a CMAF ingest POST stream from its own encoder, and both streams arrive simultaneously. The service designates one as the active source (by default the one that started transmitting first) and the other as passive. Both streams are continuously consumed and validated, but only the active one is used to produce manifests and segments downstream. The passive stream is held in a buffer, ready to take over.

Health-based failover. MediaPackage continuously monitors the active stream for missing segments, stale segments (segments that are too old relative to the manifest's expected timeline), and HTTP-level errors from the encoder side. If any of those signal a problem, the service automatically promotes the passive stream to active and demotes the previously-active one to passive. The promotion is fast — sub-second in well-tuned configurations — because the passive stream is already validated and buffered.
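The selection logic can be sketched as a small state machine. This is a toy, not MediaPackage's implementation. The `stale_after_s` threshold, the first-to-deliver rule, and the promote-the-freshest-passive rule are assumptions chosen to mirror the behaviour described above.

```python
from typing import Optional


class RedundantIngest:
    """Toy active/passive selector for two (or more) ingest inputs."""

    def __init__(self, stale_after_s: float):
        self.stale_after_s = stale_after_s   # assumption: tune to segment length
        self.last_seen: dict[str, float] = {}
        self.active: Optional[str] = None

    def on_segment(self, input_id: str, now: float) -> None:
        """Record a segment arrival from one encoder's ingest stream."""
        self.last_seen[input_id] = now
        if self.active is None:
            self.active = input_id   # first input to deliver becomes active

    def current(self, now: float) -> Optional[str]:
        """Return the input to package from, failing over if the active one is stale."""
        if self.active is None:
            return None
        if now - self.last_seen[self.active] <= self.stale_after_s:
            return self.active
        fresh = [i for i, t in self.last_seen.items()
                 if i != self.active and now - t <= self.stale_after_s]
        if fresh:
            # Promote the passive input that delivered most recently.
            self.active = max(fresh, key=lambda i: self.last_seen[i])
        return self.active   # may still be stale if no passive input is healthy


ingest = RedundantIngest(stale_after_s=3.0)
for t in (0.0, 2.0, 4.0):
    ingest.on_segment("encoder-a", t)
    ingest.on_segment("encoder-b", t)
ingest.on_segment("encoder-b", 6.0)   # encoder-a goes silent after t=4
print(ingest.current(6.0))  # encoder-a (last segment 2 s ago)
print(ingest.current(8.0))  # encoder-b (encoder-a is 4 s stale)
```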

Timeline anchoring. For the failover to be seamless at the player level, both encoders must produce time-aligned CMAF tracks — the same segment boundary times, the same time base. The DASH-IF Live Media Ingest Protocol specification version 1.2 documents the timeline-anchoring rules and the conditions under which redundant ingest sources can be considered equivalent. In practice, this means both encoders must be locked to a common time source (GPS, PTP, or NTP at minimum) and configured with identical segment lengths and identical GOP structures.
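Timeline anchoring comes down to computing segment boundaries from a shared clock rather than from each encoder's start time. Below is a minimal sketch of that arithmetic. It assumes a Unix-epoch anchor and 2-second segments, both hypothetical here. It does not implement the DASH-IF specification's own anchoring rules. Each encoder would start its first segment, with an IDR frame, at the returned time.

```python
def next_boundary_ms(now_ms: int, epoch_ms: int, segment_ms: int) -> tuple[int, int]:
    """Return (segment_number, start_time_ms) of the next segment boundary.

    Boundaries sit at epoch + n * segment_ms, so any encoder that agrees on
    the epoch, the segment length, and the clock picks the same boundaries
    regardless of when it started.
    """
    n = -(-(now_ms - epoch_ms) // segment_ms)   # ceiling division
    return n, epoch_ms + n * segment_ms


EPOCH_MS = 0        # assumption: Unix epoch as the shared anchor
SEGMENT_MS = 2_000  # assumption: 2 s segments on both encoders

# Encoder A starts at 10.003 s, encoder B at 10.517 s on the same locked clock.
print(next_boundary_ms(10_003, EPOCH_MS, SEGMENT_MS))  # (6, 12000)
print(next_boundary_ms(10_517, EPOCH_MS, SEGMENT_MS))  # (6, 12000)
```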

Cross-region as the next layer. Cross-region failover, launched in MediaPackage v2 in June 2024, extends the same pattern across regions. A customer runs two complete MediaPackage v2 deployments — one in a primary region, one in a secondary region — each with its own pair of ingest endpoints fed by its own encoder pair. The content delivery network in front is configured to recognise the force-endpoint-error HTTP signal that MediaPackage emits when the primary deployment cannot serve a request (because both of its ingests have failed, or because the region itself is impaired), and to switch to the secondary region's MediaPackage deployment as the origin. The player URL never changes; only the origin behind it does.

A practical operational detail: cross-region failover assumes both deployments are configured to be active-active, not active-passive. Both regions consume their encoder pairs continuously, both regions produce manifests, both regions are ready to serve traffic. The content delivery network's choice of which one to fetch from is the deciding layer. This costs more than active-passive (you pay for two complete pipelines instead of one-and-a-quarter), but it is the only way to make the failover truly sub-second — an active-passive design has to spin up the secondary region after the primary fails, and that spin-up time is exactly the outage window you are trying to eliminate.

A diagram showing two AWS Elemental MediaPackage deployments in two regions, each with two ingest endpoints fed by two encoders, with a content delivery network in front using force-endpoint-error signals to switch between regions

Figure 3. AWS Elemental MediaPackage v2 cross-region failover — two layers of redundancy, transparent to the player.

Where Fora Soft fits in

At Fora Soft we have built more than 239 video products since 2005, and the live-origin question is one we work through with most of them. Our typical engagement on this topic is not to operate the origin ourselves but to help the team choose the right shape — sizing the architecture against the realistic audience and reliability needs, selecting between Wowza, Nimble, Unified Origin, MediaPackage, Mux, and the open-source alternatives, designing the multi-region failover plan when one is justified, and writing the runbook the on-call rotation will use during the event. We have shipped streaming-side architectures for OTT, video conferencing, telemedicine, e-learning, surveillance, and AR/VR products, and the four-shape map in this article is the conversation we open with on every live engagement.

Talk to a streaming engineer: work with our team to choose the right origin shape for the product you are building.
See our case studies: real-world live and on-demand systems we have built across OTT, telemedicine, conferencing, and surveillance.
Download the live-origin sizing checklist: a one-page PDF you can take into the architecture review meeting.

References

Netflix Technology Blog — "Netflix Live Origin", Xiaomei Liu, Joseph Lynch, Chris Newton (netflixtechblog.com, December 2025). Source for the custom Live Origin design, EVCache + Cassandra write-through cache architecture, 200+ Gbps read throughput, chunked-storage protocol.
DASH-IF — "DASH-IF Technical Specification: Live Media Ingest Protocol, version 1.2" (dashif.org/Ingest/, published 2024-02-28). Source for Interface 1 (CMAF ingest) and Interface 2 (HLS/DASH ingest) definitions, HTTP POST mechanism, redundancy and failover guidelines, timeline anchoring rules.
DASH-IF news — "DASH-IF Publishes Live Media Ingest Technical Specification" and "DASH-IF publishes Live Media Ingest version 1.2" (dashif.org/news/, 2024). Source for the publication dates and version history.
AWS — "Cross-region failover now available in AWS Elemental MediaPackage" (aws.amazon.com/about-aws/whats-new/2024/06/cross-region-failover-aws-elemental-mediapackage/, June 2024). Source for the launch date of cross-region failover in MediaPackage v2 and the supported configuration shape.
AWS Elemental MediaPackage v2 User Guide — "Working with cross-Region failover in AWS Elemental MediaPackage" and "Live input redundancy processing flow" (docs.aws.amazon.com/mediapackage/, accessed 2026-05-24). Source for active/passive ingest behaviour, force-endpoint-error configuration, and CMAF Ingest (Interface 1) support for CDN failover.
AWS for M&E Blog — "How to set up a resilient end-to-end live workflow using AWS Elemental products and services" Parts 1–4 (aws.amazon.com/blogs/media/, accessed 2026-05-24). Source for the practical resilient-workflow architecture, dual-encoder synchronisation, and content-delivery-network failover patterns.
AWS Labs — "aws-clustered-video-streams" reference architecture (github.com/awslabs/aws-clustered-video-streams, accessed 2026-05-24). Source for the regional-failover clustered-stream pattern, infrastructure-as-code packaging of multi-region origin failover.
Unified Streaming — "Unified Origin LIVE" documentation (docs.unified-streaming.com/documentation/live/, accessed 2026-05-24). Source for the just-in-time packaging model, supported output formats (HLS, DASH, MSS, HDS), and nginx-module architecture of Unified Origin.
Unified Streaming — "Live Media Ingest (CMAF)" blog and CMAF ingest demo (unified-streaming.com/blog/live-media-ingest-cmaf, accessed 2026-05-24). Source for the CMAF Ingest Interface 1 architecture in a JIT origin context.
Wowza — "Configure a live stream repeater in Wowza Streaming Engine" and "Architecting a High-Performance Video Streaming Server" (wowza.com/docs/, wowza.com/blog/, accessed 2026-05-24). Source for the single-server origin and origin-edge repeater patterns at the small-scale end.
Softvelum — "Deploying Nimble Streamer in multi-tier streaming architectures" (softvelum.com/2025/08/multi-tier-streaming-architecture/, August 2025). Source for the Wowza-origin-plus-Nimble-edge pattern and multi-tier deployment economics.
ACM MMSys 2022 — "Nagare media ingest: a server for live CMAF ingest workflows" (dl.acm.org/doi/10.1145/3524273.3532888). Academic implementation reference for an open-source CMAF ingest server. Useful as an architectural reference even where production teams use commercial alternatives.

Related glossary terms

Tiered caching

Origin server

Cache hit ratio (CHR)

Contribution

Segment

Live streaming

EXT-X-DISCONTINUITY

Shaka Packager