Why this matters
If you are a founder, product manager, or first-time streaming CTO, "we'll add live later" is one of the most expensive sentences you can say without understanding what changes. Live is not a feature you bolt onto a VOD platform; it is a second pipeline with its own real-time transcoding, its own contribution path, and a failure mode — you cannot re-run a broadcast — that VOD never has. This article gives you the mental model to see which parts you genuinely build twice and which parts you build once and share. By the end you will be able to read a "live + VOD" proposal and tell whether the vendor is reusing the shared spine or quietly billing you for two platforms.
Two products that share a building
Start with the cleanest way to hold both ideas at once. VOD (video on demand) is content that already exists as a file — a movie, an episode, a recorded class — that a viewer can start at any time. Live is content that is happening now — a match, a concert, a webinar — that a viewer joins in progress. The difference sounds obvious, but it reaches all the way down into the engineering, because one pipeline gets to work ahead of time and the other has to keep up with reality.
Think of it as one building with two production lines. The VOD line is a print shop: a finished manuscript arrives, you typeset it carefully overnight, and you run as many copies as demand requires, whenever orders come in. The live line is a live television gallery: the show is happening in front of you, you have one pass to get it right, and the broadcast goes out as it is made. Same building, same loading dock, same delivery trucks — but the two lines have different machines at the front and different stakes when something jams.
Figure 1. Two pipelines, one platform. The front halves differ; the back half — package, protect, deliver, play, measure — is shared.
We will walk the VOD line first because it is the simpler of the two, then the live line, then the latency budget that makes live hard, then the shared back half that lets one platform serve both.
The VOD pipeline: encode once, store, serve
A VOD pipeline has the luxury of time. The source video — the high-quality master, called the mezzanine, the pristine copy you re-encode from and never show a viewer directly — arrives as a file and goes into object storage. Nothing is racing a clock. Because of that, every later step can be optimised for quality and cost rather than speed.
The transcoder builds the encoding ladder — the set of quality versions at different resolutions and bitrates that a player switches between as the network changes — and it can take its time. It can run per-title encoding, analysing each piece of content and tuning the ladder to it, because a film that will be watched a million times is worth a few extra minutes of compute to encode well. The transcode is a batch job: spin up machines, grind through the catalogue, spin them down. You pay for compute only while you encode, and you can run a thousand titles in parallel overnight.
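To make the ladder concrete, here is a minimal Python sketch of a four-rung ladder and the simplest possible switching rule: pick the highest rung whose bitrate fits inside a safety margin of measured throughput. The rung values, the `pick_rung` name, and the 0.8 safety factor are illustrative assumptions. Per-title encoding would produce different rungs for each title, and production players use more elaborate adaptation logic than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    """One rendition in the encoding ladder."""
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target video bitrate


# Hypothetical four-rung ladder; per-title encoding would tune these per title.
LADDER = [
    Rung(1080, 5000),
    Rung(720, 3000),
    Rung(480, 1500),
    Rung(360, 800),
]


def pick_rung(ladder: list[Rung], measured_kbps: float, safety: float = 0.8) -> Rung:
    """Return the highest-bitrate rung that fits within safety x measured throughput.

    Falls back to the lowest rung when nothing fits.
    """
    budget = measured_kbps * safety
    fitting = [r for r in ladder if r.bitrate_kbps <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_kbps)
    return min(ladder, key=lambda r: r.bitrate_kbps)


print(pick_rung(LADDER, 4200).height)  # budget 3360 kbps -> 720
print(pick_rung(LADDER, 500).height)   # budget 400 kbps, nothing fits -> 360
```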
Once encoded and packaged, the content sits on the origin and is served on demand. The same files answer a viewer in January and a viewer in December. This is why VOD delivery is so cache-friendly: a popular title is requested over and over, so the content delivery network's edge servers keep it close to viewers and rarely bother the origin. A high cache-hit ratio — the share of requests the edge serves from its own cache — is the norm for VOD, and a high cache-hit ratio is what keeps the delivery bill reasonable.
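The reason the cache-hit ratio dominates the delivery bill is simple arithmetic: every request the edge cannot answer becomes an origin request. The sketch below, with an illustrative request count (not measured data), shows that moving from a 98% to a 99% hit ratio halves the traffic reaching the origin.

```python
def cache_hit_ratio(edge_hits: int, total_requests: int) -> float:
    """Share of requests the edge served from its own cache."""
    if total_requests == 0:
        return 0.0
    return edge_hits / total_requests


def origin_requests(total_requests: int, hit_ratio: float) -> int:
    """Requests that miss the edge and travel back to the origin."""
    return round(total_requests * (1.0 - hit_ratio))


# Illustrative volume only: 10 million segment requests in a day.
requests = 10_000_000
for ratio in (0.90, 0.98, 0.99):
    print(f"hit ratio {ratio:.0%}: {origin_requests(requests, ratio):,} origin requests")
```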
The defining property of VOD: encode once, store, serve many times. Time is on your side, quality is the priority, and the work is done before the first viewer arrives.
The live pipeline: encode in real time, no second chance
A live pipeline inverts every one of those luxuries. There is no file waiting in storage; there is a feed arriving right now, and the pipeline has to encode, package, and deliver it while it is still happening. Two properties define the live line, and both of them cost money and engineering.
First, the transcode runs in real time and never stops during the event. You cannot batch a live encode — the machines must be running and sized for the broadcast before it starts, and they stay hot for its full duration. Where VOD compute is elastic and pay-as-you-encode, live compute is an always-on fleet provisioned for the show. That changes the cost shape from "cheap per title, paid once" to "a running meter for the length of the event, sized for the peak."
Second, there is no second chance. A frame dropped on the way in is gone; a packaging error during a cup final cannot be fixed in post. This single fact drives a whole discipline of redundancy: backup contribution feeds, redundant encoders running in parallel, failover origins, and multi-CDN delivery so a single content-delivery-network brownout does not black out the event. VOD can tolerate a re-encode and a retry; live treats every component as something that must have a hot spare.
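One way to reason about "a hot spare for everything" is as a priority-ordered list of paths per stage: traffic uses the first healthy path, and any stage with only one path is a single point of failure. The sketch below is a simplified model under that assumption; the stage and component names are hypothetical, and real systems detect health through monitoring and switch in the encoder, packager, or CDN-steering layer rather than in a Python loop.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One redundant component: a contribution feed, an encoder, an origin, or a CDN."""
    name: str
    healthy: bool = True


def active_path(paths: list[Path]) -> Path:
    """Return the highest-priority healthy path; fail loudly if every spare is down."""
    for path in paths:
        if path.healthy:
            return path
    raise RuntimeError("no healthy path left: the broadcast is down")


def single_points_of_failure(plan: dict[str, list[Path]]) -> list[str]:
    """Stages that have no hot spare at all."""
    return [stage for stage, paths in plan.items() if len(paths) < 2]


# Hypothetical redundancy plan for one live event, in priority order per stage.
plan = {
    "contribution": [Path("primary-srt-feed"), Path("backup-srt-feed")],
    "encoder": [Path("encoder-a"), Path("encoder-b")],
    "origin": [Path("origin-1")],
    "cdn": [Path("cdn-1"), Path("cdn-2")],
}

print(single_points_of_failure(plan))  # ['origin']
plan["cdn"][0].healthy = False          # simulate a CDN brownout mid-event
print(active_path(plan["cdn"]).name)    # cdn-2
```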
How live video arrives: the contribution path
Before the live pipeline can do anything, the video has to get in. The link that carries video to your platform is the contribution path (as opposed to distribution, which carries it onward to viewers), and it speaks a contribution protocol. Three matter in 2026, and a professional live platform supports all three because they trade latency against reach and ruggedness:
- RTMP (Real-Time Messaging Protocol) — old, runs over TCP, and supported by essentially every encoder and tool (OBS, vMix, hardware encoders). It recovers from packet loss by retransmitting the TCP stream, which causes latency spikes on a congested link, but it remains the most common studio contribution path.
- SRT (Secure Reliable Transport) — the modern broadcast-grade choice. It runs over UDP, recovers from packet loss with selective retransmission (holding up through roughly 25% loss at a one-second buffer), and carries built-in AES-256 encryption. SRT is the 2026 default for field contribution over the public internet and cellular links.
- WHIP (WebRTC-HTTP Ingestion Protocol) — the lowest-latency door, 200–500 milliseconds, ideal for browser-based input and interactive formats where the host is in a tab.
The internals of these protocols belong to our Video Streaming section; the point for this map is that live video must arrive before any other box can run, and the ingest endpoint needs a backup feed because a single dropped contribution link otherwise ends the broadcast.
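The trade-offs in the list above can be condensed into a rule of thumb. This sketch is a deliberate simplification under the assumptions stated in its comments; the function name and flags are hypothetical, and a real platform would expose all three ingest doors and let the producer choose.

```python
from enum import Enum


class Ingest(Enum):
    RTMP = "rtmp"
    SRT = "srt"
    WHIP = "whip"


def choose_ingest(source_in_browser: bool, needs_sub_second: bool,
                  lossy_public_link: bool) -> Ingest:
    """Rule-of-thumb contribution protocol choice (a simplification)."""
    if source_in_browser or needs_sub_second:
        return Ingest.WHIP  # WebRTC ingest, roughly 200-500 ms
    if lossy_public_link:
        return Ingest.SRT   # UDP with selective retransmission and AES encryption
    return Ingest.RTMP      # universal encoder support on a clean studio link


print(choose_ingest(source_in_browser=False, needs_sub_second=False,
                    lossy_public_link=True).value)  # srt
```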
The latency budget: why live is hard and VOD is not
For VOD, latency barely matters — a second or two to start playback is fine, because the content is not going anywhere. For live, latency is the product, and it is worth showing where the seconds go.
Glass-to-glass latency is the delay from light hitting the camera lens to the picture appearing on the viewer's screen. It is the sum of every stage: capture and contribution, real-time encoding, packaging into segments, the trip across the CDN, and the player's own buffer. The biggest single lever is segment duration — how long each downloadable chunk of video is — because a player typically waits to buffer two or three segments before it starts.
Walk the arithmetic once, because it explains every "why is my stream 30 seconds behind?" complaint:
Traditional HLS, 6-second segments, 3-segment player buffer:
player buffer = 3 × 6 s = 18 s
+ encode/package ≈ 2–4 s
+ CDN + network ≈ 1–2 s
glass-to-glass ≈ 21–24 s (the 18 s buffer dominates everything else)
That 18-second buffer is why naive HLS feels so far behind real time. The fix is to shrink the chunk the player waits for. Low-Latency HLS (LL-HLS), defined in Apple's HLS Authoring Specification (the low-latency additions, current revision September 2025), cuts the wait by splitting each segment into partial segments of roughly 0.2–0.5 seconds (the `EXT-X-PART` tag), so the player fetches in-progress media instead of waiting for a whole 6-second block. With preload hints and blocking playlist reloads, LL-HLS reaches 2–5 seconds glass-to-glass at CDN scale. Its DASH equivalent, Low-Latency DASH (LL-DASH), uses chunked CMAF to hit the same range.
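The budget is a plain sum, so it is easy to model. The sketch below reproduces the traditional-HLS figures from the text: the buffer alone is 18 s, and the full sum lands at 21–24 s. The LL-HLS inputs (three 0.5 s partial segments held by the player, sub-second encode/package and delivery) are assumed values chosen for illustration, not measurements.

```python
def glass_to_glass(chunk_s: float, chunks_buffered: int,
                   encode_package_s: tuple[float, float],
                   network_s: tuple[float, float]) -> tuple[float, float]:
    """Return a (low, high) glass-to-glass estimate in seconds.

    The player waits for `chunks_buffered` chunks of `chunk_s` seconds each;
    encode/package and CDN/network delays are given as (low, high) ranges.
    """
    buffer_s = chunks_buffered * chunk_s
    low = buffer_s + encode_package_s[0] + network_s[0]
    high = buffer_s + encode_package_s[1] + network_s[1]
    return low, high


# Traditional HLS, using the figures from the text.
print(glass_to_glass(6.0, 3, (2, 4), (1, 2)))          # (21.0, 24.0)
# LL-HLS with assumed inputs: 3 partial segments of 0.5 s, faster stages.
print(glass_to_glass(0.5, 3, (0.5, 1.0), (0.5, 1.0)))  # (2.5, 3.5)
```

Shrinking the chunk the player waits for moves the dominant term; nothing else in the sum comes close.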
Below that sits real-time communication. WebRTC delivers sub-500-millisecond glass-to-glass, but it trades CDN-scale economics and broad device reach for that speed, so it is the right tool for genuine interactivity (auctions, betting, two-way calls) and the wrong tool for a million-viewer broadcast.
Figure 2. The live latency spectrum. Lower latency narrows device reach and raises delivery complexity; pick the slowest tier your use case can tolerate.
The discipline is to pick the slowest latency tier your use case can actually tolerate, because each step down costs reach, money, or both. A film-festival watch-along is fine at 18 seconds. A live sports book needs sub-second and pays for it. VOD never has this conversation at all.
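"Pick the slowest tier you can tolerate" is a first-fit search over tiers ordered from slowest to fastest. In this sketch the worst-case bounds are rough, conservative readings of the ranges quoted in this article (an assumption, not a guarantee), which is why a tolerance must clear the upper end of a tier's range before that tier is chosen.

```python
# (tier name, approximate worst-case glass-to-glass seconds), slowest first.
# Bounds are rough readings of the ranges in this article, not guarantees.
TIERS = [
    ("traditional HLS/DASH", 24.0),
    ("LL-HLS / LL-DASH", 5.0),
    ("WebRTC", 0.5),
]


def pick_latency_tier(tolerable_s: float) -> str:
    """Return the slowest tier whose worst-case latency still fits the tolerance."""
    for name, worst_case_s in TIERS:
        if worst_case_s <= tolerable_s:
            return name
    raise ValueError(f"no tier delivers {tolerable_s} s glass-to-glass")


print(pick_latency_tier(30))   # relaxed watch-along -> traditional HLS/DASH
print(pick_latency_tier(8))    # -> LL-HLS / LL-DASH
print(pick_latency_tier(0.8))  # live sports book -> WebRTC
```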
Where they diverge: scaling and failover
The two pipelines also scale differently, and confusing the two scaling problems is a classic and costly mistake.
VOD traffic is predictable and spread out. A catalogue's demand rises and falls with releases and time of day, but it rarely all lands in the same two seconds. You scale VOD by keeping the cache-hit ratio high and letting the CDN absorb the long tail.
Live traffic is spiky and synchronised. When a match kicks off, a hundred thousand viewers request the newest segment in the same instant — the thundering herd. Every request the edge cannot serve from cache, a cache miss, travels back to the origin, so a synchronised live start can flood an unshielded origin and take the event down for everyone at once. The defences are origin shielding (a mid-tier cache that collapses many edge misses into one origin request), pre-warming the edge before a scheduled start, and surge capacity planned for the peak, not the average. These get full treatment in our delivery block; the headline is that a live premiere is a denial-of-service attack you schedule against your own origin, and you plan for it or it happens to you.
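A toy simulation shows why the shield matters. It assumes every edge PoP collapses concurrent misses perfectly, so an unshielded origin sees one request per PoP for each new segment, and a shielded origin sees one request in total. The PoP count, viewer count, and segment name are hypothetical; real edges under a synchronised start may forward more than one miss each, which only makes the unshielded case worse. The cost repeats for every new segment, every few seconds, for the whole event.

```python
from collections.abc import Callable


class Origin:
    """Counts how many requests actually reach the origin."""

    def __init__(self) -> None:
        self.requests = 0

    def fetch(self, key: str) -> bytes:
        self.requests += 1
        return f"media for {key}".encode()


class CollapsingCache:
    """A cache tier that sends only the first miss per key upstream.

    Serving requests one at a time models perfect request collapsing:
    every later request for the same key is a hit.
    """

    def __init__(self, upstream: Callable[[str], bytes]) -> None:
        self.upstream = upstream
        self.store: dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        if key not in self.store:
            self.store[key] = self.upstream(key)
        return self.store[key]


def origin_load(viewers: int, edge_pops: int, shielded: bool) -> int:
    """Origin requests when every viewer asks for the same new live segment."""
    origin = Origin()
    parent = CollapsingCache(origin.fetch).get if shielded else origin.fetch
    pops = [CollapsingCache(parent) for _ in range(edge_pops)]
    for viewer in range(viewers):
        pops[viewer % edge_pops].get("segment-1001.m4s")
    return origin.requests


print(origin_load(100_000, 200, shielded=False))  # 200: one miss per edge PoP
print(origin_load(100_000, 200, shielded=True))   # 1: the shield collapses them
```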
Where they converge: the shared spine
Here is the good news that makes "one platform, both pipelines" realistic. Once the video is encoded and chunked, the live and VOD lines hand off to the same back half. You do not build delivery, protection, players, or analytics twice.
- Packaging. Both pipelines package into CMAF (Common Media Application Format, ISO/IEC 23000-19): one fragmented-MP4 segment format that both an HLS `.m3u8` and a DASH `.mpd` manifest can address. Package once, serve every device, whether the source was a file or a feed.
- Content protection. Both use the same Common Encryption (CENC, ISO/IEC 23001-7) step: encrypt the segments once with the `cbcs` scheme, then issue Widevine, PlayReady, and FairPlay licences from those same files ("encrypt once, licence many"). Multi-DRM is this section's unique core; see multi-DRM: one workflow, every device.
- Delivery. Both pipelines deliver over the same CDN and origin. The cache behaviour differs (VOD is cache-friendly, live is spiky), but the infrastructure is shared.
- Players. The same player apps on web, mobile, and TV play both. A live stream and a VOD title are both just a manifest pointing at CMAF segments; the player adapts.
- Entitlement and analytics. The same entitlement service decides who may watch (subscription, rights, geo, concurrency) for both, and the same analytics sink records both sessions.
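"Package once, serve every device" means the same CMAF files are addressed by two different manifests. This minimal sketch emits an HLS media playlist and a DASH `SegmentTemplate` element that point at identical segment names. The file names and the 4-second duration are hypothetical, the DASH output is only the relevant fragment of an `.mpd` (not a complete manifest), and a production packager writes far more metadata than this.

```python
import math

INIT_URI = "init.mp4"    # hypothetical CMAF initialization segment
SEGMENT_PREFIX = "seg-"  # hypothetical naming: seg-1.m4s, seg-2.m4s, ...


def hls_media_playlist(first_number: int, count: int, seg_s: float) -> str:
    """HLS media playlist pointing at the CMAF segments (live: no ENDLIST)."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{math.ceil(seg_s)}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_number}",
        f'#EXT-X-MAP:URI="{INIT_URI}"',
    ]
    for n in range(first_number, first_number + count):
        lines += [f"#EXTINF:{seg_s:.3f},", f"{SEGMENT_PREFIX}{n}.m4s"]
    return "\n".join(lines) + "\n"


def dash_segment_template(seg_s: float, timescale: int = 1000) -> str:
    """DASH SegmentTemplate element addressing the same files by $Number$."""
    return (
        f'<SegmentTemplate timescale="{timescale}" '
        f'duration="{round(seg_s * timescale)}" '
        f'initialization="{INIT_URI}" '
        f'media="{SEGMENT_PREFIX}$Number$.m4s" startNumber="1"/>'
    )


print(hls_media_playlist(first_number=1, count=3, seg_s=4.0))
print(dash_segment_template(seg_s=4.0))
```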
Figure 3. The divide is at the front. Ingest, transcoding, and the latency target are pipeline-specific; everything from packaging onward is one shared spine.
The architectural rule of thumb: the pipelines are different up to and including the transcoder, and the same from the packager onward. Build the shared spine once; build two front-ends into it.
Live ads and the live-to-VOD bridge
Two places where the pipelines touch are worth a closer look, because they are where teams underestimate the work.
Live ad insertion rides on SCTE-35, the binary cue standard (recently restructured as SCTE 35-1, with a streamlined SCTE 35-2 augmentation in 2025) that marks the exact frame where an ad break begins and ends inside a live stream. A `splice_insert` or `time_signal` cue tells the ad system "a break starts here, this long," and server-side ad insertion stitches ads into the manifest at that point. VOD ads can be planned against a known timeline; live ad breaks have to be signalled and filled while the event runs. The ad plumbing belongs to monetisation, this section's other unique core; see SCTE-35 and ad signaling.
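The manifest-stitching half of server-side ad insertion can be sketched in a few lines. This assumes the SCTE-35 cue has already been decoded into a start time and duration (the `BreakCue` type is hypothetical), that the encoder cut segment boundaries at the splice points, and that ad decisioning has already returned the ad segments. It swaps the content segments inside the break for ads and brackets them with `#EXT-X-DISCONTINUITY`, one common way to tell an HLS player that timestamps and encoding change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    uri: str
    start_s: float
    duration_s: float


@dataclass(frozen=True)
class BreakCue:
    """Hypothetical, already-decoded view of a SCTE-35 cue: break start and length."""
    start_s: float
    duration_s: float


def stitch(content: list[Segment], cue: BreakCue,
           ads: list[tuple[str, float]]) -> list[str]:
    """Return HLS media-playlist lines with the break's content swapped for ads.

    Assumes the break starts and ends on segment boundaries.
    """
    break_end_s = cue.start_s + cue.duration_s
    lines: list[str] = []
    ads_done = False
    for seg in content:
        if cue.start_s <= seg.start_s < break_end_s:
            if not ads_done:
                lines.append("#EXT-X-DISCONTINUITY")  # switching to ad media
                for uri, dur in ads:
                    lines += [f"#EXTINF:{dur:.3f},", uri]
                lines.append("#EXT-X-DISCONTINUITY")  # switching back to the live feed
                ads_done = True
            continue  # content segment covered by the break is dropped
        lines += [f"#EXTINF:{seg.duration_s:.3f},", seg.uri]
    return lines


content = [Segment(f"live-{i}.m4s", i * 2.0, 2.0) for i in range(6)]
cue = BreakCue(start_s=4.0, duration_s=4.0)
ads = [("ad-a-1.m4s", 2.0), ("ad-a-2.m4s", 2.0)]
print("\n".join(stitch(content, cue, ads)))
```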
Live-to-VOD is the bridge that turns a finished broadcast into an on-demand asset. The same segments that were delivered live are recorded — often with a DVR window that already lets viewers pause and rewind the live stream — and then published as a catch-up or archive title. Because the live cues (SCTE-35 markers) are recorded too, the catch-up version can reuse the same ad break points. Done well, a live event becomes a VOD title minutes after it ends, with no re-encode: the live pipeline's output simply flows into the VOD library.
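Because live-to-VOD reuses the recorded segments, the difference between the live DVR view and the catch-up title can be just the playlist. In this sketch the live playlist is a sliding window over the newest segments with an advancing `#EXT-X-MEDIA-SEQUENCE`, and the catch-up playlist lists every recorded segment and is closed with `#EXT-X-PLAYLIST-TYPE:VOD` and `#EXT-X-ENDLIST`. File names, the 4-second segments, and the 30-minute window are illustrative assumptions; carrying the SCTE-35 break points across (for example as `EXT-X-DATERANGE` tags) is omitted here.

```python
import math

INIT_URI = "init.mp4"  # hypothetical CMAF init segment


def render_playlist(segments: list[tuple[str, float]], media_sequence: int,
                    vod: bool) -> str:
    """Render an HLS media playlist; vod=True marks it complete and immutable."""
    target = math.ceil(max(dur for _, dur in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    if vod:
        lines.append("#EXT-X-PLAYLIST-TYPE:VOD")
    lines.append(f'#EXT-X-MAP:URI="{INIT_URI}"')
    for uri, dur in segments:
        lines += [f"#EXTINF:{dur:.3f},", uri]
    if vod:
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


def dvr_window(recorded: list[tuple[str, float]],
               window_s: float) -> tuple[list[tuple[str, float]], int]:
    """Newest segments that fit in the DVR window, plus the first one's sequence number."""
    kept: list[tuple[str, float]] = []
    total = 0.0
    for uri, dur in reversed(recorded):
        if total + dur > window_s:
            break
        kept.append((uri, dur))
        total += dur
    kept.reverse()
    return kept, len(recorded) - len(kept)


# A 2-hour event recorded as 4-second segments (illustrative).
recorded = [(f"seg-{i}.m4s", 4.0) for i in range(1800)]

window, first_seq = dvr_window(recorded, window_s=30 * 60)  # 30-minute DVR window
live_view = render_playlist(window, first_seq, vod=False)
catch_up = render_playlist(recorded, 0, vod=True)  # same files, no re-encode
```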
A common mistake: provisioning live like VOD
The single most expensive live mistake we see is sizing live compute and delivery like VOD. A team that has shipped VOD assumes live is "the same but streaming," provisions an average-sized transcode fleet and a single CDN, and then the first real event arrives: the encoder cannot keep up with the real-time feed, the unshielded origin melts under the synchronised start, and there is no second chance to re-run the broadcast. Live must be provisioned for the peak and the spike, with redundant encoders, a shielded origin, multi-CDN failover, and a pre-warmed edge. VOD's elastic, average-sized, retry-friendly mindset does not transfer.
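The gap between average and peak sizing is easy to underestimate until it is written down. The sketch below uses an illustrative event (20,000 average viewers, 100,000 at kickoff, 5 Mbps average bitrate, none of it measured data) and a 1+1 encoder plan in which a full hot-standby set mirrors the primary. It ignores CDN efficiency and ladder mix; it only shows the order of magnitude.

```python
import math


def egress_gbps(concurrent_viewers: int, avg_bitrate_mbps: float) -> float:
    """Delivery bandwidth needed if every concurrent viewer streams at the average bitrate."""
    return concurrent_viewers * avg_bitrate_mbps / 1000


def live_encoders(renditions: int, renditions_per_encoder: int,
                  hot_standby: bool = True) -> int:
    """Encoders for one event; a hot standby mirrors the whole primary set (1+1)."""
    primary = math.ceil(renditions / renditions_per_encoder)
    return primary * 2 if hot_standby else primary


print(egress_gbps(20_000, 5.0))   # 100.0 Gbps: what "average" sizing buys
print(egress_gbps(100_000, 5.0))  # 500.0 Gbps: what the kickoff actually needs
print(live_encoders(renditions=4, renditions_per_encoder=2))  # 4
```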
A worked cost comparison
The cost shapes differ enough to be worth a side-by-side. Take a simple case: a 100-hour VOD catalogue versus a single 2-hour live event, both at a four-rung ladder.
| Dimension | VOD (100 h catalogue) | Live (one 2 h event) |
|---|---|---|
| Transcode | Batch, ~$0.015/output min, paid once | Real-time fleet, provisioned for 2 h, sized for peak |
| Transcode math | 100 h × 60 × 4 × $0.015 ≈ $360, one time | Always-on encoders billed for the full 2 h window |
| Storage | Mezzanine + renditions, kept forever | Segments recorded only if you keep a DVR/archive |
| Delivery cache | High cache-hit ratio (cache-friendly) | Spiky, synchronised; needs origin shield + surge |
| Failure tolerance | Re-encode and retry are fine | No second chance; needs hot spares everywhere |
| Latency target | A second or two is fine | 18 s naive, 2–5 s LL-HLS, sub-second WebRTC |
The numbers are illustrative (cloud transcode and egress list rates move; confirm at build time), but the shape is the lesson: VOD cost is dominated by storage and steady delivery, live cost is dominated by always-on real-time compute and the redundancy the no-second-chance rule demands.
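The two cost shapes can be written as two formulas. The VOD function reproduces the $360 figure from the table. The live function is a running meter over every encoder for the event plus a warm-up period; the warm-up length and the per-encoder-hour rate are hypothetical placeholders, not quoted prices, so substitute your provider's numbers.

```python
def vod_transcode_usd(catalogue_hours: float, rungs: int,
                      usd_per_output_min: float) -> float:
    """Batch transcode: paid once per title, per output minute, per rung."""
    return catalogue_hours * 60 * rungs * usd_per_output_min


def live_encode_usd(event_hours: float, warmup_hours: float, encoders: int,
                    usd_per_encoder_hour: float) -> float:
    """Real-time encode: a running meter for every encoder, warm-up included, every event."""
    return (event_hours + warmup_hours) * encoders * usd_per_encoder_hour


print(round(vod_transcode_usd(100, 4, 0.015), 2))  # 360.0, matching the table
# Hypothetical placeholder rate and warm-up; not a real price.
print(live_encode_usd(event_hours=2, warmup_hours=0.5, encoders=4,
                      usd_per_encoder_hour=3.0))
```

The VOD cost stops once the catalogue is encoded; the live cost recurs for every event and grows with redundancy.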
Where Fora Soft fits in
Most streaming products are not purely live or purely VOD — they are a catalogue with live events on top, and the scale requirement is to run both without doubling the platform. Fora Soft has built video streaming, OTT/Internet TV, live broadcasting, video conferencing, and e-learning systems since 2005, across 625+ shipped projects for 400+ clients, which is exactly the experience of designing one shared delivery, protection, and analytics spine that two different front-ends — a batch VOD encoder and a real-time live encoder — both feed. We design the divergence where it belongs (ingest, transcoding, latency budget) and the convergence where it pays (packaging, DRM, CDN, players, entitlement), so a platform earns live without paying for a second copy of itself.
What to read next
- What is an OTT platform, end to end
- The streaming pipeline, box by box
- Scaling and concurrency: from 1,000 to 1,000,000 viewers
Call to action
- Talk to a streaming engineer: book a 30-minute scoping call to talk through your live vs VOD plan.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the Live vs VOD Pipeline Decision Worksheet: a one-page worksheet that walks each pipeline stage and records whether you build it live-specific, VOD-specific, or shared across both.
References
- IETF RFC 8216 — HTTP Live Streaming (HLS). IETF, 2017 (protocol version 7; second edition in draft). Manifest and media-segment format underlying both VOD and live HLS delivery. https://www.rfc-editor.org/rfc/rfc8216 — Tier 1.
- HTTP Live Streaming Authoring Specification (Low-Latency HLS additions). Apple, current revision 2025-09. Defines partial segments (`EXT-X-PART`, 0.2–0.5 s), preload hints, and blocking playlist reload for 2–5 s live latency. https://developer.apple.com/documentation/http-live-streaming — Tier 3 (standards author).
- ISO/IEC 23009-1 — Dynamic Adaptive Streaming over HTTP (MPEG-DASH). ISO/IEC, current fifth edition (2022). The `.mpd` manifest and live profile; basis for LL-DASH with chunked CMAF. https://www.iso.org/standard/83314.html — Tier 1.
- ISO/IEC 23000-19 — Common Media Application Format (CMAF). ISO/IEC, 2017. Single fragmented-MP4 segment format both pipelines package into. https://www.iso.org/standard/85623.html — Tier 1.
- ISO/IEC 23001-7 — Common Encryption (CENC). ISO/IEC. Defines the `cenc` (AES-CTR) and `cbcs` (AES-CBC pattern) schemes; basis for "encrypt once, licence many" shared by live and VOD. https://www.iso.org/standard/68042.html — Tier 1.
- ANSI/SCTE 35 — Digital Program Insertion Cueing Message (now SCTE 35-1; SCTE 35-2, 2025). SCTE. In-band `splice_insert`/`time_signal` cues that mark live ad breaks and carry into live-to-VOD. https://account.scte.org/standards/library/catalog/scte-35-digital-program-insertion-cueing-message/ — Tier 1.
- SRT (Secure Reliable Transport) protocol overview. SRT Alliance. UDP contribution with selective retransmission (~25% loss tolerance at 1 s buffer) and AES-256; live ingest. https://www.srtalliance.org/ — Tier 3.
- WebRTC-HTTP Ingestion Protocol (WHIP). IETF WISH WG. Sub-second WebRTC contribution ingest; the lowest-latency live door. https://datatracker.ietf.org/doc/draft-ietf-wish-whip/ — Tier 2.
- Low-latency live streaming: LL-HLS, WebRTC, and CMAF. Mux engineering. Current glass-to-glass latency ranges (WebRTC sub-500 ms; LL-HLS/LL-DASH 2–5 s; segmented HLS ~18 s). https://www.mux.com/articles/low-latency-live-streaming-developers-guide-ll-hls-webrtc-cmaf — Tier 4 (re-verify ranges at publish).
Conflict note: where vendor copy implies live and VOD are "the same pipeline with a switch," this article follows the standards: the front halves differ (real-time vs batch transcode, contribution vs file ingest, latency target) while the CMAF/CENC back half (ISO/IEC 23000-19, 23001-7) is genuinely shared. The "same pipeline" framing was overridden as imprecise.


