Why this matters
If you are a founder, product manager, or first-time streaming CTO, packaging is the least glamorous step in the pipeline and one of the easiest places to silently double your bill. Every viewer's device expects video in a specific shape — Apple devices historically wanted one container, the rest of the world another — and a team that does not plan for this ends up encoding and storing two full copies of the entire catalog, then paying a content delivery network to cache and deliver both. This article gives you the mental model to avoid that: what packaging actually does, why a single CMAF master now serves both HLS and DASH, how the segment-length choice trades latency against efficiency, and how to read a packaging setup and tell whether it is built once or twice. By the end you will be able to ask the three questions that decide the cost: are we packaging into one container or two, how long are our segments and why, and are we encrypting once or once per format.
Encoding makes the quality; packaging makes the files
Two steps sit between your raw source and a viewer's screen, and teams blur them constantly. The first is encoding — compressing the video into the set of quality versions called the encoding ladder, where each version (a rendition) pairs a resolution with a bitrate. We cover that decision in full in the encoding ladder explained. Encoding decides how good the picture looks and how many bytes it takes.
The second step is packaging, and it is what this article is about. Packaging takes those encoded renditions and writes them into the precise file structure a player can fetch and play: it chops each rendition into small time-slices called segments, wraps them in a container format the player understands, and writes a small text index — the manifest — that lists every rendition and every segment so the player knows what to download and when. Encoding is the cooking; packaging is the plating and the menu. The food is the same; how you plate it decides who can eat it.
The single high-quality version that packaging starts from has a name: the mezzanine. Borrowed from broadcast, a mezzanine is the clean, high-bitrate master copy of a title — the intermediate between the original source and the compressed renditions you deliver. In a modern pipeline you encode the mezzanine once into your ladder of renditions, then package those renditions for delivery. The phrase "from one mezzanine" in this article's title is the whole point: one master, packaged once, served to every device.
The problem packaging used to create: two copies of everything
For most of streaming's history, packaging forced a painful choice, and understanding why is the key to understanding CMAF.
A container is the file wrapper that holds compressed video and audio plus the timing data a player needs — think of it as the box, not the contents. The contents (the actual compressed pixels, produced by a codec like H.264) can be identical; the box around them differed by delivery format. HLS, the format Apple created and the one every Apple device speaks, originally required its segments in one kind of box: the MPEG-2 Transport Stream (file extension .ts), a container inherited from digital broadcast. This is specified in the HLS standard, IETF RFC 8216 §3.2. MPEG-DASH, the format used almost everywhere outside the Apple ecosystem, used a different box: fragmented MP4 (fMP4), based on the ISO Base Media File Format (ISO/IEC 14496-12).
Same pixels, two boxes. To reach Apple devices and everyone else, you had to package every rendition of every title twice: once as .ts for HLS, once as fMP4 for DASH. That meant duplicated packaging work (and, in many pipelines, duplicated encoding), two full copies sitting in storage, and a CDN caching both, which split your cache and sent more requests back to origin. We call it the two-stack tax: you paid roughly twice for packaging and storage to cover a device split you did not create. Worse, the transport-stream container is itself less efficient than fMP4. It carries more structural overhead, so the HLS copy was even a little larger than the DASH copy it duplicated.
Figure 1. The old way packaged every rendition twice — one container for HLS, another for DASH. CMAF replaces the two piles of media with one, and keeps only the two small text manifests separate.
CMAF: one box both formats agree to read
The fix arrived in 2016, when Apple and Microsoft jointly proposed a single, standard container both delivery formats could use, published in 2018 as the Common Media Application Format (CMAF), ISO/IEC 23000-19 (current edition 2024). CMAF is not a new protocol and not a competitor to HLS or DASH — it sits underneath them. It is a standardized way to package media as fragmented-MP4 segments, built on the same ISO Base Media File Format that DASH already used. A useful picture: HLS and DASH are two languages, and CMAF is a single alphabet both languages can now be written in. You write your media once in that alphabet, then add a short HLS "cover note" and a short DASH "cover note" on top.
The reason this works is written into the HLS standard itself, and it is worth quoting because most articles skip it. Alongside the legacy transport stream, RFC 8216 §3.3 defines fragmented MP4 as a supported HLS segment format and states plainly that "a Common Media Application Format (CMAF) Header meets all these requirements" and "a CMAF Segment meets these requirements." In other words, the official HLS specification explicitly accepts CMAF segments. Apple announced this fMP4 support for HLS at its 2016 developer conference, the same year CMAF was proposed. Because DASH was already an ISO-BMFF/fMP4 format, the same CMAF segments satisfy DASH too. One set of segments, two formats that both accept it.
So the modern packaging output looks like this: one folder of CMAF segments per rendition (plus a tiny initialization file per rendition), and two manifests that index those same segments — an HLS multivariant playlist (.m3u8) and a DASH media presentation description (.mpd). The manifests are small text files; the media is large. You duplicate the small thing and share the large thing.
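A trimmed HLS media playlist for the 1080p rendition, pointing at fMP4/CMAF segments (the multivariant playlist sits one level above it and lists each rendition's playlist). Note the EXT-X-MAP tag (RFC 8216 §4.3.2.5), which names the initialization file and which RFC 8216 requires to apply to every fMP4 segment: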
```text
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:6
#EXT-X-MAP:URI="init_1080p.mp4"
#EXTINF:6.000,
seg_1080p_00001.m4s
#EXTINF:6.000,
seg_1080p_00002.m4s
```
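A minimal Python sketch of how a packager might write a media playlist like the one above from a list of segment durations. The function name, the seg_<rendition>_<number>.m4s naming, and the choice to add EXT-X-PLAYLIST-TYPE:VOD and EXT-X-ENDLIST for a finished on-demand title are assumptions for illustration; a real packager also writes audio and subtitle playlists and the multivariant playlist above them. The target-duration rule it applies (every EXTINF duration, rounded to the nearest integer, must not exceed EXT-X-TARGETDURATION) comes from RFC 8216 §4.3.3.1.

```python
import math


def hls_media_playlist(rendition: str, durations: list[float]) -> str:
    """Build an HLS media playlist for fMP4/CMAF segments (hypothetical naming)."""
    # Round half up, then take the maximum: EXTINF values rounded to the
    # nearest integer must not exceed the target duration (RFC 8216 §4.3.3.1).
    target = max(math.floor(d + 0.5) for d in durations)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",  # EXT-X-MAP in a media playlist needs version 6 or higher
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        # One EXT-X-MAP applies to every segment that follows it.
        f'#EXT-X-MAP:URI="init_{rendition}.mp4"',
    ]
    for number, duration in enumerate(durations, start=1):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"seg_{rendition}_{number:05d}.m4s")
    lines.append("#EXT-X-ENDLIST")  # on-demand: no more segments will be added
    return "\n".join(lines) + "\n"
```

Calling hls_media_playlist("1080p", [6.0, 6.0]) yields the two segment entries shown above, framed by the VOD header and end tag.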
The DASH manifest indexes the same .m4s segments with different words — a Representation inside an AdaptationSet, the model defined in ISO/IEC 23009-1:
```xml
<AdaptationSet mimeType="video/mp4" segmentAlignment="true">
  <Representation id="1080p" bandwidth="6000000" width="1920" height="1080" codecs="avc1.640028">
    <!-- %05d zero-pads $Number$ so the names match the HLS playlist: seg_1080p_00001.m4s -->
    <SegmentTemplate media="seg_1080p_$Number%05d$.m4s" initialization="init_1080p.mp4"
                     timescale="1" duration="6" startNumber="1"/>
  </Representation>
</AdaptationSet>
```
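Because DASH names segments through a template rather than a list, it is worth checking that the template really resolves to the same files the HLS playlist lists. A minimal Python sketch, assuming only the $Number$ identifier with an optional printf-style width tag such as %05d; ISO/IEC 23009-1 defines other identifiers (for example $Time$ and $RepresentationID$) that this sketch ignores, and the function names are hypothetical:

```python
import re

# $Number$ or $Number%0<width>d$ inside a SegmentTemplate media attribute.
_NUMBER = re.compile(r"\$Number(?:%0(\d+)d)?\$")


def _format_number(match: re.Match, number: int) -> str:
    width = int(match.group(1)) if match.group(1) else 0
    return str(number).zfill(width)  # zfill(0) leaves the number unpadded


def expand_number_template(template: str, start: int, count: int) -> list[str]:
    """Resolve the first `count` segment names of a $Number$-based template."""
    return [
        _NUMBER.sub(lambda m, n=n: _format_number(m, n), template)
        for n in range(start, start + count)
    ]
```

Note that the bare $Number$ form resolves to seg_1080p_1.m4s, not seg_1080p_00001.m4s; a width tag such as %05d is what keeps both manifests pointing at one set of files.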
Two manifests, one set of segment files. That is "package once" in concrete terms. The byte-level anatomy of these segments — the ftyp, moov, moof, and mdat boxes that make up an fMP4 file, and the CMAF track-fragment-chunk hierarchy — is delivery-protocol internals, and we keep it where it belongs: see CMAF: the packaging format that unified HLS and DASH in our Video Streaming section, alongside the HLS and MPEG-DASH deep dives. For the OTT product decision, the box anatomy matters only as far as it sets cost and reach, which is where we stay.
Figure 2. One mezzanine, encoded to a ladder, packaged once into CMAF, encrypted once, then indexed by two small manifests. The expensive media is shared; only the cheap text differs.
What a CMAF segment is, just enough to decide
You do not need to read the box specification to make product decisions, but two pieces of structure matter because they show up in your cost and latency.
First, each rendition is delivered as two kinds of file: one small initialization segment that tells the player how to set up its decoder (resolution, codec, timing — and nothing playable), and a series of media segments that carry the actual few-seconds-each slices of video. In HLS the initialization segment is named by that EXT-X-MAP tag; in DASH by the initialization attribute. Same file, two pointers. This is why "package once" is literally true at the file level: the player downloads the same init_1080p.mp4 and the same seg_1080p_00001.m4s whether it found them through an .m3u8 or an .mpd.
Second, a segment can be subdivided further into chunks — even smaller slices, typically a fraction of a second — which exist so a live stream can start sending the beginning of a segment before the whole segment is finished encoding. Chunks are the foundation of low-latency streaming, and they let you keep long segments (good for efficiency) while still delivering quickly (good for latency). The mechanics of chunked transfer and the low-latency profiles live in LL-DASH and low-latency CMAF; for packaging, the chunk is the unit that decouples your latency from your segment length, which we turn to next.
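Both pieces of structure are visible at the top level of the file. A minimal Python sketch, assuming a well-formed file already in memory: it walks the top-level ISO BMFF boxes (each begins with a 4-byte big-endian size and a 4-byte type, per ISO/IEC 14496-12), labels a file containing ftyp and moov as an initialization segment, and labels one containing moof and mdat as a media segment. It also counts moof boxes, on the simplifying assumption that each CMAF chunk is one moof+mdat pair. The function names are hypothetical; production code would use a full ISO BMFF parser.

```python
import struct


def top_level_boxes(data: bytes) -> list[str]:
    """Return the four-character types of the top-level ISO BMFF boxes, in order."""
    types, offset = [], 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-byte type.
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif size == 0:  # the box runs to the end of the file
            size = len(data) - offset
        if size < header:
            raise ValueError(f"corrupt box header at offset {offset}")
        types.append(box_type.decode("latin-1"))
        offset += size
    return types


def describe_segment(data: bytes) -> tuple[str, int]:
    """Classify a file as init or media and count its moof boxes (simplified)."""
    boxes = top_level_boxes(data)
    if "ftyp" in boxes and "moov" in boxes:
        return "init", 0
    moofs = boxes.count("moof")
    if moofs and "mdat" in boxes:
        return "media", moofs  # assumes one moof+mdat pair per CMAF chunk
    return "unknown", 0
```

A chunked media segment shows up as several moof+mdat pairs in a row, which is why a player can start on the first pair before the rest of the segment exists.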
The segment-duration decision: your quiet latency-and-cost lever
Here is the packaging decision teams most often make by accident: how long is each segment? It sounds trivial, but it sets three things at once (how quickly the player can switch quality, how efficiently the stream is delivered, and how much latency a live stream carries), so it deserves a deliberate choice.
A segment must begin with a keyframe (a complete, standalone picture the decoder can start from without needing the frame before it). Keyframes are large compared to the frames between them, so segment boundaries are expensive. This single fact drives the whole trade-off.
Longer segments (say 6–10 seconds) put fewer keyframes in the stream, which compresses better and means fewer files and fewer network requests, so the CDN caches them well and origin load drops. The cost: the player can only switch renditions at a segment boundary, so long segments make adaptive switching coarse and sluggish, and for live they raise latency, because a player generally waits for whole segments. Shorter segments (1–2 seconds) let the player switch quality quickly and cut live latency, but multiply keyframes and request count, which lowers compression efficiency and loads origin harder.
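The request side of the trade-off is plain division. A minimal sketch, assuming one keyframe at the start of each segment and one request per segment for the single rendition a viewer is watching; manifest refreshes, audio, and chunked requests are ignored, and the 10,000-viewer audience is a made-up figure:

```python
def segments_per_hour(segment_seconds: float) -> float:
    """Segment fetches (and, at one keyframe per segment, keyframes) per viewer-hour."""
    return 3600 / segment_seconds


def edge_requests_per_second(viewers: int, segment_seconds: float) -> float:
    """Steady-state segment requests per second, one rendition per viewer."""
    return viewers / segment_seconds


for seconds in (1, 2, 6, 10):
    print(
        f"{seconds:>2} s segments: {segments_per_hour(seconds):>5.0f} fetches per viewer-hour, "
        f"{edge_requests_per_second(10_000, seconds):>6.0f} req/s for 10,000 viewers"
    )
```

At 1-second segments each viewer makes 3,600 segment requests an hour; at 6 seconds, 600. That sixfold difference in requests (and keyframes) is the efficiency cost described above.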
Apple's HLS Authoring Specification recommends a 6-second target segment duration as the general default, and that is the sensible starting point for video-on-demand. For low-latency live, do not fix the problem by shrinking segments, which pays the keyframe tax over and over; keep segments reasonable and use chunks inside them, so the player fetches sub-second chunks while segments stay efficient. Industry low-latency practice puts CMAF chunks in the 200–500-millisecond range. The newest media can then leave the packager within a fraction of a second instead of waiting for a whole segment to finish, so latency is set by the chunk duration and the player's buffer rather than by several segment durations. Once encoding, CDN, and player buffering are added, real low-latency deployments typically land at a few seconds glass-to-glass.
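One grounded way to see why long segments hurt classic live latency: RFC 8216 §6.3.3 says a client SHOULD NOT choose a segment that starts less than three target durations from the end of a live playlist. The sketch below turns that rule into a floor on how far behind live a non-low-latency HLS player starts, and sets it beside the minimum wait before new media can leave the packager. Treating one chunk duration as that minimum wait is a simplifying assumption, and the function names are hypothetical; encoding, CDN, and player-buffer delays come on top of both numbers.

```python
from typing import Optional


def classic_hls_live_offset(target_duration_s: float) -> float:
    """Closest a non-low-latency HLS client should start to live (RFC 8216 §6.3.3)."""
    return 3 * target_duration_s


def first_byte_wait(segment_s: float, chunk_s: Optional[float] = None) -> float:
    """Simplified wait before the newest media can leave the packager."""
    # Without chunks a segment is published only once complete; with chunks,
    # each chunk can be sent as soon as it has been encoded.
    return segment_s if chunk_s is None else chunk_s


def chunks_per_segment(segment_s: float, chunk_s: float) -> int:
    return round(segment_s / chunk_s)


for seg in (2, 6):
    print(f"{seg} s segments: classic HLS starts >= {classic_hls_live_offset(seg):.0f} s behind live")
print(
    f"6 s segments in 0.5 s chunks: {chunks_per_segment(6, 0.5)} chunks, "
    f"first bytes after ~{first_byte_wait(6, 0.5)} s instead of {first_byte_wait(6)} s"
)
```

With 6-second segments the classic rule alone puts the player at least 18 seconds behind live, which is why chunking, not segment length, is the lever for low latency.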
Figure 3. Segment length trades latency and switching speed against compression and cache efficiency. Chunks let you keep efficient long segments while still delivering live quickly.
Encrypt once, not once per format
Packaging is also where content protection is applied, and the "package once" logic extends to it directly, which is exactly why getting the container right matters beyond storage. The system that keeps premium content from being copied, called digital rights management (DRM), relies on encrypting the segments so only an authorized player can unlock them. The standard that lets one set of encrypted segments serve different DRM systems is Common Encryption (CENC), ISO/IEC 23001-7. It defines four protection schemes, of which two matter in practice: cenc (AES in counter mode) and cbcs (AES in cipher-block-chaining mode). The detail that decides your workflow: Apple's FairPlay requires the cbcs scheme, and current Widevine and PlayReady clients support cbcs as well, so encrypting your CMAF segments once with cbcs lets you issue licenses to Apple's FairPlay, Google's Widevine, and Microsoft's PlayReady from the same files.
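The scheme decision can be written down as a small lookup. A minimal sketch, assuming the simplified support matrix stated in this section (FairPlay accepts only cbcs; Widevine and PlayReady accept both cenc and cbcs). The table and function names are hypothetical, and real device support, especially on older clients, should be checked against each DRM vendor's documentation:

```python
# Simplified support matrix taken from the prose above, not from a vendor spec.
SCHEMES_BY_DRM = {
    "FairPlay": {"cbcs"},
    "Widevine": {"cenc", "cbcs"},
    "PlayReady": {"cenc", "cbcs"},
}


def drm_reach(scheme: str) -> set[str]:
    """DRM systems that can license a CMAF set encrypted once with `scheme`."""
    return {drm for drm, schemes in SCHEMES_BY_DRM.items() if scheme in schemes}


def left_dark(scheme: str) -> set[str]:
    """DRM systems that cannot play a catalog encrypted only with `scheme`."""
    return set(SCHEMES_BY_DRM) - drm_reach(scheme)
```

Under this matrix, left_dark("cbcs") is empty while left_dark("cenc") is {"FairPlay"}: the Apple cut-off described in this section.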
Figure 4. Encrypt the CMAF set once with the cbcs scheme and every DRM is served from the same files. Encrypt with cenc only and FairPlay (every Apple device) goes dark.
This is the same shape as the container story. Just as one CMAF container avoids two packaging stacks, one cbcs encryption avoids encrypting your catalog separately for each DRM. Do it wrong — encrypt with the older cenc scheme only — and FairPlay devices cannot play your content at all, quietly cutting off every iPhone and Apple TV. Because DRM is a decision with its own depth and its own failure modes, we treat it fully in Block 4: see CENC, CTR, and CBCS: common encryption explained and multi-DRM: one workflow, every device. For packaging, hold one rule: the container and the encryption are both "once" decisions, and CMAF plus cbcs is what makes "once" possible.
Static versus just-in-time: where the one copy lives
One more packaging choice affects storage: when do you actually create the segments and manifests? There are two models.
Static (pre-) packaging writes all the CMAF segments and both manifests to storage ahead of time, so the CDN pulls ready-made files. It is simple and fast to serve, but it stores the finished package for every title whether or not anyone watches it.
Just-in-time (dynamic) packaging stores only the encoded renditions (the ladder, not yet segmented) and generates the segments and the requested manifest on the fly when a player asks, caching the result at the CDN. It minimizes what you store (one set of source files, no pre-built format variants) at the cost of some compute at request time. For a large catalog with a long tail of rarely watched titles, just-in-time packaging keeps storage small; for a small, heavily watched catalog, static packaging is often simpler and cheaper to serve. The trade-off and the origin architecture behind it are covered in just-in-time packaging versus pre-packaged origin. Either way, CMAF is what keeps it to one copy rather than two: you are choosing when to make the single shared package, not whether to make two.
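The just-in-time model is essentially "generate on first request, then serve from cache". A minimal Python sketch of that control flow, in which functools.lru_cache stands in for the CDN cache and build_manifest is a hypothetical placeholder for the real packager; nothing here reflects a particular vendor's origin API:

```python
from functools import lru_cache

RUNS = {"packager": 0}  # how many times the expensive packaging step actually ran


def build_manifest(title: str, fmt: str) -> str:
    """Hypothetical packager step; a real one would read the stored renditions."""
    RUNS["packager"] += 1
    extension = {"hls": "m3u8", "dash": "mpd"}[fmt]
    return f"generated {title}.{extension}"


@lru_cache(maxsize=10_000)
def serve_manifest(title: str, fmt: str) -> str:
    """Just-in-time path: build on the first request, answer later ones from cache."""
    return build_manifest(title, fmt)
```

Static packaging is the same picture with every call made ahead of time: storage pays for titles nobody requests, but no request ever waits on the packager.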
From packaging to bill: the arithmetic that matters
Tie packaging back to money, because that is why "once" wins. Suppose a catalog of 1,000 two-hour films, each encoded to a seven-rung ladder whose renditions sum to about 17.5 Mbps of stored bitrate (the ladder math is worked out in the encoding ladder article). One packaged copy of one film is:
aggregate bitrate = 17.5 Mbit/s
film length = 2 hr × 3,600 s = 7,200 s
one packaged copy = 17.5 Mbit/s × 7,200 s ÷ 8 = 15,750 MB ≈ 15.75 GB
catalog, one copy = 15.75 GB × 1,000 films ≈ 15.75 TB
Package the old way (separate .ts for HLS and fMP4 for DASH) and you store two copies, about 31.5 TB before overhead, with the transport-stream copy the larger of the two. Package once with CMAF and you store roughly 15.75 TB. The storage saving is close to half, and vendor reports for moving to a single fMP4/CMAF output land in the same range: Bitmovin reports roughly 50% lower encoding, packaging, and storage cost versus maintaining HLS-TS plus DASH. Container efficiency adds a little more, since the transport stream carries about 10% more overhead than fMP4 (also a vendor-reported figure).
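The same arithmetic as a short Python check, using decimal units (1 GB = 10^9 bytes) as the figures above do. The 10% transport-stream overhead is the vendor-reported figure cited in this section, applied here as a flat multiplier, which is an assumption:

```python
AGGREGATE_BITRATE_BPS = 17_500_000  # sum of all seven rungs, 17.5 Mbit/s
FILM_SECONDS = 2 * 3_600
FILMS = 1_000
TS_OVERHEAD_PERCENT = 10  # vendor-reported; treated as a flat multiplier


def copy_bytes(bitrate_bps: int, seconds: int) -> int:
    """Bytes for one packaged copy of one title across all renditions."""
    return bitrate_bps * seconds // 8


one_film = copy_bytes(AGGREGATE_BITRATE_BPS, FILM_SECONDS)
cmaf_catalog = one_film * FILMS
ts_catalog = cmaf_catalog * (100 + TS_OVERHEAD_PERCENT) // 100
two_stack = cmaf_catalog + ts_catalog  # fMP4 for DASH plus .ts for HLS

print(f"one film, one copy:  {one_film / 1e9:.2f} GB")
print(f"catalog, CMAF once:  {cmaf_catalog / 1e12:.2f} TB")
print(f"catalog, two stacks: {two_stack / 1e12:.2f} TB")
```

With the overhead included, the two-stack catalog comes to roughly 33 TB against 15.75 TB, so packaging once saves slightly more than half the storage.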
Storage is the smaller prize. The larger one is delivery and caching. A CDN caches what it is asked for; if Apple viewers pull .ts files and everyone else pulls fMP4, the edge holds two copies of every popular segment and sends more requests back to origin. With one CMAF set, every viewer of a given rendition pulls the same segment file, so the cache holds one copy and the cache-hit ratio rises (vendor accounts report it roughly doubling for the shared content), which directly lowers the egress you pay. Because delivery is the dominant recurring cost of a streaming platform, the cache effect of packaging once usually matters more than the storage line. The whole delivery-cost model is in CDN cost engineering and the platform-wide view in the OTT cost model.
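A toy model shows the mechanism, not the magnitude. It assumes an edge cache large enough never to evict and counts an origin fetch the first time each distinct file is requested. When Apple and non-Apple viewers ask for different containers of the same segment, every segment is fetched twice; with one CMAF set it is fetched once. The request mix below is invented for illustration:

```python
def origin_fetches(requests: list[tuple[str, int]], shared_container: bool) -> int:
    """Count cache misses for (platform, segment_number) requests at one edge.

    With a shared CMAF container the cache key ignores the platform; with two
    stacks, Apple viewers fetch .ts and everyone else fetches fMP4.
    """
    cached: set[str] = set()
    misses = 0
    for platform, seg in requests:
        if shared_container or platform != "apple":
            key = f"seg_{seg:05d}.m4s"
        else:
            key = f"seg_{seg:05d}.ts"
        if key not in cached:  # first request for this file goes to origin
            misses += 1
            cached.add(key)
    return misses


def hit_ratio(requests: list[tuple[str, int]], shared_container: bool) -> float:
    return 1 - origin_fetches(requests, shared_container) / len(requests)


# Hypothetical traffic: four viewers (two Apple, two not) each watch segments 1-100.
traffic = [(p, s) for s in range(1, 101) for p in ("apple", "apple", "android", "web")]
```

In this toy case origin fetches halve (200 to 100) and the hit ratio rises from 50% to 75%; real gains depend on audience mix, cache size, and eviction, which the model ignores.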
Container and format support, at a glance
The table sketches which container each delivery format accepts and where each lands on the decisions above. The "HLS?" and "DASH?" columns are the coverage view that tells you whether one package can serve both.
| Packaging choice | HLS? | DASH? | Encrypt once (cbcs)? | Low latency via chunks? | Notes |
|---|---|---|---|---|---|
| MPEG-2 TS (.ts) | Yes (legacy) | No | No | No | Old HLS-only path; ~10% more overhead than fMP4 |
| fMP4 / CMAF (.m4s) | Yes | Yes | Yes | Yes | The modern shared container: one copy for both |
| CMAF + cbcs CENC | Yes | Yes | Yes | Yes | Adds multi-DRM from one encryption pass; FairPlay-compatible |
| Two separate stacks | Yes | Yes | No (per format) | Partial | The two-stack tax: double storage and split cache |
Table 1. Packaging choices and what they cover. One CMAF set with cbcs encryption is the only row that serves HLS and DASH, encrypts once, and supports low latency without storing two copies.
A common mistake: shipping two stacks in 2026
The most expensive packaging error is the one a team never notices: standing up an HLS pipeline that outputs transport-stream segments and a separate DASH pipeline that outputs fMP4, because that is how the two tools came configured. It works — viewers see video — so nobody flags it. Meanwhile the platform stores two copies of the catalog, the CDN caches two copies of every popular segment, and the egress bill runs higher than it should, forever. The fix is to package both formats from one CMAF set; the savings compound every month the catalog grows.
Three related faults travel with it. The first is encrypting per format or with cenc only — which either doubles the encryption work or silently locks out every FairPlay (Apple) device; the cure is one cbcs encryption, covered in Block 4. The second is a segment duration chosen by default — usually too long for the live latency you promised or too short for the cache efficiency you needed; choose it on purpose against your use case. The third is forgetting the legacy-device tail: a small number of older devices still need transport-stream HLS, so if your audience includes them, plan a narrow .ts fallback rather than discovering the gap in support tickets — but make CMAF the default everything else shares, not the exception.
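The main fault and the three related ones are mechanical enough to check before launch. A minimal sketch, assuming a hypothetical PackagingPlan record whose field names are invented for illustration. The live-latency check reuses the RFC 8216 §6.3.3 rule that a classic HLS client starts at least three target durations behind live, and the VOD check only flags a departure from Apple's 6-second default for review rather than treating it as wrong:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackagingPlan:
    """Hypothetical summary of a packaging setup; field names are illustrative."""
    hls_container: str            # "cmaf" or "ts"
    dash_container: str           # normally "cmaf"
    encryption_scheme: str        # "cbcs", "cenc", or "none"
    segment_seconds: float
    live: bool
    chunked: bool
    target_latency_seconds: Optional[float] = None
    legacy_ts_audience: bool = False
    legacy_ts_fallback: bool = False


def lint(plan: PackagingPlan) -> list[str]:
    """Return human-readable warnings for the packaging faults described above."""
    issues = []
    if plan.hls_container != plan.dash_container:
        issues.append("two stacks: package HLS and DASH from one CMAF set")
    if plan.encryption_scheme == "cenc":
        issues.append("cenc only: FairPlay (Apple) devices cannot play; use cbcs")
    if plan.live and not plan.chunked and plan.target_latency_seconds is not None:
        # Classic HLS clients start >= 3 target durations behind live (RFC 8216 §6.3.3).
        if 3 * plan.segment_seconds > plan.target_latency_seconds:
            issues.append("segments too long for the latency target; add CMAF chunks")
    if not plan.live and plan.segment_seconds != 6:
        issues.append("VOD segment length differs from Apple's 6 s default; confirm it is deliberate")
    if plan.legacy_ts_audience and not plan.legacy_ts_fallback:
        issues.append("legacy devices in the audience: plan a narrow .ts fallback")
    return issues
```

A plan that packages HLS as .ts and DASH as fMP4, encrypts with cenc, promises 5-second live latency on unchunked 6-second segments, and ignores its legacy audience trips all four warnings.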
Where Fora Soft fits in
Packaging is where a streaming platform's storage footprint and cache efficiency are quietly decided, and getting it right at scale — one CMAF master per title, segment durations tuned to the use case, one cbcs encryption feeding every DRM, and the right static-versus-just-in-time call for the catalog shape — is the difference between an egress bill that grows linearly and one that grows twice as fast. Fora Soft has built video streaming, OTT/Internet TV, e-learning, telemedicine, and video surveillance software since 2005, across 625+ shipped projects for 400+ clients, and that work centers on exactly this kind of scale-and-cost engineering: designing packaging and origin workflows so one set of files serves every device, protected once and cached well. When a media company needs a streaming platform whose delivery economics survive a real, growing audience, that packaging-and-origin engineering is the capability we bring.
What to read next
- The encoding ladder explained: renditions, resolutions, bitrates
- CENC, CTR, and CBCS: common encryption explained
- CMAF: the packaging format that unified HLS and DASH
Call to action
- Talk to a streaming engineer — book a 30-minute scoping call to talk through your CMAF packaging plan.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the CMAF Packaging Decision Worksheet — A one-page worksheet to lock your packaging before you ask a vendor for a quote: choose the container (CMAF fMP4 vs a narrow legacy-TS fallback), set the segment duration against your latency and cache goals, decide static vs….
References
- RFC 8216 — HTTP Live Streaming (HLS) — IETF. §3.2 (MPEG-2 Transport Stream segments), §3.3 (Fragmented MPEG-4 segments built on ISO BMFF, with the explicit statements that a "CMAF Header" and "CMAF Segment" meet the fMP4 requirements), and §4.3.2.5 (the EXT-X-MAP tag naming the initialization segment). Tier 1 (official standard). https://www.rfc-editor.org/rfc/rfc8216 (accessed 2026-06-16)
- ISO/IEC 23000-19 — Common Media Application Format (CMAF) for segmented media — ISO/IEC (current edition 2024; 1st ed. 2018). Defines the CMAF track format derived from ISO BMFF and the addressable media objects (CMAF tracks, fragments, chunks, segments) that let one set of files serve HLS and DASH. Tier 1. https://www.iso.org/standard/85623.html (accessed 2026-06-16)
- ISO/IEC 23009-1 — Dynamic Adaptive Streaming over HTTP (MPEG-DASH) — ISO/IEC. The MPD model (Period → AdaptationSet → Representation → segments) that indexes the same CMAF segments in DASH. Tier 1. https://www.iso.org/standard/83314.html (accessed 2026-06-16)
- ISO/IEC 14496-12 — ISO Base Media File Format (ISO BMFF) — ISO/IEC. The fragmented-MP4 box structure (ftyp, moov, moof, mdat) that both fMP4/CMAF segments and the HLS fMP4 format are built on. Tier 1. https://www.iso.org/standard/83102.html (accessed 2026-06-16)
- ISO/IEC 23001-7 — Common Encryption (CENC) in ISO BMFF files — ISO/IEC. Defines the cenc (AES-CTR) and cbcs (AES-CBC) schemes; cbcs is the scheme FairPlay requires and the basis for encrypting one CMAF set for all DRMs. Tier 1. https://www.iso.org/standard/84637.html (accessed 2026-06-16)
- HLS Authoring Specification for Apple Devices — Apple Inc. The 6-second target segment-duration recommendation and the fMP4/CMAF authoring rules for HLS. Tier 3 (first-party standards-author). https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices (accessed 2026-06-16)
- DASH-IF Interoperability Guidelines / CMAF ingest and content protection — DASH Industry Forum. Practice for delivering DASH from CMAF segments and applying Common Encryption interoperably. Tier 2 (issuing-body guidance). https://dashif.org/guidelines/ (accessed 2026-06-16)
- Halve Your Encoding, Packaging and Storage Costs — HLS with Fragmented MP4 — Bitmovin. Reports roughly 50% encoding/packaging/storage saving from a single fMP4 output versus maintaining HLS-TS plus DASH, and the ~10% transport-stream overhead. Tier 4 (vendor engineering). https://bitmovin.com/blog/halve-encoding-packaging-storage-costs-hls-fragmented-mp4/ (accessed 2026-06-16)
- What Is CMAF (Common Media Application Format) — Wowza. Industry account of the 2016 Apple/Microsoft CMAF proposal, the single-container unification of HLS and DASH, and the cache-efficiency gain from one shared set of files. Tier 4 (vendor engineering). https://www.wowza.com/blog/what-is-cmaf (accessed 2026-06-16)
- The State of CMAF: The Holy Grail or Just Another Format? — Streaming Media. Industry assessment of CMAF adoption, the storage/cache savings, and the legacy-device caveats. Tier 5 (institutional/analyst). https://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=135142 (accessed 2026-06-16)
Source note: the packaging mechanism — fMP4/CMAF segments, the HLS EXT-X-MAP requirement, DASH Representations, ISO BMFF boxes, and the cenc/cbcs encryption schemes — traces to tier-1 standards (refs 1–5). The 6-second default traces to Apple's first-party authoring spec (ref 6) and DASH practice to DASH-IF (ref 7). The storage/cache savings percentages are vendor- and industry-reported (refs 8–10) and labelled as such in-text; no lower-tier source overrode a standard.


