Encoding QC and the Mezzanine Workflow for OTT

Why this matters

If you are a founder, product manager, or first-time streaming CTO, QC is the unglamorous step that decides whether your catalog looks professional or amateur — and it is almost always under-budgeted. A platform that ingests whatever files arrive, transcodes them, and ships them discovers its quality problems the way its viewers do: a black gap where an ad marker was, audio that is twice as loud on one title as the next, a show that fails a broadcaster's delivery spec and gets rejected. Each of those is cheap to catch at the front door and expensive to catch in production, because by then the defect has been copied into every rung of the ladder, cached across a content delivery network, and seen by real people. This article shows you how the master is prepared (the mezzanine and IMF), where the QC gate sits in the pipeline, what it checks and against which standards, and why the right answer is automated checks plus a human pass — not one or the other.

The mezzanine: one good master before a thousand smaller copies

Start with a word most newcomers have never heard: mezzanine. In a building, the mezzanine is the half-floor between the ground and the first floor — not the bottom, not the top, but the useful level in between. In video, a mezzanine file is the master that sits between the camera-original footage at the top and the small delivery files at the bottom: a high-quality, lightly compressed intermediate copy that everything else is made from. It is sometimes called the intermediate or the house master.

Why not just keep the camera original, or just make the delivery files directly? Because the two ends of the pipeline are built for opposite jobs. Camera-original and uncompressed footage is enormous and awkward to work with. Delivery codecs (H.264, HEVC, AV1, the formats covered in the encoding ladder explainer) are the opposite: they throw away as much data as they can to make the smallest possible file for streaming, and they are lossy, meaning quality is permanently discarded. You never want to edit, re-grade, or re-encode from a delivery file, because every pass loses more. The mezzanine is the deliberate middle: compressed enough to store and move without special hardware, but high enough quality that you can re-encode from it many times (into this year's ladder, next year's new codec, a partner's delivery spec) without visible loss. The phrase the industry uses is visually lossless.

A handful of codecs are built for this middle tier. Apple ProRes is the most common; its bitstream was published as an open standard, SMPTE RDD 36:2015, so it is no longer an Apple-only secret. Avid DNxHD and its 4K successor DNxHR do the same job and were standardized as SMPTE ST 2019 (the VC-3 codec). JPEG 2000 is the high-end choice used inside formal master packages. The point is not which one you pick but that you pick one as your house format — the single mezzanine codec your platform ingests, stores, and transcodes from — so every tool, every operator, and every QC template expects the same thing.

The cost of this tier is storage, and the arithmetic is worth seeing once, because it surprises people. ProRes 422 HQ, a common mezzanine choice, stores roughly 100 gigabytes per hour of 1080p content at around 30 frames per second (the figure scales with resolution and frame rate). Walk a modest catalog through it:

Mezzanine storage = catalog hours × GB-per-hour of the house format

  500 titles × 1.5 hours each              = 750 hours of content
  750 hours × 100 GB/hour (ProRes 422 HQ)  = 75,000 GB
                                           = 75 terabytes of mezzanine masters

Seventy-five terabytes before you have encoded a single delivery file. That is why the mezzanine is a real line item in the OTT cost model, and why teams tier it — keeping active masters on fast storage and aging the rest to cheap archival storage. The mezzanine is not waste; it is the insurance policy that lets you re-encode the whole catalog when AV1 or the next codec earns its place, without going back to the original elements.
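The same arithmetic can be kept as a small helper so the storage line item is recomputed whenever the catalog or the house format changes. This is a minimal sketch: the function name is illustrative, and it assumes decimal units (1 TB = 1,000 GB) as the worked example above does.

```python
def mezzanine_storage_tb(titles: int, hours_per_title: float, gb_per_hour: float) -> float:
    """Return mezzanine storage in terabytes (decimal: 1 TB = 1,000 GB)."""
    catalog_hours = titles * hours_per_title
    return catalog_hours * gb_per_hour / 1000


if __name__ == "__main__":
    # The example from the text: 500 titles, 1.5 hours each, about 100 GB/hour.
    print(mezzanine_storage_tb(500, 1.5, 100))  # 75.0
```

Swapping in the GB-per-hour figure of a different house format shows immediately how the choice of mezzanine codec moves the storage budget.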

[Figure 1 diagram: pipeline from camera original through the mezzanine master and the QC gate to the encoding ladder and delivery.]
Figure 1. The mezzanine sits between camera-original footage and the delivery ladder: one near-lossless master, checked at the QC gate, that every delivery encode is made from.

IMF: when the master isn't a single file

For premium catalogs there is a more structured kind of master, and it is worth knowing by name because the big distributors require it. The Interoperable Master Format (IMF) is a standard way to package a finished master not as one giant file but as a set of parts plus an instruction sheet. It is defined by the SMPTE ST 2067 family of standards, whose core constraints live in ST 2067-2.

The idea is componentized. Each kind of essence (the video, each audio language, each subtitle track) is stored in its own track file, and a small XML document called a Composition Playlist (CPL) lists which parts play in what order to assemble a given version. The payoff is enormous for anyone who ships the same title in many forms. To make a French version, or a version with one scene re-edited for an airline, you do not re-deliver the whole master; you add a small supplemental package with only the changed parts and a new playlist that points mostly at the files you already have. One master, many versions, almost no duplication: the same "package once, assemble many" principle that governs the audio and subtitle tracks described in the article on audio, subtitles, and accessibility tracks in the ladder.
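The "parts plus an instruction sheet" idea can be sketched as a toy data model. This is only an illustration of the principle, not the ST 2067 schema: the class names, fields, and identifiers below are hypothetical, and a real CPL carries far more (timing, UUIDs, hashes).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrackFile:
    """One essence file: the video, one audio language, or one subtitle track."""
    track_id: str
    kind: str            # "video", "audio", or "subtitle"
    language: str = ""


@dataclass
class Composition:
    """A playlist: the ordered track-file ids that make up one version."""
    title: str
    track_ids: list[str] = field(default_factory=list)


def new_files_needed(version: Composition, delivered: dict[str, TrackFile]) -> list[str]:
    """Return the track ids a supplemental package would have to add."""
    return [tid for tid in version.track_ids if tid not in delivered]


if __name__ == "__main__":
    delivered = {
        "vid-main": TrackFile("vid-main", "video"),
        "aud-en": TrackFile("aud-en", "audio", "en"),
    }
    french = Composition("Feature (French)", ["vid-main", "aud-fr"])
    print(new_files_needed(french, delivered))  # ['aud-fr']
```

The French version reuses the video already on file and adds one audio track plus a new playlist, which is the whole point of the componentized master.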

This is not a niche format. The widely used profile, IMF Application #2E (SMPTE ST 2067-21), carries JPEG 2000 video in both lossless and lossy forms and supports high dynamic range up to 4096×2160, and major distributors standardize on it — Netflix, for example, requires deliveries to comply with one of the 2016, 2020, or 2023 editions of ST 2067-21 Application #2E, and publishes an open-source validator, Photon, to check that an IMF package is well-formed before it is accepted. That last detail is the tell: even the master format ships with its own automated QC, because the cost of accepting a broken master is too high to leave to inspection by eye.

The QC gate: where a broken master gets caught

Now the gate itself. Quality control (QC) in a streaming pipeline is the checkpoint that inspects media for defects and either passes it, flags it, or rejects it. The single most important decision about QC is where you put it, and the answer is: as early as possible, and again after the encode.

The principle is the oldest one in computing — garbage in, garbage out. If a master arrives with a defect and you transcode it straight into a six-rung encoding ladder, you have not made one broken file; you have made six, plus their packaged segments, plus whatever a content delivery network has already cached. So the first QC pass sits at ingest, before the transcode: files are checked the moment they arrive and held out of the transcoding farm until they pass. A second QC pass sits after the encode, to confirm the delivery files themselves are clean and conformant before they go near a viewer.
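The two-gate ordering can be sketched as a small orchestration function. It is a hedged outline, not a real pipeline: `ingest_qc`, `transcode`, and `output_qc` are hypothetical callables you would back with your own QC tool and encoder.

```python
from typing import Callable

# A check takes a file path and returns a list of defect descriptions (empty = clean).
Check = Callable[[str], list[str]]


def run_pipeline(master: str, ingest_qc: Check,
                 transcode: Callable[[str], list[str]], output_qc: Check) -> dict:
    """Gate the master before the transcode, then gate each rung after it."""
    defects = ingest_qc(master)
    if defects:
        # Rejected at the front door: no transcode compute is spent.
        return {"status": "rejected_at_ingest", "defects": defects, "rungs": []}

    rungs = transcode(master)
    failed = {}
    for rung in rungs:
        rung_defects = output_qc(rung)
        if rung_defects:
            failed[rung] = rung_defects
    if failed:
        # Held back before packaging and delivery.
        return {"status": "held_after_encode", "defects": failed, "rungs": rungs}
    return {"status": "released", "defects": [], "rungs": rungs}
```

The design choice worth noticing is that `transcode` is never called when the ingest check returns a defect, so a bad master cannot be multiplied into the ladder.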

Walk the cost of getting this order wrong, with round numbers. Say file-based QC runs at roughly twice real time, so a two-hour master is checked in about an hour of compute — a small, fixed, predictable cost. Now compare the two paths when a master has a stretch of black frames a producer never noticed:

Path A — QC at ingest (correct):
  QC the 2-hour master once        ≈ 1 hour of compute, defect caught
  → reject, fix the master, re-ingest. The ladder is never built on bad input.

Path B — QC skipped at ingest (wrong):
  transcode master → 6-rung ladder    (full transcode compute spent)
  package every rung into segments     (packaging compute spent)
  push to origin + CDN, warm caches    (egress + storage spent)
  viewer complaint → investigate → re-encode all 6 rungs → re-package
  → purge every CDN cache → re-deliver  (all of the above, paid twice)

The defect is identical; the bill is not. Path A pays for one hour of QC. Path B pays for the entire encode-package-deliver chain, twice, plus the support cost and the reputation cost of viewers seeing the defect first. Faster-than-real-time QC at the front door is one of the cheapest insurance policies in the whole platform, and skipping it is one of the most expensive false economies.
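The comparison can be put into numbers with a rough model of the compute spent on the defective master before anyone knows it is defective. Only the 2x QC speed and the six rungs come from the example above; the per-rung transcode and packaging hours are assumptions chosen for illustration, and CDN egress, support, and reputation costs are left out entirely.

```python
def qc_hours(duration_hours: float, speed_vs_realtime: float = 2.0) -> float:
    """Compute hours for one QC pass running faster than real time."""
    return duration_hours / speed_vs_realtime


def wasted_compute_hours(duration_hours: float, rungs: int,
                         transcode_hours_per_rung: float,
                         package_hours_per_rung: float) -> dict:
    """Compute spent on a defective master under each path (assumed unit costs)."""
    path_a = qc_hours(duration_hours)  # one QC pass, defect caught
    path_b = rungs * (transcode_hours_per_rung + package_hours_per_rung)
    return {"path_a": path_a, "path_b": path_b}


if __name__ == "__main__":
    # 2-hour master, 6 rungs; 2.0 h transcode and 0.5 h packaging per rung are hypothetical.
    print(wasted_compute_hours(2.0, 6, 2.0, 0.5))  # {'path_a': 1.0, 'path_b': 15.0}
```

Whatever unit costs you plug in, Path B's waste grows with the number of rungs while Path A's stays at a single QC pass.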

What automated QC actually checks

"Check the file" is vague until you see the list. Automated QC software measures a master or an encode against a template — a named set of pass/fail rules — and the rules fall into a few families. None of this is guesswork: the important ones map to published standards.

On the video side, the checks catch the defects a human would notice but a pipeline would otherwise miss: black frames and freeze frames (unintended stretches of black or motionless picture), blockiness and macroblocking (the chunky artifacts of a bad earlier encode), color-bar test patterns left in by mistake, dead or stuck pixels, wrong field order or cadence, unexpected letterboxing or pillarboxing (black bars), and conformance of the basics — resolution, frame rate, codec, and bit depth — to the spec the file is supposed to meet. A category of its own is legal levels / gamut: video color values that sit outside the broadcast-legal range and will clip or shift on a real display.
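The "conformance of the basics" check is the easiest one to sketch. The example below reads the first video stream with ffprobe and compares it to a house spec; `HOUSE_SPEC` and its values are hypothetical, and a commercial QC template covers far more than these five fields.

```python
import json
import subprocess

# Hypothetical house spec: what every incoming master is expected to be.
HOUSE_SPEC = {
    "codec_name": "prores",
    "width": 1920,
    "height": 1080,
    "r_frame_rate": "25/1",
    "pix_fmt": "yuv422p10le",
}


def probe_video(path: str) -> dict:
    """Run ffprobe and return the properties of the first video stream."""
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height,r_frame_rate,pix_fmt",
        "-of", "json", path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)["streams"][0]


def conformance_failures(stream: dict, spec: dict) -> list[str]:
    """List every field where the file differs from the spec."""
    return [
        f"{key}: expected {want!r}, got {stream.get(key)!r}"
        for key, want in spec.items()
        if stream.get(key) != want
    ]
```

In use, `conformance_failures(probe_video("master.mov"), HOUSE_SPEC)` returns an empty list for a conformant file and a readable list of mismatches otherwise, which is the pass/fail shape a QC template needs.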

The single most important video check from a duty-of-care standpoint is photosensitive epilepsy (PSE) screening — detecting flashing and patterns that can trigger seizures. This is measurable, not subjective. International guidance ITU-R BT.1702 (current edition BT.1702-3, 2023) defines the danger precisely: a harmful flash is a pair of opposing luminance changes of 20 cd/m² or more, where the flashing occupies more than a quarter of the screen at a frequency above 3 Hz. In the United Kingdom, the regulator Ofcom has required every program to be PSE-tested before transmission since 2019, and the long-standing reference tool is the Harding Flash and Pattern Analyser. The United States has no equivalent federal mandate, but any platform serving a broad audience screens for it, because the harm is real and the test is automatable.
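To show why the flash rule is measurable, here is a deliberately simplified sketch that applies only the three numbers quoted above (20 cd/m², a quarter of the screen, 3 flashes per second) to a list of per-frame luminance values. Everything else is an assumption: the input format, the function names, and the way area is passed in. It ignores the other conditions in the guidance (such as red flashes and spatial patterns) and is not a substitute for a certified PSE analyser.

```python
def luminance_transitions(luma: list[float], min_change: float = 20.0) -> list[int]:
    """Return frame indices where luminance reverses by at least min_change cd/m2."""
    transitions: list[int] = []
    if not luma:
        return transitions
    extreme = luma[0]   # most recent peak or trough
    direction = 0       # +1 rising, -1 falling, 0 not yet known
    for i, value in enumerate(luma[1:], start=1):
        delta = value - extreme
        if direction == 0:
            if abs(delta) >= min_change:
                direction = 1 if delta > 0 else -1
                extreme = value
                transitions.append(i)
        elif delta * direction > 0:
            extreme = value          # trend continues: extend the current extreme
        elif abs(delta) >= min_change:
            direction = -direction   # opposing change large enough to count
            extreme = value
            transitions.append(i)
    return transitions


def max_flashes_per_second(luma: list[float], fps: int, min_change: float = 20.0) -> float:
    """Worst-case flash count in any one-second window (a flash = two opposing changes)."""
    transitions = luminance_transitions(luma, min_change)
    worst = 0
    for start in transitions:
        in_window = sum(1 for t in transitions if start <= t < start + fps)
        worst = max(worst, in_window)
    return worst / 2


def is_potentially_harmful(luma: list[float], fps: int, area_fraction: float) -> bool:
    """Flag when flashing covers more than 25% of the screen at more than 3 Hz."""
    return area_fraction > 0.25 and max_flashes_per_second(luma, fps) > 3
```

A sequence that alternates between 10 and 100 cd/m² on every frame at 25 fps is flagged when it fills half the screen and passes when it fills a tenth of it, which matches the intent of the rule as summarized in the text.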

On the audio side, the checks find silence where there should be sound, channels mapped to the wrong speakers, clipping, and (the one viewers feel most) loudness outside the catalog's target. Loudness is measured, not estimated, using the algorithm in ITU-R BS.1770 (current edition BS.1770-5, 2023), whose unit is the LUFS (identical to the LKFS you will see in American documents). Around that measurement sit the targets: −23 LUFS for European broadcast (EBU R128), −24 LKFS for US broadcast (ATSC A/85, the practice behind the CALM Act), and the quieter ≈ −27 LKFS that on-demand streamers commonly use (a vendor-reported practice rather than a published standard). The loudness target decision belongs to your track plan: it is covered in the article on audio, subtitles, and accessibility tracks, and the measurement internals live in our loudness normalization write-up. The enforcement of it lives here, at the QC gate.
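Once a BS.1770 meter has produced an integrated loudness figure, the gate itself is a comparison against the chosen target. The sketch below assumes the measurement already exists (for example from FFmpeg's `ebur128` filter or a QC tool); the profile names and the 1 LU tolerance are assumptions for illustration, since each delivery spec states its own permitted deviation.

```python
# Targets quoted in the text; the dictionary keys are illustrative names.
TARGETS_LUFS = {
    "ebu_r128": -23.0,
    "atsc_a85": -24.0,
    "on_demand": -27.0,
}


def loudness_verdict(measured_lufs: float, profile: str, tolerance_lu: float = 1.0) -> dict:
    """Compare a measured integrated loudness against a target (tolerance is assumed)."""
    target = TARGETS_LUFS[profile]
    deviation = measured_lufs - target
    return {
        "target": target,
        "deviation_lu": round(deviation, 1),
        "passed": abs(deviation) <= tolerance_lu,
    }


if __name__ == "__main__":
    print(loudness_verdict(-19.0, "ebu_r128"))
    # {'target': -23.0, 'deviation_lu': 4.0, 'passed': False}
```

A title mixed at −19 LUFS fails a −23 LUFS gate by 4 LU, which is exactly the "twice as loud on one title as the next" problem turned into a number.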

The defect that spans both lanes is lip-sync, the drift between picture and sound, usually quoted in milliseconds. It is the classic "looks fine in the edit, wrong on delivery" fault, and a dedicated A/V-sync check is the only reliable way to catch it across a whole catalog.

[Figure 2 diagram: a QC gate fanning into video checks, audio checks, and the PSE safety check, each tied to its standard.]
Figure 2. The families of automated QC checks (video integrity, audio integrity, and the photosensitive-epilepsy safety screen), each tied to a named standard.

Baseband versus file-based QC

Two words appear in QC product sheets and confuse newcomers: baseband and file-based. They describe what is being inspected, a live signal or a file on disk, and that in turn decides how fast the check can run.

Baseband QC inspects the live video signal as it plays — the picture coming down a cable in real time, the way a broadcast control room has always monitored. It is real-time by nature: a two-hour program takes two hours to watch, and it needs hardware to tap the signal. File-based QC inspects the file on disk, decoding it as fast as the computer allows. This is the modern default for streaming, for one decisive reason: it can run faster than real time and in parallel. A QC farm can check a hundred files at once, each faster than you could watch it, which is the only way to keep up with a catalog measured in thousands of hours. For an OTT platform building a file-based pipeline, file-based QC is the natural fit; baseband QC remains relevant mainly where a live signal has to be monitored as it airs.
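The "faster than real time and in parallel" property is what a small worker pool captures. This is a sketch under assumptions: `check` is a hypothetical callable that runs one file through your QC tool, and a thread pool is used on the assumption that each check mostly waits on an external process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def qc_many(paths: list[str], check: Callable[[str], bool], workers: int = 8) -> dict[str, bool]:
    """Run one QC check per file across a pool of workers; True means passed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, paths))
    return dict(zip(paths, results))


def content_hours_per_hour(workers: int, speed_vs_realtime: float) -> float:
    """Hours of content a farm clears per wall-clock hour."""
    return workers * speed_vs_realtime


if __name__ == "__main__":
    # A hundred parallel checks at twice real time clear 200 content-hours per hour.
    print(content_hours_per_hour(100, 2.0))  # 200.0
```

A baseband monitor, by contrast, is fixed at one content-hour per wall-clock hour per signal, which is why it cannot keep pace with a large file catalog.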

Automated plus human: the two-pass reality

Here is the part teams get wrong in both directions: they either trust software completely or insist a person watch everything. The right model is two passes, because the two reviewers catch different things.

Automated QC is unbeatable at the measurable and the tedious. It never gets bored on hour seven, it checks every frame rather than sampling, and it turns "is the loudness right?" into a number with a pass/fail line. What it cannot do is judge meaning. Software can confirm a subtitle track exists, is valid, and is in sync; it cannot reliably tell you the translation is wrong, the subtitle covers a face at the worst moment, the wrong episode was ingested under the right name, or a grade looks subtly off in a way no metric flags. Those are jobs for a trained human reviewer — the "eyeball" or "golden-eye" pass — spot-checking the content the way a viewer experiences it.

The economical pattern is to let the machine do the bulk filtering and point the human at what matters: every file gets the full automated pass, and a person reviews the flagged files plus a random sample of the clean ones. Automation gives you coverage and consistency; human review gives you judgment. Neither alone is enough — automated-only ships subtly wrong content with confidence, human-only cannot scale past a few titles a day and still misses the measurable faults a person's eyes gloss over.

QC after the encode: did the ladder survive?

The QC gate at ingest protects the master. A second question matters just as much: did the encode preserve it? Compression is lossy by design, and a ladder rung that is set too aggressively can introduce banding, blocking, or softness that was never in the master. Checking this by eye across every rung of every title does not scale, so the industry uses perceptual quality metrics — numbers that estimate how good an encode looks to a human compared with its source.

The metric that changed this field is VMAF (Video Multi-method Assessment Fusion), an open-source measure Netflix developed and released, and which won a Technology & Engineering Emmy. Unlike the older PSNR and SSIM metrics, which compare pixels mathematically and often disagree with human eyes across different shots and resolutions, VMAF is trained on human quality scores and predicts perceived quality more consistently. It ships inside the common FFmpeg toolchain, so it is within reach of any pipeline. The product use is simple: set a target VMAF score for your ladder, and let post-encode QC flag any rung that falls below it — a content-aware quality floor rather than a guess. This is the same machinery that powers the savings in per-title and context-aware encoding: you can only safely lower a bitrate if a perceptual metric confirms the quality held.
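A post-encode VMAF floor can be sketched in a few lines once each rung has a score. The example assumes the scores come from FFmpeg's `libvmaf` filter writing a JSON log, and that the log has the `pooled_metrics` layout used by recent libvmaf releases; the floor of 93 is a hypothetical value, not a recommendation.

```python
import json

# One way to produce the log (the encode is the first input, the master the second):
#   ffmpeg -i encode.mp4 -i master.mov \
#          -lavfi libvmaf=log_fmt=json:log_path=vmaf.json -f null -
# Lower-resolution rungs need scaling to the master's resolution first.


def pooled_vmaf(log_path: str) -> float:
    """Read the mean VMAF score from a libvmaf JSON log (assumed layout)."""
    with open(log_path) as fh:
        return json.load(fh)["pooled_metrics"]["vmaf"]["mean"]


def rungs_below_floor(scores: dict[str, float], floor: float) -> list[str]:
    """Return the ladder rungs whose VMAF score falls below the quality floor."""
    return sorted(rung for rung, score in scores.items() if score < floor)


if __name__ == "__main__":
    scores = {"1080p": 95.1, "720p": 92.0, "360p": 88.4}
    print(rungs_below_floor(scores, floor=93.0))  # ['360p', '720p']
```

In practice the floor is often set per rung rather than once for the whole ladder, since a 360p rung is not expected to match the master as closely as the top rung.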

QC tools and delivery templates

You do not build these checks from scratch. A mature category of file-based QC software ships the checks, the standards, and the delivery templates — pre-built rule sets matching a specific distributor's spec, so "does this pass Netflix?" becomes one click. The table below compares the main platforms on the two axes that matter for a build-vs-buy decision: whether the tool can automatically correct common faults (not just flag them) and whether it ships ready-made distributor templates.

QC platform	Type	Auto-correct?	Distributor templates?	Best fit
Interra Systems BATON	File-based, AI/ML-assisted	Via add-on (Content Corrector)	Yes — Netflix, iTunes, DPP, CableLabs, ARD-ZDF	High-volume OTT / broadcast libraries
Telestream Vidchecker	File-based	Yes — fixes levels, loudness, gamut, then re-encodes	Yes — Netflix, DPP	Post-production and mid-size delivery
Venera Pulsar	File-based (incl. cloud)	Flag for remediation	Yes — broad template set, PSE module	Cloud-native and hybrid pipelines
Tektronix / Telestream Cerify	File-based	Flag for remediation	Yes	Established broadcast verification
FFprobe + VMAF (open source)	File-based, scriptable	No	No — you build the rules	Lean teams, custom pipelines, output QC

Table 1. File-based QC platforms with the two coverage columns that drive the decision — can it auto-correct, and does it ship the distributor templates you need. Capabilities and template lists are dated (2026) and change; verify against the current vendor docs and your exact delivery specs.

The auto-correct column is more consequential than it looks. A tool that only flags a loudness or gamut error sends the file back to an editor; a tool that corrects the level and re-encodes a compliant file closes the loop without a human round-trip. For a high-volume catalog that distinction is the difference between a QC step that keeps up and one that becomes the bottleneck. Delivery templates matter just as much for anyone feeding third parties: broadcaster and platform specs such as the UK's AS-11 / DPP (an MXF-based air-ready master format with its own mandatory QC checklist) are exacting, and a template that encodes the spec for you removes a whole class of rejected deliveries.
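The difference between flagging and correcting can be illustrated with the simplest possible loudness fix, a static gain offset. This is a sketch of the idea only and does not describe how any vendor's corrector works; the −1 dBTP ceiling is the EBU R128 figure, and the function names are hypothetical.

```python
def correction_gain_db(measured_lufs: float, target_lufs: float) -> float:
    """Static gain that moves the integrated loudness onto the target."""
    return target_lufs - measured_lufs


def can_auto_correct(measured_lufs: float, target_lufs: float,
                     true_peak_dbtp: float, ceiling_dbtp: float = -1.0) -> bool:
    """A plain gain change is safe only if the shifted true peak stays under the ceiling."""
    gain = correction_gain_db(measured_lufs, target_lufs)
    return true_peak_dbtp + gain <= ceiling_dbtp


if __name__ == "__main__":
    # Too loud: turning it down by 4 dB also lowers the peaks, so it can be fixed in place.
    print(correction_gain_db(-19.0, -23.0), can_auto_correct(-19.0, -23.0, -0.5))
    # -4.0 True
```

A master that is too quiet is the harder case: raising it can push peaks over the ceiling, at which point a limiter or a trip back to the mixer is needed rather than a one-step fix.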

A common mistake: QC the output, never the source

The most common QC failure in a young platform is to test only the delivery files and never the master. It feels reasonable — the delivery files are what viewers get — but it inspects too late. By the time you QC the output, you have already spent the transcode and packaging compute, and if the fault was in the master it now exists in every rung you have to redo. The discipline is to QC the master at ingest and the encodes after, so a bad source is rejected before it is multiplied and a bad encode is caught before it is delivered.

Three related faults travel with it. The first is automated-only QC: trusting the software's green light and never having a person watch, which ships the wrong-episode, bad-subtitle, off-grade defects no metric catches. The second is no house format: ingesting every codec a supplier sends, so each title needs different handling and the QC templates never stabilize — pick one mezzanine format and require it. The third is no loudness gate: letting masters arrive at whatever loudness their mix happened to have, so viewers ride the volume control between titles even though loudness is one of the most measurable, most automatable checks there is. Each of these is cheap to design in at the gate and expensive to retrofit across a live catalog.

Where Fora Soft fits in

The QC gate and the mezzanine workflow are where a streaming catalog's reliability, compliance, and re-encode-ability are quietly decided. Engineering them to scale means a single house mezzanine format, an ingest gate that rejects a bad master before it is multiplied, automated checks tied to ITU-R BT.1702, ITU-R BS.1770, and the distributor specs, post-encode VMAF floors, and a human review pass pointed at what metrics miss. That is the difference between a catalog that looks professional at ten thousand hours and one that ships its defects to viewers. Fora Soft has built video streaming, OTT/Internet TV, e-learning, telemedicine, and video surveillance software since 2005, across 625+ shipped projects for 400+ clients, and that work centers on exactly this kind of ingest-to-delivery pipeline engineering: media-handling workflows that validate, transcode, and package content at catalog scale. When a media company needs a QC and mastering pipeline that holds up under a real, growing library, that workflow engineering is the capability we bring.

Call to action

Talk to a streaming engineer: book a 30-minute scoping call to talk through your encoding QC plan.
See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
Download the Ingest QC Gate Checklist — A one-page checklist to design the QC gate before you ask a vendor for a quote: pick one house mezzanine format, define the automated video and audio check set, set the photosensitive-epilepsy (ITU-R BT.1702) and loudness (ITU-R….

References

SMPTE RDD 36:2015 — Apple ProRes Bitstream Syntax and Decoding Process — Society of Motion Picture and Television Engineers. The published standard for the ProRes mezzanine codec's bitstream and decoding, establishing ProRes as an open, documented intermediate format. Tier 1 (official standard). https://ieeexplore.ieee.org/document/7290956 (accessed 2026-06-16)
SMPTE ST 2019 (VC-3 / DNxHD) and ST 2067-70 (VC-3 in IMF) — SMPTE. The standardized form of the Avid DNxHD/DNxHR mezzanine codec family. Tier 1. https://www.smpte.org/blog/smpte-and-avid-publish-new-st-2067-70-standard-for-the-vc-3-codec-in-interoperable-master-format-imf (accessed 2026-06-16)
SMPTE ST 2067 — Interoperable Master Format (IMF) — SMPTE. The suite of standards defining IMF; core constraints in ST 2067-2; Application #2E in ST 2067-21 (2016/2020/2023 editions) carrying JPEG 2000 essence and HDR up to 4096×2160. Tier 1. https://www.smpte.org/standards/st2067 (accessed 2026-06-16)
Recommendation ITU-R BT.1702 — Guidance for the reduction of photosensitive epileptic seizures caused by television — ITU-R (current edition BT.1702-3, November 2023). Defines a harmful flash (≥ 20 cd/m² opposing luminance change, > 25% of screen, > 3 Hz). Tier 1. https://www.itu.int/rec/R-REC-BT.1702 (accessed 2026-06-16)
Recommendation ITU-R BS.1770 — Algorithms to measure audio programme loudness and true-peak audio level — ITU-R (current edition BS.1770-5, November 2023). The K-weighted loudness algorithm and the LUFS/LKFS unit that every loudness QC check measures against. Tier 1. https://www.itu.int/rec/R-REC-BS.1770 (accessed 2026-06-16)
EBU R 128 — Loudness normalisation and permitted maximum level of audio signals — European Broadcasting Union. The −23 LUFS integrated target and −1 dBTP true-peak ceiling enforced at the audio QC gate for European delivery. Tier 2 (standards-body recommendation). https://tech.ebu.ch/publications/r128 (accessed 2026-06-16)
ATSC A/85 — Techniques for Establishing and Maintaining Audio Loudness for Digital Television — Advanced Television Systems Committee. The −24 LKFS US broadcast target behind the CALM Act. Tier 2. https://www.atsc.org/atsc-documents/a85-techniques-for-establishing-and-maintaining-audio-loudness-for-digital-television/ (accessed 2026-06-16)
AMWA AS-11 / DPP UK HD Delivery Specification — Advanced Media Workflow Association / Digital Production Partnership. The MXF-based air-ready master format and the QC requirements that accompany a valid AS-11 DPP delivery. Tier 2 (industry-body specification). https://amwa-tv.github.io/AS-11_UK_DPP_HD/AMWA_AS_11_UK_DPP_HD.html (accessed 2026-06-16)
VMAF: Video Multi-method Assessment Fusion — Netflix (open source). The perceptual quality metric (with PSNR, SSIM, MS-SSIM implementations) used for post-encode quality QC; ships in FFmpeg via libvmaf. Tier 3 (first-party engineering / open-source). https://github.com/Netflix/vmaf (accessed 2026-06-16)
Photon — IMF package validator — Netflix (open source). A reference implementation that validates IMF packages against ST 2067, illustrating automated QC of the master format itself. Tier 3. https://github.com/Netflix/photon (accessed 2026-06-16)
BATON automated QC platform — Interra Systems. File-based, AI/ML-assisted QC with distributor test plans (Netflix, iTunes, DPP, CableLabs, ARD-ZDF) and the Content Corrector module. Tier 3 (vendor engineering material, dated 2026). https://www.interrasystems.com/Media-QC.php (accessed 2026-06-16)
Vidchecker — automated file-based QC and correction — Telestream. File-based QC that auto-corrects levels, loudness, and gamut and re-encodes a compliant file; ships Netflix and DPP templates. Tier 3 (vendor material, dated 2026). https://www.telestream.net/vidchecker/overview.htm (accessed 2026-06-16)

Source note (per §4.3.2): the mezzanine codecs and master format trace to tier-1 SMPTE standards (refs 1–3); the two safety-and-conformance measurements that anchor automated QC — photosensitive-epilepsy flashing and loudness — trace to tier-1 ITU-R Recommendations BT.1702-3 and BS.1770-5 (refs 4–5). Specific loudness targets are EBU/ATSC recommendations (refs 6–7) and the delivery spec is AMWA/DPP (ref 8); the −27 LKFS on-demand practice is vendor-reported and labelled in-text. Perceptual-quality and validation tooling (VMAF, Photon) and the commercial QC platforms (refs 9–12) are first-party/vendor sources, dated and used for "what actually ships," never to override a standard.

Related glossary terms

Mezzanine

Ingest

Codec

VMAF

Encoding ladder

AV1

Transcoding

Bitrate