Published 2026-06-01 · 18 min read · By Nikolay Sapunov, CEO at Fora Soft

Why This Matters

If your product turns a prompt into a video clip, you are running a small factory that can produce something embarrassing, something expensive, or something illegal — sometimes all three from a single button press. The teams that ship generative video features and survive are not the ones with the best model; they are the ones with the best gates around an average model. This lesson is written for the product manager, founder, or engineering lead who has wired a generative video API into a product and now has to make it safe to operate at scale. It is the control-layer companion to the generative video landscape, which covers the models, and to the AI video editor tools lesson, which covers the editing pipeline those gates wrap around. The cost math here builds directly on the real cost of AI in video lesson.

Three Gates Around One Model

Start with the mental model that organizes everything below. A generative video feature is not just "model in, video out." It is a model wrapped in three checkpoints, each of which can stop a clip from reaching the user. Picture an airport: a plane (the clip) cannot reach the gate (the user) without passing security (quality), a ticket check (cost), and customs (disclosure). Skip any one and you have a problem waiting to happen.

The first checkpoint is the quality gate. After the model produces a clip, something has to decide whether the clip is good enough to ship. Is the face intact, or did it melt? Did the text in the scene come out as gibberish? Is the motion smooth, or does the video flicker? A quality gate is the automated judge that answers these questions and either passes the clip, sends it back for another try, or routes it to a human.

The second checkpoint is the cost gate. Generative video is metered, and the meter runs fast. A cost gate is the budget guard that stops a single request, a single user, or a single day from spending more than you decided to allow. Without it, one retry loop or one abusive user can produce a bill that arrives before your alert does.

The third checkpoint is the disclosure gate. Before the clip leaves your system, it must be marked as AI-generated — embedded with a signed record of what made it, stamped with an invisible watermark, and logged on your server. This is the checkpoint the law now cares about, and it is the one teams forget until a regulator or a journalist asks.

Hold this three-gate picture in your head. The rest of the lesson builds each gate in turn, then shows how they fit together as stations in one pipeline.

Three-gate pipeline diagram. A generation model on the left feeds a clip into three sequential checkpoints: Quality gate (automated judge: pass, retry, or send to human), Cost gate (budget guard: per-request, per-user, per-day caps), and Disclosure gate (C2PA manifest, invisible watermark, server log). A finished, marked clip exits on the right to the user. Each gate box lists what it checks and what it can do to a failing clip.

Figure 1. The three gates every generative-video feature needs. The model is the easy part; the gates are what make it safe to operate.

Gate One: Quality — The Automated Judge

The quality gate answers one question: is this clip good enough to show a user without a human looking at it first? You need this because generative video fails in ways that are obvious to a person and invisible to the model that made it. The model thinks it succeeded; the face it rendered has three eyes.

Why you cannot just trust the model

A generative model returns a clip and a quiet implication that the clip is fine. It has no idea whether it is fine. It cannot see that the hands have six fingers, that a brand logo got mangled into a near-miss of a real trademark, or that the requested "calm product demo" came out frantic. Left ungated, these failures reach users, and on a platform that publishes them, they reach the public.

So you add a judge between the model and the user. The judge is itself a model — usually a vision-language model, the kind that can look at frames and answer questions about them, which we cover in the closed-frontier VLM lesson. You show it the generated clip and ask specific questions: does this match the prompt, are faces and hands intact, is there readable text and is it correct, is the motion stable. This pattern is often called "LLM-as-judge," and for video it means feeding sampled frames to a multimodal model and reading back a structured verdict.
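As a rough illustration of what "sampled frames in, structured verdict out" could look like, the sketch below picks evenly spaced frame indices and parses a judge's JSON reply. The rubric keys, the JSON shape, and the function names are assumptions made for this example, not any vendor's API, and the call to the multimodal model itself is left out.

```python
import json
from dataclasses import dataclass

# Hypothetical rubric keys. The lesson names the questions, not a schema.
RUBRIC = ("matches_prompt", "faces_hands_intact", "text_correct", "motion_stable")


def sample_frame_indices(total_frames: int, n: int) -> list[int]:
    """Pick n evenly spaced frame indices (the midpoints of n equal segments)."""
    if total_frames <= 0 or n <= 0:
        return []
    n = min(n, total_frames)
    return [int((i + 0.5) * total_frames / n) for i in range(n)]


@dataclass
class Verdict:
    passed: bool
    reasons: list[str]  # rubric items that did not pass


def parse_verdict(raw_reply: str) -> Verdict:
    """Parse the judge's reply. Malformed or incomplete replies count as a fail."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return Verdict(False, ["judge reply was not valid JSON"])
    if not isinstance(data, dict):
        return Verdict(False, ["judge reply was not a JSON object"])
    # A rubric item passes only if the judge answered exactly `true` for it.
    failed = [key for key in RUBRIC if data.get(key) is not True]
    return Verdict(passed=not failed, reasons=failed)
```

Treating a missing or non-boolean answer as a failure keeps the judge conservative: a reply the code cannot read never turns into a pass.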

Three layers of check, cheap to expensive

A good quality gate is not one check but a stack, ordered cheapest-first so you spend the most only on clips that survive the cheap checks.

The first layer is technical validation — the free checks. Did the file decode? Is it the right length, resolution, and frame rate? Is the average brightness in a sane range, so you catch all-black or all-white renders? These are arithmetic on the pixels and metadata; they cost almost nothing and catch the grossest failures.

The second layer is automated perceptual scoring — cheap model checks. Run a small model to detect faces and count fingers, or compute a no-reference quality score that estimates how "natural" the video looks without needing an original to compare against. This catches the melted-face class of failure.

The third layer is the VLM judge — the expensive check. Sample a handful of frames, send them to a multimodal model with your prompt and a rubric, and read back a pass/fail with reasons. You only reach this layer for clips that passed the two cheaper ones, so you pay for the smart judge sparingly.
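The cheapest-first ordering can be sketched as a list of checks that stops at the first failure, so a later, more expensive layer is never called for a clip an earlier layer already rejected. The `ClipInfo` fields, the tolerance values, and the function names below are illustrative assumptions; only the first (technical) layer is spelled out, and the perceptual and VLM layers would be further entries in the same list.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClipInfo:
    decoded: bool
    duration_s: float
    width: int
    height: int
    fps: float
    mean_brightness: float  # 0.0 = all black, 1.0 = all white


def technical_check(clip: ClipInfo, want_duration_s: float = 5.0,
                    want_size: tuple[int, int] = (1280, 720),
                    want_fps: float = 24.0) -> list[str]:
    """Layer one: arithmetic on metadata and pixels. Returns a list of problems."""
    if not clip.decoded:
        return ["file did not decode"]
    problems = []
    if abs(clip.duration_s - want_duration_s) > 0.25:  # assumed tolerance
        problems.append("wrong length")
    if (clip.width, clip.height) != want_size:
        problems.append("wrong resolution")
    if abs(clip.fps - want_fps) > 0.5:  # assumed tolerance
        problems.append("wrong frame rate")
    if not 0.05 <= clip.mean_brightness <= 0.95:  # assumed sane range
        problems.append("render looks all-black or all-white")
    return problems


Check = Callable[[ClipInfo], list[str]]


def run_layers(clip: ClipInfo, layers: list[tuple[str, Check]]) -> tuple[bool, str, list[str]]:
    """Run checks in the given order and stop at the first layer that reports problems."""
    for name, check in layers:
        problems = check(clip)
        if problems:
            return False, name, problems
    return True, "", []
```

Passing the layers as an ordered list keeps the cost ordering explicit: whoever configures the gate decides which check is cheap enough to run on every clip.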

The arithmetic of a retry budget

Here is the math that makes a quality gate safe instead of ruinous. Say a clip costs $0.50 to generate and your model passes the quality gate 70% of the time on the first try. If you blindly retry until you get a pass, the average number of attempts is one divided by the pass rate:

average attempts = 1 / 0.70 = 1.43 attempts
average cost     = 1.43 × $0.50 = $0.71 per delivered clip

That is fine. Now imagine the prompt is hard and the pass rate drops to 20%:

average attempts = 1 / 0.20 = 5.0 attempts
average cost     = 5.0 × $0.50 = $2.50 per delivered clip

The same feature now costs about three and a half times as much per delivered clip, and a small slice of pathological prompts can drive the average far higher. Practitioner reports from 2026 describe exactly this: a clip expected to cost $1.50 ballooning to $5 after five regeneration attempts. The fix is a hard cap: try at most N times, then stop and route to a human or return a clear failure. A cap of N bounds the worst case for any single request at N times the generation cost, whatever the pass rate. The quality gate and the cost gate are joined at the hip: the retry loop is exactly where an unbounded quality gate becomes an unbounded bill.
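To see what a cap does to the numbers above, here is a small sketch that assumes each attempt passes independently with the same probability (the same assumption behind the 1 / pass-rate formula). The function name and return keys are made up for this example.

```python
def capped_retry_stats(cost_per_attempt: float, pass_rate: float,
                       max_attempts: int) -> dict[str, float]:
    """Spend statistics for 'retry until pass, but at most max_attempts times'."""
    if not 0.0 < pass_rate <= 1.0:
        raise ValueError("pass_rate must be in (0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    p_all_fail = (1.0 - pass_rate) ** max_attempts
    p_delivered = 1.0 - p_all_fail
    # Attempt k happens only if the first k-1 attempts failed, so the expected
    # count is the sum of (1 - p)^(k-1) for k = 1..N, which is (1 - (1-p)^N) / p.
    expected_attempts = p_delivered / pass_rate
    return {
        "p_delivered": p_delivered,
        "expected_attempts": expected_attempts,
        "expected_spend": expected_attempts * cost_per_attempt,
        "worst_case_spend": max_attempts * cost_per_attempt,
    }
```

With a $0.50 generation cost, a 20% pass rate, and a cap of three, the worst case per request is $1.50, the expected spend is about $1.22, and roughly 49% of requests end in a delivered clip; the rest need the human or failure path.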

Common mistake: an unbounded retry loop with no human exit. The most expensive bug in generative video is the loop that says "if the clip fails quality, generate again" with no maximum and no human fallback. A handful of prompts the model simply cannot satisfy will retry forever, each retry billing you, until either your budget alert fires or a user gives up. Always cap retries at a small number — three is a common choice — and when the cap is hit, fail loudly or escalate to a person. A clip that needs five tries is telling you the prompt is wrong or the model is the wrong tool; spending more money will not fix it.
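A bounded version of that loop might look like the following sketch. `generate` and `judge` are placeholders for your own model call and quality gate, and the status strings and default values are assumptions for illustration.

```python
from typing import Any, Callable


def generate_with_cap(generate: Callable[[], Any],
                      judge: Callable[[Any], bool],
                      max_attempts: int = 3,
                      cost_per_attempt: float = 0.50) -> dict[str, Any]:
    """Try at most max_attempts times, then escalate instead of retrying forever."""
    spent = 0.0
    for attempt in range(1, max_attempts + 1):
        clip = generate()          # every call here is billed
        spent += cost_per_attempt
        if judge(clip):
            return {"status": "delivered", "clip": clip,
                    "attempts": attempt, "spent": spent}
    # Cap reached: fail loudly and hand the case to a person.
    return {"status": "escalated", "clip": None,
            "attempts": max_attempts, "spent": spent}
```

The important property is that the loop has exactly two exits, a passing clip or an explicit escalation, and neither depends on a budget alert firing in time.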

Gate Two: Cost — The Budget Guard

The cost gate exists because generative video is the most expensive thing your product does per unit of output, and the cost is variable, not fixed. You are not buying a flat subscription; you are paying per second of video, per generation, per retry. A budget guard turns an open-ended meter into a bounded one.

Three caps at three scopes

A complete cost gate enforces a limit at three levels, because spend can run away at any of them.

The per-request cap bounds a single generation: a maximum clip length, a maximum resolution, and the retry limit from the quality gate. This stops one request from being unexpectedly huge.

The per-user cap bounds what one account can spend in a window — a daily or monthly credit balance. This is what protects you from one heavy user, or one abusive one, draining your margin. It is also why nearly every consumer generative tool meters you in "credits" rather than charging flat: credits are the per-user cost gate, exposed to the customer.

The per-system cap bounds your total spend across all users in a window, with an alert and an automatic throttle when you approach it. This is the circuit breaker that keeps a viral spike or a bug from producing a five-figure surprise on your cloud bill overnight.
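The three scopes can be checked in one place before any generation is started. The sketch below uses invented limits and field names (`Caps`, `throttle_at`, and so on); the real numbers come from your own margins and traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caps:
    max_seconds: float = 10.0          # per-request: clip length
    max_height: int = 1080             # per-request: resolution
    user_credits_per_month: int = 16   # per-user: credit grant
    system_budget_usd: float = 5000.0  # per-system: total spend in the window
    throttle_at: float = 0.9           # throttle at 90% of the system budget


def check_caps(seconds: float, height: int, user_credits_used: int,
               system_spend_usd: float, caps: Caps = Caps()) -> tuple[bool, str]:
    """Return (allowed, reason). Runs before any generation is started."""
    if seconds > caps.max_seconds or height > caps.max_height:
        return False, "per-request cap"
    if user_credits_used >= caps.user_credits_per_month:
        return False, "per-user cap"
    if system_spend_usd >= caps.throttle_at * caps.system_budget_usd:
        return False, "per-system throttle"
    return True, "ok"
```

Returning the reason alongside the decision lets the product show the user a specific message (out of credits, clip too long) while the system-level throttle can page an engineer instead.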

A worked example of the budget that protects margin

Walk one concrete case. Your product charges users a flat $20 a month and lets them generate video clips. Each clip costs you $0.50 to produce after quality-gate retries. To keep a healthy margin, you decide each user's generation cost should not exceed $8 a month — leaving $12 to cover everything else and profit.

monthly cost budget per user = $8.00
cost per delivered clip      = $0.50
clips allowed per user       = $8.00 / $0.50 = 16 clips per month

So you grant each user 16 credits a month. The per-user cap is not an arbitrary number; it is your margin target divided by your unit cost. When your unit cost rises — a model price increase, a harder average prompt, a worse pass rate — the same formula tells you to lower the credit grant or raise the price. The cost gate is where your unit economics become a rule the system enforces, instead of a number you discover at the end of the month. We go deep on this calculation in the cost-model lesson.
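The same division can be written as a one-line function, which makes it easy to rerun whenever the unit cost changes. The function name is hypothetical, and `Decimal` is used only to keep the currency arithmetic exact.

```python
from decimal import Decimal


def credits_per_user(monthly_cost_budget: Decimal,
                     cost_per_delivered_clip: Decimal) -> int:
    """Whole clips a user can generate without exceeding the cost budget."""
    if cost_per_delivered_clip <= 0:
        raise ValueError("cost_per_delivered_clip must be positive")
    # Floor division: a partial clip does not count as a credit.
    return int(monthly_cost_budget // cost_per_delivered_clip)


# The worked example: $8.00 budget at $0.50 per delivered clip.
grant = credits_per_user(Decimal("8.00"), Decimal("0.50"))  # 16
```

If the cost per delivered clip rose to $0.71, the same $8 budget would cover 11 clips, which is the signal to lower the grant or raise the price.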

Cost gate diagram showing three nested budget scopes. An outer box labelled per-system cap (total monthly spend, alert and throttle at threshold) contains a middle box labelled per-user cap (credits per account per window), which contains an inner box labelled per-request cap (max length, max resolution, max retries). A worked formula panel on the right shows margin target divided by unit cost equals credits per user, with the 8 dollars divided by 0.50 equals 16 clips example. Arrows show a request being checked against all three caps before generation proceeds.

Figure 2. The three nested cost caps. The per-user credit number is your margin target divided by your unit cost, not a guess.

Gate Three: Disclosure — The Law Walks Onto Your Pipeline

The third gate is the one that turned from good practice into legal obligation. From 2 August 2026, the European Union's AI Act, Article 50, sets transparency rules for AI-generated content, and two of its clauses land squarely on any product that generates or manipulates video.

What Article 50 actually says

Article 50 is short, and the two parts that matter to you are worth reading in plain terms.

Paragraph 2 — the marking obligation, on providers. Anyone who provides an AI system that generates synthetic audio, image, video, or text must ensure the outputs are "marked in a machine-readable format and detectable as artificially generated or manipulated." The marking must be effective, interoperable, robust, and reliable "as far as this is technically feasible." There is a carve-out: the obligation does not apply where the AI performs only an assistive function for standard editing, or does not substantially alter the input. So a tool that nudges color balance is exempt; a tool that generates a new scene is not.

Paragraph 4 — the disclosure obligation, on deployers. Anyone who deploys an AI system that generates or manipulates video constituting a "deep fake" must disclose that the content has been artificially generated or manipulated. The law defines a deepfake as content resembling real persons, places, or events that would falsely appear authentic. There is a lighter touch for evidently artistic or satirical work — you still disclose, but in a way that does not spoil the piece.

Read these against a real product. If your app generates a talking-head presenter from a script (the avatar and lip-sync tools we cover in the Tavus, HeyGen, and Synthesia lesson), paragraph 4 applies, because that is a deepfake, and you must disclose it. If your platform offers "generate a video from this prompt," paragraph 2 applies: you must mark every output as AI-generated in a machine-readable format. The "provider" and "deployer" roles are not mutually exclusive; both can apply to you at once if you build on a model and ship to end users.

The price of getting it wrong

Article 50 is not advisory. Under Article 99, the penalties article, breaching the transparency obligations of Article 50 sits in the tier that reaches up to 15 million euros or 3% of total worldwide annual turnover, whichever is higher. The two figures cross at 500 million euros of turnover: below that the 15-million ceiling is the higher one; above it the percentage is. For start-ups and small businesses, Article 99(6) flips the rule so that the lower of the two amounts applies, but "lower" here is still a number that ends a runway. Treat disclosure as a feature with a deadline, not a nice-to-have.
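The ceiling arithmetic is simple enough to write down. The sketch below encodes only the two rules stated above (whichever is higher in general, whichever is lower for SMEs and start-ups); the function and constant names are illustrative, and the result is an upper bound on a fine, not a prediction of one.

```python
FIXED_CEILING_EUR = 15_000_000
TURNOVER_SHARE_PERCENT = 3


def article_50_fine_ceiling(worldwide_turnover_eur: int, is_sme: bool) -> int:
    """Upper bound of the Article 99 tier that covers Article 50 breaches."""
    turnover_share = worldwide_turnover_eur * TURNOVER_SHARE_PERCENT // 100
    if is_sme:
        # SMEs and start-ups: the lower of the two amounts applies.
        return min(FIXED_CEILING_EUR, turnover_share)
    # Everyone else: the higher of the two amounts applies.
    return max(FIXED_CEILING_EUR, turnover_share)
```

For example, a non-SME with 1 billion euros of turnover faces a ceiling of 30 million euros, while an SME with 10 million euros of turnover faces a ceiling of 300,000 euros.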

The mechanism: C2PA, watermark, log

The law says "mark it" without prescribing exactly how, but the European Commission has been drafting a Code of Practice on marking and labelling AI-generated content to fill that gap. The Code's drafting ran from a kickoff in November 2025, through a first draft on 17 December 2025 and a second draft on 3 March 2026, toward a final version expected in mid-2026. While the Code is voluntary, it is the benchmark regulators and courts will reach for, so the mechanisms it centers on are the mechanisms to build.

Those mechanisms come in three layers, and a dependable disclosure gate uses all three because each covers the others' weakness.

The first layer is C2PA Content Credentials — the signed manifest. C2PA, from the Coalition for Content Provenance and Authenticity, defines a standard "manifest" that travels inside the media file. The manifest is a record: which AI system generated or edited the content, when, and which organization cryptographically signed the claim. Anyone can open the file and verify the signature, the way you verify a website's certificate. The latest technical specification is version 2.4. The manifest is attached with a "hard binding" — a SHA-256 hash over the file's bytes, so any tampering breaks the seal and is detectable.
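The tamper-evidence idea behind a hard binding can be shown with nothing more than a hash comparison. This is a toy illustration using Python's standard `hashlib`, not the C2PA manifest format or any C2PA SDK: a real manifest is also cryptographically signed and has to account for the manifest living inside the file it describes, both of which this sketch leaves out.

```python
import hashlib


def hard_binding(asset_bytes: bytes) -> str:
    """SHA-256 digest of the asset's bytes, as a hex string."""
    return hashlib.sha256(asset_bytes).hexdigest()


def binding_intact(asset_bytes: bytes, recorded_hash: str) -> bool:
    """True only if the bytes still hash to the value recorded at signing time."""
    return hard_binding(asset_bytes) == recorded_hash


# Record the hash when the clip is produced...
clip = b"stand-in for the encoded video bytes"
recorded = hard_binding(clip)

# ...and any later change to the bytes, even one byte, breaks the match.
tampered = clip + b"\x00"
```

This also shows why the binding is brittle: a re-encode changes the bytes just as surely as tampering does, so the comparison fails either way.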

The second layer is an invisible watermark — the durable backup. A hard binding is exact but brittle: re-encode the video, crop it, or screenshot a frame, and the embedded manifest can be stripped. So C2PA adds the concept of a "soft binding" — an imperceptible watermark or perceptual fingerprint woven into the pixels that survives compression, cropping, and format changes. Google's SynthID is the best-known example for video: a frame-level watermark added at generation time, detectable by a matching network, which by 2026 had marked more than ten billion pieces of content and shipped by default across Google's generative products. When the manifest is gone, the watermark still says "AI-made," and can even point back to a manifest stored in the cloud.

The third layer is a server-side log — the fallback of record. Independent of what survives in the file, you keep your own immutable record: this clip, generated by this model, for this user, at this time, with this prompt, disclosed in this way. When a regulator asks "can you show that you disclosed," the log is your evidence. It is also where your consent record lives — proof that the person whose likeness or voice was used agreed to it.
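One common way to make a log tamper-evident is to chain entries by hash, so that editing an old record invalidates everything after it. The sketch below is one possible approach under that assumption, kept in memory for brevity; the record fields and function names are hypothetical, and a production log would also need durable, access-controlled storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    # sort_keys gives a stable serialization, so the same record always hashes the same.
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], record: dict) -> None:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    stored = dict(record)  # copy so later changes to the caller's dict do not leak in
    log.append({"prev": prev_hash, "record": stored,
                "hash": _entry_hash(prev_hash, stored)})


def chain_intact(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry makes this False."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A record here would carry the fields the paragraph lists, for example clip ID, model, user, timestamp, prompt, disclosure method, and a reference to the consent record.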

Disclosure layers diagram. A generated video clip in the center is wrapped by three concentric protective layers. Innermost: invisible watermark (SynthID-style, survives crop, compression, re-encode). Middle: C2PA Content Credentials manifest (signed record of generator, time, signer; hard-binding SHA-256 hash). Outer: server-side log (immutable record of clip, model, user, prompt, consent, disclosure). A side panel maps each layer to what it survives and what it proves, and notes the EU AI Act Article 50 deadline of 2 August 2026 with the 15M euro / 3 percent penalty.

Figure 3. The three disclosure layers. Each covers the others' blind spot: the watermark survives when the manifest is stripped; the log survives when both are.

The user-facing half: a label and a consent record

Machine-readable marking satisfies paragraph 2, but paragraph 4 is about a human understanding that what they are watching is synthetic. So disclosure has a visible half too: a clear label on or beside the video — "AI-generated" — shown, as the law puts it, "at the latest at the time of the first interaction or exposure." For a deepfake of a real person, you also need the consent that made it lawful to create in the first place, captured and stored before generation, not after. The same consent-and-biometrics discipline shows up wherever a product handles a real person's face — the consent engineering we cover in the face detection and recognition under the EU AI Act lesson. The engineering rule is simple: capture consent at the start of the flow, mark and log at the end, and surface the label whenever the clip plays.

Putting The Three Gates Together

The three gates are not separate products bolted on at the end. They are stations in the same pipeline, in a fixed order, and the order matters.

A request arrives. First the cost gate's per-request and per-user caps check it — is this user within budget, is the request within size limits? If not, stop here, before spending a cent on generation. Then the model generates. Then the quality gate judges — pass, or retry within the capped budget, or escalate to a human. Only a clip that passes quality moves on. Then the disclosure gate marks it — embed the C2PA manifest, apply the watermark, write the log, attach the visible label. Only then does the clip reach the user. A clip that fails any gate never becomes a problem, because it never reaches an audience.

There is a discipline here worth naming. Each gate fails closed, not open. If the quality judge is unavailable, the safe default is to hold the clip for a human, not to ship it unchecked. If the disclosure service is down, the safe default is to not release the clip, not to release it unmarked. A gate that fails open is not a gate.
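The ordering and the fail-closed rule can be sketched together in one function. Every callable passed in is a stand-in for your own service, and `GateUnavailable` and the status strings are names invented for this example.

```python
from typing import Any, Callable, Optional


class GateUnavailable(Exception):
    """Raised by a gate when its backing service cannot be reached."""


def run_pipeline(request: Any,
                 within_budget: Callable[[Any], bool],
                 generate: Callable[[Any], Any],
                 judge: Callable[[Any], bool],
                 disclose: Callable[[Any], Any],
                 max_attempts: int = 3) -> tuple[str, Optional[Any]]:
    """Cost first, then generate and judge (capped), then disclose. Fails closed."""
    try:
        if not within_budget(request):
            return "rejected", None
    except GateUnavailable:
        return "rejected", None          # cannot confirm budget: do not spend
    for _ in range(max_attempts):
        clip = generate(request)
        try:
            passed = judge(clip)
        except GateUnavailable:
            return "held_for_human", None  # judge down: never ship unchecked
        if passed:
            break
    else:
        return "escalated", None         # retry cap hit without a pass
    try:
        marked = disclose(clip)
    except GateUnavailable:
        return "held_unreleased", None   # marking down: never ship unmarked
    return "released", marked
```

The only path that returns a clip is the last line, after all three gates have answered; every error path returns `None` in its place.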

Decision-flow diagram of the full pipeline. Left to right: a request enters; first diamond checks cost caps (within budget?) with a fail path to reject; then a generate box; then a diamond for the quality gate (pass?) with a fail path that loops back to generate up to the retry cap, and an escalate-to-human path when the cap is hit; then a disclosure box that embeds C2PA, applies watermark, writes log, attaches label; then the marked clip reaches the user. A note states every gate fails closed: when a gate's service is unavailable, the clip is held, never shipped unchecked.

Figure 4. The three gates as one ordered pipeline. Cost checks first (before you spend), quality second, disclosure last, and every gate fails closed.

Build, Buy, Or Inherit Each Gate

You do not have to build all three gates from scratch, and you should not assume your model vendor built them for you.

Some of this you inherit from the model provider. If you generate video through Google's Veo, the SynthID watermark is applied at the source — you inherit the watermark layer. Many frontier providers now emit C2PA manifests automatically. Your job there is to verify the vendor emits what you need and not to strip it downstream.

Some of this you buy. Content-authenticity tooling, watermark-detection services, and quality-scoring APIs exist as products. For a small team, buying the disclosure and quality layers can be faster than building them.

Some of this you build, always. The cost gate is yours — only you know your margins and your abuse patterns. The server-side log is yours — it is your evidence and your consent record. The wiring that makes the three gates fail closed in the right order is yours. The judgment of where each threshold sits — the retry cap, the credit grant, the quality bar — is a product decision no vendor makes for you.

The common failure is assuming the model vendor handled disclosure end-to-end. They handle their layer; the visible label, the consent record, the log, and the obligation to not strip the marking are yours as the deployer. Article 50 puts duties on both the provider and the deployer, and shipping to end users usually makes you the deployer.

Where Fora Soft Fits In

We build the video products these gates protect — video conferencing, OTT and Internet TV platforms, e-learning systems, telemedicine apps, and video surveillance software. When a client adds a generative feature — an AI presenter in an e-learning course, a synthetic b-roll generator for an OTT library, an avatar in a conferencing tool — the model is the smallest part of the work. The work is the three gates: a quality judge tuned to the client's content, a cost gate that protects their unit economics, and a disclosure layer that embeds C2PA, applies a watermark, logs consent, and shows the right label for the vertical and jurisdiction. We treat disclosure as a pipeline station designed in from the start, because in regulated verticals like telemedicine, retrofitting it after launch is the expensive path.

Talk To Us / See Our Work / Download

  • Talk to a video engineer — bring us your generative video feature and we will design the three gates around your content, margins, and jurisdiction. Book a 30-minute scoping call.
  • See our case studies — video conferencing, OTT, e-learning, and telemedicine products we have shipped since 2005. View the portfolio.
  • Download the Generative Video Compliance Checklist — the three gates, the EU AI Act Article 50 obligations, the C2PA / watermark / log layers, and the consent steps, on one page. Download the PDF.

References

  1. EU Artificial Intelligence Act — Article 50, Transparency Obligations (full normative text: §2 marking obligation on providers, §4 deepfake disclosure on deployers, assistive-editing carve-out, §5 timing; in force 2 Aug 2026 per Article 113). https://artificialintelligenceact.eu/article/50/ — accessed 2026-06-01. Tier 1 (primary law; Regulation (EU) 2024/1689).
  2. EU Artificial Intelligence Act — Article 99, Penalties (Article 50 transparency breaches fall in the up-to €15,000,000 or 3% of worldwide annual turnover tier; lower of the two for SMEs/start-ups per Art. 99(6)). https://artificialintelligenceact.eu/article/99/ — accessed 2026-06-01. Tier 1.
  3. European Commission — Code of Practice on marking and labelling of AI-generated content (Working Group 1 Providers / WG2 Deployers; deepfake definition; drafting timeline: kickoff 5 Nov 2025, first draft 17 Dec 2025, second draft 3 Mar 2026, final mid-2026; voluntary tool to demonstrate Art. 50(2)/(4) compliance). https://digital-strategy.ec.europa.eu/en/policies/code-practice-ai-generated-content — accessed 2026-06-01. Tier 1.
  4. C2PA — Content Credentials Technical Specification v2.4 (Manifest, Claim, Claim Signature; hard binding SHA-256 over asset bytes; signer key model). https://spec.c2pa.org/specifications/specifications/2.4/specs/C2PA_Specification.html — accessed 2026-06-01. Tier 1 (standards body).
  5. C2PA — Soft Binding and Durable Content Credentials (soft binding via perceptual hash / invisible watermark to survive transcoding and discover a cloud-stored manifest when the embedded one is stripped). https://spec.c2pa.org/specifications/specifications/2.2/softbinding/Decoupled.html — accessed 2026-06-01. Tier 1.
  6. C2PA and Content Credentials Explainer 2.2 (manifest store, assertions, hard vs soft binding, durable Content Credentials concept). https://spec.c2pa.org/specifications/specifications/2.2/explainer/Explainer.html — accessed 2026-06-01. Tier 1.
  7. Google DeepMind — SynthID (frame-level invisible video watermarking applied at generation, detector network with confidence score, survives crop/filter/frame-rate/compression; 10B+ items watermarked by 2026; ships across Gemini, Imagen, Lyria, Veo). https://deepmind.google/models/synthid/ — accessed 2026-06-01. Tier 3 (first-party engineering).
  8. Digimarc — "C2PA 2.1: Strengthening Content Credentials with Digital Watermarks" (why hard bindings break on re-encode and how soft-binding watermarks restore durability). https://www.digimarc.com/blog/c2pa-21-strengthening-content-credentials-digital-watermarks — accessed 2026-06-01. Tier 4 (deployer vendor).
  9. Herbert Smith Freehills Kramer — "Transparency obligations for AI-generated content under the EU AI Act: from principle to practice" (deployer vs provider duties; practical compliance reading of Art. 50). https://www.hsfkramer.com/notes/ip/2026-03/transparency-obligations-for-ai-generated-content-under-the-eu-ai-act-from-principle-to-practice — accessed 2026-06-01. Tier 6 (legal analysis; secondary to the law itself).
  10. "The 2026 AI Video Production Playbook" / 2026 production-cost reporting (regeneration economics: a clip expected at ~$1.50 reaching ~$5 after five attempts; cost-per-finished-video as the optimization target). https://medium.com/data-science-collective/the-2026-ai-video-production-playbook-bc683d5b85da — accessed 2026-06-01. Tier 7 (practitioner aggregation; cited as directional, year-labelled).