Self-Hosting Open-Weights Video — HunyuanVideo, CogVideoX, Mochi, And LTX-Video

Why This Matters

If you build a video product, sooner or later someone asks the obvious question: "instead of paying OpenAI or Runway forever, why don't we just run one of these free open models ourselves?" It sounds like a clean way to cut a recurring bill, and sometimes it is — but more often it trades a predictable invoice for an unpredictable engineering project, and the people asking rarely know which case they are in. This lesson is written for the founder, product manager, or engineering lead who has to answer that question with numbers instead of vibes. It builds on the generative-video landscape lesson, which explains what each model can do, and the closed-API pricing lesson, which costs the rent-it path; here we cost the own-it path and tell you when it wins.

What "Open-Weights" Actually Means

Start with the term itself, because it is the whole subject. An AI video model is, underneath, a giant table of learned numbers — billions of them — called the model's weights. The weights are what the model "knows": run an image or a prompt through them in the right order and a video comes out. A closed model, like the ones in the previous two lessons, keeps that table on the vendor's servers where you never see it; you send a request over the internet and get a clip back. An open-weights model is the opposite: the vendor publishes the entire table for you to download, so you can run the model on your own computer, with no request ever leaving your building.

The plain-language analogy is the difference between ordering a cake and getting the recipe. A closed API is the bakery — you pay per cake, you never see the kitchen, and you live by the bakery's menu and hours. Open weights are the published recipe and the keys to a kitchen — now you can bake any time, change the recipe, and bake a thousand cakes for the cost of ingredients, but you also have to own the oven, learn to bake, and clean up afterward. Neither is "better" in the abstract. Which one wins depends entirely on how many cakes you need, how secret your ingredients are, and whether you are allowed to resell what comes out — and that last point is where most teams trip.

One caution before we go further. "Open weights" is not the same as "free to do anything with". The weights are published, but each one comes with a licence — a legal document that says what you are allowed to do with them — and those licences range from "use it for anything, including a paid product" all the way down to "research and personal use only, no commercial deployment". A model can be a free download and still be illegal to put inside the product you sell. We will treat the licence as the first thing you check on every model, not the last.

The Two Questions That Decide Everything

Before any benchmark, before any demo, two questions filter the field down to the one or two models you should even test. Ask them in this order.

The first question is "what does the licence let me sell?" This is a yes/no gate, and it comes first because no amount of quality matters if you legally cannot ship the output inside your product. Some open video models carry a true open-source licence — Apache 2.0 is the gold standard — which lets you use the output in a commercial product, charge money for it, fine-tune the model, and never ask permission. Others carry a custom licence that restricts commercial use, caps it at a revenue ceiling, or forbids it outright. The licence is a hard gate: a model that fails it is off your list no matter how good it looks.

The second question is "will it fit on a GPU I can afford?" A GPU — graphics processing unit — is the specialised chip that runs these models, and the number that matters is its VRAM, the video memory measured in gigabytes (GB) that holds the model while it runs. If a model needs more VRAM than your GPU has, it simply will not run, the way a program that needs more memory than your laptop has will not open. Video models are memory-hungry: the full versions of the largest ones need 60 GB or more, which means renting data-centre cards, while the trimmed-down versions fit in the 8 to 16 GB that a consumer gaming card carries. VRAM, not speed, is the wall you hit first.

Get these two answers and the rest of the decision is tuning. Get them wrong and you will burn a week building on a model you cannot legally ship or cannot physically run.

Figure 1. The two-gate filter. Licence is the first gate because it is a legal yes/no; VRAM is the second because it is a physical yes/no. Quality only breaks ties between the models that survive both.

The Five Families You Will Actually Consider

The open-weights video world has dozens of models, but five families do almost all the real work in production in mid-2026. We will take each in turn, in plain terms, then put them side by side. For every one, note the licence and the VRAM first; treat the quality notes as the tie-breaker they are.

HunyuanVideo (Tencent) — the quality leader with a licence catch

HunyuanVideo is Tencent's open-weights model, and the version that matters now is HunyuanVideo 1.5, released in November 2025. The headline is that Tencent made it smaller without making it worse: the core engine — the denoising backbone, the part that turns visual noise into a clean picture step by step — shrank from 13 billion learned numbers in the first version to 8.3 billion in 1.5. Smaller engine, less memory, same output quality. It does both text-to-video (you type a prompt, it invents a clip) and image-to-video (you give it a still photo plus a prompt, and it animates the photo into a clip).

On memory, the full-precision version wants roughly 24 to 28 GB of VRAM, which already exceeds a consumer card. But here the second big idea arrives: quantization, which means storing each of those billions of numbers in a smaller, less precise format to save memory, at a small cost in fidelity. The standard small format is called FP8 (eight bits per number instead of sixteen), and an FP8 build of HunyuanVideo 1.5 drops to roughly 14 to 16 GB — within reach of a high-end consumer GPU. Push further by moving the text-reading part of the model onto ordinary system memory (called offloading), and the GPU footprint can fall to the 8-to-12 GB range, slow but runnable on a normal gaming card. Expect roughly 75 seconds to render one short 480p clip on an RTX 4090.
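The arithmetic behind quantization can be sketched in a few lines. This is a rough illustration rather than a sizing tool: it counts only the memory needed to store the 8.3-billion-parameter backbone, and the function name and the 1 GB = 1e9 bytes convention are assumptions made for this sketch. Real VRAM use is higher, as the figures above show, because the text-reading part of the model, the video decoder, and the working memory for each frame also need room.

```python
# Rough weights-only memory estimate: parameters x bytes per parameter.
# Assumption: 1 GB = 1e9 bytes. Activations, the text encoder and the
# video decoder are NOT counted, so real VRAM use is noticeably higher.

BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}  # 16 bits = 2 bytes, 8 bits = 1 byte


def weights_gb(params_billion: float, precision: str) -> float:
    """Return the storage needed for the weights alone, in gigabytes."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in ("fp16", "fp8"):
        gb = weights_gb(8.3, precision)  # HunyuanVideo 1.5 backbone size
        print(f"{precision}: about {gb:.1f} GB for the backbone weights")
```

Halving the bits per number halves the weight storage, which is why the FP8 build lands so far below the full-precision one.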

The catch is the licence. HunyuanVideo 1.5 ships under the Tencent Hunyuan Community License, not Apache 2.0. It permits research and personal use but restricts commercial deployment — there are conditions and thresholds you must read before you put its output inside a product you sell. If you are building a commercial feature, this is the model whose legal page you read twice. Quality leader, licence asterisk.

CogVideoX (Zhipu AI) — the friendly, low-VRAM workhorse

CogVideoX, from Zhipu AI (the weights live under the zai-org account), is the model most teams start with, and the reason is in the second gate: it is genuinely easy to run. It comes in two main sizes, 2 billion and 5 billion parameters, plus a refreshed CogVideoX 1.5 generation. The 5B model fits in roughly 8 GB of VRAM with the right settings — small enough for a mid-range desktop card like an RTX 3060 — and the 2B model runs on hardware several years old.

Two things explain its reputation. First, its licensing is friendly at the small end: CogVideoX-2B is Apache 2.0, the clean commercial licence. The larger 5B carries its own CogVideoX licence with some conditions, so the same "read it twice" rule applies as you scale up. Second, it punches above its size on prompt adherence (how faithfully the clip matches what you actually asked for) thanks to an efficient 3D Causal VAE, a compression component that packs the video down without losing the details that make a prompt land. The trade-off is length and resolution: the 5B model's sweet spot is a six-second clip at 720×480, not a cinematic 4K shot. CogVideoX is the model you reach for when "runs on the GPU we already have" matters more than "wins the beauty contest".

Mochi 1 (Genmo) — the big, permissive, hardware-hungry one

Mochi 1, from Genmo, was on release the largest open video model ever published — 10 billion parameters — and it carries the licence engineers most want to see: Apache 2.0, free for commercial use, no revenue ceiling, no permission needed. It is built on an AsymmDiT design, an architecture that deliberately spends almost four times more of its capacity on the picture than on reading your text, which shows up as strong, fluid motion.

The price of that size is hardware. Full-precision Mochi 1 wants around 60 GB of VRAM and, in practice, four H100 data-centre cards to run comfortably — firmly out of consumer range. The community has clawed that down: a bfloat16 variant runs near 22 GB, and a ComfyUI-optimised build squeezes under 20 GB, bringing it onto a single high-end card at reduced settings. Mochi is the model you choose when you need a permissive commercial licence on a large, motion-strong model and you have, or can rent, serious GPUs. Permissive licence, heavy footprint.

LTX-Video and LTX-2 (Lightricks) — the speed and the 4K-with-audio newcomer

Lightricks took a different bet: speed. LTX-Video was the first model of its type able to generate high-quality clips in real time — it produces 30-frames-per-second video at 1216×704 faster than you can watch it on capable hardware. It ships in 2-billion and 13-billion sizes, with distilled (compressed-for-speed) builds that fit consumer VRAM. The licence is Lightricks' own open-weights licence, and notably the 13B model is free to license for companies under $10 million in annual revenue — a deliberate on-ramp for startups, with a paid tier above that line.

The bigger 2026 story is LTX-2, which Lightricks announced in October 2025 and fully open-sourced on 6 January 2026, publishing the weights, the inference code, and the training code. The release is billed as Apache 2.0, but it comes with a revenue condition (free for commercial use by companies under $10 million in annual revenue, with an enterprise support tier above that line) that plain Apache 2.0 does not contain, so apply the "read it twice" rule to the licence file that ships with the weights. LTX-2 is the first production-ready open model to generate synchronised video and audio in a single pass at native 4K resolution and 50 frames per second, with lip-sync and ambient sound, from 14 billion video parameters plus 5 billion audio parameters (19 billion in total). That "one model, picture and sound together" capability is something none of the other four open families matched at the start of 2026, and many articles still describe LTX as silent video only. If your use case needs sound generated with the picture, LTX-2 is the open option that has it.

Wan (Alibaba) — the benchmark leader, with a licensing trend to watch

Wan, Alibaba's family, holds the top of the open leaderboard. Wan 2.1 and Wan 2.2 are released under Apache 2.0, the clean commercial licence, and the Wan family posts the highest publicly verified open-source VBench score, about 86.2%, ahead of OpenAI's Sora (about 84.3%) on the same scorecard. (VBench is a standard scorecard for video generators; a higher aggregate means better overall quality across motion, consistency, and fidelity.) Wan 2.2 uses a Mixture-of-Experts design: 27 billion parameters in total, but only 14 billion switched on at any one denoising step, a trick that keeps the running memory manageable. A smaller 5-billion dense version runs on an 8 GB card at 720p. Wan leads open models on human faces, skin, and hair, and on prompts with several subjects interacting.

The trend to watch is licensing direction. Alibaba's newer releases — Wan 2.5 and above — moved to an API-first model, where the newest version is reached through Alibaba Cloud's service rather than published as open weights, with open-weight releases (if they come) lagging behind. So the rule for Wan in 2026 is precise: the open, Apache-2.0, self-hostable Wan you can rely on today is 2.1 and 2.2; the newest numbered versions may be rent-only. Always confirm the licence on the specific version you intend to ship — do not assume "Wan is open" covers the latest release.

Figure 2. The five families side by side. Read the licence column and the VRAM column first; the standout-strength column is the tie-breaker among the models that clear both gates.
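The two-gate filter can be sketched as a few lines of Python. The VRAM figures and licence names below are copied from the family descriptions above; the `Candidate` class, the `"review"` label (shorthand for "conditions apply, read the licence before shipping"), and the choice to use the upper end of each quoted VRAM range are assumptions made for this illustration, not a published classification. Only variants with a VRAM number quoted in this lesson are listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    licence: str
    commercial_use: str  # "yes" = permissive, "review" = conditions apply
    min_vram_gb: int     # footprint quoted in this lesson for that build


# Figures taken from the family descriptions; "review" is this sketch's
# own shorthand, not a term used by any of the licences.
CANDIDATES = [
    Candidate("HunyuanVideo 1.5 (FP8)", "Tencent Hunyuan Community", "review", 16),
    Candidate("CogVideoX-5B", "CogVideoX licence", "review", 8),
    Candidate("Mochi 1 (bfloat16)", "Apache 2.0", "yes", 22),
    Candidate("Wan 2.2 5B", "Apache 2.0", "yes", 8),
]


def two_gate_filter(candidates, gpu_vram_gb, allow_review=False):
    """Gate 1: licence. Gate 2: VRAM. Returns the survivors in input order."""
    allowed = {"yes", "review"} if allow_review else {"yes"}
    return [
        c for c in candidates
        if c.commercial_use in allowed and c.min_vram_gb <= gpu_vram_gb
    ]


if __name__ == "__main__":
    for c in two_gate_filter(CANDIDATES, gpu_vram_gb=24):
        print(c.name)
```

Passing `allow_review=True` widens the first gate to models whose licence needs a legal read, which is a reasonable setting for a prototype but not for a release decision.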

The Real Decision: Self-Host or Rent?

Now the question that sends teams down this path in the first place — and the honest answer, which is "usually rent, sometimes self-host, and the deciding factor is rarely the per-clip price". Let us prove that with arithmetic, because the intuition that "free model equals free video" is the single most expensive mistake in this area.

The cost of a self-hosted clip is the GPU clock, not zero

The model download is free. The video is not, because generating it occupies an expensive GPU for real seconds, and that GPU costs money whether you own it or rent it. In mid-2026 a top-tier H100 data-centre card rents for roughly $2 to $2.70 per hour on a service like RunPod, with cheaper interruptible "spot" rates around $1.30 to $1.60. The cost of one clip is simply the slice of that hour your render uses.

Walk the math out loud once. Suppose your model and settings take three minutes to render one clip on an H100 you rent at $2.40 per hour:

cost per clip = (render minutes ÷ 60) × hourly GPU rate
             = (3 ÷ 60) × $2.40
             = 0.05 × $2.40
             = $0.12 per clip

Twelve cents — which is roughly what Runway's API charges per second, not per clip. So at first glance self-hosting looks cheap. But that twelve cents is only the GPU clock. It does not include the GPU sitting idle between jobs (you rent it by the hour, not the second of actual work), the engineer-days to set the system up, or the ongoing cost of keeping it running. Those are the numbers that move the decision.
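The same arithmetic, with the idle-time effect added, can be written as a small sketch. The `utilisation` parameter (the share of each billed hour the GPU actually spends rendering) and the 25% figure in the example are hypothetical values chosen for illustration; measure your own before relying on the result.

```python
def gpu_clock_cost(render_minutes: float, hourly_rate: float) -> float:
    """Cost of the GPU time one render actually uses."""
    return render_minutes / 60 * hourly_rate


def effective_cost(render_minutes: float, hourly_rate: float,
                   utilisation: float) -> float:
    """Cost per clip when the GPU is busy only part of each billed hour.

    utilisation is the share of billed time spent rendering (0 < u <= 1).
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    return gpu_clock_cost(render_minutes, hourly_rate) / utilisation


if __name__ == "__main__":
    print(f"GPU clock only:     ${gpu_clock_cost(3, 2.40):.2f}")
    # Hypothetical: the GPU renders for a quarter of the time it is billed.
    print(f"At 25% utilisation: ${effective_cost(3, 2.40, 0.25):.2f}")
```

At a quarter utilisation the twelve-cent clip costs 48 cents, before any setup or maintenance is counted.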

The break-even is volume, and it is higher than you think

Industry cost analyses in 2026 put the crossover bluntly: below roughly 5,000 clips per month, routing through a paid API and skipping the infrastructure is the cheaper and saner choice; self-hosting only starts to pay off above that, and even then only if the GPUs are kept busy. The reason is the part the per-clip math hides.

First, setup is not free: standing up an open video model (installing the right GPU drivers, downloading tens of gigabytes of weights, fitting the model into VRAM, and debugging the inevitable version conflicts) runs to roughly 20 to 40 engineer-hours before you generate a single production clip. Second, keeping it alive is not free: model updates, dependency upgrades, and reliability work add an ongoing load that cost analyses summarise as a 3-to-5× multiplier applied to the raw GPU rental. A GPU that bills $1,700 a month rented can easily cost $5,100 to $8,500 a month once you count the people maintaining it. Third, an idle GPU still bills: if your traffic is bursty, you pay for the quiet hours too, which is exactly the inefficiency an API erases by charging only for work done.

Common mistake: pricing self-hosting as "free model × electricity". Teams compare the API's per-clip price against a self-hosted per-clip price of near-zero and conclude self-hosting wins by a mile. It almost never does at low volume, because the real self-hosted cost is GPU-hours billed around the clock plus a 3-5× operations multiplier plus 20-40 hours of setup before clip one. Always compare the API bill against the fully loaded self-hosted cost — GPU rental, idle time, setup amortised, and the engineers who babysit it — not against the GPU clock alone.
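A fully loaded comparison can be sketched as follows. The 3-to-5× multiplier, the 20-to-40-hour setup, and the $2.40 hourly rate come from the text; the 30-day month, the engineer rate, the 12-month amortisation period, and the API price per clip are hypothetical placeholders to replace with your own numbers. The sketch also treats the self-hosted bill as flat, which only holds while one GPU can keep up with demand.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a GPU rented around the clock, 30 days


def fully_loaded_monthly(hourly_rate: float, ops_multiplier: float,
                         setup_hours: float = 0.0, engineer_rate: float = 0.0,
                         amortise_months: int = 12) -> float:
    """Monthly cost: GPU rental x operations multiplier + amortised setup."""
    rental = hourly_rate * HOURS_PER_MONTH
    setup = setup_hours * engineer_rate / amortise_months
    return rental * ops_multiplier + setup


def break_even_clips(monthly_cost: float, api_price_per_clip: float) -> float:
    """Clips per month at which an API bill equals the self-hosted bill."""
    return monthly_cost / api_price_per_clip


if __name__ == "__main__":
    low = fully_loaded_monthly(2.40, 3)
    high = fully_loaded_monthly(2.40, 5)
    print(f"Fully loaded: ${low:,.0f} to ${high:,.0f} per month")

    # Hypothetical API price of $1.00 per clip, for illustration only.
    print(f"Break-even: about {break_even_clips(low, 1.00):,.0f} clips per month")
```

The point of the sketch is the shape of the comparison: the self-hosted side is a monthly total, so it only makes sense against an API bill at a stated volume.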

So when does self-hosting actually win?

If the per-clip price rarely justifies it, what does? Three reasons, none of which is the headline cost.

The first is data residency and privacy. If your footage is sensitive — medical video in telemedicine, faces in surveillance, private user uploads — sending it to a third-party API may be unacceptable on legal or contractual grounds. Self-hosting keeps every frame on machines you control, and that can be the entire reason to do it, price be damned.

The second is volume. Past that 5,000-plus-clips-a-month line, with GPUs kept genuinely busy, the owned-infrastructure cost per clip drops below the API's per-clip charge, and the savings compound. High, steady volume is the case the per-clip math was waiting for.

The third is the licence and customisation. Only an open model lets you fine-tune — retrain the model a little on your own examples so it learns your specific style, product, or domain — and only a permissively licensed open model (Apache 2.0) lets you ship that customised model inside a product you sell, with no per-call fee and no vendor able to change the terms or sunset the endpoint under you. If your product's edge depends on the model behaving in a way no API offers, self-hosting is not the cheap path — it is the only path.
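The three reasons can be written down as a simple checklist function. This is a sketch of the reasoning in this section, not a decision engine: the function name and parameters are invented for illustration, and the 5,000-clip default is the rough break-even quoted earlier, which assumes the GPUs are kept busy.

```python
def self_host_reasons(sensitive_footage: bool,
                      clips_per_month: int,
                      needs_custom_model: bool,
                      break_even: int = 5000) -> list[str]:
    """Return which of the three reasons apply; an empty list means rent."""
    reasons = []
    if sensitive_footage:
        reasons.append("data residency")
    if clips_per_month > break_even:  # only valid if the GPUs stay busy
        reasons.append("volume")
    if needs_custom_model:
        reasons.append("licence and customisation")
    return reasons


if __name__ == "__main__":
    found = self_host_reasons(sensitive_footage=False,
                              clips_per_month=800,
                              needs_custom_model=False)
    print("self-host because of: " + ", ".join(found) if found else "rent an API")
```

Any single reason is enough to justify a closer look at self-hosting; none of them depends on the per-clip GPU price.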

Figure 3. Why volume, not the per-clip price, decides. The self-hosted line starts high and stays nearly flat; the API line starts at zero and climbs. They cross around 5,000 clips a month — and three non-price reasons can override the whole curve.

How You Actually Run One

Suppose the two gates passed and the decision favours self-hosting. What does running an open video model look like in practice? You do not write the rendering code from scratch; you stand on one of two ecosystems that the whole field has converged on.

The first is ComfyUI, a visual tool where you build the generation pipeline as a diagram of connected boxes — load the model here, feed the prompt there, set the resolution, collect the clip. It is the artists' and tinkerers' environment, and crucially it is where the community ships the memory optimisations — the FP8 quantization and the offloading tricks that let a 60 GB model run on a 20 GB card. If you want to test five models this week without writing code, ComfyUI is where you do it.

The second is diffusers, a Python software library from Hugging Face that lets your own code load and run these models in a few lines, the same way for each model. This is the path to production: diffusers is what your backend service imports when a user clicks "generate" and your servers do the work. The reason these five families are worth grouping into one lesson is that all of them ship with diffusers and ComfyUI support, so the surrounding code barely changes when you swap one model for another — the same lesson the closed-API adapter taught, now on the open side.
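A minimal sketch of that arrangement, under some assumptions: the backend loads a model through the generic `DiffusionPipeline.from_pretrained` entry point, turns on `enable_model_cpu_offload()` to reduce the VRAM footprint, and writes the result with `export_to_video`. The class names, the `render` function, the bfloat16 setting, and the 8 frames-per-second default are illustrative choices, not a recommendation from any model vendor, and individual models may need extra arguments listed on their model cards. The stub backend exists only so the surrounding code can be exercised on a machine without a GPU.

```python
from typing import Protocol


class VideoBackend(Protocol):
    """Anything that turns a prompt into a video file on disk."""

    def generate(self, prompt: str, out_path: str) -> str: ...


class DiffusersBackend:
    """Runs an open-weights model locally through Hugging Face diffusers."""

    def __init__(self, model_id: str, fps: int = 8):
        self.model_id = model_id
        self.fps = fps  # assumed default; check the model card
        self._pipe = None  # loaded lazily: the weights are tens of gigabytes

    def _load(self):
        # Imported here so this module can be imported on a machine
        # without torch or diffusers installed (tests, CI, a laptop).
        import torch
        from diffusers import DiffusionPipeline

        pipe = DiffusionPipeline.from_pretrained(
            self.model_id, torch_dtype=torch.bfloat16
        )
        pipe.enable_model_cpu_offload()  # trade speed for a smaller VRAM footprint
        return pipe

    def generate(self, prompt: str, out_path: str) -> str:
        from diffusers.utils import export_to_video

        if self._pipe is None:
            self._pipe = self._load()
        frames = self._pipe(prompt=prompt).frames[0]
        export_to_video(frames, out_path, fps=self.fps)
        return out_path


class StubBackend:
    """Stand-in for tests; writes a placeholder instead of rendering."""

    def generate(self, prompt: str, out_path: str) -> str:
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write(f"placeholder clip for: {prompt}\n")
        return out_path


def render(backend: VideoBackend, prompt: str, out_path: str) -> str:
    """The only function the rest of the service calls."""
    return backend.generate(prompt, out_path)


if __name__ == "__main__":
    # On a GPU machine, swap in DiffusersBackend("zai-org/CogVideoX-5b").
    print(render(StubBackend(), "a paper boat on a pond", "clip.txt"))
```

Because the service only ever calls `render`, swapping one model family for another means changing the model identifier passed to the backend, not the code around it.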

If the render time you measure is too slow for your product, that is not the end of the road — a family of speed-up techniques can cut it by a large factor, which we cover in the diffusion-acceleration lesson. And if your problem is not raw speed but keeping the same character, scene, and camera look across many clips, that is its own discipline, covered in the consistency-in-production lesson.

The order of operations in a real project is therefore: pick the model with the two gates; prototype it in ComfyUI to confirm quality on your actual prompts; measure the render time on the GPU you will rent so you can do the cost math above with real numbers; then wire it into your backend through diffusers behind the same kind of adapter the closed-API lesson described, so you keep the freedom to switch. Treat no open model as permanent infrastructure either: the field releases a better model roughly every quarter, and Wan's drift toward API-first is a reminder that today's open weights are not guaranteed to have an open successor.
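The "measure the render time" step can be sketched with nothing but the standard library. The helper names, the single warm-up run, and the use of a median over three runs are assumptions made for this illustration; `render_once` stands for whatever callable triggers one full render on the GPU you plan to rent.

```python
import statistics
import time
from typing import Callable


def measure_render_seconds(render_once: Callable[[], object],
                           runs: int = 3, warmup: int = 1) -> float:
    """Median wall-clock seconds for one render, after warm-up runs."""
    for _ in range(warmup):
        render_once()  # the first call also pays for loading the model
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        render_once()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def cost_per_clip(seconds: float, hourly_rate: float) -> float:
    """GPU-clock cost of one render at the given hourly rental rate."""
    return seconds / 3600 * hourly_rate


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real render call.
    seconds = measure_render_seconds(lambda: time.sleep(0.05))
    print(f"median render: {seconds:.2f} s")
    print(f"GPU-clock cost at $2.40/hour: ${cost_per_clip(seconds, 2.40):.5f}")
```

Measuring on the exact GPU, resolution, and clip length you intend to ship matters more than the number of runs, since render time varies widely with all three.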

A Worked Pick: Three Common Situations

Abstract rules are easier to trust when you see them resolve a real choice. Three quick scenarios, each resolved by the two gates plus the cost curve.

A telemedicine startup wants to animate explainer clips of procedures, the footage is patient-adjacent, and volume is low. Volume does not justify infrastructure, but data residency dominates, and the privacy reason can still tip the choice to self-hosting on a small rented GPU kept private. The pick there is CogVideoX 5B for its low VRAM, provided its licence conditions permit the intended use; the alternative is to stay on an API with a signed data-processing agreement if one is acceptable. The deciding factor is the privacy gate, not cost.

An OTT post-production team needs hundreds of B-roll clips a day, every day, with sound, and sells the finished content. Volume clears the break-even, audio is required, and the output is commercial — so LTX-2 (synced 4K audio-video, open weights) on owned or steadily rented data-centre GPUs is the strong pick, with the per-clip cost genuinely beating an API at that volume. The deciding factors are volume and the audio capability.

A conferencing product wants a small, fine-tuned avatar generator baked into its app, shipped to customers. Customisation and a sell-it licence dominate, so an Apache-2.0 model — Wan 2.2 or Mochi 1 — is the only viable class, fine-tuned on the product's own avatar style and shipped inside the app. The deciding factor is the licence-plus-fine-tuning reason; here self-hosting is not the cheap path but the only one. (For talking-head avatars specifically, the dedicated lip-sync and avatars lesson compares the purpose-built models.)

Where Fora Soft Fits In

We build video products across video conferencing, streaming and OTT, video surveillance, e-learning, telemedicine, and AR/VR, and the self-host-versus-rent question lands on our desk in every one of them. In telemedicine and surveillance, the privacy gate usually decides it: the footage cannot leave the customer's environment, so an open model on controlled hardware is the only compliant route. In OTT and e-learning, where teams generate large volumes of supporting clips and own the output, the volume math and the licence push toward self-hosted, fine-tuned open models behind a switchable adapter. The work is rarely the rendering code itself — diffusers makes that short — it is sizing the GPU fleet, costing the fully loaded bill honestly, and keeping the option open to move to the next quarter's better model. That last discipline is what separates a feature that survives a year from one that breaks on the next release.


References

Tencent-Hunyuan. HunyuanVideo-1.5: A leading lightweight video generation model. GitHub repository, accessed 2026-06-01. https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 — primary source for the 8.3B backbone, T2V/I2V support, and Tencent Hunyuan Community License terms.
Tencent. HunyuanVideo 1.5 VRAM Requirements — FP16, FP8, and Practical GPU Guide (2026). Will It Run AI, accessed 2026-06-01. https://willitrunai.com/blog/hunyuanvideo-1-5-vram-requirements — FP16 24-28 GB, FP8 14-16 GB, offloaded 8-12 GB figures.
zai-org (Zhipu AI). CogVideo / CogVideoX repository. GitHub, accessed 2026-06-01. https://github.com/zai-org/CogVideo — model sizes (2B/5B), CogVideoX 1.5, and Apache-2.0 status of the 2B weights.
Zhipu AI. CogVideoX-5b model card. Hugging Face, accessed 2026-06-01. https://huggingface.co/zai-org/CogVideoX-5b — 5B VRAM (~8 GB scalable) and 720×480 6-second sweet spot. The CogVideoX-5B licence differs from the 2B Apache-2.0 licence; verify on the card before commercial deployment.
Genmo. Mochi 1: A new SOTA in open text-to-video. Genmo Blog, accessed 2026-06-01. https://www.genmo.ai/blog/mochi-1-a-new-sota-in-open-text-to-video — 10B parameters, AsymmDiT architecture, Apache-2.0 licence.
Genmo. genmo/mochi-1-preview model card. Hugging Face, accessed 2026-06-01. https://huggingface.co/genmo/mochi-1-preview — ~60 GB full / 4×H100, bfloat16 ~22 GB, ComfyUI <20 GB.
Lightricks. LTX-Video official repository. GitHub, accessed 2026-06-01. https://github.com/Lightricks/LTX-Video — 2B/13B sizes, real-time 30 fps at 1216×704, Lightricks open-weights licence and the under-$10M-revenue free-license clause.
Lightricks. Lightricks Open-Sources LTX-2, the First Production-Ready Audio and Video Generation Model With Truly Open Weights. Press release, 2026-01-06, accessed 2026-06-01. https://www.globenewswire.com/news-release/2026/01/06/3213304/0/en/Lightricks-Open-Sources-LTX-2-the-First-Production-Ready-Audio-and-Video-Generation-Model-With-Truly-Open-Weights.html — native 4K, 50 fps, synced audio, 14B video + 5B audio parameters, full open-source release date. Several competing articles still describe LTX as silent-video-only; this release overrides that.
Wan-AI. Wan2.2-T2V-A14B model card. Hugging Face, accessed 2026-06-01. https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B — MoE 27B total / 14B active, Apache-2.0 licence, weights and inference code; the Wan family's leading VBench aggregate (~86.2%, ahead of Sora's ~84.3%) is reported here and on the Wan2.1 card.
Wan-Video. Wan2.2 official repository. GitHub, accessed 2026-06-01. https://github.com/Wan-Video/Wan2.2 — two-expert MoE denoising design, 5B dense variant on ~8 GB at 720p, Apache-2.0 status confirmed; primary source for the VBench positioning versus Sora and HunyuanVideo.
AI Magicx. Open Source AI Video Generation: Wan 2.2 vs HunyuanVideo 1.5 vs LTXVideo (2026 Comparison). Accessed 2026-06-01. https://www.aimagicx.com/blog/open-source-ai-video-models-comparison-2026 — HunyuanVideo ~75 s/clip 480p on RTX 4090, LTX speed positioning. Vendor comparison, used as a deployer-tier source; cross-checked against the model cards above, which take precedence on any spec disagreement (the Wan VBench figure here follows the official Wan model card, not this comparison).
IntuitionLabs. H100 Rental Prices Compared: $1.49–$6.98/hr Across 15+ Cloud Providers (2026). Accessed 2026-06-01. https://intuitionlabs.ai/articles/h100-rental-prices-cloud-comparison — H100 hourly rental range used in the per-clip cost math.
AI Pricing Master. Self-Hosting AI Models vs API Pricing: Complete Cost Analysis (2026). Accessed 2026-06-01. https://www.aipricingmaster.com/blog/self-hosting-ai-models-cost-vs-api — ~5,000-clip/month break-even, 20-40 hour setup, 3-5× operations multiplier.
