Published 2026-05-31 · 19 min read · By Nikolay Sapunov, CEO at Fora Soft
Why This Matters
If your product reads text aloud, dubs a video, or lets a user hear their own synthetic voice, you will almost certainly evaluate ElevenLabs, and you will need to answer three questions before a single line of code ships. What will it cost at your real volume? What can you legally clone, and how do you prove you had permission? And what must you disclose to the listener and to regulators? This article is for the product manager, founder, or engineering lead making a build-or-buy decision on synthetic voice. By the end you will understand the full 2026 pricing ladder and where it bites, the difference between an instant and a professional clone and the consent each one demands, and the two regulatory regimes (one American and still only proposed, one European with a fixed application date) that turn "we cloned a voice" into either a feature or a lawsuit.
What ElevenLabs Actually Sells
Start with the product, because the pricing only makes sense once you know what you are paying for. ElevenLabs is a company that turns text into spoken audio using machine-learning models, a job called text-to-speech, abbreviated TTS. It also does the harder job of voice cloning — building a synthetic copy of a specific person's voice so that new sentences can be generated in that voice. Think of plain text-to-speech as hiring a generic narrator, and voice cloning as hiring an impersonator who can say anything in a particular person's voice.
The voice itself is produced by a model, and ElevenLabs ships several. Its flagship as of early 2026 is the v3 model, released that year, which captures emotion noticeably better than the versions before it; alongside it sit faster, cheaper "Flash" and "Turbo" models built for real-time use where a half-second delay would break the experience (ElevenLabs, 2026). You do not need to memorise the model names. The point to carry forward is that quality, speed, and price trade against each other: the expressive model costs more credits per character, the fast model costs fewer and answers quicker.
Everything you generate is metered in credits. A credit is ElevenLabs' unit of consumption, and for the standard multilingual models the rule is simple: one text character — a single letter, space, or punctuation mark — equals one credit (ElevenLabs Help Center, 2026). The faster Flash and Turbo models are discounted: they cost half a credit per character (ElevenLabs Help Center, 2026). Credits, not dollars, are what each plan hands you, so the whole pricing question reduces to "how many credits do I get, and how many does my product burn?" Convert credits to characters, then to minutes of audio, before comparing plans — almost every costing mistake starts with reading the big credit number as if it were words.
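To make the conversion concrete, here is a minimal Python sketch. The credit rates are the ones stated above; the figure of roughly 1,000 characters per minute of speech is an assumption taken from the ratio in the pricing page's own "minutes included" numbers (30,000 credits for about 30 minutes), and the function and constant names are ours for illustration, not part of any ElevenLabs SDK.

```python
# Credits charged per character of input text, per the rule above.
CREDITS_PER_CHAR = {"standard": 1.0, "flash": 0.5, "turbo": 0.5}

# Assumption: about 1,000 characters of text per minute of generated speech.
CHARS_PER_MINUTE = 1_000


def credits_to_minutes(credits: int, model: str = "standard") -> float:
    """Approximate minutes of audio a credit balance buys on a given model."""
    characters = credits / CREDITS_PER_CHAR[model]
    return characters / CHARS_PER_MINUTE


def minutes_to_credits(minutes: float, model: str = "standard") -> int:
    """Approximate credits burned to generate this many minutes of audio."""
    characters = minutes * CHARS_PER_MINUTE
    return round(characters * CREDITS_PER_CHAR[model])
```

Calling `minutes_to_credits` with your forecast monthly minutes gives the number to hold against each plan's credit allowance; swap in your own characters-per-minute figure if your content is denser or sparser than the assumed ratio.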
The 2026 Pricing Ladder, Tier By Tier
ElevenLabs sells seven tiers in 2026, from a free plan to a custom enterprise contract. The prices below are read directly from the company's own pricing page in May 2026, which is the source of truth — third-party blogs frequently quote stale figures (ElevenLabs, 2026). The table is the map. Read it as a ladder of credits and rights, not just prices — the jump that matters most for most teams is not a price jump but a rights jump.
| Plan | Price/mo | Credits/mo | Approx. TTS minutes | Cloning and extras unlocked | Commercial use |
|---|---|---|---|---|---|
| Free | $0 | 10,000 | ~10 min | None | No |
| Starter | $6 | 30,000 | ~30 min | Instant Voice Clone | Yes |
| Creator | $11* | 121,000 | ~121 min | + Professional Voice Clone | Yes |
| Pro | $99 | 600,000 | ~600 min | Professional Voice Clone | Yes |
| Scale | $299 | 1,800,000 | ~1,800 min | + 3 team seats, 3 PVCs | Yes |
| Business | $990 | 6,000,000 | ~6,000 min | + 10 seats, 10 PVCs | Yes |
| Enterprise | custom | custom | custom | + HIPAA BAA, SSO | Yes |
*Creator lists at $22/month with the first month at 50% off ($11); the others are flat. Source: ElevenLabs pricing page, read 2026-05-31. Minute figures are the page's own "minutes included" conversions on the standard Multilingual model; the Flash and Turbo models stretch each plan to roughly double the minutes. Annual billing saves roughly two months.
Two lines in that table deserve a closer look because they trap teams.
The first trap is the free plan. It gives you 10,000 credits a month — about ten minutes of speech — and it is genuinely useful for prototyping, but it grants no commercial rights and no voice cloning (ElevenLabs, 2026). You cannot legally monetise anything you generate on it, and any public content must credit ElevenLabs. A team that builds a demo on the free tier, likes it, and ships it has quietly broken the licence. The cheapest plan that lets you use the audio commercially is Starter at $6 a month, and that is the line a real product crosses first.
The second trap is the Creator plan. It is the first tier that unlocks professional voice cloning, the high-quality kind, and it is the tier the keyword data shows people search for most after the headline brand name (ElevenLabs, 2026). Starter, one step below, only gives you the lower-fidelity instant clone. So the practical entry point for a serious voice feature is the Creator plan, not Starter — a distinction worth getting right in a budget before anyone is surprised.
Figure 1. The 2026 plan ladder. The two thresholds that catch teams are the commercial-rights line at Starter ($6) and the professional-cloning line at Creator ($11 first month, $22 after).
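The two thresholds can be encoded as a small lookup. The sketch below copies the prices, credits, and rights from the table in this section (using Creator's regular $22 price rather than the first-month discount) and returns the cheapest tier that covers a credit need and the rights you require. It deliberately ignores overage billing and the Enterprise tier, and the `Plan` type and `cheapest_plan` function are hypothetical names for illustration.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    price_usd: float          # monthly list price
    credits: int              # credits included per month
    commercial: bool          # commercial use allowed
    professional_clone: bool  # Professional Voice Clone available


# Ordered cheapest first; values taken from the plan table in this article.
PLANS = [
    Plan("Free", 0, 10_000, False, False),
    Plan("Starter", 6, 30_000, True, False),
    Plan("Creator", 22, 121_000, True, True),  # $11 in the first month only
    Plan("Pro", 99, 600_000, True, True),
    Plan("Scale", 299, 1_800_000, True, True),
    Plan("Business", 990, 6_000_000, True, True),
]


def cheapest_plan(
    credits_needed: int,
    commercial: bool = True,
    professional_clone: bool = False,
) -> Optional[Plan]:
    """Return the cheapest listed plan that satisfies all three requirements."""
    for plan in PLANS:
        if plan.credits < credits_needed:
            continue
        if commercial and not plan.commercial:
            continue
        if professional_clone and not plan.professional_clone:
            continue
        return plan
    return None  # beyond Business: a custom Enterprise conversation
```

Note how a tiny commercial workload lands on Starter rather than Free, and how asking for a professional clone pushes even a small workload up to Creator.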
A Worked Example — What 200 Minutes A Month Really Costs
Numbers make this concrete. Suppose your product narrates articles and generates 200 minutes of speech a month using the standard multilingual model. The simplest way to price this at volume is the pay-as-you-go API rate of $0.10 per 1,000 characters. At about 150 words per spoken minute and six characters per word:
200 minutes × 150 words/minute = 30,000 words per month
30,000 words × 6 characters/word = 180,000 characters per month
180,000 characters ÷ 1,000 = 180 thousand-character units
180 units × $0.10 = about $18 per month, on the API
Now compare that against the subscription. The Creator plan includes 121,000 credits — roughly 121 minutes — for $22 a month after the first month, so 200 minutes would exceed it and push you toward the Pro plan at $99 for 600,000 credits, most of which you would not use:
Pro plan: $99 for ~600 minutes → 200 used, ~400 to spare (a lot of waste)
For this volume the pay-as-you-go API (~$18) beats both subscription routes: Creator's $22 base price already exceeds $18 before any overage is added, and Pro costs $99. The lesson generalises: the subscription is priced for steady, predictable consumption, while the API is priced per unit and wins when your usage is modest or spiky. Model your real monthly minutes, convert them to characters, then compare the API rate against the cheapest subscription that covers you, and rerun the comparison whenever your volume changes, because the answer can flip. Switching to the faster Flash model halves the per-character cost ($0.05 per 1,000) and moves the break-even again.
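The same arithmetic can be kept as a small script and rerun whenever the forecast changes. This is a sketch under the assumptions used in the worked example (150 words per minute, six characters per word, the quoted API rates, and one credit per character on the standard model, half on Flash); it only lists subscriptions whose included credits cover the month, so overage pricing is out of scope. All names are illustrative.

```python
# USD per 1,000 characters on the pay-as-you-go API, as quoted in this section.
API_RATE_PER_1K = {"standard": 0.10, "flash": 0.05}
CREDITS_PER_CHAR = {"standard": 1.0, "flash": 0.5}

# (plan name, monthly price in USD, included credits)
SUBSCRIPTIONS = [("Creator", 22.0, 121_000), ("Pro", 99.0, 600_000)]


def characters_for(minutes: float, words_per_minute: int = 150,
                   chars_per_word: int = 6) -> float:
    """Characters of text needed for this many minutes of narration."""
    return minutes * words_per_minute * chars_per_word


def api_cost(minutes: float, model: str = "standard") -> float:
    """Monthly pay-as-you-go cost in USD, rounded to cents."""
    return round(characters_for(minutes) / 1_000 * API_RATE_PER_1K[model], 2)


def monthly_options(minutes: float, model: str = "standard"):
    """List (option, USD) pairs that cover the volume, cheapest first."""
    credits_needed = characters_for(minutes) * CREDITS_PER_CHAR[model]
    options = [("API", api_cost(minutes, model))]
    for name, price, included in SUBSCRIPTIONS:
        if included >= credits_needed:
            options.append((name, price))
    return sorted(options, key=lambda option: option[1])
```

For the 200-minute case, `monthly_options(200)` lists the API at $18 ahead of Pro at $99, and Creator drops out because 180,000 characters exceed its 121,000 credits.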
Instant Versus Professional Voice Cloning
Voice cloning on ElevenLabs is not one feature but two, and confusing them is the most common planning error we see. They differ in how much audio they need, how long they take, how good they sound, and — this is the part that matters for the law — how hard they make it to clone someone without permission.
An Instant Voice Clone, or IVC, is the fast, lightweight option. It builds a usable copy of a voice from as little as thirty seconds of clear audio, and the clone is ready in seconds (ElevenLabs voice cloning, 2026). The reason it is instant is worth knowing: it does few-shot adaptation, handing your recording to the model as a live reference at the moment of speaking — "sound like this" — without retraining anything. Picture a skilled mimic who hears a short clip and imitates it on the spot. It is available from the $6 Starter plan. The trade-offs follow from the mechanism: its quality ceiling is set by that one reference clip, so background noise carries through; it generalises less well when asked for a very different emotion than the sample; and it carries a lighter consent check — you confirm you have the right to clone the voice, and that is the gate (ElevenLabs voice cloning, 2026).
A Professional Voice Clone, or PVC, is the high-fidelity option, and it works differently underneath. Instead of using your audio as a hint at speaking time, PVC fine-tunes the model: it actually adjusts the model's internal settings to represent the target voice more deeply, the way an actor studies hours of footage until they can become a person in any mood (ElevenLabs voice cloning, 2026). It needs about thirty minutes of high-quality audio, recording quality matters more because flaws are baked into the settings, and it takes minutes rather than seconds to build. It requires the Creator plan ($11 for the first month, $22 after) or higher, reflecting the compute it consumes. And before it will proceed, ElevenLabs makes you pass a voice-CAPTCHA: the app shows you text and you must read it aloud within a time limit, then the system compares that live recording against the samples you uploaded to check that the two match (ElevenLabs voice cloning, 2026). That recorded read-aloud is consent engineering built into the product. It makes cloning a stranger from stolen audio far harder, because you would have to reproduce their voice live on demand, although it remains a verification step rather than proof of permission.
Figure 2. Instant versus professional cloning. The professional clone trades speed for fidelity and adds a recorded voice-CAPTCHA that an instant clone does not require.
The practical routing is clear. Use an instant clone for low-stakes, high-volume, fast-turnaround needs — a user previewing their own voice, a quick draft narration. Use a professional clone for the voice your product depends on — a brand narrator, a named presenter, a recurring character — where the extra fidelity and the stronger verification both pay off. And in either case, the consent box and the CAPTCHA are ElevenLabs protecting itself; they do not discharge your legal duty to have permission, which the next sections cover.
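That routing rule is simple enough to write down as code. The sketch below uses the audio thresholds quoted in this section (about thirty seconds for an instant clone, about thirty minutes for a professional one) and a single hypothetical flag, `product_critical`, standing in for "the voice your product depends on". It says nothing about consent, which has to be checked separately.

```python
# Minimum clean audio, in seconds, per the figures quoted in this section.
IVC_MIN_SECONDS = 30        # Instant Voice Clone: "as little as thirty seconds"
PVC_MIN_SECONDS = 30 * 60   # Professional Voice Clone: "about thirty minutes"


def choose_clone_type(audio_seconds: float, product_critical: bool) -> str:
    """Route a cloning request to 'instant', 'professional', or a request for more audio."""
    needed = PVC_MIN_SECONDS if product_critical else IVC_MIN_SECONDS
    if audio_seconds < needed:
        return "insufficient_audio"
    return "professional" if product_critical else "instant"
```

A user previewing their own voice from a 45-second clip routes to an instant clone, while a brand narrator with only 45 seconds of audio is sent back to record more before a professional clone is attempted.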
How ElevenLabs Polices Misuse
Before the law, there is the platform's own policy, and it is stricter than most teams expect. ElevenLabs runs several layers of defence against voice misuse, and understanding them tells you what you can and cannot build (ElevenLabs safety, 2026; ElevenLabs use policy, 2026).
The first layer is the voice-CAPTCHA already described, which gates professional cloning behind a live read-aloud. The second is automated blocking of high-risk voices: the platform detects and refuses attempts to clone celebrities, politicians, and other public figures, with extra vigilance around political figures during election periods to prevent fabricated campaign audio (ElevenLabs safety, 2026). The third is ongoing content moderation — a mix of AI classifiers, human reviewers, and internal investigations that watch for violations of the Prohibited Use Policy (ElevenLabs safety, 2026). On top of these, accounts can carry live moderation that restricts which categories of content a given voice may be used for.
The Prohibited Use Policy itself bans the obvious harms: impersonating a person to defraud, voter suppression, impersonating political candidates or officials whether or not you claim authorisation, and other deceptive or harmful uses (ElevenLabs use policy, 2026). For a product team, the takeaway is that ElevenLabs will not be a willing partner in cloning a public figure or running a political impersonation, and designing a feature that depends on either will hit a wall in moderation, not just in court.
There is also a constructive side to this. ElevenLabs runs a Voice Library where voice owners can list their voice and earn payouts when others use it — a marketplace that turns consent into a revenue stream rather than a checkbox (ElevenLabs payouts, 2026). If your product needs a distinctive voice and you do not have one to clone, licensing from this library is the clean path: consent and compensation are handled by the platform.
The NO FAKES Act — A Federal Right Over Your Voice
Now the law. The most consequential piece of proposed U.S. legislation for anyone building voice features is the NO FAKES Act — the Nurture Originals, Foster Art, and Keep Entertainment Safe Act — introduced in the Senate as S.1367 and in the House as H.R.2794 in the 119th Congress, and reintroduced in April 2025 after a 2024 version stalled (Congress.gov S.1367, 2025; Coons, 2025). It is a bill, not yet a law, as of May 2026, but it is the clearest signal of where U.S. rules are heading, and several large technology and entertainment players — SAG-AFTRA, the RIAA, OpenAI, Google, Disney, Adobe — back it (Coons, 2025).
What the bill would do is create the first-ever federal right over a person's voice and visual likeness, what it calls a digital replica right (Manatt, 2025). A digital replica, in the bill's words, is a newly created, computer-generated, highly realistic representation that is readily identifiable as a specific person's voice or visual likeness (Congress.gov S.1367, 2025). Today, whether you can sue over an unauthorised voice clone depends on a patchwork of state right-of-publicity laws; the NO FAKES Act would set a national floor.
The mechanism most relevant to a product team is the notice-and-takedown system, deliberately modelled on the copyright takedown regime that platforms already know from the DMCA (Manatt, 2025). The bill does not require an online service to proactively scan for unauthorised replicas. Instead, a service that hosts user content gains a safe harbor — protection from liability — if, on receiving a valid notice, it promptly removes the offending replica, and if it has registered a designated agent with the U.S. Copyright Office and adopted a policy for terminating repeat offenders (Manatt, 2025). This is the same shape as copyright safe harbor, so if your platform already runs a DMCA process, you have the template.
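For a platform team, the safe-harbor conditions described above reduce to a short readiness checklist. The following sketch encodes only the three conditions named in this section; the class and field names are hypothetical, the bill is not yet law, and the final text may add or change conditions.

```python
from dataclasses import dataclass


@dataclass
class HostingService:
    has_designated_agent: bool        # agent registered with the U.S. Copyright Office
    has_repeat_offender_policy: bool  # policy for terminating repeat offenders
    removes_on_valid_notice: bool     # promptly removes a replica after a valid notice


def safe_harbor_gaps(service: HostingService) -> list[str]:
    """Return the conditions from this section that the service does not yet meet."""
    gaps = []
    if not service.has_designated_agent:
        gaps.append("register a designated agent")
    if not service.has_repeat_offender_policy:
        gaps.append("adopt a repeat-offender termination policy")
    if not service.removes_on_valid_notice:
        gaps.append("build a prompt removal workflow for valid notices")
    return gaps
```

An empty list means the three conditions listed here are covered; it does not establish that the service actually qualifies under whatever text is finally enacted.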
The penalties are why this is not a footnote. The bill attaches a statutory penalty of $5,000 per violation, and each copy made, transmitted, or displayed counts as a separate violation — so a single viral clip can multiply fast (Wikipedia NO FAKES, 2026; GovTrack S.1367, 2025). To balance that, sending a false takedown notice carries its own penalty of the greater of $25,000 or actual damages, which is meant to deter abuse of the takedown system (GovTrack S.1367, 2025). The bill also sets up postmortem rights — the voice right survives death and is administered through a Copyright Office registration system — and it explicitly does not preempt state laws that existed as of 2 January 2025, so Tennessee's ELVIS Act and similar state statutes survive alongside it (GovTrack S.1367, 2025).
It is not without critics. Digital-rights groups including the EFF and FIRE warn that a broad new right over likeness could chill lawful speech — parody, commentary, journalism — and the bill has been revised partly in response (EFF, 2024). For your purposes, the prudent posture is to design as if a notice-and-takedown duty and a strong consent expectation are coming, because the direction of travel is clear even if the final text is not.
The EU AI Act, Article 50 — Disclosure Is Already Dated
If the NO FAKES Act is the future you should prepare for, the EU AI Act's Article 50 is a deadline you cannot ignore: its transparency obligations apply from 2 August 2026 (EU AI Act Article 50, 2026). Unlike the high-risk rules elsewhere in the Act, Article 50 reaches every generative AI system, which expressly includes tools that synthesise audio (EU AI Act Article 50, 2026).
Article 50 splits its duty between two roles. If you are a provider — the entity that makes the generation system, which is ElevenLabs' role — you must ensure the synthetic audio you output is marked in a machine-readable format and detectable as artificially generated, for example by an embedded provenance signal or watermark (EU AI Act Article 50, 2026). If you are a deployer — the entity that uses such a system to produce a deepfake for an audience, which is often your role — you must disclose to people that the content was artificially generated or manipulated (EU AI Act Article 50, 2026).
There are limited carve-outs. Where the content is plainly artistic, satirical, or fictional, the disclosure duty is softened to an unobtrusive acknowledgement that does not spoil the work, and law-enforcement uses are exempt (EU AI Act Article 50, 2026). A Code of Practice spelling out exactly how to mark and label content was in its second draft as of early 2026, with a final version expected around June 2026, just ahead of the August deadline (EU AI Act Article 50, 2026; Jones Day, 2026). For a product serving EU users, the engineering implication is concrete: build a disclosure surface — a visible label, a spoken or written "AI-generated" notice — into any feature that outputs a cloned or synthetic voice, and rely on the provider's machine-readable marking for the provenance half.
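One way to keep the disclosure surface consistent is to decide the label in a single function that every playback path calls. The sketch below is an illustration only: the label wording and the `Disclosure` type are our own inventions, whether a work counts as artistic, satirical, or fictional is a legal judgment the code simply takes as an input, and the law-enforcement exemption is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Disclosure:
    text: str       # what the listener sees or hears (hypothetical wording)
    placement: str  # "prominent" or "unobtrusive"


def disclosure_for(is_synthetic_voice: bool,
                   is_artistic_work: bool = False) -> Optional[Disclosure]:
    """Pick the disclosure to show with a piece of audio, if any."""
    if not is_synthetic_voice:
        return None  # human-recorded audio needs no AI label
    if is_artistic_work:
        # Softened duty: acknowledge without spoiling the work.
        return Disclosure("Includes AI-generated voice", "unobtrusive")
    return Disclosure("This audio was generated by AI", "prominent")
```

Centralising the decision means a change in the final Code of Practice touches one function rather than every screen that plays audio.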
A Common Mistake — Treating The Consent Checkbox As Legal Cover
The single most damaging error we see is a team assuming that because ElevenLabs made them tick "I have the right to clone this voice", their legal duty is discharged. It is not. That checkbox protects ElevenLabs under its own terms; it does nothing to give you the permission the law requires from the actual person whose voice you cloned. If you clone an employee's voice for an internal training video and they later object, or clone a freelance narrator without a clause covering synthetic reuse, the platform's checkbox is no defence.
The fix is to engineer consent as a record, not a click. For any voice you clone, capture and store explicit, written, scoped permission from the person: what voice, for what uses, for how long, and with what right to revoke. Treat it the way you treat a signed model release in photography. Where a contract is involved — a voice actor, a presenter, an employee — add an explicit clause covering synthetic voice generation, its permitted uses, and compensation, because a generic appearance release written before voice cloning existed almost never covers it. This consent record is also what lets you respond cleanly to a NO FAKES-style takedown notice: you can show the permission instead of scrambling.
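A consent record of that kind can be modelled directly. The sketch below captures the four elements named above (what voice, which uses, for how long, and revocation) plus a pointer to the signed document; the field names, the use labels, and the example values are all hypothetical, and a real system would also need audit logging and secure storage.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VoiceConsent:
    speaker: str                        # the person whose voice is cloned
    voice_id: str                       # your internal identifier for the clone
    permitted_uses: frozenset           # scoped uses the person agreed to
    valid_from: date
    valid_until: Optional[date] = None  # None means no fixed end date
    revoked_on: Optional[date] = None   # set when the person withdraws consent
    document_ref: str = ""              # where the signed agreement is stored

    def permits(self, use: str, on: date) -> bool:
        """True if this consent covers the given use on the given date."""
        if self.revoked_on is not None and on >= self.revoked_on:
            return False
        if on < self.valid_from:
            return False
        if self.valid_until is not None and on > self.valid_until:
            return False
        return use in self.permitted_uses


# Example record with made-up values.
consent = VoiceConsent(
    speaker="Example Narrator",
    voice_id="voice-001",
    permitted_uses=frozenset({"elearning_narration"}),
    valid_from=date(2026, 1, 1),
    valid_until=date(2026, 12, 31),
    document_ref="contracts/example-narrator-2026.pdf",
)
```

Checking `consent.permits(...)` before every generation call, and again when a takedown notice arrives, is what turns the stored record into something you can act on in minutes.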
Where Fora Soft Fits In
We build voice features into the video products our clients ship, and the choice of voice engine and the consent design around it come up on most of them. We have wired synthetic and cloned voices into e-learning courses that need a consistent narrator across hundreds of lessons, telemedicine tools that read instructions aloud, OTT platforms that auto-dub catalogue content, and video conferencing features that voice live captions. The recurring lesson is that the model is the easy part. The hard part is the surrounding engineering: modelling cost at real monthly volume rather than the headline price, choosing instant versus professional cloning per use case, capturing and storing scoped consent so a takedown notice can be answered in minutes, and building the AI-generated disclosure that Article 50 now requires for EU audiences. We design the consent and disclosure layer first, then pick the engine, because a voice feature that cannot prove permission is a liability no matter how good it sounds.
How To Decide — Five Questions
The decision comes down to five questions, in order.
First, do you need a specific person's voice, or just a good voice? If any natural-sounding narrator will do, skip cloning entirely and use a stock or library voice — it is cheaper, faster, and carries no consent risk.
Second, what is your real monthly volume in minutes? Model it, convert to credits, and compare base-plus-overage against the next plan up. The cheapest plan flips with volume, and the Flash model can move the break-even.
Third, instant or professional clone? Use instant for low-stakes, fast, high-volume previews; professional for the voice your product leans on, where fidelity and the voice-CAPTCHA verification both matter.
Fourth, can you prove consent? For every cloned voice, hold explicit, written, scoped permission from the person, with a synthetic-voice clause in any contract. The platform's checkbox is not enough.
Fifth, do you serve EU users or expect U.S. rules to tighten? If yes, build the disclosure surface for Article 50 now and design your hosting flow for a NO FAKES-style notice-and-takedown, so compliance is a feature you already have rather than a fire drill later.
Figure 3. Five questions, asked in order, route most synthetic-voice projects to the right plan and the right consent design.
What To Read Next
- Speaker diarization with Pyannote in production
- Streaming ASR in production — Deepgram, Whisper, and AssemblyAI in 2026
- Streaming TTS — Kokoro, ElevenLabs Turbo, Cartesia, and OpenAI TTS
Talk To Us / See Our Work / Download
- Talk to a video engineer about adding a compliant synthetic-voice feature to your product → /services/ai-software-development
- See our case studies in e-learning, telemedicine, OTT, and video conferencing → /cases
- Download the ElevenLabs Plan & Voice-Cloning Consent Checklist (one page, printable) → Download the checklist
References
- ElevenLabs. Pricing for Creators & Businesses of All Sizes, accessed 2026-05-31.
https://elevenlabs.io/pricing. Primary first-party source for the seven 2026 plan tiers and their prices read directly from the vendor page (Free $0 / 10k credits; Starter $6 / 30k; Creator $11 first month then $22 / 121k; Pro $99 / 600k; Scale $299 / 1.8M / 3 seats / 3 PVCs; Business $990 / 6M / 10 seats / 10 PVCs; Enterprise custom with HIPAA BAA and SSO), the per-tier "minutes included" conversions, the commercial-licence and instant-cloning unlock at Starter, the professional-cloning unlock at Creator, and the annual-billing discount. Where third-party blogs quoted different figures (Starter $5, Creator 100k, Scale $330 / 2M, Business $1,320 / 11M), the vendor page governs and those secondary figures were discarded.
- ElevenLabs Documentation. Voice cloning: how it works, accessed 2026-05-31.
https://elevenlabs.io/docs/eleven-api/concepts/voice-cloning. First-party source for the technical distinction between the two cloning modes (Instant Voice Cloning as few-shot conditioning at inference time with no weight updates, usable from ~30 seconds of audio, ready in seconds, quality capped by the reference, weaker cross-style generalisation, Starter plan and up; versus Professional Voice Cloning as fine-tuning that modifies model parameters, ~30 minutes of high-quality audio recommended, recording quality matters more, takes minutes, Creator plan and up), the "representation not recording" framing, the consent confirmation, and the voice-CAPTCHA verification step and its stated limit (it confirms the requester was present, not that the voice is theirs).
- ElevenLabs. ElevenAPI Pricing, accessed 2026-05-31.
https://elevenlabs.io/pricing/api. First-party source for the pay-as-you-go API rates used in the worked example: text-to-speech Flash/Turbo $0.05 per 1,000 characters (~75 ms latency) and Multilingual v2/v3 $0.10 per 1,000 characters; Scribe speech-to-text $0.22/hour and Scribe realtime $0.39/hour; Dubbing $0.33/minute watermarked; Voice Changer and Voice Isolator $0.12/minute. Establishes that the API is metered per unit rather than out of a monthly credit bucket.
- ElevenLabs. Safety and An overview of Safety framework for AI voice agents, accessed 2026-05-31.
https://elevenlabs.io/safety. First-party source for the platform's defence layers: voice-CAPTCHA gating of professional clones, automated blocking of celebrity and high-risk/political voices (with heightened election-period vigilance), and content moderation via AI classifiers plus human reviewers plus internal investigations.
- ElevenLabs. Prohibited Use Policy, accessed 2026-05-31.
https://elevenlabs.io/use-policy. First-party source for the banned uses cited: fraudulent impersonation, voter suppression, and impersonation of political candidates or elected officials regardless of claimed authorisation.
- ElevenLabs. Earn with Your Voice: Join the ElevenLabs Voice Library / Voice Actor Payouts, accessed 2026-05-31.
https://elevenlabs.io/payouts. First-party source for the Voice Library marketplace and payouts that turn voice consent into a compensated, platform-managed licence.
- U.S. Congress. S.1367, NO FAKES Act of 2025, 119th Congress, text via Congress.gov and GovTrack, accessed 2026-05-31.
https://www.congress.gov/bill/119th-congress/senate-bill/1367/text. Primary legislative source for the bill's federal digital-replica right, the definition of a digital replica, the notice-and-takedown mechanism and safe harbor, the $5,000-per-violation and $25,000-false-notice penalties, the postmortem registration via the Copyright Office, and the non-preemption of state laws existing as of 2 January 2025 (preserving Tennessee's ELVIS Act). Companion House bill: H.R.2794.
- Manatt, Phelps & Phillips, LLP. Senators Officially Introduce NO FAKES Act with Digital Replica Right, 2025, accessed 2026-05-31.
https://www.manatt.com/insights/newsletters/client-alert/senators-officially-introduce-no-fakes-act-with-di. Legal-analysis source for the DMCA-modelled notice-and-takedown design, the safe-harbor conditions (designated Copyright Office agent, repeat-offender termination policy, no proactive monitoring duty), and the digital-replica right's scope. Where this secondary analysis and the bill text diverge on detail, the bill text (reference 7) controls.
- Sen. Chris Coons. Senators … reintroduce NO FAKES Act to protect individuals and creators from digital replicas, press release, April 2025, accessed 2026-05-31.
https://www.coons.senate.gov/news/press-releases/. Source for the April 2025 reintroduction, the bipartisan sponsor list, and the named industry backers (SAG-AFTRA, RIAA, OpenAI, Google, Disney, Adobe, and others).
- Electronic Frontier Foundation. NO FAKES: A Dream for Lawyers, a Nightmare for Everyone Else, 2024, accessed 2026-05-31.
https://www.eff.org/deeplinks/2024/08/no-fakes-dream-lawyers-nightmare-everyone-else. Source for the free-expression critique of the bill (the chilling-effect concern over a broad likeness right), included for balance against the proponents' framing.
- European Union. EU AI Act, Article 50: Transparency Obligations for Providers and Deployers of Certain AI Systems, accessed 2026-05-31.
https://artificialintelligenceact.eu/article/50/. Primary regulatory source for the 2 August 2026 effective date, the provider duty to mark synthetic audio in a machine-readable format, the deployer duty to disclose deepfakes, the artistic/satirical and law-enforcement carve-outs, and the scope covering all generative AI systems including audio synthesis.
- Jones Day. European Commission Publishes Draft Code of Practice on AI Labelling and Transparency, January 2026, accessed 2026-05-31.
https://www.jonesday.com/en/insights/2026/01/european-commission-publishes-draft-code-of-practice-on-ai-labelling-and-transparency. Source for the status of the Article 50 Code of Practice: second draft published in early 2026, final version expected around June 2026, ahead of the August deadline.
- ElevenLabs Help Center. How do text characters and credits work?, accessed 2026-05-31.
https://help.elevenlabs.io/hc/en-us. First-party source for the credit model: one credit equals one character on the standard Multilingual model and half a credit per character on Flash and Turbo, plus the official "100,000 credits ≈ 100 minutes" conversion used to sanity-check the article's characters-to-minutes arithmetic.


