Blog: Agora.io Alternative: Custom WebRTC Development

Key takeaways

The real Agora alternative is self-hosted WebRTC, not another CPaaS. LiveKit, mediasoup, Jitsi, and Janus now ship the feature set that used to justify the per-minute markup.

Above ~15M minutes/month the break-even flips decisively. Self-hosting typically runs 3–5× cheaper than Agora at scale, once hardware, bandwidth, and on-call are included honestly.

LiveKit is the 2026 default. Apache 2.0, Go SFU, built-in simulcast/SVC, server-side egress, AI-agent primitives, first-class client SDKs across Web, iOS, Android, Flutter, React Native.

mediasoup still wins on CPU-per-participant and raw control when you already have a signalling team; Jitsi is the fastest drop-in; Janus is the right answer for SIP/streaming hybrids.

Compliance is easier self-hosted. HIPAA, GDPR data residency, and SOC 2 are simpler to satisfy when no third-party CPaaS touches your media.

Why Fora Soft wrote this playbook

We have been shipping WebRTC products since 2011 — before Agora existed, and long before LiveKit or mediasoup were stable. Our portfolio includes BrainCert (500M+ learning minutes on a mediasoup stack we helped design), a multi-party video courtroom used by the Republic of Kazakhstan, a live-shopping video platform serving twelve countries, and a dozen telehealth products under HIPAA constraints.

Across those projects we have repeatedly answered the same question: “Agora works, but our margin is disappearing — what’s the alternative?” This article is that answer, updated for 2026 stacks and pricing.

Burning money on Agora?

We have migrated teams off Agora onto self-hosted LiveKit and mediasoup in 8–14 weeks — typical cost reduction 35–60% at scale, with better call-quality telemetry than the PaaS dashboard gave them.

Book a 30-min cost review →

Why teams leave Agora.io in 2026

Agora is a capable product. Teams leave for four recurring reasons, in roughly this order of frequency.

Per-minute cost at scale

Agora’s published 2026 list price is $0.99 per 1,000 minutes for HD video, $0.59 for SD, and $0.99 for interactive live streaming. Once a platform crosses roughly 10–20 million monthly minutes, the bill dominates gross margin. Self-hosted bandwidth on a Hetzner or OVH dedicated host with unmetered 1 Gbps costs a fraction of the equivalent minute-tier.

Lock-in and SDK drift

Agora’s mobile SDKs are not binary-compatible with standard WebRTC. Swapping or mixing stacks requires a rewrite. Self-hosted solutions use vanilla libwebrtc, so a team that writes against the W3C WebRTC spec can swap engines without touching client code beyond the signalling seam.

Data residency and compliance

Agora’s global relay routes media through mainland China unless explicitly disabled via Feature Gates. For EU, US-government, and healthcare workloads that is a dealbreaker. Self-hosting in EU or US regions closes the gap on day one.

Geography and latency

Agora’s TURN coverage is strong overall but thinner in South America, Africa, and parts of Southeast Asia. A self-hosted deployment with regional SFUs and Anycast TURN can beat Agora’s median RTT in those regions.

Consider leaving Agora when: monthly minutes exceed ~10M, compliance forbids third-country routing, or the product needs features Agora does not expose, such as custom recording, server-side composition, or AI insertion.

CPaaS vs self-hosted — the honest trade-off

Before picking an alternative, be clear about what you gain and lose.

CPaaS wins on: time-to-market (one SDK call and you’re live), global coverage out of the box, no DevOps burden, automatic TURN and bandwidth, 24/7 support, SLAs. Valid choice for MVPs, unpredictable traffic, and teams without WebRTC expertise.

Self-hosted wins on: marginal cost at scale, data residency, custom features (server-side recording, live transcription, real-time moderation), SDK freedom, vendor independence. Valid choice for platforms with predictable or high volume, regulated industries, and teams with in-house or partnered DevOps.

The switch point is economic. Below roughly 5M minutes/month the engineer-hours to run self-hosted usually exceed the CPaaS bill. Above 15M it is rarely close: self-hosted wins decisively. Between 5M and 15M is where the decision deserves actual spreadsheet work.

Alternative 1 — LiveKit (the 2026 default)

LiveKit is an open-source WebRTC platform written in Go, Apache 2.0-licensed, first released in 2021 and now the fastest-growing self-hosted option. It offers a hosted Cloud and a fully self-hostable server that share one SDK surface.

Why it is the 2026 default

LiveKit ships production-grade features that used to take months of custom work: server-side egress (RTMP, HLS, MP4), automatic simulcast and SVC, adaptive stream selection, participant permissions, ingress from RTMP and WHIP, and client SDKs for Web, iOS, Android, Flutter, React Native, and Unity. The 2024–2025 releases added an AI agent framework, voice-agent primitives, and OpenAI Realtime integration — a LiveKit deployment is already a strong host for voicebot products.

Who runs it

Character.ai, Spotify Greenroom successors, and a large fraction of 2025-era AI voice products run on LiveKit Cloud. Self-hosted deployments include universities, healthcare platforms, and several live-shopping unicorns.

Cost shape

LiveKit Cloud is priced per-participant-minute similar to Agora but about 40–60% cheaper. Self-hosted LiveKit on commodity Hetzner dedicated servers typically lands at $0.02–0.08 per 1,000 minutes all-in (hardware, bandwidth, ops amortised), i.e. 10–50× cheaper than Agora at scale.

Reach for LiveKit when: you want a modern, actively maintained SFU with first-class client SDKs, AI-agent primitives, and the option to start on LiveKit Cloud and migrate to self-hosted later with the same API surface.

Alternative 2 — mediasoup (maximum control, minimum CPU)

mediasoup is a C++ SFU with a Node.js control plane, developed by Iñaki Baz Castillo since 2015. It is the quietest, most efficient SFU in the market — a single mediasoup worker handles 500–1,000 participants on modern hardware.

Why pick it

mediasoup gives you bare-metal primitives: you describe transports, producers, and consumers; you wire them into your own business logic. Nothing is hidden, and nothing is opinionated. For teams that have already built a signalling server and want the lightest possible media engine under it, mediasoup is unmatched.

Who runs it

BrainCert (our client, 500M+ learning minutes). Discord uses a fork internally for Stage Channels. A long tail of EdTech and video-dating apps. The ecosystem is smaller than LiveKit’s, but the engineering maturity is deeper.

Reach for mediasoup when: you have an in-house backend team comfortable with Node.js, want maximum flexibility over the signalling layer, and CPU-per-participant matters (very-large rooms, constrained budgets).

Alternative 3 — Jitsi Videobridge 2 (drop-in group video)

Jitsi is the oldest open-source SFU still in active development. Owned by 8x8, it powers meet.jit.si and thousands of self-hosted deployments worldwide.

Why pick it

Jitsi is the fastest way to stand up a working group-video product if your needs match its defaults. A docker-compose up against the docker-jitsi-meet repo gives you a functional platform in an afternoon. Videobridge 2 supports simulcast, cascading bridges for large rooms, and end-to-end encryption in Chromium-based browsers.

Limits

Jitsi’s default UI is opinionated; deep customisation requires forking Jitsi Meet (React) and working with its Jicofo/Prosody signalling. Mobile SDKs exist but trail LiveKit’s polish. Server-side recording (Jibri) requires extra nodes and Chromium.

Reach for Jitsi when: you need a proven group-video product with minimal engineering investment and can live within Jitsi’s UX opinions, or you are customising a web-first experience for an enterprise portal.

Alternative 4 — Janus Gateway (Swiss-army SFU)

Janus is a general-purpose WebRTC server from Meetecho, written in C. Rather than being an SFU that happens to do streaming, it is a plugin host — VideoRoom, Streaming, SIP, NoSIP, AudioBridge, VoiceMail, TextRoom — assembled to fit unusual topologies.

Why pick it

If your product combines WebRTC with SIP trunks, IP-camera streaming, or legacy Flash/RTMP inputs, Janus is often the shortest path. Its plugin model is stable and well-documented.

Limits

Janus is lower-level than LiveKit or Jitsi. You write more signalling code, the community is smaller, and help on obscure plugin combinations is thinner.

Reach for Janus when: your product is not a pure conferencing app — IP-camera viewer, SIP bridge, IoT command-and-control, broadcast ingestion — and you need a plugin-oriented media hub.

SFU vs MCU vs P2P — how to pick the topology

Before you pick an engine, pick a topology. Three options, well understood.

P2P (peer-to-peer). Each participant streams directly to every other participant. Zero server media cost. Falls over above 3–4 participants because each peer must upload a separate copy of its stream to every other peer. Use only for 1:1 or three-person calls.

SFU (Selective Forwarding Unit). Each participant uploads once to a server; the server forwards to everyone else, sometimes with simulcast or SVC downshifts. Standard for modern group video. Server cost scales with bandwidth, not CPU.

MCU (Multipoint Control Unit). The server decodes every stream, composites them into a single mixed stream, and sends the mixed stream back. Heavy CPU, but receivers only need to handle one downstream. Use only for low-end devices, bandwidth-constrained broadcast, or server-side composition for recording.
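
To make the three topologies concrete, here is a minimal sketch of the bandwidth each one implies per participant and on the server. The names `Load` and `topology_load` are illustrative, and the model is deliberately crude: it assumes every sender publishes one fixed-bitrate stream, ignores simulcast and audio, and assumes an MCU returns its mixed stream at the same bitrate as a single input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    uplink_mbps: float         # what each participant sends
    downlink_mbps: float       # what each participant receives
    server_egress_mbps: float  # what the media server sends in total


def topology_load(topology: str, participants: int, bitrate_mbps: float) -> Load:
    """Rough bandwidth per topology, assuming one fixed-bitrate stream per sender."""
    n, b = participants, bitrate_mbps
    others = max(n - 1, 0)
    if topology == "p2p":
        # Full mesh: every peer sends to and receives from every other peer.
        return Load(others * b, others * b, 0.0)
    if topology == "sfu":
        # One upload per peer; the server forwards it to everyone else.
        return Load(b, others * b, n * others * b)
    if topology == "mcu":
        # One upload per peer; the server returns a single mixed stream.
        return Load(b, b, n * b)
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for name in ("p2p", "sfu", "mcu"):
        print(name, topology_load(name, participants=4, bitrate_mbps=1.5))
```

In a 4-person room at 1.5 Mbps, the mesh case already asks each peer for 4.5 Mbps of uplink, which is why P2P stops being practical so quickly, while the SFU moves that burden to server egress.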

Need a second opinion on SFU vs MCU?

We will review your topology, devices, and target markets, and recommend the engine that fits — usually in one call.

Book a 30-min topology review →

The cost model — Agora vs self-hosted, worked example

Numbers that appear on invoices, not marketing estimates. Assume a platform with 1,000 concurrent peak users, 4-person rooms, a 90-minute average session, and 20 session-starts per user per month. That comes to roughly 18M minutes/month, which implies about 10,000 monthly active users (10,000 × 20 sessions × 90 minutes).

Agora. 18M minutes × $0.99/1,000 HD = $17,820/month for HD video alone. Add recording ($0.20/1,000 min) and cloud recording egress, and you cross $20k/month.

Self-hosted LiveKit on 4× Hetzner AX102 (Ryzen 9, 128 GB, 1 Gbps unmetered), one regional TURN cluster, one recording/egress node: roughly $1,200/month hardware. Add TURN bandwidth overage (budget $200–500), on-call engineer amortised (20% of one SRE ≈ $2,000). Total ≈ $3,500–4,000/month, i.e. 4–5× cheaper.
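
The arithmetic above can be reproduced in a few lines. This is a sketch that uses only the figures quoted in this section; the 10,000 monthly users is an assumption back-derived from 18M minutes at 20 sessions of 90 minutes each, the recording line assumes every minute is recorded, and all variable names are illustrative.

```python
# Traffic assumptions. The user count is back-derived from the 18M figure.
MONTHLY_USERS = 10_000
SESSIONS_PER_USER = 20
MINUTES_PER_SESSION = 90
monthly_minutes = MONTHLY_USERS * SESSIONS_PER_USER * MINUTES_PER_SESSION

# Agora prices quoted in this article, in dollars per 1,000 minutes.
AGORA_HD_PER_1K = 0.99
AGORA_RECORDING_PER_1K = 0.20  # assumes every minute is recorded

thousands = monthly_minutes / 1_000
agora_video = thousands * AGORA_HD_PER_1K
agora_total = agora_video + thousands * AGORA_RECORDING_PER_1K

# Self-hosted line items from the text, as (low, high) dollars per month.
LINE_ITEMS = {
    "hardware": (1_200, 1_200),
    "turn_overage": (200, 500),
    "on_call_sre": (2_000, 2_000),
}
self_hosted_low = sum(low for low, _ in LINE_ITEMS.values())
self_hosted_high = sum(high for _, high in LINE_ITEMS.values())

print(f"minutes/month:         {monthly_minutes:,}")
print(f"Agora HD video:        ${agora_video:,.0f}")
print(f"Agora incl. recording: ${agora_total:,.0f}")
print(f"self-hosted:           ${self_hosted_low:,} to ${self_hosted_high:,}")
print(f"ratio (video only):    {agora_video / self_hosted_high:.1f}x to "
      f"{agora_video / self_hosted_low:.1f}x")
```

The three line items sum to $3,400–3,700, a little under the rounded total quoted above, so the video-only ratio comes out close to 5×. Swapping in your own minutes and line items is the quickest way to see where your platform sits.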

Self-hosted mediasoup with a similar fleet: close to LiveKit numbers, with slightly lower CPU and slightly higher engineering glue. Same order of magnitude.

The break-even flips well below 18M minutes. Rule of thumb: under 5M minutes/month, a CPaaS is almost always cheaper on TCO; at 5–15M minutes, it depends on the engineering team; above 15M minutes, self-hosted wins unless compliance or geography is already handled by the CPaaS for free.
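
One way to sanity-check a break-even threshold is to divide the monthly cost of a self-hosted fleet by the CPaaS price per minute. The sketch below does exactly that under a loud simplification: it treats the self-hosted bill as flat regardless of volume, which ignores that small deployments need fewer servers and large ones need more. The function name is illustrative.

```python
def break_even_minutes(self_hosted_monthly_usd: float, cpaas_usd_per_1k_min: float) -> float:
    """Monthly minutes at which a flat self-hosted bill equals the per-minute bill."""
    if cpaas_usd_per_1k_min <= 0:
        raise ValueError("price per 1,000 minutes must be positive")
    return self_hosted_monthly_usd / cpaas_usd_per_1k_min * 1_000


if __name__ == "__main__":
    # Self-hosted totals and the HD price come from the worked example above.
    for monthly_cost in (3_500, 4_000):
        minutes = break_even_minutes(monthly_cost, 0.99)
        print(f"${monthly_cost:,}/month breaks even at {minutes / 1e6:.1f}M minutes")
```

On running costs alone the crossover sits near 4M minutes. The rule-of-thumb bands plausibly sit higher because a flat model like this leaves out the one-time migration cost and the engineering attention a new stack demands.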

Comparison matrix — the five realistic 2026 options

Option | License | Best for | Cost shape | Effort | Break-even
Agora.io (baseline) | Commercial SaaS | MVPs, 1–5M min/mo | $/minute | Low (SDK only) | n/a
LiveKit (self-hosted) | Apache 2.0 | Modern default, AI agents | $/server + bandwidth | Medium | ~10M min/mo
LiveKit Cloud | Commercial SaaS | Start cloud, migrate later | $/minute (40–60% cheaper) | Low | From day one
mediasoup | ISC | Max control, large rooms | $/server + bandwidth | Medium-High | ~15M min/mo
Jitsi Videobridge 2 | Apache 2.0 | Drop-in group video | $/server + bandwidth | Low-Medium | ~5M min/mo
Janus Gateway | GPL-3.0 (commercial avail.) | SIP, streaming, hybrids | $/server + bandwidth | High | Workload-dependent

A decision framework — pick the alternative in five questions

Q1 — Minutes per month. Under 5M/month and no compliance pressure, stay on Agora or move to LiveKit Cloud. 5–15M, do the spreadsheet. Above 15M, self-host.

Q2 — Compliance and data residency. If HIPAA, GDPR data-residency, or US-government restrictions apply, self-host in the right region on day one.

Q3 — Custom features. Server-side recording to S3, live transcription, AI voice agents, RTMP ingest, SIP bridge? LiveKit (agents, egress), mediasoup (custom pipelines), or Janus (SIP, streaming) each bring different primitives.

Q4 — In-house expertise. No WebRTC experience? Start on LiveKit Cloud. Some Node.js/Go experience? LiveKit self-hosted or mediasoup. Already running Jitsi? Stay on Jitsi.

Q5 — Time budget. Shipping in 2 weeks? LiveKit Cloud or Jitsi Docker. 8–14 weeks? Self-hosted LiveKit or mediasoup with migration plan. Under 4 weeks for self-hosted? Plan a careful PoC before committing.
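
The five questions can be encoded as a small function, which is handy for stress-testing the thresholds against your own numbers. This is one possible encoding, not a definitive rule set: the order in which the questions take priority is an assumption, Q5 (time budget) is left out because it changes the plan rather than the engine, and the function and parameter names are illustrative.

```python
def pick_engine(needs_sip: bool, runs_jitsi: bool) -> str:
    """Q3/Q4: choose a self-hosted engine from the features and stack already in play."""
    if runs_jitsi:
        return "Jitsi"
    if needs_sip:
        return "Janus"
    return "LiveKit or mediasoup"


def recommend(monthly_minutes_m: float, regulated: bool, has_webrtc_team: bool = True,
              needs_sip: bool = False, runs_jitsi: bool = False) -> str:
    """Return a coarse recommendation; monthly_minutes_m is in millions of minutes."""
    engine = pick_engine(needs_sip, runs_jitsi)
    if regulated:
        # Q2: residency rules outrank the volume math.
        return f"self-host in-region: {engine}"
    if not has_webrtc_team:
        # Q4: no in-house WebRTC experience.
        return "start on LiveKit Cloud"
    if monthly_minutes_m < 5:
        # Q1: low volume, no compliance pressure.
        return "stay on Agora or move to LiveKit Cloud"
    if monthly_minutes_m <= 15:
        return "do the spreadsheet"
    return f"self-host: {engine}"


if __name__ == "__main__":
    print(recommend(32, regulated=False))
    print(recommend(2, regulated=True, needs_sip=True))
```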

Mini case — EdTech platform, Agora → mediasoup migration

A Series-B EdTech client was spending $34,000/month on Agora at roughly 32M minutes, with gross margin on their $29/mo student SKU slipping below 40%. We proposed a 12-week migration to self-hosted mediasoup on Hetzner, with a LiveKit Cloud fallback during cutover.

Week 1–3: signalling and control plane, including room lifecycle, permissions, and reconnection. Week 4–7: media integration, simulcast, recording to S3, TURN cluster in three regions. Week 8–10: client SDK swap (iOS, Android, Web), behind a feature flag. Week 11: staged rollout 5% → 25% → 100% of rooms. Week 12: decommission Agora.

Outcome at week 14: monthly media bill $34,000 → $4,200 (hardware, bandwidth, one SRE amortised); p50 join latency 820 ms → 540 ms; server-side recording newly available, saving another $3,100/month of Agora recording fees; one-time migration cost of $180,000 repaid in under six months. Want a similar assessment for your stack? Book a 30-min migration review.
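
For readers who want to rerun the payback math with their own numbers, here is a minimal sketch built from the figures quoted above. It is a simple payback calculation with no discounting, and the function name is illustrative.

```python
def payback_months(one_time_cost: float, monthly_saving: float) -> float:
    """Months needed for a recurring saving to repay a one-time cost."""
    if monthly_saving <= 0:
        raise ValueError("no monthly saving, so the cost is never repaid")
    return one_time_cost / monthly_saving


media_saving = 34_000 - 4_200   # Agora media bill minus the self-hosted bill
recording_saving = 3_100        # Agora recording fees no longer paid
months = payback_months(180_000, media_saving + recording_saving)

if __name__ == "__main__":
    print(f"combined monthly saving: ${media_saving + recording_saving:,}")
    print(f"payback: {months:.1f} months")
```

Depending on which savings are counted, the result lands between roughly five and a half months (media plus recording) and about six months (media bill only).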

Security, compliance, and end-to-end encryption

Self-hosted SFUs solve compliance in ways PaaS cannot.

HIPAA. Run the SFU, TURN, and signalling in a HIPAA-eligible VPC (AWS, GCP, Azure) with BAAs on the hyperscaler; no third-party CPaaS touches PHI. LiveKit, mediasoup, and Jitsi are all compatible.

GDPR and data residency. Keep media transits within the EU by deploying SFUs only in EU regions; use geo-DNS to direct users to the nearest regional cluster. Agora and other global CPaaS cannot easily prevent transit through non-EU POPs.
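
Geo-DNS applies this rule at the DNS layer, but the selection logic itself is small enough to sketch in application code: pick the lowest-latency cluster, restricted to an allowed region when residency requires it. The cluster names, region tags, and function name below are hypothetical, and the RTT figures would come from your own probes.

```python
from typing import Mapping, Optional

# Hypothetical cluster inventory: cluster name -> region tag.
CLUSTERS = {"fra-1": "eu", "ams-1": "eu", "iad-1": "us", "sin-1": "apac"}


def pick_cluster(rtt_ms: Mapping[str, float], required_region: Optional[str] = None) -> str:
    """Return the lowest-RTT cluster, limited to one region when residency requires it."""
    candidates = [
        name for name, region in CLUSTERS.items()
        if name in rtt_ms and (required_region is None or region == required_region)
    ]
    if not candidates:
        raise LookupError("no reachable cluster satisfies the residency constraint")
    return min(candidates, key=lambda name: rtt_ms[name])


if __name__ == "__main__":
    measured = {"fra-1": 48.0, "ams-1": 41.0, "iad-1": 12.0, "sin-1": 180.0}
    print(pick_cluster(measured))                        # nearest cluster overall
    print(pick_cluster(measured, required_region="eu"))  # nearest EU cluster only
```

The design point is that the residency filter runs before the latency comparison, so a closer out-of-region cluster can never win for a user whose media must stay in the EU.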

End-to-end encryption. WebRTC’s Insertable Streams API enables E2EE on top of SFU — the SFU forwards encrypted frames it cannot decode. LiveKit supports E2EE across Web, iOS, Android (2024+). Jitsi supports E2EE in Chromium browsers. Agora offers “optional” E2EE via their SDK with custom key exchange.

SOC 2, ISO 27001. Easier when you control the stack: you can attest to your own controls rather than vouching for a vendor’s. Most mature self-hosting teams inherit their hyperscaler’s compliance evidence for the data plane.

2026 tooling stack for a self-hosted WebRTC platform

  • Media engine: LiveKit (default), mediasoup (large rooms), Jitsi Videobridge (drop-in), Janus (hybrid topologies).
  • TURN: coturn (battle-tested default), eturnal (Erlang, lightweight), or LiveKit’s bundled TURN.
  • Signalling: LiveKit’s built-in, or Socket.IO/WebSocket custom for mediasoup and Janus.
  • Recording and egress: LiveKit Egress (Go, headless Chromium), Jibri (Jitsi), or custom FFmpeg pipelines.
  • Monitoring: Prometheus + Grafana for SFU metrics, Loki for logs, Sentry for client errors, callstats.io or RTCStats for end-to-end call-quality analytics.
  • Load testing: the load-test command in LiveKit’s CLI, KITE (Google), or custom Puppeteer/Playwright swarms.
  • Deployment: Kubernetes + Helm charts (LiveKit and Jitsi both ship official charts), or bare-metal Ansible for hardcore performance.

Five pitfalls that sink Agora migrations

1. Underestimating TURN bandwidth. TURN relays all media for restrictive networks (corporate VPNs, mobile carriers). In some geographies 40%+ of sessions go through TURN. Plan bandwidth accordingly.

2. Ignoring simulcast/SVC. Without layered streaming, one low-end device pulls the whole room down. Simulcast is table stakes in 2026; SVC (AV1) is rapidly becoming so.

3. No observability. Agora hides call-quality telemetry behind a dashboard. After migration, many teams are blind. Invest in RTCStats or equivalent on day one.

4. Flaky reconnection. WebRTC reconnects poorly when networks flap. Implement ICE-restart and fast session resumption explicitly; do not rely on defaults.

5. Big-bang cutover. Always dual-run Agora and the new stack, gate by room or user cohort, and monitor quality KPIs before decommissioning.
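
Gating by room during a dual-run is commonly done with a stable hash, so a room that lands on the new stack at 5% stays there at 25% and 100%. The sketch below shows one way to do it; the function names are illustrative, and a real deployment would usually read the rollout percentage from a feature-flag service rather than pass it in.

```python
import hashlib


def rollout_bucket(room_id: str) -> int:
    """Map a room ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(room_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_stack(room_id: str, rollout_percent: int) -> bool:
    """True when the room falls inside the current rollout percentage."""
    return rollout_bucket(room_id) < rollout_percent


if __name__ == "__main__":
    rooms = [f"room-{i}" for i in range(10_000)]
    for percent in (5, 25, 100):
        moved = sum(use_new_stack(room, percent) for room in rooms)
        print(f"{percent:>3}% rollout -> {moved} of {len(rooms)} rooms on the new stack")
```

Because the bucket depends only on the room ID, raising the percentage only ever adds rooms, and rolling back is a matter of lowering one number.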

KPIs: what to measure once you cut over

Quality KPIs. Median join latency (target < 700 ms), p95 audio MOS (target > 4.0), video freeze rate per minute (target < 0.5), reconnect success rate (target > 99%), SFU packet loss (target < 1%).

Business KPIs. Cost per 1,000 minutes before and after (target 3–5× reduction above 10M minutes/month), gross margin lift on the product SKU (typically 5–15 percentage points), feature velocity (new server-side recording, transcription, AI features that were blocked on Agora).

Reliability KPIs. SFU uptime (target > 99.95% excluding hyperscaler incidents), TURN uptime, p95 room-creation latency, median time to recover from a bridge failure (target < 30 seconds with warm standbys).
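
The quality targets above are easy to turn into an automated gate for a staged rollout. This is a minimal sketch: the metric names are made up for illustration, rates are assumed to be expressed as fractions (0.99 rather than 99%), and a missing metric is treated as a failure.

```python
# Quality targets from the list above: metric name -> (comparison, threshold).
QUALITY_TARGETS = {
    "median_join_latency_ms": ("<", 700),
    "p95_audio_mos": (">", 4.0),
    "video_freezes_per_min": ("<", 0.5),
    "reconnect_success_rate": (">", 0.99),
    "sfu_packet_loss": ("<", 0.01),
}


def failing_kpis(measured: dict) -> list:
    """Return the names of KPIs that miss their target or were not reported."""
    failures = []
    for name, (comparison, target) in QUALITY_TARGETS.items():
        value = measured.get(name)
        if value is None:
            failures.append(name)
        elif comparison == "<" and not value < target:
            failures.append(name)
        elif comparison == ">" and not value > target:
            failures.append(name)
    return failures


if __name__ == "__main__":
    snapshot = {
        "median_join_latency_ms": 540,
        "p95_audio_mos": 4.2,
        "video_freezes_per_min": 0.3,
        "reconnect_success_rate": 0.995,
        "sfu_packet_loss": 0.004,
    }
    print(failing_kpis(snapshot) or "all quality KPIs on target")
```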

When to stay on Agora

Self-hosting is not always the answer. Three situations where Agora still wins.

Sub-5M minutes/month with no compliance pressure. The engineer-hours to run a self-hosted stack outweigh the bill. Stay on Agora.

Global emerging-market reach without in-house operations. Agora’s POP footprint in China, SE Asia, and parts of Africa is genuinely hard to replicate. If those regions are 30%+ of your traffic, their infrastructure tax pays for itself.

Extremely bursty traffic. Live-event apps that idle at zero and spike to tens of thousands for 30 minutes have a hard time justifying always-on hardware. PaaS autoscaling is still king here.

FAQ

How long does a typical Agora → self-hosted migration take?

8–14 weeks for a mature product with web, iOS, and Android clients. The control plane and signalling are 3–4 weeks; client SDK swaps are 2–3 weeks each with overlap; staged rollout and dual-running add the remaining time. Greenfield projects go faster (6–10 weeks) because no migration gate exists.

Is LiveKit Cloud really a drop-in Agora replacement?

Close but not identical. Client APIs differ, so a rewrite is still required. What LiveKit Cloud does give you is SDK continuity between Cloud and self-hosted — the same code runs against both, so you can migrate when the cost math flips without a second rewrite.

What about Twilio, Vonage, Dolby.io, and other CPaaS?

All have similar pricing shapes and trade-offs. Twilio announced an end-of-life for Programmable Video in late 2023 and later reversed that decision, so confirm its current status before planning around it. Vonage and Dolby.io are viable but no cheaper than Agora at scale. The self-hosted math applies to all of them.

Does self-hosted WebRTC support end-to-end encryption?

Yes, via the Insertable Streams API. LiveKit has first-class E2EE across Web, iOS, and Android (2024+); Jitsi supports E2EE in Chromium browsers; mediasoup and Janus support it via custom frame transformers. The SFU forwards encrypted frames it cannot decode, so media stays confidential end-to-end.

What is the smallest team that can run a self-hosted SFU in production?

One capable SRE, one senior backend engineer, and one client engineer per platform is enough to run LiveKit or Jitsi at scale. mediasoup needs slightly more backend expertise. Janus is feasible for small teams only if the workload matches one of its plugins directly.

Can we use LiveKit and mediasoup together?

Uncommon but possible — teams use LiveKit for conferencing and mediasoup for a specialised streaming or large-room path. Two SFUs mean two sets of operational tooling; we recommend picking one unless you have a clear technical reason.

How do we estimate bandwidth for self-hosting?

Rule of thumb for 720p 30fps video at 1.5 Mbps: in a room of N participants, each participant uploads 1.5 Mbps and receives (N−1) × 1.5 Mbps. With simulcast, the SFU sends each subscriber the lowest layer that suits it, reducing egress by 30–60%. Plan peak SFU egress as rooms × N × (N−1) × 1.5 Mbps × 1.2 safety factor, then subtract the simulcast saving. TURN typically adds 20–40% of total traffic depending on network restrictiveness.
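
As a rough planning aid, the estimate can be written as a function. This sketch assumes every participant in a room of N subscribes to all N−1 others at a single bitrate, models simulcast as a flat percentage saving, and applies the safety factor on top. The function and parameter names are illustrative, and TURN relay capacity is not included.

```python
def peak_sfu_egress_mbps(rooms: int, participants_per_room: int,
                         bitrate_mbps: float = 1.5,
                         simulcast_saving: float = 0.0,
                         safety_factor: float = 1.2) -> float:
    """Estimate peak SFU egress in Mbps for a fleet of equally sized rooms."""
    if not 0.0 <= simulcast_saving < 1.0:
        raise ValueError("simulcast_saving must be in the range [0, 1)")
    n = participants_per_room
    # Each of the n publishers is forwarded to the other n - 1 participants.
    forwarded_streams = rooms * n * max(n - 1, 0)
    return forwarded_streams * bitrate_mbps * (1.0 - simulcast_saving) * safety_factor


if __name__ == "__main__":
    # 1,000 concurrent users in 4-person rooms is 250 rooms.
    plain = peak_sfu_egress_mbps(rooms=250, participants_per_room=4)
    layered = peak_sfu_egress_mbps(rooms=250, participants_per_room=4, simulcast_saving=0.4)
    print(f"without simulcast: {plain / 1000:.2f} Gbps")
    print(f"with 40% saving:   {layered / 1000:.2f} Gbps")
```

Because egress grows with N × (N−1), room size matters far more than room count: doubling the participants per room roughly quadruples the bandwidth, while doubling the number of rooms only doubles it.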

Ready to escape Agora pricing?

Agora.io is a fine starting point and an expensive destination. For most teams crossing 10M monthly minutes, self-hosted WebRTC on LiveKit (the 2026 default) or mediasoup (maximum control) cuts the media bill by 3–5×, solves compliance and data-residency headaches, and unlocks features — server-side recording, AI agents, SIP bridges — that Agora either bills extra for or does not ship.

Jitsi remains the fastest drop-in, Janus the right choice for unusual topologies, and LiveKit Cloud the safest way to hedge. Pick the engine based on your topology, not on blog-post popularity, and plan the migration as an 8–14-week engagement with explicit quality KPIs.

If you would rather have a partner who has shipped this stack dozens of times, that is what we do. Fora Soft builds real-time WebRTC products for teams who cannot afford for video to “work in the demo and fail in the field.”

Talk to a WebRTC architect

Share your minutes, compliance needs, and target regions. You will get a WER-grade cost model, a reference architecture, and a migration plan — no spreadsheet-of-doom required.

Book a 30-min call →
