
Key takeaways
• Stock Agora SDK ships a working call in days. Custom Agora development ships a defensible product. The line between them is recording, AI agents, white-label, and compliance — not the calling itself.
• Three Agora line items quietly bleed budgets: per-participant recording (5 hosts = 5 service-minutes per minute), STT enabled but unused on silent sessions, and resolution-dependent video pricing that scales fast on HD/4K.
• The 2026 unlock is the Conversational AI Engine. At $0.0265 per agent-minute on the Agora side (LLM tokens and TTS are billed separately by those vendors), voice agents on Agora are competitive. The integration with OpenAI Realtime, Anthropic, or ElevenLabs is non-trivial, though, and is where most teams want a partner.
• Pick on geography and workload. Agora wins in Southeast Asia and Latin America for pure RTC; LiveKit wins for AI-agent-heavy products; Daily wins for fastest time-to-market; Zoom Video SDK is the official Twilio Programmable Video migration path.
• Fora Soft has shipped video-first products on Agora, WebRTC, LiveKit, and Twilio across telehealth, fitness, e-learning, and live events. We’ll tell you which platform fits your product and which line items will hurt you in month six.
Why Fora Soft wrote this Agora playbook
Fora Soft has been building video, audio, and real-time communication products since 2005 — more than 320 video-first products shipped, audited, scaled, or migrated across providers. We have written token servers, built recording clusters that survive 50× load spikes, integrated Conversational AI Engines with OpenAI Realtime and ElevenLabs, and migrated entire codebases from Twilio Programmable Video to Zoom Video SDK after Twilio sunset that product in December 2024.
We use Agora when the geography (SE Asia, LATAM, parts of Africa) demands its edge presence, when the workload is heavy on Interactive Live Streaming, or when the customer’s telehealth or education compliance posture requires geo-fencing and on-premise recording. We don’t use it when LiveKit’s open-source agent framework is a better fit, or when a Daily.co prebuilt UI ships faster. The point of this article is to make that call obvious for your product, then explain what custom Agora development actually looks like in 2026 — the architecture, the prices, the pitfalls, and the partner profile to look for.
For concrete proof, see our work on Perspire.tv (live fitness streaming with low-latency interactive sessions), CirrusMED (HIPAA-compliant telemedicine), BrainCert (WebRTC virtual classroom LMS), and VOLO.live (real-time event translation). Each one made a different platform call — and each call held up under load.
Need a second opinion on Agora vs the alternatives?
A 30-minute scoping call with a senior engineer who has shipped on Agora, LiveKit, Daily, Twilio, and Zoom Video SDK — tell us your workload and we’ll tell you the line items that will hurt at scale.
What Agora actually is in 2026 — products and recent updates
Agora.io is a CPaaS (Communications-Platform-as-a-Service) provider that runs a private global edge network — the SD-RTN, or Software-Defined Real-Time Network — with 200+ data centers across more than 200 countries and regions. The platform handled tens of billions of minutes per month at last public disclosure, with telehealth, social audio, online education, fitness streaming, and live commerce as its biggest verticals.
In 2026 Agora ships nine product lines that matter for custom development:
The nine Agora products you actually use
1. Voice and Video SDK. The core 1-on-1 and small-group calling SDK for iOS, Android, Web, and desktop. AV1 codec is in beta on Web with a 42% bitrate reduction versus H.264.
2. Interactive Live Streaming. Up to 128 concurrent hosts per channel and an unlimited audience via broadcast mode — this is what scales rooms past the default 17-host cap that catches teams off-guard.
3. Signaling / RTM. Sub-200ms global average latency for presence, room state, chat, and metadata. This is the layer most teams underestimate — it carries every UI affordance that makes a call feel premium.
4. Chat. Omnichannel in-app messaging API. Most custom builds replace this with Stream or Sendbird, but it ships out of the box.
5. Cloud Recording. Individual mode (raw streams), composite mode (server-mixed), and delayed transcoding mode (audio-only, processed within 24 hours for cost savings). Watermarking and screenshot moderation are add-ons.
6. On-Premise Recording SDK. The Agora-side stream is decrypted only on your own infrastructure — the option that lets HIPAA, GDPR, and SOC 2 audits actually pass.
7. Conversational AI Engine. v2.5 shipped in April 2026 with revised pricing. Native integration with OpenAI Realtime API, Anthropic, ElevenLabs, and other LLM/TTS providers. Built-in echo cancellation, background noise suppression, and intelligent pause detection that fixes the "agent-talks-over-user" problem.
8. Real-Time STT. Selective attention lock for speaker identification and noise suppression in group settings. Charged per minute, regardless of whether anyone speaks. This is one of the line items we flag in the pricing section below.
9. Cloud Player. Streams pre-recorded media into a live channel as another participant — the building block for "VJ" features, instructor-led fitness classes, and live commerce demos.
2025–2026 platform updates worth your attention
Conversational AI Engine v2.5 (April 2026). Revised per-minute pricing and updated TTS/LLM bundles. The new pricing makes always-on voice agents materially cheaper than custom STT-LLM-TTS chains for most products.
Outbound calling for AI agents (November 2025). Agents can now initiate calls — the unlock for appointment reminders, surveys, lead qualification, and proactive customer success.
Multimodal LLM support (July 2025). Audio plus text plus image inputs in a single agent context. Useful for customer support agents that can see a screenshot of an error.
AV1 beta on Web (2025). 42% bitrate reduction versus H.264 with 25% faster encoding than x264. Critical in low-bandwidth markets — provided you keep H.264 and VP8 fallbacks for older devices.
Stock Agora SDK vs custom Agora development — the real difference
Most articles about Agora collapse the calling SDK and the platform around it into one thing. They are not one thing. The calling SDK is a commodity — you can copy a sample app, paste your App ID, and have a working two-party call running in your browser within a few hours. Where products win or fail is everything around the call: token security, recording, AI agents, white-label theming, retention, moderation, observability, and compliance.
Reach for stock Agora SDK when: you need a single-tenant 1-on-1 or small-group call, you don’t record, you don’t need AI, your channels stay at ≤ 17 hosts, and you have no regulated-industry obligations. A senior dev can ship this in two to three weeks.
Reach for custom Agora development when: you need recording with custom retention or redaction, white-label / multi-tenant, AI voice agents, > 17 hosts, HIPAA/GDPR/SOC 2 compliance, custom signaling, hybrid CDN fallback, or per-tenant analytics. Plan 8–16 weeks for the first production-grade version.
Twelve scenarios that force custom Agora development
Across our Agora projects, these are the specific moments where teams realize they have outgrown the stock SDK. If two or more apply to your roadmap, you are already in custom-development territory.
1. Multi-tenant white-label. One app, N customer brands, isolated channels, per-tenant theme tokens. The stock SDK has no concept of tenants — you build that layer.
2. Recording with custom retention or redaction. 90-day retention for one tenant, seven-year for another (HIPAA), automatic PII redaction in transcripts, per-region S3 buckets. None of this is in the stock recording UI.
3. AI moderation pipeline. Real-time frame inspection plus async review of completed recordings, with a policy engine and an audit trail. Budget roughly 500ms of added latency on the real-time leg.
4. Real-time transcription overlays plus translation. Synchronized captions, speaker diarization, two-pass translation. STT alone is one chunk; weaving it into your UI without breaking the captions on poor networks is a different chunk.
5. Conversational AI agent integration. Custom routing across OpenAI Realtime, Anthropic, ElevenLabs, plus fallback logic and conversation context preservation across sessions. The Conversational AI Engine (CAE) makes this easier than building from scratch, but only if you know which seams to trust.
6. Hybrid CDN with RTMP / HLS fallback. When your audience crosses 100k concurrent viewers, you stop pushing everything through SD-RTN and start mirroring the stream to a CDN. That layer is yours to build.
7. Custom signaling logic. Queue management, presence with capacity, room states ("waiting", "in-progress", "moderation-required"), event-driven side effects. Stock RTM gives you the pipes; the protocol is yours.
8. Server-side recording cluster with moderation worker. Recording is fan-out heavy. A 5-host channel with composite recording is 5 service-minutes per minute — you want the orchestration to be your code, not Agora’s defaults.
9. Scaling beyond 17 hosts. Interactive Live Streaming pushes you to 128 hosts per channel and unlimited audience — but only if you architect the publish/subscribe roles, the cross-channel media relay, and the audience-to-host promotion path correctly.
10. Regional compliance. Geo-fencing for GDPR, customer-managed encryption keys for HIPAA, data residency in specific regions. Agora supports the primitives; you wire them to your tenant model.
11. Low-bandwidth optimization. AV1 with fallback to H.264 on weak devices, adaptive resolution and frame rate via Network Quality Control (NQC), audio-only mode at < 100kbps. The defaults work; the tail markets need tuning.
12. Custom analytics dashboard. P95 join latency, MOS, packet loss, churn predictors, cost-per-active-user-minute. The Agora console is fine for SREs; it is not the dashboard you show to your CEO or your customer’s ops team.
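Scenarios 1 and 2 above usually meet in one small data structure: a per-tenant recording profile. The sketch below is a minimal illustration, assuming a hypothetical `RecordingProfile` record and made-up tenant and bucket names; none of it is an Agora API, and seven years is approximated as 7 × 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RecordingProfile:
    """Hypothetical per-tenant recording settings; not an Agora object."""
    tenant_id: str
    retention_days: int
    storage_bucket: str   # illustrative per-region bucket name
    redact_pii: bool


# Illustrative tenants: one on 90-day retention, one on seven-year retention
# (approximated as 7 * 365 days, ignoring leap days).
PROFILES = {
    "acme-fitness": RecordingProfile("acme-fitness", 90, "recordings-us-east", False),
    "clinic-one": RecordingProfile("clinic-one", 7 * 365, "recordings-eu-central", True),
}


def delete_after(tenant_id: str, recorded_on: date) -> date:
    """Return the first date on which a recording may be purged."""
    profile = PROFILES[tenant_id]  # a KeyError for unknown tenants is deliberate
    return recorded_on + timedelta(days=profile.retention_days)


print(delete_after("acme-fitness", date(2026, 1, 1)))  # 2026-04-01
```

Keeping the profile outside the recording call itself means a retention change for one tenant is a data update, not a deployment.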
Agora pricing 2026 — and the three line items that surprise teams
Agora’s public per-minute prices are competitive on raw voice and video calling. Where teams get hurt is the workloads bolted on top — recording, STT, AI agents, and high-resolution video. Here are the prices that matter today, plus the three line items we routinely audit for surprise costs.
The published Agora prices in 2026
Voice SDK audio: $0.99 per 1,000 minutes ($0.00099/min). The cheapest tier on the market for plain audio calling.
Video SDK: resolution-dependent. SD costs less than HD; HD costs less than full HD; 4K is its own tier. You pay per stream-minute, so a 5-host channel is 5 stream-minutes per minute.
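Conversational AI Engine: $0.0265 per agent-minute, which is the sum of Audio Basic ($0.0099) and ARES ASR ($0.0166). The RTC audio rate ($0.00099/min) is a separate line; if it applies to your agent channel, the Agora-side total is closer to $0.0275. LLM tokens and TTS are billed by your LLM/TTS vendor on top of that.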
Real-Time STT: charged per minute the feature is enabled, not per minute someone speaks.
Cloud Recording: billed per service-minute. Composite mode (one mixed file) charges for the recording server’s mixing. Individual mode (one file per host) charges per stream recorded.
Free tier: 10,000 standard minutes per project per month (expanded August 2025). Enough for a real pilot.
The three line items that quietly bleed budgets
Line item 1 — recording multiplier. Composite recording on a 5-participant call costs ~5 service-minutes per actual minute. Individual recording on the same call costs 5 file-minutes plus a transcoding charge if you mix later. Teams habitually budget recording at "1x" of call minutes; in practice it is 3–6× depending on average channel size. Audit your average channel size before you commit.
Line item 2 — idle STT. If STT is enabled on a channel, you pay per minute regardless of whether anyone is speaking. We have audited apps where 60% of session minutes were silent (waiting rooms, hold queues, classroom passive listening) and STT was burning $1,000+/month for nothing. The fix is to turn STT on per-utterance via the SDK’s control hooks, not "always on per channel".
Line item 3 — resolution drift. Most apps default to a video profile that supports HD. On phones in the LATAM market this is overkill — the device cannot show 720p clearly anyway. We routinely save 25–40% by setting per-tenant or per-device resolution caps and letting Network Quality Control adapt down. The math: SD (480p) is roughly half the per-minute cost of HD (720p).
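The three line items reduce to simple arithmetic you can run against your own usage export. The sketch below is a back-of-the-envelope estimator, not a billing model: the function names are ours, the "one service-minute per host" rule and the "SD is roughly half of HD" ratio are the rules of thumb quoted above, and real invoices depend on your contract.

```python
def recording_service_minutes(call_minutes: float, avg_hosts: float) -> float:
    """Line item 1: service-minutes billed if every host stream counts."""
    return call_minutes * avg_hosts


def idle_stt_minutes(session_minutes: float, silent_fraction: float) -> float:
    """Line item 2: STT minutes paid for while nobody is speaking."""
    return session_minutes * silent_fraction


def video_cost_after_cap(hd_cost: float, share_capped_to_sd: float,
                         sd_to_hd_ratio: float = 0.5) -> float:
    """Line item 3: video spend after capping a share of streams to SD.

    sd_to_hd_ratio=0.5 encodes the rough 'SD costs half of HD' rule of thumb.
    """
    capped = hd_cost * share_capped_to_sd * sd_to_hd_ratio
    uncapped = hd_cost * (1.0 - share_capped_to_sd)
    return capped + uncapped


# 1,000 call minutes with 5 hosts on average -> 5,000 service-minutes
print(recording_service_minutes(1_000, 5))
# 10,000 STT-enabled minutes at 60% silence -> about 6,000 wasted minutes
print(round(idle_stt_minutes(10_000, 0.6)))
# $1,000 of HD video with 60% of streams capped to SD -> about $700
print(round(video_cost_after_cap(1_000, 0.6)))
```

In the last case the saving is about 30%, which sits inside the 25–40% range we typically see.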
Burning money on Agora and not sure where it’s going?
We’ll run a 2-week cost audit on your Agora console: recording multiplier, idle STT, resolution drift, AI agent stacking. Most audits surface 20–40% in addressable savings.
Agora vs LiveKit vs Daily vs Twilio vs Zoom Video SDK vs AWS Chime
The right platform depends on your workload mix, your geography, and how much UX control you need. The matrix below condenses what we normally walk customers through during a scoping call. The numbers come from each vendor’s public 2026 pricing pages; the qualitative cells come from our own production deployments on each platform.
| Platform | Audio (per 1k min) | Free tier | Hosts / scale | AI agents | Customization |
|---|---|---|---|---|---|
| Agora | $0.99 | 10,000 min/mo | 128 hosts + millions audience | CAE + OpenAI Realtime | Moderate–high |
| LiveKit Cloud | ~$4–$24 per track-min | 5,000 min/mo | Unlimited (with quotas) | Native — best on market | Open-source, very high |
| Daily.co | $4 ($0.004/min) | 10,000 min/mo | Unlimited | Via integration | Pre-built UI — fast TTM |
| Zoom Video SDK | Custom quote | None published | Unlimited | Via integration | Limited (locked-down) |
| AWS Chime SDK | $1.70 ($0.0017/min) | None | Configurable | Via Bedrock / Polly | Enterprise / AWS-only |
| Twilio Programmable Video | N/A — sunset Dec 2024 | N/A | N/A | N/A | Migrate to Zoom Video SDK |
For a deep dive on the most common 2026 platform crossover, read our LiveKit vs Agora cost analysis and WebRTC vs Agora architecture tradeoffs.
The short verdict. Agora wins for pure RTC in SE Asia, LATAM, and parts of Africa thanks to edge presence. LiveKit wins for AI-agent-heavy workloads and open-source control. Daily wins when shipping in two weeks matters more than per-minute price. Zoom Video SDK is the official Twilio Programmable Video migration path. AWS Chime makes sense only if your stack is already AWS-native.
Agora’s technical envelope — capacity, latency, codecs, encryption
Knowing exactly what the platform can and cannot do is the difference between a "we’ll figure it out" architecture and a clean one. Here are the numbers we keep on the wall.
Channel capacity. The default channel limit is 17 hosts. Interactive Live Streaming raises this to 128 concurrent video publishers, with an unlimited audience via broadcast mode. Each host can subscribe to a maximum of 50 other hosts simultaneously — a constraint that matters once you build large breakout rooms.
Audience scale. Broadcast mode reaches millions of passive viewers. Cross-channel media relay (4–6 destinations) is the path for co-hosting and "stage-to-arena" patterns.
Latency tiers. Ultra-Low (sub-100ms median in major regions), Low (100–200ms), Standard (regional fallback). The SD-RTN’s intelligent routing keeps median latencies below 200ms globally and below 100ms regionally.
Codecs. H.264, H.265, VP8, VP9, and AV1 (beta on Web). AV1 needs 42% less bitrate than H.264 and encodes 25% faster than x264, but you must keep H.264/VP8 fallbacks for older devices: a call that forces AV1 on a 5-year-old Android will fail to connect.
Encryption. AES-128/256 in transit by default. End-to-End Encryption is in beta with customer-managed keys. On-Premise Recording SDK is the option to keep decrypted media on your infrastructure only — the seam HIPAA and SOC 2 audits actually inspect.
Geo and compliance primitives. Geo-fencing pins traffic to specific regions for GDPR. HIPAA-eligible service available under BAA. SOC 2 Type 2 reports available. Customer-managed key material on E2EE channels.
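The capacity numbers above are easy to encode as a pre-flight check before a room is created. This is a sketch under the limits quoted in this section (17, 128, and 50); the function name, the mode labels, and the idea of "paging" subscriptions are our own illustration rather than Agora terminology.

```python
DEFAULT_HOST_LIMIT = 17   # default channel limit quoted above
ILS_HOST_LIMIT = 128      # Interactive Live Streaming limit quoted above
MAX_SUBSCRIPTIONS = 50    # hosts a single host can subscribe to at once


def plan_channel(hosts: int) -> dict:
    """Classify a planned channel by host count (labels are illustrative)."""
    if hosts < 1:
        raise ValueError("a channel needs at least one host")
    if hosts > ILS_HOST_LIMIT:
        mode = "split-across-channels"
    elif hosts > DEFAULT_HOST_LIMIT:
        mode = "interactive-live-streaming"
    else:
        mode = "default"
    others = hosts - 1  # every host except the one doing the subscribing
    return {
        "mode": mode,
        "visible_hosts": min(others, MAX_SUBSCRIPTIONS),
        "needs_subscription_paging": others > MAX_SUBSCRIPTIONS,
    }


print(plan_channel(30))
```

A 30-speaker panel lands in the Interactive Live Streaming bucket with all 29 other hosts visible; at 60 hosts the same check starts flagging that each client can only render 50 of the other 59.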
A reference architecture for custom Agora development
Most production Agora deployments converge on a four-layer architecture: clients, the Agora SD-RTN edge, your backend (token, auth, recording, moderation, AI orchestration), and your data plane (Postgres, object storage, analytics, billing). The diagram below is the reference we draw for every Agora scoping engagement — adapt the right column to your AI vendor and the bottom row to your cloud.
Figure 1. Reference architecture for custom Agora development — client tier, SD-RTN edge, your backend, and data & analytics layer.
What sits where, and why
Token server. The single most-attacked surface in any Agora deployment. Tokens are short-lived (5–24 hour expiry), per-channel, and role-scoped (publisher / subscriber). The App ID is a public identifier that the client needs to join a channel, but the App Certificate signs tokens and must never ship in the client; always issue tokens server-side after authenticating the user.
Recording cluster. Cloud Recording or On-Premise Recording SDK depending on compliance posture. Composite mode for "one MP4 per session" use cases; individual mode for "post-process later" pipelines. The recording orchestrator is your code — it decides what to record, when to start, when to stop, where to write, and what retention policy applies.
Moderation worker. Real-time frame inspection (Rekognition, Vision API, custom CV models) for live channels and async batch processing for completed recordings. The policy engine and audit log live here, not in Agora.
AI orchestrator. The bridge between Agora’s Conversational AI Engine and your LLM/TTS providers (OpenAI Realtime, Anthropic, ElevenLabs). Owns the conversation context across sessions, fallback logic, prompt templates, and cost metering.
Data plane. Postgres for sessions and entitlements; S3/GCS for recordings and transcripts; ClickHouse or BigQuery for QoE analytics; a per-minute meter for cost attribution. The cost meter is what tells you that line items 1–3 from the pricing section are spiraling.
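For the token server, the policy layer is worth writing down before any signing code. The sketch below shows only that policy step: it validates the role, binds the grant to one channel, and clamps the lifetime to the 5–24 hour window described above. `TokenGrant` and `build_grant` are hypothetical names; turning the grant into a signed Agora token with your App Certificate is deliberately left out.

```python
from dataclasses import dataclass

MIN_TTL_S = 5 * 3600    # 5 hours
MAX_TTL_S = 24 * 3600   # 24 hours
ROLES = {"publisher", "subscriber"}


@dataclass(frozen=True)
class TokenGrant:
    """What the server has agreed to sign; hypothetical, not an Agora type."""
    user_id: str
    channel: str
    role: str
    expires_at: int  # unix seconds


def build_grant(user_id: str, channel: str, role: str,
                requested_ttl_s: int, now_s: int) -> TokenGrant:
    """Validate a token request and clamp its lifetime to the allowed window."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if not channel:
        raise ValueError("token must be bound to a specific channel")
    ttl = max(MIN_TTL_S, min(requested_ttl_s, MAX_TTL_S))
    return TokenGrant(user_id, channel, role, now_s + ttl)


# A client asking for a 30-day token still gets 24 hours.
grant = build_grant("user-123", "room-42", "subscriber", 30 * 24 * 3600, now_s=1_000)
print(grant.expires_at - 1_000)  # 86400
```

Call `build_grant` only after your own authentication has succeeded, and log every grant against your internal user ID so a leaked token can be traced.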
Conversational AI on Agora — integrating OpenAI Realtime and ElevenLabs
The Conversational AI Engine is the most visible 2026 feature on Agora and the area where customers most often ask for help. There are four production-grade integration patterns. We have shipped all four and have opinions about which to use when.
Pattern A — native CAE with bundled providers
Use Agora’s Conversational AI Engine end-to-end. Pick from the bundled LLM and TTS providers (OpenAI Realtime, Anthropic, ElevenLabs). Lowest engineering cost. Lowest control. Best for customer-support agents and FAQ-style use cases where you do not need a custom prompt graph.
Pattern B — custom STT → LLM → TTS chain
Capture audio from Agora, run your own STT (Deepgram, AssemblyAI, Whisper), call your LLM, run your own TTS (ElevenLabs, Cartesia, OpenAI), publish back into the channel. Highest control, highest engineering cost. Right when you need a specific LLM, a custom voice clone, or a privacy posture that excludes Agora’s ASR vendors.
Pattern C — hybrid CAE for one ear, custom for the other
CAE handles inbound speech-to-text and pause detection; your code handles the LLM and TTS. Good middle ground for products that already own a prompt graph and a TTS pipeline but do not want to build the noise suppression and intelligent pause detection from scratch.
Pattern D — agent-orchestrated multi-modal
An LLM "agent" decides which sub-agent (voice agent, video agent, text agent) handles each turn. CAE plays the role of the voice channel; vision and text channels are separate. The pattern that fits products like AI tutors, video customer support, and field-service guidance.
A worked cost example for a 1,000-minute / month voice-agent pilot
Suppose you are running a customer-support voice agent that handles 1,000 minutes of conversation per month. Agora CAE: $26.50. OpenAI Realtime LLM tokens (rough): $30–$60 depending on prompt size. ElevenLabs TTS: $20–$40 depending on voice tier. Storage and analytics: ~$5. All-in: roughly $80–$130 / month for the pilot. The same workload on a custom STT→LLM→TTS chain runs $90–$150 / month plus 4–6 weeks of build cost — CAE wins for pilots, custom wins once you exceed ~10,000 agent-minutes / month and the LLM-token line dominates.
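The pilot arithmetic above can be reproduced in a few lines, which makes it easy to swap in your own vendor quotes. This is a sketch using the figures from the worked example; the function name and the default ranges are ours, and the LLM and TTS ranges are the rough estimates stated for a 1,000-minute pilot rather than vendor prices.

```python
CAE_RATE = 0.0265  # USD per agent-minute, from the pricing section


def pilot_cost(agent_minutes: float,
               llm_range=(30.0, 60.0),   # rough LLM token spend for the pilot
               tts_range=(20.0, 40.0),   # rough TTS spend for the pilot
               storage: float = 5.0):
    """Return (cae_cost, low_total, high_total) in USD, rounded to cents."""
    cae = agent_minutes * CAE_RATE
    low = cae + llm_range[0] + tts_range[0] + storage
    high = cae + llm_range[1] + tts_range[1] + storage
    return round(cae, 2), round(low, 2), round(high, 2)


print(pilot_cost(1_000))  # (26.5, 81.5, 131.5)
```

The result matches the "roughly $80–$130" range quoted above. The ranges do not scale automatically with `agent_minutes`, so pass your own estimates when you model a larger workload.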
For a deeper architecture comparison, see our guide on how video AI agents work and our multimodal agents on LiveKit playbook for the analogous pattern on the open-source side.
Recording, transcoding, and moderation pipelines done right
Recording is the single biggest source of cost surprises and compliance failures we see in Agora projects. The decisions here drive at least three numbers your CFO and your compliance officer care about: storage cost, recording multiplier on the Agora bill, and audit-trail completeness.
Pick the recording mode before you pick the storage
Composite mode. Server-side mixing produces one MP4 per session. Cheapest to store, hardest to redact (you cannot mute one participant after the fact). Best for support calls, telehealth consults, classroom replays.
Individual mode. One file per host. Highest storage cost, easiest to redact (drop one file, mute one channel). Best for multi-tenant platforms with per-user retention rules and legal-discovery obligations.
Delayed transcoding (audio only). Stream raw audio to storage; transcode to MP3/M4A within 24 hours. Cheapest path for audio-only social and education products. Saves 30–50% on the recording bill if you do not need real-time playback.
On-Premise Recording SDK. The Agora-side stream is decrypted only inside your own VPC or data center. The HIPAA / SOC 2 path. The setup cost is real but pays back the moment your auditor asks where decrypted PHI lives.
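The four options above collapse into a short decision function. The sketch below encodes the guidance in this section as we read it; the function name, the flag names, and the order of the checks are our own simplification, and a real orchestrator would take these answers from the tenant's recording profile.

```python
def choose_recording_mode(*, audio_only: bool,
                          needs_realtime_playback: bool,
                          needs_per_participant_redaction: bool,
                          phi_must_stay_on_prem: bool) -> str:
    """Pick a recording mode; compliance constraints are checked first."""
    if phi_must_stay_on_prem:
        return "on-premise"            # decrypted media stays in your VPC
    if needs_per_participant_redaction:
        return "individual"            # one file per host, easy to drop or mute
    if audio_only and not needs_realtime_playback:
        return "delayed-transcoding"   # cheapest path for audio-only products
    return "composite"                 # one mixed file per session


print(choose_recording_mode(audio_only=False,
                            needs_realtime_playback=True,
                            needs_per_participant_redaction=False,
                            phi_must_stay_on_prem=False))  # composite
```

Checking the compliance flags first is the design choice that matters: storage cost should never be able to override a redaction or data-residency requirement.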
Build moderation as two pipelines, not one
Real-time leg. Sample frames at 1–2fps, push to AWS Rekognition, Google Vision, or a custom CV model. Adds ~500ms latency. Suitable for nudity, violence, weapons, hate symbols. Channel-mute or kick if a confidence threshold is crossed.
Async leg. Walk completed recordings with a more expensive model. Flag for human review. Suspend, fine, or terminate the offending account. Append the verdict to the audit log. This is the leg that produces the data your trust-and-safety team cares about.
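On the real-time leg, the policy engine is usually a threshold check on whatever labels the vision model returns. The sketch below assumes a generic `{label: confidence}` dictionary and two made-up thresholds; it does not call Rekognition, the Vision API, or Agora, and the action names are placeholders for your own channel-mute and kick handlers.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ModerationPolicy:
    """Hypothetical thresholds; tune them per tenant and per label."""
    mute_at: float = 0.80
    kick_at: float = 0.95


def decide(labels: Dict[str, float],
           policy: ModerationPolicy = ModerationPolicy()) -> Tuple[str, Optional[str]]:
    """Return (action, triggering_label) for one sampled frame."""
    if not labels:
        return ("allow", None)
    # Act on the single highest-confidence label in the frame.
    label, confidence = max(labels.items(), key=lambda kv: kv[1])
    if confidence >= policy.kick_at:
        return ("kick", label)
    if confidence >= policy.mute_at:
        return ("mute", label)
    return ("allow", None)


print(decide({"weapons": 0.85, "violence": 0.20}))  # ('mute', 'weapons')
```

Whatever `decide` returns should be written to the audit log together with the frame timestamp, so the async leg and human reviewers can see why an action fired.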
The three Agora workloads we ship the most in 2026
Across our portfolio in 2026, three workload patterns dominate. If your product fits one of these shapes, the architecture above the SD-RTN edge is largely a known-good template — the customization is in the tenant model, the data plane, and the AI orchestration.
Workload 1 — Telehealth and regulated 1-on-1 video. One provider, one patient, recording for compliance, customer-managed encryption keys, geo-fenced traffic, and on-premise recording so PHI never leaves the customer’s infrastructure. Typical channel size: 2–3 hosts. Recording multiplier: low. AI overlay: optional captions and live translation. The compliance posture defines almost every architectural decision.
Workload 2 — Live fitness, education, and creator streaming. One instructor, many viewers, low-latency two-way coaching for premium tiers, recording for replay, monetization through tiered access. Typical channel size: 1 host plus 50–500 viewers per group, scaling to thousands via broadcast mode. Recording multiplier: moderate. AI overlay: live captions, AI-driven highlight reels post-session. The architecture inflection is at the move from "100s of viewers" to "10k+ viewers" where you start mirroring to a CDN.
Workload 3 — Conversational AI agents and AI-augmented support. One human user, one AI agent (or one human plus one AI co-pilot), real-time voice via CAE, LLM routing across OpenAI Realtime / Anthropic / ElevenLabs, conversation context preserved across sessions. Typical channel size: 2 hosts. Recording multiplier: low. The architecture inflection is the AI orchestrator — that single component is where most agent-product roadblocks land.
Mini case — how Fora Soft ships RTC products on Agora-class platforms
A useful proof-of-pattern, not a sales pitch. Two production engagements that show what custom RTC development looks like in practice.
Perspire.tv — live fitness streaming. A platform that turns instructors into broadcasters with low-latency, two-way coaching. We built the live-class architecture, the recording pipeline (composite mode for replays, individual mode for instructor reviews), the in-class chat, and the per-tenant analytics dashboard. The product handles a 1-instructor-to-many-members model with low-latency video where the instructor sees and corrects member form in near real time — a workload Interactive Live Streaming was specifically designed for.
BrainCert — WebRTC virtual classroom LMS. A virtual-classroom platform with whiteboard, screen share, recording, and AI-driven captions. We architected the classroom around large breakout rooms (the host-cap problem described earlier, multiplied across 12 simultaneous breakouts), the recording orchestration, and the AI moderation worker for younger-student safety. The platform now powers thousands of classrooms across hundreds of institutions worldwide.
Each project made a different platform call — Agora-class for one workload profile, WebRTC-direct for another. The art is in choosing the right call before you start, then building the surrounding architecture so you are not paying for the wrong line items in month six. Want a similar assessment of your product?
A decision framework for Agora custom development in five questions
Q1. Will any channel ever exceed 17 hosts? If yes, you are in Interactive Live Streaming territory and the architecture is custom. The 17-host cap is the most common reason teams move from stock SDK to custom — usually around month four, when an "exclusive" event organizer asks for a 30-speaker panel.
Q2. Do you need recording with custom retention or redaction rules? If yes — HIPAA seven-year retention, GDPR right-to-be-forgotten, customer-tier-specific retention — you need a recording orchestrator, not the stock recording UI. That is custom development.
Q3. Are you in HIPAA / GDPR / SOC 2 territory? If yes, you need geo-fencing, customer-managed encryption keys, on-premise recording, audit logs, and compliance-grade observability. Stock SDK does not give you this; the platform’s primitives do, and your code wires them.
Q4. Are you launching AI voice agents in the next two quarters? If yes, you need the CAE integration plus your LLM/TTS plumbing plus your prompt graph plus your fallback logic plus your cost meter. CAE collapses the calling layer; the rest is your build.
Q5. Where is your audience concentrated? Heavy SE Asia / LATAM / Africa — Agora’s edge presence wins on join time and packet loss. Heavy US/EU with AI-agent-first product — LiveKit or Daily often wins. Mostly enterprise with a Twilio Programmable Video legacy — Zoom Video SDK is the official migration. Mostly AWS-native — Chime SDK is the path of least resistance.
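The five questions can be turned into a small checklist that a product owner fills in before a scoping call. The sketch below mirrors Q1 through Q5 as stated; the `Answers` record, the audience labels, and the fallback text are hypothetical, and the output is a starting point for discussion rather than a verdict.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Answers:
    max_hosts: int           # Q1
    custom_retention: bool   # Q2
    regulated: bool          # Q3 (HIPAA / GDPR / SOC 2)
    ai_agents_soon: bool     # Q4
    audience: str            # Q5, one of the keys in PLATFORM_BY_AUDIENCE


# Illustrative labels for the four audience profiles named in Q5.
PLATFORM_BY_AUDIENCE = {
    "sea-latam-africa": "Agora",
    "us-eu-ai-first": "LiveKit or Daily",
    "twilio-legacy": "Zoom Video SDK",
    "aws-native": "AWS Chime SDK",
}


def recommend(a: Answers) -> Tuple[str, str, List[str]]:
    """Return (build_type, platform_hint, reasons)."""
    reasons = []
    if a.max_hosts > 17:
        reasons.append("Q1: more than 17 hosts in a channel")
    if a.custom_retention:
        reasons.append("Q2: custom retention or redaction")
    if a.regulated:
        reasons.append("Q3: regulated industry")
    if a.ai_agents_soon:
        reasons.append("Q4: AI voice agents planned")
    build = "custom" if reasons else "stock"
    platform = PLATFORM_BY_AUDIENCE.get(a.audience, "needs scoping")
    return build, platform, reasons


print(recommend(Answers(30, False, True, False, "sea-latam-africa")))
```

A 30-host, regulated product with a Southeast Asian audience comes back as a custom build on Agora, with Q1 and Q3 listed as the reasons.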
Stuck on Q1–Q5? Bring us the answers.
Send us your three biggest workload constraints and we’ll come back with a custom-vs-stock recommendation, an estimated cost envelope, and a 12-week shipping plan for the first production cut.
Five pitfalls we see most often on Agora projects
1. Token sprawl. Tokens issued with 30-day expiry, no role scoping, no per-channel binding. A leaked token then hijacks every channel for a month. Fix: 5–24 hour expiry, role-scoped (publisher / subscriber), bound to a specific channel, mapped to your internal user ID via a sidecar table.
2. Idle STT bleed. STT enabled at the channel level on every session. 60% of your minutes are silent waiting rooms. You burn $1,000+/month on transcription that produces no transcript. Fix: turn STT on per-utterance via the SDK control hooks, not channel-wide.
3. Over-recording. Composite recording on a 5-host channel costs ~5 service-minutes per minute. Most teams budget 1x. Fix: pick composite or individual based on the redaction model, set per-tenant recording profiles, and instrument the recording multiplier in your cost meter.
4. AV1 forced on legacy devices. AV1 is great in markets with modern devices. Forced on a 5-year-old Android, the call fails to connect. Fix: keep H.264 and VP8 in the codec list; let Agora’s ACT auto-select; only disable fallbacks if you specifically test for it.
5. No fallback strategy on weak networks. Default high-resolution video on a 3G connection produces buffering, freeze frames, and a churn spike. Fix: enable Network Quality Control, fall back to audio-only at < 100kbps, monitor packet loss with the SDK’s NetworkQuality API, and surface the network quality UI to the user.
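Pitfalls 4 and 5 both come down to having an explicit fallback order in your own client logic. The sketch below shows that decision in isolation, assuming you already have a set of codecs the device reports and a bandwidth estimate in kbps; the function names are ours and no Agora SDK call is made.

```python
from typing import Iterable

# Preferred order: AV1 first, but H.264 and VP8 always stay in the list.
CODEC_PREFERENCE = ["av1", "h264", "vp8"]
AUDIO_ONLY_BELOW_KBPS = 100  # threshold quoted in pitfall 5


def pick_codec(device_supported: Iterable[str]) -> str:
    """Return the first preferred codec the device can actually use."""
    supported = set(device_supported)
    for codec in CODEC_PREFERENCE:
        if codec in supported:
            return codec
    raise ValueError("device supports none of the configured codecs")


def pick_media_mode(estimated_kbps: float) -> str:
    """Drop to audio-only when the estimated bandwidth is too low for video."""
    return "audio-only" if estimated_kbps < AUDIO_ONLY_BELOW_KBPS else "audio+video"


print(pick_codec({"h264", "vp8"}))   # h264
print(pick_media_mode(80))           # audio-only
```

An older Android that reports only H.264 and VP8 gets H.264 instead of a failed connection, and a user on an 80 kbps link gets an audio call instead of frozen video.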
KPIs that matter — quality, business, reliability
Quality KPIs. P95 join latency < 2.5s. P95 packet loss < 2%. Mean Opinion Score (MOS) for audio > 4.0. Caption latency < 800ms when STT is on. These are the numbers that decide whether users stay through minute three of the call.
Business KPIs. Cost-per-active-user-minute (CPAUM) by tenant. 30-day MAU retention. AI-agent containment rate (the fraction of agent calls that resolve without escalation to a human). Recording storage cost as a fraction of total infrastructure spend. These tie the platform decisions to revenue and gross margin.
Reliability KPIs. Session-completion rate > 99.95%. Recording success rate > 99.5%. Agent uptime > 99.9%. Mean time to detect an Agora platform incident < 60 seconds (via your own QoE pipeline, not the Agora console). The reliability layer is what your customer’s ops team cares about during their 9-to-5.
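The quality thresholds above are straightforward to check from your own QoE samples. The sketch below uses the nearest-rank method for P95 and the three quality targets listed in this section; the function names and the sample data are illustrative, and in production the samples would come from your analytics store rather than in-memory lists.

```python
from typing import Dict, Sequence


def p95(samples: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def quality_report(join_latency_s: Sequence[float],
                   packet_loss_pct: Sequence[float],
                   audio_mos: Sequence[float]) -> Dict[str, bool]:
    """Compare samples against the quality KPI targets listed above."""
    return {
        "join_latency_ok": p95(join_latency_s) < 2.5,
        "packet_loss_ok": p95(packet_loss_pct) < 2.0,
        "mos_ok": sum(audio_mos) / len(audio_mos) > 4.0,
    }


print(p95(list(range(1, 101))))  # 95
print(quality_report([1.0] * 18 + [3.0] * 2, [0.5] * 20, [4.2, 4.4]))
```

In the second call, two slow joins out of twenty are enough to fail the P95 join-latency target, which is exactly the kind of tail a mean would hide.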
When NOT to choose Agora — and what to use instead
Agora is a strong default for many video and voice products in 2026, but it is not universal. Here is when we steer customers elsewhere.
Pick LiveKit when AI agents dominate the workload. LiveKit’s open-source agent framework is the strongest in the market and the per-track-minute pricing favors agent-heavy workloads where the LLM cost dominates the RTC cost. Read our LiveKit vs Agora cost analysis for the math.
Pick Daily.co when shipping in two weeks beats per-minute price. Pre-built React UI components, transparent flat pricing, fastest time-to-market. Right for early-stage products that need a first version live before they raise the next round.
Pick Zoom Video SDK when migrating from Twilio Programmable Video. Twilio sunset Programmable Video in December 2024 and pointed customers at Zoom Video SDK as the official migration path. If you are on Twilio in 2026, the path of least resistance is Zoom, not Agora.
Pick AWS Chime SDK when your stack is AWS-native and the workload is voice-first. At $1.70 per 1,000 audio minutes it is one of the lower published audio rates in the comparison above, though still above Agora’s $0.99. Right for IVRs, contact-center backends, and voice-only B2B tools that already live in AWS.
Pick raw WebRTC when you have peer-to-peer dyads and own the networking expertise. No per-minute SaaS bill. Massive engineering cost. Right only when you have a small number of two-party calls (1-on-1 dating, B2B sales calls), the engineers to run TURN servers, and the discipline to monitor packet loss yourself. See the tradeoffs in detail.
What to look for in an Agora development partner
Most CPaaS partners can wire a sample app. Far fewer can ship the surrounding architecture. Five things worth screening for.
1. Three or more production Agora projects shipped. Not "we’ve evaluated Agora" or "we read the docs". Real, live customers. Ask for case studies and reference calls.
2. Owns the recording infrastructure, not just the SDK integration. Recording is where most projects fail at scale. A partner who has only integrated the SDK has not solved retention, redaction, watermarking, or compliance — and you will pay them again to do it.
3. Can architect for HIPAA / SOC 2 / GDPR. Geo-fencing, customer-managed keys, on-premise recording, audit logs, BAA-eligible service architecture. The compliance posture is set in week two, not week 22 — pick a partner who knows that.
4. Has migrated teams across CPaaS providers. Twilio → Agora, Agora → LiveKit, Zoom → Agora. The migration scars are the experience that tells you which platform fits which workload — and a team that has only ever shipped on one platform cannot give you that.
5. Owns Conversational AI integration end-to-end. Not just CAE. The LLM choice, the prompt graph, the TTS voice tuning, the cost meter, the fallback when OpenAI Realtime hiccups. The seams between RTC and AI are where most agent projects fail in production. Our AI-integration practice exists for exactly this seam.
Frequently asked questions
How much does Agora custom development cost?
For a production-grade first cut — token server, recording, AI agent integration, custom UI, multi-tenant model, basic moderation — budget 8–16 weeks of senior engineering. Exact figures depend on the workload, but the range we routinely scope sits well below the legacy CPaaS-integration estimates from 2022–2024 because Agent Engineering speeds up the build. We do not publish a flat number because the variance between a single-tenant 1-on-1 call and a HIPAA multi-tenant Conversational AI agent is enormous; we’d rather scope it for your workload.
Is Agora HIPAA-compliant?
Agora is HIPAA-eligible under a Business Associate Agreement (BAA) for covered services. The HIPAA-compliant architecture typically combines geo-fencing for traffic routing, on-premise recording so PHI is decrypted only inside your VPC, customer-managed encryption keys, and a complete audit log on your side. Agora supplies the primitives; the compliance posture is the architecture you build with them. We have shipped HIPAA telehealth on this stack — see CirrusMED.
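One way to keep that posture honest is to encode the controls listed above as a pre-launch check. The sketch below is a minimal illustration: the field names are our own labels for the controls in this answer, not Agora configuration flags, and a real review covers far more than five booleans.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class CompliancePosture:
    """Illustrative checklist; field names are hypothetical, not Agora settings."""
    baa_signed: bool              # Business Associate Agreement in place
    geo_fencing_enabled: bool     # traffic routing restricted by region
    on_premise_recording: bool    # PHI decrypted only inside your VPC
    customer_managed_keys: bool   # you hold and rotate the encryption keys
    audit_log_enabled: bool       # complete audit trail on your side


def missing_controls(posture: CompliancePosture) -> List[str]:
    """Return the names of controls that are still switched off."""
    return [f.name for f in fields(posture) if not getattr(posture, f.name)]


if __name__ == "__main__":
    draft = CompliancePosture(
        baa_signed=True,
        geo_fencing_enabled=True,
        on_premise_recording=False,
        customer_managed_keys=True,
        audit_log_enabled=False,
    )
    print(missing_controls(draft))
```

Running a check like this in CI from week two makes a missing control a failed build rather than a late audit finding.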
Can Agora replace WebRTC entirely?
Agora is built on WebRTC under the hood — the choice is not "Agora vs WebRTC" but "managed CPaaS vs roll-your-own peer-to-peer plus TURN/SFU servers". Agora trades a per-minute fee for a private edge network, observability, and recording. Raw WebRTC trades dollars for engineering time. For dyads with sophisticated networking teams, raw WebRTC still wins on cost. For everyone else, a CPaaS like Agora ships sooner and stays up longer. Read the full breakdown in WebRTC vs Agora architecture tradeoffs.
How long does an Agora MVP typically take?
A stock-SDK MVP — 1-on-1 calling, basic UI, no recording, no AI — ships in 2–4 weeks with a senior engineer. A custom MVP with token server, recording, and a simple AI agent ships in 8–12 weeks. A multi-tenant white-label, HIPAA-compliant, AI-agent product is a 12–20 week first cut, with another 8–12 weeks of hardening before a serious launch.
Does Agora support end-to-end encryption?
Yes, as an opt-in beta feature. AES-128/256 encryption in transit is the default. End-to-end encryption (where Agora cannot decrypt media at all) is in beta and uses customer-managed keys: you supply and rotate the keys yourself. E2EE is not currently supported on the on-premise recording mode, a detail that tends to surface late in compliance review.
Is Agora cheaper than LiveKit for AI voice agents?
It depends on the LLM cost share. At low LLM-token volumes, Agora’s CAE bundle ($0.0265/agent-min all-in) is competitive. At high volumes where LLM tokens dominate, LiveKit’s open-source agent framework plus your own LLM contract often wins because you can negotiate the LLM price directly. Pure RTC pricing favors Agora; pure agent-orchestration pricing favors LiveKit.
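The comparison is easier to reason about as arithmetic. The sketch below uses the $0.0265 per agent-minute bundle quoted above; every self-managed rate in it is a placeholder we made up for illustration, not a LiveKit, LLM, or speech-vendor price. Swap in your own contract numbers.

```python
from dataclasses import dataclass

AGORA_CAE_PER_AGENT_MIN = 0.0265  # USD, bundled rate quoted in this article


@dataclass
class SelfManagedRates:
    """Hypothetical per-agent-minute rates for a self-assembled stack."""
    transport_per_min: float  # RTC / agent session fee (placeholder)
    llm_per_min: float        # LLM cost per agent-minute (placeholder)
    speech_per_min: float     # STT + TTS cost per agent-minute (placeholder)

    @property
    def total_per_min(self) -> float:
        return self.transport_per_min + self.llm_per_min + self.speech_per_min


def monthly_cost(agent_minutes: int, rate_per_min: float) -> float:
    """Monthly bill in USD, rounded to cents."""
    return round(agent_minutes * rate_per_min, 2)


def cheaper_option(agent_minutes: int, rates: SelfManagedRates) -> str:
    """Name the cheaper stack for a given monthly volume."""
    bundled = monthly_cost(agent_minutes, AGORA_CAE_PER_AGENT_MIN)
    self_managed = monthly_cost(agent_minutes, rates.total_per_min)
    return "bundled" if bundled <= self_managed else "self-managed"


if __name__ == "__main__":
    # Placeholder scenarios: list-price LLM vs. a negotiated LLM contract.
    list_price = SelfManagedRates(0.01, 0.02, 0.01)
    negotiated = SelfManagedRates(0.01, 0.005, 0.01)
    for label, rates in [("list price", list_price), ("negotiated", negotiated)]:
        print(label, cheaper_option(100_000, rates))
```

With these made-up rates the bundle wins at list price and loses once the LLM line is negotiated down, which is the pattern described above. The crossover point depends entirely on the rates you can actually get.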
What is the maximum audience on an Agora channel?
128 concurrent video publishers per channel and an unlimited audience via broadcast mode — in practice scaling to millions of passive viewers. The default 17-host limit applies only to the basic Voice/Video SDK; Interactive Live Streaming raises the cap. Each host can subscribe to a maximum of 50 other hosts simultaneously — relevant for very large breakout-room patterns.
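These limits are worth checking at design time rather than in a load test. The sketch below encodes the three numbers from this answer. It assumes Interactive Live Streaming raises the host cap to the 128-publisher figure, which is our reading of the limits above rather than a documented guarantee, and the function name is hypothetical.

```python
from typing import List

MAX_VIDEO_PUBLISHERS = 128       # concurrent video publishers per channel
DEFAULT_HOST_LIMIT = 17          # basic Voice/Video SDK
MAX_SUBSCRIPTIONS_PER_HOST = 50  # other hosts one host can subscribe to


def check_channel_plan(hosts: int, subscriptions_per_host: int,
                       live_streaming: bool) -> List[str]:
    """Return a list of limit violations for a planned channel layout."""
    problems = []
    # Assumption: Interactive Live Streaming lifts the cap from 17 to 128.
    host_cap = MAX_VIDEO_PUBLISHERS if live_streaming else DEFAULT_HOST_LIMIT
    if hosts > host_cap:
        problems.append(f"{hosts} hosts exceeds the cap of {host_cap}")
    if subscriptions_per_host > MAX_SUBSCRIPTIONS_PER_HOST:
        problems.append(
            f"{subscriptions_per_host} subscriptions per host exceeds "
            f"{MAX_SUBSCRIPTIONS_PER_HOST}"
        )
    return problems


if __name__ == "__main__":
    # A 20-host panel on the basic SDK trips the default host limit.
    print(check_channel_plan(hosts=20, subscriptions_per_host=19,
                             live_streaming=False))
```

An empty list means the layout fits inside the limits quoted here. Passive viewers are not counted because the audience side is effectively unbounded in broadcast mode.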
How do I integrate OpenAI Realtime with Agora?
Two paths. Path A: use the Conversational AI Engine and pick OpenAI Realtime from the bundled providers — lowest engineering cost, lowest control. Path B: capture audio from the Agora channel server-side, push to OpenAI Realtime via WebRTC or WebSocket, publish the response audio back into the channel as another participant — highest control, more wiring. We use Path A for pilots and the simple cases, and Path B when the product needs custom prompt graphs, voice cloning, or strict data-residency on the LLM provider.
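The shape of Path B can be sketched as two concurrent pumps: one forwarding channel audio to the model, one publishing the model's audio back. The `ChannelAudio` and `RealtimeModel` interfaces below are hypothetical stand-ins that we define for illustration. They are not the Agora server SDK or the OpenAI Realtime client, and a real bridge also needs resampling, interruption handling, and reconnect logic.

```python
import asyncio
from typing import AsyncIterator, Protocol


class ChannelAudio(Protocol):
    """Hypothetical wrapper around a server-side participant in the channel."""
    def frames(self) -> AsyncIterator[bytes]: ...
    async def publish(self, pcm: bytes) -> None: ...


class RealtimeModel(Protocol):
    """Hypothetical wrapper around a realtime speech model session."""
    async def send_audio(self, pcm: bytes) -> None: ...
    def responses(self) -> AsyncIterator[bytes]: ...
    async def close(self) -> None: ...


async def bridge(channel: ChannelAudio, model: RealtimeModel) -> None:
    """Pump audio both ways until the channel's audio stream ends."""

    async def uplink() -> None:
        # Caller audio captured from the channel goes to the model.
        async for frame in channel.frames():
            await model.send_audio(frame)
        # End of input: let the model session finish and stop responding.
        await model.close()

    async def downlink() -> None:
        # Model audio is published back as another participant.
        async for chunk in model.responses():
            await channel.publish(chunk)

    await asyncio.gather(uplink(), downlink())
```

Keeping the two directions as separate tasks is the design choice that matters here: the agent can start speaking before the caller has finished, and each side can be instrumented or throttled independently.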
What to Read Next
Architecture
WebRTC vs Agora: Architecture Tradeoffs in 2026
When to roll your own peer-to-peer plus TURN, and when the per-minute SaaS fee is the cheaper engineering call.
Cost analysis
LiveKit vs Agora: A Cost Analysis
The line-item math on AI-agent-heavy workloads, with worked examples of when each platform wins.
Build guide
How to Build a Video Call App with Agora SDK in 2026
Architecture, cost envelope, and migration paths to LiveKit / Zoom Video SDK if you outgrow Agora.
Scale
Scaling Real-Time Video to 1 Million Viewers
Hybrid CDN, RTMP/HLS fallback, cross-channel media relay — the architecture above the 17-host cap.
Strategy
Hire a WebRTC Development Company vs Build In-House?
A pragmatic decision framework for staffing video / RTC products in 2026.
Ready to make Agora work for your product?
Custom Agora development services exist because the calling SDK is the easy part. The architecture around it (the token server, the recording cluster, the moderation worker, the AI orchestrator, the cost meter, the per-tenant analytics) is the part that decides whether your product is defensible at scale, compliant under audit, and profitable in month six. Pick the platform on geography and workload (Agora wins SE Asia and LATAM RTC; LiveKit wins agent-heavy; Daily wins fastest TTM; Zoom is the Twilio migration path). Then build the architecture so the three line items in section 5 do not surprise you.
If you are at the scoping stage, the most useful next step is a 30-minute call with someone who has shipped on multiple CPaaS providers and can tell you which line items hurt at your scale. We will tell you whether Agora is the right call — even if the answer is "use LiveKit instead". The point is shipping a defensible product, not selling you a platform.
Make Agora work for your product
A 30-minute scoping call with a senior engineer who has shipped Agora, LiveKit, Daily, Twilio, and Zoom Video SDK in production. We’ll come back with a custom-vs-stock recommendation, a cost envelope, and a 12-week shipping plan.

