LiveKit Agents 1.6 · WebRTC + SIP · gpt-realtime · 250+ projects since 2005
LiveKit AI Agent Development

Production LiveKit AI agents — voice, video, and multimodal

We build real-time AI agents on LiveKit Agents 1.x: speech in, an LLM in the middle, speech out, with video and screen-share when you need it. First pilot in –3 weeks, from $8K. Sub-500 ms voice-to-voice, semantic turn-taking, and native telephony — for 10 calls a day or 10,000.

  • Sub-500 ms: target voice-to-voice round trip
  • 20+ yrs: building real-time media since 2005
  • 250+: custom projects shipped
  • 70+ langs: speech in, via gpt-realtime / Deepgram

Who this is for

Built for teams putting a real-time agent in front of users

If your agent has to listen, think, and talk back while the user is still moving — a support line, a sales qualifier, a tutor, an in-app copilot — LiveKit handles the transport and we build the agent on top.

  • Voice support & call deflection
  • AI sales / qualification agents
  • In-app voice copilots
  • Tutoring & coaching agents
  • Healthcare intake (HIPAA)
  • Drive-thru & kiosk ordering
  • Outbound reminder & survey calls
  • Video agents (avatar + screen-share)
  • Multilingual concierge
  • Voice-enabled SaaS features

Options

Managed voicebot platform, DIY build, or custom on LiveKit

A managed platform (Vapi, Retell, Bland) ships a demo in a day but locks your models, latency, and call data behind their stack. A custom LiveKit build gives you the pipeline, the vendors, and the data — and still launches in weeks, not quarters.

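Turn-taking & latency
  • Managed voicebot platform: Fixed by the platform; barge-in often laggy
  • DIY in-house: Whatever you build; hard to tune
  • Custom on LiveKit (Fora Soft): Semantic turn detection + sub-500 ms voice-to-voice, tuned per use case

Model choice
  • Managed voicebot platform: Locked to their LLM/STT/TTS menu
  • DIY in-house: Open, but you wire every plugin
  • Custom on LiveKit (Fora Soft): Any: gpt-realtime, Claude, Gemini Live, Deepgram Nova-3, ElevenLabs, Cartesia Sonic-3

Voice + telephony + web
  • Managed voicebot platform: Usually phone-first
  • DIY in-house: You integrate SIP yourself
  • Custom on LiveKit (Fora Soft): LiveKit SIP 1.0 + WebRTC from one agent

Data ownership
  • Managed voicebot platform: Transcripts / audio on their servers
  • DIY in-house: Yours
  • Custom on LiveKit (Fora Soft): Yours: your cloud, your logs, your eval set

Customization
  • Managed voicebot platform: Templated flows
  • DIY in-house: Unlimited but slow
  • Custom on LiveKit (Fora Soft): Tools, RAG, barge-in, custom VAD, avatars, all built to spec

Cost at scale
  • Managed voicebot platform: Per-minute markup compounds
  • DIY in-house: Eng time is the cost
  • Custom on LiveKit (Fora Soft): Your infra + LiveKit; no per-minute platform tax

Time to production
  • Managed voicebot platform: Days to a demo, then a wall
  • DIY in-house: Months
  • Custom on LiveKit (Fora Soft): 3–6 weeks to a production pilot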

Managed platforms are a fine way to validate a script. The moment latency, model choice, telephony, or call-data ownership matters, the custom build wins — at any call volume. New to the trade-offs? See the complete LiveKit for AI agents guide, or compare transports in LiveKit vs Agora.

The agent loop

What happens between the user speaking and the agent answering

A real-time voice agent is a loop that has to close in under half a second, every turn. Here is the pipeline we build and where the milliseconds go.

[Diagram: Capture (mic, WebRTC transport) → Turn detect (semantic VAD, 50–150 ms) → STT (Deepgram Nova-3, <300 ms) → LLM + tools (gpt-realtime, RAG, MCP; 200–600 ms) → TTS (Sonic-3, Flash; 75–150 ms) → Playback + barge-in (live). End to end: under 500 ms voice-to-voice; gpt-realtime collapses STT + LLM + TTS into one hop.]
Figure 1: The real-time agent loop. Each turn closes in under ~500 ms; the latency budget is split across turn detection, STT, the LLM, and TTS.

1. Capture & turn detection

Audio comes in over WebRTC. LiveKit’s semantic TurnDetector v1 decides when the user has actually finished — using intonation and rhythm, not just silence — so the agent stops interrupting and stops talking over the user.

Budget ~50–150 ms
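
To illustrate why a semantic signal helps compared with silence-only endpointing, here is a minimal sketch that combines a silence timer with an end-of-turn probability. It is not LiveKit's TurnDetector implementation: the function name, the thresholds, and the probability input are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointConfig:
    # All thresholds are illustrative assumptions, not LiveKit defaults.
    min_silence_ms: int = 150    # shortest pause that can end a turn
    max_silence_ms: int = 1500   # pause that always ends a turn
    eou_threshold: float = 0.7   # end-of-utterance probability cutoff


def should_end_turn(silence_ms: int, eou_probability: float,
                    cfg: EndpointConfig = EndpointConfig()) -> bool:
    """Decide whether the user has finished speaking."""
    if silence_ms < cfg.min_silence_ms:
        return False   # still mid-word or mid-breath
    if silence_ms >= cfg.max_silence_ms:
        return True    # long silence: stop waiting
    # In between, defer to the (hypothetical) semantic model's score.
    return eou_probability >= cfg.eou_threshold
```

With these assumed values, a 400 ms pause ends the turn only when the semantic score is high, so a thinking pause does not trigger an early reply.
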
2. Speech to text

Deepgram Nova-3 (or GPT-Realtime-Whisper) transcribes as the user speaks — about 6.84% WER on real-world audio, 36 languages. Partial transcripts stream to the LLM so it can start thinking early.

Budget <300 ms streaming
3. Reason, call tools, retrieve

The LLM (gpt-realtime, Claude, or Gemini Live) decides what to say, calls your tools — book, look up, transfer — and pulls answers from your knowledge base over RAG. MCP tool-calling wires the agent to your systems.

Budget 200–600 ms to first token
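
The tool-calling step can be sketched as a small registry plus a dispatcher. This is a generic pattern, not the LiveKit Agents or MCP API: the registry, the tool names, and the `{"name": ..., "arguments": {...}}` call shape are assumptions made for the example.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry: names and signatures are illustrative only.
TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Register a function so the agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def book_appointment(date: str, time: str) -> dict:
    # A real build would call your scheduling system here.
    return {"status": "booked", "date": date, "time": time}


@tool
def transfer_to_human(reason: str) -> dict:
    # A real build would start a SIP transfer and pass along call context.
    return {"status": "transferring", "reason": reason}


def dispatch(tool_call_json: str) -> dict:
    """Run a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        # Hand the error back to the model instead of raising mid-call.
        return {"status": "error", "message": f"unknown tool: {call['name']}"}
    return fn(**call.get("arguments", {}))
```

Returning an error payload for unknown tools, rather than raising, lets the model recover within the same turn instead of dropping the call.
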
4. Text to speech

Cartesia Sonic-3 or ElevenLabs Flash v2.5 starts speaking within ~75–90 ms, so the reply begins before the full sentence is generated — in your brand voice, your languages.

Budget 75–150 ms first audio
5. Playback, barge-in & handoff

Audio streams back over WebRTC. If the user cuts in, barge-in stops playback instantly and the loop restarts. When the agent hits its limit, it transfers to a human over SIP with full context.

Continuous
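
Barge-in is easiest to reason about as a state machine. The toy model below shows the transition described above: user speech during playback stops the audio and returns the loop to listening. The class, state, and event names are hypothetical; a real agent would drive these transitions from its media framework's events.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


class AgentLoop:
    """Toy model of the turn loop; event names are illustrative."""

    def __init__(self) -> None:
        self.state = State.LISTENING
        self.playback_active = False

    def on_user_turn_end(self) -> None:
        if self.state is State.LISTENING:
            self.state = State.THINKING

    def on_first_audio(self) -> None:
        if self.state is State.THINKING:
            self.state = State.SPEAKING
            self.playback_active = True

    def on_user_speech(self) -> None:
        # Barge-in: the user talks while the agent is thinking or speaking.
        if self.state in (State.THINKING, State.SPEAKING):
            self.playback_active = False  # stop audio output at once
            self.state = State.LISTENING  # restart the loop

    def on_playback_done(self) -> None:
        if self.state is State.SPEAKING:
            self.playback_active = False
            self.state = State.LISTENING
```
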

End to end, a well-tuned LiveKit loop answers in under 500 ms — the threshold where a conversation feels natural instead of robotic. Speech-to-speech models (gpt-realtime) shorten the loop further by skipping the separate STT and TTS hops.
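
A quick arithmetic check on the per-stage budgets shows why streaming overlap matters. The sketch below only adds up the figures quoted in the steps above; it is not a measurement, and the 500 ms target is the one stated on this page. STT is quoted only as "<300 ms", so its lower bound is left unset rather than guessed.

```python
TARGET_MS = 500

# (low, high) budgets in ms, copied from the stages above.
# STT has no stated lower bound, so it is recorded as None.
BUDGETS = {
    "turn_detection": (50, 150),
    "stt": (None, 300),
    "llm_first_token": (200, 600),
    "tts_first_audio": (75, 150),
}

# If every stage ran back to back at its upper bound:
serial_worst = sum(high for _, high in BUDGETS.values())

# Sum of the stated lower bounds (STT excluded, since none is given):
known_best = sum(low for low, _ in BUDGETS.values() if low is not None)

# What is left for STT if the other stages hit their lower bounds:
stt_headroom = TARGET_MS - known_best

print(serial_worst, known_best, stt_headroom)  # 1200 325 175
```

Run strictly in series at the upper bounds, the stages add up to 1200 ms, well over the target. Staying under 500 ms therefore depends on stages running near their lower bounds and overlapping: partial transcripts reach the LLM early, and TTS starts before the full sentence is generated.
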

Architecture

The pieces we assemble for a production agent

LiveKit is the transport and orchestration layer. Around it we wire the models, telephony, data, and observability that turn a demo into something you can put on a phone line.

Layer: what we build
  • Transport: LiveKit WebRTC for web and mobile, LiveKit SIP 1.0 for phone numbers and PSTN, one agent serving both
  • Turn-taking: Semantic TurnDetector v1 + tuned VAD and endpointing, barge-in and interruption handling, telephony noise cancellation
  • Models: Pluggable LLM (gpt-realtime, Claude, Gemini, Llama), STT (Deepgram Nova-3, Whisper, AssemblyAI), TTS (Cartesia Sonic-3, ElevenLabs, OpenAI)
  • Logic & data: Tool and function calling, MCP servers, RAG over your knowledge base, state and memory, escalation rules
  • Telephony & handoff: Inbound and outbound calls, warm transfer to human agents, IVR replacement, call recording with consent
  • Eval & observability: Transcript logging, latency and percentile dashboards, a regression eval set so prompt changes do not break live calls
  • Deploy & scale: Your cloud or LiveKit Cloud, autoscaling worker pools, HIPAA and SOC 2-ready patterns where required
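
For the observability layer, a latency dashboard usually reduces to percentiles over per-turn measurements. The sketch below is one minimal way to compute them with Python's standard library; the function names and the choice of p95 against a 500 ms target are assumptions for illustration, not a description of a specific dashboard.

```python
import statistics


def latency_report(samples_ms: list[float]) -> dict[str, float]:
    """Summarise voice-to-voice latency samples (in ms) as p50/p95/p99."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def breaches_target(samples_ms: list[float], target_ms: float = 500.0) -> bool:
    """True if the 95th percentile exceeds the target (an assumed SLO)."""
    return latency_report(samples_ms)["p95"] > target_ms
```

Tracking a high percentile rather than the mean matters here because a few slow turns are what callers actually notice.
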
Use cases

What teams build with LiveKit agents

Support

Call deflection & support

An inbound voice agent that answers FAQs, looks up account state via tools, and transfers to a human with context when it cannot resolve. Replaces hold queues.

Sales

AI sales & qualification

Outbound or inbound agent that qualifies leads, books meetings on the calendar, and logs the call to your CRM over MCP.

In-app

In-app voice copilot

A voice- and screen-aware agent embedded in your web or mobile app that walks users through tasks hands-free.

Education

Tutoring & coaching

A multilingual tutor that listens, corrects, and adapts in real time; video and avatar optional for presence.

Healthcare

Intake & reminders

HIPAA-pattern voice intake, appointment reminders, and post-visit follow-up over the phone line.

Front line

Drive-thru, kiosk & IVR

Order-taking and front-line phone automation that understands interruptions and accents, with sub-500 ms turn-taking.

Build vs Buy

When a custom LiveKit agent is the right call

Managed voicebot platforms are built to demo fast. A custom build is built to be yours — your latency, your models, your data, your roadmap. Here is the honest split.

[Chart: axes are control & model freedom (low to high) and ownership & fit to your product. Options plotted: managed platform, DIY in-house, custom on LiveKit (Fora Soft).]
Figure 2: Value axes, not scale. A custom LiveKit agent wins on control and ownership at any call volume — there is no minimum size where it starts to make sense.
Buy a managed platform when:
  • You are testing a script and want a demo this week
  • Call volume is low and per-minute markup does not bite yet
  • You are fine with their models, their latency, and their servers holding your call data
Right when: the agent is a quick experiment, not a product surface.

Build on LiveKit when:
  • Latency and natural turn-taking are part of the experience
  • You want to pick (and swap) the LLM, STT, and TTS as the market moves
  • The agent must call your systems, follow your rules, and speak in your voice
  • Call transcripts and audio have to live in your cloud
  • At any size: worth owning from your first call, not only at enterprise volume
Right when: the agent is a product surface, not a side experiment. A production pilot can be live in 6–12 weeks.

Not sure which side you are on? The free architecture review below will tell you straight.

How we work

Three ways to bring us in

Pricing

Starting points, not size caps

Fixed-scope starting points for a LiveKit agent build. Every number is a floor you build up from, not a ceiling you are capped at.

Voice Agent Pilot
from $8K
~–3 weeks
  • One agent, one pipeline (STT + LLM + TTS or gpt-realtime)
  • Web or a single phone number
  • Tuned turn-taking, basic tools
  • A production-ready pilot for real users

Production Agent (most teams start here)
from $16K
~–6 weeks
  • Voice + telephony (SIP)
  • Tools / function-calling + RAG
  • Warm human handoff
  • Eval harness + latency dashboards
  • Deployed to your cloud
Multimodal / Platform
from $32K
~6+ weeks
  • Voice + video + screen-share or avatars
  • Multi-agent routing, multilingual
  • Observability + compliance (HIPAA / SOC 2)
  • Autoscaling for high call volume

Model and usage costs (LLM, STT, TTS, LiveKit) are billed at provider rates — no per-minute platform markup from us. We help you forecast them in the estimate.

Free for qualified projects

Three ways to de-risk before you commit

Before you spend a dollar on the build, we will help you figure out whether it should exist and how it should be shaped.

Why Fora Soft

Real-time is what we do

We are not a generalist shop that added a voice demo last quarter. Real-time audio and video has been the core of the business for 20 years.

Track record

Since 2005, 250+ projects

Two decades shipping real-time media systems — not a pivot into this quarter’s trend.

Same layer

WebRTC & LiveKit are home turf

We work in the same transport, SFU, and turn-taking layer agents run on, so the agent and the media stack are built by the same people.

Full pipeline

The whole pipeline, named

Turn detection, STT, LLM, tools and RAG, TTS, telephony, eval — we build and tune every hop, and we will tell you the exact versions we run.

Latency

We tune for the half-second

Sub-500 ms voice-to-voice is a target we engineer toward — semantic turn-taking, streaming STT, first-token and first-audio budgets.

Ownership

You own it

Your code, your cloud, your call data, your eval set. No platform lock-in, no per-minute tax.

Straight talk

Honest scoping

If a managed platform is genuinely the right call for your stage, we will say so on the first call.

FAQ

LiveKit AI agents, answered

The questions buyers ask before they build. The same answers power this page’s FAQ schema.

What is LiveKit AI agent development?
Why build on LiveKit instead of a platform like Vapi or Retell?
How low can the latency go?
Which models can the agent use?
Can the agent answer real phone calls?
Can it handle interruptions (barge-in)?
Can the agent use our data and tools?
Voice only, or video and multimodal too?
How long does a build take, and what does it cost?
Do we own the code and the data?

Further reading

Go deeper on LiveKit agents

Ready to put a LiveKit agent in front of users?

Tell us what the agent needs to do. We will map the pipeline, name the models, and give you a timeline and a number — in one call. Building plain LiveKit infrastructure, or a phone-only call agent? See LiveKit development and AI call agents.

Specialist software house for video, real-time and AI products. Founded 2005. 50 in-house engineers.

+1 (914) 775-5855
New York · USA
© Fora Soft, 2005–2026