Published 2026-06-02 · 26 min read · By Nikolay Sapunov, CEO at Fora Soft

Why this matters

Real-time sales coaching moved from a nice idea to a shipping product category in 2026: tools now surface objection responses and competitive battlecards fast enough that reps actually read them mid-call. If you build conferencing, sales-enablement, or telehealth software, your customers are starting to ask for this, and the buy-vs-build question lands on your desk. This lesson gives a product lead a concrete mental model of what the feature really is — three capabilities and one privacy rule — and gives an engineer the LiveKit primitives that implement each one. It is the third lesson in the LiveKit build series, after the architecture pillar and the note-taker repo, and it assumes you have read at least the pillar.

From note-taker to coach: what actually changes

The LiveKit meeting note-taker we built in the previous lesson is the right starting point, because a coach is a note-taker that grew three new senses and learned to keep a secret. It already joins the call as an extra software participant, already transcribes every speaker separately, and already stays silent. We keep all of that and add to it.

The note-taker was deliberately passive. It collected words during the call and did its one clever thing — the summary — only after everyone left. A coach inverts that. Its value is entirely in the moment: a tip that arrives after the call is worthless. So the coach has to think continuously while the call runs, and it has to act on three kinds of input instead of one.

Here are the three additions, in plain terms before we go deep on each.

First, tool-calling. The note-taker never needed to look anything up. A coach does: when the prospect says "we already use Acme," the coach should be able to fetch the Acme battlecard; when the prospect names their company, the coach should be able to pull that account's record from your customer database. Tool-calling is the mechanism that lets the language model decide, on its own, to go get a specific piece of information before it answers.

Second, screenshare awareness. Sales calls are not only talk. The rep shares a screen — a slide deck, a live product demo, a pricing page — and a coach that cannot see the screen is half-blind. We give the coach eyes by subscribing to the screen-share video track and sampling still frames from it, so the model can notice "the rep is still on the intro slide eight minutes in" or "the demo just threw an error."

Third, and most important, a private channel. The note-taker's silence was a feature; the coach's silence toward the prospect is a hard requirement. The coach must reach the rep and only the rep. We do that with a direct, addressed message that the LiveKit server delivers to a single participant, so the guidance is structurally invisible to everyone else in the room.

The rest of this lesson takes those three in order, then shows how they combine into one loop. We start with the privacy rule, because it constrains everything else.

The non-negotiable rule: the coach is private

A sales coach that the prospect can hear is not a coaching tool; it is a sabotage tool. The entire feature rests on one guarantee: the coach's output goes to the rep and to no one else. Get this wrong once in production and you have leaked your own playbook — your discount limits, your competitive talking points, your read on the buyer — to the buyer, on a recorded call. So we design the privacy in at the transport layer rather than hoping a prompt will keep the model quiet.

LiveKit gives us exactly the right primitive: a remote procedure call, or RPC — a way for one participant to invoke a named function on another specific participant and get a reply. The word "specific" is the whole point. When the agent calls perform_rpc, it must name a destination_identity. The LiveKit server delivers that message to that one participant. No other participant in the room receives it. The prospect's client is never a destination, so there is nothing for them to intercept.

Diagram contrasting the wrong and right ways for a coach to deliver advice in a live call. On the left, labelled "Never", a coach speaks an audio reply into the shared room, so both the rep and the prospect hear the private guidance. On the right, labelled "The only safe pattern", the coach sends a text payload over an addressed remote procedure call whose destination identity is the rep alone; the LiveKit server routes it to the rep's client only, and the prospect's client receives nothing. A footer line states the rule: the coach publishes no audio and no room-wide data; advice reaches exactly one identity. Figure 1. The one rule that makes a coach safe: advice travels on a channel addressed to the rep's identity, never as room audio or a room-wide broadcast.

In code, the rep's app registers a handler the agent can call, and the agent calls it with the rep's identity as the destination. The shape is small:

# In the agent: deliver a coaching nudge to the rep ONLY.
await ctx.room.local_participant.perform_rpc(
    destination_identity=rep_identity,        # the rep — never the prospect
    method="coach_nudge",                     # a method the rep's app registered
    payload=json.dumps({
        "kind": "objection",
        "headline": "Price objection — anchor on ROI",
        "detail": "They flagged budget. Pull up the ROI slide; cite the 6-week payback.",
    }),
    response_timeout=4,                        # give up fast; a stale tip is noise
)

Three things in that call carry the safety. The destination_identity is the rep, established when the rep joined and passed to the agent — we cover how to pass it in the RAG section. The payload is plain text, capped by LiveKit at 15 KiB, which is far more than a nudge needs; we send small JSON so the rep's UI can style an objection tip differently from a next-question tip. And the response_timeout is short on purpose: coaching is perishable, so if the rep's client cannot acknowledge within a few seconds, we drop the tip rather than let it arrive late and confusing.

The matching half lives in the rep's client, registered once before the call so it is ready the instant the agent fires:

# In the rep's app: receive nudges and render them in a private side panel.
@room.local_participant.register_rpc_method("coach_nudge")
async def on_coach_nudge(data: RpcInvocationData):
    tip = json.loads(data.payload)
    render_in_coach_panel(tip)   # a panel only the rep can see
    return "shown"               # acknowledgement back to the agent

Because RPC is a request-and-response mechanism, the agent learns whether the tip was delivered. The rep's client returns a short acknowledgement; if the agent instead receives a recipient-disconnected or response-timeout error, it knows the nudge did not land and can decline to send a follow-up that assumes the rep saw the first one. One operational note from the spec worth internalizing: a participant marked "hidden" cannot make RPC calls at all, so the agent itself must be a normal, non-hidden participant for this to work — though it still publishes no audio and no video.

A reasonable question: why not use a broadcast data message, which LiveKit also supports? Because a broadcast reaches every participant by definition, and the only thing standing between a broadcast and the prospect's screen would be client-side code that chooses to ignore it. That is a privacy guarantee made of hope. An addressed RPC moves the guarantee into the server's routing, where it belongs. The rule to carry forward: the coach publishes no audio track, and sends no room-wide data — it speaks only in addressed, single-recipient messages.

Capability one: tool-calling

With the channel safe, we can let the coach be useful. The first new sense is the ability to look things up, and in LiveKit Agents that is tool-calling.

A tool is just a function in your agent's code that you allow the language model to call. You write an ordinary Python function, decorate it so the framework registers it, and write a clear docstring describing what it does — and the model reads that docstring to decide, on its own, when the function is worth calling. The decision is the model's; your job is to make the menu of choices and describe each dish well.

For a sales coach, the menu is your sales process turned into functions. A few that nearly every coach wants:

from livekit.agents import function_tool, RunContext

@function_tool()
async def get_battlecard(self, context: RunContext, competitor: str) -> str:
    """Fetch the one-page battlecard for a named competitor the prospect mentioned.
    Use this the moment a rival product comes up so the rep can counter it."""
    return await battlecards.lookup(competitor)

@function_tool()
async def get_account_context(self, context: RunContext, company: str) -> str:
    """Pull the CRM record for the prospect's company: open deals, past notes,
    contract renewal date. Use this early to ground advice in this account's history."""
    return await crm.fetch(company)

@function_tool()
async def check_discount_authority(self, context: RunContext, percent: float) -> str:
    """Return whether the rep is allowed to offer the requested discount, and the
    approval step if not. Use this whenever price or discounting is discussed."""
    return await pricing_rules.evaluate(self._rep_id, percent)

Read the docstrings, not just the code. Each one tells the model two things: what the tool returns and when to reach for it. "Use this the moment a rival product comes up" is not a comment for humans — it is the instruction the model actually follows. Vague docstrings produce a coach that calls the wrong tool at the wrong time; precise ones produce a coach that behaves like a disciplined sales engineer.

LiveKit supports two kinds of tools, and a sales coach uses both. The functions above are function tools — code you wrote, running on your server, reaching your systems. The second kind is provider tools: capabilities the model vendor runs on their own servers, exposed to your agent with one line. Several frontier vendors offer built-in web search, file search over an uploaded document set, and code execution as provider tools. For a coach, a provider's file-search tool is a quick way to let the model query your sales collateral without you building a retrieval system on day one — though, as we will see in the RAG section, you will usually outgrow it.

A flow diagram of one pass through the coach's tool loop. A prospect's spoken turn is transcribed and reaches the language model, which decides whether it needs external information. If yes, it issues a tool call — for example get_battlecard or get_account_context — which runs against your CRM, battlecard store, or pricing rules and returns a result; the model then composes a short nudge. If no tool is needed, the model composes the nudge directly. Either way the nudge is delivered to the rep over the private remote procedure call from Figure 1. A side note marks that slow tools run in the background so the loop is never blocked. Figure 2. One turn of the coach's loop. The model decides whether to look something up, runs the tool against your systems, then delivers a private nudge — never a spoken reply.

There is one trap unique to a real-time coach, and it is about time, not correctness. A normal voice assistant can pause and say "let me check that for you" while a slow tool runs, because the human is waiting for the answer. A coach has no such luxury: the prospect keeps talking whether or not your CRM has responded. If the coach blocks the whole loop waiting on a three-second CRM query, it falls behind the conversation, and a tip about a topic from thirty seconds ago is worse than no tip at all. LiveKit's answer is async, or background, tools: a long-running tool runs without freezing the agent, so the model can keep processing new turns and deliver the slow result when it lands. The design rule: any tool that touches a network service you do not control should be treated as potentially slow and run in the background, with the loop free to move on.

Let us make the latency concrete, because it is the number that decides whether the feature feels magical or broken. Suppose the prospect finishes a sentence and your pipeline runs: speech-to-text finalization, the model deciding to call a tool, the CRM round-trip, and the model composing the nudge.

speech-to-text finalize      ≈ 300 ms
model decides + emits call    ≈ 400 ms
CRM tool round-trip           ≈ 500 ms
model composes the nudge      ≈ 500 ms
RPC delivery to the rep       ≈ 50 ms
-----------------------------------------
total                         ≈ 1,750 ms

Under two seconds from the end of the prospect's sentence to a tip on the rep's screen. That is fast enough to be useful and slow enough that you should not pretend otherwise: the rep reads the tip during their own thinking pause, not as the prospect's last word lands. The lesson on a sub-100-millisecond latency budget explains why the speech and network stages cost what they do; the practical takeaway here is that the tool round-trip is the biggest controllable slice, which is exactly why slow tools belong in the background and why a cached or pre-loaded battlecard beats a live database query when you can manage it.

Capability two: screenshare awareness

The second sense is sight, and specifically sight of the rep's shared screen. In LiveKit, a shared screen is published as a video track, exactly like a camera — the same plumbing, a different source. So giving the coach eyes means subscribing to that track and turning its moving picture into still frames the model can read.

Language models do not watch video the way we do. Most read a frame — one still image — at a time. So the coach does not stream the whole screen to the model; it samples. It grabs a frame on a schedule or when something interesting happens, hands that single image to the model alongside the recent transcript, and lets the model reason about both together. LiveKit's own vision support samples about one frame per second while the user is speaking and one frame every three seconds when they are not, and resizes each grabbed frame to fit within 1024 by 1024 pixels before encoding it as a JPEG. Those defaults are a sane starting point; for a screen share you will often slow them down further, because a slide deck changes far less often than a face.

A pipeline diagram of screenshare frame sampling. On the left, the rep publishes a screen-share video track. The agent selects that track specifically by its screen-share source — not the camera, and not merely the most recently published track. A sampler grabs one still frame on a schedule, for example one every few seconds, and resizes it to fit within 1024 by 1024 pixels, encoding it as a JPEG. The frame is attached to the next model turn together with the recent transcript, so the model reasons about what is on screen and what was said at the same time. A footer note warns that each sampled frame is a billed vision token cost, so the sampling rate is a direct cost lever. Figure 3. Turning a shared screen into something a model can read: select the screen-share track, sample one frame on a schedule, resize and encode it, attach it to the next turn.

The detail that separates a working coach from a confusing one is which track you sample. A sales call often has several video tracks live at once: the rep's camera, the prospect's camera, and the rep's shared screen. LiveKit's automatic live-video mode uses only the single most recently published video track, which is convenient for a one-camera assistant and wrong for a coach — if the prospect turns their camera on after the rep starts sharing, the coach would suddenly be staring at the prospect's face instead of the demo. So a coach should not rely on the automatic mode. Instead it enumerates the rep's published tracks, picks the one whose source is the screen share, and samples that track explicitly. The screen share has its own track source precisely so you can tell it apart from a camera; use that distinction rather than guessing by recency.

Here is the selection and sampling, trimmed to the essential moves:

from livekit import rtc

def attach_to_screenshare(self, rep: rtc.RemoteParticipant) -> None:
    # Pick the SCREEN-SHARE track specifically, not the camera, not "most recent".
    for pub in rep.track_publications.values():
        if pub.source == rtc.TrackSource.SOURCE_SCREENSHARE and pub.track:
            self._frames = rtc.VideoStream(pub.track)
            break

async def latest_screen_frame(self):
    # Keep only the newest frame; we sample, we do not stream every frame.
    async for event in self._frames:
        self._latest = event.frame

When the coach is about to think — say, on each completed prospect turn — it attaches the most recent screen frame to that turn, the same way the note-taker attached a transcript line. The model then sees the words and the screen at once and can produce advice that depends on both: "They asked about security and you are still on the pricing slide — jump to the compliance section."

Two cautions. The first is cost, and it is not small. Every frame you sample is an image the model has to read, and vision input is billed per image. Sample a 30-minute call once every three seconds and you have fed the model roughly six hundred images; sample once a second and you have fed it eighteen hundred. The sampling rate is therefore a direct, linear cost lever — the cheapest screenshare coach is one that samples only when the transcript suggests the screen matters, not on a fixed fast clock. We model these per-feature economics in the real cost of AI in video products, and the trade-offs in how often and how richly to sample frames are the whole subject of video VLMs — frame sampling versus token streaming.

The second caution is comprehension. A still frame of a slide is easy for a model to read; a frame mid-scroll, or a fast product demo, can be a blur that tells the model nothing useful. Screen content is also unusually detailed — small text, dense tables — so down-scaling to 1024 pixels can erase the very text you wanted read. For a coach this is usually fine, because you want the gist ("a pricing table is on screen"), not the fine print; but if you ever need the fine print, sample at a higher resolution for that one frame and accept the higher cost.

Capability three: feeding the coach your playbook

Tools let the coach pull a specific record on demand. But a coach also needs broad, always-on knowledge — your methodology, your product facts, your competitive positioning — that should color every tip without a tool call each time. That standing knowledge is what retrieval-augmented generation, or RAG, supplies, and LiveKit's guidance on external data shapes how you wire it in without slowing the call.

Start by separating two kinds of knowledge by how often they change. Your sales methodology and product facts are the same for every call, so load them once when the agent process starts — LiveKit calls this the prewarm step — and reuse them across every call that process handles. The opposite extreme is data specific to this one call: which rep is on it, which account, what was said on the last call. That belongs not in prewarm but in the call's metadata, passed to the agent when it is dispatched. LiveKit lets you attach this as job metadata or participant attributes, and the guidance is explicit: send per-call data as metadata rather than fetching it inside the agent's startup path, and if you must make a network call at startup, make it before the agent connects to the room, so the rep never sees the coach appear before it is actually ready. This is also where the rep's identity — the destination_identity from Figure 1 — arrives, so the coach knows who to whisper to.

Between those two extremes sits the live retrieval: as the prospect talks, pull the most relevant slice of your playbook for what they just said. There are two ways to do it, and choosing well is the difference between a snappy coach and a sluggish one. The tool-call way is what we saw earlier: the model decides to call get_battlecard, which costs an extra model round-trip. The faster way, when you are running a speech-to-text-then-model pipeline, is to retrieve on the user's completed turn before the model runs, and inject the result straight into the model's context. LiveKit exposes this as the on_user_turn_completed hook, and its own docs note the win plainly: it avoids the extra round-trips that tool calls incur, because the lookup happens in the same step as the turn rather than as a separate call the model has to request and wait for. The trade is that you give up the model's judgment about whether to look something up — you always look — so it works best for retrieval you want on every turn, like "find the playbook passage most relevant to what was just said."

A simple rule of thumb: use turn-completed injection for the broad, every-turn playbook context, and reserve explicit tools for the precise, occasional actions — pulling one named account, checking one discount, logging an outcome to the CRM. The lesson on video RAG over an archive goes deep on building the retrieval side itself; here the point is only where in the call loop each kind of knowledge enters.

Putting it together: the coach loop

We now have all the parts. Stand back and watch one full loop run, because the architecture is just these pieces wired in the right order.

An end-to-end architecture diagram of the LiveKit sales coach. At the top, two humans — the rep and the prospect — are in a LiveKit room along with the silent coach agent. Three inputs flow into the agent: each participant's microphone, transcribed by a per-speaker speech-to-text session; the rep's screen-share video track, sampled into still frames; and standing playbook knowledge loaded at startup plus per-call account data passed as metadata. These feed the model loop in the centre, which on each prospect turn retrieves relevant playbook context, optionally calls a tool such as a battlecard or CRM lookup, and composes a short nudge. The single output, on the right, is a private remote procedure call addressed to the rep's identity, rendered in a coach panel only the rep can see. The prospect receives nothing from the agent. A footer lists the billed meters: WebRTC participant minutes, agent session minutes, speech-to-text per minute, model text tokens, and vision tokens per sampled frame. Figure 4. The whole coach as a runtime. Three inputs — speech, screen, playbook — feed one model loop; one private output reaches the rep alone.

The prospect finishes a sentence. The per-speaker transcriber — the same pattern as the note-taker, one speech-to-text session per participant so every line is attributed — produces a finalized line attributed to the prospect. On that completed turn, the coach retrieves the most relevant playbook passage and injects it, attaches the latest screen frame, and runs the model. The model may decide it needs a specific fact and call a tool; if the tool is slow, it runs in the background while the loop stays responsive. The model composes a short nudge. The coach sends that nudge over an RPC addressed to the rep, and the rep's private panel updates. The prospect, throughout, has heard and seen nothing but the rep.

Notice that the coach still never speaks and still publishes no video. It is, like the note-taker, a listen-and-watch participant — it just also thinks continuously and whispers privately. Everything that made the note-taker safe and cheap to run is preserved; we have only added senses and a mouth that opens to exactly one person.

Build this, or buy a coaching add-on?

Real-time coaching is a real product category now, so the honest comparison is against buying rather than building. Established conversation-intelligence vendors and a wave of newer real-time tools deliver live battlecards, objection prompts, and missed-question alerts during calls; the 2026 shift in that market is "agentic" coaching that does not just flag a moment but drafts the follow-up email and updates the CRM afterward. If you want coaching bolted onto your reps' existing Zoom or Meet calls and you do not need it inside your own product, buying is faster.

You build the LiveKit version when the coaching has to live inside software you already own — your own conferencing app, a telehealth platform, an e-learning room, a sales tool you ship to customers — or when the playbook, the models, and the call data must stay on infrastructure you control. That is the same build-vs-buy logic as the note-taker, one capability tier up.

Path What you do Best when
Build on LiveKit (this lesson) Run a coach agent inside your own room Coaching lives in your product; you own the data and playbook
LiveKit self-hosted Run the open-source server plus the agent Strict privacy, data-residency, or on-prem requirements
Buy a real-time coaching tool Connect it to reps' Zoom/Meet calls Coaching is an add-on to calls you don't own

The cost shape, if you build, extends the note-taker's. The same three LiveKit meters apply — WebRTC participant minutes for everyone in the room including the agent, agent session minutes for the agent's own runtime, and speech-to-text per minute of audio. A coach adds two model meters the note-taker mostly avoided: text tokens for the continuous thinking, and vision tokens for every sampled screen frame. The note-taker ran one model pass at the end of the call; a coach runs many during it. That is the price of acting in the moment, and the sampling rate from Figure 3 is your main dial for controlling the vision half of it.

Common pitfall: the coach that leaks, blocks, or nags. Three failure modes account for most broken coaches. The first is leakage — the agent is configured with audio output enabled, or it sends advice as a room-wide data broadcast, and the prospect sees the playbook. Fix: publish no audio, and deliver only via single-recipient RPC, as in Figure 1. The second is blocking — a slow CRM tool freezes the loop and the coach falls behind the conversation. Fix: run network tools in the background and keep the loop free. The third is nagging — the coach fires a tip on every turn, the rep drowns, and turns the panel off. Fix: make the model's instructions raise the bar for interrupting, and rate-limit nudges in your own code so at most one lands every several seconds. A coach that is silent 90% of the time and sharp the other 10% is the one reps keep on.

A word on consent and the law

Because a coach processes the prospect's words and screen in real time, it sits squarely inside call-recording and AI-disclosure law, and you must design for that, not bolt it on. In the United States, several states require all parties to a call to consent to its being recorded or monitored; in the European Union, the AI Act's transparency rules require that people are informed when they are interacting with or being processed by an AI system in many contexts. The safe defaults: disclose at the start of the call that an AI assistant is present, gate the coach behind that disclosure and consent, and keep the prospect's data handling inside whatever your recording consent already covers. None of this is legal advice — we are engineers, not your counsel — and the specifics vary by jurisdiction and change over time, so treat consent as a product requirement to confirm with a lawyer, the same way you would treat storing health data. The disclosure-engineering details for AI features live in the EU AI Act Article 50 lesson.

Where Fora Soft fits in

We have built real-time video products since 2005 — video conferencing, telemedicine, e-learning, and live collaboration — and the meeting agent has become a near-default request on conferencing projects we scope. The coach is the higher-value version of that request: the same listen-and-watch participant, now with tool-calling into a client's CRM and playbook and a private channel back to one user. The architecture in this lesson is the one we reach for when the coaching has to live inside the client's own product rather than a third-party add-on bolted onto Zoom. In telemedicine the same private-nudge pattern becomes a clinician prompt during an intake; in e-learning it becomes a quiet hint to an instructor; in sales it is the coach described here. The engineering judgment we care about is the privacy rule first, the latency budget second, and the cost dial — the sampling rate — third.

What to read next

Talk to us / See our work / Download

  • Talk to a LiveKit engineer — bring us your coach, copilot, or real-time assistant idea and we will help you shape the agent, the tools, the screenshare pipeline, and the private channel: /livekit-ai-agent-development-experts
  • See our work — real-time video and conferencing products we have shipped since 2005: /portfolio
  • Download the build cheat sheet — a one-page LiveKit Sales Coach build cheat sheet (PDF) summarizing the private-channel rule, the three capabilities, a starter tool list, the latency budget, and the build-vs-buy and consent checklist.

References

  1. LiveKit Docs — Tool definition and use — function tools vs provider tools; what a tool can do (speech, RPC to frontend, handoff, context, external APIs/RAG); provider tools by vendor (Anthropic ComputerUse; Gemini GoogleSearch/FileSearch/CodeExecution; OpenAI WebSearch/FileSearch/CodeInterpreter; xAI; Mistral). Accessed 2026-06-02. https://docs.livekit.io/agents/logic/tools/
  2. LiveKit Docs — Function tools (definition) — the @function_tool decorator, the docstring as the LLM's instruction, RunContext, dynamic tools, and error handling. Accessed 2026-06-02. https://docs.livekit.io/agents/logic/tools/definition/
  3. LiveKit Docs — Async tools — running long tools in the background so the agent keeps responding; AsyncToolset progress and reply timing. Accessed 2026-06-02. https://docs.livekit.io/agents/logic/tools/async/
  4. LiveKit Docs — Vision: Video — sampling frames in an STT-LLM-TTS pipeline via on_user_turn_completed + ImageContent; live video input with video_input=True (Gemini Live / OpenAI Realtime); default 1 fps speaking / 1 per 3 s otherwise; resize to 1024×1024 JPEG; only the single most recently published video track is used; video is passive for turn detection. Accessed 2026-06-02. https://docs.livekit.io/agents/multimodality/vision/video/
  5. LiveKit Docs — Screen sharing — a shared screen is published as a video track like a camera (setScreenShareEnabled(true)); browser-tab audio sharing and the screen_share_audio source. Accessed 2026-06-02. https://docs.livekit.io/transport/media/screenshare/
  6. LiveKit Docs — Remote method calls (RPC)perform_rpc(destination_identity, method, payload, response_timeout); register_rpc_method; payload ≤ 15 KiB and method name ≤ 64 bytes (UTF-8); default 10 s timeout; hidden participants cannot call RPC; error codes incl. 1502 response timeout and 1503 recipient disconnected. The mechanism behind the private side-channel. Accessed 2026-06-02. https://docs.livekit.io/transport/data/rpc/
  7. LiveKit Docs — External data and RAG — load static data in prewarm; pass per-call data via job metadata / participant attributes; make startup network calls before ctx.connect(); on_user_turn_completed RAG injection avoids tool round-trips; tool calls for precise external actions; frontend RPC for cancellable long operations. Accessed 2026-06-02. https://docs.livekit.io/agents/logic/external-data/
  8. W3C Recommendation, "WebRTC: Real-Time Communication in Browsers" (13 March 2025) — the finished core WebRTC platform standard underlying every LiveKit audio, video, and screen-share track. https://www.w3.org/TR/webrtc/
  9. IETF RFC 8831 (January 2021), "WebRTC Data Channels" — the SCTP-over-DTLS data-channel transport that carries LiveKit's RPC and data messages — the private coaching channel rides this. https://www.rfc-editor.org/rfc/rfc8831
  10. IETF RFC 8832 (January 2021), "WebRTC Data Channel Establishment Protocol" (DCEP) — how a data channel used for RPC payloads is opened and negotiated between peers. https://www.rfc-editor.org/rfc/rfc8832
  11. IETF RFC 8825 (January 2021), "Overview: Real-Time Protocols for Browser-Based Applications" — the applicability statement defining the WebRTC protocol suite the agent and clients speak. https://www.rfc-editor.org/rfc/rfc8825
  12. IETF RFC 6716 (September 2012), "Definition of the Opus Audio Codec" — the default WebRTC audio codec on each participant's microphone track the coach transcribes. https://www.rfc-editor.org/rfc/rfc6716
  13. LiveKit — Pricing — Build $0, Ship $50/mo, Scale $500/mo; WebRTC participant minutes, agent session minutes, and downstream transfer metered separately; model (LLM/STT/vision) costs billed on top. Point-in-time, accessed 2026-06-02. https://livekit.io/pricing
  14. EU AI Act (Regulation (EU) 2024/1689), Article 50 — Transparency obligations — duty to inform people when they interact with or are processed by certain AI systems; basis for in-call AI disclosure. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
  15. The Quantum Leap, "AI-Driven Call Coaching in 2026: Capabilities, Use Cases, and Trends" — competitor/market reference (tier 7): real-time prompt cards (objection responses, battlecards, missed discovery questions), 200–400 ms suggestion lag on commodity hardware, and the 2026 shift to "agentic" coaching that drafts follow-ups and updates the CRM. Accessed 2026-06-02. https://www.thequantumleap.business/blog/ai-driven-call-coaching-2026-capabilities-use-cases-trends
  16. Gong — Sales coaching software — competitor reference (tier 7): real-time assistance surfacing battlecards, objection responses, and talk tracks when topics or competitor names are detected. Accessed 2026-06-02. https://www.gong.io/sales-coaching-software