WebRTC Explained Without Arcana

Why This Matters

If you are building a product where two or more people see and hear each other in real time (a video meeting, a doctor's consultation, a live auction with viewer voice, a fitness class with the coach watching form), WebRTC is almost certainly the right transport, and you need to understand it well enough to scope it, budget it, and talk to the engineers who will ship it. The technology has been a W3C Recommendation since January 2021, the IETF published the core protocol suite the same month (RFC 8825 through 8835), and follow-up RFCs published between 2022 and 2024 have refined parts of that suite. By 2026 every modern browser ships WebRTC by default, and the open-source server side (mediasoup, Janus, LiveKit, Jitsi Videobridge, Pion) is mature enough that a competent team can launch a conferencing product in weeks rather than years. The catch is that the simple "two browsers talking to each other" picture is wrong as soon as you have more than three participants or a public-internet network, and the cost of misunderstanding what WebRTC is and is not is paid in viewer rebuffers, telemedicine calls that fail to connect, and engineering rewrites. The point of this article is to make the picture correct without making it intimidating.

What WebRTC Actually Is

The shortest accurate definition: WebRTC is a collection of standards that turn a web browser into a media endpoint capable of sending and receiving real-time audio, video, and data over the public internet, with built-in encryption, network-traversal, and bandwidth-adaptation. The standards live in two places — the World Wide Web Consortium, which owns the JavaScript API the front-end developer writes against, and the Internet Engineering Task Force, which owns the on-the-wire protocols the bytes obey once they leave the browser. The W3C document is WebRTC: Real-Time Communication in Browsers, formally a Recommendation since 26 January 2021 and revised in March 2023 to include candidate amendments that the implementers had shipped in production (W3C, WebRTC: Real-Time Communication in Browsers, 2023). The IETF documents are nine RFCs published as a single batch in January 2021, then extended by JSEP's revised edition as RFC 9429 in April 2023, which obsoleted the earlier RFC 8829 (IETF, RFC 9429, JavaScript Session Establishment Protocol, 2023).

A practical paraphrase, before we get into the stack: WebRTC is the bundle of things you need so that a getUserMedia() call in a web page can produce a MediaStream object, that stream can be handed to an RTCPeerConnection object, and the other side of the connection — another browser, a mobile app, or a server — receives the frames within a few hundred milliseconds and renders them as if the two endpoints were patched together by a direct cable. Everything WebRTC standardises exists to make that paraphrase true in the presence of network address translators, firewalls, packet loss, jitter, asymmetric bandwidth, and the constraint that the bytes must never travel in clear text.

A high-level diagram showing the WebRTC stack: the JavaScript API on top, the W3C RTCPeerConnection layer below, then the IETF protocol stack — JSEP signalling, ICE for network traversal, DTLS-SRTP for encryption, RTP/RTCP for media, SCTP-over-DTLS for data channels — and at the bottom the UDP transport over IP

Figure 1. The WebRTC stack at a glance. The JavaScript API and the RTCPeerConnection object sit on top; the IETF protocols carry the bytes once they leave the browser.

Where WebRTC Came From, in One Paragraph

WebRTC began at Google in 2010 as the acquisition of two companies — On2 Technologies, which contributed the VP8 video codec, and Global IP Solutions, which contributed the audio engine and the network-resilient real-time stack that had powered earlier desktop calling products. Google open-sourced the code in May 2011 and proposed it as an internet standard the same year. The IETF created the rtcweb working group; the W3C created the WebRTC Working Group; the two bodies coordinated for ten years; the standards crossed the finish line together in January 2021 (IETF Blog, WebRTC Standardized, 26 January 2021). The decade-long gestation matters because it explains the design: WebRTC was built for the open web from day one, which is why the security model is mandatory rather than optional, why every browser implements it the same way, and why no vendor owns it. It is the only browser-native media-transport standard that survived the plugin-free transition that killed Flash, Silverlight, and Java applets.

The Five Things WebRTC Does

Strip WebRTC to the work it actually performs and there are five jobs. The rest of the article elaborates each one; this section names them so the reader has a map.

The first job is media capture — getting audio and video out of a microphone and a camera into a JavaScript object the page can manipulate. The browser API for this is navigator.mediaDevices.getUserMedia(), defined by a sibling W3C specification called Media Capture and Streams (W3C, Media Capture and Streams, Recommendation since April 2025). The output is a MediaStream object that holds one or more MediaStreamTrack objects, each representing a single audio or video source.

The second job is session negotiation — the two endpoints have to agree on which codecs they will use, which audio sampling rates, which video resolutions, whether they will exchange video in addition to audio, and how their network addresses will be wired together. The agreement is expressed as a pair of Session Description Protocol documents, called an offer and an answer, and the rules for producing and applying them are the JavaScript Session Establishment Protocol, RFC 9429 (IETF, RFC 9429, JSEP, April 2023). The browser will create the SDP for you; the application code never has to write SDP by hand. What the application code does have to do is move those SDP blobs from one peer to the other. That movement is signalling, and WebRTC deliberately does not standardise it.

The third job is network traversal — finding a path through whatever network address translators, firewalls, and corporate proxies sit between the two endpoints. The standard for this is Interactive Connectivity Establishment, RFC 8445, plus the helpers STUN (RFC 8489) and TURN (RFC 8656). ICE gathers every plausible address each endpoint might be reachable at — local IP, public IP discovered through a STUN server, and a relayed IP through a TURN server — then probes every combination of the two sides' addresses in parallel until it finds a pair that works (IETF, RFC 8445, Interactive Connectivity Establishment, July 2018).

The fourth job is encryption. WebRTC mandates that every byte on the wire is encrypted; there is no clear-text mode. The handshake uses Datagram Transport Layer Security, the UDP-friendly cousin of TLS, to negotiate the symmetric keys. The WebRTC security architecture requires DTLS 1.2 (RFC 6347); DTLS 1.3 is specified separately in RFC 9147. The media itself then rides inside Secure Real-Time Transport Protocol, RFC 3711, with the keys exchanged through the DTLS-SRTP extension, RFC 5764. The fingerprints that prove the DTLS certificates belong to the right endpoints are carried inside the SDP, which is how the signalling channel becomes the trust anchor for the media channel (IETF, RFC 5764, DTLS Extension to Establish Keys for SRTP, May 2010).

The fifth job is the media transport itself. Once the encrypted UDP channel is open, the audio and video frames ride on top of Real-Time Transport Protocol, RFC 3550, and the receiver tells the sender how the network is performing through RTP Control Protocol, RFC 3550 §6. The RTP/RTCP loop is what powers WebRTC's bandwidth adaptation, packet-loss recovery, and jitter buffering. The data-channel side, when present, uses Stream Control Transmission Protocol (RFC 4960, since obsoleted by RFC 9260) encapsulated inside DTLS. That is a deliberate choice: SCTP delivers messages rather than a byte stream, and each data channel can be configured as reliable or unreliable, ordered or unordered, so an application that does not need strict ordering can avoid the head-of-line-blocking penalty of TCP.

That is everything WebRTC does, written without acronyms wherever possible. The rest of this article walks each job in more depth.

The JavaScript Surface a Developer Actually Touches

The W3C specification's central object is RTCPeerConnection. A working example reduces to a handful of lines and is worth reading even if you do not write JavaScript, because it shows how thin the developer-facing API is relative to the protocol stack underneath:

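// Runs inside a module script or an async function (it uses await).

// 1. Get the local microphone and camera.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });

// 2. Create the peer connection with a STUN server for NAT discovery.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }]
});

// 3. Attach each local track to the peer connection.
stream.getTracks().forEach(track => pc.addTrack(track, stream));

// 4. When a remote track arrives, render it.
pc.ontrack = (event) => {
  document.getElementById("remoteVideo").srcObject = event.streams[0];
};

// 5. Forward each ICE candidate as the browser gathers it (trickle ICE).
//    Without this, an offer sent before gathering finishes carries no addresses.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    mySignallingChannel.send({ type: "candidate", candidate: event.candidate });
  }
};

// 6. Create an SDP offer and send it through the application's signalling channel.
//    mySignallingChannel is application code (for example a WebSocket wrapper), not part of WebRTC.
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
mySignallingChannel.send({ type: "offer", sdp: offer.sdp });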

Readers who have never seen the WebRTC API often ask where the encryption call is, where the bandwidth-adaptation call is, and where the NAT-traversal call is. The honest answer is that none of those exist as separate APIs; they happen inside the RTCPeerConnection object and the negotiation that runs when setLocalDescription and setRemoteDescription are invoked. The application is responsible for two things only: capturing the media, and moving the negotiation messages (the SDP offer and answer, plus any trickled ICE candidates) between the two peers. Everything else is the browser's job (Mozilla MDN, RTCPeerConnection, 2026). That is by design, and it goes a long way towards explaining why WebRTC took ten years to standardise.

The SDP Offer/Answer Dance

The negotiation between two peers always follows the same shape: the offerer creates a description of what it wants to send and receive, the answerer replies with a description of what it agrees to, and both sides apply both descriptions to their local state machine. The rules are RFC 9429 (JSEP). The two documents are blobs of Session Description Protocol, a text format that long predates WebRTC (it was first published as RFC 2327 in 1998) and whose current definition is RFC 8866 (IETF, RFC 8866, Session Description Protocol, January 2021).
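The answerer's half of that exchange can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `handleOffer` and the `signalling` object are hypothetical names, and `pc` is assumed to be an RTCPeerConnection the page has already created. The three methods called on `pc` are the standard ones.

```javascript
// Answerer side of the offer/answer exchange (sketch).
// `pc` is assumed to be an existing RTCPeerConnection; `signalling` is a
// hypothetical application object exposing a send() method.
async function handleOffer(pc, offer, signalling) {
  // Apply the remote offer to the local state machine.
  await pc.setRemoteDescription({ type: "offer", sdp: offer.sdp });
  // Ask the browser to generate the matching answer.
  const answer = await pc.createAnswer();
  // Apply the answer locally so both descriptions are now in place.
  await pc.setLocalDescription(answer);
  // Ship the answer back through the application's own channel.
  signalling.send({ type: "answer", sdp: answer.sdp });
  return answer;
}
```

The offerer completes the exchange by passing the received answer to its own setRemoteDescription.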

An SDP offer for a single audio-and-video call runs from several dozen to a couple of hundred lines of text, depending on how many codecs and candidates it carries. The fields that matter for understanding what WebRTC does are these: the m= lines declare a media section, one per audio or video track plus one for the data channel; the a=rtpmap: lines describe every codec the offerer will accept (the order of payload types on the m= line expresses preference); the a=fingerprint: line carries the SHA-256 fingerprint of the DTLS certificate the offerer will present during the encryption handshake; the a=ice-ufrag and a=ice-pwd lines carry the credentials ICE will use to authenticate connectivity-check packets; and the a=candidate: lines list every IP-and-port pair the offerer has gathered as a potential local address. The answerer reads all of that, keeps the codecs it also supports for each media section, adds its own ICE credentials and candidates, writes its own DTLS certificate fingerprint into its reply, and sends the SDP back.
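To make those fields concrete, here is a minimal sketch that pulls them out of an SDP blob with plain string handling. `summariseSdp` and the sample text are illustrative assumptions; the sample is hand-written and heavily shortened, and a production application would normally leave SDP parsing to the browser.

```javascript
// Summarise the SDP fields discussed above (sketch, not a full SDP parser).
function summariseSdp(sdp) {
  const lines = sdp.split(/\r?\n/).filter((l) => l.length > 0);
  const summary = { media: [], codecs: [], fingerprint: null, iceUfrag: null, candidates: 0 };
  for (const line of lines) {
    if (line.startsWith("m=")) {
      // m=<kind> <port> <proto> <payload types...>
      summary.media.push(line.slice(2).split(" ")[0]);
    } else if (line.startsWith("a=rtpmap:")) {
      // a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
      const [, name] = line.slice("a=rtpmap:".length).split(" ");
      summary.codecs.push(name.split("/")[0]);
    } else if (line.startsWith("a=fingerprint:")) {
      // a=fingerprint:<hash function> <colon-separated hex>
      const [hash, value] = line.slice("a=fingerprint:".length).split(" ");
      summary.fingerprint = { hash, value };
    } else if (line.startsWith("a=ice-ufrag:")) {
      summary.iceUfrag = line.slice("a=ice-ufrag:".length);
    } else if (line.startsWith("a=candidate:")) {
      summary.candidates += 1;
    }
  }
  return summary;
}

// Hand-written, shortened sample; real offers are far longer.
const sampleSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=ice-ufrag:abcd",
  "a=fingerprint:sha-256 AA:BB:CC",
  "a=rtpmap:111 opus/48000/2",
  "a=candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

console.log(summariseSdp(sampleSdp));
```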

The arithmetic of the negotiation is worth showing once, because it is the source of one of WebRTC's enduring surprises — the SDP is large. A typical offer with audio (Opus), video (VP8, VP9, H.264, AV1), data channel, and the standard set of WebRTC-mandated header extensions runs roughly:

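  1 session-level block          ×  6 lines  =  6 lines
+ 3 media sections               × 12 lines  = 36 lines  (m=, c=, a=rtcp, a=ice-ufrag, a=ice-pwd, a=fingerprint, a=setup, a=mid, a=sendrecv, a=rtcp-mux, a=rtpmap × N, a=fmtp × N)
+ 4 codecs in the video section  ×  3 lines  = 12 lines  (rtpmap, rtcp-fb, fmtp)
+ 12 ICE candidates              ×  1 line   = 12 lines  (one per a=candidate)
= 66 lines for the items listed

Browser-generated offers add header-extension lines (a=extmap), retransmission payload types, and per-source a=ssrc lines on top of the items listed, which is how a real offer can reach roughly 240 lines and 8–10 KB of text.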

Eight kilobytes is small over a 100 Mbps connection, but it is not free, and it is the payload your signalling channel has to relay before the call even begins. Production deployments that use trickle ICE (the more common pattern in 2026) send a smaller initial SDP, typically around 4 KB, and trickle the candidate lines in afterward as the browser gathers them.
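The size effect of trickle ICE can be illustrated by separating the candidate lines from the rest of an SDP blob and measuring both parts. This is only an illustration: in a browser you would not strip lines by hand, because the offer is simply produced before gathering finishes and candidates arrive later through the icecandidate event. `splitCandidates` and `byteLength` are hypothetical helper names.

```javascript
// Show how much of an SDP blob consists of candidate lines (sketch).
function splitCandidates(sdp) {
  const lines = sdp.split(/\r?\n/).filter((l) => l.length > 0);
  const isCandidate = (l) => l.startsWith("a=candidate:");
  return {
    // What a trickle-ICE offer would carry up front.
    initialSdp: lines.filter((l) => !isCandidate(l)).join("\r\n") + "\r\n",
    // What would be sent afterwards, one signalling message each.
    candidates: lines.filter(isCandidate),
  };
}

// Size in bytes of a string once encoded as UTF-8.
function byteLength(text) {
  return new TextEncoder().encode(text).length;
}
```

Calling `byteLength` on the full blob and on `initialSdp` gives the before-and-after payload sizes for your own offers.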

Signalling: The Part WebRTC Did Not Standardise

WebRTC has no signalling protocol. The standard is explicit on this point — the JSEP authors deliberately left signalling out so that the application could pick whatever transport made sense for its existing infrastructure (RFC 9429, §1.1, April 2023). In practice the signalling channel is almost always a WebSocket between each peer and a small server the application controls; the server is responsible for routing offers to the right answerer, answers back to the right offerer, and trickled ICE candidates in both directions. A signalling server doing nothing else can be written in a few hundred lines of code in any language.
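The routing job described above is small enough to sketch without any transport at all. In the following minimal sketch, `SignallingRelay`, the message shape (`type`, `to`, `from`), and the peer identifiers are all assumptions made for illustration; a real server would sit behind a WebSocket or similar transport and would add authentication and room membership.

```javascript
// In-memory signalling relay (sketch). Transport is deliberately left out:
// each peer is represented by a callback that delivers a message to it.
class SignallingRelay {
  constructor() {
    this.peers = new Map(); // peerId -> deliver(message)
  }

  join(peerId, deliver) {
    this.peers.set(peerId, deliver);
  }

  leave(peerId) {
    this.peers.delete(peerId);
  }

  // Route an offer, answer, or trickled candidate to its addressee.
  relay(fromId, message) {
    const allowed = ["offer", "answer", "candidate"];
    if (!allowed.includes(message.type)) return false;
    const deliver = this.peers.get(message.to);
    if (!deliver) return false; // addressee is not connected
    deliver({ ...message, from: fromId });
    return true;
  }
}
```

The relay never inspects the SDP or the candidates; it only moves opaque messages, which is why the server can stay so small.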

The non-obvious consequence is that every team building a WebRTC product writes a signalling server. Some are minimal and survive the lifetime of the product unchanged; some grow features for room membership, presence, recording control, screen-sharing coordination, and chat, at which point the signalling server is the application's nervous system. Open-source frameworks like LiveKit and mediasoup ship reference signalling implementations; Janus and Jitsi Videobridge ship their own; the closed-source vendors hide it inside their SDK. There is no canonical answer because there is no canonical specification.

Where Fora Soft Fits In

Fora Soft has shipped WebRTC products since 2011, the year Google open-sourced the codebase. Across more than 80 projects in conferencing, telemedicine, e-learning, live-shopping, and AR-VR collaboration, the team has built signalling servers from scratch and integrated mediasoup, Janus, Jitsi, LiveKit, and Pion as the media-routing back end. The patterns that come up across every engagement — picking an SFU topology, sizing TURN-server bandwidth, designing the WebRTC-to-HLS bridge for recording, hardening the signalling server against room-bombing — are the same patterns covered in this Block 8 of Learn. The article you are reading is the on-ramp; the rest of the block goes deeper into each piece.

ICE, STUN, and TURN: How the Bytes Find Each Other

The hardest job in WebRTC is connecting two endpoints across the public internet, because most endpoints on the public internet do not have a publicly routable IP address. A laptop in a coffee shop sits behind the shop's NAT, which sits behind the ISP's carrier-grade NAT, which sits behind whatever firewall the ISP applies; a phone on cellular data sits behind the carrier's NAT and possibly behind a private-IP overlay too. The endpoint's own operating system knows only its private address — typically a 192.168.x.x or 10.x.x.x — and has no idea what the public-facing address looks like.

Interactive Connectivity Establishment, RFC 8445, solves this by gathering, in parallel, every address the endpoint might possibly be reachable at. There are three address types. The host candidate is the local IP the operating system reports. The server-reflexive candidate is the public IP discovered by sending a STUN binding request to a public STUN server and reading the source-address-from-the-server's-perspective out of the response — this is what reveals the NAT-mapped public address (IETF, RFC 8489, Session Traversal Utilities for NAT (STUN), February 2020). The relayed candidate is the public IP of a TURN server the endpoint has authenticated to and asked to forward packets on its behalf — this is the fallback when no direct path exists (IETF, RFC 8656, Traversal Using Relays around NAT (TURN), February 2020).
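The three address types are visible directly in the a=candidate lines of an SDP blob, where they appear as `host`, `srflx`, and `relay` after the `typ` keyword. A minimal parsing sketch follows; `parseCandidate` is a hypothetical helper, and the addresses in the usage note are documentation-range examples rather than real endpoints.

```javascript
// Parse one a=candidate line into its fields (sketch).
// Layout: candidate:<foundation> <component> <transport> <priority>
//         <address> <port> typ <type> [<name> <value> ...]
function parseCandidate(line) {
  const parts = line.replace(/^a=/, "").replace(/^candidate:/, "").split(" ");
  const [foundation, component, transport, priority, address, port, typKeyword, type] = parts;
  if (typKeyword !== "typ") throw new Error("not a candidate line: " + line);
  const candidate = {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // "host", "srflx" (server-reflexive), "prflx", or "relay"
  };
  // Optional name/value pairs such as "raddr <address> rport <port>".
  for (let i = 8; i + 1 < parts.length; i += 2) {
    candidate[parts[i]] = parts[i + 1];
  }
  return candidate;
}
```

For example, `parseCandidate("a=candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154")` yields a server-reflexive candidate whose public address is 203.0.113.7 and whose underlying private address is 192.168.1.10.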

Once both peers have gathered candidates and exchanged them via the signalling channel, ICE forms every possible pair (peer-A-candidate, peer-B-candidate) and sends STUN connectivity-check packets across each pair until at least one succeeds in both directions. The first pair that succeeds wins. In production the win typically comes from a server-reflexive-to-server-reflexive pair (both peers behind permissive NATs that allow inbound packets after an outbound one has been sent) or from a relayed-to-relayed pair (both peers behind restrictive NATs where the only working path is through a TURN server in the middle). The percentage of calls that need TURN varies by network mix — public-internet conferencing platforms typically see 8–15% of sessions relayed; enterprise-conferencing platforms with strict firewalls see 25–40% (Cloudflare, State of WebRTC report, 2025).
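The pairing step can be sketched in a few lines. The pair-priority formula below is the one given in RFC 8445 (2^32 × MIN(G, D) + 2 × MAX(G, D) + 1 if G > D, where G is the controlling agent's candidate priority and D the controlled agent's). Everything else is a simplifying assumption: the local side is treated as the controlling agent, and the pruning, pacing, and per-component rules of real ICE are left out. BigInt is used because the result exceeds the safe integer range of a JavaScript Number.

```javascript
// Pair priority as defined in RFC 8445 (sketch).
function pairPriority(controllingPriority, controlledPriority) {
  const g = BigInt(controllingPriority);
  const d = BigInt(controlledPriority);
  const min = g < d ? g : d;
  const max = g > d ? g : d;
  return (1n << 32n) * min + 2n * max + (g > d ? 1n : 0n);
}

// Form every (local, remote) pair and order the list, highest priority first.
// Assumes the local agent is the controlling one.
function formCheckList(localCandidates, remoteCandidates) {
  const pairs = [];
  for (const local of localCandidates) {
    for (const remote of remoteCandidates) {
      pairs.push({ local, remote, priority: pairPriority(local.priority, remote.priority) });
    }
  }
  // BigInt values are compared rather than subtracted into a Number.
  pairs.sort((a, b) => (a.priority < b.priority ? 1 : a.priority > b.priority ? -1 : 0));
  return pairs;
}
```

Connectivity checks then run down this list, which is why direct paths tend to be tried before relayed ones: host and server-reflexive candidates are normally assigned higher priorities than relayed candidates.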

A topology diagram showing peer A on the left and peer B on the right, each behind its own NAT. Three candidate paths are drawn between them: host-to-host (blocked by the NATs), server-reflexive-to-server-reflexive going through both NATs directly, and relayed-to-relayed going through a TURN server placed centrally on the public internet. STUN servers sit above each peer providing reflexive-address discovery

Figure 2. The three candidate paths ICE considers. STUN tells each peer its public address; TURN relays the media when no direct path works.

The economics of TURN are the part product teams underestimate. A TURN server is a pure bandwidth relay: every byte of media crosses it twice (once inbound from the sender, once outbound to the receiver), and bandwidth is the line item that bills. A single 720p video call at 1.5 Mbps in each direction puts 3 Mbps through the relay for each peer's stream (1.5 Mbps in, 1.5 Mbps out), or 6 Mbps for the call. Multiplied by the fraction of sessions that need TURN and by your peak concurrent calls, that becomes the dominant cost in a public-internet conferencing product. The B5 calculator at the bottom of this page lets you model your own number.
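The same multiplication can be written down as a small function. This is a back-of-envelope sketch for one-to-one calls only; `turnRelayMbps` and its parameter names are hypothetical, and the 1,000 concurrent calls and 10% relayed share in the second call are illustrative inputs, not measurements.

```javascript
// Estimate peak traffic through a TURN relay for one-to-one calls (sketch).
function turnRelayMbps({ perDirectionMbps, peakConcurrentCalls, relayedFraction }) {
  // Each direction of a call enters the relay once and leaves it once.
  const perCallMbps = perDirectionMbps * 2 /* directions */ * 2 /* in + out */;
  return perCallMbps * peakConcurrentCalls * relayedFraction;
}

// One fully relayed call at 1.5 Mbps each way.
console.log(turnRelayMbps({ perDirectionMbps: 1.5, peakConcurrentCalls: 1, relayedFraction: 1 })); // 6

// Illustrative inputs: 1,000 concurrent calls, 10% of them relayed.
console.log(turnRelayMbps({ perDirectionMbps: 1.5, peakConcurrentCalls: 1000, relayedFraction: 0.1 }));
```

With the illustrative inputs the relay carries about 600 Mbps at peak, which is the figure to price against your provider's bandwidth rate.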

DTLS-SRTP: Why WebRTC Has No Clear-Text Mode

WebRTC mandates encryption on every connection, every time. The W3C specification and the IETF security architecture (RFC 8827) both require it; no browser ships an opt-out. The handshake works like this: each peer generates a self-signed DTLS certificate, computes its SHA-256 fingerprint, and writes that fingerprint into its SDP. The SDP travels over the signalling channel — which must be TLS-protected if the application cares about MitM resistance — and the receiving peer therefore knows in advance which fingerprint the other side's DTLS certificate must match. When the DTLS handshake runs over the ICE-discovered UDP path, each side verifies the other's certificate against the fingerprint it received in the SDP; a mismatch terminates the connection (IETF, RFC 8827, WebRTC Security Architecture, January 2021).
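The verification step reduces to a string comparison once the certificate digest is known. The sketch below assumes the caller already has the SHA-256 digest of the peer's DTLS certificate as raw bytes; the hashing itself and the DTLS handshake are outside the sketch, and the browser performs all of this internally. The function names are hypothetical.

```javascript
// Format digest bytes the way SDP writes them: upper-case hex pairs joined by colons.
function formatFingerprint(digestBytes) {
  return Array.from(digestBytes, (b) => b.toString(16).padStart(2, "0").toUpperCase()).join(":");
}

// Read the first a=fingerprint line out of an SDP blob.
function fingerprintFromSdp(sdp) {
  const line = sdp.split(/\r?\n/).find((l) => l.startsWith("a=fingerprint:"));
  if (!line) return null;
  const [hash, value] = line.slice("a=fingerprint:".length).trim().split(" ");
  return { hash: hash.toLowerCase(), value: value.toUpperCase() };
}

// True only when the SDP promised a SHA-256 fingerprint and the digest matches it.
function certificateMatchesSdp(sdp, digestBytes) {
  const expected = fingerprintFromSdp(sdp);
  return (
    expected !== null &&
    expected.hash === "sha-256" &&
    expected.value === formatFingerprint(digestBytes)
  );
}
```

A false result corresponds to the mismatch case described above, where the connection is terminated.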

Once DTLS has produced the master secret, the keys for SRTP are derived through the DTLS-SRTP extension (RFC 5764). From that point onward every RTP packet carrying audio or video is encrypted with AES (typically AES-128-GCM in modern deployments) and authenticated. A network observer who can read the bytes on the wire sees only the SRTP headers and a stream of ciphertext.

The non-obvious consequence: WebRTC is the only widely deployed real-time media protocol that has been encrypted by default since its inception. Legacy SIP/RTP telephony lets the operator choose; WebRTC does not. This makes it the rational choice for telemedicine, financial-services voice, and any product where the regulatory question is "did anyone on the network path see the patient's voice in clear text?" and the answer must be no.

RTP, RTCP, and How WebRTC Reads the Network

Once the encrypted UDP channel is open, the audio and video themselves ride inside Real-Time Transport Protocol, RFC 3550. RTP has been the workhorse of internet voice and video for thirty years; what WebRTC adds is the receiver feedback loop that closes the control system. The receiver continuously sends RTCP receiver reports — packet-loss percentage, jitter (the variance in inter-packet arrival times), and round-trip delay — back to the sender. The sender's bandwidth-estimation algorithm reads those numbers and adjusts the encoder's bitrate, the simulcast layer it transmits, or the SVC layer it activates, in response.
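Two of the numbers in a receiver report are simple enough to compute by hand. The sketch below follows the interarrival-jitter definition in RFC 3550, where the running estimate moves one sixteenth of the way towards each new transit-time difference; the function names are hypothetical, and both clocks are assumed to be expressed in RTP timestamp units.

```javascript
// Running interarrival-jitter estimate as defined in RFC 3550 (sketch).
function createJitterEstimator() {
  let jitter = 0;
  let previousTransit = null;
  return function onPacket(arrivalTime, rtpTimestamp) {
    // Transit time is only meaningful relative to the previous packet.
    const transit = arrivalTime - rtpTimestamp;
    if (previousTransit !== null) {
      const d = Math.abs(transit - previousTransit);
      jitter += (d - jitter) / 16; // gain of 1/16 smooths out single outliers
    }
    previousTransit = transit;
    return jitter;
  };
}

// Fraction of packets lost in one reporting interval.
function fractionLost(expectedPackets, receivedPackets) {
  if (expectedPackets <= 0) return 0;
  return Math.max(0, expectedPackets - receivedPackets) / expectedPackets;
}
```

A packet that arrives 16 timestamp units later than its predecessor's spacing predicts moves the estimate by exactly 1 when starting from zero, which shows how slowly the estimate reacts to a single late packet.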

Google's reference algorithm, the one Chrome and most Chromium-based clients ship, is called Google Congestion Control (GCC). It combines a delay-based estimate (the published design runs a Kalman filter over the change in one-way queuing delay) with a loss-based estimate, reduces the encoder's bitrate when the combined estimate drops, and increases it again when the network recovers (Carlucci et al., Congestion Control for Web Real-Time Communication, IEEE/ACM Transactions on Networking, 2017). The newer transport-cc extension lets the receiver report per-packet arrival timestamps, giving the sender a richer signal than the standard RTCP receiver report does; transport-cc is now the default in all major implementations (W3C WebRTC spec, §5.6, 2023).
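The loss-based half of such a controller is easy to sketch. The thresholds below (back off above 10% loss, probe upward by 5% below 2% loss, hold in between) follow the loss-based controller described in the GCC Internet-Draft; the delay-based half is omitted entirely, and the function name and the minimum and maximum rates are assumptions chosen for illustration. This is not Chrome's implementation.

```javascript
// Loss-based send-rate update in the style of the GCC draft (simplified sketch).
function updateSendRate(currentBps, lossFraction, { minBps = 50000, maxBps = 2500000 } = {}) {
  let next = currentBps;
  if (lossFraction > 0.10) {
    next = currentBps * (1 - 0.5 * lossFraction); // heavy loss: back off
  } else if (lossFraction < 0.02) {
    next = currentBps * 1.05; // clean network: probe upward by 5%
  } // between 2% and 10% loss: hold the current rate
  return Math.min(maxBps, Math.max(minBps, Math.round(next)));
}
```

Feeding the function the loss fraction from each receiver report, and the result into the encoder's target bitrate, closes the loop the paragraph above describes.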

The same feedback loop drives WebRTC's packet-loss recovery. The receiver can send a NACK (negative acknowledgment) naming a specific lost packet, and the sender will retransmit it from a small history buffer if there is time before the receiver's jitter buffer would have played it out. For losses detected too late to repair, the receiver can send a Picture Loss Indication (PLI) or a Full Intra Request (FIR), prompting the sender to emit a keyframe so the decoder can resynchronize. Forward error correction is supported but rarely the default, because the round-trip time of typical conferencing is short enough that retransmission usually wins on bandwidth efficiency.

A Worked Latency Budget: Why WebRTC Hits 100–300 ms

Latency in WebRTC is measured glass-to-glass — the wall-clock time from the moment a photon hits the camera sensor on one side to the moment the corresponding pixel reaches the screen on the other side. The published latency floor for WebRTC over a clean network is roughly 100 ms; the upper bound for an acceptable interactive conversation is around 300 ms; above 500 ms human turn-taking starts to break down. The numbers come from decades of telephony research and apply to WebRTC unchanged.

Where do those 100–300 ms go? A worked budget for a clean network looks like this:

Contributor	Typical (ms)	Notes
Camera capture and encoder delay	20–40	One to two frame intervals at 30 fps; longer if the encoder uses lookahead
Local jitter and packetization buffer	10–20	The OS-level network stack buffers a small amount before sending
Network transit, one-way	20–80	Across a continent on a tier-1 backbone; 20 ms US-coast-to-coast, 80 ms transatlantic
Receiver jitter buffer	30–80	Sized to absorb the 95th-percentile inter-packet variance; larger on lossy networks
Decoder and render pipeline	10–30	One frame at 30 fps is 33 ms; the decoder may add another frame
Total	90–250	Within the published 100–300 ms WebRTC band
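The totals in the last row can be checked by summing the rows, as in this short sketch. The stage names and ranges are copied from the table; the `budgetMs` and `totalLatency` identifiers are hypothetical.

```javascript
// The latency budget from the table above, in milliseconds.
const budgetMs = [
  { stage: "capture and encode", min: 20, max: 40 },
  { stage: "local jitter and packetization buffer", min: 10, max: 20 },
  { stage: "network transit, one-way", min: 20, max: 80 },
  { stage: "receiver jitter buffer", min: 30, max: 80 },
  { stage: "decode and render", min: 10, max: 30 },
];

// Sum the best-case and worst-case columns separately.
function totalLatency(budget) {
  return budget.reduce(
    (total, s) => ({ min: total.min + s.min, max: total.max + s.max }),
    { min: 0, max: 0 }
  );
}

console.log(totalLatency(budgetMs)); // { min: 90, max: 250 }
```

Replacing a row with your own measurement shows immediately which stage pushes a deployment outside the 100–300 ms band.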

Compare this to HLS (HTTP Live Streaming), which buffers in 4–10 second segments before play and runs at 5–30 seconds glass-to-glass in default configurations; or LL-HLS, which gets to 2–5 seconds; or Media over QUIC, the 2026 newcomer aiming for 500 ms at CDN scale. The two-orders-of-magnitude gap between WebRTC and HLS is the single most important number a product team has to internalise: if your use case needs latency under a second, WebRTC is the right transport; if it does not, an HTTP-based protocol is usually cheaper to operate.

A horizontal stacked bar showing the latency contributors in a WebRTC session, segmented into capture-and-encode, send-side buffer, network transit, receive-side jitter buffer, and decode-and-render bands; total annotated at 90 to 250 ms with the 100 to 300 ms target band marked underneath

Figure 3. The WebRTC latency budget, glass-to-glass. The receiver-side jitter buffer and network transit dominate.

The Two-Browser Picture Breaks at Three Browsers

WebRTC is described as peer-to-peer because each RTCPeerConnection object talks to exactly one other endpoint. The trouble is that "one other endpoint" is the limit. If you want three people in a call, each peer has to maintain a peer connection to each of the others: three peers means two peer connections per browser (three in the call as a whole), four peers means three per browser (six in the call), and ten peers means nine per browser (forty-five in the call). The bandwidth scales the same way: each peer uploads its video N−1 times, once per other peer, which the upstream of a residential connection cannot sustain past four or five participants. This topology, in which every peer is connected to every other peer, is called a mesh, and it is the default topology when you write a naive WebRTC application.
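The mesh arithmetic follows from two formulas: each browser holds N−1 connections, and the call as a whole holds N(N−1)/2. A minimal sketch, with `meshCost` as a hypothetical helper and 1.5 Mbps as an assumed per-stream video bitrate:

```javascript
// Connection and upload cost of a full mesh (sketch).
function meshCost(participants, videoMbps) {
  const others = participants - 1;
  return {
    connectionsPerBrowser: others,
    connectionsInCall: (participants * others) / 2,
    uploadMbpsPerPeer: others * videoMbps, // one copy of the video per other peer
  };
}

console.log(meshCost(3, 1.5));  // 2 per browser, 3 in the call, 3 Mbps upload
console.log(meshCost(10, 1.5)); // 9 per browser, 45 in the call, 13.5 Mbps upload
```

At ten participants the 13.5 Mbps upload is already beyond what many residential uplinks can sustain.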

Production conferencing solves this by introducing a server in the middle. The two patterns are:

A Selective Forwarding Unit (SFU) terminates a WebRTC connection from each peer, decrypts the SRTP on that hop, reads the RTP headers to decide where each packet goes, and re-encrypts the media for every other peer without ever decoding it. The SFU does not transcode; it does not mix; it routes. Upload bandwidth per peer falls from O(N) to O(1) because each peer sends its video once to the SFU. mediasoup, Janus, LiveKit, and Jitsi Videobridge can all run as SFUs, and Pion is a Go WebRTC library commonly used to build them (LiveKit, SFU vs MCU vs Mesh, 2024).

A Multipoint Conferencing Unit (MCU) terminates every connection, decodes every stream, composites them into a single output video, and re-encodes that composite for each receiver. The peer's upload is one stream, the peer's download is one stream, but the server's CPU bill is enormous because it is doing real-time video encoding on behalf of every participant.
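The difference between the topologies comes down to how many video streams each participant sends and receives. The following sketch counts them for a call of a given size; `streamsPerPeer` is a hypothetical helper, and it assumes one video stream per participant with no simulcast.

```javascript
// Video streams each participant uploads and downloads, per topology (sketch).
function streamsPerPeer(topology, participants) {
  const others = participants - 1;
  switch (topology) {
    case "mesh":
      return { up: others, down: others }; // one copy to and from every other peer
    case "sfu":
      return { up: 1, down: others }; // send once, receive everyone else's stream
    case "mcu":
      return { up: 1, down: 1 }; // send once, receive one composited stream
    default:
      throw new Error("unknown topology: " + topology);
  }
}
```

For a ten-person call this gives 9 up and 9 down in a mesh, 1 up and 9 down through an SFU, and 1 up and 1 down through an MCU, with the MCU paying for that symmetry in server-side encoding.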

In 2026 the SFU has won every category where the participants' devices can handle multiple incoming video streams (web, mobile, smart TV), because the lower CPU cost and lower latency of routing-instead-of-mixing outweigh the slight client-side complexity. MCUs survive only where the receiving device cannot handle multiple streams (some legacy telephony bridges, some constrained embedded clients). The detailed comparison is in article 8.4 (SFU vs MCU vs Mesh); the comparison of the five major open-source SFUs is in article 8.5.

A Common Pitfall: Treating WebRTC Like an HTTP Protocol

The single most common mistake teams make when first shipping WebRTC is reasoning about it as if it were HTTP. They assume that a CDN will help with scale (it will not — a CDN caches; WebRTC streams are unique per session); they assume a load balancer in front of the signalling server is enough (it is not — the WebRTC media plane runs on a different transport entirely and does not pass through the signalling server); they assume the media will work over their corporate VPN (it usually will not, because the VPN's NAT is not WebRTC-aware). The honest framing for a product manager is: WebRTC is a peer-to-peer media protocol that uses HTTP only for signalling and only as the application chooses. The scaling, the costs, and the failure modes are completely different from anything in the HTTP world. Plan the architecture accordingly — the rest of Block 8 walks you through how.

The 2026 State of WebRTC

WebRTC has been "stable" as a standard since January 2021, but the ecosystem around it has continued to evolve. Three changes matter for any team building today.

First, the codec story has improved. AV1 with scalable video coding (AV1 SVC) shipped in Chrome 90, Firefox went live with hardware-accelerated decode in 2024, and Safari 18 added baseline support in late 2024. Where the network can carry it, AV1 SVC delivers the same perceived quality as VP9 at roughly 30% lower bitrate, and the SVC layering means the SFU can drop layers to slow receivers without re-encoding (Bitmovin, AV1 in WebRTC, 2024). H.264 remains the universal fallback; Opus remains the universal audio codec.

Second, WebTransport reached baseline status across all major browsers in March 2026 when Safari 26.4 shipped it (webrtc.ventures, WebTransport Is Now Baseline, April 2026). WebTransport is not a replacement for WebRTC for two-way conversation, but it complements it for one-way data and is the foundation under several of the WHIP-over-WebTransport experiments. It is the standard you will hear about more in 2026–2027.

Third, the AI-agent integration story has matured. LiveKit's Agents framework launched in 2024 and reached production scale in 2025, letting developers connect an OpenAI Realtime API model, a Google Gemini Live model, or any equivalent voice model into a WebRTC room as if the agent were another participant. By Q1 2026, search interest in "LiveKit Agents" had reached roughly 700 queries a month, in a category that was brand new in 2024. This is the largest emerging use case and the one article 8.10 dives into. If your product roadmap involves any kind of voice-first AI agent, WebRTC is now the assumed transport.

What Is Not in WebRTC

It is worth naming what WebRTC deliberately does not do, because the gaps drive the architectural decisions teams have to make:

It does not provide signalling. You build that.
It does not provide an SFU. You buy one, run an open-source one, or write one.
It does not provide recording. You bridge to an HLS or MP4 recorder, usually by adding a recording bot as another participant in the room.
It does not provide a TURN server. You run one (coturn is the open-source default) or buy capacity from Twilio, Cloudflare, or your CDN.
It does not provide a room or presence model. The signalling server handles it.
It does not provide rate-limiting or anti-abuse controls. The signalling server, again.
It does not provide a way to broadcast one stream to a million viewers. WebRTC scales to thousands in an SFU cascade; for millions you bridge to HLS, DASH, or Media over QUIC.

The list looks intimidating; the reality is that the open-source ecosystem covers every gap, and the patterns are well documented. The point of naming them is to set expectations: the moment your scope grows beyond "two people talking", the work to do is the gap-filling, not the WebRTC itself.

References

IETF, RFC 8825, Overview: Real-Time Protocols for Browser-Based Applications, January 2021. https://datatracker.ietf.org/doc/html/rfc8825 — the umbrella RFC that introduces the WebRTC protocol suite.
IETF, RFC 8826, Security Considerations for WebRTC, January 2021. https://datatracker.ietf.org/doc/html/rfc8826 — the threat model.
IETF, RFC 8827, WebRTC Security Architecture, January 2021. https://datatracker.ietf.org/doc/html/rfc8827 — the mandate that every WebRTC connection is encrypted.
IETF, RFC 8828, WebRTC IP Address Handling Requirements, January 2021. https://datatracker.ietf.org/doc/html/rfc8828 — privacy rules for how the browser may reveal IP addresses.
IETF, RFC 9429, JavaScript Session Establishment Protocol (JSEP), April 2023. https://datatracker.ietf.org/doc/rfc9429/ — the offer/answer rules; obsoletes RFC 8829.
IETF, RFC 8830, WebRTC MediaStream Identification in the SDP, January 2021. https://datatracker.ietf.org/doc/html/rfc8830 — how SDP refers to tracks and streams.
IETF, RFC 8831, WebRTC Data Channels, January 2021. https://datatracker.ietf.org/doc/html/rfc8831 — the arbitrary-data side of WebRTC, SCTP-over-DTLS.
IETF, RFC 8834, Media Transport and Use of RTP in WebRTC, January 2021. https://datatracker.ietf.org/doc/html/rfc8834 — the RTP profile WebRTC uses.
IETF, RFC 8445, Interactive Connectivity Establishment (ICE), July 2018. https://datatracker.ietf.org/doc/html/rfc8445 — the candidate-gathering and connectivity-check algorithm.
IETF, RFC 8489, Session Traversal Utilities for NAT (STUN), February 2020. https://datatracker.ietf.org/doc/html/rfc8489 — how a peer learns its public address.
IETF, RFC 8656, Traversal Using Relays around NAT (TURN), February 2020. https://datatracker.ietf.org/doc/html/rfc8656 — the relay-of-last-resort.
IETF, RFC 5764, DTLS Extension to Establish Keys for SRTP, May 2010. https://datatracker.ietf.org/doc/html/rfc5764 — how WebRTC bootstraps SRTP keys from a DTLS handshake.
W3C, WebRTC: Real-Time Communication in Browsers, Recommendation, 6 March 2023. https://www.w3.org/TR/webrtc/ — the JavaScript API and the formal specification.
W3C, Media Capture and Streams, Recommendation, April 2025. https://www.w3.org/TR/mediacapture-streams/ — getUserMedia and MediaStream.
Mozilla MDN, RTCPeerConnection, accessed 2026-05-25. https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection — the developer-facing reference.
IETF Blog, WebRTC standardized: a major milestone, 26 January 2021. https://www.ietf.org/blog/webrtc-standardized/ — the standards announcement.
LiveKit, A tale of two protocols: comparing WebRTC against HLS for live streaming, 2024. https://blog.livekit.io/webrtc-vs-hls-livestreaming/ — production latency numbers from a major SFU operator.
webrtc.ventures, WebTransport Is Now Baseline. Here's What That Means for Real-Time Media, April 2026. https://webrtc.ventures/2026/04/webtransport-is-now-baseline-what-it-means-for-real-time-media/ — the Safari 26.4 milestone.
