Why This Matters
Every modern video conferencing product, telemedicine consultation, browser-based surveillance viewer, voice-AI agent, and live-shopping interaction depends on ICE working correctly. The user clicks "join", the browser opens a camera, the signalling exchanges an offer and an answer — and then nothing happens. The screen says "connecting…" forever. Almost every such failure traces back to a misconfigured ICE server list, a missing TURN candidate, or an iceconnectionstate transition the application code never handled. The cost of getting this wrong shows up two ways: customers on corporate Wi-Fi, university networks, and mobile carriers with carrier-grade NAT abandon the product because it does not connect, and the bandwidth bill for the TURN relays that do connect quietly hits six figures a year on a hyperscaler's egress meter. This article is for the product manager scoping a WebRTC build, the founder picking between Twilio and a self-hosted coturn cluster, the engineer reading RFC 8445 for the first time, and the architect who needs to know exactly how many TURN regions a global product needs in 2026. The transport-layer background — what NAT is, what STUN does at the byte level, the four classical NAT flavours — lives in our companion article on NAT, STUN, TURN, ICE for streaming; this article goes one level deeper on the WebRTC-specific signalling, the JavaScript API surface, and the production deployment patterns.
Where this article fits
The full picture has three layers and you need all three. The first is the network-theory layer — what NAT is, the four classical NAT flavours, the STUN binding request, the TURN allocate request. That is the companion article on NAT, STUN, TURN, ICE for streaming. The second is the signalling layer — how SDP offers and answers carry ICE candidates and credentials. That is the SDP offer/answer deep dive. The third — the one this article owns — is the WebRTC-specific behaviour: how RTCPeerConnection actually drives ICE, how the JavaScript API exposes the state machine, how Trickle ICE is implemented in practice, how ICE restart works under a network change, and the production configurations that ship in real 2026 products.
If you have read neither companion article, this article still stands alone. It defines every term on first use. The companion articles let you go deeper on the transport layer or the SDP layer if you need to.
What ICE actually is — beyond the textbook definition
The textbook definition is straight from the spec: "Interactive Connectivity Establishment is a technique for NAT traversal for UDP-based media streams established by the offer/answer model" — RFC 8445, July 2018. That sentence is correct but it leaves out the half of the story that matters in practice.
A more useful definition: ICE is a state machine plus a candidate-gathering pipeline plus a connectivity-check protocol, all running concurrently inside the WebRTC stack, that takes the question "can these two peers reach each other?" and answers it within a few hundred milliseconds even though the underlying network is unfit for the purpose. Each piece does one job, and the cleverness is in how they overlap in time.
The state machine, exposed in the browser as the iceConnectionState property of an RTCPeerConnection, moves through new, checking, connected, completed, disconnected, failed, and closed. The candidate-gathering pipeline runs in parallel inside the browser: for each network interface, the stack creates a host candidate, sends a STUN binding request to learn a server-reflexive candidate, and sends a TURN allocate request to obtain a relay candidate. Each candidate, as it is gathered, fires an icecandidate event into your JavaScript, which forwards it through your signalling channel to the other peer. The connectivity-check protocol (STUN binding requests sent peer-to-peer rather than client-to-server) runs as soon as the first pair of candidates is available on both sides, racing every possible route in parallel until one of them works.
The key insight is that ICE was designed around three constraints. First, no single strategy works for every network — corporate firewalls block UDP, mobile carriers use symmetric CGNAT, consumer routers have unpredictable behaviour, IoT devices have no public address at all — so the algorithm must try every plausible path. Second, connection latency matters more than any single optimisation — a call that connects in 1.5 seconds beats a call that connects in 3 seconds even if the slower one finds a better route — so the algorithm must overlap gathering, signalling, and checking. Third, network conditions change mid-call — a phone moves from Wi-Fi to cellular, a laptop switches VPNs, a hotel network drops UDP — so the algorithm must support seamless re-negotiation.
Once you internalise those three constraints, every detail in RFC 8445 stops feeling like an arbitrary design choice and starts feeling like the obvious answer to a hard problem.
Figure: the iceConnectionState state machine inside RTCPeerConnection. Each transition is driven by a specific event (STUN response, consent freshness timeout, ICE restart) and each state has a recommended application reaction.
The four candidate types — what each one really is
ICE collects four kinds of candidate. The taxonomy is older than WebRTC — it goes back to RFC 5245 in 2010 — but every WebRTC implementation in 2026 uses it unchanged.
A host candidate is the device's real network address on one of its interfaces. A laptop with Wi-Fi and Ethernet plugged in produces two host candidates. A phone with Wi-Fi and 5G produces two. The address looks like 192.168.1.42 (an IPv4 private address) or fe80::1a:b3 (an IPv6 link-local). Host candidates are gathered instantly — they require no network round trip — and they are tried first because if the host-to-host pair works, you get the cheapest possible path: zero hops over your local network.
A server-reflexive candidate, abbreviated srflx, is the public address a STUN server saw when the device's binding request arrived. The STUN exchange is a single UDP round trip — the device sends a 20-byte binding request to a STUN server, the server copies the source address-and-port it observed into a XOR-MAPPED-ADDRESS attribute, and replies. The address the device learns is the address the outside world sees, after every NAT in between has finished rewriting the packet. Server-reflexive candidates are gathered in 20–200 ms depending on the STUN server's RTT.
A peer-reflexive candidate, abbreviated prflx, is an address discovered during the ICE connectivity check itself, not during gathering. The mechanism: peer A sends a STUN binding request to peer B over a candidate pair. Peer A's NAT rewrites the request's source address, but the mapping it creates is one no STUN server saw, because the destination is the peer rather than the STUN server. When peer B's RTCPeerConnection receives the request, it notes the source address-and-port and, if that address matches no candidate it already knows, adds a new peer-reflexive candidate to its checklist. Peer-reflexive candidates are how ICE discovers the mappings created mid-call by symmetric NATs that allocate per-destination ports. You do not configure peer-reflexive candidates; they appear as a side effect of the connectivity check.
A relay candidate, abbreviated relay, is a TURN-allocated transport address on a TURN server. The device sends a TURN allocate request, the server reserves a public address-and-port for the device's exclusive use, and the device adds the reserved address as a relay candidate. Relay candidates are the last-resort fallback because every byte of media has to go through the TURN server in both directions; the latency is the round-trip to the TURN server doubled, and the bandwidth bill is paid by you.
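When debugging, it often helps to see which of the four types a given candidate is. The sketch below parses the standard SDP candidate line (the string carried in RTCIceCandidate.candidate) into its main fields. The function name parseCandidate and the sample line are illustrative assumptions; the addresses come from documentation ranges, not from a real capture.

```javascript
// Parse the fields of an SDP candidate line that matter for debugging.
// Grammar (RFC 8839): candidate:<foundation> <component> <transport>
//   <priority> <address> <port> typ <type> [raddr <addr> rport <port>] ...
function parseCandidate(line) {
  const parts = line.replace(/^a=/, '').replace(/^candidate:/, '').trim().split(/\s+/);
  const [foundation, component, transport, priority, address, port, typKeyword, type] = parts;
  if (typKeyword !== 'typ') {
    throw new Error('not a candidate line: ' + line);
  }
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // 'host' | 'srflx' | 'prflx' | 'relay'
  };
}

// Illustrative line only; not taken from a real session.
const example =
  'candidate:842163049 1 udp 1677729535 203.0.113.7 54321 typ srflx raddr 192.168.1.42 rport 54321';
const parsed = parseCandidate(example);
console.log(parsed.type, parsed.address, parsed.port);
```

Logging parsed.type for every icecandidate event is a quick way to confirm that relay candidates are actually being gathered.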
The four candidate types have different priorities in the ICE algorithm. RFC 8445 §5.1.2.2 specifies a type preference — an integer from 0 to 126 — that the implementation recommendation sets as: host 126, peer-reflexive 110, server-reflexive 100, relay 0. The relay's zero is deliberate: it ensures that relay-relay pairs sit at the very bottom of the priority list, used only when nothing else works. Inside each type, candidates are further ordered by local preference (interface index — wired Ethernet usually beats Wi-Fi beats cellular) and by component ID (RTP component 1 beats RTCP component 2 when they are not multiplexed).
The full priority formula in RFC 8445 §5.1.2.1 is:
priority = (2^24) × type_preference
+ (2^8) × local_preference
+ (2^0) × (256 - component_id)
A host candidate on a wired Ethernet with local preference 65535 and component 1 gets priority (2^24 × 126) + (2^8 × 65535) + 255 = 2_130_706_431. A relay candidate on the same interface with the same local preference and component gets priority (2^24 × 0) + (2^8 × 65535) + 255 = 16_777_215. The ratio is roughly 127:1, which is exactly what the algorithm wants: host candidates dominate by a huge margin and relay candidates compete only with each other.
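The arithmetic is easy to check in code. The sketch below is a direct transcription of the RFC 8445 formula; the function name candidatePriority and the TYPE_PREFERENCE table are illustrative names, and the local preference of 65535 is the same assumption used in the worked numbers.

```javascript
// Candidate priority per RFC 8445 §5.1.2.1.
// Plain multiplication is used instead of bit shifts because the result
// can exceed the signed 32-bit range that JavaScript shift operators work in.
function candidatePriority(typePreference, localPreference, componentId) {
  return 2 ** 24 * typePreference + 2 ** 8 * localPreference + (256 - componentId);
}

// Recommended type preferences quoted in the text.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

const hostPriority = candidatePriority(TYPE_PREFERENCE.host, 65535, 1);
const relayPriority = candidatePriority(TYPE_PREFERENCE.relay, 65535, 1);
console.log(hostPriority, relayPriority); // 2130706431 16777215
```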
How candidate gathering actually unfolds in time
The textbook description makes candidate gathering sound sequential — first host, then STUN, then TURN. In reality, the browser overlaps everything that can be overlapped, and the timing matters because the user is staring at a "connecting…" spinner.
The moment your JavaScript calls pc.createOffer() and then pc.setLocalDescription(offer), the browser starts gathering. For each network interface the operating system reports, the browser:
- Creates a host candidate immediately (no round trip needed) and fires an icecandidate event.
- Sends a STUN binding request to every STUN server in the iceServers configuration. Each response, when it arrives, becomes a server-reflexive candidate and fires another icecandidate event.
- Sends a TURN allocate request to every TURN server in iceServers. Each successful allocate becomes a relay candidate.
The application code listens for these events:
pc.addEventListener('icecandidate', (event) => {
  if (event.candidate) {
    // Send this single candidate to the other peer via your signalling channel.
    signaling.send({ type: 'ice-candidate', candidate: event.candidate });
  } else {
    // event.candidate === null marks the end of gathering.
    // Most modern code does not depend on this; Trickle ICE handles incremental candidates.
  }
});
On the other end, the receiving peer adds each candidate as it arrives:
signaling.on('ice-candidate', async ({ candidate }) => {
  try {
    await pc.addIceCandidate(candidate);
  } catch (err) {
    // Common cause: candidate arrived before setRemoteDescription was called.
    // Queue the candidate and replay after setRemoteDescription.
  }
});
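The catch block above mentions queueing candidates that arrive before the remote description is set. One way to do that, sketched here under the assumption that the caller invokes flush() right after setRemoteDescription resolves, is a small buffer in front of addIceCandidate. The helper name createCandidateQueue is hypothetical; it is not part of the WebRTC API.

```javascript
// Buffer remote candidates until the remote description has been applied.
// `pc` only needs `remoteDescription` and `addIceCandidate`, so an
// RTCPeerConnection works, as does a stub in tests.
function createCandidateQueue(pc) {
  const pending = [];
  return {
    // Call for every 'ice-candidate' signalling message.
    add(candidate) {
      if (pc.remoteDescription) {
        return pc.addIceCandidate(candidate);
      }
      pending.push(candidate);
      return Promise.resolve();
    },
    // Call right after `await pc.setRemoteDescription(...)` resolves.
    flush() {
      // splice(0) empties the buffer; candidates are handed over in arrival order.
      return Promise.all(pending.splice(0).map((c) => pc.addIceCandidate(c)));
    },
    // Number of candidates still waiting for the remote description.
    get size() {
      return pending.length;
    },
  };
}
```

With this in place the signalling handler calls queue.add(candidate) instead of pc.addIceCandidate(candidate), and the offer/answer handler calls queue.flush() once the remote description is applied.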
The interleaving in time looks like this. At time 0, the offerer applies its local description and starts gathering; the answerer starts as soon as it has applied its answer. By time ≈ 5 ms, host candidates are produced and sent over signalling. By time ≈ 50–200 ms, server-reflexive candidates arrive from STUN. By time ≈ 100–500 ms, relay candidates arrive from TURN. Connectivity checks start the moment the first pair is known (typically the first host candidate from each side), and the first valid pair is usually a host-host or host-srflx pair completing inside 300 ms. The "connecting…" spinner clears well before TURN allocates have even finished.
This overlap is the engineering point of Trickle ICE, specified in RFC 8838 (January 2021). Without trickling, the browser would wait for every candidate before sending any of them, adding 200–500 ms of unnecessary delay to every connection. With trickling, the first valid pair wins as soon as one exists, and the rest of the candidates are checked in the background in case they offer a better path.
The current state of Trickle ICE: every modern browser (Chrome 122+, Firefox 125+, Safari 17+) trickles by default. Every modern SFU stack (mediasoup, LiveKit, Janus, Pion, Jitsi) trickles by default. The non-trickle code paths exist mostly for SIP gateway interoperability with legacy enterprise PBX systems that pre-date the RFC. If you are writing a 2026 WebRTC product, you can assume Trickle ICE works and write the simpler "send each candidate as it arrives" code path; if you are integrating with a SIP gateway, check the gateway's documentation for a=ice-options:trickle support and have a fallback.
The connectivity check — STUN, this time peer-to-peer
A pair of candidates becomes "valid" when one side sends a STUN binding request to the other side and gets a matching binding response back from the expected address. The STUN message format is the same as the client-to-server case — a 20-byte header with the magic cookie 0x2112A442 and a transaction ID — but the request is sent over the candidate-pair's transport address, not to a STUN server.
The request carries three attributes that matter. USERNAME is constructed by concatenating the remote peer's ICE username fragment (ufrag) with the local peer's ufrag, separated by a colon. The ufrag and password are exchanged in the SDP offer/answer (a=ice-ufrag: and a=ice-pwd:) — see the SDP offer/answer deep dive for the full SDP context. MESSAGE-INTEGRITY is an HMAC-SHA1 over the entire message, signed with the remote peer's password. This is how each side proves it knows the secret shared in the SDP, defeating off-path injection attacks. PRIORITY is the priority of the local candidate sending the check.
The pacing is governed by the Ta timer, default 50 ms per RFC 8445 §14.2. Every 50 ms, the agent picks the highest-priority "frozen" or "waiting" pair from its checklist and sends a binding request. The first response that comes back marks the pair as "succeeded". If the controlling agent's pair completes with a higher priority than the currently selected pair, ICE replaces the selected pair and traffic flows over the better one. The controlling vs controlled designation is decided by the ICE-CONTROLLING and ICE-CONTROLLED STUN attributes; in WebRTC the offerer is controlling and the answerer is controlled, per RFC 9429 §3.5.5.
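To make the pacing concrete, the following sketch orders a checklist by priority and assigns each pair a send time one Ta apart. It is deliberately simplified: real agents also track frozen and waiting states, triggered checks, and retransmissions, none of which is modelled here. The pair ids and priority values are made up for illustration.

```javascript
// Simplified pacing: one check every Ta milliseconds, highest priority first.
function scheduleChecks(pairs, taMs = 50) {
  return [...pairs]
    .sort((a, b) => b.priority - a.priority)
    .map((pair, index) => ({ ...pair, sendAtMs: index * taMs }));
}

// Hypothetical checklist; ids and priorities are illustrative only.
const schedule = scheduleChecks([
  { id: 'relay-relay', priority: 10 },
  { id: 'host-host', priority: 900 },
  { id: 'host-srflx', priority: 700 },
]);
console.log(schedule.map((p) => `${p.id}@${p.sendAtMs}ms`).join(', '));
// host-host@0ms, host-srflx@50ms, relay-relay@100ms
```

The practical consequence is that a long checklist delays the low-priority relay pairs: with 40 pairs at 50 ms each, the last check is not even sent until about two seconds in.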
The check is bidirectional. A pair only becomes the selected pair when both sides have succeeded on it. This four-way handshake matters because either side's NAT could be the asymmetric one that prevented the reverse direction.
When the first valid pair is found and can carry media, iceConnectionState transitions from checking to connected. When all pairs in the checklist have completed and no further checks are pending, the state transitions to completed. Most applications treat connected and completed identically; the two differ only for the controlling agent, which may keep checking for better pairs while in connected.
TURN in WebRTC — credentials, lifetime, and the channels you never wanted to know about
The TURN protocol in WebRTC has three operational details that catch every team off guard the first time.
Credentials are short-lived and HMAC-signed. The "long-term credential" mechanism in RFC 8489 (username plus password as static strings) is acceptable in theory but terrible in practice, because a leaked credential lets anyone abuse your TURN cluster as free egress. The WebRTC industry standard, adopted by every major TURN server, is the time-bounded credential pattern: the application server hands the client a username constructed as expiry-timestamp:user-id and a password computed as base64(HMAC-SHA1(static-secret, username)). The TURN server holds the same static-secret, recomputes the HMAC over whatever username the client presents, and accepts the client if the HMAC matches. The username embeds an expiry timestamp; the TURN server rejects requests with timestamps in the past. A leaked credential expires in minutes.
The client code that talks to your credential-issuing endpoint looks like:
async function fetchIceServers() {
  const response = await fetch('/api/turn-credentials');
  const { iceServers } = await response.json();
  return iceServers;
}
const pc = new RTCPeerConnection({
  iceServers: await fetchIceServers(),
  iceTransportPolicy: 'all', // or 'relay'; see below
  bundlePolicy: 'max-bundle',
  rtcpMuxPolicy: 'require'
});
The server-side credential mint (Node.js, a few lines):
import crypto from 'crypto';
function mintTurnCredentials(userId, ttlSeconds = 600) {
  // Username format: <unix-expiry>:<user-id>
  const username = `${Math.floor(Date.now() / 1000) + ttlSeconds}:${userId}`;
  const hmac = crypto.createHmac('sha1', process.env.TURN_STATIC_SECRET);
  hmac.update(username);
  const password = hmac.digest('base64');
  return { username, password, ttl: ttlSeconds };
}
A TTL of 600 seconds is the most common default; some teams use 24 hours for long meetings, but the longer the TTL, the bigger the abuse window if a credential leaks. The credential authorises an allocate; once allocated, the TURN server refreshes the allocation as the client sends refresh requests, so the credential's expiry does not interrupt an in-progress call.
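The expiry half of the server-side check can be sketched as follows. This is only an illustration of the timestamp logic for the username format minted above; a real TURN server also recomputes and compares the HMAC, which is omitted here. The function name isTurnUsernameValid is hypothetical.

```javascript
// Expiry check for a "<unix-expiry>:<user-id>" username.
// The HMAC comparison a real TURN server performs is intentionally left out.
function isTurnUsernameValid(username, nowSeconds = Math.floor(Date.now() / 1000)) {
  const separator = username.indexOf(':');
  if (separator <= 0) return false; // no timestamp prefix
  const expiry = Number(username.slice(0, separator));
  if (!Number.isInteger(expiry)) return false; // malformed timestamp
  return expiry > nowSeconds;
}

console.log(isTurnUsernameValid('1700000600:alice', 1700000000)); // true
console.log(isTurnUsernameValid('1700000600:alice', 1700000601)); // false
```

Passing the current time as a parameter keeps the check deterministic in tests; in production the default argument supplies the wall clock.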
The allocation lifetime is 600 seconds by default and the client must refresh it. When the TURN server replies to an Allocate request, the response includes a LIFETIME attribute, default 600 seconds per RFC 8656 §7.2. The client sends a Refresh request every LIFETIME / 2 seconds to keep the allocation alive. Browsers handle this automatically; you do not write any code for it. The reason to know about it: if your TURN server is overloaded and drops refresh requests, the allocation expires and the call drops silently. The TURN server's allocation count and refresh-request error rate are the first metrics to add to your dashboard.
Channels and permissions are an optimisation you usually do not need to think about. RFC 8656 specifies two ways the client can send media through a TURN server. The first is the Send indication, which wraps every media packet in a STUN message with the peer's address embedded — adds 36 bytes of overhead per packet. The second is the channel, which the client binds once per peer (ChannelBind) and then uses a tiny 4-byte ChannelData header for every subsequent packet. Channels save 32 bytes per packet — at 50 packets per second per stream, that is 1.6 KB per second per peer per direction, multiplied across thousands of relayed calls. Every modern WebRTC TURN client uses channels by default. The browser handles binding for you. The reason to know about it: when you watch a packet capture and see ChannelData traffic mixed with STUN messages, that is normal.
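The savings figure follows directly from the two header sizes. A small sketch, using the 36-byte and 4-byte overheads quoted above; the function name and the 10,000-stream example are illustrative assumptions.

```javascript
// Per-packet framing overhead in bytes, using the figures quoted in the text.
const SEND_INDICATION_OVERHEAD = 36; // STUN header + peer address + DATA attribute header
const CHANNEL_DATA_OVERHEAD = 4; // channel number + length

function channelSavingsBytesPerSecond(packetsPerSecond, relayedStreams = 1) {
  const savedPerPacket = SEND_INDICATION_OVERHEAD - CHANNEL_DATA_OVERHEAD;
  return savedPerPacket * packetsPerSecond * relayedStreams;
}

console.log(channelSavingsBytesPerSecond(50)); // 1600
console.log(channelSavingsBytesPerSecond(50, 10000)); // 16000000
```

At 10,000 relayed streams the difference is 16 MB per second per direction, which is why the choice is made for you by the client rather than left as an option.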
ICE servers in RTCPeerConnection — the configuration that matters
The iceServers configuration object is the single most important configuration in any WebRTC product. Get it wrong and connections fail silently on the networks that matter most.
A minimal correct configuration in 2026:
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: [
        'turn:turn.example.com:3478?transport=udp',
        'turn:turn.example.com:3478?transport=tcp',
        'turns:turn.example.com:443?transport=tcp'
      ],
      username: shortLivedUsername,
      credential: shortLivedPassword
    }
  ],
  iceTransportPolicy: 'all',
  bundlePolicy: 'max-bundle',
  rtcpMuxPolicy: 'require'
});
Each of these settings has a story. The STUN server alone is never enough — STUN-only configurations are the single most common reason a demo works locally and then fails when the first user joins from a mobile carrier. Always include TURN. Always expose TURN on UDP/3478, TCP/3478, and TLS/443. The UDP path is the cheapest. The TCP path covers networks that block UDP. The TLS-on-443 path covers networks that block everything except HTTPS — most corporate firewalls, university networks, and aggressive captive portals fall into this category. The browser tries all three in priority order and picks the cheapest one that works.
The iceTransportPolicy: 'all' setting tells the browser to gather every candidate type. The alternative, 'relay', restricts gathering to relay candidates only: the browser does not even attempt host or server-reflexive paths. Use 'relay' for privacy-sensitive products that must hide users' real IP addresses from each other (an anonymous chat, a clinical telemedicine product with HIPAA requirements that need to log every connection through a single point). Use 'all' everywhere else, because forcing every call through TURN means paying for a 100 %-relayed deployment, roughly five to seven times the relay bandwidth of the 15–20 % baseline.
The bundlePolicy: 'max-bundle' setting tells the browser to multiplex every track (audio, video, screen-share, data) onto a single ICE-level transport. Without it, the browser may gather candidates and run connectivity checks separately for each media section, which multiplies the gathering work and slows connection setup. With it, one transport carries everything via the BUNDLE extension specified in RFC 9143. Always use max-bundle.
The rtcpMuxPolicy: 'require' setting tells the browser to multiplex RTP and RTCP onto the same port. Without it, ICE must establish two transports per track — one for media, one for control. With it, one transport carries both. Always require it.
The iceServers array can have any number of entries. Multiple STUN servers add redundancy if one server is down. Multiple TURN regions improve latency for a geographically distributed user base: a TURN server in Frankfurt should not relay a call between two New York phones if a New York TURN server is available. The browser gathers candidates from every server in parallel. The priority formula does not take RTT into account, so do not rely on it to prefer the nearest region; a common approach is to have the credential endpoint return only the TURN URLs closest to the client, or to put the regions behind GeoDNS or anycast.
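A credential endpoint typically assembles this array on the server. The sketch below builds one entry per TURN host with the three transports recommended earlier; the hostnames, credentials, and the helper name buildIceServers are placeholders, not values from any real deployment.

```javascript
// Build an iceServers array from a list of STUN URLs and TURN hostnames.
// Each TURN host is exposed on UDP/3478, TCP/3478 and TLS/443.
function buildIceServers({ stunUrls, turnHosts, username, credential }) {
  const servers = stunUrls.map((urls) => ({ urls }));
  for (const host of turnHosts) {
    servers.push({
      urls: [
        `turn:${host}:3478?transport=udp`,
        `turn:${host}:3478?transport=tcp`,
        `turns:${host}:443?transport=tcp`,
      ],
      username,
      credential,
    });
  }
  return servers;
}

// Placeholder hosts and credentials for illustration.
const iceServers = buildIceServers({
  stunUrls: ['stun:stun.l.google.com:19302'],
  turnHosts: ['turn-nyc.example.com', 'turn-fra.example.com'],
  username: 'placeholder-username',
  credential: 'placeholder-password',
});
console.log(iceServers.length); // 3
```

Keeping the list short matters: every extra server adds candidates, and every candidate adds pairs to the checklist.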
ICE restart — what happens when the network changes mid-call
A user joins a video call on home Wi-Fi. Halfway through, they walk out of range and the phone hands the connection over to 5G. The phone's IP address changes. The NAT mapping it had on Wi-Fi is dead. Every existing ICE candidate-pair is invalid. Without ICE restart, the call drops; with ICE restart, the renegotiation is silent and the user does not notice.
ICE restart can be initiated by either peer. The peer that wants to restart generates a new ufrag and password and issues a fresh SDP offer with a=ice-ufrag: and a=ice-pwd: lines that differ from the previous offer. The other peer sees the new credentials in setRemoteDescription, gathers a fresh set of candidates, and runs a new connectivity check. The currently selected pair keeps carrying media until the new pair is valid, then the media switches over.
In application code, ICE restart is triggered by passing { iceRestart: true } to createOffer:
async function restartIce() {
  const offer = await pc.createOffer({ iceRestart: true });
  await pc.setLocalDescription(offer);
  signaling.send({ type: 'offer', sdp: pc.localDescription });
  // The other peer replies with an answer; gathering resumes; new pair selected.
}
When should application code trigger ICE restart? Three common cases:
- Network change event. Browsers fire online and offline events on window when connectivity changes and, where the Network Information API is implemented, a change event on navigator.connection. Application code can listen for these and call restartIce() proactively.
- ICE state goes to disconnected. iceConnectionState transitions to disconnected when consent freshness probes start failing; if it stays disconnected for more than 5 seconds, the application restarts ICE.
- ICE state goes to failed. The state goes to failed when every pair in the checklist has failed. At this point ICE is dead and restart is mandatory; without it, the call is gone.
Ordinary renegotiation does not restart ICE. Per JSEP, an offer created without iceRestart (for example, to add a track) reuses the existing ufrag and password, so the selected pair keeps carrying media undisturbed. ICE restarts only when the application asks for it, either through createOffer({ iceRestart: true }) or through restartIce().
The convenience method RTCPeerConnection.restartIce() was added in the W3C WebRTC Recommendation (March 2023) and is supported by every modern browser. It does not need an explicit createOffer({ iceRestart: true }) call; calling restartIce() immediately fires negotiationneeded and the application's existing renegotiation flow handles the rest.
Consent freshness — keeping the path alive
A call connects, the user starts talking, the media flows. Twelve minutes in, the corporate firewall the user sits behind decides to clean up a NAT mapping it has not seen activity on for 30 seconds and silently drops the entry. The next media packet from the remote peer is silently dropped. The user hears nothing. The remote peer's screen freezes. Without a mechanism to detect this, the application has no idea anything is wrong.
The mechanism is consent freshness, specified in RFC 7675 (October 2015). Roughly every 5 seconds (randomised between 4 and 6 seconds), each side sends a STUN binding request over the currently selected pair. Each valid response renews consent for 30 seconds. If no valid response arrives for 30 seconds, consent has expired and the agent must stop sending media on that pair. Browsers typically surface trouble earlier than that: iceConnectionState transitions to disconnected, and the application's restart logic takes over.
Consent freshness is also the mechanism that detects a peer hanging up without calling close(). If peer A's network cable is yanked, peer B's consent freshness probe to A fails, B notices A is gone, and B can clean up its session.
Application code should listen for the iceconnectionstatechange event and reason about the transitions:
pc.addEventListener('iceconnectionstatechange', () => {
  switch (pc.iceConnectionState) {
    case 'disconnected':
      // Consent freshness failing. Could be a transient blip; wait a few seconds.
      scheduleIceRestartAfter(5000);
      break;
    case 'failed':
      // Every pair has failed. ICE restart is mandatory now.
      restartIce();
      break;
    case 'connected':
    case 'completed':
      // We're back. Cancel any pending restart timer.
      cancelScheduledIceRestart();
      break;
  }
});
A common pitfall: applications that treat disconnected as a fatal state and immediately tear down the connection. The state is recoverable in roughly 80 % of cases — a transient packet loss, a brief wireless handover, a 2-second cellular dropout — and the right reaction is to wait 5 seconds before restarting ICE and 10 seconds before declaring the call lost.
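The handler above calls scheduleIceRestartAfter and cancelScheduledIceRestart without defining them. A minimal sketch of one possible implementation follows, assuming a single pending timer is enough; the factory name createRestartScheduler is hypothetical, and the restart function is passed in so the logic can be exercised without a browser.

```javascript
// One pending restart at a time. `restartIce` is whatever function performs
// the restart; it is injected so the scheduler has no WebRTC dependency.
function createRestartScheduler(restartIce) {
  let timer = null;
  return {
    scheduleIceRestartAfter(delayMs) {
      if (timer !== null) return; // a restart is already pending
      timer = setTimeout(() => {
        timer = null;
        restartIce();
      }, delayMs);
    },
    cancelScheduledIceRestart() {
      if (timer !== null) {
        clearTimeout(timer);
        timer = null;
      }
    },
    // True while a restart is scheduled but has not fired yet.
    get pending() {
      return timer !== null;
    },
  };
}
```

Usage would look like const { scheduleIceRestartAfter, cancelScheduledIceRestart } = createRestartScheduler(restartIce); the guard against double scheduling matters because disconnected can fire repeatedly during a flaky handover.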
The signalling channel — what your application has to carry
ICE does not specify a signalling protocol. RFC 8445 leaves the question of how candidates reach the other peer entirely to the application. WebRTC products use whatever they already have — a WebSocket to a Node.js backend, a gRPC stream, an HTTP long-poll endpoint, a custom protocol on top of MQTT. The transport does not matter; what matters is that every ICE candidate and every offer/answer reaches the other side reliably and in order.
The minimal protocol carries five message types:
// 1. Offer — initial SDP from the controlling peer
{ type: 'offer', from: 'alice', to: 'bob', sdp: '...' }
// 2. Answer — response SDP from the controlled peer
{ type: 'answer', from: 'bob', to: 'alice', sdp: '...' }
// 3. ICE candidate — one of many, trickled as they're gathered
{ type: 'ice-candidate', from: 'alice', to: 'bob', candidate: '...' }
// 4. End-of-candidates — signals gathering completion
{ type: 'end-of-candidates', from: 'alice', to: 'bob' }
// 5. Hangup — peer-initiated session teardown
{ type: 'hangup', from: 'alice', to: 'bob' }
The signalling server's only job is to forward messages between peers; it does not interpret SDP or ICE candidates. In practice, every signalling server also handles authentication, room membership, presence, and the application's own messaging — chat, reactions, hand-raises — but those are application concerns, not WebRTC concerns.
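The forwarding job can be sketched in a few lines. The example assumes a Map from peer id to any object with a send(string) method (a WebSocket, for instance) and the five message types listed above; the names forwardSignal and FORWARDED_TYPES are illustrative, and authentication and room membership are deliberately left out.

```javascript
// Forward a signalling message to its addressee without inspecting the payload.
// Returns false when the message cannot be delivered.
const FORWARDED_TYPES = new Set([
  'offer',
  'answer',
  'ice-candidate',
  'end-of-candidates',
  'hangup',
]);

function forwardSignal(peers, rawMessage) {
  let message;
  try {
    message = JSON.parse(rawMessage);
  } catch {
    return false; // not JSON
  }
  if (!message || !FORWARDED_TYPES.has(message.type)) return false;
  const recipient = peers.get(message.to);
  if (!recipient) return false; // addressee not connected
  recipient.send(rawMessage); // SDP and candidates pass through untouched
  return true;
}
```

Because the server never parses the sdp or candidate fields, the same forwarder keeps working when browsers add new SDP attributes.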
For the WHIP/WHEP standardised case — where one side is a media server (e.g., a live encoder pushing to an ingest server, or a viewer pulling from an SFU) — the signalling protocol is HTTP-based and standardised in RFC 9725 (WHIP, March 2025) and the WHEP draft. See our WHIP deep dive for the WHIP-specific signalling pattern, including the TURN-credential exchange that happens over a single HTTP POST.
A worked example: TURN sizing for a 5,000-meeting conferencing product
A team is scoping infrastructure for a B2B conferencing product expected to host 5,000 simultaneous meetings at peak, with 4 participants average per meeting, 1.5 Mbps video plus 50 kbps audio per participant per direction. The relay rate varies by the user base. Consumer products see 15–20 %. B2B products with heavy corporate-firewall traffic see 30–40 %. We'll use 35 % for this exercise.
The arithmetic, step by step:
Step 1: total participants and per-participant traffic.
5,000 meetings × 4 participants = 20,000 participants.
Per-participant uplink: 1.55 Mbps.
Per-participant downlink: 3 × 1.55 Mbps = 4.65 Mbps (each participant receives 3 streams in a 4-person SFU call).
Step 2: participants whose media flows through TURN.
20,000 × 35 % = 7,000 participants relayed.
Step 3: TURN bandwidth at the relay layer.
For each relayed participant, the TURN server sees full uplink + downlink traffic.
Uplink at TURN: 7,000 × 1.55 Mbps = 10.85 Gbps.
Downlink at TURN: 7,000 × 4.65 Mbps = 32.55 Gbps.
Total at TURN: 43.4 Gbps sustained at peak.
Step 4: monthly egress.
43.4 Gbps × 86,400 s/day × 30 days × 12.5 % average-to-busy-hour ratio ≈ 14,060,000 Gb. Dividing by 8 bits per byte gives ≈ 1,760,000 GB, or roughly 1,760 TB/month (1.76 PB/month).
(The 12.5 % factor approximates the ratio of 24×7 average throughput to busy-hour throughput for a business-hours-skewed B2B product. Consumer products with evening peaks use 20 %. Always-on industrial products use 100 %.)
Step 5: cost on a hyperscaler.
AWS EC2 egress in 2026, blended rate at this scale: ~$0.04/GB after the volume discounts kick in. 1,760 TB × $0.04/GB ≈ $70,000/month.
Step 6: cost on bare-metal colocation with peering.
Bare-metal transit at 95th percentile, ~$0.50–$1.00/Mbps/month. 43.4 Gbps peak ≈ $20,000–$40,000/month for transit, plus rack space and hardware amortisation.
Step 7: cost on a managed TURN provider.
Cloudflare Realtime TURN at $0.05/GB (April 2026): 1,760 TB × $0.05/GB = $88,000/month, close to hyperscaler pricing but with anycast routing across 330+ cities. Twilio's bandwidth pricing in 2026 lands in a similar range with volume contracts.
At this traffic shape, hyperscaler egress costs roughly two to three times bare-metal transit. The gap widens as traffic flattens, because transit is billed on the 95th-percentile rate while egress is billed on volume: at a 100 % average-to-peak ratio the same 43.4 Gbps is about 14 PB/month, or roughly $560,000 on the hyperscaler against the same $20,000–$40,000 of transit, a gap of roughly 14–28×. That gap is why every WebRTC product at meaningful scale eventually moves TURN off the hyperscaler. The path most teams take: prototype on a single coturn instance on AWS EC2 → grow to a multi-region coturn cluster → benchmark the egress bill → move to a managed provider (Cloudflare, Twilio, Daily.co, Subspace) until the in-house operational maturity is high enough → eventually run a bare-metal cluster in two or three colocation facilities with private interconnects to the regional carriers.
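The sizing steps can be reproduced with a short script, which makes it easy to rerun the exercise with a different relay rate or meeting size. This is a sketch using the inputs from the worked example; the function name turnSizing and its parameter names are illustrative. Note the division by 8 when converting gigabits to gigabytes.

```javascript
// TURN sizing with the inputs from the worked example.
function turnSizing({ meetings, participantsPerMeeting, uplinkMbps, relayRate, avgToPeakRatio }) {
  const participants = meetings * participantsPerMeeting;
  // Each participant receives one stream from every other participant.
  const downlinkMbps = (participantsPerMeeting - 1) * uplinkMbps;
  const relayed = participants * relayRate;
  const peakGbps = (relayed * (uplinkMbps + downlinkMbps)) / 1000;
  const secondsPerMonth = 86400 * 30;
  // Gbps -> gigabits per month -> gigabytes (divide by 8) -> terabytes.
  const monthlyTB = (peakGbps * secondsPerMonth * avgToPeakRatio) / 8 / 1000;
  return { participants, relayed, peakGbps, monthlyTB };
}

const sizing = turnSizing({
  meetings: 5000,
  participantsPerMeeting: 4,
  uplinkMbps: 1.55, // 1.5 Mbps video + 50 kbps audio
  relayRate: 0.35,
  avgToPeakRatio: 0.125,
});
console.log(sizing.peakGbps.toFixed(1), 'Gbps peak'); // about 43.4
console.log(Math.round(sizing.monthlyTB), 'TB/month'); // about 1758
```

Multiplying monthlyTB by 1,000 and by a per-GB price gives the monthly egress bill for a volume-billed provider; transit billed per Mbps depends only on peakGbps.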
For an AI voice agent product where 100 % of traffic goes through TURN (because one end is a server with no public path), the relayed share is roughly 3× the 35 % used above, so the same arithmetic gives roughly 3× the relayed bandwidth per participant and roughly 3× the cost. This is the deployment shape that turned TURN bandwidth into an existential cost line item for AI voice products in 2025–2026.
Picking a TURN server in 2026 — the four production patterns
Every WebRTC team eventually faces the same TURN deployment decision: managed, self-hosted, hybrid, or cloud-native. The four patterns have different cost curves, different operational profiles, and different fit with team maturity.
Pattern 1: managed TURN-as-a-service. Twilio Network Traversal Service, Cloudflare Realtime TURN, Xirsys, Subspace, ExpressTURN. You pay per gigabyte relayed; the provider runs the cluster. The 2026 price floor is $0.05/GB (Cloudflare). Best for: products under 10K MAU, teams without dedicated infrastructure engineers, products that need global anycast coverage from day one. Worst for: products approaching 1 PB/month of relayed traffic, where the marginal cost of relay starts dominating the gross margin.
Pattern 2: self-hosted coturn or eturnal. The traditional default. coturn is the most-deployed open-source TURN server in 2026 — present in every major SFU stack including LiveKit, mediasoup, Janus, and most enterprise videoconferencing products. coturn v4.8.0 (January 2026) includes hardened DDoS handling, configurable socket buffer sizes, the CVE-2025-69217 patch, and faster packet validation. eturnal, developed by ProcessOne, is the actively-maintained alternative with a cleaner config file and a more responsive maintainer team. Both run on commodity Linux. Best for: teams with at least one infrastructure engineer comfortable with iptables, Linux network stack tuning, and PagerDuty rotations. Worst for: teams that want zero on-call burden.
Pattern 3: cloud-native Pion TURN on Kubernetes. Pion is a Go library for building TURN clients and servers; STUNner is a Pion-based, Kubernetes-native deployment of a TURN cluster behind a load balancer. Best for: teams that already run Kubernetes for everything else and want TURN to fit the same operational model. The Go runtime gives a lower memory footprint than coturn under high allocation counts, and the Kubernetes-native deployment provides autoscaling, which coturn does not offer natively.
Pattern 4: hybrid (managed for fallback, self-hosted for primary). A growing pattern in 2026: deploy a self-hosted coturn or Pion cluster as the primary TURN, and configure the iceServers list to include a managed-provider TURN URL as the third or fourth entry. The browser gathers relay candidates from every server in the list, so if the primary cluster or a whole region is unreachable, the call still connects through the managed provider. The cost: you pay the managed provider only for the relayed traffic that landed on it, typically less than 1 % of total. The benefit: you do not page on-call when a single region's TURN fails.
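The hybrid list can be sketched as a small builder. This is a minimal illustration, not a definitive configuration: the hostnames `turn.example.com` and `managed-turn.example.net`, the `IceServerEntry` and `TurnCredentials` types, and the function name are all assumptions made for this example, and the credentials are expected to come from your own credential-issuing endpoint.

```typescript
// Local structural type mirroring the browser's RTCIceServer dictionary,
// declared here so the sketch also type-checks outside a browser.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface TurnCredentials {
  username: string;
  credential: string;
}

// Hypothetical hostnames: substitute your own cluster and managed provider.
const PRIMARY_TURN_HOST = "turn.example.com";
const MANAGED_TURN_HOST = "managed-turn.example.net";

function buildHybridIceServers(
  primary: TurnCredentials,
  managed: TurnCredentials
): IceServerEntry[] {
  return [
    // Entry 1: STUN for server-reflexive candidates.
    { urls: `stun:${PRIMARY_TURN_HOST}:3478` },
    // Entry 2: self-hosted primary relay over UDP, TCP, and TLS on 443.
    {
      urls: [
        `turn:${PRIMARY_TURN_HOST}:3478?transport=udp`,
        `turn:${PRIMARY_TURN_HOST}:3478?transport=tcp`,
        `turns:${PRIMARY_TURN_HOST}:443?transport=tcp`,
      ],
      username: primary.username,
      credential: primary.credential,
    },
    // Entry 3: managed provider, used only when the primary is unreachable.
    {
      urls: [`turns:${MANAGED_TURN_HOST}:443?transport=tcp`],
      username: managed.username,
      credential: managed.credential,
    },
  ];
}
```

In a browser the result would be passed as `new RTCPeerConnection({ iceServers: buildHybridIceServers(primaryCreds, managedCreds) })`.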
The decision frame for picking among the four:
| Criterion | Managed | Self-hosted coturn | Pion/STUNner on K8s | Hybrid |
|---|---|---|---|---|
| Time to first call | 1 day | 1 week | 1–2 weeks | 1–2 weeks |
| Cost at 100 TB/month | $$$$ | $ | $ | $ + small $$ |
| Cost at 10 PB/month | $$$$$ | $$ | $$ | $$ + small $$ |
| On-call burden | None | High | Medium | Low |
| Global anycast | Built in | DIY | DIY | Built in |
| Best fit | < 10K MAU | Mature infra team | Kubernetes-native | Belt-and-braces |
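The cost rows of the table can be turned into a rough calculation. The sketch below assumes decimal units (1 TB = 1,000 GB) and the $0.05/GB managed price floor quoted above; the function names are hypothetical, and real invoices will include tiering, free allowances, and minimums that this ignores.

```typescript
const GB_PER_TB = 1000; // decimal units, an assumption of this sketch

// Relayed volume = total media volume x fraction of traffic that goes via TURN.
function relayedTbPerMonth(totalMediaTb: number, relayFraction: number): number {
  if (relayFraction < 0 || relayFraction > 1) {
    throw new RangeError("relayFraction must be between 0 and 1");
  }
  return totalMediaTb * relayFraction;
}

// Monthly managed-TURN bill for a given relayed volume and per-GB price.
function managedRelayCostUsd(relayedTb: number, usdPerGb: number): number {
  return relayedTb * GB_PER_TB * usdPerGb;
}

const MANAGED_FLOOR_USD_PER_GB = 0.05;

// The two volume rows of the decision table, priced at the managed floor.
console.log(managedRelayCostUsd(100, MANAGED_FLOOR_USD_PER_GB));   // 100 TB/month
console.log(managedRelayCostUsd(10000, MANAGED_FLOOR_USD_PER_GB)); // 10 PB/month
```

At the quoted floor this works out to about $5,000 per month for 100 TB and about $500,000 per month for 10 PB, which is the gap the table's dollar signs summarise. Setting `relayFraction` to 1 models the 100 % relay case for an AI voice agent.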
Eight production pitfalls and how to avoid them
Pitfall 1: STUN-only ICE servers. The single most common WebRTC misconfiguration. The product works on the development team's WiFi, fails silently on every mobile carrier with symmetric CGNAT. Always include at least one TURN server in iceServers. STUN alone is a demo, not a product.
Pitfall 2: TURN over UDP only. Corporate firewalls, hotel networks, and aggressive captive portals often block UDP entirely. A TURN URL of turn:host:3478?transport=udp is unreachable from all of these networks. Always expose turn:host:3478?transport=tcp and turns:host:443?transport=tcp (TLS-over-TCP on 443) alongside the UDP entry. Browsers generally prefer the UDP relay where it works and fall back to TCP or TLS where it does not.
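Pitfalls 1 and 2 can be caught before a build ships with a small check over the iceServers list. The sketch below is illustrative only: `auditIceServers`, `IceServerEntry`, and `IceAudit` are names invented for this example, and the check relies on the TURN URI defaults (a `turn:` URL without a transport parameter means UDP, a `turns:` URL means TLS over TCP).

```typescript
// Local structural type mirroring the browser's RTCIceServer dictionary.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface IceAudit {
  hasTurnUdp: boolean;
  hasTurnTcp: boolean;
  hasTurnsTls: boolean;
  problems: string[];
}

function auditIceServers(servers: IceServerEntry[]): IceAudit {
  const audit: IceAudit = {
    hasTurnUdp: false,
    hasTurnTcp: false,
    hasTurnsTls: false,
    problems: [],
  };
  for (const server of servers) {
    const urls = Array.isArray(server.urls) ? server.urls : [server.urls];
    for (const raw of urls) {
      const url = raw.toLowerCase();
      const overTcp = url.indexOf("transport=tcp") !== -1;
      const overUdp = url.indexOf("transport=udp") !== -1;
      if (url.indexOf("turns:") === 0) {
        // turns: means TLS over TCP unless transport=udp is requested.
        if (!overUdp) audit.hasTurnsTls = true;
      } else if (url.indexOf("turn:") === 0) {
        // turn: defaults to UDP when no transport parameter is given.
        if (overTcp) audit.hasTurnTcp = true;
        else audit.hasTurnUdp = true;
      }
    }
  }
  if (!audit.hasTurnUdp && !audit.hasTurnTcp && !audit.hasTurnsTls) {
    audit.problems.push("no TURN server configured (pitfall 1)");
  } else {
    if (!audit.hasTurnTcp) audit.problems.push("no turn: URL with transport=tcp (pitfall 2)");
    if (!audit.hasTurnsTls) audit.problems.push("no turns: URL for TLS on 443 (pitfall 2)");
  }
  return audit;
}
```

Running this in a unit test or at application start-up and failing on a non-empty `problems` array is one way to stop a STUN-only or UDP-only list from reaching production.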
Pitfall 3: Long-lived TURN credentials in client JavaScript. A handful of tutorials still hardcode username and credential in browser code. Anyone with developer tools can extract them and use the TURN cluster as free egress. Mint short-lived HMAC-SHA1 credentials server-side, return them through an authenticated API endpoint, refresh them as the call extends past their TTL.
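A minimal server-side sketch of the short-lived credential scheme follows. It assumes the TURN server validates time-limited credentials against a shared secret (for coturn, the use-auth-secret mode with static-auth-secret), where the username is an expiry timestamp plus a user identifier and the password is the base64 HMAC-SHA1 of that username. The function name, the `userId` format, and the TTL value are assumptions for illustration.

```typescript
import { createHmac } from "node:crypto";

interface MintedTurnCredentials {
  username: string;   // "<unix expiry>:<user id>"
  credential: string; // base64(HMAC-SHA1(sharedSecret, username))
  ttlSeconds: number;
}

// sharedSecret must match the secret configured on the TURN server.
// nowMs is injectable so the function can be tested deterministically.
function mintTurnCredentials(
  sharedSecret: string,
  userId: string,
  ttlSeconds: number,
  nowMs: number = Date.now()
): MintedTurnCredentials {
  const expiresAt = Math.floor(nowMs / 1000) + ttlSeconds;
  const username = `${expiresAt}:${userId}`;
  const credential = createHmac("sha1", sharedSecret)
    .update(username)
    .digest("base64");
  return { username, credential, ttlSeconds };
}
```

The authenticated API endpoint would return this object to the client, which places `username` and `credential` on its TURN entries and requests a fresh pair before the TTL runs out.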
Pitfall 4: Ignoring iceconnectionstatechange. Application code that never listens for disconnected or failed cannot recover from a network blip. The user walks out of WiFi range, the call dies, the app sits forever in a "connected" UI while no media flows. Always handle the state transitions.
Pitfall 5: Restarting ICE on every transient disconnect. The mirror image of pitfall 4. Application code that calls restartIce() the instant iceConnectionState goes to disconnected causes a renegotiation storm on every wireless hiccup. Wait 5 seconds before restarting; about 80 % of disconnects resolve on their own.
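Pitfalls 4 and 5 together suggest a handler that reacts to every state change but waits out a grace period before restarting. The sketch below is one possible shape, not a definitive implementation: `IcePeerLike` is a minimal structural stand-in for RTCPeerConnection, `TimerApi` and `watchIceWithGrace` are hypothetical names, and the 5-second default is the figure from the text above.

```typescript
// Minimal structural view of RTCPeerConnection, declared locally so the
// sketch can be exercised with a fake object outside a browser.
interface IcePeerLike {
  iceConnectionState: string;
  restartIce(): void;
  addEventListener(type: "iceconnectionstatechange", listener: () => void): void;
}

// Injectable timers so the grace period can be tested without waiting.
interface TimerApi {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const defaultTimers: TimerApi = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function watchIceWithGrace(
  pc: IcePeerLike,
  graceMs: number = 5000,
  timers: TimerApi = defaultTimers
): void {
  let pending: unknown = null;
  const cancelPending = () => {
    if (pending !== null) {
      timers.clear(pending);
      pending = null;
    }
  };
  pc.addEventListener("iceconnectionstatechange", () => {
    const state = pc.iceConnectionState;
    if (state === "disconnected") {
      // Transient blips often heal on their own: wait before restarting.
      if (pending === null) {
        pending = timers.set(() => {
          pending = null;
          if (pc.iceConnectionState === "disconnected") pc.restartIce();
        }, graceMs);
      }
    } else if (state === "failed") {
      // A failed connection does not recover by itself: restart immediately.
      cancelPending();
      pc.restartIce();
    } else if (state === "connected" || state === "completed" || state === "closed") {
      cancelPending();
    }
  });
}
```

Note that restartIce() only marks the connection for restart; the application still has to handle the resulting negotiationneeded event and send a new offer through its signalling channel.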
Pitfall 6: Single-region TURN. A single TURN cluster in us-east-1 adds 150 ms of one-way latency to every relayed European call. By the time the relay path is selected (because direct paths fail for a corporate user behind a firewall), the cumulative latency is 300+ ms one way, and the call quality is degraded for the entire duration. Deploy TURN in every region your users live in. Three regions (US-East, EU-West, AP-Southeast) cover 95 % of consumer global traffic; five regions (add US-West and AP-South) cover 99 %.
Pitfall 7: Not budgeting for the 100 % relay case. A product that adds an AI voice agent or a SIP gateway suddenly relays 100 % of calls — because one end is a server with no public path. The TURN bill triples overnight. Scope this scenario before adding the feature; the architecture diagram and the cost model should show the relay percentage explicitly.
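To make the relay percentage explicit in a cost model, it first has to be measured. One hedged way to do that is to inspect the stats report of each call and record whether the selected candidate pair uses a local relay candidate. The sketch below assumes the field names from the W3C WebRTC statistics API (selectedCandidatePairId, localCandidateId, candidateType); `StatsEntry`, `StatsReportLike`, and `selectedPairUsesRelay` are names invented here, and browsers differ in which fields they populate, so the fallback on nominated and succeeded pairs is a heuristic.

```typescript
interface StatsEntry {
  id: string;
  type: string;
  [key: string]: unknown;
}

// Anything with a forEach over stats entries: an RTCStatsReport from
// pc.getStats() is intended to fit, and so does a plain array in tests.
interface StatsReportLike {
  forEach(cb: (stat: StatsEntry) => void): void;
}

// Returns true if the selected pair uses a local relay candidate, false if
// it uses another candidate type, and null if no selected pair is found.
function selectedPairUsesRelay(report: StatsReportLike): boolean | null {
  const entries: StatsEntry[] = [];
  report.forEach((stat) => { entries.push(stat); });

  // Preferred: the transport entry names the selected pair.
  let pairId: string | null = null;
  for (const e of entries) {
    if (e.type === "transport" && typeof e.selectedCandidatePairId === "string") {
      pairId = e.selectedCandidatePairId;
    }
  }

  let pair: StatsEntry | null = null;
  for (const e of entries) {
    if (e.type !== "candidate-pair") continue;
    // Fallback heuristic when no transport entry is available.
    const isSelected = pairId !== null
      ? e.id === pairId
      : e.nominated === true && e.state === "succeeded";
    if (isSelected) { pair = e; break; }
  }
  if (pair === null) return null;

  for (const e of entries) {
    if (e.type === "local-candidate" && e.id === pair.localCandidateId) {
      return e.candidateType === "relay";
    }
  }
  return null;
}
```

Logging this flag once per call from `await pc.getStats()` gives the measured relay fraction that the cost model needs, including the jump to 100 % when a server-side agent is added.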
Pitfall 8: Trusting iceTransportPolicy: 'relay' as a privacy guarantee. Setting iceTransportPolicy to 'relay' forces every byte through TURN, which hides your real IP from the peer. The TURN server still sees both IPs, however, and the application server that issues TURN credentials usually logs the mapping. If you need genuine network-layer anonymity (e.g., for an anonymous chat product), iceTransportPolicy: 'relay' is necessary but not sufficient; you also need to control what your TURN server logs.
Where Fora Soft fits in
ICE, STUN, and TURN sit at the core of every WebRTC project we ship — video conferencing, telemedicine, e-learning, browser-based surveillance viewers, AI voice agents, and live-shopping interactions all depend on this layer working correctly under hostile real-world networks. We have run coturn clusters in production since 2014, deployed Pion-based STUNner on Kubernetes for cloud-native customers, integrated managed TURN providers (Twilio, Cloudflare Realtime, Xirsys) as fallback paths for teams that need global coverage from day one, and built the credential-issuing services that sit in front of every production TURN cluster. The hard lessons — when to restart ICE, how to size a TURN cluster for a B2B vs a consumer product, how to handle the 100 % relay case for an AI voice agent — come from the projects, not from the RFCs.
What to read next
- SDP offer/answer in depth — the signalling protocol that carries ICE candidates and credentials.
- SFU vs MCU vs Mesh topologies — how ICE behaves differently across the three WebRTC topologies.
- WebRTC security: DTLS, SRTP, fingerprints, identity — the encryption layer that sits on top of ICE.
Talk to us / See our work / Download
- Talk to a WebRTC engineer about scoping or rescuing a real-time product.
- See our case studies for production WebRTC builds in conferencing, telemedicine, and e-learning.
- Download the ICE / TURN production cheat sheet: a one-page PDF reference with the candidate priority formula, the iceServers configuration template, the iceconnectionstate transition table, and the eight pitfalls in checklist form.
References
- RFC 8445, "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal", A. Keränen, C. Holmberg, J. Rosenberg, IETF, July 2018. Obsoletes RFC 5245. The canonical ICE specification; §5.1.2 defines the candidate priority formula; §6 defines the connectivity-check procedure. [Tier 1 — official IETF spec.] https://www.rfc-editor.org/rfc/rfc8445.html
- RFC 8489, "Session Traversal Utilities for NAT (STUN)", M. Petit-Huguenin, G. Salgueiro, J. Rosenberg, D. Wing, R. Mahy, P. Matthews, IETF, February 2020. Obsoletes RFC 5389. The current STUN specification; §5 defines the binding request, §14 defines the XOR-MAPPED-ADDRESS attribute. [Tier 1.] https://www.rfc-editor.org/rfc/rfc8489.html
- RFC 8656, "Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN)", T. Reddy, A. Johnston, P. Matthews, J. Rosenberg, IETF, February 2020. Obsoletes RFCs 5766 and 6156. The current TURN specification; §7.2 defines the LIFETIME attribute; §9 defines the channel-data mechanism. [Tier 1.] https://www.rfc-editor.org/rfc/rfc8656.html
- RFC 8838, "Trickle ICE: Incremental Provisioning of Candidates for the Interactive Connectivity Establishment (ICE) Protocol", E. Ivov, T. Stach, E. Marocco, C. Holmberg, IETF, January 2021. Defines the Trickle ICE extension; specifies the a=ice-options:trickle SDP attribute and the end-of-candidates indication. [Tier 1.] https://www.rfc-editor.org/rfc/rfc8838.html
- RFC 8839, "Session Description Protocol (SDP) Offer/Answer Procedures for Interactive Connectivity Establishment (ICE)", M. Petit-Huguenin, S. Nandakumar, C. Holmberg, A. Keränen, R. Shpount, IETF, January 2021. Defines a=candidate:, a=ice-ufrag:, a=ice-pwd:, a=end-of-candidates:, and a=ice-options: SDP attributes. [Tier 1.] https://www.rfc-editor.org/rfc/rfc8839.html
- RFC 7675, "Session Traversal Utilities for NAT (STUN) Usage for Consent Freshness", M. Perumal, D. Wing, R. Ravindranath, T. Reddy, M. Thomson, IETF, October 2015. Defines the consent freshness check (default interval of 5 seconds) and the 30-second consent expiry. [Tier 1.] https://www.rfc-editor.org/rfc/rfc7675.html
- RFC 9429, "JavaScript Session Establishment Protocol (JSEP)", J. Uberti, C. Jennings, E. Rescorla (eds.), IETF, April 2024. Obsoletes RFC 8829. §3.5.5 specifies that the offerer is the controlling agent and the answerer is the controlled agent in WebRTC. [Tier 1.] https://www.rfc-editor.org/rfc/rfc9429.html
- W3C WebRTC 1.0, W3C Recommendation, March 2023. Defines the RTCPeerConnection, RTCIceCandidate, RTCIceTransport, iceConnectionState, restartIce() JavaScript APIs. [Tier 1: W3C Recommendation.] https://www.w3.org/TR/webrtc/
- RFC 5780, "NAT Behaviour Discovery Using Session Traversal Utilities for NAT (STUN)", D. MacDonald, B. Lowekamp, IETF, May 2010. The STUN extensions used to probe a NAT's behaviour: endpoint-independent vs address-dependent vs address-and-port-dependent mapping. [Tier 1.] https://www.rfc-editor.org/rfc/rfc5780.html
- RFC 9143, "Negotiating Media Multiplexing Using the Session Description Protocol (SDP)", C. Holmberg, H. Alvestrand, C. Jennings, IETF, February 2022. The BUNDLE extension that enables bundlePolicy: 'max-bundle'. [Tier 1.] https://www.rfc-editor.org/rfc/rfc9143.html
- Cloudflare Realtime TURN documentation, April 2026. Pricing ($0.05/GB outbound; free when paired with Cloudflare Realtime SFU); anycast deployment across 330+ cities. [Tier 4: vendor docs.] https://developers.cloudflare.com/realtime/turn/
- Twilio Network Traversal Service, "STUN/TURN", Q1 2026. The original commercial managed-TURN service; the source of the 15-20 % consumer relay-rate figure widely cited in the industry. [Tier 4 — vendor docs.] https://www.twilio.com/en-us/stun-turn
- coturn v4.8.0 release notes, January 2026. CVE-2025-69217 patch, faster DDoS packet validation, configurable socket buffer sizes. [Tier 4 — open-source release.] https://github.com/coturn/coturn/releases
- Pion TURN library, v3.x. Go toolkit for TURN clients and servers; foundation of STUNner. [Tier 4.] https://github.com/pion/turn
- eturnal STUN/TURN server documentation, ProcessOne, 2026. The actively-maintained coturn alternative. [Tier 4.] https://eturnal.net/
In any disagreement between sources, this article followed the spec text. Two specific discrepancies are worth noting: (a) some vendor blogs describe candidate priorities using older RFC 5245 formulas; this article uses the current RFC 8445 §5.1.2 values, and (b) some vendor docs claim a 600-second TURN allocation cannot be changed; RFC 8656 §7.2 allows the server to choose any LIFETIME up to 3600 seconds and the client requests its preferred lifetime in the Allocate request.


