NAT, Firewalls, STUN, TURN, ICE: How WebRTC Actually Reaches a Phone

Why This Matters

If you ship a product that uses WebRTC — a video conferencing app, a telemedicine consultation, a browser-based surveillance viewer, a customer-support widget, an AI voice agent — then NAT traversal is the single most common reason a call fails to connect. The signalling worked, the cameras turned on, the SDP was exchanged, the user sees "connecting…" — and then nothing. Almost every such failure traces back to ICE, STUN, or TURN. A product manager or founder who does not understand this layer ends up paying a vendor like Twilio or Daily.co for a problem that has open-source solutions, or runs a TURN cluster that quietly burns six figures a year on AWS egress because nobody routed the traffic through cheaper transit. This article explains what each piece does, when each one is needed, what they cost, and the practical defaults a streaming or real-time product should ship with in 2026.

What NAT is, in one paragraph

Network Address Translation (NAT) is the trick that lets every device in your home share one public IP address. Your laptop, your phone, your TV, and your thermostat all have private IP addresses inside the home network (192.168.x.x or 10.x.x.x), and the router holds a single public address (203.0.113.45 in this article's examples). When the laptop sends a packet out to the internet, the router rewrites the source address from the laptop's private address to the router's public address, and remembers the mapping in a table. When the reply comes back, the router uses the table to rewrite the destination from public back to private and forwards the packet to the laptop. The IETF documented this mechanism as "traditional NAT" in RFC 3022 (January 2001) and later set behavioural requirements for it in RFC 4787 (January 2007); it is a large part of the reason the public internet kept working as the IPv4 address pool ran dry. The cost: a device behind a NAT cannot be reached from the outside unless something on the inside opens a pinhole first. For an outbound HTTP request to Google, this is invisible. For an incoming WebRTC call from a stranger's phone, this is the entire problem.
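The rewrite-and-remember behaviour described above can be sketched as a toy translation table. This is a minimal illustration, not how any real router is implemented: the class name ToyNat, the starting port 54321, and the single-table design are assumptions made for the example.

```python
from dataclasses import dataclass, field

PUBLIC_IP = "203.0.113.45"  # the router's public address used in the text


@dataclass
class ToyNat:
    """A toy NAT: one public IP and a table of port mappings (illustrative only)."""
    public_ip: str = PUBLIC_IP
    next_port: int = 54321  # arbitrary starting port, an assumption for the example
    out_table: dict = field(default_factory=dict)  # (private_ip, private_port) -> public_port
    in_table: dict = field(default_factory=dict)   # public_port -> (private_ip, private_port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet and remember the mapping."""
        key = (src_ip, src_port)
        if key not in self.out_table:
            self.out_table[key] = self.next_port
            self.in_table[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out_table[key])

    def inbound(self, dst_port):
        """Rewrite the destination of a reply, or return None (drop) if no pinhole exists."""
        return self.in_table.get(dst_port)


nat = ToyNat()
public_src = nat.outbound("192.168.1.10", 50000)
print(public_src)           # the address the outside world sees
print(nat.inbound(54321))   # a reply is forwarded to the laptop
print(nat.inbound(60000))   # no mapping yet, so the unsolicited packet is dropped
```

The last call is the WebRTC problem in miniature: nothing inside has opened port 60000, so the packet has nowhere to go.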

The four flavours of NAT, and why one of them ruins your day

Different NAT boxes behave differently when a device sends packets out and another device tries to reply. The classical taxonomy comes from RFC 3489 (March 2003), the original STUN specification. Its successors (RFC 5389, then RFC 8489) dropped the classification, but the four-name taxonomy is still how the industry talks about NAT in 2026.

The first is Full-Cone NAT. Once a device behind the NAT sends a packet from internal address-and-port (X, x) to any external destination, the NAT creates a mapping to external address-and-port (X', x'), and any external host on the internet can send packets to (X', x') and they will be delivered to (X, x). The most permissive. Best behaved for WebRTC.

The second is Address-Restricted Cone NAT. The same mapping is created, but only external hosts that the internal device has already sent a packet to (matched by IP, not port) can send packets back. Most home routers behave this way.

The third is Port-Restricted Cone NAT. As above, but the external host has to match both the IP and the source port the internal device contacted. Still tractable for WebRTC: a STUN exchange tells each side what the other side will see, and they punch matching pinholes.

The fourth is Symmetric NAT. The NAT assigns a different external port for every distinct destination. The mapping the STUN server saw — (X', x'_stun) — is not the mapping the peer would need to use, because the peer is at a different destination address and gets a different port (X', x'_peer). The STUN-discovered candidate is useless. The two peers cannot find each other without help.

The modern, more precise terminology from RFC 4787 splits NAT behaviour into two independent properties: mapping behaviour (how the NAT assigns external ports — endpoint-independent, address-dependent, or address-and-port-dependent) and filtering behaviour (which external hosts are allowed to send packets back through the mapping). A NAT with endpoint-independent mapping and endpoint-independent filtering is what most people mean by "Full-Cone"; a NAT with address-and-port-dependent mapping is what most people mean by "Symmetric". The IETF prefers the RFC 4787 terms because they are precise; the streaming industry still uses the four-name taxonomy because everyone knows what it means.
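The two independent properties can be modelled directly. The sketch below is a simplified model under stated assumptions: the class ModelNat and its method names are hypothetical, only two of the three mapping behaviours are modelled, and mappings never time out.

```python
from dataclasses import dataclass, field


@dataclass
class ModelNat:
    mapping: str    # "endpoint-independent" or "address-and-port-dependent"
    filtering: str  # "endpoint-independent", "address-dependent", or "address-and-port-dependent"
    public_ip: str = "203.0.113.45"
    next_port: int = 40000  # arbitrary starting port for the example
    mappings: dict = field(default_factory=dict)   # mapping key -> external port
    contacted: dict = field(default_factory=dict)  # external port -> set of (dst_ip, dst_port)

    def send(self, src, dst):
        """Internal host src sends to external dst; returns the external (ip, port) used."""
        key = src if self.mapping == "endpoint-independent" else (src, dst)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        port = self.mappings[key]
        self.contacted.setdefault(port, set()).add(dst)
        return (self.public_ip, port)

    def accepts(self, ext_port, sender):
        """Would a packet from external sender to ext_port be let through?"""
        seen = self.contacted.get(ext_port)
        if seen is None:
            return False
        if self.filtering == "endpoint-independent":
            return True
        if self.filtering == "address-dependent":
            return any(ip == sender[0] for ip, _ in seen)
        return sender in seen


laptop = ("192.168.1.10", 50000)
stun = ("198.51.100.1", 3478)     # hypothetical STUN server address
peer = ("198.51.100.77", 61000)   # hypothetical remote peer

full_cone = ModelNat("endpoint-independent", "endpoint-independent")
srflx = full_cone.send(laptop, stun)
full_cone_ok = full_cone.accepts(srflx[1], peer)  # peer was never contacted

symmetric = ModelNat("address-and-port-dependent", "address-and-port-dependent")
seen_by_stun = symmetric.send(laptop, stun)
seen_by_peer = symmetric.send(laptop, peer)
print(full_cone_ok, seen_by_stun, seen_by_peer)
```

In this model the symmetric NAT hands out a different external port for the peer than for the STUN server, which is the reason the STUN-discovered address does not help the peer.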

The bad news for WebRTC: a meaningful slice of carrier-grade NAT (CGNAT) on mobile carriers, and a significant fraction of enterprise NAT, behave as symmetric NAT. The good news: ICE has a fallback — TURN — that handles the case. The whole architecture below exists to make symmetric NAT survivable.

NAT mapping diagram showing a laptop on the private subnet 192.168.1.10:50000 sending a packet outbound through a home router with public IP 203.0.113.45; the router rewrites the source to 203.0.113.45:54321 on the public internet, and the reply path uses the inverse rewrite; on the right, four NAT flavours (Full-Cone, Address-Restricted, Port-Restricted, Symmetric) shown as four small panels with their filtering behaviours

Figure 1. How NAT rewrites a packet on the way out and on the way back, and the four classical NAT flavours that decide whether two peers behind two NATs can ever find each other directly.

STUN — the device learns its own public address

STUN — Session Traversal Utilities for NAT — is the smallest of the four protocols. It is a single-purpose tool: a device asks a public STUN server "what does my address look like to you?", and the STUN server replies with the public IP and port it observed in the request. The current specification is RFC 8489, published in February 2020, which obsoletes RFC 5389 from 2008. The on-the-wire format is a 20-byte fixed header followed by attribute-value pairs, sent over UDP (default port 3478), TCP (port 3478), or TLS over TCP (port 5349).

The exchange is one round trip. The client sends a Binding Request to the STUN server. The server sees the packet arriving from the NAT-rewritten address-and-port (X', x') and copies that pair into a XOR-MAPPED-ADDRESS attribute in the Binding Response. The client now knows its own public-facing address — at least, the one this particular NAT mapping uses for this particular destination. This is called a server-reflexive candidate, abbreviated srflx.
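To make the wire format concrete, here is a minimal sketch that builds a Binding Request and decodes an IPv4 XOR-MAPPED-ADDRESS attribute, using the constants from RFC 8489. The helper names are hypothetical, and the response is built locally so the example runs offline; a real client would also match the transaction ID and handle IPv6.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (message_bytes, transaction_id) for an attribute-less Binding Request."""
    tid = transaction_id or os.urandom(12)
    # Header: type (2 bytes), attribute length (2), magic cookie (4), transaction ID (12).
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid, tid


def parse_xor_mapped_address(message):
    """Extract (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # 0x01 = IPv4
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return addr, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def build_binding_response(tid, ip, port):
    """Build what a server would send back; used here only to exercise the parser."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + tid + attr


request, tid = build_binding_request()
response = build_binding_response(tid, "203.0.113.45", 54321)
print(len(request), parse_xor_mapped_address(response))
```

The address is XORed with the magic cookie so that NATs which rewrite any IP address they find in a payload do not corrupt it in transit.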

Two things matter about that wording. First, server-reflexive — the address is the reflection the server saw, not an absolute property of the device. A symmetric NAT will give a different reflexive address for every destination, which is exactly why STUN alone is not enough for symmetric NAT. Second, candidate — the address is one of several possible routes for an incoming packet to take, and ICE will rank it among the others (host, relay) and try them in priority order.

STUN servers are deliberately stateless and cheap to run. Google operates stun.l.google.com:19302 as a free public STUN endpoint, as do Cloudflare (stun.cloudflare.com:3478) and Mozilla. The cost of running your own STUN server is trivial (a coturn instance on a $5 cloud VM handles tens of thousands of binding requests per second), and most teams do run their own to avoid a soft dependency on a third party's uptime.

STUN does not relay media. STUN does not authenticate peers. STUN does not survive network changes. It is a single-question, single-answer probe whose only job is to tell the device how the outside world sees it. The protocols that do the real work — TURN and ICE — are built on top of STUN's message format.

TURN — the relay server, for when nothing else works

TURN — Traversal Using Relays around NAT — is the fallback when direct peer-to-peer connectivity is impossible. The current specification is RFC 8656, also from February 2020, which obsoletes the original TURN RFC (5766, April 2010) and its IPv6 addendum (RFC 6156, April 2011). TURN extends STUN: a TURN server is a STUN server with the extra ability to act as a media relay.

The mechanics are simple to describe. A WebRTC client connects to a TURN server and asks it to allocate a public address and port — an "allocation". The TURN server replies with a public address-and-port pair (T, t) reserved for that client. Anything sent to (T, t) from the internet gets forwarded over the existing client-server connection to the WebRTC client. Anything the WebRTC client wants to send to a peer, it sends through the TURN server, which forwards it from (T, t) to the peer's address. The allocation lasts as long as the client refreshes it (the default lifetime is 600 seconds; the client refreshes every 5 minutes).

Two consequences fall out of that design. First, every byte of media crosses the TURN server twice: once inbound from client A, once outbound to client B. For a call that sends 2 Mbps in each direction, the TURN server receives 4 Mbps and sends 4 Mbps. For 1,000 concurrent relayed calls, that is 4 Gbps in and 4 Gbps out. This is why TURN bandwidth is the single biggest line item in a WebRTC infrastructure budget. Second, the latency added by TURN is the round-trip delay between client and TURN server, doubled: typically 20–60 ms for a well-placed TURN cluster, 100 ms or more if the TURN server is on the wrong continent.

TURN servers require authentication. RFC 8656 itself relies on STUN's long-term credential mechanism; the time-bounded scheme that most deployments layer on top comes from the IETF draft "A REST API for Access to TURN Services" and is implemented by coturn as its use-auth-secret mode. The application server hands the client a username (typically a Unix expiry timestamp plus a user identifier) and a password (a base64-encoded HMAC-SHA1 over the username, keyed with a shared secret known only to the application server and the TURN server). The TURN server recomputes the HMAC from its copy of the secret and accepts the client without ever having seen the user identifier before. This means a TURN cluster can be stateless with respect to user accounts, and short-lived credentials limit abuse: a leaked credential stops working once its timestamp passes.
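A minimal sketch of minting such credentials on the application server follows, in the form used by, for example, coturn's use-auth-secret option: the username is "expiry:user" and the password is the base64 of an HMAC-SHA1 over it. The function names, the 600-second default lifetime, and the verify helper are assumptions for illustration; a real TURN server does not compare passwords directly but uses the derived password to check message integrity.

```python
import base64
import hashlib
import hmac
import time


def mint_turn_credentials(shared_secret, user_id, ttl_seconds=600, now=None):
    """Return (username, password) valid until now + ttl_seconds."""
    expiry = int(now if now is not None else time.time()) + ttl_seconds
    username = f"{expiry}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


def verify_turn_credentials(shared_secret, username, password, now=None):
    """Conceptual server-side check: recompute the HMAC and reject expired usernames."""
    expiry_text, _, _user = username.partition(":")
    if not expiry_text.isdigit():
        return False
    current = now if now is not None else time.time()
    if int(expiry_text) < current:
        return False
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, password)


# "s3cret" and "alice" are placeholder values for the example.
username, password = mint_turn_credentials("s3cret", "alice", ttl_seconds=600, now=1000)
print(username, verify_turn_credentials("s3cret", username, password, now=1000))
```

Because the expiry is part of the signed username, the TURN side needs no database lookup: the shared secret and a clock are enough.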

TURN listens on UDP/3478, TCP/3478, and TLS over TCP on port 5349. Most production deployments also expose TLS on port 443, using the "turns" scheme in the ICE server URL, because port 443 looks like HTTPS to the corporate firewalls that drop everything else. The result: a TURN-over-TLS-on-443 candidate is the universal fallback that gets through almost any network. The cost is that the call rides over TCP instead of UDP, so a lost packet stalls everything behind it until it is retransmitted, but a call that connects with 200 ms of one-way latency beats a call that does not connect at all.

The 2026 production reality for TURN looks like this. Coturn (open source, originally written by Oleg Moskalenko) is still the default everyone reaches for; eturnal (developed by ProcessOne, the team behind ejabberd) is an independent, actively maintained alternative written in Erlang that many teams have started to prefer; Pion TURN is a pure-Go implementation popular in Kubernetes-native deployments. The cloud-managed TURN providers (Twilio, Cloudflare Calls, Daily.co, Xirsys, Subspace) all wrap the same protocol, charge per gigabyte of relayed traffic, and add their own anycast routing on top.

TURN allocation and media-relay diagram: WebRTC client A behind a symmetric NAT, TURN server in the middle with public IP, WebRTC client B behind a port-restricted NAT; A sends a TURN Allocate request, server replies with a relayed transport address; both A and B send media to the TURN server which forwards it to the peer; arrows labelled with the protocol on each leg

Figure 2. TURN as a media relay. Every packet between the two peers traverses the TURN server twice, which is why TURN bandwidth dominates the cost of a WebRTC infrastructure budget.

ICE — the algorithm that picks the path

ICE — Interactive Connectivity Establishment — is the algorithm that ties STUN and TURN together. The current specification is RFC 8445, published in July 2018, which obsoletes RFC 5245 from 2010. ICE does three things: it gathers every possible address each peer might be reachable on, it pairs every local address with every remote address to form a list of candidate pairs, and it tests every pair until it finds a working route. The whole point of ICE is to avoid hard-coding any single strategy for NAT traversal — instead, the algorithm tries every plausible path in parallel and keeps the one that works.

A candidate is one of four kinds. A host candidate is the device's real network address: a Wi-Fi interface, a 5G interface, a wired Ethernet interface. A server-reflexive candidate is the public address learned from a STUN server (the srflx from earlier). A peer-reflexive candidate is a public address learned during the connectivity check itself, when a peer's STUN probe shows the device an address it had not yet observed; this typically happens when a NAT creates a fresh mapping for the check, as a symmetric NAT does for each new destination. A relay candidate is a TURN-allocated relayed transport address.
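On the signalling channel each candidate travels as an SDP "candidate:" attribute. The sketch below parses the common fields of such a line; the Candidate class and parse_candidate function are hypothetical names, the sample addresses are the article's example values, and less common extension fields are simply collected as name/value pairs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    kind: str                              # host, srflx, prflx, or relay
    related_address: Optional[str] = None  # base address for reflexive and relay candidates
    related_port: Optional[int] = None


def parse_candidate(line):
    """Parse an SDP 'candidate:' attribute value into a Candidate."""
    text = line.strip()
    if text.startswith("a="):
        text = text[2:]
    if not text.startswith("candidate:"):
        raise ValueError("not a candidate line")
    parts = text[len("candidate:"):].split()
    if len(parts) < 8 or parts[6] != "typ":
        raise ValueError("malformed candidate line")
    # Anything after the type comes as name/value pairs (raddr, rport, ...).
    extras = dict(zip(parts[8::2], parts[9::2]))
    rport = extras.get("rport")
    return Candidate(parts[0], int(parts[1]), parts[2].lower(), int(parts[3]),
                     parts[4], int(parts[5]), parts[7],
                     extras.get("raddr"), int(rport) if rport else None)


host = parse_candidate("candidate:1 1 udp 2130706431 192.168.1.10 50000 typ host")
srflx = parse_candidate(
    "a=candidate:2 1 udp 1694498815 203.0.113.45 54321 typ srflx "
    "raddr 192.168.1.10 rport 50000"
)
print(host.kind, srflx.kind, srflx.related_address)
```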

The connectivity check uses STUN's Binding Request message, but this time peer-to-peer rather than client-to-server. Each pair is tested in priority order; with the type preferences recommended in RFC 8445, ICE's priority formula favours host candidates first, then peer-reflexive, then server-reflexive, then relay. A pair becomes "valid" when one side sends a STUN Binding Request to the other side and gets a Binding Response back from the expected address. The very first valid pair found goes into the "selected pair" slot and the call connects; ICE keeps testing other pairs in the background and can switch to a better one if it finds one.
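The per-candidate priority can be computed with the formula and type preferences that RFC 8445 recommends (host 126, peer-reflexive 110, server-reflexive 100, relay 0). This is a minimal sketch: the function name is hypothetical, and the defaults assume a single interface (local preference 65535) and component 1.

```python
# Recommended type preferences from RFC 8445; implementations may choose others.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(kind, local_preference=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[kind] << 24) + (local_preference << 8) + (256 - component_id)


ranked = sorted(TYPE_PREFERENCE, key=candidate_priority, reverse=True)
for kind in ranked:
    print(kind, candidate_priority(kind))
```

The type preference sits in the top bits, so it dominates: no choice of local preference can lift a relay candidate above a host candidate.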

Once a pair is selected, the same STUN exchange is reused to keep the path alive. RFC 7675 specifies the consent freshness protocol: roughly every 5 seconds (the interval is randomised between 4 and 6 seconds), each side sends a Binding Request over the selected pair. If no valid response arrives for 30 seconds, consent expires and the endpoint must stop sending media on that pair; the application then typically triggers an ICE restart. This is the mechanism that detects a network change, so that a new path can be renegotiated.
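The timer logic of consent freshness is small enough to sketch. The example uses the values from RFC 7675 (a 5-second base period randomised to 0.8 to 1.2 times that value, and a 30-second expiry); the ConsentTracker class and its method names are hypothetical, and the clock is passed in explicitly so the logic can be exercised without waiting.

```python
import random

CONSENT_INTERVAL = 5.0   # seconds, base period between consent checks (RFC 7675)
CONSENT_TIMEOUT = 30.0   # seconds without a valid response before consent expires


def next_check_delay(rng=random):
    """Randomise each interval to 0.8-1.2 x the base period to avoid synchronisation."""
    return CONSENT_INTERVAL * rng.uniform(0.8, 1.2)


class ConsentTracker:
    """Tracks when the last valid Binding Response arrived on the selected pair."""

    def __init__(self, now):
        self.last_valid_response = now

    def on_valid_response(self, now):
        self.last_valid_response = now

    def may_send_media(self, now):
        """False once 30 s have passed with no valid response: stop sending media."""
        return (now - self.last_valid_response) < CONSENT_TIMEOUT


tracker = ConsentTracker(now=0.0)
tracker.on_valid_response(now=12.0)
print(tracker.may_send_media(now=40.0), tracker.may_send_media(now=42.0))
```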

Trickle ICE — specified separately in RFC 8838 (January 2021) — is the optimisation that nearly every modern WebRTC stack uses by default. Instead of waiting until every candidate is gathered before sending them to the peer, each side sends candidates over the signalling channel as they are discovered. The two sides can start connectivity checks on early candidates while later candidates are still being gathered. For most calls, this cuts the time-to-connected from 2–3 seconds to under 500 milliseconds.

ICE candidate gathering and connectivity-check diagram showing: peer A gathers host candidates from Wi-Fi and 5G interfaces, sends a STUN Binding Request to a STUN server and gets back a server-reflexive candidate, asks a TURN server to Allocate and gets back a relay candidate; peer B does the same; both peers exchange candidates over the signalling channel using Trickle ICE; both peers pair every local candidate with every remote candidate, sort by priority, and run STUN Binding Requests on every pair until one succeeds — the host-host and host-srflx pairs typically succeed first, the relay-relay pair is the last-resort fallback

Figure 3. ICE candidate gathering and the connectivity check. Each peer gathers host, server-reflexive, and relay candidates in parallel; trickles them across the signalling channel as they arrive; pairs every local with every remote; and runs STUN probes on every pair until the highest-priority working route is found.

The end-to-end story — what happens when you click "join call"

A user clicks "join" in a browser. The browser opens the camera, the microphone, and creates an RTCPeerConnection. The application server has already given the browser a list of ICE servers — a STUN URL and a TURN URL with short-lived credentials. The browser starts ICE.

In parallel, the browser does five things. It enumerates every network interface — Wi-Fi, 5G, Ethernet — and creates a host candidate for each one. It sends a STUN Binding Request to the STUN server from each interface and waits for the server-reflexive candidate to come back. It sends a TURN Allocate request to the TURN server from each interface and waits for the relay candidate to come back. It generates an SDP offer that lists every candidate gathered so far. And it sends the offer through the application's signalling channel (WebSocket, HTTP long-poll, gRPC — the signalling transport is outside ICE's scope) to the other peer.

The other peer does the symmetric thing. It receives the offer, generates an SDP answer with its own list of candidates, and sends the answer back. Both peers trickle additional candidates as they finish gathering. Each side pairs every local candidate with every remote candidate, computes a priority for each pair using the formula in RFC 8445 §6.1.2.3 (essentially 2^32 × min(G,D) + 2 × max(G,D) + (G > D ? 1 : 0)), sorts the pairs by priority, and starts sending STUN Binding Requests on each pair in priority order with a small pacing delay (Ta, default 50 ms).
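The pair-priority formula and the resulting check order can be sketched in a few lines. Here G is the priority of the controlling agent's candidate and D that of the controlled agent's. The candidate priorities are the values for host, srflx, and relay candidates with the default preferences, and the helper names are hypothetical; a real agent also pairs only candidates of the same component and prunes redundant pairs.

```python
from itertools import product


def pair_priority(g, d):
    """2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0), per RFC 8445."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def ordered_pairs(controlling, controlled):
    """Take dicts of label -> candidate priority; return pairs sorted best first."""
    pairs = [
        (pair_priority(gp, dp), g, d)
        for (g, gp), (d, dp) in product(controlling.items(), controlled.items())
    ]
    return sorted(pairs, reverse=True)


local = {"host": 2130706431, "srflx": 1694498815, "relay": 16777215}
remote = dict(local)  # assume the other peer gathered the same three kinds

checklist = ordered_pairs(local, remote)
for priority, g, d in checklist:
    print(f"{g:>5} -> {d:<5} {priority}")
```

Because the minimum of the two priorities is multiplied by 2^32, a pair is only as good as its worse candidate, which is why any pair involving a relay sorts to the bottom of the list.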

A pair succeeds when a Binding Request gets a Binding Response back with the expected transaction ID and authentication. The first pair to succeed becomes the selected pair, the media starts flowing, and the user hears the other person speak. ICE keeps testing other pairs in the background for the next few seconds; if a better pair completes (host-host wins over host-relay every time), the media switches over silently.

Total time from "click" to "first frame received" on a typical 2026 home-to-home call: about 300–600 ms. Total time on a home-to-corporate-firewall call where only the TLS-over-443 TURN candidate works: about 800–1200 ms. The variance comes almost entirely from how long it takes to discover that the cheaper candidates do not work.

A worked example — TURN bandwidth math for a 1,000-seat product

A founder is planning the infrastructure for a video meeting product expected to host 1,000 simultaneous one-on-one calls at peak. Average call bitrate: 1.5 Mbps of video plus 50 kbps of audio per direction, so ~1.6 Mbps × 2 directions = 3.2 Mbps per call total. The industry-baseline TURN-relay fraction for a consumer mix is 15–20 % of sessions; let us be conservative and use 20 %.

Step one: how many calls go through TURN? 1,000 calls × 20 % = 200 calls relayed.

Step two: TURN bandwidth per relayed call. Each call's full media stream goes through the TURN server twice — once in each direction. So bandwidth at the TURN server = 3.2 Mbps × 2 = 6.4 Mbps per relayed call. (Some sources count only one direction because TURN is "in the middle" of the path; that is wrong. The TURN server sees ingress from one peer and egress to the other peer for every byte, and both legs count.)

Step three: total TURN throughput. 200 calls × 6.4 Mbps = 1,280 Mbps = 1.28 Gbps sustained at peak.
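Steps one to three are plain arithmetic and can be checked with a short script. The function names are hypothetical, the inputs are the figures from this worked example, and the volume helper takes the average-to-peak utilisation as an explicit parameter so that different assumptions can be compared.

```python
def turn_peak_mbps(concurrent_calls, relay_fraction, per_direction_mbps):
    """Return (relayed calls, Mbps at the TURN server per call, total Mbps at peak)."""
    relayed_calls = concurrent_calls * relay_fraction
    per_call_media = per_direction_mbps * 2   # both directions of the call
    per_call_at_turn = per_call_media * 2     # every stream enters and leaves the relay
    return relayed_calls, per_call_at_turn, relayed_calls * per_call_at_turn


def monthly_volume_tb(rate_mbps, utilisation, days=30):
    """Decimal terabytes moved in a month at rate_mbps scaled by utilisation (0..1)."""
    bytes_per_second = rate_mbps * 1_000_000 / 8
    return bytes_per_second * 86_400 * days * utilisation / 1e12


relayed, per_call, total = turn_peak_mbps(1000, 0.20, 1.6)
print(relayed, per_call, total)                   # relayed calls, Mbps per call, Mbps total
print(monthly_volume_tb(total, utilisation=1.0))  # volume if peak were held all month
```

Note that the total counts traffic both into and out of the relay; only the outbound half is egress in the billing sense.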

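Step four: monthly volume. 1.28 Gbps is 0.16 GB/s. Held for a full 30-day month, that is 0.16 GB/s × 86,400 s/day × 30 days ≈ 415 TB. Traffic is not at peak all month, so apply a 12.5 % average-to-peak ratio (a heuristic for video products): 415 TB × 0.125 ≈ 52 TB/month through the TURN server. That figure counts both legs. Hyperscalers bill only the outbound leg, so billable egress is half of it, about 26 TB/month.

Step five: cost on a hyperscaler. AWS EC2 egress in 2026 is roughly $0.085/GB for the first 10 TB, dropping to $0.05/GB above 150 TB. At 26 TB the rate sits between those two figures, so the bill lands between 26,000 GB × $0.05 ≈ $1,300 and 26,000 GB × $0.085 ≈ $2,200 per month for TURN egress alone.

Step six: where the savings live. Running TURN on a colocated bare-metal cluster with 95th-percentile transit at $0.50–$1.00 per Mbps/month costs about $640–$1,280/month even if the full 1.28 Gbps peak is committed. Transit is priced on peak rate while hyperscaler egress is priced per gigabyte, so the gap widens as utilisation rises: the same cluster running at peak around the clock would push about 207 TB of egress, roughly $11,000/month at a blended rate of around $0.054/GB, against the same transit bill. That order-of-magnitude gap at high utilisation is the reason every WebRTC product at meaningful scale either runs its own TURN cluster on bare metal or contracts with a TURN provider whose pricing reflects bare-metal economics (Cloudflare Calls, Daily.co, Subspace).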

If the product is an AI voice agent or a one-sided IoT deployment where 100 % of traffic must go through TURN — because one end has no public path at all — the same math gives 5× the bandwidth and 5× the cost. This is the deployment shape that turned TURN bandwidth into an existential line item for AI voice products in 2025–2026.

Common pitfalls

Pitfall 1: trusting STUN-only on a symmetric NAT. Many demos use only a STUN server. They work fine until the first user on a mobile carrier running symmetric CGNAT (as a meaningful share of them do) joins the call, and connectivity fails silently. Always include at least one TURN server in your ICE configuration. The fallback is the entire point.

Pitfall 2: TURN over UDP only. Corporate firewalls routinely block UDP altogether and force everything through TCP/443. A TURN candidate on turn:host:3478?transport=udp is useless inside such a network. Always also expose turns:host:443?transport=tcp (TLS-over-TCP on 443). The browser will pick the cheapest one that works.
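A minimal sketch of the ICE server list an application server might hand to the browser follows, covering UDP, TCP, and TLS on 443. The dictionary shape mirrors the urls/username/credential fields of the browser's RTCIceServer; the host names stun.example.com and turn.example.com and the credential values are placeholders, not real endpoints.

```python
import json


def ice_servers(stun_host, turn_host, username, credential):
    """Build an iceServers list with a STUN entry and three TURN transports."""
    return [
        {"urls": [f"stun:{stun_host}:3478"]},
        {
            "urls": [
                f"turn:{turn_host}:3478?transport=udp",   # cheapest path when UDP is open
                f"turn:{turn_host}:3478?transport=tcp",   # for networks that block UDP
                f"turns:{turn_host}:443?transport=tcp",   # TLS on 443, the last resort
            ],
            "username": username,
            "credential": credential,
        },
    ]


config = ice_servers("stun.example.com", "turn.example.com",
                     "1600:alice", "placeholder-password")
print(json.dumps({"iceServers": config}, indent=2))
```

Listing all three transports on one entry lets the same short-lived credential cover every fallback.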

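Pitfall 3: long-lived TURN credentials. A handful of vendor tutorials still show TURN credentials hardcoded in client-side JavaScript. Anyone who opens devtools can extract them and use your TURN cluster as free egress for years. Always mint short-lived credentials server-side, signed with a shared secret, scoped to the call. The HMAC-SHA1 username-and-password format comes from the IETF draft "A REST API for Access to TURN Services", which coturn implements as use-auth-secret; RFC 8656 itself only defines the long-term credential exchange that carries them.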

Pitfall 4: one TURN server on one continent. Latency added by TURN equals the round-trip time between client and TURN server, doubled. A call between two users in São Paulo relayed through a TURN server in Virginia adds 240 ms of latency for no reason. Deploy TURN geographically close to your users — at least one cluster per major continent, with anycast or geo-DNS in front.

Pitfall 5: forgetting consent freshness. RFC 7675 has each side send a STUN Binding Request on the selected pair about every 5 seconds and stop sending media if 30 seconds pass without a valid response; the same traffic keeps the NAT mapping warm. Some homegrown stacks skip this, which works fine until a NAT mapping idles out (many home routers reclaim UDP mappings after 30–60 seconds of silence) and the call drops mid-conversation with no diagnostic.

Where Fora Soft Fits In

Fora Soft has been building WebRTC and conferencing software since 2005, with 239+ shipped projects across video conferencing, OTT, telemedicine, e-learning, video surveillance, and AR/VR. NAT traversal is the part of WebRTC most teams underestimate — the SDP offer/answer is documented in every tutorial, the cameras and microphones are well-supported in the browser, the SFU options (mediasoup, Janus, LiveKit, Pion) are well understood. What separates a product that works in a demo from a product that works for a Brazilian mobile carrier customer on a corporate office Wi-Fi is the TURN cluster — its placement, its credentials story, its observability, and its cost discipline. We run TURN clusters for clients on dedicated bare-metal and on hyperscaler infrastructure, instrument them with per-call cost attribution, and pair them with multi-region SFU deployments where the call topology calls for it.

Call to action

Talk to a streaming engineer: book a 30-minute scoping call to talk through your STUN, TURN, and ICE plan.
See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
Download the NAT, STUN, TURN Deployment Checklist — One-page reference for opening the right ports, minting short-lived TURN credentials, placing TURN clusters geographically, picking observability KPIs, and a worked bandwidth and cost model for a 1,000-seat product.

References

IETF RFC 8489, Session Traversal Utilities for NAT (STUN), M. Petit-Huguenin et al., February 2020. Obsoletes RFC 5389. The controlling specification for STUN message format, attributes, and the discovery flow. <https://datatracker.ietf.org/doc/html/rfc8489>
IETF RFC 8656, Traversal Using Relays around NAT (TURN): Relay Extensions to STUN, T. Reddy et al., February 2020. Obsoletes RFC 5766 and RFC 6156. The controlling specification for TURN allocation, permissions, and channels. <https://datatracker.ietf.org/doc/html/rfc8656>
IETF RFC 8445, Interactive Connectivity Establishment (ICE): A Protocol for NAT Traversal, A. Keränen, C. Holmberg, J. Rosenberg, July 2018. Obsoletes RFC 5245. The controlling specification for ICE candidate gathering, pairing, priority computation, and the connectivity-check state machine. <https://datatracker.ietf.org/doc/html/rfc8445>
IETF RFC 8838, Trickle ICE: Incremental Provisioning of Candidates for the ICE Protocol, E. Ivov, J. Uberti, P. Saint-Andre, January 2021. The standard for sending candidates incrementally over the signalling channel instead of waiting for full gathering. <https://datatracker.ietf.org/doc/html/rfc8838>
IETF RFC 4787, Network Address Translation (NAT) Behavioral Requirements for Unicast UDP, F. Audet, C. Jennings, January 2007. BCP 127. The IETF's terminology for NAT mapping and filtering behaviours; the modern replacement for the four-name taxonomy. <https://datatracker.ietf.org/doc/html/rfc4787>
IETF RFC 5780, NAT Behavior Discovery Using STUN, D. MacDonald, B. Lowekamp, May 2010. The STUN-based technique for probing what kind of NAT a device sits behind. <https://datatracker.ietf.org/doc/html/rfc5780>
IETF RFC 7675, STUN Usage for Consent Freshness, M. Perumal et al., October 2015. The keep-alive mechanism that detects path failure on a selected ICE pair. <https://datatracker.ietf.org/doc/html/rfc7675>
IETF RFC 8826, Security Considerations for WebRTC, E. Rescorla, January 2021. The security model that frames why TURN credentials must be short-lived and why DTLS-SRTP is mandatory. <https://datatracker.ietf.org/doc/html/rfc8826>
W3C WebRTC 1.0 Recommendation, WebRTC: Real-Time Communication Between Browsers, March 2023. The W3C surface that exposes ICE candidates as JavaScript objects on RTCPeerConnection. <https://www.w3.org/TR/webrtc/>
coturn project repository, the de-facto open-source TURN server reference implementation, current release 4.6.x as of 2026. <https://github.com/coturn/coturn>
eturnal documentation, the actively maintained ProcessOne TURN/STUN server. <https://eturnal.net/>
Pion TURN repository, the Go-language TURN toolkit popular in cloud-native deployments. <https://github.com/pion/turn>
WebRTC for the Curious, "Connecting", a high-quality open-source book by the Pion maintainers explaining ICE/STUN/TURN with code examples. Used as a secondary reference; the spec is always the source of truth. <https://webrtcforthecurious.com/docs/03-connecting/>
Cloudflare blog, "How Cloudflare Calls handles TURN traffic at the edge", 2025. First-party deployer engineering blog, used as a source for production TURN behaviour.

In any disagreement between sources, the article follows the controlling IETF specification and notes the discrepancy. Several popular vendor blog posts still show long-lived TURN credentials; we explicitly recommend short-lived HMAC-signed credentials instead.

Related glossary terms

STUN

Janus

mediasoup

TURN

WebRTC

Pion

LiveKit

QUIC