
Key takeaways
• SIP is not a fossil. 70%+ of enterprises still run SIP trunks and every PSTN dial-in on Zoom, Teams or Meet touches a SIP gateway — you cannot ship a serious video platform in 2026 without one.
• Three architectures, pick one. Self-hosted gateway (Kamailio + FreeSWITCH / RTPEngine), managed SBC, or a platform-native SIP add-on (LiveKit SIP, Chime SDK, Twilio, JIGASI). The choice drives 80% of your monthly bill.
• Codec mismatch is the silent killer. Opus on the WebRTC side, G.711/G.729 on the SIP side. Transcoding adds 20–50 ms, eats CPU, and a misconfigured SBC turns the mismatch into one-way or silent audio.
• Budget realistically. 100 concurrent PSTN legs cost roughly $1.2k–$2.4k per month — the SIP trunk is the biggest line, not compute.
• Build the bridge, not the PBX. Your product is video conferencing. Re-implementing SIP from scratch is a 12-month detour. Sit a thin gateway on top of Kamailio or use a managed SIP SDK — you will ship 4× faster.
Why Fora Soft wrote this playbook
At Fora Soft we have spent 21 years building the media plumbing behind 625+ video products: conferencing rooms, telemedicine consults, courtrooms, e-learning classes and corporate training hubs. Almost every enterprise customer eventually asks some version of the same three questions: “Can the executive on the Cisco room system join the WebRTC call?” “Can clients dial in from a plain phone?” “Does your platform handle the PBX we already paid for?” The answer to all three is SIP integration.
This playbook is the condensed version of the runbook we hand new video engineers. It leans on projects like Valt (remote interrogation video platform for US law enforcement), our telemedicine builds where patients need to dial in from a landline, and corporate training platforms where 40% of participants still join from a Cisco or Poly room.
If you are building, scaling or migrating a video conferencing platform and SIP keeps coming up, the rest of this article tells you exactly what to build, what to buy, and where teams burn money.
Need a SIP bridge for your video conferencing platform?
We have shipped SIP-to-WebRTC bridges on top of Kamailio, FreeSWITCH, Janus, LiveKit and Chime SDK for 20+ enterprise customers. Let’s scope yours.
What SIP actually is in 2026 (in 90 seconds)
SIP (Session Initiation Protocol, RFC 3261) is the signalling layer of business telephony. It negotiates who is calling whom, on what codec, via which IP address. It does not carry audio or video — that is RTP’s job. SIP is what makes the phone ring; RTP is the sound coming out.
The spec has not changed materially since 2002. What changed is the transport. In 2026 a WebRTC browser can speak SIP over WebSocket (RFC 7118) and end up talking to a 2008 Polycom phone via the same Kamailio proxy that fronts your enterprise PBX. The signalling is identical; only the wrappers differ.
| Layer | Protocol | Typical ports | Notes (2026) |
|---|---|---|---|
| Signalling | SIP over UDP / TCP / TLS / WSS | 5060 (UDP/TCP), 5061 (TLS), 443 (WSS) | TLS is effectively mandatory for carriers |
| Session description | SDP | — | Where the codec war happens |
| Media | RTP / SRTP / DTLS-SRTP | 16384–32767 (UDP) | DTLS-SRTP is the default on WebRTC side |
| NAT traversal | ICE / STUN / TURN | 3478, 5349 | Legacy SIP endpoints do not speak ICE |
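
To make the signalling/media split concrete, here is a minimal Python sketch that takes one SIP INVITE apart. The message itself is hypothetical (hosts, tags, branch and Call-ID are invented), and the parser is deliberately naive: it ignores header folding, compact header names and repeated headers, which a real SIP stack has to handle.

```python
# Hypothetical INVITE: hosts, tags, branch and Call-ID are invented for illustration.
# The media line uses plain RTP/AVP only to keep the example short.
RAW_INVITE = (
    "INVITE sip:meeting-123@conf.example.com SIP/2.0\r\n"
    "Via: SIP/2.0/TLS gw.example.com:5061;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:room7@example.com>;tag=1928301774\r\n"
    "To: <sip:meeting-123@conf.example.com>\r\n"
    "Call-ID: a84b4c76e66710@gw.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=- 0 0 IN IP4 203.0.113.10\r\n"
    "s=-\r\n"
    "c=IN IP4 203.0.113.10\r\n"
    "t=0 0\r\n"
    "m=audio 16384 RTP/AVP 0 8\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
)


def split_sip_message(raw):
    """Split a SIP message into request line, header dict and body (the SDP)."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body


def first_audio_target(sdp):
    """Return (ip, port, profile) for the first audio stream in the SDP."""
    ip = None
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line.split()[-1]
        elif line.startswith("m=audio "):
            _media, port, profile = line.split()[:3]
            return ip, int(port), profile
    return None


request_line, headers, sdp = split_sip_message(RAW_INVITE)
print(request_line)             # signalling: who is being called
print(headers["call-id"])       # signalling: which dialog this belongs to
print(first_audio_target(sdp))  # media: where RTP should be sent
```

Everything above the blank line is SIP (who, which dialog, which transport); everything below it is SDP, which tells the far end where to send RTP and in which codecs.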
Why enterprises still need SIP integration in 2026
We keep getting the same five reasons from enterprise buyers. Each one turns a pretty WebRTC product into a “nice demo, does not meet procurement requirements” rejection unless you bridge SIP.
1. PSTN dial-in. An executive joining from an airport lounge phone. A witness calling into a legal hearing. A doctor dialing in from a clinic landline. None of that works without a SIP trunk and a PSTN-capable gateway.
2. Room systems (Cisco, Poly, Lifesize, Yealink). Most boardrooms built between 2012 and 2022 are H.323 / SIP boxes. They are stable, amortised, and their IT owners will not rip them out. Zoom's Conference Room Connector, Microsoft Teams Cloud Video Interop and Meet-via-Pexip all exist precisely because bridging these is the only way in.
3. Hotel, fire-alarm and safety phones. Fire-alarm panels are legally required to dial PSTN. Hotel room phones are SIP-wired to the property PBX. When a telemedicine platform goes into a hospital, half the wards still have wall-mounted handsets.
4. Call-center and contact-center integration. Genesys, Avaya, Five9, Amazon Connect, Asterisk-based contact centres all speak SIP. The path from “customer waits in IVR” to “customer joins video agent” runs through a SIP bridge.
5. Compliance recording and analog fax. HIPAA, PCI-DSS and SOC 2 audit recording platforms tap the SIP trunk. Some legal and medical workflows still require faxing. Both live on SIP, not WebRTC.
The three bridging architectures — pick exactly one
Every SIP–WebRTC bridge in production is one of these three patterns. Mixing them will burn you. Pick based on scale, compliance requirements and how much SIP plumbing you want to own.
| Architecture | Core components | Pros | Cons |
|---|---|---|---|
| A. Self-hosted gateway + transcoder | Kamailio or OpenSIPS, FreeSWITCH or Asterisk, RTPEngine, Janus / mediasoup / Jitsi | Full control, no per-minute fees, any compliance profile | Ops load, on-call rotation, real SIP expertise required |
| B. Managed SBC + SIP SDK | Oracle / AudioCodes / Ribbon SBC or Twilio / Telnyx Elastic SIP + SDK | Vendor handles carrier issues, high availability out of the box | Per-minute fees, vendor lock-in, slower feature turnaround |
| C. Platform-native SIP add-on | LiveKit SIP, AWS Chime SDK SIP Media App, Jitsi JIGASI, Vonage SIP Connect | Fastest path to production, integrates cleanly with your SFU | Tied to a specific platform, limited SBC features |
Reach for architecture A when: you are handling regulated workloads (HIPAA, SOC 2, government), forecast >500 concurrent PSTN legs, and already run Kubernetes or bare metal at Hetzner / OVH / AWS.
Reach for architecture B when: you have enterprise customers with existing PBX contracts and need carrier-grade uptime (99.99%+) without building an on-call team. Budget $0.005–$0.015 per minute plus DID fees.
Reach for architecture C when: your SFU is already LiveKit, Chime or Jitsi, you need PSTN within a sprint, and volumes are under ~200 concurrent legs. This is our default recommendation for startups.
Codec negotiation: where bridges quietly die
The SIP side speaks G.711 (μ-law / A-law, 64 kbps, no compression), sometimes G.729 (8 kbps, licensed), rarely G.722 (wideband). The WebRTC side speaks Opus at 10–120 kbps. Some endpoints advertise VP8/VP9/H.264 for video; many SIP room systems only speak H.264. If your SBC cannot transcode, the session sets up and nothing happens — the infamous one-way or no-audio call.
Asterisk 13.12+ transcodes Opus natively. FreeSWITCH 1.10+ handles it via mod_opus. RTPEngine transcodes Opus↔G.711 with ~5 ms of processing per hop on a modern Xeon; codec framing and buffering account for the rest of the 20–50 ms transcoding budget. Video transcoding is the expensive one: H.264↔VP9 costs ~0.5–1 vCPU per concurrent call. Budget accordingly.
Practical defaults
Audio: Advertise Opus first, G.711 as fallback, G.722 only if you control both ends. Avoid G.729 unless a customer demands it (licence fees per channel).
Video: H.264 constrained baseline on the bridge, VP8 as fallback. Let VP9 and AV1 stay WebRTC-only unless you can afford the transcode.
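The audio defaults above can be sketched as a small planner that reads each leg's SDP offer, picks the preferred codec per leg, and reports whether the bridge has to transcode. This is an illustrative sketch, not how any particular SBC implements negotiation: the function names are hypothetical, the SDP snippets are trimmed to the lines that matter, and the parser assumes an audio-only SDP.

```python
# Preference order from the defaults above: Opus first, G.711 as fallback, G.722 last.
AUDIO_PREFERENCE = ["OPUS", "PCMU", "PCMA", "G722"]

# Static RTP payload types that may appear without an a=rtpmap line.
STATIC_PAYLOADS = {0: "PCMU", 8: "PCMA", 9: "G722", 18: "G729"}


def offered_codecs(sdp):
    """Return codec names offered on the audio m-line, in offer order.

    Simplification: assumes an audio-only SDP, so payload numbers cannot
    collide with a video section.
    """
    payload_types = []
    names = dict(STATIC_PAYLOADS)
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            payload_types = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[int(pt)] = encoding.split("/")[0].upper()
    return [names[pt] for pt in payload_types if pt in names]


def pick_codec(codecs, preference=AUDIO_PREFERENCE):
    """Pick the first codec from the preference list that the leg offers."""
    for codec in preference:
        if codec in codecs:
            return codec
    return None


def plan_bridge(webrtc_sdp, sip_sdp):
    """Choose a codec per leg and flag whether transcoding is required."""
    webrtc_codec = pick_codec(offered_codecs(webrtc_sdp))
    sip_codec = pick_codec(offered_codecs(sip_sdp))
    if webrtc_codec is None or sip_codec is None:
        raise ValueError("no supported audio codec on one leg; reject the call")
    return {"webrtc": webrtc_codec, "sip": sip_codec,
            "transcode": webrtc_codec != sip_codec}


# Trimmed, hypothetical offers: a browser leg and a G.711-only desk phone.
WEBRTC_OFFER = "m=audio 9 UDP/TLS/RTP/SAVPF 111 0 8\r\na=rtpmap:111 opus/48000/2\r\n"
SIP_OFFER = "m=audio 16384 RTP/SAVP 0 8\r\n"

print(plan_bridge(WEBRTC_OFFER, SIP_OFFER))
```

A design alternative worth weighing: browsers also offer G.711, so a bridge that prefers a codec shared by both legs can skip transcoding entirely, trading Opus quality and loss resilience on the WebRTC leg for lower CPU and latency.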
PSTN trunks compared: Twilio, Telnyx, Bandwidth, Vonage, Plivo
| Provider | Outbound / min (US) | DID / month | Best for |
|---|---|---|---|
| Twilio Elastic SIP | $0.0053–$0.042 | ~$1.00 | Global coverage, ecosystem breadth |
| Telnyx | $0.005 | ~$0.80 | Lowest per-minute, engineer-friendly API |
| Bandwidth | Custom (volume) | Custom | US enterprise, E911, toll-free |
| Vonage | $0.010+ | ~$1.00 | Combined voice + video API stack |
| Plivo | $0.0055 | ~$0.80 | Budget-sensitive, high-volume SMS+voice |
We default to Telnyx for new builds on cost, Twilio when the client already has an account or needs deep global DID coverage, Bandwidth when E911 is in scope. Multi-provider failover (DNS SRV + health probes) adds 2–3 engineer-days and cuts carrier-outage risk dramatically.
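As a rough illustration of the DNS SRV plus health-probe failover mentioned above, the sketch below orders trunk targets in the spirit of RFC 2782: lowest priority value first, weighted random choice within a priority, unhealthy targets skipped. The host names and the `healthy` set are hypothetical; in a real deployment the records would come from a DNS lookup and the health set from your OPTIONS probes.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is tried first
    weight: int    # relative share among records of equal priority
    target: str
    port: int


def _weighted_pick(group, rng):
    """Pick one record from a same-priority group, proportionally to weight."""
    total = sum(record.weight for record in group)
    if total == 0:
        return rng.choice(group)
    point = rng.uniform(0, total)
    running = 0
    for record in group:
        running += record.weight
        if point <= running:
            return record
    return group[-1]  # guard against floating-point edge cases


def order_targets(records, healthy, rng=random):
    """Return healthy SRV targets in the order a call attempt should try them."""
    ordered = []
    for priority in sorted({record.priority for record in records}):
        group = [r for r in records if r.priority == priority and r.target in healthy]
        while group:
            pick = _weighted_pick(group, rng)
            ordered.append(pick)
            group.remove(pick)
    return ordered


# Hypothetical records: one primary trunk, one secondary trunk.
RECORDS = [
    SrvRecord(10, 100, "sip.primary-carrier.example.net", 5061),
    SrvRecord(20, 100, "sip.secondary-carrier.example.net", 5061),
]

# The primary failed its health probe, so only the secondary is returned.
print(order_targets(RECORDS, healthy={"sip.secondary-carrier.example.net"}))
```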
Managed SIP–WebRTC gateways compared
| Gateway | License | Pricing shape | Best for |
|---|---|---|---|
| Jitsi JIGASI | Apache 2.0 | Self-hosted, infra only | Teams running Jitsi Meet |
| LiveKit SIP | Apache 2.0 + SaaS | Cloud: bundled in data pricing | LiveKit-based video apps |
| AWS Chime SDK SIP Media App | Proprietary | ~$0.0022 / min inbound | AWS-native stacks |
| Twilio Voice + Video | Proprietary | Per-minute voice + per-participant video | Rapid prototypes, global DIDs |
| Daily.co | Proprietary | ~$0.004 / participant-min | Small teams shipping fast |
Security: SIP over TLS, SRTP, and what HIPAA actually requires
Signalling. SIP over TLS 1.3 on 5061 is non-negotiable for anything leaving your VPC. Plain 5060 is acceptable only on internal networks behind a strict firewall.
Media. DTLS-SRTP on the WebRTC side, SRTP (SDES) on the SIP side. RTPEngine or your SBC terminates both and re-keys. If you drop to plain RTP anywhere in the flow, you have failed a HIPAA audit.
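One cheap guard against an accidental drop to plain RTP is to inspect the transport profile on every SDP media line before the bridge accepts it. The following is a minimal sketch with a hypothetical function name; it only looks at audio and video lines and does not verify that keying material (a=crypto or a DTLS fingerprint) is actually present.

```python
# Transport profiles that imply encrypted media (SDES-SRTP or DTLS-SRTP).
SECURE_PROFILES = {
    "RTP/SAVP",
    "RTP/SAVPF",
    "UDP/TLS/RTP/SAVP",
    "UDP/TLS/RTP/SAVPF",
}


def insecure_media_lines(sdp):
    """Return the audio/video m-lines that would carry unencrypted RTP."""
    flagged = []
    for line in sdp.splitlines():
        if not line.startswith(("m=audio ", "m=video ")):
            continue
        parts = line.split()
        if len(parts) < 3:
            continue  # malformed line; leave it to the SIP stack to reject
        port, profile = int(parts[1]), parts[2]
        if port == 0:
            continue  # port 0 means the stream is disabled
        if profile not in SECURE_PROFILES:
            flagged.append(line)
    return flagged


# Hypothetical offer from a legacy trunk: audio is plain RTP, video is SRTP.
offer = "m=audio 16384 RTP/AVP 0 8\r\nm=video 16386 RTP/SAVP 96\r\n"
print(insecure_media_lines(offer))
```

Running this check at the bridge edge and rejecting (or alerting on) any flagged line turns a silent compliance failure into a visible one.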
End-to-end encryption. True E2EE (SFrame, MLS) is still impossible when one leg is a classic SIP phone — the SBC must decrypt to transcode. Be explicit with customers: bridges give you hop-by-hop encryption, not E2EE.
HIPAA 2025 refresh. Signed BAAs required with every vendor in the media path (Twilio, AWS, Telnyx all offer these). Encryption at rest mandated for recordings, MFA on every admin console, auditable access logs with 6-year retention.
Bridging SIP into a regulated video platform?
We have shipped HIPAA and SOC 2 compliant video products with SIP dial-in for telehealth, legal and government. We can audit your current bridge or build the next one.
Quality and latency budget
ITU-T G.114 says mouth-to-ear latency <150 ms is transparent, 150–400 ms is acceptable with effort, >400 ms is noticeably awful. Each hop in a SIP–WebRTC bridge adds: 20–50 ms codec transcoding, 20–50 ms jitter buffer, 30–80 ms inter-region network. Budget early.
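The budget arithmetic is simple enough to keep in a script next to your architecture diagram. This sketch sums the per-hop ranges quoted above and classifies the result against the G.114 thresholds; the `extra_ms` parameter is a hypothetical placeholder for anything else in your path (capture, encode, last-mile network).

```python
# Per-hop contributions quoted above, as (best case, worst case) in milliseconds.
BRIDGE_HOPS_MS = {
    "codec_transcoding": (20, 50),
    "jitter_buffer": (20, 50),
    "inter_region_network": (30, 80),
}


def bridge_budget(hops=BRIDGE_HOPS_MS, extra_ms=0):
    """Return (best, worst) one-way latency added by the bridge, in ms."""
    best = extra_ms + sum(low for low, _high in hops.values())
    worst = extra_ms + sum(high for _low, high in hops.values())
    return best, worst


def classify_g114(one_way_ms):
    """Map a mouth-to-ear latency onto the G.114 bands described above."""
    if one_way_ms < 150:
        return "transparent"
    if one_way_ms <= 400:
        return "acceptable with effort"
    return "unacceptable"


best, worst = bridge_budget()
print(best, classify_g114(best))    # 70 transparent
print(worst, classify_g114(worst))  # 180 acceptable with effort
```

Even before endpoint and last-mile delay are added, the worst case of a single bridge hop already exceeds the 150 ms "transparent" band, which is why stacking two transcoding hops is rarely a good idea.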
DTMF relay. Pick RTP telephone-events (RFC 4733, the successor to RFC 2833, which is the name most vendor UIs still use) as the default; fall back to SIP INFO for gateways that still cannot handle RTP events. Tones sent as in-band audio break on Opus, so never rely on them.
Loss resilience. Enable RED/FEC on Opus, PLC on G.711 legs, and tune jitter buffers to 60–120 ms on mobile SIP clients. Echo cancellation lives on the WebRTC side; SIP phones usually handle it themselves.
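For readers who have never looked inside an RTP DTMF event, the payload is only four bytes: event code, an end flag plus volume, and a duration in RTP timestamp units. The sketch below packs and unpacks that payload as defined in RFC 4733. It covers only the payload, not the RTP header, retransmission of the end packet or payload-type negotiation, and the function names are our own.

```python
import struct

# Event codes: digits 0-9 map to 0-9, then * = 10, # = 11, A-D = 12-15.
EVENT_CODES = {str(d): d for d in range(10)}
EVENT_CODES.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})
DIGITS = {code: digit for digit, code in EVENT_CODES.items()}


def encode_dtmf_event(digit, duration_ms, end=False, volume=10, clock_rate=8000):
    """Pack one telephone-event payload: event, E bit + volume, duration."""
    event = EVENT_CODES[digit]
    # Duration is expressed in RTP timestamp units and capped at 16 bits.
    duration = min(duration_ms * clock_rate // 1000, 0xFFFF)
    flags = (0x80 if end else 0x00) | (volume & 0x3F)  # reserved bit stays 0
    return struct.pack("!BBH", event, flags, duration)


def decode_dtmf_event(payload):
    """Unpack a telephone-event payload into (digit, end, volume, duration)."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return DIGITS.get(event), bool(flags & 0x80), flags & 0x3F, duration


payload = encode_dtmf_event("5", duration_ms=100, end=True)
print(payload.hex())               # 058a0320
print(decode_dtmf_event(payload))  # ('5', True, 10, 800)
```

Because the digit travels as data rather than as audio, it survives any codec on either leg, which is exactly why it is the safe default for PIN entry and IVR menus.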
Three call flows you will actually ship
Flow 1 — Click-to-call from web app to PSTN
Browser → SIP.js over WSS → Kamailio/OpenSIPS → RTPEngine (DTLS-SRTP ↔ SRTP, Opus ↔ G.711) → Telnyx / Twilio SIP trunk → PSTN number. Round-trip: 120–200 ms inside a region. This is the default for sales dialers, support call-backs and appointment reminders.
Flow 2 — PSTN dial-in to a WebRTC room via PIN
Phone user dials a DID → SIP trunk → your SBC → IVR (FreeSWITCH or Chime SIP Media App) prompts for room PIN → user joins the WebRTC room as audio-only participant. Your SFU (LiveKit, Janus, mediasoup) receives a bot participant publishing Opus audio transcoded from the G.711 leg.
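The PIN step in this flow boils down to a small state machine fed by DTMF digits. The sketch below is a hypothetical, framework-free version: the `rooms` mapping stands in for your room database, and the returned action tuples stand in for whatever your IVR (FreeSWITCH dialplan, Chime SIP Media App Lambda, and so on) actually does next.

```python
class PinCollector:
    """Collect DTMF digits until '#', then resolve the PIN to a room."""

    def __init__(self, rooms, max_attempts=3):
        self.rooms = rooms              # hypothetical PIN -> room-id mapping
        self.max_attempts = max_attempts
        self.attempts = 0
        self.buffer = ""

    def on_digit(self, digit):
        """Handle one DTMF digit and return (action, room_id)."""
        if digit != "#":
            if digit.isdigit():
                self.buffer += digit    # ignore '*' and A-D in this sketch
            return ("collect", None)

        pin, self.buffer = self.buffer, ""
        room = self.rooms.get(pin)
        if room is not None:
            return ("join", room)       # hand the leg over to the gateway bot

        self.attempts += 1
        if self.attempts >= self.max_attempts:
            return ("hangup", None)     # too many bad PINs
        return ("retry", None)          # replay the prompt


collector = PinCollector(rooms={"4821": "room-a"})
for digit in "4821#":
    action = collector.on_digit(digit)
print(action)  # ('join', 'room-a')
```

In production you would also add an inter-digit timeout and rate limiting per caller ID, since a PIN prompt on a public DID is an obvious brute-force target.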
Flow 3 — Cisco / Poly room system joining a WebRTC meeting
Room system dials a URI (sip:meeting-123@conf.example.com) → your SBC authenticates → gateway joins the WebRTC room as a video participant with H.264 baseline. The room camera shows up in the WebRTC UI like any other participant; screen share works via BFCP or via the gateway re-publishing a second stream.
Scaling and infra: capacity planning in one page
Concurrency targets. A single Kamailio proxy handles 1 000+ CPS signalling on modest hardware. RTPEngine pushes 500–1 000 concurrent audio media sessions per core on a modern AMD EPYC. Video transcode drops that to ~50–100 per core depending on resolution.
Ports. An RTP port range of 16384–32767 is 16 384 ports. With one RTP and one RTCP port per flow, that gives you ~8 000 concurrent media flows per interface. Signalling runs on 5060/5061 plus WSS on 443.
NAT / TURN. Browser-side TURN (coturn) for restrictive networks, static IPs on SBC for SIP trunk whitelisting, and turn off SIP ALG on every router in the path. ALGs rewrite SIP headers in creative, broken ways; every SIP engineer has the scars to prove it.
Geographic failover. DNS SRV records for signalling, Anycast IP for TURN, multi-region SBC pair with active-active signalling and sticky media (media follows where the RTP first landed). Measure p99 signalling round-trip against every trunk quarterly.
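The capacity figures above reduce to two lines of arithmetic, sketched here so you can plug in your own numbers. The assumptions are ours: each media flow consumes two UDP ports (RTP plus RTCP, no rtcp-mux), and the per-core session counts are the conservative ends of the ranges quoted in this section.

```python
import math


def max_media_flows(port_min=16384, port_max=32767, ports_per_flow=2):
    """How many concurrent flows fit into the RTP port range."""
    return (port_max - port_min + 1) // ports_per_flow


def cores_needed(concurrent_sessions, sessions_per_core):
    """Round up to whole cores for a given per-core session capacity."""
    return math.ceil(concurrent_sessions / sessions_per_core)


print(max_media_flows())        # 8192 flows in the default range
print(cores_needed(1200, 500))  # 3 cores for 1 200 audio legs at 500 per core
print(cores_needed(1200, 50))   # 24 cores if every leg needs video transcode
```

The gap between the last two numbers is the practical argument for keeping video transcoding off the bridge wherever both sides can agree on H.264.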
Mini case — bridging 600 courtroom sites into a WebRTC platform
Situation. A government client needed remote attorneys and witnesses to join courtroom hearings from PSTN phones, Cisco room systems in regional courthouses, and a browser-based WebRTC app. They had a single vendor SBC that could not transcode Opus and was charging per port.
12-week plan. Migrated signalling to a Kamailio pair (active-active), placed RTPEngine for media with Opus ↔ G.711 transcoding, connected Bandwidth as primary and Telnyx as secondary trunks with DNS SRV failover. Wired compliance recording off the SBC via SIPREC into an S3 bucket with object-lock for 7-year retention. Joined the WebRTC conference room via a LiveKit gateway bot.
Outcome. Per-port licence fees dropped to zero. Concurrent PSTN capacity grew from 120 to 1 200 with no hardware change. Cisco room systems joined via SIP in under 3 seconds (down from 11 s). Monthly savings: ~$14 000 against the old vendor SBC. Want a similar assessment?
Cost model — 100 concurrent PSTN legs on a modern bridge
| Line item | Audio-only self-hosted | Audio-only managed | Audio + video transcode |
|---|---|---|---|
| SIP trunk & DIDs | ~$810 | ~$1 200 | ~$1 200 |
| Kamailio + RTPEngine infra | ~$290 | — | ~$600 |
| Managed gateway fees | — | ~$580 | ~$540 |
| Recording storage (S3) | ~$60 | ~$60 | ~$100 |
| Monthly total | ~$1 160 | ~$1 840 | ~$2 440 |
Expect ~$11–$25 per concurrent leg per month steady-state. Building with Agent Engineering we typically compress the initial bridge engagement to 6–10 weeks of focused work; your mileage depends on compliance scope and how much of the SFU already exists.
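If you want to rerun the table with your own quotes, the model is just a sum per scenario divided by the number of legs. The figures below are copied from the table above; the dictionary keys are our own labels.

```python
CONCURRENT_LEGS = 100

# Monthly USD figures from the cost table above; 0 where the table shows a dash.
COST_MODEL = {
    "audio_only_self_hosted": {"trunk_and_dids": 810, "self_hosted_infra": 290,
                               "managed_gateway": 0, "recording_storage": 60},
    "audio_only_managed": {"trunk_and_dids": 1200, "self_hosted_infra": 0,
                           "managed_gateway": 580, "recording_storage": 60},
    "audio_video_transcode": {"trunk_and_dids": 1200, "self_hosted_infra": 600,
                              "managed_gateway": 540, "recording_storage": 100},
}


def monthly_total(line_items):
    """Sum all line items for one scenario."""
    return sum(line_items.values())


def cost_per_leg(line_items, legs=CONCURRENT_LEGS):
    """Monthly cost per concurrent leg for one scenario."""
    return monthly_total(line_items) / legs


for name, items in COST_MODEL.items():
    print(f"{name}: ${monthly_total(items)} total, ${cost_per_leg(items):.2f} per leg")
```

In all three scenarios the trunk and DID line is the largest single item, which is why carrier choice moves the bill more than instance sizing does.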
Five pitfalls we see on every SIP bridge audit
1. SIP ALG on the firewall. Consumer and mid-range enterprise routers enable SIP ALG by default. It rewrites headers, breaks ICE, mangles SDP and causes intermittent one-way audio. Disable it everywhere in the path.
2. Missing transcoding — silent calls. Opus advertised to an endpoint that only speaks G.711 produces a session that sets up and is perfectly quiet. Always force a known-good codec on the bridge side.
3. DTMF mismatch. In-band tones on Opus legs are destroyed by the codec. Normalise to RFC 2833 at the SBC; fall back to SIP INFO only for trunks that need it.
4. NAT asymmetry on legacy endpoints. Old SIP phones do not speak ICE. Use a public-IP SBC or a TURN-style media anchor so the phone only ever sees one IP:port pair.
5. Under-provisioned port range. The default 16384–32767 RTP range is ~16 000 ports, enough for 8 000 concurrent media flows per NIC. Teams hitting scale without expanding the range start rejecting calls at peak.
KPIs: what to measure on a SIP bridge
Quality KPIs. One-way mouth-to-ear latency p95 < 250 ms across the bridge. Post-Dial Delay (PDD) < 2 s on inbound. MOS ≥ 4.0 (POLQA, i.e. ITU-T P.863) on audio-only, ≥ 3.8 on transcoded video.
Business KPIs. Answer Seizure Ratio (ASR) ≥ 60%, Network Efficiency Ratio (NER) ≥ 95%. Monthly per-leg cost tracked against target ($15–$25).
Reliability KPIs. 99.99% signalling availability (4.3 min downtime/month). No single trunk >70% of total minutes — enforce multi-provider failover. Call setup success rate > 99.5% on registered devices.
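A few of these KPIs are easy to compute directly from CDRs and uptime data. The sketch below shows the downtime budget behind an availability target, ASR from attempt counts, and the trunk-concentration check; the function names and the sample trunk minutes are hypothetical.

```python
def downtime_budget_minutes(availability_pct, days=30):
    """Allowed downtime per period for a given availability target."""
    return (1 - availability_pct / 100) * days * 24 * 60


def answer_seizure_ratio(answered_calls, call_attempts):
    """ASR in percent: answered calls over total call attempts."""
    if call_attempts == 0:
        return 0.0
    return 100 * answered_calls / call_attempts


def over_concentrated_trunks(minutes_by_trunk, limit=0.70):
    """Return trunks carrying more than `limit` of total minutes."""
    total = sum(minutes_by_trunk.values())
    if total == 0:
        return []
    return [name for name, minutes in minutes_by_trunk.items()
            if minutes / total > limit]


print(round(downtime_budget_minutes(99.99), 2))  # 4.32 minutes per 30-day month
print(answer_seizure_ratio(6500, 10000))         # 65.0
print(over_concentrated_trunks({"bandwidth": 8000, "telnyx": 2000}))  # ['bandwidth']
```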
When to skip SIP and go pure WebRTC
SIP is worth the pain when enterprise procurement, PSTN dial-in, room systems, call centres or compliance recording are real requirements. It is a waste of time when none of those apply.
If your users are mobile-first consumers, all your traffic is browser-to-browser or browser-to-native-app, and no one has asked for PSTN dial-in, ship pure WebRTC on LiveKit, mediasoup or Janus. You can always add SIP later — the inverse (ripping out a bespoke SIP layer once it has grown roots) is a 6-month migration.
FAQ
Is SIP obsolete in 2026?
No. SIP is the signalling protocol for over 70% of enterprise telephony and the only path into PSTN. The baseline RFC (3261) is stable; what changed is the transport — WebSocket and TLS are now the defaults, and modern endpoints support ICE and DTLS-SRTP.
Do I need to build my own SBC?
Usually not. Start with Kamailio or OpenSIPS plus RTPEngine for self-hosted, or use a managed SIP SDK like LiveKit SIP, Chime SDK SIP Media App, or Twilio Voice. Commercial SBCs (Oracle, AudioCodes, Ribbon) make sense only at very large scale or when a specific certification is required.
How long does a SIP–WebRTC bridge take to build?
With existing WebRTC infra and a managed SIP SDK: 2–4 weeks for a production-ready PSTN dial-in. A self-hosted Kamailio + FreeSWITCH bridge with transcoding, failover and compliance recording usually needs 8–12 weeks with a focused team using Agent Engineering.
Can Cisco and Poly room systems join Zoom/Teams/Meet via SIP?
Yes. Zoom's Conference Room Connector and Microsoft Teams (through its Cloud Video Interop partners) provide SIP/H.323 gateways. Google Meet uses Pexip as the interoperability partner. If you are building a fourth platform, expect to replicate the same gateway layer for your enterprise deals.
Which codec should I use on the SIP side?
Advertise Opus first (if the other side supports it), with G.711 μ-law as guaranteed fallback. Avoid G.729 unless a customer specifically needs it: licence fees per channel are not worth the ~56 kbps saved per call (8 kbps versus G.711's 64 kbps).
Does SIP support end-to-end encryption?
SIP signalling is encrypted with TLS and media with SRTP/DTLS-SRTP, but as soon as the call traverses a transcoding gateway, true E2EE (SFrame, MLS) is broken. Tell customers honestly: bridges give you hop-by-hop encryption, not E2EE.
What is SIP ALG and why disable it?
SIP ALG is an “Application Layer Gateway” in many routers that rewrites SIP headers thinking it is being helpful. It usually mangles SDP, breaks ICE, and causes one-way audio. Every experienced SIP engineer will tell you to disable it at the firewall and handle NAT in your SBC instead.
How do I handle HIPAA when bridging SIP?
Sign BAAs with every vendor in the path (Twilio, Telnyx, AWS Chime all offer them). Use TLS 1.3 for signalling, SRTP/DTLS-SRTP for media, encryption at rest for recordings, MFA on all admin consoles, and retain access logs for 6 years.
Ready to bridge SIP into your video platform?
Pick one architecture, not three. Self-hosted Kamailio + RTPEngine for regulated scale, managed SBC + SIP SDK for enterprise uptime with zero ops, platform-native add-ons (LiveKit SIP, Chime, JIGASI, Twilio) for speed to market. Budget $15–$25 per concurrent leg per month steady state. Force a known-good codec at the edge, disable SIP ALG everywhere, and measure PDD, MOS and ASR from day one.
Done right, a SIP bridge unlocks every enterprise deal, every PSTN dial-in, every boardroom with a Cisco in the corner. Done poorly, it produces silent calls, one-way audio and six months of on-call pages. The difference is mostly engineering discipline and having seen it before.
Let’s build (or rescue) your SIP–WebRTC bridge
21 years of real-time media, 625+ shipped video products. Bring us your architecture and we will show you where the savings and the scars are.

