P2P, SFU, MCU, Hybrid: Which WebRTC Architecture Fits Your 2026 Roadmap?

Key takeaways

Architecture is a product decision, not a plumbing decision. P2P, SFU, MCU and hybrid pick your unit economics, your scaling ceiling, your mobile battery life, and your compliance posture. Get it wrong and you rewrite in year two.

The single most important question is peak participants per room. 1:1 → P2P with TURN fallback. 3–50 → SFU. 50–1,000+ broadcast-style → MCU or SFU + simulcast cascade. Anything mixed → hybrid.

SFU is the 2026 default. LiveKit, Janus, mediasoup, Amazon Chime SDK, Jitsi JVB, Dolby.io — all SFU-first. Simulcast + SVC scales to thousands of viewers with acceptable CPU and predictable cost.

MCU is a niche tool, not a default. Use it when you need server-side composition (layouts, branding, legal archival, low-power legacy clients). Expect 4–10x the compute cost of an SFU at the same participant count.

Hybrid is how real products ship. 1:1 over P2P, group over SFU, broadcast over SFU+simulcast cascade or MCU. TURN is mandatory — around 15–20% of corporate and mobile networks force media relay.

Why Fora Soft wrote this playbook

Fora Soft has been shipping WebRTC products for more than a decade and real-time video software for 21 years. Across 625+ delivered products, a large share are WebRTC-backed: video conferencing, telemedicine, live-education, e-commerce live-selling, courtroom archives, webinars, interactive sports, customer-support video, remote inspection, and LiveKit-based multimodal AI agents. We have debugged every architecture under every kind of corporate firewall.

This playbook is the scoping conversation we have with product teams in Week 1 of a WebRTC project. It replaces a lot of generic internet content with sharp, business-first recommendations. For the technical side of the house see our WebRTC architecture development services. For code-level comparisons between P2P, SFU and MCU see our older piece, P2P vs MCU vs SFU for video conferencing.

Picking between P2P, SFU and MCU for 2026?

A 30-minute call with a Fora Soft WebRTC engineer will answer the unit-economics questions that vendor websites never will.

The one question that decides the architecture

Before anything else, answer: what is the 95th-percentile concurrent participants per room? Not the best case, not the demo day, but the routine peak once your product is in real use.

Participants per room (p95) | Best baseline | Typical products
1:1 | P2P + TURN | Telehealth, customer support, quick consult, sales demo.
3–8 | Small SFU or P2P mesh with 4-participant cap | Huddle meetings, tutoring, team standups.
9–50 | SFU with simulcast | Virtual classrooms, workshops, medical grand rounds.
50–200 | SFU + active-speaker pin, server-side recording | Conferences, large training, town halls, live commerce.
200–1k broadcast | SFU cascade or MCU + HLS/LL-HLS egress | Webinars, investor events, keynote streaming.
1k–100k+ broadcast | Hybrid: SFU origin + LL-HLS CDN | Interactive sports, creator live-selling, fitness, esports.
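If you already log room sizes, the p95 figure is easy to compute. The sketch below uses the nearest-rank percentile method on a made-up sample; the function name `p95_peak` and the sample data are illustrative assumptions, not figures from a real deployment.

```python
def p95_peak(room_peaks: list[int]) -> int:
    """Nearest-rank 95th percentile of per-room peak participant counts."""
    if not room_peaks:
        raise ValueError("need at least one room sample")
    ordered = sorted(room_peaks)
    # Nearest-rank method: 1-based rank = ceil(0.95 * n), done in integers.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical sample: 100 rooms, mostly small, with one 300-person town hall.
samples = [2] * 60 + [6] * 25 + [12] * 10 + [40] * 4 + [300]
print("max:", max(samples), "p95:", p95_peak(samples))
```

On this sample the maximum is 300 but the p95 is 12, which lands in the 9–50 row rather than the broadcast rows. That is the point of asking for the routine peak instead of the demo-day peak.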

Peer-to-peer (P2P): fast, cheap, and bounded

In pure P2P, two browsers negotiate SDP via your signalling server and then exchange DTLS-SRTP media directly. No media server in the middle. Latency on a good network is 50–150 ms end to end. Server cost is near zero — you are paying only for signalling and TURN relay fallback.

Mesh P2P extends this to small groups: every participant sends their stream to every other. For N participants you get N×(N–1) streams. It works for 3–4 participants on decent devices; past 5 it burns mobile batteries and exhausts uplinks.
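To see why mesh stops working so quickly, it helps to count streams. This is a small sketch of the N×(N–1) arithmetic next to the equivalent SFU room; the function names are illustrative.

```python
def mesh_load(n: int) -> dict[str, int]:
    """Stream counts for a full P2P mesh of n participants."""
    return {
        "total_streams": n * (n - 1),   # every peer sends to every other peer
        "uplinks_per_client": n - 1,    # one upload per remote peer
        "downlinks_per_client": n - 1,
    }


def sfu_load(n: int) -> dict[str, int]:
    """Per-client stream counts when the same room runs through an SFU."""
    return {
        "uplinks_per_client": 1,        # publish once, the SFU fans out
        "downlinks_per_client": n - 1,
    }


for n in (2, 4, 6, 8):
    print(n, mesh_load(n), sfu_load(n))
```

Downlink count is the same in both models; the difference is the uplink. At 6 participants a mesh client uploads 5 copies of its video, while an SFU client still uploads one.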

Reach for P2P when: almost all your sessions are 1:1 or occasional 3-way, you care about end-to-end media privacy, and you want the lowest possible server bill. Telehealth, support chat, sales demos, notary video.

Non-negotiables. A TURN cluster is mandatory: coturn on 2–3 hosts is enough for an MVP, and for global coverage you can scale out to a managed service such as Cloudflare Calls TURN, Twilio, or Xirsys. Approximately 15–20% of corporate and mobile networks will fail direct ICE and require TURN relay. Skipping TURN looks cheaper on day one and explodes into support tickets on day 30.
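One practical detail with a self-hosted coturn cluster is how clients get credentials. A common approach is coturn's shared-secret mode (`use-auth-secret` with `static-auth-secret`), where your signalling server mints short-lived credentials: the username is `expiry:userId` and the password is the base64 HMAC-SHA1 of that username. The sketch below assumes that mode; the hostname `turn.example.com`, the secret, and the function name are placeholders.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def turn_credentials(shared_secret: str, user_id: str, ttl_s: int = 3600,
                     now: Optional[float] = None) -> dict:
    """Mint time-limited TURN credentials for coturn's shared-secret mode."""
    expiry = int((time.time() if now is None else now) + ttl_s)
    username = f"{expiry}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    # Shaped like one entry of the browser's RTCConfiguration.iceServers list.
    return {
        "urls": [
            "turn:turn.example.com:3478?transport=udp",   # placeholder host
            "turns:turn.example.com:5349?transport=tcp",  # TLS fallback for strict firewalls
        ],
        "username": username,
        "credential": base64.b64encode(digest).decode(),
    }


print(turn_credentials("replace-with-static-auth-secret", "alice", ttl_s=600))
```

The signalling server would return this object to the client alongside the room token, so no long-lived TURN password ever ships in the app.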

Selective Forwarding Unit (SFU): the 2026 default

An SFU is a media router. Each participant sends their stream once; the SFU forwards it selectively to every other participant who needs it. Critically, the SFU does not decode or re-encode the media. That keeps CPU usage per participant low and latency close to direct-path.

Modern SFUs exploit two bandwidth tricks: simulcast (publisher sends 3 layers at different resolutions/bitrates; SFU forwards whichever layer suits each subscriber) and SVC (scalable video coding) with VP9 or AV1 (single encoded stream with multiple extractable layers). Either gets you bandwidth-aware forwarding for free.
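The forwarding decision itself is simple. Here is a minimal sketch of per-subscriber layer selection, assuming three simulcast layers with the conventional `q`/`h`/`f` names; the resolutions, bitrates and function name are illustrative assumptions, and a production SFU would also smooth the bandwidth estimate before switching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    rid: str      # simulcast stream id
    height: int   # vertical resolution in pixels
    kbps: int     # assumed target bitrate


# Assumed ladder; real products tune these per codec and content type.
LAYERS = [Layer("q", 180, 150), Layer("h", 360, 500), Layer("f", 720, 1500)]


def pick_layer(tile_height_px: int, available_kbps: int, layers=LAYERS) -> Layer:
    """Pick the layer to forward to one subscriber for one publisher."""
    ordered = sorted(layers, key=lambda layer: layer.height)
    # Smallest layer that still covers the on-screen tile (or the top layer).
    wanted = next((l for l in ordered if l.height >= tile_height_px), ordered[-1])
    # Step down until the layer fits the subscriber's bandwidth estimate.
    affordable = [l for l in ordered
                  if l.height <= wanted.height and l.kbps <= available_kbps]
    return affordable[-1] if affordable else ordered[0]


print(pick_layer(tile_height_px=160, available_kbps=2000).rid)  # thumbnail tile
print(pick_layer(tile_height_px=700, available_kbps=600).rid)   # big tile, weak link
```

Because the SFU only switches between streams the publisher already encoded, this choice costs the server almost no CPU.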

The 2026 SFU shortlist

LiveKit. Open-source Go server plus managed cloud. Clean SDKs on all platforms, first-class AI agent integration, strong simulcast. Default pick for new projects. Our LiveKit multimodal AI guide has implementation detail.

mediasoup. Node.js SFU library with a C++ media worker. Small, composable, you wire it into your own server process. Used in many high-volume products. Preferred when you want deep control of routing logic.

Janus Gateway. C-based, plugin architecture, very mature. Great for broadcast-style plug-ins (streaming, sip-gateway, recording).

Jitsi Videobridge. Java SFU behind Jitsi Meet. Simple to operate, good defaults, free community, mature LastN / dominant-speaker logic.

Amazon Chime SDK, Dolby.io, Agora, Daily. Managed SFU as a service. You trade engineering for per-minute cost. Good if you want to ship before month three.

Multipoint Control Unit (MCU): mixer, composer, heavyweight

An MCU decodes every incoming stream, mixes them into a single composite, re-encodes, and sends one stream to each viewer. That one stream is much easier on end-user devices — any old smart TV, phone or set-top can play a single 1080p H.264 feed without a WebRTC client. The cost is the server: decoding and re-encoding N streams in real time is 4–10x the compute of an SFU at the same participant count.

Reach for an MCU when: you need server-side composition (branded layouts, speaker-focus cuts, PiP inlays, broadcast-grade graphics), legal archival with a single evidentiary file per session, SIP/PSTN bridging, or you must deliver to extremely low-power clients that cannot run a WebRTC stack.

Common MCU tooling: Pexip Infinity (enterprise), Kurento (open-source, now mostly maintained as OpenVidu), Wowza Streaming Engine, Dolby.io Interactivity, specialist broadcast mixers. A pure MCU deployment at 100+ participants generally costs more to run than an SFU feeding HLS out to 10,000 viewers.

Hybrid architectures: how real products ship

Serious WebRTC products rarely pick one model. A typical hybrid pattern on a modern meeting or creator-commerce product looks like this:

1. Signalling control plane. One service (your own or LiveKit / Chime) handles auth, rooms, presence.

2. P2P for 1:1. Under a certain room-size threshold (typically 2–3), drop the SFU and connect peers directly. Instantly cheaper, typically lower latency.

3. SFU for groups. Above that threshold, escalate to an SFU with simulcast and dominant-speaker forwarding.

4. SFU cascade + LL-HLS for broadcast. Past a few hundred subscribers, mirror the SFU’s active-speaker feed into an LL-HLS egress on Cloudflare, Mux or AWS MediaLive; watchers join via HLS without touching WebRTC.

5. Optional MCU for composition. A side MCU instance can produce a recording or a branded broadcast feed from the same SFU source.

6. TURN everywhere. Below all of the above.
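The escalation logic in steps 2–6 can be sketched as a small routing function in the signalling control plane. The thresholds (`p2p_max=2`, `webrtc_audience_max=300`) and the component names are assumptions chosen to match the ranges above, not fixed rules.

```python
def media_path(publishers: int, viewers: int = 0, *, p2p_max: int = 2,
               webrtc_audience_max: int = 300,
               needs_composition: bool = False) -> list[str]:
    """Return the media components a session should use, in order."""
    if publishers < 1:
        raise ValueError("a session needs at least one publisher")
    # Small interactive sessions with no audience stay peer-to-peer.
    if viewers == 0 and publishers <= p2p_max and not needs_composition:
        return ["p2p", "turn"]
    path = ["sfu"]
    # Past a few hundred subscribers, push the audience onto LL-HLS.
    if publishers + viewers > webrtc_audience_max:
        path.append("ll-hls-egress")
    # Optional side MCU for recordings or a branded broadcast feed.
    if needs_composition:
        path.append("mcu")
    path.append("turn")  # TURN sits underneath every WebRTC path
    return path


print(media_path(2))                  # 1:1 consult
print(media_path(8))                  # team meeting
print(media_path(4, viewers=5000))    # webinar with a large audience
```

Keeping this decision in one function is also what makes a later architecture change cheap: the UI asks for a path and does not care which media plane answers.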

Need a hybrid architecture that scales clean?

We have shipped LiveKit, mediasoup, Janus, Jitsi, Agora and Chime SDK deployments. Bring your traffic curve; we will design the right topology.

Cost models: what each architecture does to your bill

The headline unit is $-per-participant-minute. Ranges below are ballpark for commodity cloud (AWS / Hetzner blend) in 2026; adjust for region and volume discounts.

Architecture | Server $/min/participant | Egress profile | Hidden costs
P2P + TURN | $0.0001–$0.0008 | Near-zero except on TURN relays (15–20% of sessions). | TURN egress spikes; no server recording.
Self-hosted SFU | $0.0008–$0.003 | Predictable, scales with subscribers. | Ops burden, autoscaling, region placement.
Managed SFU (LiveKit Cloud, Daily, Chime) | $0.003–$0.012 | Included in per-min price. | Vendor lock-in; recording add-ons.
MCU (self-hosted) | $0.008–$0.025 | Only output streams; smaller than SFU. | CPU/GPU-bound; limited per-host participant count.
SFU + LL-HLS egress | $0.0008–$0.003 SFU + CDN egress | CDN egress $0.008–$0.025/GB. | DRM & manifest packaging overhead.
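To turn the per-minute ranges into a monthly figure, multiply by your participant-minutes. The sketch below does exactly that with the server ranges from the table; the 5 million participant-minutes volume is a hypothetical example, and CDN egress for the LL-HLS row is deliberately left out because it is priced per GB.

```python
# Server $/min/participant ranges, copied from the table above.
RATES = {
    "p2p_turn": (0.0001, 0.0008),
    "self_hosted_sfu": (0.0008, 0.003),
    "managed_sfu": (0.003, 0.012),
    "mcu": (0.008, 0.025),
}


def monthly_cost(architecture: str, participant_minutes: int) -> tuple[float, float]:
    """Return the (low, high) monthly server cost in dollars."""
    low, high = RATES[architecture]
    return (round(low * participant_minutes, 2), round(high * participant_minutes, 2))


volume = 5_000_000  # hypothetical participant-minutes per month
for name in RATES:
    low, high = monthly_cost(name, volume)
    print(f"{name}: ${low:,.0f} to ${high:,.0f}")
```

At that volume the managed and self-hosted SFU ranges meet at about $15,000 per month, so the managed-versus-self-hosted answer depends on which end of each range your traffic actually sits at.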

Cloud host choice has a large effect. Our bias for media-heavy workloads is Hetzner AX-series or dedicated Equinix Metal for steady load, with AWS / GCP bursts for peaks; see AWS vs DigitalOcean vs Hetzner for the TCO breakdown.

Security and compliance by architecture

WebRTC mandates DTLS-SRTP for media: every path is encrypted in transit. The differences show up in metadata exposure, server-side processing, and compliance obligations.

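P2P. Media never touches a media server; when a session falls back to TURN, the relay forwards packets that are still encrypted end to end. End-to-end confidentiality is the default; subpoena exposure on your infrastructure is minimal. This is why telehealth and legal consult apps often start here.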

SFU. The SFU forwards media without decoding or re-encoding it, but in standard WebRTC the DTLS-SRTP session terminates at the SFU, so the server can technically access the media payload. For stricter privacy (“the server should not be able to see my video under any circumstance”), add frame-level end-to-end encryption with Insertable Streams (encoded transforms), which current Chrome and Safari support. Most regulated clients we work with enable this for healthcare and finance workloads.

MCU. By definition, the server decodes and re-encodes. No media-layer E2EE is possible. Compensate with hardened infrastructure, private networking, short-lived tokens, server-side DLP, and audit logs. Still compatible with HIPAA and PCI with the right contracts.

Hybrid. Apply the right model per session. Regulated 1:1 goes P2P with E2EE. Group training goes SFU with E2EE frames. Marketing broadcast goes MCU or CDN with SSO gating.
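For the E2EE cases, the server-independent part is key handling. As a rough illustration of per-session key rotation, the sketch below derives a fresh 256-bit frame key per session and epoch with HKDF-SHA256 (RFC 5869) from a room secret. Everything here is an assumption for illustration: the `info` label, the function names, and the idea that clients share `room_secret` out of band. The actual frame encryption runs in the client's encoded transform and is not shown.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def session_key(room_secret: bytes, session_id: str, epoch: int) -> bytes:
    """Derive the frame-encryption key for one session and rotation epoch."""
    info = f"e2ee-frame-key:{session_id}:{epoch}".encode()  # hypothetical label
    # An all-zero salt of hash length is what RFC 5869 specifies when none is given.
    return hkdf_sha256(room_secret, salt=b"\x00" * 32, info=info)


secret = b"shared-out-of-band-never-sent-to-the-sfu"
print(session_key(secret, "consult-123", epoch=0).hex())
print(session_key(secret, "consult-123", epoch=1).hex())
```

Bumping `epoch` when a participant joins or leaves gives each membership change its own key without redistributing the room secret.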

The AI layer on top — noise suppression, transcription, agents

AI capabilities are increasingly the reason customers pick one WebRTC product over another. Five high-value integrations keep recurring in our 2026 engagements:

Noise suppression. RNNoise, Krisp SDK, Dolby Voice. Run on the publisher where possible; on the SFU where necessary.

Real-time transcription / captions. Deepgram, AssemblyAI, Whisper-large. Tap into the SFU’s audio forwarding with a server-side consumer. Our noise-robust ASR guide covers the accuracy side.

Real-time translation. Strong use case for global sales and events; see our real-time meeting translation comparison.

Synthetic voice agents. LiveKit + OpenAI/ElevenLabs makes a voice agent a first-class participant in the room. Huge in customer support, scheduling, and onboarding.

Recording and smart summarisation. Record the SFU’s composite (or an MCU output), run a post-call LLM to generate action items, push to CRM. Shipping this used to be a quarter of work; with Agent Engineering we now deliver it in 3–4 weeks.

Mobile: the architecture that kills batteries fastest

Mobile clients change the math. P2P mesh with four or more participants will drain a mid-range Android in under an hour; SFU with simulcast plus hardware H.264 typically stretches that to 3–4 hours under continuous use.

Best practices we enforce on every WebRTC mobile app. Hardware-accelerated codecs only (H.264 baseline / H.265 where supported); aggressive simulcast layer selection based on video tile size; screen-off and PiP pause policy; dynamic bitrate cap on low battery / thermal-throttle; audio in communication mode on Android (AudioManager.MODE_IN_COMMUNICATION); on iOS, AVAudioSession category playAndRecord with mode voiceChat (AVAudioSessionModeVoiceChat); proper lifecycle handling so tracks actually release on backgrounding.

Mini case — a hybrid platform from 1:1 telehealth to 50k-viewer live

Situation. A European digital-health company came to us with an existing P2P telehealth product and a new product line: live group patient-education sessions with up to 500 concurrent viewers, expected to peak around 50k during campaign weeks. HIPAA-adjacent, GDPR strict. Budget sensitive.

The 14-week plan. Weeks 1–3: architecture design, LiveKit pilot, TURN hardening, Insertable Streams E2EE proof. Weeks 4–7: SFU production rollout with simulcast; P2P retained for 1:1 consults. Weeks 8–10: LL-HLS egress cascade from SFU to CloudFront for big-audience days. Weeks 11–12: live transcription + captions, recording, CRM handoff. Weeks 13–14: load test at 3x projected peak, audit logging, GDPR review.

Outcome. Median publish-to-subscribe latency dropped to 180 ms on LTE; CDN-backed broadcast held sub-4 s for 40k+ concurrent viewers at a marginal cost roughly 1/6 of running a pure SFU at that scale. Existing 1:1 flows kept their P2P costs unchanged. The client’s infra bill per paid participant-minute fell by ~35% versus the managed Agora stack they had been scoping against.

The 2026 WebRTC vendor and tooling landscape

Category | Open source | Managed
SFU | LiveKit, mediasoup, Janus, Jitsi JVB, ion-sfu | LiveKit Cloud, Daily, Dolby.io, Chime SDK, Agora.
MCU / composition | OpenVidu (ex-Kurento), Jitsi Jibri | Pexip, Wowza Engine, Dolby Interactivity.
TURN | coturn, pion/turn | Cloudflare Calls, Twilio, Xirsys, Vonage.
Recording | Jibri, Headless Chrome + FFmpeg, LiveKit egress | Dolby, Agora Cloud Recording, Daily recording.
Noise suppression | RNNoise | Krisp SDK, Dolby Voice.
Observability | webrtc-internals, Prometheus, callstats OSS | testRTC, Spearline, callstats.io, LiveKit Analytics.

Development effort estimates (Agent-Engineering-adjusted)

Ballpark hours for building each layer from scratch with the tooling above. Non-accelerated teams should plan for 40–60% more. All figures are rough ranges.

Layer | Hours | Notes
P2P MVP (1:1 web) | 120–180 | Signalling, peer lifecycle, TURN integration.
SFU integration (LiveKit / mediasoup) | 220–360 | Rooms, roles, simulcast, active-speaker UI.
Recording + CDN egress | 120–200 | Server-side composite, LL-HLS fanout.
E2EE frames / insertable streams | 80–140 | Key management, per-session rotation.
AI layer (ASR, translation, noise) | 100–180 | Depends on cloud vs on-prem model.
Full hybrid MVP | ~640–1,060 | 4–7 months elapsed with 3–5 engineers.

A decision framework: pick an architecture in five questions

Q1. What is your p95 concurrent participants per room? 1:1 → P2P. 3–50 → SFU. 50+ broadcast → SFU + LL-HLS or MCU.

Q2. Does the server need to decode media? (Composition, legal archive, PSTN bridge, low-power clients.) Yes → MCU in the mix. No → SFU only.

Q3. What is your compliance bar? HIPAA / PCI / GDPR-strict → SFU with E2EE frames or P2P. Commercial only → any model.

Q4. What is your time to market? Under 3 months → managed SFU (LiveKit Cloud, Daily, Chime). 6+ months → self-hosted OSS SFU.

Q5. Will the product have broadcast mode? Yes → plan cascade/egress from day one. No → revisit if the roadmap shifts.
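The five questions compose into a short checklist function. This is a sketch of the framework as written, not a substitute for the scoping conversation; the `Answers` fields and the wording of each recommendation are illustrative, and the 3–6 month window for Q4 is left without a recommendation because the framework does not give one.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    p95_participants: int      # Q1
    server_must_decode: bool   # Q2: composition, legal archive, PSTN, low-power clients
    regulated: bool            # Q3: HIPAA / PCI / GDPR-strict
    months_to_launch: int      # Q4
    broadcast_mode: bool       # Q5


def recommend(a: Answers) -> list[str]:
    picks = []
    if a.p95_participants <= 2:
        picks.append("P2P + TURN")
    elif a.p95_participants <= 50:
        picks.append("SFU with simulcast")
    else:
        picks.append("SFU + LL-HLS egress or MCU")
    if a.server_must_decode:
        picks.append("MCU in the mix")
    if a.regulated:
        picks.append("E2EE frames on the SFU, or P2P")
    if a.p95_participants > 2:  # hosting choice only matters once an SFU is involved
        if a.months_to_launch < 3:
            picks.append("managed SFU")
        elif a.months_to_launch >= 6:
            picks.append("self-hosted OSS SFU")
    if a.broadcast_mode:
        picks.append("plan cascade/egress from day one")
    return picks


print(recommend(Answers(30, False, True, 2, False)))
```

Note that a "yes" on both Q2 and Q3 is a real tension: an MCU cannot carry media-layer E2EE, so those sessions need the compensating controls described in the security section.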

Five pitfalls that kill WebRTC projects

1. Skipping TURN. Corporate firewalls and mobile carrier NATs will sink 15–20% of sessions without it.

2. Picking MCU as a default. MCU sounds “more professional” but it is 4–10x the bill for most meeting use cases. Use SFU unless you genuinely need composition.

3. Ignoring simulcast / SVC. A 50-person meeting without simulcast wastes bandwidth on the viewer with the smallest thumbnail. Turn on simulcast from day one.

4. Single-region SFU. A London-only SFU serving Tokyo publishers is painful. Plan for at least two regions past a few hundred MAU.

5. No QoE instrumentation. You cannot debug what you cannot see. Log bitrate, packet loss, jitter and RTT from day one; push them to a dashboard with alerting.
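For pitfall 5, the raw material is the WebRTC stats API, which reports cumulative counters. A minimal sketch of turning two snapshots into the four numbers worth dashboarding is below. The field names (`bytesReceived`, `packetsReceived`, `packetsLost`, `jitter`, `currentRoundTripTime`, `timestamp`) follow the standard stats dictionaries; merging the inbound-rtp and candidate-pair reports into one flat dict is an assumption of this example.

```python
def qoe_sample(prev: dict, curr: dict) -> dict:
    """Derive per-interval QoE metrics from two cumulative stats snapshots."""
    elapsed_ms = curr["timestamp"] - prev["timestamp"]
    if elapsed_ms <= 0:
        raise ValueError("snapshots must be in chronological order")
    received = curr["packetsReceived"] - prev["packetsReceived"]
    # packetsLost can decrease when late packets arrive, so clamp at zero.
    lost = max(0, curr["packetsLost"] - prev["packetsLost"])
    expected = received + lost
    return {
        # bits per millisecond is numerically equal to kilobits per second
        "bitrate_kbps": (curr["bytesReceived"] - prev["bytesReceived"]) * 8 / elapsed_ms,
        "packet_loss_pct": 100.0 * lost / expected if expected else 0.0,
        "jitter_ms": curr["jitter"] * 1000.0,                 # reported in seconds
        "rtt_ms": curr["currentRoundTripTime"] * 1000.0,      # reported in seconds
    }


before = {"timestamp": 0, "bytesReceived": 0, "packetsReceived": 0,
          "packetsLost": 0, "jitter": 0.0, "currentRoundTripTime": 0.0}
after = {"timestamp": 1000, "bytesReceived": 125_000, "packetsReceived": 98,
         "packetsLost": 2, "jitter": 0.012, "currentRoundTripTime": 0.08}
print(qoe_sample(before, after))
```

Clients would post these samples every few seconds to whatever metrics pipeline you already run; the alerting thresholds come from your KPI targets.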

Paying too much for Agora or Twilio?

We have migrated several clients off managed WebRTC onto self-hosted LiveKit or mediasoup with 40–70% savings. We can tell in an hour whether you are a candidate.

KPIs: what to measure in a WebRTC product

Quality. Publish-to-subscribe latency < 250 ms p95 regional, < 400 ms p95 cross-region; packet loss < 2% p95; jitter < 30 ms p95; video freeze rate < 0.4%.

Business. Connection success rate > 99.5%; call setup time < 3 s; call drop rate < 0.8%; retention on sessions > 5 min > 80%.

Reliability and cost. TURN-relay rate < 25%; SFU CPU headroom > 30%; $-per-participant-minute within 10% of target for 3 consecutive months.
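These targets are easy to encode as an automated gate. The sketch below checks a dict of measured values against a subset of the thresholds above; the metric key names are invented for this example, and the comparison is strict because the targets are written with `<` and `>`.

```python
# (direction, limit): "below" means the measured value must be < limit.
TARGETS = {
    "latency_ms_p95": ("below", 250),
    "packet_loss_pct_p95": ("below", 2.0),
    "jitter_ms_p95": ("below", 30),
    "freeze_rate_pct": ("below", 0.4),
    "connect_success_pct": ("above", 99.5),
    "setup_time_s": ("below", 3.0),
    "drop_rate_pct": ("below", 0.8),
    "turn_relay_pct": ("below", 25),
    "sfu_cpu_headroom_pct": ("above", 30),
}


def breaches(measured: dict[str, float]) -> list[str]:
    """Return the names of measured KPIs that miss their target."""
    failed = []
    for name, (direction, limit) in TARGETS.items():
        if name not in measured:
            continue  # not reported this period
        value = measured[name]
        ok = value < limit if direction == "below" else value > limit
        if not ok:
            failed.append(name)
    return failed


week = {"latency_ms_p95": 180, "packet_loss_pct_p95": 2.5,
        "connect_success_pct": 99.7, "turn_relay_pct": 31}
print(breaches(week))
```

Running this against weekly aggregates turns the KPI list into something a release checklist or an alert rule can actually enforce.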

When WebRTC is the wrong answer

Pure one-way broadcast to 10k+ viewers. Use LL-HLS or HLS from a CDN. WebRTC makes sense only if you must keep the broadcast interactive (chat, reactions, live Q&A with < 1 s latency).

On-demand video. OTT and VOD belong on HLS/DASH; see our OTT platform development playbook.

Hyper-low-latency audio (< 100 ms). Esports team-voice, pro music collab — consider purpose-built protocols (Soundstage, Elk Audio, bespoke RTP with FEC). WebRTC’s typical sub-150 ms is usually enough but not always.

Reach for a managed SFU (LiveKit Cloud, Daily, Chime) when: you want to ship in < 3 months, scale is unproven, and you prefer per-minute pricing over ops burden.

Reach for self-hosted LiveKit or mediasoup when: managed-SFU per-minute cost projects into $30k+/month and you have enough engineering to own the ops.

FAQ

Do I still need TURN if my SFU is in the public cloud?

Yes. TURN is the fallback path when a participant’s network blocks direct UDP/TCP to the SFU. It fires on roughly 15–20% of corporate networks even against a public SFU. Without TURN, those users cannot connect at all.

What is simulcast and why does it matter?

Simulcast is a technique where the publisher sends multiple resolutions (typically 3) of the same video, and the SFU decides per subscriber which one to forward. A 25-person meeting with simulcast sends thumbnails as 180p and the active speaker as 720p, instead of forcing every subscriber to process the full 720p of every publisher.
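The arithmetic behind that answer, as a rough sketch: the per-layer bitrates (150 kbps for 180p, 1,500 kbps for 720p) are assumed round numbers, not measurements.

```python
def downlink_kbps(participants: int, simulcast: bool,
                  thumb_kbps: int = 150, full_kbps: int = 1500) -> int:
    """Estimated video downlink for one subscriber in an SFU room."""
    remote = participants - 1          # a subscriber does not receive itself
    if remote <= 0:
        return 0
    if not simulcast:
        return remote * full_kbps      # every publisher arrives at full quality
    # One active speaker at full quality, everyone else as a thumbnail.
    return full_kbps + (remote - 1) * thumb_kbps


print("without simulcast:", downlink_kbps(25, simulcast=False), "kbps")
print("with simulcast:   ", downlink_kbps(25, simulcast=True), "kbps")
```

Under these assumptions a 25-person meeting drops from 36,000 kbps to 4,950 kbps per subscriber, which is the difference between unusable and comfortable on a typical mobile link.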

Can I switch architectures later without a rewrite?

Yes, if you design around abstraction. Keep signalling, room management and UI decoupled from the media transport. Most of the clients we migrate from P2P to SFU or from Agora to LiveKit keep their product UI untouched and rewire only the media plumbing.

Does LiveKit replace an SFU or is it one?

LiveKit IS an SFU. It ships as a self-hosted Go server or as a managed cloud. On top of the SFU it adds SDKs, agents for AI workloads, egress for recording and LL-HLS, plus ops niceties.

Is MCU always more expensive than SFU?

Per participant, almost always yes. MCU wins only when it removes heavier cost elsewhere — e.g., when you replace 10,000 WebRTC subscribers with a single encoded HLS output, the MCU is dramatically cheaper than a 10,000-fan-out SFU.

How does WebRTC compare to WHIP / WHEP?

WHIP (WebRTC-HTTP Ingestion Protocol) and WHEP (WebRTC-HTTP Egress Protocol) are HTTP-based signalling protocols for ingest and playback that use WebRTC for the media path. They simplify the signalling side for broadcast-style workflows and are increasingly supported by CDN / LL-HLS providers. Use them when the workload is broadcast-shaped and a thin client is preferred.
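To show how thin the signalling becomes, here is a sketch of the single HTTP request a WHIP publisher makes: a POST carrying the SDP offer with content type `application/sdp` and a bearer token. The endpoint URL, token and truncated SDP string are placeholders, and the request is only built here, not sent.

```python
import urllib.request


def build_whip_request(endpoint: str, sdp_offer: str,
                       bearer_token: str) -> urllib.request.Request:
    """Build the WHIP ingest request that carries the publisher's SDP offer."""
    return urllib.request.Request(
        endpoint,
        data=sdp_offer.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/sdp",
            "Authorization": f"Bearer {bearer_token}",
        },
    )


# Placeholder values; a real offer comes from the local WebRTC stack.
req = build_whip_request(
    "https://ingest.example.com/whip/room-42",
    "v=0\r\n",
    "example-token",
)
print(req.get_method(), req.full_url)
# A WHIP server answers 201 Created with the SDP answer in the body and a
# Location header for the session; an HTTP DELETE to that URL ends the session.
```

Compared with a custom WebSocket signalling protocol, the whole exchange is one request and one response, which is why broadcast encoders and CDNs find it easy to adopt.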

What about recording HIPAA-restricted sessions?

Record server-side into encrypted storage with customer-managed keys, never into the publisher device. Keep audit logs (who started, who stopped, who accessed). With Insertable Streams you can even record encrypted frames and run a separate key service that releases decryption keys only for legitimate access.

Can AI voice agents actually feel natural over WebRTC?

Yes. End-to-end latency under 700 ms — ASR + LLM + TTS over a WebRTC path — is achievable today with LiveKit Agents, OpenAI Realtime, or similar stacks. Our LiveKit multimodal agent guide shows the architecture.

Deep dive

P2P vs MCU vs SFU for Video Conferencing

The code-level companion with protocol details and engineering trade-offs.

AI agents

LiveKit Multimodal AI Agents Guide

Putting voice + vision agents into your WebRTC room, at sub-700 ms latency.

OTT broadcast

OTT Platform Development Playbook

When your audience grows past 10k viewers, the WebRTC → HLS cascade lives here.

Hosting costs

AWS vs DigitalOcean vs Hetzner

Where we host SFUs and why — the raw $/minute numbers that drive the decision.

Ready to ship real-time video that pays back?

Real-time video architecture is a trade-off space, not a menu. P2P is free but bounded. SFU is the default and scales beautifully with simulcast. MCU is the right tool when server-side composition earns its keep. Hybrid is how production products actually ship. The engineering questions are solved; the business question is which combination matches your traffic, your compliance, and your time-to-market.

Fora Soft has been doing this for over a decade. If you want a partner that has already answered your question on a similar project, book 30 minutes and we will tell you the shortest path to a working, scalable WebRTC stack.

Let’s design the right WebRTC stack for your roadmap

Bring your traffic curve, your compliance bar, and your launch date. We will come back with an honest architecture recommendation and a phased estimate.
