This is engineering guidance, not legal advice. Confirm specifics with qualified counsel.
Why this matters
If you are deciding whether to build, buy, or assemble a telemedicine video platform, scaling is where the cost and the risk live. A consultation that freezes during a flu-season surge is not a minor glitch — it is a clinician who cannot see a rash, a patient who gives up and goes to a crowded emergency room, and, if patient data becomes unreachable, a possible failure of a HIPAA security goal. This article is written for founders, product managers, and hospital IT leads who need to ask their engineers and vendors the right questions: how many servers, in which regions, sized for which peak, and what happens when a region fails. You will not need to write code to follow it. You will finish able to read a capacity plan and spot the gap before it becomes an incident.
What "scaling" actually means for video
Most software scales by adding cheap, identical web servers behind a load balancer. Each request is short, stateless, and small. Video breaks all three assumptions. A consultation is a long-lived connection — fifteen minutes, sometimes an hour — that streams millions of bits per second the entire time, and the server has to keep track of who is talking to whom for the whole call. So before any numbers, hold one idea: scaling video is about moving streams of bits between the right places, not about answering more requests.
To make this concrete we need the one piece of plumbing every real-time platform uses. When two or more people are on a clinical video call, their video does not usually travel directly between them. It travels through a media server — most often a Selective Forwarding Unit (SFU), a server that receives each person's video once and forwards it to the others. Think of an SFU as a switchboard operator who takes each caller's voice and patches it through to everyone who needs to hear it, rather than making every caller phone every other caller directly. We cover why an SFU beats the alternatives for clinical use in P2P, SFU, and MCU for clinical use; here we only care that the SFU is the box that carries the load, and therefore the box you have to scale.
One SFU is a single computer. It has a fixed amount of processing power and, more importantly, a fixed amount of network capacity — the pipe through which it can push video out. When that pipe fills, the next consult does not get a slow page; it gets choppy video, frozen frames, or a dropped call. That hard ceiling is why scaling clinical video is its own engineering discipline, not a setting you turn up.
The first ceiling: one server's bandwidth budget
Start with the cost of a single consult, then multiply. A clinical video stream that is good enough for a face-to-face conversation runs around 1.5 megabits per second (Mbps) in each direction. (What "good enough" means clinically — and when a dermatology or wound-care consult needs more — is its own topic, covered in Latency, quality, and the clinical "good enough" bar.)
In a two-person consult, the SFU receives the patient's 1.5 Mbps and the provider's 1.5 Mbps, then forwards each one to the other side. So the server sends out about 3 Mbps for every consult in progress. Let us write the arithmetic out loud, because the result surprises people:
Outbound per consult = 2 participants × 1.5 Mbps = 3 Mbps
Server pipe (example) = 10 Gbps = 10,000 Mbps
Theoretical max = 10,000 ÷ 3 ≈ 3,300 concurrent consults
Practical max (~50%) ≈ 1,600 concurrent consults
The theoretical number assumes you can fill the pipe perfectly, which you never can — you leave headroom for traffic bursts, retransmissions, processing limits, and the simple rule that a server running at 95% capacity is a server about to fail. A practical planning figure is half the theoretical ceiling. So one well-provisioned media server handles on the order of 1,000–1,600 concurrent two-person consults — not ten thousand. The moment your busiest minute needs more than one server's worth of streams, you have crossed into horizontal scaling, and most growing telemedicine products cross it sooner than they expect.
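The arithmetic above can be written as a small helper. This is a minimal sketch that assumes the article's example figures (1.5 Mbps per stream, a 10 Gbps pipe, 50% usable capacity); the function name and parameters are illustrative, and the generalization to more than two participants assumes every stream is forwarded to every other participant with no simulcast or speaker selection.

```python
def practical_consults_per_server(
    pipe_mbps: float = 10_000,     # example 10 Gbps outbound pipe
    stream_mbps: float = 1.5,      # per-participant video bitrate
    participants: int = 2,         # people in each consult
    usable_fraction: float = 0.5,  # planning rule: use half the pipe
) -> int:
    """Concurrent consults one SFU can carry under the headroom rule."""
    # Each of the n streams is forwarded to the other n - 1 participants.
    outbound_per_consult = participants * (participants - 1) * stream_mbps
    return int(pipe_mbps * usable_fraction // outbound_per_consult)


two_person = practical_consults_per_server()                 # 1666
four_person = practical_consults_per_server(participants=4)  # 277
```

With the defaults the result is 1,666, which the text rounds down to about 1,600. Raising the room size to four cuts the figure to 277, which shows how quickly group visits consume the same pipe.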
Horizontal scaling: many servers, one session router
Horizontal scaling means running many SFUs and sending each new consult to one that has room. Two patterns do almost all the work.
The simple and most common pattern keeps every single consult inside one SFU. A patient and a provider join the same server; a different consult lands on a different server. A piece of software called a session router (or signaling service) decides, at the moment a call starts, which SFU has spare capacity and assigns the call there. Because a two-person or small-group consult fits comfortably on one machine, you almost never need anything fancier for one-to-one telemedicine. You scale by adding servers and letting the router spread calls across them. This is the bread-and-butter design, and it is enough for the overwhelming majority of clinical workloads.
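The router's core decision can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a production signaling service: the `Sfu` record, its fields, and the "most spare capacity wins" policy are all hypothetical choices, and a real router would also track health checks and call teardown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sfu:
    name: str
    capacity: int    # practical ceiling in concurrent consults
    active: int = 0  # consults currently assigned


def assign_consult(pool: List[Sfu]) -> Optional[Sfu]:
    """Place a new consult on the SFU with the most spare capacity."""
    candidates = [s for s in pool if s.active < s.capacity]
    if not candidates:
        return None  # pool is full: scale out or degrade instead
    chosen = max(candidates, key=lambda s: s.capacity - s.active)
    chosen.active += 1  # both participants join this one server
    return chosen
```

Returning `None` rather than overfilling a server is the important design choice: the caller is forced to handle a full pool explicitly instead of silently degrading every call on the busiest machine.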
The second pattern, SFU cascading, links multiple SFUs together to serve one very large session. The first server receives the speaker's video and forwards it to other SFUs, each of which fans it out to its own group of viewers. Cascading is how you run a thousand-person grand-rounds lecture or a hospital all-hands, where one origin server would never have enough outbound pipe to reach everyone. For multi-party rooms, a single SFU can host roughly 50–200 simultaneous meetings depending on how many people are in each; that is far fewer than the two-person figure above, because every added participant's stream must be forwarded to everyone else in the room. Beyond that, or beyond a few hundred viewers in one room, you cascade. Most telemedicine platforms need cascading only for their occasional large educational or care-coordination events, not for routine visits. The mechanics live in the Video Streaming section's WebRTC at scale article; here the point is to know which pattern your workload actually needs, so you do not pay for cascading you will rarely use.
Figure 1. Two scaling patterns. Left: a session router spreads independent consults across many SFUs — the design for routine telemedicine. Right: SFU cascading links servers to serve one very large room.
The second ceiling: distance and the speed of light
A server with spare bandwidth in the wrong place still gives a bad consult. Real-time conversation depends on latency — the delay between something happening on one camera and appearing on the other screen, measured in milliseconds (ms). Above a certain delay, people start talking over each other, the natural rhythm of a clinical interview breaks, and a tele-stroke neurologist cannot reliably judge timing on an exam.
The limit is physical. Data in optical fiber travels at about two-thirds of the speed of light, so every thousand kilometres adds roughly 5 ms in each direction, and each network hop adds more. The widely used engineering reference point, ITU-T Recommendation G.114, treats one-way delays up to about 150 ms as acceptable for interactive conversation. Route a patient in Sydney through a media server in Virginia and back, and the round trip alone can blow past that budget before you have added any processing.
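A back-of-envelope check makes the Sydney example concrete. This sketch assumes light covers roughly 200 km per millisecond in fiber and uses an approximate great-circle distance of 15,700 km between Sydney and Virginia; both figures are assumptions for illustration, and real cable routes are longer.

```python
FIBER_KM_PER_MS = 200.0      # assumed: about two-thirds of light speed
G114_ONE_WAY_BUDGET_MS = 150.0

# Assumed approximate great-circle distance; actual fiber paths are longer.
SYDNEY_VIRGINIA_KM = 15_700


def propagation_delay_ms(path_km: float) -> float:
    """Minimum delay from distance alone, ignoring hops and processing."""
    return path_km / FIBER_KM_PER_MS


# Both parties are in Sydney, so media travels out to the SFU and back.
via_virginia_ms = propagation_delay_ms(2 * SYDNEY_VIRGINIA_KM)
over_budget = via_virginia_ms > G114_ONE_WAY_BUDGET_MS
```

Under these assumptions the path costs 157 ms of pure propagation, already past the 150 ms reference point with no allowance for encoding, jitter buffers, or network hops.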
The fix is regional media routing: run media servers in several geographic regions and connect each patient to the nearest one. A patient in Germany lands on a Frankfurt SFU; a patient in California lands on one in Oregon. The session router's job grows from "which server has room" to "which nearby server has room." Regional routing is what keeps the conversation natural, and — as the next section explains — it is also where a compliance question hides.
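The "which nearby server has room" decision can be sketched as a filter followed by a minimum. The region names, the measured round-trip times, and the spare-slot counts below are hypothetical inputs; how a real platform measures latency is outside this sketch.

```python
from typing import Dict, Optional


def pick_region(rtt_ms: Dict[str, float], spare: Dict[str, int]) -> Optional[str]:
    """Return the lowest-latency region that still has spare consult slots."""
    open_regions = [r for r in rtt_ms if spare.get(r, 0) > 0]
    if not open_regions:
        return None  # nothing has room: caller must scale out or degrade
    return min(open_regions, key=lambda r: rtt_ms[r])


# Hypothetical measurements for a patient in Germany.
rtt = {"frankfurt": 20.0, "virginia": 95.0, "oregon": 160.0}
nearest = pick_region(rtt, {"frankfurt": 40, "virginia": 10, "oregon": 10})
fallback = pick_region(rtt, {"frankfurt": 0, "virginia": 10, "oregon": 10})
```

When Frankfurt is full, the sketch falls back to the next-closest region. Whether that fallback is legally permitted for a given patient is a separate check that must run before this one.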
Figure 2. Regional routing. Each patient connects to the nearest media server to stay inside the conversational latency budget; a single far-away server breaks it.
Where regions meet compliance: data residency
The instant you run servers in more than one country, a patient's video and audio, which is Protected Health Information (PHI), meaning any health data tied to an identifiable person, may be processed in a different country than the one the patient sits in. That raises data residency: the rules about which countries health data may be stored in or pass through.
Here is the part teams get wrong in both directions. U.S. HIPAA does not require PHI to stay inside the United States. The Security Rule is about safeguards, not geography: encrypt the data, control access, log it, sign the right contracts, and you may process PHI abroad. The contract that lets a vendor handle PHI on your behalf is a Business Associate Agreement (BAA), and HHS guidance is blunt that a provider handling encrypted ePHI is still a business associate that must sign one, even if it cannot read the data. So adding a Frankfurt region does not, by itself, break HIPAA.
What can break is the law on the other end. The European Union's General Data Protection Regulation (GDPR) treats health data as a special category and restricts moving it out of the EU; some countries require certain health data to stay on home soil. So the rule of thumb runs the opposite way from most people's instinct: HIPAA usually lets PHI leave the U.S.; foreign law often will not let it leave the patient's own country. That makes regional routing a feature, not just a latency trick — keeping an EU patient's media on an EU server can be how you satisfy EU law. The regulatory detail belongs to GDPR, PIPEDA, and global health-data law and the security angle to data residency and where PHI lives; the engineering takeaway is that your region map and your legal map must be drawn together. And every hop a stream takes across the boundary between systems must be encrypted in transit, the subject of encryption in transit, at rest, and end-to-end.
Common mistake: "We'll add EU users to our US cluster for now and fix regions later." The latency is bad and the GDPR transfer is worse. Decide the region map before you onboard a single patient in a new country, because a residency mistake is a reportable problem, not a refactor.
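One way to keep the region map and the legal map drawn together is to encode the legal map as data the router must consult. The sketch below is illustrative only: the `ALLOWED_REGIONS` table, the jurisdiction codes, and the region names are hypothetical placeholders for decisions made with counsel, not statements of what any law permits.

```python
from typing import Dict, Set

# Hypothetical policy table: media regions each patient jurisdiction may use.
# Entries are placeholders for counsel-approved decisions.
ALLOWED_REGIONS: Dict[str, Set[str]] = {
    "EU": {"frankfurt"},
    "US": {"oregon", "virginia", "frankfurt"},
}


def eligible_regions(jurisdiction: str, healthy: Set[str]) -> Set[str]:
    """Regions that are both running and permitted for this patient."""
    # Unknown jurisdictions get no regions: fail closed, not open.
    return ALLOWED_REGIONS.get(jurisdiction, set()) & healthy
```

For example, `eligible_regions("EU", {"oregon", "virginia"})` returns an empty set, so an EU consult is refused or queued during a Frankfurt outage rather than quietly rerouted to the US cluster.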
The third ceiling: surge
Clinical demand is not flat. It has a predictable daily shape and an unpredictable seasonal one, and you must size for both.
The predictable shape is clinic hours. The unpredictable one is a surge: a bad flu season, a regional outbreak, or a public-health event that sends demand vertical for weeks. The 2025–26 U.S. influenza season was classified by the CDC as moderately severe, with hospitalization rates among the highest in years at their December peak — exactly the conditions under which patients reach for telehealth instead of a packed waiting room. A platform sized for an ordinary Tuesday will drop calls on the worst Monday of January.
Let us size a real example, arithmetic shown.
Target volume = 10,000 consults per day
Clinic window = 8 hours = 480 minutes
Average consult = 15 minutes
Concurrency (avg) = (10,000 × 15) ÷ 480 ≈ 312 consults at once
Outbound bandwidth = 312 × 3 Mbps ≈ 936 Mbps (~1 Gbps at the average minute)
That average already needs serious capacity — but the average is a trap. Real clinic traffic clusters: a midday peak can run two to three times the daily average. Apply a 2.5× peak factor:
Peak concurrency = 312 × 2.5 ≈ 780 consults at once
Peak bandwidth = 780 × 3 Mbps ≈ 2,340 Mbps (~2.3 Gbps)
And now layer a flu surge that lifts demand another 50%:
Surge peak = 780 × 1.5 ≈ 1,170 consults at once
Surge bandwidth = 1,170 × 3 Mbps ≈ 3,510 Mbps (~3.5 Gbps)
The platform's busiest realistic minute must carry over a thousand concurrent consults, which is roughly 3.5 Gbps of outbound media and more than three times the daily average. You have two honest ways to be ready for it, and one dishonest one.
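The three sizing steps can be folded into one function so the inputs are easy to vary. This is a sketch using the article's example values as defaults; the function name and dictionary keys are illustrative.

```python
def size_for_surge(
    consults_per_day: float,
    clinic_minutes: float = 480,    # 8-hour clinic window
    consult_minutes: float = 15,    # average consult length
    peak_factor: float = 2.5,       # midday peak over daily average
    surge_factor: float = 1.5,      # seasonal lift on top of the peak
    mbps_per_consult: float = 3.0,  # outbound media per two-person consult
) -> dict:
    """Average, peak, and surge concurrency plus surge bandwidth."""
    avg = consults_per_day * consult_minutes / clinic_minutes
    peak = avg * peak_factor
    surge = peak * surge_factor
    return {
        "avg_consults": avg,
        "peak_consults": peak,
        "surge_consults": surge,
        "surge_mbps": surge * mbps_per_consult,
    }


plan = size_for_surge(10_000)
```

Here `plan["surge_consults"]` is 1,171.875 and `plan["surge_mbps"]` is 3,515.625. The hand calculation above lands on 1,170 and 3,510 only because it rounds the average down to 312 before multiplying.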
The dishonest way is to provision for the average and hope. It saves money on calm days and fails the patients who need you most on the worst day.
The first honest way is static over-provisioning: keep enough servers running to cover the surge peak at all times. It is simple and always ready, but you pay for idle capacity most of the year.
The second honest way is autoscaling: run a baseline of servers and automatically add more when load climbs, removing them when it falls. It is cost-efficient, but it has a catch unique to video. A new web server is useful the second it boots; a new media server is useful only once patients are routed to it, and a consult already in trouble cannot wait the minutes a fresh server needs to come online. So clinical autoscaling must be predictive, with warm headroom — scale up ahead of the morning rush and the flu curve, keep a buffer of spare servers always ready, and never let utilization climb so high that a single failure cascades. The pattern that survives a surge is "autoscale early, keep a cushion," not "autoscale exactly to demand."
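The "autoscale early, keep a cushion" rule reduces to sizing against a forecast rather than against current load. The sketch below is one possible formulation: the per-server capacity of 300 consults, the 70% utilization cap, and the two warm spares are hypothetical values chosen for illustration.

```python
import math


def target_servers(
    forecast_peak_consults: int,   # expected peak, e.g. from last week's curve
    per_server_capacity: int,      # consults one media server can carry
    max_utilization_pct: int = 70, # never plan to run servers hotter than this
    warm_spares: int = 1,          # booted, idle servers kept ready
) -> int:
    """Servers to have running before the forecast peak arrives."""
    usable_per_server = per_server_capacity * max_utilization_pct
    needed = math.ceil(forecast_peak_consults * 100 / usable_per_server)
    return needed + warm_spares


# Surge peak from the sizing example, on hypothetical 300-consult servers.
servers = target_servers(1170, 300, max_utilization_pct=70, warm_spares=2)
```

The result is 8 servers: 6 to carry the forecast at no more than 70% utilization, plus 2 warm spares. The key point is the input: the function takes a forecast, so it would be run ahead of the morning rush rather than in reaction to it.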
Figure 3. Sizing for surge. Average demand is a trap; capacity must clear the daily peak and the seasonal surge, with a headroom band above the worst realistic minute.
Availability is a HIPAA goal, not just an SLA
Here is the point that turns scaling from an operations concern into a compliance one. People summarize HIPAA as "keep patient data private." The Security Rule's actual goal, in 45 CFR 164.306(a), is to protect the confidentiality, integrity, and availability of electronic PHI. Availability — the data being there when an authorized clinician needs it — is a named security objective with equal standing to privacy.
Two requirements make this operational. First, the contingency-plan standard, 45 CFR 164.308(a)(7), requires a plan for responding to an emergency that damages systems holding ePHI, and it names three required pieces: a data backup plan, a disaster recovery plan, and an emergency mode operation plan for keeping critical processes running during a crisis. Second, the technical safeguards at 45 CFR 164.312(a)(2)(ii) require an emergency access procedure so authorized people can still reach needed ePHI in an emergency. A platform that collapses under a surge, or that loses a whole region with no failover, is not merely missing a service-level target — it is straining against the availability goal the Security Rule sets out.
This reframes redundancy. Running media servers in more than one region, and being able to shift a region's traffic elsewhere when it fails, is both the latency design from earlier and part of your contingency posture. The same multi-region footprint that keeps a Sydney consult crisp keeps the service alive when one data center goes dark.
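A simple check shows whether a multi-region footprint really doubles as contingency capacity. This sketch asks, for each region, whether the remaining regions could absorb the total load if that region went dark. The region names and numbers are hypothetical, and the check deliberately ignores latency and data-residency limits on where traffic may move.

```python
from typing import Dict


def survives_region_loss(
    capacity: Dict[str, int], load: Dict[str, int]
) -> Dict[str, bool]:
    """For each region, can the others carry all current load without it?"""
    total_load = sum(load.values())
    result = {}
    for lost in capacity:
        remaining = sum(c for region, c in capacity.items() if region != lost)
        result[lost] = remaining >= total_load
    return result


load = {"east": 600, "west": 570}
balanced = survives_region_loss({"east": 1200, "west": 1200}, load)
lopsided = survives_region_loss({"east": 1200, "west": 400}, load)
```

In the balanced case either region can carry the full 1,170 consults alone. In the lopsided case, losing `east` leaves only 400 slots, so the second region provides latency benefits but not real failover.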
There is regulatory motion here worth watching. The HIPAA Security Rule update proposed by HHS (the Notice of Proposed Rulemaking published at 90 FR 898 on January 6, 2025, RIN 0945-AA22) would tighten exactly this area: among other changes, it would make today's "addressable" specifications required and push contingency and resilience controls harder. As of mid-2026 it remains proposed, not final, with HHS still working through thousands of public comments, so treat it as direction, not law, and confirm status before you cite a deadline. We track its video-specific implications in the 2026 HIPAA Security Rule for video.
Graceful degradation: scaling's safety valve
No capacity plan survives every spike. The platforms that stay trusted are the ones that fail gracefully instead of all at once. When a region nears its ceiling, a well-built system does not drop calls at random; it sheds the least clinically important load first. It can drop a participant's video resolution before dropping the call, fall back to audio-only when a network or a server is saturated (audio is a fraction of video's bandwidth and carries most of a conversation's clinical value), and route new consults to a neighbouring region rather than refuse them. These reliability behaviours are the spine of a clinical product, and they connect directly to connection reliability and graceful degradation. The design goal is simple to state: under overload, a patient should lose quality, never lose care.
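The degradation behaviours described above can be expressed as a ladder keyed to regional utilization. This is a sketch under stated assumptions: the three thresholds and the action names are hypothetical, and the ordering simply follows the order in which the text lists the behaviours.

```python
from typing import List

# Hypothetical thresholds, as a fraction of a region's practical capacity.
REDUCE_RESOLUTION_AT = 0.70
AUDIO_ONLY_AT = 0.85
REROUTE_NEW_AT = 0.95


def degradation_actions(utilization: float) -> List[str]:
    """Measures to apply at a given regional utilization (0.0 to 1.0)."""
    actions = []
    if utilization >= REDUCE_RESOLUTION_AT:
        actions.append("reduce_video_resolution")
    if utilization >= AUDIO_ONLY_AT:
        actions.append("offer_audio_only_fallback")
    if utilization >= REROUTE_NEW_AT:
        actions.append("route_new_consults_to_neighbour_region")
    return actions
```

Note that no rung of the ladder is "drop the call": at 97% utilization the sketch returns all three measures together, and every one of them trades quality or locality for continuity.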
Cost control: egress is the bill
Scaling decisions are also cost decisions, and one line dominates the bill: egress, the price cloud providers charge to send data out of their network. Because a media server's whole job is sending video out, a busy SFU is an egress machine. Run the earlier surge peak — 3.5 Gbps sustained at the busy hour — and the monthly data transfer is large enough that egress, not server rental, becomes the biggest single number in the budget. Three levers move it: keep media regional so you pay less for long-haul transfer and cross-region traffic; right-size video quality to the clinical need so you are not shipping 1080p where 480p is diagnostic; and choose between self-hosting and a managed video provider with open eyes, since a provider's per-minute price already bakes in their egress. That build-versus-buy decision, including the BAA column that gates every vendor, lives in choosing the video layer: build vs buy, and the whole-platform economics in the telemedicine cost model.
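A rough egress estimate follows directly from the sizing inputs. The sketch below assumes 30 operating days per month and uses a placeholder price per gigabyte; the price is hypothetical, and real cloud egress pricing is tiered and varies by provider and region.

```python
def monthly_egress_tb(
    consults_per_day: float,
    consult_minutes: float = 15,
    mbps_per_consult: float = 3.0,  # outbound media per two-person consult
    days: int = 30,                 # assumed operating days per month
) -> float:
    """Outbound media volume per month, in decimal terabytes."""
    seconds = consults_per_day * consult_minutes * 60 * days
    bits = seconds * mbps_per_consult * 1e6
    return bits / 8 / 1e12


def monthly_egress_cost(tb: float, usd_per_gb: float) -> float:
    """Cost at a flat per-gigabyte price (a simplification)."""
    return tb * 1000 * usd_per_gb


volume_tb = monthly_egress_tb(10_000)                    # 101.25 TB
cost_usd = monthly_egress_cost(volume_tb, usd_per_gb=0.05)  # placeholder price
```

At 10,000 consults a day the platform ships about 101 TB of media per month before any surge. Halving `mbps_per_consult` halves the bill, which is why right-sizing video quality is listed as a cost lever.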
A capacity comparison at three sizes
The architecture you need is a function of scale. The table below sketches three honest tiers; the BAA column is a reminder that any managed media vendor handling PHI must have a signed Business Associate Agreement before it carries a single consult.
| Scale | Concurrency at peak | Architecture | Regions | Surge strategy | BAA needed? |
|---|---|---|---|---|---|
| Pilot (≤100/day) | ~10 consults | One SFU, one region | 1 | Manual headroom | Yes, if managed vendor |
| Growth (≤2,000/day) | ~150 consults | Several SFUs + session router | 1–2 | Predictive autoscale | Yes, if managed vendor |
| Scale (10,000+/day) | ~1,000+ consults | SFU pool per region, multi-region failover | 2–4 | Warm headroom + autoscale + degradation | Yes, if managed vendor |
The progression is the lesson: you do not jump to a multi-region pool on day one, but you design the session router and the quality controls early, because a flu surge is the worst possible time to retrofit them.
Where Fora Soft fits in
Fora Soft has built real-time video since 2005 and works with WebRTC across video conferencing, streaming, e-learning, surveillance, and telemedicine. The scaling discipline is shared across all of them; the difference in healthcare is that availability is a compliance requirement, not only a service goal. We design clinical video so the region map satisfies both the latency budget and data-residency law, so the SFU pool and session router scale horizontally with predictive headroom for seasonal surges, and so the system degrades to audio before it ever degrades to a dropped consult. Compliance comes first: we make availability part of the contingency posture under the HIPAA Security Rule, then make it fast.
What to read next
- P2P, SFU, and MCU for clinical use
- Connection reliability and graceful degradation
- Telemedicine video reference architecture: the full picture
Call to action
- Talk to a telemedicine engineer: book a 30-minute scoping call to talk through your WebRTC telehealth scaling plan.
- See our case studies: 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the Telemedicine Video Scaling & Surge Capacity Checklist: one page covering the capacity-planning math, horizontal SFU and session-router checks, regional routing and data-residency questions, surge and autoscaling headroom, and the HIPAA availability/contingency controls, to pressure-test your own…
References
- HHS, HIPAA Security Rule, General Rules, 45 CFR §164.306(a) (confidentiality, integrity, and availability of ePHI). eCFR, current 2026. https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-C/section-164.306 (Tier 1, primary rule)
- HHS, Administrative safeguards, Contingency plan, 45 CFR §164.308(a)(7) (data backup, disaster recovery, emergency mode operation). eCFR, current 2026. https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-C/section-164.308 (Tier 1, primary rule)
- HHS, Technical safeguards, Emergency access procedure, 45 CFR §164.312(a)(2)(ii). eCFR, current 2026. https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-C/section-164.312 (Tier 1, primary rule)
- HHS/OCR, HIPAA Security Rule To Strengthen the Cybersecurity of Electronic Protected Health Information (NPRM), 90 FR 898, Jan. 6, 2025, RIN 0945-AA22 (proposed; comment period closed Mar. 7, 2025; not final as of mid-2026). https://www.federalregister.gov/documents/2025/01/06/2024-30983/hipaa-security-rule-to-strengthen-the-cybersecurity-of-electronic-protected-health-information (Tier 1, primary rulemaking)
- HHS/OCR, Guidance on HIPAA & Cloud Computing (a service provider maintaining encrypted ePHI is a business associate requiring a BAA). https://www.hhs.gov/hipaa/for-professionals/special-topics/health-information-technology/cloud-computing/index.html (Tier 1/2, agency guidance)
- HHS/OCR, Business associate contracts, 45 CFR §164.314(a)(2)(i) (required BAA terms). eCFR, current 2026. https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-C/section-164.314 (Tier 1, primary rule)
- ITU-T Recommendation G.114, One-way transmission time (≤150 ms one-way acceptable for interactive conversation). International Telecommunication Union, 2003. https://www.itu.int/rec/T-REC-G.114 (Tier 1, standard)
- CDC, Weekly U.S. Influenza Surveillance Report (FluView), 2025–26 season (season classified moderately severe at December peak). https://www.cdc.gov/fluview/ (Tier 1, agency surveillance data)
- B. Grozev (Jitsi/8x8), Improving Scale and Media Quality with Cascading SFUs, webrtcHacks. https://webrtchacks.com/sfu-cascading/ (Tier 3, first-party engineering; cascading mechanics). Used only for SFU-cascading mechanics; all compliance claims rest on the primary rules above, which override any vendor framing.
Where sources disagreed, the official rule won: popular summaries that reduce HIPAA to "privacy" were overridden by 45 CFR 164.306(a), which names availability as an equal security goal; and vendor framing that implies PHI must stay in the U.S. was overridden by the Security Rule's safeguards-not-geography standard and HHS cloud guidance.


