This is engineering guidance, not legal advice. Confirm specifics with qualified counsel.
Why this matters
Clinical AI is no longer exotic; by 2026 most telemedicine teams ship at least one feature that runs patient data through a model, and the regulators have caught up. The mistakes that sink these features are rarely about the model's accuracy — they are about a missing contract, patient data sent to an endpoint that was never allowed to see it, an "assistant" that quietly crossed into making diagnoses, or a model that works for most patients and fails the few who look least like the training data. Each of those is a compliance violation, a safety incident, or both, and each is avoidable with a checklist applied before you build. This article is for the founder, product manager, or engineer who has to decide whether an AI feature can ship — written in plain language so you can run any feature through the same four gates and know, before you write code, where the regulatory and safety lines sit.
How to read this article
This is the spine of our AI section. The other articles — the ambient AI scribe, AI triage, remote-monitoring anomaly detection — each apply these gates to one feature. Here we define the gates once, in full, so the rest can point back. If you read only one article in the section before approving an AI feature, read this one.
A quick note on scope. This article is about the clinical wiring and the compliance of AI — the contract, the data path, the regulatory line, the human check. It is not about how the models work inside. The architecture of speech recognition, summarization, and anomaly models lives in our AI for Video Engineering section; the general HIPAA foundations live in Block 2. We link out rather than repeat.
Figure 1. The four gates. Every clinical AI feature passes through the same checks in order: a signed contract for the data, a guarded boundary the data stays inside, the FDA line between support and diagnosis, and a human-plus-validation layer that keeps it safe in production.
First, the words
Two terms carry this article, so we define them before using them.
Protected Health Information, or PHI, is any health information that can be tied to an identifiable person — a name with a diagnosis, a phone number with a lab result, a face on a video with a complaint. PHI is the unit of risk in everything that follows. The whole question "is this AI feature compliant?" reduces to "what happens to the PHI?"
Clinical AI here means any software feature in your product that uses a machine-learning model and touches the clinical encounter: drafting a note, transcribing a consult, translating between patient and clinician, taking a symptom history, summarizing a visit, or flagging a reading. The model might be one you host yourself, or one a vendor runs behind an application programming interface (API), a remote service your code sends data to and gets an answer back from. Where the model runs turns out to matter enormously, and gate two is where we deal with it.
With those two terms in hand, the four gates follow.
Gate 1 — The contract: a model that sees PHI is a business associate
Start with the gate that teams skip most often because it feels like paperwork rather than engineering.
Under the U.S. health-privacy law known as HIPAA (the Health Insurance Portability and Accountability Act), any outside party that handles PHI on your behalf is a business associate — a vendor entrusted with patient data — and it may not touch that data until you have signed a Business Associate Agreement, or BAA: the contract in which the vendor promises to guard the data and accept its share of HIPAA liability (45 CFR §160.103 and §164.502(e)).¹ A model provider that receives your patients' words, vitals, or histories is a business associate exactly like a cloud host or a billing service. No BAA, no data — no matter how good the model.
Think of the BAA as the signed promise every contractor makes before getting a key to the building. The lock does not care how skilled the contractor is; without the signature, they do not get a key.
This single rule disqualifies the AI endpoint most teams reach for first: the free, public, consumer version of a chatbot or speech service. Those ship no BAA and are explicitly not built for PHI, so sending a patient's transcript to one is a breach the moment you press send. The enterprise tiers of the major model and cloud providers are offered under a BAA — but three details decide whether you are actually covered:
- The BAA must name the exact service and tier you use. "The vendor offers a BAA" is not the same as "the BAA covers this endpoint." Vendors routinely cover the enterprise API and not the consumer one.
- The agreement must include a no-training clause: an explicit promise that your patients' data will never be used to train or improve the vendor's future models. Without it, PHI can leak into a model that later answers someone else's prompt.
- The BAA obligations flow down to every sub-processor in the chain. If the model vendor passes your data to its own cloud host, the vendor needs its own BAA with that host, and your review should confirm it exists.
The contract gate is binary, per vendor, per service. A feature either has a signed BAA covering its exact data path or it does not — and if it does not, nothing else about the feature matters yet. The general mechanics of BAAs, including how to read one, live in our Business Associate Agreement article.
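Because the contract gate is binary per vendor and per service, some teams transcribe each signed BAA into a small registry and check it in code before a feature is wired to a model. The sketch below is a minimal illustration under that assumption: the `Baa` record, its field names, and `contract_gate` are hypothetical, and the registry is only as accurate as the compliance team that fills it in from the signed documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baa:
    """One signed Business Associate Agreement, transcribed by the compliance team (hypothetical schema)."""
    vendor: str
    covered_services: frozenset[str]       # exact services and tiers the BAA names
    no_training_clause: bool               # explicit promise never to train on your data
    covered_subprocessors: frozenset[str]  # sub-processors confirmed under the vendor's own BAAs


def contract_gate(baas: dict[str, Baa], vendor: str, service: str,
                  subprocessors: set[str]) -> list[str]:
    """Return every reason the contract gate fails; an empty list means it passes."""
    baa = baas.get(vendor)
    if baa is None:
        return [f"no signed BAA with {vendor}"]
    problems = []
    if service not in baa.covered_services:
        problems.append(f"BAA does not name service {service!r}")
    if not baa.no_training_clause:
        problems.append("BAA has no no-training clause")
    uncovered = set(subprocessors) - set(baa.covered_subprocessors)
    if uncovered:
        problems.append(f"sub-processors not covered: {sorted(uncovered)}")
    return problems


# Illustrative registry: one vendor whose BAA names only its enterprise API.
baas = {
    "ExampleAI": Baa("ExampleAI", frozenset({"enterprise-api"}), True, frozenset({"ExampleCloud"})),
}
print(contract_gate(baas, "ExampleAI", "consumer-chat", {"ExampleCloud"}))
# ["BAA does not name service 'consumer-chat'"]
```

Returning every failing reason, rather than a single yes/no, gives the reviewer the exact gap to close with the vendor.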
Gate 2 — The boundary: keep PHI inside, send only what is needed, de-identify the rest
The contract says who is allowed to hold PHI. The boundary is the engineering job of making sure PHI only ever goes to those parties — and as little of it as possible.
HIPAA's minimum necessary standard requires you to use or disclose only the PHI actually needed for the task (45 CFR §164.502(b)).² Applied to AI, that means a model gets the fields it needs to do its job, not the whole patient record because the whole record was easy to pass. A scribe drafting today's note does not need the patient's full ten-year history; a translation model needs the spoken text, not the patient's address. Trimming the payload shrinks the blast radius if anything goes wrong.
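One way to make the minimum-necessary rule hard to forget is a per-task field allowlist applied just before the model call. The sketch below assumes records arrive as flat dictionaries; the task names, field names, and `minimum_necessary` helper are hypothetical, and deciding what each task truly needs remains a clinical and privacy judgment, not a coding one.

```python
# Hypothetical allowlists: the only record fields each AI task may receive.
TASK_FIELDS = {
    "scribe_note": frozenset({"encounter_transcript", "encounter_date", "current_medications"}),
    "translation": frozenset({"utterance_text", "source_language", "target_language"}),
}


def minimum_necessary(task: str, record: dict) -> dict:
    """Return a copy of `record` holding only the fields allowlisted for `task`.

    An unknown task raises instead of falling back to the full record, so a new
    feature cannot quietly ship the whole chart to a model.
    """
    if task not in TASK_FIELDS:
        raise ValueError(f"no field allowlist defined for task {task!r}")
    allowed = TASK_FIELDS[task]
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "utterance_text": "The pain started two days ago.",
    "source_language": "en",
    "target_language": "es",
    "home_address": "12 Example St",
    "history_10y": "(ten years of notes)",
}
print(minimum_necessary("translation", record))
# {'utterance_text': 'The pain started two days ago.', 'source_language': 'en', 'target_language': 'es'}
```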
Where the model runs decides how hard the boundary is to hold. There are three honest options, and a fourth that is a trap:
- Self-hosted open model on a BAA-able cloud. You run an open-weight model on infrastructure you control, under the cloud provider's BAA. PHI never leaves your perimeter. Highest control, highest operational cost.
- Enterprise model API under a BAA, with a no-training clause. PHI leaves your servers but only to a contracted business associate that is barred from training on it. The standard, defensible middle path.
- De-identified data to any endpoint. If you strip the data of identifiers first, it is no longer PHI and HIPAA's restrictions fall away — useful for analytics and model evaluation outside the live clinical loop.
- The trap: a free consumer endpoint. No BAA, not built for PHI, and often training on whatever you send. This is the boundary breach in gate one wearing an engineering disguise.
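The boundary is easiest to hold when a single choke point checks every outbound request. The sketch below is a minimal illustration, assuming each request already carries a `contains_phi` flag from an upstream data-classification step (the hard part, not shown) and that your approved destinations fall into the categories listed above; the `Endpoint` names and `check_egress` function are hypothetical.

```python
from enum import Enum


class Endpoint(Enum):
    SELF_HOSTED = "self-hosted open model on BAA-covered cloud"
    ENTERPRISE_API = "enterprise model API under a BAA with a no-training clause"
    CONSUMER = "free consumer endpoint"


# Destinations inside the contracted boundary; only these may receive PHI.
PHI_ALLOWED = frozenset({Endpoint.SELF_HOSTED, Endpoint.ENTERPRISE_API})


def check_egress(endpoint: Endpoint, contains_phi: bool) -> None:
    """Raise before a request leaves if it would carry PHI across the boundary.

    De-identified payloads (contains_phi=False) may go to any endpoint, which is
    only safe if the de-identification step upstream actually held.
    """
    if contains_phi and endpoint not in PHI_ALLOWED:
        raise PermissionError(f"PHI may not be sent to: {endpoint.value}")
```

Failing closed with an exception, rather than logging a warning and sending anyway, is the design choice that matters: provided every outbound path calls this check first, a PHI request aimed at the consumer endpoint is stopped before any network call.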
De-identification deserves its own line because it is the legitimate escape hatch. HIPAA gives two methods (45 CFR §164.514(b)). Safe Harbor requires removing eighteen specified categories of identifier, including names, geographic units smaller than a state (only the first three ZIP digits may remain, and only for areas of more than 20,000 people), all elements of dates except the year, phone and record numbers, and full-face photographs; it also requires that you have no actual knowledge the remaining data could identify the person. Expert Determination instead relies on a qualified statistician certifying that the re-identification risk is very small.³ Get either right and the data leaves the boundary lawfully; the full method is the subject of our de-identification and analytics article. The common, expensive error is "we removed the name, so it's anonymous." Safe Harbor lists eighteen categories for a reason, and a date of birth plus a ZIP code can re-identify a person without a name anywhere in sight.
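To show why dropping the name is not enough, here is a partial sketch of a few Safe Harbor transforms on a structured record. It is deliberately incomplete: it handles only names, phone numbers, dates, ZIP codes, and ages out of the eighteen categories, ignores free text and images, and does not address the "no actual knowledge" condition. It assumes the caller supplies the set of restricted three-digit ZIP prefixes (areas of 20,000 or fewer people, which Safe Harbor requires replacing with 000) and that an `age` field is present. The function name and record keys are hypothetical.

```python
from datetime import date


def partial_safe_harbor(record: dict, restricted_zip3: set[str]) -> dict:
    """Apply a FEW Safe Harbor transforms to a structured record (illustration only).

    Covers names, phone numbers, dates, ZIP codes, and ages; the other Safe Harbor
    categories, free-text fields, and images are NOT handled here.
    """
    out = dict(record)     # work on a copy; never mutate the source record
    out.pop("name", None)  # names: remove
    out.pop("phone", None)  # telephone numbers: remove

    # Ages over 89 collapse into one "90+" category, and the birth date goes too,
    # because even a birth year would reveal an age over 89.
    # Assumes `age` is present; a real pipeline would derive it when missing.
    if isinstance(out.get("age"), int) and out["age"] > 89:
        out["age"] = "90+"
        out.pop("birth_date", None)

    # Remaining dates keep only the year.
    for key in ("birth_date", "visit_date"):
        if isinstance(out.get(key), date):
            out[key] = out[key].year

    # ZIP codes keep the first three digits unless that area is restricted.
    if "zip" in out:
        zip3 = str(out["zip"])[:3]
        out["zip"] = "000" if zip3 in restricted_zip3 else zip3
    return out
```

Even after these transforms, a rare diagnosis in a free-text field could still point to one person, which is why the full method needs all eighteen categories plus a review of what remains.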
Figure 2. The PHI boundary. Inside the contracted perimeter, a self-hosted model or an enterprise API under a BAA may process PHI; de-identified data may leave for analytics. A free consumer endpoint sits outside the boundary and must never receive a patient's data.
Gate 3 — The FDA line: support versus diagnosis
Gates one and two keep the data lawful. Gate three asks a different question: is your feature still software, or has it become a regulated medical device? Cross this line and the entire regulatory regime changes — from a feature you can ship after review to a product that may need FDA authorization before it reaches a patient.
The U.S. Food and Drug Administration regulates Software as a Medical Device — software intended to diagnose, treat, cure, mitigate, or prevent a disease (the device definition in the Food, Drug, and Cosmetic Act, §201(h)).⁴ The deciding factor is the software's intended use and its claims, not the cleverness of the model. A tool that organizes information for a clinician is on one side of the line; a tool that tells anyone what disease they have, or what to do about it, is on the other.
For decision-support features specifically, the FDA's Clinical Decision Support guidance — issued in final form on January 6, 2026 (docket FDA-2017-D-6569) — draws the line with four tests.⁵ Software stays on the non-device side only when all four are true:
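- It does not acquire, process, or analyze a medical image, a signal from an in vitro diagnostic device, or a pattern or signal from a signal-acquisition system (an imaging study, an ECG waveform, a live monitor feed).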
- It displays or analyzes existing medical information — labs, notes, guidelines — rather than generating a new clinical conclusion from raw signal.
- It offers recommendations to a health care professional rather than directing care automatically.
- The professional can independently review the basis for the recommendation — the software explains why, so the clinician is not forced to rely on it.
Read those as an AND, not an OR. Fail any one and the carve-out is gone. Two failures dominate telemedicine. First, the recipient: the carve-out is for software that speaks to a clinician who can second-guess it. A feature that delivers a recommendation straight to a patient or caregiver — a symptom checker that tells the user what they probably have — is generally a device, because the patient cannot independently review the medical basis. Second, the black box: a model that outputs a recommendation it cannot explain fails test four even when a clinician is the reader, because "the AI said so" is not an independently reviewable basis.
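Because the four tests combine as an AND, a team can record its answers for each feature and screen them mechanically ahead of a proper regulatory review. The sketch below is only that kind of internal screen, not a regulatory determination: the `FeatureProfile` fields are hypothetical simplifications of the four criteria, and the real assessment belongs with regulatory counsel reading the guidance itself.

```python
from dataclasses import dataclass


@dataclass
class FeatureProfile:
    """Team-recorded answers about one AI feature (field names are illustrative)."""
    analyzes_image_or_device_signal: bool
    uses_existing_medical_information: bool
    recipient_is_clinician: bool
    basis_is_independently_reviewable: bool


def cds_screen(f: FeatureProfile) -> list[str]:
    """List which of the four CDS criteria fail; any single failure removes the carve-out."""
    failures = []
    if f.analyzes_image_or_device_signal:
        failures.append("1: analyzes an image or device signal")
    if not f.uses_existing_medical_information:
        failures.append("2: does not work from existing medical information")
    if not f.recipient_is_clinician:
        failures.append("3: recommendation does not go to a health care professional")
    if not f.basis_is_independently_reviewable:
        failures.append("4: clinician cannot independently review the basis")
    return failures
```

A symptom checker that answers the patient directly fails test three; a clinician-facing model that cannot show its basis fails test four; either one alone is enough to push the feature toward device territory.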
So the practical test for a product team: does the feature organize information for a clinician who can check it, or does it reach a clinical conclusion someone is expected to act on? Walk the same model through both. "Here is the patient's blood-pressure trend and the relevant guideline" is support. "This patient has hypertensive crisis; start treatment" is device territory. The triage case is worked in full in our AI triage article.
Figure 3. When clinical AI becomes a regulated device. Issuing a diagnosis or a treatment directive, speaking directly to a patient, or producing a recommendation a clinician cannot independently review each move the feature out of the support carve-out and toward FDA regulation as a medical device.
If your feature is a device, that is a decision, not a dead end, but plan for it. AI devices change as their models retrain, and the FDA's Predetermined Change Control Plan guidance (final, December 2024) lets a manufacturer pre-authorize a defined envelope of future model updates in the original submission, so routine retraining within that envelope does not require a new marketing submission each time.⁶ Building a device-grade AI feature means a regulatory pathway, a quality system, and a change-control plan: work that belongs in the plan from day one, not discovered after launch.
Gate 4 — The human, the validation, and the bias test
A feature can clear the first three gates and still be unsafe. Gate four is what keeps it safe once real patients flow through it, and it has three parts: keep a human in the loop, validate the model honestly, and test it for bias.
Keep a human in the loop. For every feature that stays on the support side of the FDA line, a qualified person reviews and owns the output before it affects care. The scribe drafts; the clinician signs. The triage tool suggests a queue; a clinician can override. This is not only safer — as gate three showed, a reviewable human checkpoint is part of what keeps the feature out of device territory in the first place. The safe default is escalate, never silently act: when the model is unsure, it surfaces the case to a person rather than making a quiet decision.
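The "escalate, never silently act" default can be enforced in code by giving the routing function no branch that applies an output without a person. The sketch below assumes the model reports a confidence score between 0 and 1; the 0.7 threshold, the queue names, and `route` itself are hypothetical, and a raw model score is only meaningful once it has been calibrated against real outcomes.

```python
from dataclasses import dataclass

REVIEW = "clinician_review"           # default: a clinician reviews and signs
ESCALATE = "urgent_clinician_review"  # model unsure: surface the case to a person now


@dataclass
class ModelOutput:
    draft: str
    confidence: float  # assumed model score in [0, 1]


def route(output: ModelOutput, escalate_below: float = 0.7) -> str:
    """Send every output to a human; low confidence only changes the urgency.

    There is deliberately no branch that acts on the output automatically.
    """
    score = output.confidence
    if not 0.0 <= score <= 1.0:  # out of range or NaN: treat as unsure
        return ESCALATE
    if score < escalate_below:
        return ESCALATE
    return REVIEW
```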
Validate honestly — and beware the average. A single headline accuracy number hides the failures that matter. Here is the arithmetic every product owner should run before trusting a model. Suppose a model is reported at 94% accuracy, measured on 9,000 patients. Split that by subgroup: 8,000 patients from the majority group score 96%, while 1,000 from an under-represented group score 79%. The blended figure is (8,000 × 0.96 + 1,000 × 0.79) ÷ 9,000 = (7,680 + 790) ÷ 9,000 ≈ 94.1% — exactly the number on the slide. The "94%" is true and useless: it averages a strong result over the many with a dangerous one over the few. Validation that does not break performance out by patient subgroup is hiding its worst behavior behind its best.
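The same arithmetic takes a few lines of code, and running it on every validation report keeps the worst subgroup next to the headline number. The sketch below reuses the illustrative figures from the example; `blended_and_by_group` is a hypothetical helper, not a library function.

```python
def blended_and_by_group(results: dict[str, tuple[int, float]]) -> tuple[float, dict[str, float]]:
    """results maps subgroup -> (patient count, accuracy within that subgroup)."""
    total = sum(n for n, _ in results.values())
    blended = sum(n * acc for n, acc in results.values()) / total
    by_group = {group: acc for group, (_, acc) in results.items()}
    return blended, by_group


results = {"majority": (8_000, 0.96), "under_represented": (1_000, 0.79)}
blended, by_group = blended_and_by_group(results)
worst = min(by_group, key=by_group.get)

print(f"blended accuracy: {blended:.1%}")                   # blended accuracy: 94.1%
print(f"worst subgroup: {worst} at {by_group[worst]:.0%}")  # worst subgroup: under_represented at 79%
```

Printing the minimum alongside the blend is the point: the blend alone reads 94.1%, while the breakdown shows the 79% hiding inside it.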
Test for bias, because it is now a legal duty. Under Section 1557 of the Affordable Care Act, a covered entity must not discriminate through its use of a patient care decision support tool (a category the 2024 final rule defines broadly enough to include AI and machine-learning models), and it must make reasonable efforts to identify and mitigate the risk of such discrimination, with those identify-and-mitigate duties in force from May 1, 2025 (45 CFR §92.210).⁷ The textbook example is pulse oximetry, which overestimates blood-oxygen levels in patients with darker skin and has caused real low-oxygen events to be missed; any model trained on a biased signal inherits the bias and fails exactly the patients already underserved. The duty, and the engineering practice, is to measure performance across subgroups, document what you find, and close the gaps, rather than assume the model is fair because its math is neutral.
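Measuring performance across subgroups usually means comparing error rates, not just accuracy, because for a screening feature the costly error is the missed case. The sketch below compares false-negative (missed-case) rates per subgroup and flags any group that trails the best one by more than a tolerance. The subgroup labels, the example counts, and the 5-percentage-point tolerance are all hypothetical; choosing the metric and the acceptable gap is a clinical decision to document, not a default to copy.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    true_positives: int   # condition present and the model flagged it
    false_negatives: int  # condition present and the model missed it


def miss_rates(counts: dict[str, Counts]) -> dict[str, float]:
    """False-negative rate per subgroup: missed cases / all cases with the condition."""
    return {
        group: c.false_negatives / (c.true_positives + c.false_negatives)
        for group, c in counts.items()
        # Skip groups with no positive cases: their miss rate is unmeasured, not zero.
        if c.true_positives + c.false_negatives > 0
    }


def bias_gaps(counts: dict[str, Counts], max_gap: float = 0.05) -> dict[str, float]:
    """Return subgroups whose miss rate exceeds the best subgroup's by more than max_gap."""
    rates = miss_rates(counts)
    if not rates:
        return {}
    best = min(rates.values())
    return {group: rate - best for group, rate in rates.items() if rate - best > max_gap}


# Hypothetical validation counts for low-oxygen events in two subgroups.
counts = {"group_a": Counts(190, 10), "group_b": Counts(80, 20)}
print(miss_rates(counts))  # {'group_a': 0.05, 'group_b': 0.2}
```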
Two more obligations round out this gate. Transparency to the people who rely on the output: the federal health-IT rules now require certified decision-support technology to disclose, in plain language, how a predictive model was built and validated — its training data, its validation, its performance, and the steps taken for fairness (the ONC HTI-1 rule's Predictive Decision Support criterion, 45 CFR §170.315(b)(11)).⁸ Even where that specific certification does not apply to you, its list is the right disclosure checklist, and patients should be told plainly when AI is involved in their care. And governance: a repeatable way to identify, measure, and manage AI risk across the model's life, for which the NIST AI Risk Management Framework (AI RMF 1.0, 2023, with its Generative AI Profile, 2024) is the standard U.S. vocabulary — its four functions, Govern, Map, Measure, and Manage, give the gate-four work a structure auditors recognize.⁹
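A lightweight way to use that disclosure list as a checklist is to refuse to release a model whose documentation leaves any item blank. The sketch below checks only the four items named above; the field names and `missing_disclosures` are hypothetical, and the HTI-1 criterion itself defines a longer, specific set of source attributes that this does not reproduce.

```python
# The four disclosure items named above; HTI-1's own list is longer and more specific.
REQUIRED_DISCLOSURES = ("training_data", "validation", "performance", "fairness_steps")


def missing_disclosures(model_card: dict) -> list[str]:
    """Return required disclosure items that are absent or blank in a model's documentation."""
    return [
        item for item in REQUIRED_DISCLOSURES
        if not str(model_card.get(item) or "").strip()
    ]


card = {
    "training_data": "Source datasets and date ranges (placeholder)",
    "validation": "Held-out sites, results broken out by subgroup (placeholder)",
    "performance": "Per-subgroup metrics (placeholder)",
    "fairness_steps": "  ",
}
print(missing_disclosures(card))  # ['fairness_steps']
```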
Putting the gates together: the deployment options
The four gates collapse into one practical question when you choose how to run a model: which deployment keeps PHI lawful, stays on the support side, and can be validated? The table below is the decision in one view. Read the BAA column first — without a signed agreement covering the exact service, nothing else on the row can save the feature.
Figure 4. Clinical AI deployment options compared. The free consumer endpoint fails at the first column; the three workable paths differ in control, cost, and operational burden, but each keeps PHI inside a contracted boundary.
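The four gates also work as an ordered check that stops at the first failure, since nothing later in the sequence can rescue a feature that fails an earlier gate. The sketch below assumes the team records yes/no answers for each feature under hypothetical keys; `GATES` and `first_failing_gate` are illustrative names, and each answer stands in for the full review described in that gate's section.

```python
from typing import Callable, Optional

# Each gate reads team-recorded yes/no answers (hypothetical keys) about one feature.
GATES: list[tuple[str, Callable[[dict], bool]]] = [
    ("gate 1: contract", lambda f: f["baa_covers_exact_service"] and f["no_training_clause"]),
    ("gate 2: boundary", lambda f: f["phi_stays_in_boundary"] and f["payload_minimized"]),
    ("gate 3: FDA line", lambda f: f["meets_all_four_cds_criteria"] or f["authorized_as_device"]),
    ("gate 4: human and validation", lambda f: f["clinician_owns_output"] and f["subgroup_validated"]),
]


def first_failing_gate(feature: dict) -> Optional[str]:
    """Walk the gates in order and return the first one that fails, or None if all pass."""
    for name, passes in GATES:
        if not passes(feature):
            return name
    return None


# A patient-facing feature wired to a free consumer endpoint, with nothing else in place.
shortcut = {
    "baa_covers_exact_service": False, "no_training_clause": False,
    "phi_stays_in_boundary": False, "payload_minimized": False,
    "meets_all_four_cds_criteria": False, "authorized_as_device": False,
    "clinician_owns_output": False, "subgroup_validated": False,
}
print(first_failing_gate(shortcut))  # gate 1: contract
```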
A common, expensive mistake
The signature failure in clinical AI is the "helpful" shortcut that breaks three gates at once: a team wires a patient-facing feature to a free, public AI endpoint because it is fast to build and gives a slick answer. In one move they have sent PHI to a vendor with no BAA that may train on it (gates one and two breached) and shipped a tool that tells a patient what is wrong with them (gate three crossed). The feature that was meant to delight users has become a reportable breach and an unauthorized medical device.
The quieter cousin is the un-validated "assistant" that clears the contract and boundary gates but ships on a single average accuracy number, with no subgroup testing and no human checkpoint. It demos beautifully and fails the patients it was never tested on, months later, with no one in the loop to catch it. Run every feature through all four gates — contract, boundary, FDA line, human-and-validation — and neither mistake survives the checklist.
Where Fora Soft fits in
The requirement comes first: a clinical AI feature must hold every patient record inside a BAA-covered boundary, send a model only the minimum it needs, stay on the support side of the FDA's device line unless it is built and cleared as a device, and ship with a human checkpoint, honest subgroup validation, and a bias test. Fora Soft has built real-time video, conferencing, streaming, and AI-enabled software since 2005, including telemedicine platforms where an AI feature, the PHI boundary around it, and the clinician's review step are one connected workflow. We wire AI in behind these four gates — a contracted, no-training model inside the boundary, a clear support-not-diagnosis posture, and a clinician who owns the output — rather than bolting a model onto a product and hoping the compliance follows. The model engineering itself, when a feature needs a custom model, is the subject of the AI for Video Engineering section.
A note for teams shipping outside the United States
The four gates are framed on U.S. law because that is where most telemedicine products start, but the structure travels. In the European Union, health data is a special category under the GDPR. The EU AI Act classifies AI that is a medical device, or a safety component of one, as high-risk when that device must undergo a third-party (notified body) conformity assessment, layering conformity assessment, risk management, and transparency duties on top of the existing Medical Device Regulation; the obligations for AI embedded in medical devices apply from August 2028 under the current timeline.¹⁰ The gates keep the same shape (a lawful basis for the data, a guarded boundary, a regulatory line for clinical claims, and human oversight) even where the named rules differ. Confirm the specifics for each market with local counsel, and see our global health-data law article for the non-U.S. picture.
What to read next
- Where AI fits in a telemedicine product — the map
- Ambient clinical documentation: the AI scribe
- Building vs buying AI features, and the cost
Call to action
- Talk to a telemedicine engineer — book a 30-minute scoping call to talk through your clinical AI compliance plan.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the Clinical-AI Compliance & Safety Checklist — One page: run any telemedicine AI feature through four gates — the BAA and no-training clause for the model, the PHI boundary and de-identification, the FDA support-vs-diagnosis line, and human-in-the-loop, validation, and bias testing….
References
- HIPAA Privacy Rule — Business Associates (45 CFR §160.103, §164.502(e)) — U.S. Department of Health and Human Services. Tier 1. Any party that handles PHI on a covered entity's behalf is a business associate and needs a signed BAA before it receives any PHI; the agreement flows down to sub-processors.
- Minimum Necessary Requirement (45 CFR §164.502(b), §164.514(d)) — U.S. Department of Health and Human Services. Tier 1. Covered entities must limit PHI use and disclosure to the minimum necessary for the task — applied to AI, send a model only the fields it needs.
- Guidance on De-identification of Protected Health Information (45 CFR §164.514(b)) — U.S. Department of Health and Human Services (OCR). Tier 1. De-identification by Safe Harbor (remove 18 identifier categories) or Expert Determination removes data from HIPAA's restrictions; removing only the name is not de-identification.
- Software as a Medical Device (SaMD) and the device definition (FD&C Act §201(h)) — U.S. Food and Drug Administration. Tier 1. Software intended to diagnose, treat, cure, mitigate, or prevent a disease meets the device definition; intended use and claims, not the algorithm, decide device status.
- Clinical Decision Support Software — Guidance for Industry and FDA Staff (Final, January 2026) — U.S. Food and Drug Administration, docket FDA-2017-D-6569. Tier 1. The non-device CDS carve-out requires all four criteria, including a health-care-professional recipient who can independently review the basis; patient-facing or unexplainable recommendations are generally devices. Time-sensitive — confirm the current version.
- Marketing Submission Recommendations for a Predetermined Change Control Plan for AI-Enabled Device Software Functions (Final, December 2024) — U.S. Food and Drug Administration. Tier 1. A PCCP lets a manufacturer pre-authorize a defined envelope of future AI model changes in the original submission, avoiding a new marketing submission for each covered update. Time-sensitive.
- Section 1557 — Nondiscrimination; patient care decision support tools (45 CFR §92.210) — HHS (eCFR), 89 FR 37692, May 6, 2024. Tier 1. A covered entity must not discriminate through patient care decision support tools, including AI, and must make reasonable efforts to identify and mitigate that risk; the identify-and-mitigate duties apply from May 1, 2025. Time-sensitive — confirm provisions remain in force.
- Health Data, Technology, and Interoperability (HTI-1) Final Rule — Decision Support Interventions (45 CFR §170.315(b)(11)) — Assistant Secretary for Technology Policy / ONC. Tier 1. Certified decision-support technology must disclose source attributes for predictive models in plain language — training data, validation, performance, and fairness measures.
- AI Risk Management Framework (AI RMF 1.0) and the Generative AI Profile (NIST AI 600-1) — U.S. National Institute of Standards and Technology, January 2023 (GenAI Profile July 2024). Tier 1. The framework's four functions — Govern, Map, Measure, Manage — are the standard U.S. vocabulary for identifying, measuring, and managing AI risk across the model lifecycle.
- EU AI Act (Regulation (EU) 2024/1689) — high-risk AI systems and medical devices — European Union. Tier 2. AI that is a medical device or a safety component of one is classified high-risk, layering conformity assessment, risk management, and transparency duties on top of the Medical Device Regulation; obligations for embedded medical-device AI apply from August 2028 under the current timeline. Time-sensitive — confirm the deadline after the Digital Omnibus amendments.


