A large language model (LLM) generates and transforms language, and it is the engine behind a broad set of telemedicine features: AI scribes, visit summaries, medical-coding suggestions, drafts of patient messages, and conversational symptom intake. It is best understood not as a single feature but as a general capability that several product surfaces draw on.
In healthcare deployments, three constraints dominate everything else. First, protected health information (PHI) must stay inside processing covered by a business associate agreement (BAA), and that includes calls to a vendor's hosted model API: sending PHI to an LLM endpoint with no BAA is a HIPAA problem, not a technicality. Second, outputs must be grounded in real source data and reviewed by a human, because the characteristic failure of LLMs is the fluent, confident error: text that reads correctly and is wrong. Third, the intended use must stay on the support side of the FDA's regulatory line: an LLM drafting a note is decision support, while an LLM presented as diagnosing a patient drifts toward Software as a Medical Device (SaMD) scope.
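The first two constraints can be enforced in code rather than left to policy. The following is a minimal sketch, not a reference implementation: `Endpoint`, `Draft`, `ComplianceError`, `request_draft`, and `release` are hypothetical names, the `baa_signed` flag is assumed to come from your own vendor inventory, and `call_model` stands in for whatever client actually reaches the model. The third constraint, intended use, is a product and regulatory decision that a guard like this does not capture.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class ComplianceError(Exception):
    """Raised when a request or release would cross a compliance boundary."""


@dataclass(frozen=True)
class Endpoint:
    """A hosted model endpoint (hypothetical record, not a real vendor API)."""
    name: str
    baa_signed: bool  # assumption: maintained in your own vendor inventory


@dataclass
class Draft:
    """LLM output held for human review before it can be released."""
    text: str
    source_ids: List[str]              # records the draft is grounded in
    approved_by: Optional[str] = None  # set only by a human reviewer


def request_draft(
    endpoint: Endpoint,
    prompt: str,
    contains_phi: bool,
    source_ids: List[str],
    call_model: Callable[[Endpoint, str], str],
) -> Draft:
    """Generate a draft only if the request stays inside the compliant boundary."""
    # Constraint 1: PHI may only go to an endpoint covered by a BAA.
    if contains_phi and not endpoint.baa_signed:
        raise ComplianceError(f"{endpoint.name} has no BAA; refusing to send PHI")
    # Constraint 2 (grounding): refuse to generate with no source records attached.
    if not source_ids:
        raise ComplianceError("no source records supplied; output would be ungrounded")
    text = call_model(endpoint, prompt)
    return Draft(text=text, source_ids=list(source_ids))


def release(draft: Draft) -> str:
    """Constraint 2 (review): nothing leaves the system without a named reviewer."""
    if draft.approved_by is None:
        raise ComplianceError("draft has not been reviewed by a human")
    return draft.text
```

The design choice worth noting is that the model's output is never returned as a bare string: it is wrapped in a `Draft` that cannot be released until a reviewer is recorded, so skipping review requires deliberately bypassing the type rather than forgetting a step.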
On top of those, latency and cost per token are real product constraints: an LLM that produces excellent summaries but is too slow or too expensive to run at clinic volume is not viable in practice. The common mistake is treating an LLM as a drop-in answer machine and shipping its output directly to clinicians or patients; without grounding, human review, and a compliant processing boundary, you are shipping confident errors into a clinical setting.
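The viability question is simple arithmetic, and it helps to run it before building anything. The sketch below is a rough back-of-the-envelope estimate under stated assumptions: `Workload`, `ModelProfile`, and the three functions are hypothetical names, pricing is assumed to be quoted per 1,000 tokens with separate input and output rates, and latency is approximated as time to first token plus a constant generation rate. All figures must come from your own measurements and your vendor's actual pricing; none are supplied here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Clinic volume for one feature, e.g. visit summaries (hypothetical shape)."""
    visits_per_day: int
    input_tokens_per_visit: int
    output_tokens_per_visit: int


@dataclass(frozen=True)
class ModelProfile:
    """Measured or quoted characteristics of one model (all values are inputs)."""
    usd_per_1k_input_tokens: float
    usd_per_1k_output_tokens: float
    seconds_to_first_token: float
    output_tokens_per_second: float


def daily_cost_usd(w: Workload, m: ModelProfile) -> float:
    """Cost per visit (input plus output tokens) multiplied by daily volume."""
    per_visit = (
        w.input_tokens_per_visit / 1000 * m.usd_per_1k_input_tokens
        + w.output_tokens_per_visit / 1000 * m.usd_per_1k_output_tokens
    )
    return per_visit * w.visits_per_day


def seconds_per_summary(w: Workload, m: ModelProfile) -> float:
    """Simplified latency: wait for the first token, then stream the rest."""
    return m.seconds_to_first_token + w.output_tokens_per_visit / m.output_tokens_per_second


def is_viable(w: Workload, m: ModelProfile, max_daily_usd: float, max_seconds: float) -> bool:
    """True only if the model fits both the budget and the latency ceiling."""
    return daily_cost_usd(w, m) <= max_daily_usd and seconds_per_summary(w, m) <= max_seconds
```

Calling `is_viable` with a candidate model's profile and the clinic's budget and latency ceiling gives a quick pass or fail; a model that fails either check is not viable for that feature regardless of output quality.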

