Streaming ASR

Definition

Transcribing speech incrementally while it is being spoken, instead of waiting for the audio to finish. Required for live captions and voice agents.

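Streaming ASR continuously emits partial (interim) text with low latency, then revises it as more audio arrives until each segment is finalized. It is harder than batch transcription because the model must commit to words with little or no future audio context, whereas a batch model sees the whole recording. It is the backbone of real-time captioning and conversational AI.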
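How partial text is refined and committed varies between systems. One commonly described policy, sometimes called local agreement, treats a word as final once two consecutive partial hypotheses agree on it. The sketch below is a minimal illustration under stated assumptions: the recognizer re-transcribes the whole utterance so far on every update and returns plain text. `PartialStabilizer` and `common_prefix` are hypothetical names, not part of any ASR library.

```python
from typing import List, Tuple


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Return the longest shared prefix of two word lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


class PartialStabilizer:
    """Hypothetical helper: commits words once two consecutive partials agree.

    Assumes each hypothesis passed to update() covers the whole utterance
    so far (not just the newest audio chunk).
    """

    def __init__(self) -> None:
        self.committed: List[str] = []  # words shown as final; never retracted
        self.previous: List[str] = []   # last partial hypothesis, as words

    def update(self, hypothesis: str) -> Tuple[List[str], List[str]]:
        words = hypothesis.split()
        agreed = common_prefix(self.previous, words)
        # Only extend the committed text, and only if the agreement is
        # consistent with what was already committed.
        if (len(agreed) > len(self.committed)
                and agreed[:len(self.committed)] == self.committed):
            self.committed = agreed
        self.previous = words
        # Everything after the committed prefix is still tentative.
        tentative = words[len(self.committed):]
        return list(self.committed), tentative


if __name__ == "__main__":
    s = PartialStabilizer()
    for partial in ["the cat", "the cats at", "the cat sat down", "the cat sat down"]:
        final, interim = s.update(partial)
        print(" ".join(final), "|", " ".join(interim))
```

Waiting for agreement adds at least one update of delay before a word becomes final, but it keeps committed text from flickering when the recognizer revises a misheard word (here "cats at" becomes "cat sat").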

Also known as

real-time ASR, online ASR