Streaming ASR emits partial transcripts continuously and with low latency while the speaker is still talking, then revises them as more audio arrives. It is harder than batch transcription because the model must commit to words before it has heard the context that follows them, and it is the backbone of real-time captioning and conversational AI.
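The emit-then-refine behavior can be illustrated with a small client-side sketch. It assumes the recognizer delivers each partial result as a full re-transcription of the utterance so far, and it treats a word as final only once two consecutive partials agree on it. That agreement rule is one simple heuristic chosen for illustration, not a method the text prescribes, and the names `PartialTracker` and `common_prefix` are hypothetical.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Return the longest shared leading run of words in a and b."""
    shared: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


class PartialTracker:
    """Hypothetical helper: commits words once two consecutive partials agree."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # words already shown as final
        self.previous: List[str] = []   # words of the last partial hypothesis

    def update(self, hypothesis: str) -> List[str]:
        """Take the newest partial transcript and return newly committed words."""
        words = hypothesis.split()
        stable = common_prefix(self.previous, words)
        self.previous = words
        already = len(self.committed)
        # Only extend the committed text; words already committed are never retracted.
        if len(stable) > already and stable[:already] == self.committed:
            newly = stable[already:]
            self.committed = stable
            return newly
        return []


if __name__ == "__main__":
    tracker = PartialTracker()
    # Made-up partials: the early guess "i scream" is revised once more audio arrives.
    for partial in ["i scream", "ice cream is", "ice cream is good"]:
        new_words = tracker.update(partial)
        print(f"partial={partial!r} newly_committed={new_words}")
```

The trade-off is visible in the sketch: waiting for agreement delays each word by one update, but it keeps the early, wrong guess from ever being shown as final text.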