ASR converts speech into text the software can use. Quality is measured by word error rate, and the hard cases are noise, accents, overlap, and jargon. It is the first step in nearly every audio AI feature in a video product.
Definition
Automatic Speech Recognition (ASR) is the process of turning spoken audio into written text. It is the base layer for captions, transcripts, voice commands, and meeting notes.
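Word error rate (WER), the quality measure mentioned above, is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the ASR output into a reference transcript, divided by the number of words in the reference. The sketch below is one minimal way to compute it. The function name `word_error_rate` is illustrative, and the normalization (lowercasing and splitting on whitespace) is a simplifying assumption; real evaluations usually also normalize punctuation and number formats.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase and split on whitespace as a stand-in for
    # the fuller text normalization a real evaluation would apply.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one previous row.
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("the") and one substituted word ("captions" -> "caption")
# against a five-word reference.
print(word_error_rate("turn on the meeting captions", "turn on meeting caption"))  # 0.4
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, so it is an error ratio and not a percentage capped at 100.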
Also known as
speech recognition, speech-to-text, STT