ASR (Speech-to-text)

Definition

Automatic Speech Recognition — turning spoken audio into written text. The base layer for captions, transcripts, voice commands, and meeting notes.

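ASR converts speech into text that software can use. Quality is usually measured by word error rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in a reference transcript. The hard cases are background noise, accents, overlapping speakers, and domain jargon. It is the first step in nearly every audio AI feature in a video product.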
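Word error rate can be made concrete with a small calculation. The sketch below counts the minimum number of word substitutions, deletions, and insertions needed to turn an ASR output into a reference transcript, then divides by the reference length. It assumes simple lowercase, whitespace-based tokenization; the function name `word_error_rate` is illustrative and not taken from any particular library. Real evaluations typically also normalize punctuation and numbers before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (word missing from hypothesis)
                curr[j - 1] + 1,                  # insertion (extra word in hypothesis)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "turn on the light" against the reference "turn on the lights" gives one substitution over four reference words, a WER of 0.25. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.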

Also known as

speech recognition, speech-to-text, STT

Related terms

Whisper

Streaming ASR

WER (Word error rate)

Speaker diarization