Opus: the open codec that ate WebRTC

Why this matters

If your product has a "join call" button — video conferencing, telemedicine, an e-learning classroom, an in-game voice channel — the audio leaving your users' microphones is, in practice, Opus. The web standard that powers browser calling, WebRTC, requires every browser to support Opus, so it is the default and you rarely get a choice. This article is written for a product manager, founder, or operations lead with no audio background: by the end you will understand why Opus won real-time audio, what its key settings (FEC, DTX, bitrate, frame size) actually do to a call, and when Opus is the right choice versus AAC. Every claim traces back to the IETF specification or the codec's open-source maintainer, not a secondhand blog.

The forty-year split that Opus closed

For most of audio history, engineers kept two separate toolboxes. One held speech codecs — the ones inside a phone call. They work by modelling the human voice box: the throat, the vocal cords, the shape of the mouth. That model is extraordinarily efficient for a single talking voice, squeezing intelligible speech into a few kilobits per second, but it mangles music, because a violin is nothing like a larynx. The other toolbox held music codecs like MP3 and AAC. They work by analysing the sound's frequencies and throwing away the parts your ear cannot hear — covered in how audio compression works. They sound superb on music but waste bits on plain speech and add too much delay for a live conversation.

A real-time product that carried both — a video call where someone talks, then shares a music video, then talks again — had to pick one toolbox and live with its weakness, or switch codecs mid-stream and risk a glitch. Opus ended that compromise. It packs both toolboxes into one codec and decides, many times per second, which one fits the sound in front of it. That single design choice is the thing to hold onto for the rest of this article: Opus is a speech codec and a music codec wearing one coat, with a switch that flips automatically.

Opus was standardized by the Internet Engineering Task Force — the same body that standardizes the protocols of the internet itself — as RFC 6716 in September 2012. Its authors came from Mozilla (the Firefox maker), Skype, and Xiph.Org, the nonprofit behind several open media formats. That pedigree matters for a reason we will return to at the end: Opus was built to be open and free, not to be licensed.

[Figure 1: block diagram of Opus. An input signal enters a mode selector that routes it to the SILK speech engine, the CELT music engine, or a hybrid that uses both. The three audio modes and their frequency ranges are labelled, and a single Opus bitstream comes out.]

Figure 1. One codec, two engines, three modes. Opus inspects the incoming sound and routes it to SILK (speech), CELT (music), or a hybrid that runs both at once. The listener only ever sees one Opus bitstream.

SILK, CELT, and the switch in the middle

Open up Opus and you find two engines bolted to one frame.

The first engine is SILK, which came from Skype, where it had already carried billions of minutes of voice calls. SILK is a linear-prediction coder, which is the technical name for the voice-box-modelling approach: it predicts each moment of the waveform from the moments just before it, the way you can predict the next note of a familiar tune, and only sends the small prediction error. That is brilliant for speech and for the lower frequencies of any sound, and it is cheap on bits. SILK handles Opus's lower internal sample rates, up to wideband.

The second engine is CELT, which came from Xiph.Org. CELT is a transform coder built on the Modified Discrete Cosine Transform, or MDCT — the same family of frequency analysis that AAC uses. CELT is designed for very low delay, which is unusual for a music-style codec, and it handles the full audible frequency range and music-like sound.

Between them sits a mode selector. Opus runs in one of three modes, chosen automatically based on the bitrate, the bandwidth, and the content:

SILK-only mode — pure speech engine, for voice at low bitrates and narrower frequency ranges.
CELT-only mode — pure music engine, for music and for the lowest-latency settings.
Hybrid mode — both at once: SILK codes the low frequencies, CELT codes the high frequencies on top. This is how Opus delivers full-band speech that sounds natural without paying full music-codec cost.

The reader does not need to track which mode is active at any instant; the encoder decides and the decoder follows, all signalled inside the bitstream. What matters is the result: one codec that never has to apologise for the kind of sound it is given.
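
The in-band signalling described above lives in the first byte of every Opus packet, the TOC byte defined in RFC 6716, §3.1. As a minimal sketch (the function name and dictionary keys are illustrative and are not part of libopus or any other library), the active mode, bandwidth, and frame size can be read back like this:

```python
# Decode the Opus TOC (table-of-contents) byte per RFC 6716, section 3.1.
# parse_toc and its dictionary keys are hypothetical names for illustration.

def parse_toc(toc: int) -> dict:
    config = toc >> 3          # top 5 bits: mode + bandwidth + frame size
    stereo = bool(toc & 0x04)  # 1 bit: mono (0) or stereo (1)
    code = toc & 0x03          # 2 bits: how many frames the packet holds

    if config < 12:            # configs 0..11: SILK-only
        mode = "SILK-only"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:          # configs 12..15: Hybrid
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:                      # configs 16..31: CELT-only
        mode = "CELT-only"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]

    return {
        "mode": mode,
        "bandwidth": bandwidth,
        "frame_ms": frame_ms,
        "stereo": stereo,
        "code": code,
    }


print(parse_toc(0x78))
```

The byte 0x78 decodes to hybrid mode, fullband, 20 ms, mono, which is the kind of packet a full-band voice call typically produces. The abbreviations NB, MB, WB, SWB, and FB are the five audio bandwidths the specification defines.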

The numbers that make Opus flexible

Opus's reputation rests on its range. Three dials explain almost everything.

The first dial is bitrate — how many bits per second you spend. Opus supports 6 kbit/s at the low end up to 510 kbit/s at the top, and it can change the rate at any instant without restarting the stream (RFC 7587, §3.1). To make that range concrete, the specification publishes "sweet spot" bitrates — the rate at which each kind of content sounds good for its cost, at the standard 20-millisecond frame:

Content	Opus sweet spot (20 ms frame)
Narrowband speech	8–12 kbit/s
Wideband speech	16–20 kbit/s
Full-band speech	28–40 kbit/s
Full-band mono music	48–64 kbit/s
Full-band stereo music	64–128 kbit/s

Table 1. Opus recommended bitrates, from RFC 7587, §3.1.1. "Full-band" means the codec is carrying the entire audible frequency range up to 20 kHz; "narrowband" is telephone-quality speech up to 4 kHz.
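
For teams that want Table 1 in a form a service can read, here is a small sketch. The ranges and the 6 to 510 kbit/s limits come from the text above; the helper names and the choice of the range midpoint as a starting value are assumptions made for illustration, not a recommendation from the RFC.

```python
# Sweet-spot bitrates from Table 1 (RFC 7587, section 3.1.1), in kbit/s.
SWEET_SPOTS_KBPS = {
    "narrowband speech": (8, 12),
    "wideband speech": (16, 20),
    "full-band speech": (28, 40),
    "full-band mono music": (48, 64),
    "full-band stereo music": (64, 128),
}

OPUS_MIN_BPS = 6_000     # lowest bitrate Opus supports
OPUS_MAX_BPS = 510_000   # highest bitrate Opus supports


def starting_bitrate_bps(content: str) -> int:
    """Midpoint of the sweet-spot range (an arbitrary but reasonable start)."""
    low, high = SWEET_SPOTS_KBPS[content]
    return (low + high) * 1000 // 2


def clamp_bitrate_bps(requested_bps: int) -> int:
    """Keep a requested bitrate inside the range Opus can actually encode."""
    return max(OPUS_MIN_BPS, min(OPUS_MAX_BPS, requested_bps))


for content in SWEET_SPOTS_KBPS:
    print(f"{content}: start at {starting_bitrate_bps(content)} bit/s")
```

In a real call the starting value matters less than letting the bitrate adapt afterwards, so treat the result as an initial setting rather than a fixed target.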

The second dial is audio bandwidth — how much of the frequency range the codec carries. Opus offers five settings, from narrowband (telephone speech, up to 4 kHz) through wideband and super-wideband to fullband (the whole audible range up to 20 kHz, at a 48 kHz sampling rate). The encoder can narrow the bandwidth to save bits when the network is weak, then widen it again when capacity returns, mid-call, with no audible seam.

The third dial is frame size — how much sound goes into each packet. Opus can encode frames of 2.5, 5, 10, 20, 40, or 60 milliseconds, and pack several frames into one packet up to 120 ms (RFC 7587, §4.2). This dial is the heart of why Opus owns real-time audio, so it deserves its own arithmetic.

A short frame means the encoder waits less time to fill a packet before sending it. That waiting time is pure, unavoidable delay added to every word spoken. Compare a music-codec frame with an Opus real-time frame:

AAC-LC frame  = 1024 samples ÷ 48,000 samples per second ≈ 21.3 ms per frame
Opus frame    = 20 ms (typical real-time setting)
Opus minimum  = 2.5 ms (lowest-latency setting)

Twenty-one milliseconds of forced delay before a single packet leaves the encoder may not sound like much, but in a two-way conversation it stacks on top of network delay, the jitter buffer, and the decoder, and the total is what makes a call feel laggy or natural. Opus's ability to drop to a 2.5 ms frame, while keeping good quality, is a large part of why it, not AAC, sits in every calling app. We unpack the full delay chain in the WebRTC audio pipeline end to end.
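
The frame arithmetic above is easy to reproduce. This sketch assumes a 48 kHz sampling rate throughout; the function names are illustrative.

```python
# Frame duration arithmetic: how long the encoder waits to fill one frame.
SAMPLE_RATE_HZ = 48_000
OPUS_FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)


def frame_duration_ms(samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a frame that holds a given number of samples."""
    return 1000.0 * samples / sample_rate_hz


def samples_per_frame(frame_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in a frame of a given duration."""
    return int(frame_ms * sample_rate_hz / 1000)


print(f"AAC-LC frame: {frame_duration_ms(1024):.1f} ms")
for ms in OPUS_FRAME_SIZES_MS:
    print(f"Opus {ms} ms frame: {samples_per_frame(ms)} samples")
```

A 20 ms Opus frame holds 960 samples at 48 kHz, and the 2.5 ms minimum holds 120, against the fixed 1024 samples of an AAC-LC frame.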

Pitfall — chasing the shortest frame can cost you more than it saves. A shorter frame lowers encoder delay, but it also raises overhead, because every packet carries the same fixed network headers (IP, UDP, RTP) regardless of how little audio is inside. Halve the frame size and you double the number of packets and the header tax. On a congested network that extra packet rate can trigger the very loss you were trying to avoid. The common real-time setting is 20 ms precisely because it balances delay against overhead; go shorter only when you have measured that the latency win is worth the cost.
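
The header tax can be put in numbers. The sketch below assumes IPv4 (20 bytes), UDP (8 bytes), and a basic RTP header (12 bytes), one Opus frame per packet, and no RTP header extensions, SRTP authentication tag, or link-layer framing; real calls usually carry somewhat more overhead than this.

```python
# Header overhead per second of audio, as a function of Opus frame size.
# Assumptions: IPv4, no RTP extensions, no SRTP tag, one frame per packet.
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
HEADER_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES  # 40


def packets_per_second(frame_ms: float) -> float:
    return 1000.0 / frame_ms


def header_overhead_kbps(frame_ms: float) -> float:
    """Bits per second spent on headers alone, in kbit/s."""
    return packets_per_second(frame_ms) * HEADER_BYTES * 8 / 1000


for frame_ms in (2.5, 5, 10, 20, 40, 60):
    print(
        f"{frame_ms:>4} ms frame: {packets_per_second(frame_ms):>5.0f} packets/s, "
        f"{header_overhead_kbps(frame_ms):>6.1f} kbit/s of headers"
    )
```

Under these assumptions a 20 ms frame spends 16 kbit/s on headers, a 10 ms frame spends 32 kbit/s, and a 2.5 ms frame spends 128 kbit/s, several times the bitrate of the speech itself.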

The two features that save a bad call: FEC and DTX

Two Opus settings do more for call quality than any amount of raw bitrate, because they address what actually breaks live audio: lost packets and wasted bandwidth.

The first is in-band Forward Error Correction, or FEC. On the internet, audio travels as a stream of packets, and some packets simply never arrive. Normally a lost packet means a gap: a click, a dropped syllable. FEC fixes this with a clever trick: when the encoder is told to expect packet loss, it tucks a low-quality copy of the previous packet's audio inside the current packet (RFC 7587, §3.3). So if packet number 5 is lost, packet number 6 arrives carrying a backup copy of 5, and the decoder reconstructs the missing audio instead of leaving a hole. The cost is a few extra kilobits per second; the payoff is speech that stays intelligible through packet loss that would otherwise shred it. We cover the full family of loss-recovery techniques in forward error correction, in-band FEC, and RED redundancy, and how players hide the gaps that slip through in packet loss concealment.
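
The packet 5 and packet 6 example can be simulated in a few lines. This is a toy model, not the Opus bitstream: frames are strings, the "low-quality copy" is just a label, and all names are hypothetical.

```python
# Toy model of in-band FEC: each packet carries its own frame plus a
# low-quality copy of the previous frame. Not real Opus packet handling.

def make_packets(frames):
    packets = []
    for seq, frame in enumerate(frames):
        backup = f"lowres({frames[seq - 1]})" if seq > 0 else None
        packets.append({"seq": seq, "primary": frame, "fec_of_previous": backup})
    return packets


def receive(packets, lost):
    """Decode every frame slot; `lost` is the set of sequence numbers dropped."""
    arrived = {p["seq"]: p for p in packets if p["seq"] not in lost}
    output = []
    for seq in range(len(packets)):
        if seq in arrived:
            output.append(arrived[seq]["primary"])              # normal decode
        elif seq + 1 in arrived:
            output.append(arrived[seq + 1]["fec_of_previous"])  # FEC recovery
        else:
            output.append("concealed")                          # nothing to use
    return output


frames = [f"f{i}" for i in range(7)]
packets = make_packets(frames)
print(receive(packets, lost={5}))      # single loss: slot 5 recovered from packet 6
print(receive(packets, lost={2, 3}))   # burst: slot 2 has no backup, slot 3 does
```

Two things fall out of the model. The decoder can only use the backup if it waits for the next packet, which is one reason the jitter buffer matters, and a burst of consecutive losses defeats single-packet FEC for every lost packet except the last one in the burst.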

The second is Discontinuous Transmission, or DTX. When you stop talking, there is nothing useful to send — yet a naive codec keeps transmitting silence at full rate. DTX detects the pause and stops sending audio packets, sending only an occasional tiny update so the far end knows the line is still alive (RFC 7587, §3.1.3). On a typical call where each person talks less than half the time, DTX can cut the average bandwidth substantially. Opus also generates its own comfort noise — a faint, natural-sounding hiss — during the silence, because total digital silence feels unnervingly dead to the listener, as if the call had dropped. The mechanics of detecting speech and suppressing transmission are covered in voice activity detection and discontinuous transmission.

A short worked example shows the combined effect on a two-person call:

Both mics streaming continuously at 32 kbit/s each = 64 kbit/s total
Each person actually talks ~40% of the time
With DTX, average sent ≈ 0.40 × 64 ≈ 26 kbit/s
FEC adds back a few kbit/s for loss protection on the active speech
Net: a robust call for roughly the bandwidth of one continuous stream
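
The same arithmetic as a small function, so the talk fraction and bitrate can be varied. It ignores the occasional keep-alive updates DTX sends during silence, and the 4 kbit/s FEC figure in the second call is a made-up illustrative value, not a number from the specification.

```python
# Average bandwidth of a call with DTX, ignoring the small keep-alive
# updates sent during silence (an assumption that slightly understates it).

def dtx_average_kbps(
    per_stream_kbps: float,
    talk_fraction: float,
    streams: int = 2,
    fec_overhead_kbps: float = 0.0,
) -> float:
    active_kbps = per_stream_kbps + fec_overhead_kbps  # rate while talking
    return streams * talk_fraction * active_kbps


print(dtx_average_kbps(32, 0.40))                         # DTX only
print(dtx_average_kbps(32, 0.40, fec_overhead_kbps=4.0))  # DTX plus FEC
```

With the worked example's inputs the DTX-only average is 25.6 kbit/s, which the text rounds to 26.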

Pitfall — turning on FEC the receiver cannot use just wastes bandwidth. FEC only helps if the decoding side knows to look for the backup copy in the next packet. If FEC is enabled on the sender but the receiver is not configured to use it, the redundant data is sent and then thrown away — pure waste (RFC 7587, §3.3 recommends not using FEC when the receiver cannot take advantage of it). FEC and DTX are negotiated at call setup through SDP parameters (useinbandfec, usedtx); make sure both sides agree, rather than flipping them on blindly.
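
In SDP these two settings appear as parameters on the Opus `a=fmtp` line. The sketch below shows one way to switch them on in an SDP string before it is applied. It is an illustration under assumptions: the helper name is hypothetical, the payload type 111 in the sample is only an example, and the function leaves the SDP untouched if it finds no Opus `a=fmtp` line.

```python
import re

# Set Opus fmtp parameters (for example useinbandfec, usedtx) in an SDP blob.
# set_opus_params is a hypothetical helper, not part of any WebRTC library.

def set_opus_params(sdp: str, **params: int) -> str:
    lines = sdp.split("\r\n")

    # Find the payload type that the SDP maps to Opus.
    payload_type = None
    for line in lines:
        match = re.match(r"a=rtpmap:(\d+) opus/48000", line, re.IGNORECASE)
        if match:
            payload_type = match.group(1)
            break
    if payload_type is None:
        return sdp  # no Opus in this SDP; nothing to change

    prefix = f"a=fmtp:{payload_type} "
    for index, line in enumerate(lines):
        if line.startswith(prefix):
            # Parse "key=value;key=value" into an ordered dict, then update it.
            current = {}
            for item in line[len(prefix):].split(";"):
                item = item.strip()
                if item:
                    key, _, value = item.partition("=")
                    current[key] = value
            current.update({key: str(value) for key, value in params.items()})
            lines[index] = prefix + ";".join(f"{k}={v}" for k, v in current.items())
            break
    return "\r\n".join(lines)


sample = (
    "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n"
    "a=rtpmap:111 opus/48000/2\r\n"
    "a=fmtp:111 minptime=10;useinbandfec=0\r\n"
)
print(set_opus_params(sample, useinbandfec=1, usedtx=1))
```

In RFC 7587 both parameters describe what the receiving side can use or prefers, so each endpoint should set them in the SDP it sends to describe its own decoder, and the sender should honour what the other side declared.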

Why every browser ships Opus

Opus did not become universal by being marketed. It became universal because the standard that powers in-browser calling made it mandatory.

WebRTC is the technology that lets a web page open a microphone and camera and place a call with no plugin. Its audio rules are written in IETF RFC 7874 (May 2016), and they are blunt: every WebRTC endpoint is required to implement Opus, plus the old telephone codec G.711 for compatibility with legacy phone systems. When the device can handle more than telephone quality — which is almost always — the specification recommends offering Opus first. The practical result is that Chrome, Firefox, Safari, and Edge all encode microphone audio as Opus by default. Safari was the last holdout and closed the gap in 2021; as of 2026 every major browser sends and receives Opus.

Because the browsers agreed on Opus, so did the server software that routes calls between them — mediasoup, Pion, LiveKit, Janus, and the rest. A codec that is guaranteed present on every endpoint is a codec you never have to transcode, and transcoding audio in the middle of a call costs CPU and adds delay. Opus being everywhere is therefore not just convenient; it removes an entire class of work from a real-time system. How those servers route audio without re-encoding it is the subject of audio in SFU vs MCU vs P2P.

Opus reaches beyond real-time, too. It rides inside several containers — Ogg (defined in RFC 7845), WebM and Matroska, MP4/ISOBMFF, and MPEG-TS — so it also appears in on-demand streaming over HLS and DASH and in audio containers generally. Its weak spot historically was hardware playback devices and broadcast chains, which standardised on AAC and Dolby years before Opus existed — which is exactly where AAC still wins.

Opus versus AAC: a choice with a clear rule

The two codecs that dominate modern audio are Opus and the AAC family. They are not really competitors so much as specialists, and the choice between them usually answers itself once you ask what the audio is for.

Criterion	Opus	AAC family
Best at	Real-time calls, interactive audio	On-demand streaming, broadcast, device playback
Lowest practical bitrate	~6 kbit/s (speech)	~12 kbit/s (xHE-AAC)
Minimum frame / delay	2.5 ms	~21 ms (AAC-LC)
Speech + music in one stream	Yes, native	Yes, with xHE-AAC
Universal browser / WebRTC support	Yes, mandatory	Partial; not the WebRTC default
Hardware decoder in TVs / phones	Improving, not universal	Universal
Licensing (2026)	Royalty-free, BSD license	Per-unit royalty through the Via LA pool

Table 2. Opus and AAC compared on the axes that actually decide a project. "Frame / delay" is the minimum encoder delay before a packet can be sent — the figure that matters for live conversation.

The rule that falls out of the table: if the audio is a live, two-way conversation, choose Opus; if it is one-way playback to a wide range of devices and TVs, lean to AAC. A telemedicine call uses Opus. A movie streamed to a smart TV uses AAC. A product that does both — an e-learning platform with live classes and a recorded-lecture library — sensibly uses Opus for the live side and AAC for the on-demand side, which is a common and correct pattern, not a failure to standardise.

On pure quality at the same bitrate for music, the two are close enough that the decision should rest on the factors above — delay, device support, licensing — rather than on a quality contest most listeners would fail.

What Opus 1.5 added: machine learning inside the codec

Opus is a living codec. Its open-source reference implementation keeps improving without changing the on-the-wire format, so newer encoders produce better-sounding audio that older decoders can still play.

The 2017 update RFC 8251 was housekeeping: it fixed two security issues found by fuzzing the decoder and corrected minor quality bugs, while staying fully compatible with the original RFC 6716. Any decoder that passed the original tests still works.

The bigger leap came with Opus 1.5, released by Xiph.Org in March 2024, the first release to use deep learning to process and generate the audio signal itself (earlier releases had used machine learning only for supporting tasks such as speech/music detection). Two features stand out. Deep PLC uses a neural network to fill in lost packets with reconstructed audio that sounds far more natural than the old repeat-and-fade trick. Deep Redundancy (DRED) pushes FEC much further, letting the decoder recover speech even through long bursts of loss that would normally destroy a call. Deep PLC runs entirely at the decoder, so a service can adopt it on the playback side alone and improve call resilience for users on bad networks; DRED needs support on both ends, because the encoder has to embed the redundancy data that the decoder later uses. As of 2026 the DRED format is still being finalised, so treat it as advanced rather than settled, but the direction is clear: the loss recovery this article described in classical terms is being rebuilt with neural networks, and Opus is leading that work in the open.
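
For contrast with the neural approach, here is what a bare-bones repeat-and-fade concealment looks like. This is a deliberately simplified illustration of the classical idea, with hypothetical names and a made-up fade factor; it is not the concealment algorithm that Opus itself implements, which is considerably more sophisticated.

```python
# Simplified "repeat and fade" packet loss concealment: replay the last good
# frame for each lost frame, a little quieter each time, until it dies away.
# Illustrative only; not the PLC used inside libopus.

def conceal_repeat_and_fade(last_good_frame, lost_frames, fade=0.5):
    concealed = []
    gain = 1.0
    for _ in range(lost_frames):
        gain *= fade  # each successive lost frame is played more quietly
        concealed.append([sample * gain for sample in last_good_frame])
    return concealed


last_good = [1.0, -1.0, 0.5, -0.5]
for frame in conceal_repeat_and_fade(last_good, lost_frames=3):
    print(frame)
```

The weakness is visible in the output: after a few lost frames the listener hears a fading echo of old audio and then silence, which is the gap that Deep PLC and DRED are meant to fill with plausible speech.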

Where Fora Soft fits in

We have built real-time audio into video products since 2005 — video conferencing, telemedicine consultations, e-learning classrooms, and AR/VR experiences — and Opus is the codec under almost all of it, because WebRTC hands it to us by default and it is free to use. In production the recurring lesson is not about choosing Opus but about tuning it: setting a sensible 20 ms frame, enabling FEC and DTX on both ends, and letting the bitrate adapt to the network rather than nailing it to a fixed number. When a client also needs an on-demand library alongside live sessions, we pair Opus for the calls with AAC for the recordings, which is the same split this article recommends. The failures we are called in to fix are rarely "wrong codec" and usually "Opus with the wrong settings" — FEC off on a lossy network, or a frame size chosen without measuring its overhead.

Call to action

Talk to an audio engineer — book a 30-minute scoping call to talk through your Opus plan.
See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
Download the Opus settings cheat sheet — a one-page reference: the three Opus modes (SILK / CELT / hybrid), the RFC 7587 bitrate sweet spots, the FEC / DTX / frame-size dials, key SDP parameters, and the one rule that picks Opus over AAC.

References

IETF RFC 6716, Definition of the Opus Audio Codec (Valin, Vos, Terriberry; September 2012) — the controlling standard. Defines the SILK + CELT hybrid architecture, the 6–510 kbit/s range, and the normative reference decoder. https://www.rfc-editor.org/rfc/rfc6716.html
IETF RFC 8251, Updates to the Opus Audio Codec (Valin, Vos; October 2017) — updates RFC 6716 with decoder fixes for two CVEs (CVE-2013-0899, CVE-2017-0381) and minor quality issues; remains fully compatible with RFC 6716. https://www.rfc-editor.org/rfc/rfc8251.html
IETF RFC 7587, RTP Payload Format for the Opus Speech and Audio Codec (Spittka, Vos, Valin; June 2015) — bitrate sweet spots (§3.1.1), frame sizes and the 120 ms maximum packet (§4.2), FEC (§3.3), DTX (§3.1.3), the 48000 Hz RTP clock, and the SDP parameters (useinbandfec, usedtx, stereo, maxaveragebitrate). https://www.rfc-editor.org/rfc/rfc7587.html
IETF RFC 7874, WebRTC Audio Codec and Processing Requirements (Valin, Bran; May 2016) — requires WebRTC endpoints to implement Opus and G.711, and recommends offering Opus first above 8 kHz. https://www.rfc-editor.org/rfc/rfc7874.html
IETF RFC 7845, Ogg Encapsulation for the Opus Audio Codec (Terriberry, Lee, Giles; April 2016) — the Ogg container mapping for Opus. https://www.rfc-editor.org/rfc/rfc7845.html
Xiph.Org, Opus Codec — License — the BSD-style license and royalty-free patent grants for Opus. https://opus-codec.org/license/
Xiph.Org, Opus 1.5 Released (March 2024) — the first Opus release with machine learning in the encoder and decoder: Deep PLC and Deep Redundancy (DRED). https://opus-codec.org/demo/opus-1.5/
MDN Web Docs, Codecs used by WebRTC — current browser support for Opus in WebRTC, used for orientation; the mandatory-codec claim is taken from RFC 7874 (reference 4), not from MDN. https://developer.mozilla.org/en-US/docs/Web/Media/Guides/Formats/WebRTC_codecs
Valin, Maxwell, Terriberry, Vos, High-Quality, Low-Delay Music Coding in the Opus Codec (AES 135th Convention paper, 2013; arXiv preprint, 2016) — the CELT music-coding design from the codec's authors. https://arxiv.org/pdf/1602.04845

Where popular blogs claim Opus "always beats AAC at the same bitrate", this article follows the more careful position of the codec authors' own papers (reference 9) and the standards: Opus's decisive advantage is delay and openness for real-time use, while music quality at matched bitrate is close to AAC rather than uniformly superior.

Related glossary terms

Opus
AAC
Bitrate
CELT
SILK
Audio codec
WebRTC audio
xHE-AAC