Why this matters

If you run a video conferencing product, a telemedicine platform, an online classroom, or a contact centre, the complaint you hear is rarely "the gain control is wrong" — it is "I can't hear the quiet person" or "that one caller is blowing out my headphones." Both are AGC problems. This article is for the product manager, founder, or operations lead who needs to understand level control well enough to read those complaints correctly, ask an engineer a sharp question, and set realistic expectations with a customer. A senior engineer will also find every claim traced to the relevant ITU-T Recommendation, the W3C constraint, or the WebRTC source. By the end you will be able to explain why one talker is too loud, why AGC sometimes makes background noise swell between sentences, and why turning AGC off can be the right call for a music or studio use case.


The problem: every talker arrives at a different volume

Start with what the listener experiences, because the whole solution follows from it. In any call with more than two people, the talkers never match. One person leans into a headset mic; another sits a metre back from a laptop. One bought a podcast microphone; another uses the cheap array built into a budget tablet. One operating system shipped with the microphone slider at 30 percent; another at 100. The result is that the same spoken sentence reaches the listener anywhere from barely audible to painfully loud, and the listener spends the meeting riding their own volume knob.

The job of automatic gain control is to erase that difference before it ever reaches the listener. It measures how loud each talker's voice is, decides how much to boost or cut, and applies that change so everyone lands near one consistent target. Done well, it is invisible: every voice simply sounds present and even. Done badly, it is the most fatiguing defect on the call.

The unit we use to talk about loudness here is the decibel relative to full scale, written dBFS. Full scale — 0 dBFS — is the loudest a digital signal can be before it clips; every real level is a negative number below it. A comfortable speaking voice in a captured signal sits somewhere around −18 to −12 dBFS; a whisper might be −40 dBFS; clipping distortion starts at 0 dBFS. Keep that scale in mind: AGC is the machinery that nudges a −35 dBFS quiet talker and a −6 dBFS loud talker toward the same target, say −18 dBFS, so the listener hears them as equals.
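To make the scale concrete, here is a minimal sketch of how a level in dBFS can be computed from a block of floating-point samples in the range −1 to +1, the format Web Audio uses. It assumes the simple convention that a constant full-scale signal reads 0 dBFS (the AES17 convention instead calls a full-scale sine 0 dBFS, which shifts readings by about 3 dB). The function name `rmsDbfs` is our own.

```javascript
// Minimal sketch: RMS level of a block of float samples, in dBFS.
// Assumes samples are nominally in [-1, 1], where 1.0 is digital full scale,
// and that a constant full-scale signal reads 0 dBFS.
function rmsDbfs(samples) {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  // 20*log10 because this is an amplitude ratio against full scale (1.0).
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// Usage: one second of a 440 Hz sine whose peaks sit at 0.25 of full scale.
const sampleRate = 48000;
const sine = Float32Array.from({ length: sampleRate }, (_, i) =>
  0.25 * Math.sin((2 * Math.PI * 440 * i) / sampleRate));
console.log(rmsDbfs(sine).toFixed(1)); // -15.1
```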

What "gain" actually means

The word at the centre of this topic is gain. Gain is simply a multiplier applied to the audio signal: a number greater than one makes it louder, a number less than one makes it quieter. Engineers usually express gain in decibels rather than as a raw multiplier, because the ear judges loudness roughly logarithmically, which decibels mirror, and because decibel gains add where raw multipliers would multiply, which is easier to reason about.

The conversion is worth doing out loud once, because it demystifies every number in this article:

gain in decibels = 20 × log10(output amplitude ÷ input amplitude)

A gain of +6 dB  →  multiply the signal amplitude by about 2 (a wave twice as tall)
A gain of +12 dB →  multiply by about 4
A gain of −6 dB  →  multiply by about 0.5 (cut it in half)

So when an AGC module says it can apply "up to 30 dB of gain," it means it can make a faint talker more than thirty times taller in amplitude. That is a lot of lifting, and it is exactly why AGC has to be careful: boost a quiet voice by 30 dB and you also boost the room hum, the fan, and the keyboard by 30 dB. The whole art of AGC is boosting the voice without dragging everything else up with it.
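The conversions above are easy to check in code. A minimal sketch with two helpers whose names are our own; it uses the 20 × log10 rule for amplitude given above.

```javascript
// Convert a gain in decibels to an amplitude multiplier, and back.
// Uses the 20*log10 rule for amplitude (not the 10*log10 rule for power).
const dbToAmplitude = (db) => 10 ** (db / 20);
const amplitudeToDb = (ratio) => 20 * Math.log10(ratio);

console.log(dbToAmplitude(6).toFixed(3));  // 1.995, i.e. roughly 2x
console.log(dbToAmplitude(-6).toFixed(3)); // 0.501, i.e. roughly half
console.log(dbToAmplitude(30).toFixed(1)); // 31.6, "more than thirty times"

// Decibels add where multipliers multiply: +6 dB then +6 dB equals +12 dB.
console.log(Math.abs(dbToAmplitude(6) * dbToAmplitude(6) - dbToAmplitude(12)) < 1e-12); // true
```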

The two levers: analog gain and digital gain

A capture system has two physically different places to change the level, and good AGC uses both. Understanding the split is the key to everything that follows.

The first lever is analog gain. Before your voice is ever turned into numbers, it passes through the microphone's preamplifier, and most operating systems expose a control for how much that preamp amplifies — the microphone "input volume" slider you have seen in your sound settings. AGC can ask the operating system to move that slider. The advantage is decisive: raising analog gain captures a quiet talker with more real detail, because the signal is amplified before it is digitised, so you are not stretching a coarse, already-quantised number. The cost is that the slider is slow, shared with the whole machine, and on some hardware moves in coarse steps.

The second lever is digital gain. Once the audio is captured as samples, software can multiply those samples directly. This is instant, precise, and fully under the application's control. Its limit is that it cannot add detail that was never captured: if the talker came in at −45 dBFS, multiplying by a large number lifts the voice and the noise floor together, and amplifies any quantisation coarseness baked in at capture time.
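A minimal sketch of digital gain, assuming float samples in [−1, 1]: multiply every sample by the linear factor and clamp at full scale. The demo uses two hypothetical amplitudes, one standing for the voice and one for the room noise, to show the limitation above: both rise by the same 20 dB, so the gap between them does not improve. `applyDigitalGain` is an illustrative name, not a library function.

```javascript
// Apply a digital gain (in dB) to float samples, hard-clamping at full scale (+/-1.0).
// Real AGCs put a limiter here instead of a hard clamp; this only shows the multiply.
function applyDigitalGain(samples, gainDb) {
  const factor = 10 ** (gainDb / 20);
  return samples.map((s) => Math.max(-1, Math.min(1, s * factor)));
}

// Hypothetical representative amplitudes: [voice, room noise], both captured faintly.
const captured = new Float32Array([0.005, 0.0005]);
const boosted = applyDigitalGain(captured, 20);
const gapDb = (x) => 20 * Math.log10(x[0] / x[1]); // voice-to-noise gap in dB
console.log(gapDb(captured).toFixed(1), gapDb(boosted).toFixed(1)); // 20.0 20.0
```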

The sensible strategy, then, is to use analog gain to get the raw capture into a healthy range, and digital gain to fine-tune the last few decibels to the exact target. WebRTC's classic design did exactly this, in what its source calls the adaptive analog mode: a feedback loop that watches the captured level and nudges the operating-system microphone volume up or down — internally mapped to a 0–255 scale — while a digital stage trims the remainder. On a phone, where no analog microphone slider is exposed to the app, that lever is missing, so the system falls back to a virtual microphone: a software stand-in that imitates the analog slider entirely in the digital domain.
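The split described above can be sketched as a small function that rounds the required gain to whole analog steps and leaves the remainder to the digital stage. This is a hypothetical illustration, not WebRTC's algorithm: the 3 dB step size and the −12 to +24 dB analog range are assumptions standing in for whatever a given microphone and operating system actually offer.

```javascript
// Hypothetical analog/digital gain split. Step size and analog range are
// illustrative assumptions, not values from WebRTC or any particular device.
function splitGain(requiredDb, { analogStepDb = 3, analogMinDb = -12, analogMaxDb = 24 } = {}) {
  // Coarse part: whole analog steps, rounded toward zero and clamped to the slider's range.
  const steps = Math.trunc(requiredDb / analogStepDb);
  const analogDb = Math.min(analogMaxDb, Math.max(analogMinDb, steps * analogStepDb));
  // Fine part: digital gain trims whatever the analog stage could not deliver.
  const digitalDb = requiredDb - analogDb;
  return { analogDb, digitalDb };
}

console.log(splitGain(20)); // { analogDb: 18, digitalDb: 2 }
console.log(splitGain(30)); // { analogDb: 24, digitalDb: 6 }  (slider at its maximum)
```

Rounding toward zero keeps the digital share small and signed the same way as the request, which is the "analog for the coarse range, digital for the last few decibels" idea in its simplest form.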

Figure 1. The two levers AGC pulls. Analog gain is applied in hardware before the signal is digitised; digital gain is applied to the samples afterward. The first adds detail but is slow; the second is instant but cannot recover what the microphone never captured.

Two styles of control: peak-following and loudness-target

Beyond where to apply gain, there is the question of what to aim at. Two philosophies have competed for decades.

The first is peak-following. Here the controller watches the loudest momentary sample — the peak — and turns the gain down fast whenever a peak threatens to hit 0 dBFS and clip. This is what a limiter does, and a limiter is the safety net at the end of almost every AGC chain. Peak-following reacts quickly and never lets the signal clip, but if used alone it makes a voice sound squashed, because it clamps down hard on every loud syllable.
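A minimal sketch of peak-following in its purest form: a limiter that reacts sample by sample. Assumptions: float samples in [−1, 1], a ceiling of −1 dBFS, instant attack, and a simple exponential release; the parameter values are illustrative, and this is not the WebRTC limiter.

```javascript
// Minimal peak limiter: instant attack, exponential release, no look-ahead.
// ceilingDb and releaseCoeff are illustrative values, not taken from any library.
function peakLimit(samples, ceilingDb = -1, releaseCoeff = 0.9995) {
  const ceiling = 10 ** (ceilingDb / 20);
  const out = new Float32Array(samples.length);
  let gain = 1;
  for (let i = 0; i < samples.length; i++) {
    const peak = Math.abs(samples[i]);
    // Largest gain that keeps this sample at or below the ceiling.
    const allowed = peak > ceiling ? ceiling / peak : 1;
    if (allowed < gain) {
      gain = allowed; // attack: clamp down immediately
    } else {
      // release: drift back toward unity, never above what this sample allows
      gain = Math.min(allowed, 1 - releaseCoeff * (1 - gain));
    }
    out[i] = samples[i] * gain;
  }
  return out;
}

// A 0.5 signal with one 1.5 spike: the spike is held to about -1 dBFS (0.891).
const limited = peakLimit(new Float32Array([0.5, 1.5, 0.5, 0.5]));
console.log(Array.from(limited, (x) => x.toFixed(3)).join(" ")); // 0.500 0.891 0.297 0.297
```

The samples after the spike stay attenuated because the release is slow; on speech, that lingering cut after every loud syllable is the squashed sound a limiter produces when it works alone.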

The second is loudness-target control. Here the controller measures the average loudness over a window of time — closer to how the ear judges loudness — and slowly steers the gain so that average lands on a target. This sounds natural because it does not react to every transient; it tracks the overall level of the voice. The cost is that a slow controller can be caught off guard by a sudden loud sound, which is why a fast peak limiter is always kept downstream as a backstop.
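By contrast, a loudness-target controller looks at an average. A minimal sketch, assuming a one-pole exponential average of the squared signal with an illustrative 400 ms time constant (our choice, not a value from the source): a brief click barely moves the loudness estimate, while the instantaneous peak jumps straight to the click's level.

```javascript
// Smoothed loudness: exponential moving average of the signal's power, reported in dBFS.
// The time constant is an illustrative assumption; real controllers choose their own.
function smoothedLoudnessDbfs(samples, sampleRate, timeConstantMs = 400) {
  if (samples.length === 0) return -Infinity;
  // Per-sample smoothing weight for a one-pole filter with the given time constant.
  const alpha = 1 - Math.exp(-1000 / (timeConstantMs * sampleRate));
  let meanSquare = samples[0] * samples[0]; // start from the first sample to avoid a warm-up dip
  for (const s of samples) meanSquare += alpha * (s * s - meanSquare);
  return 10 * Math.log10(meanSquare); // power ratio, so 10*log10
}

// One second of a steady level (0.1) versus the same with a 1 ms click at 0.9 at the end.
const rate = 16000;
const steady = new Float32Array(rate).fill(0.1);
const withClick = steady.slice();
withClick.fill(0.9, rate - 16);
const peakDbfs = (x) => 20 * Math.log10(x.reduce((m, v) => Math.max(m, Math.abs(v)), 0));
console.log(peakDbfs(steady).toFixed(1), peakDbfs(withClick).toFixed(1)); // -20.0 -0.9
console.log(smoothedLoudnessDbfs(steady, rate).toFixed(1),
  smoothedLoudnessDbfs(withClick, rate).toFixed(1)); // -20.0 -19.2
```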

Real systems combine the two: a slow loudness-target stage does the musical work of evening out talkers, and a fast peak limiter sits at the very end so that even a cough or a slammed desk can never clip the output. The WebRTC module names these jobs directly — an adaptive digital gain controller that steers toward a target, a saturation protector that anticipates peaks, and a limiter that catches whatever slips through.

The control loop: measure, compare, adjust — slowly

Every AGC is a feedback loop, and the single most important design choice in that loop is how fast it reacts. Walk the loop once. The controller measures the current voice level. It compares that to the target. It computes the gain change needed to close the gap. Then — and this is the crucial part — it applies that change gradually, over tens or hundreds of milliseconds, rather than all at once.

Why gradually? Because human loudness varies naturally within a single sentence. We stress some words and throw away others; the end of a sentence trails off. If the AGC chased every one of those variations, it would amplify the trailing-off and clamp the stressed words, and the voice would lurch in volume: the artefact engineers call pumping or breathing. The fix is to make the loop slow on the way up and fast on the way down: raise the gain gently so quiet passages are lifted smoothly, but cut it quickly when something loud arrives so nothing clips. These two speeds are the attack (how fast gain comes down for loud sound) and the release (how slowly it comes back up for quiet sound).
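The asymmetric speeds can be sketched as a one-pole smoother with two coefficients, applied once per frame. Assumptions: 10 ms frames, and illustrative time constants of 20 ms for attack and 1000 ms for release; real products tune these per platform. `createGainSmoother` is a hypothetical helper.

```javascript
// Per-frame gain smoother with a fast attack (gain going down) and slow release (gain going up).
// Frame size and time constants are illustrative assumptions.
function createGainSmoother({ frameMs = 10, attackMs = 20, releaseMs = 1000, initialDb = 0 } = {}) {
  // Fraction of the remaining gap closed per frame for a one-pole filter.
  const step = (tauMs) => 1 - Math.exp(-frameMs / tauMs);
  const attackStep = step(attackMs);
  const releaseStep = step(releaseMs);
  let gainDb = initialDb;
  return function next(desiredDb) {
    const k = desiredDb < gainDb ? attackStep : releaseStep;
    gainDb += k * (desiredDb - gainDb);
    return gainDb;
  };
}

// After 100 ms: a 6 dB cut has almost fully landed, a 6 dB lift has barely started.
const down = createGainSmoother();
const up = createGainSmoother();
let cut, lift;
for (let frame = 0; frame < 10; frame++) {
  cut = down(-6);
  lift = up(6);
}
console.log(cut.toFixed(2), lift.toFixed(2)); // -5.96 0.57
```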

The international standard for level control in the telephone network, ITU-T Recommendation G.169 (Automatic level control devices, 06/1999), codifies exactly this caution. It does not prescribe an algorithm, but it does set hard limits on behaviour: the device's gain "should not increase at a rate of more than 10 dB/s," and the initial gain at the start of a call should default to unity (no change) and in any case "should not exceed +4 dB." The reasoning is the same one a meeting needs — a controller that grabs gain quickly destabilises the whole connection and makes everyone sound like they are talking through a tunnel that keeps changing size.
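The two limits quoted from G.169 can be sketched as a per-frame guard. A minimal sketch, assuming 10 ms frames: gain starts at unity (0 dB), any requested initial value is capped at +4 dB, and upward changes are limited to 10 dB/s. This sketch leaves decreases unrestricted because the quoted limit concerns increases only. G.169 does not prescribe this code; the function and parameter names are ours.

```javascript
// Enforce G.169-style limits on a per-frame gain trajectory (sketch; names are illustrative).
// - start at the requested initial gain, capped at +4 dB (unity, 0 dB, by default)
// - never raise gain faster than maxRiseDbPerSec
function createRateLimitedGain({ frameMs = 10, maxRiseDbPerSec = 10, initialDb = 0 } = {}) {
  let gainDb = Math.min(initialDb, 4);
  const maxRisePerFrame = (maxRiseDbPerSec * frameMs) / 1000;
  return function next(desiredDb) {
    const rise = desiredDb - gainDb;
    gainDb = rise > maxRisePerFrame ? gainDb + maxRisePerFrame : desiredDb;
    return gainDb;
  };
}

// Asking for +20 dB immediately: the gain reaches only about +10 dB after one second.
const ramp = createRateLimitedGain();
let g = 0;
for (let frame = 0; frame < 100; frame++) g = ramp(20);
console.log(g.toFixed(1)); // 10.0
```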

Why AGC must know when you are talking

Here is the subtlety that separates a good AGC from a bad one. The loop above says "measure the current voice level." But during the gaps between words and sentences, there is no voice — only room noise, a fan, distant traffic. If the controller naively measured those gaps and saw a quiet signal, it would conclude "this talker is too quiet" and crank the gain up, swelling the noise into a loud hiss. Then you start talking, the level jumps, and it slams the gain back down. The result is the breathing artefact again, this time driven entirely by silence.

So a competent AGC includes a voice activity detector, abbreviated VAD — a small classifier that decides, frame by frame, whether the current audio is speech or not. The AGC only updates its level estimate and adapts its gain when the VAD says speech is present; during silence it holds the gain steady. WebRTC's AGC does this with a statistical detector that tracks both a short-term and a long-term estimate of the signal's energy and looks at how far the short-term level deviates from the long-term "centre of gravity" — when the deviation is high, speech is likely present. In real-time calls the detector also has to discount any residual echo of the far end leaking into the microphone, so it does not mistake the other person's voice for yours and adapt to the wrong signal. We cover the detector in its own right in Voice Activity Detection (VAD) and Discontinuous Transmission (DTX).
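The gating idea can be sketched with a much simpler detector than WebRTC's: an energy threshold over a tracked noise floor. This is a deliberately simplified stand-in, not WebRTC's statistical VAD, and it does not model the echo discounting described above; every constant in it (a 6 dB speech margin, the smoothing rates, a −30 dBFS starting estimate) is an illustrative assumption. Frame energies are assumed to arrive already in dBFS.

```javascript
// Simplified VAD-gated level estimator (illustrative; not WebRTC's detector).
// Input: one energy value per frame, in dBFS.
function createGatedLevelEstimator({
  initialLevelDb = -30, // voice-level estimate before any speech is heard
  speechMarginDb = 6,   // frame counts as speech if this far above the noise floor
  floorRise = 0.02,     // noise floor creeps up slowly...
  levelRate = 0.1       // ...while the voice level adapts moderately fast
} = {}) {
  let noiseFloorDb = null;
  let levelDb = initialLevelDb;
  return function update(frameDb) {
    if (noiseFloorDb === null || frameDb < noiseFloorDb) {
      noiseFloorDb = frameDb; // floor drops immediately to quieter frames
    } else {
      noiseFloorDb += floorRise * (frameDb - noiseFloorDb);
    }
    const isSpeech = frameDb - noiseFloorDb > speechMarginDb;
    // Only adapt the voice level during speech; hold it steady through silence.
    if (isSpeech) levelDb += levelRate * (frameDb - levelDb);
    return { isSpeech, levelDb, noiseFloorDb };
  };
}

const estimate = createGatedLevelEstimator();
let state;
for (let i = 0; i < 100; i++) state = estimate(-60);   // one second of room noise
console.log(state.isSpeech, state.levelDb);             // false -30
for (let i = 0; i < 20; i++) state = estimate(-25);    // a burst of speech
console.log(state.isSpeech, state.levelDb.toFixed(1));  // true -25.6
```

The silence never touches the level estimate, so the gain computed from it cannot drift upward between sentences. A known weakness of this toy version: if the very first frame is speech, the noise floor starts too high.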

The pitfall, stated plainly. When a customer says "the background noise gets loud whenever I stop talking, then drops when I start," that is an AGC-plus-VAD problem, not a noise problem. The gain is rising into the silence because the voice detector is missing the gaps, or because noise suppression is running too weakly ahead of the gain stage. The fix is upstream of the gain — better voice detection and stronger noise suppression — not a lower target. Test for it deliberately: have a tester sit in a noisy room and stay silent for ten seconds between sentences, and listen for the swell.

Figure 2. The AGC loop. It measures the voice level (only while the voice detector confirms speech), compares it against the target, and applies a gain change slowly. A peak limiter at the very end guarantees the output never clips.

A worked example: evening out two talkers

Numbers make the whole thing concrete. Suppose your target is −18 dBFS and two people join a call.

Talker A is close to a good microphone and arrives at −12 dBFS — six decibels louder than target. The AGC needs to cut:

gain change = target − measured = −18 − (−12) = −6 dB

So the controller settles at −6 dB of gain on Talker A, multiplying their amplitude by 0.5. It eases there over a few hundred milliseconds so the cut is inaudible.

Talker B sits back from a quiet tablet and arrives at −38 dBFS — twenty decibels below target. The AGC needs to boost:

gain change = target − measured = −18 − (−38) = +20 dB

Twenty decibels is a ten-fold amplitude boost. The controller cannot slam that in instantly: at 10 dB/s, the G.169 ceiling, it would take two full seconds to ramp. Conferencing AGCs are not bound by G.169 and set their own ramp rates, but a well-tuned one still spreads a boost this large over time rather than applying it in one step. And here is the catch the example exposes: boosting Talker B by 20 dB also lifts the hum of their room by 20 dB. If their noise floor was −60 dBFS, it is now −40 dBFS: quiet, but audible. This is why a big boost always needs strong noise suppression in front of it, and why the controller caps how far it will push: WebRTC's adaptive digital gain, in practice, will lift a very quiet talker by roughly 30 to 35 dB at the most, and reaches that ceiling over several seconds rather than instantly.
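The arithmetic of the two-talker example can be checked with a few lines. A minimal sketch; the function name is ours, and the 10 dB/s ramp is the G.169 ceiling quoted earlier, used here only as a yardstick.

```javascript
// Recompute the two-talker example: gain needed, amplitude factor, ramp time, lifted noise.
function planGain(measuredDb, { targetDb = -18, rampDbPerSec = 10, noiseFloorDb = null } = {}) {
  const gainDb = targetDb - measuredDb;
  return {
    gainDb,
    amplitudeFactor: 10 ** (gainDb / 20),
    // Time to ramp upward at the given rate; cuts are assumed to apply quickly.
    rampSeconds: gainDb > 0 ? gainDb / rampDbPerSec : 0,
    // The same gain applies to the room noise under the voice.
    noiseAfterDb: noiseFloorDb === null ? null : noiseFloorDb + gainDb
  };
}

const talkerA = planGain(-12);
const talkerB = planGain(-38, { noiseFloorDb: -60 });
console.log(talkerA.gainDb, talkerA.amplitudeFactor.toFixed(2)); // -6 0.50
console.log(talkerB.gainDb, talkerB.amplitudeFactor.toFixed(1),
  talkerB.rampSeconds, talkerB.noiseAfterDb); // 20 10.0 2 -40
```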

How WebRTC does it: AGC1, AGC2, and the move to digital

Most browser-based voice apps and many native ones run the open-source libwebrtc stack, and its gain control lives inside the Audio Processing Module (APM) — the same capture-side cleanup stage that holds echo cancellation and noise suppression, which we map end to end in The WebRTC Audio Pipeline End-to-End. There are two generations of the gain code, and the names show up in every engineer's bug report.

AGC1 is the classic module. It offers three modes. Fixed digital applies a constant gain with a limiter and no feedback loop; it is simple, and suited to embedded gear where the capture level is predictable. Adaptive analog runs the feedback loop that adjusts the operating-system microphone slider (the 0–255 level) and trims the rest digitally; this is the PC and Mac path. Adaptive digital imitates that analog loop with a virtual microphone for phones, where no slider is exposed. AGC1's two core dials are the target level and the compression gain. The target level is given in dBFS as a positive number of decibels below full scale, so a setting of 3 means −3 dBFS; the compression gain, in dB, is the most the module is allowed to boost a quiet signal.

AGC2 is the newer module, built around an adaptive digital gain controller, a voice classifier, a saturation protector that anticipates peaks before they clip, and a final limiter. The industry direction over the past several years has been to move the heavy lifting into AGC2's digital path and treat the analog slider as a coarse helper rather than the primary control. The reason is practical: the analog slider is shared with the whole operating system, moves in coarse steps on some hardware, and surprises users — which is the source of a long-running complaint we turn to next.

The order in the chain matters. Gain control runs after echo cancellation and after noise suppression in the WebRTC capture chain. The reasoning is straightforward: if AGC boosted the signal before noise suppression, it would amplify the noise and make the suppressor's job harder; and if it ran before echo cancellation, it would change the level the canceller is trying to model. So the fixed order is high-pass filter, then echo cancellation (Acoustic Echo Cancellation (AEC): How It Really Works), then noise suppression (Noise Suppression: Classical NS, RNNoise, Krisp, NVIDIA RTX Voice), and gain control last, so that it sets the level of an already-cleaned voice.

The controversy: "stop adjusting my microphone"

There is a well-known friction point worth naming, because your support inbox will see it. When the browser's adaptive-analog AGC moves the operating-system microphone slider, the user sees their own slider move. A musician, a podcaster, or anyone using a carefully set audio interface watches the browser yank their input level around mid-session, and it feels like the application is fighting them. This has been a standing complaint against Chrome's WebRTC implementation for years, with users asking for a way to stop the browser touching the hardware slider.

The lever for this is the autoGainControl constraint defined by the W3C Media Capture and Streams specification. When an application calls getUserMedia to open the microphone, it can request autoGainControl: false to ask the browser to leave gain alone. The spec is careful about what the source promises: if the device cannot do auto gain control at all it reports a single false; if it cannot be turned off it reports a single true; only if the script can actually control the feature does it report both true and false as options. In other words, whether you can disable AGC depends on the browser and the device, and a good application checks rather than assumes.

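// Ask the browser NOT to touch the microphone level (music, studio, pro-audio). Run in an ES module or async function.
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { autoGainControl: false, echoCancellation: false, noiseSuppression: false }
});
// The request is a preference, not a guarantee: read back what the browser actually applied.
const [track] = stream.getAudioTracks();
console.log(track.getSettings().autoGainControl); // false if honoured; true or undefined if not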
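The spec's reporting rules translate directly into a check. A minimal sketch of a helper (our own name, `describeAgcControl`) that interprets the `autoGainControl` entry returned by `MediaStreamTrack.getCapabilities()`. It is written as a pure function so it can run outside a browser, and it assumes the target browser implements `getCapabilities()`, which is worth verifying per browser.

```javascript
// Interpret the autoGainControl capability per the Media Capture and Streams rules:
//   [false]       -> the source cannot do AGC at all
//   [true]        -> AGC is always on and cannot be turned off
//   [true, false] -> the script can turn AGC on or off (order not significant)
function describeAgcControl(capabilityValues) {
  if (!Array.isArray(capabilityValues) || capabilityValues.length === 0) return "unknown";
  const canBeOn = capabilityValues.includes(true);
  const canBeOff = capabilityValues.includes(false);
  if (canBeOn && canBeOff) return "controllable";
  return canBeOn ? "always-on" : "unsupported";
}

// In a browser (assumption: getCapabilities() is implemented):
//   const [track] = stream.getAudioTracks();
//   const mode = describeAgcControl(track.getCapabilities().autoGainControl);
//   // show the "music mode" toggle only when mode === "controllable"
console.log(describeAgcControl([true, false])); // controllable
console.log(describeAgcControl([true]));        // always-on
```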

The practical rule: for a meeting or a call, leave AGC on — even talker levels matter more than fidelity. For music, instrument capture, or a user with a professional interface, expose a toggle that sets autoGainControl: false, because on that hardware the user's own gain staging is better than any automatic loop.
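One way to implement that toggle is to map the user's intent to a constraint set, so the UI never exposes individual processing switches. A minimal sketch; the mode names and the function are hypothetical, while `autoGainControl`, `echoCancellation`, and `noiseSuppression` are the standard constraint names used above. Turning echo cancellation and noise suppression off in music mode mirrors the earlier snippet and is a product choice, not a requirement.

```javascript
// Map a user-facing intent to getUserMedia audio constraints (mode names are hypothetical).
function audioConstraintsFor(mode) {
  switch (mode) {
    case "meeting":
      // Even talker levels matter most: request the browser's capture processing.
      return { autoGainControl: true, echoCancellation: true, noiseSuppression: true };
    case "music":
      // The user's own gain staging wins: ask the browser to leave the signal alone.
      return { autoGainControl: false, echoCancellation: false, noiseSuppression: false };
    default:
      throw new Error(`Unknown capture mode: ${mode}`);
  }
}

// In a browser: navigator.mediaDevices.getUserMedia({ audio: audioConstraintsFor("music") })
console.log(audioConstraintsFor("music").autoGainControl); // false
```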

Settings to expose, and settings to hide

This is the most actionable part of the article: which dials belong in your product's UI and which belong only in an engineer's config file.

| Setting | Expose to users? | Why |
|---|---|---|
| AGC on/off (the autoGainControl constraint) | Yes, for pro-audio / music modes | Musicians and podcasters need their own gain staging; meetings do not |
| Microphone "input level" slider (analog gain) | Yes, but as a manual override | Users expect to control their own hardware; surprise changes feel like a bug |
| Target level (dBFS) | No | A wrong target makes every talker too loud or too quiet across the whole product |
| Compression / max boost (dB) | No | Too high amplifies noise; too low leaves quiet talkers inaudible; tune it per platform |
| Attack / release times | No | The loudness-pumping artefact lives here; only an audio engineer should touch them |
| Limiter on/off | No | The clip safety net should never be user-disableable on a call |

The pattern is clear. Expose the intent ("I am in a meeting" versus "I am playing music") and hide the mechanism. Every dial marked "No" in that table is a way to make the product sound worse if set wrong, and none of them is something a non-engineer can tune by ear in the moment.

Where Fora Soft fits in

We have built real-time audio into video conferencing, telemedicine, e-learning, and live-shopping products since 2005, and uneven talker levels are a complaint every one of those products has to answer. In telemedicine especially, a clinician on a good headset and a patient on a far-away laptop speaker are exactly the −12 dBFS-versus-−38 dBFS mismatch this article describes, and getting both voices to land at one comfortable level is the difference between a calm consultation and a strained one. Our work is mostly in choosing the right gain strategy per device class, configuring the WebRTC Audio Processing Module so the analog slider does not surprise users, deciding when to expose an AGC-off toggle for music or instruction, and testing deliberately for the gain-chase and noise-swell artefacts before they reach a customer. We do not rewrite AGC2; we make it behave on the messy mix of devices your users actually own.


References

  1. ITU-T Recommendation G.169, Automatic level control devices (06/1999; reaffirmed/republished 05/2011). The network standard for automatic level control: the gain "should not increase at a rate of more than 10 dB/s," initial gain defaults to unity and "should not exceed +4 dB," and the ITU deliberately does not specify the ALC algorithm or target levels. https://www.itu.int/rec/T-REC-G.169
  2. W3C, Media Capture and Streams, the autoGainControl constraint and its capability-reporting rules (a source reports false if it cannot do AGC, true if it cannot turn AGC off, and both values only if the script can control it). W3C Candidate Recommendation; accessed 2026-06-06. https://www.w3.org/TR/mediacapture-streams/
  3. WebRTC source, modules/audio_processing/agc2/ — the AGC2 module: adaptive digital gain controller, level estimator / voice classifier, saturation protector, and final limiter. Accessed 2026-06-06. https://webrtc.googlesource.com/src/+/main/modules/audio_processing/agc2/
  4. WebRTC source, modules/audio_processing/agc/legacy/gain_control.h — AGC1's three modes (fixed digital, adaptive analog, adaptive digital) and its targetLevelDbfs / compressionGaindB / limiterEnable parameters. Accessed 2026-06-06. https://webrtc.googlesource.com/src/+/main/modules/audio_processing/agc/legacy/gain_control.h
  5. WebRTC issue tracker, AGC clean-up and refactoring (issue 7494) — the multi-year effort to consolidate the gain controllers and move the heavy lifting to AGC2's digital path. Accessed 2026-06-06. https://issues.webrtc.org/issues/42220611
  6. Chromium issue tracker, "Disable allow WebRTC to adjust the input volume" by default (issue 398550914) — the standing user complaint that the adaptive-analog AGC moves the operating-system microphone slider. Accessed 2026-06-06. https://issuetracker.google.com/issues/398550914
  7. Luo Shen / Alibaba Cloud Video Cloud (CloudImagine), Detailed explanation of the high sound quality and low latency behind WebRTC-AGC (2021). A first-party engineering walk-through of WebRTC AGC's fixed-digital, adaptive-analog, and adaptive-digital (virtual-mic) modes, the gain tables, the VAD-driven level estimate, the ~35 dB maximum boost, and the ~10 s convergence — used here for implementation detail; spec facts above defer to G.169 and the WebRTC source. https://segmentfault.com/a/1190000040072259/en
  8. MDN Web Docs, MediaTrackSettings: autoGainControl property — practical browser-support and usage reference for the AGC constraint (orientation only; the normative source is reference 2). Accessed 2026-06-06. https://developer.mozilla.org/en-US/docs/Web/API/MediaTrackSettings/autoGainControl
  9. ITU-T Recommendation G.114, One-way transmission time (05/2003). Companion delay standard; relevant because a slow, stable AGC loop must not add perceptible delay to a real-time call. https://www.itu.int/rec/T-REC-G.114