Published 2026-05-31 · 18 min read · By Nikolay Sapunov, CEO at Fora Soft

Why This Matters

If your product carries live voice (a video conferencing tool, a telemedicine consult, an e-learning classroom, a customer-support voice agent), the single fastest way to make it feel cheap is to let a participant's background noise reach everyone else. Noise suppression is the fix, but it is not free: pick the wrong model and you either add too much delay (the call starts to feel like a satellite phone) or burn too much processor (the feature drains laptop batteries and dies on a phone). This article is for the product manager, founder, or engineering lead who has to decide which noise suppressor goes into the product, and whether to host an open model or pay for a managed one. By the end you will understand how these systems separate voice from noise, why "added latency" is the metric that gates a live call, and how RNNoise, DeepFilterNet, and Krisp compare on speed, quality, cost, and licensing, and you will have a five-question test that lands you on the right choice. The goal is that you can walk into an engineering conversation and ask the right questions instead of being sold on a demo recording.

What "Noise Suppression" Actually Does

Start with the problem. When you speak into a microphone, the microphone does not capture only your voice. It captures everything: your words, plus the air conditioner, plus the keyboard, plus the neighbour's lawnmower, all mixed into one stream of sound. Noise suppression is the software that takes that mixed stream and tries to hand back only the voice.

The word "suppression" is precise. The software does not physically remove the noise from the world — it estimates which parts of the signal are speech and which are noise, then turns the noise parts down while leaving the speech parts up. Think of it like a sound engineer at a mixing desk with one fader per sound, pushing down every fader that is not your voice, many times per second, automatically.

Here is the key distinction that trips people up. Noise suppression (sometimes called noise cancellation in marketing) is a software step that cleans up a captured signal. It is not the same as the noise-cancelling in your headphones, which is a hardware trick that plays an anti-sound wave to cancel ambient noise at your ear. It is also not echo cancellation, which removes the sound of the far end leaking back through your speaker — a separate problem covered in its own article. This article is only about the software that removes background noise from the microphone signal before it is sent on.

Why does this matter so much for a live product? Because noise does two kinds of damage. The obvious damage is to humans: a call with a screaming background is unpleasant and unprofessional. The less obvious damage is to machines: if your product also transcribes the call, the speech-to-text engine's accuracy collapses on noisy audio, so cleaning the signal first directly improves caption and transcript quality (see our streaming ASR article). Clean audio is the foundation that every later audio feature stands on.

[Figure: a pipeline diagram showing a microphone capturing a mixed signal of voice plus noise icons, the mixed waveform flowing into a noise-suppression box, and a clean voice-only waveform flowing out toward the network; below, a separate greyed-out box labelled echo cancellation is marked as a different problem handled elsewhere.]

Figure 1. Noise suppression takes the mixed microphone signal and hands back voice only. Echo cancellation is a separate, neighbouring problem.

How A Modern Noise Suppressor Decides What To Keep

To choose between the three options, it helps to understand how they work, because their differences in speed and quality come straight from their designs.

Every modern suppressor follows the same three-step rhythm, repeated continuously on tiny slices of audio. First, it chops the incoming sound into short frames — small slices typically 10 to 20 milliseconds long, where a millisecond is one-thousandth of a second. Second, for each frame it asks a trained model, "which frequencies here are speech, and which are noise?" Third, it turns down the noisy frequencies and passes the frame on. Do this fast enough, frame after frame, and the listener hears continuous clean speech.
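The first step of that rhythm, framing, can be sketched in a few lines. This is a minimal illustration, not any library's actual code: it assumes mono samples in a Float32Array at 48 kHz and non-overlapping 10-millisecond frames, and the name splitIntoFrames is hypothetical. Real suppressors typically use overlapping windows.

```javascript
// Split a mono sample buffer into fixed-length frames (illustrative only).
// Assumption: frames do not overlap, and a trailing partial frame is dropped.
function splitIntoFrames(samples, sampleRateHz, frameMs) {
  const frameLength = Math.round((sampleRateHz * frameMs) / 1000);
  const frames = [];
  for (let start = 0; start + frameLength <= samples.length; start += frameLength) {
    // subarray() returns a view on the same memory, so nothing is copied.
    frames.push(samples.subarray(start, start + frameLength));
  }
  return frames;
}

// One second of silence at 48 kHz, cut into 10 ms frames.
const oneSecond = new Float32Array(48000);
const frames = splitIntoFrames(oneSecond, 48000, 10);
console.log(frames.length, frames[0].length); // 100 480
```

At 48 kHz a 10-millisecond frame is 480 samples, so the model gets 100 chances per second to decide what to keep.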

The "which frequencies are speech" question is where the machine learning lives, and where the three options diverge. A sound can be split into frequency bands — low rumble at the bottom, hiss at the top, the human voice mostly in the middle. The model looks at the energy in each band and predicts a gain for it: a number between 0 (silence this band completely) and 1 (keep this band untouched). Multiply each band by its predicted gain and the noise bands shrink while the voice bands survive.
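The multiply-by-gain step is simple enough to show directly. The sketch below assumes the model's output is already available as an array of gains, one per band; the function name applyBandGains and the four-band example values are hypothetical, chosen only to illustrate the idea.

```javascript
// Apply one predicted gain per frequency band (illustrative only).
// bandMagnitudes and gains are parallel arrays; gains are clamped to [0, 1].
function applyBandGains(bandMagnitudes, gains) {
  if (bandMagnitudes.length !== gains.length) {
    throw new Error("need exactly one gain per band");
  }
  return bandMagnitudes.map((magnitude, i) => {
    const gain = Math.min(1, Math.max(0, gains[i]));
    return magnitude * gain;
  });
}

// Hypothetical frame with four bands: rumble, voice, voice, hiss.
const cleaned = applyBandGains([0.8, 1.0, 0.6, 0.4], [0, 1, 1, 0.25]);
console.log(cleaned); // [ 0, 1, 0.6, 0.1 ]
```

The rumble band is silenced, the two voice bands pass untouched, and the hiss band is cut to a quarter of its level.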

The simplest design, used by RNNoise, predicts one gain per band on a coarse set of bands and applies it. This is fast because there are few numbers to predict. The cost is precision: when noise and voice share the exact same frequency at the same instant, a single per-band gain cannot separate them — it either keeps both or kills both. The more advanced design, used by DeepFilterNet, adds a second stage that does something cleverer than a simple turn-it-down: it applies a small filter across several recent frames to reconstruct the fine detail of the voice even where noise overlapped it. That second stage is why DeepFilterNet sounds cleaner — and why it costs more compute and more latency, because reaching across several frames means waiting for them.
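The difference between the two designs can be sketched for a single frequency bin. This is a simplified illustration in the spirit of deep filtering, not DeepFilterNet's actual code: it assumes each bin is a complex number, that the filter taps are already given (in the real model a neural network predicts them for every bin and frame), and all names are hypothetical.

```javascript
// Complex numbers are represented as { re, im } pairs.
function complexMultiply(a, b) {
  return { re: a.re * b.re - a.im * b.im, im: a.re * b.im + a.im * b.re };
}

// Band-gain approach: scale the current frame's bin by one real number.
// This can only make the bin louder or quieter.
function applyGain(currentBin, gain) {
  return { re: currentBin.re * gain, im: currentBin.im * gain };
}

// Multi-frame filter: a weighted sum of the same bin across several frames,
// with complex weights, so both level and phase can be reshaped.
function applyMultiFrameFilter(binHistory, taps) {
  if (binHistory.length !== taps.length) {
    throw new Error("need exactly one tap per frame");
  }
  return binHistory.reduce((sum, bin, i) => {
    const term = complexMultiply(taps[i], bin);
    return { re: sum.re + term.re, im: sum.im + term.im };
  }, { re: 0, im: 0 });
}

// Hypothetical two-frame history for one bin, with hypothetical taps.
const history = [{ re: 1, im: 0 }, { re: 0, im: 1 }];
const taps = [{ re: 0.5, im: 0 }, { re: 0, im: -0.5 }];
console.log(applyMultiFrameFilter(history, taps)); // { re: 1, im: 0 }
```

The filter needs every frame in its history before it can produce an output, which is where the extra waiting comes from.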

[Figure: a two-row comparison diagram. Top row labelled RNNoise shows audio split into a small number of frequency bands, each getting a single gain value between zero and one, then recombined. Bottom row labelled DeepFilterNet shows the same band-gain stage followed by a second deep-filtering stage that spans several frames to restore voice detail, with an annotation that the extra stage adds quality and latency.]

Figure 2. RNNoise predicts one gain per frequency band. DeepFilterNet adds a second stage that filters across frames to rebuild voice detail: cleaner, but heavier.

The Number That Gates A Live Call: Added Latency

There is one metric that decides whether a noise suppressor can be used in a live conversation at all, and it is easy to overlook because a demo recording never reveals it. That metric is added algorithmic latency: the extra delay the suppressor introduces between sound going in and clean sound coming out.

The delay has two sources, and both come from the design above. The first is the frame size: the suppressor cannot process a frame until the whole frame has arrived, so a 10-millisecond frame means a 10-millisecond wait before processing can even start. The second is look-ahead: a model that reaches across several future frames to do its job — like DeepFilterNet's second stage — must wait for those frames to arrive too. Add the two together and you get the suppressor's contribution to the delay.
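As a rough sketch, the two sources add up like this. The helper addedLatencyMs is hypothetical, and it assumes look-ahead is counted in whole hops (the step between consecutive frames); the example inputs are the frame and look-ahead figures quoted in this article for RNNoise and DeepFilterNet.

```javascript
// Added algorithmic latency = analysis window + look-ahead (illustrative only).
// Assumption: look-ahead is expressed as a number of hops of hopMs each.
function addedLatencyMs({ windowMs, hopMs, lookAheadFrames }) {
  return windowMs + lookAheadFrames * hopMs;
}

// A 10 ms frame with no look-ahead.
console.log(addedLatencyMs({ windowMs: 10, hopMs: 10, lookAheadFrames: 0 })); // 10

// A 20 ms window, 10 ms hop, and two frames of look-ahead.
console.log(addedLatencyMs({ windowMs: 20, hopMs: 10, lookAheadFrames: 2 })); // 40
```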

Why does a few milliseconds matter? Because noise suppression is not the only thing in the path. In a live call, audio travels from the speaker's microphone, gets captured and encoded, crosses the network, gets decoded, and reaches the listener's ear. The international standard for voice quality, ITU-T Recommendation G.114, sets the well-known guideline: keep the total one-way delay — mouth to ear — at or below 150 milliseconds for the conversation to feel natural, with 150 to 400 milliseconds still usable but degraded (ITU-T, 2003). Every millisecond the suppressor adds lands on top of everything else in that 150-millisecond budget. A suppressor that adds 40 milliseconds has quietly spent more than a quarter of the entire budget before the audio has even left the device.

Let us make the budget concrete with arithmetic. Suppose your network leg adds a typical 80 milliseconds one-way, and capture plus encode plus decode adds another 30 milliseconds. That is:

80 ms (network) + 30 ms (capture/encode/decode) = 110 ms baseline

You now have 40 milliseconds of headroom before you cross the 150-millisecond line:

150 ms (G.114 target) − 110 ms (baseline) = 40 ms remaining

Drop in RNNoise at about 10 milliseconds and you stay comfortably inside the budget with room to spare. Drop in DeepFilterNet at about 40 milliseconds and you have spent the entire remaining headroom on noise suppression alone — fine if the rest of your pipeline is lean, dangerous if it is not. This is the calculation every team should run before committing to a model, and almost none do.
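The same calculation is easy to script so it can be rerun whenever the pipeline changes. This is a minimal sketch using the example numbers from this section; the function name remainingHeadroomMs is hypothetical, and the per-suppressor latencies are the approximate figures quoted in this article, not measurements.

```javascript
// One-way mouth-to-ear target from ITU-T G.114.
const G114_TARGET_MS = 150;

// Headroom left after network, capture/encode/decode, and the suppressor.
function remainingHeadroomMs(networkMs, codecPathMs, suppressorMs) {
  return G114_TARGET_MS - (networkMs + codecPathMs + suppressorMs);
}

// Approximate added latencies quoted in this article.
const candidates = { RNNoise: 10, Krisp: 25, DeepFilterNet: 40 };

for (const [name, addedMs] of Object.entries(candidates)) {
  const left = remainingHeadroomMs(80, 30, addedMs);
  console.log(`${name}: ${left} ms left`);
}
// RNNoise: 30 ms left
// Krisp: 15 ms left
// DeepFilterNet: 0 ms left
```

Swap in your own measured network and codec figures; a negative result means the pipeline has already crossed the 150-millisecond line.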

[Figure: a horizontal stacked-bar latency budget. A baseline bar shows network plus capture and encode and decode summing to 110 milliseconds, then three scenarios stack a noise-suppression segment on top: RNNoise adding 10 milliseconds staying well under the line, Krisp adding about 25 milliseconds, and DeepFilterNet adding about 40 milliseconds reaching the 150-millisecond G.114 threshold marked by a dashed vertical line.]

Figure 3. Added latency stacks on top of the rest of the call. RNNoise leaves headroom; DeepFilterNet spends all of it. The dashed line is the 150 ms G.114 target.

A companion number is the real-time factor, abbreviated RTF: the time the model spends processing a stretch of audio divided by the length of that audio, so an RTF of 0.5 means one second of audio takes half a second to process. An RTF below 1 means the model processes audio faster than it arrives, which is the minimum requirement for real-time use. RNNoise runs far below 1 on a single CPU core. The DeepFilterNet papers report an RTF of 0.42 on a Raspberry Pi 4 for DeepFilterNet2 and 0.19 on a single thread of a modest laptop processor for DeepFilterNet3. Both are comfortably real-time, but the model is doing more work, so it needs a more capable chip than RNNoise to stay there (Schröter et al., 2022; Schröter et al., 2023).
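Computing an RTF from your own timing is a one-line division. The sketch below assumes you have already measured how long the model took on a clip of known length; the function names and the 190-millisecond measurement are hypothetical.

```javascript
// Real-time factor: processing time divided by the duration of the audio.
function realTimeFactor(processingMs, audioMs) {
  if (audioMs <= 0) throw new Error("audio duration must be positive");
  return processingMs / audioMs;
}

// Below 1 means the model keeps up with audio as it arrives.
function isRealTime(rtf) {
  return rtf < 1;
}

// Hypothetical measurement: 190 ms of compute for 1000 ms of audio.
const rtf = realTimeFactor(190, 1000);
console.log(rtf, isRealTime(rtf)); // 0.19 true
```

Measure on the slowest device you intend to support, because an RTF that is comfortable on a developer laptop can approach 1 on a low-end phone.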

Meet The Three Options

RNNoise — the tiny, free open-source workhorse

RNNoise is an open-source noise suppression library created by audio engineer Jean-Marc Valin and hosted by the Xiph.Org Foundation, the group behind the Opus audio codec. Its design is described in a 2018 paper with a name that captures the whole idea: "A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement" (Valin, 2018). "Hybrid" is the key word: RNNoise marries classic signal processing with a small neural network, which is what makes it so light.

The numbers tell the story of its smallness. RNNoise processes audio in 10-millisecond frames at the full 48-kilohertz sample rate, predicts gains on 22 frequency bands using a type of compact memory network called a Gated Recurrent Unit, and runs comfortably on a single CPU core (Valin, 2018). The model file is a few hundred kilobytes. Because it is so small, it compiles to WebAssembly — a format that runs at near-native speed inside a web browser — so RNNoise is a popular way to add noise suppression to a browser-based call without any server at all (Gcore, 2026).

The licence is as friendly as the footprint. RNNoise is released under the BSD licence, a permissive open-source licence that allows both free and commercial use with almost no restrictions (Xiph.Org, 2026). You can ship it in a commercial product without paying anyone. The catch is quality: RNNoise is excellent for steady, simple noise like a fan or a hum, but on hard cases — overlapping speech in the background, sudden clatter, heavy reverberation — its single-gain-per-band design shows its age compared with newer models. It is the right tool when compute is scarce and the noise is ordinary.

DeepFilterNet — the open-source quality leader

DeepFilterNet is a newer open-source suppressor from researchers at the University of Erlangen, built specifically to beat the quality ceiling that simple band-gain models like RNNoise hit, while still running in real time. It has gone through three published versions: the original in 2021, DeepFilterNet2 in 2022 focused on running on small embedded devices, and DeepFilterNet3 in 2023 (Schröter et al., 2021; 2022; 2023).

Its quality comes from the two-stage design described earlier. The first stage cleans up the broad shape of the sound using gains on perceptually spaced bands — bands sized to match how human hearing works, finer where the ear is sensitive. The second stage, called deep filtering, applies a short multi-frame filter to the lowest frequencies (below about 5 kilohertz, where most voice energy lives) to reconstruct detail that a simple gain would have smeared (Schröter et al., 2023). The result measurably outperforms band-gain-only approaches.

That power has a price in compute and delay. DeepFilterNet uses 20-millisecond windows with a 10-millisecond hop and a two-frame look-ahead, which works out to roughly 40 milliseconds of added algorithmic latency — about four times RNNoise (Schröter et al., 2023). It has around 2 million internal parameters versus RNNoise's tens of thousands, and it is written in Rust for speed. Its licence is also permissive — dual-licensed MIT or Apache 2.0, both allowing free commercial use (Rikorose, 2026). DeepFilterNet is the right tool when you want the best open quality and your hardware and latency budget can absorb the extra cost.

Krisp — the managed commercial SDK

Krisp is a commercial Voice AI company whose noise-cancellation software development kit — an SDK, a pre-built code library a developer drops into an app — powers noise suppression inside Discord, RingCentral, and, by the company's own count, more than a hundred other applications (Krisp, 2026). Where RNNoise and DeepFilterNet are models you integrate and operate yourself, Krisp is a product: you license it, embed the SDK, and the vendor handles the model, the updates, and the platform coverage.

The technical pitch rests on three points. First, it runs entirely on the device — all audio processing happens locally and, by Krisp's statement, voice data is never uploaded to its servers, which matters for privacy-sensitive deployments like telemedicine (Krisp, 2026). Second, it is broad: the SDK ships for Windows, macOS, Linux, Android, iOS, and the major browsers via a WebAssembly build, so one vendor relationship covers every surface (Krisp, 2026). Third, the latency is competitive: Krisp processes in 10-millisecond frames with a typical added algorithmic latency around 25 milliseconds, and a lightweight voice-isolation variant reaching about 15 milliseconds (Krisp, 2026).

The trade is cost and control. Krisp is paid — its consumer app offers a free tier of 60 minutes of noise cancellation per day, a Pro tier around $8 per month, and business and enterprise tiers above that, while the developer SDK is licensed commercially through a sales process rather than a public price (Krisp, 2026). You do not control or inspect the model, and you depend on the vendor's roadmap. For many teams that is exactly the right trade: you get production-grade quality across every platform without building or maintaining an audio-ML pipeline.

[Figure: a two-column-plus comparison card titled host it versus buy it. The host-it side groups RNNoise and DeepFilterNet, listing free permissive licence, you run and update it, audio stays local, you own the quality ceiling. The buy-it side lists Krisp, with commercial licence, vendor runs and updates it, audio stays local on device, production quality across all platforms, with the trade-off cells lightly tinted.]

Figure 4. The fork: self-host an open model (RNNoise or DeepFilterNet) and own the operations, or license Krisp and buy the operations away.

The 2026 Comparison, Side By Side

The table below collects the numbers that decide most choices. Latency figures are added algorithmic latency from each project's own documentation or paper; real-time factors are from the DeepFilterNet papers' published measurements; licensing is read from each project's repository in May 2026. There is no single winner — the right column depends on what your product cares about most.

| Suppressor | Type | Added latency | Compute target | Quality | Licence | Best for |
| --- | --- | --- | --- | --- | --- | --- |
| RNNoise | Open-source, hybrid DSP + RNN | ~10 ms | One CPU core / browser WASM | Good on steady noise | BSD (free commercial) | Tight compute, browser, ordinary noise |
| DeepFilterNet 3 | Open-source, two-stage deep filtering | ~40 ms | Modern CPU; RTF 0.19 laptop, 0.42 Pi 4 | Best open quality | MIT / Apache 2.0 | Highest open quality within budget |
| Krisp SDK | Commercial, on-device | ~25 ms (15 ms lite) | CPU, all major platforms | Production-grade, broad | Commercial licence | Cross-platform, zero ops, privacy |

Read it as trade-offs, not a leaderboard. RNNoise wins footprint and the browser. DeepFilterNet wins raw open quality. Krisp wins platform breadth and hands you zero operations. A fourth, free baseline is worth knowing too: every modern browser already exposes a built-in noise suppressor (more on that below), so the real question is often "is the free built-in good enough, and if not, which of these three do I reach for?"

How To Add Noise Suppression To A Browser Call — The Free Baseline First

Before integrating any of the three, check whether you even need to. The web's own standard already includes noise suppression, and turning it on costs one line.

The W3C Media Capture and Streams specification — the standard that defines how a web page gets access to a microphone — includes a constraint called noiseSuppression. When you request microphone access, you pass it as an option, and the browser applies its built-in suppressor (W3C, 2026). Here is the whole thing:

// Request the microphone with the browser's built-in noise suppression on.
navigator.mediaDevices.getUserMedia({
  audio: { noiseSuppression: true, echoCancellation: true }
});

Firefox enables this by default; Chrome applies its own built-in suppressor, the noise-suppression module that ships inside the WebRTC engine (W3C, 2026). For many products this free baseline is genuinely enough, and the correct engineering decision is to ship it and move on.
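Because a constraint is a request rather than a guarantee, it is worth reading back what the browser actually applied. The sketch below uses the standard getAudioTracks() and getSettings() calls; the wrapper name openMicAndCheckSuppression is hypothetical, and the media-devices object is passed in as a parameter (normally navigator.mediaDevices) so the function can also be exercised with a stub outside a browser.

```javascript
// Request the microphone, then read back what the browser actually applied.
// `mediaDevices` is normally navigator.mediaDevices.
async function openMicAndCheckSuppression(mediaDevices) {
  const stream = await mediaDevices.getUserMedia({
    audio: { noiseSuppression: true, echoCancellation: true },
  });
  const [track] = stream.getAudioTracks();
  // getSettings() reports the values in effect, not the values requested.
  const settings = track.getSettings();
  return {
    stream,
    noiseSuppressionOn: settings.noiseSuppression === true,
  };
}

// In a page (requires a secure context and microphone permission):
// openMicAndCheckSuppression(navigator.mediaDevices)
//   .then(({ noiseSuppressionOn }) => console.log(noiseSuppressionOn));
```

If the reported value is false, the browser or device could not honour the request, and that is one of the cases where a custom suppressor earns its place.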

You reach past the baseline when the built-in suppressor is not good enough: when users complain about specific noise it misses, or when you need consistent behaviour across every browser rather than each browser's own version. That is when you swap in RNNoise compiled to WebAssembly, or license Krisp's browser SDK, feeding the microphone audio through your chosen model using the browser's audio-processing plumbing before it reaches the call. The deep mechanics of wiring a custom suppressor into a live WebRTC call (where exactly it sits in the pipeline, how to avoid double-processing) are their own topic, covered in our real-time noise suppression in WebRTC deep dive.

How To Tell If A Suppressor Is Actually Good — The Metrics

Vendors love to show you a before-and-after recording, which proves nothing because they chose the recording. To compare suppressors honestly you need objective metrics, and a basic vocabulary of them lets you read any benchmark critically.

The gold standard is still human listening, formalised by an international standard. ITU-T Recommendation P.835 defines a subjective test built specifically for noise suppression: listeners rate each clip three times — once for the speech quality alone (called SIG), once for how intrusive the leftover background noise is (BAK), and once for overall quality (OVRL), each on a 1-to-5 scale (ITU-T, 2003). Rating the three separately is the clever part, because a suppressor can win on killing noise while losing on distorting the voice, and a single score would hide that. A related standard, ITU-T P.808, defines how to run these listening tests at scale over the internet using crowdsourced listeners (ITU-T, 2021).
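Turning a panel's ratings into the three headline numbers is plain averaging. The sketch below assumes each listener's ratings for one clip arrive as an object with sig, bak, and ovrl fields on the 1-to-5 scale; the function name and the four sample ratings are hypothetical, and a real P.835 study adds listener screening and confidence intervals that this omits.

```javascript
// Average per-listener ratings into three mean scores (illustrative only).
function meanOpinionScores(ratings) {
  if (ratings.length === 0) throw new Error("need at least one rating");
  const mean = (key) =>
    ratings.reduce((sum, rating) => sum + rating[key], 0) / ratings.length;
  return { SIG: mean("sig"), BAK: mean("bak"), OVRL: mean("ovrl") };
}

// Hypothetical ratings from four listeners for one processed clip.
const scores = meanOpinionScores([
  { sig: 4, bak: 5, ovrl: 4 },
  { sig: 3, bak: 5, ovrl: 4 },
  { sig: 4, bak: 4, ovrl: 4 },
  { sig: 3, bak: 4, ovrl: 3 },
]);
console.log(scores); // { SIG: 3.5, BAK: 4.5, OVRL: 3.75 }
```

In this made-up example the noise removal scores well while the speech itself scores lower, exactly the kind of split a single combined number would hide.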

Because human tests are slow and expensive, the field also uses automatic metrics that a computer can compute instantly. The most relevant is DNSMOS, a model from Microsoft that listens to a clip and predicts the P.835 SIG, BAK, and OVRL scores a human panel would give — letting you score thousands of clips without a listening panel (Reddy et al., 2021). Older objective metrics you will see quoted include PESQ (defined in the now-withdrawn ITU-T P.862) and its successor POLQA (ITU-T P.863), plus STOI for intelligibility and SI-SDR for raw signal cleanliness (ITU-T; Taal et al., 2011; Le Roux et al., 2019). For most product decisions, DNSMOS P.835 on your own noise recordings is the practical workhorse; reserve a full P.835 or P.808 human test for the final go/no-go before a major launch.

Much of this benchmarking culture grew out of Microsoft's Deep Noise Suppression (DNS) Challenge, a research competition run across the major signal-processing conferences from 2020 to 2023 that standardised datasets and the DNSMOS metric, and shipped baseline models called NSNet and NSNet2 that are still useful reference points (Reddy et al., 2022). When you read a noise-suppression paper's results, they are almost always measured on DNS Challenge data with DNSMOS — knowing that lets you compare numbers across papers.

A Common Mistake — Cranking Suppression To Maximum

The single most frequent error teams make is to treat noise suppression as a slider that should be set to maximum. More suppression sounds like strictly better noise removal, so why not remove all of it?

Because aggressive suppression damages the voice. When a model is pushed to silence every band that might contain noise, it inevitably silences quiet parts of speech too — the soft consonants, the trailing ends of words, the breaths that make speech sound human. The result is the underwater, robotic, "cutting out" artefact you have heard on over-processed calls, where the speaker sounds like they are talking through a broken connection. This is exactly why ITU-T P.835 scores speech quality (SIG) separately from noise removal (BAK): a suppressor tuned for a perfect BAK score often posts a terrible SIG score, and the overall experience is worse than doing less.

The fix is to tune for overall quality, not maximum noise removal, and to test on your real noise — not the vendor's demo clip. Record samples in the actual conditions your users face (a home office with a dog, a hospital ward, a noisy classroom), run each candidate suppressor at a few strength settings, and score them with DNSMOS P.835. The best setting is almost never the most aggressive one. A suppressor that removes 90 percent of the noise while keeping the voice natural beats one that removes 99 percent and makes the speaker sound like a robot.
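Once each setting has been scored, picking the winner is a matter of ranking by the overall score rather than the noise score. This is a minimal sketch: the function name pickBestSetting is hypothetical, and the three rows are invented numbers standing in for DNSMOS P.835 output on your own recordings.

```javascript
// Pick the setting with the best overall (OVRL) score,
// not the one with the best noise removal (BAK).
function pickBestSetting(results) {
  if (results.length === 0) throw new Error("need at least one result");
  return results.reduce((best, r) => (r.ovrl > best.ovrl ? r : best));
}

// Hypothetical scores for three strength settings of one suppressor.
const results = [
  { strength: "low", sig: 4.1, bak: 3.2, ovrl: 3.4 },
  { strength: "medium", sig: 3.9, bak: 4.0, ovrl: 3.7 },
  { strength: "max", sig: 2.8, bak: 4.6, ovrl: 3.1 },
];
console.log(pickBestSetting(results).strength); // medium
```

In this invented data the maximum setting has the best BAK and the worst OVRL, which is the pattern to watch for in your own results.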

Where Fora Soft Fits In

We build the products that depend on clean live audio — video conferencing platforms, telemedicine systems where a doctor must hear every word, e-learning classrooms, and surveillance and broadcast tools. In that work the suppressor choice is rarely about the single highest quality score; it is about matching the model to the product's real constraints. A privacy-sensitive telemedicine deployment often points toward an on-device model — open-source self-hosted or Krisp's local SDK — so patient audio never leaves the device. A browser-first conferencing feature on a tight compute budget often starts with the built-in noiseSuppression baseline and reaches for RNNoise-in-WebAssembly only where users hit its limits. The pattern we apply is to fix the latency budget first, then the privacy and platform requirements, and only then weigh quality against cost — because a suppressor that blows the latency budget is not high quality, it is unusable.

How To Choose — Five Questions

Work through these in order; each one narrows the field.

  1. Is the built-in browser suppressor already enough? Turn on noiseSuppression: true, test with real users, and if complaints stop, ship it and stop here. Do not integrate a model you do not need.
  2. What is your latency budget? Run the G.114 arithmetic for your pipeline. If you have under ~20 milliseconds of headroom, RNNoise is the only safe self-host option; DeepFilterNet's ~40 milliseconds will not fit.
  3. What hardware must it run on? A single CPU core or a phone or the browser points to RNNoise. A capable modern CPU that can absorb the load points to DeepFilterNet for its quality. Need it everywhere with no per-platform work? Krisp.
  4. Can you afford the operations, or do you want to buy them away? Self-hosting RNNoise or DeepFilterNet is free in licence but costs engineering time to integrate, tune, and maintain. Krisp costs money but removes that burden across every platform.
  5. Does privacy or regulation forbid sending audio off-device? All three run on-device, so all three pass — but verify the specific deployment, and prefer the open models or Krisp's local SDK over any cloud-API suppressor.
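The five questions can be condensed into a small decision function. Treat this as a deliberately simplified sketch, not a rule: the function name, the field names, and the ordering of the checks are assumptions, and the millisecond thresholds are the approximate added-latency figures quoted in this article. The privacy question does not appear as a branch because all three options run on-device.

```javascript
// A simplified walk through the five questions (illustrative only).
function chooseSuppressor(answers) {
  const { builtInIsEnough, headroomMs, capableCpu, buyOperations } = answers;

  // Question 1: do not integrate a model you do not need.
  if (builtInIsEnough) return "browser built-in";

  // Question 4: buying operations away, if ~25 ms fits the budget.
  if (buyOperations && headroomMs >= 25) return "Krisp SDK";

  // Questions 2 and 3: best open quality needs ~40 ms and a capable CPU.
  if (capableCpu && headroomMs >= 40) return "DeepFilterNet";

  // The lightest option still needs ~10 ms.
  if (headroomMs >= 10) return "RNNoise";

  return "no fit: reduce pipeline latency first";
}

console.log(chooseSuppressor({
  builtInIsEnough: false,
  headroomMs: 15,
  capableCpu: true,
  buyOperations: false,
})); // RNNoise
```

A real decision weighs more than four booleans, but writing the logic down makes the team state its latency headroom as a number before arguing about quality.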

References

  1. Valin, J.-M. A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement, Proc. IEEE Multimedia Signal Processing (MMSP) Workshop, 2018, arXiv:1709.08243, accessed 2026-05-31. https://arxiv.org/abs/1709.08243. Primary source (the original RNNoise paper, by RNNoise's author) for the hybrid DSP + deep-learning design: 10 ms frames at 48 kHz, 22 Bark-scale frequency bands, a Gated Recurrent Unit network chosen over LSTM for lower CPU and memory, per-band gain estimation, and real-time operation on a single CPU core.
  2. Schröter, H., Escalante-B., A. N., Rosenkranz, T., Maier, A. DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering, arXiv:2110.05588, 2021, accessed 2026-05-31. https://arxiv.org/abs/2110.05588. Primary source for the original DeepFilterNet two-stage design: ERB-scaled band gains in stage one, deep filtering in stage two to reconstruct periodic voice detail.
  3. Schröter, H., et al. DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio, arXiv:2205.05474, 2022, accessed 2026-05-31. https://arxiv.org/abs/2205.05474. Primary source for the embedded-optimised version and its measured real-time factor of 0.42 on a Raspberry Pi 4, plus grouped linear layers and depthwise-separable convolutions that halve the footprint.
  4. Schröter, H., et al. DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement, Interspeech 2023, arXiv:2305.08227, accessed 2026-05-31. https://arxiv.org/abs/2305.08227. Primary source for DeepFilterNet3: ~0.19 real-time factor on a single laptop CPU thread, ~40 ms added algorithmic latency (20 ms window, 10 ms hop, two-frame look-ahead), and deep filtering applied to the lowest frequency bins below ~5 kHz.
  5. Krisp. Real-Time AI Voice SDK and Noise Cancellation (developer documentation), accessed 2026-05-31. https://krisp.ai/developers/ and https://sdk-docs.krisp.ai/docs/noisecancellation. First-party source for Krisp's on-device processing (no audio uploaded), platform coverage (Windows, macOS, Linux, Android, iOS, browsers via WebAssembly), 10 ms internal frames, ~25 ms typical added algorithmic latency (≈15 ms for the lite voice-isolation model), inbound/outbound directions, and deployment scale across Discord, RingCentral, and 100+ apps.
  6. ITU-T. Recommendation P.835 — Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm, approved 13 November 2003, accessed 2026-05-31. https://www.itu.int/rec/T-REC-P.835. Primary standards source for the three-axis subjective test (SIG speech-signal, BAK background-noise, OVRL overall, each 1–5) that is the human gold standard for noise-suppression evaluation. This standard defines how suppression quality is measured; vendor "before/after" demos are not a substitute.
  7. ITU-T. Recommendation P.808 — Subjective evaluation of speech quality with a crowdsourcing approach, 2021, accessed 2026-05-31. https://www.itu.int/rec/T-REC-P.808. Primary standards source for running P.835-style listening tests at scale over the internet (ACR/DCR/CCR rating methods), complementary to the lab-based P.800/P.835.
  8. ITU-T. Recommendation G.114 — One-way transmission time, accessed 2026-05-31. https://www.itu.int/rec/T-REC-G.114. Primary standards source for the mouth-to-ear delay budget: ≤150 ms one-way for high-quality interactive voice, 150–400 ms acceptable but degraded — the budget every suppressor's added latency must fit inside.
  9. W3C. Media Capture and Streams (Recommendation), accessed 2026-05-31. https://www.w3.org/TR/mediacapture-streams/. Primary standards source for the browser noiseSuppression constraint on getUserMedia audio: how it is requested, how capability is reported (a source that cannot disable NS reports true; a scriptable one reports [true, false]), and that constraints apply to tracks from MediaDevices.getUserMedia().
  10. Reddy, C. K. A., Gopal, V., Cutler, R. DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors, ICASSP 2022, arXiv:2110.01763, accessed 2026-05-31. https://arxiv.org/abs/2110.01763. Primary academic source for the automatic metric that predicts P.835 SIG/BAK/OVRL scores without a human panel (reported correlation 0.94 SIG, 0.98 BAK/OVRL), the practical workhorse for scoring many clips.
  11. Reddy, C. K. A., et al. ICASSP 2022 Deep Noise Suppression Challenge, arXiv:2202.13288, 2022, accessed 2026-05-31. https://arxiv.org/abs/2202.13288. Source for the Microsoft DNS Challenge series (2020–2023), its P.835-based subjective framework, the NSNet/NSNet2 baseline models, fullband 48 kHz datasets, and the DNSMOS P.835 and word-accuracy objective metrics provided to participants.
  12. Xiph.Org. rnnoise (repository and licence), accessed 2026-05-31. https://github.com/xiph/rnnoise. First-party source for RNNoise's BSD licence (free commercial use) and reference implementation.
  13. Rikorose. DeepFilterNet (repository and licence), accessed 2026-05-31. https://github.com/Rikorose/DeepFilterNet. First-party source for DeepFilterNet's dual MIT / Apache 2.0 licensing (free commercial use) and the Rust reference implementation.
  14. Gcore. Noise reduction in WebRTC, accessed 2026-05-31. https://gcore.com/blog/noise-reduction-webrtc. Vendor-deployer source for RNNoise compiled to WebAssembly for in-browser WebRTC noise suppression and its low CPU footprint on phones, tablets, and laptops.