Published: 2026-06-04 · Reading time: 16 min read · Author: Nikolay Sapunov, CEO at Fora Soft
Why this matters
If you build or run a video product — a conferencing app, a streaming service, an e-learning platform, a telemedicine tool — the sample rate is one of the first numbers your engineers will set, and the wrong choice quietly costs you bandwidth, CPU, and sometimes lip-sync. You do not need to be an engineer to make the right call, but you do need to understand what the number controls and why "higher" is not automatically "better". This article is written for a smart reader with no audio background; a senior engineer should still find every fact correct, cited to the standard that defines it. By the end you will be able to choose a sample rate with confidence and explain the choice to anyone on your team.
A one-paragraph refresher: what a sample rate is
Before this number means anything, recall how sound becomes data, covered in full in What is digital audio: from sound wave to bits. A microphone turns the air's pressure changes into a smoothly wiggling electrical voltage. A chip called an analog-to-digital converter then measures the height of that voltage at regular, closely spaced moments, and writes each measurement down as a number called a sample. The number of measurements taken each second is the sample rate. Think of it as the tick of a very fast clock: at a sample rate of 48,000 hertz — written 48 kHz, where "k" means thousand and "hertz" means times per second — the system asks the microphone "how loud are you right now?" forty-eight thousand times every second. Everything in this article is about choosing how fast that clock should tick.
The one rule that decides the floor: Nyquist
There is exactly one mathematical rule that sets a hard minimum for the sample rate, and every sensible rate is built on top of it. It is the Nyquist–Shannon sampling theorem, and in plain words it says: to capture a sound faithfully, you must sample more than twice as fast as the highest frequency in that sound. Frequency is how fast the wave wiggles (a deep rumble wiggles slowly, a high whistle wiggles fast), and it is also measured in hertz.
Walk the arithmetic out loud, because it is the whole foundation. Human hearing reaches up to about 20,000 Hz, or 20 kHz, at its best in youth. Twice that is:
2 × 20,000 Hz = 40,000 Hz = 40 kHz
So any sample rate below 40 kHz throws away part of what a person can hear. That single number, 40 kHz, is why every full-quality rate you will ever meet — 44.1, 48, 96, 192 — sits at or above it, never below. The rates differ in how far above the floor they sit and why.
What happens if you break the rule and sample too slowly? The fast wiggles you failed to measure do not simply vanish. They reappear during playback as a wrong, lower pitch that was never in the room — a false tone the system invents from the gaps in its own measurements. This artifact is called aliasing, and it is the signature defect of an under-sampled recording. To prevent it, every recording chain runs the sound through an anti-aliasing filter first — a circuit that removes frequencies too high to be sampled correctly before the measuring even begins.
Figure 1. Human hearing tops out near 20 kHz, so Nyquist requires at least 40 kHz of sampling. Below that floor, an unmeasured high tone reappears as a false low-pitched alias.
That filter is also the reason the common rates sit a little above 40 kHz rather than exactly on it. No real filter can cut off perfectly at 20 kHz and leave everything below untouched; every filter needs a transition band, a small frequency range over which it ramps down from "let through" to "block". A 44.1 kHz rate can capture up to 22.05 kHz (half the rate, called the Nyquist frequency), leaving roughly 2 kHz of room between the 20 kHz audible ceiling and the 22.05 kHz wall for the filter to do its work. A 48 kHz rate captures up to 24 kHz, giving the filter even more room and letting it ramp down more gently without harming the audible range. That extra breathing room is one of two real reasons video chose the higher number — the other is about staying in sync with picture, which is the heart of this article.
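The Nyquist arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a production tool: the function names are hypothetical, the 20 kHz ceiling is the figure used in the text, and `alias_frequency` assumes an ideal sampler with no anti-aliasing filter in front of it, which is exactly the situation the filter exists to prevent.

```python
AUDIBLE_CEILING_HZ = 20_000  # upper limit of human hearing used in the text


def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency a sample rate can represent: half the rate."""
    return sample_rate_hz / 2


def transition_band_hz(sample_rate_hz: float) -> float:
    """Room between the audible ceiling and the Nyquist frequency."""
    return nyquist_frequency(sample_rate_hz) - AUDIBLE_CEILING_HZ


def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which an unfiltered pure tone appears after sampling."""
    # Position of the tone inside one sample-rate-wide band.
    folded = tone_hz % sample_rate_hz
    # The upper half of that band mirrors down onto the lower half.
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    for rate in (44_100, 48_000, 96_000):
        print(rate, nyquist_frequency(rate), transition_band_hz(rate))
    # A 25 kHz tone sampled at only 32 kHz comes back as a 7 kHz tone.
    print(alias_frequency(25_000, 32_000))
```

Tones at or below the Nyquist frequency come back unchanged, so `alias_frequency(10_000, 32_000)` is simply 10,000; only content above half the rate folds down.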
Where 44.1 kHz came from: a video-tape accident
The most surprising fact about the rate behind almost all commercially released music is that it was never chosen for how it sounds. It was chosen because of how digital audio was stored in the late 1970s, before hard drives were big or cheap enough for the job.
At that time, the only affordable way to record the huge stream of numbers that digital audio produces was to disguise those numbers as a video signal and record them onto a video cassette using a PCM adaptor — a box that sat between the audio gear and a video tape recorder. Sony's PCM-1600, introduced in 1979, did exactly this, and its descendants became the professional digital recorders of the era. When the compact disc was designed shortly after, it inherited the rate those machines already used, so that recordings could move to CD without resampling.
Why 44,100 specifically? Because that number fell out of the structure of a video frame. A video signal is built from horizontal lines, drawn in bursts called fields, a fixed number of times per second. The PCM adaptor packed audio samples into the usable lines of each field. The two television systems of the day produced two nearly identical answers:
NTSC (US/Japan): 245 usable lines × 59.94 fields/s × 3 samples/line ≈ 44,056 samples/s
PAL (Europe): 294 usable lines × 50 fields/s × 3 samples/line = 44,100 samples/s
The PAL figure, 44,100, won as the round number, and CD enshrined it in the Red Book standard in 1980. There is no acoustic magic in it. It is the fingerprint of a 1970s storage hack, and every song you have ever streamed still carries that fingerprint because the entire music catalogue was built on it.
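The line-and-field arithmetic is easy to reproduce. In the sketch below, the function name is hypothetical, and one assumption goes beyond the numbers quoted above: the ≈44,056 figure corresponds to the color NTSC field rate of 60/1.001 (about 59.94) fields per second, while the nominal 60 fields per second gives exactly 44,100.

```python
def pcm_adaptor_rate(usable_lines_per_field: int,
                     fields_per_second: float,
                     samples_per_line: int = 3) -> float:
    """Audio samples per second stored by a video-based PCM adaptor."""
    return usable_lines_per_field * fields_per_second * samples_per_line


# Assumption: color NTSC runs slightly slower than the nominal 60 fields/s.
NTSC_COLOR_FIELDS_PER_SECOND = 60 / 1.001

if __name__ == "__main__":
    print("PAL:", pcm_adaptor_rate(294, 50))
    print("NTSC, nominal 60 fields/s:", pcm_adaptor_rate(245, 60))
    print("NTSC, color field rate:",
          round(pcm_adaptor_rate(245, NTSC_COLOR_FIELDS_PER_SECOND)))
```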
Where 48 kHz came from: built to match picture
While CD audio inherited 44.1 kHz from old video tape, the broader industry knew it needed one agreed rate for professional and broadcast work, and it started choosing deliberately in 1981. The candidates clustered between 45 and 60 kHz. Sixty kHz would have been ideal for film and video synchronization but was judged wastefully high for audio that never needed to reach 30 kHz. After narrowing the field to rates that synchronized cleanly with film and television — 45, 48, 50, 52.5, and 54 kHz — the industry landed on 48 kHz.
The deciding factor was synchronization with picture, and it comes down to simple division. Film and much of television run at 24 frames per second. For audio to align neatly with each video frame — so that you can cut, edit, and package sound and picture together without fractional leftovers — the number of audio samples per second should divide evenly by the number of frames per second. Check both candidates:
48,000 samples/s ÷ 24 frames/s = 2,000 samples per frame (whole number — clean)
44,100 samples/s ÷ 24 frames/s = 1,837.5 samples per frame (fractional — messy)
Twenty-four divides cleanly into forty-eight thousand; it does not divide cleanly into forty-four thousand one hundred. That leftover half-sample, repeated frame after frame, means frame boundaries cannot all fall on whole samples, and it is exactly the kind of mismatch that lets audio and video slide apart over a long program (the lip-sync problems covered in Lip-sync: ITU-R BT.1359 tolerance windows). Europe had a second reason to like 48 kHz: it was already broadcasting at 32 kHz, and 48 relates to 32 by a clean 3:2 ratio, making conversion between the two painless.
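The divisibility check above takes one line of arithmetic per candidate. The sketch below uses exact fractions so the half-sample remainder is visible rather than rounded away; the function names are hypothetical.

```python
from fractions import Fraction


def samples_per_frame(sample_rate_hz: int, frames_per_second: int) -> Fraction:
    """Exact number of audio samples that span one picture frame."""
    return Fraction(sample_rate_hz, frames_per_second)


def divides_cleanly(sample_rate_hz: int, frames_per_second: int) -> bool:
    """True when every frame holds a whole number of samples."""
    return sample_rate_hz % frames_per_second == 0


if __name__ == "__main__":
    for rate in (48_000, 44_100):
        per_frame = samples_per_frame(rate, 24)
        verdict = "clean" if divides_cleanly(rate, 24) else "fractional"
        print(f"{rate} Hz at 24 fps: {float(per_frame)} samples per frame ({verdict})")
```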
The Audio Engineering Society made the choice official and still maintains it. Its recommended practice AES5-2018 (reaffirmed 2023) names 48 kHz as the preferred sampling frequency for professional audio origination, processing, and interchange, while recognizing 44.1 kHz for consumer music, 32 kHz for some transmission, and 96 kHz for higher-bandwidth work. When a standards body says "preferred", it means: use this unless you have a specific reason not to.
Figure 2. How the two rates were chosen: 44.1 kHz fell out of 1970s video-tape storage; 48 kHz was selected to divide cleanly into video frame rates and confirmed by AES5.
Why 48 kHz still wins today: the whole pipeline speaks it
A choice made in 1981 could have faded, but instead it hardened into a default that runs through the entire modern stack — which is the practical reason you should reach for it now. Three layers reinforce each other.
First, the operating systems. The audio engines in Windows, Linux, and Android commonly default to 48 kHz as their internal output rate. When your app plays sound, the OS is typically mixing everything at 48 kHz; feeding it 48 kHz audio means no resampling, and every resample is a small chance to add noise or cost CPU.
Second, the codecs. Opus, the codec that the IETF made mandatory for WebRTC and that powers most real-time voice and video calls, runs its core internally at 48 kHz — the specification, IETF RFC 6716, fixes the codec's high-quality (MDCT) layer at exactly that rate, and the decoder simply resamples down for lower rates because every supported rate divides evenly into 48 kHz. The WebRTC pipeline, covered in The WebRTC audio pipeline end-to-end, assumes 48 kHz from microphone to speaker.
Third, the platforms. YouTube's published encoding guidance recommends 48 kHz audio, and as of its 2025 specification update it accepts modern formats including AAC-LC, Opus, and its new Eclipsa immersive format — all at 48 kHz. Deliver music at 44.1 kHz and YouTube will resample it to 48 kHz on ingest anyway.
The lesson for a video product is blunt: capture, process, and deliver at 48 kHz, and the whole chain — OS, codec, platform — agrees with you and does no extra work. The only common exception is a pure music application whose source library is 44.1 kHz; there, keeping 44.1 kHz end-to-end avoids one resample. Mixing the two rates inside one pipeline is the worst of both worlds.
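The claim that every Opus rate divides evenly into 48 kHz can be checked directly. The rate list below is the set of sample rates Opus supports (8, 12, 16, 24, and 48 kHz); the function name is hypothetical, and the code only does the arithmetic. It is not how the Opus decoder itself is implemented.

```python
OPUS_INTERNAL_RATE_HZ = 48_000
OPUS_SUPPORTED_RATES_HZ = (8_000, 12_000, 16_000, 24_000, 48_000)


def decimation_factor(target_rate_hz: int) -> int:
    """Whole-number factor between 48 kHz and a target rate.

    Raises ValueError when the target does not divide 48 kHz evenly,
    which is the case that needs a fractional resampler.
    """
    factor, remainder = divmod(OPUS_INTERNAL_RATE_HZ, target_rate_hz)
    if remainder:
        raise ValueError(f"{target_rate_hz} Hz does not divide 48 kHz evenly")
    return factor


if __name__ == "__main__":
    for rate in OPUS_SUPPORTED_RATES_HZ:
        print(f"{rate} Hz: keep 1 sample in {decimation_factor(rate)}")
```

Passing 44,100 raises `ValueError`, which is the arithmetic behind the advice not to mix 44.1 kHz into a 48 kHz pipeline.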
When higher rates earn their keep: 88.2, 96, 192 kHz
If 48 kHz already captures everything the ear can hear, why do professional recorders and editing suites offer 96 kHz and 192 kHz at all? Not because they sound better on playback — the evidence on that is settled and discussed below — but because of what happens during recording and processing, before the final file exists.
Two real benefits explain the high rates. The first is filter headroom. At 96 kHz the Nyquist frequency is 48 kHz, so the anti-aliasing filter has an enormous, gentle transition band far above anything audible; the filter can be simpler and introduce less distortion in the audible range. The second, and more important, is nonlinear processing. Effects like distortion, saturation, and clipping deliberately create new high frequencies (harmonics) that did not exist in the original sound. At 48 kHz, harmonics that land above 24 kHz have nowhere to go and fold back down as aliasing — audible grit. Recording or processing at 96 kHz gives those harmonics room above the audible range, which is why software synthesizers and distortion plugins often have an "oversampling" button that temporarily runs the math at a higher rate. The Recording Academy and the high-resolution audio standard set by the AES, CTA, and JAS treat 24-bit/96 kHz as the preferred resolution for tracking, mixing, and mastering for these production reasons.
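To see why harmonics matter, consider a 5 kHz tone pushed through a distortion effect. The sketch below is a simplification under stated assumptions: it treats harmonics as exact integer multiples of the fundamental, ignores their levels, and assumes no filtering between the effect and the sample grid. The function names are hypothetical.

```python
def fold(frequency_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a component lands after sampling."""
    folded = frequency_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def aliased_harmonics(fundamental_hz: float, sample_rate_hz: float,
                      max_harmonic: int = 7):
    """List (harmonic number, true Hz, folded Hz) for harmonics above Nyquist."""
    nyquist_hz = sample_rate_hz / 2
    aliased = []
    for n in range(2, max_harmonic + 1):
        true_hz = n * fundamental_hz
        if true_hz > nyquist_hz:
            aliased.append((n, true_hz, fold(true_hz, sample_rate_hz)))
    return aliased


if __name__ == "__main__":
    # At 48 kHz the 5th, 6th and 7th harmonics fold back to 23, 18 and 13 kHz.
    print(aliased_harmonics(5_000, 48_000))
    # At 96 kHz the same harmonics stay below the 48 kHz Nyquist frequency.
    print(aliased_harmonics(5_000, 96_000))
```

At 96 kHz the harmonics sit harmlessly above the audible range and can be filtered out before the final downsample to 48 kHz, which is what an oversampling option does.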
The cost is mechanical and real. Double the sample rate and you double the data, the storage, and the processing load:
96 kHz vs 48 kHz: 2× the samples → 2× the file size and roughly 2× the CPU
192 kHz vs 48 kHz: 4× the samples → 4× the file size and roughly 4× the CPU
For a real-time product, that doubling competes directly with the CPU budget you need for echo cancellation and noise suppression. For a streaming product, it doubles your storage and egress for audio that listeners cannot distinguish from 48 kHz. High rates belong in the studio, not in the delivery file.
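The scaling is easiest to see for uncompressed PCM. The sketch below assumes 24-bit stereo, which is an illustrative choice rather than a figure from this article, and the function name is hypothetical. Compressed delivery formats do not scale this simply, so treat the numbers as the cost of the raw stream.

```python
def pcm_bytes_per_second(sample_rate_hz: int,
                         bit_depth: int = 24,
                         channels: int = 2) -> int:
    """Raw PCM data rate in bytes per second."""
    return sample_rate_hz * bit_depth * channels // 8


if __name__ == "__main__":
    baseline = pcm_bytes_per_second(48_000)
    for rate in (48_000, 96_000, 192_000):
        size = pcm_bytes_per_second(rate)
        print(f"{rate} Hz: {size} bytes/s, {size // baseline}x the 48 kHz stream")
```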
The expensive mistake: "higher numbers always sound better"
The single most common and most costly misconception in audio is that a bigger sample rate automatically means better sound for the listener. It does not, and the science here is not ambiguous. In properly controlled double-blind ABX tests — where listeners, including trained audio professionals, must identify which of two clips is the higher rate without knowing which is which — people cannot reliably tell 44.1 kHz or 48 kHz apart from 96 kHz or 192 kHz on the same finished content. Even the Wikipedia summary of the standards-era discussion states it plainly: humans cannot easily hear the difference between 48, 44.1, and other similar rates.
This makes sense from Nyquist. Once you sample above roughly 40 kHz, you have already captured every frequency a human can hear; everything a 192 kHz file adds lives above the range of human hearing, where it does nothing for the listener but quadruple the file. The higher rates earn their place during capture and editing, where headroom protects against mistakes — not at delivery, where the extra data is pure overhead.
The practical failure modes follow directly:
- Choosing 96 kHz for a conferencing app does not improve call clarity. It wastes bandwidth and CPU that would be better spent on echo cancellation, and most microphones and network paths cannot benefit from it anyway.
- Streaming 96 kHz audio to consumers doubles your storage and delivery cost for a difference no listener can hear in a blind test.
- Mixing 44.1 kHz and 48 kHz inside one pipeline forces a resample at every boundary; each conversion is a small quality and CPU cost, and because 48,000 ÷ 44,100 reduces only to 160/147 rather than a simple ratio like 3:2, the resampling math is the messy kind. Pick one rate for the whole chain.
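The difference between a clean and a messy conversion shows up as soon as the ratio is reduced to lowest terms. A minimal sketch, with a hypothetical function name:

```python
from fractions import Fraction


def resample_ratio(source_rate_hz: int, target_rate_hz: int) -> Fraction:
    """Output samples produced per input sample, in lowest terms."""
    return Fraction(target_rate_hz, source_rate_hz)


if __name__ == "__main__":
    for source in (16_000, 32_000, 44_100):
        ratio = resample_ratio(source, 48_000)
        print(f"{source} Hz to 48000 Hz: {ratio.numerator}:{ratio.denominator}")
```

Going from 32 kHz to 48 kHz is a 3:2 conversion, while 44.1 kHz to 48 kHz is 160:147. A rational resampler has to upsample by the numerator and downsample by the denominator, so larger numbers generally mean more filter work.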
Match the rate to the job. Capture and edit high if your work involves heavy nonlinear processing; deliver at 48 kHz (or 44.1 kHz for a pure-music catalogue) always.
A decision guide you can use today
The choice collapses to a short decision tree. Read it top to bottom and stop at the first row that matches your product.
| Your situation | Use this rate | Why |
|---|---|---|
| Real-time voice/video (WebRTC, conferencing, telemedicine) | 48 kHz | Opus and the OS audio stack are native 48 kHz; no resampling |
| Video streaming / OTT / e-learning delivery | 48 kHz | Matches picture frame rates and platform (YouTube) defaults |
| Pure-music streaming from a 44.1 kHz catalogue | 44.1 kHz end-to-end | Avoids one lossy resample of the source library |
| Studio recording with heavy effects/processing | 96 kHz capture | Headroom for nonlinear harmonics; deliver downsampled to 48 kHz |
| Archival master of irreplaceable source | 96 kHz, 24-bit | High-resolution standard for preservation; storage is cheap for masters |
| Voice-only narrowband (legacy telephony) | 8 or 16 kHz | Defined by the codec (G.711, Opus narrowband); covered in Block 2 |
Table 1. Choosing a sample rate by product type. When in doubt for any video product, the answer is 48 kHz.
Figure 3. A top-down decision tree: most video products land on 48 kHz; only music catalogues and studio capture branch away.
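If you want the guide in code, for example as a guard in a configuration check, Table 1 reduces to a lookup. This is only a sketch: the enum, the dictionary, and the function name are hypothetical, and the values are copied from the table rather than from any library.

```python
from enum import Enum
from typing import Tuple


class ProductType(Enum):
    REALTIME_CALLS = "real-time voice/video"
    VIDEO_DELIVERY = "video streaming / OTT / e-learning delivery"
    MUSIC_CATALOGUE_44K1 = "pure-music streaming from a 44.1 kHz catalogue"
    STUDIO_CAPTURE = "studio recording with heavy effects/processing"
    ARCHIVAL_MASTER = "archival master of irreplaceable source"
    NARROWBAND_VOICE = "voice-only narrowband (legacy telephony)"


# Acceptable rates in Hz for each row of Table 1.
RATES_HZ = {
    ProductType.REALTIME_CALLS: (48_000,),
    ProductType.VIDEO_DELIVERY: (48_000,),
    ProductType.MUSIC_CATALOGUE_44K1: (44_100,),
    ProductType.STUDIO_CAPTURE: (96_000,),
    ProductType.ARCHIVAL_MASTER: (96_000,),
    ProductType.NARROWBAND_VOICE: (8_000, 16_000),
}


def recommended_rates_hz(product: ProductType) -> Tuple[int, ...]:
    """Return the rate or rates Table 1 suggests for a product type."""
    return RATES_HZ[product]


def is_recommended(product: ProductType, sample_rate_hz: int) -> bool:
    """True when a configured rate matches the table for that product."""
    return sample_rate_hz in recommended_rates_hz(product)
```

A check such as `is_recommended(ProductType.REALTIME_CALLS, 96_000)` returns `False`, flagging the conferencing-at-96-kHz mistake described earlier.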
Where Fora Soft fits in
Across the video products we build at Fora Soft — conferencing, telemedicine, OTT, e-learning, surveillance, and AR/VR — the sample-rate decision is one we make early and rarely revisit, because getting it right at capture is the cheapest quality choice in the whole pipeline. For real-time voice in conferencing and telemedicine, we run 48 kHz to stay native with Opus and the WebRTC stack, or step down to 16 kHz wideband when the goal is voice clarity and we want CPU headroom for echo cancellation and noise suppression. For OTT and e-learning playback we deliver 48 kHz to lock audio to picture. The mistake we most often correct in inherited code is a pipeline that captures at one rate and resamples at every stage; standardizing on 48 kHz end-to-end removes a whole class of quality and sync bugs.
What to read next
- Bit depth and dynamic range: 16-bit, 24-bit, 32-bit float
- Channels and channel layouts: mono, stereo, 5.1, 7.1, 7.1.4
- The WebRTC audio pipeline end-to-end
Call to action
- Talk to an audio engineer: book a 30-minute scoping call to talk through your sample-rate plan for video.
- See our case studies: 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the sample rate cheat sheet: a one-page reference of the Nyquist rule, the frame-rate divisibility math behind 48 kHz, the origins of 44.1 vs 48 kHz, when to use 96/192 kHz, and a per-product rate guide.
References
- AES5-2018 (r2023) — AES recommended practice for professional digital audio: Preferred sampling frequencies for applications employing pulse-code modulation. Audio Engineering Society. Tier 1 (standard). Names 48 kHz as the preferred professional/video sampling frequency; recognizes 44.1, 32, and 96 kHz. https://www.aes.org/publications/standards/search.cfm?docID=14 (accessed 2026-06-04).
- IETF RFC 6716 (September 2012) — Definition of the Opus Audio Codec. JM. Valin, K. Vos, T. Terriberry. Tier 1 (standard). §2 / §4: the MDCT (CELT) layer operates internally at 48 kHz; the decoder resamples to other rates because they divide evenly into 48 kHz. Updated by RFC 8251. https://www.rfc-editor.org/rfc/rfc6716.html (accessed 2026-06-04).
- Nyquist–Shannon sampling theorem — C. E. Shannon, "Communication in the Presence of Noise," Proceedings of the IRE, 37(1), 1949. Tier 5 (foundational academic source). The minimum-sampling-rate rule: sample at more than twice the highest frequency. https://doi.org/10.1109/JRPROC.1949.232969 (accessed 2026-06-04).
- "Digital Audio Sample Rates: The 48 kHz Question" — Randy Hoffner, TVTechnology, 2003. Tier 4 (industry, broadcast engineer). The 1981 standardization process; candidate rates 45/48/50/52.5/54 kHz; the leap-frame and frame-rate reasoning behind 48 kHz. https://www.tvtechnology.com/opinions/digital-audio-sample-rates-the-48-khz-question (accessed 2026-06-04).
- "Explanation of 44.1 kHz CD sampling rate" — Henning Schulzrinne, Columbia University. Tier 6 (academic reference page). The NTSC/PAL line-and-field arithmetic (≈44,056 / 44,100 samples/s) behind the CD rate. https://www1.cs.columbia.edu/~hgs/audio/44.1.html (accessed 2026-06-04).
- 48,000 Hz — Wikipedia, accessed 2026-06-04. Tier 6 (orientation only). Corroborates frame-rate divisibility (48,000 ÷ 24 = 2,000; 44,100 ÷ 24 = 1,837.5), the 3:2 ratio to 32 kHz, OS-level 48 kHz defaults, and the no-audible-difference summary. Used for navigation; standards facts trace to AES5 and RFC 6716. https://en.wikipedia.org/wiki/48,000_Hz (accessed 2026-06-04).
- "Encoding specifications" / "Recommended upload encoding settings" — YouTube Help (Google), 2025 specification update. Tier 4 (platform documentation). 48 kHz recommended; AAC-LC, Opus, and Eclipsa Audio accepted; lossless (FLAC/PCM) preferred for the upload master. https://support.google.com/youtube/answer/1722171 (accessed 2026-06-04).
- "High Sampling Rates — Is there a Sonic Benefit?" — Sweetwater InSync. Tier 4 (vendor educational). The production case for 96 kHz (nonlinear-processing headroom, oversampling) and the storage/CPU cost; the 24-bit/96 kHz high-resolution baseline from AES/CTA/JAS and the Recording Academy. https://www.sweetwater.com/insync/high-sampling-rates-sonic-benefit/ (accessed 2026-06-04).
Per §4.3.2 source hierarchy: where vendor blogs imply higher sample rates audibly improve playback, this article follows the standards-and-science position. AES5 names 48 kHz as the preferred rate for professional origination, processing, and interchange; Nyquist explains why nothing audible lives above the range it captures; and controlled blind tests show no perceptible playback difference. The high-rate case is restricted to its true domain: capture and processing.


