Why This Matters
If you build or buy any product that plays video — streaming, conferencing, OTT, e-learning, telemedicine, or VR — you will eventually be asked "does it do spatial audio?" or "can we ship Dolby Atmos?" To answer well, you need to know what those words actually mean and why the industry arrived here. This article gives a product person the full timeline, the vocabulary (5.1, 7.1.4, object-based, ambisonics), and the pattern behind every successful and failed format — so you can tell a real feature from marketing, and scope an audio upgrade without guessing. It is written for someone with zero audio background; every term is defined before it is used.
The one pattern behind 60 years of audio
Before the timeline, here is the pattern that explains all of it. Every jump in the number of audio channels happened for the same reason: a new way of getting sound to the listener made the extra channels cheap. Stereo waited for the LP record and FM radio. Surround waited for the optical film soundtrack and then the DVD. Immersive audio waited for digital cinema files and internet streaming, where adding metadata costs almost nothing.
And every format that failed did so for the same reason: it asked the buyer to spend money and effort without a clear, obvious payoff, or it forced them to pick a side in a format war they did not understand. Hold those two ideas (cheap delivery enables the leap, unclear benefit kills it) and the whole history reads like a single story.
Figure 1. Sixty years of channels, grouped into three eras: stereo, surround, and immersive.
Era 1: Stereo — the idea of two ears (1881–1940)
Humans hear in stereo because we have two ears, roughly 15 to 20 centimetres apart on either side of the head. A sound from your left reaches your left ear slightly sooner and slightly louder, and your brain reads those tiny differences as direction. Stereo audio (short for stereophonic, meaning "solid sound") is simply the attempt to capture and replay those two slightly different signals so the illusion of direction survives.
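To put a size on "slightly sooner," here is a minimal Python sketch of the Woodworth spherical-head approximation, ITD = (r / c)(θ + sin θ), for a distant source at azimuth θ. The head radius of 8.75 cm and the speed of sound of 343 m/s are assumed textbook values, not figures from this article, and the model ignores the level difference entirely.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius, in metres
SPEED_OF_SOUND = 343.0   # metres per second in air at about 20 C

def itd_seconds(azimuth_deg: float) -> float:
    """Interaural time difference for a distant source (Woodworth model).

    azimuth_deg: 0 = straight ahead, 90 = directly to one side.
    The formula is intended for azimuths between -90 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 90):
    print(f"{az:>3} deg: {itd_seconds(az) * 1e6:6.1f} microseconds")
```

Under these assumptions the largest difference, for a source directly to one side, is only about 0.66 ms. Delays that small are what stereo has to preserve for the illusion of direction to survive.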
The first demonstration came absurdly early. In 1881, the French engineer Clément Ader wired the Paris Opera to the International Exposition of Electricity two kilometres away. Listeners held two earpieces, one fed from a microphone on the left of the stage and one from the right, and reported hearing the orchestra spread out in front of them. That is binaural sound — two channels meant for two ears — half a century before anyone could record it.
The real foundation was laid in 1931 by Alan Blumlein, an engineer at EMI in Britain. His patent, UK 394,325 (filed 14 December 1931, granted June 1933), described almost everything modern stereo still uses: the coincident microphone pair for capturing a stereo image, the 45/45 method of cutting two channels into one record groove, and the matrix idea of combining and separating channels. Blumlein called it "binaural sound." Less than two years later, on 27 April 1933, Bell Labs under Harvey Fletcher, working with conductor Leopold Stokowski, transmitted a live Philadelphia Orchestra concert over three telephone lines to three loudspeakers in Washington D.C., the first public demonstration of what Bell called "sound in auditory perspective."
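The 45/45 idea is easy to see in numbers. In the usual textbook description, each groove wall carries one channel at 45° to the record surface, so the stylus's sideways (lateral) motion carries the sum L + R and its up-and-down (vertical) motion carries the difference L − R. The sketch below assumes that idealised geometry and one common sign convention; real cutter heads add equalisation and other details it ignores.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)

def walls_to_motion(left: float, right: float) -> tuple[float, float]:
    """Map left/right groove-wall displacements to (lateral, vertical) stylus motion."""
    lateral = (left + right) * INV_SQRT2    # sum signal: what a mono groove carries
    vertical = (left - right) * INV_SQRT2   # difference signal: the stereo extra
    return lateral, vertical

def motion_to_walls(lateral: float, vertical: float) -> tuple[float, float]:
    """Inverse mapping: a stereo cartridge recovers L and R from the two motions."""
    left = (lateral + vertical) * INV_SQRT2
    right = (lateral - vertical) * INV_SQRT2
    return left, right

# A mono signal (L equal to R) produces purely lateral motion, and a purely
# lateral mono groove decodes to equal L and R on a stereo cartridge.
print(walls_to_motion(1.0, 1.0))
print(motion_to_walls(1.0, 0.0))
```

This sum/difference split is why the scheme was compatible: an ordinary mono record's lateral groove plays back as the same signal in both channels.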
The lesson of Era 1 is timing. The science of stereo was settled by 1933, but consumer stereo records did not arrive until 1958, because no affordable medium could carry two channels until then. Good engineering sat on a shelf for 25 years waiting for cheap delivery.
Era 2: Surround — putting the audience inside (1940–2010)
Surround sound is stereo's idea extended past the front wall: instead of two channels in front of you, you get channels around and behind you, so sounds can move through the room. The film industry led, because a cinema could justify installing many speakers and had a captive audience paying for the experience.
The film pioneers (1940–1955)
Disney got there first. Fantasia (1940), played through a system called Fantasound, was the first commercial film released with a multichannel soundtrack. The premiere theatres were wired with dozens of speakers behind the screen and around the auditorium. (A note for accuracy: the music was recorded across many tracks, but the version actually played in theatres reproduced roughly three front channels plus surround, not the eight tracks sometimes claimed.) Fantasound was too expensive to roll out widely, and World War II killed it: another case of great engineering arriving before cheap delivery.
The widescreen boom of the 1950s revived multichannel sound as a weapon against television. Cinerama (1952) carried a separate multitrack magnetic soundtrack. CinemaScope (1953) striped four magnetic tracks onto 35mm film: left, centre, right, and one surround channel. Todd-AO (1955) put six magnetic tracks on wide 70mm prints. The common thread: magnetic stripes on film were the new cheap-enough delivery medium, so the channel count jumped.
The cautionary tale: quadraphonic (1970s)
Then the home audio industry tried to make the same leap — and face-planted. Quadraphonic sound, or "quad," put four channels in the home: two front, two rear. On paper it was surround for your living room a decade before anyone else managed it. In practice it became the textbook example of how to kill a good idea.
The fatal mistake was a format war. Three incompatible systems launched at once around 1971. SQ (from CBS) and QS (from Sansui) were matrix systems that folded four channels into two and tried to unfold them on playback — but the unfolding was imperfect, and on busy music the rear channels bled into the front. CD-4 (from JVC and RCA) was a discrete system that genuinely kept four separate channels using an ultrasonic carrier on the record, but it needed a special stylus, and ordinary record wear destroyed the rear-channel separation within a few playings.
A buyer in 1973 faced an impossible choice: pick one of three systems, buy extra speakers and amplifiers, and hope their favourite label backed the same horse. Most people heard a quad demo, shrugged, and kept their stereo. By 1977 the major labels had abandoned it. Quad failed not because the engineering was bad — CD-4 worked — but because it demanded money and a confusing commitment for a benefit most listeners could not clearly hear. Remember this story; the industry has repeated its mistakes several times since.
Dolby makes surround work (1976–1999)
The breakthrough that made surround stick came from Dolby in 1976, and the trick was to not ask for new media at all. Dolby Stereo encoded four channels (left, centre, right, and a single surround) into two optical soundtracks that fit in the space of the single optical track already printed on 35mm film. This pair of mixed-down signals is called Lt/Rt, for "left-total" and "right-total." A decoder in the cinema unfolded the two tracks back into four. Because the print kept the standard optical-track format, any cinema could play a Dolby Stereo film on its existing projector, in mono if it had no decoder. Star Wars (1977) made it famous.
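The fold-and-unfold arithmetic behind Lt/Rt, the same basic idea as the SQ and QS quad matrices, can be sketched in a few lines. This is a simplified, real-valued model with assumed 1/√2 (about −3 dB) coefficients and a passive decoder: real Dolby encoders also phase-shift and band-limit the surround and apply noise reduction, none of which is modelled here.

```python
import math

G = 1 / math.sqrt(2)  # assumed -3 dB mixing coefficient

def encode_lt_rt(l, c, r, s):
    """Fold four channels into two (simplified, real-valued model)."""
    lt = l + G * c + G * s
    rt = r + G * c - G * s
    return lt, rt

def passive_decode(lt, rt):
    """Unfold two channels back into four with a fixed (passive) matrix."""
    left = lt
    right = rt
    centre = G * (lt + rt)    # in-phase content goes to the centre
    surround = G * (lt - rt)  # out-of-phase content goes to the surround
    return left, centre, right, surround

def round_trip(l=0.0, c=0.0, r=0.0, s=0.0):
    """Encode then decode; returns the (L, C, R, S) outputs rounded to 3 places."""
    return tuple(round(x, 3) for x in passive_decode(*encode_lt_rt(l, c, r, s)))

print("centre only  :", round_trip(c=1.0))  # C at full level, but L and R each get 0.707
print("surround only:", round_trip(s=1.0))  # S at full level, but it also leaks into L and R
print("left only    :", round_trip(l=1.0))  # L at full level, but C and S each get 0.707
```

Every input comes back, but so does crosstalk: a pure centre sound also leaks into left and right, and a pure surround sound appears in the front speakers. That leakage is the matrix "bleed" that hurt quad, and it is what Pro Logic's active steering, described next, was built to suppress.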
Dolby brought the same matrix home. Dolby Surround (1982) let a home decoder pull a mono surround channel out of any stereo signal. Dolby Pro Logic (1987) improved the decoder, adding active steering and a dedicated centre-channel output for dialogue. The genius was backward compatibility: a Dolby-encoded soundtrack still played perfectly in plain stereo, so studios lost nothing by adding it. Quad had forced a choice; Dolby forced nothing.
Digital 5.1 and the meaning of "point one" (1991–1993)
The next leap waited for digital audio, and it gave us the number everyone now recognises: 5.1.
In June 1992, Batman Returns became the first wide cinema release with Dolby Digital, a compression format known technically as AC-3. (An experimental Dolby Digital screening of Star Trek VI reportedly ran in a handful of theatres in late 1991, but Batman Returns was the first real rollout.) The format delivered five full-range channels plus one limited channel — written "5.1."
Here is what that notation actually means, because it confuses almost everyone:
5.1 = five full-bandwidth channels (Left, Centre, Right, Left Surround, Right Surround), each carrying the full ~20 Hz–20,000 Hz range, plus one Low-Frequency Effects channel (the "LFE") that carries only deep bass, roughly 20–120 Hz. The ".1" marks that bass channel as partial and band-limited: its 120 Hz ceiling is under 1% of the 20,000 Hz range a full channel covers.
The LFE is the channel your subwoofer plays: the rumble of an explosion, not music. It is "point one" because it is a partial channel, not because there is one of it.
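To make "only deep bass" concrete, here is a minimal sketch of a second-order Butterworth low-pass filter at 120 Hz, the kind of filter used to keep a signal inside the LFE band. The coefficient formulas come from Robert Bristow-Johnson's widely used Audio EQ Cookbook; the 48 kHz sample rate is an assumption, and this is a generic textbook filter, not the one any Dolby or DTS encoder specifies.

```python
import cmath
import math

def butterworth_lowpass(cutoff_hz: float, sample_rate: float):
    """Second-order Butterworth low-pass (Audio EQ Cookbook formulas).

    Returns normalised (b0, b1, b2, a1, a2) with a0 divided out.
    """
    q = 1 / math.sqrt(2)                      # Butterworth: maximally flat passband
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b0 = (1 - cos_w0) / 2 / a0
    b1 = (1 - cos_w0) / a0
    b2 = (1 - cos_w0) / 2 / a0
    a1 = -2 * cos_w0 / a0
    a2 = (1 - alpha) / a0
    return b0, b1, b2, a1, a2

def magnitude(coeffs, freq_hz: float, sample_rate: float) -> float:
    """Linear gain of the filter at one frequency (1.0 = unchanged)."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)  # z^-1 on the unit circle
    num = b0 + b1 * z1 + b2 * z1 * z1
    den = 1 + a1 * z1 + a2 * z1 * z1
    return abs(num / den)

def apply(coeffs, samples):
    """Filter a list of samples (direct form I biquad)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

FS = 48_000  # assumed sample rate
lfe = butterworth_lowpass(120, FS)
for f in (20, 120, 1000):
    print(f"{f:>5} Hz: {20 * math.log10(magnitude(lfe, f, FS)):6.1f} dB")
```

A Butterworth filter is a natural choice here because its passband is maximally flat: bass well below the cutoff passes almost untouched, the response is 3 dB down at 120 Hz, and it keeps falling above that, so by 1 kHz very little remains.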
The 5.1 layout was not a Dolby invention alone; it was blessed by an international standard. Recommendation ITU-R BS.775 defines the reference "3/2" loudspeaker arrangement — three front channels, two surround channels, plus an optional LFE — that 5.1 follows: centre dead ahead, front left and right at ±30°, surrounds at roughly ±110°. The current edition, BS.775-4, was approved in December 2022, but the layout dates to the standard's first edition in 1994. AC-3 itself is formally specified in ATSC A/52 (current edition A/52:2015) and in ETSI TS 102 366.
A rival arrived almost immediately. DTS (Digital Theater Systems) debuted with Jurassic Park in 1993, also 5.1, but storing its audio on a CD-ROM synchronised to the film rather than printing it on the film itself. The Dolby-versus-DTS rivalry would shadow the industry for the next thirty years — but unlike the quad war, both were 5.1, so a buyer was never stranded.
Adding the back and the count climbs (1999–2010)
Once 5.1 was standard, more channels followed. Dolby Digital Surround EX (1999, debuting with Star Wars: Episode I) matrix-encoded a sixth channel, a centre-back surround, into the existing two surround channels, giving 6.1 while staying 5.1-compatible. Blu-ray (2006) gave home systems the bandwidth for fully discrete 7.1 through lossless formats such as Dolby TrueHD, and in cinemas Dolby Surround 7.1 (2010, debuting with Toy Story 3) added two back-surround channels, for seven full channels plus LFE.
7.1 = seven full-range channels (Left, Centre, Right, two side surrounds, two back surrounds) + one LFE. The number before the dot is always the count of full-range, ear-level speakers; the number after the dot is always the LFE.
By 2010 the surround era had reached its natural ceiling. You could keep adding speakers around the listener, but they were all at ear level, on a flat plane. The room had a floor of sound and no ceiling. The next leap had to go up — and it had to stop counting speakers altogether.
Era 3: Immersive — describing sound, not speakers (2010–today)
The immersive era did two things at once: it added height, and it changed what a soundtrack is. Instead of recording fixed speaker feeds, it started recording sounds as objects — each with a position in three-dimensional space — and let the playback device figure out which speakers to use. This is the most important conceptual shift in the whole history, so it deserves a careful walk-through.
Figure 2. The three ways to describe a soundtrack: locked to speakers, attached to objects, or captured as a whole soundfield.
Channel-based, object-based, scene-based
There are three philosophies for storing a soundtrack, and the immersive era is the move from the first to the second and third.
Channel-based audio is everything up to 7.1. The mix is a set of feeds, one per speaker. A 7.1 file literally contains eight signals labelled "this goes to the left-surround speaker." It is simple and exact, but it is locked: if your room has different speakers, the sound is wrong or has to be crudely folded down.
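Here is what "crudely folded down" looks like in practice: a minimal sketch of a 5.1-to-stereo downmix. It assumes the commonly cited coefficients (centre and each surround mixed in at 1/√2, about −3 dB, with the LFE dropped); real decoders may use other coefficients carried in the bitstream and also guard against clipping, which this sketch does not.

```python
import math

K = 1 / math.sqrt(2)  # about -3 dB: the commonly cited centre/surround downmix gain

def downmix_51_to_stereo(frame: dict[str, float]) -> tuple[float, float]:
    """Fold one 5.1 sample frame (keyed by channel label) into stereo.

    The LFE is dropped, as is common; no clipping protection is applied.
    """
    lo = frame["L"] + K * frame["C"] + K * frame["Ls"]
    ro = frame["R"] + K * frame["C"] + K * frame["Rs"]
    return lo, ro

# Dialogue in the centre, an effect in the left surround, a rumble in the LFE.
frame = {"L": 0.0, "R": 0.0, "C": 0.5, "LFE": 0.9, "Ls": 0.2, "Rs": 0.0}
print(downmix_51_to_stereo(frame))
```

Whatever the coefficients, the result is fixed by the channel labels: once the surrounds are summed into the front pair, a stereo listener cannot get their placement back.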
Object-based audio stores each sound as an object — a mono sound plus metadata saying "I am here, at this position, moving this way." There are no speaker labels. At playback, a piece of software called a renderer looks at the speakers you actually have — two, five, seven, or sixty-four — and computes, in real time, how loud each speaker should play that object so it appears at the right spot. The same file plays correctly on a phone, a soundbar, and a cinema, because the file describes intent, not output.
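A renderer's core job, turning one object position into a gain per speaker, can be sketched with pairwise 2D vector base amplitude panning (VBAP), a well-known published technique by Ville Pulkki. Commercial renderers such as Dolby's are proprietary and are not claimed to work this way. The layout below is a hypothetical ear-level 5.0 room using the ITU-R BS.775 angles described earlier (surrounds placed at ±110°); height is ignored, and the code assumes no gap between neighbouring speakers reaches 180°.

```python
import math

# Hypothetical ear-level 5.0 layout with ITU-R BS.775 angles, in degrees:
# 0 = straight ahead, positive = to the listener's left. The LFE is not panned.
LAYOUT = {"C": 0.0, "L": 30.0, "R": -30.0, "Ls": 110.0, "Rs": -110.0}

def unit(deg: float) -> tuple[float, float]:
    """2D unit vector pointing at an azimuth."""
    rad = math.radians(deg)
    return math.cos(rad), math.sin(rad)

def vbap_2d(source_deg: float, layout: dict[str, float]) -> dict[str, float]:
    """Pairwise 2D VBAP: find the adjacent speaker pair that encloses the
    source direction and return constant-power gains for every speaker.
    Assumes every gap between neighbouring speakers is under 180 degrees."""
    names = sorted(layout, key=lambda n: layout[n] % 360)  # circular order
    px, py = unit(source_deg)
    for i, a in enumerate(names):
        b = names[(i + 1) % len(names)]                     # wrap around the circle
        (ax, ay), (bx, by) = unit(layout[a]), unit(layout[b])
        det = ax * by - ay * bx
        if abs(det) < 1e-12:                                # degenerate pair, skip
            continue
        # Solve source = g_a * a + g_b * b for the two gains (Cramer's rule).
        g_a = (px * by - py * bx) / det
        g_b = (ax * py - ay * px) / det
        if g_a >= -1e-9 and g_b >= -1e-9:                   # source lies between a and b
            g_a, g_b = max(g_a, 0.0), max(g_b, 0.0)
            norm = math.hypot(g_a, g_b)                     # constant power: squares sum to 1
            gains = {name: 0.0 for name in layout}
            gains[a], gains[b] = g_a / norm, g_b / norm
            return gains
    raise ValueError("no speaker pair encloses the source direction")

for az in (0.0, 15.0, 70.0, 180.0):
    print(az, {k: round(v, 3) for k, v in vbap_2d(az, LAYOUT).items()})
```

Because the layout is just an input, the same object can be rendered to a different room by passing a different dictionary, for example a 7.0 layout with side and back surrounds; the object itself never changes.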
Scene-based audio describes the entire soundfield around a single listening point as a set of spherical-harmonic components, using a technique called ambisonics. Rather than tracking individual sources, it captures "what arrives at this listening position from every direction." It can be decoded to any speaker layout or to headphones. It is the favourite of virtual and augmented reality, where the listener's head turns and the whole soundfield must rotate with it.
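The sketch below shows the two operations that make ambisonics attractive for VR: encoding a mono source at a direction into first-order components, and rotating the whole field when the head turns. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalisation); other conventions, such as the older FuMa ordering, scale and order the channels differently. Decoding to speakers or headphones is a separate step not shown.

```python
import math

def encode_foa(sample: float, azimuth_deg: float, elevation_deg: float = 0.0):
    """Encode a mono sample at a direction into first-order ambisonics.

    AmbiX convention: ACN order (W, Y, Z, X), SN3D normalisation.
    Azimuth 0 = front, positive = to the listener's left.
    """
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = sample                                    # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)      # left-right figure-of-eight
    z = sample * math.sin(el)                     # up-down figure-of-eight
    x = sample * math.cos(az) * math.cos(el)      # front-back figure-of-eight
    return (w, y, z, x)

def rotate_for_head_yaw(bformat, head_yaw_deg: float):
    """Counter-rotate the whole soundfield when the listener turns their head.

    Turning the head left by head_yaw_deg makes every source appear
    head_yaw_deg further to the right, so X/Y rotate by -head_yaw_deg.
    W (omni) and Z (height) do not change under a yaw rotation.
    """
    w, y, z, x = bformat
    psi = math.radians(head_yaw_deg)
    x_rot = x * math.cos(psi) + y * math.sin(psi)
    y_rot = -x * math.sin(psi) + y * math.cos(psi)
    return (w, y_rot, z, x_rot)

# A source 90 degrees to the left; the listener turns 90 degrees left to face it.
field = encode_foa(1.0, azimuth_deg=90.0)
print([round(v, 6) for v in rotate_for_head_yaw(field, 90.0)])
# Same components as encode_foa(1.0, azimuth_deg=0.0): the source is now straight ahead.
```

The rotation touches only the X and Y components and costs a handful of multiplications per sample, regardless of how many sources were mixed into the field, which is why head-tracked VR audio favours this representation.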
The trend of the last decade is unmistakable: away from channel-based, toward object- and scene-based representations that do not care how many speakers you own.
Dolby Atmos arrives (2012)
Dolby Atmos, announced in April 2012 and first heard in Pixar's Brave in June 2012, was the format that made object-based audio mainstream. Its first-generation cinema system could handle up to 128 simultaneous audio objects and drive up to 64 unique speaker feeds — including, for the first time, speakers in the ceiling.
Atmos also gave us a new notation with three numbers. Where 5.1 and 7.1 had two, Atmos layouts have a third:
7.1.4 = 7 ear-level channels + 1 LFE + 4 overhead (height) channels. The third number is always the count of ceiling/height speakers. So 9.1.6 is 9 ear-level + 1 LFE + 6 overhead — a large home or screening-room setup.
Let us read one out loud to make the arithmetic concrete. A 7.1.4 system has 7 + 1 + 4 = 12 speakers total: seven around you at ear level, four above you, and one subwoofer. The "4" overhead is exactly what separates an immersive system from a 7.1 surround system — it is the ceiling the surround era never had.
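The notation is regular enough to parse mechanically. This minimal sketch assumes only the two- and three-number forms used in this article, with a missing third number meaning no height speakers; the names `SpeakerLayout` and `parse_layout` are illustrative, not from any library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpeakerLayout:
    ear_level: int   # full-range speakers around the listener
    lfe: int         # low-frequency effects (subwoofer) channels
    height: int      # overhead / height speakers (0 for 5.1, 7.1, ...)

    @property
    def total(self) -> int:
        return self.ear_level + self.lfe + self.height

def parse_layout(notation: str) -> SpeakerLayout:
    """Parse '5.1', '7.1' or '7.1.4' style notation into speaker counts."""
    parts = notation.strip().split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a channel-layout string: {notation!r}")
    counts = [int(p) for p in parts] + [0] * (3 - len(parts))  # no height -> 0
    return SpeakerLayout(*counts)

for name in ("5.1", "7.1", "7.1.4", "9.1.6"):
    layout = parse_layout(name)
    print(f"{name:>6}: {layout.ear_level} ear-level + {layout.lfe} LFE + "
          f"{layout.height} height = {layout.total} speakers")
```

The result describes speakers in a room. As the next paragraph explains, it says nothing about what an Atmos file stores.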
Importantly, an Atmos file is not "12 channels." It is a small bed of channel-based audio (often a 7.1.2 or 5.1.2 foundation) plus a set of objects with positional metadata. The 7.1.4 speaker count is what the renderer produces, not what the file stores. This is why a single Atmos master can serve a cinema and a pair of headphones.
The competitors and the open standards
Atmos was not alone. Auro-3D (from Barco, with cinema formats launched in 2010) took a different, layer-based approach: a horizontal layer, a height layer, and a single overhead "Voice of God" speaker, in configurations like 9.1, 11.1, and 13.1. DTS:X (2015) matched Atmos's object-based, layout-independent philosophy and adapted to whatever speakers a room had.
Two open standards matter for anyone building broadcast or streaming products. MPEG-H 3D Audio (ISO/IEC 23008-3, published October 2015) is an ISO standard that supports all three philosophies — channel, object, and scene-based — in one bitstream, and adds interactive features like letting a viewer turn up the dialogue. South Korea has broadcast MPEG-H with its ATSC 3.0 ultra-HD television since 2017. For digital cinema, the SMPTE ST 2098 series (published 2018) standardised a single vendor-neutral immersive-audio bitstream so a cinema is not locked to one company's format.
Apple closes the loop for headphones (2021)
The final piece of the immersive story is the one most consumers actually experience: immersive audio on ordinary headphones. In June 2021, Apple launched Spatial Audio with Dolby Atmos in Apple Music. The technology renders an Atmos object mix down to two channels using an HRTF, a Head-Related Transfer Function: the mathematical model of how your own head and ears colour a sound depending on where it comes from. With head tracking, the AirPods sense when you turn your head and rotate the soundfield the opposite way, so sound placed at the centre of the screen keeps coming from the screen rather than turning with you. (Apple had shipped the video version of this in iOS 14 in September 2020; the Apple Music launch followed in June 2021.)
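The two steps described above, head tracking and HRTF rendering, can be sketched in a few lines. An HRTF is applied in the time domain as a pair of impulse responses (HRIRs), one per ear, convolved with the object's sound. Everything about the HRIRs here is hypothetical: `TOY_HRIRS` is a made-up three-entry table that only mimics the basic cues (the far ear hears later and quieter), whereas real HRTF sets are measured at many directions with far longer impulse responses. Apple's own renderer and HRTF data are proprietary and not represented here.

```python
def convolve(signal, kernel):
    """Plain direct-form convolution (full length)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

# HYPOTHETICAL placeholder HRIR pairs (left ear, right ear), keyed by azimuth in
# degrees relative to the head (positive = left). Not measured data.
TOY_HRIRS = {
    -90: ([0.0, 0.0, 0.3], [1.0, 0.0, 0.0]),  # hard right: left ear late and quiet
    0:   ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # straight ahead: identical ears
    90:  ([1.0, 0.0, 0.0], [0.0, 0.0, 0.3]),  # hard left: right ear late and quiet
}

def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    return abs((a - b + 180) % 360 - 180)

def render_binaural(mono, source_az_deg, head_yaw_deg, hrirs=TOY_HRIRS):
    """Render a mono object to (left, right) ear signals with head tracking.

    Head tracking: what matters is the source azimuth relative to the head,
    so the head turn is subtracted before choosing the HRIR pair.
    """
    relative = (source_az_deg - head_yaw_deg + 180) % 360 - 180  # wrap to [-180, 180)
    nearest = min(hrirs, key=lambda az: angular_distance(az, relative))  # no interpolation
    left_ir, right_ir = hrirs[nearest]
    return convolve(mono, left_ir), convolve(mono, right_ir)

# A source mixed dead centre, with the listener's head turned 90 degrees left.
left, right = render_binaural([1.0, 0.5], source_az_deg=0.0, head_yaw_deg=90.0)
print("left ear: ", left)
print("right ear:", right)
```

In the example, a source mixed dead centre becomes a hard-right source once the head turns 90° left, which is the compensation head tracking performs. A production renderer would interpolate between measured directions, handle elevation, and use FFT-based convolution for speed.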
This is the immersive era's quiet triumph. The same object-based master that plays in a 64-speaker cinema also renders, on a phone and two earbuds, into a convincing three-dimensional space — no extra speakers, no format war, no consumer choice required. It is exactly what quad failed to deliver fifty years earlier: surround that just works, on hardware people already own.
Common pitfall: confusing the speaker count with the file
The single most common mistake product people make is treating "7.1.4" or "Atmos" as a property of the file. It is not. In the immersive era, the soundtrack is a description of objects and a small channel bed; the speaker count is what the renderer produces from it on a given device. A practical consequence: you do not store a separate file per speaker layout. You store one object-based master and render it per device. Teams that plan storage and bandwidth as if they need a distinct 5.1, 7.1, and 7.1.4 file each are budgeting for a problem that object-based audio was designed to eliminate.
A second, related pitfall: assuming "spatial audio" on headphones requires special content. It does not always — many products apply HRTF processing to ordinary stereo or 5.1 to simulate height. Knowing whether a feature uses a true object-based master or a playback-side upmix is the difference between a real Atmos pipeline and a marketing label.
A 60-year comparison at a glance
The table below compresses the whole history into the few facts that actually matter for a decision: what each era stored, how many channels, and what enabled it.
| Era / format | Year | Channels | What the file stores | What enabled it |
|---|---|---|---|---|
| Blumlein stereo | 1931 | 2 | Two speaker feeds (L, R) | Coincident mic + 45/45 record groove |
| CinemaScope | 1953 | 4 | Four speaker feeds | Magnetic stripes on 35mm film |
| Quadraphonic | 1971 | 4 | Two or four feeds (format-dependent) | LP record (failed: format war) |
| Dolby Stereo | 1976 | 4 → 2 → 4 | Lt/Rt matrix in two optical tracks | Existing optical film soundtrack |
| Dolby Digital (AC-3) 5.1 | 1992 | 5.1 | Six compressed speaker feeds | Digital optical data on film |
| Dolby Surround 7.1 | 2010 | 7.1 | Eight speaker feeds | Digital cinema files (home 7.1 via Blu-ray) |
| Dolby Atmos | 2012 | up to 64 feeds | Bed + up to 128 objects + metadata | Digital cinema files; later streaming |
| MPEG-H 3D Audio | 2015 | layout-independent | Channel + object + scene, one bitstream | ISO standard; ATSC 3.0 broadcast |
| Apple Spatial Audio | 2021 | 2 (rendered) | Atmos master → HRTF binaural | Head-tracking earbuds + on-device DSP |
No row is marked as a winner, because no single era "wins": each was right for its delivery medium.
Where Fora Soft fits in
We build video products across conferencing, OTT and Internet TV, e-learning, telemedicine, video surveillance, and AR/VR, and audio is where these systems most often disappoint users first. The history above is not trivia for us: knowing that immersive audio is object-based, not speaker-based, shapes how we design storage, bandwidth, and rendering for a streaming or conferencing platform that wants to ship spatial audio without exploding its CDN bill. In AR/VR work in particular, the scene-based (ambisonics) approach is the right tool, because the listener's head moves and the soundfield must rotate with it. We treat audio as a first-class part of the pipeline, not an afterthought bolted on at the end.
What to read next
- Channel-based vs object-based vs scene-based audio
- Dolby Atmos for film, broadcast, music, and streaming
- Channels and channel layouts: mono, stereo, 5.1, 7.1, 7.1.4
Call to action
- Talk to an audio engineer: book a 30-minute scoping call to talk through your surround or spatial audio plan.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the Multichannel Audio History Timeline — Every milestone from 1881 stereo to 2021 Apple Spatial Audio on one page: dates, formats, channel counts, and what each era's file actually stores.
References
- Recommendation ITU-R BS.775-4 (December 2022), "Multichannel stereophonic sound system with and without accompanying picture" — the reference standard for the 3/2 (5.1) loudspeaker layout and LFE band (20–120 Hz). https://www.itu.int/rec/R-REC-BS.775-4-202212-I/en — Tier 1. Used for the canonical 5.1 layout and LFE definition; overrides popular articles that describe the LFE as a full channel.
- ATSC A/52:2015, "Digital Audio Compression (AC-3) Standard" (24 November 2015). https://www.atsc.org/wp-content/uploads/2016/03/a_52-2015.pdf — Tier 1. Normative specification of AC-3 / Dolby Digital.
- ETSI TS 102 366 V1.4.1 (September 2017), "Digital Audio Compression (AC-3, Enhanced AC-3) Standard." https://www.etsi.org/deliver/etsi_ts/102300_102399/102366/01.04.01_60/ts_102366v010401p.pdf — Tier 1. European normative AC-3 / E-AC-3 specification.
- ISO/IEC 23008-3:2015, "Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 3: 3D audio" (MPEG-H 3D Audio). https://www.iso.org/standard/63878.html — Tier 1. Defines channel-, object-, and scene-based audio in one bitstream.
- SMPTE ST 2098-2:2018, "Immersive Audio Bitstream Specification" (and ST 2098-1/-5). https://www.smpte.org/ — Tier 1. Vendor-neutral immersive-audio bitstream for digital cinema.
- UK Patent 394,325 (filed 14 December 1931, accepted June 1933), A. D. Blumlein, "Improvements in and relating to Sound-transmission, Sound-recording and Sound-reproducing Systems." EMI Archive Trust summary: https://www.emiarchivetrust.org/alan-blumlein-and-the-invention-of-stereo/ — Tier 3 (rights-holder archive). Foundation of stereo: coincident mic, 45/45 groove, matrix.
- Apple Newsroom, "Apple Music announces Spatial Audio and Lossless Audio" (17 May 2021). https://www.apple.com/newsroom/2021/05/apple-music-announces-spatial-audio-and-lossless-audio/ (Tier 3). Apple Spatial Audio launch date and HRTF/head-tracking description.
- Dolby, "Dolby Atmos" technology overview and cinema specifications (128 objects / 64 speaker feeds). https://www.dolby.com/technologies/dolby-atmos/ — Tier 3. Object-based model and capacity figures.
- DTS (Xperi), "Welcome to DTS:X" (9 April 2015). https://dts.com/ — Tier 3. DTS:X object-based, layout-independent launch.
- Barco / Auro Technologies, Auro-3D layer-based immersive audio (cinema formats 2010). https://www.barco.com/ — Tier 3. Layer-based 9.1/11.1/13.1 architecture.
- ETHW (IEEE), "Leopold Stokowski and Bell Labs, a Sound Collaboration" — the 27 April 1933 three-channel transmission. https://ethw.org/Leopold_Stokowski_and_Bell_Labs,_a_Sound_Collaboration — Tier 5.
- EBU, Audio Definition Model — "Types of Audio" (channel / object / scene-based definitions). https://adm.ebu.io/background/audio_types.html — Tier 1 (EBU). Definitional source for the three audio philosophies.
- Quadraphonic sound and Dolby Digital (orientation and dates for SQ/QS/CD-4, Surround EX, 7.1), Wikipedia. https://en.wikipedia.org/wiki/Quadraphonic_sound — Tier 6. Used for orientation only; dates cross-checked against first-party sources where possible.


