MPEG-H 3D Audio: the ISO Immersive Standard

Why this matters

If you build an OTT service, an Internet-TV product, a music app, or anything that delivers sound to modern televisions and headphones, you need to understand MPEG-H 3D Audio before you plan a delivery pipeline. It is one of the two Next Generation Audio systems standardized for ATSC 3.0 broadcast, and it already ships in national television and consumer music. This article is written for a product manager, founder, or operations lead with no audio background. By the end you will understand what MPEG-H is, the three kinds of sound it carries in one stream, how personalization works, where it is deployed in 2026, and how it compares to Dolby AC-4. Technical numbers trace back to the controlling standards, ISO/IEC 23008-3 and ATSC A/342 Part 3, or to the developers' published figures.

The open standard built for immersive, interactive sound

For most of audio history, a "channel" meant a speaker. Stereo is two channels — left and right. A 5.1 surround mix is six channels poured into six specific speakers. The whole system breaks the moment the listener's setup does not match: a 5.1 mix on stereo headphones, or on a soundbar with the speakers in the wrong place, is a compromise the mixer never heard. The world MPEG-H was built for is exactly that messy world — phones, headphones, soundbars, and home cinemas, all different, all expecting good sound.

MPEG-H 3D Audio is the answer from the ISO/IEC Moving Picture Experts Group — the standards body known as MPEG, the same group behind MP3, AAC, and the HEVC video codec. It is formally published as ISO/IEC 23008-3, "MPEG-H Part 3: 3D audio"; the current edition is ISO/IEC 23008-3:2026 (ISO catalogue). MPEG issued the call for proposals in January 2013, and in February 2015 announced the technology would be published as an International Standard (MPEG, February 2015). The technology was developed and is championed by the MPEG-H Audio Alliance — Fraunhofer IIS, Technicolor, and Qualcomm — with Fraunhofer IIS as the primary developer (Fraunhofer IIS, MPEG-H Audio).

Like AAC and Dolby's codecs, MPEG-H is built on a mathematical tool called the Modified Discrete Cosine Transform, or MDCT, which turns a slice of sound into frequency ingredients and spends bits only where the ear will notice — the same family of math explained in how audio compression works. MPEG-H uses an improved version of that transform. But the compression engine is not the point. The point is what the stream is allowed to carry.
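
To make "frequency ingredients" concrete, here is a textbook direct-form MDCT in Python with a sine window. It is a minimal sketch for intuition, not MPEG-H's improved transform: the real codec adds block switching, other window shapes, quantization, and fast FFT-based implementations, none of which appear here, and the function names are ours. The round trip at the end shows the property that makes the MDCT usable in codecs: each 2N-sample block yields only N coefficients, yet with a Princen-Bradley window and 50% overlap the aliasing in neighbouring blocks cancels and the input comes back exactly (before any quantization).

```python
import math


def sine_window(length: int) -> list[float]:
    """Sine window of length 2N; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame: list[float]) -> list[float]:
    """Direct O(N^2) MDCT: 2N time samples in, N frequency coefficients out."""
    n = len(frame) // 2
    shift = 0.5 + n / 2
    return [
        sum(x * math.cos(math.pi / n * (i + shift) * (k + 0.5)) for i, x in enumerate(frame))
        for k in range(n)
    ]


def imdct(coeffs: list[float]) -> list[float]:
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out (2/N scaling)."""
    n = len(coeffs)
    shift = 0.5 + n / 2
    return [
        2.0 / n * sum(c * math.cos(math.pi / n * (i + shift) * (k + 0.5)) for k, c in enumerate(coeffs))
        for i in range(2 * n)
    ]


def roundtrip(signal: list[float], n: int) -> list[float]:
    """Window, transform, inverse-transform, and overlap-add with a hop of n samples."""
    window = sine_window(2 * n)
    tail = (-len(signal)) % n                          # pad to a whole number of hops
    padded = [0.0] * n + list(signal) + [0.0] * (tail + n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        frame = [w * x for w, x in zip(window, padded[start:start + 2 * n])]
        block = imdct(mdct(frame))
        for i, (w, y) in enumerate(zip(window, block)):
            out[start + i] += w * y                    # synthesis window, then overlap-add
    return out[n:n + len(signal)]


signal = [math.sin(0.3 * t) + 0.25 * math.cos(1.1 * t) for t in range(40)]
rebuilt = roundtrip(signal, 8)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # tiny, floating-point rounding only
```

A codec's savings come from what it does between `mdct` and `imdct`: coarsely quantizing or dropping coefficients the ear will not miss.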

The one fact to hold onto: MPEG-H is not just a more efficient codec — it is a flexible audio system that carries channels, objects, and a recorded sound field in one stream, and hands the viewer the controls.

Figure 1. MPEG-H from a 2013 call for proposals to a 2026 broadcast standard. The Korean 2017 launch and Sony's 2019 music service are the two deployments that made it real.

Three kinds of sound in one stream

This is the heart of MPEG-H, and it is the thing no older broadcast codec does. MPEG-H can represent sound three different ways at the same time, and mix them freely (ISO/IEC 23008-3; Wikipedia, MPEG-H 3D Audio).

Channel-based audio. This is the traditional way: a fixed set of channels, each aimed at a specific speaker — stereo, 5.1, 7.1, or a 3-D layout like 7.1.4 with overhead speakers. Channels are still the right tool for a finished music mix or a film bed where the artist decided exactly where each sound sits. The idea, and how channel layouts are named, is covered in channels and channel layouts.

Object-based audio. An audio object is a single sound — a voice, an instrument, a sound effect — tagged with metadata that says where it should be in 3-D space and how loud it is. The object is not tied to a speaker. At playback, the MPEG-H renderer places it correctly for whatever speakers (or headphones) the listener actually has. Because the object is a separate element with its own controls, the viewer can be allowed to move it or change its volume — which is how personalization works, below.
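
A minimal Python sketch of the idea: a mono object carries its own position and gain metadata, and speaker feeds are computed only at playback. It uses a simple constant-power stereo pan as a stand-in renderer; it is not the MPEG-H renderer, which targets arbitrary 3-D layouts. The `AudioObject` class, the ±30° stereo span, and the "positive azimuth = left" convention are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical object: mono samples plus position and gain metadata."""
    name: str
    samples: list[float]
    azimuth_deg: float          # assumed convention: 0 = centre, positive = left
    gain_db: float = 0.0


def pan_gains(azimuth_deg: float, half_span_deg: float = 30.0) -> tuple[float, float]:
    """Constant-power (sine/cosine) stereo pan; returns (left, right) gains."""
    az = max(-half_span_deg, min(half_span_deg, azimuth_deg))   # clamp to the speaker pair
    theta = (az + half_span_deg) / (2 * half_span_deg) * (math.pi / 2)
    return math.sin(theta), math.cos(theta)                      # left^2 + right^2 == 1


def render_to_stereo(objects: list[AudioObject]) -> tuple[list[float], list[float]]:
    """Mix every object into left/right feeds using only its own metadata."""
    length = max(len(o.samples) for o in objects)
    left, right = [0.0] * length, [0.0] * length
    for obj in objects:
        gain = 10 ** (obj.gain_db / 20)                           # dB to linear
        g_left, g_right = pan_gains(obj.azimuth_deg)
        for i, s in enumerate(obj.samples):
            left[i] += gain * g_left * s
            right[i] += gain * g_right * s
    return left, right


voice = AudioObject("dialogue", [1.0, 0.5, -0.25], azimuth_deg=0.0, gain_db=6.0)
effect = AudioObject("door", [0.2, 0.2, 0.2], azimuth_deg=30.0)
left, right = render_to_stereo([voice, effect])
```

Changing `azimuth_deg` or `gain_db` on one object moves or rebalances that sound without touching anything else in the mix, which is the property personalization relies on.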

Scene-based audio. This is a recording of the entire sound field around a point, stored in a format called Higher Order Ambisonics, or HOA. Instead of describing individual sources, it captures "what the air sounds like" at a location and can be rotated and rendered to any speaker layout. Scene-based audio is the natural fit for virtual and augmented reality, where the listener turns their head and the whole soundscape must turn with them, explored in Ambisonics, HRTF, and binaural rendering.

The three philosophies — channels, objects, scenes — and where each one fits are the subject of channel-based vs object-based vs scene-based audio. MPEG-H's distinction is that it does not force a choice: a single bitstream can carry a 5.1 channel bed for the music and effects, a handful of dialogue and commentary objects on top, and HOA components for ambience, all decoded together.
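
One way to picture "one stream, three sound types" is a single description listing a channel bed, a set of objects, and an HOA order, as in this Python sketch. The `StreamDescription` class and the bed table are hypothetical, not an MPEG-H API. The tally uses the standard full-sphere Ambisonics rule that order N needs (N + 1)² signals and ignores any HOA-specific compression the codec may apply, so treat it as a rough count of the audio signals a decoder must handle.

```python
from dataclasses import dataclass, field
from typing import Optional

# Signals per channel bed, counting the LFE as one channel (illustrative subset)
BED_CHANNELS = {"2.0": 2, "5.1": 6, "7.1": 8, "7.1.4": 12}


@dataclass
class StreamDescription:
    """Hypothetical summary of one stream's contents; not an MPEG-H API."""
    bed_layout: Optional[str] = None                   # channel-based part, e.g. "5.1"
    objects: list[str] = field(default_factory=list)   # object-based part, one mono signal each (assumed)
    hoa_order: Optional[int] = None                    # scene-based part (Higher Order Ambisonics)

    def signal_count(self) -> int:
        bed = BED_CHANNELS[self.bed_layout] if self.bed_layout else 0
        hoa = (self.hoa_order + 1) ** 2 if self.hoa_order is not None else 0
        return bed + len(self.objects) + hoa


sports_stream = StreamDescription(
    bed_layout="5.1",
    objects=["dialogue", "commentary_home", "commentary_away"],
    hoa_order=1,
)
print(sports_stream.signal_count())  # 6 bed + 3 objects + 4 HOA = 13
```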

Figure 2. One MPEG-H stream, three sound types, one renderer. The renderer adapts the mix to whatever the listener actually has: a 7.1.4 cinema, a soundbar, or stereo headphones.

How personalization actually works

The feature viewers feel first is personalization, and it falls out naturally from objects. Because dialogue, commentary, and effects can travel as separate objects rather than baked into one mix, the decoder can expose them as choices.

Three concrete examples, all of which ship today. A sports broadcast can carry the home commentary and the away commentary as two objects and let you pick, or offer a "stadium only" option so you can switch the commentator off and hear just the crowd. A viewer who is hard of hearing can turn up the dialogue object relative to the music and effects, the same accessibility feature AC-4 offers. A drama can ship an audio-description track (narration of the on-screen action for blind and partially sighted viewers, covered in multi-language and descriptive audio) as one more small object the decoder mixes in on demand, instead of a whole second program.

The broadcaster sets the limits: which objects the viewer may touch, and how far. The decoder presents only the allowed choices, so the artistic intent is protected while the viewer still gets meaningful control. None of this is possible once dialogue and effects are mixed into the same channels, which is why object-based audio, more than raw coding efficiency, is a central reason broadcasters adopted next-generation systems.
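
A minimal Python sketch of that contract, assuming the broadcaster's limits arrive as per-object metadata: an on/off permission and a gain range in dB. `InteractivityLimits` and `apply_user_choice` are hypothetical names and the limit values are illustrative; the real MPEG-H metadata syntax is defined in ISO/IEC 23008-3 and is not reproduced here. The design choice that matters is that the viewer's request is clamped, never trusted, so no setting can push the mix outside what the broadcaster approved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractivityLimits:
    """Hypothetical broadcaster-set limits for one object (field names are ours)."""
    may_switch_off: bool
    min_gain_db: float
    max_gain_db: float


def apply_user_choice(limits: InteractivityLimits, want_on: bool, want_gain_db: float) -> tuple[bool, float]:
    """Return the (enabled, gain_db) the decoder actually uses after enforcing limits."""
    enabled = want_on if limits.may_switch_off else True   # assumes objects default to on
    gain_db = min(max(want_gain_db, limits.min_gain_db), limits.max_gain_db)
    return enabled, gain_db


dialogue = InteractivityLimits(may_switch_off=False, min_gain_db=0.0, max_gain_db=9.0)
commentary = InteractivityLimits(may_switch_off=True, min_gain_db=-6.0, max_gain_db=6.0)

print(apply_user_choice(dialogue, want_on=False, want_gain_db=15.0))   # (True, 9.0)
print(apply_user_choice(commentary, want_on=False, want_gain_db=0.0))  # (False, 0.0)
```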

Profiles, levels, and how big the sound can get

A flexible system needs guard rails so that a cheap television and a premium receiver can both claim to support "MPEG-H" and still interoperate. MPEG-H provides those through profiles and levels.

The Main profile is the full toolkit, defined with five levels that cap how much sound a decoder must handle (ISO/IEC 23008-3 DAM 3D Audio Profiles, MPEG). The table below shows the ceiling at each level.

Level	Max core channels	Max loudspeaker channels
1	8	8
2	16	16
3	32	24
4	64	24
5	128	64

Table 1. The five levels of the MPEG-H 3D Audio Main profile. "Core channels" is how many compressed audio streams the decoder must handle; "loudspeaker channels" is how many speakers it can render to. Source: ISO/IEC 23008-3 3D Audio Profiles (MPEG).
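
Table 1 can be read as a lookup: find the lowest level whose two ceilings both cover a program. The Python sketch below copies only the two columns shown in the table; real levels may constrain other parameters too, which this ignores, so treat it as a first-pass sizing check rather than a conformance test. The function name is hypothetical.

```python
from typing import Optional

# (level, max core channels, max loudspeaker channels), copied from Table 1
MAIN_PROFILE_LEVELS = [(1, 8, 8), (2, 16, 16), (3, 32, 24), (4, 64, 24), (5, 128, 64)]


def minimum_main_level(core_channels: int, loudspeaker_channels: int) -> Optional[int]:
    """Lowest Main-profile level whose Table 1 ceilings both cover the program."""
    for level, max_core, max_speakers in MAIN_PROFILE_LEVELS:
        if core_channels <= max_core and loudspeaker_channels <= max_speakers:
            return level
    return None  # exceeds every level in the table


print(minimum_main_level(13, 12))   # 2: a 7.1.4 room fed by 13 signals
print(minimum_main_level(24, 24))   # 3: a 22.2 layout (24 loudspeakers)
print(minimum_main_level(200, 8))   # None
```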

At the top, MPEG-H supports up to 64 loudspeaker channels and 128 codec core channels (ISO/IEC 23008-3). That is far beyond anything a home needs, which is the point: the same standard scales from headphones to a 22.2-channel (24-loudspeaker) installation.

For broadcast, MPEG handled the toolkit's complexity by defining a Low Complexity (LC) profile in Amendment 3 (finalized late 2016), which adds coding efficiency and broadcast-specific features while staying cheap enough to put in a television chip (Wikipedia, MPEG-H 3D Audio; Fraunhofer IIS). A further subset called the Baseline profile maximizes interoperability and cuts the implementation and testing effort, and is the profile Sony's 360 Reality Audio uses on consumer devices (Fraunhofer IIS, Sony licenses MPEG-H Audio Decoder, October 2021). In practice, broadcast and streaming devices implement the LC or Baseline profile, not the full Main profile.

A worked example: the bandwidth of personalization

Personalization is not free, but it is cheaper than the obvious alternative. Suppose a sports broadcast wants a 5.1 stadium bed plus two commentary-language options, and assume illustrative bitrates of 192 kbit/s for a 5.1 mix and 48 kbit/s for a mono object. The brute-force approach ships two complete 5.1 mixes, one per language, and lets the player choose:

two full 5.1 mixes = 2 × 192 kbit/s = 384 kbit/s

The MPEG-H approach ships one 5.1 bed plus two mono commentary objects:

one 5.1 bed + two mono objects = 192 kbit/s + (2 × 48 kbit/s) = 288 kbit/s

That is a 25% saving in this example (96 of 384 kbit/s), and the saving grows with every extra language, because each new option is one small object rather than a whole new surround mix. The exact numbers depend on the encoder, but the structural point holds: objects let you add personalization without multiplying the whole program.
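
The same arithmetic generalizes to any number of languages. A short Python sketch, reusing the illustrative 192 and 48 kbit/s figures (assumptions, not encoder guidance); the function name is ours:

```python
def personalization_bitrates(languages: int, bed_kbps: int = 192, object_kbps: int = 48):
    """Compare one full 5.1 mix per language with one shared bed plus one mono object per language."""
    brute_force = languages * bed_kbps
    object_based = bed_kbps + languages * object_kbps
    saving = 1 - object_based / brute_force
    return brute_force, object_based, saving


for n in (2, 3, 4):
    full, objects, saving = personalization_bitrates(n)
    print(f"{n} languages: {full} vs {objects} kbit/s, saving {saving:.0%}")
# 2 languages: 384 vs 288 kbit/s, saving 25%
# 3 languages: 576 vs 336 kbit/s, saving 42%
# 4 languages: 768 vs 384 kbit/s, saving 50%
```

As the number of languages grows, the object-based total approaches a quarter of the brute-force total, so with these figures the saving tends toward 75%.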

Where MPEG-H actually lives in 2026

MPEG-H is a broadcast-and-music system, and that is where it ships. Its anchor deployment is South Korea. The Korean Telecommunications Technology Association published its terrestrial UHD TV standard in June 2016, based on ATSC 3.0, and named MPEG-H 3D Audio as the sole audio codec for the 4K service that launched in February 2017 (TTA, June 2016; Fraunhofer IIS). It has been on air 24/7 ever since, and Korean broadcaster SBS has used it to carry immersive audio for events like the World Cup (TV Tech). Korea is the proof that an entire national broadcast system can run on MPEG-H.

In the United States, MPEG-H is one of two audio systems standardized for ATSC 3.0 ("NextGen TV"). The ATSC defines it in A/342 Part 3, "MPEG-H System", and defines Dolby's rival in A/342 Part 2, "AC-4 System" — both are valid "Next Generation Audio" systems in the A/342 family. The current MPEG-H revision is A/342:2026-04, approved 14 April 2026 (ATSC A/342-3). But the US market split the other way: the North American Broadcasters Association recommended AC-4 as the preferred over-the-air audio codec back in May 2016, and most US NextGen TV broadcasts have used AC-4 since, covered in AC-4 — Dolby's next-generation broadcast codec. So the two systems coexist in the same standard but won different regions: MPEG-H owns Korea, AC-4 leads the US.

Outside television, MPEG-H's biggest consumer footprint is music. Sony's 360 Reality Audio, introduced in January 2019, is an object-based immersive music format built on MPEG-H: individual sounds such as vocals, instruments, and even the live audience are placed as objects in a 360-degree sphere around the listener (Sony, January 2019). Sony licensed Fraunhofer's MPEG-H Baseline Profile decoder and joined the MPEG-H trademark program in October 2021, putting MPEG-H decoding into a wide range of consumer devices (Fraunhofer IIS, October 2021).

A note on the trademark program: Fraunhofer runs an MPEG-H trademark that identifies products proven to interoperate, the same role the "Dolby" badge plays for Dolby formats (Fraunhofer IIS, January 2017). For a product team, that badge is the practical signal that a device will actually decode an MPEG-H stream correctly.

Common mistake: assuming "ATSC 3.0 audio" means one codec

The most common planning error is treating ATSC 3.0 audio as a single thing. It is not. ATSC 3.0 standardizes two Next Generation Audio systems — MPEG-H and AC-4 — and a given market, broadcaster, or device may support one, the other, or both. A receiver certified for AC-4 will not necessarily decode MPEG-H, and vice versa. If your product has to ingest or play ATSC 3.0 audio across regions, you cannot assume one codec: a stream authored for Korean broadcast is MPEG-H, a US NextGen TV stream is most likely AC-4, and a robust pipeline must detect which system a stream carries and route it to the matching decoder. Shipping an MPEG-H-only player to a US NextGen audience, or an AC-4-only player to Korea, guarantees silence on a large slice of content.
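
A minimal Python sketch of the detect-and-route step, assuming the stream is packaged as ISO BMFF or DASH and advertises its codec with an RFC 6381-style codecs string whose first dot-separated field is the sample-entry code: `mha1`/`mhm1` (and the `mha2`/`mhm2` variants) for MPEG-H, `ac-4` for AC-4, `mp4a` for MPEG-4 audio such as AAC. The decoder labels and both helper functions are hypothetical, and the example codecs strings are illustrative.

```python
# Sample-entry code (first field of an RFC 6381 codecs string) -> decoder label.
# The labels on the right are hypothetical names for whatever decoders you ship.
DECODER_FOR_CODE = {
    "mha1": "mpegh", "mha2": "mpegh", "mhm1": "mpegh", "mhm2": "mpegh",  # MPEG-H 3D Audio
    "ac-4": "ac4",                                                         # Dolby AC-4
    "ec-3": "eac3", "ac-3": "ac3",                                         # legacy Dolby
    "mp4a": "aac",                                                         # MPEG-4 audio, typically AAC
}


def route_audio(codecs: str) -> str:
    """Pick a decoder from a codecs string such as 'mhm1.0x0D' or 'ac-4.02.01.01'."""
    code = codecs.strip().split(".", 1)[0].lower()
    try:
        return DECODER_FOR_CODE[code]
    except KeyError:
        raise ValueError(f"no decoder for audio codec {codecs!r}") from None


def playable(codecs: str, device_decoders: set[str]) -> bool:
    """True if this device has the decoder the stream needs."""
    try:
        return route_audio(codecs) in device_decoders
    except ValueError:
        return False


print(route_audio("mhm1.0x0D"))                 # mpegh
print(playable("mhm1.0x0D", {"ac4", "aac"}))    # False: an AC-4-only player stays silent
```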

MPEG-H vs AC-4 at a glance

Both are object-capable "Next Generation Audio" systems standardized inside ATSC 3.0, and for most practical purposes they solve the same problems — immersive sound, dialogue control, accessibility — by different routes. The table summarizes the differences that matter for a delivery decision.

Property	MPEG-H 3D Audio	Dolby AC-4
Standard body	ISO/IEC MPEG (open)	Dolby / ETSI
Spec	ISO/IEC 23008-3	ETSI TS 103 190
Audio types	Channel + object + scene (HOA)	Channel + object
Max loudspeaker channels	64 (128 core)	5.1 core → 7.1.4 + objects
Personalization	Yes (object-based)	Yes (object-based)
ATSC 3.0 role	A/342 Part 3	A/342 Part 2
Anchor broadcast market	South Korea (sole codec)	United States (NABA pick)
Flagship music use	Sony 360 Reality Audio	— (AC-4 is broadcast-first)

Table 2. MPEG-H 3D Audio vs Dolby AC-4. Both are object-based NGA systems in ATSC 3.0; the headline differences are MPEG-H's open ISO governance and native scene-based (HOA) support versus AC-4's Dolby ecosystem. Standards numbers from ISO/IEC and ETSI; deployment from TTA, NABA, and Sony.

The deeper difference is governance and reach. MPEG-H is an open ISO/IEC standard with multiple implementers and native support for scene-based audio, which makes it attractive for VR/AR and for markets that prefer a non-proprietary path. AC-4 is the continuation of the Dolby ecosystem that already owns the living room, which is its own kind of advantage. Neither is "better" in the abstract; the right one depends on which market and which devices you have to reach.

Where Fora Soft fits in

We build OTT and Internet-TV products, video-streaming and conferencing platforms, telemedicine and e-learning systems, and surveillance and AR/VR software, and the choice of audio system comes up whenever a product touches modern televisions or immersive media. When a client ships to Korean broadcast, to an immersive-music experience, or to the newest smart TVs, MPEG-H enters the conversation, and the engineering work is practical: detecting which Next Generation Audio system a stream carries, routing it to the right decoder, preserving the object metadata that personalization depends on through every transcode, and keeping a sensible fallback to AAC or Dolby for devices that cannot decode MPEG-H. We have made those trade-offs across streaming, broadcast, and AR/VR builds since 2005.

Call to action

Talk to an audio engineer — book a 30-minute scoping call to talk through your MPEG-H 3D Audio plan.
See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
Download the MPEG-H 3D Audio cheat sheet — One page: the three audio types (channel, object, scene/HOA), the five Main-profile levels, the MPEG-H vs AC-4 comparison, deployment status, and the 'ATSC 3.0 audio is not one codec' pitfall.

References

ISO/IEC 23008-3:2026, Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 3: 3D audio. The controlling specification for MPEG-H 3D Audio. Current edition; the source for the three audio representation types (channel, object, HOA), the improved MDCT core, and the 64-loudspeaker / 128-core-channel ceiling. https://www.iso.org/standard/90199.html
ISO/IEC 23008-3 DAM, 3D Audio Profiles (MPEG). First-party source for the Main profile's five levels and the per-level core-channel and loudspeaker-channel maxima in Table 1. https://web.archive.org/web/20160304052009/http://mpeg.chiariglione.org/standards/mpeg-h/3d-audio/text-isoiec-23008-3dam-1-3d-audio-profiles
ATSC A/342:2026-04 Part 3, ATSC Standard: A/342 Part 3, MPEG-H System (approved 14 April 2026). The North American broadcast specification defining MPEG-H as one of the two ATSC 3.0 Next Generation Audio systems, alongside AC-4 in Part 2. Source for the ATSC 3.0 role and the coexistence of MPEG-H and AC-4. https://www.atsc.org/atsc-documents/a342-part-32017-mpeg-h-system/
Bleidt, R. L., Sen, D., Niedermeier, A., Czelhan, B., Füg, S., et al., Development of the MPEG-H TV Audio System for ATSC 3.0, IEEE Transactions on Broadcasting, 63(1):202–236 (2017), doi:10.1109/TBC.2017.2661258. The primary peer-reviewed paper from the system's developers; source for the broadcast architecture, the personalization model, and the profile design. https://www.iis.fraunhofer.de/content/dam/iis/en/doc/ame/Conference-Paper/BleidtR-IEEE-2017-Development-of-MPEG-H-TV-Audio-System-for-ATSC-3-0.pdf
Fraunhofer IIS, MPEG-H Audio (product page, accessed 2026-06-05). First-party source from the primary developer for the channel/object/scene model, personalization features, the Low Complexity and Baseline profiles, and the trademark program. https://www.iis.fraunhofer.de/en/ff/amm/broadcast-streaming/mpegh.html
Telecommunications Technology Association (TTA), Transmission and Reception for Terrestrial UHDTV Broadcasting Service (TTAK.KO-07.0127, 24 June 2016). The Korean standard, based on ATSC 3.0, that specifies MPEG-H 3D Audio as the sole audio codec for the terrestrial 4K service launched in February 2017. https://web.archive.org/web/20170926190807/http://www.tta.or.kr/English/new/standardization/eng_ttastddesc.jsp?stdno=TTAK.KO-07.0127
Fraunhofer IIS, World's 1st Terrestrial UHD TV Service With MPEG-H Audio (Fraunhofer Audio Blog, 30 June 2016). First-party confirmation of the Korean MPEG-H deployment and its 24/7 on-air status. https://www.audioblog.iis.fraunhofer.com/mpeg-h-audio-korea/
Sony, Sony Introduces All New "360 Reality Audio" (press release, 8 January 2019). Source for 360 Reality Audio as an object-based immersive music format built on MPEG-H 3D Audio. https://www.prnewswire.com/news-releases/sony-introduces-all-new-360-reality-audio-music-experience-that-immerses-listeners-in-a-three-dimensional-sound-field-powered-by-object-based-spatial-audio-technology-300774115.html
Fraunhofer IIS, Sony has Licensed MPEG-H Audio Decoder Software and joined the MPEG-H Trademark Program (22 October 2021). Source for Sony's licensing of the Baseline Profile decoder and the role of the MPEG-H trademark program. https://www.iis.fraunhofer.de/en/pr/2021/20211022_sony_audio_decoder_trademark_program.html
North American Broadcasters Association (NABA), recommendation of Dolby AC-4 as preferred OTA audio codec (May 2016), as reported by The Broadcast Bridge. Source for the US market preference for AC-4 over MPEG-H within ATSC 3.0; used only for the regional-adoption framing, with the standards facts cross-checked against ATSC A/342. https://www.thebroadcastbridge.com/content/entry/7989/dolby-ac-4-recommended-for-north-american-broadcasters-migrating-to-atsc

Related glossary terms

AC-4

Channel

MPEG-H 3D Audio

Scene-based audio

Stereo

Audio codec

Mono

AAC