Scene-based audio represents the entire sound field around a point as a single mathematical whole, independent of any speaker layout. In practice this almost always means Ambisonics. Instead of channels or objects, it stores the field's spherical-harmonic coefficients (at first order, one omnidirectional component and three figure-of-eight components along the X, Y, and Z axes), and a decoder renders that field to whatever output exists: a speaker array, or binaural headphones via HRTFs. Its defining strength is rotation: because the whole field is described coherently, turning it is a small linear transform applied to the coefficients, so it can be done cheaply and accurately in response to head tracking. That is exactly what VR, AR, and 360-degree video need so that the soundscape stays fixed in the world as the viewer looks around. Scene-based audio sits alongside object-based and channel-based audio as one of three representation paradigms, and formats like MPEG-H can carry all three in a single stream.
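
To make the rotation claim concrete, here is a minimal sketch of yaw rotation for a single first-order Ambisonics frame. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization) and azimuth measured counterclockwise from straight ahead. The function names encode_foa, rotate_yaw, and compensate_head_yaw are hypothetical, chosen for illustration and not taken from any particular library. Pitch and roll, and orders above the first, need larger rotation matrices that are not shown here.

```python
import math


def encode_foa(sample, azimuth, elevation=0.0):
    """Encode one mono sample as a first-order frame (assumed AmbiX: W, Y, Z, X; SN3D).

    Angles are in radians; azimuth is counterclockwise from straight ahead.
    """
    w = sample
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    return (w, y, z, x)


def rotate_yaw(frame, angle):
    """Rotate the whole sound field counterclockwise about the vertical axis.

    W (omnidirectional) and Z (vertical) are unchanged; X and Y mix like
    the coordinates of a 2D vector.
    """
    w, y, z, x = frame
    c, s = math.cos(angle), math.sin(angle)
    return (w, x * s + y * c, z, x * c - y * s)


def compensate_head_yaw(frame, head_yaw):
    """Counter-rotate the field so sources stay fixed in the world as the head turns."""
    return rotate_yaw(frame, -head_yaw)
```

As a usage example under the same assumptions, compensate_head_yaw(encode_foa(1.0, 0.0), math.radians(90)) takes a source encoded straight ahead and, for a listener who has turned 90 degrees to the left, moves it to the listener's right, which is where a world-fixed source should appear. The cost per frame is four multiplications and two additions, regardless of how many sources were mixed into the field.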

