Audio level here refers to the RTP header extensions defined in RFC 6464 (client-to-mixer) and RFC 6465 (mixer-to-client). The sender measures the loudness of each packet and writes it into the header as a 7-bit value in -dBov, where 0 is the loudest and 127 is silence; RFC 6464 adds an optional one-bit voice activity flag, and RFC 6465 carries one level per contributing source of a mixed stream. Their value is that a server or client can tell who is speaking and how loudly without decoding the audio at all, which is cheap and privacy-friendly. An SFU reads the level to forward only the active speakers in a large call instead of everyone, saving bandwidth, and clients use it to drive the 'who's talking' highlight and active-speaker layout. It is a small but pivotal piece of plumbing that lets big conferences scale and gives the UI its speaker indicators essentially for free.
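
As a rough illustration of how a server might consume this value, the sketch below parses the one-byte payload of the RFC 6464 extension (top bit is the voice activity flag, low seven bits are the level in -dBov, so a smaller number means louder) and picks the loudest participants. The names `AudioLevel`, `parse_audio_level`, and `loudest_speakers` are hypothetical, and the ranking is deliberately naive: it looks at a single packet per participant, whereas a real SFU would most likely smooth levels over time before switching speakers.

```python
from typing import Dict, List, NamedTuple


class AudioLevel(NamedTuple):
    voice_activity: bool  # V bit from RFC 6464 (optional; may always be 0)
    level: int            # 0..127, meaning -level dBov: 0 loudest, 127 silence


def parse_audio_level(data: bytes) -> AudioLevel:
    """Decode the data byte of an RFC 6464 audio level extension element."""
    if len(data) < 1:
        raise ValueError("audio level extension needs at least one byte")
    byte = data[0]
    return AudioLevel(voice_activity=bool(byte & 0x80), level=byte & 0x7F)


def loudest_speakers(levels: Dict[str, AudioLevel], n: int) -> List[str]:
    """Hypothetical selection rule: the n loudest non-silent participants.

    Assumption: `levels` maps a participant id to that participant's most
    recent level. Lower level values are louder, so sort ascending.
    """
    audible = [(lvl.level, pid) for pid, lvl in levels.items() if lvl.level < 127]
    return [pid for _, pid in sorted(audible)[:n]]
```

For example, `parse_audio_level(b"\x8a")` yields a level of 10 (that is, -10 dBov) with the voice activity flag set, and no audio decoding is involved at any point.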