When a player or SSAI server needs an ad to play, it sends an HTTP request to an ad server with targeting parameters such as content ID, viewer geography, content duration, and viewer attributes. The ad server returns a VAST response: an XML document listing one or more ads, each with its media files (typically one MP4 or HLS URL per quality tier), tracking URLs for playback and interaction events (impression, quartile progress, completion, click), companion creatives, and rules for which elements must be displayed.
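As a rough illustration of what a consumer pulls out of that response, the sketch below parses a small, made-up VAST document with Python's standard library. The sample XML, every URL in it, and the `parse_vast` helper are assumptions for illustration only; the sketch also assumes the document declares no XML namespace and does not follow Wrapper redirects to other ad servers.

```python
import xml.etree.ElementTree as ET

# Hypothetical VAST response: all IDs and URLs below are invented for illustration.
SAMPLE_VAST = """<VAST version="4.2">
  <Ad id="ad-1">
    <InLine>
      <AdSystem>ExampleAdServer</AdSystem>
      <AdTitle>Example ad</AdTitle>
      <Impression><![CDATA[https://tracker.example/impression]]></Impression>
      <Creatives>
        <Creative>
          <Linear>
            <Duration>00:00:15</Duration>
            <TrackingEvents>
              <Tracking event="firstQuartile"><![CDATA[https://tracker.example/q1]]></Tracking>
              <Tracking event="complete"><![CDATA[https://tracker.example/complete]]></Tracking>
            </TrackingEvents>
            <MediaFiles>
              <MediaFile delivery="progressive" type="video/mp4" bitrate="800" width="640" height="360"><![CDATA[https://cdn.example/ad_360p.mp4]]></MediaFile>
              <MediaFile delivery="progressive" type="video/mp4" bitrate="4500" width="1920" height="1080"><![CDATA[https://cdn.example/ad_1080p.mp4]]></MediaFile>
            </MediaFiles>
          </Linear>
        </Creative>
      </Creatives>
    </InLine>
  </Ad>
</VAST>"""


def parse_vast(xml_text):
    """Return the VAST version plus impressions, tracking URLs, and media files per ad."""
    root = ET.fromstring(xml_text)
    ads = []
    for ad in root.iter("Ad"):
        # Impression URLs are fired once when the ad starts rendering.
        impressions = [e.text.strip() for e in ad.iter("Impression") if e.text]

        # Group tracking URLs by event name; one event may have several URLs.
        tracking = {}
        for t in ad.iter("Tracking"):
            if t.text:
                tracking.setdefault(t.get("event"), []).append(t.text.strip())

        # One entry per rendition (quality tier) of the creative.
        media_files = [
            {
                "url": m.text.strip(),
                "type": m.get("type"),
                "bitrate_kbps": int(m.get("bitrate", "0")),
                "height": int(m.get("height", "0")),
            }
            for m in ad.iter("MediaFile")
            if m.text
        ]
        ads.append(
            {
                "id": ad.get("id"),
                "impressions": impressions,
                "tracking": tracking,
                "media_files": media_files,
            }
        )
    return {"version": root.get("version"), "ads": ads}


if __name__ == "__main__":
    parsed = parse_vast(SAMPLE_VAST)
    for ad in parsed["ads"]:
        for media in ad["media_files"]:
            print(ad["id"], media["height"], media["url"])
```

A real client would additionally fire the tracking URLs at the matching playback moments and handle empty responses (no ad available).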
VAST 4 (latest minor version 4.3 as of 2023) added support for adaptive bitrate ads (multiple bitrate variants per ad creative), longer-form ads (creatives ranging from 15 seconds to 3 minutes), and richer interactivity. Most ad networks (Google Ad Manager, Magnite, Amazon Publisher Services, Xandr) speak VAST 4.x; legacy networks may still emit VAST 3 or 2, so the player needs to handle all three major versions.
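One way a consumer might cope with mixed versions is to read the `version` attribute on the root `<VAST>` element before choosing a parsing path. The following is a minimal sketch under that assumption; `vast_major_version`, `is_supported`, and `SUPPORTED_MAJOR_VERSIONS` are hypothetical names, not part of any SDK.

```python
import xml.etree.ElementTree as ET

# Assumption: the consumer accepts the three major versions named in the text.
SUPPORTED_MAJOR_VERSIONS = {2, 3, 4}


def vast_major_version(xml_text):
    """Return the major VAST version (e.g. 4 for "4.3") declared on the root element."""
    root = ET.fromstring(xml_text)
    # Strip an XML namespace prefix such as "{uri}VAST" if the document declares one.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "VAST":
        raise ValueError(f"not a VAST document: root element is <{local_name}>")
    major = root.get("version", "").split(".")[0]
    if not major.isdigit():
        raise ValueError("missing or malformed version attribute")
    return int(major)


def is_supported(xml_text):
    """True if the document's major version is one this consumer handles."""
    return vast_major_version(xml_text) in SUPPORTED_MAJOR_VERSIONS


if __name__ == "__main__":
    for doc in ('<VAST version="4.3"/>', '<VAST version="2.0"/>', '<VAST version="1.0"/>'):
        print(doc, vast_major_version(doc), is_supported(doc))
```

Branching on the major version only keeps the check tolerant of minor revisions (4.1, 4.2, 4.3) that a network may emit.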
VAST is the wire format; VPAID and IMA are layers on top of it. VPAID (Video Player-Ad Interface Definition) is an older interactive-ad spec that is being phased out (the IAB Tech Lab positions SIMID and Open Measurement as its successors). IMA (Interactive Media Ads) is Google's player-side SDK for parsing VAST and managing ad playback, used by tens of thousands of players. SSAI servers, by contrast, parse VAST themselves and stitch the referenced media URLs into the manifest. Either way, VAST is what travels between the ad server and the downstream consumer.
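To make the SSAI side concrete, the sketch below shows one plausible selection step: given the media files listed in a VAST response, pick the ad rendition that best fits a content bitrate tier before its URL is stitched into the manifest. The `pick_media_file` helper, the dictionary layout, and the URLs are assumptions for illustration; many stitchers instead transcode the creative to match the content renditions rather than using the listed files directly.

```python
# Hypothetical renditions, as they might look after parsing <MediaFile> elements.
RENDITIONS = [
    {"url": "https://cdn.example/ad_360p.mp4", "type": "video/mp4", "bitrate_kbps": 800},
    {"url": "https://cdn.example/ad_720p.mp4", "type": "video/mp4", "bitrate_kbps": 2500},
    {"url": "https://cdn.example/ad_1080p.mp4", "type": "video/mp4", "bitrate_kbps": 4500},
    {"url": "https://cdn.example/ad.webm", "type": "video/webm", "bitrate_kbps": 2000},
]


def pick_media_file(media_files, target_kbps, mime_type="video/mp4"):
    """Pick the highest-bitrate rendition at or below target_kbps.

    Falls back to the lowest-bitrate rendition when every candidate exceeds
    the target, and returns None when no file has the requested MIME type.
    """
    candidates = [m for m in media_files if m["type"] == mime_type]
    if not candidates:
        return None
    at_or_below = [m for m in candidates if m["bitrate_kbps"] <= target_kbps]
    if at_or_below:
        return max(at_or_below, key=lambda m: m["bitrate_kbps"])
    return min(candidates, key=lambda m: m["bitrate_kbps"])


if __name__ == "__main__":
    # Assumed content bitrate ladder, in kbps.
    for tier in (600, 3000, 6000):
        chosen = pick_media_file(RENDITIONS, tier)
        print(tier, chosen["url"])
```

Choosing at or below the target keeps the ad from demanding more bandwidth than the content rendition the viewer is already sustaining.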

