A metadata stream is a separate channel of structured data, not pixels, that a camera or analytics engine sends alongside the video to describe what is in the scene. Instead of (or as well as) the picture, it carries machine-readable facts: a person detected at these coordinates, a vehicle of this class, a line crossed, a confidence score, a timestamp. In ONVIF terms, analytics metadata falls under Profile M, with basic events such as motion and tampering alarms also covered by Profile T, so a VMS can receive analytics results over the open standard.
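
To make "machine-readable facts" concrete, here is a minimal sketch of reading one metadata frame in Python. The XML is a simplified, hand-written sample modelled loosely on the ONVIF metadata layout; the element and attribute names should be checked against the actual ONVIF schema before relying on them, and `parse_frame` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# ONVIF schema namespace; the sample below is a simplified illustration,
# not a capture from a real camera.
NS = {"tt": "http://www.onvif.org/ver10/schema"}

SAMPLE_FRAME = """<tt:MetadataStream xmlns:tt="http://www.onvif.org/ver10/schema">
  <tt:VideoAnalytics>
    <tt:Frame UtcTime="2024-05-01T14:03:27Z">
      <tt:Object ObjectId="12">
        <tt:Appearance>
          <tt:Shape>
            <tt:BoundingBox left="-0.2" top="0.5" right="0.1" bottom="-0.3"/>
          </tt:Shape>
          <tt:Class>
            <tt:Type Likelihood="0.87">Human</tt:Type>
          </tt:Class>
        </tt:Appearance>
      </tt:Object>
    </tt:Frame>
  </tt:VideoAnalytics>
</tt:MetadataStream>"""


def parse_frame(xml_text):
    """Return one dict per detected object in the metadata document."""
    root = ET.fromstring(xml_text)
    detections = []
    for frame in root.iterfind(".//tt:Frame", NS):
        utc = frame.get("UtcTime")
        for obj in frame.iterfind("tt:Object", NS):
            box = obj.find(".//tt:BoundingBox", NS)
            kind = obj.find(".//tt:Class/tt:Type", NS)
            detections.append({
                "utc": utc,
                "object_id": obj.get("ObjectId"),
                # Class and confidence are optional: not every object is classified.
                "object_class": kind.text if kind is not None else None,
                "confidence": float(kind.get("Likelihood")) if kind is not None else None,
                # Box as (left, top, right, bottom), kept in the units the sample uses.
                "box": tuple(float(box.get(k)) for k in ("left", "top", "right", "bottom"))
                if box is not None else None,
            })
    return detections


detections = parse_frame(SAMPLE_FRAME)
```

Each resulting dict is a few dozen bytes describing one object at one instant, which is the kind of record a VMS can index without touching the video.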

The metadata stream is what makes a surveillance system searchable and reactive rather than just recorded. Because it is tiny compared with video, it can be transmitted, indexed, and stored cheaply: a camera doing edge analytics can send mostly metadata and almost no video, cutting bandwidth by around 99% in favourable cases while still telling the VMS everything the analytics detected. That index is what later powers forensic search: finding "all red vehicles between 2 and 4 pm" by querying metadata instead of replaying hours of footage.
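
The forensic search described above can be sketched as a plain database query. This is an illustration only: the table layout, the column names, the sample records, and the helpers `build_index` and `red_vehicles_between` are all assumptions made for the example, not how any particular VMS stores its metadata.

```python
import sqlite3

# Hypothetical detection records: (utc, camera, object_class, colour, confidence).
RECORDS = [
    ("2024-05-01T13:55:10Z", "gate", "Vehicle", "red", 0.91),
    ("2024-05-01T14:20:42Z", "gate", "Vehicle", "red", 0.88),
    ("2024-05-01T14:31:05Z", "gate", "Vehicle", "blue", 0.93),
    ("2024-05-01T15:12:00Z", "lobby", "Human", None, 0.79),
    ("2024-05-01T16:05:30Z", "gate", "Vehicle", "red", 0.90),
]


def build_index(records):
    """Load detection records into an in-memory, indexed table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE detections ("
        "utc TEXT, camera TEXT, object_class TEXT, colour TEXT, confidence REAL)"
    )
    # Index on the columns the search filters by, so no full scan is needed.
    db.execute("CREATE INDEX idx_class_time ON detections (object_class, utc)")
    db.executemany("INSERT INTO detections VALUES (?, ?, ?, ?, ?)", records)
    return db


def red_vehicles_between(db, start, end):
    """Return (utc, camera) for red vehicles seen in the half-open range [start, end)."""
    cursor = db.execute(
        "SELECT utc, camera FROM detections "
        "WHERE object_class = 'Vehicle' AND colour = 'red' "
        "AND utc >= ? AND utc < ? ORDER BY utc",
        (start, end),
    )
    return cursor.fetchall()


db = build_index(RECORDS)
# "All red vehicles between 2 and 4 pm" on the sample day.
hits = red_vehicles_between(db, "2024-05-01T14:00:00Z", "2024-05-01T16:00:00Z")
```

Timestamps are stored as ISO 8601 strings in a single fixed format, so ordinary string comparison matches chronological order. The query returns timestamps and camera names that point the operator at the right minutes of footage instead of hours of it.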

The pitfall is forgetting that the metadata is only as good as the analytics that produced it and the standard that carries it. ONVIF standardises the transport of the metadata, not its accuracy, and many advanced analytics emit vendor-defined metadata rather than standardised types — so a metadata stream can be rich but proprietary, and you can only search later for what was actually indexed at record time.
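
One way to see the "rich but proprietary" problem is to look at what survives normalisation into a common index. The sketch below is purely illustrative: `vendor_a`, its key names, and the `normalise` helper are invented for the example and do not correspond to any real product or to the ONVIF schema.

```python
# Fields the (hypothetical) search index knows how to store and query.
COMMON_FIELDS = {"utc", "object_class", "confidence"}

# Hypothetical mapping from one vendor's payload keys to the common fields.
VENDOR_KEY_MAP = {
    "vendor_a": {"ts": "utc", "label": "object_class", "score": "confidence"},
}


def normalise(vendor, payload):
    """Split a vendor payload into indexed fields and leftover vendor-only fields."""
    key_map = VENDOR_KEY_MAP.get(vendor, {})
    indexed, unindexed = {}, {}
    for key, value in payload.items():
        common = key_map.get(key)
        if common in COMMON_FIELDS:
            indexed[common] = value
        else:
            # Not mapped to a common field: later searches cannot filter on it.
            unindexed[key] = value
    return indexed, unindexed


indexed, unindexed = normalise(
    "vendor_a",
    {
        "ts": "2024-05-01T14:20:42Z",
        "label": "Vehicle",
        "score": 0.88,
        "plate_region": "EU",  # vendor-specific attribute
        "dwell_s": 42,         # vendor-specific attribute
    },
)
```

Here the vendor-specific attributes end up outside the index, so a later query on them returns nothing, however accurate the camera's analytics were.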