Object classification assigns a label to a detected object — deciding that this box is a "person", that one a "car", a "truck", a "bag", or a "dog". It is the partner to detection: detection finds and locates objects, classification names them. In many modern models the two happen together, but conceptually they answer different questions — "where is it?" versus "what is it?" — and classification is what lets the system reason in categories.

Classification is what makes analytics rules meaningful. "Alert on vehicles in the pedestrian zone" or "ignore animals on the perimeter line" both depend on correctly naming the object, so the quality of classification directly drives the quality of every downstream rule and search filter. Classification can run anywhere from the camera's NPU to server-side models, and the resulting class is surfaced as metadata over ONVIF Profile M, where it becomes a searchable attribute.
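To make the dependency concrete, here is a minimal Python sketch of the two example rules. The `DetectedObject` type, the zone names, and the class groupings are hypothetical and chosen only for illustration; a real analytics engine would define its own schema and rule syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    """Hypothetical record for one classified detection."""
    label: str         # class name assigned by the classifier, e.g. "car"
    confidence: float  # classifier score in [0, 1]
    zone: str          # hypothetical name of the zone the object is in


# Assumed groupings of class labels into rule categories.
VEHICLE_CLASSES = {"car", "truck"}
ANIMAL_CLASSES = {"dog"}


def should_alert(obj: DetectedObject) -> bool:
    """Apply the two example rules based purely on the class label."""
    if obj.zone == "pedestrian_zone":
        # "Alert on vehicles in the pedestrian zone"
        return obj.label in VEHICLE_CLASSES
    if obj.zone == "perimeter":
        # "Ignore animals on the perimeter line"; anything else alerts
        return obj.label not in ANIMAL_CLASSES
    return False


if __name__ == "__main__":
    car = DetectedObject("car", 0.91, "pedestrian_zone")
    dog = DetectedObject("dog", 0.84, "perimeter")
    # A truck mislabelled as "bag" slips past the vehicle rule.
    mislabelled = DetectedObject("bag", 0.52, "pedestrian_zone")
    for obj in (car, dog, mislabelled):
        print(obj.label, obj.zone, should_alert(obj))
```

Every branch in the sketch keys off `label`, so a single mislabel changes the rule outcome even when the object was detected and located correctly.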

The pitfall is treating the label as certain. Classification returns the most likely class with a confidence score, not a fact: similar shapes get confused (a van for a truck, a child for a small adult), unusual angles and poor light degrade it, and the confidence threshold you set decides how often it commits to a label versus staying silent. Accuracy is a range that depends on the classes, the scene, and the conditions, and it is never 100%, so rules built on classification need tolerance for the occasional miss or mislabel. The classification model internals belong to the AI for Video Engineering section.
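The effect of the confidence threshold can be illustrated with a short Python sketch. The function name `pick_label`, the score values, and the thresholds are assumptions made up for this example; they do not come from any particular model or product.

```python
from typing import Dict, Optional


def pick_label(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most likely class, or None if it falls below the threshold.

    `scores` maps class names to classifier confidences (assumed to be in [0, 1]).
    """
    if not scores:
        return None
    label, confidence = max(scores.items(), key=lambda item: item[1])
    return label if confidence >= threshold else None


if __name__ == "__main__":
    # Hypothetical scores for an ambiguous vehicle: van and truck are close.
    scores = {"van": 0.48, "truck": 0.41, "car": 0.11}

    print(pick_label(scores, threshold=0.4))  # van  (low threshold: it guesses)
    print(pick_label(scores, threshold=0.6))  # None (high threshold: it stays silent)
```

A lower threshold yields more labels and more mislabels, while a higher one yields fewer labels and more unclassified objects. In practice the right value depends on the scene and on which of the two errors a given rule tolerates better.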