Video Analytics Software: What a System Can Detect

This is engineering guidance, not legal advice. Confirm specifics with qualified counsel.

Why this matters

If you are scoping or buying a surveillance system, the vendor will hand you a feature list — "people detection, vehicle classification, face recognition, LPR, intrusion, loitering, anomaly, smart search" — and every box will be ticked. The list tells you nothing about which analytics you actually need, where each one runs, what it costs in false alarms, or which ones cannot be switched on without a privacy review. This article is the catalogue that turns that undifferentiated feature list into a map you can reason about: what each analytic detects, the realistic accuracy it delivers under real lighting and load, how it shows up as a searchable event in the software your operators use, and which ones are biometric and therefore legally gated. A security integrator, a product lead, or a retail or city operations team should be able to finish this and know exactly which deep-dive to read next — and which features to leave switched off until counsel has signed off.

From pixels to events: what "video analytics" actually means

Start with the one definition the whole article rests on. Video analytics is software that watches a camera feed and turns the raw picture into meaning — instead of leaving a person to stare at a wall of monitors, the software reports "a person crossed this line at 14:03 on camera 7." The reader-friendly way to hold it: a camera produces pixels; video analytics produces events. An event is a small, typed, timestamped record — an object type, a box on the image, a confidence score, sometimes an attribute like color or a plate string — and that record is what makes a surveillance system searchable and alertable rather than just a hard drive full of footage nobody can find anything in.
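To make the "small, typed, timestamped record" concrete, here is a minimal sketch of one such event as a Python dataclass. The class name and field names (`camera_id`, `object_type`, `bbox`, and so on) are assumptions made for this illustration; they are not an ONVIF or vendor schema, and the date is arbitrary.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class AnalyticsEvent:
    """One typed, timestamped detection record (hypothetical field names)."""
    camera_id: str
    timestamp: datetime
    object_type: str                          # e.g. "person", "vehicle"
    bbox: tuple[float, float, float, float]   # x, y, width, height (normalized 0..1)
    confidence: float                         # detector score, 0..1
    attributes: dict[str, str] = field(default_factory=dict)  # e.g. color, plate


def describe(event: AnalyticsEvent) -> str:
    """Render the event roughly as an operator would read it in an alert list."""
    when = event.timestamp.strftime("%H:%M")
    return f"{event.object_type} on camera {event.camera_id} at {when}"


event = AnalyticsEvent(
    camera_id="7",
    timestamp=datetime(2025, 1, 1, 14, 3),
    object_type="person",
    bbox=(0.42, 0.31, 0.08, 0.22),
    confidence=0.91,
)
print(describe(event))  # person on camera 7 at 14:03
```

The point of the sketch is the shape: a handful of fields per detection is what the rest of the system indexes, filters, and alerts on, while the footage itself stays in storage.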

Keep one thing separate as we go, because it is a different question with its own article. Where the analytics run — inside the camera, on a server in the building, or in the cloud — is the deployment decision, and it sets latency, bandwidth, and privacy. This article does not re-argue that; it is covered in edge vs cloud video analytics. Here we answer the other half: what the system can detect and how each detection becomes a usable event. The two halves share a name — "video analytics software" — but split cleanly: that article owns where the work happens, this one owns what the work produces.

There is an industry standard for the "becomes a usable event" step, and it is the part this section owns. ONVIF is the common language that lets cameras and software from different makers understand each other, and one ONVIF profile is built specifically for analytics. ONVIF Profile M standardizes the metadata and events that analytics emit: generic object classification, plus specific metadata for geolocation, vehicle, license plate, human face, and human body, and event interfaces for object counting, license-plate recognition, and face recognition (ONVIF, Profile M). The detail that matters for this map is that a Profile M product can be the camera, an on-site server, or a cloud service, and the consumer can be the VMS, an NVR, or a cloud app — so the same event language works no matter where the analytic ran (ONVIF). When a vendor says an analytic "surfaces into the VMS," Profile M is, increasingly, how. The metadata can ride inside the video stream, through ONVIF's event service, or over MQTT, a lightweight messaging protocol common in connected-device systems. As always with ONVIF, conformance is a baseline — two Profile M products will exchange standard events reliably, but a vendor's special attribute may still need that maker's own software kit. For the standards layer beneath this, see events, metadata, and the ONVIF analytics interface.

Pipeline from camera pixels through an analytic to a typed metadata event, into the VMS event index, and out to live alerts and forensic search.

Figure 1. The pipeline every analytic shares. Pixels enter on the left; the analytic turns them into a small typed event (ONVIF Profile M metadata); the VMS indexes that event so it can drive a live alert now and a forensic search months later. The video is the raw material; the event is the product.

The map: one base, seven families

Almost everything a surveillance system can detect is built from one foundation and arranges into seven families. The foundation is object detection — finding things in the frame and drawing a box around each. Everything richer is a layer on top: classifying what each box is, following a box over time, recognizing a specific identity inside it, reading text off it, applying a rule to its movement, or learning what is "normal" so the unusual stands out. Hold the map in mind and any vendor's feature list sorts itself into place.

A second organizing idea runs through the whole map and is the section's editorial backbone: accuracy is always a range, never a single number, and never "100%." Every analytic's quality is reported as two figures in tension — precision, the share of its alerts that are real, and recall, the share of real events it catches — and both move with the scene, the lighting, the camera angle, and how well the system was tuned. Where this article gives numbers, they are honest ranges with their conditions attached. The model engineering that produces those numbers — the detection networks, the training, the accuracy-per-compute-budget — lives in our AI for Video Engineering section; this article owns how each analytic plugs into the camera, the VMS, and storage, and what it delivers in practice.
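The two figures are simple ratios, and a short sketch shows how they pull apart. The counts below are invented for illustration and the function name is our own; nothing here comes from a particular product.

```python
def precision_recall(true_alerts: int, false_alerts: int, missed_events: int) -> tuple[float, float]:
    """Return (precision, recall) from raw counts over an evaluation period."""
    raised = true_alerts + false_alerts   # everything the analytic alerted on
    real = true_alerts + missed_events    # everything that actually happened
    precision = true_alerts / raised if raised else 0.0
    recall = true_alerts / real if real else 0.0
    return precision, recall


# Illustrative counts: 45 correct alerts, 15 false alarms, 5 real events missed.
precision, recall = precision_recall(true_alerts=45, false_alerts=15, missed_events=5)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.90
```

Raising the alert threshold typically removes false alarms (precision up) at the cost of more missed events (recall down), which is why a single headline number hides the tradeoff.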

A layered map of seven video-analytics families built on an object-detection base, each tagged with where it runs and whether it is biometric.

Figure 2. The whole catalogue on one map. Object detection is the base; classification, tracking and re-identification, face recognition, license-plate recognition, behavioral rules, anomaly detection, and search-by-event are the families above it. Orange marks the two legally gated analytics, face recognition (biometric data) and license-plate recognition (personal data), which are a legal gate before they are a feature.

1. Object detection and classification — the foundation

What it detects: things, and what they are. Object detection draws a box around each object of interest in the frame; classification labels each box — person, car, truck, bicycle, bag, animal. This is the analytic every other one is built on, and for most buyers the single most useful: "alert me when a person (not a cat, not a shadow) enters this area after hours" already eliminates the majority of nuisance alarms that plague old motion detection, which fired on any pixel change.
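The after-hours person alert quoted above can be sketched as a three-part test: the class label, a confidence floor, and a time window. The closing and opening times and the 0.6 threshold are assumptions for the example and would be set per site.

```python
from datetime import time

AFTER_HOURS_START = time(20, 0)  # assumed closing time
AFTER_HOURS_END = time(6, 0)     # assumed opening time
MIN_CONFIDENCE = 0.6             # assumed threshold; tune per scene


def is_after_hours(t: time) -> bool:
    """True when t falls inside a window that wraps past midnight."""
    return t >= AFTER_HOURS_START or t < AFTER_HOURS_END


def should_alert(label: str, confidence: float, t: time) -> bool:
    """Alert only on confident person detections outside opening hours."""
    return label == "person" and confidence >= MIN_CONFIDENCE and is_after_hours(t)


print(should_alert("person", 0.85, time(23, 15)))  # True
print(should_alert("cat", 0.90, time(23, 15)))     # False
print(should_alert("person", 0.85, time(14, 0)))   # False
```

Plain motion detection has no equivalent of the `label == "person"` test, which is where most of the nuisance-alarm reduction comes from.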

The accuracy reality: detection quality on the standard academic benchmark (the COCO dataset) is reported as mean average precision (mAP), and current real-time detectors score roughly 38–55% mAP at the strict mAP@50–95 measure — for example the compact YOLOv10n at about 38.5 and a heavier transformer detector like RF-DETR around 54.7 (Ultralytics; Roboflow). That strict score sounds low because it demands near-perfect boxes across many object sizes; in a fixed, well-lit surveillance scene a tuned detector reaches much higher operational precision and recall on the few classes you care about. The honest framing for a buyer: ask for precision and recall in your lighting and camera placement, never a single "99%."

Where it runs and how it surfaces: detection is light enough to run on the camera's own AI chip, so it is usually the edge tier, emitting a classified-object event per detection. In the VMS it surfaces as a filterable event ("show me all vehicle events on the loading dock between 02:00 and 04:00"). The model internals are the territory of object detection and classification in surveillance, which links onward to the AI section.

2. Object tracking and re-identification — following one thing

What it detects: continuity. Tracking links a single object's boxes across consecutive frames so the system knows "this is the same person walking, not five different people one per frame." Re-identification (re-ID) extends that across cameras — recognizing that the person who left camera 3's view is the one who appeared on camera 8 — without knowing who they are. It is the difference between counting bodies and following a path.
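One simple way to link boxes across consecutive frames is to match each new detection to the existing track it overlaps most. The sketch below uses greedy intersection-over-union (IoU) matching; that is a deliberately basic technique chosen for illustration, not a description of how any particular product tracks, and the 0.3 overlap threshold is an assumption.

```python
Box = tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: dict[int, Box], detections: list[Box], min_iou: float = 0.3) -> dict[int, Box]:
    """Give each detection the ID of its best-overlapping unclaimed track, else a new ID."""
    updated: dict[int, Box] = {}
    unclaimed = dict(tracks)
    next_id = max(tracks, default=0) + 1
    for det in detections:
        best_id = max(unclaimed, key=lambda tid: iou(unclaimed[tid], det), default=None)
        if best_id is not None and iou(unclaimed[best_id], det) >= min_iou:
            updated[best_id] = det          # same object, moved slightly
            del unclaimed[best_id]
        else:
            updated[next_id] = det          # nothing overlaps enough: new object
            next_id += 1
    return updated


previous = {1: (10, 10, 50, 100)}
current = associate(previous, [(14, 12, 54, 102), (200, 40, 240, 130)])
print(current)  # {1: (14, 12, 54, 102), 2: (200, 40, 240, 130)}
```

The first detection keeps ID 1 because it overlaps the previous box heavily; the second gets a fresh ID. Cross-camera re-ID cannot rely on overlap at all, which is part of why it is the harder problem.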

The accuracy reality: tracking within one camera is mature; re-identification across cameras is materially harder and degrades with time gaps, changes in lighting, and crowding, because the system is matching appearance (clothing, build, gait), not an identity document. Treat cross-camera re-ID as a strong investigative aid with a real error rate, not a guarantee.

Where it runs and the privacy weight: tracking often runs at the edge alongside detection; cross-camera re-ID usually needs a server or the cloud because it must compare across many feeds. Re-ID carries more privacy weight than plain detection — it builds a movement trail of an individual — even though it does not name them. The mechanics and the privacy nuance get a full treatment in object tracking and re-identification across cameras.

3. Face recognition — identity, and the first legal gate

What it detects: who. First, separate two terms people blur. Face detection finds that there is a face in the frame (a box) — non-biometric, and the basis for privacy features like blurring. Face recognition measures the face into a numeric template and matches it against a gallery of known people to answer "is this Person X?" — that is biometric identification, and a different thing entirely.
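The "template matched against a gallery" step can be pictured as a nearest-neighbor search with a threshold. The sketch below is a toy under stated assumptions: real templates are high-dimensional vectors produced by a trained model, whereas these are three hand-written numbers, and the names, the cosine-similarity measure, and the 0.9 threshold are all chosen for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two template vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(probe: list[float], gallery: dict[str, list[float]], threshold: float = 0.9):
    """Return (name, score) for the closest template, or (None, score) below threshold."""
    if not gallery:
        return None, 0.0
    name, score = max(
        ((n, cosine_similarity(probe, t)) for n, t in gallery.items()),
        key=lambda pair: pair[1],
    )
    return (name, score) if score >= threshold else (None, score)


# Toy gallery of two enrolled templates (hypothetical values).
gallery = {"person_x": [0.9, 0.1, 0.4], "person_y": [0.1, 0.8, 0.6]}

print(best_match([0.88, 0.15, 0.42], gallery)[0])  # person_x
print(best_match([0.5, 0.5, 0.5], gallery)[0])     # None
```

The threshold is the operational lever: lower it and more strangers are matched to enrolled people, raise it and more enrolled people go unrecognized. Face detection alone involves none of this, because no template is ever computed.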

The accuracy reality: in NIST's independent testing, the best face-recognition algorithms are extraordinarily accurate under controlled conditions — the top performer in the 2025 one-to-many evaluation missed as few as about 0.07% of searches against a 12-million-image gallery (NIST FRTE). But surveillance is not a controlled condition: accuracy falls with off-angle faces, motion blur, poor light, and low-resolution footage, and NIST's own work documents that error rates can differ across demographic groups (NIST FRVT demographic effects). So the honest statement is "near-perfect in a passport booth, materially worse on a wide-angle camera at dusk" — never a flat "100%."

The legal gate comes before the capability. Because a face template is biometric data, face recognition is heavily restricted. Under the EU's GDPR, biometric data processed to uniquely identify a person is special-category data (Art. 9) that generally needs an explicit lawful basis and a Data Protection Impact Assessment (GDPR; EDPB Guidelines 05/2022). The EU AI Act goes further: it bans untargeted scraping of CCTV to build face-recognition databases and bans real-time remote biometric identification in public spaces for law enforcement (in force since February 2025), and classifies other remote biometric identification as high-risk, with those obligations applying from 2 December 2027 under the 2026 simplification agreement (European Commission, AI Act). In Illinois, the Biometric Information Privacy Act (BIPA, 740 ILCS 14) gives individuals a private right of action with statutory damages of $1,000 per negligent and $5,000 per intentional violation, although a 2024 amendment (SB 2979) limited repeated collection of the same identifier to a single recovery. Decide whether you may legally run face recognition before you scope whether you technically can. The full law is in face recognition in surveillance and Block 6; this is engineering guidance, not legal advice.

4. License-plate recognition (LPR / ANPR) — reading text off the world

What it detects: plate strings. License-plate recognition — also called automatic number-plate recognition (ANPR) — finds a plate in the frame and reads its characters into text, turning "a vehicle passed" into "vehicle ABC-123 passed at 14:03." It powers parking, access control, toll, and city traffic systems, and it is the analytic with the clearest, most commercial demand of the whole block.

The accuracy reality: real-world LPR runs roughly 90–98% under good conditions and can exceed 99% in a controlled lane with a purpose-built camera, but accuracy drops sharply — below 70–80% in some tests — with motion, oblique angles, dirty or damaged plates, glare, and heavy rain or snow, and varies by region and plate design (Carmen Cloud; survey literature). A plate camera is a specialized tool: correct mounting angle, shutter speed, and often infrared illumination matter more than raw megapixels.

The privacy weight: a plate is personal data in most jurisdictions because it links to a registered owner, so LPR carries a privacy and retention obligation even though it is not "biometric" in the face sense. Treat plate data as regulated. The capture-to-character pipeline and the legal framing are in license-plate recognition (LPR / ANPR).

5. Behavioral analytics — rules on top of detection

What it detects: movement against a rule you author. Behavioral analytics sits on top of detection and tracking and fires when an object's behavior matches a pattern you defined in the VMS: a person crossing a virtual line (tripwire), entering a drawn zone (intrusion), staying too long (loitering), a crowd exceeding a density threshold, or an object that appears and stays (left-object) or disappears (removed-object). The intelligence is in the rule, drawn on the camera view, not in a new kind of model.
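Two of these rules reduce to very little code once detection and tracking have supplied a position per object. The sketch below shows a tripwire as a segment-intersection test and loitering as a dwell-time check; the coordinate convention, function names, and the 60-second limit are assumptions for the example.

```python
Point = tuple[float, float]


def _side(a: Point, b: Point, p: Point) -> float:
    """Signed area: positive if p is on one side of line a->b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossed_tripwire(prev: Point, curr: Point, a: Point, b: Point) -> bool:
    """True when the object's step prev->curr strictly crosses the drawn segment a->b."""
    return (
        _side(a, b, prev) * _side(a, b, curr) < 0
        and _side(prev, curr, a) * _side(prev, curr, b) < 0
    )


def is_loitering(first_seen_s: float, now_s: float, max_dwell_s: float = 60.0) -> bool:
    """True when a tracked object has stayed in view longer than the allowed dwell time."""
    return now_s - first_seen_s > max_dwell_s


wire_a, wire_b = (0.0, 5.0), (10.0, 5.0)   # a horizontal line drawn on the camera view

print(crossed_tripwire((4.0, 3.0), (4.0, 7.0), wire_a, wire_b))    # True: stepped over the line
print(crossed_tripwire((4.0, 3.0), (6.0, 4.0), wire_a, wire_b))    # False: stayed on one side
print(crossed_tripwire((12.0, 3.0), (12.0, 7.0), wire_a, wire_b))  # False: passed beyond the line's end
print(is_loitering(first_seen_s=0.0, now_s=95.0))                  # True
```

The geometry is trivial; the false alarms come from what feeds it. A swaying branch that the detector mistakes for a person will cross the line just as convincingly as an intruder.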

The accuracy reality: behavioral rules are only as good as the detection beneath them and the care in drawing the rule. A poorly placed intrusion zone that includes a windblown tree or a headlight reflection will generate false alarms all night. This is where tuning earns its keep, and why the block has a whole article on it — tuning analytics: false alarms, accuracy, and the operator's reality.

Where it runs and how it surfaces: rules typically run wherever detection runs (often the edge) and surface as named rule events ("Intrusion — Zone A — camera 12"). The rule-authoring craft is in behavioral analytics: loitering, intrusion, crowd, and zones.

6. Anomaly detection — flagging the unusual without a rule

What it detects: deviation from normal. Where behavioral analytics fires on a rule you wrote, anomaly detection learns the normal pattern of a scene over time and flags what departs from it — a car driving the wrong way, a person in an area that is usually empty at this hour, an unusual crowd. It is useful exactly where you cannot enumerate every bad event in advance.

The accuracy reality: anomaly detection trades coverage for false alarms more sharply than any other analytic, because "unusual" is inherently fuzzy. Tuned loose it cries wolf; tuned tight it misses. It is best used as a triage hint that raises an operator's attention, not as a hard trigger. The model internals are owned by the AI section — see the surveillance application here in anomaly detection in surveillance video, which cross-links to that playbook.
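The loose-versus-tight tradeoff is easiest to see with a toy baseline. The sketch below swaps the learned models that real products use for a plain statistical stand-in: it compares the current count of people in a zone against the history for the same hour and flags large deviations. The history values, the sigma floor, and the threshold of 3.0 are all assumptions for illustration.

```python
from statistics import mean, pstdev


def anomaly_score(history: list[int], observed: int) -> float:
    """Z-score of the observed count against historical counts for the same hour."""
    mu = mean(history)
    sigma = pstdev(history)
    # Floor sigma at 1.0 so a scene that is almost always empty does not divide by ~0.
    return (observed - mu) / max(sigma, 1.0)


def is_anomaly(history: list[int], observed: int, threshold: float = 3.0) -> bool:
    """Flag the observation when it sits more than `threshold` deviations above normal."""
    return anomaly_score(history, observed) > threshold


# People counted in one zone at 03:00 on seven previous nights (invented data).
history_3am = [0, 1, 0, 0, 2, 0, 1]

print(is_anomaly(history_3am, observed=9))  # True: far outside the usual pattern
print(is_anomaly(history_3am, observed=1))  # False: an ordinary night
```

The `threshold` parameter is the knob the paragraph above describes: lower it and the system cries wolf, raise it and it misses. That is why the output is better treated as a prompt for an operator than as a trigger.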

7. Search by event — the payoff for all of it

What it detects: nothing new — it uses everything the other six produced. Search-by-event (forensic search) is the feature that makes analytics worth the money for most buyers. Because every detection has been stored as a typed, timestamped, attributed event, an investigator can ask "show me every red truck on the north gate last Tuesday between noon and 2 p.m." and get answers in seconds instead of scrubbing 48 camera-hours of video by hand. It reads the metadata produced upstream (often via Profile M) and the footage stored by the recording layer.
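The red-truck query above becomes an ordinary database lookup once events are stored as rows. This is a minimal sketch using Python's built-in `sqlite3` module; the table layout, column names, and sample rows are assumptions for the example, not how any specific VMS stores its index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (camera TEXT, ts TEXT, object_type TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("north-gate", "2025-01-07T12:40:00", "truck", "red"),
        ("north-gate", "2025-01-07T13:55:00", "truck", "white"),
        ("north-gate", "2025-01-07T16:10:00", "truck", "red"),
        ("south-gate", "2025-01-07T12:50:00", "truck", "red"),
    ],
)


def search(conn, camera, object_type, color, start, end):
    """Return (camera, timestamp) rows matching the attributes inside [start, end)."""
    return conn.execute(
        "SELECT camera, ts FROM events "
        "WHERE camera = ? AND object_type = ? AND color = ? AND ts >= ? AND ts < ? "
        "ORDER BY ts",
        (camera, object_type, color, start, end),
    ).fetchall()


# "Every red truck on the north gate, Tuesday, between noon and 2 p.m."
hits = search(conn, "north-gate", "truck", "red", "2025-01-07T12:00:00", "2025-01-07T14:00:00")
print(hits)  # [('north-gate', '2025-01-07T12:40:00')]
```

Each hit is a pointer into recorded video: the timestamp and camera tell the recording layer which clip to pull. ISO-8601 timestamps are stored as text here because they sort correctly as strings.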

This is also where the whole map pays off architecturally: the value of detection, classification, tracking, and recognition is realized at search time, when months of footage become a queryable database. The search experience itself is covered in search by event: making months of footage findable; the storage beneath it is how surveillance storage works.

The catalogue on one page

Here is the whole map as a table — the version to keep beside a vendor's feature list. Read a row to place an analytic; read the "privacy weight" column first if compliance is your binding constraint.

Analytic	What it detects	Accuracy reality (with conditions)	Where it usually runs	Surfaces in VMS as	Privacy weight
Object detection + classification	People, vehicles, objects — and what they are	~38–55% mAP@50–95 on COCO; higher tuned precision/recall in a fixed scene	Edge (camera)	Classified-object events	Low
Tracking + re-identification	One object across frames and cameras	Strong in-camera; cross-camera re-ID degrades with light, gaps, crowding	Edge (track) / server (re-ID)	Object paths, re-ID matches	Medium — builds a movement trail
Face recognition	A specific identity	Near-perfect in controlled tests (~0.07% miss, NIST); worse on wide-angle/low-light; demographic variation	Server / cloud	Identity match events	High — biometric, legally gated
License-plate recognition	Plate characters as text	~90–98% good conditions; <70–80% with motion, angle, dirt, weather	Edge (plate camera) / server	Plate-read events	High — personal data
Behavioral analytics	Rule breaches: line, zone, loiter, crowd, left/removed object	As good as the detection beneath it and the rule you drew	Edge	Named rule events	Low–medium
Anomaly detection	Departures from learned "normal"	Coverage-vs-false-alarm tradeoff is sharp; best as a triage hint	Server / cloud	Anomaly alerts	Medium
Search by event	Nothing new — queries stored events	Only as good as the metadata feeding it	Server / cloud	The search interface	Inherits the source analytics' weight

Table 1. The seven analytics families against what a buyer actually needs to know. The two high-privacy rows — face recognition and license-plate recognition — are the ones that need a privacy/legal review before deployment, not just a configuration toggle.

A six-column reference card laying out the seven analytics families with what each detects, its accuracy reality, where it runs, how it surfaces, and its privacy weight.

Figure 3. The catalogue as a reference card. The same seven rows as the table, color-coded by privacy weight; the orange rows (face, plate) are the legally gated analytics.

A worked example: why "99% accurate" can still drown your operators

The single most important number in this whole map is not accuracy — it is the false-alarm volume, and a little arithmetic shows why. Take a 30-camera site running intrusion detection. Across those cameras, the analytics evaluate a large number of candidate movements every day — people, vehicles, blowing debris, shadows, headlights. Say the site produces 100,000 candidate motion events a day (a busy mixed indoor/outdoor site easily does).

Now suppose the analytic is "99% accurate" in the sense vendors love to quote — a 1% false-positive rate on those candidates. The math is unforgiving:

false alarms per day = 1% × 100,000 candidates = 1,000 false alarms.

Against that, the number of real intrusions might be 10 a day. Even if the analytic catches every one of them, the operators see roughly 1,010 alerts to find 10 real ones, so the precision of the system as experienced is about:

precision ≈ 10 real ÷ 1,010 total ≈ 1%.

A "99% accurate" detector delivered a 1%-useful alert stream. This is the base-rate problem, and it is why the section refuses the phrase "100% accuracy": at scale, what decides whether a system is usable is not how often it is right on a single frame, but how many false alarms survive to reach a human. The map's layered shape is the fix — cheap, light detection at the base filters the candidates so that expensive analytics (and your operators) only ever see the few that matter. Getting that filtering right is the subject of the block's honest capstone, tuning analytics.
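The arithmetic above is worth keeping as a small calculator to rerun with your own site's numbers. The sketch below reproduces the worked example; the function and parameter names are ours, and it assumes, as the example does, that every real intrusion is caught unless you pass a lower `recall`.

```python
def alert_stream(candidates_per_day: int, false_positive_rate: float,
                 real_events_per_day: int, recall: float = 1.0) -> tuple[float, float, float]:
    """Return (false_alarms, true_alerts, precision) for one day of operation."""
    false_alarms = false_positive_rate * candidates_per_day
    true_alerts = recall * real_events_per_day
    total = false_alarms + true_alerts
    precision = true_alerts / total if total else 0.0
    return false_alarms, true_alerts, precision


false_alarms, true_alerts, precision = alert_stream(
    candidates_per_day=100_000,
    false_positive_rate=0.01,
    real_events_per_day=10,
)
print(f"{false_alarms:.0f} false alarms, {true_alerts:.0f} real, precision {precision:.1%}")
# 1000 false alarms, 10 real, precision 1.0%
```

Cutting the candidate count tenfold with a cheap upstream filter, at the same 1% false-positive rate, lifts the experienced precision to roughly 9%, which is the layered-map argument in numbers.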

A common mistake to avoid

The costliest pattern we see is buying analytics by the feature-list checkbox instead of the false-alarm rate in your scene — closely followed by switching on a biometric analytic as if it were just another toggle. A vendor sheet with every box ticked tells you nothing about how the intrusion rule behaves at 2 a.m. with wind in the trees, or whether face recognition can lawfully run at your site at all. The fix is the discipline this map is built around: pick the analytics the job needs, demand precision and recall in your conditions, place the biometric ones behind a privacy review before any technical work, and judge the whole thing by the alert stream an operator actually lives with — not the demo.

Where Fora Soft fits in

Fora Soft has built real-time video, streaming, and computer-vision software since 2005, across 625+ shipped projects, and the analytics map is the conversation we have with almost every surveillance client, because off-the-shelf platforms ship the whole feature list switched on and leave the buyer to discover the false-alarm bill and the compliance gate later. Teams come to us to build the analytics they actually need — detection and behavioral rules tuned to their scenes so operators trust the alerts, cross-camera tracking that holds up, and biometric analytics kept on hardware and under a lawful basis that satisfies a data-protection officer. The framing we lead with is always how the system behaves under real load first: the realistic precision and recall in your lighting, the false-alarm rate your operators will live with, and the privacy posture that keeps the deployment legal — then the capability. A system operators trust beats one that demos well.

Call to action

Talk to a surveillance engineer — book a 30-minute scoping call to talk through your video analytics software plan.
See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
Download the Video-Analytics Catalogue — One-Page Reference — The seven analytics families on one page: what each detects, its honest accuracy reality with conditions, where it usually runs, how it surfaces into the VMS, and its privacy weight — with the two biometric analytics (face, plate)….

References

ONVIF — "Profile M" (standardizes analytics metadata and events: generic object classification, and metadata for geolocation, vehicle, license plate, human face and human body; event interfaces for object counting, license-plate recognition, and face recognition; metadata over the stream, the ONVIF event service, or MQTT; a conformant product can be a camera, a server, or a cloud service, and a client can be a VMS, NVR, or cloud service). Primary (tier 1). https://www.onvif.org/profiles/profile-m/
European Union — "GDPR, Regulation (EU) 2016/679, Art. 9 and Art. 35" (biometric data processed to uniquely identify a person is special-category data under Art. 9; a Data Protection Impact Assessment is required for high-risk processing under Art. 35). Primary (tier 1). https://eur-lex.europa.eu/eli/reg/2016/679/oj
European Commission — "AI Act (Regulation (EU) 2024/1689)" (prohibited practices in force Feb 2025 include untargeted CCTV scraping to build face-recognition databases and real-time remote biometric identification in public spaces for law enforcement; remote biometric identification is high-risk; under the simplification agreement of 7 May 2026, high-risk obligations in biometrics apply from 2 December 2027; transparency rules from August 2026). Primary (tier 1). https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
Illinois General Assembly — "Biometric Information Privacy Act, 740 ILCS 14" (private right of action; statutory damages of $1,000 negligent / $5,000 intentional per violation; SB 2979 (2024) limits repeated collection of the same identifier to a single recovery). Primary (tier 1). https://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=3004
European Data Protection Board — "Guidelines 05/2022 on the use of facial recognition technology" (facial recognition entails heightened risk; business use generally requires explicit consent and a DPIA). Primary (tier 1/2). https://www.edpb.europa.eu/our-work-tools/documents/public-consultations/2022/guidelines-052022-use-facial-recognition_en
NIST — "Face Recognition Technology Evaluation (FRTE) 1:N Identification" (leading 2025 algorithms reach miss rates near 0.07% against a 12-million-image gallery under controlled conditions; the leaderboard ranks 1:1 and 1:N accuracy). Primary/government testing (tier 1/3). https://pages.nist.gov/frvt/html/frvt1N.html
NIST — "Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects" (documents that false-match and false-non-match rates can vary across demographic groups — the basis for the 'accuracy is conditional' framing). Primary/government testing (tier 1/3). https://pages.nist.gov/frvt/html/frvt_demographics.html
Ultralytics — "The best object detection models of 2025" / Roboflow — "Object detection metrics: precision, recall, and mAP" (real-time detectors score roughly 38–55% mAP@50–95 on COCO; detection quality is a precision/recall tradeoff, not a single number). First-party engineering / educational (tier 3/6). https://www.ultralytics.com/blog/the-best-object-detection-models-of-2025
Carmen Cloud (Adaptive Recognition) — "ANPR accuracy unveiled" / ANPR survey literature (real-world plate-recognition accuracy ~90–98% in good conditions, can exceed 99% in controlled lanes, and falls below ~70–80% with motion, oblique angles, dirty plates, and adverse weather; accuracy varies by region). Institutional/educational (tier 5/6). https://carmencloud.com/anpr-accuracy-unveiled-how-reliable-is-automatic-number-plate-recognition/
MarketsandMarkets — "Intelligent Video Analytics Market" (≈ USD 14.65B in 2026 to ≈ USD 41.39B by 2031 at ~23.1% CAGR; AI surveillance, smart cities, and retail analytics among the drivers). Institutional/analyst (tier 5). https://www.marketsandmarkets.com/Market-Reports/intelligent-video-analytics-market-778.html

Related glossary terms

Face recognition

ONVIF

Precision

Video analytics

Object detection

Recall

Anomaly detection

Re-identification