This is engineering guidance, not legal advice. Confirm specifics with qualified counsel.
Why this matters
If you are choosing surveillance analytics, anomaly detection is the feature most often oversold and least often understood. A vendor demo shows it catching a fall or a fight; your real building shows it firing every time a delivery van parks in a new spot. This article is written for the security integrator, product manager, or operations lead who needs to know what anomaly detection actually buys over a fixed rule, what it costs in false alarms and compute, and how to wire it into a system without shipping something an operator will switch off in a week. Get the operating point and the deployment right and it is the analytic that finds the thing you did not know to look for; get them wrong and it is noise.
What "anomaly detection" means — and what it does not
Start with the word. An anomaly is simply something that does not match the usual pattern. Anomaly detection is software that builds a model of what is normal for a particular camera view, then scores every new moment by how far it departs from that model. When the departure is large enough, it raises a flag. Nobody told it what a "fight" or a "fall" or a "wrong-way driver" looks like in advance — it only knows that this moment is unlike the thousands of normal moments it has seen.
That is the whole point, and it is best understood against the analytic it is usually confused with. In the previous article we covered behavioral analytics — line crossing, intrusion zones, loitering, and counting. Those are rule-based: a human draws a line, picks a direction, sets a dwell timer, and the system fires when a tracked object satisfies the rule. Rule-based analytics answer the question "tell me when this specific thing happens." You have to know the thing in advance.
Anomaly detection answers a different question: "tell me when something I did not anticipate happens." The motivation is practical. As Sultani, Chen, and Shah put it in the paper that introduced the UCF-Crime benchmark, it is impractical to list every possible anomalous event, so it is desirable that the detector not rely on prior information about what the anomaly will be. A virtual tripwire cannot catch an event you never imagined; an anomaly detector is built precisely to surface the unimagined one.
Two cautions before we go further, because both are common misunderstandings:
First, anomaly detection is not a separate kind of camera or a magic "detect everything" button. It runs on top of the same building blocks as every other analytic — object detection and tracking produce a description of the scene, and the anomaly model scores that description. The model internals (how an autoencoder, a one-class model, or a vision-language model is built and trained) belong to the AI for Video Engineering section's anomaly-detection playbook; this article covers the surveillance application — how it plugs into the camera, the VMS, and the operator's day.
Second, "anomalous" is not the same as "dangerous" or "criminal." A detector flags the statistically unusual. A person in a wheelchair on a stairwell camera, a street performer drawing a crowd, or a maintenance worker on a roof at night are all unusual and all innocent. This gap between "unusual" and "worth acting on" is the source of both the technology's value and its false-alarm problem, and we will return to it.
What anomaly detection buys you over a fixed rule
There are three things a learned anomaly model does that a hand-drawn rule cannot.
It covers events you cannot enumerate. A perimeter has a finite set of rules you can write — cross this line, enter this zone. An open retail floor, a hospital corridor, or a rail platform has an effectively infinite set of "wrong" things that can happen, and you cannot pre-draw a rule for each. The anomaly model treats them all the same way: not-normal.
It adapts to the scene instead of to your imagination. A good anomaly model learns the specific rhythm of one camera — the direction people usually walk, the speed cars usually move, the hours the loading dock is usually busy. A wrong-way runner or a car on the footpath is anomalous against that learned pattern, with no rule authored.
It needs few or no anomaly examples to train. The most common family of anomaly models is trained only on normal footage — it never has to see a fight to know a fight is abnormal. That matters because real anomalies are rare and hard to collect, so a method that learns "normal" from the abundant footage you already have is far more practical than one that needs a labelled library of every bad event.
The trade for all three is the same: a fixed rule is precise and predictable, and an anomaly model is broad and fuzzy. The rule fires on exactly what you specified and nothing else. The anomaly model fires on anything unusual — including the harmless unusual. Choosing between them is choosing between missing the unanticipated and drowning in the merely new, and most mature systems use both: rules for the known boundaries, anomaly detection as a wide net behind them.
Figure 1. Rule-based analytics fire on a thing you named in advance; anomaly detection flags departures from a learned normal. The first is precise and narrow; the second is broad and noisier.
How a learned anomaly detector actually works
You do not need the model mathematics to deploy the system well, but you do need the shape of it, because the shape explains the false alarms.
A learned anomaly detector works in two phases. In the learning phase, it watches a long stretch of ordinary footage from a camera and builds a compact statistical picture of normal: the usual shapes, motions, speeds, and timings. In the scoring phase, it watches live or recorded video and, frame by frame, asks "how well does this fit the picture of normal I learned?" The answer is a number, the anomaly score. A low score means business as usual; a high score means this moment is unlike what the model expects.
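As a minimal sketch of the two phases, assume each scored moment has already been reduced to a small numeric feature vector. The speed and heading features, the sample values, and `ALERT_THRESHOLD` below are hypothetical and not taken from any product. A per-feature mean and spread stands in for the learned picture of normal; real detectors use far richer models, and this toy ignores details such as heading wrap-around at 360 degrees.

```python
import math
import statistics


def learn_normal(samples):
    """Learning phase: summarise normal footage as a per-feature mean and spread."""
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    # Guard against a zero spread so scoring never divides by zero.
    spreads = [max(statistics.pstdev(col), 1e-6) for col in columns]
    return means, spreads


def anomaly_score(model, sample):
    """Scoring phase: distance from normal, measured in units of the learned spread."""
    means, spreads = model
    return math.sqrt(
        sum(((x - m) / s) ** 2 for x, m, s in zip(sample, means, spreads))
    )


# Hypothetical features per scored moment: (speed in m/s, heading in degrees).
normal_footage = [(1.2, 90.0), (1.4, 92.0), (1.1, 88.0), (1.3, 91.0), (1.5, 89.0)]
model = learn_normal(normal_footage)

walker = anomaly_score(model, (1.3, 90.0))   # fits the learned pattern
runner = anomaly_score(model, (6.0, 270.0))  # fast and heading the wrong way

# A threshold turns the continuous score into an alert decision.
ALERT_THRESHOLD = 4.0  # hypothetical operating point, set per camera in practice
alerts = [score > ALERT_THRESHOLD for score in (walker, runner)]
```

The walker scores near zero and the wrong-way runner scores far above the threshold, with no rule authored for "running" or "wrong way".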
The families differ in how they build the picture of normal, and the AI section's playbook covers each in depth. In brief, so the terms are not a mystery:
A reconstruction model (typically an autoencoder, a network trained to compress a normal frame and rebuild it) learns to redraw normal scenes accurately. When it is shown something abnormal it has never learned to redraw, it reconstructs it badly, and that reconstruction error becomes the anomaly score. A prediction model is the same idea over time: it learns to predict the next frame from the last few, and an event it cannot predict scores high. These are unsupervised — they train on normal video alone, with no anomaly labels.
A weakly-supervised model is given a coarser hint: a library of videos labelled only at the whole-video level ("this clip contains an anomaly somewhere," "this one is normal"), never frame by frame. It learns to find the anomalous moments inside the flagged clips. This is the approach introduced alongside the UCF-Crime benchmark (Sultani, Chen, and Shah, CVPR 2018), and it tends to do better on messy real footage because it has seen examples of trouble, even if loosely labelled.
The newest family, maturing through 2025 and 2026, uses vision-language models — the same kind of AI that can describe an image in words. Training-free methods such as LAVAD and VERA caption each frame and reason over the captions to score anomalies, often with no task-specific training at all, and they can return a sentence explaining why a moment was flagged ("a person is lying motionless on the platform"). That explainability is a real operational gain, because an alert a human can understand is an alert a human will trust. These methods are promising but compute-heavy and still being proven on field benchmarks; treat them as the rising edge, not the default.
Figure 2. Learn a model of "normal," then score each moment's deviation. A threshold turns the continuous score into an alert — and where you set it is the whole game.
How accurate is it, really?
This is where the section's accuracy-vs-performance stance earns its keep, because the honest answer is more useful than the impressive one.
Researchers measure anomaly detectors on public benchmark datasets: recorded video with the anomalies marked, so a method's output can be scored against ground truth. The names you will see are UCSD Ped1/Ped2 and CUHK Avenue (older, controlled campus scenes), ShanghaiTech (437 videos across 13 fixed-camera campus scenes), UCF-Crime (1,900 untrimmed real-world surveillance videos, 128 hours, 13 anomaly types such as fighting, robbery, and road accidents), and XD-Violence (4,754 videos, 217 hours).
The usual score is AUC, the area under the ROC curve: a single number, where 50% is a coin flip and 100% is perfect, that summarises how well the detector separates anomalous frames from normal ones across every possible threshold. For XD-Violence the comparable score is AP (average precision, the area under the precision–recall curve). The important thing for a buyer is that AUC is neither "percent correct" nor a false-alarm rate. It measures ranking quality: the chance that a randomly chosen anomalous frame scores higher than a randomly chosen normal one. A high AUC can still hide a painful number of false alarms once you pick a real threshold.
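The ranking interpretation can be made concrete with a short sketch. The `roc_auc` helper below is an illustrative pairwise implementation, and the scores are synthetic: 999 normal moments and one anomalous moment that outranks all but ten of them.

```python
def roc_auc(scores, labels):
    """AUC as the chance that a random anomalous moment outscores a random normal one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Synthetic day: 999 normal moments with evenly spread scores, plus one anomaly.
normal_scores = [i / 1000 for i in range(999)]
anomaly_value = 0.9885  # outranks all but the ten highest-scoring normal moments
scores = normal_scores + [anomaly_value]
labels = [0] * 999 + [1]

auc = roc_auc(scores, labels)

# Lowest threshold that still catches the anomaly: everything at or above it is flagged.
flagged = sum(1 for s in scores if s >= anomaly_value)
precision = 1 / flagged
```

In this toy case the AUC is about 99%, yet catching the single anomaly means reviewing eleven alerts, ten of them false, for a precision of about 9%.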
With that caveat, here is roughly where the field sits in 2025–2026. On the controlled benchmarks the best methods report very high AUC: around 97% on UCSD Ped2, around 96% on CUHK Avenue, and around 98% on ShanghaiTech. On the hard, real-world UCF-Crime benchmark the strong methods land around 81–88% AUC, and on XD-Violence around 86–94% AP. That spread is the lesson in one line: on tidy, predictable scenes anomaly detection is excellent; on messy, varied, real-world footage it is merely good, and "merely good" on a rare-event problem behaves worse than the number sounds.
Why does the field number disappoint relative to the benchmark? Because "normal" is not fixed. The same research that built these benchmarks notes that the scene a camera watches changes drastically — day versus night, sun versus rain, a quiet Tuesday versus an event crowd — and a model of normal learned in one condition produces false alarms in another. Insects on the lens, a sudden gathering, and darkness are all cited as ordinary triggers of false positives. A benchmark holds the world still; a real camera does not.
Figure 3. Controlled benchmarks (Ped2, Avenue, ShanghaiTech) sit near 96–98%; the real-world benchmarks (UCF-Crime, XD-Violence) sit lower — and the field is harder still. Accuracy is a range tied to scene, never a single number, and never 100%.
False alarms are the real metric — and the math says why
The single most important thing to understand about anomaly detection is that its success or failure is decided by the false-alarm rate at a useful sensitivity, not by AUC, and the reason is arithmetic about rare events. Let us walk it out loud, because it is the calculation most vendor demos skip.
Take one camera. Analytics usually score a few frames per second rather than all thirty, so say the model produces one score per second: that is 86,400 scored moments in a day. Now be generous and assume something truly unusual happens ten times that day — ten anomalous moments worth a look. The other 86,390 moments are normal.
Suppose you tune the detector to a sensitivity that catches 80% of the real anomalies — it finds 8 of the 10. That is good recall. Now suppose that at that same setting it wrongly flags just 1% of normal moments. One percent sounds tiny. But one percent of 86,390 normal moments is about 864 false alarms in a day, against your 8 true ones. The operator sees roughly 872 alerts and only about 1 in 110 is real — a precision near 0.9%.
Tighten the screw. Push the false-positive rate down tenfold, to a strict 0.1%: now you get about 86 false alarms a day versus 8 true, and precision climbs only to about 8.5%. You have to drive the false-positive rate well below a tenth of a percent before the alert list is even readable, and every step you take to do that quietly throws away real detections too.
This is the base-rate problem, and it is not a flaw in any one product — it is the mathematics of looking for rare events in a huge stream. It is why a detector with an excellent-sounding benchmark score can still flood a control room, and why alert fatigue is the true failure mode of surveillance analytics: when the list is mostly noise, operators stop reading it, or switch the analytic off, and then it catches nothing at all.
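The arithmetic above is easy to replay for other operating points. This is a sketch under the same assumptions as the worked example (one score per second, ten real anomalies a day); the `alert_precision` helper is illustrative, and it optimistically holds recall at 80% while the false-positive rate drops, which a real detector would not manage.

```python
def alert_precision(scored_moments, true_anomalies, recall, false_positive_rate):
    """Return expected true alerts, false alerts, and the precision of the alert list."""
    normal_moments = scored_moments - true_anomalies
    true_alerts = true_anomalies * recall
    false_alerts = normal_moments * false_positive_rate
    total = true_alerts + false_alerts
    precision = true_alerts / total if total else 0.0
    return true_alerts, false_alerts, precision


SECONDS_PER_DAY = 24 * 60 * 60  # one score per second gives 86,400 scored moments

for fpr in (0.01, 0.001, 0.0001):
    tp, fp, precision = alert_precision(SECONDS_PER_DAY, 10, 0.80, fpr)
    print(f"FPR {fpr:.2%}: {tp:.0f} true, {fp:.0f} false, precision {precision:.1%}")
```

Only at a false-positive rate near 0.01% does the list approach one real event in every two alerts, which shows how far the dial has to move before autonomous alarming becomes plausible.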
The practical conclusions follow directly. Do not deploy anomaly detection as an autonomous, acts-on-its-own alarm. Deploy it as a triage layer: let it rank the day's footage by unusualness and surface the top handful of clips to a human, or feed them into forensic search so an investigator can review months of recording in minutes instead of weeks. At a triage sensitivity, a 90%-noise list is fine, because a person is making the final call and the alternative was watching everything. The honest framing is the one the whole section uses, and the tuning and false-alarm article is built around it: you do not get an accuracy, you get a dial, and where you set the dial is a business decision about how much noise an operator can carry.
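A triage layer can be as simple as ranking scored clips and handing the operator a fixed review budget. The sketch below assumes clips have already been scored; the `ScoredClip` record, its fields, and the sample values are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredClip:
    camera_id: str
    start_second: int
    score: float  # anomaly score on whatever scale the model emits


def triage(clips, review_budget):
    """Return the `review_budget` most unusual clips, highest score first."""
    return heapq.nlargest(review_budget, clips, key=lambda clip: clip.score)


# Hypothetical day of scored clips from two cameras.
clips = [
    ScoredClip("dock-1", 3_600, 0.12),
    ScoredClip("dock-1", 41_200, 0.91),
    ScoredClip("lobby-2", 18_050, 0.47),
    ScoredClip("lobby-2", 72_300, 0.78),
]

top_clips = triage(clips, review_budget=2)
```

The design choice is that the operator's attention, not a score threshold, sets how many clips surface: the budget stays fixed even on a noisy day.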
Figure 4. One camera, ~86,400 scored moments a day, ~10 real anomalies. Even a 1% false-positive rate buries the 8 true catches under ~864 false ones — so tune for triage, not for autonomous alarms.
A common mistake to avoid
The mistake that sinks anomaly-detection projects is treating it like a rule: wiring its output straight to a siren or a guard dispatch and expecting it to be right. It will not be right often enough for that, on any real scene, at any setting that also catches the events you care about. The fix is not a better model; it is a better design, in which anomaly detection cues and a human (or a tightly scoped second analytic) confirms. The second-most-common mistake is forgetting that, unlike an event-triggered rule, an anomaly model scores continuously. It runs on every sampled frame, all day, which is a real and ongoing compute cost we will come to next. The third is letting the model drift: a "normal" learned last winter is wrong by summer, so plan for periodic retraining.
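One way to notice drift before operators do is to compare recent anomaly scores with the scores recorded when the model was trained. This is a hedged sketch, not a method the article prescribes: the `needs_retraining` helper, the `tolerance` value, and the sample scores are all hypothetical.

```python
import statistics


def needs_retraining(baseline_scores, recent_scores, tolerance=3.0):
    """Flag drift when the recent median score sits far outside the baseline spread."""
    baseline_median = statistics.median(baseline_scores)
    # Median absolute deviation: a spread estimate that rare true anomalies barely move.
    mad = statistics.median(abs(s - baseline_median) for s in baseline_scores)
    shift = abs(statistics.median(recent_scores) - baseline_median)
    return shift > tolerance * max(mad, 1e-9)


# Hypothetical scores: a winter baseline (with one real anomaly) and two later samples.
winter_baseline = [0.10, 0.12, 0.11, 0.13, 0.09, 0.95]
same_season = [0.11, 0.10, 0.13, 0.12]
summer = [0.30, 0.28, 0.35, 0.31]

stable = needs_retraining(winter_baseline, same_season)
drifted = needs_retraining(winter_baseline, summer)
```

Using medians means a single genuine anomaly in either window does not trigger retraining; only a sustained shift in what the model considers ordinary does.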
Where it runs: the camera, a local server, or the cloud
Like every analytic in this section, anomaly detection can run in three places, and the choice drives latency, bandwidth, cost, and privacy. The deployment trade-offs are covered in full in edge vs cloud analytics and latency and accuracy at each tier; here is what is specific to anomaly detection.
On the camera (edge). A lightweight anomaly model runs on the camera's own AI chip. The win is speed and privacy: a local alert lands in well under a tenth of a second, and only a score, a clip, or metadata leaves the device, so the raw video can stay on site. The limit is compute — the model must be small, so edge anomaly detection tends to be simpler and a little blunter than what a server can run.
On a local server (edge server). A box on the same network runs a heavier model across many cameras. This is the common home for anomaly detection, because it has the GPU headroom for a capable model while keeping video on the premises, and it is where periodic retraining for drift naturally lives. Latency is still low, typically well under a second.
In the cloud. Video or compressed features are sent to a data centre where large models run and everything is stored for later review and retraining. The cost is latency and bandwidth, with round trips of several hundred milliseconds to seconds. And because anomaly detection scores continuously rather than only on a trigger, it is the analytic most likely to produce a nasty cloud-egress and compute bill if every frame is sent up. One field study measured roughly one second end-to-end when all stages ran on the edge versus around twelve seconds when frames were shipped to the cloud per image.
The continuous-scoring point is the one to internalise. A line-crossing rule only does heavy work when something crosses; an anomaly model is busy every second on every camera. That makes the edge or a local server the natural default for anomaly detection on any system above a handful of cameras — it keeps the latency low, the video private, and the recurring bill bounded.
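A back-of-envelope count shows why continuous scoring dominates the bill. Every number below is a hypothetical assumption (50 cameras, one score per second, 200 rule triggers per camera per day, 30 frames of heavy work per trigger), and the sketch counts only the heavy scoring step the text describes.

```python
SECONDS_PER_DAY = 86_400


def daily_inferences_continuous(cameras, scores_per_second):
    """Heavy model runs per day when every camera is scored all day."""
    return cameras * scores_per_second * SECONDS_PER_DAY


def daily_inferences_triggered(cameras, triggers_per_camera_per_day, frames_per_trigger):
    """Heavy model runs per day when work happens only around a rule trigger."""
    return cameras * triggers_per_camera_per_day * frames_per_trigger


# Hypothetical site: 50 cameras.
continuous = daily_inferences_continuous(cameras=50, scores_per_second=1)
triggered = daily_inferences_triggered(
    cameras=50, triggers_per_camera_per_day=200, frames_per_trigger=30
)
ratio = continuous / triggered
```

Under these assumptions the anomaly model does roughly fourteen times the heavy work of the triggered rule, and unlike the rule its load does not fall on a quiet day.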
Figure 5. The same analytic, three homes. Because anomaly detection scores every frame continuously, the edge and the local server are the usual default for latency, privacy, and a bounded cloud bill.
How it surfaces into the VMS — and the ONVIF catch
An analytic is only useful if its result reaches the Video Management System (VMS) — the software that records the cameras and that an operator actually watches — as an event you can alert on, bookmark, and search. For the rule-based analytics in the last article, there is a clean standard path. For anomaly detection, the path is real but rougher, and the difference is worth understanding before you buy.
The relevant standard is ONVIF, the common language that lets cameras and recording software from different makers work together. Its analytics specification defines a small set of normative rule types — a Line Detector, a Field Detector, a Loitering Detector, and counting rules — so that a line-crossing event from one vendor's camera means the same thing to another vendor's VMS. Those events reach the VMS through ONVIF Profile M, the part of the standard that carries analytics metadata and events.
Here is the catch: "anomaly" is not one of the standardized rule types. ONVIF standardizes the common, nameable behaviors; an open-ended "this looks unusual" detector is a vendor's own analytic. It can still travel over Profile M — the standard provides a general channel for a device to publish analytics metadata and custom events — but the meaning of the anomaly score, its scale, and its tuning are defined by the vendor, not the standard. In plain terms: a behavioral rule is portable in both its trigger and its meaning; an anomaly detector is, at best, portable in its plumbing while its substance stays tied to the vendor's SDK.
The practical consequences are concrete. Expect anomaly detection to be more vendor-locked than rule-based analytics — swapping the camera or the analytics engine is more likely to change how anomalies behave. Confirm exactly how a candidate product exposes its anomaly events to your VMS: as a Profile M metadata stream, as a generic ONVIF event, or only through a proprietary API. And remember the standard's own boundary, which holds for every analytic in this section: ONVIF conformance is a baseline for interoperability, not a guarantee of accuracy. A camera can be perfectly Profile M conformant and still have a noisy, poorly-tuned anomaly model. For the commercial overview of how ONVIF profiles fit a security system, see Fora Soft's ONVIF profiles guide; the deep engineering treatment is in events, metadata, and the ONVIF analytics interface.
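Because the score's scale is vendor-defined, an integration often needs a small adapter before anomaly events from different engines can share one alert list. The sketch below is purely illustrative: the vendor names, score ranges, and `normalised_score` helper are hypothetical, and nothing here is an ONVIF schema or a real vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorScale:
    minimum: float
    maximum: float


# Hypothetical per-vendor score ranges; real values come from each vendor's documentation.
VENDOR_SCALES = {
    "vendor_a": VendorScale(0.0, 1.0),
    "vendor_b": VendorScale(0.0, 100.0),
}


def normalised_score(vendor, raw_score):
    """Map a vendor-specific anomaly score onto a common 0..1 range."""
    scale = VENDOR_SCALES[vendor]
    span = scale.maximum - scale.minimum
    # Clip out-of-range values so a misbehaving device cannot dominate the list.
    clipped = min(max(raw_score, scale.minimum), scale.maximum)
    return (clipped - scale.minimum) / span


score_a = normalised_score("vendor_a", 0.8)
score_b = normalised_score("vendor_b", 80.0)
```

Rescaling aligns the numbers, not their meaning: two engines that both report 0.8 may still sit at very different false-alarm rates, so each needs its own tuning.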
The privacy line: personal data, usually not biometric — with a twist
Anomaly detection sits in the same legal frame as the other behavioral analytics, and the same disclaimer applies: this is engineering guidance, not legal advice, and a biometric or profiling use needs a qualified privacy reviewer. The full treatment is in GDPR for video surveillance and the Block 6 privacy articles; here is the orientation.
Watching identifiable people and scoring their behavior is processing personal data under the EU's General Data Protection Regulation (GDPR, Regulation (EU) 2016/679, Art. 4(1)), even when no name is attached — a person singled out in the footage is identifiable. So an anomaly system needs a lawful basis, clear notice, and, for systematic monitoring of a public area, a Data Protection Impact Assessment (GDPR Art. 35; European Data Protection Board Guidelines 3/2019). That is the ordinary surveillance baseline, not a special burden of this analytic.
The reassuring part: plain anomaly detection that scores how a scene deviates, not who is in it, is generally not Article 9 biometric data. It measures motion and pattern, not identity. The EDPB ranks scene-level video analysis well below biometric identification in intrusiveness. So a model that flags "unusual motion on the platform" is, on its own, a lighter-touch tool than face recognition.
Now the twist that is specific to anomaly detection. It crosses into the heavier regime in two ways that rule-based analytics rarely do. First, the moment it is wired to identify the unusual person, for example by linking a flagged track to a face-recognition watchlist, it becomes a biometric system with the full weight of GDPR Art. 9 and, in the United States, statutes such as Illinois BIPA. Second, some anomaly products edge toward inferring intent or emotion ("aggressive posture," "suspicious behavior"), and the EU AI Act (Regulation (EU) 2024/1689) restricts emotion-recognition systems in workplaces and schools and heavily curtails real-time remote biometric identification in public. Its prohibitions have applied since February 2025; transparency duties for emotion-recognition and biometric-categorization systems apply from 2 August 2026; and the high-risk obligations covering biometrics apply from 2 December 2027 under the timeline agreed in the May 2026 AI Act revision. (Dates move; confirm them at review.)
There is also a fairness point that is unique to "flag the unusual." Because the model fires on the statistically atypical, it can disproportionately flag people who simply look or move differently — a disability, an unfamiliar outfit, a cultural difference in how people gather. An anomaly is not a verdict. The design rule that keeps you safe, legally and ethically, is the same one that keeps false alarms manageable: let anomaly detection cue a human, never judge a person, and keep it anonymous by design unless you have a specific, reviewed, lawful reason to identify.
Anomaly detection vs rule-based analytics at a glance
The table below is the decision in one view. Most systems are not "rules or anomaly" — they are rules for the known boundaries and anomaly detection as the wide net behind them.
| Dimension | Rule-based behavioral analytics | Anomaly detection |
|---|---|---|
| What you specify | The exact event (line, zone, dwell time) | Nothing — it learns "normal" |
| What it catches | Only what you named | The unanticipated unusual |
| Typical accuracy | Crisp event, high reliability | About 81–98% AUC on benchmarks, by scene; lower in the field |
| Dominant failure | Missing what you did not foresee | False alarms on the harmless-unusual |
| Compute pattern | Works on a trigger | Scores every frame, continuously |
| Best place to run | Edge or server | Edge or local server (continuous cost) |
| ONVIF support | Standardized rule types + Profile M | Profile M plumbing; meaning is vendor-defined |
| Right role | Autonomous alert on known events | Triage / cueing for human or forensic search |
| Privacy weight | Personal data; not biometric | Personal data; not biometric unless it identifies or infers emotion |
Where Fora Soft fits in
Fora Soft has built video streaming, real-time communication, and computer-vision software since 2005, across 625+ delivered projects for 400+ clients, with surveillance and computer vision at the centre of that work. On anomaly detection our stance is the accuracy-vs-performance one this article argues for: we tune the operating point to your control room's false-alarm budget before we celebrate a detection rate, we default to running the continuous scoring at the edge or on a local server for latency and privacy, and we wire the output into your VMS over ONVIF Profile M and into forensic search so it works as a triage layer rather than a noisy siren. We treat "flag the unusual" as a cue for a person, not a judgment about one — which keeps the system both effective under real load and defensible under review.
What to read next
- Behavioral analytics: loitering, intrusion, crowd, and zones — the rule-based counterpart anomaly detection complements.
- Tuning analytics: false alarms, accuracy, and the operator's reality — how to set the dial this article keeps pointing at.
- Search by event: making months of footage findable — where anomaly detection delivers most of its value, as a triage layer.
References
- ONVIF Analytics Service Specification — ONVIF. (Tier 1.) Defines the analytics architecture as a scene-description engine plus a normative rule engine, with Annex A specifying the Line Detector, Field Detector, Loitering Detector, and counting rules as the standardized rule types — establishing that an open-ended "anomaly" detector is not one of the normative rule types and therefore travels as vendor-defined metadata. https://www.onvif.org/specs/srv/analytics/ONVIF-Analytics-Service-Spec.pdf
- Profile M — Metadata and events for analytics applications — ONVIF. (Tier 1.) Standardizes the channel through which analytics metadata and events reach a VMS (metadata stream, event service, or MQTT), and states that conformance is a baseline for interoperability, not a guarantee of accuracy — the basis for "Profile M carries the anomaly event; it does not standardize its meaning or quality." https://www.onvif.org/profiles/profile-m/
- GDPR — Regulation (EU) 2016/679, Art. 4(1) (personal data) and Art. 35 (Data Protection Impact Assessment) — European Union (EUR-Lex). (Tier 1.) Art. 4(1) makes behavior scored on an identifiable person personal data even without a name; Art. 35 requires a DPIA for systematic monitoring of a publicly accessible area. The basis for the privacy baseline. https://eur-lex.europa.eu/eli/reg/2016/679/oj
- Guidelines 3/2019 on processing of personal data through video devices — European Data Protection Board (EDPB). (Tier 1.) Ranks intelligent video analysis from less intrusive (scene-level analysis) to more intrusive biometric technologies, and sets the DPIA expectation for systematic public monitoring — the basis for "anomaly detection is personal data but generally not biometric until it identifies." https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-32019-processing-personal-data-through-video_en
- Artificial Intelligence Act — Regulation (EU) 2024/1689, Art. 5 (prohibited practices) — European Union (EUR-Lex). (Tier 1.) Restricts emotion-recognition systems in workplaces and education and curtails real-time remote biometric identification in public spaces — the basis for the line where anomaly detection that infers emotion or identifies a person enters the heavier regime. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
- AI Act implementation timeline — European Commission, Shaping Europe's Digital Future. (Tier 2.) The issuing body's current timeline: prohibitions in force since February 2025; transparency obligations for emotion-recognition and biometric-categorization systems from 2 August 2026; high-risk obligations covering biometrics from 2 December 2027 under the May 2026 revision. The basis for the dated regulatory note. https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
- Real-World Anomaly Detection in Surveillance Videos — Sultani, Chen, Shah (CVPR 2018), UCF Center for Research in Computer Vision. (Tier 5, academic primary.) Introduces the UCF-Crime benchmark (1,900 untrimmed real-world videos, 128 hours, 13 anomaly types) and the weakly-supervised approach using video-level labels; argues that anomalies cannot be enumerated and that real-world false-alarm rates are the central difficulty. The basis for the rule-free motivation and the benchmark figures. https://openaccess.thecvf.com/content_cvpr_2018/papers/Sultani_Real-World_Anomaly_Detection_CVPR_2018_paper.pdf
- Video Anomaly Detection with Probabilistic Modelling and Ensemble Learning on Deep Spatiotemporal Features — Bameri et al., IET Image Processing (2025). (Tier 5, academic.) Reports 2025 frame-level results across the standard benchmarks (≈97% AUC on UCSD Ped2, ≈96% on CUHK Avenue, ≈98% on ShanghaiTech, with lower scores on real-world UCF-Crime and XD-Violence) — the basis for the controlled-vs-real-world accuracy spread. https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/ipr2.70247
- VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models — CVPR 2025. (Tier 5, academic.) A representative 2025 vision-language method that scores anomalies and returns natural-language explanations without instruction tuning — the basis for the explainable, training-free VLM family (alongside LAVAD). https://github.com/vera-framework/VERA
- Anomaly Detection on the Edge Using Smart Cameras under Low-Light Conditions — Sensors / MDPI (2024). (Tier 5, academic.) Demonstrates anomaly detection running on edge smart cameras with low latency and bandwidth, sending only metadata and clips onward — the basis for the edge-deployment latency and privacy framing. https://www.mdpi.com/1424-8220/24/3/772


