AI-based anomaly detection in surveillance systems for public space security

Key takeaways

AI-based anomaly detection in surveillance learns “normal” from video and flags everything else — loitering, falls, abandoned bags, intrusion, fights — in real time.

The bottleneck is false positives, not detection. A 95% accurate model that fires 20 false alarms an hour will be muted by week two. Architect for precision and operator workflow first.

Edge + cloud is the dominant 2026 stack. Run lightweight detection at the camera or NVR (NVIDIA Jetson / Hailo / Coral), escalate to a cloud model only when confidence is borderline.

Compliance is non-optional. GDPR, BIPA, the EU AI Act, NDAA hardware bans, and sector rules (HIPAA in healthcare, PCI in retail) shape what you can deploy and where data can live.

Custom beats off-the-shelf when you have a specific scene type (factory floor, courtroom, stadium, hospital) and operators who’ll act on alerts. Off-the-shelf VMS plugins are fine for generic intrusion detection only.

Why Fora Soft wrote this playbook

Fora Soft has been shipping video and surveillance products since 2005, including custom VMS platforms, courtroom recording systems, and AI-driven retail security stacks. Our practice covers the full pipeline — camera capture, edge inference, cloud aggregation, operator workflow, and audit trail — not just the model.

This article is the same shortlist we walk through with founders, security leaders and product owners during scoping calls. We pull from playbooks already published on this blog — the edge AI vs cloud AI breakdown and our custom VMS guide — plus production benchmarks from real deployments.

We use Agent Engineering on every engagement, which is why our scoping and integration timelines run shorter than a traditional team's. Where we cite a price band below, it's the realistic Fora Soft band, not a generic agency one.

Need anomaly detection that actually fires fewer than five false alarms a day?

We design the model, the edge stack, and the operator workflow. Most pilots are live within 8 weeks.

Book a 30-min call → WhatsApp → Email us →

What AI-based anomaly detection actually is

Anomaly detection is the class of computer-vision tasks where a model learns the distribution of normal behaviour in a scene — the typical motion vectors, dwell time, occupancy, object trajectories — and flags any frame, segment, or trajectory that lies outside that distribution. Unlike object detection (“is there a person?”) or activity recognition (“is this person running?”), anomaly detection answers “is this unusual for this scene at this time?”

Most production systems combine three layers: a fast object detector (YOLOv8/YOLOv11, RT-DETR), a temporal model that tracks objects across frames (ByteTrack, BoT-SORT), and an anomaly head that compares observed behaviour against either a learned normality model (autoencoder, one-class SVM, normalising flow) or a rule engine (zone intrusion, loitering thresholds, abandoned-object timers). The hybrid is where production accuracy lives.
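
To make the three-layer split concrete, here is a minimal sketch of the hand-off. It assumes the open-source ultralytics package; the RTSP URL is a placeholder, and anomaly_score() stands in for whichever normality model or rule engine the scene calls for. It is the shape of the pipeline, not a production implementation.

```python
# Minimal shape of the three-layer pipeline: detector -> tracker -> anomaly head.
from collections import defaultdict

from ultralytics import YOLO  # pip install ultralytics

model = YOLO("yolo11n.pt")            # layer 1: object detector
tracks = defaultdict(list)            # track_id -> list of box centres

def anomaly_score(trajectory: list) -> float:
    """Placeholder anomaly head: plug in an autoencoder, one-class SVM,
    or rule engine here."""
    return 0.0

# layer 2: ByteTrack keeps IDs stable across frames
for result in model.track("rtsp://camera-01/stream", stream=True,
                          tracker="bytetrack.yaml"):
    if result.boxes.id is None:       # nothing tracked in this frame
        continue
    ids = result.boxes.id.int().tolist()
    boxes = result.boxes.xyxy.tolist()
    for tid, (x1, y1, x2, y2) in zip(ids, boxes):
        tracks[tid].append(((x1 + x2) / 2, (y1 + y2) / 2))
        if anomaly_score(tracks[tid]) > 0.8:   # layer 3: anomaly head
            print(f"anomaly candidate on track {tid}")
```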

Why it matters in 2026

Three forces have made AI anomaly detection table-stakes in surveillance this year. First, camera fleets are huge: the average enterprise deployment now exceeds 200 channels, and no human operator can monitor that many. Second, edge AI accelerators (NVIDIA Jetson Orin Nano, Hailo-15, Google Coral) ship at $300–$700 a unit, so on-camera inference is finally affordable. Third, the EU AI Act and emerging US state laws (Illinois BIPA, California CPRA) require auditable, explainable analytics — which off-the-shelf cloud-only systems often can't deliver.

Use cases that actually pay back

Sector | Anomaly to detect | Typical ROI driver
Retail | Shoplifting, sweethearting, queue length, slip & fall | Shrink reduction (target 0.3–1.0% of sales)
Healthcare | Patient falls, wandering, restricted-area entry | Reduced incident response time, insurer discount
Manufacturing | PPE non-compliance, conveyor jams, near-miss safety | OSHA risk, downtime per incident
Transit / airports | Abandoned bags, wrong-way movement, crowd density | Faster evacuation, regulator compliance
Stadiums & events | Crowd surge, fights, restricted-zone breach | Event safety scoring, insurance premium
Logistics & warehousing | Forklift near-miss, theft, dock-door mismatch | Loss prevention, audit trail for claims
Critical infrastructure | Perimeter intrusion, drone, vehicle anomaly | Regulatory mandate, insurance discount

For deeper retail-AI patterns we shipped, see our writeup on cloud video platform development for AI-powered retail security.

Reference architecture — the four layers that ship

A production anomaly detection stack we deploy almost always splits into four layers, with explicit contracts between them so each can be replaced or upgraded independently.

1. Capture & transport. ONVIF/RTSP-compatible cameras (4–8 MP), with H.264 / H.265 encoding, streaming over a private VLAN. NDAA-compliant hardware (no Hikvision / Dahua firmware) is mandatory for US federal contracts.

2. Edge inference. A small box per 8–16 cameras running an object detector + tracker + lightweight anomaly head. NVIDIA Jetson Orin Nano (~$300, 40 TOPS), Hailo-15 (~$200, 20 TOPS), or rack-style Jetson AGX Orin (~$1,500, 275 TOPS) for dense sites. ONVIF events fire only on confirmed anomaly.

3. Cloud aggregation & second-pass model. When edge confidence is borderline, push the 5–10 second clip to a cloud model (heavier video transformer, multi-camera reasoning) for confirmation before alerting. Storage is typically S3-compatible with 30–90 day retention.

4. Operator console & audit trail. Web app for triage, mobile push for on-call, MQTT/webhooks for VMS / SIEM integration, immutable event log for compliance. The operator workflow is what determines whether the system saves money or just collects dust.
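
The contract between layers 2 and 3 is a confidence band. Here is a minimal sketch of that hand-off; the thresholds, the /v1/confirm endpoint, and the cloud hostname are illustrative assumptions, not a fixed API.

```python
# Edge-to-cloud escalation: alert on high confidence, escalate the borderline
# band, drop the rest. Thresholds and the /v1/confirm endpoint are illustrative.
import requests  # pip install requests

EDGE_ALERT = 0.90     # alert straight from the edge above this
EDGE_ESCALATE = 0.60  # borderline band: ask the heavier cloud model

def fire_alert(camera_id: str, source: str, confidence: float) -> None:
    print(f"ALERT cam={camera_id} via={source} conf={confidence:.2f}")

def handle_detection(confidence: float, clip_path: str, camera_id: str) -> None:
    if confidence >= EDGE_ALERT:
        fire_alert(camera_id, "edge", confidence)
    elif confidence >= EDGE_ESCALATE:
        # push only the 5-10 s clip, never the raw stream
        with open(clip_path, "rb") as clip:
            resp = requests.post(
                "https://cloud.example.com/v1/confirm",
                files={"clip": clip},
                data={"camera_id": camera_id},
                timeout=10,
            ).json()
        if resp.get("confirmed"):
            fire_alert(camera_id, "cloud", resp["score"])
    # below the band: discard and increment a counter only
```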

Reach for edge-first when: you have hundreds of cameras, strict latency budgets (sub-300 ms), or tight egress costs — the edge vs cloud cost math typically lands edge ahead at 50+ channels.

The algorithms behind the magic

There’s no single algorithm for anomaly detection — production systems blend several, each with its own strength.

1. Object detectors. YOLOv8 / YOLOv11, RT-DETR, EfficientDet. Output: bounding boxes + class. Used as the foundation for everything.

2. Trackers. ByteTrack, BoT-SORT, OC-SORT. Convert per-frame detections into trajectories so “a person is loitering” can be expressed as “trajectory id N has been within 5 m of point X for >3 minutes.”

3. Action recognition. SlowFast, X3D, VideoMAE-v2 for “is this falling?”, “is this a fight?”, “is this throwing?”. Ten years ago this required hand-crafted features; today a fine-tuned VideoMAE-v2 hits 90%+ on Kinetics-400 subsets.

4. Anomaly heads. Autoencoders (reconstruction error as anomaly score), one-class SVM, Isolation Forest, normalising flows (FastFlow, CFLOW). Best suited to appearance-based anomalies (broken machinery, unauthorised access).

5. Rule engines. Counterintuitively, simple rule layers (zone, dwell time, count, line crossing) catch 60–70% of useful events at near-zero compute cost. Always shipped alongside the ML models.
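
The loitering definition in item 2 translates almost line-for-line into a rule-engine check. A sketch, assuming positions already projected from pixels to metres via a calibrated homography; the watch point, radius, and dwell limit are illustrative.

```python
# The loitering rule from item 2: "trajectory N has been within 5 m of
# point X for > 3 minutes." Assumes positions already mapped to metres.
import math
import time

WATCH_POINT = (12.0, 4.5)  # metres, scene coordinates (illustrative)
RADIUS_M = 5.0
DWELL_LIMIT_S = 180

entered_at: dict[int, float] = {}  # track_id -> time it entered the zone

def is_loitering(track_id: int, position_m: tuple[float, float]) -> bool:
    if math.dist(position_m, WATCH_POINT) > RADIUS_M:
        entered_at.pop(track_id, None)   # left the zone: reset the timer
        return False
    entered_at.setdefault(track_id, time.time())
    return time.time() - entered_at[track_id] > DWELL_LIMIT_S
```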

Read more in our companion piece on machine learning algorithms for anomaly detection.

Build, buy, or hybrid — the comparison matrix

Dimension | Off-the-shelf VMS plugin | Cloud AI service (Verkada, Avigilon, Spot AI) | Custom build
Time to first detection | Days | Weeks | 8–14 weeks
Tunable to your scene | Limited | Limited | Full
Per-camera cost / month | $2–$10 | $30–$80 | $3–$15 (after build)
Edge inference | Sometimes | Vendor-locked hardware | Yes — any accelerator
Data residency control | Limited | Vendor controls | Full
Audit trail / EU AI Act | Partial | Vendor-defined | Engineered to spec
Integrate with your VMS / SIEM | Native (one VMS) | Their VMS only | Any — via webhooks / MQTT / ONVIF
Best for | Generic intrusion, single-site | Mid-market multi-site, IT-light teams | Domain-specific, regulated, multi-system

A worked cost example: 100 cameras, retail chain

Below is the model we use during scoping. Imagine a 100-camera retail chain spread over five stores, looking for shoplifting, slip-and-fall, queue length, and after-hours intrusion.

Line item | Quantity | Indicative cost
Edge boxes (Jetson Orin Nano, 1 per ~10 cameras) | 10 | ~$3,500 capex
Cloud second-pass + storage / month | 100 ch × 30 days | ~$400–$900/mo
Custom build (8–12 weeks, lean team) | 1 | $60k–$140k
Operator console + alert routing | 1 | Included in build
Annual run-rate after launch | 100 channels | ~$5k–$11k/yr ops + ~$3–$15/cam/mo

By comparison, a Verkada-class cloud subscription for the same fleet runs $40–$80/camera/month, or $48k–$96k/year. The custom build pays back inside 18–24 months for any fleet over ~50 cameras — faster if you need data residency or domain-specific events the off-the-shelf SaaS can't emit.
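
The payback claim is simple arithmetic. A sketch using the midpoints of the bands in the tables above:

```python
# Payback math with band midpoints from the tables above.
cameras = 100
custom_build = 100_000        # midpoint of the $60k-$140k band
edge_capex = 3_500
custom_monthly = cameras * 9  # $3-$15/cam/mo -> $9 midpoint
saas_monthly = cameras * 60   # $40-$80/cam/mo -> $60 midpoint

monthly_saving = saas_monthly - custom_monthly         # $5,100/mo
payback_months = (custom_build + edge_capex) / monthly_saving
print(f"payback in ~{payback_months:.0f} months")      # ~20 months
```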

The false-positive problem — how we hold the rate under five per day

A model with a 95% true-positive rate is useless if it fires 50 times an hour. The single biggest determinant of whether anomaly detection succeeds in production is the false-positive rate. The four levers we pull on every project:

  • Per-scene calibration. A learned baseline for each camera and each time-of-day — so “person at 3 a.m.” means something different in a 24/7 hospital vs an office.
  • Two-stage cascade. Edge model with ~70% precision triggers a cloud second-pass model with ~95% precision. Operators only see confirmed alerts.
  • Operator feedback loop. Every false positive flagged by an operator is a labelled negative for the next training cycle. The model learns the venue.
  • Suppression rules. Hardcoded ignore-zones (cleaning crew route, delivery dock at 5–7 a.m., known maintenance shift). Boring — and wildly effective.

Done well, this gets enterprise sites under 5 false alarms per day per 100 cameras — the threshold below which operators stop muting alerts.
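
The suppression lever is the cheapest of the four to implement. A minimal sketch; the camera IDs, zone names, and time windows are illustrative, and a real deployment would load them from the operator console.

```python
# Time-boxed suppression rules (the fourth lever). Cameras, zones,
# and windows are illustrative.
from datetime import datetime, time

SUPPRESSIONS = [
    # (camera_id, zone, window_start, window_end)
    ("dock-03", "delivery_door", time(5, 0), time(7, 0)),
    ("lobby-01", "cleaning_route", time(22, 0), time(23, 30)),
]

def is_suppressed(camera_id: str, zone: str,
                  now: datetime | None = None) -> bool:
    t = (now or datetime.now()).time()
    return any(cam == camera_id and z == zone and start <= t <= end
               for cam, z, start, end in SUPPRESSIONS)
```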

Burning operator hours on false alarms?

We’ll review your current detection stack and propose the calibration, cascade, and suppression changes that drop false positives 80–90%.

Book a 30-min call → WhatsApp → Email us →

Data collection, labelling, and model training

1. Capture. Two weeks of recorded streams per camera, covering all shift patterns. Ten hours is not enough; 200 hours is plenty.

2. Pre-processing. Frame sampling (1–5 fps for non-action; 10+ fps for falls and fights), background subtraction where lighting is stable, masking for privacy zones (children’s areas, restrooms).

3. Labelling. Active learning with a small expert pool. Tools we use: Label Studio, CVAT, Roboflow. Budget 50–100 hours of labelling per anomaly class for a starter dataset.

4. Training. Fine-tuning open-weights models (YOLOv11, VideoMAE-v2) is faster, cheaper and more interpretable than training from scratch. Few-shot anomaly approaches (PatchCore, FastFlow) work well when labelled anomaly examples are scarce.

5. Continuous learning. Operator-flagged events feed back into a weekly retraining job. Drift checks run on every release: if precision drops more than 5 points on the holdout set, the deploy is blocked.
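
The drift check in step 5 reduces to a one-function release gate. A sketch with illustrative numbers:

```python
# Release gate for step 5: block the deploy if holdout precision
# regresses more than five points. Numbers are illustrative.
def gate_release(prod_precision: float, candidate_precision: float,
                 max_drop_pts: float = 5.0) -> None:
    drop_pts = (prod_precision - candidate_precision) * 100
    if drop_pts > max_drop_pts:
        raise RuntimeError(
            f"deploy blocked: precision down {drop_pts:.1f} pts on holdout")

gate_release(prod_precision=0.93, candidate_precision=0.91)  # passes: 2.0 pts
```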

Edge vs cloud — latency, cost, and the hybrid default

Edge inference cuts latency from seconds to under 200 ms, eliminates most egress, and keeps raw video on-prem. Cloud inference scales models that don’t fit on edge hardware, runs cross-camera reasoning, and absorbs heavy retraining. The 2026 default is hybrid: edge does the first pass on every frame; cloud handles confirmation and historical search.

A few rules of thumb from our deployments:

  • < 50 cameras, regulated industry: hybrid with NDAA-compliant cameras, on-prem servers, optional cloud audit log only.
  • 50–500 cameras: edge boxes per ~10–16 cameras, cloud second-pass + storage. Best price-performance.
  • 500+ cameras: rack-style edge servers per site (Jetson AGX Orin, A2/A30 GPUs), regional cloud aggregation. Plan for 24/7 SRE.

Security, privacy, and compliance — the deal-breaker filter

1. EU AI Act. Real-time biometric ID in public spaces is largely banned; non-biometric anomaly detection (loitering, intrusion, falls) remains permitted but classed as high-risk in many sites — meaning audit logs, human oversight, and conformity assessments are mandatory.

2. GDPR & UK GDPR. Lawful basis required (legitimate interest test, DPIA), retention limits, subject-access rights. Privacy masking on faces in public-area footage is the simplest mitigation.

3. NDAA Section 889. US federal projects cannot use Hikvision / Dahua / Huawei cameras or core components. Choose Axis, Hanwha, Bosch, i-PRO, or Pelco where possible.

4. BIPA & state laws. Illinois, Texas, Washington and a growing list of states regulate biometric capture. Anomaly detection that does not extract biometric features is generally outside scope — but face-recognition add-ons are not.

5. Sector frameworks. HIPAA in healthcare, PCI in card-present retail, NERC CIP in energy, CJIS in law enforcement. Each adds explicit controls on top of the baseline.

Mini case: a courtroom and a retail chain — same stack, two scenes

Situation. A court-recording client and a regional retail chain came to us within the same quarter, both wanting AI surveillance. The court needed wrong-room entry detection and unattended-evidence alerts; the retail chain needed shoplifting and slip-and-fall alerts. Both insisted on data residency and on-prem processing.

12-week plan. Same backbone — YOLOv11 + ByteTrack + autoencoder anomaly head — with two scene-specific heads fine-tuned on 200 hours of recorded footage. Edge boxes per zone (Jetson Orin Nano), private VLAN, on-prem operator console. Audit log shipped to a write-once store for both clients.

Outcome. Court site: false-alert rate < 2/day across 30 cameras; mean operator response time on flagged events fell from ~95 seconds to ~22. Retail site: shoplifting incidents flagged at ~88% recall on internal evaluation; slip-and-fall response time from ~7 minutes to ~90 seconds. Same engineering investment, two products.

Want a similar assessment?

A decision framework — pick a path in five questions

Q1. How specific is your scene? Generic intrusion / motion — off-the-shelf VMS plugin works. Domain-specific (factory, courtroom, hospital, sports) — custom or hybrid.

Q2. How many cameras? Fewer than 30 — cloud SaaS is fine. 30–500 — hybrid wins on cost. 500+ — custom edge fleet.

Q3. Are you under EU AI Act, NDAA, BIPA or sector compliance? Yes — bias toward custom build with explicit audit trail.

Q4. Will operators actually act on alerts? If the answer is no, fix the workflow first — no model fixes a broken response process.

Q5. What’s your data-residency requirement? Cloud-only is fine for non-regulated; otherwise put inference on-prem and only push aggregated metadata.
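
The five questions collapse into a small decision function we use as a scoping aid. The cut-offs mirror the text above; they are heuristics, not hard rules.

```python
# The five questions as a scoping aid.
def recommend(scene_specific: bool, cameras: int, regulated: bool,
              operators_act: bool, on_prem_required: bool) -> str:
    if not operators_act:
        return "fix the operator workflow first"            # Q4
    if regulated or on_prem_required:
        return "custom build with explicit audit trail"     # Q3 / Q5
    if not scene_specific and cameras < 30:
        return "off-the-shelf VMS plugin or cloud SaaS"     # Q1 / Q2
    if cameras >= 500:
        return "custom edge fleet"                          # Q2
    return "hybrid edge + cloud"

print(recommend(scene_specific=True, cameras=120, regulated=True,
                operators_act=True, on_prem_required=True))
# -> custom build with explicit audit trail
```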

Five pitfalls we see every quarter

1. Picking the model before the workflow. Operators’ willingness to act on alerts is the limiting factor. Design the dispatch flow first, then pick the model that hits the precision threshold.

2. Ignoring camera quality. A 4K camera with bad bitrate is worse than a 2 MP camera with high bitrate. Validate the encoder before training.

3. Not budgeting for labelling. Most projects underestimate labelling effort by 3–5x. Plan it in.

4. Skipping the operator console. A great model with a bad triage UI fails. Treat the console as a first-class product surface.

5. Treating compliance as paperwork. EU AI Act audit logs and BIPA notices are deployment blockers if discovered late. Wire them in from day one.

KPIs to track once you’re live

1. Quality KPIs. Precision (target > 90%), recall (> 80% for safety events), false-positive rate per 100 cameras (target < 5/day).

2. Operations KPIs. Mean time to operator action (target < 60 s for high-priority), alert mute rate (< 5%), incident-to-resolution time vs baseline.

3. Reliability KPIs. Per-camera uptime (> 99.5%), edge inference latency P95 (< 250 ms), audit-log completeness (100%).
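
A daily roll-up against the quality targets takes a few lines. The counts come from the operator console's event log; the field names and example numbers are illustrative.

```python
# Daily quality-KPI roll-up against the targets above.
def quality_kpis(tp: int, fp: int, fn: int, cameras: int) -> dict:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fp_per_100 = fp / cameras * 100
    return {
        "precision": round(precision, 3),                 # target > 0.90
        "recall": round(recall, 3),                       # target > 0.80
        "fp_per_100_cams_per_day": round(fp_per_100, 1),  # target < 5
        "within_targets": precision > 0.90 and recall > 0.80 and fp_per_100 < 5,
    }

print(quality_kpis(tp=46, fp=4, fn=9, cameras=100))
```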

When you shouldn’t deploy AI anomaly detection (yet)

Three scenarios where the right answer is “not now”. First, when you don’t have an operator team or escalation process — you’ll generate alerts no one acts on. Second, when your camera fleet is bad enough (< 1 MP, low bitrate, intermittent power) that you’d be teaching a model on noise. Fix the cameras first. Third, when your jurisdiction’s AI rules are still in flux for your sector — pilot in a non-regulated area while the legal picture clarifies.

Want a 30-minute architecture review for your surveillance stack?

Bring your camera count, your scenes, your compliance constraints — we’ll come back with a build vs buy recommendation and a written cost band.

Book a 30-min call → WhatsApp → Email us →

FAQ

How accurate is AI-based anomaly detection in 2026?

For well-defined anomalies (loitering, intrusion, abandoned object) precision > 95% is achievable on calibrated scenes. For nuanced behaviours (shoplifting, fights, falls) production systems typically operate at 85–92% precision and 75–88% recall after 4–8 weeks of operator-feedback retraining.

Do I need new cameras to deploy AI detection?

Usually no — if your existing cameras are 2 MP+ ONVIF/RTSP at a stable bitrate. We can run inference on the stream directly. We’ll only recommend replacement when cameras can’t deliver the resolution or framerate the model needs.
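
Reading an existing camera's stream needs nothing more than OpenCV. A sketch; the RTSP URL and sampling rate are placeholders.

```python
# Inference straight off an existing camera's RTSP stream.
import cv2  # pip install opencv-python

cap = cv2.VideoCapture("rtsp://user:pass@192.0.2.10:554/stream1")
SAMPLE_EVERY = 10  # ~2-3 fps effective against a 25-30 fps source
frame_idx = 0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    if frame_idx % SAMPLE_EVERY:
        continue
    # hand the sampled frame to the detector pipeline here
cap.release()
```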

Can the system detect what hasn’t been labelled?

Yes — that’s the “true” anomaly-detection mode. A normality model (autoencoder, normalising flow) flags deviations even without labels for the specific event. Precision is lower than supervised, so we usually combine the two.
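
A minimal label-free example, using Isolation Forest, one of the heads named earlier. The per-trajectory features and the synthetic "normal" training data are illustrative; a real deployment fits on trajectories recorded from the actual scene.

```python
# Label-free normality model: fit on "normal" trajectories only, score new ones.
import numpy as np
from sklearn.ensemble import IsolationForest  # pip install scikit-learn

rng = np.random.default_rng(0)
# per-trajectory features: mean speed (m/s), dwell (s), path length (m), zone
normal = rng.normal([1.2, 20.0, 15.0, 2.0], [0.3, 8.0, 5.0, 1.0],
                    size=(5000, 4))

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

suspect = np.array([[0.1, 400.0, 3.0, 2.0]])  # near-still for ~7 minutes
print(model.predict(suspect))                 # [-1] -> flagged anomalous
```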

How does this handle privacy under GDPR or BIPA?

For non-biometric anomaly detection we deploy on-prem inference, blur faces and licence plates in stored footage, set short retention windows, and document the lawful basis (DPIA for GDPR, written consent flow for BIPA where applicable). The audit log is the proof.

What does an 8–12 week pilot cost?

For a typical 30–100 camera scope we deliver pilots in the $40k–$120k band, including model training, edge deployment, operator console MVP and audit-log wiring. Final number depends on scene complexity and compliance lift — we won’t commit to a fixed price without scoping.

Will it work in low-light or outdoor scenes?

Yes, with caveats. Cameras with IR or thermal sensors maintain accuracy down to twilight; pure colour models drop sharply below ~3 lux. For outdoor scenes we add weather-augmented training data and per-condition models.

Does it integrate with our existing VMS?

Yes. We typically push events as ONVIF Profile T metadata, MQTT messages, or webhook calls into Milestone XProtect, Genetec Security Center, Avigilon ACC, Frigate or any custom VMS. The console is optional if you already have a triage UI.
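
A typical event push over MQTT looks like this. The broker address, topic layout, and payload schema are illustrative conventions, not a standard.

```python
# Pushing a confirmed event to a VMS/SIEM over MQTT.
import json

from paho.mqtt import publish  # pip install paho-mqtt

event = {
    "type": "abandoned_object",
    "confidence": 0.93,
    "ts": "2026-01-15T14:03:22Z",
    "clip_url": "https://storage.example.com/clips/abc123.mp4",
}
publish.single(
    "site-04/cam-17/anomaly",
    payload=json.dumps(event),
    hostname="vms-broker.local",
    qos=1,  # at-least-once; the consumer dedupes on clip_url
)
```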

Can you ship without using Hikvision / Dahua hardware?

Yes — we routinely deliver NDAA-compliant deployments using Axis, Hanwha, Bosch, i-PRO and Pelco cameras with NVIDIA Jetson or Hailo edge accelerators. All our reference designs are NDAA-compliant by default.

Further reading

  • Edge AI vs Cloud AI for Video Surveillance: 2026 latency and cost breakdown for real deployments.
  • Custom VMS Development Guide: how to build a Video Management System that fits your scenes.
  • Cloud Video Platform for AI Retail Security: a 2026 architecture for shoplifting and queue analytics.
  • Video Recognition Software Development: building custom video recognition for modern applications.

Ready to ship anomaly detection that operators trust?

AI-based anomaly detection in 2026 is mature, affordable, and regulated. The accuracy ceiling has stopped being the issue; the false-positive floor, the operator workflow, and the audit trail are. Pick custom when your scenes are specific and operators will act; pick a hybrid edge + cloud stack when your fleet is bigger than 30 cameras; and put compliance into the design from day one.

If you want a partner who’s shipped this on regulated, on-prem, multi-site deployments — from courtroom recording to retail to manufacturing — talk to us. We’ll tell you whether build or buy fits your case in 30 minutes and back the recommendation with the same numbers above.

Anomaly detection that pays back in 18 months

30 minutes, your fleet, an honest plan. We’ll show the build vs buy math and the next 8 weeks of work.

Book a 30-min call → WhatsApp → Email us →
