Published 2026-06-03 · 30 min read · By Nikolay Sapunov, CEO at Fora Soft
Why this matters
Video surveillance is a large and fast-moving market — the video surveillance systems market is on the order of $64–72 billion in 2026, the intelligent-video-analytics slice alone is around $15 billion and growing, and the subscription model, video surveillance as a service, is near $7.6 billion and compounding at double digits as buyers shift from buying hardware to renting capability. "Add AI to the cameras" is now on the roadmap of nearly every security integrator, building operator, retailer, and city, and behind that phrase sit real engineering and legal decisions: which analytics run on the camera versus a server versus the cloud, what bandwidth and hardware that implies, what the EU AI Act and GDPR let you point a camera at, and which vendors you are even allowed to buy under US procurement law. This playbook answers those for the surveillance vertical specifically. It is written so a product manager can plan the feature and its risk posture without a computer-vision or law degree, and so an engineer can see exactly where each analytic taps the video stream and where it goes wrong. The deeper lessons in this section are the per-component manuals — object detection, tracking, anomaly detection, face recognition, edge hardware; this is the vertical map that tells you which one to open.
What "AI in a security camera" actually means
Strip away the marketing and an AI security camera is a normal camera bolted to a small computer that runs intelligent video analytics (IVA): software that looks at the video and answers questions about it as structured data instead of just recording pixels. The single question it answers most often is also the one that matters most: is the thing that just moved something I should care about? Everything else is built on that.
It helps to see the whole catalog before reasoning about any one feature, because the features sort into three groups that the law, the hardware, and the budget all treat differently.
The first group is detecting objects and events. The camera runs object detection — the technology that draws a box around a thing in the frame and names it, "person", "car", "truck", "dog" — and from that single capability you get the headline features: a person in a restricted zone after hours, a vehicle crossing a virtual line, a bag left behind, a worker not wearing a hard hat, someone who has fallen and not got up. This is the bulk of the value, and the rest of this article keeps coming back to it.
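The restricted-zone feature reduces to a geometry check on each detection. A minimal sketch, assuming a detector already returns a label, a confidence, and a pixel bounding box; the `Detection` type, the 22:00 to 06:00 window, and the 0.6 confidence floor are hypothetical choices, not any vendor's API. It tests the bottom-centre of the box (roughly where the person stands) against the zone polygon with the standard ray-casting rule.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Detection:
    label: str         # class name from the detector, e.g. "person"
    confidence: float  # detector score in [0, 1]
    box: tuple         # (x1, y1, x2, y2) in pixels; y grows downward


def point_in_polygon(x, y, polygon):
    """Ray casting: inside if a ray to the right crosses an odd number of edges."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_after_hours(t, start=time(22, 0), end=time(6, 0)):
    """True if t falls inside a window that may wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def zone_intrusion(det, zone, t, min_conf=0.6):
    """Alert only for a confident person whose feet are in the zone after hours."""
    if det.label != "person" or det.confidence < min_conf:
        return False
    x1, _, x2, y2 = det.box
    foot_x, foot_y = (x1 + x2) / 2, y2  # bottom-centre of the box
    return is_after_hours(t) and point_in_polygon(foot_x, foot_y, zone)


zone = [(100, 100), (300, 100), (300, 300), (100, 300)]  # hypothetical zone 3
print(zone_intrusion(Detection("person", 0.92, (150, 50, 250, 250)), zone, time(2, 14)))  # True
```

Using the foot point rather than the box centre is a design choice: a tall person standing just outside a floor-level zone can have a box centre that overlaps it.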
The second group is analysing behaviour over time. Once the camera can detect a thing in one frame, it can follow that thing across frames (multi-object tracking, abbreviated MOT), and counting, dwell time, loitering, queue length, crowd density, and people-flow heat maps all fall out of tracking detected objects through time. Automatic license-plate recognition (ALPR, also called ANPR) belongs here too: it pairs a vehicle's track with a character-recognition step that reads the plate. These features answer "how many, how long, which way" rather than just "is something there".
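Counting, dwell time, and loitering are bookkeeping on top of the tracker's output. This sketch assumes a multi-object tracker has already assigned a stable `track_id` to each person and that you log a timestamp whenever that track's position is inside the zone; the 120-second loitering threshold is a placeholder to tune per site.

```python
def dwell_times(observations):
    """observations: (track_id, timestamp_s) pairs logged while a track was in the zone.
    Returns {track_id: seconds}, first to last sighting (short gaps are not subtracted)."""
    first, last = {}, {}
    for track_id, ts in observations:
        first[track_id] = min(ts, first.get(track_id, ts))
        last[track_id] = max(ts, last.get(track_id, ts))
    return {tid: last[tid] - first[tid] for tid in first}


def people_count(observations):
    """Distinct tracks seen in the zone: the basis of occupancy and footfall counts."""
    return len({track_id for track_id, _ in observations})


def loiterers(observations, threshold_s=120):
    """Track ids whose dwell time meets the loitering threshold."""
    return sorted(tid for tid, d in dwell_times(observations).items() if d >= threshold_s)


obs = [(1, 0), (1, 30), (1, 150), (2, 10), (2, 40), (3, 5)]
print(people_count(obs), loiterers(obs))  # 3 [1]
```

None of this works without stable track ids: if the tracker swaps or re-issues ids when people cross paths, dwell times are silently split or merged.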
The third group is identifying who a specific person is, by their face or other biometric — face recognition that matches a face against a watchlist or a database of known individuals. This group looks like just another analytic. It is not. As the next sections show, the moment a camera stops asking "is there a person?" and starts asking "is this Jane Doe?", it crosses into biometric processing governed by a different body of law and a different engineering standard, and confusing it with the first two groups is the single most expensive mistake a surveillance team can make.
Figure 1. The AI-in-surveillance feature catalog, grouped by what the AI does. The first two groups watch for events; the third identifies people — and that distinction drives the entire playbook.
Why AI cameras exist at all: the false-alarm problem
Before the topology and the law, it is worth being concrete about why anyone adds AI to a camera, because the reason is a single, measurable number. Traditional cameras detect activity with motion detection — they compare each frame to the last and raise an alert when enough pixels change. The trouble is that a swaying tree, a passing cloud shadow, headlights sweeping a wall, rain, a spider on the lens, and a plastic bag in the wind all change pixels, so motion detection cries wolf constantly.
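Frame differencing, the core of the motion detection described above, fits in a few lines, and writing it out shows why it cries wolf: it counts changed pixels and knows nothing about what changed. A toy sketch on grayscale frames stored as nested lists; the `pixel_delta` and `min_changed_fraction` thresholds are illustrative values, not defaults from any product.

```python
def motion_detected(prev, curr, pixel_delta=25, min_changed_fraction=0.01):
    """Naive frame differencing on two same-sized grayscale frames
    (lists of rows of 0-255 ints). Returns True when enough pixels changed;
    it has no idea whether the change was a person, a branch, or a shadow."""
    changed = total = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_delta:
                changed += 1
    return total > 0 and changed / total >= min_changed_fraction


still = [[40] * 10 for _ in range(10)]
shadow = [row[:] for row in still]
for r in range(3):  # a 3x3 patch darkens, e.g. a passing cloud shadow
    for c in range(3):
        shadow[r][c] = 0
print(motion_detected(still, shadow))  # True (9 of 100 pixels changed)
```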
Put numbers on it. A single outdoor parking-lot camera on plain motion detection generates roughly 80–120 motion events on a typical night; of those, perhaps two or three are a real security concern. Scale that to a modest 50-camera site and you have around 5,000 motion alerts every night. If an operator spent just 15 seconds glancing at each, that is:
5,000 alerts × 15 seconds = 75,000 seconds ≈ 20.8 hours of review — every night
No security team does 20 hours of review a night, so what actually happens is that the alerts get muted and ignored — the industry calls it alarm fatigue — and the one real intruder is lost in a sea of branches and shadows. A motion-detection system at scale is, in practice, an off switch with extra steps.
AI changes the one decision that matters. Instead of "did pixels change?", the camera asks "did a person or a vehicle appear?" — object classification before alerting — so trees, shadows, rain, and animals never raise an alarm. Across the industry this cuts false alarms by about 90–95%. Run the same 50-camera site again: AI filters each camera down to roughly 2–5 genuine-candidate events a night, call it 4, so:
50 cameras × 4 events = 200 alerts per night
200 alerts × 15 seconds = 3,000 seconds = 50 minutes of review
Fifty minutes is a job a single operator can actually do, every alert reviewed, nothing muted. That move — from 5,000 ignored alerts to 200 actioned ones, a 96% cut — is the entire reason AI security cameras exist. Every other feature in the catalog is gravy on top of that one calculation. The detection model doing the classifying is the subject of the YOLO production lineage lesson and the object detection and tracking lesson; the "is this normal?" judgement layered on top is in the anomaly detection playbook.
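The review-load arithmetic is easy to rerun with your own site's numbers. A small sketch; the 100 motion events and 4 AI events per camera per night are the midpoints used in the text, not measurements, and 15 seconds per alert is the same assumption as above.

```python
def review_hours(cameras, events_per_camera, seconds_per_alert=15):
    """Operator time to glance at every alert in one night, in hours."""
    return cameras * events_per_camera * seconds_per_alert / 3600


baseline = review_hours(50, 100)  # motion detection: ~100 events per camera
with_ai = review_hours(50, 4)     # object classification: ~4 events per camera
reduction = 1 - with_ai / baseline

print(f"{baseline:.1f} h vs {with_ai * 60:.0f} min, {reduction:.0%} fewer alerts")
# 20.8 h vs 50 min, 96% fewer alerts
```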
The surveillance wrinkle: where does the AI actually run?
Here is the structural decision that ties "AI" to "surveillance" specifically, and it is the part most buyer's-guide round-ups skip. In an e-learning product the binding question is what clock a feature runs on; in surveillance it is where the analytics physically execute, because that one choice decides bandwidth, latency, privacy, and cost all at once. There are three places, and most real systems end up using more than one.
The first place is on the camera itself — the edge. Modern cameras carry a small vision chip, a system-on-chip with a built-in neural accelerator (Ambarella's CVflow line, Axis's ARTPEC, or a bolt-on module like a Hailo-8), that runs the detection model inside the camera. The camera watches its own video, decides "person, zone 3, 02:14, confidence 0.92", and ships only that little packet of metadata plus a short clip when something actually happens. Nothing leaves the camera unless it matters. This is cheap on bandwidth, fast to react, and private by default, and it limits the blast radius of a breach to one camera rather than a whole site.
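What "ships only that little packet of metadata" means can be sketched as a filter that runs after the on-camera detector. The field names, the class list, and the 0.6 threshold below are hypothetical, chosen for illustration; a real camera emits whatever its analytics SDK or metadata standard defines.

```python
import json
from datetime import datetime, timezone

CLASSES_OF_INTEREST = {"person", "car", "truck"}  # hypothetical, set per site


def edge_event(camera_id, zone, detections, min_conf=0.6, now=None):
    """Run on the camera after the detector. Returns a compact JSON event for the
    most confident qualifying detection, or None so that nothing leaves the device."""
    hits = [d for d in detections
            if d["label"] in CLASSES_OF_INTEREST and d["confidence"] >= min_conf]
    if not hits:
        return None
    best = max(hits, key=lambda d: d["confidence"])
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "camera": camera_id,
        "zone": zone,
        "label": best["label"],
        "confidence": round(best["confidence"], 2),
        "time": now.isoformat(timespec="seconds"),
    })


at = datetime(2026, 6, 3, 2, 14, 33, tzinfo=timezone.utc)
print(edge_event(12, 3, [{"label": "dog", "confidence": 0.9}], now=at))  # None
print(edge_event(12, 3, [{"label": "person", "confidence": 0.92}], now=at))
```

The second call produces an event of roughly a hundred bytes, in line with the few-hundred-byte figure used in the bandwidth math below.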
The second place is an on-premises server — a box in the building running video management software (VMS) that pulls in every camera's stream and runs the analytics centrally, usually on a GPU. The server's superpower is correlation: because it sees all the cameras at once, it can follow one person across camera 4, then 7, then 12, and stitch a path through the site — something no single edge camera can do. The cost is that every camera has to send full video to the server continuously, and the server needs real hardware.
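Once identities are linked, the server's cross-camera path is a group-and-sort over events. The hard part, deciding that the person on camera 4 is the same one on camera 7, is re-identification; this sketch assumes it has already happened upstream and stamped a hypothetical `global_id` on each event.

```python
from collections import defaultdict


def stitch_paths(events):
    """events: dicts with 'global_id', 'camera' and 'ts' (seconds).
    'global_id' is hypothetical: it assumes an upstream re-identification step
    already linked sightings on different cameras to the same person.
    Returns {global_id: [camera, ...]} in time order, collapsing repeats."""
    by_person = defaultdict(list)
    for e in events:
        by_person[e["global_id"]].append(e)
    paths = {}
    for gid, evs in by_person.items():
        path = []
        for ev in sorted(evs, key=lambda item: item["ts"]):
            if not path or path[-1] != ev["camera"]:
                path.append(ev["camera"])
        paths[gid] = path
    return paths


events = [
    {"global_id": "p1", "camera": 12, "ts": 300},
    {"global_id": "p1", "camera": 4, "ts": 10},
    {"global_id": "p1", "camera": 7, "ts": 95},
    {"global_id": "p1", "camera": 7, "ts": 120},
]
print(stitch_paths(events))  # {'p1': [4, 7, 12]}
```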
The third place is the cloud — video surveillance as a service (VSaaS), where the streams go to a provider's data center and the analytics, storage, and dashboard all live there as a subscription. The appeal is operational: no servers to maintain, capacity that scales with a slider, and a monthly bill instead of a capital purchase. The cost is that the video has to leave the premises — which is both a bandwidth bill and a privacy decision — and that you are renting, not owning, the capability.
In 2026 the honest answer for most non-trivial deployments is hybrid: detection runs on the camera to kill the bandwidth and false-alarm problem, a server or the cloud correlates events and holds the searchable record, and the two talk over a standard metadata channel. The deployment-topology trade-off — what runs at the edge versus the cloud — is the subject of the latency and deployment-topology lesson, and the edge silicon and how many camera streams a Jetson or Hailo board can actually carry is worked through in the Jetson Orin capstone.
Figure 2. The three places the AI can run. Edge keeps bandwidth and exposure down; a server correlates across cameras; the cloud trades premises-bandwidth for zero-maintenance scale — and most real systems blend them.
The bandwidth math that pushes analytics to the edge
The three-place choice is not aesthetic; it is arithmetic, and the arithmetic is lopsided enough to be worth showing. Take a common modern camera: 4-megapixel sensor, H.265 compression, streaming at a typical 4 megabits per second. Send 50 of those continuously to a cloud analytics service and the sustained upload is:
50 cameras × 4 Mbps = 200 Mbps of upload, 24 hours a day
Most buildings do not have 200 Mbps of upload to spare, and a full day of it is:
200 Mbps × 86,400 seconds/day ÷ 8 bits/byte ≈ 2.16 terabytes per day
of raw video crossing the internet and landing in cloud storage and egress charges. Now run the detection on the camera instead. A single person-detection event — "person, camera 12, zone 3, 02:14:33, confidence 0.92" — is a few hundred bytes of JSON. Even a busy night of 200 such events across the site is:
200 events × ~300 bytes ≈ 60 kilobytes of metadata for the whole night
plus a handful of short clips for the events that matter (a 10-second clip at 4 Mbps is about 5 MB). The uplink falls from a sustained 200 Mbps to kilobytes of metadata plus those clips, a reduction of roughly four to five orders of magnitude, and more than seven for the metadata alone. That gap is why "do the analytics on the camera and send only what matters" became the dominant architecture, and why the biggest shift of the last few years has been away from recording everything to a central box and toward cameras that think for themselves. The metadata standard that lets a camera's events flow into any vendor's software, ONVIF Profile M, is covered below in the section on the three ways to add AI to a surveillance system.
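The bandwidth comparison can be reproduced with a few lines of arithmetic. The 300-byte event and the 4 Mbps stream come from the text; the ten 10-second clips standing in for "a handful of short clips" are an assumption.

```python
import math


def cloud_bytes_per_day(cameras, mbps):
    """Continuous streaming of every camera off-site."""
    return cameras * mbps * 1e6 * 86_400 / 8


def edge_bytes(events, bytes_per_event=300, clips=0, clip_s=10, mbps=4):
    """Metadata events plus optional short clips at the camera's stream bitrate."""
    return events * bytes_per_event + clips * clip_s * mbps * 1e6 / 8


cloud = cloud_bytes_per_day(50, 4)      # 2.16e12 bytes, about 2.16 TB
metadata = edge_bytes(200)              # 60,000 bytes, about 60 KB
with_clips = edge_bytes(200, clips=10)  # assumption: ten 10-second clips

print(f"{cloud / 1e12:.2f} TB/day vs {metadata / 1e3:.0f} KB metadata")
print(f"orders of magnitude: {math.log10(cloud / with_clips):.1f} with clips, "
      f"{math.log10(cloud / metadata):.1f} metadata only")
# 2.16 TB/day vs 60 KB metadata
# orders of magnitude: 4.6 with clips, 7.6 metadata only
```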
The line that decides everything: watching events vs identifying people
In the false-alarm section the binding constraint was accuracy; in the topology section it was bandwidth; in the law it is what the camera is allowed to recognise about a person, and the rule that follows is blunt: detecting that there is a person is ordinary analytics, while identifying who that person is, by their face or body, is biometric processing, a different legal category with a much higher standard. Europe wrote this line into law, and because the EU AI Act (Regulation (EU) 2024/1689) applies to any system placed on the market or used in the EU, and GDPR applies to organisations established in the EU and to anyone monitoring the behaviour of people in the EU, it sets the floor for almost any serious deployment.
The line sorts surveillance analytics into three tiers.
Figure 3. The legal tiers for surveillance AI. Event analytics sit in the light-touch top band under GDPR; identifying a named person is high-risk biometric; real-time identification of people in public spaces is largely prohibited.
The top tier is standard event analytics, and it holds most of what teams actually want to ship. Detecting a person or vehicle, flagging an intrusion or a crossed line, counting occupancy, spotting loitering, reading a license plate (still personal data under GDPR, but not biometric): all of these watch for events without putting a name to anyone. The EU AI Act does not apply its heavy high-risk regime to them. Their obligations come from data-protection law: under GDPR you need a lawful basis for the camera, clear signage that filming is happening, a sensible retention period rather than keeping footage forever, and, for CCTV that systematically monitors a publicly accessible area on a large scale, a data protection impact assessment (DPIA), the written analysis required by GDPR Article 35 of the privacy risk and how you have reduced it. This is the bucket to plan first and deploy fast, with privacy controls baked in.
The middle tier is high-risk biometric identification. The moment the system tries to determine which specific individual a face belongs to — matching it against a watchlist or an identity database — it is a biometric identification system, listed as high-risk in Annex III, point 1 of the EU AI Act. The high-risk regime applies from 2 August 2026: a documented risk-management system, governance of the training data, human oversight by design, demonstrated accuracy across different groups of people, automatic logging, technical documentation, and a conformity assessment before the system goes into service. On top of that, GDPR treats biometric data used to identify someone as special-category data under Article 9 — the most protected class — which generally requires explicit consent or a specific legal authorisation and makes the impact assessment mandatory, not optional. This is not a checkbox on a camera; it is a separate product with its own compliance project, and it is the part teams routinely underestimate because a vendor presents face recognition as just another toggle.
The bottom tier is prohibited. Since 2 February 2025, Article 5 of the EU AI Act has banned several biometric practices outright. The headline one for surveillance is real-time remote biometric identification in publicly accessible spaces for law-enforcement purposes: pointing a live face-recognition camera at a street, a station, or a square to identify people as they pass. It is forbidden except for narrowly drawn objectives (searching for specific victims, preventing an imminent terrorist threat, locating a serious-crime suspect), and even then only with prior judicial or independent authorisation and a fundamental-rights impact assessment. A private operator running the same live public scan falls outside that specific ban, but lands in the high-risk tier above and still needs a GDPR Article 9 basis for the biometric data, which is very hard to establish for a live scan of passers-by. Two more bans matter here: inferring people's emotions in the workplace or in education institutions (outside medical or safety reasons), and building face databases by untargeted scraping of facial images from the internet or CCTV footage. The penalties are not symbolic: up to €35 million or 7% of global annual turnover, whichever is higher. A "live face ID on the public entrance" or "read the crowd's mood by webcam" feature is a line you do not cross.
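The three tiers can be written as a first-pass triage function, useful as a forcing question in feature reviews. This is a simplified sketch, not legal advice: it ignores the narrow Article 5 exceptions (authorised law-enforcement searches, emotion recognition for medical or safety reasons), and it treats all identification conservatively as high-risk even though one-to-one verification is handled more lightly.

```python
from enum import Enum


class Tier(Enum):
    EVENT_ANALYTICS = "GDPR: lawful basis, signage, retention, DPIA where required"
    HIGH_RISK_BIOMETRIC = "AI Act Annex III point 1 + GDPR Art. 9 special-category data"
    PROHIBITED = "AI Act Art. 5 practice"


def legal_tier(identifies_person, realtime=False, public_space=False,
               law_enforcement=False, emotion_at_work_or_school=False,
               untargeted_face_scraping=False):
    """First-pass triage only; exceptions and edge cases need a lawyer."""
    if emotion_at_work_or_school or untargeted_face_scraping:
        return Tier.PROHIBITED
    if identifies_person and realtime and public_space and law_enforcement:
        return Tier.PROHIBITED  # only narrow, pre-authorised exceptions exist
    if identifies_person:
        return Tier.HIGH_RISK_BIOMETRIC  # includes live public ID by private operators
    return Tier.EVENT_ANALYTICS


print(legal_tier(identifies_person=False).name)  # EVENT_ANALYTICS
```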
The same logic holds outside Europe even where the statute differs: identifying a named individual carries consequences that counting anonymous people never does, so it earns a higher standard of accuracy testing, fairness review, human oversight, and record-keeping wherever it ships. The EU AI Act simply makes the line explicit and enforceable. The biometric components themselves, and how to build them inside the law, are in the face detection and recognition under the EU AI Act lesson, and the broader regulatory engineering is in the EU AI Act regulatory lesson.
The detection bucket: where bias and accuracy hide
Take the event-analytics tier first, because it is where the value is and where a good engineering decision saves the most grief. The defining property is that the camera produces a flag, not a verdict: it proposes "person in zone 3", and a human — an operator, a guard, a reviewer — decides what to do. Get that shape right and the system is an asset; treat the flag as an automatic enforcement action and it becomes a liability.
Accuracy has two failure modes, and both cost money. A false positive (calling a shadow a person) pulls the system back toward alarm fatigue, the very problem AI was meant to solve, so the detection model and its confidence threshold have to be tuned to the site, the lighting, and the season, not left at a vendor default. A false negative (missing a real person because they are small in the frame, partly hidden, or in poor light) is the more dangerous one: a security system that quietly fails to detect is worse than no system, because it breeds false confidence. The honest engineering answer is to measure both rates on your footage before trusting the system, and to keep a human in the loop on anything consequential.
There is a fairness dimension that surveillance teams have to take seriously, because cameras get pointed at people. Detection and especially recognition models have a documented history of uneven performance across skin tones, lighting conditions, and camera placements, and an analytic that works well on one group and poorly on another is not just a quality bug — in a security context it produces unequal treatment. The discipline is to test detection and recognition parity across the populations the camera will actually see, before deployment, and to design so that a low-confidence result triggers human review rather than an automatic consequence. When the question is genuinely "is this normal for this scene?" rather than "is this a known object?", the right tool is anomaly detection rather than a fixed class list — covered in the anomaly detection algorithms lesson — and when a fixed list of classes is too rigid for an open-ended scene, open-vocabulary detection (described in the open-vocabulary detection lesson) lets the camera find things it was never explicitly trained on.
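A parity test is the same measurement split by the slices you care about. A minimal sketch, assuming each ground-truth person in your test footage carries a group annotation (lighting condition, camera placement, or a demographic label where collecting one is lawful) and a flag for whether the detector found them; the 0.05 tolerance is a hypothetical policy value.

```python
def recall_by_group(results):
    """results: (group, detected) for each ground-truth person in test footage.
    Returns {group: recall}, the share of real people the detector found."""
    hits, totals = {}, {}
    for group, detected in results:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(bool(detected))
    return {g: hits[g] / totals[g] for g in totals}


def parity_report(results, tolerance=0.05):
    """Returns (gap between best and worst group, whether it is within tolerance)."""
    recalls = recall_by_group(results)
    gap = max(recalls.values()) - min(recalls.values())
    return gap, gap <= tolerance


results = ([("daylight", True)] * 9 + [("daylight", False)]
           + [("night", True)] * 6 + [("night", False)] * 4)
gap, ok = parity_report(results)
print(f"gap {gap:.2f}, within tolerance: {ok}")  # gap 0.30, within tolerance: False
```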
Common pitfall: flipping on face recognition because the camera offers it. A modern camera ships with a face-recognition or "real-time identification" toggle, a demo makes it look like a free upgrade, and a team enables it to "know who's on site". In one switch the deployment leaves the GDPR-managed event-analytics tier and enters high-risk biometric territory (EU AI Act Annex III point 1, special-category data under GDPR Article 9, mandatory impact assessment). If the camera watches a publicly accessible space and identifies people live, it becomes the practice Article 5 bans for law enforcement (penalties up to €35 million or 7% of turnover), and one that is very hard for any other operator to justify under GDPR. The fix is structural, not a disclaimer: keep biometric identification off by default, treat it as a separate, deliberately scoped product with its own legal sign-off, never run real-time remote identification in a public space, and ask of every camera, "does this need to know who someone is, or only that a person is there?", because almost always the answer is the latter.
The identification bucket: a different product, not a bigger feature
The high-risk tier is where surveillance AI gets genuinely hard, and where most teams should slow down. Face recognition and watchlist matching share a property the event analytics do not: a wrong output changes a person's standing — a false match can get an innocent person detained, denied entry, or followed — and that is a harm the person cannot easily undo, which is exactly why the law puts it under the high-risk regime and why the engineering bar rises to match.
The defensible pattern mirrors the detection bucket but with more weight on every safeguard. The system proposes a possible identity with a confidence score and the evidence for it; a trained human reviews the match before any action is taken; the match threshold is set high enough that the system prefers to say "not sure" over guessing; the model's accuracy is measured separately across demographic groups and the gaps are documented and addressed; every query is logged; and the legal basis — consent, or a specific statutory authorisation — is established before a single face is enrolled. Verification (confirming someone is who they claim, one-to-one, at a door they chose to approach) is a lighter case than identification (picking a face out of a crowd against a database, one-to-many), and the guidelines draw that line deliberately: a person presenting themselves to a sensor is different from a camera scanning a street.
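The flag-and-verify pattern for one-to-many identification can be sketched as a function that can only ever propose. It assumes face embeddings from some recognition model (not shown) and a gallery enrolled with a lawful basis; the 0.8 similarity threshold and 0.05 margin over the runner-up are hypothetical and must be set from your own accuracy testing. Weak or ambiguous matches return "uncertain", every query is appended to an audit log, and one-to-one verification is the narrow case of comparing the probe with the single identity the person claimed.

```python
import math
from datetime import datetime, timezone


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def propose_identity(probe, gallery, threshold=0.8, margin=0.05, audit_log=None):
    """One-to-many search that can only propose. probe: face embedding;
    gallery: {person_id: embedding}. Returns a candidate for human review,
    or 'uncertain' when the best match is weak or too close to the runner-up."""
    scored = sorted(((cosine(probe, emb), pid) for pid, emb in gallery.items()),
                    reverse=True)
    best_score, best_id = scored[0] if scored else (0.0, None)
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score >= threshold and best_score - runner_up >= margin:
        result = {"status": "needs_human_review", "candidate": best_id}
    else:
        result = {"status": "uncertain", "candidate": None}
    result["score"] = round(best_score, 3)
    if audit_log is not None:  # log every query, match or not
        audit_log.append({"time": datetime.now(timezone.utc).isoformat(), **result})
    return result


def verify(probe, claimed_embedding, threshold=0.8):
    """One-to-one verification: does the probe match the identity the person claimed?"""
    return cosine(probe, claimed_embedding) >= threshold
```

Note that nothing in the function triggers an action; the "needs_human_review" status is the only positive outcome it can return.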
The takeaway for a product lead is the same as the pitfall: do not let an identification feature ride into the roadmap on the momentum of the detection features. It is a separate product with its own compliance work, its own fairness testing, its own legal basis, and its own timeline, and the honest default for most surveillance products is to ship the event-analytics bucket now and treat biometric identification as a deliberate, scoped, later decision — if at all.
Three ways to add AI to a surveillance system
If you are building or integrating the surveillance system itself, "add AI" resolves to one of three routes, and they trade speed against control the way platform decisions always do.
Figure 4. Three routes to AI inside a surveillance system. Embedding over ONVIF deploys fastest; building on open models at the edge keeps the video and the model inside your own hardware.
The first route is to embed via a standard and a vendor. Surveillance has a real advantage here: a mature interoperability standard. ONVIF is the industry body whose specifications let cameras and software from different manufacturers talk to each other, and its Profile M (for metadata) standardises exactly the thing AI cameras produce — the analytics events and metadata, including object classification, counting, and face- and license-plate events, carried in a common format (including JSON events over the lightweight MQTT messaging protocol). Profile M means an AI camera from one vendor can feed its detections into a video management system from another without custom glue code. Through ONVIF and a VMS or analytics vendor you can stand up detection, intrusion, and counting in days to weeks, inheriting the vendor's models and a camera that emits standardised events. The cost is that the analytics quality and roadmap live with the vendor, and you fit within the features they expose. ONVIF Profile M is worth understanding in its own right; Fora Soft has written a deeper engineering treatment of it on our blog, and the buyer-versus-builder view of video management software more broadly is on our VMS playbook.
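On the integration side, events arriving over MQTT still need routing inside the VMS. This sketch is a pure function you might call from an MQTT client's message callback (for example `route_event(msg.topic, msg.payload)` inside a paho-mqtt `on_message` handler); the topic layout and the JSON field names and `type` values are a simplified, hypothetical shape, not the Profile M schema, so map real names from your camera's documentation. Face events are dropped unless biometric processing has been deliberately switched on.

```python
import json

BIOMETRIC_ENABLED = False  # off by default; turning it on is a separate, signed-off product


def route_event(topic, payload):
    """Decide where one analytics event goes. payload: JSON bytes or str.
    The 'type' values below are hypothetical, not ONVIF Profile M names."""
    event = json.loads(payload)
    event["source_topic"] = topic  # keep provenance for the audit trail
    kind = event.get("type")
    if kind == "face" and not BIOMETRIC_ENABLED:
        return "drop", None
    if kind in ("intrusion", "line_crossing", "loitering"):
        return "operator_queue", event
    if kind in ("count", "occupancy"):
        return "analytics_store", event
    return "archive", event


print(route_event("site/cam12/events", b'{"type": "intrusion", "zone": 3}')[0])  # operator_queue
```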
The second route is to assemble a computer-vision stack — wire an object-detection and tracking model to your own event logic and review dashboard, on an on-premises server or in the cloud. This is a step up in effort, weeks to a few months, and it buys real control: you decide which analytics run, how they are tuned, and where the video flows. The trade is that the models, the accuracy tuning, and the privacy design are now yours to own. The question of when a general vision-language model can replace a stack of custom detectors — increasingly relevant for open-ended "describe what's happening" surveillance queries — is in the "just use a VLM" lesson.
The third route, and the only one that keeps the video fully inside your own hardware, is to build on open-weights models at the edge — run an open detector such as YOLO on edge silicon (an NVIDIA Jetson module, a Hailo accelerator, an Ambarella SoC) inside your own cameras or gateways. This takes the most engineering up front, typically months, but the video never leaves the device, there is no per-camera cloud fee, and you control the model and the data end to end. It is the route for a deployment where privacy or independence from a vendor is the whole point — a prison, a hospital, a defense site, a privacy-sensitive retailer. Shrinking a model to fit a camera's modest chip without losing too much accuracy is its own craft, covered in the distillation and quantization for edge lesson.
One procurement note cuts across all three routes and surprises teams late: you may not be free to buy any camera you like. In the United States, Section 889 of the FY2019 National Defense Authorization Act (NDAA) bars federal agencies from buying equipment from certain manufacturers, and from contracting with companies that use it, while the FCC's "Covered List" restricts new US equipment authorizations for the same vendors. Hikvision and Dahua are the most prominent names; by some estimates excluding them removes around a third of global camera supply from eligible bids, and it pushes buyers toward compliant brands (Axis, Hanwha, and others) that often cost 20–35% more. Even if you are not selling to the government, NDAA compliance has become a common enterprise procurement requirement, so the hardware decision has a legal dimension before the AI does.
A worked cost example: the false-alarm dividend and the bandwidth bill
Two different numbers drive a surveillance-AI budget, and they pull in the same direction once you do the arithmetic. The first is the operator-cost dividend from killing false alarms, calculated above: a 50-camera site goes from ~5,000 nightly alerts (≈ 21 hours of impossible review) to ~200 (≈ 50 minutes of real review), a 96% cut. The value is not just labour saved — it is that the alerts become trustworthy again, so a real intrusion is actually seen. That dividend is the core return on the whole investment.
The second is the bandwidth-and-infrastructure bill, which the choice of where the AI runs swings by orders of magnitude. Streaming 50 cameras to the cloud for analysis is ~200 Mbps sustained and ~2.16 TB/day of video; running detection on the camera and shipping only events is tens of kilobytes a night plus a few clips. Put the routes side by side and the shape is clear: the edge route trades a higher per-camera hardware cost for near-zero bandwidth and storage; the cloud route trades low upfront cost for an ongoing bandwidth-and-subscription bill that scales with every camera you add.
| Cost driver | Scales with | Baseline (motion detection, cloud streaming) | AI at the edge |
|---|---|---|---|
| Operator review | Alerts per night | ~5,000 alerts (≈ 21 h) → ignored | ~200 alerts (≈ 50 min) → actioned |
| False-alarm rate | Environment | Trees, shadows, rain all alert | ~90–95% fewer false alarms |
| Uplink bandwidth | Cameras streamed off-site | ~200 Mbps for 50 cameras (cloud) | tens of KB/night of metadata (edge) |
| Video stored/egressed | Hours retained | ~2.16 TB/day raw (cloud) | event clips only (edge) |
| Camera hardware | Per camera | cheaper "dumb" camera | edge-AI camera costs more upfront |
The numbers are illustrative and move with camera specs, codecs, and provider prices; the point is the shape. The expensive parts of a cloud-everything design — sustained uplink, storage, egress — are exactly the parts on-camera analytics make nearly free, while the expensive part of an edge design — smarter cameras — is a one-time capital cost. The per-feature cost method behind all three routes is in the real cost of AI in video lesson.
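The edge-versus-cloud shape turns into a break-even estimate once you have real quotes. Every price in this sketch is an input you supply, not a market figure, and for simplicity it ignores any recurring cost on the edge side (VMS licences, maintenance).

```python
import math


def break_even_months(cameras, edge_premium_per_camera, cloud_monthly_per_camera,
                      cloud_bandwidth_monthly=0.0):
    """Months until the one-time premium for edge-AI cameras is repaid by the
    cloud subscription and bandwidth bills it avoids. All prices are your own
    quotes; recurring edge-side costs are ignored for simplicity."""
    upfront = cameras * edge_premium_per_camera
    monthly_saving = cameras * cloud_monthly_per_camera + cloud_bandwidth_monthly
    if monthly_saving <= 0:
        return math.inf  # nothing to recoup against
    return upfront / monthly_saving


# Placeholder inputs only: 50 cameras, 200 extra per camera, 20 per camera per month.
print(break_even_months(50, 200, 20))  # 10.0
```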
The gate every deployment passes: privacy, lawful basis, and retention
This is where a surveillance roadmap quietly becomes a legal one. Treat what follows as engineering-relevant context, not legal advice — confirm specifics with a qualified lawyer for the jurisdictions you operate in.
A lawful basis and notice come first. Under GDPR you cannot point a camera at people just because you can; you need a lawful basis (commonly "legitimate interests", balanced against the privacy intrusion and documented), and you must tell people they are being filmed with clear signage. A camera that systematically monitors a publicly accessible area generally requires a data protection impact assessment before it goes live — a written analysis of the risk and the mitigations — and for any biometric identification that DPIA is mandatory and the data is special-category, demanding a stronger legal basis still.
Data minimisation and retention come second. The privacy-respecting design keeps as little as it can for as short as it can: store event clips rather than continuous footage where the use case allows, set a retention period and delete on schedule rather than hoarding video indefinitely, and prefer architectures — on-camera analytics, metadata instead of raw video — that never move personal imagery off the device in the first place. GDPR also expects the footage itself to be protected: encrypted in transit and at rest, with access limited to the people who genuinely need it.
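Deleting on schedule is a small job that is easy to forget to automate. This sketch assumes each stored clip has an id and a timezone-aware creation time; the 30-day default is a placeholder for whatever retention period your DPIA justifies, and the hold set covers footage preserved for an open incident.

```python
from datetime import datetime, timedelta, timezone


def clips_to_delete(clips, retention_days=30, now=None, legal_holds=frozenset()):
    """clips: iterable of (clip_id, created_at) with timezone-aware datetimes.
    Returns ids older than the retention period, skipping clips on hold."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [cid for cid, created in clips if created < cutoff and cid not in legal_holds]


now = datetime(2026, 6, 3, tzinfo=timezone.utc)
clips = [
    ("a", datetime(2026, 4, 1, tzinfo=timezone.utc)),   # past retention
    ("b", datetime(2026, 5, 30, tzinfo=timezone.utc)),  # still within retention
    ("c", datetime(2026, 1, 1, tzinfo=timezone.utc)),   # old, but on hold
]
print(clips_to_delete(clips, now=now, legal_holds={"c"}))  # ['a']
```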
The EU AI Act layer comes third, on top of GDPR, not instead of it. For event analytics the Act is light-touch and the work is the GDPR work above. For biometric identification the Act adds the full high-risk regime from 2 August 2026: risk management, data governance, human oversight, accuracy and bias testing, logging, technical documentation, and a conformity assessment. For real-time remote identification in public spaces it removes the option entirely for law enforcement outside narrow, authorised exceptions, and everyone else faces the high-risk regime plus a GDPR Article 9 basis that is very hard to establish. The two regimes stack: a face-recognition deployment in the EU has to satisfy GDPR and the AI Act, and the broader regulatory engineering across both is in the EU AI Act lesson.
The rule across all three gates is the same one that governs the whole article: a human stays in control of anything that affects a person. The lawful basis is your authority to film; minimisation and retention are your discipline about what you keep; the AI Act is the ceiling on what the camera may recognise. A deployment that respects all three is an asset; one that skips any of them is a liability waiting for a regulator.
The playbook: from "add AI to the cameras" to a deployed system
Put the pieces together and adding AI to a surveillance system reduces to four questions, asked in order.
Figure 5. The playbook in one path. Sort the feature by whether it identifies a person, place the analytics in the right tier, enforce flag-and-verify, and pass every deployment through the privacy-and-compliance gate.
First, sort the feature: does it identify a specific person (face recognition, watchlist matching, biometric ID)? If no, it is standard event analytics; deploy it under GDPR with signage, retention, and an impact assessment where required. If yes, it is high-risk biometric under Annex III point 1; scope it as a separate compliance product with fairness testing and human oversight, and never run real-time remote identification in a publicly accessible space, which Article 5 bans for law enforcement and GDPR makes very hard to justify for anyone else. Second, place the analytics: on the camera for low bandwidth and privacy, on a server to correlate one person across many cameras, or in the cloud for maintenance-free scale, and expect to blend them. Third, the flag-and-verify rule: let the AI flag a detection or propose a match, and put a human on the decision before any consequence; never auto-lock a door, dispatch a guard, or act on an identification from an AI flag alone. Fourth, and without exception, the compliance gate: establish a lawful basis and post signage, complete a DPIA, set retention limits and encryption, check the hardware against the NDAA rules where they apply, and for any biometric feature confirm the EU AI Act high-risk obligations before launch.
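The fourth question works best as a hard launch gate rather than a reminder. A minimal sketch; the checklist item names are hypothetical labels for the obligations listed above, and an empty result means nothing on this list is blocking launch, not that the deployment is legally cleared.

```python
REQUIRED_FOR_ALL = (
    "lawful_basis_documented", "signage_posted", "dpia_done_if_required",
    "retention_limit_set", "encryption_in_transit_and_at_rest", "ndaa_hardware_checked",
)
REQUIRED_FOR_BIOMETRIC = (
    "art9_legal_basis", "dpia_done", "ai_act_high_risk_obligations",
    "human_review_before_action", "group_accuracy_tested",
)


def launch_blockers(checks, biometric=False):
    """checks: {item: bool}. Returns the missing items in checklist order."""
    required = REQUIRED_FOR_ALL + (REQUIRED_FOR_BIOMETRIC if biometric else ())
    return [item for item in required if not checks.get(item, False)]


done = {item: True for item in REQUIRED_FOR_ALL}
print(launch_blockers(done))                  # []
print(len(launch_blockers(done, biometric=True)))  # 5
```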
That is the entire playbook. The deeper lessons in this section are the manuals for each box — YOLO detection and multi-object tracking for the event analytics, anomaly detection for the "is this normal?" layer, retail and industrial intelligent video analytics for the behaviour-counting bucket, face detection under the EU AI Act for the identification components you should approach with the most caution, and the video investigator agent for searching the archive after the fact.
Where Fora Soft fits in
We build the video-surveillance, computer-vision, and streaming platforms that these analytics live inside — VMS dashboards, edge-analytics pipelines, and the camera integrations underneath them — so we run this playbook with clients regularly. When a client wants to deploy fast on standard hardware, we integrate AI cameras and an analytics package over ONVIF Profile M into a VMS and wire the lawful-basis, signage, and retention decisions in first. When privacy or independence is the point, we build on owned pipelines — running open detection models on edge silicon so the video never leaves the device, correlating events on an on-premises server, and designing data minimisation and the flag-and-verify pattern into the flow from the first sprint. When a client raises face recognition or biometric identification, we treat it as the separate, high-risk product it is, with its own legal sign-off. The four questions in this playbook are the same ones we weigh in scoping calls when a surveillance client asks where AI belongs in their system.
What to read next
- Face detection / recognition under the EU AI Act
- Computer vision in retail + industrial + intelligent video analytics
- Video investigator agent — surveillance use case
Talk to us / See our work / Download
- Talk to a video engineer — scope AI into your surveillance system: book a 30-minute call.
- See our work — computer-vision and video-surveillance systems we have shipped: computer vision for video surveillance.
- Download the AI Surveillance Engineering & Compliance Decision Sheet — the feature catalog, the three places the AI runs, the legal tiers, the build-versus-buy split, the false-alarm and bandwidth math, and the privacy-and-compliance gate on one page: AI Security Camera + Intelligent Video Analytics — Engineering & Compliance Decision Sheet.
References
- Regulation (EU) 2024/1689 (EU AI Act) — Article 5 (Prohibited AI practices). Bans real-time remote biometric identification in publicly accessible spaces for law enforcement except narrow, judicially authorised cases; bans emotion recognition in workplaces and education; bans untargeted scraping of facial images. In force since 2 Feb 2025; penalties up to €35M or 7% of global turnover. Read directly from the consolidated Article 5 text. Tier 1 (official EU regulation). https://artificialintelligenceact.eu/article/5/
- Regulation (EU) 2024/1689 (EU AI Act) — Annex III, point 1 (Biometrics). Classifies remote biometric identification systems, biometric categorisation, and emotion recognition (where not prohibited) as high-risk; high-risk obligations (risk management, data governance, human oversight, accuracy/robustness, logging, conformity assessment) apply from 2 Aug 2026. Tier 1 (official EU regulation). https://artificialintelligenceact.eu/annex/3/
- Regulation (EU) 2016/679 (GDPR) — Articles 9, 32, 35. Biometric data used to identify a person is special-category data (Art. 9); processing requires security incl. encryption (Art. 32); a Data Protection Impact Assessment is required for high-risk processing such as systematic monitoring of public areas and biometric identification (Art. 35). The data-protection floor for any CCTV deployment. Tier 1 (official EU regulation). https://eur-lex.europa.eu/eli/reg/2016/679/oj
- ONVIF — Profile M (Metadata and events for analytics applications). Standardises analytics metadata and events — object classification, counting, face- and license-plate events — between cameras, VMS, and software, including JSON events over MQTT, so multi-vendor AI cameras interoperate. Released by the ONVIF standards body; see also Profiles S/T/G for streaming and recording. Tier 6 (industry standards body). https://www.onvif.org/profiles/profile-m/
- US National Defense Authorization Act, Section 889 + FCC "Covered List". Bars US federal agencies and contractors from using telecommunications/video-surveillance equipment from named manufacturers (incl. Hikvision, Dahua); removes a large share of global camera supply from eligible bids; NDAA compliance is now a common enterprise procurement requirement. Tier 1 (official US statute/regulatory list). https://www.fcc.gov/supplychain/coveredlist
- Ambarella — CV7 edge-AI vision SoC (announced CES, January 2026). Third-generation CVflow AI accelerator, >2.5× AI performance over CV5, ~20% lower power on Samsung 4nm, up to 8K; targets multi-imager enterprise security cameras. Illustrative of 2026 on-camera edge-AI silicon (alongside NVIDIA Jetson AGX Orin at 275 TOPS and Hailo-8 for single/few-camera systems). Tier 4 (vendor / deployer). https://www.ambarella.com/news/ambarella-launches-powerful-edge-ai-8k-vision-soc-with-industry-leading-ai-and-multi-sensor-perception-performance/
- Bay Alarm / Sirix Monitoring / Guardian — AI vs. motion-detection false-alarm reduction. AI object classification reduces false alarms by ~90–95% vs. pixel-based motion detection; an outdoor parking-lot camera produces ~80–120 motion events/night with only ~2–3 genuine, filtered to ~2–5 flagged events with AI. The quantified value of object classification over motion detection. Tier 7 (industry/vendor analysis). https://www.bayalarm.com/blog/ai-motion-sensors-false-alarm-era/
- Mordor Intelligence / Grand View Research / Fortune Business Insights — market sizing (2026). Video analytics market ≈ $15B; video surveillance systems market ≈ $64–72B; VSaaS ≈ $7.6B growing ~11% CAGR. Estimates vary by definition and source. Tier 7 (analyst). https://www.mordorintelligence.com/industry-reports/video-analytics-market
- European Commission — Guidelines on prohibited AI practices and on high-risk classification (2025). Clarify the line between "remote" biometric identification and active, conscious verification (a person presenting to a sensor is out of scope; scanning a crowd is in scope) and the Annex III high-risk criteria. Tier 1 (official EU guidance). https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
- Fora Soft — "Edge AI vs Cloud AI for Video Surveillance: 2026 Latency & Cost Breakdown" and "ONVIF Profile M in 2026" and "Video Surveillance Management Systems: The 2026 Buyer & Builder Playbook." First-party engineering write-ups on edge-vs-cloud topology trade-offs, the Profile M metadata standard, and VMS build-vs-buy. Tier 4 (deployer engineering blog). https://www.forasoft.com/blog/article/edge-ai-vs-cloud-ai-video-surveillance


