
Key takeaways
• Edge wins on latency by roughly an order of magnitude. Edge AI inference for video surveillance lands in 25–100 ms end to end; round-trip cloud inference takes 300–800 ms. For real-time alerts, that’s the difference between intervention and after-the-fact review.
• Cloud-only egress quietly bankrupts you above ~20 cameras. A 100-camera 1080p deployment streaming 4 Mbps per camera to the cloud around the clock burns roughly $140,000 a year in AWS egress alone at $0.09/GB. Edge inference + selective upload cuts that to about $1,400.
• The EU AI Act of August 2026 makes edge processing the safer default. Remote biometric identification on CCTV is now classified high-risk; keeping raw video on-device shrinks your compliance surface for GDPR, HIPAA, BIPA, and CCPA.
• Hybrid is the 2026 production pattern. Verkada, Rhombus, Avigilon Unity, and Axis ACAP all do edge inference plus cloud orchestration plus async retraining. Pure-cloud surveillance is contracting; AWS Panorama shuts down on May 31, 2026.
• The right answer depends on five questions. Camera count, latency budget, regulatory exposure, network reliability, and model-update cadence dictate whether you go edge-primary, cloud-primary, or hybrid — and we walk you through each below.
Why this debate matters in 2026
Three forces collided this year and forced every CTO running surveillance to revisit their architecture: the EU AI Act’s high-risk classification of remote biometric identification took effect on August 2; AWS scheduled Panorama’s end of life for May 31, 2026; and the price of capable edge accelerators dropped to the point where a Hailo-8 module costs less than a single month of cloud-GPU inference.
The result: every new video surveillance project has to defend its compute placement on the merits. “We send everything to the cloud” is no longer a defensible default in front of legal, finance, or operations. This article gives you the numbers, the trade-offs, and a five-question framework to make that call — aimed at teams scoping a custom AI surveillance product, not a 5-camera retail rollout.
Why Fora Soft wrote this playbook
Fora Soft has built video and AI products since 2005. Across 600+ shipped projects, our work concentrates around three things: real-time video pipelines, computer-vision detection and tracking, and surveillance-grade reliability. We use spec-driven agent engineering to compress what used to be 6-month builds into 8–12 weeks — and we bring that into pricing.
A few projects shape this playbook. V.A.L.T. is a video evidence platform used by 650+ US police, child-advocacy, and medical organizations — 2,500 IP cameras, 25,000 daily users, $9.7M revenue. DSI Drones is an aerial surveillance system streaming RF-link video with on-device vehicle and human detection. Netcam Studio is a video surveillance web UI built on YOLOv8/YOLOv9 + DeepSORT. We’ve made every architectural decision in this article in production at least once.
Stuck choosing edge or cloud for your surveillance build?
Bring us the camera count, latency target, and compliance constraints. We’ll sketch a hybrid architecture and a budget on a 30-min call.
Edge AI vs Cloud AI: the 60-second answer
Edge AI runs detection and tracking on or beside the camera: on a smart sensor, an embedded accelerator like Hailo-8, or a small NVIDIA Jetson. Frames never leave the building unless an alert fires. Cloud AI ships every frame (or every keyframe) over the network to a remote GPU, runs inference there, and sends results back. Hybrid does edge inference for real-time detection, then forwards a small slice (alert clips, embeddings, or low-rate snapshots) to the cloud for long-horizon analytics, retraining, and fleet-wide reasoning.
If you want the answer in one sentence: edge for the alert, cloud for the insight, and a thin pipe between them. The rest of the article is the math behind that recommendation.
Latency: tens of ms vs hundreds of ms
Latency is the easiest comparison and the most consequential one for surveillance. Detection latency determines whether a guard, a turnstile, or an automated alert can intervene before a person walks past a checkpoint. We measure four legs of the pipeline.
| Pipeline leg | Edge AI (typical) | Cloud AI (typical) | Notes |
|---|---|---|---|
| Capture & encode | 15–33 ms | 15–33 ms | Identical — same hardware path. |
| Network upload | 0 ms | 40–200 ms | RTT to nearest cloud region; jitter on cellular. |
| Inference | 8–30 ms (YOLOv8n INT8 on Orin Nano) | 10–25 ms (YOLOv8m FP16 on T4) | Cloud uses a bigger model; edge uses a quantized smaller one. |
| Result delivery | 2–10 ms (LAN webhook) | 40–200 ms (return RTT) | Add another network hop on the way back. |
| End-to-end | 25–100 ms | 300–800 ms | Cloud over cellular regularly exceeds 1.2 s. |
For point-of-sale loss prevention, perimeter intrusion, manufacturing defect rejection, and child-safeguarding alerts, you cannot afford 800 ms. A person walks one stride in that window. For trend analytics — foot traffic heatmaps, dwell time, queue length over a shift — cloud latency is irrelevant; the question is throughput and cost, not milliseconds.
Reach for edge inference when: your alert needs to fire in under 200 ms, the camera is on cellular or spotty Wi-Fi, or the use case is safety-critical (intrusion, fall detection, line-stop).
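To make the budget check concrete, here is a minimal sketch that sums the per-leg ranges from the table and tests them against the 200 ms alert threshold. The function names are ours, and the sums are only a floor: they leave out queueing, buffering, and jitter, so measured end-to-end figures will run higher.

```python
# Per-leg latency ranges in milliseconds, copied from the table above.
# Each value is a (best_case, worst_case) pair.
EDGE_LEGS = {
    "capture_encode": (15, 33),
    "network_upload": (0, 0),
    "inference": (8, 30),
    "result_delivery": (2, 10),
}
CLOUD_LEGS = {
    "capture_encode": (15, 33),
    "network_upload": (40, 200),
    "inference": (10, 25),
    "result_delivery": (40, 200),
}

ALERT_BUDGET_MS = 200  # the real-time alert threshold used in this article


def leg_sum(legs):
    """Return (best, worst) total latency across all pipeline legs."""
    best = sum(low for low, _ in legs.values())
    worst = sum(high for _, high in legs.values())
    return best, worst


def fits_budget(legs, budget_ms):
    """True only if even the worst-case sum stays inside the budget."""
    return leg_sum(legs)[1] <= budget_ms


print("edge :", leg_sum(EDGE_LEGS), fits_budget(EDGE_LEGS, ALERT_BUDGET_MS))
# edge : (25, 73) True
print("cloud:", leg_sum(CLOUD_LEGS), fits_budget(CLOUD_LEGS, ALERT_BUDGET_MS))
# cloud: (105, 458) False
```

Checking the worst case rather than the average is deliberate: an alert that is on time most of the day but late during network congestion is late exactly when it matters.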
Cost breakdown: edge hardware vs cloud egress vs SaaS
Three separate cost shapes show up on every surveillance budget. Read them as competing line items, not interchangeable: edge dollars are CapEx, cloud dollars are OpEx that scales with cameras and hours.
Edge hardware per camera (2026 list prices)
NVIDIA Jetson Orin Nano Super lists at $249 with 67 TOPS at 7–25 W. Hailo-8 modules sit around $150–200 with 26 TOPS at 2.5 W, the best TOPS-per-watt number in the category. Google Coral USB and Coral M.2 sit between $50 and $100 for lighter workloads. Smart cameras with built-in AI (Hanwha Wisenet 9, Axis embedded analytics with ACAP) ship at $300–800 per camera with the inference silicon already inside.
Cloud GPU inference
A continuously running AWS p3.2xlarge instance (NVIDIA V100) costs about $3.06/hour on demand, which at roughly 730 hours per month comes to about $2,200/month per GPU. One GPU can comfortably serve 25–40 1080p·30fps streams with YOLOv8m, depending on batch size and post-processing. Spread that GPU cost across the streams and you land near $55–90 per camera per month for cloud-only inference, before storage and bandwidth.
Closed-cloud SaaS subscriptions
Verkada lists at $300–500 per camera per year (proprietary hardware required). Eagle Eye Networks runs $5–30 per camera per month on open ONVIF cameras. AWS Panorama (until shutdown on May 31, 2026) was $8.33/camera/month. Subscriptions are easy to start and lock you in fastest.
| Architecture | Year-1 / 100 cameras | Year-3 / 100 cameras | Lock-in |
|---|---|---|---|
| Edge DIY (Jetson + open-source models) | ~$45k hardware + integration | ~$60k | Low |
| Smart cameras (Hanwha / Axis ACAP) | ~$60k hardware + light integration | ~$70k | Medium (vendor SDK) |
| Cloud SaaS (Verkada-class) | ~$95k hardware + ~$40k subscription | ~$215k | High (proprietary cameras) |
| Cloud-only inference (DIY GPU) | ~$30k cameras + ~$80k egress + ~$260k GPU | ~$1.05M | Cloud-vendor |
| Hybrid (edge inference + cloud orchestration) | ~$55k hardware + ~$2k cloud | ~$70k | Low–medium |
Read the table from right to left. By year three, hybrid is roughly 15× cheaper than pure cloud inference and pays back the up-front edge investment in under 12 months for any deployment north of about 20 cameras with continuous streams.
Reach for cloud-only when: you have under 10 cameras, you genuinely don’t need real-time alerts, and you’re fine with a SaaS subscription that may double in price at renewal.
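The year-3 column can be reproduced for the two cloud rows with simple arithmetic. This is a sketch under our own assumption that the year-1 figure splits into one-time hardware plus a recurring charge that stays flat for three years; the function name is ours, and the hybrid total is taken from the table rather than recomputed.

```python
def three_year_total(upfront_k, recurring_per_year_k, years=3):
    """Up-front spend plus flat recurring spend, in thousands of USD."""
    return upfront_k + recurring_per_year_k * years


# Year-1 components from the table above (100 cameras, thousands of USD).
cloud_saas = three_year_total(upfront_k=95, recurring_per_year_k=40)
cloud_only = three_year_total(upfront_k=30, recurring_per_year_k=80 + 260)

HYBRID_YEAR3_K = 70  # read directly from the table

print(cloud_saas)                   # 215
print(cloud_only)                   # 1050
print(cloud_only / HYBRID_YEAR3_K)  # 15.0
```

Swap in your own quotes for the up-front and recurring terms; the shape of the result (recurring cost dominating by year three) is what drives the recommendation, not the exact figures.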
Bandwidth math: where cloud-only quietly bleeds money
An H.264 1080p·30fps stream averages around 4 Mbps. Multiply that across cameras and 24 hours a day for a full year, and the egress bill becomes one of the largest line items in a cloud-first architecture, well ahead of storage.
| Scenario | Aggregate uplink | Annual egress (AWS $0.09/GB) |
|---|---|---|
| 10 cameras, full upload, 24/7 | 40 Mbps | ~$14k |
| 100 cameras, full upload, 24/7 | 400 Mbps | ~$140k |
| 100 cameras, edge filters 90% frames | 40 Mbps avg | ~$14k |
| 100 cameras, edge inference + alert clips only | ~4 Mbps avg | ~$1.4k |
Two non-obvious points. First, AWS egress prices drop with volume tiers, but inbound to the cloud is free — what kills you is fetching results, dashboards, and clip exports back out. Second, ISP uplinks become the bottleneck before egress costs do. A 400 Mbps sustained upload from a single retail location requires a business-class symmetric circuit; most stores have 50 Mbps up. Edge inference makes that physical constraint disappear.
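The table values follow directly from bitrate, camera count, and the per-GB price. The sketch below reproduces them; it assumes a flat $0.09/GB (ignoring the volume tiers mentioned above), decimal gigabytes, and a constant 4 Mbps per stream. The function and parameter names are ours.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
PRICE_PER_GB = 0.09   # USD, the flat rate used in the table
STREAM_MBPS = 4.0     # average H.264 1080p30 bitrate from the text


def annual_egress_usd(cameras, upload_fraction=1.0,
                      mbps=STREAM_MBPS, price_per_gb=PRICE_PER_GB):
    """Annual transfer cost for `cameras` continuous streams.

    upload_fraction is the share of each stream that leaves the site:
    1.0 = full upload, 0.1 = edge filters out 90% of frames,
    0.01 = alert clips only.
    """
    bits = cameras * mbps * 1e6 * upload_fraction * SECONDS_PER_YEAR
    gigabytes = bits / 8 / 1e9  # decimal GB
    return gigabytes * price_per_gb


for label, cams, fraction in [
    ("10 cameras, full upload", 10, 1.0),
    ("100 cameras, full upload", 100, 1.0),
    ("100 cameras, 90% filtered", 100, 0.1),
    ("100 cameras, alert clips only", 100, 0.01),
]:
    print(f"{label}: ${annual_egress_usd(cams, fraction):,.0f}")
```

Because cost is linear in `upload_fraction`, every frame the edge decides not to send is a direct saving; that is the whole economic argument for filtering before the uplink.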
Accuracy: what INT8 quantization actually costs you
The fair criticism of edge AI is that you’re running a smaller, quantized model. The fair answer is that the gap is far narrower than it was three years ago. Modern post-training INT8 quantization with proper calibration loses 5–8% mAP on YOLOv8s and delivers 30–50% faster inference. Quantization-aware training closes most of that to under 3%. For typical surveillance classes — person, vehicle, package, weapon, fire — the practical detection rate is indistinguishable.
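Why calibration matters is easier to see in a toy example. The sketch below implements symmetric per-tensor INT8 quantization in plain Python; it is an illustration of the principle, not what TensorRT or any vendor toolchain does internally, and the activation values are made up.

```python
def calibrate_scale(calibration_values):
    """Symmetric per-tensor scale: map the largest observed magnitude to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map INT8 codes back to floating point."""
    return [q * scale for q in q_values]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical activation values seen in production.
production = [-1.7, 0.05, 0.42, 1.3]

# Calibration set that covers the production range: small rounding error.
good_scale = calibrate_scale([-2.0, -0.5, 0.1, 0.8, 1.9])
good = dequantize(quantize(production, good_scale), good_scale)

# Calibration set that misses the production range: values get clipped.
narrow_scale = calibrate_scale([-0.2, 0.1, 0.2])
clipped = dequantize(quantize(production, narrow_scale), narrow_scale)

print("representative calibration:", mean_abs_error(production, good))
print("narrow calibration:        ", mean_abs_error(production, clipped))
```

With a representative calibration set the error is bounded by half a quantization step; with a narrow one, anything outside the calibrated range is clipped. The same effect is why frames from your own cameras calibrate better than a generic dataset.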
Cloud’s real accuracy advantage isn’t in raw object detection. It’s in reasoning: correlating across 50 cameras to track a person of interest, applying a vision-language model to describe an unusual scene, or feeding multi-modal context into a behavior classifier. Edge devices can’t fit a 7B-parameter VLM in 8 GB of unified memory. That’s the case for hybrid: detect at the edge, reason in the cloud.
Practical edge model recipe
1. Pick the smallest model that hits your mAP floor. YOLOv8n or YOLOv8s for most surveillance; reserve YOLOv8m for forensic search in the cloud.
2. Quantize with calibration data from your actual cameras. Sampling 500–1000 frames covering day, night, weather, and crowd conditions removes most of the accuracy gap.
3. Pair detection with ByteTrack or BoT-SORT. Trackers reduce false positives by enforcing temporal consistency — a one-frame mistake doesn’t fire an alert. Our deeper guide covers this end-to-end: Build Custom AI Video Surveillance with YOLO, ByteTrack, BoT-SORT & DeepSORT.
4. Send hard cases to the cloud. Anything below your confidence threshold is a candidate for re-inference with a bigger model and for adding to the retraining set.
Reach for cloud-side reasoning when: you need to correlate detections across cameras, run a vision-language model for scene description, or apply a per-customer behavior classifier that’s too heavy for an embedded SoC.
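Steps 3 and 4 of the recipe can be sketched as a small gate that sits after the detector and tracker. The class name, thresholds, and tuple layout are hypothetical; a real deployment would take track IDs from ByteTrack or BoT-SORT and tune `min_hits` and `alert_conf` per camera.

```python
from collections import defaultdict


class AlertGate:
    """Fire an alert only after a track is confidently detected in
    `min_hits` consecutive frames; log low-confidence hits as hard cases."""

    def __init__(self, min_hits=5, alert_conf=0.6):
        self.min_hits = min_hits
        self.alert_conf = alert_conf
        self.streak = defaultdict(int)  # track_id -> consecutive confident frames
        self.alerted = set()            # track_ids that have already fired
        self.hard_cases = []            # (frame_idx, track_id, conf) for cloud review

    def update(self, frame_idx, detections):
        """detections: iterable of (track_id, confidence) for one frame.
        Returns the track_ids that fire an alert on this frame."""
        fired = []
        seen = set()
        for track_id, conf in detections:
            seen.add(track_id)
            if conf >= self.alert_conf:
                self.streak[track_id] += 1
            else:
                # Below threshold: reset the streak and keep the sample
                # for re-inference with a bigger model and for retraining.
                self.streak[track_id] = 0
                self.hard_cases.append((frame_idx, track_id, conf))
            if (self.streak[track_id] >= self.min_hits
                    and track_id not in self.alerted):
                self.alerted.add(track_id)
                fired.append(track_id)
        # A track that is missing from this frame loses its streak.
        for track_id in list(self.streak):
            if track_id not in seen:
                self.streak[track_id] = 0
        return fired


gate = AlertGate(min_hits=3, alert_conf=0.6)
frames = [
    [(1, 0.9), (2, 0.7)],   # track 2 appears once and vanishes
    [(1, 0.8)],
    [(1, 0.85), (3, 0.4)],  # track 3 is low confidence
]
for idx, dets in enumerate(frames):
    print(idx, gate.update(idx, dets))
print(gate.hard_cases)
```

Track 1 fires on the third frame, the one-frame blip never does, and the low-confidence detection lands in the hard-case queue instead of the alert stream.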
Privacy and the EU AI Act of August 2026
As of August 2, 2026, real-time remote biometric identification on CCTV is high-risk under the EU AI Act. It’s not banned, but it carries onerous obligations: conformity assessment, fundamental rights impact assessment, registered logging, and human oversight. Biometric scraping from CCTV feeds is outright prohibited, as is emotion inference in workplace and education. The financial exposure for non-compliance reaches €35M or 7% of global turnover.
Edge processing is the cleanest compliance posture for three reasons. First, raw video never leaves the building, which sidesteps cross-border data transfer issues under GDPR Chapter V. Second, you can anonymize at capture (blur faces, drop license plates) before any data is stored, which supports the data-minimization principle in GDPR Article 5(1)(c). Third, you maintain a clear data-controller boundary; nobody at AWS, Azure, or GCP touches your subjects’ biometric data.
The same logic applies in other jurisdictions. BIPA in Illinois requires explicit consent for biometric collection; edge processing without retention can avoid the trigger. HIPAA covers patient surveillance footage in US healthcare; keeping that footage on a hospital’s VLAN is dramatically easier to justify than shipping it to a third-party cloud. NDAA Section 889 bars US federal agencies from procuring video surveillance equipment from certain named Chinese manufacturers, including Hikvision and Dahua, so hardware provenance matters as much as where inference runs. We dig deeper into the trust and ethics layer in our companion piece on 2026 AI Surveillance Trends and Ethics.
Reach for edge anonymization when: your deployment touches EU residents, hospitals, schools, government buildings, or any subject category where consent is impractical at scale.
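Anonymizing at capture comes down to ordering: redact first, store second. The sketch below shows that ordering with a frame held as a plain list of pixel rows. It is a simplified stand-in; a real pipeline would take boxes from a face or plate detector and blur with an image library, and `redact_boxes` is our own name.

```python
def redact_boxes(frame, boxes, fill=0):
    """Return a copy of `frame` with every (x1, y1, x2, y2) box filled.

    frame is a list of rows of pixel values. Box coordinates are in pixels,
    with x2 and y2 exclusive, and are clamped to the frame bounds.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    out = [row[:] for row in frame]  # never mutate the caller's frame
    for x1, y1, x2, y2 in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                out[y][x] = fill
    return out


# A 4x6 grayscale frame and one hypothetical face box.
frame = [[9] * 6 for _ in range(4)]
face_boxes = [(1, 1, 4, 3)]

safe_frame = redact_boxes(frame, face_boxes)
for row in safe_frame:
    print(row)
# Only safe_frame is written to disk or sent upstream.
```

Filling the region outright is stricter than blurring, which makes it a reasonable default when the downstream analytics only need counts and positions.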
Reliability and offline operation
Cloud-only surveillance has a hard dependency on the internet link. When the ISP drops, the cameras are dumb pixel sources. Edge AI keeps detecting, alerting via local sirens, IO relays, or LAN webhooks, and buffering events until the connection returns. In our V.A.L.T. deployments serving police evidence chains, that buffering is a legal requirement — a missing 30 minutes of footage during a network outage breaks chain of custody.
Architectures that mix edge buffering with cloud sync are more reliable than either pure end. Edge devices store 24–72 hours locally on SD or NVMe (Hanwha Wisenet smart cameras ship with up to 4 TB), then trickle-upload during off-peak hours. Mean time to recover after an outage drops to seconds, rather than the 15 minutes to 2 hours you see when a cloud-only system has to re-sync state.
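The buffering behavior described above is a store-and-forward queue. Here is a minimal in-memory sketch; the class name and the `send` callback are hypothetical, and a production recorder would persist the queue to SD or NVMe so it survives a reboot.

```python
from collections import deque


class StoreAndForward:
    """Buffer events locally and flush them in order once the uplink is back."""

    def __init__(self, send, max_events=10_000):
        # send: callable(event) -> bool, True when the cloud accepted the event.
        self.send = send
        # When the buffer is full, appending drops the oldest event first.
        self.pending = deque(maxlen=max_events)

    def publish(self, event):
        """Queue the event, then try to drain the queue."""
        self.pending.append(event)
        return self.flush()

    def flush(self):
        """Send queued events oldest-first; stop at the first failure.
        Returns the number of events delivered."""
        delivered = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break
            self.pending.popleft()
            delivered += 1
        return delivered


# Simulated uplink that can be switched on and off.
link = {"up": False}
received = []


def fake_send(event):
    if link["up"]:
        received.append(event)
    return link["up"]


buffer = StoreAndForward(fake_send)
buffer.publish("intrusion@dock-3")   # ISP is down: stays queued
buffer.publish("intrusion@gate-1")
link["up"] = True                    # connection returns
buffer.publish("heartbeat")          # drains all three in order
print(received)
```

For evidence-grade deployments the drop-oldest policy is the wrong default; size the buffer for the longest outage you must survive, or block and alarm instead of discarding.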
Need offline-resilient surveillance you actually own?
We design hybrid systems where the edge keeps running when the cloud is unreachable — and your data stays where compliance demands.
Hybrid architecture: what production systems actually do
Look at the surveillance vendors growing fastest in 2026 and they all converge on the same shape: edge inference, cloud orchestration, async retraining. Verkada, Rhombus, Avigilon Unity, and Axis ACAP plus a custom backend all fit this pattern. We build the same pattern for our clients.

Figure 1. The 2026 hybrid reference architecture for AI video surveillance.
What lives on the edge
Real-time detection (YOLO + tracker), face/plate anonymization, alert rules, ring-buffer storage of the last 24–72 hours, and a watchdog that retries cloud connectivity. The edge is the single source of truth for “what just happened in this frame.”
What lives in the cloud
Fleet management (configuration, model versioning, OTA updates), long-term cold storage of alert clips, cross-camera reasoning (re-identification, person-of-interest tracking across a campus), VLM-driven scene description, dashboards, and the retraining pipeline that pulls hard examples back from the edge.
What flows between them
Three thin streams: alerts (small JSON + clip), embeddings (a few KB per detection for retrieval), and selective hard-case frames (low rate). Total uplink stays well under 100 kbps per camera even during busy periods. Our pieces on integrating video analytics with surveillance and real-time video processing best practices walk through the streaming details.
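As an illustration of how thin the alert stream is, here is a sketch of an alert payload and the average uplink it implies. Every field name is an assumption for the example, not a published schema, and the clip itself is uploaded separately and referenced by URI.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Alert:
    camera_id: str
    timestamp_utc: str   # ISO 8601
    label: str           # e.g. "person"
    confidence: float
    track_id: int
    box_xyxy: tuple      # pixel coordinates (x1, y1, x2, y2)
    clip_uri: str = ""   # location of the clip, uploaded out of band
    embedding: list = field(default_factory=list)


def encode(alert):
    """Serialize to compact JSON bytes for the uplink."""
    return json.dumps(asdict(alert), separators=(",", ":")).encode("utf-8")


def average_kbps(payload_bytes, events_per_hour):
    """Average uplink rate for the metadata stream alone (clips excluded)."""
    return payload_bytes * 8 * events_per_hour / 3600 / 1000


alert = Alert(
    camera_id="cam-01",
    timestamp_utc="2026-03-01T08:15:30Z",
    label="person",
    confidence=0.91,
    track_id=17,
    box_xyxy=(10, 20, 110, 220),
    clip_uri="s3://example-bucket/clips/cam-01/17.mp4",  # placeholder path
    embedding=[0.0] * 128,  # stand-in for a 128-dimensional feature vector
)
payload = encode(alert)
print(len(payload), "bytes per alert")
print(average_kbps(len(payload), events_per_hour=60), "kbps at one alert per minute")
```

Even at one alert per minute the metadata stream is a rounding error next to the per-camera budget; the clips dominate, which is why clip length and upload policy are the knobs worth tuning.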
Vendor pricing: Verkada, Eagle Eye, Rhombus, DIY edge
If you’re evaluating a build vs. buy decision, here’s the snapshot we share with clients. Numbers are list-price, US, 2026.
| Vendor | Camera cost | Recurring | Architecture | Lock-in |
|---|---|---|---|---|
| Verkada | $500–3,000 | $300–500/cam/yr | Cloud-managed, edge-processed (proprietary) | High |
| Eagle Eye Networks | $200–800 (any ONVIF) | $60–360/cam/yr | Cloud-first VMS | Medium |
| Rhombus Systems | $600–1,000 | $100–250/cam/yr | Hybrid (edge + cloud orchestration) | Medium |
| Axis + ACAP | $300–800 | $0–500/site for analytics | Edge-extensible camera | Medium |
| Hanwha Wisenet 9 | $400–600 | VMS optional | Smart camera with on-board AI | Low–medium |
| DIY edge (Jetson + open-source) | $249 + camera | Engineering | Custom hybrid | Lowest |
For most product teams the practical pick is “smart cameras + custom backend” or “DIY edge + custom backend.” The former trades engineering hours for vendor maturity; the latter trades vendor maturity for full control over the cost curve and the model. We’ve shipped both.
Use-case decision rules across 7 industries
| Industry | Recommended placement | Why |
|---|---|---|
| Retail loss prevention | Hybrid | Edge for shoplift alerts; cloud for cross-store correlation and shrinkage analytics. |
| Manufacturing | Edge-primary | Defect/PPE detection needs <100 ms; IP-sensitive footage stays on the plant network. |
| Logistics & warehousing | Edge | Spotty connectivity; pallet/SKU detection at the dock door. |
| Smart cities & traffic | Hybrid | Edge for signal control; cloud for city-wide planning. |
| Healthcare & eldercare | Edge-only | HIPAA + patient dignity; fall-detection alerts to the floor in <1 s. |
| Critical infrastructure / government | Edge + on-prem | NDAA, data residency, sovereign clouds. |
| SMB / residential | Cloud SaaS | 5–10 cameras, low complexity, subscription is fine. |
A decision framework — pick edge, cloud, or hybrid in five questions
1. How many cameras and what bitrate? Under 10 cameras at low bitrate — cloud SaaS is fine. Above 20 at 4 Mbps each — edge or hybrid pays for itself within a year on bandwidth alone.
2. What’s the alert latency budget? Under 200 ms — edge. 200 ms–1 s — hybrid. Multi-second is acceptable — cloud works.
3. What regulators apply? EU AI Act high-risk, GDPR, HIPAA, BIPA, NDAA — edge with anonymization at capture is the safest default. CCPA/general commercial — cloud is workable with the right DPA.
4. How reliable is the network? Industrial sites, vehicles, remote facilities — edge so the system survives outages. Stable corporate Wi-Fi — cloud or hybrid.
5. How often do you retrain? Quarterly or rarer — edge with periodic OTA updates. Weekly or daily — you need the cloud retraining loop, but inference can still run on the edge.
If three or more answers point to edge, design hybrid with edge-primary. If three or more point to cloud, design hybrid with cloud-primary. Pure-edge and pure-cloud are corner cases.
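The five questions and the three-vote rule translate into a short function. This is one reading of the framework, not a formal scoring model: answers in the middle bands (10–20 cameras, 200 ms to 1 s) abstain, and a weekly-or-faster retraining cadence is counted as a cloud vote even though inference can stay on the edge. All names are ours.

```python
def recommend(cameras, latency_budget_ms, strict_regulation,
              unreliable_network, retrain_weekly_or_faster):
    """Tally the five questions and return the suggested hybrid flavor."""
    votes = {"edge": 0, "cloud": 0}

    # 1. Camera count, assuming roughly 4 Mbps per stream.
    if cameras > 20:
        votes["edge"] += 1
    elif cameras < 10:
        votes["cloud"] += 1

    # 2. Alert latency budget.
    if latency_budget_ms < 200:
        votes["edge"] += 1
    elif latency_budget_ms > 1000:
        votes["cloud"] += 1

    # 3. Regulators: EU AI Act high-risk, GDPR, HIPAA, BIPA, NDAA.
    votes["edge" if strict_regulation else "cloud"] += 1

    # 4. Network reliability.
    votes["edge" if unreliable_network else "cloud"] += 1

    # 5. Retraining cadence.
    votes["cloud" if retrain_weekly_or_faster else "edge"] += 1

    if votes["edge"] >= 3:
        return "hybrid, edge-primary"
    if votes["cloud"] >= 3:
        return "hybrid, cloud-primary"
    return "hybrid, no clear primary"


# A 100-camera factory with a 150 ms budget, GDPR exposure, and flaky links.
print(recommend(100, 150, True, True, False))
# A 5-camera office that tolerates multi-second alerts.
print(recommend(5, 5000, False, False, True))
```

Note that no branch returns pure edge or pure cloud, which matches the framework: the question is which side is primary, not whether to use both.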
Mini case: V.A.L.T. — 2,500 cameras, 650 organizations, 25,000 daily users
Situation. A US-based video evidence platform serving police, child-advocacy centers, and medical organizations. The system needed to handle PTZ control, scheduled and triggered recordings, push-to-talk, and a tamper-evident chain of custody — all over rooms with mixed network quality.
What we built. Camera-side recorder agents that run locally on each station (edge-buffered evidence with cryptographic hashes), plus a cloud orchestration layer for scheduling, search, and access control. Detection runs on the camera; correlation, search, and audit logs live in the cloud. The system gracefully tolerates ISP outages because the recorder keeps the chain of custody intact.
Outcome. 2,500 IP cameras under management, 25,000 users a day, 650 customer organizations, $9.7M revenue. Zero chain-of-custody failures across hundreds of investigations. Read the full V.A.L.T. project page or book a 30-min review if you’re building something similar.
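One common way to make edge-buffered recordings tamper-evident is a hash chain, where each entry commits to the segment and to the entry before it. The sketch below shows the general technique only; it is not the V.A.L.T. implementation, and the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, segment_sha256, metadata):
    """Hash of one chain entry, covering the link to its predecessor."""
    body = json.dumps(
        {"prev": prev_hash, "segment": segment_sha256, "meta": metadata},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(body).hexdigest()


def append_segment(chain, segment_bytes, metadata):
    """Append one recorded segment to the chain and return its entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    segment_sha256 = hashlib.sha256(segment_bytes).hexdigest()
    entry = {
        "prev": prev_hash,
        "segment": segment_sha256,
        "meta": metadata,
        "hash": _digest(prev_hash, segment_sha256, metadata),
    }
    chain.append(entry)
    return entry


def verify(chain, segments):
    """True if every segment matches its entry and every link is intact."""
    if len(chain) != len(segments):
        return False
    prev_hash = GENESIS
    for entry, segment_bytes in zip(chain, segments):
        if entry["prev"] != prev_hash:
            return False
        if entry["segment"] != hashlib.sha256(segment_bytes).hexdigest():
            return False
        if entry["hash"] != _digest(entry["prev"], entry["segment"], entry["meta"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
segments = [b"segment-0", b"segment-1", b"segment-2"]
for index, data in enumerate(segments):
    append_segment(chain, data, {"camera": "room-4", "index": index})

print(verify(chain, segments))                                 # True
print(verify(chain, [b"segment-0", b"edited", b"segment-2"]))  # False
```

Because the chain is built on the recorder, it keeps growing during an ISP outage, and the cloud can verify the whole run once the segments sync. Anchoring the latest hash somewhere the recorder cannot rewrite (a signed log, a second system) is what turns detection of tampering into proof.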
Five pitfalls we see teams hit
1. Buying the smallest edge box that benchmarks well. Jetson Orin Nano performs well in a lab. In a 40 °C electrical closet, it throttles. Spec the next size up or commit to active cooling and proper enclosures from day one.
2. Skipping the retraining loop. An edge model deployed once and never updated drifts. After 12–18 months, false-positive rates climb noticeably. Plan the cloud-side hard-case collection and OTA model update path before launch.
3. Letting cameras dictate compute placement. Buying Verkada-class cameras locks you into Verkada cloud. Buying open ONVIF cameras keeps options open. Make the architectural decision before the procurement decision.
4. Underestimating storage at the edge. A camera writing 4 Mbps continuously fills 4 TB in 90 days. Either size SD/NVMe accordingly or design event-only retention with a clear policy on what counts as an event.
5. Forgetting AWS Panorama is going away. If your current architecture leans on Panorama, you have until May 31, 2026 to migrate. Edge-native + Lambda or Fargate for orchestration is the cleanest replacement.
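The storage figure in pitfall 4 is worth recomputing for your own bitrate. A small sketch, assuming a constant bitrate and decimal units (1 TB = 1,000 GB); the function names are ours.

```python
def gb_per_day(mbps):
    """Decimal gigabytes written per day at a constant bitrate."""
    return mbps * 1e6 / 8 * 86_400 / 1e9


def days_to_fill(capacity_tb, mbps):
    """Days of continuous recording before the volume is full."""
    return capacity_tb * 1000 / gb_per_day(mbps)


def buffer_gb(hours, mbps):
    """Storage needed for a ring buffer covering `hours` of video."""
    return gb_per_day(mbps) * hours / 24


print(round(gb_per_day(4), 1))       # GB per camera per day at 4 Mbps
print(round(days_to_fill(4, 4), 1))  # days until a 4 TB volume fills
print(round(buffer_gb(24, 4), 1), round(buffer_gb(72, 4), 1))  # 24 h and 72 h buffers
```

At 4 Mbps a camera writes about 43 GB a day, so a 4 TB volume lasts roughly three months and a 24–72 hour ring buffer needs on the order of 43–130 GB. Halving the bitrate or moving to event-only retention changes those numbers proportionally.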
KPIs to measure
Quality KPIs. Detection mAP per class against your own validation set (not COCO). Tracker ID-switch rate. False-positive alerts per camera per week — target under 5 for a healthy deployment.
Business KPIs. Cost per camera per month all-in. Time from alert to operator acknowledgement (median should be under 30 s). Customer-reported incidents the system missed — trend it down quarter over quarter.
Reliability KPIs. Edge uptime (99.9% target on industrial deployments). Mean-time-to-recover after a network outage (target <30 s thanks to edge buffering). Percent of alert clips successfully synced to cloud within 60 minutes of generation.
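Three of these KPIs are easy to compute from logs you already have. The sketch below uses hypothetical inputs (a false-positive count, acknowledgement delays in seconds, clip sync delays in minutes) and our own function names; the targets come from the text above.

```python
from statistics import median


def false_positives_per_camera_week(false_positive_count, cameras, days):
    """Normalize a raw false-positive count to per-camera, per-week."""
    return false_positive_count / cameras / (days / 7)


def median_ack_seconds(ack_delays_s):
    """Median time from alert to operator acknowledgement."""
    return median(ack_delays_s)


def sync_rate(sync_delays_min, window_min=60):
    """Share of alert clips synced within the window.
    None marks a clip that never reached the cloud."""
    on_time = sum(
        1 for delay in sync_delays_min
        if delay is not None and delay <= window_min
    )
    return on_time / len(sync_delays_min)


# Two weeks of made-up data from a 20-camera site.
print(false_positives_per_camera_week(280, cameras=20, days=14))  # target: under 5
print(median_ack_seconds([12, 45, 20]))                           # target: under 30 s
print(sync_rate([5, 70, None, 30]))                               # share synced within 60 min
```

Counting never-synced clips in the denominator is deliberate: a clip that silently failed to upload is the failure this KPI exists to catch.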
When NOT to choose edge
Edge is wrong when the use case fundamentally needs cross-camera reasoning at low latency — for example, real-time person-of-interest tracking across a 50-camera campus where a single edge box can’t see the whole graph. It’s also wrong if your team is small, you have under 10 cameras, and you don’t want to maintain firmware. In those cases a cloud SaaS is fine; just price the lock-in into your three-year plan.
Cloud-only is wrong almost everywhere except trend analytics and SMB. The middle ground — hybrid — covers 80% of serious surveillance products in 2026.
Ready to validate your edge-vs-cloud math?
Send us your camera count, latency target, and budget. We’ll redline a hybrid architecture and give you a delivery estimate — agent-engineered, so it’s faster than you expect.
FAQ
Is edge AI always cheaper than cloud AI for video surveillance?
Above roughly 20 cameras with continuous streams, yes — the bandwidth and cloud-GPU bill outpaces edge hardware within 12 months. Below 10 cameras with low duty cycles, cloud SaaS can win on total cost because you avoid integration engineering. The crossover is sensitive to bitrate, hours, and whether you need real-time alerts.
How much accuracy do I lose by quantizing a YOLO model to INT8 for edge inference?
With proper calibration on representative data, post-training INT8 typically costs 5–8% mAP. Quantization-aware training brings that under 3%. For surveillance classes (person, vehicle, package, weapon), the practical detection rate is indistinguishable from the FP32 model.
Does the EU AI Act prohibit cloud-based facial recognition on CCTV?
No. It classifies real-time remote biometric identification as high-risk — restricted, heavily regulated, but not banned outright. Specific applications, like emotion inference at work or in schools, and untargeted scraping of CCTV/internet for face databases, are prohibited. Edge processing simplifies compliance because raw biometric data never leaves the building.
What hardware should I pick for an edge AI surveillance prototype?
For one or two cameras and a fast prototype, NVIDIA Jetson Orin Nano Super ($249) is the easiest start — it runs YOLOv8n at real-time framerates and the entire NVIDIA tooling chain works out of the box. For dense deployments where power and heat matter, Hailo-8 modules ($150–200) deliver more inference per watt. Smart cameras (Hanwha Wisenet 9, Axis with ACAP) skip the integration step entirely.
How do I keep edge models from drifting over time?
Build a hard-case collection pipeline. When the edge model returns low-confidence detections or your operators correct an alert, push that frame plus metadata to the cloud for retraining. Retrain quarterly, validate against your held-out set, and ship updates over the air. Without this loop, expect noticeable false-positive growth after 12–18 months.
Can I run a vision-language model on the edge for scene description?
Small VLMs (1–3 B parameters, quantized) run on Jetson AGX Orin and high-end industrial PCs in 2026. For typical Orin Nano deployments, save VLM scene description for the cloud and trigger it only on alerts — you don’t need it on every frame.
What replaces AWS Panorama after the May 2026 shutdown?
There is no drop-in replacement from AWS. The clean migration path is custom edge inference (Jetson, Hailo, or smart cameras) plus a cloud orchestration layer running on Lambda, Fargate, or your own Kubernetes. We’ve done several of these migrations and the architecture is straightforward; the timeline is the constraint.
How long does a custom hybrid surveillance build take with Fora Soft?
Because we use spec-driven agent engineering, a working pilot of 5–10 cameras with edge inference, alert pipeline, and a basic cloud dashboard typically takes 8–12 weeks. Production rollout depends on the camera count, certifications, and integrations. Bring us a scope and we’ll give you a number on a call.
What to Read Next
Surveillance architecture
YOLO + ByteTrack + BoT-SORT + DeepSORT 2026 Guide
The detection and tracking stack that runs under the hood of every edge AI surveillance build.
Privacy & trust
2026 AI Surveillance Trends: Data Quality & Ethics
EU AI Act, GDPR, and the trust playbook for biometric surveillance products.
Real-time monitoring
Why Use AI for Video Anomaly Detection
A buyer’s playbook for ML-driven anomaly detection on edge and hybrid stacks.
AI enhancement
Generative AI & Contextual Video Intelligence
From pure detection to intent understanding with cloud-side VLMs and reasoning.
Engineering practices
Real-Time Video Processing with AI: Best Practices
Architecture patterns and latency budgets from 600+ shipped video projects.
Ready to design your hybrid surveillance architecture?
The 2026 answer for serious AI video surveillance is hybrid: edge for the alert, cloud for the insight, and a thin pipe between them. Edge wins on latency, bandwidth, privacy, and offline reliability; cloud wins on cross-camera reasoning, retraining, and long-horizon analytics. Pure-cloud is contracting, pure-edge is fragile, and the vendors growing fastest are the ones that picked the middle.
If you’re scoping a custom AI surveillance product, the technology choices are well-understood. The hard part is fitting them to your camera count, latency budget, regulatory exposure, network reality, and retraining cadence. That’s the conversation we have with prospective clients on a 30-min scoping call — bring the constraints and we’ll bring the architecture and a delivery estimate.
Talk to a team that has shipped 600+ video and AI products
Edge inference, cloud orchestration, OTA model updates, EU AI Act-ready data flows. We do this for a living — and faster than you expect, because of agent engineering.


