
Key takeaways
• ML anomaly detection is a buy-or-build call, not a research project. Platforms like Avigilon, Verkada and BriefCam cover 80% of generic anomalies; custom builds pay off above ~200 cameras or when the anomaly class is domain-specific.
• Architecture beats algorithm. Edge inference on Jetson Orin Nano ($249, 40 TOPS) delivers sub-100 ms alerts; pushing raw 4K streams to the cloud adds 200–2000 ms and a painful bandwidth bill.
• False positives, not accuracy, kill deployments. Untuned systems fire 30–70% false alerts; adaptive thresholds plus temporal filtering cut that by 67% without losing true positives.
• GDPR & the EU AI Act narrow your options. Face recognition and real-time remote biometric ID are prohibited or high-risk; object/behavior anomaly detection (loitering, intrusion, crowd density) stays compliant.
• Fora Soft has shipped this exact stack. Our V.A.L.T video platform runs production anomaly detection for regulated clients, including courtroom-grade deployments in Kazakhstan, with 100% on-time delivery across 21 years of video work.
Why Fora Soft wrote this playbook
Fora Soft has built video and multimedia products since 2005. Over those 21 years we’ve shipped 250+ projects, and video surveillance, AI-assisted video analytics and real-time streaming are the spine of our portfolio. Our engineers wire up detection pipelines on top of WebRTC, RTMP/HLS, RTSP and NVIDIA DeepStream, run them on Jetson edge boards and AWS/Hetzner clusters, and keep a 100% on-time delivery record across those projects.
The clearest case study is V.A.L.T, our video surveillance and observation platform. V.A.L.T is deployed in regulated environments (law enforcement, clinical settings, a courtroom project in Kazakhstan) where false alerts are unacceptable and audit trails are mandatory. That operational reality—not a Kaggle leaderboard—shapes the recommendations in this guide.
The rest of this playbook answers the questions buyers actually ask us on intro calls: which model families work, what good architecture looks like, what the real costs are, where GDPR and the EU AI Act stop you, and when it’s cheaper to buy an off-the-shelf VMS instead of building one. Use it to scope your own project—or send us the spec and we’ll scope it with you.
Scoping an ML-based video surveillance product?
Bring us your camera count, anomaly classes and compliance zone. We’ll tell you in one call whether to build, buy or go hybrid—and what it realistically costs.
What counts as a video anomaly in a security context
Before picking a model or a vendor, nail the anomaly taxonomy. Most “AI surveillance” failures happen because the buyer never separated “anomalous pixels” from “anomalous events” from “policy violations”. These are three different problems, with three different cost curves.
1. Object-level anomalies. Something that shouldn’t be there: a person in a restricted zone, a vehicle on a pedestrian path, an abandoned bag. Solvable with mature object-detection models (YOLOv8, RT-DETR) plus geofence rules. Commodity work; a minimal rule sketch follows this list.
2. Motion & behavior anomalies. Running in a no-running zone, loitering, a fall, a fight, a sudden crowd dispersion. These need spatio-temporal models (I3D, TimeSformer, Video Swin) and 3–10 seconds of context. Solvable but not trivial.
3. Contextual & policy anomalies. “This door should never open after 9 PM.” “This operator never logs in from outside the country.” These combine video events with access-control, schedule and identity data. The ML part is the easy bit; the integration with your existing systems is where 60% of the timeline goes.
Reach for object-level only when: you already know the exact object and zone you care about (intruder in perimeter, vehicle in fire lane), and false-positive tolerance is medium. Behavior and context models are overkill for these use cases.
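To make the object-level tier concrete, here is a minimal sketch of the geofence rule in Python. It assumes detections arrive as (label, bounding-box) tuples from whatever detector you run; the zone polygon and the "person" class are illustrative placeholders, not values from a real site.

```python
# Minimal object-in-zone rule: flag any "person" whose bbox bottom-center
# lands inside a restricted polygon. Coordinates are illustrative pixels.
RESTRICTED_ZONE = [(100, 400), (600, 380), (640, 700), (80, 720)]

def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray crosses."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def zone_violations(detections):
    """detections: list of (label, (x_min, y_min, x_max, y_max)) tuples."""
    alerts = []
    for label, (x0, y0, x1, y1) in detections:
        foot = ((x0 + x1) / 2, y1)  # bottom-center approximates ground contact
        if label == "person" and point_in_polygon(*foot, RESTRICTED_ZONE):
            alerts.append((label, (x0, y0, x1, y1)))
    return alerts
```

Everything beyond this (schedules, off-hours logic, identity) belongs in the rule engine of category 3, not in the detector.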
Market snapshot: why every VMS vendor now ships an AI SKU
The AI video surveillance market sat at roughly $5.0–6.5 B in 2024 and is projected to reach $12–28 B by 2030, growing at a 20–31% CAGR depending on which analyst you believe (MarketsAndMarkets, Grand View Research). The broader video analytics segment—which includes retail, traffic and industrial use cases—is projected to hit about $37.8 B by 2030 at a 19.5% CAGR.
Three forces are driving that curve. First, edge silicon got cheap: a Jetson Orin Nano at $249 now runs models that needed a $5k workstation in 2020. Second, pre-trained backbones (I3D, Video Swin, SAM) collapsed the data-collection cost for buyers who don’t have a million labeled frames lying around. Third, vendors have proven they can cut operator workload—Motorola’s Avigilon reports ~90% reduction in false alarms with its self-learning Unusual Motion Detection.
The practical read: if you’re commissioning this product in 2026, you’re not on the bleeding edge, you’re adopting a tech stack that’s been production-hardened for 3–5 years. That’s why build estimates are falling, not rising.
Model families that actually ship to production
Five architectures cover 95% of real deployments. Pick by how much labeled data you have, how tight your latency budget is, and whether your anomalies are spatial, temporal or both.
CNN backbones (ResNet, DenseNet, EfficientNet)
Why pick it. Fastest inference (2–5 GFLOPs per frame), tiny memory footprint, every edge runtime supports it. Perfect for per-frame object or scene classification.
Limits. No temporal reasoning. A CNN cannot tell “a person walked into a bank” from “a person ran into a bank”. High false positives on static scenes with lighting flicker.
3D-CNN / I3D
Why pick it. Learns spatial + temporal features jointly. On UCF-Crime, I3D-based ensembles still sit at ~84.6% frame-level AUC—competitive with 2024 Transformer baselines. Mature tooling in NVIDIA TAO and DeepStream.
Limits. Memory-heavy: expect 6–8 GB VRAM per stream at 224×224, 32-frame clips. Slower than 2D alternatives on edge devices without TensorRT optimization.
Video Transformers (TimeSformer, Video Swin)
Why pick it. State of the art. Swin-3DART hit 0.861 ROC AUC on ShanghaiTech; SwinAnomaly (conditional GAN + Video Swin) holds near-SOTA while running real-time on Orin-class hardware. Long-range temporal context without LSTM’s vanishing-gradient pain.
Limits. Hungry for training data and compute; not the right pick if you have fewer than 10k labeled clips. Needs careful quantization to fit on edge devices.
Autoencoder & VAE reconstruction
Why pick it. Unsupervised. Train on hours of “normal” footage, flag frames the model can’t reconstruct. No labels needed. Ideal when your anomaly class is “anything weird” rather than a specific event.
Limits. Assumes reconstruction error correlates with abnormality—often false. Prone to high false positives on lighting changes, camera shake, new objects that are normal-but-rare.
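A minimal sketch of how the reconstruction-error loop works in practice, assuming a frame autoencoder already trained on normal footage (PyTorch here; `model` and the quantile value are illustrative assumptions). The important habit is calibrating the cutoff per camera from that camera's normal footage rather than hard-coding it:

```python
import torch

def reconstruction_scores(model, frames):
    """Per-frame anomaly score = mean squared reconstruction error.
    frames: (N, C, H, W) tensor of pre-processed frames."""
    model.eval()
    with torch.no_grad():
        recon = model(frames)
        return ((recon - frames) ** 2).mean(dim=(1, 2, 3))  # one score per frame

def calibrate_threshold(model, normal_frames, quantile=0.995):
    """Place the alert cutoff in the tail of the error distribution seen on
    known-normal footage; re-run whenever the lighting regime changes."""
    scores = reconstruction_scores(model, normal_frames)
    return torch.quantile(scores, quantile).item()
```

Frames scoring above the calibrated cutoff get flagged. The limits above apply in full: a sunrise can blow past the cutoff as easily as an intruder, which is why this family needs the false-positive stack described later.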
Self-supervised & contrastive (MoCo, SimCLR, VideoMAE)
Why pick it. Few-shot friendly. Pre-train on large unlabeled video, fine-tune on a few hundred labeled anomaly clips. Cuts labeling budget by 10× in our experience.
Limits. Pre-training is expensive. Only worth it if you can amortize that cost across multiple deployments or sites.
Reach for a Transformer when: you have >50k labeled clips, need long-range temporal context (fights, falls, crowd flows), and can afford Orin NX-class edge hardware. Stick with I3D or CNN + rules if you have less data or tighter BOM budgets.
Comparison matrix: which model for which job
| Model family | Best for | Labels needed | Edge-friendly | Typical AUC |
|---|---|---|---|---|
| 2D CNN | Object & zone rules | 1–5k labeled frames | Yes (Jetson Nano+) | 0.75–0.85 |
| 3D-CNN / I3D | Fights, falls, running | 5–50k clips | Orin Nano+ | 0.82–0.90 |
| Video Transformer | Long-context crowd & scene anomalies | 50k+ clips | Orin NX+ | 0.86–0.97 |
| Autoencoder / VAE | “Anything weird” unsupervised | None (normal footage only) | Yes (Jetson Nano+) | 0.70–0.82 |
| Self-supervised fine-tune | Few-shot, multi-site | 100–500 clips per site | Orin Nano+ | 0.80–0.92 |
Reference architecture: edge, cloud, or hybrid
The architecture decision dominates cost and latency far more than the model choice. Three patterns exist; hybrid wins for almost every serious deployment.
Edge-only
Detection runs on the camera or a local Jetson/Hailo module. Latency 10–100 ms. Bandwidth to headquarters is just alerts plus short event clips. Works offline. Best for fewer than 8 cameras per site or strict privacy regimes where raw video must stay on-premise.
Cloud-only
All streams go to AWS/GCP for inference. Latency 200–2000 ms. Bandwidth: ~5–50 Mbps per stream. Egress costs pile up fast; a 500-camera site at 4 Mbps average generates ~21 TB/day. Only justifiable for forensic search, low-rate polling, or when the customer has a dedicated backbone.
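The 21 TB figure is worth checking yourself before any architecture debate; the arithmetic fits in a few lines (the ~$0.09/GB egress rate is an assumption at typical 2026 list prices):

```python
# Back-of-envelope bandwidth math for the cloud-only pattern.
cameras, avg_mbps = 500, 4                  # average per-stream bitrate, Mbps

mb_per_sec = cameras * avg_mbps / 8         # 250 MB/s aggregate uplink
gb_per_day = mb_per_sec * 86_400 / 1_000    # seconds/day, MB -> GB
print(f"{gb_per_day / 1_000:.1f} TB/day")   # ~21.6 TB/day

egress = gb_per_day * 0.09                  # assumed ~$0.09/GB egress rate
print(f"~${egress:,.0f}/day if that footage ever leaves the cloud")  # ~$1,944
```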
Hybrid (edge detection + cloud correlation)
Edge models filter the 99%+ of frames that are normal; only events and low-bitrate metadata travel to the cloud, where you do multi-camera correlation, long-term pattern mining, and dashboard/UX. Latency 50–300 ms. Bandwidth ~10% of raw streams. This is the default for everything Fora Soft ships today.
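"Only events and metadata travel" has a concrete shape: roughly a kilobyte of JSON per event instead of megabits per second of video. A minimal sketch of the edge-to-cloud contract; the endpoint, field names and timeout are illustrative, not the actual API of any product we ship:

```python
import time
import requests  # any HTTP client works; the endpoint below is illustrative

EVENT_ENDPOINT = "https://cloud.example.internal/api/v1/events"

def publish_event(camera_id, anomaly_class, score, clip_path):
    """Forward event metadata only; raw video never leaves the edge appliance.
    The cloud side uses these records for multi-camera correlation and review."""
    requests.post(EVENT_ENDPOINT, json={
        "camera_id": camera_id,
        "class": anomaly_class,
        "score": round(score, 3),
        "ts": time.time(),
        "clip": clip_path,  # short local clip, fetched on demand, never streamed
    }, timeout=2.0)
```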
Reach for cloud-only when: you need forensic search across historical footage (BriefCam-style) more than real-time alerting, and bandwidth is not a concern. Everyone else should default to hybrid.
Edge hardware: what to put in the BOM
Your edge chip decides how many streams per box you can analyze and what models fit. In 2026 the practical shortlist is short.
| Device | TOPS | TDP | Price | Best for |
|---|---|---|---|---|
| Jetson Orin Nano | 34–40 | 5–25 W | $249 | 2–4 streams, standard CNN/I3D |
| Jetson Orin NX | 100 | 10–25 W | ~$700 | 8–16 streams, Transformers |
| Hailo-8 | 26 | 3 W | ~$400 module | Battery-powered cameras, fixed CNN |
| Google Coral TPU | 4 | 2 W | $50–150 | PoC, single-stream, TFLite only |
| Ambarella CV2x | 5–20 | 3–5 W | $300–600 | Surveillance-grade cameras with onboard ISP |
For most of our clients, Jetson Orin Nano plus a local PoE switch covers 2–4 streams per appliance and clears all standard anomaly models after TensorRT optimization. Step up to Orin NX when you need Transformer-class models or >8 streams per box.
What a real-time pipeline looks like under the hood
A production pipeline is seven stages, not one model call. Missing any stage shows up as false alarms, latency spikes, or storage bills.
1. Ingest. RTSP pull (ONVIF Profile S cameras) or WebRTC for newer stacks. Use GStreamer or NVIDIA DeepStream nvstreammux for multi-stream batching.
2. Decode & pre-process. H.264/H.265 hardware decode on NVDEC or the camera’s SoC. Histogram equalization and resolution crop for the model input.
3. Detection / tracking. YOLOv8 or RT-DETR for objects; ByteTrack for multi-object tracking that survives occlusion; I3D or Video Swin for temporal anomalies.
4. Temporal filter. Require an anomaly signal across 3–5 consecutive frames before firing an alert. This alone removes 40–60% of false positives at a 5–10 ms latency cost (see the sketch after this list).
5. Multi-camera fusion. Correlate events across neighboring cameras via NTP-synced timestamps. Shared anomalies (e.g., a runner crossing two feeds in sequence) get scored higher.
6. Rule & policy engine. Geofences, schedules, access-control events. This is where you convert “person in zone” into “person in restricted zone during off-hours”.
7. Alert dispatch & audit trail. WebSocket to the ops console, push to mobile, write to an immutable audit log. The audit log is what sells you into regulated accounts—skip it and you lose every enterprise deal.
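Stage 4 is small enough to show in full. A minimal temporal-consensus sketch, assuming per-frame anomaly scores in [0, 1]; the window, hit count and threshold defaults mirror the 3–5-frame rule but are per-camera tuning knobs, not universal constants:

```python
from collections import deque

class TemporalConsensus:
    """Fire an alert only when the anomaly signal persists across most of the
    last few frames (stage 4), suppressing single-frame flicker and noise."""

    def __init__(self, window=5, min_hits=4, threshold=0.7):
        self.scores = deque(maxlen=window)
        self.min_hits = min_hits
        self.threshold = threshold

    def update(self, frame_score: float) -> bool:
        self.scores.append(frame_score)
        hits = sum(s >= self.threshold for s in self.scores)
        return len(self.scores) == self.scores.maxlen and hits >= self.min_hits
```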
Need a second opinion on your pipeline design?
Send us the sketch. One of our video engineers will red-team it for latency, false positives and bandwidth before you commit to hardware.
Benchmarks and datasets: what “state of the art” actually means
When a vendor quotes a headline accuracy number, ask which dataset. The five that matter, ranked by difficulty:
UCF-Crime. 1,900 untrimmed videos, 128 hours, 13 real-world anomalies (abuse, robbery, shooting, arson). The hardest public benchmark. Current SOTA: ~84.6% frame-level AUC.
XD-Violence. Large-scale violent behavior, weakly labeled. Current ensemble results: ~88% AUC.
ShanghaiTech. 13 scenes, controlled splits. Easier than UCF-Crime. Top methods: 92–98% AUC.
Avenue. Outdoor traffic anomalies. 85–96% typical.
CUHK Abnormality in Crowds. Pushing, bumping, fighting. 75–95% typical.
Real-world warning: deployment on your own site typically drops AUC by 15–25% because of domain shift. A model that hits 0.90 on ShanghaiTech will land at 0.70–0.75 the first week of site deployment. Budget a retraining sprint on your own labeled data.
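Whatever you deploy, report real-world numbers the same way the benchmarks do. Frame-level AUC is a ranking metric over per-frame scores, nothing more; a toy example with scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data: per-frame ground truth (1 = anomalous) and model anomaly scores.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.2, 0.15, 0.8, 0.6, 0.3, 0.9, 0.4])

# 1.0 = every anomalous frame outranks every normal one; 0.5 = chance.
print(roc_auc_score(y_true, y_score))  # 1.0 on this toy data
```

Run this over labeled clips from your own site, not a public set, and you will see the domain-shift gap quantified before your operators do.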
The false-positive problem (the only thing operators actually care about)
An untuned model fires a false alarm on 30–70% of events. Past about 30 false alerts per camera per day, human operators start ignoring 40–70% of everything the system raises—including the real events. Every mature deployment spends more engineering on false-positive suppression than on the detector itself.
Five layers actually work, in combination:
1. Adaptive thresholding. Tune per-camera confidence cutoffs on-site. Published adaptive methods report 67% false-positive reduction while keeping >94% true-positive rate (sketched after this list).
2. Temporal consensus. Require the anomaly to persist across 3–5 consecutive frames. 40–60% false-positive reduction at negligible latency cost.
3. Multi-camera correlation. Alerts corroborated by a neighboring camera score higher. 70–80% false-positive reduction on regional events.
4. Human-in-the-loop feedback. Let operators flag false positives from the UI; retrain weekly on that feedback. Closes the domain-shift gap in 4–8 weeks.
5. Multi-modal fusion. Combine video with audio, door sensors, access-control events, and (for industrial sites) machine telemetry. Avigilon’s self-learning UMD reports roughly 90% false-alarm reduction in the field.
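Layer 1 is the one teams most often hard-code and regret. A minimal per-camera adaptive threshold, cutting at a high quantile of that camera's own recent score traffic; the window size, warm-up length and quantile are on-site tuning starting points, not published constants:

```python
from collections import deque
import numpy as np

class AdaptiveThreshold:
    """Per-camera cutoff at a rolling quantile of recent scores, so a busy
    lobby and an empty corridor get different bars automatically."""

    def __init__(self, window=5_000, quantile=0.995):
        self.scores = deque(maxlen=window)
        self.quantile = quantile

    def is_anomalous(self, score: float) -> bool:
        self.scores.append(score)
        if len(self.scores) < 100:  # warm-up: never alert on a cold start
            return False
        return score >= np.quantile(np.asarray(self.scores), self.quantile)
```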
Buy vs build: third-party platforms you’ll compare against
Any honest custom-build pitch starts by comparing the platform market. If one of these gets you 80% of what you want at 40% of the cost, buy it.
| Platform | Sweet spot | Pricing signal | Pick it when |
|---|---|---|---|
| Avigilon (Motorola) | Self-learning anomaly + unified cameras | $800–3,400/camera HW + $10–30/ch/mo SW | Managed service, <100 cameras |
| Genetec | Enterprise unified video + access control | $50–200/camera/mo | >500 cameras, security is a business function |
| Verkada | Cloud-native SMB | $500–3,000/cam HW + $199–1,799/cam/yr | No IT staff, fast rollout <200 cameras |
| BriefCam | Forensic search, video synopsis | $500–3,000/cam/yr | Investigation > real-time alerting |
| Amazon Rekognition Video | Pay-per-minute analysis | $0.10–1.00/min analyzed | Sporadic or event-triggered analysis |
| NVIDIA DeepStream | Developer SDK | Free SDK + $10–50k/yr enterprise support | Custom pipeline, in-house engineering |
Reach for a custom build when: you have >200 cameras, a domain-specific anomaly class no vendor covers (pharmaceutical clean-room breach, cockpit compliance, courtroom behavior), or a data-residency requirement that rules out cloud-only VMS.
Cost model: a realistic 100-camera deployment
Planning is easier with concrete numbers. Here’s how a 100-camera anomaly-detection deployment lands across the three sourcing paths, at typical 2026 market rates. These ranges assume off-the-shelf anomaly classes (intrusion, loitering, fights, falls). Domain-specific anomalies push custom-build cost up.
| Cost bucket | Off-the-shelf VMS | Hybrid (edge + SaaS) | Custom build (Fora Soft) |
|---|---|---|---|
| Initial CapEx | $100k–250k | $80k–180k | $50k–150k |
| Dev timeline | 2–6 weeks setup | 2–4 months | 4–8 months MVP |
| Annual OpEx | $50k–200k | $30k–120k | $20k–80k |
| Ops staffing | 0.25–0.5 FTE | 0.5–1 FTE | 1–2 FTE |
| 5-year TCO | $350k–700k | $250k–600k | $200k–500k |
Our agent-engineering workflow (LLM-assisted coding, reusable internal libraries for ingest/decoding/tracking/UI) shortens the custom-build timeline versus the baseline industry figures above. The “$50–150k initial” range reflects that, not a low-ball—talk to us with a concrete camera count before you anchor to a number.
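For transparency, the 5-year TCO rows compose as CapEx plus five years of OpEx, with ops staffing broken out as its own row. Worked low end for the off-the-shelf column:

```python
# 5-year TCO composition used in the table above (staffing shown separately).
def five_year_tco(capex, annual_opex, years=5):
    return capex + years * annual_opex

print(five_year_tco(capex=100_000, annual_opex=50_000))  # 350,000 -> the $350k floor
# Add ops staffing on top at your loaded FTE rate (0.25-0.5 FTE for a VMS).
```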
Mini case: V.A.L.T in production
V.A.L.T is our long-running video observation and surveillance platform. It sits inside law-enforcement, clinical training and courtroom deployments where the core job is recording, review and rule-based anomaly flagging. The Kazakhstan courtroom deployment is one public example: hundreds of rooms, mandatory audit trails, zero tolerance for missed events.
The architecture that won the contract is the hybrid pattern in this playbook. Edge appliances at each room handle ingest, H.265 encoding, tagging and first-pass anomaly detection. Only events and indexed metadata travel to the central cloud, where supervisors review flagged segments, search across rooms, and export court-admissible packages with tamper-evident logging.
Two lessons carried over to every subsequent surveillance build: invest in the audit trail early because regulated buyers won’t sign without it, and keep the “rules” engine independent of the ML model so non-engineers can add and tune alert policies. Want a similar assessment for your deployment?
A decision framework in five questions
Q1. How many cameras, at how many sites? Below 50 cameras at one site, a commercial VMS like Verkada or Avigilon wins on TCO. Above 200 cameras or across multiple sites with data-residency rules, custom starts to make sense.
Q2. Are your anomalies in the off-the-shelf catalog? Intrusion, loitering, fights, falls, abandoned objects, crowd density—all covered by BriefCam, Avigilon and Verkada. Domain-specific classes (clean-room breach, procedural non-compliance, cockpit behavior, courtroom signals) are not. That’s where custom earns its fee.
Q3. What is your latency budget? Live intervention (robbery, violence) needs <100 ms alerts; forensic review is fine at minutes. Tight latency forces edge or hybrid; forensic can live in the cloud.
Q4. Where does the data legally live? EU AI Act and GDPR effectively kill cloud-first face-recognition-style deployments. If your buyer is a public body or an EU enterprise, default to edge + on-prem storage, detect objects/behavior only.
Q5. Who will retrain the model? Domain shift costs 15–25% AUC on day one. If you don’t have a plan for weekly retraining on site data, pick a vendor that owns that burden or sign a maintenance contract with your custom-build partner.
Five pitfalls that sink deployments
1. Benchmarking on the wrong dataset. Shipping a model that hits 0.95 AUC on ShanghaiTech into a warehouse yard will embarrass you. Always fine-tune on 500+ site-specific clips and report real-world AUC, not research numbers.
2. Treating false positives as a post-launch problem. By the time operators tell you alerts are noisy, they’ve already muted notifications. Temporal filtering, adaptive thresholds and multi-camera fusion go into the MVP, not the backlog.
3. Streaming 4K to the cloud for inference. 500 cameras at 4 Mbps is 21 TB/day. Cloud egress at $0.09/GB adds up to six figures a year. Edge inference is not an optimization, it’s a line-item survival move.
4. Ignoring domain shift and seasonal drift. “Loitering” in an outdoor mall has different baselines in July and December. Without scheduled retraining or online learning, alert precision degrades inside three months.
5. Skipping the audit trail. Regulated buyers (healthcare, law enforcement, courts, finance) walk away the moment they see you can’t produce a tamper-evident log of every alert, every override and every model change. Bolt this on in week one of the build, not month six.
KPIs: what to actually measure
Quality KPIs. Precision and recall on site-specific clips (not public benchmarks). Target: precision ≥ 0.9 at recall ≥ 0.85 after three retraining cycles. Track AUC drift weekly.
Business KPIs. Alerts per camera per day (target <10), operator acknowledgement rate (>80%), time-to-first-response on true positives (<60 s), and percentage of real incidents caught by the system vs. discovered later (>85%).
Reliability KPIs. Edge appliance uptime (99.5%+), detection-to-alert latency p95 (<500 ms), and training pipeline health (successful retraining runs/month, >90%).
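If every alert is logged with its operator outcome, the business KPIs fall out of one pass over the log. A sketch with illustrative field names (your schema will differ):

```python
def business_kpis(alerts, cameras, days):
    """alerts: list of dicts with illustrative fields 'true_positive' (bool,
    from operator review), 'acknowledged' (bool), and 'response_s'
    (seconds to first response on true positives, or None)."""
    per_cam_day = len(alerts) / (cameras * days)                     # target < 10
    ack_rate = sum(a["acknowledged"] for a in alerts) / len(alerts)  # target > 0.80
    true_pos = [a for a in alerts if a["true_positive"]]
    precision = len(true_pos) / len(alerts)                          # target >= 0.90
    delays = sorted(a["response_s"] for a in true_pos if a["response_s"] is not None)
    median_response = delays[len(delays) // 2] if delays else None   # target < 60 s
    return {"alerts_per_cam_day": per_cam_day, "ack_rate": ack_rate,
            "precision": precision, "median_response_s": median_response}
```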
Privacy, GDPR and the EU AI Act: what you can and can’t ship
Regulation separates “anomaly detection” into two bands, and treats them very differently.
Low-risk (shippable under GDPR and the EU AI Act). Object detection, motion anomaly, behavior anomaly that does not re-identify individuals. Crowd density, loitering by silhouette, intrusion detection, fall and fight detection. Requires standard GDPR notice and a privacy impact assessment.
High-risk or prohibited. Real-time remote biometric identification in public spaces is prohibited for law enforcement under the EU AI Act. Untargeted scraping of CCTV for facial-recognition databases is explicitly banned. Gait recognition and ethnicity inference are “high-risk” AI systems under the Act and demand full conformity assessment.
Practical guidance: design for the low-risk band by default. If a buyer insists on facial recognition, route the request to a compliance-first vendor and scope a separate high-risk engagement. Enforcement is aggressive—the French DPA fined Clearview AI €20M in 2022 for unlawful face-collection practices, and EU regulators have been consistent since.
Need a GDPR-safe anomaly detection stack?
We’ve shipped video products into regulated environments for 21 years. Send us your compliance zone and we’ll shortlist the models and deployment pattern that won’t get you fined.
When machine-learning anomaly detection is the wrong answer
ML is not a universal surveillance upgrade. Three situations where a rule-based system or a plain human operator is a better fit:
Tiny camera counts with dense human coverage. A single parking-lot camera watched by a 24/7 guard doesn’t benefit enough from an ML layer to justify the hardware, licensing and retraining burden.
Zero training data and zero operator bandwidth. ML needs weeks of labeled “normal” footage plus a feedback loop from operators. If both are missing, a well-tuned motion-detection rule beats an under-trained model.
Life-safety with deterministic rules. Fire, smoke and gas detection are better handled by dedicated sensors than by ML on video. Use video as confirmation, not as primary signal.
A minimal DeepStream pipeline (for engineers scoping this)
For the engineers reviewing this playbook, here’s the skeleton of a multi-stream detection pipeline on Jetson. Useful to set hardware expectations during procurement; not production-ready.
```bash
# DeepStream reference pipeline: 4 RTSP streams -> YOLOv8 -> tracker -> I3D head -> sink
gst-launch-1.0 \
  nvstreammux name=mux batch-size=4 width=1280 height=720 live-source=1 ! \
  nvinfer config-file-path=/opt/yolov8.txt ! \
  nvtracker ll-lib-file=/opt/libnvds_nvmultiobjecttracker.so ! \
  nvinfer config-file-path=/opt/anomaly_i3d.txt ! \
  nvmultistreamtiler rows=2 columns=2 ! \
  nvvideoconvert ! nvdsosd ! \
  nveglglessink \
  rtspsrc location=rtsp://cam1:554/h264 ! rtph264depay ! h264parse ! nvv4l2decoder ! mux.sink_0 \
  rtspsrc location=rtsp://cam2:554/h264 ! rtph264depay ! h264parse ! nvv4l2decoder ! mux.sink_1 \
  rtspsrc location=rtsp://cam3:554/h264 ! rtph264depay ! h264parse ! nvv4l2decoder ! mux.sink_2 \
  rtspsrc location=rtsp://cam4:554/h264 ! rtph264depay ! h264parse ! nvv4l2decoder ! mux.sink_3
```
An Orin Nano clears four 1080p streams with a YOLOv8-s detector and an I3D anomaly head after TensorRT optimization. Swap in a Transformer head and you’ll want Orin NX or better.
FAQ
How accurate is ML-based video anomaly detection in real deployments?
Public benchmark AUCs sit between 0.80 and 0.97 depending on dataset difficulty. In the field, expect a 15–25% drop from day one because of domain shift. With 2–3 retraining cycles on site data, production precision of 0.90+ at recall 0.85+ is achievable on off-the-shelf anomaly classes.
What’s the realistic timeline for a custom video anomaly detection product?
A focused MVP for 3–5 standard anomaly classes on a single site lands in 4–8 months. Multi-site, multi-tenant platforms with rules engines, audit trails and operator UX take 9–14 months. Our agent-engineering workflow and reusable internal video-pipeline libraries compress those windows relative to industry baselines.
Can I run anomaly detection on existing IP cameras without replacing them?
Yes, in most cases. Any ONVIF Profile S or RTSP-capable camera pushes H.264/H.265 to an edge appliance (Jetson Orin Nano, Hailo-8, Ambarella gateway). That means your existing fleet keeps working; you add compute and software, not cameras.
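A quick sanity check before budgeting hardware: pull one frame from an existing camera over RTSP. This assumes opencv-python built with FFmpeg; the URL is an illustrative Hikvision-style path, not a universal one:

```python
import cv2

cap = cv2.VideoCapture("rtsp://user:pass@192.168.1.64:554/Streaming/Channels/101")
ok, frame = cap.read()
if ok:
    h, w = frame.shape[:2]
    print(f"stream up: {w}x{h} @ {cap.get(cv2.CAP_PROP_FPS):.0f} fps nominal")
else:
    print("no frames: check the RTSP path, credentials and codec support")
cap.release()
```

If this works across your fleet, the edge-appliance path described earlier is open to you.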
How do I compare a custom build against Avigilon or Verkada?
Score on five axes: camera count, anomaly specificity, latency requirement, data residency, and retraining ownership. Commercial VMS wins whenever your anomaly classes are generic and the camera count is below ~150. Custom wins above that, or when you need a domain-specific detector, tight audit trails or strict on-prem deployment.
Is facial recognition legal for my project under the EU AI Act?
Real-time remote biometric identification in public spaces is prohibited for law-enforcement use, with narrow exceptions. Other uses are “high-risk” and require a full conformity assessment, registration, logging, and data-governance controls. For most anomaly-detection use cases, avoid facial recognition entirely and rely on object and behavior signals.
How do I reduce false positives without missing real events?
Layer five controls: adaptive per-camera thresholds, temporal consensus across 3–5 frames, multi-camera correlation, a human-in-the-loop feedback UI, and multi-modal fusion (audio, door sensors, access control). Published adaptive-threshold methods report 67% false-positive reduction at >94% true-positive rate.
What edge hardware should I standardize on in 2026?
Jetson Orin Nano ($249, 40 TOPS) for 2–4 streams with CNN/I3D models; Orin NX (~$700, 100 TOPS) for Transformer-class models or 8+ streams; Hailo-8 for ultra-low-power, battery-operated cameras; Ambarella CV2x when you also want the SoC to handle the camera ISP. Avoid Jetson Nano (original) for new projects in 2026—it’s too slow for modern video models.
Who owns the model once the project ships?
In our custom builds, you do—source, weights, training data pipelines, the lot. That’s a key reason buyers pick custom over platform: no vendor lock-in, full right to retrain in-house or move providers. We offer maintenance contracts if you prefer us to own the retraining and drift-monitoring burden.
What to read next
Surveillance
Custom Video Surveillance Solutions With AI
End-to-end overview of how we scope and ship custom video surveillance products.
Real-time
Real-Time Anomaly Detection in Video Surveillance
Latency budgets, pipeline design and tuning strategies for live alerting.
Automation
Automated Anomaly Detection for Security Cameras
How to automate event triage on existing camera fleets without ripping them out.
Algorithms
Top Algorithms for Surveillance Anomaly Detection
Deeper dive into which algorithms ship well in security products and why.
Ready to scope ML anomaly detection for your fleet?
The shortlist is clear. Decide the anomaly classes, camera count and data-residency zone first. Pick hybrid as the default architecture, Jetson Orin Nano or NX as the default edge chip, and I3D or Video Swin as the default model family. Invest in the false-positive stack (adaptive thresholds, temporal consensus, multi-camera fusion, operator feedback) from day one, not after launch.
Then run the build-vs-buy math honestly: if you’re under 150 cameras and your anomalies are generic, a commercial VMS probably wins. Above that, or if your anomaly class is domain-specific, custom is cheaper at five-year TCO and gives you model ownership. Fora Soft has been on the custom side of that line for 21 years—bring us your spec and we’ll tell you which side you’re on before you commit a dollar.
Let’s scope your ML video surveillance project
Share your camera count, anomaly classes and compliance zone. You’ll leave the call with a build-vs-buy verdict, a realistic timeline, and a rough budget band.

