What is AI video surveillance software?

Software that uses computer vision and machine learning to detect people, objects, faces, vehicles, behaviors, or risks in live video feeds.

What is computer vision in video surveillance?

Computer vision enables software to automatically interpret video content, detecting people, objects, faces, vehicles, and behaviors without human monitoring.

How accurate is custom computer vision detection?

With environment-specific model training, accuracy typically reaches 90-98% depending on task complexity and camera quality.

Can you develop a fully custom AI video analytics platform?

Yes. Every part can be customized: AI models, UI, workflows, integrations, dashboards, hardware, and deployment.

Can computer vision systems scale to thousands of cameras?

Yes. With distributed inference and optimized pipelines, large-scale multi-site deployments are achievable.

Can this work with my existing cameras?

Yes. We support ONVIF, RTSP, IP cameras, DVR/NVR systems, and hybrid environments.

Is the system GDPR-compliant?

Yes. We support privacy controls, on-device inference, encrypted streams, and role-based access.

Should AI video analytics run on edge or cloud?

Edge AI reduces latency and bandwidth use. Cloud enables centralized storage and large-scale retraining. Most enterprise systems use a hybrid architecture.

How long does development take?

Basic systems take 6-10 weeks. Advanced multi-site systems take 3-6 months.

AI Video Surveillance Development
Custom-built since 2005

AI video surveillance your team owns.
From day one.

You own the models, the infrastructure, and the operator UI from day one. Built on YOLOv8/v9, DeepSORT, and NVIDIA Jetson: 90-98% detection accuracy, sub-200ms response, deployed on-prem or in your VPC. The same stack now runs at V.A.L.T (2,500+ cameras across 770+ U.S. police departments) and MindBox (50+ retail sites). No per-camera SaaS fees, no vendor lock-in.

90-98%

Detection accuracy across YOLOv8/v9 pipelines

<200ms

Edge inference on NVIDIA Jetson Orin

2,500+

Cameras live on V.A.L.T — 770+ U.S. police departments

20+

Years building production video + AI systems

Built for

Retail loss prevention
School & campus safety
Smart cities & transit
Industrial safety & logistics
Healthcare facilities
Law enforcement & child advocacy

Traditional CCTV vs Custom AI Computer Vision

Same cameras. Different ceiling.

Traditional CCTV records footage and waits for someone to review it. Custom computer vision interprets the feed in real time and triggers action. The hardware can stay; the intelligence is where the difference lives.

Feature	Traditional CCTV / generic VMS	Custom AI computer vision
Event detection	Motion-only, post-hoc review	YOLOv8/v9 + DeepSORT — 90–98% accuracy, real-time
False alarm rate	One per 10–20 motion events on average	Sub-3% on tuned models with shadow / weather filters
Latency to alert	Seconds to minutes (manual review)	Sub-200ms edge inference on NVIDIA Jetson Orin
Per-camera cost (24/7 analytics)	$15–40 / mo SaaS analytics + license fees	$2–6 / mo amortized after build — you own the IP
Privacy posture	Vendor cloud + ambiguous data residency	On-prem or VPC, GDPR/CCPA/HIPAA enforceable
Extensibility	Locked to vendor's feature roadmap	Add ANPR/LPR, face recognition, behavior analytics on your timeline

Numbers reflect Fora Soft production deployments on V.A.L.T (2,500+ cameras across 770+ U.S. police departments) and MindBox (50+ retail locations). Your numbers will move with your camera count, scene density, and required event taxonomy.

How it works

Five stages from pixel to action. Each one budgeted for latency.

A real-time computer vision pipeline isn't one model — it's an inference graph with a latency budget at every stage. Miss any single budget and the system slips from prevention to after-the-fact review, which is the difference between stopping the event and reading about it later.

01

Capture

IP cameras stream H.265 / H.264 over RTSP, ONVIF Profile S/T, or WebRTC for low-latency feeds. Resolution and framerate are tuned per use case — 4MP at 12-15fps is the typical sweet spot for analytics, with 4K reserved for forensic recording.

Frame arrives • 0ms baseline

02

Edge inference

NVIDIA Jetson Orin Nano / Xavier NX runs Triton Inference Server with TensorRT-optimized models per camera or per zone. We deploy edge over cloud for sub-200ms response and privacy compliance — the raw video never leaves the perimeter unless an event triggers escalation.

Frame decoded + normalized • budget < 25ms

03

Detection

YOLOv8 or YOLOv9 (we pick by accuracy/latency target) runs object detection — person, vehicle, weapon, package, PPE state, license plate. Class taxonomy is yours, not a generic model's. Confidence thresholds are tuned per camera angle and lighting. EfficientDet is the fallback for low-power scenes.

Bounding boxes + class labels • budget < 60ms
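
As a rough illustration of per-camera threshold tuning, the sketch below filters raw detections against a confidence threshold looked up by camera. The camera IDs, threshold values, and the shape of the detection dictionaries are hypothetical placeholders, not values from a production configuration.

```python
# Hypothetical per-camera confidence thresholds; real values would come
# from tuning against each camera's angle and lighting.
DEFAULT_THRESHOLD = 0.5
CAMERA_THRESHOLDS = {
    "dock-01": 0.65,   # harsh backlight, so demand higher confidence
    "lobby-02": 0.40,  # clean indoor scene, so a lower bar is acceptable
}


def filter_detections(camera_id: str, detections: list[dict]) -> list[dict]:
    """Keep only detections at or above the threshold for this camera."""
    threshold = CAMERA_THRESHOLDS.get(camera_id, DEFAULT_THRESHOLD)
    return [d for d in detections if d["confidence"] >= threshold]


raw = [
    {"label": "person", "confidence": 0.72},
    {"label": "package", "confidence": 0.48},
]
kept_dock = filter_detections("dock-01", raw)
kept_lobby = filter_detections("lobby-02", raw)
```

The same detector output yields one kept detection on the dock camera and two on the lobby camera, which is the point of tuning thresholds per scene rather than globally.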

04

Tracking + recognition

DeepSORT or ByteTrack maintains object identity across frames so the system reasons about trajectories — not just appearances. ArcFace / InsightFace handle face recognition when authorized; ANPR/LPR pipelines decode license plates. Both can be enrolled against allow-lists or watch-lists with audit logs.

Tracked identities + plates / faces • budget < 80ms
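
To make "maintaining identity across frames" concrete, here is a deliberately simplified sketch of the association step: greedy matching of existing tracks to new detections by bounding-box overlap (IoU). This is not DeepSORT or ByteTrack; DeepSORT, for example, adds Kalman-filter motion prediction and appearance embeddings on top of the association idea. The function names and the 0.3 overlap cutoff are assumptions for illustration.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: dict[int, Box], detections: list[Box],
              min_iou: float = 0.3) -> dict[int, int]:
    """Greedily map track_id -> detection index, best overlap first."""
    pairs = sorted(
        ((iou(box, det), tid, di)
         for tid, box in tracks.items()
         for di, det in enumerate(detections)),
        reverse=True,
    )
    matched: dict[int, int] = {}
    used: set[int] = set()
    for score, tid, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in matched or di in used:
            continue
        matched[tid] = di
        used.add(di)
    return matched


tracks = {1: (0, 0, 10, 10), 2: (100, 100, 110, 110)}
detections = [(101, 101, 111, 111), (1, 1, 11, 11)]
matches = associate(tracks, detections)
```

Detections left unmatched would start new tracks, and tracks left unmatched for several frames would be retired; both are omitted here to keep the sketch short.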

05

Alert + dashboard

Events fire to operator dashboards (web + mobile), PagerDuty/Opsgenie webhooks, VMS bridges (Milestone, Genetec, Avigilon), or your own SIEM. Operator video review tools use WebRTC for sub-second playback. Forensic search runs against a vector index of detected entities for second-level recall on historic footage.

Operator notified / action triggered • budget < 30ms

Total end-to-end budget: sub-200ms for live alerts; sub-second for operator-confirmed escalation. We benchmark each customer build against your scene density before sign-off — the budget is the contract.
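
The per-stage budgets above add up to 195ms, just inside the sub-200ms target. A minimal sketch of how measured stage timings could be checked against those budgets follows; the stage keys and the `over_budget` helper are hypothetical names, and only the millisecond figures come from the stages listed above.

```python
# Stage budgets in milliseconds, taken from the five stages above
# (capture is the 0ms baseline and has no budget of its own).
STAGE_BUDGETS_MS = {
    "decode": 25,
    "detection": 60,
    "tracking_recognition": 80,
    "alert_dispatch": 30,
}
END_TO_END_BUDGET_MS = 200


def over_budget(measured_ms: dict[str, float]) -> list[str]:
    """Return stages that met or exceeded their budget, plus 'end_to_end'
    if the summed latency is not under the 200ms target."""
    violations = [
        stage for stage, budget in STAGE_BUDGETS_MS.items()
        if measured_ms.get(stage, 0.0) >= budget
    ]
    total = sum(measured_ms.get(stage, 0.0) for stage in STAGE_BUDGETS_MS)
    if total >= END_TO_END_BUDGET_MS:
        violations.append("end_to_end")
    return violations


healthy = {"decode": 18, "detection": 52, "tracking_recognition": 71, "alert_dispatch": 12}
slow_detector = {"decode": 18, "detection": 75, "tracking_recognition": 71, "alert_dispatch": 12}
```

Note that a single stage can blow its budget while the end-to-end total still looks fine, which is why each stage is checked separately.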

System architecture

Eight layers. Named tools at each one.

Every layer of the stack is a deliberate choice, not a default. The list below is what we deploy in production today — not a survey of options. When something on this list doesn't fit your environment, we name the alternative in the recommendation document, not in marketing.

Layer	Tools we deploy
Capture & transport	RTSP, ONVIF Profile S/T, WebRTC (mediasoup 3.16, LiveKit 1.x for low-latency feeds), SRT for unreliable links
Edge runtime	NVIDIA Jetson Orin Nano / Xavier NX, NVIDIA Triton Inference Server, TensorRT, OpenVINO for Intel edge boxes, ONNX Runtime as portable fallback
Object detection	YOLOv8, YOLOv9 (Ultralytics), EfficientDet, RT-DETR for transformer-based detection where accuracy matters more than fps
Tracking	DeepSORT, ByteTrack, OC-SORT (chosen per scene density and re-identification requirements)
Recognition	ArcFace, InsightFace, FaceNet (face); ANPR / LPR pipelines with custom-trained plate datasets per jurisdiction
Analytics & reasoning	Behavior analytics (loitering, crowd dynamics, PPE compliance, abandoned object), multimodal LLM passes (Gemini, Claude, OpenAI) for second-pass enrichment on flagged events
Storage & recall	S3-compatible object store for video, PostgreSQL or ClickHouse for events, pgvector / Milvus / Qdrant for entity-level forensic search
Dashboards & alerts	React + WebRTC operator console, mobile (React Native or native iOS/Android), Milestone / Genetec / Avigilon VMS bridges, PagerDuty / Opsgenie / Slack webhooks

Compliance overlays — GDPR, CCPA, HIPAA, SOC 2 — are not a separate layer. They're enforced inside each layer: encryption at rest and in transit, role-based access control for video review, audit logs on face / plate matches, data residency pinned to region.
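
Entity-level forensic search in the storage layer comes down to nearest-neighbor lookup over embeddings of detected entities. A minimal in-memory sketch, assuming each entity is stored as a short embedding vector keyed by a hypothetical entity ID: a vector store such as pgvector, Milvus, or Qdrant would do this at scale, and this brute-force version only shows the idea.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], index: dict[str, list[float]],
          k: int = 3) -> list[tuple[str, float]]:
    """Return the k entity IDs most similar to the query, best first."""
    scored = sorted(
        ((cosine(query, vec), entity_id) for entity_id, vec in index.items()),
        reverse=True,
    )
    return [(entity_id, score) for score, entity_id in scored[:k]]


# Toy 2-dimensional embeddings; real embeddings have hundreds of dimensions.
index = {
    "entity-a": [1.0, 0.0],
    "entity-b": [0.0, 1.0],
    "entity-c": [1.0, 1.0],
}
results = top_k([1.0, 0.0], index, k=2)
```

Each returned entity ID would then be joined back to the event store to pull the camera, timestamp, and clip for operator review.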

Use cases

Same pipeline. Different verdicts depending on what you're watching for.

A behavior detector for retail shrinkage isn't the same model as one for industrial PPE compliance, even if the underlying YOLO + DeepSORT stack is identical. The taxonomy, thresholds, and escalation rules are where custom development pays back. Here are the six most common shapes we build.

Retail loss prevention

Real-time shrinkage detection, anti-sweethearting at POS, weapon detection, and crowd density for queue management. MindBox runs this exact stack across 50+ retail sites — store managers see shrinkage flags on a phone dashboard before the customer reaches the door, with audit-grade clip retention.

Industrial safety & logistics

PPE compliance (hard hat, vest, safety glasses), forklift / pedestrian collision risk, restricted zone intrusion, abandoned object detection on conveyor lines. Latency budget tightens to sub-150ms when the alert needs to trigger a physical interlock or e-stop.
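
Restricted zone intrusion can be sketched as a point-in-polygon test on each tracked person's foot point (the bottom center of the bounding box). The version below uses the standard ray-casting test; the zone coordinates and helper names are hypothetical, and a real deployment would also need camera calibration and debouncing across frames.

```python
Point = tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: list[Point]) -> bool:
    """Ray casting: count how many polygon edges a rightward ray crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_restricted_zone(box: tuple[float, float, float, float],
                       zone: list[Point]) -> bool:
    """Test the bottom-center of an (x1, y1, x2, y2) box against the zone."""
    foot_x = (box[0] + box[2]) / 2
    foot_y = box[3]
    return point_in_polygon(foot_x, foot_y, zone)


zone = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical zone in pixel coordinates
worker_inside = in_restricted_zone((4, 2, 6, 8), zone)
worker_outside = in_restricted_zone((12, 2, 14, 8), zone)
```

Using the foot point rather than the box center reduces false intrusions from people who are standing outside the zone but whose upper body overlaps it in the image.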

Smart cities & transit

ANPR / LPR for tolling and parking, pedestrian counts for transit planning, abandoned vehicle detection, traffic incident classification. Models run at the camera edge to keep PII out of central systems — only event metadata and anonymized counts move to the cloud.

Healthcare facilities

Patient elopement detection, fall detection in long-term care, hand-hygiene compliance audits, restricted-area access for controlled substances. HIPAA-grade audit logs and on-prem inference are non-negotiable defaults — video stays inside the facility unless an authorized review triggers escalation.

Law enforcement & child advocacy

V.A.L.T, our flagship deployment, runs 2,500+ cameras across 770+ U.S. police departments, covering forensic interview recording for child advocacy centers, medical education, and law enforcement evidence capture. Multi-camera sync, redaction workflows, and chain-of-custody logging are built in.

Custom — your event taxonomy

Most engagements start with a use case nobody packaged. The work is defining the event taxonomy, the scene constraints, and the false-positive budget your operators can absorb, then training and deploying models that hit it. The discovery call is the first hour.

Build vs Buy

Different ceiling. Different unit economics.

SaaS video analytics platforms — Eagle Eye Networks, Spot AI, Turing, Verkada — ship in days and work well within their template. Custom development takes longer to start and pays back the moment your event taxonomy, retention, residency, or unit economics stop matching that template. The decision is rarely “which is better” — it's “what's your three-year cost curve.”

Buy

Off-the-shelf VMS + analytics SaaS

Vendor-owned cloud, vendor-owned model roadmap, per-camera SaaS fees that scale linearly with deployment.

Live in 2-4 weeks with stock models

$15-40 / camera / month operational cost

Event taxonomy locked to vendor's roadmap

Data residency = wherever the vendor's cloud is

Use when: camera count is under 100, event taxonomy maps to vendor presets, and per-camera SaaS economics work for your business model.

Build

Custom AI computer vision — you own the stack

Models, infrastructure, and operator UI all under your control. Higher upfront effort; flat operational cost after build.

8-16 weeks to MVP on a defined scope

$2-6 / camera / month amortized at scale

Event taxonomy is yours — add classes on your timeline

On-prem or VPC — GDPR / CCPA / HIPAA enforceable

Use when: camera count is 200+, the event taxonomy doesn't fit a stock model, regulated data residency is required, or you need the IP on your balance sheet.

Figure 1. Decision matrix — per-camera monthly cost against detection accuracy across three deployment patterns. SaaS analytics holds the accuracy ceiling but locks the unit economics; custom AI computer vision moves both axes simultaneously once fleet size crosses ~200 cameras.
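
The three-year cost curve can be roughed out with simple arithmetic. The sketch below is a back-of-envelope illustration, not a quote: it takes the low end of the SaaS range ($15 per camera per month), the high end of the custom range ($6), and the Enterprise tier floor ($50K) as an upfront figure. The page does not say whether the $2-6 amortized figure already includes the build cost, so the upfront cost is a separate parameter that can be set to zero. Function names are hypothetical.

```python
import math


def total_cost(cameras: int, per_camera_month: float,
               upfront: float = 0.0, months: int = 36) -> float:
    """Upfront cost plus per-camera operating cost over the period."""
    return upfront + cameras * per_camera_month * months


def break_even_cameras(upfront: float, saas_fee: float, own_fee: float,
                       months: int = 36) -> int:
    """Smallest fleet size at which the custom build costs no more than SaaS."""
    saving_per_camera = (saas_fee - own_fee) * months
    if saving_per_camera <= 0:
        raise ValueError("custom per-camera cost must be below the SaaS fee")
    return math.ceil(upfront / saving_per_camera)


saas_300 = total_cost(300, 15)                     # 300 cameras on SaaS
custom_300 = total_cost(300, 6, upfront=50_000)    # 300 cameras, custom build
threshold = break_even_cameras(50_000, 15, 6)
```

Under these inputs, 300 cameras cost $162,000 on SaaS against $114,800 custom over three years, and the break-even lands at 155 cameras. Different fee assumptions move that number substantially, so treat it as a way to frame the question rather than an answer.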

Hybrid is a real option — keep an existing VMS (Milestone, Genetec) for recording, layer custom analytics on top via ONVIF / RTSP. We architect that bridge in roughly 30% of engagements.

How we engage

Three ways in. One outcome — software that ships.

Engagement model is matched to where you are, not where we'd prefer you to be. The three shapes below cover roughly 90% of how Fora Soft enters a project.

From scratch

Build the platform end-to-end

Discovery → architecture → MVP → production. We take responsibility for the whole stack and ship in 8–16 weeks on a defined scope. Best fit when there's no existing system or when the existing system is being decommissioned. V.A.L.T was built this way.

Upgrades & improvements

Extend what's already running

Existing VMS plus custom analytics layer, new event classes added to a running model, latency rebudget on an architecture that's struggling at the camera count it's grown into. We integrate via ONVIF, RTSP, or vendor SDK without ripping out what works.

Takeovers & fixes

Take the codebase off a stuck team

Inherited a system nobody fully understands? A previous vendor walked away mid-build? We've done the takeover dance enough times to make it boring: audit, stabilize, document, ship the next version. NDA before access; honest verdict on what's salvageable.

Pricing

Three tiers. Named tech in each. No “contact sales” for the bracket.

The number you see is the bracket the build typically lands in. Final scope depends on camera count, model count, event taxonomy depth, integrations, and compliance overlays — we name the moving parts in the discovery call before you commit.

Startup

from $15K

8–10 weeks • single use case • up to ~50 cameras

YOLOv8 + DeepSORT on NVIDIA Jetson Orin Nano
One event taxonomy (e.g. retail shrinkage or PPE compliance)
Operator web dashboard + mobile alerts
S3 video archive + PostgreSQL event store
Slack / email / webhook alerts

Most common

Growth

from $30K

12–16 weeks • multi-use-case • up to ~500 cameras

YOLOv8/v9 + DeepSORT/ByteTrack on Jetson Xavier NX or T4
2–3 event taxonomies + ArcFace or ANPR add-on
VMS bridge (Milestone, Genetec, Avigilon) included
ClickHouse / pgvector for forensic search across history
Role-based operator console, audit logs, redaction tooling
PagerDuty / Opsgenie integration

Enterprise

from $50K

16–24 weeks+ • multi-site • 500–2,500+ cameras

Triton Inference Server clusters, TensorRT-optimized models
Multi-region edge fleet, on-prem or VPC deployment
Full compliance overlay: GDPR, CCPA, HIPAA, SOC 2
Custom-trained models per site or per camera class
Dedicated SRE handover, runbooks, on-call rotation
Reference deployment: V.A.L.T at 2,500+ cameras, 770+ U.S. police departments

Add-ons priced separately: custom model training cycles, on-prem hardware sizing, third-party SDK licenses (Genetec, Milestone, ArcFace commercial), regulatory certification audits. We itemize before contract.

Free for qualified projects

Four deliverables. Yours within a week.

An independent assessment of your build, written by engineers who would actually ship it. Pick the one that fits where you are now: planning the MVP, mid-build, or stabilizing what's already in production. NDA before any code, footage, or system access changes hands.

MVP Planning and Preparation

Competitor analysis, core feature definition, monetization modeling, and a full launch blueprint — delivered within a week. Written by engineers who'll build what they plan.

For founders pre-launch

Architecture Review

An independent review of your system's technology choices, structural components, and workload fit — with a plain verdict on what's working, what's a liability, and exactly what to change to reach your goal. Delivered within a week.

For CTOs & engineering leads

Code Audit

A full audit of your code with every issue documented, evidenced, and located — exact file, exact line. Plus a system architecture review and a prioritized fix roadmap. Not a consultant's opinion. A case file. Delivered within a week.

For teams inheriting a codebase

Video Product Review

A specialist review of your video or streaming product covering latency, media server architecture, WebRTC, playback reliability, real-time chat, and scalability. Every finding is specific, located, and fixable. Delivered within a week.

For CTOs & engineering leads

Why hire Fora Soft

Twenty years of building video + AI systems that actually run in production.

Not a generalist studio with a computer vision practice. Not a SaaS vendor pretending to do custom work. Fora Soft has been building real-time video, WebRTC, and AI systems since 2005 — and the surveillance, computer vision, and edge inference work below is the same team, the same stack, the same engineering bar.

20+ years

Production track record since 2005

625+ products shipped. Video + real-time systems is what we built the company on — long before “computer vision” became a category. We've watched the surveillance stack transition from analog DVR to IP, from on-prem GPU to edge Jetson, and we've shipped systems through every generation.

2,500+ cameras

V.A.L.T — the flagship deployment

770+ U.S. police departments, 50,000 daily users, child advocacy interview recording, medical education, and law enforcement evidence capture. Built end-to-end by Fora Soft — multi-camera sync, redaction workflows, chain-of-custody logging, the full operator console.

50+ sites

MindBox — retail computer vision at scale

AI retail analytics across 50+ store locations: shrinkage detection, queue dynamics, weapon detection, sweethearting at POS. Sub-200ms edge inference, mobile-first operator dashboards, audit-grade clip retention. The model and unit economics that beat the SaaS analytics vendors.

100% in-house

One team. Computer vision, mobile, infra, ops.

No outsourcing chain. The CV engineer who trains your model sits next to the iOS engineer who builds the operator app and the SRE who runs your Triton cluster. 100% Upwork Top-Rated Plus, 100% job success on enterprise engagements. NDA before any code access; honest verdict before any contract.

Common questions

What buyers ask before the discovery call.

How accurate is custom AI video surveillance compared to traditional CCTV motion detection?

In production deployments, our YOLOv8/v9 + DeepSORT pipelines hit 90–98% detection accuracy on the event classes they're tuned for, with sub-3% false positive rate on tuned scenes. Traditional motion-only CCTV typically generates one false alarm per 10–20 motion events. The accuracy gap is what makes operator-driven workflows realistic at hundreds or thousands of cameras.

What's the typical latency from camera capture to alert?

Sub-200ms end-to-end for live alerts when running on NVIDIA Jetson Orin Nano or Xavier NX at the camera edge: roughly 25ms decode, 60ms YOLOv8 inference, 80ms DeepSORT tracking + recognition, 30ms alert dispatch. Cloud-only architectures typically land at 800ms–2s due to upload + queue time, which is acceptable for forensic search but not for real-time intervention.

Do you build for on-prem, cloud, or hybrid deployments?

All three. On-prem (NVIDIA Jetson at the edge + on-site server) is the default for healthcare, law enforcement, and any environment with data residency or HIPAA requirements. VPC / cloud (AWS, GCP, Azure) suits multi-site retail and smart-city deployments. Hybrid — edge inference + cloud forensic search — is the most common shape for 200+ camera fleets.

Can you integrate with our existing VMS (Milestone, Genetec, Avigilon)?

Yes. We layer custom analytics on top of an existing VMS via ONVIF, RTSP, or vendor SDK in roughly 30% of engagements. You keep the recording infrastructure and operator workflow you've already trained on; we add the AI event layer on top. No rip-and-replace required.

Is AI video surveillance GDPR and CCPA compliant?

When architected correctly, yes. Compliance is enforced inside each layer of the stack: on-prem or VPC inference (no raw video leaving the perimeter), role-based access for operator review, audit logs on every face / plate match, data residency pinned to region, retention windows configurable per camera class. We sign Data Processing Agreements before any engagement and can support DPIA documentation as part of delivery.

How does this compare to Eagle Eye Networks, Spot AI, Verkada, or Turing?

Those vendors ship a SaaS analytics platform with stock models and per-camera fees. They work well when your camera count is under 100 and your event taxonomy fits the preset library. Custom development wins on three axes: unit economics at 200+ cameras (flat cost vs $15–40/camera/month SaaS), event taxonomy beyond presets, and regulated data residency. The Build vs Buy section above lays out the decision frame.

Do you handle face recognition and ANPR / license plate recognition?

Yes. Face recognition uses ArcFace or InsightFace against authorized enrollment sets with audit logs on every match — we don't deploy unauthorized facial recognition. ANPR / LPR pipelines are custom-trained on plate datasets per jurisdiction (US, UK, EU, GCC plates differ enough to need separate training). Both can be enabled or disabled per camera, with clear audit posture.

How long does an MVP take to ship?

8–10 weeks for a Startup-tier scope (single event taxonomy, up to ~50 cameras, one operator surface). 12–16 weeks for a Growth-tier scope (multi-taxonomy, VMS bridge, forensic search). 16–24 weeks+ for Enterprise (multi-site edge fleet, full compliance overlay, custom-trained models). Discovery call to first running model is typically 3–4 weeks regardless of tier.

Who owns the IP — us or Fora Soft?

You do. Models, training data, infrastructure code, and operator UI are all delivered to your repositories under your name. Fora Soft retains no claim on the IP. The benefit of custom development over SaaS is exactly this: the system, the data, and the unit economics live on your balance sheet rather than the vendor's.

What's the engagement model after launch?

Three shapes: handover to your in-house team with runbooks and on-call training (most common at Enterprise tier); ongoing SRE / model-tuning retainer (typical at Growth tier when in-house ML isn't on the roadmap); or fixed-scope quarterly improvement cycles (additional event classes, new sites, integrations). All three are scoped after the initial build, not bundled.

Computer Vision

Tell us about your computer vision idea.

Within 48 hours you'll get a realistic estimate, a technical recommendation, and an outline of next steps. No obligation. NDA before any access to your code, recordings, or operator dashboards.

Go deeper before the call.

Anomaly Detection Models for Video Surveillance: A 2026 Buyer's Guide

Machine Learning Algorithms for Detecting Surveillance Anomalies

Multimodal Agentic AI for Real-Time Systems — Architecture Playbook

V.A.L.T — How Fora Soft built surveillance for 770+ U.S. police departments

AI Video Recognition Software Development — broader video AI scope

AI Scalable Video Streaming — when surveillance shares infra with streaming
