AI Video Surveillance Development
Custom-built since 2005

AI video surveillance your team owns.
From day one.

You own the models, the infrastructure, and the operator UI from day one. Built on YOLOv8/v9, DeepSORT, and NVIDIA Jetson — 90-98% detection accuracy, sub-200ms response, deployed on-prem or in your VPC. The same stack now running at V.A.L.T (2,500+ cameras across 770+ U.S. police departments) and MindBox (50+ retail sites). No per-camera SaaS fees, no vendor lock-in.

90-98%
Detection accuracy across YOLOv8/v9 pipelines
<200ms
Edge inference on NVIDIA Jetson Orin
2,500+
Cameras live on V.A.L.T — 770+ U.S. police departments
20+
Years building production video + AI systems
Built for
  • Retail loss prevention
  • School & campus safety
  • Smart cities & transit
  • Industrial safety & logistics
  • Healthcare facilities
  • Law enforcement & child advocacy

Traditional CCTV vs Custom AI Computer Vision

Same cameras. Different ceiling.

Traditional CCTV records footage and waits for someone to review it. Custom computer vision interprets the feed in real time and triggers action. The hardware can stay; the intelligence is where the difference lives.

Feature | Traditional CCTV / generic VMS | Custom AI computer vision
Event detection | Motion-only, post-hoc review | YOLOv8/v9 + DeepSORT: 90–98% accuracy, real-time
False alarm rate | One per 10–20 motion events on average | Sub-3% on tuned models with shadow / weather filters
Latency to alert | Seconds to minutes (manual review) | Sub-200ms edge inference on NVIDIA Jetson Orin
Per-camera cost (24/7 analytics) | $15–40 / mo SaaS analytics + license fees | $2–6 / mo amortized after build; you own the IP
Privacy posture | Vendor cloud + ambiguous data residency | On-prem or VPC, GDPR/CCPA/HIPAA enforceable
Extensibility | Locked to vendor's feature roadmap | Add ANPR/LPR, face recognition, behavior analytics on your timeline

Numbers reflect Fora Soft production deployments on V.A.L.T (2,500+ cameras across 770+ U.S. police departments) and MindBox (50+ retail locations). Your numbers will move with your camera count, scene density, and required event taxonomy.

How it works

Five stages from pixel to action. Each one budgeted for latency.

A real-time computer vision pipeline isn't one model — it's an inference graph with a latency budget at every stage. Miss any single budget and the system slips from prevention to after-the-fact review, which is the difference between stopping the event and reading about it later.

01

Capture

IP cameras stream H.265 / H.264 over RTSP, ONVIF Profile S/T, or WebRTC for low-latency feeds. Resolution and framerate are tuned per use case — 4MP at 12-15fps is the typical sweet spot for analytics, with 4K reserved for forensic recording.
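
As a rough illustration of framerate tuning, the sketch below thins a higher-rate camera stream down to an analytics target such as 12-15fps. The function name `frames_to_keep` and the 25fps source rate in the usage line are assumptions made for the example; they do not come from any camera SDK or from the production pipeline.

```python
def frames_to_keep(source_fps: float, target_fps: float, n_frames: int) -> list[int]:
    """Return the indices of source frames to hand to analytics so that the
    kept frames approximate target_fps, spaced as evenly as possible."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        return list(range(n_frames))      # nothing to drop

    kept = []
    interval = 1.0 / target_fps           # seconds between kept frames
    next_due = 0.0                        # time at which the next frame is due
    for i in range(n_frames):
        t = i / source_fps                # timestamp of source frame i
        if t >= next_due - 1e-9:          # small tolerance for float rounding
            kept.append(i)
            next_due += interval
    return kept


# Hypothetical usage: one second of a 25fps stream thinned to 12fps.
one_second = frames_to_keep(source_fps=25, target_fps=12, n_frames=25)
```

Dropping frames before inference, rather than after, is what frees edge compute for the detection and tracking stages.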

Frame arrives • 0ms baseline

02

Edge inference

NVIDIA Jetson Orin Nano / Xavier NX runs Triton Inference Server with TensorRT-optimized models per camera or per zone. We deploy edge over cloud for sub-200ms response and privacy compliance — the raw video never leaves the perimeter unless an event triggers escalation.

Frame decoded + normalized • budget < 25ms

03

Detection

YOLOv8 or YOLOv9 (we pick by accuracy/latency target) runs object detection — person, vehicle, weapon, package, PPE state, license plate. Class taxonomy is yours, not a generic model's. Confidence thresholds are tuned per camera angle and lighting. EfficientDet is the fallback for low-power scenes.
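
A minimal sketch of what per-camera threshold tuning can look like once detections leave the model. The camera IDs, class labels, and threshold values here are hypothetical, and the `Detection` type is an illustrative stand-in, not the output format of any particular detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    camera_id: str
    label: str
    confidence: float


# Hypothetical values: a dim loading dock needs a stricter cut-off than a
# well-lit lobby to keep false positives down.
DEFAULT_THRESHOLD = 0.50
THRESHOLDS = {
    ("dock-cam-3", "person"): 0.65,
    ("lobby-cam-1", "person"): 0.40,
}


def keep(det: Detection) -> bool:
    """Accept a detection only if it clears the threshold configured for
    its (camera, class) pair, falling back to the default."""
    threshold = THRESHOLDS.get((det.camera_id, det.label), DEFAULT_THRESHOLD)
    return det.confidence >= threshold


detections = [
    Detection("lobby-cam-1", "person", 0.45),   # kept: lobby threshold is 0.40
    Detection("dock-cam-3", "person", 0.45),    # dropped: dock threshold is 0.65
]
accepted = [d for d in detections if keep(d)]
```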

Bounding boxes + class labels • budget < 60ms

04

Tracking + recognition

DeepSORT or ByteTrack maintains object identity across frames so the system reasons about trajectories, not just appearances. ArcFace / InsightFace handle face recognition when authorized; ANPR/LPR pipelines decode license plates. Faces and plates can each be matched against allow-lists or watch-lists, with every match written to an audit log.
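
To make "maintaining identity across frames" concrete, here is a deliberately simplified tracker that carries IDs forward by greedy bounding-box overlap (IoU). It is not DeepSORT or ByteTrack: those add motion prediction, appearance features, and handling for briefly lost tracks. The class name `IouTracker` and the 0.3 overlap threshold are assumptions for illustration.

```python
Box = tuple[float, float, float, float]   # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy IoU association. Unmatched tracks are dropped immediately,
    which a production tracker would not do."""

    def __init__(self, min_iou: float = 0.3):
        self.min_iou = min_iou
        self.tracks: dict[int, Box] = {}
        self._next_id = 1                  # IDs are never reused

    def update(self, detections: list[Box]) -> dict[int, Box]:
        # Score every (track, detection) pair, best overlap first.
        pairs = sorted(
            ((iou(tbox, dbox), tid, di)
             for tid, tbox in self.tracks.items()
             for di, dbox in enumerate(detections)),
            reverse=True,
        )
        updated: dict[int, Box] = {}
        used: set[int] = set()
        for score, tid, di in pairs:
            if score < self.min_iou:
                break                      # remaining pairs overlap even less
            if tid in updated or di in used:
                continue
            updated[tid] = detections[di]
            used.add(di)
        # Any detection left over starts a new track.
        for di, dbox in enumerate(detections):
            if di not in used:
                updated[self._next_id] = dbox
                self._next_id += 1
        self.tracks = updated
        return updated


tracker = IouTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame2 = tracker.update([(51, 50, 61, 60), (1, 0, 11, 10)])  # same objects, reordered
```

In the second frame each object keeps the ID it was given in the first, even though the detector returned the boxes in a different order.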

Tracked identities + plates / faces • budget < 80ms

05

Alert + dashboard

Events fire to operator dashboards (web + mobile), PagerDuty/Opsgenie webhooks, VMS bridges (Milestone, Genetec, Avigilon), or your own SIEM. Operator video review tools use WebRTC for sub-second playback. Forensic search runs against a vector index of detected entities, so queries over historic footage return in seconds.
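
A sketch of the kind of event record that could be serialized for a webhook or SIEM sink. The field names and the `AlertEvent` type are hypothetical and do not follow the PagerDuty, Opsgenie, or any VMS schema; a real integration would map these fields onto the target's documented format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AlertEvent:
    event_type: str
    camera_id: str
    track_id: int
    confidence: float
    occurred_at: str   # ISO 8601 timestamp in UTC
    clip_ref: str      # pointer to the stored clip, never the video bytes


def to_webhook_body(event: AlertEvent) -> str:
    """Serialize the event as JSON with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


# Hypothetical example values.
event = AlertEvent(
    event_type="restricted_zone_intrusion",
    camera_id="dock-cam-3",
    track_id=42,
    confidence=0.93,
    occurred_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    clip_ref="s3://example-bucket/clips/42.mp4",
)
body = to_webhook_body(event)
```

Sending a clip reference instead of footage keeps the alert payload small and keeps raw video inside the perimeter until an authorized reviewer asks for it.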

Operator notified / action triggered • budget < 30ms

Total end-to-end budget: sub-200ms for live alerts; sub-second for operator-confirmed escalation. We benchmark each customer build against your scene density before sign-off — the budget is the contract.
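
The stage budgets above add up to 25 + 60 + 80 + 30 = 195ms, which is where the sub-200ms figure comes from. A minimal sketch of checking measured timings against those budgets follows; the stage keys and the `check_budget` helper are illustrative names, and the measured values are made up.

```python
# Per-stage ceilings in milliseconds, taken from the five stages above.
STAGE_BUDGETS_MS = {
    "decode_normalize": 25,
    "detection": 60,
    "tracking_recognition": 80,
    "alert_dispatch": 30,
}
TOTAL_BUDGET_MS = 200


def check_budget(measured_ms: dict[str, float]) -> list[str]:
    """Return the stages that met or exceeded their ceiling, plus 'total'
    if the end-to-end sum met or exceeded the overall ceiling."""
    over = [
        stage for stage, limit in STAGE_BUDGETS_MS.items()
        if measured_ms.get(stage, 0.0) >= limit
    ]
    if sum(measured_ms.values()) >= TOTAL_BUDGET_MS:
        over.append("total")
    return over


# Hypothetical benchmark run: tracking is slow, but the total still fits.
violations = check_budget({
    "decode_normalize": 20,
    "detection": 55,
    "tracking_recognition": 85,
    "alert_dispatch": 10,
})
```

Checking each stage separately, and not only the total, shows which stage to retune when a dense scene pushes the pipeline over.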

System architecture

Eight layers. Named tools at each one.

Every layer of the stack is a deliberate choice, not a default. The list below is what we deploy in production today — not a survey of options. When something on this list doesn't fit your environment, we name the alternative in the recommendation document, not in marketing.

Layer | Tools we deploy
Capture & transport | RTSP, ONVIF Profile S/T, WebRTC (mediasoup 3.16, LiveKit 1.x for low-latency feeds), SRT for unreliable links
Edge runtime | NVIDIA Jetson Orin Nano / Xavier NX, NVIDIA Triton Inference Server, TensorRT, OpenVINO for Intel edge boxes, ONNX Runtime as portable fallback
Object detection | YOLOv8, YOLOv9 (Ultralytics), EfficientDet, RT-DETR for transformer-based detection where accuracy matters more than fps
Tracking | DeepSORT, ByteTrack, OC-SORT, chosen per scene density and re-identification requirements
Recognition | ArcFace, InsightFace, FaceNet (face); ANPR / LPR pipelines with custom-trained plate datasets per jurisdiction
Analytics & reasoning | Behavior analytics (loitering, crowd dynamics, PPE compliance, abandoned object), multimodal LLM passes (Gemini, Claude, OpenAI) for second-pass enrichment on flagged events
Storage & recall | S3-compatible object store for video, PostgreSQL or ClickHouse for events, pgvector / Milvus / Qdrant for entity-level forensic search
Dashboards & alerts | React + WebRTC operator console, mobile (React Native or native iOS/Android), Milestone / Genetec / Avigilon VMS bridges, PagerDuty / Opsgenie / Slack webhooks

Compliance overlays — GDPR, CCPA, HIPAA, SOC 2 — are not a separate layer. They're enforced inside each layer: encryption at rest and in transit, role-based access control for video review, audit logs on face / plate matches, data residency pinned to region.
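
One way to picture role-based access with audit logging is the sketch below, where every access decision is recorded whether or not it was granted. The role names, permission strings, and in-memory `AuditLog` are assumptions for illustration; a deployed system would use its own identity provider and an append-only store.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role model: only some roles may see face-match results.
ROLE_PERMISSIONS = {
    "operator": {"view_live"},
    "investigator": {"view_live", "review_recorded", "view_face_match"},
    "admin": {"view_live", "review_recorded", "view_face_match", "export_clip"},
}


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, user: str, action: str, resource: str, allowed: bool) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })


def authorize(user: str, role: str, action: str, resource: str, log: AuditLog) -> bool:
    """Decide whether the role permits the action, and log the decision
    either way so denied attempts are also visible in an audit."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    log.record(user, action, resource, allowed)
    return allowed


log = AuditLog()
denied = authorize("kim", "operator", "view_face_match", "match-17", log)
granted = authorize("lee", "investigator", "view_face_match", "match-17", log)
```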

Use cases

Same pipeline. Different verdicts depending on what you're watching for.

A behavior detector for retail shrinkage isn't the same model as one for industrial PPE compliance, even if the underlying YOLO + DeepSORT stack is identical. The taxonomy, thresholds, and escalation rules are where custom development pays back. Here are the six most common shapes we build.

Retail loss prevention

Real-time shrinkage detection, anti-sweethearting at POS, weapon detection, and crowd density for queue management. MindBox runs this exact stack across 50+ retail sites — store managers see shrinkage flags on a phone dashboard before the customer reaches the door, with audit-grade clip retention.

Industrial safety & logistics

PPE compliance (hard hat, vest, safety glasses), forklift / pedestrian collision risk, restricted zone intrusion, abandoned object detection on conveyor lines. Latency budget tightens to sub-150ms when the alert needs to trigger a physical interlock or e-stop.
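
Restricted-zone intrusion often reduces to a geometric test: is the point where a tracked person stands inside a polygon drawn on the camera image? The sketch below uses the standard ray-casting method. The zone coordinates and the choice of the bounding box's bottom-centre as the "foot point" are assumptions for illustration.

```python
Point = tuple[float, float]


def in_zone(point: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: cast a ray to the right of the point and count
    how many polygon edges it crosses. An odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def foot_point(box: tuple[float, float, float, float]) -> Point:
    """Bottom-centre of an (x1, y1, x2, y2) box in image coordinates,
    a common proxy for where a person is standing."""
    x1, _, x2, y2 = box
    return ((x1 + x2) / 2, y2)


# Hypothetical square zone in pixel coordinates.
zone = [(0, 0), (10, 0), (10, 10), (0, 10)]
person_box = (2, 1, 6, 9)
intrusion = in_zone(foot_point(person_box), zone)
```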

Smart cities & transit

ANPR / LPR for tolling and parking, pedestrian counts for transit planning, abandoned vehicle detection, traffic incident classification. Models run at the camera edge to keep PII out of central systems — only event metadata and anonymized counts move to the cloud.

Healthcare facilities

Patient elopement detection, fall detection in long-term care, hand-hygiene compliance audits, restricted-area access for controlled substances. HIPAA-grade audit logs and on-prem inference are non-negotiable defaults — video stays inside the facility unless an authorized review triggers escalation.

Law enforcement & child advocacy

V.A.L.T — our flagship deployment — powers 770+ U.S. police departments with 2,500+ cameras serving forensic interview recording for child advocacy centers, medical education, and law enforcement evidence capture. Multi-camera sync, redaction workflows, and chain-of-custody logging are built in.

Custom — your event taxonomy

Most engagements start with a use case nobody packaged. The work is defining the event taxonomy, the scene constraints, and the false-positive budget your operators can absorb — then training and deploying models that hit it. Discovery call is the first hour.

Build vs Buy

Different ceiling. Different unit economics.

SaaS video analytics platforms — Eagle Eye Networks, Spot AI, Turing, Verkada — ship in days and work well within their template. Custom development takes longer to start and pays back the moment your event taxonomy, retention, residency, or unit economics stop matching that template. The decision is rarely “which is better” — it's “what's your three-year cost curve.”

Buy

Off-the-shelf VMS + analytics SaaS

Vendor-owned cloud, vendor-owned model roadmap, per-camera SaaS fees that scale linearly with deployment.

  • Live in 2-4 weeks with stock models
  • $15-40 / camera / month operational cost
  • Event taxonomy locked to vendor's roadmap
  • Data residency = wherever the vendor's cloud is

Use when: camera count is under 100, event taxonomy maps to vendor presets, and per-camera SaaS economics work for your business model.

Build

Custom AI computer vision — you own the stack

Models, infrastructure, and operator UI all under your control. Higher upfront effort; flat operational cost after build.

  • 8-16 weeks to MVP on a defined scope
  • $2-6 / camera / month amortized at scale
  • Event taxonomy is yours: add classes on your timeline
  • On-prem or VPC, with GDPR / CCPA / HIPAA enforceable

Use when: camera count is 200+, the event taxonomy doesn't fit a stock model, regulated data residency is required, or you need the IP on your balance sheet.

[Figure 1: two-axis chart with per-camera monthly cost on the Y axis and detection accuracy on the X axis. Generic VMS (motion-only) sits low-accuracy / low-cost; SaaS AI analytics (Eagle Eye, Spot, Verkada) sits high-accuracy at $15–40 / camera / month; the custom AI build sits high-accuracy at $2–6 / camera / month amortized. Numbers reflect 200+ camera fleets.]
Figure 1. Decision matrix — per-camera monthly cost against detection accuracy across three deployment patterns. SaaS analytics holds the accuracy ceiling but locks the unit economics; custom AI computer vision moves both axes simultaneously once fleet size crosses ~200 cameras.

Hybrid is a real option — keep an existing VMS (Milestone, Genetec) for recording, layer custom analytics on top via ONVIF / RTSP. We architect that bridge in roughly 30% of engagements.
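
The three-year cost curve in the Build vs Buy comparison can be approximated with simple arithmetic. The sketch below treats the build fee as a one-time cost and the per-camera figure as a separate running cost, which is an assumption: the page's "amortized" figure may already fold in part of the build. The inputs are illustrative picks from within the ranges quoted above, not a quote.

```python
import math
from typing import Optional


def three_year_cost(upfront: float, cameras: int, per_cam_month: float,
                    months: int = 36) -> float:
    """Total cost of ownership over the period: upfront plus running cost."""
    return upfront + cameras * per_cam_month * months


def break_even_months(build_cost: float, cameras: int,
                      saas_per_cam: float, custom_per_cam: float) -> Optional[int]:
    """First whole month at which cumulative SaaS spend reaches or exceeds
    build cost plus custom running cost. None if custom never catches up."""
    monthly_saving = cameras * (saas_per_cam - custom_per_cam)
    if monthly_saving <= 0:
        return None
    return math.ceil(build_cost / monthly_saving)


# Illustrative inputs: 200 cameras, $30K build, $25 SaaS vs $4 custom per camera.
saas_total = three_year_cost(0, 200, 25)          # 180000
custom_total = three_year_cost(30_000, 200, 4)    # 58800
months = break_even_months(30_000, 200, 25, 4)    # 8
```

Under these assumed inputs the build pays back within the first year. Running the same functions with a small fleet, or a SaaS price near the custom running cost, pushes the break-even out or removes it entirely, which is why the camera count matters so much to the decision.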

How we engage

Three ways in. One outcome — software that ships.

Engagement model is matched to where you are, not where we'd prefer you to be. The three shapes below cover roughly 90% of how Fora Soft enters a project.

From scratch

Build the platform end-to-end

Discovery → architecture → MVP → production. We take responsibility for the whole stack during the build and ship in 8–16 weeks on a defined scope; the code and IP are delivered to you. Best fit when there's no existing system or when the existing system is being decommissioned. V.A.L.T was built this way.

Discuss scope
Upgrades & improvements

Extend what's already running

Existing VMS plus custom analytics layer, new event classes added to a running model, latency rebudget on an architecture that's struggling at the camera count it's grown into. We integrate via ONVIF, RTSP, or vendor SDK without ripping out what works.

Discuss scope
Takeovers & fixes

Take the codebase off a stuck team

Inherited a system nobody fully understands? A previous vendor walked away mid-build? We've done the takeover dance enough times to make it boring: audit, stabilize, document, ship the next version. NDA before access; honest verdict on what's salvageable.

Discuss scope
Pricing

Three tiers. Named tech in each. No “contact sales” for the bracket.

The number you see is the bracket the build typically lands in. Final scope depends on camera count, model count, event taxonomy depth, integrations, and compliance overlays — we name the moving parts in the discovery call before you commit.

Startup
from $15K
8–10 weeks • single use case • up to ~50 cameras
  • YOLOv8 + DeepSORT on NVIDIA Jetson Orin Nano
  • One event taxonomy (e.g. retail shrinkage or PPE compliance)
  • Operator web dashboard + mobile alerts
  • S3 video archive + PostgreSQL event store
  • Slack / email / webhook alerts
Get an instant estimate

Growth (most common)
from $30K
12–16 weeks • multi-use-case • up to ~500 cameras
  • YOLOv8/v9 + DeepSORT/ByteTrack on Jetson Xavier NX or T4
  • 2–3 event taxonomies + ArcFace or ANPR add-on
  • VMS bridge (Milestone, Genetec, Avigilon) included
  • ClickHouse / pgvector for forensic search across history
  • Role-based operator console, audit logs, redaction tooling
  • PagerDuty / Opsgenie integration
Book a free 30-min call
Enterprise
from $50K
16–24 weeks+ • multi-site • 500–2,500+ cameras
  • Triton Inference Server clusters, TensorRT-optimized models
  • Multi-region edge fleet, on-prem or VPC deployment
  • Full compliance overlay: GDPR, CCPA, HIPAA, SOC 2
  • Custom-trained models per site or per camera class
  • Dedicated SRE handover, runbooks, on-call rotation
  • Reference deployment: V.A.L.T at 2,500+ cameras, 770+ U.S. police departments
Book a free 30-min call

Add-ons priced separately: custom model training cycles, on-prem hardware sizing, third-party SDK licenses (Genetec, Milestone, ArcFace commercial), regulatory certification audits. We itemize before contract.

Free for qualified projects

Three deliverables. Yours within a week.

An independent assessment of your build, written by engineers who would actually ship it. Pick the one that fits where you are now: planning the MVP, mid-build, or stabilizing what's already in production. NDA before any code, footage, or system access changes hands.

Why hire Fora Soft

Twenty years of building video + AI systems that actually run in production.

Not a generalist studio with a computer vision practice. Not a SaaS vendor pretending to do custom work. Fora Soft has been building real-time video, WebRTC, and AI systems since 2005 — and the surveillance, computer vision, and edge inference work below is the same team, the same stack, the same engineering bar.

20+ years

Production track record since 2005

625+ products shipped. Video + real-time systems is what we built the company on — long before “computer vision” became a category. We've watched the surveillance stack transition from analog DVR to IP, from on-prem GPU to edge Jetson, and we've shipped systems through every generation.

2,500+ cameras

V.A.L.T — the flagship deployment

770+ U.S. police departments, 50,000 daily users, child advocacy interview recording, medical education, and law enforcement evidence capture. Built end-to-end by Fora Soft — multi-camera sync, redaction workflows, chain-of-custody logging, the full operator console.

50+ sites

MindBox — retail computer vision at scale

AI retail analytics across 50+ store locations: shrinkage detection, queue dynamics, weapon detection, sweethearting at POS. Sub-200ms edge inference, mobile-first operator dashboards, audit-grade clip retention. The model and unit economics that beat the SaaS analytics vendors.

100% in-house

One team. Computer vision, mobile, infra, ops.

No outsourcing chain. The CV engineer who trains your model sits next to the iOS engineer who builds the operator app and the SRE who runs your Triton cluster. 100% Upwork Top-Rated Plus, 100% job success on enterprise engagements. NDA before any code access; honest verdict before any contract.

Common questions

What buyers ask before the discovery call.

How accurate is custom AI video surveillance compared to traditional CCTV motion detection?

In production deployments, our YOLOv8/v9 + DeepSORT pipelines hit 90–98% detection accuracy on the event classes they're tuned for, with sub-3% false positive rate on tuned scenes. Traditional motion-only CCTV typically generates one false alarm per 10–20 motion events. The accuracy gap is what makes operator-driven workflows realistic at hundreds or thousands of cameras.

What's the typical latency from camera capture to alert?

Sub-200ms end-to-end for live alerts when running on NVIDIA Jetson Orin Nano or Xavier NX at the camera edge: roughly 25ms decode, 60ms YOLOv8 inference, 80ms DeepSORT tracking + recognition, 30ms alert dispatch. Cloud-only architectures typically land at 800ms–2s due to upload + queue time, which is acceptable for forensic search but not for real-time intervention.

Do you build for on-prem, cloud, or hybrid deployments?

All three. On-prem (NVIDIA Jetson at the edge + on-site server) is the default for healthcare, law enforcement, and any environment with data residency or HIPAA requirements. VPC / cloud (AWS, GCP, Azure) suits multi-site retail and smart-city deployments. Hybrid — edge inference + cloud forensic search — is the most common shape for 200+ camera fleets.

Can you integrate with our existing VMS (Milestone, Genetec, Avigilon)?

Yes. We layer custom analytics on top of an existing VMS via ONVIF, RTSP, or vendor SDK in roughly 30% of engagements. You keep the recording infrastructure and operator workflow you've already trained on; we add the AI event layer on top of it. No rip-and-replace required.

Is AI video surveillance GDPR and CCPA compliant?

When architected correctly, yes. Compliance is enforced inside each layer of the stack: on-prem or VPC inference (no raw video leaving the perimeter), role-based access for operator review, audit logs on every face / plate match, data residency pinned to region, retention windows configurable per camera class. We sign Data Processing Agreements before any engagement and can support DPIA documentation as part of delivery.
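
A sketch of how retention windows per camera class might be expressed in code. The camera classes, day counts, and the `legal_hold` flag are hypothetical; actual retention periods are a legal and policy decision for each deployment.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows, in days, per camera class.
RETENTION_DAYS = {
    "public_entrance": 30,
    "pos_terminal": 90,
    "patient_area": 7,
}
DEFAULT_RETENTION_DAYS = 30


def is_expired(camera_class: str, recorded_at: datetime, now: datetime,
               legal_hold: bool = False) -> bool:
    """True if the clip is past its retention window and may be deleted.
    Clips under legal hold are never reported as expired."""
    if legal_hold:
        return False
    days = RETENTION_DAYS.get(camera_class, DEFAULT_RETENTION_DAYS)
    return now - recorded_at > timedelta(days=days)


# Example: a clip recorded 29 days before the check.
recorded = datetime(2024, 2, 1, tzinfo=timezone.utc)
checked = datetime(2024, 3, 1, tzinfo=timezone.utc)
patient_clip_expired = is_expired("patient_area", recorded, checked)   # 29 > 7 days
pos_clip_expired = is_expired("pos_terminal", recorded, checked)       # 29 < 90 days
```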

How does this compare to Eagle Eye Networks, Spot AI, Verkada, or Turing?

Those vendors ship a SaaS analytics platform with stock models and per-camera fees. They work well when your camera count is under 100 and your event taxonomy fits the preset library. Custom development wins on three axes: unit economics at 200+ cameras (flat cost vs $15–40/camera/month SaaS), event taxonomy beyond presets, and regulated data residency. The Build vs Buy section above lays out the decision frame.

Do you handle face recognition and ANPR / license plate recognition?

Yes. Face recognition uses ArcFace or InsightFace against authorized enrollment sets with audit logs on every match — we don't deploy unauthorized facial recognition. ANPR / LPR pipelines are custom-trained on plate datasets per jurisdiction (US, UK, EU, GCC plates differ enough to need separate training). Both can be enabled or disabled per camera, with clear audit posture.

How long does an MVP take to ship?

8–10 weeks for a Startup-tier scope (single event taxonomy, up to ~50 cameras, one operator surface). 12–16 weeks for a Growth-tier scope (multi-taxonomy, VMS bridge, forensic search). 16–24 weeks+ for Enterprise (multi-site edge fleet, full compliance overlay, custom-trained models). Discovery call to first running model is typically 3–4 weeks regardless of tier.

Who owns the IP — us or Fora Soft?

You do. Models, training data, infrastructure code, and operator UI are all delivered to your repositories under your name. Fora Soft retains no claim on the IP. The benefit of custom development over SaaS is exactly this: the system, the data, and the unit economics live on your balance sheet rather than the vendor's.

What's the engagement model after launch?

Three shapes: handover to your in-house team with runbooks and on-call training (most common at Enterprise tier); ongoing SRE / model-tuning retainer (typical at Growth tier when in-house ML isn't on the roadmap); or fixed-scope quarterly improvement cycles (additional event classes, new sites, integrations). All three are scoped after the initial build, not bundled.

Have an idea?

Tell us about your computer vision idea.

Within 48 hours you'll get a realistic estimate, a technical recommendation, and an outline of next steps. No obligation. NDA before any access to your code, recordings, or operator dashboards.

+1 (914) 775-5855
New York · USA
Specialist software house for video, real-time and AI products. Founded 2005.
50 in-house engineers.