You own the models, the infrastructure, and the operator UI from day one. Built on YOLOv8/v9, DeepSORT, and NVIDIA Jetson: 90–98% detection accuracy, sub-200ms response, deployed on-prem or in your VPC. The same stack now runs at V.A.L.T (2,500+ cameras across 770+ U.S. police departments) and MindBox (50+ retail sites). No per-camera SaaS fees, no vendor lock-in.
A real-time computer vision pipeline isn't one model — it's an inference graph with a latency budget at every stage. Miss any single budget and the system slips from prevention to after-the-fact review, which is the difference between stopping the event and reading about it later.
IP cameras stream H.265 / H.264 over RTSP, ONVIF Profile S/T, or WebRTC for low-latency feeds. Resolution and framerate are tuned per use case — 4MP at 12-15fps is the typical sweet spot for analytics, with 4K reserved for forensic recording.
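As a rough illustration of tuning framerate per use case, the sketch below thins a camera stream down to an analytics rate. It assumes a camera that emits 30fps and an analytics target of 12 or 15fps; `select_frames` is a hypothetical helper, and in many deployments the same effect comes from configuring a lower-rate sub-stream on the camera itself.

```python
def select_frames(source_fps: int, target_fps: int, n_frames: int) -> list[int]:
    """Return indices of frames to keep when thinning source_fps to about target_fps.

    Uses an integer accumulator so non-integer ratios (e.g. 30 -> 12)
    are spread evenly without floating-point drift.
    """
    if target_fps >= source_fps:
        return list(range(n_frames))
    kept = []
    acc = 0
    for i in range(n_frames):
        acc += target_fps
        if acc >= source_fps:
            kept.append(i)       # this frame goes to the detector
            acc -= source_fps
    return kept


# One second of a 30fps stream thinned to 12fps keeps 12 frames.
one_second = select_frames(30, 12, 30)
```

Frames that are not selected can still be recorded at full rate; only the analytics path is thinned.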
NVIDIA Jetson Orin Nano / Xavier NX runs Triton Inference Server with TensorRT-optimized models per camera or per zone. We favor edge over cloud deployment for sub-200ms response and privacy compliance: raw video never leaves the perimeter unless an event triggers escalation.
YOLOv8 or YOLOv9 (we pick by accuracy/latency target) runs object detection — person, vehicle, weapon, package, PPE state, license plate. Class taxonomy is yours, not a generic model's. Confidence thresholds are tuned per camera angle and lighting. EfficientDet is the fallback for low-power scenes.
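Per-camera threshold tuning can be pictured as a lookup keyed by camera and class. This is a minimal sketch, not the production configuration: the camera IDs, class names, and threshold values are all hypothetical, and the detections are assumed to arrive from the detector as plain records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    camera_id: str
    label: str
    confidence: float


# Hypothetical values: (camera_id, label) -> minimum confidence to keep.
DEFAULT_THRESHOLD = 0.50
THRESHOLDS = {
    ("dock-03", "person"): 0.65,   # backlit scene, noisier detections
    ("lobby-01", "person"): 0.40,  # well-lit, head-on view
    ("lobby-01", "weapon"): 0.80,  # costly alert, favor precision
}


def min_confidence(camera_id: str, label: str) -> float:
    """Look up the tuned threshold, falling back to the default."""
    return THRESHOLDS.get((camera_id, label), DEFAULT_THRESHOLD)


def filter_detections(detections: list[Detection]) -> list[Detection]:
    """Keep only detections that clear the threshold for their camera and class."""
    return [
        d for d in detections
        if d.confidence >= min_confidence(d.camera_id, d.label)
    ]
```

Keeping thresholds in data rather than in model code means a camera can be retuned after a lighting change without redeploying the model.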
DeepSORT or ByteTrack maintains object identity across frames so the system reasons about trajectories — not just appearances. ArcFace / InsightFace handle face recognition when authorized; ANPR/LPR pipelines decode license plates. Both can be enrolled against allow-lists or watch-lists with audit logs.
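To show what "maintaining identity across frames" means, here is a deliberately simplified greedy IoU tracker. It is not DeepSORT or ByteTrack (both add motion models, and DeepSORT adds appearance embeddings); the class name, the `min_iou` default, and the choice to drop unmatched tracks immediately are assumptions made to keep the sketch short.

```python
from itertools import count


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy frame-to-frame association by box overlap (illustrative only)."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou
        self.tracks = {}          # track_id -> last seen box
        self._ids = count(1)

    def update(self, boxes):
        """Assign a track id to each box; returns [(track_id, box), ...]."""
        pairs = sorted(
            ((iou(tb, b), tid, i)
             for tid, tb in self.tracks.items()
             for i, b in enumerate(boxes)),
            reverse=True,
        )
        assigned, used = {}, set()
        for score, tid, i in pairs:
            if score < self.min_iou:
                break             # remaining pairs overlap too little
            if tid in used or i in assigned:
                continue
            assigned[i] = tid
            used.add(tid)
        for i in range(len(boxes)):
            if i not in assigned:
                assigned[i] = next(self._ids)   # new object enters the scene
        # Simplification: tracks with no match this frame are dropped at once.
        self.tracks = {assigned[i]: boxes[i] for i in range(len(boxes))}
        return [(assigned[i], boxes[i]) for i in range(len(boxes))]
```

With stable IDs in place, downstream rules can reason about a trajectory (dwell time, direction, zone crossings) instead of isolated detections.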
Events fire to operator dashboards (web + mobile), PagerDuty/Opsgenie webhooks, VMS bridges (Milestone, Genetec, Avigilon), or your own SIEM. Operator video review tools use WebRTC for sub-second playback. Forensic search runs against a vector index of detected entities, so queries over historic footage return in seconds.
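Event fan-out to several sinks can be sketched as a routing table plus a serialized payload. The event schema, sink names, and routing rules below are hypothetical, and the sketch stops short of the actual HTTP or SDK calls, since each integration (paging service, VMS, SIEM) defines its own payload format.

```python
import json
from datetime import datetime, timezone

# Hypothetical routing table: event type -> sinks to notify.
ROUTES = {
    "weapon": ["dashboard", "pager", "vms"],
    "zone_intrusion": ["dashboard", "vms"],
}
DEFAULT_SINKS = ["dashboard"]


def build_event(event_type, camera_id, track_id, confidence, ts=None):
    """Assemble an event record (illustrative schema, not a vendor format)."""
    ts = ts or datetime.now(timezone.utc)
    return {
        "type": event_type,
        "camera_id": camera_id,
        "track_id": track_id,
        "confidence": round(confidence, 3),
        "timestamp": ts.isoformat(),
    }


def route(event):
    """Return (sink, json_body) pairs; a real dispatcher would send each one."""
    body = json.dumps(event, sort_keys=True)
    return [(sink, body) for sink in ROUTES.get(event["type"], DEFAULT_SINKS)]
```

Separating routing from delivery keeps the escalation rules reviewable on their own, which matters when different event classes page different people.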
Total end-to-end budget: sub-200ms for live alerts; sub-second for operator-confirmed escalation. We benchmark every build against your scene density before sign-off: the budget is the contract.
Every layer of the stack is a deliberate choice, not a default. The list below is what we deploy in production today — not a survey of options. When something on this list doesn't fit your environment, we name the alternative in the recommendation document, not in marketing.
Compliance overlays — GDPR, CCPA, HIPAA, SOC 2 — are not a separate layer. They're enforced inside each layer: encryption at rest and in transit, role-based access control for video review, audit logs on face / plate matches, data residency pinned to region.
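The idea of enforcing access control and audit logging inside a layer, rather than beside it, can be sketched as a single authorization function that always writes a log entry. Role names, actions, and the in-memory log are assumptions for illustration; a real deployment would use append-only, tamper-evident storage.

```python
from datetime import datetime, timezone

# Hypothetical role -> permitted actions mapping.
PERMISSIONS = {
    "operator": {"view_live"},
    "investigator": {"view_live", "view_recorded", "view_face_match"},
    "admin": {"view_live", "view_recorded", "view_face_match", "export_clip"},
}

AUDIT_LOG = []  # stand-in for durable, append-only audit storage


def authorize(user, role, action, resource):
    """Decide whether the action is allowed and record the attempt either way."""
    allowed = action in PERMISSIONS.get(role, set())
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    })
    return allowed
```

Logging denied attempts as well as granted ones is the design choice that makes the log useful in an audit.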
A behavior detector for retail shrinkage isn't the same model as one for industrial PPE compliance, even if the underlying YOLO + DeepSORT stack is identical. The taxonomy, thresholds, and escalation rules are where custom development pays back. Here are the six most common shapes we build.
Real-time shrinkage detection, anti-sweethearting at POS, weapon detection, and crowd density for queue management. MindBox runs this exact stack across 50+ retail sites — store managers see shrinkage flags on a phone dashboard before the customer reaches the door, with audit-grade clip retention.
PPE compliance (hard hat, vest, safety glasses), forklift / pedestrian collision risk, restricted zone intrusion, abandoned object detection on conveyor lines. Latency budget tightens to sub-150ms when the alert needs to trigger a physical interlock or e-stop.
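Restricted zone intrusion usually reduces to a geometry test on each tracked object. The sketch below uses the standard ray-casting point-in-polygon test on the bottom-center of a bounding box; using the foot point as the person's ground position and defining the zone in image pixel coordinates are both assumptions.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def foot_point(box):
    """Bottom-center of an (x1, y1, x2, y2) box, a proxy for where a person stands."""
    x1, _, x2, y2 = box
    return ((x1 + x2) / 2, y2)


def intrudes(box, zone):
    """True when the object's foot point lies inside the restricted zone."""
    return point_in_polygon(foot_point(box), zone)


# Hypothetical rectangular zone in pixel coordinates.
ZONE = [(100, 100), (300, 100), (300, 300), (100, 300)]
```

A test this cheap adds negligible time per object, which helps when the alert has to trigger an interlock inside a tight latency budget.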
ANPR / LPR for tolling and parking, pedestrian counts for transit planning, abandoned vehicle detection, traffic incident classification. Models run at the camera edge to keep PII out of central systems — only event metadata and anonymized counts move to the cloud.
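Keeping PII at the edge can be sketched as an allow-list applied before anything is uploaded, plus aggregation into counts. The field names and the hourly bucketing are hypothetical; the point is that plate text and image crops never appear in what leaves the device.

```python
from collections import Counter

# Hypothetical allow-list of fields permitted to leave the edge device.
CLOUD_SAFE_FIELDS = {"type", "camera_id", "timestamp", "confidence"}


def to_cloud_metadata(event):
    """Drop every field not on the allow-list (plate text, crops, embeddings)."""
    return {k: v for k, v in event.items() if k in CLOUD_SAFE_FIELDS}


def hourly_counts(events):
    """Count events per (camera_id, type, hour).

    Assumes ISO 8601 timestamps, so the first 13 characters
    ("YYYY-MM-DDTHH") identify the hour.
    """
    return Counter(
        (e["camera_id"], e["type"], e["timestamp"][:13]) for e in events
    )
```

An allow-list fails safe: a new field added upstream stays on the device until someone decides it may leave.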
Patient elopement detection, fall detection in long-term care, hand-hygiene compliance audits, restricted-area access for controlled substances. HIPAA-grade audit logs and on-prem inference are non-negotiable defaults — video stays inside the facility unless an authorized review triggers escalation.
V.A.L.T, our flagship deployment, runs 2,500+ cameras across 770+ U.S. police departments and serves forensic interview recording for child advocacy centers, medical education, and law enforcement evidence capture. Multi-camera sync, redaction workflows, and chain-of-custody logging are built in.
Most engagements start with a use case nobody packaged. The work is defining the event taxonomy, the scene constraints, and the false-positive budget your operators can absorb, then training and deploying models that hit it. The discovery call is the first hour.
SaaS video analytics platforms — Eagle Eye Networks, Spot AI, Turing, Verkada — ship in days and work well within their template. Custom development takes longer to start and pays back the moment your event taxonomy, retention, residency, or unit economics stop matching that template. The decision is rarely “which is better” — it's “what's your three-year cost curve.”
Vendor-owned cloud, vendor-owned model roadmap, per-camera SaaS fees that scale linearly with deployment.
Models, infrastructure, and operator UI all under your control. Higher upfront effort; flat operational cost after build.
Hybrid is a real option — keep an existing VMS (Milestone, Genetec) for recording, layer custom analytics on top via ONVIF / RTSP. We architect that bridge in roughly 30% of engagements.
Engagement model is matched to where you are, not where we'd prefer you to be. The three shapes below cover roughly 90% of how Fora Soft enters a project.
Discovery → architecture → MVP → production. We own the stack and ship in 8–16 weeks on a defined scope. Best fit when there's no existing system or when the existing system is being decommissioned. V.A.L.T was built this way.
Existing VMS plus custom analytics layer, new event classes added to a running model, latency rebudget on an architecture that's struggling at the camera count it's grown into. We integrate via ONVIF, RTSP, or vendor SDK without ripping out what works.
Inherited a system nobody fully understands? A previous vendor walked away mid-build? We've done the takeover dance enough times to make it boring: audit, stabilize, document, ship the next version. NDA before access; honest verdict on what's salvageable.
The number you see is the bracket the build typically lands in. Final scope depends on camera count, model count, event taxonomy depth, integrations, and compliance overlays; we name the moving parts in the discovery call before you commit.
Add-ons priced separately: custom model training cycles, on-prem hardware sizing, third-party SDK licenses (Genetec, Milestone, ArcFace commercial), regulatory certification audits. We itemize before contract.
An independent assessment of your build, written by engineers who would actually ship it. Pick the one that fits where you are now: planning the MVP, mid-build, or stabilizing what's already in production. NDA before any code, footage, or system access changes hands.
Competitor analysis, core feature definition, monetization modeling, and a full launch blueprint — delivered within a week. Written by engineers who'll build what they plan.
An independent review of your system's technology choices, structural components, and workload fit — with a plain verdict on what's working, what's a liability, and exactly what to change to reach your goal. Delivered within a week.
A full audit of your code with every issue documented, evidenced, and located — exact file, exact line. Plus a system architecture review and a prioritized fix roadmap. Not a consultant's opinion. A case file. Delivered within a week.
Not a generalist studio with a computer vision practice. Not a SaaS vendor pretending to do custom work. Fora Soft has been building real-time video, WebRTC, and AI systems since 2005 — and the surveillance, computer vision, and edge inference work below is the same team, the same stack, the same engineering bar.
625+ products shipped. Video + real-time systems is what we built the company on — long before “computer vision” became a category. We've watched the surveillance stack transition from analog DVR to IP, from on-prem GPU to edge Jetson, and we've shipped systems through every generation.
770+ U.S. police departments, 50,000 daily users, child advocacy interview recording, medical education, and law enforcement evidence capture. Built end-to-end by Fora Soft — multi-camera sync, redaction workflows, chain-of-custody logging, the full operator console.
AI retail analytics across 50+ store locations: shrinkage detection, queue dynamics, weapon detection, sweethearting at POS. Sub-200ms edge inference, mobile-first operator dashboards, audit-grade clip retention. The model and unit economics that beat the SaaS analytics vendors.
No outsourcing chain. The CV engineer who trains your model sits next to the iOS engineer who builds the operator app and the SRE who runs your Triton cluster. 100% Upwork Top-Rated Plus, 100% job success on enterprise engagements. NDA before any code access; honest verdict before any contract.
In production deployments, our YOLOv8/v9 + DeepSORT pipelines hit 90–98% detection accuracy on the event classes they're tuned for, with sub-3% false positive rate on tuned scenes. Traditional motion-only CCTV typically generates one false alarm per 10–20 motion events. The accuracy gap is what makes operator-driven workflows realistic at hundreds or thousands of cameras.
Sub-200ms end-to-end for live alerts when running on NVIDIA Jetson Orin Nano or Xavier NX at the camera edge: roughly 25ms decode, 60ms YOLOv8 inference, 80ms DeepSORT tracking + recognition, 30ms alert dispatch. Cloud-only architectures typically land at 800ms–2s due to upload + queue time, which is acceptable for forensic search but not for real-time intervention.
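The stage figures above sum to 25 + 60 + 80 + 30 = 195ms, which is how the pipeline stays under 200ms. A budget check along these lines can run against measured timings; the stage keys and the `check_budget` helper are hypothetical names, and the stage values are the approximate figures quoted above.

```python
# Approximate per-stage budgets from the breakdown above, in milliseconds.
STAGE_BUDGET_MS = {
    "decode": 25,
    "detect": 60,
    "track_recognize": 80,
    "dispatch": 30,
}
END_TO_END_LIMIT_MS = 200


def check_budget(measured_ms, budget=STAGE_BUDGET_MS, limit_ms=END_TO_END_LIMIT_MS):
    """Compare measured stage timings with the budget.

    Returns the end-to-end total, the stages that exceeded their own
    budget, and whether the total stays strictly under the limit.
    """
    over = [stage for stage in budget if measured_ms.get(stage, 0) > budget[stage]]
    total = sum(measured_ms.values())
    return {"total_ms": total, "over_budget": over, "within_limit": total < limit_ms}


# A detector that takes 75ms instead of 60ms pushes the total to 210ms.
slow_detect = check_budget({"decode": 25, "detect": 75, "track_recognize": 80, "dispatch": 30})
```

Reporting the offending stage, not just the total, shows where to spend optimization effort when a denser scene slows one step.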
All three. On-prem (NVIDIA Jetson at the edge + on-site server) is the default for healthcare, law enforcement, and any environment with data residency or HIPAA requirements. VPC / cloud (AWS, GCP, Azure) suits multi-site retail and smart-city deployments. Hybrid — edge inference + cloud forensic search — is the most common shape for 200+ camera fleets.
Yes. We layer custom analytics on top of an existing VMS via ONVIF, RTSP, or vendor SDK in roughly 30% of engagements. You keep the recording infrastructure and operator workflow you've already trained on; we add the AI event layer on top. No rip-and-replace required.
When architected correctly, yes. Compliance is enforced inside each layer of the stack: on-prem or VPC inference (no raw video leaving the perimeter), role-based access for operator review, audit logs on every face / plate match, data residency pinned to region, retention windows configurable per camera class. We sign Data Processing Agreements before any engagement and can support DPIA documentation as part of delivery.
Those vendors ship a SaaS analytics platform with stock models and per-camera fees. They work well when your camera count is under 100 and your event taxonomy fits the preset library. Custom development wins on three axes: unit economics at 200+ cameras (flat cost vs $15–40/camera/month SaaS), event taxonomy beyond presets, and regulated data residency. The Build vs Buy section above lays out the decision frame.
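The unit-economics argument is simple arithmetic. At 200 cameras, the quoted $15–40 per camera per month comes to $108,000–$288,000 over 36 months. The sketch below compares that with a flat-cost build; the build cost and monthly operating cost are placeholder inputs chosen only to make the example run, not quotes or typical prices.

```python
import math


def saas_cost(cameras, fee_per_camera_month, months):
    """Total SaaS spend: scales linearly with cameras and time."""
    return cameras * fee_per_camera_month * months


def custom_cost(build_cost, monthly_ops, months):
    """Total custom spend: one-off build plus flat monthly operations."""
    return build_cost + monthly_ops * months


def break_even_month(cameras, fee_per_camera_month, build_cost, monthly_ops):
    """First month in which cumulative custom cost is no higher than SaaS.

    Returns None when SaaS fees never exceed the custom running cost.
    """
    monthly_saving = cameras * fee_per_camera_month - monthly_ops
    if monthly_saving <= 0:
        return None
    return math.ceil(build_cost / monthly_saving)


# Placeholder inputs (assumptions, not pricing): $150,000 build, $2,000/month ops.
low = saas_cost(200, 15, 36)
high = saas_cost(200, 40, 36)
months_to_break_even = break_even_month(200, 40, 150_000, 2_000)
```

With the same placeholder inputs, a 50-camera site at $15 per camera never breaks even, consistent with SaaS being the better fit at small camera counts.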
Yes. Face recognition uses ArcFace or InsightFace against authorized enrollment sets with audit logs on every match — we don't deploy unauthorized facial recognition. ANPR / LPR pipelines are custom-trained on plate datasets per jurisdiction (US, UK, EU, GCC plates differ enough to need separate training). Both can be enabled or disabled per camera, with clear audit posture.
8–10 weeks for a Startup-tier scope (single event taxonomy, up to ~50 cameras, one operator surface). 12–16 weeks for a Growth-tier scope (multi-taxonomy, VMS bridge, forensic search). 16–24+ weeks for Enterprise (multi-site edge fleet, full compliance overlay, custom-trained models). Discovery call to first running model is typically 3–4 weeks regardless of tier.
You do. Models, training data, infrastructure code, and operator UI are all delivered to your repositories under your name. Fora Soft retains no claim on the IP. The benefit of custom development over SaaS is exactly this: the system, the data, and the unit economics live on your balance sheet rather than the vendor's.
Three shapes: handover to your in-house team with runbooks and on-call training (most common at Enterprise tier); ongoing SRE / model-tuning retainer (typical at Growth tier when in-house ML isn't on the roadmap); or fixed-scope quarterly improvement cycles (additional event classes, new sites, integrations). All three are scoped after the initial build, not bundled.
Within 48 hours you'll get a realistic estimate, a technical recommendation, and an outline of next steps. No obligation. NDA before any access to your code, recordings, or operator dashboards.