Cloud Video Platform Development for AI-Powered Retail Security in 2026


Key takeaways

• Retail shrink hit $121.6B in 2024 and shoplifting incidents are up 93% since 2019. Cloud video platforms with AI loss prevention now pay back in 90–180 days, not years.

• Hybrid edge–cloud is the only architecture worth shipping in 2026. Edge inference (<50 ms) catches scan-avoidance live; cloud holds the 90-day evidence vault, runs ORC pattern search, and retrains the models.

• Build-vs-buy is mostly a math problem. Below ~75 stores a managed VSaaS (Solink, Rhombus, Verkada, Spot AI) wins on TCO; above ~150 stores or with a unique POS / loss-prevention workflow, custom development pays for itself within three years.

• Skip naked facial recognition. BIPA ($1K–$5K per record), CCPA ($7,500 per violation) and GDPR (up to €20M or 4% of global annual turnover, whichever is higher) make pose-, gait- and behavior-based detection the lower-risk path. Facial match is only worth it for known-offender lists with airtight consent.

• What we’d build for you. A YOLOv11 + RTSP / MediaMTX edge pipeline, Kinesis or self-hosted ingest, S3 / R2 evidence vault, POS exception correlation and a React loss-prevention console — benchmarked against our V.A.L.T. surveillance platform that already serves 700+ organizations and 25,000 daily users.

Why Fora Soft wrote this playbook

Fora Soft has spent 21 years building real-time video and computer-vision products. Our flagship surveillance project, V.A.L.T., runs as a SaaS video-management platform across more than 700 organizations — US police departments, courtrooms, child-advocacy centers, medical schools — ingesting live RTSP from 2,500+ IP cameras and serving 25,000 active users a day. That is the operational scale at which "cloud video platform" stops being a marketing phrase and becomes a daily fight against bandwidth bills, false-positive fatigue and discovery requests.

For retail specifically, we’ve shipped object-recognition camera solutions, retail video analytics dashboards and POS-correlated alert engines on top of YOLO, OpenCV, FFmpeg, MediaMTX and the AWS / Azure media stacks. We use Agent Engineering — multi-agent code generation paired with senior review — which compresses our discovery and prototyping cycles by roughly 30–40% versus a hand-coded baseline. That is why a 20-store pilot with us typically lands in 10–14 weeks rather than the 16–20 weeks vendors quote for the same custom scope.

This guide is the playbook we hand to retail security and engineering leaders before they sign anything — with us or with anyone else. It assumes you know what shrink, ORC and POS exceptions are, and you want to leave the page with a defensible architecture, an honest cost model, and a list of the five questions that decide build vs buy.

Already losing more than $30K a month to shrink?

Book a 30-minute architecture review. We’ll size a hybrid edge–cloud pipeline against your store count and tell you honestly whether to build, buy, or stage both.

Book a 30-min call →

What a cloud video platform actually is in retail

A cloud video platform — the more honest term is Video Surveillance as a Service (VSaaS) — is the layer that ingests RTSP from your IP cameras, holds the recordings in object storage with a defined retention policy, runs detection / classification models against the streams, and exposes a web or mobile console for loss prevention, store ops and legal discovery. The cloud part means there is no on-prem NVR or DVR you need to babysit; the AI part means you stop staffing 24/7 monitor rooms whose effective attention span is roughly 22 minutes.

Three things separate retail-grade VSaaS from generic IP-camera SaaS:

1. POS / SCO correlation. Every alert is joined against the point-of-sale stream so you can prove a "no-sale" or "post-void" coincided with concealment behavior on camera, in the same lane, in the same second.

2. Behavior models, not just object detection. Generic SaaS will tell you a person walked into frame. Retail-grade tooling tells you when someone is concealing, scan-avoiding, or matches a previously flagged ORC pattern across stores.

3. Evidence-grade storage. Hash-chained clips, 7- to 365-day tiered retention, audit logs and signed export bundles that your legal team can hand to a prosecutor without inviting a chain-of-custody challenge.
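Hash chaining is what makes the third point defensible. The sketch below is a minimal illustration under two assumptions: each exported clip is available as bytes, and a fixed genesis value anchors the chain. Every link hashes the previous link together with the clip's ID and SHA-256 digest. Editing, dropping or reordering a clip therefore breaks verification from that point on. The field names and the `GENESIS` value are hypothetical and do not follow any vendor's export format.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # hypothetical fixed anchor for the first link


@dataclass
class ChainEntry:
    clip_id: str
    clip_sha256: str  # digest of the clip bytes
    link: str         # digest of the previous link plus this entry's metadata


def _link(prev_link: str, clip_id: str, clip_sha256: str) -> str:
    payload = json.dumps(
        {"prev": prev_link, "clip_id": clip_id, "sha256": clip_sha256}, sort_keys=True
    ).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(clips: list[tuple[str, bytes]]) -> list[ChainEntry]:
    """Chain (clip_id, clip_bytes) pairs in export order."""
    chain, prev = [], GENESIS
    for clip_id, data in clips:
        digest = hashlib.sha256(data).hexdigest()
        prev = _link(prev, clip_id, digest)
        chain.append(ChainEntry(clip_id, digest, prev))
    return chain


def verify_chain(chain: list[ChainEntry], clips: dict[str, bytes]) -> bool:
    """Recompute every digest and link; an edited, missing or reordered clip fails."""
    prev = GENESIS
    for entry in chain:
        data = clips.get(entry.clip_id)
        if data is None or hashlib.sha256(data).hexdigest() != entry.clip_sha256:
            return False
        prev = _link(prev, entry.clip_id, entry.clip_sha256)
        if prev != entry.link:
            return False
    return True


clips = [("lane4-0001", b"clip-a"), ("lane4-0002", b"clip-b")]
chain = build_chain(clips)
print(verify_chain(chain, dict(clips)))  # True
```

A chain on its own cannot show that trailing entries were truncated. In a real export bundle you would also sign or externally timestamp the final link.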

Market snapshot: shrink, VSaaS, and AI in retail

The market context decides which architecture is worth your money. Five numbers do most of the work.

Indicator	2024 / 2025 reading	Why it matters for build / buy
North-American retail shrink	$121.6B (NRF, 2024)	Bigger than most retailers’ net income; justifies a six-figure platform spend.
Shoplifting incidents per store	+93% vs 2019; +24% in H1 2024	Manual monitoring no longer scales; AI triage is mandatory.
Self-checkout fraud (NA)	$5–8B / yr; 1 in 3 incidents	Drives ROI of POS-correlated alerts at SCO lanes specifically.
Global VSaaS market	$3.6B–$11.8B in 2024; 16–27% CAGR	Healthy vendor pool — you can buy with reasonable continuity risk.
AI in retail	$10B (2023) → ~$55B by 2033, 18.6% CAGR	Loss prevention is funded out of digital-transformation budgets, not just security.

Read the table as: the problem is large enough to fund a real platform, the market is mature enough to buy if you want, and the spend now sits inside CIO / CTO budgets, not just the loss-prevention (LP) team's discretionary spend.

Reference architecture: hybrid edge–cloud, end to end

Every successful retail VSaaS we’ve seen converges on the same shape. Cameras send RTSP to an in-store gateway. The gateway runs first-pass inference and POS correlation locally, then forwards events — not raw streams — to a cloud control plane that handles long-term storage, cross-store analytics and the operator UI.

[ Store ]
  IP cameras  —RTSP—>  Edge gateway (Jetson / mini-PC)
                                |
                                |--> YOLOv11 + behavior model
                                |--> POS / SCO event bus
                                |--> Local SQLite event queue
                                |
                                v
                        [ Hybrid uplink ]
                                |
                                v
[ Cloud control plane ]
  · MediaMTX / SRS for clip relay
  · Kinesis Video Streams or self-hosted ingest
  · S3 / R2 evidence vault (90–365 d, hot → cold)
  · TimescaleDB or OpenSearch for events
  · Re-inference + ORC pattern matching
  · React / Next.js LP console + mobile alerts

Figure 1. Reference hybrid edge–cloud architecture for a multi-store retail VSaaS.

Three things make or break this architecture in production:

1. Edge gateways must hold a queue. When the store WAN drops, and it will, events buffer locally and re-sync once the link returns. Without that, the day a leased line goes down is the day you lose evidence.

2. The cloud only sees what matters. Streaming raw 1080p from 200 cameras 24/7 to S3 will quietly rack up $25–40K/month in egress and storage. Push event clips and metadata, not full feeds, unless an investigation flag asks for the full take.

3. POS correlation is a first-class citizen, not an integration. The POS event bus has to land at the same gateway as the video pipeline so a "no-sale" and a "concealment" event can be joined in <1 second.
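The queue in point 1 can be as simple as an outbox table on the gateway's local disk. Below is a minimal sketch using Python's built-in `sqlite3`. It assumes a hypothetical `send` callback that returns `True` once the cloud control plane acknowledges an event. Events are written before any upload attempt and flushed in order. Whenever the uplink fails, the remaining events stay in place for the next flush.

```python
import json
import sqlite3
from typing import Callable


class EventQueue:
    """Minimal store-and-forward outbox on a gateway-local SQLite file (sketch)."""

    def __init__(self, path: str = "events.db") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL,"
            " sent INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def enqueue(self, event: dict) -> None:
        # Persist first: the uplink may be down right now.
        with self.db:  # commits on success
            self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))

    def flush(self, send: Callable[[dict], bool], batch: int = 100) -> int:
        """Forward unsent events in insertion order; stop at the first failure."""
        rows = self.db.execute(
            "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id LIMIT ?", (batch,)
        ).fetchall()
        delivered = 0
        for row_id, payload in rows:
            if not send(json.loads(payload)):
                break  # WAN still down: keep the rest for the next flush
            with self.db:
                self.db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
            delivered += 1
        return delivered

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox WHERE sent = 0").fetchone()[0]
```

Delivery here is at-least-once: a crash between the send and the `UPDATE` re-sends that event. The cloud side should therefore deduplicate on an event ID.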

Reach for hybrid edge–cloud when: you have more than five stores, average uplinks under 100 Mbps, or any store in a region where you cannot legally export raw video (EU, certain APAC). Below that threshold, cloud-only is fine for a pilot.

Streaming protocols: RTSP in, HLS / WebRTC out

Every retail platform we’ve built ends up using two or three protocols at different layers. Pick wrong and you either burn money on bandwidth or force operators into a 6-second-lagged dashboard during an active incident.

Protocol	Latency	Where to use it	Watch out for
RTSP / RTP	<100 ms	Camera → gateway ingest	NAT / firewall traversal; use ONVIF for discovery.
HLS	2–12 s	Multi-store dashboards over CDN	Too laggy for real-time SCO interventions.
LL-HLS	2–5 s	Live LP wall, near-real-time playback	Player support is uneven outside Safari / hls.js.
WebRTC	<1 s	Two-way audio, mobile field response	More infra: TURN servers, SFU/MCU; per-viewer cost rises.
SRT / RTMP	1–3 s	Backhaul over lossy WANs, broadcast bridges	Less common in retail, useful for kiosks / pop-up stores.

In practice we use RTSP for ingest, LL-HLS for the operator console, WebRTC for the LP mobile app, and a thin SRT bridge if a store has unstable LTE-failover. Open-source MediaMTX handles all the conversions on the gateway and removes a layer of vendor lock-in.

Reach for WebRTC when: the LP team needs to push voice back into the store ("lane 4 closing now") within one second, or you sell a customer-service video assist feature on top of the same stack. Otherwise stick with LL-HLS for cost.

AI models that actually catch shrink

Object detection alone does not catch shoplifting; it tells you a person and a bag are in frame. The lift comes from layering a behavior / pose model on top, then correlating with POS exceptions.

1. YOLOv11 / YOLOv8 for detection. Anchor-free, roughly 50–120 FPS per 1080p stream on a single NVIDIA T4, and robust on small objects (cosmetics, razors, infant formula: the categories that actually walk out). YOLOv11 (2024), or a real-time DETR-family detector such as D-FINE, gives you tighter bounding boxes for concealment classification.

2. Pose / behavior heads (HRNet, OpenPose, MediaPipe BlazePose). These classify body kinematics — reaching into a jacket, hand-into-pocket near a high-shrink fixture, dwell-then-pivot at the SCO. They work without facial recognition, which is the right default for compliance reasons.

3. Anomaly models for ORC. Autoencoders or contrastive embeddings learn "normal customer trajectory" per store and flag deviations: fast in-and-out, group splitting at fixtures, repeat visits across days. We cover the model menu in detail in our surveillance anomaly detection guide.

4. POS / SCO exception classifiers. Tabular models on the transaction stream that flag no-sale, post-void, refund-after-close and item-mismatch events. Joined to the video event by camera ID and timestamp, they raise alert precision from ~70% (video alone) to 90%+.

5. Cross-store re-identification. Embedding-based ReID (clothing + gait, never face) lets you connect the same booster across multiple stores in your chain — the exact pattern that defines organized retail crime.
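To make the pose idea concrete, here is a deliberately simple hand-crafted rule standing in for a learned behavior head. It flags a person whose wrist stays close to the same-side hip for a sustained run of frames. The sketch assumes COCO-17 keypoint order, the layout produced by COCO-trained pose models such as HRNet or YOLO pose models. MediaPipe BlazePose uses a different 33-point layout. The distance ratio and dwell time are illustrative placeholders, not tuned values.

```python
import math

# COCO-17 keypoint indices (assumed output layout of the pose model)
L_SHOULDER, R_SHOULDER = 5, 6
L_WRIST, R_WRIST = 9, 10
L_HIP, R_HIP = 11, 12

Point = tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def hand_near_hip(kps: list[Point], ratio: float = 0.35) -> bool:
    """True if either wrist is within ratio * torso length of the same-side hip."""
    shoulder_mid = ((kps[L_SHOULDER][0] + kps[R_SHOULDER][0]) / 2,
                    (kps[L_SHOULDER][1] + kps[R_SHOULDER][1]) / 2)
    hip_mid = ((kps[L_HIP][0] + kps[R_HIP][0]) / 2,
               (kps[L_HIP][1] + kps[R_HIP][1]) / 2)
    torso = _dist(shoulder_mid, hip_mid) or 1.0  # guard against a degenerate pose
    return (_dist(kps[L_WRIST], kps[L_HIP]) < ratio * torso
            or _dist(kps[R_WRIST], kps[R_HIP]) < ratio * torso)


def concealment_candidate(frames: list[list[Point]], fps: float,
                          min_dwell_s: float = 1.5) -> bool:
    """Flag a track whose hand stays near the hip for min_dwell_s consecutive seconds."""
    needed = max(1, int(min_dwell_s * fps))
    run = 0
    for kps in frames:
        run = run + 1 if hand_near_hip(kps) else 0
        if run >= needed:
            return True
    return False
```

A rule like this is mostly useful as a baseline or a labeling aid. Production models learn these patterns from labeled clips and combine them with fixture zones and the POS join.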

Reach for behavior + POS correlation (not face match) when: your legal team has any heartburn about BIPA, CCPA or GDPR. You give up ~5–8 percentage points of recall, but you avoid statutory damages that can dwarf a year of platform spend.

Vendor comparison: Verkada, Rhombus, Solink, Spot AI, Everseen

If you decide to buy, the realistic shortlist below covers ~80% of the mid-market. Pricing is based on public benchmarks and partner quotes from 2024–2025; treat it as a planning anchor, not a procurement quote.

Vendor	Hardware model	POS correlation	Per-camera / month*	Best fit
Verkada	Proprietary cameras only	Limited (via partners)	~$17–$150 license + camera capex	Greenfield rollouts that want one throat to choke.
Rhombus	Own + 3rd-party ONVIF	Good	~$8–$25	Compliance-forward chains; better BIPA / CCPA documentation.
Solink	Bring-your-own cameras	150+ POS integrations	~$3–$12	Existing camera fleet; QSR / convenience / pharmacy.
Spot AI	Free NVR + bring-your-own cameras	Growing	~$5–$20	Mid-market; ops + LP shared dashboards.
Everseen	Camera-agnostic enterprise	Deep SCO / POS	Custom enterprise pricing	Tier-1 grocers focused on SCO fraud; 2–5% FP rate.

*Per-camera/month is a license + cloud / storage estimate, exclusive of camera hardware and installation. Real quotes vary ±30% with camera count and retention.

Custom build cost model: 20-store retail chain

If you decide to build, here is the 3-year cost shape we’d quote a 20-store chain at 10 cameras per store (200 cameras total), 90-day hot retention, hybrid edge-gateway architecture, EU + US data residency. The numbers below assume our Agent-Engineering pipeline and a senior squad of one tech lead, two backend engineers, one ML engineer (50%), one front-end engineer, and a DevOps lead.

Stream	Year 1 (build)	Year 2–3 (run / iterate)	Notes
Architecture, discovery, compliance	$45–$70K	$15K / yr	Includes BIPA / CCPA / GDPR review.
Backend (ingest, API, multi-tenant)	$140–$180K	$60K / yr	Go / Python; MediaMTX wrapper.
Edge / gateway software	$70–$95K	$30K / yr	YOLOv11 + behavior model on Jetson Orin Nano.
Web LP console + mobile alerts	$80–$110K	$40K / yr	React + LL-HLS, push to iOS / Android.
ML pipeline + retraining	$55–$80K	$35K / yr	Labeling, A/B, drift detection.
QA, security, pen-test	$45–$65K	$25K / yr	SOC 2 readiness; annual audit.
Cloud infra (compute, storage, egress)	$25–$45K	$60–$110K / yr	Hetzner AX + R2 / S3 mix to control egress.
Year totals (typical)	~$520–$640K	~$280–$320K / yr	Hardware (gateways + cameras) tracked separately.

A 3-year all-in for a custom platform on this footprint usually lands between $1.1M and $1.3M, depending on how much of the LP console you keep custom versus reuse from open-source projects such as Frigate. A managed VSaaS for the same footprint will run $0.7–$1.0M over the same period. Build therefore only makes sense when you also need workflows the vendors won't sell you: proprietary POS, integrations into your warehouse-management or staffing platform, white-label for franchisees, etc.
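The line items in the cost table sum to a slightly wider band than its "typical" row. This small calculator makes the arithmetic explicit so you can swap in your own figures. The values are copied from the table in $K. Where the table gives a single Year 2–3 figure, the run-rate low and high are the same.

```python
# Figures from the cost table, in $K:
# (year-1 low, year-1 high, yearly run low, yearly run high)
COST_LINES = {
    "Architecture, discovery, compliance": (45, 70, 15, 15),
    "Backend (ingest, API, multi-tenant)": (140, 180, 60, 60),
    "Edge / gateway software": (70, 95, 30, 30),
    "Web LP console + mobile alerts": (80, 110, 40, 40),
    "ML pipeline + retraining": (55, 80, 35, 35),
    "QA, security, pen-test": (45, 65, 25, 25),
    "Cloud infra (compute, storage, egress)": (25, 45, 60, 110),
}


def cost_ranges(lines: dict[str, tuple[int, int, int, int]], run_years: int = 2) -> dict:
    """Sum low/high bounds for year 1, one run year, and the 3-year total."""
    y1_lo, y1_hi, run_lo, run_hi = (sum(v[i] for v in lines.values()) for i in range(4))
    return {
        "year1": (y1_lo, y1_hi),
        "run_per_year": (run_lo, run_hi),
        "three_year": (y1_lo + run_years * run_lo, y1_hi + run_years * run_hi),
    }


print(cost_ranges(COST_LINES))
# {'year1': (460, 645), 'run_per_year': (265, 315), 'three_year': (990, 1275)}
```

With the table's own numbers this gives roughly $460–645K for year 1, $265–315K per run year, and $0.99–1.28M over three years.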

Want a defensible cost model before procurement?

We’ll size the build vs buy line for your exact store count, retention policy and POS stack, and send you a one-pager you can take to your CFO.

Book a 30-min call →

Storage and retention: the silent budget killer

Storage looks cheap on a marketing page and ugly on an AWS invoice. A 1080p H.264 stream at 15 FPS produces ~150–300 GB / camera / month. At 200 cameras and 90-day retention you sit on roughly 90–180 TB of hot data, fed by 30–60 TB of new upload every month if every stream goes to the cloud.
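The storage figures follow from simple arithmetic, which is worth redoing with your own bitrates. Below is a minimal calculator, assuming decimal units (1 TB = 1,000 GB) and a 30-day month. It converts a per-camera monthly volume into average bitrate, monthly upload and hot storage at a given retention.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def avg_mbps(gb_per_month: float) -> float:
    """Average bitrate in Mbit/s implied by a monthly volume in decimal GB."""
    return gb_per_month * 1e9 * 8 / SECONDS_PER_MONTH / 1e6


def fleet_storage(cameras: int, gb_per_cam_month: float, retention_days: int) -> dict:
    """Monthly upload and hot storage (decimal TB) if every stream goes to the cloud."""
    upload_tb = cameras * gb_per_cam_month / 1000
    return {
        "upload_tb_per_month": upload_tb,
        "hot_tb": upload_tb * retention_days / 30,
        "mbps_per_camera": avg_mbps(gb_per_cam_month),
    }


for gb in (150, 300):
    print(fleet_storage(cameras=200, gb_per_cam_month=gb, retention_days=90))
```

For 200 cameras at 150–300 GB each, that is 30–60 TB of upload per month and 90–180 TB hot at 90 days. Per camera it implies an average of about 0.46–0.93 Mbps. A 10-camera store pushing every stream to the cloud therefore needs roughly 4.6–9.3 Mbps of sustained upload.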

Three levers actually work:

1. Tiered retention. Hot S3 / R2 for 7–14 days, infrequent-access for 30–90 days, glacier-class for legal-hold-only. We’ve seen this cut storage spend by 55–70%.

2. Event-only cloud upload. Push 30-second pre/post-event clips, not 24/7 raw streams. Full-take only on investigation flag.

3. Cloudflare R2 or self-hosted on the Hetzner AX line. Egress on R2 is free; on AWS it is $0.05–$0.09 / GB. For evidence vaults that get pulled by lawyers and police, the difference runs to five figures per quarter.
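Event-only upload needs a pre-roll buffer on the gateway, because the interesting seconds happen before the alert fires. This is a minimal sketch, assuming the recorder writes short fixed-length segments; the `Segment` type and its `uri` field are hypothetical. The clipper keeps roughly `pre_s` seconds of segments in a ring and snapshots them when an event triggers. It then keeps appending segments until `post_s` seconds have passed, at which point the clip's segment list is ready for upload.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Segment:
    start: float     # segment start, epoch seconds
    duration: float  # seconds
    uri: str         # hypothetical local path of the recorded segment


class EventClipper:
    """Buffers ~pre_s seconds of segments and assembles pre/post-roll clips per event."""

    def __init__(self, pre_s: float = 30.0, post_s: float = 30.0) -> None:
        self.pre_s, self.post_s = pre_s, post_s
        self._ring: deque[Segment] = deque()
        self._open: list[tuple[float, list[Segment]]] = []
        self.finished: list[list[Segment]] = []  # completed clips, ready to upload

    def add_segment(self, seg: Segment) -> None:
        newest_end = seg.start + seg.duration
        self._ring.append(seg)
        # Drop segments that ended before the pre-roll window.
        while self._ring and self._ring[0].start + self._ring[0].duration <= newest_end - self.pre_s:
            self._ring.popleft()
        still_open = []
        for event_ts, segs in self._open:
            segs.append(seg)
            if newest_end >= event_ts + self.post_s:
                self.finished.append(segs)  # post-roll complete
            else:
                still_open.append((event_ts, segs))
        self._open = still_open

    def trigger(self, event_ts: float) -> None:
        """Start a clip: snapshot the buffered pre-roll, then collect post-roll segments."""
        pre = [s for s in self._ring if s.start + s.duration > event_ts - self.pre_s]
        self._open.append((event_ts, pre))
```

Overlapping events each get their own segment list, so the uploader may want to merge overlapping lists before sending.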

Compliance: BIPA, CCPA, GDPR, SOC 2, PCI-DSS

If you sell across the US and EU, your platform must satisfy at least four overlapping regimes. The fines are large enough to dwarf a year of platform spend, so we treat compliance as an architectural constraint, not a checklist.

Regime	Key requirement	Penalty exposure	Architectural impact
Illinois BIPA	Written consent for biometrics	$1K (negligence) / $5K (intent) per record	Avoid face / iris embeddings; use pose / behavior.
California CCPA / CPRA	Consumer access / delete rights	Up to $7,500 / violation	Deletion API; data inventory; signage.
EU GDPR	Lawful basis + DSAR	€20M or 4% global revenue	EU-resident storage; DPA with vendors; PIA per use case.
SOC 2 Type II	Documented controls + audit	Lost enterprise deals	Centralized logging, IAM, change control.
PCI-DSS (POS adjacency)	Tokenized PAN, segmented network	Acquirer fines; loss of merchant cert	Never persist raw card data; isolate VLAN.

Two practical heuristics. First: if you can solve the LP problem without facial recognition, do that. The BIPA / GDPR exposure of "we hold biometric templates" is rarely worth 5–8 percentage points of recall. Second: data residency is non-negotiable in the EU. Spin up a separate EU stack so that no EU-store video ever crosses to a US bucket. Options include Frankfurt or Dublin on the hyperscalers, or Hetzner AX in its German or Finnish data centers with EU-located R2 storage.
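Residency is easiest to keep when the write path itself refuses to cross regions. This is a minimal sketch, assuming a hypothetical store registry and one evidence bucket per legal region; the bucket and store names are placeholders. Every write resolves its bucket from the store's region. An unknown store or a cross-region request fails loudly instead of falling back to a default.

```python
from typing import Optional

# Hypothetical registries: legal region -> evidence bucket, store -> legal region.
REGION_BUCKETS = {"EU": "lp-evidence-eu", "US": "lp-evidence-us"}
STORE_REGION = {"store-berlin-01": "EU", "store-austin-07": "US"}


class ResidencyError(Exception):
    """Raised instead of writing video outside the store's legal region."""


def evidence_bucket(store_id: str, requested: Optional[str] = None) -> str:
    """Return the only bucket this store's video may be written to."""
    region = STORE_REGION.get(store_id)
    if region is None:
        raise ResidencyError(f"unknown store {store_id!r}: refusing to guess a region")
    allowed = REGION_BUCKETS[region]
    if requested is not None and requested != allowed:
        raise ResidencyError(f"{store_id} is in {region}; video must stay in {allowed}")
    return allowed
```

The same rule belongs in bucket policies and IAM as well, so a bug in application code cannot silently override it.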

Mini case: lessons from V.A.L.T. that carry into retail

V.A.L.T. is not a retail platform; it serves law enforcement, child-advocacy and clinical-skills training. But its operational shape is identical to a retail VSaaS at scale: thousands of cameras, evidentiary recordings, multi-tenant org structure and fine-grained access control. The scaling episode below translates directly:

Situation. When V.A.L.T. crossed ~1,500 cameras and ~15K daily users, the original cloud-only ingest pipeline started to choke on bursty multi-camera recordings during scheduled interviews, and audit-export latency jumped from 30 seconds to 4 minutes.

Plan. Over a 12-week sprint we (1) introduced edge buffering on a per-site appliance, (2) moved evidence storage to a tiered hot / warm / cold layout, (3) re-implemented the export pipeline as a job queue with hash-chained chunks, and (4) added a Prometheus / Grafana SLO board the ops team owned end-to-end.

Outcome. Average export latency dropped from 240 seconds to 28 seconds, evidence retrieval failures fell from 0.9% to under 0.05%, and we held storage spend roughly flat through 60% growth in camera count. The platform now serves 700+ organizations and 25K daily users on the same architecture.

Want a similar diagnostic for your VSaaS or in-house surveillance stack? Book a 30-min architecture review and we’ll come back with a one-page issue map.

POS correlation: the single biggest precision lever

Loss-prevention teams routinely tell us they ignore 60–80% of video-only alerts. The fix is not a smarter video model; it is correlating the video stream with the POS / SCO event bus so that every alert carries a transactional reason for review.

The pattern that works in production:

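from dataclasses import dataclass

@dataclass
class VideoEvent:
    camera_id: str
    ts_start: float        # epoch seconds, same clock as the POS feed
    ts_end: float
    kind: str              # e.g. "concealment"
    bbox: tuple            # (x, y, w, h) in pixels
    confidence: float      # e.g. 0.91
    clip_uri: str

@dataclass
class PosEvent:
    store_id: str
    lane_id: str
    ts: float              # epoch seconds
    type: str              # "no_sale" | "void" | "refund"
    cashier_id: str
    amount: float

@dataclass
class Alert:
    level: str
    v: VideoEvent
    p: PosEvent

LANE_TO_CAM = {("store-042", "lane-4"): "cam-042-07"}  # hypothetical (store, lane) -> camera
def correlate(v: VideoEvent, p: PosEvent, window_s: float = 5.0) -> "Alert | None":  # same lane, within 5 s
    same_lane = LANE_TO_CAM.get((p.store_id, p.lane_id)) == v.camera_id
    if same_lane and abs(v.ts_end - p.ts) < window_s and v.confidence > 0.7:
        return Alert(level="HIGH", v=v, p=p)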

Figure 2. Minimal correlation rule between a video event and a POS exception.

Once that join lands, the LP queue becomes one screen of high-confidence cases per shift, not a wall of low-confidence motion. Solink built a $200M+ business mostly on this single insight; we’ve seen alert-review rates jump from ~30% to 80%+ on the same camera fleet after wiring it up.
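Comparing every video event with every POS exception works for one lane, but it gets quadratic across a chain's worth of events. Below is a sketch of the same 5-second, same-lane rule as a batch join. It assumes events arrive as plain tuples and uses a hypothetical `(store_id, lane_id) -> camera_id` registry. POS exceptions are indexed per camera and sorted by time, so each video event finds its matches with a binary search.

```python
import bisect
from collections import defaultdict


def join_events(video_events, pos_events, lane_to_cam, window_s=5.0, min_conf=0.7):
    """Pair confident video events with same-lane POS exceptions less than window_s apart.

    video_events: iterable of (camera_id, ts_end, confidence, event_id)
    pos_events:   iterable of (store_id, lane_id, ts, pos_id)
    lane_to_cam:  dict of (store_id, lane_id) -> camera_id
    Returns a list of (event_id, pos_id) pairs.
    """
    by_cam = defaultdict(list)
    for store_id, lane_id, ts, pos_id in pos_events:
        cam = lane_to_cam.get((store_id, lane_id))
        if cam is not None:  # exceptions on unmapped lanes cannot be joined
            by_cam[cam].append((ts, pos_id))
    index = {}
    for cam, rows in by_cam.items():
        rows.sort()
        index[cam] = ([t for t, _ in rows], [pid for _, pid in rows])

    pairs = []
    for camera_id, ts_end, conf, event_id in video_events:
        if conf <= min_conf or camera_id not in index:
            continue
        times, ids = index[camera_id]
        # First POS timestamp strictly greater than ts_end - window_s.
        i = bisect.bisect_right(times, ts_end - window_s)
        while i < len(times) and times[i] < ts_end + window_s:
            pairs.append((event_id, ids[i]))
            i += 1
    return pairs
```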

A decision framework: build or buy in five questions

1. How many stores will this run in over 36 months? Below ~75 stores, vendor TCO almost always wins. Above ~150 stores, custom development’s per-store amortization starts to dominate.

2. Is your POS / SCO stack standard or proprietary? If you run NCR, Toshiba, Square or Lightspeed, vendors already integrate. If you run an in-house or heavily customized POS, building wins.

3. Do you need real-time intervention (<1 s alert) or post-incident review (<1 hr)? Real-time intervention requires WebRTC / edge inference; some vendors don’t support it without expensive add-ons.

4. Where will the data live? If you operate in the EU or have a single very large enterprise customer with strict data-residency clauses, multi-region custom is often cheaper than bolting region locks onto a vendor.

5. Can you fund a 2–3 person engineering team for at least three years? If not, do not start a build — the platform will rot the moment the original squad disbands.

Reach for build when: three or more of the questions above push toward custom — especially if the POS or workflow integration is non-standard. Otherwise stage a vendor pilot first and revisit in 12 months.
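The five questions can be written down as a scoring rule, which helps when several stakeholders answer them separately. This is one hypothetical encoding, not a formal model. It treats question 5 as a hard gate. It counts question 1 as pushing toward custom only when the 36-month footprint exceeds ~150 stores. It then applies the three-or-more rule across all five answers.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    stores_36m: int          # stores the platform will run in over 36 months
    proprietary_pos: bool    # in-house or heavily customized POS / SCO
    needs_realtime: bool     # <1 s intervention rather than post-incident review
    strict_residency: bool   # EU operations or strict customer residency clauses
    can_fund_team_3y: bool   # 2–3 engineers funded for at least three years


def build_or_buy(p: Profile) -> str:
    """Hypothetical encoding of the five questions; returns 'build' or 'vendor pilot'."""
    if not p.can_fund_team_3y:
        return "vendor pilot"  # Q5 is a hard gate: no multi-year team, no build
    pushes_to_custom = sum([
        p.stores_36m > 150,    # Q1: per-store amortization starts to favor custom
        p.proprietary_pos,     # Q2: vendors will not integrate it for you
        p.needs_realtime,      # Q3: simplification; some vendors sell it as an add-on
        p.strict_residency,    # Q4: multi-region custom vs vendor region locks
        p.can_fund_team_3y,    # Q5: always True once past the gate
    ])
    return "build" if pushes_to_custom >= 3 else "vendor pilot"
```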

Five pitfalls we keep cleaning up after

1. Tuning false-positive rate to zero. A 0% FP threshold also gives you a 60–70% true-positive rate. Aim for 3–5% FP, >92% TP, and add a human-in-the-loop tier-1 review queue.

2. Skipping the site survey. Backlight on glass doors, mirrors at the SCO, wrong focal length on a fisheye — all of these tank model performance no matter how good the weights are. Walk every store before you ship cameras.

3. Treating the POS integration as a phase-2 nice-to-have. Without correlation, the platform is an expensive motion detector. Bake POS / SCO joins into the MVP.

4. Edge inference without cloud failover. A gateway dies on a Friday night, recordings vanish, and on Monday the prosecutor asks for the clip you no longer have. Always queue locally + replicate to cloud.

5. Shipping a new model with no rollback path. Keep three model versions (prod, canary, experimental), A/B 10% of cameras for one week, and define an automatic rollback on precision drop >3 points.
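Pitfall 5 needs two mechanics: a stable canary cohort and an automatic rollback trigger. This is a minimal sketch, assuming precision is measured per model version from reviewed alerts. Cameras join the ~10% canary cohort based on a hash of the camera ID and model version. Membership therefore stays stable for the week-long test but reshuffles for each new model. Rollback fires when canary precision trails production by more than 3 percentage points. The model version string is a placeholder.

```python
import hashlib


def in_canary(camera_id: str, model_version: str, percent: int = 10) -> bool:
    """Stable ~percent% cohort per model version, derived from a hash of the camera ID."""
    digest = hashlib.sha256(f"{model_version}:{camera_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100 < percent


def should_rollback(prod_precision: float, canary_precision: float,
                    max_drop_points: float = 3.0) -> bool:
    """Roll back when canary precision trails production by more than max_drop_points."""
    return (prod_precision - canary_precision) * 100 > max_drop_points


cameras = [f"cam-{i:04d}" for i in range(200)]
canary = [c for c in cameras if in_canary(c, "yolo11-behavior-v7")]  # hypothetical version tag
```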

KPIs that decide whether the platform is working

Quality KPIs. True-positive rate (recall) >92%, precision (the share of alerts that are real incidents) >95%, F1 >0.93, end-to-end alert latency <5 seconds. These are the model's honest report card; track them per camera, per store and per model version.

Business KPIs. Shrink reduction 15–30% within 12 months, alert review rate >80%, average investigation time <15 minutes, ROI payback in 90–180 days. If shrink doesn’t move, the platform is theater — cut it or fix the LP workflow.

Reliability KPIs. Cloud uptime ≥99.9%, gateway availability ≥99.5% per store, video integrity at 0% bit-rot (S3 checksums + monthly validation), alert delivery latency <30 seconds end-to-end. Without these, your evidence is challengeable in court.
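The quality KPIs reduce to three counts per camera, store or model version. This is a minimal sketch, assuming reviewers label each alert as a true or false positive. It also assumes missed incidents (false negatives) are counted from audits or investigations, since the model cannot report what it never flagged.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # real incidents that raised an alert
    fp: int  # alerts reviewers marked as not an incident
    fn: int  # confirmed incidents the model missed


def quality_kpis(c: Confusion) -> dict:
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def meets_targets(k: dict) -> bool:
    # Thresholds mirror the quality KPIs: recall > 92%, precision > 95%, F1 > 0.93
    return k["recall"] > 0.92 and k["precision"] > 0.95 and k["f1"] > 0.93
```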

When NOT to build a custom cloud video platform

There are five situations where we will tell a prospect not to build, even though we’d be happy to take the work:

1. You operate fewer than 50 stores and have no plan to grow past 100. The per-store math doesn't work; an off-the-shelf VSaaS will cost roughly half as much as a custom build over three years.

2. You need to be live in <90 days. A Solink or Spot AI pilot can stand up in 4–8 weeks; a defensible custom MVP needs 12–16 weeks.

3. You don’t have a product owner or LP champion who will live with the system for two years. Custom platforms die without an internal sponsor.

4. The compliance bar is generic and your POS is mainstream. Buying is just cheaper.

5. You don’t have multi-year operating budget. A platform that ships and then starves of maintenance is worse than no platform — it generates evidence you can’t defend.

A realistic 16-week rollout plan

For a 20-store pilot, here’s the cadence we run with our Agent-Engineering pipeline. It compresses what is usually a 5–6 month build into ~16 weeks because most boilerplate (auth, multi-tenant, audit log, IaC) is generated and reviewed rather than hand-coded.

Phase	Weeks	Outcome
Discovery + site survey	1–2	Camera audit; LP workflow doc; compliance scope.
Architecture + IaC	3–4	Terraform / Pulumi stacks; ingest skeleton.
Edge gateway + RTSP ingest	3–6	Jetson images; YOLOv11 baseline.
POS correlation	5–8	Adapters for top 3 POS / SCO; alert join.
LP console + mobile	7–12	React + LL-HLS, push notifications.
Pilot in 3–5 stores	10–13	Tune thresholds; LP team training.
Hardening + compliance	13–15	SOC 2 controls; pen-test; PIA.
Full 20-store rollout	15–16	Phased cut-over; 24/7 support runbook.

Integrations that earn their keep

A retail VSaaS is more useful when it talks to the systems your operators already live in. This is the shortlist of integrations we scope into the MVP from week 1, not "phase 2":

POS / SCO event bus. NCR Voyix, Toshiba TCx, Square, Lightspeed, Oracle Simphony — cover your top three, expose a generic webhook for the rest.

Access control. Genetec, Brivo, Openpath / Avigilon Alta, Kisi — correlate after-hours motion with badge swipes.

People counting. Either bake it in (low-cost, lower accuracy) or bridge to V-Count / FootfallCam / Hella Aglaia (higher accuracy, ~$200–$500 / sensor).

Ticketing. Jira, ServiceNow, Zendesk — create an investigation ticket from any alert with the clip URL embedded.

Analytics warehouse. BigQuery, Snowflake, Redshift — ship daily aggregates for the BI team so shrink is on the same dashboard as same-store sales.
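A generic POS webhook is also an entry point for forged "no-sale" events, so it is worth authenticating. Below is a minimal sketch of HMAC-SHA256 verification with a timestamp check against replay. It assumes a hypothetical scheme in which the sender signs `timestamp.body` with a shared secret. Match whatever signing scheme your POS vendor actually documents.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Hex HMAC-SHA256 over b"<timestamp>.<body>" (hypothetical signing scheme)."""
    return hmac.new(secret, str(timestamp).encode() + b"." + body, hashlib.sha256).hexdigest()


def verify_pos_webhook(secret: bytes, timestamp: int, body: bytes, signature: str,
                       tolerance_s: int = 300, now: Optional[float] = None) -> dict:
    """Return the parsed POS event if signature and timestamp check out, else raise ValueError."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_s:
        raise ValueError("stale or future-dated webhook: possible replay")
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sign(secret, timestamp, body), signature):
        raise ValueError("signature mismatch")
    return json.loads(body)
```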

Security hardening: the platform is also an attack surface

Cloud video platforms have been compromised at scale before — the 2021 Verkada breach exposed 150,000 cameras across factories, schools and hospitals. We bake five controls into every deployment:

1. mTLS between cameras and gateway, and gateway and cloud. No unauthenticated RTSP, ever.

2. Hardware-backed credential storage. Use Jetson’s secure-boot + TPM or equivalent on the gateway. No plaintext API keys on disk.

3. Per-tenant row-level encryption keys. A breach of one franchisee’s data should not unlock the rest of the chain.

4. Just-in-time access for engineers. No standing prod access; every session is ticketed, logged and auto-expires.

5. Quarterly red-team exercises. Internal or third-party. The cost is rounding error compared to one breach.
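One way to get per-tenant keys without storing thousands of them is to derive each tenant's data key from a master key held in a KMS or HSM. The sketch below implements HKDF-SHA256 (RFC 5869) with the standard library and derives a key per tenant and rotation version. The `info` label format is a hypothetical convention. In production you would normally use a vetted implementation, such as the `cryptography` package's HKDF, or have the KMS issue per-tenant data keys. The derived key would then feed an authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it to `length` bytes."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 output is limited to 8160 bytes")
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def tenant_data_key(master_key: bytes, tenant_id: str, key_version: int = 1) -> bytes:
    """Derive a distinct 256-bit data key per tenant (and per rotation version)."""
    info = f"vsaas/evidence/{tenant_id}/v{key_version}".encode()  # hypothetical label format
    return hkdf_sha256(master_key, salt=b"per-tenant-evidence-keys", info=info)
```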

Need a 30-min security gut-check on your VSaaS plan?

We’ll walk through your current architecture, flag the top three breach paths and tell you what to fix in the first 30 days.

Book a 30-min call →

How Agent Engineering changes the build economics

Three years ago, building a 20-store custom VSaaS comfortably ran past $1.5M in year 1. Today, with multi-agent code generation paired with senior architectural review, we squeeze the same scope into the $520–$640K band shown above. The savings are concentrated in a few specific areas:

Boilerplate generation. CRUD APIs, multi-tenant scaffolding, IaC modules, RBAC schema — agents emit ~70% of the first draft, senior engineers refactor and harden.

Test scaffolding. Generated unit and integration tests cover the happy paths; humans add the failure modes that actually trip in production.

Documentation kept in sync. Spec, OpenAPI, runbook generated from the same source of truth, so the LP team gets accurate docs at handover instead of the usual six-month-stale wiki.

It does not change anything load-bearing in the architecture — you still need a senior team for the protocol design, ML pipeline and security model. But it reliably collapses 4–6 weeks out of a typical 16–20 week pilot.

FAQ

Do we have to replace our existing IP cameras to get AI loss prevention?

No, in most cases. Solink, Spot AI, Rhombus and a custom build can all consume RTSP / ONVIF from your existing fleet. You save 60–70% in capex versus a Verkada-style refresh. Replace cameras only where the optics are physically wrong (low resolution, bad angle, no IR) for the AI use case you care about.

How accurate is AI shoplifting detection in the real world, not a vendor demo?

Best-in-class vendors (Everseen, Veesion) report 92–98% true-positive rates in controlled studies. Real stores see 85–92% with 2–8% false positives, depending on lighting, occlusion and crowd density. Our planning baseline is 90% TP at 5% FP, with a tier-1 human review queue that turns the remaining noise into trustworthy cases.

Will using facial recognition land us in legal trouble?

It can. BIPA in Illinois ($1K–$5K per record), CCPA in California (up to $7,500 / violation) and GDPR in the EU (€20M / 4% revenue) make facial recognition expensive to do correctly. The pragmatic choice for retail LP is to lean on body pose, gait, concealment behavior and POS correlation. Reserve facial match for known-offender lists with explicit, signed consent and aggressive deletion policies.

How fast can we stand up a 20-store pilot?

Off-the-shelf vendors (Solink, Spot AI, Rhombus) can be live in 4–8 weeks if your network is reasonable. A custom build with our Agent-Engineering pipeline lands in 14–16 weeks for an MVP, 22–26 weeks for a hardened, SOC 2-ready system. We typically recommend a vendor pilot first; switch to custom only when ROI and integration gaps justify it.

What ROI should we expect, and over what horizon?

Successful deployments pay back in 90–180 days. Typical impact: 15–30% reduction in shrink within 12 months, 50–70% reduction in investigation time, and a measurable drop in alert fatigue once POS correlation is wired in. If you don’t see movement in the first quarter, stop adding cameras and fix the LP workflow instead.

Edge inference vs cloud-only — which should we pick?

Pure cloud is fine for a single-store pilot or fewer than ~15 cameras. Beyond that, a hybrid model wins on every dimension: edge gives you <50 ms alerts and 70–90% bandwidth savings; cloud handles long-term storage, ORC pattern search and retraining. We rarely recommend pure cloud for chains larger than 5–10 stores.

How do we keep the LP team from drowning in alerts?

Three rules. (1) Tier alerts: critical, medium, info; only critical pages people. (2) Require POS / SCO correlation on tier-1 alerts so the LP analyst has a transactional reason to look. (3) Track alert review rate as a first-class KPI; if it falls below 50% for two weeks, raise the model thresholds or kill the noisiest rules.
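Rules (2) and (3) are easy to encode. This is a minimal sketch, assuming each candidate alert carries a model confidence and a flag saying whether it was joined to a POS / SCO exception. The 0.85 and 0.6 thresholds are illustrative placeholders. Only correlated, high-confidence events can reach the critical tier that pages people. The retune check fires once the daily review rate has sat below 50% for 14 consecutive days.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    confidence: float
    pos_correlated: bool  # joined to a POS / SCO exception on the same lane


def tier(c: Candidate, critical_conf: float = 0.85, medium_conf: float = 0.6) -> str:
    """Rules (1) and (2): only POS-correlated, high-confidence events become critical."""
    if c.pos_correlated and c.confidence >= critical_conf:
        return "critical"
    if c.confidence >= medium_conf:
        return "medium"
    return "info"


def needs_retune(daily_review_rates: list[float], floor: float = 0.5, days: int = 14) -> bool:
    """Rule (3): True when each of the last `days` daily review rates is below `floor`."""
    recent = daily_review_rates[-days:]
    return len(recent) == days and all(rate < floor for rate in recent)
```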

How does Fora Soft typically engage on a project like this?

A 60-minute discovery, then a 2-week paid architecture sprint that produces a target architecture, cost model and 16-week roadmap. From there it’s a fixed-scope MVP (12–16 weeks), a soft launch in 3–5 stores, and a phased rollout. We can also slot in for a "second-opinion" review of an existing build or vendor evaluation if that’s where you are. Book a 30-min call to scope it.

What to Read Next

Analytics

Retail Video Analytics: 2025 Guide

People counting, heatmaps and queue management on the same VSaaS stack.

AI & Anomaly Detection in Video Surveillance

Behavior models, false-positive control, alert design.

Architecture

Scalable Video Management Systems in 2026

Five engineering decisions that decide whether your VMS scales.

Build guide

Custom VMS Development: Complete Guide

Building a video management system from scratch — teams, stack, costs.

Engineering

Real-Time Video Processing with AI: Best Practices

Edge inference, latency budgets and the OpenCV / FFmpeg toolchain.

Ready to ship a retail VSaaS that actually moves shrink?

A modern retail cloud video platform is no longer a security toy — it is a measurable contributor to gross margin, with 90–180-day payback in well-run deployments. The architecture has converged on hybrid edge–cloud with RTSP ingest, LL-HLS / WebRTC egress, YOLOv11 plus pose / behavior heads, and tight POS correlation. The vendors are mature enough to buy when the math fits, and modern engineering practices have brought custom-build cost low enough that 100+ store chains and franchise networks should at least model both paths.

If you are the person on the hook for shrink, fraud or LP modernization, you don’t need another generic deck. You need the build vs buy line drawn against your store count, your POS, your jurisdictions, and your operating budget. We’ll bring the architecture, the cost model, and 21 years of receipts from V.A.L.T. and our broader video surveillance engineering work.

Get a build-vs-buy verdict in 30 minutes

Bring your store count, POS stack and current camera fleet. We’ll come back with an architecture sketch, a cost model, and an honest recommendation.

Book a 30-min call →
