Face Recognition in Surveillance: How It Works & the Law

This is engineering guidance, not legal advice. Confirm specifics with qualified counsel.

Why this matters

Face recognition is the single analytic most likely to get a surveillance project sued, banned, or quietly shelved after launch — not because the math fails, but because teams treat it as a checkbox feature instead of a regulated capability with a real error rate. The person specifying it is usually a security integrator, retail-operations lead, building-security manager, or product manager who has seen a vendor demo hit a face instantly and now has to decide whether to deploy it, where, and under which law — and the gap between that demo and a lawful, reliable production system is enormous. This article is written so a non-technical reader can understand both halves: how the recognition pipeline actually works and where its accuracy collapses, and why a face template is legally radioactive in a way a license plate or a person-detection box is not. A senior video engineer will find the standards and accuracy framing precise; the writing serves the decision-maker first.

Face detection is not face recognition — the distinction everything rests on

Before anything else, separate two words that sound alike and are constantly confused, because the entire legal and technical story depends on the difference.

Face detection answers one question: is there a face in this frame, and where? It draws a box around each face it finds, the same way the object detection we covered earlier draws a box around each person or car. Detection does not know, and does not try to know, whose face it is. Your phone camera drawing a yellow square around each face before you take a photo is doing detection. It is the cheap, mature, run-on-the-camera part, and on its own it carries almost no privacy weight — a count of how many faces are present is not a record of who those people are.

Face recognition answers a far heavier question: whose face is this? It takes the detected face, measures its geometry into a numeric signature, and compares that signature against others to decide whether two faces belong to the same person. Recognition is what turns "a person is here" into "this specific named individual is here." That single step — from anonymous presence to identity — is what triggers nearly every law discussed later in this article.

Keep this line bright, because vendors blur it constantly. "Our cameras have face AI" might mean only detection (harmless) or full recognition (heavily regulated). The clean test: does the system try to tell people apart and put a name or a persistent identity to them? If yes, it is recognition, and the legal gate applies. If it only counts or blurs faces without identifying them, it is detection, and it does not.

Figure 1. Detection versus recognition. Detection answers "is a face here?" and is anonymous, light, and low in privacy weight. Recognition answers "whose face is this?": it measures the face into a template and matches it to an identity, crossing into biometric data and the legal gate. The same camera can do one, the other, or both; only recognition triggers the law.

How face recognition works: the four-step pipeline

Modern face recognition is a four-step assembly line. Each step is worth understanding because each one is where accuracy is won or lost.

Step 1 — Detect. First the system finds the face in the frame and crops it out, exactly the detection step above. If the face is too small, too turned, or too blurry to crop cleanly, everything downstream suffers. Garbage in, garbage out starts here.

Step 2 — Align. The cropped face is rarely straight-on. The system locates a handful of landmarks — the corners of the eyes, the tip of the nose, the edges of the mouth — and uses them to rotate, scale, and straighten the face into a standard pose, like turning a tilted photo upright on a scanner before reading it. Alignment is what lets a face turned slightly left still match the same face turned slightly right. The more extreme the original angle, the less alignment can rescue.
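A minimal sketch of the idea behind alignment, assuming only two landmarks (the eye centres) and a pure rotation. Production aligners typically use more landmarks and also scale and crop. The coordinates and function names are illustrative, not taken from any particular library.

```python
import math

def roll_angle_degrees(left_eye, right_eye):
    """Tilt of the line joining the eye centres, relative to horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_about(point, centre, angle_degrees):
    """Rotate a 2D point about a centre by angle_degrees."""
    theta = math.radians(angle_degrees)
    x = point[0] - centre[0]
    y = point[1] - centre[1]
    return (
        centre[0] + x * math.cos(theta) - y * math.sin(theta),
        centre[1] + x * math.sin(theta) + y * math.cos(theta),
    )

def level_eyes(left_eye, right_eye):
    """Rotate both eye landmarks so they sit on the same horizontal line."""
    angle = roll_angle_degrees(left_eye, right_eye)
    centre = (
        (left_eye[0] + right_eye[0]) / 2.0,
        (left_eye[1] + right_eye[1]) / 2.0,
    )
    return (
        rotate_about(left_eye, centre, -angle),
        rotate_about(right_eye, centre, -angle),
    )

# Hypothetical pixel coordinates for a tilted face.
left, right = level_eyes((100.0, 120.0), (160.0, 150.0))
print(left, right)
```

The rotation removes in-plane tilt only. A face turned away from the camera loses information that no 2D rotation can restore, which is why extreme angles remain hard.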

Step 3 — Embed (the template). This is the heart of recognition, and the most misunderstood part. A neural network looks at the aligned face and outputs a list of numbers — typically a vector of 512 numbers — that summarizes the face's geometry. This list is called the template (or "embedding," or "faceprint"). The crucial point for both engineers and lawyers: the template is a string of numbers, not a stored photograph. You generally cannot look at the 512 numbers and see a face. Modern systems use training methods such as ArcFace that deliberately pull the templates of the same person close together and push different people's templates far apart, so the same face always lands in nearly the same spot in this numeric space (Deng et al., "ArcFace," CVPR 2019).

Step 4 — Match. Finally, the new template is compared against one or more stored templates by measuring how close the numbers are — usually a cosine similarity, a single score for how alike two templates are. If the score clears a chosen threshold, the system declares a match; if not, no match. That threshold is a dial, and where you set it decides everything about the error behavior, as the next sections show. There is no "just right" setting that is accurate for everyone in all conditions.
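The match step can be sketched in a few lines. This is an illustration under stated assumptions: the four-number vectors stand in for real templates of roughly 512 numbers, and the thresholds 0.6 and 0.1 are hypothetical values chosen to show the dial, not recommended settings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two templates: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("templates must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("templates must be non-zero")
    return dot / (norm_a * norm_b)

def is_match(probe, enrolled, threshold):
    """Declare a match when similarity clears the chosen threshold."""
    return cosine_similarity(probe, enrolled) >= threshold

# Toy templates (hypothetical values, far shorter than real ones).
probe = [0.9, 0.1, 0.3, 0.2]
same_person = [0.8, 0.2, 0.35, 0.15]
other_person = [0.1, 0.9, -0.2, 0.4]

for threshold in (0.6, 0.1):
    print(
        threshold,
        is_match(probe, same_person, threshold),
        is_match(probe, other_person, threshold),
    )
```

With the stricter threshold only the same-person pair matches. With the loose one the different-person pair matches too, which is the false-match side of the dial.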

Figure 2. The recognition pipeline. A face is detected and cropped, aligned to a standard pose using facial landmarks, embedded into a numeric template (a vector of ~512 numbers, not a photo), and matched against stored templates by similarity score against a threshold. Accuracy is won or lost at every step, most of all at the threshold dial.

Two very different jobs: 1:1 verification and 1:N identification

The word "recognition" hides two jobs with completely different risk profiles. Confusing them is one of the most expensive mistakes in this field.

1:1 verification asks: are these two faces the same person? You claim an identity — by tapping a badge, presenting a passport, or unlocking a phone — and the system checks the live face against the one template on file for that identity. This is the cooperative, well-lit, you-are-looking-at-the-camera case. It is the easy version, and it is accurate.

1:N identification asks: who, if anyone, is this — out of a gallery of N people? A face appears (often without the person's cooperation, on a surveillance camera) and the system searches it against a watchlist or gallery of N enrolled templates, returning the closest match or matches. This is the hard version. It is what people usually mean by "surveillance face recognition," and its error behavior gets worse as the gallery grows, for a reason worth showing with arithmetic.

Here is the scaling trap in plain numbers. Suppose a 1:N system has a false positive identification rate (the chance it wrongly flags a non-member as a match) of just 0.1% per person screened, which sounds excellent. Now screen a busy transit hub:

people screened per day: 1,000,000
false-positive rate per person: 0.1% = 0.001
expected wrong "hits" per day: 1,000,000 × 0.001 = 1,000

A thousand innocent people flagged every single day, from a system that is "99.9% accurate." The accuracy number was never wrong; it just means something different at scale than buyers assume (NIST, FRTE 1:N reporting). The larger the watchlist and the more faces you run, the more false alarms you generate — which is why serious deployments either keep galleries small and thresholds tight, or accept that every hit is an investigative lead to be confirmed by a human, never a conclusion.
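The same arithmetic as a short sketch, plus one simplified way to see why a bigger gallery raises the per-person rate in the first place. The second function assumes every comparison against the gallery is independent with the same per-comparison false-match rate. That is an idealization, not how NIST measures or reports it, and the 1-in-a-million rate is a hypothetical input.

```python
def expected_false_hits(people_screened, false_positive_rate):
    """Expected number of non-members wrongly flagged."""
    return people_screened * false_positive_rate

def per_search_false_positive_rate(per_comparison_rate, gallery_size):
    """Chance that at least one of N independent comparisons falsely matches."""
    return 1.0 - (1.0 - per_comparison_rate) ** gallery_size

# The transit-hub example from the text.
print(expected_false_hits(1_000_000, 0.001))

# Hypothetical per-comparison rate of one in a million, growing gallery.
for gallery_size in (100, 1_000, 10_000, 100_000):
    rate = per_search_false_positive_rate(1e-6, gallery_size)
    print(gallery_size, round(rate, 6))
```

Under these assumptions a watchlist of about a thousand people already turns a one-in-a-million comparison error into roughly a one-in-a-thousand error per person screened.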

The accuracy reality: superb in the lab, much weaker on real video

Now the rule the whole section is built on, applied to faces: accuracy is a range tied to image quality, pose, lighting, and the people in front of the camera — never a single number, and never 100%.

The benchmark numbers are genuinely impressive. The U.S. National Institute of Standards and Technology runs the Face Recognition Technology Evaluation (FRTE) — the renamed, ongoing successor to the long-running FRVT — and the best 1:1 algorithms verify pristine, cooperative portrait photos with false-non-match rates well under 1%, in some constrained "visa-quality" tests below 0.1% (NIST, FRTE 1:1 Verification, ongoing). If your job is matching a clear passport photo to a person standing still at a kiosk, the technology is excellent.

Surveillance is the opposite of that test. The camera is overhead, the subject is moving and not looking at it, the light is mixed, the face is one small smudge of a 1080p frame at fifteen metres. Independent analyses and vendor field reports converge on the same uncomfortable finding: the same algorithm that scores ~99% on NIST's cooperative imagery commonly falls to roughly 65–85% on real CCTV conditions, a drop of tens of percentage points driven mostly by blur, low resolution, and pose (industry field analyses, 2025). The lab score is real; it just does not transfer to a corridor. Any vendor quoting you their NIST number as if it were their CCTV number is, intentionally or not, misleading you.
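A rough sketch of why the face becomes a "small smudge" at distance. It assumes a simple pinhole model, a subject near the centre of the frame, a 90 degree horizontal field of view, and a face about 0.16 m wide. All of those are illustrative assumptions; real lenses and mounting angles will differ.

```python
import math

def pixels_across_face(image_width_px, horizontal_fov_deg, distance_m,
                       face_width_m=0.16):
    """Approximate horizontal pixels covering a face at a given distance."""
    # Width of the scene covered by the frame at that distance.
    scene_width_m = 2.0 * distance_m * math.tan(
        math.radians(horizontal_fov_deg) / 2.0
    )
    return image_width_px * face_width_m / scene_width_m

# A 1080p frame is 1920 pixels wide.
for distance_m in (3, 8, 15):
    print(distance_m, round(pixels_across_face(1920, 90.0, distance_m), 1))
```

Under these assumptions a face at fifteen metres spans only about ten pixels, against roughly fifty at three metres, before motion blur and compression take their share.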

Figure 3. Accuracy collapses outside the lab. On cooperative, well-lit, frontal NIST portrait imagery the best algorithms exceed ~99%; on real surveillance video (angle, motion, mixed light, distance, low resolution) the same algorithm commonly lands around 65–85%. The number is a range that moves with image quality and the people on camera, never a flat 100%.

Accuracy is not equal across faces

There is a second accuracy fact that is also a fairness and legal fact. NIST's landmark demographic study found that error rates are not the same across groups: many algorithms produce higher false positives for women, the elderly, children, and people from some racial and regional groups, with false-positive differentials between groups spanning a wide range — in the worst algorithms, factors of dozens to over a hundred (NIST, FRVT Part 3: Demographic Effects, NISTIR 8280, and the ongoing demographics report updated 2025). The gap has narrowed in the very best modern algorithms but has not vanished. For a buyer this means two things at once: a system tuned to one population can misfire on another, and a deployment that disproportionately false-matches one demographic is not only inaccurate, it is a discrimination and legal-exposure problem before it is anything else.
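One way a pilot can make this visible is to report false-positive rates per group instead of a single pooled figure. The sketch below assumes you hold labelled pilot results; the group names and counts are invented for illustration and are not NIST data.

```python
def false_positive_rate(false_matches, impostor_comparisons):
    """Share of different-person comparisons that were wrongly matched."""
    if impostor_comparisons <= 0:
        raise ValueError("need at least one impostor comparison")
    return false_matches / impostor_comparisons

def worst_to_best_ratio(rates_by_group):
    """How many times higher the worst group's rate is than the best group's."""
    best = min(rates_by_group.values())
    worst = max(rates_by_group.values())
    if best == 0.0:
        return float("inf")
    return worst / best

# Hypothetical pilot counts: (false matches, impostor comparisons).
pilot_counts = {
    "group_a": (10, 100_000),
    "group_b": (250, 100_000),
}
rates = {
    group: false_positive_rate(fm, total)
    for group, (fm, total) in pilot_counts.items()
}
print(rates)
print(round(worst_to_best_ratio(rates), 1))
```

A pooled rate over both groups would hide the fact that one of them is falsely matched many times more often than the other.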

What "much weaker on real video" looks like in the world

This is not theoretical. In the United States there are more than a dozen documented wrongful arrests tied to police use of face recognition, and the pattern is consistent: a low-quality probe image, a 1:N search that returns a plausible-but-wrong candidate, and an officer who treated the lead as proof. The first publicly reported case, Robert Williams in Detroit in 2020, came from a blurry shoplifting-surveillance still run against a driver's-license database; the match was simply wrong, and he was arrested in front of his family. In a later Detroit case, police wrongly arrested a woman who was eight months pregnant. All three known Detroit cases involved Black residents, echoing exactly the demographic-error finding above, and the resulting settlement now bars that department from arresting on a face-recognition result alone (ACLU, Williams v. City of Detroit, 2020–2024). The engineering lesson is blunt: a face-recognition hit on surveillance video is a lead, not an identification, and any system or policy that treats it as proof is built to fail.

How a face event surfaces in the VMS — and what ONVIF does and does not standardize

A face match is only useful if the Video Management System (VMS), the software that ingests and records many camera streams, can receive and act on it. This is the part the surveillance section owns, as opposed to the AI section that owns the model.

ONVIF is the common language that lets cameras and software from different makers understand each other, and ONVIF Profile M is the profile built for analytics metadata and events. Profile M defines a standardized way for a device to send face-related metadata and recognition events into a VMS — the event that says "a face was detected here, with these attributes" or "a recognition event fired" — carried inside the video stream, through the ONVIF event service, or over the lightweight messaging protocol MQTT (ONVIF, Profile M). That standardized event interface is why a Profile M camera's face events can light up alarms and searches in a Profile M VMS from a different vendor.

Here is the boundary that catches teams out. ONVIF standardizes how a face event is reported, not how accurate the recognition is or how it is computed. The recognition algorithm itself — the embedding model, the gallery, the threshold, the demographic tuning — is the vendor's own, reached through their SDK, not guaranteed by ONVIF. As always, ONVIF conformance is a baseline for interoperability, not a promise of feature parity or accuracy. Keep "ONVIF-conformant face events" and "good, lawful face recognition" firmly separate. For the standard beneath this, see events, metadata, and the ONVIF analytics interface; for the commercial overview of how the profiles fit security systems, Fora Soft's ONVIF profiles in security systems is the reference.
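Because the transport is standardized but the meaning of a hit is not, the VMS side still has to decide what a recognition event is allowed to do. The sketch below is vendor-neutral and hypothetical: the field names are invented for illustration and are not the ONVIF Profile M schema or any vendor SDK. It only shows the policy that a hit opens a lead awaiting human review and is never auto-confirmed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class LeadStatus(Enum):
    PENDING_REVIEW = "pending_review"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"

@dataclass(frozen=True)
class FaceEvent:
    """Already-parsed recognition event (hypothetical fields)."""
    camera_id: str
    candidate_id: str
    similarity: float
    threshold: float

@dataclass
class Lead:
    event: FaceEvent
    status: LeadStatus = LeadStatus.PENDING_REVIEW
    reviewed_by: Optional[str] = None

def open_lead(event: FaceEvent) -> Optional[Lead]:
    """A hit above threshold opens a lead; it never confirms an identity."""
    if event.similarity < event.threshold:
        return None
    return Lead(event=event)

def review(lead: Lead, reviewer: str, confirmed: bool) -> Lead:
    """Only a named human reviewer can move a lead out of pending."""
    if not reviewer:
        raise ValueError("a named reviewer is required")
    lead.status = LeadStatus.CONFIRMED if confirmed else LeadStatus.REJECTED
    lead.reviewed_by = reviewer
    return lead

hit = FaceEvent("cam-lobby-1", "watchlist-0042", similarity=0.71, threshold=0.60)
lead = open_lead(hit)
print(lead.status)
```

Keeping the similarity and the threshold on the record lets a reviewer, and later an auditor, see how marginal a hit was.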

Where the two halves run

The pipeline splits across the system, and where each part runs shapes cost, latency, and — most of all — privacy. Face detection is light and runs at the edge, on the camera's own neural processing unit (NPU) or a small on-site box, because finding and cropping a face is cheap and must keep up with the live frame. Recognition — the embedding and the gallery match — usually runs on a server or in the cloud, because matching against a watchlist of N people needs the gallery of templates in one place and more compute to run the embedding model and the search.

The consequence for a buyer is a familiar tradeoff with an unfamiliar stake. Doing recognition centrally means sending faces or face templates up from every camera and holding a gallery of biometric templates in one place — which is operationally convenient and a concentrated pool of the most sensitive, most regulated data you can hold. Keeping more at the edge (detecting on-camera, sending only events) cuts bandwidth and shrinks the central biometric store, but limits how richly you can match. Where analytics run — camera, on-site server, or cloud — is its own decision with its own latency, bandwidth, and privacy profile, covered in edge vs cloud video analytics. This article owns what recognition does and what law wraps it; that one owns where.

Figure 5. Where the halves run. Detection is light and stays on the camera (edge); recognition needs the gallery of templates and runs centrally on a server or in the cloud. The match surfaces back into the VMS as an ONVIF Profile M event. That central gallery is a concentrated store of biometric data, the most regulated asset in the system.

The model lives in another section — on purpose

One boundary keeps this article honest. How the recognition network is built and trained — the embedding architectures, the margin-based training behind ArcFace, the loss functions that separate identities — belongs to our AI for Video Engineering section. This surveillance article owns the application: how recognition plugs into the camera, the VMS, and storage, how accurate it is in the field, and what law governs deploying it. When you need the model internals, follow the cross-link; when you need to specify, deploy, and stay legal, stay here.

The legal gate: a face template is biometric data, and the law treats it that way

This is the part no surveillance team can skip, because with face recognition the law comes before the feature. A face template is biometric data — it measures a physical characteristic to identify a specific person — and almost every major privacy regime singles biometric data out for the strictest treatment. Four pillars matter most.

Europe — GDPR. The EU's General Data Protection Regulation defines biometric data and, in Article 9, classes biometric data processed for the purpose of uniquely identifying a person as special-category data, which is prohibited to process unless a narrow exception applies (such as explicit consent or substantial public interest) (GDPR, Reg. (EU) 2016/679, Art. 9 and Art. 4(14)). The European Data Protection Board's guidance on video devices spells out that running face recognition on surveillance footage is special-category processing that needs a lawful basis, transparency, and — for systematic monitoring of a public space — a Data Protection Impact Assessment before switch-on (EDPB, Guidelines 3/2019; GDPR Art. 35). Plain detection or blurring is not this; identifying people is.

Europe — the EU AI Act. On top of GDPR, the AI Act regulates the technology directly. Real-time remote biometric identification of people in publicly accessible spaces — live-matching faces in a crowd — is a prohibited practice for most purposes, in force since 2 February 2025, with only narrow, pre-authorized law-enforcement exceptions (EU AI Act, Reg. (EU) 2024/1689, Art. 5). Other biometric-identification and biometric-categorisation systems are classed high-risk, carrying heavy obligations (risk management, logging, human oversight, registration); the bulk of those high-risk obligations apply from 2 August 2026, with parts of the biometric high-risk timeline addressed by later amendment — confirm the current effective date for your exact use case, as this is still moving. The takeaway is structural: in the EU, the most surveillance-flavored form of face recognition — live, public, identify-everyone — is largely off the table, and the rest is high-risk by default.

United States — Illinois BIPA. The U.S. has no federal biometric law, but Illinois' Biometric Information Privacy Act is the one that reshapes the market. BIPA requires informed, written consent before a private entity collects a person's face geometry, mandates a retention-and-destruction policy, and — uniquely — gives individuals a private right of action with statutory damages of $1,000 per negligent violation and $5,000 per reckless or intentional violation (BIPA, 740 ILCS 14). The Illinois Supreme Court held that no actual injury is needed to sue (Rosenbach v. Six Flags, 2019) and that violations could accrue per scan (Cothron v. White Castle, 2023); a 2024 amendment (SB 2979) then limited that to a single recovery per person per collection method and confirmed electronic consent. Even after that softening, BIPA exposure is real and large — Facebook settled a BIPA face-tagging class action for $650 million. If your cameras might capture an Illinois resident's face, BIPA is a gating design constraint, not a footnote.

United States — Texas CUBI and the patchwork. Texas' Capture or Use of Biometric Identifier Act (CUBI) requires consent to capture biometric identifiers; it is enforced by the state attorney general rather than private suits, and in 2024 Texas reached a $1.4 billion settlement with Meta over facial-geometry capture — the largest privacy settlement ever obtained by a single state, and the first CUBI enforcement (Texas AG, State v. Meta, 2024). Washington has its own biometric statute, and more states are adding them every year. The result is a patchwork where the strictest applicable rule effectively governs a system sold across regions — so design to the strictest, not the most permissive.

The clean engineering rule that falls out of all four: face recognition is a legal gate you pass through before it is a feature you ship. Confirm the lawful basis and consent regime, run the DPIA, keep the gallery minimal and access-controlled, cap retention at what the law permits rather than what is operationally convenient, treat every hit as a lead requiring human confirmation, and, because the rules differ by place and keep changing, get qualified counsel for your jurisdiction. This is engineering guidance, not legal advice.
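The checklist in that rule can be made mechanical so that recognition cannot be switched on while an item is open. This is a sketch with hypothetical field names; the retention limit is an input supplied by counsel for your jurisdiction, not something the code knows.

```python
from dataclasses import dataclass

@dataclass
class RecognitionDeployment:
    """Pre-launch facts about one deployment (hypothetical fields)."""
    lawful_basis_confirmed: bool
    dpia_completed: bool
    retention_days: int
    legal_retention_limit_days: int  # supplied by counsel, per jurisdiction
    human_confirms_every_hit: bool
    counsel_signed_off: bool

def blocking_issues(deployment):
    """Return every open item that should stop recognition going live."""
    issues = []
    if not deployment.lawful_basis_confirmed:
        issues.append("lawful basis or consent regime not confirmed")
    if not deployment.dpia_completed:
        issues.append("DPIA not completed")
    if deployment.retention_days > deployment.legal_retention_limit_days:
        issues.append("template retention exceeds the legal limit")
    if not deployment.human_confirms_every_hit:
        issues.append("hits are not routed to human confirmation")
    if not deployment.counsel_signed_off:
        issues.append("no written legal sign-off")
    return issues

draft = RecognitionDeployment(
    lawful_basis_confirmed=True,
    dpia_completed=False,
    retention_days=90,
    legal_retention_limit_days=30,
    human_confirms_every_hit=True,
    counsel_signed_off=False,
)
for issue in blocking_issues(draft):
    print(issue)
```

An empty list is a necessary condition for launch, not a legal opinion.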

Figure 4. The legal gate. Detection sits outside it; recognition crosses into biometric data and four overlapping regimes: GDPR Art. 9 special-category processing (DPIA under Art. 35), the EU AI Act's prohibition on real-time public biometric identification, Illinois BIPA (consent, private right of action, statutory damages), and Texas CUBI (AG-enforced, $1.4B Meta settlement). The strictest applicable rule governs.

Face recognition at a glance

	Face detection	1:1 verification	1:N identification
Question answered	Is a face present?	Are these two the same person?	Who is this, out of N?
Typical use	Counting, blurring, framing	Access control, phone unlock	Watchlist / surveillance search
Conditions	Any	Cooperative, lit, frontal	Uncooperative, often poor video
Accuracy behavior	High, easy	High on clear images	Falls with poor video; false hits scale with N
Where it runs	Edge (camera NPU)	Edge or server	Server / cloud (needs the gallery)
Surfaces in VMS as	Object/face metadata	Access event	ONVIF Profile M recognition event
Biometric? Legal gate?	No — anonymous	Yes — biometric, gated	Yes — biometric, most regulated

Table 1. Detection versus the two recognition jobs. Detection is anonymous and ungated; both recognition jobs are biometric and legally gated, and 1:N on surveillance video is both the least accurate and the most regulated. Match the column to your actual need before you specify a system.

A common mistake to avoid

The costliest pattern we see is treating a face-recognition hit as an identification rather than a lead — wiring a workflow, or worse a policy, so that a match is acted on as if it were proof, when on real surveillance video a meaningful fraction of hits are wrong and skewed by demographic error. The close runner-up is specifying with the NIST number and deploying into a corridor: the ~99% cooperative-portrait score is not the 65–85% you will get from an overhead 1080p camera at distance, and budgeting on the lab number guarantees disappointment. The third, most dangerous mistake is shipping recognition without clearing the legal gate first — collecting face templates with no consent regime, no DPIA, no retention limit, and no jurisdiction check — which is how a project ends up on the wrong side of BIPA, CUBI, GDPR, or the AI Act. Demand a field pilot on your real cameras, reported as a precision/recall range with the threshold stated, and a written legal sign-off, before anyone calls face recognition "done."
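A field pilot report of that kind can be produced from a small labelled sample. The sketch assumes each pilot hit has been reviewed by a human and marked as a true or false match; the scores and labels below are invented for illustration.

```python
def precision_recall(scored_pairs, threshold):
    """Precision and recall when everything at or above threshold is a hit.

    scored_pairs: iterable of (similarity, is_true_match) tuples.
    Returns (precision, recall); either is None when undefined.
    """
    tp = fp = fn = 0
    for similarity, is_true_match in scored_pairs:
        predicted = similarity >= threshold
        if predicted and is_true_match:
            tp += 1
        elif predicted and not is_true_match:
            fp += 1
        elif not predicted and is_true_match:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall

# Hypothetical pilot results from real cameras, labelled by a reviewer.
pilot = [
    (0.92, True), (0.85, True), (0.80, False), (0.71, True),
    (0.66, False), (0.58, False), (0.55, True), (0.40, False),
]

for threshold in (0.5, 0.7, 0.9):
    print(threshold, precision_recall(pilot, threshold))
```

Reporting the whole sweep, rather than one number, shows the tradeoff directly: raising the threshold removes false hits and misses more real ones.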

Where Fora Soft fits in

Fora Soft has built real-time video, streaming, and computer-vision software since 2005, across 625+ shipped projects, and face recognition is the feature where we spend the most time managing expectations before we write a line of code. Teams arrive wanting a switch labelled "identify everyone"; the responsible build is narrower and honest — detection at the edge, recognition confined to a minimal, access-controlled gallery, a threshold tuned and measured on the client's own cameras, every hit surfaced as a confirmable lead, and the whole thing gated behind the consent, DPIA, and retention regime its jurisdiction demands. We lead with how the system behaves under real load first: the realistic field accuracy on your video and your population, the size and security of the biometric store, and the legal posture — and only then the capability. A face system that is honest about its error rate and built to clear the legal gate beats one that demos flawlessly and becomes a liability.

References

ONVIF — "Profile M" (standardizes analytics metadata and events, including face-recognition and object metadata; events carried over the stream, the ONVIF event service, or MQTT; conformance is a baseline for interoperability, not a guarantee of accuracy or feature parity — the standardized channel through which a face event surfaces into the VMS). Primary (tier 1). https://www.onvif.org/profiles/profile-m/
European Union — "GDPR, Regulation (EU) 2016/679, Art. 4(14) and Art. 9" (Art. 4(14) defines biometric data; Art. 9 makes biometric data processed for the purpose of uniquely identifying a person special-category, prohibited absent a narrow exception — the basis for 'a face template is special-category data and recognition is a legal gate'). Primary (tier 1). https://eur-lex.europa.eu/eli/reg/2016/679/oj
European Data Protection Board — "Guidelines 3/2019 on processing of personal data through video devices" (running face recognition on video is special-category processing needing a lawful basis and transparency; systematic monitoring of a publicly accessible area triggers a DPIA under Art. 35 — the basis for the GDPR deployment duties). Primary / issuing-body guidance (tier 1). https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-32019-processing-personal-data-through-video_en
European Union — "EU AI Act, Regulation (EU) 2024/1689, Art. 5" (real-time remote biometric identification in publicly accessible spaces is a prohibited practice for most uses, in force since 2 February 2025, with narrow pre-authorized law-enforcement exceptions; other biometric identification/categorisation is high-risk — the basis for the public-space ban and the high-risk framing). Primary (tier 1). https://artificialintelligenceact.eu/article/5/
Illinois General Assembly — "Biometric Information Privacy Act, 740 ILCS 14" (informed written consent before collecting face geometry; retention/destruction policy; private right of action with statutory damages of $1,000 negligent / $5,000 reckless or intentional; with Rosenbach v. Six Flags (2019) on standing, Cothron v. White Castle (2023) on accrual, and the 2024 SB 2979 amendment limiting recovery — the basis for the US biometric-litigation exposure). Primary (tier 1). https://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=3004
NIST — "FRVT Part 3: Demographic Effects (NISTIR 8280)" and the ongoing FRTE Demographic Effects report (error rates differ by sex, age, and race/region; many algorithms show elevated false positives for women, the elderly, children, and some groups — the basis for 'accuracy is not equal across faces'). Primary / standards body (tier 1). https://nvlpubs.nist.gov/nistpubs/ir/2019/NIST.IR.8280.pdf
NIST — "Face Recognition Technology Evaluation (FRTE) — 1:1 Verification and 1:N Identification" (ongoing benchmark; best 1:1 algorithms achieve false-non-match rates well under 1% on cooperative imagery; 1:N false-positive identification rates and their scaling with gallery size — the basis for the lab-accuracy figures and the 1:N scaling math). Primary / standards body (tier 1). https://pages.nist.gov/frvt/html/frvt11.html
Deng, J., et al. — "ArcFace: Additive Angular Margin Loss for Deep Face Recognition" (CVPR 2019, arXiv 1801.07698) (the margin-based embedding that pulls same-identity templates together and pushes different identities apart; 512-dimensional templates — the basis for the embedding/template step of the pipeline). First-party research (tier 3). https://arxiv.org/abs/1801.07698
American Civil Liberties Union — "Williams v. City of Detroit" and "More than a dozen wrongful arrests due to police reliance on face recognition" (the first publicly reported wrongful arrest from a false face-recognition match, 2020; the low-quality-probe / 1:N pattern; the Detroit cases involving Black residents; the settlement barring arrest on a face-recognition result alone — the basis for the real-world deployment-failure section). Institutional (tier 5). https://www.aclu.org/cases/williams-v-city-of-detroit-face-recognition-false-arrest
Office of the Texas Attorney General — "State of Texas v. Meta" ($1.4 billion settlement, 2024, under the Texas Capture or Use of Biometric Identifier Act (CUBI) for capturing facial-geometry data without consent — the largest single-state privacy settlement and the first CUBI enforcement). Primary / official (tier 1/2). https://www.texasattorneygeneral.gov/news/releases/attorney-general-ken-paxton-secures-14-billion-settlement-meta-over-its-unauthorized-capture