Published 2026-05-24 · 28 min read · By Nikolay Sapunov, CEO at Fora Soft
Why This Matters
If your product attaches a webcam to a person — a video conferencing tool, a telemedicine consultation app, an online proctored exam, a live-event check-in system, an AI security camera, a retail analytics dashboard, a dating app, a fitness app, a banking onboarding flow — you will at some point be asked to add a face feature. Sometimes the ask is innocent: blur the background only when a face is on screen; auto-frame the speaker; count how many people are in the room. Sometimes the ask carries enormous legal weight: identify who is on screen; flag a repeat visitor; verify the person at onboarding is the same person on the ID document; recognize the student who logged into the exam. Those two categories used to share the same code path. Since the AI Act prohibitions came into force on 2 February 2025 and the high-risk obligations land on 2 August 2026, they live in different worlds. The companies that confused the two — by scraping faces from the open web, by deploying real-time identification in public, by stitching emotion recognition into hiring or education — have already paid eight-figure fines. The companies that drew the line cleanly — detection in the front end, recognition behind a documented, gated, and registered backend, with everything written down — have not. This article shows you where the line is and how to draw it in the code.
What "Face Detection" And "Face Recognition" Actually Mean — And Why The Distinction Carries Legal Weight
People use face detection and face recognition as if they were two flavours of the same thing. They are not. They are two distinct tasks with two distinct outputs, two distinct compute profiles, and — in the European Union from 2026 onward — two distinct regulatory regimes.
Face detection is the task of finding all faces in an image and reporting their bounding boxes. The model is given an RGB frame and emits a list of rectangles, one per face. Some detectors also report five facial keypoints — left eye, right eye, nose, left mouth corner, right mouth corner — and a per-face confidence score, but at no point in the process does the detector know who any of those faces belong to. Detection works on every face the same way a person counter at a stadium gate works on every person the same way: it counts, it does not remember. Detection is what powers the auto-framing in a Zoom or Google Meet thumbnail, the face-locked autofocus on a phone camera, the trigger that tells your conferencing app when to engage background blur, the "smile shutter" on a point-and-shoot.
Face recognition is the task of taking a detected face and answering one of two further questions. The first form — face verification (also called 1:1 matching) — is the question "is this the same face as the one you have on file for this user?". You see this every time you unlock your phone with your face, or when a bank's onboarding flow checks that the selfie you just took matches the photo on your ID document. The second form — face identification (also called 1:N matching) — is the question "who is this person, out of these N people in our database?". You see this when a security camera flags a returning shoplifter, when a stadium gate identifies a season-ticket holder, when a missing-person search runs a captured face against a watchlist. Both forms work by passing the detected face through a recognition model that produces a fixed-length vector — typically 256, 384, or 512 floating-point numbers — called a face embedding. Two embeddings are then compared by cosine similarity or Euclidean distance, and a configurable threshold decides whether they came from the same person.
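The comparison step described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names `verify` and `identify` and the threshold value of 0.4 are assumptions made for the example, not values from any specific model or library. A real threshold has to be calibrated on your own data.

```python
import math
from typing import Mapping, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def verify(probe: Sequence[float], enrolled: Sequence[float],
           threshold: float = 0.4) -> bool:
    """1:1 verification: is the probe the same person as the enrolled template?"""
    return cosine_similarity(probe, enrolled) >= threshold


def identify(probe: Sequence[float], gallery: Mapping[str, Sequence[float]],
             threshold: float = 0.4) -> Optional[str]:
    """1:N identification: best gallery match at or above the threshold, else None."""
    best_id: Optional[str] = None
    best_score = threshold
    for person_id, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id
```

Note that the two functions differ only in what they compare against: `verify` needs a single stored template, while `identify` needs a whole gallery. That gallery is the "memory" that separates recognition from detection.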
The two tasks share an input — a webcam frame, a still photo, a CCTV stream — and a first stage — the face has to be found before it can be matched. They diverge after that. Detection has no memory; recognition is built entirely around memory, because the whole point of an embedding is that you saved an earlier copy of it.
This single architectural difference is the seam that the EU AI Act and the GDPR cut along. The General Data Protection Regulation treats face data as a special category of personal data under Article 9 only when it is processed "for the purpose of uniquely identifying a natural person" — exactly the task that face recognition performs. Anonymous face detection, used for counting visitors, for triggering an auto-frame, for blurring a background, does not match that definition and stays under the lighter regime of ordinary personal data under Article 6. The AI Act draws the same line, in the same place, with different language: detection is a feature, recognition is a biometric identification system. Article 5 of the AI Act prohibits some uses of biometric identification outright; Annex III classifies the rest as high-risk and imposes a documented, registered, fundamental-rights-assessed compliance regime on every deployment.
We will spend the rest of this article assuming you understand that distinction. Almost every engineering decision in a face-aware video product follows from it.
Figure 1. The detection-only zone and the recognition zone live on the same wire, but they answer two different questions and live under two different regulatory regimes. The seam is the embedding step.
The Production Model Stack — Detection, Alignment, Embedding, Liveness
A modern production face pipeline has four stages, not two. Each stage uses a different model, with different size, latency, and accuracy trade-offs. The choices you make at each stage drive both cost and compliance posture.
Stage 1 — Detection
The detector is the always-on stage. It runs on every frame, or at least every Nth frame, of the input stream. Its job is to find faces and to give every downstream stage a well-cropped, well-rotated, well-scaled face image. Two model families dominate production in 2026.
The first family is SCRFD from the InsightFace project. SCRFD is a single-stage anchor-free face detector accepted at ICLR 2022. The published model family ranges from SCRFD-0.5GF, a 0.5 gigaflop tiny model intended for mobile and edge inference, up to SCRFD-34GF, a 34 gigaflop model intended for accuracy-critical batch processing. On the WIDER Face hard subset (the standard academic benchmark, which contains small, occluded, side-profile and blurry faces) SCRFD-34GF reaches around 96 percent average precision and outperforms the previous-best detector TinaFace by 3.86 percent while running more than 3× faster on a GPU. SCRFD-2.5GF, the middle of the family, is the practical workhorse: it weighs 2.5 megabytes after ONNX export and quantization, runs in ~7 milliseconds per frame on a single CPU core, and supports the five-keypoint output that downstream alignment needs.
The second family is YOLOv5-Face, YOLOv8-Face, YOLOv10-Face, and YOLOv11-Face — the YOLO production lineage adapted to the face-detection task by adding five keypoint regression heads to the standard YOLO head. The YOLO-face variants are slightly less accurate than SCRFD on WIDER Face hard but slightly faster, and they inherit the rich ONNX and TensorRT tooling that ships with the YOLO ecosystem. For teams already running the YOLO production lineage for general object detection, YOLO-face is the path of least friction.
Below those two sits MediaPipe Face Detector, the same model that powers the face thumbnail in Google Meet. MediaPipe Face Detector is a single-shot multibox detector tuned for the mobile and web case. Its model card claims around 1 millisecond per frame on a Pixel 6 GPU, and it ships natively in the @mediapipe/tasks-vision package that we recommended in the background-blur lesson. MediaPipe Face Detector is the right pick when the face feature is in-browser, anonymous, and not used as a step toward recognition.
For most server-side or surveillance use cases, SCRFD-2.5GF is the default choice in 2026 because it sits at the sweet spot of size, speed, keypoint accuracy, and ecosystem (it ships with the InsightFace embedding models in the same repository, with matching alignment conventions).
Stage 2 — Alignment
Face embeddings are sensitive to scale, rotation, and centring. A face that is tilted twelve degrees to the right produces a different embedding than the same face held upright, and the cosine similarity between the two can drop enough to push a genuine verification attempt below the acceptance threshold. Alignment fixes this. The aligner takes the five keypoints emitted by the detector and applies a similarity transform (a 2-D affine that allows rotation, uniform scale, and translation, but not shear) so that the two eye centres land at fixed pixel coordinates in a canonical crop, typically 112×112 pixels for the InsightFace ArcFace family.
Alignment is cheap. The transform is six multiplications per output pixel plus a bilinear interpolation. On a modern CPU it costs less than a millisecond per face. The reason we keep flagging it is that skipping alignment is the single most common cause of poor face-recognition accuracy in production. Teams ship the detector, ship the embedder, see false rejection rates of 5 to 10 percent, blame the embedder, and discover after weeks of debugging that the detector's box was unaligned. The InsightFace face_align.norm_crop() helper does the right thing in seven lines of Python; use it.
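To make the similarity transform concrete, here is a minimal sketch that derives it from the two eye centres alone, using complex arithmetic. It is not the InsightFace implementation, which fits all five keypoints; the destination eye coordinates below are illustrative assumptions for a 112×112 crop, and the function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Matrix2x3 = Tuple[Tuple[float, float, float], Tuple[float, float, float]]

# Assumed canonical eye centres inside a 112x112 crop (illustrative values).
LEFT_EYE_DST: Point = (38.3, 51.7)
RIGHT_EYE_DST: Point = (73.5, 51.5)


def eye_similarity_transform(left_eye: Point, right_eye: Point,
                             dst_left: Point = LEFT_EYE_DST,
                             dst_right: Point = RIGHT_EYE_DST) -> Matrix2x3:
    """Return the 2x3 matrix that maps the detected eyes onto the canonical eyes.

    A similarity transform (rotation, uniform scale, translation) is
    z -> m * z + t in the complex plane, so two point pairs determine it.
    """
    s0, s1 = complex(*left_eye), complex(*right_eye)
    d0, d1 = complex(*dst_left), complex(*dst_right)
    if s0 == s1:
        raise ValueError("eye keypoints coincide")
    m = (d1 - d0) / (s1 - s0)  # rotation and scale
    t = d0 - m * s0            # translation
    return ((m.real, -m.imag, t.real), (m.imag, m.real, t.imag))


def apply_transform(matrix: Matrix2x3, point: Point) -> Point:
    """Apply the 2x3 affine matrix to a single (x, y) point."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)
```

The returned matrix has the 2×3 layout that image-warping routines such as OpenCV's `cv2.warpAffine` accept, so in practice it would be passed to such a routine together with the output size (112, 112) to produce the aligned crop.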
Stage 3 — Embedding
The embedding model takes an aligned 112×112 face crop and emits a fixed-length vector — typically 512 floats — that represents the identity. Two production families dominate.
The first is the ArcFace family, from the same InsightFace project as SCRFD. ArcFace was introduced at CVPR 2019 and remains the architectural baseline for production face recognition in 2026. The published model zoo includes ArcFace-MobileFaceNet (2 megabytes, intended for mobile), ArcFace-R50 (the 50-layer ResNet variant, ~95 megabytes, used in most server-side deployments), and ArcFace-R100 (the 100-layer variant, ~250 megabytes, used when accuracy is the top constraint). On the IJB-C benchmark — the standard academic identification benchmark, with 130,000 images of 3,500 identities — ArcFace-R100 reaches around 96 percent true acceptance rate at a 10⁻⁵ false acceptance rate, which is the threshold the NIST Face Recognition Technology Evaluation uses for its highest-accuracy tier.
The second is AdaFace, a 2022 refinement that adapts the margin in the loss function based on the difficulty of each training sample. AdaFace is roughly equivalent to ArcFace on clean benchmarks like LFW and slightly better on hard benchmarks like IJB-C and TinyFace, but the model topology and parameter count are the same as ArcFace, so the deployment cost is identical. For a green-field deployment in 2026, AdaFace is a sensible default; for a project that already runs ArcFace, the gain from switching is small.
Whichever family you pick, the embedding model is the point in the pipeline at which the GDPR Article 9 regime starts. A bounding box is not biometric data; a 512-float embedding that identifies a person is. Everything downstream of the embedder — storage, transmission, comparison, retention — is now operating on special-category personal data, and the data-protection design must respect that.
Stage 4 — Liveness / Anti-spoofing
A recognition system that does not check whether the face on screen belongs to a real, living person in front of the camera can be defeated by holding up a printed photograph or by replaying a deepfake video. The defence against this is liveness detection or face anti-spoofing. In 2026 the production pattern combines a passive model — a network that classifies a single frame as real or spoof — with an optional active challenge — a prompt that asks the user to blink, turn their head, or perform a specific gesture.
The passive model is trained on a dataset like CelebA-Spoof, which contains 625,537 images of 10,177 subjects with spoof images captured from 8 scenes using more than 10 sensors. A typical passive liveness model is a small MobileNetV3 or EfficientNet-B0 classifier that takes the same aligned 112×112 crop the embedder consumes and emits a real-or-spoof probability. The best published passive models in the 6th Face Anti-Spoofing Challenge reach an average classification error rate (ACER) of around 2.1 percent.
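For readers unfamiliar with the ACER metric, it is commonly computed as the mean of two error rates: the share of attacks accepted as real (APCER) and the share of real faces rejected as spoofs (BPCER). The sketch below pools all attack types into one APCER, which is a simplifying assumption; evaluation protocols may instead report APCER per attack type. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def liveness_error_rates(is_real: Sequence[bool],
                         predicted_real: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (APCER, BPCER, ACER) for a labelled liveness test set.

    is_real[i] is the ground truth (True = live face, False = spoof);
    predicted_real[i] is the classifier decision for the same sample.
    """
    if len(is_real) != len(predicted_real):
        raise ValueError("label and prediction lists must have the same length")
    attack_preds = [p for truth, p in zip(is_real, predicted_real) if not truth]
    real_preds = [p for truth, p in zip(is_real, predicted_real) if truth]
    if not attack_preds or not real_preds:
        raise ValueError("need at least one spoof and one live sample")
    # APCER: fraction of spoof samples wrongly accepted as live.
    apcer = sum(1 for p in attack_preds if p) / len(attack_preds)
    # BPCER: fraction of live samples wrongly rejected as spoof.
    bpcer = sum(1 for p in real_preds if not p) / len(real_preds)
    return apcer, bpcer, (apcer + bpcer) / 2
```

Reporting APCER and BPCER separately alongside ACER is usually more informative, because the two errors carry different costs: one lets an attacker in, the other locks a legitimate user out.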
For high-assurance scenarios — bank onboarding, government KYC, exam proctoring — passive liveness alone is not enough. The industry standard is iBeta Presentation Attack Detection Level 1 or Level 2 certification, an independent third-party test against printed photos, screen replays, 3-D printed masks, and silicone masks. iBeta certification is what payment processors and identity-verification platforms (Onfido, Jumio, Veriff, Persona) advertise when they sell their liveness products. If you are building such a system yourself, plan to license a certified vendor SDK; reproducing iBeta-grade defences in-house takes 12 to 24 months of focused work and a continuous adversarial-testing programme.
Sizing The Choice — A Comparison Table
| Pipeline component | Model | Size | Latency (single face, CPU) | When to pick it |
|---|---|---|---|---|
| Detector | MediaPipe Face Detector | 0.2 MB | ~3 ms | In-browser, anonymous detection, no recognition downstream |
| Detector | SCRFD-0.5GF | 1.1 MB | ~3 ms | Mobile or edge, recognition downstream needed |
| Detector | SCRFD-2.5GF | 2.5 MB | ~7 ms | Server-side or surveillance default |
| Detector | YOLOv8-Face | 5–6 MB | ~5 ms | Teams already on the YOLO stack |
| Aligner | InsightFace norm_crop | n/a | <1 ms | Always |
| Embedder | ArcFace-MobileFaceNet | 2 MB | ~5 ms | Mobile recognition |
| Embedder | ArcFace-R50 | 95 MB | ~15 ms | Server-side default |
| Embedder | ArcFace-R100 / AdaFace-R100 | 250 MB | ~35 ms | Accuracy-critical (banking, KYC, high-stakes ID) |
| Liveness | Custom MobileNetV3 + CelebA-Spoof | 5 MB | ~6 ms | Conferencing, e-learning |
| Liveness | iBeta-certified vendor SDK | n/a | varies | Banking, KYC, payments, gov ID |
Latencies are indicative figures for a single face on a modern x86 CPU core with ONNX Runtime; GPU figures are roughly 4–6× faster for the heavier models. The numbers cited per model match the figures published in the model cards, the original papers, and the InsightFace public benchmarks. Treat them as planning estimates, not as a substitute for measuring on your target hardware.
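Measuring on your own hardware does not need special tooling. The helper below is a minimal sketch using only the standard library; `measure_latency_ms` is a hypothetical name, and the warm-up and run counts are arbitrary defaults. It times any zero-argument callable, so in practice you would pass a closure that runs one inference call of the model under test.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(fn: Callable[[], object],
                       warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time repeated calls to fn and return median, p95, and max in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up calls let caches, lazy initialisation, and allocators settle.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }
```

The p95 figure tends to matter more than the median for an always-on detector, since occasional slow frames are what cause visible stutter.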
Figure 2. The four-stage face pipeline. The red line marks the seam between ordinary processing and special-category processing.
The EU AI Act, In Plain English, For Face Engineers
The EU AI Act is the European Union's horizontal regulation on artificial intelligence. The full text — Regulation (EU) 2024/1689 — entered into force on 1 August 2024. The obligations are phased in over three years. The phases that matter for any face feature are:
- 2 February 2025. Article 5 prohibitions took effect. After this date it is illegal in the EU to place on the market or use any AI system that falls into eight prohibited categories. Four of those eight categories touch face technology directly.
- 2 August 2025. Obligations on general-purpose AI model providers (the upstream foundation-model layer) took effect. This is mostly relevant for organizations that train or distribute large vision-language models.
- 2 August 2026. Obligations on high-risk AI systems under Annex III take effect. This is the deadline that matters for almost every face-recognition deployment in the EU. From this date, any organization that places on the market or puts into service a face recognition system in the EU must meet the full high-risk regime: risk management, quality management, technical documentation, post-market monitoring, conformity assessment, EU-database registration, transparency to deployers, and human oversight.
The Act is enforced by national authorities in each EU member state, coordinated by the European AI Office. Fines for the prohibited-practice categories reach 35 million euros or 7 percent of global annual turnover, whichever is higher. Fines for non-compliance with the high-risk obligations reach 15 million euros or 3 percent of global turnover. The prohibited-practice ceiling exceeds the GDPR's maximum fine (20 million euros or 4 percent of global turnover); the high-risk ceiling sits just below it.
The four prohibited face uses
Article 5(1) of the AI Act bans these face-related uses outright in the EU. There is no certification path, no opt-in mechanism, no industry safe harbour. Building or selling such a system is illegal.
- Untargeted scraping of facial images to build or expand face-recognition databases. Article 5(1)(e) explicitly names the Clearview pattern: crawling the web or harvesting CCTV footage to populate a face-search engine. This is what got Clearview AI fined 20 million euros by the French data-protection authority.
- Emotion recognition in workplaces and educational institutions. Article 5(1)(f) bans systems that infer the emotions of natural persons in the workplace or in education, with narrow exceptions for medical and safety reasons. A "candidate sentiment analyzer" in a hiring video pipeline, or a "student engagement scorer" in an online classroom, is illegal in the EU.
- Biometric categorization by sensitive attributes. Article 5(1)(g) bans systems that categorize natural persons using biometric data to infer race, political opinions, trade union membership, religious or philosophical beliefs, sexual orientation, or sex life. A "race detector" or "religion classifier" built on top of a face model is illegal.
- Real-time remote biometric identification in publicly accessible spaces for law enforcement. Article 5(1)(h) bans live face-recognition surveillance in public, with three narrow exceptions reserved for law-enforcement use only and under judicial authorization: targeted search for victims of abduction, trafficking, or sexual exploitation; prevention of an imminent threat to life or of a terrorist attack; and locating suspects of serious crimes punishable by at least four years' imprisonment. The exceptions require a fundamental-rights impact assessment under Article 27 and registration in the EU database under Article 49. A commercial product that places real-time face identification in a shopping mall, a stadium, or a train station, sold to private operators, does not fall under any exception and is illegal.
The high-risk regime for everything else
Face recognition that is not prohibited is high-risk by default. Annex III, point 1 of the AI Act classifies as high-risk any AI system intended to be used as a "remote biometric identification system" (with the narrow exception of biometric verification — the 1:1 unlock-your-phone case), for biometric categorization on sensitive attributes (when not outright prohibited), or for emotion recognition (when not outright prohibited).
A high-risk classification triggers nine concrete obligations. We summarize each, then describe the engineering work it implies.
1. Risk management system (Article 9). A continuous, documented process that identifies, analyzes, and mitigates risks across the system lifecycle. In engineering terms this means a written risk register, an owner per risk, mitigation evidence per risk, and quarterly re-review. Build it on whatever ISMS or quality-management foundation you already have; do not reinvent it.
2. Data governance (Article 10). Training, validation, and testing datasets must be relevant, representative, free of errors, and complete. For face systems this maps directly onto demographic fairness. The system must be tested for accuracy parity across age, gender, and skin tone — the NIST FRVT 1:1 verification track and the FRVT 1:N identification track publish exactly this evaluation, and a high-risk deployer should cite NIST FRVT or run the equivalent in-house.
3. Technical documentation (Article 11, Annex IV). A detailed dossier covering the system's intended purpose, architecture, datasets, performance metrics, risk management, monitoring plan, and instructions for use. This is the document a notified body or a national authority will demand during a conformity audit. Plan for 30 to 80 pages depending on system complexity.
4. Record-keeping (Article 12). Automatic logging of events relevant to the identification of risks. For a face-recognition system this means a tamper-evident log of every identification decision (match / no-match), every threshold change, every model version deployed, and every database modification.
5. Transparency to deployers (Article 13). The system must be accompanied by instructions for use that disclose accuracy, limitations, intended-use boundaries, and human-oversight requirements.
6. Human oversight (Article 14). The system must be designed so that a human can intervene, interpret outputs, and override decisions. For face identification this means a human-in-the-loop review of any consequential match before action is taken (no fully automated arrest, denial of entry, or contract termination on a face-match alone).
7. Accuracy, robustness, cybersecurity (Article 15). Quantified accuracy claims, resistance to adversarial inputs, and standard cybersecurity hygiene.
8. Conformity assessment (Articles 43, 44). A formal conformity assessment before placing on the market. For face systems, this is typically an internal conformity assessment under Annex VI, supported by a notified body only when harmonised standards are not yet available.
9. EU database registration (Article 49). The provider must register the system in the public EU database before placing it on the market. Deployers (the organizations that put the system into use, who may be different from the provider) must also register their use if they are public authorities.
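The tamper-evident log named under record-keeping (Article 12) can be approximated with a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is one possible approach under stated assumptions, not a prescribed format: the class name `AuditLog`, the event fields, and the in-memory list are all hypothetical, and a real deployment would persist entries to append-only storage.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: Dict[str, Any]) -> str:
    """Hash the previous entry's hash together with a canonical JSON of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained event log (in-memory sketch)."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, event: Dict[str, Any]) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry = {
            "event": event,
            "prev_hash": prev_hash,
            "hash": _entry_hash(prev_hash, event),
        }
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this only detects tampering if the latest hash is also recorded somewhere the log's operator cannot rewrite, so periodically publishing the head hash to a separate system is a sensible companion measure.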
Two other articles cut sideways through these obligations and matter for face systems:
- Article 27 — Fundamental rights impact assessment. Public bodies and private operators of high-risk biometric systems must conduct an FRIA before first use. The FRIA documents the categories of natural persons affected, the specific risks of harm, the human-oversight measures, and the governance arrangements. A bank deploying a face-verification onboarding flow must produce one before going live in the EU.
- Article 50 — Transparency obligations for certain AI systems. Even when not classified as high-risk, deployers of biometric categorization or emotion recognition (where not prohibited) must inform the natural persons exposed that they are interacting with such a system. Deployers of any system that generates or manipulates synthetic content (deepfakes) must disclose that the content is artificially generated.
Figure 3. Where your face feature lands in the AI Act risk hierarchy. The decision is binary at every level — there are no grey areas in the text.
The GDPR Layer Underneath The AI Act
The AI Act sits on top of the GDPR; it does not replace it. A face recognition system that complies with the AI Act still has to satisfy the GDPR. A face detection system that does not trigger the AI Act still has to satisfy the GDPR if the detection happens in the EU or processes EU residents' data.
The GDPR's treatment of face data has two prongs.
The first prong is the lawful basis under Article 6. Every processing operation needs a lawful basis. For a consumer-facing face feature, the most common bases are consent (Article 6(1)(a)), performance of a contract (Article 6(1)(b)), and legitimate interests (Article 6(1)(f)). Anonymous face detection (for auto-framing, for visitor counting, for background-blur triggering) can usually rely on legitimate interests after a balancing test that documents the necessity of the processing and the minimality of the data retained.
The second prong is the special-category condition under Article 9, which kicks in the moment the system processes biometric data for the purpose of uniquely identifying a person. Article 9 starts from a default prohibition: processing biometric data for identification is forbidden, unless one of ten exceptions applies. For private-sector deployments the only realistic exception is Article 9(2)(a), explicit consent. Explicit consent is a higher bar than ordinary consent under Article 7: it must be a clear affirmative act, in writing or by a recorded statement, naming the specific processing operation and its purposes, and revocable at any time without detriment.
Two well-known enforcement actions show the cost of getting either prong wrong. The Spanish supermarket chain Mercadona was fined 2.52 million euros in 2021 by the AEPD for deploying face recognition in stores against shoplifters; the AEPD found no valid legal basis and no proportionality. The French CNIL fined Clearview AI 20 million euros in 2022 for collecting biometric data from over 20 billion online photos without consent, with the AI Act's 2025 prohibition later confirming the practice is now banned outright.
There are five other GDPR provisions that bite hard on face systems and that engineers tend to underestimate:
- Data minimization (Article 5(1)(c)). Store the smallest representation that solves the task. For verification you do not need to retain the photo; you need the embedding only. For identification you may not need both the photo and the embedding, so keep only what the matching step actually uses. Throw away pixels as soon as you have the vector.
- Storage limitation (Article 5(1)(e)). Retention must be tied to purpose. A face embedding stored "in case we need it later" is a violation. Set a retention policy per use case; we typically default to 24 hours for ephemeral identification (e.g., a visitor pass), 90 days for fraud-investigation embeddings, and the duration of the account for verification embeddings tied to a user identity.
- Right of access (Article 15) and right to erasure (Article 17). A user must be able to ask for a copy of their embedding and to demand its deletion. The data model has to support per-user enumeration and deletion of embeddings, including in any vector index.
- Data protection impact assessment (Article 35). Mandatory for "systematic and extensive evaluation of personal aspects" and for "processing on a large scale of special categories". A face recognition system in production almost always triggers both — a DPIA is essentially a precondition.
- Article 22 — automated decision-making. A face-recognition output that produces "a decision based solely on automated processing... which produces legal effects concerning him or her or similarly significantly affects him or her" gives the data subject the right to human intervention. This dovetails with AI Act Article 14 — human oversight is required twice, once by each regulation.
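The storage-limitation and erasure points above translate into a data model that knows each embedding's purpose and owner. The following is a minimal in-memory sketch: the purpose names, the `EmbeddingStore` class, and its methods are hypothetical, and the retention periods simply mirror the defaults quoted in the storage-limitation item.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Retention per purpose; None means "keep for the lifetime of the account".
RETENTION: Dict[str, Optional[timedelta]] = {
    "ephemeral_identification": timedelta(hours=24),
    "fraud_investigation": timedelta(days=90),
    "account_verification": None,
}


@dataclass
class EmbeddingRecord:
    user_id: str
    purpose: str
    created_at: datetime
    vector: List[float]


class EmbeddingStore:
    def __init__(self) -> None:
        self._records: List[EmbeddingRecord] = []

    def add(self, record: EmbeddingRecord) -> None:
        # Refuse to store anything that has no declared retention policy.
        if record.purpose not in RETENTION:
            raise ValueError(f"no retention policy for purpose {record.purpose!r}")
        self._records.append(record)

    def export_for_user(self, user_id: str) -> List[EmbeddingRecord]:
        """Right of access: enumerate every embedding held for one user."""
        return [r for r in self._records if r.user_id == user_id]

    def erase_user(self, user_id: str) -> int:
        """Right to erasure: delete all of a user's embeddings; return the count."""
        before = len(self._records)
        self._records = [r for r in self._records if r.user_id != user_id]
        return before - len(self._records)

    def purge_expired(self, now: datetime) -> int:
        """Storage limitation: drop records older than their purpose allows."""
        def expired(r: EmbeddingRecord) -> bool:
            ttl = RETENTION[r.purpose]
            return ttl is not None and now - r.created_at > ttl

        before = len(self._records)
        self._records = [r for r in self._records if not expired(r)]
        return before - len(self._records)
```

If the embeddings also live in an approximate-nearest-neighbour index, the same erase and purge operations have to reach that index too; a deletion that only touches the primary table leaves the vector searchable.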
The two regulations interlock cleanly. Where the AI Act requires you to log decisions and run an FRIA, the GDPR requires you to log access and run a DPIA. Do both with one set of artefacts where possible; do not pretend either obligation supersedes the other.
A Reference Compliant Architecture
A face feature built for the EU in 2026 looks different from one built for the United States in 2020. The architectural pattern below is the one we use in Fora Soft projects. It is opinionated; it is not the only valid pattern, but it eliminates whole classes of compliance failure by construction.
The pattern has four guiding principles.
Detect on the client, recognize on the server. The detector runs in the browser or on the device with MediaPipe Face Detector or SCRFD-0.5GF in ONNX Runtime Web. The raw frame never leaves the device for detection-only features (background blur, auto-framing, presence detection). For features that require recognition, the client sends only the aligned 112×112 face crop to the server — and only after the user has given explicit consent for that specific operation. Pixels and bounding boxes do not flow into the recognition path unless the user has agreed.
Two storage zones with a one-way gate. Operational embeddings (for live calls, ephemeral identification) sit in a hot vector store with a 24-hour TTL. Persistent embeddings (for verification, for fraud investigation) sit in a separate cold store with explicit retention metadata per row and per-purpose access control. Code that reads from the cold store is gated behind a service that logs every access for the AI Act Article 12 audit trail.
No image retention by default. The pipeline produces an embedding and discards the source pixels in the same request. Retention of raw face images requires an explicit per-purpose flag, an explicit consent record, a retention period, and a deletion job. We have not seen a production use case in 2026 that justifies persistent raw-pixel retention outside of fraud-investigation forensics, and even there we use short-lived encrypted blobs.
Human-in-the-loop on every consequential match. No business decision — granting access, flagging fraud, denying a service — runs on a face match alone. A human reviewer sees the candidate match, the source frame, the matched template, and a structured confidence score before any action is taken. This is required by AI Act Article 14 and aligns with GDPR Article 22.
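One way to enforce the human-in-the-loop principle in code is to make "reviewed and confirmed" an explicit state that downstream actions must check. The sketch below is illustrative only; `MatchCandidate`, `ReviewStatus`, and `may_act_on` are hypothetical names, and a production review record would carry more context (source frame reference, matched template, timestamps).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass
class MatchCandidate:
    candidate_id: str
    score: float
    status: ReviewStatus = ReviewStatus.PENDING
    reviewer: Optional[str] = None

    def review(self, reviewer: str, confirmed: bool) -> None:
        """Record a human decision; a candidate can be reviewed only once."""
        if not reviewer:
            raise ValueError("a named reviewer is required")
        if self.status is not ReviewStatus.PENDING:
            raise RuntimeError("candidate has already been reviewed")
        self.reviewer = reviewer
        self.status = ReviewStatus.CONFIRMED if confirmed else ReviewStatus.REJECTED


def may_act_on(match: MatchCandidate) -> bool:
    """Business actions are allowed only after a human has confirmed the match."""
    return match.status is ReviewStatus.CONFIRMED and match.reviewer is not None
```

The point of the design is that a high similarity score alone never flips `may_act_on` to true; only a recorded reviewer decision does.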
In code, these four principles translate into a handful of architectural fixtures: a client-side detection package, a face-crop submission API that takes a consent token, a stateless embedding service that emits a vector and does not store pixels, a hot vector index with a TTL job, a cold vector store fronted by an access-logging service, and a review UI that surfaces matches for human confirmation. None of this is exotic; all of it is tedious to retrofit. Build it from day one.
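As an illustration of the face-crop submission API that takes a consent token, the sketch below binds a token to one user, one purpose, and an expiry time with an HMAC signature, and refuses to compute an embedding without it. Everything here is an assumption made for the example: the token layout, the function names, and the hard-coded key, which in a real system would come from a secrets manager.

```python
import hashlib
import hmac
import time
from typing import Callable, List, Optional

SECRET_KEY = b"example-key-do-not-use"  # hypothetical; load from a secrets manager


def _sign(message: str, key: bytes) -> str:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_consent_token(user_id: str, purpose: str, expires_at: int,
                        key: bytes = SECRET_KEY) -> str:
    """Create a token recording consent by one user for one purpose until expires_at."""
    if "|" in user_id or "|" in purpose:
        raise ValueError("'|' is reserved as the field separator")
    message = f"{user_id}|{purpose}|{expires_at}"
    return f"{message}|{_sign(message, key)}"


def check_consent_token(token: str, user_id: str, purpose: str,
                        now: Optional[int] = None, key: bytes = SECRET_KEY) -> bool:
    """Valid only if the signature, user, purpose, and expiry all check out."""
    parts = token.split("|")
    if len(parts) != 4:
        return False
    tok_user, tok_purpose, tok_expiry, signature = parts
    expected = _sign(f"{tok_user}|{tok_purpose}|{tok_expiry}", key)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return False
    if not tok_expiry.isdigit():
        return False
    current = int(time.time()) if now is None else now
    return tok_user == user_id and tok_purpose == purpose and int(tok_expiry) > current


def submit_face_crop(crop: bytes, token: str, user_id: str, purpose: str,
                     embed: Callable[[bytes], List[float]],
                     now: Optional[int] = None) -> List[float]:
    """Stateless embedding endpoint: no valid consent token, no embedding."""
    if not check_consent_token(token, user_id, purpose, now=now):
        raise PermissionError("missing or invalid consent token")
    # The crop is used once and never stored; only the vector is returned.
    return embed(crop)
```

Because the purpose is part of the signed message, a token issued for one feature cannot be replayed to unlock another, which keeps the detection-only path and the recognition path separated at the API boundary.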
Where Fora Soft Fits In
We have built face-aware features in conferencing, telemedicine, video surveillance, e-learning, and dating products since 2018. The pattern in this article is distilled from those projects. The most common ask we hear in 2026 is from product teams that started with anonymous detection in 2022, added a "remember this user" feature in 2023, and now need to reconcile that quietly-shipped recognition feature with the AI Act's August 2026 deadline. The fix is rarely technical and almost always organizational: a written DPIA, a written FRIA, a written retention policy, a written human-oversight protocol, a registered entry in the EU database, and a frank conversation with the customer success team about what to disclose to end users. We help with the architecture and the engineering; the legal text we hand to the client's data-protection officer to refine, not to write from scratch.
Three Pitfalls That Keep Sinking Face Deployments
The same three failures appear in compliance audits over and over. Recognize them in your own roadmap before an auditor does.
The first pitfall is using the same code path for detection and recognition. Teams write a FaceService that detects, aligns, embeds, and matches in a single call, then expose it to every product surface. The result: the auto-framing feature, which never needed recognition, ends up running through code that touches the embedding store, the consent log, and the audit trail. Every audit costs ten times more than it should. Split the code paths from day one. Detection is one service; recognition is another; nothing crosses without an explicit consent token.
The second pitfall is using a hosted vendor without reading the data-processing addendum. Many face APIs — including some popular ones — train their models on customer-submitted images by default, unless the customer explicitly opts out in writing. Even if your contract is compliant, the vendor's reference architecture may not be. Audit the vendor's data flow, demand a DPA that prohibits secondary use, and verify the vendor's own AI Act and GDPR posture before integrating.
The third pitfall is shipping liveness-free recognition behind a "secure" label. Verification without anti-spoofing is defeated by a printed photo. Marketing a face-unlock or face-onboarding feature without a documented liveness defence is a fraud-risk disclosure problem before it is a compliance problem; once the first deepfake bypass hits the news, it becomes both. If the use case is anything other than ephemeral identification in a controlled environment, ship liveness from day one and budget for an iBeta certification path.
Figure 4. The three failure modes we see most often in face-feature compliance audits.
Numeric Worked Example — When Recognition Pays For Itself
A short example to ground the model choices in cost. Suppose you operate a 10,000-employee enterprise video conferencing platform. You want to add face-verification login: a 1:1 match between the user's webcam frame and the photo on file, used as a second factor after the password.
The per-login compute cost: one detection (SCRFD-0.5GF, ~3 ms), one alignment (~0.5 ms), one embedding extraction (ArcFace-MobileFaceNet, ~5 ms), one cosine similarity (negligible). Total per login: ~9 milliseconds of CPU.
The per-user storage cost: 1 embedding × 512 floats × 4 bytes = 2,048 bytes = 2 KB per user. Across 10,000 users: roughly 20 megabytes. The vector store fits in RAM on the smallest cloud instance.
The per-login network cost: the 112×112 grayscale aligned crop is 12.5 KB; the embedding response is 2 KB. Round-trip well under 50 KB.
At 10,000 logins per day, the marginal compute cost is 0.025 CPU-hours per day, or roughly $0.03 per month on standard cloud pricing. The verification feature pays for itself the first time it blocks a credential-stuffing attack.
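The arithmetic above is easy to re-run with your own numbers. The sketch below uses the latencies, embedding size, and volumes from this example; the hourly CPU price of $0.04 is an assumption chosen to match "standard cloud pricing" and should be replaced with your provider's rate.

```python
# Figures from the worked example; the hourly price is an assumption.
DETECT_MS = 3.0
ALIGN_MS = 0.5
EMBED_MS = 5.0
EMBED_DIM = 512
BYTES_PER_FLOAT = 4
USERS = 10_000
LOGINS_PER_DAY = 10_000
PRICE_PER_CPU_HOUR_USD = 0.04  # assumed on-demand vCPU price


def per_login_ms() -> float:
    """CPU time for one 1:1 verification, ignoring the cosine step."""
    return DETECT_MS + ALIGN_MS + EMBED_MS


def store_bytes(users: int = USERS) -> int:
    """Size of the embedding store with one float32 embedding per user."""
    return users * EMBED_DIM * BYTES_PER_FLOAT


def cpu_hours_per_day(logins: int = LOGINS_PER_DAY, ms_per_login: float = 9.0) -> float:
    """Daily CPU time, using the rounded 9 ms figure from the text by default."""
    return logins * ms_per_login / 1000 / 3600


def monthly_cost_usd(days: int = 30) -> float:
    """Monthly compute cost at the assumed hourly price."""
    return cpu_hours_per_day() * days * PRICE_PER_CPU_HOUR_USD


print(f"per login: {per_login_ms():.1f} ms")
print(f"embedding store: {store_bytes() / 1e6:.2f} MB")
print(f"CPU-hours per day: {cpu_hours_per_day():.3f}")
print(f"monthly cost: ${monthly_cost_usd():.2f}")
```

With these inputs the script reports 8.5 ms per login (rounded to 9 ms in the text), a 20.48 MB store, 0.025 CPU-hours per day, and about $0.03 per month.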
Now consider the same architecture turned into 1:N identification: the system has to recognize which of the 10,000 employees just walked into a meeting room. Per inference: one detection, one alignment, and one embedding, plus 10,000 cosine comparisons. Assuming L2-normalized embeddings, each cosine comparison is a 512-element dot product, that is, 512 multiply-accumulates (1,024 floating-point operations); 10,000 comparisons come to roughly 5 million MACs, or about 10 million floating-point operations, which a modern CPU dispatches in 5–10 milliseconds with vectorized code. Total per inference: ~20 milliseconds. Storage and network are unchanged.
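A minimal sketch of the 1:N step, assuming L2-normalized embeddings so that cosine similarity reduces to a dot product. Counting one multiply and one add per dimension as a single multiply-accumulate, a 512-element dot product is 512 MACs, or 1,024 separate floating-point operations. The loop is written in plain Python for readability; a production system would run the same comparison as one vectorized matrix-vector product. The function names, the 0.5 threshold, and the synthetic gallery are illustrative assumptions, not values from any model card.

```python
import math
import random
from typing import List, Optional, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def identify(probe: Vector, gallery: List[Vector],
             threshold: float) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the best gallery match, or None below threshold."""
    best_index, best_score = -1, -1.0
    for index, reference in enumerate(gallery):
        # One dot product: len(probe) multiply-accumulates.
        score = sum(p * r for p, r in zip(probe, reference))
        if score > best_score:
            best_index, best_score = index, score
    if best_index >= 0 and best_score >= threshold:
        return best_index, best_score
    return None


def total_macs(gallery_size: int, dim: int = 512) -> int:
    """Multiply-accumulates for one probe against the whole gallery."""
    return gallery_size * dim


# Synthetic demo: 100 random unit vectors stand in for enrolled embeddings.
rng = random.Random(0)
DIM = 512
gallery = [l2_normalize([rng.gauss(0, 1) for _ in range(DIM)]) for _ in range(100)]

# A slightly perturbed copy of entry 42 stands in for a fresh webcam embedding.
probe = l2_normalize([x + rng.gauss(0, 0.01) for x in gallery[42]])

match = identify(probe, gallery, threshold=0.5)
print("best match:", match)
print("MACs for a 10,000-person gallery:", total_macs(10_000))
```

The same identify function called with a one-element gallery is the 1:1 verification case, which is the point of the comparison: the code barely changes between the two modes, only the size of the gallery does.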
The compute cost is barely different. The compliance cost is wildly different. The 1:1 verification case is excluded from Annex III, so it is not a high-risk AI system under the AI Act, although it still requires explicit consent under GDPR Article 9. The 1:N identification case is a high-risk AI system under Annex III and triggers the full nine-obligation regime above. The same 20 milliseconds of CPU moves the project from a two-week sprint to a six-month compliance programme. Choose accordingly.
What To Read Next
- SAM 2 for video — memory module, propagation, rotoscoping — the segmentation primitive that underpins face-region masking when you need a mask instead of a box.
- MediaPipe Selfie Segmentation V2 with WebGPU — production background blur — the in-browser segmentation pipeline that pairs naturally with anonymous face detection for auto-framing and presence.
- The shape of AI inside a video product — architecture map — the architectural primer that explains where face features hook into a typical video pipeline.
Talk To Us / See Our Work / Download
Need a face-aware feature built right for the EU? Talk to a video engineer. Want to see what we have shipped in conferencing, telemedicine, surveillance, and e-learning? See our case studies. Or download our EU AI Act face-recognition compliance checklist — a one-page printable with the twelve questions to answer before deploying a face feature in the EU.
References
- Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union, OJ L, 2024/1689, 12.7.2024. Articles 5, 6, 9, 10, 11, 12, 13, 14, 15, 27, 43, 44, 49, 50; Annex III, point 1. https://eur-lex.europa.eu/eli/reg/2024/1689/oj — Tier 1 (official EU regulation).
- Regulation (EU) 2016/679 of the European Parliament and of the Council (General Data Protection Regulation). Articles 5, 6, 9, 15, 17, 22, 35. https://eur-lex.europa.eu/eli/reg/2016/679/oj — Tier 1.
- European Data Protection Board (EDPB), Guidelines 05/2022 on the use of facial recognition technology in the area of law enforcement (Version 2.0, adopted 26 April 2023). https://www.edpb.europa.eu/system/files/2023-05/edpb_guidelines_202304_frtlawenforcement_v2_en.pdf — Tier 1 (EU-level supervisory authority).
- EU AI Act Explorer / artificialintelligenceact.eu, Article 5 — Prohibited AI Practices. https://artificialintelligenceact.eu/article/5/ — Tier 3 (high-quality annotated source).
- Deng, J., Guo, J., Ververas, E., Kotsia, I., Zafeiriou, S. "RetinaFace: Single-Stage Dense Face Localisation in the Wild." CVPR 2020. arXiv:1905.00641. https://arxiv.org/abs/1905.00641 — Tier 5 (peer-reviewed).
- Guo, J., Deng, J., Lattas, A., Zafeiriou, S. "Sample and Computation Redistribution for Efficient Face Detection." ICLR 2022. arXiv:2105.04714. https://arxiv.org/abs/2105.04714 — Tier 5 (SCRFD original paper).
- InsightFace project, SCRFD model zoo and README, deepinsight/insightface (GitHub). https://github.com/deepinsight/insightface/tree/master/detection/scrfd — Tier 4 (reference implementation maintained by the paper's authors).
- Deng, J., Guo, J., Xue, N., Zafeiriou, S. "ArcFace: Additive Angular Margin Loss for Deep Face Recognition." CVPR 2019. arXiv:1801.07698. https://arxiv.org/abs/1801.07698 — Tier 5.
- Kim, M., Jain, A. K., Liu, X. "AdaFace: Quality Adaptive Margin for Face Recognition." CVPR 2022. arXiv:2204.00964. https://arxiv.org/abs/2204.00964 — Tier 5.
- Yang, S., Luo, P., Loy, C. C., Tang, X. "WIDER FACE: A Face Detection Benchmark." CVPR 2016. arXiv:1511.06523. https://arxiv.org/abs/1511.06523 — Tier 5 (standard benchmark used by SCRFD and YOLO-face).
- Zhang, Y., Yin, Z., Li, Y., Yin, G., Yan, J., Shao, J., Liu, Z. "CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations." ECCV 2020. arXiv:2007.12342. https://arxiv.org/abs/2007.12342 — Tier 5.
- NIST Face Recognition Technology Evaluation (FRTE, formerly the Face Recognition Vendor Test, FRVT) program, ongoing benchmarks for 1:1 verification and 1:N identification across demographic groups. https://www.nist.gov/programs-projects/face-recognition-technology-evaluation-frvt — Tier 1 (US standards body benchmark).
- iBeta Quality Assurance, Presentation Attack Detection (PAD) testing per ISO/IEC 30107-3. https://www.ibeta.com/iso-30107-3-presentation-attack-detection-confirmation/ — Tier 3 (industry-standard certification scheme).
- CNIL decision SAN-2022-019 (Clearview AI), 20 million euro penalty for processing biometric data without legal basis. https://www.cnil.fr/en/facial-recognition-20-million-euros-penalty-against-clearview-ai — Tier 1 (national supervisory authority decision).
- Agencia Española de Protección de Datos (AEPD) decision PS/00120/2021 (Mercadona), 2.52 million euro penalty for facial-recognition deployment without lawful basis. https://www.aepd.es/es/documento/ps-00120-2021.pdf — Tier 1.
- European Commission, draft Guidelines on the implementation of high-risk AI systems under the EU AI Act (issued 2026). https://digital-strategy.ec.europa.eu/en/policies/ai-act — Tier 1 (EU-level guidance).
- MediaPipe Face Detector, model card and Tasks API documentation. https://ai.google.dev/edge/mediapipe/solutions/vision/face_detector — Tier 3 (Google AI Edge documentation, the vendor's own model card).
Per the standards-source hierarchy in our editorial rules, where the AOMedia, NIST, EDPB, or AI Act text disagrees with a secondary source, this article followed the primary text and noted the discrepancy in-line.


