This is engineering guidance, not legal advice. Confirm specifics with qualified counsel.
Why This Matters
If your platform issues a grade, a certificate, or a credential, each one is worth only as much as the assessment behind it is hard to cheat. Anti-cheating is the control that protects that value, and it is the control teams most often get backwards: they spend the budget on intrusive monitoring that learners resent and lawyers question, while leaving the test itself trivially cheatable. This article is for the product manager, L&D director, founder, or instructional designer deciding how much integrity machinery to build, buy, or skip, and how to avoid the two expensive failure modes at once: a credential nobody trusts and a false-accusation scandal. We have built assessment, video, and real-time features for regulated products where what you flag and how you store it carries legal and reputational weight, and by the end you will be able to choose your layers, name the privacy questions, and put your effort where it actually pays off.
Three Jobs That Beginners Blur
The word "anti-cheating" hides three different jobs, and most failed integrity programs fail because they bought one and assumed they had all three.
Deterrence changes the decision before the exam. It is everything that makes an honest-but-tempted person choose not to cheat: a visible integrity statement, the knowledge that the session is monitored, a clear penalty, the simple friction of a locked-down screen. Deterrence prevents; it produces no evidence.
Detection acts during and after the exam. It is the locked browser's focus log, the plagiarism report, the proctor's flag, the answer-similarity analysis run across a cohort. Detection catches and documents; it does nothing to prevent the attempt, and every detection signal is a probability, never a verdict.
Assessment design changes the test itself so that cheating is hard, pointless, or impossible regardless of monitoring. A question drawn at random from a large bank, a problem with student-specific numbers, an open-book task that rewards judgment a search engine cannot supply — these defend integrity without watching anyone. Design is the slow, unglamorous layer, and it is the only one that does not erode as cheating tools improve.
The plain analogy is a shop guarding against theft. Deterrence is the camera dome on the wall that makes you think twice. Detection is the security tag that beeps at the door after the fact. Design is building the shop so the valuable thing is bolted down and useless once removed. A shop relying only on the beeping tag — only on detection — loses the goods and merely learns it was robbed. Most online-exam programs are exactly that shop.
Figure 1. The three layers, drawn by durability. Assessment design is the foundation because it resists the cheating arms race; deterrence and detection sit on top of it and weaken as tools improve. A program built only on the top layer is built on sand.
This article walks the three in the order you should trust them: detection (the most-sold, least-durable), deterrence (real but modest, and full of myths), and assessment design (the one that actually holds). Identity — proving the right person is taking the test at all — is a separate control covered in identity verification for assessments; behaviour monitoring by camera is covered in online proctoring. This piece is about everything else.
Detection 1: Browser Lockdown and Focus Signals
The most common technical control is the lockdown browser: a special application that takes over the screen during a test. Once a test launches inside it, the address bar disappears, the taskbar hides, copy-paste and printing are disabled, and operating-system shortcuts that switch applications (Alt-Tab on Windows, Command-Tab on macOS) are blocked. Respondus LockDown Browser, the best-known example in higher education, does exactly this: it does not merely detect tab switching, it prevents it inside the locked environment.
Layered alongside lockdown is focus and tab-switch detection, which web platforms can do even without a dedicated browser. The mechanism is a documented web standard. The browser's Page Visibility API, a W3C specification, fires a `visibilitychange` event and flips the `document.hidden` property to true when the exam tab stops being the visible tab. Paired with the window `blur` event, this tells the platform the moment a candidate clicks away to another window or application. A focus-monitoring script can also poll `document.hasFocus()` on a timer (every second, say), and each loss of focus is timestamped into an activity log: the time it happened and how long it lasted. A web page cannot see where the candidate went; recording what was attempted, such as a blocked shortcut, generally takes a dedicated lockdown client.
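```javascript
// Minimal focus / tab-switch logging using the W3C Page Visibility API.
// Records when the exam tab is hidden; the server stores the timeline.

// Hypothetical transport: buffer events locally. A real client would send them to the server.
const eventLog = [];
function logEvent(event) {
  eventLog.push(event);
}

document.addEventListener("visibilitychange", () => {
  if (document.hidden) {
    logEvent({ type: "tab_hidden", at: Date.now() }); // candidate left the tab
  } else {
    logEvent({ type: "tab_visible", at: Date.now() }); // candidate returned
  }
});

// The window lost focus (another application was clicked, for example); the tab may still be visible.
window.addEventListener("blur", () => logEvent({ type: "window_blur", at: Date.now() }));
```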
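The activity log described above records when focus was lost and for how long. As a rough sketch of the second half, the events from the logging snippet can be folded into "away" intervals on the server. The function name `awayIntervals`, the `sessionEnd` parameter, and the sample timeline are assumptions made for illustration, not part of any standard.

```javascript
// Hypothetical helper: fold a timeline of { type, at } events into "away" intervals.
// The event types mirror the logging snippet above; everything else is assumed.
function awayIntervals(events, sessionEnd) {
  const sorted = [...events].sort((a, b) => a.at - b.at);
  const intervals = [];
  let hiddenSince = null;
  for (const ev of sorted) {
    if (ev.type === "tab_hidden" && hiddenSince === null) {
      hiddenSince = ev.at; // start of an away period
    } else if (ev.type === "tab_visible" && hiddenSince !== null) {
      intervals.push({ start: hiddenSince, end: ev.at, durationMs: ev.at - hiddenSince });
      hiddenSince = null;
    }
    // window_blur events are ignored here; they could be tracked the same way.
  }
  if (hiddenSince !== null) {
    // The tab was still hidden when the session ended.
    intervals.push({ start: hiddenSince, end: sessionEnd, durationMs: sessionEnd - hiddenSince });
  }
  return intervals;
}

// Made-up sample timeline (timestamps in milliseconds).
const timeline = [
  { type: "tab_hidden", at: 1000 },
  { type: "window_blur", at: 1005 },
  { type: "tab_visible", at: 4000 },
  { type: "tab_hidden", at: 9000 },
];
const intervals = awayIntervals(timeline, 10000);
console.log(intervals);
```

Timestamps taken with `Date.now()` come from the candidate's clock, so a server would probably also want to record when each event arrived. The intervals describe what happened to the tab, not why.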
Here is the limit nobody selling lockdown leads with. These controls run on the candidate's own machine, and anything client-side can be bypassed. A lockdown browser stops a second tab on the same computer; it does nothing about the candidate reading answers off a second phone, a tablet propped beside the laptop, or another person in the room. Focus detection logs that the tab was hidden — it cannot see why, and an accidental notification or a screen-reader switching context produces the same blur event as cheating. So these tools are best understood as deterrence with a detection by-product: they raise the friction and the perceived risk, and they generate a log, but they are not proof and they do not stop the determined cheater with a spare device.
Common mistake: treating a focus-log flag as evidence of cheating. A `visibilitychange` event means the tab was hidden, nothing more. A push notification, an accidental swipe, or assistive technology can all trigger it, so a platform that ejects candidates automatically will eject some of them for an accidental gesture. The flag opens a question; a human, with context, answers it. Auto-failing on a blur event manufactures false accusations and appeals.
Detection 2: Plagiarism, Code Similarity, and Answer Collusion
The second detection family compares what was submitted against other sources. Three distinct tools, often confused:
Text-matching (the classic "plagiarism check," e.g. Turnitin's similarity report) compares a submission against the web, published works, and a database of other students' papers, and returns a similarity percentage with the matched passages highlighted. It is reliable at what it actually does — finding copied text — but a similarity score is not a guilt score. Correctly quoted and cited material, common phrases, and a bibliography all raise the percentage; the number is a starting point for a human reading, not a verdict.
Code similarity detection compares programming submissions for structural resemblance even after variables are renamed or lines reordered. Stanford's MOSS (Measure of Software Similarity) is the long-standing reference tool. It is well suited to spotting two students who shared one solution, because it compares structure rather than surface text.
Answer-collusion / response-similarity analysis looks across a whole cohort for statistically improbable agreement — for example, two test-takers who got the same unusual wrong answers in the same order. This is a powerful post-hoc signal precisely because it does not rely on watching anyone; it reads patterns in the response data your platform already stores. Done well, it surfaces a small ranked list of pairs for a human to examine, not an automatic charge.
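To make the response-similarity idea concrete, here is a deliberately simple sketch that counts identical wrong answers for every pair of candidates and returns a ranked list. The function names, the `minShared` threshold, and the sample data are all illustrative assumptions. A production analysis would also weight each match by how rare that wrong option is across the cohort, which this sketch does not attempt.

```javascript
// Count positions where both candidates gave the same WRONG answer.
function sharedWrongAnswers(a, b, key) {
  let shared = 0;
  for (let i = 0; i < key.length; i++) {
    if (a[i] == null || b[i] == null) continue; // skip unanswered items
    if (a[i] !== key[i] && a[i] === b[i]) shared++;
  }
  return shared;
}

// Compare every pair and return those at or above the threshold, most similar first.
function rankPairs(responses, key, minShared = 2) {
  const ids = Object.keys(responses);
  const pairs = [];
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      const shared = sharedWrongAnswers(responses[ids[i]], responses[ids[j]], key);
      if (shared >= minShared) pairs.push({ pair: [ids[i], ids[j]], shared });
    }
  }
  return pairs.sort((x, y) => y.shared - x.shared);
}

// Made-up answer key and cohort.
const answerKey = ["A", "B", "C", "D", "A", "B"];
const cohort = {
  s1: ["A", "C", "C", "A", "D", "B"],
  s2: ["A", "C", "C", "A", "D", "A"],
  s3: ["A", "B", "C", "D", "B", "B"],
  s4: ["B", "B", "C", "A", "A", "B"],
};
const reviewList = rankPairs(cohort, answerKey);
console.log(reviewList);
```

The output is a review list, not a finding: popular distractors, shared study groups, and small item counts can all produce matching wrong answers between honest candidates.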
Detection 3: AI-Writing Detectors — Handle With Extreme Care
Since generative AI became universal, the most-requested detection feature is an AI-writing detector that claims to tell whether text was written by a human or a model. Treat this category as the least reliable tool in the entire integrity stack, and design your policy around its failure rate, not its marketing.
The evidence is blunt. Independent testing through 2025 found false-positive rates for popular detectors ranging widely, many in the 15–45% band across genres such as academic and creative writing. The bias is worse than the average suggests: a Stanford study (Liang et al., 2023) found that detectors misclassified more than half of TOEFL essays written by non-native English speakers as AI-generated while rarely misflagging native writers, and about 97% of those essays were flagged by at least one detector. Even Turnitin, which reports a low ~4% sentence-level false-positive rate, acknowledges false positives exist, and by 2023 a series of universities (Vanderbilt, Northwestern, and others) had disabled AI detection entirely rather than accept the risk.
The arithmetic shows why this is dangerous at scale. Take a single assignment across a 600-student cohort in which every student wrote their own work, and a detector with a 4% false-positive rate per submission (a good number by 2025 standards):
False accusations per assignment: 600 students × 4% = 24 honest students
Twenty-four wrongly accused students per assignment, every assignment, disproportionately the non-native speakers your platform should serve best. Loosen the threshold to cut that and you miss more real cases; tighten it and you accuse more innocents. There is no setting that is both safe and strict, which is exactly why the only defensible policy is: an AI-detector score may never, on its own, accuse a student. Use it, if at all, as one weak signal that prompts a human conversation — alongside draft history and the student's ability to explain their own work — never as a verdict. The detection model internals belong to a different field; for how these classifiers work and why they fail, see the AI for Video Engineering section. This article covers the assessment policy decision.
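The same arithmetic can be run for any cohort size and detector. The sketch below assumes every submission is honest and that flags are independent between submissions, which is a simplification. The eight-assignment term is a made-up example, not a figure from any study.

```javascript
// Expected false accusations on one assignment, assuming every submission is honest.
function expectedFalseFlags(cohortSize, falsePositiveRate) {
  return cohortSize * falsePositiveRate;
}

// Chance that one honest student is flagged at least once over several assignments,
// assuming each assignment is an independent trial (a simplifying assumption).
function perStudentRisk(falsePositiveRate, assignments) {
  return 1 - Math.pow(1 - falsePositiveRate, assignments);
}

const perAssignment = expectedFalseFlags(600, 0.04); // the 24 students from the text
const termRisk = perStudentRisk(0.04, 8); // hypothetical term of 8 assignments, about 0.28
console.log(perAssignment, termRisk.toFixed(3));
```

Under these assumptions, an honest student facing eight checked assignments has roughly a 28% chance of being flagged at least once, which is why a score alone cannot carry an accusation.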
Figure 2. Detection records and documents but does not prevent; deterrence prevents but produces no evidence. Most controls do a little of both — and every detection signal is a probability that opens a human review, never a verdict that closes a case.
Deterrence: Real, Modest, and Full of Myths
Deterrence is the cheapest layer and the one most cluttered with discredited folklore. Two things are true at once: signalling that integrity matters does reduce cheating somewhat, and the single most-cited "trick" for doing it is fake.
Start with the fake one, because it is everywhere. The famous 2012 study claiming that asking people to sign an honesty pledge at the top of a form (before they fill it in) rather than the bottom sharply reduced dishonesty was retracted by PNAS in 2021 after investigators found the underlying field data had been fabricated, and independent replications had already failed. If you have read that "have them sign at the top" advice in an integrity guide — and it is in many — it rests on fraudulent data. Do not build a policy on it.
What does hold up is weaker and more boring. Double-blind randomized field experiments with real students taking real unproctored exams (for example, Zhao and colleagues, published in Contemporary Educational Psychology in 2023) found that reminding students of the integrity policy, of real prior cases, and of the concrete consequences of being caught measurably reduced cheating compared with no reminder. The effect is real but modest, and it points at a practical recipe: state the rule plainly, make the monitoring visible (a visible control deters more than a hidden one), and make the penalty credible. Deterrence works best as the framing around a well-designed test — not as a substitute for one.
The honest summary: deterrence buys you a discount on cheating, not immunity, and you should spend nothing on it that you cannot also defend on privacy grounds. Which brings the cost of the heavy controls into focus.
The Privacy and Fairness Cost Is Real
Every detection control collects data about a learner, and that data is regulated. A focus log, keystroke timing, a webcam feed, and behavioural analytics are personal data under the EU/UK General Data Protection Regulation (Regulation (EU) 2016/679); some of it (a face in a proctoring feed used to identify the candidate) can be special-category biometric data under GDPR Article 9. In the United States, exam records tied to a student fall under the Family Educational Rights and Privacy Act (FERPA, 20 U.S.C. § 1232g). The governing principle in GDPR, data minimisation in Article 5(1)(c), says collect only what the purpose needs. A locked browser and a focus log are far lighter on this scale than full-session camera-and-microphone monitoring, which is one more reason to reach for the design layer before the surveillance layer. The detailed legal map for this section is its own article: proctoring data, privacy, and the legal landscape.
Fairness compounds the privacy cost. The AI-detector bias against non-native speakers, the focus-log flag triggered by assistive technology, the lockdown browser that fights a screen reader — each turns an integrity control into an accessibility barrier for exactly the learners your platform should protect. The minimum bar is concrete and the same across every detection signal in this article: a flag triggers human review with context, there is a documented and fast appeal route, and nothing auto-fails a candidate.
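One way to make that minimum bar hard to bypass is to encode it in the case workflow itself, so that no code path leads from a flag straight to a penalty. The sketch below is a hypothetical state machine: the status names, the `actor` shape, and the transition table are assumptions chosen for illustration.

```javascript
// Hypothetical case workflow: statuses and transitions are illustrative assumptions.
// There is deliberately no transition from "flagged" to any penalty status.
const TRANSITIONS = {
  flagged: ["under_review"],
  under_review: ["cleared", "upheld"],
  upheld: ["appealed"],
  appealed: ["cleared", "upheld_on_appeal"],
};

function openCase(candidateId, signal) {
  return { candidateId, signal, status: "flagged", history: [] };
}

// Move a case to a new status. Only a human actor may do so, and only along allowed edges.
function advance(caseRecord, nextStatus, actor) {
  if (actor.kind !== "human") {
    throw new Error("only a human may change a case status");
  }
  const allowed = TRANSITIONS[caseRecord.status] || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`cannot move from ${caseRecord.status} to ${nextStatus}`);
  }
  return {
    ...caseRecord,
    status: nextStatus,
    history: [...caseRecord.history, { from: caseRecord.status, to: nextStatus, by: actor.id }],
  };
}

const flaggedCase = openCase("cand-42", { type: "tab_hidden", durationMs: 3000 });
const inReview = advance(flaggedCase, "under_review", { kind: "human", id: "reviewer-7" });
console.log(inReview.status, inReview.history);
```

The history array doubles as the audit trail an appeal would need; retention of that trail is itself a data-minimisation question.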
Assessment Design: The Layer That Actually Holds
Now the part that pays off. The most effective, most durable, and least invasive anti-cheating control is a test that is hard to cheat — and it costs no learner privacy at all. Five design moves, roughly from easiest to most ambitious.
Randomized question pools. Instead of one fixed paper, build a large bank and draw each candidate's exam at random from it. The dilution is dramatic, and the math is worth seeing. Suppose you draw 10 questions from a pool of 40. The number of distinct possible question sets is:
C(40,10) = 40! / (10! × 30!) = 847,660,528 possible papers
The chance that two specific students receive the identical ten questions is therefore about 1 in 847 million — and even when questions overlap, randomizing their order (10 questions = 3,628,800 possible orderings) makes "tell me the answer to number 7" useless. A leaked screenshot of one student's paper helps almost no one else. Item banks and randomized forms are standard features of assessment platforms and are formally supported by the 1EdTech Question and Test Interoperability (QTI) 3.0 specification, the standard for portable, interoperable assessment content.
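The pool arithmetic and the draw itself are short enough to sketch. The code below checks the combination count, draws two ten-question papers from a forty-question pool, and computes the average overlap between two candidates (10 × 10 / 40 = 2.5 shared questions, the mean of the hypergeometric distribution). The question IDs, the seed values, and the small seeded generator are assumptions for illustration; a real platform would more likely rely on its item-bank engine's own selection rules.

```javascript
// Exact binomial coefficient C(n, k), using BigInt so large pools do not overflow.
function combinations(n, k) {
  let result = 1n;
  for (let i = 1n; i <= BigInt(k); i++) {
    result = (result * (BigInt(n - k) + i)) / i;
  }
  return result;
}

// Small seeded, non-cryptographic generator so a candidate's draw can be reproduced in a review.
function seededRandom(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Partial Fisher-Yates shuffle: picks `count` distinct items in random order.
// Assumes count is no larger than the pool.
function drawExam(poolIds, count, seed) {
  const rand = seededRandom(seed);
  const pool = [...poolIds];
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(rand() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

const pool = Array.from({ length: 40 }, (_, i) => `Q${i + 1}`);
const paperA = drawExam(pool, 10, 1001); // seeds are hypothetical per-attempt values
const paperB = drawExam(pool, 10, 1002);
const possiblePapers = combinations(40, 10); // 847660528n, matching the figure above
const expectedShared = (10 * 10) / 40; // average questions two candidates have in common
console.log(paperA, paperB, possiblePapers, expectedShared);
```

Two candidates still share two or three questions on average, so pool size relative to exam length matters more than the headline number of possible papers.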
Parameterized / personalized variants. Go further: give every student the same problem with different numbers or a different dataset — a physics question where each candidate's mass and velocity differ, a programming task with a student-specific input file, a finance problem with personalized figures. The concept tested is identical; the answer is unique to each student. Sharing "the answer" becomes impossible because there is no single answer to share.
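A minimal sketch of a parameterized item follows, using the mass-and-velocity example as a momentum question. Each candidate's numbers are derived deterministically from a hash of their ID, so the grader can regenerate the expected answer without storing it. The item ID, the parameter ranges, and the `fnv1a` hashing choice are assumptions. A real system would probably mix in a server-side secret so candidates cannot predict each other's values.

```javascript
// 32-bit FNV-1a string hash: deterministic, fast, and not cryptographic.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Derive an integer parameter in [min, max] for one candidate, item, and parameter name.
function paramFor(studentId, itemId, name, min, max) {
  const h = fnv1a(`${studentId}|${itemId}|${name}`);
  return min + (h % (max - min + 1));
}

// Hypothetical item "phys-017": same concept for everyone, different numbers per candidate.
function momentumVariant(studentId) {
  const mass = paramFor(studentId, "phys-017", "mass_kg", 2, 20);
  const velocity = paramFor(studentId, "phys-017", "velocity_ms", 3, 15);
  return {
    prompt: `A cart of mass ${mass} kg moves at ${velocity} m/s. What is its momentum in kg·m/s?`,
    params: { mass, velocity },
    answer: mass * velocity, // momentum p = m * v
  };
}

const variantA = momentumVariant("student-001");
const variantB = momentumVariant("student-002");
console.log(variantA.prompt, variantB.prompt);
```

With these ranges there are only 19 × 13 = 247 distinct variants, so some candidates in a large cohort will receive identical numbers; wider ranges or more parameters reduce that.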
Application over recall. A question that asks the learner to recall a fact is answerable by any search engine or chatbot in seconds; a question that asks them to apply judgment to a novel scenario is not. Shift from "define X" to "here is a situation — diagnose it and justify your choice." This is the move that most directly defangs generative AI, because it rewards reasoning the model cannot copy from the prompt.
Open-book and authentic tasks. If the material is searchable anyway, stop pretending it is not. An open-book exam designed around analysis, or an authentic assessment that mirrors a real professional task (build the thing, present the case, solve the messy problem), measures capability that lookup cannot fake. You are no longer policing access to information; you are testing what the learner can do with it.
Process evidence and oral defence. For high-stakes written work, capture the process, not just the product: a tracked editor that records drafts, pauses, and revisions makes a fully pasted final document conspicuous, and a short recorded oral follow-up — "walk me through section 3 of your own essay" — is nearly impossible to fake for work you did not produce. Used as evidence rather than accusation, process signals are both fairer and harder to game than any AI detector.
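As a rough illustration of process evidence, the sketch below summarises an edit log into typed versus pasted characters. The event shape (`kind`, `chars`, `at`) is a hypothetical format for what a tracked editor might record, and the sample log is made up.

```javascript
// Hypothetical edit-log shape: { kind: "typed" | "pasted", chars: number, at: ms timestamp }.
function processSummary(editEvents) {
  let typed = 0;
  let pasted = 0;
  let largestPaste = 0;
  for (const ev of editEvents) {
    if (ev.kind === "pasted") {
      pasted += ev.chars;
      largestPaste = Math.max(largestPaste, ev.chars);
    } else if (ev.kind === "typed") {
      typed += ev.chars;
    }
  }
  const total = typed + pasted;
  return {
    typedChars: typed,
    pastedChars: pasted,
    largestPaste,
    pastedShare: total === 0 ? 0 : pasted / total, // fraction of inserted text that was pasted
  };
}

// Made-up log: a little typing around one very large paste.
const summary = processSummary([
  { kind: "typed", chars: 120, at: 0 },
  { kind: "pasted", chars: 2400, at: 60000 },
  { kind: "typed", chars: 80, at: 90000 },
]);
console.log(summary);
```

A high pasted share is a reason to ask for the oral walk-through, not a finding in itself: candidates legitimately paste quotations, their own notes, or text drafted in another editor.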
Figure 3. Match the controls to the stakes. Start from what a successful cheat would cost, build the design layer first because it is the most durable, and add deterrence and detection on top only as the stakes justify — never the reverse.
The table below is the build-vs-buy and effort summary. Note the pattern: the cheap-to-buy controls are the weakest and most invasive, and the durable controls are the ones you have to design.
| Control | Layer | Durability as cheating tools improve | Privacy cost | Build / buy |
|---|---|---|---|---|
| Lockdown browser | Detection + deterrence | Erodes (second device) | Low–medium | Buy |
| Tab/focus detection (Page Visibility API) | Detection | Erodes (second device) | Low | Buy or build |
| Text / code similarity (Turnitin, MOSS) | Detection | Stable | Medium | Buy |
| Answer-collusion analysis | Detection | Stable | Low (uses existing data) | Buy or build |
| AI-writing detector | Detection | Losing badly; high false positives | Medium | Buy with extreme caution |
| Integrity reminders / honor code | Deterrence | N/A (modest, real effect) | None | Build |
| Randomized question pools (QTI) | Design | Durable | None | Build / configure |
| Parameterized variants | Design | Durable | None | Build |
| Application / authentic tasks | Design | Most durable | None | Build (instructional design) |
Where Fora Soft Fits In
Fora Soft builds assessment, video, and real-time features for products in e-learning, telemedicine, and other regulated verticals where a flag can become a formal accusation and stored behaviour can become a legal exposure. In anti-cheating specifically, the value we add is rarely a better detector — it is the engineering judgment to wire commodity detection (lockdown, focus logging via the Page Visibility API, similarity and collusion analysis) into a clean review-and-appeal workflow where no score ever auto-accuses, to build the randomized-pool and parameterized-variant machinery that makes a test cheat-resistant by design, and to design the data flow so the integrity signals you keep are minimised, proportionate, and defensible. We help teams put the budget where it lasts — the design layer — instead of on surveillance that learners resent and counsel questions.
What to Read Next
- Online proctoring: approaches, trade-offs, and privacy — the camera-monitoring layer this article deliberately set aside.
- Identity verification for assessments — proving the right person is taking the test at all.
- Proctoring and assessment reference design — the full annotated blueprint that wires these controls together.
References
- W3C Page Visibility (`visibilitychange` event and `document.hidden` / `visibilityState`). World Wide Web Consortium. Tier 1 (web standard). https://www.w3.org/TR/page-visibility/
- 1EdTech (IMS Global) Question and Test Interoperability (QTI) 3.0 (portable assessment content, item banks, and selection/ordering). 1EdTech Consortium. Tier 1 (standard). https://www.imsglobal.org/spec/qti/v3p0/oview
- General Data Protection Regulation (GDPR), Article 5(1)(c) — data minimisation; Article 9 — special-category (biometric) data. European Union, Regulation (EU) 2016/679. Tier 1 (law). https://gdpr-info.eu/art-5-gdpr/
- Family Educational Rights and Privacy Act (FERPA), 20 U.S.C. § 1232g; 34 CFR Part 99. U.S. Department of Education. Tier 1 (law). https://www.ecfr.gov/current/title-34/subtitle-A/part-99
- Retraction of "Signing at the beginning makes ethics salient…" (Shu, Mazar, Gino, Ariely, Bazerman, PNAS 2012) — retracted 2021 for fabricated data. Retraction Watch. Tier 5 (primary retraction record). https://retractionwatch.com/2021/09/14/highly-criticized-paper-on-dishonesty-retracted/
- Evidence of fraud in the 2012 signing-at-the-top dishonesty field experiment. Data Colada (post 98). Tier 5 (forensic analysis). https://datacolada.org/98
- Effects of honor code reminders on university students' cheating in unproctored exams: a double-blind randomized controlled field study (Zhao et al., 2023). Contemporary Educational Psychology. Tier 5 (peer-reviewed). https://www.sciencedirect.com/science/article/abs/pii/S0361476X2300067X
- GPT detectors are biased against non-native English writers (Liang et al., 2023) — most TOEFL essays misclassified as AI-generated. Patterns / Stanford. Tier 5 (peer-reviewed). https://arxiv.org/abs/2304.02819
- Independent 2025 testing of AI-writing detectors — false-positive rates and genre/native-speaker bias. HasteWire study summary. Tier 6 (industry testing). https://hastewire.com/blog/study-false-positives-in-ai-detectors-exposed
- MOSS (Measure of Software Similarity) — structural code-similarity detection. Stanford University. Tier 4 (first-party tool). https://theory.stanford.edu/~aiken/moss/
- Universities disabling Turnitin AI detection over false-positive risk (Vanderbilt, Northwestern, and others, 2023). Reporting / institutional statements. Tier 6 (reporting). https://www.vanderbilt.edu/brightspace/2023/08/16/guidance-on-ai-detection-and-why-were-disabling-turnitins-ai-detector/
Where popular integrity guides repeat the "sign at the top" pledge advice and vendor marketing claims "accurate AI detection," this article follows the primary record (the 2021 PNAS retraction and the peer-reviewed false-positive and bias evidence) and treats the popular claims as discredited or unproven.


