HIPAA De-Identification: Safe Harbor vs Expert Determination

This is engineering guidance, not legal advice. Confirm specifics with qualified counsel.

Why this matters

Every product decision you will make — which screen loses patients before their first visit, which onboarding step drives churn, whether the new waiting room cut no-shows — depends on analytics, and every analytics event in a telemedicine product starts life as PHI. Most teams bolt on the same stack they used at their last startup: Google Analytics, a Meta pixel for ad attribution, a session-replay tool, a crash reporter. In healthcare, that default stack is a reportable breach waiting to be noticed, and regulators have been noticing: the FTC's health-data enforcement wave, the HHS Office for Civil Rights (OCR) tracking-technology bulletin, and state laws like Washington's My Health My Data Act all target exactly this pattern. This article is for the founder, product manager, or compliance lead who needs product metrics without betting the company on them: when health data legally stops being PHI, what Safe Harbor and Expert Determination actually require, and how to build an analytics pipeline that an auditor will sign off on.

Why analytics and HIPAA collide

Product analytics, as an industry, is built on one assumption: behavioral data is free. You drop a JavaScript snippet or a mobile software development kit (SDK) — a vendor's pre-packaged code that runs inside your app — and it collects everything: every page URL, every tap, the user's IP address, device identifiers, and a persistent user ID to stitch sessions together. The tool's business model assumes it may keep that data, enrich it, and sometimes use it for its own purposes.

The Health Insurance Portability and Accountability Act (HIPAA) — the US law protecting health information that can be tied to a person — is built on the opposite assumption. Health data tied to an identifiable person, called protected health information (PHI), may be used without the patient's permission only for treatment, payment, and health care operations (TPO), and it may be handled by an outside company only when that company signs a Business Associate Agreement (BAA) — the contract that binds a vendor to HIPAA's rules before it touches patient data (the BAA article covers it in depth).

Here is the part most teams miss, and it changes everything about analytics: in a telemedicine product, the fact that a person is a patient is itself PHI. An event named appointment_booked with a user ID and an IP address is not "product telemetry" — it is a disclosure that an identifiable person sought medical care. It does not matter that the event contains no diagnosis. The Privacy Rule's definition of individually identifiable health information covers the provision of health care to an individual, not just clinical content. The moment that event leaves your infrastructure for a vendor without a BAA, you have disclosed PHI impermissibly.

The good news, which most scare-content skips: HIPAA does not forbid analytics. Analyzing your own product's usage to improve quality is a textbook example of "health care operations" — the Privacy Rule's term for running your business, defined in 45 CFR §164.501 to include quality assessment and improvement. You can compute funnels, retention curves, and call-quality dashboards on full PHI all day long, provided the computation happens inside your compliance boundary: on your infrastructure or a vendor's that is covered by a BAA. The violation is never the analysis. It is who receives the data and what they may do with it.

"Secondary use" is the umbrella term for everything beyond the care itself: product analytics, marketing attribution, research, selling data, training artificial-intelligence (AI) models. Each secondary use must walk out of the PHI boundary through one of exactly three doors — and the doors have very different rules.

The three doors out of the PHI boundary

Think of your PHI boundary — the set of systems and vendors legally allowed to hold patient data, drawn in detail in the HIPAA overview article — as a building with three exits. Every byte of patient data that leaves must pass through one of them.

Door 1 — the BAA. The data stays PHI, full rules apply, and the recipient signs a Business Associate Agreement promising to use the data only for your purposes, safeguard it, and report breaches. This is the door for analytics vendors: a product-analytics platform with a signed BAA is simply part of your boundary. The minimum-necessary rule (45 CFR §164.514(d)) still applies — send the events the purpose needs, not the entire record.

Door 2 — the limited data set. A halfway form defined in 45 CFR §164.514(e): you remove 16 categories of direct identifiers (names, addresses beyond town/city/state/ZIP, phone numbers, emails, account numbers, device identifiers, IPs, URLs, biometrics, full-face images, and more) but may keep the two fields Safe Harbor takes away — dates and geography down to town and ZIP code. The data is still PHI. It may be shared only for research, public health, or health care operations, never marketing, and only under a data use agreement (DUA) — a contract in which the recipient promises not to re-identify or contact the individuals, to safeguard the data, to report misuse, and to bind its own subcontractors the same way (§164.514(e)(4)).

Door 3 — de-identification. The data is transformed until it no longer identifies anyone and there is no reasonable basis to believe it could (45 CFR §164.514(a)). De-identified data is not PHI at all — HIPAA's requirements stop applying to it (45 CFR §164.502(d)(2)), and you may send it to a non-BAA tool, publish it, or sell it without a HIPAA authorization. Because the prize is total freedom, the bar is high, and there are exactly two ways over it: Safe Harbor and Expert Determination.

[Figure: compliance-boundary diagram showing three exits for PHI: BAA vendor, limited data set under DUA, and de-identification.]
Figure 1. The three doors out of the PHI boundary. Doors 1 and 2 carry HIPAA with the data; only door 3 removes it, and it is the hardest to open.

One more exit exists on paper: a signed HIPAA authorization from each patient under 45 CFR §164.508, the route the consent article covers. For analytics at scale it is rarely practical — authorizations are specific, revocable one by one, and cannot be a blanket condition of treatment — so this article focuses on the three structural doors.

Safe Harbor: the 18-item checklist with teeth

The first de-identification method, called Safe Harbor (45 CFR §164.514(b)(2)), is mechanical by design: remove 18 listed categories of identifiers for the patient and for their relatives, employers, and household members, and confirm you have no actual knowledge that what remains could identify someone. No statistician required. The list, condensed:

names; geographic subdivisions smaller than a state (with a narrow ZIP exception below); all elements of dates except year (birth, admission, discharge, death) plus all ages over 89; telephone and fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate and license numbers; vehicle identifiers including plates; device identifiers and serial numbers; web URLs; IP addresses; biometric identifiers, including finger and voice prints; full-face photographic images and any comparable images; and any other unique identifying number, characteristic, or code.

Read that list as a product engineer and the problem with off-the-shelf analytics announces itself. Device identifiers, web URLs, IP addresses, date elements, and the catch-all "unique identifying code" are the default payload of nearly every analytics SDK, pixel, and session-replay tool on the market. A tool that auto-collects the visitor's IP address and the URL of the page they are on is collecting two Safe Harbor identifiers before you define a single custom event. A telemedicine session recording contains a voice print and a full-face image by definition (more on that below).

Three details in the list deserve a closer look, because each one breaks a common analytics assumption.

The ZIP code rule is arithmetic, not judgment. You may keep only the first three digits of a ZIP code, and only if the area those three digits cover contains more than 20,000 people by current Census data; otherwise the three digits become 000 (§164.514(b)(2)(i)(B)). Under the 2000 Census data cited in OCR's guidance, 17 three-digit areas failed the 20,000-person test, including 036 (parts of New Hampshire) and 893 (rural Nevada). The rule points at current Census data, so the list of restricted areas shifts with each census; a de-identification job built in 2020 and never revisited can quietly drift out of compliance.
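The ZIP rule is mechanical enough to sketch in a few lines. The function name and the restricted set below are illustrative assumptions: the set holds only the two three-digit areas named above, not the full list, and a real job would load the complete list derived from current Census data.

```python
# Illustrative, PARTIAL restricted set: only the two ZIP3 areas named in the text.
# A production job would load the full list derived from current Census data.
RESTRICTED_ZIP3 = frozenset({"036", "893"})


def safe_harbor_zip(zip_code: str, restricted: frozenset = RESTRICTED_ZIP3) -> str:
    """Return the three-digit prefix that may be kept, or "000" otherwise."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        return "000"  # fail closed on malformed input
    zip3 = digits[:3]
    return "000" if zip3 in restricted else zip3


print(safe_harbor_zip("10001"))       # keeps the prefix
print(safe_harbor_zip("89301-1234"))  # restricted area collapses to 000
```

Passing the restricted set as a parameter keeps the census refresh a data change rather than a code change, which is the part that tends to be forgotten.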

Dates are identifiers. Not just birth dates — all elements of dates directly related to the individual, including admission and visit dates, except the year. A Safe Harbor-de-identified dataset cannot contain visit_date: 2026-06-11; it can contain visit_year: 2026. For product analytics this is close to fatal: a daily-active-users curve, a Tuesday-versus-Saturday no-show analysis, a time-to-second-visit funnel — all are built from exact dates. Safe Harbor was designed for releasing research datasets, not for running a product, and this is the single clearest reason the compliant analytics pattern (below) keeps identified data inside the boundary instead of trying to de-identify the event stream.
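A minimal sketch of what the date rule does to a visit record, assuming hypothetical field names (visit_date, age, visit_type). It keeps only the year and folds ages over 89 into a single "90+" category, which is how the rule allows those ages to be retained.

```python
from datetime import date


def generalize_visit(record: dict) -> dict:
    """Drop exact date and age; keep year and a Safe Harbor-style age value."""
    out = {k: v for k, v in record.items() if k not in ("visit_date", "age")}
    out["visit_year"] = record["visit_date"].year
    # Ages over 89 are aggregated into one category instead of being kept exactly.
    out["age_band"] = "90+" if record["age"] >= 90 else str(record["age"])
    return out


visit = {"visit_date": date(2026, 6, 11), "age": 93, "visit_type": "follow_up"}
print(generalize_visit(visit))
```

Everything a daily or weekly chart needs is gone after this step, which is the practical argument for running those charts inside the boundary.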

A hashed user ID does not count. The final catch-all item — "any other unique identifying number, characteristic, or code" — comes with a companion rule on re-identification codes (45 CFR §164.514(c)): you may keep a code for re-linking records later only if the code is not derived from information about the individual and the translation mechanism is never disclosed. Running an email address or medical record number through SHA-256 produces a code derived from the identifier — it fails §164.514(c)(1) by construction, no matter how irreversible the hash is in practice. "We pseudonymized the user IDs" is not de-identification; it is PHI with extra steps. A random surrogate key, generated independently and mapped in a secured lookup table, is the compliant construction.
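The difference between a derived code and a random surrogate is easiest to see in code. This is a sketch only: the class name is hypothetical, and the in-memory dictionary stands in for a secured lookup table that would live inside the boundary with its own access controls.

```python
import secrets


class SurrogateKeyVault:
    """In-memory stand-in for a secured patient-to-surrogate lookup table."""

    def __init__(self) -> None:
        self._by_patient: dict[str, str] = {}

    def surrogate_for(self, patient_id: str) -> str:
        # The key comes from the OS random source, never from patient_id,
        # so the key itself carries no information about the individual.
        if patient_id not in self._by_patient:
            self._by_patient[patient_id] = secrets.token_hex(16)
        return self._by_patient[patient_id]


vault = SurrogateKeyVault()
key = vault.surrogate_for("mrn-0001")
print(key == vault.surrogate_for("mrn-0001"))  # stable for the same patient
```

A SHA-256 of the same identifier would be reproducible by anyone who holds the identifier; the surrogate is reproducible only by whoever holds the table.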

Field your analytics tool keeps by default	Safe Harbor identifier?	Why teams get it wrong
IP address	Yes — item (O)	"It's just network metadata" — the rule lists it explicitly
Page URL / referrer	Yes — item (N)	URLs encode visit type, clinic, sometimes appointment IDs
Device ID / advertising ID	Yes — item (M)	Persistent device identifiers are listed by name
Exact visit timestamp	Yes — item (C), date element	Only the year survives Safe Harbor
5-digit ZIP code	Yes — item (B)	Only ZIP3 over 20,000 population survives, else `000`
Hashed email / hashed MRN	Yes — item (R) + §164.514(c)	Derived codes fail the re-identification-code rule
Voice recording	Yes — item (P), voice print	Audio of the patient is biometric by rule
Video frame with the patient's face	Yes — item (Q)	"Full face … and any comparable images"
Free-text chat / transcript	Usually — names, dates inside	Identifiers live in the content, not the schema
Random session UUID (new each session)	No, if truly random	Safe only if not derived and not persistent across contexts

Table 1. The default analytics payload, mapped to 45 CFR §164.514(b)(2). Most of what an SDK collects "for free" is on the list.
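Table 1 can be turned into a rough payload audit. The property names below are hypothetical examples of what an SDK might emit, and the check looks only at property names, so it will not catch identifiers buried inside free text such as chat messages or transcripts.

```python
# Hypothetical property names mapped to the Safe Harbor item they fall under.
SAFE_HARBOR_ITEMS = {
    "ip": "(O) IP address",
    "page_url": "(N) web URL",
    "referrer": "(N) web URL",
    "device_id": "(M) device identifier",
    "advertising_id": "(M) device identifier",
    "timestamp": "(C) date element",
    "zip": "(B) geographic subdivision",
    "email_hash": "(R) derived unique code",
    "mrn_hash": "(R) derived unique code",
}


def audit_payload(properties: dict) -> dict:
    """Return the properties that match a listed identifier, with the item."""
    return {
        name: SAFE_HARBOR_ITEMS[name]
        for name in properties
        if name in SAFE_HARBOR_ITEMS
    }


event = {"event": "appointment_booked", "ip": "203.0.113.7", "page_url": "/visit/derm"}
print(audit_payload(event))
```

An empty result means only that no known property name matched; it is a first filter for review, not a de-identification finding.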

After the 18 removals there is a backstop: the actual knowledge test (§164.514(b)(2)(ii)). If you know a specific fact that would let remaining data identify someone — the dataset notes "occupation: state governor's press secretary," or a clinical narrative describes a patient's one-of-a-kind accident covered in local news — the data is not de-identified even though all 18 boxes are ticked. OCR's guidance is explicit that clinical free text is information-rich and risky; running a regex over structured columns while passing transcripts through untouched is not Safe Harbor.

[Figure: two-path diagram of HIPAA de-identification: Safe Harbor 18-identifier removal versus Expert Determination risk analysis.]
Figure 2. The two legal methods of de-identification under 45 CFR §164.514(b). Safe Harbor is mechanical and destroys analytic detail; Expert Determination preserves utility but requires a documented statistical determination.

Expert Determination: the statistical path

The second method, Expert Determination (45 CFR §164.514(b)(1)), trades the checklist for a professional judgment. A person "with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods" applies those methods, determines that the risk is very small that the data could be used — alone or in combination with other reasonably available information — by an anticipated recipient to identify an individual, and documents the methods and results. That documentation is what you hand OCR if they ever ask.

Four facts about this method surprise most product teams, all four straight from OCR's de-identification guidance (the controlling interpretation, issued November 26, 2012 and still current as of June 2026):

First, there is no magic number. OCR deliberately declines to define "very small" as a percentage. The expert sets the threshold in context: who receives the data, what other datasets they could reasonably link it to, what they would gain. The same dataset can be de-identified for one recipient and not for another.

Second, there is no licensed profession of de-identification experts. No certification exists; OCR looks at the person's actual training and experience with statistical disclosure methods. In practice this is a statistician, epidemiologist, or specialized consultancy, and their report — methods, assumptions, results — is the compliance artifact.

Third, determinations age. The guidance notes that experts commonly attach an expiration to their determination, because external data and computing power grow. A determination from 2021 says nothing about a 2026 release of "the same" data pipeline if the schema, volume, or recipient changed.

Fourth, the method exists because identifiability is a spectrum, and the guidance shows the math. The combination of year of birth, sex, and 3-digit ZIP is unique for roughly 0.04% of US residents — about 4 people in every 10,000, which an expert can often mitigate down to "very small." Swap in the full date of birth and the 5-digit ZIP, and the same three fields uniquely pin down over 50% of the US population. Same schema, three columns either way; one parameter choice separates a publishable dataset from a re-identification machine. This is the entire logic of Expert Determination: risk lives in the resolution of the data, and a statistician can dial resolution down — generalizing ages into 5-year bands, truncating ZIPs, shifting dates by a random per-patient offset — until risk is provably small while the dataset stays useful.
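The resolution effect can be shown on a toy scale. The four rows below are synthetic and the function names are illustrative; the sketch measures what share of records is unique on a set of columns, before and after generalizing. It is a way to see the idea, not a substitute for an expert's risk model, which also accounts for the recipient and the external data they could link to.

```python
from collections import Counter


def unique_fraction(records: list, keys: tuple) -> float:
    """Share of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


def generalize(r: dict) -> dict:
    """Lower resolution: full birth date to year, 5-digit ZIP to 3 digits."""
    return {"birth_year": r["dob"][:4], "sex": r["sex"], "zip3": r["zip"][:3]}


rows = [  # synthetic toy data
    {"dob": "1980-01-02", "sex": "F", "zip": "10001"},
    {"dob": "1980-07-15", "sex": "F", "zip": "10002"},
    {"dob": "1980-03-09", "sex": "F", "zip": "10003"},
    {"dob": "1975-05-05", "sex": "M", "zip": "94110"},
]
print(unique_fraction(rows, ("dob", "sex", "zip")))
print(unique_fraction([generalize(r) for r in rows], ("birth_year", "sex", "zip3")))
```

On these rows every record is unique at full resolution and only one of four remains unique after generalizing, the same direction of change as the population figures above.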

That last technique matters for analytics: an expert can approve date shifting (every patient's events moved by a consistent random offset), which preserves intervals — time-to-follow-up, days-between-sessions — that Safe Harbor would destroy. When a telemedicine product actually needs to release event-level data (to a research partner, a data buyer, an AI vendor outside your BAA boundary), Expert Determination is usually the only door that preserves enough utility to be worth opening. Budget for it: a determination is a consulting engagement with a written report, re-validated on a schedule, not a library you npm install.
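Date shifting is simple to sketch. The class name and the 180-day window are assumptions for illustration; the window, and whether shifting alone is enough for a given release, are decisions for the expert making the determination. The offsets table must stay inside the boundary, since it reverses the shift.

```python
import secrets
from datetime import date, timedelta


class DateShifter:
    """Shift each patient's dates by one consistent random offset."""

    def __init__(self, max_days: int = 180) -> None:
        self._max_days = max_days
        self._offsets: dict[str, int] = {}  # keep this table inside the boundary

    def shift(self, patient_key: str, d: date) -> date:
        if patient_key not in self._offsets:
            # Uniform integer offset in [-max_days, +max_days].
            span = 2 * self._max_days + 1
            self._offsets[patient_key] = secrets.randbelow(span) - self._max_days
        return d + timedelta(days=self._offsets[patient_key])


shifter = DateShifter()
first = shifter.shift("p1", date(2026, 1, 3))
second = shifter.shift("p1", date(2026, 1, 10))
print((second - first).days)  # the 7-day interval survives the shift
```

Because the offset is per patient rather than per event, intervals within one patient's history are preserved while calendar dates are not.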

Why de-identification is fragile — and what that means for video

Both methods produce data with small risk, not zero risk, and the external world keeps moving. The landmark study here is Rocher, Hendrickx, and de Montjoye (Nature Communications, 2019): using generative models, the authors estimated that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes — and, more damning for the industry's habits, that even heavily sampled, incomplete datasets ("we only released 1% of the rows") do not provide the protection intuition suggests. Their conclusion: the release-and-forget model of de-identification is technically and legally inadequate for modern data richness.

For a telemedicine platform this lands in three places.

Session recordings cannot be de-identified as media. A video consult contains a full-face image (identifier Q) and a voice print (identifier P); the conversation itself is dense with names, dates, places, and the kind of one-of-a-kind detail the actual-knowledge test exists for. Blurring faces and pitching voices does not survive scrutiny — voice conversion is reversible in part, and the audio content still identifies. The honest engineering position: de-identified telemedicine video does not exist; de-identified derivatives of it do. Call-quality metrics, durations, reconnection counts, NLP-extracted symptom codes from a redacted transcript — those can pass through door 3. The recording itself stays inside the boundary, governed by the rules in the recording article.

Transcripts need redaction plus a risk assessment. Automated PHI-scrubbing of free text (now usually an NLP model) is necessary but not sufficient — redaction is a technique, not a legal status. A scrubbed transcript reaches de-identified status through Safe Harbor only if you can defend "no actual knowledge" on what slipped through, which for noisy automated redaction usually means a measured residual-error rate and an Expert Determination on top.

AI training is a secondary use like any other. Letting a vendor train models on your patients' data requires door 1 plus contract terms (a BAA whose permitted-uses clause actually covers training — most don't), door 3 (true de-identification, with the fragility above), or a signed authorization per patient. "The vendor pinky-promised they de-identify it on their side" hands your PHI to a non-BAA party first and de-identifies second — the disclosure already happened. The clinical-AI wiring is its own article; the rule of thumb here: a model vendor is a business associate, and de-identification done by the recipient is too late.

What you can and cannot send to a generic analytics tool

Now the question every team actually has: "so can I keep Google Analytics?" Walk it through the three doors.

Google does not sign a BAA for Google Analytics: Google's own help documentation says customers must not send data subject to HIPAA, and that policy has not changed for GA4 as of June 2026. No BAA means door 1 is closed. Event-level analytics data is not a limited data set you could justify (and an ad-tech recipient is not a research or operations recipient anyway), so door 2 is closed. And the default GA payload (IP, URLs, device identifiers, timestamps, persistent IDs) fails Safe Harbor on five counts simultaneously, so door 3 is closed. The same logic disqualifies any tool that will not sign a BAA: Meta pixel, TikTok pixel, most ad attribution SDKs, most free session-replay tiers. It is not that these tools are evil; it is that the data cannot lawfully reach them.

The regulators have spent three years making exactly this point:

Enforcement	Year	What happened	The analytics lesson
FTC v. GoodRx	2023	$1.5M civil penalty, first-ever enforcement of the FTC Health Breach Notification Rule	Sending medication and condition data to Facebook, Google, Criteo via pixels/SDKs was an unauthorized disclosure — a "breach" even with no hacker anywhere
FTC v. BetterHelp	2023	$7.8M to refund consumers; ban on sharing health data for ads	Mental-health intake answers + email/IP to ad platforms; claiming "HIPAA compliant" while doing it was a deception count of its own
FTC v. Cerebral	2024	$7M+ order; ban on using health data for advertising	Telehealth platform's tracking tools disclosed patient data of 3.2M users to third parties
OCR tracking bulletin + AHA v. HHS	2022–2024	OCR's bulletin on tracking tech, partially vacated by a federal court in June 2024	On unauthenticated public pages, a court rejected OCR's theory that IP + page visit alone is PHI; HHS dropped its appeal in September 2024. Inside the logged-in app and patient portal, the bulletin's position stands

Table 2. Three FTC actions and one court fight, 2022–2024. Every one of them is, at root, about analytics tooling on health products.

Two layers beyond HIPAA make the perimeter wider than most teams assume. The FTC Health Breach Notification Rule (16 CFR Part 318, amended effective July 29, 2024) covers health apps that are not HIPAA-covered entities — a direct-to-consumer wellness or cash-pay telehealth product without insurance billing can sit outside HIPAA and still owe the FTC breach notifications for unauthorized disclosures, GoodRx-style. And state law keeps moving: Washington's My Health My Data Act (effective March 31, 2024) requires separate consent to collect "consumer health data" at all, defines that term far more broadly than HIPAA defines PHI, includes a private right of action — and produced its first class actions in 2025, starting with one against Amazon over SDK data collection. New York's 2025 health-information privacy law follows the same arc. The EU's GDPR treats health data as a special category with its own, stricter anonymization bar — pseudonymized data explicitly remains personal data there (the global health-data article maps that terrain).

So the honest vendor question is never "is this tool popular?" but "will it sign a BAA, and does its data handling match the contract?" The current state of the market, snapshot June 2026 — re-verify at contract time, plan tiers change:

Tool	BAA available?	Notes for telemedicine use
Google Analytics 4	No	Google's policy: do not send HIPAA-covered data. Usable only on public marketing pages where no PHI flows
Meta / TikTok pixels	No	Ad-platform terms prohibit health data; FTC cases above are the case law
Mixpanel	Yes — Enterprise plan	Product analytics under BAA; configure IP handling, EU residency as needed
Amplitude	Yes — Enterprise plan	Comparable; BAA is a sales conversation, not a checkbox
PostHog	Yes — incl. self-serve	BAA covers product analytics, session replay, flags; self-host option exists
Piwik PRO	Yes	Positioned as the GA-shaped compliant alternative; data residency options
Matomo / Countly (self-hosted)	N/A — no vendor	Runs inside your own boundary; no BAA needed because no disclosure occurs
Freshpaint	Yes	A consented/filtered forwarding layer in front of non-compliant destinations

Table 3. Analytics tooling with the only column that matters first. "Yes" means a BAA is offered on some plan — confirm the plan, the scope, and the configuration in writing.

[Figure: map of PHI leak points in a telemedicine app: pixels, SDKs, session replay, crash logs, URLs, push notifications.]
Figure 3. Where telehealth analytics quietly leak PHI. None of these flows looks like "sharing patient data" in a sprint review, and every one of them is.

The compliant analytics pipeline

The pattern that survives audits is not "de-identify everything" — Safe Harbor's date rule makes that self-defeating — and not "BAA everything" either, because your marketing, BI, and data-science consumers legitimately live outside the clinical boundary. The pattern is a two-zone pipeline: identified events inside, aggregates outside.

Zone 1 — inside the BAA boundary. The app sends events to a first-party collection endpoint — yours, on your domain, not a vendor's script — over TLS. A schema contract enforces an event allowlist: every event name and property is registered, reviewed, and typed; anything not on the list is dropped at ingestion. Properties carry a pseudonymous surrogate ID (random, mapped to the patient only in a secured lookup table per §164.514(c) — never a hashed email). IP addresses are used for routing and rate-limiting, then truncated or dropped before storage; exact timestamps stay, because inside the boundary they are allowed to. From here, events flow to your warehouse and to a product-analytics tool under BAA (Table 3). Your funnels, cohorts, and call-quality dashboards — the join-time, reconnection, and quality-of-experience metrics a video platform lives on — all run here, on full detail, lawfully, as health care operations. Access is role-scoped and logged, which is the audit-logging article's territory.
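The ingestion step described above might look roughly like this. Event names, property names, and the truncation widths (/24 for IPv4, /48 for IPv6) are illustrative assumptions, not requirements from the rule; the point is that anything unregistered is dropped rather than stored.

```python
import ipaddress
from typing import Optional

# Hypothetical allowlist: event name -> {property name: expected type}.
EVENT_ALLOWLIST = {
    "appointment_booked": {"surrogate_id": str, "visit_type": str, "ts": str},
    "call_joined": {"surrogate_id": str, "join_ms": int, "ts": str},
}


def validate_event(name: str, props: dict) -> Optional[dict]:
    """Return only the registered properties, or None if the event is rejected."""
    schema = EVENT_ALLOWLIST.get(name)
    if schema is None:
        return None  # unregistered event name: dropped at ingestion
    clean = {}
    for key, expected in schema.items():
        value = props.get(key)
        if not isinstance(value, expected):
            return None  # missing or mistyped property: reject the event
        clean[key] = value
    return clean  # unregistered properties are never copied over


def truncate_ip(raw: str) -> str:
    """Zero the host part of an address before it is stored."""
    addr = ipaddress.ip_address(raw)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{raw}/{prefix}", strict=False).network_address)


print(validate_event("appointment_booked", {
    "surrogate_id": "a1", "visit_type": "follow_up",
    "ts": "2026-06-11T10:00:00Z", "email": "pat@example.com",
}))
print(truncate_ip("203.0.113.77"))
```

Rejecting rather than repairing a malformed event keeps the schema contract honest: a client that starts sending a new property shows up as a drop in volume, not as silent new data.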

Zone 2 — outside the boundary. Nothing event-level crosses. A scheduled aggregation job publishes numbers: weekly active patients, conversion rates, retention curves, NPS by cohort — each cell representing enough people that no one is identifiable. The standard discipline here is small-cell suppression: suppress or merge any cell built from fewer than some threshold of individuals. CMS suppresses any cell of 1–10 patients in its own public data files (the "cell size of 11" policy) — adopting the same threshold gives you an answer with a federal precedent when an auditor asks "why 11?". Marketing attribution gets conversion counts via privacy-preserving APIs or server-side uploads of suppressed aggregates — never user-level conversion events with identifiers.
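Small-cell suppression with the threshold of 11 fits in one function. The function name and the sample cohort labels are hypothetical. This sketch treats a count of zero as publishable, following the "1–10" reading above, and it does not handle complementary disclosure, where a suppressed cell could be recovered by subtracting published cells from a published total.

```python
MIN_CELL = 11  # cells of 1-10 individuals are suppressed


def suppress_small_cells(counts: dict, min_cell: int = MIN_CELL) -> dict:
    """Replace any count from 1 to min_cell - 1 with None before export."""
    return {
        cell: (n if n == 0 or n >= min_cell else None)
        for cell, n in counts.items()
    }


weekly_active = {"derm": 250, "psych": 10, "peds": 11, "rheum": 0}
print(suppress_small_cells(weekly_active))
```

Running this as the last step of the aggregation job, rather than in each dashboard, means there is one place to audit.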

[Figure: two-zone compliant analytics pipeline: first-party collection and BAA tools inside the PHI boundary, aggregates outside.]
Figure 4. The two-zone analytics pipeline. Event-level data, pseudonymous IDs, and exact timestamps never leave the BAA boundary; only suppressed aggregates do.

Run the arithmetic on what this pipeline is protecting you from. Take a mid-size telemedicine product: 40,000 active patients, a marketing team that added one conversion pixel to the booking-confirmation page. Every confirmation fires the pixel with IP, URL (which encodes the visit type), and a device identifier to an ad platform with no BAA:

40,000 patients × 1 impermissible disclosure each = 40,000 violations. At the lowest 2026 penalty tier (no knowledge, $145 minimum per violation): 40,000 × $145 = $5,800,000 in theoretical exposure — capped by regulation at $2,190,294 per provision per calendar year (HHS 2026 inflation-adjusted tiers; the cap assumes one provision violated, and OCR rarely sees just one). Add the breach machinery: individual notice to 40,000 people within 60 days, media notice in every state with 500+ affected, HHS notice, and — because trust is the product in healthcare — the churn that follows the email "we shared your appointment data with an advertising platform." Add the non-HIPAA layer where it applies: the FTC's GoodRx penalty was $1.5M plus a permanent ban on the conduct; Washington's MHMDA adds private lawsuits.
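The same arithmetic as a short script, so the figures can be rerun with your own patient count. The function name is illustrative; the defaults are the 2026 figures quoted above, and the calculation assumes a single provision violated at the lowest tier.

```python
def tier1_exposure(
    violations: int,
    per_violation: int = 145,      # lowest-tier minimum quoted above
    annual_cap: int = 2_190_294,   # per provision, per calendar year
) -> tuple[int, int]:
    """Return (uncapped, capped) theoretical exposure for one provision."""
    uncapped = violations * per_violation
    return uncapped, min(uncapped, annual_cap)


uncapped, capped = tier1_exposure(40_000)
print(f"uncapped: ${uncapped:,}  capped: ${capped:,}")
```

This covers only the HIPAA civil penalty line; notification costs, FTC exposure, and state-law claims sit on top of it and are not modeled here.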

One pixel. The two-zone pipeline costs a sprint or two of plumbing; the alternative is denominated in millions and in headlines.

The classic mistake: "anonymized" by adjective. The word "anonymized" appears in vendor pitches, internal tickets, and incident write-ups as if saying it makes it so. HIPAA recognizes no such state — it recognizes de-identified, reached by exactly two named methods, and everything else, which is PHI. A dataset with names dropped but visit dates, ZIP codes, and a hashed patient ID intact is PHI. An event stream "anonymized" by removing the email column but keeping the device ID is PHI. A "fully anonymous" session recording is two biometric identifiers in a trench coat. If a sentence about your data pipeline cannot cite §164.514(b)(1) or §164.514(b)(2), the data in that sentence is still PHI — write the ticket accordingly.

A governance note that costs nothing and prevents most regressions: make the event allowlist a reviewed artifact, like an API schema. A new analytics event is a pull request; the reviewer checks the properties against Table 1; marketing tags on public pages live in a separate tag-manager container that physically cannot load inside the authenticated app. The OCR bulletin's surviving guidance and the 2026 HIPAA Security Rule update's proposed asset-inventory and risk-analysis requirements (still an NPRM as of 2026-06-11 — status tracked in the 2026 Security Rule article) both point the same direction: know every script that runs in your app and every destination your data reaches. Most of the failure gallery in the common-mistakes article is an unreviewed tag.

Where Fora Soft fits in

We have built telemedicine platforms since 2005, and the analytics conversation arrives in every single one: the requirement first — no PHI beyond the BAA boundary, provable to an auditor — then the capability. In practice that means we ship the two-zone pipeline as part of the platform: first-party event collection behind an allowlisted schema, pseudonymous surrogate IDs done per §164.514(c) rather than hashes, product analytics under a signed BAA, and aggregate-only exports with small-cell suppression for the teams outside the boundary. For video specifically — our home turf across telemedicine, conferencing, and streaming — we instrument quality-of-experience metrics (join time, reconnections, bitrate, packet loss) so product decisions never need the recording itself. The result is a product team that moves at startup speed on metrics an OCR auditor can read without anyone's pulse rising.

Call to action

Talk to a telemedicine engineer: book a 30-minute scoping call to talk through your HIPAA de-identification and analytics plan.
See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
Download the Telemedicine De-Identification & Analytics Checklist — One page: the three doors out of the PHI boundary, the Safe Harbor identifier audit for analytics payloads, and the two-zone pipeline checks, with citations.

References

45 CFR §164.514 — De-identification standard; Safe Harbor identifiers; Expert Determination; re-identification codes; limited data sets; minimum necessary. eCFR, current as of 2026-06-08, read in full 2026-06-11. https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-E/section-164.514 (Tier 1)
HHS Office for Civil Rights, Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule (November 26, 2012) — expert qualifications, no fixed risk threshold, time-limited determinations, 0.04% vs >50% uniqueness figures, restricted ZIP3 list, actual-knowledge examples. https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html (Tier 1)
45 CFR §164.502 — Uses and disclosures: TPO permissions; §164.502(d)(2) de-identified information outside the Privacy Rule; §164.502(a)(5)(ii) sale of PHI prohibition. https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-E/section-164.502 (Tier 1)
45 CFR §164.501 — Definition of "health care operations" (quality assessment and improvement as permitted use). https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-E/section-164.501 (Tier 1)
HHS OCR, Use of Online Tracking Technologies by HIPAA Covered Entities and Business Associates (bulletin, December 2022, revised March 2024) — partially vacated as to unauthenticated public pages by AHA v. HHS, No. 4:23-cv-01110 (N.D. Tex., June 20, 2024); HHS withdrew its appeal September 2024. Bulletin status checked 2026-06-11. https://www.hhs.gov/hipaa/for-professionals/privacy/guidance/hipaa-online-tracking/index.html (Tier 1, with judicial limitation noted)
FTC, Health Breach Notification Rule, Final Rule, 89 FR 47028 (May 30, 2024), 16 CFR Part 318, effective July 29, 2024 — health apps outside HIPAA; unauthorized disclosure as "breach." https://www.federalregister.gov/documents/2024/05/30/2024-10855/health-breach-notification-rule (Tier 1)
HHS, HIPAA Security Rule To Strengthen the Cybersecurity of Electronic Protected Health Information, NPRM, 90 FR 800 (January 6, 2025), RIN 0945-AA22 — proposed rule; not final as of 2026-06-11. https://www.federalregister.gov/documents/2025/01/06/2024-30983/hipaa-security-rule-to-strengthen-the-cybersecurity-of-electronic-protected-health-information (Tier 1)
HHS, Annual Civil Monetary Penalties Inflation Adjustment, Federal Register (effective January 28, 2026) — 2026 HIPAA penalty tiers ($145–$73,011 per violation; $2,190,294 cap). https://www.federalregister.gov/documents/2026/01/28/2026-01688/annual-civil-monetary-penalties-inflation-adjustment (Tier 1)
FTC press releases and orders: GoodRx (February 2023, $1.5M HBNR civil penalty); BetterHelp (March 2023, $7.8M); Cerebral (April 2024, $7M+, ad-use ban); FTC-HHS joint letter to ~130 hospital systems and telehealth providers on tracking technologies (July 2023). https://www.ftc.gov/news-events/news/press-releases/2023/07/ftc-hhs-warn-hospital-systems-telehealth-providers-about-privacy-security-risks-online-tracking (Tier 2 — enforcing agency)
Rocher, L., Hendrickx, J.M., de Montjoye, Y-A., Estimating the success of re-identifications in incomplete datasets using generative models, Nature Communications 10:3069 (2019) — 99.98% re-identification with 15 attributes; inadequacy of release-and-forget. https://www.nature.com/articles/s41467-019-10933-3 (Tier 5 — peer-reviewed)
Washington State Attorney General, Protecting Washingtonians' Personal Health Data and Privacy (My Health My Data Act, ch. 19.373 RCW, effective March 31, 2024) — consent to collect; private right of action; first class actions filed 2025 (Amazon SDK suit, W.D. Wash., February 2025). https://www.atg.wa.gov/protecting-washingtonians-personal-health-data-and-privacy (Tier 2 — state enforcing authority)
Google, Google Analytics and HIPAA (help documentation) — Google does not sign a BAA for Google Analytics; customers must not send HIPAA-covered data. Checked 2026-06-11. https://support.google.com/analytics/answer/13297105 (Tier 4 — first-party vendor policy)
Mixpanel, HIPAA compliance at Mixpanel — BAA availability on Enterprise plans. Checked 2026-06-11. https://mixpanel.com/legal/mixpanel-hipaa/ (Tier 4 — vendor; verify plan terms at contract time)
PostHog, The best HIPAA-compliant analytics tools (2026) — market survey used for Table 3 BAA-availability rows, cross-checked against each vendor's own terms. https://posthog.com/blog/best-hipaa-compliant-analytics-tools (Tier 4/6 — vendor survey, orientation)
ResDAC / CMS, CMS Cell Size Suppression Policy — suppression of cells representing 1–10 beneficiaries in public CMS data products; precedent for the threshold of 11. https://resdac.org/articles/cms-cell-size-suppression-policy (Tier 2/3 — CMS-contracted documentation)

Where lower-tier sources disagreed with the rule text, the rule text won: several popular marketing-analytics guides describe hashing identifiers as de-identification — §164.514(c)(1) says otherwise, and this article follows the regulation.
