Michael Avdeev · Insights · 5 min read

57 Million Healthcare Records Breached in 2025. Most Were in Places Nobody Knew to Look.

A friend who runs IT security at a regional health system called me last fall. They’d just been hit by ransomware. Bad, but not catastrophic—they had backups.

Then came the worse news.

The attackers had exfiltrated data from a file server nobody remembered existed. A “temporary” archive created during an EHR migration in 2013. It held 12 years of patient records. SSNs. Diagnoses. Insurance information.

“We didn’t even know it was there,” she told me. “It wasn’t in any of our inventories.”

That’s the story behind the statistics.


The Numbers

The HHS Office for Civil Rights (OCR) reported 642 large healthcare breaches in 2025, each involving 500 or more individuals and together affecting nearly 57 million people. Better than 2024’s record-breaking 275 million. Still staggering.

But here’s what the breach statistics don’t capture: how many of those incidents involved data organizations didn’t know they had?

Ransomware operators don’t discriminate between “official” patient databases and forgotten file shares. They encrypt everything. They exfiltrate everything. The shadow data that falls outside normal security controls? That’s often the easiest target.


The $625,000 Question

Hypertension Nephrology Associates paid $625,000 last year to settle a class action after a breach. The lawsuit alleged they failed to implement “reasonable security protections” under HIPAA.

The key word is “reasonable.”

Courts and regulators increasingly interpret that to include knowing where PHI exists across your environment. Not just in clinical systems with formal controls. Everywhere.

A compliance officer I talked to put it simply: “If you can’t demonstrate you knew where patient data lived before a breach, good luck explaining why you didn’t protect it.”


The Usual Hiding Places

Every healthcare organization has PHI in places that fall outside the official inventory. I’ve seen it dozens of times:

Billing and administrative systems. Revenue cycle teams export data for analysis, reconciliation, reporting. Those exports land on network shares, local drives, cloud storage. SSNs, insurance information, diagnosis codes—scattered across the organization.

Research and analytics. Population health, clinical research, quality improvement. De-identification is supposed to happen, but partial datasets with identifying information frequently persist.

Email and collaboration tools. Staff email patient information to colleagues, external providers, sometimes patients. Attachments accumulate in mailboxes and cloud drives indefinitely. Nobody cleans them up.

Legacy system migrations. When you upgrade EHRs or merge systems, data migration creates copies. Old systems get decommissioned, but backup tapes and archive servers remain. Complete patient records, sitting unprotected.

Third-party integrations. Clearinghouses, billing services, transcription vendors, health information exchanges. They all receive PHI. When those relationships end, the data doesn’t always come back.


Why Detection Takes So Long

IBM’s 2024 Cost of a Data Breach Report found breaches involving shadow data took 26.2% longer to identify and 20.2% longer to contain. Average cost: $5.27 million, significantly higher than breaches of managed data.

Makes sense. Security controls only cover what you know about. Monitoring, alerting, access management—all designed for the systems in your inventory. Shadow data falls outside the perimeter you’re actually watching.

Attackers understand this. Why target the hardened EHR when unprotected file shares contain the same patient information?


What OCR Wants to Know

OCR’s enforcement priorities have shifted. Recent investigations focus not just on breach response but on whether organizations had adequate visibility before incidents occurred.

The questions:

  • Did the organization maintain a complete inventory of systems containing PHI?
  • Were risk assessments conducted across all PHI locations—not just clinical systems?
  • Did technical safeguards extend to unstructured data repositories?
  • Were access controls appropriate for all PHI storage locations?

Organizations that can’t answer? They face both regulatory penalties and class action exposure. My friend’s health system is still dealing with the fallout.
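
Those questions can be turned into a simple gap check over whatever inventory you already keep. The sketch below is one hypothetical way to do it: the `PhiLocation` record, its field names, and the `gap_report` helper are all illustrative assumptions, not part of any OCR template or product. It covers the last three questions only. The first one, whether the inventory is complete, can't be answered from the inventory itself.

```python
from dataclasses import dataclass


@dataclass
class PhiLocation:
    """One entry in a PHI inventory (hypothetical structure, for illustration)."""
    name: str
    kind: str                   # e.g. "clinical system", "file server", "mailbox"
    in_risk_assessment: bool    # covered by the most recent risk assessment?
    technical_safeguards: bool  # encryption, monitoring, etc. in place?
    access_reviewed: bool       # access controls reviewed and appropriate?


def gaps(location):
    """Return the unanswered questions for a single location."""
    issues = []
    if not location.in_risk_assessment:
        issues.append("no risk assessment")
    if not location.technical_safeguards:
        issues.append("no technical safeguards")
    if not location.access_reviewed:
        issues.append("access controls not reviewed")
    return issues


def gap_report(inventory):
    """Map each location with at least one gap to its list of gaps."""
    report = {}
    for location in inventory:
        issues = gaps(location)
        if issues:
            report[location.name] = issues
    return report
```

Run against an inventory that includes a forgotten archive server, a report like this would list that server with every gap it has, which is the documentation trail an investigator will ask for.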


Two Scenarios

Consider the difference:

Scenario A: Ransomware encrypts a file server. During incident response, you discover 50,000 patient records you didn’t know existed. OCR investigates. You can’t demonstrate a risk assessment covering that server. Settlement: six figures plus mandatory corrective action.

Scenario B: Same ransomware, same file server. But you discovered those records six months earlier during a comprehensive scan. Documented the finding. Implemented access controls. Included it in your risk assessment. OCR investigates. You demonstrate reasonable security practices. Outcome: significantly reduced liability.

The breach might be unavoidable. The liability isn’t.


Scan Before OCR Does

Comprehensive PHI discovery means scanning:

  • Network file shares across all departments, not just clinical
  • Cloud storage including personal drives and shared folders
  • Email archives where PHI accumulates over years
  • Backup systems containing historical patient data
  • Development and test environments where production data shouldn’t exist

One scan across these locations often reveals PHI in dozens of unexpected places. Each one represents breach exposure—and HIPAA liability.
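
To make the idea concrete, here is a minimal sketch of what a first-pass scan of a file share could look like. It is an illustration under stated assumptions, not how Risk Finder or any other product works: the function name `scan_tree`, the two regular expressions, and the 1 MB read limit are all made up for the example. The patterns only match text that looks like an SSN or has the rough shape of an ICD-10 code, so expect false positives and misses.

```python
import os
import re

# Illustrative patterns only. Real classifiers add validation and context checks.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Rough ICD-10 shape: letter, digit, alphanumeric, optional dot plus 1-4 characters.
ICD10_LIKE = re.compile(r"\b[A-TV-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def scan_tree(root, max_bytes=1_000_000):
    """Walk `root` and return {path: match counts} for files with any hit.

    Only the first `max_bytes` characters of each file are read, and files
    are treated as text, so binary formats (PDF, DOCX) are not really covered.
    """
    findings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    text = handle.read(max_bytes)
            except OSError:
                continue  # unreadable file: skipped in this sketch
            counts = {
                "ssn_like": len(SSN_LIKE.findall(text)),
                "icd10_like": len(ICD10_LIKE.findall(text)),
            }
            if any(counts.values()):
                findings[path] = counts
    return findings
```

Calling `scan_tree` on the root of a mounted share returns the files worth a closer look, keyed by path. Everything runs locally, so the file contents never leave the machine doing the scan.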


What We Built

Risk Finder gives healthcare organizations the visibility HIPAA requires:

  • PHI-specific classifiers—ICD-10 codes, medical terms, NPI numbers, blood types
  • 150+ total classifiers running simultaneously
  • HIPAA-ready reports for compliance documentation
  • Local processing—PHI stays in your environment, no cloud dependencies
  • Flat-rate pricing—comprehensive scanning doesn’t blow your budget

Know where PHI lives before OCR asks.



57 million patients had their data exposed in 2025. Many of those records lived in places organizations didn’t know to protect. The first step to better security is knowing where your PHI actually exists.
