Michael Avdeev · Insights · 5 min read

How to Conduct Data Due Diligence for Healthcare M&A

Consider this composite case—a pattern I see regularly in healthcare M&A.

A PE firm acquires a 40-physician dermatology practice. Standard rollup play—buy, consolidate, optimize, repeat.

Three weeks post-close, the IT integration team finds a network drive nobody mentioned during due diligence. On it: 15 years of patient records from a practice acquired years earlier. Scanned intake forms. Insurance EOBs. Lab results. Hundreds of thousands of patients.

The practice had been “sold” multiple times. The data followed each time. Nobody ever cleaned it up.

The question every acquirer eventually asks: “How do we make sure this doesn’t happen again?”

The answer isn’t complicated. But almost nobody does it.


Healthcare M&A Is Booming—And So Is the Risk

PE healthcare deals hit $115 billion globally in 2024—the second-highest year on record. In specialties like dermatology, ophthalmology, and gastroenterology, PE involvement now exceeds 30%. Dental, behavioral health, urgent care—if it’s a fragmented specialty, someone’s rolling it up.

The problem: healthcare practices are data disasters.

  • EMR migrations leave orphaned databases
  • Scanned paper records live on random file shares
  • Billing exports get saved “just in case”
  • Staff turnover means nobody remembers where things are
  • Fifteen years of HIPAA exposure sitting in a closet server

When you buy a practice, you buy all of it. Including the PHI nobody told you about.


What Traditional Due Diligence Misses

Standard M&A cyber due diligence looks at:

  • Security policies and procedures
  • Incident history
  • Network architecture
  • Vendor relationships
  • Insurance coverage

What’s missing? A complete inventory of where PHI actually lives.

Not where the target thinks it lives. Where it actually lives.

I’ve seen practices pass due diligence checklists with flying colors—then turn up 2TB of unencrypted patient data on a shared drive labeled “OLD STUFF DO NOT DELETE.”

The checklist didn’t ask. Nobody looked.


The Healthcare-Specific Data Risks

Healthcare data has unique characteristics that make it particularly dangerous in M&A:

1. PHI Has Long Tails

Unlike credit card numbers (which expire), medical record numbers, diagnoses, and treatment histories stay sensitive indefinitely. That 2009 patient file? Still PHI in 2026, and still a HIPAA liability if it sits unprotected.

2. Paper-to-Digital Transitions Created Chaos

When practices went digital, they often scanned everything and dumped it on file shares. The paper got shredded. The digital copies were never organized, classified, or governed.

3. Multiple EMR Migrations

A practice that’s been around 20 years might have used 3-4 different EMR systems. Each migration left data behind. Exports, backups, “just in case” copies.

4. Departed Physicians Take (and Leave) Data

Physicians who left the practice might have copied patient panels to personal drives. Physicians who joined brought data from previous practices. It’s all intermingled.

5. HIPAA Liability Follows the Data

When you acquire a practice, you become responsible for all PHI—including data you didn’t know existed. OCR doesn’t care that you just bought the place.


How to Actually Conduct Healthcare Data Due Diligence

Here’s the process I recommend for healthcare M&A:

Pre-Close: Discovery Scan

Before the deal closes, run a comprehensive data discovery scan across:

  • File shares: Network drives, NAS devices, SharePoint
  • Cloud storage: Box, Dropbox, Google Drive, OneDrive
  • Legacy systems: Old servers, archived databases, backup tapes
  • Endpoints: Workstations, especially those of long-tenured staff
  • Email: PST archives, shared mailboxes

You’re looking for PHI: patient names paired with SSNs, medical record numbers, diagnoses (ICD-10 codes), insurance information, treatment notes.
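As a rough illustration of what "looking for PHI" means in practice, here is a minimal sketch of pattern-based detection. The regular expressions are simplified assumptions (the MRN format in particular is hypothetical, since every practice formats MRNs differently), and the function names are made up for this example. A production classifier would add validation, context checks, and name matching.

```python
import re

# Illustrative patterns only. Real classifiers need validation and context.
# SSN shape: 3-2-4 digits, excluding ranges that are never issued.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# ICD-10-CM shape with the decimal part required, to cut false positives.
ICD10_RE = re.compile(r"\b[A-Z]\d[0-9A-Z]\.[0-9A-Z]{1,4}\b")
# Hypothetical MRN format: the label "MRN" followed by 6 to 10 digits.
MRN_RE = re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE)

PATTERNS = {"ssn": SSN_RE, "icd10": ICD10_RE, "mrn": MRN_RE}


def find_phi_indicators(text: str) -> dict:
    """Return the matches for each indicator type that appears in the text."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


def is_likely_phi(text: str, min_indicator_types: int = 2) -> bool:
    """Flag text only when several indicator types co-occur."""
    return len(find_phi_indicators(text)) >= min_indicator_types


sample = "Jane Doe, SSN 123-45-6789, MRN: 00123456, Dx L70.0"
print(find_phi_indicators(sample))
print(is_likely_phi(sample))
```

Requiring two or more indicator types reflects the "paired with" idea above: a lone nine-digit number is weak evidence, while an SSN next to a diagnosis code is much stronger.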

What You’ll Find

In typical healthcare data discovery engagements, most PHI lives where you’d expect—the EMR, the billing system. But a significant portion surfaces in “known unknown” locations: old file shares everyone forgot about, archived databases from previous systems.

The real surprises come from completely unexpected places—personal folders, email attachments, temp directories. That category is where deals get repriced—or killed.

Red Flags to Watch For

  • Large archive folders with names like “OLD”, “BACKUP”, “DO NOT DELETE”
  • PST files over 1GB (email archives full of attachments)
  • Scanned document folders from paper-to-digital conversion
  • Exports from previous EMR systems
  • Personal folders of departed physicians
  • Shared drives with wide-open permissions
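Two of these red flags can be checked from a plain file listing. The sketch below assumes you already have paths and sizes from an inventory export; the `FileEntry` type and `red_flags` function are hypothetical names for this example. The other flags (EMR exports, departed physicians, permissions) need HR and access-control metadata that a path alone does not carry.

```python
import re
from dataclasses import dataclass

# Folder-name keywords taken from the list above.
FOLDER_KEYWORDS = ("old", "backup", "do not delete")
# "Over 1GB" threshold for PST archives.
PST_THRESHOLD_BYTES = 1024 ** 3


@dataclass(frozen=True)
class FileEntry:
    path: str
    size_bytes: int


def _normalize(component: str) -> str:
    """Lowercase and reduce punctuation to spaces so keywords match whole words."""
    return " " + re.sub(r"[^a-z0-9]+", " ", component.lower()).strip() + " "


def red_flags(entry: FileEntry) -> list:
    """Return the red flags raised by one file's path and size."""
    flags = []
    # Split on either slash style so UNC and POSIX paths both work.
    *folders, filename = re.split(r"[\\/]+", entry.path)
    for folder in folders:
        normalized = _normalize(folder)
        for keyword in FOLDER_KEYWORDS:
            # Whole-word match, so "folder" does not trigger "old".
            if f" {keyword} " in normalized:
                flags.append(f"folder name contains '{keyword}'")
    if filename.lower().endswith(".pst") and entry.size_bytes > PST_THRESHOLD_BYTES:
        flags.append("PST archive over 1 GB")
    return flags


print(red_flags(FileEntry("/shares/OLD STUFF DO NOT DELETE/scan_0001.pdf", 1_000)))
```

A name-based pass like this is only triage. It tells you where to point a content scan first, not whether PHI is actually present.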

Post-Discovery: Remediation Assessment

Once you know where PHI lives, you can assess:

  1. Scope: How many records? How many patients?
  2. Sensitivity: SSNs? Mental health? Substance abuse? HIV status?
  3. Exposure: Who has access? Is it encrypted?
  4. Remediation cost: What will it take to clean up?

This becomes a deal term. Either the seller remediates before close, the buyer gets a price reduction, or there’s an escrow holdback.
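One way to keep the first three questions (scope, sensitivity, exposure) comparable across findings is to record them in a small structure and rank on them. This is a sketch under assumptions: the field names, category labels, and ranking order are illustrative choices, not a standard. Remediation cost is left out because it depends on quotes you will only have after triage.

```python
from dataclasses import dataclass, field

# Categories from the sensitivity question above (labels are hypothetical).
SPECIAL_CATEGORIES = {"ssn", "mental_health", "substance_abuse", "hiv_status"}


@dataclass
class Finding:
    location: str
    record_count: int            # scope: how many records
    patient_count: int           # scope: how many patients
    categories: set = field(default_factory=set)  # sensitivity
    encrypted: bool = False      # exposure
    users_with_access: int = 0   # exposure


def priority_key(finding: Finding) -> tuple:
    """Assumed ordering: special categories, then unencrypted, then reach, then size."""
    return (
        bool(finding.categories & SPECIAL_CATEGORIES),
        not finding.encrypted,
        finding.users_with_access,
        finding.record_count,
    )


def rank_findings(findings: list) -> list:
    """Return findings with the most urgent first."""
    return sorted(findings, key=priority_key, reverse=True)


findings = [
    Finding("billing_share", 200_000, 80_000, set(), False, 300),
    Finding("emr_export", 50_000, 20_000, {"ssn"}, False, 5),
    Finding("backup", 10_000, 5_000, {"mental_health"}, True, 2),
]
for f in rank_findings(findings):
    print(f.location, f.record_count, sorted(f.categories))
```

The ranked list is what goes in front of deal counsel: it turns "we found some PHI" into specific locations with counts attached, which is what a price reduction or escrow holdback gets negotiated against.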


Why Speed Matters

Healthcare M&A timelines are tight. PE firms move fast. You don’t have six months to deploy a platform and run discovery.

You need answers in days, not quarters.

This is exactly why we built Risk Finder as a containerized scanner. Pull the Docker image, point it at the target’s infrastructure, get results. No agents. No complex deployment. No data leaving the environment.

For healthcare specifically:

  • 150+ classifiers including ICD-10 codes, MRNs, NPIs, DEA numbers
  • PHI detection across structured and unstructured data
  • OCR scanning for scanned intake forms and faxes
  • DICOM support for medical imaging metadata

The Deal Math

Let’s say you’re acquiring a practice for $15M. Due diligence costs maybe $200K total—legal, financial, operational.

Adding a proper data discovery scan costs a fraction of that. And finding a hidden data liability before close could save you:

  • Remediation costs: $500K-$2M for a significant PHI exposure
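  • Regulatory fines: HIPAA civil penalties are assessed per violation in tiers, from $100 to $50,000 depending on culpability, with willful neglect starting at $10,000 (statutory base amounts, adjusted annually for inflation)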
  • Deal repricing: 5-10% purchase price reduction is common
  • Deal failure: Walking away from a deal that looked good on paper

The ROI isn’t hard to calculate. The cost of not looking is almost always higher than the cost of looking.
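The arithmetic can be sketched with the figures above. The purchase price, repricing range, and remediation range come from this section. The scan cost is a placeholder assumption (the text only says "a fraction" of the $200K diligence budget), so substitute a real quote.

```python
PURCHASE_PRICE = 15_000_000
REPRICING_PCT = (5, 10)                    # 5-10% purchase price reduction
REMEDIATION_RANGE = (500_000, 2_000_000)   # significant PHI exposure
ASSUMED_SCAN_COST = 25_000                 # hypothetical placeholder, not a quote


def repricing_exposure(price: int, pct_range: tuple) -> tuple:
    """Dollar range of a purchase price reduction, using integer percentages."""
    low_pct, high_pct = pct_range
    return price * low_pct // 100, price * high_pct // 100


def break_even_probability(scan_cost: int, avoided_loss: int) -> float:
    """Minimum chance of a hidden liability at which the scan pays for itself."""
    return scan_cost / avoided_loss


low, high = repricing_exposure(PURCHASE_PRICE, REPRICING_PCT)
print(f"Repricing exposure: ${low:,} to ${high:,}")

p = break_even_probability(ASSUMED_SCAN_COST, REMEDIATION_RANGE[0])
print(f"Break-even probability vs. low-end remediation: {p:.0%}")
```

Under these assumptions, the scan breaks even if there is even a 5% chance of uncovering a liability at the low end of the remediation range, before counting fines or repricing.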


What Smart Acquirers Do Now

After surprises like this, smart PE firms change their process. Now they run a data discovery scan on every healthcare acquisition before close. It’s a standard line item in due diligence, right next to the financial audit.

What they’ve found since:

  • A behavioral health practice with 10 years of unencrypted psychotherapy notes on a shared drive
  • A dental group with patient SSNs in an Excel file emailed between 40 locations monthly
  • An ophthalmology chain with DICOM files containing patient demographics in image metadata

None of these killed deals. But all of them changed deal terms.

Better to know before you sign than discover after you own it.


Start Before Someone Else Does

If you’re doing healthcare M&A—as a PE firm, health system, or practice acquirer—data due diligence isn’t optional anymore.

The question isn’t whether hidden PHI exists. It’s whether you find it before close, or after.

Scan before someone else does.

Start your 7-day free trial →
