Michael Avdeev · Guides · 8 min read

The Complete PII/PHI Compliance Guide: What You Need to Know in 2026

If you handle customer data, employee records, or patient information, you’re subject to data protection regulations. But which ones? And what exactly do they require?

This guide breaks down PII and PHI compliance into plain language — what data is covered, which laws apply, and what you actually need to do.


Part 1: Understanding PII vs PHI

What is PII (Personally Identifiable Information)?

PII is any information that can identify a specific individual, either on its own or when combined with other data.

Direct identifiers (PII on their own):

  • Social Security Numbers (SSN)
  • Driver’s license numbers
  • Passport numbers
  • Email addresses
  • Phone numbers
  • Biometric data (fingerprints, facial recognition)

Indirect identifiers (PII when combined):

  • Date of birth
  • Zip code
  • Gender
  • Job title
  • IP addresses

Why it matters: A zip code alone might not be sensitive. But date of birth + gender + 5-digit zip code together can uniquely identify an estimated 87% of Americans.
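To see why combinations matter, the sketch below measures how many records in a small, made-up dataset are unique on date of birth, gender, and zip code. A record that is unique on those fields can be re-identified by anyone holding another dataset with the same fields. The sample records and the function name are illustrative assumptions.

```python
from collections import Counter

# Hypothetical records reduced to three quasi-identifiers: (date_of_birth, gender, zip_code)
records = [
    ("1985-03-14", "F", "02139"),
    ("1985-03-14", "F", "02139"),
    ("1990-07-01", "M", "94107"),
    ("1972-11-23", "F", "60614"),
    ("1990-07-01", "M", "10001"),
]

def uniqueness_rate(rows: list[tuple[str, str, str]]) -> float:
    """Share of rows whose quasi-identifier combination occurs exactly once.
    In k-anonymity terms, these rows sit in a group of size k = 1."""
    if not rows:
        return 0.0
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)

print(f"{uniqueness_rate(records):.0%} of records are unique")  # 60% of records are unique
```

Only the first two records "hide" behind each other; the other three stand out, even though no single field is a direct identifier.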

What is PHI (Protected Health Information)?

PHI is health information that can be linked to a specific individual. It’s a subset of PII, but with stricter protections under HIPAA.

PHI includes:

  • Medical record numbers
  • Health insurance IDs
  • Diagnosis codes (ICD-10)
  • Treatment records
  • Prescription information
  • Lab results
  • Mental health records
  • Billing information for healthcare services

The key distinction: PHI requires a healthcare context. A blood pressure reading alone isn’t PHI. A blood pressure reading linked to a patient name in a medical record is PHI.

Quick Reference: PII vs PHI

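| Data Type | PII? | PHI? |
|---|---|---|
| Social Security Number | Yes | Yes (if in healthcare context) |
| Medical diagnosis | No | Yes (when linked to an individual) |
| Email address | Yes | Yes (if in healthcare context) |
| Credit card number | Yes | Usually no (covered by PCI DSS); yes in healthcare billing records |
| Blood type | No | Yes (when linked to an individual) |
| Home address | Yes | Yes (if in healthcare context) |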

Part 2: Regulations That Apply to Your Organization

HIPAA (United States - Healthcare)

Who it applies to:

  • Healthcare providers (hospitals, clinics, doctors)
  • Health plans (insurers)
  • Healthcare clearinghouses
  • Business associates (any vendor handling PHI)

What it requires:

  • Privacy Rule: Limits use and disclosure of PHI
  • Security Rule: Administrative, physical, and technical safeguards
  • Breach Notification Rule: Notify affected individuals without unreasonable delay, and no later than 60 days after discovering a breach

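Penalties: The statutory tiers run from $100 to $50,000 per violation, up to $1.5 million per year per violation category. HHS adjusts these amounts for inflation each year, so current figures are higher. Criminal penalties are also possible.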

GDPR (European Union)

Who it applies to:

  • Organizations established in the EU
  • Organizations anywhere in the world that offer goods or services to, or monitor the behavior of, people in the EU

What it requires:

  • Lawful basis for processing (consent, contract, legal obligation, etc.)
  • Data minimization — collect only what you need
  • Right to access, rectification, erasure (“right to be forgotten”)
  • 72-hour breach notification to authorities
  • Data Protection Impact Assessments for high-risk processing

Penalties: Up to 4% of global annual revenue or €20 million, whichever is higher.
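The "whichever is higher" rule is simple arithmetic, and it means small companies face the fixed €20 million ceiling while large ones face the percentage. The sketch below computes only that ceiling for the higher fine tier (GDPR measures it against total worldwide annual turnover of the preceding financial year); actual fines are set case by case, and a lower tier of 2% or €10 million applies to some infringements. The function name is hypothetical, and integer arithmetic is used to avoid floating-point rounding.

```python
def gdpr_max_fine_eur(worldwide_annual_turnover_eur: int) -> int:
    """Ceiling of the higher GDPR fine tier: the greater of EUR 20M or 4% of turnover."""
    return max(20_000_000, worldwide_annual_turnover_eur * 4 // 100)

print(f"{gdpr_max_fine_eur(100_000_000):,}")    # 20,000,000 (4% would be only 4,000,000)
print(f"{gdpr_max_fine_eur(2_000_000_000):,}")  # 80,000,000
```

The crossover sits at €500 million in turnover: below it the fixed amount dominates, above it the 4% figure does.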

CCPA/CPRA (California)

Who it applies to:

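  • Businesses with more than $25M in annual gross revenue (a threshold periodically adjusted for inflation), OR
  • Businesses that buy, sell, or share the personal information of 100,000+ California consumers or households, OR
  • Businesses deriving 50%+ of annual revenue from selling or sharing personal information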
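The three thresholds are alternatives joined by OR: meeting any one of them is enough. A minimal screening sketch follows, assuming the business is already a for-profit entity doing business in California (a precondition this code does not check). The class, field, and function names are hypothetical, and the revenue figure is passed as a parameter because it is adjusted for inflation over time.

```python
from dataclasses import dataclass

# Thresholds from the list above. The revenue figure is periodically adjusted
# for inflation, so it is a parameter rather than a constant baked into the logic.
DEFAULT_REVENUE_THRESHOLD_USD = 25_000_000
CONSUMER_THRESHOLD = 100_000
SALE_REVENUE_SHARE = 0.50

@dataclass
class BusinessProfile:
    annual_gross_revenue_usd: int
    ca_consumers_bought_sold_or_shared: int   # consumers or households
    revenue_share_from_selling_pi: float      # 0.0 to 1.0

def ccpa_likely_applies(b: BusinessProfile,
                        revenue_threshold: int = DEFAULT_REVENUE_THRESHOLD_USD) -> bool:
    """First-pass screen: meeting any ONE of the three tests is enough."""
    return (
        b.annual_gross_revenue_usd > revenue_threshold   # statute says "in excess of"
        or b.ca_consumers_bought_sold_or_shared >= CONSUMER_THRESHOLD
        or b.revenue_share_from_selling_pi >= SALE_REVENUE_SHARE
    )

print(ccpa_likely_applies(BusinessProfile(8_000_000, 20_000, 0.0)))    # False
print(ccpa_likely_applies(BusinessProfile(3_000_000, 150_000, 0.6)))  # True
```

The second example shows why revenue alone is a poor test: a small data broker can be covered while a larger company that never sells data is not.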

What it requires:

  • Right to know what data is collected
  • Right to delete personal information
  • Right to opt out of the sale or sharing of personal information
  • Non-discrimination for exercising rights

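Penalties: Up to $2,500 per unintentional violation and $7,500 per intentional violation or violation involving consumers under 16. These amounts are also adjusted periodically for inflation.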

Other Key Regulations

| Regulation | Jurisdiction | Focus |
|---|---|---|
| GLBA | US | Financial institutions: customer financial data |
| FERPA | US | Educational institutions: student records |
| PCI DSS | Global | Payment card data (not a law, but contractually required) |
| LGPD | Brazil | Similar to GDPR, for Brazilian residents |
| POPIA | South Africa | Personal information protection |
| State Privacy Laws | US (VA, CO, CT, UT, etc.) | Growing patchwork of state-level requirements |

Which Regulations Apply to You?

Ask these questions:

  1. Are you a healthcare provider, health plan, or clearinghouse, or a vendor handling PHI for one? → HIPAA applies
  2. Do you have customers or employees in the EU? → GDPR likely applies
  3. Do you have California customers? → CCPA/CPRA may apply (check the thresholds above)
  4. Do you accept or process card payments? → PCI DSS applies
  5. Are you a financial institution? → GLBA applies
  6. Are you an educational institution receiving U.S. Department of Education funding? → FERPA applies

Most mid-to-large organizations are subject to multiple overlapping regulations.
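The six questions above can be turned into a simple screening function that lists which regulations deserve a closer look. This is a minimal sketch: the profile fields and function name are hypothetical, and each "yes" only flags a regulation for review rather than settling whether it applies.

```python
from dataclasses import dataclass

@dataclass
class OrgProfile:
    # Hypothetical yes/no answers to the six screening questions.
    hipaa_covered_entity_or_ba: bool      # covered entity or business associate
    has_eu_customers_or_employees: bool
    has_california_customers: bool
    processes_card_payments: bool
    is_financial_institution: bool
    is_educational_institution: bool

def candidate_regulations(org: OrgProfile) -> list[str]:
    """Map each 'yes' answer to the regulation it flags, in question order."""
    rules = [
        (org.hipaa_covered_entity_or_ba, "HIPAA"),
        (org.has_eu_customers_or_employees, "GDPR"),
        (org.has_california_customers, "CCPA/CPRA (check thresholds)"),
        (org.processes_card_payments, "PCI DSS"),
        (org.is_financial_institution, "GLBA"),
        (org.is_educational_institution, "FERPA"),
    ]
    return [name for applies, name in rules if applies]

clinic = OrgProfile(True, False, True, True, False, False)
print(candidate_regulations(clinic))
# ['HIPAA', 'CCPA/CPRA (check thresholds)', 'PCI DSS']
```

Even this small clinic example lands in three overlapping regimes, which is the point of the sentence above.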


Part 3: Building a PII/PHI Compliance Program

Step 1: Know What Data You Have (Data Discovery)

You can’t protect what you don’t know about. Start with a comprehensive data inventory:

  • Where does sensitive data live? File shares, databases, cloud storage, email, endpoints
  • What types of sensitive data exist? SSNs, medical records, credit cards, credentials
  • How much do you have? Volume matters for breach notification
  • Who has access? Map data to users and systems
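A minimal sketch of the inventory step, assuming plain-text files and only two illustrative regex classifiers (US SSNs in 123-45-6789 form and email addresses). It walks a directory and records match counts per file path, which answers the "where" and "how much" questions. This is not how any particular product classifies data; production tools add validation, surrounding context, many more data types, and parsers for formats like PDF and Office documents.

```python
import os
import re
from collections import Counter

# Illustrative classifiers only (assumption). The SSN pattern skips area numbers
# 000, 666, and 900-999, group 00, and serial 0000, which are never issued.
PATTERNS = {
    "ssn": re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}

def scan_text(text: str) -> Counter:
    """Count matches per data type; unary + drops labels with zero hits."""
    return +Counter({label: len(p.findall(text)) for label, p in PATTERNS.items()})

def scan_tree(root: str) -> dict[str, Counter]:
    """Map each file path under root to its match counts (files with hits only)."""
    inventory: dict[str, Counter] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    counts = scan_text(fh.read())  # whole file in memory: fine for a sketch
            except OSError:
                continue  # unreadable file; a real scanner would log it
            if counts:
                inventory[path] = counts
    return inventory

print(dict(scan_text("SSN 123-45-6789, contact jane@example.com")))
# {'ssn': 1, 'email': 1}
```

Pointing scan_tree at a mounted file share returns a path-to-counts map that can seed the data inventory; mapping access ("who has access?") would come from file-system ACLs, which this sketch does not read.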

Common blind spots:

  • Legacy file shares with years of accumulated data
  • Developer test environments with production data copies
  • Employee desktops and laptops
  • Email attachments
  • Backup archives

Tool tip: Risk Finder scans your environment with 150+ classifiers to identify PII, PHI, PCI, and other sensitive data — with exact file paths and counts.

Choosing the Right Discovery Tool

You’ll hear a lot about DSPM (Data Security Posture Management) platforms. These are comprehensive solutions that combine data discovery, classification, access monitoring, and policy enforcement into one platform.

DSPM tools are solid solutions for enterprises that need ongoing posture management across complex cloud environments. However, they come with significant tradeoffs:

  • Cost: DSPM platforms typically charge per-GB or per-data-source, which can quickly reach six figures annually for large environments
  • Complexity: Full DSPM deployment can take months and requires dedicated resources
  • Overkill for many use cases: If you just need to confirm whether you have sensitive data and where it lives, a full DSPM platform may be unnecessary

When DSPM makes sense: Large enterprises with complex multi-cloud environments needing continuous posture monitoring, access governance, and policy enforcement.

When simpler tools work better: Organizations that need to answer basic questions first — “Do we have PII? Where is it? How much?” — before investing in a full platform. Flat-rate scanning tools let you get answers quickly without budget uncertainty.

On-Premises and Air-Gapped Scanning

For organizations with strict data residency requirements or air-gapped environments, cloud-based scanning tools aren’t an option. Look for solutions that:

  • Run entirely on-premises via Docker container or VM
  • Process data locally with zero egress
  • Work in air-gapped networks with no internet connectivity
  • Generate reports locally without sending data externally

Security-first approach: Your sensitive data should never leave your environment during scanning. On-premises deployment helps satisfy data residency requirements and avoids sending sensitive data to a third-party cloud service.

Step 2: Classify and Prioritize

Not all data carries equal risk. Classify by:

Sensitivity level:

  • Critical: SSN, medical records, financial account numbers
  • High: Driver’s license, passport, biometric data
  • Medium: Email, phone, date of birth
  • Low: Name, job title, business address

Regulatory category:

  • HIPAA-covered PHI
  • PCI cardholder data
  • GDPR personal data
  • State-specific PII

Risk score: Combine sensitivity with exposure (is it encrypted? who has access? where is it stored?)
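One way to combine sensitivity with exposure is a multiplicative score, sketched below. The weights, location categories, and the 100-user cutoff are illustrative assumptions, not a standard; the point is that the same SSN file scores very differently depending on encryption, access breadth, and storage location.

```python
from dataclasses import dataclass

# Hypothetical weights; tune to your own classification policy.
SENSITIVITY_WEIGHT = {"critical": 4, "high": 3, "medium": 2, "low": 1}
LOCATION_EXPOSURE = {"database": 1.0, "file_share": 1.5, "endpoint": 1.5, "cloud_public": 3.0}

@dataclass
class DataAsset:
    path: str
    sensitivity: str        # a key of SENSITIVITY_WEIGHT
    encrypted: bool
    users_with_access: int
    location: str           # a key of LOCATION_EXPOSURE

def risk_score(asset: DataAsset) -> float:
    """Sensitivity weight multiplied by an exposure factor built from the three questions."""
    exposure = LOCATION_EXPOSURE.get(asset.location, 1.5)  # unknown location: assume moderate
    if not asset.encrypted:
        exposure *= 2
    if asset.users_with_access > 100:   # broad access raises exposure
        exposure *= 1.5
    return SENSITIVITY_WEIGHT[asset.sensitivity] * exposure

assets = [
    DataAsset("crm/contacts.db", "medium", True, 20, "database"),
    DataAsset("hr/ssn_export.csv", "critical", False, 250, "file_share"),
]
for a in sorted(assets, key=risk_score, reverse=True):
    print(a.path, risk_score(a))
# hr/ssn_export.csv 18.0
# crm/contacts.db 2.0
```

Sorting by score gives a remediation queue: fix the highest-scoring assets first.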

Step 3: Implement Controls

Technical controls:

  • Encryption at rest and in transit
  • Access controls (role-based, least privilege)
  • Data loss prevention (DLP)
  • Audit logging
  • Secure deletion
  • Data redaction for documents that must be shared
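Role-based access control, least privilege, and audit logging from the list above fit together naturally: deny by default, allow only explicit grants, and log every decision. The sketch below uses a hypothetical role-to-permission table; a real deployment would enforce this in the database, file system, or identity provider rather than in application code alone.

```python
from datetime import datetime, timezone

# Hypothetical role -> set of (data_category, action) pairs the role may perform.
ROLE_PERMISSIONS = {
    "billing_clerk": {("billing", "read")},
    "nurse": {("clinical", "read"), ("clinical", "write")},
    "analyst": {("deidentified", "read")},
}

audit_log: list[dict] = []

def is_allowed(roles: list[str], category: str, action: str) -> bool:
    """Least privilege: deny unless some role explicitly grants this pair."""
    return any((category, action) in ROLE_PERMISSIONS.get(role, set()) for role in roles)

def request_access(user: str, roles: list[str], category: str, action: str) -> bool:
    """Check access and record every decision, allowed or denied, in the audit log."""
    allowed = is_allowed(roles, category, action)
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "category": category,
        "action": action,
        "decision": "ALLOW" if allowed else "DENY",
    })
    return allowed

print(request_access("amy", ["analyst"], "clinical", "read"))  # False
print(request_access("raj", ["nurse"], "clinical", "read"))    # True
print(len(audit_log))                                          # 2
```

Logging denials as well as grants matters: repeated denials are often the first sign of misuse.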

Data Redaction: When You Need to Share Documents

Sometimes you need to share documents that contain sensitive data — for legal discovery, audits, or business processes. Data redaction permanently removes or masks sensitive information while preserving the rest of the document.

Common redaction scenarios:

  • Responding to Data Subject Access Requests (DSARs) under GDPR/CCPA
  • Legal discovery with PII/PHI removed
  • Sharing reports with third parties
  • Anonymizing datasets for analytics

Redaction best practices:

  • Use automated tools to identify all sensitive data before redaction
  • Verify redaction is permanent (not just a black box overlay that can be removed)
  • Maintain audit logs of what was redacted and when
  • Consider whether redaction or deletion is more appropriate
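The sketch below illustrates two of these practices on extracted plain text: matches are replaced outright, so the original characters are gone from the output rather than hidden under an overlay, and each call returns an audit record of what was redacted and when, without storing the redacted values. The two patterns and the document ID are illustrative assumptions.

```python
import re
from datetime import datetime, timezone

# Deliberately broad patterns (assumption): for redaction, over-matching is
# safer than letting a real value slip through.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}

def redact(text: str, doc_id: str) -> tuple[str, dict]:
    """Replace every match with a label and return (redacted_text, audit_record)."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED-{label}]", text)
        counts[label] = n
    audit = {
        "doc_id": doc_id,
        "redacted": counts,  # counts only, never the original values
        "at": datetime.now(timezone.utc).isoformat(),
    }
    return text, audit

clean, audit = redact("Patient SSN 123-45-6789, email pat@example.org", "DSAR-0001")
print(clean)              # Patient SSN [REDACTED-SSN], email [REDACTED-EMAIL]
print(audit["redacted"])  # {'SSN': 1, 'EMAIL': 1}
```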

Administrative controls:

  • Data handling policies
  • Employee training
  • Vendor management (Business Associate Agreements for HIPAA)
  • Incident response procedures

Physical controls:

  • Secure facilities
  • Device management
  • Clean desk policies

Step 4: Monitor Continuously

Compliance isn’t a one-time project. Data moves constantly:

  • New files created daily
  • Employees copy data to new locations
  • Cloud storage proliferates
  • Migrations leave residue

What to monitor:

  • New sensitive data appearing in unauthorized locations
  • Access pattern anomalies
  • Policy violations
  • Configuration drift

Best practice: Schedule regular scans (weekly or monthly) to catch compliance drift before auditors do.
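Comparing consecutive scans is one way to catch drift: anything new since the last scan, in a location where sensitive data is not approved to live, deserves an alert. This sketch assumes each scan result maps a file path to the set of sensitive data labels found there; the approved path prefixes are hypothetical.

```python
# Each scan result is assumed to map file path -> set of sensitive data labels found.
Scan = dict[str, set[str]]

# Hypothetical locations where sensitive data is approved to live.
AUTHORIZED_PREFIXES = ("/secure/hr/", "/secure/billing/")

def new_findings(previous: Scan, current: Scan) -> Scan:
    """Labels present in the current scan but not the previous one, per file."""
    drift: Scan = {}
    for path, labels in current.items():
        added = labels - previous.get(path, set())
        if added:
            drift[path] = added
    return drift

def unauthorized(findings: Scan, allowed: tuple[str, ...] = AUTHORIZED_PREFIXES) -> Scan:
    """Keep only findings outside the approved locations."""
    return {path: labels for path, labels in findings.items() if not path.startswith(allowed)}

last_week = {"/secure/hr/payroll.xlsx": {"ssn"}}
this_week = {
    "/secure/hr/payroll.xlsx": {"ssn"},
    "/shared/marketing/list.csv": {"email", "ssn"},
    "/secure/billing/jan.csv": {"card"},
}
print(sorted(unauthorized(new_findings(last_week, this_week))))  # ['/shared/marketing/list.csv']
```

The new billing file is drift but sits in an approved location, so only the marketing export is flagged.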

Step 5: Prepare for Incidents

When (not if) a breach occurs:

  1. Detection: How will you know? Monitoring, alerts, reports
  2. Assessment: What data was exposed? How many individuals affected?
  3. Containment: Stop the bleeding
  4. Notification: HIPAA (no later than 60 days after discovery), GDPR (72 hours to the supervisory authority), state laws (vary)
  5. Remediation: Fix the root cause
  6. Documentation: Audit trail for regulators
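Knowing your regulatory timelines in advance can be as simple as computing the outer deadlines the moment a breach is discovered. The sketch below covers only the two clocks named above, assumes both start at the discovery timestamp, and does not model state laws; HIPAA also expects notice "without unreasonable delay", so 60 days is a ceiling, not a target.

```python
from datetime import datetime, timedelta, timezone

# Outer limits only (assumption: both clocks start at discovery/awareness).
DEADLINES = {
    "GDPR: notify supervisory authority": timedelta(hours=72),
    "HIPAA: notify affected individuals": timedelta(days=60),
}

def notification_deadlines(discovered_at: datetime) -> dict[str, datetime]:
    """Latest permissible notification time for each regime, from the discovery timestamp."""
    return {name: discovered_at + delta for name, delta in DEADLINES.items()}

discovered = datetime(2026, 3, 2, 9, 30, tzinfo=timezone.utc)
for name, due in notification_deadlines(discovered).items():
    print(f"{name}: {due:%Y-%m-%d %H:%M} UTC")
# GDPR: notify supervisory authority: 2026-03-05 09:30 UTC
# HIPAA: notify affected individuals: 2026-05-01 09:30 UTC
```

Using timezone-aware timestamps avoids off-by-hours errors when the incident team spans time zones.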

Pre-breach preparation:

  • Know where your sensitive data is (makes impact assessment faster)
  • Have notification templates ready
  • Know your regulatory timelines
  • Have legal/PR contacts established

Part 4: Common Compliance Mistakes

Mistake 1: Assuming IT Knows Where All the Data Is

IT manages systems. They don’t always know what’s inside them. Business users create and copy data constantly without IT involvement.

Fix: Scan your entire environment, not just the systems IT manages.

Mistake 2: Treating Compliance as a Checkbox

Passing an audit doesn’t mean you’re secure. Audits sample. They don’t see everything.

Fix: Focus on actual risk reduction, not just audit preparation.

Mistake 3: Ignoring Unstructured Data

Databases are easy to inventory. The terabytes of files on shared drives, email servers, and cloud storage? That’s where sensitive data hides.

Fix: Scan unstructured data sources, not just databases.

Mistake 4: One-Time Assessments

A point-in-time assessment becomes stale immediately. Data moves every day.

Fix: Continuous or regular scanning, not annual assessments.

Mistake 5: Underestimating Scope

“We don’t have much sensitive data” is almost always wrong. Organizations routinely find far more sensitive data than they expected, often in places no one thought to look.

Fix: Scan everything before making assumptions.


Part 5: Quick-Start Compliance Checklist

Use this checklist to assess your current state:

Data Discovery

  • Inventoried all data storage locations (file shares, databases, cloud, endpoints)
  • Scanned for PII/PHI across all locations
  • Documented data types, volumes, and locations
  • Identified data owners for each sensitive dataset

Policy & Governance

  • Written data handling policies
  • Defined data classification scheme
  • Established retention schedules
  • Created incident response plan

Technical Controls

  • Encryption at rest for sensitive data
  • Encryption in transit (TLS/HTTPS)
  • Access controls implemented (least privilege)
  • Audit logging enabled
  • DLP tools deployed (if applicable)

Training & Awareness

  • Annual security awareness training
  • Role-specific training for data handlers
  • Phishing awareness program

Vendor Management

  • Inventory of vendors with data access
  • Business Associate Agreements (for HIPAA)
  • Data Processing Agreements (for GDPR)
  • Vendor security assessments

Monitoring & Response

  • Regular scanning schedule (weekly/monthly)
  • Alerting for policy violations
  • Incident response team identified
  • Notification templates prepared

Next Steps

  1. Start with discovery: You can’t protect what you don’t know about. Try our free scanner to see what’s hiding in a sample file.

  2. Get a complete inventory: Deploy Risk Finder to scan your entire environment — flat-rate pricing means no surprises.

  3. Build your program: Use this guide as a foundation, then customize for your specific regulatory requirements.



Have questions about PII/PHI compliance? Contact us or try the free scanner to see Risk Finder in action.
