Is OpenRedaction free?

Yes, the regex-only library is free and open-source. You can use it locally without any external services. AI-assist via the hosted API has a free tier with IP-based limits, and a Pro tier with 50,000 requests/month for £9/month.

What is the difference between regex mode and AI mode?

Regex mode uses 500+ tested patterns for fast, deterministic PII detection. It works completely offline and is free. AI mode is optional and uses a hosted AI proxy for better detection on messy or unstructured text. AI mode requires the hosted API and has usage limits.

Does OpenRedaction store my data?

No, the hosted API is stateless and does not store raw text. Processing happens in memory and data is discarded after redaction. Minimal metadata may be logged for rate limiting. For complete privacy, you can self-host the open-source library.

What is the API limit?

Free tier: IP-based rate limiting with fair-use limits (approximately 200 AI-assist requests per day). Pro tier: 50,000 AI-assist requests per month with an API key. Regex-only redaction has no limits and works completely offline.

What Is PII? A Plain-English Guide for Developers and Product Teams

PII is any data that can identify someone, either directly, like names or emails, or indirectly when combined with other information. Treat both direct and indirect identifiers with strict access controls, data minimization, and strong auditing. PII is distinct from sensitive data and from anonymized data, which carries its own risk of re-identification. Expect logs, tickets, and databases to surface PII unless you have redacted or minimized what they capture. The sections below walk through practical steps to control it.

Intro: Why PII matters for modern apps

PII matters in modern apps because your users' data powers features, trust, and growth, and mishandling it introduces real risk: regulatory penalties, lost customers, and damaged reputation. You'll want clarity on PII basics so you can build safer products. Think of data privacy as a dev-wide responsibility, not a one-off checklist. Your code, pipelines, and storage choices should minimize exposure, limit access, and enable quick incident response. Follow developer guidance that emphasizes least privilege, encryption at rest and in transit, and audit trails. Apply risk-based tagging to data, classify sensitive fields, and document how data flows across systems. When you design features, assume data is sensitive until proven otherwise, and test for privacy risks early. This mindset supports compliance, trust, and sustainable growth.

Definition of PII with concrete examples

Understanding PII means knowing which pieces of data can identify or reveal an individual, either alone or when combined with other information. In practice, PII includes direct identifiers like names, email addresses, and phone numbers, plus indirect ones such as IP addresses, user IDs, and device identifiers that can be linked to a person. You'll assess risk by asking: does this data point identify someone on its own, or could it when combined with other data? If yes, it's PII. Personal data spans broader categories like birthdates, locations, and biometrics, especially when linked with account data or behavior patterns. You must document retention, access controls, and encryption needs for these elements to uphold data privacy. Treat any collection, processing, or sharing of PII with care, aligning with policy, consent, and least-privilege principles.

Direct vs indirect identifiers

Direct identifiers point to a specific person on sight: names, email addresses, phone numbers, and other data you'd recognize in a moment. In practice, you'll separate PII into what you can directly link to a person and what requires context. Direct identifiers allow immediate identification, which increases risk if they are exposed. Indirect identifiers, by contrast, don't name a person but, when combined with other data, can pinpoint someone. Examples include ZIP codes, birth year, device IDs, or distinctive location patterns. Your risk management hinges on understanding both types: catalog them, minimize storage, and apply access controls. Treat indirect identifiers as sensitive when combined with other data. Build privacy-by-design checks into data flows, map data lineage, and document retention. This distinction guides compliant handling of PII across systems.
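
To make the cataloguing step concrete, here is a minimal Python sketch that sorts a record's fields into direct, indirect, and other. The field names and the two category sets are illustrative assumptions, not a standard taxonomy; a real catalog would come from your own schema review.

```python
# Hypothetical field catalog: these names and groupings are assumptions
# for illustration, not an authoritative list.
DIRECT = {"name", "email", "phone"}
INDIRECT = {"zip_code", "birth_year", "device_id", "ip_address"}


def classify_fields(record: dict) -> dict:
    """Group a record's field names into direct, indirect, and other."""
    groups = {"direct": [], "indirect": [], "other": []}
    for field in record:
        if field in DIRECT:
            groups["direct"].append(field)
        elif field in INDIRECT:
            groups["indirect"].append(field)
        else:
            groups["other"].append(field)
    return groups


record = {"name": "Ada Example", "zip_code": "90210", "birth_year": 1990, "plan": "pro"}
print(classify_fields(record))
# {'direct': ['name'], 'indirect': ['zip_code', 'birth_year'], 'other': ['plan']}
```

A record with no direct identifiers can still carry several indirect ones, which is the case to watch when data is shared or joined with other sources.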

PII vs sensitive data vs anonymised data

Curious about the distinctions among PII, sensitive data, and anonymised data? You should know these categories guide risk, controls, and compliance. PII refers to identifiers that can directly or indirectly reveal a person, such as names or emails, and may require heightened protection. Sensitive data covers attributes that, if exposed, could cause harm or violate policies, like medical history or financial details. Anonymised data has had identifiers removed to prevent re-identification, but its safety rests on the robustness of the masking and the lack of linkage risks. Treat PII with stricter controls, enforce access limits, and log handling steps. Use data minimisation, purpose limitation, and regular risk reviews to ensure you stay compliant while enabling legitimate use of data. Remember to assess re-identification risk before sharing anonymised data.
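
One common way to estimate re-identification risk before sharing a dataset is a k-anonymity style check: group rows by their indirect identifiers and look at the smallest group. The sketch below is a rough illustration under that assumption; the column names and sample rows are made up, and a small group size is a warning sign, not a complete risk assessment.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values for the given indirect (quasi-) identifiers. 0 if there are no rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0


# Made-up sample data with direct identifiers already removed.
rows = [
    {"zip_code": "90210", "birth_year": 1990, "diagnosis": "A"},
    {"zip_code": "90210", "birth_year": 1990, "diagnosis": "B"},
    {"zip_code": "10001", "birth_year": 1985, "diagnosis": "C"},
]

k = smallest_group_size(rows, ["zip_code", "birth_year"])
print(k)  # 1: one row is unique on (zip_code, birth_year)
```

A result of 1 means at least one person is unique on those columns, so anyone who knows that ZIP code and birth year could link the row back to them.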

How PII shows up in logs, tickets and databases

Logs, tickets, and databases are common places PII slips through if you're not watching closely. You'll see direct identifiers like names, emails, and phone numbers in error messages, support chats, and API traces, plus indirect data that fingerprints someone, such as user IDs tied to activity. PII accumulates in logs when verbose logging isn't tuned for production. To limit exposure, practice data minimization: log only what you need, redact sensitive fields, and truncate long values. Enforce data access controls so only authorized teammates can view logs or tickets containing PII. Regularly review schemas, cleanup jobs, and retention rules to remove stale data. Tie logging practices to a risk-aware mindset, documenting who can access what, and why. This keeps operations efficient without compromising privacy or compliance.
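
As a rough sketch of redacting before anything is written, the following uses Python's standard `logging.Filter` to scrub emails and phone-like numbers from log messages. The two regular expressions and the `RedactingFilter` name are assumptions made for this example; they are deliberately simple, are not OpenRedaction's patterns, and the phone pattern will also catch other long digit runs such as dates.

```python
import logging
import re

# Deliberately simple illustrative patterns; real detection needs far more coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message before any handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger = logging.getLogger("app")
logger.addHandler(handler)

logger.warning("Login failed for %s", "jane@example.com")
# emits: Login failed for [EMAIL]
```

Attaching the filter to the handler rather than to a single logger means it also covers records that propagate up from child loggers to that handler.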

Regulatory angles (GDPR, CCPA etc.) in brief

Are you compliant by design or by chance? In this brief regulatory angle, you'll connect PII handling to big-picture rules. GDPR shapes data rights, accountability, and risk assessment, so you should map data flows, minimize collection, and document consent. CCPA adds consumer rights and transparency, influencing UI prompts, data access, and deletion timelines. You'll implement privacy by design, embedding safeguards from the outset rather than retrofitting them later. Treat PII as a regulated asset with audit-ready trails, access controls, and data minimization baked into product specs. You'll align retention, security, and breach notification with regulatory expectations, using clear roles and responsibilities. Remember: compliant design reduces legal exposure, speeds launches, and builds user trust.

Practical checklist: spotting PII in your product

Spotting PII in your product starts with a practical mindset: identify where data could be tied to an individual, then trace how it flows from input to storage and use. You'll build a practical checklist that centers on PII identification, focusing on fields, logs, analytics, and third-party integrations. Map data ownership and access: who can see or modify PII, and where it resides during transit and at rest. Apply data minimization by questioning necessity, reducing retention, and masking sensitive elements where feasible. Audit user input channels, APIs, and telemetry for identifiers, contact details, or behavioral traces. Embrace user privacy practices by default, documenting decisions, and flagging potential risks for remediation. This disciplined approach sustains compliance, lowers risk, and clarifies responsibilities.
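
One way to start auditing input channels and telemetry is a small scanner that walks a payload and reports where PII-like keys or values appear. The sketch below assumes JSON-style nested dicts and lists; the key list and value patterns are illustrative assumptions and will miss many real cases.

```python
import re

# Illustrative assumptions: a tiny pattern set and a short list of suspicious key names.
VALUE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}
SUSPECT_KEYS = {"name", "email", "phone", "address", "dob", "user_id", "device_id"}


def find_pii(payload, path=""):
    """Yield (path, reason) pairs for keys or string values that look like PII."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else str(key)
            if str(key).lower() in SUSPECT_KEYS:
                yield child, "key name"
            yield from find_pii(value, child)
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            yield from find_pii(item, f"{path}[{i}]")
    elif isinstance(payload, str):
        for label, pattern in VALUE_PATTERNS.items():
            if pattern.search(payload):
                yield path, f"{label} value"


event = {
    "user_id": 42,
    "context": {"note": "contact jane@example.com", "client": "203.0.113.7"},
    "tags": ["beta", "from 198.51.100.4"],
}
for location, reason in find_pii(event):
    print(location, "->", reason)
# user_id -> key name
# context.note -> email value
# context.client -> ipv4 value
# tags[1] -> ipv4 value
```

Running a scanner like this against sample API requests, analytics events, and third-party webhook payloads gives you a first list of locations to review by hand.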

Next steps: documenting and controlling PII

How can you nail down PII handling across your product workflow? You establish PII governance as a living practice, not a one-off checklist. Map data flows end to end so you know where PII enters, where it moves, and where it exits. Create a data inventory that's periodically refreshed, with owners and retention timelines clearly defined. Implement data protection controls that align with risk levels—encryption, access gates, and audit trails for sensitive fields. Document procedures for data minimization, purpose limitation, and breach response, and train teams on them. Use lightweight policies that scale with your product. Regularly review policies, update data mappings, and enforce accountability to sustain compliant, privacy-minded product development.
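
A data inventory can begin as a simple structured list that you query for overdue reviews and expired retention. The sketch below is one possible shape, not a prescribed schema: the `InventoryEntry` fields, the 180-day review interval, and the sample entries are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class InventoryEntry:
    field: str            # the PII field being tracked
    system: str           # where it is stored
    owner: str            # team accountable for it
    retention_days: int   # how long a value may be kept
    last_reviewed: date   # when this entry was last checked


def stale_entries(inventory, today, review_interval_days=180):
    """Return entries whose last review is older than the review interval."""
    cutoff = today - timedelta(days=review_interval_days)
    return [e for e in inventory if e.last_reviewed < cutoff]


def past_retention(entry, collected_on, today):
    """True if a value collected on `collected_on` has outlived the entry's retention."""
    return today > collected_on + timedelta(days=entry.retention_days)


# Made-up sample inventory.
inventory = [
    InventoryEntry("email", "accounts-db", "identity-team", 730, date(2024, 1, 10)),
    InventoryEntry("ip_address", "access-logs", "platform-team", 30, date(2024, 9, 1)),
]

overdue = stale_entries(inventory, today=date(2024, 10, 1))
print([e.field for e in overdue])  # ['email']
print(past_retention(inventory[1], date(2024, 8, 1), date(2024, 10, 1)))  # True
```

Keeping the inventory in code or version-controlled config makes refreshes reviewable and lets a scheduled job flag entries that need attention.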

Conclusion

You've got data, and with it comes responsibility. Track what you collect, where it flows, and who accesses it. Classify PII clearly, separate direct and indirect identifiers, and treat sensitive data with extra protections. Embed privacy into every feature and every place data lands (logs, tickets, databases) so that exposure is controlled, not accidental. Stay aligned with GDPR, CCPA, and related rules, document decisions, and enforce least privilege. A proactive, risk-aware mindset today prevents costly issues tomorrow.

Ready to get started?

Read Understanding PII Detection for a deeper dive
Learn about PII Detection for AI workflows
Check out our PII Detection guide for practical implementation
Get in touch if you have questions or need help
