Context-aware, pluggable, and customizable PII anonymization service for text and images.
Presidio (from the Latin praesidium, ‘protection, garrison’) helps ensure that sensitive text is properly managed and governed. It provides fast analytics and anonymization for sensitive text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, and financial data. Presidio analyzes the text using predefined or custom recognizers to identify entities, patterns, formats, and checksums with relevant context. Presidio leverages Docker and Kubernetes for workloads at scale.
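To make the idea of combining patterns, checksums, and context concrete, here is a minimal standalone sketch in plain Python. It is not Presidio's implementation or API: the names `CARD_PATTERN`, `CONTEXT_WORDS`, `Finding`, and `find_credit_cards`, as well as the scores 0.6 and 1.0 and the 30-character context window, are assumptions chosen for illustration. The sketch finds candidate credit card numbers with a regular expression, discards candidates that fail the Luhn checksum, and raises the confidence score when a related word appears shortly before the match.

```python
import re
from dataclasses import dataclass

# All names and values below are hypothetical; this is not Presidio's API.
# Pattern: 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Words that, when found just before a match, raise its score.
CONTEXT_WORDS = {"card", "credit", "visa", "mastercard", "amex"}


@dataclass
class Finding:
    entity_type: str
    start: int
    end: int
    score: float


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_credit_cards(text: str, window: int = 30) -> list:
    """Pattern match, then checksum validation, then context-based scoring."""
    findings = []
    for m in CARD_PATTERN.finditer(text):
        if not luhn_valid(m.group()):
            continue  # right format, wrong checksum: not a card number
        score = 0.6  # assumed base score for a checksum-valid match
        nearby = text[max(0, m.start() - window):m.start()].lower()
        if any(word in nearby for word in CONTEXT_WORDS):
            score = 1.0  # assumed boosted score when context supports it
        findings.append(Finding("CREDIT_CARD", m.start(), m.end(), score))
    return findings
```

With this sketch, `find_credit_cards("My credit card is 4111 1111 1111 1111.")` returns one finding with score 1.0, while the same number without a nearby context word scores 0.6 and a number that fails the checksum is not reported at all.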
Presidio can be integrated into any data pipeline for intelligent PII scrubbing. It is open-source, transparent, and scalable. Additionally, PII anonymization use cases often require a different set of PII entities to be detected, some of which are domain- or business-specific. Presidio allows you to customize or add new PII recognizers via API or code to best fit your anonymization needs.
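As a rough illustration of what adding a business-specific recognizer and then anonymizing the result can look like, the following plain-Python sketch registers regex-based recognizers by entity name and replaces each match with a placeholder. It does not use Presidio's API: `SimpleRegistry`, `anonymize`, the `EMPLOYEE_ID` entity, and both patterns are hypothetical and exist only for this example.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type)


class SimpleRegistry:
    """Hypothetical registry mapping entity names to regex recognizers."""

    def __init__(self) -> None:
        self._patterns: Dict[str, re.Pattern] = {}

    def add(self, entity_type: str, regex: str) -> None:
        """Register (or overwrite) a recognizer for `entity_type`."""
        self._patterns[entity_type] = re.compile(regex)

    def analyze(self, text: str) -> List[Span]:
        """Return all matches of all registered recognizers, sorted by offset."""
        spans = []
        for entity_type, pattern in self._patterns.items():
            for m in pattern.finditer(text):
                spans.append((m.start(), m.end(), entity_type))
        return sorted(spans)


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each span with <ENTITY_TYPE>.

    Spans are processed from the end of the text so earlier offsets stay
    valid. Overlapping spans are not handled in this sketch.
    """
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text
```

For example, after `registry = SimpleRegistry()`, `registry.add("US_PHONE", r"\b\d{3}-\d{3}-\d{4}\b")`, and `registry.add("EMPLOYEE_ID", r"\bEMP-\d{6}\b")`, calling `anonymize(text, registry.analyze(text))` on `"Call 425-555-0100 about EMP-004217."` yields `"Call <US_PHONE> about <EMPLOYEE_ID>."`.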
Try Presidio with your own data
API Spec - available APIs, request and response formats.
Presidio REST API OpenAPI Spec
- Simple Text Analysis
- Create Reusable Templates
- Detect Specific Entities
- Custom Anonymization
- Add Custom PII Entity Recognizer
- Image Anonymization
More information can be found in the Presidio documentation:
- Supported field types
- Database and storage scanner
- Adding new PII recognizers
- Generating Swagger file
- Evaluating Presidio
- Proto packages for Presidio API
Follow the Deployment Guidelines for details:
- Single click deployment on a Kubernetes Cluster
- Step by Step Deployment with customizable parameters on a Kubernetes Cluster
- Setting Up a Development Environment
- Adding Custom Fields
- Recognizers Development - Best Practices and Considerations
- Using the Analyzer Service
- Calling the different services
- Connector Developer Guide
- Deploy locally using Docker
- Deploy locally using KIND
- Presidio-Analyzer as a standalone Python package
Module | Feature | Status |
---|---|---|
API | HTTP input | ✅ |
Scanner | MySQL | ❌ |
Scanner | MSSQL | ❌ |
Scanner | PostgreSQL | ❌ |
Scanner | Oracle | ❌ |
Scanner | Azure Blob Storage | ✅ |
Scanner | S3 | ✅ |
Scanner | Google Cloud Storage | ❌ |
Streams | Kafka | ✅ |
Streams | Azure Event Hub | ✅ |
Datasink (output) | MySQL | ✅ |
Datasink (output) | MSSQL | ✅ |
Datasink (output) | Oracle | ❌ |
Datasink (output) | PostgreSQL | ✅ |
Datasink (output) | Kafka | ✅ |
Datasink (output) | Azure Event Hub | ✅ |
Datasink (output) | Azure Blob Storage | ✅ |
Datasink (output) | S3 | ✅ |
Datasink (output) | Google Cloud Storage | ❌ |
- ✅ - Working
- 🔶 - Partially supported (alpha)
- ❌ - Not supported yet
If you have a usage question, have found a bug, or have a suggestion for improvement, please file a GitHub issue. For other matters, please email [email protected]
❗ Note: While we define the roadmap for Presidio, we will accept only bug-fix PRs, not new features, in the upcoming months.
For details on contributing to this repository, see the contributing guide.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.