Microsoft Presidio DLP support #3

Merged: 4 commits, Aug 5, 2024
152 changes: 149 additions & 3 deletions Cargo.lock
Some generated files are not rendered by default.

7 changes: 5 additions & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "redacter"
-version = "0.2.0"
+version = "0.3.0"
edition = "2021"
authors = ["Abdulla Abdurakhmanov <[email protected]>"]
license = "Apache-2.0"
@@ -18,7 +18,8 @@ description = "Copy & Redact files cli tool utilizing Data Loss Prevention (DLP)
default = []
ci-gcp = [] # For testing on CI/GCP
ci-aws = [] # For testing on CI/AWS
-ci = ["ci-gcp", "ci-aws"]
+ci-ms-presidio = [] # For testing on CI/MS Presidio
+ci = ["ci-gcp", "ci-aws", "ci-ms-presidio"]


[dependencies]
@@ -48,6 +49,8 @@ csv-async = { version = "1", default-features = false, features = ["tokio", "tok
aws-config = { version = "1", features = ["behavior-version-latest"] }
aws-sdk-s3 = { version = "1" }
aws-sdk-comprehend = { version = "1" }
+url = "2"
+reqwest = { version = "0.12", features = ["multipart", "h2"] }


[dev-dependencies]
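The `ci-ms-presidio` entry in this diff is an empty Cargo feature: it pulls in no dependencies and only serves as a compile-time switch. A minimal sketch of how such a switch could gate provider-specific tests follows. The function and module names are hypothetical and are not taken from this pull request.

```rust
// Sketch only: `presidio_ci_enabled` and `presidio_ci_tests` are hypothetical
// names used for illustration, not items from the redacter source.

/// Returns true when the crate is compiled with `--features ci-ms-presidio`
/// (or with the umbrella `ci` feature, which enables it).
pub fn presidio_ci_enabled() -> bool {
    cfg!(feature = "ci-ms-presidio")
}

// This module is compiled only for test builds that enable the feature,
// so a plain `cargo test` never needs a running Presidio instance.
#[cfg(all(test, feature = "ci-ms-presidio"))]
mod presidio_ci_tests {
    #[test]
    fn runs_only_with_presidio_feature() {
        // A real test would call a running Presidio instance here.
        assert!(super::presidio_ci_enabled());
    }
}
```

With this layout, `cargo test --features ci-ms-presidio` would include the gated module, while the default feature set skips it.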
30 changes: 26 additions & 4 deletions README.md
@@ -20,11 +20,16 @@ Google Cloud Platform's DLP API.
* Amazon Simple Storage Service (S3)
* Zip files
* **DLP Integration:**
-* GCP DLP API for accurate and customizable redaction for:
+* [Google Cloud Platform DLP](https://cloud.google.com/security/products/dlp?hl=en) for accurate and customizable
+redaction for:
* text, html, json files
* structured data table files (csv)
* images (jpeg, png, bmp, gif)
-* AWS Comprehend PII redaction for text files.
+* [AWS Comprehend](https://aws.amazon.com/comprehend/) PII redaction for text files.
+* [Microsoft Presidio](https://microsoft.github.io/presidio/) for PII redaction (open source project that you can
+install on-prem).
+* text, html, json files
+* images
* ... more DLP providers can be added in the future.
* **CLI:** Easy-to-use command-line interface for streamlined workflows.
* Built with Rust to ensure speed, safety, and reliability.
@@ -58,7 +63,7 @@ Options:
-f, --filename-filter <FILENAME_FILTER>
Filter by name using glob patterns such as *.txt
-d, --redact <REDACT>
-Redacter type [possible values: gcp-dlp, aws-comprehend-dlp]
+Redacter type [possible values: gcp-dlp, aws-comprehend, ms-presidio]
--gcp-project-id <GCP_PROJECT_ID>
GCP project id that will be used to redact and bill API calls
--allow-unsupported-copies
@@ -69,6 +74,10 @@ Options:
CSV delimiter (default is ',')
--aws-region <AWS_REGION>
AWS region for AWS Comprehend DLP redacter
+--ms-presidio-text-analyze-url <MS_PRESIDIO_TEXT_ANALYZE_URL>
+URL for text analyze endpoint for MsPresidio redacter
+--ms-presidio-image-redact-url <MS_PRESIDIO_IMAGE_REDACT_URL>
+URL for image redact endpoint for MsPresidio redacter
-h, --help
Print help
```
@@ -91,12 +100,19 @@ Source/destination can be a local file or directory, or a file in GCS, S3, or a
To be able to use GCP DLP you need to authenticate using `gcloud auth application-default login` or provide a service
account key using `GOOGLE_APPLICATION_CREDENTIALS` environment variable.

-### AWS Comprehend DLP
+### AWS Comprehend

To be able to use AWS Comprehend DLP you need to authenticate using `aws configure` or provide a service account.
To provide an AWS region use `--aws-region` option since AWS Comprehend may not be available in all regions.
AWS Comprehend DLP is only available for unstructured text files.

+### Microsoft Presidio
+
+To be able to use Microsoft Presidio DLP you need to have a running instance of the Presidio API.
+You can use Docker to run it locally or deploy it to your infrastructure.
+You need to provide the URLs for text analysis and image redaction endpoints using `--ms-presidio-text-analyze-url` and
+`--ms-presidio-image-redact-url` options.

## Examples:

```sh
@@ -128,6 +144,12 @@ and/or by size:
redacter cp -m 1024 ...
```

+MS Presidio redacter:
+
+```sh
+redacter cp -d ms-presidio --ms-presidio-text-analyze-url http://localhost:5002/analyze --ms-presidio-image-redact-url http://localhost:5003/redact ...
+```

## Security considerations

- Your file contents are sent to the DLP API for redaction. Make sure you trust the DLP API provider.
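
For the Microsoft Presidio text path, the README changes mention only an analyze endpoint. Presidio's `/analyze` response is a list of findings, each with an `entity_type`, a `start`, and an `end` offset, so a client presumably has to apply those findings to the text itself. The sketch below shows one way that step could look. It is not taken from this pull request: the `Finding` struct, the `apply_findings` function, and the `[ENTITY_TYPE]` placeholder format are assumptions made for illustration, and offsets are assumed to count Unicode characters rather than bytes.

```rust
/// One finding from an analyze response, reduced to the fields this
/// sketch needs. Hypothetical type, not part of the redacter source.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub entity_type: String,
    pub start: usize, // character offset, inclusive (assumed)
    pub end: usize,   // character offset, exclusive (assumed)
}

/// Replaces each finding's span with `[ENTITY_TYPE]`.
/// Findings may arrive in any order; a finding that overlaps an earlier
/// one is folded into the earlier placeholder instead of leaking text.
pub fn apply_findings(text: &str, findings: &[Finding]) -> String {
    // Index by characters, not bytes, so non-ASCII text is handled safely.
    let chars: Vec<char> = text.chars().collect();
    let mut sorted: Vec<&Finding> = findings.iter().collect();
    sorted.sort_by_key(|f| (f.start, f.end));

    let mut out = String::with_capacity(text.len());
    let mut cursor = 0usize; // first character not yet written or masked
    for f in sorted {
        // Clamp offsets so a malformed finding cannot cause a panic.
        let start = f.start.min(chars.len());
        let end = f.end.min(chars.len());
        if start >= end {
            continue; // empty or inverted span
        }
        if start >= cursor {
            // Copy the untouched text, then emit the placeholder.
            out.extend(chars[cursor..start].iter());
            out.push('[');
            out.push_str(&f.entity_type);
            out.push(']');
            cursor = end;
        } else if end > cursor {
            // Overlaps the previous finding: extend the masked region.
            cursor = end;
        }
    }
    out.extend(chars[cursor..].iter());
    out
}
```

Working on a `Vec<char>` costs an extra allocation but avoids slicing a `&str` at an offset that is not a UTF-8 boundary, which would panic.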