Connector Developer Guide

This guide describes how developers can write new connectors for Presidio. Connectors allow users to anonymize their data as it moves between different data sources. This guide reviews a few key Connector concepts and then describes how to create connectors and schedule the anonymization process between them.

Core concepts

To anonymize data moving from one system to another, define Connectors for the systems to pull data from and push data to. There are two types of Connectors:

  1. Source - data source from which data is read.
  2. Sink - anonymized data output target.

After having both sink and source connectors defined, the next step is to trigger the Scheduler. The Scheduler allows you to read your data periodically from the Source Connector, anonymize it and export it to Sink Connector.

How to Create New Data Source Connector

The project already supports sources such as object storage and streams. To reuse existing code for a similar source, or to create an entirely new one, follow the relevant steps below:

Object Storage Source Connector

Object storage is a computer data storage architecture that manages data as objects. The following object storage services are currently supported:

  • AWS S3
  • Azure Blob Storage
  • Azure Data Lake Storage Gen2

To implement a new object storage source connector while reusing existing code, follow these steps:

  1. Implement the scanner interface - The scanner interface represents a storage data source. It requires implementing three methods:

    1. Init - initializes the connection to the storage data source.
    2. ScanFunc - the function executed on each scanned item.
    3. Scan(fn ScanFunc) - walks over the items to scan and executes ScanFunc on each one.

    Implement the scanner interface using a struct under presidio-collector/cmd/presidio-collector/scanner. For example, see: storage scanner

  2. Add configuration - create templates for the source and object storage configurations and service definitions in the presidio-genproto repo.

  3. Add to Factory - Add your newly created scanner to the storage factory's CreateScanner method.
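The scanner contract can be sketched as a small Go program. The interface shape below is assumed from the description above (the authoritative definitions live in presidio-collector), and memScanner is a hypothetical in-memory scanner used only to illustrate the contract; a real connector would list objects from S3, Blob Storage, and so on.

```go
package main

import "fmt"

// ScanFunc is executed on every item found in the storage source
// (assumed signature; the real one lives in presidio-collector).
type ScanFunc func(item string) error

// Scanner mirrors the three methods described above.
type Scanner interface {
	Init()                  // open the connection to the storage source
	Scan(fn ScanFunc) error // walk the items and run fn on each one
}

// memScanner is a hypothetical in-memory scanner for illustration only.
type memScanner struct {
	items []string
}

func (s *memScanner) Init() {}

func (s *memScanner) Scan(fn ScanFunc) error {
	for _, item := range s.items {
		if err := fn(item); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	s := &memScanner{items: []string{"doc-1.txt", "doc-2.txt"}}
	s.Init()
	_ = s.Scan(func(item string) error {
		fmt.Println("scanning", item)
		return nil
	})
}
```

A real implementation would open the storage connection in Init and page through object listings in Scan, but the calling pattern stays the same.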

Stream Source Connector

A stream is data that is continuously generated by different sources.

To implement a new stream source connector while reusing existing code, follow these steps:

  1. Implement the stream interface - The stream interface represents a stream data source. It requires implementing three methods:
    1. ReceiveFunc - defines how a received event should be processed.
    2. Receive - reads an event from the stream.
    3. Send - sends an event to the stream.

    Implement the stream interface using a struct under pkg/stream. For example, see: kafka
  2. Add configuration - create templates for the source and stream configurations and service definitions in the presidio-genproto repo.
  3. Add to Factory - Add your newly created stream to the streams factory's CreateStream method.
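The stream contract can be sketched the same way. The interface and the ReceiveFunc signature below are assumptions modeled on the description above, and chanStream is a hypothetical channel-backed stream used for illustration; a real connector would wrap Kafka, Event Hubs, or similar.

```go
package main

import "fmt"

// ReceiveFunc defines how a received event is processed
// (assumed signature, modeled on the description above).
type ReceiveFunc func(partition string, sequence string, data string) error

// Stream mirrors the methods the guide lists.
type Stream interface {
	Receive(fn ReceiveFunc) error // read events and hand each to fn
	Send(message string) error    // write an event to the stream
}

// chanStream is a hypothetical channel-backed stream for illustration.
type chanStream struct {
	events chan string
}

func (s *chanStream) Send(message string) error {
	s.events <- message
	return nil
}

func (s *chanStream) Receive(fn ReceiveFunc) error {
	for {
		select {
		case msg := <-s.events:
			if err := fn("0", "0", msg); err != nil {
				return err
			}
		default:
			return nil // drained for this sketch; a real stream blocks
		}
	}
}

func main() {
	s := &chanStream{events: make(chan string, 10)}
	_ = s.Send(`{"text": "call me at 555-1234"}`)
	_ = s.Receive(func(partition, sequence, data string) error {
		fmt.Println("received:", data)
		return nil
	})
}
```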

New Source Connector

Contributions of additional source connector types are more than welcome. To implement a new type of data Source Connector, follow these steps:

  1. Create implementation - Create a new directory under pkg and implement the new data source connector. You can use the existing connectors as examples.
  2. Create configuration - create templates for the source and any additional configurations and service definitions in the presidio-genproto repo.
  3. Create Factory - Create a new directory under presidio-collector/cmd/presidio-collector/ and add a factory like those of the other connector types.

How to Create New Data Sink

All data sinks need to implement the Datasink interface. The Datasink interface represents the different data output types. It requires implementing the following methods:

  • Init - initializes the datasink.
  • WriteAnalyzeResults - writes the analyzer results to the specified datasink.
  • WriteAnonymizeResults - writes the anonymized response to the specified datasink.
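A minimal sketch of the Datasink contract, with assumed simplified signatures (the real methods take presidio-genproto result types). consoleDatasink is a hypothetical sink that just prints, standing in for a real storage, stream, or database sink.

```go
package main

import "fmt"

// Datasink mirrors the three methods listed above
// (assumed simplified signatures).
type Datasink interface {
	Init()
	WriteAnalyzeResults(results []string, path string) error
	WriteAnonymizeResults(text string, path string) error
}

// consoleDatasink is a hypothetical sink used only for illustration.
type consoleDatasink struct{}

func (d *consoleDatasink) Init() {}

func (d *consoleDatasink) WriteAnalyzeResults(results []string, path string) error {
	for _, r := range results {
		fmt.Printf("analyze result for %s: %s\n", path, r)
	}
	return nil
}

func (d *consoleDatasink) WriteAnonymizeResults(text string, path string) error {
	fmt.Printf("anonymized %s: %s\n", path, text)
	return nil
}

func main() {
	var sink Datasink = &consoleDatasink{} // compile-time interface check
	sink.Init()
	_ = sink.WriteAnalyzeResults([]string{"PHONE_NUMBER at 11-19"}, "doc-1.txt")
	_ = sink.WriteAnonymizeResults("call me at <PHONE_NUMBER>", "doc-1.txt")
}
```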

The project already supports sinks such as object storage, databases, and streams. To reuse existing code for a similar sink, or to create a new one, follow the relevant steps below:

Object Storage Sink Connector

Creating an object storage sink connector is similar to creating an object storage source connector. In addition to the steps listed for the storage source connector, implement the following functions in your storage scanner struct:

  1. CreateContainer(name string) - creates a container/bucket reference.
  2. New(kind string, config stow.Config, concurrencyLimit int) - initializes a new storage instance.
  3. PutItem(name string, content string, container stow.Container) - writes a new item to the container.

For example, see: storage
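The three functions can be sketched with simplified, assumed types (the real ones take stow.Config and stow.Container from the stow library, and New also takes a concurrency limit). Everything here is an in-memory stand-in for illustration.

```go
package main

import "fmt"

// container is a simplified stand-in for stow.Container.
type container struct {
	name  string
	items map[string]string
}

// storage is a hypothetical, pared-down storage type.
type storage struct {
	containers map[string]*container
}

// New initializes a new storage instance
// (kind selection and concurrency are omitted in this sketch).
func New(kind string) *storage {
	return &storage{containers: map[string]*container{}}
}

// CreateContainer returns a reference to the named container/bucket,
// creating it if it does not exist yet.
func (s *storage) CreateContainer(name string) *container {
	if c, ok := s.containers[name]; ok {
		return c
	}
	c := &container{name: name, items: map[string]string{}}
	s.containers[name] = c
	return c
}

// PutItem writes a new item into the container.
func (s *storage) PutItem(name string, content string, c *container) error {
	c.items[name] = content
	return nil
}

func main() {
	st := New("memory")
	c := st.CreateContainer("anonymized-output")
	_ = st.PutItem("doc-1.txt", "call me at <PHONE_NUMBER>", c)
	fmt.Println("stored items:", len(c.items))
}
```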

Stream Sink Connector

Creating a stream sink connector is similar to creating a stream source connector. In addition to the steps listed for the stream source connector:

  1. Implement producer - To allow messages to be written to the stream, define a NewProducer method in your stream struct.
  2. Add to Factory - Add your newly created stream sink to the stream New method.
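A hedged sketch of the producer step: kafkaStream and its fields are hypothetical, and the producer here is a stub that just echoes. In a real connector, NewProducer would construct your Kafka (or other) client library's producer, and Send would publish through it.

```go
package main

import "fmt"

// kafkaStream is a hypothetical stream struct; a real one would hold
// broker addresses and a client from your streaming library of choice.
type kafkaStream struct {
	topic    string
	producer func(message string) error
}

// NewProducer wires up the producing side so messages can be written
// to the stream. Here it installs a stub; the real method would
// create the client library's producer.
func (s *kafkaStream) NewProducer() {
	s.producer = func(message string) error {
		fmt.Printf("produced to %s: %s\n", s.topic, message)
		return nil
	}
}

// Send writes an event to the stream via the producer.
// NewProducer must be called first.
func (s *kafkaStream) Send(message string) error {
	return s.producer(message)
}

func main() {
	s := &kafkaStream{topic: "anonymized-events"}
	s.NewProducer()
	_ = s.Send("call me at <PHONE_NUMBER>")
}
```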

DB Sink Connector

Presidio supports the following DBs:

  • MSSQL
  • MySQL
  • PostgreSQL

To support additional databases:

  1. Implement datasink interface - create a new struct under presidio-datasink/cmd/presidio-datasink/database. For example, see: database
  2. Add to Factory - Add your newly created DB sink to the datasink-factory.
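A database sink can be sketched as a struct that turns write calls into INSERT statements. The table name, column names, and the injected execFunc below are all assumptions for illustration; a real sink would call database/sql (or an ORM) with the driver for your database.

```go
package main

import "fmt"

// execFunc abstracts the database call so the sketch stays
// driver-agnostic; a real sink would use database/sql.
type execFunc func(query string, args ...interface{}) error

// dbDatasink is a hypothetical database sink writing anonymized
// output into a results table (names assumed).
type dbDatasink struct {
	tableName string
	exec      execFunc
}

func (d *dbDatasink) Init() {}

func (d *dbDatasink) WriteAnonymizeResults(text string, path string) error {
	query := fmt.Sprintf(
		"INSERT INTO %s (path, anonymized_text) VALUES (?, ?)", d.tableName)
	return d.exec(query, path, text)
}

func main() {
	sink := &dbDatasink{
		tableName: "anonymize_results",
		exec: func(query string, args ...interface{}) error {
			fmt.Println(query, args) // stand-in for db.Exec
			return nil
		},
	}
	sink.Init()
	_ = sink.WriteAnonymizeResults("call me at <PHONE_NUMBER>", "doc-1.txt")
}
```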

New Sink Connector

Contributions of additional sink connector types are more than welcome. To implement a new type of data Sink Connector, follow these steps:

  1. Implement required writing capabilities - create a new directory under pkg and add a struct implementing the required writing capabilities.
  2. Create configuration - create templates for the sink and any additional configurations and service definitions in the presidio-genproto repo.
  3. Implement datasink interface - create a new directory under presidio-datasink/cmd/presidio-datasink and a struct which implements the datasink interface methods.
  4. Add to Factory - Add your newly created sink to the datasink-factory.

Scheduler

The Scheduler allows you to read your data periodically from the Source Connector, anonymize it and export it to Sink Connector.

For more information about scheduler usage, see the scheduler readme.

Scheduling is currently supported for object storage and streams.

To support additional sources and sinks in the scheduler, follow these steps:

  1. Create configuration - create any additional configurations, templates and service definitions in the presidio-genproto repo, as follows:
    • CronJobApiRequest - represents the request to the API HTTP service.
    • CronJobRequest - represents the request to the scheduler service via gRPC.
    • CronJobResponse - represents the response from the scheduler service.
    • Support the new job in the init() method.
    • Add an Apply method to SchedulerServiceClient to trigger a new scanning cron job and return whether it was triggered successfully.
    • Create a Handler to handle the job request.
    • Add the newly added Apply method and the Handler to _SchedulerService_serviceDesc. Use the existing storage scanner/stream as an example.
  2. Create job - Create a new job directory under presidio-api/cmd/presidio-api/api and create the job implementation, including how it should be scheduled. For example, see: scanner-job
  3. Support new cron job creation in the existing API - Add the newly created job to the validateTemplate method.