Presidio supports custom fields, which can be added either online through a simple REST API or by adding new PII recognizers in code. The underlying object behind each field is called a 'Recognizer'. This documentation describes how to add new recognizers through the API or in code.
-
API
a. Getting a recognizer
GET <api-service-address>/api/v1/analyzer/recognizers/<recognizer_name>
b. Getting all recognizers
GET <api-service-address>/api/v1/analyzer/recognizers
c. Creating a new recognizer
POST <api-service-address>/api/v1/analyzer/recognizers/<new_recognizer_name>
{ "value": { "entity": "ROCKET", "language": "en", "patterns": [ { "name": "rocket-recognizer", "regex": "\\W*(rocket)\\W*", "score": 1 }, { "name": "projectile-recognizer", "regex": "\\W*(projectile)\\W*", "score": 1 } ] } }
| Field | Description | Optional |
| --- | --- | --- |
| value | The recognizer JSON object | no |

Recognizer format:

| Field | Description | Optional |
| --- | --- | --- |
| entity | The name of the new field, e.g. 'ROCKET' | no |
| language | The supported language | no |
| patterns | A list of regular expression objects | yes |
| blacklist | A list of words to be identified as PII entities, e.g. ["Mr","Mrs","Ms","Miss"] | yes |
| contextPhrases | A list of words used to improve confidence when they are found in the vicinity of an identified entity, e.g. ["credit-card","credit","cc","amex"] | yes |

A request should provide either patterns or blacklist as input.

Regular expression format:

| Field | Description | Optional |
| --- | --- | --- |
| name | The name of this pattern | no |
| regex | A regular expression | no |
| score | The score given to entities detected by this recognizer | no |

d. Update a recognizer
PUT <api-service-address>/api/v1/analyzer/recognizers/<recognizer_name>
Payload is similar to the one described in Creating a new recognizer.
e. Delete a recognizer
DELETE <api-service-address>/api/v1/analyzer/recognizers/<recognizer_name>
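The endpoints above can be called from any HTTP client. The following is a minimal Python sketch, using only the standard library, that checks a recognizer payload against the documented fields and builds (but does not send) the matching request. The helper names validate_recognizer and build_recognizer_request, the base URL http://localhost:8080, the recognizer name rocket, and the Content-Type header are illustrative assumptions, not part of Presidio. The "either patterns or blacklist" rule is interpreted here as "at least one of the two".

```python
import json
import urllib.request

# Hypothetical base URL; substitute your own <api-service-address>.
BASE_URL = "http://localhost:8080"


def validate_recognizer(recognizer):
    """Check the fields marked as non-optional in the recognizer format."""
    for field in ("entity", "language"):
        if not recognizer.get(field):
            raise ValueError(f"missing required field: {field}")
    # Assumption: "either patterns or blacklist" means at least one is present.
    if not recognizer.get("patterns") and not recognizer.get("blacklist"):
        raise ValueError("provide either 'patterns' or 'blacklist'")
    for pattern in recognizer.get("patterns", []):
        for field in ("name", "regex", "score"):
            if field not in pattern:
                raise ValueError(f"pattern is missing field: {field}")


def build_recognizer_request(method, name, recognizer=None, base_url=BASE_URL):
    """Build an HTTP request for a single recognizer (illustrative helper)."""
    url = f"{base_url}/api/v1/analyzer/recognizers/{name}"
    data = None
    headers = {}
    if method in ("POST", "PUT"):
        validate_recognizer(recognizer)
        # The request body wraps the recognizer in a "value" object.
        data = json.dumps({"value": recognizer}).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumed header
    return urllib.request.Request(url, data=data, headers=headers, method=method)


rocket = {
    "entity": "ROCKET",
    "language": "en",
    "patterns": [
        {"name": "rocket-recognizer", "regex": r"\W*(rocket)\W*", "score": 1},
    ],
}

create = build_recognizer_request("POST", "rocket", rocket)
delete = build_recognizer_request("DELETE", "rocket")
# To actually send a request: urllib.request.urlopen(create)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the service.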
f. Using the custom field
After creating a new recognizer, either explicitly state in the templates the newly added entity name, or set allFields to true. For example:
i. Set allFields to true:
echo -n '{"text":"They sent a rocket to the moon!", "analyzeTemplate":{"allFields":true} }' | http <api-service-address>/api/v1/projects/<my-project>/analyze
ii. Specifically define the recognizers to be used:
echo -n '{"text":"They sent a rocket to the moon!", "analyzeTemplate":{"fields":[{"name": "ROCKET"}]}}' | http <api-service-address>/api/v1/projects/<my-project>/analyze
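Both request bodies can also be produced programmatically. The following is a minimal Python sketch that assumes only the JSON shapes shown in the two commands above; the helper name build_analyze_body is hypothetical.

```python
import json


def build_analyze_body(text, entities=None):
    """Return the JSON body for an analyze call (illustrative helper).

    With no entities, every field is requested via allFields;
    otherwise only the named entities are requested.
    """
    if entities:
        template = {"fields": [{"name": name} for name in entities]}
    else:
        template = {"allFields": True}
    return json.dumps({"text": text, "analyzeTemplate": template})


text = "They sent a rocket to the moon!"
all_fields_body = build_analyze_body(text)
rocket_only_body = build_analyze_body(text, ["ROCKET"])
print(all_fields_body)
print(rocket_only_body)
```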
-
Custom recognizer by code
Code-based recognizers are written in Python and are part of the presidio-analyzer module. The main modules in presidio-analyzer are the AnalyzerEngine and the RecognizerRegistry. The AnalyzerEngine is in charge of calling each requested recognizer. The RecognizerRegistry is in charge of providing the list of predefined and custom recognizers for analysis.
In order to implement a new recognizer by code, follow these two steps:
a. Implement the abstract recognizer class:
Create a new Python class which implements LocalRecognizer. LocalRecognizer implements the base EntityRecognizer class. All local recognizers run locally, together with all other predefined recognizers, as part of the presidio-analyzer Python process. In contrast, RemoteRecognizer is a placeholder for recognizers that are external to the presidio-analyzer service, for example on a different microservice.
The EntityRecognizer abstract class requires the implementation of the following methods:
i. load: Initializes a model. Called when the presidio-analyzer process starts:
def load(self)
ii. analyze: The main function to be called for getting entities out of the new recognizer:
def analyze(self, text, entities, nlp_artifacts):
The analyze method should return a list of RecognizerResult. Refer to the code documentation for more information.
b. Reference and add the new class to the RecognizerRegistry module, in the load_predefined_recognizers method, which registers all code-based recognizers.
c. Note that if adding the new recognizer is expected to increase the memory or CPU consumption of the analyzer (for example, when adding a new model-based recognizer), you should consider updating the pod's resource allocation in analyzer-deployment.yaml.
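The load and analyze methods from step a can be sketched as follows. To keep the example runnable without Presidio installed, EntityRecognizer and RecognizerResult below are simplified stand-ins written for this illustration; the real classes in presidio-analyzer have their own constructors and fields, and a real recognizer would subclass LocalRecognizer, so consult the code documentation before adapting it. The class name RocketRecognizer and the result fields (entity_type, start, end, score) are assumptions.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RecognizerResult:
    """Stand-in for Presidio's result type (field names are assumed)."""
    entity_type: str
    start: int
    end: int
    score: float


class EntityRecognizer(ABC):
    """Stand-in base class exposing the two methods named in the text."""

    def __init__(self, supported_entities):
        self.supported_entities = supported_entities
        self.load()  # mirrors "occurs when the process starts"

    @abstractmethod
    def load(self):
        ...

    @abstractmethod
    def analyze(self, text, entities, nlp_artifacts):
        ...


class RocketRecognizer(EntityRecognizer):
    """Hypothetical recognizer reusing the regexes from the API example."""

    def __init__(self):
        super().__init__(supported_entities=["ROCKET"])

    def load(self):
        # "Model" initialization: here, just compiling the regexes once.
        self.patterns = [
            re.compile(r"\W*(rocket)\W*"),
            re.compile(r"\W*(projectile)\W*"),
        ]

    def analyze(self, text, entities, nlp_artifacts=None):
        # Only answer when the caller asked for an entity we support.
        if "ROCKET" not in entities:
            return []
        return [
            RecognizerResult("ROCKET", m.start(1), m.end(1), 1.0)
            for pattern in self.patterns
            for m in pattern.finditer(text)
        ]


recognizer = RocketRecognizer()
results = recognizer.analyze("They sent a rocket to the moon!", ["ROCKET"], None)
```

The result spans come from the capture group rather than the whole match, so the surrounding non-word characters consumed by \W* are not reported as part of the entity.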