Instance anonymization #17

Open, wants to merge 16 commits into base: master
@@ -17,5 +17,17 @@ def data(self) -> MappingDataType:
         return {k: dict(v) for k, v in self.mapping.items()}
 
     def update(self, new_mapping: MappingDataType) -> None:
+        """Update the deanonymizer mapping with new values
+        Duplicate values will not be added
+        """
+        new_values_seen = set()

Why do we need this? Is it because the same value can appear for different entity_type?

Author:
In the case of default recognizers, not likely, but who knows what users will come up with when adding their own 😛


         for entity_type, values in new_mapping.items():
-            self.mapping[entity_type].update(values)
+            for k, v in values.items():
+                # Make sure it is not a duplicate value
+                if (
+                    v not in self.mapping[entity_type].values()
+                    and v not in new_values_seen
+                ):
+                    self.mapping[entity_type][k] = v
+                    new_values_seen.update({v})
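
To make the review question above concrete, here is a minimal standalone sketch of the duplicate check in `update`. The class name `MappingSketch`, the `defaultdict(dict)` storage, and the placeholder keys and values are assumptions made for illustration, not taken from the PR; only the body of `update` mirrors the diff.

```python
from collections import defaultdict
from typing import Dict

MappingDataType = Dict[str, Dict[str, str]]


class MappingSketch:
    """Hypothetical stand-in for the class in the diff; only `update` is modeled."""

    def __init__(self) -> None:
        # Assumption: the mapping creates missing entity types on access.
        self.mapping: MappingDataType = defaultdict(dict)

    def update(self, new_mapping: MappingDataType) -> None:
        new_values_seen = set()
        for entity_type, values in new_mapping.items():
            for k, v in values.items():
                # Skip values already stored for this entity type, and values
                # already added earlier in this same call (any entity type).
                if (
                    v not in self.mapping[entity_type].values()
                    and v not in new_values_seen
                ):
                    self.mapping[entity_type][k] = v
                    new_values_seen.update({v})


m = MappingSketch()
m.update({"PERSON": {"<PERSON_1>": "Maria Lynch"}})
# Same value, same entity type, new key: skipped by the first condition.
m.update({"PERSON": {"<PERSON_2>": "Maria Lynch"}})
# Same value under two entity types in one call: only the first is kept,
# which is the case new_values_seen exists for.
m.update({"EMAIL": {"<EMAIL_1>": "shared"}, "PHONE": {"<PHONE_1>": "shared"}})
result = {k: dict(v) for k, v in m.mapping.items()}
```

In this sketch `new_values_seen` only matters when one call carries the same value under more than one entity type, which matches the author's reply that it is unlikely with default recognizers but possible with custom ones.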
@@ -282,7 +282,12 @@ def _anonymize(self, text: str, language: Optional[str] = None) -> str:
             text, filtered_analyzer_results, anonymizer_results
         )
 
-        return anonymizer_results.text
+        anonymizer_mapping = {
+            key: {v: k for k, v in inner_dict.items()}
+            for key, inner_dict in self.deanonymizer_mapping.items()
+        }
+
+        return default_matching_strategy(text, anonymizer_mapping)
 
     def _deanonymize(
         self,
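
The inversion added to `_anonymize` can be tried in isolation with the sketch below. The dict comprehension is copied from the diff; `replace_exact` is a hypothetical stand-in that assumes `default_matching_strategy` does plain substring replacement, which this page does not show, and the sample data is invented.

```python
from typing import Dict

MappingDataType = Dict[str, Dict[str, str]]


def invert_mapping(deanonymizer_mapping: MappingDataType) -> MappingDataType:
    # Same comprehension as in the diff: swap keys and values per entity type.
    return {
        key: {v: k for k, v in inner_dict.items()}
        for key, inner_dict in deanonymizer_mapping.items()
    }


def replace_exact(text: str, mapping: MappingDataType) -> str:
    """Hypothetical stand-in for default_matching_strategy (assumed behavior)."""
    for inner_dict in mapping.values():
        for source, target in inner_dict.items():
            text = text.replace(source, target)
    return text


deanonymizer_mapping = {
    "PERSON": {"<PERSON_1>": "Maria Lynch"},
    "EMAIL_ADDRESS": {"<EMAIL_1>": "maria@example.com"},
}
anonymizer_mapping = invert_mapping(deanonymizer_mapping)
anonymized = replace_exact(
    "Maria Lynch wrote from maria@example.com.", anonymizer_mapping
)
restored = replace_exact(anonymized, deanonymizer_mapping)

# Duplicate values collide when inverted: the later key overwrites the earlier.
collided = invert_mapping(
    {"PERSON": {"<PERSON_1>": "Maria", "<PERSON_2>": "Maria"}}
)
```

The `collided` case suggests one reason the duplicate check in `update` matters: inverting a mapping with repeated values silently drops entries, so the round trip is only lossless when values are unique.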