Refactor base classes #2

Merged
merged 10 commits into from
Sep 26, 2023
4 changes: 2 additions & 2 deletions .github/workflows/tox.yml
@@ -38,8 +38,8 @@ jobs:

     strategy:
       matrix:
-        python: ["3.10", "3.11"]
-        django: ["32", "40", "41", "42", "main"]
+        python: ["3.11"]
+        django: ["32", "40", "41", "42", "50", "main"]

     env:
       TOXENV: py${{ matrix.python }}-django${{ matrix.django }}
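As a rough check of what the updated matrix produces, the sketch below expands the two lists into the `TOXENV` names that the template in the hunk above would generate. It assumes the workflow has no `include`/`exclude` rules elsewhere in the file (not visible in this diff); `tox_envs` is a hypothetical helper used only for illustration.

```python
from itertools import product

# Values copied from the updated matrix in the hunk above.
pythons = ["3.11"]
djangos = ["32", "40", "41", "42", "50", "main"]


def tox_envs(pythons: list[str], djangos: list[str]) -> list[str]:
    """Mirror the TOXENV template: py<python>-django<django>."""
    return [f"py{p}-django{d}" for p, d in product(pythons, djangos)]


envs = tox_envs(pythons, djangos)
for env in envs:
    print(env)
```

Under that assumption the change goes from 2 x 5 = 10 jobs to 1 x 6 = 6 jobs: Python 3.10 is dropped and Django 5.0 is added.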
26 changes: 25 additions & 1 deletion README.md
@@ -25,6 +25,30 @@ up-to-date.
 The anonymisation itself doesn't change - it's just shifting the code
 around.

+## Redaction vs. Anonymisation
+
+This library contains two flavours of anonymisation - Redaction and
+Anonymisation. The two differ in how the data is overwritten:
+
+Type | Implementation | Performance | Granularity
+--- | --- | --- | ---
+Redaction | SQL | Fast | Table level
+Anonymisation | Python | Slow | Row level
+
+### Redaction
+
+Redaction is implemented as a single SQL `UPDATE` statement that wipes
+an entire table in one go. It's very fast, but it's limited in the sense
+that it cannot produce realistic data. In fact it may well render your
+application unusable. It is recommended as the first step in data
+anonymisation.
+
+### Anonymisation
+
+Anonymisation is a row-level operation that iterates over a
+queryset and updates each object in turn. The main advantage is that
+post-anonymisation you will have realistic, usable data.
+
 ## Usage

 As an example - this is a hypothetical User model's anonymisation today:
@@ -42,7 +66,7 @@ a new anonymiser that splits out each field:
 ```python
 # anonymisers.py
 @register_anonymiser
-class UserAnonymiser(BaseAnonymiser):
+class UserAnonymiser(ModelAnonymiser):
     model = User

     def anonymise_first_name(self, obj: User) -> None:
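The table-level versus row-level distinction drawn in the README's "Redaction vs. Anonymisation" section can be illustrated with a minimal sketch. This uses the standard-library `sqlite3` module rather than Django, so it is not the library's implementation; the `users` table, the `redact` and `anonymise` functions, and the replacement values are all hypothetical.

```python
import sqlite3

# Hypothetical in-memory table standing in for a User model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO users (first_name) VALUES (?)",
    [("Alice",), ("Bob",), ("Carol",)],
)


def redact(conn: sqlite3.Connection) -> None:
    """Redaction: one UPDATE overwrites the column for every row."""
    conn.execute("UPDATE users SET first_name = 'REDACTED'")


def anonymise(conn: sqlite3.Connection) -> None:
    """Anonymisation: visit each row and write a distinct value."""
    rows = conn.execute("SELECT id FROM users ORDER BY id").fetchall()
    for (user_id,) in rows:
        conn.execute(
            "UPDATE users SET first_name = ? WHERE id = ?",
            (f"User{user_id}", user_id),
        )


def first_names(conn: sqlite3.Connection) -> list[str]:
    cursor = conn.execute("SELECT first_name FROM users ORDER BY id")
    return [row[0] for row in cursor]


redact(conn)
redacted = first_names(conn)

anonymise(conn)
anonymised = first_names(conn)
```

The redacted rows all share one value and need a single statement; the anonymised rows differ per row but cost one statement each, which is the performance trade-off the README table summarises.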
8 changes: 4 additions & 4 deletions anonymiser/decorators.py
@@ -1,8 +1,8 @@
-from .models import BaseAnonymiser
-from .registry import register
+from .models import ModelAnonymiser
+from .registry import register_model_anonymiser


-def register_anonymiser(anonymiser: type[BaseAnonymiser]) -> type[BaseAnonymiser]:
+def register_anonymiser(anonymiser: type[ModelAnonymiser]) -> type[ModelAnonymiser]:
     """Add {model: Anonymiser} to the global registry."""
-    register(anonymiser)
+    register_model_anonymiser(anonymiser)
     return anonymiser
36 changes: 3 additions & 33 deletions anonymiser/management/commands/display_model_anonymisation.py
@@ -1,46 +1,16 @@
 from typing import Any

-from django.apps import apps
 from django.core.management.base import BaseCommand
-from django.db.models import ForeignObjectRel, Model
 from django.template.loader import render_to_string

-from anonymiser.models import FieldSummaryData
-from anonymiser.registry import get_model_anonymiser
+from anonymiser import registry


 class Command(BaseCommand):
-    def get_models(self) -> list[type[Model]]:
-        """Force alphabetical order of models."""
-        return sorted(apps.get_models(), key=lambda m: m._meta.label)
-
-    def get_fields(self, model: type[Model]) -> list:
-        """Get model fields ordered by type and then name."""
-        return sorted(
-            [
-                f
-                for f in model._meta.get_fields()
-                if not isinstance(f, ForeignObjectRel)
-            ],
-            key=lambda f: f.__class__.__name__ + f.name,
-        )
-
     def handle(self, *args: Any, **options: Any) -> None:
-        model_fields: list[FieldSummaryData] = []
-        model_anonymisers: dict[str, str] = {}
-        for model in self.get_models():
-            model_name = model._meta.label
-            anonymiser = get_model_anonymiser(model)
-            anonymiser_name = anonymiser.__class__.__name__ if anonymiser else ""
-            model_anonymisers[model_name] = anonymiser_name
-            for f in self.get_fields(model):
-                is_anonymisable = False
-                if anonymiser:
-                    is_anonymisable = anonymiser.is_field_anonymisable(f.name)
-                field_data = FieldSummaryData(f, is_anonymisable)
-                model_fields.append(field_data)
+        model_fields = registry.get_all_model_fields()
         out = render_to_string(
             "display_model_anonymisation.md",
-            {"model_anonymisers": model_anonymisers, "model_fields": model_fields},
+            {"model_fields": model_fields},
         )
         self.stdout.write(out)
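The body of `registry.get_all_model_fields()` is not part of this diff. Assuming it simply absorbs the loop removed from the command (models ordered by label, fields ordered by type name then field name, each flagged via `is_field_anonymisable`), a Django-free sketch could look like the following. Every class and the `MODELS` / `ANONYMISERS` dicts are hypothetical stand-ins, not the library's data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    name: str


class AutoField(Field):
    """Stand-in for a Django field type."""


class CharField(Field):
    """Stand-in for a Django field type."""


@dataclass
class FieldSummary:
    model_label: str
    field_name: str
    is_anonymisable: bool


class UserAnonymiser:
    def is_field_anonymisable(self, field_name: str) -> bool:
        return field_name in {"first_name", "last_name"}


# Hypothetical replacements for apps.get_models() and the anonymiser registry.
MODELS: dict[str, list[Field]] = {
    "auth.User": [CharField("last_name"), AutoField("id"), CharField("first_name")],
    "auth.Group": [CharField("name"), AutoField("id")],
}
ANONYMISERS: dict[str, UserAnonymiser] = {"auth.User": UserAnonymiser()}


def get_all_model_fields() -> list[FieldSummary]:
    """Assumed behaviour, mirroring the loop removed from the command."""
    summaries: list[FieldSummary] = []
    for label in sorted(MODELS):  # alphabetical by model label
        anonymiser: Optional[UserAnonymiser] = ANONYMISERS.get(label)
        # Same sort key as the removed get_fields(): type name, then field name.
        fields = sorted(MODELS[label], key=lambda f: f.__class__.__name__ + f.name)
        for f in fields:
            flag = anonymiser is not None and anonymiser.is_field_anonymisable(f.name)
            summaries.append(FieldSummary(label, f.name, flag))
    return summaries


summaries = get_all_model_fields()
```

Moving this logic behind one registry call leaves the command with only the rendering step, which matches the 33-line reduction in the hunk above.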