-
-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change de-identification method for Sensitive ID export fields #35772
base: master
Are you sure you want to change the base?
Conversation
b774853
to
5a9f740
Compare
Ensures it can be transformed to xml
hashed_value = models.TextField(max_length=32) | ||
deid = models.UUIDField(default=uuid.uuid4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had trouble coming up with good names for these fields, with fairly little context on export code in general. Open to alternatives here. Maybe something closer to hashed_item
and deid_value
?
if value is None: | ||
# None is a de-identifiable value but needs a string to encode for lookup | ||
value = '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a meaningful difference between an empty string and None
in data this would be de-identifying.
Previously you would get the same output for the both None
and the string 'None'
and this seems better to me, but if there's any reason to retain the previous behavior I can just call str(value)
instead.
78eab85
to
29a676a
Compare
Product Description
This will result in a change to the appearance and a one-time shift of values for de-identified "Sensitive ID" fields in exports. A previous field that might have looked like "PG42OH9AH5" will now read as a UUID like "3754f1a2-7eb1-4e6b-8640-65d7550c9bbc".
Technical Summary
SAAS-16366
The main concept here is that, instead of using a consistent hashing algorithm for DeID'd fields, we'll create and store a UUID that can be looked up for any given value and domain. This way, the value will still stay consistent over time but cannot be determined by simply re-running the hashing process on a value. This is represented by
DeIdMapping
(also open to other suggestions on naming).In order to add a model in the
export
app that could be used in the same place as other transform functions, I had to tweak how we access those functions inexport/const.py
. Previously these functions were all imported directly, but (perhaps predictably), adding a reference to a model from a django app within that app's own constants, would have resulted in circular imports. This is dealt with by addingget_transform_function
andget_deid_transform_function
toexport/utils.py
, which import the transform function at runtime usingimportlib
. There may be a slight negative performance impact to this choice -- not sure if this is a major concern or how to mitigate, if so.Safety Assurance
Safety story
This is a change that interacts with core export functionality, so there is some risk here. I've tested locally to see that marked fields are DeID'd correctly and that there is no other change to given exports. There is decent automated test coverage of the
export
app as a whole and would expect that to catch any major functional issues.Automated test coverage
Added a small test suite for behavior of the
get_deid
method ofDeIdMapping
. TheDEID_ID_TRANSFORM
that now points to the new method is implicitly tested in a few places throughouthqcase/tests
.QA Plan
Technically this is not a huge change, but given the risk in changes to exports, I think it'll be worthwhile to have this get a pass by QA.
Migrations
Rollback instructions
Labels & Review