add logging to linkage process and consolidate FHIR link routes #135
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #135      +/-   ##
==========================================
+ Coverage   95.47%   96.12%   +0.64%
==========================================
  Files          31       31
  Lines        1414     1392      -22
==========================================
- Hits         1350     1338      -12
+ Misses         64       54      -10
```
I think logging will help observability and explainability! Left a few questions and suggestions.
```python
LOGGER.info(
    "cluster belongingness",
    extra={
        "ratio": belongingness_ratio,
```
Suggested change:
```diff
- "ratio": belongingness_ratio,
+ "belongingness_ratio": belongingness_ratio,
```
```python
"person.reference_id": person.reference_id,
"matched": matched_count,
"total": len(patients),
"algorithm.ratio_lower": belongingness_ratio_lower_bound,
```
Suggested change:
```diff
- "algorithm.ratio_lower": belongingness_ratio_lower_bound,
+ "algorithm.belongingness_ratio_lower": belongingness_ratio_lower_bound,
```
```python
"matched": matched_count,
"total": len(patients),
"algorithm.ratio_lower": belongingness_ratio_lower_bound,
"algorithm.ratio_upper": belongingness_ratio_upper_bound,
```
Suggested change:
```diff
- "algorithm.ratio_upper": belongingness_ratio_upper_bound,
+ "algorithm.belongingness_ratio_upper": belongingness_ratio_upper_bound,
```
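Taken together, the three suggestions above give the belongingness log call roughly the shape below. This is a minimal, self-contained sketch: `belongingness_extra` and `log_cluster_belongingness` are hypothetical helpers standing in for the surrounding code in `link.py`, and the ratio is assumed to be `matched / total`, which the excerpt does not show.

```python
import logging

LOGGER = logging.getLogger("linkage.sketch")


def belongingness_extra(
    person_reference_id: str,
    matched_count: int,
    total: int,
    ratio_lower_bound: float,
    ratio_upper_bound: float,
) -> dict:
    """Build the `extra` payload for one Person cluster (hypothetical helper).

    Assumes the belongingness ratio is matched_count / total.
    """
    ratio = matched_count / total if total else 0.0
    return {
        # Spell out which ratio each value is, per the review suggestions.
        "belongingness_ratio": ratio,
        "person.reference_id": person_reference_id,
        "matched": matched_count,
        "total": total,
        "algorithm.belongingness_ratio_lower": ratio_lower_bound,
        "algorithm.belongingness_ratio_upper": ratio_upper_bound,
    }


def log_cluster_belongingness(**kwargs) -> dict:
    extra = belongingness_extra(**kwargs)
    # Each key in `extra` becomes an attribute on the emitted LogRecord.
    LOGGER.info("cluster belongingness", extra=extra)
    return extra
```

Because `extra` keys become `LogRecord` attributes, dotted names such as `algorithm.belongingness_ratio_lower` have to be read back with `getattr`, and a key that collides with a built-in attribute such as `message` makes `logging` raise `KeyError`.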
Left a few questions. Thanks for your detailed responses in the discussions!
```python
results: list[LinkResult] = [
    LinkResult(k, v) for k, v in sorted(scores.items(), reverse=True, key=lambda i: i[1])
]
result_counts["above_lower_bound"] = len(results)
```
I believe this will only return a value > 1 if `include_multiple_matches=true`. Is that intended?
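If the intent is for `above_lower_bound` to count every candidate that cleared the lower bound regardless of `include_multiple_matches`, one option is to take the count before truncating the list. A sketch under stated assumptions: `scores` maps a Person reference ID to its belongingness ratio, only the top result is kept when multiple matches are disabled, `LinkResult` is a simplified stand-in for the project's class, and `rank_candidates` is a hypothetical name.

```python
from typing import NamedTuple


class LinkResult(NamedTuple):
    # Simplified stand-in for the project's LinkResult.
    person: str
    belongingness_ratio: float


def rank_candidates(
    scores: dict[str, float],
    lower_bound: float,
    include_multiple_matches: bool,
    result_counts: dict[str, int],
) -> list[LinkResult]:
    # Keep candidates at or above the lower bound, best first.
    results = [
        LinkResult(k, v)
        for k, v in sorted(scores.items(), reverse=True, key=lambda i: i[1])
        if v >= lower_bound
    ]
    # Count before truncation so the metric does not depend on the flag.
    result_counts["above_lower_bound"] = len(results)
    if not include_multiple_matches:
        results = results[:1]
    return results
```

With `include_multiple_matches` disabled, two candidates above the bound are still counted as 2, while only the best one is returned.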
```python
"person.reference_id": matched_person and matched_person.reference_id,
"patient.reference_id": patient.reference_id,
"result.prediction": prediction,
"result.count_patients_compared": result_counts["patients_compared"],
```
Should we also log the number of Person clusters compared? That would put into context the number of Patients compared.
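A count of Person clusters could be collected in the same pass that counts Patients. Minimal sketch, assuming candidates are grouped as a mapping from Person reference ID to that cluster's Patients; `compare_clusters`, the `persons_compared` key, and the `compare` callback are hypothetical names, not part of this PR.

```python
from collections.abc import Callable, Iterable


def compare_clusters(
    patient: str,
    clusters: dict[str, Iterable[str]],
    compare: Callable[[str, str], bool],
) -> tuple[dict[str, int], dict[str, int]]:
    """Compare one incoming patient against every candidate cluster.

    Returns (matched count per person, result_counts).
    """
    result_counts = {"persons_compared": 0, "patients_compared": 0}
    matched: dict[str, int] = {}
    for person_id, members in clusters.items():
        # One increment per Person cluster visited.
        result_counts["persons_compared"] += 1
        matched[person_id] = 0
        for member in members:
            # One increment per Patient compared within that cluster.
            result_counts["patients_compared"] += 1
            if compare(patient, member):
                matched[person_id] += 1
    return matched, result_counts
```

Logging something like `result.count_persons_compared` next to `result.count_patients_compared` would then show whether many Patients came from a few large clusters or from many small ones.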
```python
is_match = matching_rule(results, **kwargs)
details["rule.results"] = is_match
# TODO: this may add a lot of noise, consider moving to debug
LOGGER.info("patient comparison", extra=details)
```
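On the TODO: one way to keep the per-comparison record without flooding INFO output is to emit it at DEBUG and skip building the payload entirely when DEBUG is disabled. `log_patient_comparison` and its `details_factory` argument are hypothetical; `Logger.isEnabledFor` and `Logger.debug` are standard-library `logging` calls.

```python
import logging

LOGGER = logging.getLogger("linkage.sketch.compare")


def log_patient_comparison(details_factory) -> bool:
    """Emit comparison details at DEBUG; return whether anything was logged."""
    # Building the details may be costly, so check the level first.
    if not LOGGER.isEnabledFor(logging.DEBUG):
        return False
    LOGGER.debug("patient comparison", extra=details_factory())
    return True
```

Passing a factory rather than a ready-made dict means the details are only assembled when DEBUG has been turned on for this logger.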
What is being logged here?
Ah, I think I see: it's the result of the match rule (true for a match, false for no match). I was thinking more of the log-odds score, similarity score, log-odds ratio, and fuzzy threshold as helpful metrics to log for the feature comparison here, to make it easier to explain why a prediction was made. What do you think?
Also, what do you think about logging the match rule being used alongside its return value? That way there's more context for interpreting the true/false result (e.g., which matching criteria produced it).
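The suggestion could look roughly like this: record each feature's similarity score, fuzzy threshold, and log-odds contribution, plus the rule name, next to the boolean result. This is a sketch under stated assumptions: the scoring here (a feature adds its log-odds weight when its similarity meets the fuzzy threshold, and the rule matches when the total meets a cutoff) is a simplified stand-in, not necessarily how `link.py` computes a match, and `FeatureComparison`, `explain_comparison`, and the `feature.*` / `rule.*` keys other than `rule.results` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureComparison:
    # Hypothetical per-feature comparison output.
    feature: str
    similarity: float
    fuzzy_threshold: float
    log_odds: float


def explain_comparison(
    rule_name: str, comparisons: list[FeatureComparison], cutoff: float
) -> dict:
    """Build a details dict explaining why a comparison did or did not match."""
    details: dict = {"rule.name": rule_name}
    total = 0.0
    for c in comparisons:
        # A feature contributes its log-odds weight only if it clears the threshold.
        passed = c.similarity >= c.fuzzy_threshold
        contribution = c.log_odds if passed else 0.0
        total += contribution
        prefix = f"feature.{c.feature}"
        details[f"{prefix}.similarity"] = c.similarity
        details[f"{prefix}.fuzzy_threshold"] = c.fuzzy_threshold
        details[f"{prefix}.log_odds"] = contribution
    details["rule.log_odds_total"] = total
    details["rule.results"] = total >= cutoff
    return details
```

The resulting dict can be passed as `extra=` to the existing "patient comparison" log call, so a reader sees which features carried the decision and under which rule.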
Description
Add logging to `link.py` to capture details of the linkage process.
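Python's default `logging.Formatter` does not print the `extra` fields these calls attach, so the structured details only become visible if a formatter or handler emits them. A minimal sketch of a JSON formatter that does; `ExtraJsonFormatter` is a hypothetical name, and the project may already rely on a structured-logging setup that handles this.

```python
import json
import logging

# Attribute names present on every LogRecord; anything else came from `extra`.
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class ExtraJsonFormatter(logging.Formatter):
    """Render a record as JSON, including any fields passed via `extra`."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        # default=str keeps non-JSON values (e.g. datetimes) from raising.
        return json.dumps(payload, default=str)
```

It can be attached to any handler with `handler.setFormatter(ExtraJsonFormatter())`.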
Related Issues
closes #131
closes #130
Additional Notes