Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add logging to linkage process and consolidate FHIR link routes #135

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

ericbuckley
Copy link
Collaborator

@ericbuckley ericbuckley commented Nov 19, 2024

Description

Add logging to link.py to capture details of linkage process.

Related Issues

closes #131
closes #130

Additional Notes

  • refactored link_router to calculate the "prediction" result in link.py, as logging that value will be valuable.
  • cleanup of try/except blocks in link_router to only return 422 on missing algorithm or bad FHIR bundle.
  • consolidated /link/fhir and /link/dibbs into one endpoint
  • updating splunk to send channel header

<--------------------- REMOVE THE LINES BELOW BEFORE MERGING --------------------->

Checklist

Please review and complete the following checklist before submitting your pull request:

  • I have ensured that the pull request is of a manageable size, allowing it to be reviewed within a single session.
  • I have reviewed my changes to ensure they are clear, concise, and well-documented.
  • I have updated the documentation, if applicable.
  • I have added or updated test cases to cover my changes, if applicable.
  • I have minimized the number of reviewers to include only those essential for the review.

Checklist for Reviewers

Please review and complete the following checklist during the review process:

  • The code follows best practices and conventions.
  • The changes implement the desired functionality or fix the reported issue.
  • The tests cover the new changes and pass successfully.
  • Any potential edge cases or error scenarios have been considered.

@ericbuckley ericbuckley self-assigned this Nov 19, 2024
Copy link

codecov bot commented Nov 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.12%. Comparing base (3182485) to head (27d5b5c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
+ Coverage   95.47%   96.12%   +0.64%     
==========================================
  Files          31       31              
  Lines        1414     1392      -22     
==========================================
- Hits         1350     1338      -12     
+ Misses         64       54      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@ericbuckley ericbuckley marked this pull request as ready for review November 19, 2024 18:50
@ericbuckley ericbuckley changed the title add logging to linkage process add logging to linkage process and consolidate FHIR link routes Nov 19, 2024
Copy link
Collaborator

@alhayward alhayward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think logging will help observability and explainability! Left a few questions/suggestions

LOGGER.info(
"cluster belongingness",
extra={
"ratio": belongingness_ratio,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"ratio": belongingness_ratio,
"belongingness_ratio": belongingness_ratio,

"person.reference_id": person.reference_id,
"matched": matched_count,
"total": len(patients),
"algorithm.ratio_lower": belongingness_ratio_lower_bound,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"algorithm.ratio_lower": belongingness_ratio_lower_bound,
"algorithm.belongingness_ratio_lower": belongingness_ratio_lower_bound,

"matched": matched_count,
"total": len(patients),
"algorithm.ratio_lower": belongingness_ratio_lower_bound,
"algorithm.ratio_upper": belongingness_ratio_upper_bound,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"algorithm.ratio_upper": belongingness_ratio_upper_bound,
"algorithm.belongingness_ratio_upper": belongingness_ratio_upper_bound,

src/recordlinker/linking/link.py Outdated Show resolved Hide resolved
src/recordlinker/linking/link.py Show resolved Hide resolved
Copy link
Collaborator

@alhayward alhayward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few questions. Thanks for your detailed responses in the discussions!

results: list[LinkResult] = [
LinkResult(k, v) for k, v in sorted(scores.items(), reverse=True, key=lambda i: i[1])
]
result_counts["above_lower_bound"] = len(results)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will only return a value > 1 if include_multiple_matches=true. Is that designed as intended?

"person.reference_id": matched_person and matched_person.reference_id,
"patient.reference_id": patient.reference_id,
"result.prediction": prediction,
"result.count_patients_compared": result_counts["patients_compared"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also log the number of Person clusters compared? That would put into context the number of Patients compared.

is_match = matching_rule(results, **kwargs)
details["rule.results"] = is_match
# TODO: this may add a lot of noise, consider moving to debug
LOGGER.info("patient comparison", extra=details)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is being logged here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I see - the result of the match rule (true or false for match or no match, respectively). I was thinking more of the log odds score, similarity score, log odds ratio and fuzzy threshold as helpful metrics to log in regards to feature comparison here, to make more explainable why a prediction was made. What do you think?

Also, what do you think about logging the match rule being used alongside this match rule return value? That way, there's more context to interpret the true/false result (e.g., on what matching criteria this result was generated).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

logging for linking process consolidate /link/fhir and /link/dibbs
2 participants