add logging to linkage process and consolidate FHIR link routes #135

Open

ericbuckley wants to merge 15 commits into main from feature/131-linkage-logging

Collaborator

ericbuckley commented Nov 19, 2024 (edited)

Description

Add logging to link.py to capture details of the linkage process.

Related Issues

closes #131
closes #130

Additional Notes

Refactored link_router so that the "prediction" result is calculated in link.py, since logging that value will be valuable.
Cleaned up the try/except blocks in link_router so that it returns 422 only for a missing algorithm or a bad FHIR bundle.
Consolidated /link/fhir and /link/dibbs into one endpoint.
Updated the Splunk integration to send the channel header.
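For context on the last item: Splunk's HTTP Event Collector (HEC) identifies a client channel with the `X-Splunk-Request-Channel` header, a client-generated GUID, and requires it when indexer acknowledgement is enabled on the HEC token. The sketch below only builds such a request with the standard library. The `build_hec_request` helper, the URL, and the token are hypothetical and are not the project's Splunk code.

```python
import json
import urllib.request
import uuid
from typing import Optional


def build_hec_request(
    url: str, token: str, event: dict, channel: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a Splunk HEC request.

    Hypothetical helper for illustration; not the recordlinker implementation.
    """
    # The channel is a client-generated GUID; Splunk requires it when indexer
    # acknowledgement is enabled for the HEC token.
    channel = channel or str(uuid.uuid4())
    body = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
            "X-Splunk-Request-Channel": channel,
        },
    )
```

`urllib.request.Request` normalizes header names with `str.capitalize()`, so the header reads back as `X-splunk-request-channel` via `get_header`. HTTP header names are case-insensitive, so this does not matter on the wire.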


refactoring Prediction so we can add outcome to logs (6a8438d)

ericbuckley self-assigned this

codecov bot commented Nov 19, 2024 (edited)

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.12%. Comparing base (3182485) to head (27d5b5c).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
+ Coverage   95.47%   96.12%   +0.64%     
==========================================
  Files          31       31              
  Lines        1414     1392      -22     
==========================================
- Hits         1350     1338      -12     
+ Misses         64       54      -10


ericbuckley added 4 commits on November 19, 2024 09:56

consolidate /link/fhir and /link/dibbs (be10b87)
slight optimization (166e28c)
swapping 400 with 422 in patient router (bac49ee)
fixing link router error catching (7b69d81)

ericbuckley marked this pull request as ready for review on November 19, 2024 18:50

ericbuckley requested review from alhayward and cbrinson-rise8 as code owners on November 19, 2024 18:50

ericbuckley changed the title from "add logging to linkage process" to "add logging to linkage process and consolidate FHIR link routes"

alhayward reviewed

Collaborator

alhayward left a comment

I think logging will help with observability and explainability! I left a few questions and suggestions.

src/recordlinker/linking/link.py

+                                  LOGGER.info(
+                                      "cluster belongingness",
+                                      extra={
+                                          "ratio": belongingness_ratio,

Collaborator

alhayward Nov 20, 2024

Suggested change

-                                          "ratio": belongingness_ratio,
+                                          "belongingness_ratio": belongingness_ratio,

src/recordlinker/linking/link.py

+                                          "person.reference_id": person.reference_id,
+                                          "matched": matched_count,
+                                          "total": len(patients),
+                                          "algorithm.ratio_lower": belongingness_ratio_lower_bound,

Collaborator

alhayward Nov 20, 2024

Suggested change

-                                          "algorithm.ratio_lower": belongingness_ratio_lower_bound,
+                                          "algorithm.belongingness_ratio_lower": belongingness_ratio_lower_bound,

src/recordlinker/linking/link.py

+                                          "matched": matched_count,
+                                          "total": len(patients),
+                                          "algorithm.ratio_lower": belongingness_ratio_lower_bound,
+                                          "algorithm.ratio_upper": belongingness_ratio_upper_bound,

Collaborator

alhayward Nov 20, 2024

Suggested change

-                                          "algorithm.ratio_upper": belongingness_ratio_upper_bound,
+                                          "algorithm.belongingness_ratio_upper": belongingness_ratio_upper_bound,

src/recordlinker/linking/link.py (outdated, resolved)

src/recordlinker/linking/link.py (resolved)

ericbuckley added 10 commits on November 20, 2024 08:09

adding patient comparison log statement (6cb5e99)
add counts above lower and upper thresholds to link results log (24e75bb)
adding better result counts (8db9718)
Merge branch 'main' into feature/131-linkage-logging (5c93424)
remove algorithm lists from link logs (bb1cd71)
fix bug in count (f180c68)
send splunk request channel just in case the splunk admin has enabled acknowledgements (0340c57)
fix splunk unit tests (97b4652)
small fix to logging var assignment (8394bad)
evaluator_features property is no longer needed (27d5b5c)

alhayward reviewed

Collaborator

alhayward left a comment

Left a few questions. Thanks for your detailed responses in the discussions!

src/recordlinker/linking/link.py

+                  results: list[LinkResult] = [
+                      LinkResult(k, v) for k, v in sorted(scores.items(), reverse=True, key=lambda i: i[1])
+                  ]
+                  result_counts["above_lower_bound"] = len(results)

Collaborator

alhayward Nov 27, 2024

I believe this will only return a value > 1 if include_multiple_matches=true. Is that the intended design?
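One way to make the count independent of `include_multiple_matches` is to record it before the results are trimmed. This is a minimal sketch under several assumptions: `scores` maps person reference IDs to belongingness ratios, both bounds are inclusive, and the results are cut to a single entry when multiple matches are disabled. `rank_results` and this `LinkResult` are hypothetical stand-ins, not the PR's code.

```python
from dataclasses import dataclass


@dataclass
class LinkResult:
    # Hypothetical stand-in for the LinkResult type in the diff above.
    person_reference_id: str
    belongingness_ratio: float


def rank_results(
    scores: dict[str, float],
    lower_bound: float,
    upper_bound: float,
    include_multiple_matches: bool,
) -> tuple[list[LinkResult], dict[str, int]]:
    # Keep only clusters at or above the lower bound, best score first.
    results = [
        LinkResult(k, v)
        for k, v in sorted(scores.items(), reverse=True, key=lambda i: i[1])
        if v >= lower_bound
    ]
    # Count before any truncation so the log reflects every qualifying cluster.
    counts = {
        "above_lower_bound": len(results),
        "above_upper_bound": sum(r.belongingness_ratio >= upper_bound for r in results),
    }
    if not include_multiple_matches:
        results = results[:1]
    return results, counts
```

With this ordering, `above_lower_bound` can exceed 1 even when only the top match is returned.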

src/recordlinker/linking/link.py

+                          "person.reference_id": matched_person and matched_person.reference_id,
+                          "patient.reference_id": patient.reference_id,
+                          "result.prediction": prediction,
+                          "result.count_patients_compared": result_counts["patients_compared"],

Collaborator

alhayward Nov 27, 2024

Should we also log the number of Person clusters compared? That would put the number of Patients compared into context.
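A sketch of this suggestion, assuming the linker can see which Person each compared Patient belongs to. The function, the `persons_compared`/`patients_compared` key names, and the input shape are hypothetical. Grouping by reference ID also keeps a Patient from being counted twice if the same pair appears more than once.

```python
from collections import defaultdict
from typing import Iterable


def comparison_counts(pairs: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count Person clusters and Patients compared.

    `pairs` is a hypothetical iterable of (person_reference_id, patient_reference_id)
    tuples, one per candidate Patient that was scored.
    """
    clusters: dict[str, set[str]] = defaultdict(set)
    for person_id, patient_id in pairs:
        clusters[person_id].add(patient_id)
    return {
        "persons_compared": len(clusters),
        "patients_compared": sum(len(patients) for patients in clusters.values()),
    }
```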

src/recordlinker/linking/link.py

+                  is_match = matching_rule(results, **kwargs)
+                  details["rule.results"] = is_match
+                  # TODO: this may add a lot of noise, consider moving to debug
+                  LOGGER.info("patient comparison", extra=details)

Collaborator

alhayward Nov 27, 2024

What is being logged here?

Collaborator

alhayward Nov 27, 2024

Ah, I think I see: it's the result of the match rule (true for a match, false for no match). I was thinking more of the log-odds score, similarity score, log-odds ratio, and fuzzy threshold as helpful metrics to log for feature comparison here, to make it easier to explain why a prediction was made. What do you think?

Also, what do you think about logging the name of the match rule alongside its return value? That would give more context for interpreting the true/false result (e.g., which matching criteria produced it).
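Both suggestions can be combined in a single log record that names the rule and attaches the per-feature scores that fed it. This is a minimal sketch. The rule `sum_of_log_odds_at_least`, the `rule.name`/`rule.feature_scores`/`rule.kwargs` keys, the feature names, and the logger name are hypothetical. The real matching rules in recordlinker may take different arguments.

```python
import logging
from typing import Any, Callable

# Logger name is an assumption.
LOGGER = logging.getLogger("recordlinker.linking.link")


def sum_of_log_odds_at_least(feature_scores: dict[str, float], log_odds_cutoff: float) -> bool:
    # Hypothetical match rule: sum the per-feature log-odds scores and compare to a cutoff.
    return sum(feature_scores.values()) >= log_odds_cutoff


def evaluate_and_log(
    matching_rule: Callable[..., bool],
    feature_scores: dict[str, float],
    details: dict[str, Any],
    **kwargs: Any,
) -> bool:
    is_match = matching_rule(feature_scores, **kwargs)
    details = {
        **details,
        # Name the rule so the true/false result can be interpreted later.
        "rule.name": getattr(matching_rule, "__name__", repr(matching_rule)),
        "rule.results": is_match,
        # Per-feature scores and rule arguments explain why the rule returned what it did.
        "rule.feature_scores": feature_scores,
        "rule.kwargs": kwargs,
    }
    # Logged at DEBUG, following the TODO about noise in the original snippet.
    LOGGER.debug("patient comparison", extra=details)
    return is_match
```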


Labels

None yet