Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

White Rabbit: Add thresholds to compare to Field empty in Scan Report #357

Open
megan-lien opened this issue Oct 27, 2022 · 0 comments
Open

Comments

@megan-lien
Copy link

The Scan Report Field Overview tab includes a Fraction empty field to show the percent populated for each field in each table scanned. It would be helpful if we could also specify thresholds for each field, or particular fields, in order to quickly verify the data is populated as expected.
For example: we expect a person_id field to be populated 100% of the time in the person table. A Fraction empty value >0.0% should signal a failure.

Proposed Solution:
Adding a threshold table to the tables to scan to compare, specifying acceptable thresholds for Fraction empty
Ex:
table,field,min_threshold,max_threshold
person,person_id,0,0
visit_occurrence,visit_end_datetime,0,5
drug_exposure,stop_reason,95,100

Add a Pass/Fail column in the Field Overview tab based on the threshold file:
Table, Field, Description, Type, ... , Fraction empty, Threshold check,
person, person_id,,INT, ..., 0.0%, PASS
visit_occurrence,visit_end_datetime, , DATE, ..., 7.0%, FAIL

Currently we can check this manually by reviewing the scan report, but the analyst must be familiar with the data model and the requirements. For data that is ingested regularly, those requirements won't change and it would be easier to call out data issues using thresholds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant