The Field Overview tab of the Scan Report includes a Fraction empty column that shows the percentage of empty values for each field in each table scanned. It would be helpful if we could also specify thresholds for every field, or for particular fields, in order to quickly verify that the data is populated as expected.
For example: we expect a person_id field to be populated 100% of the time in the person table. A Fraction empty value >0.0% should signal a failure.
Proposed Solution:
Add a threshold file alongside the tables to scan, specifying the acceptable range of Fraction empty (in percent) for each field.
Ex:
table,field,min_threshold,max_threshold
person,person_id,0,0
visit_occurrence,visit_end_datetime,0,5
drug_exposure,stop_reason,95,100
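A minimal sketch of how such a file might be read, assuming it is a plain CSV with the header shown above and that both thresholds are percentages of empty values. `load_thresholds` is a hypothetical helper for illustration, not an existing WhiteRabbit function:

```python
import csv
import io


def load_thresholds(csv_text):
    """Parse threshold rows into {(table, field): (min_pct, max_pct)}.

    Assumption: thresholds are percentages of empty values (0-100).
    """
    thresholds = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        low = float(row["min_threshold"])
        high = float(row["max_threshold"])
        # Reject rows whose range is inverted or outside 0-100.
        if not 0 <= low <= high <= 100:
            raise ValueError(f"invalid range for {row['table']}.{row['field']}")
        thresholds[(row["table"].strip(), row["field"].strip())] = (low, high)
    return thresholds


EXAMPLE = """table,field,min_threshold,max_threshold
person,person_id,0,0
visit_occurrence,visit_end_datetime,0,5
drug_exposure,stop_reason,95,100
"""

thresholds = load_thresholds(EXAMPLE)
```

Keying the lookup on the (table, field) pair would let fields without an entry simply be skipped, so thresholds could be given for particular fields only.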
Add a Pass/Fail column in the Field Overview tab based on the threshold file:
Table, Field, Description, Type, ..., Fraction empty, Threshold check
person, person_id, , INT, ..., 0.0%, PASS
visit_occurrence, visit_end_datetime, , DATE, ..., 7.0%, FAIL
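The check itself could be as simple as the sketch below. It assumes both bounds are inclusive and that Fraction empty is rendered as a percentage string such as "7.0%". The function names are hypothetical and do not come from the WhiteRabbit code base:

```python
def parse_pct(cell):
    """Convert a report cell such as "7.0%" to the float 7.0."""
    return float(cell.strip().rstrip("%"))


def threshold_check(fraction_empty_pct, min_pct, max_pct):
    """Return "PASS" if the empty percentage lies within [min_pct, max_pct].

    Assumption: both bounds are inclusive.
    """
    return "PASS" if min_pct <= fraction_empty_pct <= max_pct else "FAIL"


# Values taken from the two example rows above.
person_result = threshold_check(parse_pct("0.0%"), 0, 0)
visit_result = threshold_check(parse_pct("7.0%"), 0, 5)
```

With inclusive bounds, a 0,0 range expresses "must never be empty", and a range such as 95,100 expresses "expected to be almost always empty".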
Currently we can check this manually by reviewing the scan report, but the analyst must be familiar with the data model and the requirements. For data that is ingested regularly, those requirements won't change and it would be easier to call out data issues using thresholds.