Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError in DiagnosticReport if synthetic data does not match metadata #524

Conversation

R-Palazzo
Copy link
Contributor

CU-86ayp2dap
CU-86ayp5z2k
Resolve #508
Resolve #509

@R-Palazzo R-Palazzo requested a review from a team as a code owner November 17, 2023 13:42
@sdv-team
Copy link
Contributor

@sdv-team
Copy link
Contributor

@R-Palazzo R-Palazzo removed the request for review from a team November 17, 2023 13:43
Comment on lines 83 to 90
try:
self._validate_metadata_matches_data(real_data, synthetic_data, metadata)

def _handle_results(self, verbose):
raise NotImplementedError
except ValueError as e:
if self.__class__.__name__ == 'DiagnosticReport':
return

raise e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a try/except can we just skip this check if it's a DiagnosticReport?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done in edfffc0

@codecov-commenter
Copy link

codecov-commenter commented Nov 17, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (bf5ccd2) 78.32% compared to head (38a79ca) 78.31%.

Files Patch % Lines
sdmetrics/reports/multi_table/diagnostic_report.py 50.00% 1 Missing ⚠️
...dmetrics/reports/single_table/diagnostic_report.py 50.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@                      Coverage Diff                      @@
##           diagnostic_report_updates     #524      +/-   ##
=============================================================
- Coverage                      78.32%   78.31%   -0.01%     
=============================================================
  Files                            102      102              
  Lines                           3695     3699       +4     
=============================================================
+ Hits                            2894     2897       +3     
- Misses                           801      802       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@R-Palazzo R-Palazzo force-pushed the issue-510-single-multi-report-error branch from 896773e to 4caa4a0 Compare November 17, 2023 16:45
@R-Palazzo R-Palazzo force-pushed the issue-508-data-metadata-validation-diagnostic-report branch from edfffc0 to 0ba6be7 Compare November 17, 2023 16:51
Base automatically changed from issue-510-single-multi-report-error to diagnostic_report_updates November 17, 2023 18:01
@R-Palazzo R-Palazzo force-pushed the issue-508-data-metadata-validation-diagnostic-report branch from 0ba6be7 to c716f20 Compare November 17, 2023 18:25
Comment on lines 84 to 86
table_name = list(metadata.get('tables', {}).keys())
if table_name:
self.table_names = table_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to store the table names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do validations, like using this method:

def _check_table_names(self, table_name):

Before the table names were instantiated here:

def _validate_metadata_matches_data(self, real_data, synthetic_data, metadata):

Comment on lines 83 to 86
if self.__class__.__name__ == 'DiagnosticReport':
return

def _handle_results(self, verbose):
raise NotImplementedError
self._validate_metadata_matches_data(real_data, synthetic_data, metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just override the _validate_metadata_matches_data method in the DiagnosticReport to be a no-op instead. Main reason being in case any naming changes. I don't expect that it ever will, but I think we should avoid any introspection when possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, done in d58d19d

Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines 81 to 82
self.table_names = list(metadata['tables'].keys())
super()._validate(real_data, synthetic_data, metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I think it makes more sense to store this in the generate method. It's not really related to validating and it seems like generate is being used to get all the info we need for the report, almost acting like an init

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, done in f150da4

report.generate(real_data, synthetic_data, metadata)

# Assert
mock_generate.assert_called_once_with(real_data, synthetic_data, metadata, True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to assert the table names get saved too :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks haha

@R-Palazzo R-Palazzo merged commit 61676fc into diagnostic_report_updates Nov 22, 2023
45 checks passed
@R-Palazzo R-Palazzo deleted the issue-508-data-metadata-validation-diagnostic-report branch November 22, 2023 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants