-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ValueError
in DiagnosticReport if synthetic data does not match metadata
#524
ValueError
in DiagnosticReport if synthetic data does not match metadata
#524
Conversation
sdmetrics/reports/base_report.py
Outdated
try: | ||
self._validate_metadata_matches_data(real_data, synthetic_data, metadata) | ||
|
||
def _handle_results(self, verbose): | ||
raise NotImplementedError | ||
except ValueError as e: | ||
if self.__class__.__name__ == 'DiagnosticReport': | ||
return | ||
|
||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a try/except can we just skip this check if it's a DiagnosticReport
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done in edfffc0
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## diagnostic_report_updates #524 +/- ##
=============================================================
- Coverage 78.32% 78.31% -0.01%
=============================================================
Files 102 102
Lines 3695 3699 +4
=============================================================
+ Hits 2894 2897 +3
- Misses 801 802 +1 ☔ View full report in Codecov by Sentry. |
896773e
to
4caa4a0
Compare
edfffc0
to
0ba6be7
Compare
0ba6be7
to
c716f20
Compare
sdmetrics/reports/base_report.py
Outdated
table_name = list(metadata.get('tables', {}).keys()) | ||
if table_name: | ||
self.table_names = table_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to store the table names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do validations, like using this method:
def _check_table_names(self, table_name): |
Before the table names were instantiated here:
def _validate_metadata_matches_data(self, real_data, synthetic_data, metadata): |
sdmetrics/reports/base_report.py
Outdated
if self.__class__.__name__ == 'DiagnosticReport': | ||
return | ||
|
||
def _handle_results(self, verbose): | ||
raise NotImplementedError | ||
self._validate_metadata_matches_data(real_data, synthetic_data, metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just override the _validate_metadata_matches_data
method in the DiagnosticReport
to be a no-op instead. Main reason being in case any naming changes. I don't expect that it ever will, but I think we should avoid any introspection when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, done in d58d19d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
self.table_names = list(metadata['tables'].keys()) | ||
super()._validate(real_data, synthetic_data, metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I think it makes more sense to store this in the generate
method. It's not really related to validating and it seems like generate is being used to get all the info we need for the report, almost acting like an init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, done in f150da4
report.generate(real_data, synthetic_data, metadata) | ||
|
||
# Assert | ||
mock_generate.assert_called_once_with(real_data, synthetic_data, metadata, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to assert the table names get saved too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks haha
CU-86ayp2dap
CU-86ayp5z2k
Resolve #508
Resolve #509