`ValueError` in DiagnosticReport if synthetic data does not match metadata #524

R-Palazzo · 2023-11-17T13:42:40Z

CU-86ayp2dap
CU-86ayp5z2k
Resolve #508
Resolve #509

sdv-team · 2023-11-17T13:42:43Z

Task linked: CU-86ayp2dap SDMetrics - ValueError in DiagnosticReport if synthetic data does not match metadata #508

sdv-team · 2023-11-17T13:42:43Z

Task linked: CU-86ayp5z2k SDMetrics - Check if QualityReport needs the synthetic data to match the metadata #509

frances-h · 2023-11-17T14:37:44Z

sdmetrics/reports/base_report.py

+        try:
+            self._validate_metadata_matches_data(real_data, synthetic_data, metadata)

-    def _handle_results(self, verbose):
-        raise NotImplementedError
+        except ValueError as e:
+            if self.__class__.__name__ == 'DiagnosticReport':
+                return
+
+            raise e


Instead of a try/except, can we just skip this check when it's a DiagnosticReport?

Yes, done in edfffc0

codecov-commenter · 2023-11-17T15:04:16Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (bf5ccd2) 78.32% compared to head (38a79ca) 78.31%.

Files	Patch %	Lines
sdmetrics/reports/multi_table/diagnostic_report.py	50.00%	1 Missing ⚠️
...dmetrics/reports/single_table/diagnostic_report.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@                      Coverage Diff                      @@
##           diagnostic_report_updates     #524      +/-   ##
=============================================================
- Coverage                      78.32%   78.31%   -0.01%     
=============================================================
  Files                            102      102              
  Lines                           3695     3699       +4     
=============================================================
+ Hits                            2894     2897       +3     
- Misses                           801      802       +1

frances-h · 2023-11-20T15:48:59Z

sdmetrics/reports/base_report.py

+            table_name = list(metadata.get('tables', {}).keys())
+            if table_name:
+                self.table_names = table_name


Why do we need to store the table names?

To run validations, for example with this method:

SDMetrics/sdmetrics/reports/multi_table/base_multi_table_report.py

Line 71 in c716f20

def _check_table_names(self, table_name):

Previously, the table names were set here:

SDMetrics/sdmetrics/reports/multi_table/base_multi_table_report.py

Line 61 in c716f20

def _validate_metadata_matches_data(self, real_data, synthetic_data, metadata):

amontanez24 · 2023-11-21T18:48:19Z

sdmetrics/reports/base_report.py

+        if self.__class__.__name__ == 'DiagnosticReport':
+            return

-    def _handle_results(self, verbose):
-        raise NotImplementedError
+        self._validate_metadata_matches_data(real_data, synthetic_data, metadata)


I think we should just override the `_validate_metadata_matches_data` method in the DiagnosticReport to be a no-op instead. The main reason is to guard against the class name ever changing. I don't expect that it will, but I think we should avoid introspection when possible.

I see, done in d58d19d
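
For readers following the thread, here is a minimal sketch of the no-op override suggested above. The class bodies, the `validate` entry point, and the dictionary-shaped data and metadata are simplified stand-ins assumed for illustration, not the actual SDMetrics implementation; only the method name `_validate_metadata_matches_data` comes from the diff.

```python
class BaseReport:
    """Hypothetical stand-in for the shared report base class."""

    def _validate_metadata_matches_data(self, real_data, synthetic_data, metadata):
        # Simplified check (assumption): every synthetic column must be
        # declared in the metadata, otherwise a ValueError is raised.
        extra = set(synthetic_data) - set(metadata['columns'])
        if extra:
            raise ValueError(f'Columns not in metadata: {sorted(extra)}')

    def validate(self, real_data, synthetic_data, metadata):
        # No check on self.__class__.__name__ here: each subclass decides
        # how strict it is by overriding the method below.
        self._validate_metadata_matches_data(real_data, synthetic_data, metadata)


class QualityReport(BaseReport):
    """Inherits the strict check unchanged."""


class DiagnosticReport(BaseReport):
    """Skips the check by overriding it as a no-op."""

    def _validate_metadata_matches_data(self, real_data, synthetic_data, metadata):
        # Intentionally does nothing, so mismatched data never raises here.
        return None
```

Because the behavior lives in the subclass rather than in a string comparison on the class name, renaming `DiagnosticReport` would not silently re-enable the check.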

amontanez24

LGTM!

amontanez24 · 2023-11-21T22:45:42Z

sdmetrics/reports/multi_table/base_multi_table_report.py

+        self.table_names = list(metadata['tables'].keys())
+        super()._validate(real_data, synthetic_data, metadata)


minor: I think it makes more sense to store this in the `generate` method. It's not really related to validating, and `generate` seems to be where we collect all the info we need for the report, almost acting like an init.

I agree, done in f150da4
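
A rough sketch of what storing the table names in `generate` could look like. The class bodies, the `is_generated` flag, and the body of `_check_table_names` are assumptions for illustration; only the method names and the `list(metadata['tables'].keys())` expression appear in this thread.

```python
class BaseReport:
    """Hypothetical stand-in for the shared report base class."""

    def _validate(self, real_data, synthetic_data, metadata):
        # Placeholder: the real validation logic is out of scope here.
        pass

    def generate(self, real_data, synthetic_data, metadata, verbose=True):
        self._validate(real_data, synthetic_data, metadata)
        self.is_generated = True  # assumed flag, for illustration only


class BaseMultiTableReport(BaseReport):
    def __init__(self):
        self.table_names = []

    def generate(self, real_data, synthetic_data, metadata, verbose=True):
        # Collect report-wide information first, then defer to the base class.
        self.table_names = list(metadata['tables'].keys())
        super().generate(real_data, synthetic_data, metadata, verbose)

    def _check_table_names(self, table_name):
        # Assumed body: reject tables that were not present in the metadata.
        if table_name not in self.table_names:
            raise ValueError(f"Unknown table name '{table_name}'.")
```

With this layout, `_validate` stays focused on validation, and later calls such as `_check_table_names` can rely on `table_names` having been set by `generate`.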

amontanez24 · 2023-11-21T23:15:19Z

tests/unit/reports/multi_table/test_base_multi_table_report.py

+        report.generate(real_data, synthetic_data, metadata)
+
+        # Assert
+        mock_generate.assert_called_once_with(real_data, synthetic_data, metadata, True)


I think we want to assert the table names get saved too :)

Good catch, thanks haha
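
A sketch of how the extra assertion might look in a unit test. The two classes are minimal hypothetical stand-ins so the example runs on its own, and the test name and table names are made up; only the `assert_called_once_with(...)` line mirrors the diff above.

```python
from unittest.mock import Mock, patch


class BaseReport:
    """Hypothetical stand-in; patched out in the test below."""

    def generate(self, real_data, synthetic_data, metadata, verbose=True):
        raise NotImplementedError


class BaseMultiTableReport(BaseReport):
    def generate(self, real_data, synthetic_data, metadata, verbose=True):
        self.table_names = list(metadata['tables'].keys())
        super().generate(real_data, synthetic_data, metadata, verbose)


def test_generate_stores_table_names():
    # Setup
    real_data = Mock()
    synthetic_data = Mock()
    metadata = {'tables': {'table1': {}, 'table2': {}}}
    report = BaseMultiTableReport()

    # Run: patch the parent method so only the subclass logic is exercised.
    with patch.object(BaseReport, 'generate') as mock_generate:
        report.generate(real_data, synthetic_data, metadata)

    # Assert: the parent was called, and the table names were saved.
    mock_generate.assert_called_once_with(real_data, synthetic_data, metadata, True)
    assert report.table_names == ['table1', 'table2']
```

The second assertion is the one requested in the review: without it, the test would still pass if `generate` stopped saving `table_names`.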

R-Palazzo requested review from amontanez24 and frances-h November 17, 2023 13:42

R-Palazzo requested a review from a team as a code owner November 17, 2023 13:42

R-Palazzo removed the request for review from a team November 17, 2023 13:43

frances-h reviewed Nov 17, 2023

R-Palazzo force-pushed the issue-510-single-multi-report-error branch from 896773e to 4caa4a0 Compare November 17, 2023 16:45

R-Palazzo force-pushed the issue-508-data-metadata-validation-diagnostic-report branch from edfffc0 to 0ba6be7 Compare November 17, 2023 16:51

Base automatically changed from issue-510-single-multi-report-error to diagnostic_report_updates November 17, 2023 18:01

R-Palazzo added 4 commits November 17, 2023 12:24

def

74f8dd1

tests

2cd26e5

remove try/Except

8dcf467

update test

c716f20

R-Palazzo force-pushed the issue-508-data-metadata-validation-diagnostic-report branch from 0ba6be7 to c716f20 Compare November 17, 2023 18:25

frances-h reviewed Nov 20, 2023

_validate for multi table report

3bb2392

frances-h approved these changes Nov 21, 2023

amontanez24 reviewed Nov 21, 2023

set validate_metadata_matches_data in diagnostic

d58d19d

amontanez24 approved these changes Nov 21, 2023

move table_name to generate

f150da4

amontanez24 approved these changes Nov 21, 2023

assert table_names

38a79ca

R-Palazzo merged commit 61676fc into diagnostic_report_updates Nov 22, 2023
45 checks passed

R-Palazzo deleted the issue-508-data-metadata-validation-diagnostic-report branch November 22, 2023 00:23

R-Palazzo added a commit that referenced this pull request Nov 27, 2023

ValueError in DiagnosticReport if synthetic data does not match met…

0960619

…adata (#524)

R-Palazzo added a commit that referenced this pull request Nov 27, 2023

ValueError in DiagnosticReport if synthetic data does not match met…

e611cd9

…adata (#524)
