Add Inequality CAG #2405
base: feature/single-table-CAG
Task linked: CU-86b42406t SDV - Add |
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@                Coverage Diff                 @@
##   feature/single-table-CAG   #2405    +/-   ##
================================================
+ Coverage     98.55%    98.56%    +0.01%
================================================
  Files            65        66        +1
  Lines          6425      6563      +138
================================================
+ Hits           6332      6469      +137
- Misses           93        94        +1
74151e2 to 2cc3cc4
lgtm
This is looking really good!
I left some suggestions about the previous code definition, but mainly: can we ensure that any parameter setting is done during fit() instead of transform()?
I also have one question regarding the single-table check in validate(); maybe something in the base should be updated.
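To illustrate the fit()/transform() split requested above, here is a minimal sketch, not the PR's actual code. It assumes a simplified table represented as a dict of column name to list of values. `InequalitySketch` and `create_unique_name` are hypothetical stand-ins for the real constraint and SDV's `_create_unique_name` helper, and the diff-column naming is only an assumption.

```python
def create_unique_name(name, existing_names):
    """Hypothetical stand-in for SDV's _create_unique_name helper.

    Appends underscores until the name no longer collides.
    """
    while name in existing_names:
        name += '_'
    return name


class InequalitySketch:
    """Simplified, hypothetical constraint. Tables are dicts: column name -> list."""

    def __init__(self, low_column_name, high_column_name):
        self._low = low_column_name
        self._high = high_column_name
        self._diff_column_name = None  # learned in fit(), never in transform()

    def fit(self, table):
        # All parameter setting happens here, exactly once.
        base_name = f'{self._low}#{self._high}'
        self._diff_column_name = create_unique_name(base_name, list(table))

    def transform(self, table):
        if self._diff_column_name is None:
            raise RuntimeError('fit() must be called before transform()')
        # transform() only reads the fitted state; it does not modify self.
        diff = [high - low for low, high in zip(table[self._low], table[self._high])]
        transformed = {name: values for name, values in table.items() if name != self._high}
        transformed[self._diff_column_name] = diff
        return transformed
```

With this layout, calling transform() repeatedly cannot change the column name chosen during fit(), which is the property the review comment asks for.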
sdv/cag/base.py (outdated)

    if isinstance(data, pd.DataFrame):
        if self._single_table:
            data = {self._table_name: data}
        else:
            table_name = self._get_single_table_name(metadata)
            data = {table_name: data}
@amontanez24 do you remember if we should be able to validate a CAG before calling fit()?
If this is the case, we may want to move the following logic outside of fit() so it can be used here as well:
https://github.com/sdv-dev/SDV/blob/e96754ed3f1d892f7842898703ceb14369a846df/sdv/cag/base.py#L77-80
Also, could we keep only the if isinstance(data, pd.DataFrame): check and remove the if data is not None: check before it?
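One way to share that logic between validate() and fit() is a small helper that does not depend on fitted state. The sketch below is an assumption about how that could look, not the code in base.py: `Table` is a dependency-free stand-in for pd.DataFrame, and `get_single_table_name`, `as_table_dict`, and the metadata layout are hypothetical.

```python
class Table:
    """Stand-in for pd.DataFrame so the sketch runs without dependencies."""

    def __init__(self, columns):
        self.columns = list(columns)


def get_single_table_name(metadata):
    """Hypothetical helper: metadata is a dict whose 'tables' mapping holds one table."""
    table_names = list(metadata['tables'])
    if len(table_names) != 1:
        raise ValueError('Expected metadata with exactly one table.')
    return table_names[0]


def as_table_dict(data, metadata):
    """Wrap a single table as {table_name: table}; pass anything else through.

    The isinstance check is already False for None, so no separate
    `if data is not None` guard is needed.
    """
    if isinstance(data, Table):
        return {get_single_table_name(metadata): data}
    return data
```

Because the helper reads only the metadata, it can be called from validate() before fit() has run.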
I don't think validate should set any attributes; it should simply check that the given parameters are valid.
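A minimal sketch of that separation, assuming a hypothetical `ColumnPairCheck` class that works on plain lists of column names rather than the real CAG base class:

```python
class ColumnPairCheck:
    """Hypothetical constraint-like object used only to illustrate the pattern."""

    def __init__(self, low_column_name, high_column_name):
        self._low = low_column_name
        self._high = high_column_name
        self._fitted = False

    def validate(self, column_names):
        """Raise if a required column is missing. Never assigns to self."""
        missing = [name for name in (self._low, self._high) if name not in column_names]
        if missing:
            raise ValueError(f'Missing columns: {missing}')

    def fit(self, column_names):
        """Validate first, then record state. Only fit() mutates the object."""
        self.validate(column_names)
        self._fitted = True
```

Keeping validate() free of side effects means it can be called any number of times, before or after fit(), without changing how the object behaves.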
    if nan_col is not None:
        self._nan_column_name = _create_unique_name(nan_col.name, table_data.columns)
Could we also move this into fit()?
As was discussed here, this PR only converts the existing constraint to a CAG. I agree this is a good suggestion, but I don't think it should be implemented in this PR.
    else:
        diff_column = high - low

    self._diff_column_name = _create_unique_name(self._diff_column_name, table_data.columns)
Could we move this into fit()?
fd4882b to f9d68b7
CU-86b42406t, Resolve #2384.