Add Inequality CAG #2405

Open
wants to merge 9 commits into
base: feature/single-table-CAG

Conversation

fealho
Member

@fealho fealho commented Mar 4, 2025

CU-86b42406t, Resolve #2384.

@sdv-team
Contributor

sdv-team commented Mar 4, 2025

codecov bot commented Mar 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.56%. Comparing base (b4367d9) to head (f9d68b7).

Additional details and impacted files
@@                     Coverage Diff                      @@
##           feature/single-table-CAG    #2405      +/-   ##
============================================================
+ Coverage                     98.55%   98.56%   +0.01%     
============================================================
  Files                            65       66       +1     
  Lines                          6425     6563     +138     
============================================================
+ Hits                           6332     6469     +137     
- Misses                           93       94       +1     
Flag Coverage Δ
integration 81.95% <76.76%> (-0.15%) ⬇️
unit 97.44% <98.59%> (+0.02%) ⬆️

@fealho fealho marked this pull request as ready for review March 5, 2025 17:52
@fealho fealho requested a review from a team as a code owner March 5, 2025 17:52
@fealho fealho requested review from gsheni and removed request for a team March 5, 2025 17:52
@fealho fealho requested a review from lajohn4747 March 5, 2025 17:52
@fealho fealho force-pushed the issue-2384-inquality-cags branch from 74151e2 to 2cc3cc4 Compare March 6, 2025 16:29
@fealho fealho requested review from gsheni and lajohn4747 March 7, 2025 13:18
Contributor

@gsheni gsheni left a comment

lgtm

@fealho fealho mentioned this pull request Mar 7, 2025
@fealho fealho requested a review from R-Palazzo March 7, 2025 22:32
Contributor

@R-Palazzo R-Palazzo left a comment

This is looking really good!

I left some suggestions about the previous code definition, but mainly: can we ensure that any parameter setting is done during fit() instead of transform()?

I also have one question regarding the single-table check in validate(); maybe something in the base should be updated.

sdv/cag/base.py Outdated
Comment on lines 46 to 51
if isinstance(data, pd.DataFrame):
    if self._single_table:
        data = {self._table_name: data}
    else:
        table_name = self._get_single_table_name(metadata)
        data = {table_name: data}
Contributor

@amontanez24 do you remember whether we should be able to validate a CAG before calling fit()?
If so, we may want to move the following logic out of fit() so it can be used here as well:
https://github.com/sdv-dev/SDV/blob/e96754ed3f1d892f7842898703ceb14369a846df/sdv/cag/base.py#L77-80

Also, could we keep only the if isinstance(data, pd.DataFrame): check and remove the if data is not None: check before it?

Member Author

I don't think validate() should set any attributes; it should simply check that the given parameters are valid.
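
For readers following this thread, here is a minimal sketch of what a side-effect-free validate() could look like. It is not the SDV base class: `ExampleConstraint`, `_as_table_dict`, and the `default_table_name` parameter are hypothetical names chosen for illustration, and the inequality check is a simplified assumption.

```python
import pandas as pd


class ExampleConstraint:
    """Hypothetical stand-in for a CAG pattern; not the SDV base class."""

    def __init__(self, low_column_name, high_column_name, table_name=None):
        self.low_column_name = low_column_name
        self.high_column_name = high_column_name
        self.table_name = table_name

    def _as_table_dict(self, data, default_table_name):
        # Pure helper: wraps a lone DataFrame in a dict and returns it,
        # without storing anything on self.
        if isinstance(data, pd.DataFrame):
            return {self.table_name or default_table_name: data}
        return data

    def validate(self, data, default_table_name='table'):
        # Only local variables are assigned here, so calling validate()
        # before fit() cannot change the object's state.
        tables = self._as_table_dict(data, default_table_name)
        table_name = self.table_name or default_table_name
        if table_name not in tables:
            raise ValueError(f"Table '{table_name}' not found in the data.")

        table = tables[table_name]
        low, high = self.low_column_name, self.high_column_name
        missing = [col for col in (low, high) if col not in table.columns]
        if missing:
            raise ValueError(f'Missing columns: {missing}')

        # Rows with a missing value on either side are treated as valid.
        valid = (table[low] <= table[high]) | table[[low, high]].isna().any(axis=1)
        if not valid.all():
            raise ValueError(f"'{low}' must be less than or equal to '{high}'.")
```

Because the DataFrame-to-dict conversion lives in a helper that returns its result instead of assigning to `self`, the same helper could be shared by fit() and validate() without validate() mutating anything.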

Comment on lines +230 to +231
if nan_col is not None:
    self._nan_column_name = _create_unique_name(nan_col.name, table_data.columns)
Contributor

Could we also move this into fit()?

Member Author

As was discussed here, this PR only converts the existing constraint to a CAG. I agree this is a good suggestion, but I don't think it should be implemented in this PR.

else:
    diff_column = high - low

self._diff_column_name = _create_unique_name(self._diff_column_name, table_data.columns)
Contributor

Could we move this into fit()?
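
To illustrate the fit()-versus-transform() point, the sketch below chooses the derived column name once in fit() and only reads it in transform(). It is a toy class under stated assumptions, not the Inequality CAG from this PR: `DiffColumnSketch` and `make_unique_name` are hypothetical, and the transform uses the plain `high - low` difference from the quoted snippet with no NaN handling.

```python
import pandas as pd


def make_unique_name(name, existing_names):
    """Hypothetical helper: append '_' until the name is not already taken."""
    while name in existing_names:
        name += '_'
    return name


class DiffColumnSketch:
    """Toy constraint for illustration; not the SDV Inequality class."""

    def __init__(self, low_column_name, high_column_name):
        self.low = low_column_name
        self.high = high_column_name
        self._diff_column_name = None

    def fit(self, table):
        # All state is decided here, once, from the training data.
        base_name = f'{self.low}#{self.high}'
        self._diff_column_name = make_unique_name(base_name, set(table.columns))
        return self

    def transform(self, table):
        # Read-only with respect to self: the fitted name is reused as-is.
        if self._diff_column_name is None:
            raise RuntimeError('Call fit() before transform().')
        out = table.copy()
        out[self._diff_column_name] = out[self.high] - out[self.low]
        return out.drop(columns=[self.high])

    def reverse_transform(self, table):
        # Rebuild the high column from low + diff, then drop the helper column.
        out = table.copy()
        out[self.high] = out[self.low] + out[self._diff_column_name]
        return out.drop(columns=[self._diff_column_name])
```

With this split, repeated transform() calls on different tables always use the same column name, and reverse_transform() can rely on it without re-deriving anything.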

@fealho fealho requested a review from R-Palazzo March 10, 2025 18:12
@fealho fealho force-pushed the issue-2384-inquality-cags branch from fd4882b to f9d68b7 Compare March 10, 2025 18:12
5 participants