Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] fix faulty BaseObject.__eq__ and deep_equals if an attribute or nested structure contains np.nan #111

Merged
merged 6 commits into from
Feb 1, 2023

Conversation

fkiraly
Copy link
Contributor

@fkiraly fkiraly commented Jan 14, 2023

BaseObject.__eq__ and deep_equals return the wrong answer if somewhere in the structure an np.nan occurs. This is because np.nan != np.nan by default.

For BaseObject-s, this can lead to failures of equality whenever somehwere in the nested structure (of parameters) an np.nan is contained.

This happens to be the case for tuner default parameter settings in sktime, which makes the refactor fail for those specific estimators, see here: sktime/sktime#3151 (comment)

This bugfix handles the case np.nan in deep_equals, which fixes it in its reverse dependency BaseObject.__eq__.

Curiously, the np.nan case was handled in sktime, but incorrectly. The check in sktime incorrectly considers any two floats as being equal (rather than just two instances of np.nan which happens to be of type float).
In another failure, the handling of np.nan was removed (instead of being fixed) when deep_equals was ported to skbase.

Together, this explains the presence of the bug in skbase, the failure of tests in the refactor PR on sktime tuners, and the absence of test failures on the main branch of sktime.

I've also added more test cases for deep_equals to cover the failures around np.nan which are being fixed here.

@fkiraly fkiraly added the bug Something isn't working label Jan 14, 2023
@fkiraly
Copy link
Contributor Author

fkiraly commented Jan 14, 2023

@RNKuhns, why are unrelated tests (for package structure lookup) failing here?

@fkiraly
Copy link
Contributor Author

fkiraly commented Jan 14, 2023

I get it, you used skbase itself for testing and hard coded expected retrieval. Not sure whether that is such a good idea? Opened issue here to discuss: #114

@codecov-commenter
Copy link

codecov-commenter commented Jan 14, 2023

Codecov Report

Merging #111 (f38782e) into main (86e0c4e) will increase coverage by 0.02%.
The diff coverage is 88.88%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main     #111      +/-   ##
==========================================
+ Coverage   78.05%   78.07%   +0.02%     
==========================================
  Files          23       23              
  Lines        1877     1884       +7     
==========================================
+ Hits         1465     1471       +6     
- Misses        412      413       +1     
Impacted Files Coverage Δ
skbase/testing/utils/tests/test_deep_equals.py 100.00% <ø> (ø)
skbase/testing/utils/deep_equals.py 64.90% <88.88%> (+1.01%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Contributor

@RNKuhns RNKuhns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fkiraly the all_extras optional dependencies in pyproject.toml needs to be updated to include numpy.

This doesn't fail in our CI workflow, because that is installing all test dependencies and numpy is already a test dependency. But we want to give users the easy of installing the extras.

@RNKuhns
Copy link
Contributor

RNKuhns commented Jan 14, 2023

I get it, you used skbase itself for testing and hard coded expected retrieval. Not sure whether that is such a good idea? Opened issue here to discuss: #114

Yea -- it expedient at the time. I've commented on #114 on way forward on that.

@RNKuhns
Copy link
Contributor

RNKuhns commented Jan 14, 2023

@fkiraly the other thing I didn't mention explicitly in my review is that we need to determine how you want this fix published. If having it available by early-February in sktime refactor is fine with you then we can proceed as usual with merging issues.

However, if you want a patch (bug fix) release, we want to keep the changes there to minimum (bug fixes only) if possible (which impacts decision on merging other branches). Let me know what you are thinking on the timing.

Moving forward, it might make sense to open a release branch and merge PRs for a release into that branch. That way it is easy to do quick bug fix releases, since we know main will have limited changes.

@fkiraly
Copy link
Contributor Author

fkiraly commented Jan 14, 2023

@fkiraly the all_extras optional dependencies in pyproject.toml needs to be updated to include numpy.

Is that really an omission or problem of this PR?
Feels more like a separate PR, since the status quo was using numpy already in the same way (same dependency pattern).

@fkiraly
Copy link
Contributor Author

fkiraly commented Jan 14, 2023

@fkiraly the other thing I didn't mention explicitly in my review is that we need to determine how you want this fix published. If having it available by early-February in sktime refactor is fine with you then we can proceed as usual with merging issues.

I just overrode __eq__ in sktime, we should just make sure we don't forget to revert that. So there's no hard blocker through this, and no need for an urgent timing, imo.

@fkiraly
Copy link
Contributor Author

fkiraly commented Jan 14, 2023

Yea -- it expedient at the time. I've commented on #114 on way forward on that.

I don't see any comments on #114, did they get lost? Sometimes I push the comment button and internet disconnects, and it does not get posted.

@RNKuhns
Copy link
Contributor

RNKuhns commented Jan 23, 2023

Yea -- it expedient at the time. I've commented on #114 on way forward on that.

I don't see any comments on #114, did they get lost? Sometimes I push the comment button and internet disconnects, and it does not get posted.

@fkiraly it appears they did. But the gist of it was that we should make a PR that expands the mock_package in the test module, so that it covers more cases needed for tests (non-public functions, modules to make sure retrieval can exclude them; non-public modules to make sure we can exclude those and some sub-module nesting, etc). I'll try to do that this week.

@fkiraly
Copy link
Contributor Author

fkiraly commented Jan 23, 2023

just to clarify, do you want the numpy dep in this PR? I don't mind either way, just thought it might be cleaner separately.
(in the file change history, it would show up with a PR/commit description that seems completely unrelated, which is going to be confusing for sth as substantial as a dep change - for context, I typically copy the PR description as the squash commit merge message for good documentation of the edit history)

@RNKuhns
Copy link
Contributor

RNKuhns commented Jan 23, 2023

I'm fine with it being separate PR. Can you open that?

fkiraly added a commit to sktime/sktime that referenced this pull request Jan 23, 2023
…e or nested structure contains `float` (#4109)

This fixes `deep_equals` logic and reverse dependency `BaseObject.__eq__` for the case that parameters or nested objects contained an `float`.

Incorrectly, any two floats were considered equal. This is a consequence of erroneous logic to cover the case of `np.nan`.

For more context, also see sktime/skbase#111

Tests were also added to ensure that the `float` and `np.nan` case are both covered correctly.
@fkiraly
Copy link
Contributor Author

fkiraly commented Jan 24, 2023

Sure, opened a PR with numpy here: #121

@RNKuhns
Copy link
Contributor

RNKuhns commented Jan 25, 2023

@fkiraly wanted to verify this is good to go (with all updated logic from sktime) before merging.

@fkiraly
Copy link
Contributor Author

fkiraly commented Jan 27, 2023

yes, good to go from my side.

@fkiraly fkiraly requested a review from RNKuhns January 28, 2023 10:45
@fkiraly
Copy link
Contributor Author

fkiraly commented Jan 31, 2023

(you still haven't approved? is there sth I should change?)

Copy link
Contributor

@RNKuhns RNKuhns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Adding of numpy soft dependency is in #121.

@RNKuhns RNKuhns merged commit acaefb3 into main Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants