
MAINT Remove statsmodels #236

Merged
merged 15 commits into from
Feb 19, 2024
Conversation

BorisMuzellec
Collaborator

@BorisMuzellec BorisMuzellec commented Feb 13, 2024

Reference Issue or PRs

(Partially) fixes #231, closes #235

What does your PR implement? Be specific.

This PR removes pydeseq2's dependence on statsmodels, which has been causing a few bugs recently (and is the reason why the CI currently fails).

Statsmodels was used to:

@BorisMuzellec BorisMuzellec marked this pull request as ready for review February 15, 2024 09:24
@BorisMuzellec BorisMuzellec requested review from a user and umarteauowkin February 15, 2024 09:24
@BorisMuzellec BorisMuzellec changed the title Remove statsmodels MAINT Remove statsmodels Feb 15, 2024
Collaborator

@umarteauowkin umarteauowkin left a comment


Looks perfect. I checked the math in the lowess computation and compared it to the statsmodels implementation; it is the same. Thanks a lot @BorisMuzellec

pydeseq2/utils.py (outdated diff excerpt, review thread resolved)
np.array([np.sort(np.abs(features - features[i]))[r] for i in range(n)]), 1e-12
)
w = np.clip(
    np.abs(np.nan_to_num((features[:, None] - features[None, :]) / h)), 0.0, 1.0
)
Collaborator


Maybe, for coherence, use h[:, None]?

Collaborator Author


Do you think it would make the code clearer? Given that the code is working as is, I'm a bit reluctant to add unnecessary broadcasting operations.
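For readers following along, here is a self-contained sketch of the weight computation under discussion (the function name and the `explicit_axis` toggle are illustrative, not pydeseq2's API). Because the pairwise distance matrix is symmetric, dividing by `h` (broadcast along the last axis) and by `h[:, None]` produce transposes of one another, which is consistent with the author's point that the merged code works as-is; the reviewer's suggestion is about making the intended axis explicit.

```python
import numpy as np


def tricube_weights(features: np.ndarray, r: int, explicit_axis: bool = True) -> np.ndarray:
    """Illustrative sketch of the neighborhood-weight step from the excerpt above.

    h[i] is point i's distance to its r-th nearest neighbor, floored at
    1e-12 to avoid division by zero. `explicit_axis` toggles between
    dividing by h[:, None] (the reviewer's suggestion) and plain h.
    """
    n = len(features)
    h = np.maximum(
        np.array([np.sort(np.abs(features - features[i]))[r] for i in range(n)]),
        1e-12,
    )
    dist = np.abs(features[:, None] - features[None, :])
    denom = h[:, None] if explicit_axis else h
    # Clipped scaled distances in [0, 1], then the tricube kernel.
    w = np.clip(np.abs(np.nan_to_num(dist / denom)), 0.0, 1.0)
    return (1.0 - w**3) ** 3
```

Since `dist` is symmetric, `tricube_weights(f, r, True)` equals the transpose of `tricube_weights(f, r, False)`, so the choice only matters for readability here.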

# Adapted from https://gist.github.com/agramfort/850437


def lowess(
Collaborator


LGTM

Collaborator Author


Thanks a lot for the review!

@@ -477,7 +477,7 @@ def fit_size_factors(
     warnings.warn(
         "Every gene contains at least one zero, "
         "cannot compute log geometric means. Switching to iterative mode.",
-        RuntimeWarning,
+        UserWarning,
Collaborator


Out of curiosity, what is the rationale for selecting UserWarning?

Collaborator Author


The bad reason: it was easier to catch simultaneously with the switch to "mean" trend curve fitting in the tests. The better reason: I felt it was more suited, since RuntimeWarning is supposed to cover dubious runtime behavior, which is not really what's going on here.
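As a hedged illustration of that distinction, a fallback of this kind reads naturally as a `UserWarning` and can be caught in tests with `warnings.catch_warnings`. The helper function below is hypothetical, not pydeseq2 code:

```python
import warnings


def fallback_to_iterative(all_genes_have_zero: bool) -> str:
    # Hypothetical stand-in for the situation in fit_size_factors: the
    # warning signals a user-relevant fallback in the fitting mode, not
    # dubious runtime behavior, hence UserWarning over RuntimeWarning.
    if all_genes_have_zero:
        warnings.warn(
            "Every gene contains at least one zero, "
            "cannot compute log geometric means. Switching to iterative mode.",
            UserWarning,
        )
        return "iterative"
    return "ratio"


# In tests, the fallback warning can then be recorded and inspected:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mode = fallback_to_iterative(True)
assert mode == "iterative"
assert issubclass(caught[0].category, UserWarning)
```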

@adamgayoso
Contributor

Just wanted to flag that there's a very stable implementation of loess in this package (scikit-misc), which is a common dependency of plotnine, so it should be a stable dependency.

return (coeffs, covariates_fit @ coeffs)

if not res.success:
    raise RuntimeError("Gamma GLM optimization failed.")
Contributor


I think it would be cleaner to return res.success and raise the error in the while loop
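A minimal sketch of that suggestion, with hypothetical names standing in for the actual helper: the fitting routine returns a success flag instead of raising, and the retry loop owns the error path. The least-squares solve below is a stand-in for the Gamma GLM optimization, not pydeseq2's implementation:

```python
import numpy as np


def fit_trend_coeffs(covariates_fit: np.ndarray, targets: np.ndarray):
    """Hypothetical fitting helper: returns (coeffs, fitted, success)
    rather than raising, so the caller decides when to give up."""
    # Stand-in for the Gamma GLM optimization.
    coeffs, *_ = np.linalg.lstsq(covariates_fit, targets, rcond=None)
    success = bool(np.all(np.isfinite(coeffs)))
    return coeffs, covariates_fit @ coeffs, success


covariates = np.column_stack([np.ones(5), np.arange(5.0)])
targets = np.array([1.0, 1.5, 2.1, 2.4, 3.2])

max_retries, attempt, ok = 3, 0, False
while not ok:
    if attempt >= max_retries:
        # The error is raised in the loop, as the review suggests.
        raise RuntimeError("Gamma GLM optimization failed.")
    coeffs, fitted, ok = fit_trend_coeffs(covariates, targets)
    attempt += 1
```

This keeps the helper side-effect free and makes the retry policy visible in one place.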

@umarteauowkin
Collaborator

@adamgayoso From what I gather, lowess is not exactly loess: the weighting scheme is automatic in lowess and hand-made in loess. @BorisMuzellec correct me if I'm wrong, but in scikit-misc, I think the weighted version is not implemented.

@adamgayoso
Contributor

@umarteauowkin it looks like you can set the weights. Nonetheless, I would consider wrapping this custom lowess with a numba njit decorator for speed, as well as testing it against the statsmodels one.
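A hedged sketch of the njit-wrapping pattern being suggested, using a deliberately toy local-mean smoother rather than pydeseq2's actual lowess. Numba-friendly code sticks to plain loops and basic NumPy; the no-op fallback lets the example run even without numba installed:

```python
import numpy as np

try:
    from numba import njit  # optional: JIT-compiles the loop when available
except ImportError:
    def njit(func):  # no-op fallback so this sketch runs without numba
        return func


@njit
def local_mean_smooth(x, y, frac):
    """Toy local-averaging smoother (NOT pydeseq2's lowess): for each x[i],
    average y over the k = max(1, int(frac * n)) nearest points."""
    n = x.shape[0]
    k = max(1, int(frac * n))
    out = np.empty(n)
    for i in range(n):
        idx = np.argsort(np.abs(x - x[i]))[:k]
        s = 0.0
        for j in idx:
            s += y[j]
        out[i] = s / k
    return out
```

Testing such a function against a reference implementation (as suggested) would then just compare outputs on shared inputs within a tolerance.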

@BorisMuzellec
Collaborator Author

> Just wanted to flag that there's a very stable implementation of loess in this package, which is a common dependency of plotnine so should be a stable dependency.

Hi @adamgayoso, thanks for the suggestion! Indeed, I tried to use this implementation in #234 (tests are currently failing due to an issue in the docs, which can be ignored), but I experienced several issues: it causes segmentation faults on some platforms, and scikit-misc's loess class is not picklable (cf. the PR description of #234).
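For context on the picklability point, a minimal sketch: an object that stores only plain Python and NumPy state round-trips through `pickle`, which is what multiprocessing-based workflows require and what scikit-misc's loess class reportedly lacked. The class below is hypothetical, for illustration only:

```python
import pickle

import numpy as np


class LowessSmoother:
    """Hypothetical smoother holding only plain arrays and scalars,
    so pickle.dumps / pickle.loads round-trips it without issue."""

    def __init__(self, frac: float = 0.3):
        self.frac = frac
        self.x_ = None
        self.y_ = None

    def fit(self, x, y):
        # Store fitted state as plain NumPy arrays (picklable).
        self.x_ = np.asarray(x, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        return self


smoother = LowessSmoother().fit([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
clone = pickle.loads(pickle.dumps(smoother))
assert np.allclose(clone.x_, smoother.x_)
```

Objects wrapping C-level state (as in some compiled extensions) cannot always be serialized this way, which is one practical argument for the pure-NumPy implementation merged here.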

@BorisMuzellec BorisMuzellec merged commit be50f09 into main Feb 19, 2024
14 checks passed
@BorisMuzellec BorisMuzellec deleted the remove_statsmodels branch February 19, 2024 14:19
Development

Successfully merging this pull request may close these issues.

[BUG] Negative coefficients in the trend curve
3 participants