MAINT Remove statsmodels #236
Conversation
… parametric trend curve in default inference model
Looks perfect, I checked the math in the lowess computation and compared it to the statsmodels implementation: it is the same. Thanks a lot @BorisMuzellec
    np.array([np.sort(np.abs(features - features[i]))[r] for i in range(n)]), 1e-12
)
w = np.clip(
    np.abs(np.nan_to_num((features[:, None] - features[None, :]) / h)), 0.0, 1.0
Maybe, for coherence, use `h[:, None]`?
Do you think it would make the code clearer? Given that the code is working as is, I'm a bit reluctant to add unnecessary broadcasting operations.
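For readers following along, here is a small standalone illustration (not pydeseq2 code, arrays are made up) of what the broadcasting does: dividing an (n, n) matrix by a length-n array broadcasts along the last axis, so `/ h` is equivalent to `/ h[None, :]`, while `/ h[:, None]` scales rows instead. Because the pairwise distance matrix is symmetric, the two variants are transposes of one another, so either can be used as long as downstream code indexes the matching axis.

```python
import numpy as np

h = np.array([1.0, 2.0, 4.0])
d = np.ones((3, 3))

print(d / h)           # columns scaled by h_j
print(d / h[None, :])  # identical to the line above
print(d / h[:, None])  # rows scaled by h_i (the suggested spelling)
```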
# Adapted from https://gist.github.com/agramfort/850437


def lowess(
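For context, here is a minimal sketch of a robust lowess in the spirit of the gist linked above (tricube weights, local linear fits, bisquare robustifying iterations). The function name, defaults, and signature are illustrative and do not match pydeseq2's actual `lowess`.

```python
import numpy as np


def lowess_sketch(x, y, f=2.0 / 3.0, n_iter=3):
    """Illustrative robust lowess: tricube weights + local linear fits."""
    n = len(x)
    r = min(int(np.ceil(f * n)), n - 1)
    # Per-point bandwidth: distance to the r-th nearest neighbour (floored).
    h = np.maximum(np.array([np.sort(np.abs(x - x[i]))[r] for i in range(n)]), 1e-12)
    # Tricube kernel; clipping to [0, 1] zeroes out points outside the window.
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1.0 - w ** 3) ** 3

    y_est = np.zeros(n)
    delta = np.ones(n)
    for _ in range(n_iter):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array(
                [
                    [np.sum(weights), np.sum(weights * x)],
                    [np.sum(weights * x), np.sum(weights * x * x)],
                ]
            )
            beta = np.linalg.solve(A, b)
            y_est[i] = beta[0] + beta[1] * x[i]
        # Robustifying step: downweight points with large residuals (bisquare).
        residuals = y - y_est
        s = max(np.median(np.abs(residuals)), 1e-12)
        delta = np.clip(residuals / (6.0 * s), -1.0, 1.0)
        delta = (1.0 - delta ** 2) ** 2
    return y_est
```

In this sketch the weights for the local fit at point `i` are read from column `i` of `w`, which is why dividing by `h` (equivalent to `h[None, :]`) assigns each column its own bandwidth.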
LGTM
Thanks a lot for the review!
@@ -477,7 +477,7 @@ def fit_size_factors(
     warnings.warn(
         "Every gene contains at least one zero, "
         "cannot compute log geometric means. Switching to iterative mode.",
-        RuntimeWarning,
+        UserWarning,
Out of curiosity, what is the rationale for selecting `UserWarning`?
The bad reason: it was easier to catch it simultaneously with a switch to "mean" trend curve fitting in the tests. The better reason is that I felt it was more suited (`RuntimeWarning` is supposed to cover dubious runtime behavior, which is not really what's going on here).
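For illustration, catching the new warning category in a test could look like the sketch below. `compute_log_geometric_means` and the test are hypothetical stand-ins, not pydeseq2's actual API; the point is only how `pytest.warns(UserWarning, ...)` is used.

```python
import warnings

import numpy as np
import pytest


def compute_log_geometric_means(counts):
    # Hypothetical helper mirroring the warning emitted in fit_size_factors.
    if (counts == 0).any(axis=0).all():
        warnings.warn(
            "Every gene contains at least one zero, "
            "cannot compute log geometric means. Switching to iterative mode.",
            UserWarning,
        )
        return None
    return np.log(counts).mean(0)


def test_all_zero_genes_warn():
    counts = np.array([[0, 1], [2, 0]])  # every column (gene) has a zero
    with pytest.warns(UserWarning, match="Switching to iterative mode"):
        compute_log_geometric_means(counts)
```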
Just wanted to flag that there's a very stable implementation of loess in this package (scikit-misc), which is a common dependency of plotnine, so it should be a stable dependency.
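For reference, a hedged sketch of what using scikit-misc's loess might look like; the API below is written from memory and should be double-checked against the scikit-misc docs, and the data is made up.

```python
import numpy as np
from skmisc.loess import loess  # provided by the scikit-misc package

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + 0.3 * rng.normal(size=x.size)

# `span` plays the role of the lowess fraction of points used in each local fit.
lo = loess(x, y, span=0.3)
lo.fit()
y_smooth = lo.predict(x).values
```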
pydeseq2/default_inference.py (Outdated)
return (coeffs, covariates_fit @ coeffs)

if not res.success:
    raise RuntimeError("Gamma GLM optimization failed.")
I think it would be cleaner to return `res.success` and raise the error in the while loop.
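A hedged sketch of that suggestion, with illustrative names and a placeholder objective rather than pydeseq2's actual gamma GLM code: the fitting helper reports `res.success` and the caller's retry loop decides when to give up.

```python
import numpy as np
from scipy.optimize import minimize


def fit_gamma_glm(covariates_fit, targets, x0):
    """Illustrative helper: return the fit and whether L-BFGS-B converged."""

    def loss(coeffs):
        # Placeholder objective standing in for the gamma GLM deviance.
        mu = covariates_fit @ coeffs
        return float(np.sum((targets - mu) ** 2))

    res = minimize(loss, x0, method="L-BFGS-B")
    return res.x, covariates_fit @ res.x, res.success


def fit_with_retries(covariates_fit, targets, x0, max_tries=3):
    """Raise only once all attempts have failed, as suggested above."""
    success, tries = False, 0
    while not success and tries < max_tries:
        coeffs, fitted, success = fit_gamma_glm(covariates_fit, targets, x0)
        x0 = coeffs  # warm start the next attempt
        tries += 1
    if not success:
        raise RuntimeError("Gamma GLM optimization failed.")
    return coeffs, fitted
```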
@adamgayoso From what I gather, lowess is not exactly loess: the weighting scheme is automatic in lowess and hand-made in loess. @BorisMuzellec correct me if I'm wrong, but I think the weighted version is not implemented in scikit-misc.
@umarteauowkin it looks like you can set the weights. Nonetheless, I would consider wrapping this custom lowess with a numba njit decorator for speed, as well as testing it against the statsmodels one.
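On the numba point, here is a minimal sketch of what compiling part of the computation could look like. The pure-numpy lowess would likely need to be rewritten with explicit loops, since nopython mode does not support every numpy construct used above; this `tricube_weights` helper is illustrative, not pydeseq2 code.

```python
import numpy as np
from numba import njit


@njit(cache=True)
def tricube_weights(x, h):
    """Tricube weight matrix w[j, i] = K(|x_j - x_i| / h_i), numba-compiled."""
    n = x.shape[0]
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d = abs(x[j] - x[i]) / h[i]
            if d < 1.0:
                w[j, i] = (1.0 - d ** 3) ** 3
    return w
```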
Hi @adamgayoso, thanks for the suggestion! Indeed, I tried to use this implementation in #234 (tests are currently failing due to an issue in the docs, which can be ignored), but I've experienced several issues: it causes segmentation errors on some platforms, and scikit-misc's …
Reference Issue or PRs
(Partially) fixes #231, closes #235
What does your PR implement? Be specific.
This PR removes pydeseq2's dependence on statsmodels, which has been causing a few bugs recently (and is the reason why the CI currently fails).
Statsmodels was used to:

- Fit the parametric (gamma GLM) dispersion trend curve in `fit_dispersion_trend`. This is now done with `scipy.optimize`'s L-BFGS-B (which incidentally solves the negative coefficients issue [BUG] Negative coefficients in the trend curve #231). When the fit fails, the code switches to a mean fit.
- Perform FDR correction: `scipy.stats.false_discovery_control` is now used in `DeseqStats._independent_filtering` (see the example below).
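For reference, a minimal sketch of the scipy-based p-value adjustment mentioned above; the p-values are made up, and `false_discovery_control` requires scipy >= 1.11.

```python
import numpy as np
from scipy.stats import false_discovery_control

# Benjamini-Hochberg adjusted p-values on illustrative inputs.
pvalues = np.array([0.001, 0.02, 0.04, 0.3, 0.9])
padj = false_discovery_control(pvalues, method="bh")
print(padj)
```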