MAINT Remove statsmodels #236
Conversation
… parametric trend curve in default inference model
Looks perfect, I checked the math in the lowess computation and compared it to the statsmodels implementation: it is the same. Thanks a lot @BorisMuzellec
    np.array([np.sort(np.abs(features - features[i]))[r] for i in range(n)]), 1e-12
)
w = np.clip(
    np.abs(np.nan_to_num((features[:, None] - features[None, :]) / h)), 0.0, 1.0
Maybe, for coherence, use `h[:, None]`?
Do you think it would make the code clearer? Given that the code is working as is, I'm a bit reluctant to add unnecessary broadcasting operations.
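For readers following along, here is a small standalone illustration (not pydeseq2 code, arrays are made up) of what the broadcasting does: dividing an (n, n) matrix by a length-n array broadcasts along the last axis, so `/ h` is equivalent to `/ h[None, :]`, while `/ h[:, None]` scales rows instead. Because the pairwise distance matrix is symmetric, the two variants are transposes of one another, so either can be used as long as downstream code indexes the matching axis.

```python
import numpy as np

h = np.array([1.0, 2.0, 4.0])
d = np.ones((3, 3))

print(d / h)           # columns scaled by h_j
print(d / h[None, :])  # identical to the line above
print(d / h[:, None])  # rows scaled by h_i (the suggested spelling)
```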
# Adapted from https://gist.github.com/agramfort/850437


def lowess(
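For context, here is a minimal sketch of a robust lowess in the spirit of the gist linked above (tricube weights, local linear fits, bisquare robustifying iterations). The function name, defaults, and signature are illustrative and do not match pydeseq2's actual `lowess`.

```python
import numpy as np


def lowess_sketch(x, y, f=2.0 / 3.0, n_iter=3):
    """Illustrative robust lowess: tricube weights + local linear fits."""
    n = len(x)
    r = min(int(np.ceil(f * n)), n - 1)
    # Per-point bandwidth: distance to the r-th nearest neighbour (floored).
    h = np.maximum(np.array([np.sort(np.abs(x - x[i]))[r] for i in range(n)]), 1e-12)
    # Tricube kernel; clipping to [0, 1] zeroes out points outside the window.
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1.0 - w ** 3) ** 3

    y_est = np.zeros(n)
    delta = np.ones(n)
    for _ in range(n_iter):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array(
                [
                    [np.sum(weights), np.sum(weights * x)],
                    [np.sum(weights * x), np.sum(weights * x * x)],
                ]
            )
            beta = np.linalg.solve(A, b)
            y_est[i] = beta[0] + beta[1] * x[i]
        # Robustifying step: downweight points with large residuals (bisquare).
        residuals = y - y_est
        s = max(np.median(np.abs(residuals)), 1e-12)
        delta = np.clip(residuals / (6.0 * s), -1.0, 1.0)
        delta = (1.0 - delta ** 2) ** 2
    return y_est
```

In this sketch the weights for the local fit at point `i` are read from column `i` of `w`, which is why dividing by `h` (equivalent to `h[None, :]`) assigns each column its own bandwidth.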
LGTM
Thanks a lot for the review!
@@ -477,7 +477,7 @@ def fit_size_factors(
     warnings.warn(
         "Every gene contains at least one zero, "
         "cannot compute log geometric means. Switching to iterative mode.",
-        RuntimeWarning,
+        UserWarning,
Out of curiosity, what is the rationale for selecting `UserWarning`?
The bad reason: it was easier to catch it simultaneously with a switch to "mean" trend curve fitting in the tests. The better reason is that I felt it was more suited (`RuntimeWarning` is supposed to cover dubious runtime behavior, which is not really what's going on here).
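For illustration, catching the new warning category in a test could look like the sketch below. `compute_log_geometric_means` and the test are hypothetical stand-ins, not pydeseq2's actual API; the point is only how `pytest.warns(UserWarning, ...)` is used.

```python
import warnings

import numpy as np
import pytest


def compute_log_geometric_means(counts):
    # Hypothetical helper mirroring the warning emitted in fit_size_factors.
    if (counts == 0).any(axis=0).all():
        warnings.warn(
            "Every gene contains at least one zero, "
            "cannot compute log geometric means. Switching to iterative mode.",
            UserWarning,
        )
        return None
    return np.log(counts).mean(0)


def test_all_zero_genes_warn():
    counts = np.array([[0, 1], [2, 0]])  # every column (gene) has a zero
    with pytest.warns(UserWarning, match="Switching to iterative mode"):
        compute_log_geometric_means(counts)
```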
Just wanted to flag that there's a very stable implementation of loess in this package (scikit-misc), which is a common dependency of plotnine, so it should be a stable dependency.
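For reference, a hedged sketch of what using scikit-misc's loess might look like; the API below is written from memory and should be double-checked against the scikit-misc docs, and the data is made up.

```python
import numpy as np
from skmisc.loess import loess  # provided by the scikit-misc package

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + 0.3 * rng.normal(size=x.size)

# `span` plays the role of the lowess fraction of points used in each local fit.
lo = loess(x, y, span=0.3)
lo.fit()
y_smooth = lo.predict(x).values
```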
pydeseq2/default_inference.py (Outdated)
return (coeffs, covariates_fit @ coeffs)

if not res.success:
    raise RuntimeError("Gamma GLM optimization failed.")
I think it would be cleaner to return `res.success` and raise the error in the while loop.
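A hedged sketch of that suggestion, with illustrative names and a placeholder objective rather than pydeseq2's actual gamma GLM code: the fitting helper reports `res.success` and the caller's retry loop decides when to give up.

```python
import numpy as np
from scipy.optimize import minimize


def fit_gamma_glm(covariates_fit, targets, x0):
    """Illustrative helper: return the fit and whether L-BFGS-B converged."""

    def loss(coeffs):
        # Placeholder objective standing in for the gamma GLM deviance.
        mu = covariates_fit @ coeffs
        return float(np.sum((targets - mu) ** 2))

    res = minimize(loss, x0, method="L-BFGS-B")
    return res.x, covariates_fit @ res.x, res.success


def fit_with_retries(covariates_fit, targets, x0, max_tries=3):
    """Raise only once all attempts have failed, as suggested above."""
    success, tries = False, 0
    while not success and tries < max_tries:
        coeffs, fitted, success = fit_gamma_glm(covariates_fit, targets, x0)
        x0 = coeffs  # warm start the next attempt
        tries += 1
    if not success:
        raise RuntimeError("Gamma GLM optimization failed.")
    return coeffs, fitted
```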
@adamgayoso From what I gather, lowess is not exactly loess: the weighting scheme is automatic in lowess and hand-made in loess. @BorisMuzellec correct me if I'm wrong, but I think the weighted version is not implemented in scikit-misc.
@umarteauowkin it looks like you can set the weights. Nonetheless, I would consider wrapping this custom lowess with a numba njit decorator for speed, as well as testing it against the statsmodels one.
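On the numba point, here is a minimal sketch of what compiling part of the computation could look like. The pure-numpy lowess would likely need to be rewritten with explicit loops, since nopython mode does not support every numpy construct used above; this `tricube_weights` helper is illustrative, not pydeseq2 code.

```python
import numpy as np
from numba import njit


@njit(cache=True)
def tricube_weights(x, h):
    """Tricube weight matrix w[j, i] = K(|x_j - x_i| / h_i), numba-compiled."""
    n = x.shape[0]
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d = abs(x[j] - x[i]) / h[i]
            if d < 1.0:
                w[j, i] = (1.0 - d ** 3) ** 3
    return w
```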
Hi @adamgayoso, thanks for the suggestion! Indeed, I tried to use this implementation in #234 (tests are currently failing due to an issue in the docs, which can be ignored), but I've experienced several issues: it causes segmentation errors on some platforms, and scikit-misc's …
Reference Issue or PRs
(Partially) fixes #231, closes #235
What does your PR implement? Be specific.
This PR removes pydeseq2's dependence on statsmodels, which has been causing a few bugs recently (and is the reason why the CI currently fails).
Statsmodels was used to:

- Fit the parametric (gamma GLM) dispersion trend curve in `fit_dispersion_trend`. This is now done with `scipy.optimize`'s L-BFGS-B (which incidentally solves the negative coefficients issue [BUG] Negative coefficients in the trend curve #231). When the fit fails, the code switches to a mean fit.
- Perform FDR correction: `scipy.stats.false_discovery_control` is now used in `DeseqStats._independent_filtering` (see the example below).
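For reference, a minimal sketch of the scipy-based p-value adjustment mentioned above; the p-values are made up, and `false_discovery_control` requires scipy >= 1.11.

```python
import numpy as np
from scipy.stats import false_discovery_control

# Benjamini-Hochberg adjusted p-values on illustrative inputs.
pvalues = np.array([0.001, 0.02, 0.04, 0.3, 0.9])
padj = false_discovery_control(pvalues, method="bh")
print(padj)
```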