[BUG] Discrepancies between DESeq2/pyDESeq2 #217
Hi @ledwmp, thanks for sharing your feedback. Some discrepancies are indeed to be expected from time to time, but certainly not to the extent shown in the second example (assuming that the same set of parameters were chosen in DESeq2 and PyDESeq2). If you could share a reproducible example, I'd be very interested to have a look.
Hi @ledwmp, were you able to solve this issue? If not, could you share a reproducible example?
Hi @BorisMuzellec, sorry for the delay; hopefully you can still take a look at this. I put together a minimal(ish) example here: https://gist.github.com/ledwmp/464d8ba0edd3d66d314db3cc4ec4e89d Because the discrepancies we're seeing only occur on some datasets, it took me a little while to find one that reproduces the issue and runs in a reasonable amount of time. Many of the discrepancies appear to be in lowly expressed genes. You'll see that the R implementation has some convergence issues on this dataset, but I think the broader issue still holds. Also, if you wouldn't mind taking a peek at the last cell in the notebook: we notice that pydeseq2 sometimes reports extremely low p-values for fits that are obviously problematic. Thanks for taking a look! We love using pydeseq2. Hopefully this is helpful, and please point out any issues that may be due to user error.
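One way to check whether the discrepancies really concentrate in lowly expressed genes is to bin genes by `baseMean` and count, per bin, how often the two p-values differ by more than an order of magnitude. The sketch below is a hypothetical helper (`discordance_by_expression` is not part of DESeq2 or PyDESeq2). It assumes `baseMean` and `pvalue` from both result tables have already been exported into plain dicts keyed by gene ID, with NA values as `None` or `NaN`. The bin edges and the one-order-of-magnitude tolerance are arbitrary choices.

```python
import math
from collections import defaultdict


def _is_na(p):
    # Treat None and NaN as missing (R's NA after export).
    return p is None or (isinstance(p, float) and math.isnan(p))


def discordance_by_expression(base_mean, pval_r, pval_py,
                              edges=(1, 10, 100, 1000), log10_tol=1.0):
    """Per baseMean bin, the fraction of genes whose R and Python p-values
    differ by more than `log10_tol` orders of magnitude."""
    totals, discordant = defaultdict(int), defaultdict(int)
    for gene, mean in base_mean.items():
        p_r, p_py = pval_r.get(gene), pval_py.get(gene)
        if _is_na(p_r) or _is_na(p_py):
            continue  # gene has no p-value in at least one implementation
        # Bin label is the first edge the mean falls below, else ">=last edge".
        label = next((f"<{e}" for e in edges if mean < e), f">={edges[-1]}")
        totals[label] += 1
        # Floor at 1e-300 so that a p-value of exactly 0 does not break log10.
        gap = abs(math.log10(max(p_r, 1e-300)) - math.log10(max(p_py, 1e-300)))
        if gap > log10_tol:
            discordant[label] += 1
    return {label: discordant[label] / totals[label] for label in totals}
```

If the highest discordant fractions sit in the lowest bins, that would be consistent with the observation above; bins with very few genes should be read with caution.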
Hi @ledwmp, thanks for providing a complete example. I haven't dug deeply into it yet, but I can see that DESeq2 raises the following warning:
Since local regression fits for the trend curve are not implemented yet in PyDESeq2, this could explain part of the difference in the results. To check this, it would be interesting to see whether the genewise dispersions (before MAP refitting and filtering) already differ between the two packages.
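A minimal sketch of that check, assuming the genewise (pre-MAP) estimates have been exported from each package into dicts keyed by gene ID. In DESeq2 these are the `dispGeneEst` column of `mcols(dds)`; in PyDESeq2 0.4.x we believe they are stored in `dds.varm["genewise_dispersions"]` once `fit_genewise_dispersions()` has run, but please verify this against your installed version. The helper name `compare_dispersions` and the two-fold threshold are hypothetical choices for illustration.

```python
import math


def compare_dispersions(disp_r, disp_py, log2_tol=1.0):
    """Return (gene, log2(R / Python)) for genes whose genewise dispersion
    estimates differ by more than `log2_tol` (1.0 = two-fold), largest first."""
    flagged = []
    for gene in disp_r.keys() & disp_py.keys():
        a, b = disp_r[gene], disp_py[gene]
        # Skip NA (None/NaN) and non-positive estimates, where a log ratio is undefined.
        if a is None or b is None or math.isnan(a) or math.isnan(b) or a <= 0 or b <= 0:
            continue
        ratio = math.log2(a / b)
        if abs(ratio) > log2_tol:
            flagged.append((gene, ratio))
    flagged.sort(key=lambda item: abs(item[1]), reverse=True)
    return flagged
```

Genes flagged here already disagree before the trend curve is fitted, so for those genes the missing local-regression trend would not be the whole explanation.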
We are experiencing issues where specific datasets yield dramatically different results (p-values) between the R and pyDESeq2 implementations. Given that other datasets yield very similar fits and p-values, we don't think this is caused by different parameters, precision issues, etc. We're happy to provide a fully reproducible example; we're just curious whether this is a known issue, whether anyone else has encountered it, and whether there is a workaround.
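To put a number on how different two runs are, a small hypothetical helper like the one below compares significance calls and the largest adjusted p-value gap for genes that have an adjusted p-value in both outputs. It assumes the `padj` columns of the R `results(dds)` table and of PyDESeq2's `DeseqStats.results_df` have been loaded into dicts keyed by gene ID, with NA as `None` or `NaN`. The helper name and the `alpha=0.05` default are illustrative and not part of either package.

```python
import math


def _is_na(p):
    # Treat None and NaN as missing (R's NA after export).
    return p is None or (isinstance(p, float) and math.isnan(p))


def concordance_summary(padj_r, padj_py, alpha=0.05):
    """Compare DESeq2 (R) and PyDESeq2 adjusted p-values gene by gene."""
    shared = [g for g in padj_r.keys() & padj_py.keys()
              if not _is_na(padj_r[g]) and not _is_na(padj_py[g])]
    sig_r = {g for g in shared if padj_r[g] < alpha}
    sig_py = {g for g in shared if padj_py[g] < alpha}
    # Disagreement in orders of magnitude; floor at 1e-300 to avoid log10(0).
    gaps = {g: abs(math.log10(max(padj_r[g], 1e-300))
                   - math.log10(max(padj_py[g], 1e-300))) for g in shared}
    worst = max(gaps, key=gaps.get) if gaps else None
    return {
        "n_shared": len(shared),
        "significant_in_both": len(sig_r & sig_py),
        "r_only": sorted(sig_r - sig_py),
        "py_only": sorted(sig_py - sig_r),
        "largest_gap_gene": worst,
        "largest_gap": gaps[worst] if worst is not None else 0.0,
    }
```

Running this on a well-behaved dataset and on a discordant one gives two directly comparable summaries, which may be easier to share in an issue than full result tables.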
Python: 3.12
pydeseq2: 0.4.3
OS: Ubuntu 22.04.3 LTS
Dataset, everything behaving as expected:
Dataset, discordant python/R results: