[BUG] Discrepancies between DESeq2/pyDESeq2 #217

Open
ledwmp opened this issue Dec 5, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@ledwmp

ledwmp commented Dec 5, 2023

We are seeing specific datasets yield dramatically different results (p-values) between the R (DESeq2) and pyDESeq2 implementations. Given that other datasets yield very similar fits and p-values, I don't think this is caused by different parameters, precision issues, etc. Happy to provide a fully reproducible example; I'm just curious whether this is a known issue, whether anyone else has encountered it, and whether there is a workaround.

Python: 3.12
pydeseq2: 0.4.3
OS: Ubuntu 22.04.3 LTS

Dataset, everything behaving as expected:
image

Dataset, discordant python/R results:
image

@ledwmp ledwmp added the bug Something isn't working label Dec 5, 2023
@BorisMuzellec
Collaborator

Hi @ledwmp, thanks for sharing your feedback.

Some discrepancies are indeed to be expected from time to time, but certainly not to the extent shown in the second example (assuming that the same set of parameters was chosen in DESeq2 and PyDESeq2).

If you could share a reproducible example, I'd be very interested to have a look.

@BorisMuzellec
Collaborator

Hi @ledwmp, were you able to solve this issue? If not, could you share a reproducible example?

@ledwmp
Author

ledwmp commented Jan 5, 2024

Hi @BorisMuzellec,

Sorry for the delay, hopefully you can still take a look at this.

https://gist.github.com/ledwmp/464d8ba0edd3d66d314db3cc4ec4e89d

I put together a minimal(ish) example here. Because the discrepancies we're seeing only occur on some datasets, it took me a little while to find something that reproduces the issue and runs in a reasonable amount of time.

It looks like many of the discrepancies are found in lowly expressed genes. You'll see that there are some convergence issues in the R implementation for this dataset, but I think the issue at large still holds.
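One way to check the observation about lowly expressed genes is to flag genes whose p-values differ by several orders of magnitude between the two tools and look at their mean expression. The following is a minimal sketch on made-up numbers, not the data from the gist; the function name `discordant_genes`, the threshold `min_log10_gap`, and the toy gene IDs are all hypothetical.

```python
import math

def discordant_genes(p_r, p_py, base_mean, min_log10_gap=2.0):
    """Flag genes whose p-values differ by >= min_log10_gap orders of magnitude.

    p_r, p_py: dicts mapping gene ID -> p-value from the R and Python runs.
    base_mean: dict mapping gene ID -> mean normalized count.
    Returns (gene, base_mean, log10 gap) tuples, lowest expression first.
    """
    flagged = []
    for gene in p_r.keys() & p_py.keys():
        a, b = p_r[gene], p_py[gene]
        # NaN (no p-value reported) and exact zeros fail this test and are skipped.
        if not (a > 0 and b > 0):
            continue
        gap = abs(math.log10(a) - math.log10(b))
        if gap >= min_log10_gap:
            flagged.append((gene, base_mean[gene], gap))
    return sorted(flagged, key=lambda item: item[1])

# Toy values for illustration only (hypothetical, not from the issue).
p_r = {"g1": 0.04, "g2": 1e-3, "g3": 0.5, "g4": float("nan")}
p_py = {"g1": 0.05, "g2": 1e-9, "g3": 0.4, "g4": 0.01}
base_mean = {"g1": 850.0, "g2": 3.2, "g3": 120.0, "g4": 1.1}

flagged = discordant_genes(p_r, p_py, base_mean)
for gene, mean, gap in flagged:
    print(f"{gene}: baseMean={mean:.1f}, |log10 p gap|={gap:.1f}")
```

If the flagged genes cluster at the low end of the mean-expression range, that would be consistent with the pattern described above.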

Also, would you mind taking a peek at the last cell in the notebook? We notice that pydeseq2 sometimes reports extremely low p-values for fits that are obviously problematic.

Thanks for taking a look! We love using pydeseq2. Hopefully this is helpful, and please point out any issues that may be occurring due to user error.

@BorisMuzellec BorisMuzellec reopened this Jan 5, 2024
@BorisMuzellec
Collaborator

Hi @ledwmp, thanks for providing a complete example.

I haven't dug deeply into it yet, but I can see that DESeq2 raises the following warning:

R[write to console]: -- note: fitType='parametric', but the dispersion trend was not well captured by the function: y = a/x + b, and a local regression fit was automatically substituted. specify fitType='local' or 'mean' to avoid this message next time.

Since local regression fits for the trend curve are not implemented yet in PyDESeq2, this could explain part of the difference in the results.
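For context, the parametric trend named in the warning has the form y = a/x + b, where x is the mean expression and y the dispersion. The sketch below fits that form by ordinary least squares on 1/x, which is a deliberate simplification and not the fitting procedure of either package. The helper name `fit_inverse_trend` and the toy values are hypothetical.

```python
def fit_inverse_trend(means, disps):
    """Least-squares fit of disp = a / mean + b, which is linear in 1/mean."""
    x = [1.0 / m for m in means]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(disps) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, disps))
    a = sxy / sxx            # slope with respect to 1/mean
    b = y_bar - a * x_bar    # asymptotic dispersion at high expression
    return a, b

# Toy data generated exactly from the parametric form (a=2, b=0.1).
means = [1.0, 2.0, 5.0, 10.0, 100.0]
disps = [2.0 / m + 0.1 for m in means]

a, b = fit_inverse_trend(means, disps)
residuals = [d - (a / m + b) for m, d in zip(means, disps)]
print(f"a={a:.3f}, b={b:.3f}, max |residual|={max(abs(r) for r in residuals):.2e}")
```

On real data, large systematic residuals from such a curve are the kind of situation in which a different trend shape, such as a local regression, would describe the dispersions better.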

To check this, it would be interesting to see whether genewise dispersions (pre-MAP refitting and filtering) are already different between both packages.
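A minimal sketch of that check, assuming the genewise dispersions have already been exported from both tools into plain gene-to-value mappings. To my understanding they are stored in `dds.varm["genewise_dispersions"]` in PyDESeq2 and in `mcols(dds)$dispGeneEst` in DESeq2, but treat those locations as assumptions to verify. The function name `compare_dispersions`, the tolerance `tol_log10`, and the toy values are hypothetical.

```python
import math

def compare_dispersions(disp_r, disp_py, tol_log10=0.1):
    """Compare genewise dispersions from two tools on the log10 scale.

    Returns (diffs, outside): diffs maps gene -> log10(py) - log10(r) for
    genes with positive estimates in both tools, and outside lists the genes
    whose absolute difference exceeds tol_log10.
    """
    diffs = {}
    for gene in sorted(disp_r.keys() & disp_py.keys()):
        a, b = disp_r[gene], disp_py[gene]
        # NaN or non-positive estimates cannot be compared on a log scale.
        if a > 0 and b > 0:
            diffs[gene] = math.log10(b) - math.log10(a)
    outside = [g for g, d in diffs.items() if abs(d) > tol_log10]
    return diffs, outside

# Toy values for illustration only (hypothetical, not from the gist).
disp_r = {"g1": 0.10, "g2": 0.50, "g3": 1e-8}
disp_py = {"g1": 0.11, "g2": 0.05, "g3": 1e-8}

diffs, outside = compare_dispersions(disp_r, disp_py)
print("genes outside tolerance:", outside)
```

If the genewise estimates already disagree, the divergence starts before the trend fit; if they agree, the trend curve and the later steps become the more likely source.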
