[BUG] Discrepancies between DESeq2/pyDESeq2 #217

ledwmp · 2023-12-05T22:17:25Z

We are experiencing issues where specific datasets yield dramatically different results (p-values) between the R and pyDESeq2 implementations. Given that other datasets yield very similar fits and p-values, I don't think this is caused by different parameters, precision issues, etc. Happy to provide a fully reproducible example; just curious whether this is a known issue, whether anyone else has encountered it, and whether there is a workaround.

Python: 3.12
pydeseq2: 0.4.3
OS: Ubuntu 22.04.3 LTS

Dataset, everything behaving as expected:

Dataset, discordant python/R results:

BorisMuzellec · 2023-12-06T08:03:35Z

Hi @ledwmp, thanks for sharing your feedback.

Some discrepancies are indeed to be expected from time to time, but certainly not to the extent shown in the second example (assuming that the same set of parameters was chosen in DESeq2 and PyDESeq2).

If you could share a reproducible example, I'd be very interested to have a look.

BorisMuzellec · 2023-12-18T15:54:49Z

Hi @ledwmp, were you able to solve this issue? If not, could you share a reproducible example?

ledwmp · 2024-01-05T00:46:09Z

Hi @BorisMuzellec,

Sorry for the delay, hopefully you can still take a look at this.

https://gist.github.com/ledwmp/464d8ba0edd3d66d314db3cc4ec4e89d

I put together a minimal(ish) example here. Because the discrepancies that we're seeing only occur on some datasets, it took me a little while to find something that reproduces the issue and runs in a reasonable amount of time.

It looks like many of the discrepancies are found in lowly expressed genes. You'll see that there are some convergence issues in the R implementation for this dataset, but I think the issue at large still holds.

Also, would you mind taking a peek at the last cell in the notebook? We notice that pydeseq2 sometimes reports extremely low p-values for fits that are obviously problematic.

Thanks for taking a look! We love using pydeseq2. Hopefully this is helpful, and please point out any issues that may be occurring due to user error.

BorisMuzellec · 2024-01-05T13:53:19Z

Hi @ledwmp, thanks for providing a complete example.

I haven't dug deeply into it yet, but I can see that DESeq2 raises the following warning:

R[write to console]: -- note: fitType='parametric', but the dispersion trend was not well captured by the function: y = a/x + b, and a local regression fit was automatically substituted. specify fitType='local' or 'mean' to avoid this message next time.

Since local regression fits for the trend curve are not implemented yet in PyDESeq2, this could explain part of the difference in the results.

To check this, it would be interesting to see whether genewise dispersions (pre-MAP refitting and filtering) are already different between both packages.
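For reference, a minimal sketch of such a check, assuming the genewise dispersions from each package have already been exported as plain gene-to-value mappings. The function name `compare_dispersions` and the `tol` threshold are hypothetical, and the accessors mentioned in the comments (`dds.varm["genewise_dispersions"]` in PyDESeq2, `mcols(dds)$dispGeneEst` in DESeq2) are my understanding of where these values live, so please double-check them against the versions you are running.

```python
import math
from statistics import median


def compare_dispersions(py_disp, r_disp, tol=0.1):
    """Compare genewise dispersions from two tools on the log scale.

    py_disp, r_disp: dicts mapping gene id -> genewise dispersion.
    Assumed sources (verify for your versions):
      PyDESeq2: dds.varm["genewise_dispersions"]
      DESeq2:   mcols(dds)$dispGeneEst
    Returns (median absolute log-ratio, sorted list of genes whose
    absolute log-ratio exceeds tol).
    """
    abs_log_ratios = {}
    for gene in set(py_disp) & set(r_disp):
        a, b = py_disp[gene], r_disp[gene]
        # Skip genes with missing or non-positive estimates in either tool
        # (NaN fails the > 0 comparison, so it is skipped as well).
        if a is None or b is None or not (a > 0 and b > 0):
            continue
        abs_log_ratios[gene] = abs(math.log(a / b))

    if not abs_log_ratios:
        return float("nan"), []

    discordant = sorted(g for g, v in abs_log_ratios.items() if v > tol)
    return median(abs_log_ratios.values()), discordant


# Toy values for illustration only, not taken from the gist.
py_toy = {"g1": 0.10, "g2": 0.50, "g3": 2.0}
r_toy = {"g1": 0.10, "g2": 0.55, "g3": 0.02}
med, discordant = compare_dispersions(py_toy, r_toy)
print(med, discordant)
```

If the discordant genes already show up at this stage, the difference arises before the trend fit; if the genewise values agree, the trend curve (parametric vs. local) becomes the more likely culprit.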

ledwmp added the bug ("Something isn't working") label Dec 5, 2023

BorisMuzellec closed this as completed Dec 22, 2023

BorisMuzellec reopened this Jan 5, 2024
