PAML Emulator #1802

Open
FinnKlemp opened this issue Jan 29, 2025 · 8 comments

@FinnKlemp

Hello!
I am using the PAML emulator for my PhD thesis and have a question. When I run the original PAML branch-site model, I get some genes with very high dN/dS ratios at a few sites (presumably caused by MSA errors or low dS). This does not happen when I use the same data in the PAML emulator.
Does the emulator remove "problematic" sites/ratios in a similar fashion to some options in regular HyPhy?
I am assuming that it does but would be very thankful for some clarification on what is done exactly.
Thank you so much in advance and for the possibilities of your great program!
Cheers!
Finn

@spond
Member

spond commented Jan 29, 2025

Dear @FinnKlemp,

Not immediately sure. HyPhy filtering is "opt in", so you have to turn on specific options. These options are not available for legacy PAML-type analyses.

I'd be happy to help you but I need to be able to reproduce the error. So I need the input files, the exact commands that you used, and the hyphy version.

Best,
Sergei

@FinnKlemp
Author

Thank you for the answer! I think I did not explain it correctly. When I run PAML and the PAML emulator, I get different dN/dS ratios for the same gene (the only difference may be the exact labeling). I was wondering what causes these differences.

@spond
Member

spond commented Jan 30, 2025

Dear @FinnKlemp,

I am not sure what 'PAML emulator' refers to. This? https://github.com/veg/hyphy-analyses/tree/master/PAML-emulator

Best,
Sergei

@FinnKlemp
Author

Yes! I was wondering if it does anything different from the original PAML tests, as I got quite different results.

@spond
Member

spond commented Jan 31, 2025

Dear @FinnKlemp,

There are lots of possibilities:

  1. Difference in settings. Depending on what you have specified in your PAML .ctl file, options like equilibrium frequencies and rate variation models may differ between the two. For example, the emulator uses the F3x4 frequency estimator.
  2. Difference in how indels are handled. I was never 100% sure what PAML does there; hyphy keeps everything and treats indels as missing data. This may matter if you have a gappy alignment.
  3. Difference in optimization robustness. Branch site models are notoriously temperamental. The programs may simply converge to different "solutions".
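To judge whether point 2 is likely to matter for a given gene, one can measure how gappy the alignment is per codon. The following is a minimal sketch, not part of PAML or HyPhy: the function names, the toy alignment, and the 0.5 threshold are all illustrative assumptions, and it expects aligned sequences already loaded as equal-length strings.

```python
def column_gap_fractions(sequences, gap_chars="-?"):
    """Return the fraction of gap characters in each alignment column."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    n = len(sequences)
    return [sum(s[i] in gap_chars for s in sequences) / n for i in range(length)]


def gappy_codons(sequences, threshold=0.5):
    """Return 0-based codon indices where any of the three columns
    has a gap fraction above the (arbitrary) threshold."""
    fractions = column_gap_fractions(sequences)
    return [
        codon
        for codon in range(len(fractions) // 3)
        if max(fractions[3 * codon: 3 * codon + 3]) > threshold
    ]


# Toy alignment, made up for illustration only.
toy_alignment = [
    "ATGAAA---CCC",
    "ATGAAG---CCC",
    "ATG---GGGCCC",
]
print(gappy_codons(toy_alignment))
```

If many codons are flagged, differences in indel handling become a more plausible explanation for diverging estimates.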

If the underlying models are the same, you could always compare the log likelihood produced, and select the one with the better score.
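A minimal sketch of that comparison, assuming the log likelihoods have already been copied by hand from each program's output for the same model and the same alignment. The function name, the tolerance, and the numbers are placeholders invented for illustration, not real results.

```python
def better_fit(fits, tolerance=0.01):
    """Given {program_name: log_likelihood} for exactly two runs of the
    same model on the same data, return the name with the higher log
    likelihood, or None if they agree within the tolerance."""
    (name_a, lnl_a), (name_b, lnl_b) = fits.items()
    if abs(lnl_a - lnl_b) <= tolerance:
        return None  # both optimizers effectively found the same optimum
    return name_a if lnl_a > lnl_b else name_b


# Placeholder values, not taken from any actual analysis.
example = {"PAML": -2345.67, "HyPhy": -2341.12}
print(better_fit(example))
```

A clearly higher log likelihood from one program suggests the other stopped at a worse local optimum. The comparison is only meaningful when both runs fit the same model with the same settings.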

Best,
Sergei

@spond
Member

spond commented Feb 2, 2025

Dear @FinnKlemp,

I'd be happy to take a closer look if you send me one file where you observe the differences, along with the PAML .ctl file and the command you used to run HyPhy.

Best,
Sergei

@FinnKlemp
Author

Thank you so much! The very high dN/dS ratios were caused by MSA errors. I figured out how to get rid of them, but was wondering how the two PAML versions arrived at such different results. The explanation "Difference in optimization robustness. Branch site models are notoriously temperamental. The programs may simply converge to different 'solutions'." seems logical, especially considering that the programs were working with "problematic" alignments.

@spond
Member

spond commented Feb 3, 2025

Dear @FinnKlemp,

For MSA errors I would encourage you to consider BUSTED-E (https://www.biorxiv.org/content/10.1101/2024.11.13.620707v1.full). You can use BUSTED-E to filter the alignment and then feed it to PAML and compare the before/after.

Best,
Sergei
