Inconsistency of lengths and mutations assigned to branches that are children of the root. #1689

Open
rneher opened this issue Nov 25, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@rneher
Member

rneher commented Nov 25, 2024

This issue is something I noticed together with @isabeljoia while working on the mpv nextclade dataset.

The aim was to have a midpoint rooted tree without timetree, but augur consistently

  • reduced one of the branches emanating from the root to zero length
  • assigned no mutations to that branch

even though the input tree to refine and ancestral was midpoint rooted. Even the output tree of refine was midpoint rooted. It turned out that there were two unrelated but similar issues.

TreeTime assigns mutations at a node according to the argument sample-from-profile, which can take the values False, True and root. In the case of False, the most likely state is picked; in the case of True a random choice is made according to the inferred distribution among states. The choice root results in sample-from-profile==True at the root node and False otherwise.
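To make the three modes concrete, here is a toy sketch of the selection rule described above. It is not TreeTime's code: `pick_state`, the dict-based profile and the `is_root` flag are made up for illustration.

```python
import random

def pick_state(profile, sample_from_profile=False, is_root=False, rng=random):
    """Pick one state from a profile (dict: state -> probability).

    Hypothetical helper mirroring the three modes described above:
    False  -> always the most likely state
    True   -> random draw according to the profile
    "root" -> random draw at the root, most likely state elsewhere
    """
    sample = sample_from_profile is True or (sample_from_profile == "root" and is_root)
    states = list(profile)
    if sample:
        weights = [profile[s] for s in states]
        return rng.choices(states, weights=weights, k=1)[0]
    return max(states, key=lambda s: profile[s])

profile = {"A": 0.6, "C": 0.4}
print(pick_state(profile))                         # always "A"
print(pick_state(profile, "root", is_root=False))  # always "A"
print(pick_state(profile, "root", is_root=True))   # "A" or "C", at random
```

With `False`, a 60/40 profile always resolves to the same state, which is the behaviour that matters for the rest of this issue.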

Augur, however, doesn't specify sample-from-profile, so it defaults to False. As a result, all mutations on either child branch of the root are assigned to the longer branch, which is typically not the desired result. The same applies when refining trees without timetree and using mutations as the unit of branch length.
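A small toy model shows why picking the most likely state pushes everything onto the longer branch. The sketch assumes a Jukes-Cantor model, a uniform prior at the root, and a root with just two children whose states are A and C at every differing site. All function names are hypothetical, and this is not how TreeTime computes its profiles internally.

```python
import math
import random

def jc_prob(same, t):
    """Jukes-Cantor transition probability over a branch of length t (subs/site)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay

def root_profile(state_left, t_left, state_right, t_right):
    """Posterior over root states given the two child states (uniform prior)."""
    weights = {
        s: jc_prob(s == state_left, t_left) * jc_prob(s == state_right, t_right)
        for s in "ACGT"
    }
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def mutations_on_left(n_sites, t_left, t_right, sample, seed=1):
    """Count sites whose mutation lands on the left branch (left child is A, right is C)."""
    rng = random.Random(seed)
    profile = root_profile("A", t_left, "C", t_right)
    states = list(profile)
    on_left = 0
    for _ in range(n_sites):
        if sample:
            root = rng.choices(states, weights=[profile[s] for s in states], k=1)[0]
        else:
            root = max(states, key=profile.get)
        on_left += root != "A"  # root differs from the left child: mutation on the left branch
    return on_left

# Left branch is the shorter one (0.01 vs 0.02 substitutions per site).
print(mutations_on_left(30, 0.01, 0.02, sample=False))  # 0: every mutation goes to the longer branch
print(mutations_on_left(30, 0.01, 0.02, sample=True))   # some mutations end up on the shorter branch
```

In this toy setup the root posterior favours the state of the closer child at every site, so the argmax is the same everywhere and the shorter branch never receives a mutation. Sampling from the profile splits the mutations roughly in proportion to the posterior.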

We would need to specify sample_from_profile='root' here

tt.infer_ancestral_sequences(infer_gtr=infer_gtr, marginal=bool_marginal,

and here

tt.infer_ancestral_sequences()

@rneher rneher added the bug Something isn't working label Nov 25, 2024
@rneher
Member Author

rneher commented Nov 25, 2024

Of note, estimates of branch length in units of mutations are made independently of the reconstruction, which might well result in inconsistencies between the length of a branch and the number of mutations assigned to it. This is only an issue for long branches without an explicit root, when the mutation unit isn't very useful, but still...

@jameshadfield
Member

It'd be nice to implement a good fix for this, it keeps rearing its head.

Apart from the oddity of branch lengths not matching mutations, this issue prevents augur clades from working on very deep nodes. For mpox clade-Ia/Ib I implemented a (hopefully temporary) solution which uses the MRCA of metadata to get around this.

in the case of True a random choice is made according to the inferred distribution among states

If I'm understanding correctly, this will tend towards matching up the branch length with the number of mutations, but I don't think it'll solve the consistent assignment of mutations to one or the other basal branch. For instance, with mpox clade-I nuc 35352 we have one clade of base A (clade Ia) and one of base C (Ib). I don't think this approach can correctly assign the root state to A or C (it should be C according to our clades.tsv) without some knowledge of an outgroup sequence.

@rneher
Member Author

rneher commented Nov 25, 2024

Without an outgroup, we can't know what the root state is, and I don't think there is an algorithmic solution to this. But if you specified, say, 5 sites for each clade, the probability that all relevant mutations get assigned to the "other" branch should be very low. In the worst case, one of the two sister clades would be assigned to the root node. augur clades takes the largest clade that matches the specified genotype, if I recall correctly.
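As a rough back-of-the-envelope check, assume each clade-defining site is assigned independently and lands on the "other" branch with some probability `p_other`. Both the independence and the values of `p_other` are assumptions for illustration, not something TreeTime reports.

```python
def prob_all_on_other_branch(n_sites, p_other=0.5):
    """Probability that every one of n_sites independent mutations lands on the "other" branch."""
    return p_other ** n_sites

print(prob_all_on_other_branch(5))         # 0.03125 for an even split
print(prob_all_on_other_branch(5, 1 / 3))  # about 0.004 if the "other" branch gets a third
```

The number depends strongly on how lopsided the two basal branches are, so adding more defining sites per clade is the main lever here.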
