This issue is something I noticed together with @isabeljoia while working on the mpv nextclade dataset.
The aim was to have a midpoint rooted tree without timetree, but augur consistently

- reduced one of the branches emanating from the root to zero length
- assigned no mutations to that branch

even though the input tree to `refine` and `ancestral` was midpoint rooted. Even the output tree of `refine` was midpoint rooted. It turned out that there were two unrelated but similar issues.
TreeTime assigns mutations at a node according to the argument `sample-from-profile`, which can take the values `False`, `True`, and `root`. With `False`, the most likely state is picked; with `True`, a random choice is made according to the inferred distribution among states. The choice `root` results in `sample-from-profile==True` at the root node and `False` otherwise.
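To make the three options concrete, here is a minimal sketch of the selection rule as described above. It is not TreeTime's code: `sample_here`, `pick_state`, and the dict-based `profile` are hypothetical names invented for illustration, and the profile is assumed to be the per-site distribution over states at one node.

```python
import random


def sample_here(sample_from_profile, is_root):
    """Turn the three-valued option into a yes/no decision for one node."""
    if sample_from_profile == "root":
        return is_root
    return bool(sample_from_profile)


def pick_state(profile, sample_from_profile=False, is_root=False, rng=random):
    """Pick a state at one site of one node.

    profile: dict mapping state -> probability (assumed input format).
    """
    states = list(profile)
    if sample_here(sample_from_profile, is_root):
        # Random draw weighted by the inferred distribution.
        return rng.choices(states, weights=[profile[s] for s in states], k=1)[0]
    # Deterministic: always the most likely state.
    return max(states, key=profile.get)
```

With `sample_from_profile=False`, a profile such as `{"A": 0.6, "C": 0.4}` always yields `A`, no matter how close the two probabilities are.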
Augur, however, doesn't specify `sample-from-profile`, so it defaults to `False`. This results in all mutations on either child branch of the root being assigned to the longer branch, which is typically not the desired result. The same applies when refining trees without timetree and using mutations as the unit of branch length.
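The effect can be reproduced with a toy model. This is a sketch under strong assumptions, not what TreeTime or augur actually computes: a Jukes-Cantor model, a flat prior at the root, two root children that differ (A vs C) at every site considered, and hypothetical helper names throughout.

```python
import math
import random

STATES = "ACGT"


def jc69(t):
    """Jukes-Cantor probabilities after branch length t:
    (stay in the same state, change to one specific other state)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def root_profile(child_states, branch_lengths):
    """Distribution over root states at one site, given the states of the
    two root children and their branch lengths (flat prior assumed)."""
    weights = []
    for x in STATES:
        w = 1.0
        for state, t in zip(child_states, branch_lengths):
            same, diff = jc69(t)
            w *= same if x == state else diff
        weights.append(w)
    total = sum(weights)
    return {x: w / total for x, w in zip(STATES, weights)}


def count_mutations(n_sites, branch_lengths, sample, rng):
    """Count mutations placed on each root branch over n_sites sites
    where the first child is A and the second child is C."""
    profile = root_profile(("A", "C"), branch_lengths)
    counts = [0, 0]
    for _ in range(n_sites):
        if sample:
            root = rng.choices(list(profile), weights=list(profile.values()), k=1)[0]
        else:
            root = max(profile, key=profile.get)
        counts[0] += root != "A"  # mutation on the branch to the first child
        counts[1] += root != "C"  # mutation on the branch to the second child
    return counts


rng = random.Random(1)
print(count_mutations(1000, (0.001, 0.003), sample=False, rng=rng))
print(count_mutations(1000, (0.001, 0.003), sample=True, rng=rng))
```

In this toy setting, picking the most likely state puts every mutation on the longer branch and none on the shorter one, while sampling at the root splits them roughly in proportion to the two branch lengths.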
We would need to specify `sample_from_profile='root'` in both `ancestral.py` and `refine.py`.
Of note, estimates of branch length in units of mutations are made independently of the reconstruction, which might well result in inconsistencies between the length of a branch and the number of mutations assigned to it. This is only an issue for long branches without an explicit root, where the mutation unit isn't very useful, but still...
It'd be nice to implement a good fix for this, it keeps rearing its head.
Apart from the oddity of branch lengths not matching mutations, this issue prevents `augur clades` from working on very deep nodes. For mpox clade-Ia/Ib I implemented a (hopefully temporary) solution which uses the MRCA of metadata to get around this.
> in the case of True a random choice is made according to the inferred distribution among states
If I'm understanding correctly, this will tend towards matching up the branch length with the number of mutations, but I don't think it'll solve the consistent assignment of mutations to one or the other basal branch. For instance, with mpox clade-I nuc 35352 we have one clade of base A (clade Ia) and one of base C (Ib). I don't think this approach can correctly assign the root state to A or C (it should be C according to our clades.tsv) without some knowledge of an outgroup sequence.
Without an outgroup, we can't know what the root state is, and I don't think there is an algorithmic solution to this. But if you specified, say, 5 sites for each clade, the probability that all relevant mutations get assigned to the "other" branch should be very low. In the worst case, one of the two sister clades would be assigned to the root node. `augur clades` takes the largest clade that matches the specified genotype, if I recall correctly.
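As a rough back-of-the-envelope check on "very low": assuming each clade-defining site is placed independently and lands on the "other" branch with the same probability (an idealization, and `prob_all_on_other_branch` is a hypothetical helper), the chance that all of them end up there shrinks geometrically with the number of sites.

```python
def prob_all_on_other_branch(p_other, n_sites):
    """Probability that every one of n_sites independently placed
    mutations ends up on the 'other' branch (independence assumed)."""
    return p_other ** n_sites


# Example: equal-length root branches (p_other = 0.5) and 5 defining sites.
print(prob_all_on_other_branch(0.5, 5))  # 0.03125
```

If the branch that should carry the mutations is the longer one, `p_other` is below 0.5 and the probability drops further.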
The locations where `sample_from_profile='root'` would need to be specified are augur/augur/ancestral.py (line 87 in e90383b) and augur/augur/refine.py (line 316 in e90383b).