Freebayes likes to output MNPs e.g. TGC TAC,TGG.
Currently the best approach with these is to 'atomize' them into SNPs before assembly (removing redundant bases).
There are two options to support these without any user pre-processing:
1. automatically atomize them
2. generalize the encoding of read distributions to include MNPs
The second option would be the more computationally efficient of the two because it requires fewer variants.
The first option assesses all possible combinations of the atomized SNPs, not just the alts of the MNP.
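As a rough illustration of the difference between the two options, the sketch below atomizes the example MNP (`TGC` with alts `TAC,TGG`) into SNPs and then enumerates the haplotypes those SNPs imply. The function names `atomize` and `implied_haplotypes` are hypothetical and are not part of Freebayes or of this project; the sketch also assumes every allele has the same length as the reference.

```python
from itertools import product


def atomize(ref, alts):
    """Split an MNP into per-position SNPs (hypothetical helper).

    Positions where every alt matches the reference are redundant
    and are dropped. Returns (offset, ref_base, alt_bases) tuples.
    """
    if any(len(alt) != len(ref) for alt in alts):
        raise ValueError("all MNP alleles must match the reference length")
    snps = []
    for offset, ref_base in enumerate(ref):
        alt_bases = []
        for alt in alts:
            base = alt[offset]
            # Keep each distinct non-reference base once, in first-seen order.
            if base != ref_base and base not in alt_bases:
                alt_bases.append(base)
        if alt_bases:
            snps.append((offset, ref_base, alt_bases))
    return snps


def implied_haplotypes(ref, snps):
    """Enumerate every haplotype reachable by combining the atomized SNP alleles."""
    choices = [[base] for base in ref]
    for offset, ref_base, alt_bases in snps:
        choices[offset] = [ref_base] + alt_bases
    return ["".join(h) for h in product(*choices)]


snps = atomize("TGC", ["TAC", "TGG"])
haps = implied_haplotypes("TGC", snps)
mnp_alleles = {"TGC", "TAC", "TGG"}
extra = set(haps) - mnp_alleles  # haplotypes the MNP record never listed
```

Here the two atomized SNPs imply four haplotypes, one of which (`TAG`) is not among the three alleles of the original MNP record. That extra combination is what the first option has to assess and the second option avoids.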
The main complexity of encoding an MNP is how to handle the case where a read does not extend the entire length of the MNP.
This may require allowing read allele "distributions" in which the allele probabilities at a position sum to > 1.
This is because these probabilities are interpreted as "the probability of drawing this read from a haplotype with these alleles assuming the read can only cover the bases that it does".
For example, consider two SNPs that could be encoded as a single MNP with all possible combinations of SNP alleles.
Currently this means that P(T,A,- | T,A,A) + P(T,A,- | T,A,T) > 1, because a gap (-) is ignored when calculating the probability.
Encoded as an MNP we should get the same result: P(TA- | TAA) + P(TA- | TAT) > 1.
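The arithmetic behind a sum greater than 1 can be sketched as follows. This is a minimal illustration only: the per-base error model (a match scores `1 - error`, a mismatch scores `error / 3`) and the function name `read_likelihood` are assumptions made for the example, not the encoding actually used here. The one property taken from the discussion above is that a gap contributes nothing to the product.

```python
GAP = "-"


def read_likelihood(read, haplotype, error=0.01):
    """Probability of a read given a haplotype, skipping uncovered positions.

    Assumed toy error model: a matching base scores (1 - error) and a
    mismatching base scores error / 3. A gap is ignored entirely.
    """
    p = 1.0
    for read_base, hap_base in zip(read, haplotype):
        if read_base == GAP:
            continue  # the read does not cover this position
        p *= (1.0 - error) if read_base == hap_base else error / 3.0
    return p


read = ("T", "A", "-")  # read ends before the last position
p_taa = read_likelihood(read, "TAA")
p_tat = read_likelihood(read, "TAT")
total = p_taa + p_tat  # exceeds 1 because the final base is never scored
```

Because the read is equally compatible with `TAA` and `TAT`, each haplotype receives the full likelihood of the two covered bases, so the two values do not behave like a distribution over haplotypes.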
However, if not all possible SNP combinations were present (i.e. the MNP is restricted to a subset of the possible haplotypes), then these probabilities would need to be normalized.
In that case, should the probabilities of the partial read be 'normalized' to sum to the total found when all SNP combinations are present?