are duplex foldback chimeras handled by Dorado? #443
Comments
By examining output from First, the following demonstrates that yes, dorado will perform duplex basecalling on split reads. I provide this as proof for those who might land here. code
output
So, split reads show a mix of simplex and duplex. In this small sampling of one run, there are more duplex than simplex, since many chimeric reads are false foldback duplexes. Duplex reads arise from both split and unsplit read pairs, with plenty of both types. The example shows that parent read
The above was the good news. Here's the bad news. My downstream code looks for pairs of alignments from a single read that minimap2 reports as aligning to the same genomic location on opposite strands, which is the expected alignment pattern when a chimeric read comprises the two strands of the same duplex molecule. Read splitting prevents some of those, but not all, as exemplified here: code
output
Thus, read I don't have a number, but such reads are frequent in the data I'm analyzing; it is not hard to find examples. My code finds them after reference alignment, but in my perfect world, Dorado would find the chimeric reads that cannot be split by adapter searching alone and subject them to stereo basecalling. This could be done by looking for self-complementarity within reads. Any chance that Dorado could be made to do that? It could increase the duplex yield. Perhaps it is undesirable because of reduced confidence in the nature of a chimera in the absence of adapter confirmation, or for computational reasons. If so, it would be useful to have it definitively stated that Dorado will never catch them, so people realize that downstream code must.
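To make the self-complementarity idea concrete, here is a minimal Python sketch. It is not how Dorado works and not code from this thread. It assumes the foldback junction sits near the read midpoint and that exact k-mer matches are informative; real reads carry basecalling errors and off-center junctions, so a practical version would need tolerant alignment. The names `reverse_complement`, `kmer_set`, and `foldback_score`, and the default `k=11`, are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (ACGTN, either case)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def kmer_set(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def foldback_score(read, k=11):
    """Fraction of k-mers in the first half of the read that also occur in the
    reverse complement of the second half.

    Assumption: the junction between the two strands is near the midpoint.
    A score near 1.0 suggests the read folds back on itself; near 0.0 it does not.
    """
    read = read.upper()
    half = len(read) // 2
    first_kmers = kmer_set(read[:half], k)
    if not first_kmers:
        return 0.0
    second_rc_kmers = kmer_set(reverse_complement(read[half:]), k)
    return len(first_kmers & second_rc_kmers) / len(first_kmers)
```

A read built as a template followed by its own reverse complement scores 1.0 under this measure, while an ordinary read should score near 0. Any cutoff between the two would have to be tuned on real data.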
dorado duplex identifies when two different reads have sequenced the two complementary strands of the same source DNA molecule.
Simplex read splitting was added to dorado in v0.4.0.
However, I am uncertain whether the reads that are split are then checked for duplex state. Some duplex sequence reads are output as a single chimeric read that falsely appears as a foldback inversion, not as two independent reads. Does dorado find these (after splitting) and apply stereo calling to them in duplex mode?
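For readers who want to screen for these false foldback inversions after alignment, the following Python sketch flags reads that have two alignments to the same reference region on opposite strands. It assumes minimap2 PAF output (columns 1, 5, 6, 8, and 9 are the query name, strand, target name, target start, and target end). It is not the downstream code referred to in this thread; `find_foldback_reads` and the `min_overlap` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def parse_paf_line(line):
    """Extract the fields needed here from one tab-separated PAF record."""
    f = line.rstrip("\n").split("\t")
    return {
        "read": f[0],
        "strand": f[4],
        "target": f[5],
        "tstart": int(f[7]),
        "tend": int(f[8]),
    }


def find_foldback_reads(paf_lines, min_overlap=100):
    """Return the IDs of reads with two alignments that overlap on the same
    target by at least min_overlap bases but lie on opposite strands."""
    by_read = defaultdict(list)
    for line in paf_lines:
        if line.strip():
            aln = parse_paf_line(line)
            by_read[aln["read"]].append(aln)

    hits = set()
    for read, alns in by_read.items():
        for a, b in combinations(alns, 2):
            if a["target"] != b["target"] or a["strand"] == b["strand"]:
                continue
            # Length of the shared reference interval (negative if disjoint).
            overlap = min(a["tend"], b["tend"]) - max(a["tstart"], b["tstart"])
            if overlap >= min_overlap:
                hits.add(read)
                break
    return hits
```

Typical use would be `find_foldback_reads(open("aln.paf"))`, where `aln.paf` is a placeholder file name. Genuine inversions produce the same alignment pattern, so hits are candidates rather than confirmed duplex chimeras.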