Consensus duplex read generation by combining dx:i:-1 and dx:i:1 reads #491
Hi @bruzecruise, if you wanted to do this, my recommendation would be to write a custom tool to do it yourself. You would be able to get most of the way there by ignoring the complement read (the second read in the duplex read name, the one following the semicolon) and just using the template, since the template is normally longer, more accurate, and has the same direction as the duplex read. What you would need to do is align the duplex read and the template to each other and find a point where you can "stitch" the two reads together. We have considered adding this to Dorado, but came to the view that it would have limited utility. We may reconsider this in the future if there is a lot of demand for it.
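The stitching idea described above can be sketched in a few lines of Python. This is only an illustration, not anything Dorado provides: the function names `find_anchor` and `stitch` are hypothetical, and the anchoring uses exact matches of the first and last `k` bases of the duplex read instead of a real local alignment, so it will fail whenever a basecalling difference falls inside either anchor.

```python
def find_anchor(template: str, duplex: str, k: int = 12):
    """Return (start, end) of the template span covered by the duplex read.

    Assumption: the first and last k bases of the duplex read occur exactly
    in the template. A real tool would use a proper local aligner here.
    Returns None when no anchor pair is found.
    """
    if len(duplex) < k:
        return None
    start = template.find(duplex[:k])
    if start < 0:
        return None
    # Search for the tail anchor only at or after the head anchor.
    tail = template.find(duplex[-k:], start)
    if tail < 0:
        return None
    return start, tail + k


def stitch(template: str, duplex: str, k: int = 12) -> str:
    """Replace the template span covered by the duplex read with the duplex bases."""
    span = find_anchor(template, duplex, k)
    if span is None:
        return duplex  # no confident anchor: keep the duplex read unchanged
    start, end = span
    # Simplex flank + duplex core + simplex flank
    return template[:start] + duplex + template[end:]
```

The result keeps the simplex bases only where the duplex read provides no coverage, which is the "consensus" read requested in this issue. Per-base quality strings would need to be stitched with the same coordinates.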
Hi @vellamike, thanks for the suggestion! If I get around to making some sort of script I'll be sure to share it here.
@bruzecruise Thank you for any information you can provide!
I'm still a bit skeptical that combining the duplex read with the non-duplex template component is such a good idea. Are you seeing significant length differences between the simplex template reads and their corresponding duplex reads?
Hi @vellamike, we have only a single sequencing run right now. I will look into the data from the run tomorrow and update. The main concern raised in my group is the case where the template strand is 100kb but the complement read is 1kb, so the resulting duplex read is only 1kb. Since this read is derived from a significantly longer template strand, we would want to use the duplex read plus the remaining 99kb of the template strand during assembly to prevent chimera formation or gaps. From my understanding of the dx tags and the conversation here, simply filtering by tag would either remove data or create redundant data that an assembler would have to resolve, and there is the added concern that duplexing may change the base called at a given position. Thank you,
@minefield47 The example you give (100kb->1kb) is quite bad, but I'm a little skeptical that this is common enough to warrant a separate fix; I suspect it's quite rare. Have you considered writing a small tool to do this step? What you need to do is locally align the duplex read to the template using something like
Hi @minefield47, I have manually looked through 10 or so alignments of duplex reads and their simplex parents, and so far it seems duplex reads are largely the same size as their simplex parents. The worst case I could find was one where the simplex reads could extend an 11,000 bp duplex read by another 400 bp. I'll update you if anything changes.
@vellamike Apologies for the delay, it has been a busy few days. I started doing some simple statistical analysis of the duplexing and found a series of odd cases in which a read is used as the template strand for one duplex read, with a length close to the original read length, while also being the complement for a different duplex read. For instance, this read of 10kb is used as the complement for a template of ~700bp. Is this expected behavior? Thank you!
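For anyone wanting to check their own run for the cases described above, here is a rough sketch. It assumes only what is stated earlier in this thread, namely that a duplex read name has the form `template_id;complement_id`; the function names are hypothetical, and the input is simply a list of duplex read names extracted however you like.

```python
from collections import defaultdict


def roles_by_read(duplex_names):
    """Map each simplex read ID to the set of roles it plays across duplex reads."""
    roles = defaultdict(set)
    for name in duplex_names:
        template_id, sep, complement_id = name.partition(";")
        if not sep:
            continue  # no semicolon: not a duplex-style name, skip it
        roles[template_id].add("template")
        roles[complement_id].add("complement")
    return roles


def dual_role_reads(duplex_names):
    """Return the read IDs used as a template in one duplex read and a complement in another."""
    roles = roles_by_read(duplex_names)
    return sorted(read_id for read_id, seen in roles.items() if len(seen) == 2)
```

For example, `dual_role_reads(["a;b", "b;c", "d;e"])` reports `b`, which is the complement of `a` and the template of `c`.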
@bruzecruise |
For duplex basecalling, as I understand it, the output BAM file contains reads with the following tags:
dx:i:1 for duplex reads.
dx:i:0 for simplex reads which don't have duplex offspring.
dx:i:-1 for simplex reads which have duplex offspring.
Would it be possible to create a fourth option here: a "consensus" read that combines the dx:i:1 and dx:i:-1 reads, essentially extending the duplex reads with the simplex ends of their parents? Currently, we are stuck with either excluding the dx:i:-1 reads, which could reduce the potential read lengths, or including them, which falsely increases our coverage.
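As a point of reference for the exclude-or-include choice above, here is a minimal sketch of separating reads by their dx tag. It works on SAM text lines (for example, the output of `samtools view` on the duplex BAM) so that it needs no third-party library; `dx_tag` and `split_by_dx` are hypothetical helper names, not part of Dorado or samtools.

```python
def dx_tag(sam_line: str):
    """Return the integer dx tag from one SAM record line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if field.startswith("dx:i:"):
            return int(field[len("dx:i:"):])
    return None


def split_by_dx(sam_lines):
    """Group read names by dx value: 1 (duplex), 0 (simplex only), -1 (simplex parent)."""
    groups = {1: [], 0: [], -1: []}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        tag = dx_tag(line)
        if tag in groups:
            read_name = line.split("\t", 1)[0]
            groups[tag].append(read_name)
    return groups
```

Keeping groups `1` and `0` and dropping `-1` gives the non-redundant set, at the cost of losing any simplex bases that extend beyond the duplex read.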
#327
Also any current workarounds would be appreciated.
Cheers,
Dan