Consensus duplex read generation by combining dx:i:-1 and dx:i:1 reads #491

bruzecruise · 2023-11-29T15:00:56Z

As I understand it, the output BAM file from duplex basecalling contains reads with the following tags:

dx:i:1 for duplex reads.
dx:i:0 for simplex reads which don't have duplex offspring.
dx:i:-1 for simplex reads which have duplex offspring.
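
As a rough illustration of the three tag values above, here is a minimal sketch that bins SAM-format records by their dx tag. It assumes the BAM has already been converted to SAM text (for example with `samtools view -h`), and the function names `dx_value` and `bin_by_dx` are made up for this example. Records with no dx tag are binned under `None` rather than guessed at.

```python
from collections import defaultdict

def dx_value(sam_line):
    """Return the integer dx tag of one SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at column 12 and look like TAG:TYPE:VALUE.
    for field in fields[11:]:
        if field.startswith("dx:i:"):
            return int(field[5:])
    return None

def bin_by_dx(sam_lines):
    """Group read names by dx value, skipping header and blank lines."""
    bins = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        name = line.split("\t", 1)[0]
        bins[dx_value(line)].append(name)
    return dict(bins)
```

With this, `bin_by_dx(lines)[-1]` would list the simplex parents that either get dropped (losing length) or kept (inflating coverage), which is the trade-off described below.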

Would it be possible to create a fourth option here: a "consensus" read that combines the dx:i:1 and dx:i:-1 reads, essentially extending the length of duplex reads with their simplex ends? Currently, we are stuck with either excluding the dx:i:-1 reads, which could reduce the potential read lengths, or including them, which falsely inflates our coverage.

#327

Also any current workarounds would be appreciated.

Cheers,
Dan

vellamike · 2023-12-04T17:58:15Z

Hi @bruzecruise ,

If you wanted to do this, my recommendation would be to write a custom tool to do it yourself. You would be able to get most of the way there by ignoring the complement read (the second read in the duplex read, the one following the semicolon) and just using the template, since the template is normally longer, more accurate, and has the same direction as the duplex read. What you would need to do is align the two to each other and find a point where you can "stitch" the two reads together.
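
To make the first step concrete, here is a minimal sketch of looking up the template for each duplex read. It assumes, per the description above, that a duplex read name has the form `template_id;complement_id`. The helper names `split_duplex_name` and `pair_duplex_with_template` are hypothetical, and the inputs are plain dicts of read name to sequence rather than anything read from a BAM.

```python
def split_duplex_name(name):
    """Split 'template_id;complement_id' into its two parts."""
    template_id, sep, complement_id = name.partition(";")
    if not sep or not template_id or not complement_id:
        raise ValueError(f"not a duplex read name: {name!r}")
    return template_id, complement_id

def pair_duplex_with_template(duplex_reads, simplex_reads):
    """Map each duplex name to (duplex_seq, template_seq).

    Both arguments are dicts of read name -> sequence. The complement
    read is deliberately ignored, as suggested above.
    """
    pairs = {}
    for name, duplex_seq in duplex_reads.items():
        template_id, _ = split_duplex_name(name)
        if template_id in simplex_reads:
            pairs[name] = (duplex_seq, simplex_reads[template_id])
    return pairs
```

Each resulting pair is what would then be aligned and stitched.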

We have considered adding this to Dorado, but came to the view that it would have limited utility. We may reconsider this in the future if there is a lot of demand for it.

bruzecruise · 2023-12-06T15:18:41Z

Hi @vellamike, thanks for the suggestion! If I get around to making some sort of script I'll be sure to share it here.

minefield47 · 2024-01-17T23:47:44Z

@bruzecruise
Have you looked into this process any further? I am looking to do a de novo assembly of a non-model organism and would like the ability to concatenate the duplex read with its template read to prevent chimera formation during assembly.

Thank you for any information you can provide!

vellamike · 2024-01-18T00:46:22Z

I'm still a bit skeptical that combining the duplex read with the non-duplex template component is such a good idea. Are you seeing significant length differences between the simplex template and the corresponding duplex reads?

minefield47 · 2024-01-18T02:24:57Z

Hi @vellamike
Could you elaborate why you are skeptical?

We have only a single sequencing run right now. I will look into the data from the run tomorrow and update. The main concern that has been raised in my group is the case in which the template strand is 100kb but the complement read is 1kb, making the generated duplex read 1kb. Since this read is derived from a significantly longer template strand, we would want to use the duplex read plus the remaining 99kb of the template strand during assembly to prevent chimera formation or gaps. From my understanding of the dx tags and the conversation here, simply filtering by tag would either remove data or create redundant data that an assembler would have to decipher, with the added concern that duplexing may change the bases at a given position.

Thank you,

vellamike · 2024-01-18T13:18:45Z

@minefield47 The example you give (100kb->1kb) is quite bad, but I'm a little bit skeptical that this is common enough to warrant a separate fix. I suspect it's quite rare.

Have you considered writing a small tool to do this step? What you need to do is locally align the duplex read to the template using something like edlib and stitch on the missing simplex component.
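
A minimal sketch of that align-and-stitch step, under some loud assumptions: it uses a small pure-Python infix edit-distance alignment as a stand-in for edlib (which would be far faster on real read lengths), it assumes the duplex read and template are already in the same orientation, and the names `infix_align`, `stitch`, and the `max_error_rate` cutoff are invented for illustration.

```python
def infix_align(query, target):
    """Find where query best matches inside target (edit distance).

    Returns (distance, start, end); target[start:end] is the matched span.
    Runs in O(len(query) * len(target)) time, so it is only a toy.
    """
    n = len(target)
    # prev_cost[j]: best cost of aligning query[:i] so it ends at target[:j].
    # prev_start[j]: where that alignment begins in target.
    prev_cost = [0] * (n + 1)           # starting anywhere in target is free
    prev_start = list(range(n + 1))
    for i in range(1, len(query) + 1):
        cost = [i] + [0] * n
        start = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev_cost[j - 1] + (query[i - 1] != target[j - 1])
            skip_query = prev_cost[j] + 1    # query base absent from target
            skip_target = cost[j - 1] + 1    # target base absent from query
            best = min(sub, skip_query, skip_target)
            cost[j] = best
            if best == sub:
                start[j] = prev_start[j - 1]
            elif best == skip_query:
                start[j] = prev_start[j]
            else:
                start[j] = start[j - 1]
        prev_cost, prev_start = cost, start
    end = min(range(n + 1), key=prev_cost.__getitem__)
    return prev_cost[end], prev_start[end], end

def stitch(duplex, template, max_error_rate=0.1):
    """Replace the aligned span of template with the duplex sequence."""
    distance, start, end = infix_align(duplex, template)
    if distance > max_error_rate * len(duplex):
        return None  # alignment too poor to trust; the cutoff is arbitrary
    # Simplex prefix + duplex bases + simplex suffix.
    return template[:start] + duplex + template[end:]
```

The stitched read keeps duplex bases wherever they exist and falls back to simplex bases only for the flanks. Base qualities would need the same splice, which this sketch leaves out.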

bruzecruise · 2024-01-24T19:21:58Z

Hi @minefield47
I sadly haven't had any time to try and write a little consensus script.

BUT I have manually looked through 10 or so alignments of duplex reads and their simplex parents, and so far it seems duplex reads are largely the same size as their simplex parents. The worst case I could find was one where the simplex read could extend an 11,000 bp duplex read by another 400 bp. I'll update you if anything changes.
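
For anyone wanting to run the same check across a whole run rather than by eye, a rough sketch follows. It assumes read lengths have already been extracted into dicts and that duplex names look like `template_id;complement_id`. The function names and the 1000-base threshold are made up for this example, and the raw length difference is only a proxy for how much a stitch would add, since the two reads can also differ by indels.

```python
import statistics

def extension_lengths(duplex_lengths, simplex_lengths):
    """For each duplex read, how many bases longer is its template?

    duplex_lengths: dict of 'template_id;complement_id' -> length
    simplex_lengths: dict of simplex read id -> length
    """
    gains = {}
    for name, duplex_len in duplex_lengths.items():
        template_id = name.split(";", 1)[0]
        if template_id in simplex_lengths:
            gains[name] = simplex_lengths[template_id] - duplex_len
    return gains

def summarize(gains, threshold=1000):
    """Summarize the gains; threshold is an arbitrary 'worth stitching' cutoff."""
    values = list(gains.values())
    if not values:
        return None
    return {
        "pairs": len(values),
        "median_gain": statistics.median(values),
        "max_gain": max(values),
        "over_threshold": sum(v > threshold for v in values),
    }
```

If `over_threshold` is a tiny fraction of `pairs`, that would support the view that the extreme cases are rare.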

minefield47 · 2024-01-26T21:12:43Z

@vellamike Apologies for the delay, it has been a busy few days. I started doing some simple statistical analysis around the duplexing and I found a series of weird cases in which a read is used as a template strand for one duplex read, having a length close to the original read length, while also being a complement for a different duplex read. For instance, this read of 10kb is used as the complement for a template of ~700bp. Is this expected behavior?
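
In case it helps anyone reproduce this, a minimal sketch of how such reads could be flagged from the duplex read names alone. It assumes names of the form `template_id;complement_id`, and `reads_in_both_roles` is a hypothetical helper, not anything from Dorado.

```python
def reads_in_both_roles(duplex_names):
    """Find simplex reads used as template in one duplex read and as
    complement in another.

    Returns a dict: read id -> (duplex names where it is the template,
                                duplex names where it is the complement).
    """
    as_template, as_complement = {}, {}
    for name in duplex_names:
        template_id, sep, complement_id = name.partition(";")
        if not sep:
            continue  # not a duplex-style name
        as_template.setdefault(template_id, []).append(name)
        as_complement.setdefault(complement_id, []).append(name)
    shared = as_template.keys() & as_complement.keys()
    return {rid: (as_template[rid], as_complement[rid]) for rid in shared}
```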

Thank you!

minefield47 · 2024-01-27T00:22:40Z

@bruzecruise
No worries at all, thank you for sharing. Another developer shared the code used to determine pairs (https://github.com/nanoporetech/dorado/blob/master/dorado/read_pipeline/PairingNode.cpp#L67), and based on a cursory look through my first library (we just finished our second and plan to continue for the next couple of months until we can generate a de novo genome), I get results pretty similar to what you described when comparing templates to the simplexes. The worst I got by far is the example I showed above, in which a 10kb read was used as the complement for a read that was only ~700bp.

vellamike self-assigned this Dec 4, 2023

vellamike closed this as not planned Dec 6, 2023

vellamike reopened this Jan 18, 2024
