Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very few duplex reads with short amplicons #268

Open
HenkvdMeulen opened this issue Jun 28, 2023 · 16 comments
Open

Very few duplex reads with short amplicons #268

HenkvdMeulen opened this issue Jun 28, 2023 · 16 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request

Comments

@HenkvdMeulen
Copy link

Somehow I'm ending up with a very low number of duplex reads after using Dorado duplex basecalling with the following command (used the ligation sequencing kit V14 SQK-LSK114):

dorado duplex [email protected] pod5/directory/ > duplex.bam

I extracted the duplex reads from this file using
samtools view duplexBarcoded/barcode01.bam | grep "dx:i:1"| samtools view -O BAM -o duplexOnly.bam

However, only 615 out of nearly 2M reads turn out to be duplex reads. What could be the issue here? Does anyone have suggestions as to what might be going wrong and how I could solve it? Thanks in advance!

@vellamike
Copy link
Collaborator

Hi @HenkvdMeulen

A couple of questions:

  1. What flowcells are you using? Are these high duplex flowcells?
  2. How long are your reads? What is their N50?

@HenkvdMeulen
Copy link
Author

HenkvdMeulen commented Jun 28, 2023

Hi @vellamike

  1. I'm using the R10.4.1 flow cell. is there any difference among those, I would assume not?
  2. The amplicon length is expected to be around 450 bp with primers, tails and barcodes. For some reason the MinION did not produce a run report that I can check now, However, there was a very tight peak in the read length around the expected amplicon size. hope that is sufficiently helpful, sorry! I tried to retrieve the report from MinKnow but it doesn't show the experiment anymore, even though I am logged in and changed the number of past days to the right amount.

@vellamike vellamike self-assigned this Jun 28, 2023
@vellamike vellamike added the enhancement New feature or request label Jun 28, 2023
@vellamike vellamike changed the title very few duplex reads Very few duplex reads with short amplicons Jun 28, 2023
@vellamike
Copy link
Collaborator

Hi @HenkvdMeulen,

Dorado has a known issue whereby duplex basecalling of short reads (<1kb) results in low yields. This is an result of the duplex algorithm which has been optimised for longer reads. We are working on a fix and will update this issue when we release it, but at present what you are trying to do will unfortunately not work very well.

Mike

@vellamike vellamike added the bug Something isn't working label Jun 28, 2023
@HenkvdMeulen
Copy link
Author

Hi @vellamike,
That's good to know, thanks! I will patiently wait for the update to arrive. Do you have any estimation on the time until we could expect this fix?

Best regards,
Henk

@HenkvdMeulen
Copy link
Author

Hi @vellamike,
is there any progress yet in regard to this issue? Do you know when this fix will be available in an update?
Thanks!

Best regards,
Henk

@quintonwessells
Copy link

Hi @vellamike

I'm also trying to perform duplex basecalling on reads < 1kb. It is tagging most reads with dx:i:0 (simplex without duplex offpring) and some reads with dx:i:-1 (simplex reads which have duplex offsprings), but no reads actually get tagged with dx:i:1 (duplex). Shouldn't it produce some duplex reads tagged with dx:i:1 if there are dx:i:-1 simplex reads present?

Thanks

@tijyojwad
Copy link
Collaborator

Hi @quintonwessells which version of dorado are you using? your issue sounds like a bug we fixed in 0.3.4

we haven't made any explicit improvements for short read duplex yet. we made some updates to long read duplex yield in the most recent release, and we are in the process of discussing short read related changes.

@quintonwessells
Copy link

Hi @tijyojwad, I was using 0.3.3. Just tried it with 0.3.4 and now I'm not getting any dx:i:-1 or dx:i:1 reads, so seems that is sort of fixed even though I can tell some of the reads marked as simplex are actually duplex. Thanks for keeping us posted on status of short read related changes!

@tijyojwad
Copy link
Collaborator

even though I can tell some of the reads marked as simplex are actually duplex

what do you mean by this?

@alexanderwg-ornl
Copy link

alexanderwg-ornl commented Sep 7, 2023

I think I might be running into the same length limitation. I was playing around with forcing dorado to duplex call reads I knew were complementary to each other by passing the --pairs flag, but it appears that the reads are filtered. Below is a partial output of my experimental run. Is dorado filtering by length or is there another criteria I'm missing that it's using to reject this pair?

[2023-09-07 11:26:51.844] [info] > Starting Stereo Duplex pipeline
[2023-09-07 11:26:51.853] [info] > Reading read channel info
[2023-09-07 11:26:51.854] [info] > Processed read channel info
[2023-09-07 11:26:54.204] [debug] hits 1, mapq 60, overlap length 316, overlap frac 0.8885542, delta -1103, read 1 376, read 2 332, strand -, pass true, accepted true, temp start 27 temp end 343, comp start 18 comp end 313, 7239544e-c176-4d8b-93d4-8cc9da1bac8f and f2f76b5d-0c16-453b-9720-2efd1ca01273
[2023-09-07 11:26:54.964] [info] > Duplex reads basecalled: 0
[2023-09-07 11:26:54.964] [info] > Duplex reads filtered: 1
[2023-09-07 11:26:54.964] [info] > Duplex rate: 0.56497175%
[2023-09-07 11:26:54.964] [info] > Basecalled @ Bases/s: 3.552131e+01

@tijyojwad
Copy link
Collaborator

Unless the min-qscore option was specified in the cmdline, the only other filter enabled by default is the min sequence length, which is 5. so even though the pairing is accepted and a duplex read is called, the length could be super small. probably because the model hasn't really seen data of that size so it doesn't know what to do with it

@alexanderwg-ornl
Copy link

Unless the min-qscore option was specified in the cmdline, the only other filter enabled by default is the min sequence length, which is 5. so even though the pairing is accepted and a duplex read is called, the length could be super small. probably because the model hasn't really seen data of that size so it doesn't know what to do with it

Thanks for the reply! According to dorado's help file for duplex, the min-qscore default is 0 so I never altered it; does that mean the filter is off or is it behaving in some other way? I also don't see a minimum sequence length flag in the help documentation, is this an actual option I can set?

@tijyojwad
Copy link
Collaborator

does that mean the filter is off or is it behaving in some other way?

correct, min qscore 0 means no filtering is happened based on the read's mean qscore

I also don't see a minimum sequence length flag in the help documentation, is this an actual option I can set?

unfortunately not. but that minimum length is 5, which we thought was a good threshold and anything smaller than that is most likely garbage anyway. from your log I'd expect the duplex read to be ~200 bp or longer...

@alexanderwg-ornl
Copy link

I see, so it sounds like it is the same problem as the OP: the duplex basecaller is rejecting these shorter alignments. Thanks for your insight!

@Delayed-Gitification
Copy link

@vellamike Do you have any info on when duplex pairing for short amplicons might be ready? Thanks!

@vellamike
Copy link
Collaborator

Apologies for the delay in replying. Duplex pairing for short amplicons is not something which we will deliver soon. It is currently considered a research problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants