dorado demux outputs a great portion of unclassified reads #1185

Open
Zhenlisme opened this issue Dec 18, 2024 · 8 comments
Labels
barcode Issues related to barcoding

Comments

@Zhenlisme

Hello,

I used dorado demux to demultiplex the reads and trim the barcodes, but more than 95% of the reads remained unclassified, and the barcode sequences that should have been trimmed were still present. Below is an example read from the unclassified output, with the barcode flanking sequences highlighted in black.
image

I believe the kit we used is SQK-RBK114-24, because the barcode flanking sequence is: 5' - ATCGCCTACCGTGAC - barcode - CGTTTTTCGTGCGCCGCTTC - 3'.

  • Dorado version: 0.8.3+98456f7
  • Dorado command: dorado demux reads.fastq.gz -o dorado_demux --kit-name SQK-RBK114-24 -t 20 --emit-fastq --emit-summary
@malton-ont
Collaborator

Hi @Zhenlisme,

Those flanking regions are not correct for SQK-RBK114-24 - see here. From those sequences I would expect SQK-RPB114-24, SQK-RPB004 or SQK-RLB001. Are you certain that SQK-RBK114-24 is the barcoding kit you used to prep the data?
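As a rough way to see where the flanks quoted in this thread occur in a read, here is a minimal Python sketch. It uses exact string matching on both strands, which is only an illustration and not how dorado scores barcodes (dorado tolerates mismatches). The function names are hypothetical, and the only sequences used are the two flanks quoted above.

```python
# Illustrative only: exact-match search for the two flanks quoted in this thread.
FRONT_FLANK = "ATCGCCTACCGTGAC"
REAR_FLANK = "CGTTTTTCGTGCGCCGCTTC"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(seq: str, pattern: str) -> list[int]:
    """Return the 0-based start of every exact occurrence of pattern in seq."""
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        start = seq.find(pattern, start + 1)
    return positions


def flank_hits(read: str) -> dict[str, list[int]]:
    """Report where each flank (forward and reverse complement) occurs in a read."""
    read = read.upper()
    return {
        "front_fwd": find_all(read, FRONT_FLANK),
        "rear_fwd": find_all(read, REAR_FLANK),
        "front_rc": find_all(read, reverse_complement(FRONT_FLANK)),
        "rear_rc": find_all(read, reverse_complement(REAR_FLANK)),
    }
```

Calling `flank_hits(sequence)` on a handful of unclassified reads gives a quick picture of whether these flanks sit near the read ends or elsewhere.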

@malton-ont malton-ont added the barcode Issues related to barcoding label Dec 18, 2024
@Zhenlisme
Author

Hi,
Thank you for your reply. I did not do the sequencing experiment myself; I deduced the kit from this document, where I found that the flanking sequence matched the kit "Rapid PCR Barcoding Kit 24 V14". I am sorry that I did not use the correct kit name before (they look so similar...).

Yet, I still have some questions:

  1. I agree that the correct kit should be either SQK-RPB114-24 or SQK-RPB004, but I don't currently know which one was actually used for sequencing. Is there any way to deduce this from the reads, or should I try both?
  2. I reran demux with each of the two kits and still got many unclassified reads with both. I also found that many of these reads contain internal barcodes. Do you have any suggestions?

Thank you again for your time.
Zhen

@malton-ont
Collaborator

@Zhenlisme,

From the document you linked to you can see that Rapid PCR Barcoding Kit 24 V14 is SQK-RPB114-24.

If you have the .pod5 files, you can run:

pod5 inspect debug <file.pod5> | grep sequencing_kit

and this should output the kit used.

Regarding your remaining unclassified reads, it's difficult to say without more information - how many is "many"? What proportion of your reads remain unclassified?
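To put a number on "many", one option is to count the records in each FASTQ file of the demux output directory. The sketch below is a minimal example under two assumptions that are not confirmed in this thread: every FASTQ record spans exactly four lines, and the unclassified reads are written to a file whose name contains `unclassified`. The function names are hypothetical.

```python
import gzip
from pathlib import Path


def count_fastq_records(path: Path) -> int:
    """Count records in a plain or gzipped FASTQ file (assumes 4 lines per record)."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def unclassified_fraction(demux_dir: str) -> float:
    """Fraction of reads found in files whose name contains 'unclassified'."""
    counts = {
        p.name: count_fastq_records(p)
        for p in Path(demux_dir).iterdir()
        if p.name.endswith((".fastq", ".fastq.gz"))
    }
    total = sum(counts.values())
    unclassified = sum(n for name, n in counts.items() if "unclassified" in name)
    return unclassified / total if total else 0.0
```

For the command in the original post, `unclassified_fraction("dorado_demux")` would return the proportion as a value between 0 and 1, provided the naming assumption holds.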

My first suggestion would be to run the original basecall command with --no-trim to prevent adapter trimming from adversely affecting the flank sequences. I would then suggest demuxing a small sample of the unclassified reads with the -vv flag and looking for lines in the output log like "Found midstrand barcode flanks" - reads that contain barcode flanks beyond the expected barcode window (~175 bases from either end) are marked as unclassified as these are typically concatamers.
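The mid-strand idea can be illustrated with a short Python sketch. It flags exact occurrences of the flanks quoted earlier in this thread that lie entirely outside a window at each end of the read. This is only an approximation for intuition: the 175-base window is the rough figure mentioned above, dorado's real detection tolerates mismatches, and the function names are hypothetical.

```python
# Flanks quoted earlier in this thread; exact matching is a simplification.
FLANKS = ("ATCGCCTACCGTGAC", "CGTTTTTCGTGCGCCGCTTC")
END_WINDOW = 175  # approximate figure from the thread, not dorado's exact parameter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def midstrand_flank_positions(read: str, flanks=FLANKS, window: int = END_WINDOW) -> list[int]:
    """Return start positions of flank hits lying outside both end windows."""
    read = read.upper()
    hits = []
    for flank in flanks:
        for pattern in (flank, reverse_complement(flank)):
            start = read.find(pattern)
            while start != -1:
                end = start + len(pattern)
                # Mid-strand: the whole hit is at least `window` bases from each end.
                if start >= window and end <= len(read) - window:
                    hits.append(start)
                start = read.find(pattern, start + 1)
    return sorted(hits)
```

A read for which this returns a non-empty list is a candidate concatemer in the sense described above; an empty list does not rule one out, since inexact flank copies are missed.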

@Zhenlisme
Author

Hi,
Thanks for the quick reply.
I tried the -vv flag. Indeed, I found "Found midstrand barcode flanks" lines in the output log, meaning there are concatamers among the unclassified reads. Should I just drop the concatamers, or is there another way to deal with them?

@malton-ont
Collaborator

dorado doesn't provide any other mechanism to deal with these reads beyond marking them as unclassified. Any further analysis of them is left to the user.

@Zhenlisme
Author

Thanks a lot

@Zhenlisme
Author

Hello again,

I found adaptors even in the classified reads, meaning that dorado demux did not trim all reads as expected. In addition, some of the classified reads are still concatemers. Is this normal? I am worried that such reads would affect the quality of the genome assembly if concatemers are prevalent in my data.

Thank you for your time.
Zhen

@Zhenlisme Zhenlisme reopened this Dec 18, 2024
@malton-ont
Collaborator

dorado demux does not trim adapters directly, only barcodes - since adapters are outboard of barcodes, classified reads should also have the adapter removed. dorado attempts to identify mid-strand barcodes (and mark the read unclassified), but it does not search for mid-strand adapters.
