Demultiplexing Issues with Dorado v0.7.3 vs MinKNOW #1179

prekijpatel · 2024-12-17T10:52:45Z

Issue Report

Please describe the issue:

I recently encountered significant differences in basecalling and demultiplexing results between Dorado v0.7.3 and MinKNOW.

When using Dorado for basecalling, most reads were categorized as "unclassified." For example, the size of the FASTQ for barcode 16 was 125 MB.

However, when I basecalled the same dataset using MinKNOW 24.11, the size of the FASTQ for barcode 16 was much larger (959 MB). After downstream assembly, I found that the MinKNOW output for this barcode was heavily contaminated with reads that appeared to belong to other barcodes.

I am trying to determine whether:

- Dorado is overly strict during demultiplexing, or
- MinKNOW is too lenient, leading to contamination and misclassified reads.

Additionally, I am unsure whether the contamination originates from mis-demultiplexing or inherent issues with the sample.

Steps to reproduce the issue:

Basecall using Dorado v0.7.3 with the following command:

dorado basecaller [email protected] ../../combined_pod5/ -v -x cuda:all -b 0 -c 33000 > basecalled_13092024.bam

Basecall the same dataset using the MinKNOW standard settings for SQK-RBK114-24 with the sup model.
Demultiplex the reads from the Dorado BAM file:

dorado demux -o ./ --kit-name SQK-RBK114-24 -t 16 --emit-fastq ../basecalled_13092024.bam

Compare the size of the FASTQ files with those generated by MinKNOW for the same dataset.
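File size is only a rough proxy for how many reads landed in a barcode, so counting reads and bases directly may make the comparison easier to interpret. The following is a minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record; the function name `fastq_stats` and the paths in the usage note are hypothetical and not part of Dorado or MinKNOW.

```shell
# Print "<reads><TAB><bases>" for one FASTQ file.
# Assumption: uncompressed FASTQ with four lines per record.
fastq_stats() {
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         END { printf "%d\t%d\n", reads, bases }' "$1"
}
```

Running `fastq_stats path/to/dorado_barcode16.fastq` and `fastq_stats path/to/minknow_barcode16.fastq` (placeholder paths) would then give read and base counts for the same barcode from both tools.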

Run environment:

Dorado version: v0.7.3

Dorado command:

dorado basecaller [email protected] ../../combined_pod5/ -v -x cuda:all -b 0 -c 33000 > basecalled_13092024.bam

Operating system: Ubuntu 24
Hardware (CPUs, Memory, GPUs): Intel Core i9 (12th gen), Nvidia RTX 3060, 64 GB RAM
Source data type: pod5
Source data location: On device drive
Details about data:
- Flow cell: R10.4.1
- Kit: SQK-RBK114-24

Thank you for your help and insights!

malton-ont · 2024-12-17T13:41:37Z

Hi @prekijpatel,

I would recommend running the dorado basecaller command with --no-trim and then allowing dorado demux to perform barcode trimming to clear up the barcodes/adapters/primers. It is possible that adapter trimming from the basecaller command is interfering with the barcode detection in the demux stage.
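The suggestion above could be sketched roughly as follows. This is not an official recipe: it only combines `--no-trim` with the flags already used in the original commands, and the function name, the `DRY_RUN` switch, and every argument value are placeholders. The model name is passed in as an argument and should be whatever was used in the original basecalling command.

```shell
# Sketch: basecall without trimming, then let demux classify and trim.
# Arguments: <model> <pod5_dir> <output_bam> <demux_output_dir>
notrim_workflow() {
    local model="$1" pod5_dir="$2" bam="$3" out_dir="$4"
    local kit="SQK-RBK114-24"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the commands instead of running them.
        echo "dorado basecaller $model $pod5_dir --no-trim > $bam"
        echo "dorado demux -o $out_dir --kit-name $kit --emit-fastq $bam"
        return 0
    fi

    # Step 1: basecall, leaving adapters and barcodes on the reads.
    dorado basecaller "$model" "$pod5_dir" --no-trim > "$bam" || return 1
    # Step 2: demultiplex the untrimmed reads by barcode.
    dorado demux -o "$out_dir" --kit-name "$kit" --emit-fastq "$bam"
}
```

Calling it with `DRY_RUN=1` first prints the two commands so they can be checked before a long basecalling run is started.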

prekijpatel · 2024-12-17T14:10:51Z

Oh, alright! I shall try that.

Also, does this mean that my MinKNOW data is correctly demultiplexed, and that the mixture of samples I see is contamination inherent to the sequenced samples?

malton-ont · 2024-12-17T14:20:55Z

Without more information I'm not sure it's possible to tell, but if dorado gives similar results after that change then I'd say it points in that direction.

prekijpatel · 2024-12-18T04:29:01Z

Sure, I will try the Dorado basecaller with --no-trim and will keep you posted.
Thanks a lot for the help!
