[question] duplex and barcode demultiplexing? #522
Comments
Hi @JWDebler, duplex barcode demux is currently not implemented in Dorado. For now, the best approach is to demux the simplex reads and then use the duplex read IDs (which are a concatenation of the simplex read IDs) to assign the duplex reads to barcodes.
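The read-ID approach described above can be sketched in Python. This is only an illustration under stated assumptions, not Dorado functionality: it assumes a duplex read ID is the two simplex read IDs joined by `;` (check this against your own BAM), and `simplex_barcodes` is a hypothetical dict mapping simplex read IDs to the barcode they received during simplex demux.

```python
from typing import Dict, Tuple

UNCLASSIFIED = "unclassified"


def parent_barcodes(
    read_id: str, simplex_barcodes: Dict[str, str], sep: str = ";"
) -> Tuple[str, ...]:
    """Return the simplex barcode of each parent read of `read_id`.

    Assumption: a duplex read ID looks like "<template_id><sep><complement_id>".
    A plain simplex ID (no separator) yields a one-element tuple.
    Parents missing from the mapping are reported as unclassified.
    """
    parents = read_id.split(sep)
    return tuple(simplex_barcodes.get(parent, UNCLASSIFIED) for parent in parents)


# Hypothetical example data: three simplex reads and their barcodes.
example = {"a": "barcode01", "b": "barcode01", "c": "barcode02"}
print(parent_barcodes("a;b", example))
```

When both parents report the same barcode, the duplex read can be assigned to it directly; disagreements need a policy of your choosing.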
We have this on our roadmap and plan to implement it in a future release.
Are there any updates on the timeline for releasing barcode demultiplexing for duplex? It seems to have been a heavily requested feature for a while now...
You can extract them, it's just a bit annoying as it basically means basecalling twice. At least that's what we are currently doing: basecall in simplex mode, classify into barcodes, extract the reads for each barcode into their own pod5 files, and then duplex call those individually.
I see, thanks a lot for the details of your implementation!
I created this script to demux a duplex basecalled BAM. It takes a summary file produced after demultiplexing the simplex reads, plus your duplex called BAM. There are some options for dealing with situations where the two reads that make up a duplex read don't have the same barcode in simplex mode, or where one is unclassified. It also supports fastq output.
Thanks for your answer and the link to your code! Great work on the manuscript as well, it's super nice to see the variant calling metrics there (also shows that duplex is maybe not essential ^^).
Interesting, @mbhall88. I'm a bit confused. How did you get barcodes in the duplex BAM file? I thought dorado couldn't do duplex calling and barcode assignment at the same time yet, which is why we're doing it the long way around.
There are no barcodes in the duplex BAM. But that script uses the assignment from the sequencing summary file from the simplex basecalling to assign barcodes to the duplex reads. When testing, I also found cases where the simplex reads that make up a duplex read had different barcode assignments (normally it was just that one was unclassified and the other was classified), so I added options for how you want to deal with this.
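To make the mismatch handling concrete, here is a minimal sketch of one possible policy. It is not the script's actual logic: the function name and the `rescue_unclassified` and `on_conflict` parameters are hypothetical, chosen only to illustrate the two cases mentioned (one parent unclassified, or two parents with different barcodes).

```python
UNCLASSIFIED = "unclassified"


def resolve_barcode(
    template_bc: str,
    complement_bc: str,
    rescue_unclassified: bool = True,
    on_conflict: str = "unclassified",
) -> str:
    """Pick a barcode for a duplex read from its two simplex parents.

    rescue_unclassified: if exactly one parent is unclassified, use the
        other parent's barcode instead of discarding the assignment.
    on_conflict: what to do when both parents are classified but differ;
        "unclassified" drops the assignment, "template" trusts the template.
    """
    if template_bc == complement_bc:
        return template_bc
    if UNCLASSIFIED in (template_bc, complement_bc):
        if not rescue_unclassified:
            return UNCLASSIFIED
        return complement_bc if template_bc == UNCLASSIFIED else template_bc
    if on_conflict == "template":
        return template_bc
    if on_conflict == "unclassified":
        return UNCLASSIFIED
    raise ValueError(f"unknown on_conflict policy: {on_conflict!r}")
```

A strict run would use `rescue_unclassified=False`, trading some yield for fewer questionable assignments.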
Ahhh, ok, that makes sense. Do you do simplex live calling during the run then? We always use post run basecalling, and I don't think that produces the summary file.
No, I generally basecall after the fact as well. But you can create a summary file as described at https://github.com/nanoporetech/dorado?tab=readme-ov-file#sequencing-summary. The demux command also has an --emit-summary option, which will place a summary file in the output directory (see https://github.com/nanoporetech/dorado?tab=readme-ov-file#classifying-existing-datasets).
Thanks for the details, I'll give it a try! Looking at your script, it does not seem super complicated to demultiplex duplex reads. I'm a bit puzzled why there is no native way of doing it with Dorado...
This is my current implementation, maybe useful for somebody... It uses the bacterial research model for the basecalling; just change this to "sup" or "hac" in line 53 if you want to use the standard models.
Just to clarify, when running dorado basecaller, would --no-trim need to be added for the subsequent demultiplex step to work?
Hi,
I am wondering if there is currently a way to do barcode demultiplexing and duplex calling using only dorado, and if so, what would be the best way to do it?