Dorado duplex having different pairing rates on the same dataset #372
Comments
Thanks for reporting this @hasindu2008 - we are looking into it. |
Hi @hasindu2008, I'd like some clarification - in the first section where you run |
Yes, that is correct. |
Hi @hasindu2008, could you run both of these a few times and report the yields? Duplex is not fully deterministic because we make a determinism-performance tradeoff. I'd like to understand whether the higher yield you see with one pod5 is consistent or within the noise. Also, could you give me some information about what kind of data this is (read lengths, amplicons, etc.)? |
Seeing this now makes me wonder if grouping by channel will have a different effect. I'd hope it would mirror a single pod5 file. I'll try to post if we see any variability. |
Yeah, different rates at different times:
|
Hi @hasindu2008, we are working on making duplex basecalling fully deterministic, but I'm still surprised to see a 3% difference between the single and multiple POD5s. Do you find this difference to be systematic? |
I have not found the time to do a thorough investigation of whether it is systematic. From the limited tests, it feels as if things are a bit stochastic.
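One way to judge whether the gap is systematic or just noise, as asked above, is to repeat each configuration a few times and compare the difference in mean pairing rate against the run-to-run spread. The following is only a minimal sketch of that check: the function names, the `factor=2.0` threshold, and the example numbers are all assumptions made for illustration, not part of Dorado and not measurements from this issue.

```python
from statistics import mean, stdev


def compare_runs(single_rates, multi_rates):
    """Compare repeated-run pairing rates (in percent) for two input layouts.

    Returns (difference of means, larger of the two run-to-run standard
    deviations) so the gap can be judged against the noise.
    Each list needs at least two runs for stdev to be defined.
    """
    diff = mean(multi_rates) - mean(single_rates)
    noise = max(stdev(single_rates), stdev(multi_rates))
    return diff, noise


def looks_systematic(single_rates, multi_rates, factor=2.0):
    """Heuristic: call the gap systematic if it exceeds `factor` times the noise.

    The factor of 2 is an arbitrary assumption, not a formal statistical test.
    """
    diff, noise = compare_runs(single_rates, multi_rates)
    return abs(diff) > factor * noise


if __name__ == "__main__":
    # Made-up placeholder values, NOT results from this dataset.
    single = [10.0, 10.5, 9.5]
    multi = [13.0, 13.5, 12.5]
    diff, noise = compare_runs(single, multi)
    print(f"mean difference: {diff:.2f} points, run-to-run noise: {noise:.2f}")
    print("systematic?", looks_systematic(single, multi))
```

With only a handful of runs per configuration this is a rough indication at best; more repeats would be needed before drawing a firm conclusion.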
For the same dataset, I seem to get slightly different output from Dorado duplex depending on whether the input is multiple POD5 files or a single merged POD5 file. I would expect the output to be deterministic irrespective of the number of files for the same dataset.
Merging using:
Single POD5:
Merged POD5: