
Dorado basecalling: handling fail, pass, and skip reads #1241

Open
Patricie34 opened this issue Feb 4, 2025 · 4 comments

Comments

@Patricie34

Hi everyone,

I have a question related to the Dorado basecalling algorithm.
When performing simplex basecalling, I first convert FAST5 files to POD5 format.
Then, I use all POD5 files—fail, pass, and skip—for the downstream Dorado command.
Is it okay to do it this way, or should I use only the "pass" reads? Or does Dorado recognize this automatically?

Thank you.

Best regards,

Patricie

@malton-ont
Collaborator

Hi @Patricie34,

Dorado makes no distinction regarding the source of the POD5 files: if you pass them in, they'll get basecalled just the same. Whether this is what you want to do is largely up to you.
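Because Dorado does not distinguish pass, fail, and skip inputs, any selection has to happen before basecalling. Below is a minimal sketch of that selection step. It assumes a hypothetical layout where the converted files sit in `pod5_pass`, `pod5_fail`, and `pod5_skip` subdirectories of one run directory. The helper names `select_pod5_inputs` and `stage_inputs` are illustrative and are not part of Dorado. The chosen files are symlinked into a single staging directory, which can then be given to Dorado as its input path.

```python
from pathlib import Path


def select_pod5_inputs(run_dir, include=("pass",), prefix="pod5_"):
    """Return sorted POD5 paths from the chosen read categories.

    Assumes a hypothetical layout such as run_dir/pod5_pass, run_dir/pod5_fail,
    run_dir/pod5_skip; adjust `prefix` to match your own directory names.
    """
    run_dir = Path(run_dir)
    selected = []
    for category in include:
        subdir = run_dir / f"{prefix}{category}"
        if subdir.is_dir():
            # rglob also picks up files in nested subfolders
            selected.extend(sorted(subdir.rglob("*.pod5")))
    return selected


def stage_inputs(paths, staging_dir):
    """Symlink the selected files into one directory usable as a single input path."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    for i, src in enumerate(paths):
        # Index prefix avoids name clashes between categories
        link = staging / f"{i:05d}_{src.name}"
        if not link.exists():
            link.symlink_to(src.resolve())
    return staging
```

With `include=("pass",)` only the pass files are staged, and `include=("pass", "fail", "skip")` reproduces the "use everything" approach described in the question. The staging directory would then be passed as the data argument, e.g. `dorado basecaller <model> <staging_dir>`.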

@Patricie34
Author

Hi @malton-ont ,

thank you for the explanation.

And what is the best practice? Should I use only the "pass" reads, or include all of them to avoid losing important data and filter out low-quality ones after basecalling?

I am using these basecalling models: [email protected] and for some samples also [email protected]. The basecalled data are then used for variant calling.

Best regards,

Patricie

@malton-ont
Collaborator

Once again, that decision is up to you and your protocol. If you would like a discussion with other bioinformaticians regarding best practices, I would suggest posting on the Nanopore community forums; this issue tracker is really for technical problems with the dorado software.

@Kirk3gaard

Hi Patricie

Just re-basecall only your "pass" reads. The current "Q20" chemistry should return most of your data well above the Q10 cutoff used for super accuracy (SUP) basecalling, so the small amount of data already flagged as poor quality is better left out. Alternatively, you can always filter the data afterwards using e.g. https://github.com/rrwick/Filtlong

Best regards
Rasmus
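The "filter afterwards" option can be sketched as a mean-quality cutoff on the basecalled reads. On the Phred scale, Q = -10·log10(p), so Q10 corresponds to a 10% per-base error rate. This sketch makes some assumptions. The reads are taken to be in 4-line FASTQ with Phred+33 qualities, for example written by Dorado with `--emit-fastq` or converted from BAM. The read's mean quality is computed by averaging per-base error probabilities and converting back to Phred, a common convention that may not match exactly how Dorado or Filtlong score reads. The function names are hypothetical. Filtlong itself offers more options than this single cutoff.

```python
import math
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_qscore(qual: str, offset: int = 33) -> float:
    """Average per-base error probabilities, then convert back to Phred scale."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    mean_err = sum(probs) / len(probs)
    return -10 * math.log10(mean_err)


def filter_fastq(in_handle: TextIO, out_handle: TextIO, min_q: float = 10.0) -> Tuple[int, int]:
    """Write reads with mean Q >= min_q; return (kept, total)."""
    kept = total = 0
    for header, seq, qual in read_fastq(in_handle):
        total += 1
        if mean_qscore(qual) >= min_q:
            kept += 1
            out_handle.write(f"{header}\n{seq}\n+\n{qual}\n")
    return kept, total
```

Averaging error probabilities rather than raw Q values is a deliberate choice: a few very poor bases pull the read's score down much more than simple Q averaging would, which is the more conservative behaviour for downstream variant calling.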
