optimum parameters for ONT reads #68

Open
asylvz opened this issue Apr 30, 2024 · 6 comments

Comments

@asylvz

asylvz commented Apr 30, 2024

Hi,

I'm trying to use the library to generate consensus sequences from ONT reads for multiple clusters of reads. Each cluster has around 10-30 reads. However, I'm not sure which parameters to use for minimizer-based seeding and partitioning to balance accuracy and speed.

I'd be grateful if you could suggest a set of parameters that balances speed, memory, and accuracy.

Thank you,
Arda

@yangao07
Owner

Hi, if you can share a few example input datasets, I think I may be able to give you some suggestions in terms of parameters.

@asylvz
Author

asylvz commented May 1, 2024

Actually, this is not for a specific scenario; I'll use it in my algorithm and am currently testing it with ONT data from some samples (reads can be retrieved from the CRAMs here: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/hg38/).

Basically it should be fast enough for 20-30K long ONT reads. I'm currently using wtdbg2 for this.

@asylvz
Author

asylvz commented May 1, 2024

I'm also sending a sample cluster of reads. This is one of the large clusters (25 reads), so not all of them are that large.
H2-s218243_1350.fasta.zip

@yangao07
Owner

yangao07 commented May 1, 2024

I am not sure which scenario you are specifically referring to.
Since you mentioned wtdbg2, if you need a consensus sequence after the assembly step, I think wtdbg2 has its own POA consensus-calling module.
abPOA generally takes reads with unified boundaries, performs end-to-end global alignment, and then generates a consensus sequence based on the alignment result.
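
To illustrate the last step only (deriving a consensus once reads are globally aligned end to end), here is a minimal sketch. It is not abPOA's consensus algorithm: it assumes the reads have already been aligned into equal-length rows with `-` for gaps, and it uses a simple per-column majority vote. The function name `column_consensus` is made up for this example.

```python
from collections import Counter


def column_consensus(aligned_rows, gap="-"):
    """Majority-vote consensus over pre-aligned, equal-length rows.

    Assumption: rows come from some global multiple alignment.
    Columns where the gap character wins the vote are dropped.
    """
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned_rows):
        # most_common(1) returns the symbol with the highest count
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


rows = ["ACGT-A", "ACGTTA", "AC-T-A"]
print(column_consensus(rows))
```

Reads with very different boundaries would produce long gap runs at the row ends, which is why unified boundaries matter for this kind of end-to-end approach.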

@asylvz
Author

asylvz commented May 1, 2024

I actually want to generate a consensus, but since POA algorithms are slower, I had to use wtdbg2. Your algorithm seems to be much faster, so I wanted to test it. For ONT reads of 20-30K, which w, k, min-w, etc. would you suggest?
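
For readers unfamiliar with what `k` and `w` control in minimizer-based seeding in general, the following is a small, generic sketch. It is not abPOA's implementation: it picks the lexicographically smallest k-mer in each window for simplicity, whereas real tools typically rank k-mers by a hash. The function name `minimizers` is hypothetical.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    k: k-mer length; w: number of consecutive k-mers per window.
    Ties inside a window are broken by taking the leftmost k-mer.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        picked.add((start + offset, window[offset]))
    return sorted(picked)


for position, kmer in minimizers("ACGTACGT", k=3, w=2):
    print(position, kmer)
```

In this scheme a larger `w` selects fewer seeds (faster, but sparser anchoring), while a larger `k` makes each seed more specific but less tolerant of sequencing errors.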

@yangao07
Owner

yangao07 commented May 2, 2024

For your data H2-s218243_1350.fasta.zip, I see that the read lengths vary a lot and that the reads are not all from the same strand.
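
A quick way to check how much read lengths vary within a cluster is to summarize them before running a consensus tool. The sketch below is a plain-Python FASTA length summary; the helper names `fasta_lengths` and `summarize` are made up for illustration, and the input is assumed to be an uncompressed FASTA.

```python
import statistics


def fasta_lengths(lines):
    """Map each read name to its sequence length from FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a generated name if the header is empty
            name = fields[0] if fields else f"read{len(lengths)}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def summarize(lengths):
    """Return count, min, median, and max of the read lengths."""
    values = sorted(lengths.values())
    return {
        "n": len(values),
        "min": values[0],
        "median": statistics.median(values),
        "max": values[-1],
    }


example = [">r1 sample", "ACGT", "AC", ">r2", "ACGTACGTAC"]
print(summarize(fasta_lengths(example)))
```

For a real file, pass an open file handle instead of the list, for example `fasta_lengths(open("in.fasta"))`.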

Since I don't know how you obtained this cluster of reads (based on mapping position?), I can only suggest that you run abpoa -Ss in.fasta > cons.fa and see whether the consensus sequence meets your expectations.
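
Since the original question involves many clusters, the suggested command could be applied to each cluster file in turn. The following is only a sketch: it assumes an `abpoa` binary is on `PATH`, that each cluster is a separate `.fasta` file in one directory, and it passes `-Ss` exactly as written above. The function names and the `.cons.fa` output suffix are arbitrary choices for this example.

```python
import subprocess
from pathlib import Path


def abpoa_command(fasta_path, binary="abpoa"):
    """Build the argument list for: abpoa -Ss in.fasta"""
    return [binary, "-Ss", str(fasta_path)]


def run_cluster(fasta_path, out_path, binary="abpoa"):
    """Run abpoa on one cluster and write its stdout to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(abpoa_command(fasta_path, binary), stdout=out, check=True)


def run_all(cluster_dir, binary="abpoa"):
    """Run abpoa on every *.fasta file in cluster_dir (assumed layout)."""
    outputs = []
    for fasta in sorted(Path(cluster_dir).glob("*.fasta")):
        out_path = fasta.with_suffix(".cons.fa")
        run_cluster(fasta, out_path, binary)
        outputs.append(out_path)
    return outputs
```

Calling `run_all("clusters/")` would then leave one consensus file next to each input, which makes it easy to time the whole batch and compare against the current wtdbg2 results.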
