optimum parameters for ONT reads #68

asylvz · 2024-04-30T13:21:45Z

Hi,

I'm trying to use the library to generate a consensus of ONT reads for multiple clusters of reads. Each cluster has around 10-30 reads. However, I'm not sure which parameters to use for minimizer-based seeding and partitioning to balance accuracy and speed.

I'd appreciate it if you could suggest a set of parameters that balances speed, memory, and accuracy.

Thank you,
Arda

yangao07 · 2024-04-30T19:25:28Z

Hi, if you can share a few example input datasets, I think I may be able to give you some suggestions in terms of parameters.

asylvz · 2024-05-01T09:30:30Z

Actually, this is not for a specific scenario; I'll use it in my algorithm, and I'm currently testing it with ONT data from some samples (reads can be retrieved from the CRAMs here: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/hg38/).

Basically, it should be fast enough for ONT reads that are 20-30K long. I'm currently using wtdbg2 for this.

asylvz · 2024-05-01T09:42:50Z

I'm also sending a sample cluster of reads. This is one of the large clusters (25 reads), so not all of them are that large.
H2-s218243_1350.fasta.zip

yangao07 · 2024-05-01T14:53:47Z

I am not sure which scenario you are referring to.
Since you mentioned wtdbg2: if you need a consensus sequence after the assembly step, I think wtdbg2 has its own POA consensus-calling module.
abPOA generally takes reads with unified boundaries, performs end-to-end global alignment, and then generates a consensus sequence based on the alignment result.

asylvz · 2024-05-01T18:09:23Z

I actually want to generate a consensus, but since POA algorithms are slower, I had to use wtdbg2. Your algorithm seems to be much faster, so I wanted to test it. For ONT reads of 20-30K, which w, k, min-w, etc. would you suggest?
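
For readers unfamiliar with the w and k being asked about: in minimizer-based seeding, k is the k-mer length and w is the number of consecutive k-mers per window, and one k-mer is kept per window. The sketch below is only an illustration of that general idea, assuming lexicographic ordering of k-mers; real tools usually order k-mers by a hash, and this is not abPOA's implementation. The function name `minimizers` and the toy read are hypothetical.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs, keeping the lexicographically
    smallest k-mer from every window of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first smallest index, so ties go to the leftmost k-mer.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


# Toy read (made up for illustration, not real ONT data).
read = "ACGTTGCATGCCGATAGCTA"
dense = minimizers(read, k=5, w=1)   # w=1 keeps every k-mer
sparse = minimizers(read, k=5, w=3)  # a larger window keeps fewer seeds
print(len(dense), len(sparse))
```

A larger w keeps fewer seeds, which generally means less work but sparser anchoring; which values suit a given dataset is exactly what would need testing on real clusters.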

yangao07 · 2024-05-02T20:33:32Z

For your data H2-s218243_1350.fasta.zip, I see that the read lengths vary a lot and that the reads are not from the same strand.
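
One way to see how much the read lengths vary within a cluster is to summarize them before running any consensus step. This is a minimal sketch using only the standard library; the helper names `read_lengths` and `summarize` and the toy records are hypothetical, and it does not check strand, which would require alignment.

```python
import statistics


def read_lengths(fasta_text):
    """Map each read name to its sequence length for FASTA-formatted text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a positional name if the header is empty.
            name = parts[0] if parts else f"read{len(lengths)}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def summarize(lengths):
    """Return count, min, median and max of the read lengths."""
    values = sorted(lengths.values())
    return {
        "n": len(values),
        "min": values[0],
        "median": statistics.median(values),
        "max": values[-1],
    }


# Toy cluster (made up for illustration).
toy = ">r1 sample\nACGTAC\nGT\n>r2\nACG\n>r3\nACGTACGTACGT\n"
print(summarize(read_lengths(toy)))
```

For a real cluster, the file contents can be passed in with something like `read_lengths(open("cluster.fasta").read())`, where the path is a placeholder.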

Since I don't know how you obtained this cluster of reads (based on mapping position?), I can only suggest that you run abpoa -Ss in.fasta > cons.fa and see whether the consensus sequence meets your expectations.
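
Since the original question involves many clusters, the suggested command could be applied to one FASTA file per cluster. The sketch below assumes `abpoa` is on the PATH and that each cluster is stored as its own `.fasta` file in one directory; the helper names and the `.cons.fa` output naming are hypothetical. Only the `-Ss` flags from the suggestion above are used; check `abpoa --help` for what they and other options do in your version.

```python
import subprocess
from pathlib import Path


def abpoa_command(fasta_path):
    """Build the command from the suggestion above: abpoa -Ss in.fasta."""
    return ["abpoa", "-Ss", str(fasta_path)]


def consensus_path(fasta_path, out_dir):
    """Hypothetical naming scheme: <cluster name>.cons.fa inside out_dir."""
    return Path(out_dir) / (Path(fasta_path).stem + ".cons.fa")


def run_cluster(fasta_path, out_dir):
    """Run abpoa on one cluster and write its stdout to the consensus file."""
    out_path = consensus_path(fasta_path, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if abpoa exits with an error.
        subprocess.run(abpoa_command(fasta_path), stdout=out, check=True)
    return out_path


def run_all(cluster_dir, out_dir):
    """Run every *.fasta file in cluster_dir, one abpoa process per cluster."""
    return [run_cluster(p, out_dir) for p in sorted(Path(cluster_dir).glob("*.fasta"))]
```

Calling `run_all("clusters", "consensus")` (both directory names are placeholders) would then produce one consensus file per cluster, which makes it straightforward to time the whole batch and inspect individual results.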
