Parallelize fixFasta #39

hermidalc · 2022-09-11T15:36:44Z

The initial fixFasta step of Pufferfish indexing is single-threaded, so when the reference contains many sequences it takes a long time. From the outside, it seems this step could be parallelized: split the input reference FASTA into parts, e.g. with the split2 command of the fast SeqKit toolkit (which can read a gzipped or plain reference FASTA and write gzipped or plain split files, for example to save disk space), run fixFasta on each split, and concatenate the fixed splits into one file.
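
A minimal sketch of the proposed split, fix in parallel, concatenate pattern, written in Python for illustration. It is not Pufferfish code: `hypothetical_fix_part` is an invented stand-in (uppercase, replace non-ACGT bases with N) and does not reproduce what fixFasta actually does, and all other names (`parse_fasta`, `split_records`, `parallel_fix`, `to_fasta`) are hypothetical helpers. In the real proposal each worker would instead run fixFasta on one split file produced by SeqKit.

```python
"""Sketch: split FASTA records into parts, fix each part concurrently, concatenate.

ASSUMPTION: hypothetical_fix_part is a placeholder, NOT Pufferfish's fixFasta logic.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) records, joining wrapped lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_records(records: List[Record], n_parts: int) -> List[List[Record]]:
    """Split records into at most n_parts contiguous, order-preserving parts."""
    if not records:
        return []
    size = -(-len(records) // max(1, n_parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def hypothetical_fix_part(part: List[Record]) -> List[Record]:
    """HYPOTHETICAL fixer: uppercase and replace any non-ACGT base with 'N'."""
    return [(h, "".join(b if b in "ACGT" else "N" for b in s.upper())) for h, s in part]


def parallel_fix(
    records: List[Record],
    n_workers: int,
    fix_part: Callable[[List[Record]], List[Record]] = hypothetical_fix_part,
) -> List[Record]:
    """Fix each part concurrently; executor.map returns results in input order."""
    parts = split_records(records, n_workers)
    with ThreadPoolExecutor(max_workers=max(1, n_workers)) as pool:
        fixed_parts = list(pool.map(fix_part, parts))
    # Concatenate the fixed parts back into one record list, original order kept.
    return [rec for part in fixed_parts for rec in part]


def to_fasta(records: List[Record]) -> str:
    """Serialize records as unwrapped FASTA."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

Contiguous splits plus the ordered `pool.map` mean the concatenated output keeps the original record order. A thread pool is enough when each worker just launches an external process on a split file, since waiting on a subprocess releases the GIL; CPU-bound pure-Python fixing would need `ProcessPoolExecutor` instead. Any step that needs a global view of the whole reference (for example duplicate-sequence detection, if the real tool performs it) cannot be done independently per split.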
