The initial fixFasta step of Pufferfish indexing is single-threaded, and it takes a long time when the reference contains many sequences. From the outside, it looks like this step could be parallelized: split the input reference FASTA into parts, run fixFasta on each part, and concatenate the fixed parts into one file. The split could be done with, for example, the split2 command of the fast SeqKit toolkit, which reads a gzipped or plain reference FASTA and can write gzipped or plain parts (gzipped output saves disk space).
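To make the idea concrete, here is a minimal sketch of the split, process, and concatenate pattern in plain Python. It is not Pufferfish code: `fix_part` is a hypothetical stand-in for running fixFasta on one part (here it only upper-cases sequences), and `read_fasta`, `split_records`, and `parallel_fix` are illustrative names. It also assumes that each record can be fixed independently; if the real step does any cross-record work (duplicate detection, for example), a merge pass after concatenation would still be needed.

```python
from concurrent.futures import ThreadPoolExecutor


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_parts):
    """Split a list of records into n_parts contiguous parts, keeping order."""
    size, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def fix_part(part):
    """Hypothetical placeholder for the per-part fix (NOT the real fixFasta).

    A real version would invoke the actual tool on one split file.
    """
    return [(header, seq.upper()) for header, seq in part]


def parallel_fix(records, n_parts=4):
    """Fix each part concurrently, then concatenate in the original order."""
    parts = split_records(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # Executor.map returns results in input order, so the
        # concatenated output keeps the reference order.
        fixed_parts = list(pool.map(fix_part, parts))
    return [record for part in fixed_parts for record in part]
```

Threads are enough if each worker only launches an external process per part; a pure-Python, CPU-bound fix would need `ProcessPoolExecutor` instead to run in parallel.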