Sacra runs scale very badly on sequence count. #21

barneypotter24 · 2018-06-24T17:32:42Z

When running a large dataset of approximately 20,000 sequences, the current formulation of validation and merges (disabled as of 4a3bb8e) adds an infeasible amount of runtime to the sacra run: my laptop battery died from a full charge before the run completed. Additionally, calls to entrez on large input sets take a very long time to query the database. Splitting the query into subdivisions and printing percent-completion output would help assure users that entrez is in fact being queried and that the run has not failed.
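As a rough illustration of the progress-reporting idea, here is a minimal sketch in Python 3. It is not sacra's actual code: `query_in_batches`, `chunked`, the `fetch_batch` callable, and the default batch size of 500 are all hypothetical names and values chosen for the example. The caller supplies whatever function really talks to entrez; the sketch only splits the accession list into subdivisions and prints percent completion after each one.

```python
import sys
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_in_batches(
    accessions: Sequence[str],
    fetch_batch: Callable[[Sequence[str]], List],
    batch_size: int = 500,  # assumed value, not taken from sacra
    out=sys.stderr,
) -> List:
    """Call `fetch_batch` once per subdivision and report progress.

    `fetch_batch` is a placeholder for the real entrez lookup; it takes a
    list of accessions and returns a list of records.
    """
    results: List = []
    total = len(accessions)
    done = 0
    for batch in chunked(accessions, batch_size):
        results.extend(fetch_batch(batch))
        done += len(batch)
        percent = 100 * done // total  # integer percent, rounded down
        print(f"entrez: {done}/{total} accessions queried ({percent}%)", file=out)
    return results
```

With a stand-in such as `fetch_batch=lambda batch: [{"accession": a} for a in batch]`, a list of 1,200 accessions produces three progress lines, so a stalled run is distinguishable from a slow one.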

Currently testing on branch rabies with:

python src/run.py -f input/rabies_sequnces.fasta -o rabies_test.json --pathogen rabies --entrez --metafiles input/rabies_metadata.txt
