This repository has been archived by the owner on Jun 12, 2024. It is now read-only.

Sacra runs scale very badly on sequence count. #21

Open
barneypotter24 opened this issue Jun 24, 2018 · 0 comments

Comments

@barneypotter24
Contributor

When running a large dataset of approximately 20,000 sequences, the current formulation of validation and merges (disabled as of 4a3bb8e) adds an infeasible amount of runtime to the sacra run; my laptop battery died from a full charge before the run completed. Additionally, calls to entrez on large input sets take a very long time to query the database. Splitting the query into subdivisions and printing percent-completion output would help assure users that entrez is in fact being queried and that the run has not failed.

Currently testing on branch rabies with:

python src/run.py -f input/rabies_sequnces.fasta -o rabies_test.json --pathogen rabies --entrez --metafiles input/rabies_metadata.txt
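
A rough sketch of the percent-completion output suggested above, assuming Python 3 and that the entrez lookup can be split into batches of accessions. This is not sacra's actual code: `batched`, `query_with_progress`, the `fetch_batch` callable, and the batch size of 500 are hypothetical names and values chosen for illustration, and `fetch_batch` stands in for whatever function really performs the entrez request.

```python
import sys
from typing import Callable, Iterator, List, Sequence, TextIO


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_with_progress(
    accessions: Sequence[str],
    fetch_batch: Callable[[Sequence[str]], List],
    batch_size: int = 500,  # assumed value, tune to the service's limits
    out: TextIO = sys.stderr,
) -> List:
    """Query accessions batch by batch, reporting progress after each batch.

    `fetch_batch` is a hypothetical stand-in for the real entrez call:
    it takes a batch of accessions and returns a list of records.
    """
    results: List = []
    total = len(accessions)
    done = 0
    for batch in batched(accessions, batch_size):
        results.extend(fetch_batch(batch))
        done += len(batch)
        percent = 100 * done // total
        # One line per batch, so a long run visibly keeps moving.
        print(f"entrez: {done}/{total} records queried ({percent}%)", file=out)
    return results
```

Writing progress to stderr keeps it separate from any output the run sends to stdout, and reporting once per batch rather than once per record keeps the log short even for 20,000 sequences.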