Writing multiple output files based on number of threads then concatenating them #8

Open
jelber2 opened this issue Oct 15, 2017 · 0 comments


jelber2 commented Oct 15, 2017

Hi,
I am assembling the very large axolotl genome (estimated at 21-48 Gbp) from about 2 TB of Illumina 100 bp paired-end reads on a node with 1.5 TB of RAM and 48 cores. For at least a day now, the node has shown a load of 1 and CPU usage of 100% (instead of 4800%) while LightAssembler slowly writes its output to the contigs file. Looking at the code, the output appears to be written to the contigs file in lines 389-424 of GraphTraversal.cpp.

Instead of writing the output serially to a single contigs file, would it be possible to write to multiple temporary contigs files (one per thread), concatenate them into a single output file, and then delete the temporary files? This seems like it would improve performance, though I don't know the code well enough to tell whether it can be implemented.
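To make the idea concrete, here is a rough sketch of what I have in mind. It is not based on LightAssembler's actual code: `write_part`, `write_contigs_parallel`, the `.partN` file suffix, and the `contig_N` header format are all names I made up for illustration, and it assumes the contigs are already available in memory as strings, which may not match how the traversal really works. It also only helps if the bottleneck is the per-thread work of producing and writing contigs rather than the disk itself.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: thread `tid` writes every nthreads-th contig,
// starting at index tid, to its own temporary FASTA file.
void write_part(const std::vector<std::string>& contigs, std::size_t tid,
                std::size_t nthreads, const std::string& part_path) {
    std::ofstream out(part_path, std::ios::binary);
    for (std::size_t i = tid; i < contigs.size(); i += nthreads) {
        out << ">contig_" << i << '\n' << contigs[i] << '\n';
    }
}

// Hypothetical driver: one temporary file per thread, then concatenate
// the parts into out_path and delete them. Returns false on a write error.
bool write_contigs_parallel(const std::vector<std::string>& contigs,
                            const std::string& out_path,
                            std::size_t nthreads) {
    if (nthreads == 0) nthreads = 1;

    // Build all part names first so the vector is not resized
    // while worker threads hold references into it.
    std::vector<std::string> parts;
    for (std::size_t t = 0; t < nthreads; ++t) {
        parts.push_back(out_path + ".part" + std::to_string(t));
    }

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nthreads; ++t) {
        workers.emplace_back([&contigs, &parts, t, nthreads] {
            write_part(contigs, t, nthreads, parts[t]);
        });
    }
    for (auto& w : workers) w.join();

    // Serial concatenation: a plain byte copy, so it should be cheap
    // compared with generating the contigs.
    std::ofstream out(out_path, std::ios::binary);
    for (const auto& p : parts) {
        {
            std::ifstream in(p, std::ios::binary);
            // Skip empty parts: streaming an empty buffer sets failbit on `out`.
            if (in.peek() != std::ifstream::traits_type::eof()) {
                out << in.rdbuf();
            }
        }
        std::remove(p.c_str());  // delete the temporary file
    }
    return out.good();
}
```

One consequence of this approach is that the contigs end up grouped by thread in the final file rather than in their original order, which I assume is acceptable for a contigs file. It also needs extra temporary disk space roughly equal to the size of the final output until the parts are deleted.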
