Options for parallelization/multi-core runs #13
Hi Brad,
Quick benchmarks: it looks like there's a small penalty for the coverage calculations / downsampling, but the bottleneck remains the reading/writing steps. One caveat is that …
Jeremiah,
Essentially what I'm running is bamsormadup followed by VariantBam, and if I look at a subset of 2 million reads, VariantBam performs similarly to samtools view or bamsormadup:
When I chain samtools with VariantBam, it is also fine:
But when I add VariantBam after bamsormadup, I end up with a pretty large runtime increase:
To be fair, this is also true when adding a samtools view afterward:
However, if I parallelize samtools view with 2 threads (…)
FYI, I also don't think the maximum coverage calculations play much of a role; there is some overhead, but not nearly as much as the read/write costs:
So there might be some nice improvement to be had if we can take advantage of multithreading support. I know VariantBam uses SeqLib, which uses htslib, but I don't know the internals well enough to judge whether it's feasible to leverage it the way samtools does. Do you think this is worth exploring? Thanks again for the discussion and help.
Ah, I hadn't appreciated that samtools could use parallelization for reading/writing (I thought it was just for sorting). That makes a big difference. To that end, it would be a matter of reviewing how samtools does this and incorporating it into SeqLib. Full disclosure: my bandwidth at the moment for adding new features to SeqLib is really low. I'm happy to leave this issue open and address it when I get the chance -- it would be a fun project, but it might take some time for me to get to it. Thanks for pointing this out and for the suggestion; I hadn't realized that multicore might work for SeqLib without (hopefully) too much refactoring.
I ended up just taking a look at how samtools does this, and it's as simple as associating their thread-pool infrastructure with the hts file objects in SeqLib. Got it to work pretty quickly:
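For reference, a minimal sketch of that pattern, assuming htslib is installed. This is not the actual SeqLib change; file names and the thread count are placeholders, and error handling is abbreviated. One `htsThreadPool` is created with `hts_tpool_init()` and attached to both the input and output file handles with `hts_set_thread_pool()`, so a single pool serves BAM decompression and compression, the same wiring samtools uses:

```c
/* Sketch: share one htslib thread pool between a BAM reader and writer.
 * Assumes htslib is available; "in.bam"/"out.bam" are placeholders. */
#include <stdio.h>
#include "htslib/sam.h"
#include "htslib/thread_pool.h"

int main(void) {
    samFile *in  = sam_open("in.bam",  "r");
    samFile *out = sam_open("out.bam", "wb");
    if (!in || !out) return 1;

    /* One shared pool handles both decompression (read) and compression (write). */
    htsThreadPool p = { NULL, 0 };
    p.pool = hts_tpool_init(4);        /* 4 worker threads; tune as needed */
    hts_set_thread_pool(in,  &p);
    hts_set_thread_pool(out, &p);

    sam_hdr_t *hdr = sam_hdr_read(in);
    if (!hdr || sam_hdr_write(out, hdr) < 0) return 1;

    /* Plain copy loop; filtering/downsampling logic would sit here. */
    bam1_t *b = bam_init1();
    while (sam_read1(in, hdr, b) >= 0)
        if (sam_write1(out, hdr, b) < 0) return 1;

    bam_destroy1(b);
    sam_hdr_destroy(hdr);
    sam_close(out);
    sam_close(in);
    hts_tpool_destroy(p.pool);
    return 0;
}
```

The same pool can be shared across any number of open files; `hts_set_threads()` is the simpler per-file alternative when no sharing is needed.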
And converting the BAMs back to SAM and doing a … When you update, just make sure to update SeqLib as well.
Jeremiah,
Thank you again for looking at this; let me know if any of it provides pointers I can use for further debugging.
Just a +1 from me. Being able to run this at scale would help immensely in streamlining our current workflow.
Hi Brad,
helgrind might be worth a try: http://valgrind.org/docs/manual/hg-manual.html
I've not used …
Jeremiah,
I tried using helgrind, but to my eyes it's not giving any extra useful information over the gdb trace above. Here are the logs: https://gist.github.com/chapmanb/9e2fb7d6e79f9c1380375ed4f218982d When Ctrl-C-ing out of the deadlock I get a similar stack trace to what we saw with gdb. I'm still stuck on what to try next, but hopefully either reproducing it or some info in the helgrind output will give clues about how to proceed. Thank you again for all the help with this.
@chapmanb @walaj I can reproduce this issue, compiling by hand as shown in the README, on Ubuntu zesty running on AWS:
Prepending the run with strace (…)
It's easy to reproduce the above trace with tiny BAM files and a fresh …
Identifying/seeing the issue with another tool
I hate to introduce more entropy to this issue, but a quick threading stress test on OSX does not lock up, regardless of threads:
It prints out timings, and all executions come back in 1.4 seconds :-S. If I remove …
Jeremiah,
Now that we have VariantBam maximum coverage downsampling in bcbio (thank you again), we've been working on benchmarking runs with and without downsampling. Because we need to run it on sorted BAMs, it ends up running essentially on its own after the streaming output from bamsormadup. Since it's single threaded, this adds a noticeable amount of time when processing whole genome BAMs (where the downsampling is most useful).

I wanted to pick your brain about options to run multicore or otherwise parallelize maximum coverage downsampling to improve runtimes. My only thought was to try to parallelize by region, but then we start running into IO and merging bottlenecks, which may not provide much of an improvement. Thanks for any suggestions or tips on speeding up the downsampling.
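For what it's worth, the parallelize-by-region idea could be sketched as below. This is an illustration only, not VariantBam code; it assumes htslib, an indexed placeholder `in.bam`, and hard-coded example regions. Each worker opens its own file handle and iterates one region independently, which is why the IO/merging cost mentioned above comes from recombining the per-region outputs afterward:

```c
/* Sketch: per-region BAM iteration in parallel threads via htslib.
 * Assumes htslib and an indexed "in.bam" (in.bam.bai); region names
 * are placeholders. Each thread gets its own file/index handles. */
#include <pthread.h>
#include <stdio.h>
#include "htslib/sam.h"

static const char *BAM = "in.bam";

static void *process_region(void *arg) {
    const char *region = (const char *)arg;
    samFile  *fp  = sam_open(BAM, "r");
    sam_hdr_t *hdr = sam_hdr_read(fp);
    hts_idx_t *idx = sam_index_load(fp, BAM);      /* needs in.bam.bai */
    hts_itr_t *itr = sam_itr_querys(idx, hdr, region);

    bam1_t *b = bam_init1();
    long n = 0;
    while (sam_itr_next(fp, itr, b) >= 0)
        n++;                    /* downsampling logic would go here */
    printf("%s: %ld reads\n", region, n);

    bam_destroy1(b);
    hts_itr_destroy(itr);
    hts_idx_destroy(idx);
    sam_hdr_destroy(hdr);
    sam_close(fp);
    return NULL;
}

int main(void) {
    char *regions[] = { "chr1", "chr2" };          /* placeholder regions */
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, process_region, regions[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```

The catch, as noted above, is that each worker must then write its own output shard, and merging those shards back into one coordinate-sorted BAM can eat much of the gain.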