Test running on Baylor "case 2" dataset #548
I was able to run the joint-caller on this data successfully from a recent HEAD (08dcc6c) three times today with varying numbers of executors: 10, 20, and dynamically allocated (52-369, varying with stage widths). Common configs:
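For reference, a setup along these lines can be sketched in SparkConf terms; the values below are illustrative placeholders only, not necessarily the exact settings used in these runs:

```scala
import org.apache.spark.SparkConf

// Illustrative sketch only: placeholder values, not the settings used in these runs.

// Fixed number of executors (e.g. the 10- and 20-executor runs):
val fixedConf = new SparkConf()
  .set("spark.executor.instances", "20")
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "8g")

// Dynamic allocation: executor count scales with pending work, within min/max bounds.
val dynamicConf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "52")   // mirrors the range mentioned above; not a confirmed setting
  .set("spark.dynamicAllocation.maxExecutors", "369")  // mirrors the range mentioned above; not a confirmed setting
  .set("spark.shuffle.service.enabled", "true")        // typically required for dynamic allocation on YARN
```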
See stats below, including times for the bottleneck Stage 6, which builds pileups and calls variants on them:

- First run: 10 executors
- Second run: 20 executors
- Third run: dynamically-allocated executors
These runs portray a pretty good robustness story and provide some promising scaling data points. Going from 10 to 20 executors more than halved the bottleneck stage, and the whole app ran in 58% of the time. Put another way, the runs behaved as if they scaled perfectly linearly outside of just under 4 minutes of fixed-cost time. That's more than reasonable, considering we lost about that much time doing loci-partitioning broadcasting between the end of stage 4 and the beginning of stage 5, resulting in gaps of 4:08, 3:51, and 4:20 (in the 10-, 20-, and dynamic runs, respectively) where the driver was the only node doing work. This and other fixed time-costs weighed further on the linear-scaling null hypothesis when going from 20 executors to dynamic allocation (52-359): the dynamic run finished in only about half the time of the 20-executor run.
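The "linear scaling outside a fixed cost" framing amounts to fitting T(N) = F + W/N to a pair of runs. A small sketch of that back-of-the-envelope fit (the times and executor counts below are placeholders, not the measured wall-clocks from these runs):

```scala
// Two-point fit of T(N) = F + W/N, where F is fixed (serial) cost and
// W is perfectly-parallelizable work, measured in executor-minutes.
def fitFixedCost(n1: Int, t1: Double, n2: Int, t2: Double): (Double, Double) = {
  // Solve t1 = F + W/n1 and t2 = F + W/n2 for W, then back out F.
  val w = (t1 - t2) / (1.0 / n1 - 1.0 / n2)
  val f = t1 - w / n1
  (f, w)
}

// Made-up example: a 10-executor run taking 30min and a 20-executor run taking 17min
val (fixed, work) = fitFixedCost(10, 30.0, 20, 17.0)
// => fixed = 4.0 min of serial overhead, work = 260 executor-minutes of parallel work
println(f"fixed cost = $fixed%.1f min, parallel work = $work%.0f executor-min")
```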
So that's more than 6 minutes of fixed cost, outside of which the dynamic-allocation run was definitely in the ideal linear-scaling range for its 52-359 executors. Of course, the fixed costs matter, but this is still a good sanity check.

Local runs

In a couple of attempts to run this in "local" mode (…) I was unable to get it working; I'll follow up on this and see if I can.
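For the local-mode attempts, a minimal sketch of how a Spark job like this is typically pointed at an in-process master (an assumption about the setup, not the exact invocation used here):

```scala
import org.apache.spark.SparkConf

// Sketch only: run the same job against a local master instead of a cluster.
// "local[*]" runs everything in a single JVM, using all available cores.
val localConf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("joint-caller-local-test")  // hypothetical app name, for illustration
```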
@ryan-williams is this task still in progress?
@jstjohn discussed hitting some issues running on the "case 2" data here.
I'm downloading the data now to attempt to reproduce.