Test running on Baylor "case 2" dataset #548

Open · ryan-williams opened this issue Aug 15, 2016 · 2 comments

@ryan-williams (Member) commented:

@jstjohn discussed hitting some issues running on the "case 2" data here.

I'm downloading the data now to attempt to reproduce.

ryan-williams self-assigned this Aug 15, 2016

@ryan-williams (Member, Author) commented:

I was able to run the joint-caller on this data successfully three times today from a recent HEAD (08dcc6c), with varying numbers of executors: 10, 20, and dynamically allocated (52-359, depending on stage width).

Common configs:

  • Spark 1.6.1, Hadoop 2.6.0-cdh5.5.1
  • --master yarn --deploy-mode cluster
  • 15GB driver
  • 17GB, 6-core executors
    • standard size we use on our cluster
    • fits 3 on each 64GB, 24-core node, with room to spare
  • underlying normal/tumor files spanned 311 HDFS blocks
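
For concreteness, the spark-submit invocation corresponding to those configs looks roughly like the sketch below; the assembly-jar name and the trailing caller arguments are placeholders, not copied from the actual command:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 15g \
      --executor-memory 17g \
      --executor-cores 6 \
      --num-executors 10 \
      guacamole-assembly.jar <joint-caller args, e.g. normal/tumor BAM paths on HDFS>

(--num-executors was 20 for the second run below, and dropped in favor of dynamic allocation for the third.)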

See stats below, including times for the bottleneck Stage 6, which builds pileups and calls variants on them:

First run: 10 executors

  • 60 concurrent tasks, max.

  • Total time: 47 mins.

    • Stage 6: 27 mins.
  • Stages page:

    application_1465500354457_34726

Second run: 20 executors

  • 120 concurrent tasks, max.

  • Total time: 27 mins.

    • Stage 6: 13 mins.
  • Stages page:

    application_1465500354457_34727

Third run: dynamically-allocated executors

  • During the 311-wide stages: 52 executors (for up to 312 concurrent tasks).

  • During the 2153-wide stages: 359 executors (for up to 2154 concurrent tasks).

  • Total time: 14 mins.

    • Stage 6: 102s.
  • Stages page:

    application_1465500354457_34725
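
For reference, this kind of dynamic allocation is turned on with the standard Spark 1.6 settings sketched below (on YARN it also requires the external shuffle service); the min/max bounds are placeholders, since the exact values used for this run aren't recorded in this issue:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 15g \
      --executor-memory 17g \
      --executor-cores 6 \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.minExecutors=<min> \
      --conf spark.dynamicAllocation.maxExecutors=<max> \
      guacamole-assembly.jar <joint-caller args>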

These runs portray a pretty good robustness story, and provide some promising scaling data points.

Going from 10 to 20 executors more than halved the bottleneck stage (27 mins → 13 mins), and the whole app ran in 58% of the time (27 mins vs. 47 mins). Put another way, the runs behaved as if they scaled perfectly linearly apart from just under 4 minutes of fixed cost (perfectly halving the 47-minute run would predict ~23.5 minutes, vs. the 27 observed). That's more than reasonable considering we lose about that much time doing loci-partitioning broadcasting between the end of stage 4 and the beginning of stage 5: gaps of 4:08, 3:51, and 4:20 in the 10-, 20-, and dynamic runs, respectively, during which the driver was the only node doing work.

This and other fixed time-costs weighed further on the linear-scaling null hypothesis when going from 20 executors to dynamic allocation (52-359 executors): the latter run finished in only about half the time (14 mins vs. 27 mins), well short of a linear speedup.

So that's more than 6 mins of fixed cost (solving 14 ≈ F + (27 − F) / 2.6, where 2.6× = 52/20 is the minimum increase in executors, gives F ≈ 6 minutes), outside of which the dynamic-allocation run was squarely in the ideal linear-scaling range for its 52-359 executors. Of course, the fixed costs matter, but this is still a good sanity check.

Local runs

In a couple of attempts to run this in "local" mode (--master local), reading the BAMs from a local NFS mount instead of HDFS, I saw a 20GB driver OOM. We've discussed in the past trying to make sure local runs are reasonably performant, and this seems like a good test case for ironing out kinks there, since these BAMs are a nice medium-small size that should be doable locally. In particular, it seems @jstjohn was attempting it this way when he was stymied on #386.
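
For reference, the kind of local-mode invocation being attempted presumably looks something like this sketch; the jar name, caller arguments, and BAM locations are placeholders:

    spark-submit \
      --master local \
      --driver-memory 20g \
      guacamole-assembly.jar <joint-caller args, reading normal/tumor BAMs from the NFS mount>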

I'll follow up on this and see if I can get it working.

@hammer (Member) commented Dec 17, 2016:

@ryan-williams is this task still in progress?
