Optimise chunk sizes for both perf and storage scaling plots #47

benjeffery · 2023-12-05T15:10:04Z

Current figures are based on quite small chunks, which may be hurting perf and storage benchmarks.

jeromekelleher · 2023-12-13T14:37:20Z

I think this is hitting us at the upper end all right, where we're spending ~1/4 of the total time in the kernel:

    num_samples  num_sites   tool  threads  user_time  sys_time     wall_time
11           10     116230  sgkit        1       6.70      0.30      6.900836
13          100     204714  sgkit        1       7.55      0.48      7.896185
15         1000     403989  sgkit        1      13.19      0.69     13.466672
17        10000     863998  sgkit        1     113.01     10.76    119.052356
19       100000    2365367  sgkit        1    2545.20    658.47   3085.060317
21      1000000    7254858  sgkit        1   76912.53  27060.23  99354.307909

    num_samples  num_sites   tool  threads  user_time  sys_time    wall_time
10           10     116230  savvy        1       0.11      0.00     0.147215
12          100     204714  savvy        1       0.25      0.01     0.287616
14         1000     403989  savvy        1       1.31      0.02     1.364692
16        10000     863998  savvy        1      14.12      0.08    14.228865
18       100000    2365367  savvy        1     388.86      0.59   389.678411
20      1000000    7254858  savvy        1    8410.33      5.25  8418.529669
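
For reference, the "~1/4" figure can be checked directly against the numbers above. This is a minimal sketch that takes "total time" to mean `user_time + sys_time` (an assumption; dividing by `wall_time` gives a similar answer). The helper name `kernel_fraction` is made up for illustration.

```python
# Rows transcribed from the two tables above:
# (tool, num_samples, user_time, sys_time), times in seconds.
ROWS = [
    ("sgkit", 10, 6.70, 0.30),
    ("sgkit", 100, 7.55, 0.48),
    ("sgkit", 1000, 13.19, 0.69),
    ("sgkit", 10000, 113.01, 10.76),
    ("sgkit", 100000, 2545.20, 658.47),
    ("sgkit", 1000000, 76912.53, 27060.23),
    ("savvy", 10, 0.11, 0.00),
    ("savvy", 100, 0.25, 0.01),
    ("savvy", 1000, 1.31, 0.02),
    ("savvy", 10000, 14.12, 0.08),
    ("savvy", 100000, 388.86, 0.59),
    ("savvy", 1000000, 8410.33, 5.25),
]


def kernel_fraction(user_time, sys_time):
    """Share of CPU time spent in the kernel (sys / (user + sys))."""
    return sys_time / (user_time + sys_time)


for tool, num_samples, user_time, sys_time in ROWS:
    frac = kernel_fraction(user_time, sys_time)
    print(f"{tool:5s} {num_samples:>8d} {frac:6.1%}")
```

On these numbers the sgkit kernel share climbs with sample count at the upper end, while savvy stays well under 1% throughout.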

Presumably having fewer chunks would help here -- also we are getting warnings from Dask about sending a large graph.
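
To get a feel for how chunk size drives the number of chunks (and so, roughly, the number of tasks in the Dask graph), here is a back-of-the-envelope sketch. The array shape comes from the largest row in the tables above; the two chunk shapes are hypothetical values picked for illustration, not the ones used in the benchmarks, and `num_chunks` is a made-up helper.

```python
import math


def num_chunks(shape, chunks):
    """Number of chunks in a regular grid covering an array of this shape."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# (num_sites, num_samples) for the largest dataset in the tables above.
shape = (7_254_858, 1_000_000)

# Hypothetical chunk shapes (variants, samples); not the benchmark settings.
small = (10_000, 1_000)
large = (100_000, 10_000)

print(num_chunks(shape, small))  # 726 * 1000 chunks
print(num_chunks(shape, large))  # 73 * 100 chunks
```

Growing each chunk dimension by 10x cuts the chunk count by about 100x, which is the sort of reduction that should shrink the graph Dask has to ship around.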

Can you update with your findings re a reasonable choice of chunk size, @benjeffery?

jeromekelleher · 2023-12-19T10:44:37Z

@benjeffery I'd like to rerun the vcf code over the break to get better chunks - can you document your findings about chunk size here please?
