Initial development of Amdahl's Law plotting #28

mikerenfro · 2022-08-10T15:47:00Z

For #13. May still need some work on this, but here's a start for the Snakefile and Python code to convert the run*.log files into a speedup graph like the one shown below. Other items of note:

Since we're not doing real work in the MPI ranks, we can expand the horizontal axis out considerably beyond 8 processors. This gets us closer to the speedup limit, and de-emphasizes the bits of experimental error in the timing results.
We may need to explicitly specify the number of processes used in mpirun, especially in an HPC environment, as mpirun may query the scheduler to know how many processes it "should" run.
My time values aren't particularly close to the theoretical values. Not sure if the Conde-based mpirun is deficient or if I'm missing something else. Some runs have been better than others, but for a synthetic example, I'd have expected my values to be practically perfect on every run. -- Followup: Trevor's results were basically perfect, so it's possibly a local problem. Will check later with current Amdahl code to verify.

bkmgit · 2022-08-11T08:19:07Z

May consider just using Numpy, keep it simple https://matplotlib.org/stable/gallery/lines_bars_and_markers/simple_plot.html#sphx-glr-gallery-lines-bars-and-markers-simple-plot-py

mikerenfro · 2022-08-12T14:29:25Z

I can go either way on that. If the plotting code is intended to be black box like the Amdahl script is, then I don't mind making it more legible and showing the theoretical speedup line. If they're expected to code it up on their own, then a simpler plotting code is fine.

One other reason I added the extra line was because my timing results were so far off from the expected values. Not sure if that's due to conda's mpirun or something present on both my Mac and my HPC. Adding the theoretical line could lead to discussions of "yeah, and your results are right on that line because this is a somewhat synthetic problem," "yeah, your results are off the line because measurements aren't perfect," or others rather than explaining why different learners' graphs look visibly different.

tkphd · 2022-08-12T16:29:47Z

It's possible, but I believe it would be more complicated to build the NumPy arrays than it is to build the Pandas DataFrame. The input data for the plot exists as a number of JSON files written by the compute nodes to disk, essentially one "row" per file. The process of building the DataFrame, as Mike programmed it, is relatively straight-forward to explain. If you can come up with a competitive solution with NumPy, I'd certainly be interested to see it.

The plotting script is intended to be a gray box: learners can open the file and see what's going on inside, but it is not necessary to do so, and we will certainly not spend time in a Snakemake lesson on writing a Python plotting script.

tkphd · 2022-08-12T18:23:00Z

The current version seems to nail Amdahl.

bkmgit · 2022-08-12T18:29:43Z

Possibly writing out a csv file rather than json would make it easier, but if Python will be required for this, then using Pandas is ok.

vinisalazar

Looks really good! I added some suggestions, I hope they are helpful. The config.yaml looks great with the comments.

code/Snakefile

vinisalazar · 2022-08-12T18:24:11Z

code/Snakefile

+    conda:
+        "amdahl"
+    shell:
+        "python3 plot_amdahl.py {input} > {output}"


Snakemake does have a specific feature to handle scripts, so I think may be worth mentioning that. I'm not saying that this rule should be change to use the script directive, but rather that users should be aware of that, maybe a callout box would work.

Interesting! I'm not sure how script interacts with cluster, though, and we want the plotting to be done on a compute node.

code/plot_amdahl.py

tkphd · 2022-08-12T23:20:20Z

@bkmgit if you would like to offer a complete solution using NumPy, we will consider it, but that is outside the scope of this PR.

tkphd · 2022-08-12T23:24:17Z

Done tinkering with this for now. Here's what the scaling plot currently looks like.

There's also a PR on amdahl that improves the random jitter code (to accept a jitter proportion on the CLI and only increase runtime, unless the proportion is negative, in which case only decrease runtime). If you're in a reviewing mood, please take a look!

code/plot_amdahl.py

tkphd

LGTM, but I might be biased.

reid-a · 2022-08-18T21:15:59Z

I think it's useful to get a version of this code into the repo. In the spirit of not letting the perfect be the enemy of the good, I favor merging.

Initial development of Amdahl's Law plotting

feb4451

mikerenfro mentioned this pull request Aug 10, 2022

Plotting Amdahl's Law #13

Open

tkphd added 4 commits August 12, 2022 13:45

Update Snakefile & incorporate Amdahl plot

efeaf75

jitter timing by default

68d54fd

feed list of inputs to plotting script (more Snakemake-thonic)

78b4ae3

use CONDA_PREFIX env var

6946620

mikerenfro changed the title ~~WIP: Initial development of Amdahl's Law plotting~~ Initial development of Amdahl's Law plotting Aug 12, 2022

mikerenfro marked this pull request as ready for review August 12, 2022 18:18

tkphd requested review from reid-a and ocaisa August 12, 2022 18:20

consistent output, log filenames

cd66a71

vinisalazar reviewed Aug 12, 2022

View reviewed changes

tkphd added 3 commits August 12, 2022 17:43

capture Slurm output, use 'if __name__ == main' pattern

794f8a4

plot a few theoretical curves

7994130

agg is only needed if plt.show() is called

c0fac61

mikerenfro commented Aug 14, 2022

View reviewed changes

code/plot_amdahl.py Show resolved Hide resolved

PEP8 compliance

bb8b626

tkphd approved these changes Aug 18, 2022

View reviewed changes

reid-a merged commit e4d6844 into gh-pages Aug 18, 2022

tkphd deleted the amdahl_plotting branch August 18, 2022 21:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial development of Amdahl's Law plotting #28

Initial development of Amdahl's Law plotting #28

mikerenfro commented Aug 10, 2022 •

edited

Loading

bkmgit commented Aug 11, 2022

mikerenfro commented Aug 12, 2022

tkphd commented Aug 12, 2022

tkphd commented Aug 12, 2022

bkmgit commented Aug 12, 2022

vinisalazar left a comment

vinisalazar Aug 12, 2022

tkphd Aug 12, 2022

tkphd commented Aug 12, 2022

tkphd commented Aug 12, 2022

tkphd left a comment

reid-a commented Aug 18, 2022

Initial development of Amdahl's Law plotting #28

Initial development of Amdahl's Law plotting #28

Conversation

mikerenfro commented Aug 10, 2022 • edited Loading

bkmgit commented Aug 11, 2022

mikerenfro commented Aug 12, 2022

tkphd commented Aug 12, 2022

tkphd commented Aug 12, 2022

bkmgit commented Aug 12, 2022

vinisalazar left a comment

Choose a reason for hiding this comment

vinisalazar Aug 12, 2022

Choose a reason for hiding this comment

tkphd Aug 12, 2022

Choose a reason for hiding this comment

tkphd commented Aug 12, 2022

tkphd commented Aug 12, 2022

tkphd left a comment

Choose a reason for hiding this comment

reid-a commented Aug 18, 2022

mikerenfro commented Aug 10, 2022 •

edited

Loading