Add Internal Benchmark tooling #1
I implemented this on a new branch: https://github.com/MAAP-Project/get-dem/tree/scalene

I deployed a new version of the algorithm as

I copied the outputs to

I also tested that I can run it successfully from the ADE.

Closed too soon. Need PR approved first.
@nemo794, I made adjustments to produce a JSON profile by default, rather than HTML. (I'm continuing to make adjustments to the

I also modified the notebook in a manner that should aid our eventual effort to collate/aggregate profiling metrics once we're ready to kick off a slew of jobs. The primary change was to associate names with the sample bboxes so we can use the names as part of the tags for the jobs we submit, because a job's tag is used to create a directory along the path to the job's output directory. More specifically, in the notebook, a job's tag (the

As an example, I ran "Italy", and I copied the output to

Note that the path from

The information we want to extract from

Unfortunately, Scalene does not produce an HTML file corresponding to the JSON file, so it's not immediately clear which stats in the JSON file correspond to the ones nicely rendered in the HTML file. My next step is to run another job with the same inputs, but passing arguments to Scalene to produce an HTML file. Then I'll compare what I see in the browser for the HTML profile against the stats in the JSON file to match things up. (Obviously, the numbers will be different between runs, but they should be close enough for me to decipher things.)
@nemo794, it looks like these are the 2 primary values we want to pull out of the JSON profile:
What other information might be needed for the large spreadsheet you presented in the recent meeting about gathering get-dem performance figures from both NASA and ESA jobs?
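The two values themselves are elided above, but later comments suggest they are the whole-run elapsed time and peak memory footprint. As a hedged sketch, pulling those out of a Scalene JSON profile might look like the following; the key names `elapsed_time_sec` and `max_footprint_mb` are assumptions about Scalene's JSON schema, so adjust them if your Scalene version uses different ones.

```python
import json

def extract_metrics(profile_path):
    """Pull whole-run wall-clock time and peak memory out of a Scalene JSON profile.

    NOTE: the key names (elapsed_time_sec, max_footprint_mb) are assumptions
    about Scalene's JSON output; verify against a real profile from your run.
    """
    with open(profile_path) as f:
        profile = json.load(f)
    return {
        "elapsed_time_sec": profile.get("elapsed_time_sec"),
        "max_footprint_mb": profile.get("max_footprint_mb"),
    }

# Demonstration against a stub profile file (values are illustrative only):
with open("profile.json", "w") as f:
    json.dump({"elapsed_time_sec": 2071.4, "max_footprint_mb": 1234.5}, f)

print(extract_metrics("profile.json"))
```

Keeping the extraction behind a single function like this would make it easy to swap in corrected key names once we've matched the JSON stats against the HTML rendering.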
@nemo794, for comparison between the 2 Italy runs (one producing
Hi @chuckwondo, this is great progress! Thank you! We discussed this offline, but for the record, I really like how you modularized each algorithm step into its own function in order to wrap each step individually in the scalene wrapper. The DPS outputs are really helpful! A few thoughts:
After this, hopefully there's a way to get even finer granularity by digging into the |
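Since each algorithm step now lives in its own function, one lightweight complement to Scalene's whole-run profile would be a small timing decorator that records each step's wall-clock time by name. This is a hypothetical sketch, not code from the branch; the step function and the `RUNTIMES` dict are stand-ins for illustration.

```python
import functools
import json
import time

# Hypothetical registry of per-step wall-clock times, keyed by function name.
RUNTIMES = {}

def timed(func):
    """Record the wrapped step's elapsed wall-clock time in RUNTIMES."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            RUNTIMES[func.__name__] = time.perf_counter() - start
    return wrapper

@timed
def example_step():
    # Stand-in for one of the modularized get-dem steps.
    time.sleep(0.01)

example_step()
print(json.dumps(RUNTIMES))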
Hi @nemo794, thanks for the great annotated screenshots. They are very helpful. This perhaps got buried in one of my earlier comments:
That is, Scalene won't produce both formats for a given run; it's one or the other, sadly.

Regarding "finer granularity by digging into the

That's within the

Regarding the absolute time values vs. the percentages: it seems strange that the profile does not contain the individual elapsed time values. Unfortunately, I think that leaves us having to do the extra calculations ourselves, but at least we have the necessary numbers to do so.
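The "extra calculations" amount to multiplying the profile's overall elapsed time by each reported percentage. A trivial helper makes that explicit; the numbers below are illustrative only, not taken from an actual profile.

```python
def absolute_seconds(elapsed_time_sec, percent):
    """Convert a Scalene percentage into approximate absolute seconds,
    given the whole-run elapsed time from the profile."""
    return elapsed_time_sec * percent / 100.0

# Illustrative only: if the whole run took 2071.4 s and a function is
# reported at 25% of that, it accounts for roughly 517.85 s.
print(absolute_seconds(2071.4, 25.0))
```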
Fixed by #6. |
Let's keep this open until the JSON parser is integrated. Thanks!
@nemo794 and @arthurduf, I did a bit more digging into the profiling metrics captured by Scalene. Unfortunately, I think Scalene's metrics are not providing the information we're looking for, or at least I haven't deciphered it yet, if it's there. I'm going to dig a bit deeper.

The problem is this: neither the percentages shown in the previous screenshot nor the percentages shown in the screenshot below allow us to compute the percentage of elapsed time taken by each function. To illustrate, the timings printed to
The last number, 2071.4, closely aligns with the value of ~2072 for

Unfortunately, neither the percentages circled in the preceding diagram nor those highlighted below align with these percentages. I believe the reasons are as follows:
Therefore, I'm going to do a bit more digging into the other parts of the profiling metrics to see if the I/O time is accounted for somewhere. If the I/O time cannot be accounted for within the profile metrics, where does that leave us? I'm thinking that we could use Scalene's profile simply to obtain max memory usage, but then rely on the output from our own code for capturing the runtime metrics. If that's the case, I recommend we tweak our output to write the runtime values to a separate JSON file, because we don't want to attempt to parse the values out of the
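Writing our own runtime values to a separate JSON file could be as simple as the sketch below. The filename `runtimes.json` and the step name are hypothetical placeholders, not names from the repo; the point is only that a dedicated file avoids parsing metrics out of mixed job output.

```python
import json
import time

# Hypothetical: collect per-step wall-clock times ourselves, independent of
# Scalene, then dump them to a dedicated file alongside the job outputs.
runtimes = {}

start = time.perf_counter()
# ... run one algorithm step here (placeholder) ...
runtimes["example_step_sec"] = time.perf_counter() - start

# A separate file means nothing has to be scraped from stdout or the profile.
with open("runtimes.json", "w") as f:
    json.dump(runtimes, f, indent=2)
```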
Per discussions w/ @nemo794, I have created a branch named

Here's what I've done so far on that branch:
If you want to test things out, you can run things locally by following the instructions in
At this point, you can compute the means by running the following:
This will output the mean values to stdout, which should give you something like the following:
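The exact command and its output are elided above, but as an illustration, aggregating the mean of each metric across several runs' JSON metric files might look like this. The file layout, key names, and stub values here are all hypothetical.

```python
import json
import statistics
from pathlib import Path

def mean_metrics(paths, keys=("elapsed_time_sec", "max_footprint_mb")):
    """Compute the mean of the selected metrics across several JSON metric files.

    Key names are hypothetical; adjust to whatever the metric files contain.
    """
    profiles = [json.loads(Path(p).read_text()) for p in paths]
    return {k: statistics.mean(p[k] for p in profiles) for k in keys}

# Demonstration with two stub metric files (illustrative values only):
for i, (t, m) in enumerate([(100.0, 500.0), (110.0, 520.0)]):
    Path(f"run{i}.json").write_text(
        json.dumps({"elapsed_time_sec": t, "max_footprint_mb": m})
    )

print(mean_metrics(["run0.json", "run1.json"]))
```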
@nemo794, okay to consider this closed now? |
To facilitate tracking of benchmarks internal to the job runs, the MAAP team recommends Scalene for Python code.