Added profiling docs for Polaris and Aurora #576
base: main
multiple nodes. A simple example, where we use a wrapper script to trace the
rank 0 on each node of a 4 node job running a PyTorch application is below:

### A `unitrace` wrapper
Can you add a title for the script using: https://squidfunk.github.io/mkdocs-material/reference/code-blocks/#adding-a-title
And remove the `###` subsection header.
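For readers without the diff at hand, the kind of wrapper under discussion (profile only local rank 0 on each node, run every other rank untouched) can be sketched roughly as below. This is not the script from the PR: the function name `run_with_optional_profiler` and the variables `LOCAL_RANK` and `PROFILER` are hypothetical, and the real wrapper would read whatever local-rank variable the job launcher exports.

```bash
#!/usr/bin/env bash
# Sketch of a per-node profiling wrapper. Assumptions (not from the PR):
#   - LOCAL_RANK holds the rank index within the node; launchers export this
#     under different names, so the caller is expected to set it.
#   - PROFILER holds the full profiler command prefix chosen by the user.

run_with_optional_profiler() {
    local local_rank="$1"
    shift
    if [ "$local_rank" -eq 0 ] && [ -n "${PROFILER:-}" ]; then
        # Only local rank 0 on each node runs under the profiler.
        # PROFILER is intentionally unquoted so it splits into command + args.
        $PROFILER "$@"
    else
        # All other ranks run the application unmodified.
        "$@"
    fi
}

# When invoked with arguments, act as the wrapper itself.
if [ "$#" -gt 0 ]; then
    run_with_optional_profiler "${LOCAL_RANK:-0}" "$@"
fi
```

Saved under a hypothetical name such as `profile_wrapper.sh`, it would sit between the launcher and the application, for example `mpiexec ... ./profile_wrapper.sh python train.py`, with `PROFILER` set to the desired `unitrace` or `nsys` command line.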
### Deployment

The wrapper above can be deployed using a PBS job script the following way
Suggested change:
- The wrapper above can be deployed using a PBS job script the following way
+ The wrapper above can be deployed using the following PBS job script:
@@ -0,0 +1,250 @@
# Profiling Deep Learning Applications

We can use both framework (for example, PyTorch) native profiler and vendor specific
Suggested change:
- We can use both framework (for example, PyTorch) native profiler and vendor specific
+ We can use both a framework-specific (for example, PyTorch-specific) native profiler and the vendor-specific NVIDIA
[Nsight compute profiler](https://developer.nvidia.com/tools-overview/nsight-compute/get-started).
Refer to the respective documentation for more details:

[Nsight System User Guide](https://docs.nvidia.com/nsight-systems/UserGuide/index.html)
Use an unordered list (`-`).
multiple nodes. A simple example, where we use a wrapper script to trace the
rank 0 on each node of a 2 node job running a PyTorch application is below:

### An `nsys` wrapper
same comment as before re: code block title
This wrapper can be deployed as the `nsys` example above. In the `ncu` wrapper
we explicitly set the name of the kernel that we want to analyze
(a gemm kernel in this case).
Suggested change:
- (a gemm kernel in this case).
+ (a GEMM kernel in this case).
or
- (a gemm kernel in this case).
+ (a `gemm` kernel in this case).
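As a rough illustration of what "explicitly setting the kernel name" can look like, the sketch below assembles an `ncu` command line that restricts profiling to kernels whose names match a pattern. It is a dry run that only prints the command; the helper `build_ncu_cmd`, the report name `ncu_rank0`, and the application `python train.py` are placeholders and are not taken from the PR.

```bash
#!/usr/bin/env bash
# Sketch: build (but do not execute) an ncu command filtered by kernel name.
# The pattern, report name, and application below are illustrative only.

build_ncu_cmd() {
    local kernel_pattern="$1"
    local output_name="$2"
    shift 2
    # --kernel-name with a "regex:" prefix filters kernels by name;
    # -o names the resulting .ncu-rep report file.
    echo "ncu --kernel-name regex:${kernel_pattern} -o ${output_name} $*"
}

# Dry run: print the command that a wrapper might execute on rank 0.
build_ncu_cmd gemm ncu_rank0 python train.py
```

Printing the command first is a cheap way to check the filter before spending batch time on a kernel-level profile.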
The next step is to load the `nsys-rep` files in the Nsight Systems GUI, and
the `ncu-rep` files to the Nsight Compute GUI.

### For a single rank run
Suggested change:
- ### For a single rank run
+ ### Single rank run
of the documentation. Here we only show standard options, either of the three
could be chosen. Note that, invoking each option will lead to varying amounts
of time the profiler need to run. This will be important in setting the
requested wall-time for your batch job.
Suggested change:
- requested wall-time for your batch job.
+ requested walltime for your batch job.
generate the profiles. The exhaustive list could be found in the respective
documentation pages:

[Nsight System User Guide](https://docs.nvidia.com/nsight-systems/UserGuide/index.html)
Use unordered list to clean up the formatting
fi

```
There are a few important things to notice in the wrapper.
Suggested change:
- There are a few important things to notice in the wrapper.
+ There are several important shell variables in the wrapper, which may require modification:
also change the phrasing on the Polaris profiling doc
For Polaris:
For Aurora:
TODO: