Added profiling docs for Polaris and Aurora #576

Open
wants to merge 3 commits into
base: main

khossain4337
Collaborator

For Polaris:

  • Added nsys and ncu profiling methods

For Aurora:

  • Added unitrace profiling methods

TODO:

  • Add PyTorch Profiler for Polaris
  • Add THAPI for Aurora

multiple nodes. A simple example, where we use a wrapper script to trace the
rank 0 on each node of a 4 node job running a PyTorch application is below:

### A `unitrace` wrapper
Member

Can you add a title for the script using: https://squidfunk.github.io/mkdocs-material/reference/code-blocks/#adding-a-title

And remove the ### subsection header
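
For readers without the diff at hand, the kind of wrapper under discussion can be sketched roughly as follows. This is an illustrative sketch, not the script from this PR: the function name `select_command`, the `nsys profile -o trace_rank0` prefix, and `train.py` are placeholders, and it assumes the launcher exposes the node-local rank (on PALS-based systems this is typically `PALS_LOCAL_RANKID`).

```shell
#!/bin/bash
# Sketch of a rank-selective profiling wrapper (illustrative, not the PR's script).
# Assumption: in a real wrapper the local rank would come from the launcher,
# e.g. "${PALS_LOCAL_RANKID}"; other launchers use different variable names.

# Print the command line a given local rank should run.
# $1 = local rank, $2 = profiler prefix (may be empty), rest = application command.
select_command() {
    local local_rank="$1"
    local profiler_prefix="$2"
    shift 2
    if [ "$local_rank" -eq 0 ] && [ -n "$profiler_prefix" ]; then
        # Only local rank 0 on each node runs under the profiler.
        echo "$profiler_prefix $*"
    else
        # Every other rank runs the application unmodified.
        echo "$*"
    fi
}

# Dry run: show what local ranks 0 and 1 would execute.
# The profiler prefix and train.py are placeholders.
select_command 0 "nsys profile -o trace_rank0" python train.py
select_command 1 "nsys profile -o trace_rank0" python train.py
```

The sketch only prints the command lines so it can be checked without a profiler installed; an actual wrapper would execute the selected command instead of echoing it.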


### Deployment

The wrapper above can be deployed using a PBS job script the following way
Member

Suggested change
- The wrapper above can be deployed using a PBS job script the following way
+ The wrapper above can be deployed using the following PBS job script:
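
As a rough illustration of the deployment step discussed here, a PBS job script that launches the application through a wrapper might look like the sketch below. It is not the job script from this PR: the node count, walltime, ranks per node, `profile_wrapper.sh`, and `train.py` are assumptions, site-specific directives (queue, project, filesystems) are omitted, and it assumes `PBS_NODEFILE` lists one line per node.

```shell
#!/bin/bash
#PBS -l select=2
#PBS -l walltime=00:30:00
# Sketch of a PBS job script that launches an application through a
# profiling wrapper. Queue, project, and filesystem directives are omitted.

# Count nodes from the PBS node file; fall back to 1 outside a PBS job.
count_nodes() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        echo 1
    fi
}

RANKS_PER_NODE=4   # assumption: one rank per GPU on a 4-GPU node
NNODES=$(count_nodes)
NRANKS=$((NNODES * RANKS_PER_NODE))

# Print the launch line instead of running it, so the sketch is safe to dry-run.
# profile_wrapper.sh and train.py are placeholder names.
echo "mpiexec -n ${NRANKS} --ppn ${RANKS_PER_NODE} ./profile_wrapper.sh python train.py"
```

Because profiling adds overhead, the requested walltime usually needs more headroom than the same run without a profiler.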

@@ -0,0 +1,250 @@
# Profiling Deep Learning Applications

We can use both framework (for example, PyTorch) native profiler and vendor specific
Member

Suggested change
- We can use both framework (for example, PyTorch) native profiler and vendor specific
+ We can use both a framework-specific (for example, PyTorch-specific) native profiler and the vendor-specific NVIDIA

[Nsight compute profiler](https://developer.nvidia.com/tools-overview/nsight-compute/get-started).
Refer to the respective documentation for more details:

[Nsight System User Guide](https://docs.nvidia.com/nsight-systems/UserGuide/index.html)
Member

Use an unordered list (`-`).

multiple nodes. A simple example, where we use a wrapper script to trace the
rank 0 on each node of a 2 node job running a PyTorch application is below:

### An `nsys` wrapper
Member

Same comment as before regarding the code block title.


This wrapper can be deployed as the `nsys` example above. In the `ncu` wrapper
we explicitly set the name of the kernel that we want to analyze
(a gemm kernel in this case).
Member

Suggested change
- (a gemm kernel in this case).
+ (a GEMM kernel in this case).

or

Suggested change
- (a gemm kernel in this case).
+ (a `gemm` kernel in this case).
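
To make the kernel-name filtering mentioned in the diff concrete, here is a hedged sketch of how such an `ncu` invocation could be assembled. It is not the wrapper from this PR: the helper `build_ncu_command`, the report name `ncu_gemm_rank0`, and `train.py` are hypothetical, and the available section sets depend on the installed Nsight Compute version (check `ncu --help`).

```shell
#!/bin/bash
# Sketch: assemble an ncu command that profiles only kernels matching a name.
# Assumptions: the pattern "gemm", the report name, and train.py are placeholders.

build_ncu_command() {
    local kernel_pattern="$1"   # kernel name pattern, e.g. gemm
    local section_set="$2"      # section set, e.g. detailed or full
    local report_name="$3"      # base name of the output report
    shift 3
    # --kernel-name with a regex: prefix restricts profiling to matching kernels.
    echo "ncu --kernel-name regex:${kernel_pattern} --set ${section_set} -o ${report_name} $*"
}

# Dry run: print the command instead of executing it.
build_ncu_command gemm full ncu_gemm_rank0 python train.py
```

Restricting the profile to one kernel pattern keeps the run time and report size manageable, since larger section sets generally collect more metrics per kernel launch.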

The next step is to load the `nsys-rep` files in the Nsight Systems GUI, and
the `ncu-rep` files to the Nsight Compute GUI.

### For a single rank run
Member

Suggested change
- ### For a single rank run
+ ### Single rank run

of the documentation. Here we only show standard options, either of the three
could be chosen. Note that, invoking each option will lead to varying amounts
of time the profiler need to run. This will be important in setting the
requested wall-time for your batch job.
Member

Suggested change
- requested wall-time for your batch job.
+ requested walltime for your batch job.

generate the profiles. The exhaustive list could be found in the respective
documentation pages:

[Nsight System User Guide](https://docs.nvidia.com/nsight-systems/UserGuide/index.html)
Member

Use unordered list to clean up the formatting

fi

```
There are a few important things to notice in the wrapper.
Member

Suggested change
- There are a few important things to notice in the wrapper.
+ There are several important shell variables in the wrapper, which may require modification:

Also change the phrasing in the Polaris profiling doc.

2 participants