diff --git a/docs/aurora/running-jobs-aurora.md b/docs/aurora/running-jobs-aurora.md
index e1dcd9198..acd4aeeea 100644
--- a/docs/aurora/running-jobs-aurora.md
+++ b/docs/aurora/running-jobs-aurora.md
@@ -1,19 +1,19 @@
# Running Jobs on Aurora
-## Queues
+## Queues
There is a single routing queue in place called `EarlyAppAccess`, which currently has a node count of 2,844, but we recommend a maximum job size of 2,048 or 2,560 nodes. This will be replaced by new queues during an upcoming PM.
For example, a one-node interactive job can be requested for 30 minutes with the following command, where `[your_ProjectName]` is replaced with an appropriate project name.
-```
+```bash
qsub -l select=1 -l walltime=30:00 -A [your_ProjectName] -q EarlyAppAccess -I
```
Recommended PBSPro options follow.
-```
+```bash
#!/bin/sh
#PBS -A [your_ProjectName]
#PBS -N
@@ -32,11 +32,13 @@ We recommend against using `-W tolerate_node_failures=all` in your qsub command
1. Start your interactive job
2. When the job transitions to Running state, run `pbsnodes -l | grep `
3. Manually REMOVE all nodes identified in that output from inclusion in your mpiexec
-```
-$ cat $PBS_NODEFILE > local.hostfile
-# edit local.hostfile to remove problem nodes
-$ mpiexec --hostfile local.hostfile [other mpiexec arguments]
-```
+
+ ```bash
+ $ cat $PBS_NODEFILE > local.hostfile
+ # edit local.hostfile to remove problem nodes (one way to do this is sketched after this list)
+ $ mpiexec --hostfile local.hostfile [other mpiexec arguments]
+ ```
+
4. Continue to execute
5. If other nodes go down during your job, it will not be killed, and you can further exclude those nodes from your mpiexec as needed
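+
+If it helps, the hostfile edit in step 3 can be scripted rather than done by hand. The sketch below is one possible approach, not an official recipe: the intermediate file name is arbitrary, and the node names reported by `pbsnodes` may need adjusting (for example, trimming domain suffixes) to match the entries in `$PBS_NODEFILE`.
+
+```bash
+# Nodes PBS currently marks as down/offline (optionally pipe through the
+# job-specific grep from step 2 to limit this to your own nodes)
+pbsnodes -l | awk '{print $1}' > bad.hosts
+
+# Keep only the healthy nodes from this job's node file
+grep -v -f bad.hosts $PBS_NODEFILE > local.hostfile
+
+mpiexec --hostfile local.hostfile [other mpiexec arguments]
+```
+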
@@ -83,7 +85,7 @@ GPU-enabled applications will similarly run on the compute nodes using the above
- If running on a specific GPU or subset of GPUs and/or tiles is desired, then the `ZE_AFFINITY_MASK` environment variable can be used. For example, if one only wanted an application to access the first two GPUs on a node, then setting `ZE_AFFINITY_MASK=0,1` could be used.
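+
+As a minimal illustration of the `ZE_AFFINITY_MASK` example above (the executable name and rank counts here are placeholders):
+
+```bash
+# Make only GPUs 0 and 1 visible to the application via the Level Zero runtime;
+# every rank launched below sees just these two devices
+export ZE_AFFINITY_MASK=0,1
+mpiexec -n 2 --ppn 2 ./my_gpu_app
+```
+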
### Binding MPI ranks to GPUs
-Support in MPICH on Aurora to bind MPI ranks to GPUs is currently work-in-progress. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set `ZE_AFFINITY_MASK` for each MPI rank. Users are encouraged to use the `/soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh` script for instances where each MPI rank is to be bound to a single GPU tile with a round-robin assignment.
+Support in MPICH on Aurora for binding MPI ranks to GPUs is currently a work in progress. For applications that need this capability, it can instead be handled by a small helper script that sets `ZE_AFFINITY_MASK` appropriately for each MPI rank. Users are encouraged to use the `/soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh` script when each MPI rank is to be bound to a single GPU tile in a round-robin assignment.
This script can be placed just before the executable in an `mpiexec` command like so.
@@ -110,16 +112,19 @@ Users with different MPI-GPU affinity needs, such as assigning multiple GPUs/til
## Interactive Jobs on Compute Nodes
-Here is how to submit an interactive job to, for example, edit/build/test an application Polaris compute nodes:
-```
+Here is how to submit an interactive job to, for example, edit/build/test an application on Aurora compute nodes:
+
+```bash
qsub -I -l select=1,walltime=1:00:00,place=scatter -A MYPROJECT -q workq
```
This command requests 1 node for a period of 1 hour in the `workq` queue. After waiting in the queue for a node to become available, a shell prompt on a compute node will appear. You may then start building applications and testing GPU affinity scripts on the compute node.
-**NOTE:** If you want to ```ssh``` or ```scp``` to one of your assigned compute nodes you will need to make sure your ```$HOME``` directory and your ```$HOME/.ssh``` directory permissions are both set to ```700```.
+**NOTE:** If you want to `ssh` or `scp` to one of your assigned compute nodes, you will need to make sure your `$HOME` directory and your `$HOME/.ssh` directory permissions are both set to `700`.
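+
+For example, the permissions can be checked and tightened from a login node as follows; `chmod 700` is simply the octal form of the requirement stated in the note above.
+
+```bash
+# Verify current permissions
+ls -ld $HOME $HOME/.ssh
+
+# Restrict both directories to the owner only (drwx------)
+chmod 700 $HOME $HOME/.ssh
+```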
+
## Running Multiple MPI Applications on a Node
+
Multiple applications can be run simultaneously on a node by launching several `mpiexec` commands and backgrounding them. For performance, it will likely be necessary to ensure that each application runs on a distinct set of CPU resources and/or targets specific GPUs and tiles. One can provide a list of CPUs using the `--cpu-bind` option, which, when combined with `ZE_AFFINITY_MASK`, lets a user specify exactly which CPU and GPU resources each application runs on. In the simple example below, twelve instances of the application run simultaneously on a single node. In the first instance, the application spawns MPI ranks 0-3 on CPU cores 0-3 and uses GPU 0 tile 0.
```bash
@@ -141,6 +146,7 @@ mpiexec -n 4 --ppn 4 --cpu-bind list:40:41:42:43 ./hello_affinity &
wait
```
+
Users will likely find it beneficial to launch processes across CPU cores in both sockets of a node.
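+
+As a sketch of that idea, the backgrounded `mpiexec` commands above can be split so that some bind to cores on each socket. The core ranges below are assumptions for illustration only; confirm the actual topology on a compute node (for example with `lscpu`) before hard-coding core lists.
+
+```bash
+# Assumed layout for illustration only: cores 0-51 on socket 0, cores 52-103 on socket 1
+mpiexec -n 4 --ppn 4 --cpu-bind list:0:1:2:3 ./hello_affinity &      # cores on socket 0
+mpiexec -n 4 --ppn 4 --cpu-bind list:52:53:54:55 ./hello_affinity &  # cores on socket 1
+
+wait
+```
+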
## Compute Node Access to the Internet
@@ -153,7 +159,7 @@ export https_proxy="http://proxy.alcf.anl.gov:3128"
export ftp_proxy="http://proxy.alcf.anl.gov:3128"
```
-#In the future, though we don't have a timeline on this because it depends on future features in slingshot and internal software development, we intend to have public IP addresses be a schedulable resource. For instance, if only your head node needed public access your select statement might looks something like: `-l select=1:pubnet=True+63`.
+In the future we intend to make public IP addresses a schedulable resource, though we do not yet have a timeline for this because it depends on future Slingshot features and internal software development. For instance, if only your head node needed public access, your select statement might look something like: `-l select=1:pubnet=True+63`.
## Controlling Where Your Job Runs
If you wish to have your job run on specific nodes, form your select statement like this: `-l select=1:vnode=+1:vnode=...`. Obviously, that gets tedious for large jobs.
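+
+For a small job, the expanded form might look like the following; the node names and script name here are purely hypothetical placeholders and should be replaced with real vnode names (for example, as reported by `pbsnodes`) and your own job script.
+
+```bash
+# Request two specific (hypothetical) nodes for 30 minutes
+qsub -l select=1:vnode=NODE_A+1:vnode=NODE_B -l walltime=30:00 -A [your_ProjectName] -q EarlyAppAccess job_script.sh
+```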