Update docs/aurora/running-jobs-aurora.md Fix #308
saforem2 committed Dec 15, 2023
1 parent 385126c commit e227c37
Showing 1 changed file with 19 additions and 13 deletions.

# Running Jobs on Aurora

## <a name="Aurora-Queues"></a>Queues

There is a single routing queue in place called `EarlyAppAccess`, which currently has a node count of 2,844; we recommend a maximum job size of 2048 or 2560 nodes. This queue will be replaced by new queues during an upcoming preventive maintenance (PM).

For example, a one-node interactive job can be requested for 30 minutes with the following command, where `[your_ProjectName]` is replaced with an appropriate project name.

```bash
qsub -l select=1 -l walltime=30:00 -A [your_ProjectName] -q EarlyAppAccess -I
```

Recommended PBSPro options follow.

```bash
#!/bin/sh
#PBS -A [your_ProjectName]
#PBS -N [your_JobName]
# ... (additional recommended options elided)
```

We recommend against using `-W tolerate_node_failures=all` in your qsub command. If you do use it, handle node failures as follows:
1. Start your interactive job
2. When the job transitions to the Running state, run `pbsnodes -l | grep <jobid>`
3. Manually REMOVE all nodes identified in that output from inclusion in your `mpiexec` command, as shown below:

    ```bash
    $ cat $PBS_NODEFILE > local.hostfile
    # edit local.hostfile to remove problem nodes
    $ mpiexec --hostfile local.hostfile [other mpiexec arguments]
    ```

4. Continue to execute
5. If other nodes go down during your job, the job will not be killed, and you can further exclude those nodes from your `mpiexec` command as needed

GPU-enabled applications will similarly run on the compute nodes using the examples above.
- To run an application on a specific GPU or a subset of GPUs and/or tiles, use the `ZE_AFFINITY_MASK` environment variable. For example, to restrict an application to the first two GPUs on a node, set `ZE_AFFINITY_MASK=0,1`.
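
As a minimal sketch (the application name `./myapp` and the rank counts are illustrative assumptions):

```bash
# expose only GPUs 0 and 1 to the application
export ZE_AFFINITY_MASK=0,1
mpiexec -n 2 --ppn 2 ./myapp
```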

### <a name="Binding-MPI-ranks-to-GPUs"></a>Binding MPI ranks to GPUs
Support in MPICH on Aurora for binding MPI ranks to GPUs is currently a work in progress. Applications that need this support can instead use a small helper script that appropriately sets `ZE_AFFINITY_MASK` for each MPI rank. Users are encouraged to use the `/soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh` script when each MPI rank is to be bound to a single GPU tile in a round-robin assignment.

This script can be placed just before the executable in an `mpiexec` command like so.

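A representative invocation (a sketch: the rank counts and the application name `./myapp` are illustrative assumptions, not the original example):

```bash
# the wrapper sets ZE_AFFINITY_MASK per rank, then execs the application
mpiexec -n 12 --ppn 12 /soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh ./myapp
```
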
Users with different MPI-GPU affinity needs, such as assigning multiple GPUs/tiles per MPI rank, can copy and modify this script to suit their needs.

## <a name="Interactive-Jobs-on-Compute-Nodes"></a>Interactive Jobs on Compute Nodes

Here is how to submit an interactive job to, for example, edit/build/test an application on Aurora compute nodes:

```bash
qsub -I -l select=1,walltime=1:00:00,place=scatter -A MYPROJECT -q workq
```

This command requests 1 node for a period of 1 hour in the `workq` queue. After waiting in the queue for a node to become available, a shell prompt on a compute node will appear. You may then start building applications and testing GPU affinity scripts on the compute node.

**NOTE:** If you want to `ssh` or `scp` to one of your assigned compute nodes, you will need to make sure your `$HOME` directory and your `$HOME/.ssh` directory permissions are both set to `700`.

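These permissions can be set with `chmod`, for example:

```bash
chmod 700 $HOME
chmod 700 $HOME/.ssh
```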

## <a name="Running-Multiple-MPI-Applications-on-a-node"></a>Running Multiple MPI Applications on a node

Multiple applications can be run simultaneously on a node by launching several `mpiexec` commands and backgrounding them. For performance, it will likely be necessary to ensure that each application runs on a distinct set of CPU resources and/or targets specific GPUs and tiles. One can provide a list of CPUs using the `--cpu-bind` option, which, when combined with `ZE_AFFINITY_MASK`, lets a user specify exactly which CPU and GPU resources each application runs on. In the simple example below, twelve instances of the application run simultaneously on a single node. In the first instance, the application spawns MPI ranks 0-3 on CPU cores 0-3 and uses GPU 0 tile 0.

```bash
# ... (other mpiexec instances elided)
mpiexec -n 4 --ppn 4 --cpu-bind list:40:41:42:43 ./hello_affinity &
wait
```

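A minimal sketch of the per-instance pattern (the `GPU.TILE` syntax for `ZE_AFFINITY_MASK` and the core lists are illustrative assumptions):

```bash
# instance 1: four ranks on cores 0-3, GPU 0 tile 0
ZE_AFFINITY_MASK=0.0 mpiexec -n 4 --ppn 4 --cpu-bind list:0:1:2:3 ./hello_affinity &
# instance 2: four ranks on cores 4-7, GPU 0 tile 1
ZE_AFFINITY_MASK=0.1 mpiexec -n 4 --ppn 4 --cpu-bind list:4:5:6:7 ./hello_affinity &
wait   # block until both backgrounded instances finish
```
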
Users will likely find it beneficial to launch processes across CPU cores in both sockets of a node.
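
For instance, a hedged sketch of one way to span sockets (the per-socket core numbering is an assumption; verify with `lscpu` on a compute node):

```bash
# bind four ranks to two cores on each socket (assumes socket 1 starts at core 52)
mpiexec -n 4 --ppn 4 --cpu-bind list:0:1:52:53 ./hello_affinity
```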

## <a name="Compute-Node-Access-to-the-Internet"></a>Compute Node Access to the Internet
Compute nodes can reach the internet through the ALCF proxy; set proxy environment variables such as the following:

```bash
export https_proxy="http://proxy.alcf.anl.gov:3128"
export ftp_proxy="http://proxy.alcf.anl.gov:3128"
```

In the future, though we don't have a timeline on this because it depends on future features in Slingshot and internal software development, we intend to make public IP addresses a schedulable resource. For instance, if only your head node needed public access, your select statement might look something like: `-l select=1:pubnet=True+63`.

## <a name="Controlling-Where-Your-Job-Runs"></a>Controlling Where Your Job Runs
If you wish to have your job run on specific nodes, form your select statement like this: `-l select=1:vnode=<node name1>+1:vnode=<node name2>...`. Obviously, that gets tedious for large jobs.
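
For illustration, a sketch of an interactive request pinned to two specific nodes (the node names are placeholders, reusing the queue from the example above):

```bash
qsub -I -l walltime=30:00 -l select=1:vnode=<node_name1>+1:vnode=<node_name2> -A [your_ProjectName] -q EarlyAppAccess
```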
