diff --git a/docs/aurora/running-jobs-aurora.md b/docs/aurora/running-jobs-aurora.md
index e1dcd9198..acd4aeeea 100644
--- a/docs/aurora/running-jobs-aurora.md
+++ b/docs/aurora/running-jobs-aurora.md
@@ -1,19 +1,19 @@
 # Running Jobs on Aurora

-## Queues 
+## Queues

 There is a single routing queue in place called `EarlyAppAccess` which currently has a node count of 2,844, but we recommend a max job size of 2048 or 2560. This will be replaced by new queues during an upcoming PM.

 For example, a one-node interactive job can be requested for 30 minutes with the following command, where `[your_ProjectName]` is replaced with an appropriate project name.

-```
+```bash
 qsub -l select=1 -l walltime=30:00 -A [your_ProjectName] -q EarlyAppAccess -I
 ```

 Recommended PBSPro options follow.

-```
+```bash
 #!/bin/sh
 #PBS -A [your_ProjectName]
 #PBS -N
@@ -32,11 +32,13 @@ We recommend against using `-W tolerate_node_failures=all` in your qsub command
 1. Start your interactive job
 2. When the job transitions to Running state, run `pbsnodes -l | grep `
 3. Manually REMOVE all nodes identified in that output from inclusion in your mpiexec
-```
-$ cat $PBS_NODEFILE > local.hostfile
-# edit local.hostfile to remove problem nodes
-$ mpiexec --hostfile local.hostfile [other mpiexec arguments]
-```
+
+   ```bash
+   $ cat $PBS_NODEFILE > local.hostfile
+   # edit local.hostfile to remove problem nodes
+   $ mpiexec --hostfile local.hostfile [other mpiexec arguments]
+   ```
+
 4. Continue to execute
 5. If other nodes go down during your job, it will not be killed, and you can further exclude those nodes from your mpiexec as needed

@@ -83,7 +85,7 @@ GPU-enabled applications will similarly run on the compute nodes using the above
 - If running on a specific GPU or subset of GPUs and/or tiles is desired, then the `ZE_AFFINITY_MASK` environment variable can be used. For example, if one only wanted an application to access the first two GPUs on a node, then setting `ZE_AFFINITY_MASK=0,1` could be used.

 ### Binding MPI ranks to GPUs
-Support in MPICH on Aurora to bind MPI ranks to GPUs is currently work-in-progress. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set `ZE_AFFINITY_MASK` for each MPI rank. Users are encouraged to use the `/soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh` script for instances where each MPI rank is to be bound to a single GPU tile with a round-robin assignment. 
+Support in MPICH on Aurora to bind MPI ranks to GPUs is currently work-in-progress. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set `ZE_AFFINITY_MASK` for each MPI rank. Users are encouraged to use the `/soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh` script for instances where each MPI rank is to be bound to a single GPU tile with a round-robin assignment.

 This script can be placed just before the executable in an `mpiexec` command like so.

@@ -110,16 +112,19 @@ Users with different MPI-GPU affinity needs, such as assigning multiple GPUs/til

 ## Interactive Jobs on Compute Nodes

-Here is how to submit an interactive job to, for example, edit/build/test an application Polaris compute nodes:
-```
+Here is how to submit an interactive job to, for example, edit/build/test an application on Aurora compute nodes:
+
+```bash
 qsub -I -l select=1,walltime=1:00:00,place=scatter -A MYPROJECT -q workq
 ```

 This command requests 1 node for a period of 1 hour in the `workq` queue.
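+
+A similar request for, say, two nodes and two hours might look like the following (a sketch; substitute your own project name and queue as appropriate):
+
+```bash
+# request 2 nodes for 2 hours (interactive)
+qsub -I -l select=2,walltime=2:00:00,place=scatter -A MYPROJECT -q workq
+```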
 After waiting in the queue for a node to become available, a shell prompt on a compute node will appear. You may then start building applications and testing gpu affinity scripts on the compute node.

-**NOTE:** If you want to ```ssh``` or ```scp``` to one of your assigned compute nodes you will need to make sure your ```$HOME``` directory and your ```$HOME/.ssh``` directory permissions are both set to ```700```.
+**NOTE:** If you want to `ssh` or `scp` to one of your assigned compute nodes, you will need to make sure your `$HOME` directory and your `$HOME/.ssh` directory permissions are both set to `700`.
+
 ## Running Multiple MPI Applications on a node
+
 Multiple applications can be run simultaneously on a node by launching several `mpiexec` commands and backgrounding them. For performance, it will likely be necessary to ensure that each application runs on a distinct set of CPU resources and/or targets specific GPUs and tiles. One can provide a list of CPUs using the `--cpu-bind` option, which when combined with `ZE_AFFINITY_MASK` provides a user with specifying exactly which CPU and GPU resources to run each application on. In the simple example below, twelve instances of the application are simultaneously running on a single node. In the first instance, the application is spawning MPI ranks 0-3 on CPU cores 0-3 and using GPU 0 tile 0.

 ```bash
@@ -141,6 +146,7 @@ mpiexec -n 4 --ppn 4 --cpu-bind list:40:41:42:43 ./hello_affinity &
 wait
 ```

+
 Users will likely find it beneficial to launch processes across CPU cores in both sockets of a node.

 ## Compute Node Access to the Internet
@@ -153,7 +159,7 @@ export https_proxy="http://proxy.alcf.anl.gov:3128"
 export ftp_proxy="http://proxy.alcf.anl.gov:3128"
 ```

-#In the future, though we don't have a timeline on this because it depends on future features in slingshot and internal software development, we intend to have public IP addresses be a schedulable resource. For instance, if only your head node needed public access your select statement might looks something like: `-l select=1:pubnet=True+63`.
+In the future, though we don't have a timeline on this because it depends on future features in Slingshot and internal software development, we intend to have public IP addresses be a schedulable resource. For instance, if only your head node needed public access, your select statement might look something like: `-l select=1:pubnet=True+63`.

 ## Controlling Where Your Job Runs
 If you wish to have your job run on specific nodes form your select like this: `-l select=1:vnode=+1:vnode=...` . Obviously, that gets tedious for large jobs.
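+
+As a sketch of what that select syntax might look like for a small number of specific nodes (the vnode names below are placeholders, not real Aurora hostnames), the entries are simply chained with `+`:
+
+```bash
+# Hypothetical example: pin an interactive job to three named vnodes
+qsub -I -l walltime=30:00 -A [your_ProjectName] -q EarlyAppAccess \
+     -l select=1:vnode=NODE_A+1:vnode=NODE_B+1:vnode=NODE_C
+```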