minor updates in GPU page, and related spelling fixes
kcgthb committed Nov 27, 2023
1 parent d5d8a3a commit 0923f6a
Showing 2 changed files with 26 additions and 19 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/config/spellcheck.wordlist.txt
@@ -64,3 +64,7 @@ dotfiles
IJulia
TensorBoard
conda
CUDA
specificities
GeForce
unsatisfiable
41 changes: 22 additions & 19 deletions src/docs/user-guide/gpu.md
@@ -1,8 +1,8 @@
To support the latest computing evolutions in many fields of science, Sherlock
features a number of compute nodes with [GPUs][url_gpus] that can be used to
run a variety of GPU-accelerated applications. Those nodes are available to
everyone, but are a scarce, highly-demanded resource, so getting access to them
may require some wait time in queue.
To support the latest computing advancements in many fields of science,
Sherlock features a number of compute nodes with [GPUs][url_gpu] that can be
used to run a variety of GPU-accelerated applications. Those nodes are
available to everyone, but are a scarce, highly-demanded resource, so getting
access to them may require some wait time in queue.

!!! info "Getting your own GPU nodes"

@@ -80,7 +80,7 @@ The `gpu` partition only accepts jobs explicitly requesting GPU resources. If
they don't, they will be rejected with the following message:

``` none
$ srun -p gpu --pty bash
$ salloc -p gpu
srun: error: Unable to allocate resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
```

@@ -90,9 +90,9 @@ As for any other compute node, you can submit an interactive job and request a
shell on a GPU node with the following command:

``` none
$ srun -p gpu --gpus 1 --pty bash
srun: job 38068928 queued and waiting for resources
srun: job 38068928 has been allocated resources
$ salloc -p gpu --gpus 1
salloc: job 38068928 queued and waiting for resources
salloc: job 38068928 has been allocated resources
$ nvidia-smi --query-gpu=index,name --format=csv,noheader
0, Tesla V100-SXM2-16GB
```
@@ -142,8 +142,9 @@ job constraints could be used to satisfy the request. Multiple constraints may
be specified and combined with various operators (please refer to the official
[Slurm documentation][url_slurm_sbatch] for details).
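
As a rough illustration of combining constraints, the sketch below joins two GPU feature names with Slurm's OR operator (`|`) and prints the resulting `salloc` command rather than running it. The `join_constraints` helper is hypothetical and not part of Sherlock or Slurm; the feature names are taken from the `node_feat` output shown in this page, and whether they are available depends on the partitions you can access.

```shell
# Hypothetical helper: join its arguments with "|", Slurm's OR operator
# for --constraint. The body runs in a subshell so IFS is not changed globally.
join_constraints() (
    IFS='|'
    printf '%s\n' "$*"
)

# Feature names as listed by node_feat; adjust to what your nodes expose.
constraint=$(join_constraints GPU_SKU:TESLA_P100_PCIE GPU_SKU:TESLA_P40)

# Dry run: print the command instead of submitting a job.
printf 'salloc -p gpu --gpus 1 --constraint="%s"\n' "$constraint"
```

Printing the command first makes it easy to check the quoting, since an unquoted `|` would be interpreted by the shell as a pipe.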

The list of available features on GPU nodes can be obtained with the
`node_feat`[^node_feat] command:
The list of available features on compute nodes can be obtained with the
`node_feat`[^node_feat] command. And more specifically, to list the GPU-related
features of nodes in the `gpu` partition:

``` none
$ node_feat -p gpu | grep GPU_
@@ -155,11 +156,13 @@ GPU_SKU:TESLA_P100_PCIE
GPU_SKU:TESLA_P40
```
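
If only the values of one feature are of interest, the output can be filtered further. The following is a minimal sketch that assumes each feature is printed as `NAME:VALUE` on its own line, as in the listing above; it works on two sample lines copied from that listing so that it runs without access to the cluster.

```shell
# Sample lines copied from the node_feat listing; on a login node this
# could instead come from: node_feat -p gpu | grep GPU_
features='GPU_SKU:TESLA_P100_PCIE
GPU_SKU:TESLA_P40'

# Keep only the GPU_SKU entries and print the part after the colon.
skus=$(printf '%s\n' "$features" | awk -F: '$1 == "GPU_SKU" { print $2 }')
printf '%s\n' "$skus"
```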

`node_feat` will only list the features of nodes from partitions you have
access to, so output may vary depending on your group membership.
You can use `node_feat` without any option to list all the features of all the
nodes in all the partitions. But please note that `node_feat` will only list
the features of nodes from partitions you have access to, so output may vary
depending on your group membership.

The different characteristics[^values] of various GPU types are listed in the following
table
The different characteristics[^values] of various GPU types are listed in the
following table

| Slurm\ feature | Description | Possible values | Example job constraint |
| -------------- | ----------- | --------------- | ---------------------- |
@@ -196,8 +199,8 @@ GPU 0: Tesla P100-SXM2-16GB (UUID: GPU-4f91f58f-f3ea-d414-d4ce-faf587c5c4d4)
### GPU compute modes

By default, GPUs on Sherlock are set in the **Exclusive Process** compute
mode[^gpu_cmodes], to provide the best performance and an isolated environment
for jobs, out of the box.
mode[^gpu_compute_modes], to provide the best performance and an isolated
environment for jobs, out of the box.

Some software may require GPUs to be set to a different compute mode, for
instance to share a GPU across different processes within the same application.
@@ -350,8 +353,8 @@ user's GPU code is running.

[^node_feat]: See `node_feat -h` for more details.
[^values]: The lists of values provided in the table are non exhaustive.
[^gpu_cmodes]: The list of available GPU compute modes and relevant details are
available in the [CUDA Toolkit Documentation][url_gpu_cmodes]
[^gpu_compute_modes]: The list of available GPU compute modes and relevant
details are available in the [CUDA Toolkit Documentation][url_gpu_cmodes]


--8<--- "includes/_acronyms.md"
