feat: document A40 GPUs (#113)
holtgrewe authored Feb 25, 2024
1 parent 01f2294 commit bd935cc
Showing 8 changed files with 46 additions and 45 deletions.
58 changes: 29 additions & 29 deletions bih-cluster/docs/help/faq.md
@@ -212,49 +212,49 @@ For GPU jobs also see "My GPU jobs don't get scheduled".

## My GPU jobs don't get scheduled

-There are only four GPU machines in the cluster (with four GPUs each, med0301 to med0304).
+There are only four GPU machines in the cluster (with four GPUs each, hpc-gpu-1 to hpc-gpu-4).
Please inspect first the number of running jobs with GPU resource requests:

```bash
-res-login-1:~$ squeue -o "%.10i %20j %.2t %.5D %.4C %.10m %.16R %.13b" "$@" | grep med03 | sort -k7,7
-1902163 ONT-basecalling R 1 2 8G med0301 gpu:tesla:2
-1902167 ONT-basecalling R 1 2 8G med0301 gpu:tesla:2
-1902164 ONT-basecalling R 1 2 8G med0302 gpu:tesla:2
-1902166 ONT-basecalling R 1 2 8G med0302 gpu:tesla:2
-1902162 ONT-basecalling R 1 2 8G med0303 gpu:tesla:2
-1902165 ONT-basecalling R 1 2 8G med0303 gpu:tesla:2
-1785264 bash R 1 1 1G med0304 gpu:tesla:2
+res-login-1:~$ squeue -o "%.10i %20j %.2t %.5D %.4C %.10m %.16R %.13b" "$@" | grep hpc-gpu- | sort -k7,7
+1902163 ONT-basecalling R 1 2 8G hpc-gpu-1 gpu:tesla:2
+1902167 ONT-basecalling R 1 2 8G hpc-gpu-1 gpu:tesla:2
+1902164 ONT-basecalling R 1 2 8G hpc-gpu-2 gpu:tesla:2
+1902166 ONT-basecalling R 1 2 8G hpc-gpu-2 gpu:tesla:2
+1902162 ONT-basecalling R 1 2 8G hpc-gpu-3 gpu:tesla:2
+1902165 ONT-basecalling R 1 2 8G hpc-gpu-3 gpu:tesla:2
+1785264 bash R 1 1 1G hpc-gpu-4 gpu:tesla:2
```

-This indicates that there are two free GPUs on med0304.
+This indicates that there are two free GPUs on hpc-gpu-4.
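
For a quick tally instead of eyeballing the raw `squeue` output, a small pipeline like the following sketch (not part of the original FAQ; it assumes the `gpu:tesla:N`/`gpu:a40:N` TRES format shown above) sums the allocated GPUs per node:

```bash
# Sum the GPUs allocated per node across all running jobs in the gpu partition.
# Assumes TRES_PER_NODE values of the form "gpu:tesla:2" or "gpu:a40:1".
squeue -t RUNNING -p gpu --noheader -o "%R %b" \
  | awk -F'[ :]' '{ gpus[$1] += $NF } END { for (n in gpus) print n, gpus[n], "GPU(s) allocated" }' \
  | sort
```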

Second, inspect the node states:

```bash
-res-login-1:~$ sinfo -n med030[1-4]
+res-login-1:~$ sinfo -n hpc-gpu-[1-4]
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up 8:00:00 0 n/a
medium up 7-00:00:00 0 n/a
long up 28-00:00:0 0 n/a
critical up 7-00:00:00 0 n/a
highmem up 14-00:00:0 0 n/a
-gpu up 14-00:00:0 1 drng med0304
+gpu up 14-00:00:0 1 drng hpc-gpu-4
gpu up 14-00:00:0 3 mix med[0301-0303]
mpi up 14-00:00:0 0 n/a
```

-This tells you that med0301 to med0303 have jobs running ("mix" indicates that there are free resources, but these are only CPU cores not GPUs).
-med0304 is shown to be in "draining state".
+This tells you that hpc-gpu-1 to hpc-gpu-3 have jobs running ("mix" indicates that there are free resources, but these are only CPU cores, not GPUs).
+hpc-gpu-4 is shown to be in "draining" state.
Let's look what's going on there.

```bash hl_lines="10 18"
-res-login-1:~$ scontrol show node med0304
-NodeName=med0304 Arch=x86_64 CoresPerSocket=16
+res-login-1:~$ scontrol show node hpc-gpu-4
+NodeName=hpc-gpu-4 Arch=x86_64 CoresPerSocket=16
CPUAlloc=2 CPUTot=64 CPULoad=1.44
AvailableFeatures=skylake
ActiveFeatures=skylake
Gres=gpu:tesla:4(S:0-1)
-NodeAddr=med0304 NodeHostName=med0304 Version=20.02.0
+NodeAddr=hpc-gpu-4 NodeHostName=hpc-gpu-4 Version=20.02.0
OS=Linux 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020
RealMemory=385215 AllocMem=1024 FreeMem=347881 Sockets=2 Boards=1
State=MIXED+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
@@ -362,18 +362,18 @@ This is done by specifying a format string and using the `%b` field.
```bash
squeue -o "%.10i %9P %20j %10u %.2t %.10M %.6D %10R %b" -p gpu
JOBID PARTITION NAME USER ST TIME NODES NODELIST(R TRES_PER_NODE
-872571 gpu bash user1 R 15:53:25 1 med0303 gpu:tesla:1
-862261 gpu bash user2 R 2-16:26:59 1 med0304 gpu:tesla:4
-860771 gpu kidney.job user3 R 2-16:27:12 1 med0302 gpu:tesla:1
-860772 gpu kidney.job user3 R 2-16:27:12 1 med0302 gpu:tesla:1
-860773 gpu kidney.job user3 R 2-16:27:12 1 med0302 gpu:tesla:1
-860770 gpu kidney.job user3 R 4-03:23:08 1 med0301 gpu:tesla:1
-860766 gpu kidney.job user3 R 4-03:23:11 1 med0303 gpu:tesla:1
-860767 gpu kidney.job user3 R 4-03:23:11 1 med0301 gpu:tesla:1
-860768 gpu kidney.job user3 R 4-03:23:11 1 med0301 gpu:tesla:1
+872571 gpu bash user1 R 15:53:25 1 hpc-gpu-3 gpu:tesla:1
+862261 gpu bash user2 R 2-16:26:59 1 hpc-gpu-4 gpu:tesla:4
+860771 gpu kidney.job user3 R 2-16:27:12 1 hpc-gpu-2 gpu:tesla:1
+860772 gpu kidney.job user3 R 2-16:27:12 1 hpc-gpu-2 gpu:tesla:1
+860773 gpu kidney.job user3 R 2-16:27:12 1 hpc-gpu-2 gpu:tesla:1
+860770 gpu kidney.job user3 R 4-03:23:08 1 hpc-gpu-1 gpu:tesla:1
+860766 gpu kidney.job user3 R 4-03:23:11 1 hpc-gpu-3 gpu:tesla:1
+860767 gpu kidney.job user3 R 4-03:23:11 1 hpc-gpu-1 gpu:tesla:1
+860768 gpu kidney.job user3 R 4-03:23:11 1 hpc-gpu-1 gpu:tesla:1
```
-In the example above, user1 has one job with one GPU running on med0303, user2 has one job running with 4 GPUs on med0304 and user3 has 7 jobs in total running of different machines with one GPU each.
+In the example above, user1 has one job with one GPU running on hpc-gpu-3, user2 has one job with 4 GPUs running on hpc-gpu-4, and user3 has 7 jobs in total running on different machines with one GPU each.
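
To restrict this view to your own jobs, the same format string can be combined with `-u` (a small sketch; `$USER` is just the usual shell variable):

```bash
# Show only your own jobs in the gpu partition, including the GPUs requested per node.
squeue -u $USER -o "%.10i %9P %20j %10u %.2t %.10M %.6D %10R %b" -p gpu
```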
## How can I access graphical user interfaces (such as for Matlab) on the cluster?
@@ -579,8 +579,8 @@ export LC_ALL=C
For this, connect to the node you want to query (via SSH but do not perform any computation via SSH!)
```bash
-res-login-1:~$ ssh med0301
-med0301:~$ yum list installed 2>/dev/null | grep cuda.x86_64
+res-login-1:~$ ssh hpc-gpu-1
+hpc-gpu-1:~$ yum list installed 2>/dev/null | grep cuda.x86_64
cuda.x86_64 10.2.89-1 @local-cuda
nvidia-driver-latest-dkms-cuda.x86_64 3:440.64.00-1.el7 @local-cuda
```
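
Alternatively, `nvidia-smi` reports the GPU model and driver version without going through the package manager; a sketch using its standard query options:

```bash
# Query GPU model, driver version, and memory directly on a GPU node.
ssh hpc-gpu-1 nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```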
18 changes: 9 additions & 9 deletions bih-cluster/docs/how-to/connect/gpu-nodes.md
@@ -1,9 +1,9 @@
# How-To: Connect to GPU Nodes

-The cluster has seven nodes with four Tesla V100 GPUs each: `hpc-gpu-{1..7}`.
+The cluster has seven nodes with four Tesla V100 GPUs each: `hpc-gpu-{1..7}` and one node with 10 A40 GPUs: `hpc-gpu-8`.

Connecting to a node with GPUs is easy.
-You simply request a GPU using the `--gres=gpu:tesla:COUNT` argument to `srun` and `batch`.
+You simply request a GPU using the `--gres=gpu:$CARD:COUNT` argument (with `CARD=tesla` or `CARD=a40`) to `srun` and `sbatch`.
This will automatically place your job in the `gpu` partition (which is where the GPU nodes live) and allocate `COUNT` GPUs to your job.
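
As an illustration (a sketch added here, not part of the original how-to), a batch script requesting a single A40 GPU could look like this; memory and run time are placeholder values to adjust for your workload:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a40:1
#SBATCH --mem=10G
#SBATCH --time=04:00:00

# Smoke test: show the GPU that was allocated to the job.
nvidia-smi
```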

!!! note
@@ -80,14 +80,14 @@ Now to the somewhat boring part where we show that CUDA actually works.

```bash
res-login-1:~$ srun --gres=gpu:tesla:1 --pty bash
-med0301:~$ nvcc --version
+hpc-gpu-1:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
-med0301:~$ source ~/miniconda3/bin/activate
-med0301:~$ conda activate gpu-test
-med0301:~$ python -c 'import torch; print(torch.cuda.is_available())'
+hpc-gpu-1:~$ source ~/miniconda3/bin/activate
+hpc-gpu-1:~$ conda activate gpu-test
+hpc-gpu-1:~$ python -c 'import torch; print(torch.cuda.is_available())'
True
```
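
If you want more detail than a bare `True`, the device count and name can be printed as well (a sketch; it assumes the same `gpu-test` conda environment as above):

```bash
hpc-gpu-1:~$ python -c 'import torch; print(torch.cuda.device_count(), torch.cuda.get_device_name(0))'
```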

@@ -103,16 +103,16 @@ Use `squeue` to find out about currently queued jobs (the `egrep` only keeps the
```bash
res-login-1:~$ squeue | egrep -iw 'JOBID|gpu'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
-33 gpu bash holtgrem R 2:26 1 med0301
+33 gpu bash holtgrem R 2:26 1 hpc-gpu-1
```

## Bonus #2: Is the GPU running?

To find out how active the GPU nodes actually are, you can connect to the nodes (without allocating a GPU you can do this even if the node is full) and then use `nvidia-smi`.

```bash
-res-login-1:~$ ssh med0301 bash
-med0301:~$ nvidia-smi
+res-login-1:~$ ssh hpc-gpu-1 bash
+hpc-gpu-1:~$ nvidia-smi
Fri Mar 6 11:10:08 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
2 changes: 1 addition & 1 deletion bih-cluster/docs/how-to/software/tensorflow.md
@@ -4,7 +4,7 @@ TensorFlow is a package for deep learning with optional support for GPUs.
You can find the original TensorFlow installation instructions [here](https://www.tensorflow.org/install).

This article describes how to set up TensorFlow with GPU support using Conda.
-This how-to assumes that you have just connected to a GPU node via `srun --mem=10g --partition=gpu --gres=gpu:tesla:1 --pty bash -i`.
+This how-to assumes that you have just connected to a GPU node via `srun --mem=10g --partition=gpu --gres=gpu:tesla:1 --pty bash -i` (for the Tesla V100 GPUs; for the A40 GPUs use `--gres=gpu:a40:1`).
Note that you will need to allocate "enough" memory, otherwise your python session will be `Killed` because of too little memory.
You should read the [How-To: Connect to GPU Nodes](../../how-to/connect/gpu-nodes/) tutorial for an explanation of how to do this and to learn how to register for GPU usage.
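
Once the Conda environment from this article is set up, a quick check that TensorFlow actually sees the GPU could look like the following sketch (`tf.config.list_physical_devices` is TensorFlow's standard call for listing devices):

```bash
python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
```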

5 changes: 3 additions & 2 deletions bih-cluster/docs/overview/architecture.md
@@ -9,8 +9,9 @@ A cluster system bundles a high number of nodes and in the case of HPC, the focu

- approx. 256 nodes (from three generations),
- 4 high-memory nodes (2 nodes with 512 GB RAM, 2 nodes with 1 TB RAM),
-- 7 GPU nodes (with 4 Tesla GPUs each), and
-- a high-perfomance parallel GPFS files system.
+- 7 GPU nodes with 4 Tesla GPUs each, 1 GPU node with 10 A40 GPUs,
+- a high-performance Tier 1 parallel CephFS file system with a larger but slower Tier 2 CephFS file system, and
+- a legacy parallel GPFS file system.

### Network Interconnect

2 changes: 1 addition & 1 deletion bih-cluster/docs/overview/for-the-impatient.md
@@ -18,7 +18,7 @@ The cluster consists of the following major components:
- a scheduling system using Slurm,
- 228 general purpose compute nodes `hpc-cpu-{1..228}`,
- a few high memory nodes `hpc-mem-{1..4}`,
-- 7 nodes with 4 Tesla V100 GPUs each (!) `hpc-gpu-{1..7}`,
+- 7 nodes with 4 Tesla V100 GPUs each (!) `hpc-gpu-{1..7}` and 1 node with 10x A40 GPUs (!) `hpc-gpu-8`,
- a high-performance, parallel GPFS file system with 2.1 PB, by DDN mounted at `/fast`,
- a tier 2 (slower) storage system based on Ceph/CephFS

2 changes: 1 addition & 1 deletion bih-cluster/docs/overview/job-scheduler.md
@@ -75,7 +75,7 @@ See [Resource Registration: GPU Nodes](../admin/resource-registration.md#gpu-nodes)

* **maximum run time:** 14 days
* **partition name:** `gpu`
-* **argument string:** select `$count` GPUs: `-p gpu --gres=gpu:tesla:$count`, maximum run time: `--time 14-00:00:00`
+* **argument string:** select `$count` GPUs: `-p gpu --gres=gpu:$card:$count` (`card=tesla` or `card=a40`), maximum run time: `--time 14-00:00:00`
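
Putting the argument string together, an interactive allocation on the `gpu` partition might look like the following sketch (the values for `$card` and `$count` are chosen purely for illustration):

```bash
# Two Tesla V100 GPUs, interactive shell, maximum run time of the partition.
srun -p gpu --gres=gpu:tesla:2 --time 14-00:00:00 --pty bash -i
```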

### `highmem`

2 changes: 1 addition & 1 deletion bih-cluster/docs/slurm/commands-sbatch.md
@@ -34,7 +34,7 @@ The command will create a batch job and add it to the queue to be executed at a
As you can define minimal and maximal number of tasks/CPUs/cores, you could also specify `--mem-per-cpu` and get more flexible scheduling of your job.
- `--gres`
-- Generic resource allocation.
-On the BIH HPC, this is only used for allocating GPUS, e.g., with `--gres=gpu:tesla:2`, a user could allocate two NVIDIA Tesla GPUs on the same hsot.
+On the BIH HPC, this is only used for allocating GPUs, e.g., with `--gres=gpu:tesla:2`, a user could allocate two NVIDIA Tesla GPUs on the same host (use `a40` instead of `tesla` for the A40 GPUs).
- `--licenses`
-- On the BIH HPC, this is used for the allocation of MATLAB 2016b licenses only.
- `--partition`
2 changes: 1 addition & 1 deletion bih-cluster/docs/slurm/rosetta-stone.md
@@ -45,4 +45,4 @@ The table below shows some SGE commands and their Slurm equivalents.
| allocate memory | `-l h_vmem=size` | `--mem=mem` OR `--mem-per-cpu=mem` |
| wait for job | `-hold_jid jid` | `--depend state:job` |
| select target host | `-l hostname=host1\|host1` | `--nodelist=nodes` AND/OR `--exclude` |
-| allocate GPU | `-l gpu=1` | `--gres=gpu:tesla:count` |
+| allocate GPU | `-l gpu=1` | `--gres=gpu:tesla:count` or `--gres=gpu:a40:count` |
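
For example, the last row translates as follows (a sketch; `job.sh` is a placeholder script name):

```bash
# SGE (legacy):
#   qsub -l gpu=1 job.sh
# Slurm on the BIH HPC:
sbatch --gres=gpu:tesla:1 job.sh    # or --gres=gpu:a40:1 for the A40 node
```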
