From 8b1806008f198cdcc194e8631374de34dd8d0037 Mon Sep 17 00:00:00 2001
From: Thomas Sell
Date: Tue, 26 Mar 2024 13:15:25 +0100
Subject: [PATCH] more cleanup and re-arrangements

---
 .../docs/best-practice/project-structure.md   |  3 +-
 .../software-installation-with-conda.md       | 82 +++++++----------
 bih-cluster/docs/connecting/connecting.md     | 17 +++-
 bih-cluster/docs/connecting/from-external.md  | 22 ++---
 bih-cluster/docs/help/faq.md                  | 57 +++---------
 bih-cluster/docs/help/hpc-talk.md             | 13 ++-
 bih-cluster/docs/index.md                     |  8 +-
 .../docs/overview/for-the-impatient.md        | 90 +------------------
 bih-cluster/docs/slurm/overview.md            |  9 +-
 bih-cluster/docs/storage/home-quota.md        | 45 ++++++++++
 bih-cluster/docs/storage/storage-locations.md | 16 ++--
 bih-cluster/mkdocs.yml                        |  5 +-
 12 files changed, 146 insertions(+), 221 deletions(-)
 create mode 100644 bih-cluster/docs/storage/home-quota.md

diff --git a/bih-cluster/docs/best-practice/project-structure.md b/bih-cluster/docs/best-practice/project-structure.md
index d4b729cb2..c02a36799 100644
--- a/bih-cluster/docs/best-practice/project-structure.md
+++ b/bih-cluster/docs/best-practice/project-structure.md
@@ -1,6 +1,7 @@
 # Project File System Structure
 
-This Wiki page dscribes best pratices for managing your Bioinformatics projects on the file system.
+!!! warning "Under Construction"
+    This guide was written for the old GPFS file system and is in the process of being updated.
 
 ## General Aims
 
diff --git a/bih-cluster/docs/best-practice/software-installation-with-conda.md b/bih-cluster/docs/best-practice/software-installation-with-conda.md
index 86138660b..7437ee284 100644
--- a/bih-cluster/docs/best-practice/software-installation-with-conda.md
+++ b/bih-cluster/docs/best-practice/software-installation-with-conda.md
@@ -1,14 +1,9 @@
 # Software Installation with Conda
-
 ## Conda
-
-For the management of the bioinformatics software on the BIH cluster we are using conda.
-Conda is a package management system that is based on channels, and one of those
-channels provides a huge selection of bioinformatics software.
-
-Conda is written in Python and is based on recipes, such that everybody can
-write recipes for missing software (if there is any). In general the packages
-are pre-compiled and conda just downloads the binaries from the conda servers.
+Users do not have the rights to install system packages on the BIH HPC cluster.
+For the management of bioinformatics software we therefore recommend using the conda package manager.
+Conda provides software in different “channels” and one of those channels contains a huge selection of bioinformatics software (bioconda).
+Generally packages are pre-compiled and conda just downloads the binaries from the conda servers.
 
 You are in charge of managing your own software stack, but conda makes it easy
 to do so. We will provide you with a description on how to install conda and how
@@ -16,69 +11,58 @@ to use it. Of course there are many online resources that you can also use.
 Please find a list at the end of the document.
 
 Also note that some system-level software is managed through environment modules.
-See [System-near Software Provided by HPC Administration](#system-near-software-provided-by-hpc-administration) below.
 
 ## Premise
-
 When you logged into the cluster, please make sure that you also executed `srun` to log into a
 computation node and perform the software installation there.
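+
+If you are unsure whether you are already on a compute node, `hostname` tells you (output is illustrative; compute nodes are named `hpc-cpu-*`, login nodes `hpc-login-*`):
+
+```bash
+$ hostname
+hpc-cpu-123
+```
+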
 ## Installing conda
 
 ```bash
 hpc-login-1:~$ srun --mem=5G --pty bash -i
-med0127:~$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
-med0127:~$ bash Miniconda3-latest-Linux-x86_64.sh -b -f -p $HOME/work/miniconda
+hpc-cpu-123:~$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+hpc-cpu-123:~$ bash Miniconda3-latest-Linux-x86_64.sh -b -f -p $HOME/work/miniconda
+hpc-cpu-123:~$ conda init
 ```
 
 This will install conda to `$HOME/work/miniconda`. This path can be changed to your liking.
-Please note that the `$HOME` folder has limited space (an exception is the subfolder `$HOME/work` which has no space limit).
-
-NB: `$HOME/scratch` is not appropriate as files placed there will be removed automatically after 2 weeks.
-
-To make it available upon login, extend and export the `$PATH` variable with the
-installation path + `/bin` and add it to your `$HOME/.bashrc`:
-
-```bash
-case "${SLURMD_NODENAME-${HOSTNAME}}" in
-    login-*)
-        ;;
-    *)
-        export PATH=$HOME/work/miniconda/condabin:$PATH
-        ;;
-esac
-```
+Please note that the `$HOME` folder has limited space, [more about that here](../storage/home-quota.md).
 
-The above code makes sure that you don't have conda available on the login nodes,
-where you are not allowed to start any computations.
+!!! note
+    `$HOME/scratch` is not appropriate as files placed there will be removed automatically after 2 weeks.
 
 To make bioinformatics software available, we have to add the `bioconda` and
 some other channels to the conda configuration:
 
 ```bash
-med0127:~$ conda config --add channels bioconda
-med0127:~$ conda config --add channels default
-med0127:~$ conda config --add channels conda-forge
+hpc-cpu-123:~$ conda config --add channels bioconda
+hpc-cpu-123:~$ conda config --add channels defaults
+hpc-cpu-123:~$ conda config --add channels conda-forge
 ```
 
-You can also add channels to your liking.
+!!! warning "Important"
+    By default conda will automatically activate the (base) environment for new shell sessions.
+    This can significantly delay your login process and should be turned off:
 
-## Installing software with conda
+    ```sh
+    hpc-cpu-123:~$ conda config --set auto_activate_base false
+    ```
+
+## Installing software with conda
 
 Installing packages with conda is straight forward:
 
 ```bash
-med0127:~$ conda install <package>
+hpc-cpu-123:~$ conda install <package>
 ```
 
-This will install a package into the conda root environment. We will explain
-environments in detail in the next section.
-
+This will install a package into the conda base environment.
+We will explain environments in detail in the next section.
 To search for a package, e.g. to find the correct name in conda or if it exists
 at all, issue the command:
 
 ```bash
-med0127:~$ conda search <package>
+hpc-cpu-123:~$ conda search <package>
 ```
 
 To choose a specific version (conda will install the latest version that is
@@ -86,7 +70,7 @@ compatible with the current installed Python version), you can provide the
 version as follows:
 
 ```bash
-med0127:~$ conda install <package>=<version>
+hpc-cpu-123:~$ conda install <package>=<version>
 ```
 
 Please note that new conda installs may ship with a recently update Python version and not all packages might have been adapted.
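+
+For example, to request a specific version of a specific package (`samtools` and the version number are only an illustration; any package available in the configured channels works the same way):
+
+```bash
+hpc-cpu-123:~$ conda install samtools=1.19
+```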
@@ -102,13 +86,13 @@ E.g., if you find out that some packages don't work after starting out/upgrading Simply run ```bash - med0127:~$ conda install mamba + hpc-cpu-123:~$ conda install mamba ``` With that, you can install software into your environment using the same syntax as for Conda: ```bash - med0127:~$ mamba install + hpc-cpu-123:~$ mamba install ``` ## Creating an environment @@ -125,9 +109,9 @@ environment, is is available in all other environments. To create a Python 2.7 environment and activate it, issue the following commands: ```bash -med0127:~$ conda create -n py27 python=2.7 -med0127:~$ source activate py27 -(py27) med0127:~$ +hpc-cpu-123:~$ conda create -n py27 python=2.7 +hpc-cpu-123:~$ source activate py27 +(py27) hpc-cpu-123:~$ ``` From now on, conda will install packages into the `py27` environment when you issue @@ -135,8 +119,8 @@ the `install` command. To switch back to the root environment, simply deactivate `py27` environment: ```bash -(py27) med0127:~$ source deactivate py27 -med0127:~$ +(py27) hpc-cpu-123:~$ source deactivate py27 +hpc-cpu-123:~$ ``` But of course, as Python 2.7 is not supported any more by the Python Software Foundation, you should switch over to Python 3 already! diff --git a/bih-cluster/docs/connecting/connecting.md b/bih-cluster/docs/connecting/connecting.md index f9a2ad010..a6d9dc70a 100644 --- a/bih-cluster/docs/connecting/connecting.md +++ b/bih-cluster/docs/connecting/connecting.md @@ -23,6 +23,7 @@ Follow these steps to connect to BIH HPC via the command line: # Charite Users $ ssh user_c@hpc-login-1.cubi.bihealth.org $ ssh user_c@hpc-login-2.cubi.bihealth.org + # MDC Users $ ssh user_m@hpc-login-1.cubi.bihealth.org $ ssh user_m@hpc-login-2.cubi.bihealth.org @@ -37,8 +38,15 @@ Follow these steps to connect to BIH HPC via the command line: Please also read [Advanced SSH](./advanced-ssh/overview.md) for more custom scenarios how to connect to BIH HPC. If you are using a Windows PC to access BIH HPC, please read [Connecting via SSH on Windows](./connecting-windows.md) -5. Bonus: [Configure your SSH client :wrench: on Linux and Mac](advanced-ssh/linux.md) or [Windows](advanced-ssh/windows.md). -6. Bonus: [Connect from external networks :flying_saucer:](./from-external.md). +5. Allocate resources on a computation node using [Slurm](../slurm/overview.md). Do not compute on the login node! + + ```bash + # Start interactive shell on computation node + $ srun --pty bash -i + ``` + +6. Bonus: [Configure your SSH client :wrench: on Linux and Mac](advanced-ssh/linux.md) or [Windows](advanced-ssh/windows.md). +7. Bonus: [Connect from external networks :flying_saucer:](./from-external.md). !!! 
tip "tl;dr" @@ -49,10 +57,11 @@ Follow these steps to connect to BIH HPC via the command line: # Interactive login (choose one) ssh username@hpc-login-1.cubi.bihealth.org ssh username@hpc-login-2.cubi.bihealth.org + srun --pty bash -i # File Transfer (choose one) - sftp username@hpc-transfer-1.cubi.bihealth.org - sftp username@hpc-transfer-2.cubi.bihealth.org + sftp local/file username@hpc-transfer-1.cubi.bihealth.org:remote/file + sftp username@hpc-transfer-2.cubi.bihealth.org:remote/file local/file # Interactive login into the transfer nodes (choose one) ssh username@hpc-transfer-1.cubi.bihealth.org diff --git a/bih-cluster/docs/connecting/from-external.md b/bih-cluster/docs/connecting/from-external.md index d86e84d30..9971c8bcd 100644 --- a/bih-cluster/docs/connecting/from-external.md +++ b/bih-cluster/docs/connecting/from-external.md @@ -48,17 +48,17 @@ $ ssh user_m@hpc-login-1.cubi.bihealth.org ``` ## Charité Users -You will then have to apply for (1) general VPN access and (2) extended VPN access to BIH HPC. -Finally, you will be able to connect to BIH HPC from VPN. +Access to BIH HPC from external networks (including Eduroam) requires a Charité VPN connection with special access permissions. -### General Charite VPN Access -You need to apply for Charite VPN access, if you haven't done so already. +### General Charité VPN Access +You need to apply for general Charité VPN access if you haven't done so already. The form can be found in the [Charite Intranet](https://intranet.charite.de/fileadmin/user_upload/portal/service/service_06_geschaeftsbereiche/service_06_14_it/VPN-Antrag_Mitarb_Stud.pdf) and contains further instructions. +[Charité IT Helpdesk](mailto:helpdesk@charite.de) can help you with any questions. -### Zusatzantrag B (Recommended) -You can find [Zusatzantrag B](https://intranet.charite.de/fileadmin/user_upload/portal/service/service_06_geschaeftsbereiche/service_06_14_it/VPN-Zusatzantrag_B.pdf) in the Charite intranet. -Fill it out and ship it in addition to the general VPN access form from above. -[Charite Helpdesk](mailto:helpdesk@charite.de) can help you with any questions. +### Zusatzantrag B +Special permissions form B is also required for HPC access. +You can find [Zusatzantrag B](https://intranet.charite.de/fileadmin/user_upload/portal/service/service_06_geschaeftsbereiche/service_06_14_it/VPN-Zusatzantrag_B.pdf) in the Charité intranet. +Fill it out and send it to the same address as the general VPN access form above. Once you have been granted VPN access, start the client and connect to VPN. You will then be able to connect from your client in the VPN just as you do from your workstation. @@ -67,9 +67,9 @@ You will then be able to connect from your client in the VPN just as you do from $ ssh jdoe_c@hpc-login-1.cubi.bihealth.org ``` -### Charite VDI -Alternative to using Zusatzantrag B, you can also get access to the Charite VDI (Virtual Desktop Infrastructure). -Here, you connect to a virtual desktop computer which is in the Charite network. +### Charité VDI (Not recommended) +Alternative to using Zusatzantrag B, you can also get access to the Charité VDI (Virtual Desktop Infrastructure). +Here, you connect to a virtual desktop computer which is in the Charité network. From there, you can connect to the BIH HPC system. You need to apply for extended VPN access to be able to access the BIH VDI. 
diff --git a/bih-cluster/docs/help/faq.md b/bih-cluster/docs/help/faq.md index 3fcb423eb..0a07529e3 100644 --- a/bih-cluster/docs/help/faq.md +++ b/bih-cluster/docs/help/faq.md @@ -1,12 +1,6 @@ # Frequently Asked Questions -## What is this Website? - -This is the BIH cluster documentation that was created and is maintained by BIH Core Unit Bioinformatics (CUBI) and BIH HPC IT with contributions by BIH HPC Users. -The aim is to gather the information for using the cluster efficiently and helping common issues andproblems. - ## Where can I get help? - - Talk to your colleagues! - Have a look at our forums at [HPC-talk](https://hpc-talk.cubi.bihealth.org/) to see if someone already solved the same problem. If not, create a new topic. Administrators, CUBI, and other users can see and answer your question. @@ -14,12 +8,20 @@ The aim is to gather the information for using the cluster efficiently and helpi - For problems with BIH HPC please contact [hpc-helpdesk@bih-charite.de]. ## I cannot connect to the cluster. What's wrong? - Please see the section [Connection Problems](../connecting/connection-problems.md). -## I'd like to learn more about Slurm +## Connecting to the cluster takes a long time. +The most probable cause for this is a conda installation which defaults to loading the _(Base)_ environment on login. +To disable this behaviour you can run: -- Some documentation is available on this website, e.g., start at [Slurm Quickstart](../slurm/quickstart.md). +```sh +$ conda config --set auto_activate_base false +``` + +You can also run the bash shell in verbose mode to find out exactly which command is slowing down login: +```sh +$ ssh user@hpc-login-1.cubi.bihealth.org bash -iv +``` ## What is the difference between MAX and BIH cluster? What is their relation? @@ -317,12 +319,7 @@ This is probably answered by the answer to [My jobs don't run in the partition I You cannot. -## Why can I not mount a network volume from elsewhere on the cluster? - -For performance and stability reasons. -Network volumes are notorious for degrading performance, depending on the used protocol, even stability. - -## How can I then access the files from my workstation/server? +## How can I make workstation/server files available to the HPC? You can transfer files to the cluster through Rsync over SSH or through SFTP to the `hpc-transfer-1` or `hpc-transfer-2` node. @@ -337,11 +334,6 @@ E.g., use the `-march=sandybridge` argument to the GCC/LLVM compiler executables If you absolutely need it, there are some boxes with more recent processors in the cluster (e.g., Haswell architecture). Look at the `/proc/cpuinfo` files for details. -## Where should my (Mini)conda install go? - -As conda installations are big and contain many files, they should go into your `work` directory. -**E.g., `/fast/users/$USER/work/miniconda` is appropriate.** - ## I have problems connecting to the GPU node! What's wrong? Please check whether there might be other jobs waiting in front of you! @@ -414,7 +406,6 @@ You can see the assignment of architectures to nodes using the `sinfo -o "%8P %. This will also display node partition, availability etc. ## Help, I'm getting a Quota Warning Email! - No worries! 
As documented in the [Storage Locations](../storage/storage-locations.md) section, each user/project/group has three storage volumes: @@ -433,22 +424,10 @@ Use the following command to list **all** files and directories in your home: hpc-login-1:~$ ls -la ~/ ``` -You can use the following command to see how space each item takes up, including the hidden directories - -```bash -hpc-login-1:~$ du -shc ~/.* ~/* --exclude=.. --exclude=. -``` - -In the case that, e.g., the `.cpan` directory is large, you can move it to `work` and create a symlink in its original place. - -```bash -hpc-login-1:~$ mv ~/.cpan ~/work/.cpan -hpc-login-1:~$ ln -sr ~/work/.cpan ~/.cpan -``` +For more information on how to keep your home directory clean and avoid quota warnings, please read [Home Folder Quota](../storage/home-quota.md). ## I'm getting a "Disk quota exceeded" error. - -Most probably you are running into the same problem as described and solved in the entry [Help, I'm getting a Quota Warning Email!](#help-im-getting-a-quota-warning-email) +Most probably you are running into the same problem as described above: [Help, I'm getting a Quota Warning Email!](#help-im-getting-a-quota-warning-email) ## Environment modules don't work and I get "module: command not found" @@ -637,14 +616,6 @@ slurmstepd: error: task[0] unable to set taskset '0x0' This is a minor failure related to Slurm and cgroups. Your job **should** run through successfully despite this error (that is more of a warning for end-users). -## My login stalls / something weird is happening - -You can try to run `ssh -l USER hpc-login-1.cubi.bihealth.org bash -iv`. -This will run `bash -iv` instead of the normal login shell. -The parameter `-i` is creating an interactive shell (which is what you want) and `-v` to see every command that is executed. -This way you will see **every command** that is executed. -You will also be able to identify at which point there is any stalling (e.g., activating conda via `source .../conda` when the fiel system is slow). - ## How can I share files/collaborate with users from another work group? Please use [projects as documented here](../admin/getting-access.md#projects). diff --git a/bih-cluster/docs/help/hpc-talk.md b/bih-cluster/docs/help/hpc-talk.md index baeef68b0..4dfd16b0d 100644 --- a/bih-cluster/docs/help/hpc-talk.md +++ b/bih-cluster/docs/help/hpc-talk.md @@ -1,13 +1,12 @@ # HPC Talk -Another community-driven possibility to get help is our HPC Talk portal. After this manual, it should be the first place to consult. +Another community-driven possibility to get help is our “HPC Talk” forum. After this manual, it should be the first place to consult. https://hpc-talk.cubi.bihealth.org/ -For those who are familiar with it, it resembles the concept of stack overflow, only without the voting. +Its main purpose is to serve as a FAQ, so with time and more people participating, you will more likely find an answer to your question. +We also use it to make announcements and give an up-to-date status of current problems with the cluster, so it is worth logging in every once in a while. +It is also a great first place to look at if you're experiencing problems with the cluster. +Maybe it's a known issue. -Its main purpose is to server as a FAQ, so with time and more people participating, you will more likely find an answer to your question. 
- -Also, we use it to make announcements and give an up-to-date status of current problems with the cluster, so it is worth to look at it every once in a while, or the first place to look at if you experience problems with the cluster to see if this is a known issue. - -Despite users also being able to anser questions, our admins do participate on a regular basis. +Despite users also being able to answer questions, our admins do participate on a regular basis. diff --git a/bih-cluster/docs/index.md b/bih-cluster/docs/index.md index ce7ae1b69..3bb2ff581 100644 --- a/bih-cluster/docs/index.md +++ b/bih-cluster/docs/index.md @@ -19,11 +19,9 @@ Read the following set of pages (in order) to learn how to get access and connec 1. [Getting Access](admin/getting-access.md) 2. [Connecting](connecting/connecting.md) 3. [Storage](storage/storage-locations.md) -4. [Getting Help](help/hpc-talk.md) ([Writing Good Tickets](help/good-tickets.md); if no answer found, contact the [HPC Helpdesk](help/helpdesk.md)). -5. [HPC Tutorial](hpc-tutorial/episode-0.md) - -Then, continue reading through the manual. - +5. [Slurm](slurm/overview.md) +6. [Getting Help](help/hpc-talk.md) ([Writing Good Tickets](help/good-tickets.md); if no answer found, contact the [HPC Helpdesk](help/helpdesk.md)). +7. [HPC Tutorial](hpc-tutorial/episode-0.md) !!! note "Acknowledging BIH HPC Usage" Acknowledge usage of the cluster in your manuscript as *"Computation has been performed on the HPC for Research/Clinic cluster of the Berlin Institute of Health"*. diff --git a/bih-cluster/docs/overview/for-the-impatient.md b/bih-cluster/docs/overview/for-the-impatient.md index 89f8ba004..29106987e 100644 --- a/bih-cluster/docs/overview/for-the-impatient.md +++ b/bih-cluster/docs/overview/for-the-impatient.md @@ -1,10 +1,6 @@ -# For the Impatient - -This document describes the fundamentals of using the BIH cluster in a very terse manner. - +# Overview ## HPC 4 Research - -**HPC 4 Research** is located in the BIH data center in Buch and connected via the BIH research networks. +**HPC 4 Research** is located in the BIH data center in Buch and connected via the BIH research network. Connections can be made from Charite, MDC, and BIH networks. The cluster is open for users with either Charite or MDC accounts after [getting access through the gatekeeper proces](../admin/getting-access.md). The system has been designed to be suitable for the processing of human genetics data from research contexts (and of course data without data privacy concerns such as public and mouse data). @@ -16,10 +12,11 @@ The cluster consists of the following major components: - 2 login nodes for users `hpc-login-1` and `hpc-login-2` (for interactive sessions only), - 2 nodes for file transfers `hpc-transfer-1` and `hpc-transfer-2`, - a scheduling system using Slurm, -- 228 general purpose compute nodes `hpc-cpu-{1..228} +- 228 general purpose compute nodes `hpc-cpu-{1..228}` - a few high memory nodes `hpc-mem-{1..4}`, - 7 nodes with 4 Tesla V100 GPUs each (!) `hpc-gpu-{1..7}` and 1 node with 10x A40 GPUs (!) `hpc-gpu-8`, - a high-performance, parallel GPFS file system with 2.1 PB, by DDN mounted at `/fast`, +- a next generation high-performance storage system based on Ceph/CephFS - a tier 2 (slower) storage system based on Ceph/CephFS This is shown by the following picture: @@ -76,82 +73,3 @@ This addresses a lot of suboptimal (yet not dangerous, of course) points we obse The I/O system might get overloaded and saving scripts might take some time. 
We know of people who do this and it works for them. Your mileage might vary. - -## Locations on the Cluster - -- Your home directory is located in `/fast/users/$USER`. - **Your home is for scripts, source code, and configuration only.** - **Use your `work` directory for large files.** - **The quota in the `home` directory is 1 GB but we have nightly snapshots and backups thereof.** -- Your work directory is located in `/fast/users/$USER/work`. - This is where you should place large files. - Files in this location do not have snapshots or backups. -- The directory (actually a GPFS file set) `/fast/users/$USER/scratch` should be used for temporary data. - **All data placed there will be removed after 2 weeks.** -- If you are part of an AG/lab working on the cluster, the group directory is in `/fast/groups/$AG`. -- Projects are located in `/fast/projects/$PROJECT`. - -!!! important "So-called dot files/directories filling up your home?" - - Files and directories starting with a dot "`.`" are not shown with the "ls" command. - May users run into problems with directories such as `$HOME/.local` but also non-dot directories such as `$HOME/R` filling up their storage. - You should move such large directories to your work volume and only keep a symlink in your `$HOME`. - - Here is how you find large directories: - - ```bash - host:~$ du -shc ~/.* ~/* --exclude=.. --exclude=. - ``` - - Here is how you move them to your work and replace them with a symlink, e.g., for `~/.local`: - - ```bash - host:~$ mv ~/.local ~/work/.local - host:~$ ln -s ~/work/.local ~/.local - ``` - - Also see the [related FAQ entry](../help/faq.md#help-im-getting-a-quota-warning-email). - -### Temporary Directories - -Note that you also have access to `/tmp` on the individual nodes but the disk is **small** and might be a **slow** spinning disk. -If you are processing large NGS data, we recommend you create `/fast/users/$USER/scratch/tmp` and set the environment variable `TMPDIR` to point there. -However, for creating locks special Unix files such as sockets or fifos, `/tmp` is the right place. -**Note that files placed in your `scratch` directory will be removed automatically after 2 weeks.** -**Do not place any valuable files in there.** - -## First Steps on the Cluster - -### Connecting to the Cluster - -- From the Charite, MDC, and BIH networks, you can connect to the cluster login nodes `hpc-login-{1,2}.cubi.bihealth.org`. - - For Charite users, your name is `${USER}_c`, for MDC users, your account is `${USER}_m` where `$USER` is the login name of your primary location. -- From the outside, **for MDC users**, the cluster is accessible via `ssh1.mdc-berlin.de` (you need to enable SSH key agent forwarding for this) - - Note that you have to use your MDC user name (without any suffix `_m`) for connecting to this host. - - Also note that BIH HPC IT does not have control over `ssh1.mdc-berlin.de`. - *You have to contact MDC IT in case of any issues.* -- From the outside, **for Charite** users, there is no SSH hop node. - Instead, you have to apply for VPN through Charite Geschäftsbereich IT. - You can use [this form availble in Charite Intranet](https://intranet.charite.de/fileadmin/user_upload/portal/service/service_06_geschaeftsbereiche/service_06_14_it/VPN-Zusatzantrag_O.pdf) for this. - Please refer to the Charite intranet or helpdesk@charite.de for more information. -- Also consider using the [OnDemand Portal](../ondemand/overview.md) at https://hpc-portal.cubi.bihealth.org. 
- -### Connecting to Compute Node through Login Node - -After logging into the cluster, you are on the login node `hpc-login-1` or `hpc-login-2`. -When transferring files, use the `hpc-transfer-1` or `hpc-transfer-2` nodes. -You should not do computation or other work on the login or file transfer nodes, but use the compute nodes instead. -Typically, you'll create an interactive session on a compute node using the `srun` command. - -### Submitting Jobs - -While not recommended, you can perform computations (such as using BWA) in the interactive session. -However, when the connection is interrupted, your computation process will be stopped. -It is therefore recommended you submit jobs using the `sbatch` command (or [use screen or tmux](../best-practice/screen-tmux.md)). - -Details on how to use Slurm `srun`, `sbatch`, and and other commands can be found in the [Cluster Scheduler](../slurm/overview.md) section. - -### Inspecting Jobs and the Cluster - -You can inspect your currently running jobs with `squeue`, and kill them using `scancel`. -You can inspect jobs that have finished with `sacct`, and see the cluster nodes using `sinfo`. diff --git a/bih-cluster/docs/slurm/overview.md b/bih-cluster/docs/slurm/overview.md index 369cce608..852c2c817 100644 --- a/bih-cluster/docs/slurm/overview.md +++ b/bih-cluster/docs/slurm/overview.md @@ -1,13 +1,12 @@ # Scheduling Overview - -The BIH HPC uses the [Slurm](https://slurm.schedmd.com/overview.html) scheduling system. -This section of the manual attempts to give an overview of what scheduling is and how you can use the Slurm scheduler. +The BIH HPC uses the [Slurm](https://slurm.schedmd.com/overview.html) scheduling system for resource allocation. +This section of the manual attempts to give an overview of what scheduling is and how to use the Slurm scheduler. For more detailed information, you will have to refer to the [Slurm website](https://slurm.schedmd.com/overview.html) and the Slurm man pages (e.g., by entering `man sbatch` or `man srun` on the HPC terminal's command line). For a quick introduction and hands-on examples, please see the manual sections -- Overview, starting with [For the Impatient](../overview/for-the-impatient.md), and -- First Steps/Tutorial, starting with [Episode 0](../hpc-tutorial/episode-0.md). +- Overview, starting with [Slurm Quickstart](./quickstart.md), and +- HPC Tutorial, starting with [Episode 0](../hpc-tutorial/episode-0.md). Also, make sure that you are aware of our [How-To: Debug Software](../how-to/misc/debug-software.md) and [How-To: Debug Software on HPC Systems](../how-to/misc/debug-at-hpc.md) guides in the case that something goes wrong. diff --git a/bih-cluster/docs/storage/home-quota.md b/bih-cluster/docs/storage/home-quota.md new file mode 100644 index 000000000..9c7c0cc0d --- /dev/null +++ b/bih-cluster/docs/storage/home-quota.md @@ -0,0 +1,45 @@ +# Keeping your home folder clean +We set quite restrictive quotas for user homes, but in exchange you get file system [snapshots and mirroring](./storage-locations.md#snapshots-and-mirroring). +Your home folder should therefore only be used for scripts, your user config, and other small files. +Everything else should be stored in the `work` or `scratch` subdirectories, which effectively link to your group's shared storage space. +This document describes some common pitfalls and how to circumvent them. + +!!! hint + The tilde character (`~`) is shorthand for your home directory. 
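+
+A quick way to check what is using up space in your home and where the `work` and `scratch` links point (a generic sketch; the exact paths and sizes on your account will differ):
+
+```bash
+$ ls -l ~/work ~/scratch   # show the symlink targets
+$ du -sh ~/                # total size of your home directory
+```
+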
+## Code libraries and other big folders
+Various programs tend to deposit large folders in a user's home directory and can quickly use up your allotted storage quota.
+These include:
+
+- Python: `~/.local/lib/python*`
+- *conda: Location chosen by the user.
+- R: `~/R/x86_64-pc-linux-gnu-library`
+- [HPC portal](../ondemand/overview.md): `~/ondemand`
+
+Please note that directories whose names start with a dot are not shown by the normal `ls` command, but require the `ls -a` flag. You can search your home folder for large directories like so:
+```bash
+$ du -shc ~/.* ~/* --exclude=.. --exclude=.
+```
+
+You should move these locations to your `work` folder and create symbolic links in their place.
+Conda installations should be placed in `work` from the very beginning as they do not react well to being moved around.
+
+Here is an example for the `.local` folder.
+
+```bash
+$ mv ~/.local ~/work/.local
+$ ln -s ~/work/.local ~/.local
+```
+
+## Temporary Files
+Another usual culprit is the hidden `.cache` directory which contains temporary files.
+This folder can be moved to the `scratch` volume in a similar manner as described above.
+
+```bash
+$ mv ~/.cache ~/scratch/.cache
+$ ln -s ~/scratch/.cache ~/.cache
+```
+
+!!! warning "Important"
+    Files placed in your `scratch` directory will be [automatically removed](./scratch-cleanup.md) after 2 weeks.
+    Do not place any valuable files in there.
diff --git a/bih-cluster/docs/storage/storage-locations.md b/bih-cluster/docs/storage/storage-locations.md
index d89b2357b..3383a131f 100644
--- a/bih-cluster/docs/storage/storage-locations.md
+++ b/bih-cluster/docs/storage/storage-locations.md
@@ -14,7 +14,7 @@ There are the following three entities on the cluster:
 
 Each user, group, and project can have storage folders in different locations.
 
-## Data Types and storage Tiers
+## Data Types and Storage Tiers
 Files stored on the HPC fall into one of three categories:
 
 1. **Home** folders store programs, scripts, and user config i. e. long-lived and very important files.
@@ -38,15 +38,15 @@ In the HPC filesystem they are mounted in `/data/cephfs-1` and `/data/cephfs-2`.
 Storage quotas are imposed in these locations to restrict the maximum size of folders.
 Amount and utilization of quotas is communicated via the [HPC Access](https://hpc-access.cubi.bihealth.org/) web portal.
 
-### Home directories
+### Home Directories
 Location: `/data/cephfs-1/home/`
 
 Only users have home directories on Tier 1 storage.
 This is the starting point when starting a new shell or SSH session.
 Important config files are stored here as well as analysis scripts and small user files.
-Home folders have a strict storage quota of 1 GB.
+Home folders have a [strict storage quota](./home-quota.md) of 1 GB.
 
-### Work directories
+### Work Directories
 Location: `/data/cephfs-1/work/`
 
 Groups and projects have work directories on Tier 1 storage.
@@ -55,7 +55,7 @@ Files shared within a group/project are stored here as long as they are in activ
 Work folders are generally limited to 1 TB per group.
 Project work folders are allocated on an individual basis.
 
-### Scratch space
+### Scratch Space
 Location: `/data/cephfs-1/scratch/`
 
 Groups and projects have scratch space on Tier 1 storage.
@@ -63,9 +63,9 @@ User home folders contain a symlink to their respective group's scratch space.
 Meant for temporary, potentially large data e. g. intermediate unsorted or unmasked BAM files, data downloaded from the internet etc.
 Scratch space is generally limited to 10 TB per group.
Projects are allocated scratch on an individual basis. -**Files in scratch will be [automatically removed](scratch-cleanup.md) 2 weeks after their creation.** +Files in scratch will be [automatically removed](scratch-cleanup.md) 2 weeks after their creation. -### Tier 2 storage +### Tier 2 Storage Location: `/data/cephfs-2/` This is where big files go when they are not in active use. @@ -105,7 +105,7 @@ Depending on the location and Tier, CephFS creates snapshots in different freque Some parts of Tier 1 and Tier 2 snapshots are also mirrored into a separate fire compartment within the data center. This provides an additional layer of security i. e. physical damage to the servers. -### Accessing snapshots +### Accessing Snapshots To access snapshots, simply navigate to the `.snap/` sub-folder of the respective location. You will find one sub-folder for every snapshot created and in them a complete replica of the folder respective folder at the time of snapshot creation. diff --git a/bih-cluster/mkdocs.yml b/bih-cluster/mkdocs.yml index fdbc538e4..f8b8c2e7b 100644 --- a/bih-cluster/mkdocs.yml +++ b/bih-cluster/mkdocs.yml @@ -101,6 +101,7 @@ nav: - "Connection Problems": connecting/connection-problems.md - "Storage": - "Storage Locations": storage/storage-locations.md + - "Home Folder Quota": storage/home-quota.md - "Scratch Cleanup": storage/scratch-cleanup.md - "Querying Quotas": storage/querying-storage.md - "Storage Migration": storage/storage-migration.md @@ -111,7 +112,7 @@ nav: - "Episode 2": hpc-tutorial/episode-2.md - "Episode 3": hpc-tutorial/episode-3.md - "Episode 4": hpc-tutorial/episode-4.md - - "Cluster Scheduler": + - "Cluster Scheduler (Slurm)": - slurm/overview.md - slurm/background.md - slurm/quickstart.md @@ -144,7 +145,7 @@ nav: - "~/.bashrc Guide": best-practice/bashrc-guide.md - "Temporary Files": best-practice/temp-files.md - "Custom Environment Modules": best-practice/env-modules.md - - "Install with Conda": best-practice/software-installation-with-conda.md + - "Install Software with Conda": best-practice/software-installation-with-conda.md - "Static Data (Cubit)": - "Overview": cubit/index.md - "Annotations": cubit/annotations.md