replace paths; GPFS > CephFS (#169)
sellth authored Aug 16, 2024
1 parent 2e2e56c commit 2c43287
Showing 12 changed files with 67 additions and 64 deletions.
2 changes: 1 addition & 1 deletion bih-cluster/docs/best-practice/env-modules.md
@@ -97,7 +97,7 @@ case "${HOSTNAME}" in

# Define path for temporary directories, don't forget to cleanup!
# Also, this will only work after /fast is available.
export TMPDIR=/fast/users/$USER/scratch/tmp
export TMPDIR=/data/cephfs-1/home/users/$USER/scratch/tmp
;;
esac
```
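For orientation, the hunk above sits inside a `case "${HOSTNAME}" in … esac` block of `~/.bashrc`; a minimal sketch of the whole block looks roughly like this (the hostname pattern is an assumption, not part of the commit):

```bash
# Hedged sketch of the surrounding ~/.bashrc block; "hpc-*" is an assumed pattern.
case "${HOSTNAME}" in
    hpc-*)
        # Only valid once the cluster file system is mounted.
        export TMPDIR=/data/cephfs-1/home/users/$USER/scratch/tmp
        ;;
esac
```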
12 changes: 6 additions & 6 deletions bih-cluster/docs/best-practice/temp-files.md
@@ -16,17 +16,17 @@ When undefined, usually `/tmp` is used.

Generally, there are two locations where you could put temporary files:

- `/fast/users/$USER/scratch/tmp` -- inside your scratch folder on the fast GPFS file system; this location is available from all cluster nodes
- `/data/cephfs-1/home/users/$USER/scratch/tmp` -- inside your scratch folder on the CephFS file system; this location is available from all cluster nodes
- `/tmp` -- on the local node's temporary folder; this location is only available on the node itself.
The slurm scheduler uses Linux namespaces such that every **job** gets its private `/tmp` even when run on the same node.

### Best Practice: Use `/fast/users/$USER/scratch/tmp`
### Best Practice: Use `scratch/tmp`

!!! warning "Use GPFS-based TMPDIR"
!!! warning "Use CephFS-based TMPDIR"

Generally setup your environment to use `/fast/users/$USER/scratch/tmp` as filling the local disk of a node with forgotten files can cause a lot of problems.
Generally setup your environment to use `/data/cephfs-1/home/users/$USER/scratch/tmp` as filling the local disk of a node with forgotten files can cause a lot of problems.

Ideally, you append the following to your `~/.bashrc` to use `/fast/users/$USER/scratch/tmp` as the temporary directory.
Ideally, you append the following to your `~/.bashrc` to use `/data/cephfs-1/home/users/$USER/scratch/tmp` as the temporary directory.
This will also create the directory if it does not exist.
Further, it will create one directory per host name which prevents too many entries in the temporary directory.
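The snippet itself is collapsed in this view; a minimal sketch of such a `~/.bashrc` addition, assuming the per-host subdirectory is keyed on `$HOSTNAME` as described above:

```bash
# Hedged sketch of the ~/.bashrc addition described above (not the verbatim snippet).
export TMPDIR=/data/cephfs-1/home/users/$USER/scratch/tmp/$HOSTNAME
mkdir -p "$TMPDIR"   # create the directory if it does not exist
```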

@@ -40,7 +40,7 @@ mkdir -p $TMPDIR
## `TMPDIR` and the scheduler

In the older nodes, the local disk is a relatively slow spinning disk, in the newer nodes, the local disk is a relatively fast SSD.
Further, the local disk is independent from the GPFS file system, so I/O volume to it does not affect the network or any other job on other nodes.
Further, the local disk is independent from the CephFS file system, so I/O volume to it does not affect the network or any other job on other nodes.
Please note that by default, Slurm will not change your environment variables.
This includes the environment variable `TMPDIR`.
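Because the variable is simply inherited, a job will use whatever `TMPDIR` your login shell exported; if you want a particular job to use the node-local disk instead, override it inside the job script. A minimal sketch (the `#SBATCH` values and the tool name are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=local-tmp-example
#SBATCH --time=01:00:00

# Override the inherited TMPDIR with the node-local, job-private /tmp.
export TMPDIR=/tmp
my_io_heavy_tool --tmp-dir "$TMPDIR"   # hypothetical command, for illustration only
```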

16 changes: 8 additions & 8 deletions bih-cluster/docs/help/faq.md
@@ -189,11 +189,11 @@ JobId=863089 JobName=pipeline_job.sh
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/fast/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/pipeline_job.sh
WorkDir=/fast/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export
StdErr=/fast/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/slurm-863089.out
Command=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/pipeline_job.sh
WorkDir=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export
StdErr=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/slurm-863089.out
StdIn=/dev/null
StdOut=/fast/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/slurm-863089.out
StdOut=/data/cephfs-1/work/projects/medgen_genomes/2019-06-05_genomes_reboot/GRCh37/wgs_cnv_export/slurm-863089.out
Power=
MailUser=(null) MailType=NONE
```
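For reference, a job record like the one above is printed by Slurm's `scontrol`, using the job ID shown in the output:

```bash
scontrol show job 863089
```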
@@ -290,11 +290,11 @@ JobId=4225062 JobName=C2371_2
MinCPUsNode=1 MinMemoryNode=150G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=YES Contiguous=0 Licenses=(null) Network=(null)
Command=/fast/work/users/user_c/SCZ_replic/JR_sims/GS_wrapy/wrap_y0_VP_2371_GS_chunk2_C02.sh
WorkDir=/fast/work/users/user_c/SCZ_replic/JR_sims
StdErr=/fast/work/users/user_c/SCZ_replic/JR_sims/E2371_2.txt
Command=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims/GS_wrapy/wrap_y0_VP_2371_GS_chunk2_C02.sh
WorkDir=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims
StdErr=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims/E2371_2.txt
StdIn=/dev/null
StdOut=/fast/work/users/user_c/SCZ_replic/JR_sims/slurm-4225062.out
StdOut=/data/cephfs-1/home/users/user_c/work/SCZ_replic/JR_sims/slurm-4225062.out
Power=
```

10 changes: 5 additions & 5 deletions bih-cluster/docs/how-to/software/cell-ranger.md
@@ -11,7 +11,7 @@ requires registration before download from [here](https://support.10xgenomics.co
to unpack Cell Ranger, its dependencies and the `cellranger` script:

```
cd /fast/users/$USER/work
cd /data/cephfs-1/home/users/$USER/work
mv /path/to/cellranger-3.0.2.tar.gz .
tar -xzvf cellranger-3.0.2.tar.gz
```
@@ -22,7 +22,7 @@ will be provided in `/data/cephfs-1/work/projects/cubit/current/static_data/app_

# cluster support SLURM

add a file `slurm.template` to `/fast/users/$USER/work/cellranger-3.0.2/martian-cs/v3.2.0/jobmanagers/sge.template` with the following contents:
add a file `slurm.template` to `/data/cephfs-1/home/users/$USER/work/cellranger-3.0.2/martian-cs/v3.2.0/jobmanagers/sge.template` with the following contents:

```
#!/usr/bin/env bash
@@ -61,7 +61,7 @@ add a file `slurm.template` to `/fast/users/$USER/work/cellranger-3.0.2/martian-
__MRO_CMD__
```
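Most of the template body is collapsed in this diff. As rough orientation, Martian cluster templates are shell scripts in which `__MRO_*__` placeholders are substituted per pipeline stage; an abbreviated, hypothetical sketch (the `#SBATCH` lines here are assumptions, not the file shipped with the docs):

```bash
#!/usr/bin/env bash
# Hypothetical, abbreviated slurm.template sketch -- not the full file from the docs.
# The __MRO_*__ placeholders are filled in by Cell Ranger's Martian runtime.
#SBATCH --job-name=__MRO_JOB_NAME__
#SBATCH --output=__MRO_STDOUT__
#SBATCH --error=__MRO_STDERR__
#SBATCH --cpus-per-task=__MRO_THREADS__
#SBATCH --mem=__MRO_MEM_GB__G

__MRO_CMD__
```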

**note**: on newer cellranger version, `slurm.template` needs to go to `/fast/users/$USER/work/cellranger-XX/external/martian/jobmanagers/`
**note**: on newer cellranger version, `slurm.template` needs to go to `/data/cephfs-1/home/users/$USER/work/cellranger-XX/external/martian/jobmanagers/`

# demultiplexing

@@ -74,7 +74,7 @@ create a script `run_cellranger.sh` with these contents (consult the [documentat
```
#!/bin/bash
/fast/users/$USER/work/cellranger-3.0.2/cellranger count \
/data/cephfs-1/home/users/$USER/work/cellranger-3.0.2/cellranger count \
--id=sample_id \
--transcriptome=/data/cephfs-1/work/projects/cubit/current/static_data/app_support/cellranger/refdata-cellranger-${species}-3.0.0\
--fastqs=/path/to/fastqs \
@@ -93,7 +93,7 @@ sbatch --ntasks=1 --mem-per-cpu=4G --time=8:00:00 -p medium -o cellranger.log ru

# cluster support SGE (outdated)

add a file `sge.template` to `/fast/users/$USER/work/cellranger-3.0.2/martian-cs/v3.2.0/jobmanagers/sge.template` with the following contents:
add a file `sge.template` to `/data/cephfs-1/home/users/$USER/work/cellranger-3.0.2/martian-cs/v3.2.0/jobmanagers/sge.template` with the following contents:

```
# =============================================================================
6 changes: 3 additions & 3 deletions bih-cluster/docs/how-to/software/scientific-software.md
@@ -154,7 +154,7 @@ proc ModulesHelp { } {
module-whatis {Gromacs molecular simulation toolkit (non-MPI)}
set root /fast/users/YOURUSER/work/software/gromacs-mpi/2018.3
set root /data/cephfs-1/home/users/YOURUSER/work/software/gromacs-mpi/2018.3
prereq gcc/7.2.0-0
@@ -183,7 +183,7 @@ proc ModulesHelp { } {
module-whatis {Gromacs molecular simulation toolkit (MPI)}
set root /fast/users/YOURUSER/work/software/gromacs-mpi/2018.3
set root /data/cephfs-1/home/users/YOURUSER/work/software/gromacs-mpi/2018.3
prereq openmpi/4.0.3-0
prereq gcc/7.2.0-0
@@ -210,7 +210,7 @@ You can verify the result:
```bash
med0127:~$ module avail

------------------ /fast/users/YOURUSER/local/modules ------------------
------------------ /data/cephfs-1/home/users/YOURUSER/local/modules ------------------
gromacs/2018.3 gromacs-mpi/2018.3

-------------------- /usr/share/Modules/modulefiles --------------------
14 changes: 7 additions & 7 deletions bih-cluster/docs/hpc-tutorial/episode-1.md
@@ -28,20 +28,20 @@ You can find the data here:
## Creating a Project Directory

First, you should create a folder where the output of this tutorial will go.
It would be good to have it in your `work` directory in `/fast/users/$USER`, because it is faster and there is more space available.
It would be good to have it in your `work` directory in `/data/cephfs-1/home/users/$USER`, because it is faster and there is more space available.

```terminal
(first-steps) $ mkdir -p /fast/users/$USER/work/tutorial/episode1
(first-steps) $ pushd /fast/users/$USER/work/tutorial/episode1
(first-steps) $ mkdir -p /data/cephfs-1/home/users/$USER/work/tutorial/episode1
(first-steps) $ pushd /data/cephfs-1/home/users/$USER/work/tutorial/episode1
```

!!! important "Quotas / File System limits"

- Note well that you have a quota of 1 GB in your home directory at `/fast/users/$USER`.
- Note well that you have a quota of 1 GB in your home directory at `/data/cephfs-1/home/users/$USER`.
The reason for this is that nightly snapshots and backups are created for this directory which are precious resources.
- This limit does not apply to your work directory at `/fast/users/$USER/work`.
- This limit does not apply to your work directory at `/data/cephfs-1/home/users/$USER/work`.
The limits are much higher here but no snapshots or backups are available.
- There is no limit on your scratch directory at `/fast/users/$USER/scratch`.
- There is no limit on your scratch directory at `/data/cephfs-1/home/users/$USER/scratch`.
However, **files placed here are automatically removed after 2 weeks.**
This is only appropriate for files during download or temporary files.

@@ -51,7 +51,7 @@ In general it is advisable to have a proper temporary directory available.
You can create one in your `~/scratch` folder and make it available to the system.

```terminal
(first-steps) $ export TMPDIR=/fast/users/$USER/scratch/tmp
(first-steps) $ export TMPDIR=/data/cephfs-1/home/users/$USER/scratch/tmp
(first-steps) $ mkdir -p $TMPDIR
```

8 changes: 4 additions & 4 deletions bih-cluster/docs/hpc-tutorial/episode-2.md
@@ -62,7 +62,7 @@ The content of the file:
# Formats are MM:SS, HH:MM:SS, Days-HH, Days-HH:MM, Days-HH:MM:SS
#SBATCH --time=30:00

export TMPDIR=/fast/users/${USER}/scratch/tmp
export TMPDIR=/data/cephfs-1/home/users/${USER}/scratch/tmp
mkdir -p ${TMPDIR}
```

@@ -72,13 +72,13 @@ Slurm will create a log file with a file name composed of the job name (`%x`) an
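In practice this naming corresponds to an `#SBATCH --output` directive in the wrapper script; a minimal sketch (the path and job name are illustrative; `%x` expands to the job name, `%j` to the job ID):

```bash
#!/bin/bash
#SBATCH --job-name=my-job        # becomes %x in the log-file name (name illustrative)
#SBATCH --output=logs/%x-%j.log  # %j is the Slurm job ID
```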
To start now with our tutorial, create a new tutorial directory with a log directory, e.g.,

```terminal
(first-steps) $ mkdir -p /fast/users/$USER/work/tutorial/episode2/logs
(first-steps) $ mkdir -p /data/cephfs-1/home/users/$USER/work/tutorial/episode2/logs
```

and copy the wrapper script to this directory:

```terminal
(first-steps) $ pushd /fast/users/$USER/work/tutorial/episode2
(first-steps) $ pushd /data/cephfs-1/home/users/$USER/work/tutorial/episode2
(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/submit_job.sh .
(first-steps) $ chmod u+w submit_job.sh
```
@@ -116,7 +116,7 @@ Your file should look something like this:
# Formats are MM:SS, HH:MM:SS, Days-HH, Days-HH:MM, Days-HH:MM:SS
#SBATCH --time=30:00

export TMPDIR=/fast/users/${USER}/scratch/tmp
export TMPDIR=/data/cephfs-1/home/users/${USER}/scratch/tmp
mkdir -p ${TMPDIR}

BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta
8 changes: 4 additions & 4 deletions bih-cluster/docs/hpc-tutorial/episode-3.md
Original file line number Diff line number Diff line change
@@ -30,8 +30,8 @@ Every Snakemake run requires a `Snakefile` file. Create a new folder inside your
copy the skeleton:

```terminal
(first-steps) $ mkdir -p /fast/users/${USER}/work/tutorial/episode3
(first-steps) $ pushd /fast/users/${USER}/work/tutorial/episode3
(first-steps) $ mkdir -p /data/cephfs-1/home/users/${USER}/work/tutorial/episode3
(first-steps) $ pushd /data/cephfs-1/home/users/${USER}/work/tutorial/episode3
(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/Snakefile .
(first-steps) $ chmod u+w Snakefile
```
@@ -53,7 +53,7 @@ rule alignment:
bai='alignment/test.bam.bai',
shell:
r"""
export TMPDIR=/fast/users/${{USER}}/scratch/tmp
export TMPDIR=/data/cephfs-1/home/users/${{USER}}/scratch/tmp
mkdir -p ${{TMPDIR}}
BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta
@@ -154,7 +154,7 @@ rule alignment:
bai='alignment/{id}.bam.bai',
shell:
r"""
export TMPDIR=/fast/users/${{USER}}/scratch/tmp
export TMPDIR=/data/cephfs-1/home/users/${{USER}}/scratch/tmp
mkdir -p ${{TMPDIR}}
BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta
8 changes: 4 additions & 4 deletions bih-cluster/docs/hpc-tutorial/episode-4.md
@@ -20,8 +20,8 @@ to call Snakemake. We run the script and the magic will start.
First, create a new folder for this episode:

```terminal
(first-steps) $ mkdir -p /fast/users/${USER}/work/tutorial/episode4/logs
(first-steps) $ pushd /fast/users/${USER}/work/tutorial/episode4
(first-steps) $ mkdir -p /data/cephfs-1/home/users/${USER}/work/tutorial/episode4/logs
(first-steps) $ pushd /data/cephfs-1/home/users/${USER}/work/tutorial/episode4
```

And copy the wrapper script to this folder as well as the Snakefile (you can also reuse the one with the adjustments from the previous [episode](episode-3.md)):
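A hedged sketch of that copy step (the wrapper script's exact name is cut off in this view; `submit_snakejob.sh` and the episode-3 path are assumptions):

```terminal
(first-steps) $ cp ../episode3/Snakefile .
(first-steps) $ cp /data/cephfs-1/work/projects/cubit/tutorial/skeletons/submit_snakejob.sh .
(first-steps) $ chmod u+w Snakefile submit_snakejob.sh
```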
@@ -60,7 +60,7 @@ The `Snakefile` is already known to you but let me explain the wrapper script `s
#SBATCH --time=30:00


export TMPDIR=/fast/users/${USER}/scratch/tmp
export TMPDIR=/data/cephfs-1/home/users/${USER}/scratch/tmp
export LOGDIR=logs/${SLURM_JOB_NAME}-${SLURM_JOB_ID}
mkdir -p $LOGDIR

@@ -120,7 +120,7 @@ rule alignment:
time='12:00:00',
shell:
r"""
export TMPDIR=/fast/users/${{USER}}/scratch/tmp
export TMPDIR=/data/cephfs-1/home/users/${{USER}}/scratch/tmp
mkdir -p ${{TMPDIR}}
BWAREF=/data/cephfs-1/work/projects/cubit/current/static_data/precomputed/BWA/0.7.17/GRCh37/g1k_phase1/human_g1k_v37.fasta
5 changes: 5 additions & 0 deletions bih-cluster/docs/ondemand/quotas.md
@@ -1,5 +1,10 @@
# OnDemand: Quota Inspection

!!! info "Outdated"

This document is only valid for the old, third-generation file system and will be removed soon.
Quotas of our new CephFS storage are communicated via the [HPC Access](https://hpc-access.cubi.bihealth.org/) web portal.

Accessing the quota report by selecting `Files` and then `Quotas` in the top menu
will provide you with a detailed list of all quotas for directories that you are assigned to.

11 changes: 6 additions & 5 deletions bih-cluster/docs/overview/for-the-impatient.md
@@ -1,6 +1,6 @@
# Overview
## HPC 4 Research
**HPC 4 Research** is located in the BIH data center in Buch and connected via the BIH research network.
## BIH HPC 4 Research
**BIH HPC 4 Research** is located in the BIH data center in Buch and connected via the BIH research network.
Connections can be made from Charite, MDC, and BIH networks.
The cluster is open for users with either Charite or MDC accounts after [getting access through the gatekeeper proces](../admin/getting-access.md).
The system has been designed to be suitable for the processing of human genetics data from research contexts (and of course data without data privacy concerns such as public and mouse data).
Expand All @@ -13,9 +13,9 @@ The cluster consists of the following major components:
- 2 nodes for file transfers `hpc-transfer-1` and `hpc-transfer-2`,
- a scheduling system using Slurm,
- 228 general purpose compute nodes `hpc-cpu-{1..228}`
- a few high memory nodes `hpc-mem-{1..4}`,
- a few high memory nodes `hpc-mem-{1..5}`,
- 7 nodes with 4 Tesla V100 GPUs each (!) `hpc-gpu-{1..7}` and 1 node with 10x A40 GPUs (!) `hpc-gpu-8`,
- a high-performance, parallel GPFS file system with 2.1 PB, by DDN mounted at `/fast`,
- a legacy parallel GPFS file system with 2.1 PB, by DDN mounted at `/fast`,
- a next generation high-performance storage system based on Ceph/CephFS
- a tier 2 (slower) storage system based on Ceph/CephFS

@@ -67,7 +67,7 @@ This addresses a lot of suboptimal (yet not dangerous, of course) points we obse
Despite this, it is your responsibility to keep important files in the snapshot/backup protected home, ideally even in copy (e.g., a git repository) elsewhere.
Also, keeping safe copies of primary data files, your published results, and the steps in between reproducible is your responsibility.
- A place to store data indefinitely.
The fast GPFS storage is expensive and "sparse" in a way.
The fast CephFS Tier 1 storage is expensive and "rare".
CephFS Tier 2 is bigger in volume, but still not unlimited.
The general workflow is: (1) copy data to cluster, (2) process it, creating intermediate and final results, (3) copy data elsewhere and remove it from the cluster
- Generally suitable for primary software development.
The I/O system might get overloaded and saving scripts might take some time.