KSHELL on Betzy and Fram
NOTE: As of 2023-03-02 there are compilation issues on Betzy, meaning that the choice of modules and compile parameters in this guide currently does not work there.
Start by loading the necessary modules, which contain the correct additional software to run KSHELL. The `intel/2020b` module contains the correct `ifort` version as well as `blas` and `lapack` (double check this), and the module `Python/3.8.6-GCCcore-10.2.0` gives us the correct Python version. Load the modules in this order:
```bash
module load intel/2020b
module load Python/3.8.6-GCCcore-10.2.0
```
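As a quick sanity check that the modules took effect, you can inspect which compiler and Python version are now on your `PATH` (a sketch; the exact paths and version strings on your system may differ slightly):

```bash
# Check that the Intel MPI Fortran wrapper and Python are available.
which mpiifort       # should resolve to the intel/2020b toolchain
python --version     # should report Python 3.8.6
```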
Now, clone this repository to the desired install location. Navigate to the `<install_location>/src/` directory and edit the `Makefile`. We will use the MPI ifort wrapper `mpiifort` to compile KSHELL, so make sure that `FC = mpiifort` is un-commented and that all other `FC =` lines are commented. Comment with `#`. Remember to save the file. Still in the `<install_location>/src/` directory, run the command `make`, and KSHELL will be compiled.
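A quick way to confirm the compiler selection before running `make` is to list the un-commented `FC` lines (a sketch, assuming the `Makefile` defines the compiler via `FC =` lines as described above):

```bash
# Run in <install_location>/src/ after editing the Makefile.
# Commented lines start with '#', so only the active compiler line should match.
grep "^FC" Makefile
# Expected output:
# FC = mpiifort
```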
Terminal output from the compilation process:
```
$ make
mpiifort -O3 -qopenmp -no-ipo -DMPI -c constant.f90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c model_space.f90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c lib_matrix.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c class_stopwatch.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c partition.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c wavefunction.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c rotation_group.f90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c harmonic_oscillator.f90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c operator_jscheme.f90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c operator_mscheme.f90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c bridge_partitions.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c sp_matrix_element.f90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c interaction.f90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c bp_io.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c lanczos.f90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c bp_expc_val.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c bp_block.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c block_lanczos.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -c kshell.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -o kshell.exe kshell.o model_space.o interaction.o harmonic_oscillator.o constant.o rotation_group.o sp_matrix_element.o operator_jscheme.o operator_mscheme.o lib_matrix.o lanczos.o partition.o wavefunction.o bridge_partitions.o bp_io.o bp_expc_val.o class_stopwatch.o bp_block.o block_lanczos.o -mkl
mpiifort -O3 -qopenmp -no-ipo -DMPI -c transit.F90
mpiifort -O3 -qopenmp -no-ipo -DMPI -o transit.exe transit.o model_space.o interaction.o harmonic_oscillator.o constant.o rotation_group.o sp_matrix_element.o operator_jscheme.o operator_mscheme.o lib_matrix.o lanczos.o partition.o wavefunction.o bridge_partitions.o bp_io.o bp_expc_val.o class_stopwatch.o bp_block.o block_lanczos.o -mkl
mpiifort -O3 -qopenmp -no-ipo -DMPI -o count_dim.exe count_dim.f90 model_space.o interaction.o harmonic_oscillator.o constant.o rotation_group.o sp_matrix_element.o operator_jscheme.o operator_mscheme.o lib_matrix.o lanczos.o partition.o wavefunction.o bridge_partitions.o bp_io.o bp_expc_val.o class_stopwatch.o bp_block.o block_lanczos.o -mkl
cp kshell.exe transit.exe count_dim.exe ../bin/
```
KSHELL is now compiled! To remove the compiled files and revert back to the starting point, run `make clean` in the `src/` directory.
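As a quick check that the build finished, the three executables copied at the end of the `make` output should now be present in the `bin/` directory (using the same `<install_location>` placeholder as above):

```bash
ls <install_location>/bin/
# Expected: count_dim.exe  kshell.exe  transit.exe
```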
Create a directory in which to store the output from KSHELL. In this directory, run `python <install_location>/bin/kshell_ui.py` and follow the instructions on screen. The shell script generated by `kshell_ui.py` must begin with certain commands which will be read by the job queue system, `slurm`. The needed commands will automatically be added to the executable shell script if the keyword `fram` or `betzy` is entered in the first prompt of `kshell_ui.py`. See a section further down in this document for general instructions on how to use `kshell_ui.py`. When the executable shell script has been created, put it in the queue with

```bash
sbatch executable.sh
```
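Put together, a typical workflow from creating the output directory to submitting the job might look like this (a sketch; the directory name is just an example, and the name of the generated script is chosen interactively by `kshell_ui.py`):

```bash
# Example run directory; any name and location will do.
mkdir -p ~/kshell_output/Ar28_usda
cd ~/kshell_output/Ar28_usda

# Answer 'fram' or 'betzy' at the first prompt to get the correct slurm header.
python <install_location>/bin/kshell_ui.py

# Submit the generated script to the slurm queue.
sbatch executable.sh
```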
To see the entire queue, or to filter the queue by username, use

```bash
squeue
squeue -u <username>
```

The terminal output from the compute nodes is written to a file, `slurm-*.out`, which is placed in the KSHELL output directory you created. Use

```bash
tail -f slurm-*.out
```

to get a live update on the last 10 lines of terminal output from the compute nodes. If you put your e-mail address in the executable shell script, you will get an e-mail when the program starts and when it ends (as of 2021-12-10, the mailing system is not operative). Following is an example of the commands which must be in the first lines of the executable shell script generated by `kshell_ui.py`, here for running 10 nodes with 32 cores each with an estimated calculation time of 10 minutes on Fram:
```bash
#!/bin/bash
#SBATCH --job-name=Ar28_usda
#SBATCH --account=<enter account name here (example NN9464K)>
## Syntax is d-hh:mm:ss
#SBATCH --time=0-00:10:00
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your e-mail here>
module --quiet purge
module load intel/2020b
module load Python/3.8.6-GCCcore-10.2.0
set -o errexit
set -o nounset
```
For running a job on Betzy on 64 nodes with an estimated calculation time of 1 hour, using all 256 virtual (SMT) cores per node effectively, the slurm commands look like:
```bash
#!/bin/bash
#SBATCH --job-name=V50_gxpf1a
#SBATCH --account=<enter account name here (example NN9464K)>
## Syntax is d-hh:mm:ss
#SBATCH --time=0-01:00:00
#SBATCH --nodes=64
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your e-mail here>
module --quiet purge
module load intel/2020b
module load Python/3.8.6-GCCcore-10.2.0
set -o errexit
set -o nounset
export OMP_NUM_THREADS=32
```
The command `export OMP_NUM_THREADS=32` forces 256 virtual cores to be used instead of 128 physical cores per node. SMT is beneficial to use with KSHELL, so use this option for better performance! `--ntasks-per-node=8` specifies 8 MPI ranks per node, and `--cpus-per-task=16` specifies 16 OMP threads per MPI rank (extended to 32 by `export OMP_NUM_THREADS=32`, which in total utilizes 8*32 = 256 virtual cores per node). The Betzy documentation states that this mix of MPI ranks and OMP threads yields better performance than a pure MPI or pure OMP setup.
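The same arithmetic can be checked for any node configuration; a minimal sketch for the Betzy example above:

```bash
# (MPI ranks per node) x (OMP threads per rank) = logical cores used per node
echo $(( 8 * 32 ))   # 256 virtual (SMT) cores, i.e. twice the 128 physical cores
```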
Note that the modules must be explicitly loaded in the script file, since the modules you load on the login node do not get loaded on the compute nodes. The login node is the computer you control when you SSH to `<username>@fram.sigma2.no`, and the compute nodes are other computers which you control via the `slurm` queue system. If you need any other modules loaded, you must add these to the executable shell script. Now, just wait for the program to run its course!
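For example, the module section of the generated script could be extended like this (the last module name is purely a placeholder for whatever additional software your calculation needs):

```bash
module --quiet purge
module load intel/2020b
module load Python/3.8.6-GCCcore-10.2.0
module load <additional-module>   # hypothetical extra module required by your job
```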