This is better explained in my blog: https://xarxax.xyz/training-ai-models-potsdam-university/
To run with gpu an apptainer container you should
Build the image.
apptainer build img.sif recipe.def
Run the image with --nv flag
apptainer run --nv img.sif
This last step is done in slurm.job
so if you just sbatch slurm.job
while in your cluster you should be fine.
If you want to see how your task is doing, as Uni Potsdam says you can check the Grafana.
Comfy setup aliases for working in your machine but running things on the cluster (you can add this to your .bashrc):
#VARIABLES
export YOUR_CLUSTER_USERNAME="yourusername"
export project="/example_apptainer"#the shortcuts only work if the project is in your home folder
export CLUSTER_LOGIN="[email protected]"
export PATH_IN_CLUSTER="/work/$YOUR_CLUSTER_USERNAME/"
#SCRIPTS
alias ssh_uni="ssh -X $CLUSTER_LOGIN"
alias update_example_apptainer="rsync -av -e ssh --exclude='*.pyc' --exclude='.git' --exclude='*/generated_models/*' $HOME/$project $CLUSTER_LOGIN:$PATH_IN_CLUSTER "
alias reverse_update_example_apptainer="rsync -av -e ssh --exclude='*.pyc' --exclude='.git*' --exclude='*generate_model.py' --exclude='*.sif' --exclude='*.bin' --exclude='*.pt' $CLUSTER_LOGIN:$PATH_IN_CLUSTER/$project $HOME "