This repository is a template for starting a Snakemake project. Along with a basic file structure and example code, it contains a Snakemake profile for submitting jobs on MARCC.
##########################################################
### load required modules ###
##########################################################
# load anaconda module version >= 4.6.0
module load anaconda
# load python
module load python/3.7.4-anaconda
############################################################
### create and activate a custom conda environment ###
### to install/update packages without admin privilege ###
### following instructions from ###
### https://www.marcc.jhu.edu/python-environments/. ###
### see section "Case B. Custom conda environments" ###
############################################################
# go to the directory where the conda environment will be created
# NOTE: MARCC recommends creating conda environments inside ~/work/code/
cd /home-1/[email protected]/python_env/conda # remember to change the directory
# create reqs.yaml file with basic packages
printf "dependencies:\n\
  - python=3.7\n\
  - matplotlib\n\
  - scipy\n\
  - numpy\n\
  - nb_conda_kernels\n\
  - au-eoed::gnu-parallel\n\
  - h5py\n\
  - pip\n\
  - pip:\n\
    - sphinx" > reqs.yaml
# install conda environment
conda env update --file reqs.yaml -p ./my_conda_env
# activate conda environment
conda activate /home-net/home-1/[email protected]/python_env/conda/my_conda_env
##############################################################################
### install snakemake using the new environment following instructions from ###
### https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html ###
##############################################################################
# install mamba
conda install -c conda-forge mamba
# install snakemake using mamba
mamba create -c conda-forge -c bioconda -n snakemake snakemake
# exit from current environment
conda deactivate
# activate snakemake
conda activate snakemake
# now you may run snakemake commands. test if help works.
snakemake --help
module load anaconda # >= v4.6.0, use the version used during installation
module load python/3.7.4-anaconda # >= v3.7, use the version used during installation
conda env list # you'll see all available environments
conda activate YOUR/SNAKEMAKE/ENV # activate snakemake env
# conda deactivate # to exit/deactivate snakemake
- In your GitHub account, create a new repository using this repository as a template. Related tutorial. You will keep your code in the new repository. The demo needs no extra code.
- Clone the new repository on MARCC. Related tutorial.
git clone https://github.com/USER-NAME/REPOSITORY-NAME
- Make sure you are on a MARCC login node and Snakemake is activated. See the section above.
- Go to the repository directory on MARCC.
cd PATH/TO/YOUR/REPO
- Run the following, using a maximum of 2 cores.
snakemake --profile profiles/marcc -j2
Edit the global and job-specific configuration files to configure your jobs.
Global configuration file: profiles/marcc/config.yaml.
restart-times: 0 # if failed, the job will not be restarted
jobscript: "slurm-jobscript.sh"
cluster: "slurm-submit.py"
cluster-status: "slurm-status.py"
max-jobs-per-second: 1 # max job submission rate 1 job/sec
max-status-checks-per-second: 10
local-cores: 1
latency-wait: 60 # wait time (sec) if output file not found
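A profile's config.yaml is a way of storing command-line options: each key corresponds to the long-form Snakemake option of the same name, so restart-times: 0 plays the role of --restart-times 0. The sketch below illustrates that mapping for the values shown above. The function profile_to_cli_args is a hypothetical helper written for this illustration, not part of Snakemake, and it handles only scalar and boolean values.

```python
def profile_to_cli_args(profile):
    """Translate profile key/value pairs into long-form command-line arguments.

    Hypothetical helper for illustration only; Snakemake does this internally
    and handles more cases than this sketch.
    """
    args = []
    for key, value in profile.items():
        if value is True:
            # a boolean option set to true becomes a bare flag
            args.append(f"--{key}")
        elif value is False or value is None:
            # disabled or empty options are simply left out
            continue
        else:
            args.extend([f"--{key}", str(value)])
    return args


# the values from profiles/marcc/config.yaml shown above
profile = {
    "restart-times": 0,
    "jobscript": "slurm-jobscript.sh",
    "cluster": "slurm-submit.py",
    "cluster-status": "slurm-status.py",
    "max-jobs-per-second": 1,
    "max-status-checks-per-second": 10,
    "local-cores": 1,
    "latency-wait": 60,
}

print(" ".join(profile_to_cli_args(profile)))
```

Options given explicitly on the command line, such as -j2 in the demo, are supplied in addition to the ones stored in the profile.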
Job-specific configuration file: profiles/marcc/cluster_config.yaml.
# default configuration for every rule (unless overridden)
__default__:
  partition: express
  nodes: 1
  ntasks: 1
  time: 10 # min
  output: "output/marcc_logs/{rule}/slurm-%j.out"
  error: "output/marcc_logs/{rule}/slurm-%j.err"
  job-name: "{rule}"
# configuration for "project_counts" rule -- overrides the default
project_counts:
  time: 15
  ntasks: 2
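To make the override behaviour concrete, here is a minimal sketch of how the effective settings for one rule could be computed: start from __default__, then apply the rule's own entries on top. The function resolve_cluster_settings is a hypothetical helper, and expanding {rule} with a plain string replacement is an assumption about what the submit script does, not a copy of its code.

```python
def resolve_cluster_settings(cluster_config, rule):
    """Return the effective cluster settings for `rule`.

    Hypothetical helper for illustration: defaults first, rule-specific
    values on top, then the {rule} placeholder is filled in.
    """
    settings = dict(cluster_config.get("__default__", {}))
    settings.update(cluster_config.get(rule, {}))
    return {
        key: value.replace("{rule}", rule) if isinstance(value, str) else value
        for key, value in settings.items()
    }


# the same content as profiles/marcc/cluster_config.yaml, written as a dict
cluster_config = {
    "__default__": {
        "partition": "express",
        "nodes": 1,
        "ntasks": 1,
        "time": 10,
        "output": "output/marcc_logs/{rule}/slurm-%j.out",
        "error": "output/marcc_logs/{rule}/slurm-%j.err",
        "job-name": "{rule}",
    },
    "project_counts": {"time": 15, "ntasks": 2},
}

for key, value in resolve_cluster_settings(cluster_config, "project_counts").items():
    print(f"{key}: {value}")
```

For project_counts this gives time 15 and ntasks 2, while partition, nodes, and the log paths still come from __default__. A rule with no entry of its own gets the defaults unchanged.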
- Add your scripts to the repository, preferably in a src folder (please create the folder).
- Edit configuration variables in the config/config.yaml file.
- Add your Snakemake rules in the rules folder.
- Edit Snakefile to aggregate all rules and to define the final outcomes of the project.
- Edit profiles/marcc/cluster_config.yaml to allocate resources for each job.
NOTE: You may want to delete the example rules in the rules folder and the example data in the data/example folder.
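As a quick sanity check after the steps above, a short script can report which of the expected files and folders are still missing. This is only a sketch: missing_project_paths is a hypothetical helper, and the path list simply repeats the locations named in the steps.

```python
from pathlib import Path

# paths named in the steps above, relative to the repository root
EXPECTED_PATHS = [
    "src",
    "config/config.yaml",
    "rules",
    "Snakefile",
    "profiles/marcc/cluster_config.yaml",
]


def missing_project_paths(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [path for path in EXPECTED_PATHS if not (root / path).exists()]


if __name__ == "__main__":
    # run from the repository root
    for path in missing_project_paths("."):
        print(f"missing: {path}")
```

Run from the repository root, it prints one line per missing path and nothing when the layout is complete.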
- Snakemake tutorial - Highly recommended!
- Useful arguments to run Snakemake are available here.
- Open-source snakemake profiles to run jobs on different environments are available here.
- Snakemake-intro slides.
- Snakemake in action: Ashis' project.
- Snakemake video tutorial: YouTube link
- How to deal with variable output (an unknown number of files) via checkpoints: Stack Overflow