Either miniwdl or Cromwell can be used to run workflows on the HPC.
Running workflows with miniwdl requires miniwdl >= 1.9.0 and the miniwdl-slurm extension.
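Both packages are available from PyPI; as a sketch (assuming a working Python environment with pip), they can be installed with:

# install miniwdl and the SLURM extension
pip install 'miniwdl>=1.9.0' miniwdl-slurm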
An example miniwdl.cfg file is provided here. It should be placed at ~/.config/miniwdl.cfg
and edited to match your SLURM configuration. This allows running workflows using a basic SLURM setup.
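If the bundled example is not at hand, a minimal sketch of such a config for the miniwdl-slurm backend might look like the following (the partition value is a placeholder, not something defined by this repository):

# write a minimal example config (placeholder values; adjust for your cluster)
cat > ~/.config/miniwdl.cfg << 'EOF'
[scheduler]
container_backend = slurm_singularity

[slurm]
# extra arguments passed to sbatch, e.g. the partition to submit tasks to
extra_args="--partition=<your_partition>"
EOF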
Cromwell supports a number of different HPC backends; see Cromwell's documentation for more information on configuring each of the backends.
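As a sketch (assuming Cromwell is run directly from its released jar, with slurm.conf standing in for a backend configuration file written per that documentation), a custom backend configuration is passed to Cromwell via a Java system property:

# point Cromwell at a custom backend configuration (slurm.conf is a hypothetical file name)
java -Dconfig.file=slurm.conf -jar cromwell.jar run workflows/main.wdl -i <inputs_json_file>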
Fill out any information missing in the inputs file. Once you have downloaded the reference data bundle, ensure that you have replaced the <local_path_prefix>
in the input template file with the local path to the reference datasets on your HPC.
See the inputs section of the main README for more information on the structure of the inputs.json file.
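For example, assuming the filled-out template was saved as inputs.json (the reference path below is a placeholder for wherever the bundle was downloaded):

# substitute the placeholder with the real location of the reference data
sed -i 's|<local_path_prefix>|/path/to/dataset|g' inputs.json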
Using miniwdl:

miniwdl run workflows/main.wdl -i <inputs_json_file>

Using Cromwell:

cromwell run workflows/main.wdl -i <inputs_json_file>
The pipeline requires two publicly available reference databases: UniRef100 and the Genome Taxonomy Database Toolkit (GTDB-Tk) reference data. For convenience, the download links are provided below. The GTDB-Tk database can stay bundled as a *.tar.gz
file, but UniRef100 needs to be unzipped.
# make a directory for and download the reference databases
mkdir dataset; cd dataset
wget https://data.gtdb.ecogenomic.org/releases/release207/207.0/auxillary_files/gtdbtk_r207_v2_data.tar.gz
wget https://zenodo.org/records/4626519/files/uniref100.KO.v1.dmnd.gz
gunzip uniref100.KO.v1.dmnd.gz
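The absolute path of the dataset directory is the value to use for <local_path_prefix> in the inputs file; for example:

# print the absolute path of the reference directory (run from inside dataset/)
pwd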