This directory contains the workflow metashot/prok-classify, which assigns objective taxonomic classifications to bacterial and archaeal genomes. The workflow has been updated to use GTDB-Tk 2.4.0 (the latest version as of 2024) and the latest release of the Genome Taxonomy Database, GTDB (Release 220).
- Input: prokaryotic genomes in FASTA format;
- Taxonomic classification using GTDB-Tk version 2.4.0 (requires GTDB reference R220);
- Filter genomes by domain (Bacteria and Archaea).
- Install Docker (or Singularity) and Nextflow (see Dependencies);
- Download and extract the GTDB-Tk reference data (see https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data):

    wget https://data.gtdb.ecogenomic.org/releases/release220/220.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r220_data.tar.gz
    tar -xvzf gtdbtk_r220_data.tar.gz
- Start running the analysis:

    nextflow run compmetagen/prok-classify \
      --genomes "data/*.fa" \
      --gtdbtk_db ./release220 \
      --outdir results
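
Before launching the run, it can help to confirm that the inputs are where the command expects them. The following is a minimal sketch, not part of the workflow: `preflight_check` is a hypothetical helper, and it assumes the genomes sit directly in one directory with a common extension and that the reference archive was extracted to a directory such as `./release220`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (an illustration, not shipped with the workflow).
# Usage: preflight_check <genomes_dir> <extension> <gtdbtk_db_dir>
preflight_check() {
    local genomes_dir=$1 ext=$2 db_dir=$3
    local n

    # The extracted reference directory must exist (e.g. ./release220).
    if [ ! -d "$db_dir" ]; then
        echo "missing GTDB-Tk reference directory: $db_dir" >&2
        return 1
    fi

    # Count the files that a glob such as "data/*.fa" would match.
    n=$(find "$genomes_dir" -maxdepth 1 -type f -name "*.${ext}" 2>/dev/null \
        | wc -l | tr -d '[:space:]')
    if [ "$n" -eq 0 ]; then
        echo "no *.${ext} files found in $genomes_dir" >&2
        return 1
    fi

    echo "found $n genome file(s); reference directory: $db_dir"
}
```

For the command above, the check could be called as `preflight_check data fa ./release220` and the `nextflow run` command started only if it succeeds.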
See the file nextflow.config for the complete list of parameters.
The files and directories listed below will be created in the results directory after the pipeline has finished.
- bacteria_summary.tsv: the GTDB-Tk summary for bacterial genomes (documentation);
- archaea_summary.tsv: the GTDB-Tk summary for archaeal genomes (documentation);
- bacteria_genomes: genomes classified as bacteria by GTDB-Tk;
- archaea_genomes: genomes classified as archaea by GTDB-Tk;
- gtdbtk: main GTDB-Tk output files (documentation).
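
As a quick way to inspect a summary file, the sketch below counts genomes per phylum. It is an illustration only: `count_by_phylum` is a hypothetical helper, and it assumes the summary is tab-separated with a header row, the genome name in column 1 (`user_genome`), and the GTDB lineage string in column 2 (`classification`, e.g. `d__Bacteria;p__...;c__...`). Check the header of your own file before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count genomes per phylum in a GTDB-Tk summary table.
# Assumption: column 2 holds a lineage string with ranks separated by ";".
# Usage: count_by_phylum results/bacteria_summary.tsv
count_by_phylum() {
    awk -F'\t' '
        NR == 1 { next }                       # skip the header row
        {
            phylum = "unclassified"            # default when no p__ rank is present
            n = split($2, ranks, ";")
            for (i = 1; i <= n; i++)
                if (ranks[i] ~ /^p__./)        # non-empty phylum rank
                    phylum = substr(ranks[i], 4)
            count[phylum]++
        }
        END { for (p in count) printf "%s\t%d\n", p, count[p] }
    ' "$1" | sort
}
```

The output is one line per phylum, with the phylum name and the number of genomes separated by a tab. The same function should work on archaea_summary.tsv under the same column assumptions.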
Please refer to System requirements for the complete list of system requirement options.

The workflow relies on GTDB-Tk and the Genome Taxonomy Database (GTDB).