
An updated version of the original metashot/prok-classify pipeline by davidealbanese, a workflow for assigning objective taxonomic classifications to bacterial and archaeal genomes. This version uses GTDB-Tk 2.4.0 (2024 version) and the latest release of the Genome Taxonomy Database, GTDB (Release 220).


prok-classify

This repository contains an updated version of the metashot/prok-classify workflow, which assigns objective taxonomic classifications to bacterial and archaeal genomes. It uses GTDB-Tk 2.4.0 (the latest version as of 2024) and the latest release of the Genome Taxonomy Database, GTDB (Release 220).

Main features (UPDATED)

  • Input: prokaryotic genomes in FASTA format;
  • Taxonomic classification using GTDB-Tk version 2.4.0 (requires GTDB reference R220);
  • Filter genomes by domain (Bacteria and Archaea).

Quick start

  1. Install Docker (or Singularity) and Nextflow (see Dependencies);

  2. Download and extract the GTDB-Tk reference data (see https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data):

    wget https://data.gtdb.ecogenomic.org/releases/release220/220.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r220_data.tar.gz
    tar -xvzf gtdbtk_r220_data.tar.gz

  3. Run the analysis:

    nextflow run compmetagen/prok-classify \
      --genomes "data/*.fa" \
      --gtdbtk_db ./release220 \
      --outdir results
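
Because the reference data download is large and the workflow can run for a long time, it may help to check the inputs before launching. The sketch below is not part of the pipeline: `check_inputs` is a hypothetical helper, and it assumes the genomes sit in a single directory with the `.fa` extension used in the quick start.

```shell
# Hypothetical pre-flight check (not shipped with the pipeline).
# $1: extracted GTDB-Tk reference data directory
# $2: directory holding the input genomes (*.fa, as in the quick start)
check_inputs() {
  db_dir=$1
  genomes_dir=$2

  if [ ! -d "$db_dir" ]; then
    echo "reference data directory not found: $db_dir" >&2
    return 1
  fi

  # Expand the same glob pattern that is passed to --genomes
  set -- "$genomes_dir"/*.fa
  if [ ! -e "$1" ]; then
    echo "no .fa files found in: $genomes_dir" >&2
    return 1
  fi

  echo "found $# genome(s); reference data in $db_dir"
}
```

Calling `check_inputs ./release220 data` before `nextflow run` reports the number of genomes that match, or fails with a message on standard error.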

Parameters

See the file nextflow.config for the complete list of parameters.
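
As an alternative to passing every option on the command line, Nextflow can read parameters from a YAML or JSON file through its `-params-file` option. The sketch below writes only the three parameters shown in the quick start; the file name `params.yaml` is an arbitrary choice, and any further parameter names should be taken from nextflow.config rather than guessed.

```shell
# Write the quick-start parameters to a params file.
# Values are copied from the quick start; the glob is quoted so that
# YAML treats it as a plain string.
cat > params.yaml <<'EOF'
genomes: "data/*.fa"
gtdbtk_db: "./release220"
outdir: "results"
EOF
```

The workflow can then be started with `nextflow run compmetagen/prok-classify -params-file params.yaml`.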

Output

The files and directories listed below will be created in the results directory after the pipeline has finished.

Main outputs

  • bacteria_summary.tsv: the GTDB-Tk summary for bacterial genomes (documentation);
  • archaea_summary.tsv: the GTDB-Tk summary for archaeal genomes (documentation);
  • bacteria_genomes: genomes classified as bacteria by GTDB-Tk;
  • archaea_genomes: genomes classified as archaea by GTDB-Tk.
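
The summary tables can be inspected with standard command-line tools. The sketch below prints the phylum assigned to each genome. It assumes the tables keep the usual GTDB-Tk layout: tab-separated, with a header row naming the `user_genome` and `classification` columns, and rank prefixes such as `p__` inside the classification string. `phylum_per_genome` is a hypothetical helper name.

```shell
# Print "genome<TAB>phylum" for each row of a GTDB-Tk summary table.
# Assumption: the header row names the columns user_genome and classification.
phylum_per_genome() {
  awk -F'\t' '
    NR == 1 {
      # Map header names to column positions
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      # The classification string is a semicolon-separated list of ranks
      n = split($(col["classification"]), rank, ";")
      phylum = "unclassified"
      for (j = 1; j <= n; j++) if (rank[j] ~ /^p__/) phylum = rank[j]
      print $(col["user_genome"]) "\t" phylum
    }
  ' "$1"
}
```

For example, `phylum_per_genome results/bacteria_summary.tsv | cut -f2 | sort | uniq -c` would count the genomes per phylum.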

Secondary outputs

System requirements

Please refer to System requirements for the complete list of system requirements options. See also GTDB-Tk and the Genome Taxonomy Database (GTDB).
