prodigal-pyrodigal-comparison

A Python script for the comparison of Prodigal and Pyrodigal predictions.

Background

Prodigal is a gene prediction tool for prokaryotic genomes. It is widely used in bioinformatics; however, the latest release dates from 2016 and lacks bug fixes that have not yet been released. Pyrodigal provides Cython bindings and a Python interface to Prodigal, including those unreleased bug fixes and other optimizations.

Description

The Python script compare.py compares the predictions of Prodigal and Pyrodigal to find mismatches or missing predictions. The differences in the predictions are stored in a TSV file.
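The comparison described above can be sketched as a set difference over predicted coordinates. This is a minimal illustration, not the actual logic of compare.py: the tuple layout `(contig, start, stop, strand)`, the function names `diff_predictions` and `write_tsv`, and the TSV column names are all assumptions made for the example.

```python
import csv
import io


def diff_predictions(prodigal, pyrodigal):
    """Return rows for predictions reported by only one of the two tools.

    Each prediction is assumed to be a (contig, start, stop, strand) tuple.
    """
    only_prodigal = set(prodigal) - set(pyrodigal)
    only_pyrodigal = set(pyrodigal) - set(prodigal)
    rows = [(*cds, "prodigal_only") for cds in sorted(only_prodigal)]
    rows += [(*cds, "pyrodigal_only") for cds in sorted(only_pyrodigal)]
    return rows


def write_tsv(rows, handle):
    """Write the difference rows as tab-separated values (hypothetical columns)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["contig", "start", "stop", "strand", "found_by"])
    writer.writerows(rows)


# Made-up predictions: the second gene differs in its start coordinate.
prodigal_hits = [("contig_1", 3, 98, "+"), ("contig_1", 337, 2799, "+")]
pyrodigal_hits = [("contig_1", 3, 98, "+"), ("contig_1", 340, 2799, "+")]

rows = diff_predictions(prodigal_hits, pyrodigal_hits)
buffer = io.StringIO()
write_tsv(rows, buffer)
print(buffer.getvalue())
```

A gene whose start or stop differs between the tools shows up twice in this scheme, once per tool, which makes shifted start codons easy to spot.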

Installation

To compare Prodigal and Pyrodigal correctly, an up-to-date build of Prodigal is required. Because no newer release is available, you must compile it from source yourself.

git clone https://github.com/hyattpd/Prodigal.git
cd Prodigal
make install

To install Prodigal in a custom directory, use:

make install INSTALLDIR=/where/i/want/prodigal/

compare.py requires the additional Python packages biopython, xopen and pyrodigal.

The packages can be installed with Pip or Conda.

Pip

python3 -m pip install --user biopython xopen pyrodigal

Conda

conda install -c conda-forge -c bioconda biopython xopen pyrodigal

Usage

usage: compare.py [-h] [--genome GENOME [GENOME ...]] [--prodigal PRODIGAL] [--closed] [--output OUTPUT]

Compare CDS predictions of Prodigal to the predictions of Pyrodigal and save the differences in a TSV file.

optional arguments:
  -h, --help            show this help message and exit
  --genome GENOME [GENOME ...], -g GENOME [GENOME ...]
                        Input genomes (/some/path/*.fasta)
  --prodigal PRODIGAL, -p PRODIGAL
                        Path to a newly compiled Prodigal binary.
  --closed, -c          Closed ends. Do not allow genes to run off edges.
  --output OUTPUT, -o OUTPUT
                        Output path (default="./comparison")
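For readers who want to drive the comparison from their own code, the interface above could be reproduced with argparse roughly as follows. This is a sketch reconstructed from the help text only; the real parser in compare.py may use different argument types, defaults for `--genome` and `--prodigal`, or validation, and `build_parser` is a name chosen for this example.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="compare.py",
        description=(
            "Compare CDS predictions of Prodigal to the predictions of "
            "Pyrodigal and save the differences in a TSV file."
        ),
    )
    # One or more genome paths; the shell expands globs such as *.fasta.
    parser.add_argument("--genome", "-g", nargs="+",
                        help="Input genomes (/some/path/*.fasta)")
    parser.add_argument("--prodigal", "-p",
                        help="Path to a newly compiled Prodigal binary.")
    parser.add_argument("--closed", "-c", action="store_true",
                        help="Closed ends. Do not allow genes to run off edges.")
    parser.add_argument("--output", "-o", default="./comparison",
                        help='Output path (default="./comparison")')
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["-g", "a.fasta", "b.fasta.gz", "-p", "/path/to/binary/prodigal"]
)
print(args.genome, args.prodigal, args.closed, args.output)
```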

Examples

  • Input: single compressed genome
./compare.py --genome /path/to/genome.fasta.gz --prodigal /path/to/binary/prodigal
  • Input: multiple genomes
./compare.py --genome /path/to/*.fasta --prodigal /path/to/binary/prodigal
  • To download 50 test genomes you can use:
wget -i test_genomes.txt -P /some/path/

Input and Output

Input

compare.py accepts one or more genomes in FASTA format as input. The files may be compressed.
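Reading FASTA input that may or may not be compressed can be sketched with the standard library alone. The script itself depends on xopen and Biopython for this job; the version below is a simplified stand-in that handles only plain and gzip-compressed files, and the helper names `open_text` and `read_fasta` are hypothetical.

```python
import gzip
import os
import tempfile


def open_text(path):
    """Open a plain or gzip-compressed file as text, detected by magic bytes."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def read_fasta(path):
    """Return {record_id: sequence} for all records in a FASTA file."""
    records = {}
    current = None
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The record ID is the first whitespace-delimited header token.
                header = line[1:].split()
                current = header[0] if header else ""
                records[current] = []
            elif current is not None:
                records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Demonstration with a small, made-up gzip-compressed genome.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "genome.fasta.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write(">contig_1 demo\nATGAAA\nTTTTAA\n>contig_2\nATGTAA\n")
    demo_records = read_fasta(demo_path)

print(demo_records)
```

Detecting compression from the first two bytes rather than the file extension means a misnamed file is still read correctly.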

Output

  • A short summary of each comparison is printed to stdout: Hits genome=GCF_000006765: prodigal=5681, pyrodigal=5681, equal=True
  • The comparison output directory contains a mismatches.tsv TSV file with differing predictions.
  • The comparison/tmp directory contains the Prodigal training file and GFF output for each input genome.
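The GFF files kept in the tmp directory can be inspected independently of compare.py. The sketch below pulls CDS coordinates out of GFF3 lines; the sample lines are made up for illustration, `parse_gff_cds` is a hypothetical helper, and the tuple layout is an assumption rather than the format used inside the script.

```python
def parse_gff_cds(lines):
    """Extract (contig, start, stop, strand) tuples from GFF3 CDS features."""
    hits = []
    for line in lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # GFF3 has nine tab-separated columns; column 3 is the feature type.
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        hits.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return hits


# Made-up sample lines in GFF3 layout.
sample_gff = [
    "##gff-version  3\n",
    "contig_1\tProdigal_v2.6.3\tCDS\t3\t98\t4.5\t+\t0\tID=1_1;partial=10\n",
    "contig_1\tProdigal_v2.6.3\tCDS\t337\t2799\t338.7\t-\t0\tID=1_2;partial=00\n",
]

cds = parse_gff_cds(sample_gff)
print(cds)
```

Coordinates are kept 1-based and inclusive, as in GFF3, so they can be compared directly between files without conversion.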

Results

As of Pyrodigal v2.0.0-rc.3, its predictions no longer differ from those of Prodigal (commit 31b300a99a39964893057128ea10338e9a26bd6c, branch GoogleImport).

License

GNU General Public License v3.0