From 1d3ba7845c1c843bb4a80546a8731e57c53a7a23 Mon Sep 17 00:00:00 2001
From: Kirill Bessonov
Date: Thu, 12 Sep 2024 16:16:14 -0400
Subject: [PATCH] Updated README.md to follow DAAD format and also updated url to Species ID MASH database now hosted in Zenodo.org

---
 README.md                     | 176 +++++++++++++++++++++++++---------
 ectyper/commandLineOptions.py |   2 +-
 ectyper/definitions.py        |   2 +-
 ectyper/ectyper.py            |  20 ++--
 ectyper/genomeFunctions.py    |  17 ++--
 5 files changed, 149 insertions(+), 68 deletions(-)

diff --git a/README.md b/README.md
index 6bd9b84..7b62196 100644
--- a/README.md
+++ b/README.md
@@ -9,8 +9,77 @@
 `ECTyper` is a standalone versatile serotyping module for _Escherichia coli_. It supports both _fasta_ (assembled) and _fastq_ (raw reads) file formats.
 The tool provides convenient species identification coupled to quality control module giving a complete, transparent and reference laboratories suitable report on E.coli serotyping.
+# Introduction
+*Escherichia coli* is a priority foodborne pathogen of public health concern and a popular model organism. Phenotypic characterization, such as serotyping, toxin typing and pathotyping, provides critical information for surveillance, outbreak detection and research, including source attribution, outbreak cluster assignment, pathogenicity potential and risk assessment.
-# Dependencies:
+
+`ECTyper` uses whole-genome sequencing (WGS) data to characterize *E. coli*, including species identification, *in silico* O and H antigen serotyping, Shiga toxin typing and DEC pathotyping. It is a versatile, scalable and easy-to-use tool that accepts both raw and assembled inputs.
+
+As WGS becomes standard in public health and research laboratories, it is important to harness the high throughput and resolution of this technology to provide accurate, rapid typing of *E. coli* at scale in public health, clinical and research contexts.
+
+## Citation
+Bessonov, Kyrylo, Chad Laing, James Robertson, Irene Yong, Kim Ziebell, Victor PJ Gannon, Anil Nichani, Gitanjali Arya, John HE Nash, and Sara Christianson. "ECTyper: in silico Escherichia coli serotype and species prediction from raw and assembled whole-genome sequence data." Microbial Genomics 7, no. 12 (2021): 000728. [https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000728](https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000728)
+
+## Contact
+For questions, issues or comments, please open a GitHub issue or contact [Kyrylo Bessonov](mailto:kyrylo.bessonov@phac-aspc.gc.ca).
+
+# Installation
+Several installation options are available depending on user context and needs. The most convenient is the `conda` package, as it installs all required dependencies.
+
+### Images
+Docker and Singularity images are also available from [https://biocontainers.pro/tools/ectyper](https://biocontainers.pro/tools/ectyper), which can be useful for Nextflow pipelines or hassle-free deployment.
+
+### Databases
+ECTyper uses multiple databases:
+ - the species identification database is available from [https://zenodo.org/records/10211569](https://zenodo.org/records/10211569)
+ - the O and H antigen allele sequences are stored in [ectyper_alleles_db.json](ectyper/Data/ectyper_alleles_db.json)
+ - the toxin and pathotype signature marker sequences are stored in [ectyper_patho_stx_toxin_typing_database.json](ectyper/Data/ectyper_patho_stx_toxin_typing_database.json)
+
+## Option 1: As a conda package
+If you do not already have conda installed, download and install `miniconda` or `anaconda`:
+
+ ```
+ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
+ bash miniconda.sh -b -p $HOME/miniconda
+ echo ". $HOME/miniconda/etc/profile.d/conda.sh" >> ~/.bashrc
+ source ~/.bashrc
+ ```
+
+Install the latest `ectyper` conda package from the `bioconda` channel:
+
+ ```
+ conda install -c bioconda ectyper
+ ```
+
+## Option 2: Install using pip
+Install with the `pip3` utility. This installs `ectyper` and its Python packages but not the [non-Python dependencies](#dependencies), which must be installed separately.
+ ```
+ pip3 install ectyper
+ ```
+## Option 3: From source code
+Installing from source gives maximum control over the installation process.
+
+Install the non-Python dependencies. On Ubuntu, run:
+ ```
+ apt install samtools bowtie2 mash bcftools ncbi-blast+ seqtk
+ ```
+
+Install the Python dependencies via `pip`:
+ ```
+ pip3 install pandas biopython
+ ```
+Clone the repository and, optionally, check out a particular release (e.g. `v1.0.0`, `v2.0.0`):
+ ```
+ git clone https://github.com/phac-nml/ecoli_serotyping.git && cd ecoli_serotyping
+ git checkout v1.0.0  # optionally check out a specific release version
+ ```
+
+Finally, install `ectyper` using either command:
+```
+python3 setup.py install  # option 1
+pip3 install .  # option 2
+```
+## Compatibility
+### Dependencies:
 - python >= 3.5
 - bcftools >= 1.8
 - blast == 2.7.1
@@ -19,58 +88,26 @@ The tool provides convenient species identification coupled to quality control m
 - bowtie2 >= 2.3.4.1
 - mash >= 2.0
-# Python packages:
+### Python packages:
 - biopython >= 1.70
 - pandas >= 0.23.1
 - requests >= 2.0
-
-# Installation
-
-## Option 1: As a conda package
-1. If you do not have conda environment, get and install `miniconda` or `anaconda`:
-
- ```wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- bash miniconda.sh -b -p $HOME/miniconda
- echo ". $HOME/miniconda/etc/profile.d/conda.sh" >> ~/.bashrc
- source ~/.bashrc```
-
-2. Install conda package from `bioconda` channel
- ```conda install -c bioconda ectyper```
-
-## Option 2: From the source directly
-Second option is to install from the source.
-1. Install dependencies. On Ubuntu distro run
-```
-apt install samtools bowtie2 mash bcftools ncbi-blast+ seqtk
-```
-1. Install python dependencies via `pip`:
-
-```
-pip3 install pandas biopython
-```
-
-1. Clone the repository or checkout a particular release (e.g v1.0.0, etc.):
-
-```
-git clone https://github.com/phac-nml/ecoli_serotyping.git
-git checkout v1.0.0 #optionally checkout release version
-```
-
-1. Install ectyper: `python3 setup.py install`
-
-# Basic Usage
+# Getting started
+## Basic Usage
 1. Put the fasta/fastq files for serotyping analyses in one folder (concatenate paired raw reads files if you would like them to be considered a single entity)
 1. `ectyper -i [file path] -o [output_dir]`
 1. View the results on the console or in `cat [output folder]/output.csv`
-# Example Usage
-* `ectyper -i ecoliA.fasta` for a single file
-* `ectyper -i ecoliA.fasta -o output_dir` for a single file, results stored in `output_dir`
-* `ectyper -i ecoliA.fasta,ecoliB.fastq,ecoliC.fna` for multiple files (comma-delimited)
-* `ectyper -i ecoli_folder` for a folder (all files in the folder will be checked by the tool)
+
+## Example Input Scenarios
+* `ectyper -i ecoliA.fasta` for a single file (the output folder will be named using `ectyper__