This tutorial is forked from vappiah/bacterial-genomics-tutorial and shared with our lab co-workers for educational use. Thanks a lot!

Download and install Anaconda (version 3 recommended)

Add channels

conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels daler
conda config --add channels defaults

Download the Analysis pipeline

git clone

Change directory to the downloaded folder

cd bacterial-genomics-tutorial

Create the conda environment. Packages are listed in the environment.yaml file.

conda env create -f environment.yaml

Download the polishing tool pilon

mkdir apps
wget -O apps/pilon.jar

Activate the analysis environment

source activate bacterial-genomics-tutorial

Add execute permission to all scripts

chmod +x *.{py,sh,pl}

Install python packages using pip

pip install -r pip-requirements.txt


Step 1: Download data.


Step 2: Perform QC on the raw reads


Step 3: Trim reads using sickle


Step 4: Perform QC on the trimmed reads


Step 5: Perform de novo assembly using spades

### Step 7: Perform QC for both raw assembly and polished assembly
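Assembly QC reports commonly include the contig count, the total assembly length and N50. As a rough illustration of what N50 measures (this is not the tutorial's own QC script), here is a minimal Python sketch. It assumes a plain FASTA input; the function names `read_contig_lengths` and `n50` and the toy sequences are made up for the example.

```python
import io


def read_contig_lengths(handle):
    """Return the length of every sequence in a FASTA file-like object."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Toy FASTA standing in for an assembly file (hypothetical data).
demo = io.StringIO(">c1\nACGTACGTAC\n>c2\nACGTAC\n>c3\nACGT\n")
lengths = read_contig_lengths(demo)
print("contigs:", len(lengths), "total:", sum(lengths), "N50:", n50(lengths))
```

To try it on a real assembly, pass an opened FASTA file instead of the `io.StringIO` object. Comparing the numbers for the raw and polished assemblies shows whether polishing changed contig lengths.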


### Step 8: Generate draft genome by reordering contigs against a reference genome using ragtag


Step 10: Check for antimicrobial resistance genes using abricate


Step 11: Annotate the draft genome using prokka


Step 12: Get some statistics on the annotation.

Features such as genes and CDSs are counted and displayed. The script requires you to specify the folder where the annotations were saved (i.e. P7741). Use Python to run the script.

python P7741_annotation P7741
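To make the counting concrete, here is a minimal sketch of how feature types can be tallied from a GFF3 file. It is not the tutorial's statistics script. It assumes the feature type is in the third tab-separated column (as in GFF3) and that sequence data, if present, follows a `##FASTA` line. The function name `count_gff_features` and the toy records are hypothetical.

```python
import io
from collections import Counter


def count_gff_features(handle):
    """Count GFF3 records by feature type (column 3)."""
    counts = Counter()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # everything after this marker is sequence, not annotation
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Toy annotation standing in for a real GFF file (hypothetical data).
demo_gff = io.StringIO(
    "##gff-version 3\n"
    "contig1\tProdigal\tCDS\t1\t90\t.\t+\t0\tID=a\n"
    "contig1\tProdigal\tCDS\t100\t190\t.\t-\t0\tID=b\n"
    "contig1\tAragorn\ttRNA\t200\t270\t.\t+\t.\tID=c\n"
    "##FASTA\n"
    ">contig1\n"
    "ACGT\n"
)
counts = count_gff_features(demo_gff)
for feature, number in sorted(counts.items()):
    print(feature, number)
```

For a real annotation, open the `.gff` file in the annotation folder and pass the file object to `count_gff_features`.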

Step 13: Generate dendrogram using dRep


Step 14: Perform Pangenome Analysis using Roary.

Input files must be in GFF (version 3) format. Prokka-generated GFF files are recommended, so we generate GFFs for the files in the genome folder by reannotating them with prokka, using the get_genome_gffs script.

Then perform pangenome analysis


Step 15: Get a gene summary for three of the organisms. The defaults are P7741, Agy99 and Liflandii; feel free to change them. A Venn diagram will be generated (gene_count_summary.png).

python P7741 Agy99 Liflandii pangenome/gene_presence_absence.csv
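The numbers behind such a Venn diagram can be sketched as follows. This is an illustration and not the tutorial's script. It assumes the presence/absence CSV has one column per genome, named exactly after the genome, and that a non-empty cell means the gene cluster is present in that genome. The function name `venn_counts` and the toy table are hypothetical.

```python
import csv
import io


def venn_counts(handle, genomes):
    """Count gene clusters by the exact set of genomes they occur in."""
    counts = {}
    for row in csv.DictReader(handle):
        # A genome "has" the cluster when its cell is non-empty (assumption).
        present = frozenset(g for g in genomes if (row.get(g) or "").strip())
        if present:
            counts[present] = counts.get(present, 0) + 1
    return counts


# Toy table standing in for pangenome/gene_presence_absence.csv (hypothetical data).
demo_csv = io.StringIO(
    "Gene,P7741,Agy99,Liflandii\n"
    "geneA,p1,a1,l1\n"
    "geneB,p2,,\n"
    "geneC,p3,a3,\n"
    "geneD,,,l4\n"
)
regions = venn_counts(demo_csv, ["P7741", "Agy99", "Liflandii"])
for members, number in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(members)), number)
```

Each key is one region of the diagram: a cluster found in all three genomes is counted under the three-member set, and a cluster unique to one genome under a single-member set.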

If you are working on a cluster, you will want to combine the analysis results into a zip file that you can download and view locally. ./
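If you prefer to bundle the results from Python, the standard library can do it. The sketch below assumes all results live under a single folder. The folder name `results`, the archive name `results_bundle` and the helper `zip_results` are hypothetical, and the demo runs in a temporary directory so it touches no real data.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_results(results_dir, archive_stem):
    """Zip the contents of results_dir into <archive_stem>.zip and return its path."""
    return shutil.make_archive(str(archive_stem), "zip", root_dir=str(results_dir))


# Demo on a throwaway folder; replace with your own results folder on the cluster.
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "results"
    results.mkdir()
    (results / "summary.txt").write_text("demo\n")
    # Write the archive outside the folder being zipped so it does not include itself.
    archive = zip_results(results, Path(tmp) / "results_bundle")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    print(Path(archive).name, names)
```

The resulting zip file can then be copied to your local machine with whatever transfer tool your cluster supports.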

Step 16: Compare your draft genome with the other organisms in the genomes folder by generating circular structures for them. Use the tutorial here to guide you

Step 17: Result interpretation

The result interpretation is available in my YouTube video tutorial:

Now that you have performed a bacterial comparative genome analysis, it's time to apply your skills to real-world data. Good luck and see you next time.


Vincent Appiah, 2020. Bacterial Genomics Tutorial