Quick install and start

git config --global http.postBuffer 5242880000
git clone https://github.com/zhangrengang/OGAP
cd OGAP

# install
conda env create -f OGAP.yaml
conda activate OGAP
#python setup.py install

# start
cd test
python ../OGAP.py Arabidopsis_thaliana-mt.gb -mt -o mt_out
python ../OGAP.py Arabidopsis_thaliana-mt.fa -mt -o mt_out -sp Arabidopsis_thaliana
python ../OGAP.py Arabidopsis_thaliana-pt.gb -pt -o pt_out
python ../OGAP.py Arabidopsis_thaliana-mt.gb -mt -pt -o mt_out

Installation

Dependencies:

  • python 2.7
    • biopython: install with pip2 install "biopython<=1.76" (quote the specifier so the shell does not treat < as a redirection)
    • networkx: install with pip2 install "networkx<2.0"
    • lazy_property: install with pip2 install lazy-property
  • hmmsearch 3.1x or 3.2x: compatible with the HMMER3/f database format
  • exonerate for coding gene annotation
  • augustus for coding gene annotation
  • tRNAscan-SE for tRNA gene annotation
  • blat for rRNA gene annotation
  • tbl2asn to output the sqn file for submission to GenBank
  • asn2gb to output the GenBank file
  • running without the -taxon option, so that the taxon is obtained automatically from the organism
    • ete3 for taxonomy mapping from the organism
  • -trn_struct option to plot tRNA secondary structures
  • -draw_map option to draw the genome map
  • -compare_map option to draw the genome map together with the original GenBank record
    • OGDraw (not in conda)
    • latex (not in conda)
  • -repeat option to annotate repeat regions
  • OGAP:
git clone https://github.com/zhangrengang/OGAP
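
Before the first run, it can help to confirm that the external tools listed above are reachable. The snippet below is a minimal sketch, not part of OGAP: the helper name missing_tools is hypothetical, and it assumes each tool is installed under the executable name used in the dependency list.

```shell
# Print every tool named in the arguments that is not found on PATH.
# missing_tools is a hypothetical helper, not something OGAP provides.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumed to match the dependency list above.
required="hmmsearch exonerate augustus tRNAscan-SE blat tbl2asn asn2gb"
missing=$(missing_tools $required)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
else
    echo "All required tools found on PATH."
fi
```

Run it inside the activated conda environment, since that is where the tools are expected to be installed. Optional tools such as OGDraw and latex can be checked the same way by passing their executable names to missing_tools.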

Quick Start

Mitochondrial genome in GenBank format
cd OGAP/test
python ../OGAP.py Arabidopsis_thaliana-mt.gb -mt -o mt_out

By default, the organism name is extracted from the GenBank file (the ORGANISM field), and the database is selected automatically by taxonomy mapping from that organism.

Mitochondrial genome in FASTA format
python ../OGAP.py Arabidopsis_thaliana-mt.fa -mt -o mt_out -sp Arabidopsis_thaliana

By default, the database is selected automatically by taxonomy mapping from the organism given with -sp.

Mitochondrial genome with the database specified (-taxon)
python ../OGAP.py Arabidopsis_thaliana-mt.fa -mt -o mt_out -sp Arabidopsis_thaliana -taxon rosids

Multiple databases are supported
python ../OGAP.py Arabidopsis_thaliana-mt.fa -mt -o mt_out -sp Arabidopsis_thaliana -taxon rosids malvids

Plastid/chloroplast genomes are handled the same way, but in -pt mode
python ../OGAP.py Arabidopsis_thaliana-pt.gb -pt -o pt_out

Annotating native mitochondrial genes and chloroplast (cp)-derived genes at the same time
python ../OGAP.py Arabidopsis_thaliana-mt.gb -mt -pt -o mt_out

Pipeline for multiple genomes (an example)

for sp in Arabidopsis_thaliana Vitis_vinifera Oryza_sativa Salix_suchowensis Citrus_sinensis
do
	python ../OGAP.py genbank/$sp.gb -mt -prefix $sp -outdir re_anno &> $sp.log
done

python ../lib/Comparative.py summary re_anno/
python ../lib/Comparative.py phylo re_anno/
python ../lib/Comparative.py kaks re_anno/
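
The loop above keeps going when one genome fails, so a failed run is easy to miss before the comparative steps. The following is a rough sketch of the same loop with failure tracking. The function name annotate_all and the variable OGAP_CMD are hypothetical and are introduced only for illustration; the OGAP flags are the ones already used in the example.

```shell
# OGAP_CMD is a hypothetical variable so the command can be swapped out;
# it defaults to the invocation used in the loop above.
OGAP_CMD=${OGAP_CMD:-"python ../OGAP.py"}

# Annotate each species given as an argument and report the ones that failed.
annotate_all() {
    failed=""
    for sp in "$@"; do
        # Skip species whose input file is missing and record them as failed.
        if [ ! -f "genbank/$sp.gb" ]; then
            echo "skip $sp: genbank/$sp.gb not found" >&2
            failed="$failed $sp"
            continue
        fi
        # Same flags as the loop above; stdout and stderr go to a per-species log.
        if ! $OGAP_CMD "genbank/$sp.gb" -mt -prefix "$sp" -outdir re_anno > "$sp.log" 2>&1; then
            failed="$failed $sp"
        fi
    done
    echo "failed:$failed"
}
```

Calling annotate_all with the five species names from the loop runs the same commands and ends with a single line listing any species that failed; an empty list after "failed:" means every run exited with status 0. This assumes OGAP.py returns a nonzero exit status on error, which is worth confirming in the per-species log.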