About this Workflow

This pipeline is an automatic structural annotation workflow written in Snakemake. The annotation relies on two tools that use RNA-Seq and/or protein homology information to predict coding sequences. The first is BRAKER, which uses GeneMark-EX and AUGUSTUS. The second is AUGUSTUS alone, which improves the annotation of small coding sequences with few or no introns. Before annotation, the repeat elements of the genome are masked to avoid annotation problems. In addition, this workflow can perform an Illumina assembly with ABySS using different k-mer values.

#Installation

To install the annotation workflow, use this command:

git clone https://github.com/FlorianCHA/AssemblyAndAnnotation_pipeline.git

This workflow uses many tools for assembly, mapping, annotation and quality control. Two options are available for installing the software: you can install all the tools manually, or you can use the Singularity launcher, which does not require installing any of the tools needed by this workflow.

Manual installation

If you install all the software yourself, please complete the software part of the config.yaml file.

Mandatory installation

Optional installation

RepeatMasker, if you want to mask the repeat elements of your genomes
ABySS, if you want to assemble your Illumina fastq files

Singularity containers

All containers for the workflow are available here. If you use 'Launcher_singularty.sh', the workflow downloads all the Singularity containers it needs. You only need to download the GeneMark-ES licence here.

Defining the workflow

Prepare config file

To run the workflow, you have to provide the data paths for all input files. Please complete the config.yaml file before launching the workflow.

1. Providing data

Assembly option

    # To assemble your Illumina data with ABySS, complete this part; otherwise skip it (keep every path empty: '')
    FASTQ: '/path/to/fastq/directory/' 
    SUFFIX_FASTQ_R1 : '_R1.fastq.gz' 
    SUFFIX_FASTQ_R2 : '_R2.fastq.gz'

FASTQ : Path of the directory that contains all the fastq files to assemble. If you leave this path empty, the workflow skips the assembly and uses the fasta files (given in the FASTA option) for the annotation step.
SUFFIX_FASTQ_R1 : Extension of the R1 fastq files contained in the FASTQ directory (for example: '_R1.fastq.gz')
SUFFIX_FASTQ_R2 : Extension of the R2 fastq files contained in the FASTQ directory (for example: '_R2.fastq.gz')
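
To check the two suffixes before launching an assembly, a small helper can list the samples that have both an R1 and an R2 file. This is only an illustrative sketch: `paired_samples` and the file names are hypothetical, and the workflow itself may derive sample names differently.

```python
def paired_samples(filenames, suffix_r1, suffix_r2):
    """Return the sorted sample names that have both an R1 and an R2 file.

    Hypothetical helper: a sample name is assumed to be the file name
    with its R1/R2 suffix removed.
    """
    if not suffix_r1 or not suffix_r2:
        raise ValueError("Both suffixes must be non-empty.")
    if suffix_r1 == suffix_r2:
        raise ValueError("R1 and R2 suffixes must differ.")
    r1 = {f[: -len(suffix_r1)] for f in filenames if f.endswith(suffix_r1)}
    r2 = {f[: -len(suffix_r2)] for f in filenames if f.endswith(suffix_r2)}
    return sorted(r1 & r2)


# Made-up file names: isolateB has no R2 file, so it is not listed.
files = ["isolateA_R1.fastq.gz", "isolateA_R2.fastq.gz", "isolateB_R1.fastq.gz"]
print(paired_samples(files, "_R1.fastq.gz", "_R2.fastq.gz"))
```

A sample missing from the result usually means one of its files is absent or one of the suffixes does not match the file names.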

Repeat element masking option

 ET_DB: '/path/to/repeat_element_db.fasta'

ET_DB : Path of the repeat element database for RepeatMasker. If you leave this path empty, the workflow does not mask the repeat elements of the genome.

Annotation option

 FASTA: '/path/to/fasta/directory/' 
 SUFFIX_FASTA : '.fasta'
 RNAseq_DIR : 'path/to/RNA_seqfastq/directory/'
 SUFFIX_RNAseq : '.fastq.gz'
 ID_SPECIES: 'arabidopsis'
 PROTEIN_REF: '/path/to/protein_ref.fasta' 
 GM_KEY : '/path/to/gm_key_64'

FASTA : Path of the directory that contains all the fasta files to annotate. If the FASTQ option is empty, a valid path is required here; otherwise you can leave this option empty.
RNAseq_DIR : Path of the directory that contains all the RNA-Seq data. If you leave this path empty, the pipeline runs only AUGUSTUS.
SUFFIX_RNAseq : Extension of the fastq files contained in the RNAseq_DIR directory (for example: '.fastq.gz', '.fq.gz', '.fq', etc.)
ID_SPECIES : Species ID used for AUGUSTUS training; please refer to the AUGUSTUS main page for this option.
PROTEIN_REF : Path of the protein fasta file. If you don't have this file, you can leave this option empty ('').
GM_KEY : Path of the licence for GeneMark-ES (please click here to download the licence).
OUTPUT : Output directory for all results of this pipeline
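
The way empty paths switch steps on and off can be summarised in a short sketch. The `planned_steps` function and the step labels are hypothetical and only restate the rules described in this section; they are not taken from the workflow's code. In particular, the sketch assumes that BRAKER runs whenever RNAseq_DIR is set and that AUGUSTUS alone always runs.

```python
def planned_steps(config):
    """List the steps implied by a config dictionary (illustrative only)."""

    def is_set(key):
        # An option counts as set when it is a non-empty, non-blank string.
        return bool(str(config.get(key) or "").strip())

    if not is_set("FASTQ") and not is_set("FASTA"):
        raise ValueError("Set FASTQ (to assemble) or FASTA (to annotate).")

    steps = []
    if is_set("FASTQ"):
        steps.append("ABySS assembly")
    if is_set("ET_DB"):
        steps.append("RepeatMasker masking")
    if is_set("RNAseq_DIR"):
        steps.append("BRAKER annotation")  # assumption: needs RNA-Seq data
    steps.append("AUGUSTUS annotation")
    return steps


# Placeholder paths: no assembly and no RNA-Seq data.
example = {
    "FASTQ": "",
    "ET_DB": "/path/to/repeat_element_db.fasta",
    "FASTA": "/path/to/fasta/directory/",
    "RNAseq_DIR": "",
}
print(planned_steps(example))
```

With this example configuration, the sketch lists only the masking step and the AUGUSTUS annotation.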
