FlorianCHA/DNAnnot

About this workflow

This pipeline is an automatic structural annotation workflow written in Snakemake. The annotation relies on two tools that use RNA-Seq and/or protein homology information to predict coding sequences. The first is BRAKER, which uses GeneMark-EX and AUGUSTUS. The second is AUGUSTUS alone, which improves the annotation of small coding sequences with few or no introns. Before annotation, the repeat elements of the genome are masked to avoid annotation problems. In addition, this workflow can assemble Illumina reads with ABySS using different k-mer values.

Installation

To install the annotation workflow, run this command:

git clone https://github.com/FlorianCHA/AssemblyAndAnnotation_pipeline.git

This workflow uses many tools for assembly, mapping, annotation, and quality control. Two options are available for installing the software: you can install all tools manually, or you can use the Singularity launcher, which does not require installing any of the tools used by this workflow.

Manual installation

If you install all the software yourself, complete the software part of the config.yaml file.

Mandatory installation

Optional installation

  • RepeatMasker, if you want to mask the repeat elements of your genomes
  • ABySS, if you want to assemble your Illumina FASTQ files

Singularity containers

All containers for the workflow are available here. If you use 'Launcher_singularty.sh', the workflow downloads all the Singularity containers it needs. You only need to download the GeneMark-ES licence here.

Defining the workflow

Prepare config file

To run the workflow, you have to provide the paths to all input files. Complete the config.yaml file before launching the workflow.

1. Providing data

Assembly option

    # To assemble your Illumina data with ABySS, complete this part; otherwise skip it (keep every path empty: '')
    FASTQ: '/path/to/fastq/directory/' 
    SUFFIX_FASTQ_R1 : '_R1.fastq.gz' 
    SUFFIX_FASTQ_R2 : '_R2.fastq.gz' 
  • FASTQ : Path of the directory that contains all the FASTQ files to assemble. If you leave this path empty, the workflow skips the assembly and uses the FASTA files (given in the FASTA option) for the annotation step.
  • SUFFIX_FASTQ_R1 : Extension of the R1 FASTQ files contained in the FASTQ directory (for example: '_R1.fastq.gz')
  • SUFFIX_FASTQ_R2 : Extension of the R2 FASTQ files contained in the FASTQ directory (for example: '_R2.fastq.gz')
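
The following is a minimal sketch, not code from this repository, of how the two suffix options could be used to group R1 and R2 files by sample. The helper name `pair_fastq` and the rule that the sample name is the file name minus its suffix are assumptions made for illustration. The workflow's own rules may resolve samples differently.

```python
from pathlib import Path
import tempfile


def pair_fastq(fastq_dir, suffix_r1, suffix_r2):
    """Return {sample: (r1_path, r2_path)} for samples that have both mates.

    Hypothetical helper: the sample name is assumed to be the R1 file name
    with SUFFIX_FASTQ_R1 removed.
    """
    fastq_dir = Path(fastq_dir)
    pairs = {}
    for r1 in sorted(fastq_dir.glob(f"*{suffix_r1}")):
        sample = r1.name[: -len(suffix_r1)]
        r2 = fastq_dir / f"{sample}{suffix_r2}"
        # Keep the sample only when the matching R2 file is present.
        if r2.is_file():
            pairs[sample] = (r1, r2)
    return pairs


if __name__ == "__main__":
    # Demo on empty placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("s1_R1.fastq.gz", "s1_R2.fastq.gz", "s2_R1.fastq.gz"):
            (Path(tmp) / name).touch()
        print(sorted(pair_fastq(tmp, "_R1.fastq.gz", "_R2.fastq.gz")))
```

In this sketch, a sample whose R2 file is missing is silently left out. Running such a check before launching can reveal a wrong suffix, for example if both suffix options are set to the R1 value.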

Repeat element masking option

 ET_DB: '/path/to/repeat_element_db.fasta'
  • ET_DB : Path of the repeat element database for RepeatMasker. If you leave this path empty, the workflow does not mask the repeat elements of the genome.

Annotation option

 FASTA: '/path/to/fasta/directory/' 
 SUFFIX_FASTA : '.fasta'
 RNAseq_DIR : 'path/to/RNA_seqfastq/directory/'
 SUFFIX_RNAseq : '.fastq.gz'
 ID_SPECIES: 'arabidopsis'
 PROTEIN_REF: '/path/to/protein_ref.fasta' 
 GM_KEY : '/path/to/gm_key_64' 
  • FASTA : Path of the directory that contains all the FASTA files to annotate. If the FASTQ option is empty, give a valid path here; otherwise you can leave this option empty.
  • RNAseq_DIR : Path of the directory that contains all the RNA-Seq data. If you leave this path empty, the pipeline runs only AUGUSTUS.
  • SUFFIX_RNAseq : Extension of the FASTQ files contained in the RNAseq_DIR directory (for example: '.fastq.gz', 'fq.gz', 'fq', etc.)
  • ID_SPECIES : Species ID used for AUGUSTUS training; refer to the AUGUSTUS main page for this option
  • PROTEIN_REF : Path of the protein FASTA file. If you do not have this file, you can leave this option empty ('').
  • GM_KEY : Path of the licence for GeneMark-ES (click here to download the licence).
  • OUTPUT : Output directory for all the results of this pipeline
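
The empty-path rules described above can be summarised in a short sketch. This is an illustration only and not the workflow's actual logic: the function name `plan_steps` and the step labels are hypothetical, and the config is represented as a plain dictionary instead of being read from config.yaml.

```python
def plan_steps(config):
    """List the stages implied by a config dictionary (illustrative only)."""

    def is_set(key):
        # An option counts as "empty" when it is missing, None, or ''.
        return bool(str(config.get(key) or "").strip())

    # Either reads to assemble or existing assemblies to annotate are needed.
    if not is_set("FASTQ") and not is_set("FASTA"):
        raise ValueError("Set FASTQ (to assemble) or FASTA (to annotate).")

    steps = []
    if is_set("FASTQ"):
        steps.append("assembly (ABySS)")
    if is_set("ET_DB"):
        steps.append("repeat masking (RepeatMasker)")
    if is_set("RNAseq_DIR"):
        steps.append("annotation (BRAKER + AUGUSTUS)")
    else:
        steps.append("annotation (AUGUSTUS only)")
    return steps


if __name__ == "__main__":
    example = {
        "FASTQ": "",
        "FASTA": "/path/to/fasta/directory/",
        "ET_DB": "/path/to/repeat_element_db.fasta",
        "RNAseq_DIR": "",
    }
    for step in plan_steps(example):
        print(step)
```

With the example values, assembly is skipped, masking is enabled, and only AUGUSTUS is used for annotation, because FASTQ and RNAseq_DIR are empty.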

2. Parameters for some specific tools

Launching workflow

1. On an HPC cluster

2. On a single machine
