
ngs_PIPELINE

Stephen Fisher edited this page Jan 29, 2015 · 5 revisions

Module: PIPELINE

This module runs a single dataset through a fixed sequence of analysis modules, chosen according to the selected data type (RNASeq or WGS). The module sequence is hardcoded in the ngs_PIPELINE.sh file.

Usage:
	ngs.sh pipeline [-i inputDir] [-o outputDir] [-t RNASeq | RNASeq_Stranded | RNASeq_Human | WGS] -p numProc -s species [-se] sampleID
Input:
	see individual commands
Output:
	see individual commands
Requires:
	see individual commands
OPTIONS:
	-i - parent directory containing the subdirectory with compressed fastq files (default: ./raw). This is the parent directory of the sample-specific directory. The sampleID will be used to complete the directory path (i.e., inputDir/sampleID).
	-o - parent directory containing the subdirectory with analysis files (default: ./analyzed). This is the parent directory of the sample-specific directory. The sampleID will be used to complete the directory path (i.e., outputDir/sampleID).
	-t type - RNASeq, RNASeq_Stranded, RNASeq_Human, or WGS (Whole Genome Sequencing) (default: RNASeq). RNASeq_Stranded assumes stranded reads for HTSeq counting and will generate intron counts. RNASeq_Human is the same as RNASeq_Stranded but also uses 'gene_name' for the name of the features in the HTSeq GTF file.
	-p numProc - number of CPUs to use.
	-s species - species from repository: /lab/repo/resources.
	-se - single-end reads (default: paired-end)
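
As a rough illustration of how the options above fit together, the sketch below assembles an example command line and the sample-specific paths that the -i and -o defaults imply. The sample ID (sample01) and the processor count are made-up values, and the species name is simply the one mentioned elsewhere on this page. The command is only printed, not run, and nothing here is taken from ngs_PIPELINE.sh itself.

```shell
#!/bin/sh
# Illustrative only: sample01 and the processor count are made-up values.
input_dir="./raw"          # default for -i
output_dir="./analyzed"    # default for -o
sample_id="sample01"       # hypothetical sample ID

# The sampleID completes both directory paths.
sample_input="$input_dir/$sample_id"
sample_output="$output_dir/$sample_id"

# Assemble the command as text rather than running it.
pipeline_cmd="ngs.sh pipeline -i $input_dir -o $output_dir -t RNASeq -p 4 -s hg38.gencode21.stranded $sample_id"

echo "fastq files expected in:   $sample_input"
echo "analysis files written to: $sample_output"
echo "command: $pipeline_cmd"
```

For single-end data, -se would be added before the sampleID, as shown in the usage line.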

This will process sequencing data using either an RNASeq or WGS (Whole Genome Sequencing) pipeline. For RNASeq the modules used are: init, fastqc, blast, trim, star, post, htseq, blastdb, and rsync. For WGS the modules used are: init, fastqc, blast, trim, bowtie, SPAdes, post, and rsync. See individual modules for documentation.

RNASeq modules and arguments (in order):

INIT
FASTQC
BLAST
TRIM -m 20 -q 53 -rAT 26 -rN -c $REPO_LOCATION/trim/contaminants.fa
FASTQC -i trim -o fastqc.trim
STAR
HTSEQ
 * if species = "hg38.gencode21.stranded" then [-stranded -introns]
POST
RSYNC
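
The RNASeq sequence listed above can be sketched as a dry run that only prints each step. The helper name rnaseq_steps is hypothetical and is not a function from ngs_PIPELINE.sh; the sketch mirrors the list on this page, including the species check for HTSEQ, and does not call any of the real modules.

```shell
#!/bin/sh
# Dry-run sketch: prints the RNASeq steps in the order listed on this page.
# rnaseq_steps is a hypothetical helper, not part of ngs_PIPELINE.sh.
rnaseq_steps() {
    species="$1"
    echo "INIT"
    echo "FASTQC"
    echo "BLAST"
    # Single quotes keep $REPO_LOCATION literal, since its value is not given here.
    echo 'TRIM -m 20 -q 53 -rAT 26 -rN -c $REPO_LOCATION/trim/contaminants.fa'
    echo "FASTQC -i trim -o fastqc.trim"
    echo "STAR"
    # HTSEQ gets the extra flags only for this species.
    if [ "$species" = "hg38.gencode21.stranded" ]; then
        echo "HTSEQ -stranded -introns"
    else
        echo "HTSEQ"
    fi
    echo "POST"
    echo "RSYNC"
}

rnaseq_steps "hg38.gencode21.stranded"
```

Passing any other species name prints the plain HTSEQ step instead.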