ngs_PIPELINE
Stephen Fisher edited this page Jan 29, 2015
·
5 revisions
This module will run a single dataset through the various analysis modules, depending on the selected data type (RNASeq or WGS). The modules used are hardcoded in the ngs_PIPELINE.sh file.
Usage: ngs.sh pipeline [-i inputDir] [-o outputDir] [-t RNASeq | RNASeq_Stranded | RNASeq_Human | WGS] -p numProc -s species [-se] sampleID

Input: see individual commands
Output: see individual commands
Requires: see individual commands

OPTIONS:
    -i - parent directory containing subdirectory with compressed fastq files (default: ./raw). This is the parent directory of the sample-specific directory. The sampleID will be used to complete the directory path (ie inputDir/sampleID).
    -o - directory containing subdirectory with analysis files (default: ./analyzed). This is the parent directory of the sample-specific directory. The sampleID will be used to complete the directory path (ie outputDir/sampleID).
    -t type - RNASeq or WGS (Whole Genome Sequencing) (default: RNASeq). RNASeq_Stranded assumes stranded reads for HTSeq counting and will generate intron counts. RNASeq_Human is the same as RNASeq_Stranded but also uses 'gene_name' for the name of the features in the HTSeq GTF file.
    -p numProc - number of cpu to use.
    -s species - species from repository: /lab/repo/resources.
    -se - single-end reads (default: paired-end)
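The way -i, -o and the sampleID combine into sample-specific directories can be sketched as follows. This is only an illustration of the documented defaults (./raw and ./analyzed); the function name resolve_sample_dirs and the sample ID sample01 are hypothetical and are not part of ngs.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ngs.sh): shows how the sample-specific
# directories are formed from the -i / -o values and the sampleID.
resolve_sample_dirs() {
    local sample_id="$1"
    local input_dir="${2:-./raw}"        # documented default for -i
    local output_dir="${3:-./analyzed}"  # documented default for -o
    printf 'input: %s/%s\n' "$input_dir" "$sample_id"
    printf 'output: %s/%s\n' "$output_dir" "$sample_id"
}

# With defaults, then with explicit (made-up) parent directories.
resolve_sample_dirs sample01
resolve_sample_dirs sample01 /data/raw /data/analyzed
```

In both calls the sampleID is simply appended to the parent directory, which is why -i and -o should point at the parent and not at the sample directory itself.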
This will process sequencing data using either an RNASeq or WGS (Whole Genome Sequencing) pipeline. For RNASeq the modules used are: init, fastqc, blast, trim, star, post, htseq, blastdb, and rsync. For WGS the modules used are: init, fastqc, blast, trim, bowtie, SPAdes, post, and rsync. See individual modules for documentation.
RNASeq modules and arguments (in order):
INIT
FASTQC
BLAST
TRIM -m 20 -q 53 -rAT 26 -rN -c $REPO_LOCATION/trim/contaminants.fa
FASTQC -i trim -o fastqc.trim
STAR
HTSEQ * if species = "hg38.gencode21.stranded" then [-stranded -introns]
POST
RSYNC
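As a rough dry-run sketch of how the data type selects a module sequence: the function modules_for_type below is hypothetical, and the real list is hardcoded in ngs_PIPELINE.sh and may differ. The RNASeq sequence follows the ordered list above, the WGS sequence follows the order given in the description, and treating RNASeq_Stranded and RNASeq_Human as using the same sequence as RNASeq is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper; it only prints module names and runs nothing.
# The actual module list is hardcoded in ngs_PIPELINE.sh.
modules_for_type() {
    case "$1" in
        RNASeq|RNASeq_Stranded|RNASeq_Human)
            # Assumption: the stranded/human variants share the RNASeq sequence.
            echo "init fastqc blast trim fastqc star htseq post rsync" ;;
        WGS)
            echo "init fastqc blast trim bowtie SPAdes post rsync" ;;
        *)
            echo "unknown type: $1" >&2
            return 1 ;;
    esac
}

# Print the sequence that would be run for a WGS sample.
for module in $(modules_for_type WGS); do
    echo "would run: $module"
done
```

An unrecognized type returns a non-zero status, so a wrapper script could stop before any module is started.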