RNA-seq Demonstration Pipeline

Bash scripts describing a simple RNA-seq pipeline, from adapter trimming to visualization on a genome browser.

Here is a useful article on common filetypes used in bioinformatics

Workflow steps:

Adapter trimming and fastq Quality Control (Trim Galore!)
INPUT: raw .fastq.gz files
OUTPUT: trimmed .fq.gz files, FastQC results in .html
Alignment (STAR)
INPUT: trimmed .fq.gz files, STAR index directory
OUTPUT: raw .bam files
Quality Control (samtools, Picard, bamtools)
INPUT: raw .bam files
OUTPUT: filtered .bam files
Quantification (Subread, awk)
INPUT: filtered .bam files, .gtf file (Gene Transfer Format)
OUTPUT: by-gene read counts in a simple text .tsv
Visualization (bedtools, bedGraphToBigWig)
INPUT: filtered .bam files, Chromosome Sizes file in .bed format
OUTPUT: .bw file that can be uploaded to a genome browser for visalization of read coverage.

Detailed methods

1. Adapter trimming is performed with Trim Galore! package.
      -q 15 - Trim base calls that are below this threshold in addition to adapter sequences.
      --stringency 5 - Minimum overlap of adapter sequence to be trimmed.
      -e 0.1 - Max. allowed error rate (no. of errors divided by the length of the matching region)
      --length 20 - Discard reads that become smaller than this after trimming
      --illumina - Adapter sequence to be trimmed is the first 13bp of the Illumina universal adapter AGATCGGAAGAGC
2. Alignment is performed with STAR aligner. This takes reads from our trimmed .fastq files and assigns them to regions on the genome.
      --genomeDir - Directory which has the STAR index files, is generated by the STAR command genomeGenerate.
      --outFilterType,--outFilterIntronMotifs, --alignIntronMax, --outSAMstrandField, --outSAMunmapped, --chimSegmentMin, --chimJunctionOverhangMin - Flags that control to what degree the output is filtered.
3. Quality control of BAM file. It is important to document these steps, as they can affect reproducibility in the results, and there are many options to choose from.
      a. Filter out any non-chromosomal sequences (contigs) and sort BAM
      b. Mark duplicate reads (set FLAG) via Picard MarkDuplicates and sort
      c. Remove the marked duplicates, as well as PCR artifacts, and reads failing vendor quality checks (-F 1540) and sort
      d. Index the final BAM
4. Read counting (quantification) counts reads over gene regions, given a BAM file. We will use these to perform downstream statistical analyses.

featureCounts \
        -p \ # Paired-end reads
        -t exon \ # Feature types to consider in GTF file
        -g gene_id \ # Attribute to group features (e.g. exons) into meta-features (e.g. genes)
        -s 1 \ # Input data is strand-specific
        -O \ # Count reads that overlap more than one meta-feature more than once.
        -T <# of CPUs> \
        -a <GTF File> \ # Path to GTF file
        -o <outfile> \
        <BAM File>

5. We construct a bigWig file, which is a binary file format for viewing the data on a genome browser track.

##### Normalization #####
READS=$(samtools view -c <bam>) # Calculate number of reads in the bam file 
FACTOR=$(echo "scale=10; 1000000/${READS}" | bc -l) # Calculate a normalization factor 
#########################

bamToBed -i <bam> -bed12 | bed12ToBed6 -i stdin | genomeCoverageBed -bg -i stdin -g <ChromosomeSizes> -scale ${FACTOR} | sort -k1,1 -k2,2n > <temporary output> # Long pipe of commands to convert bam file to sorted text .bed file
bedGraphToBigWig ${outdir}/${prefix}.tmp.bg ${chromsizes} ${outdir}/${prefix}.bw # Convert the newly created bed file into a bigwig file

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
README.md		README.md
step01_trim_galore.sh		step01_trim_galore.sh
step02_STAR_alignment.sh		step02_STAR_alignment.sh
step03_bam_qc.sh		step03_bam_qc.sh
step04_quantification.sh		step04_quantification.sh
step05_visualization.sh		step05_visualization.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RNA-seq Demonstration Pipeline

Bash scripts describing a simple RNA-seq pipeline, from adapter trimming to visualization on a genome browser.

Workflow steps:

Detailed methods

About

Releases

Packages

Languages

Nuwah12/Demo-RNAseq-Pipeline

Folders and files

Latest commit

History

Repository files navigation

RNA-seq Demonstration Pipeline

Bash scripts describing a simple RNA-seq pipeline, from adapter trimming to visualization on a genome browser.

Workflow steps:

Detailed methods

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages