This document describes the output produced by the pipeline.
The MeRIP-seq data used to illustrate MeRIPseqPipe were downloaded from GEO dataset GSE52662 (Mus musculus), which was used to describe dynamic m6A RNA modification during cell fate transition in mammalian embryonic stem cells.
All results shown below were produced with the default tools and options:
// Setting main parameters of analysis mode
stranded = "no"
single_end = true
gzip = true
mapq_cutoff = 20
motiflength = "5,6,7,8"
featurecount_minMQS = "0"
delfc = "0.58"
dmlfc = "0.58"
cluster_method = "single"
aligners = "hisat2"
peak_threshold = "medium"
peakCalling_mode = "independence"
peakMerged_mode = "rank"
expression_analysis_mode = "DESeq2"
methylation_analysis_mode = "QNB"
skip_createbedgraph = true
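The `delfc` and `dmlfc` values look like log2 fold-change cutoffs for differential expression and differential methylation. That reading is an assumption, since the listing does not define them. Under it, a cutoff of 0.58 corresponds to roughly a 1.5-fold change, which this small sketch makes explicit:

```python
import math


def log2fc_to_fold(log2fc: float) -> float:
    """Convert a log2 fold-change cutoff into a linear fold change."""
    return 2.0 ** log2fc


# Assumption: delfc / dmlfc are log2 fold-change thresholds.
cutoff = 0.58
print(f"log2FC {cutoff} is about {log2fc_to_fold(cutoff):.2f}-fold")
print(f"a 1.5-fold change is log2FC {math.log2(1.5):.3f}")
```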
All intermediate files are written to the pipeline results folder, so users can reuse them for further analysis as needed.
- QC
- fastp
*._aligners.fastq.gz
*._fastp.html
*._fastp.json
- fastqc
*._aligners_fastqc.html
*._aligners_fastqc.zip
- rseqc
- bam_stat
- infer_experiment
- inner_distance
- junction_annotation
- junction_saturation
- read_distribution
- read_duplication
- alignment
- rRNA_dup
*.fastq.gz
*._rRNA_summary.txt
- hisat2 (or other aligner)
*_hisat2.bam
*_hisat2_summary.txt
- samtoolsSort
groupid_sampleid.bam
- expressionAnalysis
- featurecounts
expression.*.count.matrix
expression.*.fpkm.matrix
expression.*.tpm.matrix
featurecount_*.input.count
- DESeq2 (or edgeR)
DESeq2_*.csv
- peakCalling
- macs2
*_normalized.bed
*_peaks.narrowPeak
*_peaks.xls
*.summits
- MATK
*.bed
*_normalized.bed
- metpeak
*_normalized.bed
- meyer
*.bed
*_normalized.bed
- mergedBed (or other peak merging tools)
rank_merged_allpeaks.bed
rank_merged_group_*.bed
rank_merged_sample_*.bed
- m6AAnalysis
- AnnotatedPeaks
- annotatedbygtf
peakcaller*_normalized.anno.txt
peakcaller*_normalized.peak_bed.center
peakcaller*_normalized.refSeq.all.bed
peakcaller*_normalized.tmp.refSeq.bed
peakcaller*_normalized.unanno.txt
*_normalized_annotatedbyhomer.bed
- diffm6A
analysistool_diffm6A_*.txt
- m6APredictionSites
m6A_sites_merged.bed
m6A_sites_group*.bed
- m6AQuantification
*_quantification.matrix
*.input.count
*.ip.count
- motif
- rank_merged*_homer
- Report
- diffReport
- PeaksMotifReport
- QCReadsReport
- ReportRData
- pipeline_info
- execution_report.html
- execution_timeline.html
- pipeline_report.html
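As a quick sanity check after a run, the folder layout listed above can be verified programmatically. The following is a minimal sketch, not part of the pipeline. It assumes that the six folders named in `EXPECTED_DIRS` sit directly under the results folder, and `results` is a hypothetical path. The helper name `missing_result_dirs` is also made up for illustration.

```python
from pathlib import Path

# Assumption: these folders are direct children of the results folder.
EXPECTED_DIRS = [
    "QC",
    "alignment",
    "expressionAnalysis",
    "peakCalling",
    "m6AAnalysis",
    "Report",
]


def missing_result_dirs(results_root):
    """Return the expected sub-folders that are absent under results_root."""
    root = Path(results_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# "results" is a hypothetical location; point it at your own output folder.
print(missing_result_dirs("results"))
```

An empty list means every expected folder exists. Any names printed point to steps that were skipped or did not finish.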
The quality of RNA sequencing data can be assessed at two levels: sequence-based and alignment-based metrics. The former include sequence quality, sequencing depth, read duplication rates, GC content, and nucleotide composition bias; the latter include mapping statistics, coverage uniformity, saturation of sequencing depth, read distribution over gene structures, and ribosomal RNA contamination.
MeRIPseqPipe combines three quality control tools, fastp, FastQC and RSeQC, which together cover most sequence-based and alignment-based quality control metrics. For alignment, three popular tools are supported: HISAT2, STAR and BWA.
All results are summarized with MultiQC in a comprehensive HTML report.
Output directory: results/Report/QCReadsReport
MultiQC is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report, and further statistics are available within the report data directory.
- QCReadsReport/multiqc_report_hisat2_data: contains the statistics data files.
- QCReadsReport/multiqc_report_hisat2.html: the full report.
MeRIPseqPipe calls m6A peaks on the preprocessed BAM files using MACS2, MeTPeak, Meyer and MATK. Users can adjust these settings to meet different needs.
Peak distribution and motif results are plotted in Peaks_Motif_Report_rank.html.
Output directory: results/Report/PeaksMotifReport
Three types of plots will be output for visualization:
- The percentage of peaks in different genome regions
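For a quick look at the called peaks outside the HTML report, the `*_peaks.narrowPeak` files listed under `peakCalling/macs2` can be read directly. The sketch below relies only on the first three tab-separated columns of the narrowPeak format (chromosome, start, end). The helper name `summarize_narrowpeak` is hypothetical and not part of the pipeline.

```python
import csv
from statistics import median


def summarize_narrowpeak(path):
    """Return (peak count, median peak width) for a narrowPeak/BED-like file."""
    widths = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and optional track/browser/comment lines.
            if not row or row[0].startswith(("#", "track", "browser")):
                continue
            start, end = int(row[1]), int(row[2])
            widths.append(end - start)
    if not widths:
        return 0, 0
    return len(widths), median(widths)
```

Calling `summarize_narrowpeak` on each sample's file gives a rough per-sample comparison of peak number and width before opening the full report.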
MeRIPseqPipe uses DESeq2 and edgeR to perform differential expression analysis and implements five algorithms, namely Wilcox-test, QNB, MATK, DESeq2 (GLM) and edgeR (GLM), for quantification and differential methylation analysis.
Output directory: results/Report/diffReport
All plots are generated in DiffReport_rank_QNB_DESeq2.html.
A heatmap, volcano plot, quadrant plot and ECDF curve are output for visualization:
- Heatmap of differentially expressed genes and methylated m6As.
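The grouping behind a volcano plot can be reproduced from a differential expression table. The sketch below makes several assumptions: the `DESeq2_*.csv` files are taken to carry the standard DESeq2 result columns `log2FoldChange` and `padj`, the 0.58 cutoff is taken from `delfc` above, and the 0.05 adjusted p-value cutoff is a common convention rather than a documented pipeline setting. The function name `classify_genes` is hypothetical.

```python
import csv
import io


def classify_genes(csv_text, lfc_cutoff=0.58, padj_cutoff=0.05):
    """Count up-regulated, down-regulated and non-significant rows."""
    counts = {"up": 0, "down": 0, "ns": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lfc = float(row["log2FoldChange"])
            padj = float(row["padj"])
        except ValueError:
            # DESeq2 reports NA for genes it could not test.
            counts["ns"] += 1
            continue
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            counts["up"] += 1
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            counts["down"] += 1
        else:
            counts["ns"] += 1
    return counts
```

To use it on a real file, pass in the file contents, for example `classify_genes(open(path).read())`, after checking that the column names match.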