remotes::install_github("Sage-Bionetworks/sageseqr")
The sageseqr package integrates the targets R package, the config package for R, and Synapse. The targets package tracks dependency relationships in the workflow and reruns a step only when its code or upstream data has changed. A config file allows inputs and parameters to be explicitly defined in one location. Synapse is a data repository that allows sensitive data to be stored and shared responsibly.
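As a rough illustration of how a config file keeps inputs and parameters in one place, the sketch below writes a small YAML file to a temporary path and reads it back with config::get(). The field names (counts, synID, version, outlier_threshold, cores) and the Synapse ID are placeholders invented for this example; they are not taken from the sageseqr documentation.

```r
# Hypothetical config file; every field name and value below is a placeholder.
config_path <- tempfile(fileext = ".yml")
writeLines(c(
  "default:",
  "  counts:",
  "    synID: syn00000000",
  "    version: 1",
  "  outlier_threshold: 5",
  "  cores: 2"
), config_path)

# config::get() reads the active configuration ("default" unless the
# R_CONFIG_ACTIVE environment variable is set) and returns it as a list.
cfg <- config::get(file = config_path)

# Individual inputs and parameters are then looked up by name.
cfg$counts$synID
cfg$outlier_threshold
```

Keeping every input in one file means a change to, say, the outlier threshold is made once rather than in each function call.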
The workflow takes RNA-seq gene counts and sample metadata as inputs, normalizes counts by conditional quantile normalization (CQN), removes outliers based on a user-defined threshold, empirically selects meaningful covariates, and returns differential expression analysis results. The data is also visualized in several ways to help you understand meaningful trends. The visualizations include a heatmap identifying highly correlated covariates, a sample-specific X and Y marker gene check, boxplots visualizing the distribution of continuous variables, and a principal component analysis (PCA) to visualize sample distribution.
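To make the outlier step more concrete, here is a minimal base R sketch of one way a user-defined threshold could be applied: a sample is flagged when its score on the first or second principal component lies more than z standard deviations from the mean. This is an assumption made for illustration, run on simulated data, and may differ from what sageseqr does internally; the name z and the simulated matrix are hypothetical.

```r
set.seed(1)

# Simulated log-expression matrix: 200 genes (rows) by 20 samples (columns).
expr <- matrix(
  rnorm(200 * 20), nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:20))
)
# Shift one sample so that it behaves like an outlier.
expr[, "sample20"] <- expr[, "sample20"] + 6

# prcomp() expects observations in rows, so transpose to samples by genes.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Hypothetical threshold: allowed number of SDs from the mean on PC1 or PC2.
z <- 3
z_scores <- scale(scores)
is_outlier <- apply(abs(z_scores) > z, 1, any)

# Names of flagged samples, and the expression matrix without them.
outlier_ids <- names(is_outlier)[is_outlier]
filtered <- expr[, !is_outlier, drop = FALSE]
```

With few samples, a z-score on principal components is bounded, so very strict thresholds may never flag anything; the threshold should be chosen with the sample size in mind.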
The steps that make up the workflow are called targets. Target objects are stored in a cache; the targets functions tar_read() and tar_load() return a stored object or load it into your environment, respectively. The show_source = TRUE argument belongs to the drake functions loadd and readd and has no counterpart in tar_load() and tar_read(); the command behind each target can instead be listed with tar_manifest().
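The difference between reading and loading a target can be seen with a toy pipeline. The sketch below assumes the targets package is installed and runs in a temporary directory via tar_dir(), so it does not touch an existing project; the target names toy_counts and toy_filtered are hypothetical and are not sageseqr targets.

```r
library(targets)

# tar_dir() runs the code from a temporary directory.
tar_dir({
  # A two-step toy pipeline written to _targets.R.
  tar_script({
    list(
      tar_target(toy_counts, c(geneA = 10, geneB = 0, geneC = 25)),
      tar_target(toy_filtered, toy_counts[toy_counts > 0])
    )
  }, ask = FALSE)

  # Build every target and store the results in the cache.
  tar_make()

  # tar_read() returns the stored value; assign it to any name you like.
  filtered <- tar_read(toy_filtered)

  # tar_load() creates a variable named after the target in your environment.
  tar_load(toy_counts)
  print(toy_counts)
})
```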
Importantly, running clean will remove the data stored as targets (but the data is never completely gone!). You may remove specific targets by name by passing them to the tar_delete() function, while tar_destroy() removes the entire data store.
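As a hedged sketch of clearing stored data with the targets package: tar_delete() removes the stored values of named targets, whereas tar_destroy() removes the whole data store. The example runs in a temporary directory, and the target names toy_a and toy_b are hypothetical.

```r
library(targets)

tar_dir({
  tar_script({
    list(
      tar_target(toy_a, 1 + 1),
      tar_target(toy_b, toy_a * 10)
    )
  }, ask = FALSE)
  tar_make()

  # Delete the stored value of one target by name; its metadata is kept.
  tar_delete(toy_b)
  remaining <- tar_objects()

  # Remove the entire data store (the _targets/ folder) without prompting.
  tar_destroy(ask = FALSE)
  store_exists <- file.exists("_targets")
})
```

After tar_delete(), the next tar_make() rebuilds only the deleted target; after tar_destroy(), everything is rebuilt from scratch.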
The targets are built by the targets function tar_make() and are:
Raw data:
- import_metadata - imports the raw metadata directly from Synapse.
- import_counts - imports the raw counts directly from Synapse.
- biomart_results - the complete list of genes with biomaRt annotations.
Exploratory data visualizations:
- gene_coexpression - the distribution of correlated gene counts.
- boxplots - the distribution of continuous variables.
- sex_plot - the distribution of samples by X and Y marker genes.
- sex_plot_pca - a PCA of sex-specific expression to visualize more dimensionality than sex_plot.
- correlation_plot - the correlation of covariates.
- significant_covariates_plot - the correlation of covariates to gene expression.
- outliers - the clustering of samples by PCA.
- plot_de_volcano - volcano plot of differentially expressed genes.
Transformed or normalized data:
- clean_md - metadata with factor and numeric types.
- filtered_counts - counts matrix with lowly expressed genes removed.
- biotypes - gene proportions summarized by biotype.
- cqn_counts - CQN normalized counts.
- model - model selected by multivariate forward stepwise regression (evaluated by the Bayesian information criterion (BIC)).
- de - differential expression results including adjusted p-values and gene list.
- report - output markdown report rendered as HTML.
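To illustrate the idea behind the model target, the sketch below runs forward stepwise selection scored by BIC on simulated data. It swaps in base R's step() on a single simulated gene, which is a univariate simplification and not the multivariate procedure the package describes; the covariate names (age, rin, batch) are hypothetical.

```r
set.seed(42)
n <- 100

# Simulated sample metadata with hypothetical covariates.
md <- data.frame(
  age   = rnorm(n, mean = 70, sd = 8),
  rin   = rnorm(n, mean = 7, sd = 1),
  batch = factor(sample(c("b1", "b2"), n, replace = TRUE))
)
# Simulated expression of one gene that depends on rin only.
md$expr <- 2 + 1.5 * md$rin + rnorm(n, sd = 0.5)

# Start from the intercept-only model.
null_model <- lm(expr ~ 1, data = md)

# Forward selection; k = log(n) makes step() penalize by BIC rather than AIC.
selected <- step(
  null_model,
  scope = list(lower = ~ 1, upper = ~ age + rin + batch),
  direction = "forward",
  k = log(n),
  trace = 0
)

# The formula of the selected model.
formula(selected)
```

At each step the covariate that lowers BIC the most is added, and selection stops when no remaining covariate improves the score.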
Anyone can create a Synapse account and access public data from a variety of disciplines, for example through the Alzheimer's Disease Knowledge Portal and the CommonMind Consortium.