This is a package for comparing different differential abundance methods used in microbial marker-gene (e.g. 16S rRNA), RNA-seq and protein/metabolite abundance analysis.
There are many methods for testing differential abundance and no gold standard, and results can vary considerably between the different statistical methods. The false positive rate and the power of the different methods depend highly on the dataset. This package aims to aid the analyst in choosing a method for a specific dataset based on empirical testing.
- Shuffle predictor variable (e.g. case vs. control)
- Spike in data for some randomly chosen features (OTU/gene/protein), such that they are associated with the shuffled predictor
- Apply methods, and check:
  - whether they can find the spike-ins
  - whether the false positive rate is controlled
- Compare methods with the `testDA` function
- Check the results with `plot` or `summary`
- Choose a method with a high AUC and an FPR not higher than ~0.05
- Run the data with the chosen test with the `DA."test"` function, where "test" is the name of the test (see details with `?testDA`)
- Check out your final results.
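A minimal sketch of this workflow, assuming `data` and `predictor` are already defined ("ttt" at the end only stands in for whichever method performs best on your data):

library(DAtest)
# Compare methods on shuffled data with spike-ins:
mytest <- testDA(data, predictor)
# Inspect FPR, AUC and Spike.detect.rate:
plot(mytest)
summary(mytest)
# Run the chosen test (here "ttt" as an example) on the real data:
res <- DA.ttt(data, predictor)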
Please cite the following publication if you use the DAtest package:
Russel et al. (2017) DAtest: A framework for comparing differential abundance and expression methods. Submitted
Remember also to cite the method you end up using for your final analysis (see implemented methods for links). If there is no associated publication (e.g. for a t-test) you can cite it as follows: "we used the t-test as implemented in DAtest version x.x.x (Russel et al. 2017)".
- Installation of packages
- How to compare methods
- How to run real data
- Implemented methods
- Extra features
- Errors and Issues
It is advised not to have any packages loaded when installing DAtest. Installation of DAtest and all dependencies has been tested to work on a clean R version 3.4.1 by installing in the following order:
The DAtest package:
install.packages("devtools")
library(devtools)
install_github("Russel88/DAtest")
The following packages are needed for full functionality, but DAtest will work without them:
source("https://bioconductor.org/biocLite.R")
biocLite("DESeq2")
biocLite("limma")
biocLite("edgeR")
biocLite("metagenomeSeq")
biocLite("baySeq")
biocLite("ALDEx2")
biocLite("qvalue")
biocLite("impute") # For SamSeq
install.packages("samr") # SamSeq
install.packages("pscl") # Zero-inflated Poisson and zero-inflated Negative Binomial
install.packages("statmod") # For limma with blocking
RAIDA and ANCOM have to be installed from external sources:
# Dependencies (after the bioconductor packages are installed)
install.packages(c("shiny","exactRankTests","openxlsx","protoclust","DT","coin","stringr"))
# RAIDA:
install.packages("https://cals.arizona.edu/~anling/software/RAIDA_1.0.tar.gz",repos = NULL)
# ANCOM (by default not included, as it is very slow):
download.file("https://www.niehs.nih.gov/research/resources/software/biostatistics/ancom/ancom_software.zip",destfile = "ANCOM.zip")
unzip("ANCOM.zip",exdir=getwd())
install.packages("ancom.R_1.1-3.tar.gz", repos = NULL)
Note: If installation fails for any of the Bioconductor or external packages (RAIDA, ANCOM), do not despair. DAtest will work seamlessly, but will simply exclude methods that depend on these packages.
The following are suggested, but not needed:
# For drawing Venn diagrams
install.packages("venneuler")
# For post-hoc testing (generalized) linear models
install.packages("lsmeans")
First, all methods with a "False Positive Rate" (FPR) above ~0.05 have an inflated false positive rate, and their p-values can therefore not be trusted.
Second, among methods with a low FPR, those with a high "Area Under the (Receiver Operator) Curve" (AUC) and "Spike Detection Rate" (Spike.detect.rate) are likely to have more power to detect signal in the respective dataset.
Therefore, we want a method with an FPR of ~0.05 or lower and a high AUC.
(if you have a phyloseq object, see details further down)
mytest <- testDA(data, predictor, R = 10)
`data` is either a matrix or data.frame with taxa/genes/proteins as rows and samples as columns (with rownames).

`predictor` is the predictor of interest, e.g. a factor denoting whether samples are cases or controls (in the same order as the columns in `data`). `predictor` can be a factor with more than two levels, in which case only the second level is spiked. It can also be continuous/quantitative.

`R` denotes how many times the spike-in and FPR/AUC calculation should be replicated. It is advised to use at least 10, but it can be set to 1 for a fast test of the function.
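A small hypothetical example of matching inputs (the values are purely illustrative):

# Four samples; the predictor is in the same order as the columns of data:
predictor <- factor(c("control", "control", "case", "case"))
mytest <- testDA(data, predictor, R = 10)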
The methods run in parallel, and by default the number of cores used is one less than available. This has been tested to work on Windows, OS X and Linux Debian. It can be changed with the `cores` argument; `cores = 1` will turn off parallel computing. If the function is terminated before ending, you might get the following warning:

closing unused connection X (...)

This can safely be ignored, but if you have terminated the function before it ended and your computer runs slowly, you might want to restart R to close all connections. If you run out of memory/RAM, reduce the number of cores used for computing.
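For example, to turn off parallel computing (which also keeps memory use to a minimum):

mytest <- testDA(data, predictor, cores = 1)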
Runtime of the different methods can vary quite a lot, and some methods are simply unfeasible for datasets with tens of thousands of features. The `runtimeDA` function can estimate the runtime of the different methods on your dataset: it runs on small subsets of the data and then extrapolates the runtime from these subsets. It will not be super precise, but it should give a decent estimate. The function prints how many minutes it will take to run each method one time and `R` times. If you only use 1 core, the actual runtime will be the sum of all the methods. With more cores the runtime will decrease and approach the runtime of the slowest method.
runtimeDA(data, predictor)
There are generally two ways to output results with a categorical predictor with multiple levels: either there is one p-value indicating whether the categories are similar or different (e.g. in ANOVA), or there is one p-value for each level, often where the first level is set as intercept and the remaining levels are tested against the intercept. Some methods let you choose between these options, others do not, but it is crucial that this option is the same in the `testDA` function as in the final analysis (use the `out.anova` argument).
All linear models (also GLMs) output results (including p-values) from the `anova`/`drop1` functions and are thus testing the `predictor` variable in one go. For your final analysis you can run post-hoc tests for all pairwise comparisons. If you are interested in testing treatments against a common baseline/control (i.e. intercept), you can set `out.anova = FALSE`. This will output results from the 2nd level of the `predictor` compared to the intercept. This is because only the 2nd level is spiked when `predictor` contains multiple levels. In your final analysis you can get an output with all p-values (see more here).
All limma models output results (including p-values) from `topTable`, testing all levels (minus 1) against the intercept. This can be changed with `out.anova`.
DESeq2 is set to run the Log Ratio Test (LRT) and is thus testing all levels of the `predictor` in one go.

EdgeR is set to test whether all levels of the `predictor` (minus intercept) are zero.
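For example, to output results for the second level of the `predictor` against the intercept instead of an anova-style test:

mytest <- testDA(data, predictor, out.anova = FALSE)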
E.g. repeated samples from the same patients, or control/treatments inside blocks. The `paired` argument should be a factor with the ID of the patient/subject/block (in the same order as the columns in `data`):
mytest <- testDA(data, predictor, paired = subjectID)
When a `paired` argument is provided, the `predictor` is shuffled within the levels of the `paired` factor.

Paired analysis can be very slow. If you simply can't wait to see the results, remove "neb" from the `tests` argument.
Some details on how the methods use the `paired` argument:

The t-test, Wilcoxon test, Friedman test, Quade test, SAMseq and the permutation test expect a balanced "unreplicated" design with one value for each combination of levels in the `paired` and `predictor` variables.

Negative binomial/Poisson GLM, linear regression and limma models use the `paired` variable as a random intercept (i.e. they become mixed-effect models); they are very flexible regarding design and work well for unbalanced designs.

EdgeR, DESeq2 and metagenomeSeq ZIG include the `paired` variable as a covariable in the model and are also generally flexible in the design.

ANCOM uses the `paired` variable in a repeated-measures manner.
mytest <- testDA(data, predictor, relative = FALSE)
If `relative = FALSE`, the values in `data` will NOT be normalized by the sum of the samples for "ttt", "ltt", "ltt2", "wil", "per", "lrm", "llm", "llm2", "lim", "lli", "lli2", "pea", "spe", "aov", "lao", "lao2" and "kru", and there will NOT be an offset of log(librarySize) for "neb", "poi", "qpo", "zpo" and "znb". Furthermore, tests that are specifically designed to handle sequence data are turned off: DESeq2, EdgeR, MetagenomeSeq, ANCOM, RAIDA, ALDEx2 and baySeq. Therefore, the `data` is analysed as provided, except for the tests that log-transform the data: "ltt", "llm", "lli" and "lao".
The `covars` argument should be a named list with the covariables (in the same order as the columns in `data`):
mytest <- testDA(data, predictor, covars = list(Age = subject.age, Date = exp.date))
`data` can also be a phyloseq object. In this case, the `predictor`, `paired` and `covars` arguments are the names of the variables in `sample_data(data)`:
mytest <- testDA(data, predictor = "Time", paired = "Patient", covars = c("Age","ExpDate"))
# Plot the results:
plot(mytest)
# Print a summary of the results:
summary(mytest)
# Details from the run:
mytest$details
# Average run times of each method:
mytest$run.times
Note: As ANCOM and SAMseq do not output p-values, AUC and Spike.detect.rate for "anc" and "sam" are based on pseudo p-values. These are calculated from the statistics/scores, as these are perfectly ranked according to detection/significance calling: the pseudo p-value is the inverse statistic/score, normalized such that, of the detected ("significant") features, the feature with the lowest statistic/score has a pseudo p-value of 0.05. A higher statistic/score gives a lower pseudo p-value and vice versa. For SAMseq, this is done on the ranked scores, and if effectSize is below 1 the scores are not inverted, such that high negative scores give low pseudo p-values. FPR is also based on pseudo p-values for "anc" and "sam", but as these cannot be adjusted like nominal p-values, FPR for these methods is the final false discovery rate, and we should expect an FPR = 0 for these two methods, unless you are willing to accept some false positives. This can be tuned with the `sig` ("anc") and `fdr.output` ("sam") arguments.

P-values for baySeq are defined as 1 - posterior likelihoods.
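If you are willing to accept some false positives from "anc" and "sam", these cut-offs can be passed through the `args` argument (described under 'Extra features'); the values below are purely illustrative:

mytest <- testDA(data, predictor, tests = c("anc", "sam"),
                 args = list(anc = list(sig = 0.20),
                             sam = list(fdr.output = 0.25)))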
The default effect size of 2 (equal to a log2 fold change of 1) might not fit your data. If the best method has an AUC below 0.7, you might want to increase the effect size. In contrast, if several methods have an AUC close to 1, you might want to decrease the effect size (towards 1) to better differentiate the methods. It is also possible to test for negative associations with effect sizes between 0 and 1, but this has not been tested extensively.
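For example (`effectSize` is the `testDA` argument controlling the size of the spike-in):

# If the best method has an AUC below 0.7, increase the effect size:
mytest <- testDA(data, predictor, effectSize = 4)
# If several methods have an AUC close to 1, decrease it towards 1:
mytest <- testDA(data, predictor, effectSize = 1.5)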
All tests can easily be run with the original data, e.g. the edgeR exact test:
res.ere <- DA.ere(data, predictor)
All methods can be accessed in the same way: `DA."test"`, where "test" is the abbreviation given in the details of the `testDA` function.
It is advised to set `allResults = TRUE` when checking final results. For all methods where relevant, this will output the raw results, often in a list with each element corresponding to a feature (OTU/gene/protein). For published methods, it is advised to check their tutorials on how to read the output.
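For example, to get the raw per-feature results from the negative binomial GLM:

res.neb <- DA.neb(data, predictor, allResults = TRUE)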
- IMPORTANT:
  - Set `out.anova` the same in `testDA` as in your final analysis for a reliable comparison
  - If your `predictor` has more than two levels you might have to set the `by` argument for "zig" (this is 2 by default)
  - All linear models (including all GLMs) output the p-value from an `anova`/`drop1` function. This can be changed with the `out.anova` argument
  - All limma models (lim, lli, lli2, vli) test all levels of the `predictor` against an intercept (with `topTable`). This can be changed with the `out.anova` argument
  - For ANCOM: if the FPR = 0, you would not expect false positives with the default settings. If "anc" has an FPR > 0, set `multcorr = 1`
For linear models the `drop1`/`anova` functions can be used to test the significance of the `predictor` and `covars` variables:
### Apply drop1 for each feature and output the adjusted p-values:
# Works on "zpo", "znb", "qpo", "neb", "poi". Non-paired "lrm", "llm", "llm2"
results <- DA.lrm(data, predictor, allResults = TRUE)
res.drop1 <- DA.drop1(results)
### Apply anova for each feature and output the adjusted p-values:
# Works on "lrm", "llm", "llm2". Non-paired "neb"
results <- DA.lrm(data, predictor, allResults = TRUE)
res.anova <- DA.anova(results)
For ANOVA and linear models we can also run post-hoc tests for all pairwise comparisons of a multi-class `predictor`/`covars`:
### Apply TukeyHSD for each feature for a selected variable and output the adjusted p-values:
# Works on "aov", "lao", "lao2"
results <- DA.aov(data, predictor, allResults = TRUE)
res.tukey <- DA.TukeyHSD(results, variable = "predictor") # variable can also be the name of a covar
### Apply lsmeans for each feature for a selected variable and output the adjusted p-values:
# This requires the lsmeans package.
# Works on "poi", "neb", "lrm", "llm", "llm2", "qpo", "znb", "zpo"
results <- DA.lrm(data, predictor, allResults = TRUE)
res.lsm <- DA.lsmeans(results, variable = "predictor") # variable can also be the name of a covar
# For paired "lrm", "llm", "llm2" the original predictor variable has to be supplied. For example:
results <- DA.lrm(data, predictor = mypred, paired = SubjectID, allResults = TRUE)
res.lsm <- DA.lsmeans(results, variable = "predictor", predictor = mypred)
# and if covars are used, they also need to be supplied for paired "lrm", "llm", "llm2". For example:
results <- DA.lrm(data, predictor = mypred, paired = SubjectID, covars = list(covar1 = mycovar), allResults = TRUE)
res.lsm <- DA.lsmeans(results, variable = "predictor", predictor = mypred, covars = list(covar1 = mycovar))
Alternatively, run all (or several) methods and check which features are found by several methods.
# Run many methods:
res.all <- allDA(data, predictor)
# Which features are detected by several methods:
View(res.all$table)
# Venn diagram of detected features from selected methods:
# This requires the venneuler package.
vennDA(res.all, tests = c("wil","ttt","ltt"))
# See results from a method (e.g. t.test "ttt"):
View(res.all$results$ttt)
A subset of the methods can be run by setting the `tests` argument, e.g. only those performing well based on results from `testDA`.
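For example (the selection of methods here is purely illustrative):

# Only run the methods that looked good in testDA:
res.sub <- allDA(data, predictor, tests = c("ttt", "wil", "ds2"))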
Choosing a method based on the results from running the real data, e.g. the one with the most significant features or the most interesting results, will most likely give you an inflated false positive rate, even if that method has a low FPR according to the `testDA` function. This is because `testDA` estimates the FPR independently for each method, but if a method is non-randomly chosen from a set of methods, they are no longer independent.
Either add it yourself (see under 'Extra features'), or write to me, preferably with a code snippet of the implementation (see email in Description).
- per - Permutation test with user-defined test statistic (See below for description of the paired permutation test)
- bay - baySeq
- adx - ALDEx t-test and wilcoxon
- wil - Wilcoxon Rank Sum on relative abundances (Paired is a Wilcoxon Signed Rank test)
- ttt - Welch t.test on relative abundances (Paired is paired t-test)
- ltt - Welch t.test, but reads are first transformed with log(abundance + delta) then turned into relative abundances
- ltt2 - Welch t.test, but with relative abundances transformed with log(relative abundance + delta)
- neb - Negative binomial GLM (The paired version is a mixed-effect model)
- erq - EdgeR - Quasi likelihood (The paired version is a model with the paired variable as covariate)
- ere - EdgeR - Exact test
- msf - MetagenomeSeq feature model
- zig - MetagenomeSeq zero-inflated gaussian (The paired version is a model with the paired variable as covariate)
- ds2 - DESeq2 (The paired version is a model with the paired variable as covariate)
- lim - LIMMA (The paired version is using the block argument in lmFit)
- lli - LIMMA, but reads are first transformed with log(abundance + delta) then turned into relative abundances
- lli2 - LIMMA, but with relative abundances transformed with log(relative abundance + delta)
- vli - LIMMA with voom (The paired version is using the block argument in lmFit)
- kru - Kruskal-Wallis test on relative abundances
- aov - ANOVA on relative abundances
- lao - ANOVA, but reads are first transformed with log(abundance + delta) then turned into relative abundances
- lao2 - ANOVA, but with relative abundances transformed with log(relative abundance + delta)
- lrm - Linear regression on relative abundances (The paired version is a mixed-effect model)
- llm - Linear regression, but reads are first transformed with log(abundance + delta) then turned into relative abundances
- llm2 - Linear regression, but with relative abundances transformed with log(relative abundance + delta)
- rai - RAIDA
- spe - Spearman Rank Correlation
- pea - Pearson Correlation
- poi - Poisson GLM (The paired version is a mixed-effect model)
- qpo - Quasi-poisson GLM
- zpo - Zero-inflated Poisson GLM
- znb - Zero-inflated Negative Binomial GLM
- fri - Friedman Rank Sum test
- qua - Quade test
- anc - ANCOM (by default not included, as it is very slow)
- sam - SAMseq
A paired permutation test is implemented specifically for this package. The test is similar to the original, but with a different test statistic and permutation scheme. The permutations are constrained in the paired version such that the predictor is only permuted within each level of the paired argument (e.g. subjects). The test statistic first finds the log-ratio between the two predictor levels (e.g. case and control) for each level of the paired argument and the final statistic is the mean of these log-ratios.
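A minimal sketch of this statistic for a single feature, assuming a two-level factor `predictor` and exactly one value per subject/level combination (an illustration, not the package's exact code):

paired_stat <- function(x, predictor, paired){
  # Log-ratio between the two predictor levels within each subject:
  per_subject <- sapply(split(seq_along(x), paired), function(i){
    log(x[i][predictor[i] == levels(predictor)[2]] /
        x[i][predictor[i] == levels(predictor)[1]])
  })
  # The final test statistic is the mean of the per-subject log-ratios:
  mean(per_subject)
}
# e.g. for one feature: paired_stat(data[1, ], predictor, subjectID)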
"zzz" (DA.zzz
) is a placeholder for user-defined methods. You have to
supply it with a function whose input is: A count_table
(data.frame,
samples are columns, rownames indicate features), a predictor
(vector), a paired
variable (factor), and a covars
argument (named
list with vectors). It is OK if your function doesn't use
paired
/covars
, but they have to be there in the arguments. The
output from the user-defined function should be a data.frame that at
least includes: The names of the features ("Feature"), p-values
("pval"), and name of method ("Method").
See example below on how to include a simple t-test on relative abundances:
# Define our function
myfun <- function(count_table, predictor, paired, covars){ # These four arguments should not be altered
  # Relative abundance
  rel <- apply(count_table, 2, function(x) x/sum(x))
  # t-test function for one feature
  tfun <- function(x){
    tryCatch(t.test(x ~ predictor)$p.value, error = function(e){NA})
  }
  # P-values for each feature
  pvals <- apply(rel, 1, tfun)
  # Collect and return data
  df <- data.frame(Feature = rownames(count_table),
                   pval = pvals)
  df$Method <- "My own t-test"
  return(df)
}
# Test that it works on real data
res <- DA.zzz(data, predictor, FUN = myfun)
# Use it in the testDA function
mytest <- testDA(data, predictor, tests = c("zzz","ttt","wil"), args = list(zzz = list(FUN = myfun)))
# Several user-defined methods can be included by simply putting numbers after "zzz" in both the "tests" and "args" arguments:
mytest <- testDA(data, predictor, tests = c("zzz","zzz2","ttt","wil"), args = list(zzz = list(FUN = myfun),
zzz2 = list(FUN = myfun2)))
Caution: If "zzz" is in the tests
argument, there will be no
checks of whether any of the supplied methods are suitable for the data.
If e.g. your predictor
is quantitative and "ttt" is in the tests
argument, it will try to run a t-test and fail miserably.
# Print the overall results:
print(mytest)
# View the results from a single method (e.g. "ere") from the first run:
View(mytest$results[[1]]["ere"])
# Plot the p-value histograms:
plot(mytest, p = TRUE)
Additional arguments can be passed to the internal functions with the `args` argument. It should be structured as a list with elements named by the tests:
E.g. to tell the `DA.per` function that it should only run 1000 iterations:
mytest <- testDA(...,args = list(per=list(noOfIterations=1000)))
To also specify that the log t-test should use a pseudo-count (delta) of 0.1:
mytest <- testDA(...,args = list(per=list(noOfIterations=1000), ltt=list(delta=0.1)))
Additional arguments are simply separated by commas.
Below is an overview of which functions get the arguments that are passed to a specific test:
- per - Passed to DA.per
- bay - Passed to getPriors and getLikelihoods
- adx - Passed to aldex
- wil - Passed to wilcox.test and DA.wil
- ttt - Passed to t.test and DA.ttt
- ltt - Passed to t.test and DA.ltt
- ltt2 - Passed to t.test and DA.ltt2
- neb - Passed to glm.nb and glmer.nb
- ere - Passed to calcNormFactors, estimateCommonDisp, estimateTagwiseDisp and exactTest
- erq - Passed to calcNormFactors, estimateDisp, glmQLFit and glmQLFTest
- msf - Passed to fitFeatureModel
- zig - Passed to fitZig
- ds2 - Passed to DESeq
- lim - Passed to eBayes and lmFit
- lli - Passed to eBayes, lmFit and DA.lli
- lli2 - Passed to eBayes, lmFit and DA.lli2
- kru - Passed to kruskal.test
- aov - Passed to aov
- lao - Passed to aov and DA.lao
- lao2 - Passed to aov and DA.lao2
- lrm - Passed to lm and lme
- llm - Passed to lm, lme and DA.llm
- llm2 - Passed to lm, lme and DA.llm2
- rai - Passed to raida
- spe - Passed to cor.test
- pea - Passed to cor.test
- poi - Passed to glm and glmer
- qpo - Passed to glm
- vli - Passed to voom, eBayes and lmFit
- zpo - Passed to zeroinfl
- znb - Passed to zeroinfl
- fri - Passed to friedman.test
- qua - Passed to quade.test
- anc - Passed to ANCOM
- sam - Passed to SAMseq
You might get an error with the "rJava" package. This will not be a problem for any of the main functionalities of this package, but it might interfere with the `vennDA` function. See here for solutions.