DAtest

This is a package for comparing different differential abundance methods used in microbial marker-gene (e.g. 16S rRNA), RNA-seq and protein/metabolite abundance analysis.

There are many methods for testing differential abundance and no gold standard, and results can vary a lot between the different statistical methods. The false positive rate and the power of the different methods depend strongly on the dataset. This package aims to aid the analyst in choosing a method for a specific dataset based on empirical testing.

The method goes as follows:

  • Shuffle the predictor variable (e.g. case vs. control)
  • Spike in data for some randomly chosen features (OTU/gene/protein), such that they are associated with the shuffled predictor
  • Apply methods, and check:
    • whether they can find the spike-ins
    • whether the false positive rate is controlled

The workflow (details can be found below):

  • Compare methods with testDA function
    • Check the results with plot or summary
    • Choose a method with a high AUC and an FPR not higher than ~0.05
  • Run the data with the chosen test using the DA."test" function, where "test" is the name of the test (see details with ?testDA)
    • Check out your final results.
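
A minimal end-to-end sketch of this workflow (the choice of "wil", the Wilcoxon test, as the final method is purely illustrative):

mytest <- testDA(data, predictor, R = 10)
summary(mytest)
plot(mytest)

# Suppose "wil" performed best:
res.wil <- DA.wil(data, predictor)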

Citation

Please cite the following publication if you use the DAtest package:

Russel et al. (2017) DAtest: A framework for comparing differential abundance and expression methods. Submitted

Remember also to cite the method you end up using for your final analysis (see implemented methods for links). If there is no associated publication (e.g. for a t-test), you can cite as follows: "we used the t-test as implemented in DAtest version x.x.x (Russel et al. 2017)"

Installation of packages

It is advised not to have any packages loaded when installing DAtest. Installation of DAtest and all dependencies has been tested to work on a clean R version 3.4.1 by installing in the following order:

The DAtest package:

install.packages("devtools")
library(devtools)
install_github("Russel88/DAtest")

The following are needed for full functionality, but the package will work without them:

source("https://bioconductor.org/biocLite.R")
biocLite("DESeq2")
biocLite("limma")
biocLite("edgeR")
biocLite("metagenomeSeq")
biocLite("baySeq")
biocLite("ALDEx2")
biocLite("qvalue")

biocLite("impute") # For SamSeq
install.packages("samr") # SamSeq

install.packages("pscl") # Zero-inflated Poisson and zero-inflated Negative Binomial
install.packages("statmod") # For limma with blocking

RAIDA and ANCOM have to be installed from external sources:

# Dependencies (after the bioconductor packages are installed)
install.packages(c("shiny","exactRankTests","openxlsx","protoclust","DT","coin","stringr"))

# RAIDA:
install.packages("https://cals.arizona.edu/~anling/software/RAIDA_1.0.tar.gz",repos = NULL)

# ANCOM (by default not included, as it is very slow):
download.file("https://www.niehs.nih.gov/research/resources/software/biostatistics/ancom/ancom_software.zip",destfile = "ANCOM.zip")
unzip("ANCOM.zip",exdir=getwd())
install.packages("ancom.R_1.1-3.tar.gz", repos = NULL)

Note: If installation fails for any of the Bioconductor or external packages (RAIDA, ANCOM), do not despair. DAtest will work seamlessly, but will simply exclude methods that depend on these packages.

The following are suggested, but not needed:

# For drawing Venn diagrams
install.packages("venneuler")

# For post-hoc testing (generalized) linear models
install.packages("lsmeans")

How to compare methods

First, all methods with a "False Positive Rate" (FPR) above ~0.05 have an inflated false positive rate, and their p-values can therefore not be trusted.

Second, among methods with a low FPR, those with a high "Area Under the (Receiver Operator) Curve" (AUC) and "Spike Detection Rate" (Spike.detect.rate) are likely to have more power to detect signal in the respective dataset.

Therefore, we want a method with an FPR of ~0.05 or lower and a high AUC.

Run the test:

(if you have a phyloseq object, see details further down)

mytest <- testDA(data, predictor, R = 10)

data is either a matrix or data.frame with taxa/genes/proteins as rows and samples as columns (with rownames).

predictor is the predictor of interest, e.g. a factor denoting whether samples are cases or controls (in the same order as columns in data).

predictor can be a factor with more than two levels, in which case only the second level is spiked.

predictor can also be continuous/quantitative.

R denotes how many times the spike-in and FPR/AUC calculation should be replicated. It is advised to use at least 10, but it can be set to 1 for a fast test of the function.

The function automatically uses multiple CPUs for fast execution

The methods run in parallel, and by default the number of cores used is one less than available. It has been tested to work on Windows, OS X and Linux Debian. It can be changed with the cores argument. cores = 1 will turn off parallel computing. If the function is terminated before ending you might get the following warning:

closing unused connection X (...)

This can safely be ignored, but if you have terminated the function before it ended and your computer runs slow, you might want to restart R to close all connections.

If you run out of memory/RAM, reduce the number of cores used for computing.
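
For example, parallel computing can be turned off entirely with the cores argument described above:

mytest <- testDA(data, predictor, cores = 1)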

If you have a lot of features, i.e. more than 10k

Runtime of the different methods can vary a lot, and some methods are simply unfeasible for datasets with tens of thousands of features. The runtimeDA function can estimate the runtime of the different methods on your dataset: it runs on small subsets of the data and then extrapolates the runtime from these subsets. It will not be super precise, but it should give a decent estimate. The function prints how many minutes it will take to run each method once and R times. If you only use 1 core, the actual runtime will be the sum of all the methods. With more cores, the runtime will decrease and approach the runtime of the slowest method.

runtimeDA(data, predictor)

If your predictor is categorical with more than two levels:

There are generally two ways to output results with a categorical predictor with multiple levels: either there is one p-value indicating whether the categories are similar or different (e.g. in ANOVA), or there is one p-value for each level, often with the first level set as intercept and the remaining levels tested against the intercept. Some methods let you choose between these options, others do not, but it is crucial that this choice is the same in the testDA function as in the final analysis (use the out.anova argument).

All linear models (also GLMs) output results (including p-values) from anova/drop1 functions and thus test the predictor variable in one go. For your final analysis you can run post-hoc tests for all pairwise comparisons. If you are interested in testing treatments against a common baseline/control (i.e. the intercept), you can set out.anova = FALSE. This will output results from the second level of the predictor compared to the intercept, because only the second level is spiked when the predictor contains multiple levels. In your final analysis you can get an output with all p-values (see more here).

All limma models output results (including p-values) from topTable, testing all levels except the first against the intercept. This can be changed with out.anova.

DESeq2 is set to run a likelihood ratio test (LRT) and thus tests all levels of the predictor in one go.

edgeR is set to test whether all levels of the predictor (minus the intercept) are zero.
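
For example, to get intercept-based results in the comparison (a sketch using the out.anova argument described above):

mytest <- testDA(data, predictor, out.anova = FALSE)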

If you have a paired/blocked experimental design:

E.g. repeated samples from the same patients, or control/treatment samples inside blocks.

The paired argument should be a factor with the ID of the patient/subject/block (in the same order as columns in data):

mytest <- testDA(data, predictor, paired = subjectID)

When a paired argument is provided, the predictor is shuffled within the levels of the paired factor.

Paired analysis can be very slow. If you simply can't wait to see the results, remove "neb" from the tests argument, as sketched below.
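
A sketch of running only a subset of tests (the abbreviations here are illustrative; see ?testDA for the full list):

mytests <- c("ttt", "wil", "per")
mytest <- testDA(data, predictor, paired = subjectID, tests = mytests)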

Some details on how methods use the paired argument:

t-test, wilcox test, friedman test, quade test, SAMseq and permutation test expect a balanced "unreplicated" design with one value for each combination of levels in the paired and predictor variables.

Negbinom/poisson GLM, linear regression and limma models use the paired variable as a random intercept (i.e. they become mixed-effects models); they are very flexible regarding design and work well for unbalanced designs.

edgeR, DESeq2 and metagenomeSeq ZIG include the paired variable as a covariable in the model and are also generally flexible in the design.

ANCOM uses the paired variable in a repeated-measures manner.

If you have non-relative abundances, e.g. for normalized protein abundance:

mytest <- testDA(data, predictor, relative = FALSE)

If relative = FALSE, the values in data will NOT be normalized by the sum of the samples for "ttt", "ltt", "ltt2", "wil", "per", "lrm", "llm", "llm2", "lim", "lli", "lli2", "pea", "spe", "aov", "lao", "lao2" and "kru", and there will NOT be an offset of log(librarySize) for "neb", "poi", "qpo", "zpo" and "znb". Furthermore, tests that are specifically designed to handle sequence data are turned off: DESeq2, edgeR, metagenomeSeq, ANCOM, RAIDA, ALDEx2 and baySeq. Therefore, the data is analysed as provided, except for the tests that log-transform the data: "ltt", "llm", "lli" and "lao".

If you have covariates:

The covars argument should be a named list with the covariables (in the same order as columns in data):

mytest <- testDA(data, predictor, covars = list(Age = subject.age, Date = exp.date))

If you have a phyloseq object:

data can also be a phyloseq object. In this case, the predictor, paired and covars arguments are the names of the variables in sample_data(data):

mytest <- testDA(data, predictor = "Time", paired = "Patient", covars = c("Age","ExpDate"))

Check the output:

plot(mytest)
summary(mytest)

# Details from the run:
mytest$details

# Average run times of each method:
mytest$run.times

Note:

As ANCOM and SAMseq do not output p-values, AUC and Spike.detect.rate for "anc" and "sam" are based on pseudo p-values. These are calculated from the statistics/scores, as these are perfectly ranked according to detection/significance calling: the pseudo p-value is the inverse statistic/score, normalized such that, among the detected ("significant") features, the feature with the lowest statistic/score has a pseudo p-value of 0.05. A higher statistic/score gives a lower pseudo p-value and vice versa. For SAMseq this is done on the ranked scores, and if effectSize is below 1 the scores are not inverted, such that high negative scores give low pseudo p-values. FPR is also based on pseudo p-values for "anc" and "sam", but as these cannot be adjusted like nominal p-values, FPR for these methods is the final false discovery rate, and we should expect an FPR of 0 for these two methods unless you are willing to accept some false positives. This can be tuned with the sig ("anc") and fdr.output ("sam") arguments.
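
For example, these thresholds can be passed through the args mechanism described under "Passing arguments to the different tests" (the values are illustrative):

mytest <- testDA(data, predictor, args = list(anc = list(sig = 0.05), sam = list(fdr.output = 0.05)))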

P-values for baySeq are defined as 1 - posterior likelihoods.

How to choose the effect size:

The default effect size of 2 (equal to a log2 fold change of 1) might not fit your data. If the best method has an AUC below 0.7, you might want to increase the effect size. In contrast, if several methods have an AUC close to 1, you might want to decrease the effect size (towards 1) to better differentiate the methods. It is also possible to test for negative associations with effect sizes between 0 and 1, but this has not been tested extensively.
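
For example, increasing the spike-in effect size via the effectSize argument (the value 4 is illustrative):

mytest <- testDA(data, predictor, effectSize = 4)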

How to run real data

All tests can easily be run with the original data. E.g. edgeR exact test:

res.ere <- DA.ere(data, predictor)

All methods can be accessed in the same way; DA."test" where "test" is the abbreviation given in the details of the testDA function.

It is advised to set allResults = TRUE when checking final results. For all methods where relevant, this will output the raw results, often in a list with each element corresponding to a feature (OTU/gene/protein). For published methods, it is advised to check their tutorials on how to read the output.

  • IMPORTANT:
    • Set out.anova the same in testDA as in your final analysis for a reliable comparison
    • If your predictor has more than two levels you might have to set the by argument for "zig" (the default is 2; see the sketch after this list)
    • All linear models (including all GLMs) output the p-value from an anova/drop1 function. This can be changed with the out.anova argument
    • All limma models (lim,lli,lli2,vli) test all levels of the predictor against an intercept (with topTable). This can be changed with the out.anova argument
    • For ANCOM: If the FPR = 0, you would not expect false positives with the default settings. If "anc" has an FPR > 0, set multcorr = 1
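
A hedged sketch of setting by for "zig"; whether by is given directly to DA.zig or passed through args in testDA is an assumption here, and the value 3 is illustrative:

# In the final analysis, using the DA."test" naming pattern:
res.zig <- DA.zig(data, predictor, by = 3)

# In the comparison, via the args mechanism described below:
mytest <- testDA(data, predictor, args = list(zig = list(by = 3)))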

For linear models the drop1/anova functions can be used to test significance of the predictor and covars variables:

### Apply drop1 for each feature and output the adjusted p-values:
# Works on "zpo", "znb", "qpo", "neb", "poi". Non-paired "lrm", "llm", "llm2"
results <- DA.lrm(data, predictor, allResults = TRUE)
res.drop1 <- DA.drop1(results)

### Apply anova for each feature and output the adjusted p-values:
# Works on "lrm", "llm", "llm2". Non-paired "neb"
results <- DA.lrm(data, predictor, allResults = TRUE)
res.anova <- DA.anova(results)

For anova and linear models we can also run post-hoc tests for all pairwise comparisons of multi-class predictor/covars.

### Apply TukeyHSD for each feature for a selected variable and output the adjusted p-values:
# Works on "aov", "lao", "lao2"
results <- DA.aov(data, predictor, allResults = TRUE)
res.tukey <- DA.TukeyHSD(results, variable = "predictor") # variable can also be the name of a covar

### Apply lsmeans for each feature for a selected variable and output the adjusted p-values:
# This requires the lsmeans package.
# Works on "poi", "neb", "lrm", "llm", "llm2", "qpo", "znb", "zpo"
results <- DA.lrm(data, predictor, allResults = TRUE)
res.lsm <- DA.lsmeans(results, variable = "predictor") # variable can also be the name of a covar

# For paired "lrm", "llm", "llm2" the original predictor variable has to be supplied. For example:
results <- DA.lrm(data, predictor = mypred, paired = SubjectID,  allResults = TRUE)
res.lsm <- DA.lsmeans(results, variable = "predictor", predictor = mypred)

# and if covars are used, they also need to be supplied for paired "lrm", "llm", "llm2". For example:
results <- DA.lrm(data, predictor = mypred, paired = SubjectID, covars = list(covar1 = mycovar),  allResults = TRUE)
res.lsm <- DA.lsmeans(results, variable = "predictor", predictor = mypred, covars = list(covar1 = mycovar))

Alternatively, run all (or several) methods and check which features are found by several methods.

# Run many methods:
res.all <- allDA(data, predictor)

# Which features are detected by several methods:
View(res.all$table)

# Venn diagram of detected features from selected methods:
# This requires the venneuler package.
vennDA(res.all, tests = c("wil","ttt","ltt"))

# See results from a method (e.g. t.test "ttt"):
View(res.all$results$ttt)

A subset of methods can be run by setting the tests argument, e.g. only those performing well based on the results from testDA, as sketched below.
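
For example (the test abbreviations are illustrative):

res.all <- allDA(data, predictor, tests = c("wil","ttt","ltt"))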

How NOT to choose a method

Choosing a method based on the results from the real data, e.g. the one with the most significant features or with the most interesting results, will most likely give you an inflated false positive rate, even if the method has a low FPR based on the testDA function. This is because testDA estimates the FPR independently for each method, but if a method is non-randomly chosen from a set of methods, they are no longer independent.

Implemented methods

Is your favorite method missing?

Either add it yourself (see under 'Extra features'), or write to me, preferably with a code snippet of the implementation (see email in Description).

Paired permutation test

A paired permutation test is implemented specifically for this package. The test is similar to the original, but with a different test statistic and permutation scheme. The permutations are constrained in the paired version such that the predictor is only permuted within each level of the paired argument (e.g. subjects). The test statistic first finds the log-ratio between the two predictor levels (e.g. case and control) for each level of the paired argument and the final statistic is the mean of these log-ratios.
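
An illustrative sketch (not the package's internal code) of the test statistic described above, for a single feature in a design with one case and one control value per subject:

# Hypothetical vectors: case[i] and control[i] hold the feature's abundances
# for subject i
log.ratios <- log(case / control) # one log-ratio per subject
stat <- mean(log.ratios)          # the final test statistic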

Extra features

Adding user-defined methods

"zzz" (DA.zzz) is a placeholder for user-defined methods. You have to supply it with a function whose input is: A count_table (data.frame, samples are columns, rownames indicate features), a predictor (vector), a paired variable (factor), and a covars argument (named list with vectors). It is OK if your function doesn't use paired/covars, but they have to be there in the arguments. The output from the user-defined function should be a data.frame that at least includes: The names of the features ("Feature"), p-values ("pval"), and name of method ("Method").

See example below on how to include a simple t-test on relative abundances:

# Define our function
myfun <- function(count_table, predictor, paired, covars){ # These four arguments should not be altered
  
  # Relative abundance
  rel <- apply(count_table, 2, function(x) x/sum(x))
  
  # t-test function
  tfun <- function(x){
    tryCatch(t.test(x ~ predictor)$p.value, error = function(e){NA}) 
  }
  
  # P-values for each feature
  pvals <- apply(rel, 1, tfun)
  
  # Collect and return data
  df <- data.frame(Feature = rownames(count_table),
                   pval = pvals)
  df$Method <- "My own t-test"
  return(df)
  
}

# Test that it works on real data
res <- DA.zzz(data, predictor, FUN = myfun)

# Use it in the testDA function
mytest <- testDA(data, predictor, tests = c("zzz","ttt","wil"), args = list(zzz = list(FUN = myfun)))

# Several user-defined methods can be included by simply putting numbers after "zzz" in both the "tests" and "args" arguments:
mytest <- testDA(data, predictor, tests = c("zzz","zzz2","ttt","wil"), args = list(zzz = list(FUN = myfun),
                                                                                   zzz2 = list(FUN = myfun2)))

Caution: If "zzz" is in the tests argument, there will be no checks of whether any of the supplied methods are suitable for the data. If e.g. your predictor is quantitative and "ttt" is in the tests argument, it will try to run a t-test and fail miserably.

Results from all the runs:

print(mytest)

See the output from the individual methods, e.g. the first run of "ere":

View(mytest$results[[1]]["ere"])

Plot p-value histograms for all the non-spiked features:

plot(mytest, p = TRUE)

Passing arguments to the different tests

Additional arguments can be passed to the internal functions with the args argument. It should be structured as a list with elements named by the tests:

E.g. telling the DA.per function that it should only run 1000 iterations:

mytest <- testDA(...,args = list(per=list(noOfIterations=1000)))

Adding that the log t-test should use a pseudocount of 0.1:

mytest <- testDA(...,args = list(per=list(noOfIterations=1000), ltt=list(delta=0.1)))

Additional arguments are simply separated by commas.

Below is an overview of which functions get the arguments that are passed to a specific test:

  • per - Passed to DA.per
  • bay - Passed to getPriors and getLikelihoods
  • adx - Passed to aldex
  • wil - Passed to wilcox.test and DA.wil
  • ttt - Passed to t.test and DA.ttt
  • ltt - Passed to t.test and DA.ltt
  • ltt2 - Passed to t.test and DA.ltt2
  • neb - Passed to glm.nb and glmer.nb
  • ere - Passed to calcNormFactors, estimateCommonDisp, estimateTagwiseDisp and exactTest
  • erq - Passed to calcNormFactors, estimateDisp, glmQLFit and glmQLFTest
  • msf - Passed to fitFeatureModel
  • zig - Passed to fitZig
  • ds2 - Passed to DESeq
  • lim - Passed to eBayes and lmFit
  • lli - Passed to eBayes, lmFit and DA.lli
  • lli2 - Passed to eBayes, lmFit and DA.lli2
  • kru - Passed to kruskal.test
  • aov - Passed to aov
  • lao - Passed to aov and DA.lao
  • lao2 - Passed to aov and DA.lao2
  • lrm - Passed to lm and lme
  • llm - Passed to lm, lme and DA.llm
  • llm2 - Passed to lm, lme and DA.llm2
  • rai - Passed to raida
  • spe - Passed to cor.test
  • pea - Passed to cor.test
  • poi - Passed to glm and glmer
  • qpo - Passed to glm
  • vli - Passed to voom, eBayes and lmFit
  • zpo - Passed to zeroinfl
  • znb - Passed to zeroinfl
  • fri - Passed to friedman.test
  • qua - Passed to quade.test
  • anc - Passed to ANCOM
  • sam - Passed to SAMseq

Errors and Issues

JAVA_HOME error

You might get an error with the "rJava" package. This will not be a problem for any of the main functionalities of this package, but it might interfere with the vennDA function. See here for solutions.
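
Common workarounds (these are general rJava fixes, not from the DAtest documentation):

# 1) On Linux/macOS, re-register Java with R from a shell, then reinstall rJava:
#      sudo R CMD javareconf
# 2) Or point R at your Java installation before loading rJava
#    (the path below is hypothetical):
Sys.setenv(JAVA_HOME = "/path/to/your/java")
install.packages("rJava")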