MAST unable to estimate components when including mixed effect #98

gabprg · 2018-08-01T17:46:29Z

Hi there!

I've just recently encountered this issue and was hoping to see whether you guys could help with understanding what's going on: I want to identify differentially expressed genes between 2 conditions. We have 4 replicates (samples from different animals) per condition, which were processed in 2 batches (2 from each condition/batch).

The formula I used to fit the model is: ~ Condition + Batch + nGenes + (1|animal) . One of the known differences between the 2 conditions is that a transgene is only expressed in one of them (and the signal is very robust). From previous posts, I understand that the continuous component of the model can't be computed in that scenario, which is what I see if I fit the model with the formula as above, just without the mixed effect. However, when fitting with the mixed effect, the discrete component is no longer estimated, and a coefficient appears for the continuous component (although not significant). I'm trying to understand why this is happening, since this transgene is essentially a positive control for differential expression, and isn't showing up. Do you have any insights on what's causing this (/how to resolve it)?

Here are the relevant rows from the summary object ("sum.lrt <- MAST::summary(mast.zlmFit, doLRT = T)") using the 2 formulas:

~ Condition + Batch + nGenes:

~ Condition + Batch + nGenes + (1|animal):

Thanks!

Gabriela

gfinak · 2018-08-01T21:25:18Z

This does seem odd.
Can you post a minimal reproducible example (with data) that we could use to investigate?

gabprg · 2018-08-07T15:42:19Z

Hi again!

Sorry for the delay - I've put together a minimal SingleCellAssay object with 3 genes - 2 of those are other random housekeeping genes which are expressed across the board, and then the 3rd is the transgene I mentioned above. Condition 1 corresponds to the one where the transgene is robustly expressed, it's absent in the other cases. I'm attaching the SingleCellAssay object (rds format). You should be able to re-obtain the result above by running the following code (load the rds object as mast.raw):

`
print("Fitting the zlm model...")
mast.zlmFit <- zlm(
formula = formula(~Condition + Batch + (1|animal) + ngeneson),
sca = mast.raw,
parallel = FALSE,
method = 'glmer',
ebayes = FALSE
)

sum.lrt <- MAST::summary(mast.zlmFit, doLRT = T)
summ1 = sum.lrt$datatable

mast.zlmFit2 <- zlm(
formula = formula(~Condition + Batch + ngeneson),
sca = mast.raw,
parallel = FALSE,
method = 'glm',
ebayes = FALSE
)

sum.lrt2 <- MAST::summary(mast.zlmFit2, doLRT = T)
summ2 = sum.lrt2$datatable
`

I suspect that part of the problem is due to the fact that we're trying to fit a complex model with relatively low cell numbers - if that is the case, it'd be great if you had recommendations on alternative ways to run MAST on this data or even other tools that could be used.

Let me know if anything is unclear, and thanks for the help!

mastExampleObject.zip

gfinak · 2018-08-07T18:53:02Z

Could you post your sessionInfo()?
I get the following error reading your SingleCellAssay object:

Error in updateObject(x@rowRanges, check = FALSE) : 
  no slot of name "rowRanges" for this object of class "SingleCellAssay"

Just looking into what changes have occurred in SummarizedExperiment.

I am using SummarizedExperiment v 1.10.1

NEWS file indicates:

    o rowRanges() now is supported on a SummarizedExperiment object that is
      not a RangedSummarizedExperiment, and returns NULL. Also doing
      'rowRanges(x) <- NULL' on a RangedSummarizedExperiment object now is
      supported and degrades it to a SummarizedExperiment instance.

I wonder if there is some backward-incompatibility.

gfinak · 2018-08-07T19:40:08Z

Note: I've managed to rescue the file.

gabprg · 2018-08-07T19:53:53Z

Here's the output:

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 13.04

Matrix products: default
BLAS: /usr/lib/openblas-base/libopenblas.so.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] lme4_1.1-15 MAST_1.4.1 SummarizedExperiment_1.6.5 DelayedArray_0.4.1 matrixStats_0.53.1
[6] Biobase_2.36.2 GenomicRanges_1.28.6 GenomeInfoDb_1.12.3 IRanges_2.12.0 S4Vectors_0.16.0
[11] BiocGenerics_0.24.0 Seurat_2.2.1 Matrix_1.2-12 cowplot_0.9.2 ggplot2_2.2.1

loaded via a namespace (and not attached):
[1] backports_1.1.2 Hmisc_4.1-1 fastmatch_1.1-0 VGAM_1.0-5 sn_1.5-1 plyr_1.8.4
[7] igraph_1.2.1 lazyeval_0.2.1 splines_3.4.1 BiocParallel_1.10.1 digest_0.6.15 foreach_1.4.4
[13] htmltools_0.3.6 GOSemSim_2.4.1 lars_1.2 GO.db_3.4.1 gdata_2.18.0 magrittr_1.5
[19] checkmate_1.8.5 memoise_1.1.0 cluster_2.0.6 mixtools_1.1.0 ROCR_1.0-7 sfsmisc_1.1-2
[25] recipes_0.1.2 gower_0.1.2 dimRed_0.1.0 R.utils_2.6.0 colorspace_1.3-2 blob_1.1.0
[31] dplyr_0.7.4 RCurl_1.95-4.10 bindr_0.1.1 survival_2.41-3 iterators_1.0.9 ape_5.1
[37] glue_1.2.0 DRR_0.0.3 gtable_0.2.0 ipred_0.9-6 zlibbioc_1.24.0 XVector_0.16.0
[43] kernlab_0.9-25 ddalpha_1.3.1.1 prabclus_2.2-6 DEoptimR_1.0-8 abind_1.4-5 scales_0.5.0
[49] DOSE_3.4.0 mvtnorm_1.0-7 DBI_0.8 Rcpp_0.12.17 dtw_1.18-1 metap_0.8
[55] htmlTable_1.11.2 tclust_1.3-1 proxy_0.4-21 foreign_0.8-69 bit_1.1-12 mclust_5.4
[61] SDMTools_1.1-221 Formula_1.2-2 tsne_0.1-3 lava_1.6 prodlim_1.6.1 htmlwidgets_1.0
[67] fgsea_1.3.1.9000 FNN_1.1 gplots_3.0.1 RColorBrewer_1.1-2 fpc_2.1-11 acepack_1.4.1
[73] modeltools_0.2-21 ica_1.0-1 pkgconfig_2.0.1 R.methodsS3_1.7.1 flexmix_2.3-14 nnet_7.3-12
[79] caret_6.0-79 tidyselect_0.2.4 rlang_0.2.0 reshape2_1.4.3 AnnotationDbi_1.40.0 munsell_0.4.3
[85] tools_3.4.1 RSQLite_2.0 ranger_0.8.0 ggridges_0.4.1 broom_0.4.3 stringr_1.3.0
[91] yaml_2.1.18 ModelMetrics_1.1.0 knitr_1.20 bit64_0.9-7 robustbase_0.92-8 caTools_1.17.1
[97] purrr_0.2.4 bindrcpp_0.2.2 pbapply_1.3-4 nlme_3.1-131.1 R.oo_1.21.0 RcppRoll_0.2.2
[103] DO.db_2.9 compiler_3.4.1 rstudioapi_0.7 tibble_1.4.2 stringi_1.1.7 lattice_0.20-35
[109] trimcluster_0.1-2 psych_1.7.8 nloptr_1.0.4 diffusionMap_1.1-0 pillar_1.2.1 irlba_2.3.2
[115] data.table_1.10.4-3 bitops_1.0-6 qvalue_2.10.0 R6_2.2.2 latticeExtra_0.6-28 KernSmooth_2.23-15
[121] gridExtra_2.3 codetools_0.2-15 MASS_7.3-49 gtools_3.5.0 assertthat_0.2.0 CVST_0.2-1
[127] withr_2.1.1 mnormt_1.5-5 GenomeInfoDbData_1.0.0 diptest_0.75-7 clusterProfiler_3.6.0 grid_3.4.1
[133] rpart_4.1-13 timeDate_3043.102 tidyr_0.8.0 class_7.3-14 minqa_1.2.4 rvcheck_0.0.9
[139] segmented_0.5-3.0 Rtsne_0.13 numDeriv_2016.8-1 lubridate_1.7.3 scatterplot3d_0.3-41 base64enc_0.1-3

gfinak · 2018-08-07T22:32:22Z

The attached file can be read in by MAST under R > 3.5.0 and associated Bioconductor packages.

mastexample.zip

I can fit the discrete part by hand when passing nAGQ=0 to glmer.
Maybe we need to add the ability to pass arguments to the fit function.

amcdavid · 2018-08-08T02:36:59Z

Sorry, I am late to the party and haven't read this thread through in detail yet. We do have this ability. We can pass arguments to the discrete and continuous "fit" methods with fitArgsC or fitArgsD = list(...). See

MAST/R/lmWrapper-glmer.R

Line 223 in 521ef98

    
           object@fitC <- do.call(cfun, c(list(formula=formC, data=quote(datpos), REML=FALSE), fitArgsC))

for how it's used under the hood. The documentation for this is deficient, but there is something under
`?"LMlike-class"'

gfinak · 2018-08-08T16:40:19Z

library(dplyr)
mast.raw <- readRDS("mast_rescued_example.rds")
summary(zlm(~ Condition + (1|Batch), sca = mast.raw, method = "glmer", ebayes = FALSE, fitArgsD = list(nAGQ = 0)))$datatable %>% 
filter(primerid=="transgene")

Gives the following:

fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients

Done!
Combining coefficients and standard errors
Calculating log-fold changes
    primerid component    contrast       ci.hi        ci.lo          coef
1  transgene         D (Intercept) 6048.941514 -6090.025397 -2.054194e+01
2  transgene         S (Intercept)          NA           NA            NA
3  transgene         C (Intercept)    4.194017     2.904607  3.549312e+00
4  transgene         C  Condition1          NA           NA  3.549312e+00
5  transgene         S  Condition1          NA           NA            NA
6  transgene     logFC  Condition1         NaN          NaN  6.157843e+00
7  transgene         D  Condition1 6091.904201 -6047.062776  2.242071e+01
8  transgene         C  Condition2          NA           NA  3.549312e+00
9  transgene         S  Condition2          NA           NA            NA
10 transgene     logFC  Condition2         NaN          NaN  4.271039e-09
11 transgene         D  Condition2 9159.859938 -9159.856155  1.891544e-03
               z
1  -6.633426e-03
2   7.625161e+00
3   1.079024e+01
4             NA
5             NA
6            NaN
7   7.240120e-03
8             NA
9             NA
10           NaN
11  4.047397e-07

The continuous part has a model.matrix of rank 1 for the transgene as it's only expressed in condition 1.
The discrete part has complete separation between conditions.
I don't think there's much to be done if you want to fit a random effects model at this time.

The following is a little more reasonable:

> summary(zlm(~Condition+Batch,sca=mast.raw,method="bayesglm",ebayes=TRUE))$datatable %>% filter(primerid=="transgene")

Done!
Combining coefficients and standard errors
Calculating log-fold changes
    primerid component    contrast      ci.hi       ci.lo        coef
1  transgene         D (Intercept) -1.8729913 -6.14760085 -4.01029607
2  transgene         S (Intercept)         NA          NA          NA
3  transgene         C (Intercept)  3.4800833  2.69328621  3.08668474
4  transgene         D      Batch2  1.5481903 -0.79105237  0.37856896
5  transgene     logFC      Batch2  0.1757916 -0.07852615  0.04863272
6  transgene         S      Batch2         NA          NA          NA
7  transgene         C      Batch2  1.4946451  0.36616222  0.93040364
8  transgene         C  Condition1         NA          NA          NA
9  transgene         S  Condition1         NA          NA          NA
10 transgene     logFC  Condition1        NaN         NaN         NaN
11 transgene         D  Condition1  7.7171705  3.54865402  5.63291228
12 transgene         C  Condition2         NA          NA          NA
13 transgene         S  Condition2         NA          NA          NA
14 transgene     logFC  Condition2        NaN         NaN         NaN
15 transgene         D  Condition2  2.4696289 -4.63723070 -1.08380089
            z
1  -3.6775456
2   8.2736652
3  15.3782752
4   0.6343776
5   0.7496007
6   2.7338533
7   3.2318748
8          NA
9          NA
10        NaN
11  5.2969948
12         NA
13         NA
14        NaN
15 -0.5977917

and you can add animal and ngeneson fixed effects to the model.

gabprg · 2018-08-08T16:57:22Z

Thanks for looking into this! I'll go ahead and try some of your suggestions.

gfinak mentioned this issue Aug 1, 2018

Nested batch effects in MAST #97

Open

fbrundu mentioned this issue Jun 15, 2020

NaN in coefficient when using model with several covariates (but FDR < 1) #135

Closed

gfinak mentioned this issue Dec 4, 2020

Problems when fitting a linear-mixed model ('glmer') with sc-RNAseq data #148

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MAST unable to estimate components when including mixed effect #98

MAST unable to estimate components when including mixed effect #98

gabprg commented Aug 1, 2018

gfinak commented Aug 1, 2018

gabprg commented Aug 7, 2018

gfinak commented Aug 7, 2018 •

edited

Loading

gfinak commented Aug 7, 2018

gabprg commented Aug 7, 2018

gfinak commented Aug 7, 2018

amcdavid commented Aug 8, 2018

gfinak commented Aug 8, 2018

gabprg commented Aug 8, 2018

MAST unable to estimate components when including mixed effect #98

MAST unable to estimate components when including mixed effect #98

Comments

gabprg commented Aug 1, 2018

gfinak commented Aug 1, 2018

gabprg commented Aug 7, 2018

gfinak commented Aug 7, 2018 • edited Loading

gfinak commented Aug 7, 2018

gabprg commented Aug 7, 2018

gfinak commented Aug 7, 2018

amcdavid commented Aug 8, 2018

gfinak commented Aug 8, 2018

gabprg commented Aug 8, 2018

gfinak commented Aug 7, 2018 •

edited

Loading