caterpillar plots for a subset of the grouping factors #544

dmbates · 2021-07-20T21:58:26Z · Maintainer

@kliegl has been working on a very large data set in which one of the grouping factors, Student, has over 100,000 levels. The next grouping factor, School, has about 500 levels. He would like to create caterpillar plots of the prediction intervals for the School random effects. At present this requires computing the conditional variance (condVar) results for all the grouping factors. That means a tremendous amount of effort goes into evaluating conditional variances of the random effects for each of the students, even though it is known that they won't be used.

So we should work out a way to create the conditional covariance matrices for a "trailing" set of grouping factors. Alternatively, we can define condVar to apply to only one grouping factor because we only plot one set of random effects at a time in the caterpillar plot.

Restricting to a single grouping factor is aided by the fact that the central calculation solves a system whose left-hand side is the random-effects part of the blocked lower Cholesky factor L, and leading zeros propagate through such a solution. That is, when working with a lower-triangular matrix, as in the call to ldiv! below, you don't need the part of L corresponding to Student if you only want information on the School random effects. The hand-waving explanation is that a lower-triangular system is solved by forward substitution, so if the right-hand side starts with a run of zeros, the same leading zeros appear in the solution. You do need to take into account the blocks to the right of (and below) the one where you start (Cohort in Reinhold's example), but not those to the left.
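To make the leading-zeros argument concrete, here is a minimal forward-substitution sketch in plain Julia. The function name `forward_solve_skip` is hypothetical and not part of MixedModels. Row i of the solution depends only on rows 1 through i, so entries of x before the first non-zero of b stay zero and the leading rows and columns of L are never read.

```julia
using LinearAlgebra

# Hypothetical helper: forward substitution for lower-triangular L that starts
# at the first non-zero entry of b. Entries of x before that index stay zero,
# so only the trailing block L[k0:end, k0:end] is ever read.
function forward_solve_skip(L::AbstractMatrix{T}, b::AbstractVector{T}) where {T}
    n = length(b)
    x = zeros(T, n)
    k0 = findfirst(!iszero, b)
    k0 === nothing && return x          # b == 0 gives x == 0
    for i in k0:n
        acc = b[i]
        for j in k0:(i - 1)             # x[j] == 0 for j < k0, so skip those terms
            acc -= L[i, j] * x[j]
        end
        x[i] = acc / L[i, i]
    end
    return x
end

L = LowerTriangular(rand(6, 6) + 6I)    # diagonal shifted by 6 so the solve is well conditioned
b = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]      # three leading zeros
x = forward_solve_skip(L, b)
x ≈ L \ b                               # true
all(iszero, x[1:3])                     # true
```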

Currently the condVar method is defined as

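# Defined inside MixedModels, where LinearAlgebra and SparseArrays are already in scope.
function condVar(m::LinearMixedModel{T}) where {T}
    s = sdest(m)                  # estimated residual standard deviation
    # sparse lower-triangular random-effects part of L
    @static if VERSION < v"1.6.1"
        spL = LowerTriangular(SparseMatrixCSC{T,Int}(sparseL(m)))  # convert Int32 indices to Int
    else
        spL = LowerTriangular(sparseL(m))
    end
    nre = size(spL, 1)
    val = Array{T,3}[]            # one vi × vi × ℓi array per grouping factor
    offset = 0                    # rows of L used by the preceding grouping factors
    for (i, re) in enumerate(m.reterms)
        λt = s * transpose(re.λ)
        vi = size(λt, 2)          # random effects per level
        ℓi = length(re.levels)    # number of levels
        vali = Array{T}(undef, (vi, vi, ℓi))
        scratch = Matrix{T}(undef, (nre, vi))
        for b in 1:ℓi
            # right-hand side is zero except in the rows for level b
            fill!(scratch, zero(T))
            copyto!(view(scratch, (offset + (b - 1) * vi) .+ (1:vi), :), λt)
            ldiv!(spL, scratch)   # forward solve with the full L
            mul!(view(vali, :, :, b), scratch', scratch)
        end
        push!(val, vali)
        offset += vi * ℓi
    end
    return val
end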

where sparseL creates a sparse version of the part of the L factor associated with the random effects.

For example, consider a model fit to the :mrk17_exp1 data set:

julia> using LinearAlgebra, MixedModels

julia> m6 = restoreoptsum!(
                  LinearMixedModel(
                      @formula(1000 / rt ~ 1 + F*P*Q*lQ*lT + (1+F+P+Q+lQ+lT|subj) + (1+P+Q+lQ+lT|item)),
                      MixedModels.dataset(:mrk17_exp1);
                      contrasts = Dict(
                          :F => EffectsCoding(),
                          :P => EffectsCoding(),
                          :Q => EffectsCoding(),
                          :lQ => EffectsCoding(),
                          :lT => EffectsCoding(),
                          :subj => Grouping(),
                          :item => Grouping(),
                       ),
                   ),
                   "/var/tmp/m6optsum.json",
               )
Linear mixed model fit by maximum likelihood
 :(1000 / rt) ~ 1 + F + P + Q + lQ + lT + F & P + F & Q + P & Q + F & lQ + P & lQ + Q & lQ + F & lT + P & lT + Q & lT + lQ & lT + F & P & Q + F & P & lQ + F & Q & lQ + P & Q & lQ + F & P & lT + F & Q & lT + P & Q & lT + F & lQ & lT + P & lQ & lT + Q & lQ & lT + F & P & Q & lQ + F & P & Q & lT + F & P & lQ & lT + F & Q & lQ & lT + P & Q & lQ & lT + F & P & Q & lQ & lT + (1 + F + P + Q + lQ + lT | subj) + (1 + P + Q + lQ + lT | item)
   logLik   -2 logLik     AIC       AICc        BIC    
 -3573.7752  7147.5504  7285.5504  7286.1416  7817.2358

Variance components:
            Column    Variance   Std.Dev.    Corr.
item     (Intercept)  0.00320193 0.05658561
         P: unr       0.00012897 0.01135662 -0.05
         Q: deg       0.00016004 0.01265087 -0.36 +0.38
         lQ: deg      0.00003496 0.00591235 -0.37 +0.03 +0.03
         lT: WD       0.00015741 0.01254638 -0.11 +0.87 +0.01 +0.35
subj     (Intercept)  0.03061731 0.17497802
         F: LF        0.00004444 0.00666663 -0.36
         P: unr       0.00012734 0.01128433 -0.35 +0.89
         Q: deg       0.00079011 0.02810896 -0.41 +0.45 +0.73
         lQ: deg      0.00011615 0.01077708 -0.06 +0.17 +0.18 +0.58
         lT: WD       0.00104563 0.03233622 +0.26 +0.10 +0.02 -0.37 -0.51
Residual              0.08569933 0.29274449
 Number of obs: 16409; levels of grouping factors: 240, 73

  Fixed-effects parameters:
──────────────────────────────────────────────────────────────────────────────────────
                                                   Coef.  Std. Error       z  Pr(>|z|)
──────────────────────────────────────────────────────────────────────────────────────
(Intercept)                                  1.63747      0.0209291    78.24    <1e-99
F: LF                                       -0.019249     0.00438422   -4.39    <1e-04
P: unr                                      -0.01883      0.00274759   -6.85    <1e-11
Q: deg                                      -0.0427489    0.00409441  -10.44    <1e-24
lQ: deg                                     -0.00162212   0.00266016   -0.61    0.5420
lT: WD                                       0.00839459   0.00450703    1.86    0.0625
F: LF & P: unr                              -0.00720568   0.00241009   -2.99    0.0028
F: LF & Q: deg                              -0.00139393   0.0024356    -0.57    0.5671
P: unr & Q: deg                              0.00138021   0.00229629    0.60    0.5478
F: LF & lQ: deg                             -0.000987806  0.00234793   -0.42    0.6740
P: unr & lQ: deg                            -0.00238929   0.00231526   -1.03    0.3021
Q: deg & lQ: deg                             0.00775559   0.00231437    3.35    0.0008
F: LF & lT: WD                              -0.000473814  0.00244671   -0.19    0.8464
P: unr & lT: WD                              4.85807e-5   0.00231302    0.02    0.9832
Q: deg & lT: WD                             -0.00169322   0.00231266   -0.73    0.4641
lQ: deg & lT: WD                             0.00531033   0.00231237    2.30    0.0216
F: LF & P: unr & Q: deg                      0.000308206  0.00229632    0.13    0.8932
F: LF & P: unr & lQ: deg                     0.00134673   0.00231567    0.58    0.5609
F: LF & Q: deg & lQ: deg                    -0.00264547   0.00231785   -1.14    0.2537
P: unr & Q: deg & lQ: deg                    0.00402448   0.00231689    1.74    0.0824
F: LF & P: unr & lT: WD                      0.00200189   0.00231408    0.87    0.3870
F: LF & Q: deg & lT: WD                     -0.00120572   0.00231126   -0.52    0.6019
P: unr & Q: deg & lT: WD                     0.000136496  0.00231455    0.06    0.9530
F: LF & lQ: deg & lT: WD                     0.0015969    0.0023174     0.69    0.4908
P: unr & lQ: deg & lT: WD                   -1.08634e-5   0.0023171    -0.00    0.9963
Q: deg & lQ: deg & lT: WD                    0.00893863   0.00231446    3.86    0.0001
F: LF & P: unr & Q: deg & lQ: deg           -0.00221633   0.00231481   -0.96    0.3383
F: LF & P: unr & Q: deg & lT: WD             0.00135658   0.00231532    0.59    0.5579
F: LF & P: unr & lQ: deg & lT: WD           -0.00289712   0.00231786   -1.25    0.2113
F: LF & Q: deg & lQ: deg & lT: WD           -0.00402424   0.00231737   -1.74    0.0825
P: unr & Q: deg & lQ: deg & lT: WD          -0.00186401   0.00231764   -0.80    0.4212
F: LF & P: unr & Q: deg & lQ: deg & lT: WD   0.00129047   0.00231497    0.56    0.5772
──────────────────────────────────────────────────────────────────────────────────────

The blocks of A and L are:

julia> BlockDescription(m6)
rows:     item          subj         fixed     
1200:   BlkDiag    
 438:    Dense     BlkDiag/Dense 
  33:    Dense         Dense         Dense

The last row, however, corresponds to the fixed effects and the response. Thus sparseL returns a sparse lower-triangular matrix of dimension 1638 (= 1200 + 438):
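The block sizes can be checked by hand: each random-effects block has (number of random-effects columns) × (number of levels) rows. A small sketch using the numbers from the model summary; the named tuples are only for illustration and are not how MixedModels stores its terms.

```julia
# Random-effects terms of m6 in the order they appear in L
# (column and level counts taken from the model summary).
terms = [(name = "item", ncols = 5, nlevels = 240),   # 1 + P + Q + lQ + lT
         (name = "subj", ncols = 6, nlevels = 73)]    # 1 + F + P + Q + lQ + lT

blocksizes = [t.ncols * t.nlevels for t in terms]     # rows per block: [1200, 438]
offsets = cumsum([0; blocksizes[1:end-1]])            # rows preceding each block: [0, 1200]
nre = sum(blocksizes)                                 # 1638, the size of sparseL(m6)
nfixed = 32 + 1                                       # 32 fixed-effects columns plus the response
```

The offset of a grouping factor is exactly the number of leading rows that a restricted computation could skip.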

julia> spL = sparseL(m6)
1638×1638 SparseArrays.SparseMatrixCSC{Float64, Int32} with 492597 stored entries:
⠓⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠈⠳⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠳⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠈⠑⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠈⠓⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠳⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠓⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠳⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠑⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠓⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠳⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠓⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠳⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠑⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠓⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣄⠀⠀⠀⠀
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣄⠀⠀
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣄

The point of describing all this is that, to evaluate the conditional variance-covariance matrices for the subj random effects, you don't need the part on the left; you only need the 438 × 438 block in the lower right. In Reinhold's example the whole matrix will be on the order of 600,000 by 600,000, but only a few hundred rows and columns are involved in obtaining the conditional covariance matrices for the Schools.

My plan right now is to rename sparseL as ranefL, or something like that, and give it a second argument, a grouping factor name, which will default to the first grouping factor. For the first grouping factor the result will be essentially what it is now, except that the sparse matrix will be passed through densify, which makes it dense unless it is sparse enough to warrant keeping it that way. (Sparse matrices are only worth the trouble if there is a high degree of sparsity. In this case about 37% of the lower triangle is non-zero, so it would be easier and faster to work with a dense matrix.)

julia> 492597 / MixedModels.kp1choose2(1638)
0.36696860186793073
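A threshold rule of the kind described for densify can be sketched as follows. `maybe_densify` and its `threshold` keyword are hypothetical names, and the default of 0.1 is an arbitrary assumption rather than the value MixedModels uses. The density is the same ratio as `nnz / kp1choose2(n)` computed above.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper, not the MixedModels `densify`: store a lower-triangular
# sparse matrix densely when the fraction of its lower triangle that is
# stored exceeds `threshold` (0.1 here is an arbitrary default).
function maybe_densify(S::SparseMatrixCSC; threshold::Real = 0.1)
    n = size(S, 1)
    density = nnz(S) / (n * (n + 1) ÷ 2)   # same ratio as nnz / kp1choose2(n)
    return density > threshold ? Matrix(S) : S
end

# For the m6 example: 492597 / (1638 * 1639 ÷ 2) ≈ 0.367, well above 0.1
maybe_densify(sparse(tril(rand(100, 100)))) isa Matrix      # true: full lower triangle
maybe_densify(spdiagm(0 => ones(100))) isa SparseMatrixCSC  # true: density ≈ 0.02
```

Making the threshold a keyword is one way to let callers trade memory for speed without changing the default behavior.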

If the grouping factor is not the first one in the fitted model, the L that is returned will consist of only the lower-right block(s) from that position onward. In Reinhold's case, requesting the School grouping factor would return the blocks corresponding to School and Cohort.
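A sketch of the restricted computation for a single grouping factor, assuming a dense lower-triangular L and plain vectors holding each factor's λ matrix and number of levels. `condvar_one` is a hypothetical name, not the MixedModels API. It follows the loop in condVar above, but slices off the lower-right block starting at the first row of factor k, so the right-hand-side offsets are measured from that row.

```julia
using LinearAlgebra

# Hypothetical sketch (not the MixedModels API): conditional covariance arrays
# for grouping factor `k` only, using the trailing lower-right block of L.
#   L     : dense lower-triangular random-effects part of L, all grouping factors
#   λs    : relative covariance factor λ for each grouping factor
#   nlevs : number of levels of each grouping factor
#   s     : residual standard deviation (sdest(m) in condVar)
function condvar_one(L::AbstractMatrix{T}, λs, nlevs, k::Integer, s::Real) where {T}
    sizes = [size(λ, 1) * ℓ for (λ, ℓ) in zip(λs, nlevs)]
    start = 1 + sum(sizes[1:(k - 1)])              # first row of factor k
    Lk = LowerTriangular(L[start:end, start:end])  # blocks for factors k, k+1, ...
    λt = s * transpose(λs[k])
    vi = size(λt, 2)
    ℓi = nlevs[k]
    val = Array{T}(undef, vi, vi, ℓi)
    scratch = Matrix{T}(undef, size(Lk, 1), vi)
    for b in 1:ℓi
        fill!(scratch, zero(T))
        view(scratch, (b - 1) * vi .+ (1:vi), :) .= λt  # rows relative to `start`
        ldiv!(Lk, scratch)                              # forward solve on the trailing block
        mul!(view(val, :, :, b), scratch', scratch)
    end
    return val
end

# Toy setup: two factors occupying 2 × 4 and 1 × 3 rows of L
λs = [LowerTriangular([1.0 0.0; 0.5 1.0]), LowerTriangular(fill(0.8, 1, 1))]
nlevs = [4, 3]
n = 2 * 4 + 1 * 3
Lfull = LowerTriangular(rand(n, n) + n * I)
v2 = condvar_one(Matrix(Lfull), λs, nlevs, 2, 0.5)   # 1 × 1 × 3 array
```

Because the leading blocks are never touched, the cost per level depends only on the size of the trailing block (School and Cohort in Reinhold's case) rather than on the whole matrix.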

I will explore this and report back on timings, etc.

palday · 2021-07-20T22:08:21Z · Maintainer

I wouldn't rename for now (we just had a breaking release, after all 😉); I would add a new function instead. We can look into formal deprecation/redirection as we become more familiar with how it is used. Otherwise, this sounds like a great optimization to have.

On the densify front: I need to give some thought to how to do it, but it might be worthwhile to expose the threshold to the user (both for this case and for our use of densify in model construction). This would also make it easier for us to test the performance implications of different thresholds. Exposing it would be nice because sparse matrices may still consume less memory, which may matter when there is high memory pressure (whether because of the size of the problem or the limitations of somebody's computer). Going the other way, it might also be nice to go dense more often if we ever support GPU-assisted computation.
