Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EDF and cAIC #383

Open
James-Thorson-NOAA opened this issue Nov 22, 2024 · 7 comments
Open

EDF and cAIC #383

James-Thorson-NOAA opened this issue Nov 22, 2024 · 7 comments

Comments

@James-Thorson-NOAA
Copy link
Collaborator

Nan Zheng and Noel Cadigan developed a generic and fast approximation to conditional AIC (Eq. 6 here). Besides calculating expected log-likelihood for out-of-sample data that share a random effect with fitted data (e.g., optimizing parsimony for interpolation, similar to simple-random crossvalidation), it calculates effective degrees of freedom (EDF) for spline components passed via the formula, which appear to closely match those the EDF calculated for blocks of random effects in mgcv.

Questions:

  1. If I provide a code snippet for cAIC using TMB, would you want a stand-along cAIC function, or to fold it in as an option in AIC.sdmTMB S3-generic?
  2. If I provide a snippet for EDF for GMRF and/or spline terms, any ideas on how this would be merged into the summary.sdmTMB output?
@seananderson
Copy link
Member

Cool!

It looks like the generic AIC() is defined as AIC(object, ..., k = 2) with ... being reserved for additional models. So, I don't think we have an option to add arguments there. The extractAIC() generic has a ... for further arguments, but I'm not sure that's as widely used as a user-facing function. So, maybe we should define cAIC()? I guess ultimately, to behave like AIC(), that should take multiple models in a .... Or add an option in the ... to extractAIC.sdmTMB(). I'm fine with either.

For EDF, perhaps we could add that as a 2nd column for smoothers where the SD is currently written and after GMRF SDs in parentheses or similar. I could worry about these minor formatting issues if there was an example of grabbing the EDF values.

Smooth terms:
           Std. Dev.
sds(depth)     13.07

Dispersion parameter: 13.68
Tweedie p: 1.58
Matérn range: 16.84
Spatial SD: 2.20
ML criterion at convergence: 2937.789

@James-Thorson-NOAA
Copy link
Collaborator Author

Thanks for the offline discussion! I just did a very quick PR, and will be back from travel on Dec-9 if there's any issues to resolve. Please tell me how I can help.

@ericward-noaa
Copy link
Collaborator

One more thing I was thinking about with this -- is it worth returning gradients for individual observations (or the hat matrix)? Similar to Jim's code for the 2024 Ecology paper (where grad_i or similar gets passed into cAIC() as what).

There's some related discussion over here with glmmTMB where they're working on similar: here (and this follow up)

@James-Thorson
Copy link
Contributor

The Eq-6 estimator for cAIC instead constructs EDF from the hessian matrix of random effects (so it's easy to calculate EDF by block or random effects, similar to mgcv estimator), not the hat matrix of observations (which allows you to calculate the EDF by block of observations similar to what I explored in the Ecology paper). I'll think a bit whether it's easy to crosswalk from the former to the latter.

I was planning on suggesting cAIC and EDF for glmmTMB once the paper comes back from review, but perhaps I'll post about the arXiv preprint on those threads now too.

@seananderson
Copy link
Member

I've added EDF printing for smoothers. For now, I've disabled it by default, but it could be turned on by default for "smaller" models or with some option flag. For "big" models it introduces a lag, presumably because of the MakeADFun call. Perhaps another option would be to store the EDF values at the end of model fitting and retrieve them on each print... actually, I kind of like that idea.

@James-Thorson-NOAA, did you mention there might be a way to speed up the MakeADFun call?

Example:

library(sdmTMB)
fit <- sdmTMB(
  density ~ s(depth_scaled) + s(year, k = 4),
  family = delta_gamma(),
  data = pcod,
  spatial = "off"
)
print(fit, edf = TRUE)
#> Model fit by ML ['sdmTMB']
#> Formula: density ~ s(depth_scaled) + s(year, k = 4)
#> Mesh: NULL (isotropic covariance)
#> Data: pcod
#> Family: delta_gamma(link1 = 'logit', link2 = 'log')
#> 
#> Delta/hurdle model 1: -----------------------------------
#> Family: binomial(link = 'logit') 
#>               coef.est coef.se
#> (Intercept)      -0.41    0.07
#> sdepth_scaled    -1.14    1.11
#> syear            -0.87    0.27
#> 
#> Smooth terms:
#>                   Std. Dev.  EDF
#> sds(depth_scaled)      5.16 5.21
#> sds(year)              3.04 1.97
#> 
#> 
#> Delta/hurdle model 2: -----------------------------------
#> Family: Gamma(link = 'log') 
#>               coef.est coef.se
#> (Intercept)       4.04    0.07
#> sdepth_scaled    -0.58    0.79
#> syear            -0.65    0.27
#> 
#> Smooth terms:
#>                   Std. Dev.  EDF
#> sds(depth_scaled)      3.20 4.32
#> sds(year)              2.37 1.96
#> 
#> Dispersion parameter: 0.62
#> 
#> ML criterion at convergence: 6443.975
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.
cAIC(fit, "EDF")
#> 1LP-s(depth_scaled)         1LP-s(year) 2LP-s(depth_scaled)         2LP-s(year) 
#>            5.213252            1.969794            4.316564            1.960188

Created on 2024-12-14 with reprex v2.1.1

@seananderson
Copy link
Member

another option would be to store the EDF values at the end of model fitting and retrieve them on each print.

OK, this is now happening. A small lag at the end of model fitting but instant repeated printing. EDF printing for smoothers is on by default.

@James-Thorson
Copy link
Contributor

On the topic of how to speed this up ... does sdmTMB always use framework = "TMBad"?

If yes, then we could explore eliminating the random argument when calling MakeADFun, and using obj$he(parDataMode) instead of obj_new$env$f(parDataMode, order = 1, type = "ADGrad"). However, I'm always surprised by what steps are rate-limiting vs. eliminated by smart compilers ... it seems like eliminating the LA stuff might speed it up, but I might be wrong.

Anyhoo, I don't immediately have a slow example to play with, but for the pcod example in this thread the change doesn't affect the EDF or cAIC results (as expected).

Are you willing to try out the snippet below on a large model where cAIC is slow, or email me such a case for me to test?

snippet with changes to cAIC:

  obj_new <- TMB::MakeADFun(
    data = tmb_data,
    parameters = object$parlist,
    map = object$tmb_map,
    # REMOVE random
    #random = object$tmb_random,
    DLL = "sdmTMB",
    profile = NULL
  )

  par <- obj$env$parList()
  parDataMode <- obj$env$last.par
  indx <- obj$env$lrandom()
  q <- sum(indx)
  p <- length(object$model$par)

  ## use '-' for Hess because model returns negative loglikelihood
  if (is.null(object$tmb_random)) {
    cli_inform(c("This model has no random effects.", "cAIC and EDF only apply to models with random effects."))
    return(invisible(NULL))
  }
  # SWAP OUT hessian call
  #Hess_new <- -Matrix::Matrix(obj_new$env$f(parDataMode, order = 1, type = "ADGrad"), sparse = TRUE)
  Hess_new <- -Matrix::Matrix(obj_new$he(parDataMode), sparse = TRUE)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants