Skip to content

Commit

Permalink
final updates before adding new families
Browse files Browse the repository at this point in the history
  • Loading branch information
nicholasjclark committed Mar 29, 2023
1 parent 7b8472c commit b67467c
Show file tree
Hide file tree
Showing 47 changed files with 424 additions and 557 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: mvgam
Title: Multivariate Bayesian Generalized Additive Models for discrete time series
Version: 1.0.3
Date: 2022-08-03
Version: 1.0.4
Date: 2023-03-29
Authors@R:
person(given = "Nicholas",
family = "Clark",
Expand Down
5 changes: 0 additions & 5 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,6 @@ S3method(print,mvgam_prefit)
S3method(summary,mvgam)
S3method(summary,mvgam_prefit)
export("%>%")
export(add_base_dgam_lines)
export(add_poisson_lines)
export(add_stan_data)
export(add_trend_lines)
export(add_tweedie_lines)
export(code)
export(compare_mvgams)
Expand Down Expand Up @@ -40,5 +36,4 @@ export(roll_eval_mvgam)
export(series_to_mvgam)
export(sim_mvgam)
export(update_priors)
export(vectorise_stan_lik)
importFrom(magrittr,"%>%")
2 changes: 1 addition & 1 deletion R/add_base_dgam_lines.R
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#' Dynamic GAM model file additions
#'
#'
#' @export
#' @noRd
#' @param use_lv Logical (use latent variables or not?)
#' @param stan Logical (convert existing model to a Stan model?)
#' @param offset Logical (include an offset in the linear predictor?)
Expand Down
2 changes: 1 addition & 1 deletion R/add_poisson_lines.R
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#' Poisson JAGS modifications
#'
#'
#' @export
#' @noRd
#' @param model_file A template `JAGS` model file to be modified
#' @param upper_bounds Optional upper bounds for the truncated observation likelihood
#' @return A modified `JAGS` model file
Expand Down
2 changes: 1 addition & 1 deletion R/add_stan_data.R
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#' Add remaining data, model and parameter blocks to a Stan model
#'
#'
#' @export
#' @noRd
#' @param jags_file Prepared JAGS mvgam model file
#' @param stan_file Incomplete Stan model file to be edited
#' @param use_lv logical
Expand Down
2 changes: 1 addition & 1 deletion R/add_trend_lines.R
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#' Latent trend model file modifications
#'
#'
#' @export
#' @noRd
#' @param model_file A template `JAGS` or `Stan` model file to be modified
#' @param stan Logical (convert existing `JAGS` model to a `Stan` model?)
#' @param use_lv Logical (use latent variable trends or not)
Expand Down
8 changes: 5 additions & 3 deletions R/extract_cmdstanr.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# Helper functions for converting cmdstanr objects to stanfit
# All functions were directly copied from brms and so all credit must
# go to the brms development team
#' Helper functions for converting cmdstanr objects to stanfit.
#' All functions were directly copied from `brms` and so all credit must
#' go to the `brms` development team
#' @noRd
#'
repair_variable_names <- function(x) {
x <- sub("\\.", "[", x)
x <- gsub("\\.", ",", x)
Expand Down
74 changes: 35 additions & 39 deletions R/mvgam.R
Original file line number Diff line number Diff line change
Expand Up @@ -76,10 +76,8 @@
#'@param use_stan Logical. If \code{TRUE} and if \code{rstan} is installed, the model will be compiled and sampled using
#'the Hamiltonian Monte Carlo with a call to \code{\link[cmdstanr]{cmdstan_model}} or, if `cmdstanr` is not available,
#'a call to \code{\link[rstan]{stan}}. Note that
#'not all options that are available in `JAGS` can be used, including no option for a Tweedie family.
#'However, as `Stan` can estimate Hilbert base approximate gaussian processes, which
#'are much more computationally tractable than full GPs for time series with `>100` observations, estimation
#'in `Stan` can support latent GP trends while estimation in \code{JAGS} cannot
#'there are many more options when using `Stan` vs `JAGS` (the only "advantage" of `JAGS` is the ability
#'to use a Tweedie family).
#'@param max_treedepth positive integer placing a cap on the number of simulation steps evaluated during each iteration when
#'`use_stan == TRUE`. Default is `12`. Increasing this value can sometimes help with exploration of complex
#'posterior geometries, but it is rarely fruitful to go above a `max_treedepth` of `14`
Expand Down Expand Up @@ -141,17 +139,16 @@
#'draws from the model's posterior distribution
#'\cr
#'\cr
#'*Using Stan*: A useful feature of `mvgam` is the ability to use Hamiltonian Monte Carlo for parameter estimation
#'via the software `Stan` (using either the `cmdstanr` or `rstan` interface). Note that the `rstan` library is
#'currently required for this option to work, even if using `cmdstanr` as the backend. This is because `rstan`'s functions
#'are needed to arrange the posterior samples into the correct format for all of \code{mvgam}'s other functions to work.
#'Also note that currently there is no support for
#'fitting `Tweedie` responses in `Stan`.
#'However there are great advantages when using `Stan`, which includes the option to estimate smooth latent trends
#'*Using Stan*: A feature of `mvgam` is the ability to use Hamiltonian Monte Carlo for parameter estimation
#'via the software `Stan` (using either the `cmdstanr` or `rstan` interface).
#'There are great advantages when using `Stan`, which includes the option to estimate smooth latent trends
#'via [Hilbert space approximate Gaussian Processes](https://arxiv.org/abs/2004.11408). This often makes sense for
#'ecological series, which we expect to change smoothly. In `mvgam`, latent squared exponential GP trends are approximated using
#'by default \code{40} basis functions, which saves computational costs compared to fitting full GPs while adequately estimating
#'GP \code{alpha} and \code{rho} parameters
#'GP \code{alpha} and \code{rho} parameters. Because of the many advantages of `Stan` over `JAGS`, further development
#'of the package will only be applied to `Stan`. This includes the planned addition of more response distributions,
#'plans to handle zero-inflation, and plans to incorporate a greater variety of trend models. Users are strongly encouraged
#'to opt for `Stan` over `JAGS` in any workflows
#'@author Nicholas J Clark
#'
#'@seealso \code{\link[mcgv]{jagam}}, \code{\link[mcgv]{gam}}
Expand Down Expand Up @@ -326,7 +323,7 @@ mvgam = function(formula,
threads = 1,
priors,
upper_bounds,
use_stan = FALSE,
use_stan = TRUE,
max_treedepth,
adapt_delta,
jags_path){
Expand Down Expand Up @@ -394,28 +391,33 @@ mvgam = function(formula,

# If Stan is to be used, make sure it is installed
if(use_stan & run_model){
if(!require(rstan)){
warning('rstan library is required but not found; setting run_model = FALSE')
run_model <- FALSE
if(!requireNamespace('rstan', quietly = TRUE)){
warning('rstan library not found; checking for cmdstanr library')

if(!requireNamespace('cmdstanr', quietly = TRUE)){
warning('cmdstanr library not found; setting run_model = FALSE')
run_model <- FALSE
}
}

}
}

# JAGS cannot support latent GP trends as there is no easy way to use Hilbert base
# approximation to reduce the computational demands
if(!use_stan & trend_model == 'GP'){
warning('gaussian process trends not yet supported for JAGS; reverting to Stan')
warning('gaussian process trends not supported for JAGS; reverting to Stan')
use_stan <- TRUE
}

if(use_stan & family == 'tw'){
warning('Tweedie family not yet supported for stan; reverting to JAGS')
warning('Tweedie family not supported for stan; reverting to JAGS')
use_stan <- FALSE
}

# If the model is to be run in JAGS, make sure the JAGS software can be located
if(!use_stan){
if(run_model){
if(!require(runjags)){
if(!requireNamespace('runjags', quietly = TRUE)){
warning('runjags library is required but not found; setting run_model = FALSE')
run_model <- FALSE
}
Expand All @@ -425,6 +427,7 @@ mvgam = function(formula,
if(!use_stan){
if(run_model){
if(missing(jags_path)){
require(runjags)
jags_path <- runjags::findjags()
}

Expand Down Expand Up @@ -1248,14 +1251,6 @@ mvgam = function(formula,
family = family,
upper_bounds = upper_bounds)

# Remove data likelihood and posterior predictions if this is a prior sampling run
if(prior_simulation){
stan_objects$stan_file <- stan_objects$stan_file[-c(grep('// likelihood functions',
stan_objects$stan_file):
(grep('// likelihood functions',
stan_objects$stan_file) + 6))]
}

model_data <- stan_objects$model_data
if(use_lv){
model_data$n_lv <- n_lv
Expand Down Expand Up @@ -1339,6 +1334,17 @@ mvgam = function(formula,
threads = threads,
trend_model = trend_model,
offset = offset)

# Remove data likelihood if this is a prior sampling run
if(prior_simulation){
vectorised$model_file <- vectorised$model_file[-c((grep('// likelihood functions',
vectorised$model_file,
fixed = TRUE) - 1):
(grep('generated quantities {',
vectorised$model_file,
fixed = TRUE) - 4))]
}

model_data <- vectorised$model_data

# Check if cmdstan is accessible; if not, use rstan
Expand Down Expand Up @@ -1393,7 +1399,7 @@ mvgam = function(formula,
init = inits,
max_treedepth = 12,
adapt_delta = 0.8,
iter_sampling = 600,
iter_sampling = samples,
iter_warmup = 200,
show_messages = FALSE,
diagnostics = NULL)
Expand All @@ -1415,16 +1421,6 @@ mvgam = function(formula,
variables = param)
out_gam_mod <- repair_stanfit(out_gam_mod)

# stanfit <- rstan::read_stan_csv(fit1$output_files(), col_major = TRUE)
# param_names <- row.names(rstan::summary(stanfit)$summary)
# stanfit@sim$samples <- lapply(seq_along(stanfit@sim$samples), function(x){
# samps <- as.list(stanfit@sim$samples[[x]])
# names(samps) <- param_names
# samps
# })
#
# out_gam_mod <- stanfit

} else {
require(rstan)
message('Using rstan as the backend')
Expand Down
2 changes: 1 addition & 1 deletion R/vectorise_stan_lik.R
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#' Vectorise a stan model's likelihood for quicker computation
#'
#'
#' @export
#' @noRd
#' @param model_file Stan model file to be edited
#' @param model_data Prepared mvgam data for Stan modelling
#' @param family \code{character}. Must be either 'Negative Binomial' or 'Poisson'
Expand Down
Binary file modified docs/README-unnamed-chunk-11-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-13-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-15-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-16-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-17-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-18-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-19-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-20-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-21-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-22-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-23-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-24-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-24-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-25-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-26-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/README-unnamed-chunk-26-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-27-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-28-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-29-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-30-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-31-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-32-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-33-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/README-unnamed-chunk-34-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/README-unnamed-chunk-35-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/README-unnamed-chunk-8-1.png
Binary file modified docs/README-unnamed-chunk-9-1.png
35 changes: 28 additions & 7 deletions docs/README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ The package can also be used to generate all necessary data structures, initial

## Installation
Install the development version from `GitHub` using:
`devtools::install_github("nicholasjclark/mvgam")`. Note that to actually condition models with MCMC sampling, either the `JAGS` software must be installed (along with the `R` packages `rjags` and `runjags`) or the `Stan` software must be installed (along with the package `rstan` and, optionally, the `cmdstanr` package). These are not listed as dependencies of `mvgam` to ensure that installation is less difficult. If users wish to fit the models using `mvgam`, please refer to installation links for `JAGS` [here](https://sourceforge.net/projects/mcmc-jags/files/) or for `Stan` and `rstan` [here](https://mc-stan.org/users/interfaces/rstan)).
`devtools::install_github("nicholasjclark/mvgam")`. Note that to actually condition models with MCMC sampling, either the `JAGS` software must be installed (along with the `R` packages `rjags` and `runjags`) or the `Stan` software must be installed (along with either `rstan` and/or `cmdstanr`). These are not listed as dependencies of `mvgam` to ensure that installation is less difficult. If users wish to fit the models using `mvgam`, please refer to installation links for `JAGS` [here](https://sourceforge.net/projects/mcmc-jags/files/), for `Stan` with `rstan` [here](https://mc-stan.org/users/interfaces/rstan), or for `Stan` with `cmdstandr` [here](https://mc-stan.org/cmdstanr/).

## Citing mvgam and related software
When using open source software (or software in general), please make sure to appropriately acknowledge the hard work that developers and maintainers put into making these packages available. Citations are currently the best way to formally acknowledge this work, so we highly encourage you to cite any packages that you rely on for your research.
Expand Down Expand Up @@ -102,13 +102,23 @@ plot_mvgam_series(data = lynx_train, y = 'population')
```

Now we will formulate an `mvgam` model; this model fits a GAM in which a cyclic smooth function for `season` is estimated jointly with a full time series model for the temporal process (in this case an `AR3` process), rather than relying on smoothing splines that do not incorporate a concept of the future. We assume the outcome follows a Poisson distribution. But before conditioning the model on observed data, a check of prior smooth function realisations is useful to ensure we are allowing enough flexibility to capture the types of functional behaviours we think are reasonable without allowing outrageous behaviours. First we follow conventional recommendations to set `k` for the smooth term to be large, which would allow maximum flexibility in functional behaviours
```{r, message=FALSE, warning=FALSE}
```{r, eval=FALSE}
lynx_mvgam_prior <- mvgam(data = lynx_train,
formula = population ~ s(season, bs = 'cc', k = 19),
knots = list(season = c(0.5, 19.5)),
family = 'poisson',
trend_model = 'AR3',
chains = 2,
prior_simulation = TRUE)
```

```{r, include=FALSE}
lynx_mvgam_prior <- mvgam(data = lynx_train,
formula = population ~ s(season, bs = 'cc', k = 19),
knots = list(season = c(0.5, 19.5)),
family = 'poisson',
trend_model = 'AR3',
chains = 1,
chains = 2,
prior_simulation = TRUE)
```

Expand All @@ -118,18 +128,28 @@ plot(lynx_mvgam_prior, type = 'smooths', realisations = TRUE)
```
These functions are showing the marginal contribution of the seasonal smooth function to the linear predictor (on the log scale), and they are clearly allowed to move into ridiculous spaces that we should give very little prior plausibility to:
```{r}
exp(-25)
exp(25)
exp(-17)
exp(17)
```

Setting `k` to a smaller value results in less flexibility. This is because number of basis functions that contribute to functional behaviour is reduced
```{r, message=FALSE, warning=FALSE}
```{r, include = FALSE}
lynx_mvgam_prior <- mvgam(data = lynx_train,
formula = population ~ s(season, bs = 'cc', k = 12),
knots = list(season = c(0.5, 19.5)),
family = 'poisson',
trend_model = 'AR3',
chains = 2,
prior_simulation = TRUE)
```

```{r, eval = FALSE}
lynx_mvgam_prior <- mvgam(data = lynx_train,
formula = population ~ s(season, bs = 'cc', k = 12),
knots = list(season = c(0.5, 19.5)),
family = 'poisson',
trend_model = 'AR3',
chains = 1,
chains = 2,
prior_simulation = TRUE)
```

Expand Down Expand Up @@ -198,6 +218,7 @@ summary(lynx_mvgam)
```

As with any `MCMC` based software, we can inspect traceplots. Here for the `GAM` component smoothing parameters.
There is no requirement for `rstan` to be installed as a dependency, but we can still use it if available to examine traceplots
```{r}
rstan::stan_trace(lynx_mvgam$model_output, 'rho')
```
Expand Down
Loading

0 comments on commit b67467c

Please sign in to comment.