final updates before adding new families

nicholasjclark · Mar 29, 2023 · b67467c · b67467c
1 parent 7b8472c
commit b67467c
Show file tree

Hide file tree

Showing 47 changed files with 424 additions and 557 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: mvgam
 Title: Multivariate Bayesian Generalized Additive Models for discrete time series
-Version: 1.0.3
-Date: 2022-08-03
+Version: 1.0.4
+Date: 2023-03-29
 Authors@R: 
     person(given = "Nicholas",
            family = "Clark",

diff --git a/NAMESPACE b/NAMESPACE
@@ -9,10 +9,6 @@ S3method(print,mvgam_prefit)
 S3method(summary,mvgam)
 S3method(summary,mvgam_prefit)
 export("%>%")
-export(add_base_dgam_lines)
-export(add_poisson_lines)
-export(add_stan_data)
-export(add_trend_lines)
 export(add_tweedie_lines)
 export(code)
 export(compare_mvgams)
@@ -40,5 +36,4 @@ export(roll_eval_mvgam)
 export(series_to_mvgam)
 export(sim_mvgam)
 export(update_priors)
-export(vectorise_stan_lik)
 importFrom(magrittr,"%>%")
diff --git a/R/add_base_dgam_lines.R b/R/add_base_dgam_lines.R
@@ -1,7 +1,7 @@
 #' Dynamic GAM model file additions
 #'
 #'
-#' @export
+#' @noRd
 #' @param use_lv Logical (use latent variables or not?)
 #' @param stan Logical (convert existing model to a Stan model?)
 #' @param offset Logical (include an offset in the linear predictor?)

diff --git a/R/add_poisson_lines.R b/R/add_poisson_lines.R
@@ -1,7 +1,7 @@
 #' Poisson JAGS modifications
 #'
 #'
-#' @export
+#' @noRd
 #' @param model_file A template `JAGS` model file to be modified
 #' @param upper_bounds Optional upper bounds for the truncated observation likelihood
 #' @return A modified `JAGS` model file

diff --git a/R/add_stan_data.R b/R/add_stan_data.R
@@ -1,7 +1,7 @@
 #' Add remaining data, model and parameter blocks to a Stan model
 #'
 #'
-#' @export
+#' @noRd
 #' @param jags_file Prepared JAGS mvgam model file
 #' @param stan_file Incomplete Stan model file to be edited
 #' @param use_lv logical

diff --git a/R/add_trend_lines.R b/R/add_trend_lines.R
@@ -1,7 +1,7 @@
 #' Latent trend model file modifications
 #'
 #'
-#' @export
+#' @noRd
 #' @param model_file A template `JAGS` or `Stan` model file to be modified
 #' @param stan Logical (convert existing `JAGS` model to a `Stan` model?)
 #' @param use_lv Logical (use latent variable trends or not)

diff --git a/R/extract_cmdstanr.R b/R/extract_cmdstanr.R
@@ -1,6 +1,8 @@
-# Helper functions for converting cmdstanr objects to stanfit
-# All functions were directly copied from brms and so all credit must
-# go to the brms development team
+#' Helper functions for converting cmdstanr objects to stanfit.
+#' All functions were directly copied from `brms` and so all credit must
+#' go to the `brms` development team
+#' @noRd
+#'
 repair_variable_names <- function(x) {
   x <- sub("\\.", "[", x)
   x <- gsub("\\.", ",", x)

diff --git a/R/mvgam.R b/R/mvgam.R
@@ -76,10 +76,8 @@
 #'@param use_stan Logical. If \code{TRUE} and if \code{rstan} is installed, the model will be compiled and sampled using
 #'the Hamiltonian Monte Carlo with a call to \code{\link[cmdstanr]{cmdstan_model}} or, if `cmdstanr` is not available,
 #'a call to \code{\link[rstan]{stan}}. Note that
-#'not all options that are available in `JAGS` can be used, including no option for a Tweedie family.
-#'However, as `Stan` can estimate Hilbert base approximate gaussian processes, which
-#'are much more computationally tractable than full GPs for time series with `>100` observations, estimation
-#'in `Stan` can support latent GP trends while estimation in \code{JAGS} cannot
+#'there are many more options when using `Stan` vs `JAGS` (the only "advantage" of `JAGS` is the ability
+#'to use a Tweedie family).
 #'@param max_treedepth positive integer placing a cap on the number of simulation steps evaluated during each iteration when
 #'`use_stan == TRUE`. Default is `12`. Increasing this value can sometimes help with exploration of complex
 #'posterior geometries, but it is rarely fruitful to go above a `max_treedepth` of `14`
@@ -141,17 +139,16 @@
 #'draws from the model's posterior distribution
 #'\cr
 #'\cr
-#'*Using Stan*: A useful feature of `mvgam` is the ability to use Hamiltonian Monte Carlo for parameter estimation
-#'via the software `Stan` (using either the `cmdstanr` or `rstan` interface). Note that the `rstan` library is
-#'currently required for this option to work, even if using `cmdstanr` as the backend. This is because `rstan`'s functions
-#'are needed to arrange the posterior samples into the correct format for all of \code{mvgam}'s other functions to work.
-#'Also note that currently there is no support for
-#'fitting `Tweedie` responses in `Stan`.
-#'However there are great advantages when using `Stan`, which includes the option to estimate smooth latent trends
+#'*Using Stan*: A feature of `mvgam` is the ability to use Hamiltonian Monte Carlo for parameter estimation
+#'via the software `Stan` (using either the `cmdstanr` or `rstan` interface).
+#'There are great advantages when using `Stan`, which includes the option to estimate smooth latent trends
 #'via [Hilbert space approximate Gaussian Processes](https://arxiv.org/abs/2004.11408). This often makes sense for
 #'ecological series, which we expect to change smoothly. In `mvgam`, latent squared exponential GP trends are approximated using
 #'by default \code{40} basis functions, which saves computational costs compared to fitting full GPs while adequately estimating
-#'GP \code{alpha} and \code{rho} parameters
+#'GP \code{alpha} and \code{rho} parameters. Because of the many advantages of `Stan` over `JAGS`, further development
+#'of the package will only be applied to `Stan`. This includes the planned addition of more response distributions,
+#'plans to handle zero-inflation, and plans to incorporate a greater variety of trend models. Users are strongly encouraged
+#'to opt for `Stan` over `JAGS` in any workflows
 #'@author Nicholas J Clark
 #'
 #'@seealso \code{\link[mcgv]{jagam}}, \code{\link[mcgv]{gam}}
@@ -326,7 +323,7 @@ mvgam = function(formula,
                  threads = 1,
                  priors,
                  upper_bounds,
-                 use_stan = FALSE,
+                 use_stan = TRUE,
                  max_treedepth,
                  adapt_delta,
                  jags_path){
@@ -394,28 +391,33 @@ mvgam = function(formula,
 
   # If Stan is to be used, make sure it is installed
   if(use_stan & run_model){
-    if(!require(rstan)){
-      warning('rstan library is required but not found; setting run_model = FALSE')
-      run_model <- FALSE
+    if(!requireNamespace('rstan', quietly = TRUE)){
+      warning('rstan library not found; checking for cmdstanr library')
+
+      if(!requireNamespace('cmdstanr', quietly = TRUE)){
+        warning('cmdstanr library not found; setting run_model = FALSE')
+        run_model <- FALSE
+      }
+    }
+
     }
-  }
 
   # JAGS cannot support latent GP trends as there is no easy way to use Hilbert base
   # approximation to reduce the computational demands
   if(!use_stan & trend_model == 'GP'){
-    warning('gaussian process trends not yet supported for JAGS; reverting to Stan')
+    warning('gaussian process trends not supported for JAGS; reverting to Stan')
     use_stan <- TRUE
   }
 
   if(use_stan & family == 'tw'){
-    warning('Tweedie family not yet supported for stan; reverting to JAGS')
+    warning('Tweedie family not supported for stan; reverting to JAGS')
     use_stan <- FALSE
   }
 
   # If the model is to be run in JAGS, make sure the JAGS software can be located
   if(!use_stan){
     if(run_model){
-      if(!require(runjags)){
+      if(!requireNamespace('runjags', quietly = TRUE)){
         warning('runjags library is required but not found; setting run_model = FALSE')
         run_model <- FALSE
       }
@@ -425,6 +427,7 @@ mvgam = function(formula,
   if(!use_stan){
     if(run_model){
       if(missing(jags_path)){
+        require(runjags)
         jags_path <- runjags::findjags()
       }
 
@@ -1248,14 +1251,6 @@ mvgam = function(formula,
                                     family = family,
                                     upper_bounds = upper_bounds)
 
-      # Remove data likelihood and posterior predictions if this is a prior sampling run
-      if(prior_simulation){
-        stan_objects$stan_file <- stan_objects$stan_file[-c(grep('// likelihood functions',
-                                                                 stan_objects$stan_file):
-                                                              (grep('// likelihood functions',
-                                                                    stan_objects$stan_file) + 6))]
-      }
-
       model_data <- stan_objects$model_data
       if(use_lv){
         model_data$n_lv <- n_lv
@@ -1339,6 +1334,17 @@ mvgam = function(formula,
                                   threads = threads,
                                   trend_model = trend_model,
                                   offset = offset)
+
+      # Remove data likelihood if this is a prior sampling run
+      if(prior_simulation){
+        vectorised$model_file <- vectorised$model_file[-c((grep('// likelihood functions',
+                                                                 vectorised$model_file,
+                                                                 fixed = TRUE) - 1):
+                                                              (grep('generated quantities {',
+                                                                    vectorised$model_file,
+                                                                    fixed = TRUE) - 4))]
+      }
+
       model_data <- vectorised$model_data
 
       # Check if cmdstan is accessible; if not, use rstan
@@ -1393,7 +1399,7 @@ mvgam = function(formula,
                                  init = inits,
                                  max_treedepth = 12,
                                  adapt_delta = 0.8,
-                                 iter_sampling = 600,
+                                 iter_sampling = samples,
                                  iter_warmup = 200,
                                  show_messages = FALSE,
                                  diagnostics = NULL)
@@ -1415,16 +1421,6 @@ mvgam = function(formula,
                                    variables = param)
         out_gam_mod <- repair_stanfit(out_gam_mod)
 
-        # stanfit <- rstan::read_stan_csv(fit1$output_files(), col_major = TRUE)
-        # param_names <- row.names(rstan::summary(stanfit)$summary)
-        # stanfit@sim$samples <- lapply(seq_along(stanfit@sim$samples), function(x){
-        #   samps <- as.list(stanfit@sim$samples[[x]])
-        #   names(samps) <- param_names
-        #   samps
-        # })
-        #
-        # out_gam_mod <- stanfit
-
       } else {
         require(rstan)
         message('Using rstan as the backend')

diff --git a/R/vectorise_stan_lik.R b/R/vectorise_stan_lik.R
@@ -1,7 +1,7 @@
 #' Vectorise a stan model's likelihood for quicker computation
 #'
 #'
-#' @export
+#' @noRd
 #' @param model_file Stan model file to be edited
 #' @param model_data Prepared mvgam data for Stan modelling
 #' @param family \code{character}. Must be either 'Negative Binomial' or 'Poisson'

diff --git a/docs/README-unnamed-chunk-11-1.png b/docs/README-unnamed-chunk-11-1.png
diff --git a/docs/README-unnamed-chunk-13-1.png b/docs/README-unnamed-chunk-13-1.png
diff --git a/docs/README-unnamed-chunk-15-1.png b/docs/README-unnamed-chunk-15-1.png
diff --git a/docs/README-unnamed-chunk-16-1.png b/docs/README-unnamed-chunk-16-1.png
diff --git a/docs/README-unnamed-chunk-17-1.png b/docs/README-unnamed-chunk-17-1.png
diff --git a/docs/README-unnamed-chunk-18-1.png b/docs/README-unnamed-chunk-18-1.png
diff --git a/docs/README-unnamed-chunk-19-1.png b/docs/README-unnamed-chunk-19-1.png
diff --git a/docs/README-unnamed-chunk-20-1.png b/docs/README-unnamed-chunk-20-1.png
diff --git a/docs/README-unnamed-chunk-21-1.png b/docs/README-unnamed-chunk-21-1.png
diff --git a/docs/README-unnamed-chunk-22-1.png b/docs/README-unnamed-chunk-22-1.png
diff --git a/docs/README-unnamed-chunk-23-1.png b/docs/README-unnamed-chunk-23-1.png
diff --git a/docs/README-unnamed-chunk-24-1.png b/docs/README-unnamed-chunk-24-1.png
diff --git a/docs/README-unnamed-chunk-24-2.png b/docs/README-unnamed-chunk-24-2.png
diff --git a/docs/README-unnamed-chunk-25-1.png b/docs/README-unnamed-chunk-25-1.png
diff --git a/docs/README-unnamed-chunk-26-1.png b/docs/README-unnamed-chunk-26-1.png
diff --git a/docs/README-unnamed-chunk-26-2.png b/docs/README-unnamed-chunk-26-2.png
diff --git a/docs/README-unnamed-chunk-27-1.png b/docs/README-unnamed-chunk-27-1.png
diff --git a/docs/README-unnamed-chunk-28-1.png b/docs/README-unnamed-chunk-28-1.png
diff --git a/docs/README-unnamed-chunk-29-1.png b/docs/README-unnamed-chunk-29-1.png
diff --git a/docs/README-unnamed-chunk-30-1.png b/docs/README-unnamed-chunk-30-1.png
diff --git a/docs/README-unnamed-chunk-31-1.png b/docs/README-unnamed-chunk-31-1.png
diff --git a/docs/README-unnamed-chunk-32-1.png b/docs/README-unnamed-chunk-32-1.png
diff --git a/docs/README-unnamed-chunk-33-1.png b/docs/README-unnamed-chunk-33-1.png
diff --git a/docs/README-unnamed-chunk-34-1.png b/docs/README-unnamed-chunk-34-1.png
diff --git a/docs/README-unnamed-chunk-35-1.png b/docs/README-unnamed-chunk-35-1.png
diff --git a/docs/README-unnamed-chunk-8-1.png b/docs/README-unnamed-chunk-8-1.png
diff --git a/docs/README-unnamed-chunk-9-1.png b/docs/README-unnamed-chunk-9-1.png
diff --git a/docs/README.Rmd b/docs/README.Rmd
@@ -40,7 +40,7 @@ The package can also be used to generate all necessary data structures, initial
 
 ## Installation
 Install the development version from `GitHub` using:
-`devtools::install_github("nicholasjclark/mvgam")`. Note that to actually condition models with MCMC sampling, either the `JAGS` software must be installed (along with the `R` packages `rjags` and `runjags`) or the `Stan` software must be installed (along with the package `rstan` and, optionally, the `cmdstanr` package). These are not  listed as dependencies of `mvgam` to ensure that installation is less difficult. If users wish to fit the models using `mvgam`, please refer to installation links for `JAGS` [here](https://sourceforge.net/projects/mcmc-jags/files/) or for `Stan` and `rstan` [here](https://mc-stan.org/users/interfaces/rstan)).
+`devtools::install_github("nicholasjclark/mvgam")`. Note that to actually condition models with MCMC sampling, either the `JAGS` software must be installed (along with the `R` packages `rjags` and `runjags`) or the `Stan` software must be installed (along with either `rstan` and/or `cmdstanr`). These are not listed as dependencies of `mvgam` to ensure that installation is less difficult. If users wish to fit the models using `mvgam`, please refer to installation links for `JAGS` [here](https://sourceforge.net/projects/mcmc-jags/files/), for `Stan` with `rstan` [here](https://mc-stan.org/users/interfaces/rstan), or for `Stan` with `cmdstandr` [here](https://mc-stan.org/cmdstanr/).
 
 ## Citing mvgam and related software
 When using open source software (or software in general), please make sure to appropriately acknowledge the hard work that developers and maintainers put into making these packages available. Citations are currently the best way to formally acknowledge this work, so we highly encourage you to cite any packages that you rely on for your research.
@@ -102,13 +102,23 @@ plot_mvgam_series(data = lynx_train, y = 'population')
 ```
 
 Now we will formulate an `mvgam` model; this model fits a GAM in which a cyclic smooth function for `season` is estimated jointly with a full time series model for the temporal process (in this case an `AR3` process), rather than relying on smoothing splines that do not incorporate a concept of the future. We assume the outcome follows a Poisson distribution. But before conditioning the model on observed data, a check of prior smooth function realisations is useful to ensure we are allowing enough flexibility to capture the types of functional behaviours we think are reasonable without allowing outrageous behaviours. First we follow conventional recommendations to set `k` for the smooth term to be large, which would allow maximum flexibility in functional behaviours
-```{r, message=FALSE, warning=FALSE}
+```{r, eval=FALSE}
+lynx_mvgam_prior <- mvgam(data = lynx_train,
+               formula = population ~ s(season, bs = 'cc', k = 19),
+               knots = list(season = c(0.5, 19.5)),
+               family = 'poisson',
+               trend_model = 'AR3',
+               chains = 2,
+               prior_simulation = TRUE)
+```
+
+```{r, include=FALSE}
 lynx_mvgam_prior <- mvgam(data = lynx_train,
                formula = population ~ s(season, bs = 'cc', k = 19),
                knots = list(season = c(0.5, 19.5)),
                family = 'poisson',
                trend_model = 'AR3',
-               chains = 1,
+               chains = 2,
                prior_simulation = TRUE)
 ```
 
@@ -118,18 +128,28 @@ plot(lynx_mvgam_prior, type = 'smooths', realisations = TRUE)
 ```
 These functions are showing the marginal contribution of the seasonal smooth function to the linear predictor (on the log scale), and they are clearly allowed to move into ridiculous spaces that we should give very little prior plausibility to:
 ```{r}
-exp(-25)
-exp(25)
+exp(-17)
+exp(17)
 ```
 
 Setting `k` to a smaller value results in less flexibility. This is because number of basis functions that contribute to functional behaviour is reduced
-```{r, message=FALSE, warning=FALSE}
+```{r, include = FALSE}
+lynx_mvgam_prior <- mvgam(data = lynx_train,
+               formula = population ~ s(season, bs = 'cc', k = 12),
+               knots = list(season = c(0.5, 19.5)),
+               family = 'poisson',
+               trend_model = 'AR3',
+               chains = 2,
+               prior_simulation = TRUE)
+```
+
+```{r, eval = FALSE}
 lynx_mvgam_prior <- mvgam(data = lynx_train,
                formula = population ~ s(season, bs = 'cc', k = 12),
                knots = list(season = c(0.5, 19.5)),
                family = 'poisson',
                trend_model = 'AR3',
-               chains = 1,
+               chains = 2,
                prior_simulation = TRUE)
 ```
 
@@ -198,6 +218,7 @@ summary(lynx_mvgam)
 ```
 
 As with any `MCMC` based software, we can inspect traceplots. Here for the `GAM` component smoothing parameters.
+There is no requirement for `rstan` to be installed as a dependency, but we can still use it if available to examine traceplots
 ```{r}
 rstan::stan_trace(lynx_mvgam$model_output, 'rho')
 ```