Multispecies vignette edits

For one, make it converge!
pbs-assess · Dec 3, 2024 · 3f97c07 · 3f97c07
1 parent 6f49384
commit 3f97c07
Showing 1 changed file with 93 additions and 83 deletions.
diff --git a/vignettes/web_only/multispecies.Rmd b/vignettes/web_only/multispecies.Rmd
@@ -35,34 +35,30 @@ library(sdmTMB)
 
 For some applications, we might be interested in fitting a model that includes multiple responses such as 2+ species, or multiple size or age classes within a species. The most important step in fitting these models is understanding which parameters are shared, and which parameters are species-specific.
 
-Below, we illustrate a series of models that may be used to answer a few different common questions. We'll start by simulating a 2-species dataset. Each species is allowed to have unique spatial ranges and variances, as well as different year effects.
+Below, we illustrate a series of models that may be used to answer a few different common questions. We'll start by simulating a 2-species dataset. Each species is allowed to have unique spatial standard deviations (`sigma_O`) as well as different year effects.
 
 ```{r sim_dat}
-set.seed(123)
-
-# make fake predictor(s) (a1) and sampling locations:
+set.seed(1)
 predictor_dat <- data.frame(
-  X = runif(500), Y = runif(500),
-  year = rep(1:5, each = 100)
+  X = runif(1000), Y = runif(1000),
+  year = rep(1:5, each = 200)
 )
 predictor_dat$fyear <- as.factor(predictor_dat$year)
 mesh <- make_mesh(predictor_dat, xy_cols = c("X", "Y"), cutoff = 0.1)
-
 sim_dat_A <- sdmTMB_simulate(
   formula = ~ 0 + fyear,
   data = predictor_dat,
   time = "year",
   mesh = mesh,
-  range = 0.5,
-  family = tweedie(),
+  range = 0.2,
+  family = gaussian(),
   seed = 42,
   sigma_O = 0.2,
   phi = 0.1,
-  sigma_E = 0.1,
+  sigma_E = 0.3,
   B = runif(5, min = 5, max = 8) # 5 random year effects
 )
 sim_dat_A$species <- "A"
-
 sim_dat_B <- sdmTMB_simulate(
   formula = ~ 0 + fyear,
   data = predictor_dat,
@@ -73,189 +69,203 @@ sim_dat_B <- sdmTMB_simulate(
   seed = 42,
   sigma_O = 0.3,
   phi = 0.1,
-  sigma_E = 0.1,
+  sigma_E = 0.3,
   B = runif(5, min = 5, max = 8) # 5 random year effects
 )
 sim_dat_B$species <- "B"
-
 sim_dat <- rbind(sim_dat_A, sim_dat_B)
-sim_dat$fyear <- as.factor(sim_dat$year)
+sim_dat$fyear <- factor(sim_dat$year)
 ```
 
 We'll start by making the mesh across the full dataset.
 
 ```{r mesh_fig, fig.asp=1}
-spde <- make_mesh(sim_dat, c("X", "Y"), cutoff = 0.1)
-plot(spde)
+mesh <- make_mesh(sim_dat, c("X", "Y"), cutoff = 0.1)
+plot(mesh)
 ```
 
 ### Model 1: species-specific intercepts
 
-As a first model, we can include species-specific year effects. This can be done in a couple ways -- one option would be to estimate the `species*year` interaction, letting the year effects for each species be independent. All other parameters (range, spatial field, spatial variance, spatiotemporal fields, spatiotemporal variances) are shared.
+As a first model, we can include species-specific year effects. This can be done in a couple ways. One option would be to estimate the `species * year` interaction, letting the year effects for each species be independent. Here, all other parameters (range, spatial field, spatial variance, spatiotemporal fields, spatiotemporal variances) are shared. Also, both species share a single spatial random field.
 
 ```{r}
 fit <- sdmTMB(
-  formula = observed ~ fyear * species,
+  observed ~ fyear * species,
   data = sim_dat,
   time = "year",
   spatiotemporal = "iid",
-  mesh = spde,
+  mesh = mesh,
   family = gaussian()
 )
-
 fit
-
-print(sanity(fit))
 ```
 
-This model doesn't appear to converge, and the results from `sanity(fit)` indicate problems with the spatial and spatiotemporal variances. This is likely a sign that our assumptions in the estimation model are wrong for this particular dataset (not surprising, as our simulated data has different parameters by species).
+### Model 2: species-specific spatial fields
 
-Our simulated dataset only has 5 years of data -- if we had a longer time series, an alternative formulation for our year effects could be to fit smooth terms, like in a GAM with `mgcv`.
+We may be interested in fitting a model that lets the spatial patterning differ by species. These kinds of models can be expressed using spatially varying coefficients. Note that we use `spatial = off` because this represents a global spatial intercept---turning this off is akin to using a `-1` of `0` in a main formula including a factor. Both species take their spatial fields from the `spatial_varying` field here.
 
-```{r eval=FALSE}
+```{r}
 fit <- sdmTMB(
-  formula = observed ~ s(year, by = "species"),
+  observed ~ fyear * species,
   data = sim_dat,
+  mesh = mesh,
+  family = gaussian(),
+  spatial = "off",
   time = "year",
   spatiotemporal = "iid",
-  mesh = spde,
-  family = gaussian()
+  spatial_varying = ~ 0 + factor(species)
 )
+fit
 ```
 
-### Model 2: species-specific fields
+Here we had some convergence issues: one of the spatially varying SDs has collapsed to zero.
 
-We may be interested in fitting a model that lets the spatial patterning differ by species. These kinds of models can be expressed using spatially varying coefficients. Note that we use `spatial = off` because this represents a global spatial intercept -- turning this off is akin to using a `-1` of `0` in a main formula including a factor.
+You'll notice that there are two rows of entries for `sigma_Z`:
 
 ```{r}
-fit <- sdmTMB(
-  observed ~ 0 + fyear, # shared year effects
-  data = sim_dat,
-  mesh = spde,
-  family = gaussian(),
-  spatial = "off",
-  time = "year",
-  spatiotemporal = "iid",
-  spatial_varying = ~ 0 + as.factor(species)
-)
-
-fit
+tidy(fit, "ran_pars")
 ```
 
-This model also appears to be problematic. If we look at the estimated parameters, you'll notice that there are two rows of entries for `sigma_Z`. This means that our model is trying to estimate separate species-specific variance terms for the species-specific spatial fields (say *that* 10 times fast!).
+This means that our model is trying to estimate separate species-specific variance terms for the species-specific spatial fields (say *that* 10 times fast!). It's struggling.
+
+If we wanted to estimate species-specific spatial fields with a single shared variance (meaning the net magnitude of the peaks and valleys in the fields were similar but the wiggles themselves were species specific), we could do that by specifying a custom map argument and passing it into `sdmTMBcontrol()`. Any shared factor levels in the `map` are gathered to have 'mirrored' or shared parameter values. We assign these to `ln_tau_Z` because, internally, this is the parameter that gets converted into the spatially-varying field variances (the SDs of those fields are `sigma_Z`).
+
+This case is pretty simple, but for more complicated cases we could figure out the structure of our needed `map` vector as follows:
 
 ```{r}
-fit$sd_report
+colnames(model.matrix(~ 0 + factor(species), data = sim_dat))
 ```
 
-If we wanted to estimate species-specific spatial fields with a single shared variance (meaning the net magnitude of the peaks and valleys in the fields were similar), we could do that by specifying a custom map argument and passing it into `sdmTMBcontrol()`
+So, we need a vector of length two with shared factor values:
 
 ```{r}
 map_list <- list(ln_tau_Z = factor(c(1, 1)))
+```
 
+Then, we can use our map list to share the spatially varying coefficient SDs:
+
+```{r}
 fit <- sdmTMB(
-  observed ~ 0 + fyear, # shared year effects
+  observed ~ fyear * factor(species),
   data = sim_dat,
-  mesh = spde,
+  mesh = mesh,
   family = gaussian(),
   spatial = "off",
   time = "year",
   spatiotemporal = "iid",
-  spatial_varying = ~ 0 + as.factor(species),
+  spatial_varying = ~ 0 + factor(species),
   control = sdmTMBcontrol(map = map_list)
 )
-
 fit
 ```
 
+Notice the spatially varying coefficient SD is now shared.
+
 ### Model 3: species-specific spatiotemporal fields
 
-In all of the examples above, spatiotemporal fields are included -- but shared among species. As a last example, we can extend the above approaches to set up a model that includes spatiotemporal fields unique to each species.
+In all of the examples above, spatiotemporal fields are included, but shared among species. As another example, we can extend the above approaches to set up a model that includes spatiotemporal fields unique to each species.
 
-One approach to including separate spatiotemporal fields by species is creating a new variable that is a concatenation of species and year (or any given time step factor). For example, we can then implement this form of species-specific spatiotemporal variation by changing the `time` argument to be `time = "species_year"`
+One approach to including separate spatiotemporal fields by species is creating a new variable that is a concatenation of species and year (or any given time step factor). For example, we can then implement this form of species-specific spatiotemporal variation by changing the `time` argument to be `time = "species_year"`.
 
 ```{r}
-# This needs to be a factor
-sim_dat$species_year <- as.factor(paste(sim_dat$species, sim_dat$year))
-
+sim_dat$species_year <- factor(paste(sim_dat$species, sim_dat$year))
 map_list <- list(ln_tau_Z = factor(c(1, 1)))
-
 fit <- sdmTMB(
-  observed ~ 0 + fyear, # shared year effects
+  observed ~ fyear * factor(species),
   data = sim_dat,
-  mesh = spde,
+  mesh = mesh,
   family = gaussian(),
   spatial = "off",
   time = "species_year",
   spatiotemporal = "iid",
-  spatial_varying = ~ 0 + as.factor(species),
+  spatial_varying = ~ 0 + factor(species),
   control = sdmTMBcontrol(map = map_list)
 )
-
 fit
-
-print(sanity(fit))
 ```
 
-This is getting a little better -- there is a gradient warning, but the Hessian appears to be invertible (`hessian_ok == TRUE)`, which is critical to convergence.
-
-### Putting it all together
+### Model 4: hack species into the time element for spatial models
 
-Our spatiotemporal model above came close to passing all sanity() checks, and could be modified to also include the species-specific year effects that we started with
+If we only wanted to fit a spatial model but had several species (or other groups), one approach is to pretend our species (or other group) is the time element.
 
 ```{r}
-fit <- sdmTMB(
-  observed ~ 0 + fyear * species, # shared year effects
+sim_dat$numeric_species <- as.numeric(factor(sim_dat$species)) # needs to be numeric
+fit_fake_time <- sdmTMB(
+  observed ~ 0 + factor(species),
   data = sim_dat,
-  mesh = spde,
+  mesh = mesh,
   family = gaussian(),
   spatial = "off",
-  time = "species_year",
-  spatiotemporal = "iid",
-  spatial_varying = ~ 0 + as.factor(species),
-  control = sdmTMBcontrol(map = map_list)
+  time = "numeric_species", #< hack
+  spatiotemporal = "iid" #< 'AR1' or 'RW' probably wouldn't make sense here
 )
+fit_fake_time
+```
 
+This is just a convenience though. We could instead do the same thing using the `spatial_varying` argument making sure to 'map' the field variances to be shared to match the above:
 
-fit
+```{r}
+fit_svc <- sdmTMB(
+  observed ~ 0 + factor(species),
+  data = sim_dat,
+  mesh = mesh,
+  family = gaussian(),
+  spatial = "off",
+  spatial_varying = ~ 0 + factor(species),
+  control = sdmTMBcontrol(map = list(ln_tau_Z = factor(c(1, 1))))
+)
+fit_svc
+```
 
-print(sanity(fit))
+We can prove they're identical:
 
-fit$sd_report
+```{r}
+logLik(fit_fake_time)
+logLik(fit_svc)
 ```
 
+### Putting it all together
+
 These examples illustrate a number of ways that species-specific effects can be included in `sdmTMB` models, and can be extended to other categories/groups/cohorts within a species for which one wants to control the amount of information shared between groups (e.g., age-, size-, or stage-specific estimates). A brief summary of these approaches can be summarized as:
 
 ```{r echo=FALSE}
-desc <- data.frame("Form" = c("Main effects", "Spatial effects", "Spatial effects w/shared variance", "Spatiotemporal effects"), "Implementation" = c("Year by species interactions, or smooths", "Spatially varying coefficients", "Spatially varying coefficients + map argument", "Species-year factor as time variable"))
+desc <- data.frame("Form" = c("Main effects", "Spatial effects", "Spatial effects w/shared variance", "Spatiotemporal effects"), "Implementation" = c("Year-by-species interactions or smooths", "Spatially varying coefficients", "Spatially varying coefficients + map argument", "Species-year factor as time variable"))
 knitr::kable(desc)
 ```
 
 ### Further extensions
 
-Including spatiotemporal effects like species-year combinations in the `time` argument is relatively straightforward, but these models could get more complicated if we wanted to also include other grouping factors (such as cohort effects for each species). There may be some identifiability issues that would have to be resolved, but even if the `time` argument is being used, it's possible to add other factors through the `spatial_varying` formula. As an example, if our dataframe included a cohort variable for each species, we could first create a new factor that is a concatenation of species and cohort
+Including spatiotemporal effects like species-year combinations in the `time` argument is relatively straightforward, but these models could get more complicated if we wanted to also include other grouping factors (such as cohort effects for each species). There may be some identifiability issues that would have to be resolved, but even if the `time` argument is being used, it's possible to add other factors through the `spatial_varying` formula. As an example, if our dataframe included a cohort variable for each species, we could first create a new factor that is a concatenation of species and cohort.
 
-```{r eval=FALSE}
+```{r eval=TRUE}
+sim_dat$cohort <- rep(seq_len(10), 200)
 sim_dat$species_cohort <- as.factor(paste(sim_dat$species, sim_dat$cohort))
 ```
 
-If there were \~ 10 levels of `species_cohort`, and we were including our original spatial effects in the `spatial_varying` term, we could create a new map list that would indicate we wanted all species-cohort fields to have the same spatial variance as below.
+If there were 10 levels of `species_cohort`, and we were including our original spatial effects in the `spatial_varying` term, we could create a new map list that would indicate we wanted all species-cohort fields to have the same spatial variance as below.
+
+Figure out how our spatial-varying formula will be parsed:
+
+```{r, eval=TRUE}
+colnames(model.matrix(~0 + as.factor(species) + species_cohort, data = sim_dat))
+```
 
 ```{r eval=FALSE}
-sim_dat$species_year <- as.factor(paste(sim_dat$species, sim_dat$year))
+sim_dat$species_year <- factor(paste(sim_dat$species, sim_dat$year))
 
 # The first 2 elements correspond to species, the remaining 10 to species-cohort
 map_list <- list(ln_tau_Z = factor(c(1, 1, rep(2, 10))))
 
 fit <- sdmTMB(
-  observed ~ 0 + fyear * species,
+  observed ~ 0 + fyear * species_cohort,
   data = sim_dat,
-  mesh = spde,
+  mesh = mesh,
   family = gaussian(),
   spatial = "off",
   time = "species_year",
   spatiotemporal = "iid",
   spatial_varying = ~ 0 + as.factor(species) + species_cohort,
-  control = sdmTMBcontrol(map = map_list)
+  do_fit = FALSE
+  # control = sdmTMBcontrol(map = map_list)
 )
 ```