-
Notifications
You must be signed in to change notification settings - Fork 53
Empirical orthogonal functions
It is possible to use VAST
to fit a generalization of empirical orthogonal function (EOF) analysis, while accounting for the noisy, zero-inflated and multivariate data that we have in fisheries. To do so, take a univariate or multivariate model and perform rank-reduction (a.k.a. ordination) on years; this contrasts with "conventional" joint dynamic species distribution models, which ordinate on species. In this case, the model will then estimate spatial variability for each species, one or more annual indices of variability, as well a map for each mode and species indicating the deviation from uniform distribution associated with a positive value of each index for each species. It is possible to visualize the statistical significance of each model component, e.g., to visualize whether index variability is "significant" or to identify locations that are "significantly" associated with a given index.
In this example, we apply EOF to five groundfishes in the eastern Bering Sea. The 1st index (primary mode of variability) is highly correlated with the cold-pool extent. This was initially explored and documented in Thorson Ciannelli & Litzow (2020). Since that paper, however, we have improved the interface and model structure to control for persistent variability using the spatial term; this requires VAST release >= 3.4.0
. This improvement has resulted in cold-pool extent being correlated with the primary mode of spatio-temporal variability, rather than the 3rd factor in the Thorson Ciannelli & Litzow (2020).
Works cited
Thorson, J. T., Ciannelli, L., & Litzow, M. A. (2020). Defining indices of ecosystem variability using biological samples of fish communities: A generalization of empirical orthogonal functions. Progress in Oceanography, 181, 102244. https://doi.org/10.1016/j.pocean.2019.102244
Example code
# Load packages
library(VAST)
# load data set
example = load_example( data_set="five_species_ordination" )
# Make settings:
# including modifications from default settings to match
# analysis in original paper
settings = make_settings( n_x=50,
Region=example$Region,
purpose="EOF3",
n_categories=2,
ObsModel=c(1,1),
RhoConfig=c("Beta1"=0,"Beta2"=0,"Epsilon1"=0,"Epsilon2"=0) )
# Run model (including settings to speed up run)
fit = fit_model( settings=settings,
Lat_i=example$sampling_data[,'Lat'],
Lon_i=example$sampling_data[,'Lon'],
t_i=example$sampling_data[,'Year'],
c_i=example$sampling_data[,'species_number']-1,
b_i=example$sampling_data[,'Catch_KG'],
a_i=example$sampling_data[,'AreaSwept_km2'],
newtonsteps=0,
Use_REML=TRUE )
# Plot results, including spatial term Omega1
results = plot( fit,
check_residuals=FALSE,
plot_set=c(3,16),
category_names = c("pollock", "cod", "arrowtooth", "snow_crab", "yellowfin") )
# Load Cold-pool-extent
example2 = load_example( data_set="EBS_pollock" )
CPE = example2$covariate_data[match(fit$year_labels,example2$covariate_data$Year),'AREA_SUM_KM2_LTE2']
# use `ecodist` to display ordination
index1_tf = results$Factors$Rotated_loadings$EpsilonTime1
cpe_vf = ecodist::vf(index1_tf, data.frame("CPE"=CPE), nperm=1000)
png( "year_ordination.png", width=6, height=6, res=200, units="in")
plot( index1_tf, type="n" )
text( index1_tf, labels=rownames(index1_tf) )
plot( cpe_vf )
dev.off()
# Plot against cold-pool extent index
index2 = results$Factors$Rotated_loadings$EpsilonTime1[,2]
index2 = sign(cor(index2,CPE)) * index2
png( "EOF_index.png", width=6, height=6, res=200, units="in")
matplot( x=fit$year_labels, y=scale(cbind(CPE,index2)),
type="l", lty="solid", col=c("blue","black"), lwd=2, ylab="Index", xlab="Year" )
legend( "bottom", ncol=2, fill=c("blue","black"), legend=c("CPE","factor-2"), bty="n")
dev.off()
However, as with any multivariate model this model may take a long time (hours-days) to fit. If this is happening please try one or more ways to simplify the problem:
- Use fewer extrapolation-grid cells. With recent versions of VAST, this can be done e.g., using
max_cells=1000
for 1000 extrapolation-grid cells. - Use fewer factors, via decreasing
FieldConfig
inmake_data(.)
and/orn_categories
inmake_settings(.)
- Use fewer knots, via decreasing
n_x
inmake_settings(.)
- Use fewer categories and/or years of data
- Eliminate bias-correction and/or run without getting standard errors.
- Get access to a better computer
- Use MRAN for windows, or other version of R that uses parallelization.
Example applications:
- Index standardization
- Empirical Orthogonal Functions
- Ordination using joint species distribution model
- End-of-century projections
- Expand length and age-composition samples
- Combine condition and biomass data
- Expand stomach content samples
- Combine presence/absence, counts, and biomass data
- Seasonal and annual variation
- Combine acoustic and bottom trawl data
- Surplus production models
- Multispecies model of biological interactions
- Stream network models
Usage demos:
- Adding covariates
- Visualize covariate response
- Percent deviance explained
- Create a new extrapolation grid
- Custom maps using ggplot
- Modify axes for distribution metrics
- K-fold crossvalidation
- Simulating new data
- Modify defaults for advanced users
Project structure and utilities: