Skip to content

Model fitting and prediction

miturbide edited this page Mar 21, 2017 · 22 revisions

Data pre-processing and pseudo-absence data generation

In this section we will consider the RSEP and TS methods for pseudo-absence data generation. See section Pseudoa-bsence data generation, for a full worked example on generating pseudo-absences with mopa.

data(Oak_phylo2)
data(biostack)
projection(biostack$baseline) <- CRS("+proj=longlat +init=epsg:4326")
r <- biostack$baseline$bio1
bg <- backgroundGrid(raster = r)
bg.profiled <- OCSVMprofiling(xy = Oak_phylo2, varstack = biostack$baseline, background = bg$xy)

#RSEP pseudo-absences (10 ralizations):
RSEP <- pseudoAbsences(xy = Oak_phylo2, background = bg.profiled$absence, realizations = 10, exclusion.buffer = 0.083*10, prevalence = 0.5)

#TS pseudo-absences (10 realizations):
bg.extents <- backgroundRadios(xy = Oak_phylo2, background = bg.profiled$absence, start = 0.166, by = 0.083*10, unit = "decimal degrees")
TS <- pseudoAbsences(xy = Oak_phylo2, background = bg.extents, realizations = 10, exclusion.buffer = 0.083*5, prevalence = -0.5, kmeans = FALSE)
                              

Model fitting

The mopaTrain function performs the species distribution modeling and the k-fold cross-validation for a set of presence/absence data per species. Algorithms supported are ”glm”, ”svm”, ”maxent”, ”mars”, ”randomForest”, ”cart.rpart” and ”cart.tree”. In the example below, we perform a 10-fold cross validation using the ”glm” modelling algorithm.

glmRSEP <- mopaTrain(y = RSEP, x = biostack$baseline, k = 10, algorithm = "glm")

If TS pseudo-absences are used, mopaTrain performs the species distribution modeling for each different background extent and fits obtained AUCs (corresponding to different background extents) to three non linear models (Michaelis-Menten, exponential2 and exponential3). The model that scores the lowest error is automatically selected to extract the Vm coefficient (equation 1 in the manuscript). Then, the minimum extent at which the AUC surpasses the Vm value is selected as the threshold extent (see Figure 3 in the manuscript), being the corresponding fitted SDM the one returned by mopaTrain and therefore the definitive to predict suitability probabilities in the study area. Besides, if argument diagrams is set to TRUE, A fitted model plot (as in Fig. 3 in Iturbide et al., 2015) is printed in the plotting environment:

glmTS <- mopaTrain(y = TS, x = biostack$baseline, k = 10, algorithm = "glm", diagrams = TRUE)

For each modeled phylogeny the output list contains the following components:

  • model: fitted model with all data for training
  • auc: AUC statistic in the cross validation
  • kappa: kappa statistic in the cross validation
  • tss: true skill statistic in the cross validation
  • fold.models: models fitted with partitioned data for cross validation
  • ObsPred: Observations and cross model prediction

To extract a particular component for multiple phylogenies or species at once, function extractFromModel is applied. Next, we extract "model" for predicting later.

mod.glmRSEP <- extractFromModel(models = glmRSEP, value = "model")
mod.glmTS <- extractFromModel(models = glmTS, value = "model")

Prediction

To predict, function mopaPredict is used.

Predictions for reference period:

prd.glmRSEP <- mopaPredict(models = mod.glmRSEP, varstack = biostack$baseline)
prd.glmTS <- mopaPredict(models = mod.glmTS, varstack = biostack$baseline)
spplot(stack(prd.glmTS$realization01))

Predictions for future period:

prd.glmRSEP.fut <- mopaPredict(models = mod.glmRSEP, varstack = biostack$future)
prd.glmTS.fut <- mopaPredict(models = mod.glmTS, varstack = biostack$future)
spplot(stack(prd.glmTS.fut$realization01$H11))

Building an ensemble forecast

We can easily build an ensemble forecast by using functionalities from the raster package. In the next example we calculate the ensemble mean of 7 climate projections and two SDMs (glm and rf) for the RSEP method and phylogeny H11. To extract the predictions corresponding to the H11 phylogeny we use function extractFromPrediction (package mopa). We could also use this function to extract other elements (e.g. climate projection "MPI"). The standard deviation of the ensemble is also calculated with function stackApply from the raster package.

rfRSEP <- mopaTrain(y = RSEP, x = biostack$baseline, k = 10, algorithm = "rf")
mod.rfRSEP <- extractFromModel(models = rfRSEP, value = "model")
prd.rfRSEP.fut <- mopaPredict(models = mod.rfRSEP, varstack = biostack$future)

prd.glmRSEP.fut.H11 <- extractFromPrediction(prd.glmRSEP.fut, "H11")
prd.rfRSEP.fut.H11 <- extractFromPrediction(prd.rfRSEP.fut, "H11")

forecast <- list(prd.glmRSEP.fut.H11, prd.rfRSEP.fut.H11)
ensForecast <- mean(stack(unlist(forecast)))
sdForecast <- stackApply(stack(unlist(forecast)), indices = rep(1, nlayers(stack(unlist(forecast)))), fun = sd)
spplot(ensForecast)
spplot(sdForecast, at = seq(0,1,0.1))


go to next section -->

go to section Analysis of SDM projections