-
Notifications
You must be signed in to change notification settings - Fork 6
Model fitting and prediction
In this section we will consider the RSEP and TS methods for pseudo-absence data generation. See section Pseudoa-bsence data generation, for a full worked example on generating pseudo-absences with mopa
.
data(Oak_phylo2)
data(biostack)
projection(biostack$baseline) <- CRS("+proj=longlat +init=epsg:4326")
r <- biostack$baseline$bio1
bg <- backgroundGrid(raster = r)
bg.profiled <- OCSVMprofiling(xy = Oak_phylo2, varstack = biostack$baseline, background = bg$xy)
#RSEP pseudo-absences (10 ralizations):
RSEP <- pseudoAbsences(xy = Oak_phylo2, background = bg.profiled$absence, realizations = 10, exclusion.buffer = 0.083*10, prevalence = 0.5)
#TS pseudo-absences (10 realizations):
bg.extents <- backgroundRadios(xy = Oak_phylo2, background = bg.profiled$absence, start = 0.166, by = 0.083*10, unit = "decimal degrees")
TS <- pseudoAbsences(xy = Oak_phylo2, background = bg.extents, realizations = 10, exclusion.buffer = 0.083*5, prevalence = -0.5, kmeans = FALSE)
The mopaTrain
function performs the species distribution modeling and the k-fold cross-validation for a set of presence/absence data per species. Algorithms supported are ”glm”, ”svm”, ”maxent”, ”mars”, ”randomForest”, ”cart.rpart” and ”cart.tree”. In the example below, we perform a 10-fold cross validation using the ”glm” modelling algorithm.
glmRSEP <- mopaTrain(y = RSEP, x = biostack$baseline, k = 10, algorithm = "glm")
If TS pseudo-absences are used, mopaTrain
performs the species distribution modeling for each different background extent and fits obtained AUCs (corresponding to different background extents) to three non linear models (Michaelis-Menten, exponential2 and exponential3). The model that scores the lowest error is automatically selected to extract the Vm coefficient (equation 1 in the manuscript). Then, the minimum extent at which the AUC surpasses the Vm value is selected as the threshold extent (see Figure 3 in the manuscript), being the corresponding fitted SDM the one returned by mopaTrain
and therefore the definitive to predict suitability probabilities in the study area. Besides, if argument diagrams
is set to TRUE, A fitted model plot (as in Fig. 3 in Iturbide et al., 2015) is printed in the plotting environment:
glmTS <- mopaTrain(y = TS, x = biostack$baseline, k = 10, algorithm = "glm", diagrams = TRUE)
For each modeled phylogeny the output list contains the following components:
- model: fitted model with all data for training
- auc: AUC statistic in the cross validation
- kappa: kappa statistic in the cross validation
- tss: true skill statistic in the cross validation
- fold.models: models fitted with partitioned data for cross validation
- ObsPred: Observations and cross model prediction
To extract a particular component for multiple phylogenies or species at once, function extractFromModel
is applied. Next, we extract "model" for predicting later.
mod.glmRSEP <- extractFromModel(models = glmRSEP, value = "model")
mod.glmTS <- extractFromModel(models = glmTS, value = "model")
To predict, function mopaPredict
is used.
Predictions for reference period:
prd.glmRSEP <- mopaPredict(models = mod.glmRSEP, varstack = biostack$baseline)
prd.glmTS <- mopaPredict(models = mod.glmTS, varstack = biostack$baseline)
spplot(stack(prd.glmTS$realization01))
Predictions for future period:
prd.glmRSEP.fut <- mopaPredict(models = mod.glmRSEP, varstack = biostack$future)
prd.glmTS.fut <- mopaPredict(models = mod.glmTS, varstack = biostack$future)
spplot(stack(prd.glmTS.fut$realization01$H11))
We can easily build an ensemble forecast by using functionalities from the raster
package. In the next example we calculate the ensemble mean of 7 climate projections and two SDMs (glm and rf) for the RSEP method and phylogeny H11. To extract the predictions corresponding to the H11 phylogeny we use function extractFromPrediction
(package mopa
). We could also use this function to extract other elements (e.g. climate projection "MPI"). The standard deviation of the ensemble is also calculated with function stackApply
from the raster
package.
rfRSEP <- mopaTrain(y = RSEP, x = biostack$baseline, k = 10, algorithm = "rf")
mod.rfRSEP <- extractFromModel(models = rfRSEP, value = "model")
prd.rfRSEP.fut <- mopaPredict(models = mod.rfRSEP, varstack = biostack$future)
prd.glmRSEP.fut.H11 <- extractFromPrediction(prd.glmRSEP.fut, "H11")
prd.rfRSEP.fut.H11 <- extractFromPrediction(prd.rfRSEP.fut, "H11")
forecast <- list(prd.glmRSEP.fut.H11, prd.rfRSEP.fut.H11)
ensForecast <- mean(stack(unlist(forecast)))
sdForecast <- stackApply(stack(unlist(forecast)), indices = rep(1, nlayers(stack(unlist(forecast)))), fun = sd)
spplot(ensForecast)
spplot(sdForecast, at = seq(0,1,0.1))
go to section Analysis of SDM projections