-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathsdm_workshop.Rmd
345 lines (252 loc) · 10.4 KB
/
sdm_workshop.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
---
title: "SDM with R"
author: "Pedro Henrique Braga and Julia Nordlund"
date: "April 23, 2017"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Outline
This tutorial illustrates how to build, evaluate and project species
distribution models within R.
The main steps, described bellow, are the following:
1. Data preparation
+ Study area and manipulating spatial information
+ Environmental data
+ Species occurrence data
+ Presence-and-absence data
+ Presence-only and pseudo-absences data
2. Model fitting, prediction, and evaluation
3. Making projections
# Workshop environment preparation
Install and load all required *R* libraries:
```{r, eval=FALSE}
install.packages("rgdal")
install.packages("raster")
install.packages("rgeos")
install.packages("dismo")
install.packages("letsR")
install.packages("biomod2")
install.packages("biogeo")
install.packages("maptools")
```
```{r, message=FALSE, warning=FALSE}
library(rgdal)
library(raster)
library(rgeos)
library(dismo)
library(letsR)
library(biomod2)
library(biogeo)
library(maptools)
```
Download the .zip file from xxxx and extract files to your QCBS R Symposium workshop folder.
Then, set your working directory to the same directory of the folder `workshopSDM` using the functions `setwd()` or your prefered method.
# Data preparation
## Selecting your study area and manipulating spatial files
Most species distribution models are done using spatial grid information. In the next step, we will learn how to create a polygon grid of a given region, which will be used within our species distribution model framework.
For this tutorial, we have chosen the Neotropical zoogeographical region as our study area. Load the polygon shapefile of it using the `readOGR` function:
```{r, echo = TRUE}
neotropical_shape <- rgdal::readOGR("data/shapefiles/neotropical.shp")
plot(neotropical_shape)
```
Now that we have our study region, we need to create a grid, intersect our shapefile and create a new grid of that region. For this, we will use the `GridFilter` function. It allows us to input a polygon shapefile, specify a resolution and the proportion of overlap for each cell to be considered.
We will apply it , intersect and crop its features with the polygon `neotropical_shape` you have loaded.
```{r, eval = FALSE}
GridFilter <- function(shape, resol = 1, prop = 0)
{
grid <- raster(extent(shape))
res(grid) <- resol
proj4string(grid)<-proj4string(shape)
gridpolygon <- rasterToPolygons(grid)
drylandproj<-spTransform(shape, CRS("+proj=laea"))
gridpolproj<-spTransform(gridpolygon, CRS("+proj=laea"))
gridpolproj$layer <- c(1:length(gridpolproj$layer))
areagrid <- gArea(gridpolproj, byid=T)
gridpolproj <- gBuffer(gridpolproj, byid=TRUE, width=0)
dry.grid <- intersect(drylandproj, gridpolproj)
areadrygrid <- gArea(dry.grid, byid=T)
info <- cbind(dry.grid$layer, areagrid[dry.grid$layer], areadrygrid)
dry.grid$layer<-info[,3]/info[,2]
dry.grid <- spTransform(dry.grid, CRS(proj4string(shape)))
dry.grid.filtered <- dry.grid[dry.grid$layer >= prop,]
}
# Create a spatial polygon grid for the Neotropical region, with 5 degrees x 5 degrees
neotropical_grid <- GridFilter(neotropical_shape, resol = 5, prop = 0.5)
```
```{r, eval = FALSE}
# Export your resulting polygon grid
writeOGR(neotropical_grid, dsn=paste(getwd(),"/data/shapefiles", sep=""), layer="neotropical_grid_5", driver="ESRI Shapefile", overwrite_layer=T)
```
This is our resulting polygon grid, where each cell refers to a site (and a row) in our data.
```{r raster, echo=TRUE, warning=FALSE}
neotropical_grid <- shapefile("data/shapefiles/neotropical_grid_5.shp")
plot(neotropical_grid)
```
```{r, echo = FALSE}
# extract coordinates from the cell centroids
coords <- as.data.frame(coordinates(neotropical_grid))
colnames(coords) <- c("Longitude_X", "Latitude_Y")
write.table(coords, "data/matrices/NT_coords_5.txt")
```
## Importing environmental variables
```{r, eval = FALSE}
bio_1 <- raster("/data/rasters/w25m_bio_1.asc")
myExpl <- data.frame(bio_1 = numeric(length(neotropical_grid))
)
bio_1_ext <- extract(bio_1,neotropical_grid)
myExpl$bio_1 <- unlist(lapply(bio_1_ext,
function(x)
if (!is.null(x)) mean(x, na.rm=TRUE)
else NA))
# Let's take a look at our table:
head(myExpl)
write.table(myExpl, "data/matrices/NT_EnvVar_5.txt")
```
## Importing species data
### Using expert drawn maps
```{r maptools, echo=TRUE, warning=FALSE}
speciesGeoDist <- readShapePoly("data/shapefiles/panthera_onca_IUCN.shp",
proj4string=CRS("+proj=laea"))
plot(speciesGeoDist)
```
```{r, echo=FALSE, warning=FALSE}
neotropical_grid <- readShapePoly("data/shapefiles/neotropical_grid_5.shp",
proj4string=CRS("+proj=laea"))
```
Create a presence-absence matrix of species' geographic ranges within your polygon grid shapefile:
```{r, echo = FALSE}
abspres.matrix <- lets.presab.grid(speciesGeoDist, neotropical_grid, "ID")
```
You can now visualize a map of your species richness within your polygon grid:
```{r, echo = FALSE}
richnessCalc <- rowSums(abspres.matrix$PAM) + 1
colPalette <- colorRampPalette(c("#fff5f0", "#fb6a4a", "#67000d"))
colors <- c("white", colPalette(max(richnessCalc)))
plot(abspres.matrix$grid, border = "gray40",
col = colors[richnessCalc])
map(add = TRUE)
```
Export your presence-absence matrix:
```{r, echo = FALSE}
write.table(abspres.matrix$PAM, "data/matrices/NT_PAM_5.txt")
```
### Using occurrence records
## Data cleaning
```{r, echo = FALSE}
# Load our species data
DataSpecies <- read.table("data/matrices/NT_PAM_5.txt", header=TRUE)
head(DataSpecies)
# Remove all columns that have less than 4 presences
to.remove <- colnames(DataSpecies)[colSums(DataSpecies) <= 4]
`%ni%` <- Negate(`%in%`)
DataSpecies <- subset(DataSpecies,select = names(DataSpecies) %ni% to.remove)
# Replace "_" per "."
names(DataSpecies) <- gsub(x = names(DataSpecies),
pattern = "\\.",
replacement = ".")
DataSpecies <- read.table("data/matrices/NT_PAM_5.txt", header=TRUE)
```
## Species distribution modelling
First, input data needs to be rearranged for usage in biomod2 using `BIOMOD_FormatingData()`. Its most important arguments are:
* `resp.var` contains **species data** (for a single species) in **binary format** (ones for presences, zeros for true absences and NA for indeterminated) that will be used **to build the species distribution models**. It may be a vector or a spatial points object.
* ``expl.var`` contains a matrix, data.frame, SpatialPointsDataFrame or RasterStack containing
your explanatory variables.
* ``resp.xy`` two columns matrix containing the X and Y coordinates of resp.var (only consider if resp.var is a vector)
* ``resp.name`` contains the response variable name (species).
```{r, echo = TRUE}
# Recall your variables again:
## Load prepared species data
myResp <- read.table("data/matrices/NT_PAM_5.txt", header=TRUE)
## Load environmental variables
myExpl <- read.table("data/matrices/NT_EnvVar_5.txt", header=TRUE)
# Define the species of interest
myRespName <- names(DataSpecies)[1]
# Load coordinates
myRespCoord <- read.table("data/matrices/NT_coords_5.txt", header=TRUE)
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
expl.var = myExpl,
resp.xy = myRespCoord,
resp.name = myRespName,
na.rm = TRUE)
```
```{r, eval = FALSE}
myBiomodOption <- BIOMOD_ModelingOptions()
```
```{r, eval = FALSE}
myBiomodModelOut <- BIOMOD_Modeling(
myBiomodData,
models = c('MAXENT.Tsuruoka', 'RF'),
models.options = myBiomodOption,
NbRunEval=3,
DataSplit=80,
Prevalence=0.5,
VarImport=3,
models.eval.meth = c('TSS','ROC'),
SaveObj = TRUE,
rescal.all.models = FALSE,
do.full.models = FALSE,
modeling.id = paste(myRespName,"CurrentClim",sep=""))
```
```{r, eval = FALSE}
### save models evaluation scores and variables importance on hard drive
capture.output(get_evaluations(myBiomodModelOut),
file=file.path(myRespName,
paste(myRespName,"_formal_models_eval.txt", sep="")))
capture.output(get_variables_importance(myBiomodModelOut),
file=file.path(myRespName,
paste(myRespName,"_formal_models_var_imp.txt", sep="")))
```
```{r, eval = FALSE}
### Building ensemble-models
myBiomodEM <- BIOMOD_EnsembleModeling(
modeling.output = myBiomodModelOut,
chosen.models = 'all',
em.by='all',
eval.metric = c('TSS'),
eval.metric.quality.threshold = c(0.5),
prob.mean = T,
prob.cv = T,
prob.ci = T,
prob.ci.alpha = 0.05,
prob.median = T,
committee.averaging = T,
prob.mean.weight = T,
prob.mean.weight.decay = 'proportional')
```
```{r, eval = FALSE}
### Make projections on current variable
myBiomodProj <- BIOMOD_Projection(
modeling.output = myBiomodModelOut,
new.env = myExpl,
xy.new.env = myRespCoord,
proj.name = 'current',
selected.models = 'all',
binary.meth = 'TSS',
compress = TRUE,
clamping.mask = F,
output.format = '.RData')
myCurrentProj <- getProjection(myBiomodProj)
myCurrentProj
```
```{r, eval = FALSE}
### Make ensemble-models projections on current variable
myBiomodEF <- BIOMOD_EnsembleForecasting(
projection.output = myBiomodProj,
binary.meth = 'TSS',
total.consensus = TRUE,
EM.output = myBiomodEM)
```
# print summary
myBiomodEM
# get evaluation scores
getEMeval(myBiomodEM)
```{r, eval = FALSE}
### Make ensemble-models projections on current variable
# load the first speces binary maps which will define the mask
alphaMap <- reclassify(subset(myExpl,1), c(-Inf,Inf,0))
alphaMap <- get(load(paste(getwd(),"/",myRespName[1],"/proj_current/", "proj_current_",
myRespName[1],"_ensemble_TSSbin.RData",
sep="")))[,1]
```