-
Notifications
You must be signed in to change notification settings - Fork 6
Data formats and preprocessing
In Iturbide et al., 2015, we indicate the adequacy of using different ecotypes for modelling species distributions. Thus, functions in the mopa
package are intended to deal with more than one group of presences simultaneously. In this example, we use a list
of 2 different Oak haplotypes (H11 and H1, see Table 1 in the manuscript), but the same steps may be followed in a joint analysis of multiple species. Of course, the most typical case (i.e., dealing with with a single group of species) can be also easily accomplished by providing a data.frame
. The data set of species in this example is Oak_phylo2
and is provided with the mopa
package. This is a modified subset of the Quercus sp Europe Petit 2002 database (Petit et al., 2002b), which is available in the Georeferenced Database of Genetic Diversity or (GD)2 . To aid in map representation, a dataset called wrld
containing a World map is also included in the package.
# Load map and Oak data
data(wrld)
data(Oak_phylo2)
# Map
plot(wrld, asp = 1, xlim= c(-10,50), ylim=c(40,60))
for (i in 1:length(Oak_phylo2)) {
points(Oak_phylo2[[i]], pch = "*", cex = 0.5, col = colors()[i*50])
}
Predictor variables are typically stored in raster files. The different raster layers can be efficiently handled in R using the utilities of the raster
package. mopa
uses as input this type of raster objects. In particular, multiple layers can be arranged in a collection of RasterLayers
objects called a RasterStack
(see ?raster::raster for more information on raster objects).
# RatserStack of environmental variables
data(biostack)
spplot(biostack$baseline)
The regularly distributed grid points covering the continental area can be created from the environmental rasters that are used for modeling. Function delimit
creates a rectangular polygon shape from the bounding coordinates and does the intersection of the background points (e.g. sp_grid_world). A list with two objects is obtained: (1)bbs
: polygon shape of the bounding boxes and (2)bbs.grid
: a list of data frames of the background point grid limited by the bounding coordinates.
# extract a raster object to be used as reference
projection(biostack$baseline) <- CRS("+proj=longlat +init=epsg:4326")
r <- biostack$baseline$bio1
If we want to use the whole study area (defined by the raster object used as spatial reference) in the modeling phase, the only object to be passed to function backgroundGrid
is the reference raster (here r
).
bg <- backgroundGrid(raster = r)
# Plot presences and bounding boxes
plot(bg$plygons, asp = 1)
points(bg$xy, col = "gray", cex = 0.5, pch = 18)
for (i in 1:length(Oak_phylo2)){
points(Oak_phylo2[[i]], col = colors()[i*50])
}
We could also consider a subdomain. If the case, an extent object (type ?extent) is passed to parameter spatial.subset
.
bg.subdomain <- backgroundGrid(raster = r, spatial.subset = extent(c(-10,35,45,65)))
plot(bg.subdomain$plygons, asp = 1)
points(bg.subdomain$xy, col = "gray", cex = 0.5, pch = 18)
for (i in 1:length(Oak_phylo2)){
points(Oak_phylo2[[i]], col = colors()[i*50])
}
Alternatively, if we want to bound the modeling area to the species distribution domain, the object to be passed to spatial.subset
is the known presence localities of the target species. In this case, since the Oak_phylo2
list contains two groups of points (one for each haplotype considered), a list of two matrices is created.
bg.species <- backgroundGrid(raster = r, spatial.subset = Oak_phylo2)
plot(bg.species$plygons, asp = 1)
for (i in 1:length(Oak_phylo2)){
points(bg.species$xy[[i]], col = "gray", cex = 0.3, pch = 18)
points(Oak_phylo2[[i]], col = colors()[i*50])
}