Skip to content

Data formats and preprocessing

miturbide edited this page Mar 21, 2017 · 10 revisions

Presence data

In Iturbide et al., 2015, we indicate the adequacy of using different ecotypes for modelling species distributions. Thus, functions in the mopa package are intended to deal with more than one group of presences simultaneously. In this example, we use a list of 2 different Oak haplotypes (H11 and H1, see Table 1 in the manuscript), but the same steps may be followed in a joint analysis of multiple species. Of course, the most typical case (i.e., dealing with with a single group of species) can be also easily accomplished by providing a data.frame. The data set of species in this example is Oak_phylo2 and is provided with the mopa package. This is a modified subset of the Quercus sp Europe Petit 2002 database (Petit et al., 2002b), which is available in the Georeferenced Database of Genetic Diversity or (GD)2 . To aid in map representation, a dataset called wrld containing a World map is also included in the package.

# Load map and Oak data
data(wrld)
data(Oak_phylo2)

# Map
plot(wrld, asp = 1, xlim= c(-10,50), ylim=c(40,60))
for (i in 1:length(Oak_phylo2)) {
        points(Oak_phylo2[[i]], pch = "*", cex = 0.5, col = colors()[i*50])
}

Environmental variables

Predictor variables are typically stored in raster files. The different raster layers can be efficiently handled in R using the utilities of the raster package. mopa uses as input this type of raster objects. In particular, multiple layers can be arranged in a collection of RasterLayers objects called a RasterStack (see ?raster::raster for more information on raster objects).

# RatserStack of environmental variables
data(biostack)
spplot(biostack$baseline)

Background grid

The regularly distributed grid points covering the continental area can be created from the environmental rasters that are used for modeling. Function delimit creates a rectangular polygon shape from the bounding coordinates and does the intersection of the background points (e.g. sp_grid_world). A list with two objects is obtained: (1)bbs: polygon shape of the bounding boxes and (2)bbs.grid: a list of data frames of the background point grid limited by the bounding coordinates.

# extract a raster object to be used as reference 
projection(biostack$baseline) <- CRS("+proj=longlat +init=epsg:4326")
r <- biostack$baseline$bio1

Considering the whole study area

If we want to use the whole study area (defined by the raster object used as spatial reference) in the modeling phase, the only object to be passed to function backgroundGrid is the reference raster (here r).

bg <- backgroundGrid(raster = r)


# Plot presences and bounding boxes
plot(bg$plygons, asp = 1)
points(bg$xy, col = "gray", cex = 0.5, pch = 18)
for (i in 1:length(Oak_phylo2)){
  points(Oak_phylo2[[i]], col = colors()[i*50])
}

Considering a subdomain

We could also consider a subdomain. If the case, an extent object (type ?extent) is passed to parameter spatial.subset.

bg.subdomain <- backgroundGrid(raster = r, spatial.subset = extent(c(-10,35,45,65)))
plot(bg.subdomain$plygons, asp = 1)
points(bg.subdomain$xy, col = "gray", cex = 0.5, pch = 18)
for (i in 1:length(Oak_phylo2)){
  points(Oak_phylo2[[i]], col = colors()[i*50])
}

Bounded study area around presences

Alternatively, if we want to bound the modeling area to the species distribution domain, the object to be passed to spatial.subset is the known presence localities of the target species. In this case, since the Oak_phylo2 list contains two groups of points (one for each haplotype considered), a list of two matrices is created.

bg.species <- backgroundGrid(raster = r, spatial.subset = Oak_phylo2)
plot(bg.species$plygons, asp = 1)
for (i in 1:length(Oak_phylo2)){
  points(bg.species$xy[[i]], col = "gray", cex = 0.3, pch = 18)
  points(Oak_phylo2[[i]], col = colors()[i*50])
}


go to next section -->