forked from FAO-SID/SOC-Mapping-Cookbook
-
Notifications
You must be signed in to change notification settings - Fork 0
/
05-spatial-covariates.Rmd
314 lines (195 loc) · 21.8 KB
/
05-spatial-covariates.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
# Preparation of spatial covariates {#covariates}
*R. Baritz & Y. Yigini*
The authors of this Chapter used **R** packages. To run the code provided in this Chapter, the following package need to be installed in the **R** user library. If the package is not yet installed, the `install.packages()` function can be used.
```{r, eval=FALSE}
# Installing package for Chapter 'Preparation of Spatial Covariates'
install.packages("raster")
```
The example covariates from this Chapter were prepared by ISRIC. The access and use limitations are presented in Section \@ref(GSOCDataRepo). A small subset of these covariates, comprising most of the soil forming factors, will be used for the code examples. This subset is presented in the following table.
```{r covariates, echo=F}
covs.t <- read.csv("covs/CovsDescription.csv")
knitr::kable(covs.t[,c(1,3)], caption = "Code and name of example covariates provided with the cookbook", booktabs = TRUE)
```
## DEM-derived covariates
This Section gives a short overview on available DEM source data sets. Currently, two global level 30m DEMs are freely available: the Shuttle Radar Topographic Mission (SRTM) and the ASTER Global Digital Elevation Model (GDEM). They provide topographic data at the global scale, which are freely available for users. Both DEMs were compared by @Wong2014. Comparison against high-resolution topographic data of Light Detection and Ranging (LiDAR) in a mountainous tropical montane landscape showed that the SRTM (90 m) produced better topographic data in comparison with ASTER GDEM.
* Recommended for **national level** applications: ASTER GDEM/SRTM with 30 m resolution.
* Recommended for **global level** applications: SRTM with 90 m resolution, resampled to 1 km.
In both cases, noise and artefacts need to be filtered out. ASTER GDEM seems to contain more large artifacts (e.g. peaks), particularly in flat terrain, which are very difficult to remove through filtering.
```{r, fig.cap="SRTM 90 m resampled to 1 km for FYROM", warning=FALSE}
library(raster)
# Load DEM from raster *.tif file
DEM <- raster("covs/DEMENV5.tif")
plot(DEM)
```
> Tip for GRASS GIS or GDAL: Use *mdenoise* module/utility to remove noise while preserving sharp features like ridges, lines, and valleys.
SRTM contains many gaps (pixels with no data). These gaps could be filled using splines. SAGA GIS has a module called *Close Gaps with Splines* and other similar tools for doing this.
## Parent material
Parent material has a crucial impact on soil formation, soil geochemistry, and soil physics. Parent material, if not specifically mapped by soil mappers and included in soil maps, is usually available from geological maps. These maps focus on rock formation, mineral components, and age, and often lack younger surface sediments (even in quaternary maps). Parent material/rock types classified by soil mappers considers more strongly geochemistry and rock structure. Its geochemistry has an essential impact on the soil chemistry, e.g. cation exchange capacity, base saturation, and nutrient stock. The rock structure determines the ability to disintegrate, which has an impact on soil physical properties, like texture, skeleton content, permeability, and soil thickness.
National parent material and geological maps may be used. Other available datasets and data portals are given on the ISRIC [WorldGrids](http://worldgrids.org/doku.php) website that can be accessed at this link: http://worldgrids.org/doku.php.
* **OneGeology**: The world geological maps are now being integrated via the OneGeology project which aims at producing a consistent geological map of the world in approximate scale 1:1M [@jackson2007onegeology]; link: http://www.onegeology.org/.
* **USGS**: United States Geological Survey (USGS) as several data portals, e.g. that allow browsing of the International Surface Geology (split into South Asia, South America, Iran, Gulf of Mexico, Former Soviet Union, Europe, Caribbean, Bangladesh, Asia Pacific, Arctic, Arabian Peninsula, Africa and Afghanistan); link: https://mrdata.usgs.gov/geology/world/.
* **GLiM**: @hartmann2012new have assembled a global, purely lithological database called GLiM (Global Lithological Map). GLiM consists of over 1.25 million digital polygons that are classified into three levels (a total of 42 rock-type classes); link: https://www.geo.uni-hamburg.de/en/geologie/forschung/geochemie/glim.html).
* **USGS** and **ESRI**: Both jointly released in 2014 a Global Ecological Land Units map at 250 m resolution. This also includes world layer of rock types. This data can be downloaded from the USGS site; link: (http://rmgsc.cr.usgs.gov/outgoing/ecosystems/Global/).
## Soil maps
Soil maps play a crucial role for upscaling soil property data from point locations. They can be the spatial layer for conventional upscaling, they can also serve as a covariate in DSM. Predicted soil property maps have lower quality in areas where the covariates such as relief, geology, and climate do not correlate well with the dependent variable, here SOC stocks. This is especially true for soils under groundwater or stagnic water influence. This information is well-represented in soil maps.
### Global HWSD soil property maps
FAO, ISRIC, the International Institute for Applied Systems Analysis (IIASA), the Institute of Soil Science, Chinese Academy of Sciences (ISS CAS), and the Joint Research Center of the European Commision (JRC) together produced a gridded 1 km soil class map (HWSD). Global HWSD-derived soil property maps can be downloaded as GeoTIFF files at the following link http://worldgrids.org/doku.php/wiki:layers#harmonized_world_soil_database_images_5_km (see Section \@ref(GSOCDataRepo)).
```{r, fig.cap="Soil map of FYROM", eval=T, fig.width=8}
# Load the soil map from a shapefile *.shp file
soilmap <- shapefile("MK_soilmap_simple.shp")
# Plot the DEM together with the soil types
plot(DEM)
lines(soilmap)
```
> Digitized small-scale national soil maps are the most important spatial layer for soil property mapping. The higher their resolution, the better soil maps contribute to high-quality soil property maps - considering that the map should cover the target area/full country coverage.
### Technical steps - Rasterizing a vector layer in R
```{r, fig.cap='Rasterized soil map of FYROM from DEM raster layer', eval = T}
# The Symbol attribute from the vector layer will be used for the
# rasterization process. It has to be a factor
soilmap@data$Symbol <- as.factor(soilmap@data$Symbol)
# Save the levels names in a character vector
Symbol.levels <- levels(soilmap$Symbol)
# The rasterization process needs a layer with the target grid
# system: spatial extent and cell size
# The DEM raster layer could be used for this
soilmap.r <- rasterize(x = soilmap, y = DEM, field = "Symbol")
plot(soilmap.r, col=rainbow(21))
legend("bottomright",legend = Symbol.levels, fill=rainbow(21),
cex=0.5)
```
## Land cover and land use
Besides soil, geology, and climate, land use and/or land cover data are unarguably vital data for any statistical effort to map soil properties. There are many of various sources of data on the land cover including global and continental products, such as GlobCover, GeoCover, GlobeLand30, CORINE Land Cover.
For further reading and other global data sources not listed below, see the information following this link: http://worldgrids.org/doku.php/wiki:land_cover_and_land_use.
```{r, background=('#F7F7F7'), R.options=(width = 69)}
# Load the landcover from the raster *.tif file
landcover <- raster("covs/LCEE10.tif")
# Land cover is a categorical covariate, this has to be made
# explicit using function as.factor()
landcover <- as.factor(landcover)
```
### GlobCover (Global)
GlobCover is a European Space Agency (ESA) initiative which began in 2005 in partnership with JRC, FAO, the European Environmental Agency (EEA), the United Nations Environment Programme (UNEP), the Global Observation for Forest Cover and Land Dynamics (GOFC-GOLD), and the International Geosphere-Biosphere Programme (IGBP).
The aim of the project was to develop a service capable of delivering global composites and land cover maps using as input observations from the 300 m MERIS sensor onboard the ENVISAT satellite mission. ESA makes available the land cover maps, which cover 2 periods: December 2004 - June 2006 and January - December 2009.
The classification module of the GlobCover processing chain consists in transforming the MERIS-FR multispectral mosaics produced by the pre-processing modules into a meaningful global land cover map. The global land cover map has been produced in an automatic and global way and is associated with a legend defined and documented using the UN Land Cover Classification System (LCCS). The GlobCover 2009 land cover map is delivered as one global land cover map covering the entire Earth. Its legend, which counts 22 land cover classes, has been designed to be consistent at the global scale and therefore, it is determined by the level of information that is available and that makes sense at this scale [@bontemps2011globcover].
The GlobCover data are available here http://due.esrin.esa.int/page_globcover.php.
### Landsat GeoCover (Global)
The Landsat GeoCover collection of global imagery was merged into mosaics by the Earth Satellite Company (now MDA Federal). The result was a series of tiled imagery that is easier to wield than individual scenes, especially since they cover larger areas than the originals. The great detail in these mosaic scenes, however, makes them large in storage size, so the Mr.Sid file format, which includes compression operations, was chosen for output. While GeoCover itself is available in three epochs of 1975, 1990 and 2000, only the latter two epochs were made into mosaics.
The GeoCover Landsat mosaics are delivered in a Universal Transverse Mercator (UTM) / World Geodetic System 1984 (WGS84) projection. The mosaics extend north-south over 5 degrees of latitude and span east-west for the full width of the UTM zone. For mosaics below 60 degrees north latitude, the width of the mosaic is the standard UTM zone width of 6 degrees of longitude. For mosaics above 60 degrees of latitude, the UTM zone is widened to 12 degrees, centered on the standard even-numbered UTM meridians. To insure overlap between adjacent UTM zones, each mosaic extends for at least 50 kilometers to the east and west, and 1 kilometer to the north and south. The pixel size is 14.25 meters (V 2000).
The Landsat GeoCover data are available here ftp://ftp.glcf.umd.edu/glcf/Mosaic_Landsat/ (FTP Access).
### GlobeLand30 (Global)
GlobeLand30, the Earth’s first global land cover dataset at 30 m resolution for the years 2000 and 2010, was recently released and made publicly available by China. The National Geomatics Center of China under the *Global Land Cover Mapping at Finer Resolution* project has recently generated a global land cover map named GlobeLand30. The dataset covers two timestamps of 2000 and 2010, primarily acquired from Landsat TM and ETM+ sensors, which were then coupled/checked with some local products.
The GlobaLand30 data are publicly available for non-commercial purposes here http://www.globallandcover.com/GLC30Download/index.aspx.
### CORINE land cover (Europe only)
The pan-European component is coordinated by EEA and produces satellite image mosaics, land cover/land use information in the CORINE Land Cover data, and the high-resolution layers.
The CORINE Land Cover is provided for the years 1990, 2000, 2006 and 2012. This vector-based dataset includes 44 land cover and land use classes. The time-series also includes a land-change layer, highlighting changes in land cover and land-use. The high-resolution layers are raster-based datasets (100 m, 250 m) which provide information about different land cover characteristics and is complementary to land-cover mapping (e.g. CORINE) datasets.
The CORINE Land Cover data are available here http://www.eea.europa.eu/data-and-maps/data.
## Climate
### WorldClim V1.4 and V2 (Global)
WorldClim is a set of global climate layers (gridded climate data) with a spatial resolution of about 1 km^2^ (10 minutes, 5 minutes, 2.5 minutes are also available). These data can be used for mapping and spatial modeling. The current version is Version 1.4. and a preview of Version 2 is available for testing at http://worldclim.org/. The data can be downloaded as generic grids or in ESRI grid format.
The WorldClim data layers were generated by interpolation of average monthly climate data from weather stations on a 30 arc-second resolution grid. In V1.4, variables included are monthly total precipitation, and monthly mean, minimum and maximum temperatures, and 19 derived bioclimatic variables. The WorldClim precipitation data were obtained from a network of 1,473 stations, mean temperature from 24,542 stations, and minimum and maximum temperatures from 14,835 stations [@hijmans2005very].
The Bioclimatic parameters are: annual mean temperature, mean diurnal range, iso-thermality, temperature seasonality, max temperature of warmest month, minimum temperature of coldest month, temperature annual range , mean temperature of wettest quarter, mean temperature of driest quarter, mean temperature of warmest quarter, mean temperature of coldest quarter, annual precipitation, precipitation of wettest month, precipitation of driest month, precipitation seasonality (coefficient of variation), precipitation of wettest quarter, precipitation of driest quarter, precipitation of warmest quarter, precipitation of coldest quarter.
WorldClim Climate Data are available at: www.worldclim.org (WorldClim 1.4 (current conditions) by www.worldclim.org; @hijmans2005very. Is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License).
```{r}
# Load the climate covariates from the raster *.tif files
files <- list.files(path = "covs/", pattern = "CHE3.tif",
full.names = TRUE)
# Stack all the files in one RasterStack
climate <- stack(files)
```
```{r, fig.cap="Two different temperature layers included in the climate covariates"}
# Plot the first two layers
plot(climate[[1:2]])
```
### Gridded agro-meteorological data in Europe (Europe)
CGMS database contains meteorological parameters from weather stations interpolated on a 25×25 km grid. Meteorological data are available on a daily basis from 1975 to the last calendar year completed, covering the EU member states, and neighboring European countries.
The following parameters are available at 1-day time resolution:
* Maximum air temperature (°C)
* Minimum air temperature (°C)
* Mean air temperature (°C)
* Mean daily wind speed at 10m (m/s)
* Mean daily vapor pressure (hPa)
* Sum of precipitation (mm/day)
* Potential evaporation from a free water surface (mm/day)
* Potential evapotranspiration from a crop canopy (mm/day)
* Potential evaporation from a moist bare soil surface (mm/day)
* Total global radiation (KJ/m2/day)
* Snow depth
Data is accessible at the following link: http://agri4cast.jrc.ec.europa.eu/DataPortal/Index.aspx.
## GSOCMap - Data repository (ISRIC, 2017) {#GSOCDataRepo}
ISRIC World Soil Information has established a data repository which contains raster layers of various biophysical earth surface properties for each territory in the world. These layers can be used as covariates in any DSM exercise.
### Covariates and empty mask
The territories and their boundaries are obtained from the Global Administrative Unit Layers (GAUL) dataset. Each folder contains three subfolders:
* **Covs**: GIS layers of various biophysical earth surface properties.
* **Mask**: One *empty* grid file of the territory with territory boundary according to GAUL This grid to be used for the final delivery.
* **Soilgrids**: All `SoilGrids250m` soil class and property layers as available through www.soilgrids.org. Layers are aggregated to 1 km.
### Data specifications
The data is provided as follows:
* **File format**: GeoTIFF
* **Coordinate system**: WGS84, latitude-longitude in decimal degrees
* **Spatial resolution**: 1 km
### Data access
The data can be accessed at the following links:
ftp.isric.org/ (username: gsp, password: gspisric) or ftp://85.214.253.67/ (username: gsp, password: gspisric)
LICENCE and ACKNOWLEDGEMENT
**The GIS layers can be freely used under the condition that proper credit should be given to the original data source in each publication or product derived from these layers. Licences, data sources, data citations are indicated the data description table.**
## Extending the soil property table for spatial statistics
The upscaling procedures (see Chapter \@ref(mappingMethods)) depend on the rationale that the accumulation of local soil carbon stocks (and also other properties) depend on parameters for which spatial data are available, such as climate, soil type, parent material, slope, management. This information (Covariates) must be collected first. Details are provided above. The properties contained in the covariates can be extracted for each georeferenced sample site and added to the soil property table (Tab. \@ref(tab:site-level)). This table is used for training and validation of the statistical model for predicting the SOC stocks which subsequently can be applied to the full spatial extent.
## Preparation of a soil property table for spatial statistics
The upscaling procedures (see Chapter \@ref(mappingMethods)) depend on the rationale, that the accumulation of local soil carbon concentrations and stocks (and also other properties) depends on influential parameters for which spatial data are available, such as climate, soil type, parent material, slope, management. Any parameter in the table of local soil properties, for which a spatial layer is available, may be included in the final table. Other covariates will be added in Section \@ref(overlay-soil-covariates).
> In case this table is prepared for different depths, 0 cm - 10 cm, 10 cm - 30 cm, and if the host institution intends to develop different spatial models for different depths (e.g. separate spatial prediction model for litter and mineral soil 0 cm - 30cm), then the separate grids have to be added.
## Technical steps - Overlay covariates and soil points data {#overlay-soil-covariates}
**Step 1 - Load soil sample data and covariates**
```{r}
# Load the processed data
# This table was prepared in the previous Chapter
dat <- read.csv("data/dat_train.csv")
# Read covariates from raster *.tif files
files <- list.files(path = "covs", pattern = "tif$",
full.names = TRUE)
covs <- stack(files)
```
**Step 2 - Adding raster soilmap to the raster stack**
```{r}
# soilmap.r is the rasterization of the soil map and was obtained
# in a previous step
covs <- stack(covs, soilmap.r)
# Correct the name for layer 14
names(covs)[14] <- "soilmap"
```
Finally, we will mask the covariates with a mask developed using the country limits. Next, we will export all the covariates as `*.RData` file. This will allow us to load this file in the following Chapters of this cookbook.
```{r, fig.cap="FYROM soil covariates"}
# Mask the covariates with the country mask from the data repository
mask <- raster("data/mask.tif")
covs <- mask(x = covs, mask = mask)
# Export all the covariates
save(covs, file = "covariates.RData")
plot(covs)
```
**Step 3 - Overlay covariates and point data**
In order to carry out DSM in terms of examining the statistical significance of environmental predictors for explaining the spatial variation of SOC, we should link both sets of data together and extract the values of the covariates at the locations of the soil point data.
Note that the stacking of rasters can only be possible if they are in the same resolution and extent. If they are not, **raster** package resample and `projectRaster()` functions are for harmonizing all your different raster layers. With the stacked rasters of environmental covariates, we can now perform the intersection and extraction.
```{r}
# Upgrade points data frame to SpatialPointsDataFrame
coordinates(dat) <- ~ X + Y
# Extract values from covariates to the soil points
dat <- extract(x = covs, y = dat, sp = TRUE)
# LCEE10 and soilmap are categorical variables
dat@data$LCEE10 <- as.factor(dat@data$LCEE10)
dat@data$soilmap <- as.factor(dat@data$soilmap)
levels(soilmap) <- Symbol.levels
summary(dat@data)
```
After the extraction, it is useful to check if there are not available/missing values, so-called `NA` values, both in the target variable and covariates. In these cases, these data should be excluded. A quick way to assess if there are missing or `NA` values in the data is to use the `complete.cases()` function.
After removing the `NA` values, now there do not appear to be any missing data as indicated by the `integer(0)` output above. It means we have zero rows with missing information.
The last step involves exporting a table which is our regression matrix including the soil data and the values of the environmental covariates in the position of the point samples.
The summary of the `dat data.frame` shows one point with `NA` values for most of the covariates, and 22 points with `NA` values in the soilmap layer. The regression matrix should not contain `NA` values. There are two options to proceed:
* **Option 1**: In some cases, these `NA` values are from points with bad position data. Therefore, the points area outside the study area. In this case, the solution is to correct the coordinates or eliminate the points. This is the case for the two points with `NA` values for most of the covariates.
* **Option 2**: Another case is when a covariate is incomplete and does not cover all the area. This could produce many `NA` values. There are two different solutions, either to eliminate the covariate or to eliminate the point data. This is the case for the soilmap layer and the 30 points.
**Step 4 - Convert result to data.frame and save as a *.csv table**
```{r}
dat <- as.data.frame(dat)
# The points with NA values have to be removed
dat <- dat[complete.cases(dat),]
# Export as a *.csv table
write.csv(dat, "data/MKD_RegMatrix.csv", row.names = FALSE)
```