-
Notifications
You must be signed in to change notification settings - Fork 1
/
12.Uncertainty_and_validation.rmd
78 lines (61 loc) · 9.8 KB
/
12.Uncertainty_and_validation.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
# | Uncertainty and validation
Ideally, model prediction uncertainty provided in the GSOCseq map should include all sources of uncertainty that affect predictions, including model structural uncertainty, model parameters' and input data uncertainties. As a minimum, uncertainty should include input data uncertainties (e.g. Morais et al., 2019).
There are different methods to estimate uncertainties in the results. Monte Carlo methods, that draw random values from the probability distribution functions for inputs and parameters, are an efficient way to estimate the whole uncertainty of the modeled estimation (Ogle et al., 2010; FAO, 2019b; Morais et al., 2019). In Monte Carlo simulation methods, parameter values of the model and input data (e.g. mean temperature, clay content, carbon inputs) shall be randomly chosen from hypothetical normal distributions with mean equal to the parameter value and the measured standard error around that mean. Once all the different parameter values for the model are generated from the hypothetical distributions, a model run shall be made. This process is to be repeated 100 or more times to produce a mean model prediction with a 95 percent confidence interval. The Monte Carlo simulation would generate an expected value of SOC stocks for the different scenarios and a 95 percent confidence interval.
Uncertainty (U) shall be expressed as a percentage: half of the 95% confidence interval divided by the mean (Ogle et al., 2010). Thus, uncertainty can be estimated for each simulated scenario as:
\begin{equation}
\tag{12.1}
U \% = \frac{100 \times (ULCI-LLC)}{2 \times SOC_{av}}
\end{equation}
where $UL$ corresponds to the upper limit of the 95% confidence interval of the estimated SOC at the end of the simulation (in t C ha^-1^), LL corresponds to the lower limit of the 95% confidence interval of the estimated SOC at the end of the simulation (in t C ha^-1^); and $SOC_{av}$ the average of the estimated SOC at the end of the simulation (t C ha^-1^), after 20 years of the forward modelling, for each scenario.
To estimate uncertainties of the sequestration rates (uncertain quantities are combined by subtraction, e.g. $\Delta$ SOC = Stocks SSM - SOC stocks BAU), the uncertainty can be expressed in percentage terms was estimated by the following equation (IPCC, 2019):
\begin{equation}
\tag{12.2}
Ut = \sqrt((U1X1)^2 + ... + (UnXn)^2) \\
|X1+ ... +Xn|
\end{equation}
where $Ut$ is the percentage uncertainty in the subtraction of the quantities (half the 95 percent confidence interval divided by the total, i.e. mean, and expressed as a percentage), $x1$ - $n$ represent the quantities to be combined (e.g. Stocks SSM and SOC stocks BAU at the end of the forward simulation), and $U1$ to $n$ is the percentage uncertainties associated with each of the quantities (as estimated from equation 12.1).
However, Monte Carlo and related simulations (e.g. Markov Chain-Monte Carlo method, as in Hararuk et al., 2014; GLUE method, as in Salazar et al., 2011) usually require considerable computational capacity and may be time demanding, especially for long spin-up runs (>500 years), as multiple (>100) runs should be made for each modeling unit.
An alternative is to calculate uncertainties of the input data considering minimum and maximum values (corresponding to the limits of a 95% confidence interval) of a set of predefined input parameters, considered to have the greatest influence in RothC modeling results (initial SOC, Carbon inputs, and soil and climatic variables). Thus, uncertainties can be estimated for each modeling unit and for each scenario by estimating first the minimum and maximum SOC simulated values (similarly to VCS, 2012) using a predefined arrangement of inputs:
\begin{equation}
\tag{12.3}
SOC max = Model(SOC_{FAO \ max}, Ci_{max}, Temp_{min}, Pp_{max}, Clay_{max})
\end{equation}
\begin{equation}
\tag{12.4}
SOC min = Model(SOC_{FAO \ min}, Ci_{min}, Temp_{max}, Pp_{min}, Clay_{min})
\end{equation}
where $SOC_{FAO \ min}$ and $SOC_{FAO \ max}$ are respectively the minimum and maximum value for the simulated SOC stocks; $SOC_{FAO \ min}$ and $SOC_{FAO \ max}$ are the minimum and maximum value for the initial SOC GSCOCmap stocks estimated at the 95% confidence interval are the $Ci_{min}$ and $Ci_{max}$ are respectively the minimum and the maximum value for the annual carbon inputs estimated at the 95% confidence interval; $Temp_{min}$ and $Temp_{max}$ are respectively the minimum and maximum value for the average monthly air temperature estimated at the 95% confidence interval; $Pp_{min}$ and $Pp_{max}$ are respectively the minimum and maximum value for the average monthly precipitation estimated at the 95% confidence interval; and Clay min and Clay max are respectively the minimum and maximum value for the soil clay content (0-30 cm) estimated at the 95% confidence interval. The arrangement of variables to generate minimum and maximum SOC stocks are to be generated considering the effects of each variable on NPP, decomposition rates, and overall carbon dynamics (Chapters 4 and 5).
If information is available, the minimum and maximum value of each parameter (C input, Temp, Pp and Clay) that define de 95% confidence interval can be estimated from its variation and mean value, assuming that values of the parameter are normally distributed about the mean:
\begin{equation}
\tag{12.5}
P_{min} = Xp - 1.96 \times SEp
\end{equation}
\begin{equation}
\tag{12.6}
P_{max} = Xp - 1.96 \times SEp
\end{equation}
where $P_{min}$ and $P_{max}$ are respectively the minimum and maximum value for parameter P (C input, Temp, Pp or Clay) estimated at the 95% confidence interval; $Xp$ is the average value of that parameter; and $SEp$ is the standard error of the mean of that parameter.
Uncertainties already generated in the latest GSOCmap can be used to obtain the min and max SOC FAO values. Uncertainties in C inputs and thus Ci max and min can be estimated from available data (e.g. meta-analysis). Temp max and Temp and PP max and PPmin can be estimated from the average monthly values and confidence intervals of the climatic series to be modeled. Uncertainties in clay contents can be directly obtained from SOIL GRIDS (https://soilgrids.org/) if the ISRIC database is to be used for the clay content layers. If no estimate of clay variation is available for the used database, Clay max and clay min can be determined from clay content variation within the 1km x 1km grid cells (i.e. considering the values from 250 m x 250m resolution grids).
If no estimate of the SE or CI is available for these parameters, a maximum and minimum value can be estimated for these parameters, using general uncertainty coefficients, as those reported from global modelling exercises by Gottschalk et al. (2007) and Hastings et al. (2010). Average uncertainties for these parameters are summarized in Table 12.1.
Input data layers (SOC FAO, temperature, precipitation, clay content, and C inputs) need to be re-prepared for the different modelling phases considering the maximum and minimum values for each data input outlined in equations 12.3 and 12.4.
General uncertainties in SOC sequestration can be then estimated for all scenarios. The model is to be run 2 two more times for each modelling unit and scenario in the different modelling phases: once using the selection of values to obtain a maximum expected SOC (eq 12.2), and once using the selection of input values to obtain a minimum expected SOC change (eq 12.3). SOC stocks are then modelled for each modelling unit. Uncertainties can be then expressed in % as in equation 12.1 for each scenario (considering the average SOC values as the ones obtained in the first modelling run for each scenario):
\begin{equation}
\tag{12.7}
U \%=\frac{100 \times (SOC_{max} - SOC_{min})}{2 \times SOC}
\end{equation}
where $SOC_{max}$ corresponds to the upper limit of simulated the SOC stocks (in tC ha^-1^) at the end of the forward modelling phase using the combination of inputs in equation 12.3, $SOC_{min}$ corresponds to the lower limit of the simulated SOC stocks ( in tC ha^-1^) at the end of the forward modelling phase using the combination of inputs in equation 12.4; and SOC the average simulated SOC stocks (tC ha^-1^), for each scenario, after 20 years of the forward modeling phase. Uncertainties for the absolute and relative sequestration rates can be then estimated using equation 12.2, for each scenario.
**Table 12.1** *General uncertainties of main parameters affecting SOC dynamics. Derived from Gottschalk et al. (2007) and Hastings et al. (2010).*
```{r,echo=FALSE}
library(kableExtra)
library(dplyr)
dt <- read.csv("tables/Table_12.1.csv", sep = ";")
kable(dt, col.names = gsub("[.]", " ", names(dt))) %>%
kable_styling("striped", full_width = F)
```
The model should be validated for the conditions in which it will be applied when possible. The use of models for prediction involves a series of problems for validation, as data required to quantify the accuracy of the estimates do not yet exist. Nonetheless, predictive models can be validated if they explain past events (ex-post validation). If local results from different SSM practices on SOC stocks are available (a meta-analysis of local SSM practices can be conducted), and the collected activity data allow to perform simulations with these records, model-produced estimates shall be compared with the observed results. The RMSE shall be used to compare the divergence between model estimates and field observations. The RMSE can be expressed as:
\begin{equation}
\tag{12.8}
RMSE = \frac{\sqrt(\sum_{i=1}^n (0_i - P_i)^2)}{n}
\end{equation}
where $Pi$ is the predicted (modelled) value, $Oi$ is the observed value, $n$ is the number of measured
Relative $RMSE$ can be expressed as a % of the observed mean. These results should be specified in the report accompanying the mapping product.