-
Notifications
You must be signed in to change notification settings - Fork 0
/
01-ch2.Rmd
462 lines (333 loc) · 20.5 KB
/
01-ch2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
# (PART) Geostatistical Data {-}
```{r global_options, include=FALSE}
knitr::opts_chunk$set(warning=FALSE, message=FALSE, echo=FALSE)
```
```{r, echo=FALSE}
library(ggplot2)
library(plotly)
library(gstat)
library(latex2exp)
library(tidyverse)
```
# Geostatistics
## Targeted Problems
Spatial data is considered as a random process $\{Z(s),s\in D\}$ in this part.
- $D$: a is a fixed subset of $\mathbb{R}^d$ with positive d-dimensional volume.
- $s$: locations varies _continuously_ throughout the region $D$.
Set `coalash` dataset as an example (Figure \@ref(fig:coalash-scatter)). $D$ is the region with values. and $s$ indicates percent of coalash in this location. Many kinds of exploratory statistics can be applied here to test stationarity, local stationarity and so on.
```{r, coalash-scatter, fig.cap = "Scatter Plot of CoalAsh Percent", echo = FALSE}
data("coalash")
plot_ly(x = coalash$x, y = coalash$y, z = coalash$coalash,
type='scatter3d', mode='markers', color = coalash$coalash, alpha = 0.7)
```
The key idea in this chapter is to model the above random process $\{Z(s),s\in D\}$ with values on known locations. Then inference of unknown locations can be made.
## Spatial Data Example
### Main Concepts
Variogram is the crucial parameter of geostatistics.
> **Intrinsic Stationarity** \
Based on the random process mentioned above, intrinsic stationarity is defined through first differences
\begin{equation}
\begin{aligned}
&E\{Z(s+h) - Z(s)\} = 0,\\
&Var\{Z(s+h) - Z(s)\} = 2\gamma(h).
\end{aligned}
(\#eq:stationary2)
\end{equation}
> **Variogram** \
Variogram is $2\gamma(h)$ in the definition of intrinsic stationarity.
### Esitimation of Variogram
#### Estimator
The classical estimator of the variogram: \
\begin{equation}
2 \hat{\gamma}(h) \equiv \frac{1}{|N(h)|} \sum_{N(h)}\left(Z\left(s_{i}\right)-Z\left(s_{j}\right)\right)^{2}
(\#eq:v-estimation-classic)
\end{equation}
Estimator \@ref(eq:v-estimation-classic) is unbiased. However, it is badly affected by atypical observations due to square operation.
For an approximately symmetric distribution, there is a more robust approach to the estimation of the variogram by transforming the problem to location estimation. For a Gaussian process, we have:
\begin{equation}
2 \bar{\gamma}(\mathrm{h}) \equiv\left\{\frac{1}{|N(\mathrm{~h})|} \sum_{N(\mathrm{~h})}\left|Z\left(\mathrm{~s}_{i}\right)-Z\left(\mathrm{~s}_{j}\right)\right|^{1 / 2}\right\}^{4} /\left(0.457+\frac{0.494}{|N(\mathrm{~h})|}\right).
(\#eq:v-estimation-robust)
\end{equation}
Estimator \@ref(eq:v-estimation-robust) is also unbiased.
#### Visualization
**Variogram Cloud**: \
Cousin to estimator \@ref(eq:v-estimation-classic), variogram cloud is a useful diagnostic tool. Similarly, it can be badly affected by atypical observations. Variogram cloud in east-west direction of coalash data is shown in Figure \@ref(fig:v-cloud)
Here are steps to draw variogram cloud. \
- Group points with $y$ value;
- Calculate the classical estimator of the variogram $\frac{1}{|N(h)|}\cdot \displaystyle\sum_{N(h)}{(z(s_i) - z(s_j))^2}$. \
- $N(h) = \{(s_i, s_j):\ s_i + he = s_j\}$
- $e$: the unit vector of east direction (positive x axis)
- Notice that y values of each pair $(s_i, s_j)$ are the same. But they may vary among pairs.
- Plot the corresponding boxplot. \
- x axis: distance $h$
- y axis: the classical estimator of the variogram with parameter h, Units are $(\% coal ash)^2$
```{r, v-cloud, fig.cap="Variogram Cloud in the East-West Direction", echo = FALSE}
get_df <- function(coalash, fun){
df <- data.frame()
for(j in min(coalash$y):max(coalash$y)){
data <- coalash[which(coalash$y == j),]
x_len <- length(data$x)
mat1 <- matrix(rep(data$x, times = x_len), ncol = x_len)
mat_h <- mat1 - t(mat1)
mat2 <- matrix(rep(data$coalash, times = x_len), ncol = x_len)
mat_z <- mat2 - t(mat2)
df_current <- data.frame(h = as.numeric(mat_h),
z = fun(as.numeric(mat_z))) %>% filter(h > 0)
df <- rbind(df, df_current)
}
return(df)
}
df <- get_df(coalash, function(x){return(x**2)})
p <- ggplot(df, aes(x = as.factor(h), y = z)) +
geom_boxplot(aes(fill = h, color = h, alpha = 0.7), outlier.alpha = 0.7) +
theme_light() +
# labs(x = "h", y = TeX("$(Z(h+s)-Z(s))^2$"),
labs(x = "h", y = ("Variogram"),
title = "Variogram Cloud in the East-West Direction") +
theme(plot.title = element_text(hjust = 0.5)) +
guides(fill = "none", color = "none")
ggplotly(p)
```
**Square-Root-Differences Cloud**: \
The only different from variogram cloud: calculate the classical estimator of the variogram $\frac{1}{|N(h)|}\cdot \displaystyle\sum_{N(h)}{|Z(s_i) - Z(s_j)|^\frac12}$.
```{r, squre-cloud, fig.cap="Square-Root-Differences Cloud in the East-West Direction"}
df <- get_df(coalash, function(x){return(sqrt(abs(x)))})
p <- ggplot(df, aes(x = as.factor(h), y = z)) +
geom_boxplot(aes(fill = h, color = h, alpha = 0.7), outlier.alpha = 0.7) +
theme_light() +
labs(x = "h", y = ("Square-Root-Differences"),
title = "Square-Root-Differences Cloud in the East-West Direction") +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(p)
```
**Pocket Plot**
The Pocket Plot tries to identify a localized area as being atypical with respect to a stationary mode. We plotting Pocket Plot \@ref(fig:pocket) in the North-South Direction follows the steps as below:
- Get $\{|N_{jk}|\}$ matrix (symmetric).
- Denote $|N_{jk}|$ as the value of row $j$ and column $k$ of the matrix
- $N_{jk} = \{(s_j, s_k):\ s_j \in row_j, s_k \in row_k, x_{s_j} = x_{s_k}\}$
- $|N_{jk}|$ means total pair number between row $j$ and $k$
- Get $\{Y_{jk}\}$ matrix (symmetric).
- For there are 23 distinct value in y axis, the matrix has the dim of $23\times 23$
- Denote $Y_{jk}$ as the value of row $j$ and column $k$ of the matrix, we have $Y_{jk} = \frac{1}{N_{jk}}\displaystyle\sum_{N_{jk}}|z(s_j) - z(s_k)|^{\frac12}$
- Calculate $Y_h$ vector.
- $Y_h$ is the weighted mean of $Y_{jk}$ where $|j - k| = h$
- Calculate $P_{jk} = Y_{jk} - Y_h$
- Plot the boxplot.
- x axis: index $j$
- y axis: $P_{jk}$.
- There should be 23 points of each columns.
```{r}
get_Y_jk <- function(coalash, j, k){
df_j <- coalash %>% filter(y == j) %>% select(x, coalash)
df_k <- coalash %>% filter(y == k) %>% select(x, coalash)
df <- merge(df_j, df_k, by = "x") %>%
mutate(dif = sqrt(abs(coalash.x - coalash.y)))
res <- mean(df$dif)
num <- nrow(df)
return(list(res, num))
}
```
```{r}
is_outlier <- function(x) {
return(x < quantile(x, 0.25, na.rm = TRUE) - 1.5 * IQR(x, na.rm = TRUE) | x > quantile(x, 0.75, na.rm = TRUE) + 1.5 * IQR(x, na.rm = TRUE))
}
```
```{r}
# get Y_jk, N_jk matrix
row_len <- length(unique(coalash$y))
Y_jk <- matrix(nrow = row_len, ncol = row_len)
N_jk <- matrix(nrow = row_len, ncol = row_len)
for(i in 1:row_len){
for(j in 1:row_len){
jk_info <- get_Y_jk(coalash, i, j) # func `get_Y_jk` def hidden
Y_jk[i,j] <- jk_info[[1]]
N_jk[i,j] <- jk_info[[2]]
}
}
colnames(Y_jk) <- 1:23
colnames(N_jk) <- 1:23
```
```{r}
# colnames(df_N_jk): "j", "k", "N_jk"
df_N_jk <- as.data.frame(as.table(N_jk)) %>%
mutate_if(is.factor, as.integer) %>%
rename(j = 1, k = 2, N_jk = 3)
# colnames(df_Y_jk): "j", "k", "Y_jk", "h", "N_jk"
df_Y_jk <- as.data.frame(as.table(Y_jk)) %>%
mutate_if(is.factor, as.integer)%>%
transform(h = abs(Var1 - Var2)) %>%
rename(j = 1, k = 2, Y_jk = 3) %>%
left_join(df_N_jk, by = c('j', 'k')) %>%
filter(h > 0)
# colnames(df_Y_h): "h", "N_h", "Y_h"
df_Y_h <- df_Y_jk %>% group_by(h) %>%
summarise(N_h = n(), Y_h = sum(Y_jk)/max(N_h))
# colnames(df_P_jk): "h", "j", "k", "Y_jk", "N_jk", "N_h", "Y_h", "P_jk", "outlier"
df_P_jk <- merge(df_Y_jk, df_Y_h, by = "h") %>%
transform(P_jk = Y_jk - Y_h) %>%
group_by(j) %>%
mutate(outlier = ifelse(is_outlier(P_jk), k, as.numeric(NA)))
```
```{r, pocket, fig.cap="Pocket Plot in the North-South Direction"}
# Pocket plot
p <- ggplot(df_P_jk, aes(x = as.factor(j), y = P_jk)) +
geom_boxplot(outlier.colour = 'red') +
geom_text(aes(label = outlier), check_overlap = TRUE, nudge_x = 0.5) +
theme_light() +
labs(x = "row number", y = TeX("$P_{j\ k}$"),
# labs(x = "row number", y = ("P_jk"),
title = "Pocket Plot in the North-South Direction") +
theme(plot.title = element_text(hjust = 0.5))
p
```
### Decomposition
The goal is to deompose data into large- and small-scale variation.
> **Large-Scale Variation**: The grand, column, and row effects.
> **Small-Scale Variation**: The residual.
- Like ANOVA, $F$ test within-rows and within-columns is first raised. However, it suffer from correlation among data points.
- Another additive decomposition follows the idea of $data = all + row + column + residual$. Then for value of a location $(i,j)$ can be decomposed as $Y_{ij} = Y_{..} + (Y_{i.}-Y_{..}) + (Y_{.j}-Y_{..}) + (Y_{ij}-Y_{i.}-Y_{.j}+Y_{..})$, where a subscripted dot (·) denotes averaging over that subscript. This can be settled by _median polishing_ method.
**Median Polishing** \
Successively sweeps medians out of rows, then columns, then rows, then columns, and so on, accumulating them in "row," "column," and "all" registers, and leaves behind the table of residuals.
## Stationary Process
The concept of _intrinsic stationarity_ has been introduced in section \@ref(main-concepts). Second-order (weak) stationarity is addressed in this section. It should be noticed that intrinsic stationarity is weaker than second-order stationarity.
> **Intrinsic Stationarity** \
Based on the random process mentioned above, intrinsic stationarity is defined through first differences
\begin{equation}
\begin{aligned}
&E\{Z(s+h) - Z(s)\} = 0,\\
&Var\{Z(s+h) - Z(s)\} = 2\gamma(h).
\end{aligned}
(\#eq:stationary2)
\end{equation}
> **Second-Order (Weak) Stationarity** \
When the random process meets the following assuptions, the process is second-order (weak) stationary.
\begin{equation}
\begin{aligned}
&E(Z(s)) = 0, \ \forall s \in D \\
&Cov(Z(s_i), Z(s_j)) = C(s_i - s_j), \ \forall s_i, s_j \in D
\end{aligned}
(\#eq:stationary)
\end{equation}
Other important concepts related to stationary process as listed below. \
- **Covariogram**: Function $C(.)$ in \@ref(eq:stationary).
- **Isotropy**: When $C(s_i - s_j)$ only depends on $\Vert s_i - s_j\Vert$ but not the direction of $s_i - s_j$, then $C(.)$ is isotropic.
- **Anisotropy**: Contrary to isotropy, when $C(s_i - s_j)$ depends on both $\Vert s_i - s_j\Vert$ and the direction of $s_i - s_j$, then $C(.)$ is anisotropic.
- Caused by underlying physical process evolving differentlly in space.
- The Euclidean space is not appropriate for measuring distance between locations.
- **Ergodicity**: The property which allows expectations over the $\Omega \in \mathbb{R}^d}$ space to be estimated by spatial averages.
- **Strong Stationarity**: Strong stationarity refers to the property that the distribution of any finite sample of $Z(.)$ is independent of locations (translation invariant).
### Variogram & Covariogram {#vc}
Recall section \@ref(main-concepts), variogram is defined as $2\gamma(h) = Var(Z(s+h) - z(s)),\ \forall s, s+h\in D$. Some exploratory analysis of variogram has been seen in the example. Some related concepts are listed below.
- **Semiv-variogram**: $\gamma(h) = \frac12 Var(Z(s+h) - z(s))$.
- **Relative Variogram**: Variogram of transformed data. Sometimes, simple transformations lead to intrinsic stationary processes.
- **Nugget Effect**: $c_0$ is named as nugget effect, where $c_0 = \lim_{h\to 0}\gamma(h)$.
- When $h\to 0$, $Var(Z(s+h) - Z(s))\to Var(0)$, which should be exactly 0. This implies that nuggest effect should be 0 if assumptions are all met.
- Causes of $c_0 > 0$:
- Measurement error;
- Microscale variation.
- Estimation: Extrapolating the variogram estimates fro lags closest to zero. for practically, it is impossible to calculate variogram with scale less than $\min \left\{\left\|s_{i}-s_{j}\right\|, 1 \leq i \leq j \leq n\right\}$.
- **Still**: The quantity $C(0)$ is called still. $C(0) - c_0$ is called the partial still.
Here are some other properties related to variogram and covariogram.
- Variogram is symmetric: $\gamma(h) = \gamma(-h),\ \gamma(0) = 0$.
- Speed of increase: $\lim _{|h| \rightarrow \infty} \frac{2 \gamma(h)}{|h|^{2}}=0$, which means that it is necessary for a variogram to increase slower than $|h|^2$.
- Under stationarity, the covariance function $C(h)$ is necessarily positive definite.
- Under intrinsic stationarity, variograms must be conditionally negative-definite.
### Estimation of Variogram & Covariogram
#### Estimator of Variogram
Considering the definition of variogram, the most direct solution is moment estimator $$2 \hat{\gamma}(h) \equiv \frac{1}{|N(h)|} \sum_{N(h)}\left(Z\left(s_{i}\right)-Z\left(s_{j}\right)\right)^{2}$$ where $N(h) = \{(s_i, s_j): s_i - s_j = h\}$
Generally speaking, the variogram estimator can be expressed as a quadratic form \@ref(eq:qestimator)
\begin{equation}
2\hat \gamma (h) = Z'A(h)Z
(\#eq:qestimator)
\end{equation}
Denote $Var(Z)=\Sigma$, then the means and variance of $2\gamma(h)$ can be calculated as:
\begin{equation}
\begin{aligned} E(2 \hat{\gamma}(h)) &=\operatorname{trace}(A(h) \Sigma) \\ \operatorname{var}(2 \hat{\gamma}(h)) &=2 \operatorname{trace}(A(h) \Sigma A(h) \Sigma) \end{aligned}
(\#eq:matrixestimator)
\end{equation}
#### Estimator of Covariogram
For Covariogram, estimators are in line with basic ideas of variogram estimators. In the same way, the most direct thought of moment estimator is as Formula \@ref(eq:mcov-estimator)
\begin{equation}
\hat{C}(h) \equiv \frac{1}{|N(h)|} \displaystyle\sum_{N(b)}\left(Z\left(s_{i}\right)-\bar{Z}\right)\left(Z\left(s_{j}\right)-\bar{Z}\right)
\end{equation}
where $\bar{Z}=\sum_{i=1}^{n} Z\left(s_{i}\right) / n$.
### Validity
Not all functions can be variograms or covariogram. There are some prerequisites ensuring validity of them.
#### Valid Variogram
- Under intrinsic stationarity, variogram must be conditionally negative definite, which means \@ref(eq:valid-variogram-cnd)
\begin{equation}
\sum_{i, j} a_{i} a_{j} 2 \gamma\left(s_{i}-s_{j}\right) \leq 0$ for $\sum a_{i}=0
(\#eq:valid-variogram-cnd)
\end{equation}
- Continuous conditionally negative definite function corresponds to an intrinsically stationary stochastic process.
- Valid variogram forms a convex cone.
- Theorem \@ref(thm:valid-variogram) provides us with several ways to check variogram validity.
::: {.theorem #valid-variogram}
If $2 \gamma(.)$ is continuous on $\mathbb{R}^{d}$ and $\gamma(0)=0$, then the following are equivalent:
1. $2 \gamma(.)$ is conditionally negative definite; \
2. For all $a>0$, $\exp (-a \gamma(.))$ is positive definite. \
3. $2 \gamma(h)=Q(h)+\int \frac{1-\cos \left(\omega^{\prime} h\right)}{\|\omega\|^{2}} G(d \omega)$, where $Q(.) \geq 0$ is a quadratic form and $G(.)$ is a positive, symmetric measure continuous at the origin that $\int\left(1+\|\omega\|^{2}\right)^{-1} G(d \omega)<\infty$.
:::
#### Valid Covariogram
- Under second-order stationarity, the covariance function $C(h)$ is necessarily positive definite. That is, for any real numbers $a_1,...,a_n$ and locations $s_1,...,s_n$,
\beegin{equation}
\operatorname{Var}\left(\sum_{i=1}^{n} a_{i} Z\left(s_{i}\right)\right)=\sum_{i, j} a_{i} a_{j} C\left(s_{i}-s_{j}\right) \geq 0
(\#eq:valid-covariogram)
\end{equation}
- Spectral representation form
- Accoding to Theorem \@ref(thm:valid-covariogram), a continuous function $C(h)$ is positive-definite if and only if it has a spectral representation $C(h)=\int \cos \left(\omega^{\prime} h\right) G(d \omega)$, where $G$ is a positive bounded symmetric measure.
- Spectral density and distribution
- $G/C(0)$ is the spectral distribution function
- If $G(d \omega)=g(\omega) d \omega$, $g/C(0)$ is the spectral density function.
::: {.theorem #valid-covariogram}
Theorem For any normalized continuous positive-definite function $f$ on $G$ (normalization here means that $f$ is 1 at the unit of $G$ ), there exists a unique probability measure $\mu$ on $\widehat{G}$ such that
$$
f(g)=\int_{\widehat{G}} \xi(g) d \mu(\xi)
$$
:::
## Variogram Model Fitting
Considering variogram validity prerequisites, variogram estimators presented in section \@ref(estimator-of-variogram) can not be used directly for spatial prediction. The key idea of variogram model fitting is to search for a valid variogram that is closest to the data.
When there are more than one valid model, probably from method illustrated from section \@ref(maximum-likelihood-estimator) to \@ref(least-squares-esimator), **corss validation** is an important tool for model checking.
### EDA Check Before Model Fitting
- Check assumptions such as constant mean, constant marginal variance, stationarity, intrinsic stationarity;
- Check the Gaussian assumption;
- If the data is approximately normally distributed, modelling the first two moments are quite enough.
- If some evidence of non-Gaussianity is found, the variogram estimator defined in section \@ref(estimator-of-variogram) is still valid, because they can be viewed as moment estimators which are independent of Gaussian assumption.
- Isolate suspicious observations for further study;
- Choose a appropriate parametric variogram model.
From now on, we consider
### Maximum Likelihood Estimator
- Assumptions:
- Suppose the data is generated from a stationary Gaussian process with a parametric variogram model $P=\{2 \gamma: 2 \gamma(h)=2 \gamma(h ; \theta) ; \theta \in \Theta\}$. Then the goal is to estimate $\theta$.
- Suppose we also observed some covariates $X_1,...,X_n$ which are linearly related to $Z$.
- Model: $Z=\left(Z\left(s_{1}\right), \ldots, Z\left(s_{n}\right)\right) \sim N(X \beta, \Sigma(\theta))$ where $\Sigma(\theta)=\left(\operatorname{cov}\left(Z\left(s_{i}\right), Z\left(s_{j}\right)\right)\right)$.
- Goal: find parameter estimation $(\hat \beta, \hat \theta)$ to minimize loglikelihood function \@ref(eq:loglikelihood).
- Where there is no spatial correlation, this amounts to a simple linear regression model. The solution is $\Sigma(\theta)=\sigma^{2} I$ and $\theta=\sigma^{2}.$
- General solution: $\hat{\sigma}^{2}=\sum\left(Z\left(s_{i}\right)-X_{i} \hat{\beta}\right)^{2} / n$
- $\hat{\sigma}^{2}$ is biased.
- For $E\left(\hat{\sigma}^{2}\right)=(n-q) / n \sigma^{2}$, the larger $q$ is, the more biased the estimator gets.
\begin{equation}
L(\beta, \theta)=\frac{n}{2} \log (2 \pi)+\frac{1}{2} \log |\Sigma(\theta)|+\frac{1}{2}(Z-X \beta)^{\prime} \Sigma(\theta)^{-1}(Z-X \beta)
(\#eq:loglikelihood)
\end{equation}
- Improvements: To de-bias $\hat \theta$, which is $\hat{\sigma}^{2}$ is out case, restricted maximum likelihood estimator (EMLE) is raised.
- Estimator form: $\hat{\theta}=\int_{\beta} \exp (-L(\beta, \theta))(d \beta)$
Once variogram is estimated, then covariogram can be easily get by $2\gamma(h) = 2(C(0) - C(h))$
### MINQ Estimation
Applicable Situation: the variance matrix of the data is linear in its parameters, i.e. $\Sigma(\theta)=\theta_{1} \Sigma_{1}+\ldots, \theta_{m} \Sigma_{m}$. The MINQ approach is particularly appropriate for a variance components model.
#### Least Squares Esimator
The main idea of Least Squares Esimator (LSE) is the same as least square regression. The goal function to minimize is the square error.
\begin{equation}
\sum_{j=1}^{K}\left\{2 \gamma^{\#}(h(j) e)-2 \gamma(h(j) e ; \theta)\right\}^{2}
(\#eq:lse)
\end{equation}
If we not $2 \boldsymbol{\gamma}^{\#}=\left(2 \gamma^{\#}(h(1)), \ldots, 2 \gamma^{\#}(h(K))\right)$ with covariance matrix $\operatorname{var}\left(2 \boldsymbol{\gamma}^{\#}\right)=V$, the goal to minimize is tranformed into
\begin{equation}
\left(2 \boldsymbol{\gamma}^{\#}-2 \boldsymbol{\gamma}(\theta)\right)^{\prime} V^{-1}\left(2 \boldsymbol{\gamma}^{\#}-2 \boldsymbol{\gamma}(\theta)\right)
(\#eq:lse2)
\end{equation}
- Notice that although a direction vector $e$ is included in formula \@ref(eq:lse), multiple directions could also be accounted for by adding the appropriate squared differences.
- Drawbacks: The least squares estimator takes no cognizance of the distributional variation and covariation of the generic estimator.
- Improvement:
- Weighted Least Squares (WLS): Replace $V$ by a diagonal matrix $\left.\Delta=\operatorname{diag}\left(\operatorname{var}\left(2 \gamma^{\#}(h(1))\right), \ldots, 2 \gamma^{\#}(h(K))\right)\right)$.
- Generalized Least Squares (GLS): use just the (asymptotic) second-order structure of the variogram estimator and does not make assumptions about the whole distribution of the data. When there is little knowledge of the true model, GLS is a good startingg choice.