# Section 19. Notes on 'Ch 21. Gaussian process models'
2022-01-13
```{r setup, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE, dpi = 300, comment = "#>")
library(glue)
library(ggtext)
library(tidyverse)
theme_set(
  theme_bw() +
    theme(axis.ticks = element_blank(), strip.background = element_blank())
)
```
> These are just notes on a single chapter of *BDA3* that were not part of the course.
## Chapter 21. Gaussian process models
- *Gaussian process* (GP): "flexible class of models for which any finite-dimensional marginal distribution is Gaussian" (pg. 501)
- "can be viewed as a potentially infinite-dimensional generalization of Gaussian distribution" (pg. 501)
### 21.1 Gaussian process regression {-}
- realizations from a GP correspond to random functions
- good prior for an unknown regression function $\mu(x)$: $\mu \sim \text{GP}(m,k)$
  - $m$: mean function
  - $k$: covariance function
- $\mu$ is a random function (a "stochastic process") whose values at any $n$ points $x_1, \dots, x_n$ are drawn from an $n$-dimensional normal distribution with mean $m$ and covariance $K$:
$$
\mu(x_1), \dots, \mu(x_n) \sim \text{N}((m(x_1), \dots, m(x_n)), K(x_1, \dots, x_n))
$$
- the GP $\mu \sim \text{GP}(m,k)$ is nonparametric with infinitely many parameters
- the mean function $m$ represents an initial guess at the regression function
- the covariance function $k$ represents the covariance between the process at any two points
  - it controls the smoothness of realizations from the GP and the degree of shrinkage towards the mean
- below is an example of realizations from a GP with mean function 0 and the *squared exponential* (a.k.a. exponentiated quadratic, Gaussian) covariance function with different parameters
$$
k(x, x^\prime) = \tau^2 \exp \left( -\frac{|x - x^\prime|^2}{2 l^2} \right)
$$
```{r}
squared_exponential_cov <- function(x, tau, l) {
  # Build the covariance matrix k(x, x') = tau^2 exp(-|x - x'|^2 / (2 l^2)).
  n <- length(x)
  k <- matrix(0, nrow = n, ncol = n)
  denom <- 2 * (l^2)
  for (i in 1:n) {
    for (j in 1:n) {
      k[i, j] <- tau^2 * exp(-(abs(x[i] - x[j])^2) / denom)
    }
  }
  return(k)
}

my_gaussian_process <- function(x, tau, l, n = 3) {
  # Draw `n` realizations from a GP with mean 0 and squared exponential
  # covariance, evaluated at the points `x`.
  m <- rep(0, length(x))
  k <- squared_exponential_cov(x = x, tau = tau, l = l)
  gp_samples <- mvtnorm::rmvnorm(n = n, mean = m, sigma = k)
  return(gp_samples)
}

tidy_gp <- function(x, tau, l, n = 3) {
  # Reshape the samples matrix into a long tibble: one row per (sample, x).
  my_gaussian_process(x = x, tau = tau, l = l, n = n) %>%
    as.data.frame() %>%
    as_tibble() %>%
    set_names(x) %>%
    mutate(sample_idx = as.character(1:n())) %>%
    pivot_longer(-sample_idx, names_to = "x", values_to = "y") %>%
    mutate(x = as.numeric(x))
}

set.seed(0)
x <- seq(0, 10, by = 0.1)

# Sample three realizations for each of four (tau, l) combinations.
gp_samples <- tibble(tau = c(0.25, 0.5, 0.25, 0.5), l = c(0.5, 0.5, 2, 2)) %>%
  mutate(samples = purrr::map2(tau, l, ~ tidy_gp(x = x, tau = .x, l = .y, n = 3))) %>%
  unnest(samples)

gp_samples %>%
  mutate(grp = glue("\u03C4 = {tau}, \u2113 = {l}")) %>%
  ggplot(aes(x = x, y = y)) +
  facet_wrap(vars(grp), nrow = 2) +
  geom_line(aes(color = sample_idx)) +
  scale_x_continuous(expand = expansion(c(0, 0))) +
  scale_y_continuous(expand = expansion(c(0.02, 0.02))) +
  scale_color_brewer(type = "qual", palette = "Set1") +
  theme(legend.position = "none", axis.text.y = element_markdown()) +
  labs(x = "x", y = "\u03BC(x)")
```
#### Covariance functions {-}
- "Different covariance functions can be used to add structural prior assumptions like smoothness, nonstationarity, periodicity, and multi-scale or hierarchical structures." (pg. 502)
- sums and products of GPs are also GPs, so multiple covariance functions can be combined in the same model (see the sketch below)
- can also use "anisotropic" GP covariance functions for multiple predictors
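
As a quick illustration of combining covariance functions, the following sketch adds a periodic covariance function to the squared exponential one defined above. This is my own construction, not code from the book, and all parameter values are illustrative assumptions:

```{r}
# A periodic covariance function; since sums of covariance functions are
# themselves valid covariance functions, it can be added to the squared
# exponential one to get a smooth trend plus a periodic component.
periodic_cov <- function(x, tau, l, p) {
  n <- length(x)
  k <- matrix(0, nrow = n, ncol = n)
  for (i in 1:n) {
    for (j in 1:n) {
      k[i, j] <- tau^2 * exp(-2 * sin(pi * abs(x[i] - x[j]) / p)^2 / l^2)
    }
  }
  return(k)
}

# Small diagonal "jitter" added for numerical stability of the sampling step.
k_combo <- squared_exponential_cov(x, tau = 0.5, l = 2) +
  periodic_cov(x, tau = 0.25, l = 1, p = 2) +
  diag(1e-8, length(x))
combo_draws <- mvtnorm::rmvnorm(3, mean = rep(0, length(x)), sigma = k_combo)
```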
#### Inference {-}
- computing the mean and covariance of the $n$-variate normal conditional posterior for $\tilde{\mu}$ involves a matrix inversion that requires $O(n^3)$ computation (see the sketch after this list)
- this needs to be repeated for each MCMC step
- this limits the size of the data set and the number of covariates in the model
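
To make the inversion concrete, here is a minimal sketch of the conditional posterior computation, reusing `squared_exponential_cov()` from above; the toy data and observation-noise scale `sigma` are illustrative assumptions, not from the book:

```{r}
# Conditional posterior of the GP at new points given noisy observations:
#   mean = K(x_new, x) [K(x, x) + sigma^2 I]^-1 y
#   cov  = K(x_new, x_new) - K(x_new, x) [K(x, x) + sigma^2 I]^-1 K(x, x_new)
gp_conditional <- function(x_obs, y_obs, x_new, tau, l, sigma) {
  k_all <- squared_exponential_cov(c(x_obs, x_new), tau = tau, l = l)
  idx <- seq_along(x_obs)
  k_obs <- k_all[idx, idx] + diag(sigma^2, length(x_obs))
  k_cross <- k_all[-idx, idx, drop = FALSE]
  k_new <- k_all[-idx, -idx, drop = FALSE]
  k_inv <- solve(k_obs) # the O(n^3) step repeated at every MCMC iteration
  list(
    mean = as.numeric(k_cross %*% k_inv %*% y_obs),
    cov = k_new - k_cross %*% k_inv %*% t(k_cross)
  )
}

post <- gp_conditional(
  x_obs = c(1, 3, 5, 7, 9), y_obs = sin(c(1, 3, 5, 7, 9)),
  x_new = seq(0, 10, by = 0.5), tau = 0.5, l = 2, sigma = 0.1
)
```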
#### Covariance function approximations {-}
- there are approximations to the GP that can speed up computation (one example is sketched below)
- these generally work by reducing the matrix-inversion burden
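
One such family of approximations is a low-rank feature expansion, for example random Fourier features (Rahimi & Recht, 2007). This is my own example of the general idea, not necessarily one the chapter discusses, and `n_features` is an assumed tuning parameter:

```{r}
# Random Fourier features for the squared exponential covariance: the kernel
# matrix is approximated by phi %*% t(phi), so downstream computations work
# with an n x n_features matrix instead of inverting an n x n matrix.
rff <- function(x, tau, l, n_features = 100) {
  w <- rnorm(n_features, mean = 0, sd = 1 / l)  # spectral frequencies
  b <- runif(n_features, min = 0, max = 2 * pi) # random phase shifts
  tau * sqrt(2 / n_features) *
    cos(outer(x, w) + matrix(b, nrow = length(x), ncol = n_features, byrow = TRUE))
}

phi <- rff(x, tau = 0.5, l = 2)
k_approx <- phi %*% t(phi) # approximates squared_exponential_cov(x, 0.5, 2)
```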
### 21.3 Latent Gaussian process models {-}
- with non-Gaussian likelihoods, the GP prior is placed on a latent function $f$ that determines the likelihood $p(y|f,\phi)$ through a link function
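
For example, a minimal sketch (my own construction, not the book's code) of a count model that passes a latent GP draw through a log link:

```{r}
# Latent GP with a non-Gaussian (Poisson) likelihood: the latent function f is
# drawn from the GP prior defined above, and the counts are linked to f by
# exp(); all parameter settings here are illustrative.
f_latent <- as.numeric(my_gaussian_process(x = x, tau = 0.5, l = 2, n = 1))
y_counts <- rpois(length(x), lambda = exp(f_latent)) # p(y | f) via a log link
```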
### 21.4 Functional data analysis {-}
- *functional data analysis*: treats responses and predictors not as scalar- or vector-valued random variables but as random functions defined at (potentially) infinitely many points
- GPs fit this need well with little modification
### 21.5 Density estimation and regression {-}
- so far, the GP has been used as a prior for a function controlling the location or shape of a parametric observation model
- more flexibility can be gained by modeling the conditional observation model itself as a nonparametric GP
- one solution is the *logistic Gaussian process* (LGP); another is the Dirichlet process (covered in a later chapter)
#### Density estimation {-}
- the LGP generates a random surface from a GP and then transforms the surface to the space of probability densities (see the sketch after this list)
- in 1D, the surface is just a curve
- use the continuous logistic transformation to constrain the result to be non-negative and to integrate to 1
- there is an illustrative example in the book on page 513
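
A sketch of that transformation on a grid (my own construction; the grid and GP settings are illustrative assumptions):

```{r}
# Draw a curve from the GP prior, then apply the logistic transformation:
# exponentiate and normalize so the result is non-negative and integrates to
# ~1 over the grid. The grid spacing dx approximates the integral.
dx <- 0.1
f_curve <- as.numeric(my_gaussian_process(x = x, tau = 0.5, l = 2, n = 1))
p_draw <- exp(f_curve) / sum(exp(f_curve) * dx)
sum(p_draw * dx) # ~1 by construction
```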
#### Density regression {-}
- generalize the LGP to density regression by putting a prior on the collection of conditional densities (sketched below)
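
A rough sketch of the idea on a small two-dimensional grid; the separable covariance construction and all settings are my own illustrative assumptions, not the book's:

```{r}
# Draw a surface f(x, y) from a GP with a separable covariance over a 2D grid,
# then normalize exp(f(x, .)) over y at each x to obtain conditional densities
# p(y | x). The diagonal jitter is for numerical stability.
xg <- seq(0, 1, by = 0.1)
yg <- seq(0, 1, by = 0.1)
k_xy <- kronecker(
  squared_exponential_cov(xg, tau = 1, l = 0.3),
  squared_exponential_cov(yg, tau = 1, l = 0.3)
) + diag(1e-8, length(xg) * length(yg))
f_surf <- matrix(
  mvtnorm::rmvnorm(1, sigma = k_xy),
  nrow = length(yg) # rows index y, columns index x
)
dy <- 0.1
cond_dens <- apply(exp(f_surf), 2, function(col) col / sum(col * dy))
```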
#### Latent-variable regression {-}
- an alternative to the LGP
---
```{r}
sessionInfo()
```