---
title: "Jonas Chapter 2"
output:
html_document: default
html_notebook: default
---
date: 2017-08-08
moderator: Karl
# Chapter 2 summary
* **Does nature use Bayesian inference?**: Animals don't go fully Bayesian; instead they use heuristics [= solutions that aim to be good and cheap for immediate goals, rather than optimal]. Think of heuristics as models that focus on the main predictors while ignoring noisy ones. The benefit over a 'full' model is that you have to sense less, and it protects against overfitting [see Joao's work on the evolution of sensing in changing environments]. As long as we don't know anything about the system, i.e. which predictors carry the most information, going fully Bayesian is a great start.
* **Information entropy**: We still don't understand the details of entropy-maximising functions, but the broader idea is this: if *x* is a characteristic of a system [like the result of a coin flip] and *f(x)* is a function describing *x*, then *f(x)* takes the shape of an entropy-maximising function when it contains the least information about *x*. Conversely, when there is information about *x* [i.e. when the coin is not fair], *f(x)* is biased. In modelling, our main interest is to separate the biased part of *f(x)* from the random noise contained in its entropy-maximising part. We use the information in the biased part to estimate the model coefficients, and the remaining random noise to estimate the model variance (see the small entropy sketch after this list).
* **Intuitive likelihood definition**: The likelihood is the relative number of ways in which the different conjectures of the model can produce the data. It is simple counting: for each conjecture [= set of parameter values of the model], count the different data it can produce and eliminate the ways that are inconsistent with the observed data. Do this for all conjectures, then divide those counts by the total number of ways in which any conjecture can produce the data (see the counting sketch after this list).
* **Priors = regularization**: The prior in a Bayesian model is the relative number of ways in which each conjecture is plausible before the data are seen. For instance, you might know that the factory produces more black marbles than white, so conjectures with more black marbles are more plausible. Conceptually, this is similar to regularization in regression analysis, where you prefer models with small parameter values [and few parameters] because you want to reduce overfitting, and so you penalise complex models. Regularization is somewhat arbitrary in how the penalty is set up [linear, quadratic, or on/off], but it is rigorous when it optimises the penalty parameters, because overfitting can be measured with cross-validation. We are interested in whether Bayesian inference also rigorously tests priors by evaluating their tendency to overfit (see the penalty sketch after this list).
* **Bayesian updating in meta-analysis**: Imagine two studies on the effect of drinking beer on body height, one performed in Germany and one in Japan. Both use a Bayesian model and give you a posterior distribution for the effect of beer on height in their population. You want to study the same effect in the entire world population. Would you multiply the two posteriors and use the result as your prior?
Here is why not:
* The two studies have different designs, different researchers collecting the data, and possibly different models, so their posteriors are 'small world' distributions; they are wrong in the big world, and one is almost certainly better than the other.
* It might be valuable to keep them separate, to retain the information about the Asian and European populations.
* Instead, you might take the point estimates from the two studies, widen their variance to reflect how much you trust each study, and use that as the prior in your own study.
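As a tiny illustration of the entropy point above (a sketch, not the book's code; the `entropy` helper is just for illustration): Shannon entropy H = -sum(p * log(p)) is largest for a fair coin and shrinks as the coin becomes biased, i.e. as the distribution starts to carry information.
```{r}
# Shannon entropy of a discrete distribution (natural log)
entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
entropy(c(0.5, 0.5))  # fair coin: maximum entropy, log(2) ~ 0.69
entropy(c(0.9, 0.1))  # biased coin: lower entropy, i.e. some information about x
entropy(c(1.0, 0.0))  # certain outcome: zero entropy
```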
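And a minimal counting sketch of the likelihood-as-counting idea, mirroring the book's marble example (a bag of four marbles, each blue or white; draws with replacement come out blue, white, blue):
```{r}
# conjectures: how many of the four marbles are blue (0..4)
n_blue <- 0:4
# ways each conjecture can produce the sequence blue, white, blue
ways <- n_blue * (4 - n_blue) * n_blue
ways               # 0 3 8 9 0
ways / sum(ways)   # relative plausibility of each conjecture
```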
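Finally, a small sketch of the prior-as-penalty analogy (my own illustration, assuming a Gaussian prior centred at zero): the negative log of such a prior is a quadratic, ridge-style penalty on the parameter, up to an additive constant.
```{r}
# negative log of a Gaussian prior equals a quadratic penalty plus a constant
beta  <- seq(-3, 3, length.out = 7)
sigma <- 1
neg_log_prior     <- -dnorm(beta, mean = 0, sd = sigma, log = TRUE)
quadratic_penalty <- beta^2 / (2 * sigma^2) + 0.5 * log(2 * pi * sigma^2)
all.equal(neg_log_prior, quadratic_penalty)  # TRUE
```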
# Chapter 2 - Exercises
## 2.4.1: Grid method to calculate the posterior
```{r}
# define the grid
p_grid <- seq(from=0, to=1, length.out = 40)
#define prior
prior <- rep(1, 40)
#visualize before bayesian update
plot( p_grid, prior, type="b", xlab="probabilities of water", ylab="prior probability")
mtext("40 points")
# compute likelihood over grid
likelihood <- dbinom(6, size=9, prob=p_grid)
# compute product of likelihood and prior
unstd.posterior <- likelihood * prior
# standardize the posterior, so it sums to 1
posterior = unstd.posterior / sum(unstd.posterior)
#visualize
plot( p_grid, posterior, type="b", xlab="probabilities of water", ylab="posterior probability")
mtext("40 points")
```
### with different priors
#### step (truncated at 0.5)
```{r}
prior <- ifelse ( p_grid < 0.5, 0, 1)
plot( p_grid, prior, type="b", xlab="probabilities of water", ylab="prior probability")
unstd.posterior <- likelihood * prior
posterior = unstd.posterior / sum(unstd.posterior)
plot( p_grid, posterior, type="b", xlab="probabilities of water", ylab="posterior probability")
```
#### weird
```{r}
prior <- exp( -5*abs(p_grid - 0.5) )
plot( p_grid, prior, type="b", xlab="probabilities of water", ylab="prior probability")
unstd.posterior <- likelihood * prior
posterior = unstd.posterior / sum(unstd.posterior)
plot( p_grid, posterior, type="b", xlab="probabilities of water", ylab="posterior probability")
```
## 2.4.2 Quadratic approximation
`map` stands for _maximum a posteriori_ and is a fitting tool from the book's `rethinking` package that returns a quadratic (Gaussian) approximation of the posterior. To use `map`, you provide a _formula_, a list of _data_, and optionally a list of _start_ values for the parameters. The formula defines the likelihood and the prior.
```{r}
library(rethinking)
globe.qa <- map(
alist(
w ~ dbinom(9,p), # binomial likelihood
p ~ dunif(0,1) # uniform prior
),
data = list(w=6)
)
```
```{r eval=TRUE, echo=TRUE}
# To hide this chunk's code (but keep its output) in the rendered document, use: {r eval=TRUE, echo=FALSE}
print(precis(globe.qa))
```
```{r}
print("Read as: Assuming the posterior is Gaussian, it is maximzed at `Mean`, and its standard deviation is `StdDev`", 3)
```
### Compare with the exact posterior given by the beta distribution
With a flat prior (Beta(1,1)) and a binomial likelihood, the beta distribution is conjugate, so the exact posterior is `dbeta(x, w+1, n-w+1)`; the dashed curve is the quadratic (Gaussian) approximation from `precis` above.
```{r}
w <- 6
n <- 9
curve( dbeta(x , w+1, n-w+1), from=0, to=1)
curve( dnorm(x , 0.67, 0.16 ), lty=2, add=TRUE)
```
# Exercises
*2E1*
(2) Pr(rain|Monday)
*2E2*
(3) The probability that it is Monday, given that it is raining.
*2E3*
(1) Pr(Monday|rain)
*2E4*
That the most plausible proportions of land and water on Earth are 30% and 70%.
What does it mean to say that the probability of water in the globe experiment is 70%? In reality, we toss the globe and the outcome will be water or land with 100% certainty, given the physics of the toss. We don't understand the physics in all its detail, so instead we make a 'good' guess from a rough model of the mechanism; here the model uses only the information about how much water there is on the globe and ignores the parameters of the toss. To express that we are making a guess from this model, we use the term 'probability'.
*2M1*
Plot the posterior for a flat prior and several different observed data sets for the proportion of water.
```{r}
p_vals <- seq(from=0, to=1, length.out = 20)
prior <- rep(1, length(p_vals))
likelihood1 <- dbinom( x=3 , size=3 , prob=p_vals )
likelihood2 <- dbinom( x=3 , size=4 , prob=p_vals )
likelihood3 <- dbinom( x=5 , size=7 , prob=p_vals )
likelihood4 <- dbinom( x=700 , size=1000, prob=p_vals ) # [R] good idea
unstd.posterior1 <- prior * likelihood1
unstd.posterior2 <- prior * likelihood2
unstd.posterior3 <- prior * likelihood3
unstd.posterior4 <- prior * likelihood4
posterior1 <- unstd.posterior1 / sum(unstd.posterior1)
posterior2 <- unstd.posterior2 / sum(unstd.posterior2)
posterior3 <- unstd.posterior3 / sum(unstd.posterior3)
posterior4 <- unstd.posterior4 / sum(unstd.posterior4)
par(mfrow=c(1,4))
y2 = .2
plot(p_vals, posterior1, type='b', axes = TRUE, ylim=range(c(0,y2)))
plot(p_vals, posterior2, type='b', axes = TRUE, ylim=range(c(0,y2)))
plot(p_vals, posterior3, type='b', axes = TRUE, ylim=range(c(0,y2)))
plot(p_vals, posterior4, type='b')
```
*2M2*
now with step prior:
```{r}
step_prior = c(rep(0, 10), rep(10, 10))
# or step_prior <- ifelse( p_vals<0.5 , 0 , 1 )
step_prior = step_prior / sum(step_prior) # [R] not needed, right?
unstd.posterior1 <- step_prior * likelihood1
unstd.posterior2 <- step_prior * likelihood2
unstd.posterior3 <- step_prior * likelihood3
unstd.posterior4 <- step_prior * likelihood4
posterior1 <- unstd.posterior1 / sum(unstd.posterior1)
posterior2 <- unstd.posterior2 / sum(unstd.posterior2)
posterior3 <- unstd.posterior3 / sum(unstd.posterior3)
posterior4 <- unstd.posterior4 / sum(unstd.posterior4)
par(mfrow=c(1,4))
y2 = .2
plot(p_vals, posterior1, type='b', axes = TRUE, ylim=range(c(0,y2)))
plot(p_vals, posterior2, type='b', axes = TRUE, ylim=range(c(0,y2)))
plot(p_vals, posterior3, type='b', axes = TRUE, ylim=range(c(0,y2)))
plot(p_vals, posterior4, type='b')
```
*2M3*
Earth and Mars globes; L: land, E: Earth, M: Mars
p(L|E) = .3
p(L|M) = 1
p(E) = p(M) = .5
p(L) = 1/2 * (.3 + 1) = .65
and
p(E|L) = p(L|E) * p(E) / p(L) = .3 * .5 / .65 ≈ .23
```{r}
p_e_l = .3*.5/(0.5*1.3)
print(p_e_l)
## or, doing it analogously to the above
p_prior <- c(0.5 , 0.5) # [Mars, Earth]
p_land <- c(1 , 0.3) # p that it is land given [Mars, Earth]
likelihood <- dbinom( x=1 , size=1 , prob=p_land)
posterior_nstr <- p_prior * likelihood
posterior <- posterior_nstr / sum( posterior_nstr )
print( "[Mars , Earth]")
print( posterior )
```
*2M4*
p(B|c1) = 1 # [R]: 2
p(B|c2) = .5 # [R]: 1
p(B|c3) = 0
p(c1) = p(c2) = p(c3) = 1/3
p(B) = 1/3 * (1 + .5 + 0) = .5
p(c1|B) = p(B|c1) * p(c1) / p(B) = (1 * 1/3) / .5 = 2/3
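A counting version of this, in the style of the chunks for the later card exercises (a small sketch following the [R] annotations above, i.e. counting ways rather than using probabilities):
```{r}
likelihood <- c(2, 1, 0)   # ways each card [B/B, B/W, W/W] can show a black face up
prior <- c(1, 1, 1)        # each card equally likely to be drawn
unstd.posterior <- prior * likelihood
posterior <- unstd.posterior / sum(unstd.posterior)
posterior[1]               # P(first card is B/B | black face up) = 2/3
```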
*2M5*
now with two B/B cards (c1), hence:
p(c1) = 2/4, p(c2) = 1/4, p(c3) = 1/4
p(B) = 1/4 * (1 + 1 + .5 + 0) = 5/8
thus
p(c1|B) = (1 * 2/4) / (5/8) = 4/5
```{r}
card.1.likelihood <- 2
card.2.likelihood <- 1
card.3.likelihood <- 0
card.4.likelihood <- 2
likelihood <- c(card.1.likelihood, card.2.likelihood, card.3.likelihood, card.4.likelihood)
prior <- rep(x = 1, length = length(likelihood))
unstandardized.posterior <- prior * likelihood
posterior <- unstandardized.posterior / sum(unstandardized.posterior)
print(posterior)
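# cards 1 and 4 are both B/B, so P(black on the other side) = posterior[1] + posterior[4] = 4/5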
```
*2M6*
Now the heavy black ink makes black-sided cards less likely to be pulled:
p(c1) = 1/6, p(c2) = 2/6, p(c3) = 3/6
thus p(c1|B) = (1 * 1/6) / p(B)
What is p(B) in this case?
There are 12 weighted paths to the data:
c1 shows black whether its first or second side faces up, giving 2 paths;
c2 shows black one way, but is drawn twice as often, also giving 2 paths;
c3 gives none, however often it is drawn.
Thus 4 out of 12 paths lead to black, i.e. p(B) = 1/3.
Hence: p(c1|B) = (1/6) / (1/3) = 0.5
```{r}
card.1.likelihood <- 2
card.2.likelihood <- 1
card.3.likelihood <- 0
likelihood <- c(card.1.likelihood, card.2.likelihood, card.3.likelihood)
prior <- c(1/6, 2/6, 3/6)   # relative pull frequency of [B/B, B/W, W/W]
unstandardized.posterior <- prior * likelihood
posterior <- unstandardized.posterior / sum(unstandardized.posterior)
print(posterior)
print((2/6) / (4/6))        # hand check: B/B's black paths over all black paths
```
*2M7*
There are 2 ways c1 could make the first card show black, and 1 way c2 could.
Now trace the paths for the second card showing white:
if the first card is c1, the second card can show white in 3 ways (the white side of c2, or either side of c3), giving 2 * 3 = 6 paths;
if the first card is c2, the second card can show white in 2 ways (either side of c3), giving 1 * 2 = 2 paths.
Thus the probability that the other side of the first card is black, i.e. that it is c1, is 6/8 = 0.75.
```{r}
# first draw shows a black side up
c1.likelihood = 2   # B/B can show black 2 ways
c2.likelihood = 1   # B/W can show black 1 way
c3.likelihood = 0   # W/W cannot show black
likelihood = c(c1.likelihood, c2.likelihood, c3.likelihood)
prior = c(1,1,1)
unstd.posterior = likelihood*prior
print(unstd.posterior)
posterior = unstd.posterior / sum(unstd.posterior)
print(posterior)
# second draw shows a white side up; count white ways among the remaining cards
prior = posterior
c1.likelihood = 3   # if the first card is B/B, the others can show white 1 + 2 = 3 ways
c2.likelihood = 2   # if the first card is B/W, the others can show white 0 + 2 = 2 ways
c3.likelihood = 0
likelihood = c(c1.likelihood, c2.likelihood, c3.likelihood)
unstd.posterior = likelihood*prior
print(unstd.posterior)
posterior = unstd.posterior / sum(unstd.posterior)
print(posterior)
```
*2H1*
```{r}
prior = c(1, 1)
speciesA.likelihood = .1   # P(twins | species A)
speciesB.likelihood = .2   # P(twins | species B)
likelihood = c(speciesA.likelihood, speciesB.likelihood)
# alternative [R]
p_birth <- c(0.1 , 0.2) # [A B]
likelihood <- dbinom( x=1 , size=1 , prob=p_birth)
unstd.posterior = prior*likelihood
posterior = unstd.posterior / sum(unstd.posterior)
print("this is the posterior probability of species membership:")
print(posterior)
print("multiplied with the probabilities of twins, and summed means the probability of twinse again is now higher than for species A and less than species B, but more than their average.")
print(sum(posterior*likelihood))
# alternative but the same [R]
print(sum(posterior*p_birth))
```
*2H2*
Already computed in 2H1: P(species A | twins) = 1/3.
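To make it explicit with a small self-contained chunk (it recomputes the species posterior rather than reusing the `posterior` object from the 2H1 chunk):
```{r}
p_twins <- c(0.1, 0.2)                     # twin rates for species A and B
posterior_twins <- p_twins / sum(p_twins)  # flat prior over species cancels
posterior_twins[1]                         # P(species A | twins) = 1/3
```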
*2H3*
```{r}
# species posterior from 2H1 (after the twin birth)
prior = posterior
speciesA.likelihood = .9   # P(singleton | species A) = 1 - 0.1
speciesB.likelihood = .8   # P(singleton | species B) = 1 - 0.2
likelihood = c(speciesA.likelihood, speciesB.likelihood)
unstd.posterior = prior*likelihood
twins_singleton_posterior = unstd.posterior / sum(unstd.posterior)
print("After giving birth first to twins and then to a singleton, the probability that the panda mom is of species A is:")
print(twins_singleton_posterior[1])
```
*2H4*
```{r}
prior = c(1,1)
speciesA.likelihood = .8      # P(test says A | species A)
speciesB.likelihood = 1-.65   # P(test says A | species B)
likelihood = c(speciesA.likelihood, speciesB.likelihood)
unstd.posterior = prior*likelihood
posterior = unstd.posterior / sum(unstd.posterior)
print('posterior from genetic test only')
print(posterior)
#now using child info (twins, singleton)
prior = twins_singleton_posterior
likelihood = c(speciesA.likelihood, speciesB.likelihood)
unstd.posterior = prior*likelihood
posterior = unstd.posterior / sum(unstd.posterior)
print('posterior including child info:')
print(posterior)
```