---
title: "Bayes Factors"
author: "Marius 't Hart"
output:
  pdf_document: default
  word_document: default
  html_notebook: default
  html_document:
    df_print: paged
---
```{r setup, cache=FALSE, include=FALSE}
library(knitr)
opts_chunk$set(comment='', eval=FALSE)
```
Recently, the Null-Hypothesis Significance Testing (NHST) framework has received some bad press, and part of the scientific community is pushing for Bayesian statistics instead. We should at least learn the basics and be able to read papers that use them. In this tutorial we will explore one of the simpler Bayesian tools: Bayes Factors.
First, a reminder: a p-value expresses the probability of observing data at least as extreme as ours, given that the null-hypothesis is true. This is a somewhat problematic statistic, because the definition does not include what we actually care about: the alternative hypothesis. What we'd like to know is the probability that our alternative hypothesis is true, given the data.
Bayes Factors also use a null hypothesis and an alternative hypothesis, and it seems to me they will usually be related to familiar NHST statistics.
# Bayes' theorem
First, let's look at the classic Bayes formula. It goes like this:
$p(A) \cdot \frac{p(B|A)}{p(B)} = p(A|B)$
That is: the prior ($p(A)$) multiplied by the likelihood equals the posterior ($p(A|B)$). The prior is our prior belief in A: the probability ($p()$) with which we think A is true, before seeing any evidence. The posterior is the probability of A _given_ B ($A|B$); that is, it represents our updated belief that A is true, after taking into account some new data (B).
The likelihood term is a little more complex. Sometimes you will not see the normalization: dividing by the probability of B occurring regardless of anything else (its "base rate"). I'm not always sure why, but it could be because the base rate is fully unknown, or because it makes no sense in the experiment.
Either way, the whole term gives you the probability of observing the data (B) given the hypothesis (A), normalized by the base rate of B's occurrence. So if B is very rare, but happens a lot in your experiment (and this is predicted by your hypothesis), then your belief in the hypothesis will increase. However, if B is very common, or occurs at its base rate, not much may change in your degree of belief in A. Your belief in A can also decrease, if B occurs less often than its base rate.
So this normalized likelihood term expresses the strength of the evidence that the data provides for the hypothesis: how much does it change our belief in the hypothesis? Note that it is hard to get our posterior belief to 100%, and that it also depends on our prior belief. Unlikely claims require very strong evidence!
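To make this concrete, here is a minimal sketch of Bayes' theorem in R, with made-up numbers (all probabilities below are hypothetical, chosen only for illustration):
```{r}
# toy illustration of Bayes' theorem: A is "the hypothesis is true",
# B is "we observe this data" (all numbers are made up)
p_A         <- 0.10 # prior: belief in A before seeing any evidence
p_B_given_A <- 0.80 # likelihood: probability of the data if A is true
p_B         <- 0.20 # base rate: overall probability of the data

# posterior = prior * likelihood / base rate
p_A_given_B <- p_A * p_B_given_A / p_B
p_A_given_B # 0.4: the data quadrupled our belief in A
```
Here B is rare (base rate 0.20) but quite likely under the hypothesis (0.80), so observing it boosts our belief in A considerably. It still gets nowhere near 100%, because our prior belief was low.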
# Bayes Factor
It seems to me that the Bayes Factor is highly related to this normalized likelihood term. Here's a formula that includes the Bayes Factor: prior odds times Bayes Factor equals posterior odds:
$\frac{p(H_1)}{p(H_0)} \cdot \frac{p(D | H_1)}{p(D | H_0)} = \frac{p(H_1 | D)}{p(H_0 | D)}$
Here, we weight two hypotheses that might sound familiar: the null hypothesis ($H_0$) and an alternative hypothesis ($H_1$). By the way, if we weight the alternative hypothesis over the null hypothesis (as above), we have a $BF_{10}$; if we go the other way around, we have a $BF_{01}$. The two can be very simply converted into each other, so don't let that confuse you:
$BF_{10} = \frac{1}{BF_{01}}$
Notice that part of what makes NHST hard to learn is that we calculate p-values: the probability of observing the data given that the null-hypothesis is true (as I said before). But who cares? We don't need probabilities of observing data... we need probabilities of a hypothesis being true or false. More specifically, we don't really care about the null-hypothesis at all: we want to know if the alternative hypothesis is true given the data, right? Now look at the posterior in the formula above. It weights the probability that the alternative hypothesis is true given the data against the probability that the null hypothesis is true given the data. How awesome is that!
The fraction $\frac{p(D | H_1)}{p(D | H_0)}$ is the Bayes Factor: basically, a number that tells you how much you have to adjust your prior beliefs, based on the new data, to get your posterior beliefs. And even if you can't really put a number on your prior belief, you can often still calculate the Bayes Factor. In that case, you can't tell people what their posterior belief should be, but you can tell them which of the two hypotheses is favoured by your data.
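As a quick sketch of this updating (the prior odds and Bayes Factor below are invented for illustration), this is all the arithmetic amounts to:
```{r}
# hypothetical numbers: how a Bayes Factor updates prior odds
prior_odds <- 0.25 # we think H1 is 4 times less likely than H0
BF10       <- 10   # the data favours H1 over H0 by a factor of 10

posterior_odds <- prior_odds * BF10
posterior_odds # 2.5: we now think H1 is 2.5 times more likely than H0
```
Someone with different prior odds would end up with a different posterior, but the Bayes Factor itself, the update factor, is the same for everyone.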
The Bayes Factor is a number between 0 and $\infty$, and when it is 1, the data supports both hypotheses equally well (your posterior beliefs would be equal to your prior beliefs). So in general, when you find a Bayes Factor close to 1, your data does not allow making a decision. You should still report it, but then perhaps do another experiment that approaches the question in a different way.
Here is how people often interpret the values of Bayes Factors:
| Bayes Factor | What it means |
|--------------|-----------------------------|
| > 100 | Extreme evidence for H1 |
| 30 - 100 | Very strong evidence for H1 |
| 10 - 30 | Strong evidence for H1 |
| 3 - 10 | Moderate evidence for H1 |
| 1 - 3 | Anecdotal evidence for H1 |
| 1 | No evidence |
| 1/3 - 1 | Anecdotal evidence for H0 |
| 1/10 - 1/3 | Moderate evidence for H0 |
| 1/30 - 1/10 | Strong evidence for H0 |
| 1/100 - 1/30 | Very strong evidence for H0 |
| < 1/100 | Extreme evidence for H0 |
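If you like, this table can be written as a small lookup function. This is a hypothetical helper, not something from any package (and note that a Bayes Factor of exactly 1, "no evidence", falls on a boundary here):
```{r}
# map a BF10 value onto the verbal labels from the table above
interpretBF <- function(BF10) {
  cuts   <- c(0, 1/100, 1/30, 1/10, 1/3, 1, 3, 10, 30, 100, Inf)
  labels <- c('extreme evidence for H0',     'very strong evidence for H0',
              'strong evidence for H0',      'moderate evidence for H0',
              'anecdotal evidence for H0',   'anecdotal evidence for H1',
              'moderate evidence for H1',    'strong evidence for H1',
              'very strong evidence for H1', 'extreme evidence for H1')
  labels[ findInterval(BF10, cuts, rightmost.closed=TRUE) ]
}
interpretBF( c(1/50, 0.23, 2, 500) )
```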
We need some actual Bayes Factors to see what this would mean.
# A simple example
For this, we will use some data from a control group in [Gastrock et al., 2020](https://doi.org/10.1038/s41598-020-76940-3), and we simplify the data a bit.
```{r}
# read the no-cursor reach data for the control group
RAE <- read.csv('data/Gastrock_etal_2020_control_nocursor.csv', stringsAsFactors = FALSE)
# average across trials: one value per participant per condition
RAE <- aggregate(cbind(aligned, exclusive, inclusive) ~ participant, data=RAE, FUN=mean)
```
This data is from Figure 5 of that paper, and has the deviations of no-cursor reaches from the target (in degrees of angle) for the control group (shown in orange there). The reach deviations from the baseline session are in the `aligned` column; after adaptation, the reaches without strategy are in the `exclusive` column and those with strategy in the `inclusive` column:
```{r}
str(RAE)
```
And let's have some quick summary stats:
```{r}
summary(RAE)
```
Aligned responses are closer to zero, and exclusive and inclusive are very similar, but possibly inclusive is a bit higher than exclusive. This should also be visible in a summary plot:
```{r}
boxplot(RAE[c('aligned','exclusive','inclusive')])
```
We will use the appropriately named `BayesFactor` package to test two questions, and compare it with what a t-test would say.
```{r}
#install.packages('BayesFactor')
library('BayesFactor')
```
First, we want to see if, even "without strategy", the reach deviations have changed relative to baseline. The first thing people would do is run a t-test:
```{r}
t.test(RAE$aligned, RAE$exclusive, paired=TRUE)
```
This shows a difference, so the training has an effect. Let's see what the Bayes factor is:
```{r}
ttestBF(RAE$aligned, RAE$exclusive, paired=TRUE)
```
That really large number (309678218) is the Bayes Factor, and it clearly shows that there was a difference between the two sessions. Using the table above, we could report this as:
> _"There is extreme evidence for an effect of training session on reach deviations ($BF_{10} = 3.1 \cdot 10^8$)."_
Let's also see if there is a difference between inclusive and exclusive reach deviations:
```{r}
t.test(RAE$inclusive, RAE$exclusive, paired=TRUE)
```
Would you say that is a "trend"? We shouldn't. In NHST you have to stick to the alpha level (assumed to be 0.05 unless otherwise specified), so what this means is that we cannot reject the null-hypothesis. But this is rather indecisive: does it mean that the reach deviations are the same in the two conditions (i.e. the null-hypothesis is true), or does it mean that there is some, albeit weak, evidence for the alternative hypothesis?
Let's see what the Bayes Factor says.
```{r}
ttestBF(RAE$inclusive, RAE$exclusive, paired=TRUE)
```
The Bayes Factor is 1.0257 here... that's pretty much 1, so we could conclude:
> _"This data provides little to no evidence to say if there is a difference between include and exclude reach deviations ($BF_{10}=1.03$)."_
# Confirming the null-hypothesis
Of course, a cooler example would be one where the data actually supports the null hypothesis. I don't have example data that shows this, so we'll cheat:
```{r}
zero <- RAE$aligned - mean(RAE$aligned)
print(zero)
```
The vector `zero` should now have a mean of exactly 0, so that the null-hypothesis has to be true.
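We can quickly check that (up to floating-point precision):
```{r}
mean(zero) # should be 0, or within floating-point error of it
```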
A t-test, however, will simply _not reject the null hypothesis_, which is different from confirming it:
```{r}
t.test(zero)
```
If the p-value is below alpha (0.05) we _reject_ the null hypothesis; otherwise we fail to reject it. Those are the only two options: a t-test can never confirm the null hypothesis.
But a Bayes Factor would say there actually is evidence _for_ the null-hypothesis:
```{r}
ttestBF(zero)
```
This Bayes Factor of 0.23 is in between 1/10 and 1/3, so we could say this:
> _"There is moderate evidence that the vector has a mean of zero ($BF_{10}=0.23$)"_
The reason the Bayes Factor isn't lower is that there are "only" 20 observations with still quite a lot of variance.
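Since $BF_{10} = \frac{1}{BF_{01}}$, you can also flip this around and express it as evidence for the null directly:
```{r}
# converting the BF10 into a BF01 just inverts the fraction
1 / 0.23 # BF01 of about 4.3: moderate evidence for H0
```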
Classical statistics can't distinguish between the two situations ("no evidence" vs. "the null-hypothesis is true"), but Bayes Factors can! If we look around in the `BayesFactor` package, it also has functions to get Bayes Factors for ANOVA-like situations; these are a little more complicated, but spit out the same kind of Bayes Factors, which can be interpreted the same way.
# You have options
`ttestBF()` has different kinds of options than your regular `t.test()`, and they have to do with the specifics of Bayesian statistics. For example, you can specify a prior instead of using the default one, or get samples from the posterior distribution.
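Here is a minimal sketch of those two options, assuming the `rscale` argument of `ttestBF()` (which sets the width of the default Cauchy prior on effect size) and the `posterior()` function of the `BayesFactor` package (which draws MCMC samples):
```{r}
# use a wider Cauchy prior on effect size than the default "medium"
bf <- ttestBF(RAE$inclusive, RAE$exclusive, paired=TRUE, rscale="wide")
bf

# draw samples from the posterior distribution of the parameters
samples <- posterior(bf, iterations=1000)
summary(samples)
```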
# JASP
Perhaps it is easier to first try some Bayesian statistics in a nice GUI. [JASP](https://jasp-stats.org) is free, open source software that implements many NHST analyses along with Bayesian versions of them. They also provide many guides on [this page](https://jasp-stats.org/jasp-materials/), including data sets to work through examples.