generated from statOmics/Rmd-website
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path08_2_poison2_sol.Rmd
233 lines (167 loc) · 7.31 KB
/
08_2_poison2_sol.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
---
title: "Exercise 8.2: Non-additive linear model on the poison dataset"
author: "Lieven Clement, Jeroen Gilis and Milan Malfait"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
# The poison dataset
In this experiment, 96 fish (dojofish, goldfish and zebrafish)
were placed separately in a tank with two liters of water and
a certain dose (in mg) of the poison EI-43,064. The resistance
of the fish against the poison was measured as the amount of
minutes the fish survived after being exposed to the poison (`Surv_time`, in
minutes). Additionally, the weight of each fish was measured.
# Goal
Suppose that researchers are ,mainly interested in studying the effect of
poison dose on the survival of fish. They know however that the weight can
also impact the survival and might also change the effect of the poison dose.
In this tutorial session we will focus on Dojofish and we will model the
survival time in function of the dose and the weight of the fish,
**and including an interaction between dose and weight.**
Load libraries
```{r, message=FALSE, warning=FALSE}
# install.packages("GGally")
library(GGally)
library(car)
library(multcomp)
library(tidyverse)
theme_set(theme_bw())
```
# Import the data
```{r, message=FALSE}
poison <- read_csv("https://raw.githubusercontent.com/statOmics/PSLSData/main/poison.csv")
```
# Data tidying
We can see a couple of things in the data that can be improved:
1. Capitalise the fist column name
2. Set the Species column as a factor
3. Change the species factor levels from 0, 1 and 2 to
Dojofish, Goldfish and Zebrafish. *Hint*: use the `fct_recode` function.
4. In previous analysis on this dataset (`Simple linear regression session`), we
performed a log-transformation on the response variable `Surv_time` to meet the
normality and homoscedasticity assumptions of the linear model. Here, we will
immediately work with log-transformed survival times; store these in the new
variable `log2Surv_time` and remove the non-transformed values.
5. Subset the data to only retain **Dojofish**.
```{r}
poison <- poison %>%
rename("Species" = "species") %>%
mutate(Species = as.factor(Species)) %>%
mutate(Species = fct_recode(Species,
Dojofish = "0", Goldfish = "1", Zebrafish = "2"
)) %>%
mutate(log2Surv_time = log2(Surv_time)) %>%
select(-Surv_time) %>%
filter(Species == "Dojofish")
poison
```
# Data exploration
Prior to the analysis, we should explore our data.
To start our data exploration, we will make use of the `ggpairs` function of the
`GGally` R package. This function will generate a visualization containing
multiple panels, which display (1) univariate plots of the different variables
in our dataset, (2) bivariate plots and (3) correlation coefficients between
the different variables.
```{r, message=FALSE, fig.width=12}
ggpairs(poison)
```
Based on these plots, we observe that:
- The survival time seems to be associated with dose and fish weight.
From the tutorial of H6 we have seen that the fish weights were not nicely
uniform across the different poison dosages due to the randomisation.
```{r, message=FALSE}
poison %>%
filter(Species == "Dojofish") %>%
ggplot(aes(x = Dose, y = Weight)) +
geom_point() +
ggtitle("Association between dose and weight") +
theme_bw() +
stat_summary(
fun = mean, geom = "point",
col = "black", size = 4,
shape = 24, fill = "red"
)
```
# Analysis with main effect and interaction for dose and weight
## Model specification
$$
y_i=\beta_0+\beta_d x_d + \beta_g x_g +\beta_{d:g} x_d x_g+ \epsilon_i,
$$
with $\epsilon_i \text{ i.i.d. } N(0,\sigma^2)$
## Assumptions
The model will again be fit to allow for assessing the model assumptions
```{r}
# lm_Int <- lm(log2Surv_time ~ Dose+Weight+Dose:Weight, data = poison) # equivalent
lm_Int <- lm(log2Surv_time ~ Dose * Weight, data = poison) # * -> short notation
par(mfrow = c(2, 2))
plot(lm_Int)
```
The plots look very similar to those of the additive model from the previous exercise so we know that all assumptions are **met**.
## Inference
We then inspect the results.
```{r}
summary(lm_Int)
```
## Interpretation of model parameters
We may interpret the estimate parameters as follows:
- When we compare the log2 survival of fish with weight $x_g$ that are exposed to a dose that differs with 1 mg/l, the expected log2 survival time will be
$\beta_d+\beta_{d:g}*x_g$ higher for the fish that were exposed to the highest dose.
- When we compare the log2 survival of fish exposed to dose $x_g$, but that have a weight that differs with 1 g, the expected log2 survival time
will be $\beta_d+\beta_{d:g}*x_g$ higher for fish with the highest weight.
The parameter $\beta_{d:g}$ thus shows that the effect of dose on the log2 survival time is dependent on the
weight of the fish, and, that the effect
of weight on the log2 survival time dependents on the dose that was
administered.
## Inference
The effect of dose is now parameterized by two model parameters ($\beta_d$
and $\beta_{d:g}$). We first evaluate an *omnibus* hypotheses that there is
no effect of dose, i.e., no main effect nor an interaction effect. We can test
this with an F-test that compares a *full* model (1) containing a main effect
for dose, a main effect for weight and an interaction between dose and weight
with a model (2) that only contains a main effect for weight (i.e. no effect
for dose).
```{r}
lmDojo_weight <- lm(log2Surv_time ~ Weight, data = poison)
anova(lmDojo_weight, lm_Int)
```
We observe an extremely significant (overall, or global) effect for dose on
the log2 survival time of dojofish
(p-value <<0.001)`.
## Conventional approach
We already established that there is a significant overall effect of dose.
Now, we will test if there is a significant interaction effect between dose
and weight. Since we only have one interaction term in this model, this can
be achieved in several ways:
1. The `summary` function
2. An F-test comparing models with and without the interaction effect
3. An ANOVA table with type III sum of squares
```{r}
summary(lm_Int) # 1
lm_additive <- lm(log2Surv_time ~ Dose + Weight, data = poison) # 2
anova(lm_additive, lm_Int) # 2
Anova(lm_Int, type = "III") # 3
```
There is no significant interaction between dose and weight. As such, the
effect of dose on survival is not signifiantly different between fish of
different weight (p = `r format(Anova(lm_Int,type="III")$"Pr(>F)"[4], digits = 2)`).
The conventional approach is to remove the interaction effect from the
model. As such, we are left with the additive linear regression model.
```{r}
summary(lm_additive)
```
```{r}
2^(coef(lm_additive)) # original scale
2^(confint(lm_additive))
```
### Conclusion
The conclusion is exactly the same as for the additive model analysis above
(obviously, as we are dealing with the same model):
The dose of the poison has an extremely significant effect on the survival time
of dojofish (p-value << 0.001). The geometric average of the survival time for
dojofish that are exposed to a poison dose that is 1mg/L larger is approximately
halved,
factor = $2^{\beta_d}=$ `r format(2^(lm_additive$coef["Dose"]),digits=2)`) .
The effect of dose on survival is not significantly different between fish of
different weight. (p = `r format(summary(lm_Int)$coefficients[4,4], digits=2)`).
# Remark
Note, that we have have to test if the interaction is significant before using
the additive model.