# Week 1 - Class
```{r include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
show_answers <- TRUE
```
During the practical you will work on exercises about ANOVA and ANCOVA using regression and path modeling. Note that ANOVA and ANCOVA are special cases of regression, as discussed during MTS3 or a similar course. Knowing how to perform an ANOVA/ANCOVA as a regression analysis is prerequisite knowledge.
This practical covers these topics (ANOVA, ANCOVA, regression, and how they are related). If you need to refresh your knowledge, look it up online or in a statistics textbook, for example @field2012discovering (*The chapters on ANOVA, Factorial ANOVA, and ANCOVA (11.6)*).
We start with two exercises in which you explore your data and perform a regression analysis, an ANOVA, and an ANCOVA. In exercise 3 you will also practice performing an ANCOVA as a regression analysis.
## Loading data
Open the file Sesam.sav:
```{r, echo = TRUE, message=FALSE, eval = FALSE}
# Library for reading SPSS files:
library(foreign)
# Load the data and put them in the object called "data"
data <- read.spss("sesam.sav", to.data.frame = TRUE, use.value.labels = FALSE)
```
```{r, message=FALSE, echo= FALSE}
# Library for reading SPSS files:
library(foreign)
# Load the data and put them in the object called "data"
data <- read.spss("TCSM_student/sesam.sav", to.data.frame = TRUE, use.value.labels = FALSE)
```
This file is part of a larger dataset that evaluates the impact of the first year of the Sesame Street television series. Sesame Street is mainly concerned with teaching preschool-related skills to children in the 3-5 year age range.
The following variables will be used in this exercise:
* **age** measured in months
* **prelet** knowledge of letters before watching Sesame Street (range 0-58)
* **prenumb** knowledge of numbers before watching Sesame Street (range 0-54)
* **prerelat** knowledge of relations before watching Sesame Street (range 0-17)
* **peabody** vocabulary maturity before watching Sesame Street (range 20-120)
* **postnumb** knowledge of numbers after a year of Sesame Street (range 0-54)
## Section 1
### Question 1.a
What is the level of measurement of each of the variables?
<details>
<summary>Click for explanation</summary>
In the 'Environment' panel in the top right corner of the screen, click the arrow next to the object called 'data'. Alternatively, run the code: `head(data)`.
```{r echo = FALSE}
knitr::include_graphics("measurement_level.png")#, out.width = "456px")
```
</details>
### Question 1.b
What is the average age in the sample? And the range (youngest and oldest child)?
*Hint: Use library(tidySEM); descriptives()*
<details>
<summary>Click for explanation</summary>
As in the take home exercises, use the function `descriptives()` from the `tidySEM` package to describe the data:
```{r, echo = TRUE}
library(tidySEM)
descriptives(data)
```
`r if(knitr::is_html_output()){"\\details"}`
### Question 1.c
What is the average gain in knowledge of numbers? Provide both the mean and the standard deviation.
*Hint: Use the <- operator to assign to a new variable in data. You can use descriptives(), or the functions mean() and sd().*
<details>
<summary>Click for explanation</summary>
Create a new variable that represents the difference between pre- and post-test scores:
```{r, echo = TRUE}
data$dif <- data$postnumb - data$prenumb
```
There are specialized functions to obtain the mean and sd:
```{r, echo = TRUE}
mean(data$dif)
sd(data$dif)
```
`r if(knitr::is_html_output()){"\\details"}`
### Question 1.d
Choose an appropriate graph to present the gain scores. What did you choose and why?
*Hint: As explained in the introductory chapters, you can use ggplot and add a histogram, density plot, or boxplot:* `geom_histogram(); geom_density(); geom_boxplot()`
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE, include = show_answers}
library(ggplot2)
p <- ggplot(data, aes(x = dif))
p + geom_histogram()
p + geom_density()
p + geom_boxplot()
```
`r if(knitr::is_html_output()){"\\details"}`
### Question 1.e
Can you think of a graph based on two variables that is informative? What is it and how is it informative?
*Hint: A useful plotting function for a bivariate distribution is the scatterplot:* `geom_point()`
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE, include = show_answers}
#Possible variables would be the pre- and post measurement
ggplot(data, aes(x = prenumb, y = postnumb)) + geom_point()
```
`r if(knitr::is_html_output()){"\\details"}`
### Question 1.f
Which of the variables age, prelet, prenumb, prerelat and peabody are related to postnumb? Use Pearson’s correlations (`cor()`). You don’t need to check assumptions. If you want p-values for the correlations, use the function `corr.test()` from the `psych` package instead.
*Hint: The function* `corr.test()` *from the psych package provides Pearson's correlations and p-values (the base R function cor() does not provide p-values). Select variables by name from a data.frame object (like* `data`*) using the following syntax*: `data[, c("each", "variable", "name")]`*.*
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE}
library(psych)
corr.test(data[, c("age", "prelet", "prenumb", "prerelat", "peabody", "postnumb")])
```
The use of `data[,]` follows the conventions of matrix indexing: you can select rows (the horizontal lines) like this, `data[i, ]`, and columns (the vertical lines) like this, `data[ ,j]`, where `i` are the rows and `j` are the columns you want to select. As you can see in the example, you can select multiple columns using `c( ... , ... )`.
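As a small illustration of this way of selecting rows and columns, here is a sketch on a made-up data frame. The object `toy` and its columns are hypothetical and not part of the Sesame Street data:

```r
# Hypothetical toy data; not part of the course files
toy <- data.frame(a = 1:4,
                  b = c(2.5, 3.0, 1.5, 4.0),
                  c = letters[1:4])

toy[2, ]                      # second row, all columns
toy[, "b"]                    # column "b", returned as a plain vector
toy[, c("a", "b")]            # columns "a" and "b", returned as a data.frame
toy[toy$a > 2, c("a", "c")]   # rows where a > 2, columns "a" and "c"
```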
`r if(knitr::is_html_output()){"\\details"}`
### Question 1.g
Can age and prenumb be used to predict postnumb? If so, discuss the substantial importance of the model and the significance and substantial importance of the separate predictors.
*Hint: The function* `lm()` *(short for linear model) conducts linear regression. The function* `summary()` *provides relevant summary statistics for the model. It can be helpful to store the results of your analysis in an object, too.*
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE}
results <- lm(formula = postnumb ~ age + prenumb,
data = data)
summary(results)
```
`r if(knitr::is_html_output()){"\\details"}`
### Question 1.h
Provide the null hypotheses and the alternative hypotheses of the model in 1.g.
<details>
<summary>Click for explanation</summary>
The null hypothesis and alternative hypothesis of the **model** pertain to the variance explained in the population, $\rho^2$ (that's the Greek letter rho; $\rho^2$ is the population value of $R^2$).
$H_0: \rho^2 = 0$
$H_a: \rho^2 > 0$
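The F-test reported at the bottom of `summary()` is the test of this null hypothesis. As a sketch of how that F-statistic relates to $R^2$, the example below uses the built-in `mtcars` data (chosen only for illustration; it is not the Sesame Street data) and recomputes the statistic by hand:

```r
# Illustration on built-in data (mtcars), not the course data
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

r2 <- s$r.squared
k  <- 2               # number of predictors
n  <- nrow(mtcars)    # sample size

# F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
f_manual <- (r2 / k) / ((1 - r2) / (n - k - 1))

f_manual
s$fstatistic["value"]   # the F-statistic reported by summary(): same value

# p-value for H0: rho^2 = 0
pf(f_manual, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
```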
`r if(knitr::is_html_output()){"\\details"}`
### Question 1.i
Consider the path model below. How many regression coefficients are estimated in this model? And how many variances? And how many covariances? How many degrees of freedom does this model have? ($df = N_{obs} - N_{par}$, see slides Lecture 1).
```{r echo = FALSE}
#knitr::include_graphics("1_path_model.png")#, out.width = "456px")
res <- lavaan::sem("postnumb ~ prerelat + prelet + prenumb
prerelat ~ age
prelet ~ age
prenumb ~ age", data)
library(tidySEM)
p <- prepare_graph(res, layout = get_layout("", "prerelat", "",
"age", "prelet", "postnumb",
"", "prenumb", "", rows = 3), angle = 1)
edges(p)$label <- NA
plot(p)
```
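As a reminder of how the counting works, here is a sketch for a much simpler model than the one above: a regression of one outcome on one predictor. It assumes that $N_{obs}$ counts the unique variances and covariances among the observed variables; check the lecture slides for the exact definition used in this course. The function name `n_obs_moments()` is made up for this example.

```r
# Hypothetical helper: number of unique variances and covariances
# among p observed variables
n_obs_moments <- function(p) p * (p + 1) / 2

# Simple regression y ~ x: two observed variables
n_obs <- n_obs_moments(2)   # var(x), var(y), cov(x, y)

# Parameters: 1 regression coefficient, 1 variance of x,
# 1 residual variance of y
n_par <- 1 + 1 + 1

n_obs - n_par   # degrees of freedom
```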
### Question 1.j
Consider a multiple regression analysis with three continuous independent variables, tests in language, history and logic, and one continuous dependent variable, a score on a math test. We want to know whether the various tests can predict the math score. Sketch a path model for this analysis (there are examples in the lecture slides of week 1).
How many regression parameters are there? How many variances could you estimate?
How many covariances could you estimate? How many degrees of freedom does this model have?
## Section 2
Open the file `Drivers.sav`:
```{r, echo = TRUE, eval=FALSE}
# Load the data and put them in the object called "data"
data <- read.spss("Drivers.sav", to.data.frame = TRUE)
```
```{r, message=FALSE, echo = FALSE}
# Load the data and put them in the object called "data"
data <- read.spss("TCSM_student/Drivers.sav", to.data.frame = TRUE)
```
### Research question 1 (ANOVA): Does talking on the phone interfere with people’s driving skills?
The IV for this research question is `condition`, with conditions:
* hand-held phone
* hands-free phone
* control
The DV is reaction time in milliseconds in a driver simulation test, in variable `RT`.
### Question 2.a
Perform the ANOVA. You can use `lm(y ~ -1 + x)` to remove the intercept from a regression with dummies, and get a separate mean for each group. The function `aov()` is an alternate interface for `lm()` that reports results in a way that matches the conventions for ANOVA analyses more closely.
<details>
<summary>Click for explanation</summary>
You can use `summary(lm(y ~ -1 + x))` to get the means for each group:
```{r, echo = TRUE, message=FALSE}
results <- lm(formula = RT ~ -1 + condition, data = data)
summary(results)
```
And you can use `aov()` to get the sum of squares for the factor:
```{r, echo = TRUE, message=FALSE}
results <- aov(formula = RT ~ condition, data = data)
summary(results)
```
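To see that removing the intercept gives one mean per group, the sketch below uses the built-in `PlantGrowth` data (three groups; chosen only for illustration, it is not the drivers data) and compares the coefficients with the observed group means:

```r
# Illustration on built-in data (PlantGrowth), not the course data
fit_means <- lm(weight ~ -1 + group, data = PlantGrowth)
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

coef(fit_means)   # one coefficient per group
group_means       # the observed group means: the same values
```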
`r if(knitr::is_html_output()){"\\details"}`
### Question 2.b
What are the assumptions you need to check?
<details>
<summary>Click for explanation</summary>
We can check several assumptions:
1. Presence of outliers
2. Normality of residuals
3. Homogeneity of variances
Let's deal with them in order.
#### Presence of outliers:
**In Y-space**
We can check the range of the standardized (`scale()`) residuals for outliers in Y-space. The residuals are **inside** of the results object, so we can just extract them, standardize them, and get the range:
```{r, echo = TRUE}
range(scale(results$residuals))
```
What is your conclusion about the outliers?
#### Normality of residuals
We can check the normality of residuals using a QQplot.
```{r, echo = TRUE}
qqnorm(results$residuals)
qqline(results$residuals)
```
There appears to be some mild deviation from normality at the extremes.
You can also **test** for normality with the `shapiro.test(x)` function:
```{r, echo = TRUE}
shapiro.test(results$residuals)
```
#### Homogeneity of Variances
The `bartlett.test()` function provides a parametric K-sample test of the equality of variances. This test has the same hypotheses as Levene's test.
```{r, echo = TRUE}
bartlett.test(formula = RT~condition, data = data)
```
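For comparison, a Levene-type test can be sketched in base R as an ANOVA on the absolute deviations of each observation from its group center. The version below uses the group median (the Brown-Forsythe variant) and the built-in `PlantGrowth` data; both are choices made for this illustration and are not part of the exercise:

```r
# Illustration on built-in data (PlantGrowth), not the course data
d <- PlantGrowth

# Absolute deviation of each observation from its group median
d$abs_dev <- abs(d$weight - ave(d$weight, d$group, FUN = median))

# Levene-type test: one-way ANOVA on the absolute deviations
levene_fit <- aov(abs_dev ~ group, data = d)
summary(levene_fit)

# Bartlett's test on the same data, for comparison
bartlett.test(weight ~ group, data = d)
```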
It can also be nice to use a paneled boxplot to visualize the distributions. For this, we will use `ggplot2`. This time, we introduce a new command, `theme_bw()`: a clean black-and-white theme that is a common starting point for APA-style figures. We can apply this theme to any figure created using `ggplot()`:
```{r, echo = TRUE}
library(ggplot2)
# Map condition to the x-axis so that each box is labeled with its group
ggplot(data, aes(x = condition, y = RT)) +
  geom_boxplot() +
  theme_bw()
```
`r if(knitr::is_html_output()){"\\details"}`
### Question 2.c
Explain for each of the assumptions why they are important to check.
### Question 2.d
What are your conclusions regarding the assumption checks?
<details>
<summary>Click for explanation</summary>
There are no outliers in Y-space, no evidence for (severe) deviations from normality of residuals, and no evidence for (severe) heteroscedasticity.
</details>
### Question 2.e
Answer the research question.
*Hint: Use* `summary()` *and* `TukeyHSD()`*.*
<details>
<summary>Click for explanation</summary>
We can examine the overall F-test, which is significant:
```{r, echo = TRUE, message=FALSE}
summary(results)
TukeyHSD(results)
```
Post-hoc comparisons with Tukey's HSD correction for multiple comparisons can be obtained using `TukeyHSD(results)`. We notice that none of these comparisons are significant. However, the research question was *Does talking on the phone interfere with people's driving skills?* There are two conditions for talking on the phone. We could thus test a planned contrast of these two conditions against the control condition, instead of all possible post-hoc tests.
The standard contrasts are dummy coded:
```{r, echo = TRUE, message=FALSE}
contrasts(data$condition)
```
We can replace these with planned contrasts for "phone" vs control, and hand-held vs hands-free:
```{r, echo = TRUE, message=FALSE}
contrasts(data$condition) <- cbind(phoneVcontrol = c(-1, -1, 2), handVfree = c(-1, 1, 0))
results <- aov(RT ~ condition, data)
# Ask for the lm summary, which gives you t-tests for the planned contrasts:
summary.lm(results)
```
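These two contrasts are orthogonal: the weights in each column sum to zero, and the products of the weights across the two columns also sum to zero. A quick check, assuming the factor levels are ordered hand-held, hands-free, control (verify this with `levels(data$condition)`):

```r
# Contrast weights; the row order is assumed to match the factor levels
# (hand-held, hands-free, control)
cm <- cbind(phoneVcontrol = c(-1, -1, 2),
            handVfree     = c(-1,  1, 0))

colSums(cm)                                      # each column sums to zero
sum(cm[, "phoneVcontrol"] * cm[, "handVfree"])   # cross-product is zero
```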
`r if(knitr::is_html_output()){"\\details"}`
### Research question 2 (ANCOVA): Are there differences in reaction time between the conditions when controlling for age?
### Question 2.f
What are the assumptions you need to check?
<details>
<summary>Click for explanation</summary>
Assumptions for ANCOVA are the same as for ANOVA (no outliers, normality of residuals, homoscedasticity). ANCOVA has the following additional assumptions:
* Homogeneity of regression slopes for the covariate (no interaction between factor variable and covariate)
* The covariate is independent of the treatment effects. That is, there is no difference in the covariate between the groups of the independent variable.
`r if(knitr::is_html_output()){"\\details"}`
### Question 2.g
Explain for each of the assumptions why they are important to check.
### Question 2.h
Check the assumptions of ANCOVA.
*Hint: Within formulas, you can use* `*` *instead of* `+` *to include interaction effects.*
<details>
<summary>Click for explanation</summary>
#### Homogeneity of regression slopes
Add the interaction to the model and test whether the interaction is significant:
```{r, echo = TRUE}
results_age <- aov(RT ~ condition + age, data)
results_age_int <- aov(RT ~ condition * age, data)
summary(results_age_int)
#Or you could use `anova()` to compare two different models
anova(results_age, results_age_int)
```
What would your conclusion be about this assumption?
<details>
<summary>Click for explanation</summary>
The interaction is NOT significant; no evidence for violation of the assumption.
`r if(knitr::is_html_output()){"\\details"}`
#### The covariate is independent of the treatment effects
```{r, echo = TRUE}
results_indep <- aov(age ~ condition, data)
summary(results_indep)
```
What would your conclusion be about this assumption?
<details>
<summary>Click for explanation</summary>
The covariate does not differ significantly between the conditions, so there is no evidence that the assumption is violated.
`r if(knitr::is_html_output()){"\\details"}`
<!--#### Outliers in X-space
In addition to the aforementioned outliers in Y-space, we can now test for (multivariate) outliers in X-space using Mahalanobis' distance. In this case, we only have one continuous covariate. The function requires a **matrix** of data, a vector of means for centering the data, and a covariance matrix. In the syntax below, we use `drop = FALSE` when extracting a single column from the data, in order to make sure that the data will still be in matrix format when we extract only one column. This is necessary for the underlying matrix algebra. We have to take the `sqrt()` because the function `mahalanobis()` returns the **squared** Mahalanobis' distances.
```{r, echo = TRUE}
mahal <- sqrt(mahalanobis(data[ , "age", drop = FALSE],
center = mean(data$age),
cov = cov(data[, "age", drop = FALSE])))
range(mahal)
```
`r if(knitr::is_html_output()){"\\details"}`
-->
### Question 2.i
Answer the research question. (Do you have to include the interaction or not?)
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE, message=FALSE}
results <- aov(formula = RT ~ condition + age, data = data)
# Request comparisons for the factor only; age is a numeric covariate
TukeyHSD(results, which = "condition")
```
The hand-held condition has a significantly **higher** reaction time than the control condition.
`r if(knitr::is_html_output()){"\\details"}`
## Section 3
Open the file Sesam2.sav.
```{r, echo = TRUE, eval=FALSE}
# Load the data and put them in the object called "data"
data <- read.spss("Sesam2.sav", to.data.frame = TRUE)
```
```{r, message=FALSE, echo = FALSE}
# Load the data and put them in the object called "data"
data <- read.spss("TCSM_student/Sesam2.sav", to.data.frame = TRUE)
```
Use postnumb as the dependent variable in all the following analyses.
### Question 3.a
Viewcat is a factor variable, but is not coded as such in the data. Turn it into a factor. Afterwards, make sure that viewcat=1 is the reference group in the contrasts, i.e., the group that is identified by zero scores on all the associated dummy variables.
*Hint: Use* `<- factor()` *and* `contrasts()`.
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE}
data$VIEWCAT <- factor(data$VIEWCAT)
contrasts(data$VIEWCAT)
```
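With the default treatment (dummy) coding, the first factor level is the reference group: its row in the contrast matrix contains only zeros. A sketch with a made-up vector `x` (hypothetical, not the course data), including `relevel()` in case a different reference group is needed:

```r
# Hypothetical example vector with four categories
x <- c(1, 2, 3, 4, 2, 3)
f <- factor(x)

contrasts(f)    # level "1" has zeros on all dummies: the reference group

# Choose another reference group, here level "2"
f2 <- relevel(f, ref = "2")
contrasts(f2)   # now level "2" has zeros on all dummies
```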
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.b
Perform a multiple regression analysis with just the viewcat dummies as predictors.
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE}
results <- lm(POSTNUMB ~ VIEWCAT, data)
summary(results)
```
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.c
What do the regression coefficients represent? How can you determine the average postnumb score for each of the viewcat categories, based on the regression parameters?
### Question 3.d
Make a coloured scatter plot with age on the x-axis and postnumb on the y-axis. Colour the dots according to their `viewcat` category. How do you interpret the differences in slopes of these four fit lines?
*Hint: Use* `ggplot()` and `geom_point()`; *use the argument aes(colour = '...') to map colour to a certain variable. A new command is* `geom_smooth()` *: This plots a smooth line (like a regression line).*
<details>
<summary>Click for explanation</summary>
We will use ggplot again:
```{r, echo = TRUE}
ggplot(data, aes(x = AGE, y = POSTNUMB, colour = VIEWCAT)) +
geom_point() + # For scatterplot
geom_smooth(method = "lm", se = FALSE) + # For regression lines
theme_bw() # For a pretty theme
```
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.e
Add an interaction between age and viewcat to the regression analysis.
*Hint: An interaction is created by multiplying two variables. You can multiply with \* in the formula of* `lm()`.
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE}
results_interaction <- lm(POSTNUMB ~ VIEWCAT*AGE, data)
summary(results_interaction)
```
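To see the multiplication at work, the sketch below builds the design matrix for a made-up factor `g` and covariate `x` (both hypothetical) and checks that the interaction column is the product of the dummy and the covariate:

```r
# Hypothetical toy data
toy <- data.frame(g = factor(c("a", "a", "b", "b")),
                  x = c(1, 2, 3, 4))

mm <- model.matrix(~ g * x, data = toy)
mm   # columns: (Intercept), gb, x, gb:x

# The interaction column equals dummy times covariate
all.equal(mm[, "gb:x"], mm[, "gb"] * mm[, "x"])
```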
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.f
Perform a sequential multiple regression. Include age and viewcat as the predictors in the first analysis. Add the interaction term in the second analysis. Make sure to obtain information about the change in R-square!
*Hint: Use* `anova()` *to compare two regression models.*
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE}
results_main <- lm(POSTNUMB ~ VIEWCAT + AGE, data)
anova(results_main, results_interaction)
```
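`anova()` reports the F-test for the model comparison, but not the change in $R^2$ itself. One way to obtain it is to subtract the two $R^2$ values. The sketch below uses the built-in `mtcars` data (chosen only for illustration; `cyl` plays the role of viewcat and `wt` the role of age) and also shows how the F-statistic follows from the $R^2$ change:

```r
# Illustration on built-in data (mtcars), not the course data
d <- mtcars
d$cyl <- factor(d$cyl)

m_main <- lm(mpg ~ cyl + wt, data = d)   # step 1: main effects
m_int  <- lm(mpg ~ cyl * wt, data = d)   # step 2: add the interaction

r2_main <- summary(m_main)$r.squared
r2_int  <- summary(m_int)$r.squared
r2_change <- r2_int - r2_main
r2_change

# F-test for the change in R^2: same value as in anova(m_main, m_int)
df_diff  <- df.residual(m_main) - df.residual(m_int)
f_change <- (r2_change / df_diff) / ((1 - r2_int) / df.residual(m_int))
f_change
anova(m_main, m_int)
```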
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.g
```{r, echo = FALSE}
library(tidySEM)
library(lavaan)
tmp <- data.frame(model.matrix(~.-1, data))
res <- sem("POSTNUMB ~ VIEWCAT2 + VIEWCAT3 + VIEWCAT4 + AGE", data = tmp)
set.seed(6)
p <- prepare_graph(res, angle = 179)
edges(p) <- edges(p)[!(edges(p)$from == edges(p)$to | !is.na(edges(p)$curvature)), ]
ggsave("3_g1.png", plot(p))
tmp <- data.frame(POSTNUMB = data$POSTNUMB, model.matrix(POSTNUMB~-1+VIEWCAT *AGE, data))
res <- sem("POSTNUMB ~ VIEWCAT2 + VIEWCAT3 + VIEWCAT4 + AGE + VIEWCAT2.AGE + VIEWCAT3.AGE + VIEWCAT4.AGE", data = tmp)
set.seed(6)
p <- prepare_graph(res, angle = 179)
edges(p) <- edges(p)[!(edges(p)$from == edges(p)$to | !is.na(edges(p)$curvature)), ]
ggsave("3_g2.png", plot(p))
```
Sketch path models of both steps of the regression analysis (on paper).
<details>
<summary>Click for explanation</summary>
Step 1:
![](3_g1.png)
Step 2:
![](3_g2.png)
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.h
Write down the regression equations of both steps of the sequential analysis.
<details>
<summary>Click for explanation</summary>
$Postnumb_i = b_0 + b_1D_{view2i} + b_2D_{view3i} + b_3D_{view4i} + b_4Age_i + \epsilon_i$
$$
\begin{aligned}
Postnumb_i = b_0 + &b_1D_{view2i} + b_2D_{view3i} + b_3D_{view4i} + b_4Age_i +\\
&b_5D_{view2i}Age_i + b_6D_{view3i}Age_i + b_7D_{view4i}Age_i + \epsilon_i
\end{aligned}
$$
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.i
Write down the null hypothesis that is tested to determine whether there is an interaction between age and viewcat.
<details>
<summary>Click for explanation</summary>
$H_0: b_5 = b_6 = b_7 = 0$
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.j
Indicate for each parameter in the second regression model what it means. Also write down the regression equation for each of the four categories of viewcat separately.
<details>
<summary>Click for explanation</summary>
| Parameter | Meaning |
|---|---|
| $b_0$ | Intercept; the predicted value of postnumb for someone of age 0 in viewcat 1 |
| $b_1$ | Slope of the dummy for viewcat 2; difference in the predicted value of postnumb for someone aged 0 in category 2, compared to category 1 |
| $b_4$ | The effect of age for someone in viewcat 1 |
| $b_5$ | Difference in the effect of age for someone in viewcat 2, compared to viewcat 1 |
| $b_7$ | Difference in the effect of age for someone in viewcat 4, compared to viewcat 1 |
For viewcat 1:
$Postnumb_i = b_0 + b_4Age_i + \epsilon_i$
For viewcat 2:
$Postnumb_i = b_0 + b_1D_{view2i} + b_4Age_i + b_5D_{view2i}Age_i + \epsilon_i = (b_0 + b_1) + (b_4 + b_5)Age_i + \epsilon_i$
Etc.
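The group-specific equations can also be read off the fitted coefficients: for a given group, add the dummy coefficient to the intercept and the interaction coefficient to the slope. A sketch on the built-in `mtcars` data (chosen only for illustration; `cyl` plays the role of viewcat and `wt` the role of age):

```r
# Illustration on built-in data (mtcars), not the course data
d <- mtcars
d$cyl <- factor(d$cyl)   # levels "4", "6", "8"; "4" is the reference group

b <- coef(lm(mpg ~ cyl * wt, data = d))

# Regression line for the reference group (cyl = 4)
c(intercept = unname(b["(Intercept)"]), slope = unname(b["wt"]))

# Regression line for cyl = 6: add the dummy and interaction coefficients
int_6   <- unname(b["(Intercept)"] + b["cyl6"])
slope_6 <- unname(b["wt"] + b["cyl6:wt"])
c(intercept = int_6, slope = slope_6)

# The same line, estimated as a separate regression within the cyl = 6 group
coef(lm(mpg ~ wt, data = d[d$cyl == "6", ]))
```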
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.k
What do you conclude about the interaction between age and viewcat?
### Question 3.l
Note that you can also look at this problem as an ANCOVA. What are the research question and null hypothesis in this case?
<details>
<summary>Click for explanation</summary>
RQ: Is there a significant difference between the marginal means of postnumb by viewcat, after controlling for age?
$H_0:$ After controlling for age, the means of postnumb are equal in all groups.
`r if(knitr::is_html_output()){"\\details"}`
### Question 3.m
Perform this analysis as an ANCOVA.
*Hint: Add* `-1` *to a formula to drop the intercept.*
<details>
<summary>Click for explanation</summary>
To drop the intercept from the analysis, and estimate the marginal means for all viewcat categories, we can add `-1` (minus the intercept) to the formula:
```{r, echo = TRUE}
results_ancov <- aov(POSTNUMB~AGE+VIEWCAT-1, data)
```
`r if(knitr::is_html_output()){"\\details"}`
Examine the parameter estimates of the ANCOVA. What do the parameter estimates represent?
<details>
<summary>Click for explanation</summary>
We use summary.lm() again to obtain the parameter estimates:
```{r, echo = TRUE}
summary.lm(results_ancov)
```
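The parameter estimates for VIEWCAT are the predicted means of each VIEWCAT category when age = 0. The estimate for AGE is the common slope of age, which is assumed to be the same in all categories.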
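A sketch that checks this interpretation on the built-in `mtcars` data (chosen only for illustration; `cyl` plays the role of VIEWCAT and `wt` the role of AGE): the factor coefficients equal the model's predictions for each group when the covariate is zero.

```r
# Illustration on built-in data (mtcars), not the course data
d <- mtcars
d$cyl <- factor(d$cyl)

fit <- aov(mpg ~ wt + cyl - 1, data = d)
coef(fit)   # the wt slope, then one estimate per cyl category

# Predicted value for each category at wt = 0
new_d <- data.frame(wt = 0,
                    cyl = factor(levels(d$cyl), levels = levels(d$cyl)))
pred_at_zero <- predict(fit, newdata = new_d)
pred_at_zero
```

Because a covariate value of zero is often outside the observed range, centering the covariate before fitting can make these estimates easier to interpret: they then refer to the mean of the covariate instead of zero.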
`r if(knitr::is_html_output()){"\\details"}`