# Week 4 - Home
```{r include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
show_answers <- TRUE
```
```{r echo = FALSE, warning=FALSE, message=FALSE}
library(foreign)
data <- read.spss("TCSM_student/SocialRejection.sav", to.data.frame = TRUE)
```
Load the data file SocialRejection.sav into R. It contains three variables: Condition (IV), SelfEst (IV), and Spent (DV).
### Question 1
Check the assumption of homogeneous regression lines (no interaction) first. What is your conclusion?
*Hint: You need to estimate one model with an interaction and one without, and compare them using the `anova()` function.*
<details>
<summary>Click for explanation</summary>
```{r, warning = FALSE, message = FALSE}
ancova_main <- aov(Spent ~ Condition + SelfEst, data = data)
ancova_int <- aov(Spent ~ Condition*SelfEst, data = data)
anova(ancova_main, ancova_int)
```
The interaction is not significant, so there is no evidence that the regression lines differ between conditions. The assumption of homogeneous regression lines is met.
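As a quick sanity check on what this comparison does, the sketch below uses simulated data (the data frame `sim` and all of its values are made up for illustration; this is not the SocialRejection data). It suggests that the F test from `anova()` on the two nested models is the same test as the one for the interaction row in the sequential ANOVA table of the larger model.

```{r}
# Simulated example data; names and values are illustrative assumptions
set.seed(123)
sim <- data.frame(
  Condition = factor(rep(c("rejection", "neutral", "confirming"), each = 20)),
  SelfEst   = rnorm(60, mean = 5, sd = 1)
)
sim$Spent <- 10 + 2 * (sim$Condition == "rejection") -
  1.5 * sim$SelfEst + rnorm(60)

sim_main <- lm(Spent ~ Condition + SelfEst, data = sim)  # parallel lines
sim_int  <- lm(Spent ~ Condition * SelfEst, data = sim)  # separate slopes

comparison <- anova(sim_main, sim_int)  # nested model comparison
seq_table  <- anova(sim_int)            # sequential ANOVA table

# F for the model comparison vs. F for the interaction row
F_comparison  <- comparison$F[2]
F_interaction <- seq_table["Condition:SelfEst", "F value"]
all.equal(F_comparison, F_interaction)
```

The two values agree because the interaction is the last term entered in the larger model, so both tests compare the same pair of residual sums of squares.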
</details>
### Question 2
What should you do when this assumption is violated?
### Question 3
Before you can do an ANCOVA, you should also check the assumption of homogeneity of variances. What does this assumption imply, and is it met?
*Hint: You did this in the Week 1 class exercise.*
<details>
<summary>Click for explanation</summary>
We can do a test of homogeneity of the variances of Spent across conditions:
```{r}
bartlett.test(formula = Spent~Condition, data = data)
```
However, note that the assumption of homogeneity actually requires the **residual** variance, after controlling for the covariate SelfEst, to be the same across conditions. We can extract these residuals from a regression with only SelfEst as a predictor:
```{r}
reg_selfest <- lm(Spent ~ SelfEst, data = data)
residuals_selfest <- reg_selfest$residuals
```
Then, we can test the null hypothesis that the error variance of the dependent variable is equal across groups:
```{r}
bartlett.test(residuals_selfest, data$Condition)
```
The test is not significant, so there is no evidence that the error variances differ across conditions. We treat the assumption as met.
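Bartlett's test is known to be sensitive to departures from normality. As a possible alternative, the sketch below applies the Fligner-Killeen test (`fligner.test()` from the base `stats` package) to the same kind of residuals. It runs on simulated data; `sim` and its values are assumptions for illustration only.

```{r}
# Simulated example data; names and values are illustrative assumptions
set.seed(42)
sim <- data.frame(
  Condition = factor(rep(c("rejection", "neutral", "confirming"), each = 20)),
  SelfEst   = rnorm(60, mean = 5, sd = 1)
)
sim$Spent <- 10 - 1.5 * sim$SelfEst + rnorm(60)

# Residuals after controlling for the covariate only
sim_resid <- residuals(lm(Spent ~ SelfEst, data = sim))

# Two tests of equal residual variance across conditions
bartlett_out <- bartlett.test(sim_resid, sim$Condition)
fligner_out  <- fligner.test(sim_resid, sim$Condition)

c(bartlett = bartlett_out$p.value, fligner = fligner_out$p.value)
```

If the two tests lead to different conclusions, it may be worth inspecting the distribution of the residuals before deciding.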
</details>
### Question 4
What should you do when this assumption is violated?
<details>
<summary>Click for explanation</summary>
You cannot really "solve" this problem in classical regression or ANCOVA, because only one parameter is estimated for the error variance. In SEM, however, you can estimate different error variance parameters for each group.
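To make the SEM option concrete, here is a minimal sketch of a multi-group model in lavaan, assuming simulated data with unequal error variances. The data frame `sim`, the labels `b`, `v1`, `v2`, and `v3`, and all numeric values are hypothetical. The slope of the covariate is constrained to be equal across groups by giving it the same label, while each group gets its own residual variance and intercept.

```{r}
library(lavaan)

# Simulated data with a different error SD per condition (assumed values)
set.seed(1)
n <- 40
sim <- data.frame(
  Condition = rep(c("rejection", "neutral", "confirming"), each = n),
  SelfEst   = rnorm(3 * n, mean = 5, sd = 1)
)
sd_by_group <- c(rejection = 1, neutral = 2, confirming = 3)
sim$Spent <- 2 + 0.5 * sim$SelfEst +
  rnorm(3 * n, sd = sd_by_group[sim$Condition])

# Same slope label in all groups; separate residual variance labels
model <- '
  Spent ~ c(b, b, b) * SelfEst
  Spent ~~ c(v1, v2, v3) * Spent
'
fit_groups <- sem(model, data = sim, group = "Condition")

# One residual variance estimate per group
pe <- parameterEstimates(fit_groups)
resid_var <- pe[pe$op == "~~" & pe$lhs == "Spent", c("group", "label", "est")]
resid_var
```

The intercepts are left free across groups here, so group differences in the conditional means are still estimated, as in an ANCOVA.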
</details>
### Question 5
Run the actual ANCOVA (or use previous output). What are your conclusions about the effects of the factor and the covariate?
<details>
<summary>Click for explanation</summary>
```{r}
summary(ancova_main)
```
Self-esteem is significant, F(1, 55) = 29.118, p < .001: the respondent's level of self-esteem is related to the amount spent.
Condition is significant after controlling for the effect of self-esteem, F(2, 55) = 4.402, p = .017: the amount spent differs between the three conditions.
</details>
### Question 6
Let's examine the differences in conditional means between the three conditions. In order to do so, we can use several approaches.
#### Approach 1: Conditional means
We can obtain the conditional means of the three groups by asking for the predicted (expected) value, based on the model, for each of the three conditions, keeping the covariate constant at 0. For this, we apply the `predict()` function to the object containing our analysis. We make a small new dataset for the values that we want predictions for:
```{r}
new_data <- data.frame(Condition = c("rejection", "neutral", "confirming"),
                       SelfEst = c(0, 0, 0))
predict(ancova_main, new_data)
```
What are your conclusions about the three conditions (i.e., how do they differ)?
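A covariate value of 0 may lie outside the observed range, so the predicted means at 0 can be hard to interpret on their own. A common alternative is to evaluate the predictions at the sample mean of the covariate. The sketch below illustrates this on simulated data (`sim` and its values are assumptions, not the SocialRejection data). Because the model has no interaction, the differences between conditions are the same at either covariate value.

```{r}
# Simulated example data; names and values are illustrative assumptions
set.seed(7)
sim <- data.frame(
  Condition = factor(rep(c("rejection", "neutral", "confirming"), each = 20)),
  SelfEst   = rnorm(60, mean = 5, sd = 1)
)
sim$Spent <- 10 + 2 * (sim$Condition == "rejection") -
  1.5 * sim$SelfEst + rnorm(60)

sim_ancova <- aov(Spent ~ Condition + SelfEst, data = sim)

# One row per condition, using the factor levels of the fitted data
cond <- factor(levels(sim$Condition), levels = levels(sim$Condition))
at_zero <- data.frame(Condition = cond, SelfEst = 0)
at_mean <- data.frame(Condition = cond, SelfEst = mean(sim$SelfEst))

pred_zero <- predict(sim_ancova, at_zero)  # covariate fixed at 0
pred_mean <- predict(sim_ancova, at_mean)  # covariate fixed at its mean

data.frame(Condition = cond, at_zero = pred_zero, at_mean = pred_mean)
```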
#### Approach 2: Testing significance
We can test the significance for these differences using `TukeyHSD()` again, but to get the conditional means, we need to use the residuals from a model that includes only SelfEst, which we obtained before:
```{r}
reg_selfest <- lm(Spent ~ SelfEst, data = data)
residuals_selfest <- reg_selfest$residuals
anova_conditional <- aov(residuals_selfest ~ data$Condition)
TukeyHSD(anova_conditional)
```
Respondents in the rejection condition spent more than respondents in the neutral and confirming conditions. Note that the overall F test for Condition does not tell us which pairs of groups differ; the pairwise comparisons in the `TukeyHSD()` output do.
#### Approach 3: Plotting the difference
This is where R really shines: we can quickly put together a plot that shows the difference between groups, along with the raw data. We use the package `ggplot2`.
```{r}
library(ggplot2)
# Put the data for the plot together
plot_data <- data.frame(Spent_resid = residuals_selfest,
                        Condition = data$Condition)
# Basic plot; indicate that you want condition on the x-axis and Spent_resid on
# the y-axis
ggplot(plot_data, aes(x = Condition, y = Spent_resid)) +
  geom_boxplot() +            # Add a boxplot for each condition
  geom_jitter(width = .2) +   # Plot raw datapoints on top
  theme_bw()                  # Use a clean black-and-white theme
```
### Question 7
An AN(C)OVA can also be specified as a regression analysis. R automatically creates dummies. Use the `lm()` function instead of `aov()`, and compare the results.
<details>
<summary>Click for explanation</summary>
```{r}
ancova_main <- aov(Spent ~ Condition + SelfEst, data = data)
summary(ancova_main)
lm_main <- lm(Spent ~ Condition + SelfEst, data = data)
summary(lm_main)
```
</details>
You can get the conditional means directly from this `lm()` model by dropping the intercept, using `-1` (which means: minus the intercept) in the formula:
```{r}
lm_no_intercept <- lm(Spent ~ -1 + Condition + SelfEst, data = data)
summary(lm_no_intercept)
```
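To see why dropping the intercept yields the conditional means, the following sketch compares the `Condition` coefficients of a no-intercept model with the predictions of the usual model at a covariate value of 0. It uses simulated data; `sim` and all values are illustrative assumptions.

```{r}
# Simulated example data; names and values are illustrative assumptions
set.seed(99)
sim <- data.frame(
  Condition = factor(rep(c("rejection", "neutral", "confirming"), each = 20)),
  SelfEst   = rnorm(60, mean = 5, sd = 1)
)
sim$Spent <- 10 + 2 * (sim$Condition == "rejection") -
  1.5 * sim$SelfEst + rnorm(60)

sim_lm       <- lm(Spent ~ Condition + SelfEst, data = sim)
sim_lm_noint <- lm(Spent ~ -1 + Condition + SelfEst, data = sim)

# Coefficients of the three condition dummies in the no-intercept model
cond_coefs <- coef(sim_lm_noint)[paste0("Condition", levels(sim$Condition))]

# Predictions from the model with intercept, covariate fixed at 0
cond <- factor(levels(sim$Condition), levels = levels(sim$Condition))
pred_zero <- predict(sim_lm, data.frame(Condition = cond, SelfEst = 0))

all.equal(unname(cond_coefs), unname(pred_zero))
```

Both models are reparameterizations of the same fit, so the residuals are identical. The R-squared reported by `summary()` differs, however, because for a model without an intercept it is computed relative to zero rather than to the mean of the outcome.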
### Question 8
To perform this analysis as a structural equation model, we need to manually compute dummy variables. We can use the function `model.matrix()` to "expand" a factor variable into dummies:
```{r}
data_dummies <- model.matrix(~ -1 + Condition, data = data)
head(data_dummies)
```
We can then bind these columns with dummies to our original data using `cbind()` (column bind):
```{r}
data <- cbind(data, data_dummies)
head(data)
```
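The naming of the dummy columns matters for the next step, because the lavaan model refers to them by name. The toy example below (the data frame `toy` is made up for illustration) shows that `model.matrix()` names each column by pasting the factor name and the level together, and that with `-1` in the formula every row has exactly one dummy equal to 1.

```{r}
# Toy factor with the three condition labels; values are illustrative
toy <- data.frame(
  Condition = factor(c("rejection", "neutral", "confirming", "neutral"))
)

toy_dummies <- model.matrix(~ -1 + Condition, data = toy)

colnames(toy_dummies)  # factor name pasted to each level
rowSums(toy_dummies)   # each row belongs to exactly one condition
```

Because the three dummies always sum to 1, only two of them can enter a regression that also has an intercept; the omitted one acts as the reference category.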
Begin by specifying the model in lavaan like this:
![](week4home1.png)
<details>
<summary>Click for explanation</summary>
```{r}
library(lavaan)
ancova_lavaan <- sem('Spent ~ SelfEst + Conditionrejection + Conditionconfirming', data = data)
summary(ancova_lavaan)
```
To obtain a plot of these results and compare it to our picture above, use the `semPaths()` function from the `semPlot` package:
```{r}
library(semPlot)
semPaths(ancova_lavaan, whatLabels = "est", rotation = 2)
```
</details>
### Additional options
**Note:** When you are doing an ANCOVA (even as a regression model with dummies), you want to analyze both the covariance structure AND the mean structure. To include the latter in your analysis, you have to tell lavaan to include this by adding the argument `meanstructure = TRUE` in the fitting function:
```{r}
library(lavaan)
ancova_lavaan <- sem('Spent ~ SelfEst + Conditionrejection + Conditionconfirming',
                     data = data,
                     meanstructure = TRUE)
summary(ancova_lavaan)
```
To obtain the standardized results and the proportion of explained variance (= squared multiple correlation, i.e., $R^2$), you can use the options in the `summary()` function:
```{r}
summary(ancova_lavaan, standardized = TRUE, rsquare = TRUE)
```
### Question 9
Compare your results to those obtained with the regression analysis. What is your conclusion?
### Question 10
Check the model fit. What do you conclude?
*Note: Use* `summary()` *with the argument* `fit.measures = TRUE`*.*
<details>
<summary>Click for explanation</summary>
```{r}
summary(ancova_lavaan, fit.measures = TRUE)
```
This is a saturated model: the number of parameters to be estimated equals the number of observed statistics, so the model has zero degrees of freedom and fits the data perfectly. Here our interest is mainly in the parameter estimates, not in the model fit.
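The zero degrees of freedom can be checked directly with `fitMeasures()`. The sketch below refits the same kind of model on simulated data (`sim` and its values are assumptions for illustration) and extracts the number of free parameters, the degrees of freedom, and the chi-square statistic.

```{r}
library(lavaan)

# Simulated example data; names and values are illustrative assumptions
set.seed(2024)
sim <- data.frame(
  Condition = factor(rep(c("rejection", "neutral", "confirming"), each = 20)),
  SelfEst   = rnorm(60, mean = 5, sd = 1)
)
sim$Spent <- 10 + 2 * (sim$Condition == "rejection") -
  1.5 * sim$SelfEst + rnorm(60)

# Add one dummy column per condition, as in the exercise
sim <- cbind(sim, model.matrix(~ -1 + Condition, data = sim))

sim_fit <- sem('Spent ~ SelfEst + Conditionrejection + Conditionconfirming',
               data = sim,
               meanstructure = TRUE)

# Free parameters, degrees of freedom, and chi-square of the fitted model
sim_fit_stats <- fitMeasures(sim_fit, c("npar", "df", "chisq"))
sim_fit_stats
```

With zero degrees of freedom the chi-square statistic is essentially 0, so indices derived from it carry no information about how well the model describes the data.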
</details>