-
Notifications
You must be signed in to change notification settings - Fork 3
/
13-Week6_home.Rmd
271 lines (194 loc) · 10.8 KB
/
13-Week6_home.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
# Week 6 - Home
```{r include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
show_answers <- TRUE
```
```{r echo = FALSE, warning=FALSE, message=FALSE, eval=FALSE}
library(haven)
library(labelled)
library(semTools)
library(kableExtra)
library(lavaan)
library(psych)
data <- unlabelled(read_sav("suiciderisk.sav"))
```
```{r echo = FALSE, warning=FALSE, message=FALSE}
library(haven)
library(labelled)
library(semTools)
library(kableExtra)
library(lavaan)
library(psych)
data <- unlabelled(read_sav("TCSM_student/suiciderisk.sav"))
```
Exercise based on:
Metha, A., Chen, E, Mulvenon, S. and Dode, I. (1998). A Theoretical Model of Suicide Risk. *Archives of Suicide Research, 4*, p. 115-133.
Download the dataset “suiciderisk.sav” [here](https://github.com/cjvanlissa/TCSM_student). This is a dataset that was simulated from the covariance matrices in the original paper presented on p.123. The idea behind simulation is that you can recreate the dataset from some summary statistics. The covariance matrix suffices for doing this, as it summarizes all relations between all variables (see http://en.wikipedia.org/wiki/Computer_simulation for an introduction).
The new data matrix almost exactly corresponds to the data as presented in the paper, albeit that there are inconsistent data cells (negative- or our of range values for the scale of the original variables). You don’t have to worry about this now.
In the take home- and class exercise we will investigate the dataset on suicidesisk and test whether there is a possible moderation of gender.
### Question 1
Baron and Kenny (1986) present four cases to illustrate how moderation can be studied. If gender is our moderator, and either depression, hopelessness, selfesteem and/or substance abuse our independent variables, what case should we use to investigate moderation?
<details>
<summary>Click for explanation</summary>
Case 2: dichotomous moderator, continuous independent variables
\details
### Question 2
First, run bivariate correlations of all continuous independent variables with suicide risk. Do you think it is useful to investigate these predictors? What if one one the correlations is non-significant. Is it then still useful to study moderation?
*Note: You can obtain correlations using* `cor()` *or* `psych::corr.test()`
<details>
<summary>Click for explanation</summary>
A basic correlation matrix for the first 5 variables:
```{r}
cor(data[, 1:5])
```
A correlation table with p-values can be obtained using the `psych::corr.test()` function:
```{r}
library(psych)
corr.test(data[, 1:5])
```
Yes, correlations are similar to those in paper, and are all significant. Even when correlations are n.s., it is still useful to investigate moderation: there might be a suppression effect
\details
### Question 3
Examine the relationships between suicide risk and the four continuous predictors visually. What do you think about the relations?
<details>
<summary>Click for explanation</summary>
#### A psych solution
The package `psych' contains a function to help us visually check assumptions, such as linearity:
```{r}
pairs.panels(data[, 1:5],
density = TRUE, # show density plots
ellipses = TRUE # show correlation ellipses
)
```
#### Ggplot to build plots
In this case, psych is easier. But generally, we can make any plot we want using `ggplot2`. In ggplot, we add one plot element at a time. Here is an example for one variable:
```{r}
library(ggplot2)
ggplot(data, aes(x = subabuse, y = suirisk)) +
geom_point() + # Add scatter
geom_smooth() + # Add smooth trend
theme_bw() # Add black/white theme
```
You could also add a linear trend line:
```{r}
ggplot(data, aes(x = subabuse, y = suirisk)) +
geom_point() + # Add scatter
geom_smooth() + # Add smooth trend
theme_bw() + # Add black/white theme
# Add linear trend line (without confidence bound and red in color)
geom_smooth(method = "lm", color = "red", se = FALSE)
```
This seems linear; you can check the other variables!
\details
### Question 4
Build the scatterplots again, but now map participants' gender to the colour of the dots and smooth line. Do your conclusions differ from question 3?
<details>
<summary>Click for explanation</summary>
#### A psych solution
Again, `psych` has a standard solution for this situation:
```{r}
pairs.panels(data[, 1:5], bg = c("red", "blue")[data$gender], pch = 21)
```
In ggplot, we can make the plots more detailed. We can map gender to the color of the dots and smooth line using `aes(colour = gender)`:
```{r}
ggplot(data, aes(x = subabuse, y = suirisk, colour = gender)) +
geom_point() + # Add scatter
geom_smooth() + # Add smooth trend
theme_bw() # Add black/white theme
```
This seems linear; you can check the other variables! There are also some differences by gender, in range and steepness.
\details
### Question 5
Baron and Kenny (1986) note the correlational method (as we just used) has two serious deficiencies. We can investigate only one of them. Which? Please investigate whether this potential problem exists for the moderator gender.
*Note: Use* `psych::describeBy()` *or* `car::leveneTest`
<details>
<summary>Click for explanation</summary>
The variances might not be equal for all levels of gender. We can do a preliminary investigation using `describeBy`; a version of the `describe` that splits the data by group.
```{r, eval = FALSE}
library(psych)
describeBy(data[, 1:5], group = data$gender)
```
```{r, echo = FALSE}
library(psych)
print(describeBy(data[, 1:5], group = data$gender))
```
The variances turn out to be roughly the same for most variables, but there is a large difference in the variance of nartotic substance use. Let's apply Levene's test, which lives in the `car` package:
```{r}
library(car)
leveneTest(y = data$subabuse, group = data$gender)
```
If we want a Levene's test for all five continuous variables, it can be helpful to **apply** the function to the 5 columns of data. The function `lapply()` applies a second function (`leveneTest()`) to every element of a list; in this case - the five columns. The argument `group` stays the same for each variable:
```{r}
lapply(data[1:5], leveneTest, group = data$gender)
```
\details
### Question 6
Baron and Kenny also state how this problem might be resolved by looking at regression coefficients. Run a multiple regression model, with suicide risk as dependent variable, and all 4 independent variables as predictors. Run this model separately for both genders. Do you find any differences in the regression coefficients? (unstandardized)
<details>
<summary>Click for explanation</summary>
```{r}
# In the lm() function, ~ . means: use all predictors
summary(lm(suirisk ~ ., data = data[data$gender == "females", 1:5]))
summary(lm(suirisk ~ ., data = data[data$gender == "males", 1:5]))
```
A small difference for substance use (.12 males, .06 females) and depression (.05 females, .115 males)? Other than that, they are similar.
\details
### Question 7
What do you think, does it make sense to study the moderating role of gender in a model that explains suicide risk?
<details>
<summary>Click for explanation</summary>
Yes, all assumptions hold, so fine to do it.
\details
### Question 8
One way to study moderation (and this is something you might have done in an earlier statistics course), is to compute interaction terms of the moderator variable with the independent variables. R does this automatically, if you use the multiplication symbol `*` in your regression equation.
Run a hierarchical regression analysis: Build two regression models, and include all independent variables and gender in the first model, and add the interaction terms in the second model. Then, compare the fit of the two models using `anova()` Do you find that the interaction variables explain part of the variance in suicide risk over and above that of the independent variables?
<details>
<summary>Click for explanation</summary>
```{r}
reg_1 <- lm(suirisk ~ ., data)
summary(reg_1)
reg_2 <- lm(suirisk ~ gender * subabuse + gender * hopeless + gender * selfesteem + gender * depression, data)
summary(reg_2)
anova(reg_1, reg_2)
```
R2 goes from .20 to .22, so small difference. This difference is significant. The only significant interaction effect is gendermales:depression.
\details
### Question 9
Build a basic regression model in lavaan, see the figure below. You have built this model before. Compare your lavaan results to your earlier findings. What do you conclude?
![](week6home1.png)
<details>
<summary>Click for explanation</summary>
```{r}
library(lavaan)
model <- "suirisk ~ subabuse + hopeless + selfesteem + depression + gender"
fit <- sem(model, data)
summary(fit)
library(semPlot)
semPaths(fit, whatLabels = "est", rotation = 2)
```
Same results, except for a small difference in the standard errors, that results from the fact we use Ordinary Least Squares (OLS)-estimation in R and Maximum Likelihood (ML) in lavaan. Apart from that, we can also estimate the correlations/covariances between the independent variables in lavaan, that we cannot estimate in R.
\details
### Question 10
Do you have any idea why we find a Chi-square value of exactly 0.0?
<details>
<summary>Click for explanation</summary>
There are no degrees of freedom (the model is identical to the saturated model) left. This means we want to estimate as many paths as there are sample moments. So, all covariances between variables are accounted for in the model, and there is no room for any deviation between our “model” and the data.
\details
### Question 11
Mehta et al (1998) state that the theory of suicide risk is more complicated than we so far modeled using regression models. The effects of self-esteem and depression are mediated through hopelessness, while depression also explains self-esteem. On page 117-118 they summarize their expectations about the effects in 6 hypotheses (H1 and H2a-e). Build a mediation model in lavaan based on these hypotheses. Leave gender out of the model for now.
Run the model, and see whether the model fits by looking at the Chi-square(df), it’s p-value, CFI and RMSEA. In case your model does not fit, can you make it fit the data by making an alteration to the model? Note that your Chi-square values might differ slightly from those presented by Mehta et al (1998), because of data simulation procedures.
<details>
<summary>Click for explanation</summary>
The model fits when a covariance is added between substance use and depression (see figure 1 in the article). Then, Chi-square(3) =9.4, p=.025 CFI=.992, RMSEA=.064. This is probably what Mehta et al. (1998) did as well, without showing us.
```{r}
model <- "
suirisk ~ hopeless + depression + subabuse
hopeless ~ depression + selfesteem
selfesteem ~ depression
subabuse ~~ depression
"
fit <- sem(model, data)
summary(fit, fit.measures = TRUE)
semPaths(fit, whatLabels = "est", rotation = 2)
```
\details