13-Week6_home.Rmd

# Week 6 - Home

```{r include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
show_answers <- TRUE
```
```{r echo = FALSE, warning=FALSE, message=FALSE, eval=FALSE}
library(haven)
library(labelled)
library(semTools)
library(kableExtra)
library(lavaan)
library(psych)
data <- unlabelled(read_sav("suiciderisk.sav"))
```
```{r echo = FALSE, warning=FALSE, message=FALSE}
library(haven)
library(labelled)
library(semTools)
library(kableExtra)
library(lavaan)
library(psych)
data <- unlabelled(read_sav("TCSM_student/suiciderisk.sav"))
```

Exercise based on: 

Metha, A., Chen, E, Mulvenon, S. and Dode, I. (1998). A Theoretical Model of Suicide Risk. *Archives of Suicide Research, 4*, p.  115-133.

Download the dataset “suiciderisk.sav”  [here](https://github.com/cjvanlissa/TCSM_student). This is a dataset that was simulated from the covariance matrices in the original paper presented on p.123. The idea behind simulation is that you can recreate the dataset from some summary statistics. The covariance matrix suffices for doing this, as it summarizes all relations between all variables (see http://en.wikipedia.org/wiki/Computer_simulation for an introduction).
The new data matrix almost exactly corresponds to the data as presented in the paper, albeit that there are inconsistent data cells (negative- or our of range values for the scale of the original variables). You don’t have to worry about this now.

In the take home- and class exercise we will investigate the dataset on suicidesisk and test whether there is a possible moderation of gender.

### Question 1

Baron and Kenny (1986) present four cases to illustrate how moderation can be studied. If gender is our moderator, and either depression, hopelessness, selfesteem and/or substance abuse our independent variables, what case should we use to investigate moderation?

<details>
  <summary>Click for explanation</summary>
Case 2: dichotomous moderator, continuous independent variables
\details

### Question 2

First, run bivariate correlations of all continuous independent variables with suicide risk. Do you think it is useful to investigate these predictors? What if one one the correlations is non-significant. Is it then still useful to study moderation?

*Note: You can obtain correlations using* `cor()` *or* `psych::corr.test()`

<details>
  <summary>Click for explanation</summary>
A basic correlation matrix for the first 5 variables:

```{r}
cor(data[, 1:5])
```

A correlation table with p-values can be obtained using the `psych::corr.test()` function:

```{r}
library(psych)
corr.test(data[, 1:5])
```
Yes, correlations are similar to those in paper, and are all significant. Even when correlations are n.s., it is still useful to investigate moderation: there might be a suppression effect
\details
 
### Question 3

Examine the relationships between suicide risk and the four continuous predictors visually. What do you think about the relations?

<details>
  <summary>Click for explanation</summary>

#### A psych solution

The package `psych' contains a function to help us visually check assumptions, such as linearity:

```{r}
pairs.panels(data[, 1:5], 
             density = TRUE, # show density plots
             ellipses = TRUE # show correlation ellipses
             )
```

#### Ggplot to build plots

In this case, psych is easier. But generally, we can make any plot we want using `ggplot2`. In ggplot, we add one plot element at a time. Here is an example for one variable:

```{r}
library(ggplot2)
ggplot(data, aes(x = subabuse, y = suirisk)) +
  geom_point() +  # Add scatter
  geom_smooth() + # Add smooth trend
  theme_bw()      # Add black/white theme
```

You could also add a linear trend line:

```{r}
ggplot(data, aes(x = subabuse, y = suirisk)) +
  geom_point() +  # Add scatter
  geom_smooth() + # Add smooth trend
  theme_bw() +    # Add black/white theme
  # Add linear trend line (without confidence bound and red in color)
  geom_smooth(method = "lm", color = "red", se = FALSE)
```

This seems linear; you can check the other variables!
\details
 	 
### Question 4

Build the scatterplots again, but now map participants' gender to the colour of the dots and smooth line. Do your conclusions differ from question 3?

<details>
  <summary>Click for explanation</summary>

#### A psych solution

Again, `psych` has a standard solution for this situation:

```{r}
pairs.panels(data[, 1:5], bg = c("red", "blue")[data$gender], pch = 21)
```        

In ggplot, we can make the plots more detailed. We can map gender to the color of the dots and smooth line using `aes(colour = gender)`:

```{r}
ggplot(data, aes(x = subabuse, y = suirisk, colour = gender)) +
  geom_point() +  # Add scatter
  geom_smooth() + # Add smooth trend
  theme_bw()      # Add black/white theme
```

This seems linear; you can check the other variables! There are also some differences by gender, in range and steepness.
\details


### Question 5

Baron and Kenny (1986) note the correlational method (as we just used) has two serious deficiencies. We can investigate only one of them. Which? Please investigate whether this potential problem exists for the moderator gender.

*Note: Use* `psych::describeBy()` *or* `car::leveneTest`

<details>
  <summary>Click for explanation</summary>

The variances might not be equal for all levels of gender. We can do a preliminary investigation using `describeBy`; a version of the `describe` that splits the data by group.

```{r, eval = FALSE}
library(psych)
describeBy(data[, 1:5], group = data$gender)
```
```{r, echo = FALSE}
library(psych)
print(describeBy(data[, 1:5], group = data$gender))
```

The variances turn out to be roughly the same for most variables, but there is a large difference in the variance of nartotic substance use. Let's apply Levene's test, which lives in the `car` package:

```{r}
library(car)
leveneTest(y = data$subabuse, group = data$gender)
```

If we want a Levene's test for all five continuous variables, it can be helpful to **apply** the function to the 5 columns of data. The function `lapply()` applies a second function (`leveneTest()`) to every element of a list; in this case - the five columns. The argument `group` stays the same for each variable:

```{r}
lapply(data[1:5], leveneTest, group = data$gender)
```
\details

### Question 6

Baron and Kenny also state how this problem might be resolved by looking at regression coefficients. Run a multiple regression model, with suicide risk as dependent variable, and all 4 independent variables as predictors. Run this model separately for both genders. Do you find any differences in the regression coefficients? (unstandardized)

<details>
  <summary>Click for explanation</summary>
```{r}
 # In the lm() function, ~ . means: use all predictors
summary(lm(suirisk ~ ., data = data[data$gender == "females", 1:5]))
summary(lm(suirisk ~ ., data = data[data$gender == "males", 1:5]))
```
A small difference for substance use (.12 males, .06 females) and depression (.05 females, .115 males)? Other than that, they are similar.

\details

### Question 7

What do you think, does it make sense to study the moderating role of gender in a model that explains suicide risk?

<details>
  <summary>Click for explanation</summary>
Yes, all assumptions hold, so fine to do it.
\details

### Question 8

One way to study moderation (and this is something you might have done in an earlier statistics course), is to compute interaction terms of the moderator variable with the independent variables. R does this automatically, if you use the multiplication symbol `*` in your regression equation. 

Run a hierarchical regression analysis: Build two regression models, and include all independent variables and gender in the first model, and add the interaction terms in the second model. Then, compare the fit of the two models using `anova()` Do you find that the interaction variables explain part of the variance in suicide risk over and above that of the independent variables?


<details>
  <summary>Click for explanation</summary>
```{r}
reg_1 <- lm(suirisk ~ ., data)
summary(reg_1)
reg_2 <- lm(suirisk ~ gender * subabuse + gender * hopeless + gender * selfesteem + gender * depression, data)
summary(reg_2)

anova(reg_1, reg_2)
```
R2 goes from .20 to .22, so small difference. This difference is significant. The only significant interaction effect is gendermales:depression.

\details

### Question 9

Build a basic regression model in lavaan, see the figure below. You have built this model before. Compare your lavaan results to your earlier findings. What do you conclude?

![](week6home1.png)

<details>
  <summary>Click for explanation</summary>
  
```{r}
library(lavaan)
model <- "suirisk ~ subabuse + hopeless + selfesteem + depression + gender"
fit <- sem(model, data)
summary(fit)

library(semPlot)
semPaths(fit, whatLabels = "est", rotation = 2)
```

Same results, except for a small difference in the standard errors, that results from the fact we use Ordinary Least Squares (OLS)-estimation in R and Maximum Likelihood (ML) in lavaan. Apart from that, we can also estimate the correlations/covariances between the independent variables in lavaan, that we cannot estimate in R.
\details 

### Question 10

Do you have any idea why we find a Chi-square value of exactly 0.0?

<details>
  <summary>Click for explanation</summary>
There are no degrees of freedom (the model is identical to the saturated model) left. This means we want to estimate as many paths as there are sample moments. So, all covariances between variables are accounted for in the model, and there is no room for any deviation between our “model” and the data.
\details

### Question 11

Mehta et al (1998) state that the theory of suicide risk is more complicated than we so far modeled using regression models. The effects of self-esteem and depression are mediated through hopelessness, while depression also explains self-esteem. On page 117-118 they summarize their expectations about the effects in 6 hypotheses (H1 and H2a-e). Build a mediation model in lavaan based on these hypotheses. Leave gender out of the model for now. 

Run the model, and see whether the model fits by looking at the Chi-square(df), it’s p-value, CFI and RMSEA. In case your model does not fit, can you make it fit the data by making an alteration to the model? Note that your Chi-square values might differ slightly from those presented by Mehta et al (1998), because of data simulation procedures.

<details>
  <summary>Click for explanation</summary>
The model fits when a covariance is added between substance use and depression (see figure 1 in the article). Then, Chi-square(3) =9.4, p=.025 CFI=.992, RMSEA=.064. This is probably what Mehta et al. (1998) did as well, without showing us.
  
```{r}
model <- "
suirisk ~ hopeless + depression + subabuse
hopeless ~ depression + selfesteem
selfesteem ~ depression
subabuse ~~ depression
"

fit <- sem(model, data)
summary(fit, fit.measures = TRUE)
semPaths(fit, whatLabels = "est", rotation = 2)
```
\details