# Week 7 - Putting it all together
```{r include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
show_answers <- TRUE
```
**Path analysis on Theory of Reasoned Action data (file: toradata.csv)**
A popular theory in psychology for explaining social behavior is the Theory of Reasoned Action (TORA) of Ajzen and Fishbein (sometimes called the Ajzen-Fishbein model). It states that behavior is predicted by behavioral intention, which is in turn predicted by the attitude toward the behavior and the subjective norm about the behavior. Ajzen and Fishbein originally proposed a complex way to measure attitude and norm, but in practice researchers usually measure the attitude by including a number of attitude questions, and the norm by including a number of questions about perceived social norms (e.g., how many of your friends would do...). The intention is usually measured by a single question on a 7- or 9-point scale, and the behavior is either a dichotomous outcome (yes/no) or a frequency (how often do you...). Later, a third determinant was added: perceived behavioral control. If people feel they have little control over their behavior, this will also influence their behavior.
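To make this structure concrete, the core of the TORA can be written as two regression equations in `lavaan` syntax. The sketch below uses placeholder variable names; the exercises use the actual names from the data:
```{r, eval=FALSE}
# Structural sketch of the TORA (placeholder variable names):
# attitude, norm, and perceived control predict intention,
# and intention predicts behavior.
tora_structure <- 'intention ~ attitude + norm + control
                   behavior ~ intention'
```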
### Loading the data
The TORA data are an artificial data set following results found by Reinecke (1998). The behavior under investigation is condom use by adolescents aged 16 to 24. The dependent variable ‘condom use’ is measured on a 5-point frequency scale (How often do you...), and the behavioral intention on a similar 5-point scale (In general, do you intend to...). There are three attitude items about condom use (e.g., using a condom is awkward), three normative items (e.g., I think most of my friends would use...), and three control items (e.g., I know well how to use a condom), all measured on a 5-point Likert-type scale.
<details>
<summary>Click for explanation</summary>
```{r}
df <- read.csv("TCSM_student/toradata.csv", stringsAsFactors = TRUE)
```
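Before modeling, it is a good idea to verify that the data were read in correctly:
```{r}
# Quick sanity check: variable names, types, and the first few rows
str(df)
head(df)
```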
</details>
### Question 1
We have three indicators each for attitude, norm, and control, plus the observed variables intention and behavior. Carry out a factor analysis for the latent variables attitude, norm, and control, and interpret the factor loadings of the indicators for each factor.
<details>
<summary>Click for explanation</summary>
```{r}
library(lavaan)

# Three-factor measurement model: one latent variable per set of indicators
TORA_cfa <- 'attit =~ attit_1 + attit_2 + attit_3
             norm =~ norm_1 + norm_2 + norm_3
             control =~ control_1 + control_2 + control_3'
fit <- cfa(TORA_cfa, data = df)
summary(fit, fit.measures = TRUE)
```
</details>
### Question 2
Set up a TORA model with attitude and norm as latent variables, both predicting behavioral intention, and intention predicting behavior. Analyze it and interpret the model fit. If the model fits, look at the results. How much variance does the model explain?
<details>
<summary>Click for explanation</summary>
```{r}
TORA_sem <- 'attit =~ attit_1 + attit_2 + attit_3
norm =~ norm_1 + norm_2 + norm_3
intent ~ attit + norm
behavior ~ intent'
fit <- sem(TORA_sem, data = df)
summary(fit, fit.measures = TRUE, rsquare = TRUE)
```
</details>
### Question 3
Add control as a latent variable. Assume control has an effect on intention. Analyze this model and interpret the results. How much variance does the model explain?
<details>
<summary>Click for explanation</summary>
The option `rsquare = TRUE` tells us how much variance is explained in each dependent variable:
```{r}
TORA_sem <- 'attit =~ attit_1 + attit_2 + attit_3
norm =~ norm_1 + norm_2 + norm_3
control =~ control_1 + control_2 + control_3
intent ~ attit + norm + control
behavior ~ intent'
fit <- sem(TORA_sem, data = df)
summary(fit, fit.measures = TRUE, rsquare = TRUE)
```
</details>
### Question 4
The TORA model forbids direct paths from attitude and norm to actual
behavior; their effects should be fully mediated by the behavioral intention. The effect of
control can be either direct or indirect; the TORA does not specify which. Test a model with both a direct and an indirect effect of control on behavior, and decide whether you want to keep or remove the direct path. Decide which model you accept as the best model for these data, and explain how you made this decision.
<details>
<summary>Click for explanation</summary>
```{r}
TORA_dir <- 'attit =~ attit_1 + attit_2 + attit_3
norm =~ norm_1 + norm_2 + norm_3
control =~ control_1 + control_2 + control_3
intent ~ attit + norm + control
behavior ~ intent + control'
TORA_ind <- 'attit =~ attit_1 + attit_2 + attit_3
norm =~ norm_1 + norm_2 + norm_3
control =~ control_1 + control_2 + control_3
intent ~ attit + norm + control
behavior ~ intent'
fit_dir <- sem(TORA_dir, data = df)
fit_ind <- sem(TORA_ind, data = df)
```
To compare the models, we can use a $\chi^2$-difference test via the `anova()` function, or `semTools::compareFit()`:
```{r}
anova(fit_dir, fit_ind)
library(semTools)
compareFit(Direct = fit_dir, Indirect = fit_ind)
```
</details>
### Question 5
Use `tidySEM::graph_sem()` to plot both models, and check whether the picture corresponds with theory and with the way you intended to specify the model.
<details>
<summary>Click for explanation</summary>
```{r}
library(tidySEM)
graph_sem(fit_dir)
graph_sem(fit_ind)
```
</details>
### Question 6
Before you learned about latent variables, you might have analyzed these data by computing sum or mean scores for attitude, norms, and control, and using these as observed variables in a path model. It is possible to obtain mean scale scores using `tidySEM`, as follows:
```{r, message=FALSE}
library(dplyr)
# Keep only the nine indicator items (the attitude, norm, and control columns)
df[, c(2:7, 10:12)] %>%
  tidy_sem() %>%
  create_scales() -> scales
```
The `scales` object contains a table with scale descriptive statistics:
```{r}
scales$descriptives
```
Use the mean scores of the variables to set up a TORA model,
and compare the results with the previous analysis.
Which model would you prefer, and why?
*Hint: Use* `cbind()` *to add the scale scores to* `df`
<details>
<summary>Click for explanation</summary>
```{r}
library(dplyr)

# Check the variable names, then compute mean scale scores
# from the nine indicator items only
names(df)
df[, c(2:7, 10:12)] %>%
  tidy_sem() %>%
  create_scales() -> scales

# Add the scale scores to the data and fit the path model on observed scores
df_obs <- cbind(df, scales$scores)
TORA_obs <- 'intent ~ attit + norm + control
             behavior ~ intent'
fit_obs <- sem(TORA_obs, data = df_obs)
summary(fit_obs, fit.measures = TRUE, rsquare = TRUE)
```
The fit of this new model is worse than that of the model with latent variables.
However, note that these models are **not nested**: they are estimated on different variables, so a formal nested model comparison is not possible, and `compareFit()` will warn us about this. We can only interpret the Model Fit Indices table and the Differences in Fit Indices, not the Nested Model Comparison.
```{r}
compareFit(Latent = fit_dir,
Observed = fit_obs)
```
We can also inspect the regression coefficients in both models:
```{r}
table_results(fit_dir) %>%
filter(grepl("ON", label))
```
```{r}
table_results(fit_obs) %>%
filter(grepl("ON", label))
```
</details>
### Moderated mediation
We now want to investigate whether the Theory of Reasoned Action for condom use is different for boys and girls. This is a typical case of a moderated mediation model.
### Question 7
Estimate the sex-moderated model. What happens to the fit of the model when you compare it to your earlier model that didn’t include sex?
<details>
<summary>Click for explanation</summary>
```{r}
fit_mod <- sem(TORA_sem, data = df, group = "sex")
summary(fit_mod, fit.measures = TRUE)
```
</details>
### Measurement invariance
Before conducting this analysis, you should check for measurement invariance. There are different 'levels' of measurement invariance:
1. Configural: The same model in both groups
2. Metric: The same factor loadings
3. Scalar: The same factor loadings and indicator intercepts
We can specify these models with `group.equal` constraints and then use `compareFit()`:
```{r}
library(semTools)
fit.config <- cfa(TORA_cfa, data = df, group = "sex")
fit.metric <- cfa(TORA_cfa, data = df, group = "sex",
group.equal = "loadings")
fit.scalar <- cfa(TORA_cfa, data = df, group = "sex",
group.equal = c("loadings", "intercepts"))
compareFit(Configural = fit.config,
Metric = fit.metric,
Scalar = fit.scalar)
```
You can see that imposing increasingly strict constraints does not significantly worsen model fit, so scalar measurement invariance is supported.
### Question 8
Going back to the multi-group model, impose scalar invariance, and then determine whether it makes sense to constrain all the regression coefficients to be equal. Does it?
<details>
<summary>Click for explanation</summary>
```{r}
fit_mod_invar <- sem(TORA_sem, data = df,
group = "sex",
group.equal = c("loadings", "intercepts"))
fit_mod_invar_reg <- sem(TORA_sem, data = df,
group = "sex",
group.equal = c("loadings", "intercepts", "regressions"))
compareFit(Multigroup = fit_mod_invar,
Constrain_regressions = fit_mod_invar_reg)
```
</details>
### Question 9
Depending on your answer to the previous question, investigate which regression coefficients of the TORA model are the same for boys and girls and which differ. Decide on a final model and report how you chose it.
With such a model, there are many ways to interpret the results: for example, you can look at the direct effects, the indirect effects, the explained variance, and so on. Look at your final model and report what you think are the most interesting results.
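<details>
<summary>Click for explanation</summary>
There is no single correct answer here. One reasonable strategy is to start from the fully constrained model from Question 8 and free one regression coefficient at a time. The sketch below assumes the objects `TORA_sem` and `fit_mod_invar_reg` from the previous question; the freed path `behavior~intent` is only an example, not necessarily the path you should free:
```{r, eval=FALSE}
# group.partial releases specific equality constraints, so we can free
# one regression at a time and test whether fit improves significantly.
fit_partial <- sem(TORA_sem, data = df,
                   group = "sex",
                   group.equal = c("loadings", "intercepts", "regressions"),
                   group.partial = "behavior~intent") # example path only
anova(fit_mod_invar_reg, fit_partial)

# lavTestScore() suggests which equality constraints would improve
# model fit most if they were released.
lavTestScore(fit_mod_invar_reg)
```
</details>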