diff --git a/docs/in-class-exercises-6.html b/docs/in-class-exercises-6.html
index 5b4cba7..32724d4 100644
--- a/docs/in-class-exercises-6.html
+++ b/docs/in-class-exercises-6.html
@@ -424,12 +424,12 @@
As you saw in the lecture, measurement invariance testing allows us to
+empirically test for differences in the measurement model between groups. If we
+can establish measurement invariance, we can conclude that our latent factors
+represent the same hypothetical constructs, measured in the same way, in every
+group.
Anytime we make between-group comparisons (e.g., ANOVA, t-tests, moderation by
+group, etc.) we assume invariant measurement. That is, we assume that the scores
+we’re comparing have the same meaning in each group. When doing multiple-group
+SEM, however, we can actually test this important, and often violated,
+assumption.
The process of testing measurement invariance can get quite complex, but the
+basic procedure boils down to using model comparison tests to evaluate the
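+plausibility of increasingly strong between-group constraints. For most problems,
+these constraints amount to the following three levels:
+
+- Configural invariance: the same factor structure (the same pattern of free and
+fixed parameters) holds in every group.
+- Weak (metric) invariance: the factor loadings are equal across groups.
+- Strong (scalar) invariance: the factor loadings and the item intercepts are
+equal across groups.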
+
Multiple-Group SEM for Moderation
+
+
Now, we’re going to revisit the TORA model from the
+Week 6 In-Class Exercises and use a multiple-group
+model to test the moderating effect of sex.
+
+
+
+
Load the data contained in the toradata.csv file.
+
+
+
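+library(lavaan)   # cfa(), sem(), lavTestWald()
+library(semTools) # compareFit()
+library(magrittr) # %>% pipe
+
+condom <- read.csv("toradata.csv", stringsAsFactors = TRUE)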
+
+
+
Before we get to any moderation tests, however, we first need to establish
+measurement invariance. The first step in any multiple-group analysis that
+includes latent variables is measurement invariance testing.
+
+
+
+
+
Test for measurement invariance across sex groups in the three latent variables
+of the TORA model from 6.4.2.
+
+- Test configural, weak, and strong invariance.
+- Test for invariance in all three latent factors simultaneously.
+- Is full measurement invariance (i.e., up to and including strong invariance)
+supported?
+
+
+
+
+tora_cfa <- '
+ attitudes =~ attit_1 + attit_2 + attit_3
+ norms =~ norm_1 + norm_2 + norm_3
+ control =~ control_1 + control_2 + control_3
+'
+
+## Estimate the models:
+config <- cfa(tora_cfa, data = condom, group = "sex")
+weak <- cfa(tora_cfa, data = condom, group = "sex", group.equal = "loadings")
+strong <- cfa(tora_cfa,
+ data = condom,
+ group = "sex",
+ group.equal = c("loadings", "intercepts")
+ )
+
+## Check that everything went well:
+summary(config)
+## lavaan 0.6.16 ended normally after 54 iterations
+##
+## Estimator ML
+## Optimization method NLMINB
+## Number of model parameters 60
+##
+## Number of observations per group:
+## woman 161
+## man 89
+##
+## Model Test User Model:
+##
+## Test statistic 66.565
+## Degrees of freedom 48
+## P-value (Chi-square) 0.039
+## Test statistic for each group:
+## woman 42.623
+## man 23.941
+##
+## Parameter Estimates:
+##
+## Standard errors Standard
+## Information Expected
+## Information saturated (h1) model Structured
+##
+##
+## Group 1 [woman]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes =~
+## attit_1 1.000
+## attit_2 1.005 0.075 13.427 0.000
+## attit_3 -0.965 0.075 -12.878 0.000
+## norms =~
+## norm_1 1.000
+## norm_2 0.952 0.101 9.470 0.000
+## norm_3 0.879 0.101 8.742 0.000
+## control =~
+## control_1 1.000
+## control_2 0.794 0.144 5.526 0.000
+## control_3 0.989 0.152 6.523 0.000
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes ~~
+## norms 0.450 0.087 5.200 0.000
+## control 0.468 0.089 5.249 0.000
+## norms ~~
+## control 0.387 0.079 4.912 0.000
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 2.839 0.090 31.702 0.000
+## .attit_2 2.907 0.084 34.728 0.000
+## .attit_3 3.174 0.084 37.969 0.000
+## .norm_1 2.832 0.080 35.342 0.000
+## .norm_2 2.832 0.079 35.775 0.000
+## .norm_3 2.795 0.081 34.694 0.000
+## .control_1 2.851 0.082 34.755 0.000
+## .control_2 2.857 0.081 35.104 0.000
+## .control_3 2.888 0.081 35.877 0.000
+## attitudes 0.000
+## norms 0.000
+## control 0.000
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 0.398 0.059 6.739 0.000
+## .attit_2 0.227 0.045 5.011 0.000
+## .attit_3 0.294 0.048 6.092 0.000
+## .norm_1 0.346 0.065 5.328 0.000
+## .norm_2 0.385 0.065 5.962 0.000
+## .norm_3 0.513 0.072 7.108 0.000
+## .control_1 0.587 0.090 6.531 0.000
+## .control_2 0.754 0.096 7.815 0.000
+## .control_3 0.557 0.086 6.453 0.000
+## attitudes 0.893 0.142 6.273 0.000
+## norms 0.688 0.121 5.706 0.000
+## control 0.497 0.119 4.184 0.000
+##
+##
+## Group 2 [man]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes =~
+## attit_1 1.000
+## attit_2 1.167 0.149 7.843 0.000
+## attit_3 -1.060 0.142 -7.443 0.000
+## norms =~
+## norm_1 1.000
+## norm_2 1.070 0.215 4.965 0.000
+## norm_3 0.922 0.189 4.869 0.000
+## control =~
+## control_1 1.000
+## control_2 0.995 0.290 3.435 0.001
+## control_3 0.949 0.285 3.332 0.001
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes ~~
+## norms 0.086 0.103 0.837 0.403
+## control 0.388 0.113 3.430 0.001
+## norms ~~
+## control 0.200 0.101 1.976 0.048
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 3.270 0.117 28.063 0.000
+## .attit_2 3.180 0.128 24.905 0.000
+## .attit_3 2.787 0.126 22.187 0.000
+## .norm_1 3.236 0.131 24.692 0.000
+## .norm_2 3.337 0.132 25.293 0.000
+## .norm_3 3.303 0.131 25.136 0.000
+## .control_1 3.157 0.111 28.415 0.000
+## .control_2 3.135 0.129 24.249 0.000
+## .control_3 3.213 0.130 24.805 0.000
+## attitudes 0.000
+## norms 0.000
+## control 0.000
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 0.440 0.095 4.631 0.000
+## .attit_2 0.405 0.109 3.699 0.000
+## .attit_3 0.541 0.112 4.822 0.000
+## .norm_1 0.740 0.175 4.230 0.000
+## .norm_2 0.647 0.182 3.555 0.000
+## .norm_3 0.868 0.175 4.972 0.000
+## .control_1 0.673 0.146 4.602 0.000
+## .control_2 1.066 0.197 5.417 0.000
+## .control_3 1.110 0.199 5.582 0.000
+## attitudes 0.768 0.182 4.220 0.000
+## norms 0.788 0.242 3.259 0.001
+## control 0.426 0.168 2.537 0.011
+summary(weak)
+## lavaan 0.6.16 ended normally after 37 iterations
+##
+## Estimator ML
+## Optimization method NLMINB
+## Number of model parameters 60
+## Number of equality constraints 6
+##
+## Number of observations per group:
+## woman 161
+## man 89
+##
+## Model Test User Model:
+##
+## Test statistic 68.557
+## Degrees of freedom 54
+## P-value (Chi-square) 0.088
+## Test statistic for each group:
+## woman 43.148
+## man 25.409
+##
+## Parameter Estimates:
+##
+## Standard errors Standard
+## Information Expected
+## Information saturated (h1) model Structured
+##
+##
+## Group 1 [woman]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes =~
+## attit_1 1.000
+## attit_2 (.p2.) 1.048 0.068 15.413 0.000
+## attit_3 (.p3.) -0.995 0.067 -14.762 0.000
+## norms =~
+## norm_1 1.000
+## norm_2 (.p5.) 0.977 0.091 10.708 0.000
+## norm_3 (.p6.) 0.889 0.089 9.996 0.000
+## control =~
+## cntrl_1 1.000
+## cntrl_2 (.p8.) 0.843 0.130 6.506 0.000
+## cntrl_3 (.p9.) 0.983 0.135 7.306 0.000
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes ~~
+## norms 0.431 0.082 5.256 0.000
+## control 0.450 0.083 5.395 0.000
+## norms ~~
+## control 0.378 0.075 5.031 0.000
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 2.839 0.088 32.194 0.000
+## .attit_2 2.907 0.084 34.445 0.000
+## .attit_3 3.174 0.084 37.913 0.000
+## .norm_1 2.832 0.080 35.504 0.000
+## .norm_2 2.832 0.080 35.571 0.000
+## .norm_3 2.795 0.080 34.727 0.000
+## .control_1 2.851 0.082 34.882 0.000
+## .control_2 2.857 0.082 34.746 0.000
+## .control_3 2.888 0.080 36.037 0.000
+## attitudes 0.000
+## norms 0.000
+## control 0.000
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 0.408 0.058 6.987 0.000
+## .attit_2 0.221 0.045 4.896 0.000
+## .attit_3 0.293 0.048 6.128 0.000
+## .norm_1 0.353 0.063 5.580 0.000
+## .norm_2 0.380 0.064 5.952 0.000
+## .norm_3 0.512 0.071 7.178 0.000
+## .control_1 0.590 0.088 6.695 0.000
+## .control_2 0.744 0.096 7.731 0.000
+## .control_3 0.565 0.085 6.663 0.000
+## attitudes 0.844 0.129 6.540 0.000
+## norms 0.672 0.113 5.921 0.000
+## control 0.485 0.110 4.417 0.000
+##
+##
+## Group 2 [man]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes =~
+## attit_1 1.000
+## attit_2 (.p2.) 1.048 0.068 15.413 0.000
+## attit_3 (.p3.) -0.995 0.067 -14.762 0.000
+## norms =~
+## norm_1 1.000
+## norm_2 (.p5.) 0.977 0.091 10.708 0.000
+## norm_3 (.p6.) 0.889 0.089 9.996 0.000
+## control =~
+## cntrl_1 1.000
+## cntrl_2 (.p8.) 0.843 0.130 6.506 0.000
+## cntrl_3 (.p9.) 0.983 0.135 7.306 0.000
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes ~~
+## norms 0.092 0.114 0.807 0.420
+## control 0.425 0.109 3.912 0.000
+## norms ~~
+## control 0.217 0.103 2.100 0.036
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 3.270 0.120 27.254 0.000
+## .attit_2 3.180 0.125 25.501 0.000
+## .attit_3 2.787 0.125 22.275 0.000
+## .norm_1 3.236 0.132 24.423 0.000
+## .norm_2 3.337 0.130 25.610 0.000
+## .norm_3 3.303 0.132 25.086 0.000
+## .control_1 3.157 0.112 28.208 0.000
+## .control_2 3.135 0.127 24.750 0.000
+## .control_3 3.213 0.131 24.540 0.000
+## attitudes 0.000
+## norms 0.000
+## control 0.000
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 0.419 0.093 4.528 0.000
+## .attit_2 0.438 0.099 4.436 0.000
+## .attit_3 0.540 0.107 5.057 0.000
+## .norm_1 0.704 0.158 4.456 0.000
+## .norm_2 0.692 0.153 4.520 0.000
+## .norm_3 0.864 0.164 5.271 0.000
+## .control_1 0.668 0.139 4.797 0.000
+## .control_2 1.110 0.186 5.960 0.000
+## .control_3 1.094 0.193 5.663 0.000
+## attitudes 0.862 0.166 5.200 0.000
+## norms 0.859 0.193 4.443 0.000
+## control 0.447 0.137 3.260 0.001
+summary(strong)
+## lavaan 0.6.16 ended normally after 60 iterations
+##
+## Estimator ML
+## Optimization method NLMINB
+## Number of model parameters 63
+## Number of equality constraints 15
+##
+## Number of observations per group:
+## woman 161
+## man 89
+##
+## Model Test User Model:
+##
+## Test statistic 72.050
+## Degrees of freedom 60
+## P-value (Chi-square) 0.137
+## Test statistic for each group:
+## woman 43.961
+## man 28.089
+##
+## Parameter Estimates:
+##
+## Standard errors Standard
+## Information Expected
+## Information saturated (h1) model Structured
+##
+##
+## Group 1 [woman]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes =~
+## attit_1 1.000
+## attit_2 (.p2.) 1.028 0.065 15.693 0.000
+## attit_3 (.p3.) -0.990 0.065 -15.114 0.000
+## norms =~
+## norm_1 1.000
+## norm_2 (.p5.) 0.998 0.089 11.182 0.000
+## norm_3 (.p6.) 0.918 0.088 10.467 0.000
+## control =~
+## cntrl_1 1.000
+## cntrl_2 (.p8.) 0.848 0.126 6.736 0.000
+## cntrl_3 (.p9.) 0.987 0.131 7.558 0.000
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes ~~
+## norms 0.428 0.081 5.259 0.000
+## control 0.454 0.084 5.438 0.000
+## norms ~~
+## control 0.372 0.073 5.060 0.000
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 (.25.) 2.864 0.085 33.535 0.000
+## .attit_2 (.26.) 2.887 0.083 34.826 0.000
+## .attit_3 (.27.) 3.166 0.082 38.500 0.000
+## .norm_1 (.28.) 2.816 0.078 36.330 0.000
+## .norm_2 (.29.) 2.838 0.078 36.453 0.000
+## .norm_3 (.30.) 2.812 0.078 36.253 0.000
+## .cntrl_1 (.31.) 2.847 0.078 36.562 0.000
+## .cntrl_2 (.32.) 2.859 0.076 37.381 0.000
+## .cntrl_3 (.33.) 2.891 0.077 37.531 0.000
+## attitds 0.000
+## norms 0.000
+## control 0.000
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 0.405 0.058 6.931 0.000
+## .attit_2 0.226 0.045 5.051 0.000
+## .attit_3 0.291 0.048 6.084 0.000
+## .norm_1 0.362 0.062 5.795 0.000
+## .norm_2 0.377 0.064 5.931 0.000
+## .norm_3 0.506 0.071 7.109 0.000
+## .control_1 0.592 0.088 6.743 0.000
+## .control_2 0.743 0.096 7.739 0.000
+## .control_3 0.566 0.085 6.684 0.000
+## attitudes 0.861 0.130 6.607 0.000
+## norms 0.650 0.109 5.950 0.000
+## control 0.482 0.108 4.477 0.000
+##
+##
+## Group 2 [man]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes =~
+## attit_1 1.000
+## attit_2 (.p2.) 1.028 0.065 15.693 0.000
+## attit_3 (.p3.) -0.990 0.065 -15.114 0.000
+## norms =~
+## norm_1 1.000
+## norm_2 (.p5.) 0.998 0.089 11.182 0.000
+## norm_3 (.p6.) 0.918 0.088 10.467 0.000
+## control =~
+## cntrl_1 1.000
+## cntrl_2 (.p8.) 0.848 0.126 6.736 0.000
+## cntrl_3 (.p9.) 0.987 0.131 7.558 0.000
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|)
+## attitudes ~~
+## norms 0.093 0.113 0.825 0.409
+## control 0.428 0.109 3.926 0.000
+## norms ~~
+## control 0.213 0.101 2.102 0.036
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 (.25.) 2.864 0.085 33.535 0.000
+## .attit_2 (.26.) 2.887 0.083 34.826 0.000
+## .attit_3 (.27.) 3.166 0.082 38.500 0.000
+## .norm_1 (.28.) 2.816 0.078 36.330 0.000
+## .norm_2 (.29.) 2.838 0.078 36.453 0.000
+## .norm_3 (.30.) 2.812 0.078 36.253 0.000
+## .cntrl_1 (.31.) 2.847 0.078 36.562 0.000
+## .cntrl_2 (.32.) 2.859 0.076 37.381 0.000
+## .cntrl_3 (.33.) 2.891 0.077 37.531 0.000
+## attitds 0.356 0.133 2.680 0.007
+## norms 0.480 0.133 3.602 0.000
+## control 0.318 0.116 2.733 0.006
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|)
+## .attit_1 0.420 0.094 4.484 0.000
+## .attit_2 0.456 0.100 4.557 0.000
+## .attit_3 0.537 0.107 5.023 0.000
+## .norm_1 0.724 0.157 4.599 0.000
+## .norm_2 0.686 0.153 4.489 0.000
+## .norm_3 0.859 0.165 5.220 0.000
+## .control_1 0.669 0.139 4.821 0.000
+## .control_2 1.109 0.186 5.958 0.000
+## .control_3 1.094 0.193 5.664 0.000
+## attitudes 0.872 0.167 5.214 0.000
+## norms 0.830 0.186 4.455 0.000
+## control 0.445 0.136 3.280 0.001
+## Test measurement invariance:
+compareFit(config, weak, strong) %>% summary()
+## ################### Nested Model Comparison #########################
+##
+## Chi-Squared Difference Test
+##
+## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
+## config 48 6021.5 6232.8 66.565
+## weak 54 6011.5 6201.6 68.557 1.9924 0 6 0.9204
+## strong 60 6003.0 6172.0 72.050 3.4934 0 6 0.7448
+##
+## ####################### Model Fit Indices ###########################
+## chisq df pvalue rmsea cfi tli srmr aic bic
+## config 66.565† 48 .039 .056 .979 .968 .048† 6021.476 6232.764
+## weak 68.557 54 .088 .046 .983 .978 .050 6011.469 6201.628
+## strong 72.050 60 .137 .040† .986† .983† .051 6002.962† 6171.992†
+##
+## ################## Differences in Fit Indices #######################
+## df rmsea cfi tli srmr aic bic
+## weak - config 6 -0.009 0.005 0.010 0.003 -10.008 -31.136
+## strong - weak 6 -0.006 0.003 0.006 0.001 -8.507 -29.635
+## Make sure the strongly invariant model still fits well in an absolute sense:
+fitMeasures(strong)
+## npar fmin chisq
+## 48.000 0.144 72.050
+## df pvalue baseline.chisq
+## 60.000 0.137 948.362
+## baseline.df baseline.pvalue cfi
+## 72.000 0.000 0.986
+## tli nnfi rfi
+## 0.983 0.983 0.909
+## nfi pnfi ifi
+## 0.924 0.770 0.986
+## rni logl unrestricted.logl
+## 0.986 -2953.481 -2917.456
+## aic bic ntotal
+## 6002.962 6171.992 250.000
+## bic2 rmsea rmsea.ci.lower
+## 6019.828 0.040 0.000
+## rmsea.ci.upper rmsea.ci.level rmsea.pvalue
+## 0.071 0.900 0.669
+## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0
+## 0.050 0.013 0.080
+## rmr rmr_nomean srmr
+## 0.062 0.067 0.051
+## srmr_bentler srmr_bentler_nomean crmr
+## 0.051 0.055 0.054
+## crmr_nomean srmr_mplus srmr_mplus_nomean
+## 0.059 0.052 0.054
+## cn_05 cn_01 gfi
+## 275.398 307.658 0.997
+## agfi pgfi mfi
+## 0.994 0.554 0.976
+## ecvi
+## 0.672
+
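The headline indices in the fitMeasures(strong) output can be recomputed from the chisq, df, baseline.chisq, baseline.df, and ntotal values it prints, which makes them easier to interpret. The sketch below uses the standard RMSEA, CFI, and TLI formulas. The helper fit_indices() is hypothetical. The multiple-group RMSEA (scaled by the square root of the number of groups, with n equal to the total sample size) is our assumption about lavaan's default computation, and it has been checked only against the printed values to about three decimals.

```r
# Hypothetical helper: standard RMSEA, CFI, and TLI formulas.
# Assumption: with multiple groups, RMSEA is scaled by sqrt(groups) and
# n is the total sample size (checked only against printed output).
fit_indices <- function(chisq, df, chisq_b, df_b, n, groups = 1) {
  rmsea <- sqrt(max(chisq - df, 0) / (df * n)) * sqrt(groups)
  cfi   <- 1 - max(chisq - df, 0) / max(chisq_b - df_b, chisq - df, 0)
  tli   <- (chisq_b / df_b - chisq / df) / (chisq_b / df_b - 1)
  c(rmsea = rmsea, cfi = cfi, tli = tli)
}

## Values copied from the fitMeasures(strong) output above:
strong_idx <- fit_indices(chisq = 72.050, df = 60,
                          chisq_b = 948.362, df_b = 72,
                          n = 250, groups = 2)
round(strong_idx, 3)
```

With the configural values (chisq = 66.565, df = 48) and the same baseline model, the function gives an RMSEA of about 0.056, a CFI of about 0.979, and a TLI of about 0.968. These values agree with the compareFit() table.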
+
+
+Yes, we have been able to establish full measurement invariance.
+
+- Configural invariance holds.
+  - The unrestricted, multiple-group model fits the data well
+(\(\chi^2[48] = 66.56\),
+\(p = 0.039\),
+\(\textit{RMSEA} = 0.056\),
+\(\textit{CFI} = 0.979\),
+\(\textit{SRMR} = 0.048\)). The \(\chi^2\) test is significant, but the
+approximate fit indices are all acceptable.
+- Weak invariance holds.
+  - The model comparison test shows a non-significant loss of fit between the
+configural and weak models (\(\Delta \chi^2[6] = 1.99\), \(p = 0.920\)).
+- Strong invariance holds.
+  - The model comparison test shows a non-significant loss of fit between the
+weak and strong models (\(\Delta \chi^2[6] = 3.49\), \(p = 0.745\)).
+  - The strongly invariant model still fits the data well
+(\(\chi^2[60] = 72.05\),
+\(p = 0.137\),
+\(\textit{RMSEA} = 0.040\),
+\(\textit{CFI} = 0.986\),
+\(\textit{SRMR} = 0.051\)).
+
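The p-values in the compareFit() table come from the chi-square difference test. The increase in chi-square between two nested models is referred to a chi-square distribution whose degrees of freedom equal the difference in the models' df. The minimal base-R sketch below reproduces both comparisons. The helper chisq_diff() is hypothetical and is not a lavaan or semTools function. Because the inputs are the rounded statistics from the printed output, the last digit may differ slightly from the table.

```r
# Hypothetical helper (not a lavaan or semTools function): chi-square
# difference test between a restricted model and a nested, freer model.
chisq_diff <- function(chisq_restricted, df_restricted, chisq_free, df_free) {
  d_chisq <- chisq_restricted - chisq_free   # loss of fit from the constraints
  d_df    <- df_restricted - df_free         # difference in degrees of freedom
  p       <- pchisq(d_chisq, df = d_df, lower.tail = FALSE)
  c(d_chisq = d_chisq, d_df = d_df, p = p)
}

## Test statistics copied from the compareFit() output above:
weak_vs_config <- chisq_diff(68.557, 54, 66.565, 48)
strong_vs_weak <- chisq_diff(72.050, 60, 68.557, 54)
round(rbind(weak_vs_config, strong_vs_weak), 3)
```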
+
+
+
+
+
Once we’ve established measurement invariance, we can move on to testing
+hypotheses about between-group differences, secure in the knowledge that our
+latent factors represent the same hypothetical constructs in all groups.
+
+
+
+
+
Estimate the full TORA model from 6.4.4 as a multiple-group model.
+
+- Use sex as the grouping variable.
+- Keep the strong invariance constraints in place.
+
+
+
+
+## Add the structural paths to the model:
+tora_sem <- paste(tora_cfa,
+ 'intent ~ attitudes + norms
+ behavior ~ intent + control',
+ sep = '\n')
+
+## Estimate the model:
+toraOut <- sem(tora_sem,
+ data = condom,
+ group = "sex",
+ group.equal = c("loadings", "intercepts")
+ )
+
+## Check the results:
+summary(toraOut, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
+## lavaan 0.6.16 ended normally after 62 iterations
+##
+## Estimator ML
+## Optimization method NLMINB
+## Number of model parameters 79
+## Number of equality constraints 17
+##
+## Number of observations per group:
+## woman 161
+## man 89
+##
+## Model Test User Model:
+##
+## Test statistic 141.903
+## Degrees of freedom 92
+## P-value (Chi-square) 0.001
+## Test statistic for each group:
+## woman 83.870
+## man 58.033
+##
+## Model Test Baseline Model:
+##
+## Test statistic 1378.913
+## Degrees of freedom 110
+## P-value 0.000
+##
+## User Model versus Baseline Model:
+##
+## Comparative Fit Index (CFI) 0.961
+## Tucker-Lewis Index (TLI) 0.953
+##
+## Loglikelihood and Information Criteria:
+##
+## Loglikelihood user model (H0) -3470.878
+## Loglikelihood unrestricted model (H1) -3399.927
+##
+## Akaike (AIC) 7065.756
+## Bayesian (BIC) 7284.087
+## Sample-size adjusted Bayesian (SABIC) 7087.541
+##
+## Root Mean Square Error of Approximation:
+##
+## RMSEA 0.066
+## 90 Percent confidence interval - lower 0.043
+## 90 Percent confidence interval - upper 0.087
+## P-value H_0: RMSEA <= 0.050 0.114
+## P-value H_0: RMSEA >= 0.080 0.137
+##
+## Standardized Root Mean Square Residual:
+##
+## SRMR 0.058
+##
+## Parameter Estimates:
+##
+## Standard errors Standard
+## Information Expected
+## Information saturated (h1) model Structured
+##
+##
+## Group 1 [woman]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## attitudes =~
+## attit_1 1.000 0.919 0.816
+## attit_2 (.p2.) 1.023 0.066 15.495 0.000 0.940 0.877
+## attit_3 (.p3.) -1.016 0.066 -15.434 0.000 -0.935 -0.884
+## norms =~
+## norm_1 1.000 0.808 0.798
+## norm_2 (.p5.) 0.956 0.083 11.551 0.000 0.772 0.766
+## norm_3 (.p6.) 0.942 0.084 11.256 0.000 0.761 0.743
+## control =~
+## cntrl_1 1.000 0.671 0.646
+## cntrl_2 (.p8.) 0.846 0.125 6.768 0.000 0.567 0.545
+## cntrl_3 (.p9.) 1.008 0.129 7.814 0.000 0.676 0.665
+##
+## Regressions:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## intent ~
+## attitudes 0.436 0.082 5.335 0.000 0.401 0.403
+## norms 0.598 0.100 6.008 0.000 0.483 0.486
+## behavior ~
+## intent 0.347 0.064 5.436 0.000 0.347 0.351
+## control 0.727 0.138 5.274 0.000 0.488 0.496
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## attitudes ~~
+## norms 0.425 0.081 5.262 0.000 0.572 0.572
+## control 0.464 0.082 5.634 0.000 0.752 0.752
+## norms ~~
+## control 0.386 0.073 5.320 0.000 0.712 0.712
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## .attit_1 (.31.) 2.863 0.084 34.071 0.000 2.863 2.542
+## .attit_2 (.32.) 2.884 0.082 35.225 0.000 2.884 2.690
+## .attit_3 (.33.) 3.168 0.081 39.065 0.000 3.168 2.995
+## .norm_1 (.34.) 2.802 0.076 36.874 0.000 2.802 2.767
+## .norm_2 (.35.) 2.830 0.075 37.555 0.000 2.830 2.807
+## .norm_3 (.36.) 2.796 0.076 36.725 0.000 2.796 2.730
+## .cntrl_1 (.37.) 2.855 0.077 37.078 0.000 2.855 2.749
+## .cntrl_2 (.38.) 2.866 0.076 37.909 0.000 2.866 2.753
+## .cntrl_3 (.39.) 2.897 0.076 37.969 0.000 2.897 2.849
+## .intent (.40.) 2.712 0.078 34.861 0.000 2.712 2.726
+## .behavir (.41.) 1.630 0.175 9.289 0.000 1.630 1.658
+## attitds 0.000 0.000 0.000
+## norms 0.000 0.000 0.000
+## control 0.000 0.000 0.000
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## .attit_1 0.423 0.059 7.172 0.000 0.423 0.333
+## .attit_2 0.266 0.045 5.879 0.000 0.266 0.231
+## .attit_3 0.246 0.043 5.671 0.000 0.246 0.219
+## .norm_1 0.372 0.060 6.230 0.000 0.372 0.363
+## .norm_2 0.420 0.062 6.760 0.000 0.420 0.413
+## .norm_3 0.469 0.066 7.063 0.000 0.469 0.448
+## .control_1 0.629 0.085 7.426 0.000 0.629 0.583
+## .control_2 0.762 0.094 8.088 0.000 0.762 0.703
+## .control_3 0.577 0.080 7.238 0.000 0.577 0.558
+## .intent 0.374 0.049 7.615 0.000 0.374 0.378
+## .behavior 0.391 0.052 7.486 0.000 0.391 0.404
+## attitudes 0.845 0.129 6.561 0.000 1.000 1.000
+## norms 0.653 0.108 6.048 0.000 1.000 1.000
+## control 0.450 0.101 4.457 0.000 1.000 1.000
+##
+## R-Square:
+## Estimate
+## attit_1 0.667
+## attit_2 0.769
+## attit_3 0.781
+## norm_1 0.637
+## norm_2 0.587
+## norm_3 0.552
+## control_1 0.417
+## control_2 0.297
+## control_3 0.442
+## intent 0.622
+## behavior 0.596
+##
+##
+## Group 2 [man]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## attitudes =~
+## attit_1 1.000 0.922 0.815
+## attit_2 (.p2.) 1.023 0.066 15.495 0.000 0.943 0.817
+## attit_3 (.p3.) -1.016 0.066 -15.434 0.000 -0.937 -0.782
+## norms =~
+## norm_1 1.000 0.875 0.723
+## norm_2 (.p5.) 0.956 0.083 11.551 0.000 0.837 0.679
+## norm_3 (.p6.) 0.942 0.084 11.256 0.000 0.825 0.667
+## control =~
+## cntrl_1 1.000 0.663 0.631
+## cntrl_2 (.p8.) 0.846 0.125 6.768 0.000 0.561 0.467
+## cntrl_3 (.p9.) 1.008 0.129 7.814 0.000 0.668 0.540
+##
+## Regressions:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## intent ~
+## attitudes 0.535 0.097 5.497 0.000 0.494 0.446
+## norms 0.858 0.111 7.702 0.000 0.751 0.679
+## behavior ~
+## intent 0.613 0.060 10.188 0.000 0.613 0.706
+## control 0.007 0.159 0.045 0.964 0.005 0.005
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## attitudes ~~
+## norms 0.060 0.108 0.552 0.581 0.074 0.074
+## control 0.426 0.107 3.978 0.000 0.697 0.697
+## norms ~~
+## control 0.220 0.096 2.291 0.022 0.380 0.380
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## .attit_1 (.31.) 2.863 0.084 34.071 0.000 2.863 2.531
+## .attit_2 (.32.) 2.884 0.082 35.225 0.000 2.884 2.499
+## .attit_3 (.33.) 3.168 0.081 39.065 0.000 3.168 2.645
+## .norm_1 (.34.) 2.802 0.076 36.874 0.000 2.802 2.314
+## .norm_2 (.35.) 2.830 0.075 37.555 0.000 2.830 2.296
+## .norm_3 (.36.) 2.796 0.076 36.725 0.000 2.796 2.261
+## .cntrl_1 (.37.) 2.855 0.077 37.078 0.000 2.855 2.719
+## .cntrl_2 (.38.) 2.866 0.076 37.909 0.000 2.866 2.385
+## .cntrl_3 (.39.) 2.897 0.076 37.969 0.000 2.897 2.344
+## .intent (.40.) 2.712 0.078 34.861 0.000 2.712 2.450
+## .behavir (.41.) 1.630 0.175 9.289 0.000 1.630 1.696
+## attitds 0.369 0.130 2.834 0.005 0.400 0.400
+## norms 0.534 0.126 4.246 0.000 0.610 0.610
+## control 0.309 0.115 2.691 0.007 0.466 0.466
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## .attit_1 0.430 0.091 4.710 0.000 0.430 0.336
+## .attit_2 0.443 0.094 4.691 0.000 0.443 0.333
+## .attit_3 0.556 0.108 5.130 0.000 0.556 0.388
+## .norm_1 0.699 0.137 5.088 0.000 0.699 0.477
+## .norm_2 0.819 0.150 5.457 0.000 0.819 0.539
+## .norm_3 0.849 0.153 5.538 0.000 0.849 0.555
+## .control_1 0.663 0.135 4.908 0.000 0.663 0.602
+## .control_2 1.130 0.187 6.025 0.000 1.130 0.782
+## .control_3 1.082 0.191 5.673 0.000 1.082 0.708
+## .intent 0.363 0.089 4.091 0.000 0.363 0.296
+## .behavior 0.460 0.069 6.671 0.000 0.460 0.498
+## attitudes 0.850 0.164 5.190 0.000 1.000 1.000
+## norms 0.766 0.171 4.495 0.000 1.000 1.000
+## control 0.440 0.133 3.301 0.001 1.000 1.000
+##
+## R-Square:
+## Estimate
+## attit_1 0.664
+## attit_2 0.667
+## attit_3 0.612
+## norm_1 0.523
+## norm_2 0.461
+## norm_3 0.445
+## control_1 0.398
+## control_2 0.218
+## control_3 0.292
+## intent 0.704
+## behavior 0.502
+
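You can sanity-check the degrees of freedom reported above (92 for the full model and 60 for the strongly invariant CFA). The df equal the number of observed moments minus the number of free parameters. With a mean structure, each group contributes p(p + 3)/2 moments: p means plus p(p + 1)/2 variances and covariances. Each equality constraint removes one free parameter. In the full model, the 17 constraints are the 6 equated loadings plus the 11 equated intercepts (9 indicators, intent, and behavior), as the (.31.) to (.41.) labels in the output show. The helper df_multigroup() below is hypothetical.

```r
# Hypothetical helper: df for a multiple-group model with a mean structure.
# Each group supplies p means plus p * (p + 1) / 2 variances/covariances,
# i.e. p * (p + 3) / 2 observed moments.
df_multigroup <- function(p, groups, n_par, n_eq) {
  moments <- groups * p * (p + 3) / 2
  free    <- n_par - n_eq   # each equality constraint removes one free parameter
  moments - free
}

## Strongly invariant CFA: 9 indicators, 63 parameters, 15 equality constraints
df_multigroup(p = 9, groups = 2, n_par = 63, n_eq = 15)    # 60
## Full TORA model: 9 indicators plus intent and behavior,
## 79 parameters, 17 equality constraints
df_multigroup(p = 11, groups = 2, n_par = 79, n_eq = 17)   # 92
```

The parameter and constraint counts are taken from the "Number of model parameters" and "Number of equality constraints" lines of each summary().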
+
+
+
+
+
Conduct an omnibus test to check if sex moderates any of the latent regression
+paths in the model from 7.4.3.3.
+
+
+
+## Estimate a restricted model wherein the latent regressions are all equated
+## across groups.
+toraOut0 <- sem(tora_sem,
+ data = condom,
+ group = "sex",
+ group.equal = c("loadings", "intercepts", "regressions")
+ )
+
+## Test the constraints:
+anova(toraOut, toraOut0)
+
+
+
+
+
+
+We can equate the latent regressions by adding "regressions" to the group.equal
+argument of sem(), alongside the strong invariance constraints. Then, we simply
+test this constrained model against the model from 7.4.3.3, where the slopes are
+free to differ, to get our test of moderation. Equating all regression paths
+across groups produces a significant loss of fit (\(\Delta \chi^2[4] = 52.86\),
+\(p < 0.001\)). Therefore, sex moderates at least one of these paths.
+
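The omnibus test has 4 degrees of freedom because group.equal = "regressions" equates four slopes: intent on attitudes and on norms, and behavior on intent and on control. The one-line base-R check below confirms that a chi-square difference of 52.86 on 4 df gives p < 0.001. It uses the value quoted in the explanation above rather than refitting the model.

```r
# group.equal = "regressions" equates four slopes (intent ~ attitudes,
# intent ~ norms, behavior ~ intent, behavior ~ control), so d_df = 4.
d_chisq <- 52.86   # chi-square difference reported in the explanation above
d_df    <- 4
p_value <- pchisq(d_chisq, df = d_df, lower.tail = FALSE)
p_value < 0.001    # TRUE
```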
+
+
+
+
+
+
Conduct a two-parameter test to check if sex moderates the effects of intent
+and control on behavior.
+
+- Use the lavTestWald() function to conduct your test.
+- Keep only the weak invariance constraints when estimating the model.
+
+
+
+
+## Add the structural paths to the model and assign labels:
+tora_sem <- paste(tora_cfa,
+ 'intent ~ attitudes + norms
+ behavior ~ c(b1f, b1m) * intent + c(b2f, b2m) * control',
+ sep = '\n')
+
+## Estimate the model:
+toraOut <- sem(tora_sem, data = condom, group = "sex", group.equal = "loadings")
+
+## Check the results:
+summary(toraOut, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
+## lavaan 0.6.16 ended normally after 60 iterations
+##
+## Estimator ML
+## Optimization method NLMINB
+## Number of model parameters 76
+## Number of equality constraints 6
+##
+## Number of observations per group:
+## woman 161
+## man 89
+##
+## Model Test User Model:
+##
+## Test statistic 119.722
+## Degrees of freedom 84
+## P-value (Chi-square) 0.006
+## Test statistic for each group:
+## woman 76.908
+## man 42.814
+##
+## Model Test Baseline Model:
+##
+## Test statistic 1378.913
+## Degrees of freedom 110
+## P-value 0.000
+##
+## User Model versus Baseline Model:
+##
+## Comparative Fit Index (CFI) 0.972
+## Tucker-Lewis Index (TLI) 0.963
+##
+## Loglikelihood and Information Criteria:
+##
+## Loglikelihood user model (H0) -3459.788
+## Loglikelihood unrestricted model (H1) -3399.927
+##
+## Akaike (AIC) 7059.576
+## Bayesian (BIC) 7306.078
+## Sample-size adjusted Bayesian (SABIC) 7084.172
+##
+## Root Mean Square Error of Approximation:
+##
+## RMSEA 0.058
+## 90 Percent confidence interval - lower 0.032
+## 90 Percent confidence interval - upper 0.081
+## P-value H_0: RMSEA <= 0.050 0.272
+## P-value H_0: RMSEA >= 0.080 0.058
+##
+## Standardized Root Mean Square Residual:
+##
+## SRMR 0.047
+##
+## Parameter Estimates:
+##
+## Standard errors Standard
+## Information Expected
+## Information saturated (h1) model Structured
+##
+##
+## Group 1 [woman]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## attitudes =~
+## attit_1 1.000 0.909 0.813
+## attit_2 (.p2.) 1.047 0.069 15.249 0.000 0.951 0.882
+## attit_3 (.p3.) -1.025 0.068 -15.075 0.000 -0.931 -0.882
+## norms =~
+## norm_1 1.000 0.824 0.809
+## norm_2 (.p5.) 0.936 0.083 11.256 0.000 0.771 0.768
+## norm_3 (.p6.) 0.908 0.084 10.810 0.000 0.748 0.734
+## control =~
+## cntrl_1 1.000 0.690 0.666
+## cntrl_2 (.p8.) 0.832 0.126 6.593 0.000 0.574 0.551
+## cntrl_3 (.p9.) 1.006 0.131 7.673 0.000 0.694 0.682
+##
+## Regressions:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## intent ~
+## attituds 0.441 0.083 5.294 0.000 0.400 0.403
+## norms 0.580 0.098 5.918 0.000 0.478 0.480
+## behavior ~
+## intent (b1f) 0.531 0.074 7.127 0.000 0.531 0.524
+## control (b2f) 0.490 0.130 3.767 0.000 0.338 0.336
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## attitudes ~~
+## norms 0.428 0.081 5.266 0.000 0.572 0.572
+## control 0.454 0.082 5.517 0.000 0.723 0.723
+## norms ~~
+## control 0.384 0.074 5.169 0.000 0.676 0.676
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## .attit_1 2.839 0.088 32.237 0.000 2.839 2.541
+## .attit_2 2.907 0.085 34.191 0.000 2.907 2.695
+## .attit_3 3.174 0.083 38.126 0.000 3.174 3.005
+## .norm_1 2.832 0.080 35.280 0.000 2.832 2.780
+## .norm_2 2.832 0.079 35.765 0.000 2.832 2.819
+## .norm_3 2.795 0.080 34.787 0.000 2.795 2.742
+## .control_1 2.851 0.082 34.875 0.000 2.851 2.749
+## .control_2 2.857 0.082 34.813 0.000 2.857 2.744
+## .control_3 2.888 0.080 35.985 0.000 2.888 2.836
+## .intent 2.677 0.078 34.159 0.000 2.677 2.692
+## .behavior 1.107 0.207 5.338 0.000 1.107 1.098
+## attitudes 0.000 0.000 0.000
+## norms 0.000 0.000 0.000
+## control 0.000 0.000 0.000
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## .attit_1 0.423 0.059 7.208 0.000 0.423 0.339
+## .attit_2 0.259 0.045 5.713 0.000 0.259 0.223
+## .attit_3 0.248 0.044 5.701 0.000 0.248 0.222
+## .norm_1 0.359 0.060 5.989 0.000 0.359 0.346
+## .norm_2 0.415 0.062 6.711 0.000 0.415 0.411
+## .norm_3 0.479 0.067 7.152 0.000 0.479 0.461
+## .control_1 0.599 0.085 7.027 0.000 0.599 0.557
+## .control_2 0.755 0.095 7.940 0.000 0.755 0.696
+## .control_3 0.555 0.081 6.822 0.000 0.555 0.535
+## .intent 0.382 0.050 7.657 0.000 0.382 0.386
+## .behavior 0.402 0.049 8.152 0.000 0.402 0.396
+## attitudes 0.825 0.127 6.495 0.000 1.000 1.000
+## norms 0.679 0.112 6.066 0.000 1.000 1.000
+## control 0.477 0.106 4.483 0.000 1.000 1.000
+##
+## R-Square:
+## Estimate
+## attit_1 0.661
+## attit_2 0.777
+## attit_3 0.778
+## norm_1 0.654
+## norm_2 0.589
+## norm_3 0.539
+## control_1 0.443
+## control_2 0.304
+## control_3 0.465
+## intent 0.614
+## behavior 0.604
+##
+##
+## Group 2 [man]:
+##
+## Latent Variables:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## attitudes =~
+## attit_1 1.000 0.921 0.812
+## attit_2 (.p2.) 1.047 0.069 15.249 0.000 0.964 0.831
+## attit_3 (.p3.) -1.025 0.068 -15.075 0.000 -0.944 -0.787
+## norms =~
+## norm_1 1.000 0.928 0.753
+## norm_2 (.p5.) 0.936 0.083 11.256 0.000 0.869 0.698
+## norm_3 (.p6.) 0.908 0.084 10.810 0.000 0.843 0.676
+## control =~
+## cntrl_1 1.000 0.669 0.634
+## cntrl_2 (.p8.) 0.832 0.126 6.593 0.000 0.556 0.464
+## cntrl_3 (.p9.) 1.006 0.131 7.673 0.000 0.673 0.547
+##
+## Regressions:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## intent ~
+## attituds 0.501 0.098 5.134 0.000 0.462 0.432
+## norms 0.749 0.112 6.696 0.000 0.695 0.649
+## behavior ~
+## intent (b1m) 0.344 0.086 4.005 0.000 0.344 0.453
+## control (b2m) 0.307 0.168 1.830 0.067 0.205 0.253
+##
+## Covariances:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## attitudes ~~
+## norms 0.084 0.113 0.742 0.458 0.098 0.098
+## control 0.424 0.107 3.960 0.000 0.688 0.688
+## norms ~~
+## control 0.240 0.102 2.361 0.018 0.387 0.387
+##
+## Intercepts:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## .attit_1 3.270 0.120 27.179 0.000 3.270 2.881
+## .attit_2 3.180 0.123 25.848 0.000 3.180 2.740
+## .attit_3 2.787 0.127 21.900 0.000 2.787 2.321
+## .norm_1 3.236 0.131 24.790 0.000 3.236 2.628
+## .norm_2 3.337 0.132 25.309 0.000 3.337 2.683
+## .norm_3 3.303 0.132 24.992 0.000 3.303 2.649
+## .control_1 3.157 0.112 28.223 0.000 3.157 2.992
+## .control_2 3.135 0.127 24.653 0.000 3.135 2.613
+## .control_3 3.213 0.130 24.627 0.000 3.213 2.610
+## .intent 3.427 0.113 30.233 0.000 3.427 3.205
+## .behavior 2.607 0.303 8.614 0.000 2.607 3.211
+## attitudes 0.000 0.000 0.000
+## norms 0.000 0.000 0.000
+## control 0.000 0.000 0.000
+##
+## Variances:
+## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
+## .attit_1 0.439 0.092 4.795 0.000 0.439 0.341
+## .attit_2 0.417 0.092 4.518 0.000 0.417 0.310
+## .attit_3 0.549 0.107 5.120 0.000 0.549 0.381
+## .norm_1 0.656 0.137 4.801 0.000 0.656 0.432
+## .norm_2 0.792 0.148 5.342 0.000 0.792 0.512
+## .norm_3 0.845 0.153 5.504 0.000 0.845 0.543
+## .control_1 0.666 0.134 4.955 0.000 0.666 0.598
+## .control_2 1.130 0.187 6.053 0.000 1.130 0.785
+## .control_3 1.063 0.187 5.669 0.000 1.063 0.701
+## .intent 0.385 0.086 4.471 0.000 0.385 0.337
+## .behavior 0.399 0.063 6.328 0.000 0.399 0.605
+## attitudes 0.849 0.164 5.174 0.000 1.000 1.000
+## norms 0.861 0.190 4.528 0.000 1.000 1.000
+## control 0.447 0.134 3.334 0.001 1.000 1.000
+##
+## R-Square:
+## Estimate
+## attit_1 0.659
+## attit_2 0.690
+## attit_3 0.619
+## norm_1 0.568
+## norm_2 0.488
+## norm_3 0.457
+## control_1 0.402
+## control_2 0.215
+## control_3 0.299
+## intent 0.663
+## behavior 0.395
+## Test the constraints:
+lavTestWald(toraOut, "b1f == b1m; b2f == b2m")
+## $stat
+## [1] 9.85773
+##
+## $df
+## [1] 2
+##
+## $p.value
+## [1] 0.00723471
+##
+## $se
+## [1] "standard"
+
+
+Click for explanation
+
+The Wald test suggests significant moderation (Wald \(\chi^2[2] = 9.86\),
+\(p = 0.007\)). Equating these two regression slopes across groups would produce a
+significant loss of fit, so sex appears to moderate at least one of these paths.
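The Wald test judges the equality constraints using only the unconstrained fit. A likelihood ratio (\(\Delta \chi^2\)) test, which actually estimates the constrained model, is a useful cross-check. The sketch below rests on several assumptions that are not confirmed by the text above: the grouping variable in `toradata.csv` is called `sex`, the model syntax is reconstructed from the printed output (the `b1f`/`b1m` and `b2f`/`b2m` labels match the `lavTestWald()` call), and the `.p2.`-style labels in the output are read as meaning the loadings were equated across groups (weak invariance). Adjust the constraints or missing-data options if the original analysis differed.

```r
library(lavaan)

## ASSUMPTION: the grouping variable is named 'sex'.
condom <- read.csv("toradata.csv", stringsAsFactors = TRUE)

## Model reconstructed from the output above (hypothetical reconstruction).
toraMod <- '
  attitudes =~ attit_1 + attit_2 + attit_3
  norms     =~ norm_1 + norm_2 + norm_3
  control   =~ control_1 + control_2 + control_3

  intent   ~ attitudes + norms
  behavior ~ c(b1f, b1m) * intent + c(b2f, b2m) * control
'

## Focal slopes free across groups; loadings equated (ASSUMED weak invariance).
fitFree <- sem(toraMod, data = condom, group = "sex", group.equal = "loadings")

## Same model, with both focal slopes constrained equal across groups.
toraModEq <- paste(toraMod, "b1f == b1m", "b2f == b2m", sep = "\n")
fitEq     <- sem(toraModEq, data = condom, group = "sex", group.equal = "loadings")

## Likelihood ratio (chi-squared difference) test of the two constraints.
lrt <- lavTestLRT(fitFree, fitEq)
print(lrt)
```

In large samples, the Wald and likelihood ratio tests are asymptotically equivalent, so both should usually point to the same conclusion; a sizeable discrepancy would be worth investigating before interpreting the moderation effect.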
+
+
+
+
End of In-Class Exercises
diff --git a/docs/in-class-exercises.html b/docs/in-class-exercises.html
index 5427c23..cd42a81 100644
--- a/docs/in-class-exercises.html
+++ b/docs/in-class-exercises.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/index.html b/docs/index.html
index 1f89411..3f873e6 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/installing-software.html b/docs/installing-software.html
index 0f0c27e..8501c7f 100644
--- a/docs/installing-software.html
+++ b/docs/installing-software.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/instructors.html b/docs/instructors.html
index c0410d0..71e5773 100644
--- a/docs/instructors.html
+++ b/docs/instructors.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/introduction-to-r.html b/docs/introduction-to-r.html
index e5d2ae7..4a1095e 100644
--- a/docs/introduction-to-r.html
+++ b/docs/introduction-to-r.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/learning-goals.html b/docs/learning-goals.html
index 702719f..c08e87b 100644
--- a/docs/learning-goals.html
+++ b/docs/learning-goals.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/lecture-1.html b/docs/lecture-1.html
index 9e8f686..14838ea 100644
--- a/docs/lecture-1.html
+++ b/docs/lecture-1.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/lecture-2.html b/docs/lecture-2.html
index d88a360..30c9c05 100644
--- a/docs/lecture-2.html
+++ b/docs/lecture-2.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/lecture-3.html b/docs/lecture-3.html
index dd25f14..a587741 100644
--- a/docs/lecture-3.html
+++ b/docs/lecture-3.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/lecture-4.html b/docs/lecture-4.html
index e358d0b..16d0493 100644
--- a/docs/lecture-4.html
+++ b/docs/lecture-4.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/lecture-5.html b/docs/lecture-5.html
index 3c4d0d0..e0abd4f 100644
--- a/docs/lecture-5.html
+++ b/docs/lecture-5.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/lecture-6.html b/docs/lecture-6.html
index a44e831..0d73191 100644
--- a/docs/lecture-6.html
+++ b/docs/lecture-6.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/lecture.html b/docs/lecture.html
index 66f1dbc..2c69d6b 100644
--- a/docs/lecture.html
+++ b/docs/lecture.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/mediation-moderation.html b/docs/mediation-moderation.html
index f03d6e7..7273f09 100644
--- a/docs/mediation-moderation.html
+++ b/docs/mediation-moderation.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/multiple-group-models.html b/docs/multiple-group-models.html
index b0ff7fc..2636c1e 100644
--- a/docs/multiple-group-models.html
+++ b/docs/multiple-group-models.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/note-on-data-updates.html b/docs/note-on-data-updates.html
index 1bc6065..341bd06 100644
--- a/docs/note-on-data-updates.html
+++ b/docs/note-on-data-updates.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/procedures.html b/docs/procedures.html
index 861c41a..199c68f 100644
--- a/docs/procedures.html
+++ b/docs/procedures.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/reading-1.html b/docs/reading-1.html
index 9868b91..b43ad4d 100644
--- a/docs/reading-1.html
+++ b/docs/reading-1.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/reading-2.html b/docs/reading-2.html
index 363d28e..3d16f4d 100644
--- a/docs/reading-2.html
+++ b/docs/reading-2.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/reading-3.html b/docs/reading-3.html
index 9a8ee9e..1434d79 100644
--- a/docs/reading-3.html
+++ b/docs/reading-3.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/reading-4.html b/docs/reading-4.html
index 457aad0..3b1d385 100644
--- a/docs/reading-4.html
+++ b/docs/reading-4.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/reading-5.html b/docs/reading-5.html
index 20bba22..09d90e0 100644
--- a/docs/reading-5.html
+++ b/docs/reading-5.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/reading-6.html b/docs/reading-6.html
index 298a13f..49ea074 100644
--- a/docs/reading-6.html
+++ b/docs/reading-6.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/reading-questions.html b/docs/reading-questions.html
index e2f6205..48cc5af 100644
--- a/docs/reading-questions.html
+++ b/docs/reading-questions.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/reading.html b/docs/reading.html
index 9ade9f6..b1e13f3 100644
--- a/docs/reading.html
+++ b/docs/reading.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/reference-keys.txt b/docs/reference-keys.txt
index 230285c..74ceb5a 100644
--- a/docs/reference-keys.txt
+++ b/docs/reference-keys.txt
@@ -175,12 +175,12 @@ section-87
section-88
in-class-exercises-5
section-89
+toraCFA
section-90
-section-91
updatedModel
+section-91
section-92
section-93
-section-94
multiple-group-models
lecture-6
recordings-3
@@ -188,21 +188,29 @@ slides-6
reading-6
at-home-exercises-6
mgPathAnalysis
+section-94
section-95
section-96
-section-97
fullPaMod
resPaMod
-section-98
+section-97
mgCFA
-section-99
+section-98
oneGroupCFA
-section-100
+section-99
twoGroupCFA
+section-100
section-101
-section-102
in-class-exercises-6
w7MeasurementInvariance
+section-102
+miTesting
+testing-between-group-differences
section-103
+multiple-group-sem-for-moderation
section-104
+section-105
+toraFullModel
+section-106
+section-107
wrap-up
diff --git a/docs/resources.html b/docs/resources.html
index 2e4d657..386e0fb 100644
--- a/docs/resources.html
+++ b/docs/resources.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/rules.html b/docs/rules.html
index e3e29ad..0268b3c 100644
--- a/docs/rules.html
+++ b/docs/rules.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/schedule.html b/docs/schedule.html
index 35579be..028e5fe 100644
--- a/docs/schedule.html
+++ b/docs/schedule.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/search_index.json b/docs/search_index.json
index 2bca59a..9fa4fd5 100644
--- a/docs/search_index.json
+++ b/docs/search_index.json
@@ -1 +1 @@
-[["index.html", "Theory Construction and Statistical Modeling Course Information", " Theory Construction and Statistical Modeling Kyle M. Lang Last updated: 2023-10-16 Course Information In order to test a theory, we must express the theory as a statistical model and then test this model on quantitative (numeric) data. In this course we will use datasets from different disciplines within the social sciences (educational sciences, psychology, and sociology) to explain and illustrate theories and practices that are used in all social science disciplines to statistically model social science theories. This course uses existing tutorial datasets to practice the process of translating verbal theories into testable statistical models. If you are interested in the methods of acquiring high quality data to test your own theory, we recommend following the course Conducting a Survey which is taught from November to January. Most information about the course is available in this GitBook. Course-related communication will be through https://uu.blackboard.com (Log in with your student ID and password). "],["acknowledgement.html", "Acknowledgement", " Acknowledgement This course was originally developed by dr. Caspar van Lissa. I (dr. Kyle M. Lang) have modified Caspar’s original materials and take full responsibility for any errors or inaccuracies introduced through these modifications. Credit for any particularly effective piece of pedagogy should probably go to Caspar. You can view the original version of this course here on Caspar’s GitHub page. "],["instructors.html", "Instructors", " Instructors Coordinator: dr. Kyle M. Lang Lectures: dr. Kyle M. Lang Practicals: Rianne Kraakman Daniëlle Remmerswaal Danielle McCool "],["course-overview.html", "Course overview", " Course overview This course comprises three parts: Path analysis: You will learn how to estimate complex path models of observed variables (e.g., linked linear regressions) as structural equation models. 
Factor analysis: You will learn different ways of defining and estimating latent (unobserved) constructs. Full structural equation modeling: You will combine the first two topics to estimate path models describing the associations among latent constructs. Each of these three themes will be evaluated with a separate assignment. The first two assignments will be graded on a pass/fail basis. Your course grade will be based on your third assignment grade. "],["schedule.html", "Schedule", " Schedule Course Week Calendar Week Lecture/Practical Topic Workgroup Activity Assignment Deadline 0 36 Pre-course preparation 1 37 Introduction to R 2 38 Statistical modeling, Path analysis 3 39 Mediation, Moderation 4 40 Exploratory factor analysis (EFA) A1 Peer-Review A1: 2023-10-04 @ 23:59 5 41 Confirmatory factor analysis (CFA) 6 42 Structural equation modeling (SEM) A2 Peer-Review A2: 2023-10-18 @ 23:59 7 43 Multiple group models 8 44 Wrap-up A3 Peer-Review 9 45 Exam week: No class meetings A3: 2023-11-10 @ 23:59 NOTE: The schedule (including topics covered and assignment deadlines) is subject to change at the instructors’ discretion. "],["learning-goals.html", "Learning goals", " Learning goals In this course you will learn how to translate a social scientific theory into a statistical model, how to analyze your data with these models, and how to interpret and report your results following APA standards. After completing the course, you will be able to: Translate a verbal theory into a conceptual model, and translate a conceptual model into a statistical model. Independently analyze data using the free, open-source statistical software R. Apply a latent variable model to a real-life problem wherein the observed variables are only indirect indicators of an unobserved construct. Use a path model to represent the hypothesized causal relations among several variables, including relationships such as mediation and moderation. 
Explain to a fellow student how structural equation modeling combines latent variable models with path models and the benefits of doing so. Reflect critically on the decisions involved in defining and estimating structural equation models. "],["resources.html", "Resources", " Resources Literature You do not need a separate book for this course! Most of the information is contained within this GitBook and the course readings (which you will be able to access via links in this GitBook). All literature is freely available online, as long as you are logging in from within the UU-domain (i.e., from the UU campus or through an appropriate VPN). All readings are linked in this GitBook via either direct download links or DOIs. If you run into any trouble accessing a given article, searching for the title using Google Scholar or the University Library will probably due the trick. Software You will do all of your statistical analyses with the statistical programming language/environment R and the add-on package lavaan. If you want to expand your learning, you can follow this optional lavaan tutorial. "],["reading-questions.html", "Reading questions", " Reading questions Along with every article, we will provide reading questions. You will not be graded on the reading questions, but it is important to prepare the reading questions before every lecture. The reading questions serve several important purposes: Provide relevant background knowledge for the lecture Help you recognize and understand the key terms and concepts Make you aware of important publications that shaped the field Help you extract the relevant insights from the literature "],["weekly-preparation.html", "Weekly preparation", " Weekly preparation Before every class meeting (both lectures and practicals) you need to do the assigned homework (delineated in the GitBook chapter for that week). 
This course follows a flipped classroom procedure, so you must complete the weekly homework to meaningfully participate in, and benefit from, the class meetings. Background knowledge We assume you have basic knowledge about multivariate statistics before entering this course. You do not need any prior experience working with R. If you wish to refresh your knowledge, we recommend the chapters on ANOVA, multiple regression, and exploratory factor analysis from Field’s Discovering Statistics using R. If you cannot access the Field book, many other introductory statistics textbooks cover these topics equally well. So, use whatever you have lying around from past statistics courses. You could also try one of the following open-access options: Applied Statistics with R Introduction to Modern Statistics Introduction to Statistical Learning "],["grading.html", "Grading", " Grading Your grade for the course is based on a “portfolio” composed of the three take-home assignments: Path modeling Deadline: Wednesday 2023-10-04 at 23:59 Group assignment Pass/Fail Confirmatory factor analysis Deadline: Wednesday 2023-10-18 at 23:59 Group assignment Pass/Fail Full structural equation modeling Deadline: Friday 2023-11-10 at 23:59 Individual assignment Comprises your entire numeric course grade The specifics of the assignments will be explicated in the Assignments chapter of this GitBook "],["attendance.html", "Attendance", " Attendance Attendance is not mandatory, but we strongly encourage you to attend all lectures and practicals. In our experience, students who actively participate tend to pass the course, whereas those who do not participate tend to drop out or fail. The lectures and practicals build on each other, so, in the unfortunate event that you have to miss a class meeting, please make sure you have caught up with the material before the next session. 
"],["assignments.html", "Assignments", " Assignments This chapter contains the details and binding information about the three assignments that comprise the portfolio upon which your course grade is based. Below, you can find a brief idea of what each assignment will cover. For each assignment, you will use R to analyze some real-world data, and you will write up your results in a concise report (not a full research paper). Guidelines for these analyses/reports are delineated in the following three sections. You will submit your reports via Blackboard. You will complete the first two assignments in your Assignment Groups. You will complete the third assignment individually. The first two assignments are graded as pass/fail. You must pass both of these assignments to pass the course. The third assignment constitutes your course grade. "],["assignment-1-path-analysis.html", "Assignment 1: Path Analysis", " Assignment 1: Path Analysis For the first assignment, you will work in groups to apply a path model that describes how several variables could be causally related. The components of the first assignment are described below. Choose a suitable dataset, and describe the data. You can use any of the 8 datasets linked below. State the research question; define and explicate the theoretical path model. This model must include, at least, three variables. Use a path diagram to show your theoretical model. Translate your theoretical path model into lavaan syntax, and estimate the model. Include the code used to define and estimate your model as an appendix. Explain your rationale for important modeling decisions. Discuss the conceptual fit between your theory and your model. Evaluate the model assumptions. Discuss other important decisions that could have influence your results. Report the results in APA style. Provide relevant output in a suitable format. Include measures of explained variance for the dependent variables. Discuss the results. 
Use your results to answer the research question. Consider the strengths and limitations of your analysis. Evaluation See the Grading section below for more information on how Assignment 1 will be evaluated. You can access an evaluation matrix for Assignment 1 here. This matrix gives an indication of what level of work constitutes insufficient, sufficient, and excellent responses to the six components described above. Submission Assignment 1 is due at 23:59 on Wednesday 4 October 2023. Submit your report via the Assignment 1 portal on Blackboard. "],["assignment-2-confirmatory-factor-analysis.html", "Assignment 2: Confirmatory Factor Analysis", " Assignment 2: Confirmatory Factor Analysis In the second assignment, you will work in groups to run a CFA wherein the observed variables are indirect indicators of the unobserved constructs you want to analyze. The components of the second assignment are described below. Choose a suitable dataset, and describe the data. Ideally, you will work with the same data that you analyzed in Assignment 1. If you want to switch, you can use any of the 8 datasets linked below. State the research question; define and explicate the theoretical CFA model. This model must include, at least, two latent constructs. Use a path diagram to represent your model. Translate your theoretical model into lavaan syntax, and estimate the model. Include the code used to define and estimate your model as an appendix. Explain your rationale for important modeling decisions. Discuss the conceptual fit between your theory and your model. Evaluate the model assumptions. Discuss other important decisions that could have influence your results. Report the results in APA style. Provide relevant output in a suitable format. Include measures of model fit. Discuss the results. Use your results to answer the research question. Consider the strengths and limitations of your analysis. 
Evaluation See the Grading section below for more information on how Assignment 2 will be evaluated. You can access an evaluation matrix for Assignment 2 here. This matrix gives an indication of what level of work constitutes insufficient, sufficient, and excellent responses to the six components described above. Submission Assignment 2 is due at 23:59 on Wednesday 18 October 2023. Submit your report via the Assignment 2 portal on Blackboard. "],["a3_components.html", "Assignment 3: Full Structural Equation Model", " Assignment 3: Full Structural Equation Model In the third assignment, you will work individually to apply a full SEM that describes how several (latent) variables could be causally related. The components of the third assignment are described below. Choose a suitable dataset, and describe the data. Ideally, you will work with the same data that you analyzed in Assignments 1 & 2. If you want to switch, you can use any of the 8 datasets linked below. State the research question; define and explicate the theoretical SEM. The structural component of this model must include, at least, three variables. The model must include, at least, two latent variables. Use a path diagram to represent your model. Translate your theoretical SEM into lavaan syntax, and estimate the model. Include the code used to define and estimate your model as an appendix. Explain your rationale for important modeling decisions. Discuss the conceptual fit between your theory and your model. Evaluate the model assumptions. Discuss other important decisions that could have influence your results. Report the results in APA style. Provide relevant output in a suitable format. Include measures of model fit. Include measures of explained variance for the dependent variables. Discuss the results. Use your results to answer the research question. Consider the strengths and limitations of your analysis. 
Evaluation See the Grading section below for more information on how the component scores represented in the rubric are combined into an overall assignment grade. You can access an evaluation matrix for Assignment 3 here. This matrix gives an indication of what level of work constitutes insufficient, sufficient, and excellent responses to the six components described above. Submission Assignment 3 is due at 23:59 on Friday 10 November 2023. Submit your report via the Assignment 3 portal on Blackboard. "],["elaboration-tips.html", "Elaboration & Tips", " Elaboration & Tips Theoretical Model & Research Question You need to provide some justification for your model and research question, but only enough to demonstrate that you’ve actually conceptualized and estimated a theoretically plausible statistical model (as opposed to randomly combining variables until lavaan returns a pretty picture). You have several ways to show that your model is plausible. Use common-sense arguments. Reference (a small number of) published papers. Replicate an existing model/research question. Don’t provide a rigorous literature-supported theoretical motivation. You don’t have the time to conduct a thorough literature review, and we don’t have the time to read such reviews when grading. Literature review is not one of the learning goals for this course, so you cannot get “bonus points” for an extensive literature review. You are free to test any plausible model that meets the size requirements. You can derive your own model/research question or you can replicate a published analysis. Model Specifications We will not cover methods for modeling categorical outcome variables. So, use only continuous variables as outcomes. DVs in path models and the structural parts of SEMs Observed indicators of latent factors in CFA/SEM NOTE: You may treat ordinal items as continuous, for the purposes of these assignments. We will not cover methods for latent variable interactions. 
Don’t specify a theoretical model that requires an interaction involving a latent construct. There is one exception to the above prohibition. If the moderator is an observed grouping variable, you can estimate the model as a multiple-group model. We’ll cover these methods in Week 7. Assumptions You need to show that you’re thinking about the assumptions and their impact on your results, but you don’t need to run thorough model diagnostics. Indeed, the task of checking assumptions isn’t nearly as straight forward in path analysis, CFA, and SEM as it is in linear regression modeling. You won’t be able to directly apply the methods you have learned for regression diagnostics, for example. Since all of our models are estimated with normal-theory maximum likelihood, the fundamental assumption of all the models we’ll consider in this course boils down to the following. All random variables in my model are i.i.d. multivariate normally distributed. So, you can get by with basic data screening and checking the observed random variables in your model (i.e., all variables other than fixed predictors) for normality. Since checking for multivariate normality is a bit tricky, we’ll only ask you to evaluate univariate normality. You should do these evaluations via graphical means. To summarize, we’re looking for the following. Data Consider whether the measurement level of your data matches the assumptions of your model. Check your variables for univariate outliers. If you find any outliers, either treat them in some way or explain why you are retaining them for the analysis. Check for missing data. For the purposes of the assignment, you can use complete case analysis to work around the missing data. If you’re up for more of a challenge, feel free to try multiple imputation or full information maximum likelihood. Model Evaluate the univariate normality of any random, observed variables in your model. 
E.g., DVs in path models, observed IVs modeled as random variables, indicators of latent factors If you fit a multiple-group model for Assignment 3, do this evaluation within groups. Use graphical tools to evaluate the normality assumption. Normal QQ-Plots Histograms Results What do we mean by reporting your results “in a suitable format”? Basically, put some effort into making your results readable, and don’t include a bunch of superfluous information. Part of demonstrating that you understand the analysis is showing that you know which pieces of output convey the important information. Tabulate your results; don’t directly copy the R output. Don’t include everything lavaan gives you. Include only the output needed to understand your results and support your conclusions. "],["data_options.html", "Data", " Data Below, you can find links to a few suitable datasets that you can use for the assignments. You must use one of the following datasets. You may not choose your own data from the wild. Coping with Covid Dataset Codebook Pre-Registration Feminist Perspectives Scale Dataset Article Hypersensitive Narcissism Scale & Dirty Dozen Dataset HSNS Article DD Article Kentucky Inventory of Mindfulness Skills Dataset Article Depression Anxiety Stress Scale Dataset DASS Information Nomophobia Dataset Recylced Water Acceptance Dataset Article "],["procedures.html", "Procedures", " Procedures Formatting You must submit your assignment reports in PDF format. Each report should include a title page. The title page should include the following information: The name of the assignment. The names of all assignment authors (i.e., all group members for Assignments 1 & 2, your name for Assignment 3). The Assignment Group number (only for Assignments 1 & 2). You must include the code used to define and run your model(s) as an appendix. Try to format the text in this appendix clearly. Use a monospace font. 
Length You may use as many words as necessary to adequately explain yourself; though, concision and parsimony are encouraged. Note that the assignments are not intended to be full-blown papers! The focus should be on the definition of your model, how this model relates to theory (introduction), and what you have learned from your estimated model (discussion). For each of the assignments, you should be able to get the job done in fewer than 10 pages of text (excluding title page, figures, appendices, and references). Submission You will submit your reports through Blackboard. Each assignment has a corresponding item in the “Assignments” section of the BB page through which you will submit your reports. For Assignments 1 & 2, you may only submit one report per group. Designate one group member to submit the report. The grade for this submission will apply to all group members. If something goes wrong with the submission, or you notice a mistake (before the deadline) that you want to correct, you may upload a new version of your report. We will grade the final submitted version. The submissions will be screened with Ouriginal. "],["grading-1.html", "Grading", " Grading Group Assignments Assignments 1 & 2 are simply graded as pass/fail. To pass, your submission must: Do a reasonable job of addressing the relevant components listed above Be submitted before the deadline Otherwise, you will fail the assignment. Individual Assignment Assignment 3 will be fully graded on the usual 10-point scale. Points will be allocated according to the extent to which your submission addresses the six components listed above. The evaluation matrix gives an indication of how these points will be apportioned. Further details over the grading procedures for Assignment 3 (e.g., exactly how your 10-point grade will be defined) will be provided at a later date. Assuming your group passes the first two assignments, your final course grade will simply be your Assignment 3 grade. 
Resits You must get a “pass” for Assignments 1 & 2 and score at least 5.5 on Assignment 3 to pass the course. If you fail any of the assignments, you will have the opportunity to resit the failed assignment(s). If you resit Assignment 3, your revised graded cannot be higher than 6. Further details on the resit procedure will be provided at a later date. Example Assignment You can find an example of a good submission (for an older version of Assignment 2) here. This example is not perfect (no paper ever is), and several points could be improved. That being said, this submission exemplifies what we’re looking for in your project reports. So, following the spirit of this example would earn you a high grade. "],["rules.html", "Rules", " Rules Resources For all three assignments, you may use any reference materials you like, including: All course materials The course GitBook Additional books and papers The internet Collaboration You will complete the first two assignments in groups. Although you will work in groups, your group may not work together with other groups. You will complete the final assignment individually. For this assignment, you may not work with anyone else. For all three assignments, you are obligated to submit original work (i.e., work conducted for this course by you or your group). Submitting an assignment that violates this condition constitutes fraud. Such cases of fraud will be addressed according to the University’s standard policy. Academic integrity Hopefully, you also feel a moral obligation to obey the rules. For this course, we have implemented an examination that allows you to showcase what you have learned in a more realistic way than a written exam would allow. This assessment format spares you the stress of long exams (the two exams for this course used to be 4 hours each) and the attendant studying/cramming. 
The assignments will also help you assess your ability to independently analyse data, which is important to know for your future courses and/or career. However, this format also assumes that you complete the assignments in good faith. So, I simply ask that you hold up your end of the bargain and submit your original work to show us what you’ve learned. Strict stuff By submitting your assignments (both group and individual), you confirm the following: You have completed the assignment yourself (or with your group) You are submitting work that you have written yourself (or with your group) You are using your own UU credentials to submit the assignment You have not had outside help that violates the conditions delineated above while completing the assignment All assignments will be submitted via Ouriginal in Blackboard and, thereby, checked for plagiarism. If fraud or plagiarism is detected or suspected, we will inform the Board of Examiners in the usual manner. In the event of demonstrable fraud, the sanctions delineated in Article 5.15 of the Education and Examination Regulations (EER) will apply. "],["software-setup.html", "Software Setup", " Software Setup This chapter will help you prepare for the course by showing how to install R and RStudio on your computer. If you’re already using R, there may be nothing new for you here. That being said, you should look over this chapter to ensure that your current setup will be compatible with the course requirements. If you have never used R before, this chapter is essential! The information in this chapter will be crucial for getting your computer ready for the course. 
"],["typographic-conventions.html", "Typographic Conventions", " Typographic Conventions Throughout this GitBook, we (try to) use a consistent set of typographic conventions: Functions are typeset in a code font, and the name of the function is always followed by parentheses E.g., sum(), mean() Other R objects (e.g., data objects, function arguments) are also typeset in a code font but without parentheses E.g., seTE, method.tau Sometimes, we’ll use the package name followed by two colons (::, the so-called scope-resolution operator), like lavaan::sem(). This command is valid R code and will run if you copy it into your R console. The lavaan:: part of the command tells R that we want to use the sem() function from the lavaan package. "],["installing-software.html", "Installing software", " Installing software Before we start the course, we have to install three things: R: A free program for statistical programming RStudio: An integrated development environment (IDE) which makes it easier to work with R. Several packages: Separate pieces of ‘add-on’ software for R with functions to do specific analyses. Packages also include documentation describing how to use their functions, and many include sample data. Installing R The latest version of R is available here. Click the appropriate link for your operating system and follow the instructions for installing the latest stable release. Depending on which OS you select, you may be given an option to install different components (e.g., base, contrib, Rtools). For this course, you will only need the base package. Installing RStudio Download the Free Desktop version of RStudio from the download page of the RStudio website. Installing packages To participate in this course, you will need a few essential R packages. 
Here’s an overview of the packages and why we need them: Package Description lavaan A sophisticated and user-friendly package for structural equation modeling dplyr A powerful suite of data-processing tools ggplot2 A flexible and user-friendly package for making graphs tidySEM Plotting and tabulating the output of SEM models semTools Comparing models, establishing measurement invariance across groups psych Descriptive statistics and EFA rockchalk Probing interactions foreign Loading data from SPSS ‘.sav’ files readxl Loading data from Excel ‘.xlsx’ files To install these packages, we use the install.packages() function in R. Open RStudio Inside RStudio, find the window named Console on the left side of the screen. Copy the following code into the console and hit Enter/Return to run the command. install.packages(c("lavaan", "dplyr", "ggplot2", "tidySEM", "semTools", "psych", "rockchalk", "foreign", "readxl"), dependencies = TRUE) "],["course-data.html", "Course Data", " Course Data All of the data files you will need for the course are available in this SurfDrive directory. Follow the link to download a ZIP archive containing the data you will need to complete the practical exercises and assignments. Extract these data files to a convenient location on your computer. "],["note-on-data-updates.html", "Note on Data Updates", " Note on Data Updates During the course, we may need to update some of these datasets and/or add some new datasets to the SurfDrive directory. If so, you will need to download the updated data. We will let you know if and when any datasets are modified. In such situations, you are responsible for updating your data. Working with outdated data will probably produce incorrect results. Your answer won’t match the solutions we expect. Your answer will be marked as incorrect, even if the code used to produce the answer is correct. Points lost on an assignment due to using outdated datasets will not be returned. 
"],["introduction-to-r.html", "1 Introduction to R", " 1 Introduction to R This week is all about getting up-and-running with R and RStudio. Homework before the lecture Complete the preparatory material: Read over the Course Information chapter Work through the Software Setup chapter Watch the Lecture Recording for this week. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. "],["lecture.html", "1.1 Lecture", " 1.1 Lecture This week, you will learn the basics of R and RStudio. Rather than re-inventing the proverbial wheel, we’ve linked to existing resources developed by R-Ladies Sydney. 1.1.1 Recordings Tour of RStudio R Packages Data I/O 1.1.2 Slides You can access the accompanying resources on the R-Ladies Sydney website here. "],["reading.html", "1.2 Reading", " 1.2 Reading There is no official reading this week. If you’d like to take a deeper dive into R, feel free to check out Hadley Wickham’s excellent book R for Data Science. Otherwise, you may want to get a jump-start on the At-Home Exercises for this week. "],["at-home-exercises.html", "1.3 At-Home Exercises", " 1.3 At-Home Exercises This week is all about gaining familiarity with R and RStudio. We’ll be using the primers available on Posit Cloud to work through some basic elements of data visualization and statistical programming in R. Although you should already have R working, this week’s at-home and in-class exercises don’t require that you have R installed on your system. If following along within this GitBook doesn’t work for you, you can also find the tutorials online on the Posit Primers page. 1.3.1 Visualizations with R 1.3.2 Programming with R End of At-Home Exercises "],["in-class-exercises.html", "1.4 In-Class Exercises", " 1.4 In-Class Exercises In the practical this week, we’ll go a little further into what is possible with R. 
Don’t worry if you cannot remember everything in these primers; they’re only meant to familiarize you with what is possible and to give you some experience interacting with R and RStudio. The following primers come from Posit Cloud and were created with the learnr package. 1.4.1 Viewing Data This first primer introduces a special data format called a tibble, as well as some functions for viewing your data. 1.4.2 Dissecting Data In the next primer, we’ll explore tools to subset and rearrange your data: select(), filter(), and arrange(). 1.4.3 Grouping and Manipulating Data Advanced If you made it through the previous two sections with ease and want to challenge yourself, go ahead with this next section. If you’re running short on time, you can skip ahead to Exploratory Data Analysis. 1.4.4 Exploratory Data Analysis 1.4.5 Visualizing Data Visualizing data is a great way to start understanding a data set. In this section, we’ll highlight a few examples of how you can use the ggplot2 package to visualize your data. Primers on many other visualizations are available on Posit Cloud. Bar Charts for Categorical Variables Scatterplots for Continuous Variables 1.4.6 Tidying Data This primer will provide an overview of what’s meant by “tidy data”. You only need to complete the Tidy Data section; the sections on Gathering and Spreading columns are useful, but we won’t ask you to apply those techniques in this course. Recap Hopefully, you now feel more comfortable using some of R’s basic functionality and packages to work with data. 
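The logical tests and Boolean operators used with filter() in these primers also work on ordinary R vectors. The minimal base-R sketch below uses a made-up vector x, which is hypothetical and not course data, to show what each operator returns. The comments list the resulting values.

```r
# Hypothetical example vector (not course data) with one missing value
x <- c(3, 8, NA, 12)

# Comparisons work element-wise; a comparison involving NA returns NA
x > 5            # FALSE  TRUE    NA  TRUE
is.na(x)         # FALSE FALSE  TRUE FALSE
!is.na(x)        #  TRUE  TRUE FALSE  TRUE

# & (and) combines logical vectors element-wise;
# FALSE & NA is FALSE, so the missing value is dropped here
keep <- !is.na(x) & x > 5
keep             # FALSE  TRUE FALSE  TRUE
x[keep]          # 8 12

# %in% tests set membership
c(3, 4) %in% x   # TRUE FALSE

# any() and all() reduce a logical vector to a single value
any(x > 10, na.rm = TRUE)  # TRUE
all(x > 1, na.rm = TRUE)   # TRUE

# xor() is TRUE where exactly one of its two arguments is TRUE
xor(c(TRUE, TRUE, FALSE), c(TRUE, FALSE, FALSE))  # FALSE  TRUE FALSE
```

The same idea applies inside filter(): only rows where the test evaluates to TRUE are kept.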
Here’s a brief description of the functions covered above: install.packages() for installing packages Remember to put the package names in quotes library() for loading packages View() for viewing your dataset select() for picking only certain columns filter() for picking only certain rows arrange() for changing the row order %>% aka “the pipe” for chaining commands together In RStudio, you can hit ctrl+shift+m (cmd+shift+m on macOS) as a handy key combination ? for help files Logical tests and Boolean operators == equal to != not equal to < less than <= less than or equal to > greater than >= greater than or equal to is.na() is the value NA (not available) !is.na() is the value not NA & and (true only if the left and right are both true) | or (true if either the left or right is true) ! not (invert true/false) %in% in (is left in the larger set of right values) any() any (true if any in the set are true) all() all (true if all in the set are true) xor() xor (true if exactly one of its two arguments is true) ggplot2 ggplot() creates the basic object from which to build a plot aes() contains the aesthetic mappings (like x and y) geom_bar() bar plots for distributions of categorical variables geom_point() scatterplots for plotting two continuous variables geom_label_repel() for plotting text (from the ggrepel extension package) facet_wrap() for creating sets of conditional plots End of In-Class Exercises "],["statistical-modeling-path-analysis.html", "2 Statistical Modeling & Path Analysis", " 2 Statistical Modeling & Path Analysis This week, we will cover statistical modeling and path analysis. Homework before the lecture Watch the Lecture Recording for this week. Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. 
"],["lecture-1.html", "2.1 Lecture", " 2.1 Lecture In this lecture, we will begin by discussing the paradigm and contextualizing statistical modeling relative to other ways that we can conduct statistical analyses. We will conclude with an introduction to path analysis. 2.1.1 Recordings Statistical Reasoning Statistical Modeling Path Analysis 2.1.2 Slides You can download the lecture slides here "],["reading-1.html", "2.2 Reading", " 2.2 Reading Reference Smaldino, P. E. (2017). Models are stupid, and we need more of them. In R. R. Vallacher, S. J. Read, & A. Nowak (Eds.), Computational Social Psychology (pp. 311–331). New York: Routledge. SKIP PAGES 322–327 Questions What are the differences between a “verbal model” and a “formal model”? As explained in the paragraph “A Brief Note on Statistical Models”, formal models are not the same as statistical models. Still, we can learn a lot from Smaldino’s approach. Write down three insights from this paper that you would like to apply to your statistical modeling during this course. Suggested Reading (Optional) The following paper is not required, but it’s definitely worth a read. Breiman provides a very interesting perspective on different ways to approach a modeling-based analysis. Breiman, L. (2001). Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231. https://doi.org/10.1214/ss/1009213726 "],["at-home-exercises-1.html", "2.3 At-Home Exercises", " 2.3 At-Home Exercises Load the LifeSat.sav data. library(dplyr) library(haven) LifeSat <- read_spss("LifeSat.sav") 2.3.1 Make a table of descriptive statistics for the variables: LifSat, educ, ChildSup, SpouSup, and age. What is the average age in the sample? What is the range (youngest and oldest respondent)? Hint: Use the tidySEM::descriptives() function. Click for explanation The package tidySEM contains the descriptives() function for computing descriptive statistics. 
The describe() function in the psych package is a good alternative. library(tidySEM) descriptives(LifeSat[ , c("LifSat", "educ", "ChildSup", "SpouSup", "age")]) 2.3.2 Run a simple linear regression with LifSat as the dependent variable and educ as the independent variable. Hints: The lm() function (short for linear model) does linear regression. The summary() function provides relevant summary statistics for the model. It can be helpful to store the results of your analysis in an object. Click for explanation results <- lm(LifSat ~ educ, data = LifeSat) summary(results) ## ## Call: ## lm(formula = LifSat ~ educ, data = LifeSat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -43.781 -11.866 2.018 12.418 43.018 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 35.184 7.874 4.469 2.15e-05 *** ## educ 3.466 1.173 2.956 0.00392 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 17.64 on 96 degrees of freedom ## Multiple R-squared: 0.08344, Adjusted R-squared: 0.0739 ## F-statistic: 8.74 on 1 and 96 DF, p-value: 0.003918 2.3.3 Repeat the analysis from 2.3.2 with age as the independent variable. Click for explanation results <- lm(LifSat ~ age, data = LifeSat) summary(results) ## ## Call: ## lm(formula = LifSat ~ age, data = LifeSat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -35.321 -14.184 3.192 13.593 40.626 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 200.2302 52.1385 3.840 0.00022 *** ## age -2.0265 0.7417 -2.732 0.00749 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 17.75 on 96 degrees of freedom ## Multiple R-squared: 0.07215, Adjusted R-squared: 0.06249 ## F-statistic: 7.465 on 1 and 96 DF, p-value: 0.007487 2.3.4 Repeat the analysis from 2.3.2 and 2.3.3 with ChildSup as the independent variable. 
Click for explanation results <- lm(LifSat ~ ChildSup, data = LifeSat) summary(results) ## ## Call: ## lm(formula = LifSat ~ ChildSup, data = LifeSat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -37.32 -12.14 0.66 12.41 44.68 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 37.559 8.342 4.502 1.89e-05 *** ## ChildSup 2.960 1.188 2.492 0.0144 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 17.86 on 96 degrees of freedom ## Multiple R-squared: 0.06076, Adjusted R-squared: 0.05098 ## F-statistic: 6.211 on 1 and 96 DF, p-value: 0.01441 2.3.5 Run a multiple linear regression with LifSat as the dependent variable and educ, age, and ChildSup as the independent variables. Hint: You can use the + sign to add multiple variables to the RHS of your model formula. Click for explanation results <- lm(LifSat ~ educ + age + ChildSup, data = LifeSat) summary(results) ## ## Call: ## lm(formula = LifSat ~ educ + age + ChildSup, data = LifeSat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -32.98 -12.56 2.68 11.03 41.91 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 134.9801 53.2798 2.533 0.0130 * ## educ 2.8171 1.1436 2.463 0.0156 * ## age -1.5952 0.7188 -2.219 0.0289 * ## ChildSup 2.4092 1.1361 2.121 0.0366 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 16.92 on 94 degrees of freedom ## Multiple R-squared: 0.1741, Adjusted R-squared: 0.1477 ## F-statistic: 6.603 on 3 and 94 DF, p-value: 0.0004254 2.3.6 Compare the results from 2.3.5 with those from 2.3.2, 2.3.3, and 2.3.4. What do you notice when you compare the estimated slopes for each of the three predictors in the multiple regression model with the corresponding estimates from the simple regression models? 
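Slopes often change between simple and multiple regression because the predictors are correlated with one another. The sketch below illustrates this general idea with simulated data. The variables x1, x2, and y are hypothetical and unrelated to LifeSat, so this is not a worked answer for that dataset.

```r
# Simulated data (hypothetical; NOT the LifeSat data) with two correlated predictors
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)          # x2 is correlated with x1
y  <- 1 * x1 + 1 * x2 + rnorm(n)   # true partial slopes are both 1

# Simple regression: the slope for x1 also absorbs the part of the x2 effect
# that is shared with x1 (expected value about 1 + 0.7 * 1 = 1.7)
simple_slope <- coef(lm(y ~ x1))[['x1']]

# Multiple regression: the slope for x1 is adjusted for x2 (expected value about 1)
multiple_slope <- coef(lm(y ~ x1 + x2))[['x1']]

round(c(simple = simple_slope, multiple = multiple_slope), 2)
```

Adding x2 to the model removes the shared part from the x1 slope. The more strongly the predictors are correlated, the larger this difference tends to be.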
"],["in-class-exercises-1.html", "2.4 In-Class Exercises", " 2.4 In-Class Exercises During this practical, you will work through some exercises meant to expand your statistical reasoning skills and improve your understanding of linear models. For this exercise, having some familiarity with regression will be helpful. If you feel like you need to refresh your knowledge in this area, consider the resources listed in the Background knowledge section. Data: You will use the following dataset for these exercises. Sesam.sav 2.4.1 Data Exploration Open the file “Sesam.sav” # Load `dplyr` for data processing: library(dplyr) # Load the `haven` library for reading in SPSS files: library(haven) ## Load the 'Sesam.sav' data ## Use haven::zap_formats() to remove SPSS attributes sesam <- read_sav(file = "Sesam.sav") %>% zap_formats() This file is part of a larger dataset that evaluates the impact of the first year of the Sesame Street television series. Sesame Street is mainly concerned with teaching preschool related skills to children in the 3–5 year age range. The following variables will be used in this exercise: age: measured in months prelet: knowledge of letters before watching Sesame Street (range 0–58) prenumb: knowledge of numbers before watching Sesame Street (range 0–54) prerelat: knowledge of size/amount/position relationships before watching Sesame Street (range 0–17) peabody: vocabulary maturity before watching Sesame Street (range 20–120) postnumb: knowledge of numbers after a year of Sesame Street (range 0–54) Note: Unless stated otherwise, the following questions refer to the sesam data and the above variables. 2.4.1.1 What is the type of each variable? Hint: The output of the str() function should be helpful here. Click to show code ## Examine the data structure: str(sesam) ## tibble [240 × 8] (S3: tbl_df/tbl/data.frame) ## $ id : num [1:240] 1 2 3 4 5 6 7 8 9 10 ... ## $ age : num [1:240] 66 67 56 49 69 54 47 51 69 53 ... 
## $ prelet : num [1:240] 23 26 14 11 47 26 12 48 44 38 ... ## $ prenumb : num [1:240] 40 39 9 14 51 33 13 52 42 31 ... ## $ prerelat: num [1:240] 14 16 9 9 17 14 11 15 15 10 ... ## $ peabody : num [1:240] 62 80 32 27 71 32 28 38 49 32 ... ## $ postnumb: num [1:240] 44 39 40 19 54 39 44 51 48 52 ... ## $ gain : num [1:240] 4 0 31 5 3 6 31 -1 6 21 ... ## ..- attr(*, "display_width")= int 10 Click for explanation All variables are numeric. str() uses the abbreviation “num” to indicate a numeric vector. 2.4.1.2 What is the average age in the sample? What is the age range (youngest and oldest child)? Hint: Use tidySEM::descriptives() Click to show code As in the At-Home Exercises, you can use the descriptives() function from the tidySEM package to describe the data: library(tidySEM) descriptives(sesam) Click for explanation We can get the average age from the “mean” column in the table (51.5), and the age range from the “min” and “max” columns (34 and 69, respectively). 2.4.1.3 What is the average gain in knowledge of numbers? What is the standard deviation of this gain? Hints: You will need to compute the gain and save the change score as a new variable. You can then use the base-R functions mean() and sd() to do the calculations. Click to show code Create a new variable that represents the difference between post- and pre-test scores on knowledge of numbers: sesam <- mutate(sesam, ndif = postnumb - prenumb) Compute the mean and SD of the change score: sesam %>% summarise(mean(ndif), sd(ndif)) 2.4.1.4 Create an appropriate visualization of the gain scores you computed in 2.4.1.3. Justify your choice of visualization. Hint: Some applicable visualizations are explained in the Visualizations with R section. 
Click to show code library(ggplot2) ## Create an empty baseline plot object: p <- ggplot(sesam, aes(x = ndif)) ## Add some appropriate geoms: p + geom_histogram() p + geom_density() p + geom_boxplot() Click for explanation Because the gain score is numeric, we should use something appropriate for showing the distribution of a continuous variable. In this case, we can use either a density plot, or a histogram (remember from the lecture, this is like a density plot, but binned). We can also use a box plot, which can be a concise way to display a lot of information about a variable in a little less space. 2.4.1.5 Create a visualization that provides information about the bivariate relationship between the pre- and post-test number knowledge. Justify your choice of visualization. Describe the relationship based on what you see in your visualization. Hint: Again, the Visualizations with R section may provide some useful insights. Click to show code ## Create a scatterplot of the pre- and post-test number knowledge ggplot(sesam, aes(x = prenumb, y = postnumb)) + geom_point() Click for explanation A scatterplot is a good tool for showing patterns in the way that two continuous variables relate to each other. From it, we can quickly gather information about whether a relationship exists, its direction, its strength, how much variation there is, and whether or not a relationship might be non-linear. Based on this scatterplot, we see a positive relationship between the prior knowledge of numbers and the knowledge of numbers at the end of the study. Children who started with a higher level of numeracy also ended with a higher level of numeracy. There is a considerable amount of variance in the relationship. Not every child increases their numeracy between pre-test and post-test. Children show differing amounts of increase. 2.4.2 Linear Modeling 2.4.2.1 Are there significant, bivariate associations between postnumb and the following variables? 
age prelet prenumb prerelat peabody Use Pearson correlations to answer this question. You do not need to check the assumptions here (though you would in real life). Hint: The base-R cor.test() function and the corr.test() function from the psych package will both conduct hypothesis tests for correlation coefficients (the base-R cor() function only computes the coefficients). Click to show code library(psych) ## Test the correlations using psych::corr.test(): sesam %>% select(postnumb, age, prelet, prenumb, prerelat, peabody) %>% corr.test() ## Call:corr.test(x = .) ## Correlation matrix ## postnumb age prelet prenumb prerelat peabody ## postnumb 1.00 0.34 0.50 0.68 0.54 0.52 ## age 0.34 1.00 0.33 0.43 0.44 0.29 ## prelet 0.50 0.33 1.00 0.72 0.47 0.40 ## prenumb 0.68 0.43 0.72 1.00 0.72 0.61 ## prerelat 0.54 0.44 0.47 0.72 1.00 0.56 ## peabody 0.52 0.29 0.40 0.61 0.56 1.00 ## Sample Size ## [1] 240 ## Probability values (Entries above the diagonal are adjusted for multiple tests.) ## postnumb age prelet prenumb prerelat peabody ## postnumb 0 0 0 0 0 0 ## age 0 0 0 0 0 0 ## prelet 0 0 0 0 0 0 ## prenumb 0 0 0 0 0 0 ## prerelat 0 0 0 0 0 0 ## peabody 0 0 0 0 0 0 ## ## To see confidence intervals of the correlations, print with the short=FALSE option ## OR ## library(magrittr) ## Test the correlations using multiple cor.test() calls: sesam %$% cor.test(postnumb, age) ## ## Pearson's product-moment correlation ## ## data: postnumb and age ## t = 5.5972, df = 238, p-value = 5.979e-08 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.2241066 0.4483253 ## sample estimates: ## cor ## 0.3410578 sesam %$% cor.test(postnumb, prelet) ## ## Pearson's product-moment correlation ## ## data: postnumb and prelet ## t = 8.9986, df = 238, p-value < 2.2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.4029239 0.5926632 ## sample estimates: ## cor ## 0.5038464 sesam %$% 
cor.test(postnumb, prenumb) ## ## Pearson's product-moment correlation ## ## data: postnumb and prenumb ## t = 14.133, df = 238, p-value < 2.2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.6002172 0.7389277 ## sample estimates: ## cor ## 0.6755051 sesam %$% cor.test(postnumb, prerelat) ## ## Pearson's product-moment correlation ## ## data: postnumb and prerelat ## t = 9.9857, df = 238, p-value < 2.2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.4475469 0.6268773 ## sample estimates: ## cor ## 0.5433818 sesam %$% cor.test(postnumb, peabody) ## ## Pearson's product-moment correlation ## ## data: postnumb and peabody ## t = 9.395, df = 238, p-value < 2.2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.4212427 0.6067923 ## sample estimates: ## cor ## 0.520128 Click for explanation Yes, based on the p-values (remember that 0 here really means very small, making it less than .05), we would say that there are significant correlations between postnumb and all other variables in the data. (In fact, all variables in the data are significantly correlated with one another.) 2.4.2.2 Do age and prenumb explain a significant proportion of the variance in postnumb? What statistic did you use to justify your conclusion? Interpret the model fit. Use the lm() function to fit your model. Click to show code lmOut <- lm(postnumb ~ age + prenumb, data = sesam) summary(lmOut) ## ## Call: ## lm(formula = postnumb ~ age + prenumb, data = sesam) ## ## Residuals: ## Min 1Q Median 3Q Max ## -38.130 -6.456 -0.456 5.435 22.568 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 7.4242 5.1854 1.432 0.154 ## age 0.1225 0.1084 1.131 0.259 ## prenumb 0.7809 0.0637 12.259 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 9.486 on 237 degrees of freedom ## Multiple R-squared: 0.4592, Adjusted R-squared: 0.4547 ## F-statistic: 100.6 on 2 and 237 DF, p-value: < 2.2e-16 Click for explanation Yes, age and prenumb explain a significant amount of variability in postnumb (\\(R^2 = 0.459\\), \\(F[2, 237] = 100.629\\), \\(p < 0.001\\)). We use the F statistic for the overall test of model fit to support this conclusion. The variables age and prenumb together explain 45.9% of the variability in postnumb. 2.4.2.3 Write the null and alternative hypotheses tested in 2.4.2.2. Click for explanation Since we are testing for explained variance, our hypotheses concern the population \\(R^2\\). \\[ \\begin{align*} H_0: R^2 = 0\\\\ H_1: R^2 > 0 \\end{align*} \\] Note that this is a directional hypothesis because the \\(R^2\\) cannot be negative. 2.4.2.4 Define the model syntax to estimate the model from 2.4.2.2 as a path analysis using lavaan. Click to show code mod <- 'postnumb ~ 1 + age + prenumb' 2.4.2.5 Estimate the path analytic model you defined above. Use the lavaan::sem() function to estimate the model. Click to show code library(lavaan) lavOut1 <- sem(mod, data = sesam) 2.4.2.6 Summarize the fitted model you estimated above. Use the summary() function to summarize the model. 
Click to show code summary(lavOut1) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 4 ## ## Number of observations 240 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## postnumb ~ ## age 0.123 0.108 1.138 0.255 ## prenumb 0.781 0.063 12.336 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 7.424 5.153 1.441 0.150 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 88.864 8.112 10.954 0.000 In OLS regression, the predictor variables are treated as fixed, so their variances and covariances are not estimated as model parameters. We can easily relax this assumption in path analysis. 2.4.2.7 Re-estimate the path analytic model you defined in 2.4.2.4. Specify the predictors as random, correlated variables. Hint: You can make the predictors random in at least two ways: Modify the model syntax to specify the covariance between age and prenumb. Add fixed.x = FALSE to your sem() call. Click to show code lavOut2 <- sem(mod, data = sesam, fixed.x = FALSE) ## OR ## mod <- ' postnumb ~ 1 + age + prenumb age ~~ prenumb ' lavOut2 <- sem(mod, data = sesam) 2.4.2.8 Summarize the fitted model you estimated above. Compare the results to those from the OLS regression in 2.4.2.2 and the path model in 2.4.2.5. 
Click to show code summary(lavOut2) ## lavaan 0.6.16 ended normally after 26 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 9 ## ## Number of observations 240 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## postnumb ~ ## age 0.123 0.108 1.138 0.255 ## prenumb 0.781 0.063 12.336 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## age ~~ ## prenumb 28.930 4.701 6.154 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 7.424 5.153 1.441 0.150 ## age 51.525 0.405 127.344 0.000 ## prenumb 20.896 0.688 30.359 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 88.864 8.112 10.954 0.000 ## age 39.291 3.587 10.954 0.000 ## prenumb 113.702 10.379 10.954 0.000 summary(lavOut1) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 4 ## ## Number of observations 240 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## postnumb ~ ## age 0.123 0.108 1.138 0.255 ## prenumb 0.781 0.063 12.336 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 7.424 5.153 1.441 0.150 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 88.864 8.112 10.954 0.000 summary(lmOut) ## ## Call: ## lm(formula = postnumb ~ age + prenumb, data = sesam) ## ## Residuals: ## Min 1Q Median 3Q Max ## -38.130 -6.456 -0.456 5.435 22.568 ## ## Coefficients: ## Estimate Std. 
Error t value Pr(>|t|) ## (Intercept) 7.4242 5.1854 1.432 0.154 ## age 0.1225 0.1084 1.131 0.259 ## prenumb 0.7809 0.0637 12.259 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 9.486 on 237 degrees of freedom ## Multiple R-squared: 0.4592, Adjusted R-squared: 0.4547 ## F-statistic: 100.6 on 2 and 237 DF, p-value: < 2.2e-16 2.4.2.9 Consider the path model below. How many regression coefficients are estimated in this model? How many variances are estimated? How many covariances are estimated? Click for explanation Six regression coefficients (red) Four (residual) variances (blue) No covariances 2.4.2.10 Consider a multiple regression analysis with three continuous independent variables: scores on tests of language, history, and logic, and one continuous dependent variable: score on a math test. We want to know if scores on the language, history, and logic tests can predict the math test score. Sketch a path model that you could use to answer this question How many regression parameters are there? How many variances could you estimate? How many covariances could you estimate? 2.4.3 Categorical IVs Load the Drivers.sav data. # Read the data into a data frame named 'drivers': drivers <- read_sav("Drivers.sav") %>% as_factor() # This preserves the SPSS labels for nominal variables In this section, we will evaluate the following research question: Does talking on the phone interfere with people's driving skills? These data come from an experiment. The condition variable represents the three experimental conditions: Hand-held phone Hands-free phone Control (no phone) We will use condition as the IV in our models. The DV, RT, represents the participant’s reaction time (in milliseconds) during a driving simulation. 2.4.3.1 Use the package ggplot2 to create a density plot for the variable RT. What concept are we representing with this plot? 
Hint: Consider the lap times example from the statistical modeling section of Lecture 2. Click to show code ggplot(drivers, aes(x = RT)) + geom_density() Click for explanation This plot shows the marginal distribution of reaction times, pooling the drivers from all three conditions. 2.4.3.2 Modify this density plot by mapping the variable condition from your data to the fill aesthetic in ggplot. What is the difference between this plot and the previous plot? Do you think there is evidence for differences between the groups? How might we test this by fitting a model to our sample? Click to show code Hint: To modify the transparency of the densities, use the aesthetic alpha. ggplot(drivers, aes(x = RT, fill = condition)) + geom_density(alpha = .5) Click for explanation This figure shows the conditional distributions of reaction time, where the type of cell phone usage is the grouping factor. To visually assess whether the three groups differ, you can look at how much the distributions overlap, how far apart the group means are, and whether the combined distribution differs much from the conditional distributions. If we are willing to assume that these conditional distributions are normal and have equal variances, we could use a linear model with dummy-coded predictors. Aside: ANOVA vs. Linear Regression As you may know, the mathematical model underlying ANOVA is just a linear regression model with nominal IVs. So, in terms of the underlying statistical models, there is no difference between ANOVA and regression; the differences lie in the focus of the analysis. ANOVA is really a type of statistical test wherein we are testing hypotheses about the effects of some set of nominal grouping factors on some continuous outcome. When doing an ANOVA, we usually don’t interact directly with the parameter estimates from the underlying model.
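You can check directly that ANOVA and regression share one underlying model. Below is a minimal base-R sketch. It uses simulated data: the group means, the standard deviation, and n = 20 per group are invented for illustration and do not come from Drivers.sav. The omnibus F statistic is identical whether you fit the model with lm() or aov(), and the dummy-code slopes are simply differences between group means.

```r
# SIMULATED stand-in for the drivers data (hypothetical values, not Drivers.sav)
set.seed(235)
condition <- factor(rep(c('control', 'hand-held', 'hands-free'), each = 20),
                    levels = c('control', 'hand-held', 'hands-free'))
RT <- rnorm(60, mean = c(550, 650, 615)[as.integer(condition)], sd = 130)

# One underlying linear model, two ways of summarizing it
fit_lm  <- lm(RT ~ condition)   # regression with dummy-coded predictors
fit_aov <- aov(RT ~ condition)  # classical one-way ANOVA

# The omnibus F statistic is the same in both summaries
F_lm  <- anova(fit_lm)[['F value']][1]
F_aov <- summary(fit_aov)[[1]][['F value']][1]

# Intercept = mean of the reference (control) group;
# each slope = that group's mean minus the control mean
group_means <- tapply(RT, condition, mean)
mean_diffs  <- group_means[-1] - group_means[['control']]

print(c(F_lm = F_lm, F_aov = F_aov))
print(cbind(slope = coef(fit_lm)[-1], mean_diff = mean_diffs))
```

In an ANOVA you would usually only look at the F test. In a regression analysis you would focus on the coefficients, which here are just the mean differences.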
Regression is a type of statistical model (i.e., a way to represent a univariate distribution with a conditional mean and fixed variance). When we do a regression analysis, we primarily focus on the estimated parameters of the underlying linear model. When doing ANOVA in R, we estimate the model exactly as we would for linear regression; we simply summarize the results differently. If you want to summarize your model in terms of the sums of squares table you usually see when running an ANOVA, you can supply your fitted lm object to the anova() function. This is a statistical modeling course, not a statistical testing course, so we will not consider ANOVA any further. 2.4.3.3 Estimate a linear model that will answer the research question stated at the beginning of this section. Use lm() to estimate the model. Summarize the fitted model and use the results to answer the research question. Click to show code library(magrittr) lmOut <- drivers %>% mutate(condition = relevel(condition, ref = "control")) %$% lm(RT ~ condition) summary(lmOut) ## ## Call: ## lm(formula = RT ~ condition) ## ## Residuals: ## Min 1Q Median 3Q Max ## -317.50 -71.25 2.98 89.55 243.45 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 553.75 29.08 19.042 <2e-16 *** ## conditionhand-held 100.75 41.13 2.450 0.0174 * ## conditionhands-free 63.80 41.13 1.551 0.1264 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 130.1 on 57 degrees of freedom ## Multiple R-squared: 0.09729, Adjusted R-squared: 0.06562 ## F-statistic: 3.072 on 2 and 57 DF, p-value: 0.05408 anova(lmOut) Click for explanation The effect of condition on RT is nonsignificant (\\(F[2, 57] = 3.07\\), \\(p = 0.054\\)). Therefore, based on these results, we do not have evidence for an effect of mobile phone usage on driving performance. 2.4.3.4 Use lavaan to estimate the model from 2.4.3.3 as a path model.
Hint: lavaan won’t let us use factors for our categorical predictors. So, you will need to create your own dummy codes. Click to show code mod <- 'RT ~ 1 + HH + HF' lavOut <- drivers %>% mutate(HH = ifelse(condition == "hand-held", 1, 0), # Create dummy code for "hand-held" condition HF = ifelse(condition == "hands-free", 1, 0) # Create dummy code for "hands-free" condition ) %>% sem(mod, data = .) # Estimate the model summary(lavOut) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 4 ## ## Number of observations 60 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## RT ~ ## HH 100.750 40.085 2.513 0.012 ## HF 63.800 40.085 1.592 0.111 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .RT 553.750 28.344 19.537 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .RT 16068.028 2933.607 5.477 0.000 At this point, we haven’t covered the tools you need to conduct the ANOVA-style tests with path models. So, you can’t yet answer the research question with the above model. When we discuss model comparisons, you’ll get the missing tools. End of In-Class Exercises 2 "],["mediation-moderation.html", "3 Mediation & Moderation", " 3 Mediation & Moderation In this lecture, we will discuss two particular types of processes that we can model using path analysis: mediation and moderation. Homework before the lecture Watch the Lecture Recording for this week. Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. 
"],["lecture-2.html", "3.1 Lecture", " 3.1 Lecture Researchers often have theories about possible causal processes linking multiple variables. Mediation is a particularly important example of such a process in which an input variable, X, influences the outcome, Y, through an intermediary variable, M (the mediator). For instance, psychotherapy (X) may affect thoughts (M), which in turn affect mood (Y). We can investigate mediation via a specific sequence of linear regression equations, but path modeling will make our lives much easier. We can use path models to simultaneously estimate multiple related regression equations. So, mediation analysis is an ideal application of path modeling. In this lecture, we consider both approaches and discuss their relative strengths and weaknesses. As with mediation, researchers often posit theories involving moderation. Moderation implies that the effect of X on Y depends on another variable, Z. For instance, the effect of feedback (X) on performance (Y) may depend on age (Z). Older children might process feedback more effectively than younger children, so feedback would have a stronger effect on performance for older children than for younger children. In such a case, we would say that age moderates the effect of feedback on performance. 3.1.1 Recordings Note: In the following recordings, the slide numbers are a bit of a mess, because I made these videos by cutting together recordings that used different slide decks. My apologies to those who are particularly distracted by continuity errors. Mediation Basics Mediation Testing Bootstrapping Moderation Basics Moderation Probing 3.1.2 Slides You can download the lecture slides here "],["reading-2.html", "3.2 Reading", " 3.2 Reading Reference Baron, R. M., & Kenny, D. A. (1986).
The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. Questions What is mediation? Give an example of mediation. According to the authors, we must satisfy four criteria to infer mediation. What are these criteria? What is “moderation”, and how is it different from “mediation”? Give an example of moderation. What are the four methods given by Baron and Kenny as suitable ways to study interaction effects? The authors suggest that one of the most common ways to address unreliability is to use multiple indicators. Thinking back to what you’ve learned about factor analysis, briefly explain why multiple indicators can improve reliability. How can you determine whether a variable is a mediator or moderator? Reference Hayes, A. F. (2009). Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Communication Monographs, 76(4), 408–420. Questions What is an indirect or mediated effect? What is the difference between the total and direct effect? What is the main problem with the Baron & Kenny “Causal Steps Approach”? What is bootstrapping, and why is it a better way to test mediation than Sobel’s test? Explain how it is possible that “effects that don’t exist can be mediated”. "],["at-home-exercises-2.html", "3.3 At-Home Exercises", " 3.3 At-Home Exercises 3.3.1 Mediation In the first part of this practical, we will analyze the data contained in SelfEsteem.sav. These data comprise 143 observations of the following variables.1 case: Participant ID number ParAtt: Parental Attachment PeerAtt: Peer Attachment Emp: Empathy ProSoc: Prosocial behavior Aggr: Aggression SelfEst: Self-esteem 3.3.1.1 Load the SelfEsteem.sav data. Note: Unless otherwise specified, all analyses in Section 3.3.1 apply to these data.
Click to show code library(haven) seData <- read_sav("SelfEsteem.sav") Suppose we are interested in the (indirect) effect of peer attachment on self-esteem, and whether empathy has a mediating effect on this relationship. We might generate the following hypotheses: Better peer relationships promote higher self-esteem This effect is mediated by a student’s empathy levels, where better peer relationships increase empathy, and higher levels of empathy lead to higher self-esteem. To evaluate these hypotheses, we will use lavaan to estimate a path model. 3.3.1.2 Draw a path model (on paper) that can be used to test the above hypotheses. Label the input (X), outcome (Y), and mediator/intermediary (M). Label the paths a, b, and c’. Hint: Refer back to the Mediation Basics lecture if you need help here. Click for explanation 3.3.1.3 Specify the lavaan model syntax implied by the path diagram shown above. Save the resulting character string as an object in your environment. Hint: Refer back to the example in which opinions of systematic racism mediate the relationship between political affiliation and support for affirmative action policies from the Mediation Testing lecture this week. Click to show code mod <- ' ## Equation for outcome: SelfEst ~ Emp + PeerAtt ## Equation for the mediator: Emp ~ PeerAtt ' 3.3.1.4 Use the lavaan::sem() function to estimate the model defined in 3.3.1.3. Use the default settings in sem(). Click to show code library(lavaan) out <- sem(mod, data = seData) 3.3.1.5 Explore the summary of the fitted model. Which numbers correspond to the a, b, and c’ paths? Interpret these paths. Does the direction of the effects seem to align with our hypotheses?
Click to show code summary(out) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 5 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## Emp 0.234 0.091 2.568 0.010 ## PeerAtt 0.174 0.088 1.968 0.049 ## Emp ~ ## PeerAtt 0.349 0.076 4.628 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.934 0.110 8.456 0.000 ## .Emp 0.785 0.093 8.456 0.000 Click for explanation The results show estimates of the a path (Emp ~ PeerAtt), the b path (SelfEst ~ Emp), and the c’ path (SelfEst ~ PeerAtt). Using one-tailed tests for our directional hypotheses (i.e., halving the two-tailed p-values), all three of these effects are positive and significant. This holds for the direct effect of PeerAtt on SelfEst (\\(\\beta = 0.174\\), \\(Z = 1.97\\), \\(p = 0.025\\)). It also holds for both components of the indirect effect: the effect of PeerAtt on Emp (\\(\\beta = 0.349\\), \\(Z = 4.63\\), \\(p < 0.001\\)) and the effect of Emp on SelfEst (\\(\\beta = 0.234\\), \\(Z = 2.57\\), \\(p = 0.005\\)). The directions of the effects seem to support our hypotheses, but until we take the next steps to investigate the indirect effect itself, we should be hesitant to say more. Remember that an indirect effect (IE) is the product of multiple regression slopes. Therefore, to estimate an IE, we must define this product in our model syntax. In lavaan, we define the new IE parameter in two steps. Label the relevant regression paths. Use the labels to define a new parameter that represents the desired IE. We can define new parameters in lavaan model syntax via the := operator.
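The two-step logic (estimate the a and b slopes, then multiply them) does not depend on lavaan. Below is a minimal base-R sketch that fits the mediator and outcome equations separately with lm(). The data are simulated: the variable names mirror SelfEsteem.sav, but the values and effect sizes are invented. For observed-variable path models like this one, lavaan's default ML regression point estimates match the OLS estimates from lm(); only the standard errors differ slightly. The sketch also checks the decomposition from the Hayes reading: the total effect equals the direct effect plus the indirect effect.

```r
# SIMULATED stand-in for SelfEsteem.sav (hypothetical data, n = 143)
set.seed(143)
n       <- 143
PeerAtt <- rnorm(n)
Emp     <- 0.35 * PeerAtt + rnorm(n)               # mediator equation
SelfEst <- 0.23 * Emp + 0.17 * PeerAtt + rnorm(n)  # outcome equation

# Step 1: estimate the slopes that we would label in lavaan syntax
a       <- coef(lm(Emp ~ PeerAtt))[['PeerAtt']]    # a path
fit_y   <- lm(SelfEst ~ Emp + PeerAtt)
b       <- coef(fit_y)[['Emp']]                    # b path
c_prime <- coef(fit_y)[['PeerAtt']]                # c' path (direct effect)

# Step 2: define the indirect effect as the product of those slopes
ie <- a * b

# Total effect: simple regression of the outcome on the input
c_total <- coef(lm(SelfEst ~ PeerAtt))[['PeerAtt']]

print(c(a = a, b = b, ie = ie, direct = c_prime, total = c_total))
# With OLS, total = direct + indirect holds exactly in the sample
```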
The lavaan website contains a tutorial on this procedure: http://lavaan.ugent.be/tutorial/mediation.html 3.3.1.6 Use the procedure described above to modify the model syntax from 3.3.1.3 by adding the definition of the hypothesized IE from PeerAtt to SelfEst. Click to show code mod <- ' ## Equation for outcome: SelfEst ~ b * Emp + PeerAtt ## Equation for mediator: Emp ~ a * PeerAtt ## Indirect effect: ie := a * b ' Click for explanation Notice that I only label the parameters that I will use to define the IE. You are free to label any parameter that you like, but I chose to label only the minimally sufficient set to avoid cluttering the code/output. 3.3.1.7 Use lavaan::sem() to estimate the model with the IE defined. Use the default settings for sem(). Is the hypothesized IE significant according to the default tests? Hint: Refer to the Mediation Testing lecture. Click to show code out <- sem(mod, data = seData) summary(out) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 5 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## Emp (b) 0.234 0.091 2.568 0.010 ## PeerAtt 0.174 0.088 1.968 0.049 ## Emp ~ ## PeerAtt (a) 0.349 0.076 4.628 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.934 0.110 8.456 0.000 ## .Emp 0.785 0.093 8.456 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ie 0.082 0.036 2.245 0.025 Click for explanation The IE of Peer Attachment on Self-Esteem through Empathy is statistically significant (\\(\\hat{\\textit{IE}} = 0.082\\), \\(Z = 2.25\\), \\(p = 0.012\\)).
Note: The p-value above doesn’t match the output because we’re testing a directional hypothesis, but lavaan conducts two-tailed tests for the model parameters. As we learned in the lecture, the above test of the indirect effect is equivalent to Sobel’s Z test (which we don’t really want). An appropriate, robust test of the indirect effect requires bootstrapping, which we will do later this week as part of the in-class exercises. For now, we’ll add another input variable to our model: parental attachment. We will use this model to evaluate the following research questions: Is there a direct effect of parental attachment on self-esteem, after controlling for peer attachment and empathy? Is there a direct effect of peer attachment on self-esteem, after controlling for parental attachment and empathy? Is the effect of parental attachment on self-esteem mediated by empathy, after controlling for peer attachment? Is the effect of peer attachment on self-esteem mediated by empathy, after controlling for parental attachment? 3.3.1.8 Run the path model needed to test the research questions listed above. Specify the lavaan model syntax implied by the research questions. Allow peer attachment and parental attachment to covary. Define two new parameters to represent the hypothesized indirect effects. Estimate the model using lavaan::sem(). Use the default settings in sem(). Investigate the model summary. 
Click to show code mod <- ' ## Equation for outcome: SelfEst ~ b * Emp + ParAtt + PeerAtt ## Equation for mediator: Emp ~ a1 * ParAtt + a2 * PeerAtt ## Covariance: ParAtt ~~ PeerAtt ie_ParAtt := a1 * b ie_PeerAtt := a2 * b ' out <- sem(mod, data = seData) summary(out) ## lavaan 0.6.16 ended normally after 16 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 10 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## Emp (b) 0.206 0.088 2.357 0.018 ## ParAtt 0.287 0.078 3.650 0.000 ## PeerAtt 0.024 0.094 0.252 0.801 ## Emp ~ ## ParAtt (a1) 0.078 0.075 1.045 0.296 ## PeerAtt (a2) 0.306 0.086 3.557 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## ParAtt ~~ ## PeerAtt 0.537 0.103 5.215 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.854 0.101 8.456 0.000 ## .Emp 0.779 0.092 8.456 0.000 ## ParAtt 1.277 0.151 8.456 0.000 ## PeerAtt 0.963 0.114 8.456 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ie_ParAtt 0.016 0.017 0.956 0.339 ## ie_PeerAtt 0.063 0.032 1.965 0.049 3.3.1.9 What can we say about the two indirect effects? Can we say that empathy mediates both paths? Click to show explanation According to the Sobel-style test, after controlling for parental attachment, the indirect effect of peer attachment on self-esteem was statistically significant (\\(\\hat{IE} = 0.063\\), \\(Z = 1.96\\), \\(p = 0.049\\)). So was its first component, the effect of peer attachment on empathy (\\(\\hat{\\beta} = 0.306\\), \\(Z = 3.56\\), \\(p < 0.001\\)). The direct effect of peer attachment on self-esteem, however, was not significant (\\(\\hat{\\beta} = 0.024\\), \\(Z = 0.25\\), \\(p = 0.801\\)).
After controlling for peer attachment, neither the indirect effect of parental attachment on self-esteem (\\(\\hat{IE} = 0.016\\), \\(Z = 0.96\\), \\(p = 0.339\\)) nor the effect of parental attachment on empathy (\\(\\hat{\\beta} = 0.078\\), \\(Z = 1.05\\), \\(p = 0.296\\)) was significant. The direct effect of parental attachment on self-esteem was significant, though (\\(\\hat{\\beta} = 0.287\\), \\(Z = 3.65\\), \\(p < 0.001\\)). So, according to these tests, empathy mediates the effect of peer attachment on self-esteem, but not the effect of parental attachment. 3.3.2 Moderation Remember that moderation attempts to describe when one variable influences another. For the home exercise, we’ll go back to the Sesame Street data we worked with for the in-class exercises last week. 3.3.2.1 Load the Sesam2.sav data.2 Note: Unless otherwise specified, all analyses in Section 3.3.2 use these data. Click to show code # Read the data into an object called 'sesam2': sesam2 <- read_sav("Sesam2.sav") VIEWCAT is a nominal grouping variable, but it is represented as a numeric variable in the sesam2 data. The levels represent the following frequencies of Sesame Street viewership of the children in the data: VIEWCAT = 1: Rarely/Never VIEWCAT = 2: 2–3 times a week VIEWCAT = 3: 4–5 times a week VIEWCAT = 4: > 5 times a week 3.3.2.2 Convert VIEWCAT into a factor. Make sure that VIEWCAT = 1 is the reference group. Hints: You can identify the reference group with the levels() or contrasts() functions. The reference group is the group labelled with the first level printed by levels(). When you run contrasts(), you will see a pattern matrix that defines a certain dummy coding scheme. The reference group is the group that has zeros in each column of this matrix. If you need to change the reference group, you can use the relevel() function.
Click to show code library(dplyr) # for %>% and mutate() ## Convert 'VIEWCAT' to a factor: sesam2 <- sesam2 %>% mutate(VIEWCAT = factor(VIEWCAT)) ## Optionally specify the labels # sesam2 <- # sesam2 %>% # mutate(VIEWCAT = factor(VIEWCAT, # levels = c(1, 2, 3, 4), # labels = c("Rarely/never", # "2-3 times per week", # "4-5 times per week", # "> 5 times per week"))) ## Check the reference group: levels(sesam2$VIEWCAT) ## [1] "1" "2" "3" "4" contrasts(sesam2$VIEWCAT) ## 2 3 4 ## 1 0 0 0 ## 2 1 0 0 ## 3 0 1 0 ## 4 0 0 1 ## If necessary, relevel # sesam2 <- # sesam2 %>% # mutate(VIEWCAT = relevel(VIEWCAT, ref = "1")) 3.3.2.3 Use lm() to estimate a multiple regression model wherein VIEWCAT predicts POSTNUMB. Summarize the model. Interpret the estimates. Click to show code lmOut <- lm(POSTNUMB ~ VIEWCAT, data = sesam2) summary(lmOut) ## ## Call: ## lm(formula = POSTNUMB ~ VIEWCAT, data = sesam2) ## ## Residuals: ## Min 1Q Median 3Q Max ## -25.474 -7.942 0.240 8.526 25.240 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 18.760 2.316 8.102 8.95e-14 *** ## VIEWCAT2 9.331 2.900 3.218 0.00154 ** ## VIEWCAT3 14.714 2.777 5.298 3.49e-07 *** ## VIEWCAT4 18.032 2.809 6.419 1.24e-09 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 11.58 on 175 degrees of freedom ## Multiple R-squared: 0.2102, Adjusted R-squared: 0.1967 ## F-statistic: 15.53 on 3 and 175 DF, p-value: 5.337e-09 Click for explanation Viewing category explains a statistically significant proportion of the variance in the post-test score of numbers learned (\\(R^2 = 0.21\\), \\(F(3, 175) = 15.53\\), \\(p < 0.001\\)). Kids who never or rarely watched Sesame Street had an average score of 18.76 on the post-test.
Kids with weekly viewing habits of 2–3, 4–5, or 5+ times per week all had significantly higher scores on the post-test than kids who never or rarely watched Sesame Street (2–3: \\(\\hat{\\beta} = 9.33\\), \\(t = 3.22\\), \\(p = 0.002\\); 4–5: \\(\\hat{\\beta} = 14.71\\), \\(t = 5.3\\), \\(p < 0.001\\); 5+: \\(\\hat{\\beta} = 18.03\\), \\(t = 6.42\\), \\(p < 0.001\\)). If we compare the box plot, kernel density plot, and model output below, the relationships between the regression coefficient estimates for the viewing categories and the group means should be evident. 3.3.2.4 Use ggplot() to make a scatterplot with AGE on the x-axis and POSTNUMB on the y-axis. Color the points according to their VIEWCAT level. Save the plot object to a variable in your environment. Hint: You can map color to the levels of a variable in your dataset by assigning the variable name to the color argument of the aes() function in ggplot(). Click to show code library(ggplot2) ## Add aes(..., color = VIEWCAT) to get different colors for each group: p <- ggplot(sesam2, aes(x = AGE, y = POSTNUMB, color = VIEWCAT)) + geom_point() # Add points for scatterplot ## Print the plot stored as 'p': p We assigned the global color aesthetic to the VIEWCAT variable, so the points are colored based on their group. 3.3.2.5 Add linear regression lines for each group to the above scatterplot. Hints: You can add regression lines with ggplot2::geom_smooth() To get linear regression lines, set the argument method = \"lm\" To omit error envelopes, set the argument se = FALSE Click to show code ## Add OLS best-fit lines: p + geom_smooth(method = "lm", se = FALSE) The global color aesthetic assignment from above carries through to any additional plot elements that we add, including the regression lines. So, we also get a separate regression line for each VIEWCAT group. 3.3.2.6 How would you interpret the pattern of regression lines above?
Click for explanation All the lines show a positive slope, so post-test number recognition appears to increase along with increasing age. The lines are not parallel, though, so VIEWCAT may be moderating the effect of AGE on POSTNUMB. Based on the figure we just created, we may want to test for moderation in our regression model. To do so, we need to add an interaction between AGE and VIEWCAT. The VIEWCAT factor is represented by 3 dummy codes in our model, though. So when we interact AGE and VIEWCAT, we will create 3 interaction terms. To test the overall moderating influence of VIEWCAT, we need to conduct a multiparameter hypothesis test of all 3 interaction terms. One way to implement such a test is through a hierarchical regression analysis entailing three steps: Estimate the additive model wherein we regress POSTNUMB onto AGE and VIEWCAT without any interaction. Estimate the moderated model by adding the interaction between AGE and VIEWCAT into the additive model. Conduct a \\(\\Delta R^2\\) test to compare the fit of the two models. 3.3.2.7 Conduct the hierarchical regression analysis described above. Does VIEWCAT significantly moderate the effect of AGE on POSTNUMB? Provide statistical justification for your conclusion. Click to show code ## Estimate the additive model and view the results: results_add <- lm(POSTNUMB ~ VIEWCAT + AGE, data = sesam2) summary(results_add) ## ## Call: ## lm(formula = POSTNUMB ~ VIEWCAT + AGE, data = sesam2) ## ## Residuals: ## Min 1Q Median 3Q Max ## -23.680 -8.003 -0.070 8.464 22.635 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -10.1056 6.5091 -1.553 0.12235 ## VIEWCAT2 9.1453 2.7390 3.339 0.00103 ** ## VIEWCAT3 13.8602 2.6294 5.271 3.98e-07 *** ## VIEWCAT4 16.9215 2.6636 6.353 1.79e-09 *** ## AGE 0.5750 0.1221 4.708 5.08e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.
0.1 ' ' 1 ## ## Residual standard error: 10.94 on 174 degrees of freedom ## Multiple R-squared: 0.2995, Adjusted R-squared: 0.2834 ## F-statistic: 18.6 on 4 and 174 DF, p-value: 9.642e-13 ## Estimate the moderated model and view the results: results_mod <- lm(POSTNUMB ~ VIEWCAT * AGE, data = sesam2) summary(results_mod) ## ## Call: ## lm(formula = POSTNUMB ~ VIEWCAT * AGE, data = sesam2) ## ## Residuals: ## Min 1Q Median 3Q Max ## -23.8371 -8.2387 0.6158 8.7988 22.5611 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -18.7211 15.5883 -1.201 0.2314 ## VIEWCAT2 9.9741 20.6227 0.484 0.6293 ## VIEWCAT3 23.5825 19.3591 1.218 0.2248 ## VIEWCAT4 34.3969 19.3600 1.777 0.0774 . ## AGE 0.7466 0.3074 2.429 0.0162 * ## VIEWCAT2:AGE -0.0175 0.4060 -0.043 0.9657 ## VIEWCAT3:AGE -0.1930 0.3782 -0.510 0.6104 ## VIEWCAT4:AGE -0.3416 0.3770 -0.906 0.3663 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 10.99 on 171 degrees of freedom ## Multiple R-squared: 0.3046, Adjusted R-squared: 0.2762 ## F-statistic: 10.7 on 7 and 171 DF, p-value: 3.79e-11 ## Test for moderation: anova(results_add, results_mod) Click for explanation VIEWCAT does not significantly moderate the effect of AGE on POSTNUMB (\\(F[3, 171] = 0.422\\), \\(p = 0.738\\)). 3.3.2.8 Sketch the analytic path diagrams for the additive and moderated models you estimated in 3.3.2.7 (on paper). Click for explanation Additive Model Moderated Model End of At-Home Exercises 3 These data were simulated from the covariance matrix provided in Laible, D. J., Carlo, G., & Roesch, S. C. (2004). Pathways to self-esteem in late adolescence: The role of parent and peer attachment, empathy, and social behaviours. Journal of Adolescence, 27(6), 703–716.↩︎ These data are from the very interesting study: Ball, S., & Bogatz, G. A. (1970).
A Summary of the Major Findings in “The First Year of Sesame Street: An Evaluation”.↩︎ "],["in-class-exercises-2.html", "3.4 In-Class Exercises", " 3.4 In-Class Exercises 3.4.1 Mediation In this practical, we’ll go back to the data from the at-home exercises, SelfEsteem.sav. Recall that these data comprise 143 observations of the following variables. case: Participant ID number ParAtt: Parental Attachment PeerAtt: Peer Attachment Emp: Empathy ProSoc: Prosocial behavior Aggr: Aggression SelfEst: Self-esteem When we last worked with the data, we built a model with one mediator (Emp), creating indirect effects from our predictors, ParAtt and PeerAtt, to our outcome variable, SelfEst. Below, you will estimate a more complex, multiple-mediator model. 3.4.1.1 Load the data into the object seData using haven::read_sav(). Click to show code library(haven) seData <- read_sav("SelfEsteem.sav") For this analysis, we are interested in the (indirect) effects of parental and peer attachment on self-esteem. Furthermore, we want to evaluate the mediating roles of empathy and social behavior (i.e., prosocial behavior and aggression). Specifically, we have the following hypotheses. Better peer relationships will promote higher self-esteem via a three-step indirect process. Better peer relationships will increase empathy levels. Higher empathy will increase prosocial behavior and decrease aggressive behavior. More prosocial behaviors and less aggressive behavior will both produce higher self-esteem. Better relationships with parents directly increase self-esteem. To evaluate these hypotheses, we will use lavaan to estimate the following multiple-mediator model as a path model. 3.4.1.2 Specify the lavaan model syntax implied by the path diagram shown above. Save the resulting character string as an object in your environment.
Click to show code mod0 <- ' ## Equation for outcome: SelfEst ~ ProSoc + Aggr + Emp + ParAtt + PeerAtt ## Equations for stage 2 mediators: ProSoc ~ PeerAtt + ParAtt + Emp Aggr ~ PeerAtt + ParAtt + Emp ## Equation for stage 1 mediator: Emp ~ ParAtt + PeerAtt ## Covariances: ProSoc ~~ Aggr ParAtt ~~ PeerAtt ' 3.4.1.3 Use the lavaan::sem() function to estimate the model defined in 3.4.1.2. Use the default settings in sem(). Summarize the fitted model. Click to show code library(lavaan) out <- sem(mod0, data = seData) summary(out) ## lavaan 0.6.16 ended normally after 16 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 21 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## ProSoc 0.252 0.096 2.634 0.008 ## Aggr 0.185 0.085 2.172 0.030 ## Emp 0.143 0.098 1.460 0.144 ## ParAtt 0.244 0.078 3.133 0.002 ## PeerAtt 0.051 0.091 0.555 0.579 ## ProSoc ~ ## PeerAtt -0.037 0.080 -0.469 0.639 ## ParAtt 0.193 0.067 2.886 0.004 ## Emp 0.477 0.074 6.411 0.000 ## Aggr ~ ## PeerAtt -0.095 0.090 -1.055 0.291 ## ParAtt -0.034 0.075 -0.454 0.650 ## Emp -0.309 0.084 -3.697 0.000 ## Emp ~ ## ParAtt 0.078 0.075 1.045 0.296 ## PeerAtt 0.306 0.086 3.557 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## .ProSoc ~~ ## .Aggr -0.086 0.058 -1.476 0.140 ## ParAtt ~~ ## PeerAtt 0.537 0.103 5.215 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.796 0.094 8.456 0.000 ## .ProSoc 0.618 0.073 8.456 0.000 ## .Aggr 0.777 0.092 8.456 0.000 ## .Emp 0.779 0.092 8.456 0.000 ## ParAtt 1.277 0.151 8.456 0.000 ## PeerAtt 0.963 0.114 8.456 0.000 3.4.1.4 Considering the parameter estimates from 3.4.1.3, what can you say about the hypotheses? 
Click for explanation Notice that all of the hypotheses stated above are explicitly directional. Hence, when evaluating the significance of the structural paths that speak to these hypotheses, we should use one-tailed tests. We cannot ask lavaan to return one-tailed p-values, but we have no need to do so. We can simply divide the two-tailed p-values in half. The significant direct effect of ParAtt on SelfEst (\\(\\beta = 0.244\\), \\(Z = 3.13\\), \\(p = 0.001\\)) and the lack of a significant direct effect of PeerAtt on SelfEst (\\(\\beta = 0.051\\), \\(Z = 0.555\\), \\(p = 0.29\\)) align with our hypotheses. The remaining patterns of individual estimates also seem to conform to the hypotheses (e.g., all of the individual paths comprising the indirect effects of PeerAtt on SelfEst are significant). We cannot make any firm conclusions until we actually estimate and test the indirect effects, though. 3.4.1.5 Modify the model syntax from 3.4.1.2 by adding definitions of the two hypothesized IEs from PeerAtt to SelfEst. Click to show code You can use any labeling scheme that makes sense to you, but I recommend adopting some kind of systematic rule. Here, I will label the individual estimates in terms of the short variable names used in the path diagram above. mod <- ' ## Equation for outcome: SelfEst ~ y_m21 * ProSoc + y_m22 * Aggr + Emp + ParAtt + PeerAtt ## Equations for stage 2 mediators: ProSoc ~ m21_x2 * PeerAtt + ParAtt + m21_m1 * Emp Aggr ~ m22_x2 * PeerAtt + ParAtt + m22_m1 * Emp ## Equation for stage 1 mediator: Emp ~ ParAtt + m1_x2 * PeerAtt ## Covariances: ProSoc ~~ Aggr ParAtt ~~ PeerAtt ## Indirect effects: ie_pro := m1_x2 * m21_m1 * y_m21 ie_agg := m1_x2 * m22_m1 * y_m22 ' 3.4.1.6 Use lavaan::sem() to estimate the model with the IEs defined. Use the default settings for sem(). Are the hypothesized IEs significant according to the default tests? 
Click to show code out <- sem(mod, data = seData) summary(out) ## lavaan 0.6.16 ended normally after 16 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 21 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## ProSoc (y_21) 0.252 0.096 2.634 0.008 ## Aggr (y_22) 0.185 0.085 2.172 0.030 ## Emp 0.143 0.098 1.460 0.144 ## ParAtt 0.244 0.078 3.133 0.002 ## PerAtt 0.051 0.091 0.555 0.579 ## ProSoc ~ ## PerAtt (m21_2) -0.037 0.080 -0.469 0.639 ## ParAtt 0.193 0.067 2.886 0.004 ## Emp (m21_1) 0.477 0.074 6.411 0.000 ## Aggr ~ ## PerAtt (m22_2) -0.095 0.090 -1.055 0.291 ## ParAtt -0.034 0.075 -0.454 0.650 ## Emp (m22_1) -0.309 0.084 -3.697 0.000 ## Emp ~ ## ParAtt 0.078 0.075 1.045 0.296 ## PerAtt (m1_2) 0.306 0.086 3.557 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## .ProSoc ~~ ## .Aggr -0.086 0.058 -1.476 0.140 ## ParAtt ~~ ## PeerAtt 0.537 0.103 5.215 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.796 0.094 8.456 0.000 ## .ProSoc 0.618 0.073 8.456 0.000 ## .Aggr 0.777 0.092 8.456 0.000 ## .Emp 0.779 0.092 8.456 0.000 ## ParAtt 1.277 0.151 8.456 0.000 ## PeerAtt 0.963 0.114 8.456 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ie_pro 0.037 0.018 2.010 0.044 ## ie_agg -0.017 0.011 -1.657 0.098 Click for explanation The IE of Peer Attachment on Self Esteem through Empathy and Prosocial Behavior is significant (\\(\\hat{\\textit{IE}} = 0.037\\), \\(Z = 2.01\\), \\(p = 0.022\\)), as is the analogous IE through Empathy and Aggressive Behavior (\\(\\hat{\\textit{IE}} = -0.017\\), \\(Z = -1.66\\), \\(p = 0.049\\)). This latter effect, however, is only barely significant at the \\(\\alpha = 0.05\\) level. 
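The default tests of ie_pro and ie_agg are delta-method (first-order) z tests, and the one-tailed p-values quoted in the explanation are half of the two-tailed ones. The sketch below approximately reproduces both from the rounded path estimates and SEs printed in the summary. It assumes that the three estimates in each product are uncorrelated, whereas lavaan uses their full sampling covariance matrix, so small discrepancies are expected; the helper names ie_delta() and p_one_tailed() are hypothetical.

```r
## Delta-method (first-order) SE for a product of path estimates,
## ASSUMING the estimates are uncorrelated (lavaan uses their full
## covariance matrix). ie_delta() and p_one_tailed() are hypothetical helpers.
ie_delta <- function(est, se) {
  ie <- prod(est)
  ## Partial derivative of the product w.r.t. each path = product of the others
  grad <- sapply(seq_along(est), function(i) prod(est[-i]))
  ie_se <- sqrt(sum((grad * se)^2))
  z <- ie / ie_se
  c(est = ie, se = ie_se, z = z, p_two = 2 * pnorm(-abs(z)))
}

## One-tailed p-value for a directional hypothesis:
## direction = 1 for a hypothesized positive effect, -1 for a negative one.
## This equals p_two / 2 only when the estimate has the hypothesized sign.
p_one_tailed <- function(z, direction) pnorm(-direction * z)

## Rounded estimates and SEs from the 3.4.1.6 summary, in the order
## PeerAtt -> Emp, Emp -> mediator, mediator -> SelfEst:
pro <- ie_delta(est = c(0.306, 0.477, 0.252), se = c(0.086, 0.074, 0.096))
agg <- ie_delta(est = c(0.306, -0.309, 0.185), se = c(0.086, 0.084, 0.085))

round(pro, 3)
round(agg, 3)
p_one_tailed(pro[['z']], direction = 1)   # IE via ProSoc: hypothesized positive
p_one_tailed(agg[['z']], direction = -1)  # IE via Aggr: hypothesized negative
```

With these inputs, the SEs come out at roughly 0.018 and 0.011 and the z-values at roughly 2.01 and -1.66, in line with the ie_pro and ie_agg rows of the summary.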
The tests we used to evaluate the significance of the IEs in 3.4.1.6 are flawed because they assume normal sampling distributions for the IEs. However, each IE is defined as a product of several regression slopes, and each of those slopes has a normal sampling distribution. A product of normally distributed variables is not itself normally distributed, so the IEs cannot have normal sampling distributions (at least in finite samples), and the results of the normal-theory significance tests may be misleading. To get a more accurate test of the IEs, we should use bootstrapping to generate an empirical sampling distribution for each IE. In lavaan, we implement bootstrapping by specifying the se = \"bootstrap\" option in the fitting function (i.e., the cfa() or sem() function) and specifying the number of bootstrap samples via the bootstrap option. Workflow Tip To draw reliable conclusions from bootstrapped results, we need many bootstrap samples (e.g., B ≥ 1000), but we must estimate the full model for each of these samples, so the estimation can take a long time. To avoid too much frustration, you should first estimate the model without bootstrapping to make sure everything is specified correctly. Only after you are certain that your code is correct should you run the full bootstrapped version. 3.4.1.7 Re-estimate the model from 3.4.1.6 using 1000 bootstrap samples. Other than the se and bootstrap options, use the defaults. Are the hypothesized IEs significant according to the bootstrap-based test statistics? 
Click to show code ## Set a seed to get replicable bootstrap samples: set.seed(235711) ## Estimate the model with bootstrapping: out_boot <- sem(mod, data = seData, se = "bootstrap", bootstrap = 1000) ## Summarize the model: summary(out_boot) ## lavaan 0.6.16 ended normally after 16 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 21 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Bootstrap ## Number of requested bootstrap draws 1000 ## Number of successful bootstrap draws 1000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## ProSoc (y_21) 0.252 0.100 2.529 0.011 ## Aggr (y_22) 0.185 0.085 2.174 0.030 ## Emp 0.143 0.095 1.507 0.132 ## ParAtt 0.244 0.079 3.089 0.002 ## PerAtt 0.051 0.095 0.530 0.596 ## ProSoc ~ ## PerAtt (m21_2) -0.037 0.082 -0.456 0.648 ## ParAtt 0.193 0.068 2.831 0.005 ## Emp (m21_1) 0.477 0.078 6.092 0.000 ## Aggr ~ ## PerAtt (m22_2) -0.095 0.087 -1.093 0.275 ## ParAtt -0.034 0.076 -0.448 0.654 ## Emp (m22_1) -0.309 0.092 -3.356 0.001 ## Emp ~ ## ParAtt 0.078 0.072 1.092 0.275 ## PerAtt (m1_2) 0.306 0.079 3.896 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## .ProSoc ~~ ## .Aggr -0.086 0.058 -1.493 0.135 ## ParAtt ~~ ## PeerAtt 0.537 0.128 4.195 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.796 0.082 9.698 0.000 ## .ProSoc 0.618 0.068 9.114 0.000 ## .Aggr 0.777 0.104 7.476 0.000 ## .Emp 0.779 0.090 8.651 0.000 ## ParAtt 1.277 0.197 6.473 0.000 ## PeerAtt 0.963 0.105 9.203 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ie_pro 0.037 0.019 1.891 0.059 ## ie_agg -0.017 0.011 -1.638 0.101 Click for explanation As with the normal-theory tests, the hypothesized IE of Peer Attachment on Self Esteem through Empathy and Prosocial Behavior is significant (\\(\\hat{\\textit{IE}} = 0.037\\), \\(Z = 1.89\\), \\(p = 0.029\\)), but the IE through Empathy and Aggressive Behavior
has crossed into nonsignificant territory (\\(\\hat{\\textit{IE}} = -0.017\\), \\(Z = -1.64\\), \\(p = 0.051\\)). Note: Bootstrapping is a stochastic method, so each run can produce different results. Since the indirect effect through aggressive behavior is so close to the critical value, you may come to a different conclusion vis-à-vis statistical significance if you run this analysis with a different random number seed or a different number of bootstrap samples. When you use the summary() function to summarize the bootstrapped model from 3.4.1.7, the output will probably look much the same as it did in 3.4.1.6, but the numbers are not the same. The standard errors and test statistics in the bootstrapped summary are derived from empirical sampling distributions, whereas these values are based on an assumed normal sampling distribution in 3.4.1.6. The standard method of testing IEs with bootstrapping is to compute confidence intervals (CIs) from the empirical sampling distribution of the IEs. In lavaan, we can compute default (percentile, 95%) bootstrap CIs by adding the ci = TRUE option to the summary() function. To evaluate our directional hypotheses at an \\(\\alpha = 0.05\\) level, however, we need to compute 90% CIs, because each boundary of a two-sided 90% CI is a one-sided 95% bound. We can get more control over the summary statistics (including the CIs) with the parameterEstimates() function. 3.4.1.8 Check the documentation for lavaan::parameterEstimates(). Click to show code ?parameterEstimates 3.4.1.9 Use the parameterEstimates() function to compute bootstrapped CIs for the hypothesized IEs. Compute percentile CIs. Are the IEs significant according to the bootstrapped CIs? Click to show code parameterEstimates(out_boot, ci = TRUE, level = 0.9, boot.ci.type = "perc") Click for explanation When evaluating a directional hypothesis with a CI, we only consider one of the interval’s boundaries. For a hypothesized positive effect, we check only whether the lower boundary is greater than zero. For a hypothesized negative effect, we check only whether the upper boundary is less than zero. 
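The percentile bootstrap and the one-sided CI check described above can be sketched in base R for a single-mediator model. This is an illustration under stated assumptions, not lavaan's implementation: the data are simulated, the model is a simple x -> m -> y chain fit with lm(), and the helper ie_hat() is hypothetical. The lower bound of the two-sided 90% percentile interval serves as a one-sided 95% bound for a hypothesized positive IE.

```r
## Percentile bootstrap for an indirect effect a * b in a simple
## x -> m -> y model. Simulated data; all names are hypothetical.
set.seed(235711)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$m <- 0.4 * dat$x + rnorm(n)
dat$y <- 0.3 * dat$m + 0.1 * dat$x + rnorm(n)

## Indirect effect: (x -> m slope) * (m -> y slope, controlling for x)
ie_hat <- function(d) {
  a <- coef(lm(m ~ x, data = d))[['x']]
  b <- coef(lm(y ~ m + x, data = d))[['m']]
  a * b
}

## Re-estimate the IE on B resampled (with replacement) copies of the data
B <- 1000
boot_ie <- replicate(B, ie_hat(dat[sample(n, replace = TRUE), ]))

## Two-sided 90% percentile interval from the empirical distribution
ci90 <- quantile(boot_ie, probs = c(0.05, 0.95))

ie_hat(dat)
ci90
## Directional test at alpha = .05 for a hypothesized positive IE:
ci90[[1]] > 0
```

For a hypothesized negative IE, the analogous check would be ci90[[2]] < 0.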
As with the previous tests, the IE of Peer Attachment on Self Esteem through Empathy and Prosocial Behavior is significant (\\(\\hat{\\textit{IE}} = 0.037\\), \\(95\\% ~ CI = [0.009; \\infty]\\)), but the analogous IE through Empathy and Aggressive Behavior is not quite significant (\\(\\hat{\\textit{IE}} = -0.017\\), \\(95\\% ~ CI = [-\\infty; -0.003]\\)). 3.4.1.10 Based on the analyses you’ve conducted here, what do you conclude vis-à-vis the original hypotheses? Click for explanation When using normal-theory tests, both hypothesized indirect effects between Peer Attachment and Self Esteem were supported: the IE through Empathy and Prosocial Behavior and the IE through Empathy and Aggressive Behavior were both significant. The hypothesized direct effect of Parent Attachment on Self Esteem was also borne out by a significant direct effect in the model. When testing the indirect effects with bootstrapping, however, the effect through Empathy and Aggressive Behavior was nonsignificant. Since bootstrapping gives a more accurate test of the indirect effect, we should probably trust these results more than the normal-theory results. We should not infer a significant indirect effect of Peer Attachment on Self Esteem transmitted through Empathy and Aggressive Behavior. These results may not tell the whole story, though. We have not tested for indirect effects between Parent Attachment and Self Esteem, and we have not evaluated simpler indirect effects between Peer Attachment and Self Esteem (e.g., PeerAtt \\(\\rightarrow\\) Emp \\(\\rightarrow\\) SelfEst). 3.4.2 Moderation We will first analyze a synthetic version of the Outlook on Life Survey data. The original data were collected in the United States in 2012 to measure, among other things, attitudes about racial issues, opinions of the Federal government, and beliefs about the future. We will work with a synthesized subset of the original data. You can access these synthetic data as outlook.rds. 
This dataset comprises 2288 observations of the following 13 variables. d1:d3: Three observed indicators of a construct measuring disillusionment with the US Federal government. Higher scores indicate more disillusionment. s1:s4: Four observed indicators of a construct measuring the perceived achievability of material success. Higher scores indicate greater perceived achievability. progress: A single item assessing perceived progress toward achieving the “American Dream”. Higher scores indicate greater perceived progress. merit: A single item assessing endorsement of the meritocratic ideal that hard work leads to success. Higher scores indicate stronger endorsement of the meritocratic ideal. lib2Con: A single item assessing liberal-to-conservative orientation. Lower scores are more liberal; higher scores are more conservative. party: A four-level factor indicating self-reported political party affiliation. disillusion: A scale score representing disillusionment with the US Federal government, created as the mean of d1:d3. success: A scale score representing the perceived achievability of material success, created as the mean of s1:s4. To satisfy the access and licensing conditions under which the original data are distributed, the data contained in outlook.rds were synthesized from the original variables using the methods described by Volker and Vink (2021). You can access the original data here, and you can access the code used to process the data here. 3.4.2.1 Read in the outlook.rds dataset. Hint: An RDS file is an R object that’s been saved to a file. To read in this type of file, we use readRDS() from base R. Click to show code outlook <- readRDS("outlook.rds") 3.4.2.2 Summarize the outlook data to get a sense of their characteristics. Click to show code head(outlook) summary(outlook) ## d1 d2 d3 s1 ## Min. :1.000 Min. :1.000 Min. :1.000 Min. 
:1.000 ## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:2.000 ## Median :4.000 Median :3.000 Median :4.000 Median :2.000 ## Mean :3.642 Mean :3.218 Mean :3.629 Mean :2.288 ## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.000 ## Max. :5.000 Max. :5.000 Max. :5.000 Max. :4.000 ## s2 s3 s4 progress ## Min. :1.000 Min. :1.000 Min. :1.000 Min. : 1.000 ## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.: 5.000 ## Median :2.000 Median :2.000 Median :2.000 Median : 7.000 ## Mean :1.922 Mean :2.012 Mean :2.469 Mean : 6.432 ## 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 8.000 ## Max. :4.000 Max. :4.000 Max. :4.000 Max. :10.000 ## merit lib2Con party disillusion ## Min. :1.000 Min. :1.000 republican : 332 Min. :1.000 ## 1st Qu.:4.000 1st Qu.:3.000 democrat :1264 1st Qu.:3.000 ## Median :5.000 Median :4.000 independent: 576 Median :3.667 ## Mean :4.826 Mean :3.998 other : 116 Mean :3.497 ## 3rd Qu.:6.000 3rd Qu.:5.000 3rd Qu.:4.000 ## Max. :7.000 Max. :7.000 Max. :5.000 ## success ## Min. :1.000 ## 1st Qu.:1.750 ## Median :2.000 ## Mean :2.173 ## 3rd Qu.:2.500 ## Max. :4.000 str(outlook) ## 'data.frame': 2288 obs. of 13 variables: ## $ d1 : num 4 4 4 5 5 4 5 4 4 4 ... ## $ d2 : num 4 2 4 4 3 5 4 2 4 5 ... ## $ d3 : num 4 4 4 5 4 4 4 3 3 4 ... ## $ s1 : num 3 3 4 2 2 2 2 1 3 3 ... ## $ s2 : num 2 2 2 1 1 2 1 1 2 2 ... ## $ s3 : num 3 2 4 1 2 1 1 1 3 2 ... ## $ s4 : num 3 3 3 1 2 3 3 2 2 2 ... ## $ progress : num 8 4 6 1 6 5 7 6 9 7 ... ## $ merit : num 6 5 5 4 3 4 2 5 5 5 ... ## $ lib2Con : num 5 6 4 1 4 4 4 4 4 5 ... ## $ party : Factor w/ 4 levels "republican","democrat",..: 1 3 3 2 2 2 2 2 4 1 ... ## $ disillusion: num 4 3.33 4 4.67 4 ... ## $ success : num 2.75 2.5 3.25 1.25 1.75 2 1.75 1.25 2.5 2.25 ... We will first use OLS regression to estimate a model encoding the following relations: Belief in the achievability of success, success, predicts perceived progress toward the American Dream, progress, as the focal effect. 
Disillusionment with the US Federal government, disillusion, moderates the success \\(\\rightarrow\\) progress effect. Placement on the liberal-to-conservative continuum, lib2Con, is partialed out as a covariate. 3.4.2.3 Draw the conceptual path diagram for the model described above. Click for explanation 3.4.2.4 Write out the regression equation necessary to evaluate the moderation hypothesis described above. Click for explanation \\[ Y_{progress} = \\beta_0 + \\beta_1 W_{lib2Con} + \\beta_2 X_{success} + \\beta_3 Z_{disillusion} + \\beta_4 XZ + \\varepsilon \\] 3.4.2.5 Use lm() to estimate the moderated regression model via OLS regression. Click to show code olsFit <- lm(progress ~ lib2Con + success * disillusion, data = outlook) 3.4.2.6 Summarize the fitted model and interpret the results. Is the moderation hypothesis supported? How does disillusionment level affect the focal effect? Click to show code summary(olsFit) ## ## Call: ## lm(formula = progress ~ lib2Con + success * disillusion, data = outlook) ## ## Residuals: ## Min 1Q Median 3Q Max ## -7.4315 -1.2525 0.1307 1.4369 5.6717 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 6.81128 0.62073 10.973 < 2e-16 *** ## lib2Con 0.03052 0.03040 1.004 0.3155 ## success 0.42360 0.25853 1.638 0.1015 ## disillusion -0.78002 0.16864 -4.625 3.95e-06 *** ## success:disillusion 0.17429 0.07273 2.396 0.0166 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.041 on 2283 degrees of freedom ## Multiple R-squared: 0.1385, Adjusted R-squared: 0.137 ## F-statistic: 91.74 on 4 and 2283 DF, p-value: < 2.2e-16 Click for explanation Yes, disillusion significantly moderates the relation between success and progress (\\(\\beta = 0.174\\), \\(t[2283] = 2.396\\), \\(p = 0.017\\)) such that the effect of success on progress increases as levels of disillusion increase, after controlling for lib2Con. 
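The moderation equation from 3.4.2.4 implies that the simple slope of X at a moderator value \\(Z = z_0\\) is \\(\\beta_2 + \\beta_4 z_0\\), with sampling variance \\(Var(\\hat\\beta_2) + z_0^2 Var(\\hat\\beta_4) + 2 z_0 Cov(\\hat\\beta_2, \\hat\\beta_4)\\). For example, at the mean of disillusion (3.497), the estimates above give 0.42360 + 0.17429 × 3.497 ≈ 1.033. The sketch below computes simple slopes and their SEs from an lm() fit, and also solves for the Johnson-Neyman boundaries, i.e., the moderator values at which the simple slope's t statistic equals the critical value. It uses simulated data and hypothetical helper names (simple_slopes(), jn_bounds()); it is not the rockchalk or semTools code.

```r
## Simple slopes and Johnson-Neyman boundaries from an lm() fit.
## Simulated data; simple_slopes() and jn_bounds() are hypothetical helpers.
set.seed(42)
n <- 500
dat <- data.frame(x = rnorm(n), z = rnorm(n, mean = 3.5, sd = 0.8))
dat$y <- 1 + 0.4 * dat$x - 0.8 * dat$z + 0.2 * dat$x * dat$z + rnorm(n, sd = 2)

fit <- lm(y ~ x * z, data = dat)

## Slope of x at z0: b_x + b_xz * z0,
## Var = Var(b_x) + z0^2 * Var(b_xz) + 2 * z0 * Cov(b_x, b_xz)
simple_slopes <- function(fit, x, z, zvals) {
  b  <- coef(fit)
  V  <- vcov(fit)
  xz <- paste(x, z, sep = ':')
  est  <- b[[x]] + b[[xz]] * zvals
  se   <- sqrt(V[x, x] + zvals^2 * V[xz, xz] + 2 * zvals * V[x, xz])
  tval <- est / se
  data.frame(z = zvals, est = est, se = se, t = tval,
             p = 2 * pt(-abs(tval), df = df.residual(fit)))
}

## Johnson-Neyman: solve (b_x + b_xz z0)^2 = tc^2 * Var(slope at z0),
## a quadratic in z0. If the z0^2 coefficient (qa) is positive, the slope
## is significant OUTSIDE [lo, hi]; if it is negative, INSIDE.
jn_bounds <- function(fit, x, z, alpha = 0.05) {
  b  <- coef(fit)
  V  <- vcov(fit)
  xz <- paste(x, z, sep = ':')
  tc <- qt(1 - alpha / 2, df = df.residual(fit))
  qa <- b[[xz]]^2 - tc^2 * V[xz, xz]
  qb <- 2 * (b[[x]] * b[[xz]] - tc^2 * V[x, xz])
  qc <- b[[x]]^2 - tc^2 * V[x, x]
  disc <- qb^2 - 4 * qa * qc
  if (disc < 0) return(c(lo = NA, hi = NA))  # no real boundaries
  r <- (-qb + c(-1, 1) * sqrt(disc)) / (2 * qa)
  c(lo = min(r), hi = max(r))
}

## Probe at the mean and +/- 1 SD of the moderator
zvals <- mean(dat$z) + c(-1, 0, 1) * sd(dat$z)
simple_slopes(fit, 'x', 'z', zvals)
jn_bounds(fit, 'x', 'z')
```

A quick sanity check on the simple-slope formula: refitting the model with the moderator centered at z0 should return a coefficient for x (and its SE) identical to the simple slope at z0.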
The rockchalk package contains some useful routines for probing interactions estimated via lm(). Specifically, the plotSlopes() function estimates and plots simple slopes, and the testSlopes() function tests the simple slopes estimated by plotSlopes(). 3.4.2.7 Probe the interaction. Use the plotSlopes() and testSlopes() functions from the rockchalk package to conduct a simple slopes analysis for the model from 3.4.2.5. Click to show code library(rockchalk) ## Estimate and plot simple slopes: psOut <- plotSlopes(olsFit, plotx = "success", modx = "disillusion", modxVals = "std.dev") ## Test the simple slopes: tsOut <- testSlopes(psOut) ## Values of disillusion OUTSIDE this interval: ## lo hi ## -28.9332857 0.2672244 ## cause the slope of (b1 + b2*disillusion)success to be statistically significant ## View the results: tsOut$hypotests Note: The message printed by testSlopes() gives the boundaries of the Johnson-Neyman Region of Significance (Johnson & Neyman, 1936). Johnson-Neyman analysis is an alternative method of probing interactions that we have not covered in this course. For more information, see Preacher et al. (2006). We will now use lavaan to estimate the moderated regression model from above as a path analysis. 3.4.2.8 Define the model syntax for the path analytic version of the model described above. Parameterize the model as in the OLS regression. Use only observed items and scale scores. Click to show code pathMod <- ' progress ~ 1 + lib2Con + success + disillusion + success:disillusion ' 3.4.2.9 Estimate the path model on the outlook data. Click to show code pathFit <- sem(pathMod, data = outlook) 3.4.2.10 Summarize the fitted path model and interpret the results. Do the results match the OLS regression results? What proportion of the variability in progress is explained by this model? 
Hint: The lavInspect() function can extract information from fitted models. Click to show code summary(pathFit) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 6 ## ## Number of observations 2288 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## progress ~ ## lib2Con 0.031 0.030 1.005 0.315 ## success 0.424 0.258 1.640 0.101 ## disillusion -0.780 0.168 -4.630 0.000 ## success:dsllsn 0.174 0.073 2.399 0.016 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .progress 6.811 0.620 10.985 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .progress 4.157 0.123 33.823 0.000 lavInspect(pathFit, "r2") ## progress ## 0.138 Click for explanation Yes, the estimates and inferential conclusions are all the same as in the OLS regression model. The model explains 13.85% of the variability in progress. The semTools package contains some helpful routines for probing interactions estimated via the lavaan() function (or one of its wrappers). Specifically, the probe2WayMC() and plotProbe() functions will estimate/test simple slopes and plot the estimated simple slopes, respectively. 3.4.2.11 Probe the interaction from 3.4.2.9 using semTools utilities. Use probe2WayMC() to estimate and test the simple slopes. Use plotProbe() to visualize the simple slopes. Define the simple slopes with the same conditional values of disillusion that you used in 3.4.2.7. Which simple slopes are significant? Do these results match the results from 3.4.2.7? 
Click to show code library(semTools) library(dplyr) # provides summarise() and %>% ## Define the conditional values at which to calculate simple slopes: condVals <- summarise(outlook, "m-sd" = mean(disillusion) - sd(disillusion), mean = mean(disillusion), "m+sd" = mean(disillusion) + sd(disillusion) ) %>% unlist() ## Compute simple slopes and intercepts: ssOut <- probe2WayMC(pathFit, nameX = c("success", "disillusion", "success:disillusion"), nameY = "progress", modVar = "disillusion", valProbe = condVals) ## Check the results: ssOut ## $SimpleIntcept ## disillusion est se z pvalue ## m-sd 2.719 4.690 0.231 20.271 0 ## mean 3.497 4.084 0.190 21.508 0 ## m+sd 4.274 3.477 0.230 15.122 0 ## ## $SimpleSlope ## disillusion est se z pvalue ## m-sd 2.719 0.897 0.083 10.792 0 ## mean 3.497 1.033 0.065 15.994 0 ## m+sd 4.274 1.169 0.088 13.223 0 ## Visualize the simple slopes: plotProbe(ssOut, xlim = range(outlook$success), xlab = "Ease of Personal Success", ylab = "Progress toward American Dream", legendArgs = list(legend = names(condVals)) ) Click for explanation Each of the simple slopes is significant. As the level of disillusionment increases, the effect of success on progress also increases, and this effect is significant for all levels of disillusion considered here. These results match the simple slopes from the OLS regression analysis. End of In-Class Exercises 3 "],["efa.html", "4 EFA", " 4 EFA This week will be a general introduction to latent variables and scaling procedures. We will discuss several different aspects of exploratory factor analysis (EFA), most notably: the differences between Principal Component Analysis (PCA) and Factor Analysis; model estimation and factor extraction methods; and factor rotations. You will have to make decisions regarding each of these aspects when conducting a factor analysis. We will also discuss reliability and factor scores as means of evaluating the properties of a scale. Homework before the lecture Watch the Lecture Recording for this week. 
Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. "],["lecture-3.html", "4.1 Lecture", " 4.1 Lecture How do you know if you have measured the putative hypothetical construct that you intend to measure? The methods introduced in this lecture (namely, latent variables, factor analysis, and reliability analysis) can shed empirical light on this issue. In the social and behavioral sciences, we’re often forced to measure key concepts indirectly. For example, we have no way of directly quantifying a person’s current level of depression, or their innate motivation, or their risk-aversion, or any of the other myriad psychological features that comprise the human mental state. In truth, we cannot really measure these hypothetical constructs at all; we must estimate latent representations thereof (though psychometricians still use the language of physical measurement to describe this process). Furthermore, we can rarely estimate an adequate representation with only a single observed variable (e.g., a question on a survey, a score on a test, a reading from a sensor). We generally need several observed variables to reliably represent a single hypothetical construct. For example, we cannot accurately determine someone’s IQ or socio-economic status based on their response to a single question; we need several questions that each tap into slightly different aspects of IQ or SES. Given multiple items measuring the same construct, we can use the methods discussed in this lecture (i.e., factor analysis and reliability analysis) to evaluate the quality of our measurement (i.e., how well we have estimated the underlying hypothetical construct). 
If we do well enough in this estimation task, we will be able to combine these estimated latent variables with the path analysis methods discussed in the previous two weeks to produce the full structural equation models that we will cover at the end of this course. 4.1.1 Recording Notes: This week (and next), we’ll be re-using Caspar van Lissa’s old slides and lecture recording. So, you’ll see Caspar in the following video, and the slides will have a notably different flavor than our usual materials. Don’t be confused by any mention of “model fit” in the lecture. We haven’t covered model fit yet, but we will do so next week. 4.1.2 Slides You can download the lecture slides here. "],["reading-3.html", "4.2 Reading", " 4.2 Reading This week, you will read two papers. Reference 1 Preacher, K. J., & MacCallum, R. C. (2003). Repairing Tom Swift’s electric factor analysis machine. Understanding Statistics, 2(1), 13–43. Questions 1 What is a latent variable? Give an example of a latent variable. What is factor analysis, and what can you investigate using this method? In the introduction, Preacher and MacCallum describe a “little jiffy” method of doing factor analysis. Briefly describe this little jiffy (i.e., bad-practice) method. Briefly explain the key differences between Principal Component Analysis (PCA) and Exploratory Factor Analysis (EFA). What is the purpose of factor rotation? Reference 2 Kestilä, E. (2006). Is there demand for radical right populism in the Finnish electorate? Scandinavian Political Studies, 29(3), 169–191. Questions 2 What is the research question that the author tries to answer? Briefly describe the characteristics of the Radical Right Parties (RRP) in Europe. What are the two main explanations of support for RRP upon which this paper focuses? Does the empirical part of the paper reflect the theoretical framework well? Why or why not? According to the author, is Finland very different from other European countries on the main dependent variables? 
What is the author’s conclusion (i.e., how does the author answer the research question)? "],["at-home-exercises-3.html", "4.3 At-Home Exercises", " 4.3 At-Home Exercises In these exercises, you will attempt to replicate some of the analyses from the second reading for this week: Kestilä, E. (2006). Is there demand for radical right populism in the Finnish electorate? Scandinavian Political Studies, 29(3), 169–191. The data for this practical were collected during the first round of the European Social Survey (ESS). The ESS is a repeated cross-sectional survey administered in 32 European countries. The first round was collected in 2002, and a new round has been collected every two years since. You can find more info and access the data at https://www.europeansocialsurvey.org. The data we will analyze for this practical are contained in the file named ESSround1-a.sav. This file contains data for all respondents but includes only the variables that you will need to complete the following exercises. 4.3.1 Load the ESSround1-a.sav dataset into R. Inspect the data after loading to make sure everything went well. Click to show code ## Load the 'haven' and 'tidySEM' packages: library(haven) library(tidySEM) ## Read the 'ESSround1-a.sav' data into a data frame called 'ess': ess <- read_spss("ESSround1-a.sav") ## Inspect the result: dim(ess) head(ess) descriptives(ess) ## [1] 42359 50 Click here for a description of the variables. 
Variable Description name Title of dataset essround ESS round edition Edition proddate Production date cntry Country idno Respondent’s identification number trstlgl Trust in the legal system trstplc Trust in the police trstun Trust in the United Nations trstep Trust in the European Parliament trstprl Trust in country’s parliament stfhlth State of health services in country nowadays stfedu State of education in country nowadays stfeco How satisfied with present state of economy in country stfgov How satisfied with the national government stfdem How satisfied with the way democracy works in country pltinvt Politicians interested in votes rather than peoples opinions pltcare Politicians in general care what people like respondent think trstplt Trust in politicians imsmetn Allow many/few immigrants of same race/ethnic group as majority imdfetn Allow many/few immigrants of different race/ethnic group from majority eimrcnt Allow many/few immigrants from richer countries in Europe eimpcnt Allow many/few immigrants from poorer countries in Europe imrcntr Allow many/few immigrants from richer countries outside Europe impcntr Allow many/few immigrants from poorer countries outside Europe qfimchr Qualification for immigration: christian background qfimwht Qualification for immigration: be white imwgdwn Average wages/salaries generally brought down by immigrants imhecop Immigrants harm economic prospects of the poor more than the rich imtcjob Immigrants take jobs away in country or create new jobs imbleco Taxes and services: immigrants take out more than they put in or less imbgeco Immigration bad or good for country’s economy imueclt Country’s cultural life undermined or enriched by immigrants imwbcnt Immigrants make country worse or better place to live imwbcrm Immigrants make country’s crime problems worse or better imrsprc Richer countries should be responsible for accepting people from poorer countries pplstrd Better for a country if almost everyone share customs and 
traditions vrtrlg Better for a country if a variety of different religions shrrfg Country has more than its fair share of people applying refugee status rfgawrk People applying refugee status allowed to work while cases considered gvrfgap Government should be generous judging applications for refugee status rfgfrpc Most refugee applicants not in real fear of persecution own countries rfggvfn Financial support to refugee applicants while cases considered rfgbfml Granted refugees should be entitled to bring close family members gndr Gender yrbrn Year of birth edulvl Highest level of education eduyrs Years of full-time education completed polintr How interested in politics lrscale Placement on left right scale One thing you might notice when inspecting the ess data is that most of the variables are stored as labelled vectors. When loading SPSS data, haven will use these labelled vectors to preserve the metadata associated with SPSS scale variables (i.e., variable labels and value labels). While it’s good to have this metadata available, we want to analyze these items as numeric variables and factors, so the value labels are only going to make our lives harder. Thankfully, the labelled package contains many routines for manipulating labelled vectors. We’ll deal with the numeric variables in just a bit, but our first task will be to convert the grouping variables to factors. 4.3.2 Convert the cntry, gndr, edulvl, and polintr variables into factors. Use the as_factor() function to do the conversion. Convert edulvl and polintr to ordered factors. Click to see code library(dplyr) ess <- mutate(ess, country = as_factor(cntry), sex = as_factor(gndr), edulvl = as_factor(edulvl, ordered = TRUE), polintr = as_factor(polintr, ordered = TRUE) ) The ess dataset contains much more information than Kestilä (2006) used. 
Kestilä only analyzed data from the following ten countries: Austria, Belgium, Denmark, Finland, France, Germany, Italy, Netherlands, Norway, and Sweden. So, our next task is to subset the data to only the relevant population. With logical subsetting, we select rows from a dataset based on logical conditions. In this case, we want to select only rows from the 10 countries listed above. 4.3.3 Subset the data to include only the 10 countries analyzed by Kestilä (2006). Inspect the subsetted data to check that everything went well. Hints: Use the %in% operator to create a logical vector that indicates which elements of the country variable are in the set of target countries. Use the droplevels() function to clean up empty factor levels. Click to show code ## Create a character vector naming the target countries: targets <- c("Austria", "Belgium", "Denmark", "Finland", "France", "Germany", "Italy", "Netherlands", "Norway", "Sweden") ## Select only those rows that come from a target country: ess <- filter(ess, country %in% targets) %>% # Subset rows droplevels() # Drop empty factor levels ## Inspect the result: dim(ess) ## [1] 19690 52 table(ess$country) ## ## Austria Belgium Germany Denmark Finland France ## 2257 1899 2919 1506 2000 1503 ## Italy Netherlands Norway Sweden ## 1207 2364 2036 1999 In keeping with common practice, we will treat ordinal Likert-type rating scales with five or more levels as continuous. Since some R routines will treat labelled vectors as discrete variables, we can make things easier for ourselves by converting all the labelled vectors in our data to numeric vectors. We can use the labelled::remove_val_labels() function to strip the value labels and convert all of the labelled vectors to numeric vectors. 4.3.4 Convert the remaining labelled vectors to numeric vectors. 
Click to see code ## If necessary, install the labelled package: # install.packages("labelled", repos = "https://cloud.r-project.org") ## Load the labelled package: library(labelled) ## Strip the value labels: ess <- remove_val_labels(ess) ## Check the effects: str(ess) ## tibble [19,690 × 52] (S3: tbl_df/tbl/data.frame) ## $ name : chr [1:19690] "ESS1e06_1" "ESS1e06_1" "ESS1e06_1" "ESS1e06_1" ... ## ..- attr(*, "label")= chr "Title of dataset" ## ..- attr(*, "format.spss")= chr "A9" ## ..- attr(*, "display_width")= int 14 ## $ essround: num [1:19690] 1 1 1 1 1 1 1 1 1 1 ... ## ..- attr(*, "label")= chr "ESS round" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 10 ## $ edition : chr [1:19690] "6.1" "6.1" "6.1" "6.1" ... ## ..- attr(*, "label")= chr "Edition" ## ..- attr(*, "format.spss")= chr "A3" ## ..- attr(*, "display_width")= int 9 ## $ proddate: chr [1:19690] "03.10.2008" "03.10.2008" "03.10.2008" "03.10.2008" ... ## ..- attr(*, "label")= chr "Production date" ## ..- attr(*, "format.spss")= chr "A10" ## ..- attr(*, "display_width")= int 12 ## $ cntry : num [1:19690] 1 18 1 1 18 1 2 18 1 18 ... ## ..- attr(*, "label")= chr "Country" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 7 ## $ idno : num [1:19690] 1 1 2 3 3 4 4 4 6 6 ... ## ..- attr(*, "label")= chr "Respondent's identification number" ## ..- attr(*, "format.spss")= chr "F9.0" ## ..- attr(*, "display_width")= int 11 ## $ trstlgl : num [1:19690] 10 6 8 4 8 10 9 7 7 7 ... ## ..- attr(*, "label")= chr "Trust in the legal system" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ trstplc : num [1:19690] 10 8 5 8 8 9 8 9 4 9 ... ## ..- attr(*, "label")= chr "Trust in the police" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ trstun : num [1:19690] 9 8 6 NA 5 8 NA 7 5 7 ... 
## ..- attr(*, "label")= chr "Trust in the United Nations" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ trstep : num [1:19690] NA 3 0 7 3 7 0 3 4 6 ... ## ..- attr(*, "label")= chr "Trust in the European Parliament" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ trstprl : num [1:19690] 9 7 0 6 8 8 10 2 6 8 ... ## ..- attr(*, "label")= chr "Trust in country's parliament" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ stfhlth : num [1:19690] 10 4 0 7 6 8 NA 6 3 5 ... ## ..- attr(*, "label")= chr "State of health services in country nowadays" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ stfedu : num [1:19690] 8 7 7 5 8 7 NA 7 6 7 ... ## ..- attr(*, "label")= chr "State of education in country nowadays" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ stfeco : num [1:19690] 7 6 0 7 8 6 NA 9 8 9 ... ## ..- attr(*, "label")= chr "How satisfied with present state of economy in country" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ stfgov : num [1:19690] 7 7 0 7 6 3 NA 5 5 7 ... ## ..- attr(*, "label")= chr "How satisfied with the national government" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ stfdem : num [1:19690] 8 5 5 5 7 7 NA 7 7 9 ... ## ..- attr(*, "label")= chr "How satisfied with the way democracy works in country" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ pltinvt : num [1:19690] 1 3 1 1 4 1 1 3 2 3 ... ## ..- attr(*, "label")= chr "Politicians interested in votes rather than peoples opinions" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ pltcare : num [1:19690] 1 4 1 1 4 3 2 5 2 3 ... ## ..- attr(*, "label")= chr "Politicians in general care what people like respondent think" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ trstplt : num [1:19690] 0 5 0 2 5 4 8 2 4 6 ... 
## ..- attr(*, "label")= chr "Trust in politicians" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imsmetn : num [1:19690] 4 3 2 3 2 1 NA 2 NA 1 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants of same race/ethnic group as majority" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ imdfetn : num [1:19690] 3 3 2 3 2 2 NA 2 NA 1 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants of different race/ethnic group from majority" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ eimrcnt : num [1:19690] 4 2 2 2 3 1 NA 2 NA 1 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants from richer countries in Europe" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ eimpcnt : num [1:19690] 3 2 2 2 2 2 NA 2 NA 1 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants from poorer countries in Europe" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ imrcntr : num [1:19690] 3 3 2 2 2 1 NA 2 NA 2 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants from richer countries outside Europe" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ impcntr : num [1:19690] 3 2 2 3 2 1 NA 2 NA 2 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants from poorer countries outside Europe" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ qfimchr : num [1:19690] 4 2 0 6 2 0 99 0 1 2 ... ## ..- attr(*, "label")= chr "Qualification for immigration: christian background" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ qfimwht : num [1:19690] 1 0 0 0 0 0 99 0 0 1 ... ## ..- attr(*, "label")= chr "Qualification for immigration: be white" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imwgdwn : num [1:19690] 3 4 2 2 3 3 NA 4 NA 4 ... 
## ..- attr(*, "label")= chr "Average wages/salaries generally brought down by immigrants" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ imhecop : num [1:19690] 2 2 1 4 3 2 NA 3 NA 2 ... ## ..- attr(*, "label")= chr "Immigrants harm economic prospects of the poor more than the rich" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ imtcjob : num [1:19690] 7 5 6 5 7 10 NA 8 NA 4 ... ## ..- attr(*, "label")= chr "Immigrants take jobs away in country or create new jobs" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imbleco : num [1:19690] 9 4 2 NA 3 10 NA 9 NA 6 ... ## ..- attr(*, "label")= chr "Taxes and services: immigrants take out more than they put in or less" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imbgeco : num [1:19690] 4 3 10 7 5 10 NA 8 NA 5 ... ## ..- attr(*, "label")= chr "Immigration bad or good for country's economy" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imueclt : num [1:19690] 9 4 10 5 4 10 NA 9 NA 3 ... ## ..- attr(*, "label")= chr "Country's cultural life undermined or enriched by immigrants" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imwbcnt : num [1:19690] 7 3 5 5 5 10 NA 8 NA 5 ... ## ..- attr(*, "label")= chr "Immigrants make country worse or better place to live" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imwbcrm : num [1:19690] 3 3 5 2 3 5 NA 5 NA 3 ... ## ..- attr(*, "label")= chr "Immigrants make country's crime problems worse or better" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imrsprc : num [1:19690] 2 2 1 4 1 2 NA 1 1 3 ... 
## ..- attr(*, "label")= chr "Richer countries should be responsible for accepting people from poorer countries" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ pplstrd : num [1:19690] 2 4 2 2 3 4 NA 4 4 2 ... ## ..- attr(*, "label")= chr "Better for a country if almost everyone share customs and traditions" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ vrtrlg : num [1:19690] 3 5 3 2 4 1 NA 4 2 3 ... ## ..- attr(*, "label")= chr "Better for a country if a variety of different religions" ## ..- attr(*, "format.spss")= chr "F1.0" ## $ shrrfg : num [1:19690] 3 2 1 1 3 3 NA 3 4 3 ... ## ..- attr(*, "label")= chr "Country has more than its fair share of people applying refugee status" ## ..- attr(*, "format.spss")= chr "F1.0" ## $ rfgawrk : num [1:19690] 2 2 1 2 2 2 NA 2 1 2 ... ## ..- attr(*, "label")= chr "People applying refugee status allowed to work while cases considered" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ gvrfgap : num [1:19690] 4 3 2 4 2 2 NA 3 2 4 ... ## ..- attr(*, "label")= chr "Government should be generous judging applications for refugee status" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ rfgfrpc : num [1:19690] 4 3 2 4 4 4 NA 4 3 4 ... ## ..- attr(*, "label")= chr "Most refugee applicants not in real fear of persecution own countries" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ rfggvfn : num [1:19690] 2 3 2 4 3 2 NA 2 2 2 ... ## ..- attr(*, "label")= chr "Financial support to refugee applicants while cases considered" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ rfgbfml : num [1:19690] 2 3 1 2 2 1 NA 4 2 3 ... 
## ..- attr(*, "label")= chr "Granted refugees should be entitled to bring close family members" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ gndr : num [1:19690] 1 2 1 2 2 1 NA 2 2 1 ... ## ..- attr(*, "label")= chr "Gender" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 6 ## $ yrbrn : num [1:19690] 1949 1978 1953 1940 1964 ... ## ..- attr(*, "label")= chr "Year of birth" ## ..- attr(*, "format.spss")= chr "F4.0" ## ..- attr(*, "display_width")= int 7 ## $ edulvl : Ord.factor w/ 7 levels "Not completed primary education"<..: NA 4 NA NA 4 NA NA 7 NA 6 ... ## $ eduyrs : num [1:19690] 11 16 14 9 12 18 NA 17 15 17 ... ## ..- attr(*, "label")= chr "Years of full-time education completed" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ polintr : Ord.factor w/ 4 levels "Very interested"<..: 3 3 1 2 3 2 1 4 3 3 ... ## $ lrscale : num [1:19690] 6 7 6 5 8 5 NA 8 5 7 ... ## ..- attr(*, "label")= chr "Placement on left right scale" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ country : Factor w/ 10 levels "Austria","Belgium",..: 1 9 1 1 9 1 2 9 1 9 ... ## $ sex : Factor w/ 2 levels "Male","Female": 1 2 1 2 2 1 NA 2 2 1 ... descriptives(ess) Click for explanation Note that the numeric variables are now simple numeric vectors, but the variable labels have been retained as column attributes (which is probably useful). If we want to completely nuke the labelling information, we can use the labelled::remove_labels() function to do so. In addition to screening with summary statistics, we can also visualize the variables’ distributions. You have already created a few such visualizations for single variables. Now, we will use a few tricks to efficiently plot each of our target variables. 
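The first of those tricks, converting the data from wide to long format, is introduced next with tidyr::pivot_longer(). As a preview of what the long format looks like, here is a minimal base-R sketch. It assumes a tiny hypothetical data frame (`wide`) holding the first three values of trstlgl and trstplc from the str() output above, plus a made-up `id` column. It is not how tidyr implements the conversion, and the row order differs: this sketch stacks one column after another, whereas pivot_longer() keeps the values from each original row together.

```r
## Hypothetical wide-format data: one row per respondent, one column per item
wide <- data.frame(id      = 1:3,
                   trstlgl = c(10, 6, 8),
                   trstplc = c(10, 8, 5))

## Columns to stack (everything except the identifier)
items <- setdiff(names(wide), "id")

## Long format: one column of variable names, one column of values.
## The id column is repeated once for every stacked item.
long <- data.frame(
  id       = rep(wide$id, times = length(items)),
  variable = rep(items, each = nrow(wide)),
  value    = unlist(wide[items], use.names = FALSE)
)

long
```

Each original cell now occupies its own row, so a grouping column such as `variable` is exactly what facet_wrap() needs to draw one panel per original column.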
The first step in this process will be to convert the interesting part of our data from “wide format” (one column per variable) into “long format” (one column of variable names, one column of data values). The pivot_longer() function from the tidyr package provides a convenient way to execute this conversion. 4.3.5 Use tidyr::pivot_longer() to create a long-formatted data frame from the target variables in ess. The target variables are all columns from trstlgl to rfgbfml. Click to show code ## Load the tidyr package: library(tidyr) ## Convert the target variables into a long-formatted data frame: ess_plot <- pivot_longer(ess, cols = trstlgl:rfgbfml, # Which columns to convert names_to = "variable", # Name for the new grouping variable values_to = "value") # Name for the column of stacked values The next step in the process will be to plot the variables using ggplot(). In the above code, I’ve named the new grouping variable “variable” and the new column of stacked values “value”. So, to create one plot for each (original, wide-format) variable, we will use the facet_wrap() function to facet the plots of value on the variable column (i.e., create a separate conditional plot of value for each unique value in variable). 4.3.6 Use ggplot() with an appropriate geom (e.g., geom_histogram(), geom_density(), geom_boxplot()) and facet_wrap() to visualize each of the target variables. Hint: To implement the faceting, simply add facet_wrap(~ variable, scales = "free_x") to the end of your ggplot() call (obviously, replacing “variable” with whatever you named the grouping variable in your pivot_longer() call). Click to show code library(ggplot2) ggplot(ess_plot, aes(x = value)) + geom_histogram() + # Create a histogram facet_wrap(~ variable, scales = "free_x") # Facet on 'variable' Click for explanation Notice that the variables are actually discrete (i.e., each variable takes only a few integer values). However, most variables look roughly normally distributed despite being discrete.
So, we’ll bend the rules a bit and analyze these variables as continuous. It also looks like there’s something weird going on with qfimchr and qfimwht. More on that below. 4.3.7 Check the descriptives for the target variables again. Do you see any remaining issues? Click to show code select(ess, trstlgl:rfgbfml) %>% descriptives() Click for explanation The variables qfimchr and qfimwht both contain values that fall outside the expected range for our survey responses: 77, 88, and 99. In SPSS, these were labeled as “Refusal”, “Don’t know”, and “No answer”, respectively, and would not have contributed to the analysis. 4.3.8 Correct any remaining issues you found above. Click to show code ## Recode the SPSS missing-value codes (77, 88, 99) to NA in both variables: ess <- ess %>% mutate(across(c(qfimchr, qfimwht), ~ replace(.x, .x %in% c(77, 88, 99), NA))) ## Check the results: select(ess, trstlgl:rfgbfml) %>% descriptives() Click to show explanation Here, we need to tell R that these values should be considered missing, or NA. Otherwise, they will contribute their numeric values to the analysis, as though someone had provided an answer of 77 on a scale that runs from 0 to 10. We’ve done quite a bit of data processing, and we’ll continue to use these data for several future practicals, so it would be a good idea to save the processed dataset for later use. When saving data that you plan to analyze in R, you will usually want to use the R Data Set (RDS) format. Datasets saved in RDS format retain all of their attributes and formatting (e.g., factors are still factors, missing values are coded as NA, etc.). So, you don’t have to redo any data processing before future analyses. 4.3.9 Use the saveRDS() function to save the processed dataset. Click to show code ## Save the processed data: saveRDS(ess, "ess_round1.rds") Now, we’re ready to run the analyses and see if we can replicate the Kestilä (2006) results.
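To illustrate why the RDS format is recommended for processed data, here is a minimal sketch that round-trips a small hypothetical data frame (`toy`, not part of the ESS data) through both RDS and CSV using temporary files. The toy data include a factor with a non-alphabetical level order, a missing value, and a variable label stored as an attribute, mimicking the kinds of formatting we created above.

```r
## Hypothetical toy data: a factor with a custom level order, a missing
## value, and a variable label stored as a column attribute
toy <- data.frame(sex   = factor(c("Male", "Female", NA),
                                 levels = c("Male", "Female")),
                  score = c(4, NA, 7))
attr(toy$score, "label") <- "Toy variable label"

## Round trip through RDS (temporary file, so nothing is left behind)
rds_file <- tempfile(fileext = ".rds")
saveRDS(toy, rds_file)
from_rds <- readRDS(rds_file)

## Round trip through CSV for comparison
csv_file <- tempfile(fileext = ".csv")
write.csv(toy, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

identical(toy, from_rds)  # RDS keeps the factor levels and the label
identical(toy, from_csv)  # CSV drops the level order and the label
str(from_csv)
```

The CSV copy comes back as plain text and numbers, so the factor conversion and labelling would have to be redone after every import.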
4.3.10 Run two principal component analyses (PCA): one for trust in politics, one for attitudes towards immigration. Use the principal() function from the psych package. Use exactly the same specifications as Kestilä (2006) concerning the estimation method, rotation, number of components extracted, etc. Hints: Remember that you can view the help file for psych::principal() by running ?psych::principal or, if the psych package is already loaded, simply running ?principal. When you print the output from psych::principal(), you can use the cut option to hide any factor loadings smaller than a given threshold. You could consider hiding any loadings smaller than those reported by Kestilä (2006) to make the output easier to interpret. Click to show code Trust in politics Kestilä extracted three components with VARIMAX rotation. ## Load the psych package: library(psych) ## Run the PCA: pca_trust <- select(ess, trstlgl:trstplt) %>% principal(nfactors = 3, rotate = "varimax") ## Print the results: print(pca_trust, cut = 0.3, digits = 3) ## Principal Components Analysis ## Call: principal(r = ., nfactors = 3, rotate = "varimax") ## Standardized loadings (pattern matrix) based upon correlation matrix ## RC3 RC2 RC1 h2 u2 com ## trstlgl 0.779 0.669 0.331 1.21 ## trstplc 0.761 0.633 0.367 1.18 ## trstun 0.675 0.556 0.444 1.44 ## trstep 0.651 0.332 0.549 0.451 1.57 ## trstprl 0.569 0.489 0.650 0.350 2.49 ## stfhlth 0.745 0.567 0.433 1.04 ## stfedu 0.750 0.603 0.397 1.14 ## stfeco 0.711 0.300 0.616 0.384 1.44 ## stfgov 0.634 0.377 0.587 0.413 1.88 ## stfdem 0.369 0.568 0.325 0.564 0.436 2.38 ## pltinvt 0.817 0.695 0.305 1.08 ## pltcare 0.811 0.695 0.305 1.11 ## trstplt 0.510 0.611 0.716 0.284 2.40 ## ## RC3 RC2 RC1 ## SS loadings 2.942 2.668 2.490 ## Proportion Var 0.226 0.205 0.192 ## Cumulative Var 0.226 0.432 0.623 ## Proportion Explained 0.363 0.329 0.307 ## Cumulative Proportion 0.363 0.693 1.000 ## ## Mean item complexity = 1.6 ## Test of the hypothesis that 3 components are
sufficient. ## ## The root mean square of the residuals (RMSR) is 0.07 ## with the empirical chi square 15240.94 with prob < 0 ## ## Fit based upon off diagonal values = 0.967 Attitudes toward immigration Kestilä extracted five components with VARIMAX rotation. pca_att <- select(ess, imsmetn:rfgbfml) %>% principal(nfactors = 5, rotate = "varimax") print(pca_att, cut = 0.3, digits = 3) ## Principal Components Analysis ## Call: principal(r = ., nfactors = 5, rotate = "varimax") ## Standardized loadings (pattern matrix) based upon correlation matrix ## RC2 RC1 RC5 RC3 RC4 h2 u2 com ## imsmetn 0.797 0.725 0.275 1.30 ## imdfetn 0.775 0.794 0.206 1.70 ## eimrcnt 0.827 0.715 0.285 1.09 ## eimpcnt 0.800 0.789 0.211 1.49 ## imrcntr 0.835 0.747 0.253 1.15 ## impcntr 0.777 0.782 0.218 1.63 ## qfimchr 0.813 0.688 0.312 1.08 ## qfimwht 0.752 0.637 0.363 1.26 ## imwgdwn 0.807 0.712 0.288 1.19 ## imhecop 0.747 0.669 0.331 1.42 ## imtcjob 0.569 0.334 0.484 0.516 1.99 ## imbleco 0.703 0.554 0.446 1.25 ## imbgeco 0.698 0.605 0.395 1.52 ## imueclt 0.568 -0.340 0.545 0.455 2.43 ## imwbcnt 0.673 0.633 0.367 1.87 ## imwbcrm 0.655 0.478 0.522 1.23 ## imrsprc 0.614 0.440 0.560 1.34 ## pplstrd 0.324 -0.551 0.468 0.532 2.11 ## vrtrlg -0.345 0.471 0.419 0.581 2.67 ## shrrfg 0.365 -0.352 0.418 0.582 4.16 ## rfgawrk 0.614 0.396 0.604 1.10 ## gvrfgap 0.691 0.559 0.441 1.35 ## rfgfrpc -0.387 0.327 0.673 3.34 ## rfggvfn 0.585 0.417 0.583 1.46 ## rfgbfml 0.596 0.460 0.540 1.61 ## ## RC2 RC1 RC5 RC3 RC4 ## SS loadings 4.374 3.393 2.774 2.199 1.723 ## Proportion Var 0.175 0.136 0.111 0.088 0.069 ## Cumulative Var 0.175 0.311 0.422 0.510 0.579 ## Proportion Explained 0.302 0.235 0.192 0.152 0.119 ## Cumulative Proportion 0.302 0.537 0.729 0.881 1.000 ## ## Mean item complexity = 1.7 ## Test of the hypothesis that 5 components are sufficient. 
## ## The root mean square of the residuals (RMSR) is 0.05 ## with the empirical chi square 29496.06 with prob < 0 ## ## Fit based upon off diagonal values = 0.976 Feature engineering (i.e., creating new variables by combining and/or transforming existing variables) is one of the most common applications of PCA. PCA is a dimension reduction technique that distills the most salient information from a set of variables into a (smaller) set of component scores. Hence, PCA can be a good way of creating aggregate items (analogous to weighted scale scores) when the data are not collected with validated scales. Principal component scores are automatically generated when we run the PCA. If we want to use these scores in subsequent analyses (e.g., as predictors in a regression model), we usually add them to our dataset as additional columns. 4.3.11 Add the component scores produced by the analyses you ran above to the ess data frame. Give each component score an informative name, based on your interpretation of the loading matrix. That is, what hypothetical construct do you think each component represents, given the items that load onto it? Hints: You can use the data.frame() function to join multiple objects into a single data frame. You can use the colnames() function to assign column names to a matrix or data frame. 1. Extract the component scores Click to show code ## Save the component scores in stand-alone matrices: trust_scores <- pca_trust$scores att_scores <- pca_att$scores ## Inspect the result: head(trust_scores) ## RC3 RC2 RC1 ## [1,] NA NA NA ## [2,] 0.09755193 -0.01552183 0.994954 ## [3,] 0.23069626 -1.53162604 -2.022642 ## [4,] NA NA NA ## [5,] -0.21112678 0.84370377 1.200007 ## [6,] 1.86596955 0.31083233 -1.062603 summary(trust_scores) ## RC3 RC2 RC1 ## Min. :-4.035 Min. :-3.706 Min.
:-3.139 ## 1st Qu.:-0.527 1st Qu.:-0.652 1st Qu.:-0.649 ## Median : 0.155 Median : 0.094 Median : 0.092 ## Mean : 0.055 Mean : 0.015 Mean : 0.049 ## 3rd Qu.: 0.727 3rd Qu.: 0.742 3rd Qu.: 0.742 ## Max. : 3.302 Max. : 3.452 Max. : 3.539 ## NA's :4912 NA's :4912 NA's :4912 head(att_scores) ## RC2 RC1 RC5 RC3 RC4 ## [1,] 1.9873715 1.3233586 -0.8382499 -0.02172765 -0.0908143 ## [2,] 0.1692841 -1.2178436 -0.5016936 -0.21749066 0.6758844 ## [3,] -0.3630480 0.3260383 -1.5133423 -0.51405480 -2.2071787 ## [4,] NA NA NA NA NA ## [5,] -0.1137484 -0.7891232 -1.4732563 -0.05843873 0.4110692 ## [6,] -0.9195530 2.8231404 -0.3480398 -0.75699796 -1.3230602 summary(att_scores) ## RC2 RC1 RC5 RC3 ## Min. :-3.660 Min. :-3.929 Min. :-3.824 Min. :-2.764 ## 1st Qu.:-0.616 1st Qu.:-0.585 1st Qu.:-0.656 1st Qu.:-0.748 ## Median :-0.085 Median : 0.062 Median :-0.008 Median :-0.121 ## Mean :-0.013 Mean : 0.012 Mean : 0.021 Mean : 0.014 ## 3rd Qu.: 0.680 3rd Qu.: 0.654 3rd Qu.: 0.652 3rd Qu.: 0.698 ## Max. : 3.743 Max. : 4.584 Max. : 4.108 Max. : 4.084 ## NA's :5447 NA's :5447 NA's :5447 NA's :5447 ## RC4 ## Min. :-3.784 ## 1st Qu.:-0.683 ## Median : 0.046 ## Mean : 0.003 ## 3rd Qu.: 0.717 ## Max. : 3.254 ## NA's :5447 Click for explanation The object produced by psych::principal() is simply a list, and the component scores are already stored therein. So, to extract the component scores, we just use the $ operator. 2. Name the component scores Click to show code ## Check names (note the order): colnames(trust_scores) ## [1] "RC3" "RC2" "RC1" colnames(att_scores) ## [1] "RC2" "RC1" "RC5" "RC3" "RC4" ## Give informative names: colnames(trust_scores) <- c("Trust_Institutions", "Satisfaction", "Trust_Politicians") colnames(att_scores) <- c("Quantity", "Effects", "Refugees", "Diversity", "Economic") 3.
Add the component scores to the dataset Click to show code # Add the component scores to the 'ess' data: ess <- data.frame(ess, trust_scores, att_scores) 4.3.12 Were you able to replicate the results of Kestilä (2006)? Click for explanation Yes, more-or-less. Although the exact estimates differ somewhat, the general pattern of factor loadings in Kestilä (2006) matches what we found here. End of At-Home Exercises 4.4 In-Class Exercises In these exercises, we will continue with our re-analysis/replication of the Kestilä (2006) results. Rather than attempting a direct replication, we will now redo the analysis using exploratory factor analysis (EFA). 4.4.1 Load the ess_round1.rds dataset. These are the data that we saved after the data processing in the At-Home Exercises. Click to show code ess <- readRDS("ess_round1.rds") 4.4.2 Kestilä (2006) claimed that running a PCA is a good way to test if the questions in the ESS measure attitudes towards immigration and trust in politics. Based on what you’ve learned from the readings and lectures, do you agree with this position? Click for explanation Hopefully not. PCA is not a method for estimating latent measurement structure; PCA is a dimension reduction technique that tries to summarize a set of data with a smaller set of component scores. If we really want to estimate the factor structure underlying a set of observed variables, we should use EFA. 4.4.3 Suppose you had to construct the trust in politics and attitude towards immigration scales described by Kestilä (2006) based on the theory and background information presented in that article. What type of analysis would you choose? What key factors would influence your decision? Click for explanation We are trying to estimate meaningful latent factors, so EFA would be an appropriate method.
The theory presented by Kestilä (2006) did not hypothesize a particular number of factors, so we would need to use appropriate techniques to estimate the best number. In particular, we would combine information from scree plots, parallel analysis, and the substantive interpretability of the (rotated) factor loadings. Since the factors are almost certainly correlated, we should apply an oblique rotation. We will now rerun the two PCAs that you conducted for the At-Home Exercises using EFA. We will estimate the EFA models using the psych::fa() function, but we need to know how many factors to extract. We could simply estimate a range of solutions and compare the results, but we can restrict the range of plausible solutions and save some time by first checking/plotting the eigenvalues and running parallel analysis. 4.4.4 Estimate the number of latent factors underlying the Trust items based on the eigenvalues, the scree plot, and parallel analysis. How many factors are suggested by each method? 1. Eigenvalue estimation Click to show code ## Load the psych package: library(psych) ## Run a trivial EFA on the 'trust' items efa_trust0 <- select(ess, trstlgl:trstplt) %>% fa(nfactors = 1, rotate = "none") Click for explanation (EFA) First, we run a trivial EFA using the psych::fa() function to estimate the eigenvalues. We don’t care about the factors yet, so we can extract a single factor. We also don’t care about interpretable solutions, so we don’t need rotation. ## View the estimated eigenvalues: round(efa_trust0$values, digits = 3) ## [1] 4.980 0.716 0.482 0.165 0.069 0.014 -0.066 -0.092 -0.182 -0.207 ## [11] -0.284 -0.296 -0.319 Click for explanation (eigenvalue extraction) We can check the eigenvalues to see what proportion of the observed variance is accounted for by each additional factor we may extract. Since only one eigenvalue is greater than one, the so-called “Kaiser Criterion” would suggest extracting a single factor.
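To make the eigenvalue step concrete, here is a minimal base-R sketch on simulated data; the two-factor population model, the loadings of 0.8, and the sample size are arbitrary assumptions. It computes the eigenvalues of the full correlation matrix, which sum to the number of items and are what the Kaiser Criterion is usually applied to, and the eigenvalues of a "reduced" correlation matrix with squared multiple correlations (SMCs) on the diagonal. Eigenvalues of a correlation matrix can never be negative, so the negative values printed by fa() above must come from a reduced matrix of this kind. psych::fa() uses its own communality estimates, so its values would not match this sketch exactly.

```r
set.seed(235711)
n  <- 1000
f1 <- rnorm(n)  # hypothetical latent factor 1
f2 <- rnorm(n)  # hypothetical latent factor 2

## Six items, three per factor, each with loading 0.8 and unit variance
X <- cbind(sapply(1:3, function(j) 0.8 * f1 + rnorm(n, sd = 0.6)),
           sapply(1:3, function(j) 0.8 * f2 + rnorm(n, sd = 0.6)))
R <- cor(X)

## Eigenvalues of the full correlation matrix (they sum to the number of items)
ev_full <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
sum(ev_full > 1)  # Kaiser Criterion: count the eigenvalues above 1

## Reduced correlation matrix: replace the 1s on the diagonal with SMCs
smc <- 1 - 1 / diag(solve(R))
R_reduced <- R
diag(R_reduced) <- smc
ev_reduced <- eigen(R_reduced, symmetric = TRUE, only.values = TRUE)$values
round(ev_reduced, 3)  # the trailing values can dip below zero
```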
The Kaiser Criterion is not a valid way to select the number of factors in EFA. So, we don’t want to rely on this information alone. We can still use the eigenvalues to help us with factor enumeration, though. One way to do so is by plotting the eigenvalues in a scree plot. 2. Scree plot Click to show code Given a vector of estimated eigenvalues, we can create a scree plot using ggplot() and the geom_line() or geom_path() geometry. library(ggplot2) library(magrittr) efa_trust0 %$% data.frame(y = values, x = 1:length(values)) %>% ggplot(aes(x, y)) + geom_line() + xlab("No. of Factors") + ylab("Eigenvalues") We can also use the psych::scree() function to create a scree plot directly from the data. select(ess, trstlgl:trstplt) %>% scree(pc = FALSE) Click for explanation (scree plot) Although the scree plot provides useful information, we need to interpret that information subjectively, and the conclusions are sometimes ambiguous. In this case, the plot seems to suggest either one or three factors, depending on where we consider the “elbow” to lie. As recommended in the lecture, we can also use “parallel analysis” (Horn, 1965) to provide more objective information about the number of factors. We’ll use the psych::fa.parallel() function to implement parallel analysis. Parallel analysis relies on randomly simulated/permuted data, so we should set a seed to make sure our results are reproducible. We can set the fa = "fa" option to get only the results for EFA. 3. Parallel Analysis Click to show code ## Set the random number seed: set.seed(235711) ## Run the parallel analysis: pa_trust <- select(ess, trstlgl:trstplt) %>% fa.parallel(fa = "fa") ## Parallel analysis suggests that the number of factors = 6 and the number of components = NA Click for explanation The results of the parallel analysis suggest 6 factors.
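The logic of parallel analysis can be sketched in a few lines of base R. This is a simplified version of Horn's (1965) idea, not a re-implementation of psych::fa.parallel(): it compares the eigenvalues of the full correlation matrix (i.e., PCA-style eigenvalues) with the average eigenvalues from pure-noise data sets of the same size, and it keeps every leading factor whose observed eigenvalue beats that noise benchmark. fa.parallel(fa = "fa") works with common-factor eigenvalues instead. The helper name `parallel_sketch()`, the number of replications, and the simulated two-factor data are all hypothetical.

```r
## Hypothetical helper: simplified Horn-style parallel analysis
parallel_sketch <- function(X, n_iter = 50) {
  n <- nrow(X)
  p <- ncol(X)
  ## Eigenvalues of the observed correlation matrix
  observed <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
  ## Eigenvalues from n_iter random normal data sets of the same size;
  ## replicate() returns a p x n_iter matrix (one column per data set)
  random <- replicate(n_iter, {
    Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values
  })
  reference <- rowMeans(random)
  ## Keep factors up to the first observed eigenvalue that does not
  ## exceed its random-data reference value
  above  <- observed > reference
  n_keep <- if (all(above)) p else which(!above)[1] - 1
  list(observed = observed, reference = reference, n_factors = n_keep)
}

## Simulated data: two uncorrelated factors, three items each
set.seed(235711)
n  <- 1000
f1 <- rnorm(n)
f2 <- rnorm(n)
X  <- cbind(sapply(1:3, function(j) 0.8 * f1 + rnorm(n, sd = 0.6)),
            sapply(1:3, function(j) 0.8 * f2 + rnorm(n, sd = 0.6)))

pa <- parallel_sketch(X)
pa$n_factors
```

Because the reference values come from random data, the result depends on the seed, which is also why the exercise sets one before calling fa.parallel().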
If you’ve been paying close attention, you may have noticed that we need to compute the eigenvalues from the original data to run parallel analysis. Hence, we don’t actually need to run a separate EFA to estimate the eigenvalues. ## View the eigenvalues estimated during the parallel analysis: pa_trust$fa.values ## [1] 4.97995262 0.71644127 0.48201040 0.16517645 0.06885820 0.01422241 ## [7] -0.06606777 -0.09225113 -0.18231333 -0.20740917 -0.28415857 -0.29573407 ## [13] -0.31877470 ## Compare to the version from the EFA: pa_trust$fa.values - efa_trust0$values ## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 ## Recreate the scree plot from above: pa_trust %$% data.frame(y = fa.values, x = 1:length(fa.values)) %>% ggplot(aes(x, y)) + geom_line() + xlab("No. of Factors") + ylab("Eigenvalues") Of course, we also see the same scree plot printed as part of the parallel analysis. So, there’s really no reason to create a separate scree plot, at all, if we’re doing parallel analysis. 4. Conclusion Click for explanation The different criteria disagree on how many factors we should extract, but we have narrowed the range. Based on the scree plot and parallel analysis, we should consider solutions for 3 to 6 factors. We need to examine the factor loadings to see which solution makes the most substantive sense. 4.4.5 Do the same analysis for the attitudes toward immigration items. Click to show code This time, we’ll start by running the parallel analysis and get the eigenvalues and scree plot from psych::fa.parallel(). 
## Set the seed: set.seed(235711) ## Run parallel analysis on the 'attitudes' items: pa_att <- select(ess, imsmetn:rfgbfml) %>% fa.parallel(fa = "fa") ## Parallel analysis suggests that the number of factors = 7 and the number of components = NA ## Check the eigenvalues: round(pa_att$fa.values, digits = 3) ## [1] 7.895 1.449 0.734 0.533 0.313 0.156 0.121 0.019 -0.001 -0.064 ## [11] -0.083 -0.103 -0.119 -0.131 -0.150 -0.175 -0.185 -0.200 -0.212 -0.233 ## [21] -0.239 -0.247 -0.334 -0.422 -0.427 Click for explanation For the attitudes toward immigration analysis, the results are even more ambiguous than they were for the trust items. The Kaiser Criterion suggests 2 factors. The scree plot is hopelessly ambiguous: at least 3 factors? No more than 9 factors? Parallel analysis suggests 7 factors. Based on the scree plot and parallel analysis, it seems reasonable to consider solutions for 3 to 7 factors. Again, we need to check the substantive interpretation to choose the most reasonable solution. To evaluate the substantive interpretability of the different solutions, we need to estimate the full EFA models for each candidate number of factors. We then compare the factor loadings across solutions to see which set of loadings defines the most reasonable set of latent variables. 4.4.6 For the trust items, estimate the EFA models for each plausible number of factors that you identified above. Use the psych::fa() function to estimate the models. You will need to specify a few key options: the data (including only the variables you want to analyze); the number of factors that you want to extract; the rotation method; the estimation method; and the method of estimating factor scores. Hint: You can save yourself a lot of typing/copy-pasting (and the attendant chances of errors) by using a for() loop to iterate through numbers of factors.
Click to show code ## Define an empty list to hold all of our fitted EFA objects: efa_trust <- list() ## Loop through the interesting numbers of factors and estimate an EFA for each: for(i in 3:6) efa_trust[[as.character(i)]] <- ess %>% select(trstlgl:trstplt) %>% fa(nfactors = i, # Number of factors = Loop index rotate = "promax", # Oblique rotation scores = "Bartlett") # Estimate factor scores with WLS 4.4.7 Repeat the above analysis for the attitudes items. Click to show code efa_att <- list() for(i in 3:7) efa_att[[as.character(i)]] <- ess %>% select(imsmetn:rfgbfml) %>% fa(nfactors = i, rotate = "promax", scores = "Bartlett") 4.4.8 Compare the factor loading matrices from the models estimated from the Trust items, and select the best solution. Hints: The factor loadings are stored in the loadings slot of the object returned by psych::fa(). Looping can also be useful here. Click to show code for(x in efa_trust) print(x$loadings) ## ## Loadings: ## MR3 MR2 MR1 ## trstlgl 0.839 -0.115 ## trstplc 0.763 -0.218 ## trstun 0.579 0.161 ## trstep 0.554 0.198 ## trstprl 0.444 0.342 ## stfhlth 0.656 -0.125 ## stfedu 0.695 -0.157 ## stfeco -0.102 0.704 0.146 ## stfgov 0.593 0.226 ## stfdem 0.183 0.476 0.150 ## pltinvt 0.813 ## pltcare 0.808 ## trstplt 0.330 0.526 ## ## MR3 MR2 MR1 ## SS loadings 2.299 2.016 1.970 ## Proportion Var 0.177 0.155 0.152 ## Cumulative Var 0.177 0.332 0.483 ## ## Loadings: ## MR2 MR1 MR4 MR3 ## trstlgl 0.797 ## trstplc 0.725 ## trstun 0.656 0.113 ## trstep 1.003 -0.175 ## trstprl 0.121 0.455 0.200 0.112 ## stfhlth 0.663 -0.106 ## stfedu 0.704 -0.110 0.100 ## stfeco 0.729 ## stfgov 0.631 0.175 -0.149 ## stfdem 0.501 0.107 0.115 ## pltinvt 0.855 ## pltcare -0.103 0.863 ## trstplt 0.479 0.340 ## ## MR2 MR1 MR4 MR3 ## SS loadings 2.161 1.952 1.722 1.239 ## Proportion Var 0.166 0.150 0.132 0.095 ## Cumulative Var 0.166 0.316 0.449 0.544 ## ## Loadings: ## MR1 MR4 MR5 MR3 MR2 ## trstlgl 0.935 ## trstplc 0.810 ## trstun 0.505 0.168 ## trstep -0.138 
1.128 -0.108 -0.154 ## trstprl 0.359 0.250 0.140 0.201 -0.104 ## stfhlth 0.557 ## stfedu 0.752 ## stfeco 0.710 -0.118 0.172 ## stfgov 0.973 -0.132 ## stfdem 0.556 0.153 ## pltinvt 0.882 ## pltcare 0.855 ## trstplt 0.288 0.308 0.313 ## ## MR1 MR4 MR5 MR3 MR2 ## SS loadings 2.019 1.716 1.655 1.674 0.936 ## Proportion Var 0.155 0.132 0.127 0.129 0.072 ## Cumulative Var 0.155 0.287 0.415 0.543 0.615 ## ## Loadings: ## MR5 MR1 MR4 MR3 MR2 MR6 ## trstlgl 0.980 ## trstplc 0.655 ## trstun 0.911 ## trstep -0.116 0.739 0.163 ## trstprl 0.197 0.577 0.138 ## stfhlth 0.614 ## stfedu 0.771 ## stfeco 0.689 -0.123 0.144 ## stfgov 0.891 ## stfdem 0.513 0.144 ## pltinvt 0.816 ## pltcare 0.778 ## trstplt 0.706 0.193 ## ## MR5 MR1 MR4 MR3 MR2 MR6 ## SS loadings 1.606 1.417 1.442 1.327 1.014 0.879 ## Proportion Var 0.124 0.109 0.111 0.102 0.078 0.068 ## Cumulative Var 0.124 0.233 0.343 0.446 0.524 0.591 Click for explanation Note: Any factor loadings with magnitude lower than 0.1 are suppressed in the above output. The factor loadings matrix indicates how strongly each latent factor (columns) associates with the observed items (rows). We can interpret these factor loadings in the same way that we would interpret regression coefficients (indeed, a factor analytic model can be viewed as a multivariate regression model wherein the latent factors are the predictors and the observed items are the outcomes). A factor loading with a larger magnitude indicates a stronger association between the item and the factor linked by that loading. Items with high factor loadings are “good” indicators of the respective factors. Items with only very low loadings do not provide much information about any factor. You may want to exclude such items from your analysis. Note that the size of the factor loadings depends on the number of factors. So, you should only consider excluding an observed item after you have chosen the number of latent factors.
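The screening idea described above can be sketched in base R. The loading matrix below is hypothetical (invented items and values, not taken from the output above), and the cutoff of 0.3 on the absolute loading is an arbitrary assumption. The sketch finds each item's primary factor, flags items that do not reach the cutoff on any factor, and then prints the matrix with small loadings hidden. The print method for loadings objects in the stats package has a `cutoff` argument (0.1 by default), which is why loadings below 0.1 are blank in the output above.

```r
## Hypothetical loading matrix: 5 items (rows) x 2 factors (columns)
L <- matrix(c(0.81, 0.05,
              0.74, 0.12,
              0.10, 0.69,
              0.02, 0.77,
              0.18, 0.21),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("item", 1:5), c("F1", "F2")))

cutoff <- 0.3  # assumed threshold on the absolute loading

## Primary factor for each item (largest absolute loading)
primary <- colnames(L)[apply(abs(L), 1, which.max)]
names(primary) <- rownames(L)

## Items that do not reach the cutoff on any factor
weak <- rownames(L)[apply(abs(L), 1, max) < cutoff]

primary
weak

## Print with small loadings suppressed, in the same style as the output above
print(structure(L, class = "loadings"), cutoff = cutoff)
```

Following the advice above, a check like this only makes sense after the number of factors has been chosen, since the loadings change with every solution.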
When we print the loading matrix, we see additional information printed below the factor loadings. Proportion Var: What proportion of the items’ variance is explained by each of the factors. Cumulative Var: How much variance the factors explain, in total. If you estimated as many factors as items, then the Cumulative Var for the final factor would be 1.00 (i.e., 100%). 4.4.9 Compare the factor loading matrices from the models estimated from the Attitudes items, and select the best solution. Click to show code for(x in efa_att) print(x$loadings) ## ## Loadings: ## MR1 MR2 MR3 ## imsmetn 0.802 ## imdfetn 0.754 0.106 ## eimrcnt 0.843 ## eimpcnt 0.814 ## imrcntr 0.857 ## impcntr 0.769 ## qfimchr 0.235 0.858 ## qfimwht 0.132 0.719 ## imwgdwn 0.293 -0.181 ## imhecop 0.371 -0.162 ## imtcjob 0.619 ## imbleco 0.702 ## imbgeco 0.687 ## imueclt 0.561 -0.207 ## imwbcnt 0.732 ## imwbcrm 0.637 ## imrsprc -0.494 -0.125 ## pplstrd 0.249 -0.413 ## vrtrlg -0.275 0.240 ## shrrfg 0.514 -0.111 ## rfgawrk -0.386 ## gvrfgap -0.601 -0.148 ## rfgfrpc 0.432 ## rfggvfn -0.489 ## rfgbfml -0.545 ## ## MR1 MR2 MR3 ## SS loadings 4.819 3.950 1.683 ## Proportion Var 0.193 0.158 0.067 ## Cumulative Var 0.193 0.351 0.418 ## ## Loadings: ## MR2 MR4 MR1 MR3 ## imsmetn 0.788 ## imdfetn 0.731 0.153 0.110 ## eimrcnt 0.855 -0.143 ## eimpcnt 0.790 0.165 ## imrcntr 0.860 ## impcntr 0.743 0.182 ## qfimchr -0.122 0.853 ## qfimwht 0.723 ## imwgdwn 0.638 0.264 ## imhecop 0.680 0.217 ## imtcjob 0.633 0.136 ## imbleco 0.563 -0.212 0.153 ## imbgeco 0.604 -0.168 ## imueclt 0.392 -0.236 -0.168 ## imwbcnt 0.526 -0.282 ## imwbcrm 0.397 -0.292 ## imrsprc 0.616 ## pplstrd 0.231 -0.378 ## vrtrlg 0.279 0.264 ## shrrfg 0.299 -0.271 ## rfgawrk 0.452 ## gvrfgap 0.123 0.774 ## rfgfrpc 0.193 -0.281 ## rfggvfn 0.467 ## rfgbfml 0.619 ## ## MR2 MR4 MR1 MR3 ## SS loadings 3.828 2.778 2.570 1.602 ## Proportion Var 0.153 0.111 0.103 0.064 ## Cumulative Var 0.153 0.264 0.367 0.431 ## ## Loadings: ## MR2 MR1 MR5 MR3 MR4 ## imsmetn 
0.792 ## imdfetn 0.728 0.169 0.113 ## eimrcnt 0.910 -0.150 -0.237 ## eimpcnt 0.779 0.126 0.213 ## imrcntr 0.910 -0.128 -0.187 ## impcntr 0.731 0.131 0.236 ## qfimchr 0.109 -0.156 0.882 ## qfimwht 0.139 0.736 ## imwgdwn 0.740 ## imhecop 0.700 ## imtcjob 0.543 0.124 0.182 ## imbleco 0.682 0.135 ## imbgeco 0.799 ## imueclt 0.572 -0.202 ## imwbcnt 0.712 ## imwbcrm 0.545 -0.124 ## imrsprc 0.620 ## pplstrd 0.207 -0.396 ## vrtrlg -0.198 0.151 0.285 0.116 ## shrrfg 0.208 -0.263 0.139 ## rfgawrk 0.457 ## gvrfgap 0.783 ## rfgfrpc -0.338 0.156 ## rfggvfn 0.477 ## rfgbfml -0.125 0.538 ## ## MR2 MR1 MR5 MR3 MR4 ## SS loadings 3.970 2.790 2.215 1.693 1.166 ## Proportion Var 0.159 0.112 0.089 0.068 0.047 ## Cumulative Var 0.159 0.270 0.359 0.427 0.473 ## ## Loadings: ## MR2 MR1 MR6 MR3 MR5 MR4 ## imsmetn 0.705 0.166 ## imdfetn 0.833 ## eimrcnt 0.249 0.859 ## eimpcnt 0.946 ## imrcntr 0.456 0.517 ## impcntr 0.951 ## qfimchr 0.134 -0.122 0.875 ## qfimwht 0.151 0.725 ## imwgdwn 0.748 ## imhecop 0.678 ## imtcjob 0.566 0.123 0.175 ## imbleco 0.753 0.144 ## imbgeco 0.822 ## imueclt 0.580 -0.201 ## imwbcnt 0.751 ## imwbcrm 0.597 ## imrsprc 0.146 0.527 ## pplstrd 0.204 -0.392 ## vrtrlg -0.204 0.143 0.281 0.115 ## shrrfg 0.198 -0.275 0.141 ## rfgawrk 0.517 ## gvrfgap 0.784 ## rfgfrpc -0.294 0.144 ## rfggvfn 0.512 ## rfgbfml 0.596 ## ## MR2 MR1 MR6 MR3 MR5 MR4 ## SS loadings 3.304 3.013 1.994 1.649 1.065 1.133 ## Proportion Var 0.132 0.121 0.080 0.066 0.043 0.045 ## Cumulative Var 0.132 0.253 0.332 0.398 0.441 0.486 ## ## Loadings: ## MR2 MR1 MR6 MR3 MR5 MR7 MR4 ## imsmetn 0.700 0.162 ## imdfetn 0.821 ## eimrcnt 0.245 0.879 ## eimpcnt 0.935 ## imrcntr 0.452 0.523 ## impcntr 0.938 ## qfimchr 0.751 ## qfimwht 0.720 ## imwgdwn 0.700 ## imhecop 0.172 0.624 ## imtcjob 0.574 -0.120 0.174 ## imbleco 0.679 0.108 ## imbgeco 0.832 -0.145 ## imueclt 0.531 -0.191 ## imwbcnt 0.649 0.138 ## imwbcrm 0.464 0.131 0.290 ## imrsprc 0.146 0.440 -0.100 ## pplstrd -0.274 0.392 ## vrtrlg -0.121 0.190 -0.297 0.115 
## shrrfg -0.124 0.437 0.131 ## rfgawrk 0.538 ## gvrfgap 0.616 -0.237 ## rfgfrpc -0.131 0.437 0.135 ## rfggvfn 0.504 ## rfgbfml 0.526 ## ## MR2 MR1 MR6 MR3 MR5 MR7 MR4 ## SS loadings 3.224 2.467 1.456 1.305 1.105 0.901 0.984 ## Proportion Var 0.129 0.099 0.058 0.052 0.044 0.036 0.039 ## Cumulative Var 0.129 0.228 0.286 0.338 0.382 0.418 0.458 It is quite possible that you selected a different number of factors than Kestilä (2006) did. We need to keep these exercises consistent, though. So, the remaining questions will all assume that you have extracted three factors from the Trust items and five factors from the Attitudes items, to parallel the Kestilä (2006) results. ## Select the three-factor solution for 'trust': efa_trust <- efa_trust[["3"]] ## Select the five-factor solution for 'attitudes': efa_att <- efa_att[["5"]] 4.4.10 Give the factor scores meaningful names, and add the scores to the ess dataset as new columns. Hint: If you’re not sure what to do, check 4.3.11. Click to show code ## Rename the factor scores: colnames(efa_trust$scores) <- c("trust_inst", "satisfy", "trust_pol") colnames(efa_att$scores) <- c("effects", "allowance", "refugees", "ethnic", "europe") ## Add factor scores to the dataset as new columns: ess <- data.frame(ess, efa_trust$scores, efa_att$scores) Kestilä (2006) used the component scores to descriptively evaluate country-level differences in Attitudes toward Immigration and Political Trust. So, now it’s time to replicate those analyses. 4.4.11 Repeat the Kestilä (2006) between-country comparison using the factor scores you created in 4.4.10 and an appropriate statistical test. Click to show code Here, we’ll only demonstrate a possible approach to analyzing one of the Trust dimensions. We can use a linear model to test whether the countries differ in average levels of Trust in Institutions (as quantified by the relevant factor score). 
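To see why a regression-style summary only compares each group with a reference group, the following sketch fits the same kind of model to simulated data for three hypothetical groups (the group labels, sample sizes, and effect sizes are all made up). With R’s default treatment coding, the intercept equals the mean of the reference level, and each dummy coefficient equals that group’s mean minus the reference mean. TukeyHSD() applied to an aov() fit then tests all choose(k, 2) pairwise differences among k groups.

```r
## Simulated scores for three hypothetical groups (not the ESS data).
set.seed(123)
dat <- data.frame(group = factor(rep(c("A", "B", "C"), each = 50)))
shift <- c(A = 0, B = 0.5, C = -0.3)   # made-up true group differences
dat$score <- rnorm(150) + unname(shift[as.character(dat$group)])

## One-way ANOVA as a linear model with dummy-coded groups ("A" = reference):
fit <- lm(score ~ group, data = dat)
group_means <- tapply(dat$score, dat$group, mean)

## The intercept is the reference mean; each other coefficient is a
## difference from that reference mean.
coef(fit)
group_means - group_means["A"]

## Omnibus F-test that all group means are equal:
anova(fit)

## All pairwise comparisons with Tukey's HSD adjustment;
## k groups give choose(k, 2) comparisons (e.g., choose(9, 2) is 36).
tukey <- TukeyHSD(aov(score ~ group, data = dat))
tukey$group
```

The same logic scales to the nine-country model: the dummy coefficients are all contrasts against the reference country, so comparisons between two non-reference countries require a separate (corrected) test.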
## Estimate the model: out <- lm(trust_inst ~ country, data = ess) ## View the regression-style summary: summary(out) ## ## Call: ## lm(formula = trust_inst ~ country, data = ess) ## ## Residuals: ## Min 1Q Median 3Q Max ## -4.2295 -0.6226 0.1171 0.7194 3.3061 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.09028 0.02445 -3.692 0.000224 *** ## countryBelgium -0.28923 0.03642 -7.942 2.12e-15 *** ## countryGermany -0.05966 0.03211 -1.858 0.063205 . ## countryDenmark 0.75509 0.03882 19.452 < 2e-16 *** ## countryFinland 0.59235 0.03439 17.224 < 2e-16 *** ## countryItaly 0.10991 0.04071 2.700 0.006939 ** ## countryNetherlands -0.05357 0.03379 -1.585 0.112893 ## countryNorway 0.36922 0.03493 10.570 < 2e-16 *** ## countrySweden 0.28560 0.03613 7.904 2.89e-15 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.029 on 14769 degrees of freedom ## (4912 observations deleted due to missingness) ## Multiple R-squared: 0.082, Adjusted R-squared: 0.0815 ## F-statistic: 164.9 on 8 and 14769 DF, p-value: < 2.2e-16 ## View the results as an ANOVA table: anova(out) ## Post-hoc tests out %>% aov() %>% TukeyHSD() ## Tukey multiple comparisons of means ## 95% family-wise confidence level ## ## Fit: aov(formula = .) 
## ## $country ## diff lwr upr p adj ## Belgium-Austria -0.289225482 -0.40219224 -0.17625873 0.0000000 ## Germany-Austria -0.059655996 -0.15926604 0.03995405 0.6429963 ## Denmark-Austria 0.755089552 0.63466911 0.87551000 0.0000000 ## Finland-Austria 0.592348290 0.48565882 0.69903776 0.0000000 ## Italy-Austria 0.109910185 -0.01636587 0.23618624 0.1476635 ## Netherlands-Austria -0.053567808 -0.15838407 0.05124846 0.8131104 ## Norway-Austria 0.369224250 0.26085692 0.47759158 0.0000000 ## Sweden-Austria 0.285601197 0.17350905 0.39769334 0.0000000 ## Germany-Belgium 0.229569486 0.12386351 0.33527546 0.0000000 ## Denmark-Belgium 1.044315033 0.91880537 1.16982470 0.0000000 ## Finland-Belgium 0.881573772 0.76917165 0.99397589 0.0000000 ## Italy-Belgium 0.399135667 0.26799745 0.53027389 0.0000000 ## Netherlands-Belgium 0.235657673 0.12503199 0.34628336 0.0000000 ## Norway-Belgium 0.658449732 0.54445381 0.77244566 0.0000000 ## Sweden-Belgium 0.574826679 0.45728417 0.69236918 0.0000000 ## Denmark-Germany 0.814745547 0.70110863 0.92838247 0.0000000 ## Finland-Germany 0.652004286 0.55303505 0.75097352 0.0000000 ## Italy-Germany 0.169566181 0.04974170 0.28939066 0.0003895 ## Netherlands-Germany 0.006088188 -0.09085878 0.10303516 0.9999999 ## Norway-Germany 0.428880246 0.32810453 0.52965596 0.0000000 ## Sweden-Germany 0.345257193 0.24048642 0.45002796 0.0000000 ## Finland-Denmark -0.162741262 -0.28263218 -0.04285034 0.0008579 ## Italy-Denmark -0.645179366 -0.78279052 -0.50756821 0.0000000 ## Netherlands-Denmark -0.808657360 -0.92688442 -0.69043030 0.0000000 ## Norway-Denmark -0.385865301 -0.50725174 -0.26447886 0.0000000 ## Sweden-Denmark -0.469488354 -0.59421139 -0.34476531 0.0000000 ## Italy-Finland -0.482438105 -0.60820928 -0.35666693 0.0000000 ## Netherlands-Finland -0.645916098 -0.75012357 -0.54170862 0.0000000 ## Norway-Finland -0.223124040 -0.33090264 -0.11534544 0.0000000 ## Sweden-Finland -0.306747093 -0.41827017 -0.19522402 0.0000000 ## Netherlands-Italy -0.163477993 
-0.28766412 -0.03929186 0.0014719 ## Norway-Italy 0.259314065 0.13211649 0.38651164 0.0000000 ## Sweden-Italy 0.175691012 0.04530545 0.30607657 0.0009794 ## Norway-Netherlands 0.422792059 0.31686740 0.52871671 0.0000000 ## Sweden-Netherlands 0.339169005 0.22943659 0.44890142 0.0000000 ## Sweden-Norway -0.083623053 -0.19675232 0.02950622 0.3462227 Click for explanation According to the omnibus F-test, average levels of Trust in Institutions significantly differ between countries, but this test cannot tell us between which countries the differences lie. Similarly, the t statistics associated with each dummy code in the regression-style summary only tell us whether that country differs significantly from the reference country (i.e., Austria), so we cannot see, for example, whether there is a significant difference in average trust levels between Belgium and the Netherlands. One way to test for differences between the individual countries would be a post hoc test of all pairwise comparisons. Since we’ll be doing 36 tests (every pair of the nine countries), we need to apply a correction for multiple testing. Above, we use the TukeyHSD() function to conduct all pairwise comparisons while applying Tukey’s HSD correction. In the stats package, TukeyHSD() only has a method for models estimated with the aov() function, so we first pass our fitted lm object through aov(). The second part of the Kestilä (2006) analysis was to evaluate how socio-demographic characteristics affected attitudes towards immigrants and trust in politics among the Finnish electorate. Before we can replicate this part of the analysis, we need to subset the data to only the Finnish cases. 4.4.12 Create a new data frame that contains only the Finnish cases from ess. Hint: You can use logical indexing based on the country variable. Click to show code ess_finland <- filter(ess, country == "Finland") We still have one more step before we can estimate any models. We must prepare our variables for analysis. 
Our dependent variables will be the factor scores generated above, so we do not need to apply any further processing to them. We have not yet used any of the independent variables, though. So, we should inspect those variables to see if they require any processing. In our processed ess data, the relevant variables have the following names: sex yrbrn eduyrs polintr lrscale 4.4.13 Inspect the independent variables listed above. Click to show code library(tidySEM) select(ess_finland, sex, yrbrn, eduyrs, polintr, lrscale) %>% descriptives() Click for explanation It looks like we still need some recoding: the data record year of birth rather than age, and polintr has four categories rather than the two that Kestilä (2006) used. 4.4.14 Apply any necessary recoding/transformations. 1. Age Click to show code ess_finland <- mutate(ess_finland, age = 2002 - yrbrn) Click for explanation The data contain the participants’ years of birth instead of their age, but Kestilä analyzed age. Fortunately, we know that the data were collected in 2002, so we can simply subtract each participant’s value of yrbrn from 2002 to compute their (approximate) age. 2. Political Interest Click to show code First, we’ll transform polintr. ## Recode the four factor levels into two factor levels: ess_finland <- mutate(ess_finland, polintr_bin = recode_factor(polintr, "Not at all interested" = "Low Interest", "Hardly interested" = "Low Interest", "Quite interested" = "High Interest", "Very interested" = "High Interest") ) ## Check the conversion: with(ess_finland, table(old = polintr, new = polintr_bin, useNA = "always")) ## new ## old Low Interest High Interest <NA> ## Very interested 0 144 0 ## Quite interested 0 785 0 ## Hardly interested 842 0 0 ## Not at all interested 228 0 0 ## <NA> 0 0 1 Click for explanation Kestilä (2006) dichotomized polintr by combining the lowest two and highest two categories. So, we don’t actually want to convert the polintr variable into a numeric, Likert-type variable. We want the recoded variable to be a binary factor. The recode_factor() function from dplyr will automatically convert our result into a factor. 
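The same two recoding steps can also be done in base R. The sketch below uses a four-row toy data frame whose values are made up (it merely stands in for ess_finland): it computes age by subtracting year of birth from 2002, as above, and collapses the four polintr levels into two by assigning a named list to levels(), which merges the listed old levels into each new level. The cross-tabulation at the end mirrors the check used above.

```r
## Toy data standing in for the ESS variables (values are made up).
toy <- data.frame(
  yrbrn   = c(1950, 1972, 1984, 1939),
  polintr = factor(c("Very interested", "Hardly interested",
                     "Not at all interested", "Quite interested"),
                   levels = c("Very interested", "Quite interested",
                              "Hardly interested", "Not at all interested"))
)

## Age at the time of data collection (2002):
toy$age <- 2002 - toy$yrbrn

## Collapse four levels into two: each list element names a new level
## and lists the old levels that are merged into it.
toy$polintr_bin <- toy$polintr
levels(toy$polintr_bin) <- list(
  "Low Interest"  = c("Not at all interested", "Hardly interested"),
  "High Interest" = c("Quite interested", "Very interested")
)

## Cross-tabulate old vs. new codes to check the mapping:
table(old = toy$polintr, new = toy$polintr_bin, useNA = "always")
```

Whichever approach you use, the cross-table is the important part: every old category should land in exactly one new column.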
As with the ess_round1.rds data, we will be coming back to this Finnish subsample data in future practical exercises. So, we should save our work by writing the processed dataset to disk. 4.4.15 Use the saveRDS() function to save the processed Finnish subsample data. Click to see code ## Save the processed Finnish data: saveRDS(ess_finland, "ess_finland.rds") Now, we’re finally ready to replicate the regression analysis from Kestilä (2006). Creating a single aggregate score by summing the individual component scores is a pretty silly thing to do, though. So, we won’t reproduce that aspect of the analysis. 4.4.16 Run a series of multiple linear regression analyses with the factor scores you created in 4.4.10 as the dependent variables and the same predictors used by Kestilä (2006). Do your results agree with those reported by Kestilä (2006)? Click to show code ## Predicting 'Trust in Institutions': out_trust_inst <- lm(trust_inst ~ sex + age + eduyrs + polintr_bin + lrscale, data = ess_finland) summary(out_trust_inst) ## ## Call: ## lm(formula = trust_inst ~ sex + age + eduyrs + polintr_bin + ## lrscale, data = ess_finland) ## ## Residuals: ## Min 1Q Median 3Q Max ## -3.9499 -0.5102 0.1337 0.6638 2.5919 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.057518 0.124294 -0.463 0.643595 ## sexFemale 0.004091 0.045170 0.091 0.927849 ## age -0.003071 0.001380 -2.225 0.026219 * ## eduyrs 0.023223 0.006388 3.635 0.000286 *** ## polintr_binHigh Interest 0.166860 0.046448 3.592 0.000337 *** ## lrscale 0.058951 0.011232 5.249 1.72e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 0.9321 on 1734 degrees of freedom ## (260 observations deleted due to missingness) ## Multiple R-squared: 0.04155, Adjusted R-squared: 0.03879 ## F-statistic: 15.03 on 5 and 1734 DF, p-value: 1.78e-14 ## Predicting 'Trust in Politicians': out_trust_pol <- lm(trust_pol ~ sex + age + eduyrs + polintr_bin + lrscale, data = ess_finland) summary(out_trust_pol) ## ## Call: ## lm(formula = trust_pol ~ sex + age + eduyrs + polintr_bin + lrscale, ## data = ess_finland) ## ## Residuals: ## Min 1Q Median 3Q Max ## -3.03673 -0.67306 0.05346 0.69666 2.38771 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.165989 0.126840 -1.309 0.19083 ## sexFemale 0.015572 0.046095 0.338 0.73554 ## age -0.009112 0.001409 -6.469 1.28e-10 *** ## eduyrs 0.018476 0.006519 2.834 0.00465 ** ## polintr_binHigh Interest 0.463763 0.047399 9.784 < 2e-16 *** ## lrscale 0.054932 0.011462 4.793 1.79e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.9512 on 1734 degrees of freedom ## (260 observations deleted due to missingness) ## Multiple R-squared: 0.09806, Adjusted R-squared: 0.09546 ## F-statistic: 37.71 on 5 and 1734 DF, p-value: < 2.2e-16 ## Predicting 'Attitudes toward Refugees': out_refugees <- lm(refugees ~ sex + age + eduyrs + polintr_bin + lrscale, data = ess_finland) summary(out_refugees) ## ## Call: ## lm(formula = refugees ~ sex + age + eduyrs + polintr_bin + lrscale, ## data = ess_finland) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.9118 -0.6860 -0.0594 0.6904 4.1044 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -1.690e-01 1.438e-01 -1.175 0.240080 ## sexFemale -4.828e-01 5.181e-02 -9.318 < 2e-16 *** ## age 2.903e-05 1.604e-03 0.018 0.985561 ## eduyrs -2.537e-02 7.459e-03 -3.401 0.000688 *** ## polintr_binHigh Interest -2.131e-01 5.345e-02 -3.986 6.99e-05 *** ## lrscale 9.359e-02 1.296e-02 7.223 7.65e-13 *** ## --- ## Signif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.06 on 1699 degrees of freedom ## (295 observations deleted due to missingness) ## Multiple R-squared: 0.09535, Adjusted R-squared: 0.09269 ## F-statistic: 35.81 on 5 and 1699 DF, p-value: < 2.2e-16 That does it for our replication of the Kestilä (2006) analyses, but we still have one more topic to consider in this practical. One of the most common applications of EFA is scale development. Given a pool of items without a known factor structure, we try to estimate the underlying latent factors that define the (sub)scales represented by our items. In such applications, we use the factor loading matrix of our chosen solution to make “bright-line” assignments of items to putative factors, following the simple structure that this matrix implies. In other words, we disregard small factor loadings and assign each observed item only to the single latent factor on which it loads most strongly. We then hypothesize that those items are true indicators of that latent factor. We can use confirmatory factor analysis (which you will learn about next week) to rigorously test this hypothesis, but we can already get started by estimating the internal consistency (a type of reliability) of the hypothesized subscales. 4.4.17 Estimate the internal consistency of the three Trust subscales and five Attitudes subscales implied by your EFA solutions from above. Use Cronbach’s Alpha to quantify internal consistency. Use the alpha() function from the psych package to conduct the analysis. Run your analysis on the full ess dataset, not the Finnish subset. Are the subscales implied by your EFA reliable, in the sense of good internal consistency? Note that \\(\\alpha > 0.7\\) is generally considered acceptable, and \\(\\alpha > 0.8\\) is usually considered good. 
Click to show code ## Run the reliability analysis on the subscale data: ( out <- select(ess, starts_with("stf")) %>% psych::alpha() ) ## ## Reliability analysis ## Call: psych::alpha(x = .) ## ## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r ## 0.79 0.79 0.77 0.44 3.9 0.0023 5.4 1.7 0.41 ## ## 95% confidence boundaries ## lower alpha upper ## Feldt 0.79 0.79 0.8 ## Duhachek 0.79 0.79 0.8 ## ## Reliability if an item is dropped: ## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r ## stfhlth 0.78 0.78 0.73 0.47 3.5 0.0026 0.0064 0.46 ## stfedu 0.76 0.76 0.72 0.45 3.2 0.0028 0.0109 0.44 ## stfeco 0.74 0.74 0.70 0.41 2.8 0.0031 0.0069 0.39 ## stfgov 0.74 0.74 0.69 0.42 2.9 0.0030 0.0035 0.41 ## stfdem 0.75 0.75 0.71 0.43 3.0 0.0029 0.0074 0.40 ## ## Item statistics ## n raw.r std.r r.cor r.drop mean sd ## stfhlth 19481 0.69 0.69 0.56 0.50 5.8 2.3 ## stfedu 18844 0.73 0.73 0.62 0.55 5.9 2.3 ## stfeco 19211 0.78 0.78 0.70 0.63 5.0 2.4 ## stfgov 19106 0.77 0.76 0.69 0.61 4.5 2.3 ## stfdem 19106 0.75 0.75 0.67 0.59 5.7 2.3 Click for explanation Here, we estimate the reliability of the Satisfaction subscale from the Trust analysis. According to our EFA, the Satisfaction subscale should be indicated by the following five variables: stfeco stfgov stfdem stfedu stfhlth We select these variables using the tidy-select function starts_with() to extract all variables beginning with the three characters “stf”. To estimate the internal consistency of this subscale, we simply provide a data frame containing only the subscale data to the alpha() function. The raw_alpha value is the estimate of Cronbach’s Alpha. In this case \\(\\alpha = 0.794\\), so the subscale is pretty reliable. The table labeled “Reliability if an item is dropped” shows what Cronbach’s Alpha would be if each item were excluded from the scale. If this value is notably higher than the raw_alpha value, it could indicate a bad item. 
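To demystify the raw_alpha value and the “Reliability if an item is dropped” table, here is a minimal base-R sketch of Cronbach’s Alpha: with k items, alpha equals k / (k - 1) times one minus the ratio of the summed item variances to the variance of the total score. The helper functions cronbach_alpha() and alpha_if_dropped() are hypothetical names written for this illustration (they are not part of psych), the data are simulated, and the sketch assumes listwise deletion of missing values; psych::alpha() may handle missing data differently, so results on the ESS data could differ slightly.

```r
## Cronbach's alpha from a data frame of items (listwise deletion assumed).
cronbach_alpha <- function(items) {
  items <- na.omit(as.data.frame(items))
  k <- ncol(items)
  item_vars <- vapply(items, var, numeric(1))  # variance of each item
  total_var <- var(rowSums(items))             # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

## Alpha recomputed with each item left out in turn.
alpha_if_dropped <- function(items) {
  items <- as.data.frame(items)
  setNames(vapply(seq_along(items),
                  function(j) cronbach_alpha(items[, -j, drop = FALSE]),
                  numeric(1)),
           names(items))
}

## Simulated data: four items sharing one common factor plus unique noise.
set.seed(1)
common <- rnorm(500)
sim <- data.frame(sapply(1:4, function(i) common + rnorm(500)))
names(sim) <- paste0("item", 1:4)

cronbach_alpha(sim)
alpha_if_dropped(sim)
```

Because these simulated items are exchangeable, dropping any one of them should lower alpha slightly; an item whose removal noticeably raises alpha is the pattern the explanation above flags as a possible bad item.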
Note that reliability is only one aspect of scale quality, though. So, you shouldn’t throw out items just because they perform poorly in reliability analysis. End of In-Class Exercises "],["cfa.html", "5 CFA", " 5 CFA This week, we will introduce confirmatory factor analysis (CFA) and discuss how it differs from EFA. Furthermore, we will revisit the idea of model fit and introduce the R package lavaan. Homework before the lecture Watch the Lecture Recording for this week. Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical, you will work on the In-Class Exercises. "],["lecture-4.html", "5.1 Lecture", " 5.1 Lecture Often, we work with scales that have a validated or hypothesized factor structure. In the former case, the scale structure has been validated through previous psychometric studies. In the latter case, we may have conducted an EFA to estimate the factor structure on prior data, or theory/intuition may suggest a plausible structure. Regardless of how we come to expect a given factor structure, such situations represent confirmatory modeling problems, because we are attempting to empirically confirm an a priori expectation. Hence, exploratory methods like EFA are not appropriate, and we should employ confirmatory modeling techniques. This week we consider one such technique: confirmatory factor analysis (CFA). As the name suggests, CFA is related to the EFA methods we discussed last week in that both methods are flavors of factor analysis. However, the two methods address fundamentally different research questions. Rather than attempting to estimate an unknown factor structure (as in EFA), we now want to compare a hypothesized measurement model (i.e., factor structure) to observed data in order to evaluate the model’s plausibility. 
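Comparing the complexity of competing CFA models comes down to counting free parameters, and hence degrees of freedom. As a sketch, assuming k ≥ 3 first-order factors, a single second-order factor, and scaling of that factor by fixing one second-order loading to 1, the measurement part (loadings and residual variances) is identical in the first- and second-order models, so only the latent structure needs counting. A first-order model whose factors all covary freely has k(k + 1)/2 latent variances and covariances; the second-order model has k - 1 free second-order loadings, one second-order variance, and k disturbance variances. The function names below are illustrative only.

```r
## Free latent-structure parameters in a first-order CFA whose k factors
## all covary freely: k variances + k(k - 1)/2 covariances.
first_order_latent <- function(k) k * (k + 1) / 2

## Free latent-structure parameters with one second-order factor:
## (k - 1) free second-order loadings (one fixed to 1 for scaling)
## + 1 second-order factor variance + k first-order disturbance variances.
second_order_latent <- function(k) (k - 1) + 1 + k

k <- 3:6
data.frame(
  k            = k,
  first_order  = first_order_latent(k),
  second_order = second_order_latent(k),
  ## Extra degrees of freedom of the second-order model:
  df_gain      = first_order_latent(k) - second_order_latent(k)
)
```

Under these assumptions, the two models have the same number of parameters at k = 3, and the second-order model gains 2, 5, and 9 degrees of freedom at k = 4, 5, and 6.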
5.1.1 Recording Note: When Caspar discusses the complexity of the second-order CFA model, it’s easy to misunderstand his statements. We need to be careful not to over-generalize. In general, a second-order CFA is not more complex than a first-order CFA. In most practical applications, the opposite holds: the first-order CFA is the more complex model. A second-order CFA is more complex than a first-order CFA when the factors in the first-order CFA are uncorrelated. This is the situation Caspar references in the recording when claiming that the second-order model is more complex. We hardly ever want to fit such a first-order CFA, though. The default CFA fully saturates the latent covariance structure. If all factors in the first-order CFA are allowed to covary freely (according to standard practice), and we include a single second-order factor, the following statements hold. If the first-order CFA has more than three factors, the first-order model is more complex than the second-order model. If the first-order model has three or fewer factors, the first- and second-order models are equivalent (due to the scaling constraints we need to impose to identify the second-order model). The second-order model cannot be more complex than the first-order model (assuming both models are correctly identified and no extra constraints are imposed). The above statements may not hold in more complex situations (e.g., more than one second-order factor, partially saturated first-order correlation structure, etc.). You can always identify the more complex model by calculating the degrees of freedom for both models: the model with fewer degrees of freedom is more complex. 5.1.2 Slides You can download the lecture slides here "],["reading-4.html", "5.2 Reading", " 5.2 Reading Reference Byrne, B. (2005). Factor analytic models: Viewing the structure of an assessment instrument from three perspectives. Journal of Personality Assessment, 85(1), 17–32. 
Questions What are the main differences between exploratory factor analysis (EFA) and confirmatory factor analysis (CFA)? In which circumstances should a researcher use EFA, and in which should they use CFA? What are the five main limitations of EFA that CFA overcomes? In which circumstances can a second-order CFA model be useful? Consider the following four techniques: PCA, EFA, CFA, second-order CFA. For each of the following three research situations, which of the above techniques would you use and why? A researcher has developed a new questionnaire that should measure personality and wants to know how many factors underlie the items in their new measure. A researcher is modeling data collected with a seven-item scale that has been used since the 1960s to measure authoritarianism. A researcher has recorded the highest completed level of education, years of education, and the highest level of education attempted for all respondents in a survey. The researcher wants to include some operationalization of the concept of ‘education’ in their model but is unsure of which observed variable to use. "],["at-home-exercises-4.html", "5.3 At-Home Exercises", " 5.3 At-Home Exercises This week, we will wrap up our re-analysis of the Kestilä (2006) results. During this practical, you will conduct a CFA of the Trust in Politics items and compare the results to those obtained from your previous EFA- and PCA-based replications of Kestilä (2006). 5.3.1 Load the ESS data. The relevant data are contained in the ess_round1.rds file. This file is in R Data Set (RDS) format. The dataset is already stored as a data frame, with the processing and cleaning from previous practicals already applied. 
Click to show code ess <- readRDS("ess_round1.rds") Although you may have settled on any number of EFA solutions during the Week 4 In-Class Exercises, we are going to base the following CFA on a three-factor model of Trust in Politics similar to the original PCA results from Kestilä (2006). Note: Unless otherwise specified, all of the following questions refer to the Trust in Politics items. We will not consider the Attitudes toward Immigration items in these exercises. 5.3.2 Define the lavaan model syntax for the CFA implied by the three-factor EFA solution you found in the Week 4 In-Class Exercises. Covary the three latent factors. Do not specify any mean structure. Save this model syntax as an object in your environment. Click to show code mod_3f <- ' institutions =~ trstlgl + trstplc + trstun + trstep + trstprl satisfaction =~ stfhlth + stfedu + stfeco + stfgov + stfdem politicians =~ pltinvt + pltcare + trstplt ' Click for explanation We don’t have to specify the latent covariances in the model syntax; the cfa() function estimates all covariances among the latent factors by default when we fit the model. 5.3.3 Estimate the CFA model you defined above, and summarize the results. Use the lavaan::cfa() function to estimate the model. Use the default settings for the cfa() function. Request the model fit statistics with the summary by supplying the fit.measures = TRUE argument to summary(). Request the standardized parameter estimates with the summary by supplying the standardized = TRUE argument to summary(). Check the results, and answer the following questions: Does the model fit the data well? How are the latent variances and covariances specified when using the default settings? How is the model identified when using the default settings? 
Click the code ## Load the lavaan package: library(lavaan) ## Estimate the CFA model: fit_3f <- cfa(mod_3f, data = ess) ## Summarize the fitted model: summary(fit_3f, fit.measures = TRUE, standardized = TRUE) ## lavaan 0.6.16 ended normally after 46 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 29 ## ## Used Total ## Number of observations 14778 19690 ## ## Model Test User Model: ## ## Test statistic 10652.207 ## Degrees of freedom 62 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 81699.096 ## Degrees of freedom 78 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.870 ## Tucker-Lewis Index (TLI) 0.837 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -371404.658 ## Loglikelihood unrestricted model (H1) -366078.555 ## ## Akaike (AIC) 742867.317 ## Bayesian (BIC) 743087.743 ## Sample-size adjusted Bayesian (SABIC) 742995.583 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.108 ## 90 Percent confidence interval - lower 0.106 ## 90 Percent confidence interval - upper 0.109 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 1.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.059 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions =~ ## trstlgl 1.000 1.613 0.677 ## trstplc 0.770 0.012 61.866 0.000 1.241 0.567 ## trstun 0.929 0.013 69.227 0.000 1.498 0.642 ## trstep 0.908 0.013 70.929 0.000 1.464 0.660 ## trstprl 1.139 0.014 84.084 0.000 1.837 0.809 ## satisfaction =~ ## stfhlth 1.000 1.173 0.521 ## stfedu 1.106 0.022 50.840 0.000 1.297 0.577 ## stfeco 1.415 0.025 57.214 0.000 1.659 0.713 ## stfgov 1.480 0.025 58.764 0.000 1.736 0.756 ## stfdem 1.384 0.024 57.904 0.000 1.623 0.731 ## politicians =~ ## pltinvt 
1.000 0.646 0.613 ## pltcare 1.021 0.016 62.862 0.000 0.660 0.628 ## trstplt 3.012 0.039 76.838 0.000 1.946 0.891 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions ~~ ## satisfaction 1.391 0.032 43.206 0.000 0.736 0.736 ## politicians 0.909 0.018 49.934 0.000 0.872 0.872 ## satisfaction ~~ ## politicians 0.539 0.013 41.053 0.000 0.711 0.711 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trstlgl 3.068 0.041 75.262 0.000 3.068 0.541 ## .trstplc 3.248 0.041 80.037 0.000 3.248 0.678 ## .trstun 3.197 0.041 77.141 0.000 3.197 0.588 ## .trstep 2.776 0.036 76.243 0.000 2.776 0.564 ## .trstprl 1.776 0.029 61.361 0.000 1.776 0.345 ## .stfhlth 3.695 0.046 79.989 0.000 3.695 0.729 ## .stfedu 3.368 0.043 77.916 0.000 3.368 0.667 ## .stfeco 2.656 0.038 69.070 0.000 2.656 0.491 ## .stfgov 2.264 0.035 64.201 0.000 2.264 0.429 ## .stfdem 2.289 0.034 67.172 0.000 2.289 0.465 ## .pltinvt 0.694 0.009 78.255 0.000 0.694 0.624 ## .pltcare 0.668 0.009 77.562 0.000 0.668 0.605 ## .trstplt 0.978 0.028 34.461 0.000 0.978 0.205 ## institutions 2.601 0.059 44.198 0.000 1.000 1.000 ## satisfaction 1.375 0.044 31.407 0.000 1.000 1.000 ## politicians 0.417 0.011 38.843 0.000 1.000 1.000 Click for explanation No, the model does not seem to fit the data well. The SRMR looks good, but one good-looking fit statistic is not enough. The RMSEA, TLI, and CFI are all in the “unacceptable” range. The \\(\\chi^2\\) test is highly significant, but with nearly 15,000 observations, even trivial misfit yields a significant \\(\\chi^2\\), so we don’t give this test much weight. The cfa() function is just a wrapper for the lavaan() function with several options set at the defaults you would want for a standard CFA. 
By default: All latent variances and covariances are freely estimated (due to the arguments auto.var = TRUE and auto.cov.lv.x = TRUE). The model is identified by fixing the first factor loading of each factor to 1 (due to the argument auto.fix.first = TRUE). To see a full list of the (many) options you can specify to tweak the behavior of lavaan estimation functions, run ?lavOptions. Now, we will consider a couple of alternative factor structures for the Trust in Politics CFA. First, we will keep things extremely simple by estimating a one-factor model wherein all Trust items are explained by a single latent variable. 5.3.4 Define the lavaan model syntax for a one-factor model of the Trust items. Save this syntax as an object in your environment. Click to show code mod_1f <- ' political_trust =~ trstlgl + trstplc + trstun + trstep + trstprl + stfhlth + stfedu + stfeco + stfgov + stfdem + pltinvt + pltcare + trstplt ' 5.3.5 Estimate the one-factor model, and summarize the results. Does this model appear to fit better or worse than the three-factor model? Note: You can use the lavaan::fitMeasures() function to extract only the model fit information from a fitted lavaan object. 
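Before comparing fit indices, it can help to see where the “Number of model parameters” and “Degrees of freedom” in the lavaan summaries come from. Without a mean structure, the 13 Trust items supply 13 × 14 / 2 = 91 unique variances and covariances, and the model degrees of freedom equal 91 minus the number of free parameters. The sketch below counts parameters for a simple-structure CFA under marker-variable identification (first loading of each factor fixed to 1) with all factor covariances free; the helper functions n_moments() and cfa_npar() are hypothetical, written for this illustration, and not part of lavaan.

```r
## Unique variances and covariances among p observed variables:
n_moments <- function(p) p * (p + 1) / 2

## Free parameters in a simple-structure CFA with marker-variable
## identification, no mean structure, and freely covarying factors.
cfa_npar <- function(items_per_factor) {
  p <- sum(items_per_factor)
  m <- length(items_per_factor)
  loadings  <- p - m            # one loading per factor fixed to 1
  residuals <- p                # one residual variance per item
  latent    <- m * (m + 1) / 2  # factor variances + covariances
  loadings + residuals + latent
}

npar_3f <- cfa_npar(c(5, 5, 3))  # institutions, satisfaction, politicians
npar_1f <- cfa_npar(13)          # single political_trust factor

df_3f <- n_moments(13) - npar_3f
df_1f <- n_moments(13) - npar_1f
c(npar_3f = npar_3f, df_3f = df_3f, npar_1f = npar_1f, df_1f = df_1f)
```

These counts (29 parameters and 62 df for the three-factor model; 26 parameters and 65 df for the one-factor model) should match the corresponding lines of the two lavaan summaries.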
Click to show code ## Estimate the one factor model: fit_1f <- cfa(mod_1f, data = ess) ## Summarize the results: summary(fit_1f, fit.measures = TRUE) ## lavaan 0.6.16 ended normally after 33 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 26 ## ## Used Total ## Number of observations 14778 19690 ## ## Model Test User Model: ## ## Test statistic 17667.304 ## Degrees of freedom 65 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 81699.096 ## Degrees of freedom 78 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.784 ## Tucker-Lewis Index (TLI) 0.741 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -374912.206 ## Loglikelihood unrestricted model (H1) -366078.555 ## ## Akaike (AIC) 749876.413 ## Bayesian (BIC) 750074.036 ## Sample-size adjusted Bayesian (SABIC) 749991.410 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.135 ## 90 Percent confidence interval - lower 0.134 ## 90 Percent confidence interval - upper 0.137 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 1.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.080 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## political_trust =~ ## trstlgl 1.000 ## trstplc 0.774 0.013 57.949 0.000 ## trstun 0.930 0.014 64.200 0.000 ## trstep 0.909 0.014 65.679 0.000 ## trstprl 1.182 0.015 79.401 0.000 ## stfhlth 0.615 0.013 45.947 0.000 ## stfedu 0.695 0.014 51.424 0.000 ## stfeco 0.895 0.014 62.316 0.000 ## stfgov 0.985 0.014 68.200 0.000 ## stfdem 0.998 0.014 70.899 0.000 ## pltinvt 0.382 0.006 59.215 0.000 ## pltcare 0.396 0.006 61.195 0.000 ## trstplt 1.183 0.014 81.716 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .trstlgl 3.370 0.042 79.787 0.000 ## .trstplc 3.410 0.041 
82.311 0.000 ## .trstun 3.451 0.043 80.749 0.000 ## .trstep 3.019 0.038 80.272 0.000 ## .trstprl 1.938 0.027 70.878 0.000 ## .stfhlth 4.201 0.050 84.093 0.000 ## .stfedu 3.941 0.047 83.419 0.000 ## .stfeco 3.565 0.044 81.289 0.000 ## .stfgov 3.044 0.038 79.326 0.000 ## .stfdem 2.631 0.034 78.072 0.000 ## .pltinvt 0.775 0.009 82.043 0.000 ## .pltcare 0.743 0.009 81.579 0.000 ## .trstplt 1.548 0.023 67.052 0.000 ## political_trst 2.299 0.055 41.569 0.000 ## Compare fit statistics: fitMeasures(fit_3f) ## npar fmin chisq ## 29.000 0.360 10652.207 ## df pvalue baseline.chisq ## 62.000 0.000 81699.096 ## baseline.df baseline.pvalue cfi ## 78.000 0.000 0.870 ## tli nnfi rfi ## 0.837 0.837 0.836 ## nfi pnfi ifi ## 0.870 0.691 0.870 ## rni logl unrestricted.logl ## 0.870 -371404.658 -366078.555 ## aic bic ntotal ## 742867.317 743087.743 14778.000 ## bic2 rmsea rmsea.ci.lower ## 742995.583 0.108 0.106 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.109 0.900 0.000 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 1.000 0.080 ## rmr rmr_nomean srmr ## 0.255 0.255 0.059 ## srmr_bentler srmr_bentler_nomean crmr ## 0.059 0.059 0.064 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.064 0.059 0.059 ## cn_05 cn_01 gfi ## 113.901 126.971 0.897 ## agfi pgfi mfi ## 0.849 0.611 0.699 ## ecvi ## 0.725 fitMeasures(fit_1f) ## npar fmin chisq ## 26.000 0.598 17667.304 ## df pvalue baseline.chisq ## 65.000 0.000 81699.096 ## baseline.df baseline.pvalue cfi ## 78.000 0.000 0.784 ## tli nnfi rfi ## 0.741 0.741 0.741 ## nfi pnfi ifi ## 0.784 0.653 0.784 ## rni logl unrestricted.logl ## 0.784 -374912.206 -366078.555 ## aic bic ntotal ## 749876.413 750074.036 14778.000 ## bic2 rmsea rmsea.ci.lower ## 749991.410 0.135 0.134 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.137 0.900 0.000 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 1.000 0.080 ## rmr rmr_nomean srmr ## 0.364 0.364 0.080 ## srmr_bentler srmr_bentler_nomean crmr ## 0.080 0.080 0.087 ## 
crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.087 0.080 0.080 ## cn_05 cn_01 gfi ## 71.949 79.980 0.825 ## agfi pgfi mfi ## 0.756 0.590 0.551 ## ecvi ## 1.199 Click for explanation The one-factor model definitely fits worse than the three-factor model. Its \\(\\chi^2\\) is much larger (17667.304 vs. 10652.207) for only 3 additional degrees of freedom, its CFI and TLI are lower (0.784 vs. 0.870 and 0.741 vs. 0.837), and its RMSEA and SRMR are higher (0.135 vs. 0.108 and 0.080 vs. 0.059). A second-order CFA model is another way of representing the latent structure underlying a set of items. As you read in Byrne (2005), however, the second-order CFA is only appropriate in certain circumstances. 5.3.6 Given the CFA results above, would a second-order CFA be appropriate for the Trust data? Why or why not? Click for explanation Yes, a second-order CFA model is a theoretically appropriate representation of the Trust items. The first-order latent variables in the three-factor model are all significantly and fairly strongly correlated (standardized correlations between 0.71 and 0.87). The first-order latent variables in the three-factor model seem to tap different aspects of some single underlying construct. 5.3.7 Define the lavaan model syntax for a second-order CFA model of the Trust items. Use the three factors defined in 5.3.2 as the first-order factors. Click to show code mod_2nd <- ' institutions =~ trstlgl + trstplc + trstun + trstep + trstprl satisfaction =~ stfhlth + stfedu + stfeco + stfgov + stfdem politicians =~ pltinvt + pltcare + trstplt trust =~ politicians + satisfaction + institutions ' Click for explanation To define the second-order factor, we use the same syntactic conventions that we employ to define a first-order factor. The only difference is that the “indicators” of the second-order factor (i.e., the variables listed on the RHS of the =~ operator) are previously defined first-order latent variables. 5.3.8 Estimate the second-order CFA model, and summarize the results. Does this model fit better or worse than the three-factor model? Is this model more or less complex than the three-factor model? What information can you use to quantify this difference in complexity? 
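Degrees of freedom are the usual way to quantify complexity. Without a mean structure, a model's degrees of freedom equal the number of unique variances and covariances among the p observed items, p(p + 1)/2, minus the number of estimated parameters (npar). The minimal base-R sketch below applies this to the 13 Trust items, using the npar values that lavaan prints for the one-factor, three-factor, and second-order models. The helper name model_df() is hypothetical.

```r
## Hypothetical helper: degrees of freedom of a covariance-structure model
## (no mean structure) = unique covariance-matrix elements - free parameters.
model_df <- function(p, npar) {
  p * (p + 1) / 2 - npar
}

p_trust <- 13  # number of Trust items

## npar values as printed by lavaan for the fitted Trust models
npar <- c(one_factor = 26, three_factor = 29, second_order = 29)

df_trust <- model_df(p_trust, npar)
df_trust
```

More degrees of freedom means fewer estimated parameters and, hence, a less complex model.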
Click to show code fit_2nd <- cfa(mod_2nd, data = ess) summary(fit_2nd, fit.measures = TRUE, standardized = TRUE) ## lavaan 0.6.16 ended normally after 44 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 29 ## ## Used Total ## Number of observations 14778 19690 ## ## Model Test User Model: ## ## Test statistic 10652.207 ## Degrees of freedom 62 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 81699.096 ## Degrees of freedom 78 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.870 ## Tucker-Lewis Index (TLI) 0.837 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -371404.658 ## Loglikelihood unrestricted model (H1) -366078.555 ## ## Akaike (AIC) 742867.317 ## Bayesian (BIC) 743087.743 ## Sample-size adjusted Bayesian (SABIC) 742995.583 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.108 ## 90 Percent confidence interval - lower 0.106 ## 90 Percent confidence interval - upper 0.109 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 1.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.059 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions =~ ## trstlgl 1.000 1.613 0.677 ## trstplc 0.770 0.012 61.866 0.000 1.241 0.567 ## trstun 0.929 0.013 69.227 0.000 1.498 0.642 ## trstep 0.908 0.013 70.929 0.000 1.464 0.660 ## trstprl 1.139 0.014 84.084 0.000 1.837 0.809 ## satisfaction =~ ## stfhlth 1.000 1.173 0.521 ## stfedu 1.106 0.022 50.840 0.000 1.297 0.577 ## stfeco 1.415 0.025 57.214 0.000 1.659 0.713 ## stfgov 1.480 0.025 58.764 0.000 1.736 0.756 ## stfdem 1.384 0.024 57.904 0.000 1.623 0.731 ## politicians =~ ## pltinvt 1.000 0.646 0.613 ## pltcare 1.021 0.016 62.862 0.000 0.660 0.628 ## trstplt 3.012 0.039 76.838 
0.000 1.946 0.891 ## trust =~ ## politicians 1.000 0.918 0.918 ## satisfaction 1.531 0.033 46.494 0.000 0.774 0.774 ## institutions 2.583 0.045 56.796 0.000 0.950 0.950 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trstlgl 3.068 0.041 75.262 0.000 3.068 0.541 ## .trstplc 3.248 0.041 80.037 0.000 3.248 0.678 ## .trstun 3.197 0.041 77.141 0.000 3.197 0.588 ## .trstep 2.776 0.036 76.243 0.000 2.776 0.564 ## .trstprl 1.776 0.029 61.361 0.000 1.776 0.345 ## .stfhlth 3.695 0.046 79.989 0.000 3.695 0.729 ## .stfedu 3.368 0.043 77.916 0.000 3.368 0.667 ## .stfeco 2.656 0.038 69.070 0.000 2.656 0.491 ## .stfgov 2.264 0.035 64.201 0.000 2.264 0.429 ## .stfdem 2.289 0.034 67.172 0.000 2.289 0.465 ## .pltinvt 0.694 0.009 78.255 0.000 0.694 0.624 ## .pltcare 0.668 0.009 77.562 0.000 0.668 0.605 ## .trstplt 0.978 0.028 34.461 0.000 0.978 0.205 ## .institutions 0.255 0.022 11.691 0.000 0.098 0.098 ## .satisfaction 0.551 0.020 27.846 0.000 0.400 0.400 ## .politicians 0.065 0.004 17.091 0.000 0.157 0.157 ## trust 0.352 0.010 35.005 0.000 1.000 1.000 ## Compare fit between the first and second order models: fitMeasures(fit_3f) ## npar fmin chisq ## 29.000 0.360 10652.207 ## df pvalue baseline.chisq ## 62.000 0.000 81699.096 ## baseline.df baseline.pvalue cfi ## 78.000 0.000 0.870 ## tli nnfi rfi ## 0.837 0.837 0.836 ## nfi pnfi ifi ## 0.870 0.691 0.870 ## rni logl unrestricted.logl ## 0.870 -371404.658 -366078.555 ## aic bic ntotal ## 742867.317 743087.743 14778.000 ## bic2 rmsea rmsea.ci.lower ## 742995.583 0.108 0.106 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.109 0.900 0.000 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 1.000 0.080 ## rmr rmr_nomean srmr ## 0.255 0.255 0.059 ## srmr_bentler srmr_bentler_nomean crmr ## 0.059 0.059 0.064 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.064 0.059 0.059 ## cn_05 cn_01 gfi ## 113.901 126.971 0.897 ## agfi pgfi mfi ## 0.849 0.611 0.699 ## ecvi ## 0.725 fitMeasures(fit_2nd) ## npar 
fmin chisq ## 29.000 0.360 10652.207 ## df pvalue baseline.chisq ## 62.000 0.000 81699.096 ## baseline.df baseline.pvalue cfi ## 78.000 0.000 0.870 ## tli nnfi rfi ## 0.837 0.837 0.836 ## nfi pnfi ifi ## 0.870 0.691 0.870 ## rni logl unrestricted.logl ## 0.870 -371404.658 -366078.555 ## aic bic ntotal ## 742867.317 743087.743 14778.000 ## bic2 rmsea rmsea.ci.lower ## 742995.583 0.108 0.106 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.109 0.900 0.000 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 1.000 0.080 ## rmr rmr_nomean srmr ## 0.255 0.255 0.059 ## srmr_bentler srmr_bentler_nomean crmr ## 0.059 0.059 0.064 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.064 0.059 0.059 ## cn_05 cn_01 gfi ## 113.901 126.971 0.897 ## agfi pgfi mfi ## 0.849 0.611 0.699 ## ecvi ## 0.725 Click for explanation We don’t have to do anything special here. We can estimate and summarize the second order CFA exactly as we did the first order CFA. You should quickly notice something strange about the model fit statistics compared above. If you don’t see it, consider the following: fitMeasures(fit_3f) - fitMeasures(fit_2nd) ## npar fmin chisq ## 0 0 0 ## df pvalue baseline.chisq ## 0 0 0 ## baseline.df baseline.pvalue cfi ## 0 0 0 ## tli nnfi rfi ## 0 0 0 ## nfi pnfi ifi ## 0 0 0 ## rni logl unrestricted.logl ## 0 0 0 ## aic bic ntotal ## 0 0 0 ## bic2 rmsea rmsea.ci.lower ## 0 0 0 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0 0 0 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0 0 0 ## rmr rmr_nomean srmr ## 0 0 0 ## srmr_bentler srmr_bentler_nomean crmr ## 0 0 0 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0 0 0 ## cn_05 cn_01 gfi ## 0 0 0 ## agfi pgfi mfi ## 0 0 0 ## ecvi ## 0 The two models produce identical fit statistics! We also see that the degrees of freedom are identical between the two models. Hence, the two models have equal complexity. This result taps into a critical idea in statistical modeling, namely, model equivalency. 
It turns out the two models we’re comparing here are equivalent in the sense that they are statistically indistinguishable representations of the data. Since this is a very important idea, I want to discuss it in person. So, between now and the Week 6 lecture session, think about the implications of this model equivalence. Specifically, consider the following questions: What do we mean when we say that these two models are equivalent? How is it possible for these two models to be equivalent when one contains an additional latent variable? Why are the degrees of freedom equal for these two models? Why are the fit statistics equal for these two models? We’ll take some time to discuss these ideas in the Week 6 lecture session. End of At-Home Exercises "],["in-class-exercises-4.html", "5.4 In-Class Exercises", " 5.4 In-Class Exercises This week, we will wrap up our re-analysis of the Kestilä (2006) results. During this practical, you will conduct a CFA of the Attitudes toward Immigration items and compare the results to those obtained from your previous EFA- and PCA-based replications of Kestilä (2006). 5.4.1 Load the ESS data. The relevant data are contained in the ess_round1.rds file. Click to show code ess <- readRDS("ess_round1.rds") We are going to conduct a CFA to evaluate the measurement model implied by the five-factor representation of the Attitudes toward Immigration items that you should have found via the EFA you conducted in the Week 4 In-Class Exercises. Caveat: Technically, the following CFA results have no confirmatory value because we’ll be estimating our CFA models from the same data that we used for our EFA. Practicing the techniques will still be useful, though. 5.4.2 Define the lavaan model syntax for the CFA implied by the five-factor solution from 4.4.9. Enforce a simple structure; do not allow any cross-loadings. Covary the five latent factors. Do not specify any mean structure. 
Save this model syntax as an object in your environment. Hints: You can algorithmically enforce a simple structure by assigning each item to the factor upon which it loads most strongly. You can download the fitted psych::efa() object for the five-factor solution here. The pattern matrix for the five-factor EFA solution in our Week 4 exercises is equivalent to the solution presented in Table 3 of Kestilä (2006). Click to show code mod_5f <- ' ## Immigration Policy: ip =~ imrcntr + eimrcnt + eimpcnt + imsmetn + impcntr + imdfetn ## Social Threat: st =~ imbgeco + imbleco + imwbcnt + imwbcrm + imtcjob + imueclt ## Refugee Policy: rp =~ gvrfgap + imrsprc + rfgbfml + rfggvfn + rfgawrk + rfgfrpc + shrrfg ## Cultural Threat: ct =~ qfimchr + qfimwht + pplstrd + vrtrlg ## Economic Threat: et =~ imwgdwn + imhecop ' Note: We don’t have to specify the latent covariances in the model syntax; with its default settings, cfa() estimates all latent covariances when we fit the model. 5.4.3 Estimate the CFA model you defined above, and summarize the results. Use the lavaan::cfa() function to estimate the model. Use the default settings for the cfa() function. Request the model fit statistics with the summary by supplying the fit.measures = TRUE argument to summary(). Request the standardized parameter estimates with the summary by supplying the standardized = TRUE argument to summary(). Check the results, and answer the following questions: Does the model fit the data well? How are the latent variances and covariances specified when using the default settings? How is the model identified when using the default settings? 
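To see how cfa()'s default settings translate into the size of the model, the base-R sketch below counts the free parameters of a simple-structure CFA under those defaults: the first loading of each factor fixed to 1, and all residual variances, latent variances, and latent covariances freely estimated. It assumes no cross-loadings and no mean structure; count_cfa_params() is a hypothetical helper that does plain arithmetic, not a lavaan function.

```r
## Hypothetical helper: count free parameters of a simple-structure CFA
## (no cross-loadings, no mean structure) under cfa()'s default settings.
count_cfa_params <- function(items_per_factor) {
  k <- length(items_per_factor)   # number of factors
  p <- sum(items_per_factor)      # number of observed items

  loadings   <- p - k             # first loading per factor fixed to 1
  resid_vars <- p                 # one residual variance per item
  lv_vars    <- k                 # latent variances
  lv_covs    <- k * (k - 1) / 2   # all latent covariances

  loadings + resid_vars + lv_vars + lv_covs
}

## Number of items per factor in mod_5f
items_5f <- c(ip = 6, st = 6, rp = 7, ct = 4, et = 2)

npar_5f <- count_cfa_params(items_5f)
df_5f   <- sum(items_5f) * (sum(items_5f) + 1) / 2 - npar_5f
c(npar = npar_5f, df = df_5f)
```

The result (60 parameters, 265 degrees of freedom) matches the counts lavaan reports for fit_5f.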
Click to show code ## Load the lavaan package: library(lavaan) ## Estimate the CFA model: fit_5f <- cfa(mod_5f, data = ess) ## Summarize the fitted model: summary(fit_5f, fit.measures = TRUE, standardized = TRUE) ## lavaan 0.6.16 ended normally after 72 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 60 ## ## Used Total ## Number of observations 14243 19690 ## ## Model Test User Model: ## ## Test statistic 18631.556 ## Degrees of freedom 265 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 159619.058 ## Degrees of freedom 300 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.885 ## Tucker-Lewis Index (TLI) 0.869 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -520035.133 ## Loglikelihood unrestricted model (H1) -510719.354 ## ## Akaike (AIC) 1040190.265 ## Bayesian (BIC) 1040644.106 ## Sample-size adjusted Bayesian (SABIC) 1040453.432 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## 90 Percent confidence interval - lower 0.069 ## 90 Percent confidence interval - upper 0.071 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## ip =~ ## imrcntr 1.000 0.617 0.748 ## eimrcnt 0.942 0.011 84.943 0.000 0.582 0.696 ## eimpcnt 1.127 0.010 113.413 0.000 0.695 0.898 ## imsmetn 0.982 0.010 98.753 0.000 0.606 0.796 ## impcntr 1.150 0.010 113.623 0.000 0.710 0.900 ## imdfetn 1.132 0.010 111.802 0.000 0.698 0.887 ## st =~ ## imbgeco 1.000 1.608 0.728 ## imbleco 0.826 0.012 69.222 0.000 1.327 0.619 ## imwbcnt 1.046 0.012 88.056 0.000 1.682 0.792 ## imwbcrm 0.713 0.011 63.102 0.000 1.146 0.564 ## imtcjob 0.751 0.011 66.787 
0.000 1.207 0.597 ## imueclt 1.008 0.013 78.043 0.000 1.621 0.698 ## rp =~ ## gvrfgap 1.000 0.659 0.610 ## imrsprc 0.855 0.016 51.881 0.000 0.563 0.535 ## rfgbfml 1.047 0.019 56.174 0.000 0.690 0.593 ## rfggvfn 0.849 0.016 51.714 0.000 0.559 0.533 ## rfgawrk 0.653 0.016 41.044 0.000 0.430 0.405 ## rfgfrpc -0.810 0.016 -51.095 0.000 -0.534 -0.525 ## shrrfg -0.999 0.017 -58.381 0.000 -0.658 -0.625 ## ct =~ ## qfimchr 1.000 1.836 0.629 ## qfimwht 0.941 0.017 54.250 0.000 1.728 0.659 ## pplstrd -0.366 0.007 -51.585 0.000 -0.673 -0.600 ## vrtrlg 0.252 0.006 41.294 0.000 0.462 0.443 ## et =~ ## imwgdwn 1.000 0.723 0.667 ## imhecop 1.151 0.023 49.736 0.000 0.832 0.771 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## ip ~~ ## st -0.605 0.012 -48.693 0.000 -0.610 -0.610 ## rp 0.264 0.006 45.566 0.000 0.648 0.648 ## ct 0.634 0.015 41.007 0.000 0.560 0.560 ## et -0.206 0.006 -35.411 0.000 -0.462 -0.462 ## st ~~ ## rp -0.838 0.017 -48.329 0.000 -0.792 -0.792 ## ct -1.622 0.041 -39.091 0.000 -0.550 -0.550 ## et 0.675 0.017 39.083 0.000 0.580 0.580 ## rp ~~ ## ct 0.626 0.018 34.950 0.000 0.518 0.518 ## et -0.233 0.007 -33.007 0.000 -0.490 -0.490 ## ct ~~ ## et -0.592 0.020 -30.127 0.000 -0.446 -0.446 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .imrcntr 0.299 0.004 77.941 0.000 0.299 0.440 ## .eimrcnt 0.359 0.005 79.638 0.000 0.359 0.515 ## .eimpcnt 0.116 0.002 62.821 0.000 0.116 0.193 ## .imsmetn 0.212 0.003 75.580 0.000 0.212 0.366 ## .impcntr 0.119 0.002 62.454 0.000 0.119 0.191 ## .imdfetn 0.132 0.002 65.344 0.000 0.132 0.213 ## .imbgeco 2.288 0.033 70.261 0.000 2.288 0.470 ## .imbleco 2.837 0.037 76.688 0.000 2.837 0.617 ## .imwbcnt 1.677 0.027 63.198 0.000 1.677 0.372 ## .imwbcrm 2.810 0.036 78.612 0.000 2.810 0.682 ## .imtcjob 2.630 0.034 77.524 0.000 2.630 0.643 ## .imueclt 2.761 0.038 72.515 0.000 2.761 0.512 ## .gvrfgap 0.733 0.010 73.584 0.000 0.733 0.628 ## .imrsprc 0.791 0.010 77.119 0.000 0.791 0.714 ## .rfgbfml 
0.877 0.012 74.508 0.000 0.877 0.648 ## .rfggvfn 0.788 0.010 77.203 0.000 0.788 0.716 ## .rfgawrk 0.945 0.012 80.870 0.000 0.945 0.836 ## .rfgfrpc 0.749 0.010 77.501 0.000 0.749 0.724 ## .shrrfg 0.676 0.009 72.682 0.000 0.676 0.609 ## .qfimchr 5.142 0.080 64.113 0.000 5.142 0.604 ## .qfimwht 3.891 0.064 60.623 0.000 3.891 0.566 ## .pplstrd 0.804 0.012 67.054 0.000 0.804 0.640 ## .vrtrlg 0.872 0.011 76.990 0.000 0.872 0.804 ## .imwgdwn 0.652 0.012 53.300 0.000 0.652 0.555 ## .imhecop 0.472 0.014 34.353 0.000 0.472 0.405 ## ip 0.381 0.007 51.578 0.000 1.000 1.000 ## st 2.584 0.054 47.795 0.000 1.000 1.000 ## rp 0.434 0.012 36.748 0.000 1.000 1.000 ## ct 3.371 0.096 35.174 0.000 1.000 1.000 ## et 0.523 0.015 34.944 0.000 1.000 1.000 Click for explanation No, the model does not seem to fit the data well. The SRMR (0.048) looks good, but one good-looking fit statistic is not enough. The TLI (0.869) and CFI (0.885) are in the “unacceptable” range. The RMSEA (0.070) is in the “questionable” range. The \\(\\chi^2\\) is highly significant, but we don’t care: with more than 14,000 observations, even trivial misfit produces a significant \\(\\chi^2\\). The cfa() function is just a wrapper for the lavaan() function with several options set at the defaults you would want for a standard CFA. By default: All latent variances and covariances are freely estimated (due to the arguments auto.var = TRUE and auto.cov.lv.x = TRUE). The model is identified by fixing the first factor loading of each factor to 1 (due to the argument auto.fix.first = TRUE). To see a full list of the (many) options you can specify to tweak the behavior of lavaan estimation functions, run ?lavOptions. Now, we will consider a couple of alternative factor structures for the Attitudes toward Immigration CFA. First, we will go extremely simple by estimating a one-factor model wherein all Attitude items are explained by a single latent variable. 5.4.4 Define the lavaan model syntax for a one-factor model of the Immigration items. Save this syntax as an object in your environment. 
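Typing 25 item names by hand is error-prone. As an optional sketch, the one-factor syntax can also be assembled from a character vector of item names with base R's paste(). The object names ati_items and mod_1f_auto are illustrative, and the resulting string can be passed to cfa() just like hand-written syntax.

```r
## The 25 Attitudes toward Immigration items, in the same order as mod_5f
ati_items <- c("imrcntr", "eimrcnt", "eimpcnt", "imsmetn", "impcntr", "imdfetn",
               "imbgeco", "imbleco", "imwbcnt", "imwbcrm", "imtcjob", "imueclt",
               "gvrfgap", "imrsprc", "rfgbfml", "rfggvfn", "rfgawrk", "rfgfrpc",
               "shrrfg", "qfimchr", "qfimwht", "pplstrd", "vrtrlg",
               "imwgdwn", "imhecop")

## Build "ati =~ item1 + item2 + ..." as a single lavaan syntax string
mod_1f_auto <- paste("ati =~", paste(ati_items, collapse = " + "))
mod_1f_auto
```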
Click to show code mod_1f <- ' ati =~ imrcntr + eimrcnt + eimpcnt + imsmetn + impcntr + imdfetn + imbgeco + imbleco + imwbcnt + imwbcrm + imtcjob + imueclt + gvrfgap + imrsprc + rfgbfml + rfggvfn + rfgawrk + rfgfrpc + shrrfg + qfimchr + qfimwht + pplstrd + vrtrlg + imwgdwn + imhecop ' 5.4.5 Estimate the one-factor model, and summarize the results. Compare the fit measures for the one-factor and five-factor models. Which model fits the data better? Note: Remember, you can use the lavaan::fitMeasures() function to extract only the model fit information from a fitted lavaan object. Click to show code ## Estimate the one-factor model: fit_1f <- cfa(mod_1f, data = ess) ## Summarize the results: summary(fit_1f) ## lavaan 0.6.16 ended normally after 47 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 50 ## ## Used Total ## Number of observations 14243 19690 ## ## Model Test User Model: ## ## Test statistic 49510.917 ## Degrees of freedom 275 ## P-value (Chi-square) 0.000 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## ati =~ ## imrcntr 1.000 ## eimrcnt 0.937 0.012 78.324 0.000 ## eimpcnt 1.114 0.011 101.263 0.000 ## imsmetn 0.987 0.011 90.990 0.000 ## impcntr 1.147 0.011 102.371 0.000 ## imdfetn 1.153 0.011 103.148 0.000 ## imbgeco -2.055 0.032 -64.749 0.000 ## imbleco -1.625 0.031 -52.533 0.000 ## imwbcnt -2.173 0.030 -71.324 0.000 ## imwbcrm -1.432 0.029 -48.849 0.000 ## imtcjob -1.532 0.029 -52.519 0.000 ## imueclt -2.198 0.033 -65.876 0.000 ## gvrfgap 0.807 0.016 51.746 0.000 ## imrsprc 0.757 0.015 49.790 0.000 ## rfgbfml 0.861 0.017 51.272 0.000 ## rfggvfn 0.722 0.015 47.671 0.000 ## rfgawrk 0.530 0.015 34.448 0.000 ## rfgfrpc -0.755 0.015 -51.462 0.000 ## shrrfg -0.931 0.015 -61.438 0.000 ## qfimchr 1.597 0.042 37.835 0.000 ## qfimwht 1.769 0.038 46.697 0.000 ## pplstrd -0.873 0.016 
-53.994 0.000 ## vrtrlg 0.602 0.015 39.940 0.000 ## imwgdwn -0.682 0.016 -43.576 0.000 ## imhecop -0.773 0.016 -49.611 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .imrcntr 0.327 0.004 79.021 0.000 ## .eimrcnt 0.388 0.005 80.422 0.000 ## .eimpcnt 0.161 0.002 70.832 0.000 ## .imsmetn 0.235 0.003 77.101 0.000 ## .impcntr 0.158 0.002 69.688 0.000 ## .imdfetn 0.150 0.002 68.791 0.000 ## .imbgeco 3.381 0.041 82.203 0.000 ## .imbleco 3.666 0.044 83.130 0.000 ## .imwbcnt 2.839 0.035 81.477 0.000 ## .imwbcrm 3.399 0.041 83.334 0.000 ## .imtcjob 3.260 0.039 83.130 0.000 ## .imueclt 3.683 0.045 82.092 0.000 ## .gvrfgap 0.938 0.011 83.176 0.000 ## .imrsprc 0.906 0.011 83.285 0.000 ## .rfgbfml 1.092 0.013 83.203 0.000 ## .rfggvfn 0.917 0.011 83.394 0.000 ## .rfgawrk 1.031 0.012 83.913 0.000 ## .rfgfrpc 0.832 0.010 83.192 0.000 ## .shrrfg 0.803 0.010 82.499 0.000 ## .qfimchr 7.613 0.091 83.803 0.000 ## .qfimwht 5.772 0.069 83.442 0.000 ## .pplstrd 0.988 0.012 83.040 0.000 ## .vrtrlg 0.958 0.011 83.728 0.000 ## .imwgdwn 1.010 0.012 83.583 0.000 ## .imhecop 0.954 0.011 83.294 0.000 ## ati 0.353 0.007 48.941 0.000 ## Compare fit statistics: fitMeasures(fit_5f, fit.measures = c("npar", # Estimated parameters "chisq", "df", "pvalue", # Model fit vs. saturated "cfi", "tli", # Model fit vs. baseline "rmsea", "srmr"), # Model fit vs. saturated output = "text") ## ## Model Test User Model: ## ## Test statistic 18631.556 ## Degrees of freedom 265 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.885 ## Tucker-Lewis Index (TLI) 0.869 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 fitMeasures(fit_1f, fit.measures = c("npar", # Estimated parameters "chisq", "df", "pvalue", # Model fit vs. saturated "cfi", "tli", # Model fit vs. baseline "rmsea", "srmr"), # Model fit vs. 
saturated output = "text") ## ## Model Test User Model: ## ## Test statistic 49510.917 ## Degrees of freedom 275 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.691 ## Tucker-Lewis Index (TLI) 0.663 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.112 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.087 Click for explanation The one-factor model definitely fits worse than the five-factor model. Every fit index is worse (CFI = 0.691 vs. 0.885, TLI = 0.663 vs. 0.869, RMSEA = 0.112 vs. 0.070, SRMR = 0.087 vs. 0.048), and its \\(\\chi^2\\) is more than two and a half times as large (49510.917 vs. 18631.556) for only 10 additional degrees of freedom. 5.4.6 Given the CFA results from the five-factor model, would a second-order CFA be appropriate for the Attitudes toward Immigration data? Why or why not? Click for explanation Yes, a second-order CFA model is a theoretically appropriate representation of the Attitudes toward Immigration items. The first-order latent variables in the five-factor model are all significantly correlated (absolute standardized correlations between 0.45 and 0.79). The first-order latent variables in the five-factor model seem to tap different aspects of some single underlying construct. 5.4.7 Define the lavaan model syntax for a second-order CFA model of the Attitudes toward Immigration items, estimate it, and inspect the results. Use the five factors defined in 5.4.2 as the first-order factors. 
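Before estimating the second-order model, you can work out how much simpler it should be. A second-order factor replaces the k(k + 1)/2 freely estimated variances and covariances among k first-order factors with k - 1 free second-order loadings, k disturbance variances, and 1 second-order factor variance (2k parameters in total), assuming marker-variable identification as in cfa()'s defaults. The base-R sketch below does that arithmetic; second_order_savings() is a hypothetical helper.

```r
## Hypothetical helper: parameters saved by replacing the k x k latent
## covariance matrix (k variances + k(k - 1)/2 covariances) with one
## second-order factor (k - 1 free loadings + k disturbances + 1 variance).
second_order_savings <- function(k) {
  first_order_params  <- k * (k + 1) / 2
  second_order_params <- (k - 1) + k + 1
  first_order_params - second_order_params
}

second_order_savings(5)  # five Attitudes toward Immigration factors
second_order_savings(3)  # three Trust factors from 5.3
```

With five first-order factors, the second-order model should estimate 5 fewer parameters (55 instead of 60, hence 270 instead of 265 df). With three factors it saves none, which is consistent with the equal degrees of freedom of the three-factor and second-order Trust models in 5.3.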
Click to show code mod_2o <- paste(mod_5f, 'ati =~ ip + rp + st + ct + et', sep = '\\n') fit_2o <- cfa(mod_2o, data = ess) summary(fit_2o, fit.measures = TRUE) ## lavaan 0.6.16 ended normally after 94 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 55 ## ## Used Total ## Number of observations 14243 19690 ## ## Model Test User Model: ## ## Test statistic 19121.111 ## Degrees of freedom 270 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 159619.058 ## Degrees of freedom 300 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.882 ## Tucker-Lewis Index (TLI) 0.869 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -520279.910 ## Loglikelihood unrestricted model (H1) -510719.354 ## ## Akaike (AIC) 1040669.820 ## Bayesian (BIC) 1041085.841 ## Sample-size adjusted Bayesian (SABIC) 1040911.056 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## 90 Percent confidence interval - lower 0.069 ## 90 Percent confidence interval - upper 0.071 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## ip =~ ## imrcntr 1.000 ## eimrcnt 0.943 0.011 85.095 0.000 ## eimpcnt 1.126 0.010 113.523 0.000 ## imsmetn 0.982 0.010 98.910 0.000 ## impcntr 1.149 0.010 113.651 0.000 ## imdfetn 1.130 0.010 111.789 0.000 ## st =~ ## imbgeco 1.000 ## imbleco 0.822 0.012 68.916 0.000 ## imwbcnt 1.047 0.012 88.172 0.000 ## imwbcrm 0.709 0.011 62.846 0.000 ## imtcjob 0.747 0.011 66.424 0.000 ## imueclt 1.013 0.013 78.434 0.000 ## rp =~ ## gvrfgap 1.000 ## imrsprc 0.854 0.017 51.127 0.000 ## rfgbfml 1.048 0.019 55.377 0.000 ## rfggvfn 0.853 0.017 51.170 0.000 ## rfgawrk 
0.657 0.016 40.785 0.000 ## rfgfrpc -0.828 0.016 -51.249 0.000 ## shrrfg -1.020 0.017 -58.369 0.000 ## ct =~ ## qfimchr 1.000 ## qfimwht 0.939 0.018 51.902 0.000 ## pplstrd -0.389 0.008 -51.072 0.000 ## vrtrlg 0.271 0.006 41.908 0.000 ## et =~ ## imwgdwn 1.000 ## imhecop 1.158 0.024 48.877 0.000 ## ati =~ ## ip 1.000 ## rp 1.264 0.024 53.732 0.000 ## st -3.123 0.051 -61.058 0.000 ## ct 2.638 0.058 45.467 0.000 ## et -1.000 0.024 -42.490 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .imrcntr 0.299 0.004 77.900 0.000 ## .eimrcnt 0.359 0.005 79.597 0.000 ## .eimpcnt 0.116 0.002 62.698 0.000 ## .imsmetn 0.211 0.003 75.502 0.000 ## .impcntr 0.119 0.002 62.476 0.000 ## .imdfetn 0.133 0.002 65.406 0.000 ## .imbgeco 2.285 0.033 70.158 0.000 ## .imbleco 2.852 0.037 76.762 0.000 ## .imwbcnt 1.668 0.027 62.920 0.000 ## .imwbcrm 2.821 0.036 78.653 0.000 ## .imtcjob 2.646 0.034 77.607 0.000 ## .imueclt 2.734 0.038 72.213 0.000 ## .gvrfgap 0.740 0.010 73.738 0.000 ## .imrsprc 0.797 0.010 77.211 0.000 ## .rfgbfml 0.885 0.012 74.621 0.000 ## .rfggvfn 0.791 0.010 77.189 0.000 ## .rfgawrk 0.946 0.012 80.833 0.000 ## .rfgfrpc 0.741 0.010 77.149 0.000 ## .shrrfg 0.665 0.009 72.020 0.000 ## .qfimchr 5.347 0.081 65.623 0.000 ## .qfimwht 4.084 0.065 62.673 0.000 ## .pplstrd 0.778 0.012 64.838 0.000 ## .vrtrlg 0.854 0.011 75.931 0.000 ## .imwgdwn 0.655 0.012 52.977 0.000 ## .imhecop 0.468 0.014 33.353 0.000 ## .ip 0.177 0.004 44.418 0.000 ## .st 0.596 0.023 26.030 0.000 ## .rp 0.101 0.005 21.784 0.000 ## .ct 1.745 0.060 29.185 0.000 ## .et 0.316 0.010 31.813 0.000 ## ati 0.204 0.005 37.371 0.000 5.4.8 Compare the model fit of the first- and second-order five-factor models using the fitMeasures() function. Which model offers the better fit? Which model is more complex? Click to show code fitMeasures(fit_5f, fit.measures = c("npar", # Estimated parameters "chisq", "df", "pvalue", # Model fit vs. saturated "cfi", "tli", # Model fit vs. 
baseline "rmsea", "srmr"), # Model fit vs. saturated output = "text") ## ## Model Test User Model: ## ## Test statistic 18631.556 ## Degrees of freedom 265 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.885 ## Tucker-Lewis Index (TLI) 0.869 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 fitMeasures(fit_2o, fit.measures = c("npar", # Estimated parameters "chisq", "df", "pvalue", # Model fit vs. saturated "cfi", "tli", # Model fit vs. baseline "rmsea", "srmr"), # Model fit vs. saturated output = "text") ## ## Model Test User Model: ## ## Test statistic 19121.111 ## Degrees of freedom 270 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.882 ## Tucker-Lewis Index (TLI) 0.869 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 Click for explanation The CFI is slightly better in the original five-factor model (0.885 vs. 0.882), but the TLI, RMSEA, and SRMR of the two models don’t differ out to three decimal places. The first-order model is the more complex of the two (60 vs. 55 estimated parameters; 265 vs. 270 degrees of freedom). As usual, both models have a significant \\(\\chi^2\\), but that doesn’t tell us much. Qualitative comparisons of model fit are fine, but we’d like to have an actual statistical test for these fit differences. As it happens, we have just such a test: a nested model \\(\\Delta \\chi^2\\) test (AKA, chi-squared difference test, change in chi-squared test, likelihood ratio test). In the Week 7 lecture, we’ll cover nested models and tests thereof, but it will be useful to start thinking about these concepts now. Two models are said to be nested if you can define one model by placing constraints on the other model. By way of example, consider the following two CFA models. The second model is nested within the first model, because we can define the second model by fixing the latent covariance to zero in the first model. 
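Because the two-model diagram is not reproduced in this text, here is a lavaan syntax sketch of such a pair, for illustration only. It assumes two factors (f1, f2), each measured by three hypothetical items (x1 to x6), which is consistent with the parameter counts given for the example. Under cfa()'s defaults, Model A estimates the factor covariance automatically; Model B fixes it to zero with the 0* modifier.

```r
## Hypothetical items x1-x6; two factors with three indicators each.
## Model A: cfa() freely estimates the f1-f2 covariance by default.
mod_a <- '
  f1 =~ x1 + x2 + x3
  f2 =~ x4 + x5 + x6
'

## Model B: identical, except that the factor covariance is fixed to zero.
mod_b <- '
  f1 =~ x1 + x2 + x3
  f2 =~ x4 + x5 + x6
  f1 ~~ 0*f2
'
```

Both strings would be estimated with cfa() on the same data. Alternatively, supplying orthogonal = TRUE to cfa() with mod_a fixes all latent covariances to zero, which here yields the same model as mod_b.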
Notice that the data contain \\(6(6 + 1) / 2 = 21\\) unique pieces of information. The first model estimates 13 parameters, and the second model estimates 12 parameters. Hence, the first model has 8 degrees of freedom, and the second model has 9 degrees of freedom. In general, the following must hold whenever Model B is nested within Model A. Model B will have fewer estimated parameters than Model A. Model B will have more degrees of freedom than Model A. Model A will be more complex than Model B. Model A will fit the data at least as well as Model B. Saturated Model All models are nested within the saturated model, because the saturated model estimates all possible relations among the variables. Regardless of what model we may be considering, we can always convert that model to a saturated model by estimating all possible associations. Hence, all models are nested within the saturated model. Baseline Model Similarly, the baseline model (AKA, independence model) is nested within all other models. In the baseline model, we only estimate the variances of the observed items; all associations are constrained to zero. We can always convert our model to the baseline model by fixing all associations to zero. Hence, the baseline model is nested within all other models. When two models are nested, we can use a \\(\\Delta \\chi^2\\) test to check if the nested model fits significantly worse than its parent model. Whenever we place constraints on a model, the fit will deteriorate (or, at best, stay the same), but we want to know if the constraints we imposed to define the nested model have produced too much loss of fit. We can use the anova() function to easily conduct \\(\\Delta \\chi^2\\) tests comparing models that we’ve estimated with cfa() or sem(). 5.4.9 Use the anova() function to compare the five-factor model from 5.4.2 and one-factor model from 5.4.4. Explain what Df, Chisq, Chisq diff, Df diff, and Pr(>Chisq) mean. Which model is more complex? Which model fits better?
What is the conclusion of the test? Click to show code anova(fit_1f, fit_5f) Click for explanation The Df column contains the degrees of freedom of each model. Higher df \\(\\Rightarrow\\) Less complex model The Chisq column shows the \\(\\chi^2\\) statistics (AKA, likelihood ratio statistics) for each model. \\(\\chi^2\\) = \\(-2\\) times the log of the ratio of the likelihoods for the estimated model and the saturated model. Larger \\(\\chi^2\\) \\(\\Rightarrow\\) Worse fit Chisq diff is the difference between the two \\(\\chi^2\\) values (i.e., \\(\\Delta \\chi^2\\)). How much better the more complex model fits the data Larger \\(\\Delta \\chi^2\\) values indicate greater losses of fit induced by the constraints needed to define the nested model. Df diff is the difference in the degrees of freedom between the models. Since both models must be estimated from the same pool of variables, this difference also represents the number of parameters that were constrained to define the nested model. Pr(>Chisq) is a p-value for the \\(\\Delta \\chi^2\\) test. \\(H_0: \\Delta \\chi^2 = 0\\) \\(H_1: \\Delta \\chi^2 > 0\\) The five-factor model is more complex than the one-factor model, but the extra complexity is justified The five-factor model fits significantly better than the one-factor model. 5.4.10 Use the anova() function to compare the first- and second-order five-factor models from 5.4.2 and 5.4.7. Which model is more complex? What is the conclusion of the test? Click to show code anova(fit_5f, fit_2o) Click for explanation The first-order model is more complex than the second-order model (df = 265 vs. df = 270), and the extra complexity is necessary. The first-order model fits significantly better than the second-order model. 5.4.11 Based on the results above, would you say that you have successfully confirmed the five-factor structure implied by the EFA? Click for explanation Nope, not so much.
The first-order five-factor model may fit the data best out of the three models considered here, but it still fits terribly. None of these models is an adequate representation of the Attitudes toward Immigration items. This result is particularly embarrassing when you consider that we’ve stacked the deck in our favor by using the same data to conduct the EFA and the CFA. When we fail to support the hypothesized measurement model, the confirmatory phase of our analysis is over. At this point, we’ve essentially rejected our hypothesized measurement structure, and that’s the conclusion of our analysis. We don’t have to throw up our hands in despair, however. We can still contribute something useful by modifying the theoretical measurement model through an exploratory, data-driven, post-hoc analysis. We’ll give that a shot below. 5.4.12 Modify the five-factor CFA from 5.4.2 by freeing the following parameters. The residual covariance between imrcntr and eimrcnt These questions both ask about allowing immigration from wealthy countries. It makes sense that answers on these two items share some additional, unique variance above-and-beyond what they contribute to the common factors. The residual covariance between qfimchr and qfimwht These questions are both about imposing qualifications on immigration (specifically Christian religion and “white” race). 
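Freeing these two residual covariances costs two degrees of freedom, so the modified model will fit significantly better only if its \\(\\chi^2\\) drops by more than the critical value of a \\(\\chi^2\\) distribution with 2 df (about 5.99 at \\(\\alpha = .05\\)). Below is a minimal base-R sketch of the \\(\\Delta \\chi^2\\) test. It assumes plain ML estimation, where the test statistic is simply the difference between the two models' \\(\\chi^2\\) values (robust estimators require a scaled difference, which this sketch does not cover). The helper name delta_chisq_test() is hypothetical, and the example plugs in the \\(\\chi^2\\) and df values printed for the first- and second-order models in 5.4.8.

```r
## Hypothetical helper: chi-squared difference test for two nested models
## estimated with plain ML (robust estimators need a scaled difference instead)
delta_chisq_test <- function(chisq_nested, df_nested, chisq_parent, df_parent) {
  d_chisq <- chisq_nested - chisq_parent  # loss of fit caused by the constraints
  d_df    <- df_nested - df_parent        # number of constraints imposed
  p_value <- pchisq(d_chisq, df = d_df, lower.tail = FALSE)
  c(delta_chisq = d_chisq, delta_df = d_df, p_value = p_value)
}

## Smallest significant improvement when two parameters are freed (alpha = .05)
crit_2df <- qchisq(0.95, df = 2)

## Example: second-order (nested) vs. first-order five-factor model,
## using the chi-squared and df values printed by fitMeasures() in 5.4.8
res <- delta_chisq_test(chisq_nested = 19121.111, df_nested = 270,
                        chisq_parent = 18631.556, df_parent = 265)
crit_2df
res
```

For models fitted with the default ML estimator, anova() does this bookkeeping for us; the sketch only makes explicit what its Chisq diff, Df diff, and Pr(>Chisq) columns contain.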
Click to show code fit_5f_cov <- paste(mod_5f, 'imrcntr ~~ eimrcnt', 'qfimchr ~~ qfimwht', sep = '\\n') %>% cfa(data = ess) summary(fit_5f_cov, fit.measures = TRUE) ## lavaan 0.6.16 ended normally after 77 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 62 ## ## Used Total ## Number of observations 14243 19690 ## ## Model Test User Model: ## ## Test statistic 9740.512 ## Degrees of freedom 263 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 159619.058 ## Degrees of freedom 300 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.941 ## Tucker-Lewis Index (TLI) 0.932 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -515589.611 ## Loglikelihood unrestricted model (H1) -510719.354 ## ## Akaike (AIC) 1031303.221 ## Bayesian (BIC) 1031772.190 ## Sample-size adjusted Bayesian (SABIC) 1031575.160 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.050 ## 90 Percent confidence interval - lower 0.049 ## 90 Percent confidence interval - upper 0.051 ## P-value H_0: RMSEA <= 0.050 0.280 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.036 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## ip =~ ## imrcntr 1.000 ## eimrcnt 0.928 0.007 126.255 0.000 ## eimpcnt 1.184 0.011 106.508 0.000 ## imsmetn 1.012 0.011 92.436 0.000 ## impcntr 1.213 0.011 107.078 0.000 ## imdfetn 1.181 0.011 104.566 0.000 ## st =~ ## imbgeco 1.000 ## imbleco 0.826 0.012 69.006 0.000 ## imwbcnt 1.050 0.012 88.051 0.000 ## imwbcrm 0.715 0.011 63.128 0.000 ## imtcjob 0.751 0.011 66.542 0.000 ## imueclt 1.015 0.013 78.256 0.000 ## rp =~ ## gvrfgap 1.000 ## imrsprc 0.858 0.017 51.965 0.000 ## rfgbfml 1.046 0.019 56.104 0.000 ## rfggvfn 0.848 0.016 51.644 0.000 ## rfgawrk 
0.652 0.016 40.998 0.000 ## rfgfrpc -0.813 0.016 -51.233 0.000 ## shrrfg -1.002 0.017 -58.499 0.000 ## ct =~ ## qfimchr 1.000 ## qfimwht 0.979 0.020 48.332 0.000 ## pplstrd -0.586 0.014 -40.685 0.000 ## vrtrlg 0.397 0.011 36.273 0.000 ## et =~ ## imwgdwn 1.000 ## imhecop 1.157 0.023 49.549 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## .imrcntr ~~ ## .eimrcnt 0.230 0.004 59.907 0.000 ## .qfimchr ~~ ## .qfimwht 2.558 0.064 40.233 0.000 ## ip ~~ ## st -0.580 0.012 -48.041 0.000 ## rp 0.255 0.006 45.185 0.000 ## ct 0.467 0.014 34.425 0.000 ## et -0.197 0.006 -35.077 0.000 ## st ~~ ## rp -0.835 0.017 -48.285 0.000 ## ct -1.394 0.040 -35.128 0.000 ## et 0.670 0.017 38.935 0.000 ## rp ~~ ## ct 0.538 0.017 32.407 0.000 ## et -0.232 0.007 -32.949 0.000 ## ct ~~ ## et -0.469 0.017 -27.959 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .imrcntr 0.330 0.004 78.903 0.000 ## .eimrcnt 0.396 0.005 80.392 0.000 ## .eimpcnt 0.109 0.002 60.401 0.000 ## .imsmetn 0.220 0.003 75.979 0.000 ## .impcntr 0.107 0.002 58.874 0.000 ## .imdfetn 0.131 0.002 64.630 0.000 ## .imbgeco 2.301 0.033 70.568 0.000 ## .imbleco 2.845 0.037 76.832 0.000 ## .imwbcnt 1.669 0.026 63.272 0.000 ## .imwbcrm 2.808 0.036 78.659 0.000 ## .imtcjob 2.639 0.034 77.663 0.000 ## .imueclt 2.741 0.038 72.463 0.000 ## .gvrfgap 0.734 0.010 73.743 0.000 ## .imrsprc 0.790 0.010 77.164 0.000 ## .rfgbfml 0.880 0.012 74.676 0.000 ## .rfggvfn 0.790 0.010 77.322 0.000 ## .rfgawrk 0.946 0.012 80.924 0.000 ## .rfgfrpc 0.747 0.010 77.519 0.000 ## .shrrfg 0.674 0.009 72.713 0.000 ## .qfimchr 6.815 0.090 75.362 0.000 ## .qfimwht 5.250 0.072 73.378 0.000 ## .pplstrd 0.674 0.013 52.766 0.000 ## .vrtrlg 0.818 0.011 73.191 0.000 ## .imwgdwn 0.655 0.012 53.496 0.000 ## .imhecop 0.468 0.014 33.845 0.000 ## ip 0.350 0.007 48.646 0.000 ## st 2.571 0.054 47.662 0.000 ## rp 0.433 0.012 36.718 0.000 ## ct 1.698 0.073 23.296 0.000 ## et 0.520 0.015 34.814 0.000 5.4.13 Evaluate the model modifications. 
Did the model fit significantly improve? Is the fit of the modified model acceptable? Click to show code anova(fit_5f_cov, fit_5f) fitMeasures(fit_5f_cov) ## npar fmin chisq ## 62.000 0.342 9740.512 ## df pvalue baseline.chisq ## 263.000 0.000 159619.058 ## baseline.df baseline.pvalue cfi ## 300.000 0.000 0.941 ## tli nnfi rfi ## 0.932 0.932 0.930 ## nfi pnfi ifi ## 0.939 0.823 0.941 ## rni logl unrestricted.logl ## 0.941 -515589.611 -510719.354 ## aic bic ntotal ## 1031303.221 1031772.190 14243.000 ## bic2 rmsea rmsea.ci.lower ## 1031575.160 0.050 0.049 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.051 0.900 0.280 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 0.000 0.080 ## rmr rmr_nomean srmr ## 0.103 0.103 0.036 ## srmr_bentler srmr_bentler_nomean crmr ## 0.036 0.036 0.037 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.037 0.036 0.036 ## cn_05 cn_01 gfi ## 442.344 467.858 0.944 ## agfi pgfi mfi ## 0.931 0.764 0.717 ## ecvi ## 0.693 Click for explanation Yes, the model fit improved significantly. In this case, the original five-factor model is nested within the modified model. So, our \\(\\Delta \\chi^2\\) test is evaluating the improvement in fit contributed by freeing the two residual covariances. The \\(\\Delta \\chi^2\\) test is significant, so we can conclude that including the two new parameter estimates has significantly improved the model fit. I.e., estimating these two residual covariances is “worth it” in the sense of balancing model fit and model complexity. Also, the fit of the modified model is now acceptable (CFI = 0.941, TLI = 0.932, RMSEA = 0.050, SRMR = 0.036). Caveat If we had found this result when testing our original model, we would be well-situated to proceed with our analysis. In this case, however, we are no longer justified in generalizing these estimates to the population. We only arrived at this well-fitting model by modifying our original theoretical model to better fit the data using estimates derived from those same data to guide our model modifications.
We’ve conducted this post-hoc analysis to help inform future research, and this result is useful as a starting point for future studies. Now, anyone analyzing these scales in the future could incorporate these residual covariances into their initial theoretical model. Basically, we conduct these types of post-hoc analyses to help future researchers learn from our mistakes. End of In-Class Exercises "],["full-sem.html", "6 Full SEM", " 6 Full SEM This week, we will focus on integrating all of the disparate methods we’ve covered so far into full-fledged structural equation models. Homework before the lecture Watch the Lecture Recording for this week. Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. "],["lecture-5.html", "6.1 Lecture", " 6.1 Lecture This week, we will begin with our final theme and discuss structural equation modeling (SEM). This powerful technique joins the strengths of CFA and path analysis to produce a highly flexible and theoretically appealing modeling tool. Essentially, SEM allows us to build structural path models using the latent variables defined by a CFA. 6.1.1 Recording 6.1.2 Slides You can download the lecture slides here "],["reading-5.html", "6.2 Reading", " 6.2 Reading Reference Weston, R., & Gore, P. A. (2006). A brief guide to structural equation modeling. The Counseling Psychologist, 34, 719–752. Notes: This article is quite general and provides an overview of things we have discussed so far in this course. This article also adds an important new idea: combining factor analysis with path modeling to produce a full Structural Equation Model (SEM). Skip the part on GFI (p. 741). The GFI has been shown to be too dependent on sample size and is not recommended any longer. Skip the part on missing data.
There is nothing wrong with this section, but missing data analysis is a broad and difficult topic that we cannot adequately cover in this course. If you would like to learn more about missing data and how to treat them, you can take two courses offered by our department: Conducting a Survey Missing Data Theory and Causal Effects Questions The authors state three similarities and two big differences between SEM and other multivariate statistical techniques (e.g., ANCOVA, regression). What are these similarities and differences? Do you agree with the relative strengths and weaknesses of SEM vs. other methods that the authors present? The authors miss at least one additional advantage of SEM over other multivariate methods. What is this missing advantage? Explain what the terms “measurement model” and “structural model” mean in the SEM context. What are the 6 steps of doing an SEM-based analysis given by the authors? The authors claim that testing an SEM using cross-validation is a good idea. When is cross-validation helpful in SEM? Hint: You may have to do some independent (internet, literature) research to learn how cross-validation can be implemented in SEM. "],["at-home-exercises-5.html", "6.3 At-Home Exercises", " 6.3 At-Home Exercises This week, we’ll take another look at the Kestilä (2006) results. During this practical, you will conduct an SEM to replicate the regression analysis of the Finnish data that you conducted in the Week 4 In-Class Exercises. 6.3.1 Load the Finnish subsample of ESS data. The relevant data are contained in the ess_finland.rds file. These are the processed Finnish subsample data from the Week 4 exercises. Note: Unless otherwise noted, all the following analyses use these data. Click to show code ess_fin <- readRDS("ess_finland.rds") We need to do a little data processing before we can fit the regression model. At the moment, lavaan will not automatically convert a factor variable into dummy codes. 
So, we need to create explicit dummy codes for the two factors we’ll use as predictors in our regression analysis: sex and political interest. 6.3.2 Convert the sex and political interest factors into dummy codes. Click to show code library(dplyr) ## Create dummy codes by applying a vectorized logical test to the factor levels: ess_fin <- mutate(ess_fin, female = ifelse(sex == "Female", 1, 0), hi_pol_interest = ifelse(polintr_bin == "High Interest", 1, 0) ) ## Check the results: with(ess_fin, table(dummy = female, factor = sex)) ## factor ## dummy Male Female ## 0 960 0 ## 1 0 1040 with(ess_fin, table(dummy = hi_pol_interest, factor = polintr_bin)) ## factor ## dummy Low Interest High Interest ## 0 1070 0 ## 1 0 929 Click for explanation In R, we have several ways of converting a factor into an appropriate set of dummy codes. We could use the dplyr::recode() function as we did last week. We can use the model.matrix() function to define a design matrix based on the inherent contrast attribute of the factor. Missing data will cause problems here, though, because model.matrix() drops incomplete rows by default. We can use as.numeric() to revert the factor to its underlying numeric representation {Male = 1, Female = 2} and use arithmetic to convert {1, 2} \\(\\rightarrow\\) {0, 1}. When our factor only has two levels, though, the ifelse() function is the simplest way. We are now ready to estimate our latent regression model. Specifically, we want to combine the three OLS regression models that you ran in 4.4.16 into a single SEM that we will estimate in lavaan. The following path diagram shows the intended theoretical model. Although the variances are not included in this path diagram, all variables in the model (including the observed predictor variables) are random. 6.3.3 Define the lavaan model syntax for the SEM shown above. Use the definition of the institutions, satisfaction, and politicians factors from 5.3.2 to define the DVs. Covary the three latent factors. Covary the five predictors.
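Before writing the syntax, it can help to tally what this model will estimate. The sketch below assumes lavaan's usual sem() defaults as we understand them: each factor is scaled by fixing its first loading to 1, the residuals of the three latent DVs are allowed to covary, no mean structure is estimated, and fixed.x = FALSE turns the five predictors into random variables whose variances and covariances are free parameters. Under these assumptions, the count should agree with the number of model parameters and degrees of freedom lavaan reports for the fitted SEM.

```r
## Indicators per latent DV (factor definitions from 5.3.2)
n_ind  <- c(institutions = 5, satisfaction = 5, politicians = 3)
n_dv   <- length(n_ind)        # three latent DVs
n_pred <- 5                    # female, age, eduyrs, hi_pol_interest, lrscale
p      <- sum(n_ind) + n_pred  # observed variables in the model

npar <- c(
  loadings      = sum(n_ind) - n_dv,         # first loading of each factor fixed to 1
  resid_vars    = sum(n_ind),                # indicator residual variances
  slopes        = n_dv * n_pred,             # latent regression paths
  dv_resid_vars = n_dv,                      # residual variances of the latent DVs
  dv_resid_covs = n_dv * (n_dv - 1) / 2,     # covariances among the latent DVs
  pred_vars     = n_pred,                    # predictor variances (fixed.x = FALSE)
  pred_covs     = n_pred * (n_pred - 1) / 2  # predictor covariances (fixed.x = FALSE)
)

total_npar <- sum(npar)
dof <- p * (p + 1) / 2 - total_npar  # unique moments minus free parameters
npar
c(total_npar = total_npar, dof = dof)
```

If the defaults behave as assumed, these totals match the 59 model parameters and 112 degrees of freedom in the summary of fit_sem. Note how much of the count comes from fixed.x = FALSE: the 15 predictor variances and covariances would not be free parameters if the predictors were treated as fixed.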
Click to show code mod_sem <- ' ## Define the latent DVs: institutions =~ trstlgl + trstplc + trstun + trstep + trstprl satisfaction =~ stfhlth + stfedu + stfeco + stfgov + stfdem politicians =~ pltinvt + pltcare + trstplt ## Specify the structural relations: institutions + satisfaction + politicians ~ female + age + eduyrs + hi_pol_interest + lrscale ' Click for explanation We simply need to add a line defining the latent regression paths to our old CFA syntax. We don’t need to specify the covariances in the syntax: sem() estimates the residual covariances among the latent DVs by default, and the fixed.x = FALSE argument in the sem() call below makes the predictors random, so their covariances are estimated as well. 6.3.4 Estimate the SEM, and summarize the results. Fit the model to the processed Finnish subsample from above. Estimate the model using lavaan::sem(). Request the standardized parameter estimates with the summary. Request the \\(R^2\\) estimates with the summary. Click to show code library(lavaan) ## Fit the SEM: fit_sem <- sem(mod_sem, data = ess_fin, fixed.x = FALSE) ## Summarize the results: summary(fit_sem, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 82 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 59 ## ## Used Total ## Number of observations 1740 2000 ## ## Model Test User Model: ## ## Test statistic 1287.421 ## Degrees of freedom 112 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 10534.649 ## Degrees of freedom 143 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.887 ## Tucker-Lewis Index (TLI) 0.856 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -57914.779 ## Loglikelihood unrestricted model (H1) -57271.068 ## ## Akaike (AIC) 115947.557 ## Bayesian (BIC) 116269.794 ## Sample-size adjusted Bayesian (SABIC) 116082.357 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.078 ## 90 Percent confidence interval - lower 0.074 ## 90 Percent confidence interval - upper
0.082 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 0.160 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.045 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions =~ ## trstlgl 1.000 1.418 0.669 ## trstplc 0.609 0.031 19.403 0.000 0.863 0.508 ## trstun 0.887 0.038 23.484 0.000 1.257 0.626 ## trstep 1.134 0.041 27.652 0.000 1.607 0.755 ## trstprl 1.192 0.040 29.444 0.000 1.689 0.815 ## satisfaction =~ ## stfhlth 1.000 0.979 0.497 ## stfedu 0.602 0.043 13.872 0.000 0.589 0.416 ## stfeco 1.266 0.067 18.848 0.000 1.240 0.681 ## stfgov 1.639 0.079 20.638 0.000 1.605 0.846 ## stfdem 1.521 0.075 20.180 0.000 1.489 0.793 ## politicians =~ ## pltinvt 1.000 0.567 0.566 ## pltcare 0.953 0.048 19.653 0.000 0.540 0.590 ## trstplt 3.281 0.133 24.675 0.000 1.860 0.915 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions ~ ## female 0.019 0.073 0.259 0.796 0.013 0.007 ## age -0.008 0.002 -3.740 0.000 -0.006 -0.105 ## eduyrs 0.034 0.010 3.233 0.001 0.024 0.091 ## hi_pol_interst 0.358 0.076 4.730 0.000 0.253 0.126 ## lrscale 0.104 0.018 5.634 0.000 0.073 0.147 ## satisfaction ~ ## female -0.147 0.050 -2.910 0.004 -0.150 -0.075 ## age -0.007 0.002 -4.598 0.000 -0.007 -0.129 ## eduyrs 0.005 0.007 0.775 0.439 0.006 0.022 ## hi_pol_interst 0.164 0.052 3.162 0.002 0.167 0.084 ## lrscale 0.099 0.013 7.501 0.000 0.101 0.202 ## politicians ~ ## female 0.010 0.029 0.349 0.727 0.018 0.009 ## age -0.004 0.001 -4.490 0.000 -0.007 -0.124 ## eduyrs 0.007 0.004 1.697 0.090 0.012 0.047 ## hi_pol_interst 0.258 0.031 8.364 0.000 0.455 0.227 ## lrscale 0.039 0.007 5.370 0.000 0.068 0.138 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .institutions ~~ ## .satisfaction 1.030 0.069 14.933 0.000 0.796 0.796 ## .politicians 0.675 0.041 16.628 0.000 
0.908 0.908 ## .satisfaction ~~ ## .politicians 0.365 0.027 13.544 0.000 0.713 0.713 ## female ~~ ## age 0.071 0.212 0.335 0.738 0.071 0.008 ## eduyrs 0.179 0.046 3.869 0.000 0.179 0.093 ## hi_pol_interst -0.017 0.006 -2.767 0.006 -0.017 -0.066 ## lrscale -0.032 0.024 -1.316 0.188 -0.032 -0.032 ## age ~~ ## eduyrs -22.750 1.722 -13.212 0.000 -22.750 -0.334 ## hi_pol_interst 1.377 0.215 6.413 0.000 1.377 0.156 ## lrscale 1.774 0.853 2.079 0.038 1.774 0.050 ## eduyrs ~~ ## hi_pol_interst 0.270 0.047 5.787 0.000 0.270 0.140 ## lrscale 0.735 0.186 3.946 0.000 0.735 0.095 ## hi_pol_interest ~~ ## lrscale 0.016 0.024 0.672 0.501 0.016 0.016 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trstlgl 2.477 0.093 26.743 0.000 2.477 0.552 ## .trstplc 2.140 0.076 28.334 0.000 2.140 0.742 ## .trstun 2.453 0.090 27.322 0.000 2.453 0.608 ## .trstep 1.950 0.078 24.906 0.000 1.950 0.430 ## .trstprl 1.443 0.064 22.437 0.000 1.443 0.336 ## .stfhlth 2.922 0.104 28.103 0.000 2.922 0.753 ## .stfedu 1.663 0.058 28.613 0.000 1.663 0.827 ## .stfeco 1.775 0.069 25.755 0.000 1.775 0.536 ## .stfgov 1.020 0.056 18.371 0.000 1.020 0.284 ## .stfdem 1.307 0.060 21.953 0.000 1.307 0.371 ## .pltinvt 0.682 0.024 27.818 0.000 0.682 0.680 ## .pltcare 0.547 0.020 27.582 0.000 0.547 0.652 ## .trstplt 0.672 0.069 9.676 0.000 0.672 0.163 ## .institutions 1.881 0.125 15.077 0.000 0.936 0.936 ## .satisfaction 0.892 0.086 10.386 0.000 0.930 0.930 ## .politicians 0.294 0.024 12.224 0.000 0.914 0.914 ## female 0.250 0.008 29.496 0.000 0.250 1.000 ## age 313.238 10.620 29.496 0.000 313.238 1.000 ## eduyrs 14.818 0.502 29.496 0.000 14.818 1.000 ## hi_pol_interst 0.250 0.008 29.496 0.000 0.250 1.000 ## lrscale 4.034 0.137 29.496 0.000 4.034 1.000 ## ## R-Square: ## Estimate ## trstlgl 0.448 ## trstplc 0.258 ## trstun 0.392 ## trstep 0.570 ## trstprl 0.664 ## stfhlth 0.247 ## stfedu 0.173 ## stfeco 0.464 ## stfgov 0.716 ## stfdem 0.629 ## pltinvt 0.320 ## pltcare 0.348 ## trstplt 0.837 ## 
institutions 0.064 ## satisfaction 0.070 ## politicians 0.086 Click for explanation The fixed.x = FALSE argument tells lavaan to model the predictors as random variables. By default, lavaan will covary any random predictor variables. So, we don’t need to make any other changes to the usual procedure. 6.3.5 Finally, we will rerun the latent regression model from above as a path model with the factor scores from 4.4.10 acting as the DVs. Rerun the above SEM as a path model wherein the EFA-derived Trust in Institutions, Satisfaction with Political Systems, and Trust in Politicians factor scores act as the DVs. Request the standardized parameter estimates with the summary. Request the \\(R^2\\) estimates with the summary. Click to show code ## Define the model syntax for the path analysis: mod_pa <- ' trust_inst + satisfy + trust_pol ~ female + age + eduyrs + hi_pol_interest + lrscale' ## Estimate the path model: fit_pa <- sem(mod_pa, data = ess_fin, fixed.x = FALSE) ## Summarize the results: summary(fit_pa, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 44 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 36 ## ## Used Total ## Number of observations 1740 2000 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## trust_inst ~ ## female 0.004 0.045 0.091 0.928 0.004 0.002 ## age -0.003 0.001 -2.229 0.026 -0.003 -0.057 ## eduyrs 0.023 0.006 3.642 0.000 0.023 0.094 ## hi_pol_interst 0.167 0.046 3.599 0.000 0.167 0.088 ## lrscale 0.059 0.011 5.258 0.000 0.059 0.125 ## satisfy ~ ## female -0.125 0.040 -3.115 0.002 -0.125 -0.073 ## age -0.005 0.001 -4.102 0.000 -0.005 -0.105 ## eduyrs -0.003 0.006 -0.534 0.594 -0.003 -0.014 ## hi_pol_interst 0.073 0.041 1.782 0.075 0.073 0.043 ## 
lrscale 0.085 0.010 8.510 0.000 0.085 0.200 ## trust_pol ~ ## female 0.016 0.046 0.338 0.735 0.016 0.008 ## age -0.009 0.001 -6.480 0.000 -0.009 -0.161 ## eduyrs 0.018 0.007 2.839 0.005 0.018 0.071 ## hi_pol_interst 0.464 0.047 9.801 0.000 0.464 0.232 ## lrscale 0.055 0.011 4.801 0.000 0.055 0.110 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trust_inst ~~ ## .satisfy 0.437 0.021 20.609 0.000 0.437 0.568 ## .trust_pol 0.498 0.024 20.480 0.000 0.498 0.564 ## .satisfy ~~ ## .trust_pol 0.367 0.021 17.664 0.000 0.367 0.467 ## female ~~ ## age 0.071 0.212 0.335 0.738 0.071 0.008 ## eduyrs 0.179 0.046 3.869 0.000 0.179 0.093 ## hi_pol_interst -0.017 0.006 -2.767 0.006 -0.017 -0.066 ## lrscale -0.032 0.024 -1.316 0.188 -0.032 -0.032 ## age ~~ ## eduyrs -22.750 1.722 -13.212 0.000 -22.750 -0.334 ## hi_pol_interst 1.377 0.215 6.413 0.000 1.377 0.156 ## lrscale 1.774 0.853 2.079 0.038 1.774 0.050 ## eduyrs ~~ ## hi_pol_interst 0.270 0.047 5.787 0.000 0.270 0.140 ## lrscale 0.735 0.186 3.946 0.000 0.735 0.095 ## hi_pol_interest ~~ ## lrscale 0.016 0.024 0.672 0.501 0.016 0.016 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trust_inst 0.866 0.029 29.496 0.000 0.866 0.958 ## .satisfy 0.684 0.023 29.496 0.000 0.684 0.945 ## .trust_pol 0.902 0.031 29.496 0.000 0.902 0.902 ## female 0.250 0.008 29.496 0.000 0.250 1.000 ## age 313.238 10.620 29.496 0.000 313.238 1.000 ## eduyrs 14.818 0.502 29.496 0.000 14.818 1.000 ## hi_pol_interst 0.250 0.008 29.496 0.000 0.250 1.000 ## lrscale 4.034 0.137 29.496 0.000 4.034 1.000 ## ## R-Square: ## Estimate ## trust_inst 0.042 ## satisfy 0.055 ## trust_pol 0.098 Click to show explanation We don’t do anything particularly special here. We simply rerun our latent regression as a path analysis with the EFA-derived factor scores as the DVs. 6.3.6 Compare the results from the path analysis to the SEM-based results. Does it matter whether we use a latent variable or a factor score to define the DV?
Hint: When comparing parameter estimates, use the fully standardized estimates (i.e., the values in the column labeled Std.all). Click to show code Note: The “supportFunctions.R” script that we source below isn’t a necessary part of the solution. This script defines a bunch of convenience functions. One of these functions, partSummary(), allows us to print selected pieces of the model summary. ## Source a script of convenience function definitions: source("supportFunctions.R") ## View the regression estimates from the SEM: partSummary(fit_sem, 8, standardized = TRUE) ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions ~ ## female 0.019 0.073 0.259 0.796 0.013 0.007 ## age -0.008 0.002 -3.740 0.000 -0.006 -0.105 ## eduyrs 0.034 0.010 3.233 0.001 0.024 0.091 ## hi_pol_interst 0.358 0.076 4.730 0.000 0.253 0.126 ## lrscale 0.104 0.018 5.634 0.000 0.073 0.147 ## satisfaction ~ ## female -0.147 0.050 -2.910 0.004 -0.150 -0.075 ## age -0.007 0.002 -4.598 0.000 -0.007 -0.129 ## eduyrs 0.005 0.007 0.775 0.439 0.006 0.022 ## hi_pol_interst 0.164 0.052 3.162 0.002 0.167 0.084 ## lrscale 0.099 0.013 7.501 0.000 0.101 0.202 ## politicians ~ ## female 0.010 0.029 0.349 0.727 0.018 0.009 ## age -0.004 0.001 -4.490 0.000 -0.007 -0.124 ## eduyrs 0.007 0.004 1.697 0.090 0.012 0.047 ## hi_pol_interst 0.258 0.031 8.364 0.000 0.455 0.227 ## lrscale 0.039 0.007 5.370 0.000 0.068 0.138 ## View the regression estimates from the path analysis: partSummary(fit_pa, 7, standardized = TRUE) ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## trust_inst ~ ## female 0.004 0.045 0.091 0.928 0.004 0.002 ## age -0.003 0.001 -2.229 0.026 -0.003 -0.057 ## eduyrs 0.023 0.006 3.642 0.000 0.023 0.094 ## hi_pol_interst 0.167 0.046 3.599 0.000 0.167 0.088 ## lrscale 0.059 0.011 5.258 0.000 0.059 0.125 ## satisfy ~ ## female -0.125 0.040 -3.115 0.002 -0.125 -0.073 ## age -0.005 0.001 -4.102 0.000 -0.005 -0.105 ## eduyrs -0.003 0.006 -0.534 0.594 -0.003
-0.014 ## hi_pol_interst 0.073 0.041 1.782 0.075 0.073 0.043 ## lrscale 0.085 0.010 8.510 0.000 0.085 0.200 ## trust_pol ~ ## female 0.016 0.046 0.338 0.735 0.016 0.008 ## age -0.009 0.001 -6.480 0.000 -0.009 -0.161 ## eduyrs 0.018 0.007 2.839 0.005 0.018 0.071 ## hi_pol_interst 0.464 0.047 9.801 0.000 0.464 0.232 ## lrscale 0.055 0.011 4.801 0.000 0.055 0.110 ## View the R-squared estimates from the SEM: partSummary(fit_sem, 11, rsquare = TRUE) ## R-Square: ## Estimate ## trstlgl 0.448 ## trstplc 0.258 ## trstun 0.392 ## trstep 0.570 ## trstprl 0.664 ## stfhlth 0.247 ## stfedu 0.173 ## stfeco 0.464 ## stfgov 0.716 ## stfdem 0.629 ## pltinvt 0.320 ## pltcare 0.348 ## trstplt 0.837 ## institutions 0.064 ## satisfaction 0.070 ## politicians 0.086 ## View the R-squared estimates from the path analysis: partSummary(fit_pa, 10, rsquare = TRUE) ## R-Square: ## Estimate ## trust_inst 0.042 ## satisfy 0.055 ## trust_pol 0.098 Click for explanation It certainly looks like the way we define the DV has a meaningful impact. The patterns of significance differ between the two sets of regression slopes (e.g., hi_pol_interest significantly predicts Satisfaction in the SEM but not in the path analysis, whereas eduyrs significantly predicts Trust in Politicians only in the path analysis). The \\(R^2\\) values also differ: they are larger for the Institutions and Satisfaction factors in the SEM, but the \\(R^2\\) for the Politicians factor is higher in the path analysis. End of At-Home Exercises "],["in-class-exercises-5.html", "6.4 In-Class Exercises", " 6.4 In-Class Exercises In these exercises, you will use full structural equation modeling (SEM) to evaluate the Theory of Reasoned Action (TORA), which is a popular psychological theory of social behavior developed by Ajzen and Fishbein. The theory states that actual behavior is predicted by behavioral intention, which is in turn predicted by the attitude toward the behavior and subjective norms about the behavior. Later, a third determinant was added, perceived behavioral control. The extent to which people feel that they have control over their behavior also influences their behavior.
The data we will use for this practical are available in the toradata.csv file. These data were synthesized according to the results of Reinecke’s (1998) investigation of condom use by young people between 16 and 24 years old. The data contain the following variables: respnr: Numeric participant ID behavior: The dependent variable, condom use Measured on a 5-point frequency scale (How often do you…) intent: A single item assessing behavioral intention Measured on a similar 5-point scale (In general, do you intend to…). attit_1:attit_3: Three indicators of attitudes about condom use Measured on a 5-point rating scale (e.g., using a condom is awkward) norm_1:norm_3: Three indicators of social norms about condom use Measured on a 5-point rating scale (e.g., I think most of my friends would use…) control_1:control_3: Three indicators of perceived behavioral control Measured on a 5-point rating scale (e.g., I know well how to use a condom) sex: Binary factor indicating biological sex 6.4.1 Load the data contained in the toradata.csv file. Click to show code condom <- read.csv("toradata.csv", stringsAsFactors = TRUE) 6.4.2 The data contain multiple indicators of attitudes, norms, and control. Run a CFA for these three latent variables. Correlate the latent factors. Do the data support the measurement model for these latent factors? Are the three latent factors significantly correlated? Is it reasonable to proceed with our evaluation of the TORA theory?
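Several numbers in the CFA summary below can be reproduced from the model \\(\\chi^2\\), its degrees of freedom, and the sample size. This sketch plugs in the values lavaan reports for this model (\\(\\chi^2 = 35.611\\), 21 free parameters, 250 observations). It uses the textbook RMSEA formula with \\(N - 1\\) in the denominator; lavaan may use \\(N\\) instead, which makes no visible difference at three decimal places for this sample.

```r
## Values for the three-factor TORA CFA, as reported in the lavaan summary
chisq <- 35.611  # model test statistic
npar  <- 21      # 6 free loadings + 9 residual variances + 3 factor variances + 3 covariances
p     <- 9       # observed indicators
n     <- 250     # observations

dof     <- p * (p + 1) / 2 - npar  # 45 unique moments minus 21 parameters
p_value <- pchisq(chisq, df = dof, lower.tail = FALSE)

## Textbook RMSEA with N - 1 in the denominator, truncated at zero
rmsea <- sqrt(max(chisq - dof, 0) / (dof * (n - 1)))

round(c(dof = dof, p_value = p_value, rmsea = rmsea), 3)
```

Rounded, these values match the degrees of freedom, p-value, and RMSEA in the summary (24, 0.060, and 0.044), which is a quick way to see that a non-significant \\(\\chi^2\\) and a small RMSEA are two views of the same modest discrepancy.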
Click to show code library(lavaan) mod_cfa <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 ' fit <- cfa(mod_cfa, data = condom) summary(fit, fit.measures = TRUE) ## lavaan 0.6.16 ended normally after 29 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 21 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 35.611 ## Degrees of freedom 24 ## P-value (Chi-square) 0.060 ## ## Model Test Baseline Model: ## ## Test statistic 910.621 ## Degrees of freedom 36 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.987 ## Tucker-Lewis Index (TLI) 0.980 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -2998.290 ## Loglikelihood unrestricted model (H1) -2980.484 ## ## Akaike (AIC) 6038.580 ## Bayesian (BIC) 6112.530 ## Sample-size adjusted Bayesian (SABIC) 6045.959 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.044 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.073 ## P-value H_0: RMSEA <= 0.050 0.599 ## P-value H_0: RMSEA >= 0.080 0.017 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.037 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.036 0.068 15.308 0.000 ## attit_3 -1.002 0.067 -14.856 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 1.031 0.098 10.574 0.000 ## norm_3 0.932 0.093 10.013 0.000 ## control =~ ## control_1 1.000 ## control_2 0.862 0.129 6.699 0.000 ## control_3 0.968 0.133 7.290 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.340 0.069 4.957 0.000 ## control 0.475 0.073 6.468 0.000 ## norms ~~ ## control 0.338 0.064 5.254 0.000 ## ## Variances: ## Estimate 
Std.Err z-value P(>|z|) ## .attit_1 0.418 0.052 8.047 0.000 ## .attit_2 0.310 0.047 6.633 0.000 ## .attit_3 0.369 0.049 7.577 0.000 ## .norm_1 0.504 0.071 7.130 0.000 ## .norm_2 0.469 0.071 6.591 0.000 ## .norm_3 0.635 0.075 8.465 0.000 ## .control_1 0.614 0.078 7.905 0.000 ## .control_2 0.865 0.091 9.520 0.000 ## .control_3 0.762 0.087 8.758 0.000 ## attitudes 0.885 0.116 7.620 0.000 ## norms 0.743 0.116 6.423 0.000 ## control 0.497 0.099 5.002 0.000 Click for explanation Yes, the model fits the data well (\\(\\chi^2[24] = 35.611\\), \\(p = 0.060\\), CFI = 0.987, RMSEA = 0.044, SRMR = 0.037), and the measurement parameters (e.g., factor loadings, residual variances) look reasonable. So, the data seem to support this measurement structure. Yes, all three latent variables are significantly, positively correlated (all \\(p < 0.001\\)). Yes. The measurement structure is supported, so we can use the latent variables to represent the respective constructs in our subsequent SEM. The TORA doesn’t actually say anything about the associations between these three factors, but it makes sense that they would be positively associated. So, we should find this result comforting. 6.4.3 Estimate the basic TORA model as an SEM. Predict intention from attitudes and norms. Predict condom use from intention. Use the latent versions of attitudes and norms. Covary the attitudes and norms factors. Does the model fit well? Do the estimates align with the TORA? How much variance in intention and condom use is explained by the model? 
Click to show code mod <- ' ## Define the latent variables: attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 ## Define the structural model: intent ~ attitudes + norms behavior ~ intent ' fit <- sem(mod, data = condom) summary(fit, fit.measures = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 24 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 18 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 27.890 ## Degrees of freedom 18 ## P-value (Chi-square) 0.064 ## ## Model Test Baseline Model: ## ## Test statistic 1089.407 ## Degrees of freedom 28 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.991 ## Tucker-Lewis Index (TLI) 0.986 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -2533.616 ## Loglikelihood unrestricted model (H1) -2519.671 ## ## Akaike (AIC) 5103.232 ## Bayesian (BIC) 5166.618 ## Sample-size adjusted Bayesian (SABIC) 5109.557 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.047 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.079 ## P-value H_0: RMSEA <= 0.050 0.523 ## P-value H_0: RMSEA >= 0.080 0.046 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.036 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.039 0.068 15.365 0.000 ## attit_3 -1.002 0.067 -14.850 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 0.983 0.087 11.333 0.000 ## norm_3 0.935 0.087 10.778 0.000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## intent ~ ## attitudes 0.439 0.063 6.990 0.000 ## norms 0.693 0.077 8.977 0.000 ## behavior ~ ## intent 0.746 0.045 16.443 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 
0.347 0.069 5.027 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.420 0.052 8.103 0.000 ## .attit_2 0.306 0.046 6.604 0.000 ## .attit_3 0.372 0.049 7.651 0.000 ## .norm_1 0.483 0.064 7.581 0.000 ## .norm_2 0.521 0.065 7.954 0.000 ## .norm_3 0.610 0.070 8.713 0.000 ## .intent 0.423 0.048 8.769 0.000 ## .behavior 0.603 0.054 11.180 0.000 ## attitudes 0.884 0.116 7.614 0.000 ## norms 0.765 0.113 6.767 0.000 ## ## R-Square: ## Estimate ## attit_1 0.678 ## attit_2 0.757 ## attit_3 0.705 ## norm_1 0.613 ## norm_2 0.587 ## norm_3 0.523 ## intent 0.639 ## behavior 0.520 Click for explanation Yes, the model still fits the data very well (\\(\\chi^2[18] = 27.890\\), \\(p = 0.064\\), CFI = 0.991, RMSEA = 0.047, SRMR = 0.036). Yes, the estimates all align with the TORA. Specifically, attitudes and norms both significantly predict intention, and intention significantly predicts condom use. The model explains 63.93% of the variance in intention and 51.96% of the variance in condom use. 6.4.4 Update your model to represent the extended TORA model that includes perceived behavioral control. Regress condom use onto perceived behavioral control. Use the latent variable representation of control. Covary all three exogenous latent factors. Does the model fit well? Do the estimates align with the updated TORA? How much variance in intention and condom use is explained by the model? 
Click to show code mod_tora <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ attitudes + norms behavior ~ intent + control ' fit_tora <- sem(mod_tora, data = condom) summary(fit_tora, fit.measures = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 31 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 27 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 48.757 ## Degrees of freedom 39 ## P-value (Chi-square) 0.136 ## ## Model Test Baseline Model: ## ## Test statistic 1333.695 ## Degrees of freedom 55 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.992 ## Tucker-Lewis Index (TLI) 0.989 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -3551.160 ## Loglikelihood unrestricted model (H1) -3526.782 ## ## Akaike (AIC) 7156.320 ## Bayesian (BIC) 7251.400 ## Sample-size adjusted Bayesian (SABIC) 7165.807 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.032 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.057 ## P-value H_0: RMSEA <= 0.050 0.870 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.033 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.033 0.068 15.221 0.000 ## attit_3 -1.025 0.068 -15.097 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 0.984 0.087 11.256 0.000 ## norm_3 0.955 0.088 10.881 0.000 ## control =~ ## control_1 1.000 ## control_2 0.859 0.127 6.789 0.000 ## control_3 0.997 0.131 7.609 0.000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## intent ~ ## attitudes 0.447 0.063 7.100 0.000 ## norms 0.706 0.078 9.078 0.000 ## behavior 
~ ## intent 0.563 0.063 8.923 0.000 ## control 0.454 0.119 3.805 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.342 0.068 5.011 0.000 ## control 0.474 0.072 6.548 0.000 ## norms ~~ ## control 0.352 0.064 5.521 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.432 0.052 8.381 0.000 ## .attit_2 0.330 0.046 7.220 0.000 ## .attit_3 0.344 0.046 7.439 0.000 ## .norm_1 0.496 0.063 7.820 0.000 ## .norm_2 0.533 0.065 8.152 0.000 ## .norm_3 0.595 0.069 8.643 0.000 ## .control_1 0.625 0.075 8.372 0.000 ## .control_2 0.876 0.090 9.757 0.000 ## .control_3 0.746 0.084 8.874 0.000 ## .intent 0.409 0.047 8.769 0.000 ## .behavior 0.542 0.052 10.423 0.000 ## attitudes 0.872 0.115 7.566 0.000 ## norms 0.751 0.112 6.709 0.000 ## control 0.485 0.096 5.059 0.000 ## ## R-Square: ## Estimate ## attit_1 0.668 ## attit_2 0.738 ## attit_3 0.727 ## norm_1 0.602 ## norm_2 0.577 ## norm_3 0.535 ## control_1 0.437 ## control_2 0.290 ## control_3 0.392 ## intent 0.651 ## behavior 0.566 Click for explanation Yes, the model still fits the data very well (\\(\\chi^2[39] = 48.757\\), \\(p = 0.136\\), CFI = 0.992, RMSEA = 0.032, SRMR = 0.033). Yes, the estimates all align with the updated TORA. Specifically, attitudes and norms both significantly predict intention, while intention and control both significantly predict condom use. The model explains 65.11% of the variance in intention and 56.62% of the variance in condom use. The TORA model explicitly forbids direct paths from attitudes and norms to behavior; these effects should be fully mediated by behavioral intention. The theory does not specify how perceived behavioral control should affect behavior. There may be a direct effect of control on behavior, or the effect may be (partially) mediated by intention. 6.4.5 Evaluate the hypothesized indirect effects of attitudes and norms. Include attitudes, norms, and control in your model as in 6.4.4. Does intention significantly mediate the effects of attitudes and norms on behavior? 
Don’t forget to follow all the steps we covered for testing mediation. Are both of the above effects completely mediated? Do these results comport with the TORA? Why or why not? Click for explanation mod <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ a1 * attitudes + a2 * norms behavior ~ b * intent + control + attitudes + norms ie_att := a1 * b ie_norm := a2 * b ' set.seed(235711) fit <- sem(mod, data = condom, se = "bootstrap", bootstrap = 1000) summary(fit, ci = TRUE) ## lavaan 0.6.16 ended normally after 36 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 29 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 48.629 ## Degrees of freedom 37 ## P-value (Chi-square) 0.096 ## ## Parameter Estimates: ## ## Standard errors Bootstrap ## Number of requested bootstrap draws 1000 ## Number of successful bootstrap draws 1000 ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## attitudes =~ ## attit_1 1.000 1.000 1.000 ## attit_2 1.033 0.060 17.261 0.000 0.925 1.165 ## attit_3 -1.025 0.064 -15.894 0.000 -1.163 -0.902 ## norms =~ ## norm_1 1.000 1.000 1.000 ## norm_2 0.984 0.071 13.794 0.000 0.843 1.127 ## norm_3 0.955 0.093 10.324 0.000 0.792 1.157 ## control =~ ## control_1 1.000 1.000 1.000 ## control_2 0.860 0.113 7.624 0.000 0.653 1.098 ## control_3 0.996 0.147 6.790 0.000 0.748 1.320 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## intent ~ ## attitudes (a1) 0.447 0.067 6.674 0.000 0.324 0.585 ## norms (a2) 0.706 0.078 9.094 0.000 0.569 0.878 ## behavior ~ ## intent (b) 0.545 0.075 7.282 0.000 0.389 0.686 ## control 0.428 0.232 1.847 0.065 0.046 0.934 ## attitudes 0.010 0.122 0.084 0.933 -0.249 0.226 ## norms 0.041 0.118 0.345 0.730 -0.194 0.266 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## attitudes ~~ ## norms 0.342 0.070 
4.883 0.000 0.208 0.480 ## control 0.475 0.069 6.850 0.000 0.344 0.612 ## norms ~~ ## control 0.350 0.067 5.218 0.000 0.221 0.484 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## .attit_1 0.432 0.050 8.720 0.000 0.331 0.526 ## .attit_2 0.330 0.045 7.382 0.000 0.238 0.415 ## .attit_3 0.343 0.049 6.992 0.000 0.244 0.444 ## .norm_1 0.496 0.060 8.305 0.000 0.376 0.614 ## .norm_2 0.533 0.077 6.951 0.000 0.390 0.687 ## .norm_3 0.594 0.069 8.597 0.000 0.443 0.719 ## .control_1 0.624 0.076 8.216 0.000 0.477 0.763 ## .control_2 0.875 0.092 9.495 0.000 0.686 1.052 ## .control_3 0.745 0.079 9.398 0.000 0.574 0.889 ## .intent 0.409 0.050 8.169 0.000 0.309 0.507 ## .behavior 0.544 0.058 9.379 0.000 0.415 0.639 ## attitudes 0.872 0.104 8.387 0.000 0.675 1.077 ## norms 0.751 0.099 7.557 0.000 0.556 0.941 ## control 0.486 0.096 5.042 0.000 0.303 0.684 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## ie_att 0.244 0.050 4.860 0.000 0.150 0.352 ## ie_norm 0.385 0.066 5.835 0.000 0.268 0.527 Yes, both indirect effects are significant: neither 95% bootstrapped CI includes zero. Yes, both effects are completely mediated by behavioral intention. We can infer as much because the direct effects of attitudes and norms on condom use are both nonsignificant (\\(p = 0.933\\) and \\(p = 0.730\\), respectively). Yes, these results comport with the TORA. Both effects are fully mediated, as the theory stipulates. In addition to evaluating the significance of the indirect and direct effects, we can also take a model-comparison perspective. We can use model comparisons to test whether removing the direct effects of attitudes and norms on condom use significantly decreases model fit. In other words, are those paths needed to accurately represent the data, or are they “dead weight”? 6.4.6 Use a \\(\\Delta \\chi^2\\) test to evaluate the necessity of including the direct effects of attitudes and norms on condom use in the model. What is your conclusion? 
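For two nested models, the \\(\\Delta \\chi^2\\) test is simply the difference between the models' \\(\\chi^2\\) statistics, evaluated against a \\(\\chi^2\\) distribution whose degrees of freedom equal the difference in the models' degrees of freedom. The base-R sketch below applies that arithmetic to the rounded statistics printed above: 48.629 on 37 df for the partial mediation model from 6.4.5 (fit) and 48.757 on 39 df for the complete mediation model from 6.4.4 (fit_tora). The helper name chisq_diff_test() is hypothetical and not part of lavaan.

```r
# Chi-square difference test for two nested models, computed from their
# printed test statistics. 'free' = less restricted model; 'restricted' =
# model with extra constraints (here: direct effects fixed to zero).
chisq_diff_test <- function(chisq_free, df_free, chisq_restricted, df_restricted) {
  delta_chisq <- chisq_restricted - chisq_free
  delta_df <- df_restricted - df_free
  # Upper-tail probability of the difference under H0 (constraints hold)
  p_value <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)
  c(delta_chisq = delta_chisq, delta_df = delta_df, p_value = p_value)
}

# Rounded statistics copied from the summaries of fit (6.4.5) and fit_tora (6.4.4)
res <- chisq_diff_test(chisq_free = 48.629, df_free = 37,
                       chisq_restricted = 48.757, df_restricted = 39)
round(res, 3)
```

This gives a \\(\\Delta \\chi^2\\) of 0.128 on 2 df, with a p-value of about 0.94. anova() works from the unrounded statistics, so its output can differ slightly in the last decimal.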
Click for explanation We only need to compare the fit of the model with the direct effects included to the fit of the model without the direct effects. We’ve already estimated both models, so we can simply submit the fitted lavaan objects to the anova() function. anova(fit, fit_tora) The \\(\\Delta \\chi^2\\) test is not significant. So, we have not lost a significant amount of fit by fixing the direct effects to zero. In other words, the complete mediation model explains the data just as well as the partial mediation model. So, we should probably prefer the more parsimonious model. 6.4.7 Use some statistical means of evaluating the most plausible way to include perceived behavioral control into the model. Choose between the following three options: control predicts behavior via a direct, un-mediated effect. control predicts behavior via an indirect effect that is completely mediated by intention. control predicts behavior via both an indirect effect through intention and a residual direct effect. Hint: There is more than one way to approach this problem. Approach 1: Testing Effects Click to show code One way to tackle this problem is to test the indirect, direct, and total effects. 
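The defined parameters in mod1 below are plain functions of the labeled paths: the indirect effect is ie = a * b, and the total effect is total = ie + c. As a quick check of that arithmetic, the base-R sketch below plugs in the rounded point estimates that summary(fit1) reports further down (a = 0.199, b = 0.551, c = 0.469). Because the inputs are rounded to three decimals, the results can differ from the printed ie (0.110) and total (0.578) in the third decimal. The bootstrap is still what provides inference here, since the sampling distribution of a product such as a * b is typically not normal.

```r
# Rounded path estimates copied from summary(fit1)
a_path   <- 0.199  # control -> intent
b_path   <- 0.551  # intent -> behavior
c_direct <- 0.469  # control -> behavior (direct effect)

ie    <- a_path * b_path  # indirect effect of control via intent
total <- ie + c_direct    # total effect = indirect + direct
round(c(ie = ie, total = total), 3)
```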
## Allow for partial mediation: mod1 <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ attitudes + norms + a * control behavior ~ b * intent + c * control ie := a * b total := ie + c ' set.seed(235711) fit1 <- sem(mod1, data = condom, se = "bootstrap", bootstrap = 1000) summary(fit1, ci = TRUE) ## lavaan 0.6.16 ended normally after 33 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 28 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 47.389 ## Degrees of freedom 38 ## P-value (Chi-square) 0.141 ## ## Parameter Estimates: ## ## Standard errors Bootstrap ## Number of requested bootstrap draws 1000 ## Number of successful bootstrap draws 1000 ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## attitudes =~ ## attit_1 1.000 1.000 1.000 ## attit_2 1.034 0.060 17.222 0.000 0.925 1.167 ## attit_3 -1.021 0.064 -15.877 0.000 -1.158 -0.898 ## norms =~ ## norm_1 1.000 1.000 1.000 ## norm_2 0.985 0.071 13.803 0.000 0.848 1.133 ## norm_3 0.948 0.093 10.204 0.000 0.786 1.155 ## control =~ ## control_1 1.000 1.000 1.000 ## control_2 0.861 0.113 7.635 0.000 0.653 1.100 ## control_3 0.996 0.142 7.020 0.000 0.760 1.318 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## intent ~ ## attitudes 0.357 0.115 3.113 0.002 0.146 0.603 ## norms 0.646 0.095 6.794 0.000 0.473 0.859 ## control (a) 0.199 0.199 1.002 0.317 -0.188 0.633 ## behavior ~ ## intent (b) 0.551 0.074 7.487 0.000 0.391 0.683 ## control (c) 0.469 0.142 3.298 0.001 0.231 0.791 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## attitudes ~~ ## norms 0.344 0.070 4.905 0.000 0.210 0.481 ## control 0.471 0.069 6.838 0.000 0.342 0.608 ## norms ~~ ## control 0.345 0.066 5.240 0.000 0.215 0.481 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## .attit_1 0.429 0.050 8.628 0.000 
0.329 0.524 ## .attit_2 0.325 0.045 7.230 0.000 0.233 0.408 ## .attit_3 0.347 0.049 7.011 0.000 0.248 0.455 ## .norm_1 0.490 0.060 8.172 0.000 0.373 0.612 ## .norm_2 0.525 0.076 6.869 0.000 0.385 0.684 ## .norm_3 0.599 0.070 8.529 0.000 0.447 0.729 ## .control_1 0.626 0.074 8.429 0.000 0.479 0.761 ## .control_2 0.875 0.092 9.522 0.000 0.689 1.049 ## .control_3 0.748 0.078 9.532 0.000 0.579 0.893 ## .intent 0.412 0.050 8.283 0.000 0.307 0.504 ## .behavior 0.541 0.055 9.873 0.000 0.423 0.639 ## attitudes 0.875 0.104 8.385 0.000 0.676 1.081 ## norms 0.757 0.099 7.616 0.000 0.560 0.949 ## control 0.484 0.095 5.092 0.000 0.306 0.683 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## ie 0.110 0.105 1.048 0.295 -0.105 0.309 ## total 0.578 0.186 3.108 0.002 0.235 0.971 Click for explanation From the above results, we can see that the direct and total effects are both significant, but the indirect effect is not. Hence, it probably makes the most sense to include control via a direct (non-mediated) effect on behavior. Approach 2.1: Nested Model Comparison Click to show code We can also approach this problem from a model-comparison perspective. We can fit models that encode each pattern of constraints and check which one best represents the data. 
## Force complete mediation: mod2 <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ attitudes + norms + control behavior ~ intent ' ## Force no mediation: mod3 <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ attitudes + norms behavior ~ intent + control ' ## Estimate the two restricted models: fit2 <- sem(mod2, data = condom) fit3 <- sem(mod3, data = condom) ## Check the results: summary(fit2) ## lavaan 0.6.16 ended normally after 33 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 27 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 62.797 ## Degrees of freedom 39 ## P-value (Chi-square) 0.009 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.033 0.068 15.295 0.000 ## attit_3 -1.018 0.068 -15.087 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 0.985 0.087 11.305 0.000 ## norm_3 0.947 0.087 10.845 0.000 ## control =~ ## control_1 1.000 ## control_2 0.864 0.126 6.855 0.000 ## control_3 0.958 0.129 7.417 0.000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## intent ~ ## attitudes 0.352 0.096 3.669 0.000 ## norms 0.644 0.088 7.347 0.000 ## control 0.207 0.163 1.268 0.205 ## behavior ~ ## intent 0.746 0.045 16.443 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.345 0.069 5.023 0.000 ## control 0.476 0.073 6.513 0.000 ## norms ~~ ## control 0.346 0.065 5.361 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.427 0.051 8.295 0.000 ## .attit_2 0.325 0.046 7.101 0.000 ## .attit_3 0.349 0.047 7.477 0.000 ## .norm_1 0.490 0.064 7.702 0.000 ## .norm_2 0.524 0.065 8.025 0.000 ## .norm_3 
0.600 0.069 8.652 0.000 ## .control_1 0.610 0.076 8.015 0.000 ## .control_2 0.861 0.090 9.580 0.000 ## .control_3 0.769 0.086 8.938 0.000 ## .intent 0.412 0.046 8.890 0.000 ## .behavior 0.603 0.054 11.180 0.000 ## attitudes 0.877 0.115 7.596 0.000 ## norms 0.757 0.112 6.733 0.000 ## control 0.500 0.098 5.076 0.000 summary(fit3) ## lavaan 0.6.16 ended normally after 31 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 27 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 48.757 ## Degrees of freedom 39 ## P-value (Chi-square) 0.136 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.033 0.068 15.221 0.000 ## attit_3 -1.025 0.068 -15.097 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 0.984 0.087 11.256 0.000 ## norm_3 0.955 0.088 10.881 0.000 ## control =~ ## control_1 1.000 ## control_2 0.859 0.127 6.789 0.000 ## control_3 0.997 0.131 7.609 0.000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## intent ~ ## attitudes 0.447 0.063 7.100 0.000 ## norms 0.706 0.078 9.078 0.000 ## behavior ~ ## intent 0.563 0.063 8.923 0.000 ## control 0.454 0.119 3.805 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.342 0.068 5.011 0.000 ## control 0.474 0.072 6.548 0.000 ## norms ~~ ## control 0.352 0.064 5.521 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.432 0.052 8.381 0.000 ## .attit_2 0.330 0.046 7.220 0.000 ## .attit_3 0.344 0.046 7.439 0.000 ## .norm_1 0.496 0.063 7.820 0.000 ## .norm_2 0.533 0.065 8.152 0.000 ## .norm_3 0.595 0.069 8.643 0.000 ## .control_1 0.625 0.075 8.372 0.000 ## .control_2 0.876 0.090 9.757 0.000 ## .control_3 0.746 0.084 8.874 0.000 ## .intent 0.409 0.047 8.769 0.000 ## .behavior 0.542 0.052 10.423 0.000 ## attitudes 0.872 0.115 7.566 
0.000 ## norms 0.751 0.112 6.709 0.000 ## control 0.485 0.096 5.059 0.000 ## Does either of the restricted models fit worse than the partial mediation model? anova(fit1, fit2) anova(fit1, fit3) Click for explanation The above \\(\\Delta \\chi^2\\) tests tell us that the full mediation model (fit2) fits significantly worse than the partial mediation model (fit1). Hence, forcing full mediation by fixing the direct effect to zero is an unreasonable constraint. By contrast, the no-mediation model (fit3), in which control affects behavior only through a direct path, does not fit significantly worse than the partial mediation model. So, we can conclude that removing the indirect effect and modeling the influence of control on behavior as an un-mediated direct association represents the data just as well as a model that allows for both indirect and direct effects. Hence, we should prefer the more parsimonious no-mediation model. Approach 2.2: Non-Nested Model Comparison Click to show code We can also use information criteria to compare our models. The two most popular information criteria are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). ## Which model is the most parsimonious representation of the data? AIC(fit1, fit2, fit3) BIC(fit1, fit2, fit3) Click for explanation While the effect tests and the nested model comparisons both lead us to prefer the non-mediated model, we cannot directly say that the complete mediation model fits significantly worse than the non-mediated model. We have not directly compared those two models, and we cannot do so with the \\(\\Delta \\chi^2\\) test. We cannot do such a test because these two models are not nested: we must both add and remove a path to get from one model specification to the other. Also, both models have the same degrees of freedom, so we could not define a sampling distribution against which to compare the \\(\\Delta \\chi^2\\) anyway. We can use information criteria to get around this problem, though. 
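Both criteria combine a model's log-likelihood (LL) with a penalty for its number of free parameters (k): AIC = -2LL + 2k and BIC = -2LL + k ln(N), where N is the sample size. Because ln(N) exceeds 2 whenever N is 8 or more, BIC penalizes complexity more heavily than AIC in any realistic sample. The base-R sketch below reproduces the AIC and BIC printed for fit_tora in 6.4.4 (the same model as fit3) from its log-likelihood (-3551.160), 27 free parameters, and 250 observations. The helper name info_criteria() is hypothetical and not a lavaan function.

```r
# Information criteria from a model's log-likelihood (LL), number of free
# parameters (k), and sample size (n). Lower values indicate a better
# balance of fit and parsimony.
info_criteria <- function(LL, k, n) {
  c(AIC = -2 * LL + 2 * k,
    BIC = -2 * LL + k * log(n))
}

# Values copied from the printed summary of fit_tora (same model as fit3)
ic <- info_criteria(LL = -3551.160, k = 27, n = 250)
round(ic, 3)
```

The BIC comes out as 7251.399 rather than the printed 7251.400 only because the log-likelihood was copied in rounded form. Since fit2 and fit3 both have 27 free parameters and use the same data, their penalties cancel, and the difference in either criterion reduces to the difference in their \\(\\chi^2\\) statistics: 62.797 - 48.757 = 14.04, in favor of fit3.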
Information criteria can be used to compare both nested and non-nested models. These criteria are designed to rank models by balancing their fit to the data and their complexity. When comparing models based on information criteria, a lower value indicates a better model in the sense of a better balance of fit and parsimony. The above results show that both the AIC and the BIC agree that the no-mediation model is the best. Conclusion Click for explanation So, in the end, regardless of how we approach the question, all of our results suggest modeling perceived behavioral control as a direct, non-mediated predictor of condom use. End of In-Class Exercises "],["multiple-group-models.html", "7 Multiple Group Models", " 7 Multiple Group Models This week, you will cover multiple group modeling and measurement invariance testing in the SEM/CFA context. Homework before the lecture Watch the Lecture Recording for this week. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. "],["lecture-6.html", "7.1 Lecture", " 7.1 Lecture In this lecture, we will explore how you can incorporate grouping factors into your CFA and SEM analyses. We’ll cover three general topics: The multiple group modeling framework Measurement invariance testing Using multiple group models to test for moderation 7.1.1 Recordings Multiple Group Models Measurement Invariance Measurement Invariance Examples Moderation by Group 7.1.2 Slides You can download the lecture slides here "],["reading-6.html", "7.2 Reading", " 7.2 Reading There is no official reading this week. Please contemplate the following image instead. \\[\\\\[12pt]\\] "],["at-home-exercises-6.html", "7.3 At-Home Exercises", " 7.3 At-Home Exercises 7.3.1 Multiple-Group Path Analysis To fix ideas, we’ll start these practical exercises by re-running part of the moderation analysis from the Week 3 At-Home Exercises as a multiple group model. 
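Why does a multiple-group model test moderation? Giving every group its own intercept and slope yields the same coefficient estimates as a single regression with a group-by-predictor interaction (the multiple-group model additionally estimates a separate residual variance in each group), and equating the slopes across groups removes that interaction. The base-R sketch below illustrates this with ordinary least squares on simulated data. The variables x, y, and g are hypothetical and do not come from the Sesam2 data, and lm() with an F test stands in for the lavaan models and \\(\\Delta \\chi^2\\) test used in the exercises.

```r
set.seed(1)

# Simulated (hypothetical) data: 3 groups, each with its own true slope
n <- 60
g <- factor(rep(c('a', 'b', 'c'), each = n / 3))
x <- rnorm(n)
true_slopes <- c(0.5, 0.8, 0.2)
y <- 1 + true_slopes[as.integer(g)] * x + rnorm(n)
dat <- data.frame(x = x, y = y, g = g)

# Unrestricted, version 1: separate regressions within each group
sep_slopes <- sapply(split(dat, dat$g),
                     function(d) coef(lm(y ~ x, data = d))[['x']])

# Unrestricted, version 2: one model with group-specific intercepts and slopes
int_fit <- lm(y ~ 0 + g + g:x, data = dat)
int_slopes <- coef(int_fit)[c('ga:x', 'gb:x', 'gc:x')]

# Both parameterizations give identical slope estimates
all.equal(unname(sep_slopes), unname(int_slopes))

# Restricted model: group-specific intercepts, one common slope
res_fit <- lm(y ~ 0 + g + x, data = dat)

# Testing the equal-slopes constraint = testing moderation by group
anova(res_fit, int_fit)
```

The final anova() call plays the same role as anova(out_full, out_res) does for the lavaan models later in this section: it asks whether forcing a common slope significantly worsens fit.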
7.3.1.1 Load the Sesam2.sav data. NOTE: Unless otherwise specified, all analyses in Section 7.3.1 use these data. Click to show code library(haven) # Read the data into an object called 'sesam2': sesam2 <- read_sav("Sesam2.sav") VIEWCAT is a nominal grouping variable, but it is represented as a numeric variable in the sesam2 data. The levels represent the following frequencies of Sesame Street viewership of the children in the data: VIEWCAT = 1: Rarely/Never VIEWCAT = 2: 2–3 times a week VIEWCAT = 3: 4–5 times a week VIEWCAT = 4: > 5 times a week We will use VIEWCAT as the grouping variable in our path model. To do so, we don’t really need to convert VIEWCAT into a factor, but, if we do, lavaan will give our groups meaningful labels in the output. That added clarity can be pretty helpful. 7.3.1.2 Convert VIEWCAT into a factor. Make sure that VIEWCAT = 1 is the reference group. Assign the factor labels denoted above. Click to show code library(dplyr) ## Store the old version for checking: tmp <- sesam2$VIEWCAT ## Convert 'VIEWCAT' to a factor: sesam2 <- mutate(sesam2, VIEWCAT = factor(VIEWCAT, labels = c("Rarely/never", "2-3 times per week", "4-5 times per week", "> 5 times per week") ) ) ## Check the conversion: table(old = tmp, new = sesam2$VIEWCAT, useNA = "always") ## new ## old Rarely/never 2-3 times per week 4-5 times per week > 5 times per week ## 1 25 0 0 0 ## 2 0 44 0 0 ## 3 0 0 57 0 ## 4 0 0 0 53 ## <NA> 0 0 0 0 ## new ## old <NA> ## 1 0 ## 2 0 ## 3 0 ## 4 0 ## <NA> 0 7.3.1.3 Create a conditional slopes plot to visualize the effect of AGE on POSTNUMB within each of the VIEWCAT groups. Based on this visualization, do you think it is reasonable to expect that VIEWCAT moderates the effect of AGE on POSTNUMB? 
Click to show code library(ggplot2) ggplot(sesam2, aes(AGE, POSTNUMB, color = VIEWCAT)) + geom_point() + geom_smooth(method = "lm", se = FALSE) Click for explanation The regression lines representing the conditional focal effects are not parallel, so there appears to be some level of moderation. That being said, the differences are pretty small, so the moderation may not be significant (i.e., the non-parallel regression lines may simply be reflecting sampling variability). We will use path analysis to test if VIEWCAT moderates the effect of AGE on POSTNUMB. This analysis will entail three steps: Estimate the unrestricted multiple-group model wherein we regress POSTNUMB onto AGE and specify VIEWCAT as the grouping factor. Estimate the restricted model wherein we constrain the AGE \\(\\rightarrow\\) POSTNUMB effect to be equal in all VIEWCAT groups. Conduct a \\(\\Delta \\chi^2\\) test to compare the fit of the two models. 7.3.1.4 Estimate the unrestricted path model described above. Include the intercept term in your model. Judging from the focal effects estimate in each group, do you think moderation is plausible? 
Click to show code library(lavaan) ## Estimate the unrestricted model and view the results: out_full <- sem('POSTNUMB ~ 1 + AGE', data = sesam2, group = "VIEWCAT") summary(out_full) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 12 ## ## Number of observations per group: ## Rarely/never 25 ## 4-5 times per week 57 ## > 5 times per week 53 ## 2-3 times per week 44 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## Test statistic for each group: ## Rarely/never 0.000 ## 4-5 times per week 0.000 ## > 5 times per week 0.000 ## 2-3 times per week 0.000 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [Rarely/never]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE 0.747 0.239 3.118 0.002 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB -18.721 12.142 -1.542 0.123 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 73.285 20.728 3.536 0.000 ## ## ## Group 2 [4-5 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE 0.554 0.234 2.369 0.018 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 4.861 12.178 0.399 0.690 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 135.923 25.461 5.339 0.000 ## ## ## Group 3 [> 5 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE 0.405 0.214 1.894 0.058 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 15.676 11.249 1.394 0.163 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 115.942 22.523 5.148 0.000 ## ## ## Group 4 [2-3 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE 0.729 0.255 2.855 0.004 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB -8.747 13.003 -0.673 0.501 ## ## 
Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 112.019 23.882 4.690 0.000 Click for explanation There are some notable differences in the AGE \\(\\rightarrow\\) POSTNUMB focal effect between VIEWCAT groups. It looks like VIEWCAT could moderate the focal effect. 7.3.1.5 Estimate the restricted model described above. Equate the focal effect across all VIEWCAT groups. Click to show code ## Estimate the restricted model and view the results: out_res <- sem('POSTNUMB ~ 1 + c("b1", "b1", "b1", "b1") * AGE', data = sesam2, group = "VIEWCAT") summary(out_res) ## lavaan 0.6.16 ended normally after 38 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 12 ## Number of equality constraints 3 ## ## Number of observations per group: ## Rarely/never 25 ## 4-5 times per week 57 ## > 5 times per week 53 ## 2-3 times per week 44 ## ## Model Test User Model: ## ## Test statistic 1.486 ## Degrees of freedom 3 ## P-value (Chi-square) 0.685 ## Test statistic for each group: ## Rarely/never 0.413 ## 4-5 times per week 0.027 ## > 5 times per week 0.760 ## 2-3 times per week 0.287 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [Rarely/never]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE (b1) 0.592 0.118 5.032 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB -10.966 6.154 -1.782 0.075 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 74.505 21.073 3.536 0.000 ## ## ## Group 2 [4-5 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE (b1) 0.592 0.118 5.032 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 2.869 6.275 0.457 0.647 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 135.988 25.473 5.339 0.000 ## ## ## Group 3 [> 5 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) 
## POSTNUMB ~ ## AGE (b1) 0.592 0.118 5.032 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 5.923 6.313 0.938 0.348 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 117.616 22.848 5.148 0.000 ## ## ## Group 4 [2-3 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE (b1) 0.592 0.118 5.032 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB -1.826 6.157 -0.297 0.767 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 112.751 24.039 4.690 0.000 7.3.1.6 Test for moderation by comparing the full and restricted models from 7.3.1.4 and 7.3.1.5, respectively: Does VIEWCAT significantly moderate the effect of AGE on POSTNUMB? Click to show code ## Test for moderation: anova(out_full, out_res) Click for explanation No, VIEWCAT does not significantly moderate the effect of AGE on POSTNUMB (\\(\\Delta \\chi^2[3] = 1.486\\), \\(p = 0.685\\)). 7.3.2 Multiple-Group CFA In the next part of these exercises, we will estimate a multiple-group CFA to evaluate the measurement structure of a scale assessing Prolonged Grief Disorder. The relevant data are contained in the PGDdata2.txt file. This dataset consists of a grouping variable, Kin2 (with two levels: “partner” and “else”) and 5 items taken from the Inventory of Complicated Grief: Yearning Part of self died Difficulty accepting the loss Avoiding reminders of deceased Bitterness about the loss You can find more information about this scale in Boelen et al. (2010). 7.3.2.1 Load the PGDdata2.txt data. Use the read.table() function to load the data. Convert the missing values to NA via the na.strings argument. Retain the column labels via the header argument. Specify the field delimiter as the tab character (i.e., \"\\t\"). Exclude any cases with missing values on Kin2. NOTE: Unless otherwise specified, all analyses in Section 7.3.2 use these data. 
Click to show code library(dplyr) # provides %>% and filter() ## Load the data: pgd <- read.table("PGDdata2.txt", na.strings = "-999", header = TRUE, sep = "\\t") %>% filter(!is.na(Kin2)) ## Check the results: head(pgd) summary(pgd) str(pgd) ## Kin2 b1pss1 b2pss2 b3pss3 ## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000 ## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 ## Median :1.0000 Median :1.000 Median :0.0000 Median :1.0000 ## Mean :0.6661 Mean :1.236 Mean :0.4622 Mean :0.9771 ## 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:1.0000 ## Max. :1.0000 Max. :5.000 Max. :3.0000 Max. :5.0000 ## NA's :1 ## b4pss4 b5pss5 ## Min. :0.000 Min. :0.0000 ## 1st Qu.:0.000 1st Qu.:0.0000 ## Median :1.000 Median :0.0000 ## Mean :1.009 Mean :0.6761 ## 3rd Qu.:2.000 3rd Qu.:1.0000 ## Max. :3.000 Max. :3.0000 ## NA's :1 ## 'data.frame': 569 obs. of 6 variables: ## $ Kin2 : int 0 0 1 1 0 1 1 1 1 1 ... ## $ b1pss1: int 1 1 1 1 1 2 1 3 1 1 ... ## $ b2pss2: int 1 0 1 0 1 2 1 2 0 0 ... ## $ b3pss3: int 1 0 1 1 2 2 1 2 1 1 ... ## $ b4pss4: int 1 1 1 1 0 2 2 3 0 1 ... ## $ b5pss5: int 1 0 0 0 0 1 2 3 0 0 ... 7.3.2.2 Run a single-group CFA wherein the five scale variables described above indicate a single latent factor. Do not include any grouping variable. Use the default settings in the cfa() function. Click to show code ## Define the model syntax: cfaMod <- 'grief =~ b1pss1 + b2pss2 + b3pss3 + b4pss4 + b5pss5' ## Estimate the model: out0 <- cfa(cfaMod, data = pgd) 7.3.2.3 Summarize and evaluate the fitted CFA Does the model fit well? Are the items homogeneously associated with the latent factor? Which item is most weakly associated with the latent factor? 
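The Std.all and R-Square columns in the summary output can be recomputed by hand. The base-R sketch below is an illustration added for this exercise and is not part of the original solution. It assumes a single factor with uncorrelated residuals and plugs in the unstandardized loadings, residual variances, and factor variance that summary(out0) prints. Because those inputs are rounded, the third decimal can differ slightly from lavaan's own values.

```r
## Unstandardized estimates copied from summary(out0) (single-group CFA):
lambda <- c(b1pss1 = 1.000, b2pss2 = 0.454, b3pss3 = 0.831,
            b4pss4 = 0.770, b5pss5 = 0.817)  # factor loadings
theta  <- c(b1pss1 = 0.416, b2pss2 = 0.358, b3pss3 = 0.427,
            b4pss4 = 0.419, b5pss5 = 0.417)  # residual variances
psi    <- 0.565                              # variance of the grief factor

## Model-implied item variances: lambda^2 * psi + theta
impliedVar <- lambda^2 * psi + theta

## Fully standardized loadings (Std.all) and explained variance (R-Square):
stdLoading <- lambda * sqrt(psi) / sqrt(impliedVar)
rSquare    <- stdLoading^2

round(cbind(std.all = stdLoading, r2 = rSquare), 3)

## The most weakly associated item has the smallest standardized loading:
names(which.min(stdLoading))
```

The smallest standardized loading is about 0.495, with an R-squared of about 0.245, and it belongs to b2pss2.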
Click to show code ## Summarize the fitted model: summary(out0, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 19 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 10 ## ## Used Total ## Number of observations 567 569 ## ## Model Test User Model: ## ## Test statistic 8.110 ## Degrees of freedom 5 ## P-value (Chi-square) 0.150 ## ## Model Test Baseline Model: ## ## Test statistic 775.364 ## Degrees of freedom 10 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.996 ## Tucker-Lewis Index (TLI) 0.992 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -3219.918 ## Loglikelihood unrestricted model (H1) -3215.863 ## ## Akaike (AIC) 6459.836 ## Bayesian (BIC) 6503.240 ## Sample-size adjusted Bayesian (SABIC) 6471.495 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.033 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.073 ## P-value H_0: RMSEA <= 0.050 0.710 ## P-value H_0: RMSEA >= 0.080 0.023 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.018 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## grief =~ ## b1pss1 1.000 0.752 0.759 ## b2pss2 0.454 0.043 10.570 0.000 0.341 0.495 ## b3pss3 0.831 0.058 14.445 0.000 0.625 0.691 ## b4pss4 0.770 0.055 14.010 0.000 0.579 0.667 ## b5pss5 0.817 0.057 14.410 0.000 0.614 0.689 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 0.416 0.037 11.300 0.000 0.416 0.424 ## .b2pss2 0.358 0.023 15.549 0.000 0.358 0.755 ## .b3pss3 0.427 0.033 13.117 0.000 0.427 0.522 ## .b4pss4 0.419 0.031 13.599 0.000 0.419 0.555 ## .b5pss5 0.417 0.032 13.160 0.000 0.417 0.525 ## grief 0.565 0.059 9.514 0.000 1.000 1.000 ## ## R-Square: ## Estimate ## 
b1pss1 0.576 ## b2pss2 0.245 ## b3pss3 0.478 ## b4pss4 0.445 ## b5pss5 0.475 Click for explanation The model fits the data quite well (\\(\\chi^2[5] = 8.11\\), \\(p = 0.15\\), \\(\\textit{RMSEA} = 0.033\\), \\(\\textit{CFI} = 0.996\\), \\(\\textit{SRMR} = 0.018\\)). All of the indicators appear to be more-or-less equally good indicators of the latent factor except for b2pss2, which has a standardized factor loading of \\(\\lambda = 0.495\\) and \\(R^2 = 0.245\\). 7.3.2.4 Rerun the CFA from 7.3.2.2 as a multiple-group model. Use the Kin2 variable as the grouping factor. Do not place any equality constraints across groups. Click to show code out1 <- cfa(cfaMod, data = pgd, group = "Kin2") 7.3.2.5 Summarize the fitted multiple-group CFA from 7.3.2.4. Does the two-group model fit the data well? Do you notice any salient differences between the two sets of within-group estimates? Click to show code summary(out1, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 27 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 30 ## ## Number of observations per group: Used Total ## 0 188 190 ## 1 379 379 ## ## Model Test User Model: ## ## Test statistic 11.317 ## Degrees of freedom 10 ## P-value (Chi-square) 0.333 ## Test statistic for each group: ## 0 8.976 ## 1 2.340 ## ## Model Test Baseline Model: ## ## Test statistic 781.358 ## Degrees of freedom 20 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.998 ## Tucker-Lewis Index (TLI) 0.997 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -3206.363 ## Loglikelihood unrestricted model (H1) -3200.705 ## ## Akaike (AIC) 6472.727 ## Bayesian (BIC) 6602.937 ## Sample-size adjusted Bayesian (SABIC) 6507.701 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.022 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.070 ## P-value H_0: 
RMSEA <= 0.050 0.789 ## P-value H_0: RMSEA >= 0.080 0.018 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.017 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [0]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## grief =~ ## b1pss1 1.000 0.702 0.712 ## b2pss2 0.372 0.076 4.922 0.000 0.261 0.410 ## b3pss3 0.938 0.118 7.986 0.000 0.659 0.709 ## b4pss4 0.909 0.116 7.848 0.000 0.638 0.691 ## b5pss5 0.951 0.122 7.774 0.000 0.667 0.683 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 1.346 0.072 18.727 0.000 1.346 1.366 ## .b2pss2 0.441 0.046 9.499 0.000 0.441 0.693 ## .b3pss3 1.059 0.068 15.618 0.000 1.059 1.139 ## .b4pss4 1.122 0.067 16.671 0.000 1.122 1.216 ## .b5pss5 0.745 0.071 10.442 0.000 0.745 0.762 ## grief 0.000 0.000 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 0.478 0.067 7.118 0.000 0.478 0.493 ## .b2pss2 0.338 0.037 9.205 0.000 0.338 0.832 ## .b3pss3 0.430 0.060 7.170 0.000 0.430 0.498 ## .b4pss4 0.445 0.060 7.408 0.000 0.445 0.522 ## .b5pss5 0.511 0.068 7.519 0.000 0.511 0.534 ## grief 0.493 0.098 5.007 0.000 1.000 1.000 ## ## R-Square: ## Estimate ## b1pss1 0.507 ## b2pss2 0.168 ## b3pss3 0.502 ## b4pss4 0.478 ## b5pss5 0.466 ## ## ## Group 2 [1]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## grief =~ ## b1pss1 1.000 0.769 0.778 ## b2pss2 0.502 0.052 9.597 0.000 0.386 0.542 ## b3pss3 0.785 0.066 11.945 0.000 0.604 0.680 ## b4pss4 0.708 0.062 11.497 0.000 0.544 0.652 ## b5pss5 0.762 0.062 12.185 0.000 0.586 0.696 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 1.182 0.051 23.277 0.000 1.182 1.196 ## .b2pss2 0.475 0.037 12.973 0.000 0.475 0.666 ## .b3pss3 0.934 0.046 20.460 0.000 0.934 1.051 ## .b4pss4 0.955 0.043 22.270 0.000 0.955 1.144 ## .b5pss5 0.644 0.043 14.879 0.000 0.644 0.764 ## 
grief 0.000 0.000 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 0.385 0.043 8.862 0.000 0.385 0.394 ## .b2pss2 0.359 0.029 12.468 0.000 0.359 0.706 ## .b3pss3 0.425 0.039 11.025 0.000 0.425 0.538 ## .b4pss4 0.401 0.035 11.420 0.000 0.401 0.575 ## .b5pss5 0.366 0.034 10.767 0.000 0.366 0.516 ## grief 0.592 0.073 8.081 0.000 1.000 1.000 ## ## R-Square: ## Estimate ## b1pss1 0.606 ## b2pss2 0.294 ## b3pss3 0.462 ## b4pss4 0.425 ## b5pss5 0.484 Click for explanation The two-group model also fits the data very well (\\(\\chi^2[10] = 11.32\\), \\(p = 0.333\\), \\(\\textit{RMSEA} = 0.022\\), \\(\\textit{CFI} = 0.998\\), \\(\\textit{SRMR} = 0.017\\)). No, there are no striking differences between the two sets of estimates. Although there is certainly some variability between groups, the two sets of estimates don’t look systematically different. 7.3.2.6 Based on the above results, what can you conclude about configural, weak, and strong measurement invariance across the Kin2 groups? Click for explanation Configural invariance holds. The unrestricted multiple-group CFA fits the data adequately (very well, actually), and the measurement model parameters are reasonable in both groups. We cannot yet draw any conclusions about weak or strong invariance. We need to do the appropriate model comparison tests first. End of At-Home Exercises 7 "],["in-class-exercises-6.html", "7.4 In-Class Exercises", " 7.4 In-Class Exercises 7.4.1 Measurement Invariance We’ll now pick up where we left off with the At-Home Exercises by testing measurement invariance in the two-group CFA of prolonged grief disorder. 7.4.1.1 Load the PGDdata2.txt data as you did for the At-Home Exercises. NOTE: Unless otherwise specified, all analyses in Section 7.4.1 use these data. 
Click to show code library(dplyr) # provides %>% and filter() ## Load the data: pgd <- read.table("PGDdata2.txt", na.strings = "-999", header = TRUE, sep = "\\t") %>% filter(!is.na(Kin2)) ## Check the results: head(pgd) summary(pgd) str(pgd) ## Kin2 b1pss1 b2pss2 b3pss3 ## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000 ## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 ## Median :1.0000 Median :1.000 Median :0.0000 Median :1.0000 ## Mean :0.6661 Mean :1.236 Mean :0.4622 Mean :0.9771 ## 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:1.0000 ## Max. :1.0000 Max. :5.000 Max. :3.0000 Max. :5.0000 ## NA's :1 ## b4pss4 b5pss5 ## Min. :0.000 Min. :0.0000 ## 1st Qu.:0.000 1st Qu.:0.0000 ## Median :1.000 Median :0.0000 ## Mean :1.009 Mean :0.6761 ## 3rd Qu.:2.000 3rd Qu.:1.0000 ## Max. :3.000 Max. :3.0000 ## NA's :1 ## 'data.frame': 569 obs. of 6 variables: ## $ Kin2 : int 0 0 1 1 0 1 1 1 1 1 ... ## $ b1pss1: int 1 1 1 1 1 2 1 3 1 1 ... ## $ b2pss2: int 1 0 1 0 1 2 1 2 0 0 ... ## $ b3pss3: int 1 0 1 1 2 2 1 2 1 1 ... ## $ b4pss4: int 1 1 1 1 0 2 2 3 0 1 ... ## $ b5pss5: int 1 0 0 0 0 1 2 3 0 0 ... 7.4.1.2 Test configural, weak, and strong invariance using the multiple-group CFA from 7.3.2.4. What are your conclusions? 
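Before fitting the three models, it may help to see what each level of invariance costs in degrees of freedom and how the chi-square difference tests reported by compareFit() are formed. The base-R sketch below is a hand calculation under stated assumptions and is not part of the original solution. It assumes 5 indicators, 1 factor identified by a marker-variable loading, 2 groups, a mean structure in each group, and 567 complete cases. It copies the chi-square statistics from the compareFit() output for these models. The RMSEA of each difference test uses the multiple-group form sqrt(G) * sqrt((dChisq - dDf) / (dDf * N)). Treat that formula as an assumption that happens to match the RMSEA column reported here.

```r
## Sample moments per group: p(p + 1)/2 (co)variances plus p means
p <- 5    # indicators
G <- 2    # groups (Kin2 = 0, 1)
N <- 567  # complete cases used across both groups (assumed from the output)
moments <- G * (p * (p + 1) / 2 + p)

## Free parameters per group: 4 loadings + 5 residual variances
## + 1 factor variance + 5 intercepts
configPars <- G * (4 + 5 + 1 + 5)
weakPars   <- configPars - 4      # 4 free loadings equated across groups
strongPars <- weakPars - 5 + 1    # 5 intercepts equated, latent mean freed in group 2

modelDf <- moments - c(config = configPars, weak = weakPars, strong = strongPars)
modelDf

## Chi-square statistics copied from the compareFit() output:
chisq <- c(config = 11.317, weak = 19.275, strong = 23.968)

## Nested-model (likelihood ratio) difference tests:
dChisq <- diff(chisq)
dDf    <- diff(modelDf)
pValue <- pchisq(dChisq, df = dDf, lower.tail = FALSE)

## RMSEA of the difference (multiple-group form, assumed):
rmseaD <- sqrt(G) * sqrt(pmax(0, dChisq - dDf) / (dDf * N))

round(cbind(dChisq, dDf, pValue, rmseaD), 4)
```

With the rounded statistics, the sketch returns 10, 14, and 18 degrees of freedom. It gives difference tests of roughly 7.96 (p of about 0.093) and 4.69 (p of about 0.32), each on 4 df, in line with the compareFit() output.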
Click to show code library(lavaan) library(semTools) # provides the compareFit() function ## Define the syntax for the CFA model: cfaMod <- 'grief =~ b1pss1 + b2pss2 + b3pss3 + b4pss4 + b5pss5' ## Estimate the configural model: configOut <- cfa(cfaMod, data = pgd, group = "Kin2") ## Estimate the weak invariance model: weakOut <- cfa(cfaMod, data = pgd, group = "Kin2", group.equal = "loadings") ## Estimate the strong invariance model: strongOut <- cfa(cfaMod, data = pgd, group = "Kin2", group.equal = c("loadings", "intercepts") ) ## Test invariance through model comparison tests: compareFit(configOut, weakOut, strongOut) %>% summary() ## ################### Nested Model Comparison ######################### ## ## Chi-Squared Difference Test ## ## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq) ## configOut 10 6472.7 6602.9 11.317 ## weakOut 14 6472.7 6585.5 19.275 7.9585 0.059083 4 0.09311 . ## strongOut 18 6469.4 6564.9 23.968 4.6931 0.024722 4 0.32026 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## ####################### Model Fit Indices ########################### ## chisq df pvalue rmsea cfi tli srmr aic bic ## configOut 11.317† 10 .333 .022† 0.998† 0.997† .017† 6472.727 6602.937 ## weakOut 19.275 14 .155 .036 .993 .990 .038 6472.685 6585.534 ## strongOut 23.968 18 .156 .034 .992 .991 .042 6469.378† 6564.866† ## ## ################## Differences in Fit Indices ####################### ## df rmsea cfi tli srmr aic bic ## weakOut - configOut 4 0.015 -0.005 -0.006 0.021 -0.041 -17.403 ## strongOut - weakOut 4 -0.002 -0.001 0.001 0.004 -3.307 -20.668 Click for explanation Configural invariance holds. The unrestricted, two-group model fits the data very well (\\(\\chi^2[10] = 11.32\\), \\(p = 0.333\\), \\(\\textit{RMSEA} = 0.022\\), \\(\\textit{CFI} = 0.998\\), \\(\\textit{SRMR} = 0.017\\)). Weak invariance holds. 
The model comparison test shows a non-significant loss of fit between the configural and weak models (\\(\\Delta \\chi^2[4] = 7.959\\), \\(p = 0.093\\)). Strong invariance holds. The model comparison test shows a non-significant loss of fit between the weak and strong models (\\(\\Delta \\chi^2[4] = 4.693\\), \\(p = 0.32\\)). End of In-Class Exercises 7 "],["wrap-up.html", "8 Wrap-Up", " 8 Wrap-Up There will be no new lecture or practical content this week. This is an open week that we’ll use to tie up any loose ends and wrap up the course content. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]]
+[["index.html", "Theory Construction and Statistical Modeling Course Information", " Theory Construction and Statistical Modeling Kyle M. Lang Last updated: 2023-10-16 Course Information In order to test a theory, we must express the theory as a statistical model and then test this model on quantitative (numeric) data. In this course we will use datasets from different disciplines within the social sciences (educational sciences, psychology, and sociology) to explain and illustrate theories and practices that are used in all social science disciplines to statistically model social science theories. This course uses existing tutorial datasets to practice the process of translating verbal theories into testable statistical models. If you are interested in the methods of acquiring high quality data to test your own theory, we recommend following the course Conducting a Survey which is taught from November to January. Most information about the course is available in this GitBook. Course-related communication will be through https://uu.blackboard.com (Log in with your student ID and password). "],["acknowledgement.html", "Acknowledgement", " Acknowledgement This course was originally developed by dr. Caspar van Lissa. I (dr. Kyle M. Lang) have modified Caspar’s original materials and take full responsibility for any errors or inaccuracies introduced through these modifications. Credit for any particularly effective piece of pedagogy should probably go to Caspar. You can view the original version of this course here on Caspar’s GitHub page. "],["instructors.html", "Instructors", " Instructors Coordinator: dr. Kyle M. Lang Lectures: dr. Kyle M. Lang Practicals: Rianne Kraakman Daniëlle Remmerswaal Danielle McCool "],["course-overview.html", "Course overview", " Course overview This course comprises three parts: Path analysis: You will learn how to estimate complex path models of observed variables (e.g., linked linear regressions) as structural equation models. 
Factor analysis: You will learn different ways of defining and estimating latent (unobserved) constructs. Full structural equation modeling: You will combine the first two topics to estimate path models describing the associations among latent constructs. Each of these three themes will be evaluated with a separate assignment. The first two assignments will be graded on a pass/fail basis. Your course grade will be based on your third assignment grade. "],["schedule.html", "Schedule", " Schedule Course Week Calendar Week Lecture/Practical Topic Workgroup Activity Assignment Deadline 0 36 Pre-course preparation 1 37 Introduction to R 2 38 Statistical modeling, Path analysis 3 39 Mediation, Moderation 4 40 Exploratory factor analysis (EFA) A1 Peer-Review A1: 2023-10-04 @ 23:59 5 41 Confirmatory factor analysis (CFA) 6 42 Structural equation modeling (SEM) A2 Peer-Review A2: 2023-10-18 @ 23:59 7 43 Multiple group models 8 44 Wrap-up A3 Peer-Review 9 45 Exam week: No class meetings A3: 2023-11-10 @ 23:59 NOTE: The schedule (including topics covered and assignment deadlines) is subject to change at the instructors’ discretion. "],["learning-goals.html", "Learning goals", " Learning goals In this course you will learn how to translate a social scientific theory into a statistical model, how to analyze your data with these models, and how to interpret and report your results following APA standards. After completing the course, you will be able to: Translate a verbal theory into a conceptual model, and translate a conceptual model into a statistical model. Independently analyze data using the free, open-source statistical software R. Apply a latent variable model to a real-life problem wherein the observed variables are only indirect indicators of an unobserved construct. Use a path model to represent the hypothesized causal relations among several variables, including relationships such as mediation and moderation. 
Explain to a fellow student how structural equation modeling combines latent variable models with path models and the benefits of doing so. Reflect critically on the decisions involved in defining and estimating structural equation models. "],["resources.html", "Resources", " Resources Literature You do not need a separate book for this course! Most of the information is contained within this GitBook and the course readings (which you will be able to access via links in this GitBook). All literature is freely available online, as long as you are logging in from within the UU-domain (i.e., from the UU campus or through an appropriate VPN). All readings are linked in this GitBook via either direct download links or DOIs. If you run into any trouble accessing a given article, searching for the title using Google Scholar or the University Library will probably do the trick. Software You will do all of your statistical analyses with the statistical programming language/environment R and the add-on package lavaan. If you want to expand your learning, you can follow this optional lavaan tutorial. "],["reading-questions.html", "Reading questions", " Reading questions Along with every article, we will provide reading questions. You will not be graded on the reading questions, but it is important to prepare the reading questions before every lecture. The reading questions serve several important purposes: Provide relevant background knowledge for the lecture Help you recognize and understand the key terms and concepts Make you aware of important publications that shaped the field Help you extract the relevant insights from the literature "],["weekly-preparation.html", "Weekly preparation", " Weekly preparation Before every class meeting (both lectures and practicals) you need to do the assigned homework (delineated in the GitBook chapter for that week). 
This course follows a flipped classroom procedure, so you must complete the weekly homework to meaningfully participate in, and benefit from, the class meetings. Background knowledge We assume you have basic knowledge about multivariate statistics before entering this course. You do not need any prior experience working with R. If you wish to refresh your knowledge, we recommend the chapters on ANOVA, multiple regression, and exploratory factor analysis from Field’s Discovering Statistics using R. If you cannot access the Field book, many other introductory statistics textbooks cover these topics equally well. So, use whatever you have lying around from past statistics courses. You could also try one of the following open-access options: Applied Statistics with R Introduction to Modern Statistics Introduction to Statistical Learning "],["grading.html", "Grading", " Grading Your grade for the course is based on a “portfolio” composed of the three take-home assignments: Path modeling Deadline: Wednesday 2023-10-04 at 23:59 Group assignment Pass/Fail Confirmatory factor analysis Deadline: Wednesday 2023-10-18 at 23:59 Group assignment Pass/Fail Full structural equation modeling Deadline: Friday 2023-11-10 at 23:59 Individual assignment Comprises your entire numeric course grade The specifics of the assignments will be explicated in the Assignments chapter of this GitBook "],["attendance.html", "Attendance", " Attendance Attendance is not mandatory, but we strongly encourage you to attend all lectures and practicals. In our experience, students who actively participate tend to pass the course, whereas those who do not participate tend to drop out or fail. The lectures and practicals build on each other, so, in the unfortunate event that you have to miss a class meeting, please make sure you have caught up with the material before the next session. 
"],["assignments.html", "Assignments", " Assignments This chapter contains the details and binding information about the three assignments that comprise the portfolio upon which your course grade is based. Below, you can find a brief overview of what each assignment will cover. For each assignment, you will use R to analyze some real-world data, and you will write up your results in a concise report (not a full research paper). Guidelines for these analyses/reports are delineated in the following three sections. You will submit your reports via Blackboard. You will complete the first two assignments in your Assignment Groups. You will complete the third assignment individually. The first two assignments are graded as pass/fail. You must pass both of these assignments to pass the course. The third assignment constitutes your course grade. "],["assignment-1-path-analysis.html", "Assignment 1: Path Analysis", " Assignment 1: Path Analysis For the first assignment, you will work in groups to apply a path model that describes how several variables could be causally related. The components of the first assignment are described below. Choose a suitable dataset, and describe the data. You can use any of the 8 datasets linked below. State the research question; define and explicate the theoretical path model. This model must include, at least, three variables. Use a path diagram to show your theoretical model. Translate your theoretical path model into lavaan syntax, and estimate the model. Include the code used to define and estimate your model as an appendix. Explain your rationale for important modeling decisions. Discuss the conceptual fit between your theory and your model. Evaluate the model assumptions. Discuss other important decisions that could have influenced your results. Report the results in APA style. Provide relevant output in a suitable format. Include measures of explained variance for the dependent variables. Discuss the results. 
Use your results to answer the research question. Consider the strengths and limitations of your analysis. Evaluation See the Grading section below for more information on how Assignment 1 will be evaluated. You can access an evaluation matrix for Assignment 1 here. This matrix gives an indication of what level of work constitutes insufficient, sufficient, and excellent responses to the six components described above. Submission Assignment 1 is due at 23:59 on Wednesday 4 October 2023. Submit your report via the Assignment 1 portal on Blackboard. "],["assignment-2-confirmatory-factor-analysis.html", "Assignment 2: Confirmatory Factor Analysis", " Assignment 2: Confirmatory Factor Analysis In the second assignment, you will work in groups to run a CFA wherein the observed variables are indirect indicators of the unobserved constructs you want to analyze. The components of the second assignment are described below. Choose a suitable dataset, and describe the data. Ideally, you will work with the same data that you analyzed in Assignment 1. If you want to switch, you can use any of the 8 datasets linked below. State the research question; define and explicate the theoretical CFA model. This model must include, at least, two latent constructs. Use a path diagram to represent your model. Translate your theoretical model into lavaan syntax, and estimate the model. Include the code used to define and estimate your model as an appendix. Explain your rationale for important modeling decisions. Discuss the conceptual fit between your theory and your model. Evaluate the model assumptions. Discuss other important decisions that could have influenced your results. Report the results in APA style. Provide relevant output in a suitable format. Include measures of model fit. Discuss the results. Use your results to answer the research question. Consider the strengths and limitations of your analysis. 
Evaluation See the Grading section below for more information on how Assignment 2 will be evaluated. You can access an evaluation matrix for Assignment 2 here. This matrix gives an indication of what level of work constitutes insufficient, sufficient, and excellent responses to the six components described above. Submission Assignment 2 is due at 23:59 on Wednesday 18 October 2023. Submit your report via the Assignment 2 portal on Blackboard. "],["a3_components.html", "Assignment 3: Full Structural Equation Model", " Assignment 3: Full Structural Equation Model In the third assignment, you will work individually to apply a full SEM that describes how several (latent) variables could be causally related. The components of the third assignment are described below. Choose a suitable dataset, and describe the data. Ideally, you will work with the same data that you analyzed in Assignments 1 & 2. If you want to switch, you can use any of the 8 datasets linked below. State the research question; define and explicate the theoretical SEM. The structural component of this model must include, at least, three variables. The model must include, at least, two latent variables. Use a path diagram to represent your model. Translate your theoretical SEM into lavaan syntax, and estimate the model. Include the code used to define and estimate your model as an appendix. Explain your rationale for important modeling decisions. Discuss the conceptual fit between your theory and your model. Evaluate the model assumptions. Discuss other important decisions that could have influenced your results. Report the results in APA style. Provide relevant output in a suitable format. Include measures of model fit. Include measures of explained variance for the dependent variables. Discuss the results. Use your results to answer the research question. Consider the strengths and limitations of your analysis. 
Evaluation See the Grading section below for more information on how the component scores represented in the rubric are combined into an overall assignment grade. You can access an evaluation matrix for Assignment 3 here. This matrix gives an indication of what level of work constitutes insufficient, sufficient, and excellent responses to the six components described above. Submission Assignment 3 is due at 23:59 on Friday 10 November 2023. Submit your report via the Assignment 3 portal on Blackboard. "],["elaboration-tips.html", "Elaboration & Tips", " Elaboration & Tips Theoretical Model & Research Question You need to provide some justification for your model and research question, but only enough to demonstrate that you’ve actually conceptualized and estimated a theoretically plausible statistical model (as opposed to randomly combining variables until lavaan returns a pretty picture). You have several ways to show that your model is plausible. Use common-sense arguments. Reference (a small number of) published papers. Replicate an existing model/research question. Don’t provide a rigorous literature-supported theoretical motivation. You don’t have the time to conduct a thorough literature review, and we don’t have the time to read such reviews when grading. Literature review is not one of the learning goals for this course, so you cannot get “bonus points” for an extensive literature review. You are free to test any plausible model that meets the size requirements. You can derive your own model/research question or you can replicate a published analysis. Model Specifications We will not cover methods for modeling categorical outcome variables. So, use only continuous variables as outcomes. DVs in path models and the structural parts of SEMs Observed indicators of latent factors in CFA/SEM NOTE: You may treat ordinal items as continuous, for the purposes of these assignments. We will not cover methods for latent variable interactions. 
Don’t specify a theoretical model that requires an interaction involving a latent construct. There is one exception to the above prohibition. If the moderator is an observed grouping variable, you can estimate the model as a multiple-group model. We’ll cover these methods in Week 7. Assumptions You need to show that you’re thinking about the assumptions and their impact on your results, but you don’t need to run thorough model diagnostics. Indeed, the task of checking assumptions isn’t nearly as straightforward in path analysis, CFA, and SEM as it is in linear regression modeling. You won’t be able to directly apply the methods you have learned for regression diagnostics, for example. Since all of our models are estimated with normal-theory maximum likelihood, the fundamental assumption of all the models we’ll consider in this course boils down to the following. All random variables in my model are i.i.d. multivariate normally distributed. So, you can get by with basic data screening and checking the observed random variables in your model (i.e., all variables other than fixed predictors) for normality. Since checking for multivariate normality is a bit tricky, we’ll only ask you to evaluate univariate normality. You should do these evaluations via graphical means. To summarize, we’re looking for the following. Data Consider whether the measurement level of your data matches the assumptions of your model. Check your variables for univariate outliers. If you find any outliers, either treat them in some way or explain why you are retaining them for the analysis. Check for missing data. For the purposes of the assignment, you can use complete-case analysis to work around the missing data. If you’re up for more of a challenge, feel free to try multiple imputation or full information maximum likelihood. Model Evaluate the univariate normality of any random, observed variables in your model. 
E.g., DVs in path models, observed IVs modeled as random variables, indicators of latent factors If you fit a multiple-group model for Assignment 3, do this evaluation within groups. Use graphical tools to evaluate the normality assumption. Normal QQ-Plots Histograms Results What do we mean by reporting your results “in a suitable format”? Basically, put some effort into making your results readable, and don’t include superfluous information. Part of demonstrating that you understand the analysis is showing that you know which pieces of output convey the important information. Tabulate your results; don’t directly copy the R output. Don’t include everything lavaan gives you. Include only the output needed to understand your results and support your conclusions. "],["data_options.html", "Data", " Data Below, you can find links to a few suitable datasets that you can use for the assignments. You must use one of the following datasets. You may not choose your own data from the wild. Coping with Covid Dataset Codebook Pre-Registration Feminist Perspectives Scale Dataset Article Hypersensitive Narcissism Scale & Dirty Dozen Dataset HSNS Article DD Article Kentucky Inventory of Mindfulness Skills Dataset Article Depression Anxiety Stress Scale Dataset DASS Information Nomophobia Dataset Recycled Water Acceptance Dataset Article "],["procedures.html", "Procedures", " Procedures Formatting You must submit your assignment reports in PDF format. Each report should include a title page. The title page should include the following information: The name of the assignment. The names of all assignment authors (i.e., all group members for Assignments 1 & 2, your name for Assignment 3). The Assignment Group number (only for Assignments 1 & 2). You must include the code used to define and run your model(s) as an appendix. Try to format the text in this appendix clearly. Use a monospace font. 
Length You may use as many words as necessary to adequately explain yourself, though concision and parsimony are encouraged. Note that the assignments are not intended to be full-blown papers! The focus should be on the definition of your model, how this model relates to theory (introduction), and what you have learned from your estimated model (discussion). For each of the assignments, you should be able to get the job done in fewer than 10 pages of text (excluding title page, figures, appendices, and references). Submission You will submit your reports through Blackboard. Each assignment has a corresponding item in the “Assignments” section of the BB page through which you will submit your reports. For Assignments 1 & 2, you may only submit one report per group. Designate one group member to submit the report. The grade for this submission will apply to all group members. If something goes wrong with the submission, or you notice a mistake (before the deadline) that you want to correct, you may upload a new version of your report. We will grade the final submitted version. The submissions will be screened with Ouriginal. "],["grading-1.html", "Grading", " Grading Group Assignments Assignments 1 & 2 are simply graded as pass/fail. To pass, your submission must: Do a reasonable job of addressing the relevant components listed above Be submitted before the deadline Otherwise, you will fail the assignment. Individual Assignment Assignment 3 will be fully graded on the usual 10-point scale. Points will be allocated according to the extent to which your submission addresses the six components listed above. The evaluation matrix gives an indication of how these points will be apportioned. Further details on the grading procedures for Assignment 3 (e.g., exactly how your 10-point grade will be defined) will be provided at a later date. Assuming your group passes the first two assignments, your final course grade will simply be your Assignment 3 grade. 
Resits You must get a “pass” for Assignments 1 & 2 and score at least 5.5 on Assignment 3 to pass the course. If you fail any of the assignments, you will have the opportunity to resit the failed assignment(s). If you resit Assignment 3, your revised grade cannot be higher than 6. Further details on the resit procedure will be provided at a later date. Example Assignment You can find an example of a good submission (for an older version of Assignment 2) here. This example is not perfect (no paper ever is), and several points could be improved. That being said, this submission exemplifies what we’re looking for in your project reports. So, following the spirit of this example would earn you a high grade. "],["rules.html", "Rules", " Rules Resources For all three assignments, you may use any reference materials you like, including: All course materials The course GitBook Additional books and papers The internet Collaboration You will complete the first two assignments in groups. Although you will work in groups, your group may not work together with other groups. You will complete the final assignment individually. For this assignment, you may not work with anyone else. For all three assignments, you are obligated to submit original work (i.e., work conducted for this course by you or your group). Submitting an assignment that violates this condition constitutes fraud. Such cases of fraud will be addressed according to the University’s standard policy. Academic integrity Hopefully, you also feel a moral obligation to obey the rules. For this course, we have implemented an examination that allows you to showcase what you have learned in a more realistic way than a written exam would allow. This assessment format spares you the stress of long exams (the two exams for this course used to be 4 hours each) and the attendant studying/cramming. 
The assignments will also help you assess your ability to independently analyse data, which is important to know for your future courses and/or career. However, this format also assumes that you complete the assignments in good faith. So, I simply ask that you hold up your end of the bargain and submit your original work to show us what you’ve learned. Strict stuff By submitting your assignments (both group and individual), you confirm the following: You have completed the assignment yourself (or with your group) You are submitting work that you have written yourself (or with your group) You are using your own UU credentials to submit the assignment You have not had outside help that violates the conditions delineated above while completing the assignment All assignments will be submitted via Ouriginal in Blackboard and, thereby, checked for plagiarism. If fraud or plagiarism is detected or suspected, we will inform the Board of Examiners in the usual manner. In the event of demonstrable fraud, the sanctions delineated in Article 5.15 of the Education and Examination Regulations (EER) will apply. "],["software-setup.html", "Software Setup", " Software Setup This chapter will help you prepare for the course by showing how to install R and RStudio on your computer. If you’re already using R, there may be nothing new for you here. That being said, you should look over this chapter to ensure that your current setup will be compatible with the course requirements. If you have never used R before, this chapter is essential! The information in this chapter will be crucial for getting your computer ready for the course. 
"],["typographic-conventions.html", "Typographic Conventions", " Typographic Conventions Throughout this GitBook, we (try to) use a consistent set of typographic conventions: Functions are typeset in a code font, and the name of the function is always followed by parentheses E.g., sum(), mean() Other R objects (e.g., data objects, function arguments) are also typeset in a code font but without parentheses E.g., seTE, method.tau Sometimes, we’ll use the package name followed by two colons (::, the so-called scope-resolution operator), like lavaan::sem(). This notation is valid R code, so you can copy such commands directly into your R console. The lavaan:: part of the command tells R that we want to use the sem() function from the lavaan package. "],["installing-software.html", "Installing software", " Installing software Before we start the course, we have to install three things: R: A free program for statistical programming RStudio: An integrated development environment (IDE) that makes it easier to work with R. Several packages: Separate pieces of ‘add-on’ software for R with functions to do specific analyses. Packages also include documentation describing how to use their functions, and they often include sample data. Installing R The latest version of R is available here. Click the appropriate link for your operating system and follow the instructions for installing the latest stable release. Depending on which OS you select, you may be given an option to install different components (e.g., base, contrib, Rtools). For this course, you will only need the base component. Installing RStudio Download the Free Desktop version of RStudio from the download page of the RStudio website. Installing packages To participate in this course, you will need a few essential R packages. 
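If you already use R, some of these packages may already be on your system. The base-R code below is a minimal sketch, our own suggestion and not part of the official setup steps, that reports which of the nine course packages R cannot currently load. It uses the same nine packages as the overview and the install.packages() call in this section. The helper name missing_pkgs() is hypothetical. requireNamespace() and vapply() are standard base-R functions.

```r
# Course packages (same list as the install.packages() call in this section)
course_pkgs <- c('lavaan', 'dplyr', 'ggplot2', 'tidySEM', 'semTools',
                 'psych', 'rockchalk', 'foreign', 'readxl')

# Hypothetical helper: return the packages that R cannot load.
# requireNamespace() returns TRUE or FALSE instead of throwing an error,
# and it loads a package's namespace without attaching it.
missing_pkgs <- function(pkgs) {
  can_load <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!can_load]
}

# Report which version of R is running
message(R.version.string)

todo <- missing_pkgs(course_pkgs)
if (length(todo) > 0) {
  message('Missing or not loadable: ', paste(todo, collapse = ', '))
  # To install only the missing packages, uncomment the next line:
  # install.packages(todo, dependencies = TRUE)
} else {
  message('All course packages can be loaded.')
}
```

The install line is commented out on purpose. Running the check does not change your library. Any packages it reports can then be installed with install.packages(), as described in this section.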
Here’s an overview of the packages and why we need them: Package Description lavaan A sophisticated and user-friendly package for structural equation modeling dplyr A powerful suite of data-processing tools ggplot2 A flexible and user-friendly package for making graphs tidySEM Plotting and tabulating the output of SEM models semTools Comparing models, establishing measurement invariance across groups psych Descriptive statistics and EFA rockchalk Probing interactions foreign Loading data from SPSS ‘.sav’ files readxl Loading data from Excel ‘.xlsx’ files To install these packages, we use the install.packages() function in R. Open RStudio Inside RStudio, find the window named Console on the left side of the screen. Copy the following code into the console and hit Enter/Return to run the command. install.packages(c("lavaan", "dplyr", "ggplot2", "tidySEM", "semTools", "psych", "rockchalk", "foreign", "readxl"), dependencies = TRUE) "],["course-data.html", "Course Data", " Course Data All of the data files you will need for the course are available in this SurfDrive directory. Follow the link to download a ZIP archive containing the data you will need to complete the practical exercises and assignments. Extract these data files to a convenient location on your computer. "],["note-on-data-updates.html", "Note on Data Updates", " Note on Data Updates During the course, we may need to update some of these datasets and/or add some new datasets to the SurfDrive directory. If so, you will need to download the updated data. We will let you know if and when any datasets are modified. In such situations, you are responsible for updating your data. Working with outdated data will probably produce incorrect results. Your answer won’t match the solutions we expect. Your answer will be marked as incorrect, even if the code used to produce the answer is correct. Points lost on an assignment due to using outdated datasets will not be returned. 
"],["introduction-to-r.html", "1 Introduction to R", " 1 Introduction to R This week is all about getting up-and-running with R and RStudio. Homework before the lecture Complete the preparatory material: Read over the Course Information chapter Work through the Software Setup chapter Watch the Lecture Recording for this week. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. "],["lecture.html", "1.1 Lecture", " 1.1 Lecture This week, you will learn the basics of R and RStudio. Rather than reinventing the proverbial wheel, we’ve linked to existing resources developed by R-Ladies Sydney. 1.1.1 Recordings Tour of RStudio R Packages Data I/O 1.1.2 Slides You can access the accompanying resources on the R-Ladies Sydney website here. "],["reading.html", "1.2 Reading", " 1.2 Reading There is no official reading this week. If you’d like to dive deeper into R, feel free to check out Hadley Wickham’s excellent book R for Data Science. Otherwise, you may want to get a jump-start on the At-Home Exercises for this week. "],["at-home-exercises.html", "1.3 At-Home Exercises", " 1.3 At-Home Exercises This week is all about gaining familiarity with R and RStudio. We’ll be using the primers available on Posit Cloud to work through some basic elements of data visualization and statistical programming in R. Although you should already have R working, this week’s at-home and in-class exercises don’t require that you have R installed on your system. If following along within this GitBook doesn’t work for you, you can also find the tutorials online on the Posit Primers page. 1.3.1 Visualizations with R 1.3.2 Programming with R End of At-Home Exercises "],["in-class-exercises.html", "1.4 In-Class Exercises", " 1.4 In-Class Exercises In the practical this week, we’ll go a little further into what’s possible with R.
Don’t worry if you cannot remember everything in these primers; they’re only meant to familiarize you with what is possible and to get you some experience interacting with R and RStudio. The following primers come from Posit Cloud and were created with the learnr package. 1.4.1 Viewing Data This first primer introduces a special data format called a tibble, as well as some functions for viewing your data. 1.4.2 Dissecting Data In the next primer, we’ll explore tools to subset and rearrange your data: select(), filter(), and arrange(). 1.4.3 Grouping and Manipulating Data Advanced If you made it through the previous two sections with ease and want to challenge yourself, go ahead with this next section. If you’re running short on time, you can skip ahead to Exploratory Data Analysis. 1.4.4 Exploratory Data Analysis 1.4.5 Visualizing Data Visualizing data is a great way to start understanding a data set. In this section, we’ll highlight a few examples of how you can use the ggplot2 package to visualize your data. Primers on many other visualizations are available on Posit Cloud. Bar Charts for Categorical Variables Scatterplots for Continuous Variables 1.4.6 Tidying Data This primer will provide an overview of what’s meant by “tidy data”. You only need to complete the Tidy Data section; the sections on Gathering and Spreading columns are useful, but we won’t ask you to apply those techniques in this course. Recap Hopefully, you now feel more comfortable using some of R’s basic functionality and packages to work with data. 
Here’s a brief description of the functions covered above: install.packages() for installing packages Remember to put the package names in quotes library() for loading packages View() for viewing your dataset select() for picking only certain columns filter() for picking only certain rows arrange() for changing the row order %>% aka “the pipe” for chaining commands together In RStudio, you can hit ctrl+shift+m as a handy key combination ? for help files Logical tests and Boolean operators == equal to != not equal to < less than <= less than or equal to > greater than >= greater than or equal to is.na() is the value NA (not available) !is.na() is the value not NA & and (true only if the left and right are both true) | or (true if either the left or the right is true) ! not (invert true/false) %in% in (is each left value in the set of right values) any() any (true if any in the set are true) all() all (true if all in the set are true) xor() exclusive or (true if exactly one of the left and right is true) ggplot2 ggplot() create the basic object from which to build a plot aes() contains the aesthetic mappings (like x and y) geom_bar() bar plots for distributions of categorical variables geom_point() scatterplots for plotting two continuous variables geom_label_repel() for plotting non-overlapping text labels (from the ggrepel package, not ggplot2 itself) facet_wrap() for creating sets of conditional plots End of In-Class Exercises "],["statistical-modeling-path-analysis.html", "2 Statistical Modeling & Path Analysis", " 2 Statistical Modeling & Path Analysis This week, we will cover statistical modeling and path analysis. Homework before the lecture Watch the Lecture Recording for this week. Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. 
"],["lecture-1.html", "2.1 Lecture", " 2.1 Lecture In this lecture, we will begin by discussing the paradigm and contextualizing statistical modeling relative to other ways that we can conduct statistical analyses. We will conclude with an introduction to path analysis. 2.1.1 Recordings Statistical Reasoning Statistical Modeling Path Analysis 2.1.2 Slides You can download the lecture slides here "],["reading-1.html", "2.2 Reading", " 2.2 Reading Reference Smaldino, P. E. (2017). Models are stupid, and we need more of them. In R. R. Vallacher, S. J. Read, & A. Nowak (Eds.), Computational Social Psychology (pp. 311–331). New York: Routledge. Skip pages 322–327. Questions What are the differences between a “verbal model” and a “formal model”? As explained in the paragraph “A Brief Note on Statistical Models”, formal models are not the same as statistical models. Still, we can learn a lot from Smaldino’s approach. Write down three insights from this paper that you would like to apply to your statistical modeling during this course. Suggested Reading (Optional) The following paper is not required, but it’s definitely worth a read. Breiman provides a very interesting perspective on different ways to approach a modeling-based analysis. Breiman, L. (2001). Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231. https://doi.org/10.1214/ss/1009213726 "],["at-home-exercises-1.html", "2.3 At-Home Exercises", " 2.3 At-Home Exercises Load the LifeSat.sav data. library(dplyr) library(haven) LifeSat <- read_spss("LifeSat.sav") 2.3.1 Make a table of descriptive statistics for the variables: LifSat, educ, ChildSup, SpouSup, and age. What is the average age in the sample? What is the range (youngest and oldest participant)? Hint: Use the tidySEM::descriptives() function. Click for explanation The package tidySEM contains the descriptives() function for computing descriptive statistics. 
The describe() function in the psych package is a good alternative. library(tidySEM) descriptives(LifeSat[ , c("LifSat", "educ", "ChildSup", "SpouSup", "age")]) 2.3.2 Run a simple linear regression with LifSat as the dependent variable and educ as the independent variable. Hints: The lm() function (short for linear model) does linear regression. The summary() function provides relevant summary statistics for the model. It can be helpful to store the results of your analysis in an object. Click for explanation results <- lm(LifSat ~ educ, data = LifeSat) summary(results) ## ## Call: ## lm(formula = LifSat ~ educ, data = LifeSat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -43.781 -11.866 2.018 12.418 43.018 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 35.184 7.874 4.469 2.15e-05 *** ## educ 3.466 1.173 2.956 0.00392 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 17.64 on 96 degrees of freedom ## Multiple R-squared: 0.08344, Adjusted R-squared: 0.0739 ## F-statistic: 8.74 on 1 and 96 DF, p-value: 0.003918 2.3.3 Repeat the analysis from 2.3.2 with age as the independent variable. Click for explanation results <- lm(LifSat ~ age, data = LifeSat) summary(results) ## ## Call: ## lm(formula = LifSat ~ age, data = LifeSat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -35.321 -14.184 3.192 13.593 40.626 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 200.2302 52.1385 3.840 0.00022 *** ## age -2.0265 0.7417 -2.732 0.00749 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 17.75 on 96 degrees of freedom ## Multiple R-squared: 0.07215, Adjusted R-squared: 0.06249 ## F-statistic: 7.465 on 1 and 96 DF, p-value: 0.007487 2.3.4 Repeat the analysis from 2.3.2 and 2.3.3 with ChildSup as the independent variable. 
Click for explanation results <- lm(LifSat ~ ChildSup, data = LifeSat) summary(results) ## ## Call: ## lm(formula = LifSat ~ ChildSup, data = LifeSat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -37.32 -12.14 0.66 12.41 44.68 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 37.559 8.342 4.502 1.89e-05 *** ## ChildSup 2.960 1.188 2.492 0.0144 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 17.86 on 96 degrees of freedom ## Multiple R-squared: 0.06076, Adjusted R-squared: 0.05098 ## F-statistic: 6.211 on 1 and 96 DF, p-value: 0.01441 2.3.5 Run a multiple linear regression with LifSat as the dependent variable and educ, age, and ChildSup as the independent variables. Hint: You can use the + sign to add multiple variables to the RHS of your model formula. Click for explanation results <- lm(LifSat ~ educ + age + ChildSup, data = LifeSat) summary(results) ## ## Call: ## lm(formula = LifSat ~ educ + age + ChildSup, data = LifeSat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -32.98 -12.56 2.68 11.03 41.91 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 134.9801 53.2798 2.533 0.0130 * ## educ 2.8171 1.1436 2.463 0.0156 * ## age -1.5952 0.7188 -2.219 0.0289 * ## ChildSup 2.4092 1.1361 2.121 0.0366 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 16.92 on 94 degrees of freedom ## Multiple R-squared: 0.1741, Adjusted R-squared: 0.1477 ## F-statistic: 6.603 on 3 and 94 DF, p-value: 0.0004254 2.3.6 Compare the results from 2.3.5 with those from 2.3.2, 2.3.3, and 2.3.4. What do you notice when you compare the estimated slopes for each of the three predictors in the multiple regression model with the corresponding estimates from the simple regression models? 
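The comparison in 2.3.6 can be illustrated with simulated data. The sketch below uses hypothetical data, not the LifeSat file. Two predictors, x1 and x2, are positively correlated (r = 0.6), and each has a true partial slope of 0.5. When either predictor is used alone, its slope also absorbs part of the other predictor's effect. As a result, the simple slope is larger than the partial slope from the multiple regression, which estimates each effect while holding the other predictor constant. With a negative correlation between the predictors, or with effects of opposite sign, the distortion runs the other way.

```r
set.seed(123)  # for reproducibility
n <- 1000

# Hypothetical simulated data: x1 and x2 both have unit variance and correlate at 0.6
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n, sd = 0.8)
# True (partial) slopes: 0.5 for each predictor
y <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Slopes from two separate simple regressions
b_simple <- c(x1 = unname(coef(lm(y ~ x1, data = dat))['x1']),
              x2 = unname(coef(lm(y ~ x2, data = dat))['x2']))

# Slopes from one multiple regression with both predictors
b_multiple <- coef(lm(y ~ x1 + x2, data = dat))[c('x1', 'x2')]

# In the population, each simple slope is 0.5 + 0.5 * 0.6 = 0.8,
# whereas each partial slope is 0.5
round(cbind(simple = b_simple, multiple = b_multiple), 3)
```

If you change the 0.6 in the line that creates x2 to 0, the predictors become uncorrelated. In that case, the simple and multiple slopes should agree apart from sampling error.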
"],["in-class-exercises-1.html", "2.4 In-Class Exercises", " 2.4 In-Class Exercises During this practical, you will work through some exercises meant to expand your statistical reasoning skills and improve your understanding of linear models. For this exercise, having some familiarity with regression will be helpful. If you feel like you need to refresh your knowledge in this area, consider the resources listed in the Background knowledge section. Data: You will use the following dataset for these exercises. Sesam.sav 2.4.1 Data Exploration Open the file “Sesam.sav” # Load `dplyr` for data processing: library(dplyr) # Load the `haven` library for reading in SPSS files: library(haven) ## Load the 'Sesam.sav' data ## Use haven::zap_formats() to remove SPSS attributes sesam <- read_sav(file = "Sesam.sav") %>% zap_formats() This file is part of a larger dataset that evaluates the impact of the first year of the Sesame Street television series. Sesame Street is mainly concerned with teaching preschool related skills to children in the 3–5 year age range. The following variables will be used in this exercise: age: measured in months prelet: knowledge of letters before watching Sesame Street (range 0–58) prenumb: knowledge of numbers before watching Sesame Street (range 0–54) prerelat: knowledge of size/amount/position relationships before watching Sesame Street (range 0–17) peabody: vocabulary maturity before watching Sesame Street (range 20–120) postnumb: knowledge of numbers after a year of Sesame Street (range 0–54) Note: Unless stated otherwise, the following questions refer to the sesam data and the above variables. 2.4.1.1 What is the type of each variable? Hint: The output of the str() function should be helpful here. Click to show code ## Examine the data structure: str(sesam) ## tibble [240 × 8] (S3: tbl_df/tbl/data.frame) ## $ id : num [1:240] 1 2 3 4 5 6 7 8 9 10 ... ## $ age : num [1:240] 66 67 56 49 69 54 47 51 69 53 ... 
## $ prelet : num [1:240] 23 26 14 11 47 26 12 48 44 38 ... ## $ prenumb : num [1:240] 40 39 9 14 51 33 13 52 42 31 ... ## $ prerelat: num [1:240] 14 16 9 9 17 14 11 15 15 10 ... ## $ peabody : num [1:240] 62 80 32 27 71 32 28 38 49 32 ... ## $ postnumb: num [1:240] 44 39 40 19 54 39 44 51 48 52 ... ## $ gain : num [1:240] 4 0 31 5 3 6 31 -1 6 21 ... ## ..- attr(*, "display_width")= int 10 Click for explanation All variables are numeric. str() uses the abbreviation “num” to indicate a numeric vector. 2.4.1.2 What is the average age in the sample? What is the age range (youngest and oldest child)? Hint: Use tidySEM::descriptives() Click to show code As in the at-home exercises, you can use the descriptives() function from the tidySEM package to describe the data: library(tidySEM) descriptives(sesam) Click for explanation We can get the average age from the “mean” column in the table (51.5), and the age range from the “min” and “max” columns (34 and 69 months, respectively). 2.4.1.3 What is the average gain in knowledge of numbers? What is the standard deviation of this gain? Hints: You will need to compute the gain and save the change score as a new variable. You can then use the base-R functions mean() and sd() to do the calculations. Click to show code Create a new variable that represents the difference between pre- and post-test scores on knowledge of numbers: sesam <- mutate(sesam, ndif = postnumb - prenumb) Compute the mean and SD of the change score: sesam %>% summarise(mean(ndif), sd(ndif)) 2.4.1.4 Create an appropriate visualization of the gain scores you computed in 2.4.1.3. Justify your choice of visualization. Hint: Some applicable visualizations are explained in the Visualizations with R section. 
Click to show code library(ggplot2) ## Create an empty baseline plot object: p <- ggplot(sesam, aes(x = ndif)) ## Add some appropriate geoms: p + geom_histogram() p + geom_density() p + geom_boxplot() Click for explanation Because the gain score is numeric, we should use something appropriate for showing the distribution of a continuous variable. In this case, we can use either a density plot or a histogram (remember from the lecture: a histogram is like a binned version of a density plot). We can also use a box plot, which can be a concise way to display a lot of information about a variable in a little less space. 2.4.1.5 Create a visualization that provides information about the bivariate relationship between the pre- and post-test number knowledge. Justify your choice of visualization. Describe the relationship based on what you see in your visualization. Hint: Again, the Visualizations with R section may provide some useful insights. Click to show code ## Create a scatterplot of the pre- and post-test number knowledge ggplot(sesam, aes(x = prenumb, y = postnumb)) + geom_point() Click for explanation A scatterplot is a good tool for showing patterns in the way that two continuous variables relate to each other. From it, we can quickly gather information about whether a relationship exists, its direction, its strength, how much variation there is, and whether or not a relationship might be non-linear. Based on this scatterplot, we see a positive relationship between the prior knowledge of numbers and the knowledge of numbers at the end of the study. Children who started with a higher level of numeracy also tended to end with a higher level of numeracy. There is, however, considerable scatter around this trend. Not every child increases their numeracy between pre-test and post-test, and children show differing amounts of increase. 2.4.2 Linear Modeling 2.4.2.1 Are there significant, bivariate associations between postnumb and the following variables? 
age prelet prenumb prerelat peabody Use Pearson correlations to answer this question. You do not need to check the assumptions here (though you would in real life). Hint: The base-R cor.test() function and the corr.test() function from the psych package will both conduct hypothesis tests for correlation coefficients (the base-R cor() function only computes the coefficients). Click to show code library(psych) ## Test the correlations using psych::corr.test(): sesam %>% select(postnumb, age, prelet, prenumb, prerelat, peabody) %>% corr.test() ## Call:corr.test(x = .) ## Correlation matrix ## postnumb age prelet prenumb prerelat peabody ## postnumb 1.00 0.34 0.50 0.68 0.54 0.52 ## age 0.34 1.00 0.33 0.43 0.44 0.29 ## prelet 0.50 0.33 1.00 0.72 0.47 0.40 ## prenumb 0.68 0.43 0.72 1.00 0.72 0.61 ## prerelat 0.54 0.44 0.47 0.72 1.00 0.56 ## peabody 0.52 0.29 0.40 0.61 0.56 1.00 ## Sample Size ## [1] 240 ## Probability values (Entries above the diagonal are adjusted for multiple tests.) ## postnumb age prelet prenumb prerelat peabody ## postnumb 0 0 0 0 0 0 ## age 0 0 0 0 0 0 ## prelet 0 0 0 0 0 0 ## prenumb 0 0 0 0 0 0 ## prerelat 0 0 0 0 0 0 ## peabody 0 0 0 0 0 0 ## ## To see confidence intervals of the correlations, print with the short=FALSE option ## OR ## library(magrittr) ## Test the correlations using multiple cor.test() calls: sesam %$% cor.test(postnumb, age) ## ## Pearson's product-moment correlation ## ## data: postnumb and age ## t = 5.5972, df = 238, p-value = 5.979e-08 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.2241066 0.4483253 ## sample estimates: ## cor ## 0.3410578 sesam %$% cor.test(postnumb, prelet) ## ## Pearson's product-moment correlation ## ## data: postnumb and prelet ## t = 8.9986, df = 238, p-value < 2.2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.4029239 0.5926632 ## sample estimates: ## cor ## 0.5038464 sesam %$% 
cor.test(postnumb, prenumb) ## ## Pearson's product-moment correlation ## ## data: postnumb and prenumb ## t = 14.133, df = 238, p-value < 2.2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.6002172 0.7389277 ## sample estimates: ## cor ## 0.6755051 sesam %$% cor.test(postnumb, prerelat) ## ## Pearson's product-moment correlation ## ## data: postnumb and prerelat ## t = 9.9857, df = 238, p-value < 2.2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.4475469 0.6268773 ## sample estimates: ## cor ## 0.5433818 sesam %$% cor.test(postnumb, peabody) ## ## Pearson's product-moment correlation ## ## data: postnumb and peabody ## t = 9.395, df = 238, p-value < 2.2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.4212427 0.6067923 ## sample estimates: ## cor ## 0.520128 Click for explanation Yes, based on the p-values (remember that 0 here really means very small, making it less than .05), we would say that there are significant correlations between postnumb and all other variables in the data. (In fact, all variables in the data are significantly correlated with one another.) 2.4.2.2 Do age and prenumb explain a significant proportion of the variance in postnumb? What statistic did you use to justify your conclusion? Interpret the model fit. Use the lm() function to fit your model. Click to show code lmOut <- lm(postnumb ~ age + prenumb, data = sesam) summary(lmOut) ## ## Call: ## lm(formula = postnumb ~ age + prenumb, data = sesam) ## ## Residuals: ## Min 1Q Median 3Q Max ## -38.130 -6.456 -0.456 5.435 22.568 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 7.4242 5.1854 1.432 0.154 ## age 0.1225 0.1084 1.131 0.259 ## prenumb 0.7809 0.0637 12.259 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 9.486 on 237 degrees of freedom ## Multiple R-squared: 0.4592, Adjusted R-squared: 0.4547 ## F-statistic: 100.6 on 2 and 237 DF, p-value: < 2.2e-16 Click for explanation Yes, age and prenumb explain a significant amount of variability in postnumb (\\(R^2 = 0.459\\), \\(F[2, 237] = 100.629\\), \\(p < 0.001\\)). We use the F statistic for the overall test of model fit to support this conclusion. The variables age and prenumb together explain 45.9% of the variability in postnumb. 2.4.2.3 Write the null and alternative hypotheses tested in 2.4.2.2. Click for explanation Since we are testing for explained variance, our hypotheses concern the \\(R^2\\). \\[ \\begin{align*} H_0: R^2 = 0\\\\ H_1: R^2 > 0 \\end{align*} \\] Note that this is a directional hypothesis because the \\(R^2\\) cannot be negative. 2.4.2.4 Define the model syntax to estimate the model from 2.4.2.2 as a path analysis using lavaan. Click to show code mod <- 'postnumb ~ 1 + age + prenumb' 2.4.2.5 Estimate the path analytic model you defined above. Use the lavaan::sem() function to estimate the model. Click to show code library(lavaan) lavOut1 <- sem(mod, data = sesam) 2.4.2.6 Summarize the fitted model you estimated above. Use the summary() function to summarize the model. 
Click to show code summary(lavOut1) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 4 ## ## Number of observations 240 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## postnumb ~ ## age 0.123 0.108 1.138 0.255 ## prenumb 0.781 0.063 12.336 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 7.424 5.153 1.441 0.150 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 88.864 8.112 10.954 0.000 In OLS regression, the predictor variables are usually treated as fixed: their variances and covariances are taken as given rather than estimated as model parameters. In path analysis, we can easily relax this assumption. 2.4.2.7 Re-estimate the path analytic model you defined in 2.4.2.4. Specify the predictors as random, correlated variables. Hint: You can make the predictors random in at least two ways: Modify the model syntax to specify the covariance between age and prenumb. Add fixed.x = FALSE to your sem() call. Click to show code lavOut2 <- sem(mod, data = sesam, fixed.x = FALSE) ## OR ## mod <- ' postnumb ~ 1 + age + prenumb age ~~ prenumb ' lavOut2 <- sem(mod, data = sesam) 2.4.2.8 Summarize the fitted model you estimated above. Compare the results to those from the OLS regression in 2.4.2.2 and the path model in 2.4.2.5. 
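When you compare the outputs, you should notice that the lm() and sem() estimates of the regression coefficients agree, but the standard errors and the residual variance differ slightly. A likely explanation, consistent with the numbers in these outputs, is the estimator: OLS divides the residual sum of squares by \\(n - p\\) (where \\(p\\) is the number of coefficients, including the intercept), whereas normal-theory ML divides by \\(n\\). The minimal sketch below does not call lavaan; it simulates hypothetical data (the values are made up, only the variable names mimic sesam) and computes the ML quantities by hand to show that the two sets of standard errors differ exactly by the factor \\(\\sqrt{(n - p) / n}\\).

```r
## Hypothetical data, simulated only to illustrate the estimator difference
set.seed(123)
n        <- 240
age      <- rnorm(n, mean = 51, sd = 6)
prenumb  <- 20 + 0.7 * (age - 51) + rnorm(n, sd = 10)
postnumb <- 7 + 0.1 * age + 0.8 * prenumb + rnorm(n, sd = 9.5)

fit <- lm(postnumb ~ age + prenumb)
X   <- model.matrix(fit)          # design matrix, including the intercept column
p   <- ncol(X)                    # number of regression coefficients (3)
rss <- sum(residuals(fit)^2)      # residual sum of squares

sigma2_ols <- rss / (n - p)       # unbiased OLS residual variance (summary(fit)$sigma^2)
sigma2_ml  <- rss / n             # ML residual variance

## Standard errors of the coefficients under each estimator
se_ols <- unname(summary(fit)$coefficients[, 'Std. Error'])
se_ml  <- unname(sqrt(diag(sigma2_ml * solve(crossprod(X)))))

## The ML SEs equal the OLS SEs rescaled by sqrt((n - p) / n)
all.equal(se_ml, se_ols * sqrt((n - p) / n))
```

Applied to the outputs above: \\(9.486^2 \\times 237 / 240 \\approx 88.86\\), which matches the lavaan residual variance of 88.864, and \\(0.1084 \\times \\sqrt{237 / 240} \\approx 0.108\\) and \\(0.0637 \\times \\sqrt{237 / 240} \\approx 0.063\\) reproduce the lavaan standard errors for age and prenumb.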
Click to show code summary(lavOut2) ## lavaan 0.6.16 ended normally after 26 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 9 ## ## Number of observations 240 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## postnumb ~ ## age 0.123 0.108 1.138 0.255 ## prenumb 0.781 0.063 12.336 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## age ~~ ## prenumb 28.930 4.701 6.154 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 7.424 5.153 1.441 0.150 ## age 51.525 0.405 127.344 0.000 ## prenumb 20.896 0.688 30.359 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 88.864 8.112 10.954 0.000 ## age 39.291 3.587 10.954 0.000 ## prenumb 113.702 10.379 10.954 0.000 summary(lavOut1) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 4 ## ## Number of observations 240 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## postnumb ~ ## age 0.123 0.108 1.138 0.255 ## prenumb 0.781 0.063 12.336 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 7.424 5.153 1.441 0.150 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .postnumb 88.864 8.112 10.954 0.000 summary(lmOut) ## ## Call: ## lm(formula = postnumb ~ age + prenumb, data = sesam) ## ## Residuals: ## Min 1Q Median 3Q Max ## -38.130 -6.456 -0.456 5.435 22.568 ## ## Coefficients: ## Estimate Std. 
Error t value Pr(>|t|) ## (Intercept) 7.4242 5.1854 1.432 0.154 ## age 0.1225 0.1084 1.131 0.259 ## prenumb 0.7809 0.0637 12.259 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 9.486 on 237 degrees of freedom ## Multiple R-squared: 0.4592, Adjusted R-squared: 0.4547 ## F-statistic: 100.6 on 2 and 237 DF, p-value: < 2.2e-16 2.4.2.9 Consider the path model below. How many regression coefficients are estimated in this model? How many variances are estimated? How many covariances are estimated? Click for explanation Six regression coefficients (red) Four (residual) variances (blue) No covariances 2.4.2.10 Consider a multiple regression analysis with three continuous independent variables (scores on tests of language, history, and logic) and one continuous dependent variable (score on a math test). We want to know if scores on the language, history, and logic tests can predict the math test score. Sketch a path model that you could use to answer this question. How many regression parameters are there? How many variances could you estimate? How many covariances could you estimate? 2.4.3 Categorical IVs Load the Drivers.sav data. library(haven) # Read the data into a data frame named 'drivers': drivers <- read_sav("Drivers.sav") %>% as_factor() # This preserves the SPSS labels for nominal variables In this section, we will evaluate the following research question: Does talking on the phone interfere with people's driving skills? These data come from an experiment. The condition variable represents the three experimental conditions: Hand-held phone Hands-free phone Control (no phone) We will use condition as the IV in our models. The DV, RT, represents the participant’s reaction time (in milliseconds) during a driving simulation. 2.4.3.1 Use the package ggplot2 to create a density plot for the variable RT. What concept are we representing with this plot? 
Hint: Consider the lap times example from the statistical modeling section of Lecture 2. Click to show code library(ggplot2) ggplot(drivers, aes(x = RT)) + geom_density() Click for explanation This shows the distribution of all the combined reaction times from drivers in all three categories. 2.4.3.2 Modify this density plot by mapping the variable condition from your data to the fill aesthetic in ggplot. What is the difference between this plot and the previous plot? Do you think there is evidence for differences between the groups? How might we test this by fitting a model to our sample? Click to show code Hint: To modify the transparency of the densities, use the aesthetic alpha. ggplot(drivers, aes(x = RT, fill = condition)) + geom_density(alpha = .5) Click for explanation This figure shows the conditional distribution of reaction time, where the type of cell phone usage is the grouping factor. To visually assess whether the three groups differ, you can look at the amount of overlap between the distributions, the distance between the group means, and whether the combined distribution is much different from the conditional distributions. If we are willing to assume that these conditional distributions are normal and have equal variances, we could use a linear model with dummy-coded predictors. Aside: ANOVA vs. Linear Regression As you may know, the mathematical model underlying ANOVA is just a linear regression model with nominal IVs. So, in terms of the underlying statistical models, there is no difference between ANOVA and regression; the differences lie in the focus of the analysis. ANOVA is really a type of statistical test wherein we are testing hypotheses about the effects of some set of nominal grouping factors on some continuous outcome. When doing an ANOVA, we usually don’t interact directly with the parameter estimates from the underlying model. 
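A quick way to convince yourself that ANOVA and regression share the same underlying model is to fit one dataset both ways. In the minimal sketch below, the data are hypothetical (simulated; only the group labels mimic the drivers data): lm() and aov() return the same dummy-coded coefficients, and the omnibus F statistic is identical whether we read it from the regression summary, from anova() applied to the lm fit, or from the aov summary.

```r
## Hypothetical three-group data (not the drivers data)
set.seed(42)
condition <- factor(rep(c('control', 'hand-held', 'hands-free'), each = 20),
                    levels = c('control', 'hand-held', 'hands-free'))
RT <- 550 + c(0, 100, 60)[as.integer(condition)] + rnorm(60, sd = 130)

## One linear model, fitted as a regression and as an ANOVA
fitLm  <- lm(RT ~ condition)
fitAov <- aov(RT ~ condition)

## Same parameter estimates (dummy coding, 'control' as the reference group)
all.equal(coef(fitLm), coef(fitAov))

## Same omnibus F test, whether read from the regression summary,
## from the anova() table of the lm fit, or from the aov summary
fReg   <- unname(summary(fitLm)$fstatistic['value'])
fAnova <- anova(fitLm)['condition', 'F value']
fAov   <- summary(fitAov)[[1]][['F value']][1]
c(fReg = fReg, fAnova = fAnova, fAov = fAov)
```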
Regression is a type of statistical model (i.e., a way to represent a univariate distribution with a conditional mean and fixed variance). When we do a regression analysis, we primarily focus on the estimated parameters of the underlying linear model. When doing ANOVA in R, we estimate the model exactly as we would for linear regression; we simply summarize the results differently. If you want to summarize your model in terms of the sums of squares table you usually see when running an ANOVA, you can supply your fitted lm object to the anova() function. This is a statistical modeling course, not a statistical testing course, so we will not consider ANOVA any further. 2.4.3.3 Estimate a linear model that will answer the research question stated at the beginning of this section. Use lm() to estimate the model. Summarize the fitted model and use the results to answer the research question. Click to show code library(dplyr) library(magrittr) lmOut <- drivers %>% mutate(condition = relevel(condition, ref = "control")) %$% lm(RT ~ condition) summary(lmOut) ## ## Call: ## lm(formula = RT ~ condition) ## ## Residuals: ## Min 1Q Median 3Q Max ## -317.50 -71.25 2.98 89.55 243.45 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 553.75 29.08 19.042 <2e-16 *** ## conditionhand-held 100.75 41.13 2.450 0.0174 * ## conditionhands-free 63.80 41.13 1.551 0.1264 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 130.1 on 57 degrees of freedom ## Multiple R-squared: 0.09729, Adjusted R-squared: 0.06562 ## F-statistic: 3.072 on 2 and 57 DF, p-value: 0.05408 anova(lmOut) Click for explanation The effect of condition on RT is nonsignificant (\\(F[2, 57] = 3.07\\), \\(p = 0.054\\)). Therefore, based on these results, we do not have statistically significant evidence (at \\(\\alpha = 0.05\\)) for an effect of mobile phone usage on driving performance. 2.4.3.4 Use lavaan to estimate the model from 2.4.3.3 as a path model. 
Hint: lavaan won’t let us use factors for our categorical predictors. So, you will need to create your own dummy codes. Click to show code mod <- 'RT ~ 1 + HH + HF' lavOut <- drivers %>% mutate(HH = ifelse(condition == "hand-held", 1, 0), # Create dummy code for "hand-held" condition HF = ifelse(condition == "hands-free", 1, 0) # Create dummy code for "hands-free" condition ) %>% sem(mod, data = .) # Estimate the model summary(lavOut) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 4 ## ## Number of observations 60 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## RT ~ ## HH 100.750 40.085 2.513 0.012 ## HF 63.800 40.085 1.592 0.111 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .RT 553.750 28.344 19.537 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .RT 16068.028 2933.607 5.477 0.000 At this point, we haven’t covered the tools you need to conduct the ANOVA-style tests with path models. So, you can’t yet answer the research question with the above model. When we discuss model comparisons, you’ll get the missing tools. End of In-Class Exercises 2 "],["mediation-moderation.html", "3 Mediation & Moderation", " 3 Mediation & Moderation In this lecture, we will discuss two particular types of processes that we can model using path analysis: mediation and moderation. Homework before the lecture Watch the Lecture Recording for this week. Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. 
"],["lecture-2.html", "3.1 Lecture", " 3.1 Lecture Researchers often have theories about possible causal processes linking multiple variables. Mediation is a particularly important example of such a process, in which an input variable, X, influences the outcome, Y, through an intermediary variable, M (the mediator). For instance, psychotherapy (X) may affect thoughts (M), which in turn affect mood (Y). We can investigate mediation via a specific sequence of linear regression equations, but path modeling will make our lives much easier. We can use path models to simultaneously estimate multiple related regression equations. So, mediation analysis is an ideal application of path modeling. In this lecture, we consider both approaches and discuss their relative strengths and weaknesses. As with mediation, researchers often posit theories involving moderation. Moderation implies that the effect of X on Y depends on another variable, Z. For instance, the effect of feedback (X) on performance (Y) may depend on age (Z). Older children might process feedback more effectively than younger children. Hence, the effect of feedback on performance would be stronger for older children than for younger children. In such a case, we would say that age moderates the effect of feedback on performance. 3.1.1 Recordings Note: In the following recordings, the slide numbers are a bit of a mess, because I made these videos by cutting together recordings that used different slide decks. My apologies to those who are particularly distracted by continuity errors. Mediation Basics Mediation Testing Bootstrapping Moderation Basics Moderation Probing 3.1.2 Slides You can download the lecture slides here "],["reading-2.html", "3.2 Reading", " 3.2 Reading Reference Baron, R. M., & Kenny, D. A. (1986). 
The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. Questions What is mediation? Give an example of mediation. According to the authors, we must satisfy four criteria to infer mediation. What are these criteria? What is “moderation”, and how is it different from “mediation”? Give an example of moderation. What are the four methods given by Baron and Kenny as suitable ways to study interaction effects? The authors suggest that one of the most common ways to address unreliability is to use multiple indicators. Thinking back to what you’ve learned about factor analysis, briefly explain why multiple indicators can improve reliability. How can you determine whether a variable is a mediator or moderator? Reference Hayes, A. F. (2009). Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Communication Monographs, 76(4), 408–420. Questions What is an indirect or mediated effect? What is the difference between the total and direct effect? What is the main problem with the Baron & Kenny “Causal Steps Approach”? What is bootstrapping, and why is it a better way to test mediation than Sobel’s test? Explain how it is possible that “effects that don’t exist can be mediated”. "],["at-home-exercises-2.html", "3.3 At-Home Exercises", " 3.3 At-Home Exercises 3.3.1 Mediation In the first part of this practical, we will analyze the data contained in SelfEsteem.sav. These data comprise 143 observations of the following variables.1 case: Participant ID number ParAtt: Parental Attachment PeerAtt: Peer Attachment Emp: Empathy ProSoc: Prosocial behavior Aggr: Aggression SelfEst: Self-esteem 3.3.1.1 Load the SelfEsteem.sav data. Note: Unless otherwise specified, all analyses in Section 3.3.1 apply to these data. 
Click to show code library(haven) seData <- read_sav("SelfEsteem.sav") Suppose we are interested in the (indirect) effect of peer attachment on self-esteem, and whether empathy has a mediating effect on this relationship. We might generate the following hypotheses: Better peer relationships promote higher self-esteem. This effect is mediated by a student’s empathy levels, where better peer relationships increase empathy, and higher levels of empathy lead to higher self-esteem. To evaluate these hypotheses, we will use lavaan to estimate a path model. 3.3.1.2 Draw a path model (on paper) that can be used to test the above hypotheses. Label the input (X), outcome (Y), and mediator/intermediary (M). Label the paths a, b, and c’. Hint: Refer back to the Mediation Basics lecture if you need help here. Click for explanation 3.3.1.3 Specify the lavaan model syntax implied by the path diagram shown above. Save the resulting character string as an object in your environment. Hint: Refer back to the example in which opinions of systematic racism mediate the relationship between political affiliation and support for affirmative action policies from the Mediation Testing lecture this week. Click to show code mod <- ' ## Equation for outcome: SelfEst ~ Emp + PeerAtt ## Equation for the mediator: Emp ~ PeerAtt ' 3.3.1.4 Use the lavaan::sem() function to estimate the model defined in 3.3.1.3. Use the default settings in sem(). Click to show code library(lavaan) out <- sem(mod, data = seData) 3.3.1.5 Explore the summary of the fitted model. Which numbers correspond to the a, b, and c’ paths? Interpret these paths. Does the direction of the effects seem to align with our hypotheses? 
Click to show code summary(out) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 5 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## Emp 0.234 0.091 2.568 0.010 ## PeerAtt 0.174 0.088 1.968 0.049 ## Emp ~ ## PeerAtt 0.349 0.076 4.628 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.934 0.110 8.456 0.000 ## .Emp 0.785 0.093 8.456 0.000 Click for explanation The results show estimates of the a path (Emp ~ PeerAtt), the b path (SelfEst ~ Emp), and the c’ path (SelfEst ~ PeerAtt). All three of these effects are positive and significant (we report one-tailed p-values here, because our hypotheses are directional), including the direct effect of PeerAtt on SelfEst (\\(\\beta = 0.174\\), \\(Z = 1.97\\), \\(p = 0.025\\)), and the parts of the indirect effect made up by the effect of PeerAtt on Emp (\\(\\beta = 0.349\\), \\(Z = 4.63\\), \\(p < 0.001\\)), and Emp on SelfEst (\\(\\beta = 0.234\\), \\(Z = 2.57\\), \\(p = 0.005\\)). We can see that the direction of the effects seems to support our hypotheses, but without taking the next steps to investigate the indirect effect, we should be hesitant to say more. Remember that an indirect effect (IE) is the product of multiple regression slopes. Therefore, to estimate an IE, we must define this product in our model syntax. In lavaan, we define the new IE parameter in two steps. Label the relevant regression paths. Use the labels to define a new parameter that represents the desired IE. We can define new parameters in lavaan model syntax via the := operator. 
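To make the product definition concrete, the minimal sketch below estimates the a path and the b and c’ paths with two separate lm() calls and multiplies a and b to get the IE. The data are hypothetical (simulated; only the variable names mimic seData), and the sketch does not use lavaan. In a just-identified path model like ours, the ML point estimates of these slopes should match the OLS ones, so the product is the same quantity that ie := a * b defines. The last line checks a useful identity for a single mediator estimated by OLS: the total effect c (from regressing Y on X alone) equals the direct effect c’ plus the IE.

```r
## Hypothetical data that only mimic the structure of seData (not the real values)
set.seed(2024)
n       <- 143
PeerAtt <- rnorm(n)
Emp     <- 0.35 * PeerAtt + rnorm(n, sd = 0.9)
SelfEst <- 0.23 * Emp + 0.17 * PeerAtt + rnorm(n)

## a path: mediator regressed on the input
a <- unname(coef(lm(Emp ~ PeerAtt))['PeerAtt'])

## b and c-prime paths: outcome regressed on the mediator and the input
fitY   <- lm(SelfEst ~ Emp + PeerAtt)
b      <- unname(coef(fitY)['Emp'])
cPrime <- unname(coef(fitY)['PeerAtt'])

## Indirect effect: the product of the slopes along the mediated path
ie <- a * b

## Total effect: slope from the simple regression of the outcome on the input
cTotal <- unname(coef(lm(SelfEst ~ PeerAtt))['PeerAtt'])

## With one mediator and OLS, total = direct + indirect holds exactly
all.equal(cTotal, cPrime + ie)
```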
The lavaan website contains a tutorial on this procedure: http://lavaan.ugent.be/tutorial/mediation.html 3.3.1.6 Use the procedure described above to modify the model syntax from 3.3.1.3 by adding the definition of the hypothesized IE from PeerAtt to SelfEst. Click to show code mod <- ' ## Equation for outcome: SelfEst ~ b * Emp + PeerAtt ## Equation for mediator: Emp ~ a * PeerAtt ## Indirect effect: ie := a * b ' Click for explanation Notice that I only label the parameters that I will use to define the IE. You are free to label any parameter that you like, but I chose to label only the minimally sufficient set to avoid cluttering the code/output. 3.3.1.7 Use lavaan::sem() to estimate the model with the IE defined. Use the default settings for sem(). Is the hypothesized IE significant according to the default tests? Hint: Refer to the Mediation Testing lecture. Click to show code out <- sem(mod, data = seData) summary(out) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 5 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## Emp (b) 0.234 0.091 2.568 0.010 ## PeerAtt 0.174 0.088 1.968 0.049 ## Emp ~ ## PeerAtt (a) 0.349 0.076 4.628 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.934 0.110 8.456 0.000 ## .Emp 0.785 0.093 8.456 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ie 0.082 0.036 2.245 0.025 Click for explanation The IE of peer attachment on self-esteem through empathy is statistically significant (\\(\\hat{\\textit{IE}} = 0.082\\), \\(Z = 2.25\\), \\(p = 0.012\\)). 
Note: The p-value above doesn’t match the output because we’re testing a directional hypothesis, but lavaan conducts two-tailed tests for the model parameters. As we learned in the lecture, the above test of the indirect effect is equivalent to Sobel’s Z test (which we don’t really want). An appropriate, robust test of the indirect effect requires bootstrapping, which we will do later this week as part of the in-class exercises. For now, we’ll add another input variable to our model: parental attachment. We will use this model to evaluate the following research questions: Is there a direct effect of parental attachment on self-esteem, after controlling for peer attachment and empathy? Is there a direct effect of peer attachment on self-esteem, after controlling for parental attachment and empathy? Is the effect of parental attachment on self-esteem mediated by empathy, after controlling for peer attachment? Is the effect of peer attachment on self-esteem mediated by empathy, after controlling for parental attachment? 3.3.1.8 Run the path model needed to test the research questions listed above. Specify the lavaan model syntax implied by the research questions. Allow peer attachment and parental attachment to covary. Define two new parameters to represent the hypothesized indirect effects. Estimate the model using lavaan::sem(). Use the default settings in sem(). Investigate the model summary. 
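Because the default (delta-method) test of the IE in 3.3.1.7 is equivalent to Sobel’s test, we can reproduce its standard error by hand from the a and b estimates and their standard errors: \\(SE_{ab} = \\sqrt{b^2 SE_a^2 + a^2 SE_b^2}\\). The minimal sketch below plugs in the rounded values printed in the 3.3.1.7 output, so the results match lavaan only to about two decimals. It also shows how the one-tailed p-value quoted in the explanation (0.012) relates to the two-tailed one that lavaan prints (0.025).

```r
## Rounded estimates and standard errors from the 3.3.1.7 output
a <- 0.349; se_a <- 0.076   # Emp ~ PeerAtt (label a)
b <- 0.234; se_b <- 0.091   # SelfEst ~ Emp (label b)

ie    <- a * b                              # indirect effect
se_ie <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)  # first-order (Sobel) standard error
z     <- ie / se_ie                         # Sobel Z statistic

p_two <- 2 * pnorm(-abs(z))                 # two-tailed, as lavaan reports
p_one <- pnorm(z, lower.tail = FALSE)       # one-tailed, for H1: IE > 0

round(c(ie = ie, se = se_ie, z = z, p_two = p_two, p_one = p_one), 3)
```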
Click to show code mod <- ' ## Equation for outcome: SelfEst ~ b * Emp + ParAtt + PeerAtt ## Equation for mediator: Emp ~ a1 * ParAtt + a2 * PeerAtt ## Covariance: ParAtt ~~ PeerAtt ## Indirect effects: ie_ParAtt := a1 * b ie_PeerAtt := a2 * b ' out <- sem(mod, data = seData) summary(out) ## lavaan 0.6.16 ended normally after 16 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 10 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## Emp (b) 0.206 0.088 2.357 0.018 ## ParAtt 0.287 0.078 3.650 0.000 ## PeerAtt 0.024 0.094 0.252 0.801 ## Emp ~ ## ParAtt (a1) 0.078 0.075 1.045 0.296 ## PeerAtt (a2) 0.306 0.086 3.557 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## ParAtt ~~ ## PeerAtt 0.537 0.103 5.215 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.854 0.101 8.456 0.000 ## .Emp 0.779 0.092 8.456 0.000 ## ParAtt 1.277 0.151 8.456 0.000 ## PeerAtt 0.963 0.114 8.456 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ie_ParAtt 0.016 0.017 0.956 0.339 ## ie_PeerAtt 0.063 0.032 1.965 0.049 3.3.1.9 What can we say about the two indirect effects? Can we say that empathy mediates both paths? Click to show explanation According to the Sobel-style test, after controlling for parental attachment, the indirect effect of peer attachment on self-esteem was statistically significant (\\(\\hat{IE} = 0.063\\), \\(Z = 1.96\\), \\(p = 0.049\\)), as was the corresponding a path from peer attachment to empathy (\\(\\hat{\\beta} = 0.306\\), \\(Z = 3.56\\), \\(p < 0.001\\)). 
After controlling for peer attachment, neither the indirect effect of parental attachment on self-esteem (\\(\\hat{IE} = 0.016\\), \\(Z = 0.96\\), \\(p = 0.339\\)) nor the corresponding a path from parental attachment to empathy (\\(\\hat{\\beta} = 0.078\\), \\(Z = 1.05\\), \\(p = 0.296\\)) was significant, though. 3.3.2 Moderation Remember that moderation attempts to describe when one variable influences another. For the home exercise, we’ll go back to the Sesame Street data we worked with for the in-class exercises last week. 3.3.2.1 Load the Sesam2.sav data.2 Note: Unless otherwise specified, all analyses in Section 3.3.2 use these data. Click to show code # Read the data into an object called 'sesam2': sesam2 <- read_sav("Sesam2.sav") VIEWCAT is a nominal grouping variable, but it is represented as a numeric variable in the sesam2 data. The levels represent the following frequencies of Sesame Street viewership of the children in the data: VIEWCAT = 1: Rarely/Never VIEWCAT = 2: 2–3 times a week VIEWCAT = 3: 4–5 times a week VIEWCAT = 4: > 5 times a week 3.3.2.2 Convert VIEWCAT into a factor. Make sure that VIEWCAT = 1 is the reference group. Hints: You can identify the reference group with the levels() or contrasts() functions. The reference group is the group labelled with the first level printed by levels(). When you run contrasts(), you will see a pattern matrix that defines a certain dummy coding scheme. The reference group is the group that has zeros in each column of this matrix. If you need to change the reference group, you can use the relevel() function. 
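The hints above can be tried out on a small, hypothetical example before touching sesam2. The minimal sketch below (made-up codes and scores, stored as a numeric variable like VIEWCAT) converts the codes to a factor, checks the reference group with levels() and contrasts(), and switches it with relevel(). The last lines show why the reference group matters: with R’s default dummy (treatment) coding, the lm() intercept is the mean of the reference group, and each slope is the difference between another group’s mean and that reference mean.

```r
## Hypothetical codes and scores, stored as numbers like VIEWCAT in sesam2
viewcat <- c(1, 1, 2, 2, 3, 3, 4, 4)
score   <- c(18, 20, 27, 29, 33, 35, 36, 38)

## Convert the numeric codes to a factor
viewcat <- factor(viewcat)
levels(viewcat)     # the first level printed ('1') is the reference group
contrasts(viewcat)  # the reference group is the row with all zeros

## Make level '3' the reference group instead (relevel() is in base R's stats package)
viewcat3 <- relevel(viewcat, ref = '3')
levels(viewcat3)    # '3' now comes first

## With dummy coding, the intercept is the reference-group mean and each
## slope is another group's mean minus the reference-group mean
fit        <- lm(score ~ viewcat)
groupMeans <- tapply(score, viewcat, mean)
coef(fit)
groupMeans - groupMeans[1]
```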
Click to show code library(dplyr) ## Convert 'VIEWCAT' to a factor: sesam2 <- sesam2 %>% mutate(VIEWCAT = factor(VIEWCAT)) ## Optionally specify the labels # sesam2 <- # sesam2 %>% # mutate(VIEWCAT = factor(VIEWCAT, # levels = c(1, 2, 3, 4), # labels = c("Rarely/never", # "2-3 times per week", # "4-5 times per week", # "> 5 times per week"))) ## Check the reference group: levels(sesam2$VIEWCAT) ## [1] "1" "2" "3" "4" contrasts(sesam2$VIEWCAT) ## 2 3 4 ## 1 0 0 0 ## 2 1 0 0 ## 3 0 1 0 ## 4 0 0 1 ## If necessary, relevel # sesam2 <- # sesam2 %>% # mutate(VIEWCAT = relevel(VIEWCAT, ref = "1")) 3.3.2.3 Use lm() to estimate a multiple regression model wherein VIEWCAT predicts POSTNUMB. Summarize the model. Interpret the estimates. Click to show code lmOut <- lm(POSTNUMB ~ VIEWCAT, data = sesam2) summary(lmOut) ## ## Call: ## lm(formula = POSTNUMB ~ VIEWCAT, data = sesam2) ## ## Residuals: ## Min 1Q Median 3Q Max ## -25.474 -7.942 0.240 8.526 25.240 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 18.760 2.316 8.102 8.95e-14 *** ## VIEWCAT2 9.331 2.900 3.218 0.00154 ** ## VIEWCAT3 14.714 2.777 5.298 3.49e-07 *** ## VIEWCAT4 18.032 2.809 6.419 1.24e-09 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 11.58 on 175 degrees of freedom ## Multiple R-squared: 0.2102, Adjusted R-squared: 0.1967 ## F-statistic: 15.53 on 3 and 175 DF, p-value: 5.337e-09 Click for explanation Viewing category explains a statistically significant proportion of the variance in the post-test score of numbers learned (\\(R^2 = 0.21\\), \\(F(3, 175) = 15.53\\), \\(p < 0.001\\)). Kids who never or rarely watched Sesame Street had an average score of 18.76 on the post-test. 
Kids with weekly viewing habits of 2–3, 4–5, or 5+ times per week all had significantly higher scores on the post-test than kids who never or rarely watched Sesame Street (2–3: \\(\\hat{\\beta} = 9.33\\), \\(t = 3.22\\), \\(p = 0.002\\); 4–5: \\(\\hat{\\beta} = 14.71\\), \\(t = 5.3\\), \\(p < 0.001\\); 5+: \\(\\hat{\\beta} = 18.03\\), \\(t = 6.42\\), \\(p < 0.001\\)). If we compare the box plot, kernel density plot, and model output below, the relationships between the regression coefficient estimates for the viewing categories and the group means should be evident: the intercept is the mean of the reference group, and each VIEWCAT coefficient is the difference between that group’s mean and the reference-group mean. 3.3.2.4 Use ggplot() to make a scatterplot with AGE on the x-axis and POSTNUMB on the y-axis. Color the points according to their VIEWCAT level. Save the plot object to a variable in your environment. Hint: You can map color to the levels of a variable in your dataset by assigning the variable name to the color argument of the aes() function in ggplot(). Click to show code library(ggplot2) ## Add aes(..., color = VIEWCAT) to get different colors for each group: p <- ggplot(sesam2, aes(x = AGE, y = POSTNUMB, color = VIEWCAT)) + geom_point() # Add points for scatterplot ## Print the plot stored as 'p': p We assigned the global color aesthetic to the VIEWCAT variable, so the points are colored based on their group. 3.3.2.5 Add linear regression lines for each group to the above scatterplot. Hints: You can add regression lines with ggplot2::geom_smooth() To get linear regression lines, set the argument method = \"lm\" To omit error envelopes, set the argument se = FALSE Click to show code ## Add OLS best-fit lines: p + geom_smooth(method = "lm", se = FALSE) The global color aesthetic assignment from above carries through to any additional plot elements that we add, including the regression lines. So, we also get a separate regression line for each VIEWCAT group. 3.3.2.6 How would you interpret the pattern of regression lines above? 
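The per-group lines drawn by geom_smooth(method = 'lm') are exactly the fits implied by a single regression model that includes a group-by-AGE interaction, and a test of moderation compares such a model with an additive one. The minimal sketch below checks both points on hypothetical simulated data (a four-level factor standing in for VIEWCAT; none of the values come from sesam2): the separate per-group slopes equal the slopes implied by the interaction model, and the \\(\\Delta R^2\\) F statistic, \\(F = \\frac{(R^2_{mod} - R^2_{add}) / q}{(1 - R^2_{mod}) / df_{mod}}\\) with \\(q\\) added interaction terms, equals the F statistic that anova() reports for the nested pair of models.

```r
## Hypothetical data standing in for sesam2 (simulated, not the real values)
set.seed(7)
n   <- 180
dat <- data.frame(group = factor(sample(1:4, n, replace = TRUE)),
                  AGE   = runif(n, min = 34, max = 70))
slope <- c(0.75, 0.70, 0.55, 0.40)[as.integer(dat$group)]
dat$POSTNUMB <- -15 + 5 * as.integer(dat$group) + slope * dat$AGE + rnorm(n, sd = 11)

fitAdd <- lm(POSTNUMB ~ group + AGE, data = dat)  # additive model
fitMod <- lm(POSTNUMB ~ group * AGE, data = dat)  # moderated model

## (1) Slopes of separate per-group regressions (what geom_smooth draws) ...
sepSlopes <- sapply(levels(dat$group), function(g)
  unname(coef(lm(POSTNUMB ~ AGE, data = dat[dat$group == g, ]))['AGE']))

## ... equal the group-specific slopes implied by the interaction model
cf <- coef(fitMod)
modSlopes <- unname(cf['AGE'] + c(0, cf[c('group2:AGE', 'group3:AGE', 'group4:AGE')]))

## (2) Delta R^2 F test for the 3 interaction terms, computed by hand ...
r2Add  <- summary(fitAdd)$r.squared
r2Mod  <- summary(fitMod)$r.squared
dfNum  <- 3                    # number of added interaction terms
dfDen  <- fitMod$df.residual   # residual df of the moderated model
fDelta <- ((r2Mod - r2Add) / dfNum) / ((1 - r2Mod) / dfDen)

## ... is the same F statistic that anova() reports for the nested models
fAnova <- anova(fitAdd, fitMod)$F[2]
c(fDelta = fDelta, fAnova = fAnova)
```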
Click for explanation All the lines show a positive slope, so post-test number recognition appears to increase along with increasing age. The lines are not parallel, though, so VIEWCAT may be moderating the effect of AGE on POSTNUMB. Based on the figure we just created, we may want to test for moderation in our regression model. To do so, we need to add an interaction between AGE and VIEWCAT. The VIEWCAT factor is represented by 3 dummy codes in our model, though, so when we interact AGE and VIEWCAT, we will create 3 interaction terms. To test the overall moderating influence of VIEWCAT, we need to conduct a multiparameter hypothesis test of all 3 interaction terms. One way that we can go about implementing such a test is through a hierarchical regression analysis entailing three steps: Estimate the additive model wherein we regress POSTNUMB onto AGE and VIEWCAT without any interaction. Estimate the moderated model by adding the interaction between AGE and VIEWCAT into the additive model. Conduct a \\(\\Delta R^2\\) test to compare the fit of the two models. 3.3.2.7 Conduct the hierarchical regression analysis described above. Does VIEWCAT significantly moderate the effect of AGE on POSTNUMB? Provide statistical justification for your conclusion. Click to show code ## Estimate the additive model and view the results: results_add <- lm(POSTNUMB ~ VIEWCAT + AGE, data = sesam2) summary(results_add) ## ## Call: ## lm(formula = POSTNUMB ~ VIEWCAT + AGE, data = sesam2) ## ## Residuals: ## Min 1Q Median 3Q Max ## -23.680 -8.003 -0.070 8.464 22.635 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -10.1056 6.5091 -1.553 0.12235 ## VIEWCAT2 9.1453 2.7390 3.339 0.00103 ** ## VIEWCAT3 13.8602 2.6294 5.271 3.98e-07 *** ## VIEWCAT4 16.9215 2.6636 6.353 1.79e-09 *** ## AGE 0.5750 0.1221 4.708 5.08e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 10.94 on 174 degrees of freedom ## Multiple R-squared: 0.2995, Adjusted R-squared: 0.2834 ## F-statistic: 18.6 on 4 and 174 DF, p-value: 9.642e-13 ## Estimate the moderated model and view the results: results_mod <- lm(POSTNUMB ~ VIEWCAT * AGE, data = sesam2) summary(results_mod) ## ## Call: ## lm(formula = POSTNUMB ~ VIEWCAT * AGE, data = sesam2) ## ## Residuals: ## Min 1Q Median 3Q Max ## -23.8371 -8.2387 0.6158 8.7988 22.5611 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -18.7211 15.5883 -1.201 0.2314 ## VIEWCAT2 9.9741 20.6227 0.484 0.6293 ## VIEWCAT3 23.5825 19.3591 1.218 0.2248 ## VIEWCAT4 34.3969 19.3600 1.777 0.0774 . ## AGE 0.7466 0.3074 2.429 0.0162 * ## VIEWCAT2:AGE -0.0175 0.4060 -0.043 0.9657 ## VIEWCAT3:AGE -0.1930 0.3782 -0.510 0.6104 ## VIEWCAT4:AGE -0.3416 0.3770 -0.906 0.3663 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 10.99 on 171 degrees of freedom ## Multiple R-squared: 0.3046, Adjusted R-squared: 0.2762 ## F-statistic: 10.7 on 7 and 171 DF, p-value: 3.79e-11 ## Test for moderation: anova(results_add, results_mod) Click for explanation VIEWCAT does not significantly moderate the effect of AGE on POSTNUMB (\\(F[3, 171] = 0.422\\), \\(p = 0.738\\)). 3.3.2.8 Sketch the analytic path diagrams for the additive and moderated models you estimated in 3.3.2.7 (on paper). Click for explanation Additive Model Moderated Model End of At-Home Exercises 3 These data were simulated from the covariance matrix provided in Laible, D. J., Carlo, G., & Roesch, S. C. (2004). Pathways to self-esteem in late adolescence: The role of parent and peer attachment, empathy, and social behaviours. Journal of Adolescence, 27(6), 703–716.↩︎ These data are from the very interesting study: Ball, S., & Bogatz, G. A. (1970). 
A Summary of the Major Findings in “The First Year of Sesame Street: An Evaluation”.↩︎ "],["in-class-exercises-2.html", "3.4 In-Class Exercises", " 3.4 In-Class Exercises 3.4.1 Mediation In this practical, we’ll go back to the data from the at-home exercises, SelfEsteem.sav. Recall that these data comprise 143 observations of the following variables. case: Participant ID number ParAtt: Parental Attachment PeerAtt: Peer Attachment Emp: Empathy ProSoc: Prosocial behavior Aggr: Aggression SelfEst: Self-esteem When we last worked with the data, we built a model with one mediator (Emp), creating indirect effects between our predictors ParAtt and PeerAtt, and our outcome variable SelfEst. Below, you will estimate a more complex, multiple-mediator model. 3.4.1.1 Load the data into the object seData using haven::read_sav() Click to show code library(haven) seData <- read_sav("SelfEsteem.sav") For this analysis, we are interested in the (indirect) effects of parental and peer attachment on self-esteem. Furthermore, we want to evaluate the mediating roles of empathy and social behavior (i.e., prosocial behavior and aggression). Specifically, we have the following hypotheses. Better peer relationships will promote higher self-esteem via a three-step indirect process. Better peer relationships will increase empathy levels. Higher empathy will increase prosocial behavior and decrease aggressive behavior. More prosocial behaviors and less aggressive behavior will both produce higher self-esteem. Better relationships with parents directly increase self-esteem. To evaluate these hypotheses, we will use lavaan to estimate the following multiple mediator model as a path model. 3.4.1.2 Specify the lavaan model syntax implied by the path diagram shown above. Save the resulting character string as an object in your environment. 
Click to show code mod0 <- ' ## Equation for outcome: SelfEst ~ ProSoc + Aggr + Emp + ParAtt + PeerAtt ## Equations for stage 2 mediators: ProSoc ~ PeerAtt + ParAtt + Emp Aggr ~ PeerAtt + ParAtt + Emp ## Equation for stage 1 mediator: Emp ~ ParAtt + PeerAtt ## Covariances: ProSoc ~~ Aggr ParAtt ~~ PeerAtt ' 3.4.1.3 Use the lavaan::sem() function to estimate the model defined in 3.4.1.2. Use the default settings in sem(). Summarize the fitted model. Click to show code library(lavaan) out <- sem(mod0, data = seData) summary(out) ## lavaan 0.6.16 ended normally after 16 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 21 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## ProSoc 0.252 0.096 2.634 0.008 ## Aggr 0.185 0.085 2.172 0.030 ## Emp 0.143 0.098 1.460 0.144 ## ParAtt 0.244 0.078 3.133 0.002 ## PeerAtt 0.051 0.091 0.555 0.579 ## ProSoc ~ ## PeerAtt -0.037 0.080 -0.469 0.639 ## ParAtt 0.193 0.067 2.886 0.004 ## Emp 0.477 0.074 6.411 0.000 ## Aggr ~ ## PeerAtt -0.095 0.090 -1.055 0.291 ## ParAtt -0.034 0.075 -0.454 0.650 ## Emp -0.309 0.084 -3.697 0.000 ## Emp ~ ## ParAtt 0.078 0.075 1.045 0.296 ## PeerAtt 0.306 0.086 3.557 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## .ProSoc ~~ ## .Aggr -0.086 0.058 -1.476 0.140 ## ParAtt ~~ ## PeerAtt 0.537 0.103 5.215 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.796 0.094 8.456 0.000 ## .ProSoc 0.618 0.073 8.456 0.000 ## .Aggr 0.777 0.092 8.456 0.000 ## .Emp 0.779 0.092 8.456 0.000 ## ParAtt 1.277 0.151 8.456 0.000 ## PeerAtt 0.963 0.114 8.456 0.000 3.4.1.4 Considering the parameter estimates from 3.4.1.3, what can you say about the hypotheses? 
Click for explanation Notice that all of the hypotheses stated above are explicitly directional. Hence, when evaluating the significance of the structural paths that speak to these hypotheses, we should use one-tailed tests. lavaan does not report one-tailed p-values, but we do not need it to. As long as an estimate falls in the hypothesized direction, we can simply divide its two-tailed p-value in half. The significant direct effect of ParAtt on SelfEst (\\(\\beta = 0.244\\), \\(Z = 3.13\\), \\(p = 0.001\\)) and the lack of a significant direct effect of PeerAtt on SelfEst (\\(\\beta = 0.051\\), \\(Z = 0.555\\), \\(p = 0.29\\)) align with our hypotheses. The remaining patterns of individual estimates also seem to conform to the hypotheses (e.g., all of the individual paths comprising the indirect effects of PeerAtt on SelfEst are significant). We cannot draw any firm conclusions until we actually estimate and test the indirect effects, though. 3.4.1.5 Modify the model syntax from 3.4.1.2 by adding definitions of the two hypothesized IEs from PeerAtt to SelfEst. Click to show code You can use any labeling scheme that makes sense to you, but I recommend adopting some kind of systematic rule. Here, I will label the individual estimates in terms of the short variable names used in the path diagram above. mod <- ' ## Equation for outcome: SelfEst ~ y_m21 * ProSoc + y_m22 * Aggr + Emp + ParAtt + PeerAtt ## Equations for stage 2 mediators: ProSoc ~ m21_x2 * PeerAtt + ParAtt + m21_m1 * Emp Aggr ~ m22_x2 * PeerAtt + ParAtt + m22_m1 * Emp ## Equation for stage 1 mediator: Emp ~ ParAtt + m1_x2 * PeerAtt ## Covariances: ProSoc ~~ Aggr ParAtt ~~ PeerAtt ## Indirect effects: ie_pro := m1_x2 * m21_m1 * y_m21 ie_agg := m1_x2 * m22_m1 * y_m22 ' 3.4.1.6 Use lavaan::sem() to estimate the model with the IEs defined. Use the default settings for sem(). Are the hypothesized IEs significant according to the default tests? 
Click to show code out <- sem(mod, data = seData) summary(out) ## lavaan 0.6.16 ended normally after 16 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 21 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## ProSoc (y_21) 0.252 0.096 2.634 0.008 ## Aggr (y_22) 0.185 0.085 2.172 0.030 ## Emp 0.143 0.098 1.460 0.144 ## ParAtt 0.244 0.078 3.133 0.002 ## PerAtt 0.051 0.091 0.555 0.579 ## ProSoc ~ ## PerAtt (m21_2) -0.037 0.080 -0.469 0.639 ## ParAtt 0.193 0.067 2.886 0.004 ## Emp (m21_1) 0.477 0.074 6.411 0.000 ## Aggr ~ ## PerAtt (m22_2) -0.095 0.090 -1.055 0.291 ## ParAtt -0.034 0.075 -0.454 0.650 ## Emp (m22_1) -0.309 0.084 -3.697 0.000 ## Emp ~ ## ParAtt 0.078 0.075 1.045 0.296 ## PerAtt (m1_2) 0.306 0.086 3.557 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## .ProSoc ~~ ## .Aggr -0.086 0.058 -1.476 0.140 ## ParAtt ~~ ## PeerAtt 0.537 0.103 5.215 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.796 0.094 8.456 0.000 ## .ProSoc 0.618 0.073 8.456 0.000 ## .Aggr 0.777 0.092 8.456 0.000 ## .Emp 0.779 0.092 8.456 0.000 ## ParAtt 1.277 0.151 8.456 0.000 ## PeerAtt 0.963 0.114 8.456 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ie_pro 0.037 0.018 2.010 0.044 ## ie_agg -0.017 0.011 -1.657 0.098 Click for explanation The IE of Peer Attachment on Self Esteem through Empathy and Prosocial Behavior is significant (\\(\\hat{\\textit{IE}} = 0.037\\), \\(Z = 2.01\\), \\(p = 0.022\\)), as is the analogous IE through Empathy and Aggressive Behavior (\\(\\hat{\\textit{IE}} = -0.017\\), \\(Z = -1.66\\), \\(p = 0.049\\)). However, this latter effect is only barely significant at the \\(\\alpha = 0.05\\) level. 
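To see where these numbers come from, the short R sketch below recomputes the two IE point estimates as products of the rounded path estimates printed above. It then converts the reported z-values into one-tailed p-values. The helper one_tailed_p() is a hypothetical function written for illustration, not part of lavaan. Because it tests in a stated direction instead of halving blindly, it returns a large p-value when an estimate points the wrong way.

```r
## Rounded path estimates copied from the summary(out) output above
## (labels follow the model syntax in 3.4.1.5):
paths <- c(m1_x2 = 0.306, m21_m1 = 0.477, y_m21 = 0.252,
           m22_m1 = -0.309, y_m22 = 0.185)

## Each IE is the product of the slopes along its path:
ie_pro <- unname(paths['m1_x2'] * paths['m21_m1'] * paths['y_m21'])
ie_agg <- unname(paths['m1_x2'] * paths['m22_m1'] * paths['y_m22'])
print(round(c(ie_pro = ie_pro, ie_agg = ie_agg), 3))

## Hypothetical helper: one-tailed p-value for a Wald z statistic,
## testing in the hypothesized direction.
one_tailed_p <- function(z, direction = c('greater', 'less')) {
  direction <- match.arg(direction)
  if (direction == 'greater') {
    pnorm(z, lower.tail = FALSE)  # H1: effect > 0
  } else {
    pnorm(z)                      # H1: effect < 0
  }
}

## z-values of the defined parameters, in the directions used above:
p_pro <- one_tailed_p(2.010, 'greater')
p_agg <- one_tailed_p(-1.657, 'less')
print(round(c(p_pro = p_pro, p_agg = p_agg), 3))
```

Because the inputs are rounded, the products could in principle differ from lavaan's values in the third decimal. Here they reproduce the reported 0.037 and -0.017, and the one-tailed p-values come out at about 0.022 and 0.049.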
The tests we used to evaluate the significance of the IEs in 3.4.1.6 are flawed because they assume normal sampling distributions for the IEs. However, the IEs are defined as products of several normally distributed regression slopes. So the IEs themselves cannot be normally distributed (at least in finite samples), and the results of the normal-theory significance tests may be misleading. To get a more accurate test of the IEs, we should use bootstrapping to generate an empirical sampling distribution for each IE. In lavaan, we implement bootstrapping by specifying the se = \"bootstrap\" option in the fitting function (i.e., the cfa() or sem() function) and specifying the number of bootstrap samples via the bootstrap option. Workflow Tip To draw reliable conclusions from bootstrapped results, we need many bootstrap samples (e.g., B = 1000 or more), but we must estimate the full model for each of these samples, so the estimation can take a long time. To avoid too much frustration, you should first estimate the model without bootstrapping to make sure everything is specified correctly. Only after you are certain that your code is correct should you run the full bootstrapped version. 3.4.1.7 Re-estimate the model from 3.4.1.6 using 1000 bootstrap samples. Other than the se and bootstrap options, use the defaults. Are the hypothesized IEs significant according to the bootstrap-based test statistics? 
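The logic of the bootstrap can be sketched in a few lines of base R. This is not how lavaan implements its built-in bootstrap. It is a minimal illustration, using simulated data, of a nonparametric (case-resampling) bootstrap for a simple indirect effect a * b in an x -> m -> y model, followed by a percentile CI. All variable names and population values here are hypothetical.

```r
## Minimal case-resampling bootstrap for a simple indirect effect a * b.
## Simulated data; the sample size mirrors the 143 cases in seData.
set.seed(235711)
n <- 143
x <- rnorm(n)
m <- 0.3 * x + rnorm(n)   # hypothetical a path = 0.3
y <- 0.25 * m + rnorm(n)  # hypothetical b path = 0.25
dat <- data.frame(x = x, m = m, y = y)

## Estimate the IE in a data set as the product of two OLS slopes:
ie_hat <- function(d) {
  a <- coef(lm(m ~ x, data = d))[['x']]
  b <- coef(lm(y ~ m + x, data = d))[['m']]
  a * b
}

## Resample rows with replacement and re-estimate the IE each time:
B <- 1000
ie_boot <- replicate(B, ie_hat(dat[sample.int(n, n, replace = TRUE), ]))

## Point estimate and percentile CI from the empirical distribution.
## The 5th percentile is the lower bound of a 90% two-sided interval,
## which serves as a one-sided 95% bound for a positive IE.
est  <- ie_hat(dat)
ci90 <- quantile(ie_boot, probs = c(0.05, 0.95))
print(est)
print(ci90)

## Directional decision for a hypothesized positive IE:
print(ci90[[1]] > 0)
```

The empirical distribution of a product of slopes is typically skewed, so its percentile bounds need not sit symmetrically around est. This is the motivation for preferring bootstrap CIs over the normal-theory z-test for IEs.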
Click to show code ## Set a seed to get replicable bootstrap samples: set.seed(235711) ## Estimate the model with bootstrapping: out_boot <- sem(mod, data = seData, se = "bootstrap", bootstrap = 1000) ## Summarize the model: summary(out_boot) ## lavaan 0.6.16 ended normally after 16 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 21 ## ## Number of observations 143 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Bootstrap ## Number of requested bootstrap draws 1000 ## Number of successful bootstrap draws 1000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## SelfEst ~ ## ProSoc (y_21) 0.252 0.100 2.529 0.011 ## Aggr (y_22) 0.185 0.085 2.174 0.030 ## Emp 0.143 0.095 1.507 0.132 ## ParAtt 0.244 0.079 3.089 0.002 ## PerAtt 0.051 0.095 0.530 0.596 ## ProSoc ~ ## PerAtt (m21_2) -0.037 0.082 -0.456 0.648 ## ParAtt 0.193 0.068 2.831 0.005 ## Emp (m21_1) 0.477 0.078 6.092 0.000 ## Aggr ~ ## PerAtt (m22_2) -0.095 0.087 -1.093 0.275 ## ParAtt -0.034 0.076 -0.448 0.654 ## Emp (m22_1) -0.309 0.092 -3.356 0.001 ## Emp ~ ## ParAtt 0.078 0.072 1.092 0.275 ## PerAtt (m1_2) 0.306 0.079 3.896 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## .ProSoc ~~ ## .Aggr -0.086 0.058 -1.493 0.135 ## ParAtt ~~ ## PeerAtt 0.537 0.128 4.195 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .SelfEst 0.796 0.082 9.698 0.000 ## .ProSoc 0.618 0.068 9.114 0.000 ## .Aggr 0.777 0.104 7.476 0.000 ## .Emp 0.779 0.090 8.651 0.000 ## ParAtt 1.277 0.197 6.473 0.000 ## PeerAtt 0.963 0.105 9.203 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ie_pro 0.037 0.019 1.891 0.059 ## ie_agg -0.017 0.011 -1.638 0.101 Click for explanation As with the normal-theory tests, the hypothesized IE of Peer Attachment on Self Esteem through Empathy and Prosocial Behavior was significant (\\(\\hat{\\textit{IE}} = 0.037\\), \\(Z = 1.89\\), \\(p = 0.029\\)), but the IE through Empathy and Aggressive Behavior 
has crossed into nonsignificant territory (\\(\\hat{\\textit{IE}} = -0.017\\), \\(Z = -1.64\\), \\(p = 0.051\\)). Note: Bootstrapping is a stochastic method, so each run can provide different results. Since the indirect effect through aggressive behavior is so close to the critical value, you may come to a different conclusion vis-à-vis statistical significance if you run this analysis with a different random number seed or a different number of bootstrap samples. When you use the summary() function to summarize the bootstrapped model from 3.4.1.7, the output will probably look much the same as it did in 3.4.1.6, but it is not the same. The standard errors and test statistics in the bootstrapped summary are derived from empirical sampling distributions, whereas these values are based on an assumed normal sampling distribution in 3.4.1.6. The standard method of testing IEs with bootstrapping is to compute confidence intervals (CIs) from the empirical sampling distribution of the IEs. In lavaan, we can get default 95% CIs (percentile intervals, for a bootstrapped model) by adding the ci = TRUE option to the summary() function. To evaluate our directional hypotheses at an \\(\\alpha = 0.05\\) level, however, we need to compute 90% CIs: each one-sided test uses only one boundary of the interval, and each boundary of a 90% CI cuts off 5% of the sampling distribution. We can get more control over the summary statistics (including the CIs) with the parameterEstimates() function. 3.4.1.8 Check the documentation for lavaan::parameterEstimates(). Click to show code ?parameterEstimates 3.4.1.9 Use the parameterEstimates() function to compute bootstrapped CIs for the hypothesized IEs. Compute percentile CIs. Are the IEs significant according to the bootstrapped CIs? Click to show code parameterEstimates(out_boot, ci = TRUE, level = 0.9, boot.ci.type = "perc") Click for explanation When evaluating a directional hypothesis with a CI, we only consider one of the interval’s boundaries. For a hypothesized positive effect, we check only if the lower boundary is greater than zero. For a hypothesized negative effect, we check only if the upper boundary is less than zero. 
As with the previous tests, the IE of Peer Attachment on Self Esteem through Empathy and Prosocial Behavior is significant (\\(\\hat{\\textit{IE}} = 0.037\\), \\(95\\% ~ CI = [0.009; \\infty]\\)), but the analogous IE through Aggressive Behavior is not quite significant (\\(\\hat{\\textit{IE}} = -0.017\\), \\(95\\% ~ CI = [-\\infty; -0.003]\\)). 3.4.1.10 Based on the analyses you’ve conducted here, what do you conclude vis-à-vis the original hypotheses? Click for explanation When using normal-theory tests, both hypothesized indirect effects between Peer Attachment and Self Esteem were supported: the IE through Empathy and Prosocial Behavior and the IE through Empathy and Aggressive Behavior were both significant. The hypothesized direct effect of Parent Attachment on Self Esteem was also borne out via a significant direct effect in the model. When testing the indirect effects with bootstrapping, however, the effect through Aggressive Behavior was nonsignificant. Since bootstrapping gives a more accurate test of the indirect effect, we should probably trust these results more than the normal-theory results. We should not infer a significant indirect effect of Peer Attachment on Self Esteem transmitted through Empathy and Aggressive Behavior. These results may not tell the whole story, though. We have not tested for indirect effects between Parent Attachment and Self Esteem, and we have not evaluated simpler indirect effects between Peer Attachment and Self Esteem (e.g., PeerAtt \\(\\rightarrow\\) Emp \\(\\rightarrow\\) SelfEst). 3.4.2 Moderation We will first analyze a synthetic version of the Outlook on Life Survey data. The original data were collected in the United States in 2012 to measure, among other things, attitudes about racial issues, opinions of the Federal government, and beliefs about the future. We will work with a synthesized subset of the original data. You can access these synthetic data as outlook.rds. 
This dataset comprises 2288 observations of the following 13 variables. d1:d3: Three observed indicators of a construct measuring disillusionment with the US Federal government. Higher scores indicate more disillusionment s1:s4: Four observed indicators of a construct measuring the perceived achievability of material success. Higher scores indicate greater perceived achievability progress: A single item assessing perceived progress toward achieving the “American Dream” Higher scores indicate greater perceived progress merit: A single item assessing endorsement of the meritocratic ideal that hard work leads to success. Higher scores indicate stronger endorsement of the meritocratic ideal lib2Con: A single item assessing liberal-to-conservative orientation Lower scores are more liberal, higher scores are more conservative party: A four-level factor indicating self-reported political party affiliation disillusion: A scale score representing disillusionment with the US Federal government Created as the mean of d1:d3 success: A scale score representing the perceived achievability of material success Created as the mean of s1:s4 To satisfy the access and licensing conditions under which the original data are distributed, the data contained in outlook.rds were synthesized from the original variables using the methods described by Volker and Vink (2021). You can access the original data here, and you can access the code used to process the data here. 3.4.2.1 Read in the outlook.rds dataset. Hint: An RDS file is an R object that’s been saved to a file. To read in this type of file, we use readRDS() from base R. Click to show code outlook <- readRDS("outlook.rds") 3.4.2.2 Summarize the outlook data to get a sense of their characteristics. Click to show code head(outlook) summary(outlook) ## d1 d2 d3 s1 ## Min. :1.000 Min. :1.000 Min. :1.000 Min. 
:1.000 ## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:2.000 ## Median :4.000 Median :3.000 Median :4.000 Median :2.000 ## Mean :3.642 Mean :3.218 Mean :3.629 Mean :2.288 ## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.000 ## Max. :5.000 Max. :5.000 Max. :5.000 Max. :4.000 ## s2 s3 s4 progress ## Min. :1.000 Min. :1.000 Min. :1.000 Min. : 1.000 ## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.: 5.000 ## Median :2.000 Median :2.000 Median :2.000 Median : 7.000 ## Mean :1.922 Mean :2.012 Mean :2.469 Mean : 6.432 ## 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 8.000 ## Max. :4.000 Max. :4.000 Max. :4.000 Max. :10.000 ## merit lib2Con party disillusion ## Min. :1.000 Min. :1.000 republican : 332 Min. :1.000 ## 1st Qu.:4.000 1st Qu.:3.000 democrat :1264 1st Qu.:3.000 ## Median :5.000 Median :4.000 independent: 576 Median :3.667 ## Mean :4.826 Mean :3.998 other : 116 Mean :3.497 ## 3rd Qu.:6.000 3rd Qu.:5.000 3rd Qu.:4.000 ## Max. :7.000 Max. :7.000 Max. :5.000 ## success ## Min. :1.000 ## 1st Qu.:1.750 ## Median :2.000 ## Mean :2.173 ## 3rd Qu.:2.500 ## Max. :4.000 str(outlook) ## 'data.frame': 2288 obs. of 13 variables: ## $ d1 : num 4 4 4 5 5 4 5 4 4 4 ... ## $ d2 : num 4 2 4 4 3 5 4 2 4 5 ... ## $ d3 : num 4 4 4 5 4 4 4 3 3 4 ... ## $ s1 : num 3 3 4 2 2 2 2 1 3 3 ... ## $ s2 : num 2 2 2 1 1 2 1 1 2 2 ... ## $ s3 : num 3 2 4 1 2 1 1 1 3 2 ... ## $ s4 : num 3 3 3 1 2 3 3 2 2 2 ... ## $ progress : num 8 4 6 1 6 5 7 6 9 7 ... ## $ merit : num 6 5 5 4 3 4 2 5 5 5 ... ## $ lib2Con : num 5 6 4 1 4 4 4 4 4 5 ... ## $ party : Factor w/ 4 levels "republican","democrat",..: 1 3 3 2 2 2 2 2 4 1 ... ## $ disillusion: num 4 3.33 4 4.67 4 ... ## $ success : num 2.75 2.5 3.25 1.25 1.75 2 1.75 1.25 2.5 2.25 ... We will first use OLS regression to estimate a model encoding the following relations: Belief in the achievability of success, success, predicts perceived progress toward the American Dream, progress, as the focal effect. 
Disillusionment with the US Federal government, disillusion, moderates the success \\(\\rightarrow\\) progress effect. Placement on the liberal-to-conservative continuum, lib2Con, is partialed out as a covariate. 3.4.2.3 Draw the conceptual path diagram for the model described above. Click for explanation 3.4.2.4 Write out the regression equation necessary to evaluate the moderation hypothesis described above. Click for explanation \\[ Y_{progress} = \\beta_0 + \\beta_1 W_{lib2Con} + \\beta_2 X_{success} + \\beta_3 Z_{disillusion} + \\beta_4 XZ + \\varepsilon \\] 3.4.2.5 Use lm() to estimate the moderated regression model via OLS regression. Click to show code olsFit <- lm(progress ~ lib2Con + success * disillusion, data = outlook) 3.4.2.6 Summarize the fitted model and interpret the results. Is the moderation hypothesis supported? How does disillusionment level affect the focal effect? Click to show code summary(olsFit) ## ## Call: ## lm(formula = progress ~ lib2Con + success * disillusion, data = outlook) ## ## Residuals: ## Min 1Q Median 3Q Max ## -7.4315 -1.2525 0.1307 1.4369 5.6717 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 6.81128 0.62073 10.973 < 2e-16 *** ## lib2Con 0.03052 0.03040 1.004 0.3155 ## success 0.42360 0.25853 1.638 0.1015 ## disillusion -0.78002 0.16864 -4.625 3.95e-06 *** ## success:disillusion 0.17429 0.07273 2.396 0.0166 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.041 on 2283 degrees of freedom ## Multiple R-squared: 0.1385, Adjusted R-squared: 0.137 ## F-statistic: 91.74 on 4 and 2283 DF, p-value: < 2.2e-16 Click for explanation Yes, disillusion significantly moderates the relation between success and progress (\\(\\beta = 0.174\\), \\(t[2283] = 2.396\\), \\(p = 0.017\\)) such that the effect of success on progress increases as levels of disillusion increase, after controlling for lib2Con. 
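Because the model includes the product term, the effect of success on progress is not a single number but a function of disillusion: the simple slope is \\(\\beta_2 + \\beta_4 Z_{disillusion}\\). The short R sketch below evaluates this function at the mean and ±1 SD of disillusion. It uses the rounded coefficients from summary(olsFit) above and the conditional values (2.719, 3.497, 4.274) reported later in 3.4.2.11. It also includes a hypothetical helper, simple_slope_se(), for the standard error of a simple slope. With the fitted model, its covariance input would come from vcov(olsFit).

```r
## Rounded OLS estimates from summary(olsFit):
b_success <- 0.42360  # beta_2: slope of success when disillusion = 0
b_inter   <- 0.17429  # beta_4: success x disillusion interaction

## Simple slope of success at a given level of disillusion (Z):
simple_slope <- function(z) b_success + b_inter * z

## Mean and mean +/- 1 SD of disillusion, as reported in 3.4.2.11:
z_vals <- c('m-sd' = 2.719, mean = 3.497, 'm+sd' = 4.274)
print(round(simple_slope(z_vals), 3))

## Hypothetical helper: SE of a simple slope, given the 2 x 2 covariance
## matrix V of (beta_2, beta_4), using
## Var(b2 + z * b4) = Var(b2) + z^2 * Var(b4) + 2 * z * Cov(b2, b4).
## With the fitted model one could use
## V <- vcov(olsFit)[c('success', 'success:disillusion'),
##                   c('success', 'success:disillusion')]
simple_slope_se <- function(z, V) {
  sqrt(V[1, 1] + z^2 * V[2, 2] + 2 * z * V[1, 2])
}
```

The three slopes agree with the values of 0.897, 1.033, and 1.169 that probe2WayMC() reports in 3.4.2.11. They grow with disillusion because \\(\\beta_4 > 0\\).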
The rockchalk package contains some useful routines for probing interactions estimated via lm(). Specifically, the plotSlopes() function will estimate and plot simple slopes, and the testSlopes() function tests the simple slopes estimated by plotSlopes(). 3.4.2.7 Probe the interaction. Use the plotSlopes() and testSlopes() functions from the rockchalk package to conduct a simple slopes analysis for the model from 3.4.2.5. Click to show code library(rockchalk) ## Estimate and plot simple slopes: psOut <- plotSlopes(olsFit, plotx = "success", modx = "disillusion", modxVals = "std.dev") ## Test the simple slopes: tsOut <- testSlopes(psOut) ## Values of disillusion OUTSIDE this interval: ## lo hi ## -28.9332857 0.2672244 ## cause the slope of (b1 + b2*disillusion)success to be statistically significant ## View the results: tsOut$hypotests Note: The message printed by testSlopes() gives the boundaries of the Johnson-Neyman Region of Significance (Johnson & Neyman, 1936). Here, every observed value of disillusion (which ranges from 1 to 5) lies above the upper boundary of 0.267, so the simple slope of success is significant across the entire observed range of disillusion. Johnson-Neyman analysis is an alternative method of probing interactions that we have not covered in this course. For more information, check out Preacher et al. (2006). We will now use lavaan to estimate the moderated regression model from above as a path analysis. 3.4.2.8 Define the model syntax for the path analytic version of the model described above. Parameterize the model as in the OLS regression. Use only observed items and scale scores. Click to show code pathMod <- ' progress ~ 1 + lib2Con + success + disillusion + success:disillusion ' 3.4.2.9 Estimate the path model on the outlook data. Click to show code pathFit <- sem(pathMod, data = outlook) 3.4.2.10 Summarize the fitted path model and interpret the results. Do the results match the OLS regression results? What proportion of the variability in progress is explained by this model? 
Hint: the function lavInspect() can be used to extract information from fitted models. Click to show code summary(pathFit) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 6 ## ## Number of observations 2288 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## progress ~ ## lib2Con 0.031 0.030 1.005 0.315 ## success 0.424 0.258 1.640 0.101 ## disillusion -0.780 0.168 -4.630 0.000 ## success:dsllsn 0.174 0.073 2.399 0.016 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .progress 6.811 0.620 10.985 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .progress 4.157 0.123 33.823 0.000 lavInspect(pathFit, "r2") ## progress ## 0.138 Click for explanation Yes, the estimates and inferential conclusions match those from the OLS regression model (the standard errors differ only trivially). The model explains 13.85% of the variability in progress. The semTools package contains some helpful routines for probing interactions estimated via the lavaan() function (or one of its wrappers). Specifically, the probe2WayMC() and plotProbe() functions will estimate/test simple slopes and plot the estimated simple slopes, respectively. 3.4.2.11 Probe the interaction from 3.4.2.9 using semTools utilities. Use probe2WayMC() to estimate and test the simple slopes. Use plotProbe() to visualize the simple slopes. Define the simple slopes with the same conditional values of disillusion that you used in 3.4.2.7. Which simple slopes are significant? Do these results match the results from 3.4.2.7? 
Click to show code library(semTools) library(dplyr) ## Define the conditional values at which to calculate simple slopes: condVals <- summarise(outlook, "m-sd" = mean(disillusion) - sd(disillusion), mean = mean(disillusion), "m+sd" = mean(disillusion) + sd(disillusion) ) %>% unlist() ## Compute simple slopes and intercepts: ssOut <- probe2WayMC(pathFit, nameX = c("success", "disillusion", "success:disillusion"), nameY = "progress", modVar = "disillusion", valProbe = condVals) ## Check the results: ssOut ## $SimpleIntcept ## disillusion est se z pvalue ## m-sd 2.719 4.690 0.231 20.271 0 ## mean 3.497 4.084 0.190 21.508 0 ## m+sd 4.274 3.477 0.230 15.122 0 ## ## $SimpleSlope ## disillusion est se z pvalue ## m-sd 2.719 0.897 0.083 10.792 0 ## mean 3.497 1.033 0.065 15.994 0 ## m+sd 4.274 1.169 0.088 13.223 0 ## Visualize the simple slopes: plotProbe(ssOut, xlim = range(outlook$success), xlab = "Ease of Personal Success", ylab = "Progress toward American Dream", legendArgs = list(legend = names(condVals)) ) Click for explanation Each of the simple slopes is significant. As the level of disillusionment increases, the effect of success on progress also increases, and this effect is significant for all levels of disillusion considered here. These results match the simple slopes from the OLS regression analysis. End of In-Class Exercises 3 "],["efa.html", "4 EFA", " 4 EFA This week will be a general introduction to latent variables and scaling procedures. We will discuss several different aspects of exploratory factor analysis (EFA). Most notably: The differences between Principal Component Analyses (PCA) and Factor Analysis Model estimation and factor extraction methods Factor rotations You will have to make decisions regarding each of these aspects when conducting a factor analysis. We will also discuss reliability and factor scores as means of evaluating the properties of a scale. Homework before the lecture Watch the Lecture Recording for this week. 
Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. "],["lecture-3.html", "4.1 Lecture", " 4.1 Lecture How do you know if you have measured the putative hypothetical construct that you intend to measure? The methods introduced in this lecture (namely, latent variables, factor analysis, and reliability analysis) can shed empirical light on this issue. In the social and behavioral sciences, we’re often forced to measure key concepts indirectly. For example, we have no way of directly quantifying a person’s current level of depression, or their innate motivation, or their risk-aversion, or any of the other myriad psychological features that comprise the human mental state. In truth, we cannot really measure these hypothetical constructs at all; we must estimate latent representations thereof (though psychometricians still use the language of physical measurement to describe this process). Furthermore, we can rarely estimate an adequate representation with only a single observed variable (e.g., a question on a survey, a score on a test, a reading from a sensor). We generally need several observed variables to reliably represent a single hypothetical construct. For example, we cannot accurately determine someone’s IQ or socio-economic status based on their response to a single question; we need several questions that each tap into slightly different aspects of IQ or SES. Given multiple items measuring the same construct, we can use the methods discussed in this lecture (i.e., factor analysis and reliability analysis) to evaluate the quality of our measurement (i.e., how well we have estimated the underlying hypothetical construct). 
If we do well enough in this estimation task, we will be able to combine these estimated latent variables with the path analysis methods discussed in the previous two weeks to produce the full structural equation models that we will cover at the end of this course. 4.1.1 Recording Notes: This week (and next), we’ll be re-using Caspar van Lissa’s old slides and lecture recording. So, you’ll see Caspar in the following video, and the slides will have a notably different flavor than our usual materials. Don’t be confused by any mention of “model fit” in the lecture. We haven’t covered model fit yet, but we will do so next week. 4.1.2 Slides You can download the lecture slides here. "],["reading-3.html", "4.2 Reading", " 4.2 Reading This week, you will read two papers. Reference 1 Preacher, K. J., & MacCallum, R. C. (2003). Repairing Tom Swift’s electric factor analysis machine. Understanding Statistics, 2(1), 13–43. Questions 1 What is a latent variable? Give an example of a latent variable. What is factor analysis, and what can you investigate using this method? In the introduction, Preacher and MacCallum describe a “little jiffy” method of doing factor analysis. Briefly describe this little jiffy (or bad practice) method. Briefly explain the key differences between Principal Component Analyses (PCA) and Exploratory Factor Analyses (EFA). What is the purpose of factor rotation? Reference 2 Kestilä, E. (2006). Is there demand for radical right populism in the Finnish electorate? Scandinavian Political Studies, 29(3), 169–191. Questions 2 What is the research question that the author tries to answer? Briefly describe the characteristics of the Radical Right Parties (RRP) in Europe. What are the two main explanations of support for RRP upon which this paper focuses? Does the empirical part of the paper reflect the theoretical framework well? Why or why not? According to the author, is Finland very different from other European countries on the main dependent variables? 
What is the author’s conclusion (i.e., how does the author answer the research question)? "],["at-home-exercises-3.html", "4.3 At-Home Exercises", " 4.3 At-Home Exercises In these exercises, you will attempt to replicate some of the analyses from the second reading for this week: Kestilä, E. (2006). Is there demand for radical right populism in the Finnish electorate? Scandinavian Political Studies, 29(3), 169–191. The data for this practical were collected during the first round of the European Social Survey (ESS). The ESS is a repeated cross-sectional survey administered in 32 European countries. The first wave was collected in 2002, and a new wave has been collected every two years since. You can find more info and access the data at https://www.europeansocialsurvey.org. The data we will analyze for this practical are contained in the file named ESSround1-a.sav. This file contains data for all respondents, but only includes those variables that you will need to complete the following exercises. 4.3.1 Load the ESSround1-a.sav dataset into R. Inspect the data after loading to make sure everything went well. Click to show code ## Load the 'haven' and 'tidySEM' packages: library(haven) library(tidySEM) ## Read the 'ESSround1-a.sav' data into a data frame called 'ess': ess <- read_spss("ESSround1-a.sav") ## Inspect the result: dim(ess) head(ess) descriptives(ess) ## [1] 42359 50 Click here for a description of the variables. 
Variable Description name Title of dataset essround ESS round edition Edition proddate Production date cntry Country idno Respondent’s identification number trstlgl Trust in the legal system trstplc Trust in the police trstun Trust in the United Nations trstep Trust in the European Parliament trstprl Trust in country’s parliament stfhlth State of health services in country nowadays stfedu State of education in country nowadays stfeco How satisfied with present state of economy in country stfgov How satisfied with the national government stfdem How satisfied with the way democracy works in country pltinvt Politicians interested in votes rather than peoples opinions pltcare Politicians in general care what people like respondent think trstplt Trust in politicians imsmetn Allow many/few immigrants of same race/ethnic group as majority imdfetn Allow many/few immigrants of different race/ethnic group from majority eimrcnt Allow many/few immigrants from richer countries in Europe eimpcnt Allow many/few immigrants from poorer countries in Europe imrcntr Allow many/few immigrants from richer countries outside Europe impcntr Allow many/few immigrants from poorer countries outside Europe qfimchr Qualification for immigration: christian background qfimwht Qualification for immigration: be white imwgdwn Average wages/salaries generally brought down by immigrants imhecop Immigrants harm economic prospects of the poor more than the rich imtcjob Immigrants take jobs away in country or create new jobs imbleco Taxes and services: immigrants take out more than they put in or less imbgeco Immigration bad or good for country’s economy imueclt Country’s cultural life undermined or enriched by immigrants imwbcnt Immigrants make country worse or better place to live imwbcrm Immigrants make country’s crime problems worse or better imrsprc Richer countries should be responsible for accepting people from poorer countries pplstrd Better for a country if almost everyone share customs and 
traditions vrtrlg Better for a country if a variety of different religions shrrfg Country has more than its fair share of people applying refugee status rfgawrk People applying refugee status allowed to work while cases considered gvrfgap Government should be generous judging applications for refugee status rfgfrpc Most refugee applicants not in real fear of persecution own countries rfggvfn Financial support to refugee applicants while cases considered rfgbfml Granted refugees should be entitled to bring close family members gndr Gender yrbrn Year of birth edulvl Highest level of education eduyrs Years of full-time education completed polintr How interested in politics lrscale Placement on left right scale One thing you might notice when inspecting the ess data is that most of the variables are stored as labelled vectors. When loading SPSS data, haven will use these labelled vectors to preserve the metadata associated with SPSS scale variables (i.e., variable labels and value labels). While it’s good to have this metadata available, we want to analyze these items as numeric variables and factors, so the value labels are only going to make our lives harder. Thankfully, the labelled package contains many routines for manipulating labelled vectors. We’ll deal with the numeric variables in just a bit, but our first task will be to convert the grouping variables to factors. 4.3.2 Convert the cntry, gndr, edulvl, and polintr variables into factors. Use the as_factor() function to do the conversion. Convert edulvl and polintr to ordered factors. Click to see code library(dplyr) ## Add factor versions of cntry and gndr as new columns (country, sex), and convert edulvl and polintr to ordered factors: ess <- mutate(ess, country = as_factor(cntry), sex = as_factor(gndr), edulvl = as_factor(edulvl, ordered = TRUE), polintr = as_factor(polintr, ordered = TRUE) ) The ess dataset contains much more information than Kestilä (2006) used.
Kestilä only analyzed data from the following ten countries: Austria, Belgium, Denmark, Finland, France, Germany, Italy, Netherlands, Norway, and Sweden. So, our next task is to subset the data to only the relevant population. Logical subsetting selects the rows of a dataset that satisfy a logical condition. In this case, we want to select only the rows from the 10 countries listed above. 4.3.3 Subset the data to include only the 10 countries analyzed by Kestilä (2006). Inspect the subsetted data to check that everything went well. Hints: Use the %in% operator to create a logical vector that indicates which elements of the country variable (the factor version of cntry) are in the set of target countries. Use the droplevels() function to clean up empty factor levels. Click to show code ## Create a character vector naming the target countries: targets <- c("Austria", "Belgium", "Denmark", "Finland", "France", "Germany", "Italy", "Netherlands", "Norway", "Sweden") ## Select only those rows that come from a target country: ess <- filter(ess, country %in% targets) %>% # Subset rows droplevels() # Drop empty factor levels ## Inspect the result: dim(ess) ## [1] 19690 52 table(ess$country) ## ## Austria Belgium Germany Denmark Finland France ## 2257 1899 2919 1506 2000 1503 ## Italy Netherlands Norway Sweden ## 1207 2364 2036 1999 In keeping with common practice, we will treat ordinal Likert-type rating scales with five or more levels as continuous. Since some R routines will treat labelled vectors as discrete variables, we can make things easier for ourselves by converting all the labelled vectors in our data to numeric vectors. We can use the labelled::remove_val_labels() function to strip the value labels and convert all of the labelled vectors to numeric vectors. 4.3.4 Convert the remaining labelled vectors to numeric vectors.
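The logical subsetting from 4.3.3 can be seen on a small scale in base R. In this sketch, the data frame toy, its values, and the three target countries are hypothetical stand-ins for ess and targets; it only illustrates how %in% builds the logical vector and how droplevels() removes factor levels that no longer occur after subsetting.

```r
## Hypothetical toy data standing in for 'ess' (not ESS values):
toy <- data.frame(
  country = factor(c("Austria", "Spain", "Finland", "Poland", "Sweden")),
  trstplc = c(8, 5, 9, 4, 7)
)
targets <- c("Austria", "Finland", "Sweden")

## Logical vector: TRUE where a row's country is one of the targets
keep <- toy$country %in% targets
print(keep)  # TRUE FALSE TRUE FALSE TRUE

## Keep the TRUE rows, then drop the now-empty levels ("Poland", "Spain")
toy_sub <- droplevels(toy[keep, ])
print(levels(toy_sub$country))  # "Austria" "Finland" "Sweden"
```

Without droplevels(), levels(toy[keep, ]$country) would still list all five countries, and tables or plots by country would show empty categories.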
Click to see code ## If necessary, install the labelled package: # install.packages("labelled", repos = "https://cloud.r-project.org") ## Load the labelled package: library(labelled) ## Strip the value labels: ess <- remove_val_labels(ess) ## Check the effects: str(ess) ## tibble [19,690 × 52] (S3: tbl_df/tbl/data.frame) ## $ name : chr [1:19690] "ESS1e06_1" "ESS1e06_1" "ESS1e06_1" "ESS1e06_1" ... ## ..- attr(*, "label")= chr "Title of dataset" ## ..- attr(*, "format.spss")= chr "A9" ## ..- attr(*, "display_width")= int 14 ## $ essround: num [1:19690] 1 1 1 1 1 1 1 1 1 1 ... ## ..- attr(*, "label")= chr "ESS round" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 10 ## $ edition : chr [1:19690] "6.1" "6.1" "6.1" "6.1" ... ## ..- attr(*, "label")= chr "Edition" ## ..- attr(*, "format.spss")= chr "A3" ## ..- attr(*, "display_width")= int 9 ## $ proddate: chr [1:19690] "03.10.2008" "03.10.2008" "03.10.2008" "03.10.2008" ... ## ..- attr(*, "label")= chr "Production date" ## ..- attr(*, "format.spss")= chr "A10" ## ..- attr(*, "display_width")= int 12 ## $ cntry : num [1:19690] 1 18 1 1 18 1 2 18 1 18 ... ## ..- attr(*, "label")= chr "Country" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 7 ## $ idno : num [1:19690] 1 1 2 3 3 4 4 4 6 6 ... ## ..- attr(*, "label")= chr "Respondent's identification number" ## ..- attr(*, "format.spss")= chr "F9.0" ## ..- attr(*, "display_width")= int 11 ## $ trstlgl : num [1:19690] 10 6 8 4 8 10 9 7 7 7 ... ## ..- attr(*, "label")= chr "Trust in the legal system" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ trstplc : num [1:19690] 10 8 5 8 8 9 8 9 4 9 ... ## ..- attr(*, "label")= chr "Trust in the police" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ trstun : num [1:19690] 9 8 6 NA 5 8 NA 7 5 7 ... 
## ..- attr(*, "label")= chr "Trust in the United Nations" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ trstep : num [1:19690] NA 3 0 7 3 7 0 3 4 6 ... ## ..- attr(*, "label")= chr "Trust in the European Parliament" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ trstprl : num [1:19690] 9 7 0 6 8 8 10 2 6 8 ... ## ..- attr(*, "label")= chr "Trust in country's parliament" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ stfhlth : num [1:19690] 10 4 0 7 6 8 NA 6 3 5 ... ## ..- attr(*, "label")= chr "State of health services in country nowadays" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ stfedu : num [1:19690] 8 7 7 5 8 7 NA 7 6 7 ... ## ..- attr(*, "label")= chr "State of education in country nowadays" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ stfeco : num [1:19690] 7 6 0 7 8 6 NA 9 8 9 ... ## ..- attr(*, "label")= chr "How satisfied with present state of economy in country" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ stfgov : num [1:19690] 7 7 0 7 6 3 NA 5 5 7 ... ## ..- attr(*, "label")= chr "How satisfied with the national government" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ stfdem : num [1:19690] 8 5 5 5 7 7 NA 7 7 9 ... ## ..- attr(*, "label")= chr "How satisfied with the way democracy works in country" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ pltinvt : num [1:19690] 1 3 1 1 4 1 1 3 2 3 ... ## ..- attr(*, "label")= chr "Politicians interested in votes rather than peoples opinions" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ pltcare : num [1:19690] 1 4 1 1 4 3 2 5 2 3 ... ## ..- attr(*, "label")= chr "Politicians in general care what people like respondent think" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ trstplt : num [1:19690] 0 5 0 2 5 4 8 2 4 6 ... 
## ..- attr(*, "label")= chr "Trust in politicians" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imsmetn : num [1:19690] 4 3 2 3 2 1 NA 2 NA 1 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants of same race/ethnic group as majority" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ imdfetn : num [1:19690] 3 3 2 3 2 2 NA 2 NA 1 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants of different race/ethnic group from majority" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ eimrcnt : num [1:19690] 4 2 2 2 3 1 NA 2 NA 1 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants from richer countries in Europe" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ eimpcnt : num [1:19690] 3 2 2 2 2 2 NA 2 NA 1 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants from poorer countries in Europe" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ imrcntr : num [1:19690] 3 3 2 2 2 1 NA 2 NA 2 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants from richer countries outside Europe" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ impcntr : num [1:19690] 3 2 2 3 2 1 NA 2 NA 2 ... ## ..- attr(*, "label")= chr "Allow many/few immigrants from poorer countries outside Europe" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ qfimchr : num [1:19690] 4 2 0 6 2 0 99 0 1 2 ... ## ..- attr(*, "label")= chr "Qualification for immigration: christian background" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ qfimwht : num [1:19690] 1 0 0 0 0 0 99 0 0 1 ... ## ..- attr(*, "label")= chr "Qualification for immigration: be white" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imwgdwn : num [1:19690] 3 4 2 2 3 3 NA 4 NA 4 ... 
## ..- attr(*, "label")= chr "Average wages/salaries generally brought down by immigrants" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ imhecop : num [1:19690] 2 2 1 4 3 2 NA 3 NA 2 ... ## ..- attr(*, "label")= chr "Immigrants harm economic prospects of the poor more than the rich" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ imtcjob : num [1:19690] 7 5 6 5 7 10 NA 8 NA 4 ... ## ..- attr(*, "label")= chr "Immigrants take jobs away in country or create new jobs" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imbleco : num [1:19690] 9 4 2 NA 3 10 NA 9 NA 6 ... ## ..- attr(*, "label")= chr "Taxes and services: immigrants take out more than they put in or less" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imbgeco : num [1:19690] 4 3 10 7 5 10 NA 8 NA 5 ... ## ..- attr(*, "label")= chr "Immigration bad or good for country's economy" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imueclt : num [1:19690] 9 4 10 5 4 10 NA 9 NA 3 ... ## ..- attr(*, "label")= chr "Country's cultural life undermined or enriched by immigrants" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imwbcnt : num [1:19690] 7 3 5 5 5 10 NA 8 NA 5 ... ## ..- attr(*, "label")= chr "Immigrants make country worse or better place to live" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imwbcrm : num [1:19690] 3 3 5 2 3 5 NA 5 NA 3 ... ## ..- attr(*, "label")= chr "Immigrants make country's crime problems worse or better" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ imrsprc : num [1:19690] 2 2 1 4 1 2 NA 1 1 3 ... 
## ..- attr(*, "label")= chr "Richer countries should be responsible for accepting people from poorer countries" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ pplstrd : num [1:19690] 2 4 2 2 3 4 NA 4 4 2 ... ## ..- attr(*, "label")= chr "Better for a country if almost everyone share customs and traditions" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ vrtrlg : num [1:19690] 3 5 3 2 4 1 NA 4 2 3 ... ## ..- attr(*, "label")= chr "Better for a country if a variety of different religions" ## ..- attr(*, "format.spss")= chr "F1.0" ## $ shrrfg : num [1:19690] 3 2 1 1 3 3 NA 3 4 3 ... ## ..- attr(*, "label")= chr "Country has more than its fair share of people applying refugee status" ## ..- attr(*, "format.spss")= chr "F1.0" ## $ rfgawrk : num [1:19690] 2 2 1 2 2 2 NA 2 1 2 ... ## ..- attr(*, "label")= chr "People applying refugee status allowed to work while cases considered" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ gvrfgap : num [1:19690] 4 3 2 4 2 2 NA 3 2 4 ... ## ..- attr(*, "label")= chr "Government should be generous judging applications for refugee status" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ rfgfrpc : num [1:19690] 4 3 2 4 4 4 NA 4 3 4 ... ## ..- attr(*, "label")= chr "Most refugee applicants not in real fear of persecution own countries" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ rfggvfn : num [1:19690] 2 3 2 4 3 2 NA 2 2 2 ... ## ..- attr(*, "label")= chr "Financial support to refugee applicants while cases considered" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ rfgbfml : num [1:19690] 2 3 1 2 2 1 NA 4 2 3 ... 
## ..- attr(*, "label")= chr "Granted refugees should be entitled to bring close family members" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 9 ## $ gndr : num [1:19690] 1 2 1 2 2 1 NA 2 2 1 ... ## ..- attr(*, "label")= chr "Gender" ## ..- attr(*, "format.spss")= chr "F1.0" ## ..- attr(*, "display_width")= int 6 ## $ yrbrn : num [1:19690] 1949 1978 1953 1940 1964 ... ## ..- attr(*, "label")= chr "Year of birth" ## ..- attr(*, "format.spss")= chr "F4.0" ## ..- attr(*, "display_width")= int 7 ## $ edulvl : Ord.factor w/ 7 levels "Not completed primary education"<..: NA 4 NA NA 4 NA NA 7 NA 6 ... ## $ eduyrs : num [1:19690] 11 16 14 9 12 18 NA 17 15 17 ... ## ..- attr(*, "label")= chr "Years of full-time education completed" ## ..- attr(*, "format.spss")= chr "F2.0" ## $ polintr : Ord.factor w/ 4 levels "Very interested"<..: 3 3 1 2 3 2 1 4 3 3 ... ## $ lrscale : num [1:19690] 6 7 6 5 8 5 NA 8 5 7 ... ## ..- attr(*, "label")= chr "Placement on left right scale" ## ..- attr(*, "format.spss")= chr "F2.0" ## ..- attr(*, "display_width")= int 9 ## $ country : Factor w/ 10 levels "Austria","Belgium",..: 1 9 1 1 9 1 2 9 1 9 ... ## $ sex : Factor w/ 2 levels "Male","Female": 1 2 1 2 2 1 NA 2 2 1 ... descriptives(ess) Click for explanation Note that the numeric variables are now simple numeric vectors, but the variable labels have been retained as column attributes (which is probably useful). If we want to completely nuke the labelling information, we can use the labelled::remove_labels() function to do so. In addition to screening with summary statistics, we can also visualize the variables’ distributions. You have already created a few such visualizations for single variables. Now, we will use a few tricks to efficiently plot each of our target variables. 
The first step in this process will be to convert the interesting part of our data from “wide format” (one column per variable) into “long format” (one column of variable names, one column of data values). The pivot_longer() function from the tidyr package provides a convenient way to execute this conversion. 4.3.5 Use tidyr::pivot_longer() to create a long-formatted data frame from the target variables in ess. The target variables are all columns from trstlgl to rfgbfml. Click to show code ## Load the tidyr package: library(tidyr) ## Convert the target variables into a long-formatted data frame: ess_plot <- pivot_longer(ess, cols = trstlgl:rfgbfml, # Which columns to convert names_to = "variable", # Name for the new grouping variable values_to = "value") # Name for the column of stacked values The next step in the process will be to plot the variables using ggplot(). In the above code, I’ve named the new grouping variable “variable” and the new column of stacked data values “value”. So, to create one plot for each (original, wide-format) variable, we will use the facet_wrap() function to facet the plots of value on the variable column (i.e., create a separate plot of value for each unique entry in variable). 4.3.6 Use ggplot() with an appropriate geom (e.g., geom_histogram(), geom_density(), geom_boxplot()) and facet_wrap() to visualize each of the target variables. Hint: To implement the faceting, simply add facet_wrap(~ variable, scales = \"free_x\") to the end of your ggplot() call (replacing “variable” with whatever you named the grouping variable in your pivot_longer() call). Click to show code library(ggplot2) ggplot(ess_plot, aes(x = value)) + geom_histogram() + # Create a histogram facet_wrap(~ variable, scales = "free_x") # Facet on 'variable' Click for explanation Notice that the variables are actually discrete (i.e., each variable takes only a limited number of integer values). However, most variables look relatively normal despite being discrete.
So, we’ll bend the rules a bit and analyze these variables as continuous. It also looks like there’s something weird going on with qfimchr and qfimwht. More on that below. 4.3.7 Check the descriptives for the target variables again. Do you see any remaining issues? Click to show code select(ess, trstlgl:rfgbfml) %>% descriptives() Click for explanation The variables qfimchr and qfimwht both contain values that fall outside the expected range for our survey responses: 77, 88, and 99. In SPSS, these codes were labeled “Refusal,” “Don’t know,” and “No answer,” respectively, and were treated as missing, so they would not have contributed to the analysis. 4.3.8 Correct any remaining issues you found above. Click to show code ess <- ess %>% mutate(across(c(qfimchr, qfimwht), ~ na_if(.x, 77))) %>% mutate(across(c(qfimchr, qfimwht), ~ na_if(.x, 88))) %>% mutate(across(c(qfimchr, qfimwht), ~ na_if(.x, 99))) ## Check the results: select(ess, trstlgl:rfgbfml) %>% descriptives() Click to show explanation Here, we need to tell R that these values should be considered missing, or NA. Otherwise, these codes will enter the analysis as ordinary numeric values, as though someone had answered 77 on a 0–10 scale. We’ve done quite a bit of data processing, and we’ll continue to use these data for several future practicals, so it would be a good idea to save the processed dataset for later use. When saving data that you plan to analyze in R, you will usually want to use the R Data Set (RDS) format. Datasets saved in RDS format retain all of their attributes and formatting (e.g., factors are still factors, missing values are coded as NA, etc.). So, you don’t have to redo any data processing before future analyses. 4.3.9 Use the saveRDS() function to save the processed dataset. Click to show code ## Save the processed data: saveRDS(ess, "ess_round1.rds") Now, we’re ready to run the analyses and see if we can replicate the Kestilä (2006) results.
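To see why the 77/88/99 codes from 4.3.8 must be recoded before analysis, here is a minimal base-R sketch using a hypothetical vector of answers to a 0–10 item (made-up values, not ESS data). The replace() plus %in% idiom is an alternative to the repeated na_if() calls, assuming the same three missing-data codes.

```r
## Hypothetical answers to a 0-10 item, including SPSS-style
## missing-data codes (77 = refusal, 88 = don't know, 99 = no answer)
x <- c(4, 2, 0, 6, 99, 1, 77, 3, 88, 5)

## The codes are treated as real answers and inflate the mean
raw_mean <- mean(x)  # 28.5, far outside the 0-10 range

## Recode all three missing-data codes to NA in one step
x_clean <- replace(x, x %in% c(77, 88, 99), NA)

clean_mean <- mean(x_clean, na.rm = TRUE)  # 3
print(c(raw = raw_mean, clean = clean_mean))
```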
4.3.10 Run two principal component analyses (PCA): one for trust in politics, one for attitudes towards immigration. Use the principal() function from the psych package. Use exactly the same specifications as Kestilä (2006) concerning the estimation method, rotation, number of components extracted, etc. Hints: Remember that you can view the help file for psych::principal() by running ?psych::principal or, if the psych package is already loaded, simply running ?principal. When you print the output from psych::principal(), you can use the cut option to hide any loadings smaller than a given threshold. You could consider hiding any loadings smaller than those reported by Kestilä (2006) to make the output easier to interpret. Click to show code Trust in politics Kestilä extracted three components with VARIMAX rotation. ## Load the psych package: library(psych) ## Run the PCA: pca_trust <- select(ess, trstlgl:trstplt) %>% principal(nfactors = 3, rotate = "varimax") ## Print the results: print(pca_trust, cut = 0.3, digits = 3) ## Principal Components Analysis ## Call: principal(r = ., nfactors = 3, rotate = "varimax") ## Standardized loadings (pattern matrix) based upon correlation matrix ## RC3 RC2 RC1 h2 u2 com ## trstlgl 0.779 0.669 0.331 1.21 ## trstplc 0.761 0.633 0.367 1.18 ## trstun 0.675 0.556 0.444 1.44 ## trstep 0.651 0.332 0.549 0.451 1.57 ## trstprl 0.569 0.489 0.650 0.350 2.49 ## stfhlth 0.745 0.567 0.433 1.04 ## stfedu 0.750 0.603 0.397 1.14 ## stfeco 0.711 0.300 0.616 0.384 1.44 ## stfgov 0.634 0.377 0.587 0.413 1.88 ## stfdem 0.369 0.568 0.325 0.564 0.436 2.38 ## pltinvt 0.817 0.695 0.305 1.08 ## pltcare 0.811 0.695 0.305 1.11 ## trstplt 0.510 0.611 0.716 0.284 2.40 ## ## RC3 RC2 RC1 ## SS loadings 2.942 2.668 2.490 ## Proportion Var 0.226 0.205 0.192 ## Cumulative Var 0.226 0.432 0.623 ## Proportion Explained 0.363 0.329 0.307 ## Cumulative Proportion 0.363 0.693 1.000 ## ## Mean item complexity = 1.6 ## Test of the hypothesis that 3 components are
sufficient. ## ## The root mean square of the residuals (RMSR) is 0.07 ## with the empirical chi square 15240.94 with prob < 0 ## ## Fit based upon off diagonal values = 0.967 Attitudes toward immigration Kestilä extracted five components with VARIMAX rotation. pca_att <- select(ess, imsmetn:rfgbfml) %>% principal(nfactors = 5, rotate = "varimax") print(pca_att, cut = 0.3, digits = 3) ## Principal Components Analysis ## Call: principal(r = ., nfactors = 5, rotate = "varimax") ## Standardized loadings (pattern matrix) based upon correlation matrix ## RC2 RC1 RC5 RC3 RC4 h2 u2 com ## imsmetn 0.797 0.725 0.275 1.30 ## imdfetn 0.775 0.794 0.206 1.70 ## eimrcnt 0.827 0.715 0.285 1.09 ## eimpcnt 0.800 0.789 0.211 1.49 ## imrcntr 0.835 0.747 0.253 1.15 ## impcntr 0.777 0.782 0.218 1.63 ## qfimchr 0.813 0.688 0.312 1.08 ## qfimwht 0.752 0.637 0.363 1.26 ## imwgdwn 0.807 0.712 0.288 1.19 ## imhecop 0.747 0.669 0.331 1.42 ## imtcjob 0.569 0.334 0.484 0.516 1.99 ## imbleco 0.703 0.554 0.446 1.25 ## imbgeco 0.698 0.605 0.395 1.52 ## imueclt 0.568 -0.340 0.545 0.455 2.43 ## imwbcnt 0.673 0.633 0.367 1.87 ## imwbcrm 0.655 0.478 0.522 1.23 ## imrsprc 0.614 0.440 0.560 1.34 ## pplstrd 0.324 -0.551 0.468 0.532 2.11 ## vrtrlg -0.345 0.471 0.419 0.581 2.67 ## shrrfg 0.365 -0.352 0.418 0.582 4.16 ## rfgawrk 0.614 0.396 0.604 1.10 ## gvrfgap 0.691 0.559 0.441 1.35 ## rfgfrpc -0.387 0.327 0.673 3.34 ## rfggvfn 0.585 0.417 0.583 1.46 ## rfgbfml 0.596 0.460 0.540 1.61 ## ## RC2 RC1 RC5 RC3 RC4 ## SS loadings 4.374 3.393 2.774 2.199 1.723 ## Proportion Var 0.175 0.136 0.111 0.088 0.069 ## Cumulative Var 0.175 0.311 0.422 0.510 0.579 ## Proportion Explained 0.302 0.235 0.192 0.152 0.119 ## Cumulative Proportion 0.302 0.537 0.729 0.881 1.000 ## ## Mean item complexity = 1.7 ## Test of the hypothesis that 5 components are sufficient. 
## ## The root mean square of the residuals (RMSR) is 0.05 ## with the empirical chi square 29496.06 with prob < 0 ## ## Fit based upon off diagonal values = 0.976 Feature engineering (i.e., creating new variables by combining and/or transforming existing variables) is one of the most common applications of PCA. PCA is a dimension reduction technique that distills the most salient information from a set of variables into a (smaller) set of component scores. Hence, PCA can be a good way of creating aggregate items (analogous to weighted scale scores) when the data are not collected with validated scales. By default, psych::principal() computes these component scores when we run the PCA. If we want to use these scores in subsequent analyses (e.g., as predictors in a regression model), we usually add them to our dataset as additional columns. 4.3.11 Add the component scores produced by the analyses you ran above to the ess data frame. Give each component score an informative name, based on your interpretation of the loading matrix (i.e., what hypothetical construct do you think each component represents, given the items that load onto it?). Hints: You can use the data.frame() function to join multiple objects into a single data frame. You can use the colnames() function to assign column names to a matrix or data frame. 1. Extract the component scores Click to show code ## Save the component scores in stand-alone matrices: trust_scores <- pca_trust$scores att_scores <- pca_att$scores ## Inspect the result: head(trust_scores) ## RC3 RC2 RC1 ## [1,] NA NA NA ## [2,] 0.09755193 -0.01552183 0.994954 ## [3,] 0.23069626 -1.53162604 -2.022642 ## [4,] NA NA NA ## [5,] -0.21112678 0.84370377 1.200007 ## [6,] 1.86596955 0.31083233 -1.062603 summary(trust_scores) ## RC3 RC2 RC1 ## Min. :-4.035 Min. :-3.706 Min.
:-3.139 ## 1st Qu.:-0.527 1st Qu.:-0.652 1st Qu.:-0.649 ## Median : 0.155 Median : 0.094 Median : 0.092 ## Mean : 0.055 Mean : 0.015 Mean : 0.049 ## 3rd Qu.: 0.727 3rd Qu.: 0.742 3rd Qu.: 0.742 ## Max. : 3.302 Max. : 3.452 Max. : 3.539 ## NA's :4912 NA's :4912 NA's :4912 head(att_scores) ## RC2 RC1 RC5 RC3 RC4 ## [1,] 1.9873715 1.3233586 -0.8382499 -0.02172765 -0.0908143 ## [2,] 0.1692841 -1.2178436 -0.5016936 -0.21749066 0.6758844 ## [3,] -0.3630480 0.3260383 -1.5133423 -0.51405480 -2.2071787 ## [4,] NA NA NA NA NA ## [5,] -0.1137484 -0.7891232 -1.4732563 -0.05843873 0.4110692 ## [6,] -0.9195530 2.8231404 -0.3480398 -0.75699796 -1.3230602 summary(att_scores) ## RC2 RC1 RC5 RC3 ## Min. :-3.660 Min. :-3.929 Min. :-3.824 Min. :-2.764 ## 1st Qu.:-0.616 1st Qu.:-0.585 1st Qu.:-0.656 1st Qu.:-0.748 ## Median :-0.085 Median : 0.062 Median :-0.008 Median :-0.121 ## Mean :-0.013 Mean : 0.012 Mean : 0.021 Mean : 0.014 ## 3rd Qu.: 0.680 3rd Qu.: 0.654 3rd Qu.: 0.652 3rd Qu.: 0.698 ## Max. : 3.743 Max. : 4.584 Max. : 4.108 Max. : 4.084 ## NA's :5447 NA's :5447 NA's :5447 NA's :5447 ## RC4 ## Min. :-3.784 ## 1st Qu.:-0.683 ## Median : 0.046 ## Mean : 0.003 ## 3rd Qu.: 0.717 ## Max. : 3.254 ## NA's :5447 Click for explanation The object produced by psych::principal() is simply a list, and the component scores are already stored therein (in its scores element). So, we can extract them with the $ operator. 2. Name the component scores Click to show code ## Check names (note the order): colnames(trust_scores) ## [1] "RC3" "RC2" "RC1" colnames(att_scores) ## [1] "RC2" "RC1" "RC5" "RC3" "RC4" ## Give informative names: colnames(trust_scores) <- c("Trust_Institutions", "Satisfaction", "Trust_Politicians") colnames(att_scores) <- c("Quantity", "Effects", "Refugees", "Diversity", "Economic") 3.
Add the component scores to the dataset Click to show code # Add the component scores to the 'ess' data: ess <- data.frame(ess, trust_scores, att_scores) 4.3.12 Were you able to replicate the results of Kestilä (2006)? Click for explanation Yes, more or less. Although the exact estimates differ somewhat, the general pattern of loadings in Kestilä (2006) matches what we found here. End of At-Home Exercises "],["in-class-exercises-3.html", "4.4 In-Class Exercises", " 4.4 In-Class Exercises In these exercises, we will continue with our re-analysis/replication of the Kestilä (2006) results. Rather than attempting a direct replication, we will now redo the analysis using exploratory factor analysis (EFA). 4.4.1 Load the ess_round1.rds dataset. These are the data that we saved after the data processing in the At-Home Exercises. Click to show code ## Load dplyr (for select() and %>%): library(dplyr) ess <- readRDS("ess_round1.rds") 4.4.2 Kestilä (2006) claimed that running a PCA is a good way to test whether the questions in the ESS measure attitudes towards immigration and trust in politics. Based on what you’ve learned from the readings and lectures, do you agree with this position? Click for explanation Hopefully not. PCA is not a method for estimating latent measurement structure; PCA is a dimension reduction technique that tries to summarize a set of data with a smaller set of component scores. If we really want to estimate the factor structure underlying a set of observed variables, we should use EFA. 4.4.3 Suppose you had to construct the trust in politics and attitude towards immigration scales described by Kestilä (2006) based on the theory and background information presented in that article. What type of analysis would you choose? What key factors would influence your decision? Click for explanation We are trying to estimate meaningful latent factors, so EFA would be an appropriate method.
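The difference between PCA and EFA can be made concrete with simulated data whose true loadings are known. This base-R sketch is an illustration only, not part of Kestilä's analysis: it generates four items from a single factor with hypothetical standardized loadings of .8, .7, .6, and .5, then compares the first principal component's loadings (computed with eigen()) to maximum-likelihood EFA loadings from stats::factanal(), which is used here instead of psych::fa() so the sketch needs only base R.

```r
set.seed(1)
n <- 5000
lambda <- c(0.8, 0.7, 0.6, 0.5)  # hypothetical true loadings

## Simulate standardized items: y = lambda * f + unique error
f <- rnorm(n)
x <- sapply(lambda, function(l) l * f + rnorm(n, sd = sqrt(1 - l^2)))
colnames(x) <- paste0("y", 1:4)

## PCA loadings of the first component: eigenvector * sqrt(eigenvalue)
e <- eigen(cor(x), symmetric = TRUE)
pca_load <- abs(e$vectors[, 1]) * sqrt(e$values[1])

## One-factor maximum-likelihood EFA loadings
efa_load <- abs(unclass(factanal(x, factors = 1)$loadings)[, 1])

round(cbind(true = lambda, pca = pca_load, efa = efa_load), 2)
```

The PCA loadings should come out larger than the generating values, because the component also absorbs each item's unique (error) variance, whereas the EFA loadings should land close to the true loadings.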
The theory presented by Kestilä (2006) did not hypothesize a particular number of factors, so we would need to use appropriate techniques to estimate the best number, combining information from scree plots, parallel analysis, and the substantive interpretability of the (rotated) factor loadings. Since the factors are almost certainly correlated, we should apply an oblique rotation. We will now redo the two PCAs from the At-Home Exercises as EFAs. We will estimate the EFA models using the psych::fa() function, but we need to know how many factors to extract. We could simply estimate a range of solutions and compare the results. We can restrict the range of plausible solutions and save some time by first checking/plotting the eigenvalues and running parallel analysis. 4.4.4 Estimate the number of latent factors underlying the Trust items based on the eigenvalues, the scree plot, and parallel analysis. How many factors are suggested by each method? 1. Eigenvalue estimation Click to show code ## Load the psych package: library(psych) ## Run a trivial EFA on the 'trust' items: efa_trust0 <- select(ess, trstlgl:trstplt) %>% fa(nfactors = 1, rotate = "none") Click for explanation (EFA) First, we run a trivial EFA using the psych::fa() function to estimate the eigenvalues. We don’t care about the factors yet, so we can extract a single factor. We also don’t care about interpretable solutions, so we don’t need rotation. ## View the estimated eigenvalues: round(efa_trust0$values, digits = 3) ## [1] 4.980 0.716 0.482 0.165 0.069 0.014 -0.066 -0.092 -0.182 -0.207 ## [11] -0.284 -0.296 -0.319 Click for explanation (eigenvalue extraction) We can check the eigenvalues to see what proportion of the observed variance is accounted for by each additional factor we may extract. Since only one eigenvalue is greater than one, the so-called “Kaiser Criterion” would suggest extracting a single factor.
The Kaiser Criterion is not a valid way to select the number of factors in EFA. So, we don’t want to rely on this information alone. We can still use the eigenvalues to help us with factor enumeration, though. One way to do so is by plotting the eigenvalues in a scree plot. 2. Scree plot Click to show code Given a vector of estimated eigenvalues, we can create a scree plot using ggplot() and the geom_line() or geom_path() geometry. library(ggplot2) library(magrittr) efa_trust0 %$% data.frame(y = values, x = seq_along(values)) %>% ggplot(aes(x, y)) + geom_line() + xlab("No. of Factors") + ylab("Eigenvalues") We can also use the psych::scree() function to create a scree plot directly from the data. select(ess, trstlgl:trstplt) %>% scree(pc = FALSE) Click for explanation (scree plot) Although the scree plot provides useful information, we need to interpret that information subjectively, and the conclusions are sometimes ambiguous. In this case, the plot seems to suggest either one or three factors, depending on where we consider the “elbow” to lie. As recommended in the lecture, we can also use “parallel analysis” (Horn, 1965) to provide more objective information about the number of factors. We’ll use the psych::fa.parallel() function to implement parallel analysis. Parallel analysis relies on randomly simulated/permuted data, so we should set a seed to make sure our results are reproducible. We can set the fa = \"fa\" option to get only the results for EFA. 3. Parallel Analysis Click to show code ## Set the random number seed: set.seed(235711) ## Run the parallel analysis: pa_trust <- select(ess, trstlgl:trstplt) %>% fa.parallel(fa = "fa") ## Parallel analysis suggests that the number of factors = 6 and the number of components = NA Click for explanation The results of the parallel analysis suggest 6 factors.
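The logic of parallel analysis fits in a few lines of base R. This is a simplified sketch, not psych::fa.parallel() itself: it assumes eigenvalues of the reduced correlation matrix (squared multiple correlations on the diagonal), compares them to the average eigenvalues from uncorrelated normal data of the same size, and retains factors until an observed eigenvalue first fails to exceed its random-data counterpart. psych::fa.parallel() differs in several details, so the two need not agree exactly. The function names and the simulated two-factor data (three items per factor, hypothetical loadings of .7) are illustrative.

```r
## Eigenvalues of the reduced correlation matrix: squared multiple
## correlations (SMCs) replace the 1s on the diagonal
reduced_eigen <- function(x) {
  r <- cor(x)
  diag(r) <- 1 - 1 / diag(solve(r))
  eigen(r, symmetric = TRUE, only.values = TRUE)$values
}

parallel_analysis <- function(x, n_iter = 100) {
  n <- nrow(x)
  p <- ncol(x)
  observed <- reduced_eigen(x)
  ## Average eigenvalues across random, uncorrelated n x p data sets
  random <- replicate(n_iter, reduced_eigen(matrix(rnorm(n * p), n, p)))
  reference <- rowMeans(random)
  ## Count factors up to the first observed eigenvalue that does not
  ## exceed its random-data counterpart
  below <- which(observed <= reference)
  n_factors <- if (length(below) > 0) below[1] - 1 else p
  list(observed = observed, reference = reference, n_factors = n_factors)
}

## Hypothetical check: two uncorrelated factors, three items each
set.seed(235711)
n <- 1000
make_items <- function(f) {
  sapply(1:3, function(i) 0.7 * f + rnorm(n, sd = sqrt(0.51)))
}
x <- cbind(make_items(rnorm(n)), make_items(rnorm(n)))

pa <- parallel_analysis(x)
pa$n_factors
```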
If you’ve been paying close attention, you may have noticed that we need to compute the eigenvalues from the original data to run parallel analysis. Hence, we don’t actually need to run a separate EFA to estimate the eigenvalues. ## View the eigenvalues estimated during the parallel analysis: pa_trust$fa.values ## [1] 4.97995262 0.71644127 0.48201040 0.16517645 0.06885820 0.01422241 ## [7] -0.06606777 -0.09225113 -0.18231333 -0.20740917 -0.28415857 -0.29573407 ## [13] -0.31877470 ## Compare to the version from the EFA: pa_trust$fa.values - efa_trust0$values ## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 ## Recreate the scree plot from above: pa_trust %$% data.frame(y = fa.values, x = seq_along(fa.values)) %>% ggplot(aes(x, y)) + geom_line() + xlab("No. of Factors") + ylab("Eigenvalues") Of course, we also see the same scree plot printed as part of the parallel analysis. So, there’s really no reason to create a separate scree plot at all if we’re doing parallel analysis. 4. Conclusion Click for explanation The different criteria disagree on how many factors we should extract, but we have narrowed the range. Based on the scree plot and parallel analysis, we should consider solutions for 3 to 6 factors. We need to examine the factor loadings to see which solution makes the most substantive sense. 4.4.5 Do the same analysis for the attitudes toward immigration items. Click to show code This time, we’ll start by running the parallel analysis and getting the eigenvalues and scree plot from psych::fa.parallel().
## Set the seed: set.seed(235711) ## Run parallel analysis on the 'attitudes' items: pa_att <- select(ess, imsmetn:rfgbfml) %>% fa.parallel(fa = "fa") ## Parallel analysis suggests that the number of factors = 7 and the number of components = NA ## Check the eigenvalues: round(pa_att$fa.values, digits = 3) ## [1] 7.895 1.449 0.734 0.533 0.313 0.156 0.121 0.019 -0.001 -0.064 ## [11] -0.083 -0.103 -0.119 -0.131 -0.150 -0.175 -0.185 -0.200 -0.212 -0.233 ## [21] -0.239 -0.247 -0.334 -0.422 -0.427 Click for explanation For the attitudes toward immigration analysis, the results are even more ambiguous than they were for the trust items. The Kaiser Criterion suggests 2 factors. The scree plot is hopelessly ambiguous. At least 3 factors? No more than 9 factors? Parallel analysis suggests 7 factors. Based on the scree plot and parallel analysis, it seems reasonable to consider solutions for 3 to 7 factors. Again, we need to check the substantive interpretation to choose the most reasonable solution. To evaluate the substantive interpretability of the different solutions, we need to estimate the full EFA models for each candidate number of factors. We then compare the factor loadings across solutions to see which set of loadings defines the most reasonable set of latent variables. 4.4.6 For the trust items, estimate the EFA models for each plausible number of factors that you identified above. Use the psych::fa() function to estimate the models. You will need to specify a few key options: the data (including only the variables you want to analyze), the number of factors that you want to extract, the rotation method, the estimation method, and the method of estimating factor scores. Hint: You can save yourself a lot of typing/copy-pasting (and the attendant chances of errors) by using a for() loop to iterate through numbers of factors. 
Click to show code ## Define an empty list to hold all of our fitted EFA objects: efa_trust <- list() ## Loop through the interesting numbers of factors and estimate an EFA for each: for(i in 3:6) efa_trust[[as.character(i)]] <- ess %>% select(trstlgl:trstplt) %>% fa(nfactors = i, # Number of factors = Loop index rotate = "promax", # Oblique rotation scores = "Bartlett") # Estimate factor scores with WLS 4.4.7 Repeat the above analysis for the attitudes items. Click to show code efa_att <- list() for(i in 3:7) efa_att[[as.character(i)]] <- ess %>% select(imsmetn:rfgbfml) %>% fa(nfactors = i, rotate = "promax", scores = "Bartlett") 4.4.8 Compare the factor loading matrices from the models estimated from the Trust items, and select the best solution. Hints: The factor loadings are stored in the loadings slot of the object returned by psych::fa(). Looping can also be useful here. Click to show code for(x in efa_trust) print(x$loadings) ## ## Loadings: ## MR3 MR2 MR1 ## trstlgl 0.839 -0.115 ## trstplc 0.763 -0.218 ## trstun 0.579 0.161 ## trstep 0.554 0.198 ## trstprl 0.444 0.342 ## stfhlth 0.656 -0.125 ## stfedu 0.695 -0.157 ## stfeco -0.102 0.704 0.146 ## stfgov 0.593 0.226 ## stfdem 0.183 0.476 0.150 ## pltinvt 0.813 ## pltcare 0.808 ## trstplt 0.330 0.526 ## ## MR3 MR2 MR1 ## SS loadings 2.299 2.016 1.970 ## Proportion Var 0.177 0.155 0.152 ## Cumulative Var 0.177 0.332 0.483 ## ## Loadings: ## MR2 MR1 MR4 MR3 ## trstlgl 0.797 ## trstplc 0.725 ## trstun 0.656 0.113 ## trstep 1.003 -0.175 ## trstprl 0.121 0.455 0.200 0.112 ## stfhlth 0.663 -0.106 ## stfedu 0.704 -0.110 0.100 ## stfeco 0.729 ## stfgov 0.631 0.175 -0.149 ## stfdem 0.501 0.107 0.115 ## pltinvt 0.855 ## pltcare -0.103 0.863 ## trstplt 0.479 0.340 ## ## MR2 MR1 MR4 MR3 ## SS loadings 2.161 1.952 1.722 1.239 ## Proportion Var 0.166 0.150 0.132 0.095 ## Cumulative Var 0.166 0.316 0.449 0.544 ## ## Loadings: ## MR1 MR4 MR5 MR3 MR2 ## trstlgl 0.935 ## trstplc 0.810 ## trstun 0.505 0.168 ## trstep -0.138 
1.128 -0.108 -0.154 ## trstprl 0.359 0.250 0.140 0.201 -0.104 ## stfhlth 0.557 ## stfedu 0.752 ## stfeco 0.710 -0.118 0.172 ## stfgov 0.973 -0.132 ## stfdem 0.556 0.153 ## pltinvt 0.882 ## pltcare 0.855 ## trstplt 0.288 0.308 0.313 ## ## MR1 MR4 MR5 MR3 MR2 ## SS loadings 2.019 1.716 1.655 1.674 0.936 ## Proportion Var 0.155 0.132 0.127 0.129 0.072 ## Cumulative Var 0.155 0.287 0.415 0.543 0.615 ## ## Loadings: ## MR5 MR1 MR4 MR3 MR2 MR6 ## trstlgl 0.980 ## trstplc 0.655 ## trstun 0.911 ## trstep -0.116 0.739 0.163 ## trstprl 0.197 0.577 0.138 ## stfhlth 0.614 ## stfedu 0.771 ## stfeco 0.689 -0.123 0.144 ## stfgov 0.891 ## stfdem 0.513 0.144 ## pltinvt 0.816 ## pltcare 0.778 ## trstplt 0.706 0.193 ## ## MR5 MR1 MR4 MR3 MR2 MR6 ## SS loadings 1.606 1.417 1.442 1.327 1.014 0.879 ## Proportion Var 0.124 0.109 0.111 0.102 0.078 0.068 ## Cumulative Var 0.124 0.233 0.343 0.446 0.524 0.591 Click for explanation Note: Any factor loadings with magnitude lower than 0.1 are suppressed in the above output. The factor loadings matrix indicates how strongly each latent factor (columns) associates with the observed items (rows). We can interpret these factor loadings in the same way that we would interpret regression coefficients (indeed, a factor analytic model can be viewed as a multivariate regression model wherein the latent factors are the predictors and the observed items are the outcomes). A loading with a larger magnitude indicates a stronger association between the item and the factor linked by that loading. Items with high (absolute) factor loadings are “good” indicators of the respective factors. Items with only very low loadings do not provide much information about any factor. You may want to exclude such items from your analysis. Note that the size of the factor loadings depends on the number of factors. So, you should only consider excluding an observed item after you have chosen the number of latent factors. 
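The interpretation rules above can be sketched in base R on a small, made-up loading matrix. These numbers are not output from the ess data. The 0.3 cutoff for a weak item is an illustrative assumption rather than a fixed rule, and weak_items() and primary_factor() are hypothetical helper names. Giving the matrix the class loadings lets R's standard print method blank out small values. psych::fa() returns its loadings with that class, which is why small loadings are blank in the output above.

```r
# Made-up 4 x 2 loading matrix (rows = items, columns = factors)
L <- matrix(c(0.80, 0.05,
              0.10, 0.70,
              0.15, 0.12,
              0.45, 0.40),
            ncol = 2, byrow = TRUE,
            dimnames = list(c('item1', 'item2', 'item3', 'item4'),
                            c('F1', 'F2')))

# Items whose largest absolute loading falls below the (assumed) cutoff
weak_items <- function(loadings, cutoff = 0.3) {
  max_abs <- apply(abs(loadings), 1, max)
  names(max_abs)[max_abs < cutoff]
}

# Factor on which each item loads most strongly, in absolute value
primary_factor <- function(loadings) {
  colnames(loadings)[apply(abs(loadings), 1, which.max)]
}

# Print with loadings below 0.3 in magnitude left blank
print(structure(L, class = 'loadings'), cutoff = 0.3)

weak_items(L)      # 'item3'
primary_factor(L)  # 'F1' 'F2' 'F1' 'F1'
```

Note that item4 loads almost equally on both factors (0.45 vs. 0.40). Assigning it to its primary factor hides that cross-loading, so such items deserve a closer substantive look.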
When we print the loading matrix, we see additional information printed below the factor loadings. SS loadings: The sum of the squared loadings in each column (i.e., for each factor). Proportion Var: What proportion of the items’ variance is explained by each of the factors (i.e., the SS loadings divided by the number of items). Cumulative Var: How much variance the factors explain, in total. In PCA, if you extracted as many components as items, the Cumulative Var for the final component would be 1.00 (i.e., 100%). In EFA, the factors model only the common variance, so the Cumulative Var generally stays below 1.00. 4.4.9 Compare the factor loading matrices from the models estimated from the Attitudes items, and select the best solution. Click to show code for(x in efa_att) print(x$loadings) ## ## Loadings: ## MR1 MR2 MR3 ## imsmetn 0.802 ## imdfetn 0.754 0.106 ## eimrcnt 0.843 ## eimpcnt 0.814 ## imrcntr 0.857 ## impcntr 0.769 ## qfimchr 0.235 0.858 ## qfimwht 0.132 0.719 ## imwgdwn 0.293 -0.181 ## imhecop 0.371 -0.162 ## imtcjob 0.619 ## imbleco 0.702 ## imbgeco 0.687 ## imueclt 0.561 -0.207 ## imwbcnt 0.732 ## imwbcrm 0.637 ## imrsprc -0.494 -0.125 ## pplstrd 0.249 -0.413 ## vrtrlg -0.275 0.240 ## shrrfg 0.514 -0.111 ## rfgawrk -0.386 ## gvrfgap -0.601 -0.148 ## rfgfrpc 0.432 ## rfggvfn -0.489 ## rfgbfml -0.545 ## ## MR1 MR2 MR3 ## SS loadings 4.819 3.950 1.683 ## Proportion Var 0.193 0.158 0.067 ## Cumulative Var 0.193 0.351 0.418 ## ## Loadings: ## MR2 MR4 MR1 MR3 ## imsmetn 0.788 ## imdfetn 0.731 0.153 0.110 ## eimrcnt 0.855 -0.143 ## eimpcnt 0.790 0.165 ## imrcntr 0.860 ## impcntr 0.743 0.182 ## qfimchr -0.122 0.853 ## qfimwht 0.723 ## imwgdwn 0.638 0.264 ## imhecop 0.680 0.217 ## imtcjob 0.633 0.136 ## imbleco 0.563 -0.212 0.153 ## imbgeco 0.604 -0.168 ## imueclt 0.392 -0.236 -0.168 ## imwbcnt 0.526 -0.282 ## imwbcrm 0.397 -0.292 ## imrsprc 0.616 ## pplstrd 0.231 -0.378 ## vrtrlg 0.279 0.264 ## shrrfg 0.299 -0.271 ## rfgawrk 0.452 ## gvrfgap 0.123 0.774 ## rfgfrpc 0.193 -0.281 ## rfggvfn 0.467 ## rfgbfml 0.619 ## ## MR2 MR4 MR1 MR3 ## SS loadings 3.828 2.778 2.570 1.602 ## Proportion Var 0.153 0.111 0.103 0.064 ## Cumulative Var 0.153 0.264 0.367 0.431 ## ## Loadings: ## MR2 MR1 MR5 MR3 MR4 ## imsmetn
0.792 ## imdfetn 0.728 0.169 0.113 ## eimrcnt 0.910 -0.150 -0.237 ## eimpcnt 0.779 0.126 0.213 ## imrcntr 0.910 -0.128 -0.187 ## impcntr 0.731 0.131 0.236 ## qfimchr 0.109 -0.156 0.882 ## qfimwht 0.139 0.736 ## imwgdwn 0.740 ## imhecop 0.700 ## imtcjob 0.543 0.124 0.182 ## imbleco 0.682 0.135 ## imbgeco 0.799 ## imueclt 0.572 -0.202 ## imwbcnt 0.712 ## imwbcrm 0.545 -0.124 ## imrsprc 0.620 ## pplstrd 0.207 -0.396 ## vrtrlg -0.198 0.151 0.285 0.116 ## shrrfg 0.208 -0.263 0.139 ## rfgawrk 0.457 ## gvrfgap 0.783 ## rfgfrpc -0.338 0.156 ## rfggvfn 0.477 ## rfgbfml -0.125 0.538 ## ## MR2 MR1 MR5 MR3 MR4 ## SS loadings 3.970 2.790 2.215 1.693 1.166 ## Proportion Var 0.159 0.112 0.089 0.068 0.047 ## Cumulative Var 0.159 0.270 0.359 0.427 0.473 ## ## Loadings: ## MR2 MR1 MR6 MR3 MR5 MR4 ## imsmetn 0.705 0.166 ## imdfetn 0.833 ## eimrcnt 0.249 0.859 ## eimpcnt 0.946 ## imrcntr 0.456 0.517 ## impcntr 0.951 ## qfimchr 0.134 -0.122 0.875 ## qfimwht 0.151 0.725 ## imwgdwn 0.748 ## imhecop 0.678 ## imtcjob 0.566 0.123 0.175 ## imbleco 0.753 0.144 ## imbgeco 0.822 ## imueclt 0.580 -0.201 ## imwbcnt 0.751 ## imwbcrm 0.597 ## imrsprc 0.146 0.527 ## pplstrd 0.204 -0.392 ## vrtrlg -0.204 0.143 0.281 0.115 ## shrrfg 0.198 -0.275 0.141 ## rfgawrk 0.517 ## gvrfgap 0.784 ## rfgfrpc -0.294 0.144 ## rfggvfn 0.512 ## rfgbfml 0.596 ## ## MR2 MR1 MR6 MR3 MR5 MR4 ## SS loadings 3.304 3.013 1.994 1.649 1.065 1.133 ## Proportion Var 0.132 0.121 0.080 0.066 0.043 0.045 ## Cumulative Var 0.132 0.253 0.332 0.398 0.441 0.486 ## ## Loadings: ## MR2 MR1 MR6 MR3 MR5 MR7 MR4 ## imsmetn 0.700 0.162 ## imdfetn 0.821 ## eimrcnt 0.245 0.879 ## eimpcnt 0.935 ## imrcntr 0.452 0.523 ## impcntr 0.938 ## qfimchr 0.751 ## qfimwht 0.720 ## imwgdwn 0.700 ## imhecop 0.172 0.624 ## imtcjob 0.574 -0.120 0.174 ## imbleco 0.679 0.108 ## imbgeco 0.832 -0.145 ## imueclt 0.531 -0.191 ## imwbcnt 0.649 0.138 ## imwbcrm 0.464 0.131 0.290 ## imrsprc 0.146 0.440 -0.100 ## pplstrd -0.274 0.392 ## vrtrlg -0.121 0.190 -0.297 0.115 
## shrrfg -0.124 0.437 0.131 ## rfgawrk 0.538 ## gvrfgap 0.616 -0.237 ## rfgfrpc -0.131 0.437 0.135 ## rfggvfn 0.504 ## rfgbfml 0.526 ## ## MR2 MR1 MR6 MR3 MR5 MR7 MR4 ## SS loadings 3.224 2.467 1.456 1.305 1.105 0.901 0.984 ## Proportion Var 0.129 0.099 0.058 0.052 0.044 0.036 0.039 ## Cumulative Var 0.129 0.228 0.286 0.338 0.382 0.418 0.458 It is very possible that you selected a different number of factors than Kestilä (2006). We need to keep these exercises consistent, though. So, the remaining questions will all assume you have extracted three factors from the Trust items and five factors from the Attitudes items, to parallel the Kestilä (2006) results. ## Select the three-factor solution for 'trust': efa_trust <- efa_trust[["3"]] ## Select the five-factor solution for 'attitudes': efa_att <- efa_att[["5"]] 4.4.10 Give the factor scores meaningful names, and add the scores to the ess dataset as new columns. Hint: If you’re not sure what to do, check 4.3.11. Click to show code ## Rename the factor scores: colnames(efa_trust$scores) <- c("trust_inst", "satisfy", "trust_pol") colnames(efa_att$scores) <- c("effects", "allowance", "refugees", "ethnic", "europe") ## Add factor scores to the dataset as new columns: ess <- data.frame(ess, efa_trust$scores, efa_att$scores) Kestilä (2006) used the component scores to descriptively evaluate country-level differences in Attitudes toward Immigration and Political Trust. So, now it’s time to replicate those analyses. 4.4.11 Repeat the Kestilä (2006) between-country comparison using the factor scores you created in 4.4.10 and an appropriate statistical test. Click to show code Here, we’ll only demonstrate a possible approach to analyzing one of the Trust dimensions. We can use a linear model to test whether the countries differ in average levels of Trust in Institutions (as quantified by the relevant factor score). 
## Estimate the model: out <- lm(trust_inst ~ country, data = ess) ## View the regression-style summary: summary(out) ## ## Call: ## lm(formula = trust_inst ~ country, data = ess) ## ## Residuals: ## Min 1Q Median 3Q Max ## -4.2295 -0.6226 0.1171 0.7194 3.3061 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.09028 0.02445 -3.692 0.000224 *** ## countryBelgium -0.28923 0.03642 -7.942 2.12e-15 *** ## countryGermany -0.05966 0.03211 -1.858 0.063205 . ## countryDenmark 0.75509 0.03882 19.452 < 2e-16 *** ## countryFinland 0.59235 0.03439 17.224 < 2e-16 *** ## countryItaly 0.10991 0.04071 2.700 0.006939 ** ## countryNetherlands -0.05357 0.03379 -1.585 0.112893 ## countryNorway 0.36922 0.03493 10.570 < 2e-16 *** ## countrySweden 0.28560 0.03613 7.904 2.89e-15 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.029 on 14769 degrees of freedom ## (4912 observations deleted due to missingness) ## Multiple R-squared: 0.082, Adjusted R-squared: 0.0815 ## F-statistic: 164.9 on 8 and 14769 DF, p-value: < 2.2e-16 ## View the results as an ANOVA table: anova(out) ## Post-hoc tests out %>% aov() %>% TukeyHSD() ## Tukey multiple comparisons of means ## 95% family-wise confidence level ## ## Fit: aov(formula = .) 
## ## $country ## diff lwr upr p adj ## Belgium-Austria -0.289225482 -0.40219224 -0.17625873 0.0000000 ## Germany-Austria -0.059655996 -0.15926604 0.03995405 0.6429963 ## Denmark-Austria 0.755089552 0.63466911 0.87551000 0.0000000 ## Finland-Austria 0.592348290 0.48565882 0.69903776 0.0000000 ## Italy-Austria 0.109910185 -0.01636587 0.23618624 0.1476635 ## Netherlands-Austria -0.053567808 -0.15838407 0.05124846 0.8131104 ## Norway-Austria 0.369224250 0.26085692 0.47759158 0.0000000 ## Sweden-Austria 0.285601197 0.17350905 0.39769334 0.0000000 ## Germany-Belgium 0.229569486 0.12386351 0.33527546 0.0000000 ## Denmark-Belgium 1.044315033 0.91880537 1.16982470 0.0000000 ## Finland-Belgium 0.881573772 0.76917165 0.99397589 0.0000000 ## Italy-Belgium 0.399135667 0.26799745 0.53027389 0.0000000 ## Netherlands-Belgium 0.235657673 0.12503199 0.34628336 0.0000000 ## Norway-Belgium 0.658449732 0.54445381 0.77244566 0.0000000 ## Sweden-Belgium 0.574826679 0.45728417 0.69236918 0.0000000 ## Denmark-Germany 0.814745547 0.70110863 0.92838247 0.0000000 ## Finland-Germany 0.652004286 0.55303505 0.75097352 0.0000000 ## Italy-Germany 0.169566181 0.04974170 0.28939066 0.0003895 ## Netherlands-Germany 0.006088188 -0.09085878 0.10303516 0.9999999 ## Norway-Germany 0.428880246 0.32810453 0.52965596 0.0000000 ## Sweden-Germany 0.345257193 0.24048642 0.45002796 0.0000000 ## Finland-Denmark -0.162741262 -0.28263218 -0.04285034 0.0008579 ## Italy-Denmark -0.645179366 -0.78279052 -0.50756821 0.0000000 ## Netherlands-Denmark -0.808657360 -0.92688442 -0.69043030 0.0000000 ## Norway-Denmark -0.385865301 -0.50725174 -0.26447886 0.0000000 ## Sweden-Denmark -0.469488354 -0.59421139 -0.34476531 0.0000000 ## Italy-Finland -0.482438105 -0.60820928 -0.35666693 0.0000000 ## Netherlands-Finland -0.645916098 -0.75012357 -0.54170862 0.0000000 ## Norway-Finland -0.223124040 -0.33090264 -0.11534544 0.0000000 ## Sweden-Finland -0.306747093 -0.41827017 -0.19522402 0.0000000 ## Netherlands-Italy -0.163477993 
-0.28766412 -0.03929186 0.0014719 ## Norway-Italy 0.259314065 0.13211649 0.38651164 0.0000000 ## Sweden-Italy 0.175691012 0.04530545 0.30607657 0.0009794 ## Norway-Netherlands 0.422792059 0.31686740 0.52871671 0.0000000 ## Sweden-Netherlands 0.339169005 0.22943659 0.44890142 0.0000000 ## Sweden-Norway -0.083623053 -0.19675232 0.02950622 0.3462227 Click for explanation According to the omnibus F-test, average levels of Trust in Institutions significantly differ between countries, but this test cannot tell us between which countries the differences lie. Similarly, the t statistics associated with each dummy code in the regression-style summary only tell us if that country differs significantly from the reference country (i.e., Austria), but we cannot see, for example, if there is a significant difference in average trust levels between Belgium and the Netherlands. One way to test for differences between the individual countries would be a post hoc test of all pairwise comparisons. With 9 countries, we’ll be doing 9 × 8 / 2 = 36 tests, so we need to apply a correction for multiple testing. Above, we use the TukeyHSD() function to conduct all pairwise comparisons while applying Tukey’s HSD correction. The TukeyHSD() function only accepts models estimated with the aov() function, so we first pass our fitted lm object through aov(). The second part of the Kestilä (2006) analysis was to evaluate how socio-demographic characteristics affected attitudes towards immigrants and trust in politics among the Finnish electorate. Before we can replicate this part of the analysis, we need to subset the data to only the Finnish cases. 4.4.12 Create a new data frame that contains only the Finnish cases from ess. Hint: You can use logical indexing based on the country variable. Click to show code ess_finland <- filter(ess, country == "Finland") We still have one more step before we can estimate any models. We must prepare our variables for analysis. 
Our dependent variables will be the factor scores generated above. So, we do not need to apply any further processing. We have not yet used any of the independent variables, though. So, we should inspect those variables to see if they require any processing. In our processed ess data, the relevant variables have the following names: sex, yrbrn, eduyrs, polintr, and lrscale. 4.4.13 Inspect the independent variables listed above. Click to show code library(tidySEM) select(ess_finland, sex, yrbrn, eduyrs, polintr, lrscale) %>% descriptives() Click for explanation It looks like we still need some recoding. 4.4.14 Apply any necessary recoding/transformations. 1. Age Click to show code ess_finland <- mutate(ess_finland, age = 2002 - yrbrn) Click for explanation The data contain the participants’ years of birth instead of their age, but Kestilä analyzed age. Fortunately, we know that the data were collected in 2002, so we can simply subtract each participant’s value of yrbrn from 2002 to compute their age. 2. Political Interest Click to show code First, we’ll transform polintr. ## Recode the four factor levels into two factor levels: ess_finland <- mutate(ess_finland, polintr_bin = recode_factor(polintr, "Not at all interested" = "Low Interest", "Hardly interested" = "Low Interest", "Quite interested" = "High Interest", "Very interested" = "High Interest") ) ## Check the conversion: with(ess_finland, table(old = polintr, new = polintr_bin, useNA = "always")) ## new ## old Low Interest High Interest <NA> ## Very interested 0 144 0 ## Quite interested 0 785 0 ## Hardly interested 842 0 0 ## Not at all interested 228 0 0 ## <NA> 0 0 1 Click for explanation Kestilä (2006) dichotomized polintr by combining the lowest two and highest two categories. So, we don’t actually want to convert the polintr variable into a numeric, Likert-type variable. We want polintr to be a binary factor. The recode_factor() function from the dplyr package will automatically convert our result into a factor. 
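If you prefer not to depend on dplyr, base R can also collapse factor levels. You assign a named list to levels(): each list name becomes a new level, and the old levels listed under it are merged into that level, in the order given. The sketch below uses a small, made-up vector as a stand-in for polintr (the real variable lives in ess_finland). It also repeats the age computation on made-up birth years.

```r
# Made-up stand-in for polintr, with the same four levels
polintr <- factor(
  c('Very interested', 'Hardly interested', 'Quite interested',
    'Not at all interested', NA),
  levels = c('Very interested', 'Quite interested',
             'Hardly interested', 'Not at all interested')
)

# Collapse four levels into two; the list names are the new levels
polintr_bin <- polintr
levels(polintr_bin) <- list(
  'Low Interest'  = c('Not at all interested', 'Hardly interested'),
  'High Interest' = c('Quite interested', 'Very interested')
)

# Cross-tabulate old against new codes to check the mapping
table(old = polintr, new = polintr_bin, useNA = 'always')

# Age from year of birth, assuming data collection in 2002
yrbrn <- c(1950, 1975, 1983)
age <- 2002 - yrbrn  # 52 27 19
```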
As with the ess_round1.rds data, we will be coming back to this Finnish subsample data in future practical exercises. So, we should save our work by writing the processed dataset to disk. 4.4.15 Use the saveRDS() function to save the processed Finnish subsample data. Click to see code ## Save the processed Finnish data: saveRDS(ess_finland, "ess_finland.rds") Now, we’re finally ready to replicate the regression analysis from Kestilä (2006). Creating a single aggregate score by summing the individual component scores is a pretty silly thing to do, though. So, we won’t reproduce that aspect of the analysis. 4.4.16 Run a series of multiple linear regression analyses with the factor scores you created in 4.4.10 as the dependent variables and the same predictors used by Kestilä (2006). Do your results agree with those reported by Kestilä (2006)? Click to show code ## Predicting 'Trust in Institutions': out_trust_inst <- lm(trust_inst ~ sex + age + eduyrs + polintr_bin + lrscale, data = ess_finland) summary(out_trust_inst) ## ## Call: ## lm(formula = trust_inst ~ sex + age + eduyrs + polintr_bin + ## lrscale, data = ess_finland) ## ## Residuals: ## Min 1Q Median 3Q Max ## -3.9499 -0.5102 0.1337 0.6638 2.5919 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.057518 0.124294 -0.463 0.643595 ## sexFemale 0.004091 0.045170 0.091 0.927849 ## age -0.003071 0.001380 -2.225 0.026219 * ## eduyrs 0.023223 0.006388 3.635 0.000286 *** ## polintr_binHigh Interest 0.166860 0.046448 3.592 0.000337 *** ## lrscale 0.058951 0.011232 5.249 1.72e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 0.9321 on 1734 degrees of freedom ## (260 observations deleted due to missingness) ## Multiple R-squared: 0.04155, Adjusted R-squared: 0.03879 ## F-statistic: 15.03 on 5 and 1734 DF, p-value: 1.78e-14 ## Predicting 'Trust in Politicians': out_trust_pol <- lm(trust_pol ~ sex + age + eduyrs + polintr_bin + lrscale, data = ess_finland) summary(out_trust_pol) ## ## Call: ## lm(formula = trust_pol ~ sex + age + eduyrs + polintr_bin + lrscale, ## data = ess_finland) ## ## Residuals: ## Min 1Q Median 3Q Max ## -3.03673 -0.67306 0.05346 0.69666 2.38771 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.165989 0.126840 -1.309 0.19083 ## sexFemale 0.015572 0.046095 0.338 0.73554 ## age -0.009112 0.001409 -6.469 1.28e-10 *** ## eduyrs 0.018476 0.006519 2.834 0.00465 ** ## polintr_binHigh Interest 0.463763 0.047399 9.784 < 2e-16 *** ## lrscale 0.054932 0.011462 4.793 1.79e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.9512 on 1734 degrees of freedom ## (260 observations deleted due to missingness) ## Multiple R-squared: 0.09806, Adjusted R-squared: 0.09546 ## F-statistic: 37.71 on 5 and 1734 DF, p-value: < 2.2e-16 ## Predicting 'Attitudes toward Refugees': out_refugees <- lm(refugees ~ sex + age + eduyrs + polintr_bin + lrscale, data = ess_finland) summary(out_refugees) ## ## Call: ## lm(formula = refugees ~ sex + age + eduyrs + polintr_bin + lrscale, ## data = ess_finland) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.9118 -0.6860 -0.0594 0.6904 4.1044 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -1.690e-01 1.438e-01 -1.175 0.240080 ## sexFemale -4.828e-01 5.181e-02 -9.318 < 2e-16 *** ## age 2.903e-05 1.604e-03 0.018 0.985561 ## eduyrs -2.537e-02 7.459e-03 -3.401 0.000688 *** ## polintr_binHigh Interest -2.131e-01 5.345e-02 -3.986 6.99e-05 *** ## lrscale 9.359e-02 1.296e-02 7.223 7.65e-13 *** ## --- ## Signif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.06 on 1699 degrees of freedom ## (295 observations deleted due to missingness) ## Multiple R-squared: 0.09535, Adjusted R-squared: 0.09269 ## F-statistic: 35.81 on 5 and 1699 DF, p-value: < 2.2e-16 That does it for our replication of the Kestilä (2006) analyses, but we still have one more topic to consider in this practical. One of the most common applications of EFA is scale development. Given a pool of items without a known factor structure, we try to estimate the underlying latent factors that define the (sub)scales represented by our items. In such applications, we use the factor loading matrix for our optimal solution to make “bright-line” assignments of items to putative factors according to the simple structure represented by the estimated factor loading matrix. In other words, we disregard small factor loadings and assign observed items to only the single latent factor upon which they load most strongly. We then hypothesize that those items are true indicators of that latent factor. We can use confirmatory factor analysis (which you will learn about next week) to rigorously test this hypothesis, but we can already get started by estimating the internal consistency (a type of reliability) of the hypothesized subscales. 4.4.17 Estimate the internal consistency of the three Trust subscales and five Attitudes subscales implied by your EFA solutions from above. Use Cronbach’s Alpha to quantify internal consistency. Use the alpha() function from the psych package to conduct the analysis. Run your analysis on the full ess dataset, not the Finnish subset. Are the subscales implied by your EFA reliable, in the sense of good internal consistency? Note that \\(\\alpha > 0.7\\) is generally considered acceptable, and \\(\\alpha > 0.8\\) is usually considered good. 
Click to show code ## Run the reliability analysis on the subscale data: ( out <- select(ess, starts_with("stf")) %>% psych::alpha() ) ## ## Reliability analysis ## Call: psych::alpha(x = .) ## ## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r ## 0.79 0.79 0.77 0.44 3.9 0.0023 5.4 1.7 0.41 ## ## 95% confidence boundaries ## lower alpha upper ## Feldt 0.79 0.79 0.8 ## Duhachek 0.79 0.79 0.8 ## ## Reliability if an item is dropped: ## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r ## stfhlth 0.78 0.78 0.73 0.47 3.5 0.0026 0.0064 0.46 ## stfedu 0.76 0.76 0.72 0.45 3.2 0.0028 0.0109 0.44 ## stfeco 0.74 0.74 0.70 0.41 2.8 0.0031 0.0069 0.39 ## stfgov 0.74 0.74 0.69 0.42 2.9 0.0030 0.0035 0.41 ## stfdem 0.75 0.75 0.71 0.43 3.0 0.0029 0.0074 0.40 ## ## Item statistics ## n raw.r std.r r.cor r.drop mean sd ## stfhlth 19481 0.69 0.69 0.56 0.50 5.8 2.3 ## stfedu 18844 0.73 0.73 0.62 0.55 5.9 2.3 ## stfeco 19211 0.78 0.78 0.70 0.63 5.0 2.4 ## stfgov 19106 0.77 0.76 0.69 0.61 4.5 2.3 ## stfdem 19106 0.75 0.75 0.67 0.59 5.7 2.3 Click for explanation Here, we estimate the reliability of the Satisfaction subscale from the Trust analysis. According to our EFA, the Satisfaction subscale should be indicated by the following five variables: stfeco stfgov stfdem stfedu stfhlth We select these variables using the tidy-select function starts_with() to extract all variables beginning with the three characters “stf”. To estimate the internal consistency of this subscale, we simply provide a data frame containing only the subscale data to the alpha() function. The raw_alpha value is the estimate of Cronbach’s Alpha. In this case \\(\\alpha = 0.794\\), so the subscale is pretty reliable. The table labeled “Reliability if an item is dropped” shows what Cronbach’s Alpha would be if each item were excluded from the scale. If this value is notably higher than the raw_alpha value, it could indicate a bad item. 
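The raw_alpha value follows from a simple formula. With k items, alpha = k / (k - 1) × (1 - sum of the item variances / variance of the sum score). The variance of the sum score equals the sum of all entries in the item covariance matrix. The base-R sketch below implements that formula and the reliability-if-dropped idea on a made-up covariance matrix. cronbach_alpha() and alpha_if_dropped() are hypothetical helpers. psych::alpha() additionally handles raw data, missing values, and standard errors, so use it for real analyses. If you pass a correlation matrix instead of a covariance matrix, you get the standardized alpha (std.alpha).

```r
# Cronbach's alpha from a k x k item covariance matrix S:
# alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score)
cronbach_alpha <- function(S) {
  k <- ncol(S)
  (k / (k - 1)) * (1 - sum(diag(S)) / sum(S))
}

# Alpha recomputed with each item left out in turn
alpha_if_dropped <- function(S) {
  out <- vapply(seq_len(ncol(S)),
                function(j) cronbach_alpha(S[-j, -j, drop = FALSE]),
                numeric(1))
  setNames(out, colnames(S))
}

# Made-up example: four items, unit variances, all covariances 0.5
S <- matrix(0.5, nrow = 4, ncol = 4,
            dimnames = list(NULL, paste0('item', 1:4)))
diag(S) <- 1

cronbach_alpha(S)    # 0.8
alpha_if_dropped(S)  # 0.75 for every item
```

Here every item is interchangeable, so dropping any one of them lowers alpha by the same amount. In real data, an item whose removal raises alpha is the kind of item the explanation above flags for a closer look.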
Note that reliability is only one aspect of scale quality, though. So, you shouldn’t throw out items just because they perform poorly in reliability analysis. End of In-Class Exercises "],["cfa.html", "5 CFA", " 5 CFA This week, we will introduce confirmatory factor analysis (CFA) and discuss how it differs from EFA. Furthermore, we will revisit the idea of model fit and introduce the R package lavaan. Homework before the lecture Watch the Lecture Recording for this week. Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. "],["lecture-4.html", "5.1 Lecture", " 5.1 Lecture Often, we work with scales that have a validated or hypothesized factor structure. In the former case, the scale structure has been validated through previous psychometric studies. In the latter case, we may have conducted an EFA to estimate the factor structure on prior data, or theory/intuition may suggest a plausible structure. Regardless of how we come to expect a given factor structure, such situations represent confirmatory modeling problems, because we are attempting to empirically confirm an a priori expectation. Hence, exploratory methods like EFA are not appropriate, and we should employ confirmatory modeling techniques. This week we consider one such technique: confirmatory factor analysis (CFA). As the name suggests, CFA is related to the EFA methods we discussed last week in that both methods are flavors of factor analysis. However, the two methods address fundamentally different research questions. Rather than attempting to estimate an unknown factor structure (as in EFA), we now want to compare a hypothesized measurement model (i.e., factor structure) to observed data in order to evaluate the model’s plausibility. 
5.1.1 Recording Note: When Caspar discusses the complexity of the second-order CFA model, it’s easy to misunderstand his statements. We need to be careful not to over-generalize. In general, a second-order CFA is not more complex than a first-order CFA. Actually, in most practical applications, the opposite is true. A second-order CFA is more complex than a first-order CFA when the factors in the first-order CFA are uncorrelated. This is the situation Caspar references in the recording when claiming that the second-order model is more complex. We hardly ever want to fit such a first-order CFA, though. The default CFA fully saturates the latent covariance structure. If the factors in the first-order CFA are fully correlated (according to standard practice), and we include a single second-order factor, the following statements hold. If the first-order CFA has more than three factors, the first-order model is more complex than the second-order model. If the first-order model has three or fewer factors, the first- and second-order models are equivalent (due to scaling constraints we need to impose to identify the second-order model). The second-order model cannot be more complex than the first-order model (assuming both models are correctly identified and no extra constraints are imposed). The above statements may not hold in more complex situations (e.g., more than one second-order factor, partially saturated first-order correlation structure, etc.). You can always identify the more complex model by calculating the degrees of freedom for both models. The model with fewer degrees of freedom is more complex. 5.1.2 Slides You can download the lecture slides here "],["reading-4.html", "5.2 Reading", " 5.2 Reading Reference Byrne, B. (2005). Factor analytic models: Viewing the structure of an assessment instrument from three perspectives. Journal of Personality Assessment, 85(1), 17–32. 
Questions What are the main differences between exploratory factor analysis (EFA) and confirmatory factor analysis (CFA)? In which circumstances should a researcher use EFA, and in which should they use CFA? What are the five main limitations of EFA that CFA overcomes? In which circumstances can a second-order CFA model be useful? Consider the following four techniques: PCA, EFA, CFA, second-order CFA. For each of the following three research situations, which of the above techniques would you use and why? A researcher has developed a new questionnaire that should measure personality and wants to know how many factors underlie the items in their new measure. A researcher is modeling data collected with a seven-item scale that has been used since the 1960s to measure authoritarianism. A researcher has recorded highest completed level of education, years of education, and highest level of education attempted for all respondents in a survey. The researcher wants to include some operationalization of the concept of ‘education’ in their model but is unsure which observed variable to use. "],["at-home-exercises-4.html", "5.3 At-Home Exercises", " 5.3 At-Home Exercises This week, we will wrap up our re-analysis of the Kestilä (2006) results. During this practical, you will conduct a CFA of the Trust in Politics items and compare the results to those obtained from your previous EFA- and PCA-based replications of Kestilä (2006). 5.3.1 Load the ESS data. The relevant data are contained in the ess_round1.rds file. This file is in R Data Set (RDS) format. The dataset is already stored as a data frame, with the processing and cleaning from the previous practicals already completed. 
Click to show code ess <- readRDS("ess_round1.rds") Although you may have settled on any number of EFA solutions during the Week 4 In-Class Exercises, we are going to base the following CFA on a three-factor model of Trust in Politics similar to the original PCA results from Kestilä (2006). Note: Unless otherwise specified, all following questions refer to the Trust in Politics items. We will not consider the Attitudes toward Immigration items in these exercises. 5.3.2 Define the lavaan model syntax for the CFA implied by the three-factor EFA solution you found in the Week 4 In-Class Exercises. Covary the three latent factors. Do not specify any mean structure. Save this model syntax as an object in your environment. Click to show code mod_3f <- ' institutions =~ trstlgl + trstplc + trstun + trstep + trstprl satisfaction =~ stfhlth + stfedu + stfeco + stfgov + stfdem politicians =~ pltinvt + pltcare + trstplt ' Click for explanation We don’t have to specify the latent covariances in the model syntax; when we fit the model with cfa(), lavaan estimates all latent covariances by default. 5.3.3 Estimate the CFA model you defined above, and summarize the results. Use the lavaan::cfa() function to estimate the model. Use the default settings for the cfa() function. Request the model fit statistics with the summary by supplying the fit.measures = TRUE argument to summary(). Request the standardized parameter estimates with the summary by supplying the standardized = TRUE argument to summary(). Check the results, and answer the following questions: Does the model fit the data well? How are the latent variances and covariances specified when using the default settings? How is the model identified when using the default settings? 
Click to show code ## Load the lavaan package: library(lavaan) ## Estimate the CFA model: fit_3f <- cfa(mod_3f, data = ess) ## Summarize the fitted model: summary(fit_3f, fit.measures = TRUE, standardized = TRUE) ## lavaan 0.6.16 ended normally after 46 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 29 ## ## Used Total ## Number of observations 14778 19690 ## ## Model Test User Model: ## ## Test statistic 10652.207 ## Degrees of freedom 62 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 81699.096 ## Degrees of freedom 78 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.870 ## Tucker-Lewis Index (TLI) 0.837 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -371404.658 ## Loglikelihood unrestricted model (H1) -366078.555 ## ## Akaike (AIC) 742867.317 ## Bayesian (BIC) 743087.743 ## Sample-size adjusted Bayesian (SABIC) 742995.583 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.108 ## 90 Percent confidence interval - lower 0.106 ## 90 Percent confidence interval - upper 0.109 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 1.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.059 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions =~ ## trstlgl 1.000 1.613 0.677 ## trstplc 0.770 0.012 61.866 0.000 1.241 0.567 ## trstun 0.929 0.013 69.227 0.000 1.498 0.642 ## trstep 0.908 0.013 70.929 0.000 1.464 0.660 ## trstprl 1.139 0.014 84.084 0.000 1.837 0.809 ## satisfaction =~ ## stfhlth 1.000 1.173 0.521 ## stfedu 1.106 0.022 50.840 0.000 1.297 0.577 ## stfeco 1.415 0.025 57.214 0.000 1.659 0.713 ## stfgov 1.480 0.025 58.764 0.000 1.736 0.756 ## stfdem 1.384 0.024 57.904 0.000 1.623 0.731 ## politicians =~ ## pltinvt 
1.000 0.646 0.613 ## pltcare 1.021 0.016 62.862 0.000 0.660 0.628 ## trstplt 3.012 0.039 76.838 0.000 1.946 0.891 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions ~~ ## satisfaction 1.391 0.032 43.206 0.000 0.736 0.736 ## politicians 0.909 0.018 49.934 0.000 0.872 0.872 ## satisfaction ~~ ## politicians 0.539 0.013 41.053 0.000 0.711 0.711 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trstlgl 3.068 0.041 75.262 0.000 3.068 0.541 ## .trstplc 3.248 0.041 80.037 0.000 3.248 0.678 ## .trstun 3.197 0.041 77.141 0.000 3.197 0.588 ## .trstep 2.776 0.036 76.243 0.000 2.776 0.564 ## .trstprl 1.776 0.029 61.361 0.000 1.776 0.345 ## .stfhlth 3.695 0.046 79.989 0.000 3.695 0.729 ## .stfedu 3.368 0.043 77.916 0.000 3.368 0.667 ## .stfeco 2.656 0.038 69.070 0.000 2.656 0.491 ## .stfgov 2.264 0.035 64.201 0.000 2.264 0.429 ## .stfdem 2.289 0.034 67.172 0.000 2.289 0.465 ## .pltinvt 0.694 0.009 78.255 0.000 0.694 0.624 ## .pltcare 0.668 0.009 77.562 0.000 0.668 0.605 ## .trstplt 0.978 0.028 34.461 0.000 0.978 0.205 ## institutions 2.601 0.059 44.198 0.000 1.000 1.000 ## satisfaction 1.375 0.044 31.407 0.000 1.000 1.000 ## politicians 0.417 0.011 38.843 0.000 1.000 1.000 Click for explanation No, the model does not seem to fit the data well. The SRMR looks good, but one good-looking fit statistic is not enough. The RMSEA, TLI, and CFI are all in the “unacceptable” range. The \\(\\chi^2\\) is highly significant, but that tells us little: with nearly 15,000 observations, even trivial misfit produces a significant \\(\\chi^2\\) test. The cfa() function is just a wrapper for the lavaan() function with several options set at the defaults you would want for a standard CFA. 
By default: All latent variances and covariances are freely estimated (due to the arguments auto.var = TRUE and auto.cov.lv.x = TRUE). The model is identified by fixing the first factor loading of each factor to 1 (due to the argument auto.fix.first = TRUE). To see a full list of the (many) options you can specify to tweak the behavior of lavaan estimation functions, run ?lavOptions. Now, we will consider a couple of alternative factor structures for the Trust in Politics CFA. First, we will go extremely simple by estimating a one-factor model wherein all Trust items are explained by a single latent variable. 5.3.4 Define the lavaan model syntax for a one-factor model of the Trust items. Save this syntax as an object in your environment. Click to show code mod_1f <- ' political_trust =~ trstlgl + trstplc + trstun + trstep + trstprl + stfhlth + stfedu + stfeco + stfgov + stfdem + pltinvt + pltcare + trstplt ' 5.3.5 Estimate the one-factor model, and summarize the results. Does this model appear to fit better or worse than the three-factor model? Note: You can use the lavaan::fitMeasures() function to extract only the model fit information from a fitted lavaan object. 
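When comparing several models, it can be easier to read a handful of indices side by side than two full fitMeasures() printouts. The sketch below assumes lavaan's bundled HolzingerSwineford1939 data as a stand-in, with hypothetical one- and three-factor models. The same sapply() call should work on a list holding fit_3f and fit_1f.

```r
library(lavaan)

## Stand-in models on lavaan's bundled example data (not the ESS items).
mod_three <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
mod_one <- 'general =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9'

fits <- list(
  three_factor = cfa(mod_three, data = HolzingerSwineford1939),
  one_factor   = cfa(mod_one,   data = HolzingerSwineford1939)
)

## Complexity (npar, df) plus the usual fit statistics.
idx <- c("npar", "df", "chisq", "cfi", "tli", "rmsea", "srmr")

## sapply() forwards fit.measures to fitMeasures(); the result is a matrix
## with one row per index and one column per model.
fit_table <- sapply(fits, fitMeasures, fit.measures = idx)
print(round(fit_table, 3))
```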
Click to show code ## Estimate the one factor model: fit_1f <- cfa(mod_1f, data = ess) ## Summarize the results: summary(fit_1f, fit.measures = TRUE) ## lavaan 0.6.16 ended normally after 33 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 26 ## ## Used Total ## Number of observations 14778 19690 ## ## Model Test User Model: ## ## Test statistic 17667.304 ## Degrees of freedom 65 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 81699.096 ## Degrees of freedom 78 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.784 ## Tucker-Lewis Index (TLI) 0.741 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -374912.206 ## Loglikelihood unrestricted model (H1) -366078.555 ## ## Akaike (AIC) 749876.413 ## Bayesian (BIC) 750074.036 ## Sample-size adjusted Bayesian (SABIC) 749991.410 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.135 ## 90 Percent confidence interval - lower 0.134 ## 90 Percent confidence interval - upper 0.137 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 1.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.080 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## political_trust =~ ## trstlgl 1.000 ## trstplc 0.774 0.013 57.949 0.000 ## trstun 0.930 0.014 64.200 0.000 ## trstep 0.909 0.014 65.679 0.000 ## trstprl 1.182 0.015 79.401 0.000 ## stfhlth 0.615 0.013 45.947 0.000 ## stfedu 0.695 0.014 51.424 0.000 ## stfeco 0.895 0.014 62.316 0.000 ## stfgov 0.985 0.014 68.200 0.000 ## stfdem 0.998 0.014 70.899 0.000 ## pltinvt 0.382 0.006 59.215 0.000 ## pltcare 0.396 0.006 61.195 0.000 ## trstplt 1.183 0.014 81.716 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .trstlgl 3.370 0.042 79.787 0.000 ## .trstplc 3.410 0.041 
82.311 0.000 ## .trstun 3.451 0.043 80.749 0.000 ## .trstep 3.019 0.038 80.272 0.000 ## .trstprl 1.938 0.027 70.878 0.000 ## .stfhlth 4.201 0.050 84.093 0.000 ## .stfedu 3.941 0.047 83.419 0.000 ## .stfeco 3.565 0.044 81.289 0.000 ## .stfgov 3.044 0.038 79.326 0.000 ## .stfdem 2.631 0.034 78.072 0.000 ## .pltinvt 0.775 0.009 82.043 0.000 ## .pltcare 0.743 0.009 81.579 0.000 ## .trstplt 1.548 0.023 67.052 0.000 ## political_trst 2.299 0.055 41.569 0.000 ## Compare fit statistics: fitMeasures(fit_3f) ## npar fmin chisq ## 29.000 0.360 10652.207 ## df pvalue baseline.chisq ## 62.000 0.000 81699.096 ## baseline.df baseline.pvalue cfi ## 78.000 0.000 0.870 ## tli nnfi rfi ## 0.837 0.837 0.836 ## nfi pnfi ifi ## 0.870 0.691 0.870 ## rni logl unrestricted.logl ## 0.870 -371404.658 -366078.555 ## aic bic ntotal ## 742867.317 743087.743 14778.000 ## bic2 rmsea rmsea.ci.lower ## 742995.583 0.108 0.106 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.109 0.900 0.000 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 1.000 0.080 ## rmr rmr_nomean srmr ## 0.255 0.255 0.059 ## srmr_bentler srmr_bentler_nomean crmr ## 0.059 0.059 0.064 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.064 0.059 0.059 ## cn_05 cn_01 gfi ## 113.901 126.971 0.897 ## agfi pgfi mfi ## 0.849 0.611 0.699 ## ecvi ## 0.725 fitMeasures(fit_1f) ## npar fmin chisq ## 26.000 0.598 17667.304 ## df pvalue baseline.chisq ## 65.000 0.000 81699.096 ## baseline.df baseline.pvalue cfi ## 78.000 0.000 0.784 ## tli nnfi rfi ## 0.741 0.741 0.741 ## nfi pnfi ifi ## 0.784 0.653 0.784 ## rni logl unrestricted.logl ## 0.784 -374912.206 -366078.555 ## aic bic ntotal ## 749876.413 750074.036 14778.000 ## bic2 rmsea rmsea.ci.lower ## 749991.410 0.135 0.134 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.137 0.900 0.000 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 1.000 0.080 ## rmr rmr_nomean srmr ## 0.364 0.364 0.080 ## srmr_bentler srmr_bentler_nomean crmr ## 0.080 0.080 0.087 ## 
crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.087 0.080 0.080 ## cn_05 cn_01 gfi ## 71.949 79.980 0.825 ## agfi pgfi mfi ## 0.756 0.590 0.551 ## ecvi ## 1.199 Click for explanation The one-factor model fits clearly worse than the three-factor model: all of the fit indices are poorer (e.g., CFI = 0.784 vs. 0.870, RMSEA = 0.135 vs. 0.108, SRMR = 0.080 vs. 0.059). A second-order CFA model is another way of representing the latent structure underlying a set of items. As you read in Byrne (2005), however, the second-order CFA is only appropriate in certain circumstances. 5.3.6 Given the CFA results above, would a second-order CFA be appropriate for the Trust data? Why or why not? Click for explanation Yes, a second-order CFA model is a theoretically appropriate representation of the Trust items. The first-order latent variables in the three-factor model are all significantly and strongly correlated, with latent correlations from 0.711 to 0.872. The first-order latent variables in the three-factor model also seem to tap different aspects of some single underlying construct. 5.3.7 Define the lavaan model syntax for a second-order CFA model of the Trust items. Use the three factors defined in 5.3.2 as the first-order factors. Click to show code mod_2nd <- ' institutions =~ trstlgl + trstplc + trstun + trstep + trstprl satisfaction =~ stfhlth + stfedu + stfeco + stfgov + stfdem politicians =~ pltinvt + pltcare + trstplt trust =~ politicians + satisfaction + institutions ' Click for explanation To define the second-order factor, we use the same syntactic conventions that we employ to define a first-order factor. The only difference is that the “indicators” of the second-order factor (i.e., the variables listed on the RHS of the =~ operator) are previously defined first-order latent variables. 5.3.8 Estimate the second-order CFA model, and summarize the results. Does this model fit better or worse than the three-factor model? Is this model more or less complex than the three-factor model? What information can you use to quantify this difference in complexity? 
Click to show code fit_2nd <- cfa(mod_2nd, data = ess) summary(fit_2nd, fit.measures = TRUE, standardized = TRUE) ## lavaan 0.6.16 ended normally after 44 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 29 ## ## Used Total ## Number of observations 14778 19690 ## ## Model Test User Model: ## ## Test statistic 10652.207 ## Degrees of freedom 62 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 81699.096 ## Degrees of freedom 78 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.870 ## Tucker-Lewis Index (TLI) 0.837 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -371404.658 ## Loglikelihood unrestricted model (H1) -366078.555 ## ## Akaike (AIC) 742867.317 ## Bayesian (BIC) 743087.743 ## Sample-size adjusted Bayesian (SABIC) 742995.583 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.108 ## 90 Percent confidence interval - lower 0.106 ## 90 Percent confidence interval - upper 0.109 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 1.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.059 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions =~ ## trstlgl 1.000 1.613 0.677 ## trstplc 0.770 0.012 61.866 0.000 1.241 0.567 ## trstun 0.929 0.013 69.227 0.000 1.498 0.642 ## trstep 0.908 0.013 70.929 0.000 1.464 0.660 ## trstprl 1.139 0.014 84.084 0.000 1.837 0.809 ## satisfaction =~ ## stfhlth 1.000 1.173 0.521 ## stfedu 1.106 0.022 50.840 0.000 1.297 0.577 ## stfeco 1.415 0.025 57.214 0.000 1.659 0.713 ## stfgov 1.480 0.025 58.764 0.000 1.736 0.756 ## stfdem 1.384 0.024 57.904 0.000 1.623 0.731 ## politicians =~ ## pltinvt 1.000 0.646 0.613 ## pltcare 1.021 0.016 62.862 0.000 0.660 0.628 ## trstplt 3.012 0.039 76.838 
0.000 1.946 0.891 ## trust =~ ## politicians 1.000 0.918 0.918 ## satisfaction 1.531 0.033 46.494 0.000 0.774 0.774 ## institutions 2.583 0.045 56.796 0.000 0.950 0.950 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trstlgl 3.068 0.041 75.262 0.000 3.068 0.541 ## .trstplc 3.248 0.041 80.037 0.000 3.248 0.678 ## .trstun 3.197 0.041 77.141 0.000 3.197 0.588 ## .trstep 2.776 0.036 76.243 0.000 2.776 0.564 ## .trstprl 1.776 0.029 61.361 0.000 1.776 0.345 ## .stfhlth 3.695 0.046 79.989 0.000 3.695 0.729 ## .stfedu 3.368 0.043 77.916 0.000 3.368 0.667 ## .stfeco 2.656 0.038 69.070 0.000 2.656 0.491 ## .stfgov 2.264 0.035 64.201 0.000 2.264 0.429 ## .stfdem 2.289 0.034 67.172 0.000 2.289 0.465 ## .pltinvt 0.694 0.009 78.255 0.000 0.694 0.624 ## .pltcare 0.668 0.009 77.562 0.000 0.668 0.605 ## .trstplt 0.978 0.028 34.461 0.000 0.978 0.205 ## .institutions 0.255 0.022 11.691 0.000 0.098 0.098 ## .satisfaction 0.551 0.020 27.846 0.000 0.400 0.400 ## .politicians 0.065 0.004 17.091 0.000 0.157 0.157 ## trust 0.352 0.010 35.005 0.000 1.000 1.000 ## Compare fit between the first and second order models: fitMeasures(fit_3f) ## npar fmin chisq ## 29.000 0.360 10652.207 ## df pvalue baseline.chisq ## 62.000 0.000 81699.096 ## baseline.df baseline.pvalue cfi ## 78.000 0.000 0.870 ## tli nnfi rfi ## 0.837 0.837 0.836 ## nfi pnfi ifi ## 0.870 0.691 0.870 ## rni logl unrestricted.logl ## 0.870 -371404.658 -366078.555 ## aic bic ntotal ## 742867.317 743087.743 14778.000 ## bic2 rmsea rmsea.ci.lower ## 742995.583 0.108 0.106 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.109 0.900 0.000 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 1.000 0.080 ## rmr rmr_nomean srmr ## 0.255 0.255 0.059 ## srmr_bentler srmr_bentler_nomean crmr ## 0.059 0.059 0.064 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.064 0.059 0.059 ## cn_05 cn_01 gfi ## 113.901 126.971 0.897 ## agfi pgfi mfi ## 0.849 0.611 0.699 ## ecvi ## 0.725 fitMeasures(fit_2nd) ## npar 
fmin chisq ## 29.000 0.360 10652.207 ## df pvalue baseline.chisq ## 62.000 0.000 81699.096 ## baseline.df baseline.pvalue cfi ## 78.000 0.000 0.870 ## tli nnfi rfi ## 0.837 0.837 0.836 ## nfi pnfi ifi ## 0.870 0.691 0.870 ## rni logl unrestricted.logl ## 0.870 -371404.658 -366078.555 ## aic bic ntotal ## 742867.317 743087.743 14778.000 ## bic2 rmsea rmsea.ci.lower ## 742995.583 0.108 0.106 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.109 0.900 0.000 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 1.000 0.080 ## rmr rmr_nomean srmr ## 0.255 0.255 0.059 ## srmr_bentler srmr_bentler_nomean crmr ## 0.059 0.059 0.064 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.064 0.059 0.059 ## cn_05 cn_01 gfi ## 113.901 126.971 0.897 ## agfi pgfi mfi ## 0.849 0.611 0.699 ## ecvi ## 0.725 Click for explanation We don’t have to do anything special here. We can estimate and summarize the second order CFA exactly as we did the first order CFA. You should quickly notice something strange about the model fit statistics compared above. If you don’t see it, consider the following: fitMeasures(fit_3f) - fitMeasures(fit_2nd) ## npar fmin chisq ## 0 0 0 ## df pvalue baseline.chisq ## 0 0 0 ## baseline.df baseline.pvalue cfi ## 0 0 0 ## tli nnfi rfi ## 0 0 0 ## nfi pnfi ifi ## 0 0 0 ## rni logl unrestricted.logl ## 0 0 0 ## aic bic ntotal ## 0 0 0 ## bic2 rmsea rmsea.ci.lower ## 0 0 0 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0 0 0 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0 0 0 ## rmr rmr_nomean srmr ## 0 0 0 ## srmr_bentler srmr_bentler_nomean crmr ## 0 0 0 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0 0 0 ## cn_05 cn_01 gfi ## 0 0 0 ## agfi pgfi mfi ## 0 0 0 ## ecvi ## 0 The two models produce identical fit statistics! We also see that the degrees of freedom are identical between the two models. Hence, the two models have equal complexity. This result taps into a critical idea in statistical modeling, namely, model equivalency. 
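The same thing happens outside the ESS data. As a hedged illustration using lavaan's bundled HolzingerSwineford1939 data (a stand-in, not part of the original exercise), a model with three correlated first-order factors and the corresponding second-order model should return identical fit, provided the second-order solution is admissible:

```r
library(lavaan)

## Three correlated first-order factors (cfa() covaries them by default).
mod_correlated <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## Same measurement model, with a second-order factor in place of the covariances.
mod_second <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
  ability =~ visual + textual + speed
'

fit_correlated <- cfa(mod_correlated, data = HolzingerSwineford1939)
fit_second     <- cfa(mod_second,     data = HolzingerSwineford1939)

## Complexity and fit side by side: one column per model.
idx <- c("npar", "df", "chisq", "cfi", "rmsea", "srmr")
print(round(sapply(list(correlated = fit_correlated, second_order = fit_second),
                   fitMeasures, fit.measures = idx), 3))
```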
It turns out the two models we’re comparing here are equivalent in the sense that they are statistically indistinguishable representations of the data. Since this is a very important idea, I want to spend some time discussing it in person. So, between now and the Week 6 lecture session, think about the implications of this model equivalence. Specifically, consider the following questions: What do we mean when we say that these two models are equivalent? How is it possible for these two models to be equivalent when one contains an additional latent variable? Why are the degrees of freedom equal for these two models? Why are the fit statistics equal for these two models? We’ll take some time to discuss these ideas in the Week 6 lecture session. End of At-Home Exercises "],["in-class-exercises-4.html", "5.4 In-Class Exercises", " 5.4 In-Class Exercises This week, we will wrap up our re-analysis of the Kestilä (2006) results. During this practical, you will conduct a CFA of the Attitudes toward Immigration items and compare the results to those obtained from your previous EFA- and PCA-based replications of Kestilä (2006). 5.4.1 Load the ESS data. The relevant data are contained in the ess_round1.rds file. Click to show code ess <- readRDS("ess_round1.rds") We are going to conduct a CFA to evaluate the measurement model implied by the five-factor representation of the Attitudes toward Immigration items that you should have found via the EFA you conducted in the Week 4 In-Class Exercises. Caveat: Technically, the following CFA results have no confirmatory value because we’ll be estimating our CFA models from the same data that we used for our EFA. Practicing the techniques will still be useful, though. 5.4.2 Define the lavaan model syntax for the CFA implied by the five-factor solution from 4.4.9. Enforce a simple structure; do not allow any cross-loadings. Covary the five latent factors. Do not specify any mean structure. 
Save this model syntax as an object in your environment. Hints: You can algorithmically enforce a simple structure by assigning each item to the factor upon which it loads most strongly. You can download the fitted psych::efa() object for the five-factor solution here. The pattern matrix for the five-factor EFA solution in our Week 4 exercises is equivalent to the solution presented in Table 3 of Kestilä (2006). Click to show code mod_5f <- ' ## Immigration Policy: ip =~ imrcntr + eimrcnt + eimpcnt + imsmetn + impcntr + imdfetn ## Social Threat: st =~ imbgeco + imbleco + imwbcnt + imwbcrm + imtcjob + imueclt ## Refugee Policy: rp =~ gvrfgap + imrsprc + rfgbfml + rfggvfn + rfgawrk + rfgfrpc + shrrfg ## Cultural Threat: ct =~ qfimchr + qfimwht + pplstrd + vrtrlg ## Economic Threat: et =~ imwgdwn + imhecop ' Note: We don’t have to specify the latent covariances in the model syntax; the cfa() function estimates all latent covariances by default when we fit the model. 5.4.3 Estimate the CFA model you defined above, and summarize the results. Use the lavaan::cfa() function to estimate the model. Use the default settings for the cfa() function. Request the model fit statistics with the summary by supplying the fit.measures = TRUE argument to summary(). Request the standardized parameter estimates with the summary by supplying the standardized = TRUE argument to summary(). Check the results, and answer the following questions: Does the model fit the data well? How are the latent variances and covariances specified when using the default settings? How is the model identified when using the default settings? 
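The first hint under 5.4.2 (assign each item to the factor on which it loads most strongly) is easy to automate. The sketch below uses a small, hypothetical pattern matrix with made-up loadings. In practice, you would substitute the pattern matrix from your own five-factor EFA, converted to a plain numeric matrix with items as rows and factors as columns.

```r
## Hypothetical pattern matrix (rows = items, columns = factors).
## These loadings are made up for illustration; substitute your own EFA output.
lambda <- matrix(
  c( 0.82, 0.05, -0.10,
     0.75, 0.12,  0.02,
     0.08, 0.71,  0.15,
    -0.04, 0.66, -0.09,
     0.11, 0.03, -0.58,
     0.02, 0.20,  0.63),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("item", 1:6), c("f1", "f2", "f3"))
)

## Assign each item to the factor with the largest *absolute* loading,
## so a strong negative loading counts as much as a strong positive one.
assignment <- colnames(lambda)[apply(abs(lambda), 1, which.max)]
names(assignment) <- rownames(lambda)

## Build one "factor =~ item + item + ..." statement per factor.
## (Assumes every factor receives at least one item.)
syntax_lines <- vapply(
  colnames(lambda),
  function(f) paste(f, "=~", paste(names(assignment)[assignment == f], collapse = " + ")),
  character(1)
)

## Join the statements into a single lavaan model string.
mod_simple <- paste(syntax_lines, collapse = "\n")
writeLines(mod_simple)
```

Using the absolute loading matters here, because some Immigration items (e.g., rfgfrpc and shrrfg) have negative loadings on their factor.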
Click to show code ## Load the lavaan package: library(lavaan) ## Estimate the CFA model: fit_5f <- cfa(mod_5f, data = ess) ## Summarize the fitted model: summary(fit_5f, fit.measures = TRUE, standardized = TRUE) ## lavaan 0.6.16 ended normally after 72 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 60 ## ## Used Total ## Number of observations 14243 19690 ## ## Model Test User Model: ## ## Test statistic 18631.556 ## Degrees of freedom 265 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 159619.058 ## Degrees of freedom 300 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.885 ## Tucker-Lewis Index (TLI) 0.869 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -520035.133 ## Loglikelihood unrestricted model (H1) -510719.354 ## ## Akaike (AIC) 1040190.265 ## Bayesian (BIC) 1040644.106 ## Sample-size adjusted Bayesian (SABIC) 1040453.432 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## 90 Percent confidence interval - lower 0.069 ## 90 Percent confidence interval - upper 0.071 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## ip =~ ## imrcntr 1.000 0.617 0.748 ## eimrcnt 0.942 0.011 84.943 0.000 0.582 0.696 ## eimpcnt 1.127 0.010 113.413 0.000 0.695 0.898 ## imsmetn 0.982 0.010 98.753 0.000 0.606 0.796 ## impcntr 1.150 0.010 113.623 0.000 0.710 0.900 ## imdfetn 1.132 0.010 111.802 0.000 0.698 0.887 ## st =~ ## imbgeco 1.000 1.608 0.728 ## imbleco 0.826 0.012 69.222 0.000 1.327 0.619 ## imwbcnt 1.046 0.012 88.056 0.000 1.682 0.792 ## imwbcrm 0.713 0.011 63.102 0.000 1.146 0.564 ## imtcjob 0.751 0.011 66.787 
0.000 1.207 0.597 ## imueclt 1.008 0.013 78.043 0.000 1.621 0.698 ## rp =~ ## gvrfgap 1.000 0.659 0.610 ## imrsprc 0.855 0.016 51.881 0.000 0.563 0.535 ## rfgbfml 1.047 0.019 56.174 0.000 0.690 0.593 ## rfggvfn 0.849 0.016 51.714 0.000 0.559 0.533 ## rfgawrk 0.653 0.016 41.044 0.000 0.430 0.405 ## rfgfrpc -0.810 0.016 -51.095 0.000 -0.534 -0.525 ## shrrfg -0.999 0.017 -58.381 0.000 -0.658 -0.625 ## ct =~ ## qfimchr 1.000 1.836 0.629 ## qfimwht 0.941 0.017 54.250 0.000 1.728 0.659 ## pplstrd -0.366 0.007 -51.585 0.000 -0.673 -0.600 ## vrtrlg 0.252 0.006 41.294 0.000 0.462 0.443 ## et =~ ## imwgdwn 1.000 0.723 0.667 ## imhecop 1.151 0.023 49.736 0.000 0.832 0.771 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## ip ~~ ## st -0.605 0.012 -48.693 0.000 -0.610 -0.610 ## rp 0.264 0.006 45.566 0.000 0.648 0.648 ## ct 0.634 0.015 41.007 0.000 0.560 0.560 ## et -0.206 0.006 -35.411 0.000 -0.462 -0.462 ## st ~~ ## rp -0.838 0.017 -48.329 0.000 -0.792 -0.792 ## ct -1.622 0.041 -39.091 0.000 -0.550 -0.550 ## et 0.675 0.017 39.083 0.000 0.580 0.580 ## rp ~~ ## ct 0.626 0.018 34.950 0.000 0.518 0.518 ## et -0.233 0.007 -33.007 0.000 -0.490 -0.490 ## ct ~~ ## et -0.592 0.020 -30.127 0.000 -0.446 -0.446 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .imrcntr 0.299 0.004 77.941 0.000 0.299 0.440 ## .eimrcnt 0.359 0.005 79.638 0.000 0.359 0.515 ## .eimpcnt 0.116 0.002 62.821 0.000 0.116 0.193 ## .imsmetn 0.212 0.003 75.580 0.000 0.212 0.366 ## .impcntr 0.119 0.002 62.454 0.000 0.119 0.191 ## .imdfetn 0.132 0.002 65.344 0.000 0.132 0.213 ## .imbgeco 2.288 0.033 70.261 0.000 2.288 0.470 ## .imbleco 2.837 0.037 76.688 0.000 2.837 0.617 ## .imwbcnt 1.677 0.027 63.198 0.000 1.677 0.372 ## .imwbcrm 2.810 0.036 78.612 0.000 2.810 0.682 ## .imtcjob 2.630 0.034 77.524 0.000 2.630 0.643 ## .imueclt 2.761 0.038 72.515 0.000 2.761 0.512 ## .gvrfgap 0.733 0.010 73.584 0.000 0.733 0.628 ## .imrsprc 0.791 0.010 77.119 0.000 0.791 0.714 ## .rfgbfml 
0.877 0.012 74.508 0.000 0.877 0.648 ## .rfggvfn 0.788 0.010 77.203 0.000 0.788 0.716 ## .rfgawrk 0.945 0.012 80.870 0.000 0.945 0.836 ## .rfgfrpc 0.749 0.010 77.501 0.000 0.749 0.724 ## .shrrfg 0.676 0.009 72.682 0.000 0.676 0.609 ## .qfimchr 5.142 0.080 64.113 0.000 5.142 0.604 ## .qfimwht 3.891 0.064 60.623 0.000 3.891 0.566 ## .pplstrd 0.804 0.012 67.054 0.000 0.804 0.640 ## .vrtrlg 0.872 0.011 76.990 0.000 0.872 0.804 ## .imwgdwn 0.652 0.012 53.300 0.000 0.652 0.555 ## .imhecop 0.472 0.014 34.353 0.000 0.472 0.405 ## ip 0.381 0.007 51.578 0.000 1.000 1.000 ## st 2.584 0.054 47.795 0.000 1.000 1.000 ## rp 0.434 0.012 36.748 0.000 1.000 1.000 ## ct 3.371 0.096 35.174 0.000 1.000 1.000 ## et 0.523 0.015 34.944 0.000 1.000 1.000 Click for explanation No, the model does not seem to fit the data well. The SRMR looks good, but one good-looking fit statistic is not enough. The TLI and CFI are in the “unacceptable” range, and the RMSEA is in the “questionable” range. The \\(\\chi^2\\) is highly significant, but that tells us little: with more than 14,000 observations, even trivial misfit produces a significant \\(\\chi^2\\) test. The cfa() function is just a wrapper for the lavaan() function with several options set at the defaults you would want for a standard CFA. By default: All latent variances and covariances are freely estimated (due to the arguments auto.var = TRUE and auto.cov.lv.x = TRUE). The model is identified by fixing the first factor loading of each factor to 1 (due to the argument auto.fix.first = TRUE). To see a full list of the (many) options you can specify to tweak the behavior of lavaan estimation functions, run ?lavOptions. Now, we will consider a couple of alternative factor structures for the Attitudes toward Immigration CFA. First, we will go extremely simple by estimating a one-factor model wherein all Attitude items are explained by a single latent variable. 5.4.4 Define the lavaan model syntax for a one-factor model of the Immigration items. Save this syntax as an object in your environment. 
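Typing 25 item names by hand is error-prone. One alternative is to pull the indicator names out of the five-factor syntax with lavaan's lavaanify(), which parses model syntax into a parameter table, and then paste them into a one-factor model. This is a sketch, not the solution code. The five-factor syntax from 5.4.2 is repeated here (without its comments) so the block runs on its own.

```r
library(lavaan)

## Five-factor syntax from 5.4.2 (comments omitted).
mod_5f <- '
  ip =~ imrcntr + eimrcnt + eimpcnt + imsmetn + impcntr + imdfetn
  st =~ imbgeco + imbleco + imwbcnt + imwbcrm + imtcjob + imueclt
  rp =~ gvrfgap + imrsprc + rfgbfml + rfggvfn + rfgawrk + rfgfrpc + shrrfg
  ct =~ qfimchr + qfimwht + pplstrd + vrtrlg
  et =~ imwgdwn + imhecop
'

## Parse the syntax; rows with op "=~" are the factor loadings.
pt <- lavaanify(mod_5f)
items <- unique(pt$rhs[pt$op == "=~"])

## Put every indicator on a single factor named 'ati'.
mod_1f_auto <- paste("ati =~", paste(items, collapse = " + "))
writeLines(mod_1f_auto)
```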
Click to show code mod_1f <- ' ati =~ imrcntr + eimrcnt + eimpcnt + imsmetn + impcntr + imdfetn + imbgeco + imbleco + imwbcnt + imwbcrm + imtcjob + imueclt + gvrfgap + imrsprc + rfgbfml + rfggvfn + rfgawrk + rfgfrpc + shrrfg + qfimchr + qfimwht + pplstrd + vrtrlg + imwgdwn + imhecop ' 5.4.5 Estimate the one-factor model, and summarize the results. Compare the fit measures for the one-factor and five-factor models. Which model fits the data better? Note: Remember, you can use the lavaan::fitMeasures() function to extract only the model fit information from a fitted lavaan object. Click to show code ## Estimate the one factor model: fit_1f <- cfa(mod_1f, data = ess) ## Summarize the results: summary(fit_1f) ## lavaan 0.6.16 ended normally after 47 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 50 ## ## Used Total ## Number of observations 14243 19690 ## ## Model Test User Model: ## ## Test statistic 49510.917 ## Degrees of freedom 275 ## P-value (Chi-square) 0.000 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## ati =~ ## imrcntr 1.000 ## eimrcnt 0.937 0.012 78.324 0.000 ## eimpcnt 1.114 0.011 101.263 0.000 ## imsmetn 0.987 0.011 90.990 0.000 ## impcntr 1.147 0.011 102.371 0.000 ## imdfetn 1.153 0.011 103.148 0.000 ## imbgeco -2.055 0.032 -64.749 0.000 ## imbleco -1.625 0.031 -52.533 0.000 ## imwbcnt -2.173 0.030 -71.324 0.000 ## imwbcrm -1.432 0.029 -48.849 0.000 ## imtcjob -1.532 0.029 -52.519 0.000 ## imueclt -2.198 0.033 -65.876 0.000 ## gvrfgap 0.807 0.016 51.746 0.000 ## imrsprc 0.757 0.015 49.790 0.000 ## rfgbfml 0.861 0.017 51.272 0.000 ## rfggvfn 0.722 0.015 47.671 0.000 ## rfgawrk 0.530 0.015 34.448 0.000 ## rfgfrpc -0.755 0.015 -51.462 0.000 ## shrrfg -0.931 0.015 -61.438 0.000 ## qfimchr 1.597 0.042 37.835 0.000 ## qfimwht 1.769 0.038 46.697 0.000 ## pplstrd -0.873 0.016 
-53.994 0.000 ## vrtrlg 0.602 0.015 39.940 0.000 ## imwgdwn -0.682 0.016 -43.576 0.000 ## imhecop -0.773 0.016 -49.611 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .imrcntr 0.327 0.004 79.021 0.000 ## .eimrcnt 0.388 0.005 80.422 0.000 ## .eimpcnt 0.161 0.002 70.832 0.000 ## .imsmetn 0.235 0.003 77.101 0.000 ## .impcntr 0.158 0.002 69.688 0.000 ## .imdfetn 0.150 0.002 68.791 0.000 ## .imbgeco 3.381 0.041 82.203 0.000 ## .imbleco 3.666 0.044 83.130 0.000 ## .imwbcnt 2.839 0.035 81.477 0.000 ## .imwbcrm 3.399 0.041 83.334 0.000 ## .imtcjob 3.260 0.039 83.130 0.000 ## .imueclt 3.683 0.045 82.092 0.000 ## .gvrfgap 0.938 0.011 83.176 0.000 ## .imrsprc 0.906 0.011 83.285 0.000 ## .rfgbfml 1.092 0.013 83.203 0.000 ## .rfggvfn 0.917 0.011 83.394 0.000 ## .rfgawrk 1.031 0.012 83.913 0.000 ## .rfgfrpc 0.832 0.010 83.192 0.000 ## .shrrfg 0.803 0.010 82.499 0.000 ## .qfimchr 7.613 0.091 83.803 0.000 ## .qfimwht 5.772 0.069 83.442 0.000 ## .pplstrd 0.988 0.012 83.040 0.000 ## .vrtrlg 0.958 0.011 83.728 0.000 ## .imwgdwn 1.010 0.012 83.583 0.000 ## .imhecop 0.954 0.011 83.294 0.000 ## ati 0.353 0.007 48.941 0.000 ## Compare fit statistics: fitMeasures(fit_5f, fit.measures = c("npar", # Estimated parameters "chisq", "df", "pvalue", # Model fit vs. saturated "cfi", "tli", # Model fit vs. baseline "rmsea", "srmr"), # Model fit vs. saturated output = "text") ## ## Model Test User Model: ## ## Test statistic 18631.556 ## Degrees of freedom 265 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.885 ## Tucker-Lewis Index (TLI) 0.869 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 fitMeasures(fit_1f, fit.measures = c("npar", # Estimated parameters "chisq", "df", "pvalue", # Model fit vs. saturated "cfi", "tli", # Model fit vs. baseline "rmsea", "srmr"), # Model fit vs. 
saturated output = "text") ## ## Model Test User Model: ## ## Test statistic 49510.917 ## Degrees of freedom 275 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.691 ## Tucker-Lewis Index (TLI) 0.663 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.112 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.087 Click for explanation The one-factor model fits clearly worse than the five-factor model on every index shown (CFI = 0.691 vs. 0.885, TLI = 0.663 vs. 0.869, RMSEA = 0.112 vs. 0.070, SRMR = 0.087 vs. 0.048). 5.4.6 Given the CFA results from the five-factor model, would a second-order CFA be appropriate for the Attitudes towards Immigration data? Why or why not? Click for explanation Yes, a second-order CFA model is a theoretically appropriate representation of the Attitudes towards Immigration items. The first-order latent variables in the five-factor model are all significantly correlated, with absolute latent correlations from 0.446 to 0.792. The first-order latent variables in the five-factor model also seem to tap different aspects of some single underlying construct. 5.4.7 Define the lavaan model syntax for a second-order CFA model of the Attitudes towards Immigration items, estimate it, and inspect the results. Use the five factors defined in 5.4.2 as the first-order factors. 
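Question 5.4.6 hinges on how strongly the first-order factors correlate. Instead of reading the covariance block of summary(), you can get the model-implied latent correlation matrix directly with lavInspect(fit, "cor.lv"). The sketch below again uses lavaan's bundled HolzingerSwineford1939 data as a stand-in. With the ESS data, lavInspect(fit_5f, "cor.lv") should give the same kind of matrix for the five Immigration factors.

```r
library(lavaan)

mod_hs <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_hs <- cfa(mod_hs, data = HolzingerSwineford1939)

## Model-implied correlations among the latent factors.
cor_lv <- lavInspect(fit_hs, "cor.lv")
print(round(cor_lv, 3))

## Off-diagonal correlations: if these were all near zero, a single
## higher-order factor would have little shared variance to explain.
off_diag <- cor_lv[lower.tri(cor_lv)]
print(min(abs(off_diag)))
```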
Click to show code mod_2o <- paste(mod_5f, 'ati =~ ip + rp + st + ct + et', sep = '\\n') fit_2o <- cfa(mod_2o, data = ess) summary(fit_2o, fit.measures = TRUE) ## lavaan 0.6.16 ended normally after 94 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 55 ## ## Used Total ## Number of observations 14243 19690 ## ## Model Test User Model: ## ## Test statistic 19121.111 ## Degrees of freedom 270 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 159619.058 ## Degrees of freedom 300 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.882 ## Tucker-Lewis Index (TLI) 0.869 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -520279.910 ## Loglikelihood unrestricted model (H1) -510719.354 ## ## Akaike (AIC) 1040669.820 ## Bayesian (BIC) 1041085.841 ## Sample-size adjusted Bayesian (SABIC) 1040911.056 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## 90 Percent confidence interval - lower 0.069 ## 90 Percent confidence interval - upper 0.071 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## ip =~ ## imrcntr 1.000 ## eimrcnt 0.943 0.011 85.095 0.000 ## eimpcnt 1.126 0.010 113.523 0.000 ## imsmetn 0.982 0.010 98.910 0.000 ## impcntr 1.149 0.010 113.651 0.000 ## imdfetn 1.130 0.010 111.789 0.000 ## st =~ ## imbgeco 1.000 ## imbleco 0.822 0.012 68.916 0.000 ## imwbcnt 1.047 0.012 88.172 0.000 ## imwbcrm 0.709 0.011 62.846 0.000 ## imtcjob 0.747 0.011 66.424 0.000 ## imueclt 1.013 0.013 78.434 0.000 ## rp =~ ## gvrfgap 1.000 ## imrsprc 0.854 0.017 51.127 0.000 ## rfgbfml 1.048 0.019 55.377 0.000 ## rfggvfn 0.853 0.017 51.170 0.000 ## rfgawrk 
0.657 0.016 40.785 0.000 ## rfgfrpc -0.828 0.016 -51.249 0.000 ## shrrfg -1.020 0.017 -58.369 0.000 ## ct =~ ## qfimchr 1.000 ## qfimwht 0.939 0.018 51.902 0.000 ## pplstrd -0.389 0.008 -51.072 0.000 ## vrtrlg 0.271 0.006 41.908 0.000 ## et =~ ## imwgdwn 1.000 ## imhecop 1.158 0.024 48.877 0.000 ## ati =~ ## ip 1.000 ## rp 1.264 0.024 53.732 0.000 ## st -3.123 0.051 -61.058 0.000 ## ct 2.638 0.058 45.467 0.000 ## et -1.000 0.024 -42.490 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .imrcntr 0.299 0.004 77.900 0.000 ## .eimrcnt 0.359 0.005 79.597 0.000 ## .eimpcnt 0.116 0.002 62.698 0.000 ## .imsmetn 0.211 0.003 75.502 0.000 ## .impcntr 0.119 0.002 62.476 0.000 ## .imdfetn 0.133 0.002 65.406 0.000 ## .imbgeco 2.285 0.033 70.158 0.000 ## .imbleco 2.852 0.037 76.762 0.000 ## .imwbcnt 1.668 0.027 62.920 0.000 ## .imwbcrm 2.821 0.036 78.653 0.000 ## .imtcjob 2.646 0.034 77.607 0.000 ## .imueclt 2.734 0.038 72.213 0.000 ## .gvrfgap 0.740 0.010 73.738 0.000 ## .imrsprc 0.797 0.010 77.211 0.000 ## .rfgbfml 0.885 0.012 74.621 0.000 ## .rfggvfn 0.791 0.010 77.189 0.000 ## .rfgawrk 0.946 0.012 80.833 0.000 ## .rfgfrpc 0.741 0.010 77.149 0.000 ## .shrrfg 0.665 0.009 72.020 0.000 ## .qfimchr 5.347 0.081 65.623 0.000 ## .qfimwht 4.084 0.065 62.673 0.000 ## .pplstrd 0.778 0.012 64.838 0.000 ## .vrtrlg 0.854 0.011 75.931 0.000 ## .imwgdwn 0.655 0.012 52.977 0.000 ## .imhecop 0.468 0.014 33.353 0.000 ## .ip 0.177 0.004 44.418 0.000 ## .st 0.596 0.023 26.030 0.000 ## .rp 0.101 0.005 21.784 0.000 ## .ct 1.745 0.060 29.185 0.000 ## .et 0.316 0.010 31.813 0.000 ## ati 0.204 0.005 37.371 0.000 5.4.8 Compare the model fit of the first- and second-order five-factor models using the fitMeasures() function. Which model offers the better fit? Which model is more complex? Click to show code fitMeasures(fit_5f, fit.measures = c("npar", # Estimated parameters "chisq", "df", "pvalue", # Model fit vs. saturated "cfi", "tli", # Model fit vs. 
baseline "rmsea", "srmr"), # Model fit vs. saturated output = "text") ## ## Model Test User Model: ## ## Test statistic 18631.556 ## Degrees of freedom 265 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.885 ## Tucker-Lewis Index (TLI) 0.869 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 fitMeasures(fit_2o, fit.measures = c("npar", # Estimated parameters "chisq", "df", "pvalue", # Model fit vs. saturated "cfi", "tli", # Model fit vs. baseline "rmsea", "srmr"), # Model fit vs. saturated output = "text") ## ## Model Test User Model: ## ## Test statistic 19121.111 ## Degrees of freedom 270 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.882 ## Tucker-Lewis Index (TLI) 0.869 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.070 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.048 Click for explanation The CFI is slightly better in the original five-factor model (0.885 vs. 0.882), but the TLI, RMSEA, and SRMR of the two models don’t differ out to three decimal places. As usual, both models have a significant \\(\\chi^2\\), but with more than 14,000 observations that doesn’t tell us much. Qualitative comparisons of model fit are fine, but we’d like to have an actual statistical test for these fit differences. As it happens, we have just such a test: a nested-model \\(\\Delta \\chi^2\\) test (AKA, chi-squared difference test, change in chi-squared test, likelihood ratio test). In the Week 7 lecture, we’ll cover nested models and tests thereof, but it will be useful to start thinking about these concepts now. Two models are said to be nested if you can define one model by placing constraints on the other model. By way of example, consider the following two CFA models. The second model is nested within the first model, because we can define the second model by fixing the latent covariance to zero in the first model. 
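As a preview of what the \\(\\Delta \\chi^2\\) test computes, here is a minimal base-R sketch using the \\(\\chi^2\\) values and degrees of freedom reported by fitMeasures() for the first- and second-order models in 5.4.8 (the second-order model is the nested one). It assumes plain ML estimation, as used here; with robust (scaled) test statistics, lavaan's anova() applies a scaled difference test, so simple subtraction would not reproduce its result. The sketch also checks that a model's \\(\\chi^2\\) equals \\(-2\\) times the difference between the user-model and saturated-model log-likelihoods shown in the 5.4.7 summary.

```r
## Values reported by lavaan for the two models (5.4.7 and 5.4.8)
chisq_5f <- 18631.556; df_5f <- 265   # first-order five-factor model (parent)
chisq_2o <- 19121.111; df_2o <- 270   # second-order model (nested)

## The model chi-square is a likelihood ratio statistic:
## -2 * (log-likelihood of user model - log-likelihood of saturated model)
ll_2o  <- -520279.910                 # "Loglikelihood user model (H0)"
ll_sat <- -510719.354                 # "Loglikelihood unrestricted model (H1)"
lr_2o  <- -2 * (ll_2o - ll_sat)       # ~19121.11, equal to chisq_2o up to rounding

## Chi-squared difference test: nested model minus parent model
delta_chisq <- chisq_2o - chisq_5f    # 489.555
delta_df    <- df_2o - df_5f          # 5
p_value     <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)

c(lr_2o = lr_2o, delta_chisq = delta_chisq,
  delta_df = delta_df, p_value = p_value)
```

A \\(\\Delta \\chi^2\\) of about 490 on 5 df gives a p-value that is effectively zero, which agrees with the anova() conclusion in 5.4.10 that the first-order model fits significantly better.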
Notice that the data contain \\(6(6 + 1) / 2 = 21\\) unique pieces of information (the variances and covariances of the six observed items). The first model estimates 13 parameters, and the second model estimates 12 parameters. Hence the first model has \\(21 - 13 = 8\\) degrees of freedom, and the second model has \\(21 - 12 = 9\\) degrees of freedom. In general, the following must hold whenever Model B is nested within Model A. Model B will have fewer estimated parameters than Model A. Model B will have more degrees of freedom than Model A. Model A will be more complex than Model B. Model A will fit the data at least as well as Model B. Saturated Model All models are nested within the saturated model, because the saturated model estimates all possible relations among the variables. Regardless of what model we may be considering, we can always convert that model to a saturated model by estimating all possible associations. Hence, all models are nested within the saturated model. Baseline Model Similarly, the baseline model (AKA, independence model) is nested within all other models. In the baseline model, we only estimate the variances of the observed items; all associations are constrained to zero. We can always convert our model to the baseline model by fixing all associations to zero. Hence, the baseline model is nested within all other models. When two models are nested, we can use a \\(\\Delta \\chi^2\\) test to check if the nested model fits significantly worse than its parent model. Placing constraints on a model can never improve its fit, and it usually makes the fit worse, so the question is whether the constraints we imposed to define the nested model have produced too much loss of fit. We can use the anova() function to easily conduct \\(\\Delta \\chi^2\\) tests comparing models that we’ve estimated with cfa() or sem(). 5.4.9 Use the anova() function to compare the five-factor model from 5.4.2 and the one-factor model from 5.4.4. Explain what Df, Chisq, Chisq diff, Df diff, and Pr(>Chisq) mean. Which model is more complex? Which model fits better? 
What is the conclusion of the test? Click to show code anova(fit_1f, fit_5f) Click for explanation The Df column contains the degrees of freedom of each model. Higher df \\(\\Rightarrow\\) less complex model. The Chisq column shows the \\(\\chi^2\\) statistics (AKA, likelihood ratio statistics) for each model. \\(\\chi^2 = -2 \\ln (L_{\\text{model}} / L_{\\text{saturated}})\\), that is, \\(-2\\) times the log of the ratio of the likelihoods for the estimated model and the saturated model. Larger \\(\\chi^2\\) \\(\\Rightarrow\\) worse fit. Chisq diff is the difference between the two \\(\\chi^2\\) values (i.e., \\(\\Delta \\chi^2\\)). It quantifies how much better the more complex model fits the data. Larger \\(\\Delta \\chi^2\\) values indicate greater losses of fit induced by the constraints needed to define the nested model. Df diff is the difference in the degrees of freedom between the models. Since both models must be estimated from the same set of variables, this difference also represents the number of parameters that were constrained to define the nested model. Pr(>Chisq) is a p-value for the \\(\\Delta \\chi^2\\) test. \\(H_0: \\Delta \\chi^2 = 0\\) (the constraints cause no loss of fit in the population) \\(H_1: \\Delta \\chi^2 > 0\\) (the constraints cause some loss of fit) The five-factor model is more complex than the one-factor model, but the extra complexity is justified: the five-factor model fits significantly better than the one-factor model. 5.4.10 Use the anova() function to compare the first- and second-order five-factor models from 5.4.2 and 5.4.7. Which model is more complex? What is the conclusion of the test? Click to show code anova(fit_5f, fit_2o) Click for explanation The first-order model is more complex than the second-order model (df = 265 vs. df = 270), and the extra complexity is justified: the first-order model fits significantly better than the second-order model. 5.4.11 Based on the results above, would you say that you have successfully confirmed the five-factor structure implied by the EFA? Click for explanation Nope, not so much. 
The first-order five-factor model may fit the data best out of the three models considered here, but it still fits terribly (CFI = 0.885, TLI = 0.869, RMSEA = 0.070). None of these models is an adequate representation of the Attitudes towards Immigration items. This result is particularly embarrassing when you consider that we’ve stacked the deck in our favor by using the same data to conduct the EFA and the CFA. When we fail to support the hypothesized measurement model, the confirmatory phase of our analysis is over. At this point, we’ve essentially rejected our hypothesized measurement structure, and that’s the conclusion of our analysis. We don’t have to throw up our hands in despair, however. We can still contribute something useful by modifying the theoretical measurement model through an exploratory, data-driven, post-hoc analysis. We’ll give that a shot below. 5.4.12 Modify the five-factor CFA from 5.4.2 by freeing the following parameters. The residual covariance between imrcntr and eimrcnt These questions both ask about allowing immigration from wealthy countries. It makes sense that answers on these two items share some additional, unique variance above and beyond what they contribute to the common factors. The residual covariance between qfimchr and qfimwht These questions are both about imposing qualifications on immigration (specifically Christian religion and “white” race), so they may likewise share unique variance. 
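Freeing parameters is the mirror image of imposing constraints: each freed residual covariance adds one estimated parameter and costs one degree of freedom, and the original model becomes the nested (more constrained) model. The base-R sketch below does that bookkeeping and the \\(\\Delta \\chi^2\\) calculation behind the anova() comparison in 5.4.13. It takes the \\(\\chi^2\\) and df of the original model from 5.4.8 and the \\(\\chi^2\\) of the modified model from the fit_5f_cov summary; the original model's parameter count is derived as 325 moments minus 265 df rather than read from output.

```r
## Original five-factor model (values reported in 5.4.8)
df_5f     <- 265
chisq_5f  <- 18631.556
n_moments <- 25 * (25 + 1) / 2        # 325 variances and covariances of 25 items
npar_5f   <- n_moments - df_5f        # 60 estimated parameters

## Freeing two residual covariances: imrcntr ~~ eimrcnt, qfimchr ~~ qfimwht
n_freed   <- 2
npar_cov  <- npar_5f + n_freed        # 62, as in the fit_5f_cov summary
df_cov    <- df_5f - n_freed          # 263
chisq_cov <- 9740.512                 # reported for fit_5f_cov

## The original model is nested within the modified model, so it plays the
## role of the constrained model in the difference test
delta_chisq <- chisq_5f - chisq_cov   # 8891.044
delta_df    <- df_5f - df_cov         # 2
p_value     <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)

c(npar_cov = npar_cov, df_cov = df_cov, delta_chisq = delta_chisq,
  delta_df = delta_df, p_value = p_value)
```

A \\(\\Delta \\chi^2\\) this large on 2 df produces a p-value that underflows to 0 in double precision; it should be read as "far below any conventional threshold", not as exactly zero.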
Click to show code fit_5f_cov <- paste(mod_5f, 'imrcntr ~~ eimrcnt', 'qfimchr ~~ qfimwht', sep = '\\n') %>% cfa(data = ess) summary(fit_5f_cov, fit.measures = TRUE) ## lavaan 0.6.16 ended normally after 77 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 62 ## ## Used Total ## Number of observations 14243 19690 ## ## Model Test User Model: ## ## Test statistic 9740.512 ## Degrees of freedom 263 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 159619.058 ## Degrees of freedom 300 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.941 ## Tucker-Lewis Index (TLI) 0.932 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -515589.611 ## Loglikelihood unrestricted model (H1) -510719.354 ## ## Akaike (AIC) 1031303.221 ## Bayesian (BIC) 1031772.190 ## Sample-size adjusted Bayesian (SABIC) 1031575.160 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.050 ## 90 Percent confidence interval - lower 0.049 ## 90 Percent confidence interval - upper 0.051 ## P-value H_0: RMSEA <= 0.050 0.280 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.036 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## ip =~ ## imrcntr 1.000 ## eimrcnt 0.928 0.007 126.255 0.000 ## eimpcnt 1.184 0.011 106.508 0.000 ## imsmetn 1.012 0.011 92.436 0.000 ## impcntr 1.213 0.011 107.078 0.000 ## imdfetn 1.181 0.011 104.566 0.000 ## st =~ ## imbgeco 1.000 ## imbleco 0.826 0.012 69.006 0.000 ## imwbcnt 1.050 0.012 88.051 0.000 ## imwbcrm 0.715 0.011 63.128 0.000 ## imtcjob 0.751 0.011 66.542 0.000 ## imueclt 1.015 0.013 78.256 0.000 ## rp =~ ## gvrfgap 1.000 ## imrsprc 0.858 0.017 51.965 0.000 ## rfgbfml 1.046 0.019 56.104 0.000 ## rfggvfn 0.848 0.016 51.644 0.000 ## rfgawrk 
0.652 0.016 40.998 0.000 ## rfgfrpc -0.813 0.016 -51.233 0.000 ## shrrfg -1.002 0.017 -58.499 0.000 ## ct =~ ## qfimchr 1.000 ## qfimwht 0.979 0.020 48.332 0.000 ## pplstrd -0.586 0.014 -40.685 0.000 ## vrtrlg 0.397 0.011 36.273 0.000 ## et =~ ## imwgdwn 1.000 ## imhecop 1.157 0.023 49.549 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## .imrcntr ~~ ## .eimrcnt 0.230 0.004 59.907 0.000 ## .qfimchr ~~ ## .qfimwht 2.558 0.064 40.233 0.000 ## ip ~~ ## st -0.580 0.012 -48.041 0.000 ## rp 0.255 0.006 45.185 0.000 ## ct 0.467 0.014 34.425 0.000 ## et -0.197 0.006 -35.077 0.000 ## st ~~ ## rp -0.835 0.017 -48.285 0.000 ## ct -1.394 0.040 -35.128 0.000 ## et 0.670 0.017 38.935 0.000 ## rp ~~ ## ct 0.538 0.017 32.407 0.000 ## et -0.232 0.007 -32.949 0.000 ## ct ~~ ## et -0.469 0.017 -27.959 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .imrcntr 0.330 0.004 78.903 0.000 ## .eimrcnt 0.396 0.005 80.392 0.000 ## .eimpcnt 0.109 0.002 60.401 0.000 ## .imsmetn 0.220 0.003 75.979 0.000 ## .impcntr 0.107 0.002 58.874 0.000 ## .imdfetn 0.131 0.002 64.630 0.000 ## .imbgeco 2.301 0.033 70.568 0.000 ## .imbleco 2.845 0.037 76.832 0.000 ## .imwbcnt 1.669 0.026 63.272 0.000 ## .imwbcrm 2.808 0.036 78.659 0.000 ## .imtcjob 2.639 0.034 77.663 0.000 ## .imueclt 2.741 0.038 72.463 0.000 ## .gvrfgap 0.734 0.010 73.743 0.000 ## .imrsprc 0.790 0.010 77.164 0.000 ## .rfgbfml 0.880 0.012 74.676 0.000 ## .rfggvfn 0.790 0.010 77.322 0.000 ## .rfgawrk 0.946 0.012 80.924 0.000 ## .rfgfrpc 0.747 0.010 77.519 0.000 ## .shrrfg 0.674 0.009 72.713 0.000 ## .qfimchr 6.815 0.090 75.362 0.000 ## .qfimwht 5.250 0.072 73.378 0.000 ## .pplstrd 0.674 0.013 52.766 0.000 ## .vrtrlg 0.818 0.011 73.191 0.000 ## .imwgdwn 0.655 0.012 53.496 0.000 ## .imhecop 0.468 0.014 33.845 0.000 ## ip 0.350 0.007 48.646 0.000 ## st 2.571 0.054 47.662 0.000 ## rp 0.433 0.012 36.718 0.000 ## ct 1.698 0.073 23.296 0.000 ## et 0.520 0.015 34.814 0.000 5.4.13 Evaluate the model modifications. 
Did the model fit significantly improve? Is the fit of the modified model acceptable? Click to show code anova(fit_5f_cov, fit_5f) fitMeasures(fit_5f_cov) ## npar fmin chisq ## 62.000 0.342 9740.512 ## df pvalue baseline.chisq ## 263.000 0.000 159619.058 ## baseline.df baseline.pvalue cfi ## 300.000 0.000 0.941 ## tli nnfi rfi ## 0.932 0.932 0.930 ## nfi pnfi ifi ## 0.939 0.823 0.941 ## rni logl unrestricted.logl ## 0.941 -515589.611 -510719.354 ## aic bic ntotal ## 1031303.221 1031772.190 14243.000 ## bic2 rmsea rmsea.ci.lower ## 1031575.160 0.050 0.049 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.051 0.900 0.280 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 0.000 0.080 ## rmr rmr_nomean srmr ## 0.103 0.103 0.036 ## srmr_bentler srmr_bentler_nomean crmr ## 0.036 0.036 0.037 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.037 0.036 0.036 ## cn_05 cn_01 gfi ## 442.344 467.858 0.944 ## agfi pgfi mfi ## 0.931 0.764 0.717 ## ecvi ## 0.693 Click for explanation Yes, the model fit improved significantly. In this case, the original five-factor model is nested within the modified model. So, our \\(\\Delta \\chi^2\\) test is evaluating the improvement in fit contributed by freeing the two residual covariances. The \\(\\Delta \\chi^2\\) test is significant, so we can conclude that including the two new parameter estimates has significantly improved the model fit. That is, estimating these two residual covariances is “worth it” in the sense of balancing model fit and model complexity. Also, the fit of the modified model is now acceptable (CFI = 0.941, TLI = 0.932, RMSEA = 0.050, SRMR = 0.036). Caveat If we had found this result when testing our original model, we would be well-situated to proceed with our analysis. In this case, however, we are no longer justified in generalizing these estimates to the population. We only arrived at this well-fitting model by modifying our original theoretical model, using estimates derived from the same data to decide which modifications would improve the fit. 
We’ve conducted this post-hoc analysis to help inform future research, and this result is useful as a starting point for future studies. Now, anyone analyzing these scales in the future could incorporate these residual covariances into their initial theoretical model. Basically, we conduct these types of post-hoc analyses to help future researchers learn from our mistakes. End of In-Class Exercises "],["full-sem.html", "6 Full SEM", " 6 Full SEM This week, we will focus on integrating all of the disparate methods we’ve covered so far into full-fledged structural equation models. Homework before the lecture Watch the Lecture Recording for this week. Complete the Reading for this week, and answer the associated reading questions. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. "],["lecture-5.html", "6.1 Lecture", " 6.1 Lecture This week, we will begin with our final theme and discuss structural equation modeling (SEM). This powerful technique joins the strengths of CFA and path analysis to produce a highly flexible and theoretically appealing modeling tool. Essentially, SEM allows us to build structural path models using the latent variables defined by a CFA. 6.1.1 Recording 6.1.2 Slides You can download the lecture slides here "],["reading-5.html", "6.2 Reading", " 6.2 Reading Reference Weston, R., & Gore, P. A. (2006). A brief guide to structural equation modeling. The Counseling Psychologist, 34, 719–752. Notes: This article is quite general and provides an overview of things we have discussed so far in this course. This article also adds an important new idea: combining factor analysis with path modeling to produce a full Structural Equation Model (SEM). Skip the part on GFI (p. 741). The GFI has been shown to be too dependent on sample size and is not recommended any longer. Skip the part on missing data. 
There is nothing wrong with this section, but missing data analysis is a broad and difficult topic that we cannot adequately cover in this course. If you would like to learn more about missing data and how to treat them, you can take two courses offered by our department: Conducting a Survey Missing Data Theory and Causal Effects Questions The authors state three similarities and two big differences between SEM and other multivariate statistical techniques (e.g., ANCOVA, regression). What are these similarities and differences? Do you agree with the relative strengths and weaknesses of SEM vs. other methods that the authors present? The authors miss at least one additional advantage of SEM over other multivariate methods. What is this missing advantage? Explain what the terms “measurement model” and “structural model” mean in the SEM context. What are the 6 steps of doing an SEM-based analysis given by the authors? The authors claim that testing an SEM using cross-validation is a good idea. When is cross-validation helpful in SEM? Hint: You may have to do some independent (internet, literature) research to learn how cross-validation can be implemented in SEM. "],["at-home-exercises-5.html", "6.3 At-Home Exercises", " 6.3 At-Home Exercises This week, we’ll take another look at the Kestilä (2006) results. During this practical, you will conduct an SEM to replicate the regression analysis of the Finnish data that you conducted in the Week 4 In-Class Exercises. 6.3.1 Load the Finnish subsample of ESS data. The relevant data are contained in the ess_finland.rds file. These are the processed Finnish subsample data from the Week 4 exercises. Note: Unless otherwise noted, all the following analyses use these data. Click to show code ess_fin <- readRDS("ess_finland.rds") We need to do a little data processing before we can fit the regression model. At the moment, lavaan will not automatically convert a factor variable into dummy codes. 
So, we need to create explicit dummy codes for the two factors we’ll use as predictors in our regression analysis: sex and political interest. 6.3.2 Convert the sex and political interest factors into dummy codes. Click to show code library(dplyr) ## Create dummy codes by broadcasting a logical test on the factor levels: ess_fin <- mutate(ess_fin, female = ifelse(sex == "Female", 1, 0), hi_pol_interest = ifelse(polintr_bin == "High Interest", 1, 0) ) ## Check the results: with(ess_fin, table(dummy = female, factor = sex)) ## factor ## dummy Male Female ## 0 960 0 ## 1 0 1040 with(ess_fin, table(dummy = hi_pol_interest, factor = polintr_bin)) ## factor ## dummy Low Interest High Interest ## 0 1070 0 ## 1 0 929 Click for explanation In R, we have several ways of converting a factor into an appropriate set of dummy codes. We could use the dplyr::recode() function as we did last week. We can use the model.matrix() function to define a design matrix based on the inherent contrast attribute of the factor. Missing data will cause problems here, because model.matrix() drops incomplete rows by default, so the result no longer lines up row-for-row with the data. We can use as.numeric() to revert the factor to its underlying numeric representation {Male = 1, Female = 2} and use arithmetic to convert {1, 2} \\(\\rightarrow\\) {0, 1}. When our factor only has two levels, though, the ifelse() function is the simplest way. We are now ready to estimate our latent regression model. Specifically, we want to combine the three OLS regression models that you ran in 4.4.16 into a single SEM that we will estimate in lavaan. The following path diagram shows the intended theoretical model. Although the variances are not included in this path diagram, all variables in the model (including the observed predictor variables) are random. 6.3.3 Define the lavaan model syntax for the SEM shown above. Use the definition of the institutions, satisfaction, and politicians factors from 5.3.2 to define the DVs. Covary the three latent factors. Covary the five predictors. 
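The dummy-coding options listed in the 6.3.2 explanation can be compared on a small toy factor. This is a sketch with made-up data (sex_toy is a hypothetical vector, not a column of ess_fin): ifelse() and the as.numeric() arithmetic agree and both keep the missing value as NA, whereas model.matrix() silently drops the incomplete row under R's default na.action (na.omit), so its output no longer lines up with the original rows.

```r
## Toy two-level factor with one missing value (hypothetical data)
sex_toy <- factor(c("Male", "Female", NA, "Female", "Male"),
                  levels = c("Male", "Female"))

## Option 1: logical test via ifelse(); NA stays NA
d_ifelse <- ifelse(sex_toy == "Female", 1, 0)

## Option 2: underlying integer codes {Male = 1, Female = 2} shifted to {0, 1}
d_numeric <- as.numeric(sex_toy) - 1

## Option 3: design matrix built from the factor's treatment contrasts;
## the row with the missing value is dropped
X <- model.matrix(~ sex_toy)

d_ifelse             # 0 1 NA 1 0
d_numeric            # 0 1 NA 1 0
nrow(X)              # 4 rows, not 5
X[, "sex_toyFemale"]
```

With complete data and only two levels, all three approaches yield the same 0/1 codes; the difference only bites when some values are missing.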
Click to show code mod_sem <- ' ## Define the latent DVs: institutions =~ trstlgl + trstplc + trstun + trstep + trstprl satisfaction =~ stfhlth + stfedu + stfeco + stfgov + stfdem politicians =~ pltinvt + pltcare + trstplt ## Specify the structural relations: institutions + satisfaction + politicians ~ female + age + eduyrs + hi_pol_interest + lrscale ' Click for explanation We simply need to add a line defining the latent regression paths to our old CFA syntax. We don’t need to specify any covariances in the syntax: sem() covaries the residuals of the three latent DVs by default, and the fixed.x = FALSE argument in the sem() call below makes lavaan estimate the covariances among the five predictors. 6.3.4 Estimate the SEM, and summarize the results. Fit the model to the processed Finnish subsample from above. Estimate the model using lavaan::sem(). Request the standardized parameter estimates with the summary. Request the \\(R^2\\) estimates with the summary. Click to show code library(lavaan) ## Fit the SEM: fit_sem <- sem(mod_sem, data = ess_fin, fixed.x = FALSE) ## Summarize the results: summary(fit_sem, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 82 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 59 ## ## Used Total ## Number of observations 1740 2000 ## ## Model Test User Model: ## ## Test statistic 1287.421 ## Degrees of freedom 112 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 10534.649 ## Degrees of freedom 143 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.887 ## Tucker-Lewis Index (TLI) 0.856 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -57914.779 ## Loglikelihood unrestricted model (H1) -57271.068 ## ## Akaike (AIC) 115947.557 ## Bayesian (BIC) 116269.794 ## Sample-size adjusted Bayesian (SABIC) 116082.357 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.078 ## 90 Percent confidence interval - lower 0.074 ## 90 Percent confidence interval - upper 
0.082 ## P-value H_0: RMSEA <= 0.050 0.000 ## P-value H_0: RMSEA >= 0.080 0.160 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.045 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions =~ ## trstlgl 1.000 1.418 0.669 ## trstplc 0.609 0.031 19.403 0.000 0.863 0.508 ## trstun 0.887 0.038 23.484 0.000 1.257 0.626 ## trstep 1.134 0.041 27.652 0.000 1.607 0.755 ## trstprl 1.192 0.040 29.444 0.000 1.689 0.815 ## satisfaction =~ ## stfhlth 1.000 0.979 0.497 ## stfedu 0.602 0.043 13.872 0.000 0.589 0.416 ## stfeco 1.266 0.067 18.848 0.000 1.240 0.681 ## stfgov 1.639 0.079 20.638 0.000 1.605 0.846 ## stfdem 1.521 0.075 20.180 0.000 1.489 0.793 ## politicians =~ ## pltinvt 1.000 0.567 0.566 ## pltcare 0.953 0.048 19.653 0.000 0.540 0.590 ## trstplt 3.281 0.133 24.675 0.000 1.860 0.915 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions ~ ## female 0.019 0.073 0.259 0.796 0.013 0.007 ## age -0.008 0.002 -3.740 0.000 -0.006 -0.105 ## eduyrs 0.034 0.010 3.233 0.001 0.024 0.091 ## hi_pol_interst 0.358 0.076 4.730 0.000 0.253 0.126 ## lrscale 0.104 0.018 5.634 0.000 0.073 0.147 ## satisfaction ~ ## female -0.147 0.050 -2.910 0.004 -0.150 -0.075 ## age -0.007 0.002 -4.598 0.000 -0.007 -0.129 ## eduyrs 0.005 0.007 0.775 0.439 0.006 0.022 ## hi_pol_interst 0.164 0.052 3.162 0.002 0.167 0.084 ## lrscale 0.099 0.013 7.501 0.000 0.101 0.202 ## politicians ~ ## female 0.010 0.029 0.349 0.727 0.018 0.009 ## age -0.004 0.001 -4.490 0.000 -0.007 -0.124 ## eduyrs 0.007 0.004 1.697 0.090 0.012 0.047 ## hi_pol_interst 0.258 0.031 8.364 0.000 0.455 0.227 ## lrscale 0.039 0.007 5.370 0.000 0.068 0.138 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .institutions ~~ ## .satisfaction 1.030 0.069 14.933 0.000 0.796 0.796 ## .politicians 0.675 0.041 16.628 0.000 
0.908 0.908 ## .satisfaction ~~ ## .politicians 0.365 0.027 13.544 0.000 0.713 0.713 ## female ~~ ## age 0.071 0.212 0.335 0.738 0.071 0.008 ## eduyrs 0.179 0.046 3.869 0.000 0.179 0.093 ## hi_pol_interst -0.017 0.006 -2.767 0.006 -0.017 -0.066 ## lrscale -0.032 0.024 -1.316 0.188 -0.032 -0.032 ## age ~~ ## eduyrs -22.750 1.722 -13.212 0.000 -22.750 -0.334 ## hi_pol_interst 1.377 0.215 6.413 0.000 1.377 0.156 ## lrscale 1.774 0.853 2.079 0.038 1.774 0.050 ## eduyrs ~~ ## hi_pol_interst 0.270 0.047 5.787 0.000 0.270 0.140 ## lrscale 0.735 0.186 3.946 0.000 0.735 0.095 ## hi_pol_interest ~~ ## lrscale 0.016 0.024 0.672 0.501 0.016 0.016 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trstlgl 2.477 0.093 26.743 0.000 2.477 0.552 ## .trstplc 2.140 0.076 28.334 0.000 2.140 0.742 ## .trstun 2.453 0.090 27.322 0.000 2.453 0.608 ## .trstep 1.950 0.078 24.906 0.000 1.950 0.430 ## .trstprl 1.443 0.064 22.437 0.000 1.443 0.336 ## .stfhlth 2.922 0.104 28.103 0.000 2.922 0.753 ## .stfedu 1.663 0.058 28.613 0.000 1.663 0.827 ## .stfeco 1.775 0.069 25.755 0.000 1.775 0.536 ## .stfgov 1.020 0.056 18.371 0.000 1.020 0.284 ## .stfdem 1.307 0.060 21.953 0.000 1.307 0.371 ## .pltinvt 0.682 0.024 27.818 0.000 0.682 0.680 ## .pltcare 0.547 0.020 27.582 0.000 0.547 0.652 ## .trstplt 0.672 0.069 9.676 0.000 0.672 0.163 ## .institutions 1.881 0.125 15.077 0.000 0.936 0.936 ## .satisfaction 0.892 0.086 10.386 0.000 0.930 0.930 ## .politicians 0.294 0.024 12.224 0.000 0.914 0.914 ## female 0.250 0.008 29.496 0.000 0.250 1.000 ## age 313.238 10.620 29.496 0.000 313.238 1.000 ## eduyrs 14.818 0.502 29.496 0.000 14.818 1.000 ## hi_pol_interst 0.250 0.008 29.496 0.000 0.250 1.000 ## lrscale 4.034 0.137 29.496 0.000 4.034 1.000 ## ## R-Square: ## Estimate ## trstlgl 0.448 ## trstplc 0.258 ## trstun 0.392 ## trstep 0.570 ## trstprl 0.664 ## stfhlth 0.247 ## stfedu 0.173 ## stfeco 0.464 ## stfgov 0.716 ## stfdem 0.629 ## pltinvt 0.320 ## pltcare 0.348 ## trstplt 0.837 ## 
institutions 0.064 ## satisfaction 0.070 ## politicians 0.086 Click for explanation The fixed.x = FALSE argument tells lavaan to model the predictors as random variables. By default, lavaan will covary any random predictor variables. So, we don’t need to make any other changes to the usual procedure. 6.3.5 Finally, we will rerun the latent regression model from above as a path model with the factor scores from 4.4.10 acting as the DVs. Rerun the above SEM as a path model wherein the EFA-derived Trust in Institutions, Satisfaction with Political Systems, and Trust in Politicians factor scores act as the DVs. Request the standardized parameter estimates with the summary. Request the \\(R^2\\) estimates with the summary. Click to show code ## Define the model syntax for the path analysis: mod_pa <- ' trust_inst + satisfy + trust_pol ~ female + age + eduyrs + hi_pol_interest + lrscale' ## Estimate the path model: fit_pa <- sem(mod_pa, data = ess_fin, fixed.x = FALSE) ## Summarize the results: summary(fit_pa, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 44 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 36 ## ## Used Total ## Number of observations 1740 2000 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## trust_inst ~ ## female 0.004 0.045 0.091 0.928 0.004 0.002 ## age -0.003 0.001 -2.229 0.026 -0.003 -0.057 ## eduyrs 0.023 0.006 3.642 0.000 0.023 0.094 ## hi_pol_interst 0.167 0.046 3.599 0.000 0.167 0.088 ## lrscale 0.059 0.011 5.258 0.000 0.059 0.125 ## satisfy ~ ## female -0.125 0.040 -3.115 0.002 -0.125 -0.073 ## age -0.005 0.001 -4.102 0.000 -0.005 -0.105 ## eduyrs -0.003 0.006 -0.534 0.594 -0.003 -0.014 ## hi_pol_interst 0.073 0.041 1.782 0.075 0.073 0.043 ## 
lrscale 0.085 0.010 8.510 0.000 0.085 0.200 ## trust_pol ~ ## female 0.016 0.046 0.338 0.735 0.016 0.008 ## age -0.009 0.001 -6.480 0.000 -0.009 -0.161 ## eduyrs 0.018 0.007 2.839 0.005 0.018 0.071 ## hi_pol_interst 0.464 0.047 9.801 0.000 0.464 0.232 ## lrscale 0.055 0.011 4.801 0.000 0.055 0.110 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trust_inst ~~ ## .satisfy 0.437 0.021 20.609 0.000 0.437 0.568 ## .trust_pol 0.498 0.024 20.480 0.000 0.498 0.564 ## .satisfy ~~ ## .trust_pol 0.367 0.021 17.664 0.000 0.367 0.467 ## female ~~ ## age 0.071 0.212 0.335 0.738 0.071 0.008 ## eduyrs 0.179 0.046 3.869 0.000 0.179 0.093 ## hi_pol_interst -0.017 0.006 -2.767 0.006 -0.017 -0.066 ## lrscale -0.032 0.024 -1.316 0.188 -0.032 -0.032 ## age ~~ ## eduyrs -22.750 1.722 -13.212 0.000 -22.750 -0.334 ## hi_pol_interst 1.377 0.215 6.413 0.000 1.377 0.156 ## lrscale 1.774 0.853 2.079 0.038 1.774 0.050 ## eduyrs ~~ ## hi_pol_interst 0.270 0.047 5.787 0.000 0.270 0.140 ## lrscale 0.735 0.186 3.946 0.000 0.735 0.095 ## hi_pol_interest ~~ ## lrscale 0.016 0.024 0.672 0.501 0.016 0.016 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .trust_inst 0.866 0.029 29.496 0.000 0.866 0.958 ## .satisfy 0.684 0.023 29.496 0.000 0.684 0.945 ## .trust_pol 0.902 0.031 29.496 0.000 0.902 0.902 ## female 0.250 0.008 29.496 0.000 0.250 1.000 ## age 313.238 10.620 29.496 0.000 313.238 1.000 ## eduyrs 14.818 0.502 29.496 0.000 14.818 1.000 ## hi_pol_interst 0.250 0.008 29.496 0.000 0.250 1.000 ## lrscale 4.034 0.137 29.496 0.000 4.034 1.000 ## ## R-Square: ## Estimate ## trust_inst 0.042 ## satisfy 0.055 ## trust_pol 0.098 Click to show explanation We don’t do anything particularly special here. We simply rerun our latent regression as a path analysis with the EFA-derived factor scores as the DVs. 6.3.6 Compare the results from the path analysis to the SEM-based results. Does it matter whether we use a latent variable or a factor score to define the DV? 
Hint: When comparing parameter estimates, use the fully standardized estimates (i.e., the values in the column labeled Std.all). Click to show code Note: The “supportFunctions.R” script that we source below isn’t a necessary part of the solution. This script defines several convenience functions. One of these functions, partSummary(), allows us to print selected pieces of the model summary. ## Source a script of convenience function definitions: source("supportFunctions.R") ## View the regression estimates from the SEM: partSummary(fit_sem, 8, standardized = TRUE) ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## institutions ~ ## female 0.019 0.073 0.259 0.796 0.013 0.007 ## age -0.008 0.002 -3.740 0.000 -0.006 -0.105 ## eduyrs 0.034 0.010 3.233 0.001 0.024 0.091 ## hi_pol_interst 0.358 0.076 4.730 0.000 0.253 0.126 ## lrscale 0.104 0.018 5.634 0.000 0.073 0.147 ## satisfaction ~ ## female -0.147 0.050 -2.910 0.004 -0.150 -0.075 ## age -0.007 0.002 -4.598 0.000 -0.007 -0.129 ## eduyrs 0.005 0.007 0.775 0.439 0.006 0.022 ## hi_pol_interst 0.164 0.052 3.162 0.002 0.167 0.084 ## lrscale 0.099 0.013 7.501 0.000 0.101 0.202 ## politicians ~ ## female 0.010 0.029 0.349 0.727 0.018 0.009 ## age -0.004 0.001 -4.490 0.000 -0.007 -0.124 ## eduyrs 0.007 0.004 1.697 0.090 0.012 0.047 ## hi_pol_interst 0.258 0.031 8.364 0.000 0.455 0.227 ## lrscale 0.039 0.007 5.370 0.000 0.068 0.138 ## View the regression estimates from the path analysis: partSummary(fit_pa, 7, standardized = TRUE) ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## trust_inst ~ ## female 0.004 0.045 0.091 0.928 0.004 0.002 ## age -0.003 0.001 -2.229 0.026 -0.003 -0.057 ## eduyrs 0.023 0.006 3.642 0.000 0.023 0.094 ## hi_pol_interst 0.167 0.046 3.599 0.000 0.167 0.088 ## lrscale 0.059 0.011 5.258 0.000 0.059 0.125 ## satisfy ~ ## female -0.125 0.040 -3.115 0.002 -0.125 -0.073 ## age -0.005 0.001 -4.102 0.000 -0.005 -0.105 ## eduyrs -0.003 0.006 -0.534 0.594 -0.003 
-0.014 ## hi_pol_interst 0.073 0.041 1.782 0.075 0.073 0.043 ## lrscale 0.085 0.010 8.510 0.000 0.085 0.200 ## trust_pol ~ ## female 0.016 0.046 0.338 0.735 0.016 0.008 ## age -0.009 0.001 -6.480 0.000 -0.009 -0.161 ## eduyrs 0.018 0.007 2.839 0.005 0.018 0.071 ## hi_pol_interst 0.464 0.047 9.801 0.000 0.464 0.232 ## lrscale 0.055 0.011 4.801 0.000 0.055 0.110 ## View the R-squared estimates from the SEM: partSummary(fit_sem, 11, rsquare = TRUE) ## R-Square: ## Estimate ## trstlgl 0.448 ## trstplc 0.258 ## trstun 0.392 ## trstep 0.570 ## trstprl 0.664 ## stfhlth 0.247 ## stfedu 0.173 ## stfeco 0.464 ## stfgov 0.716 ## stfdem 0.629 ## pltinvt 0.320 ## pltcare 0.348 ## trstplt 0.837 ## institutions 0.064 ## satisfaction 0.070 ## politicians 0.086 ## View the R-squared estimates from the path analysis: partSummary(fit_pa, 10, rsquare = TRUE) ## R-Square: ## Estimate ## trust_inst 0.042 ## satisfy 0.055 ## trust_pol 0.098 Click for explanation It certainly looks like the way we define the DV has a meaningful impact. The patterns of significance differ between the two sets of regression slopes (e.g., political interest significantly predicts Satisfaction in the SEM, p = .002, but not in the path analysis, p = .075). Moreover, the \\(R^2\\) values are larger for the Institutions and Satisfaction factors in the SEM, whereas the \\(R^2\\) for the Politicians factor is larger in the path analysis. End of At-Home Exercises "],["in-class-exercises-5.html", "6.4 In-Class Exercises", " 6.4 In-Class Exercises In these exercises, you will use full structural equation modeling (SEM) to evaluate the Theory of Reasoned Action (TORA), which is a popular psychological theory of social behavior developed by Ajzen and Fishbein. The theory states that actual behavior is predicted by behavioral intention, which is in turn predicted by the attitude toward the behavior and subjective norms about the behavior. Later, a third determinant was added: perceived behavioral control, that is, the extent to which people feel that they have control over their behavior. 
The data we will use for this practical are available in the toradata.csv file. These data were synthesized from the results of Reinecke’s (1998) investigation of condom use by young people between 16 and 24 years old. The data contain the following variables: respnr: Numeric participant ID behavior: The dependent variable condom use Measured on a 5-point frequency scale (How often do you…) intent: A single item assessing behavioral intention Measured on a similar 5-point scale (In general, do you intend to…). attit_1:attit_3: Three indicators of attitudes about condom use Measured on a 5-point rating scale (e.g., using a condom is awkward) norm_1:norm_3: Three indicators of social norms about condom use Measured on a 5-point rating scale (e.g., I think most of my friends would use…) control_1:control_3: Three indicators of perceived behavioral control Measured on a 5-point rating scale (e.g., I know well how to use a condom) sex: Binary factor indicating biological sex 6.4.1 Load the data contained in the toradata.csv file. Click to show code condom <- read.csv("toradata.csv", stringsAsFactors = TRUE) 6.4.2 The data contain multiple indicators of attitudes, norms, and control. Run a CFA for these three latent variables. Correlate the latent factors. Do the data support the measurement model for these latent factors? Are the three latent factors significantly correlated? Is it reasonable to proceed with our evaluation of the TORA? 
Click to show code library(lavaan) mod_cfa <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 ' fit <- cfa(mod_cfa, data = condom) summary(fit, fit.measures = TRUE) ## lavaan 0.6.16 ended normally after 29 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 21 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 35.611 ## Degrees of freedom 24 ## P-value (Chi-square) 0.060 ## ## Model Test Baseline Model: ## ## Test statistic 910.621 ## Degrees of freedom 36 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.987 ## Tucker-Lewis Index (TLI) 0.980 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -2998.290 ## Loglikelihood unrestricted model (H1) -2980.484 ## ## Akaike (AIC) 6038.580 ## Bayesian (BIC) 6112.530 ## Sample-size adjusted Bayesian (SABIC) 6045.959 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.044 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.073 ## P-value H_0: RMSEA <= 0.050 0.599 ## P-value H_0: RMSEA >= 0.080 0.017 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.037 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.036 0.068 15.308 0.000 ## attit_3 -1.002 0.067 -14.856 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 1.031 0.098 10.574 0.000 ## norm_3 0.932 0.093 10.013 0.000 ## control =~ ## control_1 1.000 ## control_2 0.862 0.129 6.699 0.000 ## control_3 0.968 0.133 7.290 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.340 0.069 4.957 0.000 ## control 0.475 0.073 6.468 0.000 ## norms ~~ ## control 0.338 0.064 5.254 0.000 ## ## Variances: ## Estimate 
Std.Err z-value P(>|z|) ## .attit_1 0.418 0.052 8.047 0.000 ## .attit_2 0.310 0.047 6.633 0.000 ## .attit_3 0.369 0.049 7.577 0.000 ## .norm_1 0.504 0.071 7.130 0.000 ## .norm_2 0.469 0.071 6.591 0.000 ## .norm_3 0.635 0.075 8.465 0.000 ## .control_1 0.614 0.078 7.905 0.000 ## .control_2 0.865 0.091 9.520 0.000 ## .control_3 0.762 0.087 8.758 0.000 ## attitudes 0.885 0.116 7.620 0.000 ## norms 0.743 0.116 6.423 0.000 ## control 0.497 0.099 5.002 0.000 Click for explanation Yes, the model fits the data well, and the measurement parameters (e.g., factor loadings, residual variances) look reasonable. So, the data seem to support this measurement structure. Yes, all three latent variables are significantly, positively correlated. Yes. The measurement structure is supported, so we can use the latent variables to represent the respective constructs in our subsequent SEM. The TORA doesn’t actually say anything about the associations between these three factors, but it makes sense that they would be positively associated. So, we should find this result comforting. 6.4.3 Estimate the basic TORA model as an SEM. Predict intention from attitudes and norms. Predict condom use from intention. Use the latent versions of attitudes and norms. Covary the attitudes and norms factors. Does the model fit well? Do the estimates align with the TORA? How much variance in intention and condom use are explained by the model? 
Click to show code mod <- ' ## Define the latent variables: attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 ## Define the structural model: intent ~ attitudes + norms behavior ~ intent ' fit <- sem(mod, data = condom) summary(fit, fit.measures = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 24 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 18 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 27.890 ## Degrees of freedom 18 ## P-value (Chi-square) 0.064 ## ## Model Test Baseline Model: ## ## Test statistic 1089.407 ## Degrees of freedom 28 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.991 ## Tucker-Lewis Index (TLI) 0.986 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -2533.616 ## Loglikelihood unrestricted model (H1) -2519.671 ## ## Akaike (AIC) 5103.232 ## Bayesian (BIC) 5166.618 ## Sample-size adjusted Bayesian (SABIC) 5109.557 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.047 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.079 ## P-value H_0: RMSEA <= 0.050 0.523 ## P-value H_0: RMSEA >= 0.080 0.046 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.036 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.039 0.068 15.365 0.000 ## attit_3 -1.002 0.067 -14.850 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 0.983 0.087 11.333 0.000 ## norm_3 0.935 0.087 10.778 0.000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## intent ~ ## attitudes 0.439 0.063 6.990 0.000 ## norms 0.693 0.077 8.977 0.000 ## behavior ~ ## intent 0.746 0.045 16.443 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 
0.347 0.069 5.027 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.420 0.052 8.103 0.000 ## .attit_2 0.306 0.046 6.604 0.000 ## .attit_3 0.372 0.049 7.651 0.000 ## .norm_1 0.483 0.064 7.581 0.000 ## .norm_2 0.521 0.065 7.954 0.000 ## .norm_3 0.610 0.070 8.713 0.000 ## .intent 0.423 0.048 8.769 0.000 ## .behavior 0.603 0.054 11.180 0.000 ## attitudes 0.884 0.116 7.614 0.000 ## norms 0.765 0.113 6.767 0.000 ## ## R-Square: ## Estimate ## attit_1 0.678 ## attit_2 0.757 ## attit_3 0.705 ## norm_1 0.613 ## norm_2 0.587 ## norm_3 0.523 ## intent 0.639 ## behavior 0.520 Click for explanation Yes, the model still fits the data very well. Yes, the estimates all align with the TORA. Specifically, attitudes and norms both significantly predict intention, and intention significantly predicts condom use. The model explains 63.93% of the variance in intention and 51.96% of the variance in condom use. 6.4.4 Update your model to represent the extended TORA model that includes perceived behavioral control. Regress condom use onto perceived behavioral control. Use the latent variable representation of control. Covary all three exogenous latent factors. Does the model fit well? Do the estimates align with the updated TORA? How much variance in intention and condom use are explained by the model? 
Click to show code mod_tora <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ attitudes + norms behavior ~ intent + control ' fit_tora <- sem(mod_tora, data = condom) summary(fit_tora, fit.measures = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 31 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 27 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 48.757 ## Degrees of freedom 39 ## P-value (Chi-square) 0.136 ## ## Model Test Baseline Model: ## ## Test statistic 1333.695 ## Degrees of freedom 55 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.992 ## Tucker-Lewis Index (TLI) 0.989 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -3551.160 ## Loglikelihood unrestricted model (H1) -3526.782 ## ## Akaike (AIC) 7156.320 ## Bayesian (BIC) 7251.400 ## Sample-size adjusted Bayesian (SABIC) 7165.807 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.032 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.057 ## P-value H_0: RMSEA <= 0.050 0.870 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.033 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.033 0.068 15.221 0.000 ## attit_3 -1.025 0.068 -15.097 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 0.984 0.087 11.256 0.000 ## norm_3 0.955 0.088 10.881 0.000 ## control =~ ## control_1 1.000 ## control_2 0.859 0.127 6.789 0.000 ## control_3 0.997 0.131 7.609 0.000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## intent ~ ## attitudes 0.447 0.063 7.100 0.000 ## norms 0.706 0.078 9.078 0.000 ## behavior 
~ ## intent 0.563 0.063 8.923 0.000 ## control 0.454 0.119 3.805 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.342 0.068 5.011 0.000 ## control 0.474 0.072 6.548 0.000 ## norms ~~ ## control 0.352 0.064 5.521 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.432 0.052 8.381 0.000 ## .attit_2 0.330 0.046 7.220 0.000 ## .attit_3 0.344 0.046 7.439 0.000 ## .norm_1 0.496 0.063 7.820 0.000 ## .norm_2 0.533 0.065 8.152 0.000 ## .norm_3 0.595 0.069 8.643 0.000 ## .control_1 0.625 0.075 8.372 0.000 ## .control_2 0.876 0.090 9.757 0.000 ## .control_3 0.746 0.084 8.874 0.000 ## .intent 0.409 0.047 8.769 0.000 ## .behavior 0.542 0.052 10.423 0.000 ## attitudes 0.872 0.115 7.566 0.000 ## norms 0.751 0.112 6.709 0.000 ## control 0.485 0.096 5.059 0.000 ## ## R-Square: ## Estimate ## attit_1 0.668 ## attit_2 0.738 ## attit_3 0.727 ## norm_1 0.602 ## norm_2 0.577 ## norm_3 0.535 ## control_1 0.437 ## control_2 0.290 ## control_3 0.392 ## intent 0.651 ## behavior 0.566 Click for explanation Yes, the model still fits the data very well. Yes, the estimates all align with the updated TORA. Specifically, attitudes and norms both significantly predict intention, while intention and control both significantly predict condom use. The model explains 65.11% of the variance in intention and 56.62% of the variance in condom use. The TORA model explicitly forbids direct paths from attitudes and norms to behaviors; these effects should be fully mediated by the behavioral intention. The theory does not specify how perceived behavioral control should affect behaviors. There may be a direct effect of control on behavior, or the effect may be (partially) mediated by intention. 6.4.5 Evaluate the hypothesized indirect effects of attitudes and norms. Include attitudes, norms, and control in your model as in 6.4.4. Does intention significantly mediate the effects of attitudes and norms on behavior? 
Don’t forget to follow all the steps we covered for testing mediation. Are both of the above effects completely mediated? Do these results comport with the TORA? Why or why not? Click for explanation mod <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ a1 * attitudes + a2 * norms behavior ~ b * intent + control + attitudes + norms ie_att := a1 * b ie_norm := a2 * b ' set.seed(235711) fit <- sem(mod, data = condom, se = "bootstrap", bootstrap = 1000) summary(fit, ci = TRUE) ## lavaan 0.6.16 ended normally after 36 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 29 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 48.629 ## Degrees of freedom 37 ## P-value (Chi-square) 0.096 ## ## Parameter Estimates: ## ## Standard errors Bootstrap ## Number of requested bootstrap draws 1000 ## Number of successful bootstrap draws 1000 ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## attitudes =~ ## attit_1 1.000 1.000 1.000 ## attit_2 1.033 0.060 17.261 0.000 0.925 1.165 ## attit_3 -1.025 0.064 -15.894 0.000 -1.163 -0.902 ## norms =~ ## norm_1 1.000 1.000 1.000 ## norm_2 0.984 0.071 13.794 0.000 0.843 1.127 ## norm_3 0.955 0.093 10.324 0.000 0.792 1.157 ## control =~ ## control_1 1.000 1.000 1.000 ## control_2 0.860 0.113 7.624 0.000 0.653 1.098 ## control_3 0.996 0.147 6.790 0.000 0.748 1.320 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## intent ~ ## attitudes (a1) 0.447 0.067 6.674 0.000 0.324 0.585 ## norms (a2) 0.706 0.078 9.094 0.000 0.569 0.878 ## behavior ~ ## intent (b) 0.545 0.075 7.282 0.000 0.389 0.686 ## control 0.428 0.232 1.847 0.065 0.046 0.934 ## attitudes 0.010 0.122 0.084 0.933 -0.249 0.226 ## norms 0.041 0.118 0.345 0.730 -0.194 0.266 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## attitudes ~~ ## norms 0.342 0.070 
4.883 0.000 0.208 0.480 ## control 0.475 0.069 6.850 0.000 0.344 0.612 ## norms ~~ ## control 0.350 0.067 5.218 0.000 0.221 0.484 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## .attit_1 0.432 0.050 8.720 0.000 0.331 0.526 ## .attit_2 0.330 0.045 7.382 0.000 0.238 0.415 ## .attit_3 0.343 0.049 6.992 0.000 0.244 0.444 ## .norm_1 0.496 0.060 8.305 0.000 0.376 0.614 ## .norm_2 0.533 0.077 6.951 0.000 0.390 0.687 ## .norm_3 0.594 0.069 8.597 0.000 0.443 0.719 ## .control_1 0.624 0.076 8.216 0.000 0.477 0.763 ## .control_2 0.875 0.092 9.495 0.000 0.686 1.052 ## .control_3 0.745 0.079 9.398 0.000 0.574 0.889 ## .intent 0.409 0.050 8.169 0.000 0.309 0.507 ## .behavior 0.544 0.058 9.379 0.000 0.415 0.639 ## attitudes 0.872 0.104 8.387 0.000 0.675 1.077 ## norms 0.751 0.099 7.557 0.000 0.556 0.941 ## control 0.486 0.096 5.042 0.000 0.303 0.684 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## ie_att 0.244 0.050 4.860 0.000 0.150 0.352 ## ie_norm 0.385 0.066 5.835 0.000 0.268 0.527 Yes, both indirect effects are significant according to the 95% bootstrapped CIs (neither interval includes zero). Yes, both effects are completely mediated by behavioral intention. We can infer as much because the direct effects of attitudes and norms on condom use are both nonsignificant. Yes, these results comport with the TORA. Both effects are fully mediated, as the theory stipulates. In addition to evaluating the significance of the indirect and direct effects, we can also take a model-comparison perspective. We can use model comparisons to test if removing the direct effects of attitudes and norms on condom use significantly decreases model fit. In other words, are those paths needed to accurately represent the data, or are they “dead weight”? 6.4.6 Use a \\(\\Delta \\chi^2\\) test to evaluate the necessity of including the direct effects of attitudes and norms on condom use in the model. What is your conclusion? 
Click for explanation We only need to compare the fit of the model with the direct effects included to the fit of the model without the direct effects. We’ve already estimated both models, so we can simply submit the fitted lavaan objects to the anova() function. anova(fit, fit_tora) The \\(\\Delta \\chi^2\\) test is not significant. So, we have not lost a significant amount of fit by fixing the direct effects to zero. In other words, the complete mediation model explains the data just as well as the partial mediation model. So, we should probably prefer the more parsimonious model. 6.4.7 Use some statistical means of evaluating the most plausible way to include perceived behavioral control into the model. Choose between the following three options: control predicts behavior via a direct, un-mediated effect. control predicts behavior via an indirect effect that is completely mediated by intention. control predicts behavior via both an indirect effect through intention and a residual direct effect. Hint: There is more than one way to approach this problem. Approach 1: Testing Effects Click to show code One way to tackle this problem is to test the indirect, direct, and total effects. 
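The decomposition behind this approach is plain arithmetic on the path coefficients: the indirect effect is the product of the a path (X to mediator) and the b path (mediator to Y), and the total effect is that product plus the direct (c) path. The minimal R sketch below makes this explicit. The helper `decompose_effects()` is hypothetical (it is not part of lavaan), and we feed it the attitudes estimates printed in 6.4.5 as a check.

```r
## Hypothetical helper (not part of lavaan): decompose the effect of a
## predictor X on an outcome Y that runs through one mediator M.
##   a      = X -> M path
##   b      = M -> Y path
##   direct = X -> Y path (the "c" path)
decompose_effects <- function(a, b, direct = 0) {
  indirect <- a * b
  c(indirect = indirect, direct = direct, total = indirect + direct)
}

## Point estimates from the 6.4.5 output for attitudes:
## a1 = 0.447 (attitudes -> intent), b = 0.545 (intent -> behavior),
## direct effect of attitudes on behavior = 0.010
eff_att <- decompose_effects(a = 0.447, b = 0.545, direct = 0.010)
round(eff_att, 3)
```

The indirect component (0.447 × 0.545 ≈ 0.244) matches the ie_att estimate reported in 6.4.5. This arithmetic yields only point estimates. Whether an indirect or total effect is significant still has to be judged from its bootstrapped CI, which is why the lavaan models define these quantities with `:=` and request `se = "bootstrap"`.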
## Allow for partial mediation: mod1 <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ attitudes + norms + a * control behavior ~ b * intent + c * control ie := a * b total := ie + c ' set.seed(235711) fit1 <- sem(mod1, data = condom, se = "bootstrap", bootstrap = 1000) summary(fit1, ci = TRUE) ## lavaan 0.6.16 ended normally after 33 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 28 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 47.389 ## Degrees of freedom 38 ## P-value (Chi-square) 0.141 ## ## Parameter Estimates: ## ## Standard errors Bootstrap ## Number of requested bootstrap draws 1000 ## Number of successful bootstrap draws 1000 ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## attitudes =~ ## attit_1 1.000 1.000 1.000 ## attit_2 1.034 0.060 17.222 0.000 0.925 1.167 ## attit_3 -1.021 0.064 -15.877 0.000 -1.158 -0.898 ## norms =~ ## norm_1 1.000 1.000 1.000 ## norm_2 0.985 0.071 13.803 0.000 0.848 1.133 ## norm_3 0.948 0.093 10.204 0.000 0.786 1.155 ## control =~ ## control_1 1.000 1.000 1.000 ## control_2 0.861 0.113 7.635 0.000 0.653 1.100 ## control_3 0.996 0.142 7.020 0.000 0.760 1.318 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## intent ~ ## attitudes 0.357 0.115 3.113 0.002 0.146 0.603 ## norms 0.646 0.095 6.794 0.000 0.473 0.859 ## control (a) 0.199 0.199 1.002 0.317 -0.188 0.633 ## behavior ~ ## intent (b) 0.551 0.074 7.487 0.000 0.391 0.683 ## control (c) 0.469 0.142 3.298 0.001 0.231 0.791 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## attitudes ~~ ## norms 0.344 0.070 4.905 0.000 0.210 0.481 ## control 0.471 0.069 6.838 0.000 0.342 0.608 ## norms ~~ ## control 0.345 0.066 5.240 0.000 0.215 0.481 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## .attit_1 0.429 0.050 8.628 0.000 
0.329 0.524 ## .attit_2 0.325 0.045 7.230 0.000 0.233 0.408 ## .attit_3 0.347 0.049 7.011 0.000 0.248 0.455 ## .norm_1 0.490 0.060 8.172 0.000 0.373 0.612 ## .norm_2 0.525 0.076 6.869 0.000 0.385 0.684 ## .norm_3 0.599 0.070 8.529 0.000 0.447 0.729 ## .control_1 0.626 0.074 8.429 0.000 0.479 0.761 ## .control_2 0.875 0.092 9.522 0.000 0.689 1.049 ## .control_3 0.748 0.078 9.532 0.000 0.579 0.893 ## .intent 0.412 0.050 8.283 0.000 0.307 0.504 ## .behavior 0.541 0.055 9.873 0.000 0.423 0.639 ## attitudes 0.875 0.104 8.385 0.000 0.676 1.081 ## norms 0.757 0.099 7.616 0.000 0.560 0.949 ## control 0.484 0.095 5.092 0.000 0.306 0.683 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## ie 0.110 0.105 1.048 0.295 -0.105 0.309 ## total 0.578 0.186 3.108 0.002 0.235 0.971 Click for explanation From the above results, we can see that the direct and total effects are both significant, but the indirect effect is not. Hence, it probably makes the most sense to include control via a direct (non-mediated) effect on behavior. Approach 2.1: Nested Model Comparison Click to show code We can also approach this problem from a model-comparison perspective. We can fit models that encode each pattern of constraints and check which one best represents the data. 
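For ordinary ML fits like these, the Δχ² test that anova() reports is a likelihood-ratio test. We subtract the χ² of the less restricted model from the χ² of the more restricted model, and we refer the difference to a χ² distribution whose df equals the difference in model degrees of freedom. The base R sketch below applies that arithmetic to test statistics printed in this section, so you can see what anova() computes. The helper `delta_chisq()` is hypothetical. It covers only the plain (unscaled) statistic; lavaan's anova() also handles scaled/robust test statistics, which simple subtraction does not. Because the inputs are rounded to three decimals, the p-values agree with anova() only up to rounding.

```r
## Hypothetical helper: chi-square difference (likelihood-ratio) test for two
## nested models fitted with ordinary ML (no robust/scaled test statistics).
delta_chisq <- function(chisq_restricted, df_restricted, chisq_full, df_full) {
  d_chisq <- chisq_restricted - chisq_full   # restricted model fits worse or equal
  d_df    <- df_restricted - df_full         # number of constraints added
  c(d_chisq = d_chisq, d_df = d_df,
    p = pchisq(d_chisq, df = d_df, lower.tail = FALSE))
}

## 6.4.6: complete mediation (fit_tora: chisq = 48.757, df = 39) vs.
##        partial mediation of attitudes and norms (fit: chisq = 48.629, df = 37)
delta_chisq(48.757, 39, 48.629, 37)

## 6.4.7: no mediation of control (fit3: 48.757, df = 39) vs.
##        partial mediation (fit1: 47.389, df = 38)
delta_chisq(48.757, 39, 47.389, 38)

## 6.4.7: complete mediation of control (fit2: 62.797, df = 39) vs. fit1
delta_chisq(62.797, 39, 47.389, 38)
```

The first two differences are clearly nonsignificant (p ≈ .94 and p ≈ .24), whereas the third is highly significant (p < .001). This matches the conclusions drawn from anova() in the text.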
## Force complete mediation: mod2 <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ attitudes + norms + control behavior ~ intent ' ## Force no mediation: mod3 <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 intent ~ attitudes + norms behavior ~ intent + control ' ## Estimate the two restricted models: fit2 <- sem(mod2, data = condom) fit3 <- sem(mod3, data = condom) ## Check the results: summary(fit2) ## lavaan 0.6.16 ended normally after 33 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 27 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 62.797 ## Degrees of freedom 39 ## P-value (Chi-square) 0.009 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.033 0.068 15.295 0.000 ## attit_3 -1.018 0.068 -15.087 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 0.985 0.087 11.305 0.000 ## norm_3 0.947 0.087 10.845 0.000 ## control =~ ## control_1 1.000 ## control_2 0.864 0.126 6.855 0.000 ## control_3 0.958 0.129 7.417 0.000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## intent ~ ## attitudes 0.352 0.096 3.669 0.000 ## norms 0.644 0.088 7.347 0.000 ## control 0.207 0.163 1.268 0.205 ## behavior ~ ## intent 0.746 0.045 16.443 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.345 0.069 5.023 0.000 ## control 0.476 0.073 6.513 0.000 ## norms ~~ ## control 0.346 0.065 5.361 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.427 0.051 8.295 0.000 ## .attit_2 0.325 0.046 7.101 0.000 ## .attit_3 0.349 0.047 7.477 0.000 ## .norm_1 0.490 0.064 7.702 0.000 ## .norm_2 0.524 0.065 8.025 0.000 ## .norm_3 
0.600 0.069 8.652 0.000 ## .control_1 0.610 0.076 8.015 0.000 ## .control_2 0.861 0.090 9.580 0.000 ## .control_3 0.769 0.086 8.938 0.000 ## .intent 0.412 0.046 8.890 0.000 ## .behavior 0.603 0.054 11.180 0.000 ## attitudes 0.877 0.115 7.596 0.000 ## norms 0.757 0.112 6.733 0.000 ## control 0.500 0.098 5.076 0.000 summary(fit3) ## lavaan 0.6.16 ended normally after 31 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 27 ## ## Number of observations 250 ## ## Model Test User Model: ## ## Test statistic 48.757 ## Degrees of freedom 39 ## P-value (Chi-square) 0.136 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.033 0.068 15.221 0.000 ## attit_3 -1.025 0.068 -15.097 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 0.984 0.087 11.256 0.000 ## norm_3 0.955 0.088 10.881 0.000 ## control =~ ## control_1 1.000 ## control_2 0.859 0.127 6.789 0.000 ## control_3 0.997 0.131 7.609 0.000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## intent ~ ## attitudes 0.447 0.063 7.100 0.000 ## norms 0.706 0.078 9.078 0.000 ## behavior ~ ## intent 0.563 0.063 8.923 0.000 ## control 0.454 0.119 3.805 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.342 0.068 5.011 0.000 ## control 0.474 0.072 6.548 0.000 ## norms ~~ ## control 0.352 0.064 5.521 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.432 0.052 8.381 0.000 ## .attit_2 0.330 0.046 7.220 0.000 ## .attit_3 0.344 0.046 7.439 0.000 ## .norm_1 0.496 0.063 7.820 0.000 ## .norm_2 0.533 0.065 8.152 0.000 ## .norm_3 0.595 0.069 8.643 0.000 ## .control_1 0.625 0.075 8.372 0.000 ## .control_2 0.876 0.090 9.757 0.000 ## .control_3 0.746 0.084 8.874 0.000 ## .intent 0.409 0.047 8.769 0.000 ## .behavior 0.542 0.052 10.423 0.000 ## attitudes 0.872 0.115 7.566 
0.000 ## norms 0.751 0.112 6.709 0.000 ## control 0.485 0.096 5.059 0.000 ## Does either of the restricted models fit worse than the partial mediation model? anova(fit1, fit2) anova(fit1, fit3) Click for explanation The above \\(\\Delta \\chi^2\\) tests tell us that the full mediation model fits significantly worse than the partial mediation model. Hence, forcing full mediation by fixing the direct effect to zero is an unreasonable constraint. The no-mediation model (i.e., the model with only a direct effect of control), on the other hand, does not fit significantly worse than the partial mediation model. So, we can conclude that removing the indirect effect and modeling the influence of control on behavior as an un-mediated direct association represents the data just as well as a model that allows for both indirect and direct effects. Hence, we should prefer the more parsimonious no-mediation model. Approach 2.2: Non-Nested Model Comparison Click to show code We can also use information criteria to compare our models. The two most popular information criteria are Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC). ## Which model is the most parsimonious representation of the data? AIC(fit1, fit2, fit3) BIC(fit1, fit2, fit3) Click for explanation While the effect tests and the nested model comparisons both lead us to prefer the non-mediated model, we cannot directly say that the complete mediation model fits significantly worse than the non-mediated model. We have not directly compared those two models, and we cannot do so with the \\(\\Delta \\chi^2\\) test. We cannot do such a test because these two models are not nested: we must both add and remove a path to get from one model specification to the other. Also, both models have the same degrees of freedom, so we cannot define a sampling distribution against which we would compare the \\(\\Delta \\chi^2\\), anyway. We can use information criteria to get around this problem, though. 
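Both criteria penalize the maximized log-likelihood for the number of free parameters k: AIC = −2 logL + 2k and BIC = −2 logL + k ln(N). The R sketch below recomputes them by hand from quantities printed in this section. It assumes that, with lavaan's standard ML test statistic, χ² = 2(logL_H1 − logL_H0), so each model's log-likelihood can be recovered from its χ². Because all three models are fitted to the same 11 observed variables, they share the H1 log-likelihood of −3526.782 reported for fit_tora. The object names are ours and are chosen only for illustration.

```r
## Log-likelihood of the unrestricted (H1) model and sample size, both taken
## from the fit_tora summary (all three models use the same 11 observed variables).
ll_h1 <- -3526.782
n <- 250

## Chi-square statistics and numbers of free parameters reported above.
models <- data.frame(
  model = c("fit1 (partial mediation)", "fit2 (complete mediation)",
            "fit3 (no mediation)"),
  chisq = c(47.389, 62.797, 48.757),
  k     = c(28, 27, 27)
)

## Assumption: chisq = 2 * (logL_H1 - logL_H0) for the standard ML statistic,
## so each model's log-likelihood can be recovered from its chi-square.
models$loglik <- ll_h1 - models$chisq / 2

## AIC = -2 logL + 2k;  BIC = -2 logL + k * ln(N)
models$AIC <- -2 * models$loglik + 2 * models$k
models$BIC <- -2 * models$loglik + models$k * log(n)

models
models$model[which.min(models$AIC)]
models$model[which.min(models$BIC)]
```

Under this assumption, fit3's values reproduce the AIC (7156.320) and BIC (7251.400) printed for fit_tora, and both criteria are smallest for the no-mediation model. Note that the AIC gap between fit1 and fit3 is only about 0.6 points. The BIC penalizes fit1's extra parameter more heavily and separates the two models by about 4.2 points.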
Information criteria can be used to compare both nested and non-nested models. These criteria are designed to rank models by balancing their fit to the data and their complexity. When comparing models based on information criteria, a lower value indicates a better model in the sense of a better balance of fit and parsimony. The above results show that both the AIC and the BIC agree that the no-mediation model is the best. Conclusion Click for explanation So, in the end, regardless of how we approach the question, all of our results suggest modeling perceived behavioral control as a direct, non-mediated predictor of condom use. End of In-Class Exercises "],["multiple-group-models.html", "7 Multiple Group Models", " 7 Multiple Group Models This week, you will cover multiple group modeling and measurement invariance testing in the SEM/CFA context. Homework before the lecture Watch the Lecture Recording for this week. Homework before the practical Complete the At-Home Exercises. Practical content During the practical you will work on the In-Class Exercises. "],["lecture-6.html", "7.1 Lecture", " 7.1 Lecture In this lecture, we will explore how you can incorporate grouping factors into your CFA and SEM analyses. We’ll cover three general topics: The multiple group modeling framework Measurement invariance testing Using multiple group models to test for moderation 7.1.1 Recordings Multiple Group Models Measurement Invariance Measurement Invariance Examples Moderation by Group 7.1.2 Slides You can download the lecture slides here "],["reading-6.html", "7.2 Reading", " 7.2 Reading There is no official reading this week. Please contemplate the following image instead. "],["at-home-exercises-6.html", "7.3 At-Home Exercises", " 7.3 At-Home Exercises 7.3.1 Multiple-Group Path Analysis To fix ideas, we’ll start these practical exercises by re-running part of the moderation analysis from the Week 3 At-Home Exercises as a multiple group model. 
7.3.1.1 Load the Sesam2.sav data. NOTE: Unless otherwise specified, all analyses in Section 7.3.1 use these data. Click to show code library(haven) # Read the data into an object called 'sesam2': sesam2 <- read_sav("Sesam2.sav") VIEWCAT is a nominal grouping variable, but it is represented as a numeric variable in the sesam2 data. The levels represent the following frequencies of Sesame Street viewership of the children in the data: VIEWCAT = 1: Rarely/Never VIEWCAT = 2: 2–3 times a week VIEWCAT = 3: 4–5 times a week VIEWCAT = 4: > 5 times a week We will use VIEWCAT as the grouping variable in our path model. To do so, we don’t really need to convert VIEWCAT into a factor, but, if we do, lavaan will give our groups meaningful labels in the output. That added clarity can be pretty helpful. 7.3.1.2 Convert VIEWCAT into a factor. Make sure that VIEWCAT = 1 is the reference group. Assign the factor labels denoted above. Click to show code library(dplyr) ## Store the old version for checking: tmp <- sesam2$VIEWCAT ## Convert 'VIEWCAT' to a factor: sesam2 <- mutate(sesam2, VIEWCAT = factor(VIEWCAT, labels = c("Rarely/never", "2-3 times per week", "4-5 times per week", "> 5 times per week") ) ) ## Check the conversion: table(old = tmp, new = sesam2$VIEWCAT, useNA = "always") ## new ## old Rarely/never 2-3 times per week 4-5 times per week > 5 times per week ## 1 25 0 0 0 ## 2 0 44 0 0 ## 3 0 0 57 0 ## 4 0 0 0 53 ## <NA> 0 0 0 0 ## new ## old <NA> ## 1 0 ## 2 0 ## 3 0 ## 4 0 ## <NA> 0 7.3.1.3 Create a conditional slopes plot to visualize the effect of AGE on POSTNUMB within each of the VIEWCAT groups. Based on this visualization, do you think it is reasonable to expect that VIEWCAT moderates the effect of AGE on POSTNUMB? 
Click to show code library(ggplot2) ggplot(sesam2, aes(AGE, POSTNUMB, color = VIEWCAT)) + geom_point() + geom_smooth(method = "lm", se = FALSE) Click for explanation The regression lines representing the conditional focal effects are not parallel, so there appears to be some level of moderation. That being said, the differences are pretty small, so the moderation may not be significant (i.e., the non-parallel regression lines may simply be reflecting sampling variability). We will use path analysis to test whether VIEWCAT moderates the effect of AGE on POSTNUMB. This analysis will entail three steps: Estimate the unrestricted multiple-group model wherein we regress POSTNUMB onto AGE and specify VIEWCAT as the grouping factor. Estimate the restricted model wherein we constrain the AGE \\(\\rightarrow\\) POSTNUMB effect to be equal in all VIEWCAT groups. Conduct a \\(\\Delta \\chi^2\\) test to compare the fit of the two models. 7.3.1.4 Estimate the unrestricted path model described above. Include the intercept term in your model. Judging from the focal effect estimates in each group, do you think moderation is plausible? 
Click to show code library(lavaan) ## Estimate the unrestricted model and view the results: out_full <- sem('POSTNUMB ~ 1 + AGE', data = sesam2, group = "VIEWCAT") summary(out_full) ## lavaan 0.6.16 ended normally after 1 iteration ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 12 ## ## Number of observations per group: ## Rarely/never 25 ## 4-5 times per week 57 ## > 5 times per week 53 ## 2-3 times per week 44 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## Test statistic for each group: ## Rarely/never 0.000 ## 4-5 times per week 0.000 ## > 5 times per week 0.000 ## 2-3 times per week 0.000 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [Rarely/never]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE 0.747 0.239 3.118 0.002 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB -18.721 12.142 -1.542 0.123 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 73.285 20.728 3.536 0.000 ## ## ## Group 2 [4-5 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE 0.554 0.234 2.369 0.018 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 4.861 12.178 0.399 0.690 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 135.923 25.461 5.339 0.000 ## ## ## Group 3 [> 5 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE 0.405 0.214 1.894 0.058 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 15.676 11.249 1.394 0.163 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 115.942 22.523 5.148 0.000 ## ## ## Group 4 [2-3 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE 0.729 0.255 2.855 0.004 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB -8.747 13.003 -0.673 0.501 ## ## 
Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 112.019 23.882 4.690 0.000 Click for explanation There are some notable differences in the AGE \\(\\rightarrow\\) POSTNUMB focal effect between VIEWCAT groups. It looks like VIEWCAT could moderate the focal effect. 7.3.1.5 Estimate the restricted model described above. Equate the focal effect across all VIEWCAT groups. Click to show code ## Estimate the restricted model and view the results: out_res <- sem('POSTNUMB ~ 1 + c("b1", "b1", "b1", "b1") * AGE', data = sesam2, group = "VIEWCAT") summary(out_res) ## lavaan 0.6.16 ended normally after 38 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 12 ## Number of equality constraints 3 ## ## Number of observations per group: ## Rarely/never 25 ## 4-5 times per week 57 ## > 5 times per week 53 ## 2-3 times per week 44 ## ## Model Test User Model: ## ## Test statistic 1.486 ## Degrees of freedom 3 ## P-value (Chi-square) 0.685 ## Test statistic for each group: ## Rarely/never 0.413 ## 4-5 times per week 0.027 ## > 5 times per week 0.760 ## 2-3 times per week 0.287 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [Rarely/never]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE (b1) 0.592 0.118 5.032 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB -10.966 6.154 -1.782 0.075 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 74.505 21.073 3.536 0.000 ## ## ## Group 2 [4-5 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE (b1) 0.592 0.118 5.032 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 2.869 6.275 0.457 0.647 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 135.988 25.473 5.339 0.000 ## ## ## Group 3 [> 5 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) 
## POSTNUMB ~ ## AGE (b1) 0.592 0.118 5.032 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 5.923 6.313 0.938 0.348 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 117.616 22.848 5.148 0.000 ## ## ## Group 4 [2-3 times per week]: ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## POSTNUMB ~ ## AGE (b1) 0.592 0.118 5.032 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB -1.826 6.157 -0.297 0.767 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .POSTNUMB 112.751 24.039 4.690 0.000 7.3.1.6 Test for moderation by comparing the full and restricted models from 7.3.1.4 and 7.3.1.5, respectively: Does VIEWCAT significantly moderate the effect of AGE on POSTNUMB? Click to show code ## Test for moderation: anova(out_full, out_res) Click for explanation No, VIEWCAT does not significantly moderate the effect of AGE on POSTNUMB (\\(\\Delta \\chi^2[3] = 1.486\\), \\(p = 0.685\\)). 7.3.2 Multiple-Group CFA In the next part of these exercises, we will estimate a multiple-group CFA to evaluate the measurement structure of a scale assessing Prolonged Grief Disorder. The relevant data are contained in the PGDdata2.txt file. This dataset consists of a grouping variable, Kin2 (with two levels: “partner” and “else”) and 5 items taken from the Inventory of Complicated Grief: Yearning Part of self died Difficulty accepting the loss Avoiding reminders of deceased Bitterness about the loss You can find more information about this scale in Boelen et al. (2010). 7.3.2.1 Load the PGDdata2.txt data. Use the read.table() function to load the data. Convert the missing values to NA via the na.strings argument. Retain the column labels via the header argument. Specify the field delimiter as the tab character (i.e., \"\\t\"). Exclude any cases with missing values on Kin2. NOTE: Unless otherwise specified, all analyses in Section 7.3.2 use these data. 
Click to show code library(dplyr) ## Load the data: pgd <- read.table("PGDdata2.txt", na.strings = "-999", header = TRUE, sep = "\\t") %>% filter(!is.na(Kin2)) ## Check the results: head(pgd) summary(pgd) str(pgd) ## Kin2 b1pss1 b2pss2 b3pss3 ## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000 ## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 ## Median :1.0000 Median :1.000 Median :0.0000 Median :1.0000 ## Mean :0.6661 Mean :1.236 Mean :0.4622 Mean :0.9771 ## 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:1.0000 ## Max. :1.0000 Max. :5.000 Max. :3.0000 Max. :5.0000 ## NA's :1 ## b4pss4 b5pss5 ## Min. :0.000 Min. :0.0000 ## 1st Qu.:0.000 1st Qu.:0.0000 ## Median :1.000 Median :0.0000 ## Mean :1.009 Mean :0.6761 ## 3rd Qu.:2.000 3rd Qu.:1.0000 ## Max. :3.000 Max. :3.0000 ## NA's :1 ## 'data.frame': 569 obs. of 6 variables: ## $ Kin2 : int 0 0 1 1 0 1 1 1 1 1 ... ## $ b1pss1: int 1 1 1 1 1 2 1 3 1 1 ... ## $ b2pss2: int 1 0 1 0 1 2 1 2 0 0 ... ## $ b3pss3: int 1 0 1 1 2 2 1 2 1 1 ... ## $ b4pss4: int 1 1 1 1 0 2 2 3 0 1 ... ## $ b5pss5: int 1 0 0 0 0 1 2 3 0 0 ... 7.3.2.2 Run a single-group CFA wherein the five scale variables described above indicate a single latent factor. Do not include any grouping variable. Use the default settings in the cfa() function. Click to show code ## Define the model syntax: cfaMod <- 'grief =~ b1pss1 + b2pss2 + b3pss3 + b4pss4 + b5pss5' ## Estimate the model: out0 <- cfa(cfaMod, data = pgd) 7.3.2.3 Summarize and evaluate the fitted CFA. Does the model fit well? Are the items homogeneously associated with the latent factor? Which item is most weakly associated with the latent factor? 
Click to show code ## Summarize the fitted model: summary(out0, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 19 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 10 ## ## Used Total ## Number of observations 567 569 ## ## Model Test User Model: ## ## Test statistic 8.110 ## Degrees of freedom 5 ## P-value (Chi-square) 0.150 ## ## Model Test Baseline Model: ## ## Test statistic 775.364 ## Degrees of freedom 10 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.996 ## Tucker-Lewis Index (TLI) 0.992 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -3219.918 ## Loglikelihood unrestricted model (H1) -3215.863 ## ## Akaike (AIC) 6459.836 ## Bayesian (BIC) 6503.240 ## Sample-size adjusted Bayesian (SABIC) 6471.495 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.033 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.073 ## P-value H_0: RMSEA <= 0.050 0.710 ## P-value H_0: RMSEA >= 0.080 0.023 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.018 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## grief =~ ## b1pss1 1.000 0.752 0.759 ## b2pss2 0.454 0.043 10.570 0.000 0.341 0.495 ## b3pss3 0.831 0.058 14.445 0.000 0.625 0.691 ## b4pss4 0.770 0.055 14.010 0.000 0.579 0.667 ## b5pss5 0.817 0.057 14.410 0.000 0.614 0.689 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 0.416 0.037 11.300 0.000 0.416 0.424 ## .b2pss2 0.358 0.023 15.549 0.000 0.358 0.755 ## .b3pss3 0.427 0.033 13.117 0.000 0.427 0.522 ## .b4pss4 0.419 0.031 13.599 0.000 0.419 0.555 ## .b5pss5 0.417 0.032 13.160 0.000 0.417 0.525 ## grief 0.565 0.059 9.514 0.000 1.000 1.000 ## ## R-Square: ## Estimate ## 
b1pss1 0.576 ## b2pss2 0.245 ## b3pss3 0.478 ## b4pss4 0.445 ## b5pss5 0.475 Click for explanation The model fits the data quite well (\\(\\chi^2[5] = 8.11\\), \\(p = 0.15\\), \\(\\textit{RMSEA} = 0.033\\), \\(\\textit{CFI} = 0.996\\), \\(\\textit{SRMR} = 0.018\\)). All of the indicators appear to be more or less equally good indicators of the latent factor, except for b2pss2, which has a standardized factor loading of \\(\\lambda = 0.495\\) and \\(R^2 = 0.245\\). 7.3.2.4 Rerun the CFA from 7.3.2.2 as a multiple-group model. Use the Kin2 variable as the grouping factor. Do not place any equality constraints across groups. Click to show code out1 <- cfa(cfaMod, data = pgd, group = "Kin2") 7.3.2.5 Summarize the fitted multiple-group CFA from 7.3.2.4. Does the two-group model fit the data well? Do you notice any salient differences between the two sets of within-group estimates? Click to show code summary(out1, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 27 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 30 ## ## Number of observations per group: Used Total ## 0 188 190 ## 1 379 379 ## ## Model Test User Model: ## ## Test statistic 11.317 ## Degrees of freedom 10 ## P-value (Chi-square) 0.333 ## Test statistic for each group: ## 0 8.976 ## 1 2.340 ## ## Model Test Baseline Model: ## ## Test statistic 781.358 ## Degrees of freedom 20 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.998 ## Tucker-Lewis Index (TLI) 0.997 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -3206.363 ## Loglikelihood unrestricted model (H1) -3200.705 ## ## Akaike (AIC) 6472.727 ## Bayesian (BIC) 6602.937 ## Sample-size adjusted Bayesian (SABIC) 6507.701 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.022 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.070 ## P-value H_0: 
RMSEA <= 0.050 0.789 ## P-value H_0: RMSEA >= 0.080 0.018 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.017 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [0]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## grief =~ ## b1pss1 1.000 0.702 0.712 ## b2pss2 0.372 0.076 4.922 0.000 0.261 0.410 ## b3pss3 0.938 0.118 7.986 0.000 0.659 0.709 ## b4pss4 0.909 0.116 7.848 0.000 0.638 0.691 ## b5pss5 0.951 0.122 7.774 0.000 0.667 0.683 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 1.346 0.072 18.727 0.000 1.346 1.366 ## .b2pss2 0.441 0.046 9.499 0.000 0.441 0.693 ## .b3pss3 1.059 0.068 15.618 0.000 1.059 1.139 ## .b4pss4 1.122 0.067 16.671 0.000 1.122 1.216 ## .b5pss5 0.745 0.071 10.442 0.000 0.745 0.762 ## grief 0.000 0.000 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 0.478 0.067 7.118 0.000 0.478 0.493 ## .b2pss2 0.338 0.037 9.205 0.000 0.338 0.832 ## .b3pss3 0.430 0.060 7.170 0.000 0.430 0.498 ## .b4pss4 0.445 0.060 7.408 0.000 0.445 0.522 ## .b5pss5 0.511 0.068 7.519 0.000 0.511 0.534 ## grief 0.493 0.098 5.007 0.000 1.000 1.000 ## ## R-Square: ## Estimate ## b1pss1 0.507 ## b2pss2 0.168 ## b3pss3 0.502 ## b4pss4 0.478 ## b5pss5 0.466 ## ## ## Group 2 [1]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## grief =~ ## b1pss1 1.000 0.769 0.778 ## b2pss2 0.502 0.052 9.597 0.000 0.386 0.542 ## b3pss3 0.785 0.066 11.945 0.000 0.604 0.680 ## b4pss4 0.708 0.062 11.497 0.000 0.544 0.652 ## b5pss5 0.762 0.062 12.185 0.000 0.586 0.696 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 1.182 0.051 23.277 0.000 1.182 1.196 ## .b2pss2 0.475 0.037 12.973 0.000 0.475 0.666 ## .b3pss3 0.934 0.046 20.460 0.000 0.934 1.051 ## .b4pss4 0.955 0.043 22.270 0.000 0.955 1.144 ## .b5pss5 0.644 0.043 14.879 0.000 0.644 0.764 ## 
grief 0.000 0.000 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .b1pss1 0.385 0.043 8.862 0.000 0.385 0.394 ## .b2pss2 0.359 0.029 12.468 0.000 0.359 0.706 ## .b3pss3 0.425 0.039 11.025 0.000 0.425 0.538 ## .b4pss4 0.401 0.035 11.420 0.000 0.401 0.575 ## .b5pss5 0.366 0.034 10.767 0.000 0.366 0.516 ## grief 0.592 0.073 8.081 0.000 1.000 1.000 ## ## R-Square: ## Estimate ## b1pss1 0.606 ## b2pss2 0.294 ## b3pss3 0.462 ## b4pss4 0.425 ## b5pss5 0.484 Click for explanation The two-group model also fits the data very well (\\(\\chi^2[10] = 11.32\\), \\(p = 0.333\\), \\(\\textit{RMSEA} = 0.022\\), \\(\\textit{CFI} = 0.998\\), \\(\\textit{SRMR} = 0.017\\)). No, there are no striking differences between the two sets of estimates. Although there is certainly some variability between groups, the two sets of estimates don’t look systematically different. 7.3.2.6 Based on the above results, what can you conclude about configural, weak, and strong measurement invariance across the Kin2 groups? Click for explanation Configural invariance holds. The unrestricted multiple-group CFA fits the data adequately (very well, actually), and the measurement model parameters are reasonable in both groups. We cannot yet draw any conclusions about weak or strong invariance. We need to do the appropriate model comparison tests first. End of At-Home Exercises 7 "],["in-class-exercises-6.html", "7.4 In-Class Exercises", " 7.4 In-Class Exercises 7.4.1 Measurement Invariance We’ll now pick up where we left off with the At-Home Exercises by testing measurement invariance in the two-group CFA of prolonged grief disorder. As you saw in the lecture, measurement invariance testing allows us to empirically test for differences in the measurement model between the groups. If we can establish measurement invariance, we can draw the following (equivalent) conclusions: Our latent constructs are defined equivalently in all groups. 
The participants in every group are interpreting our items in the same way. Participants in different groups who have the same values for the observed indicators will also have the same score on the latent variable. Between-group differences in latent parameters are due to true differences in the underlying latent constructs and not caused by differences in measurement. Any time we make between-group comparisons (e.g., ANOVA, t-tests, moderation by group, etc.), we assume invariant measurement. That is, we assume that the scores we’re comparing have the same meaning in each group. When doing multiple-group SEM, however, we have the very powerful ability to actually test this important, and often violated, assumption. The process of testing measurement invariance can get quite complex, but the basic procedure boils down to using model comparison tests to evaluate the plausibility of increasingly strong between-group constraints. For most problems, these constraints amount to the following three levels: Configural: The same pattern of free and fixed effects in all groups Weak (aka Metric): Configural + Equal factor loadings in all groups Strong (aka Scalar): Weak + Equal item intercepts in all groups You can read more about measurement invariance here and here, and you can find a brief discussion of how to conduct measurement invariance tests in lavaan here. 7.4.1.1 Load the PGDdata2.txt data as you did for the At-Home Exercises. NOTE: Unless otherwise specified, all analyses in Section 7.4.1 use these data. Click to show code library(dplyr) ## Load the data: pgd <- read.table("PGDdata2.txt", na.strings = "-999", header = TRUE, sep = "\\t") %>% filter(!is.na(Kin2)) ## Check the results: head(pgd) summary(pgd) str(pgd) ## Kin2 b1pss1 b2pss2 b3pss3 ## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. 
:0.0000 ## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 ## Median :1.0000 Median :1.000 Median :0.0000 Median :1.0000 ## Mean :0.6661 Mean :1.236 Mean :0.4622 Mean :0.9771 ## 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:1.0000 ## Max. :1.0000 Max. :5.000 Max. :3.0000 Max. :5.0000 ## NA's :1 ## b4pss4 b5pss5 ## Min. :0.000 Min. :0.0000 ## 1st Qu.:0.000 1st Qu.:0.0000 ## Median :1.000 Median :0.0000 ## Mean :1.009 Mean :0.6761 ## 3rd Qu.:2.000 3rd Qu.:1.0000 ## Max. :3.000 Max. :3.0000 ## NA's :1 ## 'data.frame': 569 obs. of 6 variables: ## $ Kin2 : int 0 0 1 1 0 1 1 1 1 1 ... ## $ b1pss1: int 1 1 1 1 1 2 1 3 1 1 ... ## $ b2pss2: int 1 0 1 0 1 2 1 2 0 0 ... ## $ b3pss3: int 1 0 1 1 2 2 1 2 1 1 ... ## $ b4pss4: int 1 1 1 1 0 2 2 3 0 1 ... ## $ b5pss5: int 1 0 0 0 0 1 2 3 0 0 ... 7.4.1.2 Test configural, weak, and strong invariance using the multiple-group CFA from 7.3.2.4. What are your conclusions? Click to show code library(lavaan) library(semTools) # provides the compareFit() function ## Define the syntax for the CFA model: cfaMod <- 'grief =~ b1pss1 + b2pss2 + b3pss3 + b4pss4 + b5pss5' ## Estimate the configural model: configOut <- cfa(cfaMod, data = pgd, group = "Kin2") ## Estimate the weak invariance model: weakOut <- cfa(cfaMod, data = pgd, group = "Kin2", group.equal = "loadings") ## Estimate the strong invariance model: strongOut <- cfa(cfaMod, data = pgd, group = "Kin2", group.equal = c("loadings", "intercepts") ) summary(configOut) ## lavaan 0.6.16 ended normally after 27 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 30 ## ## Number of observations per group: Used Total ## 0 188 190 ## 1 379 379 ## ## Model Test User Model: ## ## Test statistic 11.317 ## Degrees of freedom 10 ## P-value (Chi-square) 0.333 ## Test statistic for each group: ## 0 8.976 ## 1 2.340 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## 
## Group 1 [0]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## grief =~ ## b1pss1 1.000 ## b2pss2 0.372 0.076 4.922 0.000 ## b3pss3 0.938 0.118 7.986 0.000 ## b4pss4 0.909 0.116 7.848 0.000 ## b5pss5 0.951 0.122 7.774 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 1.346 0.072 18.727 0.000 ## .b2pss2 0.441 0.046 9.499 0.000 ## .b3pss3 1.059 0.068 15.618 0.000 ## .b4pss4 1.122 0.067 16.671 0.000 ## .b5pss5 0.745 0.071 10.442 0.000 ## grief 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 0.478 0.067 7.118 0.000 ## .b2pss2 0.338 0.037 9.205 0.000 ## .b3pss3 0.430 0.060 7.170 0.000 ## .b4pss4 0.445 0.060 7.408 0.000 ## .b5pss5 0.511 0.068 7.519 0.000 ## grief 0.493 0.098 5.007 0.000 ## ## ## Group 2 [1]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## grief =~ ## b1pss1 1.000 ## b2pss2 0.502 0.052 9.597 0.000 ## b3pss3 0.785 0.066 11.945 0.000 ## b4pss4 0.708 0.062 11.497 0.000 ## b5pss5 0.762 0.062 12.185 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 1.182 0.051 23.277 0.000 ## .b2pss2 0.475 0.037 12.973 0.000 ## .b3pss3 0.934 0.046 20.460 0.000 ## .b4pss4 0.955 0.043 22.270 0.000 ## .b5pss5 0.644 0.043 14.879 0.000 ## grief 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 0.385 0.043 8.862 0.000 ## .b2pss2 0.359 0.029 12.468 0.000 ## .b3pss3 0.425 0.039 11.025 0.000 ## .b4pss4 0.401 0.035 11.420 0.000 ## .b5pss5 0.366 0.034 10.767 0.000 ## grief 0.592 0.073 8.081 0.000 summary(weakOut) ## lavaan 0.6.16 ended normally after 22 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 30 ## Number of equality constraints 4 ## ## Number of observations per group: Used Total ## 0 188 190 ## 1 379 379 ## ## Model Test User Model: ## ## Test statistic 19.275 ## Degrees of freedom 14 ## P-value (Chi-square) 0.155 ## Test statistic for each group: ## 0 14.525 ## 1 4.751 ## ## Parameter Estimates: ## ## Standard errors 
Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [0]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## grief =~ ## b1pss1 1.000 ## b2pss2 (.p2.) 0.457 0.043 10.680 0.000 ## b3pss3 (.p3.) 0.824 0.057 14.374 0.000 ## b4pss4 (.p4.) 0.756 0.054 13.890 0.000 ## b5pss5 (.p5.) 0.805 0.056 14.388 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 1.346 0.073 18.353 0.000 ## .b2pss2 0.441 0.049 9.045 0.000 ## .b3pss3 1.059 0.067 15.828 0.000 ## .b4pss4 1.122 0.066 17.102 0.000 ## .b5pss5 0.745 0.070 10.688 0.000 ## grief 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 0.434 0.064 6.779 0.000 ## .b2pss2 0.327 0.037 8.898 0.000 ## .b3pss3 0.449 0.058 7.786 0.000 ## .b4pss4 0.480 0.058 8.209 0.000 ## .b5pss5 0.539 0.066 8.207 0.000 ## grief 0.577 0.086 6.691 0.000 ## ## ## Group 2 [1]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## grief =~ ## b1pss1 1.000 ## b2pss2 (.p2.) 0.457 0.043 10.680 0.000 ## b3pss3 (.p3.) 0.824 0.057 14.374 0.000 ## b4pss4 (.p4.) 0.756 0.054 13.890 0.000 ## b5pss5 (.p5.) 
0.805 0.056 14.388 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 1.182 0.050 23.496 0.000 ## .b2pss2 0.475 0.036 13.281 0.000 ## .b3pss3 0.934 0.046 20.325 0.000 ## .b4pss4 0.955 0.043 21.999 0.000 ## .b5pss5 0.644 0.044 14.731 0.000 ## grief 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 0.399 0.042 9.474 0.000 ## .b2pss2 0.367 0.029 12.831 0.000 ## .b3pss3 0.420 0.038 11.019 0.000 ## .b4pss4 0.394 0.035 11.304 0.000 ## .b5pss5 0.361 0.034 10.686 0.000 ## grief 0.561 0.065 8.564 0.000 summary(strongOut) ## lavaan 0.6.16 ended normally after 26 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 31 ## Number of equality constraints 9 ## ## Number of observations per group: Used Total ## 0 188 190 ## 1 379 379 ## ## Model Test User Model: ## ## Test statistic 23.968 ## Degrees of freedom 18 ## P-value (Chi-square) 0.156 ## Test statistic for each group: ## 0 17.123 ## 1 6.846 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [0]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## grief =~ ## b1pss1 1.000 ## b2pss2 (.p2.) 0.449 0.042 10.562 0.000 ## b3pss3 (.p3.) 0.824 0.057 14.467 0.000 ## b4pss4 (.p4.) 0.760 0.054 14.007 0.000 ## b5pss5 (.p5.) 0.803 0.056 14.457 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 (.12.) 1.332 0.066 20.221 0.000 ## .b2pss2 (.13.) 0.505 0.037 13.718 0.000 ## .b3pss3 (.14.) 1.055 0.057 18.601 0.000 ## .b4pss4 (.15.) 1.081 0.053 20.237 0.000 ## .b5pss5 (.16.) 
0.756 0.056 13.525 0.000 ## grief 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 0.435 0.064 6.773 0.000 ## .b2pss2 0.332 0.037 8.937 0.000 ## .b3pss3 0.448 0.058 7.776 0.000 ## .b4pss4 0.480 0.059 8.190 0.000 ## .b5pss5 0.539 0.066 8.207 0.000 ## grief 0.579 0.086 6.701 0.000 ## ## ## Group 2 [1]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## grief =~ ## b1pss1 1.000 ## b2pss2 (.p2.) 0.449 0.042 10.562 0.000 ## b3pss3 (.p3.) 0.824 0.057 14.467 0.000 ## b4pss4 (.p4.) 0.760 0.054 14.007 0.000 ## b5pss5 (.p5.) 0.803 0.056 14.457 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 (.12.) 1.332 0.066 20.221 0.000 ## .b2pss2 (.13.) 0.505 0.037 13.718 0.000 ## .b3pss3 (.14.) 1.055 0.057 18.601 0.000 ## .b4pss4 (.15.) 1.081 0.053 20.237 0.000 ## .b5pss5 (.16.) 0.756 0.056 13.525 0.000 ## grief -0.144 0.076 -1.911 0.056 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 0.398 0.042 9.469 0.000 ## .b2pss2 0.370 0.029 12.872 0.000 ## .b3pss3 0.419 0.038 11.014 0.000 ## .b4pss4 0.393 0.035 11.276 0.000 ## .b5pss5 0.361 0.034 10.700 0.000 ## grief 0.561 0.065 8.586 0.000 ## Test invariance through model comparison tests: compareFit(configOut, weakOut, strongOut) %>% summary() ## ################### Nested Model Comparison ######################### ## ## Chi-Squared Difference Test ## ## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq) ## configOut 10 6472.7 6602.9 11.317 ## weakOut 14 6472.7 6585.5 19.275 7.9585 0.059083 4 0.09311 . ## strongOut 18 6469.4 6564.9 23.968 4.6931 0.024722 4 0.32026 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## ####################### Model Fit Indices ########################### ## chisq df pvalue rmsea cfi tli srmr aic bic ## configOut 11.317† 10 .333 .022† 0.998† 0.997† .017† 6472.727 6602.937 ## weakOut 19.275 14 .155 .036 .993 .990 .038 6472.685 6585.534 ## strongOut 23.968 18 .156 .034 .992 .991 .042 6469.378† 6564.866† ## ## ################## Differences in Fit Indices ####################### ## df rmsea cfi tli srmr aic bic ## weakOut - configOut 4 0.015 -0.005 -0.006 0.021 -0.041 -17.403 ## strongOut - weakOut 4 -0.002 -0.001 0.001 0.004 -3.307 -20.668 Click for explanation Configural invariance holds. The unrestricted, two-group model fits the data very well (\\(\\chi^2[10] = 11.32\\), \\(p = 0.333\\), \\(\\textit{RMSEA} = 0.022\\), \\(\\textit{CFI} = 0.998\\), \\(\\textit{SRMR} = 0.017\\)). Weak invariance holds. The model comparison test shows a non-significant loss of fit between the configural and weak models (\\(\\Delta \\chi^2[4] = 7.96\\), \\(p = 0.093\\)). Strong invariance holds. The model comparison test shows a non-significant loss of fit between the weak and strong models (\\(\\Delta \\chi^2[4] = 4.69\\), \\(p = 0.32\\)). 7.4.2 Testing Between-Group Differences Once we establish strong invariance, we have empirical evidence that latent mean levels are comparable across groups. Hence, we can test for differences in those latent means. In this section, we’ll conduct the equivalent of a t-test using our two-group CFA model. More specifically, we want to know whether the latent mean of grief differs significantly between the Kin2 groups. The null and alternative hypotheses for this test are as follows: \\[ H_0: \\alpha_1 = \\alpha_2\\\\ H_1: \\alpha_1 \\neq \\alpha_2 \\] where \\(\\alpha_1\\) and \\(\\alpha_2\\) represent the latent means in Group 1 and Group 2, respectively. 7.4.2.1 Use the strongly invariant model from 7.4.1.2 to test the latent mean difference described above. What is your conclusion? 
Click to show code ## Estimate the model with the latent means equated: resOut <- cfa(cfaMod, data = pgd, group = "Kin2", group.equal = c("loadings", "intercepts", "means") ) ## Check the results: summary(resOut) ## lavaan 0.6.16 ended normally after 24 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 30 ## Number of equality constraints 9 ## ## Number of observations per group: Used Total ## 0 188 190 ## 1 379 379 ## ## Model Test User Model: ## ## Test statistic 27.615 ## Degrees of freedom 19 ## P-value (Chi-square) 0.091 ## Test statistic for each group: ## 0 19.755 ## 1 7.860 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [0]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## grief =~ ## b1pss1 1.000 ## b2pss2 (.p2.) 0.451 0.043 10.591 0.000 ## b3pss3 (.p3.) 0.824 0.057 14.450 0.000 ## b4pss4 (.p4.) 0.759 0.054 13.976 0.000 ## b5pss5 (.p5.) 0.804 0.056 14.448 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 (.12.) 1.234 0.042 29.668 0.000 ## .b2pss2 (.13.) 0.461 0.029 15.988 0.000 ## .b3pss3 (.14.) 0.974 0.038 25.664 0.000 ## .b4pss4 (.15.) 1.006 0.036 27.699 0.000 ## .b5pss5 (.16.) 0.676 0.037 18.267 0.000 ## grief 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 0.436 0.064 6.770 0.000 ## .b2pss2 0.331 0.037 8.925 0.000 ## .b3pss3 0.448 0.058 7.767 0.000 ## .b4pss4 0.482 0.059 8.194 0.000 ## .b5pss5 0.538 0.066 8.197 0.000 ## grief 0.588 0.088 6.714 0.000 ## ## ## Group 2 [1]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## grief =~ ## b1pss1 1.000 ## b2pss2 (.p2.) 0.451 0.043 10.591 0.000 ## b3pss3 (.p3.) 0.824 0.057 14.450 0.000 ## b4pss4 (.p4.) 0.759 0.054 13.976 0.000 ## b5pss5 (.p5.) 0.804 0.056 14.448 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 (.12.) 1.234 0.042 29.668 0.000 ## .b2pss2 (.13.) 
0.461 0.029 15.988 0.000 ## .b3pss3 (.14.) 0.974 0.038 25.664 0.000 ## .b4pss4 (.15.) 1.006 0.036 27.699 0.000 ## .b5pss5 (.16.) 0.676 0.037 18.267 0.000 ## grief 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .b1pss1 0.399 0.042 9.469 0.000 ## .b2pss2 0.370 0.029 12.863 0.000 ## .b3pss3 0.419 0.038 11.012 0.000 ## .b4pss4 0.394 0.035 11.283 0.000 ## .b5pss5 0.361 0.034 10.691 0.000 ## grief 0.563 0.066 8.584 0.000 ## Test the mean difference via a likelihood ratio test: anova(strongOut, resOut) Click for explanation We can equate the latent means by adding \"means\" to the group.equal argument of cfa() (alongside \"loadings\" and \"intercepts\", which keep the strong invariance constraints in place). Then, we simply test this constrained model against the strong invariance model to get our test of mean differences. In this case, the means of the grief factor do not significantly differ between Kin2 groups (\\(\\Delta \\chi^2[1] = 3.65\\), \\(p = 0.056\\)). 7.4.3 Multiple-Group SEM for Moderation Now, we’re going to revisit the TORA model from the Week 6 In-Class Exercises and use a multiple-group model to test the moderating effect of sex. 7.4.3.1 Load the data contained in the toradata.csv file. Click to show code condom <- read.csv("toradata.csv", stringsAsFactors = TRUE) Before we get to any moderation tests, however, we first need to establish measurement invariance. The first step in any multiple-group analysis that includes latent variables is measurement invariance testing. 7.4.3.2 Test for measurement invariance across sex groups in the three latent variables of the TORA model from 6.4.2. Test configural, weak, and strong invariance. Test for invariance in all three latent factors simultaneously. Is full measurement invariance (i.e., up to and including strong invariance) supported? 
Click to show code tora_cfa <- ' attitudes =~ attit_1 + attit_2 + attit_3 norms =~ norm_1 + norm_2 + norm_3 control =~ control_1 + control_2 + control_3 ' ## Estimate the models: config <- cfa(tora_cfa, data = condom, group = "sex") weak <- cfa(tora_cfa, data = condom, group = "sex", group.equal = "loadings") strong <- cfa(tora_cfa, data = condom, group = "sex", group.equal = c("loadings", "intercepts") ) ## Check that everything went well: summary(config) ## lavaan 0.6.16 ended normally after 54 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 60 ## ## Number of observations per group: ## woman 161 ## man 89 ## ## Model Test User Model: ## ## Test statistic 66.565 ## Degrees of freedom 48 ## P-value (Chi-square) 0.039 ## Test statistic for each group: ## woman 42.623 ## man 23.941 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [woman]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.005 0.075 13.427 0.000 ## attit_3 -0.965 0.075 -12.878 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 0.952 0.101 9.470 0.000 ## norm_3 0.879 0.101 8.742 0.000 ## control =~ ## control_1 1.000 ## control_2 0.794 0.144 5.526 0.000 ## control_3 0.989 0.152 6.523 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.450 0.087 5.200 0.000 ## control 0.468 0.089 5.249 0.000 ## norms ~~ ## control 0.387 0.079 4.912 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 2.839 0.090 31.702 0.000 ## .attit_2 2.907 0.084 34.728 0.000 ## .attit_3 3.174 0.084 37.969 0.000 ## .norm_1 2.832 0.080 35.342 0.000 ## .norm_2 2.832 0.079 35.775 0.000 ## .norm_3 2.795 0.081 34.694 0.000 ## .control_1 2.851 0.082 34.755 0.000 ## .control_2 2.857 0.081 35.104 0.000 ## .control_3 2.888 0.081 35.877 0.000 ## attitudes 0.000 ## norms 0.000 ## control 0.000 ## ## 
Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.398 0.059 6.739 0.000 ## .attit_2 0.227 0.045 5.011 0.000 ## .attit_3 0.294 0.048 6.092 0.000 ## .norm_1 0.346 0.065 5.328 0.000 ## .norm_2 0.385 0.065 5.962 0.000 ## .norm_3 0.513 0.072 7.108 0.000 ## .control_1 0.587 0.090 6.531 0.000 ## .control_2 0.754 0.096 7.815 0.000 ## .control_3 0.557 0.086 6.453 0.000 ## attitudes 0.893 0.142 6.273 0.000 ## norms 0.688 0.121 5.706 0.000 ## control 0.497 0.119 4.184 0.000 ## ## ## Group 2 [man]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 1.167 0.149 7.843 0.000 ## attit_3 -1.060 0.142 -7.443 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 1.070 0.215 4.965 0.000 ## norm_3 0.922 0.189 4.869 0.000 ## control =~ ## control_1 1.000 ## control_2 0.995 0.290 3.435 0.001 ## control_3 0.949 0.285 3.332 0.001 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.086 0.103 0.837 0.403 ## control 0.388 0.113 3.430 0.001 ## norms ~~ ## control 0.200 0.101 1.976 0.048 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 3.270 0.117 28.063 0.000 ## .attit_2 3.180 0.128 24.905 0.000 ## .attit_3 2.787 0.126 22.187 0.000 ## .norm_1 3.236 0.131 24.692 0.000 ## .norm_2 3.337 0.132 25.293 0.000 ## .norm_3 3.303 0.131 25.136 0.000 ## .control_1 3.157 0.111 28.415 0.000 ## .control_2 3.135 0.129 24.249 0.000 ## .control_3 3.213 0.130 24.805 0.000 ## attitudes 0.000 ## norms 0.000 ## control 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.440 0.095 4.631 0.000 ## .attit_2 0.405 0.109 3.699 0.000 ## .attit_3 0.541 0.112 4.822 0.000 ## .norm_1 0.740 0.175 4.230 0.000 ## .norm_2 0.647 0.182 3.555 0.000 ## .norm_3 0.868 0.175 4.972 0.000 ## .control_1 0.673 0.146 4.602 0.000 ## .control_2 1.066 0.197 5.417 0.000 ## .control_3 1.110 0.199 5.582 0.000 ## attitudes 0.768 0.182 4.220 0.000 ## norms 0.788 0.242 3.259 0.001 ## control 0.426 0.168 2.537 0.011 summary(weak) ## 
lavaan 0.6.16 ended normally after 37 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 60 ## Number of equality constraints 6 ## ## Number of observations per group: ## woman 161 ## man 89 ## ## Model Test User Model: ## ## Test statistic 68.557 ## Degrees of freedom 54 ## P-value (Chi-square) 0.088 ## Test statistic for each group: ## woman 43.148 ## man 25.409 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [woman]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 (.p2.) 1.048 0.068 15.413 0.000 ## attit_3 (.p3.) -0.995 0.067 -14.762 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 (.p5.) 0.977 0.091 10.708 0.000 ## norm_3 (.p6.) 0.889 0.089 9.996 0.000 ## control =~ ## cntrl_1 1.000 ## cntrl_2 (.p8.) 0.843 0.130 6.506 0.000 ## cntrl_3 (.p9.) 0.983 0.135 7.306 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.431 0.082 5.256 0.000 ## control 0.450 0.083 5.395 0.000 ## norms ~~ ## control 0.378 0.075 5.031 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 2.839 0.088 32.194 0.000 ## .attit_2 2.907 0.084 34.445 0.000 ## .attit_3 3.174 0.084 37.913 0.000 ## .norm_1 2.832 0.080 35.504 0.000 ## .norm_2 2.832 0.080 35.571 0.000 ## .norm_3 2.795 0.080 34.727 0.000 ## .control_1 2.851 0.082 34.882 0.000 ## .control_2 2.857 0.082 34.746 0.000 ## .control_3 2.888 0.080 36.037 0.000 ## attitudes 0.000 ## norms 0.000 ## control 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.408 0.058 6.987 0.000 ## .attit_2 0.221 0.045 4.896 0.000 ## .attit_3 0.293 0.048 6.128 0.000 ## .norm_1 0.353 0.063 5.580 0.000 ## .norm_2 0.380 0.064 5.952 0.000 ## .norm_3 0.512 0.071 7.178 0.000 ## .control_1 0.590 0.088 6.695 0.000 ## .control_2 0.744 0.096 7.731 0.000 ## .control_3 0.565 0.085 6.663 0.000 ## attitudes 0.844 0.129 
6.540 0.000 ## norms 0.672 0.113 5.921 0.000 ## control 0.485 0.110 4.417 0.000 ## ## ## Group 2 [man]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 (.p2.) 1.048 0.068 15.413 0.000 ## attit_3 (.p3.) -0.995 0.067 -14.762 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 (.p5.) 0.977 0.091 10.708 0.000 ## norm_3 (.p6.) 0.889 0.089 9.996 0.000 ## control =~ ## cntrl_1 1.000 ## cntrl_2 (.p8.) 0.843 0.130 6.506 0.000 ## cntrl_3 (.p9.) 0.983 0.135 7.306 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.092 0.114 0.807 0.420 ## control 0.425 0.109 3.912 0.000 ## norms ~~ ## control 0.217 0.103 2.100 0.036 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 3.270 0.120 27.254 0.000 ## .attit_2 3.180 0.125 25.501 0.000 ## .attit_3 2.787 0.125 22.275 0.000 ## .norm_1 3.236 0.132 24.423 0.000 ## .norm_2 3.337 0.130 25.610 0.000 ## .norm_3 3.303 0.132 25.086 0.000 ## .control_1 3.157 0.112 28.208 0.000 ## .control_2 3.135 0.127 24.750 0.000 ## .control_3 3.213 0.131 24.540 0.000 ## attitudes 0.000 ## norms 0.000 ## control 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.419 0.093 4.528 0.000 ## .attit_2 0.438 0.099 4.436 0.000 ## .attit_3 0.540 0.107 5.057 0.000 ## .norm_1 0.704 0.158 4.456 0.000 ## .norm_2 0.692 0.153 4.520 0.000 ## .norm_3 0.864 0.164 5.271 0.000 ## .control_1 0.668 0.139 4.797 0.000 ## .control_2 1.110 0.186 5.960 0.000 ## .control_3 1.094 0.193 5.663 0.000 ## attitudes 0.862 0.166 5.200 0.000 ## norms 0.859 0.193 4.443 0.000 ## control 0.447 0.137 3.260 0.001 summary(strong) ## lavaan 0.6.16 ended normally after 60 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 63 ## Number of equality constraints 15 ## ## Number of observations per group: ## woman 161 ## man 89 ## ## Model Test User Model: ## ## Test statistic 72.050 ## Degrees of freedom 60 ## P-value (Chi-square) 0.137 ## Test 
statistic for each group: ## woman 43.961 ## man 28.089 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [woman]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 (.p2.) 1.028 0.065 15.693 0.000 ## attit_3 (.p3.) -0.990 0.065 -15.114 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 (.p5.) 0.998 0.089 11.182 0.000 ## norm_3 (.p6.) 0.918 0.088 10.467 0.000 ## control =~ ## cntrl_1 1.000 ## cntrl_2 (.p8.) 0.848 0.126 6.736 0.000 ## cntrl_3 (.p9.) 0.987 0.131 7.558 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.428 0.081 5.259 0.000 ## control 0.454 0.084 5.438 0.000 ## norms ~~ ## control 0.372 0.073 5.060 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 (.25.) 2.864 0.085 33.535 0.000 ## .attit_2 (.26.) 2.887 0.083 34.826 0.000 ## .attit_3 (.27.) 3.166 0.082 38.500 0.000 ## .norm_1 (.28.) 2.816 0.078 36.330 0.000 ## .norm_2 (.29.) 2.838 0.078 36.453 0.000 ## .norm_3 (.30.) 2.812 0.078 36.253 0.000 ## .cntrl_1 (.31.) 2.847 0.078 36.562 0.000 ## .cntrl_2 (.32.) 2.859 0.076 37.381 0.000 ## .cntrl_3 (.33.) 2.891 0.077 37.531 0.000 ## attitds 0.000 ## norms 0.000 ## control 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.405 0.058 6.931 0.000 ## .attit_2 0.226 0.045 5.051 0.000 ## .attit_3 0.291 0.048 6.084 0.000 ## .norm_1 0.362 0.062 5.795 0.000 ## .norm_2 0.377 0.064 5.931 0.000 ## .norm_3 0.506 0.071 7.109 0.000 ## .control_1 0.592 0.088 6.743 0.000 ## .control_2 0.743 0.096 7.739 0.000 ## .control_3 0.566 0.085 6.684 0.000 ## attitudes 0.861 0.130 6.607 0.000 ## norms 0.650 0.109 5.950 0.000 ## control 0.482 0.108 4.477 0.000 ## ## ## Group 2 [man]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## attitudes =~ ## attit_1 1.000 ## attit_2 (.p2.) 1.028 0.065 15.693 0.000 ## attit_3 (.p3.) 
-0.990 0.065 -15.114 0.000 ## norms =~ ## norm_1 1.000 ## norm_2 (.p5.) 0.998 0.089 11.182 0.000 ## norm_3 (.p6.) 0.918 0.088 10.467 0.000 ## control =~ ## cntrl_1 1.000 ## cntrl_2 (.p8.) 0.848 0.126 6.736 0.000 ## cntrl_3 (.p9.) 0.987 0.131 7.558 0.000 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) ## attitudes ~~ ## norms 0.093 0.113 0.825 0.409 ## control 0.428 0.109 3.926 0.000 ## norms ~~ ## control 0.213 0.101 2.102 0.036 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 (.25.) 2.864 0.085 33.535 0.000 ## .attit_2 (.26.) 2.887 0.083 34.826 0.000 ## .attit_3 (.27.) 3.166 0.082 38.500 0.000 ## .norm_1 (.28.) 2.816 0.078 36.330 0.000 ## .norm_2 (.29.) 2.838 0.078 36.453 0.000 ## .norm_3 (.30.) 2.812 0.078 36.253 0.000 ## .cntrl_1 (.31.) 2.847 0.078 36.562 0.000 ## .cntrl_2 (.32.) 2.859 0.076 37.381 0.000 ## .cntrl_3 (.33.) 2.891 0.077 37.531 0.000 ## attitds 0.356 0.133 2.680 0.007 ## norms 0.480 0.133 3.602 0.000 ## control 0.318 0.116 2.733 0.006 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .attit_1 0.420 0.094 4.484 0.000 ## .attit_2 0.456 0.100 4.557 0.000 ## .attit_3 0.537 0.107 5.023 0.000 ## .norm_1 0.724 0.157 4.599 0.000 ## .norm_2 0.686 0.153 4.489 0.000 ## .norm_3 0.859 0.165 5.220 0.000 ## .control_1 0.669 0.139 4.821 0.000 ## .control_2 1.109 0.186 5.958 0.000 ## .control_3 1.094 0.193 5.664 0.000 ## attitudes 0.872 0.167 5.214 0.000 ## norms 0.830 0.186 4.455 0.000 ## control 0.445 0.136 3.280 0.001 ## Test measurement invariance: compareFit(config, weak, strong) %>% summary() ## ################### Nested Model Comparison ######################### ## ## Chi-Squared Difference Test ## ## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq) ## config 48 6021.5 6232.8 66.565 ## weak 54 6011.5 6201.6 68.557 1.9924 0 6 0.9204 ## strong 60 6003.0 6172.0 72.050 3.4934 0 6 0.7448 ## ## ####################### Model Fit Indices ########################### ## chisq df pvalue rmsea cfi tli srmr aic bic ## config 66.565† 48 
.039 .056 .979 .968 .048† 6021.476 6232.764 ## weak 68.557 54 .088 .046 .983 .978 .050 6011.469 6201.628 ## strong 72.050 60 .137 .040† .986† .983† .051 6002.962† 6171.992† ## ## ################## Differences in Fit Indices ####################### ## df rmsea cfi tli srmr aic bic ## weak - config 6 -0.009 0.005 0.010 0.003 -10.008 -31.136 ## strong - weak 6 -0.006 0.003 0.006 0.001 -8.507 -29.635 ## Make sure the strongly invariant model still fits well in an absolute sense: fitMeasures(strong) ## npar fmin chisq ## 48.000 0.144 72.050 ## df pvalue baseline.chisq ## 60.000 0.137 948.362 ## baseline.df baseline.pvalue cfi ## 72.000 0.000 0.986 ## tli nnfi rfi ## 0.983 0.983 0.909 ## nfi pnfi ifi ## 0.924 0.770 0.986 ## rni logl unrestricted.logl ## 0.986 -2953.481 -2917.456 ## aic bic ntotal ## 6002.962 6171.992 250.000 ## bic2 rmsea rmsea.ci.lower ## 6019.828 0.040 0.000 ## rmsea.ci.upper rmsea.ci.level rmsea.pvalue ## 0.071 0.900 0.669 ## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0 ## 0.050 0.013 0.080 ## rmr rmr_nomean srmr ## 0.062 0.067 0.051 ## srmr_bentler srmr_bentler_nomean crmr ## 0.051 0.055 0.054 ## crmr_nomean srmr_mplus srmr_mplus_nomean ## 0.059 0.052 0.054 ## cn_05 cn_01 gfi ## 275.398 307.658 0.997 ## agfi pgfi mfi ## 0.994 0.554 0.976 ## ecvi ## 0.672 Click for explanation Yes, we have been able to establish full measurement invariance. Configural invariance holds. The unrestricted, multiple-group model fits the data well (\\(\\chi^2[48] = 66.56\\), \\(p = 0.039\\), \\(\\textit{RMSEA} = 0.056\\), \\(\\textit{CFI} = 0.979\\), \\(\\textit{SRMR} = 0.048\\)). Weak invariance holds. The model comparison test shows a non-significant loss of fit between the configural and weak models (\\(\\Delta \\chi^2[6] = 1.99\\), \\(p = 0.92\\)). Strong invariance holds. The model comparison test shows a non-significant loss of fit between the weak and strong models (\\(\\Delta \\chi^2[6] = 3.49\\), \\(p = 0.745\\)). 
The strongly invariant model still fits the data well (\\(\\chi^2[60] = 72.05\\), \\(p = 0.137\\), \\(\\textit{RMSEA} = 0.040\\), \\(\\textit{CFI} = 0.986\\), \\(\\textit{SRMR} = 0.051\\)). Once we’ve established measurement invariance, we can move on to testing hypotheses about between-group differences, secure in the knowledge that our latent factors represent the same hypothetical constructs in all groups. 7.4.3.3 Estimate the full TORA model from 6.4.4 as a multiple-group model. Use sex as the grouping variable. Keep the strong invariance constraints in place. Click to show code ## Add the structural paths to the model: tora_sem <- paste(tora_cfa, 'intent ~ attitudes + norms behavior ~ intent + control', sep = '\\n') ## Estimate the model: toraOut <- sem(tora_sem, data = condom, group = "sex", group.equal = c("loadings", "intercepts") ) ## Check the results: summary(toraOut, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 62 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 79 ## Number of equality constraints 17 ## ## Number of observations per group: ## woman 161 ## man 89 ## ## Model Test User Model: ## ## Test statistic 141.903 ## Degrees of freedom 92 ## P-value (Chi-square) 0.001 ## Test statistic for each group: ## woman 83.870 ## man 58.033 ## ## Model Test Baseline Model: ## ## Test statistic 1378.913 ## Degrees of freedom 110 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.961 ## Tucker-Lewis Index (TLI) 0.953 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -3470.878 ## Loglikelihood unrestricted model (H1) -3399.927 ## ## Akaike (AIC) 7065.756 ## Bayesian (BIC) 7284.087 ## Sample-size adjusted Bayesian (SABIC) 7087.541 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.066 ## 90 Percent confidence interval - lower 0.043 ## 90 Percent confidence interval - upper 0.087 ## 
P-value H_0: RMSEA <= 0.050 0.114 ## P-value H_0: RMSEA >= 0.080 0.137 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.058 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [woman]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## attitudes =~ ## attit_1 1.000 0.919 0.816 ## attit_2 (.p2.) 1.023 0.066 15.495 0.000 0.940 0.877 ## attit_3 (.p3.) -1.016 0.066 -15.434 0.000 -0.935 -0.884 ## norms =~ ## norm_1 1.000 0.808 0.798 ## norm_2 (.p5.) 0.956 0.083 11.551 0.000 0.772 0.766 ## norm_3 (.p6.) 0.942 0.084 11.256 0.000 0.761 0.743 ## control =~ ## cntrl_1 1.000 0.671 0.646 ## cntrl_2 (.p8.) 0.846 0.125 6.768 0.000 0.567 0.545 ## cntrl_3 (.p9.) 1.008 0.129 7.814 0.000 0.676 0.665 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## intent ~ ## attitudes 0.436 0.082 5.335 0.000 0.401 0.403 ## norms 0.598 0.100 6.008 0.000 0.483 0.486 ## behavior ~ ## intent 0.347 0.064 5.436 0.000 0.347 0.351 ## control 0.727 0.138 5.274 0.000 0.488 0.496 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## attitudes ~~ ## norms 0.425 0.081 5.262 0.000 0.572 0.572 ## control 0.464 0.082 5.634 0.000 0.752 0.752 ## norms ~~ ## control 0.386 0.073 5.320 0.000 0.712 0.712 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .attit_1 (.31.) 2.863 0.084 34.071 0.000 2.863 2.542 ## .attit_2 (.32.) 2.884 0.082 35.225 0.000 2.884 2.690 ## .attit_3 (.33.) 3.168 0.081 39.065 0.000 3.168 2.995 ## .norm_1 (.34.) 2.802 0.076 36.874 0.000 2.802 2.767 ## .norm_2 (.35.) 2.830 0.075 37.555 0.000 2.830 2.807 ## .norm_3 (.36.) 2.796 0.076 36.725 0.000 2.796 2.730 ## .cntrl_1 (.37.) 2.855 0.077 37.078 0.000 2.855 2.749 ## .cntrl_2 (.38.) 2.866 0.076 37.909 0.000 2.866 2.753 ## .cntrl_3 (.39.) 2.897 0.076 37.969 0.000 2.897 2.849 ## .intent (.40.) 2.712 0.078 34.861 0.000 2.712 2.726 ## .behavir (.41.) 
1.630 0.175 9.289 0.000 1.630 1.658 ## attitds 0.000 0.000 0.000 ## norms 0.000 0.000 0.000 ## control 0.000 0.000 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .attit_1 0.423 0.059 7.172 0.000 0.423 0.333 ## .attit_2 0.266 0.045 5.879 0.000 0.266 0.231 ## .attit_3 0.246 0.043 5.671 0.000 0.246 0.219 ## .norm_1 0.372 0.060 6.230 0.000 0.372 0.363 ## .norm_2 0.420 0.062 6.760 0.000 0.420 0.413 ## .norm_3 0.469 0.066 7.063 0.000 0.469 0.448 ## .control_1 0.629 0.085 7.426 0.000 0.629 0.583 ## .control_2 0.762 0.094 8.088 0.000 0.762 0.703 ## .control_3 0.577 0.080 7.238 0.000 0.577 0.558 ## .intent 0.374 0.049 7.615 0.000 0.374 0.378 ## .behavior 0.391 0.052 7.486 0.000 0.391 0.404 ## attitudes 0.845 0.129 6.561 0.000 1.000 1.000 ## norms 0.653 0.108 6.048 0.000 1.000 1.000 ## control 0.450 0.101 4.457 0.000 1.000 1.000 ## ## R-Square: ## Estimate ## attit_1 0.667 ## attit_2 0.769 ## attit_3 0.781 ## norm_1 0.637 ## norm_2 0.587 ## norm_3 0.552 ## control_1 0.417 ## control_2 0.297 ## control_3 0.442 ## intent 0.622 ## behavior 0.596 ## ## ## Group 2 [man]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## attitudes =~ ## attit_1 1.000 0.922 0.815 ## attit_2 (.p2.) 1.023 0.066 15.495 0.000 0.943 0.817 ## attit_3 (.p3.) -1.016 0.066 -15.434 0.000 -0.937 -0.782 ## norms =~ ## norm_1 1.000 0.875 0.723 ## norm_2 (.p5.) 0.956 0.083 11.551 0.000 0.837 0.679 ## norm_3 (.p6.) 0.942 0.084 11.256 0.000 0.825 0.667 ## control =~ ## cntrl_1 1.000 0.663 0.631 ## cntrl_2 (.p8.) 0.846 0.125 6.768 0.000 0.561 0.467 ## cntrl_3 (.p9.) 
1.008 0.129 7.814 0.000 0.668 0.540 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## intent ~ ## attitudes 0.535 0.097 5.497 0.000 0.494 0.446 ## norms 0.858 0.111 7.702 0.000 0.751 0.679 ## behavior ~ ## intent 0.613 0.060 10.188 0.000 0.613 0.706 ## control 0.007 0.159 0.045 0.964 0.005 0.005 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## attitudes ~~ ## norms 0.060 0.108 0.552 0.581 0.074 0.074 ## control 0.426 0.107 3.978 0.000 0.697 0.697 ## norms ~~ ## control 0.220 0.096 2.291 0.022 0.380 0.380 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .attit_1 (.31.) 2.863 0.084 34.071 0.000 2.863 2.531 ## .attit_2 (.32.) 2.884 0.082 35.225 0.000 2.884 2.499 ## .attit_3 (.33.) 3.168 0.081 39.065 0.000 3.168 2.645 ## .norm_1 (.34.) 2.802 0.076 36.874 0.000 2.802 2.314 ## .norm_2 (.35.) 2.830 0.075 37.555 0.000 2.830 2.296 ## .norm_3 (.36.) 2.796 0.076 36.725 0.000 2.796 2.261 ## .cntrl_1 (.37.) 2.855 0.077 37.078 0.000 2.855 2.719 ## .cntrl_2 (.38.) 2.866 0.076 37.909 0.000 2.866 2.385 ## .cntrl_3 (.39.) 2.897 0.076 37.969 0.000 2.897 2.344 ## .intent (.40.) 2.712 0.078 34.861 0.000 2.712 2.450 ## .behavir (.41.) 
1.630 0.175 9.289 0.000 1.630 1.696 ## attitds 0.369 0.130 2.834 0.005 0.400 0.400 ## norms 0.534 0.126 4.246 0.000 0.610 0.610 ## control 0.309 0.115 2.691 0.007 0.466 0.466 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .attit_1 0.430 0.091 4.710 0.000 0.430 0.336 ## .attit_2 0.443 0.094 4.691 0.000 0.443 0.333 ## .attit_3 0.556 0.108 5.130 0.000 0.556 0.388 ## .norm_1 0.699 0.137 5.088 0.000 0.699 0.477 ## .norm_2 0.819 0.150 5.457 0.000 0.819 0.539 ## .norm_3 0.849 0.153 5.538 0.000 0.849 0.555 ## .control_1 0.663 0.135 4.908 0.000 0.663 0.602 ## .control_2 1.130 0.187 6.025 0.000 1.130 0.782 ## .control_3 1.082 0.191 5.673 0.000 1.082 0.708 ## .intent 0.363 0.089 4.091 0.000 0.363 0.296 ## .behavior 0.460 0.069 6.671 0.000 0.460 0.498 ## attitudes 0.850 0.164 5.190 0.000 1.000 1.000 ## norms 0.766 0.171 4.495 0.000 1.000 1.000 ## control 0.440 0.133 3.301 0.001 1.000 1.000 ## ## R-Square: ## Estimate ## attit_1 0.664 ## attit_2 0.667 ## attit_3 0.612 ## norm_1 0.523 ## norm_2 0.461 ## norm_3 0.445 ## control_1 0.398 ## control_2 0.218 ## control_3 0.292 ## intent 0.704 ## behavior 0.502 7.4.3.4 Conduct an omnibus test to check if sex moderates any of the latent regression paths in the model from 7.4.3.3. Click to show code ## Estimate a restricted model wherein the latent regressions are all equated ## across groups. toraOut0 <- sem(tora_sem, data = condom, group = "sex", group.equal = c("loadings", "intercepts", "regressions") ) ## Test the constraints: anova(toraOut, toraOut0) Click for explanation We can equate the latent regressions by adding \"regressions\" to the group.equal argument in sem(). Then, we simply test this constrained model against the model from 7.4.3.3, in which the regression paths are freely estimated in each group, to get our test of moderation. Equating all regression paths across groups produces a significant loss of fit (\\(\\Delta \\chi^2[4] = 52.86\\), \\(p < 0.001\\)). Therefore, sex must moderate at least some of these paths. 
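The model comparisons in 7.4.3.2 (compareFit()) and 7.4.3.4 (anova()) are chi-squared difference tests: the increase in the test statistic caused by the added constraints is referred to a chi-squared distribution whose degrees of freedom equal the number of added constraints. As a minimal sketch, assuming plain ML test statistics (no scaled or robust correction), the arithmetic can be reproduced in base R from the numbers printed above. The helper chisq_diff_test() is hypothetical and written only for illustration; it is not a lavaan or semTools function.

```r
## Hypothetical helper (not part of lavaan or semTools): chi-squared difference
## test for two nested models, assuming plain ML test statistics.
chisq_diff_test <- function(chisq_restricted, df_restricted,
                            chisq_free, df_free) {
  d_chisq <- chisq_restricted - chisq_free  # loss of fit caused by the constraints
  d_df    <- df_restricted - df_free        # number of added constraints
  p       <- pchisq(d_chisq, df = d_df, lower.tail = FALSE)
  c(d_chisq = d_chisq, d_df = d_df, p = p)
}

## Fit statistics copied from the compareFit() output in 7.4.3.2:
weak_vs_config <- chisq_diff_test(68.557, 54, 66.565, 48)
strong_vs_weak <- chisq_diff_test(72.050, 60, 68.557, 54)

## Omnibus moderation test in 7.4.3.4, using the reported difference directly:
p_omnibus <- pchisq(52.86, df = 4, lower.tail = FALSE)

round(weak_vs_config, 3)
round(strong_vs_weak, 3)
p_omnibus < 0.001
```

Up to rounding of the printed inputs, the two invariance comparisons should match the Chisq diff and Pr(>Chisq) columns reported by compareFit().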
7.4.3.5 Conduct a two-parameter test to check if sex moderates the effects of intent and control on behavior. Use the lavTestWald() function to conduct your test. Keep only the weak invariance constraints when estimating the model. Click to show code ## Add the structural paths to the model and assign labels: tora_sem <- paste(tora_cfa, 'intent ~ attitudes + norms behavior ~ c(b1f, b1m) * intent + c(b2f, b2m) * control', sep = '\\n') ## Estimate the model: toraOut <- sem(tora_sem, data = condom, group = "sex", group.equal = "loadings") ## Check the results: summary(toraOut, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE) ## lavaan 0.6.16 ended normally after 60 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 76 ## Number of equality constraints 6 ## ## Number of observations per group: ## woman 161 ## man 89 ## ## Model Test User Model: ## ## Test statistic 119.722 ## Degrees of freedom 84 ## P-value (Chi-square) 0.006 ## Test statistic for each group: ## woman 76.908 ## man 42.814 ## ## Model Test Baseline Model: ## ## Test statistic 1378.913 ## Degrees of freedom 110 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.972 ## Tucker-Lewis Index (TLI) 0.963 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -3459.788 ## Loglikelihood unrestricted model (H1) -3399.927 ## ## Akaike (AIC) 7059.576 ## Bayesian (BIC) 7306.078 ## Sample-size adjusted Bayesian (SABIC) 7084.172 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.058 ## 90 Percent confidence interval - lower 0.032 ## 90 Percent confidence interval - upper 0.081 ## P-value H_0: RMSEA <= 0.050 0.272 ## P-value H_0: RMSEA >= 0.080 0.058 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.047 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## ## Group 1 [woman]: ## ## Latent Variables: ## 
Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## attitudes =~ ## attit_1 1.000 0.909 0.813 ## attit_2 (.p2.) 1.047 0.069 15.249 0.000 0.951 0.882 ## attit_3 (.p3.) -1.025 0.068 -15.075 0.000 -0.931 -0.882 ## norms =~ ## norm_1 1.000 0.824 0.809 ## norm_2 (.p5.) 0.936 0.083 11.256 0.000 0.771 0.768 ## norm_3 (.p6.) 0.908 0.084 10.810 0.000 0.748 0.734 ## control =~ ## cntrl_1 1.000 0.690 0.666 ## cntrl_2 (.p8.) 0.832 0.126 6.593 0.000 0.574 0.551 ## cntrl_3 (.p9.) 1.006 0.131 7.673 0.000 0.694 0.682 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## intent ~ ## attituds 0.441 0.083 5.294 0.000 0.400 0.403 ## norms 0.580 0.098 5.918 0.000 0.478 0.480 ## behavior ~ ## intent (b1f) 0.531 0.074 7.127 0.000 0.531 0.524 ## control (b2f) 0.490 0.130 3.767 0.000 0.338 0.336 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## attitudes ~~ ## norms 0.428 0.081 5.266 0.000 0.572 0.572 ## control 0.454 0.082 5.517 0.000 0.723 0.723 ## norms ~~ ## control 0.384 0.074 5.169 0.000 0.676 0.676 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .attit_1 2.839 0.088 32.237 0.000 2.839 2.541 ## .attit_2 2.907 0.085 34.191 0.000 2.907 2.695 ## .attit_3 3.174 0.083 38.126 0.000 3.174 3.005 ## .norm_1 2.832 0.080 35.280 0.000 2.832 2.780 ## .norm_2 2.832 0.079 35.765 0.000 2.832 2.819 ## .norm_3 2.795 0.080 34.787 0.000 2.795 2.742 ## .control_1 2.851 0.082 34.875 0.000 2.851 2.749 ## .control_2 2.857 0.082 34.813 0.000 2.857 2.744 ## .control_3 2.888 0.080 35.985 0.000 2.888 2.836 ## .intent 2.677 0.078 34.159 0.000 2.677 2.692 ## .behavior 1.107 0.207 5.338 0.000 1.107 1.098 ## attitudes 0.000 0.000 0.000 ## norms 0.000 0.000 0.000 ## control 0.000 0.000 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .attit_1 0.423 0.059 7.208 0.000 0.423 0.339 ## .attit_2 0.259 0.045 5.713 0.000 0.259 0.223 ## .attit_3 0.248 0.044 5.701 0.000 0.248 0.222 ## .norm_1 0.359 0.060 5.989 0.000 0.359 0.346 ## 
.norm_2 0.415 0.062 6.711 0.000 0.415 0.411 ## .norm_3 0.479 0.067 7.152 0.000 0.479 0.461 ## .control_1 0.599 0.085 7.027 0.000 0.599 0.557 ## .control_2 0.755 0.095 7.940 0.000 0.755 0.696 ## .control_3 0.555 0.081 6.822 0.000 0.555 0.535 ## .intent 0.382 0.050 7.657 0.000 0.382 0.386 ## .behavior 0.402 0.049 8.152 0.000 0.402 0.396 ## attitudes 0.825 0.127 6.495 0.000 1.000 1.000 ## norms 0.679 0.112 6.066 0.000 1.000 1.000 ## control 0.477 0.106 4.483 0.000 1.000 1.000 ## ## R-Square: ## Estimate ## attit_1 0.661 ## attit_2 0.777 ## attit_3 0.778 ## norm_1 0.654 ## norm_2 0.589 ## norm_3 0.539 ## control_1 0.443 ## control_2 0.304 ## control_3 0.465 ## intent 0.614 ## behavior 0.604 ## ## ## Group 2 [man]: ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## attitudes =~ ## attit_1 1.000 0.921 0.812 ## attit_2 (.p2.) 1.047 0.069 15.249 0.000 0.964 0.831 ## attit_3 (.p3.) -1.025 0.068 -15.075 0.000 -0.944 -0.787 ## norms =~ ## norm_1 1.000 0.928 0.753 ## norm_2 (.p5.) 0.936 0.083 11.256 0.000 0.869 0.698 ## norm_3 (.p6.) 0.908 0.084 10.810 0.000 0.843 0.676 ## control =~ ## cntrl_1 1.000 0.669 0.634 ## cntrl_2 (.p8.) 0.832 0.126 6.593 0.000 0.556 0.464 ## cntrl_3 (.p9.) 
1.006 0.131 7.673 0.000 0.673 0.547 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## intent ~ ## attituds 0.501 0.098 5.134 0.000 0.462 0.432 ## norms 0.749 0.112 6.696 0.000 0.695 0.649 ## behavior ~ ## intent (b1m) 0.344 0.086 4.005 0.000 0.344 0.453 ## control (b2m) 0.307 0.168 1.830 0.067 0.205 0.253 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## attitudes ~~ ## norms 0.084 0.113 0.742 0.458 0.098 0.098 ## control 0.424 0.107 3.960 0.000 0.688 0.688 ## norms ~~ ## control 0.240 0.102 2.361 0.018 0.387 0.387 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .attit_1 3.270 0.120 27.179 0.000 3.270 2.881 ## .attit_2 3.180 0.123 25.848 0.000 3.180 2.740 ## .attit_3 2.787 0.127 21.900 0.000 2.787 2.321 ## .norm_1 3.236 0.131 24.790 0.000 3.236 2.628 ## .norm_2 3.337 0.132 25.309 0.000 3.337 2.683 ## .norm_3 3.303 0.132 24.992 0.000 3.303 2.649 ## .control_1 3.157 0.112 28.223 0.000 3.157 2.992 ## .control_2 3.135 0.127 24.653 0.000 3.135 2.613 ## .control_3 3.213 0.130 24.627 0.000 3.213 2.610 ## .intent 3.427 0.113 30.233 0.000 3.427 3.205 ## .behavior 2.607 0.303 8.614 0.000 2.607 3.211 ## attitudes 0.000 0.000 0.000 ## norms 0.000 0.000 0.000 ## control 0.000 0.000 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .attit_1 0.439 0.092 4.795 0.000 0.439 0.341 ## .attit_2 0.417 0.092 4.518 0.000 0.417 0.310 ## .attit_3 0.549 0.107 5.120 0.000 0.549 0.381 ## .norm_1 0.656 0.137 4.801 0.000 0.656 0.432 ## .norm_2 0.792 0.148 5.342 0.000 0.792 0.512 ## .norm_3 0.845 0.153 5.504 0.000 0.845 0.543 ## .control_1 0.666 0.134 4.955 0.000 0.666 0.598 ## .control_2 1.130 0.187 6.053 0.000 1.130 0.785 ## .control_3 1.063 0.187 5.669 0.000 1.063 0.701 ## .intent 0.385 0.086 4.471 0.000 0.385 0.337 ## .behavior 0.399 0.063 6.328 0.000 0.399 0.605 ## attitudes 0.849 0.164 5.174 0.000 1.000 1.000 ## norms 0.861 0.190 4.528 0.000 1.000 1.000 ## control 0.447 0.134 3.334 0.001 1.000 
1.000 ## ## R-Square: ## Estimate ## attit_1 0.659 ## attit_2 0.690 ## attit_3 0.619 ## norm_1 0.568 ## norm_2 0.488 ## norm_3 0.457 ## control_1 0.402 ## control_2 0.215 ## control_3 0.299 ## intent 0.663 ## behavior 0.395 ## Test the constraints: lavTestWald(toraOut, "b1f == b1m; b2f == b2m") ## $stat ## [1] 9.85773 ## ## $df ## [1] 2 ## ## $p.value ## [1] 0.00723471 ## ## $se ## [1] "standard" Click for explanation The Wald test suggests significant moderation (\\(\\chi^2[2] = 9.86\\), \\(p = 0.007\\)). Constraining these two regression slopes to be equal across groups is not supported by the data; unlike a likelihood ratio test, the Wald test reaches this conclusion from the unconstrained model's estimates alone, without fitting the constrained model. Therefore, sex must moderate one or both of these paths. End of In-Class Exercises "],["wrap-up.html", "8 Wrap-Up", " 8 Wrap-Up There will be no new lecture or practical content this week. This is an open week that we’ll use to tie up any loose ends and wrap up the course content. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]]
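The p-value returned by lavTestWald() in 7.4.3.5 comes from referring the Wald statistic to a chi-squared distribution with one degree of freedom per equality constraint (here two: b1f == b1m and b2f == b2m). A small base-R sketch, using only the statistic printed above, shows this; for 2 df the chi-squared upper-tail probability has the closed form exp(-x / 2), which makes a convenient cross-check.

```r
## Wald statistic copied from the lavTestWald() output in 7.4.3.5.
wald_stat <- 9.85773
wald_df   <- 2  # one df per equality constraint: b1f == b1m, b2f == b2m

## Upper-tail chi-squared probability, as reported in $p.value.
p_wald <- pchisq(wald_stat, df = wald_df, lower.tail = FALSE)

## For df = 2 the chi-squared survival function reduces to exp(-x / 2).
p_closed_form <- exp(-wald_stat / 2)

all.equal(p_wald, p_closed_form)
signif(p_wald, 6)
```

Both routes should agree with the $p.value element printed by lavTestWald(), up to the rounding of the printed statistic.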
diff --git a/docs/software-setup.html b/docs/software-setup.html
index 4edf025..c178324 100644
--- a/docs/software-setup.html
+++ b/docs/software-setup.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/statistical-modeling-path-analysis.html b/docs/statistical-modeling-path-analysis.html
index 3dc9850..62cd0aa 100644
--- a/docs/statistical-modeling-path-analysis.html
+++ b/docs/statistical-modeling-path-analysis.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/typographic-conventions.html b/docs/typographic-conventions.html
index 444c646..c4c0921 100644
--- a/docs/typographic-conventions.html
+++ b/docs/typographic-conventions.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/weekly-preparation.html b/docs/weekly-preparation.html
index 73124f9..5f96ed5 100644
--- a/docs/weekly-preparation.html
+++ b/docs/weekly-preparation.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/docs/willy_wallaby_from_wasatch_files/figure-html/unnamed-chunk-132-1.png b/docs/willy_wallaby_from_wasatch_files/figure-html/unnamed-chunk-132-1.png
index 96c9b58..cb39d01 100644
Binary files a/docs/willy_wallaby_from_wasatch_files/figure-html/unnamed-chunk-132-1.png and b/docs/willy_wallaby_from_wasatch_files/figure-html/unnamed-chunk-132-1.png differ
diff --git a/docs/willy_wallaby_from_wasatch_files/figure-html/unnamed-chunk-134-1.png b/docs/willy_wallaby_from_wasatch_files/figure-html/unnamed-chunk-134-1.png
index 7368148..b08f1d1 100644
Binary files a/docs/willy_wallaby_from_wasatch_files/figure-html/unnamed-chunk-134-1.png and b/docs/willy_wallaby_from_wasatch_files/figure-html/unnamed-chunk-134-1.png differ
diff --git a/docs/wrap-up.html b/docs/wrap-up.html
index 6f541e6..39f4bfa 100644
--- a/docs/wrap-up.html
+++ b/docs/wrap-up.html
@@ -424,12 +424,12 @@
6.4 In-Class Exercises
7 Multiple Group Models
@@ -448,6 +448,8 @@
7.4 In-Class Exercises
8 Wrap-Up
diff --git a/sections/week6/class.Rmd b/sections/week6/class.Rmd
index 60fc269..c9e1f61 100644
--- a/sections/week6/class.Rmd
+++ b/sections/week6/class.Rmd
@@ -54,7 +54,7 @@ condom <- read.csv(paste0(dataDir, "toradata.csv"), stringsAsFactors = TRUE)
---
-###
+### {#toraCFA}
The data contain multiple indicators of *attitudes*, *norms*, and *control*.
Run a CFA for these three latent variables.
diff --git a/sections/week7/class.Rmd b/sections/week7/class.Rmd
index 14ff5a4..de46b3c 100644
--- a/sections/week7/class.Rmd
+++ b/sections/week7/class.Rmd
@@ -15,6 +15,37 @@ library(lavaan)
We'll now pick up where we left off with the [At-Home Exercises](at-home-exercises-6.html)
by testing measurement invariance in the two-group CFA of prolonged grief disorder.
+As you saw in the lecture, measurement invariance testing allows us to
+empirically test for differences in the measurement model between the groups. If
+we can establish measurement invariance, we can draw the following (equivalent)
+conclusions:
+
+- Our latent constructs are defined equivalently in all groups.
+- The participants in every group are interpreting our items in the same way.
+- Participants in different groups who have the same values for the observed
+indicators will also have the same score on the latent variable.
+- Between-group differences in latent parameters are due to true differences in
+the underlying latent constructs and not caused by differences in measurement.
+
+Any time we make between-group comparisons (e.g., ANOVA, t-tests, moderation by
+group), we assume invariant measurement. That is, we assume that the scores
+we're comparing have the same meaning in each group. In multiple-group SEM,
+however, we can actually test this very important, and often violated,
+assumption.
+
+The process of testing measurement invariance can get quite complex, but the
+basic procedure boils down to using model comparison tests to evaluate the
+plausibility of increasingly strong between-group constraints. For most problems,
+these constraints amount to the following three levels:
+
+1. *Configural:* The same pattern of free and fixed effects in all groups
+1. *Weak (aka Metric):* Configural + equal factor loadings in all groups
+1. *Strong (aka Scalar):* Weak + equal item intercepts in all groups
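+
+For reference, these levels can be sketched in generic notation (not taken from
+the lecture slides). Write the measurement model for the vector of observed
+items $\mathbf{y}_{ig}$ of person $i$ in group $g$ as
+$\mathbf{y}_{ig} = \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\eta}_{ig} + \boldsymbol{\varepsilon}_{ig}$,
+where $\boldsymbol{\tau}_g$ holds the item intercepts and $\boldsymbol{\Lambda}_g$
+the factor loadings. Each level then adds the following constraints across the
+$G$ groups:
+
+\[
+\begin{aligned}
+\text{Configural:}&\quad \boldsymbol{\Lambda}_1, \ldots, \boldsymbol{\Lambda}_G \text{ share the same pattern of fixed and free entries}\\
+\text{Weak:}&\quad \boldsymbol{\Lambda}_1 = \boldsymbol{\Lambda}_2 = \cdots = \boldsymbol{\Lambda}_G\\
+\text{Strong:}&\quad \boldsymbol{\Lambda}_1 = \cdots = \boldsymbol{\Lambda}_G \;\text{ and }\; \boldsymbol{\tau}_1 = \cdots = \boldsymbol{\tau}_G
+\end{aligned}
+\]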
+
+You can read more about measurement invariance [here][van_de_schoot_et_al] and
+[here][putnick_bornstein], and you can find a brief discussion of how to
+conduct measurement invariance tests in **lavaan** [here][mi_tutorial].
+
---
####
@@ -28,6 +59,8 @@ Load the *PGDdata2.txt* data as you did for the At-Home Exercises.
Click to show code
```{r eval = FALSE}
+library(dplyr)
+
## Load the data:
pgd <- read.table("PGDdata2.txt",
na.strings = "-999",
@@ -42,6 +75,7 @@ str(pgd)
```
```{r echo = FALSE}
+dataDir <- "../../../../data/"
pgd <- read.table(paste0(dataDir, "PGDdata2.txt"),
na.strings = "-999",
header = TRUE,
@@ -57,7 +91,7 @@ str(pgd)
---
-####
+#### {#miTesting}
Test *configural*, *weak*, and *strong* invariance using the multiple-group CFA
from \@ref(twoGroupCFA).
@@ -87,12 +121,18 @@ strongOut <- cfa(cfaMod,
group.equal = c("loadings", "intercepts")
)
+summary(configOut)
+summary(weakOut)
+summary(strongOut)
+
## Test invariance through model comparison tests:
compareFit(configOut, weakOut, strongOut) %>% summary()
```
```{r, echo = FALSE}
-fit1 <- fitMeasures(configOut, c("chisq", "df", "pvalue", "rmsea", "cfi", "tli", "srmr"))
+fit1 <- fitMeasures(configOut,
+ c("chisq", "df", "pvalue", "rmsea", "cfi", "tli", "srmr")
+ )
cChi2 <- fit1["chisq"] %>% round(2)
cDf <- fit1["df"]
@@ -103,11 +143,11 @@ srmr <- fit1["srmr"] %>% round(3)
tmp <- anova(configOut, weakOut, strongOut)
-wChi2 <- tmp[2, "Chisq diff"] %>% round(3)
+wChi2 <- tmp[2, "Chisq diff"] %>% round(2)
wDf <- tmp[2, "Df diff"]
wP <- tmp[2, "Pr(>Chisq)"] %>% round(3)
-sChi2 <- tmp[3, "Chisq diff"] %>% round(3)
+sChi2 <- tmp[3, "Chisq diff"] %>% round(2)
sDf <- tmp[3, "Df diff"]
sP <- tmp[3, "Pr(>Chisq)"] %>% round(3)
```
@@ -134,10 +174,355 @@ sP <- tmp[3, "Pr(>Chisq)"] %>% round(3)
---
-End of In-Class Exercises 7
+### Testing Between-Group Differences
+
+---
+
+Once we establish strong invariance, we have empirical evidence that latent mean
+levels are comparable across groups. Hence, we can test for differences in those
+latent means. In this section, we'll conduct the equivalent of a t-test using
+our two-group CFA model.
+
+More specifically, we want to know if the latent mean of `grief` differs
+significantly between the `Kin2` groups. The null and alternative hypotheses for
+this test are as follows:
+
+\[
+H_0: \alpha_1 = \alpha_2\\
+H_1: \alpha_1 \neq \alpha_2
+\]
+
+where $\alpha_1$ and $\alpha_2$ represent the latent means of `grief` in Group 1
+and Group 2, respectively.
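+
+A latent variable has no natural origin, so both latent means cannot be
+estimated freely at the same time. When item intercepts are equated across
+groups, **lavaan** by default fixes the latent mean to zero in the first group and
+estimates it freely in the other groups. In the strongly invariant model, the
+hypotheses above therefore reduce to a test of the Group 2 mean. That mean is
+then interpreted as the mean difference relative to Group 1:
+
+\[
+\alpha_1 = 0 \quad \Rightarrow \quad H_0: \alpha_2 = 0, \qquad H_1: \alpha_2 \neq 0
+\]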
+
+---
+
+####
+
+Use the strongly invariant model from \@ref(miTesting) to test the latent mean
+difference described above.
+
+- What is your conclusion?
+
+
+ Click to show code
+
+```{r}
+## Estimate the model with the latent means equated:
+resOut <- cfa(cfaMod,
+ data = pgd,
+ group = "Kin2",
+ group.equal = c("loadings", "intercepts", "means")
+ )
+
+## Check the results:
+summary(resOut)
+
+## Test the mean differences via a LRT:
+anova(strongOut, resOut)
+```
+
+```{r echo = FALSE}
+tmp <- anova(strongOut, resOut)
+
+chi2 <- tmp[2, "Chisq diff"] %>% round(2)
+df <- tmp[2, "Df diff"]
+p <- tmp[2, "Pr(>Chisq)"] %>% round(3)
+```
+
+
+ Click for explanation
+We can equate the latent means by adding `"means"` to the `group.equal` argument
+of `cfa()`. Then, we simply test this constrained model against the strong
+invariance model to get our test of mean differences.
+
+In this case, the means of the `grief` factor do not significantly differ between
+`Kin2` groups ($\Delta \chi^2[`r df`] = `r chi2`$, $p = `r p`$).
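+
+The test that `anova()` reports here is a likelihood-ratio (chi-square
+difference) test. Assuming the default ML estimator, it compares the two nested
+models as follows. Here the "restricted" model is the one with equal latent
+means:
+
+\[
+\Delta \chi^2 = \chi^2_{\text{restricted}} - \chi^2_{\text{strong}}, \qquad
+\Delta df = df_{\text{restricted}} - df_{\text{strong}}, \qquad
+p = P\left(\chi^2_{\Delta df} \geq \Delta \chi^2\right)
+\]
+
+If `cfaMod` contains only the `grief` factor, equating its mean across two groups
+removes a single free parameter, so $\Delta df$ should equal 1.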
+
+
+
+
+---
+
+### Multiple-Group SEM for Moderation
+
+---
+
+Now, we're going to revisit the TORA model from the
+[Week 6 In-Class Exercises](in-class-exercises-5.html), and use a multiple-group
+model to test the moderating effect of *sex*.
+
+---
+
+####
+
+Load the data contained in the *toradata.csv* file.
+
+
+ Click to show code
+
+```{r, eval = FALSE}
+condom <- read.csv("toradata.csv", stringsAsFactors = TRUE)
+```
+
+```{r, echo = FALSE}
+dataDir <- "../../../../data/"
+condom <- read.csv(paste0(dataDir, "toradata.csv"), stringsAsFactors = TRUE)
+```
+
+
+
+---
+
+Before we get to any moderation tests, however, we need to establish
+measurement invariance. Measurement invariance testing is the first step in any
+multiple-group analysis that includes latent variables.
+
+---
+
+####
+
+Test for measurement invariance across *sex* groups in the three latent variables
+of the TORA model from \@ref(toraCFA).
+
+- Test configural, weak, and strong invariance.
+- Test for invariance in all three latent factors simultaneously.
+- Is full measurement invariance (i.e., up to and including strong invariance)
+supported?
+
+
+ Click to show code
+
+```{r}
+tora_cfa <- '
+ attitudes =~ attit_1 + attit_2 + attit_3
+ norms =~ norm_1 + norm_2 + norm_3
+ control =~ control_1 + control_2 + control_3
+'
+
+## Estimate the models:
+config <- cfa(tora_cfa, data = condom, group = "sex")
+weak <- cfa(tora_cfa, data = condom, group = "sex", group.equal = "loadings")
+strong <- cfa(tora_cfa,
+ data = condom,
+ group = "sex",
+ group.equal = c("loadings", "intercepts")
+ )
+
+## Check that everything went well:
+summary(config)
+summary(weak)
+summary(strong)
+
+## Test measurement invariance:
+compareFit(config, weak, strong) %>% summary()
+
+## Make sure the strongly invariant model still fits well in an absolute sense:
+fitMeasures(strong)
+```
+
+```{r, echo = FALSE}
+fit1 <- fitMeasures(config,
+ c("chisq", "df", "pvalue", "rmsea", "cfi", "tli", "srmr")
+ )
+
+chi2 <- fit1["chisq"] %>% round(2)
+df <- fit1["df"]
+p <- fit1["pvalue"] %>% round(3)
+rmsea <- fit1["rmsea"] %>% round(3)
+cfi <- fit1["cfi"] %>% round(3)
+srmr <- fit1["srmr"] %>% round(3)
+
+tmp <- anova(config, weak, strong)
+
+wChi2 <- tmp[2, "Chisq diff"] %>% round(2)
+wDf <- tmp[2, "Df diff"]
+wP <- tmp[2, "Pr(>Chisq)"] %>% round(3)
+
+sChi2 <- tmp[3, "Chisq diff"] %>% round(2)
+sDf <- tmp[3, "Df diff"]
+sP <- tmp[3, "Pr(>Chisq)"] %>% round(3)
+
+fit2 <- fitMeasures(strong,
+ c("chisq", "df", "pvalue", "rmsea", "cfi", "tli", "srmr")
+ )
+
+strongChi2 <- fit2["chisq"] %>% round(2)
+strongDf <- fit2["df"]
+strongP <- fit2["pvalue"] %>% round(3)
+strongRmsea <- fit2["rmsea"] %>% round(3)
+strongCfi <- fit2["cfi"] %>% round(3)
+strongSrmr <- fit2["srmr"] %>% round(3)
+```
+
+
+ Click for explanation
+
+Yes, we have been able to establish full measurement invariance.
+
+- Configural invariance holds.
+ - The unrestricted, multiple-group model fits the data well
+ ($\chi^2[`r df`] = `r chi2`$,
+ $p = `r p`$,
+ $\textit{RMSEA} = `r rmsea`$,
+ $\textit{CFI} = `r cfi`$,
+ $\textit{SRMR} = `r srmr`$).
+- Weak invariance holds.
+ - The model comparison test shows a non-significant loss of fit between the
+ configural and weak models ($\Delta \chi^2[`r wDf`] = `r wChi2`$, $p = `r wP`$).
+- Strong invariance holds.
+ - The model comparison test shows a non-significant loss of fit between the
+ weak and strong models ($\Delta \chi^2[`r sDf`] = `r sChi2`$, $p = `r sP`$).
+ - The strongly invariant model still fits the data well
+ ($\chi^2[`r strongDf`] = `r strongChi2`$,
+ $p = `r strongP`$,
+ $\textit{RMSEA} = `r strongRmsea`$,
+ $\textit{CFI} = `r strongCfi`$,
+ $\textit{SRMR} = `r strongSrmr`$).
+
+
+
+
+---
+
+Once we've established measurement invariance, we can move on to testing
+hypotheses about between-group differences secure in the knowledge that our
+latent factors represent the same hypothetical constructs in all groups.
+
+---
+
+#### {#toraFullModel}
+
+Estimate the full TORA model from \@ref(updatedModel) as a multiple-group model.
+
+- Use `sex` as the grouping variable.
+- Keep the strong invariance constraints in place.
+
+
+ Click to show code
+
+```{r}
+## Add the structural paths to the model:
+tora_sem <- paste(tora_cfa,
+ 'intent ~ attitudes + norms
+ behavior ~ intent + control',
+ sep = '\n')
+
+## Estimate the model:
+toraOut <- sem(tora_sem,
+ data = condom,
+ group = "sex",
+ group.equal = c("loadings", "intercepts")
+ )
+
+## Check the results:
+summary(toraOut, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
+```
+
+
+
+---
+
+####
+
+Conduct an omnibus test to check if `sex` moderates any of the latent regression
+paths in the model from \@ref(toraFullModel).
+
+
+ Click to show code
+
+```{r}
+## Estimate a restricted model wherein the latent regressions are all equated
+## across groups.
+toraOut0 <- sem(tora_sem,
+ data = condom,
+ group = "sex",
+ group.equal = c("loadings", "intercepts", "regressions")
+ )
+
+## Test the constraints:
+anova(toraOut, toraOut0)
+```
+
+```{r echo = FALSE}
+tmp <- anova(toraOut, toraOut0)
+
+chi2 <- tmp[2, "Chisq diff"] %>% round(2)
+df <- tmp[2, "Df diff"]
+p <- tmp[2, "Pr(>Chisq)"] %>% round(3)
+```
+
+
+ Click for explanation
+
+We can equate the latent regressions by adding `"regressions"` to the
+`group.equal` argument of `sem()`. Then, we simply test this constrained model
+against the model from \@ref(toraFullModel), where the regressions are free, to
+get our test of moderation.
+
+Equating all regression paths across groups produces a significant loss of fit
+($\Delta \chi^2[`r df`] = `r chi2`$, $p < 0.001$). Therefore, *sex* must
+moderate at least some of these paths.
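+
+Written out, the omnibus test evaluates the hypotheses below. Here
+$\beta_{k,g}$ is notation introduced only for illustration. It denotes the $k$th
+of the four latent regression slopes (`intent ~ attitudes`, `intent ~ norms`,
+`behavior ~ intent`, `behavior ~ control`) in group $g$.
+
+\[
+H_0: \beta_{k,1} = \beta_{k,2} \text{ for all } k = 1, \ldots, 4\\
+H_1: \beta_{k,1} \neq \beta_{k,2} \text{ for at least one } k
+\]
+
+Four slopes are equated across two groups, so the restricted model has four more
+degrees of freedom than the model from \@ref(toraFullModel).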
+
+
+
+
+---
+
+####
+
+Conduct a two-parameter test to check if `sex` moderates the effects of `intent`
+and `control` on `behavior`.
+
+- Use the `lavTestWald()` function to conduct your test.
+- Keep only the weak invariance constraints when estimating the model.
+
+
+ Click to show code
+
+```{r}
+## Add the structural paths to the model and assign labels:
+tora_sem <- paste(tora_cfa,
+ 'intent ~ attitudes + norms
+ behavior ~ c(b1f, b1m) * intent + c(b2f, b2m) * control',
+ sep = '\n')
+
+## Estimate the model:
+toraOut <- sem(tora_sem, data = condom, group = "sex", group.equal = "loadings")
+
+## Check the results:
+summary(toraOut, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
+
+## Test the constraints:
+lavTestWald(toraOut, "b1f == b1m; b2f == b2m")
+```
+
+```{r echo = FALSE}
+tmp <- lavTestWald(toraOut, "b1f == b1m; b2f == b2m")
+
+chi2 <- round(tmp$stat, 2)
+df <- tmp$df
+p <- round(tmp$p.value, 3)
+```
+
+ Click for explanation
+
+The Wald test suggests significant moderation ($\chi^2[`r df`] = `r chi2`$,
+$p = `r p`$). Constraining these two regression slopes to be equal across groups
+would produce a significant loss of fit. Therefore, *sex* must moderate one or
+both of these paths.
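+
+Unlike the likelihood-ratio test, the Wald test does not re-estimate a
+constrained model. It uses only the estimates from the model in which the slopes
+are free, together with their sampling covariance matrix. The notation below is
+generic, not **lavaan**'s internal notation:
+
+- $\hat{\boldsymbol{\theta}}$ is the vector of estimates.
+- $\widehat{\mathbf{V}}$ is its estimated covariance matrix.
+- $\mathbf{R}$ is a contrast matrix whose $q$ rows express the constraints. Here
+those constraints are $b_{1f} - b_{1m} = 0$ and $b_{2f} - b_{2m} = 0$, so $q = 2$.
+
+\[
+W = \left(\mathbf{R}\hat{\boldsymbol{\theta}}\right)^{\top}
+\left(\mathbf{R}\,\widehat{\mathbf{V}}\,\mathbf{R}^{\top}\right)^{-1}
+\left(\mathbf{R}\hat{\boldsymbol{\theta}}\right)
+\;\sim\; \chi^2_{q} \quad \text{(asymptotically, under } H_0\text{)}
+\]
+
+The degrees of freedom equal the number of constraints. This is why
+`lavTestWald()` reports 2 degrees of freedom for this test.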
+
+
+
+
+---
+
+End of In-Class Exercises
---
[sesam2_data]: https://surfdrive.surf.nl/files/index.php/s/dfzC7Tf5HHiTX8M/download
[pgd_data]: https://surfdrive.surf.nl/files/index.php/s/xxkp4gZY682AGyY/download
+[tora_data]: https://surfdrive.surf.nl/files/index.php/s/I8IxckbNJlY5bQ3/download
[boelen_et_al_2010]: https://doi.org/10.1016/j.jad.2010.01.076
+[mi_tutorial]: https://www.lavaan.ugent.be/tutorial/groups.html
+[van_de_schoot_et_al]: https://doi.org/10.1080/17405629.2012.686740
+[putnick_bornstein]: https://doi.org/10.1016/j.dr.2016.06.004