-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathindex.Rmd
275 lines (218 loc) · 19 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
---
title: "Theory Construction and Statistical Modeling"
subtitle: "A guide to structural equation modeling in R"
author: "Caspar J. van Lissa¹"
date: "¹Utrecht University, Methodology & Statistics"
site: bookdown::bookdown_site
output:
bookdown::gitbook:
config:
toc:
collapse: section
search: yes
fontsettings:
size: 2
split_by: section
df_print: paged
documentclass: book
bibliography: [book.bib, packages.bib]
biblio-style: apalike
link-citations: yes
description: "This is a guide for the elective course TCSM."
---
# Course {-}
```{r global_options, include=FALSE}
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
schedule <- list(
lectures = rbind(data.frame(
weeks = c(37:38, 42:44),
days = "Tuesday",
time = "13:15 - 14:15",
location = "online"),
data.frame(
weeks = c(39, 41),
days = "Tuesday",
time = "14:15 - 15:00",
location = "online")),
tutorials = data.frame(
weeks = c(37:39, 41:44),
days = "Thursday",
time = "10:00 - 12:45",
location = "online"),
qna = data.frame(
weeks = c(37:39, 41:44),
days = "Monday",
time = "13:15 - 15:00",
location = "online"),
Exams = data.frame(
weeks = c(40, 43, 45),
days = "Tuesday",
time = "18:00",
location = "blackboard"))
schedule <- do.call(rbind, lapply(names(schedule), function(x){
cbind(schedule[[x]], Activity = x)
}))
schedule$Date <- as.Date(paste(2021, schedule$weeks, 1, sep="-"), "%Y-%U-%u") + (as.numeric(
factor(schedule$days, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")))-1)
schedule$Date <- format(schedule$Date, "%d-%m-%Y")
topics <- c("Introduction","Factor Analysis (exploratory, EFA)", "Factor Analysis (confirmatory, CFA)",
"GLM: Means and SEM", "Mediation", "Moderation")
names(topics) <- sort(unique(schedule$weeks))[c(1:3,5:7)]
schedule$Topic <- topics[as.character(schedule$weeks)]
#topics[match(schedule$weeks, sort(unique(schedule$weeks)))]
# NA
# schedule$Topic[which(schedule$weeks %in% as.integer(names(topics)))] <-
schedule$Topic[schedule$Activity == "Exams"] <- c("Factor analysis, group", "Path analysis, group", "SEM, individual")
```
In this course you will learn how to translate a social scientific theory into a statistical model, how to analyze your data with these models, and how to interpret and report your results following APA standards.
**Course overview**
```{r, results = "asis", echo = FALSE}
library(DT)
# library(lubridate)
# mdy(isoweek(4))
# lubridate::isoweek(as.Date("13-09-2021", format = "%d-%m-%Y"))
# lubridate::parse_date_time("37", orders = "%W")
# lubridate::isoweek(as.Date("13-09-2021", format = "%d-%m-%Y"))
schedule$Date <- as.Date(schedule$Date, format = "%d-%m-%Y")
library(DT)
DT::datatable(schedule[order(schedule$Date, decreasing = FALSE), ], rownames = FALSE, options = list(pageLength = nrow(schedule))) |>
formatStyle(
'time',
backgroundColor = styleEqual("14:15 - 15:00", 'yellow'))
```
**You do not need a book for this course.**
Most information is contained within this GitBook and the course readings. Optionally, if you want to expand your learning, you can follow the excellent [`lavaan` tutorial](https://lavaan.ugent.be/tutorial/).
The analyses will be executed using the statistical programming environment R, and in particular using the structural equation modeling package `lavaan`.
## Staff
**Coordinator:**
[dr. Caspar J. van Lissa](https://www.uu.nl/staff/cjvanlissa)
**Lecturers**
[dr. Caspar J. van Lissa](https://www.uu.nl/staff/cjvanlissa)
**Computer labs**
[Danielle McCool](https://www.uu.nl/staff/DMMcCool)
[Laura Hofstee](https://www.uu.nl/staff/LHofstee)
## Learning goals
After completing the course, you will be able to do the following:
1. Translate a verbal theory into a conceptual model, and translate a conceptual model into a statistical model
2. Independently analyze data using the free, open-source statistical software R (a marketable job skill)
3. Apply the “latent variable model” to a real-life problem, where observed variables do not directly measure, but are indicators of, an unobserved social scientific construct.
4. Use the “path model” to describe how several variables are causally related to one another, including complex relationships such as mediation and moderation
5. Explain to a fellow student that the technique “structural equation modeling” combines latent variable models with path models, and allows you to test an entire theory in one go
6. Reflect critically on decisions in the analysis of structural equation models, both your own, and in published research
## Theory Construction and Statistical Modeling
In order to test a theory, we must express the theory as a statistical model and then test this model on quantitative (numeric) data. This course uses existing tutorial datasets to practice the process of translating verbal theories into testable statistical models. Those who are interested in the methods of acquiring high quality data for your own theory, we refer to the course “Conducting a Survey” which is taught from November – January.
In this course we will use datasets from different disciplines within the social sciences (educational sciences, psychology and sociology) to explain and illustrate theories and practices that are used in all social science disciplines to statistically model social science theories.
Most information about the course is available in this GitBook.
Communication about the course occurs through <https://uu.blackboard.nl> (Login with your student ID and password).
## Course overview
The course consists of three parts, each of which is assessed with an assignment:
1. Factor analysis, in which you learn different ways of capturing unobserved constructs.
2. Path analysis, in which you learn how to model regression and ANOVA as a structural equation model with observed variables.
3. Structural equation modeling, in which you put together the first two topics.
## Adaptations related to the coronavirus measures
The corona crisis is challenging all of us to rethink how we teach and learn. But aside from the challenges, it also offers opportunities. In adapting this course, we kept two goals in mind: increasing the alignment between the way of teaching and the learning goals, and ensuring high-quality interaction among students and between the students and teachers while still using online communication. Based on these goals, we made the following changes:
1. During the course, you will be working in learning teams to promote interaction among students and peer support
2. The two in-person exams (4 hours each) are replaced with take-home assignments: Two group assignments, and one individual assignment.
### Why not in-person meetings?
When the courses were scheduled, it was not yet clear what direction the university wanted to take.
Every teacher could make a choice: Online-only, or hybrid education.
At that time, I chose the online format, because:
1. Last year, the online version of this course was very successful
2. This elective course attracts diverse students with complex schedules; many expressed a preference for the flexibility of an online course
3. Due to the 75 student maximum on lecture halls, it is very difficult to schedule courses. Doing this course in person would mean that we have meetings on varying days at different times. That would likely create scheduling conflicts for many of you.
### Why learning in groups?
Contact with fellow students is a key aspect of the university experience. During this time of social distancing, it is important to find new ways to stay in touch with fellow students. There are also aspects of learning in groups that can really improve your knowledge, like peer feedback. The groups are made randomly when the course starts, but you can switch with a consenting member of another group in the first week.
### Why use take home assignments for assessment?
Assignments are a suitable form of assessment for a skills-based course like TCSM. It also takes a lot of the pressure off because you can work at your own pace. This course used to have two 4-hour exams; not exactly corona-proof. Using take-home assignments entrusts you with the responsibility to make this assignment in good faith, without instrumental assistance or plagiarism, so I kindly ask you to make good on this trust, and hand in original work to show what you’ve learned.
## Grading
Your grade for the course is based on your “portfolio” composed of the three take-home assignments.
The first assignment (deadline: `r schedule$Date[schedule$Activity == "Exams"][1]`) assesses your understanding of the latent variable model. This is a group assignment, made with your learning team. The exam counts towards 25% of your grade.
The second graded assignment (deadline: `r schedule$Date[schedule$Activity == "Exams"][2]`) assesses your understanding of path models. This is a group assignment, made with your learning team. The exam counts towards 25% of your grade.
The final assignment (deadline: `r schedule$Date[schedule$Activity == "Exams"][3]`) assesses your ability to independently analyse data, combining the skills you learned during this course. This is an individual assignment, which counts towards 50% of your grade.
## Assignments
A description of the assignments follows below.
For each assignment, every element labeled with a lower case letter is graded fail (0 points), pass (1 point), or excellent (1.5 points).
Grades are summed for each assignment, and rescaled from 1-10.
The final grade is a weighted average across assignments of the rescaled grades (weights in %).
Note that the assignments are **not** intended to be full-blown papers!
You only get 200 words to justify your theoretical model, and 300 words to discuss the results.
The focus should be on your analysis; how it relates to theory (introduction), and what you have learned from it and how you might improve it (discussion).
1. Apply the “latent variable model” to a real-life problem, where observed variables do not directly measure, but are indicators of, an unobserved social scientific construct (G, 25%)
a. Find a suitable dataset, for example: (no word limit)
i. Data you have collected for a previous course
ii. Open data, provided with a published paper
iii. The “Coping with COVID-19” dataset (if you can’t find anything)
b. Describe the dataset, and introduce the theoretical latent variable model (200 words)
c. Estimate the latent variable model (PCA, EFA, CFA) and conduct reliability analysis, provide relevant output in a suitable format (no word limit; as short as possible and as long as necessary to report the relevant output)
d. Explain your rationale for important modeling decisions (300 words)
i. Motivate your choice for the type of latent variable model
ii. Discuss assumptions
iii. Discuss other important decisions, as discussed in the course reading materials
e. Report and interpret the results in APA style (no word limit; as short as possible and as long as necessary to report the relevant results)
f. Discuss the results in max 300 words
i. Devote attention to strengths and limitations
2. Use the “path model” to describe how several variables are causally related to one another (G, 25%)
a. Find a suitable dataset, for example: (no word limit)
i. Data you have collected and analyzed for a previous course
ii. Open data, provided with a published paper
iii. The “Coping with COVID-19” dataset (if you can’t find anything)
b. Describe the dataset, and introduce the theoretical path model (200 words)
c. Conduct a SEM path model to answer the theoretical questions (no word limit; as short as possible and as long as necessary to report the relevant output)
i. This can be a re-analysis of a question that had been tested using regression, ANOVA, or t-test analysis in the original paper
d. Explain your rationale for important modeling decisions (300 words)
i. Fit between theory and model
ii. Model assumptions
iii. Difference/similarity between the path model and the (original) regression, ANOVA, or t-test analysis
iv. Why you use standardized or unstandardized coefficients
e. Report and interpret the results in APA style (no word limit; as short as possible and as long as necessary to report the relevant results)
i. Include measures of explained variance for the dependent variables.
f. Discuss the results (max 300 words)
i. Devote attention to strengths and limitations
3. Independently analyze data using the free, open-source statistical software R (I, 50%)
a. Find a suitable dataset, for example: (no word limit)
i. Data you have collected for a previous course
ii. Open data, provided with a published paper
iii. The “Coping with COVID-19” dataset (if you can’t find anything)
b. Describe the dataset, and introduce a theory involving at least 3 variables that can be tested using these data (300 words)
c. Translate the theory to lavaan syntax and estimate the model (could be multiple models if you think it’s necessary; no word limit)
d. Use at least all of the following: (no word limit)
i. One latent variable
ii. Moderation (continuous or multi-group)
iii. Mediation
e. Explain your rationale for important modeling decisions (300 words)
i. Fit between theory and model
ii. Model assumptions
iii. Difference/similarity between the path model and the (original) regression, ANOVA, or t-test analysis
iv. Why you use standardized or unstandardized coefficients
f. Report and interpret your results in APA style (no word limit; as short as possible and as long as necessary to report the relevant results)
g. Discuss your results in maximum 500 words
## Rules for the take-home assignments
• For all three graded assignments, you are allowed to use all course materials, including the GitBook, and search engines
• The first two assignments are made in groups.
o For these assignments, it is not allowed to work together with other groups.
• The final assignment is made individually.
o For this assignment, it is not allowed to recruit outside help
• In all cases, you are obligated to hand in original work conducted by you or your group. Failure to do so constitutes fraud.
• You also have a moral obligation to obey the rules. For this course, I have chosen a form of examination that allows you to showcase what you have learned flexibly. This spares you the stress of long exams (the two exams for this course used to be 4 hours each) and cramming all course material. It also assumes that you make the assignment in good faith, so I simply ask that you hold up your end of the bargain, and hand in your original work to show what you’ve learned.
• The final assignment also helps you assess your ability to independently analyse data, which is important to know for your future courses and/or career.
## Statement on fair grading during corona
We, the lecturers, are committed to offering distance learning and assessment to ensure that you can continue your studies. The final assignment is therefore a check at the end of the course, for both you and your lecturers, of whether you are ready for subsequent courses in your study programme.
You will be completing this assignment remotely, instead of making a test at the university under the supervision of lecturers. For most students this will make no difference, but of course remote assessment is more susceptible to fraud.
By handing in your final assignment, you therefore explicitly confirm that you have made this assignment by yourself and are submitting work you have written yourself, that you will be using your own login details, and that you have not had instrumental help in making the assignment.
All assignments are submitted via SafeAssign in Blackboard and are checked for plagiarism. If fraud or plagiarism is detected or suspected, the Board of Examiners will be informed in the usual manner, and in the event of fraud, the sanctions referred to in Article 5.14 of the Education and Examination Regulations (EER) will apply (download from https://students.uu.nl/files/fsw-ba-oer-engelstalig-2019-2020pdf). Note that the sanctions may apply to the person committing fraud, to the person(s) making fraud possible, and to the test as a whole.
Lecturers also have the option of administering additional oral tests if they have reasons to do so.
Thank you for your cooperation, we will only be able to get through this period together!
## Attendance
Attendance is not mandatory, but we highly recommend everyone to attend all lectures and the Thursday practicals. In our experience, students who actively participate tend to pass the course, whereas those who do not tend to drop out or fail. The lectures and practicals ‘build’ on each other, so in the unfortunate event that you have to miss either one, please make sure you have caught up with the materials before the next session. The practicals on Monday are for troubleshooting -- to be sure that everyone can keeps up with the course -- and they are no substitute for the Thursday practicals.
## Literature
All literature is available on the Internet, as long as you are within the UU-domain. See the course site on Blackboard for direct links to all following articles. For some articles, we are not allowed to post direct URLs. Searching through either [Google Scholar](www.scholar.google.com) or the [University Library](www.ubu.uu.nl) will work wonders.
## Reading questions
Along with every article, we will provide reading questions. You will not be graded on the reading questions, but it is important to prepare the reading questions before every lecture. The purpose of the reading questions is as follows:
- to provide relevant background knowledge for the lecture;
- to explain the key terms and concepts;
- to be aware of important publications that shaped the field;
- to help you to prepare for the exam; it is not necessary to learn the answers to the reading questions by heart, but the reading questions will help you extract the relevant insights from the literature.
- We will post standard answers to the reading questions online before the exam, but their most valuable role is to guide your understanding during the course, not while preparing the exam.
## Preparations
Before every meeting you need to do the assigned homework (see detailed program).
We assume you have basic knowledge about multivariate statistics before entering this course. You do not need any prior experience working with R. If you feel unsure about your knowledge about statistics, and in particular ANOVA and multiple regression analysis, please e-mail the course coordinator. If you wish to refresh your knowledge, we recommend the chapters on ANOVA, multiple regression and Exploratory Factor Analysis from Field’s Discovering Statistics using R. London; Sage. Read the chapters on ‘multiple regression’ and ‘exploratory factor analysis’ if you want to prepare for the first meeting. If you do not have this book, many other statistics textbooks usually discuss these topics equally well. You do not need to buy any book for this course.