# Week 2 - Home
```{r include=FALSE}
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
```
This exercise is based on:
Kestilä, Elina (2006). Is There Demand for Radical Right
Populism in the Finnish Electorate? *Scandinavian Political Studies, 29*(3), 169-191.
You have read and answered questions about the article in the reading questions. In this
exercise, as well as in the second class practical, we will analyze these data ourselves.
The data for this practical stem from the first round of the European Social Survey (ESS).
This is a repeated cross-sectional survey across 32 European countries. The first round
was held in 2002, and subsequent rounds of data collection have been held every two years since. More information, as well as access to all data, is available at www.europeansocialsurvey.org.
The raw first-round data can also be found [here](https://github.com/cjvanlissa/TCSM_student). The file is called `ESSround1-a.sav`. It contains data for all respondents, but only the variables that you will need in this exercise.
### Question 1.a
Download the file and import it into R. Inspect the file (number of cases and number of variables) to check that it was read correctly.
```{r, message=FALSE, warning=FALSE, eval=FALSE}
library(foreign)
data <- read.spss("ESSround1-a.sav", to.data.frame = TRUE)
```
```{r, echo=FALSE, warning=FALSE}
library(foreign)
data <- read.spss("TCSM_student/ESSround1-a.sav", to.data.frame = TRUE)
```
<details>
<summary><b>For a description of all variables in the dataset, click here!</b></summary>
```{r, results = "asis", echo = FALSE}
desc <- c("Title of dataset",
"ESS round",
"Edition",
"Production date",
"Country",
"Respondent's identification number",
"Trust in the legal system",
"Trust in the police",
"Trust in the United Nations",
"Trust in the European Parliament",
"Trust in country's parliament",
"State of health services in country nowadays",
"State of education in country nowadays",
"How satisfied with present state of economy in country",
"How satisfied with the national government",
"How satisfied with the way democracy works in country",
"Politicians interested in votes rather than peoples opinions",
"Politicians in general care what people like respondent think",
"Trust in politicians",
"Allow many/few immigrants of same race/ethnic group as majority",
"Allow many/few immigrants of different race/ethnic group from majority",
"Allow many/few immigrants from richer countries in Europe",
"Allow many/few immigrants from poorer countries in Europe",
"Allow many/few immigrants from richer countries outside Europe",
"Allow many/few immigrants from poorer countries outside Europe",
"Qualification for immigration: christian background",
"Qualification for immigration: be white",
"Average wages/salaries generally brought down by immigrants",
"Immigrants harm economic prospects of the poor more than the rich",
"Immigrants take jobs away in country or create new jobs",
"Taxes and services: immigrants take out more than they put in or less",
"Immigration bad or good for country's economy",
"Country's cultural life undermined or enriched by immigrants",
"Immigrants make country worse or better place to live",
"Immigrants make country's crime problems worse or better",
"Richer countries should be responsible for accepting people from poorer countries",
"Better for a country if almost everyone share customs and traditions",
"Better for a country if a variety of different religions",
"Country has more than its fair share of people applying refugee status",
"People applying refugee status allowed to work while cases considered",
"Government should be generous judging applications for refugee status",
"Most refugee applicants not in real fear of persecution own countries",
"Financial support to refugee applicants while cases considered",
"Granted refugees should be entitled to bring close family members",
"Gender",
"Year of birth",
"Highest level of education",
"Years of full-time education completed",
"How interested in politics",
"Placement on left right scale")
library(kableExtra)
kable(data.frame(Variable = names(data), Description = desc)) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
```
`r if(knitr::is_html_output()){"</details>"}`
### Question 1.b
The ESS file contains much more information than we need to re-analyze the paper by
Kestilä. We need to reduce the number of cases to make sure our results pertain to the relevant target population.
Kestilä only uses data from ten countries: `c("Austria", "Belgium", "Denmark", "Finland", "France", "Germany", "Italy", "Netherlands", "Norway", "Sweden")`.
As explained in the tutorial chapters at the beginning of this GitBook, it is possible to select rows from the data. In this case, we will select only the rows from these countries by means of Boolean indexing, using the `%in%` operator to check whether each value of `$cntry` is in the vector of country names.
*Hint: Use* `[]; %in%`
<details>
<summary>Click for explanation</summary>
Use the following code:
```{r, echo = TRUE}
df <- data[data$cntry %in% c("Austria", "Belgium", "Denmark", "Finland", "France", "Germany", "Italy", "Netherlands", "Norway", "Sweden"), ]
```
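To see what `%in%` does before applying it to the full data, a minimal sketch on made-up data may help. The data.frame `toy` and its values are hypothetical and are not part of the ESS file:

```{r, echo = TRUE}
# Hypothetical toy data, only for illustration (not the ESS data)
toy <- data.frame(cntry = c("Finland", "Spain", "Norway", "Poland"),
                  trust = c(4, 7, 5, 6))
# %in% returns one TRUE/FALSE per row: is this country in the vector?
keep <- toy$cntry %in% c("Finland", "Norway")
keep
# Use the logical vector in the row position of [rows, columns]
toy_sub <- toy[keep, ]
toy_sub
```

The comma inside the square brackets matters: everything before it selects rows, and leaving the part after it empty keeps all columns.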
`r if(knitr::is_html_output()){"</details>"}`
### Question 1.c
Inspect the data file again to see whether step 1b went ok.
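One possible way to do this check is sketched below on made-up data (the object `toy` is hypothetical): compare the dimensions before and after selecting, and tabulate the country variable. If `cntry` is stored as a factor, as the labeled import above suggests, countries that were dropped still appear in the table with a count of zero unless unused levels are removed with `droplevels()`.

```{r, echo = TRUE}
# Hypothetical toy data with a factor country variable
toy <- data.frame(cntry = factor(c("Finland", "Spain", "Norway", "Spain")))
toy_sub <- toy[toy$cntry %in% c("Finland", "Norway"), , drop = FALSE]
# Number of rows and columns before and after selecting
dim(toy)
dim(toy_sub)
# The dropped country is still listed, with a count of zero
table(toy_sub$cntry)
# droplevels() removes factor levels that no longer occur
table(droplevels(toy_sub)$cntry)
```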
### Question 1.d
Before we can start the analyses, we first need to screen the
data. What are the things we need to watch for? (Think about your earlier
statistics courses.)
<details>
<summary>Click for explanation</summary>
This question is open to interpretation. One thing you might notice is that all variables the author used are currently coded as factor variables (e.g., "Factor w/ 11 levels"):
```{r, echo = TRUE}
library(tidySEM)
descriptives(df)
levels(df$trstlgl)
```
In keeping with conventions, we could treat ordinal Likert scales with >5 levels as continuous. We can either re-code the data, or prevent `read.spss()` from coding these variables as factors when it reads the data. Here is code for both approaches.
#### Re-coding factors to numeric
We can convert a factor to numeric using the function `as.numeric()`. However, in this case, we have `r 45-7` variables to convert - that's a lot of code.
Thankfully, there is a function to apply this transformation to each column of the data: `lapply()`, short for list apply. This function takes each list element (column of the `data.frame`), and applies the function `as.numeric()` to it:
```{r, echo = TRUE}
df[7:44] <- lapply(df[7:44], as.numeric)
```
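One thing to keep in mind with this approach: `as.numeric()` applied to a factor returns the position of each level (1, 2, 3, ...), not the text of the label. The sketch below uses made-up factors to show the difference; whether this matters for the ESS items depends on how their levels are labeled, so treat it as an illustration rather than a required correction.

```{r, echo = TRUE}
# Hypothetical factor whose labels happen to look like numbers
f <- factor(c("0", "5", "10"), levels = c("0", "5", "10"))
as.numeric(f)                # level positions: 1 2 3
as.numeric(as.character(f))  # labels read as numbers: 0 5 10
# lapply() converts several columns in one step
toy <- data.frame(id = 1:3,
                  a = factor(c("low", "high", "mid"),
                             levels = c("low", "mid", "high")),
                  b = factor(c("no", "yes", "yes")))
toy[2:3] <- lapply(toy[2:3], as.numeric)
str(toy)
```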
#### Reading data without coding factors
An alternative solution is to prevent the function `read.spss()` from using value labels to code variables as factors when the data are loaded. However, we're not going to use this right now, so the information below is merely illustrative:
```{r, eval = FALSE, echo = TRUE}
# The option use.value.labels = FALSE stops the function from coding factors:
data <- read.spss("ESSround1-a.sav", to.data.frame = TRUE, use.value.labels = FALSE)
# Then, re-select the subset of data. The countries are now also unlabeled, so
# we select them by number:
df <- data[data$cntry %in% c(21,18,17,15,9,8,6,5,2,1), ]
```
`r if(knitr::is_html_output()){"</details>"}`
### Question 1.e
Aside from screening variables by looking at summary statistics, we can also plot their distributions. You already know how to do this for a single variable.
It is not much harder to plot all variables in a data.frame at once.
To do this, we need to turn the data from "wide format" (one column per variable)
into "long format" (one column with the variable names, one column with the values).
The package `tidyr` has a convenient function to reshape data to long format:
`pivot_longer()`.
First, make a long data.frame containing variables 7:44.
```{r, eval = FALSE, echo = TRUE}
install.packages("tidyr")
library(tidyr)
df_plot <- df[7:44]
df_plot <- pivot_longer(df_plot, names(df_plot))
```
```{r, echo = FALSE}
library(tidyr)
df_plot <- df[7:44]
df_plot <- pivot_longer(df_plot, names(df_plot))
```
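To make the reshaping step concrete, here is a small sketch with a made-up data.frame (`toy` is hypothetical). With default settings, `pivot_longer()` puts the old column names in a column called `name` and the cell values in a column called `value`:

```{r, echo = TRUE}
library(tidyr)
# Hypothetical wide data: two respondents, two variables
toy <- data.frame(a = c(1, 2), b = c(3, 4))
# Reshape all columns: one row per respondent-variable combination
toy_long <- pivot_longer(toy, names(toy))
toy_long
```

Two rows and two columns in wide format become four rows in long format, one for each original cell.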
Next, we can plot the data using `geom_histogram()`, `geom_density()` or `geom_boxplot()` as before. An additional function, `facet_wrap()`, allows us to make a separate plot for each variable: `+ facet_wrap(~name, scales = "free_x")`.
<details>
<summary>Click for explanation</summary>
```{r, echo = TRUE}
library(ggplot2)
ggplot(df_plot, aes(x = value)) + geom_histogram() + facet_wrap(~name, scales = "free_x")
```
Notice that the gaps between the bars reveal that the scales are actually categorical, and that most variables nevertheless look approximately normally distributed. It's probably fine to treat them as continuous.
`r if(knitr::is_html_output()){"</details>"}`
### Question 1.f
Check the scale descriptives table again. Are there any incorrectly coded missing value labels, or other inexplicable values?
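If the descriptives do reveal values that are really missing-value codes, one way to handle them is to recode those values to `NA`. The sketch below uses a made-up variable, and the codes 77, 88 and 99 are purely hypothetical placeholders; check the actual data to see whether any such values occur:

```{r, echo = TRUE}
# Hypothetical 0-10 item in which 77, 88 and 99 stand for missing answers
x <- c(3, 10, 88, 0, 99, 7, 77)
range(x)                       # the maximum is implausible for a 0-10 scale
x[x %in% c(77, 88, 99)] <- NA  # recode the placeholder codes to NA
range(x, na.rm = TRUE)
```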
### Question 1.g
The first step in re-analyzing data is replicating the results from the paper by
Kestilä. Run a Principal Component Analysis using `psych::principal()`, and
choose the exact same specification as Kestilä concerning estimation method,
rotation etc. Do two analyses: one for trust in politics, and one for attitudes towards immigration. Remember that you can view the help file for `psych::principal()` by running
`?psych::principal`.
*Hint: Use* `psych::principal()`
<details>
<summary>Click for explanation</summary>
#### Trust in politics
Kestilä extracted three components, with VARIMAX rotation. When we print the results, we can hide all factor loadings smaller than the smallest one in their table, to make it easier to read:
```{r, echo = TRUE}
library(psych)
pca_trust <- principal(df[, 7:19], nfactors = 3, rotate = "varimax")
print(pca_trust, cut = .3, digits = 3)
```
#### Attitudes towards immigration
For attitudes towards immigration, Kestilä extracted five components, with VARIMAX rotation:
```{r, echo = TRUE}
library(psych)
pca_att <- principal(df[, 20:44], nfactors = 5, rotate = "varimax")
print(pca_att, cut = .3)
```
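To get a feel for the structure of the results object, it can help to run `principal()` on simulated data where the answer is known. The sketch below is an illustration only: the variables `x1` to `y3` are made up so that two underlying dimensions exist, and nothing here reproduces Kestilä's analysis:

```{r, echo = TRUE}
library(psych)
set.seed(1)
n <- 200
# Two hypothetical underlying dimensions
d1 <- rnorm(n)
d2 <- rnorm(n)
# Six simulated items: three driven by each dimension, plus noise
toy <- data.frame(x1 = d1 + rnorm(n, sd = .5),
                  x2 = d1 + rnorm(n, sd = .5),
                  x3 = d1 + rnorm(n, sd = .5),
                  y1 = d2 + rnorm(n, sd = .5),
                  y2 = d2 + rnorm(n, sd = .5),
                  y3 = d2 + rnorm(n, sd = .5))
pca_toy <- principal(toy, nfactors = 2, rotate = "varimax")
# Rotated loadings; loadings below the cutoff are left blank
print(pca_toy$loadings, cutoff = .3)
# One row of component scores per respondent, one column per component
dim(pca_toy$scores)
```

Because the items were built from two separate dimensions, each item should load mainly on one component, which makes the printed loading table easy to read.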
`r if(knitr::is_html_output()){"</details>"}`
### Question 1.h
<!-- Video here about extracting factor scores-->
Extract the PCA factor scores from the results objects, and add them to the data.frame. Give the PCA scores informative names, based on your interpretation of the factor loadings, so that you understand what they summarize.
*Hint: Use* `$; colnames(); cbind()`
<details>
<summary>Click for explanation</summary>
#### Extracting factor scores
The factor scores are inside of the results objects. Use the `$` operator to access them:
```{r, echo = TRUE}
head(pca_att$scores)
```
We're going to give these factor scores some informative names, and add them to our data.frame. **You should give them different, informative names based on the meaning of the factors!**
```{r, echo = TRUE}
# Print names
colnames(pca_att$scores)
# Change names
colnames(pca_att$scores) <- c("Att1", "Att2", "Att3", "Att4", "Att5")
colnames(pca_trust$scores) <- c("Trust1", "Trust2", "Trust3")
# Add columns
df <- cbind(df, pca_trust$scores, pca_att$scores)
```
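The renaming and binding steps can also be tried on a small made-up example. The score matrix and the names `Distrust` and `Cynicism` below are hypothetical; the labels you choose should follow from your own reading of the loadings:

```{r, echo = TRUE}
# Hypothetical data.frame and score matrix with default component names
toy <- data.frame(id = 1:3)
toy_scores <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), ncol = 2,
                     dimnames = list(NULL, c("RC1", "RC2")))
# Replace the default names with (hypothetical) informative ones
colnames(toy_scores) <- c("Distrust", "Cynicism")
# cbind() adds each column of the matrix as a new variable
toy <- cbind(toy, toy_scores)
names(toy)
```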
`r if(knitr::is_html_output()){"</details>"}`
### Question 1.i
Are you able to replicate her results?
<details>
<summary>Click for explanation</summary>
No, probably not. The team of teachers was unable to reproduce this analysis, even though it should be pretty easy to do so...
`r if(knitr::is_html_output()){"</details>"}`
### Question 1.j
Save your syntax and bring your data and syntax to the practical on Thursday.