-
Notifications
You must be signed in to change notification settings - Fork 0
/
mesoudi2015_reanalysis.Rmd
322 lines (255 loc) · 15.2 KB
/
mesoudi2015_reanalysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
---
title: "Bayesian reanalysis of Mesoudi et al. (2015)"
author: "Alex Mesoudi"
date: "22 May 2016"
output:
pdf_document:
fig_caption: yes
fig_height: 3
fig_width: 8
number_sections: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rethinking)
library(ggplot2)
library(ggsci)
library(Rmisc)
# load data from website
HKdata <- read.csv("HKdata.csv")
# select relevant data
d <- HKdata[, c("culture", "age", "sex", "season1_copies", "season2_copies", "season3_copies")]
# re-order cultural groups to match original paper
d$culture <- factor(d$culture, levels=c('UK','HK','CI','CM'))
```
# Introduction
This is a reanalysis of the data presented in Mesoudi, Chang, Murray & Lu (2015), using methods from Richard McElreath's Statistical Rethinking book and `rethinking` package (McElreath 2016). The reanalysis generates virtually identical results to the original analysis, but is hopefully more useful and transparent to other researchers.
In Mesoudi et al. (2015) we found higher rates of social learning in a computer-based artifact-design task amongst participants from mainland China, compared to participants from the UK, from Hong Kong, and Chinese immigrant students studying in the UK. Participants were presented with 29 opportunities to either copy or not copy others during a season of 30 hunts (participants could not copy on the first hunt). Participants could see others' success rates, making social learning payoff-biased and therefore highly beneficial in this challenging task. There were 3 seasons, each with 29 opportunities to copy. Seasons 1 and 2 featured different environments but no within-season environmental change. Season 3 featured a different environment *and* within-season environmental change.
# Visualisation of data
The original paper presented bar charts of the proportion of copying per season as in Figure 1 below (Figure 2 in Mesoudi et al. 2015). We can see the higher copying proportion in Chinese Mainland (CM) participants in Seasons 1 and 2, but not in Season 3. In Season 3 the UK, HK and CI participants increased their copying frequency almost up to CM levels.
However, bar charts can obscure variation in data. Ideally visualisations would include representations of the actual data, and counts rather than proportions. Figures 2-4 show some alternatives. These show that there is a broad spread of copying frequencies within each cultural group, but that in Seasons 1 and 2 there are more high-copying CM participants compared to the other three groups.
```{r echo=FALSE, fig.cap="Bar charts as in Mesoudi et al. (2015). UK=British, HK=Hong Kong, CI=Chinese Immigrants, CM=Chinese Mainland."}
s1plot <- ggplot(d, aes(culture, season1_copies/29)) + stat_summary(fun = mean, geom = "bar", fill = "White", color = "Black") + coord_cartesian(ylim = c(0, 0.42)) + labs(x = "", y = "frequency of copying") + stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) + theme_bw() + ggtitle("Season 1") + theme(plot.title = element_text(hjust = 0.5))
s2plot <- ggplot(d, aes(culture, season2_copies/29)) + stat_summary(fun = mean, geom = "bar", fill = "White", color = "Black") + coord_cartesian(ylim = c(0, 0.42)) + labs(x = "", y = "") + stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) + theme_bw() + ggtitle("Season 2") + theme(plot.title = element_text(hjust = 0.5))
s3plot <- ggplot(d, aes(culture, season3_copies/29)) + stat_summary(fun = mean, geom = "bar", fill = "White", color = "Black") + coord_cartesian(ylim = c(0, 0.42)) + labs(x = "", y = "") + stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) + theme_bw() + ggtitle("Season 3") + theme(plot.title = element_text(hjust = 0.5))
multiplot(s1plot, s2plot, s3plot, cols = 3)
```
```{r echo=FALSE, fig.cap="Jitter and box plots"}
s1plot <- ggplot(d, aes(culture, season1_copies, color = culture)) + geom_boxplot(outlier.shape = NA) + geom_jitter(width = 0.3, alpha = 0.7, size = 0.7) + ggtitle("Season 1") + theme_bw() + theme(legend.position='none', plot.title = element_text(hjust = 0.5))+ scale_color_npg() + ylab("Copying frequency") + xlab("Group")
s2plot <- ggplot(d, aes(culture, season2_copies, color = culture)) + geom_boxplot(outlier.shape = NA) + geom_jitter(width = 0.3, alpha = 0.7, size = 0.7) + ggtitle("Season 2") + theme_bw() + theme(legend.position='none', plot.title = element_text(hjust = 0.5))+ scale_color_npg() + ylab("") + xlab("Group")
s3plot <- ggplot(d, aes(culture, season3_copies, color = culture)) + geom_boxplot(outlier.shape = NA) + geom_jitter(width = 0.3, alpha = 0.7, size = 0.7) + ggtitle("Season 3") + theme_bw() + theme(legend.position='none', plot.title = element_text(hjust = 0.5))+ scale_color_npg() + ylab("") + xlab("Group")
multiplot(s1plot, s2plot, s3plot, cols = 3)
```
```{r echo=FALSE, fig.cap="Count and box plots. The size of the circle indicates the number of participants at that value."}
s1plot <- ggplot(d, aes(culture, season1_copies, color = culture)) + geom_boxplot(outlier.shape = NA) + geom_count(alpha = 0.7) + ggtitle("Season 1") + theme_bw() + theme(legend.position='none', plot.title = element_text(hjust = 0.5)) + scale_color_npg() + ylab("Copying frequency") + xlab("Group")
s2plot <- ggplot(d, aes(culture, season2_copies, color = culture)) + geom_boxplot(outlier.shape = NA) + geom_count(alpha = 0.7) + ggtitle("Season 2") + theme_bw() + theme(legend.position='none', plot.title = element_text(hjust = 0.5))+ scale_color_npg() + ylab("") + xlab("Group")
s3plot <- ggplot(d, aes(culture, season3_copies, color = culture)) + geom_boxplot(outlier.shape = NA) + geom_count(alpha = 0.7) + ggtitle("Season 3") + theme_bw() + theme(legend.position='none', plot.title = element_text(hjust = 0.5))+ scale_color_npg() + ylab("") + xlab("Group")
multiplot(s1plot, s2plot, s3plot, cols = 3)
```
```{r echo=FALSE, fig.cap="Violin plots, showing narrower bases for CM in Seasons 1 and 2, compared to other groups which are bottom-heavy."}
s1plot <- ggplot(d, aes(culture, season1_copies, color = culture)) + geom_violin() + ggtitle("Season 1") + theme_bw() + theme(legend.position='none', plot.title = element_text(hjust = 0.5))+ scale_color_npg() + ylab("Copying frequency") + xlab("Group")
s2plot <- ggplot(d, aes(culture, season2_copies, color = culture)) + geom_violin() + ggtitle("Season 2") + theme_bw() + theme(legend.position='none', plot.title = element_text(hjust = 0.5))+ scale_color_npg() + ylab("") + xlab("Group")
s3plot <- ggplot(d, aes(culture, season3_copies, color = culture)) + geom_violin() + ggtitle("Season 3") + theme_bw() + theme(legend.position='none', plot.title = element_text(hjust = 0.5))+ scale_color_npg() + ylab("") + xlab("Group")
multiplot(s1plot, s2plot, s3plot, cols = 3)
```
# Bayesian re-analysis
```{r echo=FALSE, results='hide'}
# create dummy vars for the cultural groups (UK=reference group)
d$cultureHK <- ifelse( d$culture == "HK", 1, 0)
d$cultureCI <- ifelse( d$culture == "CI", 1, 0)
d$cultureCM <- ifelse( d$culture == "CM", 1, 0)
d$female <- ifelse( d$sex == "Female", 1, 0)
# season 1 models -----------------------------
# null intercept-only model
s1.null.model <- map(
alist(
season1_copies ~ dbinom( 29 , p ) ,
logit(p) <- a,
a ~ dnorm(0,1)
) ,
data=d )
# model with cultural group
s1.culture.model <- map(
alist(
season1_copies ~ dbinom( 29 , p ) ,
logit(p) <- a + bHK*cultureHK + bCI*cultureCI + bCM*cultureCM,
a ~ dnorm(0,1) ,
bHK ~ dnorm(0,1) ,
bCI ~ dnorm(0,1) ,
bCM ~ dnorm(0,1)
) ,
data=d )
# full model with age and sex too
s1.full.model <- map(
alist(
season1_copies ~ dbinom( 29 , p ) ,
logit(p) <- a + bHK*cultureHK + bCI*cultureCI + bCM*cultureCM + bA*age + bF*female,
a ~ dnorm(0,1) ,
bHK ~ dnorm(0,1) ,
bCI ~ dnorm(0,1) ,
bCM ~ dnorm(0,1) ,
bA ~ dnorm(0,0.5) ,
bF ~ dnorm(0,0.5)
) ,
data=d )
# compare models: full model best supported
s1.compare <- compare(s1.null.model, s1.culture.model, s1.full.model)
# precis of full model
precis(s1.full.model)
# HK virtually identical to UK, CI slightly higher, CM much higher
# also effect of sex - females higher than males
# age has slight negative effect (older copy less)
# NB very similar coefficients to models presented in the original paper
# use stan instead - same outcome
# s1.model.stan <- map2stan( s1.full.model , data=d , iter=1e4 , warmup=1000 )
# precis(s1.model.stan)
# pairs(s1.model.stan)
# exponentiate to get relative odds (e.g. CM is 2.24 times more likely to copy than UK; women are 1.37 more likely than men)
exp(coef(s1.full.model))
# season 2 models------------------------------
# null intercept-only model
s2.null.model <- map(
alist(
season2_copies ~ dbinom( 29 , p ) ,
logit(p) <- a,
a ~ dnorm(0,1)
) ,
data=d )
# model with cultural group
s2.culture.model <- map(
alist(
season2_copies ~ dbinom( 29 , p ) ,
logit(p) <- a + bHK*cultureHK + bCI*cultureCI + bCM*cultureCM,
a ~ dnorm(0,1) ,
bHK ~ dnorm(0,1) ,
bCI ~ dnorm(0,1) ,
bCM ~ dnorm(0,1)
) ,
data=d )
# full model with age and sex too
s2.full.model <- map(
alist(
season2_copies ~ dbinom( 29 , p ) ,
logit(p) <- a + bHK*cultureHK + bCI*cultureCI + bCM*cultureCM + bA*age + bF*female,
a ~ dnorm(0,1) ,
bHK ~ dnorm(0,1) ,
bCI ~ dnorm(0,1) ,
bCM ~ dnorm(0,1) ,
bA ~ dnorm(0,0.5) ,
bF ~ dnorm(0,0.5)
) ,
data=d )
# compare models: full model best supported
s2.compare <- compare(s2.null.model, s2.culture.model, s2.full.model)
# precis of full model
precis(s2.full.model)
# HK virtually identical to UK, CI slightly lower, CM much higher
# also effect of sex - females higher than males
# age has no effect
# NB very similar coefficients to models presented in the original paper
# use stan instead - same outcome
# s2.model.stan <- map2stan( s2.full.model , data=d , iter=1e4 , warmup=1000 )
# precis(s2.model.stan)
# pairs(s2.model.stan)
# exponentiate to get relative odds
exp(coef(s2.full.model))
# season 3 models---------------------------------
# null intercept-only model
s3.null.model <- map(
alist(
season3_copies ~ dbinom( 29 , p ) ,
logit(p) <- a,
a ~ dnorm(0,1)
) ,
data=d )
# model with cultural group
s3.culture.model <- map(
alist(
season3_copies ~ dbinom( 29 , p ) ,
logit(p) <- a + bHK*cultureHK + bCI*cultureCI + bCM*cultureCM,
a ~ dnorm(0,1) ,
bHK ~ dnorm(0,1) ,
bCI ~ dnorm(0,1) ,
bCM ~ dnorm(0,1)
) ,
data=d )
# full model with age and sex too
s3.full.model <- map(
alist(
season3_copies ~ dbinom( 29 , p ) ,
logit(p) <- a + bHK*cultureHK + bCI*cultureCI + bCM*cultureCM + bA*age + bF*female,
a ~ dnorm(0,1) ,
bHK ~ dnorm(0,1) ,
bCI ~ dnorm(0,1) ,
bCM ~ dnorm(0,1) ,
bA ~ dnorm(0,0.5) ,
bF ~ dnorm(0,0.5)
) ,
data=d )
# compare models: full model and culture model equally supported
s3.compare <- compare(s3.null.model, s3.culture.model, s3.full.model)
# precis of full model
precis(s3.full.model)
# CI virtually identical to UK, HK and CM are higher
# also effect of sex - females higher than males
# age and sex have very weak effects
# NB very similar coefficients to models presented in the original paper
# use stan instead - same outcome
# s3.model.stan <- map2stan( s3.full.model , data=d , iter=1e4 , warmup=1000 )
# precis(s3.model.stan)
# pairs(s3.model.stan)
# exponentiate to get relative odds (e.g. CM is 2.24 times more likely to copy than UK; women are 1.37 more likely than men)
exp(coef(s3.full.model))
```
The original paper used negative binomial regression to compare copying proportions across cultural groups. These regressions are shown in Table 1 of Mesoudi et al. (2015). Here I use the `map` function from the `rethinking` package to compare cultural groups in an aggregated binomial model. The dependent measure is the number of hunts (out of 29) on which a participant copied another participant. This is performed separately for season 1, 2 and 3, each of which featured 29 opportunities to copy[^1]. For each season three models are compared: an intercept-only model, a model with cultural group as a predictor, and a full model with cultural group, age and sex as predictors. UK is the reference group. The code for the full model of Season 1 is:
[^1]: I do not run a single regression with season as a within-participant factor because, as noted above, Season 3 introduces within-season environmental change and so is not comparable to the others. Seasons 1 and 2 could be combined, but the gain in having one fewer model does not seem to me to outweigh the cost of having coefficients that are harder to interpret.
```{r echo=TRUE, eval=FALSE}
s1.full.model <- map(
alist(
season1_copies ~ dbinom( 29 , p ) ,
logit(p) <- a + bHK*cultureHK + bCI*cultureCI + bCM*cultureCM + bA*age + bF*female,
a ~ dnorm(0,1) ,
bHK ~ dnorm(0,1) ,
bCI ~ dnorm(0,1) ,
bCM ~ dnorm(0,1) ,
bA ~ dnorm(0,0.5) ,
bF ~ dnorm(0,0.5)
) ,
data=d )
```
with Season 2 and 3 identical except for the dependent measures. I also ran these models with MCMC using `map2stan` but the results were virtually identical, so I do not report this here.
## Seasons 1 and 2
For both Seasons 1 and 2 the full model is overwhelmingly best supported:
```{r echo=FALSE, eval=TRUE, comment=""}
s1.compare
s2.compare
```
The full model for Season 1 is shown below, along with exponentiated coefficients to give relative odds (e.g. CM participants are 2.24 times more likely to copy compared to UK participants). HK participants are virtually identical in copying frequency to UK participants. CI are slightly higher, and CM are much higher. Women copy more than men, and age has little effect.
\newpage
\pagebreak
```{r echo=FALSE, eval=TRUE, tidy=TRUE, comment=""}
precis(s1.full.model)
exp(coef(s1.full.model))
```
The full model for Season 2 is shown below and is almost the same as for Season 1. HK participants are again virtually identical to UK participants, CI slightly lower, and CM much higher. Women again copy more than men, and age again has little effect.
```{r echo=FALSE, eval=TRUE, tidy=TRUE, comment=""}
precis(s2.full.model)
exp(coef(s2.full.model))
```
## Season 3
For Season 3, the full model and culture model both received support but the full model slightly more:
```{r echo=FALSE, eval=TRUE, tidy=TRUE, comment=""}
s3.compare
```
The full model is shown below. CI participants are virtually identical to UK participants, while HK and CM are higher. Both age and sex have very weak effects.
```{r echo=FALSE, eval=TRUE, tidy=TRUE, comment=""}
precis(s3.full.model)
exp(coef(s3.full.model))
```
# Summary
The coefficients and confidence intervals shown here are very similar to the original regression model results shown in Mesoudi et al. (2015), but hopefully more straightforward and understandable. I will soon incorporate task performance into the above, to complete the reanalysis of the original paper.
The data file HKdata.csv and the RMarkdown file containing code for running these reanalyses and producing this document is available at:
<https://github.com/amesoudi/mesoudi_chang_murray_lu_2015>
# References
McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in r and stan. CRC Press.
Mesoudi, A., Chang, L., Murray, K., & Lu, H. (2015). Higher frequency of social learning in China than in the West shows cultural variation in the dynamics of cultural evolution. Proceedings of the Royal Society B, 282, 20142209.