---
title: "Twitter Analysis"
author: "Jess Robinson"
date: "June 9, 2019"
output: github_document
---
```{r global_options}
# settings
knitr::opts_chunk$set(fig.path = 'Figs/',
                      warning = FALSE, message = FALSE)
```
```{r setup}
#load libraries
library(tidyverse)
library(readr)
library(here)
library(knitr)
library(rtweet)
library(tidytext)
library(ggwordcloud)
set.seed(1234)
theme_set(theme_minimal())
```
#State Leaders
For the first part of my homework, I analyzed the tweets of a convenience sample of state leaders from English-speaking countries. These included the presidents or prime ministers of the United States, the United Kingdom, Nigeria, South Africa, and Australia.
##Load Data
```{r}
may <- get_timeline(user = "theresa_may", n = 1000) %>%
  select(screen_name, text, created_at)
morrison <- get_timeline(user = "ScottMorrisonMP", n = 1000) %>%
  select(screen_name, text, created_at)
buhari <- get_timeline(user = "MBuhari", n = 1000) %>%
  select(screen_name, text, created_at)
trump <- get_timeline(user = "realDonaldTrump", n = 1000) %>%
  select(screen_name, text, created_at)
ramaphosa <- get_timeline(user = "CyrilRamaphosa", n = 1000) %>%
  select(screen_name, text, created_at)
```
##Make List
```{r}
state_leaders <- c("Trump", "May", "Morrison", "Buhari", "Ramaphosa")
state_tweets <- list(trump, may, morrison, buhari, ramaphosa) %>%
  set_names(state_leaders) %>%
  map_df(as_tibble, .id = "State Leader") %>%
  group_by(screen_name)
```
##Tidy Text
```{r}
tidy_tweets <- state_tweets %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  filter(word != "t.co") %>%
  filter(word != "https")
```
##Word Cloud
```{r}
tidy_tweets %>%
  group_by(word) %>%
  count(sort = TRUE) %>%
  ungroup() %>%
  top_n(n = 150, wt = n) %>%
  mutate(angle = 90 * sample(c(0, 1), n(), replace = TRUE, prob = c(70, 30))) %>%
  ggplot(aes(label = word, size = n, angle = angle)) +
  geom_text_wordcloud_area(rm_outside = TRUE) +
  scale_size(range = c(2, 15)) +
  ggtitle("Most Frequent Words in Tweets by State Leaders") +
  theme_minimal()
```
This word cloud shows the most frequent words used overall in these Twitter accounts. Unsurprisingly, words like "president", "people", and the names of political parties occur quite often.
##Most Frequent Words by Leader
```{r}
tidy_tweets %>%
  count(screen_name, word) %>%
  arrange(desc(n)) %>%
  group_by(screen_name) %>%
  mutate(top = seq_along(word)) %>%
  filter(top <= 15) %>%
  ggplot(aes(x = -top, y = n, fill = screen_name)) +
  geom_col(color = "black") +
  geom_text(aes(label = word), hjust = "left", nudge_y = 100) +
  labs(title = "Most Frequent Words by State Leader",
       x = NULL,
       y = "Word Count") +
  facet_wrap( ~ screen_name) +
  coord_flip() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank())
```
These charts show the most common terms used by each state leader. Of interest is which domestic topics the leaders tend to discuss, such as Mueller on Trump's Twitter account and Brexit on May's. Also of interest is how some leaders talk a great deal about their own country, as in the case of Nigeria, while others talk more about other countries, as with Mexico.
##Sentiment Analysis
```{r}
# Tag each word with its NRC sentiment categories
tweets_nrc <- tidy_tweets %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  group_by(screen_name, sentiment)

# Plot the 15 most frequent words for each sentiment
tweets_nrc %>%
  ungroup() %>%
  count(word, sentiment) %>%
  arrange(desc(n)) %>%
  group_by(sentiment) %>%
  mutate(top = seq_along(word)) %>%
  filter(top <= 15) %>%
  ggplot(aes(x = -top, y = n, fill = sentiment)) +
  geom_col(color = "black") +
  geom_text(aes(label = word), hjust = "left", nudge_y = 100) +
  labs(title = "Most Frequent Words in State Leaders' Tweets by Sentiment",
       x = NULL,
       y = "Word Count") +
  facet_wrap( ~ sentiment, ncol = 5) +
  coord_flip() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank())
```
The next step I took was to perform a sentiment analysis of the tweets of these world leaders. One surprising finding was the degree to which "government" was a negative or fearful word, while "deal" was joyful and surprising.
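Part of the surprise may come from how a lexicon join works: it labels each word with no regard for context, and a single word can carry several labels at once. The chunk below is a minimal sketch of that mechanism. The tokens and the mini-lexicon (`toy_sentiment_tokens`, `toy_lexicon`) are hypothetical and are not real NRC entries.

```{r}
library(dplyr)

# Hypothetical tokens and a hypothetical mini-lexicon (not real NRC entries)
toy_sentiment_tokens <- tibble(word = c("deal", "government", "deal", "people"))
toy_lexicon <- tibble(
  word = c("deal", "deal", "government"),
  sentiment = c("joy", "surprise", "fear")
)

# Count each word first, then attach every label the lexicon has for it.
# Words missing from the lexicon ("people") drop out of the result.
toy_sentiments <- toy_sentiment_tokens %>%
  count(word) %>%
  inner_join(toy_lexicon, by = "word")

toy_sentiments
```

Here "deal" is counted once under each of its two labels, so a frequent word with several labels shows up in several facets of the chart.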
#Brazil
After performing this analysis, I became curious whether a similar strategy could be used to investigate tweets by politicians in Brazil. For the second half of this assignment, I looked at trends in the words used by Brazilian politicians.
##Load Data
```{r}
bolsonaro <- get_timeline(user = "jairbolsonaro", n = 1000) %>%
  select(screen_name, text, created_at)
moro <- get_timeline(user = "SF_Moro", n = 1000) %>%
  select(screen_name, text, created_at)
alves <- get_timeline(user = "DamaresAlves", n = 1000) %>%
  select(screen_name, text, created_at)
weintraub <- get_timeline(user = "AbrahamWeint", n = 1000) %>%
  select(screen_name, text, created_at)
araujo <- get_timeline(user = "ernestofaraujo", n = 1000) %>%
  select(screen_name, text, created_at)
carvalho <- get_timeline(user = "OdeCarvalho", n = 1000) %>%
  select(screen_name, text, created_at)
```
##Make List
```{r}
brazil_leaders <- c("Bolsonaro", "Moro", "Alves", "Weintraub", "Araujo", "Carvalho")
brazil_tweets <- list(bolsonaro, moro, alves, weintraub, araujo, carvalho) %>%
  set_names(brazil_leaders) %>%
  map_df(as_tibble, .id = "brazil_leader") %>%
  group_by(screen_name)
```
##Tidy Text
```{r}
tidy_brazil <- brazil_tweets %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  filter(word != "t.co") %>%
  filter(word != "https") %>%
  filter(word != "amp")
```
##Word Cloud
```{r}
tidy_brazil %>%
  group_by(word) %>%
  count(sort = TRUE) %>%
  ungroup() %>%
  top_n(n = 150, wt = n) %>%
  mutate(angle = 90 * sample(c(0, 1), n(), replace = TRUE, prob = c(70, 30))) %>%
  ggplot(aes(label = word, size = n, angle = angle)) +
  geom_text_wordcloud_area(rm_outside = TRUE) +
  scale_size(range = c(2, 15)) +
  ggtitle("Most Frequent Words in Tweets by Brazilian Leaders") +
  theme_minimal()
```
Nope!
Because stop_words contains only English words, the main words that this word cloud displays for the Brazilian politicians are all common prepositions, articles, and other filler words in Portuguese.
Without a Portuguese stop-word list with which to filter out these words, this analysis doesn't yield particularly interesting results.
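One possible remedy, which this analysis did not use, is to filter against a Portuguese stop-word list. The sketch below assumes the stopwords package is installed, since tidytext's `get_stopwords()` draws its Snowball lists from it. The tibble `toy_brazil_tokens` is hypothetical and stands in for `tidy_brazil`.

```{r}
library(dplyr)
library(tidytext)

# Hypothetical tokens standing in for tidy_brazil
toy_brazil_tokens <- tibble(
  screen_name = "example_account",
  word = c("o", "brasil", "de", "que", "venezuela", "para", "governo")
)

# Portuguese stop words from the Snowball list (requires the stopwords package)
pt_stop_words <- get_stopwords(language = "pt")

# Drop every token that appears in the Portuguese stop-word list
toy_brazil_clean <- toy_brazil_tokens %>%
  anti_join(pt_stop_words, by = "word")

toy_brazil_clean
```

The same `anti_join()` could be added to the Tidy Text step alongside the existing English `stop_words` filter, which would keep content words like "brasil" while dropping fillers such as "de" and "que".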
##Most Frequent Words - Brazil
```{r}
tidy_brazil %>%
  count(screen_name, word) %>%
  arrange(desc(n)) %>%
  group_by(screen_name) %>%
  mutate(top = seq_along(word)) %>%
  filter(top <= 15) %>%
  ggplot(aes(x = -top, y = n, fill = screen_name)) +
  geom_col(color = "black") +
  geom_text(aes(label = word), hjust = "left", nudge_y = 100) +
  labs(title = "Most Frequent Words by Brazilian Leader",
       x = NULL,
       y = "Word Count") +
  facet_wrap( ~ screen_name) +
  coord_flip() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank())
```
That said, it was interesting to see which politicians used certain prepositions more than others.
For instance, Bolsonaro uses "de" much more than the other Brazilian politicians.
He also discusses "Brasil" more than the other politicians do. Another interesting outlier is Araujo's frequent mention of Venezuela, which rivals his rate of Brasil mentions.
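These comparisons rest on raw counts, which favor accounts whose retrieved tweets contain more words overall. One possible refinement is to compare each word's share of an account's total words instead. The sketch below uses hypothetical accounts and counts (`toy_word_counts`), not the data from this analysis.

```{r}
library(dplyr)

# Hypothetical per-account word counts (not taken from the real tweets)
toy_word_counts <- tibble(
  screen_name = c("account_a", "account_a", "account_b", "account_b"),
  word = c("de", "brasil", "de", "brasil"),
  n = c(60, 40, 15, 5)
)

# Convert counts to shares of each account's total word count
toy_word_shares <- toy_word_counts %>%
  group_by(screen_name) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

toy_word_shares
```

In this toy data, account_a uses "de" more often in absolute terms (60 versus 15), yet "de" makes up a larger share of account_b's words (0.75 versus 0.6).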