---
title: "Using APIs"
subtitle: "SICSS, 2022"
author: Christopher Barrie
format:
  revealjs:
    chalkboard: true
editor: visual
---
## Introduction
- What is an API?
- `Counting tweets`
    - Applied examples
    - Group work
- `Getting tweets`
    - Applied examples
    - Group work
## What is an API?
- "Application Programming Interface"
- We'll focus on: Web APIs
## API use over time
![](images/programweb.png){fig-align="center"}
## Early APIs
![](images/ebayAPI.png){fig-align="center"}
## Early APIs
![](images/twitterAPI.png){fig-align="center"}
## Examples
![](images/diggengap.png){fig-align="center" width="406"}
- **APIs used**: Google AdWords; Facebook Marketing
- **Data and code**: <https://www.demographic-research.org/volumes/vol43/27/>
## Examples
![](images/munger1.png){fig-align="center" width="406"}
- **APIs used**: YouTube API
- **Data and code**: <https://github.com/kmunger/YT_descriptive>
## What's an API made of?
- We're talking about *Web APIs*
- "**A**pplication **P**rogramming **I**nterface"
- Helpful to think about compared to scraping...
## Scraping Twitter
![](images/scrape.gif){fig-align="center"}
## Using the Twitter API
![](images/api.gif){fig-align="center"}
## Requesting versus scraping
- **Scraping:**
    - Extracts visible content from the screen
    - Often trial and error, as pages are not optimized for serving data (legibility, not readability)
- **API calls:**
    - Request content from a "blank" version of the page
    - Deliver data in a more readily usable format (JSON, CSV, etc.)
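## Requesting versus scraping

To see why JSON counts as "more readily usable", here is a minimal sketch that parses a small, made-up JSON payload into a data frame. The payload and its field names (`id`, `text`) are illustrative assumptions rather than real API output, and the example assumes the `jsonlite` package is installed.

```{r, echo = T}
# Hypothetical payload: structure and values are invented for illustration
payload <- '{"data": [{"id": "1", "text": "first tweet"}, {"id": "2", "text": "second tweet"}]}'

# fromJSON() simplifies an array of records into a data frame by default
parsed <- jsonlite::fromJSON(payload)
tweets <- parsed$data

# One row per record, one column per field
tweets$text
```

No HTML parsing or trial and error is needed: the fields arrive already named.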
## What does it look like?
- Essentially, a long URL string...
```{r, echo = F}
endpoint_url <- "https://api.twitter.com/2/tweets/search/all"
cat(endpoint_url, paste0("+ something + something else"))
```
## What is it made of?
1. A query
2. An endpoint
3. Optional parameters
## What is it made of?
```{r, echo = T}
my_query <- "#BLM lang:EN"
endpoint_url <- "https://api.twitter.com/2/tweets/search/all"
params <- list(
  "query" = my_query,
  "start_time" = "2021-01-01T00:00:00Z",
  "end_time" = "2021-07-31T23:59:59Z",
  "max_results" = 20
)
```
## What is it made of?
```{r, echo = T}
my_query <- "#BLM lang:EN"
endpoint_url <- "https://api.twitter.com/2/tweets/search/all"
params <- list(
  "query" = my_query,
  "start_time" = "2021-01-01T00:00:00Z",
  "end_time" = "2021-07-31T23:59:59Z",
  "max_results" = 20
)
params
```
## What is it made of?
```{r, echo = T, eval = F}
r <- httr::GET(
  url = endpoint_url,
  httr::add_headers(
    Authorization = paste0("bearer ", Sys.getenv("TWITTER_BEARER"))
  ),
  query = params
)
r[["url"]]
```
```
[1] "https://api.twitter.com/2/tweets/search/all?query=%23BLM%20lang%3AEN&start_time=2021-01-01T00%3A00%3A00Z&end_time=2021-07-31T23%3A59%3A59Z&max_results=20"
```
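## What is it made of?

The `%23`, `%20`, and `%3A` in the URL on the previous slide are percent-encoded versions of `#`, a space, and `:`. `httr` does this encoding for us. As a rough illustration only, base R's `utils::URLencode()` with `reserved = TRUE` encodes the same characters in the same way.

```{r, echo = T}
my_query <- "#BLM lang:EN"
start_time <- "2021-01-01T00:00:00Z"

# reserved = TRUE also encodes reserved characters such as "#" and ":"
encoded_query <- utils::URLencode(my_query, reserved = TRUE)
encoded_start <- utils::URLencode(start_time, reserved = TRUE)

encoded_query
encoded_start
```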
## But we don't often do this...
- Pre-packaged libraries are more common
    - E.g., `academictwitteR`
    - Co-developed with [Justin Ho](https://github.com/justinchuntingho) and [Chung-Hong Chan](https://github.com/chainsawriot)
![](images/athex.png){fig-align="right" width="300"}
## So what we'll do is ...
- We'll focus on `academictwitteR`
- But later we'll also think more generally about:
    - APIs
    - Research design
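## So what we'll do is ...

As a rough sketch of what counting and getting tweets looks like with `academictwitteR`, the calls below reuse the query and date range from the earlier `httr` slides. The values `granularity = "day"` and `n = 100` are illustrative choices, and the code assumes a valid bearer token is stored in `TWITTER_BEARER` and that Academic Research access is active.

```{r, echo = T, eval = F}
library(academictwitteR)

# Count matching tweets per day (counts only, no tweet content)
tweet_counts <- count_all_tweets(
  query = "#BLM lang:EN",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-07-31T23:59:59Z",
  granularity = "day"
)

# Collect up to n matching tweets as a data frame
tweets <- get_all_tweets(
  query = "#BLM lang:EN",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-07-31T23:59:59Z",
  n = 100
)
```

Both functions read the token through `get_bearer()` by default, so it does not need to appear in the script.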
## Analyzing Twitter data
Why?
1. Important site of political communication
2. Important site of social interaction
3. It's very accessible
Especially since the introduction of the **Academic Research Product Track**...
## Twitter Academic API
- 10m tweets monthly
- Full historical archive
- Authorization requires...
## Authorization
Go [here](https://developer.twitter.com/en/products/twitter-api/academic-research)
1. Do you have an academic profile?
2. Can you describe your research?
    - Will you share data with a government entity?
    - Is the research non-commercial?
    - Is it conducted at the individual level?
    - Will you make non-anonymized data available?
## What you'll see
![](images/twitterdev.png){fig-align="center"}
Or a live example...
## What you'll see
This is what the [Developer Portal](https://developer.twitter.com/en/portal/dashboard) looks like
## Getting started...
![](images/TWITTER_BEARER.gif){fig-align="center"}
Don't forget to **Restart R**
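## Getting started...

After restarting R, it is worth confirming that the token is visible to the session. This is a minimal sketch, assuming the variable is named `TWITTER_BEARER` as in the earlier `httr::GET()` call; the helper name `has_bearer()` is hypothetical and not part of any package.

```{r, echo = T}
# Hypothetical helper: TRUE if the environment variable is set and non-empty
has_bearer <- function(var = "TWITTER_BEARER") {
  nzchar(Sys.getenv(var))
}

# Returns FALSE if the token was not saved or R was not restarted
has_bearer()
```

Checking with `nzchar()` avoids printing the token itself to the console or to the slides.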
## After we've done that...
We're good to go 🥳
## After we've done that...
We're good to go 🥳...
Sort of
## Ethical considerations
![](images/dectree.png){fig-align="center"}
## Ethical considerations
::: {layout-ncol="2"}
![](images/bitbybit.png)
"I don't think researchers should be automatically bound by such terms-of-service agreements. Ideally, if researchers violate terms-of-service agreements, they should explain their decision openly... as suggested by transparency-based accountability. But this openness may expose researchers to added legal risk..."
:::
## Ethical considerations
::: {layout-ncol="2"}
![](images/freelon.png)
"By employing TOS-compliant methods, you are respecting the business prerogatives of the company that created the platform you are studying, but you may or may not be respecting the dignity and privacy of the platform's users"
:::
## Summary
- Difference between TOS compliance and ethics
    - But related: users expect privacy
- Difference between ethics and legality
    - But related: users may be vulnerable