Repo containing code for the R package academictwitteR to collect tweets from the v2 API endpoint for the Academic Research Product Track.
Get started by reading `vignette("academictwitteR")`.
To cite package ‘academictwitteR’ in publications use:
Christopher Barrie and Justin Chun-ting Ho (2021). academictwitteR: an R package to access the Twitter Academic Research Product Track v2 API endpoint. R package version 0.0.0.9000. https://github.com/cjbarrie/academictwitteR. doi:10.5281/zenodo.4714637
A BibTeX entry for LaTeX users is
@Manual{academictwitteR,
  title = {academictwitteR: an R package to access the Twitter Academic Research Product Track v2 API endpoint},
  author = {Christopher Barrie and Justin Chun-ting Ho},
  year = {2021},
  note = {R package version 0.0.0.9000},
  url = {https://github.com/cjbarrie/academictwitteR},
  doi = {10.5281/zenodo.4714637},
}
You can install the development package with:
devtools::install_github("cjbarrie/academictwitteR")
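Once installed, load the package in the usual way:

library(academictwitteR)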
The academictwitteR package has been designed with the efficient storage of data in mind. Queries to the API include arguments to specify whether tweets should be stored as a .rds file (using the `file` argument) or as separate JSON files for tweet- and user-level information (using the `data_path` argument).
Tweets are returned as a data.frame object and, when a `file` argument has been included, will also be saved as a .rds file.
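As a minimal sketch of the `file` argument (using the `get_all_tweets()` function introduced below; the file name is illustrative), a call saving results to disk might look like:

bearer_token <- "" # Insert bearer token

tweets <-
  get_all_tweets("apples OR oranges",
                 "2020-01-01T00:00:00Z",
                 "2020-01-05T00:00:00Z",
                 bearer_token,
                 file = "apples_oranges") # returned tweets are also saved to disk as a .rds file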
Getting tweets of specified users via `get_user_tweets()`. This function captures tweets for a particular user or set of users between specified date ranges, avoiding rate limits by sleeping between calls. A call may look like:
bearer_token <- "" # Insert bearer token
users <- c("TwitterDev", "jack")
tweets <-
  get_user_tweets(users,
                  "2010-01-01T00:00:00Z",
                  "2020-01-01T00:00:00Z",
                  bearer_token)
Getting tweets containing a specified string or series of strings via `get_all_tweets()`. This function captures tweets containing a particular string or set of strings between specified date ranges, avoiding rate limits by sleeping between calls.
This function can also capture tweets for a particular hashtag or set of hashtags when specified with the # operator.
For a particular set of strings a call may look like:
bearer_token <- "" # Insert bearer token
tweets <-
  get_all_tweets("apples OR oranges",
                 "2020-01-01T00:00:00Z",
                 "2020-01-05T00:00:00Z",
                 bearer_token)
For a particular set of hashtags a call may look like:
bearer_token <- "" # Insert bearer token
tweets <-
  get_all_tweets(
    "#BLM OR #BlackLivesMatter",
    "2020-01-01T00:00:00Z",
    "2020-01-05T00:00:00Z",
    bearer_token
  )
Note that the "AND" operator is implicit when specifying more than one character string in the query. See the Twitter API documentation for guidance on building queries for tweet searches.
Thus, when searching for all elements of a character string, a call may look like:
bearer_token <- "" # Insert bearer token
tweets <-
  get_all_tweets("apples oranges",
                 "2020-01-01T00:00:00Z",
                 "2020-01-05T00:00:00Z",
                 bearer_token)
This will capture tweets containing both the words "apples" and "oranges". The same logic applies to hashtag queries.
Files are stored as JSON files in the specified directory when a `data_path` is specified. Tweet-level data is stored in files beginning "data_"; user-level data is stored in files beginning "users_".
If a filename is supplied, the functions will save the resulting tweet-level information as a .rds file.
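As a minimal sketch (assuming a `data_path` of "data/", as in the example further below), the stored JSON files can be inspected with base R:

# List the tweet- and user-level JSON files written to the data path
list.files("data/", pattern = "^data_")  # tweet-level files
list.files("data/", pattern = "^users_") # user-level files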
Functions always return a data.frame object unless a `data_path` is specified and `bind_tweets` is set to FALSE. When collecting large amounts of data, we recommend using the `data_path` option with `bind_tweets = FALSE`. This mitigates potential data loss in case the query is interrupted.
An example of such a query would be:
bearer_token <- "" # Insert bearer token
tweets <-
  get_all_tweets(
    "#BLM OR #BlackLivesMatter",
    "2014-01-01T00:00:00Z",
    "2020-01-01T00:00:00Z",
    bearer_token,
    data_path = "data/",
    bind_tweets = FALSE
  )
This would collect all tweets containing the hashtags "#BLM" or "#BlackLivesMatter" over a six-year period.
Users can then use the `bind_tweet_jsons` and `bind_user_jsons` convenience functions to bundle the JSONs into a data.frame object for analysis in R, as such:
tweets <- bind_tweet_jsons(data_path = "data/")
users <- bind_user_jsons(data_path = "data/")
For more information on the parameters and fields available from the v2 Twitter API endpoint see: https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-all.
The API call returns both the tweet data and the user information separately, but currently only the former is parsed. It is possible to obtain other user information, such as the user handle and display name, with `get_user_profile()`. These can then be merged with the tweet data using the author_id field, as sketched below.
bearer_token <- "" # Insert bearer token
users <- c("TwitterDev", "jack")
tweets_df <-
  get_user_tweets(users,
                  "2020-01-01T00:00:00Z",
                  "2020-01-05T00:00:00Z",
                  bearer_token)

users_df <-
  get_user_profile(unique(tweets_df$author_id), bearer_token)
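As a minimal sketch of the merge itself (assuming the user IDs in users_df are held in a column named "id", the field name used by the v2 endpoint; check names(users_df) to confirm):

# Join user-level information onto the tweet-level data by author ID
tweets_users_df <- merge(tweets_df, users_df,
                         by.x = "author_id", by.y = "id")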
The get_user_profile() function was originally taken from a Gist by https://github.com/schochastics.