title: 'academictwitteR: an R package to access the Twitter Academic Research Product Track v2 API endpoint'
tags:
  - R
  - twitter
  - social media
  - API
authors:
  - name: Christopher Barrie
    orcid: 0000-0002-9156-990X
    affiliation: 1
  - name: Justin Chun-ting Ho
    orcid: 0000-0002-7884-1059
    affiliation: 2
affiliations:
  - name: School of Social and Political Sciences, University of Edinburgh, Scotland, UK.
    index: 1
  - name: Centre for European Studies and Comparative Politics, Sciences Po, France.
    index: 2
date: 23 April 2021
bibliography: paper.bib

Statement of need

In January 2021, Twitter announced the "Academic Research Product Track." This provides academic researchers with greatly expanded access to Twitter data. Existing R packages for querying the Twitter API, such as the popular rtweet package [@rtweet], have yet to introduce functionality that allows users to connect to the new v2 API endpoints with Academic Research Product Track credentials. The academictwitteR package [@academictwitteR] is built with academic research in mind. It encourages efficient and responsible storage of data, given the large volumes likely to be collected, and provides a number of shortcut and query-building functions for accessing the new v2 API endpoints.

Summary

The Twitter Application Programming Interface, or API, was first introduced in 2006. It was designed principally with commercial objectives in mind. Over time, however, researchers began to repurpose the Twitter API for academic ends. In January 2021, Twitter announced the "Academic Research Product Track", noting that "[t]oday, academic researchers are one of the largest groups of people using the Twitter API."

Authorization for the Academic Research Product Track provides access to the Twitter v2 API endpoints, introduced in 2020, as well as much improved data access. In summary, the Academic Research Product Track gives the authorized user:

  1. Access to the full archive of (as-yet-undeleted) tweets published on Twitter;
  2. A higher monthly tweet cap (10m, or 20x what was previously possible with the standard v1.1 API);
  3. Ability to access these data with more precise filters permitted by the v2 API.
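
As a rough illustration of what the three points above mean at the request level, the following base R sketch assembles (but does not send) the parameters of a full-archive search call. The endpoint path, the parameter names `start_time`, `end_time` and `max_results`, and the limit of 500 tweets per page are assumptions based on Twitter's v2 documentation at the time of writing. The helper `sketch_search_params()` is hypothetical and is not part of academictwitteR.

```r
# Hypothetical helper: describe a full-archive search request.
# Nothing is sent over the network; the function only builds a list.
sketch_search_params <- function(query, start_time, end_time, max_results = 500) {
  stopifnot(is.character(query), length(query) == 1, nzchar(query))
  # Assumed v2 limit for full-archive search: 10 to 500 tweets per page.
  stopifnot(max_results >= 10, max_results <= 500)
  list(
    endpoint = "https://api.twitter.com/2/tweets/search/all", # assumed path
    params = list(
      query = query,
      start_time = start_time, # ISO 8601 timestamp
      end_time = end_time,
      max_results = max_results
    )
  )
}

req <- sketch_search_params(
  query = "climate lang:en",
  start_time = "2020-01-01T00:00:00Z",
  end_time = "2020-01-05T00:00:00Z"
)

# At 500 tweets per page, reaching the 10m monthly cap takes at least
# this many pages of results.
pages_for_cap <- 10e6 / req$params$max_results
```

Under these assumptions a single month's cap corresponds to many thousands of paginated responses, which is why storage and interruption handling matter for this kind of collection.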

The academictwitteR package was designed: 1) to make the Academic Research Product Track easily accessible for R users by providing dedicated functions to query the v2 API endpoints; 2) to encourage academic researchers to store their data efficiently and safely.

The functions allow the user to collect tweets from (or to) specified users and to collect tweets containing specified words or sets of words. In particular, queries that include so-called "conjunction-required" operators are available through a set of shortcut functions, e.g. for tweets containing media content, tweets containing geographic location information, or tweets containing URLs. Additionally, separate query builder functions allow the user to specify complex queries to incorporate into the API call.
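
To make the idea of "conjunction-required" operators concrete, here is a minimal base R sketch of a query builder. It is not the package's own implementation: `sketch_query()` and its arguments are hypothetical, and the operator strings (`from:`, `to:`, `has:media`, `has:geo`, `has:links`) are taken from Twitter's v2 query syntax as an assumption. A conjunction-required operator cannot form a query alone, so the helper refuses to build one unless a standalone operator (a keyword or a user filter) is also present.

```r
# Hypothetical helper (not the package's API): compose a v2 query string.
# Assumes at most one user in `from` and one in `to`.
sketch_query <- function(terms = NULL, from = NULL, to = NULL,
                         has_media = FALSE, has_geo = FALSE, has_links = FALSE) {
  # Standalone operators: keywords (joined with OR) and user filters.
  standalone <- c(
    if (length(terms) > 1) paste0("(", paste(terms, collapse = " OR "), ")") else terms,
    if (!is.null(from)) paste0("from:", from),
    if (!is.null(to)) paste0("to:", to)
  )
  # Conjunction-required operators: valid only alongside a standalone one.
  conjunction <- c(
    if (has_media) "has:media",
    if (has_geo) "has:geo",
    if (has_links) "has:links"
  )
  if (length(standalone) == 0) {
    stop("Conjunction-required operators need at least one standalone operator.")
  }
  # A space between operators is treated as a logical AND.
  paste(c(standalone, conjunction), collapse = " ")
}

q_media <- sketch_query(terms = c("climate", "energy"), has_media = TRUE)
q_links <- sketch_query(from = "TwitterDev", has_links = TRUE)
```

Here `q_media` combines a keyword group with a media filter and `q_links` combines a user filter with a URL filter, while a call such as `sketch_query(has_geo = TRUE)` stops with an error.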

Data are stored in serialized form as RDS files or as separate JSON files. The former is the most efficient of R's native data-file formats; the latter helps mitigate data loss by writing a separate JSON file for each pagination token (that is, for each page of up to 500 tweets). Convenience functions are also included to bind tweet- and user-level information stored as JSON files, and to pick up data collection where it left off in the case of unplanned interruption.
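
The storage pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the helpers `store_page()` and `bind_pages()`, the file naming scheme, and the toy data are all hypothetical, and the sketch relies on the jsonlite package for reading and writing JSON.

```r
library(jsonlite)

# Hypothetical helper: write one JSON file per page of results, so that an
# interruption loses at most the page currently being collected.
store_page <- function(page, page_id, data_path) {
  dir.create(data_path, showWarnings = FALSE, recursive = TRUE)
  write_json(page, file.path(data_path, paste0("data_", page_id, ".json")))
}

# Hypothetical helper: read every stored page and stack the rows.
bind_pages <- function(data_path) {
  files <- list.files(data_path, pattern = "^data_.*\\.json$", full.names = TRUE)
  do.call(rbind, lapply(files, fromJSON))
}

# Toy stand-ins for two pages of API results.
data_path <- tempfile("tweets")
pages <- list(
  data.frame(id = c("1", "2"), text = c("first", "second")),
  data.frame(id = "3", text = "third")
)
for (i in seq_along(pages)) {
  store_page(pages[[i]], page_id = i, data_path = data_path)
}

# Bind the pages, then keep a compact serialized copy as an RDS file.
all_tweets <- bind_pages(data_path)
rds_file <- file.path(data_path, "tweets.rds")
saveRDS(all_tweets, rds_file)
```

Because each page lands on disk as soon as it arrives, the files already present in `data_path` record how far a collection progressed before any interruption.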

References