Release v1.24
2020 US Presidential Election Tweet IDs
The repository contains an ongoing collection of Tweet IDs associated with the 2020 United States presidential election, with our data collection starting on May 20, 2019. We used Twitter’s streaming API to follow specified accounts and to collect, in real time, Tweets that mention specific keywords. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use. We currently have a backlog of historical Twitter files that we are pre-processing and extracting Tweet IDs from; we will release both past and future data sets as pre-processing finishes. Thank you for your patience!
This release contains Tweet IDs collected from 12/01/19 to 04/16/21.
The paper associated with this repository is: #Election2020: The First Public Twitter Dataset on the 2020 US Presidential Election
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:
Chen, E., Deb, A. & Ferrara, E. #Election2020: the first public Twitter dataset on the 2020 US Presidential election. J Comput Soc Sc (2021). https://doi.org/10.1007/s42001-021-00117-9
The published paper is available from the Journal of Computational Social Science.
A PDF of the paper is also available on arXiv: #Election2020: The First Public Twitter Dataset on the 2020 US Presidential Election
Statistics Summary (v1.24)
Number of Tweets: 1,482,120,978
Language breakdown of the 10 most prevalent languages:
Language | ISO | No. Tweets | % of total Tweets |
---|---|---|---|
English | en | 1,312,617,690 | 88.56% |
Undefined | und | 103,104,355 | 6.96% |
Spanish | es | 23,618,229 | 1.59% |
French | fr | 7,277,329 | 0.49% |
Portuguese | pt | 6,836,086 | 0.46% |
Japanese | ja | 4,578,769 | 0.31% |
Turkish | tr | 2,478,179 | 0.17% |
German | de | 2,252,064 | 0.15% |
Italian | it | 2,032,204 | 0.14% |
Indonesian | in | 1,980,117 | 0.13% |
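The percentage column can be reproduced from the counts with simple arithmetic. The following is a minimal sketch, not part of the release tooling; the names `TOTAL_TWEETS`, `LANGUAGE_COUNTS`, and `percent_of_total` are hypothetical and chosen for illustration, while the numbers are copied from the table above.

```python
# Counts copied from the v1.24 statistics table; keys are the language
# codes exactly as listed in the ISO column.
TOTAL_TWEETS = 1_482_120_978
LANGUAGE_COUNTS = {
    "en": 1_312_617_690,
    "und": 103_104_355,
    "es": 23_618_229,
    "fr": 7_277_329,
    "pt": 6_836_086,
    "ja": 4_578_769,
    "tr": 2_478_179,
    "de": 2_252_064,
    "it": 2_032_204,
    "in": 1_980_117,
}


def percent_of_total(count, total=TOTAL_TWEETS):
    """Return `count` as a percentage of `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


# Share of the whole collection for each listed language.
shares = {iso: percent_of_total(n) for iso, n in LANGUAGE_COUNTS.items()}

# Tweets whose language is outside the 10 listed ones.
other_count = TOTAL_TWEETS - sum(LANGUAGE_COUNTS.values())

if __name__ == "__main__":
    for iso, share in shares.items():
        print(f"{iso}: {share:.2f}%")
    print(f"other: {percent_of_total(other_count):.2f}%")
```

Because only the 10 most prevalent languages are listed, the shares do not sum to 100%; the remainder covers every other language in the collection.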
Known Gaps
Date | Time |
---|---|
12/06/19 | 00:00 - 23:00 UTC |
12/07/19 | 18:00 - 19:00 UTC |
12/29/19 | 01:00 - 02:00 UTC |
01/03/20 | 05:00 - 06:00 UTC |
01/05/20 | 02:00 - 24:00 UTC |
01/06/20 | 00:00 - 18:00 UTC |
01/19/20 | 04:00 - 23:00 UTC |
01/25/20 | 01:00 - 20:00 UTC |
01/28/20 | 22:00 - 24:00 UTC |
01/29/20 | 00:00 - 05:00 UTC |
02/02/20 | 03:00 - 18:00 UTC |
02/04/20 | 19:00 - 24:00 UTC |
02/05/20 | 00:00 - 23:00 UTC |
03/05/20 | 23:00 - 24:00 UTC |
03/06/20 | 00:00 - 24:00 UTC |
03/07/20 | 00:00 - 02:00 UTC |
03/09/20 | 19:00 - 22:00 UTC |
03/17/20 | 00:00 - 01:00 UTC |
03/21/20 | 23:00 - 24:00 UTC |
03/25/20 | 22:00 - 24:00 UTC |
03/26/20 | 00:00 - 15:00 UTC |
04/02/20 | 00:00 - 23:00 UTC |
04/12/20 | 00:00 - 15:00 UTC |
04/16/20 | 23:00 - 24:00 UTC |
04/18/20 | 23:00 - 24:00 UTC |
04/19/20 | 00:00 - 05:00 UTC |
04/21/20 | 22:00 - 24:00 UTC |
04/27/20 | 22:00 - 24:00 UTC |
04/28/20 | 01:00 - 14:00 UTC |
04/30/20 | 20:00 - 24:00 UTC |
05/01/20 | 00:00 - 24:00 UTC |
05/02/20 | 00:00 - 04:00 UTC |
05/05/20 | 17:00 - 18:00 UTC |
05/07/20 | 00:00 - 13:00 UTC |
05/17/20 | 23:00 - 24:00 UTC |
05/18/20 | 00:00 - 15:00 UTC |
05/23/20 | 23:00 - 24:00 UTC |
05/26/20 | 21:00 - 24:00 UTC |
05/27/20 | 00:00 - 03:00 UTC |
05/30/20 | 16:00 - 24:00 UTC |
05/31/20 | 00:00 - 06:00 UTC |
06/01/20 | 17:00 - 19:00 UTC |
06/03/20 | 18:00 - 22:00 UTC |
06/04/20 | 21:00 - 24:00 UTC |
06/05/20 | 00:00 - 17:00 UTC |
06/07/20 | 14:00 - 24:00 UTC |
06/08/20 | 00:00 - 04:00 UTC |
06/12/20 | 20:00 - 24:00 UTC |
06/13/20 | 00:00 - 21:00 UTC |
06/16/20 | 17:00 - 24:00 UTC |
06/17/20 | 00:00 - 02:00 UTC |
06/19/20 | 16:00 - 17:00 UTC |
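When analyzing Tweet volume over time, it can help to check whether a timestamp falls inside one of the gaps listed above. The following is a minimal sketch under stated assumptions: dates are read as MM/DD/YY, each gap is treated as a half-open interval that excludes its end time (the table does not say whether the end is inclusive), and `24:00` is read as midnight at the start of the next day. The names `KNOWN_GAPS`, `parse_gap`, and `in_known_gap` are hypothetical, and only a few rows are copied in for brevity.

```python
from datetime import datetime, timedelta, timezone

# A few rows copied from the Known Gaps table; extend with the remaining rows.
KNOWN_GAPS = [
    ("12/06/19", "00:00 - 23:00 UTC"),
    ("01/05/20", "02:00 - 24:00 UTC"),
    ("01/06/20", "00:00 - 18:00 UTC"),
]


def parse_gap(date_str, time_str):
    """Convert one table row into a (start, end) pair of aware UTC datetimes."""
    day = datetime.strptime(date_str, "%m/%d/%y").replace(tzinfo=timezone.utc)
    start_txt, end_txt = time_str.replace("UTC", "").split("-")

    def offset(hhmm):
        # Using a timedelta lets "24:00" roll over to the next day,
        # which datetime's own hour field would reject.
        hours, minutes = hhmm.strip().split(":")
        return timedelta(hours=int(hours), minutes=int(minutes))

    return day + offset(start_txt), day + offset(end_txt)


GAP_INTERVALS = [parse_gap(d, t) for d, t in KNOWN_GAPS]


def in_known_gap(moment):
    """Return True if the aware datetime `moment` lies in any [start, end) gap."""
    return any(start <= moment < end for start, end in GAP_INTERVALS)


if __name__ == "__main__":
    sample = datetime(2020, 1, 5, 23, 59, tzinfo=timezone.utc)
    print(in_known_gap(sample))
```

Timestamps passed to `in_known_gap` must be timezone-aware; comparing a naive datetime against these intervals raises a `TypeError`.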
Inquiries
Please first read through the README and the closed issues to see whether your question has already been addressed.
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Related Papers
- Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set (Dataset on GitHub)
- What types of COVID-19 conspiracies are populated by Twitter bots?
- Political polarization drives online conversations about COVID‐19 in the United States
- Characterizing social media manipulation in the 2020 U.S. presidential election
- COVID-19 misinformation and the 2020 U.S. presidential election