This dataset contains tweets and users, mostly from the Portuguese Twittersphere. The watched users stem from a seed of political accounts (`usernames`) and news sources (`usernames_news`); see the lists in the config below.
Property | Description |
---|---|
event | Portuguese Presidential Elections, Jan 24th 2021 on Wikipedia |
collection start | Sep 2nd 2020 |
collection end | Jan 30th 2021 |
code release | v1.0 |
code branch archive | branch |
tweets | 57 155 221 (57 million) |
users | 1 115 491 (1 million) |
labels | 1857 suspended users (along with time_suspended date, and time_unsuspended in case of account reactivation) |
archived API for EW UI | election-watch-portugal-presidentials-2021 |
download link | download 9.5GB zip |
restore command | mongorestore --uri="mongodb://localhost:27017/" -d ew_db ./election-watch-folder --gzip |
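
The restore command in the table can also be assembled from a script, for example to restore into a differently named database. The sketch below only builds the argument list using the flags shown in the table; the function name `build_restore_command` and its defaults are illustrative, and actually running it assumes `mongorestore` is installed and a MongoDB instance is reachable at the given URI.

```python
import subprocess


def build_restore_command(dump_dir="./election-watch-folder",
                          uri="mongodb://localhost:27017/",
                          database="ew_db"):
    """Build the mongorestore argument list from the table's restore command."""
    return ["mongorestore", f"--uri={uri}", "-d", database, dump_dir, "--gzip"]


command = build_restore_command()
print(" ".join(command))

# To actually restore (needs mongorestore on PATH and a running MongoDB):
# subprocess.run(command, check=True)
```
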
The dataset contains labels on suspended users, as well as a `tweeted_languages` property that holds the aggregated number of tweets per language (as detected by Twitter).
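
As a minimal sketch of how `tweeted_languages` might be used, the snippet below sums the per-language counts over a few user documents. It assumes each user document stores `tweeted_languages` as a mapping from language code to tweet count; that shape and the sample documents are assumptions for illustration, not taken from the dataset.

```python
from collections import Counter


def total_tweets_per_language(users):
    """Sum per-language tweet counts across user documents.

    Assumes (hypothetically) that each document carries a
    `tweeted_languages` dict such as {"pt": 120, "en": 3};
    documents without the property are skipped.
    """
    totals = Counter()
    for user in users:
        totals.update(user.get("tweeted_languages") or {})
    return totals


# Made-up documents, only to show the call.
sample_users = [
    {"screen_name": "user_a", "tweeted_languages": {"pt": 120, "en": 3}},
    {"screen_name": "user_b", "tweeted_languages": {"pt": 40, "und": 7}},
    {"screen_name": "user_c"},
]
totals = total_tweets_per_language(sample_users)
print(totals.most_common())
```
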
The collection process was hindered for a period of 5 days, from 19/10/2020 to 24/10/2020, as you can see in the picture below.
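
When analysing activity over time, it may help to flag tweets that fall inside this window so that the dip is not mistaken for a real drop in activity. The sketch below assumes timestamps use the same Twitter-style format as `oldest_tweet` in the config, that the dates are UTC days, and that 24/10/2020 is included in the gap; all three are assumptions, and the names are illustrative.

```python
from datetime import datetime, timezone

# Gap reported above (dd/mm/yyyy). Treating the dates as UTC days and
# including all of 24/10/2020 is an assumption.
GAP_START = datetime(2020, 10, 19, tzinfo=timezone.utc)
GAP_END = datetime(2020, 10, 25, tzinfo=timezone.utc)  # exclusive upper bound

# Same layout as "oldest_tweet" in the config below.
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def in_collection_gap(created_at):
    """Return True if a Twitter-style timestamp falls inside the gap."""
    moment = datetime.strptime(created_at, TWITTER_DATE_FORMAT)
    return GAP_START <= moment < GAP_END


print(in_collection_gap("Tue Oct 20 12:00:00 +0000 2020"))
print(in_collection_gap("Tue Sep 1 00:00:00 +0000 2020"))
```
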
If you use any of these datasets in your work, please cite the thesis they are based on. Here is the BibTeX version:
@mastersthesis{ramalho2021highlevel,
title={High-level Approaches to Detect Malicious Political Activity on Twitter},
author={Miguel Sozinho Ramalho},
year={2021},
eprint={2102.04293},
archivePrefix={arXiv},
primaryClass={cs.SI}
}
{
"seed": {
"usernames": ["CeuAlbuquerque", "HelgaCorreia2", "cli_as", "ainterna_pt", "_jalmeida_", "jmpureza", "Jesario1", "_tinoderans_", "ascenso_simoes", "moisesscf", "PSantanaLopes", "francisco__rs", "Partido_PAN", "proque_twit", "JoanaMortagua", "anamiguel1981", "coelho_lima1", "Telmo_Correia", "EBrilhanteDias", "KatarMoreira2", "_ERGUE_TE", "AnaPassosFaro", "FirminoMarquesB", "economia_pt", "cristovaonorte", "1956purp", "MinistroCabrita", "OsVerdes", "pdr_coimbra", "LiberalPT", "limacosta", "jprebelo_sejd", "MRMortagua", "RuiRioPSD", "cultura_pt", "partido_alianca", "JooPaul57839990", "EsquerdaNet", "carlitosbras", "pcp_pt", "ebarrocomelo", "PedroFgSoares", "AlexandraNViei1", "anabela_pedroso", "Diogo_Leao", "zmaglh", "partidochega", "ruitavares", "catarinarf", "filipenb", "AndreCVentura", "MigCMatos", "RBaptistaLeite", "paulorios65", "andrecventura", "govpt", "aapbatista", "JoaoAtaide", "DuarteMarques", "gracafonseca", "alberto_machado", "ambiente_pt", "JorgePauloOliv2", "AntonioFilipe", "FernandoRuasPE", "_CDSPP", "antoniocostapm", "DuarteCordeiro", "tbribeiro", "heloisapolonia", "PartidoTerraMPT", "pedrosizavieira", "jvstorres", "lnes_Sousa_Real", "catarina_mart", "mariofcenteno", "LaraFMartinho", "Alexandre_Poco", "jlcarneiro2009", "coelhopresident", "ppdpsd", "MariaManuelRola", "LuisVPMonteiro", "LIVREpt", "cdupcppev", "Educacao_PT", "justica_pt", "editeestrela", "monicaquintela3", "movimentojpp", "noscidadaos", "HortenseMartins", "defesa_pt", "Ana_M_MG", "PNSpedronuno", "CristasAssuncao", "jpintocoelho60", "HugCarvalho", "luismtesta", "psocialista", "AnaMartinsGomes", "BrunoARFialho", "joao_ferreira33", "mmatias_", "_tinoderans_", "LiberalMayan"],
"usernames_news": ["lusa_noticias", "cmjornal", "dntwit", "JornalNoticias", "dnoticiaspt", "AO_Online", "Publico", "SICNoticias", "observadorpt", "tvi24pt", "RTPNoticias", "expresso", "Renascenca", "Radio_Comercial", "ojeconomico", "ECO_PT", "dinheiro_vivo", "SolOnline", "Visao_pt", "itwitting"]
},
"collection": {
"seed": {
"friends": false,
"followers": true
},
"limits": {
"max_watched_users": 1000000,
"max_daily_increase": 5000,
"max_daily_increase_ratio": 0.1,
"min_appearances_before_watched": 100
},
"ignore_tweet_media": true,
"oldest_tweet": "Tue Sep 1 00:00:00 +0000 2020",
"search_languages": ["pt", "und"],
"max_threads": 16,
"min_tweets_before_restricting_by_language": 10
},
"mongodb": {
"database": "electionswatch"
}
}
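
The seed list above repeats a few handles (`_tinoderans_` appears twice, and `AndreCVentura` also appears as `andrecventura`). Since Twitter handles are case-insensitive, the number of distinct seed accounts is slightly smaller than the length of the list. A minimal sketch of counting them follows; the helper name is illustrative and not part of the collection code.

```python
def dedupe_usernames(usernames):
    """Drop repeated handles, comparing case-insensitively.

    Keeps the first spelling seen and preserves the original order.
    """
    seen = set()
    unique = []
    for name in usernames:
        key = name.lower()
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique


# Short excerpt of the seed list, enough to show both kinds of repeat.
seed_excerpt = ["AndreCVentura", "_tinoderans_", "andrecventura", "govpt", "_tinoderans_"]
print(dedupe_usernames(seed_excerpt))
```
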