Analysis of changes measured on Twitter data of users reporting their infection to SARS-CoV-2

digitalepidemiologylab/content_changes_paper

Dynamics of social media behavior before and after SARS-CoV-2 infection

Methods

The first task is to identify Twitter users who reported that they tested positive for COVID-19. This step is performed by positive_filter.py (which depends on filters.py). The resulting test-positive tweets are stored in daily Parquet files under data/positive and are then merged into a single file (data/df_positive.pkl). The Twitter timelines of the selected users are then retrieved with download_timelines.py and stored (Pickle files in data/timelines/raw). The script parse_timelines.py then parses the raw timelines (in JSON Lines files) and stores the output in Parquet files. The following analyses are applied to the parsed timelines:

The results of these analyses are collected and concatenated with timeline_combine_all.py, which generates user-specific files in data/language/all_timelines.
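
As an illustration of the parsing step described above, the sketch below turns JSON Lines input into flat per-tweet records. It is not the code of parse_timelines.py: the function name `parse_timeline` and the field names (`id`, `user`, `created_at`, `text`) are assumptions chosen for the example, and the real script may keep different fields and writes Parquet rather than Python dictionaries.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_timeline(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one flat record per tweet from JSON Lines input.

    Hypothetical helper: the selected fields are assumptions, not the
    schema used by parse_timelines.py.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the run
        if not isinstance(tweet, dict):
            continue  # a valid JSON value that is not a tweet object
        user = tweet.get("user") or {}
        yield {
            "id": tweet.get("id"),
            "user_id": user.get("id"),
            "created_at": tweet.get("created_at"),
            "text": tweet.get("text", ""),
        }


# Small in-memory example standing in for one raw timeline file.
raw = [
    '{"id": 1, "user": {"id": 10}, "created_at": "2021-01-01", "text": "hello"}',
    "",
    "not json",
    '{"id": 2, "user": {"id": 10}, "created_at": "2021-01-02", "text": "tested positive"}',
]
records = list(parse_timeline(raw))
```

Skipping malformed lines keeps a single corrupted tweet from invalidating a whole timeline; whether the original script does the same is not stated in the source.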

Pre/post comparisons

After the tweets of the users who reported that they tested positive for COVID-19 have been processed with the various ML-based methods described above, the output files are stored in data/language/all_timelines. Individual-level pre/post comparisons on these data are then performed with statistical_analysis.py. The collective analyses consist of Wilcoxon signed-rank tests, as detailed in wilcoxon_features.R and adjusted_pvalues.py.
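
To make the collective test concrete, here is a minimal pure-Python sketch of a paired Wilcoxon signed-rank test. It is not a port of wilcoxon_features.R: the function name `wilcoxon_signed_rank` is hypothetical, zero differences are dropped, tied absolute differences receive average ranks, and the p-value comes from the plain normal approximation with no tie or continuity correction, so it can differ from what R reports, especially for small samples.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, float]:
    """Return (W, two-sided p-value) for paired samples.

    W is the smaller of the positive and negative rank sums. The p-value
    uses the normal approximation without tie or continuity correction.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, p_value


# Made-up pre/post values for one feature across eight users.
pre = [1, 2, 3, 4, 5, 6, 7, 8]
post = [2, 4, 6, 8, 10, 12, 14, 16]
w_stat, p_val = wilcoxon_signed_rank(pre, post)
```

In this toy example every user increases after the reported infection, so the negative rank sum, and therefore W, is zero.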

Note that the collective analyses should be run after executing statistical_analysis.py, since that script contains preprocessing steps that filter the users retained in the pre/post comparisons. More information about the output of statistical_analysis.py is provided here.
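
Because one Wilcoxon test is run per feature, the resulting p-values are adjusted afterwards (adjusted_pvalues.py). The source does not say which correction that script applies; the sketch below uses the Benjamini-Hochberg procedure purely as an illustrative assumption, and the function name `benjamini_hochberg` is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: this correction is chosen for illustration only and may
    not be the one used in adjusted_pvalues.py.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, one per feature.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Each adjusted value is at least as large as the raw p-value it replaces, and the ordering of the features by significance is preserved.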

Figures

The figures shown in the article can be generated as follows:
