The first task is to identify Twitter users who reported that they tested positive for Covid-19. This step is performed by `positive_filter.py` (which depends on `filters.py`).
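As a minimal illustration of this filtering step, a keyword/regular-expression match over a DataFrame of tweets could look like the sketch below; the pattern and column name are assumptions for illustration, not the actual rules implemented in `filters.py`.

```python
import re
import pandas as pd

# Hypothetical pattern: the actual filtering rules live in filters.py.
POSITIVE_PATTERN = re.compile(
    r"\bI\b.{0,40}\btested\s+positive\b.{0,60}\b(covid|corona)", re.IGNORECASE
)

def filter_positive(tweets: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
    """Keep tweets whose text looks like a self-reported positive Covid-19 test."""
    mask = tweets[text_col].str.contains(POSITIVE_PATTERN, na=False)
    return tweets[mask]
```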
The so-called test-positive tweets are stored under `data/positive` in daily Parquet files and are then grouped into a single file (`data/df_positive.pkl`).
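The grouping step can be pictured as a simple pandas concatenation of the daily files (paths are taken from the description above; any deduplication or column handling done by the actual scripts is omitted):

```python
from pathlib import Path
import pandas as pd

# Read the daily Parquet files produced by positive_filter.py ...
daily_files = sorted(Path("data/positive").glob("*.parquet"))
df_positive = pd.concat((pd.read_parquet(f) for f in daily_files), ignore_index=True)

# ... and store them as a single Pickle file.
df_positive.to_pickle("data/df_positive.pkl")
```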
The Twitter timelines of the selected users are then retrieved with `download_timelines.py` and stored as Pickle files in `data/timelines/raw`.
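For illustration only, retrieving and storing a user timeline through the Twitter API v1.1 with tweepy might look as follows; `download_timelines.py` may use different authentication, endpoints, or storage details, so treat this as a sketch under those assumptions.

```python
import pickle
import tweepy

# Placeholder credentials; the real script handles authentication on its own terms.
auth = tweepy.OAuthHandler("API_KEY", "API_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def download_timeline(user_id: str, out_path: str) -> None:
    """Fetch up to ~3,200 recent tweets of a user and store the raw JSON as a Pickle file."""
    statuses = tweepy.Cursor(
        api.user_timeline, user_id=user_id, count=200, tweet_mode="extended"
    ).items()
    with open(out_path, "wb") as fh:
        pickle.dump([status._json for status in statuses], fh)
```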
The script `parse_timelines.py` is then used to parse the raw timelines (JSON Lines files) and store the output data in Parquet files.
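Reduced to its simplest form, this parsing step converts each JSON Lines timeline into a Parquet file, e.g. with pandas; field names and output paths below are assumptions, and the real script performs more extensive field selection and flattening.

```python
from pathlib import Path
import pandas as pd

def parse_timeline(jsonl_path: Path, parquet_path: Path) -> None:
    """Read one raw timeline (JSON Lines) and write the parsed tweets to Parquet."""
    df = pd.read_json(jsonl_path, lines=True)
    # Keep a few scalar fields (assumed names) so the frame serialises cleanly to Parquet.
    keep = [c for c in ("id", "created_at", "full_text", "lang") if c in df.columns]
    df[keep].to_parquet(parquet_path, index=False)

for raw_file in Path("data/timelines/raw").glob("*.jsonl"):
    parse_timeline(raw_file, raw_file.with_suffix(".parquet"))
```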
The following analyses are applied to the parsed timelines:
- Tagging of tweets containing symptoms (`timeline_medcat.py`). Tweets are tagged with MedCAT; see the sketch after this list. We used `sampling_for_comparison.py` to sample 100 tweets for the comparison of MedCAT with the lexicon-based approach developed by Sarker et al.
- Temporal assessment of the self-reports of symptoms through Named Entity Recognition with SUTime (`time_extract.py`)
- Filtering self-reports of symptoms (cf. `reporting_classification` folder)
- Domain analysis of shared URLs (`timeline_url.py`)
- Multi-label classification of the tweets into different general topics (cf. `topic_classification` folder)
- Multi-label classification of tweets according to the expressed emotions (cf. `SpanEmo` folder)
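As a sketch of the MedCAT tagging mentioned in the first item above (the model pack path, input file, and column names are hypothetical; the actual logic is in `timeline_medcat.py`):

```python
import pandas as pd
from medcat.cat import CAT

# Hypothetical model pack; a MedCAT model configured for symptom concepts is assumed.
cat = CAT.load_model_pack("models/medcat_modelpack.zip")

def tag_symptoms(text: str) -> list:
    """Return the names of the medical concepts MedCAT detects in a tweet."""
    entities = cat.get_entities(text)["entities"].values()
    return [ent["pretty_name"] for ent in entities]

# Hypothetical parsed timeline file and text column.
timeline = pd.read_parquet("data/timelines/parsed/example_user.parquet")
timeline["concepts"] = timeline["full_text"].apply(tag_symptoms)
```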
The results of these various analyses are collected and concatenated with `timeline_combine_all.py`, which generates user-specific files in `data/language/all_timelines`.
After the tweets of the users who reported that they tested positive for Covid-19 are processed with the various ML-based methods described above, the output files are stored in `data/language/all_timelines`. Individual-level pre/post comparisons on these data are then performed with `statistical_analysis.py`.
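A bare-bones illustration of such an individual-level pre/post comparison (column names and the way the positive-test date is passed in are assumptions; the full logic, including user filtering, is in `statistical_analysis.py`):

```python
import pandas as pd

def pre_post_means(timeline: pd.DataFrame, positive_date: pd.Timestamp, feature: str):
    """Compare the mean value of a per-tweet feature before and after the positive test."""
    before = timeline.loc[timeline["created_at"] < positive_date, feature].mean()
    after = timeline.loc[timeline["created_at"] >= positive_date, feature].mean()
    return before, after

# e.g. pre_post_means(user_df, pd.Timestamp("2020-11-15"), "symptom_mention")  # hypothetical names
```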
The collective analyses consist of Wilcoxon signed-rank tests, as detailed in `wilcoxon_features.R` and `adjusted_pvalues.py`.
It should be noted that the collective analyses must be performed after executing `statistical_analysis.py`, since the latter script contains a few preprocessing steps required for filtering the users retained in the pre/post comparisons. More information about the output of `statistical_analysis.py` is provided here.
The figures shown in the article can be generated as follows:
- Figure 1: `analysis_positive_tweets.py`
- Figure 2: RankFlow visualization tool
- Figures 3, 4, and 5: `plot_median_differences.py`
- Supplementary Figure 1: `generate_causal_impact_figure.py`
- Supplementary Figure 2: `analysis_positive_tweets.py`
- Supplementary Figure 3: `timeline_symptoms.py`