Cascade Influence

This repository contains:

The scripts to estimate user influence from Twitter information cascades (i.e. Cas.In);
A small dataset of 20 cascades for testing Cas.In;
A hands-on tutorial to walk you through running Cas.In on real cascades.

Citation

The algorithm was introduced in the paper:

Rizoiu, M.-A., Graham, T., Zhang, R., Zhang, Y., Ackland, R., & Xie, L. (2018). #DebateNight: The Role and Influence of Socialbots on Twitter During the 1st 2016 U.S. Presidential Debate. In Proc. International AAAI Conference on Web and Social Media (ICWSM ’18) (pp. 1–10). Stanford, CA, USA.
PDF on arXiv, with supplementary material: https://arxiv.org/abs/1802.09808

BibTeX

@inproceedings{rizoiu2018debatenight,
    address = {Stanford, CA, USA},
    author = {Rizoiu, Marian-Andrei and Graham, Timothy and Zhang, Rui and Zhang, Yifei and Ackland, Robert and Xie, Lexing},
    booktitle = {International AAAI Conference on Web and Social Media (ICWSM '18)},
    title = {{{\#}DebateNight: The Role and Influence of Socialbots on Twitter During the 1st 2016 U.S. Presidential Debate}},
    url = {https://arxiv.org/abs/1802.09808},
    year = {2018}
}

License

Both the dataset and the code are distributed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, a copy of which is available at https://creativecommons.org/licenses/by-nc/4.0/. If you require a different license, please contact Yifei Zhang, Marian-Andrei Rizoiu or Lexing Xie.

How to run Cas.In in a terminal:

Required packages:

python3
numpy
pandas

Arguments of Cas.In:

--cascade_path : the path to the cascade file (see the file format below).

--time_decay : the time decay coefficient (hyperparameter $r$ in the paper). Default: -0.000068

--save2csv : save result to csv file. Default: False

Command:

cd scripts
python3 influence.py --cascade_path path/to/file

File format and toy dataset

Dataset

We provide a toy dataset -- dubbed SMH -- for testing Cas.In. It was collected in 2017 by following the Twitter handle of the Sydney Morning Herald newspaper (tweets and retweets mentioning SMH or linking to an article from SMH).

The data contains 20 cascades (one file per cascade). We anonymized the user_id (as per Twitter's ToS) by mapping the original values to a sequence from 0 to n, while preserving the identity of users across cascades.

Format of the cascade files:

Each cascade file is a CSV file with 3 columns (time, magnitude, user_id), where each row is a tweet in the cascade:
- time is the timestamp of the tweet: the first tweet is always at time zero, and for each subsequent retweet it is the offset in seconds from the initial tweet;
- magnitude is the local influence of the user (here, the number of followers);
- user_id is the ID of the user emitting the tweet (here, anonymized).
The rows in the file (i.e. the tweets) are sorted by timestamp.

Example:

time,magnitude,user_id
0,4674,"0"
321,1327,"1"
339,976,"2"
383,477,"3"
699,1209,"4"
824,119,"5"
835,1408,"6"
1049,896,"7"
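Before running Cas.In on your own data, it can help to check that a file follows the format described above. The helper `check_cascade` below is hypothetical (it is not part of the repository); it tolerates stray spaces in the header and keeps `user_id` as a string so that the anonymized IDs stay intact.

```python
import io

import pandas as pd


def check_cascade(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: check that a DataFrame follows the cascade-file format."""
    df = df.rename(columns=str.strip)  # tolerate trailing spaces in the header
    if list(df.columns) != ["time", "magnitude", "user_id"]:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    if df["time"].iloc[0] != 0:
        raise ValueError("the first tweet must be at time zero")
    if not df["time"].is_monotonic_increasing:
        raise ValueError("rows must be sorted by timestamp")
    if (df["magnitude"] < 0).any():
        raise ValueError("magnitude (follower count) must be non-negative")
    df["user_id"] = df["user_id"].astype(str)  # anonymized IDs are labels, not numbers
    return df


# The example rows shown above
csv_text = """time,magnitude,user_id
0,4674,"0"
321,1327,"1"
339,976,"2"
383,477,"3"
699,1209,"4"
824,119,"5"
835,1408,"6"
1049,896,"7"
"""
cascade = check_cascade(pd.read_csv(io.StringIO(csv_text)))
```

`is_monotonic_increasing` accepts equal consecutive values, so retweets sharing the same second still pass.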

Cascade influence tutorial

Next, we walk you through using Cas.In to estimate user influence, starting from a single cascade.

Preliminary

We first move into the scripts directory and load the required packages and the Cas.In functions.

cd scripts

import pandas as pd
import numpy as np
from casIn.user_influence import P, influence

Compute influence in one cascade

Read data

Load the first cascade in the SMH toy dataset:

cascade = pd.read_csv("../data/SMH/SMH-cascade-0.csv")
cascade.head()

	time	magnitude	user_id
0	0	991	419
1	127	1352	658
2	2149	2057	264
3	2465	1155	1016
4	2485	1917	790

Compute matrix P

We first need to compute the probabilities $p_{ij}$, where $p_{ij}$ is the probability that the $j^{th}$ tweet is a direct retweet of the $i^{th}$ tweet (see the paper for more details). We also need to specify the hyperparameter $r$, the time decay coefficient. Here we choose $r = -0.000068$.

p_ij = P(cascade,r = -0.000068)
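To make the role of $r$ concrete, here is a minimal NumPy sketch of such a matrix. It assumes our reading of the paper: writing $f_i$ for the magnitude (follower count) of tweet $i$ (our notation), the chance that tweet $j$ directly retweets an earlier tweet $i$ is proportional to $f_i$ times an exponential time decay, i.e. $p_{ij} = \frac{f_i e^{r(t_j - t_i)}}{\sum_{k<j} f_k e^{r(t_j - t_k)}}$ for $i < j$. The function `p_matrix` is hypothetical; the repository's `P()` remains the reference implementation.

```python
import numpy as np
import pandas as pd


def p_matrix(cascade: pd.DataFrame, r: float = -0.000068) -> np.ndarray:
    """Hypothetical sketch of the retweet-probability matrix P.

    p[i, j] is the probability that tweet j is a direct retweet of tweet i.
    Only entries with i < j are non-zero; the first tweet has no parent.
    """
    t = cascade["time"].to_numpy(dtype=float)
    f = cascade["magnitude"].to_numpy(dtype=float)  # follower counts
    n = len(t)
    p = np.zeros((n, n))
    for j in range(1, n):
        # Unnormalized weight of every earlier tweet k < j:
        # magnitude times exponential time decay (r < 0 down-weights old tweets).
        w = f[:j] * np.exp(r * (t[j] - t[:j]))
        p[:j, j] = w / w.sum()  # assumes at least one earlier magnitude > 0
    return p


# Usage on the example rows from the file-format section
cascade = pd.DataFrame({
    "time": [0, 321, 339, 383, 699, 824, 835, 1049],
    "magnitude": [4674, 1327, 976, 477, 1209, 119, 1408, 896],
    "user_id": [str(i) for i in range(8)],
})
p = p_matrix(cascade, r=-0.000068)
```

Each column $j \geq 1$ is a probability distribution over the possible parents of tweet $j$, so it sums to 1; a larger $|r|$ concentrates that mass on the most recent tweets.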

Compute user influence and matrix M

The function influence() returns an array with the influence of each tweet (and hence of the user who emitted it), together with the matrix $M = (m_{ij})$, where $m_{ij}$ is the (direct and indirect) influence of the $i^{th}$ tweet on the $j^{th}$ tweet.

inf, m_ij = influence(p_ij)
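Under our reading of the paper, $M$ follows the recurrence $m_{ii} = 1$ and $m_{ij} = \sum_{k=i}^{j-1} m_{ik} p_{kj}$ for $i < j$, and the influence of tweet $i$ is $\sum_{j > i} m_{ij}$. The sketch below implements that recurrence column by column on a hand-made 3-tweet $P$; `influence_from_p` is a hypothetical name, not the repository's `influence()`.

```python
import numpy as np


def influence_from_p(p: np.ndarray):
    """Hypothetical sketch: tweet influence from a strictly upper-triangular P.

    Implements m[i, i] = 1 and m[i, j] = sum_{k=i}^{j-1} m[i, k] * p[k, j],
    then scores tweet i by the sum of m[i, j] over later tweets j > i.
    """
    n = p.shape[0]
    m = np.eye(n)
    for j in range(1, n):
        # Column j only needs columns k < j, which are already complete.
        m[:, j] += m[:, :j] @ p[:j, j]
    inf = m.sum(axis=1) - 1.0  # drop the m[i, i] = 1 self-term
    return inf, m


# Three tweets: tweet 1 retweets tweet 0; tweet 2 is equally likely
# to be a retweet of tweet 0 or of tweet 1.
p = np.array([
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
])
inf, m = influence_from_p(p)
# inf -> [2.0, 0.5, 0.0]
```

Tweet 0 scores 2.0 because it is, directly or indirectly, at the origin of both retweets. More generally, since every column of $P$ after the first sums to 1, the first tweet of a cascade ends up with an influence equal to its number of retweets.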

Link influence with user_id

Now, we add the computed influence of each tweet back to the cascade DataFrame as a new column.

cascade["influence"] = pd.Series(inf)
cascade.head()

	time	magnitude	user_id	influence
0	0	991	419	60.000000
1	127	1352	658	34.590370
2	2149	2057	264	29.656122
3	2465	1155	1016	13.535845
4	2485	1917	790	15.913873
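`pd.Series(inf)` carries a default index 0..n-1, and pandas aligns on index labels when a Series is assigned to a column. This works here because `cascade` was freshly read with its default index; if rows had been filtered or re-indexed first, the alignment would silently produce NaN. A small illustration with toy values (not Cas.In output):

```python
import numpy as np
import pandas as pd

# Toy cascade whose index is not 0..n-1 (e.g. rows kept after filtering)
cascade = pd.DataFrame(
    {"time": [0, 127, 2149], "magnitude": [991, 1352, 2057], "user_id": [419, 658, 264]},
    index=[10, 11, 12],
)
inf = np.array([2.0, 0.5, 0.0])  # toy per-tweet influence values

# pd.Series(inf) is labelled 0..2, so no label matches and every row gets NaN
cascade["aligned"] = pd.Series(inf)
# Assigning the NumPy array matches rows by position instead
cascade["influence"] = inf
```

Alternatively, calling `cascade.reset_index(drop=True)` before the assignment restores the default index.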

Compute influence over multiple cascades

Load function

The function casIn() computes the influence in one cascade; it bundles all the steps described above. We store its result under a new name so that the influence() function imported earlier is not overwritten.

from casIn.user_influence import casIn
cascade_inf = casIn(cascade_path="../data/SMH/SMH-cascade-0.csv", time_decay=-0.000068)
cascade_inf.head()

	time	magnitude	user_id	influence
0	0	991	419	60.000000
1	127	1352	658	34.590370
2	2149	2057	264	29.656122
3	2465	1155	1016	13.535845
4	2485	1917	790	15.913873
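For reference, the steps bundled by `casIn()` can be sketched end to end as follows. This is a hypothetical re-implementation (`cas_in_sketch` is not part of the repository), assuming $p_{ij}$ is proportional to the magnitude of tweet $i$ times $e^{r(t_j - t_i)}$, and $m_{ij} = \sum_k m_{ik} p_{kj}$ with $m_{ii} = 1$. It is mainly useful to see how `time_decay` flows into $P$ and then into $M$; use the repository's `casIn()` for actual results.

```python
import io

import numpy as np
import pandas as pd


def cas_in_sketch(cascade_path, time_decay=-0.000068):
    """Hypothetical end-to-end sketch of the steps casIn() bundles."""
    cascade = pd.read_csv(cascade_path)
    t = cascade["time"].to_numpy(dtype=float)
    f = cascade["magnitude"].to_numpy(dtype=float)
    n = len(cascade)

    # P: probability that tweet j is a direct retweet of earlier tweet i
    p = np.zeros((n, n))
    for j in range(1, n):
        w = f[:j] * np.exp(time_decay * (t[j] - t[:j]))
        p[:j, j] = w / w.sum()

    # M: direct + indirect influence, m_ii = 1, m_ij = sum_k m_ik * p_kj
    m = np.eye(n)
    for j in range(1, n):
        m[:, j] += m[:, :j] @ p[:j, j]

    # Assign by position so the result does not depend on index labels
    cascade["influence"] = m.sum(axis=1) - 1.0
    return cascade


# pd.read_csv accepts any file-like object, so an in-memory CSV works too
csv_text = """time,magnitude,user_id
0,4674,"0"
321,1327,"1"
339,976,"2"
383,477,"3"
699,1209,"4"
824,119,"5"
835,1408,"6"
1049,896,"7"
"""
result = cas_in_sketch(io.StringIO(csv_text))
print(result[["user_id", "influence"]])
```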

Load multiple cascades

The SMH toy dataset contains 20 cascades for testing out Cas.In. Let's load all of them:

cascades = []
for i in range(20):
    inf = casIn(cascade_path="../data/SMH/SMH-cascade-%d.csv" % i,time_decay=-0.000068)
    cascades.append(inf)
cascades = pd.concat(cascades)

Compute user influence in multiple cascades

The influence of a user is by definition the mean influence of the tweets they emit. We compute the user influence as follows:

result = cascades.groupby("user_id").agg({"influence" : "mean"})
result.sort_values("influence",ascending=False).head()

	influence
user_id
734	214.000000
1225	205.000000
755	190.554571
60	189.557461
581	141.033129
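Since each user's score is a plain mean, it can help to keep the number of tweets behind it: a user with a single high-influence tweet ranks as high as one who is consistently influential. A sketch using pandas named aggregation on toy values (the `n_tweets` column is our addition, not part of Cas.In):

```python
import pandas as pd

# Toy per-tweet influences pooled from several cascades
cascades = pd.DataFrame({
    "user_id": [1, 2, 1, 3, 2],
    "influence": [10.0, 4.0, 2.0, 7.0, 0.0],
})

result = cascades.groupby("user_id").agg(
    influence=("influence", "mean"),  # user influence = mean over their tweets
    n_tweets=("influence", "size"),   # how many tweets support that mean
)
result = result.sort_values("influence", ascending=False)
# user 3: 7.0 from 1 tweet; user 1: 6.0 from 2 tweets; user 2: 2.0 from 2 tweets
```

Keeping only well-supported users is then a one-liner, e.g. `result[result["n_tweets"] >= 2]`.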
