The rise of harassment in online discussions has become a problem for several platforms, including social media platforms such as Twitter, Facebook, and Instagram. The issue is also present on other platforms such as Wikipedia and Yahoo, where editors and authors coordinate work and users discuss topics on different talk pages.
In 2017, the Wikimedia Foundation, in collaboration with Jigsaw, set out to analyse Wikipedia talk page discussions for personal attacks, toxicity, and aggression. This research, published as Ex Machina: Personal Attacks Seen at Scale (https://arxiv.org/abs/1610.08914), is the inspiration behind this dissertation.
There are three notebooks:
- datafile.ipynb: downloading the necessary data and pre-processing
- baseline.ipynb: baseline model (logistic regression)
- hypertuning.ipynb: Optuna study
For the LSTM model, there is one Python script that can be executed: LSTM_model.py. The script needs access to a data folder containing the datasets.
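
Because the script depends on the data folder being present, a quick check before running it can save a failed start. The sketch below is not part of the repository: the function name `find_datasets`, the folder name `data`, and the `.csv`/`.tsv` suffixes are all assumptions made for illustration, so adjust them to match the actual layout.

```python
from pathlib import Path


def find_datasets(data_dir="data", suffixes=(".csv", ".tsv")):
    """Return dataset files found directly inside data_dir, sorted by path.

    The default folder name and suffixes are assumptions, not repository facts.
    Raises FileNotFoundError if the folder does not exist.
    """
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(
            f"Data folder '{folder}' not found; "
            "point data_dir at the folder that holds the datasets."
        )
    # Keep only regular files whose extension matches one of the suffixes.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    try:
        for path in find_datasets():
            print(path)
    except FileNotFoundError as err:
        print(err)
```

If the check lists the expected files, LSTM_model.py should be able to find its inputs, provided it reads from the same folder.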