This repository is a fork of the original athene_system, which ranked second in the FNC-1 challenge. I have changed the dataset to the SemEval2016 Task 6 data.
The SemEval2016 Task 6 (Stance Detection Challenge) dataset is a set of Target-Tweet pairs labelled with the categories FAVOR, AGAINST, and NONE.
The FNC-1 (Stance Detection Challenge) dataset is a set of News Article-Headline pairs labelled with the categories AGREES, DISAGREES, DISCUSSES, and UNRELATED.
We had to change the neural network's first and last layers in order to accommodate the changes in the dataset.
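The change to the last layer follows from the label sets: a classifier over FNC-1 needs four outputs, while one over SemEval2016 Task 6 needs three. Below is a minimal sketch of that difference in plain Python, assuming a one-hot target encoding. The encoding and all names in the block are illustrative and are not taken from the repository.

```python
# Label sets as listed above. The variable and function names are
# hypothetical and only serve this illustration.
FNC1_LABELS = ["AGREES", "DISAGREES", "DISCUSSES", "UNRELATED"]
SEMEVAL_LABELS = ["FAVOR", "AGAINST", "NONE"]


def one_hot(label, labels):
    """Return a one-hot list for `label`; its length equals the number of classes."""
    vector = [0] * len(labels)
    vector[labels.index(label)] = 1
    return vector


print(len(FNC1_LABELS), len(SEMEVAL_LABELS))  # 4 3
print(one_hot("AGAINST", SEMEVAL_LABELS))     # [0, 1, 0]
```

The length of the one-hot vector is the width an output layer would need under this assumption.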
The following instructions are taken as-is from the original Athene_System repository.
The repository was developed as a part of the Fake News Challenge Stage 1 (FNC-1) by team Athene: Andreas Hanselowski, Avinesh PVS, Benjamin Schiller and Felix Caspelherr. In the project, we worked in close collaboration with Debanjan Chaudhuri.
Our new paper in COLING 2018: A Retrospective Analysis of the Fake News Challenge Stance Detection Task
Our Blog Post on the Fake News Challenge.
Prof. Dr. Iryna Gurevych, AIPHES-Ubiquitous Knowledge Processing (UKP) Lab, TU-Darmstadt, Germany
Software dependencies
python >= 3.4.0 (tested with 3.4.0)
Install required python packages.
python3.4 -m pip install -r requirements.txt --upgrade
In order to reproduce the results of our best submission to the FNC-1, please go to Athene_FNC-1 Google Drive, download the archives, and unzip them in their respective folders.
unzip athene_system/data/fnc-1/features
unzip athene_system/data/fnc-1/mlp_models
Parts of the Natural Language Toolkit (NLTK) might need to be installed manually.
python3.4 -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
Copy Word2Vec GoogleNews-vectors-negative300.bin.gz into the folder athene_system/data/embeddings/google_news/
Download Paraphrase Database: Lexical XL Paraphrases 1.0 and extract it to the ppdb folder.
gunzip ppdb-1.0-xl-lexical.gz athene_system/data/ppdb/
To use the Stanford-parser an instance has to be started in parallel: Download Stanford CoreNLP, extract anywhere and execute following command:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9020
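Before starting the pipeline, it can help to confirm that the server is actually reachable. The following is a minimal sketch using only the Python standard library, assuming the server runs on localhost and listens on port 9020 as in the command above. The function name is hypothetical and not part of the repository.

```python
import socket


def server_is_listening(host="localhost", port=9020, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timeout, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("CoreNLP server reachable:", server_is_listening())
```

This only checks that something accepts connections on the port; it does not verify that the process behind it is the CoreNLP server.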
In order to reproduce the classification results of the best submission on the day of the FNC-1, it is mandatory to use tensorflow v0.9.0 (ideally the GPU version) and the exact library versions stated in requirements.txt, including python 3.4.
Setup tested on Anaconda3 (tensorflow 0.9 gpu version)
conda create -n env_python3.4 python=3.4 anaconda
source activate env_python3.4
env_python3.4/bin/python3.4 -m pip install -r requirements.txt --upgrade
env_python3.4/bin/python3.4 -m pip install --upgrade
To run the pre-trained model on the test set
python -p ftest
For more details
python --help
e.g.: python -p crossv holdout ftrain ftest
* crossv: runs 10-fold cross validation on train / validation set and prints the results
* holdout: trains classifier on train and validation set, tests it on holdout set and prints the results
* ftrain: trains classifier on train/validation/holdout set and saves it to athene_system/data/fnc-1/mlp_models
* ftest: predicts stances of unlabeled test set based on the model (see Installation, step 2)
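The `-p` option accepts one or more of the modes listed above in a single call. As an illustration of how such an option can be parsed, here is a minimal `argparse` sketch. It is not the repository's actual parser: the function name and the `modes` destination are assumptions made for this example.

```python
import argparse

# The four modes described in the list above.
MODES = ("crossv", "holdout", "ftrain", "ftest")


def parse_modes(argv):
    """Parse `-p mode [mode ...]` and return the selected modes in order."""
    parser = argparse.ArgumentParser(description="Hypothetical mode parser")
    parser.add_argument(
        "-p",
        dest="modes",
        nargs="+",        # one or more modes per invocation
        choices=MODES,    # anything else is rejected with a usage error
        required=True,
    )
    return parser.parse_args(argv).modes


print(parse_modes(["-p", "crossv", "holdout"]))  # ['crossv', 'holdout']
```

With `choices` set, an unknown mode makes `argparse` exit with a usage message instead of running a partial pipeline.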
After ftest has been executed, the labeled stances are saved to disk:
cat athene_system/data/fnc-1/fnc_results/submission.csv
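To get a quick overview of the predictions rather than reading the raw file, the label distribution can be tallied. The sketch below assumes the CSV has a header row with a column named `Stance`, as in the FNC-1 submission format. If the file written by this fork uses a different column name, pass it via the `column` argument. The function name is hypothetical.

```python
import csv
from collections import Counter


def count_stances(path, column="Stance"):
    """Count how often each label occurs in `column` of the CSV file at `path`."""
    with open(path, newline="", encoding="utf-8") as handle:
        return Counter(row[column] for row in csv.DictReader(handle))


# Example call (path taken from the command above):
# count_stances("athene_system/data/fnc-1/fnc_results/submission.csv")
```

A heavily skewed distribution is a cheap first hint that the model or the feature files were not loaded as intended.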
A more detailed description of the system, including the features that have been used, can be found in the document: system_description_athene.pdf