QARank is licensed under ASL 2.0 and other lenient licenses, allowing its use for academic and commercial purposes without restrictions.
- Download the jar file of the project from here.
- Alternatively, download the zip of the Java project and import it as a Maven project in Eclipse for experimentation.
- QARank requires a training file, a test file and unannotated data to train models.
- The system was trained and tested on SemEval 2016 - Task 3: Community Question Answering - Subtask A data.
- Create a directory named xml_files on your local machine.
- The training+dev data can be downloaded from here.
- The original test data can be downloaded from here.
- After unzipping this folder, move to semeval2016-task3-cqa-ql-traindev-v3.2/v3.2/train/. The entire training data for Task 3 can be found here.
- Choose any of the subtask A train files for training and copy it to the xml_files directory. Rename the training file to train.xml.
- Alternatively, combine various training xml files into one file train.xml for larger training data. Make sure to preserve the XML tree structure while doing this.
- Similarly, choose one of the subtask A files in semeval2016-task3-cqa-ql-traindev-v3.2/v3.2/dev/ or semeval2016_task3_tests/SemEval2016_task3_test/English/ as test data, copy it to the xml_files directory, and rename it to test.xml.
- The unannotated data can be downloaded from here.
- After unzipping this folder, locate QL-unannotated-data-subtaskA.xml/QL-unannotated-data-subtaskA.xml, copy this file to the xml_files directory, and rename it to unannotated.xml.
- This file is large, and training models on it requires a lot of memory. To reduce training time, one can instead use the training data for the same task as the unannotated data. Make sure to name that file unannotated.xml.
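The option of combining several training XML files into a single train.xml, mentioned above, can be sketched in Python as follows. This is only a minimal illustration, not part of QARank: the function name `merge_xml_files` is hypothetical, and the sketch assumes that every input file has the same root element and that the entries to keep are the direct children of that root.

```python
import xml.etree.ElementTree as ET


def merge_xml_files(input_paths, output_path):
    """Merge several XML files that share one root tag into a single file.

    Hypothetical helper (not shipped with QARank). The first file supplies
    the root element; the top-level children of every other file are
    appended to it, so the XML tree structure is preserved.
    Returns the number of top-level children in the merged file.
    """
    input_paths = [str(p) for p in input_paths]
    if not input_paths:
        raise ValueError("at least one input file is required")

    merged_tree = ET.parse(input_paths[0])
    merged_root = merged_tree.getroot()

    for path in input_paths[1:]:
        root = ET.parse(path).getroot()
        # Assumption: all files use the same root element.
        if root.tag != merged_root.tag:
            raise ValueError(
                f"{path}: root <{root.tag}> differs from <{merged_root.tag}>"
            )
        merged_root.extend(list(root))

    merged_tree.write(str(output_path), encoding="utf-8", xml_declaration=True)
    return len(merged_root)
```

Calling `merge_xml_files([part1, part2], "xml_files/train.xml")` with the paths of two downloaded train files would write the combined file. Simply concatenating the files as text would produce two root elements and therefore invalid XML, which is why the children are re-attached under one root here.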
- Download the python scripts required to run the system from here.
- Unzip this resources folder in a suitable place.
- The word embeddings trained on the large unannotated data can be found here.
- Run the QARank jar as:
java -Xmx10g -jar QARank.jar [absolute-path-to-xml_files-folder] [absolute-path-to-resources-folder]
- The system will generate all folders and required files.
- The final MAP scores of the system and the SVM accuracy can be found in the result_files/final_scores.txt file.
- Users can run the system on a different dataset, provided the training and test files follow the SemEval 2016 - Task 3 format.
- The evaluation scripts used in the system can be looked up here.
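
Because the run takes a long time, it may help to check the xml_files directory before launching the jar. The sketch below is an illustration under stated assumptions, not part of QARank: the helper names `missing_inputs` and `build_command` are hypothetical, and the only facts taken from the instructions above are the three expected file names and the shape of the `java -Xmx10g -jar QARank.jar` command.

```python
from pathlib import Path

# File names the setup steps above ask for inside xml_files.
REQUIRED_FILES = ("train.xml", "test.xml", "unannotated.xml")


def missing_inputs(xml_dir):
    """Return the required file names that are absent from xml_dir."""
    xml_dir = Path(xml_dir)
    return [name for name in REQUIRED_FILES if not (xml_dir / name).is_file()]


def build_command(xml_dir, resources_dir, jar="QARank.jar", heap="10g"):
    """Build the argument list for the run command shown above.

    Both folder arguments are converted to absolute paths, as the
    command line expects. The jar location and heap size are
    parameters so they can be adjusted for the local machine.
    """
    return [
        "java",
        f"-Xmx{heap}",
        "-jar",
        str(jar),
        str(Path(xml_dir).resolve()),
        str(Path(resources_dir).resolve()),
    ]
```

If `missing_inputs("xml_files")` returns an empty list, the list from `build_command("xml_files", "resources")` can be passed to `subprocess.run(..., check=True)` to start the system.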