Please run the following Python code:
import stanza
stanza.download('en')
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
You will need to provide a JSON file containing your whole dataset (train+dev+test) with the following schema:
"facts": List[String],
"base_question": String,
"target_question": String
Every string should be tokenized and lowercased. An example showing how to create
this file can be found in data_processing.data_generator.generate_repeat_q_squad_raw.
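As an illustration of the schema above, here is a minimal sketch of how such records could be assembled and written out. The helper names (normalize, make_record, write_dataset) are hypothetical and not part of this repository. Two things are assumptions: that the file holds a list of such records, and that whitespace splitting is an acceptable stand-in for whatever tokenizer you actually use.

```python
import json
from pathlib import Path


def normalize(text):
    """Lowercase the text and re-join its whitespace-separated tokens.

    Assumption: this is only a stand-in for a real tokenizer.
    """
    return " ".join(text.lower().split())


def make_record(facts, base_question, target_question):
    """Build one example with the three fields of the schema."""
    return {
        "facts": [normalize(fact) for fact in facts],
        "base_question": normalize(base_question),
        "target_question": normalize(target_question),
    }


def write_dataset(records, path):
    """Write every example (train + dev + test) to a single JSON file.

    Assumption: the file is a JSON list of records.
    """
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")


# A single hand-written example, for illustration only.
records = [
    make_record(
        facts=["The Eiffel Tower is located in Paris ."],
        base_question="Where is the tower ?",
        target_question="Where is the Eiffel Tower located ?",
    )
]
```

Passing records and a path of your choice to write_dataset produces a file in this shape. For a complete pipeline, the generate_repeat_q_squad_raw function remains the reference.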
Next, run models.repeat_q in preprocessing mode, passing the path to the JSON file
mentioned above as an argument. This will create a vocabulary file and, optionally,
an embedding matrix file for you.
You can now train the model using models.repeat_q in training mode. For more
information, refer to the argument descriptions by running:
python -m models.repeat_q --help
Please store your API key in a file named ".gkg_api_key" located at the root directory.
To pre-process SG DQG data, run the following command before doing anything else:
python -m spacy download en_core_web_sm
To run anything related to the NQG model, you'll want to use the script models/
Command: train
Description: Trains the model using data located at /data/processed/nqg. This directory must contain two subdirectories, "dev" and "train". Their contents must follow the format used by the original NQG team. The underlying dataset can be any of your liking, and the NER and POS features can be produced with any convention or tool.
--vocab_size num
: prune the vocabulary of the dataset to the required number of words "num". Optional; Default value:
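To give an intuition for what pruning a vocabulary to "num" words can look like, here is a minimal sketch. It assumes pruning keeps the most frequent tokens and maps everything else to an unknown-word entry. That is a common convention, but it is an assumption and not necessarily what the NQG script does. The function name and the "<unk>" token are hypothetical.

```python
from collections import Counter


def prune_vocabulary(tokenized_sentences, vocab_size):
    """Return a token-to-index map keeping the `vocab_size` most frequent tokens.

    Assumption: index 0 is reserved for a hypothetical "<unk>" token that
    stands in for every pruned word.
    """
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    kept = [tok for tok, _ in counts.most_common(vocab_size)]
    return {tok: idx for idx, tok in enumerate(["<unk>"] + kept)}


sentences = [["what", "is", "the", "capital"], ["what", "is", "nqg"]]
vocab = prune_vocabulary(sentences, vocab_size=2)
# Only the two most frequent tokens ("what" and "is") survive next to "<unk>".
```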
Command: generate_data
Description: Generates the necessary data from the raw SQuAD dataset to train the NQG model on. The SQuAD 1.1 data files shall be stored at /data/squad_dataset.
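For orientation, the sketch below walks the nested structure of a SQuAD 1.1 file (articles, then paragraphs, then question/answer pairs). It only illustrates the input format. The function names are hypothetical, and how generate_data actually turns these triples into NQG features is not shown here.

```python
import json


def load_squad(path):
    """Parse a SQuAD 1.1 JSON file into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_squad_examples(squad):
    """Yield (context, question, answer_text) triples from a parsed SQuAD 1.1 dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield context, qa["question"], answer["text"]


# A tiny in-memory sample with the same layout as the real files.
sample = {
    "version": "1.1",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "Paris is the capital of France.",
            "qas": [{
                "id": "q1",
                "question": "What is the capital of France?",
                "answers": [{"text": "Paris", "answer_start": 0}],
            }],
        }],
    }],
}
examples = list(iter_squad_examples(sample))
```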
Command: Use
from the NQG repository with any .pt file storing
your trained model.
Command: beam_search
Description: Makes predictions for the SQuAD dev set (see the data format described under the train command).
--model_path path
: path to a .pt trained model file.
- Follow instructions 1 from:
- Have the GloVe embedding .txt file in /models/pretrained/ and run