Make sure you have followed the installation instructions in the README before proceeding.
This project is based on the AllenNLP library (version 1.3.0).
To train a model, place the train.tsv, dev.tsv and test.tsv files under a subfolder of data/ (e.g., data/my-dataset/train.tsv, etc.).
The file format is simple: each line must contain the label and the lemma/gloss pair to be fed to the model, separated by a tab character. For example:
POLITICS_GOVERNMENT_AND_NOBILITY [TAB] royal family | Royal persons collectively
Note that lemmas are optional: the model also works with glosses only. In the example above, the lemma and the gloss are separated by a pipe character (|). Without the lemma, the example would look like this:
POLITICS_GOVERNMENT_AND_NOBILITY [TAB] Royal persons collectively
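The line format above can be sketched in Python as follows. This is a minimal illustration, not code from this project: the helper names format_example and parse_example are hypothetical, and the " | " spacing around the pipe is assumed from the example line rather than taken from the project's dataset reader.

```python
from typing import Optional, Tuple


def format_example(label: str, gloss: str, lemma: Optional[str] = None) -> str:
    """Build one TSV line: label, a tab, then 'lemma | gloss' (or just the gloss)."""
    # Assumption: lemma and gloss are joined with " | ", as in the example above.
    text = f"{lemma} | {gloss}" if lemma else gloss
    return f"{label}\t{text}"


def parse_example(line: str) -> Tuple[str, Optional[str], str]:
    """Split a TSV line back into (label, lemma, gloss); lemma is None if no pipe is present."""
    label, text = line.rstrip("\n").split("\t", 1)
    if "|" in text:
        # Split only on the first pipe and trim the surrounding spaces.
        lemma, gloss = (part.strip() for part in text.split("|", 1))
        return label, lemma, gloss
    return label, None, text.strip()


line = format_example("POLITICS_GOVERNMENT_AND_NOBILITY", "Royal persons collectively", lemma="royal family")
print(repr(line))
# 'POLITICS_GOVERNMENT_AND_NOBILITY\troyal family | Royal persons collectively'
```

Writing the files with a helper like this avoids accidentally introducing extra tabs, which would break the label/text split.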
Once the training data is ready, you can start training by executing:
python src/main.py <data-folder-name>
where <data-folder-name> is my-dataset if your dataset files were placed under data/my-dataset.
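Before launching training, it may help to check that the dataset folder follows the layout described above. The sketch below is an assumption-based helper, not part of this project (check_dataset is a hypothetical name): it verifies that train.tsv, dev.tsv and test.tsv exist under data/<data-folder-name> and that every non-empty line contains exactly one tab, as the format above implies.

```python
from pathlib import Path
from typing import List

# The three splits the training script expects to find in the dataset folder.
REQUIRED_SPLITS = ("train.tsv", "dev.tsv", "test.tsv")


def check_dataset(data_folder_name: str, data_root: str = "data") -> List[str]:
    """Return a list of problems found in <data_root>/<data_folder_name>; empty means it looks OK."""
    folder = Path(data_root) / data_folder_name
    problems: List[str] = []
    for split in REQUIRED_SPLITS:
        path = folder / split
        if not path.is_file():
            problems.append(f"missing file: {path}")
            continue
        with path.open(encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # blank lines are skipped, not reported
                # Assumption: exactly one tab separates the label from the lemma/gloss text.
                if line.rstrip("\n").count("\t") != 1:
                    problems.append(f"{path}:{lineno}: expected exactly one tab")
    return problems
```

For example, check_dataset("my-dataset") returning an empty list suggests the folder is ready for python src/main.py my-dataset.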
Once training has finished, all training outputs will be saved under models/trained/<data-folder-name>. If you wish to try an interactive version of the trained model, you can launch it via python src/serve.py trained/<data-folder-name>.
Download the WordNet-based model at https://drive.google.com/drive/folders/1v_sIGDcdx-KT6szYEo2DAeNq9NqM0ABs?usp=sharing and place it under the models/released/ folder.
To launch a simple interactive command-line demo, run the following command:
python src/serve.py released/wn
To tag a file in the TSV format described above (if you only have raw sentences, simply prepend NODOMAIN\t, i.e. the string NODOMAIN followed by a tab character, to every line of the file), run the following command:
allennlp predict models/released/wn.tar.gz <path/to/file.tsv> --output-file <path/to/output.jsonl> --batch-size <batch-size> --cuda-device 0 --use-dataset-reader --include-package src.allen_elements --silent
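If your input is a plain-text file with one raw sentence or gloss per line, the NODOMAIN\t prefix mentioned above can be added with a small script such as the following. This is a sketch, not part of the project: raw_to_tsv is a hypothetical name, and skipping blank lines is a design choice assumed here so that no empty examples reach the model.

```python
def raw_to_tsv(raw_path: str, tsv_path: str, placeholder_label: str = "NODOMAIN") -> int:
    """Prepend '<placeholder_label>\\t' to every non-blank line of raw_path, writing to tsv_path.

    Returns the number of lines written.
    """
    count = 0
    with open(raw_path, encoding="utf-8") as src, open(tsv_path, "w", encoding="utf-8") as dst:
        for line in src:
            text = line.rstrip("\r\n")
            if not text.strip():
                continue  # skip blank lines instead of emitting an empty example
            dst.write(f"{placeholder_label}\t{text}\n")
            count += 1
    return count
```

The resulting file can then be passed as <path/to/file.tsv> to the allennlp predict command above.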
For further information about the allennlp predict command, check out the AllenNLP documentation page or type allennlp predict --help.
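The file written via --output-file contains one JSON object per line. A minimal way to load it in Python is sketched below; read_predictions is a hypothetical helper, and since the field names of each prediction depend on this project's predictor (they are not documented here), inspecting the keys of the first record is a reasonable first step.

```python
import json
from typing import Any, Dict, Iterator


def read_predictions(jsonl_path: str) -> Iterator[Dict[str, Any]]:
    """Yield one prediction dict per non-empty line of a JSON Lines file."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

For example, sorted(next(read_predictions("path/to/output.jsonl")).keys()) lists the fields the model actually outputs.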