Transformer-based-TWG-parsing

Statistical Parsing for Tree Wrapping Grammars with Transformer-based supertagging and A-star parsing

This is the repository for the experiments in the LREC 2022 submission titled "RRGparbank: A Parallel Role and Reference Grammar Treebank".

Installation

Install ParTAGe-TWG.

Then install the Python packages listed in requirements.txt (for example, with pip install -r requirements.txt).

The code works with Python 3.9.

Download language model

Here is the list of language models described in the LREC paper:

Use the downloaded model

Unzip the downloaded model and rename the resulting folder to "best_model"; the parser loads the model from this folder.
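The unzip-and-rename step can be scripted. This is a minimal sketch, assuming the downloaded archive is a .zip file that contains a single top-level model folder; the function name install_model and the temporary "_unzipped" folder are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def install_model(archive, target="best_model"):
    """Unzip a downloaded model archive and rename its folder to `target`.

    Hypothetical helper: assumes the archive holds one top-level folder.
    """
    archive = Path(archive)
    dest = Path(target)
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")

    # Extract into a scratch folder next to the archive.
    extract_dir = archive.parent / (archive.stem + "_unzipped")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(extract_dir)

    # Use the single top-level folder if there is one, else the whole extraction.
    entries = list(extract_dir.iterdir())
    if len(entries) == 1 and entries[0].is_dir():
        source = entries[0]
    else:
        source = extract_dir
    shutil.move(str(source), str(dest))

    # Remove the (now empty or already moved) scratch folder.
    shutil.rmtree(extract_dir, ignore_errors=True)
    return dest
```

For example, install_model("downloaded_model.zip") would leave the model in ./best_model (the archive name here is only illustrative).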

Parse sentences

To parse a file of sentences, run the script parse_twg.py.

It takes two arguments: an input file with plain-text sentences and an output file.

For example, with the example input and output files:

python parse_twg.py example_input_file.txt example_output_file.txt
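The same call can be made from Python. This is a minimal sketch, assuming the input file holds one sentence per line (compare with example_input_file.txt) and that it is run from the repository root; the helper names write_input, parse_command and run_parser are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def write_input(sentences, path):
    """Write plain sentences to a file, one per line (assumed input layout)."""
    Path(path).write_text("\n".join(sentences) + "\n", encoding="utf-8")


def parse_command(input_path, output_path, script="parse_twg.py"):
    """Build the parse_twg.py command line using the current Python interpreter."""
    return [sys.executable, script, str(input_path), str(output_path)]


def run_parser(input_path, output_path):
    """Run parse_twg.py; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(parse_command(input_path, output_path), check=True)
```

Using sys.executable makes the script run under the same interpreter (and virtual environment) where the requirements were installed.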

The output file is written in the discbracket (discontinuous bracket tree) format. Read more about this format here.
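To post-process the output, a discbracket tree can be read with a small recursive parser. This is a sketch, assuming the convention of the disco-dop toolkit: one tree per line, with every terminal written as index=word, where index is the word's 0-based position in the sentence. Because constituents may be discontinuous, the word order is restored by sorting terminals on their index. All function names are hypothetical, and the example tree is illustrative rather than output of this parser.

```python
import re
from pathlib import Path

# Tokens: "(", ")", or any run of characters without whitespace or parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_discbracket(line):
    """Parse one discbracket tree into nested tuples.

    Internal nodes are (label, [children]); terminals are (index, word).
    Assumes terminals are written as index=word (disco-dop convention).
    """
    tokens = TOKEN.findall(line)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                index, _, word = tokens[pos].partition("=")
                children.append((int(index), word))
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return parse_node()


def terminals(node):
    """Return all (index, word) terminals below a node, in tree order."""
    if isinstance(node[0], int):
        return [node]
    return [t for child in node[1] for t in terminals(child)]


def sentence(tree):
    """Restore the surface word order by sorting terminals on their index."""
    return " ".join(word for _, word in sorted(terminals(tree)))


def discontinuous_constituents(node):
    """List (label, indices) for every constituent whose yield has gaps."""
    if isinstance(node[0], int):
        return []
    indices = sorted(i for i, _ in terminals(node))
    found = []
    if indices[-1] - indices[0] + 1 != len(indices):
        found.append((node[0], indices))
    for child in node[1]:
        found.extend(discontinuous_constituents(child))
    return found


def read_trees(path):
    """Yield one parsed tree per non-empty line of a discbracket file."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            yield parse_discbracket(line)
```

For the tree (S (VP (VB 0=is) (JJ 2=rich)) (NP 1=John) (? 3=?)), sentence returns "is John rich ?" and discontinuous_constituents reports the VP, whose yield covers positions 0 and 2 but not 1.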

Note that for the French model you need to change the model type passed to NERModel from "bert" to "camembert":

from simpletransformers.ner import NERModel

language_model = NERModel(
    "bert", "best_model", use_cuda=device  # for French, replace "bert" with "camembert"
)

To use a DistilBERT model, change the model type from "bert" to "distilbert":

from simpletransformers.ner import NERModel

language_model = NERModel(
    "distilbert", "best_model", use_cuda=device
)
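Instead of editing the string by hand for each model, the model type can be chosen in one place. This is a sketch, assuming NERModel is the class from the simpletransformers library (its positional arguments, model type then model path, match the calls above). The mapping keys and the helper names model_type_for and load_language_model are hypothetical; the model type strings are the ones given above.

```python
# Hypothetical keys; the values are the model types named in this README.
MODEL_TYPES = {
    "bert": "bert",              # default model type
    "french": "camembert",       # French model
    "distilbert": "distilbert",  # DistilBERT model
}


def model_type_for(model_kind):
    """Map a model kind to the model type string expected by NERModel."""
    try:
        return MODEL_TYPES[model_kind]
    except KeyError:
        raise ValueError(f"unknown model kind: {model_kind!r}") from None


def load_language_model(model_kind, use_cuda=False):
    """Load the model stored in the best_model folder (assumes simpletransformers)."""
    # Imported lazily so model_type_for works without simpletransformers installed.
    from simpletransformers.ner import NERModel

    return NERModel(model_type_for(model_kind), "best_model", use_cuda=use_cuda)
```

Raising an error for an unknown kind avoids silently loading a French model with the "bert" model type, which is the mismatch the note above warns about.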