
# NFLAT4NER

This repository contains the code for the paper *NFLAT: Non-Flat-Lattice Transformer for Chinese Named Entity Recognition*.

## Introduction

We advocate a novel lexical enhancement method, InterFormer, which effectively reduces computational and memory costs by constructing non-flat lattices. Furthermore, with InterFormer as the backbone, we implement NFLAT for Chinese NER. NFLAT decouples lexicon fusion from context feature encoding. Compared with FLAT, it avoids unnecessary attention computation over "word-character" and "word-word" pairs, which reduces memory usage by about 50% and allows training with larger lexicons or bigger batch sizes.
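As a rough illustration of this decoupling (not the repository's actual implementation; all module and variable names below are hypothetical), lexicon fusion can be viewed as a single cross-attention step in which characters query the matched lexicon words, after which an ordinary self-attention encoder runs over the character sequence only:

```python
import torch
import torch.nn as nn

class InterAttentionSketch(nn.Module):
    """Illustrative lexicon fusion: characters (queries) attend to the
    lexicon words matched in the sentence (keys/values)."""
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, char_feats, word_feats):
        # char_feats: (n_chars, batch, d_model); word_feats: (n_words, batch, d_model)
        fused, _ = self.attn(char_feats, word_feats, word_feats)
        return self.norm(char_feats + fused)  # residual fusion into characters

d_model = 128
fuse = InterAttentionSketch(d_model)
# Context encoding then runs over characters alone, so no "word-word" or
# "word-character" attention is computed at this stage.
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model, nhead=8),
                                num_layers=1)

chars = torch.randn(20, 2, d_model)  # 20 characters, batch size 2
words = torch.randn(7, 2, d_model)   # 7 matched lexicon words
out = encoder(fuse(chars, words))    # (20, 2, d_model)
```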

## Environment Requirements

The code has been tested under Python 3.7. The required packages are as follows:

```
torch==1.5.1
numpy==1.18.5
FastNLP==0.5.0
fitlog==0.3.2
```

You can find more details about FastNLP and fitlog in their respective documentation.
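fitlog is used for experiment tracking. As a minimal usage sketch (the repository's actual logging calls may differ; the values below are illustrative), hyper-parameters and metrics are recorded like this:

```python
import fitlog

fitlog.set_log_dir('logs/')  # log directory (created beforehand, e.g. via `fitlog init`)
fitlog.add_hyper({'dataset': 'weibo', 'lr': 1e-3})   # record hyper-parameters
fitlog.add_metric({'dev': {'f1': 0.68}}, step=100)   # record an evaluation metric
fitlog.finish()                                      # mark the run as finished
```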

## Example to Run the Code

1. Download the pretrained character embeddings and word embeddings and put them in the `data` folder.

2. Modify `utils/paths.py` to point to the pretrained embeddings and the datasets (a hypothetical configuration sketch follows this list).

3. Clip long sentences for the MSRA and OntoNotes datasets:

   ```
   python sentence_clip.py
   ```

4. Merge the character embeddings and word embeddings:

   ```
   python char_word_mix.py
   ```

5. Train and evaluate the model:

   * Weibo dataset

     ```
     python main.py --dataset weibo
     ```

   * Resume dataset

     ```
     python main.py --dataset resume
     ```

   * OntoNotes dataset

     ```
     python main.py --dataset ontonotes
     ```

   * MSRA dataset

     ```
     python main.py --dataset msra
     ```
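Step 2 above assumes path variables defined in `utils/paths.py`. The sketch below is hypothetical: the variable names and file names are illustrative only, so check the actual file in the repository before editing.

```python
# utils/paths.py -- hypothetical sketch; the repository's actual variable
# names and expected files may differ.
char_embedding_path = 'data/char_embeddings.vec'  # pretrained character embeddings
word_embedding_path = 'data/word_embeddings.vec'  # pretrained word embeddings

weibo_path = 'data/weibo'
resume_path = 'data/resume'
ontonotes_path = 'data/ontonotes'
msra_path = 'data/msra'
```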

## Acknowledgements