We use the wiki103 data as an example, which is publicly available. It contains three text files, `train.txt`, `valid.txt` and `test.txt`. We use `train.txt` to train the model and `valid.txt` to evaluate the intermediate checkpoints. We first need to run `prepare_data.py` to tokenize these text files. We concatenate all documents into a single text and split it into lines of tokens, where each line has at most 510 tokens (2 tokens are reserved for the special tokens `[CLS]` and `[SEP]`).
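The sketch below illustrates what this tokenize-and-chunk step does. It is a minimal, illustrative version only: the tokenizer choice, function name, and output format are assumptions and are not the actual `prepare_data.py` interface.

```python
# Illustrative sketch of the tokenize-and-chunk step described above.
# The tokenizer, helper name, and output format are assumptions for
# illustration; this is not the actual prepare_data.py implementation.
from transformers import AutoTokenizer

MAX_TOKENS = 510  # 512 minus 2 slots reserved for [CLS] and [SEP]

def chunk_corpus(input_path, output_path, tokenizer_name="bert-base-uncased"):
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    buffer = []
    with open(input_path, encoding="utf-8") as fin, \
         open(output_path, "w", encoding="utf-8") as fout:
        for line in fin:
            # Concatenate all documents into one token stream ...
            buffer.extend(tokenizer.tokenize(line))
            # ... and emit lines of at most 510 tokens each.
            while len(buffer) >= MAX_TOKENS:
                fout.write(" ".join(buffer[:MAX_TOKENS]) + "\n")
                buffer = buffer[MAX_TOKENS:]
        if buffer:
            fout.write(" ".join(buffer) + "\n")

chunk_corpus("train.txt", "train.tokenized.txt")
```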
Run `mlm.sh` to train a BERT-like model which uses MLM as the pre-training task (a sketch of the masking objective follows the examples below). For example,
- `mlm.sh bert-base` will train a BERT base model which uses absolute position encoding
- `mlm.sh deberta-base` will train a DeBERTa base model which uses Disentangled Attention
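The MLM objective replaces a fraction of the input tokens and trains the model to recover the originals. The snippet below shows the standard BERT-style 15% masking recipe as a simplified illustration; it is not the masking code actually used by `mlm.sh`.

```python
# Simplified illustration of BERT-style MLM masking (15% of tokens),
# not the actual masking logic used by mlm.sh.
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15):
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token
            r = random.random()
            if r < 0.8:
                inputs.append(mask_token)             # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(random.choice(tokens))  # 10%: random token
            else:
                inputs.append(tok)                    # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(None)  # position is not predicted
    return inputs, labels

print(mask_tokens("the quick brown fox jumps over the lazy dog".split()))
```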
Coming soon...
To train with multiple nodes, you need to specify four environment variables:
- `WORLD_SIZE` - Total number of nodes used for the training
- `MASTER_ADDR` - The IP address or host name of the master node
- `MASTER_PORT` - The port of the master node
- `RANK` - The rank of the current node
For example, to train a model with 2 nodes:
- On node0,
```bash
export WORLD_SIZE=2
export MASTER_ADDR=node0
export MASTER_PORT=7488
export RANK=0
./rtd.sh deberta-v3-xsmall
```
- On node1,
```bash
export WORLD_SIZE=2
export MASTER_ADDR=node0
export MASTER_PORT=7488
export RANK=1
./rtd.sh deberta-v3-xsmall
```
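These variables follow the usual pattern for multi-node PyTorch distributed training. The sketch below shows one way a training script could consume them to initialize `torch.distributed`; it is an assumption-laden illustration, not the actual logic inside `rtd.sh`, and it assumes `WORLD_SIZE` counts nodes (as described above) so the total process count is nodes × GPUs per node.

```python
# Illustrative sketch of consuming the environment variables above to set up
# torch.distributed; this is not the actual rtd.sh / training code.
import os
import torch
import torch.distributed as dist

def init_distributed(local_rank):
    n_nodes = int(os.environ["WORLD_SIZE"])   # total nodes, as exported above
    node_rank = int(os.environ["RANK"])       # rank of the current node
    gpus_per_node = torch.cuda.device_count()
    global_rank = node_rank * gpus_per_node + local_rank
    dist.init_process_group(
        backend="nccl",
        init_method=f"tcp://{os.environ['MASTER_ADDR']}:{os.environ['MASTER_PORT']}",
        world_size=n_nodes * gpus_per_node,
        rank=global_rank,
    )
    torch.cuda.set_device(local_rank)
```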
- `relative_attention` - Whether to use relative attention
- `pos_att_type` - Relative position encoding type
  - `P2C` - Position to content attention
  - `C2P` - Content to position attention
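As an illustration of how these options are used, the snippet below relies on the Hugging Face `transformers` `DebertaConfig`, which exposes analogous `relative_attention` and `pos_att_type` fields; the exact option names and spelling in this repo's model config files may differ.

```python
# Illustration via Hugging Face transformers' DebertaConfig, which exposes
# analogous options; option names in this repo's config files may differ.
from transformers import DebertaConfig, DebertaModel

config = DebertaConfig(
    relative_attention=True,      # enable disentangled relative attention
    pos_att_type=["p2c", "c2p"],  # position-to-content and content-to-position
)
model = DebertaModel(config)
print(config.pos_att_type)
```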