Paper here
1. Loading a TensorFlow checkpoint (e.g. Google's pre-trained models)
You can convert any TensorFlow checkpoint for BERT (in particular the pre-trained models released by Google) to a PyTorch save file by using the `convert_tf_checkpoint_to_pytorch.py` script.
This script takes as input a TensorFlow checkpoint (three files starting with `bert_model.ckpt`) and the associated configuration file (`bert_config.json`), creates a PyTorch model for this configuration, loads the weights from the TensorFlow checkpoint into the PyTorch model, and saves the resulting model in a standard PyTorch save file that can be imported using `torch.load()`.
You only need to run this conversion script once to get a PyTorch model. You can then disregard the TensorFlow checkpoint (the three files starting with `bert_model.ckpt`), but be sure to keep the configuration file (`bert_config.json`) and the vocabulary file (`vocab.txt`), as these are needed for the PyTorch model too.
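As a quick sanity check, the converted file is an ordinary PyTorch save file. Here is a minimal sketch (assuming, as is typical for such conversion scripts, that a state dict was saved; the path is a hypothetical placeholder):

```python
import torch

# Load the converted checkpoint; the path is illustrative --
# use the pytorch_dump_path you passed to the conversion script.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Each entry maps a parameter name to a weight tensor.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```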
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (`pip install tensorflow`). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained BERT-Base, Multilingual model:
```
export BERT_BASE_DIR=/path/to/bert/multilingual_L-12_H-768_A-12

python3 convert_tf_checkpoint_to_pytorch.py \
  --tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
  --bert_config_file $BERT_BASE_DIR/bert_config.json \
  --pytorch_dump_path $BERT_BASE_DIR/pytorch_model.bin
```
You can download Google's pre-trained models for the conversion here.
This solution uses the BERT-Base, Multilingual Cased (recommended) model.
This code was tested on Python 3.6. The requirements are:
- PyTorch (>= 0.4.1)
- tqdm
- tensorflow (for the conversion)
To install the dependencies:

```
pip install -r ./requirements.txt
```
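For reference, a `requirements.txt` matching the list above might look like the following (hypothetical contents; the file shipped with the repository is authoritative):

```
torch>=0.4.1
tqdm
tensorflow
```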
The solution is located in the notebook AGRR-2019-full.ipynb. There are two solutions:
- a model trained only on the train set;
- a model trained on the train and dev sets.

Before anything else, run the first cell of the notebook.
To train the models, run section `0. Parse data` in the notebook. In this section you should specify your own paths (see the sketch after this list):
- `train_path` - path to the train.csv file;
- `valid_path` - path to the valid.csv file;
- `vocab_file` - path to the pretrained Google BERT vocabulary;
- `bert_config_file` - path to the pretrained Google BERT config;
- `init_checkpoint_pt` - path to the pretrained Google BERT weights.
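A minimal sketch of those assignments (every path below is a hypothetical placeholder; substitute your own locations):

```python
# Hypothetical paths -- replace with your own locations.
train_path = "data/train.csv"
valid_path = "data/valid.csv"
vocab_file = "multilingual_L-12_H-768_A-12/vocab.txt"
bert_config_file = "multilingual_L-12_H-768_A-12/bert_config.json"
init_checkpoint_pt = "multilingual_L-12_H-768_A-12/pytorch_model.bin"  # converted weights
```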
After that, run section `1. Create dataloaders` with the changed paths.
Run section `2. Create model`.
Before running section `3. Create learner`, you should specify the `best_model_path` argument (as your own path).
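For example (a hypothetical value; point it wherever the best checkpoint should be written):

```python
# Hypothetical path for saving the best model weights during training.
best_model_path = "models/agrr_best_model.bin"
```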
Run section `4. Learn your NER model`.
To get results on the dev set, run sections `5. Evaluate` and `6. To needle format` (don't forget to change the paths for the dev dataset and the parsed dev dataset).
I obtained the following results with this script:
- Binary classification quality (f1-score): 0.9583778014941302
- Gapping resolution quality (symbol-wise f-measure): 0.9576077060524616
Run section `6. To needle format` (if you didn't run it in the previous step).
To get test predictions, run section `7. Make prediction` (don't forget to change the paths for the dev dataset and the parsed dev dataset).
Run section `8. Merge train and dev` (don't forget to change the paths for the dev dataset and the parsed dev dataset).
Run sections `9. Train full`, `9.2. Create model`, and `9.3. Create learner` (don't forget to change the paths for the dev dataset and the parsed dev dataset).
After that, run only one cell: the one with the code `learner.fit(num_epochs, target_metric='f1')` (although you can run all the cells).
To get test predictions, run section `9.5 Make prediction with model trained on full data` (don't forget to change the paths for the dev dataset and the parsed dev dataset).