Official code for paper Towards Interpreting BERT for Reading Comprehension Based QA.
This code was developed on top of the official BERT code released by Google.
The base model checkpoints for BERT can be downloaded from the official git repository. We used uncased_L-12_H-768_A-12
, which has the following configuration: 12-layer, 768-hidden, 12-heads, 110M parameters
. All codes have to be run inside the bert
directory.
We used the SQuAD-V1 and Self-RC (DuoRC) datasets - all data processing is taken care of by the code itself, before training or evaluation.
We used tensorflow-v1.14
with Python 2 for training/evaluation, and Python 3 for all analyses.
Use run_squad_infer.py
for SQuAD and duorc_infer.py
for DuoRC.
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export SQUAD_DIR=/path/to/json-datasets
python -u run_squad_infer.py
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--do_train=False \
--train_file=$SQUAD_DIR/train-json-file \
--do_predict=True \
--predict_file=$SQUAD_DIR/dev-json-file \
--train_batch_size=6 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=/path/to/new/model/folder
Use evaluate-v2.0.py
for SQuAD and duorc_evaluate-v2.0.py
for DuoRC.
python -u evaluate-v2.0.py /path/to/dataset/json /path/to/predictions/json --checkpoint_dir=/path/to/checkpoints/folder
- set layer number from 0 to 11 in
do_integrated_grad
predict_batch_size
must always be set to 1 in this code- The IG scores are stored in a folder called
ig_scores
(for SQuAD) andig_scores_duorc
(for DuoRC), in files named in the formatimportance_scores_ig_layer_number.npy
- Embeddings for the first 200 datapoints are stored in a folder called
embs
(for SQuAD) andembs_duorc
(for DuoRC), in files named in the formatemb_enclayer_layer_number.npy
- The tokenized 'QN [SEP] PASSAGE' will be stored in
output_dir/qn_and_doc_tokens.npy
- Use
bert_dec_flips.py
for SQuAD andduorc_dec_flips.py
for DuoRC (since the datasets need to be processed differently).
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export SQUAD_DIR=/path/to/json-datasets
python -u bert_dec_flips.py \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--do_train=False \
--train_file=$SQUAD_DIR/train-json-file \
--do_predict=True \
--predict_file=$SQUAD_DIR/dev-json-file \
--train_batch_size=6 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--predict_batch_size=1 \
--do_integrated_grad=11 \
--output_dir=/path/to/new/model/folder
Use jensen_shannon.py
and graph_js.py
.
Use tsne.py
(this code uses the embeddings saved by the IG scores code above).
Use quantifier_questions_analysis.py
.
For any questions, please contact [email protected].