
Ronalmoo/QA-unknown-detection


Unknown detection in QA systems

This repository provides code for unknown detection in question answering (QA) systems, including scripts for fine-tuning the QA models.

Download

I used the pretrained models available on Hugging Face.

Installation

Install requirements

$ pip install -r requirements.txt

Datasets

We provide pre-processed versions of the benchmark datasets for each setting:

  • In-domain
    • BioASQ
  • Out-of-domain
    • SQuAD 1.1
    • MS MARCO: 5 domains (music, computing, law, film, finance) for the question answering task.
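The README does not document the file layout. The sketch below assumes the pre-processed files follow the SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas`), which is common for BioASQ factoid data in BERT-style QA pipelines; this has not been checked against the provided files. The helper names `flatten_squad` and `load_squad_questions` are hypothetical and not part of this repository.

```python
import json
from typing import Any, Dict, List


def flatten_squad(dataset: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a SQuAD v1.1-style dict into one record per question.

    Assumed layout: {"data": [{"paragraphs": [{"context": ..., "qas": [...]}]}]}
    """
    records = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                records.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": paragraph["context"],
                    # "answers" may be missing in unlabeled evaluation files
                    "answers": [a["text"] for a in qa.get("answers", [])],
                })
    return records


def load_squad_questions(path: str) -> List[Dict[str, Any]]:
    """Read a JSON file from disk and flatten it with flatten_squad."""
    with open(path, encoding="utf-8") as f:
        return flatten_squad(json.load(f))
```

For example, `len(load_squad_questions("BioASQ-dev-factoid-7b.json"))` would give the number of questions in the in-domain dev file, provided that file uses the assumed layout.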

Fine-tuning

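SAVE_DIR=./output
DATA_DIR=''        # set to the directory containing the pre-processed datasets
TRAINDATA=BioASQ-train_split-factoid-7b.json
INDOMAIN=BioASQ-dev-factoid-7b.json
OUTDOMAIN=''       # set to an out-of-domain evaluation file
OFFICIAL_DIR=./scripts/bioasq_eval
BATCH_SIZE=16
LEARNING_RATE=5e-5
NUM_EPOCHS=3
MAX_LENGTH=384
SEED=42
export CUDA_VISIBLE_DEVICES=2   # exported so the python3 processes below see it
MODEL_NAME=bert-base-cased
DOMAIN_TYPE=bioasq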

# Fine-tuning with in-domain (BioASQ) training data
python3 fine_tuning.py \
    --model_type ${MODEL_NAME} \
    --model_name_or_path ${MODEL_NAME} \
    --do_train \
    --train_file ${DATA_DIR}/${TRAINDATA} \
    --per_gpu_train_batch_size ${BATCH_SIZE} \
    --learning_rate ${LEARNING_RATE} \
    --num_train_epochs ${NUM_EPOCHS} \
    --max_seq_length ${MAX_LENGTH} \
    --seed ${SEED} \
    --output_dir ${SAVE_DIR}/${MODEL_NAME}/bioasq_in_domain \
    --overwrite_cache \
    --overwrite_output

# Inference with in-domain or out-of-domain data
# (set TESTDATA to ${INDOMAIN} for in-domain or ${OUTDOMAIN} for out-of-domain evaluation)
TESTDATA=${INDOMAIN}
python3 fine_tuning.py \
    --model_type ${MODEL_NAME} \
    --model_name_or_path ${SAVE_DIR}/${MODEL_NAME}/bioasq_in_domain \
    --do_eval \
    --predict_file ${DATA_DIR}/${TESTDATA} \
    --golden_file ${DATA_DIR}/7B_golden.json \
    --per_gpu_eval_batch_size ${BATCH_SIZE} \
    --max_seq_length ${MAX_LENGTH} \
    --seed ${SEED} \
    --official_eval_dir ${OFFICIAL_DIR} \
    --output_dir ${SAVE_DIR}/${MODEL_NAME}/${DOMAIN_TYPE}_bioasq_dev \
    --eval_all_checkpoints \
    --overwrite_cache
Alternatively, run the provided shell script:

$ sh fine_tuning.sh

Unknown detection

  • Get AUROC:
    $ python3 get_auroc.py --path /path/to/output_dir
  • Get AURC:
    $ python3 get_aurc.py --path /path/to/output_dir

In both cases, --path is the directory that contains n_best_prediction.json.
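The README does not say how get_auroc.py and get_aurc.py score predictions, so the following is only a minimal sketch of the two metrics under stated assumptions, not the repository's implementation. It assumes each question's confidence is the highest "probability" in its n-best list (a `{question_id: [{"text": ..., "probability": ...}, ...]}` layout is assumed), that AUROC treats in-domain questions as positives (label 1) and out-of-domain questions as negatives (label 0), and that AURC uses a per-question correctness label (1 = correct answer). The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def max_probability(nbest: Dict[str, List[dict]]) -> Dict[str, float]:
    """Confidence per question = probability of its best n-best candidate (assumed keys)."""
    return {qid: max(c["probability"] for c in cands) for qid, cands in nbest.items()}


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic; tied scores count as half a win.

    labels: 1 = in-domain (positive), 0 = out-of-domain (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aurc(scores: Sequence[float], correct: Sequence[int]) -> float:
    """Area under the risk-coverage curve (lower is better).

    Predictions are sorted by confidence, highest first. At coverage k/n the
    risk is the error rate among the k most confident predictions; this
    discretization averages that risk over k = 1..n.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    errors = 0
    total_risk = 0.0
    for k, i in enumerate(order, start=1):
        errors += 1 - correct[i]
        total_risk += errors / k
    return total_risk / len(order)
```

The pairwise AUROC loop is quadratic and meant for readability; for large evaluation sets, scikit-learn's `roc_auc_score` computes the same quantity more efficiently.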

Results

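  • In-domain generalization

[Figure: in-domain generalization results]

  • Out-of-domain detection (AUROC)

[Figure: out-of-domain detection results, AUROC]

  • Out-of-domain detection (AURC)

[Figure: out-of-domain detection results, AURC]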

Contact Information

For help or issues, please open a GitHub issue or email ydaniel0826 (at) gmail.com.
