Unknown detection in QA systems

This repository provides the code for unknown detection in question answering (QA) systems, including model fine-tuning.

Download

The pretrained models used here come from Hugging Face.

Installation

Install requirements

$ pip install -r requirements.txt

Datasets

We provide a pre-processed version of benchmark datasets for each task as follows:

In-domain
    BioASQ
Out-of-domain
    SQuAD 1.1
    MS MARCO: 5 domains (music, computing, law, film, finance) for the question answering task

Fine-tuning

SAVE_DIR=./output
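DATA_DIR=''      # set this to the directory that contains the dataset files
TRAINDATA=BioASQ-train_split-factoid-7b.json
INDOMAIN=BioASQ-dev-factoid-7b.json
OUTDOMAIN=''     # set this to the out-of-domain dataset file
TESTDATA=${INDOMAIN}   # file to evaluate (assumed default); use ${OUTDOMAIN} for out-of-domain data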
OFFICIAL_DIR=./scripts/bioasq_eval
BATCH_SIZE=16
LEARNING_RATE=5e-5
NUM_EPOCHS=3
MAX_LENGTH=384
SEED=42
export CUDA_VISIBLE_DEVICES=2   # exported so that the python3 process sees it
MODEL_NAME=bert-base-cased
DOMAIN_TYPE=bioasq

# Fine-tuning with in-domain data
$ python3 fine_tuning.py \
    --model_type ${MODEL_NAME} \
    --model_name_or_path ${MODEL_NAME} \
    --do_train \
    --train_file ${DATA_DIR}/${TRAINDATA} \
    --per_gpu_train_batch_size ${BATCH_SIZE} \
    --learning_rate ${LEARNING_RATE} \
    --num_train_epochs ${NUM_EPOCHS} \
    --max_seq_length ${MAX_LENGTH} \
    --seed ${SEED} \
    --output_dir ${SAVE_DIR}/${MODEL_NAME}/bioasq_in_domain \
    --overwrite_cache \
    --overwrite_output 

# Inference with in-domain or out-of-domain data
$ python3 fine_tuning.py \
    --model_type ${MODEL_NAME} \
    --model_name_or_path ${SAVE_DIR}/${MODEL_NAME}/bioasq_in_domain \
    --do_eval \
    --predict_file ${DATA_DIR}/${TESTDATA} \
    --golden_file ${DATA_DIR}/7B_golden.json \
    --per_gpu_eval_batch_size ${BATCH_SIZE} \
    --max_seq_length ${MAX_LENGTH} \
    --seed ${SEED} \
    --official_eval_dir ${OFFICIAL_DIR} \
    --output_dir ${SAVE_DIR}/${MODEL_NAME}/${DOMAIN_TYPE}_bioasq_dev \
    --eval_all_checkpoints \
    --overwrite_cache

or

$ sh fine_tuning.sh

Unknown detection

Get AUROC

$ python3 get_auroc.py --path 'path to the directory that contains n_best_prediction.json'
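
For reference, AUROC can be read as the probability that a randomly chosen in-domain (known) question receives a higher confidence score than a randomly chosen out-of-domain (unknown) one. The sketch below computes it by direct pairwise comparison. It is an illustration only and not the code in get_auroc.py; the function name, the score values, and the labels are hypothetical, and it does not read n_best_prediction.json.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: confidence of the top answer for six questions.
# Label 1 = in-domain (known), label 0 = out-of-domain (unknown).
confidences = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
is_in_domain = [1, 1, 0, 1, 0, 0]
print(auroc(confidences, is_in_domain))
```

A value of 1.0 means every in-domain question is scored above every out-of-domain one, and 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of examples, so a rank-based implementation would be preferable for large evaluation sets.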

Get AURC

$ python3 get_aurc.py --path 'path to the directory that contains n_best_prediction.json'
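
For reference, AURC is the area under the risk-coverage curve: predictions are sorted from most to least confident, and the risk (error rate) is measured on the top-k predictions for each coverage level k. The sketch below uses one common discrete form, the mean of those risks over all k. It is an assumption about the metric's definition and not the code in get_aurc.py; the function name and the example values are hypothetical.

```python
def aurc(confidences, correct):
    """Mean selective risk over all coverage levels (lower is better)."""
    if not confidences:
        raise ValueError("need at least one prediction")
    # Indices sorted from most confident to least confident.
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    errors = 0
    risks = []
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        risks.append(errors / k)  # error rate among the k most confident answers
    return sum(risks) / len(risks)


# Hypothetical data: four answers with their confidence and correctness.
confidences = [0.9, 0.8, 0.7, 0.6]
correct = [1, 1, 0, 1]
print(aurc(confidences, correct))
```

A model whose wrong answers all receive low confidence keeps the risk near zero at low coverage, which lowers the AURC.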

Results

In-domain generalization

Out-of-domain detection (AUROC)

Out-of-domain detection (AURC)

Contact Information

For help or issues, please submit a GitHub issue or email ydaniel0826 (at) gmail.com.
