This repository contains the BCoQA dataset and evaluation scripts, which are part of our research on developing a robust Context-based Conversational Question Answering (CCQA) system for the Bangla language. The dataset is constructed through quality-controlled machine translation and LLM-based augmentation of established English CCQA datasets. The evaluation scripts provide a benchmark for assessing the performance of CCQA systems in Bangla.
The BCoQA dataset comprises over 14,000 conversations, featuring more than 140,000 question-answer pairs. Notably, the questions are conversational in nature, often requiring context from previous questions and answers to respond accurately. The answers are provided in free-form text, adding to the complexity and realism of the dataset.
In progress.
You can download the dataset directly from the Hugging Face Hub with the `datasets` library using the following code snippet:
```python
from datasets import load_dataset

bcoqa = load_dataset("arbitropy/bcoqa")
```
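Once loaded, you can inspect the splits and a sample conversation. The split name below is an assumption based on the train/test setup described later; check the dataset card for the exact layout.

```python
print(bcoqa)              # DatasetDict listing the available splits and their sizes
print(bcoqa["train"][0])  # one example, showing the dataset's fields
```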
You can browse the dataset using this Hugging Face link.
Alternatively, you can download the dataset directly from the following links to the JSON files:
To evaluate your model, go through the following steps:
- Save your model predictions: Write your model's output in the same JSON format as the demo file.
- Run the evaluation script: Use the official evaluation script to evaluate your model. To run the script, use the following command:
`python bcoqa-eval.py --test-file <path_to_test.json> --pred-file <path_to_predictions.json>`
There are various ways to build a good CCQA system. One simple approach is to fine-tune a model by adding the conversation history to the context. The bt5-bcoqa-finetune.ipynb notebook demonstrates one of many ways to do this.
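As a rough illustration (not the notebook's exact code), the sketch below folds prior turns into the encoder input so a standard sequence-to-sequence model can condition on the conversation history. The checkpoint name, separator string, and input formatting are assumptions; adapt them to your setup.

```python
# A minimal sketch, assuming the csebuetnlp/banglat5 checkpoint and a simple
# "history + question + passage" input format; the notebook may format things differently.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "csebuetnlp/banglat5"  # assumed checkpoint; any seq2seq model works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def build_source(context, history, question, sep=" || "):
    """Concatenate earlier (question, answer) turns, the current question, and the passage."""
    turns = sep.join(f"{q} {a}" for q, a in history)
    prefix = f"{turns}{sep}" if history else ""
    return f"{prefix}{question}{sep}{context}"

# Toy example; real fine-tuning would map this over the BCoQA train split.
source = build_source(
    context="<passage text>",
    history=[("<question 1>", "<answer 1>")],
    question="<question 2>",
)
target = "<gold answer 2>"

batch = tokenizer(source, truncation=True, max_length=1024, return_tensors="pt")
labels = tokenizer(text_target=target, truncation=True, max_length=64, return_tensors="pt").input_ids
loss = model(**batch, labels=labels).loss  # cross-entropy loss to optimize in a training loop
```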
We fine-tuned three sequence-to-sequence models on the training set and assessed their performance using the test set and our evaluation script. Additionally, we conducted human evaluations to provide a reference point. The results are presented below.
Model Name | Parameter Count | Exact Match | F1 Score |
---|---|---|---|
Human evaluation | --- | 68.1 | 79.2 |
banglat5 | 223M | 33.2 | 46.4 |
mt5-base | 580M | 29.5 | 42.1 |
mbart-large-50 | 610M | 26.8 | 36.2 |
We would like to extend our gratitude to the CoQA and QuAC teams for providing the original English datasets, which served as the foundation for our work. Additionally, we appreciate the CoQA team for sharing their base script, which we modified to create our own evaluation script. Their contributions have been invaluable to our project.