
Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring


This repository contains the implementation of the paper "Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring", accepted to the Findings of EMNLP 2024.

Abstract

Generating rationales that justify scoring decisions has been a promising way to facilitate explainability in automated scoring systems. However, existing methods do not match the accuracy of classifier-based methods. Plus, the generated rationales often contain hallucinated information. To address these issues, we propose a novel framework capable of generating more faithful rationales and, more importantly, matching performance with classifier-based black-box scoring systems. We first mimic the human assessment process by querying Large Language Models (LLMs) to generate a thought tree. We then summarise intermediate assessment decisions from each thought tree path for creating synthetic rationale data and rationale preference data. Finally, we utilise the generated synthetic data to calibrate LLMs through a two-step training process: supervised fine-tuning and preference optimization. Extensive experimental results demonstrate that our framework achieves a 38% assessment performance improvement in the QWK score compared to prior work while producing higher-quality rationales, as recognised by human evaluators and LLMs. Our work sheds light on the effectiveness of performing preference optimization using synthetic preference data obtained from thought tree paths.
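
To make the pipeline concrete, the runnable toy below enumerates thought-tree paths over hypothetical key answer elements and summarises each path into a score and a one-line rationale. In the paper, an LLM makes each matching decision and the path summaries become SFT and preference data; everything here (the key elements, the scoring rule) is an illustrative assumption, not the actual implementation.

# Illustrative toy of the thought-tree idea, not the paper's implementation:
# each key answer element is judged matched/unmatched at one tree level, so a
# root-to-leaf path is one complete assessment whose matches imply a score.
from itertools import product

key_elements = ["defines variables", "states hypothesis", "cites evidence"]

# Stage 1 (toy): enumerate every path of per-element True/False decisions.
paths = list(product([True, False], repeat=len(key_elements)))

# Stage 2 (toy): summarise each path into a score plus a one-line rationale.
for path in paths:
    matched = [e for e, ok in zip(key_elements, path) if ok]
    rationale = f"Matched {len(matched)}/{len(key_elements)}: {', '.join(matched) or 'none'}"
    print(len(matched), rationale)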

Open Source Contributions

We are thrilled to make our datasets and models from all stages of our research openly available. Explore our collections and models via the following links:

Usage Instructions

Environment Setup

conda env create -f environment.yml

Stage 1: Imitate Human Assessment Process via Thought Trees

1. Configure Thought Trees

Edit configs/tot_query.yaml
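
If you want to sanity-check the configuration before querying, a minimal sketch follows (assuming the file is standard YAML and PyYAML is installed; the fields it contains are defined by the repository, not by this example):

# Minimal sketch: load and inspect the Stage 1 config before running query.py.
# Assumes configs/tot_query.yaml is standard YAML; the field names it contains
# are defined by the repository, not by this example.
import yaml

with open("configs/tot_query.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg)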

2. Set your API keys for the supported services (Azure OpenAI, OpenAI, Mistral, vLLM) to enable LLM querying; a hedged client-setup sketch follows this list.

If you use the Azure OpenAI API service: add your API info here.
If you use the OpenAI API service: add your API key here.
If you use the Mistral API service: add your API key here.
If you use a vLLM local API server: change your configuration here.
Using custom models: you may need to update the model list here.
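
For reference, here is a minimal sketch of wiring up these credentials with the official openai Python SDK (openai>=1.x). The environment-variable names and the localhost URL are common conventions assumed for illustration, not the repository's exact configuration; the steps above describe where the repo actually expects the keys.

# Hedged sketch: pointing OpenAI-compatible clients at the services listed
# above. Environment-variable names follow SDK conventions; this repository
# itself reads credentials from its config files instead.
import os
from openai import OpenAI, AzureOpenAI

# OpenAI API service
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Azure OpenAI API service
azure_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-02-01",  # use the version your deployment supports
)

# vLLM local server exposing an OpenAI-compatible endpoint; the URL and dummy
# key are placeholders for a default local deployment. (The Mistral SDK
# follows the same pattern with MISTRAL_API_KEY.)
local_client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")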

3. Generate Thought Trees

python query.py

Stage 2: Summarise Thought Tree Paths as Rationales

1. Configure Generation

Edit configs/generation.yaml

2. Generate Batch Query File

python generate.py

We utilise OpenAI's batch API to generate synthetic data efficiently.
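
For reference, a minimal sketch of that workflow with the openai Python SDK (openai>=1.x): each line of the batch input file is one self-contained request, the file is uploaded, and a batch job is created against the chat completions endpoint. The file name, model, and prompt below are placeholders; generate.py produces the actual batch query file.

# Minimal sketch of OpenAI's batch workflow (openai>=1.x). The file name,
# model, and prompt are placeholders; in this repo, generate.py writes the
# actual batch query file.
import json
from openai import OpenAI

client = OpenAI()

# Each line of the batch input file is one self-contained request.
request = {
    "custom_id": "example-0",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Summarise this thought tree path ..."}],
    },
}
with open("batch_queries.jsonl", "w") as f:
    f.write(json.dumps(request) + "\n")

# Upload the file, then create the batch job (completes within 24 hours).
batch_file = client.files.create(file=open("batch_queries.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)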

Stage 3: Calibrate LLMs to Generate Rationales

We used LLaMA-Factory (thanks!) to train our models. Please refer to our example training scripts/configs: [train SFT model] [train DPO model].
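
As background for the preference-optimization step: DPO trains on pairs of a preferred and a dispreferred rationale for the same prompt. A hedged sketch of one such record follows; the field names are a common pairwise-data convention (similar to LLaMA-Factory's preference format), not necessarily the exact schema used in the linked configs.

# Hedged sketch of one preference-optimization training record: a preferred
# ("chosen") rationale paired with a dispreferred ("rejected") one for the
# same scoring prompt. Field names follow a common pairwise-data convention;
# check the linked LLaMA-Factory configs for the exact schema used here.
import json

record = {
    "instruction": "Score the student answer against the key elements and explain your decision.",
    "input": "Question: ... Key elements: ... Student answer: ...",
    "chosen": "The answer covers key elements 1 and 2 but misses 3, so the score is 2.",
    "rejected": "The answer looks good overall, so full marks.",
}

with open("dpo_pairs.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")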

Cite Our Work

If you find our method useful, please cite our paper as follows:

@misc{li2024calibratingllmspreferenceoptimization,
      title={Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring}, 
      author={Jiazheng Li and Hainiu Xu and Zhaoyue Sun and Yuxiang Zhou and David West and Cesare Aloisi and Yulan He},
      year={2024},
      eprint={2406.19949},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.19949}, 
}
