
RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts

Source code for our paper:
RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts

Paper: https://arxiv.org/abs/2502.17888 · Checkpoint: https://huggingface.co/MignonMiyoung/RankCoT

If you find this work useful, please cite our paper and give us a shining star 🌟

@article{wu2025rankcotrefiningknowledgeretrievalaugmented,
      title={RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts}, 
      author={Mingyan Wu and Zhenghao Liu and Yukun Yan and Xinze Li and Shi Yu and Zheni Zeng and Yu Gu and Ge Yu},
      year={2025},
      eprint={2502.17888},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17888}, 
}

Overview

RankCoT is a knowledge refinement method that incorporates reranking signals into the generation of CoT-based summaries, refining the knowledge in all retrieved documents with respect to a given query. During training, RankCoT prompts the LLM to generate Chain-of-Thought (CoT) candidates from the query and each individual document. It then fine-tunes the LLM to reproduce the best of these candidate CoTs conditioned on all retrieved documents, which requires the LLM to filter out irrelevant documents while generating the CoT-style summary. In addition, RankCoT incorporates a self-reflection mechanism that further refines the CoT outputs, yielding higher-quality training data.
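The pipeline can be summarized in a short sketch. This is illustrative pseudocode of the steps described above, not the repository's implementation; `generate` stands in for an arbitrary LLM call, and all prompts and field names are placeholders.

```python
# Illustrative sketch of RankCoT training-data construction (not the repo's code).
def generate(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LLM call")

def build_preference_pair(query, documents, gold_answer):
    # 1. One CoT candidate per (query, document) pair.
    candidates = [
        generate(f"Question: {query}\nDocument: {doc}\nReason step by step:")
        for doc in documents
    ]

    # 2. Self-reflection: a CoT counts as good if answering from it alone
    #    recovers the gold answer.
    def reflects_well(cot):
        answer = generate(f"Question: {query}\nReasoning: {cot}\nAnswer:")
        return gold_answer.lower() in answer.lower()

    scored = [(cot, reflects_well(cot)) for cot in candidates]
    good = [cot for cot, ok in scored if ok]
    bad = [cot for cot, ok in scored if not ok]
    if not good or not bad:
        return None  # dropped later by the filtering step

    # 3. DPO pair: prefer the good CoT over the bad one, conditioned on
    #    ALL retrieved documents, so the model learns to ignore noise.
    prompt = (f"Question: {query}\nDocuments: " + " ".join(documents)
              + "\nReason step by step:")
    return {"prompt": prompt, "chosen": good[0], "rejected": bad[0]}
```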

Set Up

Use git clone to download this project

git clone https://github.com/NEUIR/RankCoT.git
cd RankCoT

To avoid package conflicts, we use two separate conda environments: one for model inference and one for model training.

For model inference:
conda env create -n llama3_inf -f inference_environment.yml

For model training:
conda env create -n llama3_ft -f training_environment.yml

Data

Download the files from here and place them in the data/ directory.

data/
- retriever_train_4000_noread_psg_modify10passage.jsonl # ❗️Note: We modified the data format so that each question corresponds to ten lines, and each of those lines pairs the question with a different retrieved document (see the sketch after this list).
- test_data/ # test data in our experiments
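For orientation, the grouping implied by the note above can be checked with a few lines of Python. The field names "question" and "passage" are guesses for illustration; inspect the downloaded file for the actual schema.

```python
import json
from collections import defaultdict

# Re-group the per-passage lines by question; with the modified format,
# every question should collect ten passages.
by_question = defaultdict(list)
with open("data/retriever_train_4000_noread_psg_modify10passage.jsonl") as f:
    for line in f:
        record = json.loads(line)
        by_question[record["question"]].append(record["passage"])

question, passages = next(iter(by_question.items()))
print(question, "->", len(passages), "passages")
```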

Using RankCoT model

(1) Use git clone to download the model. ❗️Note: this is a LoRA checkpoint of RankCoT; please merge it with the base model before use (a peft-based merge sketch follows the clone command).

git clone https://huggingface.co/MignonMiyoung/RankCoT
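If you want to merge the adapter yourself (the repository also provides `src/modelft/merge_model.py` for this; see the Training section), the standard `peft` recipe looks like the sketch below; paths are placeholders.

```python
# Minimal LoRA-merge sketch with Hugging Face peft; paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = PeftModel.from_pretrained(base, "RankCoT")  # the cloned LoRA checkpoint
merged = model.merge_and_unload()                   # fold adapter weights into the base

merged.save_pretrained("RankCoT-merged")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer.save_pretrained("RankCoT-merged")
```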

(2) Use the RankCoT model to refine the knowledge:

conda activate llama3_inf
python src/answer_generation/querypassage_to_CoT.py \
    --model_path <rankcot_model_path> \
    --data_path <test_file> \
    --output_name <output_file> \
    --max_psg_length 1500
# --model_path: path to the merged RankCoT model
# --data_path: e.g. nq_modify10passage
# --output_name: e.g. nq_querypassage_to_CoT.jsonl

(3) Question answering:

python src/answer_generation/queryCoT_to_answer.py \
    --model_path <generator_model_path> \
    --data_path <cot_file> \
    --output_name <output_file>
# --model_path: e.g. Meta-Llama-3-8B-Instruct
# --data_path: e.g. nq_querypassage_to_CoT.jsonl
# --output_name: e.g. nq_queryCoT_to_answer.jsonl

For different tasks, you need to set a different maximum number of generated tokens and a different prompt template:

Task       Max tokens   Template                          Metric
NQ         32           QA_queryCoT_to_answer             accuracy
TriviaQA   32           QA_queryCoT_to_answer             accuracy
HotpotQA   32           QA_queryCoT_to_answer             accuracy
PopQA      32           QA_queryCoT_to_answer             accuracy
ASQA       200          QA_queryCoT_to_answer_forasqa     str-em
MARCO QA   100          QA_queryCoT_to_answer_forrouge    rouge

(4) Evaluation: different tasks require different metrics, so we provide one evaluation script per metric; each run evaluates a single dataset. (A sketch of the typical accuracy computation follows the commands below.)

For the accuracy metric:
python src/answer_generation/evaluate.py

For the str-em metric:
python src/answer_generation/evaluate_forasqa.py

For the rouge metric:
python src/answer_generation/evaluate_forrouge.py
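For reference, the accuracy metric in open-domain QA is usually computed as answer-string containment: a prediction counts as correct if any gold answer appears in it after normalization. A minimal sketch (not the repository's exact scorer; field names are assumptions):

```python
import json
import string

def normalize(text: str) -> str:
    # Standard QA normalization: lowercase and drop punctuation.
    return "".join(ch for ch in text.lower() if ch not in string.punctuation)

def accuracy(path: str) -> float:
    correct = total = 0
    with open(path) as f:
        for line in f:
            record = json.loads(line)  # assumed fields: "prediction", "answers"
            total += 1
            prediction = normalize(record["prediction"])
            if any(normalize(ans) in prediction for ans in record["answers"]):
                correct += 1
    return correct / total

print(accuracy("nq_queryCoT_to_answer.jsonl"))
```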

Training RankCoT

Constructing training data

(1) CoT data generation

conda activate llama3_inf
python src/CoTdata_generation/querypassage_to_CoT.py \
    --model_path <generator_model_path> \
    --data_path data/retriever_train_4000_noread_psg_modify10passage.jsonl \
    --output_name <output_file> \
    --max_psg_length 1500
# --model_path: e.g. Meta-Llama-3-8B-Instruct
# --output_name: e.g. querypassage_to_CoT.jsonl

(2) CoT refinement through self-reflection

python src/answer_generation/queryCoT_to_answer.py \
    --model_path <generator_model_path> \
    --data_path <cot_file> \
    --output_name <output_file>
# --model_path: e.g. Meta-Llama-3-8B-Instruct
# --data_path: e.g. querypassage_to_CoT.jsonl
# --output_name: e.g. queryCoT_to_answer.jsonl

(3) Constructing preference data

python src/modelft/COT_MODELANSWER_dpodata_gen.py

(4) Filter invalid data

python src/modelft/select_notnone_data.py
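Judging from the script name, this step drops records for which no valid preference pair could be formed; a sketch of that kind of filter (file and field names assumed):

```python
import json

# Keep only records whose preference fields are actually populated.
with open("dpo_data.jsonl") as fin, open("dpo_data_clean.jsonl", "w") as fout:
    for line in fin:
        record = json.loads(line)
        if record.get("chosen") and record.get("rejected"):
            fout.write(json.dumps(record) + "\n")
```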

(5) Partition the data by ratio into training and evaluation splits

python src/modelft/dataset_partitioning_dataprocess.py

Training the model

After constructing the training data, you can start training the RankCoT model.

(1) First step: download the Meta-Llama-3-8B-Instruct model to serve as the base knowledge refinement model.

(2) Second step: use LoRA to train the model

conda activate llama3_ft
bash scripts/lora_dpo_llama.sh
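The script wraps a LoRA DPO run. For orientation only, a condensed sketch of that kind of setup with `trl` and `peft` follows; hyperparameters are illustrative and argument names vary across `trl` versions, so treat `scripts/lora_dpo_llama.sh` as the authoritative configuration.

```python
# Illustrative LoRA + DPO setup; not the repository's training script.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data with "prompt" / "chosen" / "rejected" columns.
dataset = load_dataset("json", data_files="dpo_data_clean.jsonl", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")
args = DPOConfig(output_dir="checkpoints", beta=0.1,
                 per_device_train_batch_size=1)

trainer = DPOTrainer(model=model, args=args, train_dataset=dataset,
                     processing_class=tokenizer, peft_config=peft_config)
trainer.train()
```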

(3) Third step: select the checkpoint with the lowest eval loss, and merge the LoRA weights trained in the second step into the base model.

python src/modelft/merge_model.py
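Each Hugging Face Trainer checkpoint records its evaluation history in `trainer_state.json`, so the lowest-eval-loss checkpoint can be located programmatically; a small sketch, assuming the default `checkpoints/checkpoint-*` layout:

```python
import glob
import json
import os

# Scan every checkpoint's trainer_state.json for its best recorded eval loss.
best_dir, best_loss = None, float("inf")
for state_path in glob.glob("checkpoints/checkpoint-*/trainer_state.json"):
    with open(state_path) as f:
        history = json.load(f)["log_history"]
    losses = [entry["eval_loss"] for entry in history if "eval_loss" in entry]
    if losses and min(losses) < best_loss:
        best_loss, best_dir = min(losses), os.path.dirname(state_path)

print("lowest-eval-loss checkpoint:", best_dir, f"(eval loss {best_loss:.4f})")
```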

Contact

If you have questions, suggestions, or bug reports, please email:
