Source code for our paper:
RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts
Click the links below to view our paper and checkpoints:
If you find this work useful, please cite our paper and give us a shining star 🌟
@misc{wu2025rankcotrefiningknowledgeretrievalaugmented,
      title={RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts},
      author={Mingyan Wu and Zhenghao Liu and Yukun Yan and Xinze Li and Shi Yu and Zheni Zeng and Yu Gu and Ge Yu},
      year={2025},
      eprint={2502.17888},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17888},
}
RankCoT is a knowledge refinement method that incorporates reranking signals into the generation of CoT-based summaries, conditioned on the given query and all retrieved documents. During training, RankCoT prompts the LLM to generate Chain-of-Thought (CoT) candidates from the query and each individual document. It then fine-tunes the LLM to reproduce the best of these candidates given all retrieved documents, which forces the model to filter out irrelevant documents while generating the CoT-style summary. In addition, RankCoT incorporates a self-reflection mechanism that further refines the CoT outputs, yielding higher-quality training data.
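At a glance, candidate generation and self-reflection scoring can be sketched as follows. This is a simplified illustration with hypothetical helper names (`llm.generate_cot`, `llm.answer`), not the repository's actual code; the real pipeline lives in src/CoTdata_generation/ and src/modelft/. The model is then fine-tuned to reproduce a high-scoring candidate when conditioned on all retrieved documents at once.

```python
# Sketch of candidate generation and self-reflection scoring
# (llm.generate_cot and llm.answer are hypothetical helpers).
def score_cot_candidates(query, documents, gold_answer, llm):
    # One CoT candidate per (query, document) pair.
    candidates = [llm.generate_cot(query, doc) for doc in documents]
    # Self-reflection: a candidate scores well if answering the query
    # from that CoT alone recovers the gold answer.
    return [(cot, llm.answer(query, cot) == gold_answer) for cot in candidates]
```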
Use git clone to download this project:
git clone https://github.com/NEUIR/RankCoT.git
cd RankCoT
To prevent package conflicts, we use two separate conda environments: one for model inference and one for model training.
For model inference:
conda env create -n llama3_inf -f inference_environment.yml
For model training:
conda env create -n llama3_ft -f training_environment.yml
Download the files from here and place them in the data/ directory.
data/
- retriever_train_4000_noread_psg_modify10passage.jsonl # ❗️Note: we modified the data format so that each question spans ten consecutive lines, one line per retrieved document (see the example below).
- test_data/ # test data used in our experiments
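For illustration, consecutive lines share the same question but differ in the attached passage. The field names below are our guess at the schema; inspect the file for the exact keys:

```json
{"question": "who wrote the opera Carmen", "answers": ["Georges Bizet"], "passage": "Carmen is an opera in four acts by the French composer ..."}
{"question": "who wrote the opera Carmen", "answers": ["Georges Bizet"], "passage": "The Habanera is an aria from ..."}
```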
(1) Use git clone to download the model:
❗️Note: This is a LoRA checkpoint of RankCoT; please merge it before use (see merge_model.py in the Training section).
git clone https://huggingface.co/MignonMiyoung/RankCoT
(2) Use the RankCoT model to refine the knowledge:
conda activate llama3_inf
python src/answer_generation/querypassage_to_CoT.py \
--model_path # The path to RankCoT model \
--data_path # e.g. nq_modify10passage \
--output_name # e.g. nq_querypassage_to_CoT.jsonl \
--max_psg_length 1500
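A fully worked invocation might look like this (the model and data paths are illustrative):

```bash
python src/answer_generation/querypassage_to_CoT.py \
    --model_path checkpoints/RankCoT-merged \
    --data_path data/test_data/nq_modify10passage \
    --output_name nq_querypassage_to_CoT.jsonl \
    --max_psg_length 1500
```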
(3) Question answering:
python src/answer_generation/queryCoT_to_answer.py \
--model_path # e.g. Meta-Llama-3-8B-Instruct \
--data_path # e.g. nq_querypassage_to_CoT.jsonl \
--output_name # e.g. nq_queryCoT_to_answer.jsonl
Different tasks require a different maximum number of generated tokens and a different prompt template:
| Task | Max tokens | Template | Metric |
|---|---|---|---|
| NQ | 32 | QA_queryCoT_to_answer | accuracy |
| TriviaQA | 32 | QA_queryCoT_to_answer | accuracy |
| HotpotQA | 32 | QA_queryCoT_to_answer | accuracy |
| PopQA | 32 | QA_queryCoT_to_answer | accuracy |
| ASQA | 200 | QA_queryCoT_to_answer_forasqa | str-em |
| MARCO QA | 100 | QA_queryCoT_to_answer_forrouge | rouge |
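The step (3) command is then adapted per task. Assuming queryCoT_to_answer.py exposes flags for these two settings (the flag names below are hypothetical; check the script's argparse definitions for the real ones), an ASQA run might look like:

```bash
# --max_tokens and --template are assumed flag names, not verified.
python src/answer_generation/queryCoT_to_answer.py \
    --model_path Meta-Llama-3-8B-Instruct \
    --data_path asqa_querypassage_to_CoT.jsonl \
    --output_name asqa_queryCoT_to_answer.jsonl \
    --max_tokens 200 \
    --template QA_queryCoT_to_answer_forasqa
```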
(4) Evaluation: different tasks require different metrics. We provide a separate evaluation script for each metric, and each run evaluates a single dataset.
For the accuracy metric:
python src/answer_generation/evaluate.py
For the str-em metric:
python src/answer_generation/evaluate_forasqa.py
For the rouge metric:
python src/answer_generation/evaluate_forrouge.py
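For reference, the accuracy metric in open-domain QA is usually a normalized substring match: a prediction counts as correct if any gold answer appears in it. A minimal sketch of that convention (evaluate.py may differ in details):

```python
import re
import string

def normalize(text: str) -> str:
    # Lowercase, strip punctuation and articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def is_correct(prediction: str, gold_answers: list[str]) -> bool:
    # A prediction is correct if it contains any normalized gold answer.
    pred = normalize(prediction)
    return any(normalize(gold) in pred for gold in gold_answers)
```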
(1) CoT data generation
conda activate llama3_inf
python src/CoTdata_generation/querypassage_to_CoT.py \
--model_path # e.g. Meta-Llama-3-8B-Instruct \
--data_path # e.g. data/retriever_train_4000_noread_psg_modify10passage.jsonl \
--output_name # e.g. querypassage_to_CoT.jsonl \
--max_psg_length 1500
(2) CoT refinement through self-reflection
python src/answer_generation/queryCoT_to_answer.py \
--model_path # e.g. Meta-Llama-3-8B-Instruct \
--data_path # e.g. querypassage_to_CoT.jsonl \
--output_name # e.g. queryCoT_to_answer.jsonl
(3) Constructing preference data
python src/modelft/COT_MODELANSWER_dpodata_gen.py
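Conceptually, this step turns the per-document CoT candidates and their self-reflection signal into DPO-style (chosen, rejected) pairs. A minimal sketch with hypothetical field names (see COT_MODELANSWER_dpodata_gen.py for the real schema):

```python
# Group candidates by question and pair a correct CoT with an incorrect one.
def build_dpo_pairs(records):
    by_question = {}
    for r in records:  # r: {"question", "cot", "answer_correct"} (assumed keys)
        by_question.setdefault(r["question"], []).append(r)
    pairs = []
    for question, cands in by_question.items():
        chosen = [c["cot"] for c in cands if c["answer_correct"]]
        rejected = [c["cot"] for c in cands if not c["answer_correct"]]
        if chosen and rejected:
            pairs.append({"prompt": question,
                          "chosen": chosen[0],
                          "rejected": rejected[0]})
    return pairs
```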
(4) Filtering out invalid data (a combined sketch of steps (4) and (5) follows step (5))
python src/modelft/select_notnone_data.py
(5) Partitioning the data into training and evaluation splits
python src/modelft/dataset_partitioning_dataprocess.py
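Steps (4) and (5) are simple passes over the generated JSONL. Conceptually (the file names and split ratio below are illustrative, not the repository's actual values):

```python
import json
import random

# Step (4): drop pairs where generation failed and left an empty/None field.
with open("dpo_pairs.jsonl") as f:                  # illustrative file name
    records = [json.loads(line) for line in f]
records = [r for r in records if r.get("chosen") and r.get("rejected")]

# Step (5): carve out a small evaluation split.
random.seed(0)
random.shuffle(records)
cut = int(0.95 * len(records))                      # illustrative ratio
for name, split in (("train", records[:cut]), ("eval", records[cut:])):
    with open(f"dpo_{name}.jsonl", "w") as out:
        for r in split:
            out.write(json.dumps(r) + "\n")
```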
After constructing the training data, you can start training the RankCoT model.
(1) First step: download the Llama3-8B-Instruct model to serve as the knowledge refinement model.
(2) Second step: train the model with LoRA:
conda activate llama3_ft
bash scripts/lora_dpo_llama.sh
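For orientation, LoRA-based DPO training with peft and trl typically looks like the sketch below; the script above sets the actual hyperparameters, and the trl API varies somewhat across versions:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data with prompt/chosen/rejected fields, as constructed above.
data = load_dataset("json", data_files={"train": "dpo_train.jsonl",
                                        "eval": "dpo_eval.jsonl"})

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")        # illustrative values
args = DPOConfig(output_dir="rankcot_lora", beta=0.1,  # illustrative values
                 per_device_train_batch_size=1,
                 gradient_accumulation_steps=8,
                 learning_rate=5e-6)

trainer = DPOTrainer(model=model, args=args,
                     train_dataset=data["train"], eval_dataset=data["eval"],
                     processing_class=tokenizer, peft_config=peft_config)
trainer.train()
```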
(3) Third step: select the checkpoint with the lowest eval loss and merge the LoRA weights trained in the second step into the base model:
python src/modelft/merge_model.py
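This script folds the LoRA adapter into the base model; peft's merge_and_unload performs the actual weight merge. Roughly (paths are illustrative):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "meta-llama/Meta-Llama-3-8B-Instruct"
base = AutoModelForCausalLM.from_pretrained(base_name)

# Load the LoRA adapter (the checkpoint with the lowest eval loss) and merge.
model = PeftModel.from_pretrained(base, "rankcot_lora/checkpoint-XXXX")
model = model.merge_and_unload()

model.save_pretrained("checkpoints/RankCoT-merged")
AutoTokenizer.from_pretrained(base_name).save_pretrained("checkpoints/RankCoT-merged")
```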
If you have questions, suggestions, or bug reports, please email: