Skip to content

Latest commit

 

History

History
59 lines (45 loc) · 2.13 KB

README.md

File metadata and controls

59 lines (45 loc) · 2.13 KB

PairJudgeRM

This repo is the official implementation of the paper "PairJudge RM: Perform Best-of-N Sampling with Knockout Tournament".

News

  • 2025-01-31: We have released the checkpoint of our PairJudgeRM model. You can download it from here.
  • 2025-01-31: We have released the training data of our PairJudgeRM model. You can download it from here.

Repository Structure

  • data/: contains the datasets used in the experiments.
  • PairJudge/: contains the source code of PairJudgeRM.
  • PairJudge/compare_resp.py: contains the implementation of PairJudgeRM.
  • PairJudge/knockout.py: contains the implementation of Knockout Tournament.

The checkpoint of our PairJudgeRM model is coming soon. Stay tuned!

Before that you can run the code will online llm api like gpt4o,claude-3.5-sonnet or gemini-1.5-flash

for example:

export PYTHONPATH=$PYTHONPATH:$(pwd)

# Define the input file
input_file=data/math-500/LLaMA-3.1-8B-Instruction_64.json

# Define the prompt template
prompt_template=prompts/compare_0_ex.md

# Define the base URL and API key
judge_model=gpt-4o
base_url="https://api.openai.com/v1"
api_key="YOUR_API_KEY"

# Run the Python script with the appropriate arguments
python pairwise/knockout.py \
    --model $judge_model \
    --input $input_file \
    --prompt_template $prompt_template \
    --base_url $base_url \
    --api_key $api_key \
    -n 64

If you want to run the code on our PairJudgeRM model, you can replace the judge_model with PairJudge-RM and base_url with http://localhost:8000/v1. One vllm server is needed to run the code.

Citation

If you find our work useful, please consider citing our paper:

@article{liu2025PairJudge,
  title={PairJudge RM: Perform Best-of-N Sampling with Knockout Tournament},
  author={Liu, Yantao and Yao, Zijun and Min, Rui and Cao, Yixin and Hou, Lei and Li, Juanzi},
  journal={arXiv preprint arXiv:2501.13007},
  year={2025},
  note={in progress work},
  url={https://doi.org/10.48550/arXiv.2501.13007}
}