Skip to content

Latest commit

 

History

History
executable file
·
61 lines (45 loc) · 1.97 KB

README.md

File metadata and controls

executable file
·
61 lines (45 loc) · 1.97 KB

Reinforcement Learning Contemplation (RLC)


Official implementation for paper Language Model Self-improvement by Reinforcement Learning Contemplation, which is accepted at ICLR 2024.

How to run the code?


  1. Install necessary dependencies.

Python version used for experiment is 3.10. Install packages for python:

pip install -r requirements.txt

Install trlx, which is built based on the open-sourced trlx repository for training LLMs with RL. Note that we made some changes to the original trlx.

cd ./trlx_ours
pip install torch==2.0.0 --extra-index-url https://download.pytorch.org/whl/cu116 # for cuda
pip install -e .
  1. Run the code.
python RLC.py --ask_mode "standard_answer_reward" --is_chain_of_thought True --model_name google/flan-t5-large --bbh_set date_understanding
  1. Optional arguments for running the main file.
  • --model_name: which model to load
  • --ask_mode: how to ask for judgement
  • --is_chain_of_thought: whether to use chain of thought technique
  • --dataset_name: The Specific dataset name:[TruthfulQA, CommonQA, BIG-Bench-Hard/bbh, human_annotations]
  • --llm_generate_mode: generate mode : (multinomial_sampling)
  • --few_shot_cot: number of CoT demonstrations to use (only for BigBench dataset)

Citations

Please cite the paper if you use RLC method or find the paper insightful. Feel free to contact the authors or open an issue if you have any questions.

@inproceedings{pang2024rlc,
  author       = {Jing{-}Cheng Pang and
                  Pengyuan Wang and
                  Kaiyuan Li and
                  Xiong{-}Hui Chen and
                  Jiacheng Xu and
                  Zongzhang Zhang and
                  Yang Yu},
  title        = {Language Model Self-improvement by Reinforcement Learning Contemplation},
  booktitle    = {The Twelfth International Conference on Learning Representations (ICLR)},
  year         = {2024}
}