GitHub - lafmdp/RLC: [ICLR'24] Official code for "Language Model Self-improvement by Reinforcement Learning Contemplation".

Reinforcement Learning Contemplation (RLC)

Official implementation of the paper "Language Model Self-improvement by Reinforcement Learning Contemplation", accepted at ICLR 2024.

How to run the code?

Install the necessary dependencies.

The experiments used Python 3.10. Install the required Python packages:

pip install -r requirements.txt

Install our version of trlx, which is based on the open-source trlx repository for training LLMs with RL. Note that we made some changes to the original trlx, so install it from the bundled directory rather than from upstream.

cd ./trlx_ours
pip install torch==2.0.0 --extra-index-url https://download.pytorch.org/whl/cu116 # for cuda
pip install -e .
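
After installation, a quick sanity check can confirm that the interpreter and packages match the versions mentioned above. The following is a minimal sketch and not part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and it assumes the editable install registers under the distribution name `trlx`, as the upstream project does. It only reads installed package metadata and does not import torch.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected_python=(3, 10), expected_torch="2.0.0"):
    """Compare the local setup with the versions named in this README.

    Hypothetical helper: it returns a dict of findings instead of raising,
    so it can be run before or after installation.
    """
    torch_version = installed_version("torch")
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_matches": sys.version_info[:2] == tuple(expected_python),
        "torch": torch_version,
        # CUDA wheels carry a local suffix such as "+cu117", so only the
        # part before "+" is compared.
        "torch_matches": torch_version is not None
        and torch_version.split("+")[0] == expected_torch,
        # Assumption: the modified trlx keeps the upstream distribution name.
        "trlx": installed_version("trlx"),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `None` entry for `torch` or `trlx` means the package metadata was not found in the current environment, which usually indicates the install step ran in a different environment.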

Run the code.

python RLC.py --ask_mode "standard_answer_reward" --is_chain_of_thought True --model_name google/flan-t5-large --bbh_set date_understanding
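
To repeat the run above for several tasks, the command can be assembled programmatically. This is a minimal sketch, not part of the repository: `build_rlc_command` and `run_tasks` are hypothetical helper names, and the only flag values used are the ones shown in the command above. `date_understanding` is the only task name this README confirms.

```python
import subprocess
import sys


def build_rlc_command(bbh_set, model_name="google/flan-t5-large",
                      ask_mode="standard_answer_reward", chain_of_thought=True):
    """Build the argv list for one RLC.py run, mirroring the command above."""
    return [
        sys.executable, "RLC.py",
        "--ask_mode", ask_mode,
        # The README passes the boolean as the literal string "True".
        "--is_chain_of_thought", str(chain_of_thought),
        "--model_name", model_name,
        "--bbh_set", bbh_set,
    ]


def run_tasks(bbh_sets, dry_run=True):
    """Run (or, with dry_run, only print) one command per task."""
    for bbh_set in bbh_sets:
        command = build_rlc_command(bbh_set)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop if a run exits with an error.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Prints the command without launching training.
    run_tasks(["date_understanding"])
```

With `dry_run=True` (the default here) nothing is launched; pass `dry_run=False` from the repository root to actually start the runs one after another.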

Optional arguments for the main file (RLC.py):

--model_name: which model to load
--ask_mode: how to ask for judgement
--is_chain_of_thought: whether to use chain of thought technique
--dataset_name: which dataset to use, one of [TruthfulQA, CommonQA, BIG-Bench-Hard/bbh, human_annotations]
--llm_generate_mode: generation mode (multinomial_sampling)
--few_shot_cot: number of CoT demonstrations to use (only for BigBench dataset)
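
The options listed above could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not the parser used in RLC.py: every default value is an assumption, and the `--bbh_set` flag from the example command is left out because it is not in the list. The `str2bool` helper is included because the example command passes `True` as a string, and a plain `type=bool` would treat the string "False" as true. Whether RLC.py parses booleans this way is not confirmed by this README.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings into real booleans."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the documented flags (defaults assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of the RLC.py options")
    parser.add_argument("--model_name", default="google/flan-t5-large",
                        help="which model to load")
    parser.add_argument("--ask_mode", default="standard_answer_reward",
                        help="how to ask for judgement")
    parser.add_argument("--is_chain_of_thought", type=str2bool, default=False,
                        help="whether to use the chain-of-thought technique")
    parser.add_argument("--dataset_name", default="BIG-Bench-Hard/bbh",
                        choices=["TruthfulQA", "CommonQA",
                                 "BIG-Bench-Hard/bbh", "human_annotations"],
                        help="which dataset to use")
    parser.add_argument("--llm_generate_mode", default="multinomial_sampling",
                        help="generation mode")
    parser.add_argument("--few_shot_cot", type=int, default=0,
                        help="number of CoT demonstrations (BigBench only)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--is_chain_of_thought", "True", "--few_shot_cot", "3"]
    )
    print(args.is_chain_of_thought, args.few_shot_cot, args.dataset_name)
```

Restricting `--dataset_name` with `choices` makes a misspelled dataset name fail at startup rather than partway through a run.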

Citations

Please cite the paper if you use the RLC method or find the paper insightful. Feel free to contact the authors or open an issue if you have any questions.

@inproceedings{pang2024rlc,
  author       = {Jing{-}Cheng Pang and
                  Pengyuan Wang and
                  Kaiyuan Li and
                  Xiong{-}Hui Chen and
                  Jiacheng Xu and
                  Zongzhang Zhang and
                  Yang Yu},
  title        = {Language Model Self-improvement by Reinforcement Learning Contemplation},
  booktitle    = {The Twelfth International Conference on Learning Representations (ICLR)},
  year         = {2024}
}

