Code for the paper Text2Reward: Reward Shaping with Language Models for Reinforcement Learning. Please refer to our project page for more demonstrations and up-to-date related resources.
To set up the environment, run the following commands in a shell:
# set up conda
conda create -n text2reward python=3.7
conda activate text2reward
# set up ManiSkill2 environment
cd ManiSkill2
pip install -e .
pip install stable-baselines3==1.8.0 wandb tensorboard
cd ..
cd run_maniskill
bash download_data.sh
# set up MetaWorld environment
cd ..
cd Metaworld
pip install -e .
# set up code generation
pip install langchain chromadb==0.4.0
- If you have not installed mujoco yet, please follow the instructions here to install it. Afterwards, run the following commands to confirm the installation succeeded:
$ python3
>>> import mujoco_py
- If you encounter the following errors when running ManiSkill2, please refer to the documentation here.
RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
Some required Vulkan extension is not present. You may not use the renderer to render, however, CPU resources will be still available.
Segmentation fault (core dumped)
To reproduce our experimental results, run the following scripts:
ManiSkill2:
bash run_oracle.sh
bash run_zero_shot.sh
bash run_few_shot.sh
It's normal to encounter the following warnings:
[svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[svulkan2] [warning] Continue without GLFW.
MetaWorld:
bash run_oracle.sh
bash run_zero_shot.sh
First, add the following environment variable to your .bashrc (or .zshrc, etc.):
export PYTHONPATH=$PYTHONPATH:~/path/to/text2reward
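After exporting the variable, it can help to confirm that the repository root is actually visible to Python. Below is a minimal sketch of such a check; the repo path shown is the same placeholder as in the export above, and the helper name is our own, not part of the Text2Reward codebase.

```python
import os
import sys

def on_python_path(repo_root: str) -> bool:
    """Return True if repo_root appears on sys.path or in PYTHONPATH."""
    expanded = os.path.abspath(os.path.expanduser(repo_root))
    # sys.path already reflects PYTHONPATH at interpreter startup, but we
    # also scan the variable directly in case it was changed afterwards.
    candidates = sys.path + os.environ.get("PYTHONPATH", "").split(os.pathsep)
    return any(
        os.path.abspath(os.path.expanduser(p)) == expanded
        for p in candidates if p
    )

if __name__ == "__main__":
    print(on_python_path("~/path/to/text2reward"))
```

If this prints False, the export has not taken effect in the current shell (e.g. the rc file was not re-sourced).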
Then navigate to the directory text2reward/code_generation/single_flow and run the following scripts:
# generate reward code for ManiSkill2
bash run_maniskill_zeroshot.sh
bash run_maniskill_fewshot.sh
# generate reward code for MetaWorld
bash run_metaworld_zeroshot.sh
By default, the run_oracle.sh script above uses the expert-written rewards provided by the environment, while the run_zero_shot.sh and run_few_shot.sh scripts use the generated rewards from our experiments. To run a new experiment with a reward of your own, follow the bash scripts above and point the --reward_path parameter to the path of your reward file.
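To illustrate what a custom reward file typically contains, here is a minimal sketch of a dense, distance-based shaping reward. The function name, signature, and threshold are hypothetical; the exact interface Text2Reward expects depends on the environment wrapper, so treat this only as an example of the common pattern (a dense term plus a sparse success bonus).

```python
import math

def compute_dense_reward(ee_pos, goal_pos, success_threshold=0.02):
    """Hypothetical shaping reward: higher as the end-effector nears the goal.

    ee_pos, goal_pos: 3-element sequences (x, y, z) in meters.
    """
    # Euclidean distance between end-effector and goal.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(ee_pos, goal_pos)))
    reward = -dist  # dense shaping term: closer is better
    if dist < success_threshold:
        reward += 1.0  # sparse bonus once the task counts as solved
    return reward
```

A dense term like -dist gives the RL agent a gradient toward the goal at every step, which is the kind of shaping the generated rewards in this repo provide on top of sparse task success.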
If you find our work helpful, please cite us:
@inproceedings{xietext2reward,
  title={Text2Reward: Reward Shaping with Language Models for Reinforcement Learning},
  author={Xie, Tianbao and Zhao, Siheng and Wu, Chen Henry and Liu, Yitao and Luo, Qian and Zhong, Victor and Yang, Yanchao and Yu, Tao},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}