Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts
Python Environment Configuration
Update the prefix parameter in environment.yml
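If you prefer to script this edit, the sketch below rewrites the top-level prefix: line of environment.yml. It is a hypothetical helper, not part of the repository. It assumes the file has at most one unindented prefix: key, which is the layout conda env export produces. The target path in the example is a placeholder.

```python
import re

def set_conda_prefix(env_yml_text: str, new_prefix: str) -> str:
    """Return env_yml_text with its top-level `prefix:` line pointing at new_prefix."""
    # Only unindented `prefix:` lines match, so nested keys are left alone.
    pattern = re.compile(r"^prefix:.*$", flags=re.MULTILINE)
    replacement = f"prefix: {new_prefix}"
    if pattern.search(env_yml_text):
        # A function replacement avoids backslash escapes in Windows-style paths.
        return pattern.sub(lambda _m: replacement, env_yml_text, count=1)
    # No prefix line yet: append one, keeping the file newline-terminated.
    if not env_yml_text.endswith("\n"):
        env_yml_text += "\n"
    return env_yml_text + replacement + "\n"

# Placeholder path: point it at your own conda envs directory.
print(set_conda_prefix("name: myenv\nprefix: /old/path\n", "/home/user/miniconda3/envs/myenv"))
```

To apply it, read environment.yml, pass its text through set_conda_prefix, and write the result back.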
Build the Python environment with the following command
conda env create -f environment.yml
Download Llama-2-7b-chat-hf from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
Move the downloaded Llama-2-7b-chat-hf to the path base_models/llama2-hf-chat-7b
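A quick check on the model directory can catch an incomplete download before fine-tuning starts. The sketch below is a hypothetical helper, not part of the repository. It assumes the usual Hugging Face checkpoint layout: a config.json plus one or more *.safetensors or *.bin weight files.

```python
from pathlib import Path

def check_model_dir(model_dir):
    """Return a list of problems found in a Hugging Face-style checkpoint directory."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # The model configuration that from_pretrained-style loaders read.
    if not (root / "config.json").is_file():
        problems.append("config.json not found")
    # Weights are usually sharded into *.safetensors and/or *.bin files.
    weights = list(root.glob("*.safetensors")) + list(root.glob("*.bin"))
    if not weights:
        problems.append("no *.safetensors or *.bin weight files found")
    return problems

for problem in check_model_dir("base_models/llama2-hf-chat-7b"):
    print("warning:", problem)
```

The same check applies to a fine-tuned model directory under finetuned_models/.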
Move the OfflineRL dataset to the path data/${offlinerl_dataset_name}. We provide two toy datasets for testing: data/clevr_robot.npy and data/meta_world.npy
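This README does not document the internal layout of the toy .npy datasets. The sketch below is a hypothetical helper named describe_npy. It reports only what np.load returns: the shape, the dtype, and, when the file wraps a single pickled dictionary, that dictionary's keys. Whether the toy files store a dictionary is an assumption you should confirm.

```python
import numpy as np

def describe_npy(path):
    """Summarize a .npy file without assuming its layout."""
    # allow_pickle=True is required when the file stores Python objects (e.g. a dict).
    # Only enable it for trusted files, because unpickling can execute arbitrary code.
    arr = np.load(path, allow_pickle=True)
    summary = {"shape": arr.shape, "dtype": str(arr.dtype)}
    # A 0-d object array usually wraps one pickled object, such as a dict.
    if arr.dtype == object and arr.shape == ():
        obj = arr.item()
        summary["wrapped_type"] = type(obj).__name__
        if isinstance(obj, dict):
            summary["keys"] = sorted(map(str, obj.keys()))
    return summary
```

For example, describe_npy("data/clevr_robot.npy") returns a small dictionary that you can print before launching training.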
Update the num_processes parameter in config/ds_clevr.yaml and config/ds_meta.yaml to the number of GPUs you want to use
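To set num_processes without editing the YAML files by hand, you can use the sketch below. It rewrites a top-level num_processes: line in the config text. It assumes that entry is unindented and appears exactly once. The helper name set_num_processes is hypothetical, and the GPU count is whatever you pass in.

```python
import re

def set_num_processes(config_text: str, num_gpus: int) -> str:
    """Rewrite the top-level `num_processes:` entry of a config file's text."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    pattern = re.compile(r"^num_processes:.*$", flags=re.MULTILINE)
    new_text, n = pattern.subn(f"num_processes: {num_gpus}", config_text, count=1)
    if n == 0:
        raise KeyError("no top-level num_processes entry found")
    return new_text
```

For example, read config/ds_clevr.yaml, call set_num_processes(text, 4) for four GPUs, and write the result back. A plain-text substitution keeps comments and key order intact. A YAML load/dump round trip may not preserve them.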
Update the paths and CUDA_VISIBLE_DEVICES in scripts/train_clevr.sh and scripts/train_meta.sh
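The num_processes instruction implies one training process per GPU. Under that assumption, the number of device IDs listed in CUDA_VISIBLE_DEVICES should equal num_processes. The hypothetical helpers below perform that comparison. They treat the variable as a plain comma-separated list and do not model every CUDA parsing rule.

```python
def count_visible_devices(cuda_visible_devices: str) -> int:
    """Count the non-empty comma-separated entries in a CUDA_VISIBLE_DEVICES value."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])

def check_gpu_settings(cuda_visible_devices: str, num_processes: int) -> None:
    """Raise ValueError if the visible device count differs from num_processes."""
    visible = count_visible_devices(cuda_visible_devices)
    if visible != num_processes:
        raise ValueError(
            f"CUDA_VISIBLE_DEVICES lists {visible} device(s) "
            f"but num_processes is {num_processes}"
        )
```

For example, check_gpu_settings("0,1,2,3", 4) passes silently, while check_gpu_settings("0,1", 4) raises a ValueError.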
Fine-tune the LLM
bash scripts/train_clevr.sh
bash scripts/train_meta.sh
Move the fine-tuned LLM to the path finetuned_models/${model_name}
Generate rollouts with the fine-tuned LLM
python3 src/clevr_generate.py --model_path ${model_path} --prompt_path ${prompt_path} --output_path ${output_path} --level ${level}
python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level ${level}
We provide one toy instruction prompt dataset for testing (generation on Meta-World does not require a prompt dataset): data/clevr_rephrase_prompt.npy
python3 src/clevr_generate.py --model_path ${model_path} --prompt_path data/clevr_rephrase_prompt.npy --output_path ${output_path} --level rephrase_level
python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level rephrase_level
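On the command line, src/clevr_generate.py takes --prompt_path while src/meta_generate.py does not. The hypothetical helper below builds either argument list from the flags shown above. The model and output paths in the example are placeholders. rephrase_level is the only --level value this README names.

```python
import shlex

def generate_command(env, model_path, output_path, level, prompt_path=None):
    """Build the argv list for src/clevr_generate.py ("clevr") or src/meta_generate.py ("meta")."""
    if env == "clevr":
        if prompt_path is None:
            raise ValueError("CLEVR generation needs --prompt_path")
        return ["python3", "src/clevr_generate.py", "--model_path", model_path,
                "--prompt_path", prompt_path, "--output_path", output_path,
                "--level", level]
    if env == "meta":
        # Meta-World generation takes no prompt dataset.
        return ["python3", "src/meta_generate.py", "--model_path", model_path,
                "--output_path", output_path, "--level", level]
    raise ValueError(f"unknown env: {env!r}")

cmd = generate_command("clevr", "finetuned_models/my_model", "data/my_rollouts",
                       "rephrase_level", prompt_path="data/clevr_rephrase_prompt.npy")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list to subprocess.run(cmd, check=True) runs the script. It raises CalledProcessError if the script exits with a non-zero status.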
Move the imaginary dataset to the path data/${imaginary_dataset_name}
Train the OfflineRL policy with the OfflineRL dataset and the imaginary dataset
python3 src/clevr_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}
python3 src/meta_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}
We provide two toy OfflineRL datasets for testing: data/clevr_robot.hdf5 and data/meta_world.hdf5
python3 src/clevr_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/clevr_robot.hdf5 --device ${device} --seed ${seed}
python3 src/meta_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/meta_world.hdf5 --device ${device} --seed ${seed}
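To train the same configuration with several seeds, the sketch below builds one training command per seed. It uses only the flags shown above and is a hypothetical helper, not part of the repository. The agent name my_agent and the device string cuda:0 are placeholders, because the README does not list valid values for --agent_name or --device.

```python
import shlex

def offline_train_commands(env, ds_type, agent_name, dataset_path, device, seeds):
    """Build one src/{env}_offline_train.py argv list per seed; env is "clevr" or "meta"."""
    if env not in ("clevr", "meta"):
        raise ValueError(f"unknown env: {env!r}")
    script = f"src/{env}_offline_train.py"
    return [
        ["python3", script, "--ds_type", ds_type, "--agent_name", agent_name,
         "--dataset_path", dataset_path, "--device", device, "--seed", str(seed)]
        for seed in seeds
    ]

# Placeholder agent name and device; substitute values the training scripts accept.
for argv in offline_train_commands("clevr", "rephrase_level", "my_agent",
                                   "data/clevr_robot.hdf5", "cuda:0", seeds=[0, 1, 2]):
    print(shlex.join(argv))
```

Each printed line is a complete command. Alternatively, pass each argv list to subprocess.run(argv, check=True) to run the seeds one after another.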