- Update the `prefix` parameter in `environment.yml`
- Build the Python environment with the following command:
  `conda env create -f environment.yml`
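The `prefix` line tells conda where to create the environment, so it should point to a location inside your own conda installation. Below is a minimal sketch for setting it from the shell. It assumes `environment.yml` contains a single top-level `prefix:` line, which is the form `conda env export` writes. The function name `set_conda_prefix` is hypothetical.

```bash
# Hypothetical helper: rewrite the top-level "prefix:" line of a conda environment file.
# Assumes the new path contains no '|' or '&' (both are special in the sed command below).
set_conda_prefix() {
  local file="$1" new_prefix="$2" tmp
  if ! grep -q '^prefix:' "$file"; then
    echo "no top-level prefix: line in $file" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Replace the whole line; '|' is the sed delimiter, so '/' in the path needs no escaping.
  sed "s|^prefix:.*|prefix: $new_prefix|" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat (not mv) so the original file's permissions are kept.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, run `set_conda_prefix environment.yml "$HOME/miniconda3/envs/<env_name>"` before `conda env create -f environment.yml`. The path in this example is illustrative only.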
- Download Llama-2-7b-chat-hf from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
- Move the downloaded Llama-2-7b-chat-hf to `base_models/llama2-hf-chat-7b`
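Before moving on, you can confirm that the copied directory is a complete `transformers` checkpoint. Below is a minimal sketch. It assumes the checkpoint contains `config.json` and `tokenizer_config.json`, and both files are part of the Llama-2-7b-chat-hf repository. The name `check_model_dir` is hypothetical.

The commented `huggingface-cli download` line is one alternative to downloading the model by hand. The repository is gated, so you must first accept Meta's license on the model page and log in with `huggingface-cli login`.

```bash
# Hypothetical helper: fail if a directory lacks the config and tokenizer files
# expected in a transformers checkpoint.
check_model_dir() {
  local dir="$1" f
  for f in config.json tokenizer_config.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
}

# Optional: download straight into the expected path instead of moving it afterwards.
#   huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir base_models/llama2-hf-chat-7b
# Then:
#   check_model_dir base_models/llama2-hf-chat-7b && echo "model directory looks complete"
```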
- Move the OfflineRL dataset to `data/${offlinerl_dataset_name}`. We provide two toy datasets for testing: `data/clevr_robot.npy` and `data/meta_world.npy`
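Files written by NumPy's `.npy` format start with the magic string `\x93NUMPY`. Reading those bytes is therefore a cheap way to check that the toy datasets arrived intact and are not, for example, an HTML error page from a failed download. Below is a minimal sketch. The name `is_npy` is hypothetical.

```bash
# Hypothetical helper: succeed only if FILE starts with the .npy magic string \x93NUMPY.
is_npy() {
  [ -f "$1" ] || return 1
  # Skip the leading 0x93 byte and compare the next five bytes with "NUMPY".
  [ "$(dd if="$1" bs=1 skip=1 count=5 2>/dev/null)" = "NUMPY" ]
}

# Usage:
#   for f in data/clevr_robot.npy data/meta_world.npy; do
#     if is_npy "$f"; then echo "ok: $f"; else echo "not an .npy file: $f"; fi
#   done
```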
- Update the `num_processes` parameter in `config/ds_clevr.yaml` and `config/ds_meta.yaml` to the number of GPUs you want to use
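When you edit both configs by hand, they can easily drift out of sync. Below is a minimal sketch that sets the value in place.

It assumes each file has a top-level line of the form `num_processes: <int>`. That is the layout of Hugging Face Accelerate config files, and treating these two files as Accelerate configs is itself an assumption. The name `set_num_processes` is hypothetical.

```bash
# Hypothetical helper: set the top-level "num_processes:" value in a YAML config.
set_num_processes() {
  local file="$1" n="$2" tmp
  # Accept only positive integers.
  case "$n" in
    ''|*[!0-9]*|0) echo "num_processes must be a positive integer, got '$n'" >&2; return 1 ;;
  esac
  if ! grep -q '^num_processes:' "$file"; then
    echo "no top-level num_processes: line in $file" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  sed "s/^num_processes:.*/num_processes: $n/" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (4 GPUs):
#   for cfg in config/ds_clevr.yaml config/ds_meta.yaml; do set_num_processes "$cfg" 4; done
```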
- Update the paths and `CUDA_VISIBLE_DEVICES` in `scripts/train_clevr.sh` and `scripts/train_meta.sh`
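The GPU list in `CUDA_VISIBLE_DEVICES` and the `num_processes` value in the matching config describe the same thing, so a quick consistency check before launching can prevent a failed run. Below is a minimal sketch.

It makes two assumptions: one process per GPU, and a top-level `num_processes: <int>` line in each config. Both function names are hypothetical.

```bash
# Hypothetical helper: count GPUs in a CUDA_VISIBLE_DEVICES-style string, e.g. "0,1,3" -> 3.
count_visible_gpus() {
  # Drop spaces, put one entry per line, and count the non-empty lines.
  printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' | grep -c .
}

# Hypothetical helper: fail if num_processes in CONFIG differs from the GPU count.
check_gpu_config() {
  local want got
  want=$(count_visible_gpus "$1")
  got=$(sed -n 's/^num_processes:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$2")
  if [ "$want" != "$got" ]; then
    echo "$2: num_processes=$got but CUDA_VISIBLE_DEVICES lists $want GPU(s)" >&2
    return 1
  fi
}
```

For example, `check_gpu_config "0,1,2,3" config/ds_clevr.yaml` prints a message and returns non-zero if the config still lists a different number of processes.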
- Fine-tune the LLM:
  - CLEVR-Robot: `bash scripts/train_clevr.sh`
  - Meta-World: `bash scripts/train_meta.sh`
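If you run both benchmarks, a small wrapper keeps the benchmark-to-script mapping in one place and writes each run's output to a log file. Below is a minimal sketch.

The function names, the `logs/` directory, and the `DRY_RUN` variable are all hypothetical additions. Only the two script paths come from this README.

```bash
# Hypothetical helper: map a benchmark name to its fine-tuning script.
train_script_for() {
  case "$1" in
    clevr) echo "scripts/train_clevr.sh" ;;
    meta)  echo "scripts/train_meta.sh" ;;
    *) echo "unknown benchmark: $1 (expected clevr or meta)" >&2; return 1 ;;
  esac
}

# Hypothetical helper: run fine-tuning for one benchmark, logging to logs/finetune_<bench>.log.
# With DRY_RUN=1 it only prints the command it would run.
finetune() {
  local script
  script=$(train_script_for "$1") || return 1
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "bash $script"
    return 0
  fi
  mkdir -p logs
  bash "$script" > "logs/finetune_$1.log" 2>&1
}
```

For example, `finetune clevr && finetune meta` runs the two scripts one after the other and stops at the first failure.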
- Move the fine-tuned LLM to `finetuned_models/${model_name}`
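The training scripts write their checkpoint to whatever output path you set in `scripts/train_*.sh`, and that path is the source directory for this step. Below is a minimal sketch of the move that refuses to overwrite an existing model of the same name. The name `install_finetuned_model` is hypothetical.

```bash
# Hypothetical helper: move a checkpoint directory to finetuned_models/<model_name>
# and print the destination path.
install_finetuned_model() {
  local src="$1" name="$2" dest
  dest="finetuned_models/$name"
  if [ ! -d "$src" ]; then
    echo "not a directory: $src" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "already exists, not overwriting: $dest" >&2
    return 1
  fi
  mkdir -p finetuned_models && mv "$src" "$dest" && echo "$dest"
}

# Usage:
#   model_path=$(install_finetuned_model /path/to/training/output "$model_name")
```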
- Generate rollouts with the fine-tuned LLM:
  - CLEVR-Robot: `python3 src/clevr_generate.py --model_path ${model_path} --prompt_path ${prompt_path} --output_path ${output_path} --level ${level}`
  - Meta-World: `python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level ${level}`
  - We provide one toy instruction prompt dataset for testing, `data/clevr_rephrase_prompt.npy` (generation on Meta-World does not need a dataset):
    - CLEVR-Robot: `python3 src/clevr_generate.py --model_path ${model_path} --prompt_path data/clevr_rephrase_prompt.npy --output_path ${output_path} --level rephrase_level`
    - Meta-World: `python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level rephrase_level`
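The two generation scripts take the same flags, except that only CLEVR-Robot needs `--prompt_path`. Below is a minimal sketch that builds the correct command line for each benchmark and prints it for review. The name `gen_cmd` is hypothetical, and the printed command assumes paths without spaces.

```bash
# Hypothetical helper: print the rollout-generation command for one benchmark.
# Arguments: benchmark (clevr|meta), model path, output path, level, [prompt path].
gen_cmd() {
  local bench="$1" model_path="$2" output_path="$3" level="$4" prompt_path="${5:-}"
  case "$bench" in
    clevr)
      if [ -z "$prompt_path" ]; then
        echo "CLEVR-Robot generation needs a prompt path" >&2
        return 1
      fi
      echo "python3 src/clevr_generate.py --model_path $model_path --prompt_path $prompt_path --output_path $output_path --level $level" ;;
    meta)
      echo "python3 src/meta_generate.py --model_path $model_path --output_path $output_path --level $level" ;;
    *) echo "unknown benchmark: $bench (expected clevr or meta)" >&2; return 1 ;;
  esac
}
```

For example, `gen_cmd clevr "$model_path" "$output_path" rephrase_level data/clevr_rephrase_prompt.npy | sh` runs the toy CLEVR-Robot generation.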
- Move the imaginary dataset to `data/${imaginary_dataset_name}`
- Train the OfflineRL policy with the OfflineRL dataset and the imaginary dataset:
  - CLEVR-Robot: `python3 src/clevr_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}`
  - Meta-World: `python3 src/meta_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}`
  - We provide two toy OfflineRL datasets for testing, `data/clevr_robot.hdf5` and `data/meta_world.hdf5`:
    - CLEVR-Robot: `python3 src/clevr_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/clevr_robot.hdf5 --device ${device} --seed ${seed}`
    - Meta-World: `python3 src/meta_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/meta_world.hdf5 --device ${device} --seed ${seed}`
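Offline RL results are usually compared across several seeds. Below is a minimal sketch that checks the dataset file looks like HDF5 and then prints one training command per seed. The names `is_hdf5` and `offline_sweep` are hypothetical, and seeds are assumed to be the integers `0..n-1`.

The HDF5 check only looks for the `HDF` bytes of the 8-byte signature at offset 0. Files with an HDF5 user block place the signature later and would be rejected.

```bash
# Hypothetical helper: succeed if FILE has the "HDF" part of the HDF5 signature
# (\211HDF\r\n\032\n) at offset 0.
is_hdf5() {
  [ -f "$1" ] || return 1
  [ "$(dd if="$1" bs=1 skip=1 count=3 2>/dev/null)" = "HDF" ]
}

# Hypothetical helper: print one offline-training command per seed (0..n_seeds-1).
# Arguments: benchmark (clevr|meta), ds_type, agent name, dataset path, device, n_seeds.
offline_sweep() {
  local bench="$1" ds_type="$2" agent="$3" dataset="$4" device="$5" n_seeds="$6"
  local script seed
  case "$bench" in
    clevr) script=src/clevr_offline_train.py ;;
    meta)  script=src/meta_offline_train.py ;;
    *) echo "unknown benchmark: $bench (expected clevr or meta)" >&2; return 1 ;;
  esac
  if ! is_hdf5 "$dataset"; then
    echo "not an HDF5 file: $dataset" >&2
    return 1
  fi
  seed=0
  while [ "$seed" -lt "$n_seeds" ]; do
    echo "python3 $script --ds_type $ds_type --agent_name $agent --dataset_path $dataset --device $device --seed $seed"
    seed=$((seed + 1))
  done
}
```

For example, `offline_sweep clevr rephrase_level "$agent_name" data/clevr_robot.hdf5 "$device" 3 | sh` runs the toy CLEVR-Robot training for seeds 0, 1 and 2 in sequence.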