Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

Python Environment Configuration

Update the prefix parameter in environment.yml so that it points to the directory where the conda environment should be created
Build the Python environment with the following command:

conda env create -f environment.yml
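
The prefix update can also be scripted. The sketch below is not part of the repository: the helper name set_conda_prefix and the example path are hypothetical, and it assumes environment.yml keeps prefix as a top-level, unindented key, which is how conda exports it.

```python
import re


def set_conda_prefix(yaml_text: str, new_prefix: str) -> str:
    """Return yaml_text with its top-level `prefix:` entry set to new_prefix.

    Hypothetical helper for illustration; it edits the text directly and
    does not parse the YAML.
    """
    line = f"prefix: {new_prefix}"
    # Match an unindented `prefix:` entry occupying a whole line.
    pattern = re.compile(r"^prefix:.*$", flags=re.MULTILINE)
    if pattern.search(yaml_text):
        # A function replacement avoids backslash escapes in new_prefix
        # being interpreted by the regex engine.
        return pattern.sub(lambda _match: line, yaml_text, count=1)
    # No prefix entry yet: append one and keep a trailing newline.
    return yaml_text.rstrip("\n") + "\n" + line + "\n"


# Example usage (the target path is a placeholder, not a repository default):
#   from pathlib import Path
#   env_file = Path("environment.yml")
#   env_file.write_text(set_conda_prefix(env_file.read_text(), "/path/to/envs/my_env"))
```

Run conda env create -f environment.yml after the file has been rewritten.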

LLM Grounding

Download Llama-2-7b-chat-hf from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
Move the downloaded Llama-2-7b-chat-hf to base_models/llama2-hf-chat-7b
Move the OfflineRL dataset to data/${offlinerl_dataset_name}. Two toy datasets are provided for testing: data/clevr_robot.npy and data/meta_world.npy
In config/ds_clevr.yaml and config/ds_meta.yaml, set the num_processes parameter to the number of GPUs you want to use
Update the paths and CUDA_VISIBLE_DEVICES in scripts/train_clevr.sh and scripts/train_meta.sh
Fine-tune the LLM

CLEVR-Robot

bash scripts/train_clevr.sh

Meta-World

bash scripts/train_meta.sh
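
Before launching either training script, it can help to confirm that the files and directories named in the steps above are in place. The following is a minimal sketch, not repository code: missing_paths and visible_gpu_count are hypothetical helpers, and the idea that num_processes should equal the number of devices listed in CUDA_VISIBLE_DEVICES is an assumption drawn from the setup steps rather than a documented requirement.

```python
from pathlib import Path

# Paths named in the setup steps, relative to the repository root.
REQUIRED_PATHS = [
    "base_models/llama2-hf-chat-7b",
    "config/ds_clevr.yaml",
    "config/ds_meta.yaml",
    "scripts/train_clevr.sh",
    "scripts/train_meta.sh",
]


def missing_paths(repo_root, extra=()):
    """Return the required paths (plus any extras) that do not exist yet."""
    root = Path(repo_root)
    return [p for p in [*REQUIRED_PATHS, *extra] if not (root / p).exists()]


def visible_gpu_count(cuda_visible_devices: str) -> int:
    """Count the device entries in a CUDA_VISIBLE_DEVICES string such as "0,1"."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


# Example usage from the repository root:
#   print(missing_paths(".", extra=["data/clevr_robot.npy"]))
#   print(visible_gpu_count("0,1,2,3"))  # compare with num_processes by hand
```

An empty list from missing_paths means every listed path exists; it does not verify the contents of the model directory or the configuration files.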

Rollout Generation

Move the fine-tuned LLM to path finetuned_models/${model_name}
Generate rollouts with the fine-tuned LLM

CLEVR-Robot

python3 src/clevr_generate.py --model_path ${model_path} --prompt_path ${prompt_path} --output_path ${output_path} --level ${level}

Meta-World

python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level ${level}

One toy instruction prompt dataset is provided for testing: data/clevr_rephrase_prompt.npy. Generation on Meta-World does not need a prompt dataset.

CLEVR-Robot

python3 src/clevr_generate.py --model_path ${model_path} --prompt_path data/clevr_rephrase_prompt.npy --output_path ${output_path} --level rephrase_level

Meta-World

python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level rephrase_level
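
When generation is driven from another script, the two commands can be assembled as argument lists. This is a sketch under stated assumptions: build_generate_command is a hypothetical helper, the flags are the ones shown in the commands above, and the example model and output paths are placeholders.

```python
import shlex

# Generation scripts named in this section, keyed by a short environment name.
GENERATE_SCRIPTS = {
    "clevr": "src/clevr_generate.py",
    "meta": "src/meta_generate.py",
}


def build_generate_command(env, model_path, output_path, level, prompt_path=None):
    """Build the argument list for one rollout-generation run.

    Only CLEVR-Robot takes --prompt_path; Meta-World generation needs no
    prompt dataset.
    """
    if env not in GENERATE_SCRIPTS:
        raise ValueError(f"unknown environment: {env!r}")
    if env == "clevr" and prompt_path is None:
        raise ValueError("CLEVR-Robot generation needs a prompt_path")
    cmd = ["python3", GENERATE_SCRIPTS[env], "--model_path", str(model_path)]
    if env == "clevr":
        cmd += ["--prompt_path", str(prompt_path)]
    cmd += ["--output_path", str(output_path), "--level", level]
    return cmd


# Example usage (model and output paths are placeholders):
#   import subprocess
#   cmd = build_generate_command(
#       "clevr", "finetuned_models/my_model", "outputs/clevr_rollouts",
#       "rephrase_level", prompt_path="data/clevr_rephrase_prompt.npy")
#   print(shlex.join(cmd))          # inspect the command first
#   subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when a path contains spaces.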

OfflineRL Training

Move the imaginary dataset (the rollouts generated in the previous step) to data/${imaginary_dataset_name}
Train the OfflineRL policy with the OfflineRL dataset and the imaginary dataset

CLEVR-Robot

python3 src/clevr_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}

Meta-World

python3 src/meta_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}

Two toy OfflineRL datasets are provided for testing: data/clevr_robot.hdf5 and data/meta_world.hdf5

CLEVR-Robot

python3 src/clevr_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/clevr_robot.hdf5 --device ${device} --seed ${seed}

Meta-World

python3 src/meta_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/meta_world.hdf5 --device ${device} --seed ${seed}
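
Since each training run takes a --seed argument, results are typically compared across several seeds. The sketch below builds one command per seed; build_train_commands is a hypothetical helper, the flags are those shown above, and the agent name and device string in the usage comment are placeholders whose accepted values depend on the training scripts.

```python
# Training scripts named in this section, keyed by a short environment name.
TRAIN_SCRIPTS = {
    "clevr": "src/clevr_offline_train.py",
    "meta": "src/meta_offline_train.py",
}


def build_train_commands(env, ds_type, agent_name, dataset_path, device, seeds):
    """Return one argument list per seed for the chosen environment."""
    if env not in TRAIN_SCRIPTS:
        raise ValueError(f"unknown environment: {env!r}")
    return [
        [
            "python3", TRAIN_SCRIPTS[env],
            "--ds_type", ds_type,
            "--agent_name", agent_name,
            "--dataset_path", str(dataset_path),
            "--device", str(device),
            "--seed", str(seed),
        ]
        for seed in seeds
    ]


# Example usage (agent name and device are placeholders):
#   import subprocess
#   for cmd in build_train_commands("meta", "rephrase_level", "<agent_name>",
#                                   "data/meta_world.hdf5", "<device>", range(3)):
#       subprocess.run(cmd, check=True)  # runs sequentially, one seed at a time
```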
