Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
[arxiv]. Accepted at NeurIPS 2021.
This codebase includes inference-based off-policy algorithms, covering both KL control (SAC) and EM control (MPO, AWR, AWAC) methods.
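For intuition, the two families differ mainly in how the actor is updated: KL control methods (SAC) maximize an entropy-regularized Q-value through reparameterized action samples, while EM control methods (MPO, AWR, AWAC) fit the policy to advantage-weighted actions via supervised regression. Below is a minimal PyTorch sketch of the two actor losses; the names (`policy`, `q_fn`, `advantage`, `alpha`, `beta`) are illustrative assumptions, not this repository's actual API.

```python
import torch

def kl_control_actor_loss(policy, q_fn, states, alpha):
    # KL control (SAC-style): maximize Q(s, a) - alpha * log pi(a|s)
    dist = policy(states)                       # pi(.|s) as a torch distribution
    actions = dist.rsample()                    # reparameterized sample
    log_prob = dist.log_prob(actions).sum(-1)
    return (alpha * log_prob - q_fn(states, actions)).mean()

def em_control_actor_loss(policy, states, actions, advantage, beta):
    # EM control (AWR/AWAC-style): advantage-weighted maximum likelihood
    weights = torch.exp(advantage / beta).detach()   # exponentiated advantages
    log_prob = policy(states).log_prob(actions).sum(-1)
    return -(weights * log_prob).mean()
```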
If you use this codebase for your research, please cite the paper:
@inproceedings{furuta2021inference,
  title={Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning},
  author={Hiroki Furuta and Tadashi Kozuno and Tatsuya Matsushima and Yutaka Matsuo and Shixiang Shane Gu},
  booktitle={Advances in Neural Information Processing Systems},
  year={2021}
}
We recommend using Docker; see the README for setup instructions.
See the examples below for details.
python train_sac.py exp=HalfCheetah-v2 seed=0 gpu=0
python train_mpo.py exp=HalfCheetah-v2 seed=0 gpu=0
python train_awr.py exp=HalfCheetah-v2 seed=0 gpu=0
python train_awac.py exp=HalfCheetah-v2 seed=0 gpu=0
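To sweep over seeds or environments, the same overrides can be scripted. A minimal sketch, assuming the training scripts accept the key=value overrides exactly as shown above:

```python
import itertools
import subprocess

# Run SAC over a few environments and seeds by forwarding the overrides above.
for env, seed in itertools.product(["HalfCheetah-v2", "Ant-v2"], [0, 1, 2]):
    subprocess.run(
        ["python", "train_sac.py", f"exp={env}", f"seed={seed}", "gpu=0"],
        check=True,
    )
```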
For the ablation experiments (ELU or LayerNorm), use the following commands:
python train_sac2.py gpu=0 seed=0 env=Ant-v2 actor.nn_size=256 critic.nn_size=256 agent.architecture='nn2' agent.activation='elu' agent.use_layer_norm=False
python train_sac2.py gpu=0 seed=0 env=Ant-v2 actor.nn_size=256 critic.nn_size=256 agent.architecture='nn2' agent.activation='relu' agent.use_layer_norm=True
python train_sac2.py gpu=0 seed=0 env=Ant-v2 actor.nn_size=256 critic.nn_size=256 agent.architecture='nn2' agent.activation='elu' agent.use_layer_norm=True
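For reference, here is a minimal sketch of what the `agent.activation` and `agent.use_layer_norm` overrides are assumed to toggle: the hidden activation (ReLU vs. ELU) and an optional LayerNorm after each hidden layer of the actor/critic MLP. This is illustrative only, not the repository's exact network code.

```python
import torch.nn as nn

def make_mlp(in_dim, out_dim, nn_size=256, activation="elu", use_layer_norm=True):
    # Hidden activation and optional LayerNorm, as toggled by the ablation flags.
    act = nn.ELU if activation == "elu" else nn.ReLU
    layers = []
    for dim in (in_dim, nn_size):              # two hidden layers (an assumption about 'nn2')
        layers.append(nn.Linear(dim, nn_size))
        if use_layer_norm:
            layers.append(nn.LayerNorm(nn_size))
        layers.append(act())
    layers.append(nn.Linear(nn_size, out_dim))
    return nn.Sequential(*layers)
```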
For MPO w/o ELU and LayerNorm:
python train_mpo2.py gpu=0 seed=0 exp=Ant-v2
This codebase is based on PFRL.