Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

[arxiv], Accepted at NeurIPS 2021.

This codebase implements inference-based off-policy algorithms, covering both KL control (SAC) and EM control (MPO, AWR, AWAC) methods.

If you use this codebase for your research, please cite the paper:

@inproceedings{furuta2021inference,
  title={Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning},
  author={Hiroki Furuta and Tadashi Kozuno and Tatsuya Matsushima and Yutaka Matsuo and Shixiang Shane Gu},
  booktitle = {Advances in Neural Information Processing Systems},
  year={2021}
}

Dependencies

We recommend using Docker. See README for setup instructions.

Examples

See examples for details.

python train_sac.py exp=HalfCheetah-v2 seed=0 gpu=0
python train_mpo.py exp=HalfCheetah-v2 seed=0 gpu=0
python train_awr.py exp=HalfCheetah-v2 seed=0 gpu=0
python train_awac.py exp=HalfCheetah-v2 seed=0 gpu=0
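
To repeat the four commands above over several seeds, a small wrapper can help. The following is a minimal sketch, not part of this codebase: the `sweep` function and the `DRY_RUN` variable are hypothetical names introduced for illustration. It assumes the `train_<algo>.py` scripts accept the `exp=`, `seed=`, and `gpu=` arguments exactly as shown above. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): emit one training
# command per algorithm and seed. With DRY_RUN=1 (the default) the commands
# are only printed; set DRY_RUN=0 to actually execute them.
sweep() {
  sweep_env="$1"
  shift
  for sweep_algo in sac mpo awr awac; do
    for sweep_seed in "$@"; do
      sweep_cmd="python train_${sweep_algo}.py exp=${sweep_env} seed=${sweep_seed} gpu=0"
      if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$sweep_cmd"
      else
        # Intentionally unquoted so the command string splits into words.
        $sweep_cmd
      fi
    done
  done
}

# Print the commands for three seeds on HalfCheetah-v2.
sweep HalfCheetah-v2 0 1 2
```

Running with `DRY_RUN=0` launches the jobs sequentially on GPU 0; check the printed commands first.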

For the ablation experiments (ELU or LayerNorm), use the following commands:

python train_sac2.py gpu=0 seed=0 env=Ant-v2 actor.nn_size=256 critic.nn_size=256 agent.architecture='nn2' agent.activation='elu' agent.use_layer_norm=False

python train_sac2.py gpu=0 seed=0 env=Ant-v2 actor.nn_size=256 critic.nn_size=256 agent.architecture='nn2' agent.activation='relu' agent.use_layer_norm=True

python train_sac2.py gpu=0 seed=0 env=Ant-v2 actor.nn_size=256 critic.nn_size=256 agent.architecture='nn2' agent.activation='elu' agent.use_layer_norm=True
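
The three ablation settings above differ only in `agent.activation` and `agent.use_layer_norm`, so they can be generated from a short list of pairs. This is a sketch under that assumption; `ablation_cmds` is a hypothetical helper name, not something this repository provides, and it only prints the command lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the three SAC
# ablation commands for a given seed and environment.
# Usage: ablation_cmds <seed> <env>
ablation_cmds() {
  # Each entry is activation:use_layer_norm, matching the commands above.
  for abl_setting in elu:False relu:True elu:True; do
    abl_activation="${abl_setting%%:*}"   # text before the colon
    abl_layer_norm="${abl_setting##*:}"   # text after the colon
    printf '%s\n' "python train_sac2.py gpu=0 seed=$1 env=$2 actor.nn_size=256 critic.nn_size=256 agent.architecture='nn2' agent.activation='${abl_activation}' agent.use_layer_norm=${abl_layer_norm}"
  done
}

# Print the commands for seed 0 on Ant-v2.
ablation_cmds 0 Ant-v2
```

To execute rather than print, the output can be piped to a shell, for example `ablation_cmds 0 Ant-v2 | sh`.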

For MPO without ELU and LayerNorm:

python train_mpo2.py gpu=0 seed=0 exp=Ant-v2

Reference

This codebase is based on PFRL.