Hindsight Experience Replay (HER) is an efficient algorithm for reinforcement learning tasks in sparse-reward environments, but it suffers from poor sample efficiency and slow convergence. Natural gradient methods address these issues by converging the model parameters more reliably and avoiding destructive updates that collapse training performance; the trade-off is expensive computation and therefore longer training time. In this paper we propose "Optimizing HER with Kronecker-factored Approximate Curvature (KFAC)". The proposed method improves sample efficiency and increases the success rate with better convergence.
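For intuition only, here is a minimal NumPy sketch of a Kronecker-factored natural-gradient update for a single fully-connected layer; the variable names, damping constant, and step size are illustrative assumptions and do not come from this repository's implementation.

import numpy as np

# Toy shapes: a fully-connected layer mapping inputs a (dim 16) to pre-activations z (dim 8).
batch, n_in, n_out = 64, 16, 8
a = np.random.randn(batch, n_in)           # layer inputs
g = np.random.randn(batch, n_out)          # gradients w.r.t. pre-activations dL/dz
grad_W = g.T @ a / batch                   # ordinary gradient dL/dW, shape (n_out, n_in)

# KFAC approximates this layer's Fisher block as a Kronecker product
# F ~ A (x) G, with A = E[a a^T] and G = E[g g^T] (damped for invertibility).
damping = 1e-2                             # assumed damping constant
A = a.T @ a / batch + damping * np.eye(n_in)
G = g.T @ g / batch + damping * np.eye(n_out)

# Preconditioning with F^-1 then reduces to two small matrix solves:
# natural_grad = G^-1 grad_W A^-1
natural_grad = np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

lr = 0.25                                  # illustrative step size
W = np.random.randn(n_out, n_in)
W -= lr * natural_grad                     # natural-gradient update

In practice (e.g. in ACKTR-style training), the factors A and G are tracked with running averages and their inverses are refreshed only every few updates, which is what keeps the extra cost of the natural-gradient step manageable.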
Clone the repo and cd into it:
git clone https://github.com/dhuruvapriyan/Optimizing-Hindsight-Experience-Replay-with-Kronecker-factored-Approximate-Curvature.git
cd Optimizing-Hindsight-Experience-Replay-with-Kronecker-factored-Approximate-Curvature
If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases,
pip install tensorflow-gpu # if you have a CUDA-compatible gpu and proper drivers
or
pip install tensorflow
should be sufficient. Refer to the TensorFlow installation guide for more details.
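To confirm TensorFlow is importable, you can print the installed version:
python -c "import tensorflow as tf; print(tf.__version__)"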
Install the baselines package:
pip install -e .
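As a quick sanity check that the package is importable after installation:
python -c "import baselines; print(baselines.__file__)"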
Some of the baselines examples use the MuJoCo (Multi-Joint dynamics with Contact) physics simulator, which is proprietary and requires binaries and a license (a temporary 30-day license can be obtained from www.mujoco.org). Instructions on setting up MuJoCo can be found here.
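Once MuJoCo and the license are in place, a quick way to check the Python binding (upstream baselines uses the mujoco-py package) is:
python -c "import mujoco_py"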
All unit tests in baselines can be run using the pytest runner:
pip install pytest
pytest
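pytest can also run a subset of tests by keyword; for example, to run only tests whose names mention HER (the her keyword is an assumption about how the tests are named):
pytest -k her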
Most of the algorithms in the baselines repo are run as follows:
python -m baselines.run --alg=her --env=<environment_id> [additional arguments]
For instance, to train an agent with HER on the MuJoCo-based FetchReach environment for 20M timesteps:
python -m baselines.run --alg=her --env=FetchReach-v1 --num_timesteps=2e7
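The upstream baselines runner also supports saving and replaying trained policies; assuming this fork keeps those flags unchanged, saving a model during training and then visualizing it would look like the following (the save path is just an example):
python -m baselines.run --alg=her --env=FetchReach-v1 --num_timesteps=2e7 --save_path=~/models/fetchreach_her
python -m baselines.run --alg=her --env=FetchReach-v1 --num_timesteps=0 --load_path=~/models/fetchreach_her --play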