Optimizing Hindsight Experience Replay with Kronecker-factored Approximate Curvature

Abstract:

Hindsight Experience Replay (HER) is an efficient algorithm for solving Reinforcement Learning tasks in sparse-reward environments. However, HER suffers from poor sample efficiency and poor convergence. Natural gradients address these problems by improving the convergence of the model parameters and by avoiding bad actions that collapse training performance. However, natural gradient methods are computationally expensive and therefore increase training time. In this paper we propose a methodology, "Optimizing HER with Kronecker-factored Approximate Curvature (KFAC)". Our proposed method addresses the sample-efficiency problem and increases the success rate with better convergence.
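
To make the hindsight idea concrete, here is a minimal, self-contained sketch of goal relabeling. It is an illustration only and is not taken from this repository: it assumes the simple variant in which the state reached at the end of an episode is substituted as the goal, and the names `Transition`, `sparse_reward`, and `relabel_with_final_goal` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[float, ...]


class Transition(NamedTuple):
    """One step of experience (hypothetical container for this sketch)."""
    state: State
    action: int
    achieved_goal: State  # what the agent actually reached after the action
    goal: State           # what the agent was asked to reach
    reward: float


def sparse_reward(achieved: State, goal: State, tol: float = 1e-6) -> float:
    """Assumed sparse-reward convention: 0.0 on success, -1.0 otherwise."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Copy an episode as if its final achieved state had been the goal."""
    hindsight_goal = episode[-1].achieved_goal
    return [
        t._replace(
            goal=hindsight_goal,
            reward=sparse_reward(t.achieved_goal, hindsight_goal),
        )
        for t in episode
    ]


# A failed two-step episode: the agent wanted (5.0,) but only reached (2.0,).
episode = [
    Transition((0.0,), 1, (1.0,), (5.0,), -1.0),
    Transition((1.0,), 1, (2.0,), (5.0,), -1.0),
]
relabeled = relabel_with_final_goal(episode)
```

In the relabeled copy the last transition counts as a success, so even a failed episode contributes a non-trivial reward signal to the replay buffer.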

Installation

  • Clone the repo and cd into it:

    git clone https://github.com/dhuruvapriyan/Optimizing-Hindsight-Experience-Replay-with-Kronecker-factored-Approximate-Curvature.git
    cd Optimizing-Hindsight-Experience-Replay-with-Kronecker-factored-Approximate-Curvature
  • If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases,

    pip install tensorflow-gpu # if you have a CUDA-compatible gpu and proper drivers

    or

    pip install tensorflow

    should be sufficient. Refer to the TensorFlow installation guide for more details.

  • Install the baselines package:

    pip install -e .
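
After the steps above, a quick way to confirm that Python can locate the installed packages is a small check like the following. This is a sketch using only the standard library; the helper name `check_packages` is hypothetical, and the check only confirms that a package can be found, not that it works.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level package name to whether Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Package names assumed from the installation steps above.
    for name, found in check_packages(["tensorflow", "baselines"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```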

MuJoCo

Some of the baselines examples use the MuJoCo (multi-joint dynamics in contact) physics simulator, which is proprietary and requires binaries and a license (a temporary 30-day license can be obtained from www.mujoco.org). Instructions on setting up MuJoCo can be found here.

Testing the installation

All unit tests in baselines can be run using the pytest runner:

pip install pytest
pytest

Training models

Most of the algorithms in the baselines repo are used as follows:

python -m baselines.run --alg=her --env=<environment_id> [additional arguments]

Example 1. HER+KFAC with MuJoCo FetchReach

For instance, to train HER on the MuJoCo FetchReach-v1 environment for 20M timesteps:

python -m baselines.run --alg=her --env=FetchReach-v1 --num_timesteps=2e7
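
When launching several runs from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown above (`--alg`, `--env`, `--num_timesteps`); the helper name `build_train_command` is hypothetical, and no other flags are assumed.

```python
import shlex
from typing import List


def build_train_command(env_id: str, num_timesteps: str, alg: str = "her") -> List[str]:
    """Build the argument list for a baselines.run training invocation."""
    return [
        "python", "-m", "baselines.run",
        f"--alg={alg}",
        f"--env={env_id}",
        f"--num_timesteps={num_timesteps}",
    ]


cmd = build_train_command("FetchReach-v1", "2e7")
command_line = shlex.join(cmd)  # requires Python 3.8+
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` to start training.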