Ray RLlib is a reinforcement learning (RL) execution toolkit built on the Ray distributed execution framework. See the user documentation and paper.
RLlib includes the following reference algorithms:
- Proximal Policy Optimization (PPO), a proximal variant of Trust Region Policy Optimization (TRPO).
- Asynchronous Advantage Actor-Critic (A3C).
- Deep Q Networks (DQN).
- Ape-X Distributed Prioritized Experience Replay.
- Evolution Strategies, as described in this paper. Our implementation is adapted from here.
These algorithms can be run on any OpenAI Gym MDP, including custom environments written and registered by the user.
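
To make "custom MDP" concrete, here is a minimal sketch of a toy environment that follows the classic Gym `reset`/`step` convention. `CorridorEnv` and `rollout` are hypothetical names invented for this illustration and are not part of RLlib or Gym. The class is written with the standard library only, so it does not subclass `gym.Env` or declare observation and action spaces.

```python
class CorridorEnv:
    """Toy MDP (hypothetical example): walk right along a corridor to the goal."""

    def __init__(self, length=5, max_steps=50):
        self.length = length        # number of cells; the goal is the last one
        self.max_steps = max_steps  # episode is cut off after this many steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right).

        Returns (observation, reward, done, info), the classic Gym step tuple.
        """
        if action not in (0, 1):
            raise ValueError("action must be 0 (left) or 1 (right)")
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        done = reached_goal or self.steps >= self.max_steps
        # Small per-step penalty, positive reward on reaching the goal.
        reward = 1.0 if reached_goal else -0.1
        return self.position, reward, done, {}


def rollout(env, policy):
    """Run one episode with `policy` (a function obs -> action); return total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

For use with Gym itself, such a class would normally subclass `gym.Env`, declare `observation_space` and `action_space`, and be registered under an ID with `gym.envs.registration.register`. Those steps are omitted here to keep the sketch dependency-free.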