Skip to content

Commit

Permalink
Update README.
Browse files Browse the repository at this point in the history
  • Loading branch information
AdamJelley committed Jun 8, 2024
1 parent dbf77a6 commit ad06625
Show file tree
Hide file tree
Showing 3 changed files with 7 additions and 4 deletions.
10 changes: 6 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
# Efficient Offline Reinforcement Learning: The Critic is Critical
# Efficient Offline Reinforcement Learning:</br>The Critic is Critical

[<img src="https://img.shields.io/badge/license-Apache_2.0-blue">](https://github.com/tinkoff-ai/CORL/blob/main/LICENSE)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)

This repository contains the codebase to reproduce the experiments for the paper: Efficient Offline Reinforcement Learning: The Critic is Critical. This paper investigates improving the efficiency and stability of offline off-policy reinforcement learning algorithms (such as TD3+BC and EDAC) through supervised pre-training of both the actor and the critic to match the behavior policy.
This repository contains the codebase for the paper: Efficient Offline Reinforcement Learning: The Critic is Critical. This paper investigates improving the efficiency and stability of offline off-policy reinforcement learning algorithms (such as TD3+BC and EDAC) through supervised pre-training of both the actor and the critic to match the behavior policy.

![MotivationalExample](assets/GitHubFigure.png)

Expand All @@ -18,15 +18,17 @@ This codebase builds on the excellent [CORL](https://github.com/tinkoff-ai/CORL)
> * 📈 Benchmarked Implementation for N algorithms
> * 🖼 [Weights and Biases](https://wandb.ai/site) integration
Note that since this research was undertaken, the CORL codebase has been significantly improved and refactored (for example to include offline-to-online algorithms). This codebase does not include these updates in order to preserve the codebase that was used for our research and to prevent any discrepencies with results in the paper. However, if you are interested in continuing an aspect of this research it should be straightforward (ish!) to merge our changes into the latest CORL codebase. Feel free to raise an issue if you are interested in doing so and we would be happy to help.
Note that since this research was undertaken, the CORL codebase has been significantly improved and refactored (for example to include offline-to-online algorithms). This codebase does not include these updates in order to preserve the codebase that was used for our research and to prevent any discrepencies with results in the paper. However, if you are interested in continuing an aspect of this research it should be straightforward (ish!) to merge our changes into the latest CORL codebase. Feel free to raise an issue if you need any help with doing so.

Please note also that [ReBRAC](https://arxiv.org/abs/2305.09836) (developed concurrently with this research and now included in the updated CORL codebase) contains many similar auxiliary findings for improving the efficiency and stability of offline off-policy reinforcement learning algorithms (such as the use of layer normalization, deeper networks, and decoupled penalization on **both** the actor and critic) and is recommended as a base offline off-policy algorithm for future research. However, ReBRAC does **not** include any form of supervised pre-training to improve efficiency (the core contribution of our work).

## Installation

```bash
git clone [email protected]:AdamJelley/EfficientOfflineRL.git && cd EfficientOfflineRL
pip install -r requirements_dev.txt
conda create -n EORL python=3.10
conda activate EORL
pip install -r requirements/requirements_dev.txt

# alternatively, you could use docker
docker build -t <image_name> .
Expand Down
Binary file modified assets/GitHubFigure.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions requirements/requirements_dev.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ tqdm==4.64.0
wandb==0.12.21
mujoco-py==2.1.2.14
numpy==1.23.1
pandas==2.0.0
gym[mujoco_py,classic_control]==0.23.0
--extra-index-url https://download.pytorch.org/whl/cu113
torch==1.11.0+cu113
Expand Down

0 comments on commit ad06625

Please sign in to comment.