<div align="center">
  <a href="http://www.offline-saferl.org"><img width="300px" height="auto" src="https://github.com/liuzuxin/osrl/raw/main/docs/_static/images/osrl-logo.png"></a>
</div>

<br/>

<div align="center">

<a></a>
[](#license)
[](https://pypi.org/project/osrl-lib)
[](https://github.com/liuzuxin/osrl/stargazers)
[](https://pepy.tech/project/osrl-lib)
<!-- [](https://fsrl.readthedocs.io) -->
<!-- [](https://codecov.io/github/liuzuxin/fsrl)
[](https://github.com/liuzuxin/fsrl/actions/workflows/test.yml) -->
<!-- [](https://app.codecov.io/gh/liuzuxin/fsrl) -->
<!-- [](https://github.com/liuzuxin/fsrl/tree/HEAD/tests) -->

</div>

---

**OSRL (Offline Safe Reinforcement Learning)** offers a collection of elegant and extensible implementations of state-of-the-art offline safe reinforcement learning (RL) algorithms. Aimed at propelling research in offline safe RL, OSRL serves as a solid foundation to implement, benchmark, and iterate on safe RL solutions.

The OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes [DSRL](https://github.com/liuzuxin/DSRL) and [FSRL](https://github.com/liuzuxin/FSRL), and is built to facilitate the development of robust and reliable offline safe RL solutions.

To learn more, please visit our [project website](http://www.offline-saferl.org).

## Structure
The structure of this repo is as follows:
```
├── examples
│   ├── configs      # the training configs of each algorithm
│   ├── eval         # the evaluation scripts
│   ├── train        # the training scripts
├── osrl
│   ├── algorithms   # offline safe RL algorithms
│   ├── common       # base networks and utils
```

The implemented offline safe RL and imitation learning algorithms include:

| Algorithm   | Type                               | Description |
|:-----------:|:----------------------------------:|:------------|
| BCQ-Lag     | Q-learning                         | [BCQ](https://arxiv.org/pdf/1812.02900.pdf) with [PID Lagrangian](https://arxiv.org/abs/2007.03964) |
| BEAR-Lag    | Q-learning                         | [BEAR](https://arxiv.org/abs/1906.00949) with [PID Lagrangian](https://arxiv.org/abs/2007.03964) |
| CPQ         | Q-learning                         | [Constraints Penalized Q-learning (CPQ)](https://arxiv.org/abs/2107.09003) |
| COptiDICE   | Distribution Correction Estimation | [Offline Constrained Policy Optimization via stationary DIstribution Correction Estimation](https://arxiv.org/abs/2204.08957) |
| CDT         | Sequential Modeling                | [Constrained Decision Transformer](https://arxiv.org/abs/2302.07351) |
| BC-All      | Imitation Learning                 | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with all datasets |
| BC-Safe     | Imitation Learning                 | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with safe trajectories |
| BC-Frontier | Imitation Learning                 | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with high-reward trajectories |

## Installation

OSRL is currently hosted on [PyPI](https://pypi.org/project/osrl-lib). You can simply install it with:

```bash
pip install osrl-lib
```

You can also pull the repo and install it from source:
```bash
git clone https://github.com/liuzuxin/OSRL.git
cd OSRL
pip install -e .
```

If you want to use the `CDT` algorithm, please also manually install `OApackage`:
```bash
pip install OApackage==2.7.6
```

## How to use OSRL

The example usage is in the `examples` folder, where you can find the training and evaluation scripts for all the algorithms.
All the parameters and their default configs for each algorithm are available in the `examples/configs` folder.
OSRL uses the `WandbLogger` from [FSRL](https://github.com/liuzuxin/FSRL) and the [Pyrallis](https://github.com/eladrich/pyrallis) configuration system. The offline datasets and environments are provided by [DSRL](https://github.com/liuzuxin/DSRL), so make sure you install both of them first.
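
As a quick illustration of how a Pyrallis-style config maps to command-line overrides, here is a minimal, self-contained sketch. The dataclass fields below are hypothetical examples for illustration only; the actual per-algorithm configs live in `examples/configs`.

```python
from dataclasses import dataclass

import pyrallis


@dataclass
class TrainConfig:
    # Hypothetical example fields, not the real OSRL training config.
    task: str = "OfflineCarCircle-v0"  # DSRL task name
    cost_limit: float = 10.0           # target cost threshold
    seed: int = 0
    device: str = "cpu"


@pyrallis.wrap()
def main(cfg: TrainConfig):
    # Every field becomes a CLI flag, e.g.:
    #   python this_script.py --task OfflineCarCircle-v0 --seed 42
    print(cfg)


if __name__ == "__main__":
    main()
```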

### Training
For example, to train the `bcql` method, simply run it and override the default parameters:

```shell
python examples/train/train_bcql.py --task OfflineCarCircle-v0 --param1 args1 ...
```

By default, the config file and the training logs will be written to the `logs/` folder, and the training plots can be viewed online using Wandb.

You can also launch a sequence of experiments, sequentially or in parallel, via the [EasyRunner](https://github.com/liuzuxin/easy-runner) package; see `examples/train_all_tasks.py` for details.
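
If you only want a rough picture of what such a launcher does, here is a minimal plain-Python sketch. It does not use EasyRunner's API; the task names, seeds, and the assumption that `train_bcql.py` accepts a `--seed` flag are illustrative only.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tasks and seeds; run this from the repository root.
TASKS = ["OfflineCarCircle-v0", "OfflineAntRun-v0"]
SEEDS = [0, 1, 2]


def launch(task: str, seed: int) -> int:
    """Run one training job as a subprocess and return its exit code."""
    cmd = [
        "python", "examples/train/train_bcql.py",
        "--task", task,
        "--seed", str(seed),
    ]
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    jobs = [(task, seed) for task in TASKS for seed in SEEDS]
    # Cap concurrency so parallel runs do not oversubscribe the GPU/CPU.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(launch, task, seed) for task, seed in jobs]
        exit_codes = [f.result() for f in futures]
    print("exit codes:", exit_codes)
```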

### Evaluation
To evaluate a trained agent, for example a BCQ-Lag agent, simply run:
```
python examples/eval/eval_bcql.py --path path_to_model --eval_episodes 20
```
It will load the config file from `path_to_model/config.yaml` and the model file from `path_to_model/checkpoints/model.pt`, run 20 episodes, and print the average normalized reward and cost.
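
For reference, a minimal sketch of what "normalized" typically means in this benchmark family is shown below. This is stated as an assumption about the DSRL-style convention (reward min-max scaled against per-task reference returns, cost divided by the target cost threshold), not a definition taken from this repository; the helper function and reference values are illustrative.

```python
def normalize_metrics(ep_reward: float, ep_cost: float,
                      r_min: float, r_max: float, cost_limit: float):
    """Assumed DSRL-style normalization: reward scaled to [0, 1] by reference
    returns; cost expressed relative to the target limit (<= 1 means the
    constraint is satisfied on average)."""
    norm_reward = (ep_reward - r_min) / (r_max - r_min)
    norm_cost = ep_cost / cost_limit
    return norm_reward, norm_cost


# Illustrative numbers: raw return 350 on a task with reference returns [0, 500],
# raw cost 8 under a cost limit of 10.
print(normalize_metrics(350.0, 8.0, r_min=0.0, r_max=500.0, cost_limit=10.0))
# -> (0.7, 0.8)
```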

## Contributing

If you have any suggestions or find any bugs, please feel free to submit an issue or a pull request. We welcome contributions from the community!