
Commit 3e586b7

liuzuxin and Ja4822 authored
Update configs, scripts, and instructions (#2)
* update configs
* Update setup.py
* update configs
* fix bugs
* update task
* clean steup
* update train and eval scripts
* clean setup
* update README
* update training scripts
* update configs
* remove bc-frontier
* format
* format
* add fsrl dependency
* Update README.md
* Update setup.py
* update cdt
* add bc frontier
* clean up

---------

Co-authored-by: Ja4822 <3471606159@qq.com>
1 parent 1840b85 commit 3e586b7

32 files changed, +1056 -477 lines changed

README.md (+60 -18)
@@ -1,59 +1,101 @@
 <div align="center">
-<a href="http://fsrl.readthedocs.io"><img width="300px" height="auto" src="docs/_static/images/osrl-logo.png"></a>
+<a href="http://www.offline-saferl.org"><img width="300px" height="auto" src="https://github.com/liuzuxin/osrl/raw/main/docs/_static/images/osrl-logo.png"></a>
 </div>
 
 <br/>
 
 <div align="center">
 
 <a>![Python 3.8+](https://img.shields.io/badge/Python-3.8%2B-brightgreen.svg)</a>
-[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](#license)
+[![License](https://img.shields.io/badge/License-MIT-blue.svg)](#license)
+[![PyPI](https://img.shields.io/pypi/v/osrl-lib?logo=pypi)](https://pypi.org/project/osrl-lib)
+[![GitHub Repo Stars](https://img.shields.io/github/stars/liuzuxin/osrl?color=brightgreen&logo=github)](https://github.com/liuzuxin/osrl/stargazers)
+[![Downloads](https://static.pepy.tech/personalized-badge/osrl-lib?period=total&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/osrl-lib)
 <!-- [![Documentation Status](https://img.shields.io/readthedocs/fsrl?logo=readthedocs)](https://fsrl.readthedocs.io) -->
 <!-- [![CodeCov](https://codecov.io/github/liuzuxin/fsrl/branch/main/graph/badge.svg?token=BU27LTW9F3)](https://codecov.io/github/liuzuxin/fsrl)
 [![Tests](https://github.com/liuzuxin/fsrl/actions/workflows/test.yml/badge.svg)](https://github.com/liuzuxin/fsrl/actions/workflows/test.yml) -->
 <!-- [![CodeCov](https://img.shields.io/codecov/c/github/liuzuxin/fsrl/main?logo=codecov)](https://app.codecov.io/gh/liuzuxin/fsrl) -->
 <!-- [![tests](https://img.shields.io/github/actions/workflow/status/liuzuxin/fsrl/test.yml?label=tests&logo=github)](https://github.com/liuzuxin/fsrl/tree/HEAD/tests) -->
-<!-- [![PyPI](https://img.shields.io/pypi/v/fsrl?logo=pypi)](https://pypi.org/project/fsrl) -->
-<!-- [![GitHub Repo Stars](https://img.shields.io/github/stars/liuzuxin/fsrl?color=brightgreen&logo=github)](https://github.com/liuzuxin/fsrl/stargazers)
-[![Downloads](https://static.pepy.tech/personalized-badge/fsrl?period=total&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/fsrl) -->
-<!-- [![License](https://img.shields.io/github/license/liuzuxin/fsrl?label=license)](#license) -->
 
 </div>
 
 ---
 
 **OSRL (Offline Safe Reinforcement Learning)** offers a collection of elegant and extensible implementations of state-of-the-art offline safe reinforcement learning (RL) algorithms. Aimed at propelling research in offline safe RL, OSRL serves as a solid foundation to implement, benchmark, and iterate on safe RL solutions.
 
-The OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes [FSRL](https://github.com/liuzuxin/fsrl) and [DSRL](https://github.com/liuzuxin/dsrl), and is built to facilitate the development of robust and reliable offline safe RL solutions.
+The OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes [DSRL](https://github.com/liuzuxin/DSRL) and [FSRL](https://github.com/liuzuxin/FSRL), and is built to facilitate the development of robust and reliable offline safe RL solutions.
 
 To learn more, please visit our [project website](http://www.offline-saferl.org).
 
 ## Structure
 The structure of this repo is as follows:
 ```
-├── osrl  # offline safe RL algorithms
-│   ├── common_net.py
-│   ├── common_util.py
-│   ├── xx_algorithm.py
-│   ├── xx_algorithm_util.py
-│   ├── ...
+├── examples
+│   ├── configs  # the training configs of each algorithm
+│   ├── eval  # the evaluation scripts
+│   ├── train  # the training scripts
+├── osrl
+│   ├── algorithms  # offline safe RL algorithms
+│   ├── common  # base networks and utils
 ```
+The implemented offline safe RL and imitation learning algorithms include:
+
+| Algorithm | Type | Description |
+|:-------------------:|:-----------------:|:------------------------:|
+| BCQ-Lag | Q-learning | [BCQ](https://arxiv.org/pdf/1812.02900.pdf) with [PID Lagrangian](https://arxiv.org/abs/2007.03964) |
+| BEAR-Lag | Q-learning | [BEARL](https://arxiv.org/abs/1906.00949) with [PID Lagrangian](https://arxiv.org/abs/2007.03964) |
+| CPQ | Q-learning | [Constraints Penalized Q-learning (CPQ)](https://arxiv.org/abs/2107.09003) |
+| COptiDICE | Distribution Correction Estimation | [Offline Constrained Policy Optimization via stationary DIstribution Correction Estimation](https://arxiv.org/abs/2204.08957) |
+| CDT | Sequential Modeling | [Constrained Decision Transformer](https://arxiv.org/abs/2302.07351) |
+| BC-All | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with all datasets |
+| BC-Safe | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with safe trajectories |
+| BC-Frontier | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with high-reward trajectories |
+
 
 ## Installation
-Pull the repo and install:
+
+OSRL is currently hosted on [PyPI](https://pypi.org/project/osrl-lib); you can simply install it by:
+
+```bash
+pip install osrl-lib
 ```
-git clone https://github.com/liuzuxin/osrl.git
+
+You can also pull the repo and install:
+```bash
+git clone https://github.com/liuzuxin/OSRL.git
 cd osrl
 pip install -e .
 ```
 
+If you want to use the `CDT` algorithm, please also manually install the `OApackage`:
+```bash
+pip install OApackage==2.7.6
+```
+
 ## How to use OSRL
 
-The example usage are in the `examples` folder, where you can find the training and evaluation scripts for all the algorithms.
+The example usage is in the `examples` folder, where you can find the training and evaluation scripts for all the algorithms.
+All the parameters and their default configs for each algorithm are available in the `examples/configs` folder.
+OSRL uses the `WandbLogger` from [FSRL](https://github.com/liuzuxin/FSRL) and the [Pyrallis](https://github.com/eladrich/pyrallis) configuration system. The offline datasets and offline environments are provided in [DSRL](https://github.com/liuzuxin/DSRL), so make sure you install both of them first.
 
+### Training
 For example, to train the `bcql` method, simply run by overriding the default parameters:
 
 ```shell
-python examples/train/train_bcql.py --param1 args1
+python examples/train/train_bcql.py --task OfflineCarCircle-v0 --param1 args1 ...
 ```
-All the parameters and their default configs for each algorithm are available in the `examples/configs` folder.
+By default, the config file and the training logs will be written to the `logs/` folder, and the training plots can be viewed online using Wandb.
+
+You can also launch a sequence of experiments, sequentially or in parallel, via the [EasyRunner](https://github.com/liuzuxin/easy-runner) package; see `examples/train_all_tasks.py` for details.
+
+### Evaluation
+To evaluate a trained agent, for example, a BCQ agent, simply run:
+```
+python examples/eval/eval_bcql.py --path path_to_model --eval_episodes 20
+```
+It will load the config file from `path_to_model/config.yaml` and the model file from `path_to_model/checkpoints/model.pt`, run 20 episodes, and print the average normalized reward and cost.
+
+
+## Contributing
+
+If you have any suggestions or find any bugs, please feel free to submit an issue or a pull request. We welcome contributions from the community!
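The "How to use OSRL" section above relies on the [Pyrallis](https://github.com/eladrich/pyrallis) configuration system: each algorithm ships a config dataclass (see `examples/configs/bc_configs.py` below), and every dataclass field doubles as a command-line flag, which is why overrides such as `--task OfflineCarCircle-v0` work. The snippet below is a minimal sketch of that pattern, not the actual `examples/train/train_bcql.py`; `TrainConfig`, its fields, and `main` are illustrative stand-ins, and only the `pyrallis.wrap` usage is taken from the Pyrallis library itself.

```python
# Minimal sketch of the Pyrallis config pattern, using a hypothetical
# TrainConfig; the real scripts use the dataclasses in examples/configs/.
from dataclasses import dataclass

import pyrallis


@dataclass
class TrainConfig:
    # every field becomes a CLI flag, e.g. --task or --cost_limit
    task: str = "OfflineCarCircle-v0"
    cost_limit: int = 10
    update_steps: int = 100_000
    seed: int = 0


@pyrallis.wrap()
def main(cfg: TrainConfig):
    # pyrallis fills `cfg` from the defaults above plus any CLI overrides,
    # e.g. `python train_sketch.py --task OfflineAntCircle-v0 --cost_limit 20`
    print(f"training on {cfg.task} with cost limit {cfg.cost_limit}")


if __name__ == "__main__":
    main()
```

As the README states, training writes the resolved config and logs under `logs/`, and the evaluation scripts later read the same `config.yaml` plus `checkpoints/model.pt` from the directory passed via `--path`.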

examples/configs/bc_configs.py (+45 -4)
@@ -1,12 +1,13 @@
-from typing import Any, DefaultDict, Dict, List, Optional, Tuple
 from dataclasses import asdict, dataclass
+from typing import Any, DefaultDict, Dict, List, Optional, Tuple
+
 from pyrallis import field
 
 
 @dataclass
 class BCTrainConfig:
     # wandb params
-    project: str = "OSRL-baselines-new"
+    project: str = "OSRL-baselines"
     group: str = None
     name: Optional[str] = None
     prefix: Optional[str] = "BC"
@@ -16,7 +17,7 @@ class BCTrainConfig:
     # dataset params
     outliers_percent: float = None
     noise_scale: float = None
-    inpaint_ranges: Tuple[Tuple[float, float], ...] = None
+    inpaint_ranges: Tuple[Tuple[float, float, float, float], ...] = None
     epsilon: float = None
     density: float = 1.0
     # training params
@@ -29,7 +30,7 @@ class BCTrainConfig:
     cost_limit: int = 10
    episode_len: int = 300
     batch_size: int = 512
-    update_steps: int = 300_000
+    update_steps: int = 100_000
     num_workers: int = 8
     bc_mode: str = "all"  # "all", "safe", "risky", "frontier", "boundary", "multi-task"
     # model params
@@ -80,6 +81,20 @@ class BCAntCircleConfig(BCTrainConfig):
     episode_len: int = 500
 
 
+@dataclass
+class BCBallRunConfig(BCTrainConfig):
+    # training params
+    task: str = "OfflineBallRun-v0"
+    episode_len: int = 100
+
+
+@dataclass
+class BCBallCircleConfig(BCTrainConfig):
+    # training params
+    task: str = "OfflineBallCircle-v0"
+    episode_len: int = 200
+
+
 @dataclass
 class BCCarButton1Config(BCTrainConfig):
     # training params
@@ -191,89 +206,113 @@ class BCPointPush2Config(BCTrainConfig):
     task: str = "OfflinePointPush2Gymnasium-v0"
     episode_len: int = 1000
 
+
 @dataclass
 class BCAntVelocityConfig(BCTrainConfig):
     # training params
     task: str = "OfflineAntVelocityGymnasium-v1"
     episode_len: int = 1000
 
+
 @dataclass
 class BCHalfCheetahVelocityConfig(BCTrainConfig):
     # training params
     task: str = "OfflineHalfCheetahVelocityGymnasium-v1"
     episode_len: int = 1000
 
+
 @dataclass
 class BCHopperVelocityConfig(BCTrainConfig):
     # training params
     task: str = "OfflineHopperVelocityGymnasium-v1"
     episode_len: int = 1000
 
+
 @dataclass
 class BCSwimmerVelocityConfig(BCTrainConfig):
     # training params
     task: str = "OfflineSwimmerVelocityGymnasium-v1"
     episode_len: int = 1000
 
+
 @dataclass
 class BCWalker2dVelocityConfig(BCTrainConfig):
     # training params
     task: str = "OfflineWalker2dVelocityGymnasium-v1"
     episode_len: int = 1000
 
+
 @dataclass
 class BCEasySparseConfig(BCTrainConfig):
     # training params
     task: str = "OfflineMetadrive-easysparse-v0"
     episode_len: int = 1000
+    update_steps: int = 200_000
+
 
 @dataclass
 class BCEasyMeanConfig(BCTrainConfig):
     # training params
     task: str = "OfflineMetadrive-easymean-v0"
     episode_len: int = 1000
+    update_steps: int = 200_000
+
 
 @dataclass
 class BCEasyDenseConfig(BCTrainConfig):
     # training params
     task: str = "OfflineMetadrive-easydense-v0"
     episode_len: int = 1000
+    update_steps: int = 200_000
+
 
 @dataclass
 class BCMediumSparseConfig(BCTrainConfig):
     # training params
     task: str = "OfflineMetadrive-mediumsparse-v0"
     episode_len: int = 1000
+    update_steps: int = 200_000
+
 
 @dataclass
 class BCMediumMeanConfig(BCTrainConfig):
     # training params
     task: str = "OfflineMetadrive-mediummean-v0"
     episode_len: int = 1000
+    update_steps: int = 200_000
+
 
 @dataclass
 class BCMediumDenseConfig(BCTrainConfig):
     # training params
     task: str = "OfflineMetadrive-mediumdense-v0"
     episode_len: int = 1000
+    update_steps: int = 200_000
+
 
 @dataclass
 class BCHardSparseConfig(BCTrainConfig):
     # training params
     task: str = "OfflineMetadrive-hardsparse-v0"
     episode_len: int = 1000
+    update_steps: int = 200_000
+
 
 @dataclass
 class BCHardMeanConfig(BCTrainConfig):
     # training params
     task: str = "OfflineMetadrive-hardmean-v0"
     episode_len: int = 1000
+    update_steps: int = 200_000
+
 
 @dataclass
 class BCHardDenseConfig(BCTrainConfig):
     # training params
     task: str = "OfflineMetadrive-harddense-v0"
     episode_len: int = 1000
+    update_steps: int = 200_000
+
 
 BC_DEFAULT_CONFIG = {
     # bullet_safety_gym
@@ -283,6 +322,8 @@ class BCHardDenseConfig(BCTrainConfig):
     "OfflineDroneCircle-v0": BCDroneCircleConfig,
     "OfflineCarRun-v0": BCCarRunConfig,
     "OfflineAntCircle-v0": BCAntCircleConfig,
+    "OfflineBallCircle-v0": BCBallCircleConfig,
+    "OfflineBallRun-v0": BCBallRunConfig,
     # safety_gymnasium: car
     "OfflineCarButton1Gymnasium-v0": BCCarButton1Config,
     "OfflineCarButton2Gymnasium-v0": BCCarButton2Config,

0 commit comments
