Decision Transformer: fixed 'reward_scale' in configs as in wandb reports #75

suessmann · 2023-08-01T10:04:39Z

Hiya,

While working through the DT implementation, I noticed that the provided configs for locomotion tasks do not deliver the performance shown in the wandb reports. On closer inspection, the issue turned out to be the reward_scale entry: the configs in the repo had reward_scale: 1.0, while the wandb reports (e.g. this one) show reward_scale: 0.001.

I also ran a small-scale experiment on hopper-medium-replay-v2; the performance of the updated config matched the one you report.
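
For context, here is a minimal sketch of why this value matters. It assumes reward_scale simply multiplies the per-step rewards before the returns-to-go are accumulated; the repo's actual code may differ, and returns_to_go below is a hypothetical helper, not a function from the repo.

```python
from itertools import accumulate
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], reward_scale: float) -> List[float]:
    """Undiscounted returns-to-go, scaled by reward_scale.

    Hypothetical helper for illustration only (not taken from the repo).
    """
    # Reverse cumulative sum: rtg[t] = sum of rewards[t:].
    rtg = list(accumulate(reversed(rewards)))[::-1]
    # Assumption: the scale is a plain multiplicative factor.
    return [reward_scale * g for g in rtg]


# Made-up episode: 1000 steps with a reward of 3.0 each.
episode = [3.0] * 1000
unscaled = returns_to_go(episode, reward_scale=1.0)
scaled = returns_to_go(episode, reward_scale=0.001)
print(unscaled[0], scaled[0])
```

With reward_scale: 1.0 the first return-to-go in this toy episode is in the thousands, while with reward_scale: 0.001 it is on the order of a few units, so the return-conditioning input the model sees differs by three orders of magnitude between the two configs.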

fixed 'reward_scale' in configs as in wandb reports

5e6624d

suessmann requested review from Howuhh, Scitator and vkurenkov as code owners August 1, 2023 10:04

Howuhh approved these changes Aug 1, 2023

Scitator approved these changes Aug 1, 2023

vkurenkov approved these changes Aug 1, 2023

vkurenkov merged commit 6afec90 into tinkoff-ai:main Aug 1, 2023