Decision Transformer: fixed 'reward_scale' in configs as in wandb reports #75

Merged 1 commit on Aug 1, 2023

suessmann
Contributor

Hiya,

While working through the DT implementation, I noticed that the provided configs for locomotion tasks do not reproduce the performance shown in the wandb reports. On closer inspection, the problem is the reward_scale entry: the configs in the repo set reward_scale: 1.0, while the wandb reports (e.g. this one) show reward_scale: 0.001.

I also ran a small-scale experiment on hopper-medium-replay-v2; the performance of the updated config matched the one you report.
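
For readers wondering why this single value matters, here is a minimal sketch of how a reward scale typically feeds into the returns-to-go that a Decision Transformer conditions on. It assumes reward_scale multiplies each per-step reward before the returns-to-go are accumulated, which is common in DT implementations; the function name `returns_to_go` and the toy rewards are hypothetical and not taken from this repository's code.

```python
def returns_to_go(rewards, reward_scale=1.0, gamma=1.0):
    """Return the (discounted) return-to-go for every timestep.

    Assumption: reward_scale is applied to each reward before summation.
    This is an illustrative helper, not the repository's implementation.
    """
    scaled = [r * reward_scale for r in rewards]
    rtg = [0.0] * len(scaled)
    running = 0.0
    # Walk backwards so each step adds its reward to the future return.
    for t in reversed(range(len(scaled))):
        running = scaled[t] + gamma * running
        rtg[t] = running
    return rtg


toy_rewards = [1.0, 2.0, 3.0]  # made-up values for illustration only
rtg_unscaled = returns_to_go(toy_rewards, reward_scale=1.0)
rtg_scaled = returns_to_go(toy_rewards, reward_scale=0.001)
```

Under this assumption, switching reward_scale from 1.0 to 0.001 shrinks every return-to-go the model sees by a factor of 1000, so a config with the wrong value trains and evaluates on conditioning targets in a very different numeric range.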

@vkurenkov vkurenkov merged commit 6afec90 into tinkoff-ai:main Aug 1, 2023