---
title: Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards
section: Poster
openreview: i84V7i6KEMd
abstract: 'Preference-based reinforcement learning (PbRL) aligns robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that encoding environment dynamics in the reward function improves the sample efficiency of PbRL by an order of magnitude. In our experiments we iterate between: (1) encoding environment dynamics in a state-action representation $z^{sa}$ via a self-supervised temporal consistency task, and (2) bootstrapping the preference-based reward function from $z^{sa}$, which results in faster policy learning and better final policy performance. For example, on quadruped-walk, walker-walk, and cheetah-run, with 50 preference labels we achieve the same performance as existing approaches with 500 preference labels, and we recover $83\%$ and $66\%$ of ground-truth reward policy performance versus only $38\%$ and $21\%$ without environment dynamics. The performance gains demonstrate that explicitly encoding environment dynamics improves preference-learned reward functions.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: metcalf23a
month: 0
tex_title: Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards
firstpage: 1484
lastpage: 1532
page: 1484-1532
order: 1484
cycles: false
bibtex_author: Metcalf, Katherine and Sarabia, Miguel and Mackraz, Natalie and Theobald, Barry-John
author:
- given: Katherine
  family: Metcalf
- given: Miguel
  family: Sarabia
- given: Natalie
  family: Mackraz
- given: Barry-John
  family: Theobald
date: 2023-12-02
address:
container-title: Proceedings of The 7th Conference on Robot Learning
volume: '229'
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 12
  - 2
pdf:
extras: []
---
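
The two-step loop described in the abstract (a self-supervised temporal consistency task that shapes $z^{sa}$, then a preference-based reward bootstrapped from $z^{sa}$) can be illustrated with a short sketch. The following is a hypothetical PyTorch rendering, not the authors' released code: `StateActionEncoder`, `predictor`, `reward_head`, and all layer sizes are illustrative assumptions, and the Bradley-Terry objective in step (2) is the common PbRL choice rather than something specified in this metadata.

```python
# Hypothetical sketch of the two-step loop from the abstract; all names
# and architectures here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StateActionEncoder(nn.Module):
    """Maps (s_t, a_t) to a dynamics-aware representation z^{sa}."""

    def __init__(self, state_dim, action_dim, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, z_dim),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


def temporal_consistency_loss(encoder, predictor, s, a, s_next, a_next):
    """Step (1): self-supervised temporal consistency.

    Predict the next step's embedding from z^{sa}; the target branch is
    detached (stop-gradient) so the encoder is trained to anticipate
    the environment dynamics rather than chase its own gradients.
    """
    z = encoder(s, a)                            # (batch, z_dim)
    z_next = encoder(s_next, a_next).detach()    # target, no gradient
    return F.mse_loss(predictor(z), z_next)


def preference_loss(encoder, reward_head, seg0, seg1, label):
    """Step (2): bootstrap the reward function from z^{sa}.

    seg0/seg1 are (states, actions) tensors of shape (batch, T, dim);
    label in {0, 1} marks the segment the annotator preferred. Summed
    per-step rewards feed a Bradley-Terry (softmax) preference model,
    a standard PbRL objective.
    """
    r0 = reward_head(encoder(*seg0)).squeeze(-1).sum(dim=1)  # (batch,)
    r1 = reward_head(encoder(*seg1)).squeeze(-1).sum(dim=1)
    logits = torch.stack([r0, r1], dim=1)        # (batch, 2)
    return F.cross_entropy(logits, label)
```

In each outer iteration one would minimize `temporal_consistency_loss` on unlabeled environment transitions and `preference_loss` on the labeled segment pairs; `predictor` and `reward_head` can be small MLPs (e.g., ending in `nn.Linear(z_dim, z_dim)` and `nn.Linear(z_dim, 1)` respectively), with `label` given as a `torch.long` tensor.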