| field | value |
|---|---|
| title | Walking the Values in Bayesian Inverse Reinforcement Learning |
| abstract | The goal of Bayesian inverse reinforcement learning (IRL) is to recover a posterior distribution over reward functions from a set of demonstrations by an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q-values: vanilla Bayesian IRL needs to solve the costly forward planning problem, going from rewards to Q-values, at every step of the algorithm, which may need to be repeated thousands of times. We propose to solve this with a simple change: instead of primarily sampling in the space of rewards, we work primarily in the space of Q-values, since the computation required to go from Q-values to rewards is radically cheaper. Furthermore, this reversal of the computation makes it easy to compute the gradient, allowing efficient sampling using Hamiltonian Monte Carlo. We propose ValueWalk, a new Markov chain Monte Carlo method based on this insight, and illustrate its advantages on several tasks. |
| openreview | 48SI6DOqUH |
| section | Papers |
| layout | inproceedings |
| series | Proceedings of Machine Learning Research |
| publisher | PMLR |
| issn | 2640-3498 |
| id | bajgar24a |
| month | 0 |
| tex_title | Walking the Values in Bayesian Inverse Reinforcement Learning |
| firstpage | 273 |
| lastpage | 287 |
| page | 273-287 |
| order | 273 |
| cycles | false |
| bibtex_author | Bajgar, Ondrej and Abate, Alessandro and Gatsis, Konstantinos and Osborne, Michael |
| author | |
| date | 2024-09-12 |
| address | |
| container-title | Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence |
| volume | 244 |
| genre | inproceedings |
| issued | |
| extras | |
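The abstract's central computational claim, that mapping Q-values back to rewards is cheap while the reverse direction requires full forward planning, can be illustrated with a short sketch. The snippet below is not the authors' ValueWalk implementation; it is a minimal illustration assuming a hypothetical small tabular MDP with a known transition tensor `P[s, a, s']`, discount `gamma`, and a Boltzmann expert model with inverse temperature `beta` (all names and numbers are assumptions for illustration).

```python
import numpy as np

def q_to_reward(Q, P, gamma, beta):
    """Invert the soft Bellman equation: r(s, a) = Q(s, a) - gamma * E_{s'}[V(s')].

    V is the soft (log-sum-exp) state value implied by a Boltzmann expert model;
    with a greedy expert one would use Q.max(axis=1) instead. Either way, going
    from Q to r is a single matrix-vector product -- no forward planning.
    """
    V = np.log(np.exp(beta * Q).sum(axis=1)) / beta   # soft state values, shape (S,)
    return Q - gamma * (P @ V)                        # rewards, shape (S, A)

def demo_log_likelihood(Q, demos, beta):
    """Boltzmann log-likelihood of demonstrated (state, action) pairs."""
    log_pi = beta * Q - np.logaddexp.reduce(beta * Q, axis=1, keepdims=True)
    return sum(log_pi[s, a] for s, a in demos)

# Toy usage: 3 states, 2 actions (hypothetical values).
rng = np.random.default_rng(0)
S, A, gamma, beta = 3, 2, 0.9, 5.0
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition tensor P[s, a, s']
Q = rng.normal(size=(S, A))                 # a point in Q-space
r = q_to_reward(Q, P, gamma, beta)          # cheap map back to reward space
ll = demo_log_likelihood(Q, [(0, 1), (2, 0)], beta)
```

In this sketch, both the Q-to-reward map and the demonstration likelihood are differentiable in Q, so the gradients needed for Hamiltonian Monte Carlo come essentially for free, and a prior over rewards can be evaluated by pushing each Q sample through `q_to_reward`; this mirrors the reversal of computation the abstract describes, though the paper's actual method should be consulted for details.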