Skip to content

Latest commit

 

History

History
58 lines (58 loc) · 2.38 KB

2021-07-21-foster21a.md

File metadata and controls

58 lines (58 loc) · 2.38 KB
title abstract layout series publisher issn id month tex_title firstpage lastpage page order cycles bibtex_author author date address container-title volume genre issued pdf extras
Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits can be achieved for rich, general classes of policies. We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds. We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case. Finally, we provide structural results that tie together a number of complexity measures previously proposed throughout contextual bandits, reinforcement learning, and active learning and elucidate their role in determining the optimal instance-dependent regret. In a large-scale empirical evaluation, we find that our approach often gives superior results for challenging exploration problems. Turning our focus to reinforcement learning with function approximation, we develop new oracle-efficient algorithms for reinforcement learning with rich observations that obtain optimal gap-dependent sample complexity.
inproceedings
Proceedings of Machine Learning Research
PMLR
2640-3498
foster21a
0
Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective
2059
2059
2059-2059
2059
false
Foster, Dylan and Rakhlin, Alexander and Simchi-Levi, David and Xu, Yunzong
given family
Dylan
Foster
given family
Alexander
Rakhlin
given family
David
Simchi-Levi
given family
Yunzong
Xu
2021-07-21
Proceedings of Thirty Fourth Conference on Learning Theory
134
inproceedings
date-parts
2021
7
21