Skip to content

Latest commit

 

History

History
45 lines (45 loc) · 1.47 KB

2019-06-25-auer19a.md

File metadata and controls

45 lines (45 loc) · 1.47 KB
abstract section title layout series id month tex_title firstpage lastpage page order cycles bibtex_author author date address publisher container-title volume genre issued pdf extras
We consider the variant of the stochastic multi-armed bandit problem where the stochastic reward distributions may change abruptly several times. In contrast to previous work, we are able to achieve (nearly) optimal mini-max regret bounds without knowing the number of changes. For this setting, we propose an algorithm called ADSWITCH and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.
contributed
Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes
inproceedings
Proceedings of Machine Learning Research
auer19a
0
Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes
138
158
138-158
138
false
Auer, Peter and Gajane, Pratik and Ortner, Ronald
given family
Peter
Auer
given family
Pratik
Gajane
given family
Ronald
Ortner
2019-06-25
PMLR
Proceedings of the Thirty-Second Conference on Learning Theory
99
inproceedings
date-parts
2019
6
25