abstract | section | title | layout | series | id | month | tex_title | firstpage | lastpage | page | order | cycles | bibtex_author | author | date | address | publisher | container-title | volume | genre | issued | extras | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
We consider the variant of the stochastic multi-armed bandit problem where the stochastic reward distributions may change abruptly several times. In contrast to previous work, we are able to achieve (nearly) optimal mini-max regret bounds without knowing the number of changes. For this setting, we propose an algorithm called ADSWITCH and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes. |
contributed |
Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes |
inproceedings |
Proceedings of Machine Learning Research |
auer19a |
0 |
Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes |
138 |
158 |
138-158 |
138 |
false |
Auer, Peter and Gajane, Pratik and Ortner, Ronald |
|
2019-06-25 |
PMLR |
Proceedings of the Thirty-Second Conference on Learning Theory |
99 |
inproceedings |
|