abstract

section

title

layout

series

id

month

tex_title

firstpage

lastpage

page

order

cycles

bibtex_author

author

date

address

publisher

container-title

volume

genre

issued

pdf

extras

We consider the variant of the stochastic multi-armed bandit problem where the stochastic reward distributions may change abruptly several times. In contrast to previous work, we are able to achieve (nearly) optimal mini-max regret bounds without knowing the number of changes. For this setting, we propose an algorithm called ADSWITCH and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.

contributed

Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes

inproceedings

Proceedings of Machine Learning Research

auer19a

0

Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes

138

158

138-158

138

false

Auer, Peter and Gajane, Pratik and Ortner, Ronald

given	family
Peter	Auer

given	family
Pratik	Gajane

given	family
Ronald	Ortner

2019-06-25

PMLR

Proceedings of the Thirty-Second Conference on Learning Theory

99

inproceedings

date-parts

2019

6

25

http://proceedings.mlr.press/v99/auer19a/auer19a.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2019-06-25-auer19a.md

2019-06-25-auer19a.md

Files

2019-06-25-auer19a.md

Latest commit

History

2019-06-25-auer19a.md

File metadata and controls