Skip to content

Latest commit

 

History

History
55 lines (55 loc) · 2.16 KB

2021-07-01-baudry21a.md

File metadata and controls

55 lines (55 loc) · 2.16 KB
title abstract layout series publisher issn id month tex_title firstpage lastpage page order cycles bibtex_author author date address container-title volume genre issued pdf extras
Optimal Thompson Sampling strategies for support-aware CVaR bandits
In this paper we study a multi-arm bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level alpha of the reward distribution. While existing works in this setting mainly focus on Upper Confidence Bound algorithms, we introduce a new Thompson Sampling approach for CVaR bandits on bounded rewards that is flexible enough to solve a variety of problems grounded on physical resources. Building on a recent work by Riou & Honda (2020), we introduce B-CVTS for continuous bounded rewards and M-CVTS for multinomial distributions. On the theoretical side, we provide a non-trivial extension of their analysis that enables to theoretically bound their CVaR regret minimization performance. Strikingly, our results show that these strategies are the first to provably achieve asymptotic optimality in CVaR bandits, matching the corresponding asymptotic lower bounds for this setting. Further, we illustrate empirically the benefit of Thompson Sampling approaches both in a realistic environment simulating a use-case in agriculture and on various synthetic examples.
inproceedings
Proceedings of Machine Learning Research
PMLR
2640-3498
baudry21a
0
Optimal Thompson Sampling strategies for support-aware CVaR bandits
716
726
716-726
716
false
Baudry, Dorian and Gautron, Romain and Kaufmann, Emilie and Maillard, Odalric
given family
Dorian
Baudry
given family
Romain
Gautron
given family
Emilie
Kaufmann
given family
Odalric
Maillard
2021-07-01
Proceedings of the 38th International Conference on Machine Learning
139
inproceedings
date-parts
2021
7
1