Skip to content

Latest commit

 

History

History
49 lines (49 loc) · 1.92 KB

2021-07-21-agrawal21a.md

File metadata and controls

49 lines (49 loc) · 1.92 KB
title abstract layout series publisher issn id month tex_title firstpage lastpage page order cycles bibtex_author author date address container-title volume genre issued pdf extras
Regret Minimization in Heavy-Tailed Bandits
We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order \((1+\epsilon)\){are} uniformly bounded by a known constant \(B\), for some given \( \epsilon > 0\). We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.
inproceedings
Proceedings of Machine Learning Research
PMLR
2640-3498
agrawal21a
0
Regret Minimization in Heavy-Tailed Bandits
26
62
26-62
26
false
Agrawal, Shubhada and Juneja, Sandeep K. and Koolen, Wouter M.
given family
Shubhada
Agrawal
given family
Sandeep K.
Juneja
given family
Wouter M.
Koolen
2021-07-21
Proceedings of Thirty Fourth Conference on Learning Theory
134
inproceedings
date-parts
2021
7
21