| field | value |
|---|---|
| title | Scale-free Adversarial Reinforcement Learning |
| section | Original Papers |
| abstract | This paper initiates the study of scale-free learning in Markov Decision Processes (MDPs), where the scale of rewards/losses is unknown to the learner. We design a generic algorithmic framework, \underline{S}cale \underline{C}lipping \underline{B}ound (\texttt{SCB}), and instantiate it in both the adversarial Multi-armed Bandit (MAB) setting and the adversarial MDP setting. Through this framework, we achieve the first minimax-optimal expected regret bound and the first high-probability regret bound for scale-free adversarial MABs, resolving an open problem raised in \cite{hadiji2020adaptation}. In the adversarial MDP setting, our framework also gives rise to the first scale-free RL algorithm with a $\tilde{\mathcal{O}}(\sqrt{T})$ high-probability regret guarantee. |
| layout | inproceedings |
| series | Proceedings of Machine Learning Research |
| publisher | PMLR |
| issn | 2640-3498 |
| id | chen24d |
| month | 0 |
| tex_title | Scale-free Adversarial Reinforcement Learning |
| firstpage | 1068 |
| lastpage | 1101 |
| page | 1068-1101 |
| order | 1068 |
| cycles | false |
| bibtex_author | Chen, Mingyu and Zhang, Xuezhou |
| author | |
| date | 2024-06-30 |
| address | |
| container-title | Proceedings of Thirty Seventh Conference on Learning Theory |
| volume | 247 |
| genre | inproceedings |
| issued | |
| extras | |
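The abstract's central idea, clipping observed losses to an adaptive scale estimate so the learner never needs to know the loss range in advance, can be illustrated with a small sketch. The following EXP3-style learner is a hypothetical illustration of generic scale clipping, not the paper's \texttt{SCB} algorithm; the function name `scale_free_exp3`, the running-maximum scale update, and the learning-rate schedule are all assumptions made for this example.

```python
import numpy as np

def scale_free_exp3(loss_fn, n_arms, horizon, seed=0):
    """EXP3-style learner for adversarial bandits that adapts to an
    unknown loss scale by clipping each observed loss to a running
    scale estimate before forming importance-weighted estimates."""
    rng = np.random.default_rng(seed)
    log_weights = np.zeros(n_arms)   # log-domain weights for numerical stability
    scale = 1e-12                    # running estimate of the (unknown) loss scale
    total_loss = 0.0
    for t in range(1, horizon + 1):
        # Sampling distribution from the current weights.
        probs = np.exp(log_weights - log_weights.max())
        probs /= probs.sum()
        arm = int(rng.choice(n_arms, p=probs))
        loss = loss_fn(t, arm)                     # adversary's loss; scale unknown
        scale = max(scale, abs(loss))              # grow the scale estimate as needed
        clipped = float(np.clip(loss, -scale, scale))
        # Learning rate shrinks with time and with the current scale estimate.
        lr = np.sqrt(np.log(n_arms) / (n_arms * t)) / scale
        loss_est = np.zeros(n_arms)
        loss_est[arm] = clipped / probs[arm]       # importance-weighted loss estimate
        log_weights -= lr * loss_est
        total_loss += loss
    return total_loss

# Example: losses with a scale (about 50) that is never revealed to the learner.
cum_loss = scale_free_exp3(lambda t, a: 50.0 * np.sin(0.1 * t + a),
                           n_arms=5, horizon=2000)
```

The point of the clipping step is that the importance-weighted estimates stay bounded relative to the current scale estimate, so the exponential-weights update remains stable even when the adversary's losses are much larger than any fixed range the learner might have assumed.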