Skip to content

Latest commit

 

History

History
60 lines (60 loc) · 2.37 KB

2021-07-01-amin21a.md

File metadata and controls

60 lines (60 loc) · 2.37 KB
title abstract layout series publisher issn id month tex_title firstpage lastpage page order cycles bibtex_author author date address container-title volume genre issued pdf extras
Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards
A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structures and continuous state and action spaces. Intuitively, if the reinforcement signal is very scarce, the agent should rely on some form of short-term memory in order to cover its environment efficiently. We propose a new exploration method, based on two intuitions: (1) the choice of the next exploratory action should depend not only on the (Markovian) state of the environment, but also on the agent’s trajectory so far, and (2) the agent should utilize a measure of spread in the state space to avoid getting stuck in a small region. Our method leverages concepts often used in statistical physics to provide explanations for the behavior of simplified (polymer) chains in order to generate persistent (locally self-avoiding) trajectories in state space. We discuss the theoretical properties of locally self-avoiding walks and their ability to provide a kind of short-term memory through a decaying temporal correlation within the trajectory. We provide empirical evaluations of our approach in a simulated 2D navigation task, as well as higher-dimensional MuJoCo continuous control locomotion tasks with sparse rewards.
inproceedings
Proceedings of Machine Learning Research
PMLR
2640-3498
amin21a
0
Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards
275
285
275-285
275
false
Amin, Susan and Gomrokchi, Maziar and Aboutalebi, Hossein and Satija, Harsh and Precup, Doina
given family
Susan
Amin
given family
Maziar
Gomrokchi
given family
Hossein
Aboutalebi
given family
Harsh
Satija
given family
Doina
Precup
2021-07-01
Proceedings of the 38th International Conference on Machine Learning
139
inproceedings
date-parts
2021
7
1