This repository contains the code for the ICML'21 paper "TempoRL: Learning When to Act".
If you use TempoRL in your research or application, please cite us:
@inproceedings{biedenkapp-icml21,
author = {André Biedenkapp and Raghu Rajan and Frank Hutter and Marius Lindauer},
title = {{T}empo{RL}: Learning When to Act},
booktitle = {Proceedings of the 38th International Conference on Machine Learning (ICML 2021)},
year = {2021},
month = jul,
}
The appendix PDF has been uploaded to this repository and can be accessed here.
This code was developed with Python 3.6.12 and torch 1.4.0. If you have the correct Python version, you can install the dependencies via
pip install -r requirements.txt
If you only want to run quick experiments with the tabular agents, you can install the minimal requirements in tabular_requirements.txt via
pip install -r tabular_requirements.txt
To use the provided Jupyter notebook, you additionally need to install jupyter
pip install jupyter
To run an agent on any of the environments listed below, run
python run_tabular_experiments.py -e 10000 --agent Agent --env env_name --eval-eps 500
Replace Agent with q for vanilla Q-learning or sq for our method.
Currently 3 simple environments are available. By default, all environments give a reward of 1 when the agent reaches the goal (X). Agents start in state (S) and can traverse open fields (o). When falling into "lava" (.), the agent receives a reward of -1. No other transition generates a reward. (When rendering environments, the agent is marked with *.) An agent can use at most 100 steps to reach the goal.
Variants of the environments listed below either give no goal reward (env_name ends in _ng) or reduce the goal reward by the number of steps taken (env_name ends in _perc).
- lava (Cliff)

    S o . . . . . . o X
    o o . . . . . . o o
    o o . . . . . . o o
    o o o o o o o o o o
    o o o o o o o o o o
    o o o o o o o o o o

- lava2 (Bridge)

    S o . . . . . . o X
    o o . . . . . . o o
    o o o o o o o o o o
    o o o o o o o o o o
    o o . . . . . . o o
    o o . . . . . . o o

- lava3 (ZigZag)

    S o . . o o o o o o
    o o . . o o o o o o
    o o . . o o . . o o
    o o . . o o . . o o
    o o o o o o . . o o
    o o o o o o . . o X

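The reward rules described above can be illustrated on the Cliff layout. This is a minimal sketch and not the repository's environment code. It assumes a 6x10 grid, four movement actions, moves that are clipped at the grid border, and an episode that ends when the agent enters lava. None of these details are stated in this README, and all function names are hypothetical.

```python
# Cliff layout, assumed to be 6 rows of 10 cells.
LAVA = """
S o . . . . . . o X
o o . . . . . . o o
o o . . . . . . o o
o o o o o o o o o o
o o o o o o o o o o
o o o o o o o o o o
"""

# Assumed action set: the README does not list the actions.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def parse_grid(text):
    """Turn the ASCII layout into a list of rows of single-character cells."""
    return [line.split() for line in text.strip().splitlines()]


def find(grid, symbol):
    """Return the (row, col) of the first cell holding symbol."""
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == symbol:
                return (r, c)
    raise ValueError("symbol not found: " + symbol)


def step(grid, pos, action):
    """Apply one action; return (new_pos, reward, done)."""
    dr, dc = MOVES[action]
    # Assumption: moving off the grid leaves the agent at the border.
    r = min(max(pos[0] + dr, 0), len(grid) - 1)
    c = min(max(pos[1] + dc, 0), len(grid[0]) - 1)
    cell = grid[r][c]
    if cell == "X":
        return (r, c), 1.0, True    # goal reward
    if cell == ".":
        return (r, c), -1.0, True   # lava; ending the episode is an assumption
    return (r, c), 0.0, False       # no reward on any other transition


def rollout(grid, actions, max_steps=100):
    """Run a fixed action sequence from S; return (total_reward, steps_used)."""
    pos = find(grid, "S")
    total, steps = 0.0, 0
    for action in actions[:max_steps]:
        pos, reward, done = step(grid, pos, action)
        total += reward
        steps += 1
        if done:
            break
    return total, steps
```

For example, the path down 3, right 9, up 3 walks around the lava and reaches the goal in 15 steps, while walking right from S enters lava on the second step.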
To train an agent on featurized environments run e.g.
python run_featurized_experiments.py -e 10000 -t 1000000 --eval-after-n-steps 200 -s 1 --agent tdqn --skip-net-max-skips 10 --out-dir . --sparse
Replace tdqn (our agent with shared network architecture) with dqn or dar to run the respective baseline agents.
To train a DDPG agent run e.g.
python run_ddpg_experiments.py --policy TempoRLDDPG --env Pendulum-v0 --start_timesteps 1000 --max_timesteps 30000 --eval_freq 250 --max-skip 16 --save_model --out-dir . --seed 1
Replace TempoRLDDPG with FiGARDDPG or DDPG to run the baseline agents.
To train an agent on Atari environments run e.g.
python run_atari_experiments.py --env freeway --env-max-steps 10000 --agent tdqn --out-dir experiments/atari_new/freeway/tdqn_3 --episodes 20000 --training-steps 2500000 --eval-after-n-steps 10000 --seed 12345 --84x84 --eval-n-episodes 3
Replace tdqn (our agent with shared network architecture) with dqn or dar to run the respective baseline agents.
We provide all learning curve data, final policy network weights as well as commands to generate that data at: https://figshare.com/s/d9264dc125ca8ba8efd8
(Download this data and move it into the experiments folder)