This is the PyTorch implementation of the NeurIPS 2021 paper: There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning.
- Python 3.8
- For the other packages, see requirements.txt, or run:
pip install -r requirements.txt
RAC can be trained on Cartpole using rac_cartpole.py, or on Turf using rac_turf.py. The parameter n_traj_classifier controls the number of trajectories used to train psi, epoch_classifier controls the number of training epochs of psi, and steps_action_model controls the number of training examples given to phi.
python rac_cartpole.py
python rac_turf.py --epoch_classifier 100 --steps_action_model 100000 --n_traj_classifier 50000
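When launching RAC runs from another script, it can help to assemble the command from a dictionary of parameters. The following is a minimal sketch, not part of this repository: the helper name build_command is hypothetical, and the only flags used are the three listed above.

```python
import shlex


def build_command(script, **params):
    """Compose an argument list such as ['python', script, '--flag', 'value'].

    Hypothetical helper: it only formats arguments and does not check
    that the target script actually accepts them.
    """
    args = ["python", script]
    for name, value in params.items():
        args += [f"--{name}", str(value)]
    return args


# Same values as the rac_turf.py command shown above.
cmd = build_command(
    "rac_turf.py",
    epoch_classifier=100,
    steps_action_model=100000,
    n_traj_classifier=50000,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to start training from the repository root.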
RAE can be trained on Cartpole using rae_cartpole.py, or on Turf using rae_turf.py. The parameter threshold corresponds to beta in the paper. The parameter train_freq sets how often psi is trained online, and d_max controls the window w.
python rae_cartpole.py
python rae_turf.py --threshold 0.8 --train_freq 500 --d_max 50000
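To compare several values of threshold (beta), the commands can be generated programmatically. This is a sketch under assumptions: the function rae_turf_command is hypothetical, and the threshold values 0.7 and 0.9 are illustrative only; 0.8, 500 and 50000 are the values from the command above.

```python
import shlex

# Illustrative beta values; only 0.8 comes from the command above.
thresholds = [0.7, 0.8, 0.9]


def rae_turf_command(threshold, train_freq=500, d_max=50000):
    """Build the argument list for one rae_turf.py run (hypothetical helper)."""
    return [
        "python", "rae_turf.py",
        "--threshold", str(threshold),
        "--train_freq", str(train_freq),
        "--d_max", str(d_max),
    ]


commands = [rae_turf_command(t) for t in thresholds]

# Print the commands instead of running them, so they can be checked first.
for cmd in commands:
    print(shlex.join(cmd))
```

Each entry of commands can then be run one at a time with subprocess.run(cmd, check=True).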