Reproducing experiments
This wiki provides instructions on how to reproduce most of the experiments presented in the NeurIPS 2022 Offline RL workshop paper "Towards Data-Driven Offline Simulations for Online Reinforcement Learning" by Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, and Sebastian Kochman.
Figure 2 in the paper illustrates the fidelity vs. efficiency trade-off between different simulations; see Appendix B.1 in the paper for details.
To see how this figure was produced, refer to the notebook notebooks/metrics.ipynb.
Figure 4 in the paper shows the state visitation both for the raw observations in the continuous grid and for the corresponding latent states (after encoding the observations with HOMER).
To train the HOMER encoder, we use a random agent as the behavior policy to collect data. To reproduce this data collection, run:
python examples/continuous_grid/random_agent_rollout.py
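For reference, below is a minimal sketch of what such a random-agent data collection loop might look like, assuming a Gym-style continuous-grid environment; the environment class, episode counts, and output path are illustrative assumptions, not the script's actual interface.

```python
import pickle


def collect_random_rollouts(env, num_episodes=1000, max_steps=100):
    """Collect (obs, action, next_obs) transitions with a uniform-random policy."""
    transitions = []
    for _ in range(num_episodes):
        obs = env.reset()
        for _ in range(max_steps):
            action = env.action_space.sample()  # random behavior policy
            next_obs, reward, done, info = env.step(action)
            transitions.append((obs, action, next_obs))
            obs = next_obs
            if done:
                break
    return transitions


# Hypothetical usage: persist the dataset for encoder training later.
# transitions = collect_random_rollouts(ContinuousGridEnv())
# with open("outputs/random_rollouts.pkl", "wb") as f:
#     pickle.dump(transitions, f)
```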
If you want to train the encoder from scratch, run this script with the following configuration:
python examples/continuous_grid/train_homer_encoder.py --num_epochs=1000 --seed=0 --batch_size=64 --latent_size=50 --hidden_size=64 --lr=1e-3 --weight_decay=0.0 --temperature_decay=False --output_dir='outputs/models' --num_samples=100000
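The flags above correspond to the encoder's contrastive training setup. As background, here is a minimal PyTorch sketch of a HOMER-style objective: a classifier learns to distinguish real transitions (obs, action, next_obs) from fakes whose next_obs is shuffled within the batch, and a discrete bottleneck over next_obs yields the latent state. The module names and shapes below are assumptions for illustration, not the repository's API.

```python
import torch
import torch.nn as nn


class HomerEncoder(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden_size=64, latent_size=50):
        super().__init__()
        # Encoder maps an observation to logits over discrete latent states.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, latent_size),
        )
        # Classifier scores (obs, action, latent(next_obs)) triples.
        self.classifier = nn.Sequential(
            nn.Linear(obs_dim + action_dim + latent_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, obs, action, next_obs, temperature=1.0):
        # Gumbel-softmax acts as a (soft) discrete bottleneck over latent states.
        latent = nn.functional.gumbel_softmax(self.encoder(next_obs), tau=temperature)
        return self.classifier(torch.cat([obs, action, latent], dim=-1))


def contrastive_loss(model, obs, action, next_obs):
    # Negatives: pair each (obs, action) with the next_obs of another transition.
    fake_next = next_obs[torch.randperm(next_obs.shape[0])]
    logits = torch.cat([model(obs, action, next_obs),
                        model(obs, action, fake_next)])
    labels = torch.cat([torch.ones(len(obs), 1), torch.zeros(len(obs), 1)])
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)
```

The --temperature_decay flag suggests the Gumbel-softmax temperature can be annealed over training; the sketch keeps it fixed for simplicity.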
A model checkpoint for the HOMER-based encoder is available here. To use it to encode the previously collected dataset and reproduce Figure 4 (visualizing both the original observation visitation and the latent state representation captured by the encoder), use this notebook as before.
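If you prefer to encode the dataset outside the notebook, a hypothetical sketch of loading a checkpoint and mapping observations to latent states is shown below; the checkpoint path, observation dimensions, and the HomerEncoder class (from the sketch above) are assumptions, not the repository's actual interface.

```python
import torch

# Assumed shapes: 2-D observations in the continuous grid, 4 discrete actions.
model = HomerEncoder(obs_dim=2, action_dim=4)
model.load_state_dict(torch.load("outputs/models/homer_encoder.pt"))
model.eval()

# Placeholder batch; replace with the observations collected by the random agent.
observations = torch.rand(10000, 2)

with torch.no_grad():
    # The argmax over the encoder's logits gives each observation's latent state,
    # which can then be histogrammed to visualize latent state visitation.
    latent_states = model.encoder(observations).argmax(dim=-1)
```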