
Reproducing experiments


This wiki provides instructions on how to reproduce most of the experiments presented in the NeurIPS 2022 Offline RL Workshop paper "Towards Data-Driven Offline Simulations for Online Reinforcement Learning" by Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, and Sebastian Kochman.

Figure 2: Illustrative Example of Evaluation Protocol


Figure 2 in the paper illustrates the fidelity vs. efficiency trade-off between different simulations. See Appendix B.1 in the paper for details.

To see how this figure was produced, refer to the notebook notebooks/metrics.ipynb.
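As a rough illustration of the kind of comparison behind such a figure, the sketch below scores a simulation by how well it ranks candidate policies relative to the real environment. The function name, metric choice (Spearman rank correlation), and the data are illustrative assumptions, not the notebook's actual implementation.

```python
# Illustrative sketch only: the real metric definitions live in
# notebooks/metrics.ipynb. Rank correlation between simulated and real
# policy returns is an assumed stand-in for a fidelity measure.
import numpy as np
from scipy.stats import spearmanr

def rank_fidelity(sim_returns, env_returns):
    """Higher is better: does the simulation rank policies the same way
    the real environment does?"""
    return spearmanr(sim_returns, env_returns).correlation

# Hypothetical returns for five candidate policies
sim_returns = np.array([0.8, 0.6, 0.9, 0.3, 0.5])
env_returns = np.array([0.7, 0.5, 0.95, 0.2, 0.55])
print(rank_fidelity(sim_returns, env_returns))  # close to 1.0 => high fidelity
```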

Figure 4: State Encoder


The steps below cover: collecting Continuous Grid data with a random policy, visualizing state visitation in the collected dataset, and using (or training from scratch) the observation encoder to generate latent states. You can run only the steps you need.

Collecting Continuous Grid data with a random policy

python examples/continuous_grid/random_agent_rollout.py
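The command above runs the data-collection script. As a rough sketch of what such a script does, the loop below rolls out a uniformly random policy and records transitions; the gym-style reset/step API and the tuple-based dataset format are assumptions, and the actual logic is in examples/continuous_grid/random_agent_rollout.py.

```python
# Minimal sketch of random-policy data collection, assuming a gym-style
# environment API; see examples/continuous_grid/random_agent_rollout.py
# for the real script and dataset format.
def collect_random_rollouts(env, num_episodes=1000, max_steps=100):
    dataset = []
    for _ in range(num_episodes):
        obs = env.reset()
        for _ in range(max_steps):
            action = env.action_space.sample()          # uniformly random policy
            next_obs, reward, done, info = env.step(action)
            dataset.append((obs, action, reward, next_obs, done))
            obs = next_obs
            if done:
                break
    return dataset
```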

Visualizing state visitation in the collected dataset

  • To visualize the state visitation in your dataset, use the notebook; a rough plotting sketch follows below.
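A rough plotting sketch, assuming the collected dataset exposes the visited 2-D (x, y) positions of the Continuous Grid; the notebook contains the actual visualization code.

```python
# Sketch of a state-visitation heatmap over visited (x, y) positions.
import numpy as np
import matplotlib.pyplot as plt

def plot_state_visitation(positions, bins=50):
    """positions: array of shape (N, 2) with visited (x, y) states."""
    positions = np.asarray(positions)
    plt.hist2d(positions[:, 0], positions[:, 1], bins=bins, cmap="viridis")
    plt.colorbar(label="visit count")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("State visitation under the random policy")
    plt.show()
```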

Using the observation encoder (state decoder) to generate latent states

  • We have made a model checkpoint for the HOMER-based encoder available here; a rough sketch of loading a checkpoint and generating latent states follows the training command below.

  • To visualize the latent state representation captured by the encoder, use the same notebook as before.

  • To train the encoder from scratch, run the script below with the following configuration:

python examples/continuous_grid/train_homer_encoder.py --num_epochs=1000 --seed=0 --batch_size=64 --latent_size=50 --hidden_size=64 --lr=1e-3 --weight_decay=0.0 --temperature_decay=False --output_dir='outputs/models' --num_samples=100000
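Once a checkpoint exists (either the one linked above or one produced by the training command), generating latent states amounts to running observations through the encoder. The checkpoint filename, loading call, and argmax readout below are assumptions for illustration, not the repository's exact API.

```python
# Hedged sketch: load a trained encoder and map observations to discrete
# latent-state ids. The path under --output_dir and the argmax readout
# over --latent_size logits are assumptions.
import torch

encoder = torch.load("outputs/models/homer_encoder.pt", map_location="cpu")
encoder.eval()

def encode_observations(encoder, observations):
    """Return one latent-state id per observation by taking the argmax
    over the encoder's latent logits."""
    with torch.no_grad():
        logits = encoder(torch.as_tensor(observations, dtype=torch.float32))
        return logits.argmax(dim=-1)
```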
