Note that there may be minor bugs in the code.
You can run this code on your own machine or on Google Colab.
- Local option: If you choose to run locally, you will need to install MuJoCo and some Python packages; see installation.md from homework 1 for instructions. If you completed this installation for homework 1, you do not need to repeat it.
- Colab: The first few sections of the notebook will install all required dependencies. You can try out the Colab option by clicking the badge below:
The following files have blanks to be filled with your solutions from homework 1. The relevant sections are marked with "TODO: get this from hw1".
You will then need to complete the following new files for homework 2. The relevant sections are marked with "TODO".
You will also want to look through scripts/run_hw2.py (if running locally) or scripts/run_hw2.ipynb (if running on Colab), though you will not need to edit these files beyond changing runtime arguments in the Colab notebook.
You will be running your policy gradients implementation in four experiments total, investigating the effects of design decisions like reward-to-go estimators, neural network baselines for variance reduction, and advantage normalization. See the assignment PDF for more details.
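Two of the design decisions mentioned above, reward-to-go return estimation and advantage normalization, can be sketched in a few lines. This is an illustrative sketch only, not the assignment's reference implementation; the function names and the discount parameter `gamma` are assumptions for the example:

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    # Discounted reward-to-go: the return at step t sums only rewards
    # from t onward, rather than the full-trajectory return.
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def normalize_advantages(adv, eps=1e-8):
    # Standardize advantages to zero mean and unit variance,
    # a common variance-reduction trick for policy gradients.
    adv = np.asarray(adv, dtype=float)
    return (adv - adv.mean()) / (adv.std() + eps)
```

For example, with `gamma=1.0` the rewards `[1, 1, 1]` yield reward-to-go values `[3, 2, 1]`: later timesteps do not receive credit for rewards that came before them.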
We have provided a snippet for reading your Tensorboard event files in scripts/read_results.py. Reading these event files and plotting them with matplotlib or seaborn will produce the cleanest results for your submission. For debugging purposes, we recommend visualizing the Tensorboard logs with `tensorboard --logdir data`.