hw2

This homework (code + data visualization) is complete.

Note that there may be minor bugs in the code.

Setup

You can run this code on your own machine or on Google Colab.

  1. Local option: If you choose to run locally, you will need to install MuJoCo and some Python packages; see installation.md from homework 1 for instructions. If you completed this installation for homework 1, you do not need to repeat it.
  2. Colab: The first few sections of the notebook will install all required dependencies. You can try out the Colab option by clicking the badge below:

Open In Colab

Complete the code

The following files have blanks to be filled with your solutions from homework 1. The relevant sections are marked with "TODO: get this from hw1".

You will then need to complete the following new files for homework 2. The relevant sections are marked with "TODO".

You will also want to look through scripts/run_hw2.py (if running locally) or scripts/run_hw2.ipynb (if running on Colab), though you will not need to edit these files beyond changing runtime arguments in the Colab notebook.

You will be running your policy gradients implementation in four experiments total, investigating the effects of design decisions like reward-to-go estimators, neural network baselines for variance reduction, and advantage normalization. See the assignment PDF for more details.
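To make the design decisions above concrete, here is a minimal sketch of the two return estimators and of advantage normalization for a single trajectory. It is not the assignment's reference solution: the function names (discounted_return, reward_to_go, normalize) and the eps constant are assumptions chosen for illustration, and it uses plain Python lists rather than the array types used in the homework code.

```python
from statistics import fmean, pstdev


def discounted_return(rewards, gamma):
    """Full-trajectory estimator: every timestep is credited with the
    same total discounted return of the whole trajectory."""
    total = sum(gamma ** t * r for t, r in enumerate(rewards))
    return [total] * len(rewards)


def reward_to_go(rewards, gamma):
    """Reward-to-go estimator: timestep t is credited only with the
    discounted sum of rewards from t onward."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each entry reuses the sum already computed for t + 1.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def normalize(advantages, eps=1e-8):
    """Rescale advantages to roughly zero mean and unit standard deviation.
    eps (a hypothetical default) guards against division by zero."""
    mean = fmean(advantages)
    std = pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]


rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 0.5))  # [1.75, 1.75, 1.75]
print(reward_to_go(rewards, 0.5))       # [1.75, 1.5, 1.0]
```

Reward-to-go drops rewards earned before timestep t, which the action at t could not have influenced. A neural network baseline would additionally subtract a learned value estimate from these quantities before normalization; that step is omitted here.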

Plotting your results

We have provided a snippet in scripts/read_results.py that may be used for reading your TensorBoard event files. Reading these event files and plotting them with matplotlib or seaborn will produce the cleanest results for your submission. For debugging purposes, we recommend visualizing the TensorBoard logs using tensorboard --logdir data.
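As one possible way to go from event files to a matplotlib figure, the sketch below collects event files under a log directory and plots one scalar per run. It is not the contents of scripts/read_results.py. The helper names and the default tag "Eval_AverageReturn" are assumptions, so substitute whichever scalar tag your runs actually log. It relies on TensorBoard's EventAccumulator class and on event files being named events.out.tfevents.*.

```python
import glob
import os


def find_event_files(logdir):
    """Return all TensorBoard event files under logdir, sorted by path."""
    pattern = os.path.join(logdir, "**", "events.out.tfevents.*")
    return sorted(glob.glob(pattern, recursive=True))


def load_scalar(event_file, tag):
    """Return (steps, values) for one scalar tag in one event file.
    Raises KeyError if the tag was never logged."""
    # Imported lazily so find_event_files works without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    acc = EventAccumulator(event_file)
    acc.Reload()  # parse the file from disk
    events = acc.Scalars(tag)
    return [e.step for e in events], [e.value for e in events]


def plot_runs(logdir, tag="Eval_AverageReturn"):
    """Plot one curve per event file; the tag default is a hypothetical name."""
    import matplotlib.pyplot as plt

    for path in find_event_files(logdir):
        steps, values = load_scalar(path, tag)
        # Label each curve with the name of the run directory.
        plt.plot(steps, values, label=os.path.basename(os.path.dirname(path)))
    plt.xlabel("iteration")
    plt.ylabel(tag)
    plt.legend()
    plt.show()
```

Calling plot_runs("data") would overlay every run found under the data directory, which is convenient for comparing the settings of one experiment in a single figure.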