This is my submission for the Udacity project Collaborate and Compete. The following is a description of the project:
For this project, you will work with the Tennis environment.
In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own, local observation. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
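For reference, here is a minimal sketch of how these per-agent observations and actions are exposed through the standard `unityagents` API (assuming the usual Udacity setup; `Tennis.app` refers to the macOS build shipped in this repo):

```python
from unityagents import UnityEnvironment
import numpy as np

env = UnityEnvironment(file_name="Tennis.app")  # path to the environment build
brain_name = env.brain_names[0]                 # the Tennis environment has a single brain
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
print('Number of agents:', len(env_info.agents))                   # 2
print('Action size:', brain.vector_action_space_size)              # 2 continuous actions
print('Observations shape:', env_info.vector_observations.shape)   # one row per agent
```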
The task is episodic, and in order to solve the environment, your agents must get an average score of +0.5 (over 100 consecutive episodes, after taking the maximum over both agents). Specifically,
- After each episode, we add up the rewards that each agent received (without discounting), to get a score for each agent. This yields 2 (potentially different) scores. We then take the maximum of these 2 scores.
- This yields a single score for each episode.
The environment is considered solved when the average (over 100 episodes) of those scores is at least +0.5.
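Continuing the sketch above, a single episode's score can be computed like this (random actions stand in for a trained policy):

```python
# One episode with random actions; `env`, `brain_name`, `brain` as in the snippet above.
env_info = env.reset(train_mode=False)[brain_name]
num_agents = len(env_info.agents)
action_size = brain.vector_action_space_size
scores = np.zeros(num_agents)                       # undiscounted return per agent

while True:
    actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards                      # add up each agent's rewards
    if np.any(env_info.local_done):
        break

print('Episode score (max over agents):', np.max(scores))
```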
- Clone the repository using the following command:
git clone --recurse-submodules https://github.com/sumitpai/Udacity-Tennis.git
It is very important that you add the --recurse-submodules flag, because rllib is an embedded submodule and will not be downloaded correctly without it.
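If you have already cloned the repository without the flag, you can still fetch the submodule afterwards with:

```sh
git submodule update --init --recursive
```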
- Install Python 3.6 and PyTorch 1.0.
- Install the Unity environment as described in the Getting Started section (the Unity ML-Agents environment is already configured by Udacity).
- The current repo contains the environment build for macOS. If you are using any other operating system, download the corresponding build and update the code to point to the downloaded environment.
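For example, on Linux the only change needed is the path passed to `UnityEnvironment` (the path below is illustrative; use wherever you unpacked the build):

```python
from unityagents import UnityEnvironment

# Illustrative: point the loader at the build for your OS.
env = UnityEnvironment(file_name="Tennis_Linux/Tennis.x86_64")
```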
- Tennis.ipynb can be run to train or test the agents. You can skip the training cells and jump directly to the last cell to load the saved checkpoints and watch the trained agents perform the task. Load the appropriate environment before loading the checkpoints.
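The exact agent class and checkpoint filenames come from the notebook and the rllib submodule; the following is a purely hypothetical sketch of the loading step (the names `agents`, `actor_local`, and `checkpoint_actor_{i}.pth` are assumptions, not this repo's actual identifiers):

```python
import torch

# Hypothetical names: restore each agent's actor network from its saved checkpoint.
for i, agent in enumerate(agents):
    state_dict = torch.load('checkpoint_actor_{}.pth'.format(i), map_location='cpu')
    agent.actor_local.load_state_dict(state_dict)
```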
- I am building a library for reinforcement learning. The rllib folder is an embedded repository that will be updated from time to time; all three of my projects use the same repo. If you are interested in collaborating, you can fork it and contribute. I would be very grateful for your contributions.