Snake PPO

A pedagogical Proximal Policy Optimization (PPO) project applied to the classic Snake game.

Overview

This repository demonstrates how an agent trained with the PPO algorithm learns to play Snake by collecting food and avoiding collisions. Over time, the agent discovers a strategy of quickly circling around the reward before taking it, which reduces self-collisions since its observation does not precisely encode the position of its entire tail.

Installation

  1. Clone this repository.
  2. Install dependencies:
    pip install -r requirements.txt
  3. You can then run or modify main.ipynb to train or test the PPO agent.

PPO Algorithm

PPO is a policy gradient method designed to stabilize training by limiting how far each update can move the policy. In this project:

  • We use clipping (controlled by `clip_epsilon`) to avoid overly large policy updates (see the loss sketch below).
  • We incorporate entropy regularization (controlled by `entropy_coef`) to encourage exploration.
  • We apply Generalized Advantage Estimation (GAE) for more stable advantage computation.
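To make the first two points concrete, here is a minimal PyTorch-style sketch of the clipped surrogate loss with an entropy bonus, reusing the project's hyperparameter names (`clip_epsilon`, `entropy_coef`). It illustrates the objective only and is not the repository's exact implementation; the real training code lives in `PPOAgent.train` inside main.ipynb.

    import torch

    def clipped_ppo_loss(log_probs_new, log_probs_old, advantages, entropy,
                         clip_epsilon=0.2, entropy_coef=0.01):
        # Probability ratio between the updated policy and the policy that collected the data.
        ratio = torch.exp(log_probs_new - log_probs_old)
        # Clipped surrogate objective: take the pessimistic minimum of the unclipped
        # and clipped terms so that overly large policy updates are not rewarded.
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
        policy_loss = -torch.min(unclipped, clipped).mean()
        # Entropy bonus keeps the policy from collapsing too early and encourages exploration.
        return policy_loss - entropy_coef * entropy.mean()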

The core training code is in the `train` function of `PPOAgent`, and the environment loop lives in `SnakeEnv`.
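Similarly, Generalized Advantage Estimation is typically computed as a backward pass over the collected rewards and value estimates. The sketch below is a generic illustration rather than this repository's code; the hyperparameter names `gamma` and `gae_lambda` are assumptions.

    import numpy as np

    def compute_gae(rewards, values, dones, gamma=0.99, gae_lambda=0.95):
        # `values` holds one extra entry: the bootstrap value estimate of the
        # state following the last collected step.
        advantages = np.zeros(len(rewards), dtype=np.float32)
        last_gae = 0.0
        # Walk backwards through the trajectory, accumulating the discounted
        # TD residuals delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
        for t in reversed(range(len(rewards))):
            next_non_terminal = 1.0 - float(dones[t])
            delta = rewards[t] + gamma * values[t + 1] * next_non_terminal - values[t]
            last_gae = delta + gamma * gae_lambda * next_non_terminal * last_gae
            advantages[t] = last_gae
        # Value-function targets are the advantages plus the baseline estimates.
        returns = advantages + np.asarray(values[:-1], dtype=np.float32)
        return advantages, returns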

Demo

Below is a demo of the trained agent playing, occasionally circling around the reward to avoid self-collisions (it does not know its tail's exact position; see main.ipynb for details on the observation space):

Demo GIF placeholder

Results

The agent’s reward curve increases as it masters collecting food. Episode length first rises as survival improves, then eventually decreases as the agent trades longevity for quicker gains:

Reward Curves placeholder

Usage

  • Train the agent by running:
    # Inside main.ipynb
    agent.train(total_epochs=10000, steps_per_epoch=4096)
  • Test the trained agent (with optional rendering; see the end-to-end sketch below):
    agent.test_episode(render=True)
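For orientation, the snippet below sketches how these calls might fit together end to end. The constructor arguments are illustrative assumptions; both `SnakeEnv` and `PPOAgent` are defined in main.ipynb, where the actual signatures live.

    # Hypothetical wiring: constructor arguments are illustrative only.
    env = SnakeEnv()        # the Snake environment
    agent = PPOAgent(env)   # PPO agent acting in that environment

    agent.train(total_epochs=10000, steps_per_epoch=4096)  # train the agent
    agent.test_episode(render=True)                        # watch it play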

Explore main.ipynb for more details on experiments, and review the in-code comments for deeper understanding.