Using deep reinforcement learning to play the classic Breakout game.


RL_breakout - Work in progress!

Description

This repository contains an implementation of a Deep Q-Network (DQN) to train an agent to play the classic Breakout game. The agent leverages convolutional neural networks to process game frames and utilizes reinforcement learning techniques, such as experience replay and target networks, to learn effective strategies for maximizing rewards. The model is developed using TensorFlow and the OpenAI Gym environment for gameplay simulation.

The main features of the code are as follows:

  • Deep Reinforcement Learning: utilizes a DQN architecture with convolutional neural networks to play Breakout

  • Experience Replay: stores past experiences in a buffer and randomly samples them to break the correlation between consecutive experiences and improve learning stability

  • Target Network: incorporates a separate target network to stabilize training by updating it less frequently than the main network

  • Image Preprocessing: converts game frames to grayscale, resizes them to 84x84 pixels, and normalizes pixel values to [0, 1] before feeding them into the network

  • Epsilon-Greedy Policy: balances exploration and exploitation through an epsilon-greedy approach (a minimal sketch of how these pieces fit together follows below).
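
The sketch below illustrates how these components interact. It is a minimal illustration rather than the repository's actual code: the 4-frame input stack, layer sizes, buffer size, and default hyperparameters follow the original DQN paper and may differ from what testrun_v5_5k.py defines.

```python
# Minimal DQN sketch (illustrative only -- not the repository's exact implementation).
import random
from collections import deque

import numpy as np
import tensorflow as tf


def preprocess(frame):
    """Grayscale, resize to 84x84, scale pixel values to [0, 1]."""
    gray = tf.image.rgb_to_grayscale(frame)
    small = tf.image.resize(gray, (84, 84))
    return tf.cast(small, tf.float32) / 255.0


def build_q_network(num_actions):
    """Convolutional Q-network: stack of 4 preprocessed frames -> one Q-value per action."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(84, 84, 4)),
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(num_actions),
    ])


def select_action(q_network, state, epsilon, num_actions):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    q_values = q_network(state[None, ...], training=False)
    return int(tf.argmax(q_values[0]))


# Experience replay memory: stores (state, action, reward, next_state, done) tuples.
replay_buffer = deque(maxlen=100_000)


def train_step(q_network, target_network, optimizer, batch_size=32, gamma=0.99):
    """One DQN update from a random minibatch, bootstrapping from the target network."""
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    rewards = rewards.astype(np.float32)
    dones = dones.astype(np.float32)
    # Targets come from the slowly-updated target network for stability.
    next_q = tf.reduce_max(target_network(next_states, training=False), axis=1)
    targets = rewards + gamma * (1.0 - dones) * next_q
    with tf.GradientTape() as tape:
        q = q_network(states, training=True)
        q_taken = tf.reduce_sum(q * tf.one_hot(actions, q.shape[1]), axis=1)
        loss = tf.keras.losses.Huber()(targets, q_taken)
    grads = tape.gradient(loss, q_network.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_network.trainable_variables))
    return loss

# Every N steps, the online weights are copied into the target network:
#   target_network.set_weights(q_network.get_weights())
```

In a full training loop, each environment step appends a transition to replay_buffer, and train_step is called once the buffer holds enough samples.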

Pre-requisites:

Python 3.7+; OpenAI Gym (with the Atari package); TensorFlow 2.x; NumPy; PIL (Python Imaging Library); Matplotlib

Known-compatible versions of these dependencies are listed in requirements.txt. To install them, please follow step 2 of Usage.

Usage:

To use this code, please follow these steps:

  1. Clone the repository:
     git clone https://github.com/PreethaSaha/RL_breakout.git

  2. Install the required dependencies:
     pip install -r requirements.txt

  3. To train the DQN agent, run:
     python testrun_v5_5k.py
  

You can adjust training parameters such as the number of episodes, epsilon decay, and batch size in testrun_v5_5k.py.
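
For orientation, the constants involved look roughly like the following. The names and default values are illustrative, not the repository's actual definitions; check testrun_v5_5k.py for the real ones.

```python
# Illustrative only -- the actual constant names and values in testrun_v5_5k.py may differ.
NUM_EPISODES  = 5000    # total training episodes
BATCH_SIZE    = 32      # minibatch size sampled from the replay buffer
EPSILON_START = 1.0     # initial exploration rate
EPSILON_MIN   = 0.1     # exploration floor
EPSILON_DECAY = 0.995   # per-episode multiplicative decay of epsilon
```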

Results:

Training:

The training loop runs for a specified number of episodes. During each episode, the agent starts by exploring (random actions) to learn about the environment. As training progresses, it shifts toward exploiting the best-known actions, reducing random actions.
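
As a rough illustration of that exploration-to-exploitation shift (using the same assumed constants as above, not the script's actual schedule):

```python
# Illustrative epsilon schedule: starts fully exploratory and decays toward a floor.
EPSILON_START, EPSILON_MIN, EPSILON_DECAY = 1.0, 0.1, 0.995  # assumed values

def epsilon_for_episode(episode):
    """Early episodes act mostly at random; later episodes mostly exploit learned Q-values."""
    return max(EPSILON_MIN, EPSILON_START * EPSILON_DECAY ** episode)

print(epsilon_for_episode(0))    # 1.0 -> almost every action is random
print(epsilon_for_episode(500))  # 0.1 -> decayed to the floor; mostly greedy actions
```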

The training progress is saved in a CSV file. The model weights are saved in breakout_model_v5_XX_XX.h5 whenever the agent achieves a predefined reward threshold.
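
A rough sketch of what that logging and checkpointing amounts to. The CSV columns, the threshold value, and the meaning of the XX_XX placeholders in the file name are assumptions here, not the repository's exact code.

```python
# Illustrative logging/checkpointing sketch (assumed names and values).
import csv

REWARD_THRESHOLD = 30  # assumed value; testrun_v5_5k.py defines its own threshold

def log_and_checkpoint(model, episode, episode_reward, csv_path="training_progress.csv"):
    # Append this episode's progress to the CSV log (columns assumed: episode, reward).
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([episode, episode_reward])
    # Save model weights once the agent clears the reward threshold; the XX_XX
    # placeholders are assumed here to be the episode index and the achieved reward.
    if episode_reward >= REWARD_THRESHOLD:
        model.save(f"breakout_model_v5_{episode}_{int(episode_reward)}.h5")
```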


Future improvements:

  • Parameter Tuning: experiment with different hyperparameters for better training
  • Prioritized Experience Replay: use a prioritized experience replay buffer so that more informative transitions are replayed more often (see the sketch below)
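
A minimal sketch of proportional prioritized replay, as it might look if added here. This is not yet part of the repository; the class and parameter names are illustrative.

```python
# Proportional prioritized replay sketch (not implemented in this repo yet).
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha            # how strongly priorities skew sampling
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are seen at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.buffer)] ** self.alpha
        probs = prios / prios.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.buffer) * probs[indices]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in indices], indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Transitions with larger TD error are replayed more often.
        self.priorities[indices] = np.abs(td_errors) + eps
```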

Contributing:

Contributions are welcome! Please submit issues and pull requests for improvements or bug fixes.

License:

This project is licensed under the MIT License - see the LICENSE file for more details.

References:

  1. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D. (2015). "Human-level control through deep reinforcement learning." Nature, 518(7540), 529-533.
  2. Lin, L. J. (1992). "Self-improving reactive agents based on reinforcement learning, planning and teaching." Machine Learning, 8(3-4), 293-321.
  3. Bellemare, M. G., Naddaf, Y., Veness, J., & Bowling, M. (2013). "The Arcade Learning Environment: An Evaluation Platform for General Agents." Journal of Artificial Intelligence Research, 47, 253-279.
