The project explores reinforcement learning (RL) in a dynamic computer game environment using the ViZDoom library and OpenAI Gym. The main objective is to train an agent capable of completing a given game level after prior learning.
Doom, being a first-person shooter, allows a variety of game levels, called scenarios, to be played. The scenarios used in this project are:
- Basic - a simple scenario with a single room and one enemy to kill
- Defend The Center - the player is placed in the center of a room and has to defend it from enemies approaching from every side
- Deadly Corridor - the objective is to reach the vest waiting at the end of the corridor, while enemies shoot at the player along the way
- My Way Home - a peculiar scenario in which the player has to find a way out of a maze-like environment, with no enemies
- `scenarios/` - contains the files necessary to run the scenarios. Each scenario has its own pair of files:
  - `.cfg` - configuration file for the scenario
  - `.wad` - file that contains level data such as textures
- `utils/` - contains two files setting the foundation for training RL agents:
  - `environments.py` - defines the `VizDoomGym` class, a wrapper that sets up game instances for training (a minimal sketch of such a wrapper is shown after this list)
  - `callbacks.py` - contains the callback class for logging training progress and saving models at various stages of the training process
- `Basic/`, `DefendTheCenter/`, `DeadlyCorridor/`, `MyWayHome/` - each scenario has its own directory containing the implementation of the training in the corresponding `.py` file. To start working on your own scenario, it is recommended to copy one of the existing directories and modify the code accordingly: set the desired paths for logs and models, change the scenario name, and adjust the training parameters such as the learning rate or the number of training episodes.
- `training/logs` - log files from the training process can be found here. To access the log dashboards:
  - navigate in the command line to the directory containing the log file
  - run `tensorboard --logdir .`
  - open a browser and go to `http://localhost:6006/`
- `training/train` - contains folders with the trained models obtained throughout the training process. The models may be trained further from the point at which they were saved.
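The sketch below illustrates the kind of Gym wrapper that `environments.py` provides: a ViZDoom `DoomGame` instance wrapped in the standard `gym.Env` interface. The class name mirrors `VizDoomGym` from the repository, but the observation shape, frame skip, and preprocessing shown here are illustrative assumptions, not the project's exact implementation.

```python
import numpy as np
import cv2
from gym import Env
from gym.spaces import Box, Discrete
from vizdoom import DoomGame


class VizDoomGym(Env):
    """Minimal sketch of a Gym wrapper around a ViZDoom scenario."""

    def __init__(self, config_path="scenarios/basic.cfg", render=False):
        super().__init__()
        # Load the scenario from its .cfg file and start the game
        self.game = DoomGame()
        self.game.load_config(config_path)
        self.game.set_window_visible(render)
        self.game.init()

        # Grayscale, downscaled screen buffer as the observation (assumed sizes)
        self.observation_space = Box(low=0, high=255, shape=(100, 160, 1), dtype=np.uint8)
        # One discrete action per button defined in the scenario config
        self.action_space = Discrete(self.game.get_available_buttons_size())

    def step(self, action):
        # One-hot encode the chosen action for ViZDoom; frame skip of 4 is an assumption
        actions = np.identity(self.action_space.n, dtype=np.uint8)
        reward = self.game.make_action(actions[action].tolist(), 4)
        done = self.game.is_episode_finished()
        state = self.game.get_state()
        if state is not None:
            obs = self._preprocess(state.screen_buffer)
        else:
            obs = np.zeros(self.observation_space.shape, dtype=np.uint8)
        return obs, reward, done, {}

    def reset(self):
        self.game.new_episode()
        return self._preprocess(self.game.get_state().screen_buffer)

    def _preprocess(self, frame):
        # ViZDoom returns channels-first frames; convert to grayscale and resize
        gray = cv2.cvtColor(np.moveaxis(frame, 0, -1), cv2.COLOR_RGB2GRAY)
        resized = cv2.resize(gray, (160, 100), interpolation=cv2.INTER_CUBIC)
        return np.expand_dims(resized, axis=-1)

    def close(self):
        self.game.close()
```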
PPO (Proximal Policy Optimization) is an on-policy method: it learns through direct interaction with the environment by executing actions. It relies on two neural networks: a policy network that selects actions and a value network that estimates the expected return for a given state. Learning consists of the agent interacting with the environment, comparing the received reward with the expected value, and updating the network weights so that actions which yielded the highest relative reward become more likely.
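As an illustration of how such an agent can be trained on the wrapper above, the snippet below assumes the stable-baselines3 library (suggested by the PPO/A2C algorithm names and the TensorBoard logs); the paths, hyperparameters, and checkpoint callback are placeholders rather than the project's exact settings.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Hypothetical paths mirroring the repository layout
env = VizDoomGym(config_path="scenarios/basic.cfg")

# Periodically save model snapshots, similar in spirit to callbacks.py
checkpoint = CheckpointCallback(save_freq=10_000, save_path="training/train/basic")

model = PPO(
    "CnnPolicy",
    env,
    tensorboard_log="training/logs",  # viewable with `tensorboard --logdir .`
    learning_rate=1e-4,
    n_steps=2048,
    verbose=1,
)
model.learn(total_timesteps=100_000, callback=checkpoint)
```

A saved snapshot can later be reloaded with `PPO.load(...)` and passed back to `learn()`, which is how the models under `training/train` can be trained further from the point at which they were saved.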
A2C (Advantage Actor-Critic) is a learning algorithm that combines value-based and policy-based learning. In actor-critic methods, two networks are trained: the actor, which represents the policy function responsible for selecting actions, and the critic, which is the value function that evaluates the actions taken by the actor.
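Under the same assumptions as the PPO sketch above, switching to A2C in stable-baselines3 only changes the import and constructor, since both algorithms share the same interface:

```python
from stable_baselines3 import A2C

# Same environment and logging setup as before; hyperparameters are illustrative
model = A2C("CnnPolicy", env, tensorboard_log="training/logs", verbose=1)
model.learn(total_timesteps=100_000)
```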