This repository implements Q-learning for tic-tac-toe in Python.
Author: Thomas Wong
The update rule of Q-learning is given by:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $Q(s, a)$ is the Q-value of the state-action pair $(s, a)$,
$r$ is the reward of the state-action pair,
$\alpha$ is the learning rate,
$\gamma$ is the discount factor, and
$\max_{a'} Q(s', a')$ is the maximum Q-value of the next state $s'$.
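For reference, here is a minimal Python sketch of this update; the function name and the dictionary-based Q-table are illustrative assumptions, not necessarily how this repository stores its values:

```python
from collections import defaultdict

# Q-table mapping (state, action) pairs to values; unseen pairs default to 0.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, next_actions, alpha=0.2, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```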
The parameters of the Q-learning algorithm are:
- the learning rate alpha (default: 0.2)
- the discount factor gamma (default: 0.9)
- the exploration rate epsilon of the epsilon-greedy policy (default: 0.3)
They can be set in the agent class or when the agent is initialized, but the default values are recommended, as they gave the best results across several experiments.
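As an illustration, overriding the defaults at initialization might look like the following (the `Agent` class name, module path, and keyword arguments are assumptions based on the parameter list above):

```python
from agent import Agent  # hypothetical module and class name

agent = Agent()                                     # recommended defaults: alpha=0.2, gamma=0.9, epsilon=0.3
custom = Agent(alpha=0.1, gamma=0.95, epsilon=0.2)  # overriding the defaults
```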
Python 3.7 was used for this project, but any later version should work.
Install the required packages by running:

```
pip install -r requirements.txt
```
To play the game, run:

```
python main.py
```

The program will ask whether you want to go first; answer by typing `y` or `n`.
To make a move, enter the row number and the column number (0 for the first row or column, 1 for the second, and so on). For example, row 0 and column 2 is the top-right cell.
To retrain the agents, uncomment the following line in the `main.py` file:

```python
# init_training(100000, resume=False)
```
The above line will train two agents for 100,000 games of self-play.
100,000 games appears to be enough for the agents to converge; once trained, they are unbeatable (the best you can achieve against them is a draw).
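For intuition, below is a self-contained sketch of what such self-play training could look like. It is an assumption about the approach, not the repository's actual `init_training` code: two tabular Q-learning agents play each other, each updating on its own transitions with intermediate reward 0 and a terminal reward of +1 for a win, -1 for a loss, and 0 for a draw.

```python
import random
from collections import defaultdict

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_training(n_games, alpha=0.2, gamma=0.9, epsilon=0.3):
    q = {'X': defaultdict(float), 'O': defaultdict(float)}
    for _ in range(n_games):
        board, mark = [' '] * 9, 'X'
        prev = {'X': None, 'O': None}  # each player's last (state, action)
        while True:
            state = ''.join(board)
            moves = [i for i, c in enumerate(board) if c == ' ']
            # A player's previous move is updated once the opponent has
            # replied, so the reply is part of the observed transition.
            if prev[mark] is not None:
                s, a = prev[mark]
                best_next = max(q[mark][(state, m)] for m in moves)
                q[mark][(s, a)] += alpha * (gamma * best_next - q[mark][(s, a)])
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                move = random.choice(moves)
            else:
                move = max(moves, key=lambda m: q[mark][(state, m)])
            prev[mark] = (state, move)
            board[move] = mark
            win = winner(board)
            if win or ' ' not in board:
                # Terminal update: +1 win, -1 loss, 0 draw.
                for p in 'XO':
                    if prev[p] is None:
                        continue
                    r = 0.0 if win is None else (1.0 if p == win else -1.0)
                    s, a = prev[p]
                    q[p][(s, a)] += alpha * (r - q[p][(s, a)])
                break
            mark = 'O' if mark == 'X' else 'X'
    return q

q_tables = self_play_training(100000)
```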