This repository implements Q-learning for tic-tac-toe in Python.
Author: Thomas Wong
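The update rule of Q-learning is given by:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $Q(s, a)$ is the Q-value of the state-action pair,
$r$ is the reward of the state-action pair,
$\alpha$ is the learning rate,
$\gamma$ is the discount factor, and
$\max_{a'} Q(s', a')$ is the maximum Q-value of the next state.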
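As a rough illustration, one tabular Q-learning update could be sketched in Python as follows. The function name `q_update`, the dictionary-based Q-table, and the string state labels are assumptions made for this example and are not taken from the repository's code.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.2, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Maximum Q-value over the actions available in the next state;
    # a terminal state has no actions, so its value is taken as 0.0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: unseen state-action pairs start at 0.0.
q = defaultdict(float)
new_value = q_update(q, "s0", 4, 1.0, "s1", [])
```

With the default learning rate, a reward of 1.0 on a terminal transition moves a fresh Q-value from 0.0 to 0.2, that is, one fifth of the way toward the target.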
The parameters of the Q-learning algorithm are:
- the learning rate alpha (default: 0.2)
- the discount factor gamma (default: 0.9)
- the exploration rate epsilon of the epsilon-greedy policy (default: 0.3)
They can be set in the agent class or when the agent is initialized. The default values are recommended, as they gave the best results in several experiments.
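The sketch below shows how these three parameters might be passed at construction time and how epsilon drives action selection. The class `QAgent`, its attribute names, and the `choose_action` method are hypothetical and only illustrate the idea; the repository's agent class may look different.

```python
import random


class QAgent:
    """Hypothetical agent holding the three Q-learning parameters."""

    def __init__(self, alpha=0.2, gamma=0.9, epsilon=0.3, seed=None):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # probability of taking a random action
        self.q = {}             # maps (state, action) to a Q-value
        self.rng = random.Random(seed)

    def choose_action(self, state, legal_actions):
        # Explore with probability epsilon, otherwise pick the action
        # with the highest known Q-value (unseen pairs count as 0.0).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(legal_actions)
        return max(legal_actions, key=lambda a: self.q.get((state, a), 0.0))


agent = QAgent()                # uses the defaults listed above
greedy = QAgent(epsilon=0.0)    # never explores
greedy.q[("s", 2)] = 1.0
```

Setting epsilon to 0 turns the policy into a purely greedy one, which is a common choice when playing against a trained agent rather than training it.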
Python 3.7 was used for this project but any later version should work.
Install the required packages by running:
pip install -r requirements.txt
To play the game, simply run
The program will ask whether you want to go first or second; answer by typing the letter y or n.
To make a move, enter the row number and the column number (0 for the first row or column, 1 for the second, and so on).
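For illustration, zero-based move input could be validated along these lines. The function `parse_move`, the single-line input format ("row col"), and the list-of-lists board with `" "` for empty cells are assumptions made for this sketch, not the program's actual input handling.

```python
def parse_move(text, board):
    """Return (row, col) for a usable move, or None if the input is invalid."""
    parts = text.replace(",", " ").split()
    # Expect exactly two non-negative integers.
    if len(parts) != 2 or not all(p.isdecimal() for p in parts):
        return None
    row, col = int(parts[0]), int(parts[1])
    # Rows and columns are numbered 0, 1, 2.
    if row > 2 or col > 2:
        return None
    # The chosen cell must still be empty.
    if board[row][col] != " ":
        return None
    return row, col


# Hypothetical board with the centre already taken.
board = [[" "] * 3 for _ in range(3)]
board[1][1] = "X"
```

Here `parse_move("0 2", board)` selects the first row and third column, while `parse_move("1 1", board)` is rejected because the centre cell is occupied.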
To retrain the agents, uncomment the following line in the
# init_training(100000, resume=False)
The above line trains two agents for 100,000 games in which they play against each other.
100,000 games seem to be enough for the agents to converge, at which point they are unbeatable.
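To make the self-play idea concrete, here is a compact sketch of two tabular Q-learning agents training against each other. It is not the repository's `init_training`: the function `train_self_play`, the string board encoding, and the reward scheme (+1 for a win, -1 for a loss, 0 for a draw) are all assumptions chosen for this example.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def train_self_play(n_games, alpha=0.2, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = {"X": {}, "O": {}}  # one Q-table per agent
    results = {"X": 0, "O": 0, "draw": 0}
    for _ in range(n_games):
        board = [" "] * 9
        last = {"X": None, "O": None}  # each agent's latest (state, action)
        player = "X"
        while True:
            state = "".join(board)
            legal = [i for i, c in enumerate(board) if c == " "]
            table = q[player]
            # The agent moves again: update its previous move with reward 0,
            # bootstrapping from the best Q-value in the current state.
            if last[player] is not None:
                best_next = max(table.get((state, a), 0.0) for a in legal)
                old = table.get(last[player], 0.0)
                table[last[player]] = old + alpha * (gamma * best_next - old)
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(legal)
            else:
                action = max(legal, key=lambda a: table.get((state, a), 0.0))
            board[action] = player
            last[player] = (state, action)
            other = "O" if player == "X" else "X"
            win = winner(board)
            if win or " " not in board:
                # Terminal update for both agents (assumed rewards).
                rewards = {player: 1.0 if win else 0.0,
                           other: -1.0 if win else 0.0}
                for p, r in rewards.items():
                    if last[p] is not None:
                        old = q[p].get(last[p], 0.0)
                        q[p][last[p]] = old + alpha * (r - old)
                results[win or "draw"] += 1
                break
            player = other
    return q, results


# Hypothetical usage with a small number of games.
q_tables, results = train_self_play(1000)
```

Each agent only updates its own table, and a non-terminal move is updated when the same agent is next to move, so the opponent's reply is treated as part of the environment.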