This repository contains the final project for the Reinforcement Learning course of M2 MVA 2018. Inspired by AlphaGo Zero, we apply the same method to English checkers (draughts), a famous strategy board game for two players.
- Linux or macOS
- Python 3 (3.4 or later preferred)
- PyTorch 1.0
- CPU, or NVIDIA GPU with CUDA and cuDNN
For pip users, run `pip install -r requirements.txt` to install the dependencies.
- Clone this repo and install the dependencies:

  ```bash
  git clone https://github.com/Tong-ZHAO/AlphaDraughts-Zero
  cd AlphaDraughts-Zero
  pip install -r requirements.txt
  ```
- The training parameters should be specified in `./src/config.py` beforehand. Some parameters can also be passed to `./src/train.py` as arguments:
  ```
  usage: train.py [-h] [--iterations N] [--lr LR] [--seed S] [--env ENV]

  Training of AlphaDraughts Zero

  optional arguments:
    -h, --help      show this help message and exit
    --iterations N  number of iterations of pipeline training
    --lr LR         learning rate (default: 0.01)
    --seed S        random seed (default: 42)
    --env ENV       visdom environment
  ```
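As a hedged illustration, flags like these might be declared with `argparse` along the following lines. This is a sketch, not the actual contents of `./src/train.py`; in particular, the `--iterations` default and the default visdom environment name are assumptions.

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the train.py argument parser;
    # defaults for --lr and --seed mirror the help text above.
    parser = argparse.ArgumentParser(description="Training of AlphaDraughts Zero")
    parser.add_argument("--iterations", type=int, default=100, metavar="N",
                        help="number of iterations of pipeline training")
    parser.add_argument("--lr", type=float, default=0.01, metavar="LR",
                        help="learning rate (default: 0.01)")
    parser.add_argument("--seed", type=int, default=42, metavar="S",
                        help="random seed (default: 42)")
    parser.add_argument("--env", type=str, default="main", metavar="ENV",
                        help="visdom environment")
    return parser

# Flags that are not passed keep their defaults.
args = build_parser().parse_args(["--lr", "0.005"])
```

Calling `parse_args([])` instead would pick up all the defaults, which is what `python train.py` with no arguments does.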
- To start the training:

  ```bash
  cd src
  python train.py
  ```
- To view the loss plots, run `python -m visdom.server` and open http://localhost:8097 in a browser.
- For more details of the training, check the log files in `./logs/`.
- To start the human-machine competition (qualitative evaluation):

  ```bash
  cd src
  python gui.py
  ```
- Some arguments can be passed to `./src/gui.py`:

  ```
  usage: gui.py [-h] [--checkpoint C] [--human H] [--simulation S] [--ai A]

  optional arguments:
    -h, --help      show this help message and exit
    --checkpoint C  which neural network model checkpoint to use
    --human H       "white" or "black", the side the human player plays;
                    the white side always goes first
    --simulation S  number of MCTS simulations at each time step to choose
                    the action
    --ai A          whether to use the AI: 1 plays against the AI, 0 does not
  ```
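The `--simulation` flag trades playing strength for speed: each move runs that many MCTS simulations, with child nodes ranked by an AlphaGo Zero-style PUCT score. A minimal sketch of that selection score follows; the names `uct_score` and `c_puct` are illustrative and not taken from the repository's MCTS code.

```python
import math

def uct_score(parent_visits, child_visits, child_value_sum, prior, c_puct=1.0):
    # PUCT rule: exploitation term Q (mean value of the child) plus an
    # exploration term U that favors high-prior, rarely visited moves.
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u
```

For an unvisited child the score reduces to the prior-driven exploration term, so moves the network likes are simulated first; more simulations let the visit counts correct a misleading prior.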
- Quantitative evaluation can be done using the functions provided in `./src/elo.py`.
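For reference, the standard Elo update that such an evaluation typically relies on can be sketched as follows. This is the generic formulation; the function names are illustrative and not necessarily those used in `./src/elo.py`.

```python
def expected_score(r_a, r_b):
    # Probability that player A beats player B under the Elo logistic model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(r_a, r_b, score_a, k=32):
    # score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Two equally rated agents; A wins one game.
# update_elo(1200, 1200, 1.0) -> (1216.0, 1184.0)
```

Playing a fixed number of games between two checkpoints and applying this update gives a relative rating curve over training iterations.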