PROBLEM DESCRIPTION
This is a Windows program that lets users play Ultimate Tic Tac Toe against the computer at varying levels of difficulty, or against other users in a pass-and-play style. Users can save and load games, and the game includes some quality-of-life improvements: a yellow highlight over valid moves and an annoying sound that plays when the user clicks on an invalid move. The AI uses a version of Monte Carlo Tree Search guided by neural networks, based on AlphaZero.
PROGRAM DOCUMENTATION
Compilation instructions
Follow the commands at the bottom of the CMakeLists.txt file under the "instructions" header (the text in parentheses is comments; please don't type it into the command line). After compiling, the resulting executables, along with any necessary files, are in the Release folder, which is a subfolder of build. If for some reason you can't compile, there should also be a Release folder in the same directory as build that contains everything fully compiled. Even if you do compile yourself, this folder is useful because it contains an example config.txt and a pretrained model in its model subfolder.
Ultimate Tic Tac Toe usage
After running the executable, open the menu and select new game to start playing; this step was a source of confusion for some people. To set up a model for the game to use, rename it to verifiedbest.pt and put it in the models folder.
Trainer usage
The trainer executable must be run from the terminal and can be given the filepath to a model as an argument to start training that model. The trainer needs a config.txt file in the same directory. There should be an example config.txt file in the precompiled Release folder, so just copy that over if you compiled trainer yourself.

Most of the options in config.txt are self-explanatory, but load examples and skip training aren't.

Load examples, when set to a number other than zero, skips generating examples for the first iteration and instead starts training on examples from files in the examples folder. The example files should be named temp1.ex, temp2.ex, and so on, up to the number of examples you set. If for some reason you want to create an example file from a source other than trainer, the file is formatted with one example per line. Each example alternates between the state of a space on the game board and the probability of moving there, and ends with the value of that game board. For example, with a game board of size four, "2 0 2 0 2 0 2 1 0.5" means that the lower right corner has a move probability of one, every other space on the board has a move probability of zero, and the value of the game board is one half, which most likely represents a tie but could also represent an equal number of wins for either player.

Skip training, when set to one, skips the training for the first iteration. On its own this is only useful for testing, but when both load examples and skip training are non-zero, the program skips straight to testing two models against each other. To input two models, enter their filepaths as command-line arguments in the order current model, previous model.

A number of temporary model and example files are produced during operation. These can be useful if someone kills the program, so that not all of the time spent is lost, or if you want to do some testing.
If the program is killed during the example generation phase, the examples should be saved at temp.ex. If the program is killed during the training phase, you can find the partially trained model in the models folder under the name temp2.pt, or you can restart the training with the examples at temp.ex. If the program is killed during the model testing phase, you can restart that phase with the models in the models folder under the names temp.pt and temp2.pt, which refer to the previous and current model respectively. Another tip is to redirect the trainer output to a file such as trainer.log when displaying games so that the command line doesn't run out of space.
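For anyone producing example files from another source, the line format described above (state and move probability pairs, followed by one value) could be read back with something like the sketch below. The ParsedExample struct and parseExampleLine function are hypothetical names made up for this illustration, not part of trainer, and storing the board state as integers is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for one line of an example file.
// The trainer's own types may look different.
struct ParsedExample {
    std::vector<int> board;      // state of each space (assumed to be an integer)
    std::vector<double> policy;  // move probability for each space
    double value = 0.0;          // value of the game board
};

// Parses one line of the form "state prob state prob ... value".
ParsedExample parseExampleLine(const std::string& line) {
    std::istringstream in(line);
    std::vector<double> tokens;
    double token = 0.0;
    while (in >> token) {
        tokens.push_back(token);
    }

    // A valid line has an odd token count: N pairs plus the final value.
    if (tokens.size() < 3 || tokens.size() % 2 == 0) {
        throw std::invalid_argument(
            "expected state/probability pairs followed by one value");
    }

    ParsedExample example;
    for (std::size_t i = 0; i + 1 < tokens.size(); i += 2) {
        example.board.push_back(static_cast<int>(tokens[i]));
        example.policy.push_back(tokens[i + 1]);
    }
    example.value = tokens.back();

    assert(example.board.size() == example.policy.size());
    return example;
}
```

Given the sample line "2 0 2 0 2 0 2 1 0.5", this yields four board spaces, a move probability of one on the last space only, and a value of 0.5.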
Features to Add
In the main.cpp file, you might notice that the main function contains only one line. That is because the code for running the window event loop lives inside UTTTGameWindow, to avoid a bug where commands to the window from GameWindow were ignored unless they originated from a WindowProc function. If the bug can be fixed, I think it would look better for main.cpp to have some control of the window.
Every class specific to Ultimate Tic Tac Toe has UTTT before its name. Except for UTTTNet, which is a torch module, none of these classes derive from a base class, but they should, because that would make it easier for others to adapt this code to other games. UTTTNet should also get a shared base class, because that would make it possible to avoid using templates, which isn't possible as it stands.
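As a rough illustration of the base-class idea, the sketch below shows how game-agnostic code could take an abstract class instead of a template parameter. Every name here (Game, CounterGame, playFirstValidMoves) is hypothetical and does not exist in the project; the toy counter game only stands in for a real game such as Ultimate Tic Tac Toe.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical game interface; these names are illustrative only.
class Game {
public:
    virtual ~Game() = default;
    virtual std::vector<int> validMoves() const = 0;  // moves legal in the current state
    virtual void makeMove(int move) = 0;              // apply a move in place
    virtual bool isOver() const = 0;                  // true once the game has ended
    virtual std::unique_ptr<Game> clone() const = 0;  // copy, so a search can leave the original untouched
};

// Toy stand-in for a real game: players remove 1 or 2 counters until none remain.
class CounterGame : public Game {
public:
    explicit CounterGame(int counters) : counters_(counters) {}

    std::vector<int> validMoves() const override {
        std::vector<int> moves;
        for (int take = 1; take <= 2 && take <= counters_; ++take) {
            moves.push_back(take);
        }
        return moves;
    }

    void makeMove(int move) override {
        assert(move >= 1 && move <= 2 && move <= counters_);
        counters_ -= move;
    }

    bool isOver() const override { return counters_ == 0; }

    std::unique_ptr<Game> clone() const override {
        return std::make_unique<CounterGame>(*this);
    }

private:
    int counters_;
};

// Game-agnostic code depends only on the base class, so no template is needed.
// Plays the first valid move each turn on a copy and returns how many moves were played.
int playFirstValidMoves(const Game& start) {
    std::unique_ptr<Game> game = start.clone();
    int movesPlayed = 0;
    while (!game->isOver()) {
        game->makeMove(game->validMoves().front());
        ++movesPlayed;
    }
    return movesPlayed;
}
```

The trade-off is a virtual call per operation in exchange for code that can be reused across games without recompiling it for each game type.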
When looking at the output of trainer, I noticed that although the neural network is decent (I can't win when playing O), the value the network predicts barely varies from one game board to another. The value for game boards that are just one turn from an unavoidable win or loss is still around the original value when it should be at either zero or one. I don't know if this is a problem with my trainer, a problem with the neural network, or something that just needs a change in the config, but it really should be fixed.
REFERENCES
Designed By Hugo - youtube.com/watch?v=Kx5CN-V6FvQ - Used to create the barebones of the UTTTGameWindow class
Thakoor, Shantanu; Nair, Surag; Jhunjhunwala, Megha - Learning to Play Othello Without Human Knowledge - Used to create the trainer executable's main function, to adapt MCTS for use with neural networks, and to create the NeuralNetwork and UTTTNet classes and the StateInfo struct