Training the engine to play custom variants #29

Open
bkitej opened this issue Nov 13, 2019 · 13 comments
Labels: question (Further information is requested)

Comments

bkitej commented Nov 13, 2019

Is there a process for training this engine to play custom variants not included in the python-chess module, or in Stockfish's supported variants? I'm looking to write a variant where the only rule modification is that one can capture one's own pieces, which doesn't exist in either yet. Assuming that Board is programmed correctly, could I train an engine around it?

QueensGambit commented Nov 13, 2019

Hi @bkitej,
does your variant of interest have an official name?
If I understand your question correctly, you are interested in a variant which follows the same rules as classical chess, with the extension that you can also capture your own pieces, except for the king.

As you stated correctly, the python MCTS version of CrazyAra uses python-chess for move generation and the C++ MCTS version is based on Multi-Variant-Stockfish. Python-chess is also used to prepare the training data in the case of supervised learning.

I don't know your variant, but I assume that games will mostly play out like regular chess games, and only in a few special cases will it be better to capture one's own piece.

The neural network input representation and output representation would be the same as for regular chess.

You could register your own variant in python-chess or in Stockfish accordingly and modify the respective move generation:

I assume there aren't any games available for your variant. Therefore supervised learning isn't possible.
Alternatively, you could use a network trained on classical chess games and later adjust the policy distribution to ensure that self-captures are explored:

Of course you could theoretically learn your variant from scratch or continue learning based on a given supervised network via selfplay. This, however, is rather computationally intensive and would require an implementation in C++.

There is also Fairy-Stockfish, which supports a few more shogi variants at the cost of slightly lower performance. In principle, this could also be used as a move-generation back-end for the C++ MCTS version instead of Multi-Variant-Stockfish.

I suggest that you first create your variant in python-chess and, if this works correctly, you can either implement it in C++ and extend Multi-Variant-Stockfish or add it to one of CrazyAra's MCTS versions.
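To make this concrete, here is a rough, untested sketch of how self-captures could be bolted onto python-chess by subclassing Board and extending pseudo-legal move generation. The class name FriendlyFireBoard is made up for illustration; legality filtering (checks), castling and promotion edge cases would still need extra work:

```python
# Hypothetical sketch only: extend python-chess pseudo-legal move generation
# with self-captures. Check handling, castling and promotions are ignored here.
import chess

class FriendlyFireBoard(chess.Board):  # name made up for illustration
    def generate_pseudo_legal_moves(self, from_mask=chess.BB_ALL, to_mask=chess.BB_ALL):
        # All standard pseudo-legal moves ...
        yield from super().generate_pseudo_legal_moves(from_mask, to_mask)
        # ... plus moves that capture one's own pieces (never the king).
        own = self.occupied_co[self.turn]
        for from_sq in chess.scan_reversed(own & from_mask):
            targets = self.attacks_mask(from_sq) & own & ~self.kings & to_mask
            for to_sq in chess.scan_reversed(targets):
                yield chess.Move(from_sq, to_sq)
```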

The current neural network weights in the official releases only support crazyhouse so far.
Lately, I added a network input and output representation which supports all chess variants on lichess.org.
In 2020, I might add a lighter representation for classical chess and chess960 only and publish some network weights.

QueensGambit added the question label on Nov 13, 2019
bkitej commented Nov 19, 2019

Thank you for your very quick response! For a little background, I'm an undergraduate student in statistics, and this is for a project-based class. I have a few classes in ML under my belt, almost exclusively in Python, my language of choice.

does your variant of interest have an official name?

I've seen it once referred to as "friendly fire chess" online, but otherwise it doesn't seem to be an established variant at all.

If I understand your question correctly, you are interested in a variant which follows the same rules as classical chess, with the extension that you can also capture your own pieces, except for the king.

Correct!

I don't know your variant, but I assume that games will mostly play out like regular chess games, and only in a few special cases will it be better to capture one's own piece.

The focus of my project is to discover these emergent behaviors. My own hypothesis is that games will become more materialistic, as checkmates can be escaped in more situations, at the cost of a weaker late game. How often it diverges from classical chess is another point of interest.

You could register your own variant in python-chess or in Stockfish accordingly and modify the respective move generation

This is likely the best direction to take. I've noticed there are a few Stockfish projects on GitHub; is there a pure Python implementation you would recommend?

Of course you could theoretically learn your variant from scratch or continue learning based on a given supervised network via selfplay. This, however, is rather computationally intensive and would require an implementation in C++.

I'm an undergraduate senior with virtually no knowledge of C++, but I do have access to my school's high-performance computing cluster... Does that brute-force approach sound like a viable alternative to writing it in C++? I don't mind if the move generation itself takes up to a minute or so.

I suggest that you first create your variant in python-chess and, if this works correctly, you can either implement it in C++ and extend Multi-Variant-Stockfish or add it to one of CrazyAra's MCTS versions.

Can you tell me more about CrazyAra's MCTS versions? As in, if I'm able to successfully program my variant into python-chess, is that a relatively straightforward process? Naturally, I'd like to avoid C++ as much as possible...

QueensGambit commented Nov 19, 2019

Naturally, I'd like to avoid C++ as much as possible...

Quite understandable. An implementation in C++ will very likely exceed the scope of your project.

Starting a project with the intention of running reinforcement learning is common, but the idea is usually dropped once you learn and understand how much computational hardware is needed.

Python-only implementations might only be practicable for simple games with small board sizes and a single piece type, e.g. Othello, Gobang, TicTacToe, Connect4, Reversi.

The simplest Python-only MCTS implementation for classical chess is probably

Just to put things into perspective, here are a few statistics about how the number of MCTS simulations per second (NPS) developed in the case of CrazyAra. The NPS is a measure of how many games you would be able to generate when setting a fixed number of nodes per move (e.g. 800):
Our first MCTS python version CrazyAra 0.1.0 (which isn't available) was mainly based on https://github.com/Zeta36/chess-alpha-zero. Here, CrazyAra achieved 25-35 NPS on a GTX1080ti using an AlphaZero-like network with 7 blocks.

Chess-alpha-zero also uses python-chess for move generation.
However, python-chess is not primarily intended to be used in chess engines.
The default 3-fold-repetition check is an example of this: niklasf/python-chess#338
The MCTS of chess-alpha-zero also uses the simpler Keras framework at the cost of performance, as well as plain Python list iterations for implementing the MCTS formulas.

Both of these are not ideal, so we implemented the Python MCTS version from scratch in vectorized form using numpy and with more direct access to the neural network via MXNet/Gluon. Over the course of development, additional techniques were added, such as a transposition table, which can double the NPS in some cases, and a time management system.

For the most recent Python MCTS version (0.5.1), the speed is about 250-300 NPS using the RISEv1 architecture (7 residual blocks + 12 bottleneck residual blocks with SE).

The same network achieved 1200-1500 NPS in the first MCTS C++ release (0.6.0) and 3500-4000 NPS when using the more efficient RISEv2 mobile architecture.

In the current C++ development version, CrazyAra achieves 8000-10k NPS, which generates about 20 crazyhouse games per minute on a GTX1080ti. For this, CrazyAra uses a larger batch size and a TensorRT back-end.

20 games per minute might seem like a lot, but you should still have access to multiple GPUs in this case and preferably start learning from a given supervised network first.
Finding the right RL hyper-parameters is non-trivial and often requires a lot of fine-tuning.

[...] but I do have access to my school's high-performance computing cluster...

Nice to hear, but if your computing cluster only includes CPUs it might not be of much use for training and running NN inference at scale. Moreover, if you are planning to use Python only, you might only be able to utilize the GPUs at 1-10%, which is rather inefficient.

Reinforcement learning methods are only supported in the C++ MCTS CrazyAra version.
Adapting CrazyAra to play "friendly fire chess" will be much easier when supervised networks for classical chess are available.
Therefore, I suggest that you try out chess-alpha-zero instead because it supports classical chess and you can train your network first using supervised learning:

For an exemplary AlphaBeta search you can take a look at Sunfish:

bkitej commented Nov 19, 2019

Thanks again for the quick response!

However, the main purpose of python-chess is not to be used in chess-engines.

Oh okay, good to know. Generating the ruleset and gamestate is my main (and... first) bottleneck. It looks like Sunfish has its own board model. Maybe stealing parts of this will be easier than adapting python-chess's opaque bitboard implementation.

Re: my school's computing cluster, it's my understanding that it's a GPU cluster engineered specifically for the age of neural nets. Even inefficient GPU usage will outperform my humble MacBook Pro... plus, the demands of this first implementation aren't too strenuous.

I'll try out alpha-zero at your recommendation. I don't mind having this issue closed out in the meantime. Thank you!


DeFilippis commented Nov 19, 2019 via email

QueensGambit commented Nov 20, 2019

Answering these questions via GitHub is fine by me.
It seems to be interesting for others as well (e.g. @DeFilippis ) and this way I don't have to repeat myself.

PyChess is a great library when writing an engine for educational purposes because it comes with many valuable tools and visualisation techniques.
It is also quite optimized within the Python ecosystem. The only major slowdown was the 3-fold-repetition check, which took about 1/3 of the total runtime at one point.
If you like you can also integrate CrazyAra's can_claim_threefold_repetition() into chess-alpha-zero.
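As a purely illustrative sketch (this is not CrazyAra's actual implementation), one cheap way around a slow repetition check is to keep a running counter of position keys that is updated on push/pop instead of replaying the move stack on every query:

```python
# Illustrative sketch: incremental repetition counting with python-chess.
# The EPD string is used as a position key here for simplicity; a Zobrist
# hash would be faster.
from collections import Counter
import chess

class RepetitionTracker:
    def __init__(self, board: chess.Board):
        self.counts = Counter([board.epd()])

    def push(self, board: chess.Board, move: chess.Move):
        board.push(move)
        self.counts[board.epd()] += 1

    def pop(self, board: chess.Board):
        self.counts[board.epd()] -= 1
        return board.pop()

    def is_repetition_draw(self, board: chess.Board) -> bool:
        # The current position has occurred at least three times.
        return self.counts[board.epd()] >= 3
```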

A fellow student also added the Bughouse variant to PyChess, which might help you register your own variant:

bkitej commented Nov 27, 2019

I'd like to revisit a couple points that have been mentioned so far.

  1. Using AlphaZero for supervised learning
  • How would training a network using classical chess training data give me anything other than a classical chess engine? Would this be a matter of adjusting the existing weights?
  • If that is too broad a question: what differences might I expect to exist between a network trained on a classical engine and one trained with reinforcement learning?
  2. Using a network trained on classical chess games and later adjusting the policy distribution
  • Is this the same as the prior suggestion? How would one adjust the policy distribution of an existing trained neural net?
  3. You mentioned you used python-chess to prepare the training data, but then programmed and trained the engine with C++... can you tell me more about that process?

A further note on the scope of this variant:

FFChess has a strictly larger range of legal moves at almost all stages of the game. For example, in classical chess, there are 20 possible first moves; in FFChess there are 40 by my count.

Self-play is still the ideal though. Under the constraints of my current technical ability and time, I'm considering a shift to studying the effects of FFChess on famous endgame positions only. Any thoughts on how that might affect the feasibility of this Python-only project?

QueensGambit commented Nov 28, 2019

How would training a network using classical chess training data, give me anything other than a classical chess engine? Would this be a matter of adjusting the existing weights?

Of course you're right that training on classical games will result in a network which only plays classical chess. However, since FFChess is very similar to classical chess, you could use this network as a starting point for either training on FFChess data or starting selfplay. This concept is called transfer learning and will very likely reduce computational cost.

what differences might I expect to exist between a network trained on a classical engine and one trained with reinforcement learning

Training a neural network by supervised learning and by reinforcement learning involves several different concepts and challenges.

In supervised learning you can only learn from sparse labels:
For the policy target you use a one-hot-encoded vector to replicate what move was actually played in a given position. By providing different target moves with varying frequencies, the NN learns to find the best fit for all given targets and hopefully generalizes well to unseen samples. The policy is usually optimized using a cross-entropy loss.

The value target is also binary (or ternary, if a draw is possible).
You assign a position as winning, drawn or losing depending on the final Monte-Carlo return (the game result), regardless of the game progression. The value output is squashed into a numerical range of e.g. [-1, +1] using a tanh activation, or a sigmoid if you want a range of [0, 1] instead. The value loss is formulated as a typical mean squared error.
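For illustration, the supervised targets and losses described above boil down to something like the following per-sample numpy sketch (simplified, without any framework specifics):

```python
import numpy as np

def policy_loss(pred_policy, played_move_index):
    """Cross-entropy against a one-hot target: the move that was actually played."""
    return -np.log(pred_policy[played_move_index] + 1e-12)

def value_loss(pred_value, game_result):
    """Mean squared error against the final game result in [-1, +1]."""
    return (pred_value - game_result) ** 2
```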

In supervised learning you usually have a broad and diverse set of players in your data set, which reduces the chance of over-fitting and of missing key concepts of your variant.

In reinforcement learning you have to deal with a new dimension of complexity, mainly finding the right trade-off between exploration and exploitation.

You're always at risk of zeroing out important lines. This happened to some extent to AlphaZero, which almost never played the King's Indian Defence or the Sicilian Defence (https://arxiv.org/pdf/1712.01815.pdf, p. 6).
Generating your own samples as the new optimization targets can also have harmful recursive effects or lead to forgetting of general knowledge.
For example, if you track the win rate against the previous network during training, you might be fooled and end up in rock-paper-scissors scenarios, where the new network always learns to exploit the previous one without actually increasing overall playing strength.
One prominent environment for this is StarCraft II, where almost any strategy can be exploited if you know it beforehand. As a consequence, DeepMind introduced a league system for comparing their agents.

In reinforcement learning, your policy target is the final MCTS visit-count distribution, to which temperature scaling is applied and from which the next move to play is sampled. The cross-entropy loss is therefore calculated over the full distribution. This is an important aspect because some DL frameworks only support sparse cross-entropy by default.
Moreover, you also gather the Q-values as an additional feature. Abrams et al. proposed to use a mix between Q-value and game result as your new value target (https://medium.com/oracledevs/lessons-from-alphazero-part-4-improving-the-training-target-6efba2e71628).
This is similar to bootstrapping and requires an additional mixing hyper-parameter.
Adding bootstrapping often accelerates progress, but comes at the risk of not fully utilizing your Q-value range if the Q-value ratio is too high.
Besides that, Wu proposed to integrate auxiliary targets into the value loss (https://arxiv.org/pdf/1902.10565.pdf).
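A rough numpy sketch of these reinforcement-learning targets (temperature-scaled visit counts as the policy target, and a Q-value/game-result mix as the value target); the exact formulas and hyper-parameters vary between implementations:

```python
import numpy as np

def policy_target_from_visits(visit_counts, temperature=1.0):
    """Full policy target: temperature-scaled MCTS visit-count distribution."""
    scaled = np.power(np.asarray(visit_counts, dtype=np.float64), 1.0 / temperature)
    return scaled / scaled.sum()

def sample_next_move(policy_target):
    """The next move to play is sampled from that distribution."""
    return np.random.choice(len(policy_target), p=policy_target)

def mixed_value_target(game_result, root_q_value, q_ratio=0.3):
    """Bootstrapped value target: mix of Monte-Carlo return and root Q-value."""
    return (1.0 - q_ratio) * game_result + q_ratio * root_q_value
```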

Using a network trained on classical chess games and later adjusting the policy distribution:

My assumption is that the value evaluation of FFChess is quite similar to classical chess, and that the default policy distribution is also a good starting point.
Therefore, without having to retrain a network using self-play or supervised learning, you can first manually re-adjust the NN policy with some heuristics.
As a simple baseline, you could assign every self-capture move a fixed value based on the current number of legal moves and renormalize afterwards (see the sketch below).
This can be used to get an idea of how the variant plays out and give some first results on feasibility.
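As a hypothetical sketch of that baseline (function and variable names are my own), assuming the policy is given as a dictionary of priors over the legal FFChess moves:

```python
def add_self_capture_priors(policy, self_capture_moves, weight=0.5):
    """Give unseen self-capture moves a fixed prior derived from the number of
    legal moves, then renormalize the whole distribution.

    policy: dict mapping move -> prior from a network trained on classical chess
    self_capture_moves: legal self-capture moves unknown to that network
    """
    num_legal = len(policy) + len(self_capture_moves)
    for move in self_capture_moves:
        policy[move] = weight / num_legal
    total = sum(policy.values())
    return {move: p / total for move, p in policy.items()}
```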

For instance, CrazyAra has a boolean UCI option "Enhance_Checks" which manually increases the probability of checks that are below a given threshold, to ensure that short forced mating sequences are detected quickly even when the raw policy has a blind spot for them (https://arxiv.org/pdf/1908.06660.pdf, 9.2.7, p. 23).

You mentioned you used python-chess to prepare the training data, but then programmed and trained the engine with C++... can you tell me more about that process?

At first this project was entirely written in Python. The full preprocessing pipeline for converting pgn files into the plane representation makes use of the python-chess and numpy libraries. For compression we use the zarr library. Later, the neural network can be trained in a supervised fashion, which is also done in Python.
If you want to use self-play, the most critical aspect in terms of performance is the runtime of your MCTS.
First we implemented the MCTS in python only using numpy, python-chess and MXNet.
Later the MCTS was rewritten into C++ which greatly improved speed, but took more time to implement.
The C++ binary has additional non-standard UCI commands: selfplay <number> generates <number> games and exports the training samples after each game, and the arena <number> compare command runs a tournament between the current model and a contender model in order to decide whether the current network should be replaced and to track progress.
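As a rough illustration of that preprocessing step (the 12-plane encoding and function names below are a simplification of my own, not CrazyAra's actual input representation):

```python
import chess
import chess.pgn
import numpy as np
import zarr

def board_to_planes(board):
    """Encode a position as 12 binary 8x8 planes (6 piece types x 2 colours)."""
    planes = np.zeros((12, 8, 8), dtype=np.uint8)
    for square, piece in board.piece_map().items():
        channel = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        planes[channel, chess.square_rank(square), chess.square_file(square)] = 1
    return planes

def export_game(pgn_path, out_path):
    """Convert one pgn game into a stack of planes and store it with zarr."""
    with open(pgn_path) as pgn_file:
        game = chess.pgn.read_game(pgn_file)
    board = game.board()
    positions = []
    for move in game.mainline_moves():
        positions.append(board_to_planes(board))
        board.push(move)
    zarr.save(out_path, np.stack(positions))  # stored compressed on disk
```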

Self-play is still the ideal though. Under the constraints of my current technical ability and time, I'm considering a shift to studying the effects of FFChess on famous endgame positions only. Any thoughts on how that might affect the feasibility of this Python-only project?

Endgame positions are typically not a strong point of either classical alpha-beta (A/B) or neural network engines.
They usually struggle to detect fortress positions, and even today some endgame positions that are obvious to human expert players are hard to solve with A/B engines. Hence, almost every modern chess engine makes use of tablebases.
Analysing endgame positions in FFChess could be an interesting research question.
As an alternative, you could compare tactical position benchmarks analysed in classical chess and in FFChess.

QueensGambit commented

Hi @bkitej,
I am not sure if this is still relevant to you, but there are two major back-ends (#59, #93) supported in CrazyAra now.

So you could define your friendly fire chess variant in one of them, or implement your own environment from scratch using a blank State class.

There is also a model for standard chess available now which was trained on human standard chess games.

KuruSa-arch commented


Hey, I need your help modifying an engine to play a specific variant where each player is granted 6 moves, and whoever has more material points at the end of the 12th move wins the game, no matter how bad their position is or whether they are about to be mated on the next move. However, just like in normal chess, if someone gets mated before the 12th move they lose the game. Also, to keep the balance, if someone captures a directly defended piece on the 12th move, that counts as a foul and the opponent is granted a 13th penalty move. The game starts from a preset, randomly chosen but equalized position offered to both players.

QueensGambit commented

Hello @KuruSa-arch, for this you would need to extend the terminal condition to include a material evaluation.
You could compute the relative material count as described here (see the sketch below).
In the beginning, you could use default chess networks and rely only on the search to handle your chess variant.
Alternatively, you could implement this in Python using python-chess or directly include this variant in Fairy-Stockfish.
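As a hedged sketch of that terminal condition with python-chess (piece values and the 12-move cut-off interpretation are my own; the foul/penalty-move rule is not handled):

```python
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def relative_material(board):
    """Material of the side to move minus material of the opponent."""
    score = 0
    for piece_type, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece_type, board.turn))
        score -= value * len(board.pieces(piece_type, not board.turn))
    return score

def variant_result(board):
    """Terminal check: normal mate rules up to move 12, then material decides."""
    if board.is_checkmate():
        return "checkmate"
    if board.fullmove_number <= 12:
        return None  # variant game is not over yet
    diff = relative_material(board)
    if diff > 0:
        return "side to move wins on material"
    if diff < 0:
        return "opponent wins on material"
    return "draw on material"
```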

KuruSa-arch commented

alright thanks for the reply man, appreciate it
