CeresTrain is a software platform for training deep neural networks for chess. It builds on the pioneering AlphaZero research from DeepMind, which was subsequently replicated and greatly extended in open source by the Leela Chess Zero project.
The primary goals of the project are to:
- facilitate research into chess neural networks by providing a modular set of flexible software building blocks which span the subtasks of training data collection, neural network architecture definition, optimization, network evaluation, introspection, and integration into MCTS chess engines such as its sister project Ceres (released in 2020)
- provide a high-performance platform for training chess neural networks at scale, including two parallel and interoperable backend implementations (one in C# via TorchSharp and one in Python) which leverage PyTorch 2.0 in an (optionally) distributed setting
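To make the distributed aspect concrete, the sketch below shows the standard PyTorch `torch.distributed` / `DistributedDataParallel` pattern that such a backend builds on. This is a generic illustration under the usual `torchrun` launch convention, not CeresTrain's actual backend code; the tiny model is a stand-in for a real chess network.

```python
# Generic PyTorch distributed-data-parallel boilerplate (illustrative only,
# not CeresTrain code). Launch with: torchrun --nproc_per_node=<gpus> train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")             # one process per GPU under torchrun
    local_rank = int(os.environ["LOCAL_RANK"])  # set by the torchrun launcher
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(64, 3).cuda()       # stand-in for the chess network
    model = DDP(model, device_ids=[local_rank])

    # ... training loop goes here; DDP transparently averages gradients
    # across all processes during backward().

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```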
The initial release of CeresTrain focuses on the simplified domain of endgame positions with 7 or fewer pieces, for which endgame tablebases (EGTB) are available. This domain offers numerous opportunities but also challenges. The first version centers on the transformer architecture, but the object-oriented design allows other architectures to be plugged in as well. Although endgames are the initial focus, the CeresTrain code base is fully general and can also be used to train on the full game.
Initial results in this endgame domain are encouraging. They demonstrate the correctness, completeness, and high performance of the training platform, and the resulting networks themselves perform impressively. For example, a network can be trained from scratch in about 4 hours on a single high-end consumer GPU to play KRPvKRP endgames with value accuracy (predicting win/draw/loss) on real-world endgame positions slightly exceeding (97.88% versus 97.56%) that of current state-of-the-art neural networks (such as T2), using about 10% of the parameter count and computation (FLOPS). This complete training process can be accomplished using CeresTrain with a single command line, executing either locally or remotely (or in C# via an API).
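Here "value accuracy" means the fraction of positions for which the network's win/draw/loss prediction agrees with the tablebase verdict. A minimal PyTorch sketch of such a metric (an assumed formulation for illustration, not CeresTrain's actual evaluation code):

```python
# Illustrative WDL value-accuracy metric (assumed formulation, not CeresTrain code).
import torch

def wdl_value_accuracy(value_logits: torch.Tensor, tb_labels: torch.Tensor) -> float:
    """value_logits: [N, 3] network value-head outputs;
    tb_labels: [N] tablebase labels in {0=loss, 1=draw, 2=win}."""
    predictions = value_logits.argmax(dim=-1)  # most probable WDL class
    return (predictions == tb_labels).float().mean().item()
```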
Unfortunately, these results are not yet useful for improving actual chess engine play: tablebases are already available, and precise policy information is not easily generated by training from tablebases. Furthermore, the experiment above only establishes outperformance in a highly specialized (single-endgame) setting.
However, the next stage of CeresTrain research will extend the work beyond EGTB positions (to 8 to 12 pieces). Success there would potentially carry over into real-world chess engine performance. More generally, it is hoped that the insights into architecture and training methods gleaned from rapid experimentation in the endgame domain will carry over to training runs on the full game.
Most of the effort thus far has been expended on building this computational infrastructure rather than conducting actual research. Despite this, a few incrementally useful research observations and ideas have been identified and will be documented in the near future to potentially aid the broader community in their efforts. It has also been confirmed that many interesting experiments can be conducted without any code changes, and that more substantial variations can be implemented in the modular, object-oriented code base without great effort.
To give a sense of the ease of use, speed and flexibility of the CeresTrain engine, here are examples of tasks that can be accomplished directly from the command line:
- On NVIDIA 4090 class hardware, train a 5.5 million parameter deep neural network locally from tabula rasa in about 6 minutes on random KPkp endgames, achieving value and policy accuracy far exceeding that of state-of-the-art (SOTA) LC0 networks such as T2 (which has approximately 75 million parameters)
- Extend this same training setup to execute on a remote Linux host with two H100 GPUs and train on 50 million positions in less than 15 minutes, achieving near perfection (99.94% value accuracy on actual positions encountered in human play)
- Evaluate the value and policy accuracy of trained nets on either random or human-play positions, optionally displaying all test positions and their evaluations and correctness
- Run a tournament of the trained net against an opponent (such as an LC0 network) or an oracle (perfect play via EGTB; see the tablebase-probing sketch after this list) using any desired search limit (best policy, best value, or specified search depth)
- Load the network into the Ceres chess engine and launch a UCI session to allow immediate play against the network
- Adjust any of dozens of configuration parameters related to the network architecture or optimization setup (via JSON configuration files, illustrated after this list) and retrain with the modified settings. Adjustable parameters include network depth, width, number of attention heads, learning rate, dropout, regularization, normalization and activation types, and architectural features such as Smolgen and mixture of experts.
- Exploit the speed improvements offered by the compile feature introduced in PyTorch 2 (see the sketch below). For example, the training speed of a large network (768x15 with FFN multiplier 2 and Smolgen) increases from roughly 8,000 to 14,000 positions per second with compile on a 4x A100 system.
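The EGTB oracle mentioned above can be illustrated with the python-chess library, which supports Syzygy probing. The sketch below is illustrative only (not CeresTrain code; the tablebase path is a placeholder): it selects a tablebase-optimal move by probing the win/draw/loss value after each legal move.

```python
# Illustrative perfect-play oracle via Syzygy EGTB probing with python-chess
# (not CeresTrain code; the tablebase path is a placeholder).
import chess
import chess.syzygy

def oracle_move(board: chess.Board, tb: chess.syzygy.Tablebase) -> chess.Move:
    """Return a WDL-optimal move from the mover's perspective."""
    best_move, best_score = None, None
    for move in board.legal_moves:
        board.push(move)
        if board.is_checkmate():
            score = 2  # mover delivered mate: a win
        else:
            # probe_wdl scores from the new side to move's perspective,
            # so negate to score the move for the side that just moved
            score = -tb.probe_wdl(board)
        board.pop()
        if best_score is None or score > best_score:
            best_move, best_score = move, score
    return best_move

with chess.syzygy.open_tablebase("/path/to/syzygy") as tb:  # placeholder path
    board = chess.Board("4k3/8/4K3/4P3/8/8/8/8 w - - 0 1")  # a KPvK position
    print(oracle_move(board, tb))
```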
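As an illustration of the JSON configuration mechanism, the sketch below writes a configuration file from Python. All parameter names here are invented for this sketch and are not CeresTrain's actual schema:

```python
# Hypothetical training configuration (field names invented for illustration;
# consult the CeresTrain documentation for the actual JSON schema).
import json

config = {
    "net": {
        "depth": 8,                   # number of transformer layers
        "width": 512,                 # embedding dimension
        "attention_heads": 8,
        "ffn_multiplier": 2,
        "smolgen": True,              # architectural feature toggles
        "mixture_of_experts": False,
        "normalization": "layernorm",
        "activation": "gelu",
    },
    "optimization": {
        "learning_rate": 3e-4,
        "dropout": 0.1,
        "weight_decay": 0.01,
    },
}

with open("train_config.json", "w") as f:
    json.dump(config, f, indent=2)
```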
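The compile feature itself is a one-line opt-in in PyTorch 2, as the generic sketch below shows. The dimensions echo the 768x15 network mentioned above, but this is a stand-in model, not CeresTrain's training loop:

```python
# Generic torch.compile usage (illustrative; not CeresTrain's training loop).
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in transformer roughly echoing 768x15 with FFN multiplier 2
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, dim_feedforward=1536,
                               batch_first=True),
    num_layers=15,
).to(device)

compiled_model = torch.compile(model)  # one-line opt-in to the PyTorch 2 compiler

x = torch.randn(32, 64, 768, device=device)  # (batch, sequence, embedding)
with torch.no_grad():
    out = compiled_model(x)  # first call triggers compilation; later calls run fast
```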
The following pages provide more information on various aspects of the project. Additional documentation (for example, on the C# class library) is also forthcoming.
- Philosophical underpinnings
- Observations and statistics related to EGTB training
- Installing and configuring CeresTrain
- Getting started - training your first net from the command line
- Distributed training
- Project assessment and future work
CeresTrain has benefitted greatly from other research and open source projects, including:
- the work of Rocketknight1 on "minimal_lczero: A minimal reproduction of LCZero training code, for ease of experimentation and benchmarking", which was used as a starting point for some of the Python backend
- early work on Ender, a series of specialist endgame networks by dkappe
- the simple and elegant wrapping of the PyTorch API for .NET via the Microsoft-sponsored open source TorchSharp project, led by Niklas Gustafsson
- the Spectre.Console NuGet package for creating beautiful console output with live updating
- the Microsoft .NET platform, especially the impressive runtime performance optimizations documented on an annual basis by Stephen Toub. These optimizations have increased the speed of the Ceres engine by at least 20% since its first release on .NET 5 in 2020.
- the enthusiastic Leela Chess Zero community on Discord, and especially members such as lepned, who have provided invaluable advice, help, and encouragement