An experimental, project-focused optimizer that searches over state spaces with limited prior knowledge through self-play. See the Issues for planned features and optimizations.
The goal is to provide an API which makes it easier to set up data-parallel learning loops and to optimize searches by exploiting symmetries, with help from the trait system.
It combines ideas from the papers:
It mainly uses:
- dfdx for type-checked neural networks,
- rayon for data-parallel MCTS exploration,
- and is experimentally migrating data viz from python to Tensorboard.
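As a rough illustration of the data-parallel rollout pattern (not this crate's actual API; the `Tree`, `Transition`, and `episode` names below are hypothetical placeholders), episodes on independent trees can be driven through a `rayon` parallel iterator:

```rust
use rayon::prelude::*;

// Hypothetical stand-ins for the crate's tree and transition types.
struct Tree {
    root: usize,
}

struct Transition {
    state: usize,
    action: usize,
}

// One self-play episode per tree: descend until a new node is visited and
// return the path taken. The body here is only a placeholder.
fn episode(tree: &mut Tree) -> Vec<Transition> {
    vec![Transition { state: tree.root, action: 0 }]
}

// Episodes on different trees are independent, so rayon can run them in
// parallel and gather the visited paths for the next training step.
fn parallel_rollouts(trees: &mut [Tree]) -> Vec<Vec<Transition>> {
    trees.par_iter_mut().map(episode).collect()
}

fn main() {
    let mut trees: Vec<Tree> = (0..8).map(|root| Tree { root }).collect();
    let paths = parallel_rollouts(&mut trees);
    println!("collected {} episode paths", paths.len());
}
```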
Feedback on the high-level API is welcome. Please open GitHub Issues before making PRs.
Our first implementation can be summarized as follows:
- Our agents search trees in parallel.
- During episodes, they seek to maximize the expected future improvement in the cost function over the entire search path.
- They end each episode when they visit a new node.
- The neural network evaluates state vectors in batches.
- Each evaluation predicts the gain function and the agent's preference for actions from the new state.
- Between episodes, we populate new nodes with their predictions and update nodes along the path.
- Between epochs, we minimize cross-entropy loss (the priors' inaccuracy) and $L_2$ loss (gain-prediction inaccuracy); see the sketch after this list.
- Additionally, we select new roots from visited nodes.
- When stagnant, we generate new roots.
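One plausible formulation of these two loss terms (an assumption for illustration; the symbols $\pi_\theta$, $g_\theta$, and the weight $c$ are not taken from the code) is

$$
\mathcal{L}(\theta) \;=\; -\sum_{a} p(a \mid s)\,\log \pi_\theta(a \mid s) \;+\; c\,\bigl(g_\theta(s) - g(s)\bigr)^2,
$$

where $\pi_\theta$ is the predicted prior over actions, $p$ is the empirical action distribution observed during search, $g_\theta$ is the predicted gain, and $g$ is the gain observed along the path.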
We restrict our attention to state spaces with a finite number of dimensions whose actions are discrete and commonly enumerated. Our most thorough example uses the INTMinTree, as demonstrated here. Examples 01 through 04 have prototypes to re-implement similarly.
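A minimal sketch of what a state-action-space abstraction for discrete, enumerated actions might look like (the trait and method names below are illustrative assumptions, not the crate's actual `StateActionSpace` definition):

```rust
// Hypothetical trait: a space whose actions are discrete and enumerable.
trait DiscreteSpace {
    type State;
    type Action: Copy;

    /// Enumerate the actions available from `state`.
    fn actions(&self, state: &Self::State) -> Vec<Self::Action>;
    /// Apply an action to a state in place.
    fn act(&self, state: &mut Self::State, action: &Self::Action);
    /// Cost function the search tries to improve on.
    fn cost(&self, state: &Self::State) -> f32;
}

// Toy example: states are fixed-length bit vectors, actions flip one bit,
// and the cost is the number of set bits (minimized by the all-zero state).
struct BitFlipSpace {
    len: usize,
}

impl DiscreteSpace for BitFlipSpace {
    type State = Vec<bool>;
    type Action = usize;

    fn actions(&self, _state: &Self::State) -> Vec<usize> {
        (0..self.len).collect()
    }
    fn act(&self, state: &mut Self::State, action: &usize) {
        state[*action] = !state[*action];
    }
    fn cost(&self, state: &Self::State) -> f32 {
        state.iter().filter(|&&b| b).count() as f32
    }
}

fn main() {
    let space = BitFlipSpace { len: 4 };
    let mut state = vec![true, false, true, true];
    let first_action = space.actions(&state)[0];
    space.act(&mut state, &first_action);
    println!("cost after one action: {}", space.cost(&state));
}
```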
The design of the MCTS and the formulation of StateActionSpace and the INTMinTree are still being refined. All variants of the MCTS will be refactored with a similar design. It may be useful to create a CLI to alter hyperparameters during training.
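For such a CLI, a small `clap`-based argument parser is one option (a sketch only; flag names such as `--episodes` and `--batch-size` are assumptions, not existing options):

```rust
// Requires clap with the "derive" feature enabled.
use clap::Parser;

/// Hypothetical training hyperparameters, adjustable from the command line.
#[derive(Parser, Debug)]
struct TrainArgs {
    /// Number of episodes per epoch.
    #[arg(long, default_value_t = 256)]
    episodes: usize,
    /// Batch size for neural-network evaluation.
    #[arg(long, default_value_t = 64)]
    batch_size: usize,
    /// Weight of the L2 gain-prediction loss relative to the cross entropy.
    #[arg(long, default_value_t = 1.0)]
    gain_loss_weight: f64,
}

fn main() {
    let args = TrainArgs::parse();
    println!("training with {:?}", args);
    // ... pass `args` into the training loop ...
}
```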
License
Dual-licensed.
Licensed under the Apache License, Version 2.0 or the MIT license, at your option. This file may not be copied, modified, or distributed except according to those terms.