
An implementation of Alpha Zero for discrete optimization problems.


AlphaZero discrete optimization

Tensorboard plots showing decreasing cost functions and loss functions.

An experimental, project-focused optimizer that searches state spaces with limited prior knowledge through self-play. See the Issues for planned features and optimizations.

The goal is to provide an API which makes it easier to set up data-parallel learning loops and to optimize searches by exploiting symmetries, with help from the trait system.

An Obsidian canvas depicting a flower-shaped arrangement of 5 code blocks. ActionSequence, ActionMultiset, OrderedActionSet, and ActionSet are connected through the traits ActionsNeverRepeat and ActionOrderIndependent.
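
The marker traits in the diagram hint at how symmetry information can be encoded in the type system. Below is a minimal sketch: the path types and the two trait names come from the diagram, while the `canonicalize` helper and all method-level details are hypothetical and not the crate's actual API.

```rust
use std::collections::BTreeSet;

/// Marker trait: an action can never be taken twice along one path.
trait ActionsNeverRepeat {}
/// Marker trait: the resulting state does not depend on the order of actions.
trait ActionOrderIndependent {}

/// Actions in order, repeats allowed (the most general path representation).
struct ActionSequence(Vec<usize>);
/// Repeats allowed, order irrelevant.
struct ActionMultiset(Vec<usize>);
/// No repeats, but order still matters.
struct OrderedActionSet(Vec<usize>);
/// No repeats and order irrelevant: many action orderings collapse to one node.
struct ActionSet(BTreeSet<usize>);

/// For a space whose actions commute and never repeat, any action sequence can
/// be canonicalized to a set, deduplicating transpositions in the search tree.
fn canonicalize<S: ActionsNeverRepeat + ActionOrderIndependent>(
    _space: &S,
    path: &ActionSequence,
) -> ActionSet {
    ActionSet(path.0.iter().copied().collect())
}
```

The idea is that blanket implementations over these markers would let the search store the cheapest path representation that is still lossless for a given problem.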

It combines ideas from the papers:

It mainly uses:

  • dfdx for type-checked neural networks,
  • rayon for data-parallel MCTS exploration,
  • and is experimentally migrating data visualization from Python to Tensorboard.

Contributions

Feedback on a high-level API is welcome. Please open GitHub Issues before making PRs.

Alpha Zero algorithm

Our first implementation can be summarized as follows (a sketch of the loop appears after the list):

  1. Our agents search trees in parallel.
  2. During episodes, they seek to maximize the expected future improvement in the cost function over the entire search path.
    1. They end each episode when they visit a new node.
    2. The neural network evaluates state vectors in batches.
    3. Each evaluation predicts the gain function and the agent's preference for actions from the new state.
  3. Between episodes, we populate new nodes with their predictions and update nodes along the path.
  4. Between epochs, we minimize cross entropy loss (priors' inaccuracy) and $L_2$ loss (gain prediction inaccuracy).
    1. Additionally, we select new roots from visited nodes.
    2. When the search stagnates, we generate new roots.
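
Here is a minimal sketch of that loop, assuming hypothetical `Tree`, `Prediction`, and `Model` types. None of these names are the crate's real API; they only mirror the control flow of steps 1 through 4.

```rust
use rayon::prelude::*;

/// A state embedded as a fixed-length vector for the network.
type StateVec = Vec<f32>;

/// What the network predicts for a newly visited node.
struct Prediction {
    /// Expected future improvement in the cost function (the gain).
    gain: f32,
    /// The agent's prior preference over actions from this state.
    priors: Vec<f32>,
}

/// One agent's search tree (root state, visited nodes, path statistics, ...).
struct Tree;

impl Tree {
    /// Walk from the root until a new node is visited; return its state vector.
    fn run_episode(&mut self) -> StateVec {
        unimplemented!()
    }
    /// Populate the new node with its prediction and update the nodes along
    /// the path that was just traversed.
    fn update_path(&mut self, _prediction: &Prediction) {
        unimplemented!()
    }
    /// Training targets gathered from visited nodes.
    fn targets(&self) -> Vec<(StateVec, Prediction)> {
        unimplemented!()
    }
}

trait Model {
    /// Batched evaluation: one prediction per state vector.
    fn predict(&self, states: &[StateVec]) -> Vec<Prediction>;
    /// Minimize cross-entropy loss on priors plus L2 loss on the gain.
    fn train(&mut self, targets: &[(StateVec, Prediction)]);
}

fn epoch<M: Model>(trees: &mut [Tree], model: &mut M, episodes: usize) {
    for _ in 0..episodes {
        // 1-2: agents search their trees in parallel, each ending its episode
        // at a new node.
        let states: Vec<StateVec> = trees.par_iter_mut().map(Tree::run_episode).collect();
        // 2.2: the network evaluates the new states in one batch.
        let predictions = model.predict(&states);
        // 3: populate the new nodes and update nodes along each path.
        trees
            .par_iter_mut()
            .zip(predictions.par_iter())
            .for_each(|(tree, prediction)| tree.update_path(prediction));
    }
    // 4: between epochs, fit the network to the observed targets and
    // (not shown) select or regenerate roots.
    let targets: Vec<(StateVec, Prediction)> = trees.iter().flat_map(Tree::targets).collect();
    model.train(&targets);
}
```

In the crate itself, rayon drives the parallel tree search and dfdx provides the type-checked network; the sketch above only outlines the control flow.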

States and Actions

We restrict our attention to state spaces with a finite number of dimensions whose actions are discrete and commonly enumerated. Our most thorough example uses the INTMinTree, as demonstrated here. Examples 01 through 04 have prototypes to be re-implemented in the same style.
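
As a rough illustration of what such a space looks like, here is a hypothetical trait in the spirit of StateActionSpace (mentioned under Progress below). The trait name, methods, and signatures are assumptions for illustration, not the crate's actual interface.

```rust
/// A hypothetical sketch of a discrete, finite-dimensional state/action space.
/// The real `StateActionSpace` trait in the crate may differ substantially.
trait DiscreteSpace {
    type State;
    type Action;

    /// Enumerate the actions currently available from a state.
    fn actions(&self, state: &Self::State) -> Vec<Self::Action>;
    /// Apply an action to a state in place.
    fn act(&self, state: &mut Self::State, action: &Self::Action);
    /// Embed the state as a fixed-length vector for the neural network.
    fn write_vec(&self, state: &Self::State, out: &mut [f32]);
    /// The cost that the agents try to reduce.
    fn cost(&self, state: &Self::State) -> f32;
}
```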

Progress

The design of the MCTS and the formulation of StateActionSpace and the INTMinTree are in place. All variants of the MCTS will be refactored with a similar design. It may be useful to create a CLI to alter hyperparameters during training.

License

Dual-licensed.

Licensed under the Apache License, Version 2.0 or the MIT license, at your option. This file may not be copied, modified, or distributed except according to those terms.
