I used the MCTS/DNN approach from AlphaGo Zero to create a mancala-playing AI.
AlphaGo Zero uses a DNN that takes a position $s$ and outputs:
- For each possible action $a$ from $s$, the probability $p_a = \Pr(a \mid s)$ of the current player selecting $a$ from $s$.
- The probability $v$ of the current player winning from position $s$.
In other words:
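Following the notation of the AlphaGo Zero paper, the network can be written as a function $f_\theta$ of the position, where $\theta$ stands for the network's parameters (both symbols are borrowed from the paper and are not defined in the text above):

$$(\mathbf{p}, v) = f_\theta(s), \qquad \mathbf{p} = (p_a)_a, \quad p_a = \Pr(a \mid s)$$

Here $\mathbf{p}$ collects the move probabilities $p_a$ over the actions available from $s$, and $v$ is the estimated probability that the current player wins from $s$.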
To improve the parameters