I saw a YouTube video suggesting that this is difficult in principle because the agent may form cartels (i.e. it learns that, whenever it finds itself in position 1, it is always best to cooperate with position 2, and vice versa).
This should be avoidable simply by choosing an objective function that disincentivises collaboration.
So, rather than having the agent maximise its own win probability, it could, for example, maximise the difference between its own win probability and that of the opponent most likely to win. Alternatively, negative weights could be applied to all other players, each weighted by that player's win probability.
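A minimal Python sketch of the two objectives proposed above, assuming the network (or a rollout) supplies a vector of per-player win probabilities that sums to 1. The function names are hypothetical, and normalising the opponent weights in the second variant is an assumption; the proposal does not fix that detail.

```python
from typing import Sequence


def margin_over_strongest(win_probs: Sequence[float], player: int) -> float:
    """Own win probability minus that of the opponent most likely to win."""
    own = win_probs[player]
    strongest = max(p for i, p in enumerate(win_probs) if i != player)
    return own - strongest


def margin_over_weighted(win_probs: Sequence[float], player: int) -> float:
    """Own win probability minus a penalty over all opponents, where each
    opponent's weight is its win probability normalised over all opponents
    (assumption: normalised weights, so the penalty is a weighted average)."""
    own = win_probs[player]
    others = [p for i, p in enumerate(win_probs) if i != player]
    total = sum(others)
    if total == 0:
        # No opponent can win, so there is nothing to penalise.
        return own
    penalty = sum(p * (p / total) for p in others)
    return own - penalty


# Player 0's view of a collusive outcome vs. a contested one.
cartel = [0.5, 0.5, 0.0]     # players 0 and 1 shut out player 2
contested = [0.5, 0.2, 0.3]
print(margin_over_strongest(cartel, 0), margin_over_strongest(contested, 0))
print(margin_over_weighted(cartel, 0), margin_over_weighted(contested, 0))
```

A plain own-win objective scores both positions at 0.5, so it is indifferent to the cartel. Both objectives here score the collusive position at 0.0 and the contested one higher (about 0.2 and 0.24 respectively), so strengthening a partner is no longer free.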
Mentioned in #101, where you make a similar point. I don't think there needs to be a single correct objective function; it just needs some weight on the agent's own win probability and some negative weight on opponents in strong positions.
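One way to express that family of objectives is a single parameterised function, sketched below in Python under the same assumption of a per-player win-probability vector. The softmax weighting of opponents and the parameter names `own_weight`, `opp_weight` and `temperature` are hypothetical choices for illustration: a low temperature concentrates the penalty on the strongest opponent, while a high temperature spreads it evenly across all opponents.

```python
import math
from typing import Sequence


def weighted_objective(
    win_probs: Sequence[float],
    player: int,
    own_weight: float = 1.0,
    opp_weight: float = 1.0,
    temperature: float = 0.1,
) -> float:
    """own_weight * own win prob minus opp_weight * a softmax-weighted
    average of the opponents' win probabilities (hypothetical form)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    own = win_probs[player]
    others = [p for i, p in enumerate(win_probs) if i != player]
    if not others:
        return own_weight * own
    # Softmax over opponents' win probabilities; subtract the max for
    # numerical stability. Stronger opponents receive larger weights.
    m = max(others)
    exps = [math.exp((p - m) / temperature) for p in others]
    z = sum(exps)
    penalty = sum((e / z) * p for e, p in zip(exps, others))
    return own_weight * own - opp_weight * penalty


probs = [0.4, 0.35, 0.25]
print(weighted_objective(probs, 0, temperature=1e-3))  # close to 0.4 - max(opponents)
print(weighted_objective(probs, 0, temperature=1e6))   # close to 0.4 - mean(opponents)
```

The two limits recover the "strongest opponent" and "all opponents equally" ends of the range, so the temperature could be treated as a tunable hyperparameter rather than committing to one objective up front.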
Your idea may have potential, but it is way too abstract in its present form for me to evaluate. I would encourage you to flesh it out using a concrete game as an example. Also, try to be specific about how each component of AlphaZero should be adapted to work with your idea (MCTS, network training objective, self-play...).