artificial intelligence - In counterfactual regret minimization, why are additions to regret weighted by reach probability? - Computer Science Stack Exchange #967
Labels
Algorithms
artificial-intelligence
forum-discussion
game-theory
probability-theory
In Counterfactual Regret Minimization, Why Are Additions to Regret Weighted by Reach Probability?
Context
The question explores the reasoning behind weighting additions to regret and strategy in Counterfactual Regret Minimization (CFR) algorithms.
Original Algorithm Lines
Key Variables
r_I[a]: accumulated regret for information set I and action a
s_I[a]: accumulated strategy for information set I and action a
π_i: probability of the learning player reaching the game state
π_{−i}: probability of the other players reaching the game state
Motivation
The weighting serves two primary purposes:
- Regret additions are weighted by the other players' reach probability π_{−i}, so the regret accumulated at an information set reflects its counterfactual value: how much the player stands to gain given that the opponents actually play to reach I.
- Strategy additions are weighted by the player's own reach probability π_i, so the accumulated average strategy reflects how often each information set is actually played; it is this average strategy, not the current one, that converges toward equilibrium.
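Since the question's original algorithm lines are not reproduced above, here is a minimal sketch (hypothetical variable names, not the questioner's exact code) of how these quantities typically enter the per-node bookkeeping in a CFR tree walk:

```python
import numpy as np

def cfr_update(r_I, s_I, pi_i, pi_neg_i, strategy, action_utils, node_util):
    """One hypothetical CFR bookkeeping step at an information set I.

    r_I, s_I       : accumulated regret / strategy arrays (one entry per action)
    pi_i, pi_neg_i : reach probability of the learning player / of everyone else
    strategy       : current strategy at I (e.g. from regret matching)
    action_utils   : counterfactual utility of each action at I
    node_util      : expected utility at I under `strategy`
    """
    # Regret additions are weighted by the OTHER players' reach probability:
    # regret counts in proportion to how likely the opponents are to bring
    # play to I at all.
    r_I = r_I + pi_neg_i * (action_utils - node_util)

    # Strategy additions are weighted by the player's OWN reach probability,
    # so the running average reflects how often I is visited under sigma.
    s_I = s_I + pi_i * strategy
    return r_I, s_I
```

The asymmetry is the point of the question: each accumulator uses the reach contribution of the *other* side of the weighting it needs for its convergence guarantee.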
Theoretical Basis
The approach is grounded in the paper "Regret Minimization in Games with Incomplete Information", which proves that overall regret is bounded by cumulative counterfactual regret.
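In that paper's notation (a reconstruction, not a verbatim quote), the immediate counterfactual regret carries exactly the π_{−i} weighting discussed above, and the theorem bounds overall regret by its positive parts:

```latex
% Immediate counterfactual regret at information set I,
% weighted by the opponents' reach probability \pi_{-i}:
R^{T}_{i,\mathrm{imm}}(I) = \frac{1}{T}\max_{a \in A(I)}\sum_{t=1}^{T}
    \pi^{\sigma^{t}}_{-i}(I)\left(u_i(\sigma^{t}|_{I \to a}, I) - u_i(\sigma^{t}, I)\right)

% Overall regret is bounded by the sum of positive immediate regrets:
R^{T}_{i} \le \sum_{I \in \mathcal{I}_i} R^{T,+}_{i,\mathrm{imm}}(I),
\qquad R^{T,+}_{i,\mathrm{imm}}(I) = \max\!\left(R^{T}_{i,\mathrm{imm}}(I),\, 0\right)
```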
The goal is to approximate a Nash equilibrium strategy by systematically minimizing cumulative counterfactual regret across multiple iterations.
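The iteration that realizes this is regret matching plus strategy averaging; a short sketch (assumed helper names, standard formulation rather than the questioner's code):

```python
import numpy as np

def regret_matching(r_I):
    """Current strategy from accumulated regrets: play each action in
    proportion to its positive regret; uniform if none is positive."""
    positive = np.maximum(r_I, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(r_I), 1.0 / len(r_I))

def average_strategy(s_I):
    """Normalize the pi_i-weighted strategy sums. This average over all
    iterations, not the current strategy, approximates the equilibrium."""
    total = s_I.sum()
    if total > 0:
        return s_I / total
    return np.full(len(s_I), 1.0 / len(s_I))
```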