Expectimax

Uncertain Outcomes

In many environments we’re faced with stochastic outcomes; we do not know with certainty what states will result from certain actions. This can be due to explicit randomness present in the environment, other unpredictable factors like opponents, or failing to execute an action perfectly. As a result, we should consider the average case rewards/costs instead of the worst case. Minimax computes the worst-case, always assuming players choose what’s best for them or worst for their opponent. This does not properly reflect randomness as discussed above, and thus we can use expectimax search for computing the average case.

Modeling Assumptions

Dangers of Optimism and Pessimism

  • Dangerously optimistic when we assume chance but the world is adversarial
  • Dangerously pessimistic when we assume the worst case but it’s unlikely

Mixed Layer Types

  • Can have the environment as a random agent making random moves in between plays of the real agents
  • Can be modeled with “expectiminimax”

Multi-agent Utilities

  • In the case the game is not zero-sum or has multiple players, we can use a generalization of minimax, where:
    • Terminal nodes have utility tuples
    • Node values are also utility tuples
    • Each player wants to maximize its own component of the tuple
    • Not necessarily competitive, can give rise to cooperation or other player strategies

Expected Utility

  • Rational agents should chose the action that maximizes the expected utility given what it knows about the world
  • Utilities are function from outcomes (states) to real numbers describing an agent’s preferences
  • Theorem states that any rational preferences can be encoded as a utility function
    • Given preferences satisfying basic axioms of rationalitym there exists a real-valued utility function UU such that U(A)U(B)    ABU(A) \ge U(B) \iff A \succeq B U([p1,S1;;pn,Sn])=ipiU(Si)U([p_1,S_1;\dots;p_n,S_n]) = \sum_ip_iU(S_i)
  • Principle of maximum expected utility (MEU): choose action maximizing expected utilityA
  • Human utilities: compare prize AA to “standard lottery” LpL_p between the best possible prize with probability pp and the worst possible catastrophe with probability 1p1-p
    • We adjust probability pp until indifference between ALpA \sim L_p
    • Resulting pp is a utility between 0 and 1