# Expectimax

# Uncertain Outcomes

In many environments we’re faced with stochastic outcomes; we do not know with certainty what states will result from certain actions. This can be due to explicit randomness present in the environment, other unpredictable factors like opponents, or failing to execute an action perfectly. As a result, we should consider the average case rewards/costs instead of the worst case. Minimax computes the worst-case, always assuming players choose what’s best for them or worst for their opponent. This does not properly reflect randomness as discussed above, and thus we can use **expectimax** search for computing the average case.

# Expectimax Search

Expectimax search computes the average score under optimal play i.e calculates expected utilities for each node’s children in the game tree.

- The algorithm mixes MAX nodes with CHANCE nodes, where CHANCE nodes derive their value from the expected value of its children i.e. the sum of the values of each of the children nodes weighted according to their probability of occurrence

## Pseudocode

## Pruning and Depth Limits

Pruning can be quite difficult and shave off important values that contribute heavily to the expectation. When limiting the search in real-world use cases, it is much better to use depth-limited expectimax by restricting the depth of the game tree over which the algorithm is allowed to compute.

## Example

# Modeling Assumptions

## Dangers of Optimism and Pessimism

- Dangerously optimistic when we assume chance but the world is adversarial
- Dangerously pessimistic when we assume the worst case but it’s unlikely

# Mixed Layer Types

- Can have the environment as a random agent making random moves in between plays of the real agents
- Can be modeled with “expectiminimax”

# Multi-agent Utilities

- In the case the game is not zero-sum or has multiple players, we can use a generalization of minimax, where:
- Terminal nodes have
**utility tuples** - Node values are also utility tuples
- Each player wants to maximize its own component of the tuple
- Not necessarily competitive, can give rise to cooperation or other player strategies

- Terminal nodes have

# Expected Utility

- Rational agents should chose the action that maximizes the
**expected utility**given what it knows about the world **Utilities**are function from outcomes (states) to real numbers describing an agent’s preferences- Theorem states that any rational preferences can be encoded as a utility function
- Given preferences satisfying basic axioms of rationalitym there exists a real-valued utility function $U$ such that $U(A) \ge U(B) \iff A \succeq B$ $U([p_1,S_1;\dots;p_n,S_n]) = \sum_ip_iU(S_i)$

- Principle of maximum expected utility (MEU): choose action maximizing expected utilityA
- Human utilities: compare prize $A$ to “standard lottery” $L_p$ between the best possible prize with probability $p$ and the worst possible catastrophe with probability $1-p$
- We adjust probability $p$ until indifference between $A \sim L_p$
- Resulting $p$ is a utility between 0 and 1