Expectimax

created: 2020-02-05 · modified: 2020-09-08

Uncertain Outcomes

In many environments we’re faced with stochastic outcomes; we do not know with certainty what states will result from certain actions. This can be due to explicit randomness present in the environment, other unpredictable factors like opponents, or failing to execute an action perfectly. As a result, we should consider the average case rewards/costs instead of the worst case. Minimax computes the worst-case, always assuming players choose what’s best for them or worst for their opponent. This does not properly reflect randomness as discussed above, and thus we can use expectimax search for computing the average case.

Expectimax Search

Expectimax search computes the average score under optimal play i.e calculates expected utilities for each node’s children in the game tree.

The algorithm mixes MAX nodes with CHANCE nodes, where CHANCE nodes derive their value from the expected value of its children i.e. the sum of the values of each of the children nodes weighted according to their probability of occurrence

Pseudocode

Pruning and Depth Limits

Pruning can be quite difficult and shave off important values that contribute heavily to the expectation. When limiting the search in real-world use cases, it is much better to use depth-limited expectimax by restricting the depth of the game tree over which the algorithm is allowed to compute.

Example

Modeling Assumptions

Dangers of Optimism and Pessimism

Dangerously optimistic when we assume chance but the world is adversarial
Dangerously pessimistic when we assume the worst case but it’s unlikely

Mixed Layer Types

Can have the environment as a random agent making random moves in between plays of the real agents
Can be modeled with “expectiminimax”

Multi-agent Utilities

In the case the game is not zero-sum or has multiple players, we can use a generalization of minimax, where:
- Terminal nodes have utility tuples
- Node values are also utility tuples
- Each player wants to maximize its own component of the tuple
- Not necessarily competitive, can give rise to cooperation or other player strategies

Expected Utility

Rational agents should chose the action that maximizes the expected utility given what it knows about the world
Utilities are function from outcomes (states) to real numbers describing an agent’s preferences
Theorem states that any rational preferences can be encoded as a utility function
- Given preferences satisfying basic axioms of rationalitym there exists a real-valued utility function $U$ such that $U(A) \ge U(B) \iff A \succeq B$ $U([p_1,S_1;\dots;p_n,S_n]) = \sum_ip_iU(S_i)$
Principle of maximum expected utility (MEU): choose action maximizing expected utilityA
Human utilities: compare prize $A$ $A$ to “standard lottery” $L_p$ $L_{p}$ between the best possible prize with probability $p$ $p$ and the worst possible catastrophe with probability $1-p$ $1 - p$
- We adjust probability $p$ until indifference between $A \sim L_p$
- Resulting $p$ is a utility between 0 and 1

Expectimax

Uncertain Outcomes

Expectimax Search

Pseudocode

Pruning and Depth Limits

Example

Modeling Assumptions

Dangers of Optimism and Pessimism

Mixed Layer Types

Multi-agent Utilities

Expected Utility

Metadata

Collection

Link Subgraph

Backlinks

Webmentions