Adversarial search
Deterministic Games
- States: $S$ (starting at $s_0$)
- Players: $P = \{1, \dots, N\}$
- Actions: $A$ (may be conditional on player or state)
- Transition function: $S \times A \to S$
- Terminal test: $S \to \{\text{true}, \text{false}\}$
- Terminal utilities: $S \times P \to \mathbb{R}$, telling how good each ending state is for each player
- “Solution” for a player is a policy: $S \to A$
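As a rough illustration (not part of the original formulation), this is what the formalization above might look like as a Python interface; all the names here, such as `Game` and `initial_state`, are hypothetical rather than drawn from any real library:

```python
from typing import Any, List, Protocol

State = Any  # opaque game state; a concrete game defines its own representation


class Game(Protocol):
    """Hypothetical interface mirroring the formulation above."""

    def initial_state(self) -> State: ...                       # starting state s_0
    def players(self) -> List[int]: ...                         # players P = {1, ..., N}
    def actions(self, state: State) -> List[Any]: ...           # actions A (conditional on player/state)
    def result(self, state: State, action: Any) -> State: ...   # transition function S x A -> S
    def is_terminal(self, state: State) -> bool: ...            # terminal test S -> {true, false}
    def utility(self, state: State, player: int) -> float: ...  # terminal utilities S x P -> R
```

A “solution” for a player is then just a policy, i.e. a function from states to actions.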
Zero-sum Games
- Agents have exactly opposite utilities
- Purely competitive environment, positive outcome for one player is negative for another
General Games
- Agents have independent utilities
- Cooperation, indifference, competition, etc. are all possible
Value of a state
The value of a state is the best achievable outcome or utility from that state. The value of a non-terminal state is given by $V(s) = \max_{s' \in \text{successors}(s)} V(s')$, which says that a state’s value takes on the highest value attained by any of its children. Terminal states have a known value according to the terminal utility for that player.
Note that this definition holds for both single-agent and adversarial game trees. In the latter, however, the opponent controls the child nodes, i.e. the successors of each of the nodes you control. The value of the current state is still the value of the best successor node, even if the actions immediately following are in the opponent’s hands.
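In the single-agent case, this definition translates directly into a short recursion. A minimal sketch, assuming the hypothetical `Game` interface above:

```python
def state_value(game, state, player):
    """Value of a state: best achievable utility for `player` from this state,
    in a single-agent tree where `player` controls every node."""
    if game.is_terminal(state):
        return game.utility(state, player)
    # A state's value is the highest value attained by any of its children.
    return max(state_value(game, game.result(state, a), player)
               for a in game.actions(state))
```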
Minimax
Minimax is an adversarial search algorithm that computes a minimax value at each node, which is the best achievable utility against an optimal adversary
- From the perspective of one particular player, this corresponds to maximizing utility over children at nodes where we have control, and minimizing utility at nodes where the opponent has control. This assumes the opponent plays optimally; since minimax is applied to zero-sum games, the opponent’s optimal play is exactly the worst case for us.
- Takes $O(b^m)$ time and $O(bm)$ space, where $b$ is the branching factor and $m$ is the maximum depth of the search
Note that realistically, we cannot search all the way to the leaves (it gets far too expensive very quickly). Instead, we use a depth-limited search, and use an evaluation function to assign scores to the lowest-level (non-terminal) states that we can reach.
- Can no longer guarantee optimality but allows us to actually make decent decisions in a reasonable amount of time
- The more levels we explore, the less the accuracy of the evaluation function matters (it is obscured by a greater amount of real evidence from the search)
- Shows tradeoff between feature complexity and computational complexity
- Ideally our evaluation function returns the actual minimax value, but in practice it is typically a weighted linear sum of features: $\text{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
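Putting this together, here is a sketch of depth-limited minimax with an evaluation function. It assumes the hypothetical `Game` interface from before, extended with a `to_move(state)` method that says whose turn it is; none of these names come from a real library.

```python
def minimax_value(game, state, depth, max_player, evaluate):
    """Depth-limited minimax value of `state` from `max_player`'s perspective.

    `evaluate(state, player)` scores non-terminal states once the depth limit
    is reached (e.g. a weighted linear sum of features)."""
    if game.is_terminal(state):
        return game.utility(state, max_player)
    if depth == 0:
        return evaluate(state, max_player)

    values = [minimax_value(game, game.result(state, a), depth - 1, max_player, evaluate)
              for a in game.actions(state)]

    if game.to_move(state) == max_player:
        return max(values)   # our node: maximize over children
    else:
        return min(values)   # opponent's node: assume optimal (minimizing) play


def minimax_decision(game, state, depth, evaluate):
    """Pick the action leading to the child with the highest minimax value."""
    player = game.to_move(state)
    return max(game.actions(state),
               key=lambda a: minimax_value(game, game.result(state, a),
                                           depth - 1, player, evaluate))
```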
Thrashing
The term “thrashing” was used when Pacman simply went back and forth in the same location, despite there being an easy level to complete without ghosts. Here we were using the score of the game as the evaluation function, with a depth-two lookahead. In this simple scenario, the game tree looks as follows:
(Figure: depth-two game tree for this level.)
- With a replanning agent in this game, it starts at the root node and evaluates its options. The values of both successor nodes are the same: he will get a point for eating the dot by going left, or he will get a point for eating the dot after moving right twice. Either way, the value of the root node’s children is the same, so Pacman will pick from them arbitrarily. Once he’s at the new state (if he’s thrashing, he went to the right), he will reevaluate the game tree with a depth-two lookahead and find himself in the exact same situation as before: indifferent between left and right movements, as they both yield the same score, just at different depths down the game tree. This will continue and Pacman will “thrash” about.
- This can be fixed either by looking further into the game tree or by improving our evaluation function. In the latter case, we can simply add a feature for the distance to the nearest dot, forcing Pacman to value states where he is closer to a dot higher than ones where he is further away (see the sketch below). This fixes the thrashing conundrum.
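A minimal sketch of such an evaluation function for this example; the state layout (a dict with `score`, `pacman`, and `dots` fields) and the Manhattan-distance feature are assumptions for illustration:

```python
def evaluate(state):
    """Game score plus a feature rewarding closeness to the nearest dot.

    `state` is assumed to look like
    {"score": 2, "pacman": (3, 1), "dots": [(0, 1), (5, 1)]} (hypothetical layout)."""
    value = state["score"]
    if state["dots"]:
        px, py = state["pacman"]
        nearest = min(abs(px - x) + abs(py - y) for x, y in state["dots"])
        value -= 0.1 * nearest   # small penalty for distance: closer dots score higher
    return value
```

With the extra feature, the two root successors no longer tie, so the agent stops picking arbitrarily and moves toward the dot.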
Alpha-Beta Pruning
- Prunes a node’s children by not exploring states that cannot affect the final decision
- Consider the perspective of a MAX node, where we want to choose the successor of the current root that has the maximum value for us. Let $\alpha$ be the best value that we can get along the currently explored paths from the root. Then, if the MIN node currently being expanded has a child with value $v$ less than $\alpha$, we can stop expanding it immediately. This is because we know the MIN node will select the minimum over its children and so take on a value of at most $v$, and this value will propagate up as an upper bound on the value of the successor node to the root at the top of the current path. Knowing this, the MAX node will just pick $\alpha$, as it’s the larger value. So as soon as we see the value $v$, we can stop exploring that MIN node’s other children, since we know it will never be played.
- $\alpha$ is MAX’s best option so far along the path to the root, $\beta$ is MIN’s best option so far along the path to the root. These two values are invariants maintained during the minimax search process, updated when we see a new best node, and used to cut off computation when we can.
- This pruning does not affect the minimax value computed for the root, but note that “intermediate” nodes (i.e. children of the root) may have the wrong absolute value, since their entire subtrees weren’t explored
- With the “perfect ordering” of children, we can double the solvable depth (time complexity drops to $O(b^{m/2})$)
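For reference, a sketch of minimax with alpha-beta pruning, under the same assumptions as the minimax sketch above (hypothetical `Game` interface with `to_move`):

```python
import math


def alphabeta_value(game, state, depth, max_player, evaluate,
                    alpha=-math.inf, beta=math.inf):
    """Depth-limited minimax value with alpha-beta pruning.

    alpha: best value MAX is guaranteed so far along the path to the root.
    beta:  best value MIN is guaranteed so far along the path to the root."""
    if game.is_terminal(state):
        return game.utility(state, max_player)
    if depth == 0:
        return evaluate(state, max_player)

    if game.to_move(state) == max_player:
        value = -math.inf
        for a in game.actions(state):
            child = game.result(state, a)
            value = max(value, alphabeta_value(game, child, depth - 1,
                                               max_player, evaluate, alpha, beta))
            if value >= beta:      # MIN above will never allow this node: prune
                return value
            alpha = max(alpha, value)
        return value
    else:
        value = math.inf
        for a in game.actions(state):
            child = game.result(state, a)
            value = min(value, alphabeta_value(game, child, depth - 1,
                                               max_player, evaluate, alpha, beta))
            if value <= alpha:     # MAX above already has something better: prune
                return value
            beta = min(beta, value)
        return value
```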