Add Alpha-Beta Pruning to the Search

(Warning: This can be a hard idea. If it were not so important to good play, I would have left it out. Watch closely.)

The idea behind alpha-beta pruning is given by an example.

20 W

Imagine that White computes a negamax value of 0.7 for capturing the pawn by c3-b4. But White will think about other options. After c3-d4, Black might try b6-b5. Now imagine that after thinking about this position

21 W

Black gives it a negamax value of -0.5. The question is: is there any reason for Black to explore other responses to c3-d4? The answer is no: since Black can force a value of 0.5 on White for c3-d4, White will always prefer the original c3-b4! Thus, Black need not think about, for example, c3-d4 b6-a6: Black has proven that White should never play c3-d4 in the first place, and more analysis will not change that.

The alpha-beta (a stupid name) pruning technique allows computing the negamax value of a tree in a way that gives the same answer with less work. The reason for calling it "pruning" can be seen from the game tree.

tree with AB prune

The subtree marked "?" in the tree will be "pruned away": there is no need to run this part of the recursion. Later we will talk about how much is saved. As it turns out, the savings are large.

To add alpha-beta to negamax, we need to keep track of the best score seen by each side so far in the search. We only compare against the opponent's best, but after the recursive call we become the opponent. Our worst possible value is the negation of our opponent's best value so far; we will call it a for "awful". Our "best" value we will call b. But sometimes we will use the Greek letters, so &alpha (alpha); and β (beta).

To find the negamax value of state s with max depth d
and a-b pruning:
    if s is a final state or d ≤ 0
        return score(s)
    M ← legal moves from s
    extract some move m from M
    s′ ← m(s)
    v′ ← -(negamax(s′d - 1, -b, -a))
    if v′ > b
        return v′
    a ← max(av′)
    for m in remaining moves of M
        s′ ← m(s)
        v ← -(negamax(s′d - 1, -b, -a))
        if v ≥ b
            return v
        v′ ← max(v′v)
        a ← max(av)
    return v′

Posted Wed 06 May 2015 02:12:20 AM PDT

By explicitly taking the maximum one-at-a-time rather than all at once, we can quit early if there is no point in more search, and we can use the current values of a and b rather than stale ones.

Missed Opportunity: Note that the starting a0 and b values used in best_move are quite conservative. If we somehow knew some tighter bounds on these values up front, we could get much more pruning. This is the idea behind "zero-window" or "scout" search.