Add Alpha-Beta Pruning to the Search

(Warning: This is a hard idea for some people. If it were not so important to good play, I would have left it out. Watch closely.)

The idea behind alpha-beta pruning is best seen through an example.

[Diagram 20]
Imagine that White computes a negamax value of 0.7 for capturing the pawn by c3-b4. But White will consider other options. After c3-d4, Black might try b6-b5. Now imagine that after exploring this position
[Diagram 21]
Black gives it a negamax value of -0.5. The question is: is there any reason for Black to explore other responses to c3-d4? The answer is no: Black can already hold White to at most 0.5 in the c3-d4 line, and White has 0.7 available from c3-b4, so White will always prefer the original c3-b4! Thus, Black need not consider, for example, c3-d4 b6-a6: Black has proven that White should never play c3-d4 in the first place, and more analysis will not change that.

Alpha-beta (a stupid name) pruning allows computing the negamax value of a tree in a way that gives the same answer with less work. The reason for calling it "pruning" can be seen from the game tree.
[Figure: game tree with alpha-beta values]
The subtree marked "?" in the tree will be "pruned away": there is no need to run this part of the recursion. Later we will talk about how much is saved—quite a lot.
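To make the pruning concrete, here is a small sketch in Python (not part of the original program) using the numbers from the example above. The tree is hand-built: leaves hold the negamax value for the player to move at that leaf, and internal nodes are lists of children. The counter shows that alpha-beta visits fewer nodes while returning the same answer.

```python
# Plain negamax: visits every node in the tree.
def negamax(node, count):
    count[0] += 1
    if not isinstance(node, list):
        return node
    return max(-negamax(c, count) for c in node)

# Negamax with alpha-beta: same answer, fewer visits.
def negamax_ab(node, a, b, count):
    count[0] += 1
    if not isinstance(node, list):
        return node
    v = -1.0
    for c in node:
        v = max(v, -negamax_ab(c, -b, -a, count))
        a = max(a, v)
        if v >= b:
            return b  # cutoff: remaining children are never visited
    return v

# Root is White to move:
tree = [-0.7,        # c3-b4: worth 0.7 to White
        [0.5,        # c3-d4 b6-b5: worth -0.5 to Black, triggering a cutoff
         0.9]]       # c3-d4 b6-a6: the "?" subtree, pruned away

plain, pruned = [0], [0]
v1 = negamax(tree, plain)
v2 = negamax_ab(tree, -1.0, 1.0, pruned)
```

Both searches value the root at 0.7, but the plain search visits all 5 nodes while the alpha-beta search visits only 4: the b6-a6 leaf is never touched.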

To add alpha-beta to negamax, we need to keep track of the best score seen by each side so far in the search. We only compare against the opponent's bounds, but after the recursive call we become the opponent, so the two bounds swap and negate at every call. The worst value we will settle for we call a, for "awful": we are already guaranteed a elsewhere in the tree. The best value the opponent will ever let us have we call b: if we can achieve b or better, the opponent will steer the game away from this position.

negamax(s, d, a, b) = do
  if game_over(s) or d = 0 return value(s)
  v = -1.0
  a0 = a
  for m in moves(s) do
    v = max(v, -negamax(move(s, m), d - 1, -b, -a0))
    a0 = max(a0, v)
    if v >= b return b
  return v
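The pseudocode translates directly into Python. As a stand-in for chess (so the sketch is self-contained and runnable), the game here is a tiny invented take-away game: a pile of s stones, each player removes 1 or 2, and whoever takes the last stone wins. The helper names game_over, value, moves, and move mirror the pseudocode; only their bodies are assumptions.

```python
# Toy stand-in game: a pile of s stones; remove 1 or 2; taking the last wins.
def game_over(s):
    return s == 0

def value(s):
    # Terminal: the side to move has no stone to take and has lost (-1.0).
    # Non-terminal positions cut off at depth 0 get a neutral guess (0.0).
    return -1.0 if s == 0 else 0.0

def moves(s):
    return [m for m in (1, 2) if m <= s]

def move(s, m):
    return s - m

# Direct rendering of the negamax-with-alpha-beta pseudocode above.
def negamax(s, d, a, b):
    if game_over(s) or d == 0:
        return value(s)
    v = -1.0
    a0 = a
    for m in moves(s):
        v = max(v, -negamax(move(s, m), d - 1, -b, -a0))
        a0 = max(a0, v)
        if v >= b:
            return b
    return v
```

With enough depth, the search finds that piles that are multiples of 3 are lost for the side to move: negamax(3, 10, -1.0, 1.0) is -1.0, while negamax(4, 10, -1.0, 1.0) is 1.0.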

best_move(s1) = do
  d0 = 1
  m0 = resign
  while time remains
    v = -1.0
    a0 = -1.0
    for m in moves(s1) do
      v0 = -negamax(move(s1, m), d0, -1.0, -a0)
      if v0 > v then m0 = m
      v = max(v, v0)
      a0 = max(a0, v0)
    d0 = d0 + 1
  return m0
Note that negamax returns a value from the opponent's perspective, so best_move must negate it, and that v and a0 restart at each depth, since a deeper search may lower the score of the previous best move.
By explicitly taking the maximum one-at-a-time rather than all at once, we can quit early if there is no point in more search, and we can use the current values of a and b rather than stale ones.
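Here is a runnable sketch of best_move in Python on the same toy take-away game used earlier (a pile of stones, remove 1 or 2, taking the last stone wins; the game and a fixed depth budget standing in for "while time remains" are assumptions, not part of the original program).

```python
# Negamax with alpha-beta for the toy game, as in the earlier sketch.
def negamax(s, d, a, b):
    if s == 0 or d == 0:
        return -1.0 if s == 0 else 0.0  # lost if out of stones, else unknown
    v, a0 = -1.0, a
    for m in (1, 2):
        if m > s:
            continue
        v = max(v, -negamax(s - m, d - 1, -b, -a0))
        a0 = max(a0, v)
        if v >= b:
            return b
    return v

def best_move(s1, max_depth=8):
    m0 = None                              # plays the role of "resign"
    for d0 in range(1, max_depth + 1):     # stands in for "while time remains"
        v, a0 = -1.0, -1.0                 # restart the bounds at each depth
        for m in (1, 2):
            if m > s1:
                continue
            # Negate: negamax scores the position from the opponent's side.
            v0 = -negamax(s1 - m, d0, -1.0, -a0)
            if v0 > v or m0 is None:
                m0 = m
            v = max(v, v0)
            a0 = max(a0, v0)
    return m0
```

For this game the winning strategy is to leave the opponent a multiple of 3: from a pile of 4 the sketch takes 1 stone, and from a pile of 5 it takes 2.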

Missed Opportunity: Note that the starting a0 and b values used in best_move are quite conservative. If we somehow knew some tighter bounds on these values up front, we could get much more pruning. This is the idea behind "zero-window" or "scout" search.
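As a rough sketch of the zero-window idea (again on the toy take-away game, which is an assumption for illustration): a search over a very narrow window cannot return an exact value, but it can cheaply answer the yes/no question "is the true value above g?", because the tight bounds cause far more cutoffs than the full window (-1.0, 1.0) would.

```python
# Negamax with alpha-beta for the toy game (pile of s stones, take 1 or 2,
# taking the last stone wins), as in the earlier sketches.
def negamax(s, d, a, b):
    if s == 0 or d == 0:
        return -1.0 if s == 0 else 0.0
    v, a0 = -1.0, a
    for m in (1, 2):
        if m > s:
            continue
        v = max(v, -negamax(s - m, d - 1, -b, -a0))
        a0 = max(a0, v)
        if v >= b:
            return b
    return v

def exceeds(s, d, g, eps=1e-6):
    # Zero-window probe over (g, g + eps): if the result climbs above g,
    # the true value is above g; otherwise the true value is at most g.
    return negamax(s, d, g, g + eps) > g
```

For example, a pile of 4 is a win for the side to move while a pile of 3 is not, so exceeds(4, 16, 0.0) holds and exceeds(3, 16, 0.0) does not. Scout-style searches use such probes to verify cheaply that later moves cannot beat the current best, re-searching with a full window only when a probe succeeds.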
