# Machine Learning

PSU CS410/510GAMES
Lecture 7

May 16, 2002

- More probability: perfect Yahtzee
  - Rules
    - Yahtzee
    - Yacht
  - One-player, maximize score
    - Retrograde analysis
    - Nonlinearity
    - Avoiding recomputation
  - More sophisticated
    - One-player, maximize winning chance
    - Two-player, maximize winning chance
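The retrograde-analysis and avoiding-recomputation ideas above can be sketched on a toy dice game rather than full Yahtzee: one six-sided die, up to two rerolls, score = final face value. A memo table over states caches each subresult so it is computed once (this simplification and all names are illustrative, not the full Yahtzee solver):

```python
from functools import lru_cache

# State = (current face, rerolls left); work backward from rerolls_left == 0.
@lru_cache(maxsize=None)          # memoization avoids recomputing states
def expected_value(face, rerolls_left):
    """Optimal expected score: keep the current face, or reroll if allowed."""
    keep = face
    if rerolls_left == 0:
        return keep
    # Expected value of rerolling: average over the six equally likely faces.
    reroll = sum(expected_value(f, rerolls_left - 1) for f in range(1, 7)) / 6
    return max(keep, reroll)

# Value of the game before the first roll: average over initial faces.
game_value = sum(expected_value(f, 2) for f in range(1, 7)) / 6
```

With one reroll the cutoff is a face below 3.5; with two it rises to 4.25, so the game is worth 14/3 ≈ 4.67, showing the nonlinearity of optimal keep/reroll policies.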

- Rules
- Machine learning of games
  - Goal: learn state values (vs. learn moves)
  - Generic problems: opponent modeling, overtraining, feature selection
- TD-Gammon: neural nets for evaluation
  - Gerry Tesauro, IBM: learn backgammon by self-play with a neural-net evaluator, starting from random play
  - Gradient-descent weight updates with TD(λ) assign errors back to earlier positions
  - Eventually used hand-coded features plus shallow search to improve play
  - Learning "doubling" (use of the doubling cube) is hard
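The TD(λ) credit assignment can be sketched for a linear evaluator (TD-Gammon itself used a multilayer net; a linear model keeps the eligibility-trace bookkeeping visible, and all names here are illustrative, not Tesauro's code):

```python
import numpy as np

def td_lambda_episode(features, rewards, w, alpha=0.1, lam=0.7):
    """One episode of TD(lambda) updates on a linear value function w @ x.

    features: feature vectors for successive positions in one game;
    rewards: per-step rewards (in backgammon, 0 until the final step).
    """
    e = np.zeros_like(w)                  # eligibility trace
    for t in range(len(features) - 1):
        x, x_next = features[t], features[t + 1]
        delta = rewards[t] + w @ x_next - w @ x   # TD error, no discounting
        e = lam * e + x                   # decay old credit, add new gradient
        w = w + alpha * delta * e         # push error back along the trace
    return w
```

The trace `e` is what "assigns errors to positions": each position's features keep receiving (geometrically decayed) credit for later TD errors.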

- Michael Buro: three strategies
  - Multi-ProbCut: learn reasonable alpha-beta (AB) windows for pruning
  - Opening book: don't lose the same way twice
    - Idea: use principal variation search (PVS), but label leaves leading to a demonstrated win/loss with infinite scores. A win for one side is a loss for the other...
    - Learning: update the book as position values are "found". (Can always find a "best" position to play for unless the tree value is -inf.)
    - Problems
      - Bad opponents mean bad Ws; bad play means bad Ls
      - Expensive to build an opening book this way
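The opening-book backup can be sketched as a negamax over the stored book, where leaves on lines with a demonstrated result carry ±infinity and other leaves keep heuristic scores (the data layout and function names are illustrative, not Logistello's internals):

```python
import math

def book_value(node, book, leaf_scores):
    """Negamax value of a book node; book maps node -> list of children.

    Leaf scores are from the perspective of the side to move at the leaf:
    -inf marks a demonstrated loss there, +inf a demonstrated win.
    """
    children = book.get(node)
    if not children:                      # leaf: proven +/-inf or heuristic
        return leaf_scores[node]
    # A win for one side is a loss for the other: negate child values.
    return max(-book_value(c, book, leaf_scores) for c in children)

def best_move(node, book, leaf_scores):
    """The 'best' position to play for; exists unless value is -inf."""
    return max(book.get(node, []),
               key=lambda c: -book_value(c, book, leaf_scores))
```

Relabeling one leaf after a lost game flips values all the way up the book, which is exactly how the program avoids losing the same way twice.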

- GLEM: learning eval functions
  - Eval function: g(sum over i of w[i] * val(c[i])), where c[i] = and over j of r[j]
  - Uses gradient descent to adjust weights: effectively a "one-neuron net"
  - Features c[i] and relations r[j] are given by the implementor
  - Strategy: get endgame weights first, then back weight assignments up toward the opening
  - Problems: expense, overtraining, weight assignment
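The "one-neuron net" view can be sketched directly: each configuration c[i] is a conjunction of boolean relations r[j], val(c[i]) is its 0/1 truth value, g is a squashing function, and gradient descent adjusts the weights (a minimal illustration with assumed names, not Buro's GLEM code; GLEM's actual g and error measure may differ):

```python
import math

def val(relations):
    """val(c[i]): 1.0 iff every relation r[j] in the conjunction holds."""
    return 1.0 if all(relations) else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def evaluate(weights, config_values):
    """g(sum_i w[i] * val(c[i])) with g = sigmoid."""
    return sigmoid(sum(w * v for w, v in zip(weights, config_values)))

def train_step(weights, config_values, target, lr=0.5):
    """One gradient-descent step on squared error: the one-neuron update."""
    y = evaluate(weights, config_values)
    grad = (y - target) * y * (1 - y)     # d(error)/d(weighted sum)
    return [w - lr * grad * v for w, v in zip(weights, config_values)]
```

Training endgame positions first, then earlier ones, mirrors the strategy of backing weight assignments up from the endgame toward the opening.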

- Genetic algorithms (GAs) for evaluation
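A GA tunes eval-function weights without gradients: keep a population of weight vectors, select the fittest, and breed with crossover and mutation. A minimal sketch follows; the fitness function here is a stand-in (distance to an assumed target vector), whereas in practice fitness would come from playing games between candidate evaluators:

```python
import random

def fitness(weights, target):
    """Stand-in fitness: closeness to an assumed 'good' weight vector."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def evolve(target, pop_size=20, generations=50, n_weights=3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, target), reverse=True)
        survivors = pop[:pop_size // 2]           # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Uniform crossover plus small Gaussian mutation.
            children.append([rng.choice(pair) + rng.gauss(0, 0.05)
                             for pair in zip(a, b)])
        pop = survivors + children
    return max(pop, key=lambda w: fitness(w, target))
```

Compared with GLEM-style gradient descent, the GA needs no differentiable error signal, at the cost of many more evaluations per improvement.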