|
Machines That Learn to Play
Games edited by Johannes Fürnkranz and Miroslav Kubat Advances in Computation: Theory and Practice, Volume 8 |
Abstracts |
Chapter 1 - Should Machines
Learn How to Play Games?
Miroslav Kubat
Chapter 2 - Machine Learning in Games: A Survey
Johannes Fürnkranz
This paper provides a survey of previously published work on machine learning in game playing. The material is organized around a variety of problems that typically arise in game playing and that can be solved with machine learning methods. This approach, we believe, allows both, researchers in game playing to find appropriate learning techniques for helping to solve their problems as well as machine learning researchers to identify rewarding topics for further research in game-playing domains. The chapter covers learning techniques that range from neural networks to decision tree learning in games that range from poker to chess. However, space constraints prevent us from giving detailed introductions to the used learning techniques or games. Overall, we aimed at striking a fair balance between being exhaustive and being exhausting.
Chapter 3 - Human Learning in Game Playing
Fernand Gobet & Herbert A. Simon
Games have played an important role in psychologists' study of complex cognitive processes. After a long period where research mainly dealt with issues about memory and problem solving, several lines of research are now specifically devoted to the learning by humans of game-playing skills. Current research on human learning in game playing can be classified into three broad categories: cross-sectional studies of perception, memory or problem solving; longitudinal studies following the development from novice to expert; and computer modeling. After examining the state of research, the chapter discusses the influences of psychology upon machine learning, and of machine learning upon psychology.
Chapter 4 - Toward Opening Book Learning
Michael Buro
In this chapter an opening book framework for game--playing programs is presented. Motivated by basic requirements for successfully playing a sequence of games---such as avoiding losing games twice in the same way---it is shown how reasonable move alternatives can be found in order to deviate from previous lines of play. Variants of the algorithm are used by several of today's best Othello programs and allow them to extend their opening books automatically.
Chapter 5 - Reinforcement Learning and Chess
Johnathan Baxter, Andrew Tridgell, Lex Weaver
In this chapter we present TDLeaf(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program KNIGHTCAP used TDLeaf(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KNIGHTCAP improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principle among them being the use of on-line, rather than self-play (or play with respect to some other distribution of opponents). We also investigate whether TDLeaf(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
Chapter 6 - Comparison Training of Chess Evaluation
Functions
Gerald Tesauro
The supervised learning methodology of "comparison training" (Tesauro, 1989a) on a database of expert preferences is extended to search depths beyond 1-ply, and applied to the problem of training the weights in a linear evaluation function for the game of chess. An initial set of experiments was performed using SCP, a public-domain chess program. Training based on simple 1-ply searches was found to be ineffective, but for 1-ply plus quiescence expansion, high-quality solutions were found that outperform SCP's hand-tuned weights. The trained weights had performance that scaled well with search depth, and consistent improvement over the hand-tuned solution was found even for test depths much greater than the training search depth.
A discretized version of the algorithm was also developed and used to tune a subset of the weights in DEEP BLUE, having to do primarily with king safety evaluation. Training was based on 4-ply search (plus quiescence), and good test-set generalization was found out to 7-ply. During the 1997 rematch with Garry Kasparov, the tuning of the king-safety weights made a critical difference in one important position in game 2, and in the program's general understanding and handling of game 6.
Chapter 7 - Feature Construction for Game Playing
Paul E. Utgoff
To build an evaluation function for game-playing, one needs to construct informative features that enable accurate relative assessment of a game state. This chapter describes the feature construction problem, and suggests directions for dealing with shortcomings in the present state of the art.
Chapter 8 - Learning to Play Expertly: A Tutorial on Hoyle
Susan L. Epstein
HOYLE is a program that learns to play a broad variety of board games very well, in real time. Hoyle is not a traditional game playing artifact: it eschews search and tolerates, even encourages, inconsistent and incomplete knowledge. This paper describes the program's orientation and current status, reports on some of its achievements, and offers some explanation for its prowess.
Chapter 9 - Acquisition of Go Knowledge from Game Records
Takuya Kojima & Atsushi Yoshikawa
A large amount of knowledge is considered very useful for systems playing such games as Go and shogi (Japanese chess) which have a much larger search space than chess. Knowledge acquisition is therefore necessary for systems playing these games. This paper explains two approaches to acquire Go knowledge from game records: a deductive approach and an evolutionary one. The former is taken to acquire strict knowledge; several rules are acquired from a single training example. The latter is taken to acquire a large amount of heuristic knowledge from a large amount of training examples. Tsume-go problems are solved in order to compare the performance of the algorithm with others.
Chapter 10 - Honte, a G0-Playing Program Using Neural Nets
Fredrik A. Dahl
The Go-playing program HONTE is described. It uses neural nets together with more conventional AI-methods like alpha-beta search. A neural net is trained by supervised learning to imitate local shapes seen in a database of expert games. A second net is trained to estimate the safety of groups by self play using a modified version of TD(λ)-learning. A third net is trained to estimate territorial potential of unoccupied points, also based on self play and TD(λ)-learning. Although the program has not yet reached the level of the most popular commercial Go-programs, results are encouraging.
Chapter 11 - Learning to Play Strong Poker
Darse Billings, Lourdes Peña, Jonathan Schaeffer, Duane
Szafron
Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect knowledge, where multiple competing agents must deal with risk management, opponent modeling, unreliable information, and deception, much like decision-making applications in the real world. Opponent modeling is one of the most difficult problems in decision-making applications and in poker it is essential to achieving high performance. This chapter describes and evaluates the implicit and explicit learning in the poker program LOKI. LOKI implicitly "learns" sophisticated strategies by selectively sampling likely cards for the opponents and then simulating the remainder of the game. The program has explicit learning by observing its opponents, constructing opponent models and dynamically adapting its play to exploit patterns in the opponents' play. The result is a program capable of playing reasonably strong poker, but there remains considerable research to be done to play at a world-class level.