An Investigation Of An Adaptive Poker Player (Graham Kendall & Mark Willdig).pdf

(1097 KB) Pobierz
49263173 UNPDF
An Investigation of an Adaptive Poker Player
Graham Kendall and Mark Willdig
The University of Nottingham, School of Computer Science & IT, Jubilee Campus,
Wollaton Road, Nottingham NG8 1BB, UK
gxk@cs.nott.ac.uk, Mark.Willdig@siemenscomms.co.uk
Abstract : Other work has shown that adaptive learning can be highly
successful in developing programs which are able to play games at a level
similar to human players and, in some cases, exceed the ability of a vast
majority of human players. This study uses poker t o investigate how adaptation
can be used in games of imperfect information. An internal learning value is
manipulated which allows a poker playing agent to develop its playing strategy
over time. The results suggest that the agent is able to learn how to play poker,
initially losing, before winning as the players strategy becomes more
developed. The evolved player performs well against opponents with different
playing styles. Some limitations of previous work are overcome, such as deal
rotation to remove the bias introduced by one player always being the last to
act. This work provides encouragement that this is an area worth exploring
more fully in our future work.
1. Introduction
Game playing has a long research history. Chess has received particular interest
culminating in Deep Blue beating Kasparov in 1997, albeit with specialized hardware
(Hamilton, 1997) and brute force search. However, although arguably, being a ‘solved
game’ chess still receives interest as researchers turn to adaptive learning techniques
which allow computers to ‘learn’ to play chess, rather than being ‘told’ how it should
play (Kendall, 2001). Adaptive learning was being used for checkers as far back as
the 1950’s with Samuel’s seminal work (1959, re-produced in Samuel, 2000).
Checkers research would lead to Jonathan Schaeffer developing Chinook, which
claimed the world title in 1994 (Schaeffer, 1996). Like Deep Blue, it is arguable if
Chinook used AI techniques. Chinook had an opening and ending database. In certain
games it was able to play the entire game from these two databases. If this could not
be achieved, a form of mini-max search, with alpha-beta pruning was used. Despite
Chinook becoming the world champion, the search has continued for an adaptive
checkers player. Chellapilla and Fogel’s Chellapilla, 2000) Anaconda was named
due to the strangle hold it placed on its opponent. It is also named Blondie24, this
being the name it used when competing in internet games (Fogel, 2001). Anaconda
uses an artificial neural network (ANN), with 5000 weights, which are evolved by an
evolutionary strategy. The inputs to the ANN are the current board position and it
outputs a value which is used in a mini-max search. During the training period, using
co-evolution, the program is given no information other than whether it won or lost.
Once Anaconda is able to play at a suitable level, it often searches to a depth of 10,
but depths of 6 and 8 are also common in play. Anaconda has been available to the
delegates at the Congress on Evolutionary Computing (CEC) conference for the past
two years (CEC’00, San Diego and CEC’01, Seoul) with Fogel offering a prize of
$100 (CEC’00) and $200 (CEC’01) to anybody who could defeat it. The prize
remains unclaimed and at the next conference (CEC’02, Hawaii), the prize rises to
$300.
Poker also has an equally long research history with von Neumann and Morgensten
(von Neumann, 1944) experimenting with a simplified, two-player, version of poker.
Findler (Findler, 1977) studied poker, over a 20 year period. He also worked on a
simplified game, based on 5-card draw poker with no ante and no consideration of
betting position due to the computer always playing last. He concluded that dynamic
and adaptive algorithms are required for successful play and static mathematical
models were unsuccessful and easily beaten.
In more recent times three research groups have been researching poker. Jonathan
Schaeffer (of Chinook fame) and a number of his students have developed ideas
which have led to Loki, which is, arguably, the strongest poker playing program to
date. It is still a long way from being able to compete in the World Series of Poker
(WSOP), an annual event held in Las Vegas, but initial results are promising.
Schaeffer’s work concentrates on two main areas (Billings, 1998a and Schaeffer,
1999). The first research theme makes betting decisions using probabilistic
knowledge (Billings, 1999) to determine which action to take (fold, call or raise)
given the current game state. Billings et. al. also uses real time simulation of the
remainder of the game that allows the program to determine a statistically significant
result in the program’s decision making process. Schaeffer’s group also uses
opponent modeling (Billings, 1998b). This allows Loki to maintain a model of an
opponent and use this information to decide what betting decisions to make.
Koller and Pfeffer (Koller, 1997), using their Gala system, allow games of
imperfect information to be specified and solved, using a tree based approach.
However, due to the size of the trees they state “…we are nowhere close to being able
to solve huge games such as full-scale poker, and it is unlikely that we will ever be
able to do so.”
Luigi Barone and Lyndon While recognise four main types of poker player; Loose,
Tight, Passive, and Aggressive. These characteristics are combined to create the four
common types of poker players: Loose Passive, Loose Aggressive, Tight Passive and
Tight Aggressive players (Barone & While, 1999; 2000). A Loose Aggressive player
will overestimate their hand, raising frequently, and their aggressive nature will drive
the pot higher, increasing their potential winnings. A Loose Passive player will
overestimate their hand, but due to their passive nature will rarely raise, preferring to
call and allow other players to increase the pot. A Tight Aggressive player will play
to close constraints, participating in only a few hands which they have a high
probability of winning. The hands they do play, they will raise frequently to increase
the size of the pot. A Tight Passive player will participate in few hands, only
considering playing those that they have a high probability of winning. The passive
nature implies that they allow other players to drive the pot, raising infrequently
themselves.
In their first paper Barone and While (Barone, 1998) suggest evolutionary
strategies as a way of modelling an adaptive poker player. They use a simple poker
variant where each player has two private cards, there are five community cards and
one round of betting. This initial work incorporates three main areas of analysis; hand
strength, position and risk management. Two types of tables are used, a loose table
and a tight table. The work demonstrates how a player that has evolved using
evolutionary strategies can adapt its style to the two types of table.
In (Barone, 1999) they develop their work by introducing a hypercube which is an
n dimensional vector, used to store candidate solutions. The hypercube has one
dimension for the betting position (early, middle and late) and another dimension for
the risk management (selected from the interval 0..3). At each stage of the game the
relevant candidate solutions are selected from the hypercube (e.g. middle betting
position and risk management 2) and the decision is made whether to fold, call or
raise. To make the decision the hypercube entry holds seven real valued numbers
which are used as constants to three functions (fold, call and raise). In effect, the
functions lead to a probability of carrying out the relevant action. It s the seven real
values that are evolved depending on whether the player won the hand or not. Barone
reports that this poker player improves on the 1998 version. Their 2000 paper
(Barone, 2000) extends the dimensions of the hypercube to include four betting
rounds (pre-flop, post-flop, post-turn and post-river) and an opponent dimension so
that the evolved player can choose which type of player it is up against. The authors
report this player out performs a competent static player.
Poker, being a game of imperfect information, is interesting as a game for the basis
of research. Unlike chess and checkers, poker has some information that is unseen.
Poker also contains other unknowns such as the playing styles of the other players
who may use bluffing (and double bluffing) during the course of the game. These
elements add to the research interest. Unlike complete information games where the
techniques to solve the games (computational power allowing) have been known and
understood for a long time (such as mini-max search and alpha-beta pruning), games
of imperfect information have not received the same sort of analysis and, doing so,
could prove relevant to many other areas such as economics, on-line auctions and
negotiating.
2. The Rules of Poker
The exact rules for poker can be found in many poker books (see, for example,
Sklansky, 1994; 1996) and we simply give here the basic rules of one variant (Texas
Hold ‘Em) so that the reader is able to follow the remainder of this paper. Each player
is dealt two cards. These are private cards, only being visible to the player receiving
those cards. These cards are normally referred to as hole cards. A round of betting
follows this initial deal. Next, three community cards (called the flop) are dealt, face
up, in the middle of the table. These cards are used by every player to make the best
five card poker hand, using their hole cards. A round of betting follows the flop. Next,
another community card (called the turn) is dealt face up in the middle of the table.
Another round of betting follows. Finally, another community card (called the river)
is dealt and a final round of betting follows. Once this final round of betting has taken
place, assuming there are two or more players who still have an interest in the pot, the
cards are shown and the highest poker hand wins. In forming a poker hand, the
players can use any combination of their two hole cards and the five community cards
to make the best five card poker hand. The various poker hands are as follows, in
descending order.
Royal Flush : Ten, Jack, Queen, King and Ace, all in the same suit.
Straight Flush : any sequence of five cards, all of the same suit.
Four of a Kind : four cards having the same value, one from each suit.
Full House : three cards of the same value combined with two cards of the same
value. For example, Three 2’s and a pair of Queens.
Flush : all five cards have the same suit.
Straight : all five card values are in sequence, made up from at least two suits.
Three of a Kind : three cards all having the same value.
Two Pairs : two cards of the same value, combined with another two card of the same
value. For example, two 9’s and two 3’s.
A Pair : two cards having the same value.
Single Card : the highest value card is used to value the hand.
When betting, the players have three choices to make. They can either fold (throw
in their cards and relinquish all claims to the money in the pot), they can call (match
the amount of money bet so far) or they can raise (increase the current bet, thus
forcing all the other players to match this amount or fold). To start the betting it is
usual to put in some form of ante. This is a mechanism to start the betting by giving
the players an interest in the pot.
In this paper we have not implemented a full version of Texas Hold ‘Em,
preferring a version of poker, where the players are dealt five cards and, after a round
of betting, are allowed to trade two cards before a final round of betting. This version
is known as draw poker and was considered as a suitable test bed for this initial
investigation.
3. Experiments
We have implemented the four playing styles (loose passive, loose aggressive, tight
passive and tight aggressive) described above so that we can sit each of them at our
tables and find out if our approach can adapt to each of these styles. Each playing
style will play to a specific set of rules using the value of their current hand and the
current value of the pot to decide whether to fold, call, or raise.
Zgłoś jeśli naruszono regulamin