An Investigation Of An Adaptive Poker Player (Graham Kendall & Mark Willdig).pdf

An Investigation of an Adaptive Poker Player

Graham Kendall and Mark Willdig

The University of Nottingham, School of Computer Science & IT, Jubilee Campus,

Wollaton Road, Nottingham NG8 1BB, UK

gxk@cs.nott.ac.uk, Mark.Willdig@siemenscomms.co.uk

Abstract : Other work has shown that adaptive learning can be highly

successful in developing programs which are able to play games at a level

similar to human players and, in some cases, exceed the ability of a vast

majority of human players. This study uses poker t o investigate how adaptation

can be used in games of imperfect information. An internal learning value is

manipulated which allows a poker playing agent to develop its playing strategy

over time. The results suggest that the agent is able to learn how to play poker,

initially losing, before winning as the players strategy becomes more

developed. The evolved player performs well against opponents with different

playing styles. Some limitations of previous work are overcome, such as deal

rotation to remove the bias introduced by one player always being the last to

act. This work provides encouragement that this is an area worth exploring

more fully in our future work.

1. Introduction

Game playing has a long research history. Chess has received particular interest

culminating in Deep Blue beating Kasparov in 1997, albeit with specialized hardware

(Hamilton, 1997) and brute force search. However, although arguably, being a ‘solved

game’ chess still receives interest as researchers turn to adaptive learning techniques

which allow computers to ‘learn’ to play chess, rather than being ‘told’ how it should

play (Kendall, 2001). Adaptive learning was being used for checkers as far back as

the 1950’s with Samuel’s seminal work (1959, re-produced in Samuel, 2000).

Checkers research would lead to Jonathan Schaeffer developing Chinook, which

claimed the world title in 1994 (Schaeffer, 1996). Like Deep Blue, it is arguable if

Chinook used AI techniques. Chinook had an opening and ending database. In certain

games it was able to play the entire game from these two databases. If this could not

be achieved, a form of mini-max search, with alpha-beta pruning was used. Despite

Chinook becoming the world champion, the search has continued for an adaptive

checkers player. Chellapilla and Fogel’s Chellapilla, 2000) Anaconda was named

due to the strangle hold it placed on its opponent. It is also named Blondie24, this

being the name it used when competing in internet games (Fogel, 2001). Anaconda

uses an artificial neural network (ANN), with 5000 weights, which are evolved by an

evolutionary strategy. The inputs to the ANN are the current board position and it

outputs a value which is used in a mini-max search. During the training period, using

co-evolution, the program is given no information other than whether it won or lost.

Once Anaconda is able to play at a suitable level, it often searches to a depth of 10,

but depths of 6 and 8 are also common in play. Anaconda has been available to the

delegates at the Congress on Evolutionary Computing (CEC) conference for the past

two years (CEC’00, San Diego and CEC’01, Seoul) with Fogel offering a prize of

$100 (CEC’00) and $200 (CEC’01) to anybody who could defeat it. The prize

remains unclaimed and at the next conference (CEC’02, Hawaii), the prize rises to

$300.

Poker also has an equally long research history with von Neumann and Morgensten

(von Neumann, 1944) experimenting with a simplified, two-player, version of poker.

Findler (Findler, 1977) studied poker, over a 20 year period. He also worked on a

simplified game, based on 5-card draw poker with no ante and no consideration of

betting position due to the computer always playing last. He concluded that dynamic

and adaptive algorithms are required for successful play and static mathematical

models were unsuccessful and easily beaten.

In more recent times three research groups have been researching poker. Jonathan

Schaeffer (of Chinook fame) and a number of his students have developed ideas

which have led to Loki, which is, arguably, the strongest poker playing program to

date. It is still a long way from being able to compete in the World Series of Poker

(WSOP), an annual event held in Las Vegas, but initial results are promising.

Schaeffer’s work concentrates on two main areas (Billings, 1998a and Schaeffer,

1999). The first research theme makes betting decisions using probabilistic

knowledge (Billings, 1999) to determine which action to take (fold, call or raise)

given the current game state. Billings et. al. also uses real time simulation of the

remainder of the game that allows the program to determine a statistically significant

result in the program’s decision making process. Schaeffer’s group also uses

opponent modeling (Billings, 1998b). This allows Loki to maintain a model of an

opponent and use this information to decide what betting decisions to make.

Koller and Pfeffer (Koller, 1997), using their Gala system, allow games of

imperfect information to be specified and solved, using a tree based approach.

However, due to the size of the trees they state “…we are nowhere close to being able

to solve huge games such as full-scale poker, and it is unlikely that we will ever be

able to do so.”

Luigi Barone and Lyndon While recognise four main types of poker player; Loose,

Tight, Passive, and Aggressive. These characteristics are combined to create the four

common types of poker players: Loose Passive, Loose Aggressive, Tight Passive and

Tight Aggressive players (Barone & While, 1999; 2000). A Loose Aggressive player

will overestimate their hand, raising frequently, and their aggressive nature will drive

the pot higher, increasing their potential winnings. A Loose Passive player will

overestimate their hand, but due to their passive nature will rarely raise, preferring to

call and allow other players to increase the pot. A Tight Aggressive player will play

to close constraints, participating in only a few hands which they have a high

probability of winning. The hands they do play, they will raise frequently to increase

the size of the pot. A Tight Passive player will participate in few hands, only

considering playing those that they have a high probability of winning. The passive

nature implies that they allow other players to drive the pot, raising infrequently

themselves.

In their first paper Barone and While (Barone, 1998) suggest evolutionary

strategies as a way of modelling an adaptive poker player. They use a simple poker

variant where each player has two private cards, there are five community cards and

one round of betting. This initial work incorporates three main areas of analysis; hand

strength, position and risk management. Two types of tables are used, a loose table

and a tight table. The work demonstrates how a player that has evolved using

evolutionary strategies can adapt its style to the two types of table.

In (Barone, 1999) they develop their work by introducing a hypercube which is an

n dimensional vector, used to store candidate solutions. The hypercube has one

dimension for the betting position (early, middle and late) and another dimension for

the risk management (selected from the interval 0..3). At each stage of the game the

relevant candidate solutions are selected from the hypercube (e.g. middle betting

position and risk management 2) and the decision is made whether to fold, call or

raise. To make the decision the hypercube entry holds seven real valued numbers

which are used as constants to three functions (fold, call and raise). In effect, the

functions lead to a probability of carrying out the relevant action. It s the seven real

values that are evolved depending on whether the player won the hand or not. Barone

reports that this poker player improves on the 1998 version. Their 2000 paper

(Barone, 2000) extends the dimensions of the hypercube to include four betting

rounds (pre-flop, post-flop, post-turn and post-river) and an opponent dimension so

that the evolved player can choose which type of player it is up against. The authors

report this player out performs a competent static player.

Poker, being a game of imperfect information, is interesting as a game for the basis

of research. Unlike chess and checkers, poker has some information that is unseen.

Poker also contains other unknowns such as the playing styles of the other players

who may use bluffing (and double bluffing) during the course of the game. These

elements add to the research interest. Unlike complete information games where the

techniques to solve the games (computational power allowing) have been known and

understood for a long time (such as mini-max search and alpha-beta pruning), games

of imperfect information have not received the same sort of analysis and, doing so,

could prove relevant to many other areas such as economics, on-line auctions and

negotiating.

2. The Rules of Poker

The exact rules for poker can be found in many poker books (see, for example,

Sklansky, 1994; 1996) and we simply give here the basic rules of one variant (Texas

Hold ‘Em) so that the reader is able to follow the remainder of this paper. Each player

is dealt two cards. These are private cards, only being visible to the player receiving

those cards. These cards are normally referred to as hole cards. A round of betting

follows this initial deal. Next, three community cards (called the flop) are dealt, face

up, in the middle of the table. These cards are used by every player to make the best

five card poker hand, using their hole cards. A round of betting follows the flop. Next,

another community card (called the turn) is dealt face up in the middle of the table.

Another round of betting follows. Finally, another community card (called the river)

is dealt and a final round of betting follows. Once this final round of betting has taken

place, assuming there are two or more players who still have an interest in the pot, the

cards are shown and the highest poker hand wins. In forming a poker hand, the

players can use any combination of their two hole cards and the five community cards

to make the best five card poker hand. The various poker hands are as follows, in

descending order.

Royal Flush : Ten, Jack, Queen, King and Ace, all in the same suit.

Straight Flush : any sequence of five cards, all of the same suit.

Four of a Kind : four cards having the same value, one from each suit.

Full House : three cards of the same value combined with two cards of the same

value. For example, Three 2’s and a pair of Queens.

Flush : all five cards have the same suit.

Straight : all five card values are in sequence, made up from at least two suits.

Three of a Kind : three cards all having the same value.

Two Pairs : two cards of the same value, combined with another two card of the same

value. For example, two 9’s and two 3’s.

A Pair : two cards having the same value.

Single Card : the highest value card is used to value the hand.

When betting, the players have three choices to make. They can either fold (throw

in their cards and relinquish all claims to the money in the pot), they can call (match

the amount of money bet so far) or they can raise (increase the current bet, thus

forcing all the other players to match this amount or fold). To start the betting it is

usual to put in some form of ante. This is a mechanism to start the betting by giving

the players an interest in the pot.

In this paper we have not implemented a full version of Texas Hold ‘Em,

preferring a version of poker, where the players are dealt five cards and, after a round

of betting, are allowed to trade two cards before a final round of betting. This version

is known as draw poker and was considered as a suitable test bed for this initial

investigation.

3. Experiments

We have implemented the four playing styles (loose passive, loose aggressive, tight

passive and tight aggressive) described above so that we can sit each of them at our

tables and find out if our approach can adapt to each of these styles. Each playing

style will play to a specific set of rules using the value of their current hand and the

current value of the pot to decide whether to fold, call, or raise.

Plik z chomika:

Inne pliki z tego folderu:

Inne foldery tego chomika: