1995 Award Addresses
ACT
A Simple Theory of Complex Cognition
John R. Anderson
Carnegie Mellon University
In the Adaptive Character of Thought (ACT-R) theory,
complex cognition arises from an interaction of procedural
and declarative knowledge. Procedural knowledge is
represented in units called production rules, and declarative
knowledge is represented in units called chunks. The
individual units are created by simple encodings of objects
in the environment (chunks) or simple encodings of
transformations in the environment (production rules). A great
many such knowledge units underlie human cognition.
From this large database, the appropriate units are
selected for a particular context by activation processes that
are tuned to the statistical structure of the environment.
According to the ACT-R theory, the power of human
cognition depends on the amount of knowledge encoded and
the effective deployment of the encoded knowledge.
the concrete illustration to such an abstract statement. It
certainly seems like the kind of cognitive act that we are
unlikely to see from any other species.
We have studied extensively how people write re-
cursive programs (e.g., Anderson, Farrell, & Sauers, 1984;
Pirolli & Anderson, 1985). To test our understanding of
the process, we have developed computer simulations that
are themselves capable of writing recursive programs in
the same way humans do. Underlying this skill are about
500 knowledge units called production rules. For instance,
one of these production rules for programming recursion,
which might apply in the midst of the problem solving,
is
IF the goal is to identify the recursive relationship in a
function with a number argument
THEN set as subgoals to
1. Find the value of the function for some N
2. Find the value of the function for N - 1
3. Try to identify the relationship between the two
answers.
flects the fact that there is something special about
human cognition--that it achieves a kind of intel-
ligence not even approximated in other species. One can
point to marks of that intelligence in many domains.
Much of my research has been in the area of mathematics
and computer programming, fields in which the capacity
to come up with abstract solutions to problems is one
ability that is frequently cited with almost mystical awe.
A good example of this is the ability to write recursive
programs.
Consider writing a function to calculate the factorial
of a number. The factorial of a number can be described
to someone as the result you get when you multiply all
the positive integers up to that number. For instance,
Thus, in the case above, this might lead to finding that
factorial(5) = 120 (Step 1), factorial(4) = 24 (Step 2), and
that factorial(N) = factorial(N-1) × N (Step 3).
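As an illustration only (the article gives the specification, not program code), the three subgoals and the relationship they uncover can be sketched in Python. The function name `factorial` follows the text; the variable names and the rejection of negative inputs are assumptions added here.

```python
def factorial(n: int) -> int:
    """Recursive factorial: factorial(0) = 1, factorial(N) = factorial(N-1) * N."""
    if n < 0:
        # Assumption: the text only discusses N >= 0, so reject anything else.
        raise ValueError("factorial is only defined here for n >= 0")
    if n == 0:
        return 1  # base case
    return factorial(n - 1) * n  # recursive case found in Step 3


# The three subgoals of the production rule, using N = 5 as in the text.
value_n = factorial(5)              # Step 1: value of the function for some N
value_n_minus_1 = factorial(4)      # Step 2: value of the function for N - 1
ratio = value_n // value_n_minus_1  # Step 3: the two answers differ by a factor of N
```

The ratio between the two answers equals N itself, which is the observation that leads to the recursive case.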
We (e.g., Anderson, Boyle, Corbett, & Lewis, 1990;
Anderson, Corbett, Koedinger, & Pelletier, 1995; Ander-
son & Reiser, 1985) have created computer-based in-
structional systems, called intelligent tutors, for teaching
cognitive skills based on this kind of production-rule
analysis. By basing instruction on such rules, we have
been able to increase students' rate of learning by a factor
of 3. Moreover, within our tutors we have been able to
factorial(5) = 5 × 4 × 3 × 2 × 1 = 120
In addition (it might appear by arbitrary convention), the
factorial of zero is defined to be 1. In writing a recursive
program to calculate the factorial for any number N, one
defines factorial in terms of itself. Below is what such a
program might look like:
factorial(N) = 1                    if N = 0
             = factorial(N-1) × N   if N > 0.
The first part of the specification, factorial(0) = 1, is just
stating part of the definition of factorial. But the second
recursive specification seems mysterious to many and
appears all the more mysterious that anyone can go from
Editor's note. Articles based on APA award addresses are given special
consideration in the American Psychologist's editorial selection process.
A version of this article was originally presented as part of an Award
for Distinguished Scientific Contributions address at the 103rd Annual
Convention of the American Psychological Association, New York, NY,
August 1995.
Author's note. This research was supported by Grant ONR N0014-90-J-1489
from the Office of Naval Research and Grant SBR 94-21332
from the National Science Foundation.
I would like to thank Marsha Lovett and Lynne Reder for their
comments on the article.
Correspondence concerning this article should be addressed to John
R. Anderson, Department of Psychology, Carnegie Mellon University,
Pittsburgh, PA 15213. For more information on the ACT theory, consult
the ACT-R home page on the World Wide Web: http://sands.psy.cmu.edu.
April 1996 American Psychologist
Copyright 1996 by the American Psychological Association, Inc. 0003-066X/96/$2.00
Vol. 51, No. 4, 355-365
The designation of our species as homo sapiens re-
Figure 1
Mean Actual Error Rate and Expected Error Rate Across
Successive Rule Applications
proposed a distinction between declarative knowledge,
which HAM dealt with, and procedural knowledge, which
HAM did not deal with. Borrowing ideas from Newell
(1972, 1973), it was proposed that procedural knowledge
was implemented by production rules. A production-sys-
tem model called ACTE was proposed to embody this
joint procedural-declarative theory. After 7 years of
working with variants of that system, we were able to
develop a theory called ACT* (Anderson, 1983) that em-
bodied a set of neurally plausible assumptions about how
such a system might be implemented and also psycho-
logically plausible assumptions about how production
rules might be acquired. That system remained with us
for 10 years, but a new system called ACT-R was then
put forward by Anderson (1993b). Reflecting technical
developments in the past decade, this system now serves
as a computer simulation tool for a small research com-
munity. The key insight of this version of the system is
that the acquisition and deployment processes are tuned
to give adaptive performance given the statistical structure
of the environment. It is the ACT-R system that we will
describe.
Representational Assumptions
Declarative and procedural knowledge are intimately
connected in the ACT-R theory. Production rules embody
procedural knowledge, and their conditions and actions
are defined in terms of declarative structures. A specific
production rule can only apply when that rule's condi-
tions are satisfied by the knowledge currently available
in declarative memory. The actions that a production
rule can take include creating new declarative structures.
Declarative knowledge in ACT-R is represented in
terms of chunks (Miller, 1956; Servan-Schreiber, 1991)
that are schema-like structures, consisting of an isa pointer
specifying their category and some number of additional
pointers encoding their contents. Figure 2 is a graphical
display of a chunk encoding the addition fact that 3 + 4
= 7. This chunk can also be represented textually:
[Figure 1 plot. Legend: Actual Error Rate, Expected Error Rate. Horizontal axis: Opportunity to Apply Rule (Required Exercises Only)]
Note. From "Student Modeling in the ACT Programming Tutor," by A. T. Corbett,
J. R. Anderson, and A. T. O'Brien, 1995, in P. Nichols, S. Chipman, and B. Brennan,
Cognitively Diagnostic Assessment, Hillsdale, NJ: Erlbaum. Copyright 1995 by Erlbaum.
Reprinted by permission.
track the learning of such rules and have found that they
improve gradually with practice, as illustrated in Figure
1. Our evidence indicates that underlying the complex,
mystical skill of recursive programming is about 500 rules
like the one above, and that each rule follows a simple
learning curve like Figure 1.
This illustrates the major claim of this article:
All that there is to intelligence is the simple accrual and
tuning of many small units of knowledge that in total
produce complex cognition. The whole is no more than
the sum of its parts, but it has a lot of parts.
The credibility of this claim has to turn on whether
we can establish in detail how the claim is realized in
specific instances of complex cognition. The goal of the
ACT theory, which is the topic of this article, has been
to establish the details of this claim. It has been concerned
with three principal issues: How are these units of knowl-
edge represented, how are they acquired, and how are
they deployed in cognition?
The ACT theory has origins in the human associative
memory (HAM) theory of human memory (Anderson &
Bower, 1973), which attempted to develop a theory of
how memories were represented and how those repre-
sentations mediated behavior that was observed in mem-
ory experiments. It became apparent that this theory only
dealt with some aspects of knowledge; Anderson (1976)
Figure 2
Network Representation of an ACT-R Chunk
[Network diagram: node fact3+4 with links isa to Addition-fact, addend1 to Three, addend2 to Four, and sum to Seven]
fact3+4
isa addition-fact
addend1 three
addend2 four
sum seven
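For readers who think in code, the chunk encoding 3 + 4 = 7 can be approximated as a small record with an isa category and named slots. This is a sketch only: the class `Chunk` and its field names are hypothetical and are not the notation of the ACT-R simulation software.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Chunk:
    """A declarative unit: a name, an isa category, and named slots (pointers)."""
    name: str
    isa: str
    slots: Dict[str, str] = field(default_factory=dict)


# The addition fact 3 + 4 = 7, mirroring the textual chunk in the article.
fact3_4 = Chunk(
    name="fact3+4",
    isa="addition-fact",
    slots={"addend1": "three", "addend2": "four", "sum": "seven"},
)
```

Each slot plays the role of one labeled pointer in the network representation of Figure 2.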
time for previous symbols) reflect the time for the extra
production. The next symbol to be encoded (the 3) takes
approximately 550 milliseconds to process (see Part e of
Figure 3), reflecting again two productions but this time
also retrieval of the fact 4 + 3 = 7. The mental represen-
tation of the equation at this point is collapsed into x +
7. The = sign is next processed in Part f of Figure 3. It
takes a particularly short time. We think this reflects the
strategy of some participants of just skipping over that
symbol. The final symbol comes in (see Part g of Figure
3) and leads to a long latency reflecting seven productions
that need to apply to transform the equation and the
execution of the motor response of typing the number
key.
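The latency account for the first symbols is simple additive arithmetic, which the sketch below reproduces. The 300 ms encoding time and the 100 ms per production come from the text. The 50 ms retrieval time is an assumption inferred here as the residual of the reported 550 ms, and the function name is hypothetical. The final symbol is left out because the text gives no time for its motor response.

```python
ENCODING_MS = 300    # scanning and encoding one symbol (from the text)
PRODUCTION_MS = 100  # one production firing (from the text)
RETRIEVAL_MS = 50    # assumption: residual of 550 - (300 + 2 * 100)


def predicted_latency_ms(productions: int, retrievals: int = 0) -> int:
    """Additive latency estimate for processing one symbol."""
    return ENCODING_MS + productions * PRODUCTION_MS + retrievals * RETRIEVAL_MS


first_symbols = predicted_latency_ms(productions=1)              # x, +, 4
second_plus = predicted_latency_ms(productions=2)                # the second +
the_three = predicted_latency_ms(productions=2, retrievals=1)    # the 3, with fact retrieval
```

These three values correspond to the approximate 400, 500, and 550 millisecond scanning times described for Parts a-e of Figure 3.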
Procedural knowledge, such as mathematical prob-
lem-solving skill, is represented by productions. Produc-
tion rules in ACT-R respond to the existence of specific
goals and often involve the creation of subgoals. For in-
stance, suppose a child was at the point illustrated below
in the solution of a multicolumn addition problem:
 531
+248
   9
The example in Figure 3 is supposed to reflect the
relative detail in which we have to analyze human cog-
nition in ACT-R to come up with faithful models. The
simulation is capable of solving the same problems as the
participants. It can actually interact with the same ex-
perimental software as the participants, execute the same
scanning actions, read the same computer screen, and
execute the same motor responses with very similar tim-
ing (Anderson, Matessa, & Douglass, 1995). When I say,
"The whole is no more than the sum of its parts but it
has a lot of parts," these are the parts I have in mind.
These parts are the production rules and the chunk
structures that represent long-term knowledge and the
evolving understanding of the problem.
Knowledge units like these are capable of giving rel-
atively accurate simulations of human behavior in tasks
such as these. However, the very success of such simu-
lations only makes salient the two other questions that
the ACT-R theory must address, which are how did the
prior knowledge (productions and long-term chunks)
come to exist in the first place and how is it, if the mind
is composed of a great many of these knowledge units,
that the appropriate ones usually come to mind in a par-
ticular problem-solving context? These are the questions
of knowledge acquisition and knowledge deployment.
Focused on the tens column, the following production
rule might apply from the simulation of multicolumn
addition (Anderson, 1993b):
IF the goal is to add n1 and n2 in a column
and n1 + n2 = n3
THEN set as a subgoal to write n3 in that column
This production rule specifies in its condition the goal of
working on the tens column and involves a retrieval of a
declarative chunk like the one illustrated in Figure 2. In
its action, it creates a subgoal that might involve things
like processing a carry. The subgoal structure assumed
in the ACT-R production system imposes this strong ab-
stract, hierarchical structure on behavior. As argued else-
where (Anderson, 1993a), this abstract, hierarchical
structure is an important part of what sets human cog-
nition apart from that of other species.
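The column-addition rule above can be sketched as a function that tests a goal, retrieves an addition fact, and returns a subgoal. This is an illustrative assumption about representation, not the ACT-R implementation: the dictionary-based goals, the `ADDITION_FACTS` table, and the function name are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical declarative memory: single-digit addition facts (n1, n2) -> n3.
ADDITION_FACTS: Dict[Tuple[int, int], int] = {
    (a, b): a + b for a in range(10) for b in range(10)
}


def add_column_production(goal: dict) -> Optional[dict]:
    """IF the goal is to add n1 and n2 in a column and n1 + n2 = n3,
    THEN set as a subgoal to write n3 in that column."""
    if goal.get("type") != "add-column":
        return None  # the goal condition does not match
    n3 = ADDITION_FACTS.get((goal["n1"], goal["n2"]))
    if n3 is None:
        return None  # no matching declarative chunk could be retrieved
    return {"type": "write", "column": goal["column"], "value": n3}


# Tens column of 531 + 248: the digits are 3 and 4.
subgoal = add_column_production(
    {"type": "add-column", "column": "tens", "n1": 3, "n2": 4}
)
```

The returned subgoal would itself be handled by further rules, for example one that processes a carry when the sum exceeds 9, which is how the hierarchical goal structure arises.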
Much of the recent effort in the ACT-R theory has
gone into detailed analyses of specific problem-solving
tasks. One of these involves equation solving by college
students (e.g., Anderson, Reder, & Lebiere, in press). We
have collected data on how they scan equations, including
the amount of time spent on each symbol in the equation.
Figure 3 presents a detailed simulation of the solution of
equations like X + 4 + 3 = 13, plus the average scanning
times of participants solving problems of this form (mixed
in with many other types of equations in the same ex-
periment). As can be seen in Parts a-c of that figure, the
first three symbols are processed to create a chunk struc-
ture of the form x + 4. In the model, there is one pro-
duction responsible for processing each type of symbol.
The actual times for the first three symbols are given in
Parts a-c of Figure 3. They are on the order of 400 mil-
liseconds, which we take as representing approximately
300 milliseconds to implement the scanning and encoding
of the symbol and 100 milliseconds for the production
to create the augmentation to the representation. 2
The next symbol to be encoded, the +, takes about
500 milliseconds to process in Part d. As can be seen, it
involves two productions, one to create a higher level
chunk structure and another to encode the plus into that
structure. The extra 100 milliseconds (over the encoding
Knowledge Acquisition
A theory of knowledge acquisition must address both the
issue of the origins of the chunks and of the origins of
production rules. Let us first consider the origin of
chunks. As the production rules in Figure 3 illustrate,
chunks can be created by the actions of production rules.
However, as we will see shortly, production rules originate
from the encodings of chunks. To avoid circularity in the
theory we also need an independent source for the origin
of the chunks. That independent source involves encoding
from the environment. Thus, in the terms of Anderson
and Bower (1973), ACT-R is fundamentally a sensationalist
theory in that its knowledge structures result from
environmental encodings.
1 This involves a scheme wherein participants must point at the
part of the equation that they want to read next.
2 Although our data strongly constrain the processing, there remain
a number of arbitrary decisions about how to represent the equation
that could have been made differently.
We have only developed our ideas about environ-
mental encodings of knowledge with respect to the visual
modality (Anderson, Matessa, & Douglass, 1995). In this
area, it is assumed that the perceptual system has parsed
the visual array into objects and has associated a set of
features with each object. ACT-R can move its attention
over the visual array and recognize objects. We have
embedded within ACT-R a theory that might be seen as
a synthesis of the spotlight metaphor of Posner (1980),
the feature-synthesis model of Treisman (Treisman &
Sato, 1990), and the attentional model of Wolfe (1994).
Features within the spotlight can be synthesized into rec-
ognized objects. Once synthesized, the objects are then
available as chunks in ACT's working memory for further
processing. In ACT-R the calls for shifts of attention are
controlled by explicit firings of production rules.
The outputs of the visual module are working mem-
ory elements called chunks in ACT-R. The following is
a potential chunk encoding of the letter H:
object
and encode that the second structure is dependent on the
first. What the learner must do is find some mapping
between the two structures. The default assumption is
that identical structures directly map. In this case, it is
assumed the 3x in the first equation maps onto the 3x
in the second equation. This leaves the issue of how to
relate the 7 and 13 to the 6. ACT-R looks for some chunk
structure to make this mapping. In this case, it will find
a chunk encoding that 7 + 6 = 13. Completing the mapping,
ACT-R will form a production rule to map one
structure onto the other:
IF the goal is to solve an equation of the form
arg + n1 = n3
and n1 + n2 = n3
THEN make the goal to solve an equation
of the form arg = n2
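A minimal sketch of how such a learned rule might behave is given below. It assumes an equation of the form arg + n1 = n3 is held as a tuple `(arg, n1, n3)` and that declarative memory is a small table of addition facts. These representations and all names are hypothetical and stand in for the chunk retrieval that the theory describes.

```python
from typing import Optional, Set, Tuple

# Hypothetical declarative memory: facts (n1, n2, n3) meaning n1 + n2 = n3.
ADDITION_FACTS: Set[Tuple[int, int, int]] = {
    (a, b, a + b) for a in range(21) for b in range(21)
}


def retrieve_n2(n1: int, n3: int) -> Optional[int]:
    """Retrieve n2 from a stored fact n1 + n2 = n3, or None if none is stored."""
    for a, b, total in ADDITION_FACTS:
        if a == n1 and total == n3:
            return b
    return None


def isolate_arg(equation: Tuple[str, int, int]) -> Optional[Tuple[str, int]]:
    """IF the goal is to solve arg + n1 = n3 and n1 + n2 = n3,
    THEN make the goal to solve arg = n2."""
    arg, n1, n3 = equation
    n2 = retrieve_n2(n1, n3)
    if n2 is None:
        return None  # the rule does not apply without a bridging fact
    return (arg, n2)


new_goal = isolate_arg(("3x", 7, 13))  # the example 3x + 7 = 13
```

Note that the rule fails when no bridging fact is available, which mirrors the claim that the right chunks must be present or made salient for the mapping to be formed.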
This approach takes a very strong view on instruction.
This view is that one fundamentally learns to solve problems
by mimicking examples of solutions. This is certainly
consistent with the substantial literature showing that ex-
amples are as good as or better than abstract instruction
that tells students what to do (e.g., Cheng, Holyoak, Nis-
bett, & Oliver, 1986; Fong, Krantz, & Nisbett, 1986; Reed
& Actor, 1991). Historically, learning by imitation was
given bad press as cognitive psychology broke away from
behaviorism (e.g., Fodor, Bever, & Garrett, 1974). How-
ever, these criticisms assumed a very impoverished com-
putational sense of what is meant by imitation.
It certainly is the case that abstract instruction does
have some effect on learning. There are two major func-
tions for abstract instruction in the ACT-R theory. On
the one hand, it can provide or make salient the right
chunks (such as 7 + 6 = 13 in the example above) that
are needed to bridge the transformations. It is basically
this that offers the sophistication to the kind of imitation
practiced in ACT-R. Second, instruction can take the form
of specifying a sequence of subgoals to solve a task (as
one finds in instruction manuals). In this case, assuming
the person already knows how to achieve such subgoals,
instruction offers the learner a way to create an example
of such a problem solution from which they can then
learn production rules like the one above.
The most striking thing about the ACT-R theory of
knowledge acquisition is how simple it is. One encodes
chunks from the environment and makes modest infer-
ences about the rules underlying the transformations in-
volved in examples of problem solving. There are no great
leaps of insight in which large bodies of knowledge are
reorganized. The theory implies that acquiring compe-
tence is very much a labor-intensive business in which
one must acquire one-by-one all the knowledge compo-
nents. This flies very much in the face of current edu-
cational fashion but, as Anderson, Reder, and Simon
(1995) have argued and documented, this educational
fashion is having a very deleterious effect on education.
We need to recognize and respect the effort that goes into
acquiring competence (Ericsson, Krampe, & Tesch-Römer,
1993). However, it would be misrepresenting the
isa H
left-vertical bar1
right-vertical bar2
horizontal bar3
We assume that before the recognition of the object, these
features (the bars) are available as parts of an object but
that the object itself is not recognized. In general, we
assume that the system can respond to the appearance
of a feature anywhere in the visual field. However, the
system cannot respond to the conjunction of features that
define a pattern until it has moved its attention to that
part of the visual field and recognized the pattern of fea-
tures. Thus, there is a correspondence between this model
and the feature synthesis model of Treisman (Treisman
& Sato, 1990).
A basic assumption is that the process of recognizing
a visual pattern from a set of features is identical to the
process of categorizing an object given a set of features.
We have adapted the Anderson and Matessa (1992) ra-
tional analysis of categorization to provide a mechanism
for assigning a category (such as H) to a particular con-
figuration of features. This is the mechanism within
ACT-R for translating stimulus features from the envi-
ronment into chunks like the ones above that can be pro-
cessed by the higher level production system.
With the environmental origins of chunks specified,
we can now turn to the issue of the origins of production
rules. Production rules specify the transformations of
chunks, and we assume that they are encoded from
examples of such transformations in the environment.
Thus, a student might encounter the following example
in instruction:
3x + 7 = 13
3x = 6