Appendix I - Computer Arithmetic.pdf

(2898 KB) Pobierz
App I.fm
I.1
Introduction
I-2
I.2
Basic Techniques of Integer Arithmetic
I-2
I.3
Floating Point
I-13
I.4
Floating-Point Multiplication
I-17
I.5
Floating-Point Addition
I-21
I.6
Division and Remainder
I-27
I.7
More on Floating-Point Arithmetic
I-32
I.8
Speeding Up Integer Addition
I-37
I.9
Speeding Up Integer Multiplication and Division
I-44
I.10
Putting It All Together
I-58
I.11
Fallacies and Pitfalls
I-62
I.12
Historical Perspective and References
I-62
Exercises
I-68
736043703.003.png
I
Computer Arithmetic
by David Goldberg
Xerox Palo Alto Research Center
The Fast drives out the Slow even if the Fast is wrong.
W. Kahan
736043703.004.png
I-2
Appendix I
Computer Arithmetic
I.1
Introduction
Although computer arithmetic is sometimes viewed as a specialized part of CPU
design, it is a very important part. This was brought home for Intel in 1994 when
their Pentium chip was discovered to have a bug in the divide algorithm. This
floating-point flaw resulted in a flurry of bad publicity for Intel and also cost
them a lot of money. Intel took a $300 million write-off to cover the cost of
replacing the buggy chips.
In this appendix we will study some basic floating-point algorithms, includ-
ing the division algorithm used on the Pentium. Although a tremendous variety of
algorithms have been proposed for use in floating-point accelerators, actual
implementations are usually based on refinements and variations of the few basic
algorithms presented here. In addition to choosing algorithms for addition, sub-
traction, multiplication, and division, the computer architect must make other
choices. What precisions should be implemented? How should exceptions be
handled? This appendix will give you the background for making these and other
decisions.
Our discussion of floating point will focus almost exclusively on the IEEE
floating-point standard (IEEE 754) because of its rapidly increasing acceptance.
Although floating-point arithmetic involves manipulating exponents and shifting
fractions, the bulk of the time in floating-point operations is spent operating on
fractions using integer algorithms (but not necessarily sharing the hardware that
implements integer instructions). Thus, after our discussion of floating point, we
will take a more detailed look at integer algorithms.
Some good references on computer arithmetic, in order from least to most
detailed, are Chapter 3 of Patterson and Hennessy [2004]; Chapter 7 of Hama-
cher, Vranesic, and Zaky [1984]; Gosling [1980]; and Scott [1985].
I.2
Basic Techniques of Integer Arithmetic
Readers who have studied computer arithmetic before will find most of this sec-
tion to be review.
Ripple-Carry Addition
Adders are usually implemented by combining multiple copies of simple com-
ponents. The natural components for addition are
half adders
and
full adders
.
The half adder takes two bits
a
and
b
as input and produces a sum bit
s
and a
carry bit
c
as output. Mathematically,
s
= (
a
+
b
) mod 2
,
an d
c
=
(
a
+
b
)/
out
out
2
, where
is the floor function. As logic equations,
s
=
ab
+
a b
and
c
=
ab
,
out
. The half adder is also called a
(2,2) adder, since it takes two inputs and produces two outputs. The full adder
ab
means
a
b
and
a
+
b
means
a
b
where
736043703.005.png
I.2 Basic Techniques of Integer Arithmetic
I
-
3
is a (3,2) adder and is defined by
s
= (
a
+
b
+
c
) mod 2,
c
out
=
(
a
+
b
+
c
)/2
, or
the logic equations
I.2.1
s
=
ab c
+
abc
+
abc
+
abc
I.2.2
c
out
=
ab
+
ac
+
bc
-bit numbers out of
smaller pieces is propagating the carries from one piece to the next. The most
obvious way to solve this is with a
The principal problem in constructing an adder for
n
full adders,
as illustrated in Figure I.1. (In the figures in this appendix, the least-significant bit
is always on the right.) The inputs to the adder are
ripple-carry adder,
consisting of
n
a
n
–1
a
n
–2
⋅ ⋅ ⋅
a
0
and
b
n
–1
b
n
–2
⋅ ⋅ ⋅
n
–1
n
–2
b
, where
a
a
⋅ ⋅ ⋅
a
represents the number
a
2
+
a
2
+
⋅ ⋅ ⋅
+
a
.
0
n
–1
n
–2
0
n
–1
n
–2
0
The
c
output of the
i
th adder is fed into the
c
input of the next adder (the
+1
i
+1
set to 0. Since the low-order
carry-in is wired to 0, the low-order adder could be a half adder. Later, however,
we will see that setting the low-order carry-in bit to 1 is useful for performing
subtraction.
In general, the time a circuit takes to produce an output is proportional to the
maximum number of logic levels through which a signal travels. However, deter-
mining the exact relationship between logic levels and timings is highly technology
dependent. Therefore, when comparing adders we will simply compare the number
of logic levels in each one. How many levels are there for a ripple-carry adder? It
takes two levels to compute
i
+ 1)-th adder) with the lower-order carry-in
c
0
1 from a 0 and b 0 . Then it takes two more levels to
compute c 2 from c 1 , a 1 , b 1 , and so on, up to c n . So there are a total of 2 n levels. Typ-
ical values of n are 32 for integer arithmetic and 53 for double-precision floating
point. The ripple-carry adder is the slowest adder, but also the cheapest. It can be
built with only n simple cells, connected in a simple, regular way.
Because the ripple-carry adder is relatively slow compared with the designs
discussed in Section I.8, you might wonder why it is used at all. In technologies
like CMOS, even though ripple adders take time O( n ), the constant factor is very
small. In such cases short ripple adders are often used as building blocks in larger
adders.
c
a n– 1
b n– 1
a n– 2
b n– 2
a 1
b 1
a 0 b 0
0
Full
adder
Full
adder
Full
adder
Full
adder
c n
s n– 1
c n– 1
s n– 2
c 2
s 1
c 1
s 0
Figure I.1 Ripple-carry adder, consisting of n full adders. The carry-out of one full
adder is connected to the carry-in of the adder for the next most-significant bit. The car-
ries ripple from the least-significant bit (on the right) to the most-significant bit (on the
left).
i
(
736043703.006.png 736043703.001.png
I-4
Appendix I Computer Arithmetic
Radix-2 Multiplication and Division
The simplest multiplier computes the product of two unsigned numbers, one bit at
a time, as illustrated in Figure I.2(a). The numbers to be multiplied are a n –1 a n –2
⋅ ⋅ ⋅
a 0 and b n –1 b n –2 ⋅ ⋅ ⋅
Multiply Step ( i )
If the least-significant bit of A is 1, then register B, containing b n –1 b n –2 ⋅ ⋅ ⋅
b 0 , is added to P; otherwise 00
⋅ ⋅ ⋅
00 is added to P. The sum is placed back
into P.
( ii )
Registers P and A are shifted right, with the carry-out of the sum being
moved into the high-order bit of P, the low-order bit of P being moved into
register A, and the rightmost bit of A, which is not used in the rest of the
algorithm, being shifted out.
Carry-out
Shift
P
A
1
n
n
B
(a)
n
Shift
P
A
n + 1
n
0
B
(b)
1
n
Figure I.2 Block diagram of (a) multiplier and (b) divider for n -bit unsigned integers.
Each multiplication step consists of adding the contents of P to either B or 0 (depend-
ing on the low-order bit of A), replacing P with the sum, and then shifting both P and A
one bit right. Each division step involves first shifting P and A one bit left, subtracting B
from P, and, if the difference is nonnegative, putting it into P. If the difference is
nonnegative, the low-order bit of A is set to 1.
b 0 , and they are placed in registers A and B, respectively.
Register P is initially 0. Each multiply step has two parts.
736043703.002.png
Zgłoś jeśli naruszono regulamin