G22.2233  Prof. Grishman
Lecture 5: Arithmetic and ALU Design
Representing signed numbers (text, sections 4.2-4.3)

negative numbers generally represented in two's complement

computing the two's complement: flipping each bit and adding 1

doing subtraction by adding the two's complement

sign extension

overflow detection [two negative numbers producing a positive sum, or two
positive numbers producing a negative sum]
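
The points above can be illustrated with a short Python sketch. The 8-bit word size and all function names are assumptions chosen for illustration; they do not come from the text.

```python
WIDTH = 8                    # assumed word size, for illustration only
MASK = (1 << WIDTH) - 1      # 0xFF: keeps results to WIDTH bits

def negate(x):
    """Two's complement: flip each bit, then add 1."""
    return ((x ^ MASK) + 1) & MASK

def to_signed(x):
    """Interpret a WIDTH-bit pattern as a signed integer."""
    return x - (1 << WIDTH) if x & (1 << (WIDTH - 1)) else x

def subtract(a, b):
    """Subtract by adding the two's complement of b."""
    return (a + negate(b)) & MASK

def sign_extend(x, from_bits, to_bits):
    """Copy the sign bit of a from_bits-wide value into the new high bits."""
    if (x >> (from_bits - 1)) & 1:
        x |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return x

def add_overflows(a, b):
    """Overflow: operands share a sign and the sum has the other sign."""
    s = (a + b) & MASK
    sign = lambda v: v >> (WIDTH - 1)
    return sign(a) == sign(b) and sign(s) != sign(a)
```

For example, negate(5) yields the bit pattern 0xFB, which to_signed reads back as -5, and add_overflows(100, 100) is true because 200 does not fit in a signed 8-bit word.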
MIPS ALU (text, section 4.5)
An ALU (arithmetic-logical unit) is a combinational circuit capable of
computing a variety of arithmetic and logical functions.

operations needed for MIPS instructions discussed so far: add, subtract,
and, or, zero test, comparison [MIPS also has nor, xor; multiply and divide]

general strategy: different circuits combined by multiplexer; multiplexer
select becomes function select for ALU

inverting B input for subtraction

feed output of high-order bit of adder (the sign of A - B) to low-order
bit of the result for "set on less than" operation

for "equal" test, subtract, then OR together all ALU output bits and
invert: the result is zero exactly when the operands are equal
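
A behavioral sketch of this strategy in Python follows. It models the function select as a lookup rather than a gate-level multiplexer. The operation codes and the alu function are hypothetical names, and the "set on less" path ignores overflow, as the simple design described above does.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

# Hypothetical function-select codes (not the MIPS control encoding)
AND, OR, ADD, SUB, SLT = range(5)

def alu(op, a, b):
    """Return (result, zero) for WIDTH-bit inputs a and b."""
    # Subtraction: invert B and set carry-in to 1, so a + ~b + 1 = a - b
    if op in (SUB, SLT):
        b_in, carry_in = b ^ MASK, 1
    else:
        b_in, carry_in = b, 0

    # Every circuit computes its output; the select picks one of them
    and_out = a & b
    or_out = a | b
    sum_out = (a + b_in + carry_in) & MASK
    # Set on less: high-order adder bit routed to the low-order result bit
    slt_out = sum_out >> (WIDTH - 1)

    result = {AND: and_out, OR: or_out, ADD: sum_out,
              SUB: sum_out, SLT: slt_out}[op]
    zero = int(result == 0)   # OR of all result bits, inverted
    return result, zero
```

An "equal" test is then alu(SUB, a, b)[1], which is 1 exactly when a and b match.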
Carry lookahead (text, section 4.5)

simplest adder is "ripple carry": slow (delay time linear in
size of operands)

add time is usually critical in determining overall cycle time of a machine
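We can speed up addition by introducing the notions of "carry generate" and
"carry propagate" (below, * denotes AND and + denotes OR). Bit position i
generates a carry when both input bits are 1, and propagates an incoming
carry when at least one input bit is 1: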
gi = ai * bi
pi = ai + bi

this can be used to compute carries into each bit position:
c1 = g0 + (p0 * c0)
c2 = g1 + (p1 * g0) + (p1 * p0 * c0)
c3 = g2 + (p2 * g1) + (p2 * p1 * g0) + (p2 * p1 * p0 * c0)
c4 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0) + (p3 * p2 * p1 * p0 * c0)

and then we compute each sum bit independently using the carry:
Sumi = (ai exor bi) exor ci
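
The equations above translate directly into a 4-bit adder. The following Python sketch is one way to write it; the function name cla4 is an assumption made for illustration.

```python
def cla4(a, b, c0=0):
    """Add two 4-bit values with carry lookahead; return (sum, carry_out)."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> i) & 1 for i in range(4)]

    # Per-bit generate and propagate
    g = [x & y for x, y in zip(abit, bbit)]
    p = [x | y for x, y in zip(abit, bbit)]

    # Each carry depends only on g, p and c0, never on another carry
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    # Sum bits computed independently from the carries
    c = [c0, c1, c2, c3]
    s = sum(((abit[i] ^ bbit[i]) ^ c[i]) << i for i in range(4))
    return s, c4
```

Because no carry expression refers to another carry, all four can be evaluated in parallel in hardware, which is the point of the technique.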
We get greater savings when we build a 16-bit adder, and compute group
generate and propagate values for each 4-bit group. Note that group values
are designated by capital letters.
P0 = p3 * p2 * p1 * p0
P1 = p7 * p6 * p5 * p4
P2 = p11 * p10 * p9 * p8
P3 = p15 * p14 * p13 * p12
G0 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)
G1 = g7 + (p7 * g6) + (p7 * p6 * g5) + (p7 * p6 * p5 * g4)
G2 = g11 + (p11 * g10) + (p11 * p10 * g9) + (p11 * p10 * p9 * g8)
G3 = g15 + (p15 * g14) + (p15 * p14 * g13) + (p15 * p14 * p13 * g12)

we can use these to compute the carry into each group:
into bit 4 (C1 = c4), into bit 8 (C2 = c8), and into bit 12 (C3 = c12)
C1 = G0 + (P0 * c0)
C2 = G1 + (P1 * G0) + (P1 * P0 * c0)
C3 = G2 + (P2 * G1) + (P2 * P1 * G0) + (P2 * P1 * P0 * c0)
as well as the carry out of the entire 16-bit addition, C4:
C4 = G3 + (P3 * G2) + (P3 * P2 * G1) + (P3 * P2 * P1 * G0) + (P3 * P2 * P1 * P0 * c0)

once we have C1, carries c5, c6, and c7 can be computed from c4 (=C1);
similarly, carries c9, c10, and c11 can be computed from c8 (=C2), and
carries c13, c14, and c15 can be computed from c12 (=C3)
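
The two-level scheme can be sketched in Python as follows. The helper names bits, lookahead, and cla16 are hypothetical. The sketch reuses one lookahead routine at both levels, since the group equations have the same form as the per-bit ones.

```python
def bits(x, n):
    """Return the low n bits of x, least significant first."""
    return [(x >> i) & 1 for i in range(n)]

def lookahead(g, p, cin):
    """Four carries out of four generate/propagate pairs and a carry-in."""
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & cin))
    return [c1, c2, c3, c4]

def cla16(a, b, c0=0):
    """Add two 16-bit values using 4-bit groups; return (sum, carry_out)."""
    A, B = bits(a, 16), bits(b, 16)
    g = [x & y for x, y in zip(A, B)]
    p = [x | y for x, y in zip(A, B)]

    # Group generate (G) and propagate (P) for each 4-bit group
    G, P = [], []
    for k in range(4):
        gk, pk = g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]
        P.append(pk[0] & pk[1] & pk[2] & pk[3])
        G.append(gk[3] | (pk[3] & gk[2]) | (pk[3] & pk[2] & gk[1])
                 | (pk[3] & pk[2] & pk[1] & gk[0]))

    # C[0] = c0, C[1] = c4, C[2] = c8, C[3] = c12, C[4] = carry out
    C = [c0] + lookahead(G, P, c0)

    # Within each group, the remaining carries follow from the group carry-in
    c = []
    for k in range(4):
        inner = lookahead(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4], C[k])
        c += [C[k]] + inner[:3]

    s = sum(((A[i] ^ B[i]) ^ c[i]) << i for i in range(16))
    return s, C[4]
```

For instance, cla16(0xFFFF, 1) returns a sum of 0 with a carry out of 1.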

how long does this all take?

1 gate delay to compute gi, pi

2 gate delays to compute Gi (only 1 for Pi)

2 gate delays for Ci

2 gate delays for ci

1 gate delay (exclusive-or) for Sumi
altogether, 8 gate delays to add 16 bits!

for a 64-bit adder, we would compute generate and propagate on 16-bit
supergroups, adding 4 more gate delays
in general, delay time with carry lookahead is logarithmic
in the size of the operands
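
The delay count above generalizes to deeper hierarchies. The Python sketch below tallies gate delays using the same per-stage figures as the list above. The function name and the assumption that every level groups by 4 are illustrative, not from the text.

```python
def cla_gate_delays(bits, group=4):
    """Gate delays for a carry-lookahead adder built from nested groups.

    Assumes every level combines `group` units and that bits >= group.
    """
    if bits < group:
        raise ValueError("need at least one full group")
    # Number of lookahead levels: smallest k with group**k >= bits
    levels, n = 0, 1
    while n < bits:
        n *= group
        levels += 1
    delay = 1                  # per-bit generate and propagate
    delay += 2 * (levels - 1)  # group generates, going up the hierarchy
    delay += 2 * levels        # carries, coming back down
    delay += 1                 # final exclusive-or for each sum bit
    return delay
```

Under these assumptions the function gives 8 delays for 16 bits and 12 for 64 bits, matching the counts above: each fourfold increase in operand size adds a constant 4 delays.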
Spring 2002