V22.0436  Prof. Grishman
Lecture 12  Carry lookahead
(text, section 4.5)
(finish discussion of MIPS ALU, lecture 9 notes)
Carry lookahead

simplest adder is "ripple carry": slow (delay time linear in
size of operands)

add time is usually critical in determining overall cycle time of a machine
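We can speed up addition by introducing the notions of "carry generate" and
"carry propagate" (* denotes AND, + denotes OR). Bit position i generates a
carry if both inputs are 1, and propagates an incoming carry if at least one
input is 1: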
gi = ai * bi
pi = ai + bi

this can be used to compute carries into each bit position:
c1 = g0 + (p0 * c0)
c2 = g1 + (p1 * g0) + (p1 * p0 * c0)
c3 = g2 + (p2 * g1) + (p2 * p1 * g0) + (p2 * p1 * p0 * c0)
c4 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0) + (p3 * p2 * p1 * p0 * c0)
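The carry equations above can be checked with a short sketch. This is only an illustration, not part of the text: the function name lookahead_carries and the choice to pass the operands as Python integers are assumptions made here.

```python
def lookahead_carries(a, b, c0):
    """Return [c1, c2, c3, c4] for 4-bit operands a, b and carry-in c0."""
    # Bit i of each operand (bit 0 is the least significant bit).
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate:  gi = ai * bi
    p = [x | y for x, y in zip(a_bits, b_bits)]  # propagate: pi = ai + bi

    # Each carry is a two-level AND/OR of g, p, and c0 only;
    # no carry waits for the one before it.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


# 5 + 3: carries ripple out of bits 0, 1, and 2, but not out of bit 3.
print(lookahead_carries(0b0101, 0b0011, 0))  # [1, 1, 1, 0]
```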

and then we compute each sum bit independently using the carry:
Sumi = (ai exor bi) exor ci
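As a sketch of how the sum bits follow from the carries, the code below builds a complete 4-bit adder. The names cla_add4 and carry_into are hypothetical, and the carries are written in a general expanded form (some lower bit generates and every bit in between propagates) that is assumed here to be equivalent to the equations for c1 through c4 above.

```python
def cla_add4(a, b, c0=0):
    """Add two 4-bit values; return (4-bit sum, carry out c4)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # gi = ai * bi
    p = [x | y for x, y in zip(a_bits, b_bits)]  # pi = ai + bi

    def carry_into(i):
        # c0 reaches bit i only if bits 0 .. i-1 all propagate.
        c = c0 & int(all(p[:i]))
        # A carry generated at bit j reaches bit i if bits j+1 .. i-1 propagate.
        for j in range(i):
            c |= g[j] & int(all(p[j + 1:i]))
        return c

    total = 0
    for i in range(4):
        # Sumi = (ai exor bi) exor ci, each bit independent of the others.
        s = (a_bits[i] ^ b_bits[i]) ^ carry_into(i)
        total |= s << i
    return total, carry_into(4)


print(cla_add4(0b1011, 0b0110))  # (1, 1): 11 + 6 = 17 = 16 + 1
```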
We get greater savings when we build a 16-bit adder, and compute group
generate and propagate values for each 4-bit group. Note that group values
are designated by capital letters.
P0 = p3 * p2 * p1 * p0
P1 = p7 * p6 * p5 * p4
P2 = p11 * p10 * p9 * p8
P3 = p15 * p14 * p13 * p12
G0 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)
G1 = g7 + (p7 * g6) + (p7 * p6 * g5) + (p7 * p6 * p5 * g4)
G2 = g11 + (p11 * g10) + (p11 * p10 * g9) + (p11 * p10 * p9 * g8)
G3 = g15 + (p15 * g14) + (p15 * p14 * g13) + (p15 * p14 * p13 * g12)
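A minimal sketch of the group values, assuming 16-bit operands held in Python integers (the function name group_signals is made up for this example). Group k covers bits 4k through 4k+3.

```python
def group_signals(a, b):
    """Return (P, G): group propagate and group generate for the four
    4-bit groups of 16-bit operands a and b."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(16)]  # gi = ai * bi
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(16)]  # pi = ai + bi
    P, G = [], []
    for k in range(4):
        q = 4 * k  # lowest bit of group k
        # The group propagates a carry only if all four bits propagate.
        P.append(p[q + 3] & p[q + 2] & p[q + 1] & p[q])
        # The group generates a carry if some bit generates one and
        # every higher bit in the group propagates it.
        G.append(g[q + 3]
                 | (p[q + 3] & g[q + 2])
                 | (p[q + 3] & p[q + 2] & g[q + 1])
                 | (p[q + 3] & p[q + 2] & p[q + 1] & g[q]))
    return P, G


# 0x00FF + 0x0001: group 0 generates a carry, group 1 would propagate one.
print(group_signals(0x00FF, 0x0001))  # ([1, 1, 0, 0], [1, 0, 0, 0])
```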

we can use these to compute the carry into each group:
into bit 4 (C1 = c4), into bit 8 (C2 = c8), and into bit 12 (C3 = c12)
C1 = G0 + (P0 * c0)
C2 = G1 + (P1 * G0) + (P1 * P0 * c0)
C3 = G2 + (P2 * G1) + (P2 * P1 * G0) + (P2 * P1 * P0 * c0)
as well as the carry from the entire 16-bit addition, C4 (= c16):
C4 = G3 + (P3 * G2) + (P3 * P2 * G1) + (P3 * P2 * P1 * G0) + (P3 * P2 * P1 * P0 * c0)

once we have C1, carries c5, c6, and c7 can be computed from c4 (=C1);
similarly, carries c9, c10, and c11 can be computed from c8 (=C2), and
carries c13, c14, and c15 can be computed from c12 (=C3)
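Putting the pieces together, a two-level 16-bit adder might be sketched as follows. The function name cla_add16 and the list layout (C[0] holds c0, C[4] holds the carry out) are assumptions of this sketch. It models the logic equations only, not the gate timing.

```python
def cla_add16(a, b, c0=0):
    """Add two 16-bit values with two-level lookahead; return (sum, C4)."""
    a_bits = [(a >> i) & 1 for i in range(16)]
    b_bits = [(b >> i) & 1 for i in range(16)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # gi = ai * bi
    p = [x | y for x, y in zip(a_bits, b_bits)]  # pi = ai + bi

    # Group propagate and generate for bits 4k .. 4k+3.
    P, G = [], []
    for k in range(4):
        q = 4 * k
        P.append(p[q + 3] & p[q + 2] & p[q + 1] & p[q])
        G.append(g[q + 3] | (p[q + 3] & g[q + 2])
                 | (p[q + 3] & p[q + 2] & g[q + 1])
                 | (p[q + 3] & p[q + 2] & p[q + 1] & g[q]))

    # Carries into each group, computed from group values and c0 only.
    C = [c0,
         G[0] | (P[0] & c0),
         G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0),
         G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
              | (P[2] & P[1] & P[0] & c0),
         G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
              | (P[3] & P[2] & P[1] & G[0])
              | (P[3] & P[2] & P[1] & P[0] & c0)]

    total = 0
    for k in range(4):
        q, cin = 4 * k, C[k]
        # Bit carries inside group k, computed from the group's carry-in.
        c = [cin,
             g[q] | (p[q] & cin),
             g[q + 1] | (p[q + 1] & g[q]) | (p[q + 1] & p[q] & cin),
             g[q + 2] | (p[q + 2] & g[q + 1]) | (p[q + 2] & p[q + 1] & g[q])
                      | (p[q + 2] & p[q + 1] & p[q] & cin)]
        for j in range(4):
            # Sumi = (ai exor bi) exor ci
            total |= (a_bits[q + j] ^ b_bits[q + j] ^ c[j]) << (q + j)
    return total, C[4]


print(cla_add16(0xFFFF, 0x0001))  # (0, 1): the sum wraps and C4 is set
```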

how long does this all take?

1 gate delay to compute gi, pi

2 gate delays to compute Gi (only 1 for Pi)

2 gate delays for Ci

2 gate delays for ci

1 gate delay (exclusive-or) for Sumi
all together, 8 gate delays to add 16 bits!

for a 64-bit adder, we would compute generate and propagate on 16-bit supergroups,
adding 4 more gate delays (12 in all)
in general, delay time with carry lookahead is logarithmic
in the size of the operands
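The delay count can be extrapolated with a small sketch. It assumes that every level groups 4 units, and that each extra level costs 2 gate delays for the new generate terms on the way up plus 2 for the new carries on the way down, as in the 16-bit and 64-bit counts above. The function name cla_gate_delays is hypothetical.

```python
def cla_gate_delays(n_bits, group=4):
    """Estimate gate delays for an n-bit carry-lookahead adder,
    using the counting from these notes (an extrapolation)."""
    # Number of lookahead levels: smallest L with group**L >= n_bits.
    levels = 1
    while group ** levels < n_bits:
        levels += 1
    up = 2 * (levels - 1)  # Gi, then supergroup generate, and so on
    down = 2 * levels      # carries at each level, ending with the bit carries ci
    return 1 + up + down + 1  # plus gi/pi at the start and exclusive-or at the end


for n in (4, 16, 64, 256):
    print(n, cla_gate_delays(n))  # 4 delays per level: 4, 8, 12, 16
```

Each factor of 4 in operand size adds one level, so the count grows with log base 4 of the operand size.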
Spring 1999