### Lecture 8 - Carry look-ahead

(demonstrate wire bundles for Assignment #3)

(discuss MIPS ALU -- Lecture 7 notes)

• the simplest adder is "ripple carry", which is slow: its delay is linear in the size of the operands, because each bit must wait for the carry from the bit below it
• add time is usually critical in determining the overall cycle time of a machine
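As a point of comparison, here is a minimal Python sketch of ripple-carry addition on bit lists. The function names (`ripple_carry_add`, `to_bits`, `from_bits`) and the least-significant-bit-first list layout are assumptions made for illustration, not part of the lecture.

```python
def to_bits(value, width):
    """Split an integer into a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an integer from a least-significant-bit-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two equal-length bit lists and return (sum_bits, carry_out).

    Each carry is computed from the previous one, so the chain of
    dependencies is as long as the operands.
    """
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)  # carry out of this bit
    return sum_bits, carry


s, c = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c)  # 11 + 6 = 17, so the 4-bit sum is 1 with carry out 1
```

The loop body cannot start on bit i until the carry from bit i-1 is known, which is the serial dependency that carry look-ahead removes.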
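We can speed up addition by introducing the notions of "carry generate" and "carry propagate" for each bit position i (here * is AND and + is OR):
gi = ai * bi   (bit i generates a carry on its own)
pi = ai + bi   (bit i passes an incoming carry along)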

• these can be used to compute the carry into each bit position, by expanding c(i+1) = gi + (pi * ci) until only c0 remains:

c1 = g0 + (p0 * c0)
c2 = g1 + (p1 * g0) + (p1 * p0 * c0)
c3 = g2 + (p2 * g1) + (p2 * p1 * g0) + (p2 * p1 * p0 * c0)
c4 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0) + (p3 * p2 * p1 * p0 * c0)
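The four carry equations can be sketched directly in Python, with `&` standing in for * and `|` for +. The function name `lookahead_carries_4` and the least-significant-bit-first lists are assumptions for illustration.

```python
def lookahead_carries_4(a, b, c0=0):
    """Return [c1, c2, c3, c4] for two 4-bit operands.

    a and b are lists of 4 bits, least significant bit first.
    Every carry is written in terms of g, p and c0 only, so none of
    them has to wait for another carry.
    """
    g = [ai & bi for ai, bi in zip(a, b)]  # carry generate
    p = [ai | bi for ai, bi in zip(a, b)]  # carry propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


# 5 + 3: a = 0101, b = 0011, written least significant bit first
print(lookahead_carries_4([1, 0, 1, 0], [1, 1, 0, 0]))  # [1, 1, 1, 0]
```

In the example, bit 0 generates a carry (g0 = 1) and bits 1 and 2 propagate it, so c1, c2 and c3 are 1; bit 3 neither generates nor propagates, so c4 is 0.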

• and then we compute each sum bit independently using the carry into that position:

Sumi = (ai ex-or bi) ex-or ci
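A short sketch of the sum step, assuming the carries into each position are already available. The name `sum_bits` and the list layout (least significant bit first, with `carries_in[0]` holding c0) are illustrative choices.

```python
def sum_bits(a, b, carries_in):
    """Compute Sumi = (ai ex-or bi) ex-or ci for every bit position.

    a, b and carries_in are equal-length bit lists, least significant
    bit first; carries_in[i] is ci, the carry into position i.
    """
    return [(ai ^ bi) ^ ci for ai, bi, ci in zip(a, b, carries_in)]


# 5 + 3 with c0 = 0: the carries into positions 0..3 are 0, 1, 1, 1
a = [1, 0, 1, 0]  # 5
b = [1, 1, 0, 0]  # 3
print(sum_bits(a, b, [0, 1, 1, 1]))  # [0, 0, 0, 1], i.e. 8
```

Because each sum bit depends only on its own ai, bi and ci, all of them can be computed at the same time once the carries are known.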
We get greater savings when we build a 16-bit adder, and compute group generate and propagate values for each 4-bit group. Note that group values are designated by capital letters.
P0 = p3 * p2 * p1 * p0
P1 = p7 * p6 * p5 * p4
P2 = p11 * p10 * p9 * p8
P3 = p15 * p14 * p13 * p12
G0 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)
G1 = g7 + (p7 * g6) + (p7 * p6 * g5) + (p7 * p6 * p5 * g4)
G2 = g11 + (p11 * g10) + (p11 * p10 * g9) + (p11 * p10 * p9 * g8)
G3 = g15 + (p15 * g14) + (p15 * p14 * g13) + (p15 * p14 * p13 * g12)
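The group values can be sketched as one loop over the four 4-bit groups. The helper names `bit_signals` and `group_signals` are hypothetical, and the operands are passed as Python integers for convenience.

```python
def bit_signals(x, y, width=16):
    """Return the per-bit (g, p) lists for integers x and y, bit 0 first."""
    a = [(x >> i) & 1 for i in range(width)]
    b = [(y >> i) & 1 for i in range(width)]
    g = [ai & bi for ai, bi in zip(a, b)]
    p = [ai | bi for ai, bi in zip(a, b)]
    return g, p


def group_signals(g, p):
    """Return ([G0..G3], [P0..P3]) from 16 bit-level g and p values."""
    G, P = [], []
    for k in range(0, 16, 4):
        g0, g1, g2, g3 = g[k:k + 4]
        p0, p1, p2, p3 = p[k:k + 4]
        # The group propagates a carry only if every bit in it propagates.
        P.append(p3 & p2 & p1 & p0)
        # The group generates a carry if some bit generates one and every
        # higher bit in the group propagates it.
        G.append(g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0))
    return G, P


g, p = bit_signals(0x00FF, 0x0001)
print(group_signals(g, p))  # ([1, 0, 0, 0], [1, 1, 0, 0])
```

In the example, the low group (0xF + 0x1) overflows on its own, so G0 is 1; the next group (0xF + 0x0) cannot create a carry but would pass one along, so G1 is 0 and P1 is 1.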

• we can use these to compute the carry into each group, that is, into bit 4 (C1 = c4), into bit 8 (C2 = c8), and into bit 12 (C3 = c12):
C1 = G0 + (P0 * c0)
C2 = G1 + (P1 * G0) + (P1 * P0 * c0)
C3 = G2 + (P2 * G1) + (P2 * P1 * G0) + (P2 * P1 * P0 * c0)
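as well as the carry out of the entire 16-bit addition, C4 (= c16), which follows the same pattern:
C4 = G3 + (P3 * G2) + (P3 * P2 * G1) + (P3 * P2 * P1 * G0) + (P3 * P2 * P1 * P0 * c0)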
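The group carries have the same shape as the bit-level carries c1 to c4, with G and P in place of g and p. Below is a sketch under that reading; the function name `group_carries` is hypothetical, and the C4 expression is written by extending the pattern of C1 to C3.

```python
def group_carries(G, P, c0=0):
    """Return [C1, C2, C3, C4] from the four group values G0..G3, P0..P3.

    C1, C2, C3 are the carries into bits 4, 8 and 12; C4 is the carry
    out of the whole 16-bit addition.
    """
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    # Assumed by analogy with c4 at the bit level.
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0])
          | (P[3] & P[2] & P[1] & P[0] & c0))
    return [C1, C2, C3, C4]
```

All four values depend only on the group signals and c0, so they become available together, two gate levels after the G and P values.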

• once we have C1, carries c5, c6, and c7 can be computed from c4 (= C1) with the same equations used for c1, c2, and c3, substituting c4 for c0;

• similarly, carries c9, c10, and c11 can be computed from c8 (= C2), and
carries c13, c14, and c15 can be computed from c12 (= C3)
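Putting the steps together, here is a sketch of the whole 16-bit adder. The names `lookahead4` and `cla16` are hypothetical. One assumption to note: the same four-carry function is reused at the bit level and at the group level, and Gi is obtained from it by feeding a carry-in of 0, which reduces the c4 equation to the G equation.

```python
def lookahead4(g, p, cin):
    """Return the four look-ahead carries for four (g, p) pairs."""
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & cin))
    return [c1, c2, c3, c4]


def cla16(x, y, c0=0):
    """Add two 16-bit integers; return (16-bit sum, carry out C4)."""
    a = [(x >> i) & 1 for i in range(16)]
    b = [(y >> i) & 1 for i in range(16)]
    g = [ai & bi for ai, bi in zip(a, b)]
    p = [ai | bi for ai, bi in zip(a, b)]

    # Group generate and propagate for each 4-bit group.
    G, P = [], []
    for k in range(0, 16, 4):
        gk, pk = g[k:k + 4], p[k:k + 4]
        P.append(pk[0] & pk[1] & pk[2] & pk[3])
        G.append(lookahead4(gk, pk, 0)[3])  # c4 equation with carry-in 0

    # Carries between groups: C1, C2, C3 and the final carry out C4.
    C = lookahead4(G, P, c0)
    group_cin = [c0, C[0], C[1], C[2]]

    # Carries within each group, starting from that group's carry-in.
    carries = []  # carries[i] is ci, the carry into bit i
    for j in range(4):
        k = 4 * j
        inner = lookahead4(g[k:k + 4], p[k:k + 4], group_cin[j])
        carries += [group_cin[j]] + inner[:3]

    total = sum(((a[i] ^ b[i]) ^ carries[i]) << i for i in range(16))
    return total, C[3]


print(cla16(0xFFFF, 0x0001))  # (0, 1): the carry crosses all four groups
```

Python evaluates these steps one after another, of course; the point of the sketch is the data dependencies, which never chain more than a fixed number of stages regardless of where the carry originates.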

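• how long does this all take? (counting each AND or OR stage as one gate delay, whatever its number of inputs)
• 1 gate delay to compute gi, pi
• 2 gate delays to compute Gi (only 1 for Pi)
• 2 gate delays for Ci (the carries into each group)
• 2 gate delays for ci (the remaining carries within each group)
• 1 gate delay (exclusive-or) for Sumi
all together, 1 + 2 + 2 + 2 + 1 = 8 gate delays to add 16 bits!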
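• for a 64-bit adder, we would compute generate and propagate on 16-bit super-groups, adding 4 more gate delays (2 for the super-group generate values, 2 for the carries into each super-group), for 12 gate delays in total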

• in general, delay time with carry look-ahead is logarithmic in the size of the operands
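The delay count can be turned into a small model to see the logarithmic growth. This is a sketch under stated assumptions, not a timing analysis: the function names are hypothetical, every level groups 4 signals, and each AND-OR stage is counted as 2 gate delays regardless of fan-in. The 16-bit and 64-bit figures match the counts in these notes; other widths are extrapolations from the same pattern.

```python
def cla_levels(width, group=4):
    """Number of look-ahead levels needed to cover `width` bits."""
    levels, span = 1, group
    while span < width:
        span *= group
        levels += 1
    return levels


def cla_gate_delays(width, group=4):
    """Gate delays for a carry look-ahead adder under the model above."""
    levels = cla_levels(width, group)
    delay = 1                  # gi, pi
    delay += 2 * (levels - 1)  # group (and super-group) generate values
    delay += 2 * levels        # carries, one AND-OR stage per level
    delay += 1                 # exclusive-or for Sumi
    return delay


for width in (16, 64, 256):
    print(width, cla_gate_delays(width))  # 16 -> 8, 64 -> 12, 256 -> 16
```

Each factor of 4 in operand size adds one level and therefore a constant 4 gate delays, which is the logarithmic behaviour stated above.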