### Lecture 5: Arithmetic and ALU Design

#### Representing signed numbers (text, sections 4.2-4.3)

• negative numbers generally represented in two's complement
• computing the two's complement:  flipping each bit and adding 1
• doing subtraction by adding the two's complement
• sign extension
• overflow detection [two negative numbers producing a positive sum, or two positive numbers producing a negative sum]
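
The operations listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8-bit default width and all function names (`twos_complement`, `to_signed`, `add_with_overflow`, `subtract`, `sign_extend`) are hypothetical and do not come from the text.

```python
def twos_complement(x, bits=8):
    """Negate a bit pattern within a fixed width: flip each bit, then add 1."""
    mask = (1 << bits) - 1
    return ((x ^ mask) + 1) & mask


def to_signed(x, bits=8):
    """Interpret a bit pattern as a signed two's-complement value."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x


def add_with_overflow(a, b, bits=8):
    """Add two bit patterns and report signed overflow.

    Overflow occurs when both operands have the same sign
    and the sum has the opposite sign.
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = (a + b) & mask
    overflow = ((a ^ total) & (b ^ total) & sign) != 0
    return total, overflow


def subtract(a, b, bits=8):
    """Compute a - b by adding the two's complement of b (result only)."""
    mask = (1 << bits) - 1
    return (a + twos_complement(b, bits)) & mask


def sign_extend(x, from_bits, to_bits):
    """Widen a value by copying its sign bit into the new high-order bits."""
    if x & (1 << (from_bits - 1)):
        x |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return x
```

For example, `to_signed(subtract(5, 7))` yields -2, and `add_with_overflow(100, 100)` reports overflow because two positive 8-bit operands produce a bit pattern with the sign bit set.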

#### MIPS ALU (text, section 4.5)

An ALU (arithmetic-logical unit) is a combinational circuit capable of computing a variety of arithmetic and logical functions.
• operations needed for MIPS instructions discussed so far: add, subtract, and, or, zero test, comparison [MIPS also has nor, xor; multiply and divide]
• general strategy: different circuits combined by multiplexer; multiplexer select becomes function select for ALU
• inverting B input for subtraction
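• for the "set on less" operation, subtract and feed the adder output of the high-order bit (the sign of a - b) to the low-order bit of the result; all other result bits are 0
• for the "equal" test, subtract, OR together all bits of the ALU output, and invert: the result is 1 only when a - b = 0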
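
A behavioral sketch of this strategy, assuming a 32-bit word and hypothetical operation names (`"and"`, `"or"`, `"add"`, `"sub"`, `"slt"`) chosen for illustration, might look as follows. It models the function select as a lookup among candidate outputs, and it ignores overflow in the "set on less" case.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit word


def alu(a, b, op):
    """Return (result, zero) for 32-bit inputs a and b.

    'op' plays the role of the multiplexer select lines.
    """
    # Subtraction: invert the B input and set the carry-in to 1.
    invert_b = op in ("sub", "slt")
    b_in = (~b & MASK) if invert_b else b
    carry_in = 1 if invert_b else 0
    total = (a + b_in + carry_in) & MASK  # adder output

    # All candidate outputs exist in parallel; the select picks one.
    outputs = {
        "and": a & b,
        "or": a | b,
        "add": total,
        "sub": total,
        # Sign bit of a - b routed to the low-order bit (overflow ignored).
        "slt": (total >> 31) & 1,
    }
    result = outputs[op]

    # Zero output: OR of all result bits, inverted.
    zero = 1 if result == 0 else 0
    return result, zero
```

With this sketch, an equality test is a subtraction followed by a look at the zero output: `alu(5, 5, "sub")` returns `(0, 1)`.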

#### Carry look-ahead (text, section 4.5)

• simplest adder is "ripple carry": slow (delay time linear in size of operands)
• add time is usually critical in determining overall cycle time of a machine
• we can speed up addition by introducing the notions of "carry generate" and "carry propagate" for each bit position i (here * is AND and + is OR):
gi = ai * bi
pi = ai + bi
• these can be used to compute the carry into each bit position:
c1 = g0 + (p0 * c0)
c2 = g1 + (p1 * g0) + (p1 * p0 * c0)
c3 = g2 + (p2 * g1) + (p2 * p1 * g0) + (p2 * p1 * p0 * c0)
c4 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0) + (p3 * p2 * p1 * p0 * c0)
• and then we compute each sum bit independently using the carry:
Sumi = (ai ex-or bi) ex-or ci
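
The carry equations above can be checked with a small simulation. The following is a minimal sketch, assuming a 4-bit adder; the function name `cla4` is hypothetical. Python's `&`, `|`, and `^` stand in for AND, OR, and exclusive-or.

```python
def cla4(a, b, c0=0):
    """Add two 4-bit values using look-ahead carries; return (sum, c4)."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> i) & 1 for i in range(4)]

    # Per-bit generate and propagate.
    g = [abit[i] & bbit[i] for i in range(4)]
    p = [abit[i] | bbit[i] for i in range(4)]

    # Each carry depends only on g, p, and c0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    # Sum bits computed independently from the carries.
    c = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        s |= (abit[i] ^ bbit[i] ^ c[i]) << i
    return s, c4
```

Comparing `cla4(a, b, c0)` against ordinary integer addition for all 4-bit inputs is a quick way to confirm that the expanded carry equations are consistent.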
We get greater savings when we build a 16-bit adder, and compute group generate and propagate values for each 4-bit group. Note that group values are designated by capital letters.
P0 = p3 * p2 * p1 * p0
P1 = p7 * p6 * p5 * p4
P2 = p11 * p10 * p9 * p8
P3 = p15 * p14 * p13 * p12
G0 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)
G1 = g7 + (p7 * g6) + (p7 * p6 * g5) + (p7 * p6 * p5 * g4)
G2 = g11 + (p11 * g10) + (p11 * p10 * g9) + (p11 * p10 * p9 * g8)
G3 = g15 + (p15 * g14) + (p15 * p14 * g13) + (p15 * p14 * p13 * g12)
• we can use these to compute the carry into each group: into bit 4 (C1 = c4), into bit 8 (C2 = c8), and into bit 12 (C3 = c12), as well as the carry out of the entire 16-bit addition, C4 (= c16):
C1 = G0 + (P0 * c0)
C2 = G1 + (P1 * G0) + (P1 * P0 * c0)
C3 = G2 + (P2 * G1) + (P2 * P1 * G0) + (P2 * P1 * P0 * c0)
C4 = G3 + (P3 * G2) + (P3 * P2 * G1) + (P3 * P2 * P1 * G0) + (P3 * P2 * P1 * P0 * c0)
• once we have C1, carries c5, c6, and c7 can be computed from c4 (= C1); similarly, carries c9, c10, and c11 can be computed from c8 (= C2), and carries c13, c14, and c15 can be computed from c12 (= C3)
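
The two-level scheme can be sketched as a simulation as well. The names `group_pg` and `cla16` are hypothetical, and the expression used here for C4 simply continues the pattern of C1 through C3. For brevity the carries inside each group are written in the recurrence form c(i+1) = gi + (pi * ci), which is logically equivalent to the expanded look-ahead equations but does not model their gate delay.

```python
def group_pg(g, p, base):
    """Group propagate and generate for bits base..base+3."""
    g0, g1, g2, g3 = g[base:base + 4]
    p0, p1, p2, p3 = p[base:base + 4]
    P = p3 & p2 & p1 & p0
    G = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
    return P, G


def cla16(a, b, c0=0):
    """Add two 16-bit values with 4-bit look-ahead groups; return (sum, C4)."""
    abit = [(a >> i) & 1 for i in range(16)]
    bbit = [(b >> i) & 1 for i in range(16)]
    g = [x & y for x, y in zip(abit, bbit)]
    p = [x | y for x, y in zip(abit, bbit)]

    (P0, G0), (P1, G1), (P2, G2), (P3, G3) = (
        group_pg(g, p, base) for base in (0, 4, 8, 12)
    )

    # Carries into each group, computed from group values and c0 only.
    C1 = G0 | (P0 & c0)
    C2 = G1 | (P1 & G0) | (P1 & P0 & c0)
    C3 = G2 | (P2 & G1) | (P2 & P1 & G0) | (P2 & P1 & P0 & c0)
    C4 = (G3 | (P3 & G2) | (P3 & P2 & G1)
          | (P3 & P2 & P1 & G0) | (P3 & P2 & P1 & P0 & c0))

    # Carries inside each group, starting from that group's carry-in.
    c = [0] * 16
    for base, carry_in in ((0, c0), (4, C1), (8, C2), (12, C3)):
        c[base] = carry_in
        for i in range(base, base + 3):
            c[i + 1] = g[i] | (p[i] & c[i])

    s = 0
    for i in range(16):
        s |= (abit[i] ^ bbit[i] ^ c[i]) << i
    return s, C4
```

For instance, `cla16(0xFFFF, 1)` returns `(0, 1)`: every group propagates, so the carry generated in bit 0 reaches C4.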
• how long does this all take?
• 1 gate delay to compute gi, pi
• 2 gate delays to compute the group values Gi (only 1 for Pi)
• 2 gate delays for the group carries Ci
• 2 gate delays for the remaining bit carries ci within each group
• 1 gate delay (exclusive-or) for Sumi
• all together, 1 + 2 + 2 + 2 + 1 = 8 gate delays to add 16 bits!
• for a 64-bit adder, we would compute generate and propagate on 16-bit super-groups, adding 4 more gate delays

• in general, delay time with carry look-ahead is logarithmic in the size of the operands
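
The delay count can be generalized in a small sketch. It assumes the same simplified model as the notes (every AND/OR level costs one gate delay regardless of fan-in, and groups are always 4 wide); the function names `lookahead_levels` and `cla_gate_delays` are hypothetical.

```python
def lookahead_levels(n_bits, group=4):
    """Number of look-ahead levels needed to cover n_bits with groups of 'group'."""
    levels, span = 1, group
    while span < n_bits:
        span *= group
        levels += 1
    return levels


def cla_gate_delays(n_bits, group=4):
    """Gate delays for a hierarchical carry look-ahead adder in this model."""
    levels = lookahead_levels(n_bits, group)
    up = 2 * (levels - 1)  # group generate values, one step per level above the first
    down = 2 * levels      # carries at each level, from the top level down to the bits
    return 1 + up + down + 1  # gi/pi at the start, exclusive-or for Sumi at the end
```

Under these assumptions the model gives 8 gate delays for 16 bits and 12 for 64 bits, matching the counts above: multiplying the operand size by 4 adds one level and therefore a constant 4 gate delays.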
Spring 2002