Computer Architecture

Start Lecture #5

Remark: Lab 3 assigned.

A Simple Observation

The CarryIn to the LOB (low-order bit) and the Binvert inputs to all the 1-bit ALUs always have the same value. So the ALU has just one input, called Bnegate, which is sent to the appropriate inputs in the 1-bit ALUs. The final 1-bit cell of the ALU is shown on the right.

Note that the circuit is the same for all bits; however different bits are wired differently, i.e., they have different inputs and their outputs are sent to different places.
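The wiring described above can be sketched in a few lines of Python. This is only an illustrative model, not the lab solution: the function names, the mux encoding (0 = AND, 1 = OR, 2 = add), and the omission of the Less input are assumptions made for the example.

```python
def alu_1bit(a, b, carry_in, ainvert, binvert, operation):
    """One ALU cell. All values are 0 or 1. Returns (result, carry_out)."""
    a_eff = a ^ ainvert          # optionally invert a
    b_eff = b ^ binvert          # optionally invert b
    and_out = a_eff & b_eff
    or_out = a_eff | b_eff
    sum_out = a_eff ^ b_eff ^ carry_in
    carry_out = (a_eff & b_eff) | (a_eff & carry_in) | (b_eff & carry_in)
    # Assumed mux encoding (hypothetical): 0 = AND, 1 = OR, 2 = add.
    result = (and_out, or_out, sum_out)[operation]
    return result, carry_out


def alu(a, b, ainvert, bnegate, operation, width=32):
    """Wire `width` identical cells together.

    Bnegate drives every cell's Binvert and also the CarryIn of the LOB.
    """
    carry = bnegate              # CarryIn of the low-order bit
    result = 0
    for i in range(width):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry,
                              ainvert, bnegate, operation)
        result |= bit << i
    return result
```

With these assumptions, alu(7, 5, 0, 1, 2) computes 7-5, and alu(a, b, 1, 1, 0) computes NOR, since inverting both inputs of an AND gives NOR.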

Equality Detection

To see if A = B we simply form A-B and test if the result is zero.
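As a small illustration of this test, the sketch below forms A-B in two's complement and then ORs all the result bits together and inverts (a wide NOR). The function name and the width parameter are assumptions for the example.

```python
def is_equal(a, b, width=32):
    """Return 1 if a == b (as width-bit values), else 0."""
    mask = (1 << width) - 1
    # A - B in two's complement: A + (not B) + 1, truncated to `width` bits.
    diff = ((a & mask) + (~b & mask) + 1) & mask
    # Zero detect: OR every result bit, then invert.
    any_bit_set = 0
    for i in range(width):
        any_bit_set |= (diff >> i) & 1
    return any_bit_set ^ 1
```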

The Final Result

The final 32-bit ALU is shown below on the left. Again note that all the bits have the same circuit. The LOB and HOB have special external wiring; the other 30 bits are wired the same.

To the right of this diagram we see the symbol used for an ALU.

What are the control lines?

What functions can we perform?
function    4-bit cntl    Ainv    Bneg    Oper

We think of the three control lines Ainvert, Bnegate, and Operation (which is itself two bits wide) as forming a single 4-bit control line. The table on the right shows what 4-bit value is needed for each function.

Defining the MIPS ALU in Verilog


B.6: Faster Addition: Carry Lookahead

This adder is much faster than the ripple adder we did before, especially for wide (i.e., many bit) addition.

Fast Carry Using Infinite Hardware

This is a simple (theoretical) result.

  1. An adder is a combinational circuit; hence it can be constructed with two (or three if you count the bubbles) levels of logic. Done.

  2. Consider 32-bit (or 64-bit, or 128-bit, or N-bit) addition, R=A+B.
  3. The above applies to any logic function; here are the calculations specific to addition.

Fast Carry Using the First Level of Abstraction: Propagate and Generate

At each bit position we have two input bits a and b as well as a CarryIn input. We now define two other bits, propagate and generate (pi = ai+bi and gi = ai bi, where + is OR and juxtaposition is AND).


To summarize, using a subscript i to represent the bit number,

    to generate  a carry:   gi = ai bi
    to propagate a carry:   pi = ai+bi

The diagram on the right, from P&H, gives a plumbing analogue for generate and propagate. A full size version of the diagram is here in pdf.

The point is that liquid enters the main pipe if either the initial CarryIn or one of the generates is true. The liquid exits the pipe at the lower left (i.e., there is a CarryOut for this bit position) if all the propagate valves are open from the lowest liquid entrance to the exit.

The two diagrams in these notes are from the 2e; the colors changed between editions.

Given the generates and propagates, we can calculate all the carries for a 4-bit addition (recall that c0=Cin is an input) as follows (this is the formula version of the plumbing):

    c1 = g0 + p0 c0
    c2 = g1 + p1 c1 = g1 + p1 g0 + p1 p0 c0
    c3 = g2 + p2 c2 = g2 + p2 g1 + p2 p1 g0 + p2 p1 p0 c0
    c4 = g3 + p3 c3 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0 + p3 p2 p1 p0 c0
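The expanded formulas can be checked mechanically. The sketch below (function names are hypothetical) computes c1 ... c4 directly from the expanded two-level expressions and compares them, for every possible input, with the carries obtained from the recurrence c(i+1) = gi + pi ci.

```python
from itertools import product


def carries_lookahead(a_bits, b_bits, c0):
    """a_bits, b_bits: lists of four 0/1 values, index 0 = low-order bit."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def carries_recurrence(a_bits, b_bits, c0):
    """Same carries, computed one after another as a ripple adder would."""
    carries, c = [], c0
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a | b) & c)   # c(i+1) = gi + pi ci
        carries.append(c)
    return carries


def check_all():
    """Compare both methods on all 2**9 combinations of a, b and c0."""
    for bits in product((0, 1), repeat=9):
        a, b, c0 = list(bits[0:4]), list(bits[4:8]), bits[8]
        if carries_lookahead(a, b, c0) != carries_recurrence(a, b, c0):
            return False
    return True
```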


Thus we can calculate c1 ... c4 in just two additional gate delays given the p's and g's (where we assume one gate can accept up to 5 inputs). Since we get gi and pi after one gate delay, the total delay for calculating all the carries is 3 gate delays (this includes c4=Carry-Out).

Each bit of the sum si can be calculated in 2 gate delays given ai, bi, and ci.

We show this in the diagram on the right.

Thus, for 4-bit addition, 5 gate delays after we are given a, b and Carry-In, we have calculated s and Carry-Out using a modest amount of realistic (no more than 5-input) logic.

How does the speed of this carry-lookahead adder (CLA) compare to our original ripple-carry adder?
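One rough way to approach this question is to count gate delays under a simple model. The sketch below assumes (this model is not taken from the notes) that each ripple-carry full adder produces both its carry-out and its sum bit 2 gate delays after its carry-in arrives. The CLA figure is the 1 + 2 + 2 = 5 gate delays counted above. Function names are hypothetical.

```python
def ripple_delay(n_bits):
    """Gate delays until all outputs of an n-bit ripple-carry adder are valid.

    Assumed model: carry-in to carry-out of one full adder costs 2 gate
    delays, and a sum bit is ready 2 gate delays after its carry-in.
    """
    carry_into_hob = 2 * (n_bits - 1)   # carry has rippled through n-1 cells
    last_sum = carry_into_hob + 2
    carry_out = carry_into_hob + 2
    return max(last_sum, carry_out)


def cla4_delay():
    """1 delay for the g's and p's, 2 for the carries, 2 for the sum bits."""
    return 1 + 2 + 2
```

Under these assumptions a 4-bit ripple-carry adder needs 8 gate delays versus 5 for the 4-bit CLA, and the ripple figure grows linearly with the width.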

Fast Carry Using the Second Level of Abstraction

We have finished the design of a 4-bit CLA; the next goal is a 16-bit fast adder. Let's consider, at varying levels of detail, five possibilities.

  1. Ripple carry. Simple, we know it, but not fast.
  2. General 2 levels of logic. Always applicable, we know it, but not practical.
  3. Extend the above design to 16 bits. Possible, we could do it, but some gates have 17 inputs. Would need a tree to reduce the input count.
  4. Put together four of the 4-bit CLAs. Shown in the diagram to the right is a schematic of our 4-bit CLA and a 16-bit adder constructed from four of them.
  5. Be more clever and put together the 4-bit CLAs in a carry-lookahead manner. One could call the result a 2-level CLA.
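Possibility 4 is easy to model functionally. The sketch below (hypothetical names cla4 and add16) builds a 4-bit CLA whose carries are computed only from the g's, p's and c0, and then chains four of them with the carry rippling from one 4-bit block to the next. It models behavior only, not gate delays, and it does not implement the 2-level CLA of possibility 5.

```python
def cla4(a, b, c0):
    """Add two 4-bit integers plus a carry-in. Returns (sum, carry_out)."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(abit, bbit)]   # generate
    p = [x | y for x, y in zip(abit, bbit)]   # propagate
    c = [c0]
    for i in range(4):
        # Expanded (two-level) form of c(i+1): every term uses only g's,
        # p's and c0, never a previously computed carry.
        term = c0
        for k in range(i + 1):
            term &= p[k]                      # p(i) ... p(0) c0
        carry = term
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]                  # p(i) ... p(j+1) g(j)
            carry |= term
        c.append(carry)
    s = sum((abit[i] ^ bbit[i] ^ c[i]) << i for i in range(4))
    return s, c[4]


def add16(a, b, c0=0):
    """Four 4-bit CLAs; the carry ripples between the blocks."""
    result, carry = 0, c0
    for block in range(4):
        nibble_a = (a >> (4 * block)) & 0xF
        nibble_b = (b >> (4 * block)) & 0xF
        s, carry = cla4(nibble_a, nibble_b, carry)
        result |= s << (4 * block)
    return result, carry
```

For example, add16(0xFFFF, 1) returns (0, 1): the sum wraps to zero and the Carry-Out is set.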