Computer Architecture

Start Lecture #6

[Figure: hpfig B.6.2]

Super Propagate and Super Generate

We start the adventure by defining "super propagate" and "super generate" bits.

    P0 = p3 p2 p1 p0      Low order 4-bit adder propagates a carry
    P1 = p7 p6 p5 p4
    P2 = p11 p10 p9 p8
    P3 = p15 p14 p13 p12  High order 4-bit adder propagates a carry

    G0 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0   Low order 4-bit adder generates a carry
    G1 = g7 + p7 g6 + p7 p6 g5 + p7 p6 p5 g4
    G2 = g11 + p11 g10 + p11 p10 g9 + p11 p10 p9 g8
    G3 = g15 + p15 g14 + p15 p14 g13 + p15 p14 p13 g12
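
As a minimal sketch (not part of the original notes), the definitions above can be checked in Python. The function names bit_pg and super_pg are hypothetical, and the sketch assumes the per-bit definitions p_i = a_i + b_i and g_i = a_i b_i from the earlier lectures.

```python
def bit_pg(a, b, width=16):
    """Per-bit propagate and generate (assumed: p_i = a_i OR b_i, g_i = a_i AND b_i)."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    return p, g


def super_pg(p, g):
    """Super propagate P_k and super generate G_k for each 4-bit group."""
    P, G = [], []
    for k in range(0, len(p), 4):
        p0, p1, p2, p3 = p[k:k + 4]
        g0, g1, g2, g3 = g[k:k + 4]
        # P_k: every bit in the group propagates
        P.append(p3 & p2 & p1 & p0)
        # G_k: some bit generates and all higher bits in the group propagate
        G.append(g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0))
    return P, G


# 0x000F + 0x0001: the low-order 4-bit adder generates a carry
P, G = super_pg(*bit_pg(0x000F, 0x0001))
print(P, G)  # [1, 0, 0, 0] [1, 0, 0, 0]
```

The lists are indexed from the low-order group, so P[0] is P0 and G[3] is G3.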
  

From these super propagates and super generates, we can calculate the super carries, i.e. the carries for the four 4-bit adders.

    C1 = G0 + P0 c0
    C2 = G1 + P1 C1 = G1 + P1 G0 + P1 P0 c0
    C3 = G2 + P2 C2 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 c0
    C4 = G3 + P3 C3 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 c0
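
The expanded right-hand sides are what make the lookahead possible, since each C depends only on the P's, G's, and c0. As a hedged sanity check (the function names are hypothetical), the sketch below confirms by brute force that the expanded formulas agree with the recurrence C(k+1) = Gk + Pk Ck for all 512 input combinations.

```python
from itertools import product


def super_carries(P, G, c0):
    """[C1, C2, C3, C4] from the expanded (lookahead) formulas."""
    P0, P1, P2, P3 = P
    G0, G1, G2, G3 = G
    C1 = G0 | (P0 & c0)
    C2 = G1 | (P1 & G0) | (P1 & P0 & c0)
    C3 = G2 | (P2 & G1) | (P2 & P1 & G0) | (P2 & P1 & P0 & c0)
    C4 = (G3 | (P3 & G2) | (P3 & P2 & G1) | (P3 & P2 & P1 & G0)
          | (P3 & P2 & P1 & P0 & c0))
    return [C1, C2, C3, C4]


def super_carries_rippled(P, G, c0):
    """The same carries computed one after another via C(k+1) = Gk + Pk Ck."""
    carries, c = [], c0
    for Pk, Gk in zip(P, G):
        c = Gk | (Pk & c)
        carries.append(c)
    return carries


def formulas_agree():
    """Compare both versions over every choice of P0..P3, G0..G3, and c0."""
    for bits in product((0, 1), repeat=9):
        P, G, c0 = list(bits[0:4]), list(bits[4:8]), bits[8]
        if super_carries(P, G, c0) != super_carries_rippled(P, G, c0):
            return False
    return True


print(formulas_agree())  # True
```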
  

But this looks terrific! These super carries are what we need to combine four 4-bit CLAs into a 16-bit CLA in a carry-lookahead manner. Recall that the hybrid approach suffered because the carries from one 4-bit CLA to the next (i.e., the super carries) were done in a ripple carry manner.

Since it is not completely clear how to combine the pieces so far presented to get a 16-bit, 2-level CLA, I will give a pictorial account very soon.

Before the pictures, let's assume the pieces can be put together and see how fast the 16-bit, 2-level CLA actually is. Recall that we have already seen two practical 16-bit adders: a ripple carry version taking 32 gate delays and a hybrid structure taking 14 gate delays. If the 2-level design isn't faster than 14 gate delays, we won't bother with the pictures.

Remember we are assuming 5-input gates. We use lower case p, g, and c for propagates, generates, and carries; and use capital P, G, and C for the super- versions.

  1. We calculate the p's and g's (lower case) in 1 gate delay (as with the 4-bit CLA).
  2. We calculate the P's one gate delay after we have the p's, i.e., 2 gate delays after we start.
  3. The G's are determined 2 gate delays after we have the g's and p's. So the G's are done 3 gate delays after we start.
  4. The C's are determined 2 gate delays after the P's and G's. So the C's are done 5 gate delays after we start.
  5. Now the C's are sent back to the 4-bit CLAs, which have already calculated the p's and g's. The c's are calculated in 2 more gate delays (7 total) and the s's 2 more after that (9 total).
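
The five steps above amount to a short piece of bookkeeping, sketched here in Python. The function name and dictionary keys are hypothetical; the per-step delays are exactly those stated in the list.

```python
def delays_16bit_two_level():
    """Gate delays after the start at which each signal of the 16-bit, 2-level CLA is ready."""
    t = {}
    t["p,g"] = 1                      # step 1: lower-case p's and g's
    t["P"] = t["p,g"] + 1             # step 2: P's, one delay after the p's
    t["G"] = t["p,g"] + 2             # step 3: G's, two delays after the p's and g's
    t["C"] = max(t["P"], t["G"]) + 2  # step 4: C's wait for the later of P's and G's
    t["c"] = t["C"] + 2               # step 5: c's inside each 4-bit CLA
    t["s"] = t["c"] + 2               # step 5: sum bits
    return t


print(delays_16bit_two_level()["s"])  # 9
```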

[Figure: cla-pg 4-bit]

Since 9<14, let the pictures begin!

  1. First perform minor surgery on the 4-bit CLA.

     [Figure: cla clb]

  2. Next put four of these 4-bit CLAs together with a Carry Lookahead Block that calculates the C's from the P's, G's and Cin=C0.

     [Figure: cla clb]

  3. We actually are not done with the CL Block.

Building CLAs Using the CL Block

It is time to validate the claim that all sizes of CLAs can be built (recursively) using the CL Block.

1-bit CLA-PG

A 1-bit CLA is just a 1-bit adder. With only one bit there is no need for any lookahead since there is no ripple to try to avoid.

However, to enable us to build a 4-bit CLA from the 1-bit version, we actually need to build what we previously called a CLA-PG. The 1-bit CLA-PG has three inputs: a, b, and cin. It produces four outputs: s, cout, p, and g. We have given the logic formulas for all four outputs previously.

[Figure: cla 4bit]
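
A 1-bit CLA-PG is small enough to write out in full. The sketch below is only an illustration: the function name is hypothetical, and since the formulas themselves were given in earlier lectures rather than here, it assumes the usual ones, namely p = a + b, g = ab, s = a xor b xor cin, and cout = g + p cin.

```python
def cla_pg_1bit(a, b, cin):
    """1-bit CLA-PG: three inputs, four outputs (s, cout, p, g)."""
    p = a | b              # propagate (assumed definition)
    g = a & b              # generate (assumed definition)
    s = a ^ b ^ cin        # sum bit
    cout = g | (p & cin)   # carry out
    return s, cout, p, g


# All eight input combinations; (cout, s) is the 2-bit sum a + b + cin
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, cla_pg_1bit(a, b, cin))
```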

4-bit CLA-PG

A 4-bit CLA-PG is shown as the red portion in the figure to the right.

It has nine inputs (4 a's, 4 b's, and cin) and must produce seven outputs: 4 s's, cout, p, and g (recall that the last two were previously called the super propagate and super generate respectively).

The tall black box is our CL Block.

The question is, what must the ith ? box do in order for the entire (red) structure to be a 4-bit CLA-PG?

  1. The box must produce si, one bit of the desired sum. But this is easy since the box receives ai, bi, and ci, the carry in (c0 is cin).
  2. The box must produce pi and gi for the CL Block to consume. But that is also easy since it has as input ai and bi.
  3. It looks like the ? box is just a 1-bit CLA-PG!
  4. Unfortunately, not quite. The ? box is only a (large) subset of a 1-bit CLA-PG.
  5. What is missing?
  6. Ans. The ? box doesn't need to produce a carry out since the larger (4-bit) CLA-PG contains a CL Block that produces all of them.

[Figure: cla 4bit pedantic]

So, if we want to say that the 4-bit (1-level) CLA-PG is composed of four 1-bit (0-level) CLA-PGs together with a CL Block, we must draw the picture as on the right. The difference is that we explicitly show that the ? box produces cout, which is then not used.

This situation will occur for all sizes. For example, either picture on the right for a 4-bit CLA-PG produces a carry out since all 4-bit full adders do so. However, a 16-bit CLA-PG, built from four of the 4-bit units and a CL Block, does not use the carry outs produced by the four 4-bit units.

We have several alternatives.

  1. Don't mention the problem of the unused cout. A common solution but too late for us.
  2. Draw the top version of the diagram (without the unused cout's) and declare that a CLA-PG doesn't produce a carry out. It seems weird that a CLA-PG wouldn't fully replace a full adder.
  3. Draw the top version of the diagram and admit that a level k CLA-PG doesn't really use four level k-1 CLA-PG's.
  4. Draw the bottom version of the diagram.
  5. Draw the top version of the diagram, but view it as an abbreviation of the bottom version. This last is the alternative we will choose.

As another abbreviation, we will henceforth say CLA when we mean CLA-PG.

Remark: Hence the 4-bit CLA (meaning CLA-PG) is composed of

  1. Four 1-bit CLAs
  2. One CL Block
  3. Wires
  4. Nothing else

[Figure: cla 16bit]

16-bit CLA-PG

Now take four of these 4-bit adders and use the identical CL block to get a 16-bit adder.

The picture on the right shows one 4-bit adder (the red box) in detail. The other three 4-bit adders are just given schematically as small empty red boxes. The CL block is also shown and is wired to all four 4-bit adders.

The complete (large) picture is shown here.

Remark: Hence the 16-bit CLA is composed of

  1. Four 4-bit CLAs
  2. One CL Block
  3. Wires
  4. Nothing else

64-bit CLA-PG

To construct a 64-bit CLA no new components are needed. That is, the only components needed have already been constructed. Specifically, you need:

  1. Four magenta boxes, identical to the one just constructed.
  2. One additional CL Block, identical to the one just used to make the magenta box.
  3. Wires to connect these five boxes.

Remark: Hence the 64-bit CLA (meaning CLA-PG) is composed of

  1. Four 16-bit CLAs
  2. One CL Block
  3. Wires
  4. Nothing else

When drawn (with a brown box) the 64-bit CLA-PG has 129 inputs (64+64+1) and 67 outputs (64+1+2).

256-bit CLA-PG

To construct a 256-bit CLA, again no new components are needed. You need:

  1. Four brown boxes, identical to the one just constructed.
  2. One additional CL Block, identical to the one just used to make the brown box.
  3. Wires to connect these five boxes.

Remark: Hence the 256-bit CLA (meaning CLA-PG) is composed of

  1. Four 64-bit CLAs
  2. One CL Block
  3. Wires
  4. Nothing else

etc.
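
The recursive recipe (four smaller CLAs, one CL Block, wires, nothing else) can be mimicked in software. The following is only a sketch under stated assumptions: the names cl_block, group_pg, cla, and add are hypothetical; the per-bit formulas p = a + b, g = ab, and s = a xor b xor cin are assumed from the earlier lectures; the CL Block is assumed to output the block-level P and G as well as the four carries; and the carries inside cl_block are written as a recurrence for brevity, whereas the hardware uses the flattened two-level formulas.

```python
def cl_block(p, g, cin):
    """CL Block: four (p, g) pairs and cin in; carries c1..c4, block P, block G out."""
    carries, c = [], cin
    for pi, gi in zip(p, g):
        c = gi | (pi & c)  # recurrence form; hardware flattens this
        carries.append(c)
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return carries, P, G


def split4(bits):
    """Cut a little-endian bit list into four equal pieces, low-order first."""
    q = len(bits) // 4
    return [bits[i * q:(i + 1) * q] for i in range(4)]


def group_pg(a, b):
    """Propagate and generate of a whole block; neither depends on the carry in."""
    if len(a) == 1:
        return a[0] | b[0], a[0] & b[0]
    pairs = [group_pg(x, y) for x, y in zip(split4(a), split4(b))]
    p, g = zip(*pairs)
    _, P, G = cl_block(p, g, 0)
    return P, G


def cla(a, b, cin):
    """n-bit CLA for n a power of 4; bit lists are little-endian. Returns (sum bits, cout)."""
    if len(a) == 1:
        return [a[0] ^ b[0] ^ cin], (a[0] & b[0]) | ((a[0] | b[0]) & cin)
    parts = list(zip(split4(a), split4(b)))
    p, g = zip(*(group_pg(x, y) for x, y in parts))
    carries, _, _ = cl_block(p, g, cin)
    carry_ins = [cin] + carries[:3]
    s = []
    for (x, y), c in zip(parts, carry_ins):
        s_part, _unused_cout = cla(x, y, c)  # sub-adder cout is produced but not used
        s += s_part
    return s, carries[3]


def add(x, y, n, cin=0):
    """Convenience wrapper: add two n-bit integers, returning (sum mod 2**n, cout)."""
    to_bits = lambda v: [(v >> i) & 1 for i in range(n)]
    s, cout = cla(to_bits(x), to_bits(y), cin)
    return sum(bit << i for i, bit in enumerate(s)), cout


print(add(0xFFFF, 1, 16))  # (0, 1)
```

Note that the same cl_block is used at every level, and that the carry out of each sub-adder is computed and then discarded, matching the abbreviation adopted above.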

Homework: How many gate delays are required for our 64-bit CLA-PG? How many gate delays are required for a 64-bit ripple carry adder (constructed from 1-bit full adders)?

Summary

CLAs greatly speed up addition; the increase in speed grows with the size of the numbers to be added.

Remark: CLAs implement n-bit addition in O(log(n)) gate delays.
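
One way to see where the logarithm comes from: each step of the recursive construction multiplies the width by 4 and adds one CL Block level, so an n-bit CLA has log4(n) levels. Assuming each level contributes a bounded number of gate delays (as the 16-bit tally suggests), the total is O(log(n)). The helper below is a hypothetical illustration that only counts levels.

```python
def cla_levels(n):
    """Number of CL Block levels in an n-bit CLA built 4-way recursively (n a power of 4)."""
    if n < 1:
        raise ValueError("n must be a positive power of 4")
    levels = 0
    while n > 1:
        if n % 4 != 0:
            raise ValueError("n must be a power of 4")
        n //= 4
        levels += 1
    return levels


for n in (1, 4, 16, 64, 256):
    print(n, cla_levels(n))
```

The 1-bit adder comes out as level 0 and the 4-bit CLA as level 1, matching the terminology used earlier.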