Computer Architecture

Start Lecture #6

Super Propagate and Super Generate

We start the adventure by defining "super propagate" and "super generate" bits.

• A super propagate bit indicates whether the 4-bit CLA constructed above propagates a Carry-In to a Carry-Out. Super propagation occurs for a 4-bit adder when each of the constituent 1-bit adders propagates.
• A super generate bit indicates whether the 4-bit CLA constructed above generates a Carry-Out. Super generation occurs for a 4-bit adder when some 1-bit adder generates and all subsequent 1-bit adders propagate.
• To the right we show the P&H plumbing picture for super propagate and super generate. A larger picture is here.
• The corresponding logic formulas are as follows.
```
P0 = p3 p2 p1 p0      Low order 4-bit adder propagates a carry
P1 = p7 p6 p5 p4
P2 = p11 p10 p9 p8
P3 = p15 p14 p13 p12  High order 4-bit adder propagates a carry

G0 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0   Low order 4-bit adder generates a carry
G1 = g7 + p7 g6 + p7 p6 g5 + p7 p6 p5 g4
G2 = g11 + p11 g10 + p11 p10 g9 + p11 p10 p9 g8
G3 = g15 + p15 g14 + p15 p14 g13 + p15 p14 p13 g12
```
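The formulas above are easy to check in software. Below is a minimal Python sketch, not part of the P&H presentation. The function names `super_pg` and `group_pg` are my own, 0/1 integers stand in for signals, bit 0 is the low-order bit, and (as in these notes) p = a OR b and g = a AND b.

```python
def super_pg(p, g):
    """Super propagate P and super generate G of one 4-bit group.
    p and g are lists of 0/1 values indexed 0 (low order) .. 3 (high order)."""
    P = p[3] & p[2] & p[1] & p[0]
    G = (g[3]
         | (p[3] & g[2])
         | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))
    return P, G


def group_pg(a, b):
    """[(P0, G0), ..., (P3, G3)] for two 16-bit operands a and b."""
    result = []
    for j in range(4):                       # group j covers bits 4j .. 4j+3
        bits = range(4 * j, 4 * j + 4)
        p = [((a >> i) | (b >> i)) & 1 for i in bits]   # p_i = a_i + b_i
        g = [((a >> i) & (b >> i)) & 1 for i in bits]   # g_i = a_i b_i
        result.append(super_pg(p, g))
    return result


print(group_pg(0x000F, 0x0001))   # [(1, 1), (0, 0), (0, 0), (0, 0)]
```

In the example every bit of the low-order group propagates and bit 0 generates, so that group both propagates (P0 = 1) and generates (G0 = 1); the other groups do neither.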

From these super propagates and super generates, we can calculate the super carries, i.e. the carries for the four 4-bit adders.

• The first super carry C0, the Carry-In to the low-order 4-bit adder, is just c0, the input Carry-In.
• The second super carry C1 is the Carry-Out of the low-order 4-bit adder (which is also the Carry-In to the 2nd 4-bit adder).
• The third super carry C2 is the Carry-Out of the second 4-bit adder (which is also the Carry-In to the 3rd 4-bit adder).
• The fourth super carry C3 is the Carry-Out of the third 4-bit adder (which is also the Carry-In to the 4th (high-order) 4-bit adder).
• The last super carry C4 is the Carry-out of the high-order 4-bit adder (which is also the overall Carry-out of the entire 16-bit adder).
• The corresponding logic formulas are as follows.
```
C1 = G0 + P0 c0
C2 = G1 + P1 C1 = G1 + P1 G0 + P1 P0 c0
C3 = G2 + P2 C2 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 c0
C4 = G3 + P3 C3 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 c0
```
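A Python sketch of these super-carry formulas, written in their fully expanded two-level form. The names `nibble_pg`, `super_carries`, and `carries_16` are hypothetical; `nibble_pg` repeats the super P and G formulas so that the block stands alone.

```python
def nibble_pg(x, y):
    """Super P and G of one 4-bit group (x and y are 0..15)."""
    p = [((x | y) >> i) & 1 for i in range(4)]
    g = [((x & y) >> i) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G


def super_carries(P, G, c0):
    """[C0, C1, C2, C3, C4] from P[0..3], G[0..3] and the carry-in c0.
    Each C is the expanded sum of products, so no carry ripples."""
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = (G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
          | (P[2] & P[1] & P[0] & c0))
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return [c0, C1, C2, C3, C4]


def carries_16(a, b, c0):
    """Super carries for 16-bit operands a and b with carry-in c0."""
    pg = [nibble_pg((a >> (4 * j)) & 0xF, (b >> (4 * j)) & 0xF) for j in range(4)]
    P = [p for p, _ in pg]
    G = [g for _, g in pg]
    return super_carries(P, G, c0)


print(carries_16(0xFFFF, 0x0001, 0))   # [0, 1, 1, 1, 1]
```

Each Cj should equal the carry into bit 4j of the ordinary binary sum, so the sketch can be checked against plain integer addition of the low 4j bits.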

But this looks terrific! These super carries are exactly what we need to combine four 4-bit CLAs into a 16-bit CLA in a carry-lookahead manner. Recall that the hybrid approach suffered because the carries from one 4-bit CLA to the next (i.e., the super carries) were computed in a ripple-carry manner.

Since it is not completely clear how to combine the pieces so far presented to get a 16-bit, 2-level CLA, I will give a pictorial account very soon.

Before the pictures, let's assume the pieces can be put together and see how fast the 16-bit, 2-level CLA actually is. Recall that we have already seen two practical 16-bit adders: A ripple carry version taking 32 gate delays and a hybrid structure taking 14 gate delays. If the 2-level design isn't faster than 14 gate delays, we won't bother with the pictures.

Remember we are assuming 5-input gates. We use lower case p, g, and c for propagates, generates, and carries; and use capital P, G, and C for the super- versions.

1. We calculate the p's and g's (lower case) in 1 gate delay (as with the 4-bit CLA).
2. We calculate the P's one gate delay after we have the p's, i.e., 2 gate delays after we start.
3. The G's are determined 2 gate delays after we have the g's and p's. So the G's are done 3 gate delays after we start.
4. The C's are determined 2 gate delays after the P's and G's. So the C's are done 5 gate delays after we start.
5. Now the C's are sent back to the 4-bit CLAs, which have already calculated the p's and g's. The c's are calculated in 2 more gate delays (7 total) and the s's 2 more after that (9 total).
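The same bookkeeping as a small Python sketch. It encodes only the assumptions stated above (5-input gates, one delay per gate level, every sum-of-products formula built as a 2-level AND-OR circuit); it is a tally, not a gate-level simulation, and the function name is my own.

```python
def cla16_delays():
    """Gate delays at which each signal of the 16-bit, 2-level CLA is valid."""
    t = {}
    t["p,g"] = 1                        # one OR (p) or AND (g) per bit
    t["P"] = t["p,g"] + 1               # one 4-input AND of the p's
    t["G"] = t["p,g"] + 2               # AND-OR sum of products in g's and p's
    t["C"] = max(t["P"], t["G"]) + 2    # AND-OR in the P's, G's, and c0
    t["c"] = t["C"] + 2                 # inside each 4-bit CLA, reusing its p's and g's
    t["s"] = t["c"] + 2                 # each sum bit as a 2-level sum of products
    return t


print(cla16_delays())   # {'p,g': 1, 'P': 2, 'G': 3, 'C': 5, 'c': 7, 's': 9}
```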

Since 9<14, let the pictures begin!

1. First perform minor surgery on the 4-bit CLA.
• Remove the calculation of the CarryOut, as that calculation will be performed by a different piece of logic.
• Add logic to calculate the super propagate and super generate bits P & G using the formulas given above.
• Label the resulting structure a 4-bit CLA-PG (not a standard name).
• CLA-PG has 9 inputs (two 4-bit addends and a carry-in) and 6 outputs (a 4-bit sum, P, and G).
• The diagram is on the right.

2. Next put four of these 4-bit CLAs together with a Carry Lookahead Block that calculates the C's from the P's, G's and Cin=C0.
• The formulas for the C's are above.
• The result, which is shown on the right, is a 16-bit (2-level) CLA!
• We will use CL Block to abbreviate Carry-Lookahead Block (I am afraid to use CLB fearing it will be confused with CLA).
• Note that I do not call it a 4-bit CL Block or a 16-bit CL Block. More on this later.
• The colors of the lines indicate when they are calculated.
1. The blue lines are inputs.
2. Then the red lines are calculated.
3. Then the magenta.
4. Finally the brown.
• That last bullet is stated sloppily. Gates are always calculating their outputs from their inputs. When we say something is calculated in k gate delays, we mean that the outputs are correct k gate delays after the inputs are correct. A more accurate statement of the previous bullet would be:
1. The blue lines are input, which are assumed to be valid when we start the addition.
2. The red lines are valid 3 gate delays after the blue (actually the Ps need only 2 gate delays, but we use the Ps and Gs together so must wait for the Gs).
Summary: the red lines are valid 3 gate delays after we start.
3. The magenta lines are valid 2 gate delays after the red; so they are valid 5 gate delays after the start.
4. The brown lines are valid 4 gate delays after the magenta (2 gate delays to calculate the c's, note lower case, then two more for the s's); so they are valid 9 gate delays after the start.
• Since the magenta lines flow right to left (then down), I drew their arrowheads. I typically do not draw arrowheads for lines that go to the right, go down, or go right and down.

3. We actually are not done with the CL Block.
• We wish to make it useful for all levels of CLAs. That is, again assuming 5-input gates, we want the exact same CL Block to be used for a 4-bit, 1-level CLA; a 16-bit, 2-level CLA; a 64-bit, 3-level CLA; a 256-bit, 4-level CLA; etc.
• Moreover, when going from a 4^n-bit, n-level CLA to a 4^(n+1)-bit, (n+1)-level CLA, no new logic will be needed.
• Specifically, a 64-bit, 3-level CLA will be composed of four 16-bit, 2-level CLAs, one additional CL Block (identical to those in the smaller constituent CLAs), and some wires.
• The CL Block is drawn on the right and contains two outputs not shown or used previously, Pout and Gout.
• In the previous diagram we used a CL Block to assemble a 16-bit CLA from four 4-bit CLAs, but did not prepare for constructing a 64-bit CLA from four of these 16-bit CLAs. For that reason we did not have Pout and Gout (note that each 4-bit CLA used did output a P and a G).
• When constructing a CLA using the CL block, there are actually three sizes of CLAs that are relevant.
1. The previous size CLA, i.e., the size of the constituent CLAs (4 in the diagram above).
2. The current size, i.e., the size being constructed (16 above).
3. The next size, i.e., the size for which the CLA under construction will be a constituent. The diagram above did not support the next size, a defect soon to be remedied.
• The CL Block has the following 9 inputs.
• 4 generate bits from the previous size, Gin0, Gin1, Gin2, Gin3.
• 4 propagate bits from the previous size Pin0, Pin1, Pin2, Pin3.
• The Carry in C0=Cin.
• It has the following 6 outputs:
1. Four carries C1, C2, C3, and C4. The first three of these are used by the constituent CLAs of the previous size.
2. C4=Cout is an output of the current size CLA.
3. Gout and Pout, the generate and propagate to be used in the next size CLA.
• These outputs are calculated from the following, previously studied, formulas.
```
C1 = G0 + P0 Cin
C2 = G1 + P1 G0 + P1 P0 Cin
C3 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 Cin
C4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 Cin
Gout = G3  +  P3 G2  +  P3 P2 G1  +  P3 P2 P1 G0
Pout = P3 P2 P1 P0
```
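The CL Block as a Python function, a sketch in which the name `cl_block` and the argument order are my choice. It takes the 9 inputs listed above and returns the 6 outputs. Note that Gout and Pout do not depend on Cin, and that the formulas imply C4 = Gout + Pout Cin.

```python
def cl_block(Gin, Pin, cin):
    """CL Block. Inputs Gin[0..3], Pin[0..3], cin (all 0/1).
    Returns the 6 outputs (C1, C2, C3, C4, Gout, Pout)."""
    G, P = Gin, Pin
    C1 = G[0] | (P[0] & cin)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & cin)
    C3 = (G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
          | (P[2] & P[1] & P[0] & cin))
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & cin))
    Gout = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
            | (P[3] & P[2] & P[1] & G[0]))          # no cin term
    Pout = P[3] & P[2] & P[1] & P[0]                 # no cin term
    return C1, C2, C3, C4, Gout, Pout


# Only group 0 generates; groups 1-3 propagate, group 0 does not.
print(cl_block([1, 0, 0, 0], [0, 1, 1, 1], 0))   # (1, 1, 1, 1, 1, 0)
```

Looping over all 512 input combinations is a quick way to confirm the identity C4 = Gout + Pout Cin, which is what lets the same block serve as a constituent of the next size.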

Building CLAs Using the CL Block

It is time to validate the claim that all sizes of CLAs can be built (recursively) using the CL Block.

1-bit CLA-PG

A 1-bit CLA is just a 1-bit adder. With only one bit there is no need for any lookahead since there is no ripple to try to avoid.

However, to enable us to build a 4-bit CLA from the 1-bit version, we actually need to build what we previously called a CLA-PG. The 1-bit CLA-PG has three inputs a, b, and cin. It produces 4 outputs s, cout, p, and g. We have given the logic formulas for all four outputs previously.
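For concreteness, a 1-bit CLA-PG as a Python sketch (the function name is mine). Complements x' are written as 1 - x so every value stays 0 or 1; the sum is the odd-parity sum of products, and the carry out is the usual full-adder majority function, which equals g + p cin.

```python
def cla_pg1(a, b, cin):
    """1-bit CLA-PG. Inputs a, b, cin are 0/1; returns (s, cout, p, g)."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin            # a', b', cin'
    s = (a & b & cin) | (a & nb & nc) | (na & b & nc) | (na & nb & cin)
    cout = (a & b) | (a & cin) | (b & cin)        # majority of the three inputs
    p = a | b                                     # propagate
    g = a & b                                     # generate
    return s, cout, p, g


print(cla_pg1(1, 1, 0))   # (0, 1, 1, 1)
```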

4-bit CLA-PG

A 4-bit CLA-PG is shown as the red portion in the figure to the right.

It has nine inputs: 4 a's, 4 b's, and cin and must produce seven outputs: 4 s's, cout, p, and g (recall that the last two were previously called the super propagate and super generate respectively).

The tall black box is our CL Block.

The question is, what must the ith ? box do in order for the entire (red) structure to be a 4-bit CLA-PG?

1. The box must produce si, one bit of the desired sum. But this is easy since the box receives ai, bi, and ci, the carry in (c0 is cin).
• si = ai bi ci + ai bi' ci' + ai' bi ci' + ai' bi' ci
2. The box must produce pi and gi for the CL Block to consume. But that is also easy since it has as input ai and bi.
• pi = ai + bi
• gi = ai bi
3. It looks like the ? box is just a 1-bit CLA-PG!
4. Unfortunately, not quite. The ? box is only a (large) subset of a 1-bit CLA-PG.
5. What is missing?
6. Ans. The ? box doesn't need to produce a carry out since the larger (4-bit) CLA-PG contains a CL Block that produces all of them.
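The argument above can be sketched in Python (all names hypothetical). Because a program runs sequentially, the ? box is split into two functions: its p and g outputs need only ai and bi, while its sum output must wait for ci from the CL Block. XOR is used as shorthand for the sum-of-products formula for si. The assembled unit returns the seven outputs of a 4-bit CLA-PG, with the four sum bits packed into one integer: (sum, cout, P, G), where cout is C4 from the CL Block.

```python
def box_pg(a, b):
    """'?' box outputs that need only a_i and b_i: (p_i, g_i)."""
    return a | b, a & b


def box_sum(a, b, c):
    """'?' box sum output; c is the carry supplied by the CL Block."""
    return a ^ b ^ c          # same function as the sum of products for s_i


def cl_block(G, P, cin):
    """CL Block: returns (C1, C2, C3, C4, Gout, Pout)."""
    C1 = G[0] | (P[0] & cin)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & cin)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & cin)
    Gout = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
    Pout = P[3] & P[2] & P[1] & P[0]
    C4 = Gout | (Pout & cin)  # logically equal to the expanded C4 formula
    return C1, C2, C3, C4, Gout, Pout


def cla_pg4(a, b, cin):
    """4-bit CLA-PG = four '?' boxes + one CL Block + wires.
    a, b are 0..15 and cin is 0/1; returns (sum, cout, P, G)."""
    bits_a = [(a >> i) & 1 for i in range(4)]
    bits_b = [(b >> i) & 1 for i in range(4)]
    pg = [box_pg(x, y) for x, y in zip(bits_a, bits_b)]
    p = [pi for pi, _ in pg]
    g = [gi for _, gi in pg]
    C1, C2, C3, C4, G, P = cl_block(g, p, cin)
    c = [cin, C1, C2, C3]                 # carry into each box
    s = sum(box_sum(bits_a[i], bits_b[i], c[i]) << i for i in range(4))
    return s, C4, P, G


print(cla_pg4(0b1011, 0b0110, 0))   # (1, 1, 1, 1): 11 + 6 = 17 = 16 + 1
```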

So, if we want to say that the 4-bit (1-level) CLA-PG is composed of four 1-bit (0-level) CLA-PGs together with a CL Block, we must draw the picture as on the right. The difference is that we explicitly show that the ? box produces cout, which is then not used.

This situation will occur for all sizes. For example, either picture on the right for a 4-bit CLA-PG produces a carry out since all 4-bit full adders do so. However, a 16-bit CLA-PG, built from four of the 4-bit units and a CL Block, does not use the carry outs produced by the four 4-bit units.

We have several alternatives.

1. Don't mention the problem of the unused cout. A common solution but too late for us.
2. Draw the top version of the diagram (without the unused cout's) and declare that a CLA-PG doesn't produce a carry out. It seems weird that a CLA-PG doesn't fully replace a full adder.
3. Draw the top version of the diagram and admit that a level k CLA-PG doesn't really use four level k-1 CLA-PG's.
4. Draw the bottom version of the diagram.
5. Draw the top version of the diagram, but view it as an abbreviation of the bottom version. This last is the alternative we will choose.

As another abbreviation, we will henceforth say CLA when we mean CLA-PG.

Remark: Hence the 4-bit CLA (meaning CLA-PG) is composed of

1. Four 1-bit CLAs
2. One CL Block
3. Wires
4. Nothing else

16-bit CLA-PG

Now take four of these 4-bit adders and use the identical CL block to get a 16-bit adder.

The picture on the right shows one 4-bit adder (the red box) in detail. The other three 4-bit adders are just given schematically as small empty red boxes. The CL block is also shown and is wired to all four 4-bit adders.

The complete (large) picture is shown here.

Remark: Hence the 16-bit CLA is composed of

1. Four 4-bit CLAs
2. One CL Block
3. Wires
4. Nothing else

64-bit CLA-PG

To construct a 64-bit CLA no new components are needed. That is, the only components needed have already been constructed. Specifically, you need:

1. Four magenta boxes, identical to the one just constructed.
2. One additional CL Block, identical to the one just used to make the magenta box.
3. Wires to connect these five boxes.

Remark: Hence the 64-bit CLA (meaning CLA-PG) is composed of

1. Four 16-bit CLAs
2. One CL Block
3. Wires
4. Nothing else

When drawn (with a brown box) the 64-bit CLA-PG has 129 inputs (64+64+1) and 67 outputs (64+1+2).
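The counts generalize directly: an n-bit CLA-PG, drawn with its carry-out, has 2n + 1 inputs and n + 3 outputs. A trivial Python sketch (function name mine):

```python
def cla_pg_io(n):
    """(inputs, outputs) of an n-bit CLA-PG drawn with its carry-out."""
    inputs = n + n + 1     # two n-bit addends plus the carry-in
    outputs = n + 1 + 2    # n sum bits, the carry-out, then P and G
    return inputs, outputs


print(cla_pg_io(4))    # (9, 7), as for the 4-bit CLA-PG
print(cla_pg_io(64))   # (129, 67)
```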

256-bit CLA-PG

Similarly, to construct a 256-bit CLA you need:
1. Four brown boxes, identical to the one just constructed.
2. One additional CL Block, identical to the one just used to make the brown box.
3. Wires to connect these five boxes.

Remark: Hence the 256-bit CLA (meaning CLA-PG) is composed of

1. Four 64-bit CLAs
2. One CL Block
3. Wires
4. Nothing else

etc.
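The whole recursion fits in a short Python sketch (names hypothetical; n is assumed to be a power of 4). The only components are the 1-bit leaf and `cl_block`. The helper `pg` returns just P and G, which never depend on the carry-in; in hardware they are computed in parallel with everything else, but a sequential program has to obtain them before asking the CL Block for the carries. Each constituent's own carry-out is computed and then ignored, which is exactly the "unused cout" discussed above.

```python
def cl_block(G, P, cin):
    """CL Block: 9 inputs (G[0..3], P[0..3], cin), 6 outputs."""
    C1 = G[0] | (P[0] & cin)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & cin)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & cin)
    Gout = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
    Pout = P[3] & P[2] & P[1] & P[0]
    C4 = Gout | (Pout & cin)          # logically equal to the expanded C4 formula
    return C1, C2, C3, C4, Gout, Pout


def split(x, m):
    """Split an integer into four m-bit pieces, low-order piece first."""
    mask = (1 << m) - 1
    return [(x >> (m * j)) & mask for j in range(4)]


def pg(a, b, n):
    """(P, G) of an n-bit CLA-PG; neither depends on the carry-in."""
    if n == 1:
        return a | b, a & b
    m = n // 4
    parts = [pg(x, y, m) for x, y in zip(split(a, m), split(b, m))]
    *_, Gout, Pout = cl_block([g for _, g in parts], [p for p, _ in parts], 0)
    return Pout, Gout


def cla_pg(a, b, cin, n):
    """n-bit CLA-PG, n a power of 4. Returns (sum, cout, P, G)."""
    if n == 1:                        # 1-bit CLA-PG: a full adder plus p and g
        p, g = a | b, a & b
        return a ^ b ^ cin, g | (p & cin), p, g
    m = n // 4
    xs, ys = split(a, m), split(b, m)
    parts = [pg(x, y, m) for x, y in zip(xs, ys)]
    C1, C2, C3, C4, Gout, Pout = cl_block([g for _, g in parts],
                                          [p for p, _ in parts], cin)
    s = 0
    for j, c in enumerate([cin, C1, C2, C3]):
        sj, _unused_cout, _, _ = cla_pg(xs[j], ys[j], c, m)
        s |= sj << (m * j)
    return s, C4, Pout, Gout


print(cla_pg(0xFFFF, 0x0001, 0, 16))   # (0, 1, 1, 1)
```

Calling the same function with n = 64 or n = 256 uses nothing new, mirroring the remarks above: four smaller CLAs, one CL Block, and wires.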

Homework: How many gate delays are required for our 64-bit CLA-PG? How many gate delays are required for a 64-bit ripple carry adder (constructed from 1-bit full adders)?

Summary

CLAs greatly speed up addition; the increase in speed grows with the size of the numbers to be added.

Remark: CLAs implement n-bit addition in O(log(n)) gate delays.