Computer Architecture
1999-2000 Fall
MW 3:30-4:45
CIWW 109
Allan Gottlieb
gottlieb@nyu.edu
http://allan.ultra.nyu.edu/~gottlieb
715 Broadway, Room 1001
212-998-3344
609-951-2707
email is best
======== START LECTURE #11 ========
 
Note: The midterm exam will be next Wednesday or the week
after.  We will vote at the end of today's class.
Shifter
This is a sequential circuit.
- Just a string of D-flops; the output of one is the input of the next.
    - Input to the first is the serial input.
    - Output of the last is the serial output.
- We want more.
    - Left and right shifting (with serial input/output)
    - Parallel load
    - Parallel output
    - Don't shift every cycle
- Parallel output is just wires.
- The shifter has 4 modes (left-shift, right-shift, nop, load), so
    - 4-1 mux inside
    - 2 control lines must come in
- We could modify our registers to be shifters (bigger mux), but ...
- Our shifters are slow for big shifts; ``barrel shifters'' are
  better and kept separate from the processor registers.
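The four-mode shifter described above can be sketched in software as follows. This is only an illustrative model, not a circuit description: the opcode encoding (NOP, LEFT, RIGHT, LOAD) and the names ShiftRegister and clock are assumptions made for the example. It also assumes that a right shift brings the serial input in at the high-order end and that a left shift brings it in at the low-order end.

```python
from dataclasses import dataclass

# Hypothetical encoding of the 2 control lines (4 modes).
NOP, LEFT, RIGHT, LOAD = range(4)


@dataclass
class ShiftRegister:
    width: int = 4
    value: int = 0  # parallel output is just this value ("just wires")

    def clock(self, op, serial_in=0, parallel_in=0):
        """Advance one cycle; return the bit shifted out (0 for nop/load)."""
        mask = (1 << self.width) - 1
        serial_out = 0
        if op == LEFT:
            # High-order bit leaves; serial input enters at the low-order end.
            serial_out = (self.value >> (self.width - 1)) & 1
            self.value = ((self.value << 1) | (serial_in & 1)) & mask
        elif op == RIGHT:
            # Low-order bit leaves; serial input enters at the high-order end.
            serial_out = self.value & 1
            self.value = (self.value >> 1) | ((serial_in & 1) << (self.width - 1))
        elif op == LOAD:
            self.value = parallel_in & mask
        # op == NOP: the register keeps its value.
        return serial_out
```

For example, starting from 0110, clock(RIGHT, serial_in=1) leaves 1011 in the register and returns 0 as the serial output. The if/elif chain plays the role of the 4-1 mux in front of each flop.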
 

Homework:
           A 4-bit shift register initially contains 1101.  It is
           shifted six times to the right with the serial input being
           101101.  What are the contents of the register after each
           shift?
Homework:
           Same register, same initial condition.  For
           the first 6 cycles the opcodes are left, left, right, nop,
           left, right and the serial input is 101101.  The next cycle
           the register is loaded (in parallel) with 1011.  The final
           6 cycles are the same as the first 6.  What are the contents
           of the register after each cycle?
4.6: Multiplication
- Of course we could do this with two levels of logic, since
  multiplication is just a function of its inputs.
- But, just as with addition, that would require a very big circuit with
  large fan-in.  Instead we use a sequential circuit that mimics the
  algorithm we all learned in grade school.
- Recall how to do multiplication.
    - Multiplicand times multiplier gives product
    - Multiply multiplicand by each digit of multiplier
    - Put the result in the correct column
    - Then add the partial products just produced
    
 
    
- We will do it the same way ... but differently.
    - We are doing binary arithmetic, so each ``digit'' of the
      multiplier is 1 or zero.
    - Hence ``multiplying'' the multiplicand by a digit of the
      multiplier means either
        - Getting the multiplicand
        - Getting zero
    - Use an ``if appropriate bit of multiplier is 1'' stmt.
    - To get the ``appropriate bit''
        - Start with the LOB (low-order bit) of the multiplier
        - Shift the multiplier right (so the next bit is the LOB)
    - Putting in the correct column means putting it one column
      further left than the last time.
    - This is done by shifting the multiplicand left one bit each
      time (even if the multiplier bit is zero).
    - Instead of adding partial products at the end, we keep a running sum.
        - If the multiplier bit is one, add the (shifted)
          multiplicand to the running sum.
        - If the bit is zero, simply skip the addition.
        
 
     
    
- This results in the following algorithm:
    product <- 0
    for i = 0 to 31
        if LOB of multiplier = 1
            product <- product + multiplicand
        shift multiplicand left 1 bit
        shift multiplier right 1 bit
Do on the board 4-bit multiplication (8-bit registers) 1100 x 1101.
Since the result has (up to) 8 bits, this is often called a 4x4->8
multiply.
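The algorithm above can be tried out with a short Python sketch. The function name multiply_v1 and the parameter n (the operand width, so registers are 2n bits wide) are assumptions introduced for illustration; the masking simply models the fixed register width.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Unsigned n x n -> 2n multiply; the multiplicand shifts left each step."""
    mask = (1 << (2 * n)) - 1  # product and multiplicand registers are 2n bits
    product = 0
    for _ in range(n):
        if multiplier & 1:  # LOB of multiplier is 1
            product = (product + multiplicand) & mask
        multiplicand = (multiplicand << 1) & mask  # next column to the left
        multiplier >>= 1  # expose the next multiplier bit as the LOB
    return product
```

With n=4, multiply_v1(0b1100, 0b1101, n=4) gives 10011100 (156 = 12 x 13). The running sum after each of the four steps is 00001100, 00001100, 00111100, 10011100.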
The diagrams below are for a 32x32-->64 multiplier.
What about the control?
- Always give the ALU the ADD operation
- Always send a 1 to the multiplicand to shift left
- Always send a 1 to the multiplier to shift right
- Pretty boring so far but
    - Send a 1 to the write line in product if and only if
      the LOB of the multiplier is a 1
    - I.e. send LOB to write line
    - I.e. it really is pretty boring
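To make the ``boring'' control concrete, here is a rough cycle-level sketch in Python. The signal names (alu_op, shift_multiplicand_left, shift_multiplier_right, product_write) and the functions control_v1 and step_v1 are made up for this illustration; the notes only say which values are sent, not what the wires are called.

```python
def control_v1(multiplier_lob):
    """Control signals for one step; only product_write depends on the data."""
    return {
        "alu_op": "ADD",                  # always ADD
        "shift_multiplicand_left": 1,     # always shift
        "shift_multiplier_right": 1,      # always shift
        "product_write": multiplier_lob,  # the LOB wired to the write line
    }


def step_v1(state, n=32):
    """Apply one clock cycle to (product, multiplicand, multiplier)."""
    product, multiplicand, multiplier = state
    signals = control_v1(multiplier & 1)
    mask = (1 << (2 * n)) - 1
    alu_out = (product + multiplicand) & mask  # the ALU adds every cycle
    if signals["product_write"]:
        product = alu_out  # the sum is kept only when the write line is 1
    if signals["shift_multiplicand_left"]:
        multiplicand = (multiplicand << 1) & mask
    if signals["shift_multiplier_right"]:
        multiplier >>= 1
    return product, multiplicand, multiplier
```

Calling step_v1 four times with n=4 on the starting state (0, 0b1100, 0b1101) leaves 156 in the product. The ALU computes a sum every cycle; the write line alone decides whether it is kept.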
    
 
 
This works!
But, when compared to the better solutions to come, it is wasteful of
resources and hence is
- slower
- hotter
- bigger
- all these are bad
 
The product register must be 64 bits since the product can contain 64
bits.
Why is the multiplicand register 64 bits?
- So that we can shift it left
- I.e., for our convenience.
  By this I mean it is not required by the problem specification,
  but only by the solution method chosen.
 
Why is the ALU 64 bits?
- Because the product is 64 bits
- But we are only adding a 32-bit quantity to the
  product at any one step.
- Hmmm.
- Maybe we can just pull out the correct bits from the product.
- It would be tricky to pull out bits in the middle,
  because which bits to pull changes each step
 
POOF!!  ... as the smoke clears we see an idea.
We can solve both problems at once
- DON'T shift the multiplicand left
    - Hence the register is 32 bits.
    - Also the register need not be a shifter
- Instead shift the product right!
- Add the high-order (HO) 32 bits of the product register to the
  multiplicand and place the result back into the HO 32 bits
    - Only do this if the current multiplier bit is one.
    - Use the Carry Out of the sum as the new bit to shift in
      (when no addition is done, shift in a zero).
    - The book forgot the last point, but their example used numbers
      too small to generate a carry
   
 
 
This results in the following algorithm
    product <- 0
    for i = 0 to 31
        if LOB of multiplier = 1
            (serial_in, product[32-63]) <- product[32-63] + multiplicand
        shift product right 1 bit
        shift multiplier right 1 bit
What about control?
- Just as boring as before
- Send (ADD, 1, 1) to (ALU, multiplier (shift right), Product
  (shift right)).
- Send LOB to Product (write).
 
Redo the same example on the board.
A final trick (``gate bumming'', like the code bumming of the 60s).
- There is a waste of registers, i.e. not full utilization.
    - The multiplicand is fully utilized since we always need all 32 bits.
    - But once we use a multiplier bit, we can toss it, so we need
      less and less of the multiplier as we go along.
    - And the product is half unused at the beginning and only slowly ...
    - POOF!!
- ``Timeshare'' the LO half of the ``product register''.
    - In the beginning the LO half contains the multiplier.
    - Each step we shift right, and more goes to the product,
      less to the multiplier.
- The algorithm changes to:
 
    product[32-63] <- 0
    product[0-31] <- multiplier
    for i = 0 to 31
        if LOB of product = 1
            (serial_in, product[32-63]) <- product[32-63] + multiplicand
        shift product right 1 bit
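The shared-register version can be sketched in Python as below. As before, the name multiply_v3 and the width parameter n are assumptions for illustration. The HO half of the product register is taken to start at zero, and a zero is shifted in whenever no addition is done.

```python
def multiply_v3(multiplicand, multiplier, n=32):
    """Unsigned n x n -> 2n multiply; multiplier lives in the LO half of product."""
    half = (1 << n) - 1
    product = multiplier & half  # HO half = 0, LO half = multiplier
    for _ in range(n):
        carry = 0
        if product & 1:  # the LOB of product is the current multiplier bit
            total = (product >> n) + multiplicand
            carry = total >> n
            product = ((total & half) << n) | (product & half)
        # One shift both retires a multiplier bit and makes room for the product.
        product = (carry << (2 * n - 1)) | (product >> 1)
    return product
```

For 1100 x 1101 with n=4 the register goes 00001101, 01100110, 00110011, 01111001, 10011100, so after four steps no multiplier bits remain and the register holds 156.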
Control is again boring.
- Send (ADD, 1) to (ALU, Product (shift right)).
- Send LOB to Product (write).
 
Redo the same example on the board.
The above was for unsigned 32-bit multiplication.
What about signed multiplication?
- Save the signs of the multiplier and multiplicand.
- Convert multiplier and multiplicand to non-neg numbers.
- Use the above algorithm.
- Only use 31 steps, not 32, since there are only 31 multiplier bits
  (the HOB of the multiplier is the sign bit, not a bit used for
  multiplying).
- Complement (i.e., negate) the product if the original signs were different.
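The signed recipe above might look like this in Python. This is a sketch under several assumptions: the name multiply_signed is made up, the inner loop uses the first (shift the multiplicand left) form of the unsigned algorithm for simplicity, and operands are restricted to magnitudes that fit in n-1 bits, so the most negative n-bit value is rejected rather than handled.

```python
def multiply_signed(a, b, n=32):
    """Signed multiply: multiply the magnitudes, then fix up the sign."""
    limit = (1 << (n - 1)) - 1
    if not (-limit <= a <= limit and -limit <= b <= limit):
        raise ValueError("magnitude must fit in n-1 bits")
    negative = (a < 0) != (b < 0)  # save the signs
    multiplicand, multiplier = abs(a), abs(b)  # convert to non-negative
    product = 0
    for _ in range(n - 1):  # n-1 steps: the sign bit is not a multiplier bit
        if multiplier & 1:
            product += multiplicand
        multiplicand <<= 1
        multiplier >>= 1
    return -product if negative else product  # negate if the signs differed
```

For instance, multiply_signed(-12, 13, n=5) uses 4 steps and returns -156.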
 
There are faster multipliers, but we are not covering them.
4.7: Division
We are skipping division.
4.8: Floating Point
We are skipping floating point.
4.9: Real Stuff: Floating Point in the PowerPC and 80x86
We are skipping floating point.
Homework:
Read 4.10 ``Fallacies and Pitfalls'', 4.11 ``Conclusion'', 
and 4.12 ``Historical Perspective''.