Class Notes for Computer Architecture

Computer Architecture
1999-2000 Fall
MW 3:30-4:45
Ciww 109

Allan Gottlieb
gottlieb@nyu.edu
http://allan.ultra.nyu.edu/~gottlieb
715 Broadway, Room 1001
212-998-3344
609-951-2707
email is best

======== START LECTURE #4 ========

Note: I should have assigned B.5 as part of homework 2. I did include it in the solutions. B.5 is the part for which the comment ``(Assume you have constant signals 1 and 0 as well.)'' applies.

ROMs

One way to implement a mathematical (or C) function (without side effects) is to perform a table lookup.

A ROM (Read Only Memory) is the analogous way to implement a logic function.

For a math function f we start with x and get f(x).
For a ROM we start with the address and get the value stored at that address.
Normally math functions are defined for an infinite number of values, for example f(x) = 3x for all real numbers x.
We can't build an infinite ROM (sorry), so we are only interested in functions defined for a finite number of values. Today a million is OK; a billion is too big.
How do we create a ROM for the function f(3)=4, f(6)=20, with all other values don't care?
Simply have the ROM store 4 in address 3 and 20 in address 6.
Consider a function defined for all n-bit numbers (say n=20) and having a k-bit output for each input.
- View an n-bit input as n 1-bit inputs.
- View a k-bit output as k 1-bit outputs.
- Since there are 2^n possible inputs and each requires k 1-bit outputs, there are a total of (2^n)k bits of output, i.e. the ROM must hold (2^n)k bits.
- Now consider a truth table with n inputs and k outputs. The total number of output bits is again (2^n)k (2^n rows and k output columns).
Thus the ROM implements a truth table, i.e. is a logic function.
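Because a ROM is just a stored truth table, it can be modeled as a list indexed by address. This is a minimal sketch, not a hardware description: the `ROM` class is hypothetical, and unlisted addresses are filled with 0 as a stand-in for don't cares. It uses the f(3)=4, f(6)=20 example, which needs 3 address bits (addresses 0..7) and 5 data bits (enough to hold 20).

```python
class ROM:
    """Hypothetical ROM model: n address bits in, k data bits out, contents fixed."""

    def __init__(self, n_address_bits, k_data_bits, contents):
        # contents maps address -> value; unlisted addresses hold 0 ("don't care").
        self.n = n_address_bits
        self.k = k_data_bits
        size = 2 ** n_address_bits
        self.words = [0] * size
        for addr, value in contents.items():
            if not 0 <= addr < size:
                raise ValueError("address out of range")
            if not 0 <= value < 2 ** k_data_bits:
                raise ValueError("value does not fit in k bits")
            self.words[addr] = value

    def read(self, address):
        # Purely combinational: the output depends only on the address.
        return self.words[address]

    def size_in_bits(self):
        # 2^n rows of k output bits, exactly like an n-input, k-output truth table.
        return (2 ** self.n) * self.k


# The f(3)=4, f(6)=20 example.
f_rom = ROM(3, 5, {3: 4, 6: 20})
print(f_rom.read(3), f_rom.read(6))   # 4 20
print(f_rom.size_in_bits())           # 40
```

Once constructed, nothing in the model changes `words`, which mirrors the point below that a ROM has no state.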

Important: A ROM does not have state. It is another combinational circuit. That is, it does not represent ``memory''. The reason is that once a ROM is manufactured, the output depends only on the input.

A PROM is a programmable ROM. That is, you buy the ROM with ``nothing'' in its memory and then, before it is placed in the circuit, you load the memory and never change it. This is like a CD-R.

An EPROM is an erasable PROM. It costs more but if you decide to change its memory this is possible (but is slow). This is like a CD-RW.

``Normal'' EPROMs are erased by exposing them to ultraviolet light. EEPROMs (electrically erasable PROMs) are instead erased electrically, which is faster.

All these EPROMs are erasable, not writable, i.e. you can't just change one bit.

A ROM is similar to a PLA

Both can implement any truth table, in principle.
A 2Mx8 ROM can really implement any truth table with 21 inputs (2^21=2M) and 8 outputs.
- It stores 2M bytes
- In ROM-speak, it has 21 address pins and 8 data pins
A PLA with 21 inputs and 8 outputs might need to have 2M minterms (AND gates).
- The number of minterms depends on the truth table itself.
- For normal TTs with 21 inputs the number of minterms is MUCH less than 2^21.
- The PLA is manufactured with the number of minterms needed
Compare a PAL with a PROM
- Both can in principle implement any TT
- Both are user programmable
- A PROM with n inputs and k outputs can implement any TT with n inputs and k outputs.
- A PAL that you buy does not have enough gates for all possibilities since most TTs with n inputs and k outputs don't require nearly (2^n)k gates.

Don't Cares

Sometimes not all the input and output entries in a TT are needed. We indicate this with an X and it can result in a smaller truth table.
Input don't cares.
- The output doesn't depend on all inputs, i.e. the output has the same value no matter what value this input has.
- We saw this when we did muxes
Output don't cares
- For some input values, either output is OK.
  - This input combination is impossible.
  - For this input combination, the given output is not used (perhaps it is ``muxed out'' downstream)

Example (from the book):

If A or C is true, then D is true (independent of B).
If A or B is true, then E is true.
F is true if exactly one of the inputs is true, but we don't care about the value of F if both D and E are true

Full truth table

     A   B   C || D   E   F
     ----------++----------
     0   0   0 || 0   0   0
     0   0   1 || 1   0   1
     0   1   0 || 0   1   1
     0   1   1 || 1   1   0
     1   0   0 || 1   1   1
     1   0   1 || 1   1   0
     1   1   0 || 1   1   0
     1   1   1 || 1   1   0

This has 7 minterms (every row except the first has at least one output true).
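The full table can be regenerated mechanically from the three rules, which is a handy way to check it. A minimal sketch, assuming the notes' row order (ABC counting from 000 to 111) and counting one minterm per row that has any output true; the function name `outputs` is illustrative only.

```python
from itertools import product

def outputs(a, b, c):
    d = a | c                        # D: A or C
    e = a | b                        # E: A or B
    f = 1 if a + b + c == 1 else 0   # F: exactly one input true
    return d, e, f

# product((0, 1), repeat=3) yields ABC = 000, 001, ..., 111 (C varies fastest).
table = [((a, b, c), outputs(a, b, c)) for a, b, c in product((0, 1), repeat=3)]
for (a, b, c), (d, e, f) in table:
    print(a, b, c, "||", d, e, f)

# A PLA needs a product term for every row with at least one output true.
minterms = sum(1 for _, outs in table if any(outs))
print("minterms:", minterms)   # minterms: 7
```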

Put in the output don't cares

     A   B   C || D   E   F
     ----------++----------
     0   0   0 || 0   0   0
     0   0   1 || 1   0   1
     0   1   0 || 0   1   1
     0   1   1 || 1   1   X
     1   0   0 || 1   1   X
     1   0   1 || 1   1   X
     1   1   0 || 1   1   X
     1   1   1 || 1   1   X

Now do the input don't cares

B=C=1 ==> D=E=1 ==> F=X ==> A=X
A=1 ==> D=E=1 ==> F=X ==> B=C=X

     A   B   C || D   E   F
     ----------++----------
     0   0   0 || 0   0   0
     0   0   1 || 1   0   1
     0   1   0 || 0   1   1
     X   1   1 || 1   1   X
     1   X   X || 1   1   X

These don't cares are important for logic minimization. Compare the number of gates needed for the full TT and the reduced TT. There are techniques for minimizing logic, but we will not cover them.
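One way to convince yourself the reduced table is right is to expand it back over all 8 inputs and compare against the rules. This sketch assumes `None` represents X and that overlapping rows (ABC=111 matches both X11 and 1XX) are allowed as long as they agree; the names `reduced`, `matches`, and `check_reduced_table` are hypothetical.

```python
from itertools import product

def full_outputs(a, b, c):
    # D = A or C, E = A or B, F = exactly one input true.
    return a | c, a | b, 1 if a + b + c == 1 else 0

X = None  # don't care

# The reduced table: (A, B, C) pattern -> (D, E, F).
reduced = [
    ((0, 0, 0), (0, 0, 0)),
    ((0, 0, 1), (1, 0, 1)),
    ((0, 1, 0), (0, 1, 1)),
    ((X, 1, 1), (1, 1, X)),
    ((1, X, X), (1, 1, X)),
]

def matches(pattern, inputs):
    # An input don't care matches either value.
    return all(p is X or p == v for p, v in zip(pattern, inputs))

def check_reduced_table():
    for inputs in product((0, 1), repeat=3):
        rows = [outs for pat, outs in reduced if matches(pat, inputs)]
        if not rows:
            raise AssertionError(f"no row covers {inputs}")
        want = full_outputs(*inputs)
        for outs in rows:
            for got, exp, name in zip(outs, want, "DEF"):
                if got is X:
                    # Only F may be X, and only when D and E are both 1.
                    if not (name == "F" and want[0] == want[1] == 1):
                        raise AssertionError(f"illegal X in {name} for {inputs}")
                elif got != exp:
                    raise AssertionError(f"{name} wrong for {inputs}")
    return True

print(check_reduced_table())   # True
```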

Arrays of Logic Elements

Do the same thing to many signals
Draw thicker lines and use the ``by n'' notation.
Diagram below shows a 32-bit 2-way mux and an implementation with 32 1-bit, 2-way muxes.
A Bus is a collection of data lines treated as a single logical (n-bit) value.
Use an array of logic elements to process a bus. For example, the above mux switches between 2 32-bit buses.
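The 32-bit mux built from 32 1-bit muxes can be sketched in Python, treating a 32-bit bus as an integer. The select convention (sel=0 picks `a`, sel=1 picks `b`) and the names `mux1`/`mux32` are assumptions for illustration.

```python
def mux1(a, b, sel):
    # 1-bit 2-way mux: (a AND NOT sel) OR (b AND sel).
    return (a & (1 - sel)) | (b & sel)

def mux32(a, b, sel):
    # An array of 32 identical 1-bit muxes, all sharing the same select line.
    out = 0
    for i in range(32):
        bit = mux1((a >> i) & 1, (b >> i) & 1, sel)
        out |= bit << i
    return out

print(hex(mux32(0xDEADBEEF, 0x12345678, 0)))  # 0xdeadbeef
print(hex(mux32(0xDEADBEEF, 0x12345678, 1)))  # 0x12345678
```

Each bit position is handled independently; only the select signal is shared, which is what the ``by n'' notation abbreviates.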

*** Big Change Coming ***

Sequential Circuits, Memory, and State

Why do we want to have state?

Memory (i.e. RAM, not just ROM or PROM)
Counters
Reducing gate count
- A multiplier would be quadratic (in gate count) in combinational logic.
- With sequential logic (state) it can be done in linear gate count.
  - What follows is unofficial (i.e. too fast to understand)
  - A shift register holds the partial sum
  - Really slick: share this shift register with the multiplier
  - We will do this circuit later in the course
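Since the circuit is only previewed here, this is a rough sketch of the shift-and-add idea, assuming unsigned n-bit operands and modeling one loop iteration as one clock cycle. The 2n-bit product register starts with the multiplier in its low half, so the multiplier shares the shift register with the partial sum. The function name is hypothetical, and Python's unbounded integers silently hold the adder's carry-out bit that real hardware would keep in one extra bit.

```python
def shift_add_multiply(multiplicand, multiplier, n=32):
    """Sequential unsigned multiply: n cycles, one n-bit adder, one 2n-bit shift register."""
    if not (0 <= multiplicand < 2 ** n and 0 <= multiplier < 2 ** n):
        raise ValueError("operands must be unsigned n-bit values")
    product = multiplier                    # high half: partial sum (0); low half: multiplier
    for _ in range(n):                      # one iteration per clock cycle
        if product & 1:                     # current multiplier bit sits in the LSB
            product += multiplicand << n    # add multiplicand into the high half
        product >>= 1                       # shift partial sum and multiplier together
    return product

print(shift_add_multiply(3, 5, n=4))     # 15
print(shift_add_multiply(15, 15, n=4))   # 225
```

The trade-off is time for gates: hardware linear in n, but n cycles per multiply.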

Assume you have a real OR gate. Assume the two inputs are both zero for an hour. At time t one input becomes 1. The output will OSCILLATE for a while before settling on exactly 1. We want to be sure we don't look at the answer before it's ready.

B.4: Clocks

Frequency and period

Hertz (Hz), Megahertz, Gigahertz vs. Seconds, Microseconds, Nanoseconds
Old (descriptive) name for Hz is cycles per second (CPS)
Rate vs. Time
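Frequency (a rate) and period (a time) are reciprocals, T = 1/f. A small sketch with hypothetical helper names:

```python
def period_seconds(frequency_hz):
    # T = 1 / f
    return 1.0 / frequency_hz

def frequency_hz(period_s):
    # f = 1 / T
    return 1.0 / period_s

for f in (1e6, 100e6, 1e9):   # 1 MHz, 100 MHz, 1 GHz
    print(f"{f / 1e6:g} MHz -> {period_seconds(f) * 1e9:g} ns")
# 1 MHz -> 1000 ns
# 100 MHz -> 10 ns
# 1000 MHz -> 1 ns
```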

Edges

Rising Edge; falling edge
We use edge-triggered logic
State changes occur only on a clock edge
Will explain later what this really means
One edge is called the Active edge
- The edge (rising or falling) on which changes occur
- Choice is technology dependent
- Sometimes trigger on both edges (e.g., RAMBUS or DDR memory)
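To make ``edge'' concrete, here is a sketch that finds rising and falling edges in a clock sampled at discrete times (the sampling is an assumption of the model; a real clock is continuous). An element whose active edge is the rising edge would change state only at the indices in `rising`; a both-edges design would use both lists.

```python
def find_edges(clock_samples):
    """Return (rising, falling) lists of sample indices where the clock changes."""
    rising, falling = [], []
    for i in range(1, len(clock_samples)):
        prev, cur = clock_samples[i - 1], clock_samples[i]
        if prev == 0 and cur == 1:
            rising.append(i)
        elif prev == 1 and cur == 0:
            falling.append(i)
    return rising, falling

clock = [0, 0, 1, 1, 0, 0, 1, 1, 0]
rising, falling = find_edges(clock)
print(rising)    # [2, 6]
print(falling)   # [4, 8]
```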

Synchronous system

Now we are going to add state elements to the combinational circuits we have been using previously.

Remember that a combinational (or combinatorial) circuit has its outputs determined solely by its inputs, i.e. combinational circuits do not contain state.

State elements include state (naturally).

i.e., memory
state-elements have clock as an input
can change state only at active edge
produce output always, based on the current state
all signals that are written to state elements must be valid at the time of the active edge.
For example, if the cycle time is 10ns, make sure the combinational circuit used to compute new state values completes in 10ns
So state elements change on active edge, comb circuit stabilizes between active edges.
Think of registers or memory as state elements.
Can have loops like the one at the right.
A loop like this is a cycle of the computer.
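A state element feeding combinational logic that feeds back into it can be sketched as a 4-bit counter. This is an illustrative model only: `Register`, `next_state`, and `run_counter` are hypothetical, and the delay numbers are made up. Each loop iteration is one cycle: the combinational logic settles, then the register changes at the active edge. The delay check mirrors the rule that inputs to state elements must be valid by the active edge.

```python
class Register:
    """Edge-triggered state element: output is the stored value; it changes only in tick()."""
    def __init__(self, value=0):
        self.q = value

    def tick(self, d):
        # Called once per active clock edge with the settled input value.
        self.q = d

def next_state(q, width=4):
    # Combinational logic: output depends only on its input.
    return (q + 1) % (2 ** width)

def run_counter(cycles, comb_delay_ns, cycle_time_ns):
    if comb_delay_ns > cycle_time_ns:
        raise ValueError("combinational logic too slow for this clock")
    reg = Register(0)
    trace = []
    for _ in range(cycles):
        d = next_state(reg.q)   # combinational logic stabilizes between edges
        reg.tick(d)             # state changes only at the active edge
        trace.append(reg.q)
    return trace

print(run_counter(6, comb_delay_ns=7, cycle_time_ns=10))   # [1, 2, 3, 4, 5, 6]
```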

B.5: Memory Elements

We want edge-triggered clocked memory and will only use edge-triggered clocked memory in our designs. However we get there by stages. We first show how to build unclocked memory; then using unclocked memory we build level-sensitive clocked memory; finally from level-sensitive clocked memory we build edge-triggered clocked memory.

Unclocked Memory

S-R latch (set-reset)

``Cross-coupled'' nor gates
Don't assert both S and R at once
When S is asserted (i.e., S=1 and R=0)
- the latch is Set (that's why it is called S)
- Q becomes true (Q is the output of the latch)
- Q' becomes false (Q' is the complemented output)
When R is asserted
- the latch is Reset
- Q becomes false
- Q' becomes true
When neither one is asserted
- The latch remains the same, i.e. Q and Q' stay as they were
- This is the memory aspect
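The S-R latch behavior can be checked with a small gate-level sketch: Q = NOR(R, Q') and Q' = NOR(S, Q). The model assumes the gates settle if we keep re-evaluating them from the previous outputs (updating Q first, then Q' from the new Q), a crude stand-in for propagation delay. Names are hypothetical.

```python
def nor(a, b):
    return 1 - (a | b)

def sr_latch(s, r, q, q_bar):
    """Settle cross-coupled NOR gates starting from the old outputs (q, q_bar)."""
    if s and r:
        raise ValueError("don't assert both S and R")
    for _ in range(10):
        new_q = nor(r, q_bar)        # gate driving Q gets R and Q'
        new_q_bar = nor(s, new_q)    # gate driving Q' gets S and Q
        if (new_q, new_q_bar) == (q, q_bar):
            break                    # outputs stable
        q, q_bar = new_q, new_q_bar
    return q, q_bar

q, q_bar = 0, 1   # start in the reset state
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0)]:
    q, q_bar = sr_latch(s, r, q, q_bar)
    print(f"S={s} R={r} -> Q={q} Q'={q_bar}")
# S=1 R=0 -> Q=1 Q'=0
# S=0 R=0 -> Q=1 Q'=0
# S=0 R=1 -> Q=0 Q'=1
# S=0 R=0 -> Q=0 Q'=1
```

The two S=R=0 steps show the memory aspect: the same inputs give different outputs depending on the stored state.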

Clocked Memory: Flip-flops and latches

The S-R latch defined above is not clocked memory. Unfortunately the terminology is not perfect.

For both flip-flops and latches the output equals the value stored in the structure. Both have an input and an output (and the complemented output) and a clock input as well. The clock determines when the internal value is set to the current input. For a latch, the change occurs whenever the clock is asserted (level sensitive). For a flip-flop, the change occurs at the active edge.

D latch

The D is for data

The left part uses the clock.
- When the clock is low, both R and S are forced low.
- When the clock is high, S=D and R=D', so the value stored is D.
Output changes when input changes and the clock is asserted.
Level sensitive rather than edge triggered.
Sometimes called a transparent latch.
We won't use these in designs.
The right hand part of the circuit is the S-R (unclocked) latch we just constructed.

In the traces below notice how the output follows the input when the clock is high and remains constant when the clock is low. We assume the stored value is initially low.
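The same behavior can be reproduced structurally: gate D with the clock (S = D AND C, R = D' AND C) and feed the cross-coupled NOR latch. This sketch samples the signals at discrete times, which is an assumption of the model; the trace values are made up, and the output starts low as in the notes.

```python
def nor(a, b):
    return 1 - (a | b)

def settle_sr(s, r, q, q_bar):
    # Cross-coupled NOR S-R latch, re-evaluated until its outputs stop changing.
    for _ in range(10):
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar

def d_latch_trace(clock, d, q=0):
    """Return Q at each sample of a D latch built from a gated S-R latch."""
    q_bar = 1 - q
    out = []
    for c, dv in zip(clock, d):
        s, r = dv & c, (1 - dv) & c   # clock low forces S = R = 0, so the latch holds
        q, q_bar = settle_sr(s, r, q, q_bar)
        out.append(q)
    return out

clock = [0, 1, 1, 1, 0, 0, 1, 1]
d     = [1, 1, 0, 1, 1, 0, 0, 1]
print(d_latch_trace(clock, d))   # [0, 1, 0, 1, 1, 1, 0, 1]
```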

D or Master-Slave Flip-flop

This was our goal. We now have an edge-triggered, clocked memory.

Built from D latches, which are transparent
The result is Not transparent
- Changes on the active edge
- This one has the falling edge as active edge
Sometimes called a master-slave flip-flop
Note that the substructures reuse letters with different meanings (block structure à la Algol)
The master latch (the left one) is set while the clock is asserted. Remember that the latch is transparent, i.e. follows its input when its clock is asserted. But the second latch is ignoring its input at this time. When the clock falls, the 2nd latch pays attention and the first latch keeps producing whatever D was at fall-time.
Actually D must remain constant for some time around the active edge.
- The set-up time before the edge
- The hold time after the edge
- See diagram below
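The set-up and hold requirement amounts to a forbidden window around the active edge. A sketch with hypothetical names and numbers; treating a change exactly on a window boundary as acceptable is an assumption.

```python
def meets_setup_hold(d_change_times_ns, edge_time_ns, setup_ns, hold_ns):
    """True if D does not change strictly inside (edge - setup, edge + hold)."""
    window_start = edge_time_ns - setup_ns
    window_end = edge_time_ns + hold_ns
    return all(not (window_start < t < window_end) for t in d_change_times_ns)

# Active edge at 100 ns, 2 ns set-up time, 1 ns hold time.
print(meets_setup_hold([90.0, 105.0], 100.0, 2.0, 1.0))   # True
print(meets_setup_hold([99.0], 100.0, 2.0, 1.0))          # False (inside set-up window)
print(meets_setup_hold([100.5], 100.0, 2.0, 1.0))         # False (inside hold window)
```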

Note how much less wiggly the output is with the master-slave flop than before with the transparent latch. As before we are assuming the output is initially low.
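The difference can be seen by feeding the same made-up traces to a transparent latch and to a master-slave pair. This is a behavioral sketch sampled at discrete times (an assumption): the master is enabled by the clock and the slave by the inverted clock, so the falling edge is active; the function names are hypothetical.

```python
def transparent_latch(enable, d, q):
    # Level-sensitive D latch: follows d while enable is 1, holds otherwise.
    return d if enable else q

def d_latch_trace(clock, d, q=0):
    out = []
    for c, dv in zip(clock, d):
        q = transparent_latch(c, dv, q)
        out.append(q)
    return out

def master_slave_trace(clock, d, q=0):
    """Falling-edge D flip-flop from two D latches; output starts low."""
    master, slave = q, q
    out = []
    for c, dv in zip(clock, d):
        master = transparent_latch(c, dv, master)         # open while clock high
        slave = transparent_latch(1 - c, master, slave)   # open while clock low
        out.append(slave)
    return out

clock = [0, 1, 1, 1, 0, 0, 1, 1, 0]
d     = [1, 1, 0, 1, 1, 0, 0, 1, 0]
print(d_latch_trace(clock, d))       # [0, 1, 0, 1, 1, 1, 0, 1, 1]
print(master_slave_trace(clock, d))  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The latch output changes five times; the flip-flop output changes once, at the falling edge, to the value D had just before the clock fell.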

Homework: Try moving the inverter to the other latch. What has changed?