### How fast can we go?

#### Propagation delay

The operation of a gate is not instantaneous.  After the input to a gate changes, it takes a certain amount of time before the output changes to its new value.  This time is called the gate delay.  The delay is determined by the electrical characteristics of the gate (size, voltage, semiconductor properties).

The propagation delay of a combinational circuit is similarly the amount of time, after the input to the circuit changes, for the output to reach its final value.  If the circuit is built up from gates, the propagation delay depends on the longest path in the circuit from an input to the output:

propagation delay = gate delay × length of longest path (in gates)
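
As a rough illustration of this formula, the sketch below computes the longest path through a small circuit and multiplies it by the gate delay. The circuit, the signal names, and the 10 ps gate delay are all hypothetical values chosen for the example; they do not describe any LC-3 component.

```python
# Hypothetical circuit: each gate name maps to the signals that feed it.
# Any name that is not a key in the dictionary is a primary input.
CIRCUIT = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
    "out": ["g3", "c"],
}


def depth(signal, circuit=CIRCUIT):
    """Number of gates on the longest path from any primary input to `signal`."""
    if signal not in circuit:
        return 0  # primary inputs contribute no gate delay
    return 1 + max(depth(src, circuit) for src in circuit[signal])


def propagation_delay_ps(output, gate_delay_ps, circuit=CIRCUIT):
    """Propagation delay = gate delay x length of longest path (in gates)."""
    return gate_delay_ps * depth(output, circuit)


print(depth("out"))                     # 3
print(propagation_delay_ps("out", 10))  # 30
```

The path `c` to `out` passes through only one gate, but the output is not final until the slower path through `g1` (or `g2`) and `g3` has settled, which is why the longest path sets the delay.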

#### The processing unit and clock speed

The heart of the processing unit of the LC-3  (see Figure 4.3, p. 102)  consists of the register file and the ALU.  A typical LC-3 arithmetic instruction (ADD, AND) reads two numbers out of the register file, sends them through the ALU, and writes the output of the ALU back into (another register of) the register file.

Suppose at time 0 we read two numbers out of the registers in the register file.  These bits must first go through the multiplexer which is part of the register file;  let's suppose this multiplexer has a propagation delay of 4 gate delays.  Then they go through the ALU;  let's suppose the ALU has a propagation delay of 20 gate delays.  Coming out of the ALU, the signals go back to the data input of the registers in the register file.  Let's allow one more gate delay to account for the delay in the wires, for a total of 4 + 20 + 1 = 25 gate delays.

Suppose the gate delay = 10 ps (1 picosecond = $10^{-12}$ seconds).  Then the total delay for this loop is 25 × 10 ps = 250 ps.  This means that, if we started at time 0, we must wait at least 250 ps before we 'clock' the register to load its new value.  A machine will normally clock the registers at some regular interval;  in this example, the interval must be at least 250 ps.

Clock speed = 1 / clock interval = $1 / (250 \times 10^{-12}\ \text{s}) = 1 / (0.250 \times 10^{-9}\ \text{s}) = 4 \times 10^{9}\ \text{Hz} = 4\ \text{GHz}$.

So the fastest we could clock our machine is 4 GHz.
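
The arithmetic above can be checked with a few lines of Python. This is only a sketch using the supposed delays from the text (4, 20, and 1 gate delays, at 10 ps each); the function and constant names are made up for the example.

```python
GATE_DELAY_S = 10e-12  # assumed gate delay: 10 ps
MUX_GATES = 4          # assumed register-file multiplexer delay, in gate delays
ALU_GATES = 20         # assumed ALU delay, in gate delays
WIRE_GATES = 1         # allowance for wire delay, in gate delays


def max_clock_hz(gate_delay_s, *stage_gate_delays):
    """Fastest clock rate for a loop whose stages are traversed in sequence."""
    total_gates = sum(stage_gate_delays)
    min_interval_s = gate_delay_s * total_gates  # shortest safe clock interval
    return 1.0 / min_interval_s


f = max_clock_hz(GATE_DELAY_S, MUX_GATES, ALU_GATES, WIRE_GATES)
print(f"{f / 1e9:.2f} GHz")  # 4.00 GHz
```

Changing any one of the assumed delays shows how directly the clock rate depends on the length of this register-to-register loop.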

#### The control unit

The control unit controls the various select lines for the processing unit based on the instruction currently being executed.  Execution of an instruction may require up to six steps (P&P sec. 4.3), including instruction fetch, decode, address evaluation, operand fetch, execution, and result store.  If these steps each took one clock cycle, for a total of 6 clock cycles for one instruction, the speed of our LC-3 machine would be

$4 \times 10^{9} / 6 \approx 0.67 \times 10^{9}$ instructions per second
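
The same estimate as a small Python sketch, assuming the 4 GHz clock derived earlier and exactly six one-cycle steps per instruction (the function name is made up for the example):

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Instruction rate when every instruction takes the same number of cycles."""
    return clock_hz / cycles_per_instruction


rate = instructions_per_second(4e9, 6)
print(f"{rate:.3e}")  # 6.667e+08
```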

#### Going faster

Three factors affect the machine speed (in instructions per second):  the gate delay, the propagation delay of the ALU (in terms of gate delays), and the number of clock cycles needed for each instruction.
• The gate delay is determined by electrical factors, and keeps going down as circuits get smaller.
• The design of the ALU is already essentially optimal, and so its propagation delay (as a multiple of the gate delay) has not been changing.
• The number of clock cycles needed for a single instruction can be effectively reduced by overlapping the execution of successive instructions.  The simplest case is to fetch the next instruction while executing the current one.  One can go further by executing the current instruction, decoding the next one, and fetching the one after that, all at the same time.  Such multiple overlaps are referred to as pipelining.  In the best case, pipelining allows the machine to start a new instruction on each clock cycle.
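
To see what the best case for pipelining buys, the sketch below counts clock cycles for a run of instructions with and without overlap. It assumes an idealized pipeline: six one-cycle steps, a new instruction started on every cycle, and no stalls of any kind. The function names and the instruction count of 1000 are made up for the example.

```python
def cycles_unpipelined(n_instructions, stages):
    """Each instruction finishes all of its steps before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions, stages):
    """Idealized pipeline: one new instruction starts on every clock cycle."""
    if n_instructions == 0:
        return 0
    # The first instruction needs `stages` cycles; each later one
    # completes one cycle after its predecessor.
    return stages + (n_instructions - 1)


n, stages = 1000, 6
print(cycles_unpipelined(n, stages))  # 6000
print(cycles_pipelined(n, stages))    # 1005
speedup = cycles_unpipelined(n, stages) / cycles_pipelined(n, stages)
print(f"{speedup:.2f}")               # 5.97
```

For long runs the speedup approaches the number of steps, so under these assumptions the machine completes close to one instruction per clock cycle.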