Computer Systems Org I - Prof. Grishman
Lecture 27 - Dec. 8, 2004
How fast can we go?
The operation of a gate is not instantaneous. After the input to
a gate changes, it takes a certain amount of time before the output
changes to its new value. This time is called the gate delay. The delay is
determined by the electrical characteristics of the gate (size,
voltage, semiconductor properties).
The propagation delay of a combinational circuit is similarly the
amount of time, after the input to the circuit changes, for the output
to reach its final value. If the circuit is built up from gates,
the propagation delay depends on the longest path in the circuit from
an input to the output:
propagation delay = gate delay x length of longest path (in gates)
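As a minimal sketch of the longest-path rule, the Python below computes the propagation delay of a small made-up circuit. The circuit, its signal names (a, b, c, g1, g2, out), and the 10 ps gate delay are assumptions chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical circuit: each gate output maps to the signals feeding that gate.
# Names that never appear as keys (a, b, c) are the circuit's primary inputs.
CIRCUIT = {
    "g1": ["a", "b"],    # e.g. AND(a, b)
    "g2": ["g1", "c"],   # e.g. OR(g1, c)
    "out": ["g2", "a"],  # e.g. XOR(g2, a)
}


def longest_path_in_gates(circuit, output):
    """Count the gates on the longest path from any input to `output`."""

    @lru_cache(maxsize=None)
    def depth(signal):
        if signal not in circuit:  # primary input: no gate passed through yet
            return 0
        return 1 + max(depth(src) for src in circuit[signal])

    return depth(output)


def propagation_delay_ps(circuit, output, gate_delay_ps):
    """propagation delay = gate delay x length of longest path (in gates)."""
    return gate_delay_ps * longest_path_in_gates(circuit, output)


GATE_DELAY_PS = 10  # assumed gate delay, in picoseconds

# Longest path is a/b -> g1 -> g2 -> out, i.e. 3 gates, so 3 x 10 ps = 30 ps.
print(propagation_delay_ps(CIRCUIT, "out", GATE_DELAY_PS), "ps")
```

The short path from a directly into the last gate does not matter: the output is only final once the slowest path has settled.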
The processing unit and clock speed
The heart of the processing unit of the LC-3 (see Figure 4.3, p.
102) consists of the register file and the ALU. A typical
LC-3 arithmetic instruction (ADD, AND) reads two numbers out of the
register file, sends them through the ALU, and writes the output of the
ALU back into (another register of) the register file.
Suppose at time 0 we read two numbers out of the registers in the
register file. These bits must first go through the multiplexer
which is part of the register file; let's suppose this
multiplexer has a propagation delay = 4 gate delays. Then they go
through the ALU; let's suppose the ALU has a propagation delay =
20 gate delays. Coming out of the ALU, the signals go back to the
data input of the registers in the register file. Let's allow one more
gate delay to account for the delay in the wires, for a total of 25
gate delays.
Suppose the gate delay = 10 ps (1 picosecond = 10^-12
seconds). Then the total delay for this loop is 25 x 10 ps = 250
ps. This means that, if we started at time 0, we must wait at
least 250 ps before we 'clock' the register to load its new
value. A machine will normally clock the registers at some
regular interval; in this example, the interval must be at least
250 ps.
Clock speed = 1 / clock interval = 1 / (250 x 10^-12 s)
= 1 / (0.250 x 10^-9 s) = 4 x 10^9 Hz = 4 GHz.
So the fastest we could clock our machine is 4 GHz.
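The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function and constant names are made up here, and the numbers are the ones assumed in the text (4 gate delays for the multiplexer, 20 for the ALU, 1 for the wires, 10 ps per gate).

```python
GATE_DELAY_PS = 10  # gate delay assumed in the text, in picoseconds
MUX_GATES = 4       # multiplexer inside the register file
ALU_GATES = 20      # ALU
WIRE_GATES = 1      # allowance for wire delay


def loop_delay_ps(gate_delay_ps, *stage_gates):
    """Total delay around the register-file -> ALU -> register-file loop."""
    return gate_delay_ps * sum(stage_gates)


def max_clock_ghz(interval_ps):
    """Clock speed = 1 / clock interval; 1 / (t ps) = 1000 / t GHz."""
    return 1000 / interval_ps


interval = loop_delay_ps(GATE_DELAY_PS, MUX_GATES, ALU_GATES, WIRE_GATES)
print(interval, "ps")                   # 250 ps
print(max_clock_ghz(interval), "GHz")   # 4.0 GHz
```

Halving the gate delay to 5 ps in this sketch halves the interval to 125 ps and doubles the maximum clock speed to 8 GHz.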
The control unit
The control unit controls the various select lines for the processing
unit based on the instruction currently being executed. Execution
of an instruction may require up to six steps (P&P sec. 4.3),
including instruction fetch, decode, address evaluation, operand fetch,
execution, and result store. If these steps each took one clock
cycle, for a total of 6 clock cycles for one instruction, the speed of
our LC-3 machine would be
4 x 10^9 / 6 = approximately 0.67 x 10^9
instructions per second
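The same division, as a small Python sketch. The names are illustrative, and the 6 cycles per instruction is the worst-case assumption made in the text, not a measured figure for the LC-3.

```python
CLOCK_HZ = 4e9              # 4 GHz clock from the previous section
CYCLES_PER_INSTRUCTION = 6  # assumed: one clock cycle per step, six steps


def instructions_per_second(clock_hz, cycles_per_instruction):
    """Instruction rate when instructions are executed one at a time."""
    return clock_hz / cycles_per_instruction


rate = instructions_per_second(CLOCK_HZ, CYCLES_PER_INSTRUCTION)
print(f"{rate:.2e} instructions per second")  # 6.67e+08
```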
There are basically three factors which affect the machine speed (in
instructions per second): the gate delay, the propagation
delay of the ALU (in terms of gate delays), and the number of clock
cycles needed for each instruction.
- The gate delay is determined by electrical factors, and keeps
going down as circuits get smaller.
- The design of the ALU is already essentially optimal, and so its
propagation delay (as a multiple of the gate delay) has not been changing.
- The number of clock cycles needed for a single instruction can be
effectively reduced by overlapping the
execution of successive instructions. The simplest case is to
fetch the next instruction while executing the current one. One
can go further by executing the current instruction, decoding the next
one, and fetching the one after that, all at the same time. Such
multiple overlaps are referred to as pipelining.
In the best case, pipelining allows the machine to start a new
instruction on each clock cycle.
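To see what the best case buys us, the sketch below counts clock cycles with and without overlap. It assumes an idealized pipeline in which every instruction takes the same six one-cycle steps and nothing ever forces the pipeline to wait; real machines fall short of this. The function names are made up for illustration.

```python
def cycles_unpipelined(n_instructions, steps):
    """Each instruction finishes all of its steps before the next one starts."""
    return n_instructions * steps


def cycles_pipelined(n_instructions, steps):
    """Ideal pipeline: a new instruction starts on every clock cycle.

    The first instruction needs `steps` cycles to complete; each later
    instruction completes one cycle after the one before it.
    """
    if n_instructions == 0:
        return 0
    return steps + (n_instructions - 1)


STEPS = 6  # the six steps of instruction execution assumed in the text

for n in (1, 10, 1000):
    seq = cycles_unpipelined(n, STEPS)
    pipe = cycles_pipelined(n, STEPS)
    print(n, "instructions:", seq, "cycles vs", pipe, "cycles")
```

For a long run of instructions the ratio of the two counts approaches the number of steps, so under these assumptions the machine approaches one instruction per clock cycle, i.e. 4 x 10^9 instructions per second at 4 GHz.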