CSCI-UA.0436 - Prof. Grishman

Lecture 15: Measuring Performance

How fast is our single-cycle MIPS CPU?

For a synchronous machine, the clock period must be greater than the maximum propagation delay of the combinational circuit (which computes the next state).  So the clock period for the single-cycle MIPS must be greater than the time required for the longest instruction.

How long is that?  Assume the memories and the ALU each have a 200 ps delay, and the register file (for a read or a write) a 100 ps delay (text, pp. 332-333).  Add up the delays for each type of instruction and determine the maximum delay, i.e., the slowest instruction type, as in the sketch below.
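To make the bookkeeping concrete, here is a minimal Python sketch that sums the component delays for each instruction class, assuming (as above) that only the memories, ALU, and register file contribute delay; multiplexer, control, and wiring delays are ignored.

    # Component delays from the assumptions above, in picoseconds.
    DELAY = {"instr_mem": 200, "reg_read": 100, "alu": 200,
             "data_mem": 200, "reg_write": 100}

    # Units on the critical path of each instruction class in the
    # single-cycle datapath.
    PATHS = {
        "R-type": ["instr_mem", "reg_read", "alu", "reg_write"],
        "lw":     ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
        "sw":     ["instr_mem", "reg_read", "alu", "data_mem"],
        "beq":    ["instr_mem", "reg_read", "alu"],
    }

    for instr, units in PATHS.items():
        print(instr, sum(DELAY[u] for u in units), "ps")

Under these assumptions the load word (lw) path is the longest, at 800 ps, so the clock period must be at least 800 ps (a clock rate of at most 1.25 GHz), and every other instruction wastes part of its cycle.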

Can we do better?  To answer that, we must first consider how performance is measured.

Measuring Performance

Text: Section 1.4

What is performance? Computer performance is a measure of how long it takes to perform a task, or of how many tasks can be performed in a given time period. The performance that ultimately matters to us is how long it takes to perform our own tasks. However, unless we can afford to benchmark those tasks on each machine we are considering, we have to rely on more generic measures of computer performance.

For the moment, we shall just discuss CPU performance and ignore IO. The basic equation is:

time to run program = (number of instructions executed) * (average CPI) * (clock cycle time)

where CPI = the number of clock cycles per instruction. For a given program, the number of instructions executed depends on the compiler used and on the architecture (instruction set). The average CPI depends on the implementation of the architecture, and the clock cycle time depends on the implementation and on the underlying circuit technology.
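As a quick illustration with purely hypothetical numbers, consider a program that executes 2 billion instructions on our single-cycle machine (CPI = 1) with an 800 ps clock cycle:

    # Hypothetical example of the basic performance equation.
    instruction_count = 2e9       # instructions executed
    average_cpi = 1.0             # single-cycle: one clock cycle per instruction
    clock_cycle_time = 800e-12    # seconds (800 ps)

    cpu_time = instruction_count * average_cpi * clock_cycle_time
    print(cpu_time, "seconds")    # 1.6 seconds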

Some Popular Metrics

How Architecture Affects Performance

Our goal is to minimize the product of the three factors (number of instructions executed, average CPI, clock cycle time). Whenever we consider a change to the architecture, we must evaluate its effect on each of these factors.

In particular, when we add an instruction to the instruction set, we must consider whether it can significantly reduce the number of instructions to execute (the first factor) without increasing the time per instruction (the last two factors). A specialized instruction may be used only rarely by a compiler (most of the execution time is spent on a small number of instructions; see Figure 3.26 for the distribution of instructions for the SPEC benchmarks).  On the other hand, if the instruction requires a longer data path, it may force a longer clock cycle. The net effect would be a slower machine, as the sketch below illustrates. [We ignore the issue of code size, which is much less important than it used to be because memory is so much cheaper.]
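A hypothetical calculation with the basic equation shows how this trade-off can backfire: suppose the new instruction lets the compiler emit 10% fewer instructions, but the longer data path stretches the clock cycle by 15%.

    # Hypothetical numbers: effect of a new instruction on total run time,
    # relative to the original machine (average CPI assumed unchanged).
    relative_instructions = 0.90   # 10% fewer instructions executed
    relative_cycle_time   = 1.15   # 15% longer clock cycle

    relative_time = relative_instructions * 1.0 * relative_cycle_time
    print(relative_time)           # 1.035 -> about 3.5% slower overall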

Good candidates for instructions are those which would be used frequently and would take much longer if performed by a sequence of other instructions ... for example, floating point operations for scientific applications.

The increased use of RISC machines, starting in the mid-1980s, reflected a more careful assessment of the benefits and costs of adding instructions to the instruction set.

However, the issue of binary code compatibility remains very important, if not overwhelming.  The development of entirely new machine architectures has declined since the 1990s, with the Intel PC architecture and its variants increasingly dominant.  As we shall discuss later, current microprocessors achieve both speed and code compatibility by translating Intel instructions into a RISC-like instruction set internally.