V22.0436 - Prof. Grishman

Lecture 18: Measuring Performance

Text: Chapter 2

What is performance? Computer performance is a measure of how long it takes to perform a task, or how many tasks can be performed in a given time period. The performance that matters to us is how long it takes to perform our tasks. However, unless we can afford to benchmark our task on each machine we are considering, we have to rely on more generic measures of computer performance.

For the moment, we shall just discuss CPU performance and ignore I/O. The basic equation is:

time to run program = (number of instructions executed) * (average CPI) * (clock cycle time)

where CPI = number of clock cycles per instruction. For a given program, the number of instructions executed depends on the compiler used and on the architecture (instruction set). The average CPI depends on the implementation of the architecture.
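
As a quick check on how the three factors combine, here is the equation applied to made-up numbers (the instruction count, CPI, and clock rate below are purely illustrative):

    # Hypothetical program: 10 million instructions, average CPI of 2.5,
    # 2 ns clock cycle (i.e., a 500 MHz clock).
    instructions  = 10_000_000
    average_cpi   = 2.5
    clock_cycle_s = 2e-9                  # 2 ns

    cpu_time_s = instructions * average_cpi * clock_cycle_s
    print(cpu_time_s)                     # 0.05 s, i.e., 50 ms of CPU time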

Some Popular Metrics

How Architecture Affects Performance

Our goal is to minimize the product of the three factors given above. Whenever we consider a change to the architecture, we must evaluate its effect on each of these factors.

In particular, when we add an instruction to the instruction set, we must consider whether it can significantly reduce the number of instructions executed (the first factor) without affecting the time per instruction (the last two factors). A specialized instruction may be used only rarely by a compiler, and if it requires a longer data path, it may force a longer clock cycle; the net effect would then be a slower machine. [We ignore the issue of code size, which is much less important than it used to be because memory is so much cheaper.]
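
To make that trade-off concrete, here is a sketch with invented percentages: suppose a proposed instruction removes 5% of the executed instructions but stretches the data path so that the clock cycle grows by 10%, with average CPI unchanged.

    # Hypothetical trade-off for a proposed new instruction.
    instr_count_factor = 1 - 0.05    # 5% fewer instructions executed
    cpi_factor         = 1.0         # average CPI unchanged
    clock_factor       = 1 + 0.10    # clock cycle 10% longer

    relative_time = instr_count_factor * cpi_factor * clock_factor
    print(relative_time)             # 1.045 -> the "improved" machine is ~4.5% slower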

Good candidates for new instructions are those that would be used frequently and would take much longer if performed by a sequence of other instructions; for example, floating-point operations for scientific applications.

The trend toward RISC machines reflects a more careful assessment of the benefits and costs of adding instructions to the instruction set.

MIPS Implementations: multiple clock cycles / instruction

The first implementation we considered executed all instructions in a single cycle. This had two major disadvantages: the clock cycle had to be long enough for the slowest instruction, even though most instructions could finish sooner, and no hardware unit could be used more than once in executing an instruction, so some units (memories, adders) had to be duplicated.

Suppose that a memory operation takes 10 ns, an ALU operation takes 10 ns, and a register operation takes 5 ns. Then R-type operations take 30 ns, beq 25 ns, sw 35 ns, and lw 40 ns (see P&H p. 309 for details). However, with a single-cycle system, we must have a clock cycle of 40 ns.
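
The per-instruction times above are just the sums of the delays of the units each instruction class uses; a small sketch of that bookkeeping, using the same assumed delays:

    MEM, ALU, REG = 10, 10, 5   # ns: memory access, ALU operation, register access

    instr_time_ns = {
        "R-type": MEM + REG + ALU + REG,        # fetch + reg read + ALU + reg write     = 30
        "beq":    MEM + REG + ALU,              # fetch + reg read + compare             = 25
        "sw":     MEM + REG + ALU + MEM,        # fetch + reg read + address + mem write = 35
        "lw":     MEM + REG + ALU + MEM + REG,  # fetch + reg read + address + mem read
                                                #   + reg write                          = 40
    }

    # In a single-cycle design, the clock must accommodate the slowest instruction.
    single_cycle_clock_ns = max(instr_time_ns.values())   # 40 ns
    print(instr_time_ns, single_cycle_clock_ns)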

P&H describe how to overcome these problems through a multiple-clock-cycle implementation of the MIPS subset.