Computer Architecture

Start Lecture #18

Chapter 4 Performance analysis

Homework: Read Chapter 4.

4.1: Introduction

Defining Performance

Throughput measures the number of jobs that can be completed per unit time (per second, per day, etc.).

Response time measures how long an individual job takes.

We define Performance as 1 / Execution time.

Relative Performance

We say that machine X is n times faster than machine Y or machine X has n times the performance of machine Y if the execution time of a given program on X = (1/n) * the execution time of the same program on Y.
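
A minimal sketch of this definition, using made-up timings (the 10 s and 15 s figures are assumptions for illustration only). Since performance is 1 / execution time, n is simply the ratio of the two execution times.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that machine X is n times faster than machine Y.

    Performance = 1 / execution time, so
    n = perf_X / perf_Y = time_Y / time_X.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Hypothetical timings: the same program takes 10 s on X and 15 s on Y.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"X is {n:.2f} times faster than Y")  # prints "X is 1.50 times faster than Y"
```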

But what program should be used for the comparison? Various benchmark suites have been proposed: some emphasize CPU integer performance, others floating-point performance, and still others I/O performance.

Measuring Performance

How should we measure execution time?

We mostly employ user-mode CPU time, but this does not mean the other metrics are worse.

Cycle time vs. Clock rate.

What is the cycle time for a 700MHz computer?

What is the clock rate for a machine with a 10ns cycle time?
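
The two questions above are unit conversions, since cycle time and clock rate are reciprocals. A small sketch (the function names are my own, not from the text):

```python
def cycle_time_ns(clock_rate_hz: float) -> float:
    """Cycle time in nanoseconds for a clock rate given in Hz."""
    return 1e9 / clock_rate_hz


def clock_rate_mhz(cycle_ns: float) -> float:
    """Clock rate in MHz for a cycle time given in nanoseconds."""
    return 1e3 / cycle_ns


print(f"{cycle_time_ns(700e6):.3f} ns")  # 700 MHz -> prints "1.429 ns"
print(f"{clock_rate_mhz(10.0):.0f} MHz")  # 10 ns   -> prints "100 MHz"
```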

4.2: CPU Performance and its Factors

The execution time for a given job on a given computer is

    (CPU) execution time = (#CPU clock cycles required) * (cycle time)
                         = (#CPU clock cycles required) / (clock rate)
  

The number of CPU clock cycles required equals the number of instructions executed times the average number of cycles in each instruction.
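
As a quick check that the two forms of the execution-time formula agree, here is a sketch with assumed numbers (2 billion instructions, an average of 1.5 cycles per instruction, and a 500 MHz clock are all hypothetical):

```python
# Hypothetical workload and machine parameters.
instructions = 2_000_000_000
avg_cycles_per_instruction = 1.5
clock_rate_hz = 500e6
cycle_time_s = 1.0 / clock_rate_hz  # 2 ns per cycle

# Cycles required = instructions executed * average cycles per instruction.
cycles = instructions * avg_cycles_per_instruction

# The two equivalent forms of the execution-time formula.
time_via_cycle_time = cycles * cycle_time_s
time_via_clock_rate = cycles / clock_rate_hz

print(f"cycles = {cycles:.3g}")
print(f"time   = {time_via_clock_rate:.2f} s")  # prints "time   = 6.00 s"
```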

But real systems are more complicated than that!

Through a great many measurements, one calculates for a given machine the average CPI (cycles per instruction).
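
One common way to form such an average is to weight the cycle count of each instruction class by how often that class is executed. The sketch below assumes this weighted-average approach. The instruction classes, counts, and per-class cycle counts are invented for illustration, not measurements of any real machine.

```python
def average_cpi(mix: dict) -> float:
    """Weighted-average CPI.

    mix maps an instruction-class name to (dynamic count, cycles per instruction).
    """
    total_instructions = sum(count for count, _ in mix.values())
    if total_instructions == 0:
        raise ValueError("instruction mix is empty")
    total_cycles = sum(count * cpi for count, cpi in mix.values())
    return total_cycles / total_instructions


# Hypothetical dynamic instruction mix (counts per 100 instructions).
mix = {
    "alu": (50, 1.0),
    "load_store": (30, 2.0),
    "branch": (20, 3.0),
}
print(f"average CPI = {average_cpi(mix):.2f}")  # prints "average CPI = 1.70"
```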

The number of instructions required for a given program depends on the instruction set. For example, one x86 instruction often accomplishes more than one MIPS instruction.

CPI is a good way to compare two implementations of the same instruction set (i.e., the same instruction set architecture or ISA). If the clock cycle is unchanged, then the performance of an implementation of a given ISA is inversely proportional to its CPI (e.g., halving the CPI doubles the performance).
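
To see the inverse proportionality concretely, consider two hypothetical implementations of one ISA running the same program at the same cycle time, so that only the CPI differs. All numbers below are assumptions.

```python
def speedup_from_cpi(old_cpi: float, new_cpi: float) -> float:
    """Performance ratio new/old when instruction count and cycle time are fixed.

    Time is proportional to CPI here, so performance is proportional to 1/CPI.
    """
    if old_cpi <= 0 or new_cpi <= 0:
        raise ValueError("CPI must be positive")
    return old_cpi / new_cpi


# Halving the CPI (2.0 -> 1.0) doubles the performance.
print(speedup_from_cpi(old_cpi=2.0, new_cpi=1.0))  # prints 2.0
```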

Complicated instructions take longer: they need either more cycles or a longer cycle time.

Older machines with complicated instructions (e.g., the VAX in the 1980s) had CPI >> 1.

With pipelining we can have many cycles for each instruction but still achieve a CPI of nearly 1.

Modern superscalar machines often have a CPI less than one. Sometimes one speaks of the IPC or instructions per cycle for such machines.
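
IPC is just the reciprocal of CPI. A tiny sketch with an assumed CPI value:

```python
def ipc_from_cpi(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return 1.0 / cpi


# A hypothetical superscalar machine averaging CPI = 0.5 completes 2 instructions per cycle.
print(ipc_from_cpi(0.5))  # prints 2.0
```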

Putting this together, we see that

    Time (in seconds) =  #Instructions * CPI * Cycle_time (in seconds).
    Time (in ns)      =  #Instructions * CPI * Cycle_time (in ns).
  

Do on the board the example on page 247.
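
Separately from the textbook example, here is a sketch of how the three factors trade off. The two machines and all their numbers are hypothetical: machine A uses fewer, more complicated instructions with a higher CPI, while machine B uses more, simpler instructions with a lower CPI and a slightly longer cycle.

```python
def execution_time_ns(instructions: float, cpi: float, cycle_time_ns: float) -> float:
    """Time (in ns) = #Instructions * CPI * Cycle_time (in ns)."""
    return instructions * cpi * cycle_time_ns


# Hypothetical machines running the "same" program compiled for each ISA.
time_a_ns = execution_time_ns(instructions=1.0e9, cpi=3.0, cycle_time_ns=2.0)
time_b_ns = execution_time_ns(instructions=1.5e9, cpi=1.2, cycle_time_ns=2.5)

# Convert to seconds for readability (1 s = 1e9 ns).
time_a_s = time_a_ns * 1e-9
time_b_s = time_b_ns * 1e-9

print(f"A: {time_a_s:.2f} s, B: {time_b_s:.2f} s")
print(f"B is {time_a_s / time_b_s:.2f} times faster than A")
```

With these assumed numbers, B executes 50% more instructions yet still finishes sooner, because its lower CPI more than makes up for the extra instructions and the longer cycle.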