======== START LECTURE #19 ========

2.3: Relating the metrics

The execution time for a given job on a given computer is

(CPU) execution time = (#CPU clock cycles required) * (cycle time)
                     = (#CPU clock cycles required) / (clock rate)

The number of CPU clock cycles required equals the number of instructions executed times the average number of cycles in each instruction.

But real systems are more complicated than that!

Through a great many measurements, one calculates for a given machine the average CPI (cycles per instruction).

The number of instructions required for a given program depends on the instruction set. For example, we saw in chapter 3 that one VAX instruction often accomplishes more than one MIPS instruction.

Complicated instructions take longer; they need either more cycles or a longer cycle time.

Older machines with complicated instructions (e.g., the VAX in the 80s) had CPI >> 1.

With pipelining, a machine can spend many cycles on each instruction yet still achieve a CPI of nearly 1, because several instructions are in execution at once.

Modern superscalar machines have CPI < 1.

Putting this together, we see that

   Time (in seconds) =  #Instructions * CPI * Cycle_time (in seconds).
   Time (in ns)      =  #Instructions * CPI * Cycle_time (in ns).
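
For concreteness, here is a small C sketch that plugs made-up numbers into this equation; the instruction count, CPI, and clock rate below are illustrative values, not measurements of any real machine.

    #include <stdio.h>

    int main(void)
    {
        double instructions = 10e9;  /* instructions executed (made up)  */
        double cpi          = 1.5;   /* average cycles per instruction   */
        double clock_rate   = 2e9;   /* 2 GHz, i.e. cycle time of 0.5 ns */

        double cycles = instructions * cpi;   /* #CPU clock cycles required */
        double time   = cycles / clock_rate;  /* seconds                    */

        printf("CPU time = %.2f seconds\n", time);  /* prints 7.50 */
        return 0;
    }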

Homework: Carefully go through and understand the example on page 59

Homework: 2.1-2.5, 2.7-2.10

Homework: Make sure you can easily do all the problems with a rating of [5] and can do all with a rating of [10].

What is the MIPS (Millions of Instructions Per Second) rating for a computer and how useful is it?
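
The MIPS rating is the instruction count divided by (execution time * 10^6), which works out to clock rate / (CPI * 10^6). It is not very useful for comparing machines, because it ignores how many instructions a job needs. A C sketch with two hypothetical machines (all numbers made up) shows the pitfall: B posts four times the MIPS rating of A, yet runs the same job more slowly.

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical machine A: complex instructions, so few of them. */
        double insts_a = 4e9,  cpi_a = 4.0, rate_a = 2e9;
        /* Hypothetical machine B: simple instructions, so many of them. */
        double insts_b = 20e9, cpi_b = 1.0, rate_b = 2e9;

        double time_a = insts_a * cpi_a / rate_a;  /*  8 seconds */
        double time_b = insts_b * cpi_b / rate_b;  /* 10 seconds */

        double mips_a = rate_a / (cpi_a * 1e6);    /*  500 MIPS  */
        double mips_b = rate_b / (cpi_b * 1e6);    /* 2000 MIPS  */

        printf("A: %4.0f MIPS, %4.1f s\n", mips_a, time_a);
        printf("B: %4.0f MIPS, %4.1f s\n", mips_b, time_b);
        return 0;
    }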

Homework: Carefully go through and understand the example on pages 61-3

How about MFLOPS (Millions of FLoating point OPerations per Second)? For numerical calculations, floating point operations are the ones you are interested in; the others are ``overhead'' (a very rough approximation to reality).

It has problems similar to those of MIPS: the rating depends heavily on the program measured, so a single number is a poor summary of a machine.
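
As a sketch of the computation (made-up numbers again), the rating is just the floating point operation count divided by (execution time * 10^6); the integer ``overhead'' instructions never enter into it.

    #include <stdio.h>

    int main(void)
    {
        double fp_ops = 3e9;   /* floating point ops in the job (made up) */
        double time_s = 12.0;  /* execution time in seconds (made up)     */

        printf("MFLOPS = %.0f\n", fp_ops / (time_s * 1e6));  /* prints 250 */
        return 0;
    }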

Benchmarks are better than MIPS or MFLOPS, but still have difficulties.

Homework: Carefully go through and understand 2.7 ``Fallacies and Pitfalls''.

Chapter 7: Memory

Homework: Read Chapter 7

7.2: Introduction

Ideal memory is fast, big, and cheap; no real memory is all three at once.

So we use a memory hierarchy ...

  1. Registers
  2. Cache (really L1, L2, and maybe L3)
  3. Memory
  4. Disk
  5. Archive

... and try to catch most references in the small fast memories near the top of the hierarchy.

There is a capacity/performance/price gap between each pair of adjacent levels. We will study the cache <---> memory gap.
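
A quick C sketch shows why catching most references in the fast level matters. The effective access time is hit_time + (1 - hit_ratio) * miss_penalty; the 1 ns cache and 100 ns memory latencies below are made up for illustration.

    #include <stdio.h>

    int main(void)
    {
        double hit_time     = 1.0;    /* ns to satisfy a reference in the cache */
        double miss_penalty = 100.0;  /* ns extra to go to main memory          */
        double ratios[]     = { 0.90, 0.95, 0.99 };

        for (int i = 0; i < 3; i++) {
            double eff = hit_time + (1.0 - ratios[i]) * miss_penalty;
            printf("hit ratio %.2f -> effective access time %5.1f ns\n",
                   ratios[i], eff);
        }
        return 0;
    }

Even at a 99% hit ratio the misses contribute 1 ns, doubling the effective access time.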

We observe empirically (and teach in 202) that programs exhibit locality of reference: temporal locality (a recently referenced item is likely to be referenced again soon) and spatial locality (items near a recently referenced item are likely to be referenced soon).

A cache is a small fast memory between the processor and the main memory. It contains a subset of the contents of the main memory.

A cache is organized in units of blocks. Common block sizes are 16, 32, and 64 bytes. A block is the smallest unit we can move to/from a cache.

A hit occurs when a memory reference is found in the upper level of the memory hierarchy.
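
To make blocks, hits, and misses concrete, here is a toy C sketch of one possible organization, a direct-mapped cache; the 16-byte blocks and 8 lines are arbitrary illustrative choices.

    #include <stdio.h>
    #include <stdbool.h>

    #define BLOCK_SIZE 16  /* bytes per block (illustrative)    */
    #define NUM_LINES   8  /* lines in the cache (illustrative) */

    static long tag[NUM_LINES];
    static bool valid[NUM_LINES];

    /* Look up addr; return true on a hit.  On a miss, install the block. */
    static bool cache_access(long addr)
    {
        long block = addr / BLOCK_SIZE;  /* which memory block holds addr */
        long line  = block % NUM_LINES;  /* the one line it may occupy    */
        long t     = block / NUM_LINES;  /* distinguishes blocks per line */

        if (valid[line] && tag[line] == t)
            return true;                 /* hit                           */
        valid[line] = true;              /* miss: fetch block into cache  */
        tag[line]   = t;
        return false;
    }

    int main(void)
    {
        long refs[] = { 0, 4, 12, 16, 0, 128, 0 };
        for (int i = 0; i < 7; i++)
            printf("addr %3ld -> %s\n", refs[i],
                   cache_access(refs[i]) ? "hit" : "miss");
        return 0;
    }

Addresses 0, 4, and 12 lie in the same block, so the miss on 0 brings in the block and the next two references hit. Address 128 maps to the same line as address 0, so it evicts that block and the final reference to 0 misses again.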