G22.2233 - Prof. Grishman

Lecture 12: Caches, cont'd

Effect on performance

A cache miss stalls the CPU. The effect of cache misses is therefore measured most accurately by counting memory stall cycles (text, pp. 564-566), and hence by their effect on CPI.
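As a rough illustration of how stall cycles feed into CPI, the sketch below adds the average memory stall cycles per instruction to a base CPI. The function name and all workload numbers are hypothetical, chosen only to show the arithmetic; they are not taken from the text.

```python
def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """Return CPI including memory stall cycles.

    stall cycles per instruction =
        memory references per instruction * miss rate * miss penalty
    """
    stall_cycles_per_instr = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles_per_instr


# Assumed (made-up) workload: base CPI of 1.0, 1.3 memory references per
# instruction, 5% miss rate, 40-cycle miss penalty.
cpi = effective_cpi(1.0, 1.3, 0.05, 40)
print(f"effective CPI = {cpi:.2f}")
```

With these assumed numbers the stall term (2.6 cycles per instruction) is larger than the base CPI itself, which is why a modest miss rate can dominate performance.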

Line size and spatial locality

To take advantage of spatial locality, most caches use a line size of more than one word; for example, the Pentium uses 32-byte lines. Increasing the line size may reduce the miss rate, but it increases the time to process a cache miss (since more data must be read from memory). To minimize the delay in filling such a cache, modern processors provide a wide path from memory to processor, and the ability to stream data quickly from memory to processor. This has been enhanced by recent changes in memory chip design, which allow successive words to be read rapidly from the same chip.
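The effect of line size on miss rate can be seen in a small simulation. The sketch below models a direct-mapped cache (an assumption; the notes do not fix the organization here) and runs a purely sequential trace of 4-byte word reads, the best case for spatial locality. The function name, cache size, and trace are all illustrative.

```python
def miss_rate(addresses, cache_bytes, line_bytes):
    """Simulate a direct-mapped cache; return the fraction of accesses that miss."""
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines       # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes      # memory block containing this byte address
        index = block % num_lines       # cache line that block maps to
        if resident[index] != block:
            misses += 1
            resident[index] = block     # fill the line from memory
    return misses / len(addresses)


# Sequential 4-byte word reads over 64 KB, through an assumed 8 KB cache.
trace = list(range(0, 64 * 1024, 4))
for line_bytes in (4, 16, 32, 64):
    print(line_bytes, miss_rate(trace, 8 * 1024, line_bytes))
```

For this trace the miss rate falls in proportion to the number of words per line. The simulation counts misses only; it does not model the longer fill time of a larger line, which is the cost side of the tradeoff described above.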

Writes

Caches differ in how they handle stores: a write-through cache updates main memory (as well as the cache) as soon as a store is executed. A write-back cache updates only the cache; main memory is updated when the block is removed from the cache. (text, p. 607)
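The difference between the two policies shows up in the number of write transactions sent to main memory. The sketch below counts them for a stream of store addresses, assuming a direct-mapped, write-allocate cache (both assumptions made for simplicity) and a final flush of dirty lines. Names and parameters are hypothetical.

```python
def memory_writes(stores, num_lines, line_bytes, policy):
    """Count main-memory write transactions for a stream of store addresses.

    policy is "write-through" or "write-back". A write-through transaction
    writes one word; a write-back transaction writes one whole line.
    """
    resident = [None] * num_lines   # memory block held by each cache line
    dirty = [False] * num_lines     # line modified since it was loaded?
    writes = 0
    for addr in stores:
        block = addr // line_bytes
        index = block % num_lines
        if resident[index] != block:
            # The old block is removed; write-back must save it if modified.
            if policy == "write-back" and dirty[index]:
                writes += 1
            resident[index] = block
            dirty[index] = False
        if policy == "write-through":
            writes += 1             # memory is updated on every store
        else:
            dirty[index] = True     # only the cache is updated for now
    if policy == "write-back":
        writes += sum(dirty)        # flush lines still dirty at the end
    return writes


# 100 stores to one word, then 100 stores to another word in a different line.
stores = [0] * 100 + [64] * 100
print(memory_writes(stores, 4, 32, "write-through"))  # 200
print(memory_writes(stores, 4, 32, "write-back"))     # 2
```

Repeated stores to the same line favor write-back; when each line is stored to only once before being replaced, the two policies generate the same number of transactions, and write-back moves more data per transaction.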

Two-level cache

Most systems now have two levels of cache: a very fast, small "level 1" (L1) cache (e.g., 8-32KB) and a larger, somewhat slower "level 2" (L2) cache (typically 256KB up to 1MB). This is important because of the widening gap between processor speeds and main memory access times. In earlier microprocessors (Pentium Pro and Pentium II), the L1 cache was on-chip, while the L2 cache was off-chip; now, with increasing transistor counts, both can be placed on a single chip (Pentium III and 4). The Pentium III has a 512KB L2 cache, and separate 16KB L1 caches for instructions and data. The Pentium 4 has a 256KB or 512KB L2 cache, an 8KB L1 data cache, and a decoded instruction cache holding about 12K micro-operations. Because the decoded instruction cache holds micro-operations, the analysis of x86 instructions into micro-operations only has to occur on an L1 cache miss.
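One way to see why the second level matters is to compute the average memory access time with and without it. The sketch below uses the standard average-access-time formula; the cycle counts and miss rates are made-up values for illustration, not measurements of any Pentium.

```python
def amat_one_level(l1_hit, l1_miss_rate, mem_time):
    """Average access time (cycles) with only an L1 cache in front of memory."""
    return l1_hit + l1_miss_rate * mem_time


def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_time):
    """Average access time (cycles) with L1 and L2 caches.

    l2_local_miss_rate is the fraction of L1 misses that also miss in L2.
    """
    l1_miss_penalty = l2_hit + l2_local_miss_rate * mem_time
    return l1_hit + l1_miss_rate * l1_miss_penalty


# Assumed values: 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit,
# 20% of L1 misses also miss in L2, 100-cycle main memory access.
print(amat_one_level(1, 0.05, 100))
print(amat_two_level(1, 0.05, 10, 0.20, 100))
```

Under these assumptions the L2 cache cuts the average access time from about 6 cycles to about 2.5, and the benefit grows as main memory gets slower relative to the processor.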

The Memory Hierarchy --- Virtual memory

Virtual memory provides the next step in the memory hierarchy after main memory: disk memory. The gap in access times, however, is much larger (roughly 100ns for main memory versus 10ms for disk, a factor of about 100,000) and this affects the parameters of the design (pages are much larger than cache lines, and hit ratios must be much higher for the machine to work efficiently). As technology changes, we can expect the parameters of the memory levels to change, but the basic idea of a memory hierarchy to remain.
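The need for very high hit ratios follows directly from the access times quoted above. The sketch below computes the effective access time for a given page fault rate using the 100ns and 10ms figures from the notes; it is a simplified model that ignores address translation cost and any overlap of disk access with other work.

```python
MEM_NS = 100            # main memory access time (from the notes)
DISK_NS = 10_000_000    # disk access time, 10 ms expressed in ns


def effective_access_ns(page_fault_rate):
    """Average access time in ns when a fraction of accesses go to disk."""
    return (1 - page_fault_rate) * MEM_NS + page_fault_rate * DISK_NS


for rate in (1e-3, 1e-5, 1e-6):
    t = effective_access_ns(rate)
    print(f"fault rate {rate:g}: {t:.1f} ns ({t / MEM_NS:.2f}x main memory)")
```

Even one fault per million accesses slows the average access by about 10%, whereas a cache can tolerate miss rates of a few percent.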

Input-Output

Input-output: needs

Input-output: bus organization

Asynchronous transfer

Spring 2002