V22.0436 - Prof. Grishman

Lectures 21 and 22:  Cache

review basic organizations:  fully associative, direct mapped, set associative (Lecture 20 notes)

Spatial locality and block size

So far we have assumed that the block size -- the amount of data in each cache entry -- is one word.
By storing multiple (2, 4, ...) consecutive words in a cache entry and fetching all the words on a cache miss, we can improve performance due to spatial locality (see Gottlieb's diagram of 4-word blocks)
There is a limit to the benefit of increasing block size, however: for a fixed cache size, larger blocks mean fewer cache entries (and hence more misses from blocks competing for the same entry), and each miss takes longer because more words must be fetched.
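The benefit of multi-word blocks can be seen in a small simulation. The sketch below (with made-up cache parameters) models a direct-mapped cache and measures the miss rate for sequential word accesses with 1-word and 4-word blocks:

```python
def miss_rate(addresses, num_blocks, block_size):
    """Simulate a direct-mapped cache; addresses are word addresses."""
    tags = [None] * num_blocks            # one tag per cache entry
    misses = 0
    for addr in addresses:
        block_addr = addr // block_size   # which memory block holds this word
        index = block_addr % num_blocks   # direct-mapped: block -> one entry
        tag = block_addr // num_blocks
        if tags[index] != tag:            # miss: fetch the whole block
            tags[index] = tag
            misses += 1
    return misses / len(addresses)

seq = list(range(64))                     # 64 sequential word accesses
print(miss_rate(seq, num_blocks=8, block_size=1))  # 1.0  -- every access misses
print(miss_rate(seq, num_blocks=8, block_size=4))  # 0.25 -- one miss per 4 words
```

With 1-word blocks every sequential access misses; with 4-word blocks, each miss brings in the next three words as well, so spatial locality cuts the miss rate to one in four.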

Strategies for memory writes

Two basic strategies: write-through, in which every write updates both the cache and main memory, and write-back, in which a write updates only the cache and the modified (dirty) block is written to memory when it is evicted.

Effect on performance: effective memory access time

The goal is to have the effective (average) memory access time -- hit time plus miss rate times miss penalty -- be close to the access time of the fastest memory
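A quick numeric sketch (the hit time, miss rate, and miss penalty below are assumed, not from the notes) shows why a high hit rate keeps the effective access time near the cache's speed:

```python
# Effective memory access time = hit_time + miss_rate * miss_penalty
hit_time = 1        # ns, cache access time (assumed)
miss_rate = 0.05    # 5% of accesses miss (assumed)
miss_penalty = 100  # ns to fetch a block from main memory (assumed)

t_eff = hit_time + miss_rate * miss_penalty
print(t_eff)  # 6.0 ns -- much closer to the 1 ns cache than to the 100 ns memory
```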

Effect on performance:  CPI

A cache miss stalls the CPU; the effect of cache misses can be measured most accurately as the number of memory stall cycles for read misses and write misses (text, section 5.3), and hence as their effect on CPI.
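The stall-cycle accounting can be sketched with assumed figures (none of these numbers come from the notes):

```python
# Memory stall cycles per instruction = refs/instr * miss rate * miss penalty
base_cpi = 1.0            # CPI with a perfect cache (assumed)
mem_refs_per_instr = 1.3  # 1 instruction fetch + 0.3 data references (assumed)
miss_rate = 0.02          # combined miss rate (assumed)
miss_penalty = 50         # stall cycles per miss (assumed)

stall_cycles_per_instr = mem_refs_per_instr * miss_rate * miss_penalty
cpi = base_cpi + stall_cycles_per_instr
print(cpi)  # 1.0 + 1.3 * 0.02 * 50 = 2.3
```

Even a 2% miss rate more than doubles the CPI here, which is why miss penalty and miss rate dominate memory-system design.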

Unified vs. split instruction / data cache

Having separate caches for instructions and data does not improve the hit rate but does support increased bandwidth -- one can fetch an instruction and a data word at the same time.  Most current processors have separate L1 I and D caches.

Two-level cache

As the gap between CPU and memory speed grows larger, the penalty for a cache miss becomes unacceptably high.  To address this problem, all modern high-end CPUs have at least two levels of cache: a very fast, and hence not very big, first-level (L1) cache together with a larger but slower L2 cache.  Some recent microprocessors (e.g., the Core i7) have 3 levels.

When a miss occurs in L1, L2 is examined, and only if a miss occurs there is main memory referenced.
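The two-level access-time formula follows the same pattern as the single-level case, applied recursively; the latencies and miss rates below are assumed for illustration:

```python
# Average access time with two cache levels:
# t_avg = L1_hit + L1_miss_rate * (L2_hit + L2_local_miss_rate * memory_time)
l1_hit = 1       # cycles to hit in L1 (assumed)
l1_miss = 0.05   # L1 miss rate (assumed)
l2_hit = 10      # cycles to access L2 (assumed)
l2_miss = 0.20   # fraction of L1 misses that also miss in L2 (assumed)
mem = 100        # cycles to main memory (assumed)

t_avg = l1_hit + l1_miss * (l2_hit + l2_miss * mem)
print(t_avg)  # 1 + 0.05 * (10 + 0.2 * 100) = 2.5
```

The L2 cache converts most of the costly 100-cycle memory references into 10-cycle L2 hits, keeping the average close to the L1 time.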