V22.0436 - Prof. Grishman

Lectures 21 and 22:  Cache

Direct mapped cache

Set associative cache

Spatial locality and block size

So far we have assumed that the block size -- the amount of data in each cache entry -- is one word.
By storing multiple (2, 4, ...) consecutive words in a cache entry and fetching all of them on a cache miss, we can improve performance by exploiting spatial locality (see Gottlieb's diagram of 4-word blocks).
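
A minimal sketch of this effect, assuming a direct-mapped cache with a fixed total capacity of 256 words and a purely sequential access pattern. The function name miss_rate, the trace, and the cache sizes are illustrative choices, not taken from the lecture or the text.

```python
def miss_rate(addresses, num_blocks, block_words):
    """Simulate a direct-mapped cache; return the fraction of accesses that miss.

    addresses   -- iterable of word addresses (a made-up trace)
    num_blocks  -- number of cache entries
    block_words -- words stored in each cache entry (the block size)
    """
    tags = [None] * num_blocks            # tag held by each entry; None = invalid
    misses = 0
    total = 0
    for addr in addresses:
        block_addr = addr // block_words  # memory block containing this word
        index = block_addr % num_blocks   # cache entry that block maps to
        tag = block_addr // num_blocks    # remaining high-order part of the address
        if tags[index] != tag:
            misses += 1                   # miss: the whole block is fetched
            tags[index] = tag
        total += 1
    return misses / total


# Sequential sweep over 1024 words; total capacity held at 256 words.
trace = list(range(1024))
rates = {bw: miss_rate(trace, 256 // bw, bw) for bw in (1, 2, 4)}
print(rates)
```

For a sequential sweep, only the first word of each block misses, so the miss rate falls in proportion to the block size. Real programs have less regular access patterns, so the gain is smaller.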
There is a limit to the benefit of increasing block size, however:

Strategies for memory writes

Two basic strategies:

Effect on performance: effective memory access time

The goal is for the effective memory access time to be close to the access time of the fastest memory.
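
These notes do not spell out a formula here. A commonly used one is: effective access time = hit time + miss rate x miss penalty. The sketch below assumes that formula; the function name and the numbers (1 ns cache, 100 ns miss penalty, 5% miss rate) are hypothetical.

```python
def effective_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per memory access, in the same units as hit_time.

    Assumes every access pays hit_time, and a fraction miss_rate of
    accesses additionally pays miss_penalty to reach the slower memory.
    """
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1 ns cache, 100 ns miss penalty, 5% miss rate.
t = effective_access_time(hit_time=1.0, miss_rate=0.05, miss_penalty=100.0)
print(t)
```

With these numbers the effective time is about 6 ns, much closer to the cache's 1 ns than to main memory's 100 ns, and it approaches 1 ns as the miss rate falls.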

Effect on performance:  CPI (p. 475-477)

We calculate cache performance in terms of its effect on the CPI.
Assume each miss (for an instruction fetch, data load, or data store) incurs a miss penalty, measured in clock cycles,
during which the CPU stalls while it waits for data from main memory.

Instruction fetch miss cycles / instruction = instruction miss rate x miss penalty
Data load/store miss cycles / instruction = fraction of instructions that are loads/stores x data miss rate x miss penalty
Total miss cycles / instruction = instruction fetch miss cycles/instruction + data load/store miss cycles/instruction

The effective CPI is the base CPI (with no misses) plus the total miss cycles / instruction.
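
The formulas above can be sketched as a small function. The function name, parameter names, and the sample numbers (base CPI of 2, 2% instruction miss rate, 4% data miss rate, 36% loads/stores, 100-cycle penalty) are illustrative assumptions.

```python
def effective_cpi(base_cpi, instr_miss_rate, data_miss_rate,
                  load_store_fraction, miss_penalty):
    """Base CPI plus the stall cycles per instruction caused by cache misses."""
    # Every instruction is fetched, so the instruction miss rate applies to all.
    instr_miss_cycles = instr_miss_rate * miss_penalty
    # Only loads and stores access data memory.
    data_miss_cycles = load_store_fraction * data_miss_rate * miss_penalty
    return base_cpi + instr_miss_cycles + data_miss_cycles


# Illustrative numbers only.
cpi = effective_cpi(base_cpi=2.0, instr_miss_rate=0.02, data_miss_rate=0.04,
                    load_store_fraction=0.36, miss_penalty=100)
print(round(cpi, 2))
```

Here the instruction misses add 2 cycles and the data misses add 1.44 cycles per instruction, so misses account for well over half of the effective CPI of about 5.44.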

Unified vs. split instruction / data cache

Having separate caches for instructions and data does not improve hit rate but does support increased bandwidth -- one can fetch an instruction and data word at the same time.  Most current processors have separate L1 I and D caches.

Two-level cache

As the gap between CPU speed and memory speed grows larger, the penalty for a cache miss becomes unacceptably high.  To address this problem, all modern high-end CPUs have at least two levels of cache: a very fast, and hence not very big, first-level (L1) cache together with a larger but slower L2 cache.  Some recent microprocessors (e.g., Core i7) have 3 levels.

When a miss occurs in L1, L2 is examined, and only if a miss occurs there is main memory referenced.  (Performance analysis, p. 485).
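
One common way to carry out this analysis is sketched below. It assumes miss rates are given per instruction, that an L1 miss that hits in L2 costs the L2 access time, and that a miss in both levels additionally costs the main memory penalty. The function name and all numbers are hypothetical and are not taken from the text.

```python
def two_level_cpi(base_cpi, l1_miss_rate, l2_access_cycles,
                  global_miss_rate, memory_penalty_cycles):
    """Effective CPI with an L1 cache backed by an L2 cache.

    l1_miss_rate     -- L1 misses per instruction (these go to L2)
    global_miss_rate -- misses per instruction that miss in both L1 and L2
    """
    l2_stall = l1_miss_rate * l2_access_cycles               # cost of consulting L2
    memory_stall = global_miss_rate * memory_penalty_cycles  # cost of going to main memory
    return base_cpi + l2_stall + memory_stall


# Hypothetical machine: 400-cycle memory penalty, 20-cycle L2 access.
one_level = 1.0 + 0.02 * 400   # no L2: every L1 miss goes to main memory
two_level = two_level_cpi(base_cpi=1.0, l1_miss_rate=0.02, l2_access_cycles=20,
                          global_miss_rate=0.005, memory_penalty_cycles=400)
print(one_level, two_level)
```

With these numbers the L2 cache lowers the effective CPI from about 9.0 to about 3.4, because most L1 misses are satisfied in 20 cycles rather than 400.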