Computer Architecture

Start Lecture #22

7.3: Measuring and Improving Cache Performance

Do the following performance example on the board. It would be an appropriate final exam question.

Homework: 7.17, 7.18.

A lower base (i.e. miss-free) CPI makes stalls appear more expensive since waiting a fixed amount of time for the memory corresponds to losing more instructions if the CPI is lower.

A faster CPU (i.e., a faster clock) makes stalls appear more expensive since waiting a fixed amount of time for the memory corresponds to more cycles if the clock is faster (and hence more instructions since the base CPI is the same).
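Both remarks can be checked numerically. A minimal sketch with made-up numbers (the base CPI, miss rate, and penalty below are illustrative, not from an example in the text):

```python
def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """CPI including memory-stall cycles."""
    stalls = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stalls

# The same stall (1.0 * 0.05 * 40 = 2 cycles per instruction) hurts more,
# relatively, when the base CPI is lower.
slow = effective_cpi(2.0, 1.0, 0.05, 40)   # 4.0 -> 2x slowdown
fast = effective_cpi(1.0, 1.0, 0.05, 40)   # 3.0 -> 3x slowdown
print(slow / 2.0, fast / 1.0)              # 2.0 3.0
```

A faster clock has the same flavor: the fixed memory wait becomes more cycles, so `miss_penalty_cycles` grows while the base CPI stays put.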

Another performance example.

Remark: Larger caches have longer hit times.

Reducing Cache Misses by More Flexible Placement of Blocks

Improvement: Associative Caches

Consider the following sad story. Jane has a cache that holds 1000 blocks and has a program that references only 4 (memory) blocks, namely 23, 1023, 123023, and 7023. In fact the references occur in order: 23, 1023, 123023, 7023, 23, 1023, 123023, 7023, 23, 1023, 123023, 7023, 23, 1023, 123023, 7023, etc. Referencing only 4 blocks and having room for 1000 in her cache, Jane expected an extremely high hit rate for her program. In fact, the hit rate was zero. She was so sad, she gave up her job as webmistress, went to medical school, and is now a brain surgeon at the Mayo Clinic in Rochester, MN.

So far we have studied only direct mapped caches, i.e. those for which the location in the cache is determined by the address. Since there is only one possible location in the cache for any block, to check for a hit we compare one tag with the high-order bits (HOBs) of the address.
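Jane's sad story can be replayed in a few lines. This is an illustrative sketch, not from the text: it models only block numbers and tags, ignoring the data. In a direct-mapped cache with B blocks, memory block K goes in line K mod B with tag K div B.

```python
def dm_simulate(trace, num_blocks):
    """Direct-mapped cache: return the hit rate for a trace of block numbers."""
    lines = {}          # line number -> tag currently stored there
    hits = 0
    for k in trace:
        line, tag = k % num_blocks, k // num_blocks
        if lines.get(line) == tag:
            hits += 1
        else:
            lines[line] = tag   # miss: evict whatever was there
    return hits / len(trace)

trace = [23, 1023, 123023, 7023] * 4
print(dm_simulate(trace, 1000))   # 0.0 -- all four blocks map to line 23
```

All four blocks are congruent to 23 mod 1000, so each reference evicts the previous one and every reference misses.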

The other extreme is fully associative.

Set Associative Caches

Most common for caches is an intermediate configuration called set associative or n-way associative (e.g., 4-way associative). The value of n is typically 2, 4, or 8.

If the cache has B blocks, we group them into B/n sets each of size n. Since an n-way associative cache has sets of size n blocks, it is often called a set size n cache. For example, you often hear of set size 4 caches.

In a set size n cache, memory block number K is stored in set K mod the number of sets, which equals K mod (B/n).

The figure above shows a 2-way set associative cache. Do the same example on the board for 4-way set associative.

Determining the Set Number and the Tag.

Recall that for a direct-mapped cache, the cache index gives the number of the block in the cache. For a set-associative cache, the cache index gives the number of the set.

Just as the block number for a direct-mapped cache is the memory block number mod the number of blocks in the cache, the set number equals the (memory) block number mod the number of sets.

Just as the tag for a direct-mapped cache is the memory block number divided by the number of blocks, the tag for a set-associative cache is the memory block number divided by the number of sets.
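These two formulas can be packaged in a small helper (the function name is mine, not the text's). For a B-block, n-way cache there are S = B/n sets; memory block K lives in set K mod S with tag K div S.

```python
def placement(k, num_blocks, ways):
    """Return (set number, tag) for memory block k."""
    sets = num_blocks // ways
    return k % sets, k // sets

# Direct mapped is the n = 1 case: the set number is the cache block number.
print(placement(7023, 1000, 1))   # (23, 7)
print(placement(7023, 1000, 4))   # (23, 28) -- 250 sets
```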

Do NOT make the mistake of thinking that a set size 2 cache has 2 sets; it has B/2 sets, each of size 2.

Ask in class.

Why is set associativity good? For example, why is 2-way set associativity better than direct mapped?
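One way to see the answer: take a trace that alternates between two blocks that conflict in a direct-mapped cache. The sketch below is mine (invented block numbers, in the spirit of Jane's story) and uses LRU replacement, which the replacement-policy discussion below touches on.

```python
def sa_simulate(trace, num_blocks, ways):
    """n-way set-associative cache with LRU replacement; returns the hit rate."""
    sets = num_blocks // ways
    cache = {}                     # set number -> list of tags, LRU first
    hits = 0
    for k in trace:
        s, tag = k % sets, k // sets
        resident = cache.setdefault(s, [])
        if tag in resident:
            hits += 1
            resident.remove(tag)   # will re-append as most recently used
        elif len(resident) == ways:
            resident.pop(0)        # miss in a full set: evict the LRU block
        resident.append(tag)
    return hits / len(trace)

trace = [23, 1023] * 8
print(sa_simulate(trace, 1000, 1))   # 0.0   -- direct mapped: the blocks thrash
print(sa_simulate(trace, 1000, 2))   # 0.875 -- 2-way: only the two cold misses
```

Both blocks map to the same direct-mapped line, but a 2-way set has room for both, so after the two compulsory misses every reference hits.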

Locating a Block in the Cache

How do we find a memory block in a set associative cache with block size 1 word?

Recall that a 1-way associative cache is a direct mapped cache and that an n-way associative cache, for n the number of blocks in the cache, is a fully associative cache.

The advantage of increased associativity is normally an increased hit ratio.

What are the disadvantages?
Answer: It is slower and a little bigger due to the extra logic.

Choosing Which Block to Replace

When an existing block must be replaced, which victim should we choose? We asked the exact same question (with different words) when we studied demand paging (remember 202!).
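A common answer is LRU (least recently used): evict the block in the set that has gone unused longest. A sketch of a single set, with names of my own choosing:

```python
from collections import OrderedDict

class LRUSet:
    """One set of an n-way cache, tracking tags in recency order (oldest first)."""
    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()

    def access(self, tag):
        """Return True on a hit; on a miss, install tag, evicting the LRU victim."""
        if tag in self.tags:
            self.tags.move_to_end(tag)    # now the most recently used
            return True
        if len(self.tags) == self.ways:
            self.tags.popitem(last=False)  # evict the least recently used tag
        self.tags[tag] = None
        return False

s = LRUSet(2)
print([s.access(t) for t in [7, 8, 7, 9, 8]])
# [False, False, True, False, False]: referencing 9 evicts 8 (the LRU), so 8 misses
```

Real hardware usually keeps only an approximation to LRU, since exact recency order is expensive to track for large n.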

Sizes

There are two notions of size.

  1. The cache size is the capacity of the cache.
    That is, the total size of all the blocks (the data). In the diagram above it is the size of the blue portion.

    The size of the cache in the diagram is 256 * 4 * 4B = 4KB.

  2. Another size is the total number of bits in the cache, which includes tags and valid bits. For the diagrammed cache (256 sets, each 4-way, 1-word blocks, 32-bit byte addresses) each block needs 32 data bits, 1 valid bit, and a 32-2-8 = 22-bit tag, i.e., 55 bits per block. With 256*4 = 1024 blocks, the total is 1024*55 bits = 55Kb.

For the diagrammed cache, what fraction of the bits are user data?
Ans: 4KB / 55Kb = 32Kb / 55Kb = 32/55.
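The arithmetic can be double-checked mechanically (assuming, as above, 256 sets, 4 ways, 1-word blocks, and 32-bit byte addresses):

```python
sets, ways, word_bits = 256, 4, 32
blocks = sets * ways                        # 1024
index_bits = 8                              # log2(256) bits select the set
tag_bits = 32 - 2 - index_bits              # 22 (the 2 LOBs select the byte)
bits_per_block = 1 + tag_bits + word_bits   # valid + tag + data = 55

data_bits = blocks * word_bits              # 32 Kb = 4 KB of user data
total_bits = blocks * bits_per_block        # 55 Kb in all
print(data_bits // 1024, total_bits // 1024, data_bits / total_bits)
# 32 55 0.5818181818181818  (= 32/55)
```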

Tag Size and Division of the Address Bits

We continue to assume a byte-addressed machine with all references being to a 4-byte word.

The 2 LOBs are not used (they specify the byte within the word, but all our references are for a complete word). We show these two bits in dark blue. We continue to assume 32-bit addresses so there are 2^30 words in the address space.

Let's review various possible cache organizations and determine for each how large the tag is and how the various address bits are used. We will consider four configurations, each a 16KB cache. That is, the size of the data portion of the cache is 16KB = 4 kilowords = 2^12 words.

  1. Direct mapped, blocksize 1 (word).
  2. Direct mapped, blocksize 8.
  3. 4-way set associative, blocksize 1.
  4. 4-way set associative, blocksize 8.
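The address-bit division for each configuration can be computed mechanically. The helper below is a sketch of mine (names and layout are not from the text); the address splits into tag, index, block-offset bits, and the 2 byte-offset bits.

```python
from math import log2

def breakdown(total_words, block_words, ways):
    """Return (tag bits, index bits, block-offset bits) for a 32-bit address."""
    blocks = total_words // block_words
    sets = blocks // ways
    offset_bits = int(log2(block_words))   # selects the word within the block
    index_bits = int(log2(sets))           # selects the set (or line, if 1-way)
    tag_bits = 32 - 2 - offset_bits - index_bits
    return tag_bits, index_bits, offset_bits

for block_words, ways in [(1, 1), (8, 1), (1, 4), (8, 4)]:
    print(block_words, ways, breakdown(2**12, block_words, ways))
# tags are 18, 18, 20, and 20 bits for the four configurations respectively
```

Note that going 4-way lengthens the tag by 2 bits (fewer sets, so a shorter index), while a larger blocksize trades index bits for offset bits and leaves the tag alone.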

Homework: 7.46 and 7.47. Note that 7.46 contains a typo !! should be ||.

Reducing the Miss Penalty Using Multilevel Caches

Improvement: Multilevel caches

Modern high-end PCs and workstations all have at least two levels of caches: a very fast, and hence not very big, first-level (L1) cache together with a larger but slower L2 cache.

When a miss occurs in L1, L2 is examined and only if a miss occurs there is main memory referenced.

So the average miss penalty for an L1 miss is

    (L2 hit rate)*(L2 time) + (L2 miss rate)*(L2 time + memory time)
  

We are assuming that L2 time is the same for an L2 hit or L2 miss and that the main memory access doesn't begin until the L2 miss has occurred.
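With made-up numbers (the rates and times below are illustrative, not from the text), the formula looks like this in code. Since the L2 is referenced on every L1 miss, it simplifies to L2 time + (L2 miss rate)*(memory time).

```python
def l1_miss_penalty(l2_miss_rate, l2_time, memory_time):
    """Average penalty, in cycles, for an L1 miss under the stated assumptions."""
    l2_hit_rate = 1 - l2_miss_rate
    return l2_hit_rate * l2_time + l2_miss_rate * (l2_time + memory_time)

penalty = l1_miss_penalty(0.2, 10, 100)   # assumed: 20% L2 misses, 10 and 100 cycles
print(penalty)                            # 30.0 = 10 + 0.2*100
```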

Do the example on the board (a reasonable exam question, but a little long since it has so many parts).