Computer Architecture
1999-2000 Fall
MW 3:30-4:45
Ciww 109
Allan Gottlieb
gottlieb@nyu.edu
http://allan.ultra.nyu.edu/~gottlieb
715 Broadway, Room 1001
212-998-3344
609-951-2707
email is best
======== START LECTURE #21 ========
A lower base (i.e., miss-free) CPI makes stalls appear more expensive,
since waiting a fixed amount of time for memory
corresponds to losing more instructions when the CPI is lower.
A faster CPU (i.e., a faster clock) also makes stalls appear more expensive,
since waiting a fixed amount of time for memory corresponds to
more cycles when the clock is faster (and hence more instructions, since
the base CPI is the same).
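The two claims above can be checked numerically with a minimal Python sketch. The workload numbers (0.02 misses per instruction, 100 ns per miss, base CPIs of 1 and 2, clocks of 500 MHz and 1 GHz) are hypothetical, chosen only for illustration, and the function name `slowdown` is likewise illustrative.

```python
def slowdown(base_cpi, misses_per_instr, stall_ns, clock_ghz):
    """Ratio of the CPI with stalls to the miss-free (base) CPI.

    A miss costs a fixed wall-clock time (stall_ns).  Converting that to
    cycles multiplies by the clock rate, so a faster clock turns the same
    stall into more cycles.
    """
    stall_cycles = stall_ns * clock_ghz        # ns * (cycles per ns)
    cpi = base_cpi + misses_per_instr * stall_cycles
    return cpi / base_cpi


# Hypothetical workload: 0.02 misses per instruction, 100 ns per miss.
for base_cpi, clock_ghz in [(2, 0.5), (1, 0.5), (2, 1.0)]:
    ratio = slowdown(base_cpi, 0.02, 100, clock_ghz)
    print(f"base CPI {base_cpi} at {clock_ghz} GHz: {ratio:.2f}x the miss-free time")
```

Starting from base CPI 2 at 500 MHz, either halving the base CPI or doubling the clock raises the ratio from 1.5 to 2.0 for the identical 100 ns stall.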
Another performance example.
- Assume
  - I-cache miss rate 3%.
  - D-cache miss rate 5%.
  - 40% of instructions reference data.
  - Miss penalty of 50 cycles.
  - Base CPI is 2.
- What is the CPI including the misses?
- How much slower is the machine when misses are taken into account?
- Redo the above if the I-miss penalty is reduced to 10 (D-miss
  penalty still 50).
- With the I-miss penalty back to 50, what is the performance if the CPU
  (and the caches) are 100 times faster?
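The arithmetic for this example can be checked with a short Python sketch. It uses the usual model CPI = base CPI + I-miss stalls + D-miss stalls, where every instruction is fetched (so every instruction can I-miss) and 40% also access data. For the last question it assumes that main memory does not get faster, so the 50-cycle penalty becomes 100 * 50 = 5000 of the new, shorter cycles, while the base CPI stays 2 because the caches sped up too. The function name `cpi` is illustrative.

```python
def cpi(base, i_miss_rate, d_miss_rate, data_frac, i_penalty, d_penalty):
    """CPI including cache-miss stalls (miss rates are per access)."""
    i_stalls = i_miss_rate * i_penalty              # every instruction is fetched
    d_stalls = data_frac * d_miss_rate * d_penalty  # only 40% access data
    return base + i_stalls + d_stalls


BASE = 2

# 3% I-miss, 5% D-miss, 40% data references, 50-cycle penalties.
c1 = cpi(BASE, 0.03, 0.05, 0.40, 50, 50)
print(f"CPI = {c1:.2f}, {c1 / BASE:.2f} times slower than miss-free")

# I-miss penalty reduced to 10 cycles, D-miss penalty still 50.
c2 = cpi(BASE, 0.03, 0.05, 0.40, 10, 50)
print(f"CPI = {c2:.2f}, {c2 / BASE:.2f} times slower than miss-free")

# CPU and caches 100x faster, memory unchanged (assumption):
# the same memory delay now costs 5000 of the new cycles.
c3 = cpi(BASE, 0.03, 0.05, 0.40, 5000, 5000)
# Time per instruction = CPI * cycle time; the new cycle is 1/100 of the old.
speedup = c1 / (c3 / 100)
print(f"CPI = {c3:.2f}, overall speedup only {speedup:.2f}x")
```

Under these assumptions the CPI is 4.5 (2.25 times slower than miss-free); with a 10-cycle I-miss penalty it is 3.3 (1.65 times slower); and the 100-times-faster machine has a CPI of 252, making it only about 1.8 times faster overall.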
Remark: Larger caches have longer hit times.
Improvement: Associative Caches
Consider the following sad story. Jane has a cache that holds 1000
blocks and has a program that only references 4 (memory) blocks,
namely 23, 1023, 123023, and 7023. In fact the references occur in
order: 23, 1023, 123023, 7023, 23, 1023, 123023, 7023, 23, 1023,
123023, 7023, 23, 1023, 123023, 7023, etc. Referencing only 4 blocks
and having room for 1000 in her cache, Jane expected an extremely high
hit rate for her program. In fact, the hit rate was zero. She was so
sad, she gave up her job as webmistress, went to medical school, and
is now a brain surgeon at the Mayo Clinic in Rochester, MN.
So far we have studied only direct-mapped caches,
i.e., those for which the location in the cache is determined by
the address. Since there is only one possible location in the
cache for any block, to check for a hit we compare one
tag with the high-order bits (HOBs) of the address.
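Jane's story is explained by direct mapping: with 1000 cache blocks, memory block K can only go in cache block K mod 1000, and all four of her blocks (23, 1023, 123023, 7023) give 23. The minimal Python sketch below simulates such a cache, tracking only tags (no data); the function name and the 16-reference trace length are illustrative choices.

```python
def direct_mapped_hits(trace, num_blocks):
    """Simulate a direct-mapped cache; return the number of hits.

    Each cache block holds the tag of the memory block stored there
    (None if empty).  Memory block K lives in cache block K mod num_blocks,
    and its tag is K // num_blocks.
    """
    tags = [None] * num_blocks
    hits = 0
    for block in trace:
        index, tag = block % num_blocks, block // num_blocks
        if tags[index] == tag:
            hits += 1            # the one possible location holds our block
        else:
            tags[index] = tag    # miss: evict whatever was there
    return hits


jane = [23, 1023, 123023, 7023] * 4    # her repeating reference pattern
print([b % 1000 for b in jane[:4]])    # every block maps to cache block 23
print(direct_mapped_hits(jane, 1000), "hits out of", len(jane))
```

Each reference evicts the block the previous reference brought in, so the 999 other cache blocks sit unused and the hit count is zero.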
The other extreme is fully associative.
- A memory block can be placed in any cache block.
- Since any memory block can be in any cache block, the cache index
  where the memory block is stored tells us nothing about which
  memory block is stored there. Hence the tag must be the entire
  (block) address. Moreover, we don't know which cache block to check, so we
  must check all cache blocks to see if we have a hit.
- The larger tag is a problem.
- The search is a disaster.
  - It could be done sequentially (one cache block at a time),
    but this is much too slow.
  - We could have a comparator with each tag and mux
    all the blocks to select the one that matches.
    - This is too big due to both the many comparators and
      the humongous mux.
    - However, it is exactly what is done when implementing
      translation lookaside buffers (TLBs), which are used with
      demand paging.
    - Are the TLB designers magicians?
      Ans: No. TLBs are small.
- An alternative is to have a table with one entry per
  MEMORY block giving the cache block number. This is too
  big and too slow for caches but is used for virtual memory
  (demand paging).
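For contrast, a fully associative cache compares the tag (here the whole block number) against every cache block. The Python sketch below models this with a list that is searched in full on each reference; it assumes LRU replacement when the cache is full, which the notes do not specify. Hardware would do these comparisons in parallel with one comparator per block, which is exactly the cost discussed above.

```python
def fully_associative_hits(trace, num_blocks):
    """Simulate a fully associative cache with LRU replacement (an assumption).

    The tag is the entire memory block number, and every cache block
    must be checked, so a lookup compares against all stored tags.
    """
    blocks = []      # stored tags, least recently used first
    hits = 0
    for block in trace:
        if block in blocks:          # compare with every tag
            hits += 1
            blocks.remove(block)     # re-appended below as most recently used
        elif len(blocks) == num_blocks:
            blocks.pop(0)            # evict the least recently used block
        blocks.append(block)
    return hits


jane = [23, 1023, 123023, 7023] * 4
print(fully_associative_hits(jane, 1000), "hits out of", len(jane))
```

On Jane's trace only the first reference to each of the four blocks misses; every later reference hits.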
Most common for caches is an intermediate configuration called
set associative or n-way associative (e.g., 4-way
associative).
- n is typically 2, 4, or 8.
- If the cache has B blocks, we group them into B/n
sets each of size n. Memory block number K is
then stored in set K mod (B/n).
- Figure 7.15 has a bug. It indicates that the tag for memory
  block 12 is 12 for all associativities. The figure below
  corrects this.
  - In the picture we are trying to store memory block 12 in each
    of three caches.
  - The light blue represents cache blocks in which the memory
    block might have been stored.
  - The dark blue is the cache block in which the memory block
    is stored.
  - The arrows show the blocks (i.e., tags) that must be
    searched to look for memory block 12. Naturally the arrows
    point to the blue blocks.
- The picture shows 2-way set associative. Do 4-way set associative
  on the board.
- Determining the Set# and Tag.
  - The Set# = (memory) block# mod #sets.
  - The Tag = (memory) block# / #sets (integer division).
- Ask in class.
  - What is 8-way set associative in a cache with 8 blocks (i.e.,
    the cache in the picture)?
  - What is 1-way set associative?
- Why is set associativity good? For example, why is 2-way set
  associativity better than direct mapped?
  - Consider referencing two modest arrays (<< cache size) that
    start at locations 1MB and 2MB.
  - Both will contend for the same cache locations in a direct-mapped
    cache (whose size, as usual, is a power of 2 no larger than 1MB)
    but will fit together in an n-way associative cache with n>=2.
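The Set# and Tag rules, and the two-array example, can be checked with the Python sketch below. It treats direct mapped as 1-way and fully associative as B-way (a single set), and it uses LRU replacement within a set (an assumption). For the arrays it assumes a hypothetical 64KB cache of 2048 blocks of 32 bytes and arrays of 100 blocks each, accessed alternately; none of these sizes come from the notes.

```python
def set_and_tag(block, num_blocks, ways):
    """Set# = block# mod #sets, Tag = block# // #sets (integer division)."""
    num_sets = num_blocks // ways
    return block % num_sets, block // num_sets


def set_associative_hits(trace, num_blocks, ways):
    """Simulate an n-way set-associative cache with LRU within each set."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each set: stored tags, LRU first
    hits = 0
    for block in trace:
        s, tag = set_and_tag(block, num_blocks, ways)
        ways_in_set = sets[s]
        if tag in ways_in_set:             # search only the n tags of this set
            hits += 1
            ways_in_set.remove(tag)        # re-appended below as most recent
        elif len(ways_in_set) == ways:
            ways_in_set.pop(0)             # evict the LRU block of this set
        ways_in_set.append(tag)
    return hits


# Memory block 12 in an 8-block cache, for each associativity.
for ways in (1, 2, 4, 8):
    print(f"{ways}-way: (set, tag) = {set_and_tag(12, 8, ways)}")

# Two arrays starting at 1MB and 2MB, accessed a[i], b[i], twice over
# (hypothetical 64KB cache with 32-byte blocks).
BLOCK_BYTES, NUM_BLOCKS, ARRAY_BLOCKS = 32, 2048, 100
a0, b0 = (1 << 20) // BLOCK_BYTES, (2 << 20) // BLOCK_BYTES
trace = [start + i
         for _ in range(2)
         for i in range(ARRAY_BLOCKS)
         for start in (a0, b0)]
for ways in (1, 2):
    hits = set_associative_hits(trace, NUM_BLOCKS, ways)
    print(f"{ways}-way: {hits} hits out of {len(trace)}")

# Jane's trace in a 1000-block, 4-way cache (250 sets, all in set 23).
print(set_associative_hits([23, 1023, 123023, 7023] * 4, 1000, 4))
```

With these assumptions the (set, tag) pairs for block 12 are (4, 1), (0, 3), (0, 6), and (0, 12), so a tag of 12 is right only for the 8-way (fully associative) case. On the two-array trace the direct-mapped cache gets no hits, because a[i] and b[i] share a cache block, while the 2-way cache hits on every reference of the second pass.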