Computer Architecture
1999-2000 Fall
MW 3:30-4:45
Ciww 109
Allan Gottlieb
gottlieb@nyu.edu
http://allan.ultra.nyu.edu/~gottlieb
715 Broadway, Room 1001
212-998-3344
609-951-2707
email is best
======== START LECTURE #23 ========
Translation Lookaside Buffer (TLB)
A TLB is a cache of the page table
- Needed because otherwise every memory reference in the program
would require two memory references, one to read the page table and
one to read the requested memory word.
- Typical TLB parameter values
- Size: hundreds of entries.
- Block size: 1 entry.
- Hit time: 1 cycle.
- Miss time: tens of cycles.
- Miss rate: Low (<= 2%).
- In the diagram on the right:
- The green path is the fastest (TLB hit).
- The red is the slowest (page fault).
- The yellow is in the middle (TLB miss, no page fault).
- Strictly speaking, the page table entry for an invalid page doesn't
point to the disk block itself, but the effect is the same.
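A minimal C sketch of the lookup, assuming 4KB pages and a small fully
associative TLB (the sizes and names, e.g. TLB_ENTRIES and
tlb_translate, are invented for illustration; the sequential loop
stands in for the parallel comparators real hardware uses):

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64
    #define PAGE_SHIFT  12                /* 4KB pages */

    struct tlb_entry {
        bool     valid;
        uint32_t vpn;                     /* virtual page number   */
        uint32_t pfn;                     /* physical frame number */
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Returns true on a TLB hit (the green path), filling in the
       physical address.  On a miss the caller must walk the page
       table and reload the TLB (the yellow or red path). */
    bool tlb_translate(uint32_t vaddr, uint32_t *paddr)
    {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;
        uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *paddr = (tlb[i].pfn << PAGE_SHIFT) | offset;
                return true;
            }
        return false;
    }

With the parameter values above, taking (say) a 30 cycle miss time and
a 2% miss rate, the average translation cost is about
1 + .02*30 = 1.6 cycles, far better than the two full memory
references we would need without the TLB.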
Putting it together: TLB + Cache
This is the DECstation 3100.
- Virtual address = 32 bits
- Physical address = 32 bits
- Fully associative TLB (naturally)
- Direct mapped cache
- Cache block size = one word
- Page size = 4KB = 2^12 bytes
- Cache size = 16K entries = 64KB
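To see how the bits divide up: the 4KB page size makes the low 12 bits
of the virtual address the page offset, leaving 32-12 = 20 bits of
virtual page number. On the physical side, 16K = 2^14 cache entries
require a 14 bit index; together with the 2 bit byte offset, the
remaining 32-14-2 = 16 bits form the cache tag.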
Actions taken
- The page number is searched in the fully associative TLB
- If a TLB hit occurs, the frame number from the TLB together with
the page offset gives the physical address. A TLB miss causes an
exception to reload the TLB from the page table, which the figure does
not show.
- The physical address is broken into a cache tag and cache index
(plus a two bit byte offset that is not used for word references).
- If the reference is a write, just do it without checking for a
cache hit (this is possible because the cache is so simple as we
discussed previously).
- For a read, if the tag located in the cache entry specified by the
index matches the tag in the physical address, the referenced word has
been found in the cache; i.e., we had a read hit.
- For a read miss, the referenced block is fetched from memory, loaded
into the cache entry specified by the index, and the data returned to
satisfy the request.
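Expressed in the same style as the TLB sketch above, reusing
tlb_translate from that sketch (again a sketch only: the structures
and names are invented, but the field widths follow the DECstation
parameters just listed):

    #define CACHE_ENTRIES (1 << 14)       /* 16K one-word entries */

    struct cache_line {
        bool     valid;
        uint16_t tag;                     /* high 16 bits of the physical address */
        uint32_t data;                    /* one word */
    };

    static struct cache_line cache[CACHE_ENTRIES];

    /* The read path: translate, then probe the direct mapped cache.
       Returns true on a (TLB hit, cache hit) reference. */
    bool read_word(uint32_t vaddr, uint32_t *word)
    {
        uint32_t paddr;
        if (!tlb_translate(vaddr, &paddr))
            return false;                 /* TLB miss: exception reloads the TLB */

        uint32_t index = (paddr >> 2) & (CACHE_ENTRIES - 1); /* drop the 2 bit byte offset */
        uint16_t tag   = (uint16_t)(paddr >> 16);

        if (cache[index].valid && cache[index].tag == tag) {
            *word = cache[index].data;    /* read hit */
            return true;
        }
        return false;                     /* read miss: fetch the block from memory,
                                             fill the entry at `index`, then retry */
    }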
Hit/Miss possibilities
TLB  | Page | Cache | Remarks
-----+------+-------+-------------------------------------------------------------------------
hit  | hit  | hit   | Possible; page table not checked on a TLB hit, data from the cache
hit  | hit  | miss  | Possible; page table not checked, cache entry loaded from memory
hit  | miss | hit   | Impossible; the TLB references only in-memory pages
hit  | miss | miss  | Impossible; the TLB references only in-memory pages
miss | hit  | hit   | Possible; TLB entry loaded from the page table, data from the cache
miss | hit  | miss  | Possible; TLB entry loaded from the page table, cache entry loaded from memory
miss | miss | hit   | Impossible; the cache is a subset of memory
miss | miss | miss  | Possible; page fault brings in the page, then TLB entry and cache entry loaded
Homework: 7.31, 7.33
7.5: A Common Framework for Memory Hierarchies
Question 1: Where can/should the block be placed?
This question has three parts.
- In what slot are we able to place the block?
- For a direct mapped cache, there is only one choice.
- For an n-way associative cache, there are n choices.
- For a fully associative cache, any slot is permitted.
- The n-way case includes both the direct mapped and fully
associative cases (a code sketch of the slot calculation follows
this list).
- For a TLB any slot is permitted. That is, a TLB is a fully
associative cache of the page table.
- For paging any slot (i.e., frame) is permitted. That is,
paging uses a fully associative mapping (via a page table).
- For segmentation, any large enough slot (i.e., region) can be
used.
- If several possible slots are available, which one should
be used?
- I call this question the placement question.
- For caches, TLBs and paging, which use fixed size
slots, the question is trivial; any available slot is just fine.
- For segmentation, the question is interesting and there are
several algorithms, e.g., first fit, best fit, buddy, etc.
- If no possible slots are available, which victim should be chosen?
- For direct mapped caches, the question is trivial. Since the
block can only go in one slot, if you need to place the block and
the only possible slot is not available, it must be the victim.
- For all the other cases, n-way associative caches (n>1), TLBs,
paging, and segmentation, the question is interesting and there
are several algorithms, e.g., LRU, random, Belady's min, FIFO, etc.
- See question 3, below.
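The sketch promised above: for an n-way associative cache, the block
number picks one set, and any of the n slots in that set may hold the
block. Setting n = 1 gives the direct mapped case and n = NUM_SLOTS
the fully associative case (NUM_SLOTS and the names are invented for
illustration):

    #define NUM_SLOTS 1024

    /* Fills in the range of slots [*first, *first + *count) that may
       hold the block numbered block_num.  Assumes n divides NUM_SLOTS
       and that slots are grouped set by set. */
    void candidate_slots(unsigned block_num, unsigned n,
                         unsigned *first, unsigned *count)
    {
        unsigned num_sets = NUM_SLOTS / n;        /* sets of n slots each  */
        unsigned set      = block_num % num_sets; /* the one set it maps to */

        *first = set * n;
        *count = n;                               /* n candidate slots */
    }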
Question 2: How is a block found?
Associativity   | Location method                      | Comparisons required
----------------+--------------------------------------+-------------------------
Direct mapped   | Index                                | 1
Set associative | Index the set, search among elements | Degree of associativity
Full            | Search all cache entries             | Number of cache blocks
Full            | Separate lookup table (page table)   | 0
Typical sizes and costs

Feature                | Typical values for caches | Typical values for demand paging | Typical values for TLBs
-----------------------+---------------------------+----------------------------------+------------------------
Size                   | 8KB-8MB                   | 16MB-2GB                         | 256B-32KB
Block size             | 16B-256B                  | 4KB-64KB                         | 4B-32B
Miss penalty in clocks | 10-100                    | 1M-10M                           | 10-100
Miss rate              | .1%-10%                   | .000001%-.0001%                  | .01%-2%
The differences in sizes and costs between demand paging and caching
lead to different algorithms for finding the block.
Demand paging always uses the bottom row of the table above, a
separate lookup table (the page table); caching never uses such a table.
- With page faults so expensive, misses must be reduced as much as
possible. Hence full associativity is used.
- With such a large associativity (fully associative with many
slots), a hardware search would be prohibitively expensive and a
software search too slow. Hence a lookup table (the page table) is
used, with the TLB acting as a cache of it.
- The large block size (called the page size) means that the extra table
is a small fraction of the space.
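For example, with the 4KB pages used above and (say) 4 byte page
table entries, the table adds roughly 4/4096, i.e. about 0.1%, to the
space being mapped.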
Question 3: Which block should be replaced?
This is called the replacement question and is much
studied in demand paging (remember back to 202).
- For demand paging, with miss costs so high and associativity so
large, the replacement policy is very important and some approximation
to LRU is used.
- For caching, even the miss time must be small, so simple schemes
are used. For 2-way associativity, LRU is trivial (see the sketch
below). For higher associativity (but associativity is never very
high), crude approximations to LRU may be used, and sometimes even
random replacement is used.
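The 2-way case promised above is trivial because one bit per set
suffices: record which way was referenced last, and the victim is the
other way (a minimal sketch; NUM_SETS and the names are invented for
illustration):

    #define NUM_SETS 512

    static unsigned char last_used[NUM_SETS];  /* 0 or 1: way touched last */

    void touch(unsigned set, unsigned way)     /* call on every hit or fill */
    {
        last_used[set] = (unsigned char)way;
    }

    unsigned victim(unsigned set)              /* way to evict on a miss */
    {
        return 1u - last_used[set];            /* the way NOT used last */
    }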