NOTE: These notes are by Allan Gottlieb, and are reproduced here, with superficial modifications, with his permission. "I" in this text generally refers to Prof. Gottlieb, except in regards to administrative matters.


================ Start Lecture #9 (Feb. 26) ================

Multilevel page tables

Recall the previous diagram. Most of the virtual memory is the unused space between the data and stack regions. However, with demand paging this space does not waste real memory. But the single large page table does waste real memory.

The idea of multi-level page tables (a similar idea is used in Unix inode-based file systems) is to add a level of indirection and have a page table containing pointers to page tables.

Do an example on the board

The VAX used a 2-level page table structure, but with some wrinkles (see Tanenbaum for details).

Naturally, there is no need to stop at 2 levels. In fact the SPARC has 3 levels and the Motorola 68030 has 4 (and the number of bits of Virtual Address used for P#1, P#2, P#3, and P#4 can be varied).
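As a rough illustration of the two-level idea, here is a minimal sketch. The 10/10/12 bit split, the dictionary representation, and the names `split`, `translate`, and `top` are assumptions made for the example, not a description of the VAX or any other real machine.

```python
# Hypothetical 32-bit layout: 10 bits for P#1, 10 bits for P#2, 12 bits of offset.
P1_BITS, P2_BITS, OFFSET_BITS = 10, 10, 12

def split(vaddr):
    """Split a virtual address into (p1, p2, offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    p2 = (vaddr >> OFFSET_BITS) & ((1 << P2_BITS) - 1)
    p1 = (vaddr >> (OFFSET_BITS + P2_BITS)) & ((1 << P1_BITS) - 1)
    return p1, p2, offset

def translate(top_level, vaddr):
    """Walk a two-level table: top_level maps p1 -> second-level table (p2 -> frame)."""
    p1, p2, offset = split(vaddr)
    second = top_level.get(p1)
    if second is None or p2 not in second:
        raise LookupError("page fault")
    return (second[p2] << OFFSET_BITS) | offset

# Only two second-level tables exist: one for the low addresses (text/data)
# and one for the top of the address space (stack). The unused middle of the
# address space costs no second-level tables at all.
top = {0: {0: 7, 1: 3}, 1023: {1023: 9}}
print(hex(translate(top, 0x00001ABC)))   # page (0, 1) lives in frame 3
```

The saving comes from the missing entries in `top`: a second-level table is allocated only for a region that is actually in use.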

4.3.3: TLBs--Translation Lookaside Buffers (and General Associative Memory)

Note: Tanenbaum suggests that "associative memory" and "translation lookaside buffer" are synonyms. This is wrong. Associative memory is a general structure and translation lookaside buffer is a special case.

An associative memory is a content addressable memory. That is, you access the memory by giving the value of some field, and the hardware searches all the records and returns the record whose field contains the requested value.

For example

Name  | Animal | Mood     | Color
======+========+==========+======
Moris | Cat    | Finicky  | Grey
Fido  | Dog    | Friendly | Black
Izzy  | Iguana | Quiet    | Brown
Bud   | Frog   | Smashed  | Green
If the index field is Animal and Iguana is given, the associative memory returns
Izzy  | Iguana | Quiet    | Brown

A Translation Lookaside Buffer or TLB is an associative memory where the index field is the page number. The other fields include the frame number, dirty bit, valid bit, and others.
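A software stand-in for a TLB might look like the sketch below. Real hardware compares all entries in parallel; the dictionary here only imitates the lookup-by-page-number behavior. The class name `TLB`, the capacity, and the evict-oldest policy are assumptions chosen for brevity.

```python
class TLB:
    """Tiny model of a TLB: page number -> record with frame, dirty, valid."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}            # dicts preserve insertion order

    def lookup(self, page):
        entry = self.entries.get(page)
        if entry is not None and entry["valid"]:
            return entry["frame"]    # TLB hit
        return None                  # TLB miss

    def insert(self, page, frame):
        if page not in self.entries and len(self.entries) >= self.capacity:
            oldest = next(iter(self.entries))   # arbitrary policy: drop oldest entry
            del self.entries[oldest]
        self.entries[page] = {"frame": frame, "dirty": False, "valid": True}

def translate(tlb, page_table, page):
    """Try the TLB first; on a miss, consult the page table and cache the result."""
    frame = tlb.lookup(page)
    if frame is None:
        frame = page_table[page]     # slow path (a KeyError here models a page fault)
        tlb.insert(page, frame)
    return frame
```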

4.3.4: Inverted page tables

Keep a table indexed by frame number with the entry f containing the number of the page currently loaded in frame f.
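A minimal sketch of the lookup this implies, assuming a single address space and using `None` for an empty frame (the names `inverted` and `find_frame` are made up for the example):

```python
def find_frame(inverted, page):
    """Search an inverted table: inverted[f] is the page held in frame f, or None."""
    for frame, resident in enumerate(inverted):
        if resident == page:
            return frame
    return None                      # page is not resident: page fault

inverted = [12, None, 3, 40]         # frame 0 holds page 12, frame 1 is free, ...
print(find_frame(inverted, 3))       # frame holding page 3
```

The table has one entry per frame rather than one per page, which is the point. The price is that translation becomes a search, so practical designs speed it up (for example by hashing on the page number).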

4.4: Page Replacement Algorithms (PRAs)

These are solutions to the replacement question.

Good solutions take advantage of locality.

Pages belonging to processes that have terminated are of course perfect choices for victims.

Pages belonging to processes that have been blocked for a long time are good choices as well.

Random PRA

A lower bound on performance. Any decent scheme should do better.

4.4.1: The optimal page replacement algorithm (opt PRA) (aka Belady's min PRA)

Replace the page whose next reference will be furthest in the future.
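The rule can be sketched as follows, assuming (unrealistically) that the future reference string is available as a list. The function name `opt_victim` is hypothetical.

```python
def opt_victim(frames, future_refs):
    """Return the resident page whose next reference is furthest in the future."""
    def next_use(page):
        try:
            return future_refs.index(page)
        except ValueError:
            return float("inf")      # never referenced again: ideal victim
    return max(frames, key=next_use)
```

The need for `future_refs` is exactly why this algorithm cannot be implemented in a real system and serves only as a yardstick.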

4.4.2: The not recently used (NRU) PRA

Divide the frames into four classes and make a random selection from the lowest nonempty class.

  1. Not referenced, not modified
  2. Not referenced, modified
  3. Referenced, not modified
  4. Referenced, modified

Assumes that in each PTE there are two extra flags R (sometimes called U, for used) and M (often called D, for dirty).

Also assumes that a page in a lower priority class is cheaper to evict.

We again have the prisoner problem: we do a good job of making little ones out of big ones, but not the reverse. We need more resets.

Every k clock ticks, reset all R bits

What if the hardware doesn't set these bits?
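A small sketch of NRU selection, assuming the R and M bits are available as a dictionary from page to an (R, M) pair. The function names are made up for the example.

```python
import random

def nru_class(r, m):
    """Class 1..4 as in the list above: R is the high-order bit, M the low-order bit."""
    return 2 * r + m + 1

def nru_victim(pages, rng=random):
    """pages maps page -> (R, M). Choose randomly from the lowest nonempty class."""
    lowest = min(nru_class(r, m) for r, m in pages.values())
    candidates = [p for p, (r, m) in pages.items() if nru_class(r, m) == lowest]
    return rng.choice(candidates)

def reset_r_bits(pages):
    """The periodic reset: clear every R bit, leave the M bits alone."""
    return {p: (0, m) for p, (r, m) in pages.items()}
```

Note that the reset never touches M: a dirty page stays dirty until it is written out.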

4.4.3: FIFO PRA

Simple but poor since usage of the page is ignored.

Belady's Anomaly: Can have more frames yet generate more faults. Example given later.
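As a quick preview, the sketch below counts FIFO faults and runs it on one standard reference string that exhibits the anomaly. The function name `fifo_faults` is an assumption of the example.

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults for FIFO replacement with nframes frames."""
    queue = deque()                  # oldest resident page at the left
    resident = set()
    faults = 0
    for page in refs:
        if page in resident:
            continue                 # hit: FIFO ignores usage entirely
        faults += 1
        if len(queue) == nframes:
            resident.remove(queue.popleft())   # evict the oldest page
        queue.append(page)
        resident.add(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9 faults
print(fifo_faults(refs, 4))   # 10 faults, despite the extra frame
```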

4.4.4: Second chance PRA

Similar to the FIFO PRA, but when choosing a victim, if the page at the head of the queue has been referenced (R bit set), don't evict it. Instead reset R and move the page to the rear of the queue (so it looks new). The page is being given a second chance.

What if all frames have been referenced?
Becomes the same as FIFO (but takes longer).

Might want to turn off the R bit more often (say every k clock ticks).

4.4.5: Clock PRA

Same algorithm as 2nd chance, but a better (and I would say obvious) implementation: Use a circular list.

Do an example.
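One possible example, as a sketch: frames sit in a fixed circular array and a hand sweeps over them. The class name `Clock` and the choice to set R when a page is loaded (the faulting access is itself a reference) are assumptions of this sketch.

```python
class Clock:
    """Second chance implemented with a circular list and a moving hand."""

    def __init__(self, nframes):
        self.pages = [None] * nframes    # page held in each frame
        self.rbit = [0] * nframes
        self.hand = 0

    def reference(self, page):
        """Process one reference; return True if it caused a page fault."""
        if page in self.pages:
            self.rbit[self.pages.index(page)] = 1
            return False
        n = len(self.pages)
        # Skip referenced frames, clearing R as we pass (the second chance).
        while self.pages[self.hand] is not None and self.rbit[self.hand]:
            self.rbit[self.hand] = 0
            self.hand = (self.hand + 1) % n
        self.pages[self.hand] = page     # victim (or free frame) found
        self.rbit[self.hand] = 1
        self.hand = (self.hand + 1) % n
        return True
```

Nothing is ever moved in the list. Advancing the hand plays the role of moving a page to the rear of the queue.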

LIFO PRA

This is terrible! Why?
Ans: All but the last frame are frozen once loaded so you can replace only one frame. This is especially bad after a phase shift in the program when it is using all new pages.

4.4.6: Least Recently Used (LRU) PRA

When a page fault occurs, choose as victim that page that has been unused for the longest time, i.e. that has been least recently used.
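A minimal simulation of exact LRU, assuming the full recency order can be maintained on every reference (which is what makes it costly in practice). The function name `lru_faults` is made up for the example.

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    """Count page faults under exact LRU with nframes frames."""
    recency = OrderedDict()              # least recently used page first
    faults = 0
    for page in refs:
        if page in recency:
            recency.move_to_end(page)    # now the most recently used
            continue
        faults += 1
        if len(recency) == nframes:
            recency.popitem(last=False)  # evict the least recently used
        recency[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(refs, 3), lru_faults(refs, 4))   # 10 and 8
```

On this reference string, adding a frame reduces the fault count.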

LRU is definitely

Homework: 29, 23

A hardware cutsie in Tanenbaum

4.4.7: Simulating (Approximating) LRU in Software

The Not Frequently Used (NFU) PRA

R | counter
==+=========
1 | 10000000
0 | 01000000
1 | 10100000
1 | 11010000
0 | 01101000
0 | 00110100
1 | 10011010
1 | 11001101
0 | 01100110

The Aging PRA

NFU doesn't distinguish between old references and recent ones. The following modification does distinguish.
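A sketch of the update, assuming the usual formulation of aging: at each clock tick the counter is shifted right one bit and the R bit is inserted at the high-order end. The 8-bit width and the function name `age` are assumptions. Feeding in the R values listed under NFU above reproduces the counter values shown there.

```python
def age(counter, r, bits=8):
    """One aging step: shift right, then place R in the high-order bit."""
    return (counter >> 1) | (r << (bits - 1))

counter = 0
history = []
for r in [1, 0, 1, 1, 0, 0, 1, 1, 0]:    # R bit observed at each clock tick
    counter = age(counter, r)
    history.append(format(counter, "08b"))
print(history)
```

The victim is the page with the smallest counter. A recent reference sets a high-order bit and so outweighs any number of older ones, which is exactly the distinction plain NFU cannot make.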

Homework: 25, 34

4.4.8: The Working Set Page Replacement Problem (Peter Denning)

The working set policy (Peter Denning)

The goal is to specify which pages a given process needs to have memory resident in order for the given process to run without too many page faults.

The idea of the working set policy is to ensure that each process keeps its working set in memory.
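To make the notion concrete, here is a sketch assuming Denning's usual definition: the working set at time t with window k is the set of pages touched by the k most recent references. The names `working_set`, `t`, and `k` are chosen for the example.

```python
def working_set(refs, t, k):
    """Pages touched by the k references ending at position t (0-based, inclusive)."""
    start = max(0, t - k + 1)
    return set(refs[start:t + 1])

refs = [1, 2, 1, 3, 4, 4, 4, 2]
print(working_set(refs, 3, 3))   # {1, 2, 3}
print(working_set(refs, 6, 3))   # {4}: the program has settled into a tight loop
```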

Interesting questions include:

... Various approximations to the working set have been devised.

The WSClock Page Replacement Algorithm

Contrary to the textbook, the WSClock Page Replacement Algorithm is not really a natural outgrowth of the idea of a working set. It is, rather, a somewhat arbitrary hodge-podge of independent ideas, one of which is dimly connected to the idea of a working set. All the same it's worth teaching because: However, there's not much point in memorizing the details of this algorithm.

0. Let P be the process currently running. Records for active pages for P are kept in a circular list, as in the clock algorithm. In each record, there is an M bit and an R bit set by hardware at each memory reference, as in NRU. At each clock tick the R bit is reset to 0, as in NRU.

1. Each page record has a field for storing the time of the most recent reference. At each clock tick, the current cumulative CPU time for P is stored in the record of each page where the R bit is 1. Thus, the time field is an approximation to the time of the most recent reference, accurate to within one clock period. It is therefore similar to the information needed in the LRU strategy, but it does not require the imaginary hardware of having a clock time recorded at every memory reference. This is the most elegant idea in the WSClock algorithm.

2. One might think, now, that we would continue to approximate the LRU strategy by choosing the page with the earliest time field to replace. But instead, we do as follows: There is a fixed parameter Tau, and any page whose latest reference is older than Tau is considered equally a candidate for replacement. The presumption is that pages older than Tau are not in the working set. The OS designer works on tuning Tau so that this is more or less true in practice. This is where the influence of the idea of the working set comes into the algorithm.

The advantage of doing this is that you don't have to search through all the active pages to find the earliest time stamp; you can stop when you find one older than Tau.

If no pages have a reference older than Tau, then the page with the earliest time field is chosen for replacement.

3. We prefer, of course, to replace a clean page rather than a dirty page, because clean pages don't have to be copied out. So we search until we find a clean page that is older than Tau, if there is one; if not, we use a dirty page older than Tau.

4. On the other hand, if we encounter a dirty page that is older than Tau, then we may as well copy it out, on the presumption that it's not part of the working set, and it will have to be written out sooner or later in any case. Suppose we've decided to write out old dirty pages D1 through Dk and to replace old clean page C with new page N. Then we ask the disk driver to schedule these k writes and 1 read. We certainly have to block P until N is completely read in, but there's no need to block P to wait for the writes of D1 through Dk. These can go on concurrently with P. (As we shall see, the order in which disk requests are taken is up to the disk driver, which has its own agenda.) Presumably, though Tanenbaum doesn't specifically say, the M bit of each of D1 through Dk is set to 0 once that page has been successfully written out.

Similarly, suppose we can't find an old clean page, so we've decided to write out old dirty pages D1 through Dk and to replace old dirty page D0 with new page N. Then P has to block until D0 has been written out and N has been read in, but does not have to block for D1 ... Dk.

5. However, we don't want to completely tie up the disk driver with this stuff, so we set a limit on the number k of dirty pages to be written out at once.

6. Finally, as in the clock algorithm, we keep the page table entries in a circular list. Each time we start searching circularly at the current value of the list pointer. As soon as we encounter an old clean page we stop. If there is no old clean page, we use an old dirty page. If there are no pages older than Tau, then we use the oldest page, clean or dirty. At the next search, we continue on from where the list pointer left off. I don't see that this circular structure buys you a whole lot, but presumably it prevents a situation where you always start by searching over the most frequently used pages, or it achieves some kind of fairness, or adds a soupcon of the FIFO strategy, or something.

That's the algorithm.
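The search in steps 2 through 6 can be sketched roughly as follows. This is one reading of the description above, not Tanenbaum's exact algorithm. The record layout, the names `PageRecord` and `wsclock_choose`, the treatment of a page with R = 1 as "not old", and the use of the first old dirty page as the fallback victim are all assumptions of the sketch. It also assumes `max_writes` is at least 1.

```python
from dataclasses import dataclass

@dataclass
class PageRecord:
    page: int
    r: int       # referenced since the last clock tick
    m: int       # modified (dirty)
    time: int    # approximate time of last reference, in CPU time of P

def wsclock_choose(records, hand, now, tau, max_writes):
    """Return (victim_index, indices_scheduled_for_write, new_hand)."""
    n = len(records)
    writes = []                               # old dirty pages to write back
    for step in range(n):                     # at most one full revolution
        i = (hand + step) % n
        rec = records[i]
        old = rec.r == 0 and now - rec.time > tau
        if not old:
            continue
        if rec.m == 0:
            return i, writes, (i + 1) % n     # old clean page: stop here
        if len(writes) < max_writes:          # step 5: cap the number of writes
            writes.append(i)
    if writes:
        victim = writes[0]                    # no old clean page: use an old dirty one
    else:
        # Nothing older than tau: take the earliest time field, preferring
        # pages not referenced during the current tick.
        victim = min(range(n), key=lambda i: (records[i].r, records[i].time))
    return victim, writes, (victim + 1) % n
```

The caller would block P for the read of the new page (and for the write of the victim, if it is dirty), while the other scheduled writes proceed in the background.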

4.4.10: Summary of Page Replacement Algorithms

Algorithm     | Comment
==============+=================================================
Random        | Poor, used for comparison
Optimal       | Unimplementable, used for comparison
LIFO          | Horrible, useless
NRU           | Crude
FIFO          | Not good, ignores frequency of use
Second Chance | Improvement over FIFO
Clock         | Better (natural) implementation of Second Chance
LRU           | Great but impractical
NFU           | Crude LRU approximation
Aging         | Better LRU approximation
Working Set   | Good, but expensive
WSClock       | Good approximation to working set