==== Start Lecture #6 ====
A Translation Lookaside Buffer, or TLB, is an associative memory
where the index field is the page number. The other fields include
the frame number, dirty bit, valid bit, and others.
A TLB is small and expensive, but at least it is fast: when the page
number is in the TLB, the frame number is returned with very little
delay.
On a miss, the page number is looked up in the page table. The record
found is placed in the TLB and a victim is discarded. There is no
placement question since all entries are accessed at the same time,
but there is a replacement question.
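A minimal software model of this lookup, as a sketch in Python (real
TLBs compare all entries in parallel in hardware; the dict and the
FIFO victim choice below are assumptions for illustration only):

    from collections import OrderedDict

    class TLB:
        def __init__(self, size, page_table):
            self.size = size
            self.entries = OrderedDict()    # page number -> frame number
            self.page_table = page_table    # backing page table (a dict here)

        def translate(self, page):
            if page in self.entries:        # hit: frame returned quickly
                return self.entries[page]
            frame = self.page_table[page]   # miss: look up the page table
            if len(self.entries) >= self.size:
                self.entries.popitem(last=False)  # discard a victim (FIFO)
            self.entries[page] = frame      # place the record in the TLB
            return frame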
Homework: 15.
3.3.5: Inverted page tables
Keep a table indexed by frame number, with entry f containing the
number of the page currently loaded in frame f.
- Since modern machines have a smaller physical address space than
virtual address space, the table is smaller
- But on a TLB miss, must search the inverted page table.
- Would be hopelessly slow except that some tricks are employed.
- The book mentions some but not all of the tricks; we are skipping
this topic.
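Although we skip the tricks, the basic structure is easy to sketch in
Python. The linear search below is exactly the hopelessly slow part;
hashing the page number is the kind of trick that avoids it:

    # Inverted page table: entry f holds the page currently in frame f.
    def find_frame(inverted_table, page):
        for f, p in enumerate(inverted_table):
            if p == page:
                return f
        return None                          # not resident: page fault

    inverted_table = [7, 3, None, 12]        # 4 frames; frame 1 holds page 3
    assert find_frame(inverted_table, 3) == 1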
3.4: Page Replacement Algorithms
These are solutions to the replacement question.
Good solutions take advantage of locality.
- Temporal locality: If a word is referenced now,
it is likely to be referenced in the near future.
- This argues for caching referenced words, i.e. keeping the
referenced word near the processor for a while
- Spatial locality: If a word is referenced now,
nearby words are likely to be referenced in the near future.
- This argues for prefetching words around the currently
referenced word.
- These are lumped together into locality: If a
page is referenced, it is likely to be referenced in the near future.
- So it is good to bring in the entire page on a miss and to
keep the page in memory for a while.
When programs begin there is no history, so there is nothing to base
locality on. At this point the paging system is said to be undergoing
a ``cold start''.
Programs exhibit ``phase changes'', when the set of pages referenced
changes abruptly (similar to a cold start). At the point of a phase
change, many page faults occur because locality is poor.
Pages belonging to processes that have terminated are of course
perfect choices for victims.
Pages belonging to processes that have been blocked for a long time
are good choices as well.
Random
A lower bound on performance. Any decent scheme should do better.
3.4.1: The optimal page replacement algorithm (opt PRA)
Replace the page whose next reference will be furthest in the future.
- Also called Belady's min algorithm
- Provably optimal. That is, it generates the fewest page faults.
- Unimplementable: Requires predicting the future.
- Good upper bound on performance
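Although unimplementable online, opt is easy to simulate when the
whole reference string is known in advance, which is how it serves as
a benchmark. A sketch in Python:

    # Optimal (Belady's min): evict the resident page whose next
    # reference is furthest in the future; a page never referenced
    # again counts as infinitely far.
    def opt_faults(refs, nframes):
        frames, faults = set(), 0
        for i, page in enumerate(refs):
            if page in frames:
                continue
            faults += 1
            if len(frames) >= nframes:
                def next_use(p):
                    rest = refs[i+1:]
                    return rest.index(p) if p in rest else float('inf')
                frames.remove(max(frames, key=next_use))
            frames.add(page)
        return faults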
3.4.2: The not recently used (NRU) PRA
Divide the frames into four classes and make a random selection from
the lowest nonempty class.
- Not referenced, not modified
- Not referenced, modified
- Referenced, not modified
- Referenced, modified
Assumes that in each PTE there are two extra flags R (sometimes called
U, for used) and M (often called D, for dirty).
Also assumes that a page in a lower-numbered class is cheaper to evict
- If not referenced, probably not referenced again soon so not so
important.
- If not modified, do not have to write it out so the cost of the
eviction is lower
- When a page is brought in, OS resets R and M (i.e. R=M=0)
- On a read, hardware sets R
- On a write, hardware sets R and M
We again have the prisoner problem: we do a good job of making little
ones out of big ones, but not the reverse. We need more resets.
Every k clock ticks, reset all R bits.
- Why not reset M?
Ans: Must have M accurate to know if victim must be written back
- Could have two M bits, one accurate and one reset, but I don't know
of any system (or proposal) that does so.
What if hardware doesn't set these bits?
- OS can use tricks
- When the bits are reset, make the PTE indicate the page is not
resident (i.e. lie). On the page fault, set the appropriate bit(s).
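A sketch of NRU's selection step in Python, assuming each page is
modeled as an object with R and M attributes (the class number is
2R + M, so class 0 is not referenced, not modified):

    import random

    # NRU: make a random selection from the lowest nonempty class.
    def nru_victim(pages):
        for cls in range(4):
            candidates = [p for p in pages if 2 * p.R + p.M == cls]
            if candidates:
                return random.choice(candidates)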
3.4.3: FIFO PRA
Simple but poor since the usage of a page is given no weight.
Belady's Anomaly: Can have more frames yet more
faults.
Example given later.
3.4.4: Second chance PRA
FIFO, but when it is time to choose a victim, if the page at the head
of the queue has been referenced (R bit set), don't evict it; instead
reset R and move the page to the rear of the queue (so it looks new).
The page is being given a second chance.
What if all frames have been referenced?
Ans: It becomes the same as FIFO (but takes longer).
Might want to turn off the R bit more often (every k clock ticks).
3.4.5: Clock PRA
Same algorithm as 2nd chance, but a better (and I would say obvious)
implementation: Use a circular list.
Let's do an example.
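A sketch in Python, assuming the R bits are kept in a list parallel to
the frame list:

    # Clock: a circular list of frames plus a "hand".  On a fault the
    # hand advances; a page with R=1 gets its second chance (R is
    # reset), and the first page found with R=0 is the victim.
    def clock_replace(frames, rbits, hand, newpage):
        while rbits[hand]:
            rbits[hand] = 0              # give this page a second chance
            hand = (hand + 1) % len(frames)
        frames[hand] = newpage           # victim found: overwrite in place
        rbits[hand] = 1                  # the new page counts as referenced
        return (hand + 1) % len(frames)  # hand stops just past the victim

    frames, rbits = [3, 7, 2, 9], [1, 1, 0, 1]
    hand = clock_replace(frames, rbits, 0, 5)  # evicts page 2; hand is now 3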
3.4.6: Least Recently Used (LRU) PRA
When a page fault occurs, choose as victim that page that has been
unused for the longest time, i.e. that has been least recently used.
LRU is definitely
- Implementable: The past is knowable
- Good: Simulation studies bear this out
- Difficult to implement efficiently
- Essentially need to either
  - Keep a time stamp in each PTE, updated on each reference,
    and scan all the PTEs when choosing a victim to find the one
    with the oldest timestamp, or
  - Keep the PTEs in a linked list in usage order, which means
    moving the PTE to the end of the list on each reference.
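A sketch of the second option in Python, using OrderedDict as the
usage-ordered list (an illustration only; a real kernel cannot afford
list manipulation on every memory reference):

    from collections import OrderedDict

    # LRU via a usage-ordered list: a reference moves the page to the
    # rear; the victim is whatever sits at the front.
    class LRU:
        def __init__(self, nframes):
            self.nframes, self.frames = nframes, OrderedDict()

        def reference(self, page):
            if page in self.frames:
                self.frames.move_to_end(page)    # now most recently used
                return False                     # no fault
            if len(self.frames) >= self.nframes:
                self.frames.popitem(last=False)  # evict least recently used
            self.frames[page] = True
            return True                          # page fault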
Homework: 19, 20
A hardware cutsie in Tanenbaum
- For n pages, keep an n×n bit matrix.
- On a reference to page i, set row i to all 1s and then set column i
to all 0s.
- At any time the 1 bits in the rows are ordered by inclusion, i.e.
one row's 1s are a subset of another row's 1s, which is a subset of a
third, and so on. (Tanenbaum forgets to mention this.)
- So the row with the fewest 1s is a subset of all the others and is
hence least recently used.
- Cute, but still impractical.
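A software model of the matrix trick, with each row kept as an n-bit
integer so a whole row or column is updated with bit operations (in
hardware this would all happen in parallel):

    # Matrix LRU: on a reference to page i, set row i to all 1s, then
    # set column i to all 0s.  The row with the smallest value (fewest
    # 1s) belongs to the least recently used page.
    def reference(rows, n, i):
        rows[i] = (1 << n) - 1            # row i: all 1s
        mask = ~(1 << (n - 1 - i))        # clear the bit in column i ...
        for j in range(n):
            rows[j] &= mask               # ... of every row, including i

    def lru_victim(rows):
        return min(range(len(rows)), key=lambda i: rows[i])

    rows, n = [0] * 4, 4
    for page in (0, 1, 2, 3, 2):
        reference(rows, n, page)
    assert lru_victim(rows) == 0          # page 0 is least recently used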
3.4.7: Simulating LRU in Software
The Not Frequently Used (NFU) PRA
- Include a counter in each PTE (and have R in each PTE)
- Set counter to zero when page is brought into memory
- For each PTE, add R to counter every k clock ticks
- Choose as victim the PTE with the lowest count
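A sketch of NFU in Python, modeling PTEs as objects with counter and
R attributes:

    # NFU: every k clock ticks, add R to the counter and reset R.
    def nfu_tick(ptes):
        for pte in ptes:
            pte.counter += pte.R
            pte.R = 0

    def nfu_victim(ptes):
        return min(ptes, key=lambda pte: pte.counter)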
The Aging PRA
NFU doesn't distinguish between old references and recent ones. Modify
NFU so that, for all PTEs, at every k clock ticks
- The counter is shifted right one bit
- R is inserted as the new high order bit (HOB)
R | counter
--+---------
1 | 10000000
0 | 01000000
1 | 10100000
1 | 11010000
0 | 01101000
0 | 00110100
1 | 10011010
1 | 11001101
0 | 01100110
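A sketch of the update step in Python; running it on the R sequence
from the table reproduces the counter column (8-bit counters assumed):

    # Aging: shift the counter right one bit and insert R as the new
    # high order bit, then reset R.
    def aging_tick(ptes, bits=8):
        for pte in ptes:
            pte.counter = (pte.counter >> 1) | (pte.R << (bits - 1))
            pte.R = 0

    class PTE:                             # minimal stand-in for a PTE
        def __init__(self):
            self.counter, self.R = 0, 0

    pte = PTE()
    for r in (1, 0, 1, 1, 0, 0, 1, 1, 0):  # the R column of the table
        pte.R = r
        aging_tick([pte])
        print(format(pte.counter, '08b'))  # prints the counter column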
Homework: 21, 25
3.5: Modeling Paging Algorithms
3.5.1: Belady's anomaly
Consider the following ``reference string'' (sequence of pages
referenced), which is assumed to occur on a system with no pages
loaded initially that uses the FIFO PRA.
0 1 2 3 0 1 4 0 4 1 2 3 4
If we have 3 frames this generates 9 page faults.
If we have 4 frames this generates 10 page faults.
Theory has been developed showing that certain PRAs (so called ``stack
algorithms'') cannot suffer this anomaly for any reference string.
FIFO is clearly not a stack algorithm. LRU is.
Repeat the above for LRU.
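Both experiments take only a short simulation. A sketch in Python; on
this string FIFO gives 9 faults with 3 frames but 10 with 4 (the
anomaly), while LRU gives 10 and then 8 (no anomaly):

    from collections import deque, OrderedDict

    def fifo_faults(refs, nframes):
        frames, queue, faults = set(), deque(), 0
        for page in refs:
            if page in frames:
                continue
            faults += 1
            if len(frames) >= nframes:
                frames.remove(queue.popleft())   # evict the oldest arrival
            frames.add(page)
            queue.append(page)
        return faults

    def lru_faults(refs, nframes):
        frames, faults = OrderedDict(), 0
        for page in refs:
            if page in frames:
                frames.move_to_end(page)         # a hit refreshes recency
                continue
            faults += 1
            if len(frames) >= nframes:
                frames.popitem(last=False)       # evict least recently used
            frames[page] = True
        return faults

    refs = [0, 1, 2, 3, 0, 1, 4, 0, 4, 1, 2, 3, 4]
    print(fifo_faults(refs, 3), fifo_faults(refs, 4))   # 9 10
    print(lru_faults(refs, 3), lru_faults(refs, 4))     # 10 8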
3.6: Design issues for (demand) Paging
3.6.1 & 3.6.2: The Working Set Model and Local vs Global Policies
I will do these in the reverse order (which makes more sense). Also
Tanenbaum doesn't actually define the working set model, but I shall.
A local PRA is one in which a victim page is chosen
among the pages of the same process that requires a new page. That is,
the number of pages each process has in memory is fixed. So LRU means
the page least recently used by this process.
Of course we can't have a purely local policy. Why?
Ans: A new process has no pages and even if we didn't apply this for
the first page loaded, the process would remain with only one page.
Perhaps wait until a process has been running a while.
A global policy is one in which the choice of victim is made among all
pages of all processes.
If we apply global LRU indiscriminately with some sort of RR processor
scheduling policy, and memory is somewhat over-committed, then by the
time we get around to a process, all the others have run and have
probably paged out this process's pages.
If this happens, each process will need to page fault at a high
rate; this is called thrashing.
It would therefore be good to get an
idea of how many pages a process needs, so that we can balance the
local and global desires.