================ Start Lecture #11 ================
4.4.7: Simulating (Approximating) LRU in Software
The Not Frequently Used (NFU) PRA
- Include a counter in each PTE (and have R in each PTE).
- Set counter to zero when page is brought into memory.
- Every k clock ticks, for each PTE:
  - Add R to the counter.
  - Clear R.
- Choose as victim the PTE with lowest count.
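A minimal sketch of the NFU bookkeeping in C (the PTE layout, table
size, and all names here are invented for illustration; real PTE
formats are fixed by the hardware):

#include <stdint.h>

#define NPAGES 256          /* assumed number of resident pages */

struct pte {                /* simplified, hypothetical PTE */
    unsigned r : 1;         /* reference bit, set by the hardware */
    uint32_t counter;       /* software counter, zeroed on page-in */
};

struct pte page_table[NPAGES];

/* Run every k clock ticks. */
void nfu_tick(void) {
    for (int i = 0; i < NPAGES; i++) {
        page_table[i].counter += page_table[i].r;  /* add R to counter */
        page_table[i].r = 0;                       /* clear R */
    }
}

/* On a fault, evict the page with the lowest count. */
int nfu_victim(void) {
    int victim = 0;
    for (int i = 1; i < NPAGES; i++)
        if (page_table[i].counter < page_table[victim].counter)
            victim = i;
    return victim;
}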
The Aging PRA
NFU doesn't distinguish between old references and recent ones. The
following modification does distinguish.
- Include a counter in each PTE (and have R in each PTE).
- Set counter to zero when page is brought into memory.
- Every k clock ticks, for each PTE:
  - Shift the counter right one bit.
  - Insert R as the new high order bit (HOB).
  - Clear R.
- Choose as victim the PTE with lowest count.
For example, one page's counter might evolve as follows, one row per
clock tick (R is the value of the reference bit sampled at that tick):

  R | counter
  --+---------
  1 | 10000000
  0 | 01000000
  1 | 10100000
  1 | 11010000
  0 | 01101000
  0 | 00110100
  1 | 10011010
  1 | 11001101
  0 | 01100110
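The change from NFU is one line of bookkeeping. A sketch under the
same (invented) assumptions as the NFU code above, now with an 8-bit
counter:

#include <stdint.h>

#define NPAGES 256          /* assumed number of resident pages */

struct pte {
    unsigned r : 1;         /* reference bit, set by the hardware */
    uint8_t counter;        /* 8-bit aging counter, zeroed on page-in */
};

struct pte page_table[NPAGES];

/* Run every k clock ticks. */
void aging_tick(void) {
    for (int i = 0; i < NPAGES; i++) {
        page_table[i].counter >>= 1;                              /* age old references */
        page_table[i].counter |= (uint8_t)(page_table[i].r << 7); /* R becomes the HOB */
        page_table[i].r = 0;                                      /* clear R */
    }
}

Victim selection is the same as for NFU: the PTE with the lowest
counter.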
Homework: 25, 34
4.4.8: The Working Set Page Replacement Algorithm (Peter Denning)
The working set policy (Peter Denning)
The goal is to specify which pages a given process needs to have
resident in memory in order for the given process to run without too
many page faults.
- But this is impossible since it requires predicting the future.
- So we make the assumption that the immediate future is well
approximated by the immediate past.
- Measure time in units of memory references, so t=1045 means the time
when the 1045th memory reference is issued.
- In fact we measure time separately
for each process, so t=1045 really means the time when this process
made its 1045th memory reference.
- W(t,ω) is the set of pages referenced (by the given process) from
time t-ω to time t.
- That is, W(t,ω) is the set of pages referenced during
the window of size ω ending at time t.
- That is, W(t,ω) is the set of pages referenced by the last
ω memory references ending at reference t.
- W(t,ω) is called the working set at time t
(with window ω).
- w(t,ω) is the size of the set W(t,ω), i.e., the
number of pages referenced in the window.
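As a concrete illustration, here is a sketch that computes w(t,ω)
from a recorded reference string (the array and bound are hypothetical;
a real OS cannot record every reference like this, which is why the
approximations discussed below are needed):

#include <stdbool.h>

#define MAXPAGE 1024   /* assumed: page numbers range over 0 .. MAXPAGE-1 */

/* w(t, omega): the number of distinct pages among references
   ref[t-omega+1] .. ref[t] (0-indexed) of a recorded reference string. */
int wsize(const int ref[], int t, int omega) {
    bool seen[MAXPAGE] = { false };
    int count = 0;
    int start = t - omega + 1;
    if (start < 0)
        start = 0;               /* window extends past the first reference */
    for (int i = start; i <= t; i++)
        if (!seen[ref[i]]) {
            seen[ref[i]] = true; /* page enters W(t, omega) */
            count++;
        }
    return count;
}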
The idea of the working set policy is to ensure that each process
keeps its working set in memory.
- Allocate w(t,ω) frames to each process.
This number differs for each process and changes with time.
- On a fault, one replaces a page not in the working set. But it is
not easy to find such a page quickly.
- Indeed determining W(t,ω) is difficult.
- We will see that the working set algorithm is essentially a
``global policy'' (defined below). I would actually prefer
covering the working set policy after defining local and global
policies but decided to follow Tanenbaum.
Interesting questions include:
- What value should be used for ω?
Experiments have been done and ω is surprisingly robust (i.e.,
for a given system a fixed value works reasonably for a wide variety
of job mixes).
- How should we calculate W(t,ω)?
This is hard to do exactly, so various approximations to the working
set have been devised.
4.4.9: The WSClock Page Replacement Algorithm
- Use the aging algorithm above to maintain a counter for
each PTE and declare a page whose counter is above a certain threshold
to be part of the working set.
- Apply the clock algorithm globally (i.e., to all pages) but
refuse to page out any page in a working set; the resulting algorithm
is called WSClock.
- What if we find there are no pages we can page out?
Simple answer: Pick some page (almost at random).
Another answer: Reduce the multiprogramming level
(explained in 4.6 below).
- Page Fault Frequency (PFF): Described in 4.6 below.
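A rough sketch of such a scan (the frame-table layout, the threshold,
and all names are assumptions for illustration):

#include <stdint.h>

#define NFRAMES 1024       /* assumed number of page frames */
#define WS_THRESHOLD 4     /* assumed: counter above this => in a working set */

struct frame {
    unsigned r : 1;        /* reference bit */
    uint8_t counter;       /* aging counter, maintained as in 4.4.7 */
};

struct frame frames[NFRAMES];
static int hand = 0;       /* the clock hand */

/* Scan the frames clock-wise; return a victim, or -1 if every
   page is protected by its working set. */
int wsclock_victim(void) {
    for (int scanned = 0; scanned < 2 * NFRAMES; scanned++) {
        int slot = hand;
        hand = (hand + 1) % NFRAMES;
        if (frames[slot].r) {
            frames[slot].r = 0;        /* referenced: give a second chance */
        } else if (frames[slot].counter <= WS_THRESHOLD) {
            return slot;               /* not in a working set: evict */
        }
        /* otherwise the page is in a working set; skip it */
    }
    return -1;  /* no evictable page: pick one anyway, or reduce the MPL */
}

The scan runs at most twice around the clock: the first pass may only
be clearing R bits, the second can then find a victim.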
4.4.10: Summary of Page Replacement Algorithms
  Algorithm     | Comment
  --------------+--------------------------------------------------
  Random        | Poor, used for comparison
  Optimal       | Unimplementable, used for comparison
  LIFO          | Horrible, useless
  NRU           | Crude
  FIFO          | Not good; ignores frequency of use
  Second Chance | Improvement over FIFO
  Clock         | Better (natural) implementation of Second Chance
  LRU           | Great but impractical
  NFU           | Crude LRU approximation
  Aging         | Better LRU approximation
  Working Set   | Good, but expensive
  WSClock       | Good approximation to working set
4.5: Modeling Paging Algorithms
4.5.1: Belady's anomaly
Consider a system that has no pages loaded and that uses the FIFO
PRA.
Consider the following ``reference string'' (sequence of pages referenced).
0 1 2 3 0 1 4 0 1 2 3 4
If we have 3 frames this generates 9 page faults (do it).
If we have 4 frames this generates 10 page faults (do it).
Theory has been developed and certain PRAs (so-called ``stack
algorithms'') cannot suffer this anomaly for any reference string.
FIFO is clearly not a stack algorithm. LRU is. Tanenbaum has a few
details, but we are skipping them.
Repeat the above calculations for LRU.
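To check the FIFO counts above, here is a small simulation (the
function name is made up; nframes is assumed to be at most 16):

#include <stdio.h>

/* Count page faults for FIFO with nframes frames on a reference string. */
int fifo_faults(const int *ref, int n, int nframes) {
    int frames[16];                          /* nframes assumed <= 16 */
    int next = 0, loaded = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < loaded; j++)
            if (frames[j] == ref[i]) { hit = 1; break; }
        if (hit)
            continue;
        faults++;
        if (loaded < nframes)
            frames[loaded++] = ref[i];       /* fill an empty frame */
        else {
            frames[next] = ref[i];           /* evict the oldest page */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

int main(void) {
    int ref[] = { 0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4 };
    int n = sizeof ref / sizeof ref[0];
    printf("3 frames: %d faults\n", fifo_faults(ref, n, 3));  /* prints 9 */
    printf("4 frames: %d faults\n", fifo_faults(ref, n, 4));  /* prints 10 */
    return 0;
}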
4.6: Design issues for (demand) Paging Systems
4.6.1: Local vs Global Allocation Policies
A local PRA is one in which a victim page is chosen
among the pages of the same process that requires a new page. That is,
the number of frames for each process is fixed. So LRU means the page
least recently used by this process.
- Of course we can't have a purely local policy. Why?
Answer: A new process has no pages, and even if we didn't apply the
policy to the first page loaded, the process would remain with only
one page.
- Perhaps wait until a process has been running a while or give
the process an initial allocation based on the size of the executable.
- A global policy is one in which the choice of
victim is made among all pages of all processes.
If we apply global LRU indiscriminately with some sort of RR processor
scheduling policy, and memory is somewhat over-committed, then by the
time we get around to a process, all the others have run and have
probably paged out this process.
If this happens each process will need to page fault at a high
rate; this is called thrashing.
It is therefore important to get a good
idea of how many pages a process needs, so that we can balance the
local and global desires. The working set W(t,ω) is good for
this.
An approximation to the working set policy that is useful for
determining how many frames a process needs (but not which pages)
is the Page Fault Frequency (PFF) algorithm.
- For each process keep track of the page fault frequency, which
is the number of faults divided by the number of references.
- Actually, must use a window or a weighted calculation since
you are really interested in the recent page fault frequency.
- If the PFF is too high, allocate more frames to this process.
Either
- Raise its number of frames and use a local policy; or
- Bar its frames from eviction (for a while) and use a
global policy.
- What if there are not enough frames?
Answer: Reduce the MPL (see next section).
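A sketch of one possible windowed PFF calculation (the window size,
the thresholds, and the adjustment policy are invented for
illustration; a real OS cannot count individual references and would
approximate them, e.g., by elapsed process time):

/* Per-process page-fault-frequency bookkeeping over a window. */
#define WINDOW 10000   /* references per measurement window (illustrative) */

struct pff_state {
    long refs;         /* references in the current window */
    long faults;       /* faults in the current window */
};

/* Notionally called once per memory reference; faulted != 0 if it faulted. */
void pff_note(struct pff_state *s, int faulted, int *nframes) {
    s->refs++;
    if (faulted)
        s->faults++;
    if (s->refs < WINDOW)
        return;
    double pff = (double)s->faults / (double)s->refs;
    if (pff > 0.01)
        (*nframes)++;              /* faulting too often: add a frame */
    else if (pff < 0.001 && *nframes > 1)
        (*nframes)--;              /* hardly faulting: reclaim a frame */
    s->refs = s->faults = 0;       /* start a new window */
}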
As mentioned above, the question arises of what to do if the sum of the
working set sizes exceeds the amount of physical memory available.
This question is similar to the final point about PFF and brings us to
consider controlling the load (or memory pressure).
4.6.2: Load Control
To reduce the overall memory pressure, we must reduce the
multiprogramming level (or install more memory while the system is
running, which is hardly practical). That is, we have a
connection between memory management and process management. These are
the suspend/resume arcs we saw way back when.
4.6.3: Page size
- Page size ``must'' be a multiple of the disk block size. Why?
Answer: When copying out a page, if you have a partial disk block, you
must do a read/modify/write (i.e., 2 I/Os).
- An important property of I/O, which we will learn later this term, is
that eight 1KB I/Os take considerably longer than one 8KB I/O.
- Characteristics of a large page size.
- Good for user I/O.
- If I/O done using physical addresses, then I/O crossing a
page boundary is not contiguous and hence requires multiple
I/Os
- If I/O uses virtual addresses, then page size doesn't affect
this aspect of I/O. That is, the addresses are contiguous
in virtual memory and hence one I/O is done.
- Good for demand paging I/O.
- Better to swap in/out one big page than several small
pages.
- But if page is too big you will be swapping in data that is
really not local and hence might well not be used.
- Large internal fragmentation (averaging 1/2 the page size).
- Small page table.
- A very large page size leads to very few pages. A process will
have many faults under demand paging if it frequently references
more regions than it has frames.
- A small page size has the opposite characteristics.
4.6.4: Separate Instruction and Data (I and D) Spaces
Skipped.
4.6.5: Shared pages
Really should share segments.
- Must keep reference counts or something so that when a process
terminates, pages (even dirty pages) it shares with another process
are not automatically discarded.
- Similarly, a reference count would make a widely shared page (correctly)
look like a poor choice for a victim.
- A good place to store the reference count would be in a structure
pointed to by both PTEs. If stored in the PTEs, must keep them
consistent between processes.
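For instance, such a shared structure might look like the following
sketch (the layout and names are hypothetical):

#include <stdint.h>

/* A descriptor shared by every PTE that maps the page, so the count
   stays consistent across processes. */
struct shared_page {
    int ref_count;       /* number of processes mapping this page */
    uint32_t disk_addr;  /* where the backing copy lives on disk */
    unsigned dirty : 1;  /* modified since the last write-back? */
};

/* Called when a process unmaps the page (e.g., terminates).  The frame
   is reclaimed only when the last sharer is gone. */
int unmap_shared(struct shared_page *sp) {
    if (--sp->ref_count > 0)
        return 0;        /* others still share it; keep the page */
    return 1;            /* last sharer: caller may write back and free */
}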
4.6.6: Cleaning Policy (Paging Daemons)
Done earlier
4.6.7: Virtual Memory Interface
Skipped.
4.7: Implementation Issues
4.7.1: Operating System Involvement with Paging
4.7.2: Page Fault Handling
What happens when a process, say process A, gets a page fault?
- The hardware detects the fault and traps to the kernel (switches
to supervisor mode and saves state).
- Some assembly language code saves more state, establishes the
C-language (or another programming language) environment, and
``calls'' the OS.
- The OS determines that a page fault occurred and which page was
referenced.
- If the virtual address is invalid, process A is killed.
If the virtual address is valid, the OS must find a free frame.
If there are no free frames, the OS selects a victim frame.
Call the process owning the victim frame, process B.
(If the page replacement algorithm is local process B is process A.)
- If the victim frame is dirty, the OS schedules an I/O write to
copy the frame to disk.
Thus, if the victim frame is dirty, process B is
blocked (it might already be blocked for some other reason).
Process A is also blocked since it needs to wait for this frame to be free.
The process scheduler is invoked to perform a context switch.
- Tanenbaum ``forgot'' some steps here.
- The process selected by the scheduler (say process C) runs.
- Perhaps C is preempted for D or perhaps C blocks and D runs
and then perhaps D is blocked and E runs, etc.
- When the I/O to write the victim frame completes, a disk
interrupt occurs. Assume process C is running at the time.
- Hardware trap / assembly code / OS determines I/O done.
- Process B is moved from blocked to ready
(unless B is also blocked for some other reason).
- The scheduler picks a process to run, maybe A, maybe B, maybe
C, maybe another process.
- At some point the scheduler does pick process A to run.
Recall that at this point A is still executing OS code.
- Now the O/S has a clean frame (this may be much later in wall clock
time if a victim frame had to be written).
The O/S schedules an I/O to read the desired page into this clean
frame.
Process A is blocked (perhaps for the second time) and hence the
process scheduler is invoked to perform a context switch.
- A disk interrupt occurs when the I/O completes (trap / asm / OS
determines I/O done). The PTE is updated.
- The O/S may need to fix up process A (e.g. reset the program
counter to re-execute the instruction that caused the page fault).
- Process A is placed on the ready list and eventually is chosen by
the scheduler to run.
Recall that process A is executing O/S code.
- The OS returns to the first assembly language routine.
- The assembly language routine restores registers, etc. and
``returns'' to user mode.
Process A is unaware that all this happened.
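The whole path can be summarized in code. The following toy
walk-through compiles and runs in user space; every function here is a
stand-in for kernel work, not a real kernel API:

#include <stdio.h>
#include <stdbool.h>

typedef int frame_t;
enum { NO_FRAME = -1 };

static bool valid(int va)              { return va >= 0; }
static frame_t get_free_frame(void)    { return NO_FRAME; }  /* memory is full */
static frame_t choose_victim(void)     { return 7; }         /* the PRA's pick */
static bool is_dirty(frame_t f)        { (void)f; return true; }
static void write_out(frame_t f)       { printf("write frame %d to disk (A and B block)\n", f); }
static void read_in(frame_t f, int va) { printf("read page for va %d into frame %d (A blocks)\n", va, f); }

void handle_page_fault(int va) {
    if (!valid(va)) {                 /* invalid address: kill process A */
        printf("kill the process\n");
        return;
    }
    frame_t f = get_free_frame();
    if (f == NO_FRAME) {
        f = choose_victim();          /* may belong to another process B */
        if (is_dirty(f))
            write_out(f);             /* the scheduler runs others meanwhile */
    }
    read_in(f, va);
    /* update the PTE, fix up A (e.g., reset the PC), mark A ready */
    printf("map va %d -> frame %d; A resumes, unaware of all this\n", va, f);
}

int main(void) { handle_page_fault(42); return 0; }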
4.7.3: Instruction Backup
A cute horror story. The 68000 was so bad in this regard that
early demand paging systems for the 68000 used two processors, one
running one instruction behind. If the first got a page fault, there
wasn't always enough information to figure out what to do, so the
system switched to the second processor, which had not yet executed
the faulting instruction.
Don't worry about instruction backup. It is very machine dependent and
modern implementations tend to get it right. The next generation
machine, the 68010, provided extra information on the stack so the
horrible two-processor kludge was no longer necessary.
4.7.4: Locking (Pinning) Pages in Memory
We discussed pinning jobs already. The
same (mostly I/O) considerations apply to pages.
4.7.5: Backing Store
The issue is where on disk to put pages.
- For program text, which is presumably read only, a good choice is
the file itself.
- What if we decide to keep the data and stack each contiguous on
the backing store?
Since data and stack grow, we must be prepared to grow the space on
disk, which leads to the same issues and problems as we saw with MVT.
- If those issues/problems are painful, we can scatter the pages on
the disk.
- That is we employ paging!
- This is NOT demand paging.
- Need a table to say where the backing space for each page is
located.
- This corresponds to the page table used to tell where in
real memory a page is located.
- The format of the ``memory page table'' is determined by
the hardware since the hardware modifies/accesses it.
- The format of the ``disk page table'' is decided by the OS
designers and is machine independent.
- If the format of the memory page table were flexible, then
we might well keep the disk information in it as well.
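A sketch of such a ``disk page table'' (the layout and sizes are
assumptions; the point is that the OS, not the hardware, picks this
format):

#include <stdint.h>

#define NVPAGES 1024                /* assumed virtual pages per process */

/* One entry per virtual page: where its backing copy lives. */
struct disk_map_entry {
    uint32_t disk_block;            /* block number on the paging device */
    unsigned allocated : 1;         /* backing space assigned yet? */
};

struct disk_map {
    struct disk_map_entry entry[NVPAGES];
};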
4.7.6: Separation of Policy and Mechanism
Skipped.
4.8: Segmentation
Up to now, the virtual address space has been contiguous.
- Among other issues, this makes memory management difficult when
there are more than two dynamically growing regions.
- With two regions you start them on opposite sides of the virtual
space as we did before.
- Better is to have many virtual address spaces each starting at
zero.
- This split up is user visible.
- Without segmentation (equivalently, with just one segment), all
procedures are packed together, so if one changes in size all the
virtual addresses following it are changed and the program must be re-linked.
- Eases flexible protection and sharing (share a segment). For
example, can have a shared library.
Homework: 37.
** Two Segments
Late PDP-10s and TOPS-10
- One shared text segment, that can also contain shared
(normally read only) data.
- One (private) writable data segment.
- Permission bits on each segment.
- Which kind of segment is better to evict?
  - Swapping out the shared segment hurts many tasks.
  - The shared segment is (probably) read only, so no write-back
    is needed.
- ``One segment'' is OS/MVT done above.
** Three Segments
Traditional (early) Unix.
- Shared text marked execute only.
- Data segment (global and static variables).
- Stack segment (automatic variables).
- (In reality, since the text doesn't grow, this was sometimes
treated as 2 segments.)
** Four Segments
Just kidding.