Operating Systems

================ Start Lecture #11 ================

4.5: Modeling Paging Algorithms

4.5.1: Belady's anomaly

Consider a system that has no pages loaded and that uses the FIFO PRU.
Consider the following “reference string” (sequences of pages referenced).

 0 1 2 3 0 1 4 0 1 2 3 4

If we have 3 frames this generates 9 page faults (do it).

If we have 4 frames this generates 10 page faults (do it).

Theory has been developed and certain PRA (so called “stack algorithms”) cannot suffer this anomaly for any reference string. FIFO is clearly not a stack algorithm. LRU is. Tannenbaum has a few details, but we are skipping it.

Repeat the above calculations for LRU.

4.6: Design issues for (demand) Paging Systems

4.6.1: Local vs Global Allocation Policies

A local PRA is one is which a victim page is chosen among the pages of the same process that requires a new page. That is the number of pages for each process is fixed. So LRU for a local policy means the page least recently used by this process. A global policy is one in which the choice of victim is made among all pages of all processes.

If we apply global LRU indiscriminately with some sort of RR processor scheduling policy, and memory is somewhat over-committed, then by the time we get around to a process, all the others have run and have probably paged out this process.

If this happens each process will need to page fault at a high rate; this is called thrashing.

It is therefore important to get a good idea of how many pages a process needs, so that we can balance the local and global desires. The working set size w(t,ω) is good for this.

An approximation to the working set policy that is useful for determining how many frames a process needs (but not which pages) is the Page Fault Frequency (PFF) algorithm.

As mentioned above a question arises what to do if the sum of the working set sizes exceeds the amount of physical memory available. This question is similar to the final point about PFF and brings us to consider controlling the load (or memory pressure).




4.6.2: Load Control

To reduce the overall memory pressure, we must reduce the multiprogramming level (or install more memory while the system is running, which is hardly practical). That is, we have a connection between memory management and process management. These are the suspend/resume arcs we saw way back when.

4.6.3: Page size

Homework: Consider a 32-bit address machine using paging with 8KB pages and 4 byte PTEs. How many bits are used for the offset and what is the size of the largest page table? Repeat the question for 128KB pages.

4.6.4: Separate Instruction and Data (I and D) Spaces

Skipped.

4.6.5: Shared pages

Permit several processes to each have a page loaded in the same frame. Of course this can only be done if the processes are using the same program and/or data.

Homework: 33

4.6.6: Cleaning Policy (Paging Daemons)

Done earlier

4.6.7: Virtual Memory Interface

Skipped.

4.7: Implementation Issues

4.7.1: Operating System Involvement with Paging

  1. Process creation. OS must guess at the size of the process and then allocate a page table and a region on disk to hold the pages that are not memory resident. A few pages of the process must be loaded.
  2. Ready→Running transition by the scheduler. Real memory must be allocated for the page table if the table has been swapped out (which is permitted when the process is not running). Some hardware register(s) must be set to point to the page table. (There can be many page tables resident, but the hardware must be told the location of the page table for the running process--the "active" page table.
  3. Page fault. Lots of work. See 4.7.2 just below.
  4. Process termination. Free the page table and the disk region for swapped out pages.

4.7.2: Page Fault Handling

What happens when a process, say process A, gets a page fault?
  1. The hardware detects the fault and traps to the kernel (switches to supervisor mode and saves state).

  2. Some assembly language code save more state, establishes the C-language (or another programming language) environment, and “calls” the OS.

  3. The OS determines that a page fault occurred and which page was referenced.

  4. If the virtual address is invalid, process A is killed. If the virtual address is valid, the OS must find a free frame. If there is no free frames, the OS selects a victim frame. Call the process owning the victim frame, process B. (If the page replacement algorithm is local, the victim is process A.)

  5. The PTE of the victim page is updated to show that the page is no longer resident.

  6. If the victim page is dirty, the OS schedules an I/O write to copy the frame to disk and blocks A waiting for this I/O to occur.

  7. Assuming process A needed to be blocked (i.e., the victim page is dirty) the scheduler is invoked to perform a context switch.
  8. Now the O/S has a free frame (this may be much later in wall clock time if a victim frame had to be written). The O/S schedules an I/O to read the desired page into this free frame. Process A is blocked (perhaps for the second time) and hence the process scheduler is invoked to perform a context switch.

  9. Again, another process is selected by the scheduler as above and eventually a Disk interrupt occurs when the I/O completes (trap / asm / OS determines I/O done). The PTE in process A is updated to indicate that the page is in memory.

  10. The O/S may need to fix up process A (e.g. reset the program counter to re-execute the instruction that caused the page fault).

  11. Process A is placed on the ready list and eventually is chosen by the scheduler to run. Recall that process A is executing O/S code.

  12. The OS returns to the first assembly language routine.

  13. The assembly language routine restores registers, etc. and “returns” to user mode.

The user's program running as process A is unaware that all this happened (except for the time delay).

4.7.3: Instruction Backup

A cute horror story. The 68000 was so bad in this regard that early demand paging systems for the 68000, used two processors one running one instruction behind. If the first got a page fault, there wasn't always enough information to figure out what to do so the system switched to the second processor after it did the page fault. Don't worry about instruction backup. Very machine dependent and modern implementations tend to get it right. The next generation machine, 68010, provided extra information on the stack so the horrible 2-processor kludge was no longer necessary.

4.7.4: Locking (Pinning) Pages in Memory

We discussed pinning jobs already. The same (mostly I/O) considerations apply to pages.

4.7.5: Backing Store

The issue is where on disk do we put pages.

Homework: Assume every instruction takes 0.1 microseconds to execute providing it is memory resident. Assume a page fault takes 10 milliseconds to service providing the necessary disk block is actually on the disk. Assume a disk block fault takes 10 seconds service. So the worst case time for an instruction is 10.0100001 seconds. Finally assume the program requires that a billion instructions be executed.

  1. If the program is always completely resident, how long does it take to execute?
  2. If 0.1% of the instructions cause a page fault, but all the disk blocks are on the disk, how long does the program take to execute and what percentage of the time is the program waiting for a page fault to complete?
  3. If 0.1% of the instructions cause a page fault and 0.1% of the page faults cause a disk block fault, how long does the program take to execute, what percentage of the time is the program waiting for a disk block fault to complete?

4.7.6: Separation of Policy and Mechanism

Skipped.

4.8: Segmentation

Up to now, the virtual address space has been contiguous.

Homework: 37.

** Two Segments

Late PDP-10s and TOPS-10

** Three Segments

Traditional (early) Unix shown at right.

** Four Segments

Just kidding.

** General (not necessarily demand) Segmentation

** Demand Segmentation

Same idea as demand paging, but applied to segments.

The following table mostly from Tanenbaum compares demand paging with demand segmentation.

Consideration Demand
Paging
Demand
Segmentation
Programmer aware NoYes
How many addr spaces 1Many
VA size > PA size YesYes
Protect individual
procedures separately
NoYes
Accommodate elements
with changing sizes
NoYes
Ease user sharing NoYes
Why invented let the VA size
exceed the PA size
Sharing, Protection,
independent addr spaces

Internal fragmentation YesNo, in principle
External fragmentation NoYes
Placement question NoYes
Replacement question YesYes

** 4.8.2 and 4.8.3: Segmentation With (demand) Paging

(Tanenbaum gives two sections to explain the differences between Multics and the Intel Pentium. These notes cover what is common to all segmentation+paging systems).

Combines both segmentation and demand paging to get advantages of both at a cost in complexity. This is very common now.

Although it is possible to combine segmentation with non-demand paging, I do not know of any system that did this.

Homework: 38.

Homework: Consider a 32-bit address machine using paging with 8KB pages and 4 byte PTEs. How many bits are used for the offset and what is the size of the largest page table? Repeat the question for 128KB pages. So far this question has been asked before. Repeat both parts assuming the system also has segmentation with at most 128 segments.

4.9: Research on Memory Management

Skipped

4.10: Summary

Read

Some Last Words on Memory Management