NOTE: These notes are by Allan Gottlieb, and are
reproduced here, with superficial modifications, with his permission.
"I" in this text generally refers to Prof. Gottlieb, except
in regards to administrative matters.
================ Start Lecture #11
Page replacement with multi-level page tables
A level-2 page in a multi-level page table is not replaced until all
the pages it points to are out of memory.
Paging with multiple processes
Each process needs a separate page table, conceptually part of its
process table entry. The page table for the current process is kept in a
specific array in kernel space and is saved and reloaded at each context switch.
4.6: Design issues for (demand) Paging Systems
4.6.1: Local vs Global Allocation Policies
A local PRA is one in which a victim page is chosen
from among the pages of the same process that requires a new page. That is,
the number of frames allocated to each process is fixed, and LRU means the
page least recently used by this process.
- Of course we can't have a purely local policy. Why?
Answer: A new process has no pages and even if we didn't apply this for
the first page loaded, the process would remain with only one page.
- Perhaps wait until a process has been running a while or give
the process an initial allocation based on the size of the executable.
- A global policy is one in which the choice of
victim is made among all pages of all processes.
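The difference between the two policies amounts to which candidate set
the victim is drawn from. A toy sketch (made-up data and names):

```python
# Each resident page records its owner and the time of its last reference.
pages = [
    # (owner_pid, virtual_page, last_reference_time)
    ("A", 0, 100),
    ("A", 1, 5),    # A's least recently used page
    ("B", 0, 2),    # least recently used page overall
    ("B", 1, 90),
]

def lru_victim(resident, faulting_pid, policy):
    if policy == "local":
        # consider only the faulting process's own pages
        candidates = [p for p in resident if p[0] == faulting_pid]
    else:  # "global": any process's page may be evicted
        candidates = resident
    return min(candidates, key=lambda p: p[2])

print(lru_victim(pages, "A", "local"))   # ('A', 1, 5)
print(lru_victim(pages, "A", "global"))  # ('B', 0, 2)
```

Under the local policy A evicts its own page even though B owns the
least recently used page in memory.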
If we apply global LRU indiscriminately with some sort of RR processor
scheduling policy, and memory is somewhat over-committed, then by the
time we get around to a process, all the others have run and have
probably paged out this process.
If this happens each process will need to page fault at a high
rate; this is called thrashing.
It is therefore important to get a good
idea of how many pages a process needs, so that we can balance the
local and global desires. The working set W(t,ω), the set of pages the
process referenced during the last ω memory references as of time t, is
a good measure of this need.
An approximation to the working set policy that is useful for
determining how many frames a process needs (but not which pages)
is the Page Fault Frequency (PFF) algorithm.
- For each process keep track of the page fault frequency, which
is the number of faults divided by the number of references.
- Actually, must use a window or a weighted calculation since
you are really interested in the recent page fault frequency.
- If the PFF is too high, allocate more frames to this process.
- Raise its number of frames and use a local policy; or
- Bar its frames from eviction (for a while) and use a global policy.
- What if there are not enough frames?
Answer: Reduce the MPL (see next section).
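A minimal sketch of the PFF idea, assuming a sliding window of recent
references; the window size and the thresholds are invented tuning
parameters, not values from the text:

```python
from collections import deque

class PFFController:
    def __init__(self, window=100, high=0.05, low=0.01, frames=4):
        self.recent = deque(maxlen=window)   # 1 = fault, 0 = hit
        self.high, self.low = high, low
        self.frames = frames                 # current frame allocation

    def record(self, faulted):
        self.recent.append(1 if faulted else 0)
        rate = sum(self.recent) / len(self.recent)   # recent fault frequency
        if rate > self.high:
            self.frames += 1       # faulting too often: grant a frame
        elif rate < self.low and self.frames > 1:
            self.frames -= 1       # barely faulting: reclaim a frame

ctl = PFFController()
for _ in range(20):
    ctl.record(faulted=True)   # a burst of faults...
print(ctl.frames)              # ...grows the allocation: 24
```

Note that PFF decides only how many frames a process gets; which pages
occupy them is left to the replacement algorithm.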
4.6.2: Load Control
As mentioned above, the question arises of what to do if the sum of the
working set sizes exceeds the amount of physical memory available.
This question is similar to the final point about PFF and brings us to
consider controlling the load (or memory pressure).
4.6.3: Page size
- Page size ``must'' be a multiple of the disk block size. Why?
Answer: When copying out a page if you have a partial disk block, you
must do a read/modify/write (i.e., 2 I/Os).
- An important property of I/O, which we will study later this term, is
that eight I/Os of 1KB each take considerably longer than one 8KB I/O.
- Characteristics of a large page size.
- Good for user I/O.
- If I/O is done using physical addresses, then I/O crossing a
page boundary is not contiguous and hence requires multiple I/Os.
- If I/O uses virtual addresses, then page size doesn't affect
this aspect of I/O. That is, the addresses are contiguous
in virtual memory and hence one I/O is done.
- Good for demand paging I/O.
- Better to swap in/out one big page than several small ones.
- But if page is too big you will be swapping in data that is
really not local and hence might well not be used.
- Large internal fragmentation (1/2 page size).
- Small page table.
- A very large page size leads to very few pages. A process will
have many faults under demand paging if it frequently references
more regions than it has frames.
- A small page size has the opposite characteristics.
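These opposing pressures (internal fragmentation favors small pages,
page-table size favors large ones) can be made quantitative with the
usual back-of-the-envelope model: per-process overhead is roughly
s*e/p + p/2, where s is the process size, e the bytes per page-table
entry, and p the page size; calculus gives the minimizing p = sqrt(2se).
A sketch with made-up numbers:

```python
import math

def overhead(s, e, p):
    # page-table space for s/p entries of e bytes, plus half a page
    # of internal fragmentation on average
    return s * e / p + p / 2

s = 1 << 20          # a 1 MB process
e = 8                # 8-byte page-table entries
best = math.sqrt(2 * s * e)
print(round(best))   # 4096: a 4KB page minimizes the combined overhead
for p in (1024, 4096, 16384):
    print(p, overhead(s, e, p))
```

Pages both four times smaller and four times larger than the optimum
give the same (roughly doubled) overhead, which is why the curve is
fairly flat near the minimum.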
4.7: Implementation Issues
4.7.1: Operating System Involvement with Paging
4.7.2: Page Fault Handling
What happens when a process, say process A, gets a page fault?
- The hardware detects the fault and traps to the kernel (switches
to supervisor mode and saves state).
- Some assembly language code saves more state, establishes the
C-language (or another programming language) environment, and
``calls'' the OS.
- The OS determines that a page fault occurred and which page was needed.
- If the virtual address is invalid, process A is killed.
If the virtual address is valid, the OS must find a free frame.
If there are no free frames, the OS selects a victim frame.
Call the process owning the victim frame, process B.
(If the page replacement algorithm is local process B is process A.)
- If the victim frame is dirty, the OS schedules an I/O write to
copy the frame to disk.
Thus, if the victim frame is dirty, process B is
blocked (it might already be blocked for some other reason).
Process A is also blocked since it needs to wait for this frame to be free.
The process scheduler is invoked to perform a context switch.
- Tanenbaum ``forgot'' some steps here.
- The process selected by the scheduler (say process C) runs.
- Perhaps C is preempted for D or perhaps C blocks and D runs
and then perhaps D is blocked and E runs, etc.
- When the I/O to write the victim frame completes, a Disk
interrupt occurs. Assume process C is running at the time.
- Hardware trap / assembly code / OS determines I/O done.
- Process B is moved from blocked to ready
(unless B is also blocked for some other reason).
- The scheduler picks a process to run, maybe A, maybe B, maybe
C, maybe another process.
- At some point the scheduler does pick process A to run.
Recall that at this point A is still executing OS code.
- Now the O/S has a clean frame (this may be much later in wall clock
time if a victim frame had to be written).
The O/S schedules an I/O to read the desired page into this clean frame.
Process A is blocked (perhaps for the second time) and hence the
process scheduler is invoked to perform a context switch.
- A Disk interrupt occurs when the I/O completes (trap / asm / OS
determines I/O done). The PTE is updated.
- The O/S may need to fix up process A (e.g. reset the program
counter to re-execute the instruction that caused the page fault).
- Process A is placed on the ready list and eventually is chosen by
the scheduler to run.
Recall that process A is executing O/S code.
- The OS returns to the first assembly language routine.
- The assembly language routine restores registers, etc. and
``returns'' to user mode.
Process A is unaware that all this happened.
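The core decisions in the sequence above can be compressed into a toy
handler. All names are made up, the blocking I/O waits are reduced to
stub functions (a real handler blocks and context-switches at those
points, as described above), and a single page table is assumed for
simplicity, i.e. a local policy where B is A:

```python
def pick_victim(frame_table):
    """Toy replacement policy: evict the lowest-numbered frame."""
    return min(frame_table)

def write_to_disk(frame):        # stand-in for the victim write-back I/O
    pass

def read_from_disk(vpn, frame):  # stand-in for the page read-in I/O
    pass

def handle_page_fault(vpn, page_table, free_frames, frame_table, valid_pages):
    if vpn not in valid_pages:
        return "kill process"              # invalid virtual address
    if free_frames:
        frame = free_frames.pop()
    else:
        frame = pick_victim(frame_table)
        victim_vpn, dirty = frame_table[frame]
        if dirty:
            write_to_disk(frame)           # faulting process blocks here
        page_table[victim_vpn] = None      # invalidate the victim's PTE
    read_from_disk(vpn, frame)             # ...and blocks here too
    page_table[vpn] = frame
    frame_table[frame] = (vpn, False)      # newly loaded page starts clean
    return "restart faulting instruction"

pt, ft, free = {0: None, 1: None}, {}, [5]
print(handle_page_fault(0, pt, free, ft, valid_pages={0, 1}))
print(handle_page_fault(1, pt, free, ft, valid_pages={0, 1}))
print(pt)   # {0: None, 1: 5}: page 0 was evicted to make room for page 1
```

The second fault finds no free frame, so the page brought in by the
first fault becomes the victim; since it is clean, no write-back I/O
is needed before the read-in.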
4.7.3: Instruction Backup
A cute horror story: the 68000 was so bad in this regard that
early demand paging systems for the 68000 used two processors, one
running one instruction behind the other. If the first got a page fault,
there wasn't always enough information to figure out what to do, so
after servicing the fault the system switched to the second processor.
Don't worry about instruction backup. Very machine dependent and
modern implementations tend to get it right. The next generation
machine, 68010, provided extra information on the stack so the
horrible 2-processor kludge was no longer necessary.
4.7.4: Locking (Pinning) Pages in Memory
We discussed pinning jobs already. The
same (mostly I/O) considerations apply to pages.
4.7.5: Backing Store
The issue is where on disk we put pages.
- For program text, which is presumably read only, a good choice is
the file itself.
- What if we decide to keep the data and stack each contiguous on
the backing store?
Data and stack grow, so we must be prepared to grow the space on
disk, which leads to the same issues and problems as we saw with MVT.
- If those issues/problems are painful, we can scatter the pages on disk.
- That is, we employ paging!
- This is NOT demand paging.
- Need a table to say where the backing space for each page is located.
- This corresponds to the page table used to tell where in
real memory a page is located.
- The format of the ``memory page table'' is determined by
the hardware since the hardware modifies/accesses it.
- The format of the ``disk page table'' is decided by the OS
designers and is machine independent.
- If the format of the memory page table was flexible, then
we might well keep the disk information in it as well.
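A sketch of the two tables side by side (hypothetical layout; the real
memory page table's format is fixed by the hardware, while the disk
page table is whatever structure the OS designers choose):

```python
memory_page_table = {0: 12, 1: None, 2: 7}       # vpn -> frame (None = not resident)
disk_page_table   = {0: 9041, 1: 9042, 2: 9043}  # vpn -> disk block of backing copy

def locate(vpn):
    """Say where the current copy of a page can be found."""
    frame = memory_page_table.get(vpn)
    if frame is not None:
        return ("memory", frame)
    return ("disk", disk_page_table[vpn])

print(locate(0))  # ('memory', 12)
print(locate(1))  # ('disk', 9042)
```

Every valid page has a disk location, but only resident pages have a
frame; a page fault consults the disk table to find what to read in.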
Suspending: Taking an active process entirely out of
memory and saving its state on disk.
Resuming: Loading a suspended process from disk into memory.
A suspended process may be blocked (waiting for some other event to
complete) or it may be suspended and ready (i.e. ready to run
as soon as it is swapped in.)
In an OS that uses variable-length partitions, swapping can be done
when some waiting process can't fit in memory, especially when
the memory requirements of all active processes together exceed total
memory. In an OS that uses paging, swapping can be done when the
working set sizes of the active processes are together larger than
physical memory, so that keeping all processes active leads to thrashing.
Unlike previous similar issues, such as short-term scheduling and
page replacement, I am not going to give a list of a half-dozen
simple swapping algorithms. Swapping is more complicated and
doesn't lend itself to simple algorithms. (Also, it is done
comparatively infrequently, so one can afford to spend time on a complicated
algorithm.) Rather, there are a number of considerations that must be weighed.
First, note that there are two decisions to be made: which,
if any, jobs to swap in and which, if any, to swap out. With variable
length partitioning, these are always combined. The only reason to swap
one job out is to swap another in. You can only swap a job in if
either an active job terminates or if you swap some other job out.
With a paging system, it may become desirable to swap one job out and not
bring any in, if the working sets of the active jobs grow; or it may
become feasible to swap a job in without any other job terminating or
being suspended, if the working sets shrink.
Criteria for choosing jobs to swap in / out
The bottom line is that if you have to do a lot of swapping, you're in trouble.
- Fairness. You want to play fair among the various jobs. Debatable
whether you want jobs to have equal time in memory; or equal time waiting
suspended; or equal time running the CPU; etc.
- Size. It's always easier and less costly to swap in a small job.
In choosing a job to swap out, there's a trade-off: Swapping out a large
job takes longer (to write to disk); on the other hand, it clears up
more space that you can use for other jobs. In particular, swapping
in a large job may require swapping out several small jobs.
- Blockedness. If the job is blocked in any case waiting for some
event that may be slow, why not swap it out?
- Real-time/interactive. If a job is supposed to run in real time
or if it is supposed to be interactive, it had better not be swapped out.
- Page faulting: If all the processes are page-faulting a lot, then
you had better swap one or more of them out.