Operating Systems
================ Start Lecture #10 ================
Notes:
-
I will be out of town next week. The class WILL
be held as usual.
-
Lab #4 (the last lab) is on the web.
It is due in three weeks.
The extra week is not because it is harder than the others; it
isn't.
The reason is that I won't be here next week to answer questions
about the lab.
-
Is anyone graduating this semester (conflict between 2250 final
exam time and convocation)?
4.4.9: The WSClock Page Replacement Algorithm
This treatment is based on one by Prof. Ernie Davis.
Tanenbaum suggests that the WSClock Page
Replacement Algorithm is a natural outgrowth of the idea of a working set.
However, reality is less clear cut.
WSClock actually embodies several ideas,
one of which is connected to the idea of a working set.
As the name suggests, another of the ideas is the clock implementation
of second chance.
The actual implemented algorithm is somewhat complicated and not a
clean elegant concept.
It is important because
-
It works well and is in common use.
-
The embodied ideas are themselves interesting.
-
Inelegant amalgamations of ideas are more commonly used in real
systems than clean, elegant, one-idea algorithms.
Since the algorithm is complicated we present it in stages.
As stated above this is an important algorithm since it works well and
is used in practice. However, I certainly do not assume you remember
all the details.
-
We start by associating a node with every page loaded in memory
(i.e., with every frame given to this process).
In the node are stored R and M bits that we assume are set by the
hardware.
(Of course we don't design the hardware so really the R and M bits
are set in a hardware defined table and the nodes reference the
entries in that table.)
Every k clock ticks the R bit is reset.
So far this looks like NRU.
To ease the explanation we will assume k=1, i.e., actions
are done each clock tick.
-
We now introduce an LRU aspect (with the virtual time
approximation described above for working set): At each clock
tick we examine all the nodes for the running process and store
the current virtual time in all nodes for which R is 1.
Thus, the time field is an approximation to the time of the
most recent reference, accurate to the clock period. Note that
this is done every clock tick (really every k ticks) and
not every memory reference. That is why it is feasible.
If we chose as victim the page with the smallest time field, we
would be implementing a virtual time approximation to LRU.
But in fact we do more.
-
We now introduce some working set aspects into the algorithm by
first defining a time constant τ (analogous to ω in the
working set algorithm) and consider all pages older than τ
(i.e., their stored time is smaller than the current time minus
τ) as candidate victims.
The idea is that these pages are not in the working set.
The OS designer needs to tune τ just as one would need to
tune ω and, like ω, τ is quite robust (the same
value works well for a variety of job mixes).
The advantage of introducing τ is that a victim search can
stop as soon as a page older than τ is found.
If no pages have a reference time older than τ, then the page
with the earliest time is the victim.
-
Next we introduce the other aspect of NRU, preferring clean to
dirty victims.
We search until we find a clean page older than τ, if
there is one; if not, we use a dirty page older than τ.
-
Now we introduce an optimization similar to prefetching (i.e.,
speculatively fetching some data before it is known to be needed).
Specifically, when we encounter a dirty page older than τ
(while looking for a clean old page), we write the dirty page back
to disk (and clear the M bit, which Tanenbaum forgot to mention)
without evicting the page, on the
presumption that, since the page is not in (our approximation to)
the working set, this I/O will be needed eventually.
The down side is that the page could become dirty again, rendering
our speculative I/O redundant.
Suppose we've decided to write out old dirty pages
D1 through Dd and to replace old clean page
C with new page N.
We must block the current process P until N is completely read
in, but P can run while D1 through Dd are
being written. Hence we would desire the I/O read to be done
before the writes, but we shall see later, when we study I/O, that
there are other considerations for choosing the order to perform
I/O operations.
Similarly, suppose we cannot find an old clean page and have
decided to replace old dirty page D0 with new page N,
and have detected additional old dirty pages D1 through
Dd (recall that we were searching for an old clean
page). Then P must block until D0 has been written
and N has been read, but can run while D1 through
Dd are being written.
-
We throttle the previous optimization to prevent overloading the
I/O subsystem.
Specifically we set a limit on the number of dirty pages the
previous optimization can request be written.
-
Finally, as in the clock algorithm, we keep the data structure
(nodes associated with pages) organized as a circular list with a
single pointer (the hand of the clock).
Hence we start each victim search where the previous one left
off.
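The victim search just described can be sketched in a few lines of Python. This is a toy illustration, not real kernel code: the Node layout, the max_writes throttle value, and the function names are all invented for the example, and the per-tick time-stamp updating is folded into the scan.

```python
# Toy sketch of the WSClock victim search; the Node layout, max_writes
# throttle, and all names here are invented for this illustration.

class Node:
    def __init__(self, page):
        self.page = page
        self.R = 0      # referenced bit (set by hardware on a real machine)
        self.M = 0      # modified (dirty) bit
        self.t = 0      # virtual time of the most recent reference

def wsclock_victim(nodes, hand, now, tau, max_writes=4):
    """Scan the circular list starting at `hand`; return the victim's index,
    the new hand position, and the old dirty pages whose speculative
    write-back was scheduled along the way."""
    writes = []                     # speculative write-backs (throttled)
    n = len(nodes)
    oldest = hand                   # fallback: page with the earliest time
    for i in range(n):
        j = (hand + i) % n
        node = nodes[j]
        if node.R:                  # referenced recently: second chance
            node.R = 0
            node.t = now            # (done for every R=1 page each tick)
            continue
        if now - node.t > tau:      # outside the (approximate) working set
            if node.M == 0:
                return j, (j + 1) % n, writes   # clean old page: victim
            if len(writes) < max_writes:
                writes.append(node.page)        # schedule the write-back
                node.M = 0                      # ...and clear M
        if node.t < nodes[oldest].t:
            oldest = j
    # No clean page older than tau was found: evict the oldest page.
    return oldest, (oldest + 1) % n, writes
```

Note how the three ingredients show up: the R-bit scan is the clock/second-chance part, the `now - node.t > tau` test is the working-set part, and `max_writes` is the I/O throttle.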
4.4.10: Summary of Page Replacement Algorithms
Algorithm | Comment
---|---
Random | Poor; used for comparison
Optimal | Unimplementable; used for comparison
LIFO | Horrible, useless
NRU | Crude
FIFO | Not good; ignores frequency of use
Second Chance | Improvement over FIFO
Clock | Better implementation of Second Chance
LRU | Great but impractical
NFU | Crude LRU approximation
Aging | Better LRU approximation
Working Set | Good, but expensive
WSClock | Good approximation to working set
4.5: Modeling Paging Algorithms
4.5.1: Belady's anomaly
Consider a system that has no pages loaded and that uses the FIFO
PRA.
Consider the following “reference string” (sequence of
pages referenced).
0 1 2 3 0 1 4 0 1 2 3 4
If we have 3 frames this generates 9 page faults (do it).
If we have 4 frames this generates 10 page faults (do it).
Theory has been developed showing that certain PRAs (so-called “stack
algorithms”) cannot suffer this anomaly for any reference string.
FIFO is clearly not a stack algorithm. LRU is. Tanenbaum has a few
details, but we are skipping them.
Repeat the above calculations for LRU.
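The fault counts are easy to check mechanically. The following sketch (the harness is our own, not from the text) simulates FIFO on the reference string above, and includes an LRU simulator you can use to do the calculation yourself:

```python
# Fault counters for FIFO and true LRU page replacement.
from collections import OrderedDict

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement with nframes frames."""
    frames = []                      # queue: front = page loaded longest ago
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)        # evict the oldest-loaded page
            frames.append(page)
    return faults

def lru_faults(refs, nframes):
    """Count page faults under true LRU replacement."""
    frames = OrderedDict()           # keys kept in recency order
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)         # now most recently used
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)   # evict least recently used
            frames[page] = True
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
print(fifo_faults(refs, 3), fifo_faults(refs, 4))   # 9 10 -- the anomaly
```

Running `lru_faults` with 3 and 4 frames answers the LRU question; since LRU is a stack algorithm, adding frames can never increase its fault count.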
4.6: Design issues for (demand) Paging Systems
4.6.1: Local vs Global Allocation Policies
A local PRA is one in which a victim page is
chosen among the pages of the same process that requires a new page.
That is, the number of frames for each process is fixed. So LRU for a
local policy means the page least recently used by this process.
A global policy is one in which the choice of
victim is made among all pages of all processes.
-
Of course we can't have a purely local policy. Why?
Answer: A new process has no pages and even if we didn't apply this for
the first page loaded, the process would remain with only one page.
-
Perhaps wait until a process has been running a while or give
the process an initial allocation based on the size of the executable.
If we apply global LRU indiscriminately with some sort of RR processor
scheduling policy, and memory is somewhat over-committed, then by the
time we get around to a process, all the others have run and have
probably paged out this process.
If this happens each process will need to page fault at a high
rate; this is called thrashing.
It is therefore important to get a good
idea of how many pages a process needs, so that we can balance the
local and global desires. The working set size w(t,ω) is good for
this.
An approximation to the working set policy that is useful for
determining how many frames a process needs (but not which pages)
is the Page Fault Frequency (PFF) algorithm.
-
For each process keep track of the page fault frequency, which
is the number of faults divided by the number of references.
-
Actually, one must use a window or a weighted calculation, since
you are really interested in the recent page fault frequency.
-
If the PFF is too high, allocate more frames to this process.
Either
-
Raise its number of frames and use a local policy; or
-
Bar its frames from eviction (for a while) and use a
global policy.
-
What if there are not enough frames?
Answer: Reduce the MPL (see next section).
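The PFF bookkeeping can be sketched as follows. The window size and the high/low thresholds are made-up constants for illustration (the text does not give values); a real system would tune them.

```python
# Toy Page Fault Frequency (PFF) monitor; all constants are invented.
from collections import deque

class PFF:
    def __init__(self, window=1000, high=0.05, low=0.005):
        self.recent = deque(maxlen=window)   # 1 = fault, 0 = hit
        self.high, self.low = high, low

    def record(self, was_fault):
        """Call on every memory reference of this process."""
        self.recent.append(1 if was_fault else 0)

    def advice(self):
        """Return +1 (give the process more frames), -1 (take some away),
        or 0 (leave the allocation alone)."""
        if not self.recent:
            return 0
        rate = sum(self.recent) / len(self.recent)   # recent fault frequency
        if rate > self.high:
            return +1
        if rate < self.low:
            return -1
        return 0
```

The `deque(maxlen=...)` gives the sliding window mentioned above: old references fall off automatically, so the rate reflects only recent behavior.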
As mentioned above, a question arises: what to do if the sum of the
working set sizes exceeds the amount of physical memory available?
This question is similar to the final point about PFF and brings us to
consider controlling the load (or memory pressure).
4.6.2: Load Control
To reduce the overall memory pressure, we must reduce the
multiprogramming level (or install more memory while the system is
running, which is hardly practical). That is, we have a
connection between memory management and process management. These are
the suspend/resume arcs we saw way back when.
4.6.3: Page size
-
Page size “must” be a multiple of the disk block size. Why?
Answer: When copying out a page if you have a partial disk block, you
must do a read/modify/write (i.e., 2 I/Os).
-
An important property of I/O, which we will learn later this term, is
that eight I/Os of 1KB each take considerably longer than one 8KB I/O.
-
Characteristics of a large page size.
-
Good for demand paging I/O.
-
Better to swap in/out one big page than several small
pages.
-
But if page is too big you will be swapping in data that is
really not local and hence might well not be used.
-
Large internal fragmentation (1/2 page size).
-
Small page table.
-
A very large page size leads to very few pages. Process will
have many faults if using demand
paging and the process frequently references more regions than
the number of (large) frames that the process has been allocated.
-
Possibly good for user I/O (unofficial).
-
If I/O done using physical addresses, then an I/O crossing a
page boundary is not contiguous and hence requires multiple
actual I/Os. A large page size makes it less likely that
a single user I/O will span multiple pages.
-
If I/O uses virtual addresses, then page size doesn't affect
this aspect of I/O. That is, the addresses are contiguous
in virtual address space and hence one I/O is done.
-
A small page size has the opposite characteristics.
Homework: Consider a 32-bit address machine using
paging with 8KB pages and 4 byte PTEs. How many bits are used for
the offset and what is the size of the largest page table?
Repeat the question for 128KB pages.
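For calculations of this kind, remember that the offset field has log2(page size) bits and a single-level page table needs one PTE per page of virtual address space. The sketch below works a different example (4KB pages) so as not to give away the homework answers:

```python
# Offset bits and maximum single-level page table size, in bytes.
def page_table_params(addr_bits, page_size, pte_size):
    offset_bits = page_size.bit_length() - 1     # log2 of the page size
    num_pages = 2 ** (addr_bits - offset_bits)   # virtual pages per process
    return offset_bits, num_pages * pte_size

# Worked example -- 4KB pages (NOT the homework configuration):
bits, table_bytes = page_table_params(32, 4 * 1024, 4)
print(bits, table_bytes)   # 12 offset bits, 4MB page table
```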
4.6.4: Separate Instruction and Data (I and D) Spaces
Skipped.
4.6.5: Shared pages
Permit several processes to each have a page loaded in the same
frame.
Of course this can only be done if the processes are using the same
program and/or data.
-
Really should share segments.
-
Must keep reference counts or something so that when a process
terminates, pages (even dirty pages) it shares with another process
are not automatically discarded.
-
Similarly, a reference count would make a widely shared page (correctly)
look like a poor choice for a victim.
-
A good place to store the reference count would be in a structure
pointed to by both PTEs. If stored in the PTEs themselves, we
must somehow keep the counts consistent between processes.
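The reference-count idea can be sketched as follows; the structure and function names are invented for illustration (a real OS would hang the count off the frame table and the PTEs would point into it):

```python
# Toy frame reference counting for shared pages; all names are hypothetical.
class SharedFrame:
    def __init__(self, frame_no):
        self.frame_no = frame_no
        self.refcount = 0      # number of processes mapping this frame

frames = {}                    # frame number -> SharedFrame

def map_shared(pid, frame_no):
    """Record that process pid maps frame_no; bump the count."""
    f = frames.setdefault(frame_no, SharedFrame(frame_no))
    f.refcount += 1
    return f

def unmap_shared(pid, frame_no):
    """Called when pid terminates or unmaps the page. The frame may be
    discarded only when the LAST sharer is gone."""
    f = frames[frame_no]
    f.refcount -= 1
    if f.refcount == 0:
        del frames[frame_no]
        return True            # caller may now reclaim the frame
    return False               # other processes still share it: keep it
```

A victim-selection loop could also consult `refcount` to make a widely shared page look like a poor eviction choice.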
Homework: 33
4.6.6: Cleaning Policy (Paging Daemons)
Done earlier
4.6.7: Virtual Memory Interface
Skipped.
4.7: Implementation Issues
4.7.1: Operating System Involvement with Paging
-
Process creation. OS must guess at the size of the process and
then allocate a page table and a region on disk to hold the pages
that are not memory resident. A few pages of the process must be loaded.
-
Ready→Running transition by the scheduler. Real memory must
be allocated for the page table if the table has been swapped out
(which is permitted when the process is not running). Some
hardware register(s) must be set to point to the page table.
(There can be many page tables resident, but the hardware must be
told the location of the page table for the running process--the
"active" page table.)
-
Page fault. Lots of work. See 4.7.2 just below.
-
Process termination. Free the page table and the disk region for
swapped out pages.
4.7.2: Page Fault Handling
What happens when a process, say process A, gets a page fault?
-
The hardware detects the fault and traps to the kernel (switches
to supervisor mode and saves state).
-
Some assembly language code saves more state, establishes the
C-language (or another programming language) environment, and
“calls” the OS.
-
The OS determines that a page fault occurred and which page was
referenced.
-
If the virtual address is invalid, process A is killed.
If the virtual address is valid, the OS must find a free frame.
If there are no free frames, the OS selects a victim frame.
Call the process owning the victim frame, process B.
(If the page replacement algorithm is local, the victim is process A.)
-
The PTE of the victim page is updated to show that the page is no
longer resident.
-
If the victim page is dirty, the OS schedules an I/O write to
copy the frame to disk and blocks A waiting for this I/O to occur.
-
Assuming process A needed to be blocked (i.e., the victim page is
dirty) the scheduler is invoked to perform a context switch.
-
Tanenbaum “forgot” some here.
-
The process selected by the scheduler (say process C) runs.
-
Perhaps C is preempted for D or perhaps C blocks and D runs
and then perhaps D is blocked and E runs, etc.
-
When the I/O to write the victim frame completes, a disk
interrupt occurs. Assume process C is running at the time.
-
Hardware trap / assembly code / OS determines I/O done.
-
The scheduler marks A as ready.
-
The scheduler picks a process to run, maybe A, maybe B, maybe
C, maybe another process.
-
At some point the scheduler does pick process A to run.
Recall that at this point A is still executing OS code.
-
Now the O/S has a free frame (this may be much later in wall clock
time if a victim frame had to be written).
The O/S schedules an I/O to read the desired page into this free
frame.
Process A is blocked (perhaps for the second time) and hence the
process scheduler is invoked to perform a context switch.
-
Again, another process is selected by the scheduler as above and
eventually a disk interrupt occurs when the I/O completes (trap /
asm / OS determines I/O done). The PTE in process A is updated to
indicate that the page is in memory.
-
The O/S may need to fix up process A (e.g. reset the program
counter to re-execute the instruction that caused the page fault).
-
Process A is placed on the ready list and eventually is chosen by
the scheduler to run.
Recall that process A is executing O/S code.
-
The OS returns to the first assembly language routine.
-
The assembly language routine restores registers, etc. and
“returns” to user mode.
The user's program running as process A is unaware
that all this happened (except for the time delay).
4.7.3: Instruction Backup
A cute horror story. The 68000 was so bad in this regard that
early demand paging systems for the 68000 used two processors, one
running one instruction behind the other. If the first got a page fault,
there wasn't always enough information to figure out what to do, so the
system switched to the second processor when the first took the page fault.
Don't worry about instruction backup. It is very machine dependent, and
modern implementations tend to get it right. The next generation
machine, the 68010, provided extra information on the stack so the
horrible 2-processor kludge was no longer necessary.
4.7.4: Locking (Pinning) Pages in Memory
We discussed pinning jobs already. The
same (mostly I/O) considerations apply to pages.
4.7.5: Backing Store
The issue is where on disk do we put pages.
-
For program text, which is presumably read only, a good choice is
the executable file itself.
-
What if we decide to keep the data and stack each contiguous on
the backing store?
Data and stack grow so we must be prepared to grow the space on
disk, which leads to the same issues and problems as we saw with
MVT.
-
If those issues/problems are painful, we can scatter the pages on
the disk.
-
That is we employ paging!
-
This is NOT demand paging.
-
Need a table to say where the backing space for each page is
located.
-
This corresponds to the page table used to tell where in
real memory a page is located.
-
The format of the “memory page table” is determined by
the hardware since the hardware modifies/accesses it. It
is machine dependent.
-
The format of the “disk page table” is decided by the OS
designers and is machine independent.
-
If the format of the memory page table was flexible, then
we might well keep the disk information in it as well.
But normally the format is not flexible and this
is not done.
-
What if we felt disk space was too expensive and wanted to put
some of these disk pages on say tape?
Answer: We use demand paging of the disk blocks! That way
"unimportant" disk blocks will migrate out to tape and are brought
back in if needed.
Since a tape read requires seconds to complete (because the
request is not likely to be for the sequentially next tape block),
it is crucial that we get very few disk block faults.
Homework: Assume every instruction takes 0.1
microseconds to execute providing it is memory resident. Assume a page
fault takes 10 milliseconds to service providing the necessary disk
block is actually on the disk.
Assume a disk block fault takes 10 seconds to service. So the worst case
time for an instruction is 10.0100001 seconds.
Finally assume the program requires that a billion instructions be
executed.
-
If the program is always completely resident, how long does it
take to execute?
-
If 0.1% of the instructions cause a page fault, but all the disk
blocks are on the disk, how long does the program take to execute
and what percentage of the time is the program waiting for a page
fault to complete?
-
If 0.1% of the instructions cause a page fault and 0.1% of the
page faults cause a disk block fault, how long does the program
take to execute, what percentage of the time is the program
waiting for a disk block fault to complete?
4.7.6: Separation of Policy and Mechanism
Skipped.