================ Start Lecture #13 ================
3.6: Design issues for (demand) Paging
3.6.1 & 3.6.2: The Working Set Model and Local vs Global Policies
I will do these in the reverse order (which makes more sense). Also
Tanenbaum doesn't actually define the working set model, but I shall.
A local PRA is one in which a victim page is chosen
among the pages of the same process that requires a new page. That is,
the number of frames allocated to each process is fixed. So LRU means the page
least recently used by this process.
- Of course we can't have a purely local policy. Why?
Answer: A new process has no pages and even if we didn't apply this for
the first page loaded, the process would remain with only one page.
- Perhaps wait until a process has been running a while.
- A global policy is one in which the choice of
victim is made among all pages of all processes.
If we apply global LRU indiscriminately with some sort of RR processor
scheduling policy, and memory is somewhat over-committed, then by the
time we get around to a process, all the others have run and have
probably paged out this process.
If this happens, each process will need to page fault at a high
rate; this is called thrashing.
It is therefore important to get a good
idea of how many pages a process needs, so that we can balance the
local and global desires.
The working set policy (Peter Denning)
The goal is to specify which pages a given process needs to have
resident in memory in order for the given process to run without too many
page faults.
- But this is impossible since it requires predicting the future.
- So we make the assumption that the immediate future is well
approximated by the immediate past.
- Measure time in units of memory references, so t=1045 means the time
when the 1045th memory reference is issued.
- In fact we measure time separately
for each process, so t=1045 really means the time when this process
made its 1045th memory reference.
- W(t,ω) is the set of pages referenced (by the given process) from
time t−ω to time t.
- That is, W(t,ω) is the set of pages referenced during
the window of size ω ending at time t.
- That is, W(t,ω) is the set of pages referenced by the last
ω memory references ending at reference t.
- W(t,ω) is called the working set at time t
(with window ω).
- w(t,ω) is the size of the set W(t,ω), i.e., the
number of pages referenced in the window.
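To make the definitions concrete, here is a minimal sketch in C (not
from the notes; the reference string, the MAX_PAGE bound, and the
function name are illustrative) that computes w(t,ω) from a recorded
reference string:

    #include <stdio.h>

    #define MAX_PAGE 64   /* assume page numbers below 64 for the demo */

    /* w(t, omega): number of distinct pages among the last omega
     * references ending at reference t (t is 1-indexed, as above). */
    int wsize(const int refs[], int t, int omega)
    {
        int seen[MAX_PAGE] = {0};
        int count = 0;
        int start = t - omega + 1;
        if (start < 1)
            start = 1;            /* window clipped at first reference */
        for (int i = start; i <= t; i++)
            if (!seen[refs[i - 1]]) {   /* refs[] itself is 0-indexed */
                seen[refs[i - 1]] = 1;
                count++;
            }
        return count;
    }

    int main(void)
    {
        int refs[] = {1, 2, 1, 3, 2, 4, 4, 1};       /* reference string */
        printf("w(8,4) = %d\n", wsize(refs, 8, 4));  /* {2,4,1} -> 3 */
        return 0;
    }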
The idea of the working set policy is to ensure that each process
keeps its working set in memory.
- One possibility is to allocate w(t,ω) frames to each process
(this number differs for each process and changes with time) and then
use a local policy.
- What if there aren't enough frames to do this?
- Reduce the multiprogramming level (MPL)! That is, we have a
connection between memory management and process management. These are
the suspend/resume arcs we saw way back when.
Interesting questions include:
- What value should be used for ω?
- How should we calculate W(t,ω)?
Various approximations to the working set policy have been devised.
- WSClock
- Use the aging algorithm above to maintain a counter for
each PTE and declare a page whose counter is above a certain threshold
to be part of the working set.
- Apply the clock algorithm globally (i.e., to all pages) but
refuse to page out any page in a working set; the resulting algorithm
is called WSClock.
- What if we find there are no pages we can page out?
Answer: Reduce the multiprogramming level (MPL).
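A minimal sketch of one such scan (in C; the fields, the threshold
tau, and all names are illustrative, not from the notes): pages used
within the last tau time units are treated as in the working set and
skipped, old clean pages are evicted, and a fruitless full sweep
signals that the MPL should drop.

    typedef struct {
        int  R;          /* referenced since last scan? */
        int  dirty;
        long last_use;   /* process virtual time of last use */
    } frame_t;

    int ws_victim(frame_t f[], int n, int *hand, long now, long tau)
    {
        for (int scanned = 0; scanned < n; scanned++) {
            frame_t *p = &f[*hand];
            if (p->R) {                   /* in the working set: spare it */
                p->R = 0;
                p->last_use = now;
            } else if (now - p->last_use > tau && !p->dirty) {
                int victim = *hand;       /* old and clean: evict */
                *hand = (*hand + 1) % n;
                return victim;
            }
            /* old but dirty: a real kernel schedules the write here */
            *hand = (*hand + 1) % n;
        }
        return -1;   /* nothing evictable: time to reduce the MPL */
    }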
- Page Fault Frequency (PFF)
- For each process keep track of the page fault frequency, which
is the number of faults divided by the number of references.
- Actually, one must use a window or a weighted calculation, since
it is really the recent page fault frequency that matters.
- If the PFF is too high, allocate more frames to this process.
Either
- Raise its number of frames and use local policy; or
- Bar its frames from eviction (for a while) and use a
global policy.
- What if there are not enough frames?
Answer: Lower the MPL.
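A sketch of such a weighted calculation (in C; alpha and the two
thresholds are made-up illustrative values): an exponential moving
average tracks the recent fault rate and drives the frame allocation
up or down.

    struct proc_mm {
        double pff;     /* exponentially weighted page fault frequency */
        int    frames;  /* frames currently allocated to this process */
    };

    /* Update on every memory reference by the process. */
    void pff_update(struct proc_mm *p, int faulted)
    {
        const double alpha = 0.01;   /* weight of the newest reference */
        p->pff = (1 - alpha) * p->pff + alpha * (faulted ? 1.0 : 0.0);

        if (p->pff > 0.05)
            p->frames++;      /* faulting too often: allocate a frame */
        else if (p->pff < 0.005 && p->frames > 1)
            p->frames--;      /* faulting rarely: reclaim a frame */
        /* if no frames are available anywhere, lower the MPL instead */
    }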
3.6.3: Page size
- Page size ``must'' be a multiple of the disk block size. Why?
Answer: When copying out a page, if you have a partial disk block, you
must do a read/modify/write (i.e., 2 I/Os).
- An important property of I/O, which we will learn later this term, is
that eight 1KB I/Os take considerably longer than one 8KB I/O.
- Large page size
- Good for user I/O
- If I/O done using physical addresses, then I/O crossing a
page boundary is not contiguous and hence requires multiple
I/Os
- If I/O uses virtual addresses, then page size doesn't affect
this aspect of I/O. That is, the addresses are contiguous in the virtual
address space and hence one I/O is done.
- Good for demand paging I/O.
- Better to swap in/out one big page than several small
pages
- But if the page is too big you will be swapping in data that is
not really local and hence might well not be used.
- Large internal fragmentation (on average, 1/2 a page per region).
- Small page table
- Very few pages: the process will have many faults under demand
paging if it frequently references more regions than it has
frames.
- Small page size has the opposite properties
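The competing costs above (page table size versus internal
fragmentation) can be combined in a standard back-of-the-envelope
model, not given in the notes: with process size s, PTE size e, and
page size p, the space overhead is roughly s*e/p + p/2, minimized near
p = sqrt(2*s*e). A small C illustration with made-up numbers:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double s = 1 << 20;   /* process size: 1 MB (illustrative)   */
        double e = 8;         /* page table entry size: 8 bytes      */

        for (int p = 512; p <= 16384; p *= 2)
            printf("p = %5d  overhead = %8.0f bytes\n",
                   p, s * e / p + p / 2.0);
        printf("optimum near p = %.0f\n", sqrt(2 * s * e));
        return 0;
    }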
3.6.4: Implementation Issues
Don't worry about instruction backup. It is very machine dependent, and
modern implementations tend to get it right.
Locking (pinning) pages
We discussed pinning jobs already. The
same (mostly I/O) considerations apply to pages.
Shared pages
Really should share segments
- Must keep reference counts or something so that when a process
terminates, pages (even dirty pages) it shares with another process
are not automatically discarded.
- Similarly a reference count would make a widely shared page (correctly)
look like a poor choice for a victim.
- A good place to store the reference count would be in a structure
pointed to by the PTEs of all the sharing processes.
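A minimal sketch (in C; all names are hypothetical) of such a shared
structure and the check made when a process detaches from the page:

    struct shared_frame {
        int  refcount;      /* number of PTEs mapping this frame */
        int  dirty;
        long frame_number;
    };

    void free_frame(struct shared_frame *sf);   /* hypothetical */

    /* Called when a process unmaps (or exits with) a shared page. */
    void detach(struct shared_frame *sf)
    {
        if (--sf->refcount == 0)
            free_frame(sf);   /* last sharer gone: safe to discard */
        /* a nonzero refcount also marks the page a poor victim */
    }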
Backing Store
The issue is where on disk we put pages.
- For program text, which is presumably read only, a good choice is
the file itself.
- Data and stack grow, so be prepared to grow the space on
disk (same issues as MVT).
Paging Daemons
Done earlier
Page Fault Handling (not on 202 exams)
- Hardware traps to the kernel (switches to supervisor mode; saves
state)
- Assembly language code saves more state, establishes the C-language
environment, and calls the OS.
- OS determines that a fault occurred and which page
- If virtual address is invalid, shoot process. If valid, seek a free
frame. If no free frames, select a victim.
- If the victim frame is dirty, schedule an I/O write to copy the
frame to disk. The process is blocked, so the process scheduler is
invoked to perform a context switch.
- Tanenbaum ``forgot'' some here
- Disk interrupt occurs when I/O complete
- Hardware trap / assembly code / OS determines I/O done
- Process moved from blocked to ready
- Some time later a context switch occurs to this ready
process. Since this process is in kernel mode, perhaps it was
scheduled to run as soon as it was ready. (I am using a
``self-service'' model where the process moves from user mode
to kernel mode.)
- Now the frame is clean (this may be much later in wall clock
time). Schedule an I/O to read the desired page into this clean
frame. The process is again blocked and hence the process scheduler is
invoked to perform a context switch.
- Disk interrupt occurs when the I/O completes (trap / asm / OS
determines I/O is done; process made ready; process starts
running). The PTE is updated.
- Fix up process (e.g. reset PC)
- Process put in ready queue and eventually runs. The OS returns to
the first asm routine.
- Asm routine restores registers, etc. and returns to user mode.
The process is unaware that all this happened.
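The flow above, collapsed into one hypothetical C routine (the
hardware/assembly steps become comments; every type and helper name
here is made up for illustration):

    typedef unsigned long vaddr_t;
    typedef int frame_t;
    #define NO_FRAME (-1)

    extern int     valid(vaddr_t va);
    extern void    kill_current_process(void);
    extern frame_t get_free_frame(void);
    extern frame_t choose_victim(void);
    extern int     is_dirty(frame_t f);
    extern void    write_frame_to_disk(frame_t f);  /* blocks */
    extern void    read_page_from_disk(vaddr_t va, frame_t f);
    extern void    update_pte(vaddr_t va, frame_t f);

    void page_fault(vaddr_t va)
    {
        if (!valid(va)) {
            kill_current_process();   /* invalid address: "shoot" it */
            return;
        }
        frame_t f = get_free_frame();
        if (f == NO_FRAME) {
            f = choose_victim();      /* run the page replacement alg. */
            if (is_dirty(f))
                write_frame_to_disk(f);  /* I/O: block, context switch */
        }
        read_page_from_disk(va, f);   /* I/O: block again */
        update_pte(va, f);            /* set present bit and frame # */
        /* fix up the process (e.g., reset the PC) and return to user
           mode; the faulting instruction re-executes, none the wiser */
    }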
3.7: Segmentation
Up to now, the virtual address space has been contiguous.
- Among other issues this makes it difficult when there are more than
two dynamically growing regions.
- With two regions you start them on opposite sides of the virtual
space as we did before.
- Better is to have many virtual address spaces each starting at
zero.
- This split up is user visible.
- Without segmentation (equivalently, with just one segment), all
procedures are packed together, so if one changes in size all the virtual
addresses following it change and the program must be re-linked.
- Eases flexible protection and sharing (share a segment). For
example, can have a shared library.
The following table, mostly from Tanenbaum, compares demand
paging with demand segmentation.
Consideration                               Demand Paging       Demand Segmentation
------------------------------------------  ------------------  -----------------------
Programmer aware                            No                  Yes
How many addr spaces                        1                   Many
VA size > PA size                           Yes                 Yes
Protect individual procedures separately    No                  Yes
Accommodate elements with changing sizes    No                  Yes
Ease user sharing                           No                  Yes
Why invented                                Let the VA size     Sharing, protection,
                                            exceed the PA size  independent addr spaces
Internal fragmentation                      Yes                 No, in principle
External fragmentation                      No                  Yes
Placement question                          No                  Yes
Replacement question                        Yes                 Yes
Homework: 29.
** Two Segments
Late PDP-10s and TOPS-10
- One shared text segment, that can also contain shared
(normally read only) data.
- One (private) writable data segment
- Permission bits on each segment.
- Which kind of segment is better to evict?
- Swapping out the shared segment hurts many tasks
- The shared segment is read only (probably) so no writeback
is needed.
- ``One segment'' is OS/MVT done above.
** Three Segments
Traditional Unix shown above.
- Shared text execute only
- Data segment (global and static variables)
- Stack segment (automatic variables)
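A tiny C program (illustrative, not from the notes) showing which
segment each kind of variable lands in under this layout:

    #include <stdio.h>

    int global_counter = 0;      /* data segment (global variable) */

    int main(void)               /* the code: shared text segment */
    {
        int local = 42;          /* stack segment (automatic variable) */
        static int calls;        /* data segment (static variable) */
        printf("%d %d %d\n", global_counter, local, ++calls);
        return 0;
    }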
** Four Segments
Just kidding.
** General (not necessarily demand) Segmentation
- Permits fine grained sharing and protection
- Visible division of program
- Variable size segments
- Address = (seg#, offset)
- Does not mandate how stored in memory.
- One possibility is that the entire program must be in memory
to run it. Use whole-process swapping. Early Unix did this.
- Can also implement demand segmentation.
- Can combine with demand paging (done below)
- Segment table with a base and limit value for each segment
- (seg#, offset) --> base address + offset
- External fragmentation as with whole program swapping. Since
segments are smaller than programs (several segments make up one
program), the external fragmentation is not as bad.
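A sketch of the base+limit translation just described (in C; the
names and numbers are illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    struct ste { unsigned base, limit; };  /* one segment table entry */

    unsigned translate(const struct ste st[], int nseg,
                       int seg, unsigned off)
    {
        if (seg >= nseg || off >= st[seg].limit) {
            fprintf(stderr, "segmentation fault\n");
            exit(1);
        }
        return st[seg].base + off;   /* base address + offset */
    }

    int main(void)
    {
        struct ste st[] = { {0x1000, 0x400}, {0x8000, 0x200} };
        printf("PA = 0x%x\n", translate(st, 2, 1, 0x10)); /* 0x8010 */
        return 0;
    }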
** Demand Segmentation
Same idea as demand paging applied to segments
- If segment is loaded, base and limit are stored in processor
registers and used in each memory reference.
- If a segment is not loaded, the base and limit as well as the disk
address of the segment are stored in the STE (segment table entry).
- Generate a segment fault (analogous to a page fault) if a
non-loaded segment is referenced.
- To load a segment we have the placement question in addition to the
replacement question.
- I believe this was once implemented, but I am not sure; it is not done now.
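A sketch of the STE check (in C; the field and function names are
hypothetical):

    struct dste {                 /* demand-segmentation STE */
        int      present;         /* is the segment in memory?     */
        unsigned base, limit;     /* meaningful only when present  */
        long     disk_addr;       /* where to fetch it otherwise   */
    };

    void segment_fault(struct dste *s);  /* hypothetical: placement,
                                            possibly replacement, load */
    void protection_fault(void);         /* hypothetical */

    unsigned seg_access(struct dste *s, unsigned off)
    {
        if (!s->present)
            segment_fault(s);    /* analogous to a page fault */
        if (off >= s->limit)
            protection_fault();
        return s->base + off;
    }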
** 3.7.2: Segmentation with paging
Combines both segmentation and paging to get advantages of both at
a cost in complexity. This is very common now.
- Virtual address is now a triple (seg#, page#, offset)
- Each segment table entry points to the page table for that segment
(similar to multi-level paging).
- The page# field in the address gives the entry in the chosen page
table and the offset gives the offset in the page.
- Instead of a limit field, the STE contains the length of the
segment in pages (which equals the size of the corresponding page
table in PTEs)
- Straightforward implementation requires 3 memory references so a
TLB is crucial.
- It is sometimes said that the segments are of fixed size. This is wrong.
They are of variable size with a fixed maximum.
- The first example of this was Multics
- Keep protection and sharing information on segments
- Do replacement and placement on pages (no external fragmentation)
- Share a segment by sharing the page table for that segment.
- Thus there is no aliasing
- Good thing or eviction would be problematic
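A sketch of the triple translation (in C; the names and sizes are
illustrative): the STE yields the segment's page table and its length
in pages, and the chosen PTE yields the frame.

    #include <stdio.h>

    #define PAGE_SIZE 4096

    struct spte { unsigned *page_table; unsigned npages; };

    /* (seg#, page#, offset) -> physical address.  Without a TLB this
     * costs two extra memory references (STE, then PTE). */
    unsigned translate(const struct spte st[], unsigned nseg,
                       unsigned seg, unsigned page, unsigned off)
    {
        if (seg >= nseg || page >= st[seg].npages)
            return (unsigned)-1;  /* fault: bad segment or past length */
        return st[seg].page_table[page] * PAGE_SIZE + off;
    }

    int main(void)
    {
        unsigned pt0[] = {7, 3};            /* frame numbers */
        struct spte st[] = {{pt0, 2}};
        printf("PA = %u\n",
               translate(st, 1, 0, 1, 0x10));  /* 3*4096 + 16 */
        return 0;
    }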
Homework: 30.
Some last words
- Segmentation / Paging / Demand Loading
- Each is a yes or no alternative
- Gives 8 possibilities
- Placement and Replacement
- Internal and External Fragmentation
- Page Size and locality of reference
- Multiprogramming level and medium term scheduling