Class 6
CS 372H 3 February 2011

On the board
------------
(One handout)

1. Continuing with paging and JOS memory map

2. Other page structures

3. Page faults
    --mechanics
    --uses
    --costs

4KB = 2^{12} = 0x00001000
4MB = 2^{22} = 0x00400000

---------------------------------------------------------------------------

1. Continuing with paging and JOS memory map

--two classes ago: virtual memory on the x86 is implemented first via a
segment translation and second via a paging translation

--last class we discussed:
    --how the processor actually maps from virtual page to physical page
    when a process does a load or store, which in turn determined:
    --what data structures the operating system must set up for the
    processor

--here are some points to reinforce the ideas:

--KEY IDEA IN VIRTUAL MEMORY: **the OS inserts appropriate entries in the
page directory and page table, and then the program, or the OS itself, can
reference the address**

    --the above is a powerful thing. it amounts to the ability to
    manufacture (and remove) opaque handles on the fly, just by inserting
    and removing entries in the mapping.

    --the program itself can make such requests implicitly (as it page
    faults) or explicitly (via mmap, which can be told to fail if it can't
    create a particular entry in virtual space).

    --the OS can certainly proactively set up the virtual address space
    for the program

--example #1 of the KEY IDEA: if the OS wants a program to be able to use
address 0x00402000 to refer to physical address 0x0a370000, but in a
read-only way, then the OS, conceptually speaking, creates an entry
<0x00402000, 0x0a370000>. (A code sketch of this appears at the end of
this section.)

That mapping is implemented like this (0x00402000 has page-directory index
1, page-table index 2, and offset 0):

        PGDIR                               page table
    ............                       <20 bits>   <12 bits>
    [entry 2]                         ........................
    [entry 1] ----------------------> [entry 2] | a370  | W=0 |
    [entry 0]                         [entry 1] |_______|_____|
    ............                      [entry 0]
                                       ........................

--example #2 of the KEY IDEA: recall that JOS itself maps physical memory
at the top of the virtual address space (ask yourself how this works)
    --see handout: everything above KERNBASE
    --conclude: any physical memory that is in use is actually mapped in
    multiple places
    --why does the kernel do this? because the kernel needs to be able to
    get at physical memory when setting up page tables: the *kernel* has
    to be able to use physical addresses from time to time. common use:
    setting up page directories and page tables

--example #3 of the KEY IDEA: VPT and UVPT. these are virtual addresses
where the entire page structure appears to the OS (and, in the case of
UVPT, to user-level processes).
    --see handout: VPT and UVPT
    --see notes from last time about how this is implemented
    --if you truly understand how and why this implementation trick
    works -- and what it's accomplishing -- then you understand the
    important pieces of virtual memory.
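To make example #1 concrete, here is a minimal sketch, in C, of inserting
that read-only mapping into a two-level x86-style page structure. It is
not JOS's pmap.c code: the macros mirror the usual x86/JOS conventions,
but the helpers alloc_zeroed_page() and phys_to_virt() are hypothetical
stand-ins for whatever the kernel actually provides (phys_to_virt() is
possible precisely because of example #2: the kernel maps all physical
memory above KERNBASE).

    /* Sketch: map virtual 0x00402000 to physical 0x0a370000, read-only. */

    #include <stdint.h>

    typedef uint32_t pde_t;                 /* page directory entry */
    typedef uint32_t pte_t;                 /* page table entry */

    #define PTE_P       0x001               /* present */
    #define PTE_W       0x002               /* writable */
    #define PTE_U       0x004               /* user-accessible */

    #define PDX(va)     (((va) >> 22) & 0x3FF)   /* page directory index */
    #define PTX(va)     (((va) >> 12) & 0x3FF)   /* page table index */
    #define PTE_ADDR(e) ((e) & ~0xFFFu)          /* physical frame in entry */

    extern uint32_t alloc_zeroed_page(void);     /* hypothetical helper */
    extern void    *phys_to_virt(uint32_t pa);   /* hypothetical helper */

    /* Insert a mapping va -> pa with permission bits 'perm'. */
    void map_page(pde_t *pgdir, uint32_t va, uint32_t pa, uint32_t perm)
    {
        pde_t *pde = &pgdir[PDX(va)];
        if (!(*pde & PTE_P)) {
            /* no page table yet for this 4MB region: allocate one */
            *pde = alloc_zeroed_page() | PTE_P | PTE_W | PTE_U;
        }
        pte_t *pt = (pte_t *) phys_to_virt(PTE_ADDR(*pde));
        pt[PTX(va)] = (pa & ~0xFFFu) | perm | PTE_P;
    }

    /* Example #1: user-readable, but no PTE_W, so writes will fault. */
    void example1(pde_t *pgdir)
    {
        map_page(pgdir, 0x00402000, 0x0a370000, PTE_U);
    }

The directory entry gets permissive bits and the page table entry carries
the real restriction (no PTE_W here); the hardware combines the two levels
when it checks an access.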
---------------------------------------------------------------------------

Admin notes

--Did everyone get the email about the course mailing list?
--Tomorrow is the deadline for letting us know that you want to code in a pair
--Pair method: we're serious about it
--Good JOS advice from someone who took the class last year: "convince
yourself that your code is working correctly rather than relying on
passing the test cases".

---------------------------------------------------------------------------

2. Other page structures

A. Very large page sizes (e.g., 4 MB)
    --advantage: small page tables
    --disadvantage: lots of wasted memory
    --PSE (set bit 7 in the PDE and get 4MB pages, with no page tables)
    --**there is a trade-off between large page sizes and small page
    sizes**. what is the nature of the trade-off?
        --large page sizes mean wasting actual memory
        --small page sizes mean lots of page table entries (which may or
        may not get used)
    --Tanenbaum gives an equation (section 3.5.3):
        s = size of virtual space used
        e = size of a page table entry
        p = page size
        overhead = se/p + p/2
        d(overhead)/dp = -se/p^2 + 1/2,
        which is zero (and the overhead is minimized) at p = sqrt(2se)

B. Many levels of page table
    --advantage: not much memory spent on page tables if the address space
    is sparse
    --disadvantage: lots of page table walking

C. What happens when memory gets huge?
    --many levels of page table; or
    --inverted page table
        --works as a hash table
        --stores entries mapping virtual page numbers to physical frames,
        looked up by hashing the virtual page number
        [[--NOTE: the book and other references say that this thing has to
        have the same number of entries as the number of physical pages in
        the machine, but that is bogus. That number is neither a useful
        minimum nor a useful maximum. It is not a useful minimum because
        the table has to deal with collisions from the fact that a
        potentially very large number of VPNs map to a much smaller number
        of PPNs (e.g., mapping the same PPN at different places in the
        address space), so the table needs to be able to live with a
        number of entries greater than the number of physical frames
        (i.e., it must handle being oversubscribed). Hence, it could
        presumably have a smaller number of entries than the number of
        physical frames (which is just another kind of oversubscription).
        It is not a useful maximum because in general when one is using
        hash tables, one wants the hash table to be a little bit larger
        than the number of entries that one is storing; adding even a
        little bit of "wiggle room" in the form of blank entries tends to
        reduce collisions a lot. (See Knuth, chapter 6.4.) So it's not at
        all clear how big the inverted page table should be, except that
        the whole point is to be smaller than a traditional page table.
        Thus, one presumably wants it to be O(number of physical pages),
        with a small constant.]]

3A. Page faults: mechanics

--what happens if the address isn't in the page table, or there is a
protection violation? [page fault!]

    --NOTE: TLB MISS != PAGE FAULT
    --not all TLB misses generate page faults, and not all page faults
    begin with TLB misses
    (on a store instruction, when might an appropriate entry be in the TLB
    but there is still a page fault? what about on a load instruction?)

--what happens on the x86?

    [see handout]

    --the processor pushes a trap frame (here, for a fault from user mode)
    onto the kernel's stack:

            ss
            esp          [former value of the stack pointer]
            eflags       [former value of eflags]
            cs
    %esp--> eip          [instruction that caused the trap]
            [error code]

    %eip is now executing code to handle the trap
    [how did the processor know what to load into %eip?]

    error code:

        [ .............................. | U/S | W/R | P ]
                     unused

        U/S: 0 = fault in supervisor mode, 1 = fault in user mode
        W/R: 0 = the access was a read,    1 = the access was a write
        P:   0 = page not present,         1 = protection violation

    on a page fault, %cr2 holds the faulting linear address

    idea is that when a page fault happens, the kernel sets up the
    process's page entries properly, or kills the process
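To connect the error-code format above to code, here is a minimal sketch
of a page-fault handler in the style of a JOS-like kernel. The struct
layout, field names, and the policy helpers are hypothetical (they are not
JOS's actual interfaces); the error-code bits and the use of %cr2 are the
x86 facts described above.

    /* Sketch of a page-fault handler; helpers are hypothetical. */

    #include <stdint.h>

    #define FEC_P   0x1   /* 0 = page not present, 1 = protection violation */
    #define FEC_WR  0x2   /* 0 = read,             1 = write */
    #define FEC_US  0x4   /* 0 = supervisor mode,  1 = user mode */

    struct trap_frame {          /* hypothetical; saved state plus error code */
        uint32_t tf_err;         /* error code pushed by the hardware */
        uint32_t tf_eip;         /* faulting instruction */
        /* ... other saved registers ... */
    };

    static inline uint32_t read_cr2(void)
    {
        uint32_t va;
        __asm__ volatile("movl %%cr2, %0" : "=r" (va));
        return va;               /* %cr2 = faulting linear address */
    }

    /* hypothetical policy helpers */
    void kernel_panic(const char *msg);
    int  va_has_backing(uint32_t va);        /* on disk? COW? stack growth? */
    void fix_up_mapping(uint32_t va, uint32_t err);
    void kill_current_process(void);

    void page_fault_handler(struct trap_frame *tf)
    {
        uint32_t va  = read_cr2();
        uint32_t err = tf->tf_err;

        if (!(err & FEC_US))
            kernel_panic("page fault in the kernel");  /* likely a kernel bug */

        if (va_has_backing(va)) {
            /* demand paging, copy-on-write (if err & FEC_WR), stack growth,
             * etc.: fix the page structures and return; the hardware then
             * retries the faulting instruction at tf->tf_eip. */
            fix_up_mapping(va, err);
            return;
        }

        kill_current_process();              /* genuinely bad address */
    }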
3B. Page faults: uses

--exhibit A for the use of paging is virtual memory:
    --your program thinks it has, say, 512 MB of memory, but the hardware
    has only 4 MB of memory
    --the way that this worked (and works) is that the disk was (is) used
    to store memory pages
    --advantage: the address space looks huge
    --disadvantage: accesses to "paged" memory (as pages that live on the
    disk are known) are sllooooowwwww
    --the implementation of this is described in Tanenbaum 3.6. Roughly:
        --on a page fault, the kernel reads in the faulting page
        --QUESTION: what is listed in the page structures? how does the
        kernel know whether the address is invalid, in memory, paged out,
        what?
        --called demand paging, and it's one way to get program code into
        memory "lazily"
        --the kernel may need to send a page to disk (under what
        conditions? answer: two conditions must hold for the kernel to
        HAVE to write to disk)
            (1) the kernel is out of memory
            (2) the page that it selects to write out is dirty
    --Many 32-bit machines have 4GB of memory, so it's less common to hear
    the sound of swapping these days. You either need 36-bit addressing
    and memory hogs, or multiple large memory consumers running on the
    same computer.

--many, many other uses for page faults and virtual memory
    --high-level idea: by giving the kernel (or even a user-level program)
    the opportunity to do interesting things on page faults, you can build
    interesting functionality:

    --store memory pages across the network! (distributed shared memory)
        --basic idea was that on a page fault, the page fault handler went
        and retrieved the needed page from some other machine

    --copy-on-write (a sketch of this fault path appears at the end of
    these notes)
        --when creating a copy of another process, don't copy its memory.
        just copy its page tables, and mark the pages read-only
        --QUESTION: do you need to mark the parent's pages as read-only as
        well?
        --program semantics aren't violated when programs do reads
        --when a write happens, a page fault results. at that point, the
        kernel allocates a new page, copies the memory over, and restarts
        the user program at the faulting write
        --thus, memory is copied only when there is a fault as a result of
        a write
        --this idea is all over the place

    --accounting
        --good way to sample what percentage of the memory pages are
        written to in any time slice: mark a fraction of them not present,
        and see how often you get faults

    --if you are interested in this, check out the paper "Virtual Memory
    Primitives for User Programs", by Andrew W. Appel and Kai Li, Proc.
    ASPLOS, 1991.

    --Paging in day-to-day use
        --Demand paging
        --Growing the stack
        --BSS page allocation
        --Shared text
        --Shared libraries
        --Shared memory
        --Copy-on-write (fork, mmap, etc.)

3C. Page faults: costs

--What does demand paging (i.e., paging from the disk) cost?
    --let's look at average memory access time (AMAT)
    --AMAT = (1-p)*(memory access time) + p*(page fault time),
    where p is the probability of a page fault
        memory access time ~ 100 ns
        disk access time   ~ 10 ms = 10^7 ns
    --QUESTION: what does p need to be to ensure that paging hurts
    performance by less than 10%?
        1.1*t_M = (1-p)*t_M + p*t_D
        p = 0.1*t_M / (t_D - t_M)
          ~ 10^1 ns / 10^7 ns = 10^{-6}
        so only one access out of 1,000,000 can be a page fault!!
    --basically, page faults are super-expensive (good thing the machine
    can do other things during a page fault)
    (a small program reproducing this arithmetic appears at the end of
    these notes)

--Thrashing is even worse
    Memory is overcommitted -- pages get tossed out while they are still
    needed

    Example:
        --one program touches 50 pages (each equally likely); we have only
        40 physical page frames
        --if we have enough pages, it's 100 ns/ref
        --if we have too few pages, assume every 5th reference leads to a
        page fault
        --that's 4 refs x 100 ns, plus 1 page fault x 10 ms of disk I/O
        --this gets us 5 refs per (10 ms + 400 ns) ~ 2 ms/ref = a 20,000x
        slowdown!!!
        --what we wanted: virtual memory the size of the disk with access
        time the speed of physical memory
        --what we have here: memory with access time roughly that of disk
        (2 ms/mem_ref, compared to 10 ms/disk_access)

Concept is much larger than OSes: need to pay attention to the slow case
if it's really slow and common enough to matter.
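Referenced from 3B above: a sketch of the copy-on-write fault path, in C.
This is illustrative, not any particular kernel's implementation;
pte_of(), alloc_frame(), frame_to_virt(), the refcount helpers, and
tlb_invalidate() are hypothetical stand-ins. The software-defined PTE_COW
flag uses one of the PTE bits that the x86 hardware ignores.

    /* Copy-on-write sketch.
     * fork time: share the frame, mark both PTEs read-only + COW.
     * fault time (write to a COW page): copy the frame, remap writable. */

    #include <stdint.h>
    #include <string.h>

    typedef uint32_t pde_t;
    typedef uint32_t pte_t;

    #define PTE_P       0x001
    #define PTE_W       0x002
    #define PTE_COW     0x200            /* software-defined: ignored by hw */
    #define PTE_ADDR(e) ((e) & ~0xFFFu)
    #define PGSIZE      4096

    /* hypothetical helpers */
    pte_t   *pte_of(pde_t *pgdir, uint32_t va);  /* walk to the PTE for va */
    uint32_t alloc_frame(void);
    void    *frame_to_virt(uint32_t pa);
    void     frame_incref(uint32_t pa);
    void     frame_decref(uint32_t pa);
    void     tlb_invalidate(uint32_t va);

    /* fork time: child shares the parent's frame; both lose write permission
     * (this answers the QUESTION in 3B: yes, the parent's PTE changes too). */
    void cow_share(pde_t *parent, pde_t *child, uint32_t va)
    {
        pte_t   *ppte = pte_of(parent, va);
        uint32_t pa   = PTE_ADDR(*ppte);
        uint32_t perm = (*ppte & 0xFFFu & ~PTE_W) | PTE_COW;

        *ppte = pa | perm;
        *pte_of(child, va) = pa | perm;
        frame_incref(pa);
        tlb_invalidate(va);
    }

    /* fault time: the error code says "write" and the PTE is marked COW */
    void cow_fault(pde_t *pgdir, uint32_t va)
    {
        pte_t   *pte    = pte_of(pgdir, va);
        uint32_t old_pa = PTE_ADDR(*pte);
        uint32_t new_pa = alloc_frame();

        memcpy(frame_to_virt(new_pa), frame_to_virt(old_pa), PGSIZE);
        *pte = new_pa | (*pte & 0xFFFu & ~PTE_COW) | PTE_W | PTE_P;
        frame_decref(old_pa);
        tlb_invalidate(va);
        /* returning from the fault lets the hardware retry the write */
    }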
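Referenced from 3C above: a tiny C program that reproduces the AMAT and
thrashing arithmetic. The constants are the ones assumed in the example
(100 ns memory access, 10 ms disk access), not measurements.

    /* Back-of-the-envelope arithmetic from section 3C. */

    #include <stdio.h>

    int main(void)
    {
        double t_mem  = 100e-9;    /* memory access time: 100 ns */
        double t_disk = 10e-3;     /* page fault (disk) time: 10 ms */

        /* largest p with AMAT = (1-p)*t_mem + p*t_disk <= 1.1*t_mem */
        double p = 0.1 * t_mem / (t_disk - t_mem);
        printf("max fault rate for <10%% slowdown: %.1e\n", p);    /* ~1e-6 */

        /* thrashing: every 5th reference faults */
        double per_ref = (4.0 * t_mem + t_disk) / 5.0;             /* ~2 ms */
        printf("cost per reference when thrashing: %.1e s (~%.0fx slowdown)\n",
               per_ref, per_ref / t_mem);                          /* ~20,000x */
        return 0;
    }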