Class 4
CS372H 26 January 2012

On the board
------------

1. Last time

2. Virtual memory reinforcement

3. TLBs

4. Page faults
        --mechanics
        --uses
        --costs

5. Page replacement

potentially useful reference:

        4KB    = 2^{12} = 0x00001000  = 0x00000fff + 1
        4MB    = 2^{22} = 0x00400000  = 0x003fffff + 1
        256 MB = 2^{28} = 0x10000000  = 0x0fffffff + 1
        4GB    = 2^{32} = 0x100000000 = 0xffffffff + 1  (wraps to 0x00000000 in 32 bits)

        (0xef800000 >> 22) = 0x3be = 958
        (0xf0000000 >> 22) = 0x3c0 = 960

---------------------------------------------------------------------------

1. Last time

    --varargs, PC emulation, segmentation, paging

    --make sure you perfectly understand the pseudocode in yesterday's notes

    --also make sure you understand the 4MB contiguous page table trick;
      the notes step you through it

2. Virtual memory reinforcement

    A. What does the processor's MMU do? What algorithm does it follow?

    B. KEY IDEA IN VIRTUAL MEMORY:

        **OS inserts appropriate entries in the page directory and page
        table, and then the program, or the OS itself, can reference the
        address**

        --the above is a powerful thing. it amounts to the ability to
          manufacture (and remove) opaque handles on the fly, just by
          inserting and removing entries in the mapping.

        --the program itself can make such requests implicitly (as it
          page faults, which we'll discuss shortly) or explicitly (via
          mmap, which can be told to fail if it can't create a particular
          entry in virtual space).

        --certainly, the OS can proactively set up the virtual address
          space for the program

        --example #1 of the KEY IDEA: if the OS wants a program to be
          able to use address 0x00402000 to refer to physical address
          0x0a370000, but in a read-only way, then the OS, conceptually
          speaking, creates an entry <0x00402000, 0x0a370000>. (A sketch
          of how such an entry could be installed appears at the end of
          this section, after part C.) That mapping is implemented like
          this:

               PGDIR                      PAGE TABLE
                                       <20 bits>    <12 bits>
            ............             .........................
            [entry 2]                |   a370   |    W=0    |   [entry 2]
            [entry 1] ---------->    |          |           |   [entry 1]
            [entry 0]                |__________|___________|   [entry 0]
            ............             .........................

        --example #1a: what if we wanted to change the physical address
          pointed to?

        --example #2 of the KEY IDEA: JOS itself maps physical memory at
          the top of the virtual address space (ask yourself how this
          works)

            --see handout: everything above KERNBASE

            --conclude: any physical memory that is in use by user
              processes is actually mapped in multiple places

            --why does the kernel do this? because the kernel needs to be
              able to get access to physical memory when setting up page
              tables: the *kernel* has to be able to use physical
              addresses from time to time. common use: setting up page
              directories and page tables

        --example #3 of the KEY IDEA: UVPT. this is the virtual address
          where the entire page structure appears to user-level
          processes.

            --see handout: UVPT

            --see notes from last time about how this is implemented

            --if you truly understand how and why this implementation
              trick works -- and what it's accomplishing -- then you
              probably understand the important pieces of virtual memory.

    C. Always remember

        --each entry in the page *directory* corresponds to [that is,
          "helps map"] 4MB of virtual address space

        --each entry in the page *table* corresponds to [that is, "helps
          map"] 4KB of virtual address space

        --so how much virtual memory is each page *table* responsible for
          translating? 4KB? 4MB? something else?

        --each page directory and each page table itself consumes 4KB of
          physical memory, i.e., each one of these fits on a page
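    [aside: the following is a minimal C sketch of example #1 above -- how
    an OS could install the read-only mapping <0x00402000 --> 0x0a370000>
    in an x86 two-level page structure. The helper names
    (alloc_ptable_page, phys_to_kernel_va) and the exact flag macros are
    assumptions made up for illustration; they are not JOS's real
    interface.]

        #include <stdint.h>

        #define PTE_P 0x001   /* present */
        #define PTE_W 0x002   /* writable */
        #define PTE_U 0x004   /* user-accessible */

        typedef uint32_t pte_t;

        /* assumed: returns the *physical* address of a fresh, zeroed 4KB page */
        extern uint32_t alloc_ptable_page(void);
        /* assumed: lets the kernel dereference a physical address
           (e.g., via the mapping of all physical memory above KERNBASE) */
        extern pte_t *phys_to_kernel_va(uint32_t pa);

        void map_readonly(pte_t *pgdir, uint32_t va, uint32_t pa)
        {
            uint32_t pdx = va >> 22;           /* top 10 bits: page directory index */
            uint32_t ptx = (va >> 12) & 0x3ff; /* next 10 bits: page table index */

            if (!(pgdir[pdx] & PTE_P))         /* no page table for this 4MB region yet */
                pgdir[pdx] = alloc_ptable_page() | PTE_P | PTE_W | PTE_U;

            pte_t *pgtable = phys_to_kernel_va(pgdir[pdx] & ~0xfffu);

            /* present and user-visible, but W=0: reads succeed, writes fault */
            pgtable[ptx] = (pa & ~0xfffu) | PTE_P | PTE_U;
        }

        /* for example #1:  map_readonly(pgdir, 0x00402000, 0x0a370000);
           here pdx == 1 and ptx == 2, matching the picture above */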
    D. virtual memory in JOS

        --paging limits a process's memory access to its own address
          space

        --see handout for JOS virtual memory map

        --why are the kernel and the current process both mapped into the
          address space?

            --convenient for the kernel

        --we mentioned before that all of physical memory is mapped at
          the top.

        --wouldn't it be awesome if the 4MB worth of page table appeared
          inside the virtual address space, at address, say, 0xef800000
          (which we call UVPT)?

        --to see how the above is implemented, see the notes from last
          time.

3. TLBs

    --so it looks like the CPU (specifically its MMU) has to go out to
      memory on every memory reference?

        --called "walking the page tables"

        --to make this fast, we need a cache

    --TLB: translation lookaside buffer

        hardware that stores [virtual address --> physical address]
        translations; it is the reason that all of this page table
        walking does not slow the process down too much

    --hardware managed?

    --software managed? (MIPS. the OS's job is to load the TLB when the
      OS receives a "TLB miss". Not the same thing as a page fault.)

    --what happens to the TLB when %cr3 is loaded? [answer: flushed]

    --can we flush individual entries in the TLB otherwise?

        INVLPG addr

    --how does stuff get into the TLB?

        --answer: hardware populates it

    --Questions:

        --Does a TLB miss imply a page fault?

        --Does the existence of a page fault imply that there was a TLB
          miss?

4. Page faults

    A. Page faults: mechanics

        --what happens if the address isn't in the page table or there is
          a protection violation? [page fault!]

        --NOTE: TLB MISS != PAGE FAULT

            --not all TLB misses generate page faults, and not all page
              faults began with TLB misses

              (on a store instruction, when might an appropriate entry be
              in the TLB but there is still a page fault? what about on a
              load instruction?)

        --what happens on the x86? [see handout]

            --the processor pushes a trap frame onto the kernel's stack:

                            ss
                            esp           [former value of stack pointer]
                            eflags        [former value of eflags]
                            cs
                 %esp -->   eip           [instruction that caused the trap]
                            [error code]

              %eip is now executing code to handle the trap
              [how did the processor know what to load into %eip?]

            --error code:

                 [ ................................| U/S | W/R | P ]
                              unused

                 U/S: fault occurred in supervisor mode (0) or user mode (1)
                 W/R: the faulting access was a read (0) or a write (1)
                 P:   the page was not present (0) or there was a
                      protection violation on a present page (1)

            --on a page fault, %cr2 holds the faulting linear address

        --the idea is that when a page fault happens, the kernel sets up
          the process's page table entries properly, or else kills the
          process (a sketch of such a handler appears at the end of part
          B, below)

    B. Page faults: uses

        --exhibit A for the use of paging is virtual memory:

            --your program thinks it has, say, 512 MB of memory, but your
              hardware has only 4 MB of memory

            --the way that this worked is that the disk was (is) used to
              store memory pages

            --advantage: address space looks huge

            --disadvantage: accesses to "paged" memory (as pages that
              live on the disk are known) are sllooooowwwww

            --the implementation of this is described in Tanenbaum 3.6.
              Roughly:

                --on a page fault, the kernel reads in the faulting page

                --QUESTION: what is listed in the page structures? how
                  does the kernel know whether the address is invalid, in
                  memory, paged out, what?

                --this is called demand paging, and it's one way to get
                  program code into memory "lazily"

                --the kernel may need to send a page to disk (under what
                  conditions? answer: two conditions must hold for the
                  kernel to HAVE to write to disk)

                    (1) the kernel is out of memory

                    (2) the page that it selects to write out is dirty

            --Many 32-bit machines have 4GB of memory, so it is less
              common to hear the sound of swapping these days; you either
              need 36-bit addressing and memory hogs, or multiple large
              memory consumers running on the same computer.
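    [aside: a hedged C sketch of a page-fault handler skeleton, tying the
    mechanics in part A to demand paging above. Reading %cr2 and the
    meaning of the error-code bits are real x86 behavior; everything else
    (the helper names, how the kernel keeps its bookkeeping) is an
    assumption made up for illustration.]

        #include <stdint.h>

        #define FEC_P 0x1   /* 0 = not-present page, 1 = protection violation */
        #define FEC_W 0x2   /* 0 = faulting access was a read, 1 = a write */
        #define FEC_U 0x4   /* 0 = fault in supervisor mode, 1 = in user mode */

        static inline uint32_t read_cr2(void)
        {
            uint32_t va;
            __asm__ volatile("movl %%cr2, %0" : "=r"(va));
            return va;
        }

        /* assumed helpers, standing in for the kernel's real bookkeeping */
        extern int  page_is_on_disk(uint32_t va);
        extern void read_page_from_disk(uint32_t va);   /* demand paging */
        extern void kill_current_process(void);

        void page_fault_handler(uint32_t err)
        {
            uint32_t fault_va = read_cr2();   /* the faulting linear address */

            if (!(err & FEC_U)) {
                /* the kernel itself faulted: almost certainly a kernel bug */
                /* panic(...) */
            } else if (!(err & FEC_P) && page_is_on_disk(fault_va)) {
                /* page not present but legitimately paged out: bring it in,
                   fix up the page table entry, and return so the faulting
                   instruction retries the access */
                read_page_from_disk(fault_va);
            } else {
                /* invalid address or a protection violation the kernel
                   does not want to repair: kill the process */
                kill_current_process();
            }
        }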
        --many, many other uses for page faults and virtual memory

            --high-level idea: by giving the kernel (or even a user-level
              program) the opportunity to do interesting things on page
              faults, you can build interesting functionality:

            --store memory pages across the network! (Distributed Shared
              Memory)

                --the basic idea was that on a page fault, the page fault
                  handler went and retrieved the needed page from some
                  other machine

            --copy-on-write (a sketch appears at the end of these notes)

                --when creating a copy of another process, don't copy its
                  memory. just copy its page tables, and mark the pages
                  as read-only

                --QUESTION: do you need to mark the parent's pages as
                  read-only as well?

                --program semantics aren't violated when programs do
                  reads

                --when a write happens, a page fault results. at that
                  point, the kernel allocates a new page, copies the
                  memory over, and restarts the user program so it can do
                  the write

                --thus, memory is copied only when there is a fault as a
                  result of a write

                --this idea is all over the place

            --accounting

                --good way to sample what percentage of memory pages are
                  written to in any time slice: mark a fraction of them
                  not present, and see how often you get faults

            --if you are interested in this, check out the paper "Virtual
              Memory Primitives for User Programs", by Andrew W. Appel
              and Kai Li, Proc. ASPLOS, 1991.
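    [aside: a hedged C sketch of the copy-on-write fault handling
    described above. The PTE_COW bit (one of the PTE bits the hardware
    leaves free for the OS) and the helper functions are assumptions for
    illustration, not any particular kernel's interface.]

        #include <stdint.h>
        #include <string.h>

        #define PGSIZE  4096
        #define PTE_P   0x001
        #define PTE_W   0x002
        #define PTE_U   0x004
        #define PTE_COW 0x800   /* assumed: a software-defined "copy-on-write" bit */

        typedef uint32_t pte_t;

        /* assumed helpers */
        extern void    *alloc_page(void);           /* fresh 4KB page, kernel virtual addr */
        extern void    *pte_to_kva(pte_t pte);      /* kernel virtual addr of the mapped page */
        extern uint32_t kva_to_phys(void *kva);     /* kernel virtual addr -> physical addr */
        extern void     tlb_invalidate(uint32_t va);   /* e.g., INVLPG */

        /* returns 0 if the fault was handled, -1 if the process should be killed */
        int handle_cow_fault(pte_t *pte, uint32_t fault_va, int was_write)
        {
            if (!was_write || !(*pte & PTE_COW))
                return -1;                  /* not a COW fault: a real violation */

            /* allocate a private copy of the shared, read-only page */
            void *newpage = alloc_page();
            memcpy(newpage, pte_to_kva(*pte), PGSIZE);

            /* remap the same virtual page to the new physical page, now
               writable and no longer marked copy-on-write */
            *pte = kva_to_phys(newpage) | PTE_P | PTE_U | PTE_W;

            /* the stale translation must be flushed before the user program
               is restarted at the faulting instruction */
            tlb_invalidate(fault_va);
            return 0;
        }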