Class 6
CS 372H 3 February 2011

On the board
------------
(One handout)

1. Continuing with paging and JOS memory map

2. Other page structures

3. Page faults
    --mechanics
    --uses
    --costs

4KB = 2^{12} = 0x00001000
4MB = 2^{22} = 0x00400000

---------------------------------------------------------------------------

1. Continuing with paging and JOS memory map

--two classes ago: virtual memory on the x86 is implemented first via a
segment translation and second via a paging translation

--last class we discussed:
    --how the processor actually maps from virtual page to physical page
    when a process does a load or store, which in turn determined:
    --what data structures the operating system must set up for the
    processor

--here are some points to reinforce the ideas:

--KEY IDEA IN VIRTUAL MEMORY: **the OS inserts appropriate entries in the
page directory and page table, and then the program, or the OS itself, can
reference the address**

    --the above is a powerful thing. it amounts to the ability to
    manufacture (and remove) opaque handles on the fly, just by inserting
    and removing entries in the mapping.

    --the program itself can make such requests implicitly (as it page
    faults) or explicitly (via mmap, which can be told to fail if it can't
    create a particular entry in virtual space).

    --the OS can certainly proactively set up the virtual address space
    for the program

--example #1 of the KEY IDEA: if the OS wants a program to be able to use
address 0x00402000 to refer to physical address 0x0a370000, but in a
read-only way, then the OS, conceptually speaking, creates an entry
<0x00402000, 0x0a370000>. (A code sketch of this appears at the end of
this section.)

That mapping is implemented like this (0x00402000 has page-directory index
1, page-table index 2, and offset 0):

        PGDIR                               page table
    ............                       <20 bits>   <12 bits>
    [entry 2]                         ........................
    [entry 1] ----------------------> [entry 2] | a370  | W=0 |
    [entry 0]                         [entry 1] |_______|_____|
    ............                      [entry 0]
                                       ........................

--example #2 of the KEY IDEA: recall that JOS itself maps physical memory
at the top of the virtual address space (ask yourself how this works)
    --see handout: everything above KERNBASE
    --conclude: any physical memory that is in use is actually mapped in
    multiple places
    --why does the kernel do this? because the kernel needs to be able to
    get at physical memory when setting up page tables: the *kernel* has
    to be able to use physical addresses from time to time. common use:
    setting up page directories and page tables

--example #3 of the KEY IDEA: VPT and UVPT. these are virtual addresses
where the entire page structure appears to the OS (and, in the case of
UVPT, to user-level processes).
    --see handout: VPT and UVPT
    --see notes from last time about how this is implemented
    --if you truly understand how and why this implementation trick
    works -- and what it's accomplishing -- then you understand the
    important pieces of virtual memory.
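To make example #1 concrete, here is a minimal sketch, in C, of inserting
that read-only mapping into a two-level x86-style page structure. It is
not JOS's pmap.c code: the macros mirror the usual x86/JOS conventions,
but the helpers alloc_zeroed_page() and phys_to_virt() are hypothetical
stand-ins for whatever the kernel actually provides (phys_to_virt() is
possible precisely because of example #2: the kernel maps all physical
memory above KERNBASE).

    /* Sketch: map virtual 0x00402000 to physical 0x0a370000, read-only. */

    #include <stdint.h>

    typedef uint32_t pde_t;                 /* page directory entry */
    typedef uint32_t pte_t;                 /* page table entry */

    #define PTE_P       0x001               /* present */
    #define PTE_W       0x002               /* writable */
    #define PTE_U       0x004               /* user-accessible */

    #define PDX(va)     (((va) >> 22) & 0x3FF)   /* page directory index */
    #define PTX(va)     (((va) >> 12) & 0x3FF)   /* page table index */
    #define PTE_ADDR(e) ((e) & ~0xFFFu)          /* physical frame in entry */

    extern uint32_t alloc_zeroed_page(void);     /* hypothetical helper */
    extern void    *phys_to_virt(uint32_t pa);   /* hypothetical helper */

    /* Insert a mapping va -> pa with permission bits 'perm'. */
    void map_page(pde_t *pgdir, uint32_t va, uint32_t pa, uint32_t perm)
    {
        pde_t *pde = &pgdir[PDX(va)];
        if (!(*pde & PTE_P)) {
            /* no page table yet for this 4MB region: allocate one */
            *pde = alloc_zeroed_page() | PTE_P | PTE_W | PTE_U;
        }
        pte_t *pt = (pte_t *) phys_to_virt(PTE_ADDR(*pde));
        pt[PTX(va)] = (pa & ~0xFFFu) | perm | PTE_P;
    }

    /* Example #1: user-readable, but no PTE_W, so writes will fault. */
    void example1(pde_t *pgdir)
    {
        map_page(pgdir, 0x00402000, 0x0a370000, PTE_U);
    }

The directory entry gets permissive bits and the page table entry carries
the real restriction (no PTE_W here); the hardware combines the two levels
when it checks an access.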
---------------------------------------------------------------------------

Admin notes

--Did everyone get the email about the course mailing list?
--Tomorrow is the deadline for letting us know that you want to code in a pair
--Pair method: we're serious about it
--Good JOS advice from someone who took the class last year: "convince
yourself that your code is working correctly rather than relying on
passing the test cases".

---------------------------------------------------------------------------

2. Other page structures

A. Very large page sizes (e.g., 4 MB)
    --advantage: small page tables
    --disadvantage: lots of wasted memory
    --PSE (set bit 7 in the PDE and get 4MB pages, with no page tables)
    --**there is a trade-off between large page sizes and small page
    sizes**. what is the nature of the trade-off?
        --large page sizes mean wasting actual memory
        --small page sizes mean lots of page table entries (which may or
        may not get used)
    --Tanenbaum gives an equation (section 3.5.3):
        s = size of virtual space used
        e = size of a page table entry
        p = page size
        overhead = se/p + p/2
        d(overhead)/dp = -se/p^2 + 1/2,
        which is zero (and the overhead is minimized) at p = sqrt(2se)

B. Many levels of page table
    --advantage: not much memory spent on page tables if the address space
    is sparse
    --disadvantage: lots of page table walking

C. What happens when memory gets huge?
    --many levels of page table; or
    --inverted page table
        --works as a hash table
        --stores entries mapping virtual page numbers to physical frames,
        looked up by hashing the virtual page number
        [[--NOTE: the book and other references say that this thing has to
        have the same number of entries as the number of physical pages in
        the machine, but that is bogus. That number is neither a useful
        minimum nor a useful maximum. It is not a useful minimum because
        the table has to deal with collisions from the fact that a
        potentially very large number of VPNs map to a much smaller number
        of PPNs (e.g., mapping the same PPN at different places in the
        address space), so the table needs to be able to live with a
        number of entries greater than the number of physical frames
        (i.e., it must handle being oversubscribed). Hence, it could
        presumably have a smaller number of entries than the number of
        physical frames (which is just another kind of oversubscription).
        It is not a useful maximum because in general when one is using
        hash tables, one wants the hash table to be a little bit larger
        than the number of entries that one is storing; adding even a
        little bit of "wiggle room" in the form of blank entries tends to
        reduce collisions a lot. (See Knuth, chapter 6.4.) So it's not at
        all clear how big the inverted page table should be, except that
        the whole point is to be smaller than a traditional page table.
        Thus, one presumably wants it to be O(number of physical pages),
        with a small constant.]]

3A. Page faults: mechanics

--what happens if the address isn't in the page table, or there is a
protection violation? [page fault!]

    --NOTE: TLB MISS != PAGE FAULT
    --not all TLB misses generate page faults, and not all page faults
    begin with TLB misses
    (on a store instruction, when might an appropriate entry be in the TLB
    but there is still a page fault? what about on a load instruction?)

--what happens on the x86?

    [see handout]

    --the processor pushes a trap frame (here, for a fault from user mode)
    onto the kernel's stack:

            ss
            esp          [former value of the stack pointer]
            eflags       [former value of eflags]
            cs
    %esp--> eip          [instruction that caused the trap]
            [error code]

    %eip is now executing code to handle the trap
    [how did the processor know what to load into %eip?]

    error code:

        [ .............................. | U/S | W/R | P ]
                     unused

        U/S: 0 = fault in supervisor mode, 1 = fault in user mode
        W/R: 0 = the access was a read,    1 = the access was a write
        P:   0 = page not present,         1 = protection violation

    on a page fault, %cr2 holds the faulting linear address

    idea is that when a page fault happens, the kernel sets up the
    process's page entries properly, or kills the process
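To connect the error-code format above to code, here is a minimal sketch
of a page-fault handler in the style of a JOS-like kernel. The struct
layout, field names, and the policy helpers are hypothetical (they are not
JOS's actual interfaces); the error-code bits and the use of %cr2 are the
x86 facts described above.

    /* Sketch of a page-fault handler; helpers are hypothetical. */

    #include <stdint.h>

    #define FEC_P   0x1   /* 0 = page not present, 1 = protection violation */
    #define FEC_WR  0x2   /* 0 = read,             1 = write */
    #define FEC_US  0x4   /* 0 = supervisor mode,  1 = user mode */

    struct trap_frame {          /* hypothetical; saved state plus error code */
        uint32_t tf_err;         /* error code pushed by the hardware */
        uint32_t tf_eip;         /* faulting instruction */
        /* ... other saved registers ... */
    };

    static inline uint32_t read_cr2(void)
    {
        uint32_t va;
        __asm__ volatile("movl %%cr2, %0" : "=r" (va));
        return va;               /* %cr2 = faulting linear address */
    }

    /* hypothetical policy helpers */
    void kernel_panic(const char *msg);
    int  va_has_backing(uint32_t va);        /* on disk? COW? stack growth? */
    void fix_up_mapping(uint32_t va, uint32_t err);
    void kill_current_process(void);

    void page_fault_handler(struct trap_frame *tf)
    {
        uint32_t va  = read_cr2();
        uint32_t err = tf->tf_err;

        if (!(err & FEC_US))
            kernel_panic("page fault in the kernel");  /* likely a kernel bug */

        if (va_has_backing(va)) {
            /* demand paging, copy-on-write (if err & FEC_WR), stack growth,
             * etc.: fix the page structures and return; the hardware then
             * retries the faulting instruction at tf->tf_eip. */
            fix_up_mapping(va, err);
            return;
        }

        kill_current_process();              /* genuinely bad address */
    }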
3B. Page faults: uses

--exhibit A for the use of paging is virtual memory:
    --your program thinks it has, say, 512 MB of memory, but the hardware
    has only 4 MB of memory
    --the way that this worked (and works) is that the disk was (is) used
    to store memory pages
    --advantage: the address space looks huge
    --disadvantage: accesses to "paged" memory (as pages that live on the
    disk are known) are sllooooowwwww
    --the implementation of this is described in Tanenbaum 3.6. Roughly:
        --on a page fault, the kernel reads in the faulting page
        --QUESTION: what is listed in the page structures? how does the
        kernel know whether the address is invalid, in memory, paged out,
        what?
        --called demand paging, and it's one way to get program code into
        memory "lazily"
        --the kernel may need to send a page to disk (under what
        conditions? answer: two conditions must hold for the kernel to
        HAVE to write to disk)
            (1) the kernel is out of memory
            (2) the page that it selects to write out is dirty
    --Many 32-bit machines have 4GB of memory, so it's less common to hear
    the sound of swapping these days. You either need 36-bit addressing
    and memory hogs, or multiple large memory consumers running on the
    same computer.

--many, many other uses for page faults and virtual memory
    --high-level idea: by giving the kernel (or even a user-level program)
    the opportunity to do interesting things on page faults, you can build
    interesting functionality:

    --store memory pages across the network! (distributed shared memory)
        --basic idea was that on a page fault, the page fault handler went
        and retrieved the needed page from some other machine

    --copy-on-write (a sketch of this fault path appears at the end of
    these notes)
        --when creating a copy of another process, don't copy its memory.
        just copy its page tables, and mark the pages read-only
        --QUESTION: do you need to mark the parent's pages as read-only as
        well?
        --program semantics aren't violated when programs do reads
        --when a write happens, a page fault results. at that point, the
        kernel allocates a new page, copies the memory over, and restarts
        the user program at the faulting write
        --thus, memory is copied only when there is a fault as a result of
        a write
        --this idea is all over the place

    --accounting
        --good way to sample what percentage of the memory pages are
        written to in any time slice: mark a fraction of them not present,
        and see how often you get faults

    --if you are interested in this, check out the paper "Virtual Memory
    Primitives for User Programs", by Andrew W. Appel and Kai Li, Proc.
    ASPLOS, 1991.

    --Paging in day-to-day use
        --Demand paging
        --Growing the stack
        --BSS page allocation
        --Shared text
        --Shared libraries
        --Shared memory
        --Copy-on-write (fork, mmap, etc.)

3C. Page faults: costs

--What does demand paging (i.e., paging from the disk) cost?
    --let's look at average memory access time (AMAT)
    --AMAT = (1-p)*(memory access time) + p*(page fault time),
    where p is the probability of a page fault
        memory access time ~ 100 ns
        disk access time   ~ 10 ms = 10^7 ns
    --QUESTION: what does p need to be to ensure that paging hurts
    performance by less than 10%?
        1.1*t_M = (1-p)*t_M + p*t_D
        p = 0.1*t_M / (t_D - t_M)
          ~ 10^1 ns / 10^7 ns = 10^{-6}
        so only one access out of 1,000,000 can be a page fault!!
    --basically, page faults are super-expensive (good thing the machine
    can do other things during a page fault)
    (a small program reproducing this arithmetic appears at the end of
    these notes)

--Thrashing is even worse
    Memory is overcommitted -- pages get tossed out while they are still
    needed

    Example:
        --one program touches 50 pages (each equally likely); we have only
        40 physical page frames
        --if we have enough pages, it's 100 ns/ref
        --if we have too few pages, assume every 5th reference leads to a
        page fault
        --that's 4 refs x 100 ns, plus 1 page fault x 10 ms of disk I/O
        --this gets us 5 refs per (10 ms + 400 ns) ~ 2 ms/ref = a 20,000x
        slowdown!!!
        --what we wanted: virtual memory the size of the disk with access
        time the speed of physical memory
        --what we have here: memory with access time roughly that of disk
        (2 ms/mem_ref, compared to 10 ms/disk_access)

Concept is much larger than OSes: need to pay attention to the slow case
if it's really slow and common enough to matter.
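Referenced from 3B above: a sketch of the copy-on-write fault path, in C.
This is illustrative, not any particular kernel's implementation;
pte_of(), alloc_frame(), frame_to_virt(), the refcount helpers, and
tlb_invalidate() are hypothetical stand-ins. The software-defined PTE_COW
flag uses one of the PTE bits that the x86 hardware ignores.

    /* Copy-on-write sketch.
     * fork time: share the frame, mark both PTEs read-only + COW.
     * fault time (write to a COW page): copy the frame, remap writable. */

    #include <stdint.h>
    #include <string.h>

    typedef uint32_t pde_t;
    typedef uint32_t pte_t;

    #define PTE_P       0x001
    #define PTE_W       0x002
    #define PTE_COW     0x200            /* software-defined: ignored by hw */
    #define PTE_ADDR(e) ((e) & ~0xFFFu)
    #define PGSIZE      4096

    /* hypothetical helpers */
    pte_t   *pte_of(pde_t *pgdir, uint32_t va);  /* walk to the PTE for va */
    uint32_t alloc_frame(void);
    void    *frame_to_virt(uint32_t pa);
    void     frame_incref(uint32_t pa);
    void     frame_decref(uint32_t pa);
    void     tlb_invalidate(uint32_t va);

    /* fork time: child shares the parent's frame; both lose write permission
     * (this answers the QUESTION in 3B: yes, the parent's PTE changes too). */
    void cow_share(pde_t *parent, pde_t *child, uint32_t va)
    {
        pte_t   *ppte = pte_of(parent, va);
        uint32_t pa   = PTE_ADDR(*ppte);
        uint32_t perm = (*ppte & 0xFFFu & ~PTE_W) | PTE_COW;

        *ppte = pa | perm;
        *pte_of(child, va) = pa | perm;
        frame_incref(pa);
        tlb_invalidate(va);
    }

    /* fault time: the error code says "write" and the PTE is marked COW */
    void cow_fault(pde_t *pgdir, uint32_t va)
    {
        pte_t   *pte    = pte_of(pgdir, va);
        uint32_t old_pa = PTE_ADDR(*pte);
        uint32_t new_pa = alloc_frame();

        memcpy(frame_to_virt(new_pa), frame_to_virt(old_pa), PGSIZE);
        *pte = new_pa | (*pte & 0xFFFu & ~PTE_COW) | PTE_W | PTE_P;
        frame_decref(old_pa);
        tlb_invalidate(va);
        /* returning from the fault lets the hardware retry the write */
    }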
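Referenced from 3C above: a tiny C program that reproduces the AMAT and
thrashing arithmetic. The constants are the ones assumed in the example
(100 ns memory access, 10 ms disk access), not measurements.

    /* Back-of-the-envelope arithmetic from section 3C. */

    #include <stdio.h>

    int main(void)
    {
        double t_mem  = 100e-9;    /* memory access time: 100 ns */
        double t_disk = 10e-3;     /* page fault (disk) time: 10 ms */

        /* largest p with AMAT = (1-p)*t_mem + p*t_disk <= 1.1*t_mem */
        double p = 0.1 * t_mem / (t_disk - t_mem);
        printf("max fault rate for <10%% slowdown: %.1e\n", p);    /* ~1e-6 */

        /* thrashing: every 5th reference faults */
        double per_ref = (4.0 * t_mem + t_disk) / 5.0;             /* ~2 ms */
        printf("cost per reference when thrashing: %.1e s (~%.0fx slowdown)\n",
               per_ref, per_ref / t_mem);                          /* ~20,000x */
        return 0;
    }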