Class 6
CS 372H  4 February 2010

On the board
------------
1. Last time (x86 paging, JOS memory map)
2. Other page structures
3. Page faults and their uses
4. Page faults and their costs

---------------------------------------------------------------------------

1. Last time

    key idea in virtual memory: insert an entry in the page tables, and
    then the program can reference the address

        this is a powerful thing. it amounts to the ability to
        manufacture (and remove) opaque handles on the fly, just by
        inserting and removing entries in the mapping

        the program itself can make such requests implicitly (as it page
        faults) or explicitly (via mmap, which can be told to fail if it
        can't create a particular entry in virtual space). the OS can
        certainly create such abstractions for the program

    example: if we want a program to be able to use address 0x00402000
    to refer to physical address 0x0a370000, but in a read-only way, we
    conceptually insert the entry <0x00402000, 0x0a370000>

    we implement that mapping like this:

                                    <20 bits>   <12 bits>
                                   _____________________
        [entry 2]  ------------->  |   a370   |   W=0  |
        [entry 1]                  |__________|________|
        [entry 0]
           ...
        PGTABLE

    (without the two-level page table, but with 4KB pages and a 4GB
    address space, every process would need 4MB of contiguous physical
    memory to implement its page table)

    something it is probably worth internalizing: one of the things that
    a second-level page table is doing is to take a bunch of disparate
    physical pages, perhaps scattered throughout physical memory, and
    logically glue them together, making them appear as a contiguous
    4MB region in virtual space (just as the entire page structure glues
    disparate physical pages into a 4GB "region"). if the second-level
    page table that is chosen for this gluing is the page directory
    itself, then the disparate physical pages that all appear as a
    contiguous 4MB region wind up being the page tables themselves.
    more detail is below.....
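The index arithmetic behind the example above can be checked with a short
sketch (Python here, purely for the arithmetic; the field widths are the
x86 ones, and the only concrete numbers are the addresses from the
example):

```python
# Split a 32-bit x86 linear address into its three translation fields:
# top 10 bits index the page directory, next 10 bits index the page
# table, and the low 12 bits are the offset within the 4KB page.
def split(va):
    pdx = (va >> 22) & 0x3ff   # page directory index
    ptx = (va >> 12) & 0x3ff   # page table index
    off = va & 0xfff           # offset within the page
    return pdx, ptx, off

# The example mapping <0x00402000, 0x0a370000>:
pdx, ptx, off = split(0x00402000)
print(pdx, ptx, hex(off))      # 1 2 0x0

# So translation consults pgdir entry 1, then entry 2 of that page
# table -- the entry holding PPN 0x0a370 with W=0 -- and the physical
# address is (PPN << 12) | offset:
pa = (0x0a370 << 12) | off
assert pa == 0x0a370000
```

This is why the read-only PTE in the diagram sits at entry 2 of the page
table: PTX(0x00402000) = 2.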
    implementation trick in JOS: want the page tables to look linear and
    to appear at addresses {0xef400000, 0xefc00000} = {UVPT, VPT}. so
    insert pointers at entries 957 and 959 of the page directory back to
    the page directory itself

    the result is that the page tables *themselves* show up in the
    program's virtual address space at [UVPT, UVPT+4MB) and
    [VPT, VPT+4MB)

    result: the picture of [UVPT, UVPT+4MB) in virtual space is:

        UVPT+4MB  ___________________
                      PGTABLE 1023
                  ___________________
                          .
                          .
                  ___________________
                      PGTABLE 2
                  ___________________
                      PGTABLE 1
                  ___________________
                      PGTABLE 0
        UVPT      ___________________

    further detail on the JOS implementation trick: this works because
    the page directory has the same structure as a page table and
    because the CPU just "follows arrows", namely:

        (1) From the relevant entry in the pgdir [which entry, recall,
        covers 4MB worth of VA space] to the physical page number where
        the relevant page table lives

        (2) From the physical page number where the relevant page table
        lives, more specifically the relevant entry in the relevant page
        table (which is relevant to 4KB of address space), to the
        physical page number that is the target of the mapping.

    now, if you "trick" the CPU into following the first arrow back to
    the pgdir itself, and the program references an address
    0xef400000+x, where x < 4MB, then the logic goes like this (compare
    the exact words below to the exact words of the numbered items
    above):

        (1) From the relevant entry in the pgdir [which entry, recall,
        covers the 4MB worth of VA space [0xef400000, 0xef800000)] to
        the physical page number where the page directory lives

        (2) From the physical page number where the page directory
        lives, more specifically the relevant entry in the page
        directory (which now is relevant to only 4KB of address space),
        to the physical page number that is the target of the
        <0xef400000+x, PA> mapping. that physical page holds a
        second-level page table!

    result: the second-level page table appears at 0xef400000+x
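The "follow the arrows" logic can be modeled in miniature. This is a toy
sketch, not JOS code: frames are plain Python dicts and permission bits
are omitted; only the entry number 957 and UVPT come from the notes.

```python
# Toy model of the JOS trick: a page directory whose entry 957 points
# back at itself.  A "frame" here is a dict mapping entry index -> frame;
# a real PDE/PTE would hold a physical page number plus permission bits.
UVPT = 0xef400000

pgtable0 = {}                  # some ordinary page table (a frame)
pgdir = {0: pgtable0}          # pgdir entry 0 -> page table 0
pgdir[957] = pgdir             # the trick: entry 957 -> the pgdir itself

def walk(va):
    """Follow the two arrows the MMU follows; return the final frame."""
    pdx = (va >> 22) & 0x3ff
    ptx = (va >> 12) & 0x3ff
    level2 = pgdir[pdx]        # arrow 1: pgdir entry -> "page table"
    return level2[ptx]         # arrow 2: page table entry -> target frame

# A reference into [UVPT, UVPT+4MB) lands on a page table: page table n
# appears at UVPT + n*4096.  Here UVPT + 0*4096 resolves to page table 0:
assert walk(UVPT + 0 * 4096) is pgtable0
# and UVPT + 957*4096 resolves to the page directory itself:
assert walk(UVPT + 957 * 4096) is pgdir
```

The first arrow lands back on the pgdir because PDX(UVPT) = 957, which
is exactly the entry that was pointed at the pgdir.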
2. Other page structures

    A. Very large page sizes (e.g., 4 MB)

        --advantage: small page tables
        --disadvantage: lots of wasted memory
        --PSE (set bit 7 in the PDE and get 4MB pages, with no page
          tables)
        --**there is a trade-off between large page sizes and small page
          sizes**. what is the nature of the trade-off?
            --large page sizes mean wasting actual memory
            --small page sizes mean lots of page table entries (which
              may or may not get consumed)
        --Tanenbaum gives an equation (section 3.5.3): with s the
          average process size, p the page size, and e the bytes per
          page table entry,

              overhead = se/p + p/2

              d(overhead)/dp = -se/p^2 + 1/2

          which finds its minimum at p = sqrt(2se)

    B. Many levels of page table

        --advantage: not much memory spent on page tables if the address
          space is sparse
        --disadvantage: lots of page table walking

    C. What happens when memory gets huge?

        --many levels of page table; or
        --inverted page table
            --works as a hash table
            --stores entries mapping VPNs to PPNs
            [[--NOTE: the book and other references say that this thing
            has to have the same number of entries as the number of
            physical pages in the machine, but that is bogus. That
            number is neither a useful minimum nor a useful maximum.

            It is not a useful minimum because the table has to deal
            with collisions from the fact that a potentially very large
            number of VPNs map to a much smaller number of PPNs (e.g.,
            mapping the same PPN at different places in the address
            space), so the table needs to be able to live with a number
            of entries greater than the number of physical frames (i.e.,
            it must handle being oversubscribed). Hence, it could
            presumably have a smaller number of entries than the number
            of physical frames (which is just another kind of
            oversubscription).

            It is not a useful maximum because in general when one is
            using hash tables, one wants the hash table to be a little
            bit larger than the number of entries that one is storing;
            adding even a little bit of "wiggle room" in the form of
            blank entries tends to reduce collisions a lot. (See Knuth,
            chapter 6.4.)
            So it's not at all clear how big the inverted page table
            should be, except that the whole point is to be smaller than
            a traditional page table. Thus, one presumably wants it to
            be O(number of physical pages).]]

3. Page faults

    --what happens if the address isn't in the page table or there is a
      protection violation? [page fault!]

        --NOTE: TLB MISS != PAGE FAULT
        --not all TLB misses generate page faults, and not all page
          faults began with TLB misses

    --what happens on the x86? [see handout from last time]

        --the processor pushes a trap frame onto the kernel's stack:

                            ss
                            esp      [former value of stack pointer]
                            eflags   [former value of eflags]
                            cs
                            eip      [instruction that caused the trap]
                 %esp -->   [error code]

          %eip is now executing code to handle the trap
            [how did the processor know what to load into %eip?]

          error code:

            [ ..............unused.............. | U/S | W/R | P ]

            U/S: user mode fault / supervisor mode fault
            W/R: access was a read / access was a write
            P:   not-present page / protection violation on a page

          on a page fault, %cr2 holds the faulting linear address

          idea is that when a page fault happens, the kernel sets up
          those maps properly, or kills the process

    --exhibit A for the use of paging is virtual memory:
        --your program thinks it has, say, 512 MB of memory, but your
          hardware has only 4 MB of memory
        --the way that this worked is that the disk was (is) used to
          store memory pages
        --advantage: address space looks huge
        --disadvantage: accesses to "paged" memory (as memory pages that
          live on the disk are known) are sllooooowwwww
        --the implementation of this is described in Tanenbaum 3.6.
          Roughly:
            --on a page fault, the kernel reads in the faulting page
                --QUESTION: what is listed in the page structures? how
                  does the kernel know whether the address is invalid,
                  in memory, paged, what?
            --called demand paging, and it's one way to get program code
              into memory "lazily"
            --kernel may need to send a page to disk (under what
              conditions?
              answer: two conditions must hold for the kernel to have to
              write to disk)
                (1) kernel is out of memory
                (2) the page that it selects to write out is dirty

    --Many 32-bit machines have 4GB of memory, so it's less common to
      hear the sound of swapping these days. You either need 36-bit
      addressing and memory hogs, or multiple large memory consumers
      running on the same computer

    --many, many other uses for page faults and virtual memory
        --high-level idea: by giving the kernel (or even a user-level
          program) the opportunity to do interesting things on page
          faults, you can build interesting functionality:

        --store memory pages across the network! (Distributed Shared
          Memory)
            --basic idea was that on a page fault, the page fault
              handler went and retrieved the needed page from some
              other machine

        --copy-on-write
            --when creating a copy of another process, don't copy its
              memory. just copy its page tables, and mark the pages
              read-only
                --QUESTION: do you need to mark the parent's pages as
                  read-only as well?
            --program semantics aren't violated when programs do reads
            --when a write happens, a page fault results. at that point,
              the kernel allocates a new page, copies the memory over,
              and restarts the user program to do the write
            --thus, copies of memory happen only when there is a fault
              as a result of a write
            --this idea is all over the place

        --accounting
            --good way to sample what percentage of the memory pages are
              written to in any time slice: mark a fraction of them not
              present, and see how often you get faults

        --if you are interested in this, check out the paper "Virtual
          Memory Primitives for User Programs", by Andrew W. Appel and
          Kai Li, Proc. ASPLOS, 1991.

    --Paging in day-to-day use
        --Demand paging
        --Growing the stack
        --BSS page allocation
        --Shared text
        --Shared libraries
        --Shared memory
        --Copy-on-write (fork, mmap, etc.)

    --Okay, but in the case of demand paging, which pages do we bring to
      and from the disk?
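Returning to copy-on-write for a moment: the bookkeeping described above
(share pages read-only at fork time, copy only on the first write fault)
can be sketched as a toy model. This is Python with made-up names, not
kernel code; a real kernel does this with PTE permission bits and
per-frame reference counts.

```python
# Toy copy-on-write model: an "address space" maps vpn -> (frame,
# writable), and frames carry a reference count.
class Frame:
    def __init__(self, data):
        self.data = data
        self.refs = 1

def fork_address_space(parent):
    """Share every frame read-only instead of copying it."""
    child = {}
    for vpn, (frame, writable) in parent.items():
        frame.refs += 1
        parent[vpn] = (frame, False)  # parent's PTE also becomes read-only
        child[vpn] = (frame, False)
    return child

def write(space, vpn, value):
    frame, writable = space[vpn]
    if not writable:                  # this is the write fault
        if frame.refs > 1:            # frame still shared: copy it now
            frame.refs -= 1
            frame = Frame(frame.data)
        space[vpn] = (frame, True)    # map our private copy writable
    frame.data = value                # "restart" the faulting write

parent = {0: (Frame("hello"), True)}
child = fork_address_space(parent)
write(child, 0, "bye")                # triggers the copy
assert parent[0][0].data == "hello"   # parent's page is untouched
assert child[0][0].data == "bye"
```

Note that `fork_address_space` marks the *parent's* mapping read-only
too, which is the answer to the QUESTION above: otherwise the parent
could scribble on a page the child still shares.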
---------------------------------------------------------------------------

admin

    --homeworks posted
    --lab 2 is to be done individually
    --lab 3 released
        --pair programming option: deadline this Friday night
        --once you have decided, you may not switch tracks (i.e., you
          may not go from pair to individual or from individual to
          pair). doing so constitutes cheating.
    --no class this Tuesday
        --will be recording and assigning lecture

---------------------------------------------------------------------------

4. Page faults and their costs

    --What does demand paging (i.e., paging from the disk) cost?

        --let's look at average memory access time (AMAT)

        --AMAT = (1-p)*(memory access time) + p*(page fault time),
          where p is the probability of a page fault

            memory access time ~ 100 ns
            disk access time   ~ 10 ms = 10^7 ns

        --QUESTION: what does p need to be to ensure that paging hurts
          performance by less than 10%?

            1.1*t_M = (1-p)*t_M + p*t_D
            p = 0.1*t_M / (t_D - t_M) ~ 10^1/10^7 = 10^{-6}

          so only one access out of 1,000,000 can be a page fault!!

        --basically, page faults are super-expensive (good thing the
          machine can do other things during a page fault)

    --Thrashing is even worse

        Memory overcommitted -- pages tossed out while still needed

        Example:
            --one program touches 50 pages (each equally likely); only
              have 40 physical page frames
            --If have enough pages, 100ns/ref
            --If have too few pages, assume every 5th reference leads to
              a page fault
            --4 refs x 100ns and 1 page fault x 10ms for disk I/O
            --this gets us 5 refs per (10ms + 400ns) ~ 2ms/ref
              = 20,000x slowdown!!!

        --What we wanted: virtual memory the size of the disk with the
          access time of physical memory
        --What we have here: memory with the access time of the disk

    Concept is much larger than OSes: need to pay attention to the slow
    case if it's really slow and common enough to matter.
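The two back-of-the-envelope calculations in this section can be checked
directly; the numbers below are the ones from the notes, nothing else is
assumed.

```python
t_M = 100            # memory access time, ns
t_D = 10_000_000     # disk access time, ns (10 ms)

# Maximum page-fault probability for at most a 10% slowdown:
#   1.1*t_M = (1-p)*t_M + p*t_D  =>  p = 0.1*t_M / (t_D - t_M)
p = 0.1 * t_M / (t_D - t_M)
print(p)             # ~1e-6: one fault per million references

# Thrashing example: every 5th reference faults, so each group of 5
# references costs 4 memory hits plus one disk I/O.
ns_per_ref = (4 * t_M + t_D) / 5
slowdown = ns_per_ref / t_M
print(slowdown)      # ~20,000x
```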