Class 5
CS 372H
2 February 2010

On the board
------------

1. Last time (x86, segmentation)

2. Paging in general

3. Paging on the x86
    --page tables
    --TLB
    --memory map in JOS

potentially useful reference:

    4KB    = 2^{12} = 0x00001000  = 0x00000fff + 1
    4MB    = 2^{22} = 0x00400000  = 0x003fffff + 1
    256 MB = 2^{28} = 0x10000000  = 0x0fffffff + 1
    4GB    = 2^{32} = 0x100000000 = 0xffffffff + 1   (and 0xffffffff = ~0x00000000)

    (0xef400000 >> 22) = 0x3bd = 957
    (0xef800000 >> 22) = 0x3be = 958
    (0xefc00000 >> 22) = 0x3bf = 959
    (0xf0000000 >> 22) = 0x3c0 = 960

---------------------------------------------------------------------------

1. Last time

    --reality v abstraction

        phys memory                 virtual memory
        no protection               each program isolated
        limited mem size            infinite memory
        sharing of phys frames      everyone thinks they're loaded at addr "0"
        easy to share code/data     ability to share code/data

    --segmentation
        --basic idea: GDT/LDT live in memory
        --processor told about where it lives via
            --LLDT, LGDT, SLDT, SGDT
        --every instruction comes with a segment selector
        --the segment selector chooses "base", "limit", protection, type
        --adds "base" to reference
        --reference better be less than limit

        --example: when JOS begins, the base is -0xf0000000. that way,
        the kernel's access to, say, 0xf0010000 gets translated to
        physical memory access at, say, 0x00010000.

        --later, paging will be used to ensure that kernel's access to,
        say, 0xf0010000 gets translated to physical memory access at,
        say, 0x00010000

2. Paging in general

    --Basic idea: all of memory (physical and virtual) gets broken up
    into chunks called **PAGES**. those chunks have size = **PAGE SIZE**

    --we will be working almost exclusively with PAGES of PAGE SIZE =
    4096 B = 4KB = 2^{12}

    --how many pages are there on a 32-bit architecture?
        --2^{32} bytes / (2^{12} bytes/page) = 2^{20} pages

    --it is proper and fitting to talk about pages having **NUMBERS**.
        --page 0: [0, 4095]
        --page 1: [4096, 8191]
        --page 2: [8192, 12287]
        --page 3: [12288, 16383]
        .....
        --page 2^{20}-1: [ ......, 2^{32} - 1]

    --unfortunately, it is also proper and fitting to talk about _both_
    virtual and physical pages having numbers.
        --sometimes we will try to be clear with terms like:
            vpn
            ppn

    --why isn't segmentation enough?

        segmentation can be a bummer when a segment grows or shrinks

        paging is much more flexible: instead of mapping a large range
        onto a large range, we are going to independently control the
        mapping for every 4 KB.

        turns out to be highly useful to have a layer of indirection in
        memory mappings. we'll see more of this later, but for now, the
        idea is that because the mapping is maintained by the OS,
        whenever there is a fault, the OS can arrange to do stuff.
            copy on write, shared memory, demand paging, many more

        still, segments have uses
            easy to share: just use the same segment registers

3. Paging on the x86

    [rest of class, assume segmentation implements identity mapping.
    but you may get exam questions on segmentation.]

    A. page mapping

        --4KB pages and 4GB address space so 2^{20} pages

        --top bits of VA select the PPN
        --bottom bits indicate where in the page the memory reference
        is happening. sometimes called offset.

        --QUESTION: if our pages are of size 4KB = 2^{12}, then how
        many bottom bits are we talking about, and how many top bits
        are used for the layer of indirection?

            [answer: top 20 bits are doing the indirection. bottom 12
            bits just figure out where on the page the access should
            take place.]

        --conceptual model: there is in the sky a 2^{20} sized array
        that maps the linear address to a *physical* page

            table[20-bit linear page number] = 20-bit physical page #

        so now all we have to do is create this mapping

        why is this hard? why not just create the mapping?
            --answer: then you need, per process, roughly 4MB
            (2^{20} entries * 32 bits per entry).

        so here's an idea:
            --break the 4MB table up into 4096 byte chunks, and
            reference those chunks in another table.
            --so how many entries does that other table need?
                --1024
            --so how big is that other table?
                --4096 bytes!
            --so basically every data structure is going to be 4096 bytes

        here's how it works in the standard configuration on the x86,
        but there are others

        two-level mapping structure.......

        [refer to handout as we go through this example....]

        --%cr3 holds the physical address of the page directory.
        --top 10 bits select an entry in the page directory, which
        picks a **page table**
        --next 10 bits select the entry in the page table, which is a
        physical page number
        --so there are 1024 entries in page directory
        --how big is entry in page directory? 4 bytes

        --entry in page directory and page table:

            [ base address | bunch of bits | U/S R/W P ]
             31..............12

            why 20 bits? [answer: there are 2^20 4KB pages in the system]

        --EXAMPLE
            JOS maps
                0xf0000000 to 0x00000000
                0xf0001000 to 0x00001000

            WHAT DOES THIS LOOK LIKE?

            [
            pgdir with entry 960 pointing to page table.
            [put the physical page table at PPN 3.]
            page table has PPN(0th entry) = 0
            page table has PPN(1st entry) = 1
            ]

        --EXAMPLE
            what if JOS wanted
                0xf0001000 to 0x91210000
            [no problem]

            point of this example: the mapping from VA to PA can be
            all over the place

        --ALWAYS REMEMBER
            --each entry in the page *directory* corresponds to 4MB of
            virtual address space
            --each entry in the page *table* corresponds to 4KB of
            virtual address space
            --so how much virtual memory is each page *table*
            responsible for translating? 4KB? 4MB? something else?
            --each page directory and each page table itself consumes
            4KB of memory, i.e., each one of these fits on a page

            VA: 32 bits

                 pg dir          table           offset
             31 ....... 22   21 ...... 12   11 ....... 0
             31 ................................... 0

        --go back to entry in page directory and page table:

            [ base address | bunch of bits | U/S R/W P ]
             31..............12

            bunch includes
                dirty
                accessed
                cache disabled
                write through

            --is that base address a physical address, a linear
            address, a virtual address, what?
                [answer: it is a physical address.
                hardware needs to be able to follow the page table
                structure.]

            --what do these U/S and R/W bits do?
                --are these for the kernel, the hardware, what?
                --who is setting them? what is the point?
            --what happens if U/S and R/W differ in pgdir and table?
                [processor does something deterministic; look up in
                references]

        --can user modify page tables? they are in memory.......
            --but how can the user see them?
            --the page tables themselves can be mapped into the user's
            address space!
            --we will see this in the case of JOS below

    ------------------------------------------------------------------
    putting it all together....

    here is how the x86's MMU translates a linear address to a
    physical address:

    [not discussing in class but make sure you perfectly understand
    what is written below.]

        uint
        translate (uint la, bool user, bool write)
        {
            uint pde, pte;
            pde = read_mem (%CR3 + 4*(la >> 22));
            access (pde, user, write);
            pte = read_mem ((pde & 0xfffff000) + 4*((la >> 12) & 0x3ff));
            access (pte, user, write);
            return (pte & 0xfffff000) + (la & 0xfff);
        }

        // check protection. pxe is a pte or pde.
        // user is true if CPL==3.
        // write is true if the attempted access was a write.
        // PG_P, PG_U, PG_W refer to the bits in the entry above
        void
        access (uint pxe, bool user, bool write)
        {
            if (!(pxe & PG_P))
                => page fault -- page not present
            if (!(pxe & PG_U) && user)
                => page fault -- no access for user

            if (write && !(pxe & PG_W)) {
                if (user)
                    => page fault -- not writable
                if (%CR0 & CR0_WP)
                    => page fault -- not writable
            }
        }
    --------------------------------------------------------------------

    B. TLBs

        --so it looks like the CPU (specifically its MMU) has to go
        out to memory on every memory reference?
            --called "walking the page tables"

        --to make this fast, we need a cache

        --TLB: translation lookaside buffer
            hardware that stores virtual address --> physical address;
            the reason that all of this page table walking does not
            slow down the process too much

        --hardware managed?
        --software managed? (MIPS.
        OS's job is to load the TLB when the OS receives a "TLB miss".
        Not the same thing as a page fault.)

        --what happens to the TLB when %cr3 is loaded? [answer: flushed]
        --can we flush individual entries in the TLB otherwise?
            INVLPG addr

        --how does stuff get in the TLB?
            --answer: hardware populates it

    C. memory in JOS

        --segments only used to switch privilege level into and out of
        kernel

        --paging structures the address space

        --paging limits process memory access to its own address space

        --see handout for JOS virtual memory map

        --why are kernel and current process both mapped into address
        space?
            --convenient for kernel

        --why is all of physical memory mapped at the top? that must
        mean that there are physical memory pages that are mapped in
        multiple places....
            --need to be able to get access to physical memory when
            setting up page tables: *kernel* has to be able to use
            physical addresses from time to time

        --what the heck are VPT and UVPT? ......
            --remember how we wanted a contiguous set of entries?
            --wouldn't it be awesome if the 4MB worth of page table
            appeared inside the virtual address space, at address,
            say, 0xefc00000?
            --to do that, we sneakily insert a pointer in the pgdir
            back to the pgdir itself
            --the result is that virtual address space looks like this:

            [0xef400000,0xef800000) --> looks like one contiguous page
                table, visible to users. read only. rock!
            [0xefc00000,0xf0000000) --> looks like one contiguous page
                table, only visible to kernel. awesome!

                1023 |                      |
                .....  ........
                960  |                      |
                959  | self...              |
                958  | <....>               |
                957  | self.. U             |
                ....
                0    | ........ not present |

        --QUESTION:
            * where does the pgdir itself live in the virtual address
            space?
            [hint: now it lives in several places. not all of the ones
            below are correct.]
                --0xef400000 ?
                --0xef400000 + 4KB ?
                --0xef400000 + 4KB * 957 ?
                --0xef400000 + 4KB * 959 ?
                --0xefc00000 ?
                --0xefc00000 + 4KB * 957 ?
            --so user processes can also see their own page tables,
            but we will set the R/W to 0 so that they cannot modify
            them
            --but the kernel maps another copy where it can work on
            them even when there's no user process running