Class 17
CS 202
7 April 2015

On the board
------------
1. Last time
2. Finish directories
3. FFS
4. mmap
5. Midterm review

---------------------------------------------------------------------------

1. Last time

--indexed files
--directories
    --a directory *is* a file. its data is simply a table that maps
      name to inode

2. Finish directories

--special names: "/", ".", ".."
--given those names, we need only two operations to navigate the entire
  name space:
    --"cd name": change context to directory "name"
    --"ls": list all names in the current directory
--example: [DRAW PICTURE FROM LAST TIME]
--links:
    --hard link: multiple directory entries point to the same inode; the
      inode contains a refcount
        "ln a b": creates a synonym ("b") for file ("a")
        --how do we avoid cycles in the graph? (answer: can't hard link
          to directories)
    --soft link: synonym for a *name*
        "ln -s /d/a b":
        --creates a new inode, not just a new directory entry
        --the new inode has its "symlink" bit set
        --contents of that new file: "/d/a"

3. File systems: performance

Case study: FFS

--the original Unix FS was simple, elegant, and ... slow
    --blocks too small
    --file index (inode) too large
    --too many layers of mapping indirection
    --transfer rate low (they were getting one block at a time)
    --poor clustering of related objects:
        --consecutive file blocks not close together
        --inodes far from data blocks
        --inodes for a given directory not close together
        --result: poor enumeration performance, meaning things like
          "ls" and "grep foo *.c" were slowwwww
    --other problems:
        --14-character limit on file names
        --can't atomically update a file in a crash-proof way
--FFS (the Fast File System) fixes these problems to a degree.

    [Reference: M. K. McKusick, W. N. Joy, S. J. Leffler, and R. S. Fabry.
    A Fast File System for UNIX. ACM Transactions on Computer Systems,
    Vol. 2, No. 3, Aug. 1984, pp. 181-197.]

what can we do about the problems above? [ask for suggestions]

* make the block size bigger (4 KB, 8 KB, or 16 KB)

* cluster related objects

    "cylinder groups" (one or more consecutive cylinders)

    [ superblock | bookkeeping info | inodes | bitmap | data blocks ]

    --try to put inodes and data blocks in the same cylinder group
    --try to put all inodes of files in the same directory in the same
      cylinder group
    --place new directories in a cylinder group with a greater-than-average
      number of free inodes
    --as a file grows, use a heuristic: spill to the next cylinder group
      after 48 KB of file (the point at which an indirect block would be
      required, assuming 4096-byte blocks and 12 direct pointers) and at
      every megabyte thereafter

* bitmaps (to track free blocks)

    --easier to find contiguous blocks (a scan sketch appears after
      section 4)
    --can keep the entire thing in memory:
      500 GB disk / 4 KB blocks = ~125,000,000 bits = ~15 MB. not
      outrageous these days.

* reserve space

    --but don't tell users. (df makes a full disk look 110% full.)

* total performance

    --20-40% of disk bandwidth for large files
    --10-20x the original Unix file system!
    --still not the best we can do (metadata writes happen synchronously,
      which really hurts performance. but making them asynchronous
      requires a story for crash recovery.)

Other performance ideas:

--most obvious: a big file cache
    --the kernel maintains a *buffer cache* in memory
    --internally, every use of ReadDisk(blockNum, readbuf) is replaced
      with:

        ReadDiskCache(blockNum, readbuf) {
            ptr = buffercache.get(blockNum);
            if (ptr) {
                /* cache hit: serve from memory */
                copy BLKSIZE bytes from ptr to readbuf
            } else {
                /* cache miss: go to disk, then remember the block */
                newBuf = malloc(BLKSIZE);
                ReadDisk(blockNum, newBuf);
                buffercache.insert(blockNum, newBuf);
                copy BLKSIZE bytes from newBuf to readbuf
            }
        }

--no rotation delay if you're reading the whole track
    --so try to read the whole track
--more generally, try to work with big chunks (lots of disk blocks)
    --write in big chunks
    --read ahead in big chunks (64 KB)
    --why not just read/write 1 MB at a time?
        --for writes: may not get data to disk often enough
        --for reads: may waste read bandwidth

4. mmap

--recall some syscalls:

    fd = open(pathname, mode)
    write(fd, buf, sz)
    read(fd, buf, sz)

--we've seen file descriptors before, but what *is* an fd?
    --an index into a table maintained by the kernel on behalf of the
      process
    --what's in the given entry in that table?
        --inumber!
        --inode, probably!
        --and per-open-file data (file position, etc.)
--syscall:

    void* mmap(void* addr, size_t len, int prot, int flags, int fd,
               off_t offset);

    --means, roughly, "map the specified open file (fd) into a region of
      my virtual memory (at addr, or at a kernel-selected place if addr
      is 0), and return a pointer to it"
    --after this, loads and stores to addr[x] are equivalent to reading
      and writing the file at offset+x
--how's this implemented?! (answer: through virtual memory, with the VA
  being addr [or whatever the kernel selects] and the PA being what?
  answer: the physical address storing the given page in the kernel's
  buffer cache)
--we have to deal with eviction from the buffer cache, so the kernel will
  need a data structure that maps from:

        phys page --> {list of (proc, va) pairs}

  (sketched below.) note that the kernel needs this data structure
  anyway: when a page is evicted from RAM, the kernel needs to be able
  to invalidate the given virtual address in the page table(s) of the
  process(es) that have the page mapped.
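--to make the interface above concrete, a minimal usage sketch (the
  file name is made up for illustration, and error handling is
  abbreviated): map a file read-only and count newlines through
  ordinary loads.

    /* sketch: count newlines by mapping the file instead of read()ing it */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/tmp/example.txt", O_RDONLY);  /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) return 1;

        /* map the whole file, read-only; kernel picks the address (addr == 0) */
        char *p = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* from here on, p[i] *is* byte i of the file: loads are served
           through the page tables and the buffer cache, with no syscalls */
        long lines = 0;
        for (off_t i = 0; i < st.st_size; i++)
            if (p[i] == '\n')
                lines++;
        printf("%ld lines\n", lines);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }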
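--and a sketch of that reverse map as a data structure (all names here
  are invented for illustration; real kernels organize this differently):

    #include <stdint.h>

    struct proc;               /* process; details not needed for the sketch */

    /* one entry per virtual mapping of a physical page */
    struct mapping {
        struct proc    *proc;  /* process that has the page mapped */
        void           *va;    /* virtual address of the mapping */
        struct mapping *next;  /* next mapping of the same physical page */
    };

    struct phys_page {
        uint64_t        ppn;       /* physical page number */
        struct mapping *mappings;  /* the {(proc, va)} list from above */
    };

    /* on eviction: walk page->mappings, clear the PTE for each (proc, va),
       flush stale TLB entries, and only then reuse the physical page */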
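--finally, the scan sketch promised back in section 3: with one bit per
  block, finding a run of contiguous free blocks is a linear scan over
  the bitmap (the helper name is invented for illustration):

    #include <stddef.h>
    #include <stdint.h>

    /* one bit per disk block; a set bit means "free". returns the first
       block of the first run of `want` consecutive free blocks, or -1. */
    static long find_free_run(const uint8_t *bitmap, size_t nblocks,
                              size_t want) {
        size_t run = 0;
        for (size_t b = 0; b < nblocks; b++) {
            int is_free = (bitmap[b / 8] >> (b % 8)) & 1;
            run = is_free ? run + 1 : 0;
            if (run == want)
                return (long)(b - want + 1);
        }
        return -1;  /* no such run */
    }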
---------------------------------------------------------------------------

5. Midterm review

Ground rules: same as last time

Material: everything we've covered: readings, labs, homeworks, lectures
    --the only exception, as usual, is material newly covered in review
      sessions (but there shouldn't have been any)

lecture topics:

I/O
    --architecture
        --how CPUs and devices interact
        --mechanics (explicit I/O instructions, memory-mapped I/O,
          interrupts, memory)
    --polling vs. interrupts
    --DMA vs. programmed I/O
    --device drivers
    --async vs. sync I/O (non-blocking vs. blocking I/O)

virtual memory
    --segmentation
    --paging
    --segmentation vs. paging
    --virtual memory on the x86
        --virtual address: [10 bits | 10 bits | 12 bits]
        --entry in pgdir and page table:
          [20-bit physical page number | more bits | bottom 3 bits]
        --protection lives in those bottom bits
          (user/kernel | read/write | present/not)
        --(a decomposition sketch appears at the end of these notes)
    --what's a TLB?
    --page faults: mechanics, costs, uses
    --page replacement policies (FIFO, LRU, CLOCK, OPT)
    --thrashing
      [the latter two topics are more general than paging: they apply to
      caching in general]

alternatives to preemptively scheduled shared-memory threads
    --cooperatively scheduled threads, implemented in user space
    --event-driven programming

disks
    --geometry
    --performance
    --interface
    --scheduling (skipped in lecture; see the book)
    --technology trends

file systems
    --basic objects: files, directories, metadata, links, inodes
    --how does naming work? what allows the system to map
      /usr/homes/bob/index.html to a file object?
    --types of file layout
        --extents/contiguous, linked, FAT, indexed structure
        --classic Unix and FFS are variants of the indexed structure
        --analogy between inode and page directory
        --tradeoffs
    --performance

questions from all of you ...
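(as a refresher for the x86 item in the list above, a sketch of how a
32-bit virtual address decomposes; the macro names follow the style of
xv6/JOS and are reproduced here for illustration:)

    #include <stdint.h>
    #include <stdio.h>

    /* 32-bit x86 VA: [ 10-bit pgdir index | 10-bit page-table index | 12-bit offset ] */
    #define PDX(va)   (((uint32_t)(va) >> 22) & 0x3FF)  /* top 10 bits   */
    #define PTX(va)   (((uint32_t)(va) >> 12) & 0x3FF)  /* next 10 bits  */
    #define PGOFF(va) ((uint32_t)(va) & 0xFFF)          /* bottom 12 bits */

    int main(void) {
        uint32_t va = 0x08048a74;  /* arbitrary example address */
        printf("pgdir index %u, page-table index %u, offset 0x%03x\n",
               (unsigned)PDX(va), (unsigned)PTX(va), (unsigned)PGOFF(va));
        return 0;
    }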