Class 20
CS 202
12 April 2021

On the board
------------

1. Last time

2. Finish directories

3. Performance

4. Crash recovery
    --intro

---------------------------------------------------------------------------

1. Last time

    Intro to file systems
    Files
    Implementation of files

2. Finish directories

    --example: /a/foo.c
               /b/c/essay.txt

        what does the file system look like?

        [ i0 ... i7 || [block] [block] [block] ..... ]

        [Draw picture]

    --links:

        --hard link: multiple directory entries point to the same inode;
          the inode contains a refcount

            "ln a b": creates a synonym ("b") for file ("a")

            --how do we avoid cycles in the graph? (answer: you can't
              hard link to directories)

        --soft link: synonym for a *name*

            "ln -s /d/a b":

            --creates a new inode, not just a new directory entry
            --the new inode has its "sym link" bit set
            --contents of that new file: "/d/a"

3. Performance

    Case study: FFS

    --the original Unix FS was simple, elegant, and ... slow

        --blocks too small
        --inode array all at the beginning of the disk
        --inodes had too many layers of mapping indirection
        --transfer rate low (they were getting one block at a time)
        --free blocks were stored in a linked list on the disk
        --poor clustering of related objects:
            --consecutive file blocks not close together
            --inodes far from data blocks
            --inodes for a given directory not close together
            --result: poor enumeration performance, meaning things like
              "ls" and "grep foo *.c" were slowwwww
        --other problems:
            --14-character names were the limit
            --can't atomically update a file in a crash-proof way

    --FFS (the Fast File System) fixes these problems to a degree.

    [Reference: M. K. McKusick, W. N. Joy, S. J. Leffler, and R. S.
    Fabry. A Fast File System for UNIX. ACM Trans. on Computer Systems,
    Vol. 2, No. 3, Aug. 1984, pp. 181-197.]

    what can we do about the above? [ask for suggestions]

    * make the block size bigger (4 KB, 8 KB, or 16 KB)

    * cluster related objects

        "cylinder groups" (one or more consecutive cylinders)

        [ superblock | bookkeeping info | inodes | bitmap | data blocks (512 bytes each) ]

        [note: it's 512 above, not 4 KB, because the file system doesn't
        exactly _insist_ that data blocks be larger. there can be
        _fragments_. this is a way to group larger writes together while
        not wasting space when a file's size isn't a multiple of 4 KB.]

        --try to put inodes and data blocks in the same cylinder group
        --try to put all inodes of files in the same directory in the
          same cylinder group
        --new directories are placed in a cylinder group with a
          greater-than-average number of free inodes
        --as files are allocated, use a heuristic: spill to the next
          cylinder group after 48 KB of file (which is the point at
          which an indirect block is required, assuming 4096-byte
          blocks) and at every megabyte thereafter

    * bitmaps (to track free blocks)

        --easier to find contiguous blocks (see the allocation sketch at
          the end of this section)
        --can keep the entire thing in memory:
          500 GB disk / 4 KB disk blocks = 125,000,000 blocks; at one
          bit per block, that's roughly 15 MB. not outrageous these days.

    * reserve space

        --but don't tell users (df makes a full disk look 110% full)

    * atomic "rename"

    * symbolic links

    * total performance

        --20-40% of disk bandwidth for large files
        --10-20x the original Unix file system!
        --still not the best we can do (metadata writes happen
          synchronously, which really hurts performance. but making them
          asynchronous requires a story for crash recovery.)
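    [To make the bitmap point concrete, here is a minimal C sketch of an
    in-memory free-block bitmap with a "find a free block near this one"
    allocator. This is illustrative code, not FFS source: the names
    (freemap, balloc_near, NBLOCKS) are made up, and real FFS confines
    the scan to a cylinder group rather than sweeping the whole disk.]

        #include <stdint.h>

        #define NBLOCKS 125000000UL              /* e.g., 500 GB disk / 4 KB blocks */

        static uint8_t freemap[NBLOCKS / 8];     /* one bit per block; ~15 MB, kept in memory */

        /* bit convention here: 0 = free, 1 = allocated */
        static int is_free(uint64_t b)    { return !(freemap[b / 8] & (1u << (b % 8))); }
        static void mark_used(uint64_t b) { freemap[b / 8] |= (1u << (b % 8)); }

        /* Allocate a block, preferring one at or just after 'goal' (e.g., the
         * file's previous block, so consecutive file blocks land next to each
         * other). Returns the block number, or -1 if no block is free. */
        int64_t balloc_near(uint64_t goal)
        {
            for (uint64_t i = 0; i < NBLOCKS; i++) {
                uint64_t b = (goal + i) % NBLOCKS;  /* scan forward from goal, wrapping */
                if (is_free(b)) {
                    mark_used(b);
                    return (int64_t)b;
                }
            }
            return -1;                              /* disk full */
        }

    [The contrast with the old scheme: a linked free list hands back
    whatever block happens to be at its head, while a bitmap lets the
    allocator honor a placement goal cheaply, which is exactly the
    clustering FFS wants.]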
    Others:

    --most obvious: big file cache

        --kernel maintains a *buffer cache* in memory
        --internally, all uses of

                ReadDisk(blockNum, readbuf)

          are replaced with:

                ReadDiskCache(blockNum, readbuf) {
                    ptr = buffercache.get(blockNum);
                    if (ptr) {
                        copy BLKSIZE bytes from ptr to readbuf
                    } else {
                        newBuf = malloc(BLKSIZE);
                        ReadDisk(blockNum, newBuf);
                        buffercache.insert(blockNum, newBuf);
                        copy BLKSIZE bytes from newBuf to readbuf
                    }
                }

    --no rotation delay if you're reading the whole track
        --so try to read the whole track

    --more generally, try to work with big chunks (lots of disk blocks)

        --write in big chunks
        --read ahead in big chunks (64 KB)
        --why not just read/write 1 MB at a time?
            --for writes: may not get data to disk often enough
            --for reads: may waste read bandwidth

4. Crash recovery

    --intro
    --ad-hoc
    --copy-on-write
    --journaling

    --There are a lot of data structures used to implement the file
      system: the bitmap of free blocks, directories, inodes, indirect
      blocks, data blocks, etc.

    --We want these data structures to be *consistent*: we want
      invariants to hold.

    --We also want the data on the disk to remain consistent.

    --Thorny issue: *crashes* or power failures.

    --Making the problem worse are (a) write-back caching and
      (b) non-ordered disk writes.

        (a) means the OS delays writing modified disk blocks back.
        (b) means that the modified disk blocks can go to the disk in an
            unspecified order.

    --Example: appending a block to a file requires three writes
      (see the sketch at the end of these notes):

        [DRAW PICTURE]

        INODE updated
        DATA BLOCK added
        DATA BITMAP updated

        crash. restart. uh-oh.

    --Solution: the system requires a notion of atomicity
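    [To make the "uh-oh" concrete, here is a compilable C sketch of the
    append example above. Everything in it is hypothetical lecture-style
    code, not a real file system: disk_write is a stub, and the struct
    layout and the choice of block 1 as the bitmap's home are made up.
    The point is the three writes and the crash windows between them.]

        #include <stdint.h>

        #define BLKSIZE 4096

        struct dinode { uint32_t size; uint32_t addrs[12]; };  /* on-disk inode; direct blocks only */

        static uint8_t data_bitmap[BLKSIZE];                   /* one bit per data block */

        /* Stub: in a real kernel this issues (or, with write-back caching,
         * merely schedules) a write of one block to the disk. */
        static void disk_write(uint32_t blockno, const void *buf)
        {
            (void)blockno; (void)buf;
        }

        static uint32_t balloc(void)        /* find a free bit, set it, return block # */
        {
            for (uint32_t b = 0; b < BLKSIZE * 8; b++) {
                if (!(data_bitmap[b / 8] & (1u << (b % 8)))) {
                    data_bitmap[b / 8] |= (1u << (b % 8));
                    return b;
                }
            }
            return 0;                       /* disk full; error handling elided */
        }

        /* Appending one block to a file touches three on-disk structures. */
        void append_block(struct dinode *ip, uint32_t inode_blockno, const void *data)
        {
            uint32_t b = balloc();
            disk_write(1, data_bitmap);            /* write #1: data bitmap           */
            disk_write(b, data);                   /* write #2: the new data block    */
            ip->addrs[ip->size / BLKSIZE] = b;     /* (indirect blocks elided)        */
            ip->size += BLKSIZE;
            disk_write(inode_blockno, ip);         /* write #3: the inode             */

            /* With write-back caching and unordered writes, a crash can leave
             * any subset of writes {#1, #2, #3} on disk. Two of the bad states:
             *   - #3 landed but #1 didn't: the inode points at a block the
             *     bitmap still calls free, so the block can later be handed
             *     to another file.
             *   - #1 landed but #3 didn't: the block is marked used but is
             *     unreachable; the space is leaked.
             * Preventing or repairing such states is the job of the techniques
             * listed above: ad-hoc recovery, copy-on-write, journaling. */
        }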