Class 17
CS 202
7 April 2015

On the board
------------
1. Last time
2. Finish directories
3. FFS
4. mmap
5. Midterm review

---------------------------------------------------------------------------

1. Last time

--indexed files
--directories
    --a directory *is* a file. its data is simply a table that maps
      name to inode

2. Finish directories

--special names: "/", ".", ".."
--given those names, we need only two operations to navigate the entire
  name space:
    --"cd name": change context to directory "name"
    --"ls": list all names in the current directory
--example: [DRAW PICTURE FROM LAST TIME]
--links:
    --hard link: multiple directory entries point to the same inode; the
      inode contains a refcount
        "ln a b": creates a synonym ("b") for file ("a")
        --how do we avoid cycles in the graph? (answer: can't hard link
          to directories)
    --soft link: synonym for a *name*
        "ln -s /d/a b":
        --creates a new inode, not just a new directory entry
        --the new inode has its "symlink" bit set
        --contents of that new file: "/d/a"

3. File systems: performance

Case study: FFS

--the original Unix FS was simple, elegant, and ... slow
    --blocks too small
    --file index (inode) too large
    --too many layers of mapping indirection
    --transfer rate low (they were getting one block at a time)
    --poor clustering of related objects:
        --consecutive file blocks not close together
        --inodes far from data blocks
        --inodes for a given directory not close together
        --result: poor enumeration performance, meaning things like
          "ls" and "grep foo *.c" were slowwwww
    --other problems:
        --14-character limit on file names
        --can't atomically update a file in a crash-proof way
--FFS (the Fast File System) fixes these problems to a degree.

    [Reference: M. K. McKusick, W. N. Joy, S. J. Leffler, and R. S. Fabry.
    A Fast File System for UNIX. ACM Transactions on Computer Systems,
    Vol. 2, No. 3, Aug. 1984, pp. 181-197.]

what can we do about the problems above? [ask for suggestions]

* make the block size bigger (4 KB, 8 KB, or 16 KB)

* cluster related objects

    "cylinder groups" (one or more consecutive cylinders)

    [ superblock | bookkeeping info | inodes | bitmap | data blocks ]

    --try to put inodes and data blocks in the same cylinder group
    --try to put all inodes of files in the same directory in the same
      cylinder group
    --place new directories in a cylinder group with a greater-than-average
      number of free inodes
    --as a file grows, use a heuristic: spill to the next cylinder group
      after 48 KB of file (the point at which an indirect block would be
      required, assuming 4096-byte blocks and 12 direct pointers) and at
      every megabyte thereafter

* bitmaps (to track free blocks)

    --easier to find contiguous blocks (a scan sketch appears after
      section 4)
    --can keep the entire thing in memory:
      500 GB disk / 4 KB blocks = ~125,000,000 bits = ~15 MB. not
      outrageous these days.

* reserve space

    --but don't tell users. (df makes a full disk look 110% full.)

* total performance

    --20-40% of disk bandwidth for large files
    --10-20x the original Unix file system!
    --still not the best we can do (metadata writes happen synchronously,
      which really hurts performance. but making them asynchronous
      requires a story for crash recovery.)

Other performance ideas:

--most obvious: a big file cache
    --the kernel maintains a *buffer cache* in memory
    --internally, every use of ReadDisk(blockNum, readbuf) is replaced
      with:

        ReadDiskCache(blockNum, readbuf) {
            ptr = buffercache.get(blockNum);
            if (ptr) {
                /* cache hit: serve from memory */
                copy BLKSIZE bytes from ptr to readbuf
            } else {
                /* cache miss: go to disk, then remember the block */
                newBuf = malloc(BLKSIZE);
                ReadDisk(blockNum, newBuf);
                buffercache.insert(blockNum, newBuf);
                copy BLKSIZE bytes from newBuf to readbuf
            }
        }

--no rotation delay if you're reading the whole track
    --so try to read the whole track
--more generally, try to work with big chunks (lots of disk blocks)
    --write in big chunks
    --read ahead in big chunks (64 KB)
    --why not just read/write 1 MB at a time?
        --for writes: may not get data to disk often enough
        --for reads: may waste read bandwidth

4. mmap

--recall some syscalls:

    fd = open(pathname, mode)
    write(fd, buf, sz)
    read(fd, buf, sz)

--we've seen file descriptors before, but what *is* an fd?
    --an index into a table maintained by the kernel on behalf of the
      process
    --what's in the given entry in that table?
        --inumber!
        --inode, probably!
        --and per-open-file data (file position, etc.)
--syscall:

    void* mmap(void* addr, size_t len, int prot, int flags, int fd,
               off_t offset);

    --means, roughly, "map the specified open file (fd) into a region of
      my virtual memory (at addr, or at a kernel-selected place if addr
      is 0), and return a pointer to it"
    --after this, loads and stores to addr[x] are equivalent to reading
      and writing the file at offset+x
--how's this implemented?! (answer: through virtual memory, with the VA
  being addr [or whatever the kernel selects] and the PA being what?
  answer: the physical address storing the given page in the kernel's
  buffer cache)
--we have to deal with eviction from the buffer cache, so the kernel will
  need a data structure that maps from:

        phys page --> {list of (proc, va) pairs}

  (sketched below.) note that the kernel needs this data structure
  anyway: when a page is evicted from RAM, the kernel needs to be able
  to invalidate the given virtual address in the page table(s) of the
  process(es) that have the page mapped.
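--to make the interface above concrete, a minimal usage sketch (the
  file name is made up for illustration, and error handling is
  abbreviated): map a file read-only and count newlines through
  ordinary loads.

    /* sketch: count newlines by mapping the file instead of read()ing it */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/tmp/example.txt", O_RDONLY);  /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) return 1;

        /* map the whole file, read-only; kernel picks the address (addr == 0) */
        char *p = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* from here on, p[i] *is* byte i of the file: loads are served
           through the page tables and the buffer cache, with no syscalls */
        long lines = 0;
        for (off_t i = 0; i < st.st_size; i++)
            if (p[i] == '\n')
                lines++;
        printf("%ld lines\n", lines);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }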
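--and a sketch of that reverse map as a data structure (all names here
  are invented for illustration; real kernels organize this differently):

    #include <stdint.h>

    struct proc;               /* process; details not needed for the sketch */

    /* one entry per virtual mapping of a physical page */
    struct mapping {
        struct proc    *proc;  /* process that has the page mapped */
        void           *va;    /* virtual address of the mapping */
        struct mapping *next;  /* next mapping of the same physical page */
    };

    struct phys_page {
        uint64_t        ppn;       /* physical page number */
        struct mapping *mappings;  /* the {(proc, va)} list from above */
    };

    /* on eviction: walk page->mappings, clear the PTE for each (proc, va),
       flush stale TLB entries, and only then reuse the physical page */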
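--finally, the scan sketch promised back in section 3: with one bit per
  block, finding a run of contiguous free blocks is a linear scan over
  the bitmap (the helper name is invented for illustration):

    #include <stddef.h>
    #include <stdint.h>

    /* one bit per disk block; a set bit means "free". returns the first
       block of the first run of `want` consecutive free blocks, or -1. */
    static long find_free_run(const uint8_t *bitmap, size_t nblocks,
                              size_t want) {
        size_t run = 0;
        for (size_t b = 0; b < nblocks; b++) {
            int is_free = (bitmap[b / 8] >> (b % 8)) & 1;
            run = is_free ? run + 1 : 0;
            if (run == want)
                return (long)(b - want + 1);
        }
        return -1;  /* no such run */
    }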
---------------------------------------------------------------------------

5. Midterm review

Ground rules: same as last time

Material: everything we've covered: readings, labs, homeworks, lectures
    --the only exception, as usual, is material newly covered in review
      sessions (but there shouldn't have been any)

lecture topics:

I/O
    --architecture
        --how CPUs and devices interact
        --mechanics (explicit I/O instructions, memory-mapped I/O,
          interrupts, memory)
    --polling vs. interrupts
    --DMA vs. programmed I/O
    --device drivers
    --async vs. sync I/O (non-blocking vs. blocking I/O)

virtual memory
    --segmentation
    --paging
    --segmentation vs. paging
    --virtual memory on the x86
        --virtual address: [10 bits | 10 bits | 12 bits]
        --entry in pgdir and page table:
          [20-bit physical page number | more bits | bottom 3 bits]
        --protection lives in those bottom bits
          (user/kernel | read/write | present/not)
        --(a decomposition sketch appears at the end of these notes)
    --what's a TLB?
    --page faults: mechanics, costs, uses
    --page replacement policies (FIFO, LRU, CLOCK, OPT)
    --thrashing
      [the latter two topics are more general than paging: they apply to
      caching in general]

alternatives to preemptively scheduled shared-memory threads
    --cooperatively scheduled threads, implemented in user space
    --event-driven programming

disks
    --geometry
    --performance
    --interface
    --scheduling (skipped in lecture; see the book)
    --technology trends

file systems
    --basic objects: files, directories, metadata, links, inodes
    --how does naming work? what allows the system to map
      /usr/homes/bob/index.html to a file object?
    --types of file layout
        --extents/contiguous, linked, FAT, indexed structure
        --classic Unix and FFS are variants of the indexed structure
        --analogy between inode and page directory
        --tradeoffs
    --performance

questions from all of you ...
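(as a refresher for the x86 item in the list above, a sketch of how a
32-bit virtual address decomposes; the macro names follow the style of
xv6/JOS and are reproduced here for illustration:)

    #include <stdint.h>
    #include <stdio.h>

    /* 32-bit x86 VA: [ 10-bit pgdir index | 10-bit page-table index | 12-bit offset ] */
    #define PDX(va)   (((uint32_t)(va) >> 22) & 0x3FF)  /* top 10 bits   */
    #define PTX(va)   (((uint32_t)(va) >> 12) & 0x3FF)  /* next 10 bits  */
    #define PGOFF(va) ((uint32_t)(va) & 0xFFF)          /* bottom 12 bits */

    int main(void) {
        uint32_t va = 0x08048a74;  /* arbitrary example address */
        printf("pgdir index %u, page-table index %u, offset 0x%03x\n",
               (unsigned)PDX(va), (unsigned)PTX(va), (unsigned)PGOFF(va));
        return 0;
    }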