Class 17
CS 202
2 April 2020

---------------------------------------------------------------------------

On the board
------------

1. Last time
2. mmap(), continued
3. File systems
     intro
     files
     implementing files
        preface
        contiguous
        linked
        indexed

---------------------------------------------------------------------------

1. Last time
    - user-level context switches
    - disks
    - mmap()

2. mmap(), continued

    --why is this cool?

        - example: mmap enables copying a file to stdout without
        transferring data to user space

            see handout

            NOTE: the process never itself dereferences a pointer to
            memory containing file data.

            NOTE: this saves two sets of memory-to-memory copies
            (kernel-to-user, user-to-kernel), versus the "naive"
            solution of read()ing into a buffer in user space, and then
            write()ing

            [Also, a well-tuned buffer cache manages which file pages
            are kept in RAM, rather than leaving the app developer to
            have to explicitly try to manage that (and potentially have
            the OS page replacement algorithm underneath make
            conflicting decisions).]

        - other examples:

            - reading big files. map the whole thing, rely on the paging
            mechanism to bring the needed pieces into memory as necessary

            - shared data structures, when flag is MAP_SHARED 

            - file-based data structures:
                - load data from file, update it, write it back
                - this is implemented entirely with loads/stores

            Question: how does the OS ensure that it's only writing back
            modified pages?


    --how's mmap implemented?! (answer: through virtual memory,
    with the VA being addr [or whatever the kernel selects] and
    the PA being what? answer: the physical address storing the
    given page in the kernel's buffer cache).

    --have to deal with eviction from buffer cache, so kernel will need
    a data structure that maps from:
        Phys page --> {list of (proc,va) pairs}
      
    note that the kernel needs this data structure anyway: when a
    page is evicted from RAM, the kernel needs to be able to invalidate
    the given virtual address in the page table(s) of the process(es)
    that have the page mapped.


3. File systems

    A. intro
    B. files
    C. implementing files
        1. contiguous
        2. linked files
        3. indexed files

    A. intro

    --what does a FS do?

	--provide persistence (don't go away ... ever)

	--somehow associate bytes on the disk with names (files)

	--somehow associates names with each other (directories)

    --where are FSes implemented?

	--can implement them on disk, over network, in memory, in NVRAM
	(non-volatile RAM), on tape, with paper (!!!!)

	--we are going to focus on the disk and generalize later. we'll
	see what it means to implement a FS over the network
 
    --a few quick notes about disks in the context of FS design

	--disk is the first thing we've seen that (a) doesn't go away;
	and (b) we can modify (BIOS ROM, hardware configuration, etc.
	don't go away, but we weren't able to modify these things). two
	implications here:

	    (i) we're going to have to put all of our important state on
	    the disk

	    (ii) we have to live with what we put on the disk! scribble
	    randomly on memory --> reboot and hope it doesn't happen
	    again. scribbe randomly on the disk --> now what? (answer:
	    in many cases, we're hosed.)


    B. Files

	--what is a file?
	    --answer from user's view: a bunch of named bytes on the disk
	    --answer from FS's view: collection of disk blocks
	
	--big job of a FS: map name and offset to disk blocks
	   
                                 FS
                   {file,offset} --> disk address
	    
	    --operations are create(file), delete(file), read(), write()

	    --***goal: operations have as few disk accesses as possible
	    and minimal space overhead
	    
		--wait, why do we want minimal space overhead, given that
		the disk is huge?

		--answer: cache space never enough; the amount of data
		that can be retrieved in one fetch is never enough.
		hence, really don't want to waste.

	[[--note that we have seen translation/indirection before:

	    page table:

		                    page table 
		    virtual address ----------> physical address

    
	    per-file metadata:

			    inode
		    offset ------>  disk block address


	    how'd we get the inode?

			       directory
		    file name ----------> file # 
		    
		(file # *is* an inode in Unix)
		    		
	    ]]


    C.  Implementing files

	--our task: meet the goal marked *** above. 

	--NOTE: for most of today we're going to assume that the file's
	metadata is known to the system
	    
	    --> when we look at directories in a bit, we'll see where
	    the metadata comes from; the above picture should also give
	    a hint
    
	access patterns we could imagine supporting:

	(i) Sequential:
	    --File data processed in sequential order
	    --By far the most common mode
	    --Example: editor writes out new file, compiler reads in file, etc

	(ii) Random access:
	    --Address any block in file directly without passing through
	    --Examples: large data set, demand paging, databases

	(iii) Keyed access
	    --Search for block with particular values
	    --Examples: associative database, index
	    --This thing is everywhere in the field of databases,
	    search engines, but....
	    --...usually not provided by a FS in OS

	helpful observations:

	(i) All blocks in file tend to be used together, sequentially 

	(ii) All files in directory tend to be used together

	(iii) All *names* in directory tend to be used together

	further design parameters:

	(i) Most files are small 
	
	(ii) Much of the disk is allocated to large files

	(iii) Many of the I/O operations are made to large files

	(iv) Want good sequential and good random access 

	candidate designs........

	1. contiguous allocation 

	  "extent based"
	  --when creating a file, make user pre-specify its length, and
	  allocate the space at once
	  --file metadata contains location and size

	  --example: IBM OS/360

		[<free> a1 a2 a3 <5 free> b1 b2 <free> ]

		what if a file c needs 7 sectors?!
	  
	  +: simple
	  +: fast access, both sequential and random
	  -: fragmentation
	  
	2. linked files
	    
	    --keep a linked list of free blocks
	    --metadata: pointer to file's first block
	    --each block holds pointer to next one

	  +: no more fragmentation
	  +: sequential access easy (and probably mostly fast, assuming
	     decent free space management, since the pointers will point
	     close by)
	  -: random access is a disaster
	  -: pointers take up room in blocks; messes up alignment of
	  data

	3. indexed files

	    [DRAW PICTURE]

	    --Each file has an array holding all of its block pointers
		--like a page table, so similar issues crop up

	    --Allocate this array on file creation

	    --Allocate blocks on demand (using free list)
	   
	    +: sequential and random access are both easy
	    -: need to somehow store the array

	    --large possible file size --> lots of unused entries in the
	    block array

	    --large actual block size --> huge contiguous disk chunk
	    needed

	    --solve the problem the same way we did for page tables:

			[............]

		[..........]	    [.........]

	    [ block     block                         block]

            --[above is a drawing of a balanced tree, like the 4-level
            page tables we saw for x86-64.]

	    --okay, so now we're not wasting disk blocks, but what's the
	    problem? (answer: equivalent issues as for page table
	    walking: here, it's extra disk accesses to look up the blocks)

	    --this motivates the classic Unix file system 

	    --inode contains:

		permisssions
		times for file access, file modification, and inode-change
		link count (# directories containing file)
		ptr 1  --> data block
		ptr 2  --> data block
		ptr 3  --> data block
		.....
		ptr 11  --> indirect block 
				      ptr --> 
				      ptr --> 
				      ptr --> 
				      ptr -->
				      ptr -->
		ptr 12 --> indirect block
		ptr 13 --> double indirect block
		ptr 14 --> triple indirect block

    This is just a tree.

    Question: why is this tree intentionally imbalanced?

        (Answer: optimize for short files. each level of this tree
        requires a disk seek...)

    Pluses/minuses:

    +: Simple, easy to build, fast access to small files

    +: Maximum file length can be enormous, with
       multiple levels of indirection 

    -: worst case # of accesses pretty bad

    -: worst case overhead (such as 11 block file) pretty bad

    -: Because you allocate blocks by taking them off unordered
       freelist, metadata and data get strewn across disk


    Notes about inodes:

    --stored in a fixed-size array

    --Size of array fixed when disk is initialized; can't be changed

    --Multiple inodes in a disk block

    --Lives in known location, originally at one side of disk,
    now lives in pieces across disk (helps keep metadata close
    to data)

    --The index of an inode in the inode array is called an
    ***i-number***

    --Internally, the OS refers to files by i-number

    --When a file is opened, the inode brought in memory

    --Written back when modified and file closed or time elapses