Class 20
CS 202
21 April 2015

On the board
------------

1. Last time 
2. Transactions: isolation
3. RPC, client/server systems
4. NFS (case study of client/server)

---------------------------------------------------------------------------

1. Last time

    --saw fault tolerance and crash recovery via logging

    --we sketched a simple undo/redo protocol, and also discussed the
    amendments that could make it an undo-only protocol, or a redo-only
    protocol

        --NOTE: in everyday life, go with redo logging! (Undo logging is
        needed in the database context but not for many other things.)

    --today, move to the second piece of transactions that we will
    cover: handling concurrent, conflicting transactions (recall that we
    assumed last time that transactions did not conflict)

2. Transactions: isolation

    --easiest approach: one giant lock. only one transaction active
    at a time. so everything really is serialized
	--advantage: easy to reason about
	--disadvantage: no concurrency

    --next approach: fine-grained locks (e.g., one per cell, or
    per-table, or whatever), and acquire all needed locks at
    begin_transaction and release all of them at end_transaction
	--advantage: easy to reason about. works great if it could
	be implemented
	--disadvantage: requires transaction to know all of its
	needed locks in advance

    --actual approach: two-phase locking. gradually acquire locks as
    needed inside the transaction manager (phase 1), and then
    release all of them together at the commit point (phase 2)

	--your intuition from the concurrency unit will tell you
	that this creates a problem....namely deadlock. we'll come
	back to that in a second.

	--why does this actually preserve a serial ordering? here's
	an informal argument:

	    --consider the _lock point_, that is, the point in time at
	    which the transaction owns all of the locks that it will
	    ever acquire.

	    --consider any lock that is acquired. call it L. from the
	    point that L is acquired to the lock point, the application
	    always sees the same values for the data that lock L
	    protects (because no other transaction or thread can get the
	    lock).

	    --so regard the application as having done all of its
	    reads and writes instantly at the lock point

	    --the lock points create the needed serialization.
	    Here's why, informally.  Regard the transactions as
	    having taken place in the order given by their lock
	    points. Okay, but how do we know that the lock points
	    serialize? Answer: each lock point takes place at an
	    instant, and any two transactions with intersecting lock
	    sets must have their lock points ordered with respect to
	    each other: between its lock point and its release, a
	    transaction holds every lock in its set, so mutual
	    exclusion keeps the other transaction from reaching its
	    own lock point in that window.
	    
	--to fix deadlock, several possibilities:

	    --one of them is to remove the no-preempt condition:
	    have the transaction manager abort transactions (roll
	    them back) after a timeout
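
    [Sketch, not from lecture. A minimal Python rendering of two-phase
    locking with timeout-based abort, under simplifying assumptions:
    one lock per cell, undo information kept only in memory, no crash
    recovery. The names TxnManager, Txn, Abort, and LOCK_TIMEOUT are
    made up for illustration.]

```python
import threading

LOCK_TIMEOUT = 0.2  # seconds; hypothetical value for the deadlock timeout


class Abort(Exception):
    """Raised when a transaction is rolled back (here: lock timeout)."""


class TxnManager:
    def __init__(self, cells):
        self.data = dict(cells)
        # one lock per cell (fine-grained locking)
        self.locks = {k: threading.Lock() for k in cells}

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, mgr):
        self.mgr = mgr
        self.held = []   # locks acquired so far (grows during phase 1)
        self.undo = {}   # original values, for rollback on abort

    def _lock(self, key):
        lock = self.mgr.locks[key]
        if lock in self.held:
            return
        # phase 1: acquire on first touch; give up after a timeout
        if not lock.acquire(timeout=LOCK_TIMEOUT):
            self.abort()
            raise Abort(key)
        self.held.append(lock)

    def read(self, key):
        self._lock(key)
        return self.mgr.data[key]

    def write(self, key, value):
        self._lock(key)
        self.undo.setdefault(key, self.mgr.data[key])
        self.mgr.data[key] = value

    def _release_all(self):
        for lock in self.held:
            lock.release()
        self.held = []

    def commit(self):
        # phase 2: release everything together at the commit point
        self._release_all()

    def abort(self):
        # roll back this transaction's writes, then release its locks
        for key, old in self.undo.items():
            self.mgr.data[key] = old
        self.undo = {}
        self._release_all()
```

    [A transaction that times out waiting for a lock rolls back and
    frees what it holds, so the other party to a would-be deadlock can
    make progress. The caller can catch Abort and retry.]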

---------------------------------------------------------------------------

Source for a lot of the transactions material:

    J. H. Saltzer and M. F. Kaashoek, Principles of Computer System
    Design: An Introduction, Morgan Kaufmann, Burlington, MA, 2009.
    Chapter 9. Available online.

---------------------------------------------------------------------------

3. RPC, client/server systems

    --what the heck is RPC? (compare to local function calls.)

    --client/server systems

    --potential of RPC: fantastic way to build distributed systems
       --RPC system takes care of all the distributed/network issues

    --how well does all of this work?

    --question to begin answering for yourself: does RPC look like a
    local function call, or no?
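
    [Sketch, not from lecture. A toy RPC in Python to make the
    question above concrete: the client stub marshals a call into
    bytes, a stand-in "network" delivers it, and the server unmarshals
    and dispatches. Everything here (ClientStub, Network,
    server_handle, the JSON wire format) is a made-up illustration,
    not a real RPC library.]

```python
import json


class RPCTimeout(Exception):
    """A failure mode that a local function call does not have."""


def add(x, y):
    # the "remote" procedure; it lives at the server
    return x + y


SERVER_PROCS = {"add": add}


def server_handle(request_bytes):
    # server side: unmarshal, dispatch, marshal the result
    req = json.loads(request_bytes.decode())
    result = SERVER_PROCS[req["proc"]](*req["args"])
    return json.dumps({"result": result}).encode()


class Network:
    """Stand-in for a real network; can be told to lose messages."""

    def __init__(self):
        self.drop = False

    def send_and_wait(self, request_bytes):
        if self.drop:
            return None  # request or reply lost
        return server_handle(request_bytes)


class ClientStub:
    def __init__(self, net):
        self.net = net

    def add(self, x, y):
        # to the caller this looks like a local call...
        request = json.dumps({"proc": "add", "args": [x, y]}).encode()
        reply = self.net.send_and_wait(request)
        if reply is None:
            # ...but the client cannot tell "server never ran it" from
            # "server ran it and the reply was lost"
            raise RPCTimeout("add")
        return json.loads(reply.decode())["result"]
```

    [stub.add(2, 3) reads like a local call, but it can raise
    RPCTimeout, which add(2, 3) never could.]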

4. NFS: case study of client/server, and case study of network file system

    Networked file systems:

	--What's a network file system?
	    --Looks like a file system (e.g., FFS) to applications
	    --But data potentially stored on another machine
	    --Reads and writes must go over the network
	    --Also called distributed file systems
  
	--Advantages of network file systems
	    --Easy to share if files available on multiple machines
	    --Often easier to administer servers than clients
	    --Access way more data than fits on your local disk
	    --Network + remote buffer cache faster than local disk
	      
	--Disadvantages
	    --Network + remote disk slower than local disk
	    --Network or server may fail even when client OK
	    --Complexity, security issues

    NFS: seminal networked file system (NFS = Network File System)
	
    * Intro and background
    * How it works
    * Statelessness
    * Transparency
    * [next time] Security

    A. Intro and background

	--Reasons to study it

	    --case study of RPC transparency

	    --NFS was very successful.

	    --Still in widespread use today (CIMS machines, for example)

	    --Much research uses it.

	    --Can view much networked file systems research as fixing
	    problems with NFS

	--background and context
	    --designed in mid 1980s
	    --before this, Sun was selling Unix workstations
		--diskless (to save money)
		--"ND" network disk protocol (use one big central disk,
		and let the diskless workstations use it)
		--allowed disk to live somewhere else, but did not allow for
		a shared file system (every workstation had its own
		partition of the ND, *not* a shared file system)

	   More detail on context:

	   NFS arose in the early-to-mid 1980s. Prior to NFS, each
	   computer had its own private disk and file system. That
	   worked for expensive central time-sharing systems when there
	   weren't many workstations. But in the LAN environment, with
	   workstations becoming cheaper, people wanted ways to share
	   files within organizations. The goal was to allow a user to
	   sit down at any workstation and access his or her files even
	   though the files might live on a central server. 

	       --Advantages:
		  --convenience (get your files anywhere)
		  --cost (buy workstations without disks)
	      
	--only sysadmin has to know where files live. shell, user
	program, etc. do _not_ have to know (way better than competitors
	at the time)


    B. How it works

	--What's the software/hardware structure?

	    [DRAW PICTURE]

	  --array of vnodes in both client and server

	      --vnode like a primitive C++ or Java object, with methods

	      --represents an open (or openable) file

	     --Bunch of generic "vnode operations":

		--lookup, create, open, close, getattr, setattr, read,
		write, fsync, remove, link, rename, mkdir, rmdir, symlink,
		readdir, readlink, ...

		--Called through function pointers, so most system calls don't
		care what type of file system a file resides on

		--We have seen this function-pointer-as-abstraction
		pattern before with device drivers (more generally, it
		shows up everywhere, for instance in your FUSE-based lab).
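
	[Sketch, not from lecture. The function-pointer pattern in
	Python: each vnode carries a table of operations, and the generic
	layer calls through the table without knowing the file system
	type. Vnode, LOCAL_OPS, NFS_OPS, FAKE_SERVER, and sys_read are
	illustrative names; a real kernel does this in C with a struct
	of function pointers.]

```python
class Vnode:
    def __init__(self, ops, private):
        self.ops = ops          # table of "function pointers"
        self.private = private  # file-system-specific state


# --- a local, in-memory file system ---
def local_read(vn, offset, n):
    return vn.private["bytes"][offset:offset + n]


LOCAL_OPS = {"read": local_read}


# --- a stand-in for the NFS client: read turns into a server request ---
FAKE_SERVER = {"fh-1": b"remote contents"}  # hypothetical server state


def nfs_read(vn, offset, n):
    fh = vn.private["fh"]  # file handle stored with the vnode
    # in a real client this line would be a READ RPC
    return FAKE_SERVER[fh][offset:offset + n]


NFS_OPS = {"read": nfs_read}


def sys_read(vn, offset, n):
    # the generic layer neither knows nor cares which file system this is
    return vn.ops["read"](vn, offset, n)
```

	[sys_read() works unchanged on both kinds of vnode.]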
     
	--NFS implements vnode operations through RPC
	    --Client request to server over network, awaits response
	    --Each system call may require a series of RPCs
	    --System mostly determined by NFS RPC **protocol**

	--How does it work?

	    [TRACE RPC FOR OPEN AND WRITE: LOOKUP AND WRITE]


	--nice separation between interface and implementation

	    --loopback server

	    --replace NFS server altogether with something that *acts*
	    like an NFS server to the client.

	    --can make lots of things *look* like a file system just by
	    implementing the NFS interface. extremely powerful technique

	    --this gain mostly arises because of the power of RPC and
	    modularity, rather than anything about NFS in particular
	 
	--What does a file handle look like?

	    [FS ID | inode # | generation #]

	    Why not embed file name in file handle? (file names can change;
	    would mess everything up. client needs to use an identifier
	    that's invariant across such renames.)

	    How does client know what file handle to send? (stored with the
	    vnode)
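
	    [Sketch, not from lecture. One way to picture the three
	    fields as bytes, in Python. The field widths (three 32-bit
	    integers) are an assumption for illustration; clients treat
	    the handle as opaque and simply hand it back to the server.]

```python
import struct

FH_FORMAT = "!III"  # assumed layout: FS ID, inode #, generation #


def make_fh(fsid, inum, gen):
    # server side: build the handle that gets returned from LOOKUP
    return struct.pack(FH_FORMAT, fsid, inum, gen)


def parse_fh(fh):
    # server side: recover the fields from a handle the client sent back
    return struct.unpack(FH_FORMAT, fh)
```

	    [Nothing in the handle depends on the file's name, so a
	    rename leaves it valid.]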

    C. Statelessness

	--What the heck do they mean? The file server keeps files;
	that's certainly state!!

	    --What they really mean is that every network protocol
	    request contains all of the information needed to carry out
	    that request, without relying on anything remembered from
	    previous protocol requests.

	    --convince yourself of this by looking at the calls

	    --but are operations really idempotent? (hint: no, not all
	    of them.)

		--what happens if a rename() is sent, and the reply
		to it is lost? client times out and sends the same rename()
		again. then the retransmitted one returns an error code
		(the source name no longer exists), even though the
		operation conceptually succeeded.

		    --similar issue with "mkdir", "create", etc.
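
		[Sketch, not from lecture. The lost-reply scenario in
		Python. RenameServer and client_rename are made-up names,
		and the "network" is just a flag that says whether the
		first reply is dropped.]

```python
class RenameServer:
    def __init__(self):
        self.names = {"old.txt": 17}  # name -> inode number

    def rename(self, src, dst):
        if src not in self.names:
            return "ENOENT"
        self.names[dst] = self.names.pop(src)
        return "OK"


def client_rename(server, src, dst, lose_first_reply):
    reply = server.rename(src, dst)
    if lose_first_reply:
        # the reply never arrives: client times out and retransmits
        reply = server.rename(src, dst)
    return reply
```

		[With the first reply lost, the client sees ENOENT even
		though the rename took effect at the server.]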

	--How are READ and WRITE stateless?

	    (Answer: each request contains the file handle (which names
	    the inode at the server) as well as an offset.)
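
	    [Sketch, not from lecture. A toy server in Python whose
	    read and write take (file handle, offset) on every call, so
	    it needs no per-client table of open files or current
	    positions. StatelessServer is a made-up name and files are
	    plain in-memory byte arrays.]

```python
class StatelessServer:
    def __init__(self):
        self.files = {}  # file handle -> bytearray; no per-client tables

    def write(self, fh, offset, data):
        buf = self.files.setdefault(fh, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # fill any hole with zeros
        buf[offset:offset + len(data)] = data
        return len(data)

    def read(self, fh, offset, count):
        return bytes(self.files.get(fh, b"")[offset:offset + count])
```

	    [Because the offset travels in the request, a retransmitted
	    READ or WRITE has the same effect as the original. Compare
	    the Unix read(fd, buf, n), which depends on a file position
	    that the kernel remembers between calls.]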

	--What are the advantages and disadvantages?

	    +: simplifies implementation
	    +: simplifies server failure recovery 
	    -: messes up traditional Unix semantics; will discuss below

	--What happens if the server reboots while the client has a file open?

	   --Nothing!

	   --Client just uses the same file handle.
	     (file handles are usable across server failures.)

	   --NOTE: a crashed and rebooted server looks the same to
	   clients as a slow server. Which is cool.

	--Why doesn't NFS have RPCs called OPEN() and CLOSE()?

    D. Transparency and non-traditional Unix semantics

	--Note: transparency is not just about preserving the syscall
	API (which they do). Transparency requires that the system calls
	*mean* the same things. Otherwise, existing programs may compile
	and run but experience different behavior. In other words,
	formerly correct programs may now be incorrect. (This happened
	with NFS because of its close-to-open consistency.)

	--what is generation number for?

		(*) What if client A deletes a file and it (or another
		client) creates a new one that uses the same i-node?

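		--generation number prevents the old FH from silently
		naming the new file
		    --client holding the old FH gets a "stale FH" error instead

		--served file systems must support generation numbers

		--So not fully transparent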

		More detail:
    
		For *all* files that could ever be exposed by NFS, the
		server stores, in the i-node on disk, a generation
		number. Every time the server allocates a given i-node,
		it increments the i-node's generation number. When the
		server passes a FH to the client (say, in response to a
		LOOKUP RPC from the client), the server puts the given
		i-node's _current_ generation number in the FH. 

		How: The way the generation number avoids problems that
		arise from the special case in (*) is as follows: for
		each request the client makes of the server, the server
		checks to see whether the generation number in the
		client's FH matches the on-disk generation number for
		the i-node in question. If so, the client has a current
		FH, and the special case has not arisen. If not, the
		client's generation number must be older, so we are in
		the special case, and the client gets a "stale FH" error
		when it tries to READ() or WRITE().

		Why: Without the generation number, the special case in
		(*) would cause a client to read and write data it had
		no business reading or writing (since the given i-node
		now belongs to some other file).
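
		[Sketch, not from lecture. The generation-number check in
		Python. GenServer and StaleFH are made-up names, the FH is
		reduced to an (i-node number, generation) pair, and the
		"disk" is a dictionary.]

```python
class StaleFH(Exception):
    pass


class GenServer:
    def __init__(self):
        # i-node number -> {"gen": int, "data": bytes, "in_use": bool}
        self.inodes = {}

    def create(self, inum, data):
        ino = self.inodes.setdefault(
            inum, {"gen": 0, "data": b"", "in_use": False})
        ino["gen"] += 1            # bump on every allocation of this i-node
        ino["data"] = data
        ino["in_use"] = True
        return (inum, ino["gen"])  # the FH carries the current generation

    def remove(self, inum):
        self.inodes[inum]["in_use"] = False

    def read(self, fh):
        inum, gen = fh
        ino = self.inodes[inum]
        # compare the FH's generation with the one stored in the i-node
        if not ino["in_use"] or ino["gen"] != gen:
            raise StaleFH(fh)
        return ino["data"]
```

		[If a file is created in i-node 5, removed, and a new file
		is created in i-node 5, then a read with the first FH raises
		StaleFH instead of returning the new file's data.]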

	
	--non-traditional Unix semantics

	(i) we mentioned one example above: error returns on successful
	operations. now we'll go through some other examples of changed
	semantics

	(ii) server failure	

	    --previously, open() failed only for local reasons (e.g.,
	    file didn't exist)

	    --now, if server has failed, open() can fail or apps hang
	    [fundamental trade-off if server is remote]

	(iii) deletion or permissions change of open files

            (a) What if client A deletes a file that client B has "open"?
		--Unix: B's reads still work (file exists until every
		   process that has it open close()s it)
		--NFS: B's reads fail

	        --Why?

		    --To get Unix-like behavior using NFS, server would
		    have to keep track of all kinds of stuff. That state
		    would have to persist across reboots.

		    --But they wanted stateless server

		    --So NFS just does the wrong thing: RPCs fail if
		    another client deletes a file you have open. That's
		    where "stale file handle" errors come from.

	    --(Hack if the *same* client does the delete: the NFS client
	    asks the NFS server to do a rename to .nfsXXX instead.)

            (b) "chmod -r 0700" while file is open()

                similar issue to the one above, in (a)

		(in Unix, nothing happens: permissions are checked at
		open(), not on each read(). in NFS, future reads fail,
		since NFS checks permissions on every RPC.)

    
        (iv) execute-only implies read, unlike in Unix

		(in Unix, the operating system draws a distinction
		between demand-paging in the executable, to execute it,
		versus returning bytes in a file to a requesting
		program. a user might have permission to do one, or the
		other, or both. in NFS, the NFS server cannot care about
		this distinction because the NFS client needs the data
		blocks in the file, period. thus, if a file is marked
		execute-only on the NFS server, the NFS client will
		still be able to read it if the NFS client really wants:
		once the NFS client has the data blocks, it has the
		data blocks.)

		 (put differently, under NFS, once a client has the
		 file, it has the file. compare to Unix, where Unix
		 really can execute a file for a user but not let the
		 user read it.)

        (v) big one: close-to-open consistency

            --result: can get errors on close() [which legacy apps would
            not have expected] instead of write(). means the app has to
            change. if server ran out of space, the app finds out about
            it at a different point than if the file system were local.

            --another issue: the following pattern does not work well: 
                "some_proc > out" on one client ; "tail -f out" on another
            issue is that the second client may not find out about the
            updates done by the first client

            --let's look at the source of these issues...

	    * Server must flush to disk before returning (why?)
		  --Inode with new block # and new length safe on disk.
		  --Indirect block safe on disk.
		  --So writes have to be synchronous
		  --So why isn't performance bad?
		     caching. not all RPCs actually go to server [see
		     below]

		[NFSv3 handles this a bit better. WRITE()s go to server
		but don't necessarily cause disk accesses at server.]

	    * what kind of caching do they have?

		  --Read-caching of data. Why does this help? (re-reading files)

		  --Write-caching of data. Why does this help? (see above)

		  --Caching of file attributes. Why does this help? 
		    A command like "ls -l" gets all of the file
		    attributes, at which point a successive open can
		    work against the cached copy of the file or the
		    remote copy, depending on how recently the file was
		    updated

		  --Caching of name->fh mappings. Why does this help?
		  (cache prefix like /home/bob)

            * but once you have a cache, you have to worry about
            coherence and semantics. what kind of coherence/consistency
            does it actually give? (Answer: close-to-open)
	    
		If A does write(), then close(), and B afterward does
		    open(), read(), then B sees A's data. Otherwise, B may
		    have an "old" picture because data not sent by A until
		    close().
     
                At a high level, how do they implement it?
		    --writing client forces dirty blocks to the server
		    during a close()

		    --reading client checks with server during open()
		    and asks, "is this data current?"
                       (can reduce traffic from this last one by caching
                       file attributes)

            * this is the source of the issues above (that close() can
            fail, etc.)

	    * Why do they give this guarantee instead of a stronger
	    guarantee? (Performance. They are trading off the
	    semantics for performance.)
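
	    [Sketch, not from lecture. Close-to-open consistency in
	    Python, heavily simplified: one file, whole-file caching, and
	    a version counter standing in for the modification time in
	    the file attributes. FileServer, CachingClient, and NoSpace
	    are made-up names.]

```python
class NoSpace(Exception):
    pass


class FileServer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = b""
        self.version = 0  # stand-in for mtime in the file attributes

    def getattr(self):
        return self.version

    def read(self):
        return self.data

    def write(self, data):
        if len(data) > self.capacity:
            raise NoSpace()
        self.data = data
        self.version += 1


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cached = None
        self.cached_version = None
        self.dirty = False

    def open(self):
        # reading side: ask the server "is my cached copy current?"
        v = self.server.getattr()
        if v != self.cached_version:
            self.cached = self.server.read()
            self.cached_version = v

    def read(self):
        return self.cached  # served from the cache, no server contact

    def write(self, data):
        self.cached = data  # buffered locally; server not contacted
        self.dirty = True

    def close(self):
        # writing side: force dirty data to the server; errors surface here
        if self.dirty:
            self.server.write(self.cached)
            self.cached_version = self.server.getattr()
            self.dirty = False
```

	    [A reader that opened before the writer's close() keeps
	    seeing old data until it opens again, and an out-of-space
	    error shows up at close() rather than at write().]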

	
---
	Areas of RPC non-transparency (a more general point than NFS)
	  * Partial failure, network failure
	  * Latency
	  * Efficiency/semantics tradeoff
	  * Security. You can rarely deal with it transparently (next
	    time)
	  * Pointers. Write-sharing. Portable object references are hard
	    under RPC
	  * Concurrency (if multiple clients)
	  Solution 1: expose RPC to application
	  Solution 2: work harder on transparent RPC
---

    E. Concluding note

	--None of the above issues prevent NFS from being useful.
	    --People fix their programs to handle new semantics.
	    --Or install firewalls for security.
	    --And get most advantages of transparent client/server.

    References
    --"RFC 1094": NFS v2
    --"RFC 1813": NFS v3

[thanks to Robert Morris for some of the NFS content.]