Class 21
CS 202
16 April 2020

On the board
------------

1. Last time
2. NFS, continued
    - finish statelessness
    - transparency
3. Debugging, I
    A. intro
    B. attaching to, and controlling, a process
    C. reading and writing a program's memory and registers
    D. resolving addresses to program names/state
    E. single-stepping
    F. breakpoints
    G. watchpoints

---------------------------------------------------------------------------

1. Last time

    - finished crash recovery
    - introduced NFS
    - discussed what it means to be stateless

    Today: finish NFS
    Then: debuggers. Thanks, Panda, for this material.

2. NFS, continued

  C. Statelessness, continued

    --Why doesn't NFS have RPCs called OPEN() and CLOSE()?

  D. Transparency and non-traditional Unix semantics

    --Note: transparency is not just about preserving the syscall API
      (which they do). Transparency requires that the system calls *mean*
      the same things. Otherwise, existing programs may compile and run
      but experience different behavior. In other words, formerly correct
      programs may now be incorrect. (This happened with NFS because of
      its close-to-open consistency.)

    --what is generation number for?

        (*) What if client A deletes a file and it (or another client)
            creates a new one that uses the same i-node?

        --generation number prevents
        --Stale FH error
        --served file systems must support
        --So not fully transparent

        More detail:

        For *all* files that could ever be exposed by NFS, the server
        stores, in the i-node on disk, a generation number. Every time
        the server allocates a given i-node, it increments the i-node's
        generation number. When the server passes a FH to the client
        (say, in response to a LOOKUP RPC from the client), the server
        puts the given i-node's _current_ generation number in the FH.

        How: The way the generation number avoids problems that arise
        from the special case in (*) is as follows: for each request the
        client makes of the server, the server checks to see whether the
        generation number in the client's FH matches the on-disk
        generation number for the i-node in question. If so, the client
        has a current FH, and the special case has not arisen. If not,
        the client's generation number must be older, so we are in the
        special case, and the client gets a "stale FH" error when it
        tries to READ() or WRITE().

        Why: Without the generation number, the special case in (*)
        would cause a client to read and write data it had no business
        reading or writing (since the given i-node now belongs to some
        other file).

    --non-traditional Unix semantics

        (i) we mentioned one example above: error returns on successful
            operations. now we'll go through some other examples of
            changed semantics

        (ii) server failure

            --previously, open("some_file", RD_ONLY) failed only if file
              didn't exist
            --now, if server has failed, open() can fail or apps hang
              [fundamental trade-off if server is remote]

        (iii) deletion or permissions change of open files

            (a) What if client A deletes a file that client B has "open"?

                --Unix: my reads still work (file exists until all
                  clients close() it)
                --NFS: my reads fail
                --Why?
                    --To get Unix-like behavior using NFS, server would
                      have to keep track of all kinds of stuff. That
                      state would have to persist across reboots.
                    --But they wanted stateless server
                --So NFS just does the wrong thing: RPCs fail if another
                  client deletes a file you have open.
                --(Hack if the *same* client does the delete: the NFS
                  client asks the NFS server to do a rename to .nfsXXX.
                  That's another place that stale file handles come
                  from.)
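            [Aside: a minimal sketch, not from the original notes, of
             what client B's code in (a) might observe. It assumes the
             file lives on an NFS mount (the path below is made up) and
             that client A, on another machine, unlinks the file during
             the sleep(). On a local Unix file system the read()
             succeeds; an NFS client typically reports something like a
             "stale file handle" error (the exact errno can vary).

                /* Client B (illustrative): open a file, then read it
                   after a delay, during which client A unlinks it.  On
                   a local FS the read succeeds; on NFS it can fail,
                   e.g., with ESTALE ("Stale file handle"), because the
                   server kept no state about B's open(). */
                #include <errno.h>
                #include <fcntl.h>
                #include <stdio.h>
                #include <string.h>
                #include <unistd.h>

                int main(void) {
                    char buf[128];
                    int fd = open("/mnt/nfs/shared_file", O_RDONLY); /* path made up */
                    if (fd < 0) { perror("open"); return 1; }

                    sleep(30); /* ...meanwhile, client A unlinks the file... */

                    ssize_t n = read(fd, buf, sizeof(buf));
                    if (n < 0)
                        printf("read failed: %s\n", strerror(errno)); /* NFS surprise */
                    else
                        printf("read %zd bytes\n", n);

                    close(fd);
                    return 0;
                }
            ]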
(b) "chmod -r 0700" while file is open() similar issue to the one above, in (a) (in Unix, nothing happens. in NFS, future reads fail, since NFS checks permissions on every RPC.) (iv) execute-only implies read, unlike in Unix (in Unix, the operating system draws a distinction between demand-paging in the executable, to execute it, versus returning bytes in a file to a requesting program. a user might have permission to do one, or the other, or both. in NFS, the NFS server cannot care about this distinction because the NFS client needs the data blocks in the file, period. thus, if a file is marked execute-only on the NFS server, the NFS client will still be able to read it if the NFS client really wants (once the NFS client has the data blocks, it has the data blocks). (put differently, under NFS, once a client has the file, it has the file. compare to Unix, where Unix really can execute a file for a user but not let the user read it.) (v) big one: close-to-open consistency --result: can get errors on close() [which legacy apps would not have expected] instead of write(). means the app has to change. if server ran out of space, the app finds out about it at a different point than if the file system were local. --another issue: the following pattern does not work well: "some_proc > out" on one client ; "tail -f out" on another issue is that the second client may not find out about the updates done by the first client --let's look at the source of these issues... * Server must flush to disk before returning (why? because if they didn't, and there were a server crash, the client wouldn't know to retry -- the client would be unaware that the write never took effect). So before returning "success", the server has to make sure: --Inode with new block # and new length safe on disk. --Indirect block safe on disk. --So writes have to be synchronous So why isn't performance bad? caching at client. not all RPCs actually go to server [see below] [NFSv3 handles this a bit better. WRITES() go to server but don't necessarily cause disk accesses at server.] * what kind of caching do they have? --Read-caching of data. Why does this help? (re-reading files) --Write-caching of data. Why does this help? (see above) --Caching of file attributes. Why does this help? A command like "ls -l" gets all of the file attributes, at which point a successive open can work against the cached copy of the file or the remote copy, depending on how recently the file was updated --Caching of name->fh mappings. Why does this help? (cache prefix like /home/bob) * but once you have a cache, you have to worry about coherence and semantics. what kind of coherence does it actually give? (Answer: close-to-open) A: write(), then close(), B: open(), read(). B sees A's data. Otherwise, B has an "old" picture because data not sent by A until close(). At a high level, how do they implement it? --writing client forces dirty blocks during a close() --reading client checks with server during open() and asks, "is this data current?" (can reduce traffic from this last one by caching file attributes) * this is the source of the issues above (that close() can fail, etc.) * Why do they give this guarantee instead of a stronger guarantee? (Performance. They are trading off the semantics for performance.) --- Areas of RPC non-transparency (a more general point than NFS) * Partial failure, network failure * Latency * Efficiency/semantics tradeoff * Security. You can rarely deal with it transparently (see below) * Pointers. Write-sharing. 
      Portable object references are hard under RPC
    * Concurrency (if multiple clients)

    Solution 1: expose RPC to application
    Solution 2: work harder on transparent RPC

  E. Security

    --Only security is via IP address

    --Another case of non-transparency:

        --On local system: UNIX enforces read/write protections
            Can't read my files w/o my password

        --On NFS:
            --Server believes whatever UID appears in NFS request
            --Anyone on the Internet can put whatever they like in the
              request
            --Or you (on your workstation) can su to root, then su to me
                --2nd su requires no password
                --Then NFS will let you read/write my files

    --In other words, to steal data, just adopt the uid of the person
      whose files you're trying to read....or just spoof packets.

    --So why aren't NFS servers ridiculously vulnerable?
        --Hard to guess correct file handles.
        --(Which rules out one class of attacks but not spoofed UIDs)

    --Observe: the vulnerabilities are fixable
        --Other file systems do it
        --Require clients to authenticate themselves cryptographically.
        --But very hard to reconcile with statelessness.

  F. Concluding note

    --None of the above issues prevent NFS from being useful.
        --People fix their programs to handle new semantics.
        --Or install firewalls for security.
    --And get most advantages of transparent client/server.

  References

    --"RFC 1094": NFS v2
    --"RFC 1813": NFS v3

  Other distributed file systems (disconnected operation, etc.)

    --disconnected operation: where have we seen this? (answer: git)
    --long literature on this (Coda, Bayou, Andrew File System, etc.,
      etc.)

3. Debugging

  A. Intro

    * Something you have (ideally) been using for the last many labs
      (gdb stands for GNU Debugger). This class: how do they work?

    * There is a lot of coolness to debuggers:

      - first of all, the high-level functionality is invaluable to
        software developers. here are some examples. In the listing
        below, we include the corresponding gdb command names in
        parentheses; note that what is in parentheses is a sliver of
        gdb's abilities. For any given command, type within the gdb
        shell:

            (gdb) help

        + Set breakpoints, so the process automatically stops when some
          piece of code is executed ('break' in gdb)
        + Pause a process ('attach')
        + Single step through the process ('stepi' is one assembly
          instruction, 'step' is one line of C code, and of course those
          aren't the same granularity of stepping)
        + or continue process execution from the paused points
          ('continue')
        + Generate a stack trace ('backtrace')
        + Read and modify values of variables, which might be on the
          stack, heap, or data segment ('x' or 'print' to read,
          'set var' to modify)
        + Read and modify program code (the TEXT area) ('disassemble' to
          read)
        + Read and modify program registers ('info registers' and
          'print' to read, and 'set $regname' to write)
        + Modify parameters and return values for system calls ('call'
          and 'catch syscall')
        + Set watchpoints, so the process stops whenever it accesses
          some memory address ('watch', 'awatch' and 'rwatch')

      - second, the overall arrangement seems as though it flies in the
        face of everything we've learned this semester. For the purposes
        of this class, the debugger is a process that has some control
        over another process (called the **target**, or the process
        being **debugged**). Meanwhile, how can one process actually do
        all of the things above to another? Specifically:

        + Processes are supposed to be isolated from each other, which
          means one process cannot access memory allocated to another.
          Here we want the debugger to have access to a target's memory.
          How?
        + How can the debugger stop **another** process's execution at a
          specific address: the only times a process transfers control
          to the kernel are during interrupts or syscalls, but how can
          we control what instruction the interrupt occurs at?

        + and how can one process single-step another?

      - third, the resolution to the above questions, and the overall
        workings of a debugger, draw on OS concepts that we've been
        studying all semester:

            a. Stack frames
            b. Virtual memory
            c. Interrupts
            d. Signals

        in fact, debuggers require support from the operating system and
        the processor (CPU) to implement. We are going to study the
        workings of debuggers in the context of Unix-like operating
        systems and the AMD64 architecture (aka x86-64).

      As we go over these, start thinking about how these are
      implemented: what tools might the OS be providing to the debugger
      and target processes, what processor functionality are we relying
      on, and how is the OS abstracting that functionality for the
      debugger and target process?

  B. Attaching and controlling a process

    [DRAW PICTURE:

        [debugging process] ----> [target]
                            ---->
                            ---->

     the arrow/channel between them expresses that the one on the left
     is acting as puppeteer and the target as a marionette. "attaching
     is getting all of these lines into place".]

    * NOTE: For what follows we only consider single-threaded processes,
      and hence conflate process and thread.

    * Attaching to a process: Almost all debugging functionality we talk
      about today is controlled using the ptrace(2) syscall. Remember,
      the (2) here just tells you the man page section to look at for a
      call; use `man 2 ptrace` for more information.

      There are two canonical techniques to attach to a process [see
      handout]:

      - Debugger launches a process to which it is attached:

            pid_t launch_attached(const char* path, char* const argv[]) {
                pid_t pid = fork();
                if (pid == 0) {
                    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
                    execv(path, argv);
                }
                return pid;
            }

        (this is what happens when you pull up gdb, and then type "run")

        + Debugger forks a new child process.
        + Child process calls ptrace to say that it should be
          **attached** to its parent.
        + The child then uses execv() to launch a new process. The call
          to ptrace makes it so that process execution **pauses** right
          after the call to execv(). Meaning that issuing execv() causes
          the kernel to send a SIGTRAP signal to the child (forked)
          process.
        + There is, however, a race between when execv() is executed
          (and hence the new process is blocked) and when
          launch_attached() returns in the parent process.

      - Debugger attaches to an already running process:

            void attach_to_process(pid_t pid) {
                ptrace(PTRACE_ATTACH, pid, NULL, NULL);
            }

        + Debugger tells the kernel it wants to attach to the process
          with PID pid.
        + Kernel checks whether the debugger has permission to attach to
          the process with the given PID. If not, ptrace returns an
          error, which we have not checked for here.
        + Kernel sends the target process a SIGSTOP signal, which blocks
          it until the debugger explicitly unblocks the process. Note,
          however, that there is a race between when the `ptrace` call
          (executed in the debugger process) returns and when the target
          process (to which the debugger is attached) is blocked, since
          blocking requires that the target process transfer control to
          the kernel, either in order to execute a syscall or due to
          preemption.
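      [Aside: because of the races just described, attaching alone is
       not enough; the debugger also has to wait until the target has
       actually stopped. That is what the handout's
       continue_once_attached() handles, and what we discuss next. Below
       is a minimal sketch of the idea -- our own illustrative code, not
       the handout's -- assuming Linux's ptrace/waitpid interface and a
       single-threaded target; the name attach_and_wait is made up.

          /* Sketch: attach to a running process and block until the
             kernel has actually stopped it, before issuing any further
             ptrace requests.  Minimal error handling. */
          #include <signal.h>
          #include <stdio.h>
          #include <sys/ptrace.h>
          #include <sys/types.h>
          #include <sys/wait.h>

          int attach_and_wait(pid_t pid) {
              if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0) {
                  perror("ptrace(PTRACE_ATTACH)");
                  return -1;
              }

              /* PTRACE_ATTACH makes the kernel send the target a SIGSTOP;
                 waitpid blocks until the target has actually changed state. */
              int status;
              if (waitpid(pid, &status, 0) < 0) {
                  perror("waitpid");
                  return -1;
              }

              if (WIFSTOPPED(status)) {
                  /* Target is stopped: now it is safe to read/write its
                     registers and memory, set breakpoints, etc.  Resume it
                     later with ptrace(PTRACE_CONT, pid, NULL, NULL). */
                  return 0;
              }
              return -1;
          }
      ]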
    * When a debugger attaches to a process, the Operating System
      arranges that: signals sent to the target process stop the target
      process; the debugging process has to acknowledge them before the
      target process can continue execution. Equivalently, the target
      process is **blocked** on a signal until the debugging process
      **unblocks** it.

    * How the debugger "synchronizes" with a process. As we noted above,
      either of the methods for attaching to a process can **return**
      before the debugger is actually attached. Therefore, the debugger
      needs a mechanism through which it can wait until it has been
      attached to the process. In the handout you can see this in the
      `continue_once_attached()` function.

        + Debugger waits for the target using the `waitpid` syscall:

            int status;
            waitpid(pid, &status, 0);

          status holds the new state of the process with PID pid. For
          instance:

            * WIFSTOPPED(status) is true if the process is stopped (as
              opposed to having been terminated).
            * WSTOPSIG(status) returns the signal that caused the
              process to stop.

        + Once `waitpid` indicates that the process is attached, we can
          resume execution ('continue' in GDB). continue_once_attached()
          does so by again using ptrace().

    * In general when the process is executing, the debugger executes
      `waitpid` in a loop in order to find out when errors occur, etc.
      This loop is similar to the `continue_once_attached` loop you just
      saw.

    * In the next section of the class, we will study how the debugger
      can read/write a process's memory and registers. However, to do
      so, the debugger needs to stop the target from executing before
      accessing it, since otherwise the debugger might see inconsistent
      state due to concurrent access. (The concurrent actors being the
      debugger and the target.) A process can be stopped for two
      reasons:

        (1) The process can be stopped because of an error -- for
            example, the process accessed an invalid memory address and
            received a segfault (SIGSEGV).

        (2) The debugger can also force the process to stop by sending
            it a signal. In this case the debugger must wait (by calling
            `waitpid`) until the process has stopped.

            [See handout 2. Interrupting the running process]

      In both cases, the kernel will send a signal to the process.

        * Since the debugger is attached to the process, this signal
          will block the process.

        * The debugger will be unblocked if it was already waiting for
          the process using `waitpid` and will see a status indicating
          that the process was stopped by a signal. If the debugger
          **was not** already waiting, its next call to waitpid will
          return such a status immediately.

  C. Reading and writing a program's memory, registers

    * In the handout, "3. Other ptrace commands" lists a set of
      low-level ptrace commands, including ones to

        - Read and write registers.
        - Read and write memory.
        - Get information on why a signal occurred.

      Next, we're going to look at how to use these low-level primitives
      for various tasks.

    * Get the address of the instruction that is being executed:

        Use PTRACE_GETREGS to get the process's %rip, which gives us the
        address. Note, we only get an address, not the function name or
        line of code.

    * Figure out access to which address resulted in a segfault.

        - Remember that when a segfault occurred, waitpid reported that
          the process stopped due to a SIGSEGV signal. That is, if we
          ran

            waitpid(pid, &status, 0);

          then

            WIFSTOPPED(status) && WSTOPSIG(status) == SIGSEGV

          is true.

        - The debugger uses `PTRACE_GETSIGINFO` to read information
          about the signal that was delivered to the attached process.
          This signal information is returned in a `siginfo_t` struct:

            siginfo_t sinfo;
            ptrace(PTRACE_GETSIGINFO, pid, NULL, &sinfo);

          The siginfo_t structure (you can find all fields by looking at
          the documentation of sigaction(2)) contains a field si_addr
          with the "memory location which caused fault", which is what
          we are looking for:

            siginfo_t sinfo;
            ptrace(PTRACE_GETSIGINFO, pid, NULL, &sinfo);
            printf("Faulting address is %p\n", sinfo.si_addr);

    * Often in a debugger you want to get a call stack, aka call trace,
      aka backtrace. To do so, the debugger needs to reconstruct the
      call stack from the registers and the stack, using a process
      called **stack unwinding**.

        - In order to see how the stack is unwound, recall the structure
          of stack frames from earlier in the class. See page 3 of the
          handout.

        - The debugger can use PTRACE_GETREGS to get the current %rbp.
          This points to the beginning of the stack frame.

        - The word before the beginning of the stack frame is the return
          address. Recall that this is one instruction past the calling
          instruction. We can read its value using PTRACE_PEEKDATA, and
          then adjust to get the address of the calling instruction, and
          thus what function (and line) invoked the called function.

        - The current value of %rbp is a pointer into the stack; the
          word it points to is the base of the previous stack frame,
          which in turn allows us to retrieve the address of the
          previous caller, and the base of the frame before that. We can
          recursively apply this process until we bottom out, i.e.,
          until we observe an %rbp value of 0, which the __start
          function (where execution begins) pushes to the stack to help
          with stack unwinding. We have now reconstructed (unwound) the
          call stack. (A short sketch of this loop appears at the end of
          these notes.)

      So far everything we have covered gives you addresses, and no
      means to translate them into function names or variables, etc. We
      discuss this translation/resolution step next.

[thanks to Robert Morris for some of the NFS content.]
[thanks to Aurojit Panda for the debugging content.]
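[Aside, referenced from section C above: a minimal sketch of the
 stack-unwinding loop -- our own illustrative code, not the handout's --
 assuming an x86-64 Linux target that is already attached and stopped,
 and that was compiled with frame pointers (e.g., -fno-omit-frame-pointer).
 The function and variable names here are made up.

    /* Walk the chain of saved %rbp values in a stopped, ptrace-attached
       target and print the program counter of each frame.  Assumes the
       standard prologue (push %rbp; mov %rsp, %rbp) in every frame, and
       that __start pushed a 0 %rbp, so the walk stops when %rbp == 0.
       Error checking of PTRACE_PEEKDATA (via errno) is omitted. */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>

    void print_backtrace(pid_t pid) {
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, pid, NULL, &regs);   /* current registers */

        uint64_t rip = regs.rip;
        uint64_t rbp = regs.rbp;

        while (rbp != 0) {
            printf("pc = %#llx\n", (unsigned long long) rip);

            /* [rbp + 8] holds the return address (one instruction past
               the call in the caller); [rbp] holds the caller's %rbp. */
            uint64_t ret  = ptrace(PTRACE_PEEKDATA, pid,
                                   (void *)(uintptr_t)(rbp + 8), NULL);
            uint64_t prev = ptrace(PTRACE_PEEKDATA, pid,
                                   (void *)(uintptr_t)rbp, NULL);

            rip = ret;
            rbp = prev;
        }
    }
]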