Class 22
CS 202
20 November 2024

On the board
------------

1. Last time
2. Debugging
    A. intro
    B. attaching to, and controlling, a process
    C. reading and writing a program's memory and registers
    D. resolving addresses to program names/state
    E. single-stepping
    F. breakpoints
    G. watchpoints
    H. multithreading

---------------------------------------------------------------------------


review session reminder
---------------------------------------------------------------------------

1. Last time

    Crash recovery. Studied redo-only logging. The notes describe other
    protocols: undo-only logging, and redo-undo logging.

2. Debugging

A. Intro

	* gdb (the GNU Debugger) is something you have (ideally) been using
	for many of the labs. This class: how does a debugger work?

        * There is a lot of coolness to debuggers:

            - first of all, the high-level functionality is invaluable
            to software developers. Here are some examples:

                In the listing below, we include the corresponding gdb
                command names in parentheses; note that what is in
                parentheses is only a sliver of gdb's abilities. For more
                on any given command, type within the gdb shell:
                    
                    (gdb) help <command_name>

              + Set breakpoints, so the process automatically stops when some
	         piece of code is executed ('break' in gdb)
	      + Attach to a running process, which pauses it ('attach')
	      + Single step through the process ('stepi' is one assembly
	          instruction, 'step' is one line of C code, and of course
	          those aren't the same granularity of stepping)
	      + or continue process execution from the paused points ('continue')
	      + Generate a stack trace ('backtrace')
	      + Read and modify values of variables, which might be on the stack,
	        heap or data segment ('x' or 'print' to read, 'set var <name> = <value>' to modify)
	      + Read and modify program code (the TEXT area) ('disassemble' to read)
	      + Read and modify program registers ('info registers' and 'print' to read, and 'set $regname' to write)
	      + Modify parameters and return values for system calls
	      ('call' and 'catch syscall')
	      + Set watchpoints, so the process stops whenever it accesses some
	        memory address ('watch', 'awatch' and 'rwatch')

            - second, the overall arrangement seems as though it flies
            in the face of everything we've learned this semester.  For
            the purposes of this class, the debugger is a process that
            has some control over another process (called the
            **target**, or the process being **debugged**).  Meanwhile,
            how can one process actually do all of the things above to
            another? Specifically:

             + Processes are supposed to be isolated from each other, which means
               one process cannot access memory allocated to another. Here we want
               the debugger to have access to a target's memory. How?
             + How can the debugger stop **another** process's execution at a
               specific address? The only times a process transfers control to
               the kernel are during interrupts or syscalls, so how can we control
               which instruction the interrupt occurs at?
             + and how can one process single-step another?

            - third, the resolution to the above questions, and the
            overall workings of a debugger, draws on OS concepts that
            we've been studying all semester:

	  	    a. Stack frames
	  	    b. Virtual memory
	  	    c. Interrupts
	  	    d. Signals  
	  	        [NOTE: OS Signals are not the same thing as a
	  	        condition variable's "signal()" call. Today we
	  	        are talking about the former.]

	      In fact, debuggers require support from the Operating
	      System and processor (CPU) to implement.


        We are going to study the workings of debuggers in the context
        of Unix-like operating systems and the AMD64 architecture (aka
        x86-64).

        As we go over these, start thinking about how these are
        implemented: what tools might the OS be providing to the
        debugger and target processes, what processor functionality are
        we relying on, and how is the OS abstracting that functionality
        for the debugger and target process?
	

B. Attaching to, and controlling, a process

        [DRAW PICTURE:
            
            [debugging process]  ---->  [target]
                                 ---->
                                 ---->

                the arrow/channel between them expresses that the one on
                the left is acting as puppeteer and the target as a
                marionette. "attaching is getting all of these lines
                into place".]
	
	* NOTE: For what follows we only consider single-threaded processes, and
	  hence conflate process and thread.
	  
	* Attaching to a process:

	  Almost all debugging functionality we talk about today is controlled
	  using the ptrace(2) syscall. Remember that the (2) just tells you
	  which man page section documents the call; use `man 2 ptrace` for
	  more information.

	  There are two canonical techniques to attach to a process [see handout]:

	  - Debugger launches a process to which it is attached:

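	  	 #include <sys/ptrace.h>
	  	 #include <sys/types.h>
	  	 #include <unistd.h>

	  	 pid_t launch_attached(const char* path, char* const argv[]) {
	  	 	pid_t pid = fork();
	  	 	if (pid == 0) {
	  	 		/* child: ask to be traced by the parent, then exec */
	  	 		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
	  	 		execv(path, argv);
	  	 		_exit(127);  /* reached only if execv() failed */
	  	 	}
	  	 	return pid;  /* parent: child's PID, or -1 if fork() failed */
	  	 }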

                (this is what happens when you pull up gdb, and then
                type "run")

	     + Debugger forks a new child process.
	     + Child process calls ptrace to say that it should be **attached**
	       to its parent.
	     + The child then calls execv() to replace its program image with the
	       target program (same process, same PID). Because of the earlier
	       ptrace call, execution **pauses** right after a successful execv():
	       the kernel sends a SIGTRAP signal to the child, which stops it
	       before the new program executes its first instruction.

	     + There is, however, a race between when execv() is executed (and hence
	       the child is blocked) and when launch_attached() returns in the parent
	       process. See below for how the two processes synchronize.

	  - Debugger attaches to an already running process:

	  	void attach_to_process(pid_t pid) {
	  		ptrace(PTRACE_ATTACH, pid, NULL, NULL);
	  	}

	  	+ Debugger tells the kernel it wants to attach to process with PID pid.
	  	+ Kernel checks whether the debugger has permission to attach to the 
	  	  process with the given PID. If not, ptrace returns an error, which 
	  	  we have not checked for here.
	  	+ Kernel sends the target process a SIGSTOP signal, which blocks it
	  	  until the debugger explicitly unblocks the process. Note, however,
	  	  that there is a race between when the `ptrace` call (executed in the
	  	  debugger process) returns and when the target process (to which the
	  	  debugger is attached) is blocked, since blocking requires that the
	  	  target process transfer control to the kernel, either to execute a
	  	  syscall or due to preemption.

        * When a debugger attaches to a process, the Operating System
        arranges that signals sent to the target process stop it, and the
        debugging process has to acknowledge them before the target process
        can continue execution. Equivalently, the target process is
        **blocked** on the signal until the debugging process **unblocks** it.

	* How the debugger "synchronizes" with a process.

	  As we noted above, either of the methods for attaching to a
	  process can **return** before the debugger is actually
	  attached. Therefore, the debugger needs a mechanism through
	  which it can wait until it has been attached to the process.
	  In the handout you can see this in the `continue_once_attached()`
	  function.

	  +  Debugger waits for the target using the `waitpid` syscall:

	  	  int status;
	  	  waitpid(pid, &status, 0);

	  	  status receives the new state of the process with PID pid. For instance:

	  	  * WIFSTOPPED(status) is true if the process is stopped (as opposed to
	  	    having been terminated).
	  	  * WSTOPSIG(status) returns the number of the signal that caused the stop.

	  + Once `waitpid` indicates that the process is attached, we can
	  resume execution ('continue' in GDB). continue_once_attached()
	  does so by again using ptrace(), this time with first argument
	  PTRACE_CONT.
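    The attach path can be sketched the same way. This is a minimal sketch
    assuming Linux and glibc; `attach_and_continue()` is a hypothetical name,
    not the handout's `continue_once_attached()`. Unlike the snippet above, it
    checks the ptrace error, and it loops on `waitpid` until the SIGSTOP caused
    by the attach shows up, passing any other signal through to the target.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: attach to a running process, wait until the attach
 * has really taken effect, then resume it. Returns 0 on success, -1 on failure. */
int attach_and_continue(pid_t pid) {
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0) {
        perror("PTRACE_ATTACH");          /* e.g. EPERM: not allowed to trace pid */
        return -1;
    }
    for (;;) {
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (!WIFSTOPPED(status))
            return -1;                    /* target exited instead of stopping */
        if (WSTOPSIG(status) == SIGSTOP)
            break;                        /* the stop caused by PTRACE_ATTACH */
        /* Some other signal arrived first: pass it on and keep waiting. */
        if (ptrace(PTRACE_CONT, pid, NULL, (void *)(long)WSTOPSIG(status)) < 0)
            return -1;
    }
    /* Attached and stopped: resume with no signal ('continue' in gdb). */
    return ptrace(PTRACE_CONT, pid, NULL, NULL) < 0 ? -1 : 0;
}
```

    Attaching to oneself is always refused, so `attach_and_continue(getpid())`
    returns -1; in real use `pid` would come from the command line, as with
    gdb's `attach <pid>`.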

	* In general when the process is executing, the debugger executes `waitpid`
	  in a loop in order to find out when errors occur, etc. This loop is 
	  similar to the `continue_once_attached` loop in the handout.
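    Putting the launch path and the `waitpid` loop together, a debugger's core
    loop might look like the minimal sketch below (Linux and glibc assumed;
    `launch_traced()` and `run_until_exit()` are hypothetical names, and the
    handout's `continue_once_attached()` may differ). It swallows SIGTRAP, on
    the assumption that such stops (the one after exec, breakpoints) are meant
    for the debugger, and forwards every other signal to the target.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that asks to be traced, then exec the target program. If
 * PTRACE_TRACEME fails, the child simply runs untraced. */
pid_t launch_traced(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execv(path, argv);
        _exit(127);                       /* execv failed */
    }
    return pid;
}

/* Wait for the target to stop, report why, resume it; repeat until it exits.
 * Returns the target's exit code, or -1 on error or death by signal. */
int run_until_exit(pid_t pid) {
    int status;
    for (;;) {
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;                    /* killed by a signal */
        if (WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);
            int deliver = (sig == SIGTRAP) ? 0 : sig;   /* SIGTRAP is for us */
            fprintf(stderr, "target %d stopped by signal %d\n", (int)pid, sig);
            if (ptrace(PTRACE_CONT, pid, NULL, (void *)(long)deliver) < 0)
                return -1;
        }
    }
}
```

    For example, launching `/bin/sh` with arguments `{"sh", "-c", "exit 3", NULL}`
    and passing the PID to `run_until_exit()` should report one SIGTRAP stop
    (the exec) and then return 3.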
    
    * In the next section of the class, we will study how the debugger
    can read/write a process's memory and registers. However, to do so
    the debugger needs to stop the target from executing before
    accessing it, since otherwise the debugger might see inconsistent
    state due to concurrent access (the concurrent actors being the
    debugger and the target). A process can be stopped for two reasons:

        (1) The process can be stopped because of an error -- for
        example the process accessed an invalid memory address and
        received a segfault (SIGSEGV).

        (2) The debugger can also force the target process to stop by
        asking the OS to send the target process a signal; this is done
        using the kill() system call, which does not mean killing the
        process (although it can mean that); it means raising an OS
        signal. After sending a signal, the debugger must wait (again,
        by calling `waitpid`) until the process has stopped.

        [See handout item 2 "Interrupting the running process".]

        In cases (1) and (2), the kernel will send a signal to the process.
        * Since the debugger is attached to the target process, this signal will block
          the target process.
        * The debugger will be unblocked if it was already waiting for the process
          using `waitpid`, and will see a status indicating that the process was
          stopped by a signal. If the debugger **was not** already waiting, its next
          call to waitpid will return immediately with that status.
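    Case (2) can be sketched as follows, assuming Linux and glibc;
    `interrupt_target()` is a hypothetical name. It raises SIGSTOP with kill()
    and then waits for the stop to be reported. The `WUNTRACED` flag is an
    addition: a tracee's stop is reported without it, and it also lets the
    sketch work on an ordinary, untraced child.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: force a running target to stop and wait until it has.
 * Returns the stop signal, or -1 on error. */
int interrupt_target(pid_t pid) {
    int status;
    if (pid <= 0)
        return -1;                 /* kill(0 or -1, ...) would hit many processes */
    if (kill(pid, SIGSTOP) < 0)    /* "kill" only raises a signal here */
        return -1;
    if (waitpid(pid, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
        return -1;
    return WSTOPSIG(status);
}
```

    Once this returns SIGSTOP, the target is stopped and the debugger can
    safely read its registers and memory.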


C. Reading and writing a program's memory and registers

    * In the handout "3. Other ptrace commands" lists a set of low-level
      ptrace commands including ones to
        - Read and write registers.
        - Read and write memory.
        - Get information on why a signal occurred.
      Next, we look at how to use these low-level primitives for various
      tasks.

    * Get the address of the instruction that is being executed:
      Use PTRACE_GETREGS to get the process's RIP (instruction pointer), which gives us the address.

      Note, we only get an address, not the function name or line of code.
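    A minimal sketch of reading RIP, assuming Linux on x86-64 with glibc
    (where `struct user_regs_struct` from `<sys/user.h>` has a `rip` field);
    `get_rip()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <unistd.h>

/* Read the instruction pointer of a stopped tracee. Returns 0 and stores
 * RIP in *rip, or -1 if pid is not a stopped process we are tracing. */
int get_rip(pid_t pid, unsigned long long *rip) {
    struct user_regs_struct regs;   /* all general-purpose registers */
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        return -1;
    *rip = regs.rip;
    return 0;
}
```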

    * Figure out which memory access resulted in a segfault.

    	- Recall that when a segfault occurs, waitpid reports that the target
    	stopped due to a SIGSEGV signal. That is, if we ran
    		`waitpid(pid, &status, 0);`
    	then `WIFSTOPPED(status) && WSTOPSIG(status) == SIGSEGV` is true.

    	- The debugger uses `PTRACE_GETSIGINFO` to read information about
    	the signal that was delivered to the attached process. This signal
    	information is returned in a `siginfo_t` struct, which ptrace copies
    	to the address passed as its last (data) argument:

    	    siginfo_t sinfo;
    	    ptrace(PTRACE_GETSIGINFO, pid, NULL, &sinfo);

    	  The siginfo_t structure (you can find all fields by looking at
    	  the documentation of sigaction(2)) contains a field si_addr 
    	  with the "memory location which caused fault", which is what
    	  we are looking for:

    	    siginfo_t sinfo;
    	    ptrace(PTRACE_GETSIGINFO, pid, NULL, &sinfo);
    	    printf("Faulting address is %p\n", sinfo.si_addr);
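    Wrapped into a function with a check that the stop really was a SIGSEGV,
    this might look like the sketch below (Linux and glibc assumed;
    `segfault_address()` is a hypothetical name). Note that `si_addr` is the
    *data* address that faulted, whereas RIP gives the faulting instruction.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* After waitpid reported a SIGSEGV stop, fetch the faulting address.
 * Returns 0 on success, -1 otherwise. */
int segfault_address(pid_t pid, void **addr) {
    siginfo_t sinfo;
    /* addr argument is ignored; the siginfo is copied to the data argument */
    if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &sinfo) < 0)
        return -1;
    if (sinfo.si_signo != SIGSEGV)
        return -1;                 /* stopped for some other signal */
    *addr = sinfo.si_addr;         /* "memory location which caused fault" */
    return 0;
}
```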

    * Often in a debugger you want to get a call stack, aka call trace,
    aka backtrace.  To do so, the debugger needs to reconstruct the call
    stack from registers and stack, using a process called **stack
    unwinding**.

      	- In order to see how the stack is unwound, recall the structure of
      	stack frames from earlier in the class. See page 3 of the handout.
      	
      	- The debugger can use PTRACE_GETREGS to get the current %rbp. This points
      	  to the beginning of the stack frame.

      	- The word before the beginning of the stack frame is the return
      	address. Recall that this is one instruction past the calling
      	instruction. We can read its value using PTRACE_PEEKDATA, and
      	then adjust to get the address of the calling instruction, and
      	thus what function (and line) invoked the called function.
      	
      	- The current value of %rbp is a pointer to the stack, and at
      	that stack address is the frame pointer for the *previous* stack
      	frame (because %rbp is pushed to the stack in the prolog). This
      	allows us to retrieve the address of the previous caller. We can
      	recursively apply this process (following the "linked list" of
      	%rbp values), until we bottom out, i.e. until we observe an RBP
      	value of 0, which the __start function (where execution begins)
      	pushes to the stack to help with stack unwinding. We have now
      	reconstructed (unwound) the call stack.
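    The unwinding loop can be sketched as follows, assuming Linux on x86-64
    and code compiled with frame pointers. All names are hypothetical. The
    memory reader is passed in as a callback: one reader uses
    PTRACE_PEEKDATA on a live tracee, the other reads a copied stack image,
    which lets the loop be exercised without a target. The extra check that
    each saved %rbp is higher than the current one is an added safeguard
    against looping on a corrupted stack.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <unistd.h>

/* Reads one 8-byte word of the target's memory; returns 0 on success. */
typedef int (*read_word_fn)(void *ctx, uint64_t addr, uint64_t *out);

/* Follow the chain of saved %rbp values. out[0] is the current RIP; each
 * later entry is a return address (one past a call instruction). */
size_t unwind(read_word_fn read_word, void *ctx, uint64_t rip, uint64_t rbp,
              uint64_t *out, size_t max) {
    size_t n = 0;
    if (n < max)
        out[n++] = rip;
    while (rbp != 0 && n < max) {
        uint64_t saved_rbp, ret;
        if (read_word(ctx, rbp, &saved_rbp) != 0 ||   /* [rbp]   = caller's rbp */
            read_word(ctx, rbp + 8, &ret) != 0)        /* [rbp+8] = return addr  */
            break;
        out[n++] = ret;
        if (saved_rbp != 0 && saved_rbp <= rbp)
            break;                 /* frames must move up the stack */
        rbp = saved_rbp;
    }
    return n;
}

/* Reader for a live, stopped tracee; ctx points at its pid. */
int ptrace_read_word(void *ctx, uint64_t addr, uint64_t *out) {
    pid_t pid = *(pid_t *)ctx;
    errno = 0;
    long w = ptrace(PTRACE_PEEKDATA, pid, (void *)(uintptr_t)addr, NULL);
    if (errno != 0)
        return -1;                 /* -1 is also valid data, so check errno */
    *out = (uint64_t)w;
    return 0;
}

/* Reader over a copied stack image starting at address `base`. */
struct stack_image { uint64_t base; const uint64_t *words; size_t nwords; };

int image_read_word(void *ctx, uint64_t addr, uint64_t *out) {
    const struct stack_image *img = ctx;
    if (addr < img->base || (addr - img->base) % 8 != 0)
        return -1;
    size_t i = (size_t)((addr - img->base) / 8);
    if (i >= img->nwords)
        return -1;
    *out = img->words[i];
    return 0;
}

/* Backtrace of a stopped tracee: start from its current RIP and %rbp. */
size_t backtrace_target(pid_t pid, uint64_t *out, size_t max) {
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        return 0;
    return unwind(ptrace_read_word, &pid, regs.rip, regs.rbp, out, max);
}
```

    For instance, with a stack image at 0x1000 holding `{0x1010, 0x401234, 0, 0x400500}`,
    starting from RIP 0x401100 and %rbp 0x1000, `unwind()` yields the three
    addresses 0x401100, 0x401234 and 0x400500, stopping at the saved %rbp of 0.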

   So far everything we have covered gives you addresses, but no means
   to translate them into function names, variables, etc. We discuss
   this translation/resolution step next.


D. Getting function names, variable names, values, lines of code, etc.
   
   There are debuggers, e.g., Solaris mdb, that are deliberately designed
   to offer only such (address-level) information. However, this is not very human-friendly.

   Symbol tables and symbol files are how we generally translate addresses
   into human-readable form. Symbol files usually contain:

   + Mappings from address to global variable names.
   + Mappings from address to function names.
     * This is usually provided as a set of extents. For instance 
            0xa0000 - 0xa2200: main
       For each function, symbol tables also include information mapping
       stack offset (from the frame pointer) to local variable name.
       For instance:
            0xa0000 - 0xa2200: main
                offset 0: argc
                offset 1: *argv
                ...
        [for another example, take a look at our class 2:
        handout01.pdf, as.txt. At compile time, the compiler "knows"
        that (for example) a given variable's address is %rbp - 8. ]
        
       The debugger can use this information to convert addresses in the
       backtrace to function names, print values for variables, etc.
    
    + Mappings from addresses to source file names and line numbers.
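    Resolving an address against function extents like the ones above can be
    sketched as a binary search. The table format here is hypothetical
    (`struct func_extent` and `symbolize()` are illustrative names, extents are
    assumed sorted, non-overlapping and end-exclusive); real formats such as
    ELF symbol tables and DWARF debug information are richer.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical function table: [start, end) -> name. */
struct func_extent {
    uint64_t start, end;
    const char *name;
};

/* Binary search over extents sorted by start. Returns the name of the
 * function containing addr, or NULL if no extent contains it. */
const char *symbolize(const struct func_extent *tab, size_t n, uint64_t addr) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < tab[mid].start)
            hi = mid;
        else if (addr >= tab[mid].end)
            lo = mid + 1;
        else
            return tab[mid].name;
    }
    return NULL;
}
```

    When symbolizing the return addresses in a backtrace, it is safer to look
    up `ret - 1`: a return address points one past the call, which can
    already lie in the next function or on the next source line.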

    Of course compiler optimizations can make this harder. For example:
       + optimized code might never write a variable's value to a stack location.
       + a single stack slot might be used for multiple variables.
       + ...
    Symbol information is best-effort, and in practice debuggers cannot always
    resolve names to values due to these problems.

    For context: when you compile your code with the "-g" flag, the
    compiler is embedding in the binary the information above: symbol
    table, offsets, function names, and so forth. Then, when gdb
    actually runs on the binary, it reads in this information.

    [This is partly why debuggers like mdb are preferred by some.]

E. Single-stepping

    * Single step: In gdb the `stepi` command executes a single machine
      instruction in the program being debugged (`step` executes one source line).

      The PTRACE_SINGLESTEP command (line 2 in handout section 3) single steps
      the process being debugged.

      What happens under the hood is the following: the OS sets the TF
      bit in the RFLAGS/EFLAGS register (bit 8), which has the processor
      generate a debug interrupt (INT 1) after a single instruction is
      executed.  The TF bit (along with several other bits in there) is
      something that is reserved for system software, and should be
      manipulated in ring 0 (kernel mode). One could emulate
      single-stepping using the breakpoint mechanism described below,
      but using hardware is easier.
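    From the debugger's side, one `stepi` might be sketched as below,
    assuming Linux on x86-64 with glibc; `single_step()` is a hypothetical
    name. The debug interrupt triggered through the TF bit reaches the
    debugger as a SIGTRAP stop.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Execute exactly one instruction in a stopped tracee and report the new
 * RIP. Returns 0 on success, -1 on error or if the target did not stop. */
int single_step(pid_t pid, unsigned long long *rip) {
    int status;
    struct user_regs_struct regs;
    if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    /* The kernel turns the debug interrupt (INT 1) into a SIGTRAP stop. */
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        return -1;
    *rip = regs.rip;
    return 0;
}
```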

F. Breakpoints
   
    * Breakpoints set with the `break` command in gdb pause execution of the
      process being debugged **before** it executes the instruction at a
      particular address.
       + In practice you often use `break` with function names, filename:line
         number, etc.; gdb relies on symbols to translate these human-readable
         locations into an address in the program's text section.

    So: how do breakpoints actually work??

    * A naive way for a debugger to implement breakpoints would be to **single-step**
    through the program, examine RIP after each step and stop whenever the desired address
    is reached.
       + This is very slow: each single step and read from a register involves a system call,
       which means about 1 microsecond per step. This is equivalent to executing your program
       on a 1MHz processor.
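    The naive approach might be sketched as follows (Linux on x86-64 with
    glibc assumed; `naive_run_to()` is a hypothetical name). Each iteration
    costs three system calls plus a trap into the kernel, which is where the
    slowdown comes from.

```c
#define _GNU_SOURCE
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Naive breakpoint: single-step until RIP == target. Returns the number of
 * steps taken, or -1 on error, target exit, or after max_steps steps. */
long naive_run_to(pid_t pid, unsigned long long target, long max_steps) {
    for (long steps = 0; steps < max_steps; steps++) {
        struct user_regs_struct regs;
        int status;
        if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)   /* syscall 1 */
            return -1;
        if (regs.rip == target)
            return steps;                                   /* "breakpoint" hit */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0) /* syscall 2 */
            return -1;
        if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status)) /* syscall 3 */
            return -1;
    }
    return -1;
}
```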

    * Sketch of a better performing solution:
        + Debugger sets it up so program generates an interrupt whenever it arrives at an
          address with a breakpoint.
        + Kernel translates interrupt to a signal, delivers signal to the _target_ process.
          (Recall, from user-mode threading: signals are just like user mode interrupts.)
        + ptrace semantics mean that the target process is stopped, and waitpid() at the debugger
          receives the signal number as a part of the status.
      We need to address several questions in order to use this technique.

    * Adding a breakpoint given an address
        + The debugger uses PTRACE_PEEKDATA to read the word (8 bytes on AMD64) starting at
          the address, i.e., the first bytes of the instruction there. The debugger needs to
          save these original bytes in order to continue from a breakpoint.
        + The debugger then uses PTRACE_POKEDATA to change the code at the address so it
          generates an interrupt.
        + The breakpoint is now set; the debugger can resume the process when desired.
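    The three steps can be sketched as below, assuming Linux on x86-64 with
    glibc. `struct breakpoint`, `with_int3()` and `enable_breakpoint()` are
    hypothetical names. The patched word is the saved word with its lowest
    byte replaced by 0xcc (`int 3`, explained next); because AMD64 is
    little-endian, the lowest byte of the word is the byte at the breakpoint
    address.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one breakpoint. */
struct breakpoint {
    uint64_t addr;     /* address of the instruction to stop before */
    long saved_word;   /* original 8 bytes at addr, needed to continue */
};

/* Replace the byte at the lowest address of `word` with 0xcc (int 3). */
long with_int3(long word) {
    return (long)(((unsigned long)word & ~0xffUL) | 0xccUL);
}

/* Insert a breakpoint into a stopped tracee. Returns 0 or -1. */
int enable_breakpoint(pid_t pid, struct breakpoint *bp) {
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(uintptr_t)bp->addr, NULL);
    if (errno != 0)
        return -1;     /* PEEKDATA returns the data itself, so check errno */
    bp->saved_word = word;
    if (ptrace(PTRACE_POKEDATA, pid, (void *)(uintptr_t)bp->addr,
               (void *)with_int3(word)) < 0)
        return -1;
    return 0;
}
```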

    * How to generate an interrupt, and what interrupt.
        + Processor provides an instruction `int` that generates an interrupt.
        + By convention on Intel x86 and AMD64 interrupt 3 is reserved for breakpoints.
          (This is a convention dictated by Intel.)
        + The `int 3` instruction can be encoded in a single byte. In fact, `int 3` can be
          encoded in two ways on x86/AMD64: `0xcc` and `0xcd 0x03`. The latter is how all
          other `int` instructions are encoded, e.g., `int 5` is `0xcd 0x05`. `0xcc` is a
          special encoding Intel provides just for `int 3`.

          Being one byte long is important since x86 and AMD64 instructions are variable
          length: the shortest instruction is 1 byte, the longest is 15 bytes. A one-byte
          instruction can replace the first byte of any of these without overwriting the
          instruction that follows.

    * Hitting a breakpoint: What happens when the program executes `int 3`
        + When the program executes `int 3`, control gets transferred to the kernel,
          which calls the aptly named `do_int3` function.
        + Kernel marks the process to deliver a SIGTRAP signal.
        + Since a debugger is attached, the kernel blocks the target process as a result of
          SIGTRAP, and the debugger can use waitpid() to observe that the breakpoint has been hit.
    
    * Determining which breakpoint was hit
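      + A program might have many active breakpoints at a time. To determine
       which breakpoint was hit, the debugger reads RIP to determine where the
       program is and then searches through the list of breakpoints. Since
       `int 3` is one byte long and has already executed, RIP points one byte
       past the breakpoint, so the debugger looks for a breakpoint at RIP - 1.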
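    The lookup might be sketched as a linear search (`find_hit_breakpoint()`
    is a hypothetical name; a debugger with many breakpoints might use a hash
    table instead). If no breakpoint matches, the SIGTRAP came from somewhere
    else, e.g., a single step.

```c
#include <stddef.h>
#include <stdint.h>

/* int 3 is one byte and has already executed, so RIP is one past the
 * breakpoint address. Returns the breakpoint's index, or -1 if none matches. */
long find_hit_breakpoint(const uint64_t *bp_addrs, size_t n, uint64_t rip) {
    for (size_t i = 0; i < n; i++)
        if (bp_addrs[i] == rip - 1)
            return (long)i;
    return -1;
}
```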

    * Continuing from a breakpoint
      Usually when one uses the `continue` command in GDB, the intention is
      to continue execution **without** disabling the breakpoint that has been
      hit.

      Continuing is, however, complicated by the fact that when setting a breakpoint
      the debugger modified the program. To continue, the debugger must undo the change,
      execute the original instruction, and then change the program back in order to
      restore the breakpoint. This is done as follows:

      + First, the debugger uses ptrace(PTRACE_POKEDATA,...) to restore the original instruction
        to the address. The original instruction is recorded as a part of the breakpoint
        information. Because the trap left RIP one byte past the breakpoint address, the
        debugger also uses PTRACE_SETREGS to set RIP back to that address.
      + Second, the debugger single-steps the target process, executing the restored instruction
        and returning control back to the debugger.
      + Finally, the debugger changes the program code by writing `int 3` (0xcc) to the address,
        re-enabling the breakpoint, and resumes the target (PTRACE_CONT).
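    A sketch of these steps, assuming Linux on x86-64 with glibc, a
    single-threaded target, and at most one breakpoint per 8-byte word
    (restoring the saved word would otherwise clobber a neighbouring
    breakpoint). `step_over_breakpoint()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Resume a tracee stopped at the breakpoint at `addr` without losing the
 * breakpoint. saved_word is the original 8 bytes at addr. Returns 0 or -1. */
int step_over_breakpoint(pid_t pid, uint64_t addr, long saved_word) {
    struct user_regs_struct regs;
    int status;
    long trap_word = (long)(((unsigned long)saved_word & ~0xffUL) | 0xccUL);

    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        return -1;
    regs.rip = addr;                  /* undo the 1-byte advance past int 3 */
    if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) < 0)
        return -1;
    /* 1. Put the original instruction back. */
    if (ptrace(PTRACE_POKEDATA, pid, (void *)(uintptr_t)addr, (void *)saved_word) < 0)
        return -1;
    /* 2. Execute just that instruction. */
    if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;                    /* target exited */
    /* 3. Re-arm the breakpoint, then let the target run. */
    if (ptrace(PTRACE_POKEDATA, pid, (void *)(uintptr_t)addr, (void *)trap_word) < 0)
        return -1;
    return ptrace(PTRACE_CONT, pid, NULL, NULL) < 0 ? -1 : 0;
}
```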

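     [Note: This technique is a bit complicated for multithreaded applications since all
     threads share the same text (code) region: while the original instruction is
     temporarily restored, another thread can run through the address without stopping.]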

     * Other kinds of breakpoints: Sometimes it is useful to break whenever a
     process is about to make a syscall. This is a common enough occurrence that
     `ptrace` provides a special `PTRACE_SYSCALL` command, which resumes the target
     but stops it again at the next entry to or exit from a syscall.
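    A small strace-like loop built on `PTRACE_SYSCALL` might look like the
    sketch below (Linux on x86-64 with glibc assumed; `launch_traced()` and
    `trace_syscalls()` are hypothetical names). It sets
    `PTRACE_O_TRACESYSGOOD` so that syscall stops are reported as
    `SIGTRAP | 0x80` and cannot be confused with other SIGTRAPs, and it reads
    the syscall number from `orig_rax`.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

pid_t launch_traced(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execv(path, argv);
        _exit(127);                       /* execv failed */
    }
    return pid;
}

/* Run the tracee to completion, stopping at every syscall entry and exit.
 * Counts syscall stops in *nstops; returns the exit code, or -1 on error. */
int trace_syscalls(pid_t pid, long *nstops) {
    int status, sig = 0;
    *nstops = 0;
    if (waitpid(pid, &status, 0) < 0)     /* the stop after exec */
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);       /* tracing was not possible */
    if (!WIFSTOPPED(status))
        return -1;
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);
    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) < 0)
            return -1;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;
        sig = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            struct user_regs_struct regs;
            if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == 0)
                fprintf(stderr, "syscall stop, nr = %llu\n", regs.orig_rax);
            (*nstops)++;
        } else if (WSTOPSIG(status) != SIGTRAP) {
            sig = WSTOPSIG(status);       /* forward genuine signals */
        }
    }
}
```

    Each syscall normally produces two stops (entry and exit), so the count
    is roughly twice the number of syscalls the program made.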

G. Watchpoints

    * Watchpoints set with the `rwatch <address>` and `awatch <address>` commands in GDB stop
    execution whenever the specified memory address is read (rwatch) or accessed, i.e.,
    either read or written (awatch).
        + Invaluable when debugging memory corruption.

    * How to implement watchpoints?
        + Single step implementation
        + Page fault based implementation
        + Hardware assisted.

    * Single-step: This looks much like the naive breakpoint case, except the
    debugger also needs a way to decode instructions (to see whether a
    given MOV, PUSH, etc. accesses the watched address, as the address is
    either explicitly or implicitly part of the instruction). GDB and
    others use single-stepping when necessary.

    * Page-fault based: The idea here is simple: the debugger asks the kernel to mark
      a page in the process as inaccessible.
        + Done with a combination of mprotect() and other calls. We won't go into details.
        + The processor then raises a page fault whenever the target process accesses any
          address on that page, and the kernel turns it into a SIGSEGV, blocking the
          process. The debugger can then retrieve the signal using waitpid and the signal
          information (including the faulting address) using PTRACE_GETSIGINFO.
        + When continuing, the debugger uses the same trick as for breakpoints: it changes
          page access bits, single-steps, and then removes access to the page.
    
    * Problem with page-fault based approaches: rwatch and awatch can be used to watch addresses
    at 1-, 2-, 4-, or 8-byte granularity, but page faults occur at page granularity.
        + Potentially many needless page faults.
        + Slows things down.
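    The granularity problem can be made concrete with two helpers a
    page-fault based implementation might use (hypothetical names; 4 KiB
    pages assumed). Every access to the protected page faults, and only those
    whose faulting address falls in the watched range are real hits; the rest
    are the needless faults described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u    /* assumption: 4 KiB pages */

/* Base address of the page containing addr. */
uint64_t page_of(uint64_t addr) {
    return addr & ~(uint64_t)(ASSUMED_PAGE_SIZE - 1);
}

/* Is a fault at fault_addr (si_addr) a hit on [watch, watch + len)?
 * Simplification: an access that starts just below `watch` and overlaps
 * it is not detected, since si_addr gives only the starting address. */
bool is_watch_hit(uint64_t fault_addr, uint64_t watch, unsigned len) {
    return fault_addr >= watch && fault_addr < watch + len;
}
```

    For example, with an 8-byte watchpoint at 0x1000, an access to 0x1010
    lies on the same page, so it faults, but it is not a hit: the debugger
    must single-step past it and resume, which is pure overhead.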

    * Hardware assisted:
        + On Intel (also ARM, Power-PC, etc.) hardware provides support for watchpoints.
        + On Intel there are:
            - Four watchpoint registers DR0 -- DR3: each register contains a virtual address.
              The processor will generate an interrupt whenever the program accesses one of
              these addresses. These registers specify the start of the memory area that the
              processor is watching.
            - One control register DR7, which controls a few things:
                * which of DR0 -- DR3 are enabled, and what kind of access (instruction
                  execution, write, or read/write) each one watches.
                * the size of each memory area being watched. This can be 1, 2, 4 or 8 bytes, and any
                  access to a virtual memory address within [start, start+length) will result in an interrupt.
            - Another control register DR6 that records why a debug interrupt occurred:
                * which of DR0 -- DR3 triggered it (the kind of access watched is whatever
                  DR7 specifies for that register), or whether it was a single-step trap.
            - DR0 to DR3, DR6 and DR7 can only be read or written from kernel mode (ring 0).
        + Using Intel's hardware watches
            - GDB can manipulate these registers using PTRACE_PEEKUSER and PTRACE_POKEUSER.
                * Since ptrace is a syscall, the kernel performs the actual register access
                  in kernel mode on the debugger's behalf, so there is no permission issue.
            - When the program accesses a watched address, the processor generates *Interrupt 1*
              (the debug interrupt, the same one used for single-stepping).
            - The kernel turns interrupt 1 into a SIGTRAP (this is the same as what was
              used for breakpoints).
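              * The debugger can learn why the trap occurred from the siginfo_t it
              retrieves with PTRACE_GETSIGINFO (on Linux, si_code is TRAP_HWBKPT for a
              hardware watchpoint), and can read DR6 itself with PTRACE_PEEKUSER.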
        + Hardware-assisted watchpoints are very efficient, but there are very few of them,
          which limits their use.
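    Programming one hardware watchpoint might be sketched as follows, assuming
    Linux on x86-64 with glibc, where the debug registers are exposed through
    the `u_debugreg` array of `struct user` in `<sys/user.h>`. The names
    `dr7_enable()` and `set_hw_watchpoint()` are hypothetical. The DR7 layout
    used is Intel's: bit 2n locally enables DRn, and the 4-bit field at bit
    16 + 4n holds its access type (RW) and length (LEN).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <unistd.h>

/* Return dr7 with watchpoint `slot` (0-3) enabled. rw: 1 = trap on writes,
 * 3 = trap on reads or writes (x86 has no read-only mode). len: 1, 2, 4 or
 * 8 bytes; the watched address must be aligned to len. */
unsigned long dr7_enable(unsigned long dr7, int slot, unsigned rw, unsigned len) {
    unsigned len_bits = (len == 1) ? 0 : (len == 2) ? 1 : (len == 8) ? 2 : 3;
    int shift = 16 + 4 * slot;                 /* RWn at shift, LENn at shift+2 */
    dr7 |= 1UL << (2 * slot);                  /* Ln: locally enable the slot */
    dr7 &= ~(0xFUL << shift);                  /* clear old RWn/LENn */
    dr7 |= (unsigned long)(rw | (len_bits << 2)) << shift;
    return dr7;
}

/* Set DR<slot> = addr and update DR7 in a stopped tracee. Returns 0 or -1. */
int set_hw_watchpoint(pid_t pid, int slot, unsigned long addr,
                      unsigned rw, unsigned len) {
    size_t base = offsetof(struct user, u_debugreg);
    size_t reg = sizeof(((struct user *)0)->u_debugreg[0]);
    errno = 0;
    long dr7 = ptrace(PTRACE_PEEKUSER, pid, (void *)(base + 7 * reg), NULL);
    if (errno != 0)
        return -1;
    if (ptrace(PTRACE_POKEUSER, pid, (void *)(base + (size_t)slot * reg),
               (void *)addr) < 0)
        return -1;
    dr7 = (long)dr7_enable((unsigned long)dr7, slot, rw, len);
    if (ptrace(PTRACE_POKEUSER, pid, (void *)(base + 7 * reg), (void *)dr7) < 0)
        return -1;
    return 0;
}
```

    For example, a 4-byte write watchpoint in slot 0 sets DR7 to 0xD0001 (L0
    plus RW0 = 01, LEN0 = 11), and an 8-byte read/write watchpoint in slot 1
    sets it to 0xB00004.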
    
    * Reality: GDB uses a combination of all three techniques, with single stepping serving
      as the ultimate fallback, and virtual memory techniques used only when a lot of adjacent
      memory addresses are monitored.
      Implication: it is best not to have more than a few (< 4) watchpoints at a time.


H. Considerations for multithreading

    [didn't cover this]

    * So far we have not considered multiple threads. It gets a bit more complicated. Why?    

    + In Linux every pthread is backed by a kernel thread. For historic
    reasons each kernel thread has its own PID. So really ptrace() can
    target individual threads, not just processes. But this also means
    that the debugger needs to attach to all threads individually.

    + A signal sent to a process can be delivered to *any* thread
    belonging to the process, and the kernel chooses the
    signal-receiving thread non-deterministically. This means that only
    one of many threads in the application stops. Consequences: 
        
        + Any memory accesses from the debugger run the risk of causing races.
          [However, register accesses to the stopped thread are safe: why?]

        + Solution: whenever any thread is paused, the debugger sends a
        signal to pause every other thread in the process before it does
        anything. (kill(2) targets a whole process; on Linux, tgkill(2)
        delivers a signal to one chosen thread.)
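    On Linux, attaching to every thread and stopping a chosen one might be
    sketched as below (hypothetical names; glibc assumed). It lists threads
    through `/proc/<pid>/task`, waits with `__WALL` so that non-leader threads
    can be waited for, and uses the raw `tgkill` syscall. A thread created
    between listing and attaching would be missed; a fuller version would
    also set `PTRACE_O_TRACECLONE` so new threads are traced automatically.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Each thread of process pid appears as /proc/<pid>/task/<tid>.
 * Fills tids with up to max thread IDs; returns the count or -1. */
int list_threads(pid_t pid, pid_t *tids, int max) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    DIR *dir = opendir(path);
    if (!dir)
        return -1;
    int n = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL && n < max) {
        if (de->d_name[0] == '.')
            continue;                             /* skip "." and ".." */
        tids[n++] = (pid_t)atoi(de->d_name);
    }
    closedir(dir);
    return n;
}

/* ptrace works per thread, so attach to each one. Returns the number of
 * threads attached, or -1 (possibly leaving some threads attached). */
int attach_all_threads(pid_t pid) {
    pid_t tids[256];
    int n = list_threads(pid, tids, 256);
    for (int i = 0; i < n; i++) {
        int status;
        if (ptrace(PTRACE_ATTACH, tids[i], NULL, NULL) < 0)
            return -1;
        if (waitpid(tids[i], &status, __WALL) < 0)  /* wait for its SIGSTOP */
            return -1;
    }
    return n;
}

/* Stop one specific thread of process pid; kill(2) could hit any thread. */
int stop_thread(pid_t pid, pid_t tid) {
    return (int)syscall(SYS_tgkill, pid, tid, SIGSTOP);
}
```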


[thanks to Aurojit Panda for this content]