Class 6 (Make-up)
CS 480-008
12 February 2016

On the board
------------

1. Last time
2. Heap smashing
3. BROP
4. Defenses against BROP

---------------------------------------------------------------------------

1. Last time

    Buffer overflow: defenses and attacks
        canaries
        NX stack, W^X
        ASLR

    Exercise: go through each attack and think through the threat model
    (for example, does the attacker have access to the source code? to
    the binary? is the attacker concerned with making few tries? etc.)

    Canaries can be defeated by:

        --overwriting something other than the return address
            --function pointer
            --global variable 

        --heap smashing (see below), as a way of indirectly
          writing to the return address on the stack

        --stack reading, as a way of *directly* fooling the
        canary-checking code (again, see below)

    Context:

        --Defenses are still not perfect, but the skill and resources
        required of an attacker have gone way up.

        --Bugs/vulnerabilities are probably inevitable. So we want to
        structure applications to limit damage. This will be our concern
        in privilege separation and lab 3.

2. Heap smashing

    --Say we have an overflow of a heap-allocated buffer. Example:

          void foo(void) {
            char *p = malloc(16);
            gets(p);    /* no bounds check: long input overflows p */
          }

          Can the attacker predict what is after p in memory?

    --simplified version of a real attack:

        free list: array in memory, with doubly-linked list structure

        prev
        next
        data
        ....
        ....
        prev
        next
        data
        ....
        ....
        prev
        next
        data


        So: if the attacker overflows a malloc()ed block, the attacker can
        modify the next and prev pointers in the next block.

        Meanwhile, here's an excerpt from malloc()'s logic when it
        allocates a block:
            b = choose a free block
            b->next->prev = b->prev;
            b->prev->next = b->next;

        So: suppose attacker writes x y to start of next block.
            Call next block b. So now:
                b->prev = x
                b->next = y

            Assume b is chosen by next malloc().

            When 
                b->next->prev = b->prev executes, this is effectively:
                    *y = x

                [because prev is the first item in the struct, so
                    (b->next)->prev <-- foo becomes
                        *(b->next + 0) <-- foo]

                b->prev->next = b->next becomes:
                    *(x + 4) = y

                [because next is the second item in the struct, at
                byte offset 4 when pointers are 4 bytes]


            This means that an attacker-chosen value can be written to
            any memory location!
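
            The two stores above can be replayed on a toy model. This
            is a minimal sketch, not a real allocator: "memory" is an
            array of words, an "address" is an index into it, and next
            sits at word offset 1 rather than byte offset 4. The names
            mem, unlink_block, and smash_and_unlink are made up for
            illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_WORDS 32
#define PREV 0   /* word offset of prev in a block header */
#define NEXT 1   /* word offset of next in a block header */

/* Toy memory: an "address" is just an index into mem[]. */
static size_t mem[MEM_WORDS];

/* The two stores from the malloc() excerpt, applied to block b. */
static void unlink_block(size_t b)
{
    assert(b + NEXT < MEM_WORDS);
    assert(mem[b + NEXT] + PREV < MEM_WORDS);
    assert(mem[b + PREV] + NEXT < MEM_WORDS);
    mem[mem[b + NEXT] + PREV] = mem[b + PREV];  /* b->next->prev = b->prev */
    mem[mem[b + PREV] + NEXT] = mem[b + NEXT];  /* b->prev->next = b->next */
}

/*
 * Simulate the attack: the overflow has overwritten the header of free
 * block b with prev = x and next = y. Returns the word found at address y
 * after the allocator unlinks b.
 */
static size_t smash_and_unlink(size_t b, size_t x, size_t y)
{
    mem[b + PREV] = x;   /* attacker-chosen value */
    mem[b + NEXT] = y;   /* attacker-chosen target address */
    unlink_block(b);
    return mem[y];
}
```

            For example, smash_and_unlink(8, 12, 20) leaves the value
            12 at address 20. Note the side effect of the second
            store: address x + 1 (here 13) is also overwritten, so x
            must itself point at writable memory.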

          If the attacker can guess the address of the saved return PC,
            and can guess the address of the buffer being overflowed,
            the attacker can load instructions into the buffer and
            cause the PC to point to the injected instructions.

          Similarly for *any* function pointer with predictable address.

          Q: how could the attacker predict such addresses?

          Real attacks have to be more complex
          
          For details see:
            http://www.win.tue.nl/~aeb/linux/hh/hh-11.html
            http://phrack.org/issues/57/8.html
            http://phrack.org/issues/57/9.html#article

3. BROP

    Many of the attacks that we have seen require the attacker to have
    access to the source code and, ideally, the binary.

    Meanwhile, the binary varies a lot, depending on compiler, OS, etc.

    What if the attacker doesn't have access to the binary?

    ASK: what's the threat model in this paper?
        (attacker has network access to a server.
         attacker can tell whether a server crashes.
         server restarts on crash but DOES NOT rerandomize its canaries
            and ASLR on crash
         server can be compiled with PIE (which makes the attacker's 
            life harder, because it means that ASLR is applied to the
            entire address space)
        )

        targets servers with stack buffer overflow (as opposed to heap
        buffer overflow, as above)

    (Figure 4 is a nice summary of how the authors extend the state of
    the art)

    Game board: Attacker does the following:

        (A) exploit the stack buffer overflow vulnerability to mount a
        stack reading attack, to defeat canaries and ASLR
        
        (B) identify gadgets, without seeing the executable

        (C) exploit that same vulnerability to construct a ROP chain to
        write the binary itself over the socket, possibly in small
        pieces

        (D) on the attacker's computer, analyze the binary to scan for more
        gadgets and get complete information about the randomization

        (E) exploit the vulnerability yet again, this time with a ROP
        chain that spawns a shell

        The paper's contribution and focus is (A)-(C); steps (D) and
        (E) rely on known techniques. We will cover (A)-(C) below.

    A. Stack reading
    
       --How does this work?

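       unsigned char canary[8];
       for (int i = 1; i <= 8; i++) {       // For each canary byte . . .
           // int, not char: a char counter could never reach 256
           for (int c = 0; c < 256; c++) {  // . . . guess the value.
               canary[i-1] = (unsigned char)c;
               // try_i_byte_overflow() is a placeholder: overflow the
               // first i canary bytes, report whether the server crashed
               int server_crashed = try_i_byte_overflow(i, canary);
               if (!server_crashed) {
                   // We've discovered the i-th byte of the canary!
                   break;
               }
           }
       }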

        (This works because of the assumption that the server is not
        rerandomizing after crashes.)

        How many tries does it take to guess the canary?

        128 on average for the inner loop.
          8 for the outer loop (on a 64-bit system, the canary is 8 bytes)
        ----
        1024 on average. 
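
        The count can be checked with a small simulation. This is a
        sketch under stated assumptions: secret_canary and
        server_survives() are stand-ins for the remote server (which
        the real attacker only observes as "crashed" or "did not
        crash"), and the secret stays fixed across tries because the
        server is assumed not to rerandomize.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CANARY_LEN 8

/* Stand-in for the server's secret; it stays fixed across "crashes". */
static const unsigned char secret_canary[CANARY_LEN] =
    { 0x00, 0x3a, 0xf1, 0x07, 0xc4, 0x5e, 0x99, 0x12 };

/*
 * Hypothetical oracle: overflow the first n canary bytes with guess[0..n-1]
 * and report whether the server survived (1) or crashed (0).
 */
static int server_survives(const unsigned char *guess, size_t n)
{
    assert(n <= CANARY_LEN);
    return memcmp(guess, secret_canary, n) == 0;
}

/* Recover the canary one byte at a time; returns the number of tries. */
static unsigned read_canary(unsigned char out[CANARY_LEN])
{
    unsigned tries = 0;
    for (size_t i = 0; i < CANARY_LEN; i++) {
        for (int c = 0; c < 256; c++) {
            out[i] = (unsigned char)c;
            tries++;
            if (server_survives(out, i + 1))
                break;          /* byte i is correct; move on */
        }
    }
    return tries;
}
```

        With this particular secret, read_canary() needs 775 tries;
        the worst case for any secret is 8 * 256 = 2048.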

        This isn't many. And it's much faster than "brute force". For
        comparison, brute-forcing the ASLR randomization would mean
        2^{27} guesses on average (because there are 28 "free" bits;
        see Table I).
   
        --This technique was known. The authors' contribution in this
        piece is extending it to read out the frame pointer and saved
        return address. They do this by writing bytes into the return
        address until the program does not crash. Once they have done
        that, they know an address where code lives, and they have
        partially defeated ASLR.

            Because the top two bytes of code addresses are always zero
            and because "the third byte is 0x7f for libraries and the
            stack," they only need to try this many times:

        128 on average for the inner loop
          5 for the outer loop (only 5 vary)
        ---
        640 on average


        --Note that they also want to read the frame pointer. How can
        they read it, and why do they want to?

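            The how: The same way as the canary, one byte at a time. If
            the saved frame pointer is overwritten with a wrong value,
            the next stack frame (the higher one) will be set up wrong
            and the program will likely crash; a guess that does not
            crash is taken as correct.

            The why: Because knowing the saved frame pointer gives them
            information (it is a stack address), and helps validate
            that the attack is working.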
        
        --If stack reading fails, i.e., if the program crashes for
        *every* byte value tried (or keeps going for more than one value
        in the canary-reading phase), then the attacker gives up: this
        isn't a stack buffer overflow vulnerability.

 
    B. BROP: identifying gadgets without seeing the executable

        Once they know approximately where code lives, they go in search
        of gadgets...

        Step 1: find a stop gadget

          --A stop gadget is a return address that points to code
           that will hang the program (or more generally, have some
           identifiable effect), but not crash it (which is also
           identifiable, because the remote OS closes the socket)

          --Once the attacker can defeat canaries, the attacker can
          overwrite the function's return address and start guessing
          locations for a stop gadget. If the client network connection
          suddenly closes, the guessed address was not a stop gadget. If
          the connection stays open, the gadget is a stop gadget.

            [Q: why doesn't hanging the server stop the attack?
             A: 
                --maybe there is a daemon that checks for liveness and 
                restarts the app if it's not doing anything useful
                --or maybe server blocks on accept, and creates new
                thread for each newly returned socket from accept
                    (if the attacker suspects this is the case, would
                    want to crash the process in a second connection, to
                    prevent the case that there is a limit on the number
                    of threads)
                --or stop gadget doesn't have to literally "stop"; just
                has to be something that can be detected
                (see last paragraph of VIII-B; page 6, and VIII-J)
            ]

        Step 2: find gadgets that pop stack entries

          --start guessing addresses for such gadgets

          --place the guessed address (g1) in the location of the return
          address, followed by one or more stop gadgets

          --if the program crashes, the guessed address was no good; if
          the program stops, the guessed address was a gadget

          --this technique lets them figure out *how many* entries are
          popped by the gadget (by fiddling with the locations of the
          stop/crash gadgets; see VIII-C)

          --Example:
             say there's a gadget at address 0x400000 (=probe) that does
                pop rdi; ret

             if the attacker notices that
                <probe> <stop> <crash> <crash>
             results in a stop, then the attacker knows that the gadget
             does not pop the stack.

             if the attacker notices that
                <probe> <crash> <stop> <crash> <crash>....
             results in a stop, then the attacker knows that the gadget
             pops one element from the stack. so the attacker knows that 
             probe is the address of a gadget with the form:
                
                 pop REG; ret

             but the attacker does not know which register REG is
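
             The probing in Step 2 can be sketched against a toy
             server. Everything here is an assumption made for
             illustration: pops_at() plays the role of the unknown
             remote binary, and STOP_ADDR stands for a stop gadget
             already found in Step 1. The attacker's side is
             count_pops(), which only sees STOP or CRASH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum outcome { CRASH, STOP };

#define STOP_ADDR  0x400100u   /* hypothetical, already-found stop gadget */
#define CRASH_ADDR 0x0u        /* unmapped address: returning here crashes */
#define MAX_POPS   6

/*
 * Toy stand-in for the remote binary: how many words the code at addr
 * pops before its ret, or -1 if returning there crashes.
 */
static int pops_at(uint64_t addr)
{
    switch (addr) {
    case 0x400000: return 1;    /* pop rdi; ret */
    case 0x400010: return 0;    /* ret */
    case 0x400020: return 6;    /* six pops, then ret */
    default:       return -1;
    }
}

/* Return to probe, with stack[0..n-1] as the words that follow it. */
static enum outcome run(uint64_t probe, const uint64_t *stack, size_t n)
{
    assert(stack != NULL);
    int k = pops_at(probe);
    if (k < 0 || (size_t)k >= n)
        return CRASH;
    /* The gadget popped k words; its ret consumes the next one. */
    return stack[k] == STOP_ADDR ? STOP : CRASH;
}

/* Slide a single stop word down the stack until the server hangs. */
static int count_pops(uint64_t probe)
{
    for (int k = 0; k <= MAX_POPS; k++) {
        uint64_t stack[MAX_POPS + 1];
        for (int i = 0; i <= MAX_POPS; i++)
            stack[i] = (i == k) ? STOP_ADDR : CRASH_ADDR;
        if (run(probe, stack, MAX_POPS + 1) == STOP)
            return k;
    }
    return -1;   /* crashed under every layout: not a usable gadget */
}
```

             count_pops(0x400000) returns 1, matching the
             <probe> <crash> <stop> layout in the example above.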
            

        Step 3: Figure out exactly what the gadgets do
    
            Two options
            (a) "first principles"
            (b) optimized version that looks for the BROP gadget

            For (b), the attacker can locate the gadget based on the
            probe address not leading to a crash if the stack is
            arranged as:
                <probe> <stop> ... <stop> <crash>....<crash>
                            [7 stops]
               
            Once the attacker has the address of the BROP gadget, he
            or she can control %rdi and %rsi, which are the first two
            arguments to a system call (see part C below)
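
            Why one gadget gives control of both registers: the BROP
            gadget in the paper is a run of six pops followed by ret,
            and returning into the middle of its bytes decodes as
            different instructions. The sketch below is background on
            x86-64 encoding rather than anything from these notes:
            the byte values are the standard encodings, and
            decode_pops() is a made-up helper that understands only
            one-byte pops, the 0x41 prefix, and ret.

```c
#include <assert.h>
#include <stddef.h>

/* Encoding of: pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret */
static const unsigned char brop_gadget[] = {
    0x5b, 0x5d, 0x41, 0x5c, 0x41, 0x5d, 0x41, 0x5e, 0x41, 0x5f, 0xc3
};

/* Register numbers in x86-64 encoding order; 8..15 are r8..r15. */
enum { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI };

/*
 * Decode a run of `pop reg` instructions starting at code[off] and ending
 * at `ret`. Stores the register numbers in regs[] and returns how many
 * pops were decoded, or -1 if anything else is encountered.
 */
static int decode_pops(const unsigned char *code, size_t len, size_t off,
                       int *regs, int max)
{
    int n = 0;
    assert(code != NULL && regs != NULL);
    while (off < len) {
        int high = 0;
        if (code[off] == 0xc3)        /* ret */
            return n;
        if (code[off] == 0x41) {      /* REX.B prefix selects r8..r15 */
            high = 8;
            if (++off >= len)
                return -1;
        }
        if (code[off] < 0x58 || code[off] > 0x5f || n >= max)
            return -1;                /* not a one-byte pop */
        regs[n++] = (code[off] - 0x58) + high;
        off++;
    }
    return -1;   /* ran off the end without reaching ret */
}
```

            Decoding at offset 0 gives the six pops; at offset 7 it
            gives pop rsi; pop r15; ret; at offset 9 it gives
            pop rdi; ret.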

            For (a), it's even more devious:
                build mega-gadget that chains together all pop
                instructions, hoping that one of them pops into %rax.
                Give each gadget an "argument" that is the number of
                the pause syscall.

                Then place the guessed address of syscall()
                
                If there is a pause, the attacker now has the address of
                syscall()

                Once the attacker has *that* address, it tries each of
                its pop gadgets one-by-one, loading the number of the
                pause syscall into a (currently unknown) register, and
                then following that on the stack with the address of
                syscall().

                If there is a pause, then the attacker knows that the
                given gadget-under-test is the one that pops into %rax
                (since %rax holds the syscall number).

                There are related tricks for learning what other gadgets
                do.


    C. Invoke write()

        See VIII-A: they need 5 gadgets (4, under the "call write"
        optimization).

        They need:
           pop rdi; ret (socket)
           pop rsi; ret (buffer)
           pop rdx; ret (length)
           pop rax; ret (write syscall number)
           syscall
        or
           pop rdi; ret (socket)
           pop rsi; ret (buffer)
           pop rdx; ret (length)
           call write

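        Part B above told us how to identify the first two gadgets
        (the BROP gadget yields pop rdi and pop rsi).
        Finding "pop rdx; ret" is difficult, so the authors use yet
            another trick: treat all of strcmp as a gadget (!!!),
            because calling strcmp leaves a usable value in rdx as a
            side effect.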

        They also need to guess the socket number, but that's fairly
        easy to do, since Linux by default restricts processes to 1024
        simultaneously open file descriptors, and a new file descriptor
        always gets the lowest number available (so guessing a small
        file descriptor works well in practice).

        To test whether we've guessed the correct file descriptor,
        simply try the write and see if we receive anything! 
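
        The "lowest one available" rule is easy to observe locally.
        The sketch below is not part of the attack; it only
        demonstrates the POSIX allocation rule that makes small
        descriptor numbers good guesses. freed_fd_is_reused() is a
        made-up name; pipe(), dup(), and close() are the standard
        POSIX calls.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Returns 1 if a descriptor number freed by close() is handed out again
 * by the next dup(), 0 if not, and -1 on error.
 */
static int freed_fd_is_reused(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    int first = dup(p[0]);      /* lowest unused descriptor number */
    if (first < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    assert(first != p[0] && first != p[1]);
    close(first);

    int second = dup(p[0]);     /* should get the same number back */
    int same = (second == first);

    if (second >= 0)
        close(second);
    close(p[0]);
    close(p[1]);
    return same;
}
```

        In a single-threaded process this returns 1: the number freed
        by close() is the lowest one available, so the next dup()
        reuses it.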
 
        Once there's a ROP chain that invokes write() with the correct
        arguments, the attacker starts getting the binary over the
        socket (the 'buffer' is just the address of the program's .text
        segment, which by now the attacker has learned).

    D. Some loose ends:

        What's going on in Figure 13? How does the pointer arithmetic
        allow them to bypass the canary?
            
            (choose randomLen > RAN_LEN; then
                the read starts at a negative offset...and whacks the
                return address in the stack frame of input.read)

        
4. Defenses against BROP

    --Rerandomize after crash!
        
        Note that Windows is less vulnerable to this attack because it
        has no fork() call, and hence rerandomizes after crashes.

        But even Windows rerandomizes *system* libraries only when the
        computer boots. This leaves some attack surface.

    --Or even better: generate new canary randomly before entering
    functions.
        
        Still, if the attacker can circumvent the canary (see the attack
        on yaSSL; Figure 13), then randomizing the canary won't be
        effective.

        (But randomizing the address space still will be helpful.)

    --After a crash, delay the fork. 
        +: slows down attacker
        -: attacker can now conduct denial-of-service (DoS)

    --Extreme version of that: after a crash, don't restart at all!
        +: defeats BROP
        -: DoS again
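
    One way the "delay the fork" defense above might look is an
    exponential backoff in the supervisor that restarts the server.
    This is only a sketch: restart_delay_secs() and the one-hour cap
    are assumptions for illustration, not something from the paper.

```c
#include <assert.h>

#define MAX_DELAY_SECS 3600u   /* hypothetical cap: one hour */

/*
 * Seconds to wait before the next fork, given the number of consecutive
 * crashes seen so far: 1, 2, 4, ... doubling up to the cap.
 */
static unsigned restart_delay_secs(unsigned crashes)
{
    unsigned delay = 1;
    if (crashes == 0)
        return 0;
    for (unsigned i = 1; i < crashes && delay < MAX_DELAY_SECS; i++)
        delay *= 2;
    assert(delay >= 1);
    return delay < MAX_DELAY_SECS ? delay : MAX_DELAY_SECS;
}
```

    The trade-off is the one listed above: the same delay that slows
    stack reading also lets an attacker keep the service down by
    crashing it repeatedly.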

--------------------
   
More info on ROP and x86 calling conventions:
http://codearcana.com/posts/2013/05/21/a-brief-introduction-to-x86-calling-conventions.html
http://codearcana.com/posts/2013/05/28/introduction-to-return-oriented-programming-rop.html
http://www.slideshare.net/saumilshah/dive-into-rop-a-quick-introduction-to-return-oriented-programming
https://cseweb.ucsd.edu/~hovav/dist/rop.pdf


--------------------

Acknowledgment: MIT 6.858 staff (for some of these notes and refs)