Class 14
CS 480-008
24 March 2016

On the board
------------

1. Last time
2. EXE
    Operation
    Constraint solver
    Mechanics
    Applicability/coverage
    Evaluation
    Details and discussion
3. Concolic execution
4. Lab notes

---------------------------------------------------------------------------

1. Last time

    --bug finding and program correctness

    --symbolic execution

        intuition: each program run is much longer than it would be if
        it were executed normally (because there are extra checks, and
        frequent calls into a run-time) *but* it's a fairly systematic
        way of gaining code coverage

        so, holding coverage equal, we expect the total time spent
        running through test cases to be much less than with random
        fuzzing.
      
        symbolic execution not only explores tricky corner cases but
        also produces the inputs that trigger those cases. 

    --EXE

    Clarify: when symbolic executor encounters "if":
    
        --asks constraint solver if, in the context of the current
        history, as captured by the current pc, the "if" condition can
        be satisfied. if yes, create pc that forces "if" to evaluate to
        true
    
        --ask constraint solver the same question, but this time about
        whether the "else" condition is ever satisfied. if yes, 
        create pc that forces "if" to evaluate to false.

        --if both questions have positive answers, fork, so that each
        branch is explored in its own process.

    example:

        1. read x, y
        2. if x > y:
        3.   x = y
        4. if x < y:
        5.   x = x + 1
        6. assert(x + y != 7)

    NOTE: line 6 expands into:
        6a. if x + y == 7:
        6b.   error()

    The idea here is that the programmer knows that x + y should never
    equal 7, or else something bad happened. Okay, can that line be
    reached?
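
    A sketch of the answer, assuming a brute-force search over a small
    input range as a stand-in for the constraint solver (EXE would
    query STP instead). The names run() and explore() are made up for
    illustration. It groups inputs by the branch decisions they cause
    and looks for an input that makes the check at 6a true:

```python
from itertools import product

def run(x, y):
    """Concrete execution of the example; returns (branch decisions, error reached)."""
    took_first = x > y           # line 2
    if took_first:
        x = y                    # line 3
    took_second = x < y          # line 4
    if took_second:
        x = x + 1                # line 5
    return (took_first, took_second), (x + y == 7)   # line 6a

def explore(domain=range(-10, 11)):
    """Group inputs by path; keep one witness and one error input (if any) per path."""
    paths = {}
    for x0, y0 in product(domain, repeat=2):
        path, err = run(x0, y0)
        entry = paths.setdefault(path, {"witness": (x0, y0), "error_input": None})
        if err and entry["error_input"] is None:
            entry["error_input"] = (x0, y0)
    return paths

if __name__ == "__main__":
    for path, info in sorted(explore().items()):
        print(path, info)
```

    Under these assumptions only the path (line 2 not taken, line 4
    taken) yields an error input. The path where both branches are
    taken never shows up: after x = y, the test x < y cannot hold.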


2. EXE

    A. Operation, continued

    EXE is a C-to-C translator -- it transforms C code, then compiles w/ gcc

      can handle all of C except floating point

      state:

        table indicating which memory ranges are symbolic (3.2)
        symbolic value for each byte of symbolic memory
        path constraint

      1. EXE adds code to every assignment, expression, and branch

         if any argument symbolic, mark result symbolic, record sym value
         if all arguments concrete, execute faster ordinary operation

      2. fork() at each branch
         add if-condition constraint (or "not") to pc in each process
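
      A minimal sketch of steps 1 and 2, in Python rather than the C
      that EXE actually emits. Sym, expr_of(), add(), and branch() are
      hypothetical names, and the "fork" is modeled by returning the
      two path constraints instead of calling fork():

```python
class Sym:
    """A symbolic value: an expression tree over input names."""
    def __init__(self, expr):
        self.expr = expr

def expr_of(v):
    """Expression for an operand, whether symbolic or concrete."""
    return v.expr if isinstance(v, Sym) else v

def add(a, b):
    """Instrumented '+': symbolic if any operand is, else the ordinary fast path."""
    if isinstance(a, Sym) or isinstance(b, Sym):
        return Sym(("+", expr_of(a), expr_of(b)))
    return a + b

def branch(cond, pc):
    """Instrumented 'if' on a symbolic condition.

    Returns the path constraints of the two would-be processes: one with
    the condition added, one with its negation added. A real tool would
    first ask the solver which of the two are feasible.
    """
    return pc + [cond], pc + [("not", cond)]

if __name__ == "__main__":
    x = Sym("x")
    print(add(2, 3))            # concrete operands: ordinary addition
    print(add(x, 3).expr)       # symbolic operand: records an expression
    print(branch((">", add(x, 3).expr, 10), []))
```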


    Figure 9 has a fragment of a real example
      packet filter
        this is what tcpdump and many other network monitoring apps use
        user (attacker) supplies an interpreted filter in a simple language
        kernel interprets filter to decide whether user wants to see each packet
        we're worried about evil user supplying filter that tricks the kernel
      Figure 9 is called when filter wants to read "len" bytes at "offset"
        there is code to check that filter isn't reading beyond the end of the packet
        what is the problem?
        how does EXE spot it?


    B. Constraint solving

        Detour: what is SAT? what does it mean for a logical formula to
        be satisfiable or not?
            
            variables take T/F values. often combined into CNF
            (conjunctive normal form):

            (Z1) 

            (Z1 OR Z2) AND (~Z1 OR Z2) AND (Z1 OR ~Z2) AND (~Z1 OR ~Z2)
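
            To check the two formulas above by hand, one can simply try
            every assignment. A sketch, assuming clauses are encoded as
            lists of integers (k for Zk, -k for ~Zk); the encoding and
            the name satisfiable() are choices made for this example:

```python
from itertools import product

def satisfiable(clauses, num_vars):
    """Try every T/F assignment; return a satisfying one as {var: bool}, or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return dict(enumerate(bits, start=1))
    return None

first = [[1]]                                   # (Z1)
second = [[1, 2], [-1, 2], [1, -2], [-1, -2]]   # the four-clause formula

if __name__ == "__main__":
    print(satisfiable(first, 1))
    print(satisfiable(second, 2))
```

            The first formula is satisfiable (set Z1 to T). The second
            is not: each of the four possible assignments to (Z1, Z2)
            falsifies exactly one clause.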

        What's a SAT solver?

            Takes a huge SAT instance, and identifies a satisfying
            assignment. Big search problem.
            
            The problem in general is NP-complete. Thus, if P!=NP, then
            the SAT solver, in the *general* case, has a
            super-polynomial amount of work to do.

            But in practice, the types of SAT formulas that people try
            to solve can be solved much faster. (lots of progress in SAT
            solvers.)

            Bedrock algorithm: DPLL (which was developed at NYU many
            decades ago!)
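
            A simplified sketch of the DPLL idea (unit propagation plus
            splitting on a variable; pure-literal elimination and every
            modern optimization are left out). The integer clause
            encoding (k for Zk, -k for ~Zk) is an assumption of this
            example:

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment {var: bool}, or None if unsatisfiable."""
    assignment = dict(assignment or {})

    # Simplify every clause under the current partial assignment.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue                      # clause already satisfied
        rest = [lit for lit in clause if abs(lit) not in assignment]
        if not rest:
            return None                   # all literals false: conflict
        simplified.append(rest)
    if not simplified:
        return assignment                 # every clause satisfied

    # Unit propagation: a one-literal clause forces that literal.
    for clause in simplified:
        if len(clause) == 1:
            lit = clause[0]
            return dpll(simplified, {**assignment, abs(lit): lit > 0})

    # Split: try both values of some unassigned variable.
    var = abs(simplified[0][0])
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None

if __name__ == "__main__":
    print(dpll([[1]]))
    print(dpll([[1, 2], [-1, 2], [1, -2], [-1, -2]]))
```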

        EXE's constraint solver, STP, leverages this fact.

        What is a constraint solver?

          this is the really hard part of symbolic execution

          constraint solver solves sets of equations

          easy: x + y = 10 AND x = y

          hard: "900 = x*x"; this requires a trick; STP knows many tricks

          too hard: "10 = crypto_hash(x)"; this will time out. (the SAT
          solver is computationally bounded, just like everything else.)

        The constraint solver modifies its equations, applies
        simplifications, etc., and then ultimately hands them to a SAT
        solver

        In the context of EXE, the SAT solver is getting a big logical
        formula, in which the variables represent the bits of program
        variables and memory contents.
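
        To make "bits" concrete, here is a sketch of the "900 = x*x"
        constraint over a 16-bit unsigned x. The 16-bit width and the
        exhaustive search are assumptions of this example (a stand-in
        for the solver, not how STP works):

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1

def solve_square(target):
    """All WIDTH-bit x with (x * x) mod 2**WIDTH == target, by exhaustive search."""
    return [x for x in range(1 << WIDTH) if (x * x) & MASK == target]

def bits(x):
    """The individual bits of x, least significant first."""
    return [(x >> i) & 1 for i in range(WIDTH)]

if __name__ == "__main__":
    solutions = solve_square(900)
    print(solutions)
    print(bits(solutions[0]))
```

        Because fixed-width arithmetic wraps around, 30 is not the only
        solution: for instance 65506 (that is, -30 as a 16-bit value)
        also satisfies the constraint.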

    arrays can be tough for a constraint solver

      EXE turns many C constructs into arrays (strings, ptrs, structs?)

      s[c] -- concrete index lets STP treat s[c] as a specific symbolic value
        this is the easiest case -- and the most common
        e.g. looping over an input string

      c[s] -- could refer to any element, since s is symbolic
        equivalent to a big disjunction (c[0] or c[1] or ...)

      *p -- if p is symbolic, which array? i.e. which disjunction?
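
      A sketch of the c[s] case, assuming the array contents are
      concrete and only the index s is symbolic. The constraint
      "c[s] == v" is written out as one disjunct per element; the
      function name and the string form of the disjuncts are made up
      for illustration:

```python
def read_disjunction(c, s_name, v):
    """Expand c[s] == v (s symbolic, c concrete) into its (s == i) disjuncts.

    The full disjunction is OR over i of (s == i AND c[i] == v); with c
    concrete, each c[i] == v is known, so only the true ones survive.
    """
    return [f"{s_name} == {i}" for i, elem in enumerate(c) if elem == v]

if __name__ == "__main__":
    print(" OR ".join(read_disjunction([7, 3, 7, 1], "s", 7)))
```

      The disjunction has one candidate term per array element, which
      is why large arrays with symbolic indices get expensive.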


    very slow, so optimizations are critical

      solver knows a lot about arrays (3.3)

      EXE is careful about what it asks the solver to do and how it
      executes:

          ordinary concrete operations/operands when possible

          don't bother with if-branch if no solution

          cache+share constraint solutions (4.1)

          solve and cache independent constraint fragments (4.2)
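
      A sketch of the last two ideas, assuming each constraint comes
      tagged with the set of variables it mentions. split_independent()
      and solve_cached() are hypothetical names, and the solver is
      passed in as an ordinary function:

```python
def split_independent(constraints):
    """constraints: list of (text, set_of_variable_names).

    Returns groups of constraint texts such that no variable is shared
    between two different groups.
    """
    groups = []  # list of (variables, texts), pairwise disjoint in variables
    for text, variables in constraints:
        merged_vars, merged_texts = set(variables), [text]
        remaining = []
        for gvars, gtexts in groups:
            if gvars & merged_vars:
                merged_vars |= gvars              # overlaps: fold group in
                merged_texts = gtexts + merged_texts
            else:
                remaining.append((gvars, gtexts))
        groups = remaining + [(merged_vars, merged_texts)]
    return [texts for _, texts in groups]

def solve_cached(group, solve, cache):
    """Solve one independent group, reusing an earlier answer if present."""
    key = frozenset(group)
    if key not in cache:
        cache[key] = solve(group)
    return cache[key]

if __name__ == "__main__":
    cs = [("x > 0", {"x"}), ("y < 5", {"y"}), ("x + z == 3", {"x", "z"})]
    print(split_independent(cs))
```

      The payoff: when a new branch adds a constraint on y only, the
      group over x and z is unchanged and its cached answer is reused.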

    C. Mechanics of EXE

        what if the constraint solver times out?
              if solving at termination (e.g. error()) -- print nothing
              if solving at division/dereference -- assume safe
              if solving at "if" -- I don't know, maybe continue on both paths

        how does EXE handle all those fork()ed processes?

          each contacts "search server" and waits

          which process should search server allow to run?

          depth-first search?
            pro: executes deep into program
            con: can get stuck in loops with symbolic bounds
                 thus may never execute many lines of code

          breadth-first search?
            pro: doesn't get stuck, since tries many paths a little bit
            con: may never get very far into the program

          EXE search server uses "best-first" heuristic:
            pick the process blocked at the line of code that's been
            run the fewest times (much like breadth-first)
            use DFS on that process and its children "for a while"
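
          A sketch of the selection step only (the "DFS for a while"
          part is not modeled). The data layout, a map from process id
          to the line it is waiting at plus per-line run counts, is an
          assumption of this example:

```python
from collections import Counter

def pick_next(waiting, run_counts):
    """Choose the waiting process whose current line has run the fewest times.

    waiting: dict mapping process id -> line number it is blocked at.
    run_counts: Counter mapping line number -> times executed so far.
    Ties are broken by lowest process id.
    """
    return min(waiting, key=lambda pid: (run_counts[waiting[pid]], pid))

if __name__ == "__main__":
    counts = Counter({10: 5, 20: 1})
    print(pick_next({1: 10, 2: 20, 3: 30}, counts))
```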


    D. Applicability/coverage

        Can EXE find all bugs?

            no: EXE doesn't know much about what a bug is
              it knows about crashes and asserts but not logic bugs

            no: because time is finite:

                --STP might run out of time before finding a solution.
                Some input could cause an assert to fail, but STP cannot
                find it.

                --EXE may not explore all paths

                there may be a vast # of paths, programmer may give up
                before EXE tries them all
            
            no: because there are things EXE doesn't track

                floating-point
                syscalls: can't do open(symbolic-file-name)
          
        Resync: does symbolic execution "try all values"? Answer: no! It
        captures all *paths*, by consolidating information (handling all
        ways to take one path at once). This is the purpose of executing
        symbolically.
        
        What bugs does EXE find? How can we be sure it's systematic?

        Can EXE exhaustively test input in the sense of providing all
        inputs that have an effect on the program?
            --In principle, it can
            --But it doesn't do this by enumerating inputs, rather by
            finding *paths*  (See point above.)

    E. Evaluation 
        * does EXE find real bugs?
        * how fast?

        EXE finds real bugs in smallish UNIX utility code
          packet filter vs evil filters
          udhcpd vs evil packets
          pcre (perl compatible regular expressions) vs evil regular expressions
          kernel file system vs corrupt file system disk images
          impressive -- real C programs, real bugs!

        mostly buffer overflow / illegal memory references
          these are errors EXE can find w/o programmer help
          would take more programmer help to find application-specific bugs
            e.g. missing permission checks

        how fast?
          Table 2 gives run-time for above programs (bpf, udhcpd, pcre)
          tens of minutes -- not so bad
          but complexity might be exponential in program size...

        there are also limitations in input space: they need to bound
        packet length and filter length, for example (so EXE cannot find
        bugs that are triggered only by inputs longer than the fixed
        size that they choose.)

    F. Some more details

        Pointer-to-pointer issue

            Basic reason: STP understands *arrays*. EXE keeps
            constraints under control by treating each array separately
            (otherwise, all of the constraints for all of the arrays
            would "interact", and blow up).

            For each pointer, EXE knows what array it can point to. 
            EXE learns this because it's tracking the assignment of all
            pointer values. 
           
            But what if EXE reads from an array, and casts the read
            value to a pointer? This is a pointer to a pointer
            situation. The trouble is that EXE didn't know that the
            corresponding *write* affected which array the read value
            was pointing to.

            So on a read of a value, which is then used as a pointer,
            EXE/STP does not know what array that symbolic value refers
            to. So the options are:
                --EXE has to assume that any array can be referenced
                (constraints blow up)
                --STP has to develop a model of memory (constraints
                again blow up)
                --they have to decide not to handle the
                pointer-to-a-pointer case

            The issue is explained (and handled) somewhat better in the
            authors' follow-up work, KLEE (Cadar et al., Proc. OSDI,
            2008):
              https://www.usenix.org/legacy/event/osdi08/tech/full_papers/cadar/cadar.pdf

            "As with other dangerous operations, load and store
            instructions generate checks: in this case to check that the
            address is in-bounds of a valid memory object. However, load
            and store operations present an additional complication. The
            most straightforward representation of the memory used by
            checked code would be a flat byte array.  In this case,
            loads and stores would simply map to array read and write
            expressions respectively.  Unfortunately, our constraint
            solver STP would almost never be able to solve the resultant
            constraints (and neither would the other constraint solvers
            we know of). Thus, as in EXE, KLEE maps every memory object
            in the checked code to a distinct STP array ... This
            representation dramatically improves performance since it
            lets STP ignore all arrays not referenced by a given
            expression.

            Many operations (such as bound checks or object-level
            copy-on-write) require object-specific information. If a
            pointer can refer to many objects, these operations become
            difficult to perform. For simplicity, KLEE sidesteps this
            problem as follows. When a dereferenced pointer p can refer
            to N objects, KLEE clones the current state N times. In each
            state it constrains p to be within bounds of its respective
            object and then performs the appropriate read or write
            operation. Although this method can be expensive for
            pointers with large points-to sets, most programs we have
            tested only use symbolic pointers that refer to a single
            object, and KLEE is well-optimized for this case."

    
        Loops with symbolic variables as bounds

            idea: get loop to execute 1 time, then 2 times, then 3
            times. etc. Because each check of a loop bound is like an
            "if".

            may be easier to visualize this by imagining loop is unrolled.
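
            A sketch of the unrolled view for a loop of the form
            "for (i = 0; i < n; i++)" with symbolic n. Each path stays
            in the loop k times and then leaves; loop_paths() and the
            string form of the constraints are made up for
            illustration:

```python
def loop_paths(bound_name, max_iters):
    """Path constraints for 0, 1, ..., max_iters iterations of the loop."""
    paths = []
    for k in range(max_iters + 1):
        stay = [f"{i} < {bound_name}" for i in range(k)]   # k passing checks
        leave = f"not ({k} < {bound_name})"                # the failing check
        paths.append(stay + [leave])
    return paths

if __name__ == "__main__":
    for pc in loop_paths("n", 2):
        print(" AND ".join(pc))
```

            With no bound on n, the list of paths never ends, which is
            the DFS problem noted earlier.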

        Other student questions

            Lots of people asked why the system treats memory as
            untyped. [A: because this is what helps STP notice any
            memory errors, including those resulting from casting,
            misuse of individual bytes, etc.]

            People asked, "Why not floating-point?" (One answer: to find
            solutions to constraints efficiently, STP would have to
            "know about" floating point.)
            

    G. Discussion


3. Concolic execution

    Lab 4 uses "concolic execution", a variant of symbolic execution

    Motivation: what if there are functions that you can't look inside?
    as when layering symbolic execution on top of a complex language.

    In Lab 4, we want to add symbolic execution to Python without modifying Python.

      example:
        read x, u
        ok = DBlookup(u)
        if x == "GET":
          if ok == True:
            ...
          else:
            ...
      if we don't have a symbolic DB, we cannot execute this symbolically

    
    Concolic execution
      execute with concrete inputs -- e.g. empty string
        so we can execute the DBlookup in the example
        it's an ordinary concrete (non-symbolic) execution

      While executing:
        --record symbolic values of variables derived from inputs
          when possible
        --maintain path constraint of executed path
          just one path, since concrete inputs only explore one side of each "if"

      After execution finishes:
        --negate an "if" condition in the pc [path constraint]
        --solve modified pc (up to that "if"), yielding new concrete inputs
        --re-execute on new concrete inputs
        --new execution will follow a different path than first

      Keep re-executing with different "if" conditions negated
        eventually can drive execution down lots of different paths
        and perhaps find inputs that trigger assertion failures
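
      A toy sketch of this loop for a program with one integer input.
      Everything here is an assumption made for illustration, not the
      lab's code: the ConcolicInt proxy, the example program(), and
      solve(), which brute-forces a small range in place of a real
      solver:

```python
OPS = {">": lambda a, b: a > b, "==": lambda a, b: a == b}

class ConcolicInt:
    """Concrete value plus a symbolic name; comparisons are logged to a trace."""
    def __init__(self, name, value, trace):
        self.name, self.value, self.trace = name, value, trace

    def _compare(self, op, other):
        outcome = OPS[op](self.value, other)
        self.trace.append((self.name, op, other, outcome))
        return outcome

    def __gt__(self, other):
        return self._compare(">", other)

    def __eq__(self, other):
        return self._compare("==", other)

def program(x):
    """Example program under test."""
    if x > 100:
        if x == 4242:
            return "bug"
        return "big"
    return "small"

def solve(constraints, domain=range(-10000, 10001)):
    """Stand-in solver: first value in domain meeting every recorded outcome."""
    for v in domain:
        if all(OPS[op](v, c) == want for _, op, c, want in constraints):
            return v
    return None

def concolic(prog, start=0):
    """Re-execute prog, negating one recorded branch at a time."""
    todo, seen, results = [start], set(), {}
    while todo:
        value = todo.pop()
        trace = []
        result = prog(ConcolicInt("x", value, trace))
        if tuple(trace) in seen:
            continue                       # path already explored
        seen.add(tuple(trace))
        results[value] = result
        for i, (name, op, c, outcome) in enumerate(trace):
            # Keep the pc up to branch i, negate branch i, drop the rest.
            new_input = solve(trace[:i] + [(name, op, c, not outcome)])
            if new_input is not None:
                todo.append(new_input)
    return results

if __name__ == "__main__":
    print(concolic(program))
```

      Starting from x = 0, the first run only sees "x > 100" evaluate
      to false; negating it gives an input that reaches the inner
      "if", and negating that one in turn reaches the "bug" branch.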

    Advantages (+) and disadvantages (-):
    + much easier to add to a language like Python
       "proxy" concolic data types replace int, string
    + an easy way to tolerate opaque functions
    - will miss some constraints, e.g. relation of ok to u
       thus may not be able to execute down some "if" branches

4. Lab notes 

    --what is the correct way to add 1 to a number in C?
    
        int x;
        ...
        x = x + 1;

        ?

    --what about averaging?


    --lab uses concolic execution. Python variables are wrapped in types
    that carry a (symbol, value) pair.

    --lab uses an SMT solver

        SMT vs SAT: SMT capable of reasoning about "higher-level"
        objects than boolean values. Leads to efficiency.
        
        NYU's Clark Barrett is a leader in SMT solvers.

    --lab advice:
        
        part of your job is to figure out what is going on. read the
        code!

        strategy: start from the functions that invoke the tester

        concepts/objects:
            AST: what is this?
            the solver: where and how is this invoked?

        concolic_test(): 

            this function runs another function in an environment, and
            creates path constraints. your job is to write the core of
            this function.

            concolic_bool: what is this?

        advice:
            stop between exercises 2 and 3. make sure you understand the
            entire contents of fuzzy.py, and its interplay with the
            "check" functions.

            only when it "clicks" should you start filling in
            concolic_test()

Conclusion:

  symbolic execution is powerful and productive...

  ... but not so practical as programs grow large

  it's a promising research area as well as a useful tool
---------------------------------------------------------------------------

Acknowledgment: The staff of 6.858