Class 14
CS 480-008
24 March 2016

On the board
------------

1. Last time

2. EXE
    Operation
    Constraint solver
    Mechanics
    Applicability/coverage
    Evaluation
    Details and discussion

3. concolic testing

4. lab notes

---------------------------------------------------------------------------

1. Last time

    --bug finding and program correctness

    --symbolic execution

        intuition: each program run is much longer than it would be if
        it were executed normally (because there are extra checks, and
        frequent calls into a run-time)

        *but* it's a fairly systematic way of gaining code coverage

        so, holding coverage equal, we expect the total time spent
        running through test cases to be much less than with random
        fuzzing.

        symbolic execution not only explores tricky corner cases but
        also produces the inputs that trigger those cases.

    --EXE

        Clarify: when symbolic executor encounters "if":

        --asks constraint solver if, in the context of the current
          history, as captured by the current pc (path constraint),
          the "if" condition can be satisfied. if yes, create pc that
          forces "if" to evaluate to true

        --asks constraint solver the same question, but this time about
          whether the "else" condition is ever satisfied. if yes,
          create pc that forces "if" to evaluate to false.

        --if both questions have positive answers, fork for one of the
          branches.

        example:

            1. read x, y
            2. if x > y:
            3.     x = y
            4. if x < y:
            5.     x = x + 1
            6. assert(x + y != 7)

        NOTE: line 6 expands into:

            6a. if x + y == 7:
            6b.     error()

        The idea here is that the programmer knows that x + y should
        never equal 7, or else something bad happened.

        Okay, can that line be reached?

2. EXE

    A. Operation, continued

        EXE is a C-to-C translator -- it transforms C code, then compiles w/ gcc
            can handle all of C except floating point

        state:
            table indicating which memory ranges are symbolic (3.2)
            symbolic value for each byte of symbolic memory
            path constraint

        1.
           EXE adds code to every assignment, expression, and branch
               if any argument symbolic, mark result symbolic, record sym value
               if all arguments concrete, execute faster ordinary operation

        2. fork() at each branch
               add if-condition constraint (or "not") to pc in each process

        Figure 9 has a fragment of a real example
            packet filter
            this is what tcpdump and many other network monitoring apps use
            user (attacker) supplies an interpreted filter in a simple language
            kernel interprets filter to decide whether user wants to see each packet
            we're worried about evil user supplying filter that tricks the kernel
            the code in Figure 9 is called when filter wants to read "len" bytes at "offset"
            there is code to check that filter isn't reading beyond the end of the packet
            what is the problem?
            how does EXE spot it?

    B. Constraint solving

        Detour: what is SAT?

            what does it mean for a logical formula to be satisfiable or not?

            variables take T/F values. often combined into CNF
            (conjunctive normal form):

                (Z1)

                (Z1 OR Z2) AND (~Z1 OR Z2) AND (Z1 OR ~Z2) AND (~Z1 OR ~Z2)

            What's a SAT solver?

                Takes a huge SAT instance, and identifies a satisfying
                assignment (or reports that none exists).

                Big search problem. The problem in general is
                NP-complete. Thus, if P!=NP, then the SAT solver, in the
                *general* case, has a super-polynomial amount of work to
                do.

                But in practice, the types of SAT formulas that people
                try to solve can be solved much faster. (lots of
                progress in SAT solvers.)

                Bedrock algorithm: DPLL (which was developed at NYU many
                decades ago!)

                EXE's constraint solver, STP, leverages this fact.

        What is a constraint solver?

            this is the really hard part of symbolic execution

            constraint solver solves sets of equations
                easy: x + y = 10 AND x = y
                hard: "900 = x*x"; this requires a trick; STP knows many tricks
                too hard: "10 = crypto_hash(x)"; this will time out.
                    (the SAT solver is computationally bounded, just
                    like everything else.)
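To make the SAT detour concrete, here is a minimal sketch of the "big search problem" done naively, applied to the two CNF formulas above. It simply tries every assignment, which is exponential in the number of variables; it is NOT DPLL and NOT how STP works. The function name brute_force_sat and the clause encoding (integer k for Zk, -k for ~Zk) are assumptions chosen for this illustration.

```python
from itertools import product


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment {var: bool}, or None if unsatisfiable.

    clauses is a CNF formula: a list of clauses, each a list of nonzero
    ints, where k stands for Zk and -k stands for ~Zk (an encoding
    assumed for this sketch).
    """
    # Enumerate all 2**num_vars assignments: exponential, illustration only.
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        # CNF is satisfied when every clause has at least one true literal.
        if all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return assignment
    return None


# (Z1)
single = [[1]]
# (Z1 OR Z2) AND (~Z1 OR Z2) AND (Z1 OR ~Z2) AND (~Z1 OR ~Z2)
four = [[1, 2], [-1, 2], [1, -2], [-1, -2]]

print(brute_force_sat(single, 1))  # {1: True}
print(brute_force_sat(four, 2))    # None: each clause rules out one of the 4 assignments
```

The first formula is satisfiable (set Z1 to true); the second is not, because its four clauses together exclude all four assignments of Z1 and Z2. Real solvers avoid this full enumeration by pruning the search.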
            The constraint solver modifies its equations, applies
            simplifications, etc., and then ultimately hands them to a
            SAT solver

            In the context of EXE, the SAT solver is getting a big
            logical formula, in which the literals represent the bits of
            program variables and memory contents.

        arrays can be tough for a constraint solver
            EXE turns many C constructs into arrays (strings, ptrs, structs?)
            s[c] -- concrete index lets STP treat s[c] as a specific symbolic value
                this is the easiest case -- and the most common
                e.g. looping over an input string
            c[s] -- could refer to any element, since s is symbolic
                equivalent to a big disjunction (c[0] or c[1] or ...)
            *p -- if p is symbolic, which array? i.e. which disjunction?
            very slow, so optimizations are critical
            solver knows a lot about arrays (3.3)

        EXE is careful about what it asks the solver to do and how it executes:
            ordinary concrete operations/operands when possible
            don't bother with if-branch if no solution
            cache+share constraint solutions (4.1)
            solve and cache independent constraint fragments (4.2)

    C. Mechanics of EXE

        what if the constraint solver times out?
            if solving at termination (e.g. error()) -- print nothing
            if solving at division/dereference -- assume safe
            if solving at "if" -- I don't know, maybe continue on both paths

        how does EXE handle all those fork()ed processes?
            each contacts "search server" and waits
            which process should search server allow to run?
            depth-first search?
                pro: executes deep into program
                con: can get stuck in loops with symbolic bounds
                     thus may never execute many lines of code
            breadth-first search?
                pro: doesn't get stuck, since tries many paths a little bit
                con: may never get very far into the program
            EXE search server uses "best-first" heuristic:
                line of code that's been run the fewest times (much like breadth-first)
                use DFS on that process and children "for a while"

    D. Applicability/coverage

        Can EXE find all bugs?
            no: EXE doesn't know much about what a bug is
                it knows about crashes and asserts
                but not logic bugs

            no: because time is finite:
                --STP might run out of time before finding a solution.
                  Some input could cause an assert to fail, but STP
                  cannot find it.
                --EXE may not explore all paths
                  there may be a vast # of paths, programmer may give
                  up before EXE tries them all

            no: because there are things EXE doesn't track
                floating-point
                syscalls: can't do open(symbolic-file-name)

        Resync: does symbolic execution "try all values"?

            Answer: no! It captures all *paths*, by consolidating
            information (handling all ways to take one path at once).
            This is the purpose of executing symbolically.

        What bugs does EXE find? How can we be sure it's systematic?

        Can EXE exhaustively test input in the sense of providing all
        inputs that have an effect on the program?

            --In principle, it can
            --But it doesn't do this by enumerating inputs, rather by
              finding *paths* (See point above.)

    E. Evaluation

        * does EXE find real bugs?
        * how fast?

        EXE finds real bugs in smallish UNIX utility code
            packet filter vs evil filters
            udhcpd vs evil packets
            pcre (perl compatible regular expressions) vs evil regular expressions
            kernel file system vs corrupt file system disk images
            impressive -- real C programs, real bugs!

        mostly buffer overflow / illegal memory references
            these are errors EXE can find w/o programmer help
            would take more programmer help to find application-specific bugs
                e.g. missing permission checks

        how fast?
            Table 2 gives run-time for above programs (bpf, udhcpd, pcre)
            tens of minutes -- not so bad
            but complexity might be exponential in program size...
            and limitations in input space: they need to bound packet
            length and filter length, for example (cannot find bugs
            that would be stressed by input lengths beyond the fixed
            size that they choose.)

    F. Some more details

        Pointer-to-pointer issue

            Basic reason: STP understands *arrays*.
            EXE keeps constraints under control by treating each array
            separately (otherwise, all of the constraints for all of the
            arrays would "interact", and blow up).

            For each pointer, EXE knows what array it can point to. EXE
            learns this because it's tracking the assignment of all
            pointer values.

            But what if EXE reads from an array, and casts the read
            value to a pointer? This is a pointer to a pointer
            situation. The trouble is that EXE didn't know that the
            corresponding *write* affected which array the read value
            was pointing to. So on a read of a value, which is then used
            as a pointer, EXE/STP does not know what array that symbolic
            value refers to.

            So the options are:

                --EXE has to assume that any array can be referenced
                  (constraints blow up)
                --STP has to develop a model of memory (constraints
                  again blow up)
                --they have to decide not to handle the
                  pointer-to-a-pointer case

            The issue is explained (and handled) somewhat better in the
            authors' follow-up work, KLEE (Cadar et al., Proc. OSDI, 2008):
            https://www.usenix.org/legacy/event/osdi08/tech/full_papers/cadar/cadar.pdf

                "As with other dangerous operations, load and store
                instructions generate checks: in this case to check that
                the address is in-bounds of a valid memory object.
                However, load and store operations present an additional
                complication. The most straightforward representation of
                the memory used by checked code would be a flat byte
                array. In this case, loads and stores would simply map
                to array read and write expressions respectively.
                Unfortunately, our constraint solver STP would almost
                never be able to solve the resultant constraints (and
                neither would the other constraint solvers we know of).
                Thus, as in EXE, KLEE maps every memory object in the
                checked code to a distinct STP array ... This
                representation dramatically improves performance since
                it lets STP ignore all arrays not referenced by a given
                expression.
                Many operations (such as bound checks or object-level
                copy-on-write) require object-specific information. If a
                pointer can refer to many objects, these operations
                become difficult to perform. For simplicity, KLEE
                sidesteps this problem as follows. When a dereferenced
                pointer p can refer to N objects, KLEE clones the
                current state N times. In each state it constrains p to
                be within bounds of its respective object and then
                performs the appropriate read or write operation.
                Although this method can be expensive for pointers with
                large points-to sets, most programs we have tested only
                use symbolic pointers that refer to a single object, and
                KLEE is well-optimized for this case."

        Loops with symbolic variables as bounds

            idea: get loop to execute 1 time, then 2 times, then 3
            times, etc. Because each check of a loop bound is like an
            "if".

            may be easier to visualize this by imagining loop is
            unrolled.

        Other student questions

            Lots of people asked why the system treats memory as
            untyped. [A: because this is what helps STP notice any
            memory errors, including those resulting from casting,
            misuse of individual bytes, etc.]

            People asked, "Why not floating-point?" (One answer: to find
            solutions to constraints efficiently, STP would have to
            "know about" floating point.)

    G. Discussion

3. Concolic execution

    Lab 4 uses "concolic execution", a variant of symbolic execution

    Motivation: what if there are functions that you can't look inside?
        as when layering symbolic execution on top of a complex language.
        In Lab 4, want to add symbolic execution to Python without modifying Python.

    example:

        read x, u
        ok = DBlookup(u)
        if x == "GET":
            if ok == True:
                ...
            else
                ...

        if we don't have a symbolic DB, we cannot execute this symbolically

    Concolic execution
        execute with concrete inputs -- e.g.
        empty string
            so we can execute the DBlookup in the example
            it's an ordinary concrete (non-symbolic) execution

        While executing:
            --record symbolic values of variables derived from inputs
              when possible
            --maintain path constraint of executed path
              just one path, since concrete inputs only explore one
              side of each "if"

        After execution finishes:
            --negate an "if" condition in the pc [path constraint]
            --solve modified pc (up to that "if"), yielding new concrete inputs
            --re-execute on new concrete inputs
            --new execution will follow a different path than first

        Keep re-executing with different "if" conditions negated
            eventually can drive execution down lots of different paths
            and perhaps find inputs that trigger assertion failures

        Pros and cons:
            + much easier to add to a language like Python
              "proxy" concolic data types replace int, string
            + an easy way to tolerate opaque functions
            - will miss some constraints, e.g. relation of ok to u
              thus may not be able to execute down some "if" branches

4. Lab notes

    --what is the correct way to add 1 to a number in C?

        int x;
        ...
        x = x + 1; ?

    --what about averaging?

    --lab uses concolic execution. Python variable types in which
      there is a (symbol, value) pair.

    --lab uses an SMT solver
        SMT vs SAT: SMT capable of reasoning about "higher-level"
        objects than boolean values. Leads to efficiency. NYU's Clark
        Barrett is a leader in SMT solvers.

    --lab advice: part of your job is to figure out what is going on.
      read the code!

        strategy: start from the functions that invoke the tester

        concepts/objects:
            AST: what is this?
            the solver: where and how is this invoked?
            concolic_test(): this function runs another function in an
                environment, and creates path constraints. your job is
                to write the core of this function.
            concolic_bool: what is this?

        advice: stop between exercises 2 and 3. make sure you understand
        the entire contents of fuzzy.py, and its interplay with the
        "check" functions.
        only when it "clicks" should you start filling in
        concolic_test()

Conclusion:
    symbolic execution is powerful and productive...
    ... but not so practical as programs grow large
    it's a promising research area as well as a useful tool

---------------------------------------------------------------------------

Acknowledgment: The staff of 6.858