Class 13
CS372H
1 March 2012

On the board
------------

1. Last time and last week

2. Finish linking and loading

3. SFI
    --Intro
    --Details
    --Discussion

---------------------------------------------------------------------------

1. Last time and last week

    --last time: Jon's lecture on binary rewriting

    --last week:
        --implementation of swtch()
        --linking
        --clarification: patching up (resolving references) can work with
          relative addresses; the patched program need not use absolute
          addresses.

2. Finish linking and loading

    A. Intro
    B. What does a process look like in memory?
    C. What does the assembler do?
    D. Overview of linking
    E. Details
    F. Summary

E. Details

    --variation 1: dynamic linking
        --link at runtime
        --example: when someone calls a function func(), the code that
          actually executes is:

            #include <dlfcn.h>

            void* p = dlopen("./func.so", RTLD_LAZY); /* load the module */
            void (*fp)(void) = dlsym(p, "func"); /* map symbol to address */
            fp();

          and meanwhile what we had was:

            void func(void) { puts("hello"); }

            gcc -shared -fPIC func.c -o func.so

          (note: dlopen() wants a shared object, so we build func.so
          rather than a plain relocatable func.o from gcc -c.)

          so what's going on here is that the "reference" to "func" in the
          *calling* program gets "resolved" via the calls to dlopen/dlsym.

        --issues:
            what happens if the resolution doesn't work?
            how is behavior different from static linking?
            where do we get "puts" from?

    --variation 2: static shared libraries
        --observation: libc.a (the std C library) is linked into every
          executable
        --idea/insight: keep one copy on disk, and don't include this code
          in the executable.
        --approach:
            --every program has a "shared library segment" at the same
              address
            --every shared library gets a unique range in this segment,
              and computes where its external definitions will live
            --linker links the program against the library ... (why?)
                --answer: need to get references right
            --... _but_ the linker does not bring in the actual code
            --the loader marks the shared library region as unreadable
            --when the process calls into the library code, it faults. an
              embedded linker then brings in the library code from a known
              place and maps it in
            --result: different running programs are sharing code!

    --variation 3: dynamic shared libraries
        --variation 2 is a bummer because:
            --it requires system-wide pre-allocation of address space
            --this is clumsy, inconvenient, and wasteful. also, what if a
              library gets too big for its space?
        --solution:
            --any library can be loaded at any virtual address
            --need a stub library. why?
                --(otherwise the linker can't actually patch up
                  references)
            --but now the position of functions can vary, so how can we
              call them without rerunning the linker at runtime?
            --answer: layer of indirection! [draw picture]
            --now, only the GOT (global offset table) needs to be patched
              up when the program is loaded
            --can even do the GOT patching lazily (since linking all the
              functions at startup would cost time and be potentially
              wasteful, e.g., if the program uses only some of them)
              [draw picture]
                --idea: link each function at its first call
            --key point: the GOT is *data* but contains an array of
              function pointers (another instance of how code and data are
              the same thing). (see the sketch just below.)
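            to make the indirection concrete, here is a minimal
            hand-rolled sketch in C (the names and "libfunc.so" are
            invented for illustration, and error checks are omitted; real
            toolchains emit the GOT and lazy-binding stubs automatically):

                #include <dlfcn.h>

                /* a hand-rolled "GOT entry": a function pointer in data,
                   patched at run time */
                static void func_stub(void);
                static void (*got_func)(void) = func_stub; /* starts at stub */

                static void func_stub(void) {
                    /* first call: resolve the symbol, patch the entry,
                       then call through it */
                    void *h = dlopen("./libfunc.so", RTLD_LAZY);
                    got_func = (void (*)(void))dlsym(h, "func");
                    got_func();
                }

                int main(void) {
                    got_func();  /* first call links lazily via the stub */
                    got_func();  /* later calls go straight to func */
                }

            the caller always jumps through the same data slot; only the
            slot's contents change.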
F. Summary

    --compiler outputs one object file for each source file
        --problem: the object file reflects an incomplete world view:
          where should variables and code go? how do we refer to them?
        --so the compiler names definitions symbolically (for instance,
          "printf"), and refers to routines and variables by _symbolic_
          name

    --linker
        --has a global view of everything, which is a powerful lever
        --decides where everything lives, finds all references, and
          updates them
        --meets the OS interface: indicates to the OS what is code, what
          is data, where the start point is, etc.

    --OS loader
        --reads object files into memory
        --allows code sharing and other optimizations
        --the OS provides an interface for the process to extend its data
          segment (i.e., to allocate memory) as it is running. the system
          call is sbrk().
            --so "loading" does not happen only at process invocation.

---------------------------------------------------------------------------

Admin notes:

    when do you want the midterm review: Mon, Tue, or Wed of midterm week?

---------------------------------------------------------------------------

3. SFI

A. Intro

    Problem: how to use untrusted code (an "extension") in a trusted
    program?

    Intellectual challenge: need to let the code run but somehow control
    it, without using the normal approach to such control, namely the
    protections enforced by hardware (specifically page tables, which
    create an isolated memory view).

    Examples:
        --use an untrusted, legacy JPEG codec in a Web browser
          [draw picture of JPEG decoder in browser memory]
        --use an untrusted driver in the kernel (e.g., a loadable kernel
          module)

    Now a classic paper
        --everyone is trying to do this
        --most obvious example: the Web, and plugins
        --here's some context:

            SFI (this paper) --> PittSFIeld (SFI for x86) --> Google
            Native Client

            PittSFIeld reference:
              [http://people.csail.mit.edu/smcc/projects/pittsfield/]
              Evaluating SFI for a CISC Architecture. Stephen McCamant and
              Greg Morrisett. 15th USENIX Security Symposium, Vancouver,
              BC, Canada, August 2-4, 2006.

            NativeClient reference:
              [http://research.google.com/pubs/archive/34913.pdf]
              Native Client: A Sandbox for Portable, Untrusted x86 Native
              Code. Bennet Yee, David Sehr, Gregory Dardyk, Brad Chen,
              Robert Muth, Tavis Ormandy, Shiki Okasaka, Neha Narula,
              Nicholas Fullagar. 30th IEEE Symposium on Security &
              Privacy, May 17-20, 2009.

        --other related work:
            --Xax (by Jon Howell et al.) and NativeClient have the same
              motivation but different realizations
            --vx32: a different approach to sandboxing but similar
              motivation to the works above
              [http://pdos.csail.mit.edu/papers/vx32:usenix08.pdf]

        --the paper we're discussing, interestingly, missed the Web ...
          but it is still a classic
            --at the time, the audience may have been more worried about
              performance ...
            --but now everyone thinks, "yeah, of course we want that", and
              performance may be secondary. (maybe.)

    [defn: the "trusted" part of a system is the part assumed to be
    correct.]

    What bad things can the extension do?
        --write trusted data or code
        --read private data from the trusted code's memory
        --execute privileged instructions
        --call trusted functions with bad arguments
        --jump to an unexpected trusted location (e.g., not the start of a
          function)
        --contain exploitable security flaws that allow others to do the
          above

    What is it probably OK for an extension to do?
        --read/write its own memory
        --execute its own code
        --call *particular* functions in trusted code

    Possible solutions/approaches:

        --Run the extension in its own address space with minimal
          privileges. Rely on hardware and operating system protection
          mechanisms.

        --Restrict the language in which the extension is written:
            --Packet filter language. The language is limited in its
              capabilities, so it is easy to guarantee "safe" execution.
            --Type-safe language. The language runtime and compiler
              guarantee "safe" execution.

        --What are the disadvantages of the above?
            --own address space: expensive context switches
            --safe language: restricts the language that people can use,
              so it doesn't work for lots of common and legacy code

        --Software-based sandboxing
            --the big idea: isolate code *within* the same address space,
              thereby achieving isolation without context switches
            --these ideas are now everywhere. this paper was first, or one
              of the first.

            Elements:

            --Sandboxer. A compiler or binary rewriter sandboxes all
              unsafe instructions in an extension by inserting additional
              instructions. For example, every indirect store is preceded
              by a few instructions that compute and check the target of
              the store at runtime.

            --Verifier. When the extension is loaded into the trusted
              program, the verifier checks whether the extension is
              appropriately sandboxed (e.g., all direct stores/calls refer
              to the extension's memory, all indirect stores/calls are
              sandboxed, no privileged instructions). (see the toy sketch
              below.)
                --if not, the extension is rejected.
                --if yes, the extension is loaded and can run; the
                  sandboxing of unsafe instructions then ensures that they
                  are used only in safe ways.

            --The verifier must be trusted, but the sandboxer doesn't have
              to be. Meaning: the compiler can screw up, and as long as
              the verifier is correct, it doesn't matter.

            --We can do without the verifier if the host can establish
              that the extension was sandboxed by a trusted sandboxer.

            --You can think of sandboxing as a software version of the
              memory protection you get with page tables or segments.
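    To make the verifier's job concrete, here is a toy sketch in C for an
    invented 32-bit fixed-width ISA (the encoding -- opcode in the top 8
    bits, direct targets in the low 24 -- is made up purely for
    illustration; a real verifier decodes a real instruction set):

        #include <stdbool.h>
        #include <stddef.h>
        #include <stdint.h>

        enum { OP_PRIV = 0x01, OP_STORE_DIRECT = 0x02,
               OP_JUMP_DIRECT = 0x03 };

        bool verify(const uint32_t *code, size_t n,
                    uint32_t seg_id, unsigned seg_shift) {
            for (size_t i = 0; i < n; i++) {
                uint8_t  op     = code[i] >> 24;
                uint32_t target = code[i] & 0x00ffffff;
                if (op == OP_PRIV)
                    return false;   /* reject privileged instructions */
                if ((op == OP_STORE_DIRECT || op == OP_JUMP_DIRECT) &&
                    (target >> seg_shift) != seg_id)
                    return false;   /* direct target outside the segment */
                /* a real verifier also checks that dedicated registers
                   are written only by well-formed sandboxing sequences
                   (see below), and allows direct calls to a table of
                   legal entry points in trusted code */
            }
            return true;            /* safe to load */
        }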
B. Details of SFI

    --Implemented for RISC processors. this simplifies SFI. why? (two
      reasons)
        --every instruction is 32 bits wide, and one can jump/call only to
          32-bit-aligned targets, so the verifier can inspect every
          possible entry point
        --big register set; makes it easy to set aside "dedicated
          registers"

    --Approach:

        0x101f..........f   code
        0x1010..........0
        0x100f..........f   data
        0x1000..........0
                            Firefox/Chrome/etc.

        Code Seg ID = 0x101
        Data Seg ID = 0x100

        --[draw the picture above.] the key point is that because the
          verifier enforces that the sandboxed code always uses particular
          upper address bits, the code is confined to a "sandboxed" region
          of memory.
        --why are there two segments, one for code and the other for data,
          heap, and stack?
            --answer: to prevent the extension from modifying its own code

    --the verifier can check:
        --that direct calls/jumps and stores refer to addresses inside the
          segment (since such instructions have the address embedded
          within them)
        --PC-relative branches
        --privileged instructions
        --the verifier presumably has a table of legal call targets that
          lie in trusted code

    --hard part: indirect jumps/calls (i.e., jump to the contents of this
      register, or store to the address given by this register). [on x86,
      this is an instruction like "jmp *%ecx"]

    --first cut: verifier enforces *segment matching*.

        suppose the original unsafe instruction is:

            STORE R1, R0        (i.e., write R1 to Mem[R0])

        here's how we could sandbox the STORE:

            Ra <- R0
            Rb <- Ra >> Rc      // Rb = segment ID of target
            CMP Rb, Rd          // Rd holds extension's data segment ID
            BNE fault           // Rb != Rd: branch to error handling code
            STORE R1, Ra

        --uh-oh. what if the extension jumps directly to the STORE,
          bypassing the check instructions? solution:
            --Ra, Rc, and Rd are _dedicated_ registers (they cannot be
              used by the extension code)
            --now the verifier must also check that the extension doesn't
              use the dedicated registers
            --the extension CAN jump straight to the STORE, but (1) it
              can't set Ra, and (2) the sandboxing code always leaves a
              legal segment address in Ra
            --thus, the extension can store only to its own memory

        --how many registers and check instructions does this cost?
            --4 instructions
            --5 registers (though the paper says 4):
                --Rc (shift amount)
                --Rd (segment ID for data)
                --Rx (segment ID for code)
                --Ra (address in the data segment)
                --Ry (address in the code segment)

    --second cut: verifier enforces *sandboxing*:

            Ra <- R0 & Re       // zero out the segment ID bits
            Ra <- Ra | Rf       // replace them with the valid segment ID
            STORE R1, Ra

        --this code forces the segment bits of the address to be correct.
          it doesn't catch illegal addresses; it just forces them into the
          segment, so a wild store can harm the extension but no other
          code. (a C sketch of both checks follows.)
        --how many registers and check instructions?
            --2 instructions
            --4 registers this time (the paper says 5)
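    Here is a C rendering of the two approaches (a sketch; SEG_SHIFT and
    SEG_ID are illustrative constants matching the layout drawn above,
    with the segment ID in the top 12 of 64 bits):

        #include <stdbool.h>
        #include <stdint.h>

        #define SEG_SHIFT 52
        #define SEG_ID    ((uint64_t)0x100)                /* data segment */
        #define SEG_MASK  (((uint64_t)1 << SEG_SHIFT) - 1) /* offset bits */

        /* first cut, segment matching: check the target, fault if
           outside the segment */
        bool segment_match(uint64_t addr) {
            return (addr >> SEG_SHIFT) == SEG_ID;   /* false ==> fault */
        }

        /* second cut, sandboxing: don't check; just force the upper
           bits to be the segment ID */
        uint64_t sandbox(uint64_t addr) {
            return (addr & SEG_MASK) | (SEG_ID << SEG_SHIFT);
        }

    in the two-instruction sequence above, Re holds SEG_MASK and Rf holds
    SEG_ID << SEG_SHIFT.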
    --note that the segments they use have an exact analog in x86
      segmentation: with x86 segments, *all* of a process's memory
      references *must* point into the segment, and one can arrange things
      so that the process can't change its own segment descriptors.
        --this is what vx32 takes advantage of (see above for the pointer
          to that related project)

    --optimizations:

        --save a sandboxing instruction for instructions of the form:

            STORE value, offset(R3)

          naive way:

            Ra <- offset + R3
            Ra <- Ra & Re
            Ra <- Ra | Rf
            STORE value, Ra

          optimization:

            Ra <- R3 & Re
            Ra <- Ra | Rf
            STORE value, offset(Ra)

          this works because offset is limited to [-32KB, 32KB], so no
          matter the value of Ra, Ra+offset is guaranteed to live in
          [segment_beg - 32KB, segment_end + 32KB]. to prevent code from
          writing just before or just after the segment, create unmapped
          guard zones of 32KB on each side.

        --stack pointer:
            --sandbox the stack pointer (SP) only when it is explicitly
              set, not when it is used to form an address. so there is no
              need to sandbox:

                STORE value, offset(SP)

            --this optimization works because it's far more common to
              *read* the SP than to *set* it.

    --what do they do about system calls?
        --answer: rewrite them to be "RPC" (really a "call" into the
          trusted portion of the code) into arbitration code that decides
          whether the requested system call is acceptable. (a sketch of
          such arbitration code follows.)
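    A hypothetical sketch of that arbitration code (the policy and the
    function name are invented; the point is that the extension's "system
    call" becomes an ordinary call into trusted code that checks first):

        #include <sys/syscall.h>
        #include <unistd.h>

        /* trusted host code; the sandboxer rewrites the extension's
           system-call instructions into calls to this function */
        long arbitrate_syscall(long num, long a0, long a1, long a2) {
            switch (num) {
            case SYS_write:
                if (a0 != 1 && a0 != 2)
                    return -1;    /* policy: stdout/stderr only */
                /* a real arbiter would also check that the buffer
                   [a1, a1+a2) lies inside the extension's data segment */
                return write((int)a0, (const void *)a1, (size_t)a2);
            default:
                return -1;        /* refuse everything else */
            }
        }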
    --how can we verify that the thing has been sandboxed properly?
        --any time there's a modification to a dedicated register, read
          linearly downward and make sure that the register becomes valid
          again before the code branches or before another region starts
        --question: what does "valid" mean?
            --answer: its upper bits remain in the segment
        --basically, the algorithm just makes sure that the code sequences
          above are in effect

    --summary of properties:
        --prevents writes outside the extension's data segment and
          calls/jumps outside its code segment
        --can allow direct calls to specific functions in trusted code
        --prevents privileged instructions
        --allows any write or call/jump within the extension's memory, so
          an extension can wreck itself (or be wrecked by a buffer
          overrun)

    --performance:
        --what types of programs will tend to have higher overheads?
          (answer: those that write to memory and jump around a lot; tight
          inner loops are not likely to cause much, or any, overhead)
        --why do they say that sandboxing increases the available
          instruction-level parallelism? (answer: there are fewer context
          switches, so the processor can make better predictions about
          which code will execute)
        --overall, what do you think of their results? high overhead? low
          overhead?
        --all section 5.4 is saying is that, even though encapsulation has
          an overhead, there's a countervailing savings from avoiding
          context switches (which the paper abstractly calls "crossing
          fault domains"). their analysis captures this trade-off and
          states the break-even point for various constants.

C. Discussion

    --at a high level, this thing is doing in software what is really
      hardware's job

    --can the guest read host memory?
        --answer: yes. why? (because loads aren't sandboxed; protecting
          them would cost too much)

    --do stack-smashing attacks work?
        --answer: no

    --do return-to-libc attacks work?
        --answer: yes, if the libc is inside the fault domain

    --note that extensions can still be wrecked by, say, buffer overflows.
      there are techniques that protect against that (CFI, XFI, etc.)

    --what about control flow? can SFI enforce fine-grained control flow
      (as CFI does)? answer: no

    --What did you think of this paper?