Class 13
CS372H
1 March 2012

On the board
------------

1. Last time and last week

2. Finish linking and loading

3. SFI
    --Intro
    --Details
    --Discussion

---------------------------------------------------------------------------

1. Last time and last week

    --last time: Jon's lecture on binary rewriting

    --last week:
        --implementation of swtch()
        --linking
        --clarification: patching up (resolving references) can work with
          relative addresses; the patched program need not use absolute
          addresses.

2. Finish linking and loading

    A. Intro
    B. What does a process look like in memory?
    C. What does the assembler do?
    D. Overview of linking
    E. Details
    F. Summary

E. Details

    --variation 1: dynamic linking
        --link at runtime
        --example: when someone calls a function func(), the code that
          actually executes is:

            #include <dlfcn.h>

            void* p = dlopen("./func.so", RTLD_LAZY); /* load the module */
            void (*fp)(void) = dlsym(p, "func"); /* map symbol to address */
            fp();

          and meanwhile what we had was:

            void func(void) { puts("hello"); }

            gcc -shared -fPIC func.c -o func.so

          (note: dlopen() wants a shared object, so we build func.so
          rather than a plain relocatable func.o from gcc -c.)

          so what's going on here is that the "reference" to "func" in the
          *calling* program gets "resolved" via the calls to dlopen/dlsym.

        --issues:
            what happens if the resolution doesn't work?
            how is behavior different from static linking?
            where do we get "puts" from?

    --variation 2: static shared libraries
        --observation: libc.a (the std C library) is linked into every
          executable
        --idea/insight: keep one copy on disk, and don't include this code
          in the executable.
        --approach:
            --every program has a "shared library segment" at the same
              address
            --every shared library gets a unique range in this segment,
              and computes where its external definitions will live
            --linker links the program against the library ... (why?)
                --answer: need to get references right
            --... _but_ the linker does not bring in the actual code
            --the loader marks the shared library region as unreadable
            --when the process calls into the library code, it faults. an
              embedded linker then brings in the library code from a known
              place and maps it in
            --result: different running programs are sharing code!

    --variation 3: dynamic shared libraries
        --variation 2 is a bummer because:
            --it requires system-wide pre-allocation of address space
            --this is clumsy, inconvenient, and wasteful. also, what if a
              library gets too big for its space?
        --solution:
            --any library can be loaded at any virtual address
            --need a stub library. why?
                --(otherwise the linker can't actually patch up
                  references)
            --but now the position of functions can vary, so how can we
              call them without rerunning the linker at runtime?
            --answer: layer of indirection! [draw picture]
            --now, only the GOT (global offset table) needs to be patched
              up when the program is loaded
            --can even do the GOT patching lazily (since linking all the
              functions at startup would cost time and be potentially
              wasteful, e.g., if the program uses only some of them)
              [draw picture]
                --idea: link each function at its first call
            --key point: the GOT is *data* but contains an array of
              function pointers (another instance of how code and data are
              the same thing). (see the sketch just below.)
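            to make the indirection concrete, here is a minimal
            hand-rolled sketch in C (the names and "libfunc.so" are
            invented for illustration, and error checks are omitted; real
            toolchains emit the GOT and lazy-binding stubs automatically):

                #include <dlfcn.h>

                /* a hand-rolled "GOT entry": a function pointer in data,
                   patched at run time */
                static void func_stub(void);
                static void (*got_func)(void) = func_stub; /* starts at stub */

                static void func_stub(void) {
                    /* first call: resolve the symbol, patch the entry,
                       then call through it */
                    void *h = dlopen("./libfunc.so", RTLD_LAZY);
                    got_func = (void (*)(void))dlsym(h, "func");
                    got_func();
                }

                int main(void) {
                    got_func();  /* first call links lazily via the stub */
                    got_func();  /* later calls go straight to func */
                }

            the caller always jumps through the same data slot; only the
            slot's contents change.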
F. Summary

    --compiler outputs one object file for each source file
        --problem: the object file reflects an incomplete world view:
          where should variables and code go? how do we refer to them?
        --so the compiler names definitions symbolically (for instance,
          "printf"), and refers to routines and variables by _symbolic_
          name

    --linker
        --has a global view of everything, which is a powerful lever
        --decides where everything lives, finds all references, and
          updates them
        --meets the OS interface: indicates to the OS what is code, what
          is data, where the start point is, etc.

    --OS loader
        --reads object files into memory
        --allows code sharing and other optimizations
        --the OS provides an interface for the process to extend its data
          segment (i.e., to allocate memory) as it is running. the system
          call is sbrk().
            --so "loading" does not happen only at process invocation.

---------------------------------------------------------------------------

Admin notes:

    when do you want the midterm review: Mon, Tue, or Wed of midterm week?

---------------------------------------------------------------------------

3. SFI

A. Intro

    Problem: how to use untrusted code (an "extension") in a trusted
    program?

    Intellectual challenge: need to let the code run but somehow control
    it, without using the normal approach to such control, namely the
    protections enforced by hardware (specifically page tables, which
    create an isolated memory view).

    Examples:
        --use an untrusted, legacy JPEG codec in a Web browser
          [draw picture of JPEG decoder in browser memory]
        --use an untrusted driver in the kernel (e.g., a loadable kernel
          module)

    Now a classic paper
        --everyone is trying to do this
        --most obvious example: the Web, and plugins
        --here's some context:

            SFI (this paper) --> PittSFIeld (SFI for x86) --> Google
            Native Client

            PittSFIeld reference:
              [http://people.csail.mit.edu/smcc/projects/pittsfield/]
              Evaluating SFI for a CISC Architecture. Stephen McCamant and
              Greg Morrisett. 15th USENIX Security Symposium, Vancouver,
              BC, Canada, August 2-4, 2006.

            NativeClient reference:
              [http://research.google.com/pubs/archive/34913.pdf]
              Native Client: A Sandbox for Portable, Untrusted x86 Native
              Code. Bennet Yee, David Sehr, Gregory Dardyk, Brad Chen,
              Robert Muth, Tavis Ormandy, Shiki Okasaka, Neha Narula,
              Nicholas Fullagar. 30th IEEE Symposium on Security &
              Privacy, May 17-20, 2009.

        --other related work:
            --Xax (by Jon Howell et al.) and NativeClient have the same
              motivation but different realizations
            --vx32: a different approach to sandboxing but similar
              motivation to the works above
              [http://pdos.csail.mit.edu/papers/vx32:usenix08.pdf]

        --the paper we're discussing, interestingly, missed the Web ...
          but it is still a classic
            --at the time, the audience may have been more worried about
              performance ...
            --but now everyone thinks, "yeah, of course we want that", and
              performance may be secondary. (maybe.)

    [defn: the "trusted" part of a system is the part assumed to be
    correct.]

    What bad things can the extension do?
        --write trusted data or code
        --read private data from the trusted code's memory
        --execute privileged instructions
        --call trusted functions with bad arguments
        --jump to an unexpected trusted location (e.g., not the start of a
          function)
        --contain exploitable security flaws that allow others to do the
          above

    What is it probably OK for an extension to do?
        --read/write its own memory
        --execute its own code
        --call *particular* functions in trusted code

    Possible solutions/approaches:

        --Run the extension in its own address space with minimal
          privileges. Rely on hardware and operating system protection
          mechanisms.

        --Restrict the language in which the extension is written:
            --Packet filter language. The language is limited in its
              capabilities, so it is easy to guarantee "safe" execution.
            --Type-safe language. The language runtime and compiler
              guarantee "safe" execution.

        --What are the disadvantages of the above?
            --own address space: expensive context switches
            --safe language: restricts the language that people can use,
              so it doesn't work for lots of common and legacy code

        --Software-based sandboxing
            --the big idea: isolate code *within* the same address space,
              thereby achieving isolation without context switches
            --these ideas are now everywhere. this paper was first, or one
              of the first.

            Elements:

            --Sandboxer. A compiler or binary rewriter sandboxes all
              unsafe instructions in an extension by inserting additional
              instructions. For example, every indirect store is preceded
              by a few instructions that compute and check the target of
              the store at runtime.

            --Verifier. When the extension is loaded into the trusted
              program, the verifier checks whether the extension is
              appropriately sandboxed (e.g., all direct stores/calls refer
              to the extension's memory, all indirect stores/calls are
              sandboxed, no privileged instructions). (see the toy sketch
              below.)
                --if not, the extension is rejected.
                --if yes, the extension is loaded and can run; the
                  sandboxing of unsafe instructions then ensures that they
                  are used only in safe ways.

            --The verifier must be trusted, but the sandboxer doesn't have
              to be. Meaning: the compiler can screw up, and as long as
              the verifier is correct, it doesn't matter.

            --We can do without the verifier if the host can establish
              that the extension was sandboxed by a trusted sandboxer.

            --You can think of sandboxing as a software version of the
              memory protection you get with page tables or segments.
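    To make the verifier's job concrete, here is a toy sketch in C for an
    invented 32-bit fixed-width ISA (the encoding -- opcode in the top 8
    bits, direct targets in the low 24 -- is made up purely for
    illustration; a real verifier decodes a real instruction set):

        #include <stdbool.h>
        #include <stddef.h>
        #include <stdint.h>

        enum { OP_PRIV = 0x01, OP_STORE_DIRECT = 0x02,
               OP_JUMP_DIRECT = 0x03 };

        bool verify(const uint32_t *code, size_t n,
                    uint32_t seg_id, unsigned seg_shift) {
            for (size_t i = 0; i < n; i++) {
                uint8_t  op     = code[i] >> 24;
                uint32_t target = code[i] & 0x00ffffff;
                if (op == OP_PRIV)
                    return false;   /* reject privileged instructions */
                if ((op == OP_STORE_DIRECT || op == OP_JUMP_DIRECT) &&
                    (target >> seg_shift) != seg_id)
                    return false;   /* direct target outside the segment */
                /* a real verifier also checks that dedicated registers
                   are written only by well-formed sandboxing sequences
                   (see below), and allows direct calls to a table of
                   legal entry points in trusted code */
            }
            return true;            /* safe to load */
        }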
B. Details of SFI

    --Implemented for RISC processors. this simplifies SFI. why? (two
      reasons)
        --every instruction is 32 bits wide, and one can jump/call only to
          32-bit-aligned targets, so the verifier can inspect every
          possible entry point
        --big register set; makes it easy to set aside "dedicated
          registers"

    --Approach:

        0x101f..........f   code
        0x1010..........0
        0x100f..........f   data
        0x1000..........0
                            Firefox/Chrome/etc.

        Code Seg ID = 0x101
        Data Seg ID = 0x100

        --[draw the picture above.] the key point is that because the
          verifier enforces that the sandboxed code always uses particular
          upper address bits, the code is confined to a "sandboxed" region
          of memory.
        --why are there two segments, one for code and the other for data,
          heap, and stack?
            --answer: to prevent the extension from modifying its own code

    --the verifier can check:
        --that direct calls/jumps and stores refer to addresses inside the
          segment (since such instructions have the address embedded
          within them)
        --PC-relative branches
        --privileged instructions
        --the verifier presumably has a table of legal call targets that
          lie in trusted code

    --hard part: indirect jumps/calls (i.e., jump to the contents of this
      register, or store to the address given by this register). [on x86,
      this is an instruction like "jmp *%ecx"]

    --first cut: verifier enforces *segment matching*.

        suppose the original unsafe instruction is:

            STORE R1, R0        (i.e., write R1 to Mem[R0])

        here's how we could sandbox the STORE:

            Ra <- R0
            Rb <- Ra >> Rc      // Rb = segment ID of target
            CMP Rb, Rd          // Rd holds extension's data segment ID
            BNE fault           // Rb != Rd: branch to error handling code
            STORE R1, Ra

        --uh-oh. what if the extension jumps directly to the STORE,
          bypassing the check instructions? solution:
            --Ra, Rc, and Rd are _dedicated_ registers (they cannot be
              used by the extension code)
            --now the verifier must also check that the extension doesn't
              use the dedicated registers
            --the extension CAN jump straight to the STORE, but (1) it
              can't set Ra, and (2) the sandboxing code always leaves a
              legal segment address in Ra
            --thus, the extension can store only to its own memory

        --how many registers and check instructions does this cost?
            --4 instructions
            --5 registers (though the paper says 4):
                --Rc (shift amount)
                --Rd (segment ID for data)
                --Rx (segment ID for code)
                --Ra (address in the data segment)
                --Ry (address in the code segment)

    --second cut: verifier enforces *sandboxing*:

            Ra <- R0 & Re       // zero out the segment ID bits
            Ra <- Ra | Rf       // replace them with the valid segment ID
            STORE R1, Ra

        --this code forces the segment bits of the address to be correct.
          it doesn't catch illegal addresses; it just forces them into the
          segment, so a wild store can harm the extension but no other
          code. (a C sketch of both checks follows.)
        --how many registers and check instructions?
            --2 instructions
            --4 registers this time (the paper says 5)
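    Here is a C rendering of the two approaches (a sketch; SEG_SHIFT and
    SEG_ID are illustrative constants matching the layout drawn above,
    with the segment ID in the top 12 of 64 bits):

        #include <stdbool.h>
        #include <stdint.h>

        #define SEG_SHIFT 52
        #define SEG_ID    ((uint64_t)0x100)                /* data segment */
        #define SEG_MASK  (((uint64_t)1 << SEG_SHIFT) - 1) /* offset bits */

        /* first cut, segment matching: check the target, fault if
           outside the segment */
        bool segment_match(uint64_t addr) {
            return (addr >> SEG_SHIFT) == SEG_ID;   /* false ==> fault */
        }

        /* second cut, sandboxing: don't check; just force the upper
           bits to be the segment ID */
        uint64_t sandbox(uint64_t addr) {
            return (addr & SEG_MASK) | (SEG_ID << SEG_SHIFT);
        }

    in the two-instruction sequence above, Re holds SEG_MASK and Rf holds
    SEG_ID << SEG_SHIFT.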
    --note that the segments they use have an exact analog in x86
      segmentation: with x86 segments, *all* of a process's memory
      references *must* point into the segment, and one can arrange things
      so that the process can't change its own segment descriptors.
        --this is what vx32 takes advantage of (see above for the pointer
          to that related project)

    --optimizations:

        --save a sandboxing instruction for instructions of the form:

            STORE value, offset(R3)

          naive way:

            Ra <- offset + R3
            Ra <- Ra & Re
            Ra <- Ra | Rf
            STORE value, Ra

          optimization:

            Ra <- R3 & Re
            Ra <- Ra | Rf
            STORE value, offset(Ra)

          this works because offset is limited to [-32KB, 32KB], so no
          matter the value of Ra, Ra+offset is guaranteed to live in
          [segment_beg - 32KB, segment_end + 32KB]. to prevent code from
          writing just before or just after the segment, create unmapped
          guard zones of 32KB on each side.

        --stack pointer:
            --sandbox the stack pointer (SP) only when it is explicitly
              set, not when it is used to form an address. so there is no
              need to sandbox:

                STORE value, offset(SP)

            --this optimization works because it's far more common to
              *read* the SP than to *set* it.

    --what do they do about system calls?
        --answer: rewrite them to be "RPC" (really a "call" into the
          trusted portion of the code) into arbitration code that decides
          whether the requested system call is acceptable. (a sketch of
          such arbitration code follows.)
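    A hypothetical sketch of that arbitration code (the policy and the
    function name are invented; the point is that the extension's "system
    call" becomes an ordinary call into trusted code that checks first):

        #include <sys/syscall.h>
        #include <unistd.h>

        /* trusted host code; the sandboxer rewrites the extension's
           system-call instructions into calls to this function */
        long arbitrate_syscall(long num, long a0, long a1, long a2) {
            switch (num) {
            case SYS_write:
                if (a0 != 1 && a0 != 2)
                    return -1;    /* policy: stdout/stderr only */
                /* a real arbiter would also check that the buffer
                   [a1, a1+a2) lies inside the extension's data segment */
                return write((int)a0, (const void *)a1, (size_t)a2);
            default:
                return -1;        /* refuse everything else */
            }
        }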
    --how can we verify that the thing has been sandboxed properly?
        --any time there's a modification to a dedicated register, read
          linearly downward and make sure that the register becomes valid
          again before the code branches or before another region starts
        --question: what does "valid" mean?
            --answer: its upper bits remain in the segment
        --basically, the algorithm just makes sure that the code sequences
          above are in effect

    --summary of properties:
        --prevents writes outside the extension's data segment and
          calls/jumps outside its code segment
        --can allow direct calls to specific functions in trusted code
        --prevents privileged instructions
        --allows any write or call/jump within the extension's memory, so
          an extension can wreck itself (or be wrecked by a buffer
          overrun)

    --performance:
        --what types of programs will tend to have higher overheads?
          (answer: those that write to memory and jump around a lot; tight
          inner loops are not likely to cause much, or any, overhead)
        --why do they say that sandboxing increases the available
          instruction-level parallelism? (answer: there are fewer context
          switches, so the processor can make better predictions about
          which code will execute)
        --overall, what do you think of their results? high overhead? low
          overhead?
        --all section 5.4 is saying is that, even though encapsulation has
          an overhead, there's a countervailing savings from avoiding
          context switches (which the paper abstractly calls "crossing
          fault domains"). their analysis captures this trade-off and
          states the break-even point for various constants.

C. Discussion

    --at a high level, this thing is doing in software what is really
      hardware's job

    --can the guest read host memory?
        --answer: yes. why? (because loads aren't sandboxed; protecting
          them would cost too much)

    --do stack-smashing attacks work?
        --answer: no

    --do return-to-libc attacks work?
        --answer: yes, if the libc is inside the fault domain

    --note that extensions can still be wrecked by, say, buffer overflows.
      there are techniques that protect against that (CFI, XFI, etc.)

    --what about control flow? can SFI enforce fine-grained control flow
      (as CFI does)? answer: no

    --What did you think of this paper?