Class 26
CS 372H
26 April 2012

On the board
------------

1. Last time
2. Finish VMWare discussion
3. Stack smashing
4. Unix security model

---------------------------------------------------------------------------

1. Last time

    --virtual machines; history of VMWare
    --discussion of VMWare ESX paper

2. Finish VMWare discussion

    I. [last time] technique: ballooning

    II. [last time] technique: content-based page sharing

    III. technique: share-based allocation, with a tax

    --Basic idea: give resource rights based on *shares*, S_1, ..., S_n
      [same author as lottery scheduling!]

    --The VM selected to relinquish memory should be the one with the
      fewest shares per allocated page, i.e., the lowest ratio S_i / P_i.
      That's the VM (guest OS) that is "paying" the least per page it
      holds.

    --examples: if A and B each have S=1, reclaim from the larger user
      (whoever holds more pages). If A has twice as many shares as B,
      then A can use twice as much memory.

    --Problem: what if a VM has tons of shares but isn't using its
      memory? We don't want to reclaim pages from the other VMs.

    --Solution: tax the idle pages.

      tax arithmetic: if my income tax rate is T, then when I earn $1,
      I pay T*$1 in taxes. Thus:
        -- $1 gross = $(1-T) take-home
        -- $1/(1-T) gross = $1 take-home
        -- to take home a dollar, I need to gross k = 1/(1-T) dollars

      idea: tax idle memory. Pages that are being used are "tax
      deductible"; if you're not using a page, you pay a fraction T of
      it back to the system (it's not fully "yours"). So each idle page
      costs, in shares, k times the price of a non-idle page.

      Consider the number of shares per post-tax dollar/page:

        rho = S / [(# active) + k*(# idle)] = S / [P * (f + k*(1-f))]

      where k = 1/(1-T) is the "idle page cost" and f is the fraction
      of the VM's pages that are active. Reclaim from the VM with the
      smallest rho. [A small sketch of this computation appears at the
      end of this section.]

    --ASK: how do they measure non-idle memory (f)?

      Statistical sampling: pick n pages at random, invalidate them,
      and see whether they get accessed. If t of the n pages have been
      touched by the end of the period, estimate usage as f = t/n.

      How expensive is this? <= 100 page faults over 30 seconds:
      negligible. Ridiculously easy.

    --ASK: why do they keep three moving averages? What do they keep
      three moving averages of?

        --> a slow exponentially weighted moving average of t/n over
            many periods
        --> a faster weighted average that adapts more quickly
        --> a version of the faster average that incorporates samples
            from the current period

      They use the max of the three. Why?

      Basic idea: respond rapidly to increases in memory usage and only
      gradually to decreases in memory usage.
        --When in doubt, respect priorities (so give credit for having
          had a high estimate of non-idle pages in the past).
        --A spike in usage likely means the VM has "woken up".
        --A small pause in usage doesn't necessarily mean the pause
          will continue.

    --ASK: how do they use the estimate?

    --ASK: how well does this do? [answer: figure 6 (p. 9)]

    big picture:
        estimate (5.3)
          --> share-based allocation, based on the tax, reclaiming from
              the VM with the smallest shares-per-page (5.2)
          --> ballooning (3.2) or paging (3.3) to decide which pages to
              take

    commentary: very nice design, in part because it has very few
    parameters: min, max, S [per VM]; tau [system-wide]
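    [Aside: a minimal sketch, in C, of the reclaim decision above. This
    is not VMware's code; the struct, the numbers, and the tax rate are
    made-up illustrations. The point is just to show how a sampled
    estimate of the active fraction f (the t/n above) plugs into the
    tax-adjusted shares-per-page ratio rho, and how the VM with the
    smallest rho is the one asked to give up memory.]

        /* sketch: picking a victim VM under share-based allocation with
           an idle-memory tax. Illustrative only. */

        #include <stdio.h>

        struct vm {
            const char *name;
            double shares;   /* S: the VM's memory shares                 */
            double pages;    /* P: pages currently allocated to the VM    */
            double f;        /* estimated fraction of active pages, e.g.,
                                from the sampling estimate f = t/n        */
        };

        /* shares per "post-tax" page: rho = S / (P * (f + k*(1-f))),
           where k = 1/(1-T) is the idle page cost for tax rate T. */
        static double rho(const struct vm *v, double tax_rate)
        {
            double k = 1.0 / (1.0 - tax_rate);
            return v->shares / (v->pages * (v->f + k * (1.0 - v->f)));
        }

        int main(void)
        {
            /* made-up numbers: A has more shares than B but is mostly
               idle; B is actively using almost all of its memory.      */
            struct vm vms[] = {
                { "A", 2000.0, 1000.0, 0.10 },
                { "B", 1000.0, 1000.0, 0.90 },
            };
            double tax_rate = 0.75;   /* example value of T */

            /* reclaim from the VM with the smallest shares-per-page rho */
            struct vm *victim = &vms[0];
            for (int i = 0; i < 2; i++) {
                printf("rho(%s) = %f\n", vms[i].name, rho(&vms[i], tax_rate));
                if (rho(&vms[i], tax_rate) < rho(victim, tax_rate))
                    victim = &vms[i];
            }
            printf("reclaim from %s\n", victim->name);
            return 0;
        }

    With these made-up numbers, A loses pages even though it has more
    shares, because most of its pages are idle and therefore taxed. With
    no tax (k = 1), B would have been the victim despite actively using
    its memory, which is exactly the problem the tax is meant to fix.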
3. Stack smashing

    --history

    --('buffer overflow' is one way to conduct a stack smashing attack.)

    --note how the exploit works:
        --a primitive form of linking, performed at exploit time!
        --relies on the fork/exec separation

    --demo

      [NOTE: the fork/exec separation is what allows us to write
      tcpserve: after the fork() but before the exec() of buggy-server,
      the child rearranges its file descriptors to be the socket
      itself. Also, this sample code gives you a chance to see sockets
      in action.]

        --UTCS host runs the server, as Jason
        --my laptop runs an honest client
        --my laptop runs a dishonest client

      [A minimal sketch of the kind of vulnerable code this demo
      exploits appears at the end of these notes.]

    --note: if this server had been running as root, we'd have been
      able to get a root shell

    --and if the user/syscall interface doesn't check its arguments
      properly, that interface can be buffer overflowed too

    --in practice, once you have a user account on a machine, it's
      often possible to get root access (why? because the syscall
      interface is really hard to secure, as a matter of practice)

    --other versions of these attacks:
        --return-to-libc (see Tanenbaum) [DRAW PICTURE]
        --return-oriented programming
        --overwriting function pointers
        --smashing the heap

    --how do people defend against these things?

        --W ^ X (map the stack pages as non-executable, if the hardware
          allows it). But there are some issues....

            --The original 386 did not allow it with page tables.
              However, all x86 chips that support extended page tables
              (which are used to help users get at >4GB of physical
              memory even if the machine is 32 bits) also support an XD
              bit in those page tables, which means "don't execute code
              in this page". We haven't worked with this bit in this
              class, but the architecture on modern 32-bit x86 supports
              it.

            --Even on x86s that don't support extended page tables,
              segmentation would help with do-not-execute (since the
              permissions in the segment descriptor can express this).
              The disadvantage here is that the compiler needs to lay
              out the code and stack to match what the segments would
              require.

            --The bummer with W ^ X, even when it *is* supported, is
              this: some languages not only don't need it but are
              actively harmed by it. The core of the issue is that a
              program written in a safe language (Perl, Python, Java,
              etc.) does not need W ^ X, whereas lots of C programs do.
              Meanwhile, some machines *always* enforce W ^ X, even for
              programs that do not need it. Such enforcement constrains
              certain languages, namely those that need to do runtime
              code generation. This is related to the topic of binary
              translation (recall the guest lecture).

        --Address space randomization. This provides some help but
          obviously doesn't help our vulnerable server, because our
          server tells the client where the buffer is.

        --StackGuard (in gcc).

        --another defense: don't use C! CPUs are so fast that a
          language with bounds checking probably isn't going to pay a
          huge performance penalty relative to one without bounds
          checks.

    --unfortunately, this is an arms race, and each time a new defense
      arises, a new attack arises too. Here's the most advanced current
      technique, and it defeats many of the above defenses:

        --smash the stack with a bunch of return addresses. Each return
          address points to a needed instruction followed by "ret"
          (this requires the attacker to have previously identified
          these instruction sequences in the code). That's not too hard
          in CISC code like x86, where lots of useful byte sequences
          are embedded in the binary, even sequences the programmer
          didn't mean to emit (because instructions are not fixed
          length). Result: the control flow bounces around all of these
          byte sequences in memory, executing exactly what the attacker
          wanted, but never executing off of the stack.

        --this is called "return-oriented programming". Defending
          against it is hard (though if people use only safe languages,
          that is, languages that do bounds checking and other pointer
          checks, such attacks become much, much harder).

    --Question: can we instead confine processes and users so that when
      they're broken into, the damage is limited?
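    [Referenced from the demo above: a minimal sketch of the kind of bug
    that stack smashing exploits. This is not the actual tcpserve /
    buggy-server code from the demo; it just shows the classic pattern:
    a fixed-size buffer on the stack plus an unchecked copy, so a long
    enough input runs past the buffer and overwrites the saved frame
    pointer and return address.]

        /* sketch of a stack-smashable function (NOT the demo's code) */

        #include <stdio.h>
        #include <string.h>

        void handle_request(const char *input)
        {
            char buf[64];        /* fixed-size buffer on the stack      */
            strcpy(buf, input);  /* no length check: input longer than
                                    64 bytes overwrites what follows
                                    buf on the stack, including the
                                    saved return address                */
            printf("got: %s\n", buf);
        }

        int main(int argc, char **argv)
        {
            if (argc > 1)
                handle_request(argv[1]);   /* attacker controls argv[1] */
            return 0;
        }

    In the demo, the attacker's input is crafted so that the overwritten
    return address transfers control to code of the attacker's choosing.
    The defenses above (W ^ X, address space randomization, StackGuard)
    each break a different link in that chain, and return-oriented
    programming is the attackers' answer to W ^ X.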