Class 4
CS 480-008
09 February 2016

On the board
------------
1. Last time
2. Lab structure
3. x86 architecture and assembly (review)
4. Stack frames
5. Buffer overflows
6. [next time] Discussion of Aleph One's article

---------------------------------------------------------------------------

1. Last time

--(almost) finished up networking: application layer, link layer,
  bootstrapping, NAT

--discussion of NAT was muddy. summary:

    --by overloading the port field, a single external IP address can
      represent multiple internal IP addresses (think through what state
      the NAT has to maintain to make this work, and what kinds of
      rewriting the NAT has to do to packets)

    --via a NAT, port 80 "on the external IP address" can refer to a
      particular port on a particular internal machine (again, think
      through what state the NAT has to maintain to make this work)

    --Hypervisors (like VirtualBox) have "built-in NATs"; the result is
      that the virtual machine "looks like" an internal machine "behind"
      the host. Further games with ports are possible. For example, think
      through what VirtualBox is doing in a scenario in which:

        ----VirtualBox does not run as root but rather as an ordinary
            user (for example, on the CIMS machine linserv1.cims.nyu.edu)
        ----The virtual machine has sshd bound to port 22 (as usual)
        ----sshd on the virtual machine gets all traffic destined to
            linserv1.cims.nyu.edu:10001 (this has to be arranged by
            VirtualBox)

      Question: what did VirtualBox do on the local machine? What socket
      calls did it make?

      The point of this example is that you can now do the following from
      any ssh client (like your laptop):

        $ ssh -p 10001 linserv1.cims.nyu.edu

      and this will cause you to ssh directly to the virtual machine, even
      though there's no sshd process running directly on linserv1.

--Covered the physical layer only quickly. Here are notes:

    --signals in a medium
        --medium: coaxial cable, twisted pair (Ethernet), fiber, radio
        --signals: endless innovation. different electrical profiles
          correspond to different sets of bits
    --some media are point-to-point:
        --fiber, twisted pair
    --some media are a shared transmission medium (coax, radio)
        --any message can be seen by all nodes
        --but now there is contention
    --speed of light matters!
        --300,000 km/sec in a vacuum, slower in fiber
        --New York to CA: ~3000 miles = ~5000 km
        --propagation time: 5000 km / (300,000 km/sec) = ~17 msec
        --round-trip: ~34 msec, assuming no computation
        --Technology improvements are not going to fix this
    --But what the heck? I keep reading that networks keep getting
      faster....
        --*delay* is never going to improve as long as the theory of
          relativity stands
        --throughput -- bits per second -- improves ridiculously well
        --so how do we take advantage of this?
    --concept: bandwidth-delay product
        [DRAW CYLINDER: bandwidth is the height, delay is the length]
        --get full network utilization if you've got
              # bytes in flight = bandwidth * delay
          (see the worked example below)
        --but what if the network isn't doing bulk transfer?
        --then you'll get poor throughput. ping/pong (send a packet, wait
          for a response) has terrible throughput
        --this is one reason why concurrency is absolutely critical for
          good network utilization: a bunch of low-throughput flows may
          add up to good utilization
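As a concrete illustration of the bandwidth-delay product (a sketch only;
the 1 Gbit/s link speed is an assumed example, not a number from the
lecture), here is the arithmetic for the NY<->CA round trip above:

    /* bandwidth-delay product: how many bytes must be "in flight"
     * to keep the path fully utilized. */
    #include <stdio.h>

    int main(void)
    {
        double bits_per_sec = 1e9;     /* assumed example link: 1 Gbit/s */
        double rtt_sec      = 0.034;   /* ~34 msec round trip from above */
        double bdp_bytes    = (bits_per_sec / 8.0) * rtt_sec;

        printf("bandwidth-delay product: ~%.1f MB in flight\n",
               bdp_bytes / 1e6);
        return 0;
    }

This prints roughly 4.2 MB: a single request/response exchange that keeps
far fewer bytes than that in flight cannot come close to using the link.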
Note that physical connectivity is rare.....

    --instead, communications usually "hop" through multiple devices

        [DRAW PICTURE: source --> bunch of switches --> destination]

    --Allows links and devices to be shared for multiple purposes
    --Must determine which bits are part of which messages intended for
      which destinations

Two kinds of ways to create this indirect connectivity:

    --Circuit-switched: provide virtual links. Dump bits in at the source;
      they come out at the destination
        --example: the old telephone network. dialing the number set up a
          virtual circuit (and before that, human operators set up an
          actual circuit)

    --Packet-switched:
        --Pack a bunch of bytes together intended for the same destination
        --Slap a _header_ on the packet describing where it should go

    --Most networks today are packet switched

---------------------------------------------------------------------------

admin notes

    --question as assignment for lecture

    --make-up class Friday, 12:30, CIWW 101. if you cannot make it to the
      make-up class, that is completely fine, but note that the video will
      be assigned as homework over the weekend.

    --labs heads up: if you fully understand Aleph One's article, and if
      you're comfortable with gdb, the concept of memory as an array of
      bytes, the interchangeability of code/data, etc., then the lab won't
      be too bad. otherwise, it could take a while.

---------------------------------------------------------------------------

2. Notes on lab 2

--On Unix, one can pass file descriptors to other processes. Here's how
  it works:

    --a socketpair (a pair of connected Unix domain sockets) is created
    --then fork (and maybe exec)
    --now the Unix domain socket can be used as a channel for passing
      *other* file descriptors
    --this is done with the sendmsg(), recvmsg() API calls (a sketch of
      what these calls look like appears at the end of this section)
    --note that the OS renumbers the fd in the receiving process, but it
      means the same thing that it did in the sending process

--System structure

    --zookld: launches the servers listed in zook.conf
        --starts the dispatch daemon (zookd) and the other services
    --zookd: dispatcher: listens on port 8080; dispatches HTTP requests
      to the right services
    --zookfs: file server: an example service

    --Diagram:

                            sockfd, svcfd
            zookld ------------------------> zookd  <-----> clients
              |        (Unix domain sock)      |     sockfd, xx
              |                                |
              |                                | svcfd (carries xx)
              |            svcfd               v
              +----------------------------->  svc
                      (Unix domain sock)

    --zookld creates a socketpair (connected Unix domain sockets)
    --zookld passes one socket to zookd on the command line, using exec
      and argv; it repeats the arrangement for each service
    --zookld uses the Unix domain socket to pass a TCP socket and another
      Unix domain socket to zookd
    --that other Unix domain socket (svcfd in the diagram) is a channel by
      which zookd can communicate with 'svc'
    --zookld also passes svcfd to svc
    --zookd listens on sockfd
    --when zookd accept()s connections on sockfd, it gets a new file
      descriptor (as usual). zookd passes those newly created file
      descriptors to svc over svcfd. These newly created file descriptors
      are represented by xx in the diagram
    --ultimately, there will be more svcs. zookd will decide which service
      to pass a request to based on which pattern the requested URL
      matches.

--Use of the environment

    you can use the environment in the shell:

        $ FOO=4
        $ echo $FOO
        4

    but the environment is more general: it is a list of key/value pairs
    that a given process has access to. environments are inherited across
    fork(). zoobar uses the environment to pass important state around.
    the environment is stored as a variable in the sender and serialized
    over a socket; the receiver deserializes it, sets the key/value pairs
    to be part of the "official" environment (via setenv), and thereafter
    gains access to them (via getenv).
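Here is a minimal sketch of the fd-passing described above, using
sendmsg() with an SCM_RIGHTS control message. This is illustrative only
(it is not the lab's zookld/zookd code, and the function name send_fd is
made up); the receiving side mirrors it with recvmsg() and pulls the new
descriptor back out of CMSG_DATA().

    /* sketch: pass fd_to_send over the connected Unix domain socket
     * 'sock'.  illustrative; not the lab's actual code. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int
    send_fd(int sock, int fd_to_send)
    {
        char dummy = 'x';                    /* must send at least one data byte */
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        char cbuf[CMSG_SPACE(sizeof(int))];  /* room for one fd */
        struct msghdr msg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof(cbuf);

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;        /* "this message carries fds" */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

        /* the kernel installs a copy of the fd in the receiver (possibly
         * under a different number) when the receiver calls recvmsg() */
        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }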
3. Review x86 architecture and assembly

stack:
    --holds function arguments, local variables, temporary storage, etc.
    --used to save registers

--%eip is the instruction pointer (aka program counter); it advances past
  each instruction as it executes (instructions are of different lengths)
--%eip is modified by CALL, RET, JMP, and conditional JMPs

--a computer needs work space; registers:
    --the 8086 started with 4 16-bit registers: AX, BX, CX, DX
    --each in two 8-bit halves: AH and AL, BH and BL, etc.
    --32-bit versions: EAX, EBX, ECX, ...

--need to be able to point into memory
    --SP: stack pointer (%esp)
    --BP: frame base pointer (%ebp)
    --NOTE: NOTE: NOTE: function arguments are referenced from ebp.

--for conditional jumps, there are:
    --FLAGS -- various condition codes
        --whether the last op overflowed
        -- ... was positive/negative
        -- ... was [not] zero
        -- ... carry/borrow on add/subtract
        -- ... etc.
        -- whether interrupts are enabled
        -- direction of data copy instructions
    --JS/JNS (sign), J[N]Z, J[N]C, J[N]O

--unfortunately, two conventions:
    Intel:    op dst, src
    ATT/gcc:  op src, dst      (what the labs use)
        --uses b, w, l suffixes on instructions to specify size

--example instructions:

    movl %eax, %edx            # edx <- eax                 (register)
    movl $0x12c, %edx          # edx <- 0x12c               (immediate)
    movl 0x12c, %edx           # edx <- *(0x12c)            (direct load)
    movl (%ebx), %edx          # edx <- *(ebx)              (indirect load)
    movl 4(%ebx), %edx         # edx <- *(ebx + 4)          (displaced)
    movl 4(%ebx,%eax,8), %edx  # edx <- *(ebx + eax*8 + 4)  (scaled index)

    xor %eax, %eax             # eax <- 0
        [historical advantage over "mov $0, %eax" is code size]
        [but also avoids materializing a 0 byte in the instruction
         encoding -- this will matter for shellcode]

    leal: what's this? load effective address:
    leal 4(%ebx), %edx         # edx <- (ebx + 4)    // note: no '*'

    pop <dst>
        take the element at the top of the stack, put it in <dst>,
        increment %esp by 4. in pseudocode:
            [movl (%esp), <dst>
             addl $4, %esp]

    push <src>
        decrement %esp by 4; take the contents of <src>, and put it where
        %esp points. in pseudocode:
            [subl $4, %esp
             movl <src>, (%esp)]

    call 0x12345
        [ pseudo: pushl %eip
                  movl $0x12345, %eip ]

    ret
        [ pseudo: pop %eip ]

summary of important registers:

    -esp: points to the top of the stack (but "top" means "smallest
     address" on the x86)
    -ebp: points to the base of the stack *frame*. This register is like a
     "holding place", in that (a) arguments are referenced from ebp, and
     (b) the return address of the current function is stored just above
     where ebp is pointing (see below). This makes "function return"
     convenient to implement (see the discussion of function epilogues
     below).
    -eax: where the return value is stored
    -eax, ecx, edx: caller-saved registers; registers the callee can use
     for computation. the caller must push them on the stack before
     calling a function if it wants to save them.
    -ebx, esi, edi: callee-saved registers; registers the caller expects
     to be the same on return. the callee must push them on the stack if
     it wants to use them, and restore them prior to the function's end.
    -eip: program counter
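To tie the syntax together, here is a small C function and roughly the
kind of 32-bit AT&T-syntax assembly a compiler might emit for it. This is
a sketch (the exact output depends on the compiler and flags), and the
function name scale is made up:

    int scale(int *p, int i)
    {
        return p[i] * 8 + 4;
    }

    scale:
        pushl %ebp                  # prologue: save caller's %ebp
        movl  %esp, %ebp            # ... and set up this frame's base
        movl  8(%ebp), %edx         # edx <- p   (1st arg, above ret %eip)
        movl  12(%ebp), %eax        # eax <- i   (2nd arg)
        movl  (%edx,%eax,4), %eax   # eax <- p[i]     (scaled indirect load)
        leal  4(,%eax,8), %eax      # eax <- eax*8+4  (lea as arithmetic,
                                    #                  no memory access)
        popl  %ebp                  # epilogue: restore caller's %ebp
        ret                         # pop return %eip; result is in %eax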
4. Stack frames and calling conventions

if we have one function call, we have this picture....

                     ...
                +------------+
                |   arg 2    |   \
                +------------+    >- previous function's stack frame
                |   arg 1    |   /
                +------------+   |
                |  ret %eip  |   |
                +============+
      %ebp ---> | saved %ebp |   \
                +------------+   |
                |            |   |
                |   local    |   |
                | variables, |    >- current function's stack frame
                |  callee-   |   |
                | saved vars,|   |
                |    etc.    |   |
      %esp ---> +------------+   /

....and then if the currently executing function calls another, we get
this picture:

                +------------+
                |   arg 2    |
                +------------+
                |   arg 1    |
                +------------+
                |  ret %eip  |
                +============+
                | saved %ebp |
                +------------+
                |            |
                |   local    |
                | variables, |
                |  callee-   |
                | saved vars,|
                |    etc.    |
                +------------+   |
                |   arg 2'   |   \
                +------------+    >- previous function's stack frame
                |   arg 1'   |   /
                +------------+   |
                |  ret %eip' |   |
                +============+
      %ebp' --> | saved %ebp'|   \
                +------------+   |
                |            |   |
                |   local    |   |
                | variables, |    >- new function's stack frame
                |  callee-   |   |
                | saved vars,|   |
                |    etc.    |   |
      %esp' --> +------------+   /

Above, the quote character (') means "these values are different from the
ones without the quote character".

--gcc arranges for this picture by placing code at the beginning and end
  of functions. Specifically:

    --function prologue:

        pushl %ebp
        movl %esp, %ebp
        # make room for local variables
        subl $128, %esp
        # code to save callee-saved registers:
        push %ebx; push %esi; push %edi
        # function logic ....

    --function epilogue:

        ....
        # restore callee-saved registers
        popl %edi; popl %esi; popl %ebx
        addl $128, %esp
        movl %ebp, %esp
        popl %ebp
        ret

aside: if you're wondering why the convention is that ebx is callee-saved
while eax, ecx, edx are caller-saved, here's an answer.

    eax: "accumulator" register
    ebx: "base" register
    ecx: "count" register
    edx: "data" register

    the idea was that ebx would point to the base of a data structure
    (just as ebp, the "base pointer", points to the base of a frame). ebx
    often points to a data segment (e.g., for a dynamic library's data),
    by setting it up at the beginning of a function and keeping it
    constant throughout. Because it's a "stable" pointer, it makes sense
    to have the callee save it. ("if you're gonna touch this, put it back
    the way you found it")

    eax/ecx/edx are more ephemeral: used for particular calculations and
    such. Makes sense to require the caller to save them if it needs the
    values in there. ("if you really want to keep these values, save
    them".)

5. Buffer overflows

A. Suppose your lab1 Web server has a bug in the way that it parses
   inputs. Malformed requests cause crashes. Is this cause for concern?
   Let's see....

Demo:

    % cat readreq.c

    #include <stdio.h>
    #include <stdlib.h>

    char *
    gets(char *buf)
    {
        int c;
        while ((c = getchar()) != EOF && c != '\n')
            *buf++ = c;
        *buf = '\0';
        return buf;
    }

    int
    read_req(void)
    {
        char buf[128];
        int i;
        gets(buf);
        i = atoi(buf);
        return i;
    }

    int
    main()
    {
        int x = read_req();
        printf("x = %d\n", x);
    }

    % ./readreq
    1234
    % ./readreq
    AAAAAAAAAAAA....AAAA

Why did it crash? We should think "this is a bug; could an attacker
exploit it?" Let's figure out what exactly is happening.

    % gdb ./readreq
    b read_req
    r
    info reg
    disas $eip

Where is buf[]?

    print &buf[0]
    print $esp
    print &i

Aha, buf[] is on the stack, followed by i. The sub $0xa8, %esp allocates
space for buf[] and i.

Let's draw a picture of what's on the stack.

                +------------------+
                |  main()'s frame  |
                |                  |
                |                  |
                +------------------+
                |  return address  |
                +------------------+
    %ebp ---->  |   saved %ebp     |
                +------------------+
                |        i         |
                +------------------+
                |     buf[127]     |
                |       ...        |
                |      buf[0]      |
                +------------------+
    %esp ---->  |       ...        |
                +------------------+

The x86 stack grows down in addresses.
push == decrement $esp, then write to *$esp.
$ebp is the "frame pointer" -- the saved stack ptr at function entry.
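(Aside, not part of the lecture demo: to see the same layout without gdb,
you could temporarily add lines like the following inside read_req().
__builtin_frame_address and __builtin_return_address are gcc builtins;
the labels printed here are just for illustration.)

    printf("&buf[0]            = %p\n", (void *) &buf[0]);
    printf("&i                 = %p\n", (void *) &i);
    printf("frame base (%%ebp)  = %p\n", __builtin_frame_address(0));
    printf("return %%eip value  = %p\n", __builtin_return_address(0));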
Now let's look at the saved %ebp and the saved return %eip:

    x $ebp
    x $ebp+4

Let's see what the saved return $eip refers to:

    disas 0x0804850d

It's the instruction in main() after the call to read_req().

OK, back to read_req, just before gets():

    disas $eip
    next
    AAAAAAA...AAA

What did gets() do to the stack?

    print &buf[0]

Hmm, 200 is more than 128! How can that be?

    x $ebp
    x $ebp+4

Saved frame pointer and return eip are 0x41414141! What's 41? (it's the
ASCII code for 'A')

    next
    disas
    stepi
    stepi
    disas

Now about to execute read_req()'s return instruction.

    x $esp
    stepi      -- the ret
    info reg   -- note eip is 0x41414141
    stepi      -- crash, this is our seg fault

B. If our web server code had this bug, could an attacker exploit it to
   break into our computer? Yes. How?

Concept: overwrite the location where the program expects the return
address, and thereby send the program to code that the *attacker* wrote;
in the classic case, this code is on the stack. Thus, the "ret"
instruction has the effect of causing %eip to point to the beginning of a
buffer on the stack...which is where the attacking code itself was placed.

For this to work, the adversary needs to know:
    --the location of the return address (approximately)
    --the location of the vulnerable buffer (approximately)

NOTE: once the adversary is running injected code, that code has all of
the privileges of the application itself. If the exploited process was
running as root, the attacker can do anything to the machine. Even if
not, the attacker can work mischief: send spam; read files (web server,
database); load a bigger program from somewhere on the net that exploits
the user/syscall interface; ...

[What happens if the stack grows up, instead of down? The stack frame for
read_req() has buf[] at its highest address, so writing past the end of
buf[] won't overflow onto read_req()'s return address. BUT: look at the
stack frame for gets().]

[Thanks to MIT 6.858 and 6.828 for some of this content.]
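Postscript (not from the lecture notes): the immediate bug goes away if
read_req() reads with an explicit bound. A minimal sketch, using fgets()
in place of the unbounded gets(); the function name read_req_checked is
made up:

    /* sketch: a bounded version of read_req(). fgets() writes at most
     * sizeof(buf) bytes (including the terminating '\0'), so it cannot
     * run past buf[] into the saved %ebp and return %eip. */
    #include <stdio.h>
    #include <stdlib.h>

    int
    read_req_checked(void)
    {
        char buf[128];

        if (fgets(buf, sizeof(buf), stdin) == NULL)
            return -1;          /* EOF or read error */
        return atoi(buf);
    }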