Class 8 (by video)
CS 439
8 February 2013

On the board
------------

1. Last time

2. PC architecture

3. x86 instructions

4. gcc calling conventions

5. PC emulation

---------------------------------------------------------------------------

1. Last time

    --finished up some advice about using concurrency primitives

        [--next week: discuss *implementation* of concurrency and
        synchronization: what's behind 'create_thread()',
        'mutex.acquire()', etc.?]

    --discussed the Therac-25

    today:

    --understand the interface presented by the hardware so you can
    build a bare-bones OS to run on the hardware

    --implication: OS programmer needs to understand some assembly
    language

2. PC architecture

    --components
    --CPU (registers, execution unit, memory management)
    --I/O
    --memory map (physical address space)

    A. components

        A full PC has:

            * an x86 CPU with registers, execution unit, and memory
              management
            * CPU chip pins include address and data signals
            * memory
            * disk
            * keyboard
            * display
            * other resources: BIOS ROM, clock, ...

    B. CPU

        --runs instructions:

               CPU                          Mem
              -----                        -----
            for (;;) {                      inst
                run next instruction        inst
            }                               inst
                                            data
                                            data

        instruction pointer

            --%eip is incremented after each instruction
            (instructions have different lengths)

            --%eip modified by CALL, RET, JMP, and conditional JMP

            --IP (instruction pointer) called program counter (or PC)
            in most other contexts

        --a computer needs work space; registers:

            --8086 started with 4 16-bit registers: AX, BX, CX, DX
            --each in two 8-bit halves: AH and AL, BH and BL, etc.
            --32 bit versions: EAX, EBX, ECX, ...

        --more work space: memory

            --address lines and data lines

        --need to be able to point into memory

            --SP: stack pointer
            --BP: frame base pointer
              [more on these in a bit]
            --SI, DI: source index, dest index

        --for conditional jumps, there are:

            --FLAGS -- various condition codes
                --whether last op overflowed
                -- ... was positive/negative
                -- ... was [not] zero
                -- ... carry/borrow on add/subtract
                -- ... etc.
                -- whether interrupts are enabled
                -- direction of data copy instructions

            --JP, JN, J[N]Z, J[N]C, J[N]O

    C. I/O

        * Original PC architecture: use dedicated I/O space

            --Works same as memory accesses but set I/O signal
            --Only 1024 I/O addresses
            --Accessed with special instructions (IN, OUT)
            --Example: write a byte to line printer:
              [see handout]

            #define DATA_PORT    0x378
            #define STATUS_PORT  0x379
            #define   BUSY 0x80
            #define CONTROL_PORT 0x37A
            #define   STROBE 0x01

            void
            lpt_putc(int c)
            {
                /* wait for printer to consume previous byte */
                while((inb(STATUS_PORT) & BUSY) == 0)
                    ;

                /* put the byte on the parallel lines */
                outb(DATA_PORT, c);

                /* tell the printer to look at the data */
                outb(CONTROL_PORT, STROBE);
                outb(CONTROL_PORT, 0);
            }

        * Memory-Mapped I/O

            o Use normal physical memory addresses
                + Gets around limited size of I/O address space
                + No need for special instructions
                + System controller routes to appropriate device

            o Works like "special" memory:
                + Addressed and accessed like memory, but ...
                + ... does not behave like memory!
                + Reads and writes can have "side effects"
                + Read results can change due to external events

    D. physical memory map

        [draw picture and see handout]

        --two points here.

            (1) physical address space is mostly ordinary RAM

            (2) some low-memory addresses actually refer to other
            things (that is, "mind the gap": devices, not memory, are
            mapped between 640KB and 1MB)

        --example: writing to VGA memory makes things appear on the
        screen

        --reset or power-on jumps to ROM at 0xffff0

            --so what is the first instruction going to have to do?
              [answer: probably jump]

            +------------------+  <- 0xFFFFFFFF (4GB)
            |      32-bit      |
            |  memory mapped   |
            |     devices      |
            |                  |
            /\/\/\/\/\/\/\/\/\/\

            /\/\/\/\/\/\/\/\/\/\
            |                  |
            |      Unused      |
            |                  |
            +------------------+  <- depends on amount of RAM
            |                  |
            |                  |
            | Extended Memory  |
            |                  |
            |                  |
            +------------------+  <- 0x00100000 (1MB)
            |     BIOS ROM     |
            +------------------+  <- 0x000F0000 (960KB)
            |  16-bit devices, |
            |  expansion ROMs  |
            +------------------+  <- 0x000C0000 (768KB)
            |   VGA Display    |
            +------------------+  <- 0x000A0000 (640KB)
            |                  |
            |    Low Memory    |
            |                  |
            +------------------+  <- 0x00000000

        --is this an abstraction that the OS provides to others or an
        abstraction that the hardware is providing to the OS?
          [the latter]

        --job of hardware to turn request for address 0x00100004 into
        a request that goes to the appropriate place in the actual
        RAM, perhaps at 0x000A0004 (but we don't know).

3. x86 instructions

    --transition to ....

    --x86: CISC architecture

    --unfortunately, two conventions

        Intel:   op dst, src
        ATT/gcc: op src, dst (labs)
            --uses b,w,l suffix on instructions to specify size

    --examples:

        movl %eax, %edx          ? [edx = eax]                 register
        movl $0x12c, %edx        ? [edx = 0x12c]               immediate
        movl 0x12c, %edx         ? [edx = *(0x12c)]            direct
        movl (%ebx), %edx        ? [edx = *(ebx)]              indirect
        movl 4(%ebx), %edx       ? [edx = *(ebx + 4)]          displaced
        movl 4(%ebx,%eax,8), %edx ? [edx = *(ebx + eax*8 + 4)]
        xor %eax, %eax           ? [eax = 0]
            [advantage over "mov $0, %eax" is code size]

    --instruction classes

        data movement: MOV, PUSH, POP
        arithmetic: TEST, SHL, ADD, AND...
        i/o: IN, OUT, ....
        control: JMP, JZ, JNZ, CALL, RET
        string: REP MOVSB
        system: IRET, INT

    Intel architecture manual Volume 2 is the reference

    Aside: 16-vs-32 bits
    --------------------

        --80386: 32 bit data and bus addresses
        --Now: the transition to 64 bit addresses
        --Chip gives backward compatibility
            --boots in 16-bit mode, and boot.S switches to 32-bit mode
            --Prefix 0x66 toggles the operand size: in 16-bit mode it
            gets you 32-bit operands, and in 32-bit mode it gets you
            16-bit operands
                --so in 32-bit code, MOVW = 0x66 MOV
                -- ".code32" in boot.S tells assembler to generate
                32-bit code, inserting 0x66 where a 16-bit operand is
                needed

    --what do push and pop actually do?
        --stack grows down.........
        [draw picture]

        --examples:

            pushl %eax
                [ subl $4, %esp
                  movl %eax, (%esp) ]

            popl %eax
                [ movl (%esp), %eax
                  addl $4, %esp ]

            call 0x12345
                [ pseudo:
                    pushl %eip
                    movl $0x12345, %eip ]

            ret
                [ pseudo:
                    pop %eip ]

4. gcc calling conventions

    --above we see how call and ret interact with the stack

        --call: updates %eip and pushes old %eip on the stack
        --ret: updates %eip by loading it with stored stack value

    --but what happens to a function's state, that is, the registers,
    when a function is called? they might need to be saved, or not.

    --purely a matter of convention in the compiler **not** hardware
    architecture

    --here's what gcc does:

        --[ draw blocks of code: main f g
            draw registers: eip ebp eax ecx
            draw stack ]

        --at entry of a function, the stack looks like this:

                   arg 3
                   arg 2
                   arg 1
            -->esp [ret_addr]

            [fill in picture above]

            %eip points at first instruction of function
            %esp+4 points at first argument
            %esp points at return address

        --after ret instruction:

            %eip contains return address
            %esp points at arguments pushed by caller
            %eax contains return value (or trash if function is void)
            %ecx, %edx may be trashed
            %ebp, %ebx, %esi, %edi need to look the way that they did
            at the time of the call

        --in other words:

            %eax, %ecx, %edx are "caller save": caller's job to push
            them on the stack if it wants to save them

            %ebp, %ebx, %esi, %edi are "callee save": callee's job to
            push them on the stack after function call, and pop them
            (meaning restore their values by removing them from the
            stack) just before doing return

    FRAME POINTER

    --here's the picture of the stack when one function calls another:

                   +------------+   |
                   | arg 2      |   \
                   +------------+    >- previous function's stack frame
                   | arg 1      |   /
                   +------------+   |
                   | ret %eip   |   /
                   +============+
            %ebp-> | saved %ebp |   \
                   +------------+   |
                   |            |   |
                   |   local    |   \
                   | variables, |    >- current function's stack frame
                   | callee-    |   /
                   | saved vars,|   |
                   | etc.       |   |
            %esp-> +------------+   /

    --%esp moves to make stack bigger/smaller

    --%ebp points at saved %ebp from previous function

    --saved %ebps form a chain; can walk stack

    --arguments and locals at fixed offsets from ebp

    --function prologue:

        pushl %ebp
        movl %esp, %ebp

    --function epilogue:

        movl %ebp, %esp
        popl %ebp
        ret

    --example [see handout]

    aside: if you're wondering why the convention is that ebx is
    callee saved while eax,ecx,edx are caller saved, here's an answer.

        eax: "accumulator" register
        ebx: "base" register
        ecx: "count" register
        edx: "data" register

    idea was that ebx would point to the base of a data structure
    (just as ebp meant "base pointer" and pointed to the base of a
    frame). ebx often points to a data segment (e.g., for a dynamic
    library's data), by setting it up at the beginning of a function
    and keeping it constant throughout.

    Because it's a "stable" pointer, makes sense to have the callee
    save it. ("if you're gonna touch this, put it back the way you
    found it")

    eax/ecx/edx are more ephemeral: used for particular calculations
    and such. Makes sense to require the caller to save them if it
    needs the values in there. ("if you really want to keep these
    values, save them".)

5. PC emulation

    --QEMU does exactly what a real PC would

    --But it is implemented in software, not hardware

    --Runs as a normal program on "host" operating system.

    --The layering looks like this:

            |    JOS    |
            -------------------------------
            PC emulator | Web browser | ...
            -------------------------------
                        Linux
            -------------------------------
                     PC hardware
            -------------------------------

    --Uses normal programmatic constructs (if statements, memory,
    etc.) to emulate processor logic and state

    --Stores emulated CPU registers in global variables

            int32_t regs[8];
            #define REG_EAX 1
            #define REG_EBX 2
            #define REG_ECX 3
            ....
            int32_t eip;

    --Stores emulated physical memory in QEMU's memory

            char mem[256*1024*1024];

    --See handout

    --Simulate I/O devices, etc., by detecting accesses to "special"
    memory and I/O space and emulating the correct behavior: e.g.,

        --Reads/writes to emulated hard disk transformed into
        reads/writes of a file on the host system

        --Writes to emulated VGA display hardware transformed into
        drawing into an X window

        --Reads from emulated PC keyboard transformed into reads from
        host's keyboard API

---------------------------------------------------------------------------

Summary

    --covered PC and x86, which is the platform for the labs

    --illustrated some important CS ideas
        --stored program computer
        --stack
        --memory-mapped I/O
        --equivalence of software and hardware

---------------------------------------------------------------------------

[Credit to Frans Kaashoek, Robert Morris, and Nickolai Zeldovich for much
of this content.]
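
A minimal sketch, assuming hypothetical names (struct emu, emu_push, emu_call, and so on; none of these come from QEMU or the handout), of how an emulator in the style of section 5 might carry out the push/pop, call/ret, and prologue/epilogue steps from sections 3 and 4. Registers and memory are ordinary C data, and each x86 instruction becomes a few lines of C. The 64KB memory size and the register numbering are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register numbering and memory size, for illustration only. */
enum { REG_EAX, REG_ECX, REG_EDX, REG_EBX,
       REG_ESP, REG_EBP, REG_ESI, REG_EDI, NREGS };
#define MEM_SIZE (64u * 1024u)

struct emu {
    uint32_t regs[NREGS];   /* emulated general-purpose registers */
    uint32_t eip;           /* emulated instruction pointer */
    uint8_t  mem[MEM_SIZE]; /* emulated physical memory */
};

/* Zero all state and point %esp at the top of the emulated stack. */
void emu_init(struct emu *e, uint32_t stack_top)
{
    memset(e, 0, sizeof *e);
    e->regs[REG_ESP] = stack_top;
}

/* Store a 32-bit value in little-endian order, as x86 does. */
void emu_write32(struct emu *e, uint32_t addr, uint32_t val)
{
    assert(addr <= MEM_SIZE - 4);
    for (int i = 0; i < 4; i++)
        e->mem[addr + i] = (uint8_t)(val >> (8 * i));
}

/* Load a 32-bit little-endian value. */
uint32_t emu_read32(const struct emu *e, uint32_t addr)
{
    uint32_t val = 0;
    assert(addr <= MEM_SIZE - 4);
    for (int i = 0; i < 4; i++)
        val |= (uint32_t)e->mem[addr + i] << (8 * i);
    return val;
}

/* pushl: subl $4, %esp ; movl val, (%esp) */
void emu_push(struct emu *e, uint32_t val)
{
    e->regs[REG_ESP] -= 4;
    emu_write32(e, e->regs[REG_ESP], val);
}

/* popl: movl (%esp), result ; addl $4, %esp */
uint32_t emu_pop(struct emu *e)
{
    uint32_t val = emu_read32(e, e->regs[REG_ESP]);
    e->regs[REG_ESP] += 4;
    return val;
}

/* call: push the address of the next instruction, then jump to target.
 * insn_len is the length in bytes of the call instruction itself. */
void emu_call(struct emu *e, uint32_t target, uint32_t insn_len)
{
    emu_push(e, e->eip + insn_len);
    e->eip = target;
}

/* ret: pop the saved return address into %eip. */
void emu_ret(struct emu *e)
{
    e->eip = emu_pop(e);
}

/* prologue: pushl %ebp ; movl %esp, %ebp */
void emu_prologue(struct emu *e)
{
    emu_push(e, e->regs[REG_EBP]);
    e->regs[REG_EBP] = e->regs[REG_ESP];
}

/* epilogue (without the final ret): movl %ebp, %esp ; popl %ebp */
void emu_epilogue(struct emu *e)
{
    e->regs[REG_ESP] = e->regs[REG_EBP];
    e->regs[REG_EBP] = emu_pop(e);
}
```

For instance, after emu_init(&e, 0x1000) and e.eip = 0x100, calling emu_call(&e, 0x12345, 5) leaves %esp at 0xffc with the return address 0x105 stored there; a later emu_ret(&e) restores %eip to 0x105 and %esp to 0x1000.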