Class 2
CS 372H
19 January 2012

On the board
------------
http://www.cs.utexas.edu/~mwalfish/classes/s12-cs372h

1. Yesterday, today, and tomorrow(s)
2. PC architecture
3. x86 instructions
4. gcc calling conventions

---------------------------------------------------------------------------

1. Context

    A. Last time: discussed OS and reviewed some of the history

    B. Today: x86 architecture and assembly

	need this because we're building an OS on top of the chip.
	thus:

	--OS needs to understand the hardware architecture

	    --which means OS programmer needs to understand it

	--OS programmer needs to understand assembly language.....

    C. Looking ahead....

	--review process abstraction, virtual memory on x86, ...

2. PC architecture
    
    write on the board:
    --components
    --CPU (registers, execution unit, memory management)
    --I/O
    --memory map (physical address space)

    A. components

	A full PC has:

	    * an x86 CPU with registers, execution unit, and memory management
	    * CPU chip pins include address and data signals
	    * memory
	    * disk
	    * keyboard
	    * display
	    * other resources: BIOS ROM, clock, ... 

    B. CPU

        --runs instructions:
 
   	 CPU                                            Mem  
        -----                                          -----   
	for (;;) {				       inst    
	    run next instruction                       inst    
	}                                              inst    
                                                       data    
						       data    

	instruction pointer

	--%eip is incremented after each instruction
	    (instructions have different lengths, so the increment varies)

	--%eip modified by CALL, RET, JMP, and conditional JMP 

	--IP (instruction pointer) called program counter (or PC) in
	most other contexts
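
	--sketch (not from the handout): the fetch/execute loop above as
	a toy interpreter in C. the opcodes and the name toy_run are made
	up for illustration and are NOT x86. the only point is that the
	instruction pointer advances by each instruction's length unless
	a jump overwrites it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set (not x86); lengths differ per opcode. */
enum {
    OP_HALT = 0x00,  /* 1 byte:  stop */
    OP_INC  = 0x01,  /* 1 byte:  acc = acc + 1 */
    OP_ADDI = 0x02,  /* 2 bytes: acc = acc + imm8 */
    OP_JMP  = 0x03   /* 2 bytes: ip = target8 */
};

/* Run until OP_HALT and return the accumulator.
 * Returns -1 on an unknown opcode or when max_steps runs out. */
int toy_run(const uint8_t *mem, size_t len, int max_steps)
{
    size_t ip = 0;   /* plays the role of %eip */
    int acc = 0;     /* plays the role of a general register */

    while (max_steps-- > 0) {
        assert(ip < len);
        switch (mem[ip]) {
        case OP_HALT:
            return acc;
        case OP_INC:
            acc += 1;
            ip += 1;              /* step past a 1-byte instruction */
            break;
        case OP_ADDI:
            assert(ip + 1 < len);
            acc += mem[ip + 1];
            ip += 2;              /* step past a 2-byte instruction */
            break;
        case OP_JMP:
            assert(ip + 1 < len);
            ip = mem[ip + 1];     /* a jump overwrites the instruction pointer */
            break;
        default:
            return -1;
        }
    }
    return -1;
}
```

	for the program {INC, JMP 5, ADDI 100, ADDI 7, HALT} the jump
	skips "ADDI 100", so the loop runs INC, JMP, ADDI 7, HALT.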

	--a computer needs work space; registers:

	    --8086 started with 4 16-bit registers:
		AX, BX, CX, DX

	    --each in two 8-bit halves:
		AH and AL, BH and BL, etc.

	    --32 bit versions:
		EAX, EBX, ECX, ...
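
	    --sketch (not from the handout): how AL, AH, and AX overlap
	    EAX, modeled with shifts and masks on a uint32_t. the
	    function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* %ax is the low 16 bits of %eax. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* %al is the low 8 bits of %ax. */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* %ah is bits 8..15, the high half of %ax. */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing %al changes only the low byte of %eax. */
uint32_t write_al(uint32_t eax, uint8_t v)
{
    return (eax & 0xFFFFFF00u) | v;
}

/* Writing %ah changes only bits 8..15 of %eax. */
uint32_t write_ah(uint32_t eax, uint8_t v)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)v << 8);
}

/* Self-checking demo of the aliasing. */
void alias_demo(void)
{
    uint32_t eax = 0x12345678u;
    assert(reg_ax(eax) == 0x5678u);
    assert(reg_ah(eax) == 0x56u);
    assert(reg_al(eax) == 0x78u);
    assert(write_al(eax, 0xFFu) == 0x123456FFu);
}
```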

	--more work space: memory

	    --address lines and data lines

	--need to be able to point into memory

	    --SP: stack pointer

	    --BP: frame base pointer

	    [more on these in a bit]

	    --SI, DI: source index, dest index

	--for conditional jumps, there are:

	    --FLAGS -- various condition codes

		--whether last op overflowed

		-- ... was positive/negative
		-- ... was [not] zero
		-- ... carry/borrow on add/subtract
		-- ... etc.
		--  whether interrupts are enabled
		--  direction of data copy instructions 

	    --jumps that test them: J[N]S (sign), J[N]Z, J[N]C, J[N]O
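
	    --sketch (not from the handout): how a 32-bit add could set
	    the zero, sign, carry, and overflow condition codes. struct
	    flags and add32 are names made up for illustration; the real
	    FLAGS register packs these bits (and others) into one word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct flags {
    bool zf;   /* result was zero */
    bool sf;   /* result was negative (top bit set) */
    bool cf;   /* unsigned carry out of bit 31 */
    bool of;   /* signed overflow */
};

/* Add two 32-bit values and record the condition codes for the result. */
uint32_t add32(uint32_t a, uint32_t b, struct flags *f)
{
    assert(f != NULL);
    uint32_t r = a + b;                  /* wraps modulo 2^32 */
    f->zf = (r == 0);
    f->sf = ((r >> 31) & 1u) != 0;
    f->cf = (r < a);                     /* wrapped around => carry */
    /* overflow: operands have the same sign, result has the other sign */
    f->of = (((~(a ^ b) & (a ^ r)) >> 31) & 1u) != 0;
    return r;
}

/* A conditional jump just tests one of the recorded bits. */
bool jz_taken(const struct flags *f) { return f->zf; }
bool jc_taken(const struct flags *f) { return f->cf; }
bool jo_taken(const struct flags *f) { return f->of; }
```

	    note that carry and overflow differ: 0xFFFFFFFF + 1 carries
	    but does not overflow (it is -1 + 1 when read as signed),
	    while 0x7FFFFFFF + 1 overflows but does not carry.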

    C. I/O

	*  Original PC architecture: use dedicated I/O space

	    --Works same as memory accesses but set I/O signal
	      
	    --Only 1024 I/O addresses

	     --Accessed with special instructions (IN, OUT)

	     --Example: write a byte to line printer:

		[see handout]

		#define DATA_PORT    0x378
		#define STATUS_PORT  0x379
		#define BUSY	     0x80
		#define CONTROL_PORT 0x37A
		#define STROBE	      0x01

		void
		lpt_putc(int c)
		{
		  /* wait for printer to consume previous byte */
		  while((inb(STATUS_PORT) & BUSY) == 0)
		    ;

		  /* put the byte on the parallel lines */
		  outb(DATA_PORT, c);

		  /* tell the printer to look at the data */
		  outb(CONTROL_PORT, STROBE);
		  outb(CONTROL_PORT, 0);
		}


	* Memory-Mapped I/O

	      o Use normal physical memory addresses

		    + Gets around limited size of I/O address space

		    + No need for special instructions

		    + System controller routes to appropriate device 

	      o Works like "special" memory:

		    + Addressed and accessed like memory, but ...

		    + ... does not behave like memory!

		    + Reads and writes can have "side effects"

		    + Read results can change due to external events 
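
	      o Sketch (not from the handout): memory-mapped I/O from C.
		The example assumes the VGA color text buffer (80x25
		cells, 16 bits per cell, at physical address 0xB8000 in
		that mode). vga_put is a made-up name, and the buffer is
		passed as a parameter so the code can also be run
		against an ordinary array. The "volatile" qualifier
		tells the compiler that every store matters, since the
		target may not behave like memory.

```c
#include <assert.h>
#include <stdint.h>

enum { VGA_COLS = 80, VGA_ROWS = 25 };

/* Store one character cell into a text-mode buffer.
 * In a kernel, buf would be (volatile uint16_t *)0xB8000 (assumption:
 * color text mode); here it can be any array of VGA_COLS * VGA_ROWS cells. */
void vga_put(volatile uint16_t *buf, int row, int col, char c, uint8_t attr)
{
    assert(row >= 0 && row < VGA_ROWS);
    assert(col >= 0 && col < VGA_COLS);
    /* low byte: character code; high byte: color attribute */
    buf[row * VGA_COLS + col] =
        (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}
```

		No IN or OUT instruction appears: the compiler emits an
		ordinary store, and the hardware routes it to the
		device.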


    D. physical memory map [draw picture and see handout]

    --two points here.

	(1) physical address space is mostly ordinary RAM

	(2) some low-memory addresses actually refer to other things
	(that is, "mind the gap": devices, not memory, are mapped between
	640KB and 1MB)

	    --example: writing to VGA memory makes things appear on the
	    screen

	    --reset or power-on jumps to ROM at 0xffff0

	    --so what is the first instruction going to have to do?
	    [answer: probably jump]


	    +------------------+  <- 0xFFFFFFFF (4GB)
	    |      32-bit      |
	    |  memory mapped   |
	    |     devices      |
	    |                  |
	    /\/\/\/\/\/\/\/\/\/\

	    /\/\/\/\/\/\/\/\/\/\
	    |                  |
	    |      Unused      |
	    |                  |
	    +------------------+  <- depends on amount of RAM
	    |                  |
	    |                  |
	    | Extended Memory  |
	    |                  |
	    |                  |
	    +------------------+  <- 0x00100000 (1MB)
	    |     BIOS ROM     |
	    +------------------+  <- 0x000F0000 (960KB)
	    |  16-bit devices, |
	    |  expansion ROMs  |
	    +------------------+  <- 0x000C0000 (768KB)
	    |   VGA Display    |
	    +------------------+  <- 0x000A0000 (640KB)
	    |                  |
	    |    Low Memory    |
	    |                  |
	    +------------------+  <- 0x00000000
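
    --sketch (not from the handout): the picture above as a lookup in C.
    the enum and function names are made up for illustration. ram_top
    stands for the "depends on amount of RAM" boundary, and everything
    at or above it is lumped together because the picture does not say
    where the 32-bit devices begin.

```c
#include <assert.h>
#include <stdint.h>

enum region {
    LOW_MEMORY,          /* 0x00000000 up to 640KB */
    VGA_DISPLAY,         /* 640KB up to 768KB */
    DEVICES_AND_ROMS,    /* 768KB up to 960KB: 16-bit devices, expansion ROMs */
    BIOS_ROM,            /* 960KB up to 1MB */
    EXTENDED_MEMORY,     /* 1MB up to ram_top */
    UNUSED_OR_DEVICES    /* ram_top and above */
};

/* Classify a physical address according to the memory map picture. */
enum region classify(uint32_t pa, uint32_t ram_top)
{
    assert(ram_top >= 0x00100000u);   /* assume at least 1MB of address space is populated */
    if (pa < 0x000A0000u) return LOW_MEMORY;
    if (pa < 0x000C0000u) return VGA_DISPLAY;
    if (pa < 0x000F0000u) return DEVICES_AND_ROMS;
    if (pa < 0x00100000u) return BIOS_ROM;
    if (pa < ram_top)     return EXTENDED_MEMORY;
    return UNUSED_OR_DEVICES;
}
```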


    --is this an abstraction that the OS provides to others or an
    abstraction that the hardware is providing to the OS? [the latter]

    --job of hardware to turn request for address 0x00100004 into a request 
    that goes to the appropriate place in the actual RAM, perhaps at
    0x000A0004 (but we don't know).

    --students sometimes ask about the "why" of the picture above. here
    are some answers to natural questions.

	--first off, the 8088 had 20 bit address space, so 0xfffff was
	top of what could be addressed on the ancestor to the x86
	
	--the bios is mapped to 0xf0000 to 0xfffff because it is crammed
	at the top of 1 MB of address space (while 8088 had 20 bit
	address space, they weren't thinking there would actually be a
	megabyte of RAM)

	    --they *thought* they were locating the BIOS in an
	    out-of-the-way place

	    --history repeated itself. the ROMs on PCI cards get mapped
	    to the top of the 32 bit address space because, after all,
	    no one is going to have 4 GB of RAM in a computer

	--%eip starts at 0xffff0 because chip people didn't want to make
	assumptions about where the BIOS started. easier just to require
	that it be at the top of the address space and then start with a
	jump

	    --and why 0xffff0 instead of 0xffffa?

	    --probably because 0xffff:0000 in segment:offset is easy to
	    code in hardware
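
	    --sketch (not from the handout): the segment:offset
	    arithmetic behind that address. real_mode_addr is a made-up
	    name; the 20-bit mask models the 8088's 20 address lines.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address formation: physical = segment * 16 + offset,
 * truncated to 20 bits as on the 8088. */
uint32_t real_mode_addr(uint16_t seg, uint16_t off)
{
    uint32_t pa = ((uint32_t)seg << 4) + off;
    return pa & 0xFFFFFu;
}

/* Self-checking demo: two segment:offset pairs that name the reset address. */
void real_mode_demo(void)
{
    assert(real_mode_addr(0xFFFFu, 0x0000u) == 0xFFFF0u);
    assert(real_mode_addr(0xF000u, 0xFFF0u) == 0xFFFF0u);
}
```

	    only 16 bytes remain between 0xffff0 and the top of the
	    20-bit space, which is another reason the first instruction
	    is probably a jump.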
    
---------------------------------------------------------------------------

Admin notes

--All of you are in this class because you want a challenge, and that's
great.

--There is a range of background preparation in this class (for
instance, some of you have taken 439H, which covers many of the topics
that are classically covered in an intro-to-OS class). We've gone
through the 439H material, and here's what we're going to do:

    * We'll mostly avoid duplication of content. When we cover the same
    topic that 439H did, it will be at a different level of abstraction
    (closer to the hardware).
    
    * But we also don't want people to get lost. The job of filling in
    needed background will be done by three things:
	** Jason's sessions
	** Background reading
	** Homeworks posted on the home page

    * If there are gaps in your background, you should use some
    combination of the above three resources. You probably don't need to
    use all of the resources, but you probably can't get away with using
    none of them. Which you choose and use depends on you and your
    background. If you're unsure what strategy makes sense, please talk
    to Jason and me.

    * There's one special-case exception to all of the above: we're not
    going to cover threaded application programming, a classic topic in
    undergraduate OSes. But you should know how to do it. In fact, it's
    important enough that it's worth making sure you remember it from
    last semester. For that reason, you *will* do a lab and be examined
    on threaded application programming, but we will not cover it in
    lecture.

--If we're not getting either avoid-duplication or
background-to-fill-gaps right, please let us know 

    * We'll try to figure out a way for you to get us feedback
    anonymously if you want.

--As always, reading or assignment listed next to a lecture should be
done *before* the lecture.

---------------------------------------------------------------------------

3. x86 instructions

    --OS programmer needs to understand assembly language?

	can get surprisingly far in OS work without being an assembly
	hacker. the reason is that in general you want to try to get the
	compiler to write as much of the assembly as possible.

	    but of course there are some instructions you can't express
	    in C, such as clearing interrupts, loading segment
	    registers, etc.

	    and sometimes you need good performance

	--but there is debate about this

	    --"if OS is controlling the CPU, it should work directly
	    with the CPU's instructions"

	    --these people are probably annoyed by the fact that even
	    assembly instructions are basically macros that the x86
	    translates into internal micro-operations.

    --transition to .... 

	--x86: CISC architecture

    --unfortunately, two conventions

	Intel: op dst, src

	AT&T/gcc: op src, dst (used in the labs)

	    --uses b,w,l suffix on instructions to specify size
 
  
    --examples:

	movl %eax, %edx  ?    [edx = eax]   register

	movl $0x12c, %edx ?   [edx = 0x12c] immediate

	movl 0x12c, %edx  ?   [edx = *(0x12c)] direct

	movl (%ebx), %edx ?   [edx = *(ebx)] indirect

	movl 4(%ebx), %edx ?  [edx = *(ebx + 4)] displaced

	movl 4(%ebx,%eax,8), %edx ? [edx = *(ebx + eax*8 + 4)]

	xor %eax, %eax  ?      [eax = 0]
	    
	    [advantage over "mov $0, %eax" is code size]
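
	--sketch (not from the handout): the address arithmetic behind
	the general form disp(base,index,scale), with memory modeled as
	a little-endian byte array. effective_address and load32 are
	made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address named by disp(base,index,scale): base + index*scale + disp.
 * x86 allows scale factors of 1, 2, 4, and 8 only. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;   /* wraps modulo 2^32 */
}

/* Model of a 32-bit load from simulated memory (x86 is little-endian). */
uint32_t load32(const uint8_t *mem, size_t len, uint32_t addr)
{
    assert((size_t)addr + 4 <= len);
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

	with ebx = 16 and eax = 2, "movl 4(%ebx,%eax,8), %edx" reads
	the four bytes starting at address 16 + 2*8 + 4 = 36. this form
	is a natural fit for indexing an array of 8-byte elements.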

    --instruction classes

	data movement: MOV, PUSH, POP

	arithmetic: TEST, SHL, ADD, AND...

	i/o: IN, OUT, ....

	control: JMP, JZ, JNZ, CALL, RET

	string: REP MOVSB

	system: IRET, INT


    Intel architecture manual Volume 2 is the reference

	    Aside: 16-vs-32 bits
	    --------------------
	    --80386: 32 bit data and bus addresses 
	    --Now: the transition to 64 bit addresses 
	    --Chip gives  backward compatibility
		--boots in 16-bit mode, and boot.S switches to 32-bit mode 
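		--Prefix 0x66 toggles operand size (16 vs. 32 bits) for one instruction: 
		    --in 32-bit code, MOVW = 0x66 MOV 
		    -- ".code32" in boot.S tells assembler to assume 32-bit mode, so it inserts 0x66 for 16-bit operands 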


    --what do push and pop actually do?

    --stack grows down.........
	[draw picture]

    --examples:

	pushl %eax   [  subl $4, %esp 
			movl %eax, (%esp) ]

	
	popl %eax    [ movl (%esp), %eax
		       addl $4, %esp     ]

	
	call 0x12345  [ pseudo:
			  pushl %eip    (%eip = address of next instruction)
			  movl $0x12345, %eip]

	ret	       [ pseudo:
			    pop %eip ]
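
    --sketch (not from the handout): push, pop, call, and ret on a
    simulated machine in C. struct machine and the m_* names are made
    up for illustration; memory is a small word array and %esp is a
    byte address into it.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_WORDS = 64 };

struct machine {
    uint32_t mem[MEM_WORDS];   /* simulated memory, one 32-bit word per slot */
    uint32_t esp;              /* byte address; stack grows toward lower addresses */
    uint32_t eip;
};

void m_init(struct machine *m)
{
    for (int i = 0; i < MEM_WORDS; i++)
        m->mem[i] = 0;
    m->esp = MEM_WORDS * 4;    /* empty stack: %esp just past the top word */
    m->eip = 0;
}

void m_push(struct machine *m, uint32_t v)
{
    assert(m->esp >= 4);
    m->esp -= 4;                 /* subl $4, %esp  */
    m->mem[m->esp / 4] = v;      /* movl v, (%esp) */
}

uint32_t m_pop(struct machine *m)
{
    assert(m->esp + 4 <= MEM_WORDS * 4);
    uint32_t v = m->mem[m->esp / 4];   /* movl (%esp), v */
    m->esp += 4;                       /* addl $4, %esp  */
    return v;
}

/* insn_len is the length in bytes of the call instruction itself,
 * so eip + insn_len is the address of the instruction after the call. */
void m_call(struct machine *m, uint32_t target, uint32_t insn_len)
{
    m_push(m, m->eip + insn_len);   /* save the return address */
    m->eip = target;
}

void m_ret(struct machine *m)
{
    m->eip = m_pop(m);              /* resume at the saved return address */
}
```

    a call followed by a ret leaves %esp where it started and leaves
    %eip at the instruction after the call.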

---------------------------------------------------------------------------

    By the way, how do we go from C code to running x86 instructions?

	--requires compiler, assembler, linker, and loader

	--here's the picture:
     
           gcc    as     ld         loader
	.c --> .S --> .o ---> a.out -------> memory
                            ^
                           /
	  .c  --> .S --> .o

	--"ld" is the *linker* (despite the name). it serves a different
	function from the loader.

	--the *loader* takes a binary executable from the file system,
	puts it in memory, and creates a process to begin executing it.

---------------------------------------------------------------------------

4. gcc calling conventions

    --above we see how call and ret interact with the stack
	--call: updates %eip and pushes old %eip on the stack
	--ret: updates %eip by loading it with stored stack value

    --but what happens to a function's state, that is, the registers,
    when a function is called? they might need to be saved, or not. 
	
    --purely a matter of convention in the compiler **not** hardware
    architecture

    --here's what gcc does:

	--[
		draw blocks of code:
		    
		    main
		    f
		    g

		draw registers:

		    eip
		    ebp
		    eax
		    ecx
	    
		draw stack ]


	--at entry of a function:

	    looks like this: 
	
			arg 3
			arg 2
	                arg 1
		-->esp  [ret_addr]

	    [fill in picture above]

	    %eip points at first instruction of function
	    %esp+4 points at first argument 
	    %esp points at return address

	--after ret instruction:

	    %eip contains return address
	    %esp points at arguments pushed by caller
	    %eax contains return value (or trash if function is void)

	    %ecx, %edx may be trashed
	    %ebp, %ebx, %esi, %edi need to look the way that they did at
	      the time of the call
		
	--in other words:

	     %eax, %ecx, %edx are "caller save": caller's job to push
		them on the stack if it wants to save them

	     %ebp, %ebx, %esi, %edi are "callee save": if the callee
	        wants to use them, it is the callee's job to push them
		on the stack on entry and pop them (meaning restore
		their values by removing them from the stack) just
		before returning
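
	--sketch (not from the handout): the stack at function entry,
	simulated in C. the names are made up for illustration. the
	caller pushes arguments right to left, then the call instruction
	pushes the return address, so argument n (counting from 1) sits
	at 4*n(%esp) on entry.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 32 };

struct stack {
    uint32_t word[STACK_WORDS];
    uint32_t esp;   /* byte offset into word[]; grows toward 0 */
};

void stk_init(struct stack *s)
{
    for (int i = 0; i < STACK_WORDS; i++)
        s->word[i] = 0;
    s->esp = STACK_WORDS * 4;
}

void stk_push(struct stack *s, uint32_t v)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->word[s->esp / 4] = v;
}

/* What happens for a call f(a1, a2, a3): the caller pushes the
 * arguments right to left, then call pushes the return address. */
void simulate_call3(struct stack *s, uint32_t a1, uint32_t a2,
                    uint32_t a3, uint32_t ret_addr)
{
    stk_push(s, a3);
    stk_push(s, a2);
    stk_push(s, a1);
    stk_push(s, ret_addr);   /* done by the call instruction */
}

/* At callee entry: return address at (%esp), argument n at 4*n(%esp). */
uint32_t arg_at_entry(const struct stack *s, unsigned n)
{
    uint32_t addr = s->esp + 4 * n;
    assert(n >= 1 && addr + 4 <= STACK_WORDS * 4);
    return s->word[addr / 4];
}
```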


	FRAME POINTER

	--here's the picture of the stack when one function calls
	another:

	       +------------+   |
	       | arg 2      |   \
	       +------------+    >- previous function's stack frame
	       | arg 1      |   /
	       +------------+   |
	       | ret %eip   |   /
	       +============+   
        %ebp-> | saved %ebp |   \
	       +------------+   |
	       |            |   |
	       |   local    |   \
	       | variables, |    >- current function's stack frame
	       |  callee-   |   /
	       | saved vars,|
	       | etc.       |
	%esp-> +------------+   /


	--%esp moves to make stack bigger/smaller

	--%ebp points at saved %ebp from previous function
	    --saved %ebps form a chain; can walk the stack
	    --arguments and locals at fixed offsets from ebp

	--function prologue:
	    
		pushl %ebp
		movl %esp, %ebp
	       
	--function epilogue
		
		movl %ebp, %esp
		popl %ebp 
		ret

	--example

	    [see handout]
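
	--sketch (not from the handout): walking the chain of saved
	%ebp values over a simulated stack. walk_frames is a made-up
	name, and stopping at a saved %ebp of 0 is an assumption about
	how the outermost frame is set up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Follow saved-%ebp links starting from ebp, collecting return addresses.
 * mem holds 32-bit words; addresses are byte offsets into mem.
 * Assumption: the outermost frame stores 0 as its saved %ebp.
 * Returns the number of frames visited (at most max). */
size_t walk_frames(const uint32_t *mem, size_t mem_bytes, uint32_t ebp,
                   uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;
    while (ebp != 0 && n < max) {
        assert(ebp % 4 == 0 && (size_t)ebp + 8 <= mem_bytes);
        ret_addrs[n++] = mem[ebp / 4 + 1];   /* return address at 4(%ebp) */
        ebp = mem[ebp / 4];                  /* caller's %ebp at (%ebp)   */
    }
    return n;
}
```

	this works because the prologue makes every frame look the same:
	"pushl %ebp" stores the caller's %ebp directly below the return
	address, and "movl %esp, %ebp" points %ebp at that slot.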


	aside: if you're wondering why the convention is that ebx
	is callee saved while eax,ecx,edx are caller saved, here's
	an answer.

	    eax: "accumulator" register
	    ebx: "base" register
	    ecx: "count" register
	    edx: "data" register

	    idea was that ebx would point to the base of a data
	    structure (just as ebp meant "base pointer" and
	    pointed to the base of a frame).

	    ebx often points to a data segment (e.g., for a dynamic
	    library's data), by setting it up at the beginning of a function
	    and keeping it constant throughout
	    
	    Because it's a "stable" pointer, makes sense to have the
	    callee save it. ("if you're gonna touch this, put it back the
	    way you found it")

	    eax/ecx/edx are more ephemeral: used for particular calculations
	    and such. Makes sense to require the caller to save it if it
	    needs the values in there. ("if you really want to keep
	    these values, save them".)
	    
---------------------------------------------------------------------------

[Credit to Frans Kaashoek, Robert Morris, and Nickolai Zeldovich for
much of this content.]