Class 3
CS 202
05 February 2015

On the board
------------

1. Last time

2. Lab notes

3. Shell; fork()/exec() separation

---------------------------------------------------------------------------

1. Last time

    privileged vs. unprivileged mode

    process's view of memory

    stacks: main points:

        function calling is implemented by pushing things on the stack.

        the stack is the fundamental implementation technique that
        allows for variables to be "local" to a function.

        some of the calling function's state is pushed on the stack by
        the caller; some of the calling function's state is saved by the
        callee; see notes from last time for more about this.

        the return address is pushed on the stack.

        if I somehow arranged to copy the call stack to another process
        and said "here's your stack", then that other process would
        *appear* to return exactly the same as the first one would.
        (this latter point is the connection to fork() and process
        creation).

    described some system calls

    every process can usually expect to begin life with three file
    descriptors already open:

        0: represents the input to the process (e.g., tied to terminal)
        1: represents the output
        2: represents the error output

    these are sometimes known as stdin, stdout, stderr

    picture I wanted to draw:

    if we have one function call, we have this picture....

                  ...
              +------------+   |
              | arg 2      |   \
              +------------+    >- previous function's stack frame
              | arg 1      |   /
              +------------+   |
              | ret %eip   |   /
              +============+
    %ebp->    | saved %ebp |   \
              +------------+   |
              |            |   |
              |   local    |   \
              | variables, |    >- current function's stack frame
              | callee-    |   /
              | saved vars,|   |
              | etc.       |   |
    %esp->    +------------+   /

    ....and then if the currently executing function calls another, we
    get this picture:

              +------------+
              | arg 2      |
              +------------+
              | arg 1      |
              +------------+
              | ret %eip   |
              +============+
              | saved %ebp |   |
              +------------+   \
              |            |   |
              |   local    |   |
              | variables, |   |
              | callee-    |   |
              | saved vars,|   |
              | etc.       |   |
              +------------+   |
              | arg 2'     |   \
              +------------+    >- previous function's stack frame
              | arg 1'     |   /
              +------------+   |
              | ret %eip'  |   /
              +============+
    %ebp'->   | saved %ebp'|   \
              +------------+   |
              |            |   |
              |   local    |   \
              | variables, |    >- new function's stack frame
              | callee-    |   /
              | saved vars,|   |
              | etc.       |   |
    %esp'->   +------------+   /

    Above, the quote character (') means "these values are different
    from the ones without the quote character".

2. Lab notes

    casting, OS in more detail, do_fork()-vs-fork()

    (a) pointers and integers can be cast to one another. example:

        uint32_t program_text_old;
        uint32_t program_text_new;
        char* src_pointer = (char*)program_text_old;
        char* dst_pointer = (char*)program_text_new;
        memcpy(dst_pointer, src_pointer, n);

        this is normally a poor programming practice, because it leads
        to bugs. however, in low-level systems code, given current
        tools, platforms, etc., such casting is pretty much
        unavoidable. be careful when you cast!

    (b) the whole OS as a unit

        need to wrap your head around it. a good way to do this is by
        studying the code, even if it doesn't make sense at first. the
        hints in the code are fairly helpful.

        here's a quick rundown:

        1. machine turns on.

        2. starts executing code from its BIOS, which is a program that
           lives in ROM
           (BIOS = basic input/output system)
           (ROM = read-only memory)

        3. BIOS loads the boot loader from the disk.

        4. the boot loader loads the kernel, and jumps to its *entry
           point* (which is an address that the programmer of the kernel
           specified). This address shows up in the ELF file for the
           kernel (ELF = executable and linkable format; it's a file
           format for storing binary executables, among other things).

           --> QUESTION: how did the programmer tell the linker where to
           set the entry point in the ELF file?

           Answer: see the excerpt "-e multiboot_start" in
           lab1/GNUmakefile. This argument is passed to the linker (ld).
           Now, note that multiboot_start is defined in k-int.S, and
           that entry point quickly calls "start"

        5. the kernel starts executing at start()

        6. the kernel breathes life into a process, and transfers
           control to the process (via "iret").

        7. now behavior is governed by the state diagram.

        8. in WeensyOS 1, there are no timer interrupts. A process will
           not stop executing unless it does something illegal or traps
           to the kernel.

        9. the trap vectors control into the kernel, which starts
           executing at the interrupt() function.

           there's a convention:

           user-level process knows that it needs to do, say,
                int $0x30
           and kernel knows that when it gets "interrupt number 0x30"
           (48 in decimal), it means "user-level process is asking me to
           report its PID".

    (c) fork() vs. do_fork().

        one of you asked about do_fork() vs. fork().

        do_fork() is the code in the kernel that responds to the system
        call fork(). it executes exactly once. its purpose is to create
        another process "in the image of" (or "that takes after", or
        "that is a nearly complete copy of") the invoking process.

        do_fork() arranges for fork() to appear "to return twice."
        however, the quoted phrase is perhaps confusing. a better way to
        say it is that do_fork() creates another process, and in both
        processes, execution resumes at the return from fork().

3. Shell

    --a program that creates processes

    --the human's interface to the computer

    --GUIs (graphical user interfaces) are another kind of shell.

    A. How does the shell start programs?

        --example:
            $ ls

        [see panel 1 on handout; go line-by-line]

        --calls fork(), which creates a copy of the shell. now there are
        two copies of the shell running

        --then calls exec(), which loads the new program's instructions
        into memory and begins executing them.

        --(exec invokes the loader)

            while (1) {
                write(1, "$ ", 2);
                readcommand(command, args);   // parse input
                if ((pid = fork()) == 0)      // child?
                    execve(command, args, 0);
                else if (pid > 0)             // parent?
                    wait(0);                  // wait for child
                else
                    perror("failed to fork");
            }

        --how can shell wait for the end of a process?

            --with wait() or waitpid() system calls

        --QUESTION: why is fork different from exec?
          What the heck?

            * We will come back to this.

    B. Redirection

        What does this do?

            $ ls > tmp1

        How is it implemented?

        --just before exec, shell does:

            close(1)
            open("tmp1", O_TRUNC | O_CREAT | O_WRONLY, 0666)

          which automatically assigns tmp1 to be fd 1 (open() returns
          the lowest-numbered unused descriptor, and after the close()
          that is 1)

        --now, when ls runs, it continues to write(1,...), but "1" now
        means something else.

        [skipped in lecture but relevant to lab]

        What about

            $ sh < script > tmp1

        where script contains

            echo abc
            echo def

        [draw picture]

    C. Pipelines

        pipe() system call: see panel 3 on the handout

        example pipeline:

            $ yes abcd | head -3

        --assume 'yes' and 'head' were written in C

        --how does one of them _send its output to the other_? without
        rewriting these programs? see the code that implements "yes": it
        doesn't know anything about the program "head".

        how does the shell implement pipelines?

        1.  $ our_yes abcd
            abcd
            abcd
            .....

            look at file descriptors:

                    0           1
                /dev/tty    /dev/tty

            [now look at panels 4 and 5 on the handout]

        2.  $ our_yes zyxw | head -4
            zyxw
            zyxw
            zyxw
            zyxw

        3.  what is the shell doing?

            --draw the initial fd table

            --show what the shell does, using panel 6 on shell handout

                                0           1
                our_yes:    /dev/tty      pipe
                head:         pipe      /dev/tty

            --now look: it all works

        --questions/points

            1. who is waiting for whom?

                shell waits for the right-hand end of the pipeline.

                if left-hand process finishes first, great, it exits

                if right-hand process has already exited, then the
                left-hand process gets SIGPIPE on its next write to the
                pipe, and dies

            2. Why close read-end/write-end in child/parent?

                two answers:

                (a) ensure that every process starts with exactly 3 file
                descriptors.

                (b) ensure that reading from the pipe returns "end of
                file" after the first command exits.

                (This is confusing. What's going on is that the reading
                process also started out with a "write end" to the pipe
                (remember, there were *4* file descriptors in all: the 2
                for the pipeline times 2, because of the fork). if the
                reading process's copy of the write end is not closed,
                then the kernel cannot return "end of file" when the
                process holding the other copy of the write end -- that
                is, the writing process -- exits.)

        * Why are pipelines interesting?

            --what if ls had to be able to paginate its output?

            --with pipes, program doesn't have to get recompiled

            --program doesn't have to take file/device/whatever as input

            --prior to Unix, there was little to no composability.
            programs had to do everything or use temporary files
            awkwardly

    D. The power of the fork/exec separation

        # [an innovation from the original Unix. possibly/probably lucky
        # design choice at the time. but turns out to work really well.
        # allows the child to manipulate environment and file descriptors
        # *before* exec, so that the *new* program may in fact encounter a
        # different environment]
        #

        --To generalize redirections and pipelines, there are lots of
        things the parent shell might want to manipulate in the child
        process: file descriptors, environment, resource limits.

        --yet fork() requires no arguments!

        --Contrast with CreateProcess on Windows:

            BOOL CreateProcess(
                name,
                commandline,
                security_attr,
                thr_security_attr,
                inheritance?,
                other flags,
                new_env,
                curr_dir_name,
                .....)

            [http://msdn.microsoft.com/en-us/library/ms682425(v=VS.85).aspx]

            there's also CreateProcessAsUser, CreateProcessWithLogonW,
            CreateProcessWithTokenW, ...

        * The issue is that any conceivable manipulation of the
        environment of the new process has to be passed through
        arguments, instead of via arbitrary code.

        # in other words:
        #
        # whoever calls CreateProcess() (or its variant) needs
        # to perfectly configure the process before it starts running.
        #
        # with fork(), whoever calls fork() **is still running** so
        # can arrange to do whatever it wants, without having to work
        # through a rigid interface like the above. allows arbitrary
        # "setup" of the process before exec().

    E. Discussion: what makes a good abstraction?

        --simple but powerful

        --examples we've seen:

            --stdin (0), stdout (1), stderr (2)
                [nice by themselves, but when combined with the
                mechanisms below, things get even better]

            --file descriptors

            --fork/exec() separation

        --very few mechanisms lead to a lot of possible functionality

    this is a fork bomb in bash:

        $ :(){ : | : & }; :

    http://www.cyberciti.biz/faq/understanding-bash-fork-bomb/
    [thanks to Eddie Kohler for this pointer]
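
    [sketch, not from the handout: the redirection step from 3B and the
    pipeline steps from 3C, written out in C as one possible reading of
    those sections. run_pipeline() is a hypothetical helper name; the
    sketch assumes the caller has fds 0, 1, 2 open and that both commands
    can be found via execvp(). the handout's panels may do this
    differently.]

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (an assumption, not the handout's code):
 * behaves roughly like "left | right > outfile".
 * Returns the exit status of the right-hand command, or -1 on error. */
int run_pipeline(char *const left[], char *const right[], const char *outfile)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    int status = 0;
    pid_t lpid, rpid;

    assert(left && left[0] && right && right[0] && outfile);

    if (pipe(fds) < 0)
        return -1;

    lpid = fork();
    if (lpid == 0) {            /* left child: stdout -> pipe */
        dup2(fds[1], 1);
        close(fds[0]);          /* back to exactly fds 0, 1, 2 */
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);             /* reached only if exec failed */
    }

    rpid = (lpid > 0) ? fork() : -1;
    if (rpid == 0) {            /* right child: stdin <- pipe */
        dup2(fds[0], 0);
        close(fds[0]);
        close(fds[1]);          /* else this child holds a write end forever */
        close(1);               /* redirection as in 3B: free fd 1 ... */
        if (open(outfile, O_TRUNC | O_CREAT | O_WRONLY, 0666) != 1)
            _exit(126);         /* ... so that open() hands back fd 1 */
        execvp(right[0], right);
        _exit(127);
    }

    /* parent: close its own copies, or the reader never sees end of file */
    close(fds[0]);
    close(fds[1]);

    if (rpid > 0)
        waitpid(rpid, &status, 0);  /* the shell waits for the right-hand end */
    if (lpid > 0)
        waitpid(lpid, NULL, 0);     /* reap the left-hand end too */

    if (lpid < 0 || rpid < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    for example, with left = {"yes", "abcd", NULL}, right = {"head",
    "-3", NULL}, and outfile = "tmp1", the call is meant to behave like
    "$ yes abcd | head -3 > tmp1". note that all of the setup (dup2,
    close, open) is ordinary code run in the child between fork() and
    exec(), which is the point of section D.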