Class 6 CS372H 2 February 2012

On the board
------------

1. Last time

2. Process control: the shell

3. Unix: mechanics

4. Unix: some perspective

---------------------------------------------------------------------------

1. Last time

    --finished discussing page faults (uses and costs)

    --introduced processes
        --syscalls
        --shell. two reasons to study it:
            --for its role as "process starter"
            --it's a case study of the use of system calls

2. Process control: the shell

    write on the board:

        * How does the shell start programs? [last time]
        * Redirection [last time]
        * Pipelines (or filters)
        * The power of the fork/exec separation
        * What makes a good abstraction?

    A. How does the shell start programs? [last time]

    B. Redirection [last time]

    C. Pipelines (or filters)

        * What are these?

            --a way of composing programs

            --example:

                $ yes abcd | head -3

            --'yes' and 'head' are probably C programs

            --we now ask, "how does the output of one become the input
              of the other, without either program being rewritten?"

        * Detour: pipes and file descriptors

            --see panels 3 and 4 on the handout

        * How does the shell implement pipelines?
          (a code sketch appears at the end of this section)

            1. $ our_yes abcd
               abcd
               abcd
               .....

               look at the file descriptors:

                            0           1
                        /dev/tty    /dev/tty

            2. $ our_yes zyxw | head -4
               zyxw
               zyxw
               zyxw
               zyxw

            3. what is the shell doing?

                --draw the initial fd table

                --show what the shell does, using panel 5 on the shell
                  handout

                                    0           1
                    our_yes:    /dev/tty       pipe
                    head:         pipe       /dev/tty

                --now look: it all works

            --questions/points

                1. who is waiting for whom?

                    shell waits for the right-hand end of the pipeline.

                    if the left-hand process finishes first, great, it
                    exits

                    if the right-hand process has already exited, then
                    the left-hand process gets SIGPIPE (the next time it
                    writes to the pipe), and dies

                2. Why close the read end / write end in child/parent?

                    two answers:

                    (a) ensure that every process starts with 3 file
                        descriptors.

                    (b) ensure that reading from the pipe returns "end
                        of file" after the first command exits.

                        (This is confusing. What's going on is that the
                        reading process also started out with a "write
                        end" of the pipe (remember, there were *4* file
                        descriptors in all: the 2 for the pipe, times 2
                        because of the fork). if the reading process's
                        copy of the write end is not closed, then the
                        kernel cannot return "end of file" from read()
                        once the other copy of the write end -- that is,
                        the writing child's copy -- goes away when that
                        child exits.)

        * ASK: Why are pipelines useful?

            --composability

                --what if ls had to be able to paginate its own output?

                --with pipes, a program doesn't have to get recompiled

                --a program doesn't have to take a file/device/whatever
                  as input

            --wait, can't we just use temporary files? isn't

                echo "abcde...z" | lpr

              equivalent to

                echo "abcde...z" > /tmp/foo ; lpr < /tmp/foo

              ?

              no.

                (1) no state left sitting around in case 1

                (2) pipe redirection places no limit on the data
                    transferred

                (3) can use pipes for synchronization. quoting the xv6
                    book
                    (http://pdos.csail.mit.edu/6.828/2011/xv6/book-rev6.pdf),
                    "pipes allow for synchronization: two processes can
                    use a pair of pipes to send messages back and forth
                    to each other, with each read blocking its calling
                    process until the other process has sent data with
                    write."

            --prior to Unix, there was little to no composability.
              programs had to do everything themselves or use temporary
              files awkwardly

            [Note: this is why you'll share file descriptor state across
            fork and spawn in lab 7.]

        * ASK: what is the disadvantage of pipelines?

            --linear: hard to, for example, compare the outputs of two
              programs using just the command line
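        * A minimal sketch of what the shell does for a pipeline like
          "our_yes zyxw | head -4" -- this is not the handout's code;
          the argv arrays below are made-up stand-ins, and most error
          handling is omitted:

            /* pipeline_sketch.c: how a shell might run "left | right" */
            #include <stdio.h>
            #include <stdlib.h>
            #include <sys/types.h>
            #include <sys/wait.h>
            #include <unistd.h>

            int main(void) {
                char *left[]  = { "yes", "zyxw", NULL };  /* stands in for our_yes */
                char *right[] = { "head", "-4", NULL };

                int fds[2];
                if (pipe(fds) < 0) { perror("pipe"); exit(1); }
                /* fds[0] is the read end, fds[1] is the write end */

                pid_t pid1 = fork();
                if (pid1 == 0) {
                    /* left-hand child: its stdout becomes the pipe's write end */
                    dup2(fds[1], 1);
                    close(fds[0]);           /* close the copies it doesn't need, */
                    close(fds[1]);           /* so it keeps exactly fds 0, 1, 2   */
                    execvp(left[0], left);
                    perror("execvp"); exit(1);
                }

                pid_t pid2 = fork();
                if (pid2 == 0) {
                    /* right-hand child: its stdin becomes the pipe's read end */
                    dup2(fds[0], 0);
                    close(fds[0]);
                    close(fds[1]);           /* crucial: else read() never sees EOF */
                    execvp(right[0], right);
                    perror("execvp"); exit(1);
                }

                /* the parent (shell) closes both of its copies, then waits
                   for the right-hand end of the pipeline */
                close(fds[0]);
                close(fds[1]);
                waitpid(pid2, NULL, 0);
                return 0;
            }

          notice that all the fd rearrangement (dup2, close) is ordinary
          code run by each child between fork() and exec() -- which is
          exactly the flexibility discussed in the next item.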
    D. The power of the fork/exec separation

        * Contrast with CreateProcess on Windows:

            BOOL CreateProcess(
                name,
                commandline,
                security_attr,       /* process attributes */
                thr_security_attr,   /* thread attributes */
                inheritance?,        /* inherit handles */
                other flags,
                new_env,
                curr_dir_name,
                .....)

            [http://msdn.microsoft.com/en-us/library/ms682425(v=VS.85).aspx]

            there's also CreateProcessAsUser, CreateProcessWithLogonW,
            CreateProcessWithTokenW, ...

        * The issue is that any conceivable manipulation of the
          environment of the new process has to be passed through
          arguments, instead of being expressed as arbitrary code that
          the child runs between fork() and exec().

    E. What makes a good abstraction?

        --simple but powerful

        --examples we've seen:

            --stdin (0), stdout (1), stderr (2)
              [nice by itself, but when combined with the mechanisms
              below, things get even better]

            --file descriptors

            --fork()/exec() separation

        --very few mechanisms lead to a lot of possible functionality

3. Discussion of Unix

    A. Why are we reading this paper?

        1. it's a great example of a small number of mechanisms going
           very far (a high ratio of capabilities to mechanism)

            a. stuff was added to Unix in the 1980s, at Berkeley

            b. Andy Tanenbaum: Version 7 "was a dramatic improvement
               over its predecessors, and over its successors as well".

        2. it might seem obvious, but that's only because this is now
           the way everyone does everything. at the time, Unix was
           (mostly) new.

        3. the paper is a series of inspired choices that have withstood
           the test of time.

    B. Wasn't some of this stuff obvious at the time?

    C. ASK: The other paper says, "the structure of files is controlled
       by the programs that use them, not by the system". What do you
       think that means, and why is it a big deal?

    D. To change directory, you issue the system call chdir(). Why does
       cd (change directory), a shell command that calls chdir(), have
       to be built into the shell? What would happen if cd were a
       separate program, like ls, that simply called chdir()?

    E. Sharing of file descriptor state.

        --ASK: why does this work?

            (date ; ls) > tmp

          (a sketch of the mechanism follows below)

        --ASK: what is the disadvantage of this?
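        --a small sketch (not from the paper or the handout) of the
          mechanism: fork() copies the file descriptor, but both copies
          refer to the *same* open-file entry in the kernel, including
          the current offset. so the second command's output lands right
          after the first's, with no seeking. the file name "tmp" and
          the message strings below are just placeholders:

            /* offset_sharing.c: sketch of why "(date ; ls) > tmp" works */
            #include <fcntl.h>
            #include <stdio.h>
            #include <string.h>
            #include <sys/wait.h>
            #include <unistd.h>

            int main(void) {
                /* like the shell handling "> tmp": open once, before forking */
                int fd = open("tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (fd < 0) { perror("open"); return 1; }

                if (fork() == 0) {
                    /* plays the role of "date": the first command writes */
                    const char *m1 = "output of first command\n";
                    write(fd, m1, strlen(m1));
                    _exit(0);
                }
                wait(NULL);   /* let the first command finish */

                /* plays the role of "ls": the second command's writes start
                   at the shared offset, right after the first's output */
                const char *m2 = "output of second command\n";
                write(fd, m2, strlen(m2));
                return 0;
            }

          if each command instead opened tmp itself, the second open
          would start writing at offset 0 and clobber the first
          command's output (unless the file were opened with O_APPEND).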
4. Perspective

    --ASK: Section 8 of the other paper: "the success of the Unix system
      is largely due to the fact that it was not designed to meet any
      predefined objectives".

        --do you believe this?

    --ASK: How was Unix then different from Unix today?

        --Ran on *tiny* computers!

            --ran on the PDP-11, a 16-bit machine, which means a
              process's address space was only 64KB

            --also means processes had to access files serially (i.e.,
              read continuously, or seek and then start reading). why?
              because files couldn't be mapped into memory.

            --also means processes *had* to be small (they only had 64KB
              of memory)

            --interestingly, physical memory was larger than a process's
              virtual memory space.

                --so why did they need virtual memory?

                --answer: to allow multiple programs to be resident at
                  once

        --this is why the development effort was focused not on
          *intra-process* enhancement (virtual memory, threads, etc.)
          but rather on *inter-process* glue: pipes, filters, exec

    --ASK: seems like there was some luck. where?

        --the process control scheme (shell + fork + exec)

            Ritchie says it was the easiest thing to implement, but it
            had huge benefits:

                detached processes

                the same program for interactive and batch jobs

            compare to CreateProcess() on Windows, with its long
            parameter list (see section 2.D above)

    --ASK: what are the disadvantages of the current pipes interface?

    --ASK: Besides pipes, what advances did Unix represent?

        --before Unix, there was little in the way of scripting; shell
          scripts let you "script" the system, or automate it in some
          way. this was novel.

        --modular programs

        --small number of mechanisms: file descriptors representing
          sockets, files, devices, pipes (the same read()/write() calls
          work on all of them; see the sketch at the end of these notes)

    --ASK: How did this happen?

        --some "salvation through suffering", as the paper puts it. The
          design was forced to be economical.
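    --the sketch referenced above (an illustration, not from the paper):
      the same read()/write() loop works no matter what kind of object
      fds 0 and 1 name -- a regular file, a pipe, or a terminal device.
      call the program "copy" (the name is made up):

        /* copy.c: copy fd 0 to fd 1, whatever they happen to be */
        #include <unistd.h>

        int main(void) {
            char buf[512];
            ssize_t n;
            /* the program neither knows nor cares whether fd 0 and fd 1
               are files, pipes, or devices */
            while ((n = read(0, buf, sizeof buf)) > 0)
                write(1, buf, n);
            return 0;
        }

      the same binary then works, unchanged, in all of these:

        $ ./copy < somefile              # fd 0 names a file
        $ yes hello | ./copy | head -1   # fd 0 and fd 1 name pipes
        $ ./copy                         # fd 0 names the terminal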