Class 6 CS372H 2 February 2012

On the board
------------

1. Last time

2. Process control: the shell

3. Unix: mechanics

4. Unix: some perspective

---------------------------------------------------------------------------

1. Last time

    --finished discussing page faults (uses and costs)

    --introduced processes
        --syscalls
        --shell. two reasons to study it:
            --for its role as "process starter"
            --it's a case study of the use of system calls

2. Process control: the shell

    write on the board:

        * How does the shell start programs? [last time]
        * Redirection [last time]
        * Pipelines (or filters)
        * The power of the fork/exec separation
        * What makes a good abstraction?

    A. How does the shell start programs? [last time]

    B. Redirection [last time]

    C. Pipelines (or filters)

        * What are these?

            --a way of composing programs

            --example:

                $ yes abcd | head -3

            --'yes' and 'head' are probably C programs

            --we now ask, "how does the output of one become the input
              of the other, without either program being rewritten?"

        * Detour: pipes and file descriptors

            --see panels 3 and 4 on the handout

        * How does the shell implement pipelines?
          (a code sketch appears at the end of this section)

            1. $ our_yes abcd
               abcd
               abcd
               .....

               look at the file descriptors:

                            0           1
                        /dev/tty    /dev/tty

            2. $ our_yes zyxw | head -4
               zyxw
               zyxw
               zyxw
               zyxw

            3. what is the shell doing?

                --draw the initial fd table

                --show what the shell does, using panel 5 on the shell
                  handout

                                    0           1
                    our_yes:    /dev/tty       pipe
                    head:         pipe       /dev/tty

                --now look: it all works

            --questions/points

                1. who is waiting for whom?

                    shell waits for the right-hand end of the pipeline.

                    if the left-hand process finishes first, great, it
                    exits

                    if the right-hand process has already exited, then
                    the left-hand process gets SIGPIPE (the next time it
                    writes to the pipe), and dies

                2. Why close the read end / write end in child/parent?

                    two answers:

                    (a) ensure that every process starts with 3 file
                        descriptors.

                    (b) ensure that reading from the pipe returns "end
                        of file" after the first command exits.

                        (This is confusing. What's going on is that the
                        reading process also started out with a "write
                        end" of the pipe (remember, there were *4* file
                        descriptors in all: the 2 for the pipe, times 2
                        because of the fork). if the reading process's
                        copy of the write end is not closed, then the
                        kernel cannot return "end of file" from read()
                        once the other copy of the write end -- that is,
                        the writing child's copy -- goes away when that
                        child exits.)

        * ASK: Why are pipelines useful?

            --composability

                --what if ls had to be able to paginate its own output?

                --with pipes, a program doesn't have to get recompiled

                --a program doesn't have to take a file/device/whatever
                  as input

            --wait, can't we just use temporary files? isn't

                echo "abcde...z" | lpr

              equivalent to

                echo "abcde...z" > /tmp/foo ; lpr < /tmp/foo

              ?

              no.

                (1) no state left sitting around in case 1

                (2) pipe redirection places no limit on the data
                    transferred

                (3) can use pipes for synchronization. quoting the xv6
                    book
                    (http://pdos.csail.mit.edu/6.828/2011/xv6/book-rev6.pdf),
                    "pipes allow for synchronization: two processes can
                    use a pair of pipes to send messages back and forth
                    to each other, with each read blocking its calling
                    process until the other process has sent data with
                    write."

            --prior to Unix, there was little to no composability.
              programs had to do everything themselves or use temporary
              files awkwardly

            [Note: this is why you'll share file descriptor state across
            fork and spawn in lab 7.]

        * ASK: what is the disadvantage of pipelines?

            --linear: hard to, for example, compare the outputs of two
              programs using just the command line
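        * A minimal sketch of what the shell does for a pipeline like
          "our_yes zyxw | head -4" -- this is not the handout's code;
          the argv arrays below are made-up stand-ins, and most error
          handling is omitted:

            /* pipeline_sketch.c: how a shell might run "left | right" */
            #include <stdio.h>
            #include <stdlib.h>
            #include <sys/types.h>
            #include <sys/wait.h>
            #include <unistd.h>

            int main(void) {
                char *left[]  = { "yes", "zyxw", NULL };  /* stands in for our_yes */
                char *right[] = { "head", "-4", NULL };

                int fds[2];
                if (pipe(fds) < 0) { perror("pipe"); exit(1); }
                /* fds[0] is the read end, fds[1] is the write end */

                pid_t pid1 = fork();
                if (pid1 == 0) {
                    /* left-hand child: its stdout becomes the pipe's write end */
                    dup2(fds[1], 1);
                    close(fds[0]);           /* close the copies it doesn't need, */
                    close(fds[1]);           /* so it keeps exactly fds 0, 1, 2   */
                    execvp(left[0], left);
                    perror("execvp"); exit(1);
                }

                pid_t pid2 = fork();
                if (pid2 == 0) {
                    /* right-hand child: its stdin becomes the pipe's read end */
                    dup2(fds[0], 0);
                    close(fds[0]);
                    close(fds[1]);           /* crucial: else read() never sees EOF */
                    execvp(right[0], right);
                    perror("execvp"); exit(1);
                }

                /* the parent (shell) closes both of its copies, then waits
                   for the right-hand end of the pipeline */
                close(fds[0]);
                close(fds[1]);
                waitpid(pid2, NULL, 0);
                return 0;
            }

          notice that all the fd rearrangement (dup2, close) is ordinary
          code run by each child between fork() and exec() -- which is
          exactly the flexibility discussed in the next item.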
    D. The power of the fork/exec separation

        * Contrast with CreateProcess on Windows:

            BOOL CreateProcess(
                name,
                commandline,
                security_attr,       /* process attributes */
                thr_security_attr,   /* thread attributes */
                inheritance?,        /* inherit handles */
                other flags,
                new_env,
                curr_dir_name,
                .....)

            [http://msdn.microsoft.com/en-us/library/ms682425(v=VS.85).aspx]

            there's also CreateProcessAsUser, CreateProcessWithLogonW,
            CreateProcessWithTokenW, ...

        * The issue is that any conceivable manipulation of the
          environment of the new process has to be passed through
          arguments, instead of being expressed as arbitrary code that
          the child runs between fork() and exec().

    E. What makes a good abstraction?

        --simple but powerful

        --examples we've seen:

            --stdin (0), stdout (1), stderr (2)
              [nice by itself, but when combined with the mechanisms
              below, things get even better]

            --file descriptors

            --fork()/exec() separation

        --very few mechanisms lead to a lot of possible functionality

3. Discussion of Unix

    A. Why are we reading this paper?

        1. it's a great example of a small number of mechanisms going
           very far (a high ratio of capabilities to mechanism)

            a. stuff was added to Unix in the 1980s, at Berkeley

            b. Andy Tanenbaum: Version 7 "was a dramatic improvement
               over its predecessors, and over its successors as well".

        2. it might seem obvious, but that's only because this is now
           the way everyone does everything. at the time, Unix was
           (mostly) new.

        3. the paper is a series of inspired choices that have withstood
           the test of time.

    B. Wasn't some of this stuff obvious at the time?

    C. ASK: The other paper says, "the structure of files is controlled
       by the programs that use them, not by the system". What do you
       think that means, and why is it a big deal?

    D. To change directory, you issue the system call chdir(). Why does
       cd (change directory), a shell command that calls chdir(), have
       to be built into the shell? What would happen if cd were a
       separate program, like ls, that simply called chdir()?

    E. Sharing of file descriptor state.

        --ASK: why does this work?

            (date ; ls) > tmp

          (a sketch of the mechanism follows below)

        --ASK: what is the disadvantage of this?
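        --a small sketch (not from the paper or the handout) of the
          mechanism: fork() copies the file descriptor, but both copies
          refer to the *same* open-file entry in the kernel, including
          the current offset. so the second command's output lands right
          after the first's, with no seeking. the file name "tmp" and
          the message strings below are just placeholders:

            /* offset_sharing.c: sketch of why "(date ; ls) > tmp" works */
            #include <fcntl.h>
            #include <stdio.h>
            #include <string.h>
            #include <sys/wait.h>
            #include <unistd.h>

            int main(void) {
                /* like the shell handling "> tmp": open once, before forking */
                int fd = open("tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (fd < 0) { perror("open"); return 1; }

                if (fork() == 0) {
                    /* plays the role of "date": the first command writes */
                    const char *m1 = "output of first command\n";
                    write(fd, m1, strlen(m1));
                    _exit(0);
                }
                wait(NULL);   /* let the first command finish */

                /* plays the role of "ls": the second command's writes start
                   at the shared offset, right after the first's output */
                const char *m2 = "output of second command\n";
                write(fd, m2, strlen(m2));
                return 0;
            }

          if each command instead opened tmp itself, the second open
          would start writing at offset 0 and clobber the first
          command's output (unless the file were opened with O_APPEND).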
4. Perspective

    --ASK: Section 8 of the other paper: "the success of the Unix system
      is largely due to the fact that it was not designed to meet any
      predefined objectives".

        --do you believe this?

    --ASK: How was Unix then different from Unix today?

        --Ran on *tiny* computers!

            --ran on the PDP-11, a 16-bit machine, which means a
              process's address space was only 64KB

            --also means processes had to access files serially (i.e.,
              read continuously, or seek and then start reading). why?
              because files couldn't be mapped into memory.

            --also means processes *had* to be small (they only had 64KB
              of memory)

            --interestingly, physical memory was larger than a process's
              virtual memory space.

                --so why did they need virtual memory?

                --answer: to allow multiple programs to be resident at
                  once

        --this is why the development effort was focused not on
          *intra-process* enhancement (virtual memory, threads, etc.)
          but rather on *inter-process* glue: pipes, filters, exec

    --ASK: seems like there was some luck. where?

        --the process control scheme (shell + fork + exec)

            Ritchie says it was the easiest thing to implement, but it
            had huge benefits:

                detached processes

                the same program for interactive and batch jobs

            compare to CreateProcess() on Windows, with its long
            parameter list (see section 2.D above)

    --ASK: what are the disadvantages of the current pipes interface?

    --ASK: Besides pipes, what advances did Unix represent?

        --before Unix, there was little in the way of scripting; shell
          scripts let you "script" the system, or automate it in some
          way. this was novel.

        --modular programs

        --small number of mechanisms: file descriptors representing
          sockets, files, devices, pipes (the same read()/write() calls
          work on all of them; see the sketch at the end of these notes)

    --ASK: How did this happen?

        --some "salvation through suffering", as the paper puts it. The
          design was forced to be economical.
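    --the sketch referenced above (an illustration, not from the paper):
      the same read()/write() loop works no matter what kind of object
      fds 0 and 1 name -- a regular file, a pipe, or a terminal device.
      call the program "copy" (the name is made up):

        /* copy.c: copy fd 0 to fd 1, whatever they happen to be */
        #include <unistd.h>

        int main(void) {
            char buf[512];
            ssize_t n;
            /* the program neither knows nor cares whether fd 0 and fd 1
               are files, pipes, or devices */
            while ((n = read(0, buf, sizeof buf)) > 0)
                write(1, buf, n);
            return 0;
        }

      the same binary then works, unchanged, in all of these:

        $ ./copy < somefile              # fd 0 names a file
        $ yes hello | ./copy | head -1   # fd 0 and fd 1 name pipes
        $ ./copy                         # fd 0 names the terminal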