Class 24 CS 372H 19 April 2012

On the board
------------
1. Last time
2. Deterministic execution
3. Determinator
4. Determinator: quick discussion
5. Guest presenter: Chris Cotter

---------------------------------------------------------------------------

1. Last time

--transactions. discussed:
    --crash recovery (in DB context)
    --isolation (in OS context)
--there's a classic DB approach to isolation (2-phase locking, NOT to be
  confused with 2-phase commit) described in the notes from last time.

2. Deterministic execution

y = 12. two threads. one executes t1, one t2:

    t1() { x = y + 1; }
    t2() { y = y * 2; }

--what result would Determinator provide for x?
    --answer: 13
--interleaving schedule: **predictable**, not just deterministic
--all nondeterministic inputs (time, random numbers, etc.): eliminated
--why do we want this?

3. Determinator

A. What's the approach?

--Kernel exposes Put/Get/Ret
--User-level abstractions built on top of those
--Note: the kernel internally is not deterministic. if you "snapshot" the
  processor and look at which "spaces" actually run on it, it will not be
  the same every time the system is run anew. put differently, the
  scheduler that assigns spaces to the processor can be utterly
  non-deterministic!

B. Programming model and environment

--private workspace model
    --they didn't invent the model, but I believe that they are the first
      to incorporate it into a programming model exposed by the OS.
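The private workspace model can be made concrete with the t1()/t2() example from section 2. Below is a hypothetical Python sketch (all names invented; this simulates the model's semantics, it is not Determinator code): each thread runs against a snapshot of memory taken at fork time, and its writes are merged back at join, so t1 reads y's initial value no matter how the threads are scheduled.

```python
# Hypothetical sketch of the private workspace model (invented names;
# a simulation of the semantics, not Determinator code). Each thread
# runs against a snapshot of memory taken at fork time; writes are
# merged back at join. t1 therefore reads y's initial value (12)
# regardless of how the two threads are scheduled.

def fork_join(initial, threads):
    """Run each thread body on its own snapshot, then merge at join."""
    workspaces = []
    for body in threads:            # iteration order doesn't matter:
        ws = dict(initial)          # every thread sees fork-time state
        body(ws)
        workspaces.append(ws)
    merged = dict(initial)
    for ws in workspaces:           # merge only the values each
        for var, val in ws.items(): # thread actually changed
            if val != initial.get(var):
                merged[var] = val
    return merged

def t1(m): m["x"] = m["y"] + 1      # reads the snapshot's y, which is 12
def t2(m): m["y"] = m["y"] * 2

mem = fork_join({"y": 12}, [t1, t2])
print(mem["x"])   # always 13; with real shared memory it could be 13 or 25
```

Under true shared memory, x could be 13 or 25 depending on the interleaving; here the snapshot makes 13 the only possible answer, which is the predictability the notes point at.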
--cannot get read/write races (like in the example above) because the
  thing being read is the state from the beginning
--can get write/write conflicts (unavoidable if there's concurrency), but
  they do something cool here:
    --every time the program runs with the same input, the write/write
      conflict is the same
    --so they can either
        --resolve the conflict deterministically, or
        --declare the mere existence of a conflict to be a bug
--synchronization primitives are deterministic
    --natively support fork/join
    --emulate locks (though the authors would say that locks are a bad
      idea)
        --the difference is that the scheduling of acquire() [say] will
          always be deterministic
--race-free system namespaces
    --deep idea (in my opinion, this is one of the coolest things about
      this paper)
    --"Application code, not the system, decides where to allocate memory
      and what process IDs to assign children. This principle ensures that
      naming a resource reveals no shared state information other than
      what the application itself provided." [note contrast with
      exokernel!]
    --they then go on to point out that designing things this way may
      sidestep kernel bottlenecks (like a global lock for a system table),
      which may help multicore scalability
--result is that the system does not provide shared memory!! you can
  simulate two threads seeing the same memory, as we'll see below....

C. What's the mechanism?

draw picture:

    ----------------------------------------------------
    |   space                    space                 |
    |  ----------------------  ---------------------  |
    |  |        app         |  |        app        |  |
    |  |--------------------|  |-------------------|  |
    |  |      runtime       |  |      runtime      |  |
    |  ----------------------  ---------------------  |
    |        |                        |               |
    |--------------------------------------------------|
    |                       Det                        |
    |__________________________________________________|

--every space gets its own copy of memory
    --use copy-on-write for efficiency
--a parent can create child spaces
--parent can insist on child's returning control
--(how do child spaces go away?
    --if they exit, fine
    --if they don't, they can be stopped.
    --however, if they are stopped, it's not clear how to garbage collect
      them. maybe this wasn't implemented.)
--parent can do: Put/Get
    --puts the caller to sleep until Ret or a processor trap
    --(wait, what prevents processor traps from being non-deterministic?)
--child can do: Ret
    --stops the calling space, returning control to the parent
    --exceptions cause a logical Ret
--assertion: Put/Get/Ret gives determinism, so the kernel is done.
--rest of the work is how to use these three system calls to build a
  usable programming model and system

D. Implementation of fork/exec/wait

--fork straightforward
--ASK: how do they do exec?
    --answer: the runtime keeps a child space around
        --loads up the child space
        --then calls Get to load the code from the child's space into the
          parent space
--ASK: how do they do wait?
    --answer: mainly a Get call in the parent and a Ret call in the child.

E. File system

--replicate the file system in every process
--implement file versioning
--and treat it like a distributed file system
    --note that conflicts here are NOT Determinator conflicts, as in a
      get() with merge set. They're higher-level conflicts.
--after wait(), the runtime copies the child's file system image (this is
  easy because they don't have to worry about the disk!!!)

F. What about actual non-determinism on systems, such as I/O and timer
   events? (the point being that lots of things on the computer *are*
   non-deterministic: timer interrupts, cycle counters, console inputs,
   etc.) So how do they expose these things to user-level code?

--convert these things into explicit I/O channels
--represent the I/O channels as special files, maintained by the
  supervising process
    --the contents of these files get merged at sync points.
--so the idea is that the process just sits there doing read() and write()
    --if there's no more data, read() turns into Ret and waits for data.
        --the parent may ultimately do the same thing
    --write() sticks data in the console file. when the process syncs with
      its parent, it tells its parent about the data.
      the parent tells its parent, etc. eventually the kernel is told
      about the data, since the root process has access to the kernel's
      I/O devices.
--ASK: wait, where does the determinism come from?
    --to get determinism, the supervising process COULD, if it wanted:
        --replay the events
        --synthesize the content
        --etc.
    --this is where the determinism comes from, but it's arguably a fudge
      (on the other hand, they have no choice).
    --the reason that it's a fudge is that if you ran the code again,
      you'd get a different answer. to get the same answer, you'd have to
      run in a special replay mode, and then make sure that you had
      originally logged.
    --it's not clear that this approach would help debugging, unless
      they're logging by default.

G. How do they provide a conventional programming model?

--see 4.4: the easiest thing is to expose the private workspace model
  using fork/join or barriers
--also see 4.5: can provide conventional shared memory (vs. private
  workspace), but only with deterministic scheduling. how do they do it?
    --run every thread for a certain quantum of time; then merge their
      memories together
    --violates some consistency guarantees within a quantum, but after
      each quantum, everything is consistent, and all threads can see each
      other's memory.
    --requires some hacks to make it reasonably efficient
    --unhappy tradeoff:
        --large scheduling quantum: threads waste time
        --small scheduling quantum: lots of propagation of shared state
          back and forth
    --and one still cannot predict what will happen in the t1()/t2()
      example above (though it will be the same every time), if we rewrite
      the example as:

        t1() { acquire_lock(); x = y + 1; release_lock(); }
        t2() { acquire_lock(); y = y * 2; release_lock(); }

4. Questions/discussion

--clean slate approach (which is nice). they take an idea to its extreme.
--started from JOS code! (but, from a quick skim of the pios branch of the
  JOS repo, it does not look a lot like JOS)

A. Limitations?
--the file system is not persistent; that makes the "file system" a bit
  of an easier problem
--can only synchronize and communicate with the immediate parent and
  children.
    --why do you think that they do that?
        --perhaps a remnant of JOS
        --avoids circularity
        --sidesteps nasty permissions issues [CHECK]
        --what else?
--a bunch of others. see the fourth paragraph of section 5.

B. Design decisions

--why implement this functionality in the kernel? why not have a
  user-level thread scheduler do everything?
    --answer: they are going for complete determinism. complete
      determinism requires near-total control over the environment that
      the thread/process sees, and it's really the kernel that creates
      this environment. hence, they need to make the levels above the
      kernel see determinism.

C. Performance

--why does it give up performance? why does it work better with
  coarse-grained parallelism?
--a few reasons
    --first, any synchronization is very expensive. what used to be
      enqueuing and dequeuing a thread or process on a queue is now a
      copy and traversal of the page tables; not cheap
        --another way to say this is that fork() and join() are the only
          points at which threads can view each other's memory, and those
          operations are expensive.
    --second, synchronization is much more coarse-grained. that has a gain
      -- less synchronization -- but also a cost: if the app requires lots
      of synchronization, then either:
        --we're doing those virtual memory operations a lot; or
        --a thread is asleep waiting to be joined
--thus, the applications where this model is likely to work best are
  those that exhibit coarse-grained parallelism: take a chunk of work,
  compute on it, and "check it back in".

5. Guest presentation
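As a closing illustration of section 3G's quantum-based shared-memory emulation, here is a hypothetical Python sketch (all names invented; the real system merges via page tables and virtual memory, not dictionaries): each thread runs a fixed number of operations per round against a private copy of memory, threads are stepped in a fixed order, and the copies are merged back deterministically at each quantum boundary.

```python
# Hypothetical sketch of the quantum-based shared-memory emulation from
# section 3G (invented names; the real system merges page tables, not
# dicts). Threads run `quantum` operations per round against a snapshot
# of shared memory; snapshots are merged back in a fixed thread order,
# so the outcome is identical on every run, even if hard to predict.

def run_quantum_rounds(shared, threads, quantum):
    """threads: list of op lists; each op is a function on a workspace."""
    pcs = [0] * len(threads)                   # per-thread progress
    while any(pc < len(ops) for pc, ops in zip(pcs, threads)):
        base = dict(shared)                    # state at round start
        workspaces = []
        for i, ops in enumerate(threads):      # fixed, deterministic order
            ws = dict(base)                    # private copy for the round
            for _ in range(quantum):
                if pcs[i] < len(ops):
                    ops[pcs[i]](ws)
                    pcs[i] += 1
            workspaces.append(ws)
        for ws in workspaces:                  # deterministic merge:
            for var, val in ws.items():        # later threads win on
                if val != base.get(var):       # write/write conflicts
                    shared[var] = val

# the t1()/t2() example: both threads read y's value from the start of
# the round, so the result never varies across runs.
def t1_op(ws): ws["x"] = ws["y"] + 1
def t2_op(ws): ws["y"] = ws["y"] * 2

mem = {"y": 12}
run_quantum_rounds(mem, [[t1_op], [t2_op]], quantum=1)
print(mem)   # always {'y': 24, 'x': 13}
```

This also shows the unhappy tradeoff from the notes: a small quantum means re-copying and re-merging the workspaces every few operations, while a large quantum delays when threads see each other's writes.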