Class 11   CS372H   22 February 2011   (One handout)

On the board
------------

1. Last time
2. Advice
3. Some practice with concurrent programming
4. Trade-offs and problems from locking
    A. deadlock

---------------------------------------------------------------------------

1. Last time

    --spinlocks, mutexes, condition variables, monitors

    --standards: you must follow MikeD's "Programming With Threads",
      linked from lab T
        --You are required to follow this document
        --You will lose points (potentially many!) on the lab and on the
          exam if you stray from these standards
        --Note that in his example in section 4, there needs to be another
          line: right before mutex->release(), he should have:
          assert(invariants hold)

    --more about the standards/advice
        --the primitives may seem strange, and the rules may seem
          arbitrary: why one thing and not another?
        --there is no absolute answer here
        --**However**, history has tested the approach that we're using.
          If you use the recommended primitives and follow their suggested
          use, you will find it easier to write correct code
        --For now, just take the recommended approaches as a given, and
          use them for a while. If you can come up with something better
          after that, by all means do so!
        --But please remember three things:
            a. lots of really smart people have thought really hard about
               the right abstractions, so a day or two of thinking about a
               new one or a new use is unlikely to yield an advance over
               the best practices.
            b. the consequences of getting code wrong can be atrocious.
               see for example:
                    http://www.nytimes.com/2010/01/24/health/24radiation.html
                    http://sunnyday.mit.edu/papers/therac.pdf
                    http://en.wikipedia.org/wiki/Therac-25
            c. people who tend to be confident about their abilities tend
               to perform *worse*, so if you are confident that you are a
               Threading and Concurrency Ninja and/or you think you truly
               understand how these things work, then you may wish to
               reevaluate.....
                --Dunning-Kruger effect
                --http://www.nytimes.com/2000/01/23/weekinreview/january-16-22-i-m-no-doofus-i-m-a-genius.html

    --MikeD stands on the desk when proclaiming the standards

2. Advice

    A. Top-level piece of advice: SAFETY FIRST.

        --Locking at coarse grain is easiest to get right, so do that (one
          big lock for each big object or collection of them)
        --Don't worry about performance at first
        --In fact, don't even worry about liveness at first
        --In other words, don't view deadlock as a disaster
        --Key invariant: make sure your program never does the wrong thing

    B. More detailed advice: design approach

        [We will use item #1 on the handout as a case study.....]

        --Here's a four-step design approach:

        1. Getting started:

            1a. Identify the units of concurrency. Make each a thread with
                a go() method or main loop. Write down the actions a
                thread takes at a high level.

            1b. Identify shared chunks of state. Make each shared *thing*
                an object. Identify the methods on those objects, which
                should be the high-level actions made *by* threads *on*
                these objects. Plan to have these objects be monitors.

            1c. Write down the high-level main loop of each thread.

            Advice: stay high level here. Don't worry about
            synchronization yet. Let the objects do the work for you.

            Separate threads from objects. The code associated with a
            thread should not access shared state directly (and so there
            should be no access to locks/condition variables in the "main"
            procedure for the thread). Shared state and synchronization
            should be encapsulated in shared objects.

            --QUESTION: how does this apply to the example on the handout?

                --separate loops for producer() and consumer(), and the
                  synchronization happens inside MyBuffer

        Now, for each object:

        2. Write down the synchronization constraints on the solution.
           Identify the type of each constraint: mutual exclusion or
           scheduling. For scheduling constraints, ask, "when does a
           thread wait?"

            --NOTE: usually, the mutual exclusion constraint is satisfied
              by the fact that we're programming with monitors.

            --QUESTION: how does this apply to the example on the handout?

                --Only one thread can manipulate the buffer at a time
                  (mutual exclusion constraint)
                --Producer must wait for the consumer to empty slots if
                  all are full (scheduling constraint)
                --Consumer must wait for the producer to fill buffers if
                  all are empty (scheduling constraint)

        3. Create a lock or condition variable corresponding to each
           constraint

            --QUESTION: how does this apply to the example on the handout?

                --Answer: we need a lock and two condition variables. But
                  the lock was sort of a given from the monitor.

        4. Write the methods, using locks and condition variables for
           coordination
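            --to make step 4 concrete, here is a minimal sketch of a
              bounded-buffer monitor in C with pthreads. This is not the
              handout's code: the names (MyBuffer, BUF_SIZE, buffer_put,
              buffer_get) are illustrative. Note that the producer() and
              consumer() threads would just call buffer_put()/buffer_get():
              "let the shared objects do the work."

                #include <pthread.h>

                #define BUF_SIZE 10

                typedef struct {
                    int             items[BUF_SIZE];
                    int             count, in, out;  /* # full slots, insert index, remove index */
                    pthread_mutex_t mutex;           /* mutual exclusion constraint */
                    pthread_cond_t  not_full;        /* scheduling: producer waits when full */
                    pthread_cond_t  not_empty;       /* scheduling: consumer waits when empty */
                } MyBuffer;

                void buffer_init(MyBuffer *b) {
                    b->count = b->in = b->out = 0;
                    pthread_mutex_init(&b->mutex, NULL);
                    pthread_cond_init(&b->not_full, NULL);
                    pthread_cond_init(&b->not_empty, NULL);
                }

                void buffer_put(MyBuffer *b, int item) {
                    pthread_mutex_lock(&b->mutex);
                    while (b->count == BUF_SIZE)          /* wait in a while loop, per the standards */
                        pthread_cond_wait(&b->not_full, &b->mutex);
                    b->items[b->in] = item;
                    b->in = (b->in + 1) % BUF_SIZE;
                    b->count++;
                    pthread_cond_signal(&b->not_empty);   /* a consumer may now proceed */
                    pthread_mutex_unlock(&b->mutex);
                }

                int buffer_get(MyBuffer *b) {
                    int item;
                    pthread_mutex_lock(&b->mutex);
                    while (b->count == 0)
                        pthread_cond_wait(&b->not_empty, &b->mutex);
                    item = b->items[b->out];
                    b->out = (b->out + 1) % BUF_SIZE;
                    b->count--;
                    pthread_cond_signal(&b->not_full);    /* a producer may now proceed */
                    pthread_mutex_unlock(&b->mutex);
                    return item;
                }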
    C. More advice

        1. Don't manipulate synchronization variables or shared state
           variables in the code associated with a thread; do it with the
           code associated with a shared object.

            --Threads tend to have "main" loops. These loops tend to
              access shared objects. *However*, the "thread" piece of it
              should not include locks or condition variables. Instead,
              locks and CVs should be encapsulated in the shared objects.

            --Why?

                (a) Locks are for synchronizing across multiple threads.
                    It doesn't make sense for one thread to "own" a lock.

                (b) Encapsulation -- the details of synchronization are
                    internal details of a shared object. The caller should
                    not know about these details. "Let the shared objects
                    do the work."

            --Common confusion: trying to acquire and release locks inside
              the threads' code (i.e., not following this advice). Bad
              idea! Synchronization should happen within the shared
              objects. Mantra: "let the shared objects do the work".

            --Note: our first example of condition variables -- 4c on the
              handout from last class (l10-handout) -- doesn't actually
              follow this advice, but that is in part so you can see all
              of the parts working together.

        2. A different way to state what's above:

            --You want to decompose your problem into objects, as in the
              object-oriented style of programming.

            --Thus:

                (1) A shared object encapsulates code, synchronization
                    variables, and state variables

                (2) Shared objects are separate from threads

            --Warning: most examples in the book talk about "thread 1's
              code" and "thread 2's code", etc. This is because most of
              the "classic" problems were studied before OO programming
              was widespread.

3. Practice with concurrent programming

    --the sleeping barber question from a prior midterm is posted (as
      today's reading). use it as practice

    --we guarantee to test concurrent programming on the midterm

    --today, we work a different example:

        --workers interact with a database
        --motivation: banking, airlines, etc.
        --readers never modify the database
        --writers read and modify the data
        --using only a single mutex lock would be overly restrictive.
          Instead, we want:
            --many readers at the same time
            --only one writer at a time

    --let's follow the concurrency advice from last time (and above).....

        1. Getting started

            a. what are the units of concurrency? [readers/writers]
            b. what are the shared chunks of state? [the database]
            c. what does the main function look like?

                read()
                    check in   -- wait until no writers are accessing the DB
                    check out  -- wake up a waiting writer, if appropriate
                write()
                    check in   -- wait until no readers or writers are
                                  accessing the DB
                    check out  -- wake up waiting readers or writers

        2. and 3. Synchronization constraints and objects

            --a reader can access the DB when there are no writers
              (condition: okToRead)
            --a writer can access the DB when there are no other readers
              or writers (condition: okToWrite)
            --only one thread manipulates the shared variables at a time.
              NOTE: **this does not mean only one thread in the DB at a
              time** (mutex)

        4. write the methods

            --inspiration required:

                int AR = 0;    // # active readers
                int AW = 0;    // # active writers
                int WR = 0;    // # waiting readers
                int WW = 0;    // # waiting writers

            --see handout for the code
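            --the handout has the actual code; below is a minimal sketch
              of what such a monitor typically looks like, in C with
              pthreads. This is not the handout's version: the names
              (Database, db_init, db_read, db_write) are illustrative, and
              this variant gives waiting writers priority over new readers.

                #include <pthread.h>

                typedef struct {
                    int AR, AW, WR, WW;        /* active/waiting readers and writers */
                    pthread_mutex_t mutex;     /* protects only these four counters */
                    pthread_cond_t  okToRead;
                    pthread_cond_t  okToWrite;
                } Database;

                void db_init(Database *db) {
                    db->AR = db->AW = db->WR = db->WW = 0;
                    pthread_mutex_init(&db->mutex, NULL);
                    pthread_cond_init(&db->okToRead, NULL);
                    pthread_cond_init(&db->okToWrite, NULL);
                }

                void db_read(Database *db) {
                    /* check in */
                    pthread_mutex_lock(&db->mutex);
                    db->WR++;
                    while (db->AW > 0 || db->WW > 0)   /* wait while writers are active or waiting */
                        pthread_cond_wait(&db->okToRead, &db->mutex);
                    db->WR--;
                    db->AR++;
                    pthread_mutex_unlock(&db->mutex);

                    /* Execute req: read from the DB. The mutex is NOT held
                       here, so many readers can be in this section at once. */

                    /* check out */
                    pthread_mutex_lock(&db->mutex);
                    db->AR--;
                    if (db->AR == 0 && db->WW > 0)     /* last reader out wakes one writer */
                        pthread_cond_signal(&db->okToWrite);
                    pthread_mutex_unlock(&db->mutex);
                }

                void db_write(Database *db) {
                    /* check in */
                    pthread_mutex_lock(&db->mutex);
                    db->WW++;
                    while (db->AR > 0 || db->AW > 0)   /* wait while any reader or writer is active */
                        pthread_cond_wait(&db->okToWrite, &db->mutex);
                    db->WW--;
                    db->AW++;
                    pthread_mutex_unlock(&db->mutex);

                    /* Execute req: modify the DB as the only active thread. */

                    /* check out */
                    pthread_mutex_lock(&db->mutex);
                    db->AW--;
                    if (db->WW > 0)                    /* prefer a waiting writer... */
                        pthread_cond_signal(&db->okToWrite);
                    else if (db->WR > 0)               /* ...else wake all waiting readers */
                        pthread_cond_broadcast(&db->okToRead);
                    pthread_mutex_unlock(&db->mutex);
                }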
            --QUESTION: why not just hold the lock all the way through
              "Execute req"? (Answer: the whole point was to provide more
              concurrency, i.e., to move away from exclusive access.)

            --QUESTION: what if we had shared locks? The implementation of
              shared locks is given on the handout

---------------------------------------------------------------------------

--Go over survey feedback and labs

    --it seems like people are a bit frustrated with the labs
    --note that detective work is part of the game; that's part of what
      you're learning
    --you need to use a combination of cognitive tools (deduction) and
      technical tools (grep, ctags, etags, etc.)

---------------------------------------------------------------------------

4. Trade-offs and problems from locking

Locking (in all its forms: mutexes, monitors, semaphores) raises many
issues:

    A. deadlock
    B. starvation
    C. priority inversion
    D. broken modularity

    .....

    A. Deadlock

        --see handout: simple example based on two locks (a sketch of this
          pattern also appears after the examples below)

        --see handout: more complex example
            --M calls N
            --N waits
            --but let's say the condition can only become true if N is
              invoked through M
            --now the lock inside N is unlocked, but M remains locked;
              that is, no one is going to be able to enter M and hence N.

        --can also get deadlocks with condition variables
            --lesson: it is dangerous to hold locks (M's mutex in the case
              on the handout) when crossing abstraction barriers

        --deadlocks without mutexes:

            --the real issue is resources and how they are requested

            --non-computer example **[picture of bridge]**
                --the bridge only allows traffic in one direction
                --Each section of the bridge can be viewed as a resource.
                --If a deadlock occurs, it can be resolved if one car
                  backs up (preempt resources and roll back).
                --Several cars may have to be backed up if a deadlock
                  occurs.
                --Starvation is possible.

            --other example:
                --one thread/process grabs the disk and then tries to grab
                  the scanner
                --another thread/process grabs the scanner and then tries
                  to grab the disk
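        --here is a minimal sketch of the simple two-lock pattern in C
          with pthreads (this is not the handout's code; the lock and
          thread names are made up):

            #include <pthread.h>

            pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;
            pthread_mutex_t mutex_b = PTHREAD_MUTEX_INITIALIZER;

            void *thread1(void *arg) {
                pthread_mutex_lock(&mutex_a);   /* holds a ...          */
                pthread_mutex_lock(&mutex_b);   /* ... and waits for b  */
                /* ... critical section ... */
                pthread_mutex_unlock(&mutex_b);
                pthread_mutex_unlock(&mutex_a);
                return NULL;
            }

            void *thread2(void *arg) {
                pthread_mutex_lock(&mutex_b);   /* holds b ...          */
                pthread_mutex_lock(&mutex_a);   /* ... and waits for a  */
                /* ... critical section ... */
                pthread_mutex_unlock(&mutex_a);
                pthread_mutex_unlock(&mutex_b);
                return NULL;
            }

            /* If thread1 acquires mutex_a and thread2 acquires mutex_b
               before either gets its second lock, each waits forever for
               the other: hold-and-wait plus circular wait, i.e., deadlock. */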
        --how do we get around deadlock?

            (i) ignore it: worry about it when it happens

            (ii) detect and recover: not great
                --could imagine attaching a debugger
                --not really viable for production software, but works
                  well in development
                --the threads package can keep track of the
                  resource-allocation graph --see book
                --For each lock acquired, order it with the other locks
                  held
                --If a cycle occurs, abort with an error
                --Detects potential deadlocks even if they do not occur

            (iii) avoid it algorithmically

                --banker's algorithm (see book)
                    --very elegant but impractical
                    --if you're using the banker's algorithm, the
                      gameboard looks like this:

                        ResourceMgr::Request(ResourceID resc, RequestorID thrd) {
                            acquire(&mutex);
                            assert(system in a safe state);
                            while (state that would result from giving
                                   resource to thread is not safe) {
                                wait(&cv, &mutex);
                            }
                            update state by giving resource to thread
                            assert(system in a safe state);
                            release(&mutex);
                        }

                      Now we need to determine whether a state is safe....
                      To do so, see book

                    --disadvantages of the banker's algorithm:
                        --requires every single resource request to go
                          through a single broker
                        --requires every thread to state its maximum
                          resource needs up front. unfortunately, if
                          threads are conservative and claim they need
                          huge quantities of resources, the algorithm will
                          reduce concurrency

            (iv) prevent them by careful coding

                --negate one of the four conditions:
                    1. mutual exclusion
                    2. hold-and-wait
                    3. no preemption
                    4. circular wait
                --can sort of negate 1:
                    --put a queue in front of resources, like the printer
                    --virtualize memory
                --not much hope of negating 2
                --can sort of negate 3:
                    --consider physical memory: it is virtualized with VM,
                      so the OS can take a physical page away from one
                      process and give it to another!
                --what about negating #4?
                    --in practice, this is what people do
                    --idea: impose a partial order on locks. That is,
                      establish an order on all locks and make sure that
                      every thread acquires its locks in that order.
                      (a sketch appears at the end of these notes)
                    --why this works:
                        --we can view deadlock as a cycle in the resource
                          acquisition graph
                        --a partial order implies no cycles and hence no
                          deadlock
                    --three bummers:
                        1. it is hard to represent CVs inside this
                           framework. it works best only for locks.
                        2. the compiler can't check at compile time that
                           the partial order is being adhered to, because
                           the calling pattern is impossible to determine
                           without running the program (thanks to function
                           pointers and the halting problem)
                        3. Picking and obeying the order on *all* locks
                           requires that modules make public their locking
                           behavior, and requires them to know about other
                           modules' locking. This can be painful and
                           error-prone.
                --we will see Linux's filemap.c as an example of the
                  complexity introduced by having a locking order

            (v) Static and dynamic detection tools

                --See, for example, these citations, the citations
                  therein, and the papers that cite them:

                    Engler, D. and K. Ashcraft. RacerX: effective, static
                    detection of race conditions and deadlocks. Proc. ACM
                    Symposium on Operating Systems Principles (SOSP),
                    October 2003, pp. 237-252.
                    http://portal.acm.org/citation.cfm?id=945468

                    Savage, S., M. Burrows, G. Nelson, P. Sobalvarro, and
                    T. Anderson. Eraser: a dynamic data race detector for
                    multithreaded programs. ACM Transactions on Computer
                    Systems (TOCS), Volume 15, No. 4, November 1997,
                    pp. 391-411.
                    http://portal.acm.org/citation.cfm?id=265927

                  There is a long literature on this stuff.

                --Disadvantage of dynamic checking: it slows the program
                  down
                --Disadvantage of static checking: many false alarms (the
                  tool says "there is a deadlock", but in fact there is
                  none) or else missed problems
                --Note that these tools get better every year. I believe
                  that Valgrind has a race and deadlock detection tool.
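            --to make the lock-ordering idea in (iv) concrete, here is a
              minimal sketch in C with pthreads (illustrative; not from
              the handout or from Linux): pick one global order on locks
              -- here, by address -- and make every thread that needs both
              locks acquire them through this helper. With this
              discipline, the two-lock deadlock sketched earlier cannot
              happen, because both threads acquire the locks in the same
              order.

                #include <pthread.h>
                #include <stdint.h>

                /* Acquire two mutexes in a globally consistent order
                   (here: by address), which rules out circular wait. */
                void lock_pair(pthread_mutex_t *m1, pthread_mutex_t *m2) {
                    if ((uintptr_t)m1 < (uintptr_t)m2) {
                        pthread_mutex_lock(m1);
                        pthread_mutex_lock(m2);
                    } else {
                        pthread_mutex_lock(m2);
                        pthread_mutex_lock(m1);
                    }
                }

                void unlock_pair(pthread_mutex_t *m1, pthread_mutex_t *m2) {
                    pthread_mutex_unlock(m1);
                    pthread_mutex_unlock(m2);
                }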