Class 10
CS 372H
18 February 2010

On the board
------------
1. Last time
2. Condition variables
   --Motivation
   --Usage
3. Semaphores
4. Monitors
5. Advice and standards for concurrent programming
   --Advice
   --Standards

---------------------------------------------------------------------------

1. Last time

    --to deal with concurrency, need atomic operations

    --in the presence of multiple CPUs, we get atomic operations by
    using special hardware instructions

	--different options on different architectures

	--test_and_set() very common
	
	--on the x86, one uses xchg to implement test_and_set()

    --if you have a single CPU, can sometimes get atomic operations by
    turning off interrupts

	--confusingly, applies at both kernel-level and user-level

	--"turning off interrupts" in a user-level thread package means
	"ignoring the signals from the timer that would invoke the
	run-time via the handler of the timer signal"
    
    --Aside: here's a better-performing spinlock:

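	void acquire(Lock* lock) {
	    pushcli();   /* turn off interrupts on this CPU */
	    /* xchg_val() atomically stores 1 and returns the old value */
	    while (xchg_val(&lock->locked, 1) == 1) {
		/* spin on ordinary reads until the lock looks free, and
		   only then retry the (expensive) atomic exchange;
		   assumes lock->locked is declared volatile */
		while (lock->locked) ;
	    }
	}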

	void release(Lock* lock) {
	    xchg_val(&lock->locked, 0);
	    popcli();
	}
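
    --Sketch (an illustration, not the class's code): the same
    test-and-test-and-set idea written at user level with C11
    <stdatomic.h>. Assumptions: the SpinLock type, the spin_* names,
    and the spin_demo() driver are made up here; atomic_exchange()
    stands in for xchg_val(), and there is no pushcli()/popcli()
    because a user-level program cannot turn off interrupts.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-level lock type; not the Lock from the notes. */
typedef struct {
    atomic_int locked;              /* 0 = free, 1 = held */
} SpinLock;

static void spin_acquire(SpinLock *l) {
    /* atomic_exchange() stores 1 and returns the previous value. */
    while (atomic_exchange(&l->locked, 1) == 1) {
        /* Spin on plain loads until the lock looks free, then retry. */
        while (atomic_load(&l->locked))
            ;
    }
}

static void spin_release(SpinLock *l) {
    int was = atomic_exchange(&l->locked, 0);
    assert(was == 1);               /* only a held lock may be released */
    (void)was;
}

static SpinLock demo_lock;          /* zero-initialized, so it starts free */
static long demo_counter;

static void *spin_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 50000; i++) {
        spin_acquire(&demo_lock);
        demo_counter++;             /* critical section */
        spin_release(&demo_lock);
    }
    return NULL;
}

/* Runs two workers and returns the final counter value. */
long spin_demo(void) {
    pthread_t t[2];
    demo_counter = 0;
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, spin_worker, NULL);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

    --If the lock provides mutual exclusion, spin_demo() returns
    100000 (two workers, 50000 increments each).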

    --Today:

	--going to present some more synchronization primitives, along
	with rules for their use (and non-use)

	--the primitives may seem strange, and the rules may seem
	arbitrary: why one thing and not another?

	    --there is no absolute answer here

	    --**However**, history has tested the approach that we're
	    going to present. If you use the recommended primitives and
	    follow their suggested use, you will find it easier to write
	    correct code

	    --For now, just take the recommended approaches as a given,
	    and use them for a while. If you can come up with something
	    better after that, by all means do so!

	    --But please remember three things:
	    
		a. lots of really smart people have thought really hard
		about the right abstractions, so a day or two of
		thinking about a new one or a new use is unlikely to
		yield an advance over the best practices.

		b. the consequences of getting code wrong can be
		atrocious. see for example:
		
		    http://www.nytimes.com/2010/01/24/health/24radiation.html
		    http://sunnyday.mit.edu/papers/therac.pdf
		    http://en.wikipedia.org/wiki/Therac-25

		c. people who tend to be confident about their abilities
		tend to perform *worse*, so if you are confident you are
		a Threading and Concurrency Ninja and/or you think you
		truly understand how these things work, then you may
		wish to reevaluate.....

		    http://www.nytimes.com/2000/01/23/weekinreview/january-16-22-i-m-no-doofus-i-m-a-genius.html


2. Condition variables

    A. Motivation

	--producer/consumer queue 

	    --very common paradigm. also called "bounded buffer":

		--producer puts things into a shared buffer
		--consumer takes them out
		--producer must wait if buffer is full; consumer must
		  wait if buffer is empty
		--shows up everywhere
		    --Soda machine: producer is delivery person, consumers
			are soda drinkers, shared buffer is the machine
		    --DMA buffers

	--producer/consumer queue using mutexes

	    --what's the problem?

	    --answer: busy waiting

	--It is convenient to break synchronization into two types:
	    --*mutual exclusion*: allow only one thread to access a given
	    set of shared state at a time
	    --*scheduling constraints*: wait for some other thread to do
	    something (finish a job, produce work, consume work, accept
	    a connection, get bytes off the disk, etc.)

    B. Usage

	--API

	    --void cond_init (Cond *, ...); 
		--Initialize

	    --void cond_wait(Cond *c, Mutex* m);
		--Atomically unlock m and sleep until c signaled 
		--Then re-acquire m and resume executing 

	    --void cond_signal(Cond* c);
		--Wake one thread waiting on c
		[UPDATE: in some pthreads implementations, the analogous
		call wakes *at least* one thread waiting on c. Check the
		documentation (or source code) to be sure of the semantics.]

	    --void cond_broadcast(Cond* c);
		--Wake all threads waiting on c

	--QUESTION: Why must cond_wait both release the mutex and sleep?
	(see handout)

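	    --Answer: if the release and the sleep were two separate
	    steps, a signal could slip in between them, and the waiter
	    would get stuck waiting:

		Producer: while (count == BUFFER_SIZE)
		Producer: release()
		Consumer: acquire()
		Consumer: .....
		Consumer: cond_signal(&nonfull)
		Producer: cond_wait(&nonfull)

	    --Producer will never hear the signal, because the signal was
	    sent before the producer went to sleep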

	--QUESTION: Why not use "if"? (Why use "while"?)

	    --Answer: we can get an interleaving like this:
		--The signal() puts the waiting thread on the ready list
		but doesn't run it
		--That now-ready thread is ready to acquire() the mutex
		--But a *different* thread (a third thread: not the
		signaler, not the now-ready thread) could acquire() the
		mutex, work in the critical section, and invalidate
		whatever condition was being checked
		--Our now-ready thread eventually acquire()s the mutex...
		--...with no guarantees that the condition it was
		waiting for is still true
		
	    --Solution is to use "while" when waiting on a condition
	    variable

	    --DO NOT VIOLATE THIS RULE; doing so will (almost always)
	    lead to incorrect code
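
	--Sketch of the rules above using pthreads (an illustration, not
	the handout's code). Assumptions: the Buffer type, the
	buffer_put()/buffer_get() names, and the buffer_demo() driver are
	made up here; pthread_cond_wait() plays the role of cond_wait()
	and pthread_cond_signal() the role of cond_signal().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BUFFER_SIZE 4

/* Hypothetical shared object; names are illustrative. */
typedef struct {
    int items[BUFFER_SIZE];
    int head, tail, count;
    pthread_mutex_t mutex;
    pthread_cond_t nonfull;
    pthread_cond_t nonempty;
} Buffer;

static Buffer buf = {
    .mutex = PTHREAD_MUTEX_INITIALIZER,
    .nonfull = PTHREAD_COND_INITIALIZER,
    .nonempty = PTHREAD_COND_INITIALIZER,
};

static void buffer_put(Buffer *b, int item) {
    pthread_mutex_lock(&b->mutex);
    while (b->count == BUFFER_SIZE)             /* "while", never "if" */
        pthread_cond_wait(&b->nonfull, &b->mutex);
    b->items[b->tail] = item;
    b->tail = (b->tail + 1) % BUFFER_SIZE;
    b->count++;
    pthread_cond_signal(&b->nonempty);          /* signal with mutex held */
    assert(b->count > 0 && b->count <= BUFFER_SIZE);
    pthread_mutex_unlock(&b->mutex);
}

static int buffer_get(Buffer *b) {
    pthread_mutex_lock(&b->mutex);
    while (b->count == 0)                       /* recheck after every wakeup */
        pthread_cond_wait(&b->nonempty, &b->mutex);
    int item = b->items[b->head];
    b->head = (b->head + 1) % BUFFER_SIZE;
    b->count--;
    pthread_cond_signal(&b->nonfull);
    assert(b->count >= 0 && b->count < BUFFER_SIZE);
    pthread_mutex_unlock(&b->mutex);
    return item;
}

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= 100; i++)
        buffer_put(&buf, i);
    return NULL;
}

/* Consumes 100 items in the calling thread and returns their sum. */
long buffer_demo(void) {
    pthread_t p;
    long sum = 0;
    pthread_create(&p, NULL, producer, NULL);
    for (int i = 0; i < 100; i++)
        sum += buffer_get(&buf);
    pthread_join(p, NULL);
    return sum;
}
```

	--buffer_demo() returns 5050 (the sum 1 + 2 + ... + 100): every
	item produced is consumed exactly once, even though the buffer
	holds only 4 items at a time.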

3. Semaphores

    --Don't use these. We're mentioning them only for completeness and
    for historical reasons: they were the first general-purpose
    synchronization primitive, and they were the first synchronization
    primitive that Unix supported.

    --Introduced by Edsger Dijkstra in late 1960s
    
	--Dijkstra was a highly notable figure in computer science who
	spent the latter part of his career here at UT

    --Semaphore is initialized with an integer, N

    --Two functions:
	--Down() and Up() [also known as P() and V()]
	--The guarantee is that Down() will return at most N more times
	than Up() has been called
	--Basically a counter that, when it reaches 0, causes a thread
	to sleep()

    --Another way to say the same thing:
	--Semaphore holds a count
	--Down() is an atomic operation that waits for the count to
	become positive; it then decrements the count by 1
	--Up() is an atomic operation that increments the count by 1 and
	then wakes up a thread waiting on Down(), if any
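
    --Sketch, only to make the Down()/Up() semantics concrete (this
    is not a recommendation to use semaphores): a counting semaphore
    built from a mutex and a condition variable. Assumptions: the Sema
    type, the sema_* names, and the sema_demo() driver are made up
    here and are not a real library's API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical semaphore type built from a mutex and a condition variable. */
typedef struct {
    int count;                      /* never negative */
    pthread_mutex_t mutex;
    pthread_cond_t positive;        /* signaled when count goes up */
} Sema;

static void sema_create(Sema *s, int n) {
    s->count = n;
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->positive, NULL);
}

/* Down(): wait for the count to become positive, then decrement it. */
static void sema_down(Sema *s) {
    pthread_mutex_lock(&s->mutex);
    while (s->count == 0)
        pthread_cond_wait(&s->positive, &s->mutex);
    s->count--;
    assert(s->count >= 0);
    pthread_mutex_unlock(&s->mutex);
}

/* Up(): increment the count and wake one thread waiting in Down(), if any. */
static void sema_up(Sema *s) {
    pthread_mutex_lock(&s->mutex);
    s->count++;
    pthread_cond_signal(&s->positive);
    pthread_mutex_unlock(&s->mutex);
}

static Sema done;
static int result;

static void *sema_worker(void *arg) {
    (void)arg;
    result = 42;                    /* produce a value ... */
    sema_up(&done);                 /* ... then announce it */
    return NULL;
}

/* Uses a semaphore initialized to 0 as a scheduling constraint. */
int sema_demo(void) {
    pthread_t t;
    sema_create(&done, 0);
    result = 0;
    pthread_create(&t, NULL, sema_worker, NULL);
    sema_down(&done);               /* blocks until the worker calls sema_up() */
    int seen = result;
    pthread_join(t, NULL);
    return seen;
}
```

    --sema_demo() returns 42: with N = 0, Down() cannot return until
    Up() has been called once, so the caller always sees the worker's
    value.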

    --Don't use these! (Notice that Andrew Birrell [who is a Threading
    Ninja] doesn't even mention them in his paper.)

    --Problems:
	--semaphores are dual-purpose (for mutual exclusion and
	scheduling constraints), so hard to read code and hard to get
	code right
	--semaphores have hidden internal state
	--getting a program right requires careful interleaving of
	"synchronization" and "mutex" semaphores

4. Monitors

    --High-level idea: an object (as in object-oriented systems)
	
	--in which methods do not execute concurrently; and

	--that has one or more condition variables

    --More detail
    
	--Every method call starts with acquire(&mutex), and ends with
	release(&mutex)

	--Technically, these acquire()/release() are invisible to the
	programmer because it is the programming language (i.e., the
	compiler+run-time) that is implementing the monitor

	    --So, technically, a monitor is a programming language
	    concept

	    --Book follows this technical definition
	    
	    --But technical definition isn't hugely useful because no
	    programming languages in widespread usage have true monitors

	    --Java has something close: a class in which every method is
	    "synchronized" (i.e., implicitly protected by a mutex)

		--Not exactly a monitor because there's nothing forcing
		every method to be synchronized

	    --And we can *use* mutexes and condition variables to
	    implement our own manual versions of monitors, though we
	    have to be careful
	
	--Given the above, we are going to use the term "monitor" more
	loosely to refer to both the technical definition and also a
	"manually constructed" monitor, wherein:
	
	    --all method calls are protected by a mutex (that is, the
	    programmer inserts those acquire()/release() on entry and
	    exit from every procedure *inside* the object)
	    
	    --synchronization happens with condition variables whose
	    associated mutex is the mutex that protects the method calls

	--In other words, we will use the term "monitor" to refer to the
	programming conventions that you should follow when building
	multithreaded applications

	    --you must follow these conventions on lab T
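
	--Sketch of a "manually constructed" monitor in C with pthreads
	(an illustration, not the handout's example). Assumptions: the
	Latch type, the latch_* names, and the latch_demo() driver are
	made up here. Every method acquires the mutex on entry and
	releases it on exit, the condition variable is tied to that same
	mutex, and the thread code never touches the mutex or the
	condition variable directly.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical monitor-style object: a countdown latch. */
typedef struct {
    int remaining;                  /* shared state: arrivals still expected */
    pthread_mutex_t mutex;          /* protects every method below */
    pthread_cond_t all_arrived;     /* always used with the mutex above */
} Latch;

static void latch_init(Latch *l, int n) {
    l->remaining = n;
    pthread_mutex_init(&l->mutex, NULL);
    pthread_cond_init(&l->all_arrived, NULL);
}

/* Method: record one arrival. */
static void latch_arrive(Latch *l) {
    pthread_mutex_lock(&l->mutex);              /* method entry */
    assert(l->remaining > 0);
    l->remaining--;
    if (l->remaining == 0)
        pthread_cond_broadcast(&l->all_arrived);
    assert(l->remaining >= 0);                  /* invariant holds before release */
    pthread_mutex_unlock(&l->mutex);            /* method exit */
}

/* Method: block until every arrival has happened; returns remaining (0). */
static int latch_wait(Latch *l) {
    pthread_mutex_lock(&l->mutex);
    while (l->remaining > 0)
        pthread_cond_wait(&l->all_arrived, &l->mutex);
    int r = l->remaining;
    pthread_mutex_unlock(&l->mutex);
    return r;
}

static Latch latch;

/* Thread code: calls methods on the shared object, nothing more. */
static void *latch_worker(void *arg) {
    (void)arg;
    latch_arrive(&latch);
    return NULL;
}

int latch_demo(void) {
    pthread_t t[3];
    latch_init(&latch, 3);
    for (int i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, latch_worker, NULL);
    int r = latch_wait(&latch);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return r;
}
```

	--latch_demo() returns 0: latch_wait() cannot get past its
	while() until all three workers have called latch_arrive().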

    --Example: see handout

    --RULE:

	--hold lock when doing condition variable operations

	--Some (e.g., Birrell) will say: for experts only, no need to
	hold the lock when signaling. IGNORE THIS. Putting the signal
	outside the lock is only a small performance optimization, and
	it is likely to lead you to write incorrect code.
	
	--to get credit in Lab T, you must hold the associated mutex
	when doing a condition variable operation

    --Different styles of monitors:

	--Hoare-style: signal() immediately wakes the waiter

	--What the book calls Hansen-style: signal() required to be last
	statement in a procedure

	--What everyone else calls Hansen-style and what we will use:
	signal() eventually wakes the waiter. Not an immediate transfer

    --Can we replace SIGNAL with BROADCAST, given our monitor semantics?
     (Answer: yes, always.) Why?

	--while() condition tests the needed invariant. The program
	doesn't progress past the while() unless the needed invariant
	is true.

	--result: spurious wake-ups are acceptable....

	--...which implies you can always wake up a thread at any
	moment with no loss of correctness....

	--....which implies you can replace SIGNAL with BROADCAST
	[though it may hurt performance to have a bunch of
	needlessly awake threads contending for a mutex that they
	will then acquire() and release().]

    --Can we replace BROADCAST with SIGNAL?

	--Answer: not always. 

	--Example:
	    --memory allocator
	    --threads allocate and free memory in variable-sized chunks
	    --if no memory free, wait on a condition variable
	    --now posit:
		--two threads waiting to allocate chunks of memory
		--no memory free at all
		--then, a third thread frees 10,000 bytes
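	    --SIGNAL alone does the wrong thing: it wakes only one
	    waiter, but both requests might fit in the 10,000 bytes (or
	    the one awakened might want more than 10,000 bytes while the
	    other could proceed). We need to awaken both threads and let
	    each recheck its own condition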
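
	--Sketch of the allocator scenario (an illustration under stated
	assumptions, not a real allocator): the pool tracks only a count
	of free bytes, and the Pool type, the pool_* names, the request
	sizes 4000 and 6000, and the pool_demo() driver are all made up
	here. The point is the BROADCAST in pool_free(): each waiter
	rechecks its own request size in its while() loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical pool that tracks only how many bytes are free. */
typedef struct {
    size_t free_bytes;
    pthread_mutex_t mutex;
    pthread_cond_t freed;           /* "some memory was freed" */
} Pool;

static Pool pool = {
    .free_bytes = 0,                /* no memory free at all */
    .mutex = PTHREAD_MUTEX_INITIALIZER,
    .freed = PTHREAD_COND_INITIALIZER,
};

static void pool_alloc(Pool *p, size_t n) {
    pthread_mutex_lock(&p->mutex);
    while (p->free_bytes < n)       /* each waiter tests its own size */
        pthread_cond_wait(&p->freed, &p->mutex);
    assert(p->free_bytes >= n);
    p->free_bytes -= n;
    pthread_mutex_unlock(&p->mutex);
}

static void pool_free(Pool *p, size_t n) {
    pthread_mutex_lock(&p->mutex);
    p->free_bytes += n;
    /* Wake every waiter: we cannot tell which requests now fit. */
    pthread_cond_broadcast(&p->freed);
    pthread_mutex_unlock(&p->mutex);
}

static void *alloc_worker(void *arg) {
    pool_alloc(&pool, *(size_t *)arg);
    return NULL;
}

/* Two allocating threads, then a third thread (the caller) frees 10,000. */
size_t pool_demo(void) {
    pthread_t a, b;
    size_t want_a = 4000, want_b = 6000;
    pthread_create(&a, NULL, alloc_worker, &want_a);
    pthread_create(&b, NULL, alloc_worker, &want_b);
    pool_free(&pool, 10000);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return pool.free_bytes;         /* safe to read: both workers have exited */
}
```

	--pool_demo() returns 0 because both requests are eventually
	satisfied. With pthread_cond_signal() in pool_free() instead, a
	waiter could be left asleep even though its request fits, and
	the joins could hang.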

5. Advice and standards for concurrent programming

    A. Advice 

	--Use item #2 on handout as a case study....

	** General approach:

	--Decompose problem into objects 
  
	  object oriented style of programming: encapsulate shared 
	  state and synchronization variables inside of objects 
	   
	   Note: 
	   (1) Shared objects are separate from threads 
	   (2) Shared object encapsulates code, synchronization 
	   variables, and state variables 

	--What are threads, what are shared objects on handout?
	(producer/consumer; MyBuffer)
    
	--Warning: most examples in the book talk about "thread 1's
	code" and "thread 2's code", etc. This is b/c most of the
	"classic" problems were studied before OO programming was
	widespread.
    
	--Don't manipulate synchronization variables or shared state
	variables in the code associated with a thread; do it with the
	code associated with a shared object.  
      
	    --Threads tend to have "main" loops. These loops tend to
	    access shared objects. *However*, the "thread" piece of it
	    should not include locks or condition variables. Instead,
	    locks and CVs should be encapsulated in the shared objects.

	    --Why?
		(1) Locks are for synchronizing across multiple threads. Doesn’t 
		make sense for one thread to "own" a lock.
		(2) Encapsulation -- details of synchronization are internal details 
		of a shared object. Caller should not know about these details. 
		"Let the shared objects do the work."
	    
	--Common confusion: trying to do synchronization within the
	threads' code (i.e., not following the advice above). No!
	Synchronization should happen within the shared objects. Mantra:
	"let the shared objects do the work".

	    --[Note: our earlier examples don't actually follow the
	    advice, but that is in part so you can see a full example.
	    As we enter the object-oriented world, we are going to
	    encapsulate the details inside objects.]

	** Design approach:
	
	    1. Getting started:
	     
		 1a. Identify units of concurrency. Make each a thread with
		 a go() method or main loop. Write down the actions a thread
		 takes at a high level.  
		 
		 1b. Identify shared chunks of state. Make each shared
		 *thing* an object. Identify the methods on those objects,
		 which should be the high-level actions made *by* threads
		 *on* these objects. Plan to have these objects be monitors.
		 
		 1c. Write down the high-level main loop of each thread. 
	     
	    Advice: stay high level here. Don't worry about synchronization 
	    yet. Let the objects do the work for you. 
	     
	    Separate threads from objects. The code associated with a thread 
	    should not access shared state directly (and so there should be no 
	    access to locks/condition variables in the "main" procedure for the 
	    thread). Shared state and synchronization should be encapsulated in 
	    shared objects. 

	    --QUESTION: how does this apply to the example on the
	    handout?
		--separate loops for producer(),consumer(), and
		synchronization happens inside MyBuffer.
	     
	    Now, for each object: 
	     
	    2. Write down the synchronization constraints on the
	    solution. Identify the type of each constraint: mutual
	    exclusion or scheduling. For scheduling constraints, ask,
	    "when does a thread wait"?

		--NOTE: usually, the mutual exclusion constraint is
		upheld by the fact that we're programming with monitors.

		--QUESTION: how does this apply to the example on the
		handout?
		    --Only one thread can manipulate the buffer at a time
		    (mutual exclusion constraint)
		    --Producer must wait for consumer to empty slots if all
		    full (scheduling constraint)
		    --Consumer must wait for producer to fill buffers if all
		    empty (scheduling constraint)

	    3. Create a lock or condition variable corresponding to each 
	    constraint 

		--QUESTION: how does this apply to the example on the
		handout?
		    --Answer: need a lock and two condition variables.
		    But lock was sort of a given from the monitor.
	     
	    4. Write the methods, using locks and condition variables for 
	    coordination  
	     
    B. Standards

	--see Mike D's "Programming With Threads", linked from lab T

	    --You are required to follow this document

	    --You will lose points (potentially many!) on the lab and on
	    the exam if you stray from these standards

	    --Note that in his example in section 4, there needs to be
	    another line:

		--right before mutex->release(), he should have:
		    assert(invariants hold)

    C. Reflections

	--Number one piece of advice: SAFETY FIRST.

	    --Locking at coarse grain is easiest to get right, so do
	    that (one big lock for each big object or collection of
	    them)
	    
	    --Don't worry about performance at first

	    --In fact, don't even worry about liveness at first
	    
		--In other words, don't view deadlock as a disaster

	    --Key invariant: make sure your program never does the wrong thing