Class 10
CS372H
17 February 2011

(One handout)

On the board
------------

1. Last time
2. Spinlocks and mutexes
3. Condition variables
    --Motivation
    --Usage
4. Semaphores
5. Monitors
6. Standards/advice

---------------------------------------------------------------------------

1. Last time

    --want: lock()/unlock() or enter()/leave() or
    acquire()/release()

	--lots of names for the same idea

	--mutex_init(mutex_t* m), mutex_lock(mutex_t* m),
	mutex_unlock(mutex_t* m),....

	--pthread_mutex_init(), pthread_mutex_lock(), ...

    --in each case, the semantics are that while one thread of
    execution is inside the critical section, no other thread of
    execution can be executing there

    --How to implement locks/mutexes/etc.?
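    For concreteness, here is what using such a lock looks like with
    POSIX threads. pthread_mutex_lock()/pthread_mutex_unlock() are the
    real pthreads calls; counter, incrementer(), and
    run_two_incrementers() are made-up names for illustration, not
    code from the handout. Two threads increment a shared counter, and
    because each increment is inside the critical section, no update
    is lost. A minimal sketch; compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>

#define NITERS 100000

/* Shared state and the mutex that protects it. */
static long counter = 0;
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *incrementer(void *arg) {
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&counter_mutex);   /* enter critical section */
        counter++;                            /* read-modify-write of shared state */
        pthread_mutex_unlock(&counter_mutex); /* leave critical section */
    }
    return NULL;
}

/* Runs two incrementer threads to completion and returns the final count. */
static long run_two_incrementers(void) {
    pthread_t t1, t2;
    counter = 0;
    int rc1 = pthread_create(&t1, NULL, incrementer, NULL);
    int rc2 = pthread_create(&t2, NULL, incrementer, NULL);
    assert(rc1 == 0 && rc2 == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;   /* safe to read: both threads have been joined */
}
```

    Without the lock/unlock pair, the result would usually come out
    below 2 * NITERS, because concurrent increments can overwrite each
    other.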

2. Spinlocks and mutexes

    [for handout 2(b), draw picture of two CPUs, memory cell, and atomic
    exchange.]

    --Spinlocks

	--Fine for quick operations in kernel

	--Not good in user space or even for waiting for long periods of
	time in kernel

	--Question: why not use spinlocks for access to disk drive?
	    --answer: wastes CPU

	--note: it's unavoidable that we need hardware support because
	at the lowest level, we're trying to decide which particular
	thread is doing something first

    --Mutexes

	[for handout 2(c), draw same picture as above but with fields of
	Mutex in memory]

	--mutexes get implemented in terms of a lower-level lock

	--so we again have the pattern that the lower-level lock is
	protecting a list (more generally, data structure). here, the
	list (more generally, data structure) is the queue of threads
	waiting for the mutex (more generally, all of the fields of the
	mutex).

	--how to implement the lower-level lock?

	    --turn off interrupts

		--only works on uni-processor system

		--on such a uni-processor system, it would be valid to
		replace spinlock.acquire() with "disable interrupts" and
		spinlock.release() with "enable interrupts"

	    --spinlocks, which rely on hardware-level synchronization:
		for example, xchg ....

    --Notes about interrupts
    
	--As stated above, on a single CPU machine, could replace
	spinlock.acquire() with "disable interrupts" and 
	spinlock.release() with "enable interrupts"

	--On any machine (single or multiple CPUs), spinlocks themselves
	need to run with interrupts off. Why?

	    --consider memory shared between an interrupt handler and a
	    thread-inside-kernel (e.g., interrupt handler enqueues I/O
	    events and inside-kernel thread handles those events).

	    --interrupt handlers do not themselves get interrupted. So
	    if acquiring spinlock *didn't* disable interrupts, we could
	    get the following:
		
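		--thread has spinlock. interrupts enabled. interrupt
		happens. interrupt routine tries to acquire spinlock; spins
		forever, because the interrupted thread (which holds the
		spinlock) can't resume on that CPU until the handler
		returns --> machine wedged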
	   
	    --solution: turn off interrupts before trying to acquire
	    spinlock: spinlock.acquire() "pushes" interrupts (saves
	    their current state, then disables them), and
	    spinlock.release() "pops" interrupts (restores the
	    previously saved state).

	--Perhaps confusingly, "interrupt" here should be viewed
	*abstractly*:

	    --certainly if the code is executing inside the kernel,
	    disabling and enabling interrupts means "turning off the
	    processor's interrupts"

	    --but if we're talking about a preemptive user-level
	    threading package, then "interrupt" might just mean a
	    non-deterministic timer signal that invokes the thread
	    scheduler. in that case, "turning off interrupts" could mean
	    "deregistering for the signal from the timer that would
	    otherwise invoke the run-time" or else "record the fact that
	    signals were delivered but don't act on that fact until
	    'interrupts' are reenabled".
    
    --Review how we got here:

	--to deal with concurrency, need atomic operations

	--atomic operations ultimately require hardware support

	    --single CPU: turning off interrupts is sometimes enough

	    --multiple CPUs: use special hardware instructions

		--different options on different architectures

		--test_and_set() very common
		
		--on the x86, one uses xchg to implement test_and_set()
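    A minimal user-space spinlock built on test_and_set(), which is in
    turn built on C11 atomic_exchange() (on x86, compilers typically
    implement that exchange with xchg). struct spinlock, spin_init(),
    spin_acquire(), spin_release(), and worker() are illustrative
    names. This sketch does not disable interrupts; a kernel spinlock
    would also push/pop interrupts as described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct spinlock {
    atomic_int locked;   /* 0 = free, 1 = held */
};

/* Atomically set *p to 1 and return its previous value. */
static int test_and_set(atomic_int *p) {
    return atomic_exchange(p, 1);
}

static void spin_init(struct spinlock *l) {
    atomic_init(&l->locked, 0);
}

static void spin_acquire(struct spinlock *l) {
    /* Old value 1 means someone else holds the lock: keep spinning. */
    while (test_and_set(&l->locked) == 1)
        ;
}

static void spin_release(struct spinlock *l) {
    assert(atomic_load(&l->locked) == 1);   /* must be held to release */
    atomic_store(&l->locked, 0);
}

/* Demo: two threads increment a shared counter under the spinlock. */
static struct spinlock lock;
static long shared = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_acquire(&lock);
        shared++;
        spin_release(&lock);
    }
    return NULL;
}
```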


3. Condition variables

    A. Motivation

	--producer/consumer queue 

	    --very common paradigm. also called "bounded buffer":

		--producer puts things into a shared buffer
		--consumer takes them out
		--producer must wait if buffer is full; consumer must
		  wait if buffer is empty
		--shows up everywhere
		    --Soda machine: producer is delivery person, consumer
			is soda drinkers, shared buffer is the machine
		    --DMA buffers

	--producer/consumer queue using mutexes (4b on handout)

	    --what's the problem with 4b?

	    --answer: a form of busy waiting. not quite as bad as
	    spinlock, but the pattern is similar: a thread keeps
	    re-checking a condition, (count == BUFFER_SIZE) for the
	    producer or (count == 0) for the consumer, until that
	    condition becomes false.

	--It is convenient to break synchronization into two types:
	    --*mutual exclusion*: allow only one thread to access a given
	    set of shared state at a time
	    --*scheduling constraints*: wait for some other thread to do
	    something (finish a job, produce work, consume work, accept
	    a connection, get bytes off the disk, etc.)
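    To make the busy-waiting problem concrete, here is a sketch of a
    mutex-only bounded buffer in the spirit of 4b (not the handout's
    code; busy_put(), busy_get(), and the globals are hypothetical
    names). Each call loops: acquire the mutex, check the condition,
    release, try again. It is correct, but a producer facing a full
    buffer burns CPU re-acquiring the mutex instead of sleeping until a
    consumer makes room.

```c
#include <assert.h>
#include <pthread.h>

#define BUFFER_SIZE 4

static int buf[BUFFER_SIZE];
static int count = 0, in = 0, out = 0;
static pthread_mutex_t bw_mutex = PTHREAD_MUTEX_INITIALIZER;

static void busy_put(int item) {
    for (;;) {
        pthread_mutex_lock(&bw_mutex);
        if (count < BUFFER_SIZE) {         /* room available: do the work */
            buf[in] = item;
            in = (in + 1) % BUFFER_SIZE;
            count++;
            assert(count <= BUFFER_SIZE);
            pthread_mutex_unlock(&bw_mutex);
            return;
        }
        pthread_mutex_unlock(&bw_mutex);   /* full: release and check again */
    }
}

static int busy_get(void) {
    for (;;) {
        pthread_mutex_lock(&bw_mutex);
        if (count > 0) {                   /* item available: take it */
            int item = buf[out];
            out = (out + 1) % BUFFER_SIZE;
            count--;
            pthread_mutex_unlock(&bw_mutex);
            return item;
        }
        pthread_mutex_unlock(&bw_mutex);   /* empty: release and check again */
    }
}
```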

    B. Usage

	--API

	    --void cond_init (Cond *, ...); 
		--Initialize

	    --void cond_wait(Cond *c, Mutex* m);
		--Atomically unlock m and sleep until c signaled 
		--Then re-acquire m and resume executing 

	    --void cond_signal(Cond* c);
		--Wake one thread waiting on c
		[in some pthreads implementations, the analogous
		call wakes *at least* one thread waiting on c. Check the
		documentation (or source code) to be sure of the
		semantics. But, actually, your implementation shouldn't
		change since you need to be prepared to be "woken" at
		any time, not just when another thread calls signal().
		More on this below.]

	    --void cond_broadcast(Cond* c);
		--Wake all threads waiting on c

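	--QUESTION: Why must cond_wait release the mutex and sleep
	*atomically*? (see handout)

	    --Answer: otherwise a thread can miss a signal and get stuck
	    waiting. Suppose cond_wait released the mutex and then slept
	    as two separate steps: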

		Producer: while (count == BUFFER_SIZE)
		Producer: release()
		Consumer: acquire()
		Consumer: .....
		Consumer: cond_signal(&nonfull)
		Producer: cond_wait(&nonfull)

	    --Producer will never hear the signal!

	--QUESTION: Why not use "if"? (Why use "while"?)

	    --Answer: we can get an interleaving like this:

		--The signal() puts the waiting thread on the ready list
		but doesn't run it

		--That now-ready thread is ready to acquire() the mutex
		(inside cond_wait()).

		--But a *different* thread (a third thread: not the
		signaler, not the now-ready thread) could acquire() the
		mutex, work in the critical section, and thereby
		invalidate whatever condition was being checked

		--Our now-ready thread eventually acquire()s the mutex...

		--...with no guarantees that the condition it was
		waiting for is still true
		
	    --Solution is to use "while" when waiting on a condition
	    variable

	    --DO NOT VIOLATE THIS RULE; doing so will (almost always)
	    lead to incorrect code
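    A sketch of the bounded buffer with a mutex and two condition
    variables, using real pthreads calls. struct buffer, buffer_put(),
    buffer_get(), producer(), and consumer() are our own names, loosely
    modeled on the handout's MyBuffer rather than copied from it. Note
    the "while" around every pthread_cond_wait(), and that all
    synchronization lives inside the buffer object: the thread loops
    never touch a lock or condition variable.

```c
#include <assert.h>
#include <pthread.h>

#define BUFFER_SIZE 4
#define NITEMS 1000

/* The shared object: state, mutex, and CVs live together. */
struct buffer {
    int items[BUFFER_SIZE];
    int count, in, out;
    pthread_mutex_t mutex;
    pthread_cond_t nonfull;    /* waited on while count == BUFFER_SIZE */
    pthread_cond_t nonempty;   /* waited on while count == 0 */
};

static void buffer_init(struct buffer *b) {
    b->count = b->in = b->out = 0;
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->nonfull, NULL);
    pthread_cond_init(&b->nonempty, NULL);
}

static void buffer_put(struct buffer *b, int item) {
    pthread_mutex_lock(&b->mutex);
    while (b->count == BUFFER_SIZE)            /* "while", never "if" */
        pthread_cond_wait(&b->nonfull, &b->mutex);
    assert(b->count < BUFFER_SIZE);            /* guaranteed by the loop */
    b->items[b->in] = item;
    b->in = (b->in + 1) % BUFFER_SIZE;
    b->count++;
    pthread_cond_signal(&b->nonempty);         /* while holding the mutex */
    pthread_mutex_unlock(&b->mutex);
}

static int buffer_get(struct buffer *b) {
    pthread_mutex_lock(&b->mutex);
    while (b->count == 0)
        pthread_cond_wait(&b->nonempty, &b->mutex);
    int item = b->items[b->out];
    b->out = (b->out + 1) % BUFFER_SIZE;
    b->count--;
    pthread_cond_signal(&b->nonfull);
    pthread_mutex_unlock(&b->mutex);
    return item;
}

/* Thread code: main loops only; no locks or CVs appear here. */
static struct buffer shared_buf;
static long consumed_sum = 0;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= NITEMS; i++)
        buffer_put(&shared_buf, i);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    long sum = 0;
    for (int i = 0; i < NITEMS; i++)
        sum += buffer_get(&shared_buf);
    consumed_sum = sum;   /* read by the creator only after pthread_join */
    return NULL;
}
```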

---------------------------------------------------------------------------

ADMIN 

--when do people want midterm review: Mon or Tue the week of the midterm?

--end of class: survey + handing back your labs

---------------------------------------------------------------------------

4. Semaphores

    --Don't use these. We're mentioning them only for completeness and
    for historical reasons: they were the first general-purpose
    synchronization primitive, and they were the first synchronization
    primitive that Unix supported.

    --Introduced by Edsger Dijkstra in late 1960s
    
	--Dijkstra was a highly notable figure in computer science who
	spent the latter part of his career here at UT

    --Semaphore is initialized with an integer, N

    --Two functions:
	--Down() and Up() [also known as P() and V()]
	--The guarantee is that, at any point, the number of Down()
	calls that have returned is at most N plus the number of Up()
	calls made so far
	--Basically a counter that, when it reaches 0, causes the next
	thread that calls Down() to sleep()

    --Another way to say the same thing:
	--Semaphore holds a count
	--Down() is an atomic operation that waits for the count to
	become positive; it then decrements the count by 1
	--Up() is an atomic operation that increments the count by 1 and
	then wakes up a thread waiting on Down(), if any
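    Purely to pin down the Down()/Up() semantics above (not as a
    recommendation to use semaphores), here is a sketch of a semaphore
    built from a mutex and a condition variable. struct semaphore,
    sema_init(), sema_down() (P), sema_up() (V), and up_thread() are
    hypothetical names chosen to avoid clashing with POSIX's sem_*
    functions.

```c
#include <assert.h>
#include <pthread.h>

struct semaphore {
    int count;                 /* never negative */
    pthread_mutex_t mutex;     /* protects count */
    pthread_cond_t positive;   /* signaled when count may have become > 0 */
};

static void sema_init(struct semaphore *s, int n) {
    assert(n >= 0);
    s->count = n;
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->positive, NULL);
}

/* Down() / P(): wait for count > 0, then decrement. */
static void sema_down(struct semaphore *s) {
    pthread_mutex_lock(&s->mutex);
    while (s->count == 0)
        pthread_cond_wait(&s->positive, &s->mutex);
    s->count--;
    pthread_mutex_unlock(&s->mutex);
}

/* Up() / V(): increment, then wake a waiter (if any). */
static void sema_up(struct semaphore *s) {
    pthread_mutex_lock(&s->mutex);
    s->count++;
    pthread_cond_signal(&s->positive);
    pthread_mutex_unlock(&s->mutex);
}

/* Demo helper: performs one Up() from another thread. */
static void *up_thread(void *arg) {
    sema_up((struct semaphore *)arg);
    return NULL;
}
```

    SIGNAL suffices in sema_up() because each Up() makes room for
    exactly one Down().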

    --Don't use these! (Notice that Andrew Birrell [who is a Threading
    Ninja] doesn't even mention them in his paper.)

    --Problems:
	--semaphores are dual-purpose (for mutual exclusion and
	scheduling constraints), so hard to read code and hard to get
	code right
	--semaphores have hidden internal state
	--getting a program right requires careful interleaving of
	"synchronization" and "mutex" semaphores

5. Monitors

    --High-level idea: an object (as in object-oriented systems)
	
	--in which methods do not execute concurrently; and

	--that has one or more condition variables

    --More detail
    
	--Every method call starts with acquire(&mutex), and ends with
	release(&mutex)

	--Technically, these acquire()/release() are invisible to the
	programmer because it is the programming language (i.e., the
	compiler+run-time) that is implementing the monitor

	    --So, technically, a monitor is a programming language
	    concept

	    --Book follows this technical definition
	    
	    --But technical definition isn't hugely useful because no
	    programming languages in widespread usage have true monitors

	    --Java has something close: a class in which every method is
	    "synchronized" (i.e., implicitly protected by a mutex)

		--Not exactly a monitor because there's nothing forcing
		every method to be synchronized

	    --And we can *use* mutexes and condition variables to
	    implement our own manual versions of monitors, though we
	    have to be careful
	
	--Given the above, we are going to use the term "monitor" more
	loosely to refer to both the technical definition and also a
	"manually constructed" monitor, wherein:
	
	    --all method calls are protected by a mutex (that is, the
	    programmer inserts those acquire()/release() on entry and
	    exit from every procedure *inside* the object)
	    
	    --synchronization happens with condition variables whose
	    associated mutex is the mutex that protects the method calls

	--In other words, we will use the term "monitor" to refer to the
	programming conventions that you should follow when building
	multithreaded applications

	    --you must follow these conventions on lab T

    --Example: see handout, #5

    --RULE:
    
	--acquire/release at beginning/end of functions

    --RULE:

	--hold lock when doing condition variable operations

	--Some (e.g., Birrell) will say: "for experts only, no need to
	hold the lock when signaling". IGNORE THIS. Putting the signal
	outside the lock is only a small performance optimization, and
	it is likely to lead you to write incorrect code.
	
	--to get credit in Lab T, you must hold the associated mutex
	when doing a condition variable operation

    --Different styles of monitors:

	--Hoare-style: signal() immediately wakes the waiter

	--What the book calls Hansen-style: signal() required to be last
	statement in a procedure

	--What everyone else calls Hansen-style and what we will use:
	signal() eventually wakes the waiter. Not an immediate transfer

    --Can we replace SIGNAL with BROADCAST, given our monitor semantics?
     (Answer: yes, always.) Why?

	--while() condition tests the needed invariant. program
	doesn't progress past while() unless the needed invariant is
	true.

	--result: spurious wake-ups are acceptable....

	--...which implies you can always wake up a thread at any
	moment with no loss of correctness....

	--....which implies you can replace SIGNAL with BROADCAST
	[though it may hurt performance to have a bunch of
	needlessly awake threads contending for a mutex that they
	will then acquire() and release().]

    --RULE:

	--a thread that is in wait() must be prepared to be restarted at
	any time, not just when another thread calls "signal()".

	--why? because the implementor of the threads and condition
	variables package *assumes* that the user of the threads package
	is doing while(){wait()}.

    --Can we replace BROADCAST with SIGNAL?

	--Answer: not always. 

	--Example:
	    --memory allocator
	    --threads allocate and free memory in variable-sized chunks
	    --if no memory free, wait on a condition variable
	    --now posit:
		--two threads waiting to allocate chunks of memory
		--no memory free at all
		--then, a third thread frees 10,000 bytes
	    --SIGNAL alone does the wrong thing: we need to awaken both
	    threads
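    The allocator example as a sketch: the "allocator" only tracks a
    byte count (it hands out no real memory), and struct allocator,
    alloc_mem(), free_mem(), heap, and want_4000() are hypothetical
    names. Because one free can satisfy several waiters, and the
    freeing thread doesn't know which ones, free_mem() broadcasts; each
    woken thread re-checks its own request in its while loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct allocator {
    size_t free_bytes;       /* bytes currently available */
    pthread_mutex_t mutex;
    pthread_cond_t freed;    /* broadcast whenever memory is returned */
};

static void alloc_init(struct allocator *a, size_t initial) {
    a->free_bytes = initial;
    pthread_mutex_init(&a->mutex, NULL);
    pthread_cond_init(&a->freed, NULL);
}

static void alloc_mem(struct allocator *a, size_t n) {
    pthread_mutex_lock(&a->mutex);
    while (a->free_bytes < n)               /* each waiter checks its own size */
        pthread_cond_wait(&a->freed, &a->mutex);
    assert(a->free_bytes >= n);             /* guaranteed by the loop */
    a->free_bytes -= n;
    pthread_mutex_unlock(&a->mutex);
}

static void free_mem(struct allocator *a, size_t n) {
    pthread_mutex_lock(&a->mutex);
    a->free_bytes += n;
    /* BROADCAST, not SIGNAL: this free may satisfy several waiters,
       and we can't tell which ones, so wake them all to re-check. */
    pthread_cond_broadcast(&a->freed);
    pthread_mutex_unlock(&a->mutex);
}

/* Demo helper: a thread that needs 4,000 bytes. */
static struct allocator heap;

static void *want_4000(void *arg) {
    (void)arg;
    alloc_mem(&heap, 4000);
    return NULL;
}
```

    With two threads each waiting for 4,000 bytes and a third freeing
    10,000, both waiters proceed. Had free_mem() used
    pthread_cond_signal() while both were waiting, one of them could
    stay asleep indefinitely.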

6. Advice and standards for concurrent programming

    A. Standards

	--see Mike D's "Programming With Threads", linked from lab T

	    --You are required to follow this document

	    --You will lose points (potentially many!) on the lab and on
	    the exam if you stray from these standards

	    --Note that in his example in section 4, there needs to be
	    another line:

		--right before mutex->release(), he should have:
		    assert(invariants hold)
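    Where that assert goes, sketched with a hypothetical bounded
    counter monitor (struct bcounter, check_invariants(), and the
    bcounter_* functions are illustrative names, not from Mike D's
    document). Invariants are checked right before the mutex is
    released, and also right before pthread_cond_wait(), since waiting
    releases the mutex too.

```c
#include <assert.h>
#include <pthread.h>

struct bcounter {
    int value, max;            /* invariant: 0 <= value <= max */
    pthread_mutex_t mutex;
    pthread_cond_t changed;    /* one CV for both directions, so broadcast */
};

static void check_invariants(const struct bcounter *c) {
    assert(0 <= c->value && c->value <= c->max);
}

static void bcounter_init(struct bcounter *c, int max) {
    c->value = 0;
    c->max = max;
    pthread_mutex_init(&c->mutex, NULL);
    pthread_cond_init(&c->changed, NULL);
}

static void bcounter_inc(struct bcounter *c) {
    pthread_mutex_lock(&c->mutex);
    while (c->value == c->max) {
        check_invariants(c);   /* waiting releases the mutex: check here too */
        pthread_cond_wait(&c->changed, &c->mutex);
    }
    c->value++;
    pthread_cond_broadcast(&c->changed);
    check_invariants(c);       /* the extra line: right before release */
    pthread_mutex_unlock(&c->mutex);
}

static void bcounter_dec(struct bcounter *c) {
    pthread_mutex_lock(&c->mutex);
    while (c->value == 0) {
        check_invariants(c);
        pthread_cond_wait(&c->changed, &c->mutex);
    }
    c->value--;
    pthread_cond_broadcast(&c->changed);
    check_invariants(c);
    pthread_mutex_unlock(&c->mutex);
}
```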

    B. Top-level piece of advice: SAFETY FIRST.

	--Locking at coarse grain is easiest to get right, so do
	that (one big lock for each big object or collection of
	them)
	
	--Don't worry about performance at first

	--In fact, don't even worry about liveness at first
	
	    --In other words, don't view deadlock as a disaster

	--Key invariant: make sure your program never does the wrong thing

    C. More detailed advice: design approach

	[We will use item #5 on handout as a case study.....]

	--Here's a four-step design approach:
	
	    1. Getting started:
	     
		 1a. Identify units of concurrency. Make each a thread with
		 a go() method or main loop. Write down the actions a thread
		 takes at a high level.  
		 
		 1b. Identify shared chunks of state. Make each shared
		 *thing* an object. Identify the methods on those objects,
		 which should be the high-level actions made *by* threads
		 *on* these objects. Plan to have these objects be monitors.
		 
		 1c. Write down the high-level main loop of each thread. 
	     
	    Advice: stay high level here. Don't worry about synchronization 
	    yet. Let the objects do the work for you. 
	     
	    Separate threads from objects. The code associated with a
	    thread should not access shared state directly (and so there
	    should be no access to locks/condition variables in the
	    "main" procedure for the thread). Shared state and
	    synchronization should be encapsulated in shared objects. 

	    --QUESTION: how does this apply to the example on the
	    handout?
		--separate loops for producer(),consumer(), and
		synchronization happens inside MyBuffer.
	     
	    Now, for each object: 
	     
	    2. Write down the synchronization constraints on the
	    solution. Identify the type of each constraint: mutual
	    exclusion or scheduling. For scheduling constraints, ask,
	    "when does a thread wait"?

		--NOTE: usually, the mutual exclusion constraint is
		satisfied by the fact that we're programming with
		monitors.

		--QUESTION: how does this apply to the example on the
		handout?
		    --Only one thread can manipulate the buffer at a time
		    (mutual exclusion constraint)
		    --Producer must wait for consumer to empty slots if all
		    full (scheduling constraint)
		    --Consumer must wait for producer to fill buffers if all
		    empty (scheduling constraint)

	    3. Create a lock or condition variable corresponding to each 
	    constraint 

		--QUESTION: how does this apply to the example on the
		handout?
		    --Answer: need a lock and two condition variables.
		    But lock was sort of a given from the monitor.
	     
	    4. Write the methods, using locks and condition variables for 
	    coordination  
	

    D. More advice

	1. Don't manipulate synchronization variables or shared state
	variables in the code associated with a thread; do it with the
	code associated with a shared object.  
      
	    --Threads tend to have "main" loops. These loops tend to
	    access shared objects. *However*, the "thread" piece of it
	    should not include locks or condition variables. Instead,
	    locks and CVs should be encapsulated in the shared objects.

	    --Why?

		(a) Locks are for synchronizing across multiple threads.  Doesn't 
		make sense for one thread to "own" a lock.

		(b) Encapsulation -- details of synchronization are
		internal details of a shared object. Caller should not
		know about these details.  "Let the shared objects do
		the work."

	    --Common confusion: trying to acquire and release locks
	    inside the threads' code (i.e., not following this advice).
	    Bad idea! Synchronization should happen within the shared
	    objects. Mantra: "let the shared objects do the work".
	
	    --Note: our first example of condition variables -- 4c on
	    today's handout -- doesn't actually follow the advice, but
	    that is in part so you can see a full example.  

	2. Different way to state what's above:
	
	    --You want to decompose your problem into objects, as in
	    object-oriented style of programming.

	    --Thus:

		(1) Shared object encapsulates code, synchronization
		    variables, and state variables

		(2) Shared objects are separate from threads

	    --Warning: most examples in the book talk about "thread 1's
	    code" and "thread 2's code", etc. This is b/c most of the
	    "classic" problems were studied before OO programming was
	    widespread.