Class 4
CS 439
29 January 2013

On the board
------------
1. Last time
2. Intro to concurrency, cont'd.
3. Managing concurrency: provide atomicity
4. Managing concurrency: protect critical sections
5. Mutexes
6. Condition variables

---------------------------------------------------------------------------

1. Last time

    --process state
    --shell
    --threads

        Threads can be preemptive or not, and implemented by the kernel
        (so the kernel schedules threads) or in a user library (so the
        user library schedules threads).

        Under user-level threads, the kernel has no idea that these
        threads exist. Under kernel-level threads, the kernel knows
        about threads. [These bring different trade-offs.]

        For now, assume preemptive. If your code is safe in a preemptive
        context, it will also be safe in a non-preemptive (known as
        cooperative) context.

    --intro to concurrency

2. Introduction to concurrency

    --what is concurrency?
        --stuff happening at the same time
    --sources of concurrency
        --computers have multiple CPUs and common memory, so instructions
          in multiple threads can happen at the same time!
        --on a single CPU, processes/threads can have their instructions
          interleaved (helpful to regard the instructions in multiple
          threads as "happening at the same time")
        --interrupts (CPU was doing one thing; now it's doing another)
    --why is concurrency hard?

        *** Hard to reason about all possible interleavings

        [we saw this last time]

        cover #3 from last week's handout.

        all of these are called race conditions; not all of them are
        errors, though

        --the worst part of errors from race conditions is that a
          program may work fine most of the time but only occasionally
          show problems. why? (because the instructions of the various
          threads or processes or whatever get interleaved in a
          non-deterministic order.)
        --and it's worse than that, because inserting debugging code may
          change the timing so that the bug doesn't show up

    [pick up where we left off]

        *** Hardware can make our life harder

        [look at the panels; what are the correct answers?]
        [answer: "it depends on the hardware"]

        --sequential consistency is not always in effect
        --sequential consistency means:
            --maintain program order on individual processors
            --ensure that writes to each memory location (viewed
              separately) happen in the order that they are issued

3. Managing concurrency: provide atomicity

    --first attempt to deal with race conditions: make the needed
      operations _atomic_.

      Concept of _atomic operation_: the whole thing has to be executed
      together; the rest of the system can observe the state before and
      after the atomic operation, but never during it.

    --how?

    A. A single-instruction add?

        'count' is in memory (that is what the example in #4
        stipulates). Assume that %ecx holds the address of 'count'.

        --Then, can we use the x86 instruction addl? For instance:

            addl $1, (%ecx)     ; count++

        --So it looks like we can implement count++/-- with one
          instruction?
        --So we're safe?
        --No: not atomic on a multiprocessor!
            --We would experience the same race condition at the
              hardware level

    B. Critical sections

        --Place accesses of count (reads, increments, decrements) in a
          _critical section_
        --Protect critical sections from concurrent execution
        --Now we need a solution to the _critical section_ problem
        --The solution must satisfy 3 rules:

            1. mutual exclusion
                only one thread can be in the c.s. at a time
            2. progress
                if no thread is executing in the c.s., one of the
                threads trying to enter a given c.s. will eventually
                get in
            3. bounded waiting
                once a thread T starts trying to enter the critical
                section, there is a bound on the number of other
                threads that may enter the critical section before T
                enters

        --Note progress vs. bounded waiting:
            --If no thread can enter the c.s., we don't have progress.
            --If thread A is waiting to enter the c.s. while B
              repeatedly leaves and re-enters the c.s.
              ad infinitum, we don't have bounded waiting.

    --The gameboard is that we're now going to examine primitives to
      protect critical sections.

4. Managing concurrency: protect critical sections

    --Peterson's algorithm....
        --see book
        --*if* there is sequential consistency, then Peterson's
          algorithm satisfies mutual exclusion, progress, and bounded
          waiting
        --But it is expensive and not encapsulated

    --High-level:
        --want: lock()/unlock() or enter()/leave() or
          acquire()/release()
            --lots of names for the same idea
        --mutex_init(mutex_t* m), mutex_lock(mutex_t* m),
          mutex_unlock(mutex_t* m), ....
        --pthread_mutex_init(), pthread_mutex_lock(), ...
        --in each case, the semantics are that once a thread of
          execution is executing inside the critical section, no other
          thread of execution is executing there

    --Two questions we must answer:
        --How should we *use* locks, mutexes, etc.?
        --How do we build them?
    --In this unit of the course, we focus on their use. Later, we
      return to their implementation.

---------------------------------------------------------------------------

lab due
project partners due

---------------------------------------------------------------------------

5. Examples of using mutexes

6. Condition variables

    A. Motivation

        --producer/consumer queue
            --very common paradigm. also called "bounded buffer":
                --producer puts things into a shared buffer
                --consumer takes them out
                --producer must wait if the buffer is full; consumer
                  must wait if the buffer is empty
            --shows up everywhere
                --Soda machine: producer is the delivery person,
                  consumers are the soda drinkers, shared buffer is the
                  machine
                --DMA buffers

        --producer/consumer queue using mutexes (see handout, 4a)
        --what's the problem with that?
            --answer: a form of busy waiting.
              # [this next line will make more sense after we study
              # spinlocks]
              not quite as bad as a spinlock, but the pattern is
              similar: the thread keeps checking a condition,
              (count == BUFFER_SIZE) or (count == 0), until the
              respective condition is true.
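The busy-waiting pattern just described can be sketched as follows. This
is not the handout's code; it is a minimal illustration, and the buffer
layout and the names produce/consume are invented for the example. The
point is the retry loop: lock, check the condition, unlock, and try
again.

```c
#include <assert.h>
#include <pthread.h>

#define BUFFER_SIZE 4

/* Hypothetical bounded buffer protected by a single mutex. */
static int buffer[BUFFER_SIZE];
static int count = 0, in = 0, out = 0;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void produce(int item) {
    for (;;) {
        pthread_mutex_lock(&m);
        if (count < BUFFER_SIZE) {
            buffer[in] = item;
            in = (in + 1) % BUFFER_SIZE;
            count++;
            pthread_mutex_unlock(&m);
            return;
        }
        /* Buffer full: release the lock and immediately re-check.
           This repeated checking is the busy waiting the notes warn
           about -- the thread burns CPU until the condition changes. */
        pthread_mutex_unlock(&m);
    }
}

int consume(void) {
    for (;;) {
        pthread_mutex_lock(&m);
        if (count > 0) {
            int item = buffer[out];
            out = (out + 1) % BUFFER_SIZE;
            count--;
            pthread_mutex_unlock(&m);
            return item;
        }
        /* Buffer empty: same busy-wait pattern as the producer. */
        pthread_mutex_unlock(&m);
    }
}
```

The mutex keeps each check-and-update atomic, so the code is correct;
the problem is purely one of efficiency, which condition variables
(below) fix by sleeping instead of re-checking.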
    --It is convenient to break synchronization into two types:
        --*mutual exclusion*: allow only one thread to access a given
          set of shared state at a time
        --*scheduling constraints*: wait for some other thread to do
          something (finish a job, produce work, consume work, accept a
          connection, get bytes off the disk, etc.)

    B. Usage

        --API
            --void cond_init(Cond*, ...);
                --Initialize
            --void cond_wait(Cond* c, Mutex* m);
                --Atomically unlock m and sleep until c is signaled
                --Then re-acquire m and resume executing
            --void cond_signal(Cond* c);
                --Wake one thread waiting on c
                  [in some pthreads implementations, the analogous call
                  wakes *at least* one thread waiting on c. Check the
                  documentation (or source code) to be sure of the
                  semantics. But, actually, your implementation
                  shouldn't change, since you need to be prepared to be
                  "woken" at any time, not just when another thread
                  calls signal(). More on this below.]
            --void cond_broadcast(Cond* c);
                --Wake all threads waiting on c

        --QUESTION: Why must cond_wait both release the mutex and
          sleep? (see handout, 4b)
            --Answer: otherwise a thread can get stuck waiting.
              Consider this interleaving:

                Producer: while (count == BUFFER_SIZE)
                Producer: release()
                Consumer: acquire()
                Consumer: .....
                Consumer: cond_signal(&nonfull)
                Producer: cond_wait(&nonfull)

            --The producer will never hear the signal!

        --QUESTION: Why not use "if"? (Why use "while"?)
            --Answer: we can get an interleaving like this:
                --The signal() puts the waiting thread on the ready
                  list but doesn't run it
                --That now-ready thread is ready to acquire() the mutex
                  (inside cond_wait()).
                --But a *different* thread (a third thread: not the
                  signaler, not the now-ready thread) could acquire()
                  the mutex, work in the critical section, and
                  invalidate whatever condition was being checked
                --Our now-ready thread eventually acquire()s the
                  mutex...
                --...with no guarantee that the condition it was
                  waiting for is still true
            --Solution: always use "while" when waiting on a condition
              variable
            --DO NOT VIOLATE THIS RULE; doing so will (almost always)
              lead to incorrect code

7. Semaphores

    --Don't use these. We're mentioning them only for completeness and
      for historical reasons: they were the first general-purpose
      synchronization primitive, and they were the first
      synchronization primitive that Unix supported.
    --Introduced by Edsger Dijkstra in the late 1960s
        --Dijkstra was a highly notable figure in computer science who
          spent the latter part of his career here at UT
    --A semaphore is initialized with an integer, N
    --Two functions:
        --Down() and Up() [also known as P() and V()]
        --The guarantee is that Down() will return only N more times
          than Up() is called
        --Basically a counter that, when it reaches 0, causes a thread
          to sleep
    --Another way to say the same thing:
        --The semaphore holds a count
        --Down() is an atomic operation that waits for the count to
          become positive; it then decrements the count by 1
        --Up() is an atomic operation that increments the count by 1
          and then wakes up a thread waiting on Down(), if any
    --Don't use these!
        # (Notice that Andrew Birrell [who is a Threading
        # Ninja] doesn't even mention them in his paper.)
    --Problems:
        --semaphores are dual-purpose (used for both mutual exclusion
          and scheduling constraints), so the code is hard to read and
          hard to get right
        --semaphores have hidden internal state
        --getting a program right requires careful interleaving of
          "synchronization" and "mutex" semaphores
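To tie the semaphore semantics above back to the condition-variable
rules, here is one common way to sketch a counting semaphore out of a
mutex and a condition variable. This is an illustrative sketch, not a
real library's implementation; the type and function names (sema_t,
sema_down, sema_up) are invented. Note the "while", never "if", around
cond_wait.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical counting semaphore built from a mutex + condvar. */
typedef struct {
    pthread_mutex_t m;
    pthread_cond_t  c;
    int count;      /* the semaphore's hidden internal state */
} sema_t;

void sema_init(sema_t *s, int n) {
    pthread_mutex_init(&s->m, NULL);
    pthread_cond_init(&s->c, NULL);
    s->count = n;
}

/* Down()/P(): wait for the count to become positive, then decrement. */
void sema_down(sema_t *s) {
    pthread_mutex_lock(&s->m);
    while (s->count == 0)               /* "while", never "if" */
        pthread_cond_wait(&s->c, &s->m);
    s->count--;
    pthread_mutex_unlock(&s->m);
}

/* Up()/V(): increment the count and wake one waiter, if any. */
void sema_up(sema_t *s) {
    pthread_mutex_lock(&s->m);
    s->count++;
    pthread_cond_signal(&s->c);
    pthread_mutex_unlock(&s->m);
}
```

With this construction, Down() returns exactly N more times than Up()
is called, matching the guarantee stated above. It also makes the
"hidden internal state" complaint concrete: callers see only Down()
and Up(), never the count.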