Class 9
CS 372H 16 February 2010

On the board
------------

1. Last time

2. Concurrency
   --What is it?
   --What makes it hard?
   --How can we deal with races?

3. Protecting critical sections
   --Peterson
   --Spinlocks
   --Mutexes
   --Turning off interrupts

---------------------------------------------------------------------------

1. Last time

   a. threads

      --new abstraction
      --basic idea: a thread has its own registers (and stack) but not
        its own memory
      --can implement threads in:
         --user space
         --Java virtual machine
         --Flash player
         --kernel
         --etc.
      --two main choices:
         (i). user-level or kernel (but can have N:M threading, in
              which N user-level threads are multiplexed over M kernel
              threads, so the choice is a bit fuzzier)
         (ii). if user-level: non-preemptive (also known as cooperative)
               or preemptive

         [be able to answer: why are kernel threads always preemptive?]

      --*only* the presence of multiple kernel threads can give:
         --true multiprocessing (i.e., different threads running on
           different processors)
         --asynchronous disk I/O using the POSIX interface [because
           read() blocks and causes the *kernel* scheduler to be
           invoked]
      --but many modern operating systems provide interfaces for
        asynchronous disk I/O, at least as an extension
         --Windows
         --Linux has AIO extensions
      --thus, even user-level threads can get asynchronous disk I/O:
        the run-time translates calls that *appear* blocking to the
        thread [e.g., thread_read()] into a series of instructions
        that register interest in an I/O event, put the thread to
        sleep, and switch() to another thread
      --[moral of the story: if you find yourself needing async disk
        I/O from user-level threads, use one of the non-POSIX
        interfaces!]

   b. confusing thing:

      --The kernel itself uses threads internally, when executing in
        kernel mode. Such threads-in-the-kernel are related to, but
        not the same thing as, the kernel threads mentioned above.
      --We'll try to keep these concepts distinct in this class, but
        we may not always succeed.

   c. Note well: the issues with concurrency that we're going to
      discuss are relevant in nearly all of the above cases
      --the one exception is non-preemptive user-level threads; those
        yield only when the programmer calls yield()
      --so you can do clever things (but be careful)

   d. How many people have programmed with threads before?

   e. Questions?

2. Concurrency

   A. What is it?

      --Stuff happening at the same time
      --Arises in many ways
         --pseudo-concurrency: from scheduling
         --real concurrency: multiple processors
      --Examples:
         --multiple kernel threads within a process
         --multiple processes sharing memory
         --what about multiple hosts distributed across a network?
           (conceptually, the issues are the same, but the needed
           mechanisms are different)
      --We're going to treat the issues in general... they apply to
        processes sharing memory pages, kernel threads sharing memory
        spaces, user-level threads that are preemptible, etc.
      --so for the rest of today, we're going to talk about two
        threads, but this could mean:
         --threads inside a single process
         --threads inside the kernel
         --even two separate processes that share memory

   B. What makes it hard?

      --lots of things can go wrong.....
         --we will see others later (deadlock, priority inversion,
           etc.)
         --for now, look at data races....
      --some examples; see handout:
         1a: x = 1 or x = 2.
         1b: x = 13 or x = 25.
         1c: x = 1 or x = 2 or x = 3
         2:  incorrect list structure
         3:  incorrect count in buffer
      --all of these are called *race conditions*
      --the worst part of these errors is that a program may work fine
        most of the time and only occasionally show problems. why?
        (because the instructions of the various threads or processes
        or whatever get interleaved in a non-deterministic order)
      --and it's worse than that: inserting debugging code may change
        the timing so that the bug doesn't show up
        [a concrete sketch of this kind of race follows]
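      --to make the lost-update race concrete, here is a minimal
        sketch (this program, and names like NITERS, are invented for
        illustration; it is not the handout code): two threads
        increment a shared counter with no synchronization, and
        increments get lost:

            /* race.c -- sketch of the count++ race. count++ compiles
               to roughly: load count; add 1; store count. If two
               threads interleave between the load and the store, one
               increment overwrites the other and is lost. */

            #include <pthread.h>
            #include <stdio.h>

            #define NITERS 1000000

            static volatile int count = 0;

            static void *incr(void *arg)
            {
                for (int i = 0; i < NITERS; i++)
                    count++;        /* load / add / store: NOT atomic */
                return NULL;
            }

            int main(void)
            {
                pthread_t t1, t2;
                pthread_create(&t1, NULL, incr, NULL);
                pthread_create(&t2, NULL, incr, NULL);
                pthread_join(t1, NULL);
                pthread_join(t2, NULL);
                /* expected 2*NITERS; most runs print something less */
                printf("count = %d (expected %d)\n", count, 2 * NITERS);
                return 0;
            }

        compile with gcc -pthread; on most runs the printed count is
        visibly less than 2*NITERS, and the shortfall varies from run
        to run, which is exactly the non-determinism described above.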
   C. How can we deal with races?

      --make the needed operations atomic
      --how?

      1. A single-instruction add?
         --E.g., the i386 allows a single-instruction add:
              addl $1, count
         --So implement count++/count-- with one instruction
         --Now are we safe?
            --No: not atomic on a multiprocessor!
            --Will experience exactly the same race condition

      2. How about using the x86 LOCK prefix?
         --can make read-modify-write instructions atomic by preceding
           them with "LOCK". examples of such instructions are: XADD,
           CMPXCHG, INC, DEC, NOT, NEG, ADD, SUB... (when their
           destination operand refers to memory)
         --but using LOCK is very expensive (flushes processor caches)
           and not a "general-purpose abstraction"
            --it applies to only one instruction: what if we need to
              execute three or four instructions as a unit?
            --the compiler won't generate it by default; it assumes
              you don't want the penalty

      3. Critical sections
         --Place count++ and count-- in critical sections
         --Protect critical sections from concurrent execution
         --Now we need a solution to the _critical section_ problem
         --A solution must satisfy 3 rules:

           1. mutual exclusion
                only one thread can be in the c.s. at a time
           2. progress
                if no thread is executing in the c.s., one of the
                threads trying to enter a given c.s. will eventually
                get in
           3. bounded waiting
                once a thread T starts trying to enter the critical
                section, there is a bound on the number of other
                threads that may enter the critical section before T
                enters

         --Note progress vs. bounded waiting:
            --If no thread can enter the c.s., we don't have progress
            --If thread A is waiting to enter the c.s. while B
              repeatedly leaves and re-enters the c.s. ad infinitum,
              we don't have bounded waiting
         --The game plan: we're now going to build primitives to
           protect critical sections

3. Protecting critical sections

   --Peterson's algorithm....
      --see the book
      --does satisfy mutual exclusion, progress, and bounded waiting
      --But expensive and not encapsulated

   --High-level:
      --want: lock()/unlock() or enter()/leave() or
        acquire()/release()
         --lots of names for the same idea
      --mutex_init(mutex_t* m), mutex_lock(mutex_t* m),
        mutex_unlock(mutex_t* m), ....
      --pthread_mutex_init(), pthread_mutex_lock(), ...
      --in each case, the semantics are that once a thread of
        execution is executing inside the critical section, no other
        thread of execution is executing there [a usage sketch
        follows]
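      --here is a minimal usage sketch of those semantics in terms of
        the pthread interface (the code itself is illustrative, not
        from the handout): the count++/count-- race from earlier
        disappears because both operations now run inside the same
        critical section:

            #include <pthread.h>

            static pthread_mutex_t count_mutex =
                PTHREAD_MUTEX_INITIALIZER;
            static int count = 0;

            /* count++ and count-- now execute as critical sections:
               at most one thread at a time is between the lock and
               the unlock. */

            void increment(void)
            {
                pthread_mutex_lock(&count_mutex);   /* enter c.s. */
                count++;
                pthread_mutex_unlock(&count_mutex); /* leave c.s. */
            }

            void decrement(void)
            {
                pthread_mutex_lock(&count_mutex);
                count--;
                pthread_mutex_unlock(&count_mutex);
            }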
   --How to implement locks/mutexes/etc.?

   --Spinlocks
      --Fine for quick operations in the kernel
      --Not good in user space, or even for waiting for long periods
        of time in the kernel
         --Question: why not use spinlocks to wait for access to the
           disk drive?
         --Answer: it wastes CPU
      --note: it's unavoidable that we need hardware support, because
        at the lowest level we're trying to decide which particular
        thread does something first

   --Mutexes
      --these get implemented in terms of a lower-level lock
      --how to implement the lower-level lock?
         --turn off interrupts
            --only works on a uniprocessor system
         --spinlocks, which rely on hardware-level synchronization,
           for example xchg ....

   --How does turning off interrupts relate to all of the above?
      --Answer: on a single-CPU machine, we could replace
        spinlock.acquire() with "disable interrupts" and
        spinlock.release() with "enable interrupts"
      --On a multi-CPU machine, spinlocks themselves need to run with
        interrupts off. Why?
         --consider memory shared between an interrupt handler and a
           thread-inside-kernel (e.g., the interrupt handler enqueues
           I/O events and the inside-kernel thread handles those
           events)
         --an interrupt handler cannot sleep, so it cannot wait for a
           lock (that would wedge the machine)
         --solution: turn off interrupts before acquiring the
           spinlock:
              spinlock.acquire() "pushes" interrupts (saves their
                current state and disables them)
              spinlock.release() "pops" interrupts (restores their
                saved state)
           [see the sketches at the end of these notes]
   --Perhaps even more confusingly, "interrupt" above should be viewed
     *abstractly*:
      --certainly if the code is executing inside the kernel,
        disabling and enabling interrupts means "turning off the
        processor's interrupts"
      --but if we're talking about a preemptive user-level threading
        package, then "interrupt" might just mean a non-deterministic
        timer signal that invokes the thread scheduler. in that case,
        "turning off interrupts" could mean "deregistering for
        signals" or else "record the fact that signals were delivered,
        but don't act on that fact until 'interrupts' are reenabled".
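   --to tie the above together, here is a minimal sketch of an
     xchg-based spinlock that "pushes" and "pops" interrupts as
     described. every name here (struct spinlock, irq_save(),
     irq_restore()) is invented for illustration, and real kernels
     differ in detail (e.g., this version does not handle nested
     acquires on one CPU):

         /* assumed to exist in the surrounding kernel: save the
            current interrupt state and disable interrupts / restore
            a saved state */
         extern unsigned irq_save(void);
         extern void     irq_restore(unsigned state);

         struct spinlock {
             volatile unsigned locked;  /* 0 = free, 1 = held */
             unsigned saved_irq_state;  /* restored on release */
         };

         /* atomically swap new_val into *addr and return the old
            value; on the x86, xchg with a memory operand is atomic
            even without a LOCK prefix */
         static inline unsigned
         xchg(volatile unsigned *addr, unsigned new_val)
         {
             unsigned old;
             asm volatile("xchgl %0, %1"
                          : "=r" (old), "+m" (*addr)
                          : "0" (new_val)
                          : "memory");
             return old;
         }

         void
         spinlock_acquire(struct spinlock *lk)
         {
             unsigned irq = irq_save();   /* "push" interrupts */
             while (xchg(&lk->locked, 1) == 1)
                 ;                        /* held by someone: spin */
             lk->saved_irq_state = irq;   /* safe: we hold the lock */
         }

         void
         spinlock_release(struct spinlock *lk)
         {
             /* read the saved state before giving up the lock */
             unsigned irq = lk->saved_irq_state;
             xchg(&lk->locked, 0);        /* atomic release + barrier */
             irq_restore(irq);            /* "pop" interrupts */
         }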
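   --and for the user-level-threading reading of "interrupt", here is
     a minimal sketch that "turns off interrupts" by blocking the
     scheduler's timer signal. the choice of SIGALRM and the function
     names are assumptions for illustration. this implements the
     "record the fact that signals were delivered but don't act on it"
     option: the kernel keeps a blocked signal pending and delivers it
     when it is unblocked again:

         #include <signal.h>

         static sigset_t saved_mask;

         /* block the timer signal (assumed here to be SIGALRM) that
            drives the user-level thread scheduler; a sketch only --
            these two calls do not nest */
         void user_level_interrupts_off(void)
         {
             sigset_t block;
             sigemptyset(&block);
             sigaddset(&block, SIGALRM);
             sigprocmask(SIG_BLOCK, &block, &saved_mask);  /* "push" */
         }

         void user_level_interrupts_on(void)
         {
             sigprocmask(SIG_SETMASK, &saved_mask, NULL);  /* "pop" */
         }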