Class 7
CS 202
13 February 2023

On the board
------------

1. Last time
2. Implementations of spinlocks, mutexes
3. Deadlock 
4. Other progress issues

---------------------------------------------------------------------------

1. Last time

    - Monitors and standards
    - Advice
    - Practice with concurrent programming

2. Implementation of spinlocks and mutexes

    Going to continue to assume sequential consistency...

    How might we provide the lock()/unlock() abstraction?

    (a) Peterson's algorithm....

        --...solves critical section in that it satisfies mutual
        exclusion, progress, bounded waiting

        --but expensive (busy waiting), requires number of threads to be
        fixed statically, and assumes sequential consistency

        --(see a textbook)

    (b) disable interrupts?

        --works only on a single CPU

        --cannot expose to user processes

    (c) spinlocks

        --see handout

            * buggy approach: what's wrong with this?

            * non-buggy approach: why does this work?

        --works in multi-CPU environment

        --but issue: a spinlock is no good for cases when
        time-to-acquire-lock expected to be long (for example, waiting
        for disk accesses to complete). this is because of busy waiting
        and the fact that waiting chews cycles that could have been
        spent on another task (in the kernel or in user space).

        --for more about spinlocks in Linux, see:
            https://www.kernel.org/doc/Documentation/locking/spinlocks.txt

        --NOTE: the spinlocks that we presented (test-and-set, or
        test-and-test-and-set) can introduce performance issues when
        there is a lot of contention. These performance issues arise
        even if the programmer is using spinlocks correctly. The
        performance issues result from cross-talk among CPUs (which
        undermines caching and generates traffic on the memory bus). If
        you are curious for a remediation of this issue, look up "MCS
        locks".
        
        --In everyday application-level programming, spinlocks will not
        be something you use. Mainly matters inside kernel. But you
        should know what these are for technical literacy, and to see
        where the mutual exclusion is truly enforced on modern hardware.

    (d) mutexes: spinlock + a queue

        --textbook describes one implementation

        --see handout for another

-----

admin notes:
- no class next Wednesday
- make-up class this Thursday
- sign-up sheet will be posted tonight 

-----

3. Deadlock 

    --see handout: simple example based on two locks

    --see handout: more complex example

	    --M calls N 
	    --N waits
	    --but let's say condition can only become true if N is invoked
	    through M
	    --now the lock inside N is unlocked, but M remains locked; that
	    is, no one is going to be able to enter M and hence N.

    --can also get deadlocks with condition variables

    --lesson: dangerous to hold locks (M's mutex in the case on the
    handout) when crossing abstraction barriers

    --deadlocks without mutexes:

        real issue is resources and how/when they are required/acquired

        (a) [draw bridge example]

	    --bridge only allows traffic in one direction 

	    --Each section of a bridge can be viewed as a resource. 

	    --If a deadlock occurs, it can be resolved if one car
	    backs up (preempt resources and rollback). 

	    --Several cars may have to be backed up if a deadlock occurs. 

	    --Starvation is possible. 

	(b) another example:
		
	    --one thread/process grabs disk and then tries to grab
	    camera

	    --another thread/process grabs camera and then tries to
	    grab disk

    --when does deadlock happen? under four conditions. all of them must
    hold for deadlock to happen:

	1. mutual exclusion
	2. hold-and-wait
	3. no preemption
	4. circular wait


    --what can we do about deadlock?

        (a) ignore it: worry about it when it happens. the so-called
        "ostrich solution"

        (b) detect and recover: not great

	    --could imagine attaching debugger

		--not really viable for production software, but
		works well in development

	    --threads package can keep track of resource-allocation graph

	    --see one of the recommended texts:

		--For each lock acquired, order with other locks held 
		
		--If cycle occurs, abort with error 
	    
		--Detects potential deadlocks even if they do not occur 


        (c) avoid algorithmically

            [not covering]

	    --banker's algorithm (see Tanenbaum text for an desription)

		--very elegant but impractical

		--if you're using banker's algorithm, the gameboard
		looks like this:

		    ResourceMgr::Request(ResourceID resc,
					 RequestorID thrd) {
			acquire(&mutex);
			assert(system in a safe state);
			while (state that would result from giving 
			       resource to thread is not safe) {
			    wait(&cv, &mutex);	
			}
			update state by giving resource to thread
			assert(system in a safe state);
			release(&mutex);
		    }

		    Now we need to determine if a state is safe....

		    To do so, see book

	    --disadvantage to banker's algorithm:

		--requires every single resource request to go
		through a single broker

		--requires every thread to state its maximum
		resource needs up front. unfortunately, if threads
		are conservative and claim they need huge quantities
		of resources, the algorithm will reduce concurrency

        (d) negate one of the four conditions using careful coding:

	    --can sort of negate 1
		--put a queue in front of resources, like the printer
		--virtualize memory

	    --not much hope of negating 2

	    --can sort of negate 3:
		--consider physical memory: virtualized with VM, can
		take physical page away and give to another process! 

	    --what about negating #4?

		--in practice, this is what people do

		--idea: partial order on locks

		    --Establishing an order on all locks and making
		    sure that every thread acquires its locks in
		    that order

		--why this works:

		    --can view deadlock as a cycle in the resource
		    acquisition graph

		    --partial order implies no cycles and hence no
		    deadlock

		--two bummers:

		    1. hard to represent CVs inside this framework.
		    works best only for locks.

		    2. Picking and obeying the order on *all* locks
		    requires that modules make public their locking
		    behavior, and requires them to know about other
		    modules' locking.  This can be painful and
		    error-prone. 

			--see Linux's filemap.c example on the handout;
			this is complexity that arises by the need for a
			locking order

	(e) Static and dynamic detection tools

	    --See, for example, these citations, citations
	    therein, and papers that cite them:

		Engler, D. and K. Ashcraft. RacerX: effective,
		static detection of race conditions and deadlocks.
		Proc. ACM Symposium on Operating Systems Principles
		(SOSP), October, 2003, pp237-252.
		http://portal.acm.org/citation.cfm?id=945468

		Savage, S., M. Burrows, G. Nelson, P. Sobalvarro,
		and T. Anderson. Eraser: a dynamic data race
		detector for multithreaded programs. ACM
		Transactions on Computer Systems (TOCS), Volume 15,
		No 4., Nov., 1997, pp391-411.
		http://portal.acm.org/citation.cfm?id=265927

		a long literature on this stuff

	    --Disadvantage to dynamic checking: slows program down

	    --Disadvantage to static checking: many false alarms
	    (tools says "there is deadlock", but in fact there is
	    none) or else missed problems

	    --Note that these tools get better every year. I believe
	    that Valgrind has a race and deadlock detection tool

4. Other progress issues

    Deadlock was one kind of progress (or liveness) issue. Here are two
    others...

    Starvation

	--thread waiting indefinitely (if low priority and/or if
	resource is contended)

    Priority inversion

	--T1, T2, T3: (highest, middle, lowest priority)

	--T1 wants to get lock, T2 runnable, T3 runnable and holding lock

	--System will preempt T3 and run highest-priority runnable thread, namely T2

	--Solutions:

	    --Temporarily bump T3 to highest priority of any thread that is
	    ever waiting on the lock

	    --Disable interrupts, so no preemption (T3 finishes)
		... not great because OS sometimes needs control
		(not for scheduling, under this assumption, but for
		handling memory [page faults], etc.)

	    --Don't handle it; structure app so only adjacent priority
	    processes/threads share locks

	--Happens in real life. For a real-life example, see:
	https://www.microsoft.com/en-us/research/people/mbj/just-for-fun/
        (search for Pathfinder)