Class 7
CS 202
23 September 2025

1. Monitors and standards
2. Deadlock
3. Other progress issues
4. Implementations of spinlocks, mutexes

1. Monitors and standards

    A. Monitors = mutex + condition variables

    --High-level idea: an object (as in object-oriented systems)
	
	--in which methods do not execute concurrently; and

	--that has one or more condition variables

    --More detail
    
	--Every method call starts with acquire(&mutex), and ends with
	release(&mutex)

	--Technically, these acquire()/release() are invisible to the
	programmer because it is the programming language (i.e., the
	compiler+run-time) that is implementing the monitor

	    --So, technically, a monitor is a programming language
	    concept

	    --But the technical definition isn't hugely useful because no
	    programming languages in widespread use have true monitors

	    --Java has something close: a class in which every method is
	    set by the programmer to be "synchronized" (i.e., implicitly
	    protected by a mutex)

		--Not exactly a monitor because there's nothing forcing
		every method to be synchronized

	    --And we can *use* mutexes and condition variables to
	    implement our own manual versions of monitors, though we
	    have to be careful
	
	--Given the above, we are going to use the term "monitor" more
	loosely to refer to both the technical definition and also a
	"manually constructed" monitor, wherein:
	
	    --all method calls are protected by a mutex (that is, the
	    programmer inserts those acquire()/release() on entry and
	    exit from every procedure *inside* the object)
	    
	    --synchronization happens with condition variables whose
	    associated mutex is the mutex that protects the method calls

	--In other words, we will use the term "monitor" to refer to the
	programming conventions that you should follow when building
	multithreaded applications

	    --you must follow these conventions on lab 3

    --Example: see handout05, item 1

    --Different styles of monitors:

	--Hoare-style: signal() immediately wakes the waiter

	--Hansen-style (what we will use): signal() eventually wakes the
	waiter. There is no immediate transfer of control, so the waiter
	must re-check its condition once it runs
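A minimal sketch of a "manually constructed" monitor in the sense described above, written in C++ with the standard library's mutex and condition variable. The class name EventCount and its methods are made up for illustration; they do not come from the handout. The std::unique_lock object plays the role of acquire(&mutex) at method entry and release(&mutex) at method exit.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical example class: a manually constructed monitor.
class EventCount {
public:
    // Every public method acquires the mutex on entry and releases it on exit.
    void increment() {
        std::unique_lock<std::mutex> lock(mutex_);  // acquire(&mutex)
        ++count_;
        changed_.notify_all();                      // broadcast while holding the mutex
        assert(count_ > 0);                         // invariant check before release
    }                                               // release(&mutex) when lock is destroyed

    // Blocks until count >= target; returns the count that was observed.
    int wait_until_at_least(int target) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (count_ < target) {   // while, not if: signal() is not an immediate transfer
            changed_.wait(lock);    // atomically releases mutex_ and sleeps
        }
        return count_;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;  // its associated mutex is mutex_
    int count_ = 0;
};

// Demo: one thread waits for three increments made by another thread.
// The thread bodies touch shared state only through the monitor's methods.
int run_event_count_demo() {
    EventCount ec;
    int seen = 0;
    std::thread waiter([&] { seen = ec.wait_until_at_least(3); });
    std::thread worker([&] { for (int i = 0; i < 3; ++i) ec.increment(); });
    waiter.join();
    worker.join();
    return seen;
}
```

Because wake-up is not an immediate hand-off, the waiter re-tests count_ in a loop after every return from wait().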


    B. Standards: why?

        --Mike Dahlin stands on the desk when proclaiming the standards

        --see Mike D's "Programming With Threads", linked from lab 3

	    --You are required to follow this document

	    --You will lose points (potentially many!) on the lab and on
	    the exam if you stray from these standards

	    --Note that in his example in section 4, there needs to be
	    another line:

		--right before mutex->release(), he should have:
		    assert(invariants hold)

        --the primitives may seem strange, and the rules may seem
	arbitrary: why one thing and not another?

	    --there is no absolute answer here

	    --**However**, history has tested the approach that we're
	    using. If you use the recommended primitives and follow
	    their suggested use, you will find it easier to write
	    correct code

	--For now, take the recommended approaches as a given,
	and use them for a while. If you can come up with something
	better after that, by all means do so!

	--But please remember three things:
	
	    a. lots of really smart people have thought really hard
	    about the right abstractions, so a day or two of
	    thinking about a new one or a new use is unlikely to
	    yield an advance over the best practices.

	    b. the consequences of getting code wrong can be
	    atrocious. see for example:
	    
		http://www.nytimes.com/2010/01/24/health/24radiation.html
		http://sunnyday.mit.edu/papers/therac.pdf
		http://en.wikipedia.org/wiki/Therac-25

	    c. people who tend to be confident about their abilities
	    tend to perform *worse*, so if you are confident you are
	    a Threading and Concurrency Ninja and/or you think you
	    truly understand how these things work, then you may
	    wish to reevaluate.....

		--Dunning-Kruger effect
		--http://www.nytimes.com/2000/01/23/weekinreview/january-16-22-i-m-no-doofus-i-m-a-genius.html

    C. The Commandments

        --RULE:
        
	    --acquire/release at beginning/end of methods

        --RULE:

	    --hold lock when doing condition variable operations

	    --Some people
	        [for example, Andrew Birrell in this paper:
                http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-35.pdf ]
	    will say: "for experts only, no need to
	    hold the lock when signaling". IGNORE THIS. Putting the signal
	    outside the lock is only a small performance optimization, and
	    it is likely to lead you to write incorrect code.
	    
	    --to get credit in Lab 3, you must hold the associated mutex
	    when doing a condition variable operation

        --RULE:

	    --a thread that is in wait() must be prepared to be restarted at
	    any time, not just when another thread calls "signal()".

	    --why? because the implementor of the threads and condition
	    variables package *assumes* that the user of the threads package
	    is doing while(){wait()}.

        --Can we replace SIGNAL with BROADCAST, given our monitor semantics?
         (Answer: yes, always.) Why?

	    --while() condition tests the needed invariant. The program
	    doesn't progress past the while() unless the needed invariant
	    is true.

	    --result: spurious wake-ups are acceptable....

	    --...which implies you can always wake up a thread at any
	    moment with no loss of correctness....

	    --....which implies you can replace SIGNAL with BROADCAST
	    [though it may hurt performance to have a bunch of
	    needlessly awake threads contending for a mutex that they
	    will then acquire() and release().]


        --Can we replace BROADCAST with SIGNAL?

	    --Answer: not always. 

	    --Example:
	        --memory allocator
	        --threads allocate and free memory in variable-sized chunks
	        --if no memory free, wait on a condition variable
	        --now posit:
		    --two threads waiting to allocate chunks of memory
		    --no memory free at all
		    --then, a third thread frees 10,000 bytes
	        --SIGNAL alone does the wrong thing: we need to awaken both
	        threads
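A sketch of the allocator scenario above, under simplifying assumptions: the hypothetical BytePool class tracks only a count of free bytes rather than managing real memory, and all names are invented for illustration. The point is the notify_all() (BROADCAST) in release(): each waiter re-tests its own request size, so everyone whose request now fits gets to proceed.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical allocator that tracks only a byte count, not real memory.
class BytePool {
public:
    explicit BytePool(std::size_t free_bytes) : free_(free_bytes) {}

    // Blocks until at least n bytes are free, then claims them.
    void allocate(std::size_t n) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (free_ < n) {              // each waiter re-checks its own request size
            memory_freed_.wait(lock);
        }
        free_ -= n;
    }

    // Returns n bytes to the pool and wakes every waiter.
    void release(std::size_t n) {
        std::unique_lock<std::mutex> lock(mutex_);
        free_ += n;
        memory_freed_.notify_all();      // BROADCAST: several waiters may now fit
    }

    std::size_t free_bytes() {
        std::unique_lock<std::mutex> lock(mutex_);
        return free_;
    }

private:
    std::mutex mutex_;
    std::condition_variable memory_freed_;
    std::size_t free_;
};

// Demo: two threads try to allocate from an empty pool; a third frees 10,000 bytes.
std::size_t run_pool_demo() {
    BytePool pool(0);
    std::thread a([&] { pool.allocate(4000); });
    std::thread b([&] { pool.allocate(5000); });
    std::thread c([&] { pool.release(10000); });
    a.join();
    b.join();
    c.join();
    return pool.free_bytes();            // 10000 - 4000 - 5000
}
```

With notify_one() (SIGNAL) in release(), only one of the two waiters is guaranteed to wake, and the other could sleep indefinitely even though enough memory is free.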


1B. Advice for concurrent programming

    A. Top-level piece of advice: SAFETY FIRST.

	--Locking at coarse grain is easiest to get right, so do
	that (one big lock for each big object or collection of
	them)
	
	--Don't worry about performance at first

	--In fact, don't even worry about liveness at first
	
	    --In other words, don't view deadlock as a disaster

	--Key invariant: make sure your program never does the wrong thing

    B. More detailed advice: design approach

	[We will use item #1 on handout as a case study.....]

	--Here's a four-step design approach:
	
	    1. Getting started:
	     
		 1a. Identify units of concurrency. Make each a thread with
		 a go() method or main loop. Write down the actions a thread
		 takes at a high level.  
		 
		 1b. Identify shared chunks of state. Make each shared
		 *thing* an object. Identify the methods on those objects,
		 which should be the high-level actions made *by* threads
		 *on* these objects. Plan to have these objects be monitors.
		 
		 1c. Write down the high-level main loop of each thread. 
	     
	    Advice: stay high level here. Don't worry about synchronization 
	    yet. Let the objects do the work for you. 
	     
	    Separate threads from objects. The code associated with a
	    thread should not access shared state directly (and so there
	    should be no access to locks/condition variables in the
	    "main" procedure for the thread). Shared state and
	    synchronization should be encapsulated in shared objects. 

	    --QUESTION: how does this apply to the example on the
	    handout?
		--separate loops for producer(), consumer(), and
		synchronization happens inside MyBuffer.
	     
	    Now, for each object: 
	     
	    2. Write down the synchronization constraints on the
	    solution. Identify the type of each constraint: mutual
	    exclusion or scheduling. For scheduling constraints, ask,
	    "when does a thread wait"?

		--NOTE: usually, the mutual exclusion constraint is
		satisfied by the fact that we're programming with
		monitors.

		--QUESTION: how does this apply to the example on the
		handout?
		    --Only one thread can manipulate the buffer at a time
		    (mutual exclusion constraint)
		    --Producer must wait for consumer to empty slots if all
		    full (scheduling constraint)
		    --Consumer must wait for producer to fill slots if all
		    empty (scheduling constraint)

	    3. Create a lock or condition variable corresponding to each 
	    constraint 

		--QUESTION: how does this apply to the example on the
		handout?
		    --Answer: need a lock and two condition variables.
		    (lock was sort of a given from the fact of a monitor).
	     
	    4. Write the methods, using locks and condition variables for 
	    coordination  
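The four steps applied to the producer/consumer case might come out roughly as follows. This is a sketch, not the handout's MyBuffer code: the class name BoundedBuffer, the method names, and the use of std::deque are assumptions. It has one lock (mutual exclusion constraint) and two condition variables (one per scheduling constraint).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical shared object: all synchronization lives inside it.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

    void put(int item) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (items_.size() == capacity_) {  // producer waits if all slots are full
            nonfull_.wait(lock);
        }
        items_.push_back(item);
        nonempty_.notify_one();               // signal while holding the lock
        assert(items_.size() <= capacity_);   // invariant holds before release
    }

    int get() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (items_.empty()) {              // consumer waits if all slots are empty
            nonempty_.wait(lock);
        }
        int item = items_.front();
        items_.pop_front();
        nonfull_.notify_one();
        return item;
    }

private:
    std::mutex mutex_;                        // mutual exclusion constraint
    std::condition_variable nonfull_;         // scheduling constraint for producers
    std::condition_variable nonempty_;        // scheduling constraint for consumers
    std::deque<int> items_;
    const std::size_t capacity_;
};

// Demo: the thread "main loops" stay high level and never touch a lock or CV.
long run_buffer_demo(int n) {
    BoundedBuffer buf(4);
    long sum = 0;
    std::thread producer([&] { for (int i = 1; i <= n; ++i) buf.put(i); });
    std::thread consumer([&] { for (int i = 0; i < n; ++i) sum += buf.get(); });
    producer.join();
    consumer.join();
    return sum;
}
```

Note that the producer and consumer loops contain no locks or condition variables; the shared object does the work.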
	

    C. More advice

	1. Don't manipulate synchronization variables or shared state
	variables in the code associated with a thread; do it with the
	code associated with a shared object.  
      
	    --Threads tend to have "main" loops. These loops tend to
	    access shared objects. *However*, the "thread" piece of it
	    should not include locks or condition variables. Instead,
	    locks and CVs should be encapsulated in the shared objects.

	    --Why?

		(a) Locks are for synchronizing across multiple threads.
		Doesn't make sense for one thread to "own" a lock.

		(b) Encapsulation -- details of synchronization are
		internal details of a shared object. Caller should not
		know about these details.  "Let the shared objects do
		the work."

	    --Common confusion: trying to acquire and release locks
	    inside the threads' code (i.e., not following this advice).
	    Bad idea! Synchronization should happen within the shared
	    objects. Mantra: "let the shared objects do the work".
	
	    --Note: our first example of condition variables --
	    handout04, item 2b -- doesn't actually follow the advice, but
	    that is in part so you can see all of the parts working
	    together.

	2. Different way to state what's above:
	
	    --You want to decompose your problem into objects, as in
	    object-oriented style of programming.

	    --Thus:

	       (1) Shared object encapsulates code, synchronization 
		   variables, and state variables 

               (2) Shared objects are separate from threads 


1C. Practice with concurrent programming

    --sleeping barber question (will post). use it as practice.
    
        side note: (definition of practice when it comes to technical
        work = trying it on your own WITHOUT looking at the solution.)

    --we guarantee to test concurrent programming in this course

    --today, we work a different example:

	--workers interact with a database
	    --motivation: banking, airlines, etc.

	--readers never modify database

	--writers read and modify data

	--using only a single mutex lock would be overly restrictive.
	Instead, want
	    --many readers at the same time
	    --only one writer at a time

    --let's follow the concurrency advice .....

	    1. Getting started
		a. what are units of concurrency? [readers/writers]
		b. what are shared chunks of state? [database]
		c. what does the main function look like?
		    read() 
			check in -- wait until no writers
			access DB
			check out -- wake up waiting writer, if appropriate

		    write()
			check in -- wait until no readers or writers
			access DB
			check out -- wake up waiting readers or writers

	    2. and 3. Synchronization constraints and objects

		--reader can access DB when no writers (condition:
		okToRead)

		--writer can access DB when no readers and no other
		writers (condition: okToWrite)

		--only one thread manipulates shared variables at a
		time. NOTE: **this does not mean only one thread in the
		DB at a time** (mutex)

    
	    4. write the methods

		--inspiration required:
		    int AR = 0; // # active readers
		    int AW = 0; // # active writers
		    int WR = 0; // # waiting readers
		    int WW = 0; // # waiting writers
	
		--see handout for the code
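For reference, one way the methods could look, using the four counters and two conditions named above. This is a sketch and the handout's code may differ in its details. In particular, this version makes new readers wait whenever a writer is active or waiting, which is a policy choice assumed here. The Database class, its int "database", and the method names are illustrative only.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared object; a single int stands in for the database.
class Database {
public:
    int read() {
        start_read();            // check in: wait until no writers
        int snapshot = value_;   // access DB WITHOUT holding mutex_
        done_read();             // check out: wake a waiting writer if appropriate
        return snapshot;
    }

    void increment() {
        start_write();           // check in: wait until no readers or writers
        ++value_;                // access DB
        done_write();            // check out: wake waiting writer or readers
    }

private:
    void start_read() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (AW + WW > 0) { ++WR; ok_to_read_.wait(lock); --WR; }
        ++AR;
    }
    void done_read() {
        std::unique_lock<std::mutex> lock(mutex_);
        --AR;
        if (AR == 0 && WW > 0) ok_to_write_.notify_one();
    }
    void start_write() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (AR + AW > 0) { ++WW; ok_to_write_.wait(lock); --WW; }
        ++AW;
    }
    void done_write() {
        std::unique_lock<std::mutex> lock(mutex_);
        --AW;
        assert(AW == 0 && AR == 0);  // a writer had exclusive access
        if (WW > 0) ok_to_write_.notify_one();
        else if (WR > 0) ok_to_read_.notify_all();
    }

    std::mutex mutex_;  // protects the counters, not the DB access itself
    std::condition_variable ok_to_read_, ok_to_write_;
    int AR = 0;  // # active readers
    int AW = 0;  // # active writers
    int WR = 0;  // # waiting readers
    int WW = 0;  // # waiting writers
    int value_ = 0;
};

// Demo: two writers and two readers share the database.
int run_rw_demo() {
    Database db;
    auto writer = [&] { for (int i = 0; i < 1000; ++i) db.increment(); };
    auto reader = [&] { for (int i = 0; i < 1000; ++i) (void)db.read(); };
    std::thread w1(writer), w2(writer), r1(reader), r2(reader);
    w1.join(); w2.join(); r1.join(); r2.join();
    return db.read();
}
```

The mutex is held only while the counters are examined or updated, so several readers can be inside the database at once.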

    --QUESTION: why not just hold the lock all the way through "Execute
    req"? (Answer: the whole point was to expose more concurrency,
    i.e., to move away from exclusive access.)

    --QUESTION: what if we had shared locks? The implementation of
    shared locks is given on the handout
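If a shared (reader/writer) lock is available as a primitive, the database object gets much simpler. The sketch below uses C++17's std::shared_mutex as a stand-in for such a lock; it is not the handout's implementation, and the class name SharedLockDatabase is made up.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical database object built on a library-provided shared lock.
class SharedLockDatabase {
public:
    int read() {
        std::shared_lock<std::shared_mutex> lock(rw_);  // shared mode: many readers at once
        return value_;
    }

    void increment() {
        std::unique_lock<std::shared_mutex> lock(rw_);  // exclusive mode: one writer, no readers
        ++value_;
    }

private:
    std::shared_mutex rw_;
    int value_ = 0;
};

// Demo: two writers and two readers share the database.
int run_shared_lock_demo() {
    SharedLockDatabase db;
    auto writer = [&] { for (int i = 0; i < 1000; ++i) db.increment(); };
    auto reader = [&] { for (int i = 0; i < 1000; ++i) (void)db.read(); };
    std::thread w1(writer), w2(writer), r1(reader), r2(reader);
    w1.join(); w2.join(); r1.join(); r2.join();
    return db.read();
}
```

The check-in/check-out bookkeeping has not disappeared; it has moved inside the shared lock's implementation.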


2. Deadlock 

    --see handout: simple example based on two locks

    --see handout: more complex example

	    --M calls N 
	    --N waits
	    --but let's say the condition can only become true if N is
	    invoked through M
	    --N's wait() releases the lock inside N, but M's lock is still
	    held; that is, no one is going to be able to enter M, and hence
	    no one can reach N to make the condition true.

    --can also get deadlocks with condition variables

    --lesson: dangerous to hold locks (M's mutex in the case on the
    handout) when crossing abstraction barriers

    --deadlocks without mutexes:

        real issue is resources and how/when they are required/acquired

        (a) [draw bridge example]

	    --bridge only allows traffic in one direction 

	    --Each section of a bridge can be viewed as a resource. 

	    --If a deadlock occurs, it can be resolved if one car
	    backs up (preempt resources and rollback). 

	    --Several cars may have to be backed up if a deadlock occurs. 

	    --Starvation is possible. 

	(b) another example:
		
	    --one thread/process grabs disk and then tries to grab
	    camera

	    --another thread/process grabs camera and then tries to
	    grab disk

    --when does deadlock happen? under four conditions. all of them must
    hold for deadlock to happen:

	1. mutual exclusion
	2. hold-and-wait
	3. no preemption
	4. circular wait


    --what can we do about deadlock?

        (a) ignore it: worry about it when it happens. the so-called
        "ostrich solution"

        (b) detect and recover: not great

	    --could imagine attaching debugger

		--not really viable for production software, but
		works well in development

	    --threads package can keep track of resource-allocation graph

	    --see one of the recommended texts:

		--For each lock acquired, order with other locks held 
		
		--If cycle occurs, abort with error 
	    
		--Detects potential deadlocks even if they do not occur 


        (c) avoid algorithmically

            [not covering]

	    --banker's algorithm (see Tanenbaum text for a description)

		--very elegant but impractical

		--if you're using banker's algorithm, the gameboard
		looks like this:

		    ResourceMgr::Request(ResourceID resc,
					 RequestorID thrd) {
			acquire(&mutex);
			assert(system in a safe state);
			while (state that would result from giving 
			       resource to thread is not safe) {
			    wait(&cv, &mutex);	
			}
			update state by giving resource to thread
			assert(system in a safe state);
			release(&mutex);
		    }

		    Now we need to determine if a state is safe....

		    To do so, see book

	    --disadvantage to banker's algorithm:

		--requires every single resource request to go
		through a single broker

		--requires every thread to state its maximum
		resource needs up front. unfortunately, if threads
		are conservative and claim they need huge quantities
		of resources, the algorithm will reduce concurrency
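For the curious, here is a sketch of the safety check as it is usually presented in textbooks; it may not match the course text's presentation exactly. A state is safe if there is some order in which every thread can obtain its maximum claim and finish. The function and parameter names (is_safe, available, allocation, max_need) and the two demo states are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Returns true if the threads can all run to completion in some order.
// available[r]     : free units of resource r
// allocation[t][r] : units of r currently held by thread t
// max_need[t][r]   : maximum units of r that thread t declared up front
bool is_safe(const Vec& available,
             const std::vector<Vec>& allocation,
             const std::vector<Vec>& max_need) {
    const std::size_t n = allocation.size();
    Vec work = available;
    std::vector<bool> finished(n, false);
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t t = 0; t < n; ++t) {
            if (finished[t]) continue;
            bool can_finish = true;
            for (std::size_t r = 0; r < work.size(); ++r) {
                if (max_need[t][r] - allocation[t][r] > work[r]) {
                    can_finish = false;
                    break;
                }
            }
            if (can_finish) {
                // Pretend t runs to completion and returns everything it holds.
                for (std::size_t r = 0; r < work.size(); ++r) {
                    work[r] += allocation[t][r];
                }
                finished[t] = true;
                progress = true;
            }
        }
    }
    for (bool f : finished) {
        if (!f) return false;
    }
    return true;
}

// One resource type, 10 units total, three threads with maximums 9, 4, 7.
bool demo_safe_state() {
    Vec available = {3};
    std::vector<Vec> allocation = {{3}, {2}, {2}};
    std::vector<Vec> max_need = {{9}, {4}, {7}};
    return is_safe(available, allocation, max_need);
}

// Same system after granting one more unit to the first thread.
bool demo_unsafe_state() {
    Vec available = {2};
    std::vector<Vec> allocation = {{4}, {2}, {2}};
    std::vector<Vec> max_need = {{9}, {4}, {7}};
    return is_safe(available, allocation, max_need);
}
```

In the first state the second thread can finish, then the third, then the first. In the second state, once the second thread finishes there are only 4 free units, and both remaining threads may need 5 more.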

        (d) negate one of the four conditions using careful coding:

	    --can sort of negate 1
		--put a queue in front of resources, like the printer
		--virtualize memory

	    --not much hope of negating 2

	    --can sort of negate 3:
		--consider physical memory: virtualized with VM, can
		take physical page away and give to another process! 

	    --what about negating #4?

		--in practice, this is what people do

		--idea: partial order on locks

		    --Establish an order on all locks, and make sure
		    that every thread acquires its locks in that
		    order

		--why this works:

		    --can view deadlock as a cycle in the resource
		    acquisition graph

		    --partial order implies no cycles and hence no
		    deadlock

		--two bummers:

		    1. hard to represent CVs inside this framework.
		    works best only for locks.

		    2. Picking and obeying the order on *all* locks
		    requires that modules make public their locking
		    behavior, and requires them to know about other
		    modules' locking.  This can be painful and
		    error-prone. 

			--see Linux's filemap.c example on the handout;
			this is complexity that arises by the need for a
			locking order
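A small sketch of the lock-ordering idea from (d). The Account type, its id field, and the rule "lock the lower id first" are assumptions chosen for illustration; any fixed order that every thread obeys would do.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical resource type; id defines the global lock order.
struct Account {
    int id;
    long balance;
    std::mutex mutex;
};

// Always locks the account with the lower id first, regardless of the
// direction of the transfer, so no circular wait can form.
void transfer(Account& from, Account& to, long amount) {
    assert(from.id != to.id);
    Account& first  = (from.id < to.id) ? from : to;
    Account& second = (from.id < to.id) ? to : from;
    std::lock_guard<std::mutex> lock_first(first.mutex);
    std::lock_guard<std::mutex> lock_second(second.mutex);
    from.balance -= amount;
    to.balance += amount;
}

// Demo: two threads transfer in opposite directions. If each locked
// "from" and then "to", this pattern could deadlock.
long run_transfer_demo() {
    Account a{1, 1000};
    Account b{2, 1000};
    std::thread t1([&] { for (int i = 0; i < 10000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 10000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

This also shows the second bummer in miniature: transfer() has to know the ordering rule for every lock it might take.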

	(e) Static and dynamic detection tools

	    --See, for example, these citations, citations
	    therein, and papers that cite them:

		Engler, D. and K. Ashcraft. RacerX: effective,
		static detection of race conditions and deadlocks.
		Proc. ACM Symposium on Operating Systems Principles
		(SOSP), October, 2003, pp237-252.
		http://portal.acm.org/citation.cfm?id=945468

		Savage, S., M. Burrows, G. Nelson, P. Sobalvarro,
		and T. Anderson. Eraser: a dynamic data race
		detector for multithreaded programs. ACM
		Transactions on Computer Systems (TOCS), Volume 15,
		No 4., Nov., 1997, pp391-411.
		http://portal.acm.org/citation.cfm?id=265927

		a long literature on this stuff

	    --Disadvantage to dynamic checking: slows program down

	    --Disadvantage to static checking: many false alarms
	    (the tool says "there is deadlock", but in fact there is
	    none) or else missed problems

	    --Note that these tools get better every year. Valgrind's
	    Helgrind tool, for example, detects data races and
	    inconsistent lock ordering (potential deadlocks)

3. Other progress issues

    Deadlock was one kind of progress (or liveness) issue. Here are two
    others...

    Starvation

	--thread waiting indefinitely (if low priority and/or if
	resource is contended)

    Priority inversion

	--T1, T2, T3: (highest, middle, lowest priority)

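	--T1 wants to get lock (so T1 blocks), T2 runnable, T3 runnable and holding lock

	--System will preempt T3 and run highest-priority runnable thread, namely T2

	--Result: T3 cannot run to release the lock, so T1 (highest
	priority) is stuck behind T2 (middle priority) for as long as
	T2 keeps running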

	--Solutions:

	    --Temporarily bump T3 to highest priority of any thread that is
	    ever waiting on the lock

	    --Disable interrupts, so no preemption (T3 finishes)
		... not great because OS sometimes needs control
		(not for scheduling, under this assumption, but for
		handling memory [page faults], etc.)

	    --Don't handle it; structure app so only adjacent priority
	    processes/threads share locks

	--Happens in real life. For a real-life example, see:
	https://www.microsoft.com/en-us/research/people/mbj/just-for-fun/
        (search for Pathfinder)


4. Implementations of spinlocks, mutexes

    Going to continue to assume sequential consistency...

    How might we provide the lock()/unlock() abstraction?

    (a) Peterson's algorithm....

        --...solves the critical section problem, in that it satisfies
        mutual exclusion, progress, and bounded waiting

        --but expensive (busy waiting), requires number of threads to be
        fixed statically, and assumes sequential consistency

        --(see a textbook)
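A sketch of Peterson's algorithm for exactly two threads (ids 0 and 1). To match the sequential-consistency assumption, it uses std::atomic with the default memory order, which is sequentially consistent; with plain variables the compiler and hardware could reorder the accesses and break the algorithm. The class and member names are illustrative, and the yield() in the spin loop is an addition here, purely to be polite on machines with few cores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical two-thread lock; self must be 0 or 1.
class PetersonLock {
public:
    PetersonLock() {
        interested_[0].store(false);
        interested_[1].store(false);
        turn_.store(0);
    }

    void lock(int self) {
        int other = 1 - self;
        interested_[self].store(true);  // announce intent to enter
        turn_.store(other);             // defer to the other thread
        // Busy-wait while the other thread is interested and has priority.
        while (interested_[other].load() && turn_.load() == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int self) {
        interested_[self].store(false);
    }

private:
    std::atomic<bool> interested_[2];
    std::atomic<int> turn_;
};

// Demo: two threads increment a plain int inside the critical section.
int run_peterson_demo() {
    PetersonLock lock;
    int counter = 0;
    auto body = [&](int self) {
        for (int i = 0; i < 50000; ++i) {
            lock.lock(self);
            ++counter;
            lock.unlock(self);
        }
    };
    std::thread t0(body, 0), t1(body, 1);
    t0.join();
    t1.join();
    return counter;
}
```

The fixed two-element array is the "number of threads fixed statically" limitation in concrete form.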

    (b) disable interrupts?

        --works only on a single CPU

        --cannot expose to user processes

    (c) spinlocks

        --see handout

            * buggy approach: what's wrong with this?

            * non-buggy approach: why does this work?

        --works in multi-CPU environment

        --but issue: a spinlock is a poor fit when the time to acquire
        the lock is expected to be long (for example, waiting for disk
        accesses to complete). this is because of busy waiting: the
        waiting chews cycles that could have been spent on another task
        (in the kernel or in user space).

        --for more about spinlocks in Linux, see:
            https://www.kernel.org/doc/Documentation/locking/spinlocks.txt

        --NOTE: the spinlocks that we presented (test-and-set, or
        test-and-test-and-set) can introduce performance issues when
        there is a lot of contention. These performance issues arise
        even if the programmer is using spinlocks correctly. The
        performance issues result from cross-talk among CPUs (which
        undermines caching and generates traffic on the memory bus). If
        we have time later, we will study a remediation of this issue
        (or search the Web for "MCS locks").
        
        --In everyday application-level programming, spinlocks will not
        be something you use. Mainly matters inside kernel. But you
        should know what these are for technical literacy, and to see
        where the mutual exclusion is truly enforced on modern hardware.
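A user-level sketch of a test-and-test-and-set spinlock, using std::atomic<bool>::exchange as the atomic test-and-set. This is not the handout's code (which is not reproduced here); the class name and the yield() inside the spin loop are assumptions for illustration, and a kernel spinlock would not yield. Default (sequentially consistent) memory ordering is used throughout, in keeping with the assumption stated at the top of this section.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical spinlock built on an atomic exchange.
class SpinLock {
public:
    void lock() {
        for (;;) {
            // Test-and-set: atomically write true and fetch the old value.
            // If the old value was false, the lock was free and is now ours.
            if (!locked_.exchange(true)) {
                return;
            }
            // Test-and-test-and-set: spin on plain loads until the lock
            // looks free, and only then retry the atomic exchange.
            while (locked_.load()) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        locked_.store(false);
    }

private:
    std::atomic<bool> locked_{false};
};

// Demo: four threads increment a plain int under the spinlock.
int run_spinlock_demo() {
    SpinLock lock;
    int counter = 0;
    auto body = [&] {
        for (int i = 0; i < 20000; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread t1(body), t2(body), t3(body), t4(body);
    t1.join(); t2.join(); t3.join(); t4.join();
    return counter;
}
```

The key property is that the read of the old value and the write of true happen as one indivisible step; a version that checks the flag and then sets it in two separate steps would let two threads both see "free" and both enter.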

    (d) mutexes: spinlock + a queue

        --textbook describes one implementation

        --see handout for another


[Acknowledgments: Mike Walfish]