Class 7 CS 202 13 February 2023 On the board ------------ 1. Last time 2. Implementations of spinlocks, mutexes 3. Deadlock 4. Other progress issues --------------------------------------------------------------------------- 1. Last time - Monitors and standards - Advice - Practice with concurrent programming 2. Implementation of spinlocks and mutexes Going to continue to assume sequential consistency... How might we provide the lock()/unlock() abstraction? (a) Peterson's algorithm.... --...solves critical section in that it satisfies mutual exclusion, progress, bounded waiting --but expensive (busy waiting), requires number of threads to be fixed statically, and assumes sequential consistency --(see a textbook) (b) disable interrupts? --works only on a single CPU --cannot expose to user processes (c) spinlocks --see handout * buggy approach: what's wrong with this? * non-buggy approach: why does this work? --works in multi-CPU environment --but issue: a spinlock is no good for cases when time-to-acquire-lock expected to be long (for example, waiting for disk accesses to complete). this is because of busy waiting and the fact that waiting chews cycles that could have been spent on another task (in the kernel or in user space). --for more about spinlocks in Linux, see: https://www.kernel.org/doc/Documentation/locking/spinlocks.txt --NOTE: the spinlocks that we presented (test-and-set, or test-and-test-and-set) can introduce performance issues when there is a lot of contention. These performance issues arise even if the programmer is using spinlocks correctly. The performance issues result from cross-talk among CPUs (which undermines caching and generates traffic on the memory bus). If you are curious for a remediation of this issue, look up "MCS locks". --In everyday application-level programming, spinlocks will not be something you use. Mainly matters inside kernel. But you should know what these are for technical literacy, and to see where the mutual exclusion is truly enforced on modern hardware. (d) mutexes: spinlock + a queue --textbook describes one implementation --see handout for another ----- admin notes: - no class next Wednesday - make-up class this Thursday - sign-up sheet will be posted tonight ----- 3. Deadlock --see handout: simple example based on two locks --see handout: more complex example --M calls N --N waits --but let's say condition can only become true if N is invoked through M --now the lock inside N is unlocked, but M remains locked; that is, no one is going to be able to enter M and hence N. --can also get deadlocks with condition variables --lesson: dangerous to hold locks (M's mutex in the case on the handout) when crossing abstraction barriers --deadlocks without mutexes: real issue is resources and how/when they are required/acquired (a) [draw bridge example] --bridge only allows traffic in one direction --Each section of a bridge can be viewed as a resource. --If a deadlock occurs, it can be resolved if one car backs up (preempt resources and rollback). --Several cars may have to be backed up if a deadlock occurs. --Starvation is possible. (b) another example: --one thread/process grabs disk and then tries to grab camera --another thread/process grabs camera and then tries to grab disk --when does deadlock happen? under four conditions. all of them must hold for deadlock to happen: 1. mutual exclusion 2. hold-and-wait 3. no preemption 4. circular wait --what can we do about deadlock? (a) ignore it: worry about it when it happens. the so-called "ostrich solution" (b) detect and recover: not great --could imagine attaching debugger --not really viable for production software, but works well in development --threads package can keep track of resource-allocation graph --see one of the recommended texts: --For each lock acquired, order with other locks held --If cycle occurs, abort with error --Detects potential deadlocks even if they do not occur (c) avoid algorithmically [not covering] --banker's algorithm (see Tanenbaum text for an desription) --very elegant but impractical --if you're using banker's algorithm, the gameboard looks like this: ResourceMgr::Request(ResourceID resc, RequestorID thrd) { acquire(&mutex); assert(system in a safe state); while (state that would result from giving resource to thread is not safe) { wait(&cv, &mutex); } update state by giving resource to thread assert(system in a safe state); release(&mutex); } Now we need to determine if a state is safe.... To do so, see book --disadvantage to banker's algorithm: --requires every single resource request to go through a single broker --requires every thread to state its maximum resource needs up front. unfortunately, if threads are conservative and claim they need huge quantities of resources, the algorithm will reduce concurrency (d) negate one of the four conditions using careful coding: --can sort of negate 1 --put a queue in front of resources, like the printer --virtualize memory --not much hope of negating 2 --can sort of negate 3: --consider physical memory: virtualized with VM, can take physical page away and give to another process! --what about negating #4? --in practice, this is what people do --idea: partial order on locks --Establishing an order on all locks and making sure that every thread acquires its locks in that order --why this works: --can view deadlock as a cycle in the resource acquisition graph --partial order implies no cycles and hence no deadlock --two bummers: 1. hard to represent CVs inside this framework. works best only for locks. 2. Picking and obeying the order on *all* locks requires that modules make public their locking behavior, and requires them to know about other modules' locking. This can be painful and error-prone. --see Linux's filemap.c example on the handout; this is complexity that arises by the need for a locking order (e) Static and dynamic detection tools --See, for example, these citations, citations therein, and papers that cite them: Engler, D. and K. Ashcraft. RacerX: effective, static detection of race conditions and deadlocks. Proc. ACM Symposium on Operating Systems Principles (SOSP), October, 2003, pp237-252. http://portal.acm.org/citation.cfm?id=945468 Savage, S., M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: a dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems (TOCS), Volume 15, No 4., Nov., 1997, pp391-411. http://portal.acm.org/citation.cfm?id=265927 a long literature on this stuff --Disadvantage to dynamic checking: slows program down --Disadvantage to static checking: many false alarms (tools says "there is deadlock", but in fact there is none) or else missed problems --Note that these tools get better every year. I believe that Valgrind has a race and deadlock detection tool 4. Other progress issues Deadlock was one kind of progress (or liveness) issue. Here are two others... Starvation --thread waiting indefinitely (if low priority and/or if resource is contended) Priority inversion --T1, T2, T3: (highest, middle, lowest priority) --T1 wants to get lock, T2 runnable, T3 runnable and holding lock --System will preempt T3 and run highest-priority runnable thread, namely T2 --Solutions: --Temporarily bump T3 to highest priority of any thread that is ever waiting on the lock --Disable interrupts, so no preemption (T3 finishes) ... not great because OS sometimes needs control (not for scheduling, under this assumption, but for handling memory [page faults], etc.) --Don't handle it; structure app so only adjacent priority processes/threads share locks --Happens in real life. For a real-life example, see: https://www.microsoft.com/en-us/research/people/mbj/just-for-fun/ (search for Pathfinder)