Class 6
CS 439
5 February 2013

On the board
------------

1. Last time

2. Reinforce atomicity

3. Trade-offs and problems from locking

    A. Hard to get right
    B. Performance v. complexity trade-off
    C. Starvation
    D. Priority inversion
    E. Deadlock
    F. Broken modularity

4. More advice

---------------------------------------------------------------------------

1. Last time

    --clarified condition variables
    --monitors
    --standards
    --bit of practice

2. Reinforce atomicity

    --atomicity is required if you want to reason about code without
    contorting your brain to reason about all possible interleavings

    --atomicity requires mutual exclusion, aka a solution to critical
    sections

    --mutexes provide that solution

    --once you have mutexes, you don't have to worry about arbitrary
    interleavings. critical sections are interleaved, but those are much
    easier to reason about than individual operations.

        --why? because of _invariants_. example of an invariant:

            "list structure has integrity"

        --the meaning of lock.acquire() is that if and only if you get
        past that line, it's safe to violate the invariants.

        --the meaning of lock.release() is that right _before_ that line,
        any invariants need to be restored.

    --the above is abstract. let's make it concrete:

        invariant: "list structure has integrity"

        so protect the list with a mutex:

            only after acquire() is it safe to manipulate the list

            just before release(), the list needs to be in a sane state

    --ASK: based on the above, what do we need to be careful about
    before/after wait()?

        that invariants hold. (because wait() releases the mutex.)

        this is why there is state manipulation (of AW, WW, WR, AR)
        before/after wait() in the example from last time

3. Trade-offs and problems from locking

    Locking (in all its forms: mutexes, monitors, semaphores) raises many
    issues:

    A. Hard to get right

        --this is a programming model where, unfortunately, the incorrect
        version of the code is far easier to write than the correct
        version of the code.

    B. Performance/complexity trade-off

        --one big lock is often not great for performance

        --indeed, locking itself is the issue: changing the lock type is
        unlikely to be as big of a performance win as restructuring the
        code

        --the fundamental issue with coarse-grained locking is that only
        one CPU can execute anywhere in the part of your code protected
        by a lock. If your code is called a lot, this may reduce the
        performance of an expensive multiprocessor to that of a single
        CPU.

        --if this happens inside the kernel, it means that applications
        will inherit the performance problems from the kernel

        --Perhaps locking at smaller granularity would get higher
        performance through more concurrency.

        --But how to best reduce lock granularity is a bit of an art.

        --And unfortunately finer-grained locking makes incorrect code
        far more likely

        --And modularity further suffers (see item F. below)

        --Two examples of the above issues:

            --Example 1: imagine that every file in the file system is
            represented by a number, in a big table

                --You might inspect the file system code and notice that
                most operations use just one file or directory, leading
                you to have one lock per file

                --You could imagine the code implementing directories
                exporting various operations like:

                    dir_lookup(d, name)
                    dir_add(d, name, file_number)
                    dir_del(d, name)

                --With fine-grained locking, these directory operations
                would *internally* acquire the lock on d, do their work,
                and release the lock

                --Then higher-level code could implement operations like
                moving a file from one directory to another:

                    move(olddir, oldname, newdir, newname) {
                        file_number = dir_lookup(olddir, oldname)
                        dir_del(olddir, oldname)
                        dir_add(newdir, newname, file_number)
                    }

                --Unfortunately, this isn't great: there is a period of
                time when the file is visible in neither directory. to
                fix that requires that the directory locks _not_ be
                hidden inside the dir_* operations.
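The race in this fine-grained design can be made concrete. Below is a minimal Python sketch (the dict-backed Dir class is an invented stand-in; dir_lookup/dir_add/dir_del and move follow the pseudocode in these notes). Replaying move()'s steps by hand exposes, deterministically, the window in which the file is in neither directory:

```python
import threading

class Dir:
    """Hypothetical directory: a dict of name -> file_number plus a lock."""
    def __init__(self):
        self.lock = threading.Lock()
        self.entries = {}

# Fine-grained design: each operation acquires the per-directory
# lock *internally*, so each call is individually atomic.
def dir_lookup(d, name):
    with d.lock:
        return d.entries.get(name)

def dir_add(d, name, file_number):
    with d.lock:
        d.entries[name] = file_number

def dir_del(d, name):
    with d.lock:
        del d.entries[name]

def move(olddir, oldname, newdir, newname):
    file_number = dir_lookup(olddir, oldname)
    dir_del(olddir, oldname)
    # <-- window: a thread running here sees the file in NEITHER
    #     directory, even though every dir_* call is atomic
    dir_add(newdir, newname, file_number)

# Replay move()'s steps by hand to expose the window deterministically:
a, b = Dir(), Dir()
dir_add(a, "paper.txt", 7)
n = dir_lookup(a, "paper.txt")
dir_del(a, "paper.txt")
in_neither = (dir_lookup(a, "paper.txt") is None and
              dir_lookup(b, "paper.txt") is None)
dir_add(b, "paper.txt", n)
print(in_neither)  # True: the file was briefly in neither directory
```

The point: making each dir_* call atomic is not enough, because the atomic unit the caller needs is the whole move().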
                --so we need something like this:

                    move(olddir, oldname, newdir, newname) {
                        acquire(olddir.lock)
                        acquire(newdir.lock)
                        file_number = dir_lookup(olddir, oldname)
                        dir_del(olddir, oldname)
                        dir_add(newdir, newname, file_number)
                        release(newdir.lock)
                        release(olddir.lock)
                    }

                --The above code is a bummer in that it exposes the
                implementation of directories to move(), but (if all you
                have is locks) you have to do it this way.

            --Example 2: see filemap.c at end of handout for an extreme
            case

        --Mitigation? Unfortunately, no way around this trade-off.

            --worse, it's easy to get this stuff wrong: correct code is
            harder to write than buggy code

            --If you have fine-grained locking (i.e., you are trading off
            simplicity), then you are much more likely to encounter the
            two types of errors:

                (i) safety errors (race conditions)
                (ii) liveness errors (deadlocks, etc.)

        --***So what do people do?***

            --in app space:

                --don't worry too much about performance up front. this
                makes it easier to keep your code free of safety problems
                *and* liveness problems

                --if you are worrying about performance, make sure there
                are no race conditions. that is much more important than
                worrying about deadlock.

                    --SAFETY FIRST.

                    --almost always far better for your program to do
                    nothing than to do the wrong thing (example of using
                    a linear accelerator for radiation therapy: **way**
                    better not to subject the patient to the radiation
                    beam than to subject the patient to a beam that is
                    100x too strong, leading to gruesome, atrocious
                    injuries)

                    --if the program deadlocks, the evidence is intact,
                    and we can go back and see what the problem was.

                --there are ways around deadlock, as we will discuss in a
                moment

                --but we shouldn't be too cavalier about liveness issues
                because they can lead to catastrophic cases. Example:
                Mars Pathfinder (which was addressed; see below), but
                still.
            --in kernel space:

                --same thing, to some extent

                --but performance matters more in kernel space, so you
                are likely to be dealing with more complex issues

                --here again, SAFETY FIRST:

                    --lock more aggressively

                    --worry about deadlock later

                --not a satisfying answer, but there is no silver bullet
                for concurrency-related issues

        --By the way, if there is lots of contention, then the style and
        granularity of locks will not eliminate the problem. Where does
        contention come from?

            --application requirements. lots of contention from
            applications that inherently require global resources or
            shared data.

                --example of Apache: every CPU needs to write to a global
                logfile, which causes contention in the kernel. you can
                make the locking as fine-grained as you want, but at the
                end of the day, if there's a single logfile, a single
                writer permitted at a time, and many contending writers,
                then that logfile is going to wind up serializing all of
                the writers.

    C. Starvation

        --thread waiting indefinitely (if low priority and/or if the
        resource is contended)

    D. Priority inversion

        --T1, T2, T3: (highest, middle, lowest priority)

        --T1 wants to get the lock; T2 is runnable; T3 is runnable and
        holding the lock

        --System will preempt T3 and run the highest-priority runnable
        thread, namely T2

        --Solutions:

            --Temporarily bump T3 to the highest priority of any thread
            that is ever waiting on the lock

            --Disable interrupts, so no preemption (T3 finishes) ...
            works okay unless a page fault occurs

            --Don't handle it; structure the app so that only
            adjacent-priority processes/threads share locks

        --Happens in real life. For a real-life example, see:
        http://research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Mars_Pathfinder.html

---------------------------------------------------------------------------

Admin stuff

    --video over weekend

---------------------------------------------------------------------------

    E. Deadlock

        --see handout: simple example based on two locks

        --see handout: more complex example

            --M calls N

            --N waits

            --but let's say the condition can only become true if N is
            invoked through M

            --now the lock inside N is unlocked, but M remains locked;
            that is, no one is going to be able to enter M and hence N.

        --can also get deadlocks with condition variables

            --lesson: dangerous to hold locks (M's mutex in the case on
            the handout) when crossing abstraction barriers

        --deadlocks without mutexes:

            --the real issue is resources & how they are requested

            --non-computer example **[picture of bridge]**:

                --bridge only allows traffic in one direction

                --Each section of a bridge can be viewed as a resource.

                --If a deadlock occurs, it can be resolved if one car
                backs up (preempt resources and rollback).

                --Several cars may have to be backed up if a deadlock
                occurs.

                --Starvation is possible.

            --other example:

                --one thread/process grabs the disk and then tries to
                grab the scanner

                --another thread/process grabs the scanner and then tries
                to grab the disk

        --how do we get around deadlock?

            (i) ignore it: worry about it when it happens

            (ii) detect and recover: not great

                --could imagine attaching a debugger

                    --not really viable for production software, but
                    works well in development

                --threads package can keep track of the
                resource-allocation graph (see book)

                    --For each lock acquired, order with other locks held

                    --If a cycle occurs, abort with an error

                    --Detects potential deadlocks even if they do not
                    occur

            (iii) avoid algorithmically [didn't cover this year]

                --banker's algorithm (see book)

                    --very elegant but impractical

                    --if you're using the banker's algorithm, the
                    gameboard looks like this:

                        ResourceMgr::Request(ResourceID resc, RequestorID thrd) {
                            acquire(&mutex);
                            assert(system in a safe state);
                            while (state that would result from giving
                                   resource to thread is not safe) {
                                wait(&cv, &mutex);
                            }
                            update state by giving resource to thread
                            assert(system in a safe state);
                            release(&mutex);
                        }

                    --Now we need to determine if a state is safe....
                    To do so, see book

                --disadvantages of the banker's algorithm:

                    --requires every single resource request to go
                    through a single broker

                    --requires every thread to state its maximum resource
                    needs up front. unfortunately, if threads are
                    conservative and claim they need huge quantities of
                    resources, the algorithm will reduce concurrency

            (iv) prevent them by careful coding

                --negate one of the four conditions:

                    1. mutual exclusion
                    2. hold-and-wait
                    3. no preemption
                    4. circular wait

                --can sort of negate 1:

                    --put a queue in front of resources, like the printer

                    --virtualize memory

                --not much hope of negating 2

                --can sort of negate 3:

                    --consider physical memory: virtualized with VM, the
                    OS can take a physical page away and give it to
                    another process!

                --what about negating #4?

                    --in practice, this is what people do

                    --idea: partial order on locks

                        --establish an order on all locks and make sure
                        that every thread acquires its locks in that
                        order

                    --why this works:

                        --can view deadlock as a cycle in the resource
                        acquisition graph

                        --partial order implies no cycles and hence no
                        deadlock

                    --three bummers:

                        1. hard to represent CVs inside this framework.
                        works best only for locks.

                        2. the compiler can't check at compile time that
                        the partial order is being adhered to, because
                        the calling pattern is impossible to determine
                        without running the program (thanks to function
                        pointers and the halting problem)

                        3. picking and obeying the order on *all* locks
                        requires that modules make public their locking
                        behavior, and requires them to know about other
                        modules' locking. This can be painful and
                        error-prone.

                    --Linux's filemap.c is an example of the complexity
                    introduced by having a locking order [will cover next
                    time; listing here for context/flow]

            (v) static and dynamic detection tools

                --See, for example, these citations, citations therein,
                and papers that cite them (there is a long literature on
                this stuff):

                    Engler, D. and K. Ashcraft. RacerX: effective, static
                    detection of race conditions and deadlocks. Proc. ACM
                    Symposium on Operating Systems Principles (SOSP),
                    October 2003, pp. 237-252.
                    http://portal.acm.org/citation.cfm?id=945468

                    Savage, S., M. Burrows, G. Nelson, P. Sobalvarro, and
                    T. Anderson. Eraser: a dynamic data race detector for
                    multithreaded programs. ACM Transactions on Computer
                    Systems (TOCS), Volume 15, No. 4, November 1997,
                    pp. 391-411.
                    http://portal.acm.org/citation.cfm?id=265927

                --Disadvantage to dynamic checking: slows the program
                down

                --Disadvantage to static checking: many false alarms (the
                tool says "there is a deadlock", but in fact there is
                none) or else missed problems

                --Note that these tools get better every year. I believe
                that Valgrind has a race and deadlock detection tool

    F. Broken modularity

        --examples above: avoiding deadlock requires understanding how
        programs call each other.

        --also, when calling a library, you need to know whether it's
        thread-safe: printf, malloc, etc. If not, surround the call with
        a mutex. (Can always surround calls with mutexes conservatively.)

        --basically, locks bubble out of the interface
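To make the "partial order on locks" idea from (iv) concrete, here is a minimal Python sketch. (Account and transfer are invented names for illustration, not from the notes.) Every thread acquires the two locks in one global order, here by object id, so no cycle of waiting threads can form:

```python
import threading

class Account:
    """Invented example resource: a balance protected by its own lock."""
    def __init__(self, balance):
        self.lock = threading.Lock()
        self.balance = balance

def transfer(src, dst, amount):
    # Acquire both locks in a single global order (here: by id()).
    # A consistent order means no cycle in the waits-for graph,
    # hence no deadlock -- this negates "circular wait".
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

a, b = Account(100), Account(100)
# Two threads transferring in opposite directions: if each thread
# locked src before dst, this pattern could deadlock; with a global
# lock order, it cannot.
t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(1000)])
t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(1000)])
t1.start(); t2.start()
t1.join(); t2.join()
print(a.balance, b.balance)  # 100 100: the transfers cancel; no deadlock
```

Note that the "broken modularity" bummer shows up even in this tiny sketch: transfer() has to reach into both accounts' locks, i.e., the locking leaks out of the Account abstraction.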