Class 9
CS 372H 16 February 2010

On the board
------------

1. Last time

2. Concurrency
   --What is it?
   --What makes it hard?
   --How can we deal with races?

3. Protecting critical sections
   --Peterson
   --Spinlocks
   --Mutexes
   --Turning off interrupts

---------------------------------------------------------------------------

1. Last time

   a. threads

      --new abstraction
      --basic idea: a thread has its own registers (and stack) but not
        its own memory
      --can implement threads in:
         --user space
         --Java virtual machine
         --Flash player
         --kernel
         --etc.
      --two main choices:
         (i). user-level or kernel (but can have N:M threading, in
              which N user-level threads are multiplexed over M kernel
              threads, so the choice is a bit fuzzier)
         (ii). if user-level: non-preemptive (also known as cooperative)
               or preemptive

         [be able to answer: why are kernel threads always preemptive?]

      --*only* the presence of multiple kernel threads can give:
         --true multiprocessing (i.e., different threads running on
           different processors)
         --asynchronous disk I/O using the POSIX interface [because
           read() blocks and causes the *kernel* scheduler to be
           invoked]
      --but many modern operating systems provide interfaces for
        asynchronous disk I/O, at least as an extension
         --Windows
         --Linux has AIO extensions
      --thus, even user-level threads can get asynchronous disk I/O:
        the run-time translates calls that *appear* blocking to the
        thread [e.g., thread_read()] into a series of instructions
        that register interest in an I/O event, put the thread to
        sleep, and switch() to another thread
      --[moral of the story: if you find yourself needing async disk
        I/O from user-level threads, use one of the non-POSIX
        interfaces!]

   b. confusing thing:

      --The kernel itself uses threads internally, when executing in
        kernel mode. Such threads-in-the-kernel are related to, but
        not the same thing as, the kernel threads mentioned above.
      --We'll try to keep these concepts distinct in this class, but
        we may not always succeed.

   c. Note well: the issues with concurrency that we're going to
      discuss are relevant in nearly all of the above cases
      --the one exception is non-preemptive user-level threads; those
        yield only when the programmer calls yield()
      --so you can do clever things (but be careful)

   d. How many people have programmed with threads before?

   e. Questions?

2. Concurrency

   A. What is it?

      --Stuff happening at the same time
      --Arises in many ways
         --pseudo-concurrency: from scheduling
         --real concurrency: multiple processors
      --Examples:
         --multiple kernel threads within a process
         --multiple processes sharing memory
         --what about multiple hosts distributed across a network?
           (conceptually, the issues are the same, but the needed
           mechanisms are different)
      --We're going to treat the issues in general... they apply to
        processes sharing memory pages, kernel threads sharing memory
        spaces, user-level threads that are preemptible, etc.
      --so for the rest of today, we're going to talk about two
        threads, but this could mean:
         --threads inside a single process
         --threads inside the kernel
         --even two separate processes that share memory

   B. What makes it hard?

      --lots of things can go wrong.....
         --we will see others later (deadlock, priority inversion,
           etc.)
         --for now, look at data races....
      --some examples; see handout:
         1a: x = 1 or x = 2.
         1b: x = 13 or x = 25.
         1c: x = 1 or x = 2 or x = 3
         2:  incorrect list structure
         3:  incorrect count in buffer
      --all of these are called *race conditions*
      --the worst part of these errors is that a program may work fine
        most of the time and only occasionally show problems. why?
        (because the instructions of the various threads or processes
        or whatever get interleaved in a non-deterministic order)
      --and it's worse than that: inserting debugging code may change
        the timing so that the bug doesn't show up
        [a concrete sketch of this kind of race follows]
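      --to make the lost-update race concrete, here is a minimal
        sketch (this program, and names like NITERS, are invented for
        illustration; it is not the handout code): two threads
        increment a shared counter with no synchronization, and
        increments get lost:

            /* race.c -- sketch of the count++ race. count++ compiles
               to roughly: load count; add 1; store count. If two
               threads interleave between the load and the store, one
               increment overwrites the other and is lost. */

            #include <pthread.h>
            #include <stdio.h>

            #define NITERS 1000000

            static volatile int count = 0;

            static void *incr(void *arg)
            {
                for (int i = 0; i < NITERS; i++)
                    count++;        /* load / add / store: NOT atomic */
                return NULL;
            }

            int main(void)
            {
                pthread_t t1, t2;
                pthread_create(&t1, NULL, incr, NULL);
                pthread_create(&t2, NULL, incr, NULL);
                pthread_join(t1, NULL);
                pthread_join(t2, NULL);
                /* expected 2*NITERS; most runs print something less */
                printf("count = %d (expected %d)\n", count, 2 * NITERS);
                return 0;
            }

        compile with gcc -pthread; on most runs the printed count is
        visibly less than 2*NITERS, and the shortfall varies from run
        to run, which is exactly the non-determinism described above.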
   C. How can we deal with races?

      --make the needed operations atomic
      --how?

      1. A single-instruction add?
         --E.g., the i386 allows a single-instruction add:
              addl $1, count
         --So implement count++/count-- with one instruction
         --Now are we safe?
            --No: not atomic on a multiprocessor!
            --Will experience exactly the same race condition

      2. How about using the x86 LOCK prefix?
         --can make read-modify-write instructions atomic by preceding
           them with "LOCK". examples of such instructions are: XADD,
           CMPXCHG, INC, DEC, NOT, NEG, ADD, SUB... (when their
           destination operand refers to memory)
         --but using LOCK is very expensive (flushes processor caches)
           and not a "general-purpose abstraction"
            --it applies to only one instruction: what if we need to
              execute three or four instructions as a unit?
            --the compiler won't generate it by default; it assumes
              you don't want the penalty

      3. Critical sections
         --Place count++ and count-- in critical sections
         --Protect critical sections from concurrent execution
         --Now we need a solution to the _critical section_ problem
         --A solution must satisfy 3 rules:

           1. mutual exclusion
                only one thread can be in the c.s. at a time
           2. progress
                if no thread is executing in the c.s., one of the
                threads trying to enter a given c.s. will eventually
                get in
           3. bounded waiting
                once a thread T starts trying to enter the critical
                section, there is a bound on the number of other
                threads that may enter the critical section before T
                enters

         --Note progress vs. bounded waiting:
            --If no thread can enter the c.s., we don't have progress
            --If thread A is waiting to enter the c.s. while B
              repeatedly leaves and re-enters the c.s. ad infinitum,
              we don't have bounded waiting
         --The game plan: we're now going to build primitives to
           protect critical sections

3. Protecting critical sections

   --Peterson's algorithm....
      --see the book
      --does satisfy mutual exclusion, progress, and bounded waiting
      --But expensive and not encapsulated

   --High-level:
      --want: lock()/unlock() or enter()/leave() or
        acquire()/release()
         --lots of names for the same idea
      --mutex_init(mutex_t* m), mutex_lock(mutex_t* m),
        mutex_unlock(mutex_t* m), ....
      --pthread_mutex_init(), pthread_mutex_lock(), ...
      --in each case, the semantics are that once a thread of
        execution is executing inside the critical section, no other
        thread of execution is executing there [a usage sketch
        follows]
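      --here is a minimal usage sketch of those semantics in terms of
        the pthread interface (the code itself is illustrative, not
        from the handout): the count++/count-- race from earlier
        disappears because both operations now run inside the same
        critical section:

            #include <pthread.h>

            static pthread_mutex_t count_mutex =
                PTHREAD_MUTEX_INITIALIZER;
            static int count = 0;

            /* count++ and count-- now execute as critical sections:
               at most one thread at a time is between the lock and
               the unlock. */

            void increment(void)
            {
                pthread_mutex_lock(&count_mutex);   /* enter c.s. */
                count++;
                pthread_mutex_unlock(&count_mutex); /* leave c.s. */
            }

            void decrement(void)
            {
                pthread_mutex_lock(&count_mutex);
                count--;
                pthread_mutex_unlock(&count_mutex);
            }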
   --How to implement locks/mutexes/etc.?

   --Spinlocks
      --Fine for quick operations in the kernel
      --Not good in user space, or even for waiting for long periods
        of time in the kernel
         --Question: why not use spinlocks to wait for access to the
           disk drive?
         --Answer: it wastes CPU
      --note: it's unavoidable that we need hardware support, because
        at the lowest level we're trying to decide which particular
        thread does something first

   --Mutexes
      --these get implemented in terms of a lower-level lock
      --how to implement the lower-level lock?
         --turn off interrupts
            --only works on a uniprocessor system
         --spinlocks, which rely on hardware-level synchronization,
           for example xchg ....

   --How does turning off interrupts relate to all of the above?
      --Answer: on a single-CPU machine, we could replace
        spinlock.acquire() with "disable interrupts" and
        spinlock.release() with "enable interrupts"
      --On a multi-CPU machine, spinlocks themselves need to run with
        interrupts off. Why?
         --consider memory shared between an interrupt handler and a
           thread-inside-kernel (e.g., the interrupt handler enqueues
           I/O events and the inside-kernel thread handles those
           events)
         --an interrupt handler cannot sleep, so it cannot wait for a
           lock (that would wedge the machine)
         --solution: turn off interrupts before acquiring the
           spinlock:
              spinlock.acquire() "pushes" interrupts (saves their
                current state and disables them)
              spinlock.release() "pops" interrupts (restores their
                saved state)
           [see the sketches at the end of these notes]
   --Perhaps even more confusingly, "interrupt" above should be viewed
     *abstractly*:
      --certainly if the code is executing inside the kernel,
        disabling and enabling interrupts means "turning off the
        processor's interrupts"
      --but if we're talking about a preemptive user-level threading
        package, then "interrupt" might just mean a non-deterministic
        timer signal that invokes the thread scheduler. in that case,
        "turning off interrupts" could mean "deregistering for
        signals" or else "record the fact that signals were delivered,
        but don't act on that fact until 'interrupts' are reenabled".
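   --to tie the above together, here is a minimal sketch of an
     xchg-based spinlock that "pushes" and "pops" interrupts as
     described. every name here (struct spinlock, irq_save(),
     irq_restore()) is invented for illustration, and real kernels
     differ in detail (e.g., this version does not handle nested
     acquires on one CPU):

         /* assumed to exist in the surrounding kernel: save the
            current interrupt state and disable interrupts / restore
            a saved state */
         extern unsigned irq_save(void);
         extern void     irq_restore(unsigned state);

         struct spinlock {
             volatile unsigned locked;  /* 0 = free, 1 = held */
             unsigned saved_irq_state;  /* restored on release */
         };

         /* atomically swap new_val into *addr and return the old
            value; on the x86, xchg with a memory operand is atomic
            even without a LOCK prefix */
         static inline unsigned
         xchg(volatile unsigned *addr, unsigned new_val)
         {
             unsigned old;
             asm volatile("xchgl %0, %1"
                          : "=r" (old), "+m" (*addr)
                          : "0" (new_val)
                          : "memory");
             return old;
         }

         void
         spinlock_acquire(struct spinlock *lk)
         {
             unsigned irq = irq_save();   /* "push" interrupts */
             while (xchg(&lk->locked, 1) == 1)
                 ;                        /* held by someone: spin */
             lk->saved_irq_state = irq;   /* safe: we hold the lock */
         }

         void
         spinlock_release(struct spinlock *lk)
         {
             /* read the saved state before giving up the lock */
             unsigned irq = lk->saved_irq_state;
             xchg(&lk->locked, 0);        /* atomic release + barrier */
             irq_restore(irq);            /* "pop" interrupts */
         }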
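   --and for the user-level-threading reading of "interrupt", here is
     a minimal sketch that "turns off interrupts" by blocking the
     scheduler's timer signal. the choice of SIGALRM and the function
     names are assumptions for illustration. this implements the
     "record the fact that signals were delivered but don't act on it"
     option: the kernel keeps a blocked signal pending and delivers it
     when it is unblocked again:

         #include <signal.h>

         static sigset_t saved_mask;

         /* block the timer signal (assumed here to be SIGALRM) that
            drives the user-level thread scheduler; a sketch only --
            these two calls do not nest */
         void user_level_interrupts_off(void)
         {
             sigset_t block;
             sigemptyset(&block);
             sigaddset(&block, SIGALRM);
             sigprocmask(SIG_BLOCK, &block, &saved_mask);  /* "push" */
         }

         void user_level_interrupts_on(void)
         {
             sigprocmask(SIG_SETMASK, &saved_mask, NULL);  /* "pop" */
         }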