Class 9
CS 439
12 February 2013

On the board
------------

1. Last time

2. revisiting threads
    --review
    --classification
    --implementation

---------------------------------------------------------------------------

1. Last time

(video; sorry about the problems with it.)
    --PC architecture
    --x86 programming
    --gcc calling convention

today:
    --implementation of concurrency primitives

2. Thread abstraction

(emphasis on how threads are implemented, and on the variants.)

A. Review

    --recall: *threads* are a very natural way to do multiple tasks that
      operate on the same memory state. there are two fundamental
      motivations for threads, though not both apply to every instance:

        (1) desire to have a single process take advantage of multiple
            CPUs

            (*) --> but we'll see that whether the process can in fact
                take advantage of multiple CPUs depends on the
                implementation of threads

        (2) often very natural to structure some computation (or task or
            job or whatever) as multiple units of control that see the
            same memory

            (*) --> but we'll see that this motivation depends on the
                computation itself

    --abstraction/illusion: a sequential set of instructions that
      executes within the address space of a process

        (i) a thread *is* a set of registers (including a PC/IP) and a
            stack.

        (ii) multiple threads within the same process share the same
             memory. (they can even read and write each other's stacks,
             but if there are no bugs, that should not happen. generally,
             the memory that they both look at is heap memory or
             statically initialized memory.)

            --another way to put this: a thread does not have its own
              page directory. so on the x86, two threads share the same
              value of %cr3 (the virtual memory lab will make clear what
              that means)

        (iii) multiple threads within the same process are executing at
              once

            (*) --> but we'll see that this only actually happens
                sometimes

    [Note for your studying: if you truly understand why each of the
    three counterpoints marked "(*) --> but" above is true, then you
    have a good handle on the true motivations for threads and on what
    problems threads are solving.]

B. Classification

    this abstraction can be implemented at multiple levels:

        a. in-kernel, for the kernel
        b. in-kernel, for processes
        c. in-process, for user-level threads (examples: Java virtual
           machine, Flash player, lots of applications!)

    recall that multiple threads share memory but not registers
        --this means, to a first approximation, that they see each
          other's heaps but not each other's stacks.

    different kinds of threads: (review)

        --non-preemptive: a thread executes exclusively until it makes a
          blocking call (e.g., a read() on a file).

        --preemptive: between any two instructions, another thread can
          run
            [how is this implemented? answer: with interrupts and
            context switches]

C. Implementation

    --one way to understand a given implementation of threads is by
      answering three questions:

        * where is the TCB stored?
        * what does swtch() look like, and who implements it?
        * what is the level of true concurrency?

    (1) kernel-level threading

        --the TCB looks a lot like a PCB
        --[draw picture]
        --thread_create() becomes a syscall
        --swtch() is like a context switch
        --what is the level of true concurrency?
        --when do thread switches happen?
            --with kernel-level threading, they can happen at any point.
        --multiple kernel-level threads can run on multiple processors
          (because it's the kernel that decides what runs on which
          processors, and when)
        --basic game plan for dispatch/swtch():
            --thread is running
            --switch to kernel
            --save thread state (to TCB)
            --Choose new thread to run
            --Load its state (from TCB)
            --new thread is running
        --Can two kernel-level threads execute on two different
          processors? (Answer: yes. a concrete sketch follows.)
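        --[aside; not from the handout] to make the "yes" concrete: on
          Linux, the Posix threads (pthreads) library is implemented 1:1
          over kernel-level threads, so a program like the minimal
          sketch below really can occupy two processors at once. the
          work() function, the loop count, and the counters are invented
          for the example:

            /* sketch, not from the handout: kernel-level threading via
               pthreads. build with: gcc demo.c -pthread */
            #include <pthread.h>
            #include <stdio.h>

            static volatile long counts[2]; /* shared memory: both threads see it */

            void* work(void* arg) {
                long id = (long)arg;
                for (long i = 0; i < 100000000; i++)
                    counts[id]++;   /* CPU-bound loop; the two instances
                                       can run truly in parallel on two
                                       processors */
                return NULL;
            }

            int main(void) {
                pthread_t t1, t2;
                /* each pthread_create() enters the kernel
                   (clone() on Linux) */
                pthread_create(&t1, NULL, work, (void*)0);
                pthread_create(&t2, NULL, work, (void*)1);
                pthread_join(t1, NULL);
                pthread_join(t2, NULL);
                printf("%ld %ld\n", counts[0], counts[1]);
                return 0;
            }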
        --Disadvantages to kernel-level threading:

            --every thread operation (create, exit, join, synchronize,
              etc.) goes through the kernel
                --> 10x-30x slower than user-level threads

            --heavier-weight memory requirements (each thread gets a
              stack in user space *and* within the kernel. compare to
              user-level threads: each thread gets a stack in user
              space, and there's one stack within the kernel that
              corresponds to the process.)

    (2) user-level threading

        --the kernel is totally ignorant of user-level threads. so where
          is the TCB stored?

        --thread_create() allocates a new stack

        --do we need memory space for registers?

        --run-time system:
            --keeps a queue of runnable threads
            --provides a layer above system calls: if a call would
              block, switch and run a different thread instead
            --does the scheduling:
                --thread is running
                --save thread state (to TCB)
                --Choose new thread to run
                --Load its state (from TCB)
                --new thread is running

        --what does swtch() look like?
            --see handout.....

        --what is the level of true concurrency?
            --answer: none. given a process that is using user-level
              threading, **only one instruction in that process can
              execute at a time**.

        --when does swtch() happen? Two options:

            1. Only when a thread calls yield() or would block on I/O

                --This is called *cooperative multithreading* or
                  *non-preemptive multithreading*.

                --Upside: makes it easier to avoid errors from
                  concurrency

                --Downside: harder to program because now the threads
                  have to be good about yielding, and you might have
                  forgotten to yield inside a CPU-bound task.

            2. What if we wanted to make user-level threads switch
               non-deterministically?

                --deliver a periodic timer interrupt or signal to a
                  thread scheduler [setitimer()]. when the scheduler
                  gets its interrupt, swap out the thread.

                --makes it more complex to program with user-level
                  threads

                --in practice, systems aren't usually built this way,
                  but sometimes it is what you want (e.g., if you're
                  simulating some OS-like thing inside a process, and
                  you want to simulate the non-determinism that arises
                  from hardware timer interrupts).

        --Before continuing, we need to clarify *blocking* versus
          *non-blocking* I/O calls.

            --Blocking means that the entity making the call (the thread
              in this case) does not progress past the I/O call (often a
              read() or write()) unless there is data for the thread
              (or, in the case of a write, unless the output channel can
              accommodate the data)

            --Non-blocking means that if the call *would* block, the
              call instead returns with an error, and the thread keeps
              going.

            --(This idea also pertains to the read/write system calls
              exposed by the kernel for the use of a process.)

            --Usually, the *thread* is supposed to see the call as
              blocking. However, there is an important subtlety: the
              other side of that call (e.g., the run-time that created
              the thread abstraction) makes the corresponding system
              call in *non-blocking* mode. That is because, in this
              scenario of user-level threads, if the run-time *did*
              block, it wouldn't be able to run another thread.

        --As an aside, note that the relationship between the run-time
          and the thread is very similar to the relationship between the
          kernel and a process. When a process makes a blocking I/O call
          (most of you have done this at some point in your life --
          pretty much whenever you called read() to get the data in some
          file), the kernel puts the process to sleep until the data
          arrives from the disk. But just as the run-time issues the I/O
          syscall to the kernel in non-blocking mode, the kernel issues
          the I/O request to the disk in non-blocking mode. The reason
          is that if the kernel went to sleep every time it waited on
          data from the disk, then the kernel wouldn't be able to run
          other processes. Put differently, the abstraction of "sleeping
          until there is data available" is presented to the higher
          layer, and the lower layer implements that abstraction by
          simply not running the higher layer until the data is
          available.
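        --[aside; not from the handout] concretely, a run-time on a
          Posix system puts a descriptor in non-blocking mode with
          fcntl() and O_NONBLOCK; after that, a read() that would block
          returns -1 with errno set to EAGAIN (or EWOULDBLOCK) instead
          of putting the caller to sleep. a minimal sketch:

            /* sketch, not from the handout */
            #include <errno.h>
            #include <fcntl.h>
            #include <unistd.h>

            /* flip an already-open descriptor into non-blocking mode */
            int set_nonblocking(int fd) {
                int flags = fcntl(fd, F_GETFL, 0);  /* fetch current flags */
                if (flags == -1)
                    return -1;
                return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
            }

            /* after set_nonblocking(fd):

                 ssize_t n = read(fd, buf, num);
                 if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
                     ... the read *would* have blocked: no data yet.
                     this is exactly the spot where a user-level
                     run-time switches to another thread ...
            */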
        --Let's look at how the above approach is implemented, focusing
          on the register/EIP/stack switching. We will further focus on
          the case of *cooperative* user-level multithreading.

          Basic idea: swtch() is called at "sane" moments, in response
          to a function call from a thread. That function is usually
          yield(), i.e., the call graph usually looks like this:

            fake_read()
                if read would block
                    yield()
                        swtch()

          and the pseudocode looks something like this:

            int fake_read(int fd, char* buf, int num) {
                int nread = -1;
                while (nread == -1) {
                    /* this is a non-blocking read() syscall */
                    nread = read(fd, buf, num);
                    if (nread == -1) { /* read would block */
                        yield();
                    }
                }
                return nread;
            }

            void yield() {
                tid next = pick_next_thread(); /* get a runnable thread */
                tid current = get_current_thread();
                swtch(current, next);
            }

        --to repeat, what "would block" means:
            --in the read direction, it means that there's no data to
              read
            --in the write direction, it means that the output buffers
              are full, so the write cannot happen yet

        --how is swtch() implemented?
            --see handout.....
            --[draw picture of the two stacks]
            --make sure you understand what is going on
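        --[aside; for those reading without the handout] one plausible
          version of a cooperative swtch() on the 32-bit x86, under the
          gcc calling convention from last class, looks like the sketch
          below. this is a guess at the flavor of the handout's code,
          not a copy of it; the sketch assumes each TCB's first field
          holds the thread's saved stack pointer:

            # sketch, not the handout's code.
            # void swtch(struct tcb* current, struct tcb* next);
            # assumption: offset 0 of a tcb is the saved stack pointer.
            .globl swtch
            swtch:
                movl 4(%esp), %eax      # %eax = current's TCB
                movl 8(%esp), %edx      # %edx = next's TCB

                # save the callee-saved registers on the current stack.
                # (the gcc convention makes the callee preserve these;
                #  the caller-saved registers and the return %eip were
                #  already saved by the call itself.)
                pushl %ebp
                pushl %ebx
                pushl %esi
                pushl %edi

                movl %esp, (%eax)       # current->sp = %esp
                movl (%edx), %esp       # %esp = next->sp

                # we are now on next's stack; restore its registers...
                popl %edi
                popl %esi
                popl %ebx
                popl %ebp

                ret                     # ...and pop next's saved %eip

          the key point: after the two movl instructions that swap
          %esp, execution is on the new thread's stack, so the ret
          returns into whatever the *new* thread was doing when it last
          called swtch().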
        --How do we switch threads in a non-cooperative context?

          In a non-cooperative context, a thread could be switched out
          at any moment, so its state is not neatly arranged on the
          stack per the call graph. But in that case, the OS would have
          put some of the thread's registers in a trap frame, and the
          run-time can yank those registers, save them (and the other
          registers) in the TCB or on the thread's regular stack, and
          then restore them later.

          Said differently, thread switching by the user-level run-time
          looks a lot like process switching by the kernel.

        Notes/questions:

            --In the kernel's PCB, only one set of registers is
              stored.....
            --QUESTION: where are the other registers, for the other
              threads?

        Disadvantages to user-level threads:

            --Can we imagine having two user-level threads truly
              executing at once, that is, on two different processors?
              (Answer: no. why?)

            --What if the OS handles page faults for the process? (then
              a page fault in one thread blocks all threads).
                --(not a huge issue in practice)

            --Similarly, if a thread needs to go to disk, then that
              actually blocks *all* threads (since the kernel won't
              allow the run-time to make a non-blocking read() call to
              the disk). So what do we do about this?
                --extend the API; or
                --live with it; or
                --use elaborate hacks with memory-mapped files (e.g.,
                  files are all memory-mapped, and the run-time asks to
                  handle its own page faults, if the OS allows it)

        --[SKIP IN CLASS] Old debates about user-level threading vs.
          kernel-level threading. The "Scheduler Activations" paper, by
          Anderson et al. [ACM Transactions on Computer Systems 10, 1
          (February 1992), pp. 53--79], proposes an abstraction that is
          a hybrid of the two.

            --basically, the OS tells the process: "I'm ready to give
              you another virtual CPU (or to take one away from you);
              which of your user-level threads do you want me to run?"

            --so the user-level scheduler decides which threads run, but
              the kernel takes care of multiplexing them

        --[COVER LATER] Some people think that threads, i.e., concurrent
          applications, shouldn't be used at all (because of the many
          bugs and difficult cases that come up, as we'll discuss).
          However, that position is becoming increasingly less tenable,
          given multicore computing.

          The fundamental reason is this: if you have a
          computation-intensive job that wants to take advantage of all
          of the hardware resources of a machine, you either need to
          (a) structure the job as different processes, or (b) use
          kernel-level threading. There is no other way, given
          mainstream OS abstractions, to take advantage of a machine's
          parallelism. (a) winds up being inconvenient (in order to
          share data, the processes either have to separately set up
          shared memory regions, or else pass messages). So people use
          (b).

        Quick comparison between user-level threading and kernel-level
        threading:

            (i) high-level choice: user-level or kernel-level (but one
                can have N:M threading, in which N user-level threads
                are multiplexed over M kernel threads, so the choice is
                a bit fuzzier)

            (ii) if user-level, there's another choice: non-preemptive
                 (also known as cooperative) or preemptive

                 [be able to answer: why are kernel-level threads
                 always preemptive?]

            --*Only* the presence of multiple kernel-level threads can
              give:
                --true multiprocessing (i.e., different threads running
                  on different processors)
                --asynchronous disk I/O using the Posix interface
                  [because read() blocks and causes the *kernel*
                  scheduler to be invoked]

            --but many modern operating systems provide interfaces for
              asynchronous disk I/O, at least as an extension
                --Windows
                --Linux has AIO extensions

                --thus, even user-level threads can get asynchronous
                  disk I/O, by having the run-time translate calls that
                  *appear* blocking to the thread [e.g., thread_read()]
                  into a series of instructions that: register interest
                  in an I/O event, put the thread to sleep, and swtch()
                  to another thread

                --[moral of the story: if you find yourself needing
                  async disk I/O from user-level threads, use one of
                  the non-Posix interfaces!]

        Historical notes -- a classification:

                                    # address spaces:
                                    one                 many
            # threads per
            addr space:
            one                     MS-DOS,             traditional Unix
                                    Palm OS

            many                    Embedded systems,   VMS, Mach, NT,
                                    Pilot               Solaris, HP-UX, ...

            (Pilot was the OS on the first personal computer ever built
            -- the Alto. the idea was that there was no need for
            protection if there was only one user.)

D. The use of threads

[thanks to David Mazieres for content in portions of this lecture.]