Class 11
CS 439
19 February 2013

On the board
------------
1. Last time

2. scheduling
    --disciplines
    --lessons and conclusions

3. midterm review

---------------------------------------------------------------------------

1. Last time

    --mutexes and spinlocks

2. Scheduling intro

    A. [last time] Where do scheduling decisions happen?
    B. [done] What are metrics and criteria?
    C. Context switching costs
        --CPU time in kernel
            --save and restore registers
            --switch address spaces
        --indirect costs
            --TLB shootdowns, processor cache, OS caches (e.g., buffer
              caches)
        --result: more frequent context switches will lead to worse
          throughput (higher overhead)

3. Scheduling disciplines

    A. FCFS/FIFO
        --run each job until it's done
        --P1 needs 24 seconds
        --P2 needs 3 seconds
        --P3 needs 3 seconds
        --[ P1 P2 P3 ]
        --throughput: 3 jobs / 30 seconds = .1 jobs/sec
        --average turnaround time? 1/3*(24 + 27 + 30) = 27 seconds
        --observe: scheduling P2, P3, P1 would drastically reduce
          average turnaround time
        --advantages to FCFS:
            --simple
            --no starvation
            --few context switches
        --disadvantage:
            --short jobs get stuck behind long ones

        *** Larger issue: I/O vs. computation
        --jobs contain bursts of computation, then must wait for I/O
        --to maximize throughput:
            --must maximize CPU utilization *and*
            --must maximize I/O device utilization
        --how?
            --overlap I/O and computation from multiple jobs
            --means *response time* is very important for I/O-intensive
              jobs: an I/O device will be idle until some job gets a
              small bit of CPU to issue its next I/O request

        Most CPU bursts are small; a few are very long.

        What are the implications for FCFS?
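Before answering: the FCFS turnaround arithmetic from the example above can be checked with a few lines of code (a sketch; `fcfs_turnaround` is an illustrative name, and the burst times are the ones from the example):

```python
# FCFS: run jobs to completion in queue order. With all jobs arriving
# at time 0, a job's turnaround time is simply its completion time.
def fcfs_turnaround(burst_times):
    clock = 0
    turnarounds = []
    for burst in burst_times:
        clock += burst              # job runs to completion
        turnarounds.append(clock)
    return turnarounds

# Order from the example: P1 (24s), P2 (3s), P3 (3s)
t = fcfs_turnaround([24, 3, 3])
print(t, sum(t) / len(t))           # [24, 27, 30] -> average 27.0

# Running the short jobs first drastically reduces average turnaround:
t2 = fcfs_turnaround([3, 3, 24])
print(t2, sum(t2) / len(t2))        # [3, 6, 30] -> average 13.0
```

Throughput is unchanged by the reordering (still 3 jobs / 30 seconds); only the average turnaround improves.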
        CPU-bound jobs will hold the CPU until exit or I/O (but I/O is
        rare for a CPU-bound process)
            --long periods where no I/O requests are issued, and the
              CPU is held
            --Result: poor I/O device utilization

        Example: one CPU-bound job, many I/O-bound jobs
            --CPU-bound job runs (I/O devices idle)
            --CPU-bound job blocks
            --I/O-bound job(s) run, quickly block on I/O
            --CPU-bound job runs again
            --I/O completes
            --CPU-bound job continues while I/O devices idle

        Simple hack: run the process whose I/O just completed?
            --What is a potential problem? (Answer: when the CPU-bound
              job issues an I/O request, it gets the CPU again and then
              bursts for a while, leaving us back where we started.)

    ***

    B. Round-robin
        --add a timer
        --preempt the CPU from long-running jobs, once per time slice
          or quantum
        --after its time slice, a job goes to the back of the ready
          queue
        --most OSes do something of this flavor
        --JOS does something like this, as you saw in lab 4
        --advantages:
            --fair allocation of CPU across jobs
            --low average waiting time when job lengths vary
            --good for responsiveness if there is a small number of
              jobs
        --disadvantages:
            --what if jobs are the same length?
                --example: 2 jobs of 50 time units each, quantum of 1
                --average completion time: 100 (vs. 75 for FCFS)
            --how to choose the quantum size?
                --want it much larger than the context switch cost
                --the majority of bursts should be less than the
                  quantum
                --pick it too small, and we spend too much time context
                  switching
                --pick it too large, and response time suffers (extreme
                  case: the system reverts to FCFS)
        --a typical time slice is between 10 and 100 milliseconds;
          context switches usually cost microseconds or tens of
          microseconds (maybe hundreds)

    C.
    SJF (shortest job first)
        --STCF: shortest time to completion first
            --Schedule the job whose next CPU burst is the shortest
        --SRTCF: shortest remaining time to completion first
            --preemptive version of STCF: if a job arrives that has a
              shorter time to completion than the remaining time on the
              current job, immediately preempt the CPU and give it to
              the new job
        --idea:
            --get short jobs out of the system
            --big (positive) effect on short jobs, small (negative)
              effect on large jobs
            --result: minimizes waiting time (can prove this)
                --seeks to minimize average waiting time for a given
                  set of processes

        --example 1:

            process    arrival time    burst time
            P1         0               7
            P2         2               4
            P3         4               1
            P4         5               4

            preemptive (SRTCF):
            P1 P1 P2 P2 P3 P2 P2 P4 P4 P4 P4 P1 P1 P1 P1 P1

        --example 2: 3 jobs

            A, B: both CPU bound, each runs for a week
            C: I/O bound, loop
                1 ms of CPU
                10 ms of disk I/O

            by itself, C uses 90% of the disk
            by itself, A or B uses 100% of the CPU

            what happens if we use FIFO?
                --if A or B gets in, it keeps the CPU for 2 weeks

            what about RR with a 100 msec time slice?
                --only get 5% disk utilization

            what about RR with a 1 msec time slice?
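Example 1's preemptive schedule can be reproduced with a small SRTCF simulator (a sketch at one-time-unit granularity; `srtcf` is an illustrative name, and the arrival/burst times are the ones from example 1):

```python
# SRTCF: at each time unit, run the arrived, unfinished process with
# the shortest remaining time; a new arrival with a shorter remaining
# time preempts the current job at the next time unit.
def srtcf(procs):
    # procs: list of (name, arrival_time, burst_time)
    remaining = {name: burst for name, _, burst in procs}
    schedule = []
    clock = 0
    while any(r > 0 for r in remaining.values()):
        ready = [(remaining[n], n) for n, arr, _ in procs
                 if arr <= clock and remaining[n] > 0]
        if ready:
            _, n = min(ready)       # shortest remaining time wins
            remaining[n] -= 1
            schedule.append(n)
        clock += 1
    return schedule

procs = [("P1", 0, 7), ("P2", 2, 4), ("P3", 4, 1), ("P4", 5, 4)]
print(" ".join(srtcf(procs)))
# P1 P1 P2 P2 P3 P2 P2 P4 P4 P4 P4 P1 P1 P1 P1 P1
```

Tracing it: P1 runs until P2 arrives at t=2 with a shorter burst (4 < 5 remaining); P3 preempts P2 at t=4; P2 finishes before P4 (2 < 4 remaining); P1 runs last.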
                --get nearly 90% disk utilization
                --but lots of preemptions

            with SRTCF:
                --no needless preemptions
                --get high disk utilization

        --SRTCF advantages:
            --optimal response time (minimum waiting time)
            --low overhead
        --disadvantages:
            --not hard to get unfairness or starvation (long-running
              jobs)
            --does not optimize turnaround time (only waiting time)
            --** requires predicting the future **
                --so it's useful as a yardstick for measuring other
                  policies
                  (a good way to do CS design and development: figure
                  out what the absolute best answer is, then figure out
                  how to approximate it)
                --however, we can attempt to estimate the future based
                  on the past (another thing that people do when
                  designing systems):
                    --an exponentially weighted average is a good idea
                    --t_n: length of the process's nth CPU burst
                    --\tau_{n+1}: estimate for the (n+1)th burst
                    --choose \alpha, 0 < \alpha <= 1
                    --set \tau_{n+1} = \alpha * t_n + (1-\alpha) * \tau_n
                    --this is called an exponentially weighted moving
                      average (EWMA)
                    --reacts to changes, but smoothly

        upshot: favor jobs that have been using the CPU the least
        amount of time; that ought to approximate SRTCF

    D. Priority schemes

        --priority scheme: give every process a number (set by the
          administrator), and give the CPU to the process with the
          highest priority (which might be the lowest or highest
          number, depending on the scheme)
        --can be done preemptively or non-preemptively
        --generally a bad idea because of starvation of low-priority
          tasks
        --here's an extreme example:
            --say H is at high priority, L is at low priority, and L
              holds a lock that H wants
            --H tries to acquire the lock, fails, and spins
            --L never runs (so the lock is never released)
        --but note: SJF is priority scheduling where the priority is
          the predicted length of the next CPU burst
        --a solution to this starvation is to increase a process's
          priority as it waits

    E. lottery and stride scheduling

        [citation: C. A. Waldspurger and W. E. Weihl. Lottery
        Scheduling: Flexible Proportional-Share Resource Management.
        Proc. USENIX Symposium on Operating Systems Design and
        Implementation, November, 1994.
        http://www.usenix.org/publications/library/proceedings/osdi/full_papers/waldspurger.pdf]

        --lottery scheduling: issue lottery tickets to processes
            --Let p_i have t_i tickets
            --Let T be the total # of tickets, T = \sum t_i
            --Chance of winning the next quantum is t_i / T
            --Note lottery tickets are not used up; they're more like
              season tickets

        --controls the long-term average proportion of CPU for each
          process

        --can also group processes hierarchically for control
            --subdivide lottery tickets
            --can model tickets as currency, so there can be an
              exchange rate between real currencies (money) and lottery
              tickets

        --lots of nice features
            --deals with starvation (have one ticket --> will make
              progress)
            --don't have to worry that adding one high-priority job
              will starve all others
            --adding/deleting jobs affects all jobs proportionally (T
              gets bigger)
            --can transfer tickets between processes: highly useful if
              a client is waiting for a server. then the client can
              donate tickets to the server so it can run.
                --note the difference between donating tickets and
                  donating priority. with donated tickets, the
                  recipient amasses tickets until it runs. with donated
                  priority, there's no difference between one process
                  donating and 1000 processes donating.
            --many other details
                --ticket inflation for processes that don't use their
                  whole quantum
                    --a process that uses fraction f of its quantum has
                      its tickets inflated by 1/f until it next gets
                      the CPU

        --disadvantages
            --latency is unpredictable
            --expected error is somewhat high
                --for those comfortable with probability: the number of
                  quanta a process wins follows a binomial
                  distribution, with variance n*p*(1-p) --> standard
                  deviation proportional to \sqrt{n}, where:
                      p is the fraction of tickets the process owns
                      n is the number of quanta

        --in reaction to these disadvantages, Waldspurger and Weihl
          proposed *Stride Scheduling*

        [citations:

        C. A. Waldspurger and W. E. Weihl. Stride Scheduling:
        Deterministic Proportional-Share Resource Management. Technical
        Memorandum MIT/LCS/TM-528, MIT Laboratory for Computer Science,
        June 1995.
        http://www.psg.lcs.mit.edu/papers/stride-tm528.ps

        Carl A. Waldspurger.
        Lottery and Stride Scheduling: Flexible Proportional-Share
        Resource Management. Ph.D. dissertation, Massachusetts
        Institute of Technology, September 1995. Also appears as
        Technical Report MIT/LCS/TR-667.
        http://waldspurger.org/carl/papers/phd-mit-tr667.pdf]

        --the current Linux scheduler (post 2.6.23), called CFS
          (Completely Fair Scheduler), roughly reinvented these ideas
            --basically, a deterministic version of lottery scheduling.
              less randomness --> less expected error.

4. Scheduling lessons and conclusions

    --Scheduling comes up all over the place
        --m requests share n resources
        --disk arm: which read/write request to do next?
        --memory: which process to take a physical page from?

    --This topic was popular in the days of time sharing, when there
      was a shortage of resources all around, but many scheduling
      problems become uninteresting when you can just buy a faster CPU
      or a faster network.

        --Exception 1: web sites and large-scale networks often cannot
          be made fast enough to handle peak demand (flash crowds,
          attacks), so scheduling becomes important again. For example,
          we may want to prioritize paying customers, or address
          denial-of-service attacks.

        --Exception 2: some scheduling decisions have non-linear
          effects on overall system behavior, not just different
          performance for different users. For example, the livelock
          scenario, which we are discussing.

        --Exception 3: real-time systems:

            soft real time: miss a deadline and the CD or MPEG decode
            will skip

            hard real time: miss a deadline and the plane will crash

          Plus, at some level, every system with a human at the other
          end is a real-time system. If a Web server delays too long,
          the user gives up. So the ultimate effect of the system may
          in fact depend on scheduling!
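The core of the lottery scheduler from section 3E can be sketched in a few lines (a sketch, not code from the paper; `lottery_pick` and the ticket counts are made up for illustration):

```python
import random

# Lottery scheduling: each quantum, draw one ticket uniformly at
# random; a process holding t_i of T total tickets wins the quantum
# with probability t_i / T.
def lottery_pick(tickets, rng=random):
    # tickets: dict mapping process name -> ticket count
    total = sum(tickets.values())
    winner = rng.randrange(total)       # which ticket was drawn
    for proc, count in tickets.items():
        if winner < count:
            return proc
        winner -= count

# hypothetical allocation: A should get ~75% of the CPU, B ~25%
tickets = {"A": 75, "B": 25}
wins = {"A": 0, "B": 0}
for _ in range(10000):
    wins[lottery_pick(tickets)] += 1
print(wins)   # long-run shares approach 75% / 25%
```

The sqrt(n) expected error mentioned above shows up here: over n quanta, the win counts wander around the target proportions with standard deviation on the order of sqrt(n). Stride scheduling removes that randomness by picking deterministically, which is roughly what CFS does.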
    --In principle, scheduling decisions shouldn't affect a program's
      results
        --This is good, because it's rare to be able to calculate the
          best schedule
        --So instead, we build the kernel so that it's correct to do a
          context switch and restore at any time, and then *any*
          schedule will get the right answer for the program
        --This is a case of a concept that comes up a fair bit in
          computer systems: the policy/mechanism split. In this case,
          the idea is that the *mechanism* allows the OS to switch at
          any time, while the *policy* determines when to switch in
          order to meet whatever goals are desired by the scheduling
          designer

          [[--In my view, the notion of "policy/mechanism split" is way
          overused in computer systems, for two reasons:

            --when someone says they separated policy from mechanism in
              some system, usually what's going on is that they
              separated the hard problem from the easy problem and
              solved the easy problem; or

            --it's simply not the case that the two are separate.
              *every* mechanism encodes a range of possible policies,
              and by the choice of mechanism you are usually
              constraining what policies are possible. That point is
              obvious but tends to be overlooked when people advertise
              that they've "fully separated policy from mechanism".]]

    --But there are cases when the schedule *can* affect correctness
        --multimedia: delay too long, and the result looks or sounds
          wrong
        --Web server: delay too long, and users give up

    --Three lessons (besides the policy/mechanism split):

        (i) Know your goals; write them down

        (ii) Compare against optimal, even if optimal can't be built.
            --It's a useful benchmark. Don't waste your time improving
              something if it's already at 99% of optimal.
            --Provides helpful insight. (For example, we know from the
              fact that SJF is optimal that it's impossible to be both
              optimal and fair, so don't spend time looking for an
              optimal algorithm that is also fair.)

        (iii) There are actually many different schedulers in the
        system, and they interact:
            --mutexes, etc.
              are implicitly making scheduling decisions
            --interrupts: likewise (by invoking handlers)
            --disk: the disk scheduler doesn't know to favor one
              process's I/O above another's
            --network: same thing: how does the network code know which
              process's packets to favor? (it doesn't)
            --example of multiple interacting schedulers: you can
              optimize the CPU's scheduler and still find it does
              nothing (e.g., if you're getting interrupted 200,000
              times per second, only the interrupt handler is going to
              get the CPU, so you need to solve that problem before you
              worry about how the main CPU scheduler allocates the CPU
              to jobs)

    --Basically, the _existence_ of interrupts is bad for scheduling
      (also true in life)

---------------------------------------------------------------------------

pre-midterm office hours 4:30-6:00

---------------------------------------------------------------------------

5. Midterm review

Ground rules

    --120 minute exam
    --at 110 minutes, you have to stay seated; do not get up and
      distract your classmates.
    --you must hand your exam to me (we are not going to collect them).
      the purpose of this is to give everyone the same amount of time.
    --at 123 minutes, I will walk out of the room and won't accept any
      exams after I leave
    --thus you must hand in your exam at time x minutes, where:
          x <= 110 OR 120 <= x < 123
    --bring ONE two-sided sheet of notes; formatting requirements are
      listed on the Web page

Material

    --Readings (see the course Web page, the column called "Reading
      assignment")
    --Labs
    --Homeworks
    --Lectures

    --Operating systems: what are they?
        --goals, purpose
        --examples and history

    --privileged vs. unprivileged mode

    --user-level / kernel interaction: how does the kernel get invoked?
        --by user programs (system calls)
        --by hardware interrupts

    --processes
        --what are they? (registers, address space, etc.)
        --how do they get created? fork()/exec()
        --context switches (when? how?)
        --process states
        --system calls
        --shell

    --threads
        --why threads?
        --use of threads
        --user-level vs.
      kernel
        --how implemented? swtch(), separate registers, separate stacks
          (which applies to both user-level and kernel threads)
        --blocking vs. non-blocking I/O

    --concurrency
        --hard to deal with; abstractions help us, but not completely
        --critical sections
        --mutexes
        --spinlocks
        --condition variables
        --monitors
        --lots of things can go wrong: safety problems, liveness
          problems, etc.
        --What's the plan for dealing with these problems?
            --safety problems: build concurrency primitives that get
              help from the hardware (atomic instructions, turning off
              interrupts, etc.), and move up to higher-level
              abstractions that are easy to program with
            --liveness problems: the most common is deadlock, and we
              discussed strategies for avoiding it. other problems too:
              starvation, priority inversion, etc.

    --lots of trade-offs and design decisions
        --performance vs. complexity

    --lots of "advice". some is literally advice; some is actually
      required practice in this class.

    --software safety (Therac-25)

    --PC architecture, x86 instructions, gcc calling conventions

    --PC emulation

    --scheduling
        --whatever we did last class and this one

--Now, questions from you all......