Operating Systems Design
1996-97 Fall
Allan Gottlieb
Text: Silberschatz and Galvin

------------------- Some administrivia --------------

All registered students are entitled to a sun acct (wwh 405)
Reaching me
    Office hours TT 4-5
    email gottlieb@nyu.edu
    x8-3344
    715 bway, room 1001
Midterm & Final & Labs & HW
    Lab != HW
    Describe both
    Labs can be run on a home machine ... but you are responsible
Upper left board for announcements ... but you are responsible.
Handouts are low tech, not great (a feature!)

------------- End of administrivia ----------------

Layers of abstraction are used to hide details
    Applications and utilities
    Libraries
    OS proper (kernel)
    Hardware
All but the applications are system software, sometimes called the
operating system.
The OS raises the level of abstraction and hides details, like
devices, from its users.
The OS, i.e. the kernel, is itself layered
    Device (machine) independent code
    Device drivers
    Machine-specific code
Some kernels are more extensively layered, with some abstractions
implemented in terms of others.  The filesystem may be on top of
other abstractions.
The OS is a resource manager (so users don't conflict).
The OS is a control program (similar).
OS goals: convenience to the user; efficiency.

How is an OS different from, say, a compiler?  Concurrency!

    "The main difficulty of multiprogramming is that concurrent
    activities can interact in a time-dependent manner, which makes
    it practically impossible to locate programming errors by
    systematic testing.  Perhaps, more than anything else, this
    explains the difficulty of making operating systems reliable."
        --Per Brinch Hansen, *Operating Systems Principles*, 1973

Show failure.
    Interrupt handler puts a request on a list.  Mainline OS code
    removes it.
        #items++
        #items--
    (A C sketch of this failure appears in the lecture 2 notes, below.)

Homework: Fix the above failure

History (VERY brief)
    Single user (no OS)
        Mention Bendix G15 and IBM 1620 experiences
        paper tape, cards, delete key, scotch tape, etc.
    Batch: uniprogrammed, run-to-completion
        Resident monitor
        Online devices
        "Concurrent" I/O: pipelined parallelism
            Offline pre- and post-processing (tapes)
            Spooling (disk)
    Multiprogramming
        Overlap the CPU and I/O of a single job
        Multiple batches
        CPU scheduling (which ready job to run)
        MFT vs MVT
            The latter is potentially more efficient but brings in
            serious memory management questions.
    Time sharing
        PREEMPTIVE job scheduling
        Often uses (on-)demand paging (aka virtual memory, but not by me)
    PCs and workstations
        Commodity OS
        Strong push for distributed systems and networked systems
    Multiprocessor OS
        Master/slave
        Symmetric
    Distributed systems
        Resource sharing
        Communication
        Speedup (not so clear)
        Reliability (not so clear)
        "A distributed system is one in which a computer I've never
        seen can stop me from getting my work done."  --loosely
        quoted from somewhere
    Real-time systems
        Soft vs hard -- in the latter, missing a deadline is fatal,
        perhaps literally

Homework: 1.1, 1.3, 1.5, 1.8, 1.11, "why are dist sys not desirable?"

End of Ch 1.

==================== End of Lecture 1 ===========================

==================== Start of Lecture 2 ===========================

Homework 2.1

Eine Kleine Hardware (with apologies to Mozart)
    (assumes "hardware" is feminine ... otherwise Ein Kleiner or
    Ein Kleines)

Figure 1 from the book
    Correction
    Expansion

Hardware reset
    Get part of the system into a known state and transfer cntl to a
    known addr
Bootstrap program/loader
    Get the OS into the system
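Recall the failure from lecture 1 (#items++ vs #items--).  A minimal
C sketch; as an assumption for the demo, two POSIX threads stand in
for the interrupt handler and the mainline code (the lecture's
setting is one CPU plus interrupts, but the lost update is the same):

    /* Two threads share n_items without synchronization.
     * Compile with -pthread. */
    #include <pthread.h>
    #include <stdio.h>

    static volatile long n_items = 0;  /* length of the request list */

    static void *handler(void *arg)    /* "interrupt handler": adds */
    {
        for (long i = 0; i < 1000000; i++)
            n_items++;                 /* really: load, add 1, store */
        return NULL;
    }

    static void *mainline(void *arg)   /* "mainline OS code": removes */
    {
        for (long i = 0; i < 1000000; i++)
            n_items--;                 /* really: load, sub 1, store */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, handler, NULL);
        pthread_create(&t2, NULL, mainline, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Should print 0; almost always prints something else,
         * because load/modify/store can be interleaved between the
         * load and the store. */
        printf("n_items = %ld\n", n_items);
        return 0;
    }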
Interrupts
    Hardware (from controllers)
    Software (system calls; become traps)
    On an interrupt, control is transferred to a known location.
    There can be many different kinds of interrupts, distinguished by
        the address transferred to (the interrupt vector contains the
        addrs)
        some state (e.g. a specific register, in the case of a trap)
    Polling: asking the devices instead of being interrupted
    Show asynchrony with interrupts.
        An unsolicited subroutine call (especially hw interrupts).
    Disabling interrupts (the solution to the first homework problem)
    An interrupt-driven OS sits in an idle loop waiting for an
    interrupt.

I/O
    Synchronous gives favorable semantics but bad perf; async the
    reverse.
    Normal is for the OS to use async and give the user sync.
    Implementing OS I/O (i.e. real I/O)
        Want to be async, i.e. many I/Os + computation active
        Can poll the device
        Can use interrupts
            CPU gets data from the controller
            one interrupt per buffer load
        Can use DMA to avoid many interrupts and lighten the bus load
    Implementing user I/O (sync)
        Block the user process
        Record the status in the device status table
        Field the interrupt from the device
        Make the user process ready

Storage
    Model is program + data in "main memory"
        Data accessed by loads and stores (and memory operands)
        Instructions fetched implicitly
        Doesn't all fit
        Volatile
        Secondary storage; on-demand fetching
    Main memory
        Roughly 10MB-1GB
        Program and data reside here when accessed (at least are
        addressed here)
        I/O controllers have memory
            special instructions
            memory-mapped I/O
    Disks
        sector, track, cylinder, platter, head, arm, arm assembly,
        fixed-head disk
        transfer rate 10-40 MB/s from the cache; less from the platter
        100MB-10GB
        heads fly, except on floppies
        amazing mechanical devices (1000 cylinders)
        the disk has electronics to raise the level of abstraction,
        and caches
        the disk controller (raises the level of abstraction) and
        caches
        drum (good old days)
        optical (CDROM; DVD; writable; write-once)
    Tapes
    Hierarchy
        Fast, large, cheap -- pick 2 (maybe 1)
        Migration
        Caching
            Consistency
            Especially for MP systems
    DMA
        The I/O controller is like another processor
        Homework 2.10

Hardware protection
    Users from each other
    Users from themselves
    User/supervisor mode
        I/O is supervisor
        a user traps to supervisor mode, but not to the user's own code
    Homework 2.3, 2.4, 2.6, 2.8 (just one difficulty), 2.9

==================== End Lecture 2 =============================

==================== Start Lecture 3 ===========================

NOTE: In Prob 2.1, what is called buffering is normally called double
buffering.

Memory protection
    Base/limit is simple but not general (contiguous)
CPU protection
    I.e. prevent a user from keeping control of the CPU forever
    E.g. timesharing: each process gets a time slice
    Timer interrupt
    Context switch
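A minimal sketch of the base/limit check above, assuming the MMU
compares every logical address against the limit register and then
adds the base register (register values invented for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    static unsigned long base  = 0x40000;  /* where the job was loaded */
    static unsigned long limit = 0x10000;  /* size of the job's memory */

    /* What the hardware does on every memory reference. */
    static unsigned long translate(unsigned long logical)
    {
        if (logical >= limit) {
            fprintf(stderr, "protection trap: addr 0x%lx\n", logical);
            exit(1);                   /* really: trap to the OS */
        }
        return base + logical;         /* physical address */
    }

    int main(void)
    {
        printf("0x100 -> 0x%lx\n", translate(0x100));  /* fine */
        translate(0x20000);                            /* traps */
        return 0;
    }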
How a syscall happens
    User has scanf() in C
    scanf invoked
        Bunch of stuff for formatting
        Buffering
        Calls read in libc, which is OS and machine dependent
            Set up regs; say r0 has the syscall #
            Do the poof (trap into the kernel)
                Case stmt based on r0
                If not in the buffer cache (assumed below)
                    schedule the I/O
                    mark the user process not ready
                    call the scheduler
                (perhaps none of this, if the data is in the buffer
                cache)
                when the I/O completes, mark the user process ready
                and call the scheduler
            move results from regs (OS dep) to where they belong
            (lang dep)
        return to scanf in stdio
    return to the user's C program

Chapter 3

System components
    Process management
        creation and deletion of processes
        suspension and resumption of procs
        block/unblock procs
        Synchronization / coordination
        Communication (IPC)
        Deadlock
        HOMEWORK 3.1
    Memory management (main-memory management)
        knowing what mem is assigned to which process
        Memory allocation
        Suspending processes for insufficient mem, and resuming them
        HOMEWORK 3.2
    Secondary storage management
        Free space management
        storage alloc
        disk scheduling
        HOMEWORK 3.3
    I/O management
        buffer cache
        generic device driver interface
        device drivers
    File management
        create/delete files and directories
        manipulating files (e.g. permissions, ownership, quotas)
        storage of files
        backup (is this part of the OS?)
        HOMEWORK 3.4
    Protection
    Networking
    Command interp (shell)
        Part of the OS?

OS services
    program execution
    I/O
    filesystem
    communication
    resource allocation
    acct
    protection
        "controlled sharing"

System calls
    The OS interface
    HOMEWORK 3.7
    Process management
        fork/exec/exit/wait
            shell loop (a C sketch appears below, at the start of
            chapter 4)
        send and catch signals
    Memory mgt
        alloc/free
        book calls it part of process control
    Files and filesystems (in unix this includes devices, /dev)
        open/close
        read/write/reposition (seek)
        get/set attr
        mount
    Information
        get/set day/date
        get/set attr (e.g., creation date)
    Communication
        establish/tear down connection
        send/receive

System programs
    Many are UNPRIVILEGED
        remove a file, editor, list directory
        compilers
    Utility programs
    Could be part of the command interpreter

================== End of Lecture 3 ========================

================== Start of Lecture 4 ======================

NOTE: These class notes are now on the web
http://allan.ultra.nyu.edu/gottlieb/os/class-notes

HOMEWORK 3.8

System structure
    As we said before, layering is used to raise the level of
    abstraction.  Book gives examples.
    MS-DOS and (orig) unix are simple
    Virtual machines
        Give each process the illusion it has an entire machine
        Supply devices not present (assuming you program at a low
        level)
            Ask for a specific device, get a virtual one
        Useful for debugging operating systems
            New OS runs in virtual supervisor mode
        Emulate one system on top of another
            intel on alpha
            win 3.1 on win 95

System design goals
    At the highest level the goals are vague and like motherhood:
    fast, robust, simple, cheap, correct, convenient, ...
    At a more detailed level, you get hard tradeoffs

Mechanisms and policy
    how vs what
    policy: users cannot delete the OS
        mechanism 1: file permissions
        mechanism 2: ROM
    a micro-kernel provides (low-level) mechanism;
    higher-level processes implement policy
    HOMEWORK 3.11

Nowadays mostly written in "high-level" languages

System generation
    Configuration
    Bootstrap loading

---------- Part II, i.e. the "real course" ------

Chapter 4 Process Management

Process is an active entity.  A program in execution.
    Terminology not standard (sadly).
    Will try to use process = task + thread(s)
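The fork/exec/wait trio from the chapter 3 system-call list combines
into the classic shell loop.  A minimal sketch (no arguments, pipes,
or job control; assumes a Unix system):

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        char cmd[256];

        for (;;) {
            printf("$ ");
            fflush(stdout);
            if (fgets(cmd, sizeof cmd, stdin) == NULL)
                break;                        /* EOF ends the shell */
            cmd[strcspn(cmd, "\n")] = '\0';   /* strip the newline */
            if (fork() == 0) {                /* child */
                execlp(cmd, cmd, (char *)NULL);
                perror(cmd);                  /* exec returns only on error */
                _exit(1);
            }
            wait(NULL);                       /* parent waits for child */
        }
        return 0;
    }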
Standard picture with running/ready/waiting (blocked)
    start / terminate
    suspended (medium-term sched)

Process control block (PCB)
    The active entity, now viewed as passive data
    When the process moves from blocked on a device to ready, the PCB
    moves from the corresponding device Q to the ready Q.
    accting/sched/mem info
    why the process is blocked
    process state (registers, condition codes, etc.)

Process switch, also called context switch
    Big-time state change: user --> kernel --> user
    In the kernel:
        save the state of the old process in its PCB
        decide on the new process to run
            this is called process scheduling
            really short-term scheduling (of processes on the ready Q)
            Also have medium-term scheduling (degree of
            multiprogramming)
            Also have long-term job scheduling
            we will study scheduling policies soon
        restore state from the new process's PCB

HOMEWORK 4.1, 4.2

Process creation
    We did the unix fork/exec/wait/exit
    Parent can wait or keep executing (& in the shell)
    VMS has "spawn" (fork+exec): the new process is not a copy of the
    old
    Win/NT has both
    Get a tree of processes

Termination
    What to do if the parent terminates before the child?
    Book is wrong about unix: init inherits the children.

Process cooperation (coordination)
    A big topic that is one of my very favorites.
    Trivial example: bounded buffer with ONE producer and ONE consumer
    This is shared-memory communication
        shared in, out, buffer[n]

        loop                -- producer
            produce an item
            while (in+1 mod n) = out do nothing;
            buffer[in] = item;
            in = in+1 mod n;

        loop                -- consumer
            while in = out do nothing;
            item = buffer[out];
            out = out+1 mod n;
            consume the item

    Fails for multiple consumers or producers (in = in+1 mod n is not
    atomic)
    Only permits n-1 items in buffer[n].  So useless for n=1.

================== End of Lecture 4 ========================

================== Start of Lecture 5 ======================

Threads
    Lightweight process
    Unit of execution
        stack, registers, etc.
    SHARES the data space with the other threads in its TASK
    SHARES OS state like open files, signals
        Plan 9 is more flexible; might not share these things.
    An (ordinary, heavyweight) process is task + thread
        But can have many threads inside a task
    Faster to context switch to a peer thread than to another process,
    but it is still too slow for some cases
        Do not have to change the memory map
        But do have to change the protection domain twice:
        user --> kernel --> user
    User-mode threads are faster still
    Can have both kernel and user-mode threads
        Want multiple kernel-aware threads, since many syscalls can
        block
    HOMEWORK 4.5, 4.6, 4.7

Interprocess communication (IPC)
    Message passing
        Send and receive
        Can have sender and receiver name the other process
        Can have sender name the receiver, and the receiver be told
        who sent
        Can use a mailbox
    Buffering (sync vs async)
        Zero buffering (an important case)
            Synchronous semantics
            Rendezvous
            Unbuffered
        (Nonzero) buffering
            Send does not block unless capacity is exceeded
            Async semantics
            What to do if capacity is exceeded?
                Block the send
                pretend infinite (throw some away)
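A minimal sketch of these choices as Unix realizes them in a pipe,
which gives (nonzero) buffering: the kernel blocks a sender when the
pipe is full and a receiver when it is empty:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2];
        pipe(fd);                      /* fd[0] = read, fd[1] = write */

        if (fork() == 0) {             /* child: the receiver */
            char msg[32];
            close(fd[1]);
            ssize_t n = read(fd[0], msg, sizeof msg); /* blocks if empty */
            printf("received %zd bytes: %s\n", n, msg);
            _exit(0);
        }
        close(fd[0]);                  /* parent: the sender */
        write(fd[1], "a message", 10); /* would block if the pipe filled */
        close(fd[1]);
        wait(NULL);
        return 0;
    }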
            What to do if a receive is issued and there is no msg?
                Block the receive
                Have receive return a failure indication
        Permit multiple recipients/mailboxes to be specified
    Lost/garbled msgs
        Perhaps OK for the OS to permit this (end-to-end checking)
        Can detect with acks and timeouts

---------------- The details begin ----------------

Chapter 5 CPU Scheduling

Multiprogramming is used to permit CPU / I/O overlap.

Short-term CPU scheduler (aka the scheduler)
    Ready Q (might not be FIFO)
    Items on the Q are PCBs

Preemptive vs non-preemptive
    Without preemption, a large cpu job can delay every job
    Sometimes preemption is not safe
        We saw this before (shared variable)
    Another question is whether the OS itself is preemptable
        That is, can a syscall be interrupted?
        What if the OS is working on a process and a higher-prio proc
        becomes ready?
        Most OS's disable interrupts for some periods (hopefully short
        and bounded)
    HOMEWORK 5.2

The scheduler DECIDES which process to switch to.  It implements
policy.
HOMEWORK 5.1
The dispatcher does the switching.  It implements a mechanism.

Scheduling criteria
    Contradictory
    CPU utilization
        Really not the right criterion, but often used
        It is good for telling you whether you are CPU limited.
    Throughput
        Jobs per hour
        Need to fix a job mix (workload) to compare sched policies
    Turnaround time
        Not the same as throughput
        Consider running short jobs first
    Waiting time (in the ready Q)
        Poor criterion to use as a figure of merit.
        The system is good if the wait is small; it does not matter
        where you wait.
    Response time
        Time to FIRST response, not end of output
        Predictability is good (DTSS)
            Low variance is good
        Max delay vs avg delay
            real time

Algorithms
    FCFS--First Come First Served
        Sometimes considered to be no scheduling
        The simplest
        Non-preemptive
    SJF--Shortest Job First
        Aka SPN--Shortest Process Next
        Really means shortest "next CPU burst" first
        Consider the case where each job has only one burst
            SJF has minimum waiting time
            If we also assume that output STARTS right after the
            burst, then it has the min response time as well.
        When you permit many bursts/process this is still true, but
        the words are harder
        Difficulty is knowing the "next CPU burst"
            Can approx (guess) that it is the last burst
            Can use a weighted avg of the last n bursts
        Normally considered non-preemptive (but not always)
            The preemptive variant lets new processes or newly
            unblocked processes preempt the current process
                An "old" process cannot be eligible to preempt the
                current one; why?
            Called PSJF or PSPN or SRTF (shortest remaining time first)
        Can starve jobs with long bursts
        (A small FCFS-vs-SJF comparison in C appears below, after
        round robin.)
    Priority scheduling
        Generalization of SJF
        Preemptive or non-preemptive; again depends on newly arriving
        or unblocked processes
        again has potential starvation
            priority aging
        internal or external (or both) priorities

================ End Lecture 5 ================

================ Start Lecture 6 ================

ANNOUNCEMENTS
    Last week of Oct: office hours 3-5 WEDNESDAY!!  Just that one week.
    The registrar's list of students is on the www.
    I put a version of the class notes broken up by lecture on the
    www.  ONLY THE FULL VERSION WILL BE UPDATED.

Round Robin (RR)
    Quantum
    Timer interrupt
    Preemptive version of FCFS
    In the limit (as quantum --> 0) becomes processor sharing (PS)
    How big to make the quantum?
        Shorter makes it more responsive
        Longer is more efficient, since less switching
    State-dependent RR
        RR, but the quantum depends on the load
    HOMEWORK 5.3, 5.4
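Referring back to SJF's minimum-waiting-time claim: a minimal sketch
assuming all jobs arrive at time 0, each with a single burst (burst
lengths invented for illustration).  Sorting the bursts, i.e. running
the shortest first, minimizes the average wait:

    #include <stdio.h>
    #include <stdlib.h>

    static double avg_wait(const int *burst, int n)
    {
        double total = 0;
        int clock = 0;
        for (int i = 0; i < n; i++) {
            total += clock;          /* job i waited until now */
            clock += burst[i];
        }
        return total / n;
    }

    static int cmp(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    int main(void)
    {
        int jobs[] = { 24, 3, 3 };   /* arrival order */
        int n = 3;
        printf("FCFS: %.2f\n", avg_wait(jobs, n)); /* (0+24+27)/3 = 17 */
        qsort(jobs, n, sizeof jobs[0], cmp);       /* SJF: shortest first */
        printf("SJF:  %.2f\n", avg_wait(jobs, n)); /* (0+3+6)/3 = 3 */
        return 0;
    }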
    Selfish Round Robin (SRR)
        "Accepted" procs run RR
        A new proc enters at prio 0 (low) and waits until its prio
        reaches that of the accepted procs.
        New procs have their prio increase at rate a >= 0
        Accepted procs have their prio increase at rate b >= 0
        All accepted procs have the same prio
        b >= a ==> FCFS
        b = 0  ==> RR
        a > b > 0 is interesting
    HOMEWORK 5.7 (remind me to do it next week)

    Highest Penalty Ratio Next (HPRN)
        Priority = (time in system) / (running time)
        Highest priority goes first
        Another example of priority scheduling
        Aka Highest Response Ratio Next (HRN)
        Non-preemptive
            To make it preemptive, you do not have to worry about new
            jobs (undefined ratio), but you do have to compute when
            the current process will no longer be highest priority,
            and set a timer.

    Multiple queues
        For example, batch vs interactive, or large mem vs small, or
        paying customers vs courtesy users
        Treat each queue independently, and then have some
        higher-level procedure arbitrate between the queues.
            For example, batch jobs get big quanta, but rarely
        Processes stay in the same queue

    Multilevel feedback queues
        Several queues, with a different scheduling policy on each
        The idea is to get something like SJF by first going to the
        high-level queue.
        Also have rare big quanta for batch jobs, as with multiple
        queues, but here we determine the "batch" jobs dynamically
        (internally).
        When does a process move to a lower queue?
            Say, when it uses up a full quantum.
        When does it move to a higher queue?
            Say, when it didn't use its quantum.
        The worst queue is often FCFS
            This can starve jobs in the lowest queue
    HOMEWORK 5.6

Scheduling multiple processors
    In the most tightly-coupled case, SMP (symmetric multiprocessors),
    can have ONE ready queue with all processors accessing it.
        For large numbers of processors you can get serious contention
        problems, unless you have something fancy (e.g. the NYU
        Ultracomputer).
    Sometimes some processes need to be scheduled together, else one
    spin-waits for the other (unscheduled) process.
        Gang scheduling
    Now the OS is a parallel program.
    Can also restrict the OS to one processor and have it schedule the
    others.  Called a master-slave OS.  A better name would be
    master-slaves.
    When the processors are not the same, you cannot really have one
    ready queue.
    Sometimes want jobs to stay on the same processor, even when the
    processors are identical (cache effects).
    Sometimes have networks of autonomous machines, each with its own
    scheduler (indeed its own OS).  Then they clearly can't share a
    ready queue.

================ End Lecture 6 ================

================ Start Lecture 7 ================

Do 5.7, page 160.

Midterm will cover through chapter 6.  Really section 6.7, as we are
skipping the rest of chapter 6.

Real-time scheduling
    Have DEADLINES
    Hard realtime vs soft realtime
        For hard realtime, normally reserve resources; for example,
        determine in advance when and for how long each process will
        run.
        Want to bound delays.  So put preemption points in long
        syscalls.
        For soft realtime, have serious external priorities.
            Priority inversion.  A > B > C.  B running.  C holds a
            resource needed by A.
            Soln: temporarily give C the priority of A until it
            releases the resource.

Evaluating algorithms
    Fixed workload (how do you choose it?)
    Analytical modeling (queueing theory)
        Arrival rates, service times
        Little's formula
    Simulation
        Not deterministic, in that pseudo-random numbers are used,
        say for cpu bursts, I/O waits, etc.

HOMEWORK 5.8, 5.9, 5.10
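Since Little's formula is cited without being stated, a quick worked
form (numbers invented for illustration):

    Little's formula: n = lambda * W
        n      = avg number of jobs in the system (or in the queue)
        lambda = arrival rate
        W      = avg time a job spends in the system (or queue)
    E.g., if jobs arrive at lambda = 10 jobs/sec and each spends
    W = 2 sec in the system on average, then on average
    n = 10 * 2 = 20 jobs are in the system.  It holds in steady state
    for any scheduling discipline.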
---------------- End of Chapter 5 ----------------

---------- Start of Chapter 6: Process Synchronization ------

Error with concurrently incrementing and decrementing a shared
variable (recall lecture 1).

Critical section problem
    loop
        Entry section
        Critical section
        Exit section
        Remainder section (non-critical section, NCS)
    A solution requires
        Mutual exclusion for the critical sections
        Progress
            Weaker than the book's:
            if no process is in the CS and >0 processes are in the
            entry section, eventually one process will be in the CS.
            Book adds that processes in the NCS cannot affect the
            decision of which process gets into the critical section.
            This is false for the book's own soln using test-and-set.
    Desirable properties
        Some kind of fairness
        Book actually REQUIRES bounded waiting, i.e. after process P
        enters the entry section, it will enter the critical section
        after a bounded number of executions of the critical section
        by other processes.
        Real nice is linear waiting: the bound above is just a
        multiple of the number of processes.
        Even better is if the multiple is 1.
        Even better still is FCFS.

Solutions to the critical section problem (some wrong)
    Soln 1 for 2 processes, i and j; code is for i, code for j
    analogous
        loop
            while turn != i
            CS
            turn <- j
            NCS
        WRONG: doesn't satisfy bounded waiting; requires strict
        alternation
    Soln 2 for 2 processes
        loop
            flag[i] <- true
            while flag[j]
            CS
            flag[i] <- false     (the notes had flag[j]; i clears its
                                  own flag)
            NCS
        WRONG: lock step deadlocks
    Soln 3 for 2 processes
        loop
            while flag[j]
            flag[i] <- true
            CS
            flag[i] <- false
            NCS
        WRONG: lock step lets both in
    Soln 4 for 2 processes (this is Peterson's algorithm)
        loop
            flag[i] <- true
            turn <- j
            while (flag[j] & turn=j)
            CS
            flag[i] <- false
            NCS
        This one is correct and clever.  A sensation: the previous
        correct solns were MUCH harder.
        The proof of mutual exclusion in the book is incomplete.
        Here is a different (and complete) proof.
            Write on the board two versions of the code (for procs 0
            and 1).  Label the while loops A and B.
            Assume both are in the CS.
            Assume turn is 1 (turn=0 is similar).
            So turn=1 when 0 was at A
            So flag[1]=false @ A
            So when 0 was at A, 1 was in the NCS
            So 1 cannot get into the CS  ==><==

Bakery algorithm for N processes
    A normal (real) bakery uses fetch-and-add (the ticket machine);
    we fake it with a (non-atomic) max.
    (a,b) < (c,d) means lexicographically less (not atomic)
    loop
        choosing[i] <- true
        number[i] <- max(number[0],...,number[n-1]) + 1
        choosing[i] <- false
        for j = 0 to n-1
            while choosing[j]
            while number[j]!=0 & (number[j],j)<(number[i],i)
        CS
        number[i] <- 0
        NCS
    Mutual exclusion
        Assume A is in the CS when B wants to enter.
        Book correctly claims B will find A smaller and wait, but why
        can't B think B < A?
            When A checked, A < B.  A hasn't changed since then.
            If B chooses after A checked, the only trouble would be if
            B got smaller (A stayed the same).  But it can't (look at
            the code).
            Similarly, if B checked after A checked, B stays the same
            and A can't get smaller.
    Easy to see it's fair (A can't get out and back in while B is
    waiting).

Hardware assist
    Test-and-set(x)
        oldx <- x
        x <- true
        return oldx
    loop
        while TAS(lock)
        CS
        lock <- false
        NCS
    This does NOT satisfy the condition that threads in the NCS can't
    affect who gets in: a thread in the NCS can finish its NCS,
    contend again, and win.
    swap(a,b)
        olda <- a
        a <- b
        b <- olda
    loop
        key <- true
        while key
            swap(key,lock)
        CS
        lock <- false
        NCS
    Skip: bounded-waiting mutual exclusion with test-and-set

---------------- Semaphores ----------------

An integer variable that is initialized and then accessed only by
wait and signal, which are ATOMIC!
    wait(S)
        while S <= 0
        S <- S-1    (S-- in C)
    signal(S)
        S++
    loop
        wait(mutex)
        CS
        signal(mutex)
        NCS
    TO IMPLEMENT THE ABOVE DEF OF WAIT AND SIGNAL YOU NEED MUTUAL
    EXCLUSION
    HOMEWORK 6.6
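A minimal sketch of the TAS loop above in C, assuming GCC's
__sync_lock_test_and_set / __sync_lock_release builtins as the
hardware test-and-set (compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    static volatile int lock = 0;   /* 0 = free, 1 = held */
    static long counter = 0;

    static void acquire(volatile int *l)
    {
        while (__sync_lock_test_and_set(l, 1)) /* returns old value */
            ;                                  /* spin while held */
    }

    static void release(volatile int *l)
    {
        __sync_lock_release(l);                /* atomically store 0 */
    }

    static void *worker(void *arg)
    {
        for (int i = 0; i < 1000000; i++) {
            acquire(&lock);
            counter++;                         /* the critical section */
            release(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld (expect 2000000)\n", counter);
        return 0;
    }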
The above is a busy-waiting definition.  A busy-waiting mutual
exclusion lock is called a spinlock.

Now do the process-switching (aka blocking) semaphore
    wait(S)        ***** this must be atomic
        S.value--
        if (S.value < 0)
            add this proc to S.L
            block
    signal(S)      ***** also atomic
        S.value++
        if (S.value <= 0)
            remove a process P from S.L
            wakeup P

================ End Lecture 7 ================

================ Start Lecture 8 ================

NOTES FOR LAB 1:

1. If several processes are waiting on I/O, you may assume
   noninterference.  For example, assume that on cycle 100 process A
   flips a coin and decides its wait is 6 units, and on the next
   cycle (101) process B flips a coin and decides its wait is 3
   units.  You do NOT have to alter process A.  That is, process A
   will become ready after cycle 106 (100+6), so it enters the ready
   list on cycle 107; process B becomes ready after cycle 104 (101+3)
   and enters the ready list on cycle 105.

2. PS (processor sharing).  Every cycle you see how many jobs are in
   the ready Q.  Say there are 7.  Then during this cycle (an
   exception is described below) each process gets 1/7 of a cycle.
   EXCEPTION: Assume there are exactly 2 jobs in the RQ, one needing
   1/3 cycle and one needing 1/2 cycle.  The process needing only 1/3
   gets only 1/3, i.e. it is finished after 2/3 of a cycle.  So the
   other process gets 1/3 cycle during the first 2/3 cycle and then
   starts to get all the cpu.  Hence it finishes after 2/3 + 1/6 =
   5/6 cycle.  The last 1/6 of the cycle is not used by any process.
   This shows that PS is not so easy to simulate (though it is the
   easiest to analyze).  For this reason PS has been moved from
   required to extra credit.

How to get atomicity
    For a uniprocessor, disable interrupts
    For a multiprocessor, harder

Deadlock
    No thread can make progress.
    Simple example: using two semaphores in opposite orders in
    different threads.
    Different from livelock and starvation
    Subject of chapter 7

Binary vs counting semaphores
    What we did were counting semaphores.
    The special case where the semaphore can take on only two values
    is called a binary semaphore.
    The code in the book for implementing a counting semaphore using
    just binary semaphores is clearly wrong, since it has signal(S2)
    but no wait(S2).  I guess the signal(S1) on p. 180 should be
    wait(S2).  I prefer a different algorithm, but we will skip it.
    Easy trick with two BINARY semaphores to get alternation.

Bounded buffer problem (aka producer-consumer)
    When the two-semaphore trick is used with COUNTING semaphores
    (with the same initial value), you get the bounded buffer.

    loop                  -- producer
        produce item
        wait(empty)
        wait(mutex); insert item in buffer; signal(mutex)
        signal(full)

    loop                  -- consumer
        wait(full)
        wait(mutex); remove item from buffer; signal(mutex)
        consume item
        signal(empty)
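A runnable sketch of the bounded buffer above, assuming POSIX unnamed
semaphores and pthreads (note sem_init is absent on some systems,
e.g. macOS); unlike the lecture-4 busy-wait version, all N slots are
usable:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define N 8                     /* buffer capacity */

    static int buffer[N], in = 0, out = 0;
    static sem_t empty_slots, full_slots;
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

    static void *producer(void *arg)
    {
        for (int item = 0; item < 100; item++) {
            sem_wait(&empty_slots);            /* wait(empty) */
            pthread_mutex_lock(&mutex);
            buffer[in] = item; in = (in + 1) % N;
            pthread_mutex_unlock(&mutex);
            sem_post(&full_slots);             /* signal(full) */
        }
        return NULL;
    }

    static void *consumer(void *arg)
    {
        for (int i = 0; i < 100; i++) {
            sem_wait(&full_slots);             /* wait(full) */
            pthread_mutex_lock(&mutex);
            int item = buffer[out]; out = (out + 1) % N;
            pthread_mutex_unlock(&mutex);
            printf("consumed %d\n", item);     /* consume the item */
            sem_post(&empty_slots);            /* signal(empty) */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        sem_init(&empty_slots, 0, N);          /* N empty slots */
        sem_init(&full_slots, 0, 0);
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }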
Readers-writers problem
    Readers permit concurrency; writers demand exclusivity.
    Solutions in this course slightly serialize the readers.
    The following solution can starve writers (and readers, if the
    semaphore is not fair).  There are (writer-priority) variants
    that don't, but they can starve readers.  There are fair variants
    that won't starve any process.

    loop                                -- writer
        wait(writer_sem)                *** binary semaphore
        write
        signal(writer_sem)

    loop                                -- reader
        wait(#readers_sem)              *** binary semaphore
        #readers++
        if (#readers = 1) wait(writer_sem)
        signal(#readers_sem)
        read
        wait(#readers_sem)
        #readers--
        if (#readers = 0) signal(writer_sem)
        signal(#readers_sem)

Dining philosophers problem
    5 philosophers
    Each has an infinite loop of (think; hungry; eat)
    Round table with a chopstick between each pair of phils and rice
    in the middle
    Philos must get two chopsticks to eat
    The natural algorithm (get left; get right; eat; down left; down
    right) deadlocks
        get is wait, down is signal
    (kludgy fixes)
        Max 4 can sit down
        Crit sect to see if both sticks are avail, and pick up both
            need a fair CS, or can livelock
            better to use reader-writer with upgrade to writer
        Different phils act differently (e.g. even-numbered phils
        start with the right stick)

---------------- Higher-level concurrency control ----------------

Critical regions
    shared v
    ...
    region v
    ...
    end

Conditional critical region
    New stmt: await bool-expr
    If bool-expr is false, switch to a new task, with mutual exclusion
    dropped for the current task.
    When we return to the current task, assure bool-expr is true.
    If await is permitted anywhere, the user must assure that whatever
    was done previously inside the region is still true.
    If await is permitted only as the first stmt, we don't have the
    above concern.  Book calls this case a critical region and has
    slightly different syntax.
    When should one check whether bool-expr is true?  Not trivial; we
    skip it.

ADA rendezvous

Monitors
    Handout picture (on the web as well)
    Encapsulate data and operations
        Ada package; C++ class
    Mutual exclusion
        FIFO (so fair)
    "Condition" variables with wait/signal operations
        If a wait gets bad news, the process leaves the monitor but
        gets higher priority to reenter than a new monitor call.
        On signal, who runs?  Common is to run the waiter but place
        the signaler in a highest-priority (urgent) queue.
        Don't have this problem if signal is the last stmt.
        For nested monitors, when you wait on the inner one, do you
        release the outer?
    HOMEWORK 6.7 (PSEUDOcode), 6.12

================ End Lecture 8 ================

================ Start Lecture 9 ================

SEE THE WEB
EXAM NEXT WEEK
HOMEWORK ASSIGNED TODAY, DUE IN TWO WEEKS

---------------- Chapter 7 Deadlocks ----------------

Already defined: permanent waiting of a subset of the processes.  A
set of processes, each waiting for an event that only another process
in the set can cause.

Simple example with two binary semaphores S and T.  Correct:
    loop                    loop
        wait S                  wait S
        wait T                  wait T
        CS                      CS
        signal T                signal T
        signal S                signal S
        NCS                     NCS

Simple example with two binary semaphores S and T.  Incorrect:
    loop                    loop
        wait S                  wait T
        wait T                  wait S
        CS                      CS
        signal T                signal T
        signal S                signal S
        NCS                     NCS
    (A C sketch of the lock-ordering point appears below.)

HOMEWORK 7.1, 7.2

View the semaphore as a resource.  A REUSABLE resource.  Other
examples are memory (allocate, free) and files (open, close).
    Pattern of interaction with a resource:
        Request
        Use
        Release
    This is the user's view.  The resource manager sees requests and
    releases.  It issues assign/allocate and deassign/deallocate.
    Since all the resource managers we study always respond to a
    release by doing the deassign/deallocate, we don't mention the
    deassign again.
    I will use the term allocate instead of assign.  It is standard.
    Do NOT assume it means we are dealing with memory.
    From the resource's point of view it sees
        Request
        Allocate
        Use
        Release (followed by deallocate, but we won't mention it)

HOMEWORK 7.9 (have me do this in two weeks)
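A minimal sketch of the correct/incorrect pair above, assuming POSIX
mutexes in place of the binary semaphores.  Run enough times, this
program can hang -- which is the point; making thread_bad use
lock_in_order removes the circular wait:

    #include <pthread.h>

    static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t T = PTHREAD_MUTEX_INITIALIZER;

    static void lock_in_order(void)     /* the correct version: S, T */
    {
        pthread_mutex_lock(&S);
        pthread_mutex_lock(&T);
    }

    static void unlock_both(void)
    {
        pthread_mutex_unlock(&T);
        pthread_mutex_unlock(&S);
    }

    static void *thread_bad(void *arg)  /* the incorrect version */
    {
        pthread_mutex_lock(&T);         /* T then S: opposite order */
        pthread_mutex_lock(&S);         /* may wait forever here ...  */
        unlock_both();
        return NULL;
    }

    static void *thread_good(void *arg)
    {
        lock_in_order();                /* ... while this holds S */
        unlock_both();
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, thread_good, NULL);
        pthread_create(&b, NULL, thread_bad, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }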
Picture: resource allocation graph.  Fairly std terminology.
    Circles for processes
    Rectangles for resources
    Arrow from process to resource for a (pending) request
    Arrow from resource to process for an allocated resource
    What can the user do?
        Add an arrow from a process to a resource
        Erase an arrow (release)
        In some sense the manager does these
    What can the manager do?
        Reverse an arrow so it points towards the process
    Draw the pictures for the simple example (correct and incorrect)
    Can have multiple units of a resource
        Draw them as dots inside the box
        Allocation edges emanate from a dot
        Request edges go to the box (any dot will do)

Necessary but NOT (repeat NOT!!) sufficient conditions for deadlock
    Due to Coffman and Havender
    1. Mutual exclusion
    2. Hold and wait
    3. No preemption
    4. Circular wait
HOMEWORK 7.4

Strategies for dealing with deadlocks
    Ignore
    Prevent
    Avoid
    Detect (and recover)

Ignoring the problem
    Ostrich algorithm
    Quite common
    Reasonable if deadlocks are so rare that they are not worth the
    extra coding and performance loss.

Prevent
    Prevent one of the necessary conditions
    Conservative, since the conditions are NOT sufficient
    1. Mutual exclusion
        This means sharable resources.  Not applicable in general.
    2. Hold and wait
        Can't request if you already hold
        E.g. all requests at the beginning
        Can permit "phases": release ALL at the end of a phase
        Not wonderful
            Low utilization
            Possible starvation for the requester of a popular
            resource
                Can prevent the starvation by stopping allocation of
                the resources requested by the starver.  But this
                lowers utilization more.
    3. No preemption
        When you make a request that can't be satisfied, you lose all
        the resources you already have
        Useful for resources that can be taken away and easily
        restored.
            Memory is possible; a printer is not.
    4. Circular wait
        Order the resources and require allocations in order
        Variant: before a request, release any resource with a bigger
        number

Avoid
    Permit the necessary (but NOT sufficient) conditions; tiptoe
    carefully through the state space to avoid deadlock states.
    Typically need each process to predeclare the MAX number of each
    type of resource it will need.
    Key concept is the SAFE STATE
        A state is safe if the manager can guarantee that it can
        complete all the current processes (without deadlock).
        Does NOT mean deadlock cannot be made to happen.  From the
        initial state, a stupid manager can permit deadlock to occur.
        Safe means that the manager, BY CHOOSING WHICH REQUESTS TO
        GRANT WHEN, can keep deadlock from occurring.
    An unsafe state is one that is not safe.
        Does NOT mean deadlock will happen.  Each process might be
        planning to release all its resources and then terminate.
        Unsafe means that no matter what the manager does, there is a
        sequence of possible future actions that leads to deadlock.
            Not strictly true: the manager can prevent deadlock by
            simply never granting any future requests.  Technically
            this isn't deadlock, but it is just as bad.
    Draw a Venn diagram for safe, unsafe, and deadlocked states
    Take the simple example at the beginning of the lecture (2 semas).
        For the good code, all possible states are safe.
        For the bad code, we get to a point where the manager can make
        a bad decision, get into an unsafe state, and from there
        cannot avoid deadlock.
    Example from the book.  A good example.
        12 units of the resource, 3 processes

            proc  claim  holding
             0     10      5
             1      4      2
             2      9      2

        This is a safe state
            3 free units
            Can arrange for P1 to finish
            Can THEN arrange for P0 to finish
            Can THEN arrange for P2 to finish
        You might think that with 3 free units and P1 needing only 2
        more, we have some slop (i.e. are far from an unsafe state).
        FALSE!!  We are right on the edge.
        Let P2 request one unit, and grant it.
        Now we have

            proc  claim  holding
             0     10      5
             1      4      2
             2      9      3

        This is UNSAFE
            Assume now each process requests the rest of its claim.
                P0 requests 10-5=5
                P1 requests 4-2=2
                P2 requests 9-3=6
            You have 12 - (5+2+3) = 2 units avail
            You can give them to P1 and wait for it to finish.  Then
            you have 4 free, which is NOT enough for either P0 or P2.

Resource-allocation graph algorithm
    Put in dashed lines for claim edges, pointing like requests
    When a request is made, turn the line solid
    When a release is made, put back the (dashed) claim
    The manager does not grant a request if it would create a cycle
    (dashed lines count)
    Show for the simple example (2 semas), bad code
    Works only for SINGLE-UNIT resources, since there is only one
    claim edge

Banker's algorithm (Dijkstra)
    Works for the general case
    Like a banker with money (kinda sorta)
    Available[j] is the number of units of j currently avail
    Claim[i,j] is the claim of process i for resource j
    Allocated[i,j] is the number of units of j that i has now
    MightStillNeed = Claim - Allocated
    X <= Y means X[i] <= Y[i] for all i
    X <  Y means X <= Y and X != Y
        Does NOT mean X[i] < Y[i] for all i
    Algorithm for safety
        1. (Initialize) CanAssign = Available; Finished[i] = false,
           all i
        2. (Find a process we can guarantee will finish)
           Find an i such that
               Finished[i] = false
               MightStillNeed[i] <= CanAssign
           If no such i, goto 4
        3. (Update the state, assuming i finishes)
           Finished[i] = true
           CanAssign += Allocated[i]
           Goto 2
           (the notes had Available here, but the tentative pool is
           CanAssign; Available itself must not change)
        4. (Could everyone finish?)
           If Finished[i] = true for all i, safe; otherwise unsafe
    Manager's algorithm on receiving a Request from i
        1. (Check for legality)
           If NOT Request <= MightStillNeed[i], error
        2. (Check if currently possible)
           If NOT Request <= Available, i must wait (exit algorithm)
        3. Try the allocation
               Available         -= Request
               Allocated[i]      += Request
               MightStillNeed[i] -= Request
           If the new state is safe, grant; else make i wait and undo
           the allocation
    Manager's algorithm on receiving a Release from i
        Book "forgot" it, so HOMEWORK: give the manager's alg for
        release
    HOMEWORK 7.6, 7.11
    Do the good example (P0, P1, P2).  (A C sketch of the safety
    check appears below.)

Detect (and recover)
    With single-unit resources, you just need the WAIT-FOR graph
        Take the resource-alloc graph and remove the resource nodes;
        where there was an edge going into and out of a resource
        node, draw an edge from the source process to the sink
        process.
        This is just right!  There is an edge iff Pi currently
        "waits for" Pj.
        Deadlock iff (if and only if) the graph has a cycle.
    For multiple-unit resources, need an alg like the banker's.  We
    skip it.
    How often should we look for deadlock?
        Not clear.  Look more often if deadlock is more likely.
    What to do if you find a deadlock?
        Shoot 'em all
        Shoot 'em one by one until the deadlock is gone
        Preempt a resource if possible
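A minimal C sketch of the safety check, run on the 12-unit,
one-resource example above (with multiple resource types the
comparisons become vector comparisons, but the loop is the same):

    #include <stdbool.h>
    #include <stdio.h>

    #define NPROC 3

    static bool is_safe(int total, const int claim[], const int alloc[])
    {
        int can_assign = total;            /* start with free units */
        bool finished[NPROC] = { false };

        for (int i = 0; i < NPROC; i++)
            can_assign -= alloc[i];

        for (;;) {
            bool progress = false;
            for (int i = 0; i < NPROC; i++) {
                /* Can we guarantee process i finishes? */
                if (!finished[i] && claim[i] - alloc[i] <= can_assign) {
                    finished[i] = true;
                    can_assign += alloc[i];  /* i frees its units */
                    progress = true;
                }
            }
            if (!progress)
                break;
        }
        for (int i = 0; i < NPROC; i++)
            if (!finished[i])
                return false;              /* someone may never finish */
        return true;
    }

    int main(void)
    {
        int claim[NPROC]        = { 10, 4, 9 };
        int safe_alloc[NPROC]   = { 5, 2, 2 };
        int unsafe_alloc[NPROC] = { 5, 2, 3 };
        printf("holding {5,2,2}: %s\n",
               is_safe(12, claim, safe_alloc) ? "safe" : "unsafe");
        printf("holding {5,2,3}: %s\n",
               is_safe(12, claim, unsafe_alloc) ? "safe" : "unsafe");
        return 0;
    }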
Mixed solutions
    The idea is that different resources are handled differently
    Example
        Internal OS resources (e.g. PCBs, the open file table, etc.)
            Prevent deadlocks by ordering requests.
            Feasible, since we have total control.
        Central memory
            Prevent via preemption
            Feasible, since memory state can be restored
        Job resources
            Avoidance, if jobs pre-declare their max
            Detection and recovery (probably shooting) if not
        Swap space
            Same as job resources above
            If the exact need is known (not just the max), preallocate

================ End Lecture 9 ================

================ Start Lecture 10 ================

---------------- Memory Management: Chapter 8 ----------------

Address binding
    Compile time
        Primitive
        Compiler generates absolute addresses
        Requires knowledge of where the compilation unit will be
        loaded and run
        Rarely used (MSDOS .COM files)
    Load time
        Compiler generates relocatable addresses for each compilation
        unit
        Linkage editor converts these to absolute addrs by adding the
        address where the unit will be loaded
            resolves inter-compilation-unit addresses
            Misnamed the loader (ld) by unix
        Job cannot move
    Execution time
        Done dynamically during program execution
        Needs hardware to help with this: VIRTUAL to PHYSICAL address
        translation
        More later

Dynamic loading
    When executing a call, check whether the callee is loaded.  If
    not, call the linking loader to load it and update the user's
    tables.
    Slows down calls (indirection) unless you rewrite the code
    dynamically

Dynamic linking
    Normal linking is now called static (i.e. a statically linked
    library)
    For a dynamically linked library, the routines are just trivial
    stubs that, when executed, check whether a copy is already
    present and load one if it is not.  In any case the call is
    patched up to go to the full routine.
    Shared libraries (saves memory)
        A new bug-fixed library is immediately used.
        A new bug-introduced library is immediately used.
    Needs OS help, since two unrelated users access the same part of
    memory (more on this later).

Overlays
    "My" era of programming.
    Fully under user-mode control: the human user controls when
    overlays are brought in.
    No longer used for gen'l purpose computing.

Logical and physical addresses
    What the user sees vs which transistors (or capacitors) are used
    This is execution-time binding.
    Needs some hardware to translate virtual to physical addresses
        Often called an MMU (memory mgt unit)
    A simple example is a relocation (a.k.a. base) register
        Its value is ``added'' to every logical address
    HOMEWORK 8.1

``Honest to goodness'' swapping
    The entire job is either in memory or not
    Bring it back to the same place, unless you have execution-time
    binding
    Version 7 unix did this
        (Actually a v7 unix job had 3 segments, and segments were
        swapped)
        Jobs (segments) were not brought back to the same place (MMU)
    Not true swapping if the job is already on backing store due to
    demand paging; then swapping out means paging out (more later)

Fixed partitions
    At boot time, divide real memory into partitions.  Run a job in
    each.
    Separate job queue for each partition.
    A relocation register (and limit reg) is sufficient MMU.
    IBM OS/MFT, Multiprogramming with a Fixed number of Tasks (early
    360 OS)
    Can have big INTERNAL fragmentation, i.e. unused space within an
    allocated region (partition).  (Book is misleading here: it says
    EXTERNAL, but it means that only for MVT.)

Variable partitions
    IBM OS/MVT
    OS records which regions of memory are allocated and which are
    avail.
        Available regions are called holes
    The number of partitions and their sizes vary dynamically
    Cute "boundary tag" algorithm to keep track (NOT covered)
    When a memory request comes in, pick a big-enough hole
        First-fit
            Normally circular (i.e. start looking where you left off)
        Best-fit
        Worst-fit
    HOMEWORK 8.2, 8.5
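A minimal first-fit sketch over a hole list, assuming the holes are
kept in a linked list sorted by address (best-fit would instead scan
the whole list for the smallest adequate hole):

    #include <stddef.h>
    #include <stdio.h>

    struct hole {
        size_t addr, size;
        struct hole *next;
    };

    /* Returns the allocated address, or (size_t)-1 if no single hole
     * fits (EXTERNAL fragmentation, if the total free space would
     * have sufficed). */
    static size_t first_fit(struct hole **holes, size_t request)
    {
        for (struct hole **h = holes; *h != NULL; h = &(*h)->next) {
            if ((*h)->size >= request) {
                size_t addr = (*h)->addr;
                (*h)->addr += request;     /* shrink the hole ...    */
                (*h)->size -= request;
                if ((*h)->size == 0)
                    *h = (*h)->next;       /* ... or remove it       */
                return addr;
            }
        }
        return (size_t)-1;
    }

    int main(void)
    {
        struct hole h2 = { 900, 300, NULL };
        struct hole h1 = { 100, 200, &h2 };
        struct hole *holes = &h1;
        printf("alloc 150 -> %zu\n", first_fit(&holes, 150)); /* 100 */
        printf("alloc 100 -> %zu\n", first_fit(&holes, 100)); /* 900 */
        size_t a = first_fit(&holes, 250);   /* 250 free in total ... */
        printf("alloc 250 -> %s\n",          /* ... but no hole fits  */
               a == (size_t)-1 ? "no hole big enough" : "ok");
        return 0;
    }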
    What if no hole is big enough?
        If not enough mem (in total), swap or wait or whatever, but
        you can't complain
        If enough mem, but not in ONE hole, complain
            Called EXTERNAL fragmentation (outside any allocated
            region)
        HOMEWORK 8.4
    Compaction
        Requires runtime binding
        Can be done with roll out / roll in, i.e. swapping with the
        new location different from the old.

For these schemes the PHYSICAL memory for a job is contiguous.  What
if the PHYSICAL memory for a job can be noncontiguous?
    (Book says LOGICAL; must be a typo.)

Paging
    Divide the logical address space into FIXED-size pieces called
    pages.
        PAGESIZE is a power of 2, about 4KB
    Divide the physical addr space (i.e. real mem) into page frames,
    the same size as pages.
        Page frames are often called simply frames
    Map each page to a unique frame (same size)
    Need a PAGE TABLE to say which frame holds each page
    Divide a logical addr into page number and page offset
    (displacement)
        Often called p,d or p#,o or p#,d
    The page table is indexed by p# and gives f# (frame number, aka f)
    The logical addr p,d becomes f,d
    HOMEWORK 8.7
    Can have page 2 assigned to frame 20, page 3 assigned to frame 10,
    and page 4 assigned to frame 1111.  So the physical addr space is
    definitely non-contiguous.
    No external fragmentation, since all frames are the right size
    Can have internal frag for the last page of a region.
    OS needs to know which frames are free
    Can have more frames than pages
        HOMEWORK 8.8
        Next chapter we will see more pages than frames.
    What hardware is needed for page tables?
        Simplest is to just have a page table in memory for each
        process, and a single PTBR (page table base register) in the
        processor.
            This register, like R5, must be saved on a context switch
            requires 2 memory accesses for each logical access --
            hopeless
        Have a "few" table entries in the processor
            Called a TLB (translation look-aside buffer) or TB
            ASSOCIATIVE lookup on the addr -- ``fancy'' hardware
            Normally used as a cache (sometimes entries are pinned)
            Flushed when a new page table becomes active (context
            switch)
                Some flushes can be avoided with different
                organizations (not covered)
            Normally get a > 95% HIT RATIO
    HOMEWORK 8.10

================ End Lecture 10 ================

================ Start Lecture 11 ================

Protection
    Put bits on each PTE (page table entry)
    Keep the length of the addr space, i.e. the size of the PT
        PTLR (PT length register)
        If p# > PTLR, trap

Reducing the contiguous size of the PT
    PTLR as above reduces the overall size
    Multilevel paging
        By paging the page table, the PT need not be contig
        Can have a bunch of levels
        Will see next chapter that this can actually reduce the
        overall memory size of the PT (not all of the PT is then
        memory resident).
    Inverted page table
        It is a page frame table: indexed by frame#, it gives p# and
        PID (process ID)
        Subtle coding used in the IBM RT/PC
            More complicated than in the book
            We dealt with this in our research
        Not covered (beyond the definition)

Shared pages
    Two PTEs point to the same frame (physical page)
    RISKY if read/write
    HOMEWORK 8.11

Segmentation
    User-VISIBLE division of the virtual addr space
    Variable-size pieces
    Now an addr is s#,offset
    HOMEWORK 8.16
    Sample segments
        global variables
        procedures
    Segment table, indexed by s#
        entry contains size (limit) and starting phy addr (base)
        STBR points to the beginning of the ST
        STLR gives the length of the ST
    Implementation
        Naive is like naive paging: two memory refs, hopeless
        Figure 8.23 has a bug: the arrows from d and s are reversed
        Again use a TLB
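A minimal sketch of the naive segment-table lookup just described
(field widths and table contents invented for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    struct ste { unsigned long limit, base; }; /* segment table entry */

    static struct ste seg_table[] = {
        { 0x1000, 0x40000 },             /* seg 0: 4KB at 0x40000 */
        { 0x0400, 0x90000 },             /* seg 1: 1KB at 0x90000 */
    };
    static unsigned stlr = 2;            /* segment table length */

    static unsigned long translate(unsigned s, unsigned long offset)
    {
        if (s >= stlr) {                 /* s# past the table: trap */
            fprintf(stderr, "trap: bad segment %u\n", s);
            exit(1);
        }
        if (offset >= seg_table[s].limit) {  /* past the limit: trap */
            fprintf(stderr, "trap: offset 0x%lx past limit\n", offset);
            exit(1);
        }
        return seg_table[s].base + offset;
    }

    int main(void)
    {
        printf("(1, 0x100) -> 0x%lx\n", translate(1, 0x100)); /* 0x90100 */
        translate(1, 0x800);                                  /* traps */
        return 0;
    }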
    Protection and sharing
        More natural than for paging, since the seg boundaries are
        logical divisions in the program, not just wherever a 2K
        boundary occurred.
        Not trivial to do sharing, since you must get agreement on
        the seg number, because the seg quite possibly points to
        itself (code has jumps)
            PC-relative addresses are fine
            Addresses with the s# in a reg are fine
            Addresses with the s# in the displacement are NG
        HOMEWORK 8.14
    Suffers from external fragmentation, since segments are variable
    size

Segmentation + paging
    An STE (segment table entry) contains the size and a PTBR
    (pointer to the segment's PT)
    An addr is still s#,off
        Figure 8.26
        Since the offset is paged, it is really p#,off; so an addr is
        really s#,p#,off
    Three memory refs for the naive implementation; use a TLB
    (In fact the ST could be paged, so the s# is really two
    components and we have a 4-part addr.)
    HOMEWORK 8.12

---------------- Chapter 9 Virtual Memory ----------------

Not an efficient use of memory to keep an entire job loaded while it
runs
    Some code is rarely (if ever) used
    Some data structures are larger than needed
    Some data is used only in certain phases of the program

Virtual memory: separation of the user's logical memory from
physical memory
    Commonly implemented by demand paging
        Indeed, common usage is to equate demand paging and virt mem
    Could have demand segmentation
        OS/2 on the 286 (not the 386) does this

---------------- Demand Paging ----------------

First used in the Atlas computer (Univ Manchester)

All pages are assigned to a disk block; only some are RESIDENT, i.e.
assigned to a page frame as well.
    Add a "valid" bit to each PTE
    If the valid bit is set, treat as in chapter 8
    If the valid bit is not set, the page is not resident; it is only
    on disk
        Must know the disk block
            Could store it in the PTE instead of the mem addr (if the
            entry is big enough)
            Could store all pages contiguously on disk
            Could store all static mem (known at load time)
            contiguously on disk
        Find a free frame (what if none exists? later)
        Read the disk block into this frame
            Really: schedule the I/O and block the process
            Make the process ready after the I/O completes
        Now it is back to normal
    HOMEWORK 9.1, 9.3

Some instructions can generate MANY page faults
    An instruction could straddle a page boundary
    Can reference several memory operands
        Memory operands could straddle page boundaries, or could be
        BIG
    RISC machines do not do much of this
    Potential DISASTER
        Restart (rather than resume) the instruction after a TLB miss
        is satisfied
        Have fewer TLB entries than the max number of misses -- oops

Performance impact
    Cache hit < 10 nanoseconds
    Page hit < 100 nanoseconds
    Page miss > 10 milliseconds
        > a million cache hits; > 100 thousand page hits
    (A worked fault-rate calculation appears below.)

Finding a free frame (from above)
    Good to keep a bunch free at all times, so you don't have to wait
        When analyzing the number of misses, we generally assume you
        don't keep a bunch free
        HOMEWORK 9.2
    When too few are free
        Choose a victim (how? later)
        Write the victim to disk if dirty
        Mark the victim's PTE as invalid
        Add the frame to the free list
    HOMEWORK 9.9 (ask me next time to give the answer)

================ End Lecture 11 ================

================ Start Lecture 12 ================

Lots of email answers about the banker are on the web.

---------------- Page Replacement Algorithms ----------------

(Called choosing a victim above.)

Guiding principle is locality
    Temporal locality: a word referenced now is likely to be used in
    the near future
    Spatial locality: words near the current reference are likely to
    be used in the near future
    Locality suggests choosing "stale" pages as victims
    Spatial locality also suggests using largish pages and prefetching
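A worked version of the performance ratios above (a sketch using the
standard effective-access-time calculation; p is the fraction of
references that fault, and the times are the rough figures above):

    EAT = (1-p) * (memory access time) + p * (fault service time)
        ~ (1-p) * 100ns + p * 10ms
    To keep EAT within 10% of the memory access time (110ns), need
        p * 10,000,000ns < 10ns, i.e. p < 1 fault per million
        references.
    Hence the attention paid to the replacement algorithms that
    follow.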
    Clean pages are cheaper victims.  Why?
        Don't need to write them back to disk
        But maybe not really cheaper, because they might be heavily
        used (e.g. code)
    HOMEWORK 9.4

Random (replacement)
    Ignores the principles
    Used for comparison purposes

Optimal (Belady min)
    Choose as victim the page whose next ref is furthest in the future
    Not implementable without a crystal ball, or rerunning the program
    Provably optimal
    Used for comparison purposes

FIFO
    Victim = page whose last loading into memory was furthest in the
    past
    Amazing anomaly (Belady's): can have MORE faults with MORE frames
        Try 1 2 3 4 1 2 5 1 2 3 4 5 with three and four frames
        (a C simulation appears below)

LRU (Least Recently Used)
    Victim = page whose last reference was furthest in the past
    HOMEWORK 9.11
    Works well
    Does not have Belady's anomaly (neither does Optimal)
    HOMEWORK 9.13 (ask me in class next time for the answer)
    Hard (i.e. expensive) to implement
        Store timestamps in the PTE and search for the oldest
            You must be kidding
        Doubly link the PTEs as a stack, newest on top
            Two extra pointers per PTE
            Pointer updating for MANY memory refs
        Basically hopeless w/o hardware help

Simple approximation to LRU (Not Recently Used--NRU, also NUR)
    A little hardware help: a ref bit in the PTE, set on each ref
    Choose a victim (at random) with the ref bit NOT set
        Start the next victim search where this one left off (clock)
    When do you clear the ref bits?
        Periodically clear all
        Every page fault (or every kth fault), clear all
        When all ref bits are set, clear all
        When the clock passes, clear just this one bit

Enhanced NRU algorithms
    Can have two bits, ref and dirty
        View dirty as the HOB, ref as the LOB
        Choose a victim (at random) with the lowest value for the bits
    Can have k ref bits (plus dirty if desired)
        On each ref, right shift the current ref bits and set the HOB
        Choose a victim (at random) with the lowest value for the bits

Second chance algorithm
    FIFO (clock), but check the ref bit
        If the bit is not set, evict
        If set, unset the bit and move to the next page
    At worst you go all the way around; then the original choice will
    be selected, since its bit is now off
    Enhanced
        Two bits, one for ref, one for dirty
        Look for the best class (unref, clean); if none, try the next
        class, etc.
    HOMEWORK 9.6

Count references
    LFU (Least Frequently Used)
        Replace the rarely used (locality)
    MFU (Most Frequently Used)
        Replace the heavily used; the rarely used were just brought in
    Sounds bogus
    Not used in real systems

Page buffering
    Keep a pool of free (or at least clean) pages
    Whenever there is nothing better to do, write out a dirty page
    and mark it clean

Allocating frames to processes
    Give each the same number
    Give proportional to the virtual mem size of the process
    Use some (external) priority as influence
    To each according to its need (Marx) (see WS below)
    HOMEWORK 9.18, 9.20
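A minimal FIFO simulation of the reference string above, showing
Belady's anomaly (more frames, more faults):

    /* FIFO page replacement on 1 2 3 4 1 2 5 1 2 3 4 5:
     * 3 frames give 9 faults, 4 frames give 10. */
    #include <stdio.h>

    static int fifo_faults(const int *refs, int n, int nframes)
    {
        int frames[8], hand = 0, loaded = 0, faults = 0;

        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int f = 0; f < loaded; f++)
                if (frames[f] == refs[i])
                    hit = 1;
            if (!hit) {
                faults++;
                if (loaded < nframes) {
                    frames[loaded++] = refs[i]; /* free frame available */
                } else {
                    frames[hand] = refs[i];     /* evict oldest (FIFO) */
                    hand = (hand + 1) % nframes;
                }
            }
        }
        return faults;
    }

    int main(void)
    {
        int refs[] = { 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 };
        int n = 12;
        printf("3 frames: %d faults\n", fifo_faults(refs, n, 3)); /* 9 */
        printf("4 frames: %d faults\n", fifo_faults(refs, n, 4)); /* 10 */
        return 0;
    }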
Global vs local replacement
    Must the victim come from the same process as the beneficiary?
    Local is "fair", but global works better

Working set model (Denning)
    Can/will thrash if the multiprogramming level (MPL) is too high
    To each according to its need
    w (omega) is the working-set window size
    W(t,w) = { pages referenced from t-w to t } is the working set
    w*(t,w) = |W(t,w)|
        (w* is w with a star, distinguished from w = omega)
    Choose w (not very sensitive)
    Adjust the MPL so that the working sets fit in memory
        That is, the SUM over all (non-suspended) processes of
        w* < # frames
    Medium-term scheduling
    Believed to work well, but expensive to implement (keeping track
    of W)

Approximations to the working set
    PFF (Page Fault Frequency)
        If a process's faulting freq is too high, it needs more frames
        If all processes need more frames, lower the MPL
    WS clock
        Like NRU
        On a fault, start a circular scan
            If used, set unused and record the time
            If unused and old (relative to the time set above), remove
            If unused and new, skip over it
            If all pages are unused and new, reduce the MPL

Demand segmentation
    Makes sense
    Harder to implement
        External fragmentation
    Was used by OS/2 on the 286

---------------- End Chapter 9 ----------------

---------------- Start Chapter 10 File-System Interface ------------

File - named collection of (hopefully) related data

File attributes, often called metadata
    Name
    Type
    Location (on secondary storage)
    Size
    Protection
    Timestamps, user id

File operations
    Create
    Read
    Write/append
    Delete
    Truncate (remove the data, keep the metadata)
    Seek (could be part of read/write)
    Open: get a handle for the file
    Close
    Get/set attributes

================ End Lecture 12 ================

================ Start Lecture 13 ================

HOMEWORK 10.1, 10.7

File names and types
    Should extensions determine the file type?
        Works well when it works; works poorly when it doesn't
        Good for hints
        OS needs to know whether a file is executable
    HOMEWORK 10.5

Structured files
    Should structure be part of the supervisor-mode OS?
        Unix says no; MVS says yes

Accessing file data
    Sequential access (common for tape)
    Direct access
    Indexed files (databases)
    HOMEWORK 10.9

Directory
    Mapping of names (subdirectories and files) to locations
    Written as a side effect of file ops, e.g. create and delete
    Subdirectories give a tree.  Normally a DAG is permitted
        NON-unique names!
        Hard vs symbolic links
        Cycles can be a problem
            Can prevent them by forbidding links to directories
            In unix, hard links can't be to directories, but symlinks
            can
    HOMEWORK 10.6

File protection (a.k.a. permissions)
    Simplest is no protection
    Also easy: distinguish operations (read/write/execute, append,
    delete) and treat all users the same (no "users")
    Also easy: just the owner can access
    Unix has owner, group, other
    Most general is a protection (a.k.a. permission) matrix
        A row for each user; a column for each file
        An entry gives that user's access to that file
        Read a row and get the capability list for the user
        Read a column and get the access list for a file

---------------- Chapter 11 File-System Implementation -------------

Storing a file
    The obvious method is CONTIGUOUS ALLOCATION
        No seeks to read or write the next block
        External fragmentation
            Holes
            Sound familiar?
    The analogue of paging is called INDEXED ALLOCATION
        Problem is that files often grow and shrink (segments
        grow/shrink much less often)
        How do you grow the index (the "page table")?
            CP/M: the disk block numbers (the PTEs) are stored in the
            directory entry
                If you need more blocks than fit in a dir entry, make
                a second dir entry
            Unix method is direct / indirect / double-indirect blocks
                Store the direct block numbers, the block number of
                the indirect block, and the block number of the
                double-indirect block in the "inode" (index-node,
                information-node).  The dir entry points to the inode.
                Hard links point to the same inode.
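A minimal sketch of mapping a file offset to one of the inode's
block pointers, assuming 4KB blocks, 10 direct pointers, and 4-byte
block numbers (the real v7 parameters were different):

    #include <stdio.h>

    #define BLKSIZE 4096
    #define NDIRECT 10
    #define PER_IND (BLKSIZE / 4)  /* block #s per indirect block */

    static void locate(unsigned long offset)
    {
        unsigned long b = offset / BLKSIZE; /* logical block number */

        if (b < NDIRECT)
            printf("offset %lu: direct pointer %lu\n", offset, b);
        else if (b < NDIRECT + PER_IND)
            printf("offset %lu: indirect block, slot %lu\n",
                   offset, b - NDIRECT);
        else
            printf("offset %lu: double-indirect, slots %lu/%lu\n",
                   offset,
                   (b - NDIRECT - PER_IND) / PER_IND,
                   (b - NDIRECT - PER_IND) % PER_IND);
    }

    int main(void)
    {
        locate(5000);       /* direct pointer 1 */
        locate(100000);     /* indirect block */
        locate(10000000);   /* double-indirect */
        return 0;
    }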
    LINKED ALLOCATION
        Dir entry points to the first block; each block points to the
        next
        MSDOS is similar: the FAT (file allocation table) contains
        one entry for each disk block.  The dir entry gives the first
        block number; the corresponding FAT entry contains the next
        block number; the last FAT entry for the file contains EOF.
    HOMEWORK 11.1
    Log-structured filesystems (not covered; not in the book)
        Increased caching increases the write percentage.
        Writes are small, so inefficient
        Gather up the writes and write them to the end of a "log",
        i.e. don't rewrite the block in place
        Eventually need to clean (garbage collect)

Free-space management
    Bit vector
    Linked list of free blocks
    HOMEWORK 11.2

Directory implementation
    Natural is a linear list of entries
    Hashed is faster to search

Recovery (skip)
    File system check
    Backups

---------------- Chapter 12 Secondary Storage Structure ------------

Disk (arm) scheduling
    FCFS -- simple, but large seeks
    Pick -- FCFS, but pick up any requests passed on the way
    SSTF (Shortest Seek Time First)
        Greedy algorithm
        Can starve
        Blocks at the edge are less well serviced
        HOMEWORK 12.7
    SCAN (LOOK, Elevator)
        Jukebox algorithm
        Favors the middle
        Book says SCAN always goes to the edge, and LOOK turns around
        when there is no more to do.  I use SCAN and LOOK both to
        mean turn around when there is no more to do.
    C-SCAN (Circular SCAN)
        Only go in one direction; then go back to the beginning
        Doesn't favor the middle
    N-step SCAN
        Do not service requests that arrive during the current scan
    HOMEWORK 12.1 (is 12.1a true for the algs I added?), 12.2, 12.4

Disk latency scheduling
    On a given track (or cylinder), use SCAN (same as C-SCAN)
    HOMEWORK 12.9
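A minimal sketch comparing total head movement under FCFS and SSTF;
the request queue and head position echo the book's disk-scheduling
example (head at cylinder 53):

    #include <stdio.h>
    #include <stdlib.h>

    #define NREQ 8

    static int total_fcfs(int head, const int *req, int n)
    {
        int moved = 0;
        for (int i = 0; i < n; i++) {
            moved += abs(req[i] - head);
            head = req[i];
        }
        return moved;
    }

    static int total_sstf(int head, int *req, int n) /* consumes req */
    {
        int moved = 0;
        for (int left = n; left > 0; left--) {
            int best = 0;
            for (int i = 1; i < left; i++)   /* closest request wins */
                if (abs(req[i] - head) < abs(req[best] - head))
                    best = i;
            moved += abs(req[best] - head);
            head = req[best];
            req[best] = req[left - 1];       /* remove it */
        }
        return moved;
    }

    int main(void)
    {
        int queue[NREQ] = { 98, 183, 37, 122, 14, 124, 65, 67 };
        int copy[NREQ];
        for (int i = 0; i < NREQ; i++)
            copy[i] = queue[i];
        printf("FCFS: %d cylinders\n", total_fcfs(53, queue, NREQ));
        /* 640 */
        printf("SSTF: %d cylinders\n", total_sstf(53, copy, NREQ));
        /* 236 */
        return 0;
    }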