Class 19
CS 372H  03 April 2012

On the board
------------
1. Last time

2. LFS, continued

3. Scheduling
    --intro
    --disciplines
    --lessons and conclusions

---------------------------------------------------------------------------

1. Last time

    LFS:
        A. Intro
        B. Finding data
        C. Crash recovery
        E. Discussion

    [Also, welcome to Prof. Vladimir Lifschitz and Prof. Al Mok.]

2. LFS

    * a bit more about crash recovery
    * garbage collection

    C. Crash recovery, continued

        --recovery wrinkle #2: directory and inodes may not be consistent:
            --dir entry could point to inode, but inode not written
            --inode could be written with too-high link count (i.e., dir
              entry not yet written)
        --so what's the plan for fixing this?
        --log directory operations:
            [ "create"|"link"|"rename",
              dir inode #, position in dir,
              name of file,
              file inode #,
              new ref count ]
        --and make sure that these operations appear before the new
          directory block or new inode
        --for "link" and "rename", the dir op gives all the info needed
          to actually complete the operation
        --for "create", this is not the case. if "create" is in the log,
          but the new inode isn't, then the directory entry is removed
          on roll-forward

        [later on in the course, we will discuss crash recovery in a bit
        more detail.]

    D. Garbage collection (cleaning)

        --what if the log fills up? then we're in trouble. to avoid
          this, need to do *cleaning*.
        --approach: basically, compress the log, and leave some free
          space on the disk
        --use segments: contiguous regions of the log (1 MB in their
          implementation)
        --two data structures they maintain:
            --in-memory: segment usage table:
                [ <# free bytes> ]
            --on disk, per-segment: segment summary (a table indexed by
              entry in the segment, so the first entry in the table
              gives info about the first entry in the segment)
                [ ]
        --okay, which segments should we clean? and do you want the
          utilization to be high or low when you clean?
        --observe: if the utilization of a segment is 0 (as indicated by
          the segment usage table), cleaning it is really easy: just
          overwrite the segment!
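To make "which segments should we clean?" concrete, here is a minimal sketch of a cleaner's segment-selection step. The data layout and names are illustrative (not from the LFS implementation); the ranking uses the paper's benefit/cost metric, (1-u)*age/(1+u), discussed next, and treats empty segments as free wins:

```python
# Sketch: ranking segments for cleaning. Fields are illustrative:
#   u   = fraction of the segment's bytes still live (from the segment
#         usage table)
#   age = estimate of how long the segment's live data has been stable
# benefit/cost = (1-u)*age / (1+u): cost 1 to read the segment in plus
# u to write the live data back; benefit is the (1-u) reclaimed space,
# weighted by how long it should stay reclaimed (age).

def benefit_over_cost(u, age):
    if u == 0.0:
        return float('inf')  # nothing live: just overwrite, no copying
    return (1.0 - u) * age / (1.0 + u)

def pick_segment_to_clean(segments):
    """segments: list of (segment_id, utilization, age) tuples."""
    return max(segments, key=lambda s: benefit_over_cost(s[1], s[2]))[0]

# A cold, fairly full segment can beat a hot, half-empty one:
#   cold: (1-0.8)*20/1.8 ~= 2.2   hot: (1-0.5)*1/1.5 ~= 0.33
segs = [("hot", 0.50, 1), ("cold", 0.80, 20)]
print(pick_segment_to_clean(segs))  # prints "cold"
```

The zero-utilization special case mirrors the observation above: such a segment can simply be overwritten, so it should always win.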
        --if utilization is very low, that's a good sign: clean that
          segment (rewriting it is very little work)
        --but what if utilization is high? can we clean the segment?
        --insight: yes! provided what? (provided that the segment
          generally has a lot of "cold", that is, unchanging, data).
          the insight is that:
            --because the segment is cold, it's never going to get to a
              low utilization
            --at the same time, because it's cold, you're not wasting
              work by compressing it. the segment will "stay
              compressed".
        --they analyze bang for the buck: how long after compressing
          will the data stick around to justify the work we did to
          clean and compress it?

            benefit/cost = (1-u)*age / (1+u)

            --cost: 1+u (1 to read the segment in, u to write the live
              data back)
            --benefit: "1-u" is the fraction of blocks we're taking
              back, times "age". The "age" is an estimate of how long
              the compacted blocks will stay compact. It may seem
              counter-intuitive to multiply these together, but this
              particular metric is just a rough guide anyway. The idea
              is that two factors matter: how long the compacted blocks
              will stay compact (captured by age) and how many blocks
              we actually got by compacting the segment (captured by
              1-u). The notion that we "keep" the blocks for their
              "age" (as stated in the paper) isn't literally right,
              because we don't know what will happen once the blocks
              are pressed back into service. However, the metric
              captures the idea that it's worthwhile to compact old
              data, even if it's in a highly utilized segment, because
              it will be a while before the old blocks need to be
              re-compacted.
        --figure 7 is telling us what effect the cost-benefit policy
          has on _write cost_ (the ultimate metric of interest).

---------------------------------------------------------------------------

Admin announcement

    --moving to scheduling
    --regardless of your background, you should work through hw8

---------------------------------------------------------------------------

3. Scheduling intro

    A. When do scheduling decisions happen?

                                             exit (iv)
                                         |-------->[terminated]
            admitted          interrupt (ii)
       [new] --> [ready] <---------------- [running]
                    ^     ---------------->    |
                     \    scheduler dispatch   |
       I/O or event   \                        | I/O or event
       completion      \        _______________| wait (i)
          (iii)         \      /
                         \    v
                         [waiting]

       scheduling decisions take place when a process:
           (i)   switches from running to waiting state
           (ii)  switches from running to ready state
           (iii) switches from waiting to ready
           (iv)  exits

       preemptive scheduling: decisions at all four points
       nonpreemptive scheduling: decisions at points (i) and (iv) only
           (this is the definition of nonpreemptive scheduling)

    B. What are metrics and criteria?

        --system throughput
            # of processes that complete per unit time
        --turnaround time
            time for each process to complete
        --response time
            time from request to first response (e.g., key press to
            character echo, not launch to exit)
        --fairness
            different possible definitions:
                --freedom from starvation
                --all users get equal time on the CPU
                --highest-priority jobs get most of the CPU
                --etc.
            [often conflicts with efficiency. true in life as well.]

        the above are affected by secondary criteria:
        --CPU utilization (fraction of time the CPU is actually
          working)
        --waiting time (time each process waits in the ready queue;
          this is pretty much the same thing as response time)

    C. Context switching costs

        --CPU time in the kernel
            --save and restore registers
            --switch address spaces
        --indirect costs
            --TLB shootdowns, processor cache, OS caches (e.g., buffer
              caches)
        --result: more frequent context switches lead to worse
          throughput (higher overhead)

4. Scheduling disciplines

    A. FCFS/FIFO

        --run each job until it's done
        --P1 needs 24 seconds
        --P2 needs 3 seconds
        --P3 needs 3 seconds
        --[ P1 P2 P3 ]
        --throughput: 3 jobs / 30 seconds = .1 jobs/sec
        --average turnaround time?
            (1/3)(24 + 27 + 30) = 27 seconds
        --observe: scheduling P2, P3, P1 would drastically reduce
          average turnaround time, to (1/3)(3 + 6 + 30) = 13 seconds
        --advantages of FCFS:
            --simple
            --no starvation
            --few context switches
        --disadvantage:
            --short jobs get stuck behind long ones

        ***
        Larger issue: I/O vs. computation

        --jobs contain bursts of computation, then must wait for I/O
        --to maximize throughput:
            --must maximize CPU utilization *and*
            --must maximize I/O device utilization
        --how?
            --overlap I/O and computation from multiple jobs
            --means *response time* is very important for I/O-intensive
              jobs: the I/O device will be idle until some job gets a
              small bit of CPU to issue its next I/O request

        Most CPU bursts are small; a few are very long.

        What are the implications for FCFS?
        --CPU-bound jobs will hold the CPU until exit or I/O (but I/O
          is rare for a CPU-bound process)
        --long periods where no I/O requests are issued, and the CPU is
          held
        --result: poor I/O device utilization

        Example: one CPU-bound job, many I/O-bound jobs
            --CPU-bound job runs (I/O devices idle)
            --CPU-bound job blocks
            --I/O-bound job(s) run, quickly block on I/O
            --CPU-bound job runs again
            --I/O completes
            --CPU-bound job continues while I/O devices are idle

        Simple hack: run the process whose I/O just completed?
            --What is a potential problem? (Answer: when the CPU-bound
              job issues an I/O request, it gets the CPU again and then
              bursts for a while, leaving us back where we started.)
        ***

    B. Round-robin

        --add a timer
        --preempt the CPU from long-running jobs, per time slice or
          quantum
        --after its time slice, a job goes to the back of the ready
          queue
        --most OSes do something of this flavor
        --JOS does something like this, as you saw in lab 4
        --advantages:
            --fair allocation of CPU across jobs
            --low average waiting time when job lengths vary
            --good for responsiveness if the number of jobs is small
        --disadvantages:
            --what if jobs are the same length?
                --example: 2 jobs of 50 time units each, quantum of 1
                --average completion time: ~100 (the jobs finish at
                  times 99 and 100) vs. 75 for FCFS (50 and 100)
            --how to choose the quantum size?
                --want it much larger than the context switch cost
                --the majority of bursts should be shorter than the
                  quantum
                --pick it too small, and we spend too much time context
                  switching
                --pick it too large, and response time suffers (extreme
                  case: the system reverts to FCFS)
                --typical time slices are between 10 and 100
                  milliseconds; context switches usually cost
                  microseconds or tens of microseconds (maybe hundreds)

    C. SJF (shortest job first)

        --STCF: shortest time to completion first
            --schedule the job whose next CPU burst is the shortest
        --SRTCF: shortest remaining time to completion first
            --preemptive version of STCF: if a job arrives that has a
              shorter time to completion than the remaining time on the
              current job, immediately preempt the CPU and give it to
              the new job
        --idea:
            --get short jobs out of the system
            --big (positive) effect on short jobs, small (negative)
              effect on large jobs
            --result: minimizes average waiting time for a given set of
              processes (can prove this)

        --example 1:

            process     arrival time    burst time
            P1          0               7
            P2          2               4
            P3          4               1
            P4          5               4

            preemptive (SRTCF) schedule, one slot per time unit:
            P1 P1 P2 P2 P3 P2 P2 P4 P4 P4 P4 P1 P1 P1 P1 P1

        --example 2: 3 jobs

            A, B: both CPU-bound, each runs for a week
            C: I/O-bound, loops: 1 ms of CPU, then 10 ms of disk I/O

            by itself, C uses ~90% of the disk
            by itself, A or B uses 100% of the CPU

            what happens if we use FIFO?
                --if A or B gets in first, it keeps the CPU for a week,
                  then the other runs; C makes essentially no progress
                  for two weeks

            what about RR with a 100 ms time slice?
                --only get ~5% disk utilization: C's 10 ms of disk I/O
                  happens only once per ~200 ms of A's and B's quanta

            what about RR with a 1 ms time slice?
                --get nearly 90% disk utilization
                --but lots of preemptions

            with SRTCF:
                --no needless preemptions
                --get high disk utilization (C's 1 ms burst is always
                  shortest, so it runs as soon as its I/O completes)

        --SRTCF advantages:
            --optimal response time (minimum waiting time)
            --low overhead
        --disadvantages:
            --not hard to get unfairness or starvation (of long-running
              jobs)
            --does not optimize turnaround time (only waiting time)
            --** requires predicting the future **

              so it's useful as a yardstick for measuring other
              policies

              (a good way to do CS design and development: figure out
              what the absolute best answer is, then figure out how to
              approximate it)

              however, we can attempt to estimate the future based on
              the past (another thing that people do when designing
              systems):
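One standard way to estimate the future from the past here (the notes break off, so treat the specifics as an assumption) is an exponential average of previous burst lengths: tau_next = alpha * last_burst + (1 - alpha) * tau_prev. A minimal sketch, with an illustrative alpha and initial guess:

```python
# Sketch: predicting a process's next CPU burst by exponentially
# averaging its past bursts. alpha = 0.5 and the initial guess of 10.0
# are illustrative choices, not from the notes.

def update_estimate(tau_prev, last_burst, alpha=0.5):
    # Blend the most recent burst with the running estimate.
    return alpha * last_burst + (1 - alpha) * tau_prev

tau = 10.0                       # initial guess for the first burst
for burst in [6.0, 4.0, 4.0]:    # observed burst lengths
    tau = update_estimate(tau, burst)
print(tau)  # 5.0: the estimate has drifted toward the recent short bursts
```

The scheduler would then run SJF/SRTCF against each process's current estimate tau rather than against the (unknowable) true next burst; larger alpha tracks recent behavior more aggressively, smaller alpha smooths over it.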