Class 16
CS 372H  22 March 2011

On the board
------------
1. Preview
    --I/O, disks, file systems, transactions
2. JOS hints for 4B
3. I/O
4. Disks

[draw arch. pictures]

---------------------------------------------------------------------------

1. Preview

2. JOS hints for 4B

    recall what the kernel does on an exception/trap/hardware interrupt:
        --push the current processor state onto the kernel's exception
          stack; start running the kernel at the trap handler

    A. hint: in lab 4B, I recommend doing exercise 6 before exercise 4
       (that is, I recommend the following order: 3, 6, 4, 5)

    B. hint: what is going on with user space handling page faults in
       lab 4B?

        --basically, when a page fault happens, some code running in user
          space is going to handle the page fault. this requires a few
          things:

            --an exception stack for user-space programs

            --the kernel creates the analog of a trap frame (called a
              UTrapframe) for the user-space handler, and invokes the
              user handler
                --this is a bit tricky

            --a stub for returning back to the code that was originally
              executing at the time of the page fault (which could have
              been either normal user-level code or the page fault
              handler itself)
                --this is the hardest part of the lab; you probably need
                  to draw a picture of what's going on in order to code
                  this part

        --here is an analogy:

                                  normal trap/exc/int    user-level handling

          who does the setup?     CPU                    JOS

          what code is invoked?   interrupt handler      page fault handler

          how does the CPU/       IDT                    env->env_pgfault_upcall
            kernel find it?

          what is the handler     struct Trapframe       struct UTrapframe
            passed?

          who sets up these       CPU (with some help    you, in ex. 4
            structs?              from trapentry.S)

          how does the handler    env_run -->            using your solution
            return?               env_pop_tf --> iret    to ex. 5

3. I/O

    * architecture
    * communicating with devices
    * device drivers

    A. architecture

        [draw logical picture of CPU/memory/crossbar]

        --the CPU accesses physical memory over a bus
        --devices access memory over the I/O bus
        --devices can appear to be a region of memory
            --recall the 640K-1MB region, from early classes
            --and the hole in memory for PCI

        [draw PC architecture picture]
        [draw picture of the I/O bus]

    B. communicating with a device

        (a) Memory-mapped device registers
            --Certain _physical_ addresses correspond to device registers
            --A load/store gets status/sends instructions -- not real
              memory

        (b) Device memory
            --the device may have memory that the OS can write to
              directly, on the other side of the I/O bus

        (c) Special I/O instructions
            --Some CPUs (e.g., the x86) have special I/O instructions
            --Like load & store, but they assert a special I/O pin on
              the CPU
            --The OS can allow user-mode access to I/O ports at finer
              granularity than a page

        (d) DMA: place instructions to the card in main memory
            --Typically you then need to "poke" the card by writing to a
              register
            --Overlaps unrelated computation with moving data over the
              I/O bus (which is typically slower than memory)

            how it works (roughly):

                [buffer descriptor list] --> [ buf ] --> [ buf ] ....

            the card knows where to find the descriptor list. then it can
            access the buffers with DMA.

            (i) example: network interface card

                         | I/O bus
                    -------
                    [ bus interface    link interface ] --> network link
                         |                  |

                --the link interface talks to the wire/fiber/antenna
                    --typically does framing and the link-layer CRC
                --FIFO queues on the card provide a small amount of
                  buffering
                --bus interface logic uses DMA to move packets to and
                  from buffers in main memory

            (ii) example: IDE disk read with DMA

                [draw picture]

    C. Device drivers

        * entry points
        * synchronization
            --polling
            --interrupts

        --A device driver provides several entry points to the kernel
            --example: reset, ioctl, output, read, write, **interrupt**
        --when you write a driver, you are implementing this interface,
          and also calling functions that the kernel itself exposes
        --purpose of a driver: abstract the nasty hardware so that the
          kernel doesn't have to understand all of its details.
          the kernel just knows that it has a device that exposes calls
          like "read" and "write", and that the device can interrupt the
          kernel

        --How should the driver synchronize with the device?

            examples:
                --need to know when transmit buffers are free or when
                  packets arrive
                --need to know when a disk request is complete

            [note: the device doesn't care much which of the following
            two options is in effect: interrupts are an abstraction that
            sits between the device and the CPU. the question here is
            about the logic in the driver and the interrupt controller.]

            --Approach 1: **Polling**
                --Sent a packet? Loop, asking the card when the buffer
                  is free
                --Waiting to receive? Keep asking the card whether it has
                  a packet
                --Disk I/O? Keep looping until the disk's ready bit is
                  set
                --What are the disadvantages of polling? (There is a
                  trade-off between wasting CPU cycles [the CPU can't do
                  anything else while polling] and high latency [if the
                  poll is scheduled for the future but, say, a packet is
                  ready or a disk block has arrived].)

            --Approach 2: **Interrupt-driven**
                --ask the card to interrupt the CPU on events
                --the interrupt handler runs at high priority
                --it asks the card what happened (xmit buffer free, new
                  packet)
                --this is what most general-purpose OSes do.
                  Nevertheless...
                --...it's important to understand the following; you'll
                  probably run into this issue if you build systems that
                  need to run at high speed:
                    --interrupts are actually bad at high data arrival
                      rates. classically this issue comes up with network
                      cards:
                        --packets can arrive faster than the OS can
                          process them
                        --interrupts are very expensive (context switch)
                        --interrupt handlers have high priority
                        --in the worst case, the system can spend 100% of
                          its time in the interrupt handler and never
                          make any progress. this phenomenon is known as
                          *receive livelock*.
                    --the best approach: start with interrupts. if you
                      need high performance and interrupts are slowing
                      you down, use polling. if you then notice that
                      polling is chewing up too many CPU cycles, move to
                      adaptive switching between interrupts and polling.
        --interrupts are great for disk requests.

    --we'll talk about disks now, and about network devices in a few
      weeks and in lab 6

4. Disks

    A. What is a disk?
    B. Geometry
    C. Performance
    D. Common #s
    E. [next time] How the driver interfaces to the disk
    F. [next time] Performance II
    G. [next time] Disk scheduling (performance III)
    H. [next time] Technology and systems trends

    Disks are *the* bottleneck in many systems

    [Reference: "An Introduction to Disk Drive Modeling", by Chris
    Ruemmler and John Wilkes. IEEE Computer, Vol. 27, No. 3, March 1994,
    pp. 17-28.]

    A. What is a disk?

        --a stack of magnetic platters
            --they rotate together on a central spindle at 3,600-15,000
              RPM
            --drive speed drifts slowly over time
            --you can't predict the rotational position after 100-200
              revolutions

            -------------
            |  platter  |
            -------------
                  |
                  |
            -------------
            |  platter  |
            -------------
                  |
                  |
            -------------
            |  platter  |
            -------------
                  |

        --disk arm assembly
            --the arms rotate around a pivot; all move together
            --the pivot offers some resistance to linear shocks
            --the arms contain disk heads--one for each recording surface
            --the heads read and write data to the platters

    B. Geometry of a disk

        --track: a circle on a platter. each platter is divided into
          concentric tracks.

        --sector: a chunk of a track

        --cylinder: the locus of all tracks of fixed radius on all
          platters

        --the heads are roughly lined up on a cylinder

        --a significant fraction of the encoded stream is used for error
          correction

        --generally only one head is active at a time
            --disks usually have one set of read-write circuitry
            --must worry about cross-talk between channels
            --hard to keep multiple heads exactly aligned

        --disk positioning system
            --moves the head to a specific track and keeps it there
            --resists physical shocks, imperfect tracks, etc.
            --a **seek** consists of up to four phases:
                --*speedup*: accelerate the arm to max speed or to the
                  halfway point
                --*coast*: travel at max speed (for long seeks)
                --*slowdown*: stop the arm near the destination
                --*settle*: adjust the head to the actual desired track

    C. Performance

        (important to understand this if you are building systems that
        need good performance)

        the components of a transfer: rotational delay, seek delay,
        transfer time.

            rotational delay: time for the sector to rotate under the
            disk head

            seek: speedup, coast, slowdown, settle

            transfer time: will discuss

        seeks in a bit of detail now:

            --seeking track-to-track: comparatively fast (~1 ms). mainly
              settle time
            --short seeks (200-400 cylinders) are dominated by speedup
                --BTW, the arm can accelerate at up to several hundred g
            --longer seeks are dominated by coast
            --head switches are comparable in cost to short seeks
            --settle takes longer for writes than for reads. why?
                --because if a read strays, the error will be caught, and
                  the disk can retry
                --if a write strays, some other track just got clobbered.
                  so write settles need to be done precisely
            --note: the quoted "average seek time" can mean many things:
                --the time to seek 1/3 of the way across the disk
                --1/3 of the time to seek the whole way across the disk
                --(convince yourself that those may not be the same)

    D. Common #s

        --capacity: 100s of GB
        --platters: 8
        --number of cylinders: tens of thousands or more
        --sectors per track: ~1000
        --RPM: 10,000
        --transfer rate: 50-85 MB/s
        --mean time between failures: ~1 million hours

            (for disks in data centers, it's vastly less; for a provider
            like Google, even if they had very reliable disks, they'd
            still need an automated way to handle failures, because
            failures would be common (imagine 2 million disks: *some*
            will be on the fritz at any given moment). so what they do is
            buy cheap, less reliable disks, which saves on hardware
            costs. they get away with it because they needed the software
            and systems -- replication and other fault-tolerance
            schemes -- to handle failures anyway.)

5. Exam

    --scores were a bit lower than I'd expected

    --some statistics:
        --median: 60
        --mean: 59.2
        --stddev: 13.5
        --mode: 67
        --high score: 81.5

    --interpretation:
        --no letter grades yet (sorry)
        --don't panic if you're not happy with your score; there is lots
          of opportunity to bring things up
        --regardless of how you think you did, *please* make sure you
          understand all of the answers; the solutions are posted on the
          course Web page, and they are intended to be helpful here
            --please read the solutions and make sure you understand them
            --on the final, you'll be writing multithreaded code

    --some notes about the grading:
        --we weren't hugely generous with partial credit, especially when
          an answer indicated a misunderstanding. that's for several
          reasons:
            --we want to be clear with you about what you do and don't
              understand, as reflected in what you write
            --we want to be fair to those who got the question right
        --the ground rules were that attempting a problem started at 0
          points
        --if you have questions, let me know. we tried to be careful, but
          it's possible we made mistakes
        --please note that a regrade request will generate a regrade of
          the entire exam