Class 16
CS 372H  22 March 2011

On the board
------------
1. Preview
    --I/O, disks, file systems, transactions
2. JOS hints for 4B
3. I/O
4. Disks

[draw arch. pictures]

---------------------------------------------------------------------------

1. Preview

2. JOS hints for 4B

    recall what the kernel does on an exception/trap/hardware interrupt:
        --push the current processor state onto the kernel's exception
          stack; start running the kernel at the trap handler

    A. hint: in lab 4B, I recommend doing exercise 6 before exercise 4
       (that is, I recommend the following order: 3, 6, 4, 5)

    B. hint: what is going on with user space handling page faults in
       lab 4B?

        --basically, when a page fault happens, some code running in user
          space is going to handle the page fault. this requires a few
          things:

            --an exception stack for user-space programs

            --the kernel creates the analog of a trap frame (called a
              UTrapframe) for the user-space handler, and invokes the
              user handler
                --this is a bit tricky

            --a stub for returning back to the code that was originally
              executing at the time of the page fault (which could have
              been either normal user-level code or the page fault
              handler itself)
                --this is the hardest part of the lab; you probably need
                  to draw a picture of what's going on in order to code
                  this part

        --here is an analogy:

                                  normal trap/exc/int    user-level handling

          who does the setup?     CPU                    JOS

          what code is invoked?   interrupt handler      page fault handler

          how does the CPU/       IDT                    env->env_pgfault_upcall
            kernel find it?

          what is the handler     struct Trapframe       struct UTrapframe
            passed?

          who sets up these       CPU (with some help    you, in ex. 4
            structs?              from trapentry.S)

          how does the handler    env_run -->            using your solution
            return?               env_pop_tf --> iret    to ex. 5

3. I/O

    * architecture
    * communicating with devices
    * device drivers

    A. architecture

        [draw logical picture of CPU/memory/crossbar]

        --the CPU accesses physical memory over a bus
        --devices access memory over the I/O bus
        --devices can appear to be a region of memory
            --recall the 640K-1MB region, from early classes
            --and the hole in memory for PCI

        [draw PC architecture picture]
        [draw picture of the I/O bus]

    B. communicating with a device

        (a) Memory-mapped device registers
            --Certain _physical_ addresses correspond to device registers
            --A load/store gets status/sends instructions -- not real
              memory

        (b) Device memory
            --the device may have memory that the OS can write to
              directly, on the other side of the I/O bus

        (c) Special I/O instructions
            --Some CPUs (e.g., the x86) have special I/O instructions
            --Like load & store, but they assert a special I/O pin on
              the CPU
            --The OS can allow user-mode access to I/O ports at finer
              granularity than a page

        (d) DMA: place instructions to the card in main memory
            --Typically you then need to "poke" the card by writing to a
              register
            --Overlaps unrelated computation with moving data over the
              I/O bus (which is typically slower than memory)

            how it works (roughly):

                [buffer descriptor list] --> [ buf ] --> [ buf ] ....

            the card knows where to find the descriptor list. then it can
            access the buffers with DMA.

            (i) example: network interface card

                         | I/O bus
                    -------
                    [ bus interface    link interface ] --> network link
                         |                  |

                --the link interface talks to the wire/fiber/antenna
                    --typically does framing and the link-layer CRC
                --FIFO queues on the card provide a small amount of
                  buffering
                --bus interface logic uses DMA to move packets to and
                  from buffers in main memory

            (ii) example: IDE disk read with DMA

                [draw picture]

    C. Device drivers

        * entry points
        * synchronization
            --polling
            --interrupts

        --A device driver provides several entry points to the kernel
            --example: reset, ioctl, output, read, write, **interrupt**
        --when you write a driver, you are implementing this interface,
          and also calling functions that the kernel itself exposes
        --purpose of a driver: abstract the nasty hardware so that the
          kernel doesn't have to understand all of its details.
          the kernel just knows that it has a device that exposes calls
          like "read" and "write", and that the device can interrupt the
          kernel

        --How should the driver synchronize with the device?

            examples:
                --need to know when transmit buffers are free or when
                  packets arrive
                --need to know when a disk request is complete

            [note: the device doesn't care much which of the following
            two options is in effect: interrupts are an abstraction that
            sits between the device and the CPU. the question here is
            about the logic in the driver and the interrupt controller.]

            --Approach 1: **Polling**
                --Sent a packet? Loop, asking the card when the buffer
                  is free
                --Waiting to receive? Keep asking the card whether it has
                  a packet
                --Disk I/O? Keep looping until the disk's ready bit is
                  set
                --What are the disadvantages of polling? (There is a
                  trade-off between wasting CPU cycles [the CPU can't do
                  anything else while polling] and high latency [if the
                  poll is scheduled for the future but, say, a packet is
                  ready or a disk block has arrived].)

            --Approach 2: **Interrupt-driven**
                --ask the card to interrupt the CPU on events
                --the interrupt handler runs at high priority
                --it asks the card what happened (xmit buffer free, new
                  packet)
                --this is what most general-purpose OSes do.
                  Nevertheless...
                --...it's important to understand the following; you'll
                  probably run into this issue if you build systems that
                  need to run at high speed:
                    --interrupts are actually bad at high data arrival
                      rates. classically this issue comes up with network
                      cards:
                        --packets can arrive faster than the OS can
                          process them
                        --interrupts are very expensive (context switch)
                        --interrupt handlers have high priority
                        --in the worst case, the system can spend 100% of
                          its time in the interrupt handler and never
                          make any progress. this phenomenon is known as
                          *receive livelock*.
                    --the best approach: start with interrupts. if you
                      need high performance and interrupts are slowing
                      you down, use polling. if you then notice that
                      polling is chewing up too many CPU cycles, move to
                      adaptive switching between interrupts and polling.
        --interrupts are great for disk requests.

    --we'll talk about disks now, and about network devices in a few
      weeks and in lab 6

4. Disks

    A. What is a disk?
    B. Geometry
    C. Performance
    D. Common #s
    E. [next time] How the driver interfaces to the disk
    F. [next time] Performance II
    G. [next time] Disk scheduling (performance III)
    H. [next time] Technology and systems trends

    Disks are *the* bottleneck in many systems

    [Reference: "An Introduction to Disk Drive Modeling", by Chris
    Ruemmler and John Wilkes. IEEE Computer, Vol. 27, No. 3, March 1994,
    pp. 17-28.]

    A. What is a disk?

        --a stack of magnetic platters
            --they rotate together on a central spindle at 3,600-15,000
              RPM
            --drive speed drifts slowly over time
            --you can't predict the rotational position after 100-200
              revolutions

            -------------
            |  platter  |
            -------------
                  |
                  |
            -------------
            |  platter  |
            -------------
                  |
                  |
            -------------
            |  platter  |
            -------------
                  |

        --disk arm assembly
            --the arms rotate around a pivot; all move together
            --the pivot offers some resistance to linear shocks
            --the arms contain disk heads--one for each recording surface
            --the heads read and write data to the platters

    B. Geometry of a disk

        --track: a circle on a platter. each platter is divided into
          concentric tracks.

        --sector: a chunk of a track

        --cylinder: the locus of all tracks of fixed radius on all
          platters

        --the heads are roughly lined up on a cylinder

        --a significant fraction of the encoded stream is used for error
          correction

        --generally only one head is active at a time
            --disks usually have one set of read-write circuitry
            --must worry about cross-talk between channels
            --hard to keep multiple heads exactly aligned

        --disk positioning system
            --moves the head to a specific track and keeps it there
            --resists physical shocks, imperfect tracks, etc.
            --a **seek** consists of up to four phases:
                --*speedup*: accelerate the arm to max speed or to the
                  halfway point
                --*coast*: travel at max speed (for long seeks)
                --*slowdown*: stop the arm near the destination
                --*settle*: adjust the head to the actual desired track

    C. Performance

        (important to understand this if you are building systems that
        need good performance)

        the components of a transfer: rotational delay, seek delay,
        transfer time.

            rotational delay: time for the sector to rotate under the
            disk head

            seek: speedup, coast, slowdown, settle

            transfer time: will discuss

        seeks in a bit of detail now:

            --seeking track-to-track: comparatively fast (~1 ms). mainly
              settle time
            --short seeks (200-400 cylinders) are dominated by speedup
                --BTW, the arm can accelerate at up to several hundred g
            --longer seeks are dominated by coast
            --head switches are comparable in cost to short seeks
            --settle takes longer for writes than for reads. why?
                --because if a read strays, the error will be caught, and
                  the disk can retry
                --if a write strays, some other track just got clobbered.
                  so write settles need to be done precisely
            --note: the quoted "average seek time" can mean many things:
                --the time to seek 1/3 of the way across the disk
                --1/3 of the time to seek the whole way across the disk
                --(convince yourself that those may not be the same)

    D. Common #s

        --capacity: 100s of GB
        --platters: 8
        --number of cylinders: tens of thousands or more
        --sectors per track: ~1000
        --RPM: 10,000
        --transfer rate: 50-85 MB/s
        --mean time between failures: ~1 million hours

            (for disks in data centers, it's vastly less; for a provider
            like Google, even if they had very reliable disks, they'd
            still need an automated way to handle failures, because
            failures would be common (imagine 2 million disks: *some*
            will be on the fritz at any given moment). so what they do is
            buy cheap, less reliable disks, which saves on hardware
            costs. they get away with it because they needed the software
            and systems -- replication and other fault-tolerance
            schemes -- to handle failures anyway.)

5. Exam

    --scores were a bit lower than I'd expected

    --some statistics:
        --median: 60
        --mean: 59.2
        --stddev: 13.5
        --mode: 67
        --high score: 81.5

    --interpretation:
        --no letter grades yet (sorry)
        --don't panic if you're not happy with your score; there is lots
          of opportunity to bring things up
        --regardless of how you think you did, *please* make sure you
          understand all of the answers; the solutions are posted on the
          course Web page, and they are intended to be helpful here
            --please read the solutions and make sure you understand them
            --on the final, you'll be writing multithreaded code

    --some notes about the grading:
        --we weren't hugely generous with partial credit, especially when
          an answer indicated a misunderstanding. that's for several
          reasons:
            --we want to be clear with you about what you do and don't
              understand, as reflected in what you write
            --we want to be fair to those who got the question right
        --the ground rules were that attempting a problem started at 0
          points
        --if you have questions, let me know. we tried to be careful, but
          it's possible we made mistakes
        --please note that a regrade request will generate a regrade of
          the entire exam