Start Lecture #14
The device-independent code cantains most of the I/O functionality, but not most of the code since there are very many drivers. All drivers of the same class (say all hard disk drivers) do essentially the same thing in slightly different ways due to slightly different controllers.
As stated above the bulk of the OS code is made of device drivers and thus it is important that the task of driver writing not be made more difficult than needed. As a result each class of devices (e.g. the class of all disks) has a defined driver interface to which all drivers for that class of device conform. The device independent I/O portion processes user requests and calls the drivers.
Naming is again an important O/S functionality. In addition it offers a consistent interface to the drivers. The Unix method works as follows
specialfile in the /dev directory.
specialfiles and also contain so called major and minor device numbers.
allan dev # ls -l /dev/sd* brw-r----- 1 root disk 8, 0 Apr 25 09:55 /dev/sda brw-r----- 1 root disk 8, 1 Apr 25 09:55 /dev/sda1 brw-r----- 1 root disk 8, 2 Apr 25 09:55 /dev/sda2 brw-r----- 1 root disk 8, 3 Apr 25 09:55 /dev/sda3 brw-r----- 1 root disk 8, 4 Apr 25 09:55 /dev/sda4 brw-r----- 1 root disk 8, 5 Apr 25 09:55 /dev/sda5 brw-r----- 1 root disk 8, 6 Apr 25 09:55 /dev/sda6 brw-r----- 1 root disk 8, 16 Apr 25 09:55 /dev/sdb brw-r----- 1 root disk 8, 17 Apr 25 09:55 /dev/sdb1 brw-r----- 1 root disk 8, 18 Apr 25 09:55 /dev/sdb2 brw-r----- 1 root disk 8, 19 Apr 25 09:55 /dev/sdb3 brw-r----- 1 root disk 8, 20 Apr 25 09:55 /dev/sdb4 allan dev #
Protection. A wide range of possibilities are actually done in real systems. Including both extreme examples of everything is permitted and nothing is (directly) permitted.
Buffering is necessary since requests come in a size specified by the user and data is delivered by reads and accepted by writes in a size specified by the device. It is also important so that a user process using getchar() is not blocked and unblocked for each character read.
The text describes double buffering and circular buffers, which are important programming techniques, but are not specific to operating systems.
The system must enforce exclusive access for non-shared devices like CD-ROMs.
A good deal of I/O software is actually executed by unprivileged code running in user space. This code includes library routines linked into user programs, standard utilities, and daemon processes.
If one uses the strict definition that the operating system consists of the (supervisor-mode) kernel, then this I/O code is not part of the OS. However, very few use this strict definition.
Some library routines are very simple and just move their arguments into the correct place (e.g., a specific register) and then issue a trap to the correct system call to do the real work.
I think everyone considers these routines to be part of the operating system. Indeed, they implement the published user interface to the OS. For example, when we specify the (Unix) read system call by
count = read (fd, buffer, nbytes)as we did in chapter 1, we are really giving the parameters and accepting the return value of such a library routine.
Although users could write these routines, it would make their programs non-portable and would require them to write in assembly language since neither trap nor specifying individual registers is available in high-level languages.
Other library routines, notably standard I/O (stdio) in Unix, are definitely not trivial. For example consider the formatting of floating point numbers done in printf and the reverse operation done in scanf.
In unix-like systems the graphics libraries and the gui itself are outside the kernel. Graphics libraries are quite large and complex. In windows, the gui is inside the kernel.
Printing to a local printer is often performed in part by a regular program (lpr in Unix) that copies (or links) the file to a standard place, and in part by a daemon (lpd in Unix) that reads the copied files and sends them to the printer. The daemon might be started when the system boots.
Note that this implementation of printing uses spooling, i.e., the file to be printed is copied somewhere by lpr and then the daemon works with this copy. Mail uses a similar technique (but generally it is called queuing, not spooling).
The diagram on the right shows the various layers and some of the actions that are performed by each layer.
The arrows show the flow of control. The blue downward arrows show the execution path made by a request from user space eventually reaching the device itself. The red upward arrows show the response, beginning with the device supplying the result for an input request (or a completion acknowledgement for an output request) and ending with the initiating user process receiving its response.
Homework: 11, 13.
The ideal storage device is
When compared to central memory, disks are big and cheap, but slow.
Show a real disk opened up and illustrate the components.
Consider the following characteristics of a disk.
Remark: A practice final is available off the web page. Note the format and good luck.
Overlapping I/O operations is important when the system has more than one disk. Many disk controllers can do overlapped seeks, i.e. issue a seek to one disk while another disk is already seeking.
As technology improves the space taken to store a bit decreases, i.e., the bit density increases. This changes the number of cylinders per inch of radius (the cylinders are closer together) and the number of bits per inch along a given track.
Despite what Tanenbaum says later, it is not true that when one head is reading from cylinder C, all the heads can read from cylinder C with no penalty. It is, however, true that the penalty is very small.
Current commodity disks for desktop computers (not for commodity laptops) require about 10ms. before transferring the first byte and then transfer about 40K bytes per ms. (if contiguous). Specifically
This is quite extraordinary. For a large sequential transfer, in the first 10ms, no bytes are transmitted; in the next 10ms, 400,000 bytes are transmitted. The analysis suggests using large disk blocks, 100KB or more.
But the internal fragmentation would be severe since many files are small. Moreover, transferring small files would take longer with a 100KB block size.
In practice typical block sizes are 4KB-8KB.
Multiple block sizes have been tried (e.g. blocks are 8KB but a
file can also have fragments
that are a fraction of
a block, say 1KB).
Some systems employ techniques to encourage consecutive blocks of a given file to be stored near each other. In the best case, logically sequential blocks are also physically sequential and then the performance advantage of large block sizes is obtained without the disadvantages mentioned.
In a similar vein, some systems try to cluster related
files
(e.g., files in the same directory).
Homework: Consider a disk with an average seek time of 5ms, an average rotational latency of 5ms, and a transfer rate of 40MB/sec.
Originally, a disk was implemented as a three dimensional array
Cylinder#, Head#, Sector#
The cylinder number determined the cylinder, the head number
specified the surface (recall that there is one head per surface),
i.e., the head number determined the track within the cylinder, and
the sector number determined the sector within the track.
But there is something wrong here. An outer track is longer (in centimeters) than an inner track, but each stores the same number of sectors. Essentially some space on the outer tracks was wasted.
Later disks lied. They said they had a virtual geometry as above, but really had more sectors on outer tracks (like a ragged array). The electronics on the disk converted between the published virtual geometry and the real geometry.
Modern disk continue to lie for backwards compatibility, but also support Logical Block Addressing in which the sectors are treated as a simple one dimensional array with no notion of cylinders and heads.
The name and its acronym RAID came from Dave Patterson's group at Berkeley. IBM changed the name to Redundant Array of Independent Disks. I wonder why?
The basic idea is to utilize multiple drives to simulate a single larger drive, but with added redundancy.
The different RAID configurations are often called different
levels, but this is not a good name since there is no hierarchy and
it is not clear that higher levels are better
than low ones.
However, the terminology is commonly used so I will follow the trend
and describe them level by level, but having very little to say
about some levels.
There are three components to disk response time: seek, rotational latency, and transfer time. Disk arm scheduling is concerned with minimizing seek time by reordering the requests.
These algorithms are relevant only if there are several I/O requests pending. For many PCs, the system is so underutilized that there are rarely multiple outstanding I/O requests and hence no scheduling is possible. At the other extreme, many large servers, are I/O bound with significant queues of pending I/O requests. For these systems, effective disk arm scheduling is crucial.
Although disk scheduling algorithms are performed by the OS, they are also sometimes implemented in the electronics on the disk itself. The disks I brought to class were somewhat old so I suspect those didn't implement scheduling, but the then-current operating systems definitely did.
We will study the following algorithms all of which are quite simple.
no scheduling, but I wouldn't.
Happy Days) and by elevators.
stolecoins since requesting an already requested song was a nop.
Once the heads are on the correct cylinder, there may be several
requests to service.
All the systems I know, use Scan based on sector numbers to retrieve
these requests.
Note that in this case Scan is the same as C-Scan.
Why?
Ans: Because the disk rotates in only one direction.
The above is certainly correct for requests to the same track. If requests are for different tracks on the same cylinder, a question arises of how fast the disk can switch from reading one track to another on the same cylinder. There are two components to consider.
parallel readoutdisk with with some network we had devised. Alas, a disk designer explained to me that the heads are not perfectly aligned with the tracks.
Homework: 24.
Homework: A salesman claimed that their version of Unix was very fast. For example, their disk driver used the elevator algorithm to reorder requests for different cylinders. In addition, the driver queued multiple requests for the same cylinder in sector order. Some hacker bought a version of the OS and tested it with a program that read 10,000 blocks randomly chosen across the disk. The new Unix was not faster that an old one that did FCFS for all requests. What happened?
Often the disk/controller caches (a significant portion of) the entire track whenever it access a block, since the seek and rotational latency penalties have already been paid. In fact modern disks have multi-megabyte caches that hold many recently read blocks. Since modern disks cheat and don't have the same number of blocks on each track, it is better for the disk electronics (and not the OS or controller) to do the caching since the disk is the only part of the system to know the true geometry.
Most disk errors are handled by the device/controller and not the
OS itself.
That is, disks are manufactured with more sectors than are
advertised and spares are used when a bad sector is referenced.
Older disks did not do this and the operating system would form
a secret file
of bad blocks that were never used.
Skipped.
The hardware is simple. It consists of
The counter reload can be automatic or under OS control. If it is done automatically, the interrupt occurs periodically (the frequency is the oscillator frequency divided by the value in the register).
The value in the register can be set by the operating system and thus this programmable clock can be configured to generate periodic interrupts and any desired frequency (providing that frequency divides the oscillator frequency).
As we have just seen, the clock hardware simply generates a periodic interrupt, called the clock interrupt, at a set frequency. Using this interrupt, the OS software can accomplish a number of important tasks.
Time of day (TOD).
The basic idea is to increment a counter each clock tick (i.e.,
each interrupt).
The simplest solution is to initialize this counter at boot time
to the number of ticks since a fixed date (Unix traditionally
uses midnight, 1 January 1970).
Thus the counter always contains the number of ticks since that
date and hence the current date and time is easily calculated.
Two problems.
Three methods are used for initialization. The system can contact one or more know time sources (see the Wikipedia entry for NTP), the human operator can type in the date and time, or the system can have a battery-powered, backup clock. The last two methods only give an approximate time.
Overflow is a real problem if a 32-bit counter is used. In this case two counters are kept, the low-order and the high-order. Only the low order is incremented each tick; the high order is incremented whenever the low order overflows. That is, a counter with more bits is simulated.
Time quantum for Round Robbin scheduling. The system decrements a counter at each tick. The quantum expires when the counter reaches zero. The counter is loaded when the scheduler runs a process (i.e., changes the state of the process from ready to running). This is what I (and I would guess you) did for the (processor) scheduling lab.
Accounting.
At each tick, bump a counter in the process table
entry for the currently running process.
Alarm system call and system alarms.
Users can request a signal at some future time (the Unix
alarm
system call).
The system also on occasion needs to schedule some of its
own activities to occur at specific times in the future
(e.g., exercise a network time out).
The conceptually simplest solution is to have one timer for each event. Instead, we simulate many timers with just one using the data structure on the right with one node for each event.
Profiling.
The objective is to obtain a histogram giving how much time
was spent in each software module of a given user program.
The program is logically divided into blocks of say 1KB and a counter is associated with each block. At each tick the profiled code checks the program counter and bumps the appropriate counter.
After the program is run, a (user-mode) utility program can determine the software module associated with each 1K block and present the fraction of execution time spent in each module.
If we use a finer granularity (say 10B instead of 1KB), we get increased accuracy but more memory overhead.
Homework: 28.
Skipped.
At each key press and key release a scan code
is written
into the keyboard controller and the computer is interrupted.
By remembering which keys have been depressed and not released the
software can determine Cntl-A, Shift-B, etc.
There are two fundamental modes of input, traditionally called
raw and cooked in Unix and now sometimes call
noncanonical
and canonical
in POSIX.
In raw mode the application sees every character
the user
types.
Indeed, raw mode is character oriented.
All the OS does is convert the keyboard scan codes
to characters
and and pass these characters to the
application.
For example
Cooked mode is line oriented. The OS delivers lines to the application program after cooking them as follows.
The (possibly cooked) characters must be buffered until the application issues a read (and an end-of-line EOL has been received for cooked mode).
Whenever the mouse is moved or a button is pressed, it sends a
message to the computer consisting of Δx, Δy, and the
status of the buttons.
That is all the hardware does.
Issues such as double click
vs. two clicks are all handled by
the software.
In the beginning these were essentially typewriters (called
glass ttys
) and therefore simply received a stream of
characters.
Soon after, they accepted commands (called escape sequences
)
that would position the cursor, insert and delete characters,
etc.
Read.
The End: Good luck on the final