NOTE: These notes are by Allan Gottlieb, and are
reproduced here, with superficial modifications, with his permission.
"I" in this text generally refers to Prof. Gottlieb, except
in regards to administrative matters.
================ Start Lecture #16
Dumping (backup) involves three choices:
- Full dump vs. incremental
- Physical dump vs. logical
- Full restoration vs. partial
A physical dump can only be done with a full dump and a full restoration.
A logical dump can be done with any combination of full or incremental dump
and with full or partial restoration.
Physical vs. Logical dumps.
Physical dumps: Start at the beginning of the disk and copy through to the end.
Full restoration: Copy back onto the disk. The disk ends up in exactly the
same state as when dumped.
Logical dumps:
- Full dump: Start at the root, and traverse the file system using tree search.
- Incremental dump: Find all parts of the file system that have been
changed since the last incremental dump and save those. Either traverse the
file system (much more work at dump time) or keep a log of
changes to files and directories (more work each time a change is made).
The result of a full restoration is that the file structure returns
to its previous state, but the actual locations on disk can be entirely
different.
Tricky issues in logical dumps:
- Full restoration. Find last full dump and do a full restore. Then
for each file and directory mentioned in a subsequent incremental dump,
find the most recent version in the incremental dumps, and load in that
version. Note that updating a directory may involve deleting a file.
- Partial restoration. Restore some specified files or directories.
Generally better to place these into a new directory rather than
to try to put them back where they came from.
Tanenbaum says that the free list is a problem, but I don't see this:
The restoration process should manipulate the free list like any
other process that creates and destroys files.
- Be careful not to dump things that merely look like files, e.g. I/O devices (special files).
- Directories. Disk addresses are not preserved, so directories
have to be saved symbolically, rather than simply copied.
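The full-restoration procedure above (replay the full dump, then each incremental dump in order, so the most recent version of each file wins) can be sketched in a few lines. This is a toy illustration: the dict-of-paths representation and the `None`-means-deleted convention are invented here, not a real dump format.

```python
# Minimal sketch of a full restoration from one full dump plus a
# sequence of incremental dumps.  Each dump is modeled as a dict
# mapping path -> file contents; a value of None in an incremental
# dump marks a file deleted since the previous dump (illustrative
# convention, not a real dump format).

def restore(full_dump, incremental_dumps):
    """Replay the full dump, then each incremental dump in order.
    Later dumps win, so each file ends up at its most recent version."""
    fs = dict(full_dump)                      # start from the full dump
    for inc in incremental_dumps:             # oldest to newest
        for path, contents in inc.items():
            if contents is None:
                fs.pop(path, None)            # file was deleted
            else:
                fs[path] = contents           # most recent version wins
    return fs

full = {"/a": "v1", "/b": "v1"}
incs = [{"/a": "v2"},                         # /a changed
        {"/b": None, "/c": "v1"}]             # /b deleted, /c created
print(restore(full, incs))                    # → {'/a': 'v2', '/c': 'v1'}
```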
While any kind of dump or restoration is taking place, the file system
must be "frozen": no changes are allowed, for fear of creating inconsistencies.
Since full dumps can take a long time, this is a substantial constraint.
Physical dumps are preferred:
- A. If you're doing a full dump and the disk is largely full, then
a physical dump is much faster (because of much reduced seek time).
- B. If the file system is corrupted, a logical dump may fail, since
it must traverse the (possibly damaged) file structure; a physical dump
does not interpret the structure at all.
- C. If ordinary files (i.e. not directories) contain disk addresses,
then a physical dump will preserve these; a logical dump will trash them.
Logical dumps are preferred:
- A. For incremental dumps or partial restores: Must use logical dump.
- B. If the disk is largely empty, a logical dump will be faster and
more space efficient.
- C. A logical dump allows disk reorganization (e.g. putting all
the blocks of each file consecutively); in fact, it will naturally
tend to do this.
- D. A physical dump saves lots of non-file blocks (free space,
swapping space, etc.), which a logical dump skips.
- E. For restoring to a disk with a different structure: Must use logical dump.
Full restoration: Go back to the last full dump, reload it, then reload all
incremental dumps since then, in sequence.
Restoring 1 file F: Delete current version of F, find most recent
incremental dump when F was saved, load F.
- Fsck (file system check) and chkdsk (check disk)
- If the system crashed, it is possible that not all metadata was
written to disk. As a result the file system may be inconsistent.
These programs check, and often correct, inconsistencies.
- Scan all inodes (or the FAT) to check that each block is in exactly
one file, or on the free list, but not both.
- Also check that the number of links to each file (part of the
metadata in the file's inode) is correct (by
looking at all directories).
- Other checks as well.
- Offers to ``fix'' the errors found (for most errors).
Possible types of inconsistency, and the standard fixes:
- Block is in both free list and file. Fix: remove block from free list.
- Block is neither on free list nor any file. Fix: Add block to free list.
- Block B is in two files F and G. Fix: Copy B to B1, replace B in G with
B1. Probably at least one of these files is now garbage, but at least
the file structure is consistent.
- Reference count for file is not equal to number of actual references.
Fix: Set reference count to be number of actual references.
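The block-consistency pass and the first two fixes above can be sketched as follows. Everything here (function name, the in-memory representation of files and the free list) is invented for illustration; a real fsck reads inodes and bitmaps from disk.

```python
# Toy version of fsck's block-consistency pass.  Disk blocks are
# numbered 0..n_blocks-1; `files` maps file name -> list of block
# numbers; `free_list` is a list of free block numbers.

def check_blocks(n_blocks, files, free_list):
    in_use = [0] * n_blocks
    for blocks in files.values():
        for b in blocks:
            in_use[b] += 1
    free = set(free_list)
    fixed_free = set(free)
    for b in range(n_blocks):
        if in_use[b] and b in free:
            fixed_free.discard(b)     # in both: remove from free list
        elif in_use[b] == 0 and b not in free:
            fixed_free.add(b)         # in neither: add to free list
        # in_use[b] > 1 (block in two files) would need a block copy,
        # which this toy version does not attempt
    return sorted(fixed_free)

files = {"F": [0, 1], "G": [2]}
print(check_blocks(6, files, [2, 4]))   # → [3, 4, 5]
# Block 2 leaves the free list (it is in G); missing blocks 3 and 5 join it.
```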
Redundant Array of Inexpensive/Independent Disks
Various schemes to use multiple disks to improve speed and/or reliability.
Different schemes are misleadingly called "levels".
Only the disk controller knows the difference between a RAID system and
a simple disk. As far as the disk driver is concerned, it's just another
random-access information storage device. As I said before, device
independence really does work over this class of devices.
Let K be the number of disks. Let T be a time small enough that
it is very unlikely that two disks will fail within time T. Let F
be the probability that a disk fails within time T.
RAID Level 0: Striping
Block I goes to disk I mod K.
Speed of reading and writing large file improves by factor of K.
Probability of failure within time T is approximately F*K (i.e. the system
becomes K times more unreliable).
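The striping layout (block I goes to disk I mod K) can be written out directly. A small Python sketch; `locate` is an invented helper name:

```python
# Striping: logical block i lives on physical disk i mod K,
# at offset i // K within that disk (illustrative layout).
K = 4

def locate(i, k=K):
    return (i % k, i // k)   # (disk number, block within disk)

print([locate(i) for i in range(6)])
# → [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

Consecutive logical blocks land on different disks, which is why large sequential transfers speed up by a factor of K.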
RAID Level 1: Mirroring
K/2 primary disks, K/2 backup disks. Stripe over primary disks.
Each backup disk identical to corresponding primary disk.
Speed of reading large file improved by factor of K: Simultaneously
read a different block from each disk.
Speed of writing large file improved by factor of K/2 (book is mistaken
in saying no improvement.) Simultaneously write block to K/2
[disk, backup] pairs.
Probability of failure within time T: (K/2)*F^2.
(Fix disk when failure occurs.)
One of the normal RAID methods is to have K-1 data disks and one
parity disk. Data is striped across the data disks, and the bitwise
parity of these sectors is written in the corresponding sector of the
parity disk.
On a read if the block is bad (e.g., if the entire disk is bad or
even missing), the system automatically reads the other blocks in the
stripe and the parity block in the stripe. Then the missing block is
just the bitwise exclusive or of all these blocks.
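The XOR reconstruction can be checked in a few lines of Python (the sector contents below are made-up byte strings):

```python
# Parity reconstruction: the parity sector is the bitwise XOR of the
# data sectors in the stripe, so a missing sector is the XOR of all
# the surviving sectors (data and parity).

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

stripe = [b"\x01\x02", b"\x0f\x00", b"\xaa\x55"]   # 3 data sectors
parity = stripe[0]
for s in stripe[1:]:
    parity = xor(parity, s)

# "Lose" sector 1 and rebuild it from the others plus parity:
rebuilt = xor(xor(stripe[0], stripe[2]), parity)
print(rebuilt == stripe[1])   # → True
```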
For reads this is very good. The failure free case has no penalty
(beyond the space overhead of the parity disk). The error case
requires N+1 (say 5) reads.
A serious concern is the small write problem. Writing a sector
requires 4 I/O. Read the old data sector, compute the change, read
the parity, compute the new parity, write the new parity and the new
data sector. Hence one sector I/O became 4, which is a 300% penalty.
Writing a full stripe is not bad. Compute the parity of the N
(say 4) data sectors to be written and then write the data sectors and
the parity sector. Thus 4 sector I/Os become 5, which is only a 25%
penalty and is smaller for larger N, i.e., larger stripes.
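The small-write shortcut (new parity = old parity XOR old data XOR new data, which is why one sector write costs two reads plus two writes rather than re-reading the stripe) can be verified directly. Sector values below are made up:

```python
# Small-write parity update: the new parity is
#   old_parity XOR old_data XOR new_data,
# so only the old data sector and old parity sector need be read.

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

old_data, new_data = b"\x0f\x00", b"\x03\x30"
other = b"\xaa\x55"                        # the rest of the stripe
old_parity = xor(old_data, other)

new_parity = xor(xor(old_parity, old_data), new_data)
print(new_parity == xor(new_data, other))  # → True: matches a full recompute
```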
A variation is to rotate the parity. That is, for some stripes
disk 1 has the parity, for others disk 2, etc. The purpose is to not
have a single parity disk since that disk is needed for all small
writes and could become a point of contention.
Same as level 3, but with a full error-correcting (Hamming) code
rather than just 1 parity bit.
This allows you to correct an unreported error in 1 bit, or detect errors in two bits.
5.5: Clocks
Also called timers.
5.5.1: Clock Hardware
- A counter is decremented at each clock pulse and generates an interrupt when it goes to zero.
- Counter reload can be automatic or under software (OS) control.
- If done automatically, the interrupt occurs periodically and thus
is perfect for generating a clock interrupt at a fixed period.
5.5.2: Clock Software
- TOD: Bump a counter each tick (clock interrupt). If the counter is
only 32 bits, one must worry about overflow, so keep two counters: low order
and high order.
- Time quantum for RR: Decrement a counter at each tick. The quantum
expires when counter is zero. Load this counter when the scheduler
runs a process.
- Accounting: At each tick, bump a counter in the process table
entry for the currently running process.
- Alarm system call and system alarms:
- Users can request an alarm at some future time.
- The system also on occasion needs to schedule some of its own
activities to occur at specific times in the future (e.g. turn off
the floppy motor).
- The conceptually simplest solution is to have one timer for each outstanding request.
- Instead, we simulate many timers with just one.
- A linked list of pending alarms works well.
- The time in each list entry is the time after the
preceding entry that this entry's alarm is to ring.
- For example, if the time is zero, this event occurs at the
same time as the previous event.
- The other entry is a pointer to the action to perform.
- At each tick, decrement next-signal.
- When next-signal goes to zero,
process the first entry on the list and any others following
immediately after with a time of zero (which means they are to be
simultaneous with this alarm). Then set next-signal to the value
in the next alarm.
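The delta-list scheme above can be simulated in a few lines. The function names are invented for illustration; a real kernel would keep a linked list of structs, but the bookkeeping is the same:

```python
# Simulating many alarms with one timer: a "delta list" stores each
# alarm's time as an offset from the previous entry.  Only the head
# offset (next-signal) is decremented at each tick.

def build_delta_list(alarm_times):
    """alarm_times: absolute tick numbers at which alarms should ring."""
    deltas, prev = [], 0
    for t in sorted(alarm_times):
        deltas.append(t - prev)   # 0 means: same time as previous alarm
        prev = t
    return deltas

def run(deltas, now=0, ticks=20):
    """Tick the clock; return the times at which alarms fired."""
    fired = []
    for _ in range(ticks):
        now += 1
        if deltas:
            deltas[0] -= 1        # decrement next-signal only
            while deltas and deltas[0] == 0:
                fired.append(now) # fire head, plus any 0-delta followers
                deltas.pop(0)
    return fired

d = build_delta_list([3, 5, 5, 9])
print(d)          # → [3, 2, 0, 4]
print(run(d))     # → [3, 5, 5, 9]
```

Note the two alarms at tick 5: the second has delta 0, so it fires in the same pass as the first.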
Profiling
- Want a histogram giving how much time was spent in each 1KB
(say) block of code.
- At each tick check the PC and bump the appropriate counter.
- A user-mode program can determine the software module
associated with each 1K block.
- If we use finer granularity (say 10B instead of 1KB), we get
increased accuracy but more memory overhead.
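The histogram bookkeeping is just integer division of the PC by the block size. A Python sketch; the sampled PC values are made up:

```python
# Profiling with the clock: at each tick, bump the counter for the
# 1KB block of code containing the current PC value.

BLOCK = 1024          # histogram granularity in bytes

def profile(pc_samples, block=BLOCK):
    hist = {}
    for pc in pc_samples:
        bucket = pc // block          # which 1KB block of code
        hist[bucket] = hist.get(bucket, 0) + 1
    return hist

# PC values sampled at four ticks (made-up addresses):
print(profile([0x0400, 0x0410, 0x0800, 0x2000]))
# → {1: 2, 2: 1, 8: 1}
```

Shrinking `block` (say to 10 bytes) gives finer resolution at the cost of a proportionally larger table, as noted above.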
The keyboard controller reports each pressing of a key and each release of
a key as an "event" to the OS; everything else is done in OS software.
Maximal flexibility. Keyboard driver ordinarily reports a sequence
of characters to the user process.
The mouse reports the distance moved in units of 0.1 mm in each direction (X,Y).
Software keeps track of position by dead reckoning. This is OK because of
monitor feedback (the user sees the cursor).
A monitor has M x N (e.g. 1024 x 768) pixels, each of which has three color
components (Red, Green, Blue) of intensity 0 ... 255 (8 bits each).
Each pixel is refreshed many (e.g. 50-70) times per second.
Monitors are characterized by:
- Always on.
- High data rate: 1024 x 768 pixels x 3 bytes of color x 50 times / sec
is approximately 118 MBytes/sec.
- Changes gradually.
- Loss tolerant.
The last two properties are key to the very high degree of compression
achievable for video.
CPU writes to video RAM; Video controller scans video RAM continuously.
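Checking the data-rate arithmetic (at a 50 Hz refresh rate):

```python
# Raw video bandwidth: pixels x 3 bytes of color x refresh rate.
rate = 1024 * 768 * 3 * 50          # bytes per second
print(rate, round(rate / 10**6))    # → 117964800, about 118 MBytes/sec
```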