Imagine hard links pointing to directories (unix does NOT permit this).

    cd ~
    mkdir B; mkdir C
    mkdir B/D; mkdir B/E
    ln B B/D/oh-my

Now you have a loop with honest-looking links.

Normally you can't remove a directory (i.e. unlink it from its parent) unless it is empty. But when you can have multiple hard links to a directory, you should be permitted to remove (i.e. unlink) one even if the directory is not empty. So in the above example you could unlink B from your home directory. Now you have garbage (unreachable, i.e. unnamable) directories B, D, and E.

For a centralized system you need a conventional garbage collector. For a distributed system you need a distributed garbage collector, which is much harder.

Transparency

Location transparency

The path name (i.e. the full name of the file) does NOT say where the file is located. On our ultra setup, we have filesystems /a /b /c and others exported and remote mounted. When we moved /e from machine allan to machine decstation, very little had to change on other machines (just the file /etc/fstab). More importantly, programs running everywhere could still refer to /e/xy. But this was just because we did it that way. I.e., we could have mounted the same filesystem as /e on one machine and /xyz on another.

Location independence

The path name is independent of the server. Hence you can move a file from server to server without changing its name. Have a namespace of files and then have some (dynamically) assigned to certain servers. This namespace would be the same on all machines in the system. Not sure if any systems do this.

Root transparency (a made-up name)

/ is the same on all systems. Would ruin some conventions like /tmp.

Examples

- Machine + path naming: /machine/path or machine:path
- Mounting a remote filesystem onto the local hierarchy. When done intelligently you get location transparency.
- A single namespace looking the same on all machines.

Two-level naming

Said above that a directory is a mapping from names to files (and subdirectories).
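The garbage directories in the hard-link example are exactly what a mark-and-sweep collector would find: link counts stay nonzero because of the cycle, but nothing is reachable from the root. A minimal sketch (the graph below just mirrors the B/C/D/E example; it is illustrative, not a real filesystem):

```python
# Toy directory graph with hard links to directories allowed.
# Nodes are directories; edges are links to child directories.
links = {
    "home": ["B", "C"],   # before unlinking B
    "B":    ["D", "E"],
    "C":    [],
    "D":    ["B"],        # oh-my: a hard link back to B, forming the loop
    "E":    [],
}

def reachable(root, links):
    """Mark phase: every directory reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(links[d])
    return seen

# Unlink B from home.  B's link count is still nonzero (D/oh-my points
# to it), so reference counting alone would never reclaim it.
links["home"].remove("B")

# Sweep phase: anything not marked is garbage.
garbage = set(links) - reachable("home", links)
print(sorted(garbage))   # ['B', 'D', 'E']
```

The point of the sketch: reference counts cannot reclaim the cycle, so a collector that traces from the root is required; distributing that trace across servers is the hard part.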
More formally, the directory maps the user name /home/gottlieb/course/os/class-notes.html to the OS name for that file, 143426 (the unix inode number). These two names are sometimes called the symbolic and binary names. For some systems the binary names are available:

    allan$ ls -i course/os/class-notes.html
    143426 course/os/class-notes.html

The binary name could contain the server name, so that files on other filesystems/machines could be referenced directly. Unix doesn't do this.

Symbolic links could contain the server name. Unix doesn't do this either. I believe that VMS did something like this: the symbolic name was something like nodename::filename. It has been a while since I used VMS so I may have this wrong.

The name lookup could yield MULTIPLE binary names, i.e. redundant storage of files for availability. Naturally you must then worry about updates. When are they visible? What about concurrent updates? WHENEVER you hear of a system that keeps multiple copies of something, an immediate question should be "are these immutable?". If the answer is no, the next question is "what are the update semantics?"

HOMEWORK 13-5

Sharing semantics

Unix semantics -- A read returns the value stored by the last write.

Unix probably doesn't quite do this. If a write is large (several blocks), it does a seek for each block. During a seek, the process sleeps (in the kernel). Another process can be writing a range of blocks that intersects the blocks of the first write. The result could be (depending on disk scheduling) that there is no single last write.

Perhaps Unix semantics means -- A read returns the value stored by the last write, providing one exists.

Perhaps Unix semantics means -- A write syscall should be thought of as a sequence of write-block syscalls, and similarly for reads.
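The "no single last write" concern can be simulated by treating each write syscall as a sequence of per-block writes and interleaving two overlapping writes. Everything here is illustrative (a list stands in for the file; the interleaving is one possibility, not how any particular kernel schedules I/O):

```python
# Model a file as a list of blocks; a write syscall becomes one
# write-block step per block, which may interleave with another write.
file_blocks = [None] * 4

def write_blocks(tag, start, nblocks):
    # Yield after each per-block write, as if sleeping during a seek.
    for i in range(start, start + nblocks):
        file_blocks[i] = tag
        yield

# Two overlapping write syscalls: A covers blocks 0-2, B covers blocks 1-3.
a = write_blocks("A", 0, 3)
b = write_blocks("B", 1, 3)

# One possible interleaving (disk-scheduling dependent): A0 B1 A1 B2 A2 B3
for step in (a, b, a, b, a, b):
    next(step, None)

print(file_blocks)
```

In this interleaving the file ends up ['A', 'A', 'A', 'B']: the final contents match neither write syscall over its full range, so there is no well-defined "last write" at the syscall level, only at the block level.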
A read-block syscall returns the value of the last write-block syscall for that block.

It is easy to get this same semantics for systems with file servers PROVIDING:

- No client-side copies (upload/download)
- No client-side caching

Session semantics

Changes to an open file are visible only to the process (machine???) that issued the open. When the file is closed, the changes become visible to all.

If using client caching, you CANNOT flush dirty blocks until close. What if you run out of buffer space?

This messes up file-pointer semantics. The file pointer is shared across fork, so all children of a parent share it. But if the children run on another machine with session semantics, the file pointer can't be shared, since the other machine does not see the effect of the writes done by the parent.

HOMEWORK 13-2, 13-4

Immutable files

Then there is "no problem". Fine if you don't want to change anything. Can have "version numbers". The book says the old version becomes inaccessible (at least under the current name). With version numbers, if you use the name without a number you get the highest numbered version, so you would have what the book says. But really you do have the old (full) name accessible. VMS definitely did this.

Note that directories are still mutable. Otherwise no create-file would be possible.

HOMEWORK 13-4

Transactions

Clean semantics. Using transactions in an OS is becoming more widely studied.

Distributed File System Implementation

File usage characteristics

Measured under unix at a university. It is not obvious the same results would hold in a different environment.

Findings

1. Most files are small (< 10K)
2. Reading dominates writing
3. Sequential accesses dominate
4. Most files have a short lifetime
5. Sharing is unusual
6. Most processes use few files
7. File classes with different properties exist

Some conclusions

1 suggests whole-file transfer may be worthwhile (except for really big files). 2+5 suggest client caching and dealing with multiple writers somehow, even if the latter is slow (since it is infrequent).
4 suggests doing creates on the client. This is not so clear: possibly the short-lifetime files are temporaries that are created in /tmp or /usr/tmp or /somethingorother/tmp. These would not be on the server anyway. 7 suggests having multiple mechanisms for the several classes.
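Conclusions 1, 2, and 5 together motivate whole-file transfer with client caching and close-time write-back (session semantics). A hypothetical sketch, with a dict standing in for the remote server and the path /e/xy reused purely as an illustrative name:

```python
# Whole-file transfer with client caching: open downloads the whole file,
# reads hit the local cache, and close uploads only if dirty.  Cheap in
# the common case: most files are small (finding 1), reads dominate
# (finding 2), and sharing is unusual (finding 5).
server = {"/e/xy": b"hello"}              # stand-in for a remote file server

class CachedFile:
    def __init__(self, path):
        self.path = path
        self.data = bytearray(server[path])   # whole-file download on open
        self.dirty = False

    def read(self):
        return bytes(self.data)               # served from the local cache

    def write(self, data):
        self.data = bytearray(data)
        self.dirty = True                     # not flushed until close

    def close(self):
        if self.dirty:                        # session semantics:
            server[self.path] = bytes(self.data)  # visible to all at close

f = CachedFile("/e/xy")
f.write(b"goodbye")
assert server["/e/xy"] == b"hello"     # other clients still see the old data
f.close()
assert server["/e/xy"] == b"goodbye"   # the update becomes visible at close
```

Note the design choice this sketch makes concrete: because writes are deferred to close, two clients writing concurrently resolve to whichever closes last, which is exactly the update-semantics question raised above for replicated or cached copies.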