Imagine hard links pointing to directories (unix does NOT permit this).

    cd ~
    mkdir B; mkdir C
    mkdir B/D; mkdir B/E
    ln B B/D/oh-my

Now you have a loop with honest-looking links.

Normally you can't remove a directory (i.e. unlink it from its parent) unless it is empty. But when you can have multiple hard links to a directory, you should be permitted to remove (i.e. unlink) one even if the directory is not empty. So in the above example you could unlink B from your home directory. Now you have garbage (unreachable, i.e. unnamable) directories B, D, and E.

For a centralized system you need a conventional garbage collector. For a distributed system you need a distributed garbage collector, which is much harder.

Transparency

Location transparency

The path name (i.e. the full name of the file) does NOT say where the file is located. On our ultra setup, we have filesystems /a /b /c and others exported and remote mounted. When we moved /e from machine allan to machine decstation, very little had to change on other machines (just the file /etc/fstab). More importantly, programs running everywhere could still refer to /e/xy. But this was just because we did it that way. I.e., we could have mounted the same filesystem as /e on one machine and /xyz on another.

Location independence

The path name is independent of the server. Hence you can move a file from server to server without changing its name. Have a namespace of files and then have some (dynamically) assigned to certain servers. This namespace would be the same on all machines in the system. Not sure if any systems do this.

Root transparency (a made-up name)

/ is the same on all systems. Would ruin some conventions like /tmp.

Examples

- Machine + path naming: /machine/path or machine:path
- Mounting a remote filesystem onto the local hierarchy. When done intelligently you get location transparency.
- A single namespace looking the same on all machines.

Two-level naming

Said above that a directory is a mapping from names to files (and subdirectories).
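The garbage directories in the hard-link example are exactly what a mark-and-sweep collector would find: link counts stay nonzero because of the cycle, but nothing is reachable from the root. A minimal sketch (the graph below just mirrors the B/C/D/E example; it is illustrative, not a real filesystem):

```python
# Toy directory graph with hard links to directories allowed.
# Nodes are directories; edges are links to child directories.
links = {
    "home": ["B", "C"],   # before unlinking B
    "B":    ["D", "E"],
    "C":    [],
    "D":    ["B"],        # oh-my: a hard link back to B, forming the loop
    "E":    [],
}

def reachable(root, links):
    """Mark phase: every directory reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(links[d])
    return seen

# Unlink B from home.  B's link count is still nonzero (D/oh-my points
# to it), so reference counting alone would never reclaim it.
links["home"].remove("B")

# Sweep phase: anything not marked is garbage.
garbage = set(links) - reachable("home", links)
print(sorted(garbage))   # ['B', 'D', 'E']
```

The point of the sketch: reference counts cannot reclaim the cycle, so a collector that traces from the root is required; distributing that trace across servers is the hard part.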
More formally, the directory maps the user name /home/gottlieb/course/os/class-notes.html to the OS name for that file, 143426 (the unix inode number). These two names are sometimes called the symbolic and binary names. For some systems the binary names are available:

    allan$ ls -i course/os/class-notes.html
    143426 course/os/class-notes.html

The binary name could contain the server name, so that files on other filesystems/machines could be referenced directly. Unix doesn't do this.

Symbolic links could contain the server name. Unix doesn't do this either. I believe that VMS did something like this: the symbolic name was something like nodename::filename. It has been a while since I used VMS so I may have this wrong.

The name lookup could yield MULTIPLE binary names, i.e. redundant storage of files for availability. Naturally you must then worry about updates. When are they visible? What about concurrent updates? WHENEVER you hear of a system that keeps multiple copies of something, an immediate question should be "are these immutable?". If the answer is no, the next question is "what are the update semantics?"

HOMEWORK 13-5

Sharing semantics

Unix semantics -- A read returns the value stored by the last write.

Unix probably doesn't quite do this. If a write is large (several blocks), it does a seek for each block. During a seek, the process sleeps (in the kernel). Another process can be writing a range of blocks that intersects the blocks of the first write. The result could be (depending on disk scheduling) that there is no single last write.

Perhaps Unix semantics means -- A read returns the value stored by the last write, providing one exists.

Perhaps Unix semantics means -- A write syscall should be thought of as a sequence of write-block syscalls, and similarly for reads.
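The "no single last write" concern can be simulated by treating each write syscall as a sequence of per-block writes and interleaving two overlapping writes. Everything here is illustrative (a list stands in for the file; the interleaving is one possibility, not how any particular kernel schedules I/O):

```python
# Model a file as a list of blocks; a write syscall becomes one
# write-block step per block, which may interleave with another write.
file_blocks = [None] * 4

def write_blocks(tag, start, nblocks):
    # Yield after each per-block write, as if sleeping during a seek.
    for i in range(start, start + nblocks):
        file_blocks[i] = tag
        yield

# Two overlapping write syscalls: A covers blocks 0-2, B covers blocks 1-3.
a = write_blocks("A", 0, 3)
b = write_blocks("B", 1, 3)

# One possible interleaving (disk-scheduling dependent): A0 B1 A1 B2 A2 B3
for step in (a, b, a, b, a, b):
    next(step, None)

print(file_blocks)
```

In this interleaving the file ends up ['A', 'A', 'A', 'B']: the final contents match neither write syscall over its full range, so there is no well-defined "last write" at the syscall level, only at the block level.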
A read-block syscall returns the value of the last write-block syscall for that block.

It is easy to get this same semantics for systems with file servers PROVIDING:

- No client-side copies (upload/download)
- No client-side caching

Session semantics

Changes to an open file are visible only to the process (machine???) that issued the open. When the file is closed, the changes become visible to all.

If using client caching, you CANNOT flush dirty blocks until close. What if you run out of buffer space?

This messes up file-pointer semantics. The file pointer is shared across fork, so all children of a parent share it. But if the children run on another machine with session semantics, the file pointer can't be shared, since the other machine does not see the effect of the writes done by the parent.

HOMEWORK 13-2, 13-4

Immutable files

Then there is "no problem". Fine if you don't want to change anything. Can have "version numbers". The book says the old version becomes inaccessible (at least under the current name). With version numbers, if you use the name without a number you get the highest numbered version, so you would have what the book says. But really you do have the old (full) name accessible. VMS definitely did this.

Note that directories are still mutable. Otherwise no create-file would be possible.

HOMEWORK 13-4

Transactions

Clean semantics. Using transactions in an OS is becoming more widely studied.

Distributed File System Implementation

File usage characteristics

Measured under unix at a university. It is not obvious the same results would hold in a different environment.

Findings

1. Most files are small (< 10K)
2. Reading dominates writing
3. Sequential accesses dominate
4. Most files have a short lifetime
5. Sharing is unusual
6. Most processes use few files
7. File classes with different properties exist

Some conclusions

1 suggests whole-file transfer may be worthwhile (except for really big files). 2+5 suggest client caching and dealing with multiple writers somehow, even if the latter is slow (since it is infrequent).
4 suggests doing creates on the client. This is not so clear: possibly the short-lifetime files are temporaries that are created in /tmp or /usr/tmp or /somethingorother/tmp. These would not be on the server anyway. 7 suggests having multiple mechanisms for the several classes.
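Conclusions 1, 2, and 5 together motivate whole-file transfer with client caching and close-time write-back (session semantics). A hypothetical sketch, with a dict standing in for the remote server and the path /e/xy reused purely as an illustrative name:

```python
# Whole-file transfer with client caching: open downloads the whole file,
# reads hit the local cache, and close uploads only if dirty.  Cheap in
# the common case: most files are small (finding 1), reads dominate
# (finding 2), and sharing is unusual (finding 5).
server = {"/e/xy": b"hello"}              # stand-in for a remote file server

class CachedFile:
    def __init__(self, path):
        self.path = path
        self.data = bytearray(server[path])   # whole-file download on open
        self.dirty = False

    def read(self):
        return bytes(self.data)               # served from the local cache

    def write(self, data):
        self.data = bytearray(data)
        self.dirty = True                     # not flushed until close

    def close(self):
        if self.dirty:                        # session semantics:
            server[self.path] = bytes(self.data)  # visible to all at close

f = CachedFile("/e/xy")
f.write(b"goodbye")
assert server["/e/xy"] == b"hello"     # other clients still see the old data
f.close()
assert server["/e/xy"] == b"goodbye"   # the update becomes visible at close
```

Note the design choice this sketch makes concrete: because writes are deferred to close, two clients writing concurrently resolve to whichever closes last, which is exactly the update-semantics question raised above for replicated or cached copies.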