---------------- Chapter 13--Distributed File Systems ----------------

File service vs file server
    File service is the specification
    File server is a process running on a machine to implement the file service for (some) files on that machine
    In a normal distributed system you would have one file service but perhaps many file servers
    If you have very different kinds of filesystems, you might not be able to have a single file service, as some services may not be available on every filesystem

File Server Design
    File
        Sequence of bytes
            Unix
            MS-DOS
            Windows
        Sequence of records
            Mainframes
            Keys
            We do not cover these filesystems. They are often discussed in database courses
    File attributes
        rwx, perhaps a (append)
            This is really a subset of what is called an ACL (access control list) or a capability
            Get ACLs and capabilities by reading columns and rows, respectively, of the access matrix
        owner, group, various dates, size
        dump, autocompress, immutable

Upload/download vs remote access
    Upload/download means the only file services supplied are read file and write file
        All mods are done on a local copy of the file
        Conceptually simple at first glance
        Whole-file transfers are efficient (assuming you are going to access most of the file) when compared to multiple small accesses
        Not an efficient use of bandwidth if you access only a small part of a large file
        Requires storage on the client
        What about concurrent updates?
            What if one client reads and "forgets" to write for a long time and then writes back the "new" version, overwriting newer changes from others?
    Remote access means direct individual reads and writes to the remote copy of the file
        File stays on the server
        Issue of (client) buffering
            Good to reduce the number of remote accesses
            What about semantics when a write occurs?
            Note that metadata is written on a read (e.g. the access time). So if you want faithful semantics, every client read must modify metadata on the server, or all requests for metadata (e.g. ls or dir commands) must go to the server
            Cache consistency question

Directories
    Mapping from names to files/directories
    Contains rules for names of files and (sub)directories
    Hierarchy, i.e. tree
    (Hard) links
        Gives another name to an existing file
        A new directory entry
        The old and new names have equal status
            cd ~
            mkdir dir1
            touch dir1/file1
            ln dir1/file1 file2
        Now ~/file2 is the SAME file as ~/dir1/file1
            In unix-speak they have the same inode
            Need to do rm twice (once per name) to actually delete the file
        The owner is NOT changed, so
            cd ~
            ln ~joe/file1 file2
        gives me a link to a file of joe. Presumably joe set his permissions so I can't write it.
        Now joe does
            rm ~/file1
        But my file2 still exists and is owned by joe. Most accounting programs would charge the file to joe (who doesn't know it exists).
        With hard links the filesystem becomes a DAG instead of a simple tree.
    Symlinks
        Symbolic (NOT symmetric). Indeed asymmetric
        Consider
            cd ~
            mkdir dir1
            touch dir1/file1
            ln -s dir1/file1 file2
        file2 has a new inode. It is a new type of file called a symlink, and its "contents" are the name of the file, dir1/file1
        When accessed, file2 returns the contents of file1, but it is NOT equal to file1.
            If file1 is deleted, file2 "exists" but is invalid
            If a new file1 is created, file2 now points to it.
        Symlinks can point to directories as well
        With symlinks pointing to directories, the filesystem becomes a general graph, i.e. directed cycles are permitted.
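
The hard-link and symlink behavior above can be checked with a short script. This is only a sketch, assuming a POSIX filesystem that supports both kinds of links. It is written in Python rather than shell so that inode numbers and link counts are easy to inspect. The function name link_demo, the symlink name file3 (used so that both kinds of link can coexist in one directory), and the directory symlink loop are names chosen for illustration and do not come from the notes.

```python
import os
import tempfile


def link_demo(root):
    """Recreate the hard-link and symlink examples under `root` (assumed empty)."""
    dir1 = os.path.join(root, "dir1")
    file1 = os.path.join(dir1, "file1")
    hard = os.path.join(root, "file2")   # hard link, as in the notes
    soft = os.path.join(root, "file3")   # symlink (hypothetical name)

    os.mkdir(dir1)
    with open(file1, "w") as f:
        f.write("hello")

    os.link(file1, hard)                              # ln dir1/file1 file2
    os.symlink(os.path.join("dir1", "file1"), soft)   # ln -s dir1/file1 file3

    result = {
        # Both names refer to the same inode; the inode records two links.
        "same_inode": os.stat(file1).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(file1).st_nlink,
        # The symlink is a separate file whose contents are a path name.
        "symlink_own_inode": os.lstat(soft).st_ino != os.stat(file1).st_ino,
        "symlink_contents": os.readlink(soft),
    }

    # First rm: removes one name only; the data survives under the other name.
    os.remove(file1)
    with open(hard) as f:
        result["hard_after_rm"] = f.read()
    # The symlink still exists as a directory entry but no longer resolves.
    result["symlink_dangling"] = os.path.lexists(soft) and not os.path.exists(soft)

    # Recreate the NAME dir1/file1: the symlink follows the name,
    # while the hard link still refers to the old inode.
    with open(file1, "w") as f:
        f.write("new")
    with open(soft) as f:
        result["symlink_after_recreate"] = f.read()
    with open(hard) as f:
        result["hard_after_recreate"] = f.read()

    # A symlink to a directory can close a directed cycle in the name graph.
    os.symlink(root, os.path.join(dir1, "loop"))
    result["cycle"] = os.path.isdir(
        os.path.join(dir1, "loop", "dir1", "loop", "dir1"))
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for key, value in link_demo(tmp).items():
            print(f"{key}: {value}")
```

The asymmetry shows up in the two "after_recreate" entries: the symlink tracks whatever currently has the name dir1/file1, whereas the hard link stays bound to the original inode.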