Lecture 19: Distributed File Systems
The objective is to have a single file system distributed across multiple machines.
Models of interactions with files
Process P on machine M1 has read-only permission for files on machine M2.
The only way P can affect the file system on M2 is indirectly,
by sending a message to a process P2 running on M2. P2 can react to
this message as it chooses.
- Simple and clear.
- Underlying reality of any more permissive system. (I.e., even if
the system makes it "look" as though P is writing to a file on M2, what
is actually happening is that P is sending a message to the file server
on M2.)
- Leaves the issue of how to integrate multiple messages coming from
multiple processes up to P2. A good case can be made that the "correct"
thing to do here varies so much from one application to the next that
this is not an issue that should be solved at the level of the OS.
- Highly limited functionality on external files
- Strong distinction between local and external files; no transparency.
- Not really a distributed file system.
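A minimal sketch of the message model above, in Python. The class name LogKeeper, the message format, and the "append" request are assumptions made for illustration. The only point is that P2 alone touches the data on M2 and reacts to each message as it chooses.

```python
class LogKeeper:
    """Plays the role of P2: the only process that touches the file on M2."""

    def __init__(self):
        self.log = []  # stands in for a file on M2

    def handle(self, message):
        """React to a message from a remote process; P2 sets the policy."""
        if message.get("kind") == "append":
            self.log.append(message["text"])
            return "ok"
        # Any request P2 does not want to honor is simply refused.
        return "refused"


p2 = LogKeeper()
reply_append = p2.handle({"kind": "append", "text": "hello from P"})
reply_truncate = p2.handle({"kind": "truncate"})
```

P never writes to M2 directly; whether the append is accepted is entirely up to P2.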
- Download once. P requests file F, file server on M2 delivers all of F.
(WWW model.) P henceforth works on its own copy. Cheaper if P is going to
read all of F. Simplest possible model. Can be problematic for very large
files.
- Remote access model. P requests one block of F at a time from the file server.
Cheaper if P is going to read only part of F, particularly if F is very large.
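The cost difference between the two models can be seen in a small in-memory sketch. The class FileServer, the methods download and read_block, and the 4-byte block size are all assumptions chosen to keep the example short; they are not taken from any particular system.

```python
BLOCK_SIZE = 4  # hypothetical block size, tiny so the example is easy to follow


class FileServer:
    """Toy in-memory file server on M2 (illustrative only)."""

    def __init__(self, files):
        self.files = dict(files)  # name -> bytes
        self.bytes_sent = 0       # total payload shipped to clients

    def download(self, name):
        """Download-once model: ship the whole file in one reply."""
        data = self.files[name]
        self.bytes_sent += len(data)
        return data

    def read_block(self, name, block_no):
        """Remote access model: ship a single block per request."""
        start = block_no * BLOCK_SIZE
        data = self.files[name][start:start + BLOCK_SIZE]
        self.bytes_sent += len(data)
        return data


server = FileServer({"F": b"0123456789abcdef"})

whole = server.download("F")       # P now works on its own copy
cost_download = server.bytes_sent

server.bytes_sent = 0
block = server.read_block("F", 2)  # P wants only the third block
cost_remote = server.bytes_sent
```

If P needs only one block, remote access ships 4 bytes where download once ships all 16; if P reads the whole file, download once needs a single request instead of four.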
Stateless vs. Stateful server
- Stateless server. Server sends off file or block, forgets all about it.
- Stateful server. Server keeps track of processes on machines that have
this file open and of their position in the file.
A stateless server has the advantages that:
- Client processes are OK if server crashes and recovers (fault tolerant).
- Server and other clients are OK if one client crashes.
- No OPEN/CLOSE calls are needed.
- Simplifies server.
- No tables at server; hence no limits on number of open files.
A stateful server has the advantages that:
- Request messages are shorter.
- Performance is better.
- Readahead is possible.
- File locking is possible (for writing).
- Server-initiated notification is possible.
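The trade-off can be illustrated with two toy servers. All names here (StatelessServer, StatefulServer, open_table, crash_and_recover) are hypothetical. The sketch assumes the stateless request carries the file name and offset, while the stateful request carries only a handle.

```python
class StatelessServer:
    """Every request names the file and the offset; nothing is remembered."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, length):
        return self.files[name][offset:offset + length]


class StatefulServer:
    """Keeps a table of open files and the current position in each."""

    def __init__(self, files):
        self.files = files
        self.open_table = {}  # handle -> [name, position]
        self.next_handle = 0

    def open(self, name):
        handle = self.next_handle
        self.next_handle += 1
        self.open_table[handle] = [name, 0]
        return handle

    def read(self, handle, length):
        name, pos = self.open_table[handle]
        data = self.files[name][pos:pos + length]
        self.open_table[handle][1] = pos + len(data)  # advance the position
        return data

    def close(self, handle):
        del self.open_table[handle]

    def crash_and_recover(self):
        """A restart loses the in-memory table, so old handles are invalid."""
        self.open_table = {}


files = {"F": b"hello world"}

stateless = StatelessServer(files)
first = stateless.read("F", 0, 5)   # longer request: name, offset, length
second = stateless.read("F", 5, 6)

stateful = StatefulServer(files)
h = stateful.open("F")
a = stateful.read(h, 5)             # shorter request: handle, length
b = stateful.read(h, 6)
stateful.crash_and_recover()
handle_survived = h in stateful.open_table
```

After the simulated crash the stateless client can simply repeat its request, while the stateful client holds a handle the server no longer recognizes.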
What happens if process P starts to read file F and F is modified while P
has F open?
- Disallow (Readers/writers protocol). We'll discuss this option below.
- With remote access, P necessarily gets the new version.
- With download once, if the server is stateful, it can send a notification
to P: either (a) a message that F has been changed; (b) a representation
of the change; or (c) retransmittal of the entire new file. Either P itself
or the file system at M1 can deal with this new information; it is debatable
what either should do about it.
- Polling: P can intermittently query file server at M2 whether any
changes have taken place.
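Polling can be sketched with a version number per file. The version counter is an assumption of this sketch (a modification time would serve the same purpose), and the names VersionedServer and PollingClient are hypothetical.

```python
class VersionedServer:
    """Toy file server that bumps a version number on every write."""

    def __init__(self):
        self.files = {}  # name -> (version, data)

    def write(self, name, data):
        version = self.files.get(name, (0, b""))[0] + 1
        self.files[name] = (version, data)

    def fetch(self, name):
        return self.files[name]  # (version, data)

    def version(self, name):
        return self.files[name][0]


class PollingClient:
    """Holds a downloaded copy and checks now and then whether it is stale."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.version, self.data = server.fetch(name)

    def poll(self):
        """Return True if the local copy was stale and has been refreshed."""
        if self.server.version(self.name) == self.version:
            return False
        self.version, self.data = self.server.fetch(self.name)
        return True


server = VersionedServer()
server.write("F", b"v1")
client = PollingClient(server, "F")

unchanged = client.poll()  # nothing has happened since the download
server.write("F", b"v2")   # some other process modifies F
changed = client.poll()    # the next poll notices and refetches
```

Between polls the client may still be working on an out-of-date copy; how often to poll is a policy choice this sketch leaves open.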
Copies of file F may in fact exist on two or more servers M2 and M3.
The system chooses which of these should interact with P, based on system load,
network state, whether the machine is up, etc. If the server is stateless, the
choice can switch in the middle of P's use of the file; if the server is
stateful, the choice is made when the file is opened.
- Backup against irrecoverable disk crash.
- Available if server is down.
- Use copy on machine with light load.
Problem of keeping replicas consistent.
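One possible selection rule, as a sketch: ignore replicas whose machine is down and pick the least loaded of the rest. The Replica record, its fields, and the numeric load scale are assumptions; a real system would also weigh network state.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    machine: str
    up: bool
    load: float  # hypothetical scale: 0.0 idle .. 1.0 saturated


def choose_replica(replicas):
    """Pick the least loaded replica among the machines that are up."""
    candidates = [r for r in replicas if r.up]
    if not candidates:
        raise RuntimeError("no replica available")
    return min(candidates, key=lambda r: r.load)


replicas = [
    Replica("M2", up=True, load=0.9),
    Replica("M3", up=True, load=0.2),
    Replica("M4", up=False, load=0.0),  # down, so never chosen
]
chosen = choose_replica(replicas)
```

With a stateless server this choice could be repeated on every request; with a stateful server it would be made once, at open time.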
Either the entire file or blocks may be cached either in the server RAM
or in the client disk or in the client RAM (kernel).
Caching in the server RAM is simple and the only cost is the standard cost of
maintaining a cache.
Caching in the client disk pretty much amounts to replication.
Caching in the client RAM can be useful if processes on the client often need
this file (or block), but raises the problem of currency: the cached copy may
be out of date.
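The currency problem shows up even in a very small block cache. The sketch below assumes a least-recently-used eviction policy, which the notes do not specify, and the names BlockCache and store are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Small LRU cache of file blocks, e.g. held in client RAM."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch           # function (name, block_no) -> bytes
        self.blocks = OrderedDict()  # (name, block_no) -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, name, block_no):
        key = (name, block_no)
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)     # mark as most recently used
            return self.blocks[key]
        self.misses += 1
        data = self.fetch(name, block_no)    # go to the server
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data


store = {("F", 0): b"old"}  # stands in for the file on the server
cache = BlockCache(capacity=2, fetch=lambda name, block_no: store[(name, block_no)])

first = cache.read("F", 0)  # miss: fetched from the server
store[("F", 0)] = b"new"    # the block changes on the server
stale = cache.read("F", 0)  # hit: the cache still returns the old data
```

The second read is fast but returns stale data, because nothing tells the cache that the block has changed.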
Databases are very important, because they are a major application of
distributed file systems. Use the transaction model: each transaction must be
executed atomically. Take a course on databases.
Each file is either being read by one or more readers or being written by
exactly one writer.
Problematic in the context of a distributed file system, because if a
writer crashes, then the file is permanently locked. Of course, this can
also happen in a single-processor system (a process can lock a resource
and then go into an infinite loop), but there it's easier to detect.
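A sketch of the readers/writers rule as a lock table entry for one file. The class is hypothetical and non-blocking: a request that cannot be granted returns False instead of waiting, which is a simplification made for this example.

```python
class ReadersWriterLock:
    """Either any number of readers or exactly one writer holds the file."""

    def __init__(self):
        self.readers = set()
        self.writer = None

    def acquire_read(self, client):
        if self.writer is not None:
            return False
        self.readers.add(client)
        return True

    def acquire_write(self, client):
        if self.writer is not None or self.readers:
            return False
        self.writer = client
        return True

    def release(self, client):
        self.readers.discard(client)
        if self.writer == client:
            self.writer = None


lock = ReadersWriterLock()
r1 = lock.acquire_read("P1")          # granted
r2 = lock.acquire_read("P2")          # granted: readers can share
w_denied = lock.acquire_write("P3")   # refused while readers are active
lock.release("P1")
lock.release("P2")
w_granted = lock.acquire_write("P3")  # granted once the readers are gone

# Suppose P3 now crashes and never calls release().
r_after_crash = lock.acquire_read("P1")
```

The last request is refused, and will be refused forever, because the lock has no way of learning that the writer is gone.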
Allow multiple writers
Suppose that two processes on two different clients are writing to the
same file simultaneously.
- Sequential consistency. Changes are performed on the actual file
as soon as clients write them, and are immediately visible to all
processes that have the file open. Requires the remote access model with
no client caching, and requires a message to be sent to the server each
time a client performs a write. Prohibitive in general.
- Session semantics. Changes are propagated to the server only when
the client closes the file. Final result is the output of one of the clients.
- Delayed write. Cached copy is written to the server file from time to time.
Semantics are vague.
The truth is that this kind of issue is quite specific to the kind of file
involved, and you're not going to get a general solution at the OS level
that is always going to do what you want.
It's rare that general notions like "sequential consistency", "session
semantics" etc. end up being the best way to think about the issue.
For any given type of file, you have to look at:
- What applications are using these files, and how?
- What kind of currency do they need?
- What is the best way for them to communicate?
On the other hand, the OS does have to have _some_ policy to deal with the
case of a new type of file or of a file being accessed in some new
kind of way. What's most important is that an application that does
have a clear idea of what it wants to do in this regard should not
find the OS's policy an unavoidable and intolerable obstacle.
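To make one such policy concrete, here is a sketch of session semantics as described above. Each client works on a private copy taken at open, and the copy replaces the server file at close. The class names and the whole-file write-back are assumptions of this sketch.

```python
class SessionServer:
    """Toy server holding the authoritative copy of each file."""

    def __init__(self, files):
        self.files = dict(files)  # name -> bytes


class Session:
    """One client's open file under session semantics."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        # Private copy taken at open; later server changes are not seen.
        self.buffer = bytearray(server.files[name])

    def write(self, offset, data):
        self.buffer[offset:offset + len(data)] = data  # local only

    def close(self):
        # Changes reach the server only now, replacing the whole file.
        self.server.files[self.name] = bytes(self.buffer)


server = SessionServer({"F": b"aaaa"})
s1 = Session(server, "F")
s2 = Session(server, "F")

s1.write(0, b"XX")
s2.write(2, b"YY")

s1.close()
after_first_close = server.files["F"]
s2.close()
after_second_close = server.files["F"]
```

The file ends up as the second client's copy, and the first client's write is lost. Whether that outcome is acceptable depends on the application.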
Naming and directories
Method 1: machine-name:path (as in URLs) or /machine-name/path.
Note: The file has the same name on every machine.
Method 2: Different machines have different views of file systems.
For example, a machine may have a pseudo-directory "external". The name
on M1 of the file of path name P on machine M2 would be
"/external/M2/P". When a program on M1 accesses this file, the file
server knows that it should request the file from M2.
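Method 2 amounts to a small name translation step on M1. The sketch below assumes that the remote path is appended directly after the machine name, so "/external/M2/usr/doc" names "/usr/doc" on M2; the function name resolve and the default local machine "M1" are hypothetical.

```python
def resolve(path, local_machine="M1"):
    """Map a path as seen on the local machine to (machine, path there)."""
    parts = path.split("/")
    # "/external/M2/usr/doc".split("/") gives ["", "external", "M2", "usr", "doc"]
    if len(parts) >= 4 and parts[0] == "" and parts[1] == "external":
        machine = parts[2]
        remote_path = "/" + "/".join(parts[3:])
        return machine, remote_path
    return local_machine, path  # an ordinary local file


remote = resolve("/external/M2/usr/doc/notes.txt")
local = resolve("/home/p/notes.txt")
```

The same file therefore has a different name on M1 than on M2, which is what distinguishes this method from Method 1.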
Method 3: Path name is entirely independent of machine. Supports
transparent replication and migration. Directory implementations:
- Entire directory is replicated on all machines.
- Directories are distributed over machines. In that case, looking
up a path can involve a lot of network traffic.
In either case, the same problems of consistency come up as with simple
files.
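The network cost of a distributed directory can be sketched by counting one request per path component. The directory layout, machine names, and file identifier below are invented for the example. Every lookup step is counted as a message, even where a real system would find the directory locally.

```python
# Hypothetical layout: each directory lives on some machine, and each entry
# maps a name to the (machine, id) of the next directory or of the file.
directories = {
    ("M1", "root"): {"usr": ("M2", "usr")},
    ("M2", "usr"): {"doc": ("M3", "doc")},
    ("M3", "doc"): {"notes.txt": ("M3", "file-17")},
}


def lookup(path, start=("M1", "root")):
    """Walk the path one component at a time; return (location, messages)."""
    location = start
    messages = 0
    for component in path.strip("/").split("/"):
        messages += 1  # one request to the machine holding this directory
        location = directories[location][component]
    return location, messages


location, messages = lookup("/usr/doc/notes.txt")
```

A three-component path costs three requests here, spread over three machines; with the directory replicated everywhere, the same lookup would need no remote requests at all.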
Another question is whether you want to use the same server for directories
and files or a different server.
- Move file to machine that is using it.
- Move file toward lightly loaded machines and away from heavily loaded
machines.
- Replicate files that are in heavy demand or in immediate demand
(delay undesirable) or are particularly critical.
- Tend not to replicate files which are often being written to and
have high requirements of consistency. (Excluding databases, which are
a whole separate case.)