================ Start Lecture #13 ================

Chapter 6: File Systems

Requirements

  1. Size: Store very large amounts of data.
  2. Persistence: Data survives the creating process.
  3. Access: Multiple processes can access the data concurrently.

Solution: Store data in files that together form a file system.

6.1: Files

6.1.1: File Naming

Very important. A major function of the file system.

6.1.2: File structure

A file is a

  1. Byte stream
  2. (fixed size) Record stream: Out of date
  3. Varied and complicated beast.

6.1.3: File types

Examples

  1. (Regular) files.

  2. Directories: studied below.

  3. Special files (for devices). Uses the naming power of files to unify many actions.
        dir             # prints on screen
        dir > file      # result put in a file
        dir > /dev/tape # results written to tape
        
  4. ``Symbolic'' Links (similar to ``shortcuts''): Also studied below.

``Magic number'': Identifies an executable file.

Strongly typed files:

6.1.4: File access

There are basically two possibilities, sequential access and random access (a.k.a. direct access). Previously, files were declared to be sequential or random. Modern systems do not do this. Instead all files are random and optimizations are applied when the system dynamically determines that a file is (probably) being accessed sequentially.

  1. With Sequential access the bytes (or records) are accessed in order (i.e., n-1, n, n+1, ...). Sequential access is the most common and gives the highest performance. For some devices (e.g. tapes) access ``must'' be sequential.
  2. With random access, the bytes are accessed in any order. Thus each access must specify which bytes are desired.

6.1.5: File attributes

A laundry list of properties that can be specified for a file For example:

6.1.6: File operations

Homework: 6, 7.

6.1.7: An Example Program Using File System Calls

Homework: Read and understand ``copyfile''.

Notes on copyfile

6.1.7: Memory mapped files (Unofficial)

Conceptually simple and elegant. Associate a segment with each file and then normal memory operations take the place of I/O.

Thus copyfile does not have fgetc/fputc (or read/write). Instead it is just like memcopy

while ( *(dest++) = *(src++) );

The implementation is via segmentation with demand paging but the backing store for the pages is the file itself. This all sounds great but ...

  1. How do you tell the length of a newly created file? You know which pages were written but not what words in those pages. So a file with one byte or 10, looks like a page.
  2. What if same file is accessed by both I/O and memory mapping.
  3. What if the file is bigger than the size of virtual memory (will not be a problem for systems built 3 years from now as all will have enormous virtual memory sizes).

6.2: Directories

Unit of organization.

6.2.1-3: Single-level, Two-level, and Hierarchical directory systems

Possibilities

These are not as wildly different as they sound.

6.2.4: Path Names

You can specify the location of a file in the file hierarchy by using either an absolute versus or a Relative path to the file

Homework: 1, 9.

6.2.5: Directory operations

  1. Create: Produces an ``empty'' directory. Normally the directory created actually contains . and .., so is not really empty

  2. Delete: Requires the directory to be empty (i.e., to just contain . and ..). Commands are normally written that will first empty the directory (except for . and ..) and then delete it. These commands make use of file and directory delete system calls.

  3. Opendir: Same as for files (creates a ``handle'')

  4. Closedir: Same as for files

  5. Readdir: In the old days (of unix) one could read directories as files so there was no special readdir (or opendir/closedir). It was believed that the uniform treatment would make programming (or at least system understanding) easier as there was less to learn.

    However, experience has taught that this was not a good idea since the structure of directories then becomes exposed. Early unix had a simple structure (and there was only one). Modern systems have more sophisticated structures and more importantly they are not fixed across implementations.

  6. Rename: As with files

  7. Link: Add a second name for a file; discussed below.

  8. Unlink: Remove a directory entry. This is how a file is deleted. But if there are many links and just one is unlinked, the file remains. Discussed in more detail below.

6.3: File System Implementation

6.3.1: File System Layout

6.3.2; Implementing Files

Contiguous allocation

Homework: 12.

Linked allocation

FAT (file allocation table)

Inodes

6.3.3: Implementing Directories

Recall that a directory is a mapping that converts file (or subdirectory) names to the files (or subdirectories) themselves.

Trivial File System (CP/M)

MS-DOS and Windows (FAT)

Unix

Homework: 27

6.3.4: Shared files (links)

Hard Links

Start with an empty file system (i.e., just the root directory) and then execute:

cd /
mkdir /A; mkdir /B
touch /A/X; touch /B/Y

We have the situation shown on the right.

Note that names are on edges not nodes. When there are no multinamed files, it doesn't much matter.

Now execute

ln /B/Y /A/New
This gives the new diagram to the right.

At this point there are two equally valid name for the right hand yellow file, /B/Y and /A/New. The fact that /B/Y was created first is NOT detectable.


Assume Bob created /B and /B/Y and Alice created /A, /A/X, and /A/New. Later Bob tires of /B/Y and removes it by executing

rm /B/Y

The file /A/New is still fine (see third diagram on the right). But it is owned by Bob, who can't find it! If the system enforces quotas bob will likely be charged (as the owner), but he can neither find nor delete the file (since bob cannot unlink, i.e. remove, files from /A)

Since hard links are only permitted to files (not directories) the resulting file system is a dag (directed acyclic graph). That is, there are no directed cycles. We will now proceed to give away this useful property by studying symlinks, which can point to directories.

Symlinks

Again start with an empty file system and this time execute

cd /
mkdir /A; mkdir /B
touch /A/X; touch /B/Y
ln -s /B/Y /A/New

We now have an additional file /A/New, which is a symlink to /B/Y.

The bottom line is that, with a hard link, a new name is created that has equal status to the original name. This can cause some surprises (e.g., you create a link but I own the file). With a symbolic link a new file is created (owned by the creator naturally) that points to the original file.

Question: Consider the hard link setup above. If Bob removes /B/Y and then creates another /B/Y, what happens to /A/New?
Answer: Nothing. /A/New is still a file with the same contents as the original /B/Y.

Question: What about with a symlink?
Answer: /A/New becomes invalid and then valid again, this time pointing to the new /B/Y. (It can't point to the old /B/Y as that is completely gone.)

Note:

Shortcuts in windows contain more that symlinks in unix. In addition to the file name of the original file, they can contain arguments to pass to the file if it is executable. So a shortcut to

netscape.exe
can specify
netscape.exe //allan.ultra.nyu.edu/~gottlieb/courses/os/class-notes.html
End of Note

What about symlinking a directory?

cd /
mkdir /A; mkdir /B
touch /A/X; touch /B/Y
ln -s /B /A/New

Is there a file named /A/New/Y ?
Yes.

What happens if you execute cd /A/New/.. ?

What did I mean when I said the pictures made it all clear?
Answer: From the file system perspective it is clear. Not always so clear what programs will do.