NOTE: These notes are by Allan Gottlieb, and are reproduced here, with superficial modifications, with his permission. "I" in this text generally refers to Prof. Gottlieb, except in regards to administrative matters.


================ Start Lecture #18 (Apr. 8) ================

File Manipulation: What Actually Happens

Locating a file
When you read from a file
When you write to a file
When you delete a file
When you execute a file

For concreteness, we will suppose that files are implemented with inodes; that file attributes including length and protection are kept in the inode; and that free blocks are kept in a free list; and that memory management uses paging. We'll ignore linking. Other implementations are not very different in terms of what follows.

Note: In all the routines below, when a block is loaded into the cache, then, if the cache is full, the cache replacement algorithm must be called to remove a block from cache, and if it's dirty to write it to disk.

Process Q locates file F in directory D0

This subroutine is used whenever a file is opened.
{  Expand the relative path name to the absolute path name;
   If (the inode for F is not in memory) then {
      P = the path for D0;
      D = D0;
      loop until ((D is in cache) or (D is the root))
           D = the parent of D;     
      if (D == the root and D is not in cache) then
         read D from its known disk address to cache;
      check that Q has execute permission on D
           (compare the user of Q to the owner of D and 
            check associated permission) else exit with failure;
      loop { C = the child of D in P;
           look up the disk address of C in D;
           read the inode for C into cache;
           if (C == F) then exitloop;
           check that the user has execute permission for C;
           read C into cache;
           D = C;
         }     /* At this point the inode for F is in cache*/
    }  /* End if statement */
In some cases, there can exists a path to directory D0 using ".." links from some directory that is already in memory, that is shorter than the path from the root to D0. If that optimization is implemented, then the first part of this algorithm is modified so that the system first computes such a path (note that this can be done just from the path names) and then loads the intermediate directories that it needs. On the other hand, I'm not sure that this "optimization" is worthwhile.

When you read from a file

Reading from a file has three parts: (1) Opening the file; (2) Reading; (3) Closing the file.

Process Q opens file F in directory D for reading.

{ Locate file F in directory D;
  Check that Q has read permission for F;
  Record the inode for F in Q's file table;
}       

Reading K bytes from position P in F

(We assume all K bytes are in the same block.)
 
B,O = the block and offset for position P in F;
if (B is not in the chache) {
   B0 = the inode of F;
   if (the address of B is not in B0)
    { for each indirect block BI on the path to B  {
           /* this can be directly computed from P, given the
                  structure of the inodes */
         find the address of BI in B0;
        load BI into the cache;
        B0 = BI;
      }  /* At this point B0 is the block with the address of B */
   get the address of B from B0;
   load B into the cache;
  }  /* At this point B is in the cache
read the K bytes from the cache copy of B;
}

Closing file F

 
{ Remove F from Q's file table;
  Mark all the blocks of Q as available for release from the cache;
}

When you write to a new file

Writing to a file has three parts: (1) Opening; (2) Writing; (3) Closing.

Process Q opens file F in directory D0 for writing;

{ Locate F in D0;
  Check that Q has write permission as well as execute permission on D0.
  Get a free block from free list. Modify free list and save changes on disk.
  Create a block for inode in cache.
  Initialize attributes of F in cache copy of inode as specified 
     (or according to default).
  Write out inode to disk.
  Create entry for file F in directory D0;
  Write out modified directory to disk;
  Record inode for F in Q's file table.
}

Writing K bytes to position P in F

We assume all K bytes are in the same block, and that the file is written sequentially. Assume that if you are in the middle of writing to block B (either a data block or an indirect block) then B will be kept in the cache until you have filled B.

For simplicity, we will assume that there is at most one indirect block between the i-node and B.

 
B,O = the block and offset for position P in F;
if (B is not in the chache) {
   B0 = the inode for F;
   B1 = the block (i-node or indirect block) that should point to B;
   if (B1 is not in the cache) 
     { get a free block for B1 from free list; 
       save modified free list to disk;
       modify B0 to point to B1:
       save B0 on disk;
       choose a block in cache for B1;
     }
   get a free block for B from free list;
   save modified free list to disk;
   modify B1 to point to B;
   save B1 on disk;
   choose a block in cache for B;
  }
write K bytes at offset O in cache copy of B;
if  B is  full, or depending on write-out policy, 
    { write out B to disk;
      modify length of F as recorded in B0;
      save B0 on disk;
    }
}

Closing a new file F in directory D

{ Write out all dirty cache blocks of F; 
  Write out directory D;
  Remove F from file table;
  Mark all blocks of F as available for release from the cache
}

Deleting file F from directory D

{ Locate directory D as in opening for reading;
  Check that you have write permission on D (Note: at least in UNIX,
    you do not need write permission on F.)
  Read in the i-node of F and all indirect blocks;
  Delete F from D;
  Write out D to disk;
  Add all data blocks, all indirect blocks, and the i-node of F to
     the free list;
} 

Process P executes file F in directory D

Locate F in D;
Check that P has execute permission for F;
Check that F is an executable file (F begins with a "magic number")
Create a new process Q;
Initialize Q's page table to map each page to the disk address of the
   corresponding block.
Initialize Q's program counter (in the process table) to virtual address 0.
Page in the first few blocks of P.
Block P until Q is complete.  Put Q onto the ready queue.

Backups

All modern systems support full and incremental dumps.

Physical vs. Logical dumps.

Physical dumps: Start at beginning of disk and copy through to the end.

Logical dumps: Start at root, and traverse file system using tree search.
Have to be careful not to attempt to "dump" I/O devices.
Also have to be careful not to repeat, in case of hard links.

Physical dumps are preferred:

Logical dumps are preferred

Restoring

Full restoration: Go back to last full dump, reload, reload all incremental dumps since in sequence.
Restoring 1 file F: Delete current version of F, find most recent incremental dump when F was saved, load F.

Consistency

Some types of inconsistency: