key length (for keyed files)
4.1.6: File operations
- Create:
Essential if a system is to add files. Need not be a separate system
call (can be merged with open).
- Delete:
Essential if a system is to delete files.
- Open:
Not essential. An optimization in which the translation from file name to
disk locations is performed only once per file rather than once per access.
- Close:
Not essential. Free resources.
- Read:
Essential. Must specify filename, file location, number of bytes,
and a buffer into which the data is to be placed.
Several of these parameters can be set by other
system calls and in many OS's they are.
- Write:
Essential if updates are to be supported. See read for parameters.
- Seek:
Not essential (could be in read/write). Specify the
offset of the next (read or write) access to this file.
- Get attributes:
Essential if attributes are to be used.
- Set attributes:
Essential if attributes are to be user settable.
- Rename:
Tanenbaum has strange words. Copy and delete is not acceptable for
big files. Moreover copy-delete is not atomic. Indeed link-delete is
not atomic so even if link (discussed below)
is provided, renaming a file adds functionality.
Homework: 2, 3, 4.
Read and understand ``copyfile'' on page 155.
Notes on copyfile
- Normally in unix one wouldn't call read and write directly.
- Indeed, for copyfile, getchar() and putchar() would be nice since
they take care of the buffering (standard I/O, stdio).
- Tanenbaum is correct that the error reporting is atrocious.
The worst is exiting the loop on error and thus generating an
exit(0) as if nothing happened.
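A minimal sketch of a copyfile that reports errors rather than falling out of the loop and exiting with status 0. It is written in Python, whose os.open, os.read, and os.write are thin wrappers over the Unix system calls. The names copyfile and main, the 4096-byte buffer, and the 0o644 mode are assumptions made for illustration; this is not Tanenbaum's code.

```python
import os
import sys

BUF_SIZE = 4096  # assumed buffer size, chosen for illustration


def copyfile(src, dst):
    """Copy src to dst with raw read/write calls; return the bytes copied."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            total = 0
            while True:
                buf = os.read(in_fd, BUF_SIZE)
                if not buf:              # empty result means end of file
                    return total
                while buf:               # a write may be partial, so loop
                    written = os.write(out_fd, buf)
                    buf = buf[written:]
                    total += written
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)


def main(argv):
    """Return a nonzero status on any error instead of pretending success."""
    try:
        copyfile(argv[1], argv[2])
    except (IndexError, OSError) as err:
        print("copyfile failed:", err, file=sys.stderr)
        return 1
    return 0
```

Any failing system call raises OSError, so an error can no longer be mistaken for end of file.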
4.1.7: Memory mapped files
Conceptually simple and elegant. Associate a segment with each
file and then normal memory operations take the place of I/O.
Thus copyfile does not have fgetc/fputc (or read/write). Instead it is
just like memcopy
while ( *(dest++) = *(src++) );
The implementation is via segmentation with demand paging but
the backing store for the pages is the file itself.
This all sounds great but ...
- How do you tell the length of a newly created file? You know
which pages were written but not what words in those pages. So a file
with one byte or 10, looks like a page.
- What if same file is accessed by both I/O and memory mapping.
- What if the file is bigger than the size of virtual memory (will
not be a problem for systems built 3 years from now as all will have
enormous virtual memory sizes).
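As a rough illustration of copying through memory-mapped files, here is a sketch using Python's mmap module. The function name mmap_copy is hypothetical, and the sketch assumes a nonempty source file that fits in the address space. Note that the length of the new file has to be set explicitly before mapping, which is the first difficulty listed above.

```python
import mmap
import os


def mmap_copy(src, dst):
    """Copy a nonempty file by assigning memory, with no read/write loop."""
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb+") as fout:
        # The mapping works in whole pages, so the file length is set
        # separately; the mapping alone would not record it.
        fout.truncate(size)
        with mmap.mmap(fin.fileno(), size, access=mmap.ACCESS_READ) as s, \
             mmap.mmap(fout.fileno(), size, access=mmap.ACCESS_WRITE) as d:
            d[:] = s[:]      # ordinary memory assignment takes the place of I/O
    return size
```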
4.2: Directories
Unit of organization.
4.2.1: Hierarchical directory systems
Possibilities
- One directory in the system
- One per user
- One tree
- One tree per user
- One forest
- One forest per user
These are not as wildly different as they sound.
- If the system has only one directory but allows the character / in
a file name, then one could fake a tree by having a file named
/allan/gottlieb/courses/arch/class-notes.html
rather than a directory allan, a subdirectory gottlieb, ..., a file
class-notes.html.
- Dos (windows) is a forest, unix a tree. In dos there is no common
parent of a:\ and c:\.
- But windows explorer makes the dos forest look quite a bit like a
tree. Indeed, the gnome file manager for linux looks A LOT like windows
explorer.
- You can get an effect similar to (but not the same as) one X per
user by having just one X in the system and having permissions that
permit each user to visit only a subset. Of course if the system
doesn't have permissions, this is not possible.
- Today's systems have a tree per system or a forest per system.
4.2.2: Path Names
You can specify the location of a file in the file hierarchy by
using either an absolute or a
relative path to the file.
- An absolute path starts at the (or a if we have a forest) root.
- A relative path starts at the
current (a.k.a. working) directory.
- The special directories . and .. represent the current directory
and the parent of the current directory respectively.
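A small sketch of how a relative path is turned into an absolute one. It uses Python's posixpath module, which manipulates Unix-style names purely as text (it does not consult the file system). The working directory and the relative path below are made-up examples.

```python
import posixpath

cwd = "/allan/gottlieb"            # assumed current (working) directory
relative = "courses/../courses/./os/class-notes.html"

# A relative path is interpreted starting at the current directory;
# normpath then cancels the . and .. components textually.
absolute = posixpath.normpath(posixpath.join(cwd, relative))
parent = posixpath.normpath(posixpath.join(cwd, ".."))

print(absolute)
print(parent)
print(posixpath.isabs(absolute), posixpath.isabs(relative))
```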
Homework: 1, 8.
4.2.3: Directory operations
- Create: Produces an ``empty'' directory.
Normally the directory created actually contains . and .., so is not
really empty.
- Delete: Requires the directory to be empty (i.e., to just contain
. and ..). Commands are normally written that will first empty the
directory (except for . and ..) and then delete it. These commands
make use of file and directory delete system calls.
- Opendir: Same as for files (creates a ``handle'')
- Closedir: Same as for files
- Readdir: In the old days (of unix) one could read directories as files
so there was no special readdir (or opendir/closedir). It was
believed that the uniform treatment would make programming (or at
least system understanding) easier as there was less to learn.
However, experience has taught that this was not a good idea since
the structure of directories then becomes exposed. Early unix had a
simple structure (and there was only one). Modern systems have more
sophisticated structures and more importantly they are not fixed
across implementations.
- Rename: As with files
- Link: Add a second name for a file; discussed
below.
- Unlink: Remove a directory entry. This is how a file is deleted.
But if there are many links and just one is unlinked, the file
remains. Discussed in more
detail below.
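The remark under Delete, that commands first empty a directory and then delete it, can be sketched as follows. The function name remove_tree is hypothetical; os.listdir, os.unlink, and os.rmdir correspond to the readdir, unlink, and directory delete operations.

```python
import os


def remove_tree(path):
    """Empty a directory bottom-up, then delete it (roughly what rm -r does)."""
    for name in os.listdir(path):           # listdir omits . and ..
        child = os.path.join(path, name)
        if os.path.isdir(child) and not os.path.islink(child):
            remove_tree(child)              # subdirectory: empty it first
        else:
            os.unlink(child)                # file (or symlink): remove the entry
    os.rmdir(path)                          # succeeds only on an empty directory
```

Calling os.rmdir directly on a nonempty directory raises OSError, which is the "requires the directory to be empty" rule.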
4.3: File System Implementation
4.3.1: Implementing Files
- A disk cannot read or write a single word. Instead it can read or
write a sector, which is often 512 bytes.
- Disks are written in blocks whose size is a multiple of the sector
size.
- When we study I/O in the next chapter I will bring in some
physically large (and hence old) disks so that we can see what they
look like and understand better sectors (and tracks, and cylinders,
and heads, etc.).
Contiguous allocation
- This is like OS/MVT.
- The entire file is stored as one piece.
- Simple and fast for access, but ...
- Problem with growing files
- Must either evict the file itself or the file it is bumping
into.
- Same problem with an OS/MVT kind of system if jobs grow.
- Problem with external fragmentation.
- Not used for general purpose systems. Ideal for systems where
files do not change size.
Homework: 7.
Linked allocation
- The directory entry contains a pointer to the first block of the file.
- Each block contains a pointer to the next.
- Horrible for random access.
- Not used.
FAT (file allocation table)
- Used by dos and windows (but not windows/NT).
- Directory entry points to first block (i.e. specifies the block
number).
- A FAT is maintained in memory having one (word) entry for each
disk block. The entry for block N contains the block number of the
next block in the same file as N.
- This is linked but the links are stored separately.
- Time to access a random block is still linear in size of file
but now all the references are to this one table which is in memory.
So it is bad but not horrible for random access.
- Size of table is one word per disk block. If one writes all
blocks of size 4K and uses 4-byte words, the table is one megabyte for
each disk gigabyte. Large but not prohibitive.
- If one writes blocks of size 512 bytes (the sector size of most disks)
then the table is 8 megs per gig, which might be prohibitive.
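A toy model of the FAT, to make the chain following and the table-size arithmetic above concrete. The end-of-file marker -1 and all function names are assumptions for illustration; real FAT variants use their own reserved values.

```python
EOF = -1   # assumed end-of-file marker


def fat_chain(fat, first_block):
    """Return all blocks of a file by following the links kept in the FAT."""
    blocks = []
    block = first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]      # entry N holds the block that follows N
    return blocks


def kth_block(fat, first_block, k):
    """Random access: k table lookups, all in memory, none on disk."""
    block = first_block
    for _ in range(k):
        block = fat[block]
    return block


def fat_table_bytes(disk_bytes, block_bytes, entry_bytes=4):
    """One entry per disk block."""
    return (disk_bytes // block_bytes) * entry_bytes


fat = [EOF] * 16
fat[4], fat[7], fat[2], fat[10] = 7, 2, 10, EOF   # file: 4 -> 7 -> 2 -> 10
print(fat_chain(fat, 4))
print(fat_table_bytes(2**30, 4096), fat_table_bytes(2**30, 512))
```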
Inodes
- Used by unix.
- Directory entry points to inode (index-node).
- Inode points to first few data blocks, often called direct blocks.
- Inode also points to an indirect block, which points to disk blocks.
- Inode also points to a double indirect block, which points to indirect blocks ...
- For some implementations there are triple indirect as well.
- The inode is in memory for open files.
So references to direct blocks take just one I/O.
- For big files most references require two I/Os (indirect + data).
- For huge files most references require three I/Os (double
indirect, indirect, and data).
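The I/O counts above can be checked with a small calculation. The layout below (12 direct pointers, 1024 pointers per indirect block, as with 4KB blocks and 4-byte block numbers) is an assumption for illustration, not a property of any particular unix. The sketch also assumes the inode is already in memory and that no indirect blocks are cached.

```python
N_DIRECT = 12            # assumed number of direct pointers in the inode
PTRS_PER_BLOCK = 1024    # assumed: 4KB block / 4-byte block number


def ios_for_block(n):
    """Disk I/Os needed to reach logical block n of an open file."""
    if n < N_DIRECT:
        return 1                          # data block only
    n -= N_DIRECT
    if n < PTRS_PER_BLOCK:
        return 2                          # indirect + data
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return 3                          # double indirect, indirect, data
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        return 4                          # triple indirect as well
    raise ValueError("block number beyond the largest file")
```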
4.3.2: Implementing Directories
Recall that a directory is a mapping that converts file (or
subdirectory) names to the files (or subdirectories) themselves.
Trivial File System (CP/M)
- Only one directory in the system.
- Directory entry contains pointers to disk blocks.
- If need more blocks, get another directory entry.
MS-DOS and Windows (FAT)
- Subdirectories supported.
- Directory entry contains metadata such as date and size
as well as pointer to first block.
Unix
- Each entry contains a name and a pointer to the corresponding inode.
- Metadata is in the inode.
- Early unix had limit of 14 character names.
- Name field now is varying length.
- To go down a level in directory takes two steps: get inode, get
file (or subdirectory).
- Do on the blackboard the steps for
/allan/gottlieb/courses/os/class-notes.html
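The blackboard steps can be sketched with a toy in-memory model. Every inode number below is invented for illustration (including the choice of 2 for the root), and the table stands in for the disk.

```python
# Toy model: inode number -> ("dir", {name: inode number}) or ("file", data)
inodes = {
    2:  ("dir", {"allan": 5}),
    5:  ("dir", {"gottlieb": 9}),
    9:  ("dir", {"courses": 12}),
    12: ("dir", {"os": 20}),
    20: ("dir", {"class-notes.html": 31}),
    31: ("file", "<html>...</html>"),
}
ROOT_INO = 2   # assumed inode number of the root directory


def lookup(path):
    """Resolve an absolute path; return (inode number, list of steps)."""
    ino = ROOT_INO
    steps = []
    for name in path.strip("/").split("/"):
        kind, data = inodes[ino]
        if kind != "dir":
            raise NotADirectoryError(name)
        steps.append("read directory of inode %d, find %r" % (ino, name))
        ino = data[name]                  # KeyError if the name is absent
        steps.append("get inode %d" % ino)
    return ino, steps


ino, steps = lookup("/allan/gottlieb/courses/os/class-notes.html")
for s in steps:
    print(s)
```

Each of the five components costs two steps, matching the "get inode, get file (or subdirectory)" description.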
Homework: 11
================ Start Lecture #10
================
4.3.3: Shared files (links)
- ``Shared'' files is Tanenbaum's terminology.
- More descriptive would be ``multinamed files''.
- If a file exists, one can create another name for it (quite
possibly in another directory).
- This is often called creating a (or another) link to the file.
- Unix has two flavors of links, hard links and
symbolic links or symlinks.
- Dos/windows has symlinks, but I don't believe it has hard links.
- These links often cause confusion, but I really believe that the
diagrams I created make it all clear.
Hard Links
- Symmetric multinamed files.
- When a hard link is created another name is created for
the same file.
- The two names have equal status.
- It is not, I repeat NOT true that one
name is the ``real name'' and the other is ``just a link''.
Start with an empty file system (i.e., just the root directory) and
then execute:
cd /
mkdir /A; mkdir /B
touch /A/X; touch /B/Y
We have the situation shown on the right.
Note that names are on edges not nodes.
When there are no multinamed files, it doesn't much matter.
Now execute
ln /B/Y /A/New
This gives the new diagram to the right.
At this point there are two equally valid names for the right hand
yellow file, /B/Y and /A/New. The fact that /B/Y was created first is
NOT detectable.
- Both point to the same inode.
- Only one owner (the one who created the file initially).
- One date, one set of permissions, one ... .
Assume Bob created /B and /B/Y and Alice created /A, /A/X, and /A/New.
Later Bob tires of /B/Y and removes it by executing
rm /B/Y
The file /A/New is still fine (see third diagram on the right).
But it is owned by Bob, who can't find it! If the system enforces
quotas Bob will likely be charged (as the owner), but he can neither
find nor delete the file (since Bob cannot unlink, i.e. remove, files
from /A).
Since hard links are only permitted to files (not directories) the
resulting file system is a dag (directed acyclic graph). That is there
are no directed cycles. We will now proceed to give away this useful
property by studying symlinks, which can point to directories.
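The hard link scenario can be replayed in a scratch directory. This sketch uses a temporary directory in place of / and assumes a file system that supports hard links; os.link plays the role of ln.

```python
import os
import tempfile

root = tempfile.mkdtemp()                 # stands in for /
os.mkdir(os.path.join(root, "A"))
os.mkdir(os.path.join(root, "B"))
y = os.path.join(root, "B", "Y")
new = os.path.join(root, "A", "New")
with open(y, "w") as f:
    f.write("data")

os.link(y, new)                           # ln /B/Y /A/New
same_inode = os.stat(y).st_ino == os.stat(new).st_ino
links_before = os.stat(new).st_nlink      # link count kept in the inode

os.unlink(y)                              # rm /B/Y
links_after = os.stat(new).st_nlink
with open(new) as f:
    survived = f.read()                   # the file is still there

print(same_inode, links_before, links_after, survived)
```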
Symlinks
- Asymmetric multinamed files.
- When a symlink is created another file is created, one
that points to the original file.
Again start with an empty file system and this time execute
cd /
mkdir /A; mkdir /B
touch /A/X; touch /B/Y
ln -s /B/Y /A/New
We now have an additional file /A/New, which is a symlink to /B/Y.
- The file named /A/New has the name /B/Y as its data
(not metadata).
- The system notices that /A/New is a diamond (symlink) so reading
/A/New will return the contents of /B/Y (assuming the reader has read
permission for /B/Y).
- If /B/Y is removed /A/New becomes invalid.
- If a new /B/Y is created, /A/New is once again valid.
- Removing /A/New has no effect on /B/Y.
- If a user has write permission for /B/Y, then writing /A/New is possible
and writes /B/Y.
The bottom line is that, with a hard link, a new name is created
that has equal status to the original name. This can cause some
surprises (e.g., you create a link but I own the file).
With a symbolic link a new file is created (owned by the
creator naturally) that points to the original file.
Question: Consider the hard link setup above. If Bob removes /B/Y
and then creates another /B/Y, what happens to /A/New?
Answer: Nothing. /A/New is still a file with the same contents as the
original /B/Y.
Question: What about with a symlink?
Answer: /A/New becomes invalid and then valid again, this time pointing
to the new /B/Y.
(It can't point to the old /B/Y as that is completely gone.)
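The symlink version of the question can be replayed the same way. As before, a temporary directory stands in for / and the sketch assumes a Unix-like system where os.symlink is permitted.

```python
import os
import tempfile

root = tempfile.mkdtemp()                 # stands in for /
os.mkdir(os.path.join(root, "A"))
os.mkdir(os.path.join(root, "B"))
y = os.path.join(root, "B", "Y")
new = os.path.join(root, "A", "New")
with open(y, "w") as f:
    f.write("old")

os.symlink(y, new)                        # ln -s /B/Y /A/New
target = os.readlink(new)                 # the symlink's data is a name
with open(new) as f:
    through_link = f.read()               # reading follows the link

os.unlink(y)                              # rm /B/Y
dangling = os.path.lexists(new) and not os.path.exists(new)

with open(y, "w") as f:                   # create a new /B/Y
    f.write("new")
with open(new) as f:
    after_recreate = f.read()             # valid again, new contents

print(target == y, through_link, dangling, after_recreate)
```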
What about symlinking a directory?
cd /
mkdir /A; mkdir /B
touch /A/X; touch /B/Y
ln -s /B /A/New
Is there a file named /A/New/Y ?
Yes.
What happens if you execute cd /A/New/.. ?
- Answer: Not clear!
- Clearly you are changing directory to the parent directory of
/A/New. But is that /A or /?
- The command interpreter I use offers both possibilities.
- cd -L /A/New/.. takes you to /A (L for logical).
- cd -P /A/New/.. takes you to / (P for physical).
- cd /A/New/.. takes you to /A (logical is the default).
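The two answers correspond roughly to two ways of simplifying the path, which Python happens to expose as separate functions: os.path.normpath cancels New/.. textually (the logical view), while os.path.realpath follows the symlink first (the physical view). This is an analogy to the shell's behavior, not a description of how any shell implements cd. A temporary directory stands in for /.

```python
import os
import tempfile

root = os.path.realpath(tempfile.mkdtemp())        # stands in for /
os.mkdir(os.path.join(root, "A"))
os.mkdir(os.path.join(root, "B"))
os.symlink(os.path.join(root, "B"),
           os.path.join(root, "A", "New"))         # ln -s /B /A/New

path = os.path.join(root, "A", "New", "..")
logical = os.path.normpath(path)     # textual: the parent of /A/New is /A
physical = os.path.realpath(path)    # follow New to /B, whose parent is /

print(logical == os.path.join(root, "A"), physical == root)
```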
What did I mean when I said the pictures made it all clear?
Answer: From the file system perspective it is clear. Not always so
clear what programs will do.
4.3.4: Disk space management
All general purpose systems use a (non-demand) paging
algorithm for file storage. Files are broken into fixed size pieces,
called blocks that can be scattered over the disk.
Note that although this is paging, it is never called paging.
The file is completely stored on the disk, i.e., it is not
demand paging.
- Actually it is more complicated as various optimizations are
performed to try to have consecutive blocks of a single file stored
consecutively on the disk.
- One can imagine systems that store only parts of the file on disk
with the rest on tertiary storage (some kind of tape).
- This would be just like demand paging.
- Perhaps NASA does this with their huge datasets.
- Caching (as done for example in microprocessors) is also the same
as demand paging.
- We unify these concepts in the computer architecture course.
Choice of block size
- We discussed this before when studying page size.
- Current commodity disk characteristics (not for laptops) result in
about 15ms to transfer the first byte and 10K bytes per ms for
subsequent bytes (if contiguous).
- We will explain the following terms in the I/O chapter.
- Rotation rate is 5400, 7200, or 10,000 RPM (15K just now
available).
- Recall that 6000 RPM is 100 rev/sec or one rev
per 10ms. So half a rev (the average time to rotate to a
given point) is 5ms.
- Transfer rates around 10MB/sec = 10KB/ms.
- Seek time around 10ms.
- This favors large blocks, 100KB or more.
- But the internal fragmentation would be severe since many files
are small.
- Multiple block sizes have been tried as have techniques to try to
have consecutive blocks of a given file near each other.
- Typical block sizes are 4KB and 8KB.
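The tradeoff can be made concrete with the figures quoted above (15ms to the first byte, 10KB per ms afterwards, taking 10KB as 10,000 bytes). The function names are hypothetical and the model ignores everything except those two numbers.

```python
def block_time_ms(block_bytes, first_byte_ms=15.0, bytes_per_ms=10_000):
    """Time to read one block: fixed positioning cost plus transfer time."""
    return first_byte_ms + block_bytes / bytes_per_ms


def effective_rate(block_bytes):
    """Achieved bytes per ms when every block needs a fresh seek."""
    return block_bytes / block_time_ms(block_bytes)


def internal_waste(file_bytes, block_bytes):
    """Unused bytes in the last block of a file."""
    return (-file_bytes) % block_bytes


for size in (4096, 8192, 100_000):
    print(size, round(block_time_ms(size), 2), round(effective_rate(size)))
```

With 100,000-byte blocks the disk delivers 40% of its peak rate, but a one-byte file then wastes 99,999 bytes. With 4KB blocks the rate is under 3% of peak.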
Storing free blocks
There are several possibilities.
- An in-memory bit map.
- One bit per block
- If blocksize=4K, 1 bit per 32K bits
- So 32GB disk (potentially all free) needs 1MB ram
- Bit map paged in.
- Linked list with each free block pointing to next: Extra disk
access per block.
- Linked list with links stored contiguously, i.e. an array of
pointers to free blocks. Store this in free blocks and keep one in memory.
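A minimal sketch of the bit map alternative, one bit per block. The class name FreeBitmap and the convention that a 0 bit means free are assumptions for illustration; the linear scan in allocate is the simplest possible policy, not a recommendation.

```python
class FreeBitmap:
    """One bit per disk block; 0 means free, 1 means in use (assumed)."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)

    def allocate(self):
        """Return the lowest-numbered free block and mark it in use."""
        for i in range(self.n_blocks):
            if not self.bits[i // 8] & (1 << (i % 8)):
                self.bits[i // 8] |= 1 << (i % 8)
                return i
        raise OSError("no free blocks")

    def free(self, i):
        """Mark block i free again."""
        self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF


# Size check for the figures above: 32GB disk, 4KB blocks.
blocks_on_disk = 32 * 2**30 // 4096
print(len(FreeBitmap(blocks_on_disk).bits))   # bytes of RAM for the map
```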
4.3.5: File System reliability
Bad blocks on disks
Not so much of a problem now. Disks are more reliable and, more
importantly, disks take care of the bad blocks themselves. That is,
there is no OS support needed to map out bad blocks. But if a block
goes bad, the data is lost (not always).
Backups
All modern systems support full and
incremental dumps.
- A level 0 dump is called a full dump (i.e., dumps everything).
- A level n dump (n>0) is called an incremental dump and the
standard unix utility dumps
all files that have changed since the most recent dump at a lower
level (the previous level n-1 dump if levels are used in sequence).
- Other dump utilities dump all files that have changed since the
last level n dump.
- Keep on the disk the dates of the most recent level i dumps
for all i. In Unix this is traditionally in /etc/dumpdates.
- What about the nodump attribute?
- Default policy (for Linux at least) is to dump such files
anyway when doing a full dump, but not dump them for incremental
dumps.
- Another way to say this is the nodump attribute is honored for
level n dumps if n>=1.
- The dump command has an option to override the default policy
(can specify k so that nodump is honored for level n dumps if n>=k).
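A sketch of how the recorded dump dates select the files for an incremental dump. It assumes the reference point is the most recent dump at any lower level, which is the previous level n-1 dump when levels are taken in sequence. The function name and the use of plain numbers for times are illustrative only, and the nodump attribute is ignored.

```python
def files_to_dump(mtimes, dumpdates, level):
    """Select files for a dump at the given level.

    mtimes:    {path: last modification time}
    dumpdates: {level: time of the most recent dump at that level}
    """
    lower = [t for lvl, t in dumpdates.items() if lvl < level]
    since = max(lower) if lower else float("-inf")   # level 0: everything
    return sorted(p for p, m in mtimes.items() if m > since)


mtimes = {"a": 5, "b": 15, "c": 25}
dumpdates = {0: 10, 1: 20}      # level 0 taken at time 10, level 1 at time 20
for level in (0, 1, 2):
    print(level, files_to_dump(mtimes, dumpdates, level))
```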
Consistency
- Fsck (file system check) and chkdsk (check disk)
- If the system crashed, it is possible that not all metadata was
written to disk. As a result the file system may be inconsistent.
These programs check, and often correct, inconsistencies.
- Scan all inodes (or fat) to check that each block is in exactly
one file, or on the free list, but not both.
- Also check that the number of links to each file (part of the
metadata in the file's inode) is correct (by
looking at all directories).
- Other checks as well.
- Offers to ``fix'' the errors found (for most errors).
- ``Journaling'' file systems
- An idea from database theory (transaction logs).
- Eliminates the need for fsck.
- NTFS has had journaling from day 1.
- Many Unix systems have it. IBM's AIX converted to journaling
in the early 90s.
- Linux does not yet have journaling, a serious shortcoming. It
is under very active development.
- FAT does not have journaling.
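The block check performed by fsck, each block in exactly one file or on the free list but not both, can be sketched on a toy file system. The data layout (a dict of block lists and a free list) and the function name are assumptions for illustration.

```python
from collections import Counter


def check_blocks(n_blocks, files, free_list):
    """Return {block: problem} for every inconsistent block.

    files:     {file name: list of block numbers}
    free_list: list of block numbers believed free
    """
    in_use = Counter(b for blocks in files.values() for b in blocks)
    free = Counter(free_list)
    problems = {}
    for b in range(n_blocks):
        u, f = in_use[b], free[b]
        if u + f == 0:
            problems[b] = "missing"                 # neither used nor free
        elif u > 1:
            problems[b] = "in more than one file"
        elif u and f:
            problems[b] = "in a file and on the free list"
        elif f > 1:
            problems[b] = "on the free list twice"
    return problems


print(check_blocks(6, {"f": [0, 1], "g": [1, 2]}, [3, 3, 2]))
```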
4.3.6: File System Performance
Buffer cache or block cache
An in-memory cache of disk blocks
- Demand paging again!
- Clearly good for reads as it is much faster to read memory than to
read a disk.
- What about writes?
- Must update the buffer cache (otherwise subsequent reads will
return the old value).
- The major question is whether the system should also update
the disk block.
- The simplest alternative is write through
in which each write is performed at the disk before it is declared
complete.
- Since floppy disk drivers adopt a write through policy,
one can remove a floppy as soon as an operation is complete.
- Write through results in heavy I/O write traffic.
- If a block is written many times all the writes are
sent to the disk. Only the last one was ``needed''.
- If a temporary file is created, written, read, and
deleted, all the disk writes were wasted.
- Used by DOS.
- The other alternative is write back in which
the disk is not updated until the in-memory copy is
evicted (i.e., the replacement question).
- Much less write traffic than write through.
- Trouble if a crash occurs.
- Used by Unix and others for hard disks.
- Can write dirty blocks periodically, say every minute.
This limits the possible damage, but also the possible gain.
- Ordered writes. Do not write a block containing pointers
until the block pointed to has been written. Especially if
the block pointed to contains pointers since the version of
these pointers on disk may be wrong and you are giving a file
pointers to some random blocks.
- Research in ``log-structured'' file systems tries to make all
writes sequential (i.e., writes are treated as if going to a log file).
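The difference in write traffic between write through and write back can be seen in a toy simulation. The class BlockCache, its LRU replacement, and the dict standing in for the disk are all assumptions for illustration; periodic flushing and ordered writes are not modeled.

```python
from collections import OrderedDict


class BlockCache:
    """Tiny LRU block cache that counts writes reaching the 'disk'."""

    def __init__(self, capacity, write_through):
        self.capacity = capacity
        self.write_through = write_through
        self.cache = OrderedDict()    # block -> (data, dirty), oldest first
        self.disk = {}
        self.disk_writes = 0

    def _disk_write(self, block, data):
        self.disk[block] = data
        self.disk_writes += 1

    def _evict(self):
        while len(self.cache) > self.capacity:
            block, (data, dirty) = self.cache.popitem(last=False)
            if dirty:                 # write back happens only on eviction
                self._disk_write(block, data)

    def write(self, block, data):
        if self.write_through:
            self._disk_write(block, data)      # disk updated on every write
            self.cache[block] = (data, False)
        else:
            self.cache[block] = (data, True)   # only the memory copy changes
        self.cache.move_to_end(block)
        self._evict()

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = (self.disk.get(block), False)
            self._evict()
        self.cache.move_to_end(block)
        return self.cache[block][0]
```

Writing the same block 100 times costs 100 disk writes under write through and none under write back until the block is evicted. The price is that a crash before eviction loses the data.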
Homework: 12.
4.4: Security
Very serious subject. Could easily be a course in itself. My
treatment is very brief.
4.4.1: Security environment
- Accidental data loss
- Fires, floods, etc
- System errors
- Human errors
- Intruders
- Sadly an enormous problem.
- The NYU ``greeting'' no longer includes the word ``welcome''
since that was somehow
interpreted as some sort of license to break in.
- Indeed, the greeting is not friendly.
- It once was.
- Below I have a nasty version from a few years ago.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
WARNING: UNAUTHORIZED PERSONS ........ DO NOT PROCEED
~~~~~~~ ~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
This computer system is operated by New York University (NYU) and may be
accessed only by authorized users. Authorized users are granted specific,
limited privileges in their use of the system. The data and programs
in this system may not be accessed, copied, modified, or disclosed without
prior approval of NYU. Access and use, or causing access and use, of this
computer system by anyone other than as permitted by NYU are strictly pro-
hibited by NYU and by law and may subject an unauthorized user, including
unauthorized employees, to criminal and civil penalties as well as NYU-
initiated disciplinary proceedings. The use of this system is routinely
monitored and recorded, and anyone accessing this system consents to such
monitoring and recording. Questions regarding this access policy or other
topics should be directed (by e-mail) to comment@nyu.edu or (by phone) to
212-998-3333.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
================ Start Lecture #11
================
Note:
There was a typo or two in the symlinking directories section (picture
should show /B not /B/new, and cd -P goes to / not /B). The giant
page and the lecture-10 page have been fixed.
- Privacy
- An enormously serious (societal) subject.
4.4.2: Famous flaws
- Good bathroom reading.
- Trojan horse attack: Planting a program of your choosing in place
of a well known program and having an unsuspecting user execute it.
- Some trivial examples:
- Install a new version of login that does everything normal,
but then mails the username and plaintext password to
gottlieb@nyu.edu.
- Put a new version of ls in your home directory and ask the
sysadmin for help. ``Hopefully he types ls while in your
directory and has . early in his path''.
4.4.3: The internet worm
- A worm divides itself and sends one portion to another machine.
- Different from a virus (see below).
- The famous internet (Morris) worm exploited silly bugs in unix to
crack systems automatically.
- Specifically, it exploited careless use of gets(), which does not
check the length of its argument.
- Attacked Sun and Vax unix systems.
- NYU was hit hard; but not our lab, which had IBM RTs.
4.4.4: Generic Security attacks
More bathroom reading
Viruses
- A virus attaches itself to (``infects'') a part of the system so
that it remains until explicitly removed. In particular, rebooting
the system does not remove it.
- Attach to an existing program or to a portion of the disk that is
used for booting.
- When the virus is run it tries to attach itself to other files.
- Often implemented the same way as a binary patch: Change the first
instruction to jump to somewhere where you put the original first
instruction, then your patch, then a jump back to the second
instruction.
4.4.5: Design principles for security
More bathroom reading
4.4.6: User authentication
Passwords
- Software to crack passwords is publicly available.
- Use this software for prevention.
- One way to prevent cracking passwords is to use instead one time
passwords: e.g. SecurId.
- Current practice here and elsewhere is that when you telnet to a
remote machine, your password is sent in the clear along the ethernet.
So maybe .rhosts aren't that bad after all.
Physical identification
Opens up a bunch of privacy questions. For example,
should we require fingerprinting for entering the subway?
Homework: 15, 16, 19, 24.
4.5: Protection mechanisms
4.5.1: Protection domains
- We distinguish between Objects, which are
passive, and subjects, which are active.
- For example, processes (subjects) examine files (objects).
- Protection domain: A collection of (object,
rights) pairs.
- At any given time a subject is given a protection domain that
specifies its rights.
- In Unix a subject's domain is determined by its (uid, gid) (and
whether it is in kernel mode).
- Generates a matrix called the protection or permission matrix.
- Each row corresponds to a domain (i.e. a subject at some time).
- Each column corresponds to an object (e.g., a file or device).
- Each entry gives the rights the domain/subject has on this object.
- Can model Unix suid/sgid by permitting columns whose headings are
domains and the only right possible in the corresponding entries is
entry. If this right is present, the subject corresponding to the row
can s[ug]id to the new domain, which corresponds to the column.
4.5.2: Access Control Lists (ACLs)
Keep the columns of the matrix separate and drop the null entries.
4.5.3: Capabilities
Keep the rows of the matrix separate and drop the null entries.
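Both representations can be derived from one sparse protection matrix. The domains, objects, and rights below are invented for illustration, as are the function names.

```python
# Protection matrix stored sparsely: (domain, object) -> set of rights.
matrix = {
    ("alice", "file1"):   {"read", "write"},
    ("alice", "printer"): {"write"},
    ("bob",   "file1"):   {"read"},
    ("bob",   "printer"): set(),          # a null entry, to be dropped
}


def acls(matrix):
    """Columns of the matrix: for each object, who may do what."""
    out = {}
    for (domain, obj), rights in matrix.items():
        if rights:
            out.setdefault(obj, {})[domain] = rights
    return out


def capabilities(matrix):
    """Rows of the matrix: for each domain, what it may touch and how."""
    out = {}
    for (domain, obj), rights in matrix.items():
        if rights:
            out.setdefault(domain, {})[obj] = rights
    return out


print(acls(matrix))
print(capabilities(matrix))
```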
4.5.4: Protection models
Give objects and subjects security levels and enforce:
- A subject may read only those objects whose level is at or below
her own.
- A subject may write only those objects whose level is at or
above her own.
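The two rules amount to a pair of comparisons. In this sketch security levels are plain integers with larger meaning more secret; that encoding and the function names are assumptions for illustration.

```python
def may_read(subject_level, object_level):
    """Read only objects at or below the subject's level."""
    return object_level <= subject_level


def may_write(subject_level, object_level):
    """Write only objects at or above the subject's level."""
    return object_level >= subject_level


UNCLASSIFIED, SECRET, TOP_SECRET = 0, 1, 2    # assumed example levels

print(may_read(SECRET, UNCLASSIFIED), may_read(SECRET, TOP_SECRET))
print(may_write(SECRET, TOP_SECRET), may_write(SECRET, UNCLASSIFIED))
```

Together the rules keep information from flowing downward: a subject can neither read above its level nor copy what it has read into an object below its level.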
4.5.5: Covert channels
The bad guys are getting smart and use other means of getting out
information. For example give good service for a zero and bad for a
one. The figure of merit is the rate at which bits can be sent,
i.e. the bandwidth of the covert channel.
Homework: 20.
Allan Gottlieb