================ Start Lecture #19
================
What about symlinking a directory?
cd /
mkdir /A; mkdir /B
touch /A/X; touch /B/Y
ln -s /B /A/New
Is there a file named /A/New/Y ?
Yes.
What happens if you execute cd /A/New/.. ?
Answer: Not clear!
What did I mean when I said the pictures made it all clear?
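Answer: From the filesystem perspective it is clear: /A/New is the
directory /B, whose .. entry is /, so the kernel resolves /A/New/.. to /.
It is not always so clear what programs will do; for example, bash by
default remembers the path as typed and treats cd /A/New/.. as cd /A.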
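To make the two answers concrete, here is a minimal Python sketch, assuming a POSIX system. It rebuilds the example under a scratch directory instead of / (so it needs no root access), and the function name `dotdot_through_symlink` is hypothetical. After `os.chdir`, `os.getcwd` reports where the kernel actually put us, while `os.path.normpath` does the purely textual simplification that a shell's logical cd performs.

```python
import os
import tempfile

def dotdot_through_symlink():
    """Rebuild the A, B, A/New -> B example in a scratch directory and compare
    the kernel's (physical) and a shell's (logical) reading of A/New/.. ."""
    saved_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)  # in case the temp dir itself sits behind a symlink
        a, b = os.path.join(root, "A"), os.path.join(root, "B")
        os.mkdir(a)                                # mkdir A; mkdir B
        os.mkdir(b)
        open(os.path.join(a, "X"), "w").close()    # touch A/X; touch B/Y
        open(os.path.join(b, "Y"), "w").close()
        os.symlink(b, os.path.join(a, "New"))      # ln -s B A/New
        y_visible = os.path.exists(os.path.join(a, "New", "Y"))

        target = os.path.join(a, "New", "..")
        try:
            os.chdir(target)         # kernel: follow New -> B, then B's .. entry
            physical = os.getcwd()
        finally:
            os.chdir(saved_cwd)      # leave the process where we found it
        logical = os.path.normpath(target)         # textual: "A/New/.." -> "A"
    return {"root": root, "physical": physical, "logical": logical, "y_visible": y_visible}

result = dotdot_through_symlink()
print(result["physical"] == result["root"])                    # True: the parent of B
print(result["logical"] == os.path.join(result["root"], "A"))  # True: A
```

The physical answer is the scratch root (the parent of B) and the logical answer is its A subdirectory, mirroring / versus /A in the lecture example.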
4.3.4: Disk space management
All general purpose systems use a (non-demand) paging
algorithm for file storage. Files are broken into fixed-size pieces,
called blocks, that can be scattered over the disk.
Note that although this is paging, it is never called paging.
The file is completely stored on the disk
- One can imagine systems that store only parts of the file on disk
with the rest on tertiary storage (some kind of tape).
- This would be just like demand paging.
- Perhaps NASA does this with their huge datasets.
- Caching (as done for example in microprocessors) is also the same
as demand paging.
- We unify these concepts in the computer architecture course.
Choice of block size
- We discussed this before when studying page size.
- Current commodity disk characteristics (not for laptops) result in
about 15ms to transfer the first byte and 10K bytes per ms for
subsequent bytes (if contiguous).
- We will explain the following terms in the I/O chapter:
- Rotation rate is 5400, 7200, or 10,000 RPM (15K just now
available).
- Recall that 6000 RPM is 100 rev/sec or one rev
per 10ms. So half a rev (the average time to rotate to a
given point) is 5ms.
- Transfer rates around 10MB/sec = 10KB/ms.
- Seek time around 10ms.
- This favors large blocks, 100KB or more.
- But the internal fragmentation would be severe since many files
are small.
- Multiple block sizes have been tried, as have techniques for placing
consecutive blocks of a given file near each other on the disk.
- Typical block sizes are 4KB-8KB.
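The arithmetic behind ``this favors large blocks'' can be checked with a short Python sketch. It assumes the round numbers above (10ms seek, half a rotation at 6000 RPM, 10KB/ms transfer) and charges every block a full seek plus rotational delay, i.e., blocks are assumed not to be contiguous. The function names are hypothetical.

```python
SEEK_MS = 10.0         # average seek time, from the notes
RATE_KB_PER_MS = 10.0  # ~10MB/sec sustained transfer, from the notes

def half_rotation_ms(rpm):
    """Average rotational delay: half of one revolution."""
    ms_per_rev = 60_000.0 / rpm
    return ms_per_rev / 2

def block_read_ms(block_kb, rpm=6000):
    """Seek + average rotational delay + transfer time for one block."""
    return SEEK_MS + half_rotation_ms(rpm) + block_kb / RATE_KB_PER_MS

def effective_kb_per_ms(block_kb, rpm=6000):
    """Useful data delivered per ms when every block pays a full access."""
    return block_kb / block_read_ms(block_kb, rpm)

for kb in (1, 4, 8, 100):
    rate = effective_kb_per_ms(kb)
    print(f"{kb:>4}KB block: {block_read_ms(kb):5.1f}ms, "
          f"{rate:5.2f}KB/ms ({rate / RATE_KB_PER_MS:.0%} of peak)")
```

With these numbers a 100KB block costs 25ms and delivers 40% of the peak transfer rate, while a 4KB block delivers under 3%; internal fragmentation pulls in the other direction.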
Storing free blocks
- In-memory bit map.
- One bit per block
- If blocksize=4KB (i.e., 32K bits), 1 bit of bit map per 32K bits of disk
- So a 32GB disk (potentially all free) needs 1MB of RAM
- The bit map itself can be paged in
- Linked list with each free block pointing to the next: an extra disk
access per block
- Linked list with links stored contiguously, i.e., an array of
pointers to free blocks. Store these arrays in free blocks themselves
and keep one in memory.
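A minimal Python sketch of the in-memory bit map, assuming one bit per block with a set bit meaning ``in use'' (the opposite convention works equally well). The class and function names are hypothetical; a real file system keeps this structure in kernel memory and writes it back to disk.

```python
class FreeBlockBitmap:
    """One bit per disk block; set = in use, clear = free (assumed convention),
    so a freshly created bitmap has every block free."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)  # all zero: all free

    def is_free(self, block):
        return ((self.bits[block // 8] >> (block % 8)) & 1) == 0

    def allocate(self):
        """Mark the lowest-numbered free block in use and return it; None if full."""
        for i, byte in enumerate(self.bits):
            if byte == 0xFF:               # all 8 blocks in this byte are in use
                continue
            for bit in range(8):
                block = i * 8 + bit
                if block < self.nblocks and ((byte >> bit) & 1) == 0:
                    self.bits[i] |= 1 << bit
                    return block
        return None

    def free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8))

def bitmap_bytes(disk_bytes, block_bytes):
    """RAM needed for the bit map: one bit per block, eight bits per byte."""
    return disk_bytes // block_bytes // 8
```

For example, `bitmap_bytes(32 * 2**30, 4 * 2**10)` is 2**20 bytes, the 1MB figure above.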
4.3.5: File System reliability
Bad blocks on disks
Not so much of a problem now. Disks are more reliable and, more
importantly, disks take care of bad blocks themselves.
Backups
All modern systems support full and
incremental dumps.
- A level 0 dump is called a full dump (i.e., dumps everything).
- A level n dump (n>0) is called an incremental dump and dumps
all files that have changed since the most recent dump at a lower
level (typically the previous level n-1 dump).
- Keep on the disk the dates of the most recent level i dumps
for all i. In Unix this is traditionally in /etc/dumpdates.
- What about the nodump attribute?
- Default policy (for Linux at least) is to dump such files
anyway when doing a full dump, but not dump them for incremental.
- Another way to say this is the nodump attribute is honored for
level n dumps if n>0.
- The dump command has an option to override the default policy
(can specify k so that nodump is honored for level n dumps if n>=k).
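The selection rule for dumps can be sketched in Python. This assumes the usual Unix dump semantics: a level n dump takes every file modified since the most recent dump at any lower level, and the nodump attribute is honored at or above an ``honor level'' that defaults to 1, matching the default policy described above (full dumps ignore nodump, incrementals honor it). The dictionary standing in for /etc/dumpdates and the function names are hypothetical.

```python
def reference_time(level, dumpdates):
    """Most recent dump time at any level below `level` (0 if there is none).
    dumpdates: level -> time of the latest dump at that level."""
    return max((t for lvl, t in dumpdates.items() if lvl < level), default=0)

def files_to_dump(level, files, dumpdates, honor_level=1):
    """files: name -> (mtime, nodump flag). Returns the sorted names to dump."""
    since = reference_time(level, dumpdates)
    chosen = []
    for name, (mtime, nodump) in files.items():
        if nodump and level >= honor_level:
            continue                  # nodump honored at or above the honor level
        if mtime > since:
            chosen.append(name)       # changed since the lower-level dump
    return sorted(chosen)

files = {"a": (5, False), "b": (15, False), "c": (25, True), "d": (25, False)}
dumpdates = {0: 10, 1: 20}            # level 0 at time 10, level 1 at time 20
print(files_to_dump(2, files, dumpdates))  # only "d" changed since time 20
```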
Consistency
- Fsck (file system check) and chkdsk
- If the system crashed, it is possible that not all metadata was
written to disk. As a result the filesystem may be inconsistent.
These programs check, and often correct, inconsistencies.
- Scan all inodes (or the FAT) to check that each block is in exactly
one file, or on the free list, but not both.
- Also check that the number of links to each file (part of the
metadata in the file's inode) is correct (by
looking at all directories).
- Other checks as well.
- Offers to ``fix'' the errors found (for most errors).
- ``Journaling'' file systems (not on 202 exams)
- An idea from database theory (transaction logs).
- Eliminates the need for a full fsck after a crash (the log is
replayed instead).
- NTFS has had journaling from day 1.
- Many Unix systems have it. IBM's AIX converted to journaling
in the early 90s.
- Linux does not yet have journaling, a serious shortcoming. It
is under very active development.
- FAT does not have journaling.
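The block-accounting check that fsck performs can be sketched with two tables of counters, one for blocks in files and one for blocks on the free list, plus a recount of directory entries for the link-count check. This is a minimal Python sketch on hypothetical in-memory data (inode number to block list, directory to entries), not the code of any real fsck.

```python
from collections import Counter

def check_blocks(nblocks, inodes, free_list):
    """inodes: inode number -> list of block numbers; free_list: free block numbers.
    Returns the block numbers grouped by the kind of inconsistency found."""
    in_use = Counter()
    for blocks in inodes.values():
        in_use.update(blocks)        # times each block appears in some file
    free = Counter(free_list)        # times each block appears on the free list
    report = {"missing": [], "claimed_twice": [], "free_twice": [], "used_and_free": []}
    for b in range(nblocks):
        if in_use[b] == 0 and free[b] == 0:
            report["missing"].append(b)        # lost: neither allocated nor free
        if in_use[b] > 1:
            report["claimed_twice"].append(b)  # more than one file claims it
        if free[b] > 1:
            report["free_twice"].append(b)
        if in_use[b] and free[b]:
            report["used_and_free"].append(b)
    return report

def check_link_counts(stored_links, directories):
    """stored_links: inode -> link count recorded in the inode;
    directories: directory -> {entry name: inode}.
    Returns {inode: (stored, actual)} for every mismatch."""
    actual = Counter()
    for entries in directories.values():
        actual.update(entries.values())  # each entry is one link to its inode
    return {ino: (n, actual[ino]) for ino, n in stored_links.items() if n != actual[ino]}
```

A real fsck would then offer to ``fix'' each kind of error, e.g., by putting missing blocks back on the free list or rewriting a wrong link count.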
4.3.6: File System Performance
Buffer cache or block cache
An in-memory cache of disk blocks
- Demand paging again!
- Clearly good for reads as it is much faster to read memory than to
read a disk.
- What about writes?
- Must update the buffer cache (otherwise subsequent reads will
return the old value).
- The major question is whether the system should also update
the disk block.
- The simplest alternative is write through,
in which all writes are performed to the disk before being declared
complete.
- Can remove a floppy as soon as the operation is done.
- Heavy I/O write traffic
- If a block is written many times, all the writes are
sent to the disk. Only the last one was ``needed''.
- If a temporary file is created, written, read, and
deleted, all the disk writes were wasted.
- DOS
- The other alternative is write back in which
the disk is not updated until the in-memory copy is
evicted (i.e., the replacement question).
- Much less write traffic than write through.
- Trouble if a crash occurs.
- Unix
- Can write dirty blocks periodically, say every minute.
This limits the possible damage, but also the possible gain.
- Ordered writes. Do not write a block containing pointers
until the block pointed to has been written. This matters most when
the block pointed to itself contains pointers: the version of those
pointers on disk may be wrong, so you would be giving a file
pointers to some random blocks.
- Research in ``log-structured'' file systems tries to make all
writes sequential (i.e., writes are treated as if going to a log file).
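A minimal Python sketch of a buffer cache with LRU replacement that supports both write policies, assuming a dict stands in for the disk. The class name and the `sync` method are hypothetical; calling `sync` from a timer corresponds to writing dirty blocks periodically, say every minute.

```python
from collections import OrderedDict

class BufferCache:
    """LRU cache of disk blocks with a write-through or write-back policy."""

    def __init__(self, disk, capacity, write_back=True):
        self.disk = disk             # dict standing in for the device: block -> data
        self.capacity = capacity     # blocks the cache can hold (>= 1)
        self.write_back = write_back
        self.blocks = OrderedDict()  # block -> [data, dirty]; least recently used first
        self.disk_writes = 0

    def _write_disk(self, block, data):
        self.disk[block] = data
        self.disk_writes += 1

    def _evict_if_full(self):
        # Replace the least recently used block; a dirty victim must reach the disk.
        while len(self.blocks) > self.capacity:
            block, (data, dirty) = self.blocks.popitem(last=False)
            if dirty:
                self._write_disk(block, data)

    def read(self, block):
        if block in self.blocks:               # hit: no disk access
            self.blocks.move_to_end(block)
            return self.blocks[block][0]
        data = self.disk.get(block)            # miss: fetch from the "disk"
        self.blocks[block] = [data, False]
        self._evict_if_full()
        return data

    def write(self, block, data):
        # Always update the cached copy so later reads see the new value.
        self.blocks[block] = [data, self.write_back]
        self.blocks.move_to_end(block)
        if not self.write_back:                # write through: disk updated now
            self._write_disk(block, data)
        self._evict_if_full()

    def sync(self):
        """Flush every dirty block to the disk."""
        for block, entry in self.blocks.items():
            if entry[1]:
                self._write_disk(block, entry[0])
                entry[1] = False
```

Writing the same block ten times costs ten disk writes under write through but none under write back until `sync` or eviction: exactly the traffic saving, and the crash exposure, described above.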
Homework: 12.
4.4: Security
Very serious subject. Could easily be a course in itself. My
treatment is very brief.
4.4.1: Security environment
- Accidental data loss
- Fires, floods, etc
- System errors
- Human errors
- Intruders
- Sadly an enormous problem.
- The NYU ``greeting'' no longer includes the word ``welcome''
since that was somehow
interpreted as some sort of license to break in.
Indeed, the greeting is not friendly.
It once was.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
WARNING: UNAUTHORIZED PERSONS ........ DO NOT PROCEED
~~~~~~~ ~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
This computer system is operated by New York University (NYU) and may be
accessed only by authorized users. Authorized users are granted specific,
limited privileges in their use of the system. The data and programs
in this system may not be accessed, copied, modified, or disclosed without
prior approval of NYU. Access and use, or causing access and use, of this
computer system by anyone other than as permitted by NYU are strictly pro-
hibited by NYU and by law and may subject an unauthorized user, including
unauthorized employees, to criminal and civil penalties as well as NYU-
initiated disciplinary proceedings. The use of this system is routinely
monitored and recorded, and anyone accessing this system consents to such
monitoring and recording. Questions regarding this access policy or other
topics should be directed (by e-mail) to comment@nyu.edu or (by phone) to
212-998-3333.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Privacy
- An enormously serious (societal) subject
4.4.2: Famous flaws
- Good bathroom reading
- Trojan horse attack: Placing a program of your choosing in place
of a well known program and having an unsuspecting user execute it.
- Some trivial examples
- Install a new version of login that does everything normal but
mails the username and plaintext password to gottlieb@nyu.edu.
- Put a new version of ls in your home directory and ask the
sysadmin for help. ``Hopefully he types ls while in your
directory and has . early in his path''.
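Why the position of . in the path matters can be seen from how a command name is looked up: the first directory in the search path that contains an executable of that name wins. This Python sketch uses `shutil.which` with an explicit path and assumes a POSIX system. The two temporary directories are hypothetical stand-ins for a system directory such as /bin and the attacker's home directory (reached via .), and `make_fake_executable` is a hypothetical helper.

```python
import os
import shutil
import tempfile

def make_fake_executable(directory, name):
    """Hypothetical helper: create an executable shell script `name` in `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path

def first_on_path(command, directories):
    """Return the file a PATH search would run: the first match in directory order."""
    return shutil.which(command, path=os.pathsep.join(directories))

with tempfile.TemporaryDirectory() as system_bin, \
     tempfile.TemporaryDirectory() as attacker_home:
    genuine = make_fake_executable(system_bin, "ls")
    trojan = make_fake_executable(attacker_home, "ls")
    # Sysadmin sitting in attacker_home with "." ahead of the system directory:
    print(first_on_path("ls", [attacker_home, system_bin]) == trojan)   # True
    # With "." last (or absent), the genuine ls is found first:
    print(first_on_path("ls", [system_bin, attacker_home]) == genuine)  # True
```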
4.4.3: The internet worm
- Not a virus.
- A worm divides itself and sends one portion to another machine.
- A virus attaches itself to (``infects'') a part of the system so
that it remains until explicitly removed. In particular, rebooting
the system does not remove it.
- The famous internet (Morris) worm exploited silly bugs in Unix to
crack systems automatically.
- Specifically, it exploited careless use of gets() (in the finger
daemon), which does not check the length of its input against the
size of the buffer it fills.
- Attacked Sun and VAX Unix systems.
- NYU was hit hard.
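Python is memory-safe, so the overflow that gets() permits can only be simulated. This toy model (all names hypothetical; a real stack layout is more involved) places an 8-byte input buffer directly in front of a 4-byte saved return address, then compares an unchecked copy like gets() with a bounded copy like fgets(), which stores at most size-1 characters plus a terminating NUL.

```python
BUF_SIZE = 8  # hypothetical size of the daemon's input buffer

def make_frame():
    """Toy stack frame: an input buffer followed by a 4-byte saved return address."""
    return bytearray(BUF_SIZE) + bytearray(b"RET!")

def unchecked_copy(frame, data):
    """Like gets(): copy every input byte, however long the input is."""
    frame[0:len(data)] = data

def checked_copy(frame, data):
    """Like fgets(buf, BUF_SIZE, stream): at most BUF_SIZE - 1 bytes, then a NUL."""
    n = min(len(data), BUF_SIZE - 1)
    frame[0:n] = data[:n]
    frame[n] = 0

attack = b"A" * 12  # longer than the buffer
f1, f2 = make_frame(), make_frame()
unchecked_copy(f1, attack)
checked_copy(f2, attack)
print(bytes(f1[BUF_SIZE:BUF_SIZE + 4]))  # return address overwritten by input bytes
print(bytes(f2[BUF_SIZE:BUF_SIZE + 4]))  # return address intact
```

In the real attack the overwritten return address pointed into code the worm had supplied as part of its oversized input.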
4.4.4: Generic Security attacks
More bathroom reading
Viruses
- Attach to an existing program or to a portion of the disk that is
used for booting.
- When the virus is run, it tries to attach itself to other files.
- Often implemented the same way as a binary patch: Change the first
instruction to jump to somewhere where you put the original first
instruction, then your patch, then a jump back to the second
instruction.
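The binary-patch technique can be illustrated with a hypothetical toy instruction set (EXEC, JMP, HALT) and a tiny interpreter in Python. This simulates the control flow only; it is not real machine code.

```python
def run(program, max_steps=100):
    """Execute a toy program; return the labels of the EXEC instructions in order."""
    trace, pc, steps = [], 0, 0
    while 0 <= pc < len(program) and steps < max_steps:
        op, arg = program[pc]
        steps += 1
        if op == "HALT":
            break
        if op == "JMP":
            pc = arg
        else:                 # "EXEC": do some work, recorded by its label
            trace.append(arg)
            pc += 1
    return trace

def infect(program, payload):
    """Binary-patch style infection as described in the notes."""
    patched = list(program)
    first = patched[0]
    patched[0] = ("JMP", len(program))  # first instruction now jumps to the appended code
    patched.append(first)               # the relocated original first instruction
    patched.extend(payload)             # the virus's own code
    patched.append(("JMP", 1))          # jump back to the second instruction
    return patched

original = [("EXEC", "a"), ("EXEC", "b"), ("EXEC", "c"), ("HALT", None)]
infected = infect(original, [("EXEC", "virus")])
print(run(original))   # ['a', 'b', 'c']
print(run(infected))   # ['a', 'virus', 'b', 'c']
```

The infected program still does everything the original did, with the payload slipped in after the first instruction. On real hardware the relocated instruction must still work at its new address (a PC-relative instruction would not), a detail this toy model ignores.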