CS202 Spring 2015 Lab 7: Journaling and transactions

In this lab, you will add journaling (logging) and crash recovery to a simple file system.

Getting Started

We recommend doing this lab inside the development platform that you set up in the previous labs (the virtual devbox or the CIMS machines).

From within this platform, change to your lab directory, use Git to commit changes you've made since handing in Lab 6 (if any), obtain the latest version of the course repository, and then create a local branch called lab7 based on our origin/lab7 (which is our lab7 branch). The commands are as follows, but you may not need all of them (for example, if git status indicates that there are no changes to commit):

You are now in the lab7 directory. Your next step depends on whether you are using the CIMS machines or the devbox:

CIMS Users

You have to use one of the following machines:

crackle1.cims.nyu.edu 
crackle2.cims.nyu.edu 
crackle3.cims.nyu.edu 
crackle4.cims.nyu.edu 
crackle5.cims.nyu.edu

Note that this requirement supersedes the usual list of machines.

Next, install required library files:

$ pwd     # make sure that you are in lab7 directory
/home/your-netid/cs202/lab7
$ ./install-lab7-for-cims.sh
You only need this on CIMS machines, press <ENTER> to continue, or <Ctrl>-C to cancel
<ENTER>
Copying fuse library....
Installation successful

Devbox Users

Install the necessary library and binary using apt: (default password for using sudo is labs)

$ sudo apt-get update # update the apt catalog
.....
$ sudo apt-get install fuse libfuse-dev
.....

The code provided this lab contains a filesystem that is a simplified version of the one that you completed in lab 6 (see below for the differences). Your job is to add journaling (logging) and crash recovery capabilities. Code is provided in the following files:

FUSE

The file system for this lab is implemented as a FUSE driver, similar to that of lab 6. A basic overview of FUSE is available in lab 6.

Our File System

The file system for this lab is similar to lab 6's file system, with a few small changes to make the system easier to work with:

Journaling

The file system presents an interface by which users can commit transactions. Users can modify the file system by creating, modifying, and deleting files, and then send a commit message to save their changes. Commit messages are sent via an ioctl (I/O control) system call: more information about the ioctl interface is presented in the next section.

From the file system user's perspective, creating and comitting two new files, file1 and file2, will look like this:

Our file system supports this transaction interface with journaling: all operations that modify the file system state are written to a log on the disk before the file system modifies on-disk inodes and blocks. When the file system receives a commit message, a commit entry is written to the log.

The file system makes a guarantee to the user that committed transactions will survive file system crashes, and uncommitted transactions will not. When the commit ioctl returns, the file system makes a guarantee to the user that the commit operation (and all other operations in the transaction) have been persisted to the disk log. (By persisted, we mean that the sector that contains the given entry is actually on disk.)

Our use of logging is simplified, compared to a production logging system (and also compared to the protocols presented in lecture). In particular, our file system will not include checkpoints: this means that the log will continue to grow as the file system is used. When the file system is restarted, the entire on-disk structure (inodes, bitmaps, blocks...) is erased and re-constructed from the log, by replaying committed transactions. You should be able to see from this picture why non-committed transactions do not appear after a crash: they are not replayed from the log.

Implementation

You will implement some components of the FUSE driver (and hence the file system): adding support for logging, crash recovery, and a transaction abstraction. Because a lot of the implementation is provided, it is important that you familiarize yourself with the provided code and the various file system interfaces.

If you haven't already, read OSTEP 42 to gain background on transactions and journaling.

Like in lab 6, the main file system code that we've provided for you resides in fsdriver.c. This file contains all the FUSE callbacks to handle I/O syscalls, as well as the main function. Once the path to the disk image and the path to the mount point (testfs.img and mnt respectively in the supplied example) have been read from the command line arguments, our FUSE driver maps the specified disk image into memory using the map_disk_image function (defined in disk_map.c), which itself initializes some file system metadata. Then, fsdriver.c calls fuse_main, which handles kernel dispatches to our registered callbacks. These callbacks are similar to the callbacks in lab 6: you will modify callbacks that write to the file system.

The FUSE driver code we provide maps the disk image into virtual memory using the mmap syscall: see lab 6's description of mmap for more information. You can think of this as caching the disk in memory. To write modified, cached contents back to the disk, one uses the command msync. Indeed, you will use msync at various points in this lab because you will need to ensure that particular disk sectors that are modified in memory (namely, those containing certain log entries) have actually "made it" to the disk. This is the underlying mechanism that gives the file system user the assurance that operations that claim to be committed are in fact fully logged.

As mentioned in the previous section, user-level processes send requests to commit to the file system using an ioctl system call. In general, ioctl is a mechanism for sending driver-defined request codes to a driver (the idea is that both the sender and the device know what a given ioctl request number means, but the kernel does not interpret them). The supplied program send_ioctl can be used to send ioctls to your driver:

So, for example, the following command sends a commit message to your FUSE driver:

It is only after the commit ioctl returns that the file system user has a guarantee that the transaction has taken effect. If there is a crash before the ioctl is called, the transaction has not taken effect. If there is a crash during the commit ioctl, then it depends on whether the commit point was reached. (What is the commit point?)

For this lab, we are assuming the file system has a single client: you do not need to worry about multiple transactions occurring at the same time. The potential source of non-atomic behavior is if the file system crashes.

Creating a Simple Transaction Interface

Simple log data structures and functions to manipulate them reside in log.h and log.c: your job is to integrate them with the file system.

Before starting any work on the driver, run ./chmod-walk in the Lab 7 directory. This will set up permissions in the directories leading up to your lab directory correctly so that you will be able to run the driver successfully. (If you do not run this script, FUSE will be unable to mount or unmount file systems in your lab directory.)

The file system log in this lab is an array of log entries. Each log entry has an opcode. These opcodes are defined in log.h, e.g. commit entries have an opcode of 19. Almost every user update to the file system will generate a new log entry. Each log entry stores all the necessary information to reapply the effects of the corresponding user update to the file system.

Now, take a look at the union type log_args_t defined in log.h. (A union, in C, is a C datatype that can represent multiple types of structs.) This union stores all of the information associated with a file system update (write, truncate, etc.). User accesses will generate struct log_entrys; each log_entry comprises a log_args_t plus some log-related metadata.

In log.c, implement log_tx_add and log_tx_done.

log_tx_add takes an opcode, a pointer to a log_args_t and a timestamp; this function should append a new entry to the log. The struct log structure is basically an array of log entries plus some bookkeeping data fields. Then nentries field of struct log denotes the number of entries in the current log; the txn_id field stores the current transaction id. Note that you can use nentries to find the latest inserted entry: it is log->entries[log->nentries-1]. Therefore, you should insert the new entry at nentries and advance nentries by 1 when you are done. New log entries should also store information about their current transaction, specifically the transaction number txn_id, which can be found at log->txn_id.

log_tx_done is called when a user process sends an "ioctl commit" command to the file system. Note that log_tx_done should not return until the transaction is durable in the log: that way, once "ioctl commit" returns, the user has a guarantee that the transaction has "taken," in particular that it will be seen from this point forward, regardless of whether there are crashes and restarts. (See the description of IOCTL_COMMIT earlier.)

But log_tx_done must arrange for something else as well. To provide crash atomicity, log_tx_done needs to bring the disk from one valid state to another; no intermediate state should be visible to observers, even if there is a crash. You will arrange for this by doing the following in log_tx_done. First, add a COMMIT log entry at the end of the log. Then make a call to msync to ensure that the COMMIT log entry, as well as all other log entries in the same transaction, are actually on the disk. (Use man msync to learn its usage; you want to use the MS_SYNC flag.) Note that at this point, this transaction is still not committed; why not? Finally, increment nentries and txn_id, and persist the incremented variables on disk by calling msync again (and using the MS_SYNC flag again). We can persist the two variables at the same time because they will be placed on the same disk sector, and recall from your reading and lecture that writes of single disk sectors happen atomically, and that the disk hardware enforces this. Once this second call to msync returns, the transaction is committed.

Make sure you understand what the commit point is here, why that is the commit point, and why it's the case that the system reaches the commit point atomically. Solutions that do not provide crash atomicity will not be considered correct. (In class, the commit point was the writing of the COMMIT log entry; we could have done the same thing here, but it would have required a different way of finding the end of the log during replay, versus the approach that you will take in Exercise 2.)

It's important to make sure that your log works correctly before proceeding to the next exercises.

Type make grade. You should be able to pass the "basic operations" test. While this test does not really test your implementation of log_tx_add and log_tx_done, it does check that your implementation of these two functions does not break existing file system functionality.

To do more comprehensive testing than what the grading script provides, follow the instructions below:

Test log entries can be sent to your driver via ioctl request code 5003, and you can print log contents using request code 5002 (as described above). Below is sample output under correct behavior: on one terminal, run fsdriver with the -d flag:

# run fsdriver -d
$ test/makeimage.bash # create a new image (with an empty log)
$ mkdir mnt # create directory mnt if you do not have mnt already
$ build/fsdriver -d testfs.img mnt

On a second terminal, issue the following commands:

# send commands
$ ./send_ioctl mnt/.txn 5003 
$ ./send_ioctl mnt/.txn 5003 
$ ./send_ioctl mnt/.txn 5002 
$ ./send_ioctl mnt/.txn 4000
$ ./send_ioctl mnt/.txn 5003
$ ./send_ioctl mnt/.txn 5003
$ ./send_ioctl mnt/.txn 5002

If you are using one of the CIMS machines, the output from the first terminal should look similar to the following:

# [ FUSE startup output and debug information have been omitted ]

...

received ioctl: 5003
adding entry...
done

...

received ioctl: 5003
adding entry...
done

...

received ioctl: 5002
dumping log...
nentries: 3
entry 0: op: 0, txn_id: 0
entry 1: op: 0, txn_id: 1
entry 2: op: 0, txn_id: 1

...

received ioctl: 4000
attempting to commit...
done: 0

...

received ioctl: 5003
adding entry...
done

...

received ioctl: 5003
adding entry...
done

...

received ioctl: 5002
dumping log...
nentries: 6
entry 0: op: 0, txn_id: 0
entry 1: op: 0, txn_id: 1
entry 2: op: 0, txn_id: 1
entry 3: op: 19, txn_id: 1
entry 4: op: 0, txn_id: 2
entry 5: op: 0, txn_id: 2

If you are using the devbox, the output will be slightly different: FUSE will check the file system first, adding some additional entries to the log:

# [ FUSE startup output and debug information have been omitted ]

# [ You might see a start-up message similar to "fusermount: failed to open
#   /etc/fuse.conf:Permission denied". This messsage can be safely ignored. ]

...

received ioctl: 5003
adding entry...
done

...

received ioctl: 5003
adding entry...
done

...

received ioctl: 5002
dumping log...
nentries: 47
entry 0: op: 0, txn_id: 0
entry 1: op: 5, txn_id: 1   #[devbox only] Entries 1 to 44 are additional entries from FUSE:
entry 2: op: 5, txn_id: 1   #[devbox only] When testfs.img is mounted, the system will
entry 3: op: 5, txn_id: 1   #[devbox only] read from the mounted directory, generating
...                         #[devbox only] log entries.
entry 43: op: 5, txn_id: 1  #[devbox only]
entry 44: op: 5, txn_id: 1  #[devbox only]
entry 45: op: 0, txn_id: 1
entry 46: op: 0, txn_id: 1

...

received ioctl: 4000
attempting to commit...
done: 0

...

received ioctl: 5003
adding entry...
done

...

received ioctl: 5003
adding entry...
done

...

received ioctl: 5002
dumping log...
nentries: 50
entry 0: op: 0, txn_id: 0
entry 1: op: 5, txn_id: 1   #[devbox only] Read entries from earlier.
entry 2: op: 5, txn_id: 1   #[devbox only]
entry 3: op: 5, txn_id: 1   #[devbox only]
...                         #[devbox only]
entry 43: op: 5, txn_id: 1  #[devbox only]
entry 44: op: 5, txn_id: 1  #[devbox only]
entry 45: op: 0, txn_id: 1
entry 46: op: 0, txn_id: 1
entry 47: op: 19, txn_id: 1
entry 48: op: 0, txn_id: 2
entry 49: op: 0, txn_id: 2

During the course of testing your FUSE driver, various combinations of operations may cause your FUSE driver to stop functioning or enter a non-clean state. It may be helpful to search Piazza for information about specific error messages.

More generally, to get your system to a clean starting state, you can do:

$ fusermount -u mnt      # unmount the driver
$ test/makeimage.bash    # make a clean testfs.img
$ rm -rf mnt             # remove the mounting directory and its residents
$ mkdir mnt              # recreate the mounting directory

or you can invoke the following script, which is equivalent to the lines above:

$ ./startclean.bash

In log.c, implement log_replay. This function is called when the driver is (re)started: it should replay all committed transactions -- and only committed transactions -- from the beginning of the log to the end of the log (log->nentries-1).

There are several ways to do this. One way is is to use two passes. In the first pass, record the transaction numbers of transactions that have commit entries. In the second pass, call log_entry_install on all transaction entries that are part of a committed transaction. log_entry_install takes a single log entry and modifies the file system: you will fill in this function in a later exercise. You can also implement log_replay in a single pass. To avoid undue code complexity, it may be helpful to model the replay logic as a FSM (finite state machine).

Use make grade to test your work so far. You should be able to pass the "replay" test.

Logging File System Operations

We have provided a full implementation of file system functions in fsdriver.c. In order for the file system to support transactions, all file system operations that modify the disk must have corresponding entries in the log.

(The following commands modify the disk: write, link, unlink, rename, mknod, and truncate.)

Change fs_mknod in fsdriver.c to use transactions: currently, it writes changes directly to the disk using fs_mknod_install. First, add a log entry for the operation using log_tx_add: pass in the opcode OP_MKNOD and a pointer to a populated log_args_t. Then, call fs_mknod_install with the appropriate arguments. In this exercise and the next, you will find some or all of the following library functions useful: strcpy, memcpy, and time (as usual, use the man pages to learn the usage of these library calls; for example, man 2 time).

Also implement the OP_MKNOD case branch in log_entry_install (defined in log.c) by calling fs_mknod_install with the appropriate arguments.

At this point, your driver should pass the test test_touch_crash.bash; equivalently, you should pass the "touch" test when doing make grade. The test checks the following:

Calling $ touch mnt/[file] creates a log entry for mknod and creates a file.
If a commit ioctl is received after the file is created, the file re-appears when the driver is restarted.
If no commit ioctl is received after the file is created, the file does not re-appear when the driver is restarted.

Using the same pattern as in Exercise 3, modify the following functions to use transactions: fs_write, fs_link, fs_unlink, fs_rename, and fs_truncate. See the prior exercise for standard C library calls that may be useful.

It will be easiest to do this by changing one function at a time. You can use the scripts provided in test/ and make grade to test your solution.

After modifying fs_write, you should be able to pass the "echo", "time", "big_txn" and "many_txn" tests.
After modifying fs_truncate as well as fs_write, you should be able to pass the "truncate" test.
After modifying fs_rename, you should be able to pass the "mv" test.
After modifying fs_link, you should be able to pass the "link" test.
After modifying fs_link and fs_unlink, you should be able to pass the "unlink" test.

Handing in the lab

After a successful submission, you should receive email confirmation (at your NYU email address).

If you submit multiple times, we will take the latest submission and count slack hours accordingly.