CSCI-UA.0202 Spring 2015 Lab 2: Shell

Handed out Friday, February 6, 2015
Due ~~9:00 PM, Friday, February 13, 2015~~ 10:00 AM, Tuesday, February 17, 2015

Lab 2: Shell

Introduction

In this lab, you will learn how a shell is built. You will also improve (or reinforce) your shell-using skills. You will also gain experience with C programming (you will interact with critical constructs, such as pointers and data structures, as well as standard idioms like linked lists). Along the way, you will use the fork() system call, a lightweight version of which you implemented in the previous lab! There is not much code to write, but there is a lot to absorb; we recommend beginning this lab over the weekend.

This lab adapts a shell lab due to Eddie Kohler. The high-level design, much of the code, and some of the description below is borrowed from him.

A shell parses a command line, and then runs (or executes) that command line (one can also think of GUIs as shells, in which case the "command line" is expressed by the user's mouse clicks, for example). We've given you the skeleton, and nearly all of the parsing code, for a simple but surprisingly functional shell. You will complete the parsing code. You will also fill in the logic for executing the command line: you will implement support for executing commands, I/O redirection, pipes, as well as conditional, sequential, and background execution. You will also implement several internal commands, such as cd. (Puzzle: why does cd have to be built into the shell, when most other commands are executed by exec()ing an existing program?) Finally, students in the honors section must implement an additional piece of functionality, described below.

You will electronically hand in code and answers (the mechanics are described at the end of the lab).

Getting started

We recommend doing this lab inside the development platform that you set up in the previous lab (the virtual devbox or the CIMS machines). However, this lab uses standard interfaces and tools, and as a consequence, you can do the lab on OS X or even cygwin on Windows; if you decide to go down that path, note the following: (1) the testing scripts expect to be run on a Linux machine, and will incorrectly report errors for some test cases when they are run on platforms other than Linux, and (2) we will be grading on the standard lab platforms.

From within the development platform that you have chosen (which, in the recommended cases, will be the virtual devbox or the CIMS machines), change to your lab directory, use Git to commit changes you've made since handing in lab 1 (if any), obtain the latest version of the course repository, and create a local branch called lab2 based on origin/lab2 (which is our lab2 branch). The commands are as follows, but you may not need all of them (for example if you are already on the lab1 branch or if git status indicates that there are no changes to commit):

The git checkout -b command shown above actually does two things: it first creates a local branch lab2 that is based on the origin/lab2 branch provided by the course staff, and second, it changes the contents of your lab directory to reflect the files stored on the lab2 branch. Git allows switching between existing branches using git checkout branch-name, though you should commit any outstanding changes on one branch before switching to a different one.

Recall that Git allows you to keep track of changes that you make to your code. For example, if you are finished with one of the exercises, and want to checkpoint your progress, you can commit your changes by running:

You can view changes by using git diff. For example, git diff origin/lab2 will display the changes relative to the initial code supplied for this lab.

Warm-up: shell commands and constructs

It will be much easier to do the coding work in this lab if you have some familiarity with shells in general; this is for two reasons. First, comfort with shells makes you more productive. Second, if you have a good handle on what a shell is supposed to do, the coding work will make more sense. This portion of the lab is intended to provide some of that background (but some of the background you will have to acquire by "playing around" with the shell on your computer). In this part of the lab, we will interact with the installed shell on your system (rather than the source that you retrieved above). We will be assuming the bash shell, which is the default shell on both of the development platforms in this class.

A shell is a program whose main purpose is to run other programs. Two of the key workhorses in a shell are fork() and execve() (as we saw in class). Here's a simple shell command:

(The initial $ is not part of the command. That is shorthand for the prompt that the shell prints before reading a command.) The shell parses this command into two arguments, echo and foo. The echo argument names the binary that should be executed. So the shell forks a child process to execute echo with those two arguments. The echo program has a very simple job: it just prints its arguments to the console. (echo foo will just print foo.) Meanwhile, the parent waits for the child to finish; when it does, the parent returns to read another command.

Each program has standard input, standard output, and standard error file descriptors, whose numbers are 0, 1, and 2, respectively. (You may know them in C++ as cin, cout and cerr; in C the files are called stdin, stdout, and stderr.) The echo program writes its output to the standard output file descriptor. Normally this is the same as the shell's standard output, which is the terminal (your screen). But the shell lets you redirect these file descriptors to point instead to other files. For example:

This command doesn't print anything to the screen. But let's use the cat program, which reads a file and prints its contents to standard output, to see what's in output.txt:

The > filename operator redirects standard output, < filename redirects standard input, and 2> filename redirects standard error. (The syntax varies from shell to shell; we generally follow the syntax of the Bourne shell or bash.)

Shells offer many ways to chain commands together. For example, the ; operator says "do one command, then do another". This shell command prints two lines:

The && and || operators chain commands together based on their exit status. You can think of the exit status as a command's "return value". If a command accomplishes its function successfully, that command generally exits with status 0, by calling exit(0). (This is also what happens when a program runs off the end of its main function.) But if there's an error, most commands will exit with status 1. For example, the cat command will exit with status 0 if it reads its files successfully, and 1 otherwise:

Now, && says "execute the command on the right only if the command on the left exited with status 0". And || says "execute the command on the right only if the command on the left exited with status NOT equal to 0". For example:

Parentheses ( ) allow you to run a set of commands in a subshell. This, in turn, lets you redirect the standard input, output, and error of a group of commands at once. For example:

The exit status of a subshell is the same as the exit status of the last command executed.

You can also execute a command in the background with the & operator. Normally, the shell will not read a new command until the previous command has exited. But the & operator tells the shell not to wait for the command.

Finally, the pipe operator | sends the output of one command to the input of another. For example:

You may find the following commands particularly useful for testing your shell. Find out what they do by reading their manual pages. Be creative with how you combine these! (Also see the tester, below.)

In addition to running programs from the file system, shells have builtin commands that provide functionality that could not be obtained otherwise. Two builtins commands that our shell will implement are cd and exit.

The cd command changes the shell's current directory, which is the default directory that the shell uses for files. So cd dir changes the current directory to dir. (You can think of the current directory as the directory that shows up by default when you use an "Open" or "Save" dialog in a GUI program.) Of course, files can also be manipulated using absolute pathnames, which do not depend on the current directory; /home/studentname/cs202/lab2/cmdparse.c is an example.

There may also come a time when you would like to leave your shell; the exit command instructs the shell to exit with a given status. (exit alone exits with status 0.)

Compiling, running, and testing

To quit your shell, you can press control-C. After you implement the exit command, you will be able to type "exit".

Test cases. You can and should interact with your shell by typing commands to it. You should also run a collection of test cases that are meant to stress your implementation:

Please note: passing tester.pl is necessary but not sufficient. We will be using additional tests to grade and test whether your code meets the specifications. Feel free to add additional tests to tester.pl, or devise your own testing script.

Parsing command lines

At a high level, our shell does two things: parse command lines, and execute those command lines. This piece of the lab is about parsing; the next (and longer) piece is about executing.

In our shell, parsing has two aspects (which are interleaved): converting a typed command line into a sequence of tokens, and converting that sequence of tokens into an internal representation (the execution piece acts on this internal representation). Tokens can be command names, command arguments, or special shell tokens. For instance, the command

To motivate the internal representation, notice that command lines can contain multiple commands, separated by various operators: ;, &, |, &&, ||, (, or ). Our internal representation is a list of commands, with information about the operator that separates each command from the next (along with other information, such as what kinds of redirection the user specified for each command). The data structure that implements this representation is (mostly) a linked list of commands, with each node in the list containing information about the command (we said "mostly" because, owing to subshells, the data structure can be thought of as a hybrid of a list and a tree). The key structure in the code is the struct command (in cmdparse.h).

A good way to get familiar with a code base is to look at the key data structures and interfaces (which are usually in header files). To this end:

The implementation of parsing is in cmdline.c, and is driven from main.c. You will need to have a solid understanding of this code in order to do the remaining exercises. Therefore:

There is missing functionality in cmdparse.c. It does not handle parentheses properly, and it does not free the memory that it uses. Your job is to fill these gaps.

Exercise 1. In cmdparse.c, complete cmd_parse() so that it handles parentheses and subshells.

If you did this correctly, then make test should now pass the first five tests (it will by default pass some of the successive tests).

We do not provide testing code for this. One hint: what does strdup() do? Type man strdup to find out.

Executing command lines

In this part of the lab, you will write the code to execute the parsed command line. Now would be a good time to take another look at main.c to see the high-level loop and logic for the entire shell.

Although we have given you the necessary skeleton, most of the "work" for executing command lines is unimplemented. Your job is to fill it in:

Your functions will need to make heavy use of system calls. To learn more about the exact use of these system calls, you want to consult the manual pages. The interface to these pages is a program called man. Systems programmers need to type man a lot; you should too.

For example, man waitpid will give you documentation on the parameters and return values to waitpid(). Note that sometimes you will need to specify a section of the manual. For example, man open may not give you documentation on the system call open(); you would want man 2 open. Here is a list of some man commands that will be useful in this lab; you may need others:

A hint, concerning the exec variants: note that the argv array includes the command name itself as the "zeroth" argument. For example, in the command echo foo, the initial element in the array is argv[0], which is the address of memory that holds the string echo. And argv[1] is the address of memory that holds the string foo. This is not necessarily intuitive.

A word about testing: if a particular test is giving you trouble, it may be helpful to type it into the standard shell (bash, as covered in the warmup earlier), and see what happens. Then play around with (or modify) the test, to try to understand what bash expects. That may give you insight about what your code is supposed to do.

Honors supplement and extra-credit problems

Honors students (section 001) should implement an additional piece of functionality; as usual, students in section 002 are free (even encouraged) to do this exercise, but there will be no credit given.

Exercise 4 (Honors). You will implement command queues. There's no skeleton provided, so leave extra time for this. Like background commands, command queues let a shell user run commands in parallel. But command queues limit the number of commands that execute simultaneously. For instance, a queue with parallelism 2 will run at most 2 commands at a time. Additional commands are saved in a queue; they execute, in order, as running commands complete, with at most 2 running commands active at any time. Command queues let a user run a bunch of commands without taking up lots of machine resources. It also is an interesting way to prevent certain limited fork bombs.

Implement command queues using three special shell commands.

The makeq qname size command creates a command queue named qname with parallelism size. You can also use makeq to reset the parallelism of an existing queue.
The q qname command... command adds a command to the command queue named qname. This command will be run when there is room in the queue, possibly immediately. Much like a background command, the q command returns immediately with status 0. (This is true even if command must wait to run.)
The waitq qname command blocks until qname is empty, then returns with status 0.

There is no command to delete a command queue.

Some examples may help. This simple example shows a command queue with parallelism 1. Such a queue runs one command at a time. Here, we enter the queue commands at the cs202$ prompt:

 cs202$ makeq myq 1
 [3 args "makeq" "myq" "1"] .
 cs202$ q myq echo "This is printed immediately."
 [4 args "q" "myq" "echo" "This is printed immediately."] .
 cs202$ This is printed immediately.
 cs202$ q myq echo "That emptied the queue, so this is printed immediately too."
 [4 args "q" "myq" "echo" "That emptied the queue, so this is printed immediately too."] .
 cs202$ That emptied the queue, so this is printed immediately too.
 cs202$ q myq sleep 1 ; q myq echo "This should run after 1 second." ; echo "This should be printed first." ; sleep 2
 [4 args "q" "myq" "sleep" "1"] ;
 [4 args "q" "myq" "echo" "This should run after 1 second."] ;
 [2 args "echo" "This should be printed first."] ;
 [2 args "sleep" "2"] .
 This should be printed first.
 This should run after 1 second.                 ## (it appeared after 1 second)
 cs202$

This example shows a queue with parallelism 2. This time, we give the commands all at once, in a command file. When run as ./cs202sh < commandfile, cs202sh will print messages with the described delays. (Figure out why.)

 makeq myq2 2
 q myq2 sleep 1
 q myq2 sleep 2
 q myq2 echo This is printed 1 second after the shell began.
 q myq2 sleep 3
 q myq2 echo This is printed 2 seconds after the shell began.
 waitq myq2
 echo This is printed 4 seconds after the shell began.

If a command queue command is called with too few arguments, or bad arguments, it should write q: Syntax error (or makeq: Syntax error, or waitq: Syntax error) to standard error. For example:

 cs202$ q X
 q: Syntax error
 cs202$ q
 q: Syntax error

Note that queue commands can have redirections just like normal commands.

 cs202$ q X 2> /tmp/x
 cs202$ cat /tmp/x
 q: Syntax error

Queued commands should "wake up" immediately after the relevant job ID completes, even if the shell is waiting for another process at the time. For example:

 makeq myq3 1
 q myq3 sleep 50
 q myq3 echo This should appear after 50 seconds
 sleep 100 ; echo This should appear after 100 seconds
   =>
 ... waits about 50 seconds, then prints: ...
 This should appear after 50 seconds
 ... waits another 50 seconds or so, then prints: ...
 This should appear after 100 seconds

Beware of errors! Unless you are careful, problematic commands might mess up your queue.

 makeq myq4 1
 q myq4 sleep 10
 q myq4 echo This should appear after 10 seconds
 q myq4                                                   ## Should cause a syntax error
 q myq4 echo This should appear immediately thereafter    ## ... but a common error might cause it to NEVER appear

Subshell note: Command queues in subshells (parentheses) don't need to be supported. The queueing commands do not mean anything special inside a subshell.

Maximum queue size note: A command queue must be able to support up to 1024 enqueued processes, but it doesn't need to support more than 1024 enqueued processes.

Current directory note: We will not test interactions between the cd command and queued commands. In the following, it is OK for the queued ls command to execute in either / or /tmp.

 makeq myq5 1
 q myq5 sleep 10
 cd /
 q myq5 ls
 cd /tmp

Prompt note: You do not need to run queued commands while cs202sh is blocked at a cs202$ prompt. (You do need to run queued commands while cs202sh is blocked waiting for another process, as described above.) If you make this work, though, tell us in answers.txt.

Implementing the q command will be a bit tricky. Questions to ponder: How does the shell know when a background process completes? How can the shell tell a waiting q command to start running?

Implementing the waitq command is also tricky. If you design your q implementation well, you should be able to reuse most of it for waitq! But note the difference: a q command will run whenever there is room in the queue, but waitq must not complete until the queue has no active processes.

It is OK if your command queue implementation creates a process for a queued command as soon as it is entered. However, those processes must block until it is their turn to run according to the queue.

Command queueing is, as far as we know, a feature unique to cs202sh (and its antecedents), but features like it are common in real Unix shells. In general features like q are given the name job control. See this reference for a good introduction to job control.

Extra credit. Any of the following exercises will receive extra credit (for students in either section).

Exercise 5 (Extra-credit). The solution that we guide you to for implementing cd has a race condition. Fix that race condition. The file cmdrun.c has details.

Exercise 6 (Extra-credit). Design and implement a new shell syntax that allows more complex forms of redirection. In particular, your syntax should support arbitrary redirection graphs, including:

Run five commands, a, b, c, d, and e, where:
- a stdout ==> b stdin (this is a conventional pipe).
- a stderr ==> c stdin.
- b file descriptor 3 ==> d file descriptor 4.
- d stdout ==> e stdin.
Run two commands, a and b, where:
- a stdout ==> b stdin.
- b stdout ==> a stdin. (This is a "bidirectional" pipe.)

You may investigate bash's implementation for some ideas (look for >&), but bash does not completely support this feature.

Handing in the lab

If you submit multiple times, we will take the latest submission and count slack hours accordingly.