Handed out Friday, February 6, 2015
Due 9:00 PM, Friday, February 13, 2015 10:00 AM,
Tuesday, February
17, 2015
In this lab, you will learn how a shell is built. You will also improve (or reinforce) your shell-using skills. You will also gain experience with C programming (you will interact with critical constructs, such as pointers and data structures, as well as standard idioms like linked lists). Along the way, you will use the fork() system call, a lightweight version of which you implemented in the previous lab! There is not much code to write, but there is a lot to absorb; we recommend beginning this lab over the weekend.
This lab adapts a shell lab due to Eddie Kohler. The high-level design, much of the code, and some of the description below is borrowed from him.
A shell parses a command line, and then runs (or executes) that
command line (one can also think of GUIs as shells, in which case the
"command line" is expressed by the user's mouse clicks, for example).
We've given you the skeleton, and nearly all of the parsing code, for a
simple but surprisingly functional shell. You will complete the parsing
code. You will also fill in the logic for executing the command line:
you will implement support for executing commands, I/O redirection,
pipes, as well as conditional, sequential, and background execution. You
will also implement several internal commands, such as
cd
. (Puzzle: why does cd have to be built
into the shell, when most other commands are executed by exec()ing an
existing program?) Finally, students in the honors section must
implement an additional piece of functionality, described below.
You will electronically hand in code and answers (the mechanics are described at the end of the lab).
Please note the following expectations:
We recommend doing this lab inside the development platform that you set up in the previous lab (the virtual devbox or the CIMS machines). However, this lab uses standard interfaces and tools, and as a consequence, you can do the lab on OS X or even cygwin on Windows; if you decide to go down that path, note the following: (1) the testing scripts expect to be run on a Linux machine, and will incorrectly report errors for some test cases when they are run on platforms other than Linux, and (2) we will be grading on the standard lab platforms.
From within the development platform that you have chosen (which, in the
recommended cases, will be the virtual devbox or the CIMS machines),
change to your lab directory, use
Git to commit changes you've made since handing in lab 1 (if any),
obtain the latest version of the course
repository, and create a local branch called lab2 based on
origin/lab2 (which is our lab2 branch). The commands are as follows, but you
may not need all of them (for example if you are already on the
lab1
branch or if git status
indicates that
there are no changes to commit):
$ cd ~/cs202 $ git checkout lab1 $ git commit -am "here are my leftover changes from lab1" $ git pull $ git checkout -b lab2 origin/lab2 Branch lab2 set up to track remote branch refs/remotes/origin/lab2. Switched to a new branch "lab2" $ cd lab2
The git checkout -b command shown above actually does two things: it first creates a local branch lab2 that is based on the origin/lab2 branch provided by the course staff, and second, it changes the contents of your lab directory to reflect the files stored on the lab2 branch. Git allows switching between existing branches using git checkout branch-name, though you should commit any outstanding changes on one branch before switching to a different one.
Recall that Git allows you to keep track of changes that you make to your code. For example, if you are finished with one of the exercises, and want to checkpoint your progress, you can commit your changes by running:
$ git commit -am 'my solution for lab2 exercise1' Created commit 60d2135: my solution for lab2 exercise1 1 files changed, 1 insertions(+), 0 deletions(-) $
You can view changes by using git diff. For example, git diff origin/lab2 will display the changes relative to the initial code supplied for this lab.
It will be much easier to do the coding work in this lab if you have some familiarity with shells in general; this is for two reasons. First, comfort with shells makes you more productive. Second, if you have a good handle on what a shell is supposed to do, the coding work will make more sense. This portion of the lab is intended to provide some of that background (but some of the background you will have to acquire by "playing around" with the shell on your computer). In this part of the lab, we will interact with the installed shell on your system (rather than the source that you retrieved above). We will be assuming the bash shell, which is the default shell on both of the development platforms in this class.
A shell is a program whose main purpose is to run other programs.
Two of the key workhorses in a shell are
fork()
and execve()
(as we saw in class). Here's a simple shell command:
$ echo foo
(The initial $
is not part of the command. That is shorthand for the prompt that the shell prints before reading a command.)
The shell parses this command into two arguments, echo
and foo
. The echo
argument names the binary that should be executed. So the shell forks a child process to execute echo
with those two arguments. The echo
program has a very simple job: it just prints its arguments to the console. (echo foo
will just print foo
.) Meanwhile, the parent waits for the child to finish; when it does, the parent returns to read another command.
You may be interested in a reasonable tutorial for Unix shells. You can find others by searching for, e.g., ''shell tutorial'' on Google. Let us know if you find one you really like.
Each program has standard input, standard output, and
standard error file descriptors, whose numbers are 0, 1, and 2,
respectively. (You may know them in C++ as cin
, cout
and cerr
; in C the files are called stdin
, stdout
, and stderr
.) The echo
program writes its output to the standard output file descriptor. Normally this is the same as the shell's standard output, which is the terminal (your screen). But the shell lets you redirect these file descriptors to point instead to other files. For example:
$ echo foo > output.txt
This command doesn't print anything to the screen. But let's use the cat
program, which reads a file and prints its contents to standard output, to see what's in output.txt
:
$ cat output.txt foo
The > filename
operator redirects standard output, < filename
redirects standard input, and 2> filename
redirects standard error. (The syntax varies from shell to shell; we generally follow the syntax of the Bourne shell or bash
.)
Shells offer many ways to chain commands together. For example, the ;
operator says "do one command, then do another". This shell command prints two lines:
$ echo foo ; echo bar foo bar
The &&
and ||
operators chain commands
together based on their exit status. You can think of the exit
status as a command's "return value".
If a command accomplishes its function successfully, that command
generally exits with status 0, by calling exit(0)
. (This
is also what happens when a program runs off the end of its main
function.) But if there's an error, most commands will exit with status 1. For example, the cat
command will exit with status 0 if it reads its files successfully, and 1 otherwise:
$ cat output.txt foo [[exit status 0]] $ echo $? 0 $ cat doesnotexist.txt cat: doesnotexist.txt: No such file or directory [[exit status 1]] $ echo $? 1(The special variable
$?
in bash contains the exit value of
the previous command. However, the shell we build in this lab does not support
this variable.)
Now, &&
says "execute the command on the right only if the command on the left exited with status 0". And ||
says "execute the command on the right only if the command on the left exited with status NOT equal to 0". For example:
$ cat output.txt && echo "output.txt exists!" foo Output.txt exists! $ cat doesnotexist.txt && echo "doesnotexist.txt exists!" cat: doesnotexist.txt: No such file or directory [[Note: does not run echo!]] $ cat output.txt || echo "output.txt does not exist." foo $ cat doesnotexist.txt || echo "doesnotexist.txt does not exist." cat: doesnotexist.txt: No such file or directory doesnotexist.txt does not exist.
Parentheses ( )
allow you to run a set of commands in a subshell. This, in turn, lets you redirect the standard input, output, and error of a group of commands at once. For example:
$ ( echo foo ; echo bar ) > output.txt $ cat output.txt foo bar
The exit status of a subshell is the same as the exit status of the last command executed.
You can also execute a command in the background with the &
operator. Normally, the shell will not read a new command until the previous command has exited. But the &
operator tells the shell not to wait for the command.
$ echo foo & $ foo
(foo
is printed on top of the next shell prompt.)
Finally, the pipe operator |
sends the output of one command to the input of another. For example:
$ echo foo | rev oof
You may find the following commands particularly useful for testing your shell. Find out what they do by reading their manual pages. Be creative with how you combine these! (Also see the tester, below.)
cat
(print one or more files to standard output)echo
(print arguments to standard output)true
(exit with status 0)false
(exit with status 1)sleep
(wait for N seconds then exit)sort
(sort lines)
In addition to running programs from the file system, shells have
builtin commands that provide functionality that could not be obtained
otherwise. Two builtins commands that our shell will implement are cd
and
exit
.
The cd
command changes the shell's current directory, which is the default directory that the shell uses for files.
So cd dir
changes the current directory to dir
.
(You can think of the current directory as the directory that shows up by default when you use an "Open" or "Save" dialog in a
GUI program.) Of course, files can also be manipulated using
absolute pathnames, which do not depend on the current
directory; /home/studentname/cs202/lab2/cmdparse.c
is an example.
There may also come a time when you would like to leave your shell; the exit
command instructs the shell to exit with a given status. (exit
alone exits with status 0.)
(Why are cd
and exit
part of the shell instead of standalone programs?)
$ cd lab2 # if you are not already in the lab2 directory $ make $ ./cs202shYou can type commands to the shell; after each command you type, it will display the parser output and then attempt to execute the command. Initially, it won't execute anything, but as you complete the lab, more and more kinds of commands will work. You can run the parser on its own like this:
$ ./cs202sh -p
To quit your shell, you can press control-C. After you implement the
exit
command, you will be able to type
"exit
".
Test cases. You can and should interact with your shell by typing commands to it. You should also run a collection of test cases that are meant to stress your implementation:
$ make testThis command invokes the file
tester.pl
; you should read
this file to see what tests your shell is being given, and what the
expected output is.
Please note: passing tester.pl
is necessary but not
sufficient. We
will be using additional tests to grade and test whether your code meets
the specifications. Feel free to add additional tests to
tester.pl
, or devise your own testing script.
At a high level, our shell does two things: parse command lines, and execute those command lines. This piece of the lab is about parsing; the next (and longer) piece is about executing.
In our shell, parsing has two aspects (which are interleaved): converting a typed command line into a sequence of tokens, and converting that sequence of tokens into an internal representation (the execution piece acts on this internal representation). Tokens can be command names, command arguments, or special shell tokens. For instance, the command
ls -l "foo bar" > zotconsists of five tokens: "
ls
",
"-l
", "foo bar
" (the quotes are not part of the token), ">
", and
"zot
".
To motivate the internal representation, notice that command
lines can contain multiple commands, separated by various
operators:
;
, &
, |
,
&&
, ||
, (
, or
)
. Our internal representation is a list
of commands, with information about the operator that separates each command from
the next (along with other information, such as what kinds of
redirection the user specified for each command). The data
structure that implements this representation is (mostly)
a linked list of commands, with each node in the list
containing information about the command (we said "mostly" because,
owing to subshells, the data structure can be thought of as a hybrid of
a list and a tree).
The key structure in the code is
the struct command
(in cmdparse.h
).
A good way to get familiar with a code base is to look at the key data structures and interfaces (which are usually in header files). To this end:
cmdparse.h
.The implementation of parsing is in cmdline.c
, and is
driven from main.c
. You will need to have a solid
understanding of this code in order to do the remaining exercises.
Therefore:
cmdparse.c
and main.c
.
It may help to play around with the parser for the shell that you have been
given:
$ make # if you have not already built the code $ ./cs202sh -p # -p means "parse only" cs202$ ls -l "foo bar" > zotInspect the output. You can and should try typing in some of the commands from the warm-up section.
There is missing functionality in cmdparse.c. It does not handle parentheses properly, and it does not free the memory that it uses. Your job is to fill these gaps.
Exercise 1. In cmdparse.c
,
complete cmd_parse()
so that it handles parentheses and subshells.
If you did this correctly, then make test
should now
pass the first five tests (it will by default pass some of the
successive tests).
Exercise 2. In cmdparse.c
, complete
cmd_free()
.
We do not provide testing code for this. One hint: what does
strdup()
do? Type man strdup
to find out.
Speaking of resources....in this and the next part of the lab, you will need some comfort with C. We have been building this comfort in class, but you may still feel shaky. Here are some potentially helpful resources:
In this part of the lab, you will write the code to execute the parsed
command line. Now would be a good time to take another look at
main.c
to see the high-level loop and logic for the
entire shell.
Although we have given you the necessary skeleton, most of the "work" for executing command lines is unimplemented. Your job is to fill it in:
cmdrun.c
,
implement cmd_line_exec()
and cmd_exec()
.
The comments in cmdrun.c
give specifications and hints.
It may be best to implement
cmd_line_exec()
first, and then
to fill out cmd_exec()
in pieces, testing the functionality
as you go (using make test
and your own testing).
Your functions will need to make heavy use of system calls. To learn
more about the exact use of these system calls, you want to consult the
manual pages. The interface to these pages is a program called
man
. Systems programmers need to type
man
a lot; you should too.
For example,
man waitpid
will give you documentation on the parameters and return values to
waitpid()
. Note that sometimes you will need to specify a
section of the manual. For example, man open
may not give you documentation on the system call open(); you would want
man 2 open
. Here is a list
of some man
commands that will be useful in this lab; you
may need others:
# the numbers may be optional, depending on your system $ man 2 waitpid # note WEXITSTATUS! $ man 2 fork $ man 2 open $ man 2 close $ man 2 dup2 $ man 3 strdup $ man 3 strcmp $ man 2 pipe $ man 2 execve # or "man execvp" or "man 3 exec" $ man 2 chdir $ man 3 getcwd $ man 3 perror $ man 3 strtolInterestingly, you don't need to invoke
read()
and
write()
in this lab. (Why not?)
You can learn more about the man
program by typing
man man
(but avoid possible confusion).
A hint, concerning the exec
variants: note that the argv
array includes the command name
itself as the "zeroth" argument. For example, in the command echo foo
,
the initial element in the array is argv[0]
, which is the
address of memory that holds the string echo
.
And argv[1]
is the address of memory that holds the string
foo
. This is not necessarily intuitive.
A word about testing: if a particular test is giving you trouble, it may be helpful to type it into the standard shell (bash, as covered in the warmup earlier), and see what happens. Then play around with (or modify) the test, to try to understand what bash expects. That may give you insight about what your code is supposed to do.
Honors students (section 001) should implement an additional piece of functionality; as usual, students in section 002 are free (even encouraged) to do this exercise, but there will be no credit given.
Implement command queues using three special shell commands.
makeq qname size
command creates a command queue named qname with parallelism size. You can also use makeq
to reset the parallelism of an existing queue.
q qname command...
command adds a command to the command queue named qname. This command will be run when there is room in the queue, possibly immediately. Much like a background command, the q
command returns immediately with status 0. (This is true even if command must wait to run.)waitq qname
command blocks until qname is empty, then returns with status 0.There is no command to delete a command queue.
Some examples may help. This simple example shows a command queue with
parallelism 1. Such a queue runs one command at a time. Here, we enter
the queue commands at the cs202$
prompt:
cs202$ makeq myq 1 [3 args "makeq" "myq" "1"] . cs202$ q myq echo "This is printed immediately." [4 args "q" "myq" "echo" "This is printed immediately."] . cs202$ This is printed immediately. cs202$ q myq echo "That emptied the queue, so this is printed immediately too." [4 args "q" "myq" "echo" "That emptied the queue, so this is printed immediately too."] . cs202$ That emptied the queue, so this is printed immediately too. cs202$ q myq sleep 1 ; q myq echo "This should run after 1 second." ; echo "This should be printed first." ; sleep 2 [4 args "q" "myq" "sleep" "1"] ; [4 args "q" "myq" "echo" "This should run after 1 second."] ; [2 args "echo" "This should be printed first."] ; [2 args "sleep" "2"] . This should be printed first. This should run after 1 second. ## (it appeared after 1 second) cs202$
This example shows a queue with parallelism 2. This time, we give the
commands all at once, in a command file. When run as ./cs202sh
< commandfile
, cs202sh will print messages with the described delays. (Figure out why.)
makeq myq2 2 q myq2 sleep 1 q myq2 sleep 2 q myq2 echo This is printed 1 second after the shell began. q myq2 sleep 3 q myq2 echo This is printed 2 seconds after the shell began. waitq myq2 echo This is printed 4 seconds after the shell began.
If a command queue command is called with too few arguments, or bad arguments, it should write q: Syntax error
(or makeq: Syntax error
, or waitq: Syntax error
) to standard error. For example:
cs202$ q X q: Syntax error cs202$ q q: Syntax error
Note that queue commands can have redirections just like normal commands.
cs202$ q X 2> /tmp/x cs202$ cat /tmp/x q: Syntax error
Queued commands should "wake up" immediately after the relevant job ID completes, even if the shell is waiting for another process at the time. For example:
makeq myq3 1 q myq3 sleep 50 q myq3 echo This should appear after 50 seconds sleep 100 ; echo This should appear after 100 seconds => ... waits about 50 seconds, then prints: ... This should appear after 50 seconds ... waits another 50 seconds or so, then prints: ... This should appear after 100 seconds
Beware of errors! Unless you are careful, problematic commands might mess up your queue.
makeq myq4 1 q myq4 sleep 10 q myq4 echo This should appear after 10 seconds q myq4 ## Should cause a syntax error q myq4 echo This should appear immediately thereafter ## ... but a common error might cause it to NEVER appear
Subshell note: Command queues in subshells (parentheses) don't need to be supported. The queueing commands do not mean anything special inside a subshell.
Maximum queue size note: A command queue must be able to support up to 1024 enqueued processes, but it doesn't need to support more than 1024 enqueued processes.
Current directory note: We will not test interactions between the cd
command and queued commands. In the following, it is OK for the queued ls
command to execute in either /
or /tmp
.
makeq myq5 1 q myq5 sleep 10 cd / q myq5 ls cd /tmp
Prompt note: You do not need to run queued commands while
cs202sh
is blocked at a cs202$
prompt. (You
do need to run queued commands while cs202sh
is blocked waiting for another process, as described above.) If you make this work, though, tell us in answers.txt
.
Implementing the q
command will be a bit tricky. Questions to ponder: How does the shell know when a background process completes? How can the shell tell a waiting q
command to start running?
Implementing the waitq
command is also tricky. If you design your q
implementation well, you should be able to reuse most of it for waitq
! But note the difference: a q
command will run whenever there is room in the queue, but waitq
must not complete until the queue has no active processes.
It is OK if your command queue implementation creates a process for a queued command as soon as it is entered. However, those processes must block until it is their turn to run according to the queue.
Command queueing is, as far as we know, a feature unique to cs202sh (and
its antecedents), but features like it are common in real Unix shells. In general features like q
are given the name job control. See this reference for a good introduction to job control.
Extra credit. Any of the following exercises will receive extra credit (for students in either section).
Exercise 5 (Extra-credit).
The solution that we guide you to for implementing cd
has a
race condition. Fix that race condition. The file cmdrun.c
has details.
a
, b
, c
, d
, and e
, where:a
stdout ==> b
stdin (this is a conventional pipe).a
stderr ==> c
stdin.b
file descriptor 3 ==> d
file descriptor 4.d
stdout ==> e
stdin.a
and b
, where:a
stdout ==> b
stdin.b
stdout ==> a
stdin. (This is a "bidirectional" pipe.)
You may investigate bash's implementation for some ideas (look for >&
), but bash does not completely support this feature.
If you have not done so yet, commit your code.
$ git status # see what files you changed $ git diff # examine your changes $ git commit -am "My solutions to lab2"Then, follow these steps:
$ make handinThis command removes binaries from your directory, archives your directory in a tarball called
lab2-handin.tgz
, and invokes a
script that submits this tarball to our submission server.
Assuming that you performed the submission
setup step in lab1,
you should see something like this:
CLEAN CREATE lab2-handin.tgz SUBMITTING ... ... progress ... Submission successful
If you submit multiple times, we will take the latest submission and count slack hours accordingly.
This completes the lab.
Last updated: Mon May 04 11:24:46 -0400 2015 [validate xhtml]