CS439 Spring 2013 Labsh: Introduction to System Calls and Process Control

Handed out Sunday, January 20, 2013
Due Monday, January 28, 2013, 9:00 PM

Introduction

In this lab you will gain exposure to some system calls in Unix, mainly fork(), exec(), and calls related to file I/O. You will also gain exposure to the concept of multiprocessing, which will be vital for the upcoming labs.

Administrative note: You are not programming in pairs for this assignment.

Getting Started

The source code for this lab is available from the repository that you cloned in the previous lab. To fetch that source, use Git to commit your lab 1 source, fetch the latest version of the course repository, and then create a local branch called labsh based on our labsh branch, origin/labsh:

tig% cd ~/cs439/labs
tig% git commit -am 'my solution to lab1'
Created commit 254dac5: my solution to lab1
 3 files changed, 31 insertions(+), 6 deletions(-)
tig% git pull
Already up-to-date.
tig% git checkout -b labsh origin/labsh
Branch labsh set up to track remote branch refs/remotes/origin/labsh.
Switched to a new branch "labsh"
tig% make tidy
Removing ...
tig%

The git checkout -b command shown above actually does two things: it first creates a local branch labsh that is based on the origin/labsh branch provided by the course staff, and second, it changes the contents of your lab directory to reflect the files stored on the labsh branch. Git allows switching between existing branches using git checkout branch-name, though you should commit any outstanding changes on one branch before switching to a different one.

The make tidy command shown above cleans up any files or directories left over in your lab repository from the previous lab that aren't needed or used in the new lab you just checked out. You should run make tidy after switching between the branches of independent labs in the future to ensure that your workspace is clean. This will also ensure that make turnin will not complain about untracked files when you attempt to turn in your completed lab solutions.

Lab Requirements

For labsh you need to answer all of the numbered questions (questions that are not numbered are optional). Place the write-up in a file called answers.txt (plain text) in the top level of your labs directory before handing in your work. Please include a header that contains your name, UTCS username, and lab number and make sure the file is named correctly. If you do not, your answer may not be graded.

Hand-in Procedure

As in the previous lab, when you are ready to hand in your code and write-up, run make turnin.

Background

A shell is an interactive command-line interpreter that runs programs on behalf of the user. A shell repeatedly prints a prompt, waits for a command line from the user, and then carries out some action, as directed by the contents of the command line.

The command line is a sequence of ASCII text words delimited by whitespace. The first word in the command line is either the name of a built-in command or the pathname of an executable file. The remaining words are command-line arguments. If the first word is a built-in command, the shell immediately executes the command in the current process. Otherwise, the word is assumed to be the pathname of an executable program. In this case, the shell forks a child process and then loads and runs the program in the context of the child. The child processes created as a result of interpreting a single command line are known collectively as a job.

If the command line ends with an ampersand “&”, then the job runs in the background, which means that the shell does not wait for the job to terminate before printing the prompt and awaiting the next command line. Otherwise, the job runs in the foreground, which means that the shell waits for the job to terminate before awaiting the next command line. Thus, at any point in time, at most one job can be running in the foreground. However, an arbitrary number of jobs can run in the background.

For example, typing the command line:

bash> jobs

causes the shell to execute the built-in jobs command.

Typing the command line:

bash> /bin/ls -l -d

runs the ls program in the foreground. The shell and the OS cooperate to ensure that when this program begins executing its main routine, whose signature is int main(int argc, char *argv[]), the arguments have the following values:

argc == 3
argv[0]: "/bin/ls"
argv[1]: "-l"
argv[2]: "-d"

Alternatively, typing the command line

bash> /bin/ls -l -d &

runs the ls program in the background.

Unix shells support the notion of job control, which allows users to move jobs back and forth between background and foreground, and to change the process state (running, stopped, or terminated) of the processes in a job. Typing ctrl-c causes a SIGINT signal to be delivered to each process in the foreground job. The default action for SIGINT is to terminate the process. Similarly, typing ctrl-z causes a SIGTSTP signal to be delivered to each process in the foreground job. The default action for SIGTSTP is to place a process in the stopped state, where it remains until it is awakened by the receipt of a SIGCONT signal. (We are not studying signals in detail in this class, but your text CS:APP2e [Bryant and O'Hallaron] discusses them thoroughly, in section 8.5.)

Unix shells also provide various built-in commands that support job control. These commands include:

jobs: List the running and stopped background jobs.

bg <job>: Change a stopped background job to a running background job.

fg <job>: Change a stopped or running background job to a job running in the foreground.

kill <job>: Terminate a job.

Exercises

Before you get started on the following exercises, read sections 8.3 and 8.4 of CS:APP2e [Bryant and O'Hallaron] thoroughly. Keep the following points in mind as you code:

You must follow the guidelines laid out in the C Style Guide or you will lose points. This includes selecting reasonable names for your files and variables.
This project will be graded on the UTCS public Linux machines. Although you are welcome to do testing and development on any platform you like, we cannot assist you in setting up other environments, and you must test and do final debugging on the UTCS public Linux machines. The statement "It worked on my machine" will not be considered in the grading process.
Your code must compile without any additions or adjustments, or you will receive a 0 for the correctness portion.
You are also encouraged to use code provided by a public library such as the GNU library.
If you find that the problem is underspecified, please make reasonable assumptions and document them in the answers.txt file. Any clarifications or revisions to the assignment will be posted to Piazza.

You should find the following files in your lab directory:

Makefile: compiles your programs.
fib.c: implement Fibonacci here.
psh.c: implement simple shell here.
ascii.c: implement file I/O exercises here.
util.c, util.h: contain provided utility functions.

One vital function that programs must be able to perform is input and output to files, on-disk or otherwise (such as special files, like the terminal). The operating system provides system calls to allow programs to perform I/O. In a Unix-like system such as Linux, these calls are:

open(): open a file, return a file descriptor to later do reads and/or writes to this file.
read(): read data from an open file.
write(): write data to an open file.
lseek(): move to a different location in a file, where the next read or write will take place.
close(): close an open file.

You may type man 2 <syscall> (for any system call, including those listed above!) to get more information about it. Except for printing error messages (where you are welcome to use fprintf), any file I/O you do in the following exercise must be done with the syscalls listed above, or with any of the syscalls listed in man 2 syscalls (but the ones we listed here should suffice).

Exercise 1. Here's an example program to demonstrate using these syscalls to do file I/O:

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

int
main(int argc, char **argv)
{
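	int fd;

	if ((fd = open("hi.txt", O_WRONLY|O_TRUNC|O_CREAT, 0600)) < 0)
	{
		fprintf(stderr, "Couldn't open hi.txt. Error: %s\n",
			strerror(errno));
		return -1;
	}

	/* 13 is the length of "hello, world\n", not counting the final '\0' */
	if (write(fd, "hello, world\n", 13) < 0)
	{
		fprintf(stderr, "Couldn't write hi.txt. Error: %s\n",
			strerror(errno));
		close(fd);
		return -1;
	}

	if (close(fd) < 0)
	{
		fprintf(stderr, "Couldn't close hi.txt. Error: %s\n",
			strerror(errno));
		return -1;
	}
	return 0;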
}

Copy this program to the file hi.c in your labs directory, then compile it with gcc hi.c -Wall -o hi. Here, the compiler (gcc) takes the source code of the program (in hi.c) and produces an executable (in hi). Now run the executable and check its output:

tig% ./hi
tig% cat hi.txt
hello, world

When you type ./hi, the shell asks the kernel to create a process from the executable we just produced with gcc and run it. When that process finishes, the kernel notifies the shell (because the shell is waiting on the process), and the shell then prompts for another command. The command cat hi.txt prints the file hi.txt to the terminal; note that hi.txt is the file name we passed to open(). Feel free to remove hi.c and continue with the lab.

When a syscall fails, the call returns a negative value (typically -1) and sets a global variable, errno (from errno.h), to hold an error code. Descriptions of these error codes and what they mean are listed in the errno man page (man 3 errno). To get a human-readable error message string from an errno error code, call strerror(errno). Note that strerror() returns a char *; it does not print out an error message. By convention, in a Unix-like system, error messages should be reported to "standard error", which means "whatever is being abstracted by file descriptor 2". The standard C I/O library (stdio.h) further abstracts file descriptor 2 as a FILE *, aliased to stderr. If that last sentence was confusing, what you need to know is (1) that fprintf(stderr, "...") will print to standard error, and (2) that good C programming style on Unix is to report errors on standard error.

You should always check the return values of any syscall you make and handle any errors appropriately. This is important to keep in mind for the upcoming exercise, as we will grade you on your style and adherence to these conventions and standards.

One special file you will be dealing with in the following exercise is /dev/urandom. /dev/urandom is not an actual file on disk, but rather a construction of the kernel itself to provide programs with a way of generating random numbers. You can open and close /dev/urandom like a normal file, but when you read from /dev/urandom, you will get a random sequence of bytes.

Exercise 2. Read the descriptions of the generate_ascii() and fibonacci_ascii() functions in ascii.c, then implement these functions using the system calls that we have outlined. You must check the errors from any system call you make in the style above, i.e. you must check the return value of the syscall and, on error, print a message to standard error with fprintf() containing the output of strerror(errno).

As an example of what we expect from generate_ascii() and fibonacci_ascii(), here is the output of a sample run to produce a 10x20 character picture:

tig% ./ascii 10 20 pic.txt fibpic.txt
tig% cat pic.txt
z{~`H8ZZf#oB1ff#[+ra
.WyU>i~AIU|N]|PP5vP7
.}gtBU3yk-rlEo)_R@nl
Z10xk`C'h^$om>VP_{.8
g@^nT5e-huwR"jX</I"Y
OExq b}pE3-VEc&>4NS}
LXSB3yHwnbj;0p!fJ{A4
l#g==VBLyk[ftRD^ w>j
MBeYTFHz2a$8f4!*QrP^
:3:BCiJF;L!8u#SN[z=U
tig% cat fibpic.txt
z{{~`8ff.|o5A

The fork() system call creates a child process that is nearly identical to the parent. The exec() family (see man 3 exec) replaces much (but not all!) of the state of the currently running process with new state, based on a new program.

Exercise 3. Update fib.c so that if invoked on the command line with some integer argument n, where n is less than or equal to 13, it recursively computes the nth Fibonacci number. (The numbers are counted from 0.)

Example executions:

tig% ./fib 3
2 
tig% ./fib 10
55

The trick is that each recursive call must be made by a new process, so you will call fork() and then have the new child process call doFib().

The parent must wait for the child to complete, and the child must pass the result of its computation to its parent.

Run make and test your fib program on some inputs to make sure you have implemented it correctly.

Now that you have some experience with fork(), your job in this next exercise is to create a simple shell, which will implement some of the functionality of a real Unix shell. To do this, you will use the fork() and exec() syscalls.

We have provided psh.c, which provides a framework for your shell, and util.h/util.c, which provide some helper functions. Read these files.

This shell waits for a line of input. If the line is “quit”, it exits. Otherwise, it parses the line and attempts to execute the program at the path specified by the first word with the arguments specified by the remaining words. It waits for that job to finish. Then it waits for the next line of input.

The prompt should be the string psh>.

The command line typed by the user should consist of a name and zero or more arguments, all separated by one or more spaces. If name is a built-in command, then psh should handle it immediately and wait for the next command line. Otherwise, psh should assume that name is the path of an executable file, which it loads and runs in the context of a child process. Your shell waits for that job to finish, and then it waits for the next line of input. (In this context, the term job refers to this child process.)

For example,

psh> /bin/ls -l -d

should run the ls program in the foreground.

Your shell should implement one built-in command: quit. If the user types quit, your shell should exit.

For now, all commands and jobs are executed in the foreground. You also can assume that jobs execute until they exit; you don’t need to worry about signal handling.

You may find the following hints useful as you implement the psh program:

The waitpid(), fork(), and exec* system calls will come in very handy. Use man 2 <call> to learn about them (and man 3 exec will be handy too). Remember, you can use man man (not to be confused with Man Man) to learn about man.

The WEXITSTATUS macro described in the waitpid man page may also be useful.

Recall that C does not have a built-in string type: a string is an array of char terminated by a '\0' byte. Read more about string handling in C in these notes.

Exercise 4. Update the file psh.c by implementing the functions eval(), which the main() function calls to process one line of input, and builtin_cmd(), which your eval() function should call to parse and process the built-in quit command.

Now you should be able to run the programs you wrote inside your shell. For example:

$ ./psh
psh> ./fib 13
233
psh> quit

Answer the following questions in answers.txt in your labs directory. Be sure to run git add answers.txt so that your answers are turned in with the rest of your solution.

Questions:

1. If you have any preliminary comments on your submission or notes for the TAs, please give them here.
2. Please cite any offline or online sources you consulted while preparing your submission, other than the Linux documentation, course text, and lecture notes.
3. How many child processes are created when doFib() begins with an input of 5? Show the return value for each child process, numbering the processes beginning with 0 (the original process) and incrementing for each child process.
4. Which flavor of exec() did you choose to use in psh? Why?

This completes the lab. Make sure you have answered all questions in answers.txt and committed your solutions, then run make turnin.

Acknowledgments

Thanks to Alison Norman for much of this lab.

Last updated: Sun Jan 20 22:02:27 -0600 2013