Handed out Thursday, January 29, 2015
Due 9:00 PM, Friday, February 6, 2015
This lab is split into three parts. In the first part, you will get your development environment set up, and quickly review C pointers. In the second part, you will do the heart of this lab: a small coding exercise (WeensyOS, written by Eddie Kohler) in which you manipulate processes (you will implement a basic fork call!). The third part gives an introduction to Git, a version control system that you will use this semester to manage your work.
This lab does not require you to write much code. So take time to understand the code, tools, and environment that you have been given; you will need to be fluent in these things in order to be productive this semester.
You will electronically hand in code and answers (the mechanics are
described at the end of the lab; sneak preview). You will hand in responses to numbered
exercises. Do your work in the code files themselves and in a file called
answers.txt
(your answers must be in this file in text; a blank
answers.txt will receive no credit).
Please note the following expectation, which we will have throughout the semester. You must comment every block of code that you write, and you will be graded on these comments. As a rough guideline, figure that you need a one-sentence comment every four lines or so. In these comments, you must explain the purpose of the code, rather than narrating the instruction that you gave to the computer. Examples of not-good comments are, "assign variable X to variable Y", "dereference pointer P", or "the following code is a for loop that iterates over the foo structure." An example of a better comment would be, "Initialize the frobnitz data structure." To be clear: in the real world, it is terrible style to comment code that is obvious or self-documenting. Why are we departing from that? Because, this being a class, we want to make sure that you understand and can explain your code.
As an exception, the preceding expectation does not apply to the exercises in part 1 below.
Go to the setup page, and follow the instructions there. Come back here when that step is complete.
The files that you will need for this and subsequent lab assignments in this course are distributed using the Git version control system. We will learn a bit more about Git at the end of this lab.
The URL for the course Git repository is http://www.cs.nyu.edu/~mwalfish/classes/15sp/cs202-labs.git
To get the labs, go into your development environment (the devbox or the CIMS machines) and run the commands below (the lines preceded with '#' are comments):
$ mkdir ~/cs202 $ cd ~/cs202 # You must execute the line below; NOT doing so violates the course policies $ chmod 0700 . # The 'dot' at the end of the next line tells git not to add another # layer in the directory hierarchy: the clone will be into the current directory $ git clone http://www.cs.nyu.edu/~mwalfish/classes/15sp/cs202-labs.git . Cloning into .... $
Git allows you to keep track of the changes you make to the code. For example, if you are finished with one of the exercises, and want to checkpoint your progress, you can commit your changes by running:
$ git commit -am 'my solution for lab1 exercise1' Created commit 60d2135: my solution for lab1 exercise1 1 files changed, 1 insertions(+), 0 deletions(-) $
You can keep track of your changes by using the git diff command. Running git diff will display the changes to your code since your last commit, and git diff origin/lab1 will display the changes relative to the initial code supplied for this lab. Here, origin/lab1 is the name of the git branch with the initial code you downloaded from our server for this assignment.
All of the projects in this class will be done in C or C++, for two main reasons. First, some of the things that we want to implement require direct manipulation of hardware state and memory, which are operations that are naturally expressed in C. Second, C and C++ are widely used languages, so learning them is a useful thing to do in its own right. While you have some experience in C from CS201, you will probably need greater comfort and familiarity here than was required there. You will truly need to "think in C."
If you are interested in why C looks like it does, we encourage you to look at Ritchie's history. Here, perhaps, is the key quotation: "Despite some aspects mysterious to the beginner and occasionally even to the adept, C remains a simple and small language, translatable with simple and small compilers. Its types and operations are well-grounded in those provided by real machines, and for people used to how computers work, learning the idioms for generating time- and space-efficient programs is not difficult. At the same time the language is sufficiently abstracted from machine details that program portability can be achieved."
You will do four short exercises in C, as a warmup. From your cs202 directory:
$ cd lab1/review-pointers
Exercise 1. Implement the function set_to_five in ex1.c.
For Exercises 1 through 4, you can test with:
$ makeLater in the lab, we will revisit the program make. For now, think of it as an automatic way to invoke the compiler, and link your compiled code to a small test harness that we provide. Now invoke ex1:
$ ./build/ex1 ex1: OKIf your ex1.c is correctly implemented, you will see OK, as above.
Exercise 2. Implement the function swap in ex2.c.
You will need to remove the assert(0); line. Remove this wherever you see it as you implement functions in future exercises; it is a reminder that you have yet to implement a particular function.
As above, you compile and test with:
$ makeNow you should see
$ ./build/ex2 ex2: OK
The assert line you saw in the previous exercise is a simple application of a powerful tool. In C, an assert is a preprocessor macro which effectively enforces a contract in the program. (You can read more about macros in C here.) The contract that assert enforces is simple: when program execution reaches assert(<condition>), if condition is true, execution continues; otherwise, the program aborts with an error message.
Assertions, when used properly, are powerful because they allow the programmer who uses them to guarantee that certain assumptions about his or her code hold. For example, in the swap function that you just implemented, there are two assertions at the beginning of the function, before where your code should have been placed:
void swap(int *p1, int *p2) { assert(p1 != NULL); assert(p2 != NULL); ... }These two assertions combined enforce the contract that neither of the parameters p1 or p2 can be NULL. If these assertions were not present and either of these passed parameters were NULL, if we tried to swap them, we would encounter a type of error called a segmentation violation (or segmentation fault). This is because dereferencing the NULL address is invalid; NULL points to "nothing". By using assertions, we guarantee that swap will never try to swap the value of a variable at NULL, saving us the headache of having to debug a segmentation fault if some code tried to pass swap a NULL value. Instead, we will get a handy error message describing exactly what contract of the function was invalidated.
Exercise 3. Implement the function array_sum in ex3.c.
Test it:
$ make gcc -m32 -g -c ex3.c -o build/ex3.o gcc -m32 static/part3_harness.o build/ex3.o -lm -o build/ex3 $ ./build/ex3 ex3: OK
Again, you should see "OK".
Test it:
$ make gcc -m32 -g -c ex4.c -o build/ex4.o gcc -m32 static/part4_harness.o build/ex4.o -lm -o build/ex4 $ ./build/ex4 set_point ok point_dist okIf you implemented ex4.c correctly, you will see the printout above.
The heart of this lab is the first WeensyOS. WeensyOS refers to a series of small coding assignments that are also complete operating systems. You could boot a WeensyOS operating system on real x86-compatible hardware! The purpose of the WeensyOS assignments is first, to teach some of the class's concepts through example, and second, to demystify operating systems in general.
WeensyOS was developed by Eddie Kohler; the code that we use, and the description below (narrative, exercises, etc.), is borrowed from him. One of his goals in developing these labs was to make them fun. We think he succeeded!
The first WeensyOS problem set is called MiniprocOS, and it concerns
processes.
MiniprocOS is a tiny operating system that supports the major process
primitives: creating a new process with fork
, exiting a
process, and waiting on a process that has exited.
The only thing missing -- and it's a big one -- is process isolation:
MiniprocOS processes actually share a single address space. (In a later
WeensyOS assignment you will implement process isolation for memory.)
In this assignment, you will actually implement the code that forks a
new process, and see how system calls are implemented.
You will also update the code that waits on an exited process to avoid busy
waiting.
You could take the disk image file built in this assignment, write it to your laptop's hard drive, and boot up your operating system directly if you wanted! However, it's much easier to work with a virtual machine or PC emulator.
An emulator mimics, or emulates, the behavior of a full hardware platform. A PC emulator acts like a Pentium-class PC: it emulates the execution of Intel x86 instructions and the behavior of other PC hardware. For example, it can treat a normal file in your home directory as an emulated hard disk; when the program inside the emulator reads a sector from the disk, the emulator simply reads 512 bytes from the file. PC emulators are slower than real hardware, since they do all of the regular CPU's job in software -- not to mention the disk controller's job, the console's job, and so forth. However, debugging with an emulator is a whole lot friendlier, and you can't screw up your machine!
We will be using the QEMU emulator. We will also be using a C compiler (gcc), configured to compile code for an x86 ELF target. (ELF, or Executable and Linkable Format, is a particular format for storing machine language programs on disk.) All of the required tools are installed on our two lab platforms (the virtual devbox or the CIMS machines), which you set up above.
Note that if you are using the virtual devbox option, then you are using an additional layer of emulation to run the devbox itself (in that Ubuntu Linux is made to "run" on your own laptop or desktop). You are probably using Virtual Box for this purpose. Note that Virtual Box is solving almost the same problem that QEMU is. This is explained further on our setup page.
You fetched the source at the beginning of the lab. Look over the source:
# this assumes we were in the review-pointers directory
$ cd ..
# now we are in the lab1 directory
$ ls lab1
answers.txt conf GNUmakefile k-loader.c mergedep.pl process.h x86.c
boot.c const.h kernel.c lib.c p-procos-app2.c review-pointers x86.h
bootstart.S COPYRIGHT kernel.h lib.h p-procos-app3.c submit.py
build elf.h k-int.S link p-procos-app.c types.h
$
Now it's time to give the OS a whirl.
Change into the lab1 directory and run the make program (which must be GNU make).
make, if you haven't heard of it, is a program that simplifies the process of building software projects. The user writes a set of rules, called a Makefile (ours is called GNUmakefile) that tells the make program what to build. For example, a Makefile might say, "to compile a C program, run the gcc compiler; and by the way, I want to compile the program named hello, which depends on the C source file hello.c". Makefiles can be quite simple, although most medium-to-large projects have complex Makefiles. You'll be invoking a couple of Makefiles in the labs.
The WeensyOS GNUmakefile builds a hard disk image called procos.img, which contains the MiniprocOS "kernel" and two applications, p-procos-app.c and p-procos-app2.c.
Make's output should look something like this:
$ make
HOSTCOMPILE build/mkbootdisk.c
ASSEMBLE bootstart.S
COMPILE boot.c
LINK obj/bootsector
ASSEMBLE k-int.S
COMPILE kernel.c
COMPILE x86.c
COMPILE k-loader.c
COMPILE lib.c
COMPILE p-procos-app2.c
LINK obj/p-procos-app2
COMPILE p-procos-app3.c
LINK obj/p-procos-app3
COMPILE p-procos-app.c
LINK obj/p-procos-app
LINK obj/kernel
CREATE procos.img
$
Now that you've built the OS disk image, it's time to run it! We've made it very easy to boot a given disk image; just run this command:
$ make run
This will start up QEMU. After a moment you should see a window like this!
Hit "1
" to try to run the first application, and you should see a window like this:
To quit QEMU, type control-C in the terminal.
You're now ready to start learning about the OS code!
Start first with the application, p-procos-app.c. This application simply starts a single child process and waits for it to exit. It uses system calls that you saw in CS201 and that we also cover in this class: fork starts a new process; exit exits a process; and wait returns a process's exit status.
Read and understand the code in p-procos-app.c.
How are those system calls implemented?
As you saw in CS201 (and as we will review in this class): to make a
system call, the application program executes a trap: an
instruction that initiates a protected control transfer to the kernel.
The system call's arguments are often stored in machine registers, and
that's how MiniprocOS does it.
Likewise, the system call's results are often returned in a machine
register.
On Intel 80386-compatible machines (colloquially called "x86es"), the
interrupt instruction is called int
, and registers have names
like %eax
, %ebx
, and so forth.
A special C language statement, called asm
, can execute the
interrupt instruction and connect register values with C-language
variables.
Read and understand the comments in process.h. This file defines MiniprocOS's system calls. Also glance through the code, to see how system calls actually work!
The MiniprocOS kernel handles these system calls.
This kernel is different from conventional operating system kernels in several ways, mostly to keep the kernel as small as possible. For one thing, the kernel shares an address space with user applications, so that user applications could write over the kernel if they wanted to. This isn't very robust, since the kernel is not isolated from user faults, but for now it is easier to keep everything in the same address space. Another difference is that MiniprocOS implements cooperative multitasking, rather than preemptive multitasking. That is, processes give up control voluntarily, and if a process went into an infinite loop, the machine would entirely stop. In preemptive multitasking, the kernel can preempt an uncooperative process, which forces it to give up control. Preemptive multitasking is more robust than cooperative multitasking, meaning it's more resilient to errors, but it is slightly more complex. All modern PC-class operating systems use preemptive multitasking for user-level applications, but the kernel itself usually switches between internal tasks using cooperative multitasking.
MiniprocOS's main kernel structures are as follows.
struct process_t
process_t proc_array[];
I
is stored in
proc_array[I]
. Initially, only one of these processes is
active, namely proc_array[1]
. The proc_array[0]
entry
is never used.process_t *current;
The code in kernel.c sets up these structures. In
particular, the start()
function initializes all the process
descriptors.
Read and understand the code and comments in
kernel.h. Then read and understand the memory map in
kernel.c, the picture at the top that explains how MiniprocOS's
memory is laid out. Then look at start()
.
The code you'll be changing in MiniprocOS is the function that responds to
system calls. This function is called interrupt()
.
Read and understand the code for
interrupt()
in kernel.c. Concentrate on the
simplest system call, namely sys_getpid/INT_SYS_GETPID
.
Understand how the sys_getpid
application function (in
process.h) and the INT_SYS_GETPID
clause in
interrupt()
(in kernel.c) interact.
Exercise 5. Answer the following question (in answers.txt
): Say
you replaced run(current)
in the
INT_SYS_GETPID
clause with schedule()
.
The process that called sys_getpid()
will eventually run
again, picking up its execution as if sys_getpid()
had returned
directly. When it does run, will the sys_getpid()
call have
returned the correct value? Why or why not?
You may have noticed, though, that the sys_fork()
system call isn't working! Your job is to write the code that actually
creates a new process.
Exercise 6. Fill out the do_fork()
and copy_stack()
functions in kernel.c.
Congratulations, you've written code to create a process -- it's not that hard, no? (Our version is less than 20 lines of code.) Here's what you should see when you're done:
Now take a look at the code in p-procos-app.c
that calls
sys_wait()
. Also look at the INT_SYS_WAIT
implementation in kernel.c
. The current system call design
uses a polling approach: to wait for process 2 to exit, a process
must call sys_wait(2)
over and over again until process 2
exits and the sys_wait(2)
system call returns a value
different from WAIT_TRYAGAIN
.
We'll see more about polling later in the semester, but for now, notice
that polling approaches like this often reduce utilization. A
process uses CPU time to call sys_wait(2)
over and over again,
leaving less CPU time for others. An alternative approach, which can
improve utilization, is called blocking. A blocking
implementation would put sys_wait(2)
's caller to sleep, then
wake it up once process 2 had exited and a real exit status was available.
The sleeping process doesn't use any CPU. A process that is asleep because
the kernel is waiting for some event is called blocked.
INT_SYS_WAIT
in kernel.c
to use blocking
instead of polling. In particular, when the caller tries to wait on a
process that has not yet exited, that process should block until the
process actually exits.
Important Hint: Make sure that your blocking version of
sys_wait()
has exactly the same user-visible behavior
as the original version, except that it blocks and so never returns
-2
. See process.h
for an English description of
the current behavior.
To implement Exercise 7, you will probably want to add a field to the
process descriptor structure.
This field will indicate whether or not a process is waiting on another
process.
You will change INT_SYS_WAIT
to add the calling process to
this "wait queue", and INT_SYS_EXIT
to wake any processes
that were on the "wait queue".
There are several ways to do this; describe how you did it in
answers.txt.
To check your work, try changing the sys_wait()
loop in
p-procos-app.c
to look like this:
do { status = sys_wait(p); app_printf("W"); } while (status == WAIT_TRYAGAIN);
A polling implementation of sys_wait
would produce output
like this:
You want it to produce output like this:
Now try running the other MiniprocOS application. You should see something like this (different processes generally print their lines in different colors):
The MiniprocOS2 application, in p-procos-app2.c
, tries to run
1024 child processes.
Read and understand p-procos-app2.c
.
Unfortunately, your current kernel code doesn't seem able to run more
than 15 total processes, ever!
It looks like old, dead processes aren't being cleaned up, even after we call
sys_wait()
on them.
This is what we call a bug.
Exercise 8. Find and fix this bug.
When you've completed this exercise, the application should count up to 1024, like this:
Your colors may differ, however, depending on how you implement
sys_wait()
. One common implementation strategy ends with
several red lines in a row. If you see this in your code, try to figure
out why!
Exercise 9 (Honors). MiniprocOS miniprocesses have some aspects of threads. For instance, like threads, they all share a single address space. A big difference from threads is that we create a new process by forking. New threads are created in a different way. Introduce a new system call,
pid_t sys_newthread(void (*start_function)(void));that creates a new process in a thread-like way. The new process should start with an empty stack, not a copy of the current stack. Rather than starting at the same instruction as the parent, the new thread should start by executing the
start_function
function: that
is, that function's address becomes the new thread's instruction
pointer.
Exercise 10 (Honors). Introduce a
sys_kill(pid)
system call by which one process can make
another process exit. Use this system call to alter p-procos-app2.c's
run_child()
function so that the even-numbered processes kill
off all odd-numbered processes (except for thread 1). Running the
application should print out "Process N lives" messages only for
even-numbered values of N.
Extra credit. For extra credit (honors or regular), do the following exercise.
sys_fork()
, with its dirt simple stack copying strategy, works
only for simple programs. For example, consider the following function
definition:
void pmain(void) { int x = 0; /* note that local variable x lives on the stack */ /* YOUR CODE HERE */ pid_t p = sys_fork(); if (p == 0) /* YOUR CODE HERE */; else if (p > 0) sys_wait(p); // assume blocking implementation app_printf("%d", x); sys_exit(0); }In a system with true process isolation, the child process's
x
and the parent process's x
would be different variables, and
changes in one process would not affect the other's x
. But in
MiniprocOS, this is not always the case! For this exercise, produce a
version of that code with the following properties:
10
".11
".volatile
to
tell the compiler not to optimize references to it. Doing this correctly
is tricky, but if you can understand the difference between volatile
int *x
and int * volatile x
you can do this problem.
This completes the WeensyOS portion of this lab. At this point, you should commit your work:
# see what files you changed $ git status # examine your changes $ git diff $ git commit -am "My solutions to lab1"The next (final) part of the lab will shed some light on what the commands above are actually doing.
Your name and email address were configured automatically based on your username and hostname. Please check that they are accurate. You can suppress this message by setting them explicitly: git config --global user.name "Your Name" git config --global user.email you@example.com ...This happens because you have not configured git, and it cannot figure out your name and email address (required by default for creating a commit). Type the following:
$ git config --global user.name "<your_name>" $ git config --global user.email <your_email_address>and then re-run the
git commit
command above.
This course will use Git to distribute code for all programming assignments. Git is a distributed (as opposed to centralized) version control system. If used correctly, it can be very useful for tracking your work, undoing mistakes, trying different approaches, etc.
Read Eddie Kohler's excellent introduction to git. Where this page mentions code.seas.harvard.edu, read that as cs.nyu.edu (where you cloned from, above); where it mentions "the CS50 appliance", read that as our lab platform (the devbox or your CIMS account).
Note that life with git will be far easier at first if you use a graphical browser. Gitk is a good one that is installed on the lab platform. Just type gitk while you are in a git repo (if you haven't heard the term "repo" before, it's short for "repository"; we and many others sometimes use this abbreviation).
Last, note that our reference page has tutorials and references on git.
Other git commands that may ultimately come in handy (if not in this class than in later projects) aregit stash
(type man
git-stash
to learn about it), git rebase
(ditto:
man git-rebase
), and git-cherry-pick
.
We have prepared a couple of very brief exercises using git.
First, go to your cs202/
directory. Now, type:
$ git checkout -b git-lab/mergeA origin/git-lab/mergeA
This will create a new branch named git-lab/mergeA. This branch should only have a single file called merge.c. Type ls to verify. This is a simple program that simply prints three characters to the screen. (Note: You may see other files besides merge.c, if you created them but did not check them into your lab1 branch. These files are not tracked by git, so they persist across checkouts). You may compile and run this program by typing:
$ make merge # To compile $ ./merge
Exercise 12. (This goes in answers.txt.) What is the name of the first commit to the tree in which this branch resides? (Hint: This is much easier if you use a git browser. An example is gitk, which is installed on our lab platform.)
Now we will create another branch called git-lab/mergeB. Type:
$ git checkout -b git-lab/mergeB origin/git-lab/mergeB
This branch also contains a single file called merge.c. Verify with ls. This program is very similar to the one you saw before. The only difference is that it prints out a different set of characters. Make sure that this is so.
One of the strengths of git is that it allows multiple users to work from the same repository independently from each other. Eventually, though, all of the work must be merged together into a final product Usually, git will do this automatically for you. However, there are times when multiple users modify the same place in a file, in which case git cannot know whose work should be used (only a human can manually resolve a "conflict" of this kind). You will be doing such conflict resolution, but here and throughout the semester, you must be careful: a botched merge is a reliable source of headaches. The two branches that you have just created have been set up so that they will cause exactly such a conflict when merging. Type the following into a console:
$ git branch git-lab/mergeA * git-lab/mergeB lab1 $ git merge git-lab/mergeA Auto-merging merge.c CONFLICT (content): Merge conflict in merge.c Automatic merge failed; fix conflicts and then commit the results.
Now would be a good time to go back over Eddie Kohler's excellent introduction to git. See especially the section on Conflicts; search for the word "scary".
Exercise 14. What is the name of your commited merge? Place your answer in answers.txt.
This completes the work of the lab. The final step is handing it in. Switch back to the lab1 branch (git checkout lab1).
To hand in the lab, you will interact with a submission server that we are running. This requires a one-time setup, good for this and all labs in this course. Here are the steps:
cs202-auth-<your-netid>
. To
do this step, you can use scp, copy/paste, http fetch, etc.$ make handinThis command does the following, on your behalf. It executes make clean (which removes .o files, executables, etc.). It then creates a copy of your
lab1
directory, placing the
copy in an archive (it uses the
tar program for this purpose. Historical note: tar stands for
"Tape ARchive", because the program was designed to backup files to
tape). It then compresses the archive, creating a bundle called
lab1-handin.tgz
(such bundles are sometimes known as
tarballs). Last, make handin invokes a
script that submits this tarball to our submission server.
You should see something like this:
CLEAN CREATE lab1-handin.tgz SUBMITTING ... ... progress ... Submission successful
Last updated: Mon May 04 11:24:46 -0400 2015 [validate xhtml]