This lab will reinforce your prior experience in the C programming language and give you practice using certain command-line tools. The overall comfort you’ll gain (or reinforce) with the Unix computing environment will be helpful for future labs.
Getting started
You will perform the labs using the CS202 Docker environment, and pull/push code with Git and GitHub. It's time for you to set up that infrastructure:
Before you go further. Go to the setup page, and follow the instructions there. Specifically:
- Get a GitHub account if you don’t have one already
- Set up your Git repository (setup SSH keys including ssh-agent, clone on GitHub using GitHub Classroom, clone to your local machine, set upstream)
- Read through the git FAQs
- Install Docker
- Build and run the CS202 Docker image
Come back here when you’ve done these.
Workflow. We have configured Docker so
that the directory from which you run ./cs202-run-docker
on
your host machine (usually this is cs202
) shows up in
Docker as cs202-labs
. This means that you can use your
preferred IDE or editor on your host machine to edit code while
compiling, running, and testing inside Docker. Depending on your setup,
it may be helpful for you to have at least two windows up (one for
editing and one with Docker).
Programming style. In this and subsequent labs, you will be graded for style. We will deduct up to 20% of total lab points for poor style based on our subjective evaluation. Please read this style guide.
Section 0: Expectations
The exercises in this lab are elementary. That means that an AI code assistant, such as GitHub CoPilot or ChatGPT, can possibly complete many or all of the exercises perfectly. That is unavoidable: the same kinds of exercises that teach beginners are also feasible for current-era AI. For that reason, this lab will be graded under homework rules (non-strictly, and a very low weighting of your grade). That does not mean the lab is skippable; read on.
Our quizzes, exams, and future labs will assume that you have acquired the skills in this lab. Therefore, we strongly encourage you to work this lab without AI assistance. We put teeth behind that encouragement. Specifically, we will look unfavorably on the following:
Requests for setup help and tech support in future weeks that are consistent with not having worked lab 1 on your own.
Strong submissions by students who are unable to perform the corresponding tasks on quizzes and exams.
Nonsensical semantic errors of the kind produced by current-era AI assistants.
Most likely, AI assistants are not going away. But in order to learn something to the point where you can drive the AI sensibly (not to mention exceeding its capabilities), you have to actually learn yourself, which means doing exercises on your own.
Section 1: C review
All of the projects in this class will be done in C or C++, for two main reasons. First, some of the things that we want to implement require direct manipulation of hardware state and memory, which are operations that are naturally expressed in C. Second, C and C++ are widely used languages, so learning them is a useful thing to do in its own right. You have some experience in C from CS201, and you will need to build on that here.
If you are interested in why C looks like it does, we encourage you to look at Ritchie's history. Here, perhaps, is the key quotation: "Despite some aspects mysterious to the beginner and occasionally even to the adept, C remains a simple and small language, translatable with simple and small compilers. Its types and operations are well-grounded in those provided by real machines, and for people used to how computers work, learning the idioms for generating time- and space-efficient programs is not difficult. At the same time the language is sufficiently abstracted from machine details that program portability can be achieved."
You will do short exercises in C, as a warmup, and to help review
some of what you learned in CS201 about C. Enter the Docker environment
(./cs202-run-docker
, per the setup
page); you’ll know you’re in the Docker environment if you see a
prompt like this:
cs202-user@af8b1be95427:~/cs202-labs$
then type cd lab1/mini
after the prompt:
cs202-user@af8b1be95427:~/cs202-labs$ cd lab1/mini
cs202-user@af8b1be95427:~/cs202-labs/lab1/mini$
Shell prompt. From here forward, we will
not include the cs202-user@...
part of the shell prompt in
this description, but it should be present on your machine. If it is
not, then you are not in the Docker environment.
(Depending on your CS201 section, you may have seen very similar, or the same, exercises; that’s fine. If you’re not absolutely fluent in C programming, then doing them again will be useful.)
Exercise 1. Implement the functions in
part1.c
For Exercises 1 through 3, you can test with:
$ make
$ build/part1
You should have seen make
in CSO. Recall that it is an
automatic way to build software. In this case, it invokes the compiler
and links your compiled code to a small test harness that we
provide.
$ ./build/part1
part1: set_to_fifteen OK
part1: array_sum OK
If your part1.c is correctly implemented, you will see OK, as above.
You will need to remove the assert(0);
line. Remove this
as you implement functions in future exercises; it is a reminder that
you have yet to implement a particular function.
As you write your code and improve it (fixing bugs, adding functionality), you should get in the habit of syncing your changes to the master copy of your labs repository on GitHub, for example:
$ git commit -am 'my solution for lab1 exercise1'
Created commit 60d2135: my solution for lab1 exercise1
1 files changed, 1 insertions(+), 0 deletions(-)
$ git push origin
The commit
keeps the history of changes to your code,
and so allows you to revert to an older version if you find that a
change causes a regression. The push
serves to back up your
code on GitHub’s servers, so you won’t lose work if your local working
copy is corrupted or lost.
Note: If git push origin
does not work
from within Docker and you are on a Mac, then make sure your
ssh-agent
is running. See here. If you are not on a Mac, you can
either push from outside Docker using your host’s git
or
you can use GitHub’s Personal
Access Tokens.
Exercise 2. Implement the functions
set_point
and point_dist
in
part2.c
.
As above, you compile and test with:
$ make
Now you should see
$ ./build/part2
part2: set_point OK
part2: point_dist OK
Note that you can keep track of your changes by using the
git diff
command. Running git diff
will
display the changes to your code since your last commit, and
git diff origin/main
will display the changes relative to
the initial code supplied for this lab. Here, origin/main
is the name of the git branch with the initial code you downloaded from
our server for this assignment.
Exercise 3. Implement the linked_list
utility functions in part3.c
.
Test it:
$ make
$ ./build/part3
part3: list_insert OK
part3: list_end OK
part3: list_size OK
part3: list_find OK
part3: list_remove OK
Debugging note: if you are having trouble with this piece, you may wish to skip to the section covering gdb, below, and come back here. You would do:
$ gdb build/part3 core
(gdb) bt
(Make sure you’ve issued the ulimit
command below.)
Another debugging note: the
assert
lines we’ve been seeing are a simple application of
a powerful tool. You may wish to use asserts yourself, to help debug
your linked list functions.
In C, an assert
is a preprocessor macro which
effectively enforces a contract in the program. (You can read more about
macros in C here.) The
contract that assert
enforces is simple: when program
execution reaches assert(<condition>)
, if
condition
is true, execution continues; otherwise, the
program aborts with an error message.
Assertions, when used properly, are powerful because they allow the programmer who uses them to guarantee that certain assumptions about the code hold. For example, you will see assertions like:
assert(head != NULL);
This assertion enforces the contract that the parameter head cannot
be NULL
. If these assertions were not present and we tried
to dereference head
, for example with *head
,
we would encounter a type of error called a segmentation
violation (or segmentation fault). This is because
dereferencing the NULL
address is invalid;
NULL
points to "nothing". Later in the lab, we will get
some experience with segmentation faults.
But by using assertions, we guarantee that list_end
(for
example) will never try to dereference a head
variable at
NULL
, saving us the headache of having to debug a
segmentation fault if some code tried to pass us a NULL
value. Instead, we will get a handy error message describing exactly
what contract of the function was invalidated.
Use your own asserts to make debugging easier!
Advice on debugging common problems encountered in doing this lab
Remember to recompile changed code Whenever you’ve changed a file, always type
make
to re-compile before executing again.Write your own simple test code. Don’t rely solely on the lab’s testing infrastructure (i.e. ./grade-lab) to test the correctness of your code. We don’t distribute the source of the harness code; this makes it hard to debug. So you should write your own tester. Let’s say you want to test the
array_sum
function in the filepart1.c
. To write your own test code, create a file (e.g. calledtest-part1.c
) with amain
function that invokesarray_sum
in various ways to test its correctness.An example
test-part1.c
might look something like this.Compile your test code by typing:
$ gcc -pedantic -Wall -std=c11 -g -o test_part1 test_part1.c part1.c
Use
gdb
As we noted above, and as will be covered below.
C programs from scratch
In the rest of Section 1 of the lab, you will get practice writing and compiling C programs from scratch. Knowing how to create a standalone C program is useful in its own right, and it will be helpful context for lab2.
We will quickly walk through how to write and compile a program in C, and then you will write and compile several programs of your own.
Using a text editor, create a file called fun.c
. In this
file, type:
#include <stdio.h>
int main(int argc, char** argv)
{
char* first_arg;
char* second_arg;
/* this checks the number of arguments passed in */
if (argc != 3) {
printf("usage: %s <arg1> <arg2>\n", argv[0]);
return 0;
}
first_arg = argv[1];
second_arg = argv[2];
printf("My program was given two arguments: %s %s\n",
first_arg, second_arg);
return 0;
}
This is a complete C program. The #include
at the top
tells the compiler to use the header files of "standard I/O" (the
standard input/ouput functions of the C library; these functions include
printf
). Also, argc
contains the number of
arguments passed to the program (including the program name itself)
while argv[0]
contains the name of the program that was
invoked.
You can compile this program using:
$ gcc -pedantic -Wall -std=c11 -g -o fun fun.c
You can now run this program using:
$ ./fun
You should see:
usage: ./fun <arg1> <arg2>
You can also do:
$ ./fun abc def
My program was given two arguments: abc def
Note the pattern here:
you create a C file and compile it with
$ gcc [flags] -o <output-file-name> <input-file-name>
You run your program using whatever was after the
-o
flag.Inside the program, you gain access to the arguments using the
argv
array. (As we’ll see in this semester, the OS syscallexec()
takes an argv; the OS takes that array and passes it to the new program'smain()
function.)
Exercise 4. Write, from scratch, a C program that takes two arguments and prints the first three characters of each string. If either of the two arguments has fewer than three characters, print an error message, and end gracefully (as opposed to core dumping).
For example:
$ ./first3 a b
./first3: error: one or more arguments have fewer than 3 characters
$ ./first3 abcd wxy
abcwxy
Instructions:
You may use the function
strlen()
. If you do, include#include <string.h>
at the beginning of the program.Put all your files in
~/cs202/lab1/fromscratch/first3
.Your executable file must be named
first3
.You must supply a Makefile. When we type
make
, it needs to create a binary executable namedfirst3
.Be diligent about creating test cases for yourself. You will lose points for not handling corner cases. You may wish to create a script that runs and tests your code on various cases. (You do not have to hand in your testing code, however; we will run our own testing scripts against your code.)
Pay attention to coding style.
Remember to add relevant source files and the Makefile to your git repo. Type
git status
, and if any of the files look relevant, add and commit them:$ git status $ git add *.c *.h Makefile $ git commit -am "first3" $ git push origin
Exercise 5. Write, from scratch, a C
program that takes a single argument and prints the number of ascii ‘a’
characters in the string. Don’t use the function
strlen()
; programs using strlen()
will get 0 points. The purpose of this exercise is in part to give you
familiarity with so-called “null-terminated strings” in C. A
null-terminated string is a pointer to an array of characters
(char *
), in which the end of the string is indicated by
the character ‘\0’, which has value 0, or NULL
.
For example:
$ ./countas abbba
2
$ ./countas aaaaa
5
$ ./countas bcdefghwxyz
0
$ ./countas
usage: countas <arg>
$ ./countas aaaaa bbbbbb
usage: countas <arg>
Instructions:
- Again, don’t use
strlen()
. - Put all your files in
~/cs202/lab1/fromscratch/countas
. - Your executable file must be named
countas
. - If
countas
receives any number of arguments other than 1, it should print a help/usage message (see above). - You must supply a Makefile. When we type
make
, it needs to create a binary executable namedcountas
. - Be diligent about creating test cases for yourself. You will lose points for not handling corner cases. You may wish to create a script that runs and tests your code on various cases. (You do not have to hand in your testing code, however; we will run our own testing scripts against your code.)
- Pay attention to coding style.
- As above, remember to add relevant files (source file, Makefile) to
your git repo:
git status ; git add [...] ; git commit -am "countas" ; git push origin
.
Section 2: Debugging
This part of the lab will give you practice debugging.
Put your answers for this section and in the next in the supplied
answers.txt
file.
Navigating to syntax errors
Go to the dbg
directory in lab1
:
$ cd ~/cs202-labs/lab1/dbg
Now type:
$ make
The code has a syntax error; thus, it cannot be compiled.
Exercise 6. Fix the syntax error.
Use the compiler's error message to determine what's wrong. Note that
in the vi
text editor (discussed below) you can navigate to
a given line of code :<number>
. You can also
launch vi
with its cursor in place:
$ vi +<number> <filename>
to begin directly on a given line number. For example,
vi +5 foo.c
begins with the cursor at line 5.
After you fix the syntax error, the code will compile. Use
make
to see this:
$ make
Now, try testing the code itself, using the small
test_linked_list
utility:
$ ./test_linked_list
Segmentation fault (core dumped)
Aha! Our code compiled, but it was not correct (core dumps are bad).
Specifically, the segmentation fault means that our program issued an
illegal memory reference, and the operating system ended our process.
Making matters worse, we have no idea what the problem in the code is.
In the following section, you will learn how to use gdb
to
debug this kind of problem.
Run gdb: Use the GNU debugger, or gdb
to run the program:
$ gdb test_linked_list
(gdb)
Set breakpoints: One thing that you might want to do
is to set a breakpoint before the program begins executing.
Breakpoints are a way of telling gdb
that you want it to
execute your program and then stop, or break, at a place that
you define. Use the following command to set a breakpoint at the
main
function:
(gdb) b main
Breakpoint 1 at 0x400963: file test_linked_list.c, line 43.
Then use gdb's command run
to actually start the program
(this is the general pattern in gdb: one invokes the debugger, perhaps
sets a breakpoint, and then starts the program with
run
):
(gdb) run
The program will be stopped when it reaches the breakpoint (advanced
topic: how does gdb conspire with the hardware to make this work??). At
this point, you will be presented with gdb's command prompt again. To
see the “call stack” (or stack trace), which is the list of
functions that have called this one – literally, the stack frames on top
of the current one – you issue backtrace
or bt
for short:
(gdb) bt
Experienced developers will often ask for a stack trace as step 0 or
1 of understanding a code problem. Get in the habit of asking
gdb
to give you a backtrace.
To make the program continue running after a breakpoint, use
continue
, or c
for short:
(gdb) c
Step through the code: Of course, if you just
c
every time you hit a breakpoint, then you will lose
control of the program. You often want the command next
, or
n
:
(gdb) n
This "executes" the next line of code, for example executing an
entire function. (The command step
executes a single line
of C code. There is little difference between step
and
next
unless you are about to enter a function.
step
steps into the function; next
"steps
over" the function.)
Inspect the values of
variables: In gdb's command prompt, the program is
stalled. You can query the program's current global and local variables
with the print
command, or p
for short.
Run gdb
on test_linked_list
. Set a
breakpoint at the function list_delete
.
At this breakpoint, determine the value of the integer
id
:
(gdb) print id
$1 = 1
This means that variable id
holds the integer 1.
Aside: you can check local variables' names using:
(gdb) info local
Core dump: If a program terminated abnormally (for
example, test_linked_list
), the state of the program will
be recorded by the OS and (if core dumps are enabled) saved in a
so-called core dump. gdb
can use a core dump to inspect a
crash situation.
To debug using core dumps, you must first enable core dumps, and then point gdb at the relevant file. We'll do this in several steps:
# enable core dumps
$ ulimit -c unlimited
$ ./test_linked_list
$ ls -l core
$ gdb ./test_linked_list core
The idea here is that the core file gives gdb
enough
information to recover the memory and CPU state of the program at the
moment of the crash. This will allow you to determine which instructions
experienced the error.
Exercise 7. Fix the bug in the code.
Recompile and rerun to make sure it is fixed. State the line of code and
the fix in answers.txt
.
Section 3: Software development skills and general productivity
This part of the lab will walk through general software development skills as well as some tools to enhance your productivity. You may feel at first that these tools are slowing you down ("how do I enter characters in vi???"), but comfort and fluency with them will pay off in the long term. It's worth slowing down and "learning proper technique."
A good text editor is essential
If you are not doing so already, begin using a powerful text editor.
Traditionally on Unix systems, the most popular editors were
vi
(which in most environments these days is an alias for a
more powerful version, called vim
) or emacs
,
both of which are installed in the Docker environment. Of course, many
of you will be using VSCode, an IDE that builds in many modern tools for
software development.
This section will be with reference to vi. To get started, work through this vi tutorial.
Navigating code: ctags
We will investigate ctags
with the Linux code base as a
running example.
Create a new directory inside the Docker environment, and then change into it:
$ mkdir ~/cs202-labs/learn-ctags
# (below, !! refers to the last shell command while $ refers to the last argument)
$ cd !!:$
Now download and unpack the Linux source code archive. We are interested in the kernel source and header files:
$ wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.2.9.tar.xz
# the line above could take a few minutes
$ tar -xf linux-5.2.9.tar.xz linux-5.2.9/include linux-5.2.9/kernel
$ cd linux-5.2.9
$ ls
include/ kernel/
Now, you are going to use the ctags program. Type:
$ ctags --recurse=yes *
# this could take time
The ctags
utility searches a body of code and creates a
tag file, which is an index of programming language objects,
such as function and variable definitions. The command above invokes
ctags
recursively on all items in the entire directory
(*
) in order to create a single tags file that includes
information from all files in the repository.
The rest of this section will use vi
to interact with
tags. However, emacs
also supports tags.
Using vi
, open the header file for the linux
scheduler:
$ vi include/linux/sched.h
We will now walk through some uses of tags. When we type
:something
, this refers to a command given in vi's
command mode. To get into command mode in vi, you type the
escape key (vi by default starts in command mode; the editor enters
insert mode when you type i, e, a, etc.). Once in command mode,
typing :something
does the "something". (For example, try
pressing escape and then :q
.)
Now, let's get started with tags. Type:
:tag sched_attr
This will take you to the definition of sched_attr
. To
return to your previous position, press ctrl-t
.
The tag file that we created indexed all of Linux's source. We can jump to a tag in another file and it will automatically be opened, as we illustrate in the following exercise.
Exercise 8. Type within vi:
:tag block_device_operations
What file does this open? (You can use the command :ls
inside of vi to check this.) Put the answer in
answers.txt
.
You can use ctrl-]
to jump to the definition of the
object underneath the cursor. Go to line 1001 in
include/linux/sched.h
(navigate there with
:1001
, since :<number>
in vi takes you
to the specified line number). Move your cursor over the token
futex_pi_state
. Press ctrl+]
.
Exercise 9. What is
futex_pi_state
? In what file and line is it defined?
A tag may have multiple definitions. type:
:tselect list_head
and note that a menu pops up giving you the choice of which definition to view.
Tags can also be used with regular expressions. For example, to look
up all tags containing proc_sched
, enter the following:
:tselect /proc_sched
And the following command shows tags that contain
proc_sched
and task
(in that order):
:tselect /proc_sched.*task
Exercise 10. How many functions does the
linux kernel have that contain proc_sched
and
task
? Where are they defined?
For more information about tags, this resource is good.
General productivity:
find
We now cover a very powerful Unix tool: find
.
At a high level, find by default recurses through directories, "looking" at files, to see which files match a provided predicate. Here is the pattern for its use:
$ find [flags] [path...] [expression]
Now cd
to the root directory of the linux kernel that
you downloaded earlier. Type this command:
$ find .
.
./kernel
./kernel/panic.c
./kernel/Kconfig.locks
./kernel/params.c
./kernel/.gitignore
./kernel/memremap.c
./kernel/freezer.c
./kernel/sysctl_binary.c
.....
This recursively prints all files in the current directory (it will print the names of tens of thousands of files).
You can also search on file name patterns:
$ find . -name "sched.h"
./kernel/sched/sched.h
./include/uapi/linux/sched.h
./include/linux/sched.h
./include/linux/sunrpc/sched.h
./include/asm-generic/bitops/sched.h
./include/xen/interface/sched.h
./include/trace/events/sched.h
And you can limit the search to particular directories:
$ find include/linux include/asm-generic -name "sched.h"
include/linux/sched.h
include/linux/sunrpc/sched.h
include/asm-generic/bitops/sched.h
As with most Unix commands, wildcards can be used:
$ find . -name "sched.*"
./kernel/sched/sched.h
./include/uapi/linux/sched.h
./include/linux/sched.h
./include/linux/sunrpc/sched.h
./include/asm-generic/bitops/sched.h
./include/xen/interface/sched.h
./include/trace/events/sched.h
This returns all files named "sched" with any extension.
Where find really starts to be powerful is that you can execute commands on the files that are returned. For example:
$ find . -name "sched.h" -exec grep foobar {} \;
This says, "return all file names that match sched.h and grep each of them for the string foobar."
Exercise 11. How many header files are there in the linux source code?
Hint: The unix program wc
can be used to count words or
lines from a file or standard input. How can we get the output from
find
to wc
?
More information on find
can be found here.
We will cover other tools (such as grep
) as the semester
goes on.
Handin Procedure
Handing in consists of three steps:
- Executing this checklist:
- Fill out the top of the
answers.txt
file, including your name and NYU Id - Make sure you’ve answered every question in
answers.txt
- Create a file called
slack.txt
noting how many slack days you have used for this assignment. (This is to help us agree on the number that you have used.) Include this file even if you didn’t use any slack days.
- Fill out the top of the
- Push your code to GitHub, so we have it:
Execute these instructions either from your local machine or within the Docker environment:
```
$ cd cs202-labs/lab1
$ git commit -am "hand in lab1"
$ git push origin
Counting objects: ...
....
To ssh://github.com/nyu-cs202/labs-23sp-<username>.git
7337116..ceed758 main -> main
```
Actually submit, by timestamping and identifying your pushed code:
- Decide which git commit you want us to grade, and copy its id (you
will paste it in the next sub-step). A commit id
is a 40-character hexadecimal string. Usually the commit id that you
want will be the one that you created last. The easiest way to obtain
the commit id for the last commit is by running the command
git log -1 --format=oneline
. This prints both the commit id and the initial line of the commit message. If you want to submit a previous commit, there are multiple ways to get the commit id for an earlier commit. One way is to look at the commit history on GitHub. Another isgit log -p
, as explained here, orgit show
. - Now go to NYU Brightspace; there will be an entry for this lab. Paste only the commit id that you just copied.
- You can submit as many times as you want; we will grade the last commit id submitted to Brightspace.
- Decide which git commit you want us to grade, and copy its id (you
will paste it in the next sub-step). A commit id
is a 40-character hexadecimal string. Usually the commit id that you
want will be the one that you created last. The easiest way to obtain
the commit id for the last commit is by running the command
NOTE: Ground truth is what and when you
submitted to Brightspace. Thus, a non-existent commit id in Brightspace
means that you have not submitted the lab, regardless
of what you have pushed to GitHub. And, the time of your submission for
the purposes of tracking lateness is the time when you upload the id to
Brightspace, not the time when you executed
git commit
.
This completes the lab.