CS202: Lab 1: C review, gdb, software skills

This lab will reinforce your prior experience in the C programming language and give you practice using certain command-line tools. The overall comfort you’ll gain (or reinforce) with the Unix computing environment will be helpful for future labs.

Important: The use of any AI assistants or AI-powered tools is strictly prohibited for this lab assignment. This includes, but is not limited to, chatbots (e.g., ChatGPT, Claude), code completion tools (e.g., GitHub Copilot), and any other AI-based helpers for coding, debugging, or answering questions. Any use of such tools will be considered a violation of academic integrity.

Getting started

You will perform the labs using the CS202 Docker environment, and pull/push code with Git and GitHub. It's time for you to set up that infrastructure:

Before you go further. Go to the setup page, and follow the instructions there. Specifically:

  1. Install Docker (after installing WSL, if on Windows)
  2. Get a GitHub account if you don’t have one already
  3. Set up your Git repository (set up SSH keys including ssh-agent, clone on GitHub using GitHub Classroom, clone to your local machine, set upstream)
  4. Read through the Git FAQs
  5. Build and run the CS202 Docker image

Come back here when you’ve done these.

Workflow. We have configured Docker so that the directory from which you run ./cs202-run-docker on your host machine (usually this is cs202) shows up in Docker as cs202-labs. This means that you can use your preferred IDE or editor on your host machine to edit code while compiling, running, and testing inside Docker. Depending on your setup, it may be helpful for you to have at least two windows up (one for editing and one with Docker).

Programming style. In this and subsequent labs, you will be graded for style. We will deduct up to 20% of total lab points for poor style based on our subjective evaluation. Please read this style guide.

Section 0: Expectations

The exercises in this lab are elementary. Therefore, this lab will be graded under homework rules (non-strictly, and with a very low weighting in your grade). That does not mean the lab is skippable; read on.

Our quizzes, exams, and future labs will assume that you have acquired the skills in this lab. We put teeth behind that encouragement. Specifically, we will look unfavorably on the following:

Section 1: C review

All of the projects in this class will be done in C or C++, for two main reasons. First, some of the things that we want to implement require direct manipulation of hardware state and memory, which are operations that are naturally expressed in C. Second, C and C++ are widely used languages, so learning them is a useful thing to do in its own right. You have some experience in C from CS201, and you will need to build on that here.

If you are interested in why C looks like it does, we encourage you to look at Ritchie's history. Here, perhaps, is the key quotation: "Despite some aspects mysterious to the beginner and occasionally even to the adept, C remains a simple and small language, translatable with simple and small compilers. Its types and operations are well-grounded in those provided by real machines, and for people used to how computers work, learning the idioms for generating time- and space-efficient programs is not difficult. At the same time the language is sufficiently abstracted from machine details that program portability can be achieved."

You will do short exercises in C, as a warmup, and to help review some of what you learned in CS201 about C. Enter the Docker environment (./cs202-run-docker, per the setup page); you’ll know you’re in the Docker environment if you see a prompt like this:

cs202-user@af8b1be95427:~/cs202-labs$ 

then type cd lab1/mini after the prompt:

cs202-user@af8b1be95427:~/cs202-labs$ cd lab1/mini
cs202-user@af8b1be95427:~/cs202-labs/lab1/mini$

Shell prompt. From here forward, we will not include the cs202-user@... part of the shell prompt in this description, but it should be present on your machine. If it is not, then you are not in the Docker environment.

(Depending on your CS201 section, you may have seen very similar, or the same, exercises; that’s fine. If you’re not absolutely fluent in C programming, then doing them again will be useful.)

Exercise 1. Implement the functions in part1.c

For Exercises 1 through 3, you can test with:

$ make
$ build/part1

You should have seen make in CSO. Recall that it is an automatic way to build software. In this case, it invokes the compiler and links your compiled code to a small test harness that we provide.

$ ./build/part1
part1: set_to_fifteen OK
part1: array_sum OK

If your part1.c is correctly implemented, you will see OK, as above.

You will need to remove the assert(0); line from each function that you implement, here and in later exercises; it is a reminder that you have yet to implement that particular function.

As you write your code and improve it (fixing bugs, adding functionality), you should get in the habit of syncing your changes to the master copy of your labs repository on GitHub, for example:

$ git commit -am 'my solution for lab1 exercise1'
Created commit 60d2135: my solution for lab1 exercise1
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git push origin

The commit keeps the history of changes to your code, and so allows you to revert to an older version if you find that a change causes a regression. The push serves to back up your code on GitHub’s servers, so you won’t lose work if your local working copy is corrupted or lost.

Note: If git push origin does not work from within Docker and you are on a Mac, then make sure your ssh-agent is running. See here. If you are not on a Mac, you can either push from outside Docker using your host’s git or you can use GitHub’s Personal Access Tokens.

Exercise 2. Implement the functions set_point and point_dist in part2.c.

As above, you compile and test with:

$ make

Now you should see

$ ./build/part2
part2: set_point OK
part2: point_dist OK

Note that you can keep track of your changes by using the git diff command. Running git diff will display the changes to your code since your last commit, and git diff origin/main will display the changes relative to the initial code supplied for this lab. Here, origin/main is the name of the git branch with the initial code you downloaded from our server for this assignment.

Exercise 3. Implement the linked_list utility functions in part3.c.

Test it:

$ make
$ ./build/part3
part3: list_insert OK
part3: list_end OK
part3: list_size OK
part3: list_find OK
part3: list_remove OK

Debugging note: if you are having trouble with this piece, you may wish to skip to the section covering gdb, below, and come back here. You would do:

$ gdb build/part3 core
(gdb) bt

(Make sure you’ve issued the ulimit command below.)

Another debugging note: the assert lines we’ve been seeing are a simple application of a powerful tool. You may wish to use asserts yourself, to help debug your linked list functions.

In C, an assert is a preprocessor macro which effectively enforces a contract in the program. (You can read more about macros in C here.) The contract that assert enforces is simple: when program execution reaches assert(<condition>), if condition is true, execution continues; otherwise, the program aborts with an error message.

Assertions, when used properly, are powerful because they allow the programmer who uses them to guarantee that certain assumptions about the code hold. For example, you will see assertions like:

assert(head != NULL);

This assertion enforces the contract that the parameter head cannot be NULL. If these assertions were not present and we tried to dereference head, for example with *head, we would encounter a type of error called a segmentation violation (or segmentation fault). This is because dereferencing the NULL address is invalid; NULL points to "nothing". Later in the lab, we will get some experience with segmentation faults.

But by using assertions, we guarantee that list_end (for example) will never try to dereference a head variable at NULL, saving us the headache of having to debug a segmentation fault if some code tried to pass us a NULL value. Instead, we will get a handy error message describing exactly what contract of the function was invalidated.

Use your own asserts to make debugging easier!

Advice on debugging common problems encountered in doing this lab

C programs from scratch

In the rest of Section 1 of the lab, you will get practice writing and compiling C programs from scratch. Knowing how to create a standalone C program is useful in its own right, and it will be helpful context for lab2.

We will quickly walk through how to write and compile a program in C, and then you will write and compile several programs of your own.

Using a text editor, create a file called fun.c. In this file, type:

#include <stdio.h>

int main(int argc, char** argv)
{
    char* first_arg; 
    char* second_arg; 

    /* this checks the number of arguments passed in */
    if (argc != 3) {
        printf("usage: %s <arg1> <arg2>\n", argv[0]);
        return 0;
    }

    first_arg = argv[1];
    second_arg = argv[2];

    printf("My program was given two arguments: %s %s\n",
           first_arg, second_arg);

    return 0;

}

This is a complete C program. The #include at the top tells the compiler to use the header file for "standard I/O" (the standard input/output functions of the C library; these functions include printf). Also, argc contains the number of arguments passed to the program (including the program name itself) while argv[0] contains the name of the program that was invoked.

You can compile this program using:

$ gcc -pedantic -Wall -std=c11 -g -o fun fun.c

You can now run this program using:

$ ./fun

You should see:

usage: ./fun <arg1> <arg2>

You can also do:

$ ./fun abc def
My program was given two arguments: abc def

Note the pattern here:

Exercise 4. Write, from scratch, a C program that takes two arguments and prints the first three characters of each string. If either of the two arguments has fewer than three characters, print an error message, and end gracefully (as opposed to core dumping).

For example:

$ ./first3 a b
./first3: error: one or more arguments have fewer than 3 characters

$ ./first3 abcd wxy
abcwxy

Instructions:

  • You may use the function strlen(). If you do, include #include <string.h> at the beginning of the program.

  • Put all your files in ~/cs202/lab1/fromscratch/first3.

  • Your executable file must be named first3.

  • You must supply a Makefile. When we type make, it needs to create a binary executable named first3.

  • Be diligent about creating test cases for yourself. You will lose points for not handling corner cases. You may wish to create a script that runs and tests your code on various cases. (You do not have to hand in your testing code, however; we will run our own testing scripts against your code.)

  • Pay attention to coding style.

  • Remember to add relevant source files and the Makefile to your git repo. Type git status, and if any of the files look relevant, add and commit them:

      $ git status
      $ git add *.c *.h Makefile
      $ git commit -am "first3"
      $ git push origin

Exercise 5. Write, from scratch, a C program that takes a single argument and prints the number of ASCII ‘a’ characters in the string. Don’t use the function strlen(); programs using strlen() will get 0 points. The purpose of this exercise is in part to give you familiarity with so-called “null-terminated strings” in C. A null-terminated string is a pointer to an array of characters (char *), in which the end of the string is indicated by the character ‘\0’, which has value 0. (This character is called NUL; it is not the same thing as the NULL pointer, although both have value 0.)

For example:

$ ./countas abbba
2

$ ./countas aaaaa
5

$ ./countas bcdefghwxyz
0

$ ./countas
usage: countas <arg>

$ ./countas aaaaa bbbbbb
usage: countas <arg>

Instructions:

  • Again, don’t use strlen().
  • Put all your files in ~/cs202/lab1/fromscratch/countas.
  • Your executable file must be named countas.
  • If countas receives any number of arguments other than 1, it should print a help/usage message (see above).
  • You must supply a Makefile. When we type make, it needs to create a binary executable named countas.
  • Be diligent about creating test cases for yourself. You will lose points for not handling corner cases. You may wish to create a script that runs and tests your code on various cases. (You do not have to hand in your testing code, however; we will run our own testing scripts against your code.)
  • Pay attention to coding style.
  • As above, remember to add relevant files (source file, Makefile) to your git repo: git status ; git add [...] ; git commit -am "countas" ; git push origin.

Section 2: Debugging

This part of the lab will give you practice debugging.

Put your answers for this section and the next in the supplied answers.txt file.

Go to the dbg directory in lab1:

$ cd ~/cs202-labs/lab1/dbg

Now type:

$ make

The code has a syntax error; thus, it cannot be compiled.

Exercise 6. Fix the syntax error.

Use the compiler's error message to determine what's wrong. Note that in the vi text editor (discussed below) you can navigate to a given line of code by typing :<number>. You can also launch vi with its cursor in place:

$ vi +<number> <filename>

to begin directly on a given line number. For example, vi +5 foo.c begins with the cursor at line 5.

After you fix the syntax error, the code will compile. Use make to see this:

$ make

Now, try testing the code itself, using the small test_linked_list utility:

$ ./test_linked_list
Segmentation fault (core dumped)

Aha! Our code compiled, but it was not correct (core dumps are bad). Specifically, the segmentation fault means that our program issued an illegal memory reference, and the operating system ended our process. Making matters worse, we have no idea what the problem in the code is. In the following section, you will learn how to use gdb to debug this kind of problem.

Run gdb: Use the GNU debugger, or gdb, to run the program:

$ gdb test_linked_list
(gdb)

Set breakpoints: One thing that you might want to do is to set a breakpoint before the program begins executing. Breakpoints are a way of telling gdb that you want it to execute your program and then stop, or break, at a place that you define. Use the following command to set a breakpoint at the main function:

(gdb) b main
Breakpoint 1 at 0x400963: file test_linked_list.c, line 43.

Then use gdb's command run to actually start the program (this is the general pattern in gdb: one invokes the debugger, perhaps sets a breakpoint, and then starts the program with run):

(gdb) run

The program will be stopped when it reaches the breakpoint (advanced topic: how does gdb conspire with the hardware to make this work??). At this point, you will be presented with gdb's command prompt again. To see the “call stack” (or stack trace), which is the list of functions that have called this one – literally, the stack frames on top of the current one – you issue backtrace or bt for short:

(gdb) bt

Experienced developers will often ask for a stack trace as step 0 or 1 of understanding a code problem. Get in the habit of asking gdb to give you a backtrace.

To make the program continue running after a breakpoint, use continue, or c for short:

(gdb) c

Step through the code: Of course, if you just c every time you hit a breakpoint, then you will lose control of the program. You often want the command next, or n:

(gdb) n

This executes the next line of code; if that line contains a function call, the entire function runs before gdb stops again. (The command step also executes a single line of C code. There is little difference between step and next unless the line is about to call a function: step steps into the function, while next "steps over" it.)

Inspect the values of variables: In gdb's command prompt, the program is stalled. You can query the program's current global and local variables with the print command, or p for short.

Run gdb on test_linked_list. Set a breakpoint at the function list_delete.

At this breakpoint, determine the value of the integer id:

(gdb) print id
$1 = 1

This means that variable id holds the integer 1.

Aside: you can check local variables' names using:

(gdb) info local

Core dump: If a program terminated abnormally (for example, test_linked_list), the state of the program will be recorded by the OS and (if core dumps are enabled) saved in a so-called core dump. gdb can use a core dump to inspect a crash situation.

To debug using core dumps, you must first enable core dumps, and then point gdb at the relevant file. We'll do this in several steps:

# enable core dumps
$ ulimit -c unlimited 
$ ./test_linked_list
$ ls -l core
$ gdb ./test_linked_list core

The idea here is that the core file gives gdb enough information to recover the memory and CPU state of the program at the moment of the crash. This will allow you to determine which instructions experienced the error.

No core file? If you are on Windows, and you do not see the core file produced in your current directory, then run the following from within WSL (not Docker):

$ sudo sysctl -w kernel.core_pattern=/tmp/core

and then look for a file with a name like coreXXXX in the /tmp directory within Docker (the XXXX stands for a number).

Exercise 7. Fix the bug in the code. Recompile and rerun to make sure it is fixed. State the line of code and the fix in answers.txt.

Section 3: Software development skills and general productivity

This part of the lab will walk through general software development skills as well as some tools to enhance your productivity. You may feel at first that these tools are slowing you down ("how do I enter characters in vi???"), but comfort and fluency with them will pay off in the long term. It's worth slowing down and "learning proper technique."

A good text editor is essential

If you are not doing so already, begin using a powerful text editor. Traditionally on Unix systems, the most popular editors were vi (which in most environments these days is an alias for a more powerful version, called vim) or emacs, both of which are installed in the Docker environment. To get started with vi, work through this vi tutorial or see the editors unit in The Missing Semester of your CS education. Of course, many of you will be using VSCode, an IDE that builds in many modern tools for software development.

General productivity: find

We will use the Linux code base as a running example.

Create a new directory inside the Docker environment, and then change into it:

$ mkdir ~/cs202-labs/learn-find
# (below, !! refers to the last shell command while $ refers to the last argument)
$ cd !!:$

Now download and unpack the Linux source code archive. We are interested in the kernel source and header files:

$ wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.2.9.tar.xz
# the line above could take a few minutes
$ tar -xf linux-5.2.9.tar.xz linux-5.2.9/include linux-5.2.9/kernel
$ cd linux-5.2.9
$ ls 
include/ kernel/

Now, you are going to use a very powerful Unix tool: find.

At a high level, find by default recurses through directories, "looking" at files, to see which files match a provided predicate. Here is the pattern for its use:

$ find [flags] [path...] [expression]

Now cd to the root directory of the linux kernel that you downloaded earlier. Type this command:

$ find .
.
./kernel
./kernel/panic.c
./kernel/Kconfig.locks
./kernel/params.c
./kernel/.gitignore
./kernel/memremap.c
./kernel/freezer.c
./kernel/sysctl_binary.c
.....

This recursively prints all files in the current directory (it will print the names of tens of thousands of files).

You can also search on file name patterns:

$ find . -name "sched.h"
./kernel/sched/sched.h
./include/uapi/linux/sched.h
./include/linux/sched.h
./include/linux/sunrpc/sched.h
./include/asm-generic/bitops/sched.h
./include/xen/interface/sched.h
./include/trace/events/sched.h

And you can limit the search to particular directories:

$ find include/linux include/asm-generic -name "sched.h"
include/linux/sched.h
include/linux/sunrpc/sched.h
include/asm-generic/bitops/sched.h

As with most Unix commands, wildcards can be used:

$ find . -name "sched.*"
./kernel/sched/sched.h
./include/uapi/linux/sched.h
./include/linux/sched.h
./include/linux/sunrpc/sched.h
./include/asm-generic/bitops/sched.h
./include/xen/interface/sched.h
./include/trace/events/sched.h

This returns all files named "sched" with any extension.

Where find really starts to be powerful is that you can execute commands on the files that are returned. For example:

$ find . -name "sched.h" -exec grep foobar {} \;

This says, "return all file names that match sched.h and grep each of them for the string foobar."

Exercise 8. How many header files are there in the Linux source code that you unpacked?

Hint: The unix program wc can be used to count words or lines from a file or standard input. How can we get the output from find to wc?

More information on find can be found here.

We will cover other tools (such as grep) as the semester goes on.

Handin Procedure

Handing in consists of three steps:

  1. Executing this checklist:
    • Fill out the top of the answers.txt file, including your name and NYU Id
    • Make sure you’ve answered every question in answers.txt
    • Create a file called slack.txt noting how many slack days you have used for this assignment. (This is to help us agree on the number that you have used.) Include this file even if you didn’t use any slack days.
  2. Push your code to GitHub, so we have it:

Execute these instructions either from your local machine or within the Docker environment:

$ cd cs202-labs/lab1     
$ git commit -am "hand in lab1"
$ git push origin 

Counting objects: ...
....
To ssh://github.com/nyu-cs202/labs-24fa-<username>.git
  7337116..ceed758  main -> main

If git push origin does not work from within Docker and you are on a Mac, then make sure your ssh-agent is running. See here. If you are not on a Mac, you can either push from outside Docker using your host’s git or you can use GitHub’s Personal Access Tokens.

  3. Actually submit, by timestamping and identifying your pushed code:

    • Decide which git commit you want us to grade, and copy its id (you will paste it in the next sub-step). A commit id is a 40-character hexadecimal string. Usually the commit id that you want will be the one that you created last. The easiest way to obtain the commit id for the last commit is by running the command git log -1 --format=oneline. This prints both the commit id and the initial line of the commit message. If you want to submit a previous commit, there are multiple ways to get the commit id for an earlier commit. One way is to look at the commit history on GitHub. Another is git log -p, as explained here, or git show.
    • Create a new file commit.txt locally with only the commit id you just copied.
    • Now go to Gradescope and select the “Lab 1” assignment; then choose Upload as the submission method. Submit only commit.txt.
    • You can submit multiple times before the deadline. Your last submission will determine your grade.

NOTE: Ground truth is what and when you submitted to Gradescope. And, the time of your submission for the purposes of tracking lateness is the time when you submit the assignment through Gradescope, not the time when you executed git commit.

After you submit your assignment, we will process it as follows:

We’ll check out your GitHub repository using the commit ID you provided in the commit.txt file and upload this code to your Gradescope assignment. As a result, you may notice that your latest submission on Gradescope is marked as “late,” even if you submitted on time. Don’t worry about this discrepancy; it’s an expected part of our process. What matters is that you submit your commit.txt file on time. To verify your actual submission time, always refer to the latest submission of commit.txt in the “Submission History” on Gradescope. This is the timestamp we use to determine if your assignment was submitted by the deadline. We use this method to enable direct annotation of your code on Gradescope, allowing us to provide more detailed feedback.

Remember, as long as your commit.txt is submitted before the deadline with the correct commit ID, your assignment will be considered on time, regardless of when the code appears on Gradescope.

This completes the lab.