CS439 Spring 2013 Lab 1: Pointers in C

Handed out Monday, January 14, 2013
Due 9:00 AM, Tuesday, January 22, 2013

Introduction

This lab consists of five small exercises. Each exercise is intended to expose a different use of pointers in the C programming language. There is also an exercise at the end of the lab to introduce the git revision control system.

A strong grasp of pointer manipulation is required for all future labs in this course. This lab should help you become more comfortable with these manipulations if you are not already experienced with them.

Software Setup

The files you will need for this and subsequent lab assignments in this course are distributed using git. To learn more about git, take a look at the git user's manual, or, if you are already familiar with other version control systems, you may find this CS-oriented overview of git useful.

The URL for the course git repository is http://www.cs.utexas.edu/~mwalfish/classes/s13-cs439/cs439-labs.git. To install the files in your CS account, you need to clone the course repository, by running the commands below. You must use a CS public linux host.

tig% mkdir ~/cs439
tig% cd ~/cs439
tig% chmod 0700 . # (sets appropriate permissions)
tig% git clone http://www.cs.utexas.edu/~mwalfish/classes/s13-cs439/cs439-labs.git labs
Initialized empty git repository in ......./cs439/labs/.git/
got f6ec6e08634de9b9c4d73ab5af92da16cc610f44
walk f6ec6e08634de9b9c4d73ab5af92da16cc610f44
got a8d9dd484df67d928a51127ce4c6d9f6d01c5a6a
...
got c9dab101498914dbdce377b89a6eb0f6a421d018
Checking out files: 100% (44/44), done.
tig% cd labs
tig%

git allows you to keep track of the changes you make to the code. For example, if you are finished with one of the exercises, and want to checkpoint your progress, you can commit your changes by running:

tig% git commit -am 'my solution for lab1 exercise3'
Created commit 60d2135: my solution for lab1 exercise3
 1 files changed, 1 insertions(+), 0 deletions(-)
tig%

You can keep track of your changes by using the git diff command. Running git diff will display the changes to your code since your last commit, and git diff origin/lab1 will display the changes relative to the initial code supplied for this lab. Here, origin/lab1 is the name of the git branch with the initial code you downloaded from our server for this assignment.

Hand-In Procedure

When you are ready to hand in your lab code and write-up, run make turnin in the lab directory. This will first do a make clean to clean out any .o files and executables, and then create a tar file called lab1-handin.tar.gz with the entire contents of your lab directory and submit it via the CS turnin utility. If you submit multiple times, we will take the latest submission and count lateness accordingly.

Please note! At the moment, the turnin utility only works on a subset of the CS Linux machines. Before you can turn in your work, you will need to ssh to one of the following machines (these can be listed by running cshosts pub32):

charity.cs.utexas.edu
chastity.cs.utexas.edu
diligence.cs.utexas.edu
humility.cs.utexas.edu
kindness.cs.utexas.edu
patience.cs.utexas.edu
temperance.cs.utexas.edu

Once you are logged in to one of these machines, you should be able to run make turnin without error.

We will be grading your solutions with a grading program. You can run make grade to test your solutions with the grading program.

We have provided the compiled binary object files from our testing framework source code in the static directory in this lab. These allow the grading script to verify the correctness of your solutions. Since these object files were compiled on the CS Linux machines, you will most likely have to do all of your development on a CS machine for this lab. In most other labs, you will be free to use whatever machine you wish for development, as long as the code you turn in works on the CS Linux machines.

Background on C

All of the projects in this class will be done in C or C++, for two main reasons. First, some of the things that we want to implement require direct manipulation of hardware state and memory, which are operations that are naturally expressed in C. Second, C and C++ are widely used languages, so learning them is a useful thing to do in its own right. While you have some experience in C from CS429, you will need greater comfort and familiarity here than was required there. In our class, you will truly need to "think in C."

If you are interested in why C looks like it does, we encourage you to look at Ritchie's history. Here, perhaps, is the key quotation: "Despite some aspects mysterious to the beginner and occasionally even to the adept, C remains a simple and small language, translatable with simple and small compilers. Its types and operations are well-grounded in those provided by real machines, and for people used to how computers work, learning the idioms for generating time- and space-efficient programs is not difficult. At the same time the language is sufficiently abstracted from machine details that program portability can be achieved."

Readings

There are two required readings. The first introduces you to C syntax; the second, to the key programming construct of pointers:

C for Java Programmers (Columbia) (here's a local copy as a PDF). Please ignore slides 18-19.
The following tutorial explains pointers in C: A Tutorial on Pointers and Arrays in C (Ted Jensen)

Note: comfort with pointers is important for this lab and essential for this class.

Here are some other useful resources. (You are not required to read them. We include them for convenience if you want to learn more.) You can find more with some simple Web searches.

C for Java Programmers (Cornell)
Kernighan and Ritchie (the definitive book)
Brian W. Kernighan -- Programming in C: A Tutorial
C Programming intro and tutorial

Overview

This overview assumes that you are comfortable with Java.

Java borrows heavily from C syntax, and it is relatively easy to learn one language if you know the other well. At a high level, there are three major differences:

Safe v. unsafe, managed v. unmanaged memory
Java is a safe language with garbage collection. C is an unsafe language with explicit memory management. By unsafe, we mean that programs can manipulate pointers in arbitrary ways, e.g.,
```
/* Dangerous pointer manipulation will compile in C. Don't do this! */
int dangerous = 0x102481a0;
int *pointer = (int *)dangerous; /* We can set a pointer to anything. Yikes! */
pointer = pointer + 39468;       /* We can do arbitrary math with pointers. Yikes! */
int value = *pointer;            /* Programs can read/write any address. Yikes! */
```
In the second line above, we cast the int dangerous into a new type, a pointer to an int (int *). This should make you nervous. We are telling the compiler to ignore the type information it has and to trust us. This type of thing can lead to ugly bugs that are difficult to find (especially since the programmer here has deliberately deactivated the compiler's type safety checks).
or,
```
/* A horse is a horse of course of course unless the horse... */
Horse h;
Cow *c = (Cow *)&h; /* Set the pointer c to the address of h */
```
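By contrast, pointer arithmetic is well defined as long as it stays inside a single array. The following is a minimal sketch of that idea; the helper element_at is hypothetical and is not part of the lab code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the lab code): returns a pointer to
 * element i of arr, or NULL if i is outside the array of length len. */
int *element_at(int *arr, size_t len, size_t i)
{
        assert(arr != NULL);

        if (i >= len)
                return NULL;    /* refuse to step outside the array */
        return arr + i;         /* same address as &arr[i] */
}
```

Here arr + i advances the pointer by i elements (not i bytes), and the bounds check keeps the result inside memory that the caller actually owns.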
In Java, you must allocate objects on the heap manually (using the new operator), but the garbage collector takes care of freeing these objects for you. In C, you must manually allocate and free objects on the heap yourself:
```
#include <assert.h>
#include <stdlib.h>
...
Cow *cow = (Cow *)malloc(sizeof(Cow));
cow_init(cow);
moo(cow);
...
free(cow);
```
In the above snippet, we called malloc, which is declared in the standard header stdlib.h, to allocate memory from the heap, and we called free, declared in the same header, to release that memory when we were done with it. If you fail to call free for heap memory objects when you are finished with them, you will have a memory leak. Conversely, if you continue to use a pointer after freeing it, you will have a nasty bug:
```
#include <assert.h>
#include <stdlib.h>
...
Horse *horse = (Horse *)malloc(sizeof(Horse));
...
free(horse);
...
neigh(horse); /* Using memory after it is freed = many nights suffering in lab */
```
Many bad things can happen if you continue to access memory once it has been freed. Calls that use horse may do strange things since the memory location being pointed at might be reallocated as another object and changed. Calls that use other objects may fail or do strange things, since the horse manipulations will likely corrupt other data structures using the same memory. Calls to malloc or free may fail or do strange things since the calls using (freed) horse may corrupt the heap data structures being stored at those locations, etc. And to top it all off, these bugs will manifest at strange times and in different ways depending on details of how the library manages the heap, what calls to the library are made in what order by the program, etc.
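One common defensive habit is to pair every allocation with exactly one release and to clear the pointer once the memory is freed. The sketch below illustrates this under stated assumptions: the Animal type and the functions animal_create and animal_destroy are hypothetical names invented for illustration, not part of the lab code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type for illustration; not part of the lab code. */
typedef struct {
        int legs;
} Animal;

/* Allocate and initialize an Animal on the heap.
 * Returns NULL if malloc cannot satisfy the request. */
Animal *animal_create(int legs)
{
        Animal *a = malloc(sizeof(*a));

        if (a == NULL)
                return NULL;
        a->legs = legs;
        return a;
}

/* Free the Animal and set the caller's pointer to NULL, so that an
 * accidental later use goes through NULL instead of freed memory. */
void animal_destroy(Animal **ap)
{
        assert(ap != NULL);

        free(*ap);      /* free(NULL) is defined to do nothing */
        *ap = NULL;
}
```

A caller would write Animal *a = animal_create(4); and later animal_destroy(&a);. Note that this clears only the one pointer passed in; any other copies of the address are still dangling, so the habit reduces use-after-free bugs but does not eliminate them.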
Modern v. old
Some details differ between the languages because programming has changed over the years. Java has better built-in support for object oriented programming, exception handling, type safety (i.e., today, we know how to do some things safely that we used to have to do unsafely via casts), threads, etc. Java has more extensive standard libraries for other useful things (containers, window system access, etc.); similar libraries exist in C for most of these things, but they are less standardized.
Cruft
Though the overall syntax is similar, some details differ, e.g., how to include a library, how to compile a program, the definition of a string, what functions exist in what libraries, etc. These differences are not fundamentally important, but there are a number of little differences that together probably are the main source of a learning curve for someone proficient in one language to become similarly proficient in the other.
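As one illustration of such a detail, a string in C is not an object as in Java; it is an array of char whose end is marked by a terminating '\0' byte. The sketch below makes that concrete with a hypothetical helper, count_chars (not part of the lab code), which does by hand what the standard strlen function does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the lab code): counts the chars
 * that precede the terminating '\0', as the standard strlen does. */
size_t count_chars(const char *s)
{
        size_t n = 0;

        assert(s != NULL);

        while (s[n] != '\0')
                n++;
        return n;
}
```

Because the length is not stored anywhere, a char array that is missing its terminator will send a loop like this one past the end of the array.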

We have tried to keep the details of invoking the compiler and linker to a minimum in our labs. Thus, to build code in your cs439/labs directory, you should only ever have to run make. However, if you want to write your own programs in these directories, or when you eventually leave cs439, you will have to invoke the compiler and linker, perhaps through an automatic process, like make. Thus, in this lab, and the others, you should peruse the Makefiles and try to understand what exactly is going on and how your code is being built at a high level. Our references page includes a few pointers to documentation on the make program and Makefiles.

Exercises

Much of the documentation for the following exercises is in the comments in their associated source files. The brief descriptions here are intended only to point you to those files.

After each exercise, run make grade to make sure that you wrote the code for that exercise correctly.

Before doing exercises 1 and 2, be sure you have read slides 61 to 64 of the C notes from Columbia.

Exercise 1. Implement the function set_to_five in part1.c.

Exercise 2. Implement the function swap in part2.c.

Before your function can pass the grading script, you will need to remove the assert(0); line. Remove this wherever you see it as you implement functions in future exercises; it only serves as a reminder that you have yet to implement a particular function.

The assert line you saw in the previous exercise is a simple application of a powerful tool. In C, an assert is a preprocessor macro which effectively enforces a contract in the program. (You can read more about macros in C here.) The contract that assert enforces is simple: when program execution reaches assert(<condition>), if condition is true, execution continues; otherwise, the program aborts with an error message.

Assertions, when used properly, are powerful because they allow the programmer who uses them to guarantee that certain assumptions about his or her code hold. For example, in the swap function that you just implemented, there are two assertions at the beginning of the function, before where your code should have been placed:

void
swap(int *p1, int *p2)
{
        assert(p1 != NULL);
        assert(p2 != NULL);

        ...
}

These two assertions together enforce the contract that neither p1 nor p2 can be NULL. If the assertions were not present and either parameter were NULL, then dereferencing it in order to swap would typically cause a type of error called a segmentation violation (or segmentation fault). This is because dereferencing the NULL address is invalid; NULL points to "nothing". By using assertions, we guarantee that swap will never try to swap the value of a variable at NULL, saving us the headache of having to debug a segmentation fault if some code tried to pass swap a NULL value. Instead, we will get a handy error message describing exactly which contract of the function was violated.

Your code for exercise 2 could not pass before you removed the assert(0); line because 0 is the "false" value in C. Thus, the contract of this assert was "if false is true, then proceed; otherwise, abort". This condition obviously cannot ever hold, so the program would always abort when run.
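The same contract style applies to any function that takes pointers. The following is a minimal sketch using a hypothetical function, max_of, which is not part of the lab code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (not part of the lab code): returns the larger
 * of the two ints that a and b point to. */
int max_of(const int *a, const int *b)
{
        /* Contract: neither argument may be NULL. If a caller breaks
         * it, the program aborts here with a message naming the failed
         * condition, the file, and the line. */
        assert(a != NULL);
        assert(b != NULL);

        return (*a > *b) ? *a : *b;
}
```

With int x = 3, y = 7; the call max_of(&x, &y) returns 7, whereas max_of(&x, NULL) aborts at the second assertion. Keep in mind that assertions are compiled out if the macro NDEBUG is defined before assert.h is included, so they are a debugging aid rather than a substitute for handling errors.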

Before doing exercise 3, be sure you have read slides 75 to 80 of the C notes from Columbia.

Exercise 3. Implement the function array_sum in part3.c.

Before doing exercises 4 and 5, be sure you have read slides 83 to 88 of the C notes from Columbia.

Exercise 4. Implement the set_point and point_dist functions in part4.c.

Look in part4.h for the definition of struct point, which represents a point in a two-dimensional plane.

Exercise 5 is significantly longer than the other exercises in this lab. In this exercise, you will implement functions for manipulating singly-linked lists. In particular, you will implement the following functions, in this order (to appease the grading script):

list_insert
list_end
list_size
list_find
list_remove

More detailed descriptions of the functions, and what they should do, can be found in part5.c.

Exercise 5. Implement all the functions in part5.c.

Look in part5.h for the definition of struct list_node, which represents a node in the linked list.

If you haven't already, now would be a good time to run git commit as described in Software Setup above to make sure your solutions to the above exercises have been committed.

Using git

As mentioned at the beginning of this lab, CS439 uses git to distribute code for all programming assignments. git is a distributed (as opposed to centralized) version control system that, if used correctly, can be very useful as you go through the labs in this course.

There are a couple of questions in this part of the lab that you must answer. Place your write-up in a file called answers.txt (plain text) in the top level of your labs directory before handing in your work. Please include a header that contains your name, UTCS username, and lab number and make sure the file is named correctly. If you do not, your answer may not be graded. To make sure your answers.txt file is turned in with the rest of your lab solution, run git add answers.txt.

Exercise 6. If you are not familiar with version control systems in general or git in particular, get familiar with the basics. This brief git tutorial is a fairly good starting point, and should be read by everybody. This other tutorial is a good place to go if you need to find out how to do something in git. There is also this git cheatsheet that you can use if you would prefer to dive into git and need to know what commands are available. You can also read any other of the numerous git tutorials online. A list of some other git tutorials besides the ones listed here is in the reference page.

You may also wish to experiment with a graphical git viewer while reading through these tutorials to get a feel for how git works. gitk is a good one that comes pre-installed on all of the UTCS public Linux machines. Just type gitk while you are in a git repository.

We have prepared a couple of very brief exercises to get you used to using git. First, go to your cs439/labs directory. Now, type:

tig% git checkout -b git-lab/mergeA origin/git-lab/mergeA

This will create a new branch named git-lab/mergeA. This branch should have only a single file called merge.c. Type ls to verify. This is a simple program that prints three characters to the screen. (Note: You may see other files besides merge.c, if you created them but never checked them into your lab1 branch. These files are not tracked by git, so they persist across checkouts.) You may compile and run this program by typing:

tig% make merge   # To compile
tig% ./merge

Question:

What is the name of the initial commit to the tree in which this branch resides? (Hint: This is much easier if you use a git viewer like gitk.)

Now we will create another branch called git-lab/mergeB. Type:

tig% git checkout -b git-lab/mergeB origin/git-lab/mergeB

This branch also contains a single file called merge.c. Verify with ls. This program is very similar to the one you saw before. The only difference is that it prints out a different set of characters. Make sure that this is so.

One of the strengths of git is that it allows multiple users to work from the same repository independently from each other. Eventually, though, all of the work must be merged together into a final product (you will be doing this often as you progress through the labs). Usually, git will do this automatically for you. However, there are times when multiple users modify the same place in a file, in which case git cannot know whose work should be used (only a human can manually resolve a "conflict" of this kind). You will be doing such conflict resolution, but here and throughout the semester, you must be careful: a botched merge is a reliable source of headaches. The two branches that you have just created have been set up so that they will cause exactly such a conflict when merging. Type the following into a console:

tig% git branch
  git-lab/mergeA
* git-lab/mergeB
  lab1
tig% git merge git-lab/mergeA
Auto-merging merge.c
CONFLICT (content): Merge conflict in merge.c
Automatic merge failed; fix conflicts and then commit the results.

Exercise 7. Find out what the base version of merge.c does, then resolve the merge conflict in merge.c so that the merged file behaves the same as the base version. Don't forget to commit the merge when you are done. Hint: Look at the common parents of the two branches. gitk will be useful for this.

Make sure your merged merge.c compiles and runs correctly.

Question:

What is the name of your committed merge?

This completes the lab. Switch back to the lab1 branch (git checkout lab1) now. Make sure you are logged into one of the following CS Linux machines:

charity.cs.utexas.edu
chastity.cs.utexas.edu
diligence.cs.utexas.edu
humility.cs.utexas.edu
kindness.cs.utexas.edu
patience.cs.utexas.edu
temperance.cs.utexas.edu

Once you are logged in to one of these machines, cd to your labs directory, and type make turnin.

Acknowledgements

Portions of this lab and its instructions were taken from Mike Dahlin's C For Java Programmers lab.

Last updated: Tue Jan 22 00:27:54 -0600 2013