Computer Systems Organization


Start Lecture #12

Chapter 6: Structures

For a Java programmer, structures are basically classes and objects without methods.

Section 6.1: Structure Basics

#include <math.h>
struct point {
  double x;
  double y;
};
struct rectangle {
  struct point ll;
  struct point ur;
} rect1;
double f(struct point pt);
struct point mkPoint(double x, double y);
struct point midPoint(struct point pt1,
                      struct point pt2);
int main(void) {
  struct point pt1={40.,20.}, pt2;
  pt2 = pt1;
  rect1.ll = pt2;
  pt1.x += 1.0;
  pt1.y += 1.0;
  rect1.ur = pt1;
  rect1.ur.x += 2.;
  return 0;
}

double dist (struct point pt) {
  return sqrt(pt.x*pt.x+pt.y*pt.y);
}
struct point midPoint(struct point pt1,
                      struct point pt2){
  // return (pt1 + pt2) / 2; too bad
  struct point pt;
  pt.x = (pt1.x+pt2.x) / 2;
  pt.y = (pt1.y+pt2.y) / 2;
  return pt;
}
struct point mkPoint(double x, double y) {
  // return {x, y}; too bad, not C
  struct point pt;
  pt.x = x;
  pt.y = y;
  return pt;
}
void mvToOrigin(struct rectangle *r){
  (*r).ur.x = (*r).ur.x - (*r).ll.x;
  r->ur.y = r->ur.y - r->ll.y;
  r->ll.y = 0;
  r->ll.x = 0;
}

Above we see some simple structure declarations. They should be very familiar from your experience with Java.

  1. The top definition defines the struct point type. This is similar to defining a class without methods.
  2. The next definition defines both a new type struct rectangle and a variable of this type. Note that a previously defined struct can be used.
  3. The third definition illustrates an initialization. Note that there are no structure constants so you cannot write
            pt1 = {40., 20.};

    as an executable statement.
  4. We see in the executable statements that one can assign a point to a point as well as assigning to each component.
  5. Since the rectangle rect1 is composed of points, which are in turn composed of doubles, we can assign a point to a point component of a rectangle and can assign a double to a double component of a point component of a rectangle.
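
The points in the list above can be checked with a small sketch. It repeats the assignments from main inside a hypothetical helper, copyDemo (not part of the original program), and returns the resulting rectangle. The last assignment uses a compound literal, which assumes a C99 or later compiler; a bare brace list is still not allowed there.

```c
struct point {
  double x;
  double y;
};
struct rectangle {
  struct point ll;   /* lower left  */
  struct point ur;   /* upper right */
};

/* Hypothetical helper: performs the same assignments as main above
   and returns the rectangle by value. */
struct rectangle copyDemo(void) {
  struct rectangle r;
  struct point pt1 = {40., 20.}, pt2;  /* braces are fine in an initialization */
  pt2 = pt1;            /* structure assignment copies both components */
  r.ll = pt2;           /* a point assigned to a point component */
  pt1.x += 1.0;         /* changes pt1 only; pt2 and r.ll still hold 40. */
  pt1.y += 1.0;
  r.ur = pt1;
  r.ur.x += 2.;         /* a double component of a point component */
  /* pt2 = {0., 0.};    would not compile: a brace list is not an expression */
  pt2 = (struct point){0., 0.};  /* compound literal, assumes C99 or later */
  return r;             /* r.ll was copied earlier, so it is unaffected */
}
```

Calling copyDemo() yields a rectangle whose ll is (40, 20) and whose ur is (43, 21), which shows that each assignment copied values rather than sharing storage.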

I think that, if a Java program had equivalent classes rectangle and point and objects pt1, pt2, and rect1, these same executable statements would be legal.

6.2: Structures and Functions

Functions can take structures as parameters, but is that a good idea? Should we instead use the components as parameters or perhaps pass a pointer to the structure? For example, if the function main above wishes to pass pt1 to another function f, which of the following should we write?

  1. f(pt1)
  2. f(pt1.x, pt1.y)
  3. f(&pt1)
Naturally, the declaration of f will be different for the three cases.

  1. f(pt1)
    This form is the most natural for a function that computes a value where the parameter is thought of as a point, not just as two real numbers. For example, dist(pt) computes the distance from the origin.
  2. f(pt1.x, pt1.y)
    Consider two possible applications.
  3. f(&pt1)
    There are at least two common applications.
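
As one possible illustration of the three calling styles (not necessarily the applications the notes have in mind), here is a minimal sketch. The function names distSq, distSqXY, and scale are hypothetical. The first two return the squared distance from the origin so that the sketch needs no math library.

```c
struct point {
  double x;
  double y;
};

/* 1. Structure passed by value: the function works on its own copy. */
double distSq(struct point pt) {
  return pt.x*pt.x + pt.y*pt.y;
}

/* 2. Components passed separately: the function never sees a struct point. */
double distSqXY(double x, double y) {
  return x*x + y*y;
}

/* 3. Pointer passed: the function can modify the caller's structure. */
void scale(struct point *pt, double factor) {
  pt->x *= factor;      /* pt->x is shorthand for (*pt).x */
  (*pt).y *= factor;    /* the same access written the long way */
}
```

With a point p, the calls would be distSq(p), distSqXY(p.x, p.y), and scale(&p, 2.). Only the third call can change p.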

Homework: Write two versions of mkRectangle, one that accepts two points, and one that accepts 4 real numbers.

6.3: Arrays of Structures (and Structures of Arrays)

#define MAXVAL 10000
#define ARRAYBOUND (MAXVAL+1)
int G[ARRAYBOUND];
int P[ARRAYBOUND];

struct gameValType {
  int G[ARRAYBOUND];
  int P[ARRAYBOUND];
} gameVal;

struct gameValType {
  int G;
  int P;
} gameVal[ARRAYBOUND];

#define NUMEMPLOYEES 250
struct employeeType {
  int id;
  char gender;
  double salary;
} employee[NUMEMPLOYEES] = {
  { 32, 'M', 1234. },
  { 18, 'F', 1500. }
};

Consider the following game. Take a positive integer N. If it is even, replace it by N/2; if odd, by 3N+1; if 1, stop. So we get, for example,
7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1.

It is an open problem whether every positive integer eventually gets to 1. This has been checked for MANY numbers. Let G[i] be the number of rounds of the game needed to get from i to 1. G[1]=0, G[2]=1, G[7]=16.
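
A value of G can be computed by simply playing the game. The following is a minimal sketch; the function name rounds is hypothetical, and it assumes its argument is at least 1 and that the intermediate values fit in an unsigned long.

```c
/* Hypothetical helper: number of rounds needed to get from n to 1.
   Assumes n >= 1 and no overflow of unsigned long along the way. */
int rounds(unsigned long n) {
  int count = 0;
  while (n != 1) {
    if (n % 2 == 0)
      n = n / 2;        /* even: halve */
    else
      n = 3 * n + 1;    /* odd (and not 1): triple and add one */
    count++;
  }
  return count;
}
```

For instance, G[i] = rounds(i) could be used to fill the array G for i from 1 to MAXVAL.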

Factoring into primes is fun too. So let P[N] be the number of distinct prime factors of N. P[2]=1, P[16]=1, P[12]=2 (define P[1]=0).
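
One simple way to compute a value of P is trial division. This is only a sketch under that choice of method; the function name distinctPrimeFactors is hypothetical, and it assumes a positive argument.

```c
/* Hypothetical helper: number of distinct prime factors of n (n >= 1). */
int distinctPrimeFactors(int n) {
  int count = 0;
  for (int d = 2; d <= n / d; d++) {   /* d <= n/d means d*d <= n, without overflow */
    if (n % d == 0) {
      count++;                         /* d is a new prime factor */
      while (n % d == 0)
        n /= d;                        /* strip every copy of d */
    }
  }
  if (n > 1)
    count++;                           /* what remains is one more prime */
  return count;
}
```

Because every copy of a factor is removed as soon as it is found, any d that later divides n must itself be prime.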

This leads to two arrays, as shown above in the first frame.

We might want to group them together and not use up the variable names G and P, as in the second frame. This is a structure of arrays. In this frame the number of distinct prime factors of 763 would be stored in gameVal.P[763].

In the third frame we grouped together the values for the two games. This is an array of structures. In this frame the number of distinct prime factors of 763 would be stored in gameVal[763].P.
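
The two groupings differ only in the order of subscripting and member selection. In the sketch below both layouts live in one file, so the structure tags and variables are renamed; gameValSoA, gameValAoS, soa, aos, and store are hypothetical names, not the ones used in the frames.

```c
#define MAXVAL 10000
#define ARRAYBOUND (MAXVAL+1)

/* Second frame: one structure that holds two arrays. */
struct gameValSoA {
  int G[ARRAYBOUND];
  int P[ARRAYBOUND];
} soa;

/* Third frame: an array whose elements are structures. */
struct gameValAoS {
  int G;
  int P;
} aos[ARRAYBOUND];

/* Hypothetical helper: record the same two values for N in both layouts. */
void store(int N, int g, int p) {
  soa.G[N] = g;     /* select the member first, then subscript */
  soa.P[N] = p;
  aos[N].G = g;     /* subscript first, then select the member */
  aos[N].P = p;
}
```

After store(7, 16, 1), the value 16 can be read back as either soa.G[7] or aos[7].G.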

If we had a database with employeeID, gender, and salary, we might use the array of structures in the fourth frame. Note the initialization. The inner {} are not needed, but I believe they make the code clearer.

The sizeof and sizeof() Operators

How big is the employee array of structures? How big is employeeType?

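C provides two versions of the sizeof unary operator to answer these questions: sizeof applied to an expression (parentheses optional) and sizeof(type), where the parentheses are required.

These operators are not trivial and indeed the answers are system dependent ... for two reasons.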

  1. Certain primitive types (e.g., int) may have different sizes in different systems.
  2. The alignment requirements may be different.

Example: Assume char requires 1 byte, int requires 4, and double requires 8. Let us also assume that each type must be aligned on an address that is a multiple of its size and that a struct must be aligned on an address that is a multiple of 8.

So the data in struct employeeType requires 4+1+8=13 bytes. But three bytes of padding are needed between gender and salary so the size of the type is 16.
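
The layout on a particular system can be inspected with sizeof together with the standard offsetof macro from <stddef.h>. The sketch below is one way to do so; showLayout and paddingBytes are hypothetical helper names, and the numbers printed depend on the system.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>

struct employeeType {
  int id;
  char gender;
  double salary;
};

/* Hypothetical helper: print the offset of each member and the total size. */
void showLayout(void) {
  printf("id     at offset %zu\n", offsetof(struct employeeType, id));
  printf("gender at offset %zu\n", offsetof(struct employeeType, gender));
  printf("salary at offset %zu\n", offsetof(struct employeeType, salary));
  printf("sizeof(struct employeeType) = %zu\n", sizeof(struct employeeType));
}

/* Hypothetical helper: total padding = size of the struct minus its data. */
size_t paddingBytes(void) {
  return sizeof(struct employeeType)
         - (sizeof(int) + sizeof(char) + sizeof(double));
}
```

On a system that matches the assumptions in the example above, the offsets would be 0, 4, and 8, the size 16, and paddingBytes() would return 3.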

Homework: What is sizeof(struct gameValType)? What is sizeof employee?

6.4: Pointers to Structures

The program below illustrates well the use of pointers to structures and also serves as a good review of many C concepts. The overall goal is to read text from the console and count the occurrences of C keywords (such as break, if, etc.). At the end, it prints a list of all the keywords that were present and how many times each occurred.

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#define MAXWORDLENGTH 50
struct keytblType {
  char *keyword;
  int  count;
} keytbl[] = {
  { "break", 0 },
  { "case", 0 },
  { "char", 0 },
  { "continue", 0 },
  // others
  { "while", 0 }
};
#define NUMKEYS (sizeof keytbl / sizeof keytbl[0])
int getword(char *, int); // no vars
struct keytblType *binsearch(char *);
int main (int argc, char *argv[argc]) {
  char word[MAXWORDLENGTH];
  struct keytblType *p;
  while (getword(word,MAXWORDLENGTH) != EOF)
    if (isalpha(word[0]) &&
        ((p=binsearch(word)) != NULL))
      p->count++;
  for (p=keytbl; p<keytbl+NUMKEYS; p++)
    if (p->count > 0)
      printf("%4d %s\n", p->count, p->keyword);
  return 0;
}
struct keytblType *binsearch(char *word) {
  int cond;
  struct keytblType *low = &keytbl[0];
  struct keytblType *high =  &keytbl[NUMKEYS];
  struct keytblType *mid;
  while (low < high) {
    mid = low + (high-low) / 2;
    if ((cond = strcmp(word, mid->keyword)) < 0)
      high = mid;
    else if (cond > 0)
      low = mid+1;
    else
      return mid;
  }
  return NULL;
}

The function getword() is not shown. Its parameters are a buffer (i.e., a pointer to a character) and the size of the buffer. It reads the next word from the console into the buffer (limited by the size of the buffer; the world does not end). A word is either a string of letters and digits beginning with a letter or a single non-white-space character. The idea is that it places into the buffer the next token in a program written in C. The (integer) return value is either the first character in the buffer or EOF.
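
Since getword() is not shown, here is one possible sketch that follows the description above. It is not the course's actual implementation. The stream-reading variant getwordFrom is a hypothetical helper added so the logic can be exercised on any FILE, and the code assumes the buffer size is at least 2.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical variant that reads from any stream. Returns the first
   character of the word, or EOF at end of input. Assumes lim >= 2. */
int getwordFrom(FILE *fp, char *word, int lim) {
  int c, first;
  char *w = word;

  while ((c = getc(fp)) != EOF && isspace(c))
    ;                               /* skip leading white space */
  if (c == EOF) {
    *w = '\0';
    return EOF;
  }
  first = c;
  *w++ = c;
  if (isalpha(c)) {
    /* collect letters and digits, leaving room for the final '\0' */
    while (w - word < lim - 1 && (c = getc(fp)) != EOF) {
      if (!isalnum(c)) {
        ungetc(c, fp);              /* not part of this word: push it back */
        break;
      }
      *w++ = c;
    }
  }
  *w = '\0';
  return first;
}

/* Same interface as the getword() declared in the program; reads stdin. */
int getword(char *word, int lim) {
  return getwordFrom(stdin, word, lim);
}
```

In this sketch a word longer than the buffer is split: the first lim-1 characters are returned and the rest are read as the next word.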

Now let's examine the code above.

  1. The first interesting item is keytbl, the table of keywords. It is an array of struct keytblType; each entry of the array contains a string (a char *) and an integer.
  2. The initialization of keytbl is interesting. Each string is set to a C keyword and each count is initialized to zero. The size of the array is determined by the initialization and the next line cleverly determines that size. The entries are initialized in alphabetical order, which permits the use of a binary search to find an entry.
  3. The main program contains two loops: The first computes the counts and the second outputs the results.
    1. The first loop calls getword() and terminates on receiving EOF. The word is looked up in the keytbl using binsearch. The value returned by binsearch is either a pointer to the table entry found or NULL if the word is not a keyword (i.e., is not in the table). If the word is found the corresponding count is incremented.
      I don't believe the isalpha() test is needed since, if the character is not a letter, binsearch will return NULL; it is presumably there to save a useless search.
    2. The second loop traverses the table and prints out all entries with non-zero counts. Note the test used in the for statement and remember that the increment p++ increments p by enough so that it points to the next entry.
  4. As I suspect you know, a binary search is quite efficient (its running time is logarithmic in the size of the table) and very easy to get wrong (< vs. <=, mid vs. mid-1 vs. mid+1, etc.). The only real difference between this one and the one I hope you saw in 102 is that the code above is pointer based, not array based. This explains the mysterious code that sets mid to the midpoint between high and low: two pointers cannot be added, so (low+high)/2 is illegal, but the difference high-low is an integer that can be halved and added to low. Other than that oddity, I find it striking how array-like the code looks. That is, the manipulations of the pointers could just as well be manipulating indices.
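
For comparison, here is a sketch of an index-based counterpart of binsearch. The name binsearchIdx and its extra parameters (the table and its length) are hypothetical; it returns an index, or -1 in place of NULL.

```c
#include <string.h>

struct keytblType {
  char *keyword;
  int count;
};

/* Hypothetical index-based counterpart of binsearch(): returns the index
   of word in tbl[0..n-1], or -1 if it is absent. tbl must be sorted. */
int binsearchIdx(const char *word, const struct keytblType tbl[], int n) {
  int low = 0;
  int high = n;           /* one past the last entry, like &keytbl[NUMKEYS] */
  while (low < high) {
    /* With indices (low+high)/2 would also compile; with pointers only
       this form is legal, because two pointers cannot be added. */
    int mid = low + (high - low) / 2;
    int cond = strcmp(word, tbl[mid].keyword);
    if (cond < 0)
      high = mid;         /* word sorts before tbl[mid] */
    else if (cond > 0)
      low = mid + 1;      /* word sorts after tbl[mid] */
    else
      return mid;
  }
  return -1;
}
```

Line for line, the loop matches the pointer version: low, high, and mid are indices instead of pointers, and tbl[mid].keyword replaces mid->keyword.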