Computer Systems Organization

Start Lecture #13

6.5: Self-referential Structures

[Figure: tree node]

Consider a basic binary tree with each node containing just an integer and pointers to the left and right subtrees. Looking at the diagram on the right suggests a structure with three components: left, right, and value. The first two refer to other tree nodes and the third is an integer.

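// Illegal: a struct bad would have to contain two complete struct bad's.
struct bad {
  struct bad left;
  int value;
  struct bad right;
};

// Legal: each node holds only pointers to other nodes.
struct treenode_t {
  struct treenode_t *left;
  int value;
  struct treenode_t *right;
};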

Since trees are recursive data structures you might expect some sort of recursive structure. Consider struct bad defined on the right. (You might be fancier and have a struct tree, which contains a struct root, which has an integer value and two struct tree's).

But struct bad and its fancy friends are infinite data structures: each struct bad would have to contain two complete struct bad's, each of which contains two more, and so on. Some languages permit infinite structures provided you never try to materialize more than a finite piece. But C is not one of those languages, so for us struct bad is bad!

Instead, we use struct treenode_t also on the right (names like treenode_t are a shorter and very commonly used alternative to names like treenodeType).

Be sure you understand why struct treenode_t is finite and corresponds exactly to the picture above it.
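To make the point concrete, here is a minimal sketch that builds the three-node tree from ordinary variables and walks it recursively. The helper names tree_sum and demo_tree_sum are hypothetical and chosen only for illustration; they do not come from the lecture or the lab.

```c
#include <assert.h>
#include <stddef.h>

struct treenode_t {
  struct treenode_t *left;
  int value;
  struct treenode_t *right;
};

/* Hypothetical helper: sum every value in the tree.
   An empty subtree is represented by NULL and contributes 0. */
int tree_sum(const struct treenode_t *t) {
  if (t == NULL)
    return 0;
  return tree_sum(t->left) + t->value + tree_sum(t->right);
}

/* Build a root with two leaves out of three fixed-size variables.
   Each node stores only two pointers and an int, so it is finite. */
int demo_tree_sum(void) {
  struct treenode_t l = { NULL, 1, NULL };
  struct treenode_t r = { NULL, 3, NULL };
  struct treenode_t root = { &l, 2, &r };
  return tree_sum(&root);
}
```

The recursion lives in tree_sum(), not in the layout of the data: the structure itself never nests, it only points.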

struct s {
  int val;
  struct t *pt;
};
struct t {
  double weight;
  struct s *ps;
};

Mutually Referential/Recursive Structures

What if you have two structure types that need to reference each other? You cannot have a struct s contain a struct t if the struct t contains a struct s.

Once again pointers come to the rescue as illustrated on the right. Neither structure is infinite. A struct s contains one integer and one pointer. A struct t contains one double and one pointer.
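As a small sketch of how the two types are used together, the code below links one struct s and one struct t to each other and then follows both pointers back to the start. The function name round_trip is hypothetical; the explicit forward declaration of struct t is optional here and is included only to make the ordering visible.

```c
#include <assert.h>
#include <stddef.h>

struct t;                 /* optional forward declaration (an incomplete type) */

struct s {
  int val;
  struct t *pt;           /* a pointer to an incomplete type is allowed */
};
struct t {
  double weight;
  struct s *ps;
};

/* Hypothetical demo: a points to b, b points back to a. */
int round_trip(void) {
  struct s a = { 7, NULL };
  struct t b = { 2.5, &a };
  a.pt = &b;
  return a.pt->ps->val;   /* a -> b -> a, then read a.val */
}
```

The compiler needs the size of a pointer to struct t, not the size of struct t itself, which is why struct s can be completed first.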

Lab 2: 2d-Structures

Instead of trees, lab 2 uses a different 2-dimensional structure, a linked list of linked lists.

Although the lab is not in final form and has not yet been assigned, it is in good enough shape for us to study to learn how to use linked structures.

Malloc()

As you know, in Java objects (including arrays) have to be created via the new operator. We have seen that in C this is not always needed: you can declare a struct rectangle and then declare several rectangles.

However, this doesn't work if you want to generate the rectangles during run time. When you are writing lab 2, you don't know how many 2d nodes or 1d nodes will be needed.

So we need a way to create an object during run time. In C this uses the library function malloc(), which takes one argument, the amount of space to be allocated. The function malloc() returns a pointer to this space.

Since malloc() is not part of the C language proper, but is instead just a library routine, the compiler does not treat it specially (unlike the situation with new, which is part of Java). Since malloc() is just an ordinary function, and we want it to work for dynamic objects of any type (e.g., an int, a char *, a struct treenode, etc.), and there is no way to pass a type to a function, two questions arise.

  1. How do we arrange that the space returned by malloc() meets the alignment requirements of the object we desire?
  2. How do we arrange that the pointer returned by malloc() is a pointer to the correct type?

The alignment question is the easier. We just have malloc() return space aligned on the most stringent requirement. So, if double requires 8-byte alignment, and all structures require 16-byte alignment, and all other data types require 4-byte alignment, then malloc() always returns space aligned on a 16-byte boundary (i.e., the address is a multiple of 16).
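The alignment claim can be checked with a short sketch. It assumes a C11 compiler, where max_align_t (from stddef.h) is a type with the most stringent fundamental alignment and alignof comes from stdalign.h. The function name malloc_is_max_aligned is hypothetical, and converting a pointer to uintptr_t to inspect the address is implementation-defined, although it behaves as expected on common systems.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical check: returns 1 if the address returned by malloc() is a
   multiple of the strictest fundamental alignment, 0 if it is not, and
   -1 if the allocation itself failed. */
int malloc_is_max_aligned(size_t nbytes) {
  void *p = malloc(nbytes);
  if (p == NULL)
    return -1;
  int ok = ((uintptr_t) p % alignof(max_align_t)) == 0;
  free(p);
  return ok;
}
```

Calling malloc_is_max_aligned(64) is expected to return 1 on a conforming implementation.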

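Ensuring type correctness is not so easy. Specifically, malloc() returns a void *, a generic pointer that must be converted to a pointer of the correct type. C performs this conversion implicitly on assignment (C++ does not), but the conversion is often written as an explicit cast. For example, the code supplied with lab 2 contains: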

    struct node2d *p2d;
    p2d = (struct node2d *) malloc(sizeof(struct node2d));
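In practice the call is usually wrapped so that a failed allocation is detected and the new node starts in a known state. The sketch below shows one way to do that. The members of struct node2d (value and down) and the function name make_node2d are assumptions made for illustration; the real lab 2 definition may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout, for illustration only. */
struct node2d {
  int value;
  struct node2d *down;
};

/* Hypothetical helper: allocate and initialize one node at run time.
   Returns NULL if malloc() could not supply the space. */
struct node2d *make_node2d(int value) {
  struct node2d *p2d;
  p2d = (struct node2d *) malloc(sizeof(struct node2d));
  if (p2d == NULL)
    return NULL;
  p2d->value = value;
  p2d->down = NULL;
  return p2d;
}
```

The caller owns the returned node and should eventually pass it to free(), since C has no garbage collector to reclaim it.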
  

Link to Lab 2

6.6: Table Lookup

Skipped

6.7: Typedef

Instead of declaring pointers to trees via

    struct treenode *ptree;
  
we can write
    typedef struct treenode *Treeptr;
    Treeptr ptree;
  
Thus Treeptr is a new name for the type struct treenode *. As another example, instead of
    char *str1, *str2;

we could write
    typedef char *String;
    String str1, str2;
  

Note that this does not give you a new type; it just gives you a new name for an existing type. In particular str1 and str2 are still pointers to characters even if declared as a String above.
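A short sketch can confirm that String is only another name for char *. It assumes a C11 compiler for the _Generic type check; the function names plain_length, typedef_demo, and second_is_pointer are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef char *String;

/* Takes an ordinary char *. A String can be passed without a cast
   because both names denote the same type. */
size_t plain_length(char *s) {
  return strlen(s);
}

size_t typedef_demo(void) {
  char buf1[] = "hello", buf2[] = "hi";
  String str1 = buf1, str2 = buf2;
  char *p = str1;                      /* no conversion needed */
  return plain_length(p) + plain_length(str2);
}

/* With the typedef, every declarator in the list is a pointer.
   Contrast with "char *a, b;", where b would be a plain char. */
int second_is_pointer(void) {
  String a = NULL, b = NULL;
  (void) a;
  return _Generic(b, char *: 1, default: 0);
}
```

The second function shows one practical difference in how declarations read: the * is part of the typedef'ed name, so it applies to every variable declared with it.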

A common convention is to capitalize a typedef'ed name.

6.8: Unions

Saving Space by Sharing Memory between 2 or More Variables

struct something {
  int x;
  union {
    double y;
    int z;
  };
};

Traditionally union was used to save space when memory was expensive. Perhaps with the recent emphasis on very low power devices, this usage will again become popular. Looking at the example on the right, y and z would be assigned to the same memory locations. Since the size allocated is the larger of what is needed, the union takes space max(sizeof(double),sizeof(int)) rather than the sizeof(double)+sizeof(int) that would be needed if a union were not used.

It is up to the programmer to know which variable is actually stored. The union shown cannot be used if y and z are both needed at the same time.

It is risky since there is no checking done by the language.
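One common way to reduce the risk is to store a tag next to the union that records which member was written last. The sketch below illustrates the idea; the names enum kind, struct tagged, tagged_as_double, and union_size are hypothetical and not part of the lecture.

```c
#include <assert.h>
#include <stddef.h>

enum kind { HOLDS_DOUBLE, HOLDS_INT };

struct tagged {
  enum kind kind;     /* records which member of u is currently valid */
  union {
    double y;
    int z;
  } u;
};

/* Read the stored value as a double, consulting the tag first so that
   only the member that was actually written is ever read. */
double tagged_as_double(const struct tagged *t) {
  return t->kind == HOLDS_DOUBLE ? t->u.y : (double) t->u.z;
}

/* Size of a union of a double and an int. */
size_t union_size(void) {
  union { double y; int z; } u;
  return sizeof u;
}
```

The language still performs no checking; the tag only helps if every writer updates it and every reader consults it.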

Meeting Alignment Constraints

A union is aligned on the most severe alignment of its constituents. This can be used in a rather clever way to meet a requirement of malloc().

As we mentioned above when discussing malloc(), it is sometimes necessary to force an object to meet the most severe alignment constraint of any type in the system. How can we do this so that if we move to another system where a different type has the most severe constraint, we only have to change one line?

struct something {
  int x;
  struct something *p;
  // others
} obj;

// assume long most severely aligned
typedef long Align;
union something {
  struct dummyname {
    int x;
    union something *p;
    // others
  } s;
  Align dummy;
};
typedef union something Something;

Say struct something, as shown in the top frame on the right, is the type we want to make most severely aligned.

Assume that on this system the type long has the most severe alignment requirement and look at the bottom frame on the right.

The first typedef captures the assumption that long has the most severe alignment requirement on the system. If we move to a system where double has the most severe alignment requirement, we need change only this one line. The name Align was chosen to remind us of the purpose of this type. It is capitalized since one common convention is to capitalize all typedefs.

The variable dummy is not to be used in the program. Its purpose is just to force the union, and hence s, to be most severely aligned.

In the program we declare an object, say obj, to be of type Something (with a capital S) and use obj.s.x instead of obj.x as in the top frame. The result is that we know the structure containing x is most severely aligned.

See section 8.7 if you are interested.
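The alignment trick can be sketched and checked as follows. The code assumes a C11 compiler (for alignof and _Static_assert) and keeps the text's assumption that long is the most severely aligned type; the function name use_something is hypothetical.

```c
#include <assert.h>
#include <stdalign.h>

typedef long Align;   /* assumption: long has the strictest alignment here */

union something {
  struct dummyname {
    int x;
    union something *p;
  } s;
  Align dummy;        /* never used; present only to force the alignment */
};
typedef union something Something;

/* A union is at least as strictly aligned as each of its members,
   so this compile-time check should always pass. */
_Static_assert(alignof(Something) >= alignof(Align),
               "Something must be at least as aligned as Align");

/* Fields are reached through the extra .s level. */
int use_something(void) {
  Something obj;
  obj.s.x = 42;
  obj.s.p = &obj;
  return obj.s.p->s.x;
}
```

Porting to a system where another type is strictest means editing only the typedef line; the check and every use of obj.s stay the same.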

6.9: Bit Fields

Skipped

Chapter 7: Input and Output

7.1: Standard Input and Output

getchar() and putchar()

#include <stdio.h>
int main (int argc, char *argv[argc]) {
  int c;
  while ((c = getchar()) != EOF)
    if (putchar(c) == EOF)
      return EOF;
  return 0;
}

This pair forms the simplest I/O routines. The function getchar() takes no parameters and returns an integer. This integer is the integer value of the character read from stdin or is the value of the symbolic constant EOF (normally -1), which is guaranteed not to be the integer value of any character.

The function putchar() takes one integer parameter, the integer value of a character. The character is sent to stdout and is returned as the function value (unless there is an error, in which case EOF is returned).

The code on the right copies the standard input (stdin), which is usually the keyboard, to the standard output (stdout), which is usually the screen.

Homework: 7.1.

Formatted Output—printf

We have already seen printf(). A surprising characteristic of this function is that it has a variable number of arguments. The first argument, called the format string, is required. The number of remaining arguments depends on the value of the first argument. The function returns the number of characters printed, but that is not so often used. Technically its declaration is

    int printf(const char *format, ...);
  

The format string contains regular characters, which are just sent to stdout unchanged, and conversion specifications, each of which determines how the value of the next argument is to be printed.

The conversion specification begins with a %, which is optionally followed by some modifiers, and ends with a conversion character.

We have not yet seen any modifiers but have seen a few conversion characters, specifically d for an integer (i is also permitted), c for a single character, s for a string, and f for a real number.

There are other conversion characters that can be used, for example, to get real numbers printed using scientific notation. The book gives a full table.
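The conversion characters mentioned above, plus e for scientific notation, can be tried in a short sketch. It uses snprintf(), the standard sibling of printf() that writes into a character array instead of stdout, so the result can be inspected. The function names format_sample and count_matches_length are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one value with each of d, c, s, f and e into buf.
   Like printf(), snprintf() returns the number of characters produced. */
int format_sample(char *buf, size_t size) {
  return snprintf(buf, size, "%d %c %s %f %e", 42, 'x', "hi", 36.3, 36.3);
}

/* Returns 1 if the count returned by snprintf() equals the length of
   the text it produced. */
int count_matches_length(void) {
  char buf[64];
  int n = format_sample(buf, sizeof buf);
  return n >= 0 && (size_t) n == strlen(buf);
}
```

On a typical implementation the buffer ends up holding 42 x hi 36.300000 3.630000e+01, since both f and e default to six digits after the decimal point.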

There are a number of modifiers to make the output line up and look better. For example, %12.4f means that the real number will be printed in 12 columns (or more if the number is too big) with 4 digits after the decimal point. So, if the number was 36.3 it would be printed as |||||36.3000 where I used | to represent a blank. Similarly -1000. would be printed as ||-1000.0000. These two would line up nicely if printed via

    printf("%12.4f\n%12.4f\n\n", 36.3, -1000.);
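The effect of the width and precision can be verified with a small sketch that formats into a buffer with snprintf() rather than printing. The function names format_12_4 and formats_to are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format x with %12.4f: minimum field width 12, right-justified,
   with 4 digits after the decimal point. */
int format_12_4(char *buf, size_t size, double x) {
  return snprintf(buf, size, "%12.4f", x);
}

/* Returns 1 if x formats exactly as expected, 0 otherwise. */
int formats_to(double x, const char *expected) {
  char buf[64];
  format_12_4(buf, sizeof buf, x);
  return strcmp(buf, expected) == 0;
}
```

For instance, formats_to(36.3, "     36.3000") and formats_to(-1000., "  -1000.0000") both return 1, with the decimal points in the same column. A number that needs more than 12 columns, such as 123456789.5, is not truncated; the field simply grows to fit it.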