Computer Systems Organization


Start Lecture #10

Slick String Length Using Pointer Subtraction

int mystrlen(char *s) {
  char *p = s;
  while (*p)
    p++;
  return p-s;
}

The code on the right applies the technique used to get the slicker string copy to the related function string length. It also uses pointer subtraction. Note that when the return is executed, p points at the terminating '\0' (i.e., just after the last character of the string) and s points to its beginning. Thus the difference gives the length.

String Comparison

int mystrcmp(char *s, char *t) {
  for (; *s == *t; s++,t++)
    if (*s == '\0')
      return 0;
  return *s - *t;
}

We next produce a string comparison routine that returns a negative integer if the string s is lexicographically before t, zero if they are equal, and a positive integer if s is lexicographically after t.

The loop takes care of equal characters; it returns 0 if we have reached the end of the strings.

If the loop concludes, we have found the first difference. A key point is that if one string has ended, its character ('\0') is smaller than the other string's character. This is another ascii fact (ascii null is zero; the rest are positive).

I tried to produce a version using while(*s++ == *t++), but that failed since the loop body and the post-loop code would be dealing with the subsequent character. It could have been forced to work if I used a bunch of constructions like *(s-1), but that would have been ugly.

Homework: 5-5 (just do strncpy). These kinds of routines are not scary (and hence the world has not ended).

5.6: Pointer Arrays; Pointers to Pointers

For the moment forget that C treats pointers and arrays almost the same. For now just think of a character pointer as another data type.

So we can have an array of 9 character pointers, e.g., char *A[9]. We shall see fairly soon that this is exactly how some systems (e.g. Unix) store command line arguments.

#include <stdio.h>
int main() {
  char *STG[3] = { "Goodbye", "cruel", "world" };
  printf ("%s %s %s.\n", STG[0], STG[1], STG[2]);
  STG[1] = STG[2] = STG[0];
  printf ("%s %s %s.", STG[0], STG[1], STG[2]);
  return 0;
}

Goodbye cruel world.
Goodbye Goodbye Goodbye.

The code on the right defines an array of 3 character pointers, each of which is initialized to point to a string. The first printf() has no surprises. But it seems the assignment statement should fail, since we allocated space for three strings of sizes 8, 6, and 6, and now want to wind up with three strings each of size 8, and we didn't allocate any additional space.

However, it works perfectly and the resulting output is shown as well. What happened? How can space for 8+6+6 characters be enough for 8+8+8?

The reason is that we do not have three strings of size 8. Instead we have one string of size 8, with three character pointers pointing to it.

The picture on the right shows a before and after view of the array and the strings.

This suggests an interesting possibility. Imagine we wanted to sort long strings alphabetically (really lexicographically). So as not to get bogged down in the sort itself, assume it is a simple interchange sort that loops and, if a pair is out of order, executes a swap, which is something like

    temp = x;
    x = y;
    y = temp;
  

If x, y, and temp are (varying size, long) strings then we have some issues to deal with.

  1. It is expensive to do the three assignments if the strings are very long.
  2. If one of the strings is longer than the space allocated for another, we either overwrite something else (and potentially end the world) or refuse the copy and hence not complete the sort.


Both of these issues go away if we maintain an array of pointers to the strings. If the string pointed to by A[i] is out of order with respect to the string pointed to by A[j], we swap the (fixed size, short) pointers, not the strings that they point to.

This idea is illustrated on the right.

#include <stdio.h>
void sort(int n, char *C[n]) {
  int i,j;
  char *temp;
  for (i=0; i<n-1; i++)
    for (j=i+1; j<n; j++)
      if (mystrcmp(C[i],C[j]) > 0) {
        temp = C[i];
        C[i] = C[j];
        C[j] = temp;
      }
}
int main() {
  char *STG[] = {"hello", "99", "3", "zz", "best"};
  int i;
  for (i=0; i<5; i++)
    printf ("STG[%i] = \"%s\"\n", i, STG[i]);
  sort(5,STG);
  for (i=0; i<5; i++)
    printf ("STG[%i] = \"%s\"\n", i, STG[i]);
  return 0;
}

Putting all the pieces together, the code on the right, plus the mystrcmp() function above, produces the following output.

    STG[0] = "hello"
    STG[1] = "99"
    STG[2] = "3"
    STG[3] = "zz"
    STG[4] = "best"
    STG[0] = "3"
    STG[1] = "99"
    STG[2] = "best"
    STG[3] = "hello"
    STG[4] = "zz"
  

Note the first line of the sort function, in particular the n in char *C[n]. This is an addition made to C in 1999 (the language is sometimes called C-99 to distinguish it from C-89 or ANSI-C, as described in our text, and K&R-C, as described in the first edition of our text). Our text would write C[] instead of C[n].

You might question if the output is indeed sorted. For example, we remember that ascii '3' is less than ascii '9', and we know that in ascii 'b'<'h'<'z', but why is '9'<'b'?

Well, I don't know why it is, but it is. That is, in ascii the digits do in fact come before the letters.

5.7: Multi-dimensional Arrays

void matmul(int n, int k, int m, double A[n][k],
     double B[k][m], double C[n][m]) {
  int i,j,l;
  for (i=0; i<n; i++)
    for (j=0; j<m; j++) {
      C[i][j] = 0.0;
      for (l=0; l< k; l++)
        C[i][j] += A[i][l]*B[l][j];
    }
}

C does have normal multidimensional arrays. For example, the code on the right multiplies two matrices.

int A[2][3] = { {5,4,3}, {4,4,4} };
int B[2][3][2] = { { {1,2}, {2,2}, {4,1} },
                   { {5,5}, {2,3}, {3,1} } };

Multidimensional arrays can be initialized. Once you remember that a two-dimensional array is a one-dimensional array of one-dimensional arrays, the syntax for initialization is not surprising.

(C, like most modern languages, uses row-major ordering, so the last subscript varies the most rapidly.)

5.8: Initialization of Pointer Arrays

char *monthName(int n) {
  static char *name[] = {"Illegal",
    "Jan", "Feb", "Mar", "Apr",
    "May", "Jun", "Jul", "Aug",
    "Sep", "Oct", "Nov", "Dec"};
  return (n<1 || n>12) ? name[0] : name[n];
}

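The initialization syntax for an array of pointers follows the general rule for initializing an array: Enclose the initial values inside braces. How do we write an initial value for a pointer?
Ans: We remember that an array, when used as a value, is treated as a pointer to its first element. A string constant is an array of characters, so it can serve as the initial value of a character pointer.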

Looking at the code on the right we see this principle in action. I believe the most common usage is for an array of character pointers as in the example.

5.9: Pointers vs. Multi-dimensional Arrays

int  A[3][4];
int *B[3];

Consider the two declarations on the right. They look different, but both A[2][3] and B[2][3] are legal (at least syntactically). The real story is that they most definitely are different. In fact Java arrays have a great deal in common with the 2nd (pointer) form in C.

  1. The declaration int A[3][4]; allocates space for 12 integers, which are stored consecutively so that A[i][j] is the (4*i+j)th integer stored (counting from zero). With the simple declaration written, none of the integers is initialized, but we have seen how to initialize them.

  2. The declaration int *B[3]; allocates space for NO integers. It does allocate space for 3 pointers (to integers). The pointers are not initialized so they currently point to junk. The program must somehow arrange for each of them to point to a group of integers (and must figure out when the group ends). An important point is that the groups may have different lengths. The technical jargon is that we can have a ragged array as shown in the bottom of the picture.

The diagram shows integers, but the more common usage is for characters. In that case the ragged array is an array of differing length strings. We have seen two examples of this: the monthName program just above and the Goodbye Cruel World diagrams in section 5.6.