Computer Systems Organization


Start Lecture #9

String Copy

void mystrcpy (char *s, char *t) {
  while ((*s++ = *t++) != '\0');
}

Check out the ONE-liner above. Note especially the use of standard idioms for marching through strings and for finding the end of the string.

Slick!

But scary, very scary! Why?
Because there is no length check. If the character array s (or equivalently the block of characters s points to) is smaller than the character array t points to, then the copy will overwrite whatever happens to be located right after the array s.
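One common remedy is to pass the size of the destination along with the pointer. The following is only a sketch of that idea: the name bounded_strcpy, its size parameter, and its return convention are assumptions made for illustration, not something from the text or the standard library.

```c
#include <stddef.h>

/* Hypothetical helper (not from the lecture): copy t into s, writing at
   most size bytes including the terminating '\0'.
   Returns 1 if all of t fit, 0 if the copy was truncated. */
int bounded_strcpy(char *s, size_t size, const char *t) {
    if (size == 0)
        return 0;                 /* no room even for '\0' */
    while (--size > 0 && *t != '\0')
        *s++ = *t++;              /* same marching idiom, but counted */
    *s = '\0';                    /* always terminate the destination */
    return *t == '\0';            /* 1 only if we reached the end of t */
}
```

With a 4-byte destination and the source string hello, this version stores hel plus the terminating null and reports truncation instead of writing past the end of the array.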

Array Name Vs Pointer to First Element (Continued)

Although arrays and pointers are closely related, there is a difference between an array name like a and a pointer that happens to point to the first element like pa. The latter is a variable that can be assigned to, whereas a (or equivalently &a[0]) is just a value.

The last paragraph might make you worry that the 2nd and 3rd calls above are illegal since they pass an array name and the corresponding parameter s is assigned to. However, all is well since C is call-by-value and the assignment to a parameter does not affect the argument.
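A small sketch of this point follows. The function count_char is a hypothetical example, not one of the routines from the lecture; it advances its parameter s freely, yet the caller's array name is untouched because s is only a local copy of the pointer value.

```c
/* Hypothetical example: count occurrences of c in the string s.
   s is a local copy of the caller's pointer, so s++ is harmless. */
int count_char(char *s, char c) {
    int n = 0;
    for (; *s != '\0'; s++)       /* modifies only the local parameter */
        if (*s == c)
            n++;
    return n;
}
```

Given char a[] = "hello"; the call count_char(a, 'l') is legal even though a itself could never appear on the left of an assignment.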

Using int *A vs int A[] As a Parameter

double f(int *a);
double f(int a[]);

The two lines above are equivalent when used as a function declaration (or as the head line of a function definition). The authors say they prefer the first. For me it is not so clear cut. In strlen() above I prefer char *s as written. However, if I were writing an inner product routine (a.k.a. dot product), I would write

    double dotprod(double A[], double B[], double C[])
  
since I think of dot product as operating on vectors.
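To make the array-style parameters concrete, here is a minimal sketch of such a routine for two vectors. The explicit length parameter n is an assumption added here, since an array parameter is really a pointer and carries no size information.

```c
/* Sketch only: the length parameter n is an assumption, since a C array
   parameter carries no size information. */
double dotprod(double A[], double B[], int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += A[i] * B[i];       /* A[i] is just *(A+i) */
    return sum;
}
```

Writing the parameters as double *A and double *B would compile to exactly the same function; the bracket form only signals to the reader that whole vectors are expected.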

Passing Part of an Array: f(A+6), i.e., f(&A[6]) and p[-2], i.e., *(p-2)

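#include <stdio.h>
void f(int *p) {
  printf("legal? %d\n", p[-2]);  // p[-2] is *(p-2), here A[4]
}
int main(){
  int A[20];
  for (int i = 0; i < 20; i++)   // calculate all of A (placeholder values)
    A[i] = i;
  f(A+6);                        // same as f(&A[6])
  return 0;
}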

In the code above main() calculates the values for an integer array and then passes only part of it to f. Remembering that A+6 means (&A[0])+6, which is &A[6], we see that f() receives a pointer to the 7th element of the array A.

With call by value, we know that f() cannot change the value of the pointer in main(). But f() can use this pointer to reference or change all the values of A, including those before A[6].

It naturally would be illegal for f() to reference (or worse, change) p[-9], since that would be A[-3], which lies outside the array.
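As a further illustration, the sketch below uses both negative and positive offsets from a pointer into the middle of an array. The helper sum_around is hypothetical; it is valid only when the caller guarantees that two elements exist on each side of p.

```c
/* Hypothetical helper: sum the element p points to and its two
   neighbours on each side. Legal only when p[-2] .. p[2] all lie
   inside the caller's array. */
int sum_around(int *p) {
    int sum = 0;
    for (int i = -2; i <= 2; i++)
        sum += p[i];              /* p[i] is *(p+i), negative i included */
    return sum;
}
```

For an array A with A[i] equal to i, the calls sum_around(A+6) and sum_around(&A[6]) both add up A[4] through A[8].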

5.4: Address Arithmetic

A crucially important point is that pa+3 does not simply add three to the address stored in pa. Instead, it yields a pointer that points 3 integers further forward than pa (since pa is a pointer to an integer), so the address grows by 3*sizeof(int) bytes. If pc is a pointer to a character, then pc+3 points 3 characters forward.
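The scaling can be observed directly by measuring the distance in bytes. The two helper functions below are hypothetical names introduced only for this sketch; they view the addresses as char pointers, whose element size is 1 by definition.

```c
#include <stddef.h>

/* Number of bytes between pa and pa+k, measured by viewing both
   addresses as char pointers (sizeof(char) is 1 by definition). */
ptrdiff_t int_step_bytes(int *pa, int k) {
    return (char *)(pa + k) - (char *)pa;
}

/* The same measurement for a char pointer. */
ptrdiff_t char_step_bytes(char *pc, int k) {
    return (pc + k) - pc;
}
```

For k equal to 3 the first function returns 3*sizeof(int), whatever that is on the machine at hand, while the second returns 3.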

#define ALLOCSIZE 15000
static char allocbuf[ALLOCSIZE];
static char *allocp = allocbuf;
char *alloc(int n) {
  if (allocbuf+ALLOCSIZE-allocp >= n) {
    allocp += n;
    return allocp-n;
  } else
    return 0;
}
void afree (char *p) {
  if (p>=allocbuf && p<allocbuf+ALLOCSIZE)
    allocp = p;
}

Above is a primitive storage allocator and freer. When alloc(n) is called with an integer argument n, it returns a pointer to a block of n characters, or 0 if not enough room is left.

When afree(p) is called with the pointer returned by alloc(), it resets the state of alloc()/afree() to what it was before the call to alloc().

A strong assumption is made that calls to alloc()/afree() are made in a stack-like manner. These routines would be useful for managing storage for C automatic, local variables. They are far from general. The real routines malloc()/free() are considerably more complicated.

Since pointers, not array positions, are communicated to users of alloc()/afree(), these users do not need to know anything about the array, which is kept under the covers via static.

The tricky (elegant? beautiful?) part is the if in alloc().
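The sketch below repeats the allocator listing so that it compiles on its own and adds one hypothetical function, stack_discipline_holds(), that exercises the stack-like usage pattern. The comment on the if spells out the pointer arithmetic: allocbuf+ALLOCSIZE is the address one past the end of the buffer, so subtracting allocp gives the number of free characters.

```c
#define ALLOCSIZE 15000
static char allocbuf[ALLOCSIZE];
static char *allocp = allocbuf;       /* next free position */

char *alloc(int n) {
    /* free space = one-past-the-end of the buffer minus allocp */
    if (allocbuf + ALLOCSIZE - allocp >= n) {
        allocp += n;
        return allocp - n;            /* start of the new block */
    }
    return 0;                         /* not enough room */
}

void afree(char *p) {
    if (p >= allocbuf && p < allocbuf + ALLOCSIZE)
        allocp = p;
}

/* Hypothetical check: returns 1 if the allocator behaves like a stack. */
int stack_discipline_holds(void) {
    char *a = alloc(10);
    char *b = alloc(20);              /* starts right after a's block */
    int adjacent = (b == a + 10);
    afree(b);                         /* free in reverse order ... */
    afree(a);
    char *c = alloc(5);               /* ... so the space is reused */
    int reused = (c == a);
    afree(c);
    return adjacent && reused;
}
```

Freeing in any order other than last-allocated-first would silently discard blocks that are still in use, which is why the stack-like assumption matters.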

Notes

Using the Allocator

Remark: Much of this was presented by Prof. Grishman in the previous lecture.

These examples are interesting in their own right, beyond showing how to use the allocator.

#include <stdio.h>
int changeltox (char*);
void mystrcpy (char *s, char *t);
char *alloc(int n);
int main () {
  char stg[] = "hello";
  char* stg2 = alloc(6);
  mystrcpy (stg2, stg);
  changeltox (stg);
  printf ("The string is now %s\n", stg);
  printf ("String2 is now %s\n", stg2);
}

Making Changes in a New String

We have already written a program to change one character to another in a given string.

The code in this section first copies the string (using mystrcpy(), the one-liner presented earlier) and then makes changes in the original, leaving the copy untouched. Thus, at the end, we have two versions of the string: the before and the after.

As expected the output is

    The string is now hexxo
    String2 is now hello
  

Messing Up

Recall the danger warning given with the code for mystrcpy(char *s, char *t): the code copies all the characters in t (i.e., up to and including '\0') to s, ignoring the current length of s. Thus, if t is longer than the space allocated for s, the copy will overwrite whatever happens to be stored right after s.

#include <stdio.h>
int changeltox (char*);
void mystrcpy (char *s, char *t);
char *alloc(int n);
int main () {
  char stg[] = "hello";
  char* stg2 = alloc(2);
  char* stg3 = alloc(6);
  mystrcpy (stg2, stg);
  printf ("String2 is now %s\n", stg2);
  printf ("String3 is now %s\n", stg3);
  mystrcpy (stg3, stg);
  changeltox (stg);
  printf ("The string is now %s\n", stg);
  printf ("String2 is now %s\n", stg2);
  printf ("String3 is now %s\n", stg3);
}

The example above illustrates the danger. When this code is compiled together with the code for changeltox(), mystrcpy(), and alloc(), it produces the following output.

    String2 is now hello
    String3 is now llo
    The string is now hexxo
    String2 is now hehello
    String3 is now hello
  

What happened?

The string in stg contains the 5 characters in the word hello plus the ascii null '\0' to end the string. (The array stg has 6 elements so the string fits perfectly.)

The major problem occurs with the first execution of mystrcpy() because we are copying 6 characters into a string that has room for only 2 characters (including the ascii null). This executes flawlessly, copying the 6 characters to an area of size 6 starting where stg2 points. These 6 locations include the 2 slots allocated to stg2 and then the next four locations. In general it is hard to tell what has been overwritten, but in this case it is easy since we know how alloc() works. The excess 4 characters are written into the first 4 slots of stg3.


When we print stg2 we see no problem! A string pointer just tells where the string starts; the string continues up to the ascii null. So stg2 does have all of hello. Since stg3 points 2 characters after stg2, the string it names is just the substring of stg2 starting at the third character, namely llo.

The second mystrcpy copies the six(!) characters in the string hello to the 6 bytes starting at the location pointed to by stg3. Since the string stg2 includes the location pointed to by stg3, both stg2 and stg3 are changed.

The changeltox() execution works as expected.
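The overlap can be replayed without the allocator. The sketch below is an illustration under stated assumptions: the function replay_overrun() is hypothetical, and it carves the two blocks out of a single 8-byte local array so that every write stays inside one array and the behavior is well defined. The const on mystrcpy's second parameter is also an addition made here.

```c
#include <string.h>

/* The lecture's one-liner, with const added to the source parameter. */
void mystrcpy(char *s, const char *t) {
    while ((*s++ = *t++) != '\0')
        ;
}

/* Hypothetical replay of the overrun inside one 8-byte buffer.
   Returns 1 if the strings end up as in the lecture's output. */
int replay_overrun(void) {
    char buf[8];
    char *stg2 = buf;             /* pretend this block has 2 bytes */
    char *stg3 = buf + 2;         /* and this one the following 6   */
    mystrcpy(stg2, "hello");      /* spills 4 bytes into stg3's area */
    int first = strcmp(stg2, "hello") == 0 && strcmp(stg3, "llo") == 0;
    mystrcpy(stg3, "hello");      /* rewrites bytes 2..7 of buf */
    int second = strcmp(stg2, "hehello") == 0 && strcmp(stg3, "hello") == 0;
    return first && second;
}
```

The two intermediate checks correspond to the first two and the last two String2/String3 lines of the output shown earlier.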

Pointer Comparison

If pointers p and q point to elements of the same array, then comparisons using <, <=, ==, !=, >, and >= all work as expected.

Any pointer can be compared to 0 via == and !=.

If pointers p and q do not point to members of the same array, the value returned by comparisons is undefined, with one exception: p pointing to an element of an array and q pointing to the first element past the array.
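The exception for the position just past the array is what makes the usual end-pointer loop legal. A minimal sketch, with the hypothetical function name sum_array:

```c
/* Sum n ints starting at a. end points one element past the array,
   which may be compared against but must not be dereferenced. */
int sum_array(const int *a, int n) {
    const int *end = a + n;
    int sum = 0;
    for (const int *p = a; p < end; p++)
        sum += *p;
    return sum;
}
```

The comparison p < end is meaningful on every iteration because p and end both refer to the same array (or to the position just past it).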

Pointer Subtraction

Again we need p and q pointing to elements of the same array. In that case, if p<q, then q-p+1 equals the number of elements from p to q (including the elements pointed to by p and q).
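Two short sketches of pointer subtraction follow. The first is a string length routine in the familiar style; the second, count_inclusive, is a hypothetical helper that computes the inclusive element count as q-p+1, assuming p does not come after q in the same array.

```c
#include <stddef.h>

/* String length via pointer subtraction. */
size_t mystrlen(const char *s) {
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);       /* elements from s up to, not including, p */
}

/* Elements from p to q inclusive; assumes p <= q in the same array. */
ptrdiff_t count_inclusive(const int *p, const int *q) {
    return q - p + 1;
}
```

Note that the difference of two pointers is measured in elements, not bytes, and has the signed type ptrdiff_t.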

5.5: Character Pointers and Functions

As we know, C does not have string variables, but does have string constants. This arrangement sometimes requires care to avoid errors.

char amsg[]="hello"; vs char *msgp="hello";

  char amsg[] = "hello";
  char *msgp = "hello";
  int main () {...}

Let's see if we can understand the following rules, which can appear strange at first glance.

  1. amsg (an array name, so its value is a constant character pointer) cannot be changed, but both *(amsg+2) (an 'l') and amsg[2] can be changed.
  2. msgp (a character pointer variable) can be changed, but both *(msgp+2) (an 'l') and msgp[2] cannot be changed (the result of trying is undefined).
A key to understanding these rules, which really are just consequences of rules we already know, is that "hello" is a constant.
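The two rules can be seen side by side in the sketch below. It is an illustration only: the function apply_rules() is hypothetical, and msgp is declared const char * here (the listing above uses plain char *) so that the compiler rejects the forbidden write instead of leaving it as undefined behavior.

```c
char amsg[] = "hello";            /* array: its own writable copy of the characters */
const char *msgp = "hello";       /* pointer to a string constant; const is an
                                     addition made for this sketch */

/* Hypothetical demo of what each rule permits. */
void apply_rules(void) {
    amsg[2] = 'x';                /* fine: changes the array's third char */
    *(amsg + 1) = 'E';            /* same kind of change in pointer notation */
    msgp = "goodbye";             /* fine: the pointer itself is a variable */
    /* amsg = "goodbye";   would not compile: an array name is not assignable */
    /* msgp[2] = 'x';      rejected because of const; without const the
                           behavior would be undefined */
}
```

After apply_rules() runs, amsg holds hExlo and msgp points at a different string constant; the original constant "hello" itself was never modified.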

An Even Slicker String Copy

  void mystrcpy (char *s, char *t) {
    while (*s++ = *t++) ;
  }

The previous version of this program tested whether the value of the assignment was not the character '\0', and that character has the value 0 (a fact about ascii null). However, checking if something is not 0 is the same (in C) as asking if it is true. Finally, testing if something is true is the same as just testing the something. The C rules can seem cryptic, but they are consistent.