Lecture Material from 201
Pointers in C

Robert Dewar


/* Example of pointer manipulation in C, with main program driver */

/* Import standard I/O headers */
#include 

/* Following function counts number of spaces in null terminated string */

int nspaces (char *p) {
/* Here the parameter p is a pointer to the first character of a C string
   which is a sequence of characters terminated by ASCII.NUL, the character
  whose value is binary 0 */

  int count = 0;
  /* we will count the spaces in count. int is the normal signed integer
  type in C, though in fact we don't use negative values here */

  while (*p) {

  /* We are going to move p through the characters of the string. In C,
     zero is false, and non-zero is true, so this test yields true until
  we hit the zero terminator, which will be treated as false */

    if (*p++ == ' ') count++;

    /* The if statement here needs quite a bit of explanation, but the
       style is typical C. We could have written this:

          if (*p == ' ') count = count + 1;
          p = p + 1;

       but the more compressed form here is absolutely typical C style.
       The expression p++ gets the value of p, and then (after evaluation of
       the expression) increments the value of p. We read *p++ as *(p++),
       so the effect is to get the current character, and then bump p to
       point to the next character */
  }
  return count;
  /* return the result */
}

int main () {
  printf ("number of spaces = %d\n", nspaces ("the cat on the mat"));
  return 0;
}

We compile and bind using

      gcc scan.c

And we can run the program:

      C:\201>a
      number of spaces = 4

OK, let's look at the 16-bit Intel format assembler we might generate
for the NSPACES function:

P          EQU    WORD PTR [BP+4]     ; location of parameter
COUNT      EQU    WORD PTR [BP-2]     ; location of temp variable

NSPACES:   PUSH   BP                  ; establish frame
           MOV    SP, BP
           SUB    SP, 2               ; allocate space for count

           MOV    COUNT, 0            ; initialize count to zero

LP1:       MOV    BX, P               ; test *p
           CMP    BYTE PTR [BX], 0
           INC    P                   ; and increment p
           JZ     XIT

           CMP    BYTE PTR [BX], '='  ; test against equal
           JNE    LP1
           INC    COUNT               ; increment count if equal
           JMP    LP1

XIT:       MOV    AX, COUNT           ; load result
           MOV    SP, BP              ; strip stack
           POP    BP                  ; restore old bp
           RET                        ; return with result in AX

Again, let's look at what gcc actually generates in AT&T format (see the
file describing the recursive PUTDEC for a tutorial on AT&T style asm).

      gcc -S scan.c

Looking just at the nspaces function generated:

_nspaces:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$4, %esp
	movl	$0, -4(%ebp)
L2:
	movl	8(%ebp), %eax
	cmpb	$0, (%eax)
	jne	L4
	jmp	L3
L4:
	movl	8(%ebp), %eax
	incl	8(%ebp)
	cmpb	$32, (%eax)
	jne	L2
	leal	-4(%ebp), %eax
	incl	(%eax)
	jmp	L2
L3:
	movl	-4(%ebp), %eax
	leave
	ret

OK, let's take that apart a bit at a time:

_nspaces:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$4, %esp

This establishes the stack frame, and allocates space for COUNT (four bytes
on the 386, where integers are four bytes long = 32 bits).

	movl	$0, -4(%ebp)

That initializes COUNT to zero

L2:
	movl	8(%ebp), %eax
	cmpb	$0, (%eax)
	jne	L4
	jmp	L3

Just about what we had before, except that on the x86, we can use EAX as
an index register (any register can be used as a base or index register
on the 386 architecture). The jumps seem a bit silly, but this is running
gcc at the default -O0 (unoptimized) setting.

L4:
	movl	8(%ebp), %eax
	incl	8(%ebp)
	cmpb	$32, (%eax)
	jne	L2

That's the test for a space (the code for ' ' is decimal 32)

	leal	-4(%ebp), %eax
	incl	(%eax)
	jmp	L2

There's the (really silly) code for incrementing the count. Instead of
just doing an increment directly on -4(%ebp), the code loads the address
of the count and then does the incl using this address as an index. Well
we said this code was unoptimized!

L3:
	movl	-4(%ebp), %eax
	leave
	ret

That loads the result (at least we get a reasonable instruction that does
the load directly from COUNT here), and then strips the stack and returns.

Let's see if gcc can do a bit better in optimized mode

       gcc -S -O2 scan.c

_nspaces:
	pushl	%ebp
	xorl	%ecx, %ecx
	movl	%esp, %ebp
	movl	8(%ebp), %edx
	movb	(%edx), %al
	testb	%al, %al
	je	L8
	.p2align 4,,7
L6:
	incl	%edx
	cmpb	$32, %al
	je	L9
L2:
	movb	(%edx), %al
	testb	%al, %al
	jne	L6
L8:
	movl	%ecx, %eax
	popl	%ebp
	ret
	.p2align 4,,7
L9:
	incl	%ecx
	jmp	L2

OK, that's more like it:

_nspaces:
	pushl	%ebp
	xorl	%ecx, %ecx
	movl	%esp, %ebp

That's the normal establishing of the frame. Also COUNT wlil now be kept
in %ecx instead of as a local memory parameter, so no need to allocate
space for it.

	movl	8(%ebp), %edx

Loads the parameter into %edx where it will be accessed each time, instead
of having to access memory.

	movb	(%edx), %al
	testb	%al, %al
	je	L8

Everything goes into making the loop as fast as possible. The loop works
better if we can restructure it a bit, and the above code is to deal with
entering the loop the first time. The loop is going to assume that the
condition is true the first time in, so if this is not true, we must
exit right away.

	.p2align 4,,7

That makes sure we are aligned to a four byte boundary, jumps are
more efficient if they are to an aligned target.

L6:
	incl	%edx
	cmpb	$32, %al
	je	L9

The first part of the loop unconditionally increments the pointer and then
checks whether the character just loaded is a space (avoiding a duplicate
load). If it is a space, control jumps off to L9

L2:
	movb	(%edx), %al
	testb	%al, %al
	jne	L6

Otherwise we fall through to the second part of the loop which loads
the next character and loops back if it is still non-zero. This general
restructuring of a while loop so that there is only one conditional jump
at the end, instead of a conditional jump at the start and an unconditional
one at the end is definitely an efficiency improver. It does mean you have
to jump into the middle of the loop the first time to the test.

L8:
	movl	%ecx, %eax
	popl	%ebp
	ret

Move the result from ecx to eax to return it, then strip the stack and
return in the usual manner. Note that since this is a leaf procedure which
did not make any further calls, we did not have to worry about keeping the
stack pointer 16-byte aligned.

	.p2align 4,,7
L9:
	incl	%ecx
	jmp	L2

That's the last part of the loop, we jump here and back if we need to
increment the count. The compiler assumes that most of the time the if
will be false. Not clear how it guesses that, but it certainly guesses
right in this case.

The score in the loop:

In -O0 mode, for a space, 10 instructions, 5 memory references
             non-space:    7 instructions, 4 memory references

In -O2 mode, for a space,  8 instructions, 1 memory reference
             non-space:    6 instructions, 1 memory reference

So optimization sure helps!