Lecture Material from 201

Robert Dewar

```#include <stdio.h>

/* A C file showing indexing through arrays */

/* We are going to write a function that is given an array of int
values (actually a pointer to the first of these values), and
a count of the number of elements in the array, and returns the
sum of the elements of the array */

/* In this first example, we use array indexing, and we spell things out
in a rather full non-C style way */

int count1 (int v[], int c) {
int j;
int sum = 0;
for (j = 0; j < c; j = j + 1) {
sum = sum + v[j];
}
return sum;
}

/* Things to observe about the above program. The for loop is really not a
for loop at all, but rather a while. The first expression in the for (in
this case "j = 0" is initialization done once at the start of the loop.
The second expression is a test done at the start of the loop, just as
in a while loop. The third expression is the step statement (it almost
certainly must have a side effect, and will typically increment the
loop variable). It goes at the end of the loop. Thus the FOR here is
simply a shorthand for:

j = 0;
while (j < c) {
sum = sum + v[j];
j = j + 1;
}

The declarer int v[] says that v is a vector of int's. In fact this is
virtually the same as int *v in C, i.e. simply a pointer to the first
element of the array, and the expression v[j] means exactly *(v + j).
Note here that we are not simply adding the value of j to the pointer.
Each element of the array is (typically) four bytes long, the space
needed to hold an int, and so the pointer value we actually compute
is v + 4*j. The C compiler knows that the elements are four bytes
long and automatically adds the multiplication by 4) */

/* In this second example, we simply rewrite the above function in a manner
that is a little more practical C style. We still use array indexing. */

int count2 (int *v, int c) {
int j;
int sum = 0;
for (j = 0; j < c; j++)
sum += v[j];
return sum;
}

/* We really didn't chnage much. The *v is in fact much more typical of the
way that a C programmer regards arrays, they really are seen as pointers.
The use of ++ and += are very typical uses of these compact C notations */

/* Our third example differs from the second only at one small point */

int count3 (int *v, int c) {
int j;
int sum = 0;
for (j = 0; j < c; j++)
sum += j[v];
return sum;
}

/* You have to have sharp eyes to see the difference. We have rewritten v[j]
as j[v]. What's going on? Indexing an int value using an array as the
subscript??? Well this just helps to emphasize that what may look like
a conventional array reference is really just a pointer operation.
The expression j[v] means *(j + v) which has exactly the same value
as *(v + j) since addition does not care which way round its operands
are. Of course this is a bit of a perverse way of writing things, but
it works just fine */

/* Our fourth example avoids array indexing completely, and really thinks
of things as pointers, and is most likely the way a C programmer would
in practice write this function. */

int count4 (int *v, int c) {
int j;
int sum = 0;
for (j = 0; j < c; j++)
sum += *v++;
return sum;
}

/* Two things to mention here. First the notation *v++ is one we have seen
before. It gets the value at *v (the next array element we want to read),
and then increments it by 1, but again the compiler automatically knows
that since this is a pointer to type int, it is appropriate to add the
length of int, to get to the next int, rather than actually adding 1.
Secondly, this function clobbers the value of v, but that's OK, because
the caller passed a copy of this value, not the variable itself, and it
is this copy that is being clobbered (at the assembly level, it is the
parameter location on the stack, or a copy of it, that is being changed.

/* A main program to test these out */

int main () {
int x[10] = {1,2,3,4,5,5,4,3,2,1};
printf ("sum = %d, %d, %d, %d\n",
count1(x, 10), count2(x,10), count3(x,10), count4(x,10));
}

OK, let's see if that works

>gcc -c indexing.c
>gcc indexing.o -oindexing.exe
>indexing
sum = 30, 30, 30, 30

Here the first command compiled indexing.c into the object file indexing.o.
This object file is almost machine code, except that it has blanks left
for calling printf, since printf is in some other file.

Generally speaking, names are gone from object files. For instance, let's
look for the name sum, one of our local variables in our count functions.
The way we do this is to use the nm command if we have that around (in a
windows system, you can get this by obtaining the cygwin tools, in a unix
evironment it will be there).

>nm indexing.o
00000000 b .bss
00000000 d .data
00000000 t .text
U ___main
U __alloca
00000000 T _count1
0000003f T _count2
0000007e T _count3
000000bd T _count4
0000010b T _main
U _printf

What we see here is the ASCII names that are present in the generated object
module, along with some information about them. The first three are internal
sections used to generate data. We can ignore those for now.

The entries for ___main, __alloca and _printf are the unresolved (U for
unresolved) external references, they represent places where this object
code makes calls to routines in other units. For now, we have no idea where
these units are. The T entries for _count1,2,3,4 and _main represent routines
in this object file which could be called from other files. Note that the
locations within the object file are also given, so the machine code for
_count3 for example starts at offset 0000007e from the start of the code
generated for this file.

There is no trace of sum, since this symbol is used only internally within
the indexing.c file, it cannot be referenced from outside, so all references
to sum can be completely converted to machine language by the first gcc
command, and there is no need to include the name in the object file.

The second gcc command is the "link" step. The linker takes the object
file you give it (indexing.o) in this case (in large programs, you will
give it more than one file). Then it looks at the undefined references
like _printf, and goes and finds the units containing these routines in
the library. It then glues together all the units of the program, and now
that all the routines are present, it can fill in the addresses of the
CALL instructions that could not be completed before, e.g. the call to
printf.

Finally our command indexing (we had specified that the executable name be
indexing.o) will run the program, and lo and behold, we get what we expect!

Now what ASM is being generated, well the interesting thing is that if we
get the assembler file:

gcc -S indexing.c

then looking at the resulting assembler file, we find that the sequence of
assembler instructions is identical for count1, count2, count3. The compiler
understood that all of these mean exactly the same. Here is the code for
count1 (count2 and count3 are the same):

_count1:
pushl	%ebp                ; establish frame
movl	%esp, %ebp
subl	\$8, %esp            ; space for two locals, j and sum
movl	\$0, -8(%ebp)        ; initialize sum to zero
movl	\$0, -4(%ebp)        ; initialize j to zoer
L2:                                 ; start of the loop
movl	-4(%ebp), %eax      ; get value of j
cmpl	12(%ebp), %eax      ; exit from loop if j >= c
jl	L5
jmp	L3
L5:
movl	-4(%ebp), %eax      ; value of j
leal	0(,%eax,4), %edx    ; 4*j into edx
movl	8(%ebp), %eax       ; value of pointer v
movl	(%eax,%edx), %edx   ; get v[j] (double indexing here)
leal	-8(%ebp), %eax      ; point to sum
leal	-4(%ebp), %eax      ; point to j
incl	(%eax)              ; increment it
L3:
movl	-8(%ebp), %eax      ; load sum to return it
leave                       ; destroy frame

The code for count4 is a little different

_count4:
pushl	%ebp                ; establish frame
movl	%esp, %ebp
subl	\$8, %esp            ; space for two locals, j and sum
movl	\$0, -8(%ebp)        ; initialize sum to zero
movl	\$0, -4(%ebp)        ; initialize j to zero
L17:                                ; start of loop
movl	-4(%ebp), %eax      ; get value of j
cmpl	12(%ebp), %eax      ; exit from loop if j >= c
jl	L20
jmp	L18                 ; [all the same up to here]

;;;;;;;;;;;;;;;;;; here is different ;;;;;;;;;;;;;;;
L20:
movl	8(%ebp), %eax       ; copy pointer v to eax
movl	(%eax), %edx        ; get value of referenced element in edx
leal	-8(%ebp), %eax      ; point to sum
;;;;;;;;;;;;;;;;;; end of difference ;;;;;;;;;;;;;;;;

leal	-4(%ebp), %eax      ; point to j
incl	(%eax)              ; increment it
L18:
movl	-8(%ebp), %eax      ; load sum to return it
leave                       ; destroy frame

So it seems that indeed the pointer approach is a little more efficient

But that was unoptimized code, let's try optimizing the code (or in other
words tell the compiler to spend more time trying to make the code more
efficient):

gcc -S -O2 -fomit-frame-pointer indexing.c

Here the -O2 says to optimize, and -fomit-frame-pointer says "don't worry
about debugging, if you can avoid using ebp as a frame pointer and improve

Let's look at the code for count4 first:

_count4:
movl	8(%esp), %edx       ; c will be held in edx throughout
xorl	%eax, %eax          ; count will be in eax, set to zero
movl	4(%esp), %ecx       ; v will be in ecx throughout
cmpl	%edx, %eax          ; check special case of edx zero
jge	L32                 ; immediate exit if so
.p2align 4,,7               ; make sure loop is aligned
L30:                                ; start of loop
addl	\$4, %ecx            ; bump pointer
decl	%edx                ; decrement count
jne	L30                 ; loop till count exhausted
L32:
ret                         ; return with result in eax

That's really quite impressive. A lot has gone on here. First indeed there
is no need to mess with ebp, since everything can be indexed off esp.
Second, the parameters and the local variables are all held in registers.
Third, the loop works by counting c down to zero, instead of up. That's
equivalent, and allows the compiler to get rid of j completely.

The loop now is only four instructions, and it is hard to see how it could
be improved beyond this.

The whole function is only 10 instructions, instead of 20. But there is a
cost, namely in ability to debug. A programmer stopping the program on a
breakpoint in the middle of the loop and asking for the value of j will
not get much joy, how can the debugger tell you the value of j if the
procedure has been essentially transformed from:

int count4 (int *v, int c) {
int j;
int sum = 0;
for (j = 0; j < c; j++)
sum += *v++;
return sum;

to:

int count4 (int *v, int c) {
int sum = 0;
if (c != 0) {
loop: sum += *v++;
if (--c) goto loop;
}
}

and there is no j in sight? Answer it probably will simply say that it knows
nothing about j. But of course the result is still correct.

What about count1/2/3. Well we might expect the amazing optimizer to generate
exactly the same code as count4, since after all the meaning is identical, but
we don't quite get that, instead we get (for all three):

_count1:
pushl	%ebx                ; save ebx, we are going to use it
xorl	%eax, %eax          ; count will be in eax, set it to zero
movl	12(%esp), %ecx      ; get count c in ecx
xorl	%edx, %edx          ; edx will hold j, set it to zero
movl	8(%esp), %ebx       ; get array pointer in ebx throughout
cmpl	%ecx, %eax          ; exit if special case of zero
jge	L8
.p2align 4,,7
L6:
incl	%edx                ; bump index
cmpl	%ecx, %edx          ; check if index at limit
jl	L6                  ; loop back if not
L8:
popl	%ebx                ; restore ebx

That's 13 instructions instead of 10, but the all important loop is still
only four instructions. However, the four instructions have to do quite a
bit more, look at the addl instruction: