Generating Assembly Code
Due Sunday, December 23
Having generated intermediate code, you
are now faced with actually generating assembly.
Unless it is impossible for you, you should be generating assembly code
for an x86 machine running either
- Linux , with gcc installed, or
- Windows, with cygwin installed. To install cygwin on a windows machine, go
to http://www.cygwin.com
and click on "install now". When asked to select packages, click "devel" and then
scroll down to make sure that the box in the "bin" column next to
"gcc" is checked. This will ensure that gcc is included in the cygwin package.
If you do not have easy access to such a machine, please contact Amir.
On the course web page, there are several assembly references. The
first one to look at is
the Linux Assembly HOWTO, although the only sections that are
relevant are those for gcc and gas (the Gnu assembler). Then, look at
the 386
Assembly Reference Manual to see the syntax of the assembly instructions
that the Gnu assembler recognizes.
Here's the basic idea:
- Your compiler will generate an assembly file with a ".s" extension.
- The assembly file will be assembled and linked using gcc, as follows
gcc foo.s bar.c bif.o
where, for example, foo.s is the output of your compiler and bar.c and bif.o
are other files containing, say, library functions that can be called from
the source program.
- Programs written in the source language should be able to call C
procedures. Therefore, you should make sure that you use the
same calling convention (i.e. how parameters are passed, how result
values are returned from a function, which registers are saved by
whom) that gcc uses on your system.
An external C procedure can be used simply by putting a forward declaration
in the source program with the same name. For example, if the
external C procedure is of the form
void foo(int n)
{ ... }
then in the source program the forward declaration
Procedure foo(n: integer); forward
should appear.
The pre-defined procedures writestring, readstring,
writeint, and readint can simply be defined in C, as,
for example,
void writestring(char *s)
{ printf("%s",s);
}
The linker will then be able to link the code from your .s file and
the .c file together.
- The body of the main program (i.e. the outermost block) should be compiled
into an assembly procedure named
main. Thus,
this code will be the first code executed (just like the main()
procedure in a C program).
The very first thing you should
do is write a simple C program and look at the assembly code that gcc generates
for it. You do this by
gcc -S simple.c
where simple.c is your simple program. gcc will generate an assembly file
called simple.s . Your compiler should use the same naming convention (for
function names) and calling convention. Note that in some versions of
gcc/gas, function and procedure names have a "_" prepended to the front of
them (hence, those names in assembly look like "_main", "_foo", etc). In
other versions of gcc/gas, the names in assembly look just like the names
in C. Just take a look at the simple.s file that you generate to see
what your version of gcc does.
Please consult the directory
Assembler programs samples. This
directory contains in file "pas.lib.c" a C code for the I/O functions
"readint", "writeint", readstring", and "writestring". File
"pas.lib.s" contains assembler code for these four routines. In
addition, the directory contains 4 samples of assembly code that
invokes these routines.
Restrictions
To simplify the assignment, your compiler should not accept
declarations or operations on records. All assignments should be
between integers, booleans, and strings. Only integers, booleans, or
strings are acceptable as parameters.
Any violation of these restrictions should be flagged as an error with
the error message "Not currently implemented".
Sizes of Values
- Integers are 32-bit signed quantities
- Booleans occupy 32 bits. The false value is represented as 0 and
the true value as 1.
- The elements of an array are allocated contiguously. Each element
of an array occupies a multiple of 32 bits (you will notice that no
value of any type occupies less than 32 bits). On the stack, the
lowest indexed element of an array occupies the lowest address
(i.e. the address of a[n] is always lower than a[n+1]). We only allow
(possibly multi-dimensional) arrays of integers and booleans, but not
of strings. Arrays cannot be passed as parameters.
- A string value is the 32 bit address of a block of characters
allocated using the .ascii assembler directive. The only operation on
strings is assignment and parameter passing, which simply involves
copying the 32 bit address.
Parameter Passing
Parameter passing is completely by value. That is, the value of every
argument is copied into the activation
record of the called procedure or function. For a string parameter,
what is passed is the address of (pointer to) the string.
Returning Values
-
Your compiler will only have to support functions that return 32 bit
values (in particular, only integers, booleans, and strings). The
return value must be placed in the EAX register.
-
The return value of a function is specified by the assignment to a
variable with the same name as the function. To support this, a 32 bit
local variable should be allocated on the stack to hold the return
value. Thus, before a function returns, it should move the value of
that local variable into the EAX register.
Register Allocation
At this point, you are not expected to perform any kind of
sophisticated register allocation. Just be aware of which registers
have special functions (indexing, etc.). Also, once a variable's value
is moved into a register, try to use the register for subsequent
accesses to that variable (until it is overwritten). However, you
don't have to work hard to minimize loads and stores.
Compile-Time Information
Your compiler, during code generation, should keep track of at least the following information:
- Local Variables and Parameters: size, offset in stack frame, register location (if any)
- Expressions: Location of result
- Arrays: number of elements, size of the element type
- Registers: Which are available, which are taken (don't forget to consider their capabilities)
Code Generation
Code generation is divided up into roughly three parts:
- The layout/declarative section.
You'll have one of these at the top of your program, which will
have
the space for all the global variables. It will also have a global
string table, containing all the string constants used in the program,
so
that you can refer to them. After this, you will have the declaration
for
each function defined in your program. Functions will consist of the
following:
- the label for the function
- code saving the callee saved registers
- the code generated for the function itself
- code restoring the callee saved registers
- The actual code generated.
You will need to write a procedure in your compiler for each kind of
intermediate code construct (label, goto, binary op, etc.),
which generates an equivalent assembly language statement. Note that these
functions will of course have to use the space/type info, in order to
calculate offsets into the activation record, etc. For a label
statement, for example, you would have to generate a label. For an if
statement, you will have to generate a compare statement and then the
appropriate branch statement. Notice that param statements generally translate into
copying parameters to the stack before the function call (see
"parameter passing", above). Be sure to pass the parameters on the stack in such
a way that the first formal parameter ends up is nearest to the base pointer (i.e. frame pointer).
To accomplish this, arguments should be pushed onto the stack in reverse (i.e. right-to-left)
order.
- The details
This phase also has a number of details to get right, which will
likely
be the cause of many bugs. Expect to spend significant time debugging
your code, and allow for that time. A short list of things to watch
for
is:
- Are you using the correct calling conventions? This applies
also to
the entering and exiting of the whole program, as
main
is just another function to be called, from the OS's point of
view.
Messing up here will cause the program to crash before entering,
or
after exiting, which is frustrating.
- Are you generating a unique label every time you need one?
- Are you putting your formal parameters and local variables on
the
stack the same way your gcc does? Doing this will not only
ensure
that your program works, but that it is able to call external
C functions.
The registers %ebp, %esi, %edi, and %ebx
are callee-saved (i.e. saved upon entering a function and restored just
before exiting), the other
registers are caller-saved (i.e. saved right before a call to a function
and restored after the call finishes) ; %eax is to hold the result, or
%edx:%eax for 64-bit results. See the
Linux assembly HOWTO
document.
- If you have any bugs in your type-checker or intermediate code
generator, they will all come out and cause problems now!
- The following short
X86 reference will be quite useful.
- Among the provided sample programs, you should only test those
that do not contain errors, refer to records, or transfer arrays as
parameters. This includes the sample programs test0011.pas,
test0016.pas, test0017.pas, test0018.pas, test0019.pas, and
test0025.pas.
Good Luck!