Architecture of the algol2MMIX compiler
The algol2MMIX compiler was designed with the goal of achieving the bootstrap in mind. Although a complete bootstrap has not been achieved, the general organization of the compiler is probably not to be blamed: in particular, splitting the compiler in different modules for lexical analysis, parsing, semantic analysis and code generation , made it possible to obtain partial bootstrap results, indicating that the part of the compiler needing more careful debugging to get a full bootstrap is the semantic analyzer (specifically, the way data structures are maintained and handled).
When implementing a compiler in the same language as the source
language, without access to any other compiler for the given language,
two approaches may be taken: either to write the new compiler in
another language for which a working compiler exists, and then
translate (in a more or less automated fashion) the new compiler into
the source language, or to do it the other way around: write the new
compiler in a stylized subset of the source language in such a way
that it is possible to (semi-)automatically convert it
into another programming language which has already been implemented.
In our case, the choice for the "other" language went to the C programming language, mainly because it was reasonably similar to both the source language (since C is a descendent of Algol68, from which Algol68Nix has been derived) and the target language (the semantics of MMIX assembly is often given in term of equivalent C code). As for which of the above two "roads" to follow, a hybrid approach was taken: the new compiler has been written in a subset of the intersection of Algol68Nix and C. Then, taking advantage of the macro facilities provided by the C language, the code was automatically converted into C for testing and debugging, and semiautomatically into algol68Nix to obtain the final working version of the compiler in its own source language.
The parser implemented for the algol2MMIX works by recursive descent. The AST corresponding to a given sequence of tokens is built in a top-down fashion: at the beginning the parser creates a node to represent the whole program (which, in our case, is simply an identity declaration of the main procedure), and then it recursively creates nodes to represent the syntactic entities that constitutes a program. When the parser sees a sequence of tokens that correspond to one of the deepest node, it can tell that it is arrived to a leaf: it then fills the node with the information given by the token, and return from the recursive call. At the end of this process, the whole tree has been constructed, and the sequence of tokens will be completely exhausted.
To avoid exponential backup, the recursive descent is driven by the next token in the sequence. In a few cases this is not enough, so that more tokens of lookahead are necessary, or the grammar for the language needs to be accommodated. Two cases where such restrictions were deemed necessary for the algol68Nix language are procedure calls and array subscripting, as described in another part of this report.
A last remark about the recursive-descent parser implemented regards the data structure used to maintain the semantic attributes associated with each construct: instead of keeping a separated table usually referred to as the Symbol Table, all the relevant information are directly stored within each node of the tree. This leads to some amount of space waste (particularly given the absence of UNION's in the implemented sublanguage), but is conceptually clearer and allows for a simple way to handle input/output of ASTs.
The main semantic tasks carried out by the back-end are type-checking (procedure annotatemode) and coercion inference (here the term coercion inference is used to indicate the process of computing all the coercions that must be performed to convert the actual mode of a unit into the mode required in a given context; see procedures annotatecontext , admissiblemodes, iscoercibleto).
As a by-product of these activities, two tables are constructed: one to verify the well-formedness of (recursive) user-defined modes (that is also useful to check for "mode equality"; see usermodetable and procedures gatherusermodes, modecmp), and another to maintain all the information about the blocks introduced in the program (see blocktable and procedure gatherblock).
The code generation phase (procedure generatemms) starts generating a standard assembly preamble (procedure issuemmsheader) that sets up the Data Segment (that will contain the Stack of alive activation records), the Pool Segment (that will be used as the Heap for the allocation of objects with a dynamic lifetime), and the Text Segment (for the actual code). Then, the code generator recursively translates (procedure issuecode) each Algol68Nix construct (represented by a node of the AST) into the corresponding sequence of MMIX assembly instructions. For greater modularity, the procedure issuecode acts as dispatcher, inspecting the actual type type of the construct at hand, and then forwarding the call to a procedure of the form issuetype.
Among those ad hoc procedures, an interesting one is the procedure issueproccall, that implements the calling sequence designed for algol2MMIX (see figure).
The calling sequence, depicted in the figure above, has the caller doing almost all the work; if we let f be the procedure about to call another procedure g, then the caller (i.e. the procedure f) performs the following steps:
The only duty left to the callee (i.e. the procedure g) is to "clean up" after it has finished its computation: in particular, g should move the Stack Pointer SP to the cell currently pointed by the Frame Pointer FP, and the Frame Pointer FP back to the base of f's activation record (using the dynamic link), right before jumping back to the return address provided in the first octa of its own activation record; additionally, in the case g has a non-VOID return type, the computed value must be left in a special register, specifically reserved for this purpose.
|Last modified: 08/30/02||Antonio Nicolosi|