Description of Algol68Nix

 Prev   Top   Next 

Algol68Nix, the language implemented by the algol2MMIX compiler, is a sublanguage of the Algol68 programming language, designed to be as simple as possible, while remaining powerful enough to write complex programs such as a compiler for the language Algol68Nix itself.
A formal description of the BNF grammar for Algol68Nix can be found here. The rest of this document highlights some of the most important syntactic and semantic features of the language.

Following the indication given in class, Algol68Nix allows integers, characters and booleans as basic types (or modes, using Algol68-ish terminology), freeing itself from the intrinsic complexity of dealing with IEEE floating point numbers and related operations. The special basic mode VOID is also provided. As for complex modes, the language provides REF's, STRUCT's, monodimensional arrays (or rows) and procedures. As usual, pointers comes for free as REF REF modes.

The main syntactical conventions enforced by the grammar of Algol68Nix are the following:

To declare variables (i.e. REF modes), only the long form is allowed:
Illegal: Legal:
CHAR answer := 'Y' REF CHAR answer = LOC CHAR := 'Y'

The type-checker of the compiler allows any possible combination of mode constructors with the basic modes (in particular, recursive modes are allowed), but the code generation phase cannot deal with formal parameters of complex modes other than REFs. For this reason, string literals (that are often passed as a parameter to I/O procedures) have mode REF [] CHAR, instead of the more natural [] CHAR.

Within a block, declarations and units can be freely mixed (so a name can be declared just before being used, and not necessarily at the beginning of the block); the scope of each declared name is taken to be the entire block where the declaration lies. However, the binding of the name to its value actually takes place only when the flow of control reaches the declaration. This means that using a variable before its declaration should still be considered an error --- a semantic error, not a syntax error. E.g:

   INT x = y + 3 ;
   INT y = 2;


The value yielded by this block is not well-defined, because y is used before its declaration. In this specific case, in elaborating the declaration INT x = y + 3, algol2MMIX would assume a default value of 0 for y, thus assigning 3 to x.

Nevertheless, the possibility to declare names after their use is very helpful; in particular, it allows for recursive mode definitions and recursive functions definitions, without the need for C-like stub declarations (a.k.a. forward declarations). E.g:

  { definitions of the recursive modes NODE and LIST }
  MODE NODE = STRUCT (REF [] CHAR info, LIST next) ;


  { definitions of the mutually recursive procedures even and odd }
  PROC (INT) BOOL even = (INT n) BOOL :
    IF n = 0
       odd (n-1)
    FI ;
  PROC (INT) BOOL odd = (INT n) BOOL :
    IF n = 1
       even (n-1)
    FI ;

As the original language, Algol68Nix is a block-structured language: the basic building construct is the block, and the closed clause (i.e. the BEGIN ... END construct) is not the only way to define new blocks. In particular, both the IF...FI construct and the FOR...WHILE...DO...OD construct introduces several nested block, as in the full language (see figure), following the basic idea that a name is visible if its definition must have been elaborated (so that names introduced in the IF will be visible up to the FI, while names introduced in the THEN are not visible in the ELIF/ELSE part, etc.)

Nested blocks introduced by IF and DO constructs

The mode of a unit (the Algol68-ish term for what remains of statements and expressions after they have been combined into a unique language entity) depends not only on its intrinsic mode, but also on the context in which it is used. The mechanism under which the mode required in a given point of the program is derived from the actual mode of the unit is known as coercion. In Algol68Nix there are four coercions: deproceduring, dereferencing, weak-dereferencing and voiding. Which coercion is available when, and the order in which they should be applied, depends on the same rules as in the full language. For the implementation of the algol2MMIX compiler, chapter 10 of the book Programming in Algol68 Made Easy was used as reference.

The use of implicit mode conversions (or coercions) adds a subtlety in the implementation of construct such as IF...THEN...ELSE...FI, because at compile time it is not known which alternative will be undertaken. Therefore, both alternatives must yield the same mode: this is accomplished through a process known as clause balancing. In the implementation, clause balancing has been implemented and tested for simple IF constructs (i.e. with no ELIF part).

Formally, a program consists of the declaration of a unique procedure. Of course, nested procedure are allowed, so that a typical way to organize a program would be as a procedure main within which several other procedure are declared and invoked. Notice that the outermost procedure need not be called main: any (more creative) name is allowed.
In order to access command-line arguments, the outermost procedure should have the mode PROC (INT, REF [] REF [] CHAR) INT: following the conventions of the C language, the number of command-line arguments will be passed by the OS to the program as the first formal parameter, while the second formal parameter will by a reference to a row of strings (which, in turn, are references to row of characters).

In Algol68Nix, all the I/O happens using a rudimentary interface, whose functionalities are implemented in terms of the primitive functions provided by the OS for MMIX:

FOPEN(handle, name, mode)
FOPEN associates handle (a one-byte integer) to the file name and allows to do input/output on that file. Returns 0 if the file is opened with success, otherwise returns -1.
FCLOSE closes handle, if it is open. Returns 0 if the operation is successful, otherwise returns -1.
FREAD(handle, buffer, size)
FREAD reads the next size characters from handle and puts the result in buffer. If an error occurs, the value -1 is returned, otherwise 0 is returned.
FGETS(handle, buffer, size)
FGETS reads from handle until either size-1 characters have been read, or a newline character has been read and stored. If an error or end of file occurs, then -1 is returned; otherwise the number of characters successfully read is returned.
FWRITE(handle, buffer, size)
FWRITE writes size characters from buffer to the file associated with handle. If no error occurs, the 0 is returned, otherwise -1 is returned.
FPUTS(handle, string)
FPUTS write a string to the file associated with handle. If no error occurs, the 0 is returned, otherwise -1 is returned.
FPUTNL writes a newline on the file associated with handle. If no error occurs, the 0 is returned, otherwise -1 is returned.

Notice that, although functions for I/O with arbitrary files are provided, the algol2MMIX compiler does all its I/O through standard input/standard output.

The above mentioned functions, together with an exit function with a clear meaning, are implemented as intrinsic functions: the compiler knows what code to generate for them (essentially TRAP calls to the OS), but still their prototypes must be explicitly included in order to ensure a correct type-checking of a program. For example:

PROC( INT, REF [] REF [] CHAR) INT main = ( INT argc, REF [] REF [] CHAR argv) INT :

  PROC(INT, REF [] CHAR) INT fputs = SKIP ;
  PROC(INT) INT fputnl = SKIP ;

  PROC INT greet = INT :
    fputs(1, "Hello world") ;
  END ;


Another limitation of the current implementation of the algol2MMIX compiler with respect to the full Algol68 language is that the primary of a procedure call or of an array subscript must be an identifier. Indeed, this is not a crucial limitation: the following table illustrates two pieces of code that are illegal in Algol68Nix, and how they could be modified to make the compiler happy:

Illegal: Legal:
((INT x) INT : x * x) (5) PROC (INT) INT f = (INT x) INT : x * x ;
f (5)
MODE PERSON = (INT age, [10] CHAR name) ;

(name OF myself)[0] := 'A'
MODE PERSON = (INT age, [10] CHAR name) ;

REF [] CHAR myname = name OF myself ;
myname[0] := 'A'

Last modified: 08/30/02
Antonio Nicolosi