Typechecking Pointless Programs

Here we describe how to typecheck Pointless programs, by going through each piece that makes up a pointless program. Pointless programs consist of record declarations, global declarations, and function definitions. We cover each in turn, doing record declarations first, since they are easier, and should be done first. Record declarations should be done first, since we need them to typecheck functions and global declarations. The other part of typechecking is setting the position of each reference in memory.

Memory management

Luckily for the implementors of pointless, we have no pointers, and therefore no garbage, or garbage collection. Since everything is copied into and out of functions, simple stack based memory management is fine. To help us write the code generator, in addition to verifying the type correctness of a program, we also need to calculate the following:

  1. How much space each type will occupy. We assume that Ints, Bools, and Strings take up 1 unit of space, and a record is as big as the sum of all of its components. Since pointless has no pointers, recursive data structures will take up an infinite amount of space. Therefore we have no recursive types.
  2. The field offsets for each field in a record type.
  3. The offset of each local variable (including formal parameters) within the activation record of the function. For now, the offset of the first variable should be 0, and the offset of subsequent variables should be the sum of the sizes of the preceding variables.

Record Declarations

When you run across a record, you need to check that it makes sense, and record the information, so you can use it to typecheck function definitions later. We need to check the for the following, when we look at record declarations:

  1. Within any one record declaration, the same field name should not be used twice.
  2. Fields must have valid types.
  3. The same type cannot be declared twice.

Notice that we need to know about all the record declarations to check #2, above. This means you will have to typecheck record declarations in two passes. In addition to checking records for consistency, you will need to store them them in order to determine if a field reference item.field is valid.

Global Declarations

These are simply declarations of global variables. Their names and types should be entered into a table so that they can be used to typecheck functions. Additionally, the following should be checked for each declaration:

That's all there is to global declarations.

Function Definitions

Now comes the complicated part... typechecking functions. There are a number of steps to typechecking functions:

  1. Record the names of all the functions.
  2. Record the number and types of parameters for each function, as well as the return types.

    NOTE: you must do these two steps first on all functions before doing steps 3 and 4 on any of them. This is because functions have an annoying tendency to call each other.

  3. Ensure that the function returns the type it is supposed to. This involves ensuring the following are true.
  4. Ensure that the body of the function is a well-typed statement.

Statements

This of course depends on us being able to typecheck statements. There are several types of statements, and we cover each type in turn. Nevertheless we need to check that the expressions within the statement have consistent types. Unlike expressions, statements are not going to have a type, as a statement cannot be used as a value. Typechecking statements depends, of course, on being able to typecheck expressions, which we handle later. We now present each case below:

Declarations
Similarly to global declarations, above, we have to ensure that there are no names redeclared within the same function, and that the types used are valid. However, the declaration info should be stored in a separate table for each function, since two functions can obviously both define x, with different types. Also a function-level declaration can overshadow a global declaration. It is, however, an error to define a variable with the same name as one of the arguments, as arguments and function-level declarations both have the same scope.
Blocks
The key here is merely to ensure that every statement within the block is correctly typed.
Return statements
We should typecheck the expression being returned, and check that it has the same type as the return type.
If statements
An if statement can be looked at as if it were one IfClause, followed by any number of ElseIfClauses, and zero or one ElseClauses. Since IfClauses and ElseIfClauses are the same from a typechecking point of view, we can regard If statements as being one or more IfClauses, and possibly an ElseClause. For each IfClause, we need to ensure that the conditional expression typechecks as a boolean expression, and that the statement typechecks. For the ElseClause we merely need to ensure that the statement is correctly typed.
For statements
We need to do the following to typecheck For statements:
  1. Check that both the expression after the "=" and the expression after the "to" typecheck, and have type Int.
  2. Add the variable defined after the "for" to the system, as if it had been declared in a statement just before the for. This means it will be in scope at the beginning of the for, and will remain in scope until the end of the function. If two for statements use the same index variable, they use the same location for that index variable.
While Statements
All we need do here is ensure that the expression typechecks to a boolean expression, and that the statement typechecks.
Variable References (used in Assign Statements, and as a case of Expr)
A reference is valid if one of the following is true of it:
  1. It is of the form x, where x is a declared variable. In this case we say the reference has the same type the variable does.
  2. It is of the form R[expr], expr typechecks and has type int, and R is a valid reference of type T array. In that case, this reference has type T.
  3. It is of the form R.id, R is a valid reference of record type T, and T has a field named id, declared to have type U. In this case, the reference has type U.

Notice the recursion in parts 2 and 3. To check that x.id[455].y.x is valid, we first check that x.id[455].y is valid, which involves checking x.id[455], which involves verifying x.id, which in turn forces us to check x.

Assign Statements
An assign statement forces us to do three things. First we must make certain that the RefExpr is a valid reference, and determine that its type is T. Then we must typecheck the expression, to find out that it has type U. Finally we must make certain that T=U.

You may be asking yourself where function call and the like are. They are covered next, as expressions. They can both be used as statements, in which case the returned value is thrown away.

Expressions

We have, throughout our typechecking of statements had to typecheck an expression, and ensure that it has, say, type T. We describe how to typecheck an expression and determine its type. We can do this by typechecking each piece or child within the AST, and then looking at the types of the children to figure out whether the expression is correctly typed, and if so, what type it has. For convenience, we also want to label each expression node of the AST with its type. We cover each possible form of an expression E below:

(E1)
We simply typecheck E1, which has type T, say. This tells that E also has type T.
func(E1,...,En)
We first check that there is a function func, and make sure it has n arguments. Next we typecheck each of E1, ..., En, and make sure that the types we get back, say T1, ..., Tn match the argument types of func. If they do, then if func is declared to return U, we know E has type U.
E1[E2]
We typecheck E1 and E2, and ensure that E1 has type T array, and T2 has type Int. If this is true, E has type T.
E1.id
We typecheck E1. If it typechecks, with record type T, then we make sure type T has a field named id, which has been declared to have type U. In that case, E has type U.
id
This is merely a reference to a variable. We first look id up in the local scope. If it is found, and has type U, then we give E type U. Otherwise we look id up in the global scope. If it if found there with type U, then E has type U. Otherwise we cause a "variable not defined" error.
-E1
We typecheck E1, and make sure E1 has type Int. If so, then E has type Int.
E1 IntOp E2, where IntOp is one of {+,-,/,*}
We typecheck E1 and E2 and ensure they have both type Int. If so, then E has type Int.
E1 BoolOp E2, where BoolOp is one of {=,>,<}
We typecheck E1 and E2. If they both have type Bool, then E has type Bool.
IntConst
All integer literals have type Int.
StringConst
All string literals have type String.
BoolConst
true and false both have type Bool.
$func(E1,...,En):T
This is the syntax to make a C function call, such as printf, which would look like $printf("%d %d",x,y):Int. E1 through En are typechecked to make sure they typecheck. E is given type T. Since there is no way to check whether the C function even exists, or has the number and types of arguments you gave it, all of that is taken on faith. Messing these up will result in programs that segfault and dump core. On the plus side, at least searching for $ will help you find your problem. E is similarly given whatever type the user wants, T in this case. The arguments and return types to C-calls must be 32 bits long, to make implementing them easy.