Typechecking Pointless Programs
Here we describe how to typecheck Pointless programs, by going through
each
piece that makes up a pointless program. Pointless programs consist of
record
declarations, global declarations, and function definitions. We cover
each in
turn, doing record declarations first, since they are easier, and should
be
done first. Record declarations should be done first, since we need them
to
typecheck functions and global declarations. The other part of
typechecking
is setting the position of each reference in memory.
Memory management
Luckily for the implementors of pointless, we have no pointers, and
therefore no garbage, or garbage collection. Since everything is copied
into
and out of functions, simple stack based memory management is fine. To
help us
write the code generator, in addition to verifying the type correctness of
a
program, we also need to calculate the following:
- How much space each type will occupy. We assume that Ints, Bools,
and
Strings take up 1 unit of space, and a record is as big as the sum of
all
of its components. Since pointless has no pointers, recursive data
structures will take up an infinite amount of space. Therefore we
have no
recursive types.
- The field offsets for each field in a record type.
- The offset of each local variable (including formal parameters)
within the activation record of the function. For now, the offset of the
first variable should be 0, and the offset of
subsequent variables should be the sum of the sizes of the preceding
variables.
Record Declarations
When you run across a record, you need to check that it makes sense,
and
record the information, so you can use it to typecheck function
definitions
later. We need to check the for the following, when we look at record
declarations:
- Within any one record declaration, the same field name should not be
used twice.
- Fields must have valid types.
- The same type cannot be declared twice.
Notice that we need to know about all the record declarations to check
#2, above. This means you will have to typecheck record
declarations in two passes. In addition to checking records for
consistency,
you will need to store them them in order to determine if a field
reference
item.field is valid.
These are simply declarations of global variables. Their names and
types
should be entered into a table so that they can be used to typecheck
functions. Additionally, the following should be checked for each
declaration:
- No two global declarations have the same name.
- All global declarations declare variables to have valid types.
That's all there is to global declarations.
Now comes the complicated part... typechecking functions. There are a
number of steps to typechecking functions:
- Record the names of all the functions.
- Record the number and types of parameters for each function, as well
as
the return types.
NOTE: you must do these two steps first on all
functions before doing steps 3 and 4 on any of them. This is because
functions have an annoying tendency to call each other.
- Ensure that the function returns the type it is supposed to. This
involves ensuring the following are true.
- If a function has a non-void return type, there is a return
statement on the last line of the function.
- All return statements return the correct type.
- Ensure that the body of the function is a well-typed statement.
Statements
This of course depends on us being able to typecheck statements. There
are
several types of statements, and we cover each type in turn. Nevertheless
we
need to check that the expressions within the statement have consistent
types.
Unlike expressions, statements are not going to have a type, as a
statement
cannot be used as a value. Typechecking statements depends, of course, on
being able to typecheck expressions, which we handle later. We now present
each case below:
- Declarations
- Similarly to global declarations, above, we
have
to ensure that there are no names redeclared within the same
function,
and that the types used are valid. However, the declaration info
should
be stored in a separate table for each function, since two functions
can
obviously both define x, with different types. Also a
function-level
declaration can overshadow a global declaration. It is, however, an
error to define a variable with the same name as one of the
arguments,
as arguments and function-level declarations both have the same
scope.
- Blocks
- The key here is merely to ensure that every statement within the
block
is correctly typed.
- Return statements
- We should typecheck the expression being returned, and check that
it
has the same type as the return type.
- If statements
- An if statement can be looked at as if it were one IfClause,
followed
by any number of ElseIfClauses, and zero or one ElseClauses. Since
IfClauses and ElseIfClauses are the same from a typechecking point
of
view, we can regard If statements as being one or more IfClauses,
and
possibly an ElseClause. For each IfClause, we need to ensure that
the
conditional expression typechecks as a boolean expression, and that
the
statement typechecks. For the ElseClause we merely need to ensure
that
the statement is correctly typed.
- For statements
- We need to do the following to typecheck For statements:
- Check that both the expression after the "=" and the
expression
after the "to" typecheck, and have type Int.
- Add the variable defined after the "for" to the system, as if
it
had been declared in a statement just before the for. This
means it
will be in scope at the beginning of the for, and will remain in
scope until the end of the function. If two for statements use
the
same index variable, they use the same location for that index
variable.
- While Statements
- All we need do here is ensure that the expression typechecks to a
boolean expression, and that the statement typechecks.
- Variable References (used in Assign
Statements,
and as a case of Expr)
- A reference is valid if one of the following is true of it:
- It is of the form x, where x is a declared variable. In this
case
we say the reference has the same type the variable does.
- It is of the form R[expr], expr typechecks and has type int,
and R
is a valid reference of type T array. In that case, this
reference
has type T.
- It is of the form R.id, R is a valid reference of record type
T,
and T has a field named id, declared to have type U. In this
case,
the reference has type U.
Notice the recursion in parts 2 and 3. To check that
x.id[455].y.x
is valid, we first check that x.id[455].y is valid, which involves
checking x.id[455], which involves verifying x.id, which in turn
forces
us to check x.
- Assign Statements
- An assign statement forces us to do three things. First we must
make
certain that the RefExpr is a valid reference,
and determine that its type is T. Then we must typecheck the
expression, to find out that it has type U. Finally we must make
certain that T=U.
You may be asking yourself where function call and the like are. They
are
covered next, as expressions. They can both be used as statements, in
which
case the returned value is thrown away.
Expressions
We have, throughout our typechecking of statements had to typecheck an
expression, and ensure that it has, say, type T. We describe how to
typecheck
an expression and determine its type. We can do this by typechecking each
piece or child within the AST, and then looking at the types of the
children
to figure out whether the expression is correctly typed, and if so, what
type
it has. For convenience, we also want to label each expression node of
the
AST with its type. We cover each possible form of an expression E
below:
- (E1)
- We simply typecheck E1, which has type T, say. This tells that E
also
has type T.
- func(E1,...,En)
- We first check that there is a function func, and make sure it has
n
arguments. Next we typecheck each of E1, ..., En, and make sure
that
the types we get back, say T1, ..., Tn match the argument types of
func.
If they do, then if func is declared to return U, we know E has type
U.
- E1[E2]
- We typecheck E1 and E2, and ensure that E1 has type T array, and
T2
has type Int. If this is true, E has type T.
- E1.id
- We typecheck E1. If it typechecks, with record type T, then we
make
sure type T has a field named id, which has been declared to have
type
U. In that case, E has type U.
- id
- This is merely a reference to a variable. We first look
id up in the local scope. If it is found, and has type
U,
then we give E type U. Otherwise we look id up in the
global scope. If it if found there with type U, then E has type U.
Otherwise we cause a "variable not defined" error.
- -E1
- We typecheck E1, and make sure E1 has type Int. If so, then E has
type Int.
- E1 IntOp E2, where IntOp is one of
{
+,-,/,*}
- We typecheck E1 and E2 and ensure they have both type Int. If so,
then
E has type Int.
- E1 BoolOp E2, where BoolOp is one
of
{
=,>,<}
- We typecheck E1 and E2. If they both have type Bool, then E has
type
Bool.
- IntConst
- All integer literals have type Int.
- StringConst
- All string literals have type String.
- BoolConst
- true and false both have type Bool.
- $func(E1,...,En):T
- This is the syntax to make a C function call, such as printf,
which
would look like $printf("%d %d",x,y):Int. E1 through En are
typechecked
to make sure they typecheck. E is given type T. Since there is no
way
to check whether the C function even exists, or has the number and
types
of arguments you gave it, all of that is taken on faith. Messing
these
up will result in programs that segfault and dump core. On the plus
side, at least searching for $ will help you find your problem. E is
similarly given whatever type the user wants, T in this case. The
arguments and return types to C-calls must be 32 bits long, to make
implementing them easy.