Start Lecture #11
Production | Semantic Rules |
---|---|
e → t | e.addr = t.addr |
 | e.code = t.code |
e → e1 + t | e.addr = new Temp() |
 | e.code = e1.code || t.code || gen(e.addr = e1.addr + t.addr) |
e → e1 - t | e.addr = new Temp() |
 | e.code = e1.code || t.code || gen(e.addr = e1.addr - t.addr) |
t → f | t.addr = f.addr |
 | t.code = f.code |
t → t1 * f | t.addr = new Temp() |
 | t.code = t1.code || f.code || gen(t.addr = t1.addr * f.addr) |
t → t1 / f | t.addr = new Temp() |
 | t.code = t1.code || f.code || gen(t.addr = t1.addr / f.addr) |
f → ( e ) | f.addr = e.addr |
 | f.code = e.code |
f → NUM | f.addr = get(NUM.lexeme) |
 | f.code = "" |
f → if | f.addr = if.addr |
 | f.code = if.code |
if → ID | if.addr = get(ID.lexeme) |
 | if.code = "" |
if → ID [ expressions ] | Done later |
if → ID ( expressions ) | Done later |
The goal is to generate 3-address code for expressions. We will generate it using the natural notation of 6.2. In fact we assume there is a function gen() that, given the pieces needed, does the proper formatting, so gen(x = y + z) will output the corresponding 3-address code. gen() is often called with addresses rather than lexemes like x. The constructor Temp() produces a new address in whatever format gen needs. Hopefully the tables will make this clear.
We will use two attributes, code and addr. For a parse tree node, the code attribute gives the three-address code to evaluate the input derived from that node. In particular, code at the root evaluates the entire expression.
The attribute addr at a node is the address that holds the value calculated by the code at the node. Recall that unlike real code for a real machine our 3-address code doesn't reuse addresses.
As one would expect for expressions, all the attributes in the table above are synthesized. The table is for the expression part of the lab 3 grammar. To save space let's use ID for IDENTIFIER, lv for lvalue, e for expression, t for term, f for factor, and if for identifier-factor.
Since our current objective is primarily to illustrate the usage of the code and addr attributes, we omit arrays and function calls within expressions.
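A minimal Python sketch of how the code and addr attributes combine, assuming a small hand-built parse tree rather than a real parser. The class names, the T1, T2, ... naming of temporaries, and the choice to hold code as a list of strings are illustrative assumptions; gen() here only formats one instruction.

```python
import itertools

_counter = itertools.count(1)

def new_temp():
    """Stand-in for the constructor Temp(): returns a fresh address name."""
    return f"T{next(_counter)}"

def gen(dest, left, op, right):
    """Stand-in for gen(): formats one three-address instruction."""
    return f"{dest} = {left} {op} {right}"

class Leaf:
    """f -> NUM or if -> ID: addr is the operand itself, code is empty."""
    def __init__(self, lexeme):
        self.addr = str(lexeme)
        self.code = []

class BinOp:
    """e -> e1 + t, e -> e1 - t, t -> t1 * f, t -> t1 / f."""
    def __init__(self, op, left, right):
        self.addr = new_temp()
        # code = left.code || right.code || gen(addr = left.addr op right.addr)
        self.code = left.code + right.code + [
            gen(self.addr, left.addr, op, right.addr)
        ]

# Tree for  a + b * 4 - c.  Unit productions such as e -> t only copy
# attributes upward, so this sketch gives them no node of their own.
tree = BinOp("-", BinOp("+", Leaf("a"), BinOp("*", Leaf("b"), Leaf(4))), Leaf("c"))
print("\n".join(tree.code))
```

Every attribute is computed from the children alone, which is what it means for the attributes to be synthesized.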
We saw this in chapter 2.
The method in the previous section builds long strings as attributes and requires that we walk the tree. By using an SDT instead of an SDD, you can output each piece of the code as soon as its node is processed, so the long strings are never built.
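As a rough illustration of the difference, the sketch below emits each instruction when its node is finished instead of concatenating code attributes. The tuple-based tree, the emitted list (standing in for an output file), and the temp naming are assumptions made for this example.

```python
import itertools

def translate(tree):
    """Postorder walk that emits instructions incrementally.

    tree is either a lexeme (str or int) or a tuple (op, left, right).
    Returns (addr, emitted), where emitted lists the instructions in the
    order they were output.
    """
    temps = itertools.count(1)
    emitted = []

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)              # leaf: an addr, nothing to emit
        op, left, right = node
        left_addr = walk(left)            # the children's code comes out first
        right_addr = walk(right)
        addr = f"T{next(temps)}"
        emitted.append(f"{addr} = {left_addr} {op} {right_addr}")
        return addr                       # only addr is passed up, no code string

    return walk(tree), emitted

addr, emitted = translate(("+", "a", ("*", "b", 4)))
print("\n".join(emitted))
```

Only the addr attribute travels up the tree; the code itself is output as a side effect.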
The idea is that you associate the base address with the array name. That is, the offset stored in the identifier table is the address of the first element of the array. The indices and the array bounds are used to compute the amount, often called the offset (unfortunately, we have already used that term), by which the address of the referenced element differs from the base address.
Production | Semantic Rules |
---|---|
fd → FUNC np RET t IS ds BEG s ss END ; | ds.offset = 0 |
pd → PROC np IS ds BEG s ss END ; | ds.offset = 0 |
np → di ( ps ) | not used yet |
np → di | not used yet |
ds → d ds1 | d.offset = ds.offset |
 | ds1.offset = d.newoffset |
 | ds.totalSize = ds1.totalSize |
ds → ε | ds.totalSize = ds.offset |
d → di : t ; | addType(di.entry, t.type) |
 | addBaseType(di.entry, t.basetype) |
 | addSize(di.entry, t.size) |
 | addOffset(di.entry, d.offset) |
 | d.newoffset = d.offset + t.size |
t → ARRAY [ NUM ] OF t1 ; | t.type = array(NUM.value, t1.type) |
 | t.basetype = t1.basetype |
 | t.size = NUM.value * t1.size |
t → INTEGER | t.type = integer |
 | t.basetype = integer |
 | t.size = 4 |
t → REAL | t.type = real |
 | t.basetype = real |
 | t.size = 8 |
To implement this technique, we store the base type of each identifier in the identifier table. For example, consider
    arr : array [ 10 ] of integer ;
    x : real ;
Our previous SDD for declarations calculates the size and type of each identifier. For arr these are 40 and array(10,integer). The enhanced SDD above calculates, in addition, the base type. For arr this is integer. For a scalar, such as x, the base type is the same as the type, which in the case of x is real.
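The declarations SDD can be sketched in Python as follows. This is only an approximation under stated assumptions: the identifier table is a plain dictionary, the Type class and the declare() helper are hypothetical names, and the inherited offset attribute is modeled by a running variable.

```python
from dataclasses import dataclass

@dataclass
class Type:
    type: str       # e.g. "integer" or "array(10,integer)"
    basetype: str   # "integer" or "real"
    size: int       # width in bytes

INTEGER = Type("integer", "integer", 4)   # t -> INTEGER
REAL = Type("real", "real", 8)            # t -> REAL

def array(num, elem):
    """t -> ARRAY [ NUM ] OF t1"""
    return Type(f"array({num},{elem.type})", elem.basetype, num * elem.size)

def declare(decls):
    """ds -> d ds1 | ε : thread the offset through the declarations.

    decls is a list of (name, Type) pairs. Returns (table, total_size).
    """
    table, offset = {}, 0                  # ds.offset = 0
    for name, t in decls:
        table[name] = {"type": t.type, "basetype": t.basetype,
                       "size": t.size, "offset": offset}
        offset += t.size                   # d.newoffset = d.offset + t.size
    return table, offset                   # ds.totalSize

table, total = declare([("arr", array(10, INTEGER)), ("x", REAL)])
print(table["arr"])
print(table["x"])
print(total)
```

For arr the entry records size 40, type array(10,integer), and base type integer, matching the values stated above.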
Instead of a column distinguishing synthesized and inherited attributes, I now highlight in pink the inherited ones. This is not needed; you can look at the LHS of a rule and see if the rule is inherited or synthesized.
Calculating the address of an element of a one dimensional array is easy. The address increment is the width of each element times the index (assuming indexes start at 0). So the address of A[i] is the base address of A, which is the offset component of A's entry in the identifier table, plus i times the width of each element of A.
The width of each element is the width of what we have called the base type. So for an ID the element width is sizeof(getBaseType(ID.entry.type)). For convenience we define getBaseWidth by the formula
    getBaseWidth(ID.entry) = sizeof(getBaseType(ID.entry.type))
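A small worked instance of the one-dimensional formula, written as Python. The function name and the concrete base addresses are assumptions chosen for illustration.

```python
def element_address(base, index, base_width):
    """Address of A[index] for a one-dimensional array indexed from 0."""
    return base + index * base_width

# Assumed layout: an integer array (element width 4) at offset 0,
# followed by a real array (element width 8) at offset 40.
print(element_address(0, 3, 4))    # element 3 of the integer array
print(element_address(40, 2, 8))   # element 2 of the real array
```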
Let us assume row major ordering. That is, the first element stored is A[0,0], then A[0,1], ... A[0,k-1], then A[1,0], ... . C and most languages derived from it use row major ordering; Fortran uses column major.
With the alternative column major ordering, after A[0,0] comes A[1,0], A[2,0], ... .
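For two dimensional arrays the address of A[i,j] is the sum of three terms:
1. The base address of A.
2. The distance from the start of A to the start of row i, which is i times the width of a row (the number of elements in a row times the element width).
3. The distance from the start of row i to A[i,j], which is j times the element width.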
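The two-dimensional calculation can be sketched as below, with the column major variant included for contrast. The function names and the sample dimensions (5 rows of 9 reals, base address 0) are assumptions for illustration.

```python
def address_row_major(base, i, j, num_cols, width):
    """base + i * (width of a row) + j * (width of an element)."""
    return base + i * (num_cols * width) + j * width

def address_col_major(base, i, j, num_rows, width):
    """Column major: a column is contiguous, so j selects the column first."""
    return base + j * (num_rows * width) + i * width

# A has 5 rows and 9 columns of reals (width 8) and starts at address 0.
print(address_row_major(0, 2, 3, 9, 8))
print(address_col_major(0, 2, 3, 5, 8))
```

Note that row major needs the number of columns and column major needs the number of rows; in neither case is the other bound needed to locate an element.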
Remark: Our grammar really declares one-dimensional arrays of one-dimensional arrays rather than 2D arrays. I think this makes it easier. We could make the SDD above fancier and capture, for a declaration like
    A : array [5] of array [9] of real;
all the values needed to compute the offset of an element. However, we won't do this and for lab4 will only have 1D arrays. The generalization to higher dimensional arrays is clear.
Consider the following expression containing a simple array reference, where a and c are integers and b is a real array.
    a = b[3*c]
We want to generate code something like
    T1 = 3 * c      // i.e. mult T1,3,c
    T2 = T1 * 8     // each b[i] is size 8
    a  = b[T2]      // Uses the x[i] special form
If we considered it too easy to use the special form we would generate something like
    T1 = 3 * c
    T2 = 8 * T1
    T3 = &b
    T4 = T2 + T3
    a  = *T4
Production | Semantic Rules |
---|---|
if → ID [ e ] (using the x[i] form) | if.t1 = new Temp() |
 | if.addr = new Temp() |
 | if.code = e.code || gen(if.t1 = e.addr * getBaseWidth(ID.entry)) || gen(if.addr = get(ID.lexeme)[if.t1]) |
if → ID [ e ] (without the x[i] form) | if.t1 = new Temp() |
 | if.t2 = new Temp() |
 | if.t3 = new Temp() |
 | if.addr = new Temp() |
 | if.code = e.code || gen(if.t1 = e.addr * getBaseWidth(ID.entry)) || gen(if.t2 = &get(ID.lexeme)) || gen(if.t3 = if.t2 + if.t1) || gen(if.addr = *if.t3) |
To include arrays we need to specify the semantic actions for the production
    identifier-factor → IDENTIFIER [ expressions ]
Since, at least for now, we will limit ourselves to one-dimensional arrays, we replace expressions by simply expression, which we abbreviate as e. The table above does this in two ways, both with and without using the special addressing form x[i].
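Both versions of the array-reference rule can be sketched in Python. The BASE_WIDTH dictionary is a hypothetical stand-in for getBaseWidth(ID.entry), and the caller is assumed to have already translated the index expression and to pass in its addr and code.

```python
import itertools

BASE_WIDTH = {"b": 8}          # assumed identifier table: b is an array of reals

def array_ref(name, e_addr, e_code, temps, special_form=True):
    """if -> ID [ e ] : returns (addr, code) for the reference name[e]."""
    t1 = f"T{next(temps)}"
    code = e_code + [f"{t1} = {e_addr} * {BASE_WIDTH[name]}"]
    if special_form:
        addr = f"T{next(temps)}"
        code.append(f"{addr} = {name}[{t1}]")
    else:
        t2, t3, addr = (f"T{next(temps)}" for _ in range(3))
        code += [f"{t2} = &{name}", f"{t3} = {t2} + {t1}", f"{addr} = *{t3}"]
    return addr, code

# The index expression 3*c has already been translated into T1.
index_code = ["T1 = 3 * c"]
addr1, code1 = array_ref("b", "T1", index_code, itertools.count(2))
addr2, code2 = array_ref("b", "T1", index_code, itertools.count(2),
                         special_form=False)
print("\n".join(code1))
print("\n".join(code2))
```

The result lands in a fresh temporary rather than directly in a, since the rule for the factor knows nothing about the enclosing assignment.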
Normally lisp is taught in our programming languages course, which is a prerequisite for compilers. If you no longer remember lisp, don't worry. In lisp most expressions are evaluated by one uniform rule, but there are a few special forms that are evaluated differently. In three-address code, x = a[i] is a special form in that, unlike the normal rules for three-address code, we don't use the address of i but instead its value. Specifically the value of i is added to the address of a.
I also included a version without using a[i] for two reasons.
It was definitely instructive for me! The rules for addresses in 3-address code also include
    a = &b
    a = *b
    *a = b
which are other special forms. They have the same meaning as in the C programming language.
Let's carefully evaluate the simple example above.
This is an exciting moment. At long last we really seem to be compiling!
Production | Semantic Rules |
---|---|
ids → ID ra | ra.id = ID.entry |
 | ids.code = ra.code |
ra → := e ; | ra.code = e.code || gen(ra.id.lexeme = e.addr) |
ra → [ e ] := e1 ; | ra.t1 = new Temp() |
 | ra.code = e1.code || e.code || gen(ra.t1 = getBaseWidth(ra.id) * e.addr) || gen(ra.id.lexeme[ra.t1] = e1.addr) |
Now that we can evaluate expressions (even including one-dimensional array references) we need to handle the left-hand side of an assignment statement (which also can be an array reference). Specifically we need semantic actions for the following productions from the lab 3 grammar.
    identifier-stmt → IDENTIFIER rest-of-assignment
    rest-of-assignment → = expression ;
    rest-of-assignment → [ expressions ] = expression ;
Once again we restrict ourselves to one-dimensional arrays, which corresponds to replacing expressions by expression in the last production.
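The two assignment forms can be sketched as follows. As before, BASE_WIDTH is a hypothetical stand-in for the base-width lookup in the identifier table, the function and parameter names are invented for this example, and no type checking is attempted.

```python
import itertools

BASE_WIDTH = {"a": 8}   # assumed identifier table: a is an array of reals

def assign(name, rhs_addr, rhs_code, index_addr=None, index_code=(), temps=None):
    """Code for  name = e ;  or, when index_addr is given,  name [ e ] = e1 ;"""
    if index_addr is None:
        # scalar target: the right-hand side code, then the copy
        return list(rhs_code) + [f"{name} = {rhs_addr}"]
    # array target: e1.code || e.code || scale the index || indexed store
    t1 = f"T{next(temps)}"
    return (list(rhs_code) + list(index_code)
            + [f"{t1} = {BASE_WIDTH[name]} * {index_addr}",
               f"{name}[{t1}] = {rhs_addr}"])

# w = p + q ;  where p + q was translated into T1
scalar = assign("w", "T1", ["T1 = p + q"])

# a[i + 1] = c * 2 ;  index translated into T1, right-hand side into T2
indexed = assign("a", "T2", ["T2 = c * 2"], index_addr="T1",
                 index_code=["T1 = i + 1"], temps=itertools.count(3))
print("\n".join(scalar))
print("\n".join(indexed))
```

The code for the right-hand side precedes the code for the index, following the order in which the rule concatenates e1.code and e.code.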
Recall the program we could partially handle.
    procedure test () is
        y : integer;
        x : array [10] of real;
    begin
        y = 5;       // we haven't yet done statements
        x[2] = y;    // type error?
    end;
Now we can do the statements.
What about the possible type error?
Let's take the last option.
Homework: What code is generated for the program written above?