Compilers Lecture #8

Remark: There was a question last time about SLR concerning B⇒*ε. Consider A→α·Bβ. Can we consider the dot to be on the other side of B since B derives ε? I said I thought not and want to add that, since B derives ε, these productions will appear in the LR(0) automaton and hence will be taken care of without any extra rules here.

Remark: Do 7+6/3 on board using the SDD from the end of the previous lecture (should have been done last time).

5.1.2: Evaluating an SDD at the Nodes of a Parse Tree

If we are given an SDD and a parse tree for a given sentence, we would like to evaluate the annotations at every node. Since, for synthesized annotations parents can depend on children, and for inherited annotations children can depend on parents, there is no guarantee that one can in fact find an order of evaluation. The simplest counterexample is the single production A→B with synthesized attribute A.syn, inherited attribute B.inh, and rules A.syn=B.inh and B.inh=A.syn+1. This means to evaluate A.syn at the parent node we need B.inh at the child and vice versa. Even worse it is very hard to tell, in general, if every sentence has a successful evaluation order.

All this not withstanding we will not have great difficulty because we will not be considering the general case.

Annotated Parse Trees

Recall that a parse tree has leaves that are terminals and internal nodes that are non-terminals. We when we decorate the parse tree with attributes, the result is called an annotated parse tree, which is constructed as follows. Each internal node corresponds to a production with the symbol labeling the node the LHS of the production. If there are no attributes for the LHS in this production, we leave the node as it was (I don't believe this is a common occurrence). If there are k attributes for the LHS, we replace the LHS in the parse tree by k equations. The LHS of the equation is the attribute and the right hand side is its value. Note that the annotated parse tree contains all the information of the original parse tree since we replaced something like E with something like E.att=7.

Why Have Inherited Attributes?

Consider the following left-recursive grammar for multiplication of numbers and the parse tree on the right for 3*5*4.

It is easy to see how the values can be propagated up the tree and the expression evaluated.

When doing top-down parsing, we need to avoid left recursion. Consider the grammar below, which is the result of removing the left recursion, and again its parse tree is shown on the right. Try not to look at the semantic rules for the moment. 3*5*4

Production	Semantic Rules	Type

T → F T'	T'.lval = F.val	Inherited
T → F T'	T.val = T'.tval	Synthesized

T' → * F T₁'	T'₁.lval = T'.lval * F.val	Inherited
T' → * F T₁'	T'.tval = T'₁.tval	Synthesized

T' → ε	T'.tval = T'.lval	Synthesized

F → num	F.val = num.lexval	Synthesized

Now where on the tree should we do the multiplication 3*5? There is no node that has 3 and * and 5 as children. The second production is the one with the * so that is the natural candidate for the multiplication site. Make sure you see that this production (for 3*5) is associated with the blue highlighted node in the parse tree. The right operand (5) can be obtained from the F that is the middle child of this T'. F gets the value from its child, the number itself; this is an example of the simple synthesized case we have already seen, F.val=num.lexval (see the last semantic rule in the table).

But where is the left operand? It is located at the sibling of T' in the parse tree, i.e., at the F immediately to T's left. This F is not mentioned in the production associated with the T' node we are examining. So, how does T' get F.val from its sibling? The common parent, in this case T, can get the value from F and then our node can inherit the value from its parent.
Bingo! ... an inherited attribute. This can be accomplished by having the following two rules at the node T.
T.tmp = F.val
T'.lval = T.tmp

Since we have no other use for T.tmp, we combine the above two rules into the first rule in the table.

Now lets look at the second multiplication (3*5)*4, where the parent of T' is another T'. (This is the normal case. When there are n multiplies, n-1 have T' as parent and only one has T).

The red-highlighted T' is the site for the multiplication. However, it needs as left operand, the product 3*5 that its parent can calculate. So we have the parent (another T' node, the blue one in this case) calculate the product and store it as an attribute of its right child namely the red T'. That is the first rule for T' in the table.

We have now explained the first, third, and last semantic rules. These are enough to calculate the answer. Indeed, if we trace it through, 60 does get evaluated and stored in the bottom right T', the one associated with the ε-production. Our remaining goal is to get the value up to the root where it represents the evaluation of this term T and can be combined with other terms to get the value of a larger expression.

Going up is easy, just synthesize. I named the attribute tval, for term-value. It is generated at the ε-production from the lval attribute (which at this node is not a good name) and propagated back up. At the T node it is called simply val. At the right we see the annotated parse tree for this input.

Homework: Extend this SDD to handle the left-recursive, more complete expression evaluator given earlier in this section. Don't forget to eliminate the left recursion first.

Another question is how does the system figure out the evaluation order if one exists? That is the subject of the next section.

Remark: Consider the identifier table. The lexer creates it initially, but as the compiler performs semantic analysis and discover more information about various identifiers, e.g., type and visibility information, the table is updated. One could think of this is some inherited/synthesized attribute pair that during each phase of analysis is pushed down and back up the tree. However, it is not implemented this way; the table is made a global data structure that is simply updated. The the compiler writer must ensure manually that the updates are performed in an order respecting any dependences.

5.2: Evaluation Orders for SDD's

5.2.1: Dependency Graphs

The diagram on the right illustrates a great deal. The black shows the parse tree for the multiplication grammar just studied when applied to a single multiplication, e.g. 3*5. The synthesized attributes are shown in green and are written to the right of the grammar symbol at the node where they are defined. The inherited attributes are shown in red and are written to the left of the grammar symbol where it is defined.

Each green arrow points to the attribute calculated from the attribute at the tail of the arrow. These arrows either go up the tree one level or stay at a node. That is because a synthesized attribute can depend only on the node where it is defined and that node's children. The computation of the attribute is associated with the production at the node at its arrowhead. In this example, each synthesized attribute depends on only one other, but that is not required.

Each red arrow also points to the attribute calculated from the attribute at the tail. Note that two red arrows point to the same attribute. This indicates that the common attribute at the arrowheads, depends on both attributes at the tails. According to the rules for inherited attributes, these arrows either go down the tree one level, go from a node to a sibling, or stay within a node. The computation of the attribute is associated with the production at the parent of the node at the arrowhead.

5.2.2: Ordering the Evaluation of Attributes

The graph just drawn is called the dependency graph. In addition to being generally useful in recording the relations between attributes, it shows the evaluation order(s) that can be used. Since the attribute at the head of an arrow depends on the on the one at the tail, we must evaluate the head attribute after evaluating the tail attribute.

Thus what we need is to find an evaluation order respecting the arrows. This is called a topological sort. The rule is that the needed ordering can be found if and only if there are no (directed) cycles. The algorithm is simple.

If the algorithm succeeds in deleting all the nodes, then the deletion order is a suitable evaluation order and there were no directed cycles.

Homework: The topological sort algorithm is nondeterministic (Choose a node) and hence there can be many topological sort orders. Find all the orders for the diagram above (you should label the nodes so you can describe the orders).

5.2.3: S-Attributed Definitions

Given an SDD and a parse tree, it is easy to tell (by doing a topological sort) whether a suitable evaluation exists (and to find one).

However, a very difficult problem is, given an SDD, are there any parse trees with cycles in their dependency graphs, i.e., are there suitable evaluation orders for all parse trees. Fortunately, there are classes of SDDs for which a suitable evaluation order is guaranteed.

As mentioned above an SDD is S-attributed if every attribute is synthesized. For these SDDs all attributes are calculated from attribute values at the children since the other possibility, the tail attribute is at the same node, is impossible since the tail attribute must be inherited for such arrows. Thus no cycles are possible and the attributes can be evaluated by a postorder traversal of the parse tree.

Since postorder corresponds to the actions of an LR parser when reducing the body of a production to its head, it is often convenient to evaluate synthesized attributes during an LR parse.

5.2.4 L-Attributed Definitions

Unfortunately, it is hard to live without inherited attributes. So we define a class that permits certain kinds of inherited attributes. l-attributed

Definition: An SDD is L-Attributed if each attribute is either

Synthesized.
Inherited from the left, and hence the name L-attributed.
If the production is A → X₁X₂...X_n, then the inherited attributes for X_j can depend only on
1. Inherited attributes of A, the LHS.
2. Any attribute of X₁, ..., X_j-1, i.e. only on symbols to the left of X_j.
Attributes of X_j, *BUT* you must guarantee (separately) that these attributes do not by themselves cause a cycle.

Case three must be handled specially whenever it occurs. The top picture to the right illustrates what the first two cases look like and suggest why there cannot be any cycles. The picture below it corresponds to a fictitious R-attributed definition. One reason L-attributed definitions are favored over R, is the left to right ordering in English. See the example below on type declarations and also consider the grammars that result from left recursion.

Evaluating L-Attributed Definitions

The picture shows that there is an evaluation order for L-attributed definitions (again assuming no case 3). More formally, do a depth first traversal of the tree. The first time you visit a node, evaluate its inherited attributes (since you will know the value of everything it depends on), and the last time you visit it, evaluate the synthesized attributes. This is two-thirds of an Euler-tour traversal.

Homework: Suppose we have a production A → B C D. Each of the four nonterminals has two attributes s, which is synthesized, and i, which is inherited. For each set of rules below, tell whether the rules are consistent with (i) an S-attributed definition, (ii) an L-attributed definition, (iii) any evaluation order at all.

5.2.5: Semantic Rules with Controlled Side Effects

When we have side effects such as printing or adding an entry to a table we must ensure that we have not added a constraint to the evaluation order that causes a cycle.

For example, the left-recursive SDD shown in the table on the right propagates type information from a declaration to entries in an identifier table.

The function addType adds the type information in the second argument to the identifier table entry specified in the first argument. Note that the side effect, adding the type info to the table, does not affect the evaluation order.

Draw the dependency graph on the board. Note that the terminal ID has an attribute (given by the lexer) entry that gives its entry in the identifier table. The nonterminal L has (in addition to L.type) a dummy synthesized attribute, say AddType, that is a place holder for the addType() routine. AddType depends on the arguments of addType(). Since the first argument is from a child, and the second is an inherited attribute of this node, we have legal dependences for a synthesized attribute.

Production	Semantic Rule	Type

D → T L	L.type = T.type	inherited
T → INT	T.type = integer	synthesized

L → L₁ , ID	L₁.type = L.type	inherited
L → L₁ , ID	addType(ID.entry,L.type)	synthesized, side effect

L → ID	addType(ID.entry,L.type)	synthesized, side effect