Compilers

Start Lecture #9

5.2.5: Semantic Rules with Controlled Side Effects

Production	Semantic Rule	Type

D → T L	L.type = T.type	inherited

T → INT	T.type = integer	synthesized

T → FLOAT	T.type = float	synthesized

L → L₁ , ID	L₁.type = L.type	inherited
L → L₁ , ID	addType(ID.entry,L.type)	synthesized, side effect

L → ID	addType(ID.entry,L.type)	synthesized, side effect

When we have side effects, such as printing or adding an entry to a table, we must ensure that we have not added a constraint to the evaluation order that causes a cycle.

For example, the left-recursive SDD shown in the table above propagates type information from a declaration to entries in an identifier table.

The function addType adds the type information in the second argument to the identifier table entry specified in the first argument. Note that the side effect, adding the type info to the table, does not affect the evaluation order.

Draw the dependency graph on the board. Note that the terminal ID has an attribute, entry (given by the lexer), that gives its entry in the identifier table. The nonterminal L has (in addition to L.type) a dummy synthesized attribute, say AddType, that is a placeholder for the addType() routine. AddType depends on the arguments of addType(). Since the first argument comes from a child and the second is an inherited attribute of this node, we have legal dependences for a synthesized attribute.

Note that we have an L-attributed definition.
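One way to see the evaluation order is a small Python sketch that walks a hand-built parse tree for this SDD. The tuple encoding of parse-tree nodes, the dict used as the identifier table, and the use of the identifier's name in place of ID.entry are all assumptions for illustration. The inherited L.type travels down as a parameter, and the call to add_type plays the role of the dummy attribute AddType. The input FLOAT x,y is used so as not to give away the homework.

```python
# Hypothetical identifier table: identifier name -> type.
symtab = {}

def add_type(entry, typ):
    """Side effect: record typ in the identifier-table entry for `entry`."""
    symtab[entry] = typ

def eval_T(node):
    # T -> INT | FLOAT : T.type is synthesized from the keyword
    return {"INT": "integer", "FLOAT": "float"}[node[1]]

def eval_L(node, inh_type):
    # node is ("L", child_L, id_name) for L -> L1 , ID
    #      or ("L", None,    id_name) for L -> ID
    _, child, id_name = node
    if child is not None:
        eval_L(child, inh_type)      # L1.type = L.type (inherited)
    add_type(id_name, inh_type)      # addType(ID.entry, L.type)

def eval_D(node):
    # D -> T L : L.type = T.type
    _, t_node, l_node = node
    eval_L(l_node, eval_T(t_node))

# Parse tree for "FLOAT x,y": left recursion puts x in the deepest L.
tree = ("D", ("T", "FLOAT"), ("L", ("L", None, "x"), "y"))
eval_D(tree)
print(symtab)  # {'x': 'float', 'y': 'float'}
```

Because add_type only writes to the table and reads nothing that another rule depends on, it adds no new edges to the dependency graph.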

Homework: For the SDD above, give the annotated parse tree for

    INT a,b,c

5.3: Applications of Syntax-Directed Translations

5.3.1: Construction of Syntax Trees

Production	Semantic Rules

E → E₁ + T	E.node = new Node('+',E₁.node,T.node)
E → E₁ - T	E.node = new Node('-',E₁.node,T.node)
E → T	E.node = T.node
T → ( E )	T.node = E.node
T → ID	T.node = new Leaf(ID,ID.entry)
T → NUM	T.node = new Leaf(NUM,NUM.val)

Recall that a syntax tree (technically an abstract syntax tree) contains just the essentials. For example, 7+3*5 would have one + node, one * node, and the three numbers. Let's see how to construct the syntax tree from an SDD.

Assume we have two functions, Leaf(op,val) and Node(op,c1,...,cn), that create leaves and interior nodes, respectively, of the syntax tree. Leaf is called for terminals: op is the label of the node (op for operation) and val is the lexical value of the token. Node is called for nonterminals, and the ci's refer to (are pointers to) the children.

Production	Semantic Rules	Type

E → T E'	E.node=E'.syn	Synthesized
E → T E'	E'.node=T.node	Inherited

E' → + T E'₁	E'₁.node=new Node('+',E'.node,T.node)	Inherited
E' → + T E'₁	E'.syn=E'₁.syn	Synthesized

E' → - T E'₁	E'₁.node=new Node('-',E'.node,T.node)	Inherited
E' → - T E'₁	E'.syn=E'₁.syn	Synthesized

E' → ε	E'.syn=E'.node	Synthesized
T → ( E )	T.node=E.node	Synthesized
T → ID	T.node=new Leaf(ID,ID.entry)	Synthesized
T → NUM	T.node=new Leaf(NUM,NUM.val)	Synthesized

The first of the two tables above shows a left-recursive grammar that is S-attributed (so all attributes are synthesized).

Try this for x-2+y and see that we get the syntax tree.
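A minimal Python sketch of the two constructors, applied by hand to x-2+y in the order a bottom-up parse would reduce the productions of the left-recursive table. Storing the identifier's name in place of a real ID.entry (identifier-table pointer) is an assumption made for brevity.

```python
class Leaf:
    """Leaf(op, val): syntax-tree leaf for a terminal; val is its lexical value."""
    def __init__(self, op, val):
        self.op, self.val = op, val

    def __repr__(self):
        return f"Leaf({self.op}, {self.val!r})"


class Node:
    """Node(op, c1, ..., cn): interior node labeled op with children c1..cn."""
    def __init__(self, op, *children):
        self.op, self.children = op, list(children)

    def __repr__(self):
        kids = ", ".join(repr(c) for c in self.children)
        return f"Node({self.op!r}, {kids})"


# Semantic rules for x - 2 + y, in the order a bottom-up parse applies them.
p1 = Leaf("ID", "x")       # T -> ID; then E -> T just copies the pointer
p2 = Leaf("NUM", 2)        # T -> NUM
p3 = Node("-", p1, p2)     # E -> E1 - T
p4 = Leaf("ID", "y")       # T -> ID
p5 = Node("+", p3, p4)     # E -> E1 + T
print(p5)  # Node('+', Node('-', Leaf(ID, 'x'), Leaf(NUM, 2)), Leaf(ID, 'y'))
```

Note that E → T and T → ( E ) create nothing; they only pass a pointer upward.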

When we eliminate the left recursion, we get the second table above. It is a good illustration of dependencies. Follow it through and see that you get the same syntax tree as for the left-recursive version.
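The non-left-recursive SDD maps naturally onto recursive descent. In this Python sketch (a list of token strings as input and the name-for-entry shortcut are assumptions), the inherited E'.node is a parameter and the synthesized E'.syn is the return value, so the partial tree is built left to right and handed down.

```python
class Leaf:
    def __init__(self, op, val):
        self.op, self.val = op, val
    def __repr__(self):
        return f"Leaf({self.op}, {self.val!r})"

class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, list(children)
    def __repr__(self):
        return f"Node({self.op!r}, {', '.join(map(repr, self.children))})"

def parse_E(tokens):
    # E -> T E' : E'.node = T.node (inherited); E.node = E'.syn (synthesized)
    t_node = parse_T(tokens)
    return parse_E_prime(tokens, t_node)

def parse_E_prime(tokens, node):
    # `node` is the inherited E'.node; the return value is E'.syn
    if tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        t_node = parse_T(tokens)
        # E'1.node = new Node(op, E'.node, T.node); E'.syn = E'1.syn
        return parse_E_prime(tokens, Node(op, node, t_node))
    return node                          # E' -> eps : E'.syn = E'.node

def parse_T(tokens):
    tok = tokens.pop(0)
    if tok == "(":                       # T -> ( E ) : T.node = E.node
        node = parse_E(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected )")
        return node
    if tok.isdigit():                    # T -> NUM
        return Leaf("NUM", int(tok))
    return Leaf("ID", tok)               # T -> ID

tree = parse_E(["x", "-", "2", "+", "y"])
print(tree)  # Node('+', Node('-', Leaf(ID, 'x'), Leaf(NUM, 2)), Leaf(ID, 'y'))
```

The result matches the tree from the left-recursive grammar: the left associativity is recovered by passing the tree built so far down as an inherited attribute.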

Remarks:

You probably did/are-doing/will-do some variant of new Node and new Leaf for lab 3. When processing a production:
1. Create a parse tree node for the LHS.
2. Call subroutines for the RHS symbols and connect the resulting nodes to the node created in step 1.
3. Return a reference to the new node so the parent can hook it into the parse tree.
In the left-recursive table, it is the lack of a call to new in the third and fourth productions (E → T and T → ( E )) that causes the (abstract) syntax tree to be produced rather than the parse (concrete syntax) tree.
Production compilers do not produce parse trees; rather, they produce syntax trees. The syntax tree is smaller and hence more (space and time) efficient for subsequent passes that walk the tree. The parse tree is (I believe) slightly easier to construct, as you don't have to decide which nodes to produce; you simply produce them all.

5.3.2: The structure of a Type

This course emphasizes top-down parsing (at least for the labs) and hence we must eliminate left recursion. The resulting grammars often need inherited attributes, since operations and operands are in different productions. But sometimes the language itself demands inherited attributes. Consider two ways to declare a 3x4, two-dimensional array:

    array [3] of array [4] of int    and     int[3][4]

Assume that we want to produce the tree structure array(3, array(4, integer)) for either of the array declarations above. The tree structure is generated by calling a function array(num,type). Our job is to create an SDD so that the function gets called with the correct arguments.

For the first language representation of arrays (found in Ada and in lab 3), it is easy to generate an S-attributed (non-left-recursive) grammar based on
A → ARRAY [ NUM ] OF A | INT | REAL
This is shown in the table whose productions have A on the left-hand side.

Production	Semantic Rules	Type

T → B C	T.t=C.t	Synthesized
T → B C	C.b=B.t	Inherited

B → INT	B.t=integer	Synthesized
B → REAL	B.t=real	Synthesized

C → [ NUM ] C₁	C.t=array(NUM.val,C₁.t)	Synthesized
C → [ NUM ] C₁	C₁.b=C.b	Inherited

C → ε	C.t=C.b	Synthesized

Production	Semantic Rule

A → ARRAY [ NUM ] OF A₁	A.t=array(NUM.val,A₁.t)
A → INT	A.t=integer
A → REAL	A.t=real

On the board, draw the parse tree and see that the simple synthesized attributes above suffice.
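A recursive-descent sketch in Python for the A grammar, assuming whitespace-separated lowercase tokens and a hypothetical array() that returns a tuple. Only synthesized attributes are needed: each call returns A.t, and the outer call wraps the inner result.

```python
def array(num, typ):
    """Hypothetical tree constructor: an array of `num` elements of `typ`."""
    return ("array", num, typ)

def expect(tokens, want):
    if not tokens or tokens.pop(0) != want:
        raise SyntaxError(f"expected {want!r}")

def parse_A(tokens):
    # A -> ARRAY [ NUM ] OF A1 | INT | REAL   (only synthesized A.t)
    tok = tokens.pop(0)
    if tok == "array":
        expect(tokens, "[")
        num = int(tokens.pop(0))
        expect(tokens, "]")
        expect(tokens, "of")
        return array(num, parse_A(tokens))   # A.t = array(NUM.val, A1.t)
    if tok == "int":
        return "integer"                     # A.t = integer
    if tok == "real":
        return "real"                        # A.t = real
    raise SyntaxError(f"unexpected {tok!r}")

print(parse_A("array [ 3 ] of array [ 4 ] of int".split()))
# ('array', 3, ('array', 4, 'integer'))
```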

For the second language representation of arrays (the C style), we need some smarts (and some inherited attributes) to move the int all the way to the right. Fortunately, the result, shown in the table with productions for T, B, and C, is L-attributed and therefore all is well.
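The same idea in a Python sketch for the T/B/C grammar (token format and the tuple-returning array() are again assumptions). The base type B.t is read first but is needed at the innermost C, so it is passed down as the inherited C.b until C → ε copies it into C.t.

```python
def array(num, typ):
    """Hypothetical tree constructor: an array of `num` elements of `typ`."""
    return ("array", num, typ)

def parse_T(tokens):
    # T -> B C : C.b = B.t (inherited); T.t = C.t (synthesized)
    b = parse_B(tokens)
    return parse_C(tokens, b)

def parse_B(tokens):
    # B -> INT | REAL
    tok = tokens.pop(0)
    if tok == "int":
        return "integer"
    if tok == "real":
        return "real"
    raise SyntaxError(f"unexpected {tok!r}")

def parse_C(tokens, b):
    # C -> [ NUM ] C1 | eps.  `b` is the inherited C.b; returns C.t.
    if tokens and tokens[0] == "[":
        tokens.pop(0)
        num = int(tokens.pop(0))
        if not tokens or tokens.pop(0) != "]":
            raise SyntaxError("expected ]")
        c1_t = parse_C(tokens, b)        # C1.b = C.b
        return array(num, c1_t)          # C.t = array(NUM.val, C1.t)
    return b                             # C -> eps : C.t = C.b

print(parse_T("int [ 3 ] [ 4 ]".split()))  # ('array', 3, ('array', 4, 'integer'))
```

Both sketches yield the same tree, array(3, array(4, integer)), even though the int appears at opposite ends of the two declarations.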

Homework: 1.

5.4: Syntax-Directed Translation Schemes (SDTs)

Basically skipped.

The idea is that instead of the SDD approach, which requires that we build a parse tree and then perform the semantic rules in an order determined by the dependency graph, we can attach semantic actions to the grammar (as in chapter 2) and perform these actions during parsing, thus saving the construction of the parse tree.

But except for very simple languages, the tree cannot be eliminated. Modern commercial quality compilers all make multiple passes over the tree, which is actually the syntax tree (technically, the abstract syntax tree) rather than the parse tree (the concrete syntax tree).

5.4.1: Postfix Translation Schemes

If parsing is done bottom-up and the SDD is S-attributed, one can generate an SDT with all the actions at the end of the productions (hence, postfix). In this case each action is performed at the same time as the RHS is reduced to the LHS.
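To make "the action runs when the RHS is reduced" concrete, here is a small hand-coded shift-reduce parser in Python (not a table-driven LR parser) for the assumed grammar E → E + T | E - T | T, T → NUM. The stack holds (symbol, value) pairs, and each reduction performs its postfix action on the values it pops. Always reducing as soon as a handle is on top works here only because the grammar has a single level of left-associative operators.

```python
def parse_and_eval(tokens):
    """Shift-reduce parse of E -> E + T | E - T | T, T -> NUM, running each
    semantic action at the moment its production is reduced."""
    stack = []                      # (grammar symbol, attribute value) pairs
    toks = tokens + ["$"]           # "$" marks the end of the input
    i = 0
    while True:
        top = stack[-1][0] if stack else None
        if top == "NUM":            # reduce T -> NUM    { T.val = NUM.val }
            _, v = stack.pop()
            stack.append(("T", v))
        elif top == "T":
            _, t = stack.pop()
            if len(stack) >= 2 and stack[-1][0] in ("+", "-") and stack[-2][0] == "E":
                op = stack.pop()[0]
                _, e = stack.pop()
                # reduce E -> E1 op T   { E.val = E1.val op T.val }
                stack.append(("E", e + t if op == "+" else e - t))
            else:
                stack.append(("E", t))     # reduce E -> T   { E.val = T.val }
        elif toks[i] == "$":
            break
        else:                              # shift the next token
            tok = toks[i]
            stack.append((tok, None) if tok in ("+", "-") else ("NUM", int(tok)))
            i += 1
    if len(stack) != 1 or stack[0][0] != "E":
        raise SyntaxError("input is not a valid expression")
    return stack[0][1]

print(parse_and_eval("10 - 3 - 2".split()))  # 5
```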

5.4.2: Parser-Stack Implementation of Postfix SDTs

Skipped.

5.4.3: SDTs with Actions Inside Productions

Skipped

5.4.4: Eliminating Left Recursion from SDTs

Skipped

5.4.5: SDTs For L-Attributed Definitions

Skipped

5.5: Implementing L-Attributed SDD's

A good summary of the available techniques.

Build the parse tree and annotate. Works as long as no cycles are present (guaranteed by L- or S-attributed).
Build the parse tree, add actions, and execute the actions in preorder. Works for any L-attributed definition. Can add actions based on the semantic rules of the SDD. (Since actions are leaves of the tree, I don't see why preorder is relevant.)
Translate During Recursive Descent Parsing. See below.
Generate Code on the Fly. Also uses recursive descent, but is restrictive.
Implement an SDT during LL-parsing. Skipped.
Implement an SDT during LR-parsing of an LL Language. Skipped.

5.5.1: Translation During Recursive-Descent Parsing

Recall that in recursive-descent parsing there is one procedure for each nonterminal. Assume the SDD is L-attributed. Pass the procedure the inherited attributes it might need (different productions with the same LHS need different attributes). The procedure keeps variables for attributes that will be needed (inherited for nonterminals in the body; synthesized for the head). Call the procedures for the nonterminals. Return all synthesized attributes for this nonterminal.

5.5.2: On-the-fly Code Generation

5.5.3: L-attributed SDDs and LL Parsing

5.5.4: Bottom-Up Parsing of L-Attributed SDDs

Requires an LL (not just LR) language.

What is this all used for?

Assume we have a parse tree as produced, for example, by your lab 3. You now want to write the semantic analyzer, or intermediate code generator, and you have these semantic rules or actions that need to be performed. Assume the grammar is L-attributed, so we don't have to worry about dependence loops.

You start to write
analyze (tree-node)
This procedure is basically a big switch statement where the cases correspond to the different productions in the grammar. The tree-node is the LHS of the production and the children are the RHS. So by first switching on the tree-node and then inspecting enough of the children, you can tell the production.

As described in 5.5.1 above, you have received as parameters (in addition to tree-node), the attributes you inherit. You then call yourself recursively, with the tree-node argument set to your leftmost child, then call again using the next child, etc. Each time, you pass to the child the attributes it inherits.

When each child returns, it passes back its synthesized attributes.

After the last child returns, you return to your caller, passing back the synthesized attributes you have calculated.
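A minimal Python sketch of such an analyze routine. The PNode layout and the choice of computing expression values over the grammar E → T E', E' → + T E'₁ | - T E'₁ | ε, T → NUM are assumptions for illustration. Each if-branch is one case of the "big switch": it identifies the production from the node and its children, passes inherited attributes down as the inh parameter, and returns the synthesized attribute.

```python
from dataclasses import dataclass, field


@dataclass
class PNode:
    """Parse-tree node: a grammar symbol plus its children (hypothetical layout)."""
    label: str                                  # e.g. "E", "E'", "T", "NUM", "+"
    children: list = field(default_factory=list)
    val: object = None                          # lexical value, for NUM leaves


def analyze(node, inh=None):
    """One case per production. `inh` is the inherited attribute passed down;
    the return value is the synthesized attribute passed back up."""
    kids = node.children
    if node.label == "E":                       # E -> T E'
        t_val = analyze(kids[0])
        return analyze(kids[1], inh=t_val)      # E'.inh = T.val; E.val = E'.syn
    if node.label == "E'":
        if not kids:                            # E' -> eps
            return inh                          # E'.syn = E'.inh
        op = kids[0].label                      # E' -> + T E'1  or  - T E'1
        t_val = analyze(kids[1])
        new_inh = inh + t_val if op == "+" else inh - t_val
        return analyze(kids[2], inh=new_inh)    # E'.syn = E'1.syn
    if node.label == "T":                       # T -> NUM
        return kids[0].val
    raise ValueError(f"no case for {node.label!r}")


def t(v):                                       # parse-tree fragment for T -> NUM
    return PNode("T", [PNode("NUM", val=v)])


def e_tail(op, v, rest):                        # fragment for E' -> op T E'1
    return PNode("E'", [PNode(op), t(v), rest])


# Parse tree for 9 - 5 + 2
tree = PNode("E", [t(9), e_tail("-", 5, e_tail("+", 2, PNode("E'")))])
print(analyze(tree))  # 6
```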

Variations

Instead of a giant switch, you could have separate routines for each nonterminal as done in the parser and just switch on the productions having this nonterminal as LHS.
You could have separate routines for each production (requires the child tree node to indicate the production it corresponds to, not just the nonterminal, i.e. not just the LHS of the production).
If you like actions instead of rules, perform the actions where indicated in the SDT.
Global variables can be used (with care) instead of parameters.
As illustrated earlier, you can call routines instead of setting an attribute (see addType in 5.2.5).

Chapter 6: Intermediate-Code Generation

Homework: Read Chapter 6.

6.1: Variants of Syntax Trees

6.1.1: Directed Acyclic Graphs for Expressions

The difference between a syntax DAG and a syntax tree is that the former can have undirected cycles. DAGs are useful where there are multiple, identical portions in a given input. The common case of this is for expressions where there often are common subexpressions. For example in the expression
X + a + b + c - X + ( a + b + c )
each individual variable is a common subexpression. But a+b+c is not, since in the first occurrence X has already been added (the expression groups as ((X+a)+b)+c). This is a real difference when one considers the possibility of overflow or of loss of precision. The easy case is
x + y * z * w - ( q + y * z * w )
where y*z*w is a common subexpression.

It is easy to find such common subexpressions: the constructor Node() from 5.3.1 is modified to check whether an identical node already exists before creating a new one. So Node('/',left,right) first checks whether there is a node with op='/' and children left and right. If so, a reference to that node is returned; if not, a new node is created as before.
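A Python sketch of this checking constructor, assuming nodes are represented by small integer ids (a value-number style encoding) and that a dict serves as the lookup table; the DagBuilder class and its leaf/node methods are hypothetical names. Because children are ids, two identical subexpressions produce identical keys, so the second construction finds the first node.

```python
class DagBuilder:
    """Node/Leaf constructors that reuse an identical existing node.
    Nodes are small integer ids; a node's key is (op, operands...)."""

    def __init__(self):
        self.index = {}   # key -> node id
        self.nodes = []   # node id -> key

    def _find_or_make(self, key):
        if key not in self.index:            # no identical node yet: create one
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]               # otherwise reuse the existing node

    def leaf(self, op, val):
        return self._find_or_make((op, val))

    def node(self, op, left, right):
        return self._find_or_make((op, left, right))


# x + y * z * w - ( q + y * z * w ), with * and + left-associative
d = DagBuilder()
x, y, z, w, q = (d.leaf("ID", name) for name in "xyzwq")
yzw_1 = d.node("*", d.node("*", y, z), w)
left = d.node("+", x, yzw_1)
yzw_2 = d.node("*", d.node("*", y, z), w)    # finds the nodes built for yzw_1
right = d.node("+", q, yzw_2)
root = d.node("-", left, right)
print(yzw_1 == yzw_2, len(d.nodes))  # True 10
```

The DAG has 10 nodes, whereas the corresponding syntax tree would have 15 (the second y*z*w contributes five more).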

Homework: 1.