Proggramming Languages assignment I

Solution to Assignment I




1. Consider the grammar:

         S -> aS | aSbS | epsilon

where S is the only non-terminal, and epsilon is the null string.

a) Show that the grammar is ambiguous, by giving two parse trees for the
string aab.

b) Show (by induction, e.g.) that this grammar generates strings such that in
any prefix of the string there are at least as many a's as b's. (Optional:
show that all such strings are generated by this grammar).

c) Find an unambiguous grammar that generates these strings.

Solution.
---------

a) The ambiguity is easy to show: you can derive the string aab as follows:
(at every step we expand the leftmost non-terminal);

S  -> aSbS -> aaSbS -> aabS -> aab
S  -> aS   -> aaSbS -> aabS -> aab

These two parses correspond to associating the b with the first or the
second a. This is somewhat analogous to the problem of the dangling else,
where the last else may be associated with the previous then or with an
earlier one. 

b) The proof that all prefixes have at least as many a's as b's follows from
the given productions. It is clearly true if the length of the string is
1 (it can only be a single a) or 2 (aa or ab). Assume that it is true for
all strings of length up to n. Then, using the given productions, we can
construct strings of length n+1. and longer, for which the property clearly
holds because we have added an a to any previous prefix. Therefore the
property holds for all n.

c) We can disambiguate by using a similar approach to the dangling else, and
decide that each b should be associated with the nearest a. This means that
the expansion within an ab pair should always be balanced. This leads to
the following grammar:

S -> a S | S1 S | epsilon
S1 -> a S1 S1 b | epsilon

It is easy to verify that this generates the same strings as the original
grammar, and the parse tree is always unique, because one b is always associated
with the most recent a.

Note that the answer is not necessarily unique. If the grammar is ambiguous,
it means that we get to choose between possible parses, and each choice is
in a sense a different language. For example, given the ambiguous grammar
for expressions:

  E -> E + E | E * E | id

We say that the unambiguous grammar we want is:

  E -> E + T | T,  T -> T * T | id

because it gives us the proper precedence between the two operators. But that
choice is in no way mandated by the grammar. We could just as well choose:

  E -> E * T | T , T -> T + T | id

which generates the same strings, but gives the opposite precedence to
operators.

2. Consider a simple language with the following non-terminals:

stat-seq   : a sequence of statements
if-stat    : if-statement, both branches have a sequence of statements
while-stat : a loop construct, whose body has a sequence of statements
func-def   : a function definition, whose body is a sequence of statements.
ret-stat   : a return statement
any-stat   : any of the above, assignment, goto, etc.,

Most languages have the semantic requirement that a function body have at least
one return statement somewhere, possibly nested in a loop or a conditional
statement.

For the non-terminals above, write the productions that enforce this rule,
namely that a function definition contain  at least one return statement.
You will want to introduce additional non-terminals.

Comment: this kind of semantic restriction can be expressed grammatically, but
as you will see, it makes the grammar larger and harder to understand, In
practice, this kind of restriction is better stated in prose, and such a rule
is not enforced by the parser of a compiler, but by a separate semantic check.
 
Solution.
---------

We start with the following grammar:

     stat-seq   -> {statement}*
     if-stat    -> IF expr THEN stat-seq ELSE stat-seq END
     while-stat -> WHILE expr DO stat-seq END
     ret-stat   -> RETURN expr
     any-stat   -> any statement, including assignments, etc.

We now introduce a new non-terminal, which is a sequence
of statements that must have  at least one return statement:

     ret-stat-seq -> stat-with-return stat-seq |
                     any-stat ret-stat-seq

   Now a statement with a return is one of the following

     stat-with-return -> ret-stat | ret-if-stat | ret-while-stat

   A ret-if-stat is an if statement that has at least one return

     ret-if-stat -> IF expr THEN ret-stat-seq ELSE stat-seq END  |
                    IF expr THEN stat-seq     ELSE ret-stat-seq END

   And similarly, a ret-while-stat is a while with at least one return

     ret-while-stat -> WHILE expr DO ret-stat-seq END

Now a function is defined as having a ret-stat-seq instead
of a stat-seq, and we know there is at least one return somewhere in
its body.