Assignment III

Due date Oct. 21.

A context-free grammar is said to be in Chomsky normal form if all its productions are of the form:

A => BC, where A, B and C are non-terminal symbols

A => a where a is a terminal symbol.

1. The first part of this assignment is to implement an algorithm that transforms any grammar into Chomsky normal form (CNF). The transformation is performed in several steps:

a) Epsilon productions are removed: given a production of the form: A => epsilon, for every production in which A appears in the right-hand side, we add a production from which A has been removed, and delete the epsilon production.

b) Single productions are removed: given a production of the form: A => B, replace of occurrences of A in the right-hand side of productions with B, and remove the single production.

c) New non-terminals are introduced. If a production has the form A => aBCD.. introduce a new non-terminal and the production A' => A, and rewrite the production as A => A'BCD

d) New non-terminals are introduced for right-hand sides that have more than 2 symbols. For example, A => BCD becomes A => BB', B' => CD.

2. The CNF of an ambiguous grammar can be used to produce all the parses of a given sentence, using a dynamic programming algorithm known as the Cocke- Katami-Younger parsing method. Given a sentence of length N, the CKY algorithm builds a lower triangular N by N matrix X, in which X(i, j) is the set of symbols that can generate the string betwen the i'th and the j'th symbols. Each X (i, j) is computed by examining all pairs of entries X(i, k) and X (k+1, j). If symbol A1 is in X (i, k) and symbol B1 is in X (k+1. j), and if there is production C => A1 B1, then C belongs in X (i, j). If the language is ambiguous, each such entry will end up containing more that one symbol. The possible parses of the sentences can be read once X (1, N) has been computed. Part two of the assignment is the implementation of this algorithm. As a test case you can use the ambiguous grammar for expressions, E => E + E | E * E | id | (E)