## Parsing algorithms

Note: All these routines suppose that the grammar G is given in a set of CFG rules of the form "L -> RHS", where RHS is a non-empty string of symbols.

### Algorithm 1: Non-deterministic algorithm for parsing CFGs top-down.

```parse(in Q : sentence; G : set of CFG rules;
out T : parse tree)

begin initialize T to be a tree with root "S";
loop while there are non-terminal leaves in T
pick a non-terminal leaf L of T;
choose a rule in G of the form "L -> RHS"
if RHS is a word W
then choose an occurrence O of W in Q;
associate O with L;
if the order of leaves in T violates the order of
associated words in Q then fail endif;
else make children for L corresponding to RHS
endif
endloop
if not all words in Q are attached then fail;
end
```

### Algorithm 2: Deterministic algorithm for parsing CFG's top-down, left to right, using DFS.

```parse(in Q : sentence; G : set of CFG rules;
out T_OUT : parse tree)

begin initialize T_IN to be a tree with root "S";
parse1(Q,G,T_IN,T_OUT)
end

/* parse1 is a recursive routine that does the work. Note: each recursion
corresponds to one more step in building the tree. You are not recurring
down the parse tree.

parse1 returns TRUE if T_IN can be expanded into a complete parse tree T_OUTi
and FALSE if it cannot.

Note: The modifications to T_IN must be done non-destructively. That is,
the value of T_IN in a calling rountine must not be changed by its subroutines.
*/

parse1(in Q : sentence; G : set of CFG rules; T_IN : parse tree;
out T_OUT : parse tree) :  boolean;

begin if all the leaves in T_IN are terminals
then if words in Q are all attached to T_IN
then begin T_OUT:= T_IN;
return(TRUE)
end endif
else return(FALSE) /* there are words in Q unaccounted for */
endif
L := leftmost non-terminal in T_IN;
if L is a part-of-speech
then W := first word in Q not attached to T_IN;
if L is a possible part of speech for W
then begin make W a child of L;
return(parse1(Q,G,T_IN,T_OUT))
end
else return(FALSE) endif
else for each rule in G of form "L -> RHS"
create chidren for L with non-terminals RHS;
FOUND := parse1(Q,G,T_IN,T_OUT);
if FOUND then exitloop
endfor
return(FOUND)
endif

end
```

### Algorithm 3: Non-deterministic algorithm for parsing CFGs bottom-up.

```parse(in Q : sentence; G : set of CFG rules;
out T : forest of parse tree)
/* If the routine succeeds then at the end T will be a single parse tree */

begin initialize T to be the forest of trees where each tree contains one
word in Q;
repeat choose a sequence RHS of consecutive roots of trees in T;
find a rule of form "L -> RHS" in G;
make L the parent of RHS in T;
until T is a single tree;
end.
```

### Algorithm 4: Deterministic algorithm for parsing CFGs bottom-up, left-right using DFS.

```parse(in Q : sentence; G : set of CFG rules;
out T : parse tree)

begin initialize T to be the forest of trees where each tree contains one
word in Q;
Q1 := the list of the first word of Q;
parse1(Q,G,Q1,T_IN,T_OUT)
end

/* parse1 return TRUE if T_IN can be grown into a complete parse tree T_OUT
and FALSE if it cannot.

In each recursive call it either builds a node whose rightmost descendant
is the last word in Q1
or it extends Q1 by one more word in Q. This enables the state space to
be tree-structured (systematic).
*/

parse1(in Q : sentence; G : set of CFG rules; Q1 : list of words;
T_IN : forest of parse trees;
out T_OUT : parse tree) :  boolean;

begin if T_IN is a single tree containing all the words of Q
then begin T_OUT := T_IN;
return(TRUE)
end
FOUND := FALSE;
RL := all the roots of the trees in T_IN;
R := the last element of RL;
if R is a word of Q
then repeat for each possible part of speech P for R do
make a parent of M labelled R;
until(FOUND := parse1(Q,G,Q1,T_IN,T_OUT))
else for M in RL going from right to left do
RHS := the end of RL from M to R;
repeat for each rule of form "L -> RHS" in G do
make a parent of RHS labelled L
until(FOUND := parse1(Q,G,Q1,T_IN,T_OUT))
endfor