Recursive Descent Parser for Natural Language

Input: A sentence Q and a grammar G.

Assume:

G is in Chomsky normal form; that is, all the rules have the simple form T -> S1 S2 ... Sk, where the right hand side has at least one symbol
There is no unit cycle; that is, no cycle of rules T1 -> T2, T2 -> T3 ... Tk -> T1 where every rule has only one symbol on the right hand side.

The lexicon is just a list for each word of its possible parts of speech.

Algorithm

rdParser(in S: sentence; G: grammar; L: lexicon;
         out P: set of parse trees)
{ var T: partially complete tree;

  T = a tree with one node marked "S";
  P = null set;
  rdParser1(S,G,L,T,P)
}

rdParser1(in S: sentence; G: grammar; L: lexicon, T: partial parse tree;
          out P: set of parse trees)
{ /* Base cases of the recursion */
   if  (every leaf of T is connected to a word in S) 
      /* Note: If it reaches here, then every word in S is connected to T */
    then  { add copy(T) to P;
            return; 
          }
   if (every leaf of T is connected to a word of S, but there are words in
         S not connected to T)
      then return;
   if (there are more leaves in T than words in S) then return;

   /* Recursive case */

  N = the leftmost leaf of T not connected to a word in S;
   if  (N is a part of speech (terminal symbol))
     then { W = the leftmost unattached word in S;
            if (L marks N as a possible p.o.s for W)
              then { attach W to N in T);
                     rdParser1(S,G,L,T,P);
                   }
              else return; 
          }
     else  /* N is a non-terminal */
       for each  rule N -> S1 ... Sk in G
         { make S1 ... Sk the children of N in T  
           rdParser1(S,G,L,T,P)
         }
}

Example

Grammar:
S -> NP VP
NP -> Name
NP -> Noun
VP -> VG
VP -> VG NP
VG -> Verb
VG -> Modal Verb

Lexicon:
ate -- Verb
can -- Modal, Verb, Noun
fish -- Noun, Verb
John -- Name.

Sentence: John ate fish.

Trace of execution.

The trace shows the structure of the recursive calls, marked by the value of T.