G22.2591 - Advanced Natural Language Processing - Spring 2004

Assignment #2

Due 8 am Monday February 9

Manning and Schütze provide a very simple example of an HMM, the "crazy soda machine", described on p. 321.  Rather than give you the drink you want, it dispenses drinks at random; it can dispense cola, iced tea, and lemonade.  To complicate matters a bit more, it has two states, the cola-preferring state (CP) and the iced-tea-preferring state (IP).  It switches randomly between these two states and provides no external indication of which state it is in, only the indirect indication of which drink comes out.  That is why it has to be modeled by an HMM.

It always starts out in state CP.  From state CP, the transition probability to state IP is 0.30;  the probability of remaining in CP is 0.70.  From state IP, it has a probability of 0.50 of staying in IP, and 0.50 of changing to state CP.  The drink emission probabilities are as follows:


            cola      iced tea     lemonade
    CP      0.6       0.1          0.3
    IP      0.1       0.7          0.2
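
For concreteness, here is a minimal sketch (not part of Wunsch's code; all names are illustrative) of how these parameters can be encoded and how the forward algorithm computes the probability of an output sequence.  Note that the exact value depends on the emission convention (whether a drink is emitted before or after the state transition), which may differ between the textbook and Wunsch's implementation.

    public class SodaForward {
        // initial state probabilities:  the machine always starts in CP (state 0)
        static double[] pi = {1.0, 0.0};
        // transition probabilities a[i][j] = P(next state = j | current state = i)
        static double[][] a = {{0.7, 0.3},
                               {0.5, 0.5}};
        // emission probabilities b[i][k] = P(drink k | state i);  0 = cola, 1 = iced tea, 2 = lemonade
        static double[][] b = {{0.6, 0.1, 0.3},
                               {0.1, 0.7, 0.2}};

        // forward algorithm:  returns P(observation sequence | model)
        static double sequenceProbability(int[] obs) {
            int n = pi.length;
            double[] alpha = new double[n];
            for (int i = 0; i < n; i++)                  // initialization
                alpha[i] = pi[i] * b[i][obs[0]];
            for (int t = 1; t < obs.length; t++) {       // induction over the sequence
                double[] next = new double[n];
                for (int j = 0; j < n; j++) {
                    double sum = 0.0;
                    for (int i = 0; i < n; i++)
                        sum += alpha[i] * a[i][j];
                    next[j] = sum * b[j][obs[t]];
                }
                alpha = next;
            }
            double total = 0.0;                          // termination:  sum over final states
            for (int i = 0; i < n; i++)
                total += alpha[i];
            return total;
        }

        public static void main(String[] args) {
            int[] seq = {2, 1, 0};                       // "lemonade iced-tea cola"
            System.out.println(sequenceProbability(seq));
        }
    }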

We want to see how we can learn the HMM parameters for the soda machine using the forward-backward (Baum-Welch) algorithm (we assume we are told that it is a two-state machine).  We will see that the learning process may be trickier than the textbook makes it seem.  Holger Wunsch (from Tübingen) has done the programming work for us; his Java code is available as part of the NLP resources page for Tübingen.  Your job is to run his code and explain some of the observations.
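
For reference, one Baum-Welch iteration computes forward probabilities alpha and backward probabilities beta for the training sequence O and then re-estimates the parameters roughly as follows (standard textbook formulas, written here in generic HMM notation rather than Manning and Schütze's exact indexing):

\[
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \mu)}, \qquad
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \mu)}
\]
\[
\bar{\pi}_i = \gamma_1(i), \qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\bar{b}_j(k) = \frac{\sum_{t:\, o_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
\]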

Wunsch's code includes four Java files (the versions here have comments and output messages translated into English; if you want the original German, go to the Tübingen web site).  The main methods for bw and bw1 take a single integer parameter, the number of iterations of Baum-Welch to be performed; the main method for soda takes a single integer parameter, the length of the output sequence to be generated.
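
For orientation, typical invocations would look roughly like the following (the class names and the handling of output are assumptions based on the description above; check the files you download):

    javac *.java
    java soda 100      (generate an output sequence of length 100)
    java bw 50         (run 50 iterations of Baum-Welch)
    java bw1 50        (run 50 iterations with the other Baum-Welch version)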

1.  Consider the miniature training sequence "lemonade iced-tea cola".  What are the values of the parameters learned by Baum-Welch starting from a uniform distribution?  From the actual parameters?  (You should run the procedure long enough that the parameters are no longer changing much.)  What is the probability of the sequence under these parameters?  What do we learn from this experiment?

2.  Perform the same experiment with the 100-item sequence provided in crazysoda.seq.  What results do we get?  How do they compare with the predicted value?

3.  Note that the parameter pi(1), which is initially 0, stays zero.  Is that a chance occurrence?  If we run the miniature training sequence for one iteration, b(1,2) becomes zero.  Does it change on further re-estimation?  Formulate a generalization about Baum-Welch re-estimation of zero parameters.

A warning:  this simple HMM code operates in probability space, not in log probability space, so you may run into numerical underflow if you generate and test long input sequences.
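
To see why, the toy program below (an illustration, not part of the assignment code) multiplies many per-symbol probabilities together; the product underflows to 0.0 in double precision, while the sum of log probabilities remains usable.

    public class UnderflowDemo {
        public static void main(String[] args) {
            double prob = 1.0;      // running product of per-symbol probabilities
            double logProb = 0.0;   // running sum of log probabilities
            for (int t = 0; t < 2000; t++) {   // pretend each symbol contributes p = 0.3
                prob *= 0.3;
                logProb += Math.log(0.3);
            }
            System.out.println("probability space: " + prob);     // prints 0.0 (underflow)
            System.out.println("log space:         " + logProb);  // about -2407.9
        }
    }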

People who are preparing a presentation for Feb. 9 do not have to do these assignments.