G22.2591  Advanced Natural Language Processing  Spring 2004
Assignment #2
Due 8 am Monday February 9
Manning and Schütze provide a very simple example of an HMM, the "crazy
soda machine", described on p. 321. Rather than give you the
drink you want, it outputs drinks randomly. It can dispense cola,
iced tea, and lemonade. To complicate matters a bit more, it has
two states, the cola preferring state (CP) and the iced tea preferring
state (IP). It switches randomly between these two states, and
provides no external indication of what state it is in, only an
indirect indication based on what drink comes out. That's why it
has to be modeled by an HMM.
It always starts out in state CP. From state CP, the transition
probability to state IP is 0.30; the probability of remaining in
CP is 0.70. From state IP, it has a probability of 0.50 of
staying in IP, and 0.50 of changing to state CP. The drink
emission probabilities are as follows:

          cola   iced tea   lemonade
    CP     0.6      0.1        0.3
    IP     0.1      0.7        0.2
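To make these parameters concrete, here is a small, self-contained Java sketch (our own illustration, not part of Wunsch's code; the class and variable names are our choices) that stores the true parameters and runs a plain forward pass to compute the probability of one short output sequence:

```java
public class ForwardDemo {
    public static void main(String[] args) {
        // Transition matrix a[i][j]: state 0 = CP, state 1 = IP
        double[][] a = {{0.7, 0.3}, {0.5, 0.5}};
        // Emission matrix b[i][k]: drink 0 = cola, 1 = iced tea, 2 = lemonade
        double[][] b = {{0.6, 0.1, 0.3}, {0.1, 0.7, 0.2}};
        // The machine always starts in CP
        double[] pi = {1.0, 0.0};
        int[] obs = {2, 1, 0}; // lemonade, iced tea, cola

        // Forward pass: alpha[i] = P(observations so far, current state = i)
        double[] alpha = new double[2];
        for (int i = 0; i < 2; i++) alpha[i] = pi[i] * b[i][obs[0]];
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[2];
            for (int j = 0; j < 2; j++)
                for (int i = 0; i < 2; i++)
                    next[j] += alpha[i] * a[i][j] * b[j][obs[t]];
            alpha = next;
        }
        // Total probability = sum over final states
        System.out.printf("P(lemonade, iced tea, cola) = %.6f%n",
                          alpha[0] + alpha[1]);
        // prints: P(lemonade, iced tea, cola) = 0.031500
    }
}
```

This forward pass is the same quantity HMM.java computes during training; working it out once by hand on a three-symbol sequence makes the later experiments easier to interpret.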

We want to see how we can learn HMM parameters for the soda machine
using the forward-backward algorithm (we assume we are told that it is a
two-state machine). We will see that the learning process may be
trickier than the textbook makes it seem. Holger Wunsch (from Tübingen)
has done the programming work for us; his Java code is available as part
of the NLP resources page for Tübingen. Your job is to run his code
and explain some of the observations.
Wunsch's code includes four Java files:

- HMM.java: the general training (Baum-Welch) and decoding (Viterbi)
  procedures for HMMs
- soda.java: initializes the HMM for the crazy soda machine using the
  parameters given above, and then simulates the machine, writing to the
  file crazysoda.seq a sequence of 0's, 1's, and 2's representing a
  sequence of cola, iced tea, and lemonade being dispensed
- bw.java: invokes the training procedure for the HMM, starting from a
  uniform distribution and using crazysoda.seq as input
- bw1.java: invokes the training procedure for the HMM, starting from
  the actual parameters of the soda machine and using crazysoda.seq as
  input

(The versions here have comments and output messages translated into
English; if you want the original German, go to the Tübingen web site.)
The main methods for bw and bw1 take a single integer parameter, the
number of iterations of Baum-Welch to be performed; the main method for
soda takes a single integer parameter, the length of the output sequence
to be generated.
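The simulation that soda.java performs can be sketched roughly as follows (a hypothetical reimplementation, not Wunsch's actual code; the class name, the fixed random seed, and printing to standard output instead of crazysoda.seq are our own choices):

```java
import java.util.Random;

public class SodaSim {
    public static void main(String[] args) {
        // Sequence length, as soda.java's single integer parameter
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        double[][] a = {{0.7, 0.3}, {0.5, 0.5}};        // CP, IP transitions
        double[][] b = {{0.6, 0.1, 0.3}, {0.1, 0.7, 0.2}}; // drink emissions
        Random rng = new Random(42); // fixed seed, for reproducibility only

        StringBuilder seq = new StringBuilder();
        int state = 0; // the machine always starts in CP
        for (int t = 0; t < n; t++) {
            seq.append(sample(b[state], rng)); // emit 0=cola, 1=iced tea, 2=lemonade
            state = sample(a[state], rng);     // then move to the next hidden state
        }
        System.out.println(seq);
    }

    // Draw an index from a discrete probability distribution.
    static int sample(double[] dist, Random rng) {
        double r = rng.nextDouble(), cum = 0;
        for (int i = 0; i < dist.length; i++) {
            cum += dist[i];
            if (r < cum) return i;
        }
        return dist.length - 1;
    }
}
```

Note that the observer sees only the emitted digits; the state variable that generated them is discarded, which is exactly what makes the training problem "hidden".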
1. Consider the miniature training sequence "lemonade iced-tea
cola". What are the values of the parameters learned by
Baum-Welch starting from a uniform distribution? From the actual
parameters? (You should run the procedure long enough that the
parameters are no longer changing much.) What is the probability of the
sequence under these parameters? What do we learn from this
experiment?
2. Perform the same experiment with the 100-item sequence
provided in crazysoda.seq.
What results do we get? How do they compare with the predicted
value?
3. Note that the parameter pi(1), which is initially 0, stays
zero. Is that a chance occurrence? If we run the miniature
training sequence for one iteration, b(1,2) becomes zero. Does it
change on further reestimation? Make some generalization about
Baum-Welch reestimation of zero parameters.
A warning: this simple HMM code operates in probability space,
not in log-probability space, so you may run into numerical underflow
if you generate and test long input sequences.
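To see the underflow concretely, compare a plain probability-space forward pass with a per-step-rescaled one on a long sequence (again a sketch of our own, not the assignment's code; the length 2000 and the all-cola sequence are arbitrary choices made to force the effect):

```java
public class UnderflowDemo {
    public static void main(String[] args) {
        // True crazy-soda-machine parameters (state 0 = CP, state 1 = IP)
        double[][] a = {{0.7, 0.3}, {0.5, 0.5}};
        double[][] b = {{0.6, 0.1, 0.3}, {0.1, 0.7, 0.2}};
        int n = 2000;
        int[] obs = new int[n]; // 2000 colas: extreme, but a legal sequence

        // Plain forward pass in probability space, as the assignment's code works
        double[] alpha = {1.0 * b[0][obs[0]], 0.0 * b[1][obs[0]]};

        // Rescaled forward pass: normalize alpha at each step and
        // accumulate the log of the normalizers
        double[] scaled = alpha.clone();
        double norm0 = scaled[0] + scaled[1];
        double logP = Math.log(norm0);
        scaled[0] /= norm0; scaled[1] /= norm0;

        for (int t = 1; t < n; t++) {
            double[] na = new double[2], ns = new double[2];
            for (int j = 0; j < 2; j++)
                for (int i = 0; i < 2; i++) {
                    na[j] += alpha[i] * a[i][j] * b[j][obs[t]];
                    ns[j] += scaled[i] * a[i][j] * b[j][obs[t]];
                }
            alpha = na;
            double norm = ns[0] + ns[1];
            logP += Math.log(norm);
            scaled = new double[]{ns[0] / norm, ns[1] / norm};
        }
        // The unscaled probability underflows to exactly 0.0, while the
        // rescaled pass still recovers a finite log-probability.
        System.out.println("unscaled P = " + (alpha[0] + alpha[1]));
        System.out.println("log P (rescaled) = " + logP);
    }
}
```

The unscaled probability falls below the smallest representable double long before the sequence ends and becomes exactly 0.0, which would silently break any reestimation that divides by it; the rescaled pass keeps every quantity in a safe range.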
People who are preparing a presentation for Feb. 9 do not have to do
these assignments.