G22.2590 - Natural Language Processing - Spring 2005 Prof. Grishman
Lecture 7 Outline
March 21, 2005
Discuss homework (assignment 4):
- calculating probabilities for animal
HMM: the probability of generating a sentence is the sum of the
probabilities of the different individual paths which generate that sentence
- calculating probabilities for POS HMM: it is important to take
into account 3 probabilities when comparing the assignment of two
different POS to a particular word: the probability of the
transition into the state, the probability of emission, and the
probability of transition out of that state
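The comparison above can be sketched in a few lines of Python. The probabilities below are invented toy values (not from the assignment data); the point is only that the product of all three factors, not the emission probability alone, decides between the two tags:

```python
# Hypothetical toy probabilities illustrating why all three factors matter
# when choosing between two POS tags for a word.
P_trans = {("DET", "NOUN"): 0.6, ("DET", "VERB"): 0.1,
           ("NOUN", "VERB"): 0.4, ("VERB", "VERB"): 0.2}
P_emit = {("NOUN", "duck"): 0.002, ("VERB", "duck"): 0.005}

def path_score(prev_tag, tag, word, next_tag):
    """Transition into the state * emission * transition out of the state."""
    return (P_trans[(prev_tag, tag)] *
            P_emit[(tag, word)] *
            P_trans[(tag, next_tag)])

# Compare tagging "duck" as NOUN vs. VERB between a determiner and a verb:
noun_score = path_score("DET", "NOUN", "duck", "VERB")   # 0.6 * 0.002 * 0.4
verb_score = path_score("DET", "VERB", "duck", "VERB")   # 0.1 * 0.005 * 0.2
# NOUN wins despite its lower emission probability, because of the transitions.
```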
Note minor update to ChunkPattern.txt file for Assignment 6.
Term project: discuss possibilities.
Proposal due by email, April 4th.
Our noun/verb group tagger
('chunker') won't be perfect; like our POS tagger, we need to
evaluate it by hand-annotating a substantial corpus and then comparing
the results of our chunker against this standard.
What metric should we use? We can map a chunk annotation into an
assignment of a tag to each word. For example, if we are tagging
just noun groups, we can tag the first word of a noun group as B-NG,
subsequent words of a noun group as I-NG, and words not in a noun group
as O. Then we can measure the accuracy of our tagging (correct
tags / total number of words). This can be readily extended to
other chunk types, or multiple chunk types.
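The mapping from chunk spans to per-word tags can be sketched directly; the span representation (start index, end index exclusive) is just one convenient choice:

```python
# A minimal sketch of mapping noun-group chunk spans to B-NG/I-NG/O word tags.
def chunks_to_tags(words, chunks):
    """chunks: list of (start, end) word-index spans, end exclusive."""
    tags = ["O"] * len(words)               # default: not in any noun group
    for start, end in chunks:
        tags[start] = "B-NG"                # first word of the noun group
        for i in range(start + 1, end):
            tags[i] = "I-NG"                # subsequent words of the group
    return tags

words = ["the", "old", "man", "saw", "a", "dog"]
# noun groups: "the old man" and "a dog"
print(chunks_to_tags(words, [(0, 3), (4, 6)]))
# → ['B-NG', 'I-NG', 'I-NG', 'O', 'B-NG', 'I-NG']
```

Accuracy is then simply the fraction of words whose tag matches the key.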
However, accuracy can be a deceptive measure for less frequent
annotations. If we are scoring verb group annotations, and 80% of
words are not part of a verb group (i.e., get tag O), then a tagger
which can't find any verb groups at all would get an 80% accuracy score.
So for annotations which can span one or more words we typically use
recall and precision measures. In comparing the key and system
response, we count the total number of annotations in the key, the
total number in the response, and the number of correct annotations
(annotations in the response which exactly match one in the key (same
starting and ending word)). We then compute (J&M p. 578)
recall = number correct / number in key
precision = number correct / number in response
F = harmonic mean of recall and precision = 2 / (1 / recall + 1 / precision)
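These three measures can be computed directly from the key and response spans. The spans below are invented (start, end) word-index pairs for illustration:

```python
# Sketch of span-level scoring: a response annotation is correct only if it
# exactly matches a key annotation (same starting and ending word).
def score(key_spans, response_spans):
    key, resp = set(key_spans), set(response_spans)
    correct = len(key & resp)                    # exact-match annotations
    recall = correct / len(key)                  # correct / number in key
    precision = correct / len(resp)              # correct / number in response
    f = 2 * recall * precision / (recall + precision)   # harmonic mean
    return recall, precision, f

# key has 4 verb groups; the system found 3, of which 2 match exactly
r, p, f = score([(1, 3), (5, 6), (8, 10), (12, 13)],
                [(1, 3), (5, 6), (9, 10)])
# recall = 2/4 = 0.5, precision = 2/3, F = 4/7
```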
To assist in evaluating chunkers and other similar annotators, Jet
provides the ability to annotate entire documents and to score the
output for recall and precision.
Named Entity Tagging
As we have noted before, identifying names is an important part of many
natural language processing applications. Names are very common
in most types of text, and -- unlike general vocabulary -- cannot be
looked up in a dictionary.
The simple ChunkPatterns set
for Jet treats any sequence of capitalized words as a name. This
is very crude ... it doesn't handle names at the beginning of sentences
or names with some lower-case words, such as "University of
Pennsylvania" or "City of New York". It doesn't work for
headlines, and such a strategy would not work for many languages with
no case information (Chinese, Japanese) or where other nouns are
capitalized (German). Furthermore, it doesn't classify names
(people vs. companies, for example), although that is essential for
almost any real application.
Fortunately, name identification has become a widely-studied task over
the last decade, so there are now many corpora annotated with name
information, in many languages. The 'standard set', introduced at the
Message Understanding Conference - 6 (MUC-6) in 1995, recognizes three types of
names -- people, organizations, and locations -- as well as four other
types of expressions -- dates, times, percentages, and monetary
amounts. These corpora have been used to develop both detailed
hand-coded rules and statistical models.
Some names are simply memorized -- for example, the names of well known
companies (IBM, Ford). Other names can be identified and
classified based on both internal and external evidence. Examples
of internal evidence are common first names ("Fred Kumquat") or
corporate suffixes ("Blightly
Associates", "Zippo Corp."); examples of external evidence are
titles ("President Huber") and verbs which take human subjects ("Zenca
died"). Such evidence can be used by both hand-coded and
statistical systems.
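A hand-coded use of such evidence can be sketched as a few rules. The word lists and the classify function below are invented for illustration (a real system would use much larger lists and many more rules):

```python
# Toy illustration of combining internal and external evidence to classify
# a candidate name (a capitalized word sequence).
FIRST_NAMES = {"Fred", "Mary"}                      # internal evidence
CORP_SUFFIXES = {"Corp.", "Associates", "Inc."}     # internal evidence
TITLES = {"President", "Dr.", "Mr."}                # external evidence

def classify(tokens, start, end):
    """Classify the candidate name tokens[start:end], or return None."""
    if tokens[end - 1] in CORP_SUFFIXES:            # "Blightly Associates"
        return "ORGANIZATION"
    if tokens[start] in FIRST_NAMES:                # "Fred Kumquat"
        return "PERSON"
    if start > 0 and tokens[start - 1] in TITLES:   # "President Huber"
        return "PERSON"
    return None

tokens = ["President", "Huber", "met", "Blightly", "Associates", "."]
print(classify(tokens, 1, 2))   # → PERSON (preceding title)
print(classify(tokens, 3, 5))   # → ORGANIZATION (corporate suffix)
```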
Many different statistical models have been used for named entity
tagging; HMMs were one of the first and remain one of the most
popular (see Nymble: a
High-Performance Learning Name-finder and the Advanced
NLP notes on NE). The simplest HMM has one state for each
type of name,
plus one state for "other". However, such a model does not
capture any context information. To include context information
and a bit of internal structure, Jet uses a more elaborate HMM, with 6
states for each name type. Other HMMs for name recognition
condition the transition and emission probabilities on the prior word.
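The simplest model (one state per name type plus "other") can be sketched as follows. All probabilities and word lists here are invented toy values, and the emission model uses only crude internal evidence; this is far simpler than the HMMs used in Jet or Nymble:

```python
# A minimal named-entity HMM: one state per name type plus OTHER,
# decoded with the Viterbi algorithm. All numbers are toy values.
STATES = ["PERSON", "ORG", "OTHER"]
START = {"PERSON": 0.1, "ORG": 0.1, "OTHER": 0.8}
TRANS = {s: {"PERSON": 0.1, "ORG": 0.1, "OTHER": 0.8} for s in STATES}

def emit(state, word):
    # Memorized names score high; other capitalized words get a small
    # boost in the name states (internal evidence only).
    if state == "PERSON":
        return 0.4 if word in {"Fred", "Huber"} else (0.1 if word[0].isupper() else 0.01)
    if state == "ORG":
        return 0.4 if word in {"IBM", "Ford"} else (0.1 if word[0].isupper() else 0.01)
    return 0.02 if word[0].isupper() else 0.3       # OTHER

def viterbi(words):
    """Return the most probable state sequence for the word sequence."""
    v = {s: START[s] * emit(s, words[0]) for s in STATES}
    path = {s: [s] for s in STATES}
    for w in words[1:]:
        v2, path2 = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[p] * TRANS[p][s])
            v2[s] = v[prev] * TRANS[prev][s] * emit(s, w)
            path2[s] = path[prev] + [s]
        v, path = v2, path2
    return path[max(STATES, key=lambda s: v[s])]

print(viterbi(["Fred", "works", "at", "IBM"]))
# → ['PERSON', 'OTHER', 'OTHER', 'ORG']
```

Note that with only one state per name type, the transition and emission probabilities cannot distinguish, say, the first word of a name from later words, which is what motivates the richer state structure.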
The name tagger for Jet is run with the command "tagNames". The
parameter "NameTags.fileName" specifies the HMM to be used for name
tagging. Jet includes the "MUCnameHMM.txt" file, an HMM trained on
the MUC-7 name corpus. The tagger produces annotations of type
ENAMEX for names and type TIMEX for dates.
Sentence Level Patterns and Semantic Grammars (J&M 15.5)
We can couple together the annotators we have considered so far ... a
POS tagger, a name annotator, and a chunk annotator ... to give us
information on the low-level constituents of a sentence, quickly and
with reasonable accuracy. Can we keep going now and write
patterns for larger constituents, such as noun phrases and sentences?
Unfortunately, for larger constituents the problem of ambiguity becomes
much greater, as we have discussed before. Syntactic patterns are
not sufficient to produce accurate, unambiguous analyses. We need
to include semantic constraints, and maybe more.
Capturing semantic constraints in general is a difficult problem.
However, if we focus on a narrow domain, such as weather reports, car
repair reports, (specific types of) medical reports, or some types of
financial reports, the problem gets easier. For such a
sublanguage, we can identify semantic
word classes: form classes of nouns (the types of
'entities' in the domain) and classify verbs in terms of the types of
nouns they take as arguments. The semantic co-occurrence constraints
(selectional constraints) can then be captured either
- by a separate grammar component which checks these constraints
whenever the parser completes a clause constituent, or
- by a context-free grammar stated in terms of these semantic word
classes -- a semantic grammar (J&M p. 575-577).
We will initially (for our Jet implementation) use semantic grammar
patterns. Semantic grammars
- provide a simple approach to limited sublanguages
(because they capture both syntactic and semantic constraints in a
single set of rules),
- in particular, they are convenient for constructs which fall outside
general language syntax (constructs which appear only with specific
word classes),
- but they lose the power of syntactic generalization ... each semantic
pattern must appear in each of its syntactic forms (active, passive,
question, ...), and so are cumbersome for broad-coverage systems.
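As a concrete illustration, here is a toy semantic grammar for a weather-report sublanguage. The word classes, lexicon, and rule are all invented for this sketch; the point is that the grammar is stated over semantic classes (PRECIP, DAY) rather than syntactic categories:

```python
# A toy semantic grammar for a weather-report sublanguage: rules are stated
# in terms of semantic word classes rather than parts of speech.
LEXICON = {
    "rain": "PRECIP", "snow": "PRECIP",
    "expected": "FORECAST-V", "likely": "FORECAST-V",
    "tuesday": "DAY", "tonight": "DAY",
}
# Each rule is a sequence of semantic classes forming a FORECAST clause.
RULES = [("PRECIP", "FORECAST-V", "DAY")]

def parse(sentence):
    """Map each word to its semantic class and match against the rules."""
    classes = tuple(LEXICON.get(w.lower()) for w in sentence.split())
    return "FORECAST" if classes in RULES else None

print(parse("Rain expected Tuesday"))   # → FORECAST
print(parse("Rain Tuesday expected"))   # → None (word order still matters)
```

Note how one rule covers only one syntactic form: a passive or question variant of the same forecast would need its own rule, which is exactly the loss of syntactic generalization described above.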