G22.2590 - Natural Language Processing - Spring 2008 Prof. Grishman

Lecture 7 Outline

March 6, 2008

Evaluating chunkers

Our noun/verb group tagger ('chunker') won't be perfect;  as with our POS tagger, we need to evaluate it by hand-annotating a substantial corpus and then comparing the results of our chunker against this standard.

What metric should we use?  As discussed last week, we can map a chunk annotation into an assignment of a tag to each word.  For example, if we are tagging just noun groups, we can tag the first word of a noun group as B-NG, subsequent words of a noun group as I-NG, and words not in a noun group as O.  Then we can measure the accuracy of our tagging (correct tags / total number of words).  This can be readily extended to other chunk types, or multiple chunk types.
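
As an illustration (a Python sketch, not Jet code; the sentence and spans are invented), the mapping from noun-group spans to B-NG / I-NG / O tags and the resulting tag accuracy might look like this:

    # Illustrative sketch, not Jet code: map noun-group spans (start, end),
    # end exclusive, into per-word B-NG / I-NG / O tags, then score tag
    # accuracy against a hand-annotated key.

    def chunks_to_tags(num_words, chunks):
        tags = ["O"] * num_words
        for start, end in chunks:
            tags[start] = "B-NG"
            for i in range(start + 1, end):
                tags[i] = "I-NG"
        return tags

    def tag_accuracy(key_tags, response_tags):
        correct = sum(1 for k, r in zip(key_tags, response_tags) if k == r)
        return correct / len(key_tags)

    # Invented example: "the old man bought a car"
    key      = chunks_to_tags(6, [(0, 3), (4, 6)])   # [the old man] bought [a car]
    response = chunks_to_tags(6, [(1, 3), (4, 6)])   # system missed "the"
    print(tag_accuracy(key, response))               # 4 of 6 tags correct: 0.667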

However, accuracy can be a deceptive measure for less frequent annotations.  If we are scoring verb group annotations, and 80% of words are not part of a verb group (i.e., get tag O), then a tagger which can't find any verb groups at all would get an 80% accuracy score.

So for annotations which can span one or more words we typically use recall and precision measures.  In comparing the key and system response, we count the total number of annotations in the key, the total number in the response, and the number of correct annotations (annotations in the response which exactly match one in the key (same starting and ending word)).  We then compute (J&M p. 578)
recall = number correct / number in key
precision = number correct / number in response
F = harmonic mean of recall and precision  = 2 / (1 / recall + 1 / precision)
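
A small sketch of this span-level scoring (illustrative Python, not Jet's scorer; the spans are invented) using exact span match and the formulas above:

    # An annotation is counted as correct only if its start and end
    # exactly match an annotation in the key.

    def score(key_spans, response_spans):
        key, response = set(key_spans), set(response_spans)
        correct = len(key & response)
        recall = correct / len(key) if key else 0.0
        precision = correct / len(response) if response else 0.0
        f = (2 * recall * precision / (recall + precision)
             if recall + precision else 0.0)
        return recall, precision, f

    # Invented spans: key has 4 chunks, the response has 3, of which 2 match exactly.
    print(score([(0, 2), (3, 5), (7, 9), (10, 12)],
                [(0, 2), (3, 6), (7, 9)]))
    # recall = 2/4 = 0.5, precision = 2/3 = 0.667, F = 0.571...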

To assist in evaluating chunkers and other similar annotators, Jet provides the capability to annotate entire documents and the ability to score the output for precision and recall.

Named Entity Tagging

As we have noted before, identifying names is an important part of many natural language processing applications.  Names are very common in most types of text, and -- unlike general vocabulary -- cannot be looked up in a dictionary.

The simple ChunkPatterns set for Jet treats any sequence of capitalized words as a name.  This is very crude ... it doesn't handle names at the beginning of sentences or names with some lower-case words, such as "University of Pennsylvania" or "City of New York".  It doesn't work for headlines, and such a strategy would not work for many languages with no case information (Chinese, Japanese) or where other nouns are capitalized (German).  Furthermore, it doesn't classify names (people vs. companies, for example), although that is essential for almost any real application.
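
To see why, here is a sketch (illustrative Python, not the actual ChunkPatterns rules) of such a capitalized-word heuristic and the kinds of errors it makes:

    # Treat any maximal run of capitalized words as a name.

    import re

    def crude_names(tokens):
        names, i = [], 0
        while i < len(tokens):
            if re.fullmatch(r"[A-Z][a-z]*", tokens[i]):
                j = i
                while j < len(tokens) and re.fullmatch(r"[A-Z][a-z]*", tokens[j]):
                    j += 1
                names.append(" ".join(tokens[i:j]))
                i = j
            else:
                i += 1
        return names

    print(crude_names("He visited the University of Pennsylvania".split()))
    # ['He', 'University', 'Pennsylvania']: the lower-case "of" splits the
    # name, and the sentence-initial "He" is wrongly treated as a name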

Fortunately, name identification has become a widely-studied task over the last decade, so there are now many corpora annotated with name information, in many languages.  The 'standard set', introduced at the Sixth Message Understanding Conference (MUC-6) in 1995, recognizes three types of names -- people, organizations, and locations -- as well as four other types of expressions -- dates, times, percentages, and monetary amounts.  These corpora have been used to develop both detailed hand-coded rules and statistical models.

Some names are simply memorized -- for example, the names of well known companies (IBM, Ford).  Other names can be identified and classified based on both internal and external evidence.  Examples of internal evidence are common first names ("Fred Kumquat") or corporate suffixes ("Blightly Associates", "Zippo Corp.");  examples of external evidence are titles ("President Huber") and verbs which take human subjects ("Zenca died").  Such evidence can be used by both hand-coded and corpus-trained models.
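
A hypothetical sketch of how such evidence might be combined in a hand-coded classifier (the word lists and rules here are invented for illustration; they are not Jet's):

    FIRST_NAMES = {"Fred", "Mary", "John"}            # internal evidence
    CORP_SUFFIXES = {"Corp.", "Inc.", "Associates"}   # internal evidence
    PERSON_TITLES = {"President", "Dr.", "Mr."}       # external evidence (word before)
    HUMAN_VERBS = {"died", "said", "resigned"}        # external evidence (word after)

    def classify(name_tokens, prev_word=None, next_word=None):
        if (name_tokens[0] in FIRST_NAMES or prev_word in PERSON_TITLES
                or next_word in HUMAN_VERBS):
            return "PERSON"
        if name_tokens[-1] in CORP_SUFFIXES:
            return "ORGANIZATION"
        return "UNKNOWN"

    print(classify(["Fred", "Kumquat"]))               # PERSON (common first name)
    print(classify(["Blightly", "Associates"]))        # ORGANIZATION (corporate suffix)
    print(classify(["Huber"], prev_word="President"))  # PERSON (title to the left)
    print(classify(["Zenca"], next_word="died"))       # PERSON (verb taking a human subject)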

Many different statistical models have been used for named entity tagging;  HMMs were one of the first and remain one of the most popular (see  Nymble: a High-Performance Learning Name-finder and the Advanced NLP notes on NE).  The simplest HMM has one state for each type of name, plus one state for "other".  However, such a model does not capture any context information.  To include context information and a bit of internal structure, Jet uses a more elaborate HMM, with 6 states for each name type.  Other HMMs for name recognition condition the transition and emission probabilities on the prior word.
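
The following sketch (Python, with made-up probabilities rather than ones estimated from an annotated corpus) shows the structure of the simplest model, one state per name type plus OTHER, decoded with the Viterbi algorithm.  Jet's actual HMM is more elaborate, as noted above.

    import math

    STATES = ["PERSON", "ORGANIZATION", "LOCATION", "OTHER"]

    # Hypothetical probabilities for illustration only; a real tagger would
    # estimate these from an annotated corpus such as the MUC-7 name corpus.
    start_p = {s: 0.25 for s in STATES}
    trans_p = {s: {t: (0.7 if s == t else 0.1) for t in STATES} for s in STATES}

    def emit_p(state, word):
        # Toy emission model: capitalized words are more likely in name states.
        if word[0].isupper():
            return 0.2 if state != "OTHER" else 0.05
        return 0.01 if state != "OTHER" else 0.5

    def viterbi(words):
        V = [{s: math.log(start_p[s]) + math.log(emit_p(s, words[0])) for s in STATES}]
        back = [{}]
        for t in range(1, len(words)):
            V.append({})
            back.append({})
            for s in STATES:
                prev, score = max(((p, V[t - 1][p] + math.log(trans_p[p][s]))
                                   for p in STATES), key=lambda x: x[1])
                V[t][s] = score + math.log(emit_p(s, words[t]))
                back[t][s] = prev
        # Trace back the best state sequence.
        last = max(V[-1], key=V[-1].get)
        path = [last]
        for t in range(len(words) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    # Capitalized words end up in name states, the rest in OTHER.
    print(viterbi("Fred Kumquat joined Zippo Corp .".split()))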

The name tagger for Jet is run with the command "tagNames".  The parameter "NameTags.fileName" specifies the HMM to be used for name tagging.  Jet includes the "MUCnameHMM.txt" file, an HMM trained on the MUC-7 name corpus.  The tagger produces annotations of type ENAMEX for names and type TIMEX for dates and times.

Sentence Level Patterns and Semantic Grammars (J&M 15.5)

We can couple together the annotators we have considered so far ... a POS tagger, a name annotator, and a chunk annotator ... to give us information on the low-level constituents of a sentence, quickly and with reasonable accuracy.  Can we keep going now and write patterns for larger constituents, such as noun phrases and sentences?

Unfortunately, for larger constituents the problem of ambiguity becomes much greater, as we have discussed before.  Syntactic patterns are not sufficient to produce accurate, unambiguous analyses.  We need to include semantic constraints, and maybe more.

Capturing semantic constraints in general is a difficult problem.  However, if we focus on a narrow domain, such as weather reports, car repair reports, (specific types of) medical reports, or some types of financial reports, the problem gets easier.  For such a sublanguage, we can identify semantic word classes:  we can form classes of nouns (the types of 'entities' in the domain) and classify verbs in terms of the types of nouns they take as arguments.  The semantic co-occurrence constraints (selectional constraints) can then be captured either directly in the grammar rules (a semantic grammar) or as separate constraints applied to a syntactic analysis.
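
As a concrete (and entirely invented) illustration, the semantic word classes and verb argument frames for a tiny sublanguage, together with a simple selectional check, might look like this:

    # Hypothetical word classes and verb frames; none of this is Jet's
    # machinery, it only illustrates the idea of selectional constraints.

    NOUN_CLASSES = {
        "IBM": "COMPANY", "Ford": "COMPANY",
        "Huber": "PERSON", "Zenca": "PERSON",
    }

    # Each verb is classified by the semantic classes of its subject and object.
    VERB_FRAMES = {
        "hired": ("COMPANY", "PERSON"),
        "acquired": ("COMPANY", "COMPANY"),
    }

    def satisfies_selection(subject, verb, obj):
        frame = VERB_FRAMES.get(verb)
        if frame is None:
            return False
        return (NOUN_CLASSES.get(subject) == frame[0]
                and NOUN_CLASSES.get(obj) == frame[1])

    print(satisfies_selection("IBM", "hired", "Huber"))   # True
    print(satisfies_selection("Huber", "hired", "IBM"))   # False: classes don't fit the frame
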
We will initially (for our Jet implementation) use semantic grammar patterns.  Semantic grammars