G22.2590 - Natural Language Processing - Spring 2003 Prof. Grishman

Lecture 3 Outline

February 6, 2003

Parsers and their Problems, cont'd

Problems with the top-down backtracking parser

Problems with the bottom-up parser

Top-down chart parser (Earley Algorithm) (J&M 10.4)

Problem of ambiguity (p. 372)

Capturing constraints in a context-free grammar

Part-of-Speech Tagging (J&M chapter 8)

Role of parts-of-speech in grammar:  rules stated in terms of classes of words sharing syntactic properties

How fine should these classes be?
    Range of answers ... different part-of-speech 'tag' sets
    Penn Tag Set ... used to tag Univ. of Pennsylvania Tree Bank (1 million words)  (p. 297)

The tagging task:  determining the tag of each word
    Not trivial:  many common words have several tags
    How?
        A dictionary will tell us which tags are possible for a word, independent of context.
        We could parse the sentence, and see which tags are used in the parses, but that's an expensive
        and difficult process (we might not always get a parse).
        Instead, we develop separate part-of-speech taggers.
    Why?
        Help parsing (reduce ambiguity).
        Resolve pronunciation ambiguities (for text-to-speech).
        Resolve semantic ambiguities.
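
The dictionary lookup described under "How?" can be sketched as below. This is only an illustration: the lexicon entries are a hand-made toy sample using Penn-style tags, and the names LEXICON, possible_tags and ambiguous_words are hypothetical, not taken from J&M. The point is that the lookup returns every tag a word may take, regardless of context, so common words remain ambiguous.

```python
# Toy lexicon (hypothetical entries): word -> set of possible Penn-style tags.
LEXICON = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "book": {"NN", "VB"},
    "flight": {"NN"},
    "that": {"DT", "IN", "WDT"},
}


def possible_tags(word):
    """Return the context-independent tag set for a word (empty set if unknown)."""
    return LEXICON.get(word.lower(), set())


def ambiguous_words(sentence):
    """Return the words of a tokenized sentence that have more than one possible tag."""
    return [w for w in sentence if len(possible_tags(w)) > 1]


if __name__ == "__main__":
    sentence = ["book", "that", "flight"]
    for w in sentence:
        print(w, sorted(possible_tags(w)))
    print("ambiguous:", ambiguous_words(sentence))
```

A tagger's job starts where this lookup stops: it has to choose one tag from each set using the context.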

Rule-based part-of-speech tagging (J&M 8.4)
    Ex:  Constraint-grammar tagger
    Needs large tagged corpus for testing
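
A rough sketch of the eliminative style of rule-based tagging: start from all dictionary tags for each word and let hand-written rules discard readings that the context rules out. The single rule below (drop the verb reading right after an unambiguous determiner) and the function name are assumptions made for illustration; this is not the actual Constraint Grammar formalism or rule set.

```python
def remove_verb_after_determiner(candidates):
    """Apply one hypothetical constraint to a list of (word, set_of_tags) pairs.

    If the previous word can only be a determiner (DT), discard the base-verb
    reading (VB) of the current word, unless VB is its only remaining reading.
    """
    result = []
    prev_tags = set()
    for word, tags in candidates:
        tags = set(tags)  # copy so the caller's sets are not modified
        if prev_tags == {"DT"} and tags - {"VB"}:
            tags.discard("VB")
        result.append((word, tags))
        prev_tags = tags
    return result


if __name__ == "__main__":
    candidates = [("the", {"DT"}), ("book", {"NN", "VB"})]
    print(remove_verb_after_determiner(candidates))
```

Rules of this kind never add readings and never remove the last one, so the output is always a subset of what the dictionary allowed.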

Statistical part-of-speech tagger (J&M 8.5)
    Needs large tagged corpus for training
    Unigram statistics (most common part-of-speech for each word) get us to about 90% accuracy
    For greater accuracy, need some information on adjacent words
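
The unigram baseline can be sketched as follows: count how often each word carries each tag in a tagged training corpus, then always assign the most frequent one. The three-sentence corpus, the function names, and the default tag NN for unseen words are all assumptions made for this illustration; a corpus this small says nothing about real accuracy figures.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


def tag_unigram(model, words, default="NN"):
    """Tag each word with its most common tag; unseen words get the default (an assumption)."""
    return [(w, model.get(w.lower(), default)) for w in words]


# Hand-made toy training data (hypothetical, Penn-style tags).
TOY_CORPUS = [
    [("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("empty", "JJ")],
    [("I", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("you", "PRP"), ("can", "MD"), ("go", "VB")],
]

model = train_unigram(TOY_CORPUS)

if __name__ == "__main__":
    print(tag_unigram(model, ["the", "can"]))
```

Because "can" is a modal in two of the three training sentences, the model tags it MD even directly after "the", where the noun reading is the right one. That is the kind of error that information about adjacent words is meant to correct.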