The primary source of grammatical information about individual words is the lexicon. A lexical entry for a word will give its part of speech and various features of the word. The lexical lookup annotator processes a span of text which has already been divided into tokens, marked by token annotations (thus you must run a tokenizer prior to lexical lookup). It looks up each token in the lexicon and adds a constit annotation for each definition it finds. The attributes of the constit annotation are taken from the features of the lexical entry.
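In its simplest form, a lexicon entry is written
word,, cat = part-of-speech;
where part-of-speech is some pre-terminal symbol in the grammar. Note the double comma: the type field between the two commas is left empty. For example,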
my,, cat = art;
old,, cat = adj;
dog,, cat = n;
dogs,, cat = n;
chases,, cat = v;
cars,, cat = n;
The entry may give additional features for the word, in the form feature=value; for example
dog,, cat=n, number=singular;
dogs,, cat=n, number=plural;
Thus if the word "dog" appears in a sentence, lexical lookup will assign it the annotation <constit cat=n number=singular>dog</constit>. If a word has multiple parts of speech, it should have several entries in the lexicon:
walk,, cat=v, number=plural;
walk,, cat=n, number=singular;
When "walk" appears in a sentence, lexical lookup will add two constit annotations, one for each definition.
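The lookup behavior described above can be sketched in a few lines of Python. This is an illustration only, not the annotator's actual implementation: the names LEXICON and lexical_lookup are hypothetical, and a plain (token, features) pair stands in for a constit annotation. The sketch also assumes that a token with no lexicon entry simply receives no annotation.

```python
# Hypothetical in-memory lexicon: each word maps to a list of feature
# dictionaries, one per lexical entry (all names here are illustrative).
LEXICON = {
    "dog": [{"cat": "n", "number": "singular"}],
    "dogs": [{"cat": "n", "number": "plural"}],
    "walk": [
        {"cat": "v", "number": "plural"},
        {"cat": "n", "number": "singular"},
    ],
}


def lexical_lookup(tokens):
    """Return one (token, features) pair per definition found.

    Each pair stands in for a constit annotation whose attributes are
    copied from the features of the matching lexical entry.
    """
    annotations = []
    for token in tokens:
        # Assumption: tokens missing from the lexicon yield no annotation.
        for features in LEXICON.get(token, []):
            annotations.append((token, dict(features)))
    return annotations


for token, features in lexical_lookup(["dogs", "walk"]):
    print(token, features)
```

In this sketch "dogs" produces one pair and "walk" produces two, one for the verb entry and one for the noun entry.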
The basic form of an entry is
defined-item, type, feature = value, feature = value, ... ;
The defined-item is the word or word sequence being defined. It may be a single word, a sequence of words, or a string enclosed in double quotes ("). If the defined item contains any characters other than letters, it must be enclosed in quotes. Thus:
cat, noun;
floppy disk, noun;
"cat 'o nine tails", noun;
The type field may be noun, verb, adj, or adv, or may be empty. If the field is empty, the attribute / value pairs are used directly to create the internal lexicon entry. In this case, there should be at least a cat feature, indicating the word category of the lexical item:
of,, cat=p;
Each feature value may be an integer, a symbol (a sequence of letters beginning with a lower-case letter), or a string (enclosed in double quotes). For features representing inflected forms, if the value is a single word, it may be written as a symbol or string:
ox, noun, plural = oxen;
or
ox, noun, plural = "oxen";
If the inflected form consists of more than one word, or includes a non-letter, it must be enclosed in quotes:
musk ox, noun, plural = "musk oxen";
The entry types and their features are described below. All features are optional.
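To make the entry syntax concrete, here is a minimal Python sketch that splits a single entry into its defined item, type, and features. It is not the toolkit's own lexicon reader: parse_entry, _value, and _FIELD are hypothetical names, and the sketch assumes one entry per call, no embedded double quotes inside a quoted string, and an empty type field reported as None.

```python
import re

# One field: any mix of double-quoted strings and characters other than
# comma, semicolon, or quote, ended by a comma or the final semicolon.
_FIELD = re.compile(r'((?:"[^"]*"|[^,;"])*)([,;])')


def _value(text):
    """Convert a raw field to a Python value: string, integer, or symbol."""
    text = text.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]          # quoted string, quotes removed
    if text.isdigit():
        return int(text)           # integer
    return text                    # symbol (kept as a plain str here)


def parse_entry(entry):
    """Split 'defined-item, type, feature = value, ... ;' into its parts."""
    fields, pos = [], 0
    while True:
        match = _FIELD.match(entry, pos)
        if match is None:
            raise ValueError("malformed entry: " + repr(entry))
        fields.append(match.group(1))
        pos = match.end()
        if match.group(2) == ";":
            break
    if len(fields) < 2:
        raise ValueError("entry needs a defined item and a type field")
    defined_item = _value(fields[0])
    entry_type = fields[1].strip() or None   # empty type field -> None
    features = {}
    for field in fields[2:]:
        name, sep, raw = field.partition("=")
        if not sep:
            raise ValueError("expected feature = value, got " + repr(field))
        features[name.strip()] = _value(raw)
    return defined_item, entry_type, features


print(parse_entry('musk ox, noun, plural = "musk oxen";'))
print(parse_entry('of,, cat = p;'))
```

Because quoted strings are consumed as a unit, a comma or semicolon inside quotes does not end a field in this sketch.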
base-form, noun, plural = plural, attributes = attributes, xn = xn;
defines a noun (word category n) whose singular form is base-form. Its plural form is determined as follows: if plural is none, no plural is defined; if plural is given explicitly, it is used as the plural form; otherwise the plural form is derived from the base form by the following rules:
if it ends in 'x', 'z', 's', 'ch', or 'sh', add 'es'
if it ends in a vowel + 'y', add 's'
if it ends in a consonant + 'y', change the 'y' to 'ies'
otherwise add 's'
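The plural rules above translate directly into code. The following Python sketch is one possible reading of them, not the toolkit's implementation: default_plural and noun_plural are hypothetical names, base forms are assumed to be lower case, and "vowel" is assumed to mean one of a, e, i, o, u.

```python
VOWELS = "aeiou"  # assumption: 'y' itself is not counted as a vowel


def default_plural(base_form):
    """Derive a regular plural from the base form using the four rules."""
    if base_form.endswith(("x", "z", "s", "ch", "sh")):
        return base_form + "es"
    if base_form.endswith("y") and len(base_form) >= 2:
        if base_form[-2] in VOWELS:
            return base_form + "s"        # vowel + 'y': add 's'
        return base_form[:-1] + "ies"     # consonant + 'y': 'y' -> 'ies'
    return base_form + "s"


def noun_plural(base_form, plural=None):
    """Resolve the plural: 'none' suppresses it, an explicit value wins."""
    if plural == "none":
        return None
    if plural is not None:
        return plural
    return default_plural(base_form)


for word in ["box", "church", "day", "city", "dog"]:
    print(word, "->", default_plural(word))
```

An irregular noun bypasses the rules by passing its plural explicitly, as in noun_plural("ox", "oxen").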
base-form, verb, thirdSing = singular, plural = plural, past = past, pastPart = past-participle, presPart = present-participle, attributes = attributes, xn = xn;
defines a verb whose infinitival form is base-form. The following inflected forms are generated:
form, adj, attributes = attributes;
defines an adjective (word category adj) with the specified attributes.
form, adv, attributes = attributes;
defines an adverb (word category adv) with the specified attributes.