The dictionary includes entries for approximately 21,000 nouns, 8,000 adjectives and 6,000 verbs, all of which are marked with a rich set of syntactic features and complements. Nouns have 9 possible features and 9 possible complements; adjectives have 7 features and 14 complements; and verbs have 5 features and 92 complements. Other entries identify words as adverbs, prepositions, cardinal numbers, etc., without further specification. The noun, adjective and verb entries were created by a team of four linguistics graduate students, working half-time for approximately one year. Each ELF (enterer of lexical features) has been provided with a menu-based entry program, which is written in Lisp using the Garnet GUI package and which provides access to a concordance based on approximately 90 MB of text. Elves enter features and complements for verbs based on: (1) the concordance; (2) hard-copy dictionaries; and (3) their individual usage.
Each lexical entry is organized as a nested set of feature-value lists, using a Lisp-style notation which, if needed, could be mapped into other forms, e.g., Prolog, SGML-marked text, etc. Each list consists of a type symbol followed by zero or more keyword-value pairs. Each value may in turn be an atom, a string, a list of strings, a feature-value list, or a list of feature-value lists. Keywords identify orthography (:orth), inflected forms (e.g., :plural, :pastpart, etc.), features (:features), subcategorization/complements (:subc), and other information. Subcategorization labels are mostly self-explanatory: verbs marked with "np" and "part-np" take "np" and "particle + np" complements, respectively. Features include "apreq", which marks adjectives that can modify a numerically quantified NP, e.g., "the above-mentioned one hundred gorillas", where "above-mentioned" modifies the group of one hundred gorillas (no individual gorilla is above-mentioned), and "ntitle", which marks nouns that occur as titles preceding names, e.g., "Prof. Mary Fitzburg". Some example lexical entries are given below.
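As an illustration of the structure just described, the following is a minimal sketch of a reader for this notation, written in Python rather than Lisp. It is not part of Comlex; the function names (tokenize, read, to_feature_list) are hypothetical, and the sketch assumes that strings contain no embedded double quotes and that the input is well-formed. Strings keep their surrounding quotes so that they remain distinguishable from atoms.

```python
import re

# A token is a parenthesis, a double-quoted string, or a bare atom
# (type symbols, keywords such as :orth, and labels such as np).
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def tokenize(text):
    """Split entry text into a flat list of tokens."""
    return TOKEN.findall(text)


def read(tokens, pos=0):
    """Read one value starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok == "(":
        items = []
        pos += 1
        while tokens[pos] != ")":
            item, pos = read(tokens, pos)
            items.append(item)
        return items, pos + 1  # skip the closing parenthesis
    if tok == ")":
        raise ValueError("unexpected ')'")
    return tok, pos + 1


def to_feature_list(sexp):
    """Turn [type, :key, value, ...] into (type, {key: value})."""
    type_symbol, rest = sexp[0], sexp[1:]
    if len(rest) % 2:
        raise ValueError("keyword without a value")
    pairs = {}
    for key, value in zip(rest[0::2], rest[1::2]):
        if not key.startswith(":"):
            raise ValueError("expected a keyword, got %r" % (key,))
        pairs[key] = value
    return type_symbol, pairs


# The "build" entry from the examples in this document.
entry_text = '(verb :orth "build" :subc ((np) (np-for-np) (part-np :adval ("up"))))'
sexp, _ = read(tokenize(entry_text))
entry_type, fields = to_feature_list(sexp)
print(entry_type, fields[":orth"], fields[":subc"])
```

Nested feature-value lists such as (part-np :adval ("up")) are left as plain nested lists here; to_feature_list can be applied to them in turn when a keyword view is wanted.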
We expect to complete Version 2 of Comlex Syntax in May of 1995. The two most significant changes will be: (1) an improvement in the quality and coverage of Comlex as the result of our own quality checks as well as feedback from users; and (2) a corpus tagged with all of our verb complement classes and many of our verb features. Each of our lexical entries for verbs will include a list of tags, where each tag will consist of one feature or complement, the name of the source (Brown Corpus, Wall Street Journal, etc.) and a pointer to a corpus file. This effort will be significant for gathering statistics on the frequency of complements and features.
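To make the planned tag structure concrete, here is a small Python sketch of how such tags might be represented and counted. It is only an assumption about the shape of the data: the Tag type, its field names, and the pointer strings are hypothetical placeholders, and the sample tags are invented for illustration rather than taken from any tagged corpus.

```python
from collections import Counter
from typing import NamedTuple


class Tag(NamedTuple):
    label: str    # one feature or complement, e.g. "np"
    source: str   # name of the source corpus
    pointer: str  # pointer to a corpus file (format assumed here)


# Hypothetical tags for a single verb; the pointers are placeholders,
# not real corpus file names.
tags = [
    Tag("np", "Brown Corpus", "brown/file-a"),
    Tag("np", "Wall Street Journal", "wsj/file-b"),
    Tag("np-for-np", "Wall Street Journal", "wsj/file-c"),
    Tag("part-np", "Brown Corpus", "brown/file-d"),
]


def label_frequencies(tags):
    """Count how often each feature or complement label was tagged."""
    return Counter(tag.label for tag in tags)


def relative_frequencies(tags):
    """Express each label's count as a share of all tags."""
    counts = label_frequencies(tags)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


freq = label_frequencies(tags)
print(freq.most_common())
print(relative_frequencies(tags))
```

Keeping the source name on every tag means the same counts can also be broken down by corpus, for example by counting (label, source) pairs instead of labels alone.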
(verb :orth "build"
      :subc ((np) (np-for-np) (part-np :adval ("up"))))
(noun :orth "assertion"
      :subc ((noun-that-s) (noun-be-that-s)))
(adverb :orth "even")
(adjective :orth "above-mentioned"
           :features ((apreq) (attributive)))
(verb :orth "abbreviate"
      :subc ((np-pp :pval ("to")) (np) (np-np-pred) (np-as-np))
      :features ((vveryving :pastpart t)))
(noun :orth "Prof."
      :features ((ntitle)))
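The entries above can be queried once they are held as nested data. The sketch below transcribes them by hand into Python structures and looks up words by complement or feature. It is illustrative only: the (type, fields) layout and the helper names (names, words_with) are assumptions made for this example, not a Comlex interface.

```python
# Entries transcribed by hand from the examples above. Each entry is
# (type, {keyword: value}); each complement or feature is a list whose
# first element is its name, followed by optional keyword-value pairs.
ENTRIES = [
    ("verb", {":orth": "build",
              ":subc": [["np"], ["np-for-np"],
                        ["part-np", ":adval", ["up"]]]}),
    ("noun", {":orth": "assertion",
              ":subc": [["noun-that-s"], ["noun-be-that-s"]]}),
    ("adverb", {":orth": "even"}),
    ("adjective", {":orth": "above-mentioned",
                   ":features": [["apreq"], ["attributive"]]}),
    ("verb", {":orth": "abbreviate",
              ":subc": [["np-pp", ":pval", ["to"]], ["np"],
                        ["np-np-pred"], ["np-as-np"]],
              ":features": [["vveryving", ":pastpart", "t"]]}),
    ("noun", {":orth": "Prof.",
              ":features": [["ntitle"]]}),
]


def names(fields, keyword):
    """Names of the complements or features stored under a keyword."""
    return [item[0] for item in fields.get(keyword, [])]


def words_with(entries, keyword, name, entry_type=None):
    """Orthographic forms whose entry lists `name` under `keyword`."""
    return [fields[":orth"]
            for etype, fields in entries
            if (entry_type is None or etype == entry_type)
            and name in names(fields, keyword)]


print(words_with(ENTRIES, ":subc", "np", "verb"))  # ['build', 'abbreviate']
print(words_with(ENTRIES, ":features", "ntitle"))  # ['Prof.']
```

Entries without the requested keyword, such as the adverb "even", are simply skipped, which matches the description of such entries as carrying no further specification.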
For more information about Comlex, see the references listed below.
(Information about other natural language processing research at NYU is available from the Proteus Project.)
Catherine Macleod, Adam Meyers, and Ralph Grishman
{macleod,meyers,grishman}@cs.nyu.edu
Macleod, Catherine and Ralph Grishman (1995). Comlex Syntax Reference Manual, Proteus Project, New York University.
Macleod, Catherine, Ralph Grishman and Adam Meyers (1994a). "The Comlex Syntax Project: The First Year", presented at the 1994 ARPA Human Language Technology Workshop.
Macleod, Catherine, Ralph Grishman and Adam Meyers (1994b). "Creating a Common Syntactic Dictionary of English", presented at SNLR: International Workshop on Sharable Natural Language Resources, Nara, August 1994.
Macleod, Catherine, Ralph Grishman and Adam Meyers (1994c). "Developing Multiply Tagged Corpora for Lexical Research", presented at the International Workshop on Directions of Lexical Research, Beijing, China, August 1994.
Meyers, Adam, Catherine Macleod and Ralph Grishman (1994). "Standardization of the Complement Adjunct Distinction", Proteus Project Memorandum 64, Computer Science Department, New York University.
Wolff, Susanne Rohen, Catherine Macleod and Adam Meyers (1993). Comlex Word Classes Manual, Proteus Project, New York University.