[Using NE for text analysis: the Perseus Project; more on such 'low-level' applications next week in a guest lecture by Prof. Dan Melamed.]
Constraints on the words (word senses) allowed as arguments and modifiers of other words (word senses)
(cf. syntactic constraints on lexical items) (J&M pp. 614-619).
Recording every acceptable (sense) word combination is impractical, so ...
We gather word senses into word classes, in a hierarchical structure (tree or directed acyclic graph)
And then record selectional constraints in terms of the class(es) of words acceptable as an argument / modifier of another word (or word class)
These word classes usually correspond to ‘conceptual’ (semantic) classes
Word senses may be organized into a taxonomy, connected by hyponymy ('isa') relations (J&M p. 600).
WordNet is the most widely used taxonomy of English (J&M sec. 16.2); similar taxonomies have been produced for many other languages (see the Global WordNet Association).
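To make the class-based idea concrete, here is a minimal sketch of a taxonomy stored as a directed acyclic graph, with selectional constraints stated in terms of classes. The sense names, class names, and the functions `isa` and `satisfies` are all hypothetical and chosen for illustration; they are not taken from WordNet or from any particular system.

```python
# Toy taxonomy: each sense maps to its direct hypernyms ('isa' parents).
# All names are hypothetical; "animal" has two parents, so this is a DAG, not a tree.
HYPERNYMS = {
    "dog": ["animal"],
    "cat": ["animal"],
    "animal": ["animate", "physical_object"],
    "person": ["animate", "physical_object"],
    "rock": ["physical_object"],
    "idea": ["abstraction"],
    "animate": ["entity"],
    "physical_object": ["entity"],
    "abstraction": ["entity"],
    "entity": [],
}

def isa(sense, cls):
    """True if cls is the sense itself or is reachable through hypernym links."""
    stack, seen = [sense], set()
    while stack:
        node = stack.pop()
        if node == cls:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(HYPERNYMS.get(node, []))
    return False

# Selectional constraints: (head sense, relation) -> class required of the argument.
CONSTRAINTS = {
    ("eat", "subject"): "animate",
    ("eat", "object"): "physical_object",
}

def satisfies(head, relation, arg):
    """True if arg belongs to the class required by (head, relation), or if no constraint is recorded."""
    required = CONSTRAINTS.get((head, relation))
    return required is None or isa(arg, required)

print(satisfies("eat", "subject", "dog"))   # dog isa animal isa animate
print(satisfies("eat", "subject", "rock"))  # rock is not animate
```

Recording one constraint per class, rather than one per word pair, is what keeps the table small: adding a new sense under "animal" makes it an acceptable subject of "eat" with no further entries.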
Associate selectional constraints with semantic roles (or treat different syntactic position --> semantic role assignments as different senses) to resolve semantic role ambiguities.
Direct approach: enumerate all semantic interpretations (logical forms), and see which ones satisfy all constraints
Problem: there may be very many interpretations
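The direct approach can be sketched as follows, assuming that each interpretation is simply one choice of sense per word and that constraints are given as a set of permitted (head sense, relation, argument sense) triples. The sentence, the sense names, and the function `enumerate_interpretations` are hypothetical.

```python
from itertools import product

# Hypothetical sense inventory for "The pitcher threw a ball".
SENSES = {
    "pitcher": ["pitcher_player", "pitcher_jug"],
    "threw": ["throw_propel", "throw_host"],  # as in "threw a party"
    "ball": ["ball_sphere", "ball_dance"],
}

# Permitted (head sense, relation, argument sense) triples (assumed for illustration).
ALLOWED = {
    ("throw_propel", "subject", "pitcher_player"),
    ("throw_propel", "object", "ball_sphere"),
    ("throw_host", "subject", "pitcher_player"),
    ("throw_host", "object", "ball_dance"),
}

# Head-relation-argument links between the words of the sentence.
RELATIONS = [("threw", "subject", "pitcher"), ("threw", "object", "ball")]

def enumerate_interpretations(senses, relations, allowed):
    """Generate every combination of senses and keep those satisfying all constraints."""
    words = list(senses)
    survivors = []
    for combo in product(*(senses[w] for w in words)):
        assignment = dict(zip(words, combo))
        if all((assignment[h], r, assignment[a]) in allowed for h, r, a in relations):
            survivors.append(assignment)
    return survivors

interpretations = enumerate_interpretations(SENSES, RELATIONS, ALLOWED)
print(len(interpretations), "of 8 candidate interpretations survive")
```

The number of candidates is the product of the per-word sense counts (here 2 x 2 x 2 = 8), so it grows exponentially with the number of ambiguous words.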
Factoring ambiguities (esp. word sense ambiguities)
More efficient for highly ambiguous sentences
Use an iterative constraint satisfaction algorithm which eliminates senses
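One way to realize the factored approach is sketched below: keep a set of candidate senses per word, and repeatedly delete any sense that has no compatible partner across some relation, until nothing changes. This is an arc-consistency style procedure offered as an illustration, not necessarily the algorithm intended in the lecture; the sentence, sense names, and the function `eliminate_senses` are hypothetical.

```python
# Hypothetical sense inventory for "The pitcher threw a ball at the batter".
SENSES = {
    "pitcher": ["pitcher_player", "pitcher_jug"],
    "threw": ["throw_propel", "throw_host"],
    "ball": ["ball_sphere", "ball_dance"],
    "batter": ["batter_player"],
}

# Permitted (head sense, relation, argument sense) triples (assumed for illustration).
ALLOWED = {
    ("throw_propel", "subject", "pitcher_player"),
    ("throw_propel", "object", "ball_sphere"),
    ("throw_propel", "target", "batter_player"),
    ("throw_host", "subject", "pitcher_player"),
    ("throw_host", "object", "ball_dance"),
}

RELATIONS = [
    ("threw", "subject", "pitcher"),
    ("threw", "object", "ball"),
    ("threw", "target", "batter"),
]

def eliminate_senses(senses, relations, allowed):
    """Remove senses with no compatible partner on some relation; repeat to a fixed point."""
    domains = {word: set(options) for word, options in senses.items()}
    changed = True
    while changed:
        changed = False
        for head, rel, arg in relations:
            ok_heads = {h for h in domains[head]
                        if any((h, rel, a) in allowed for a in domains[arg])}
            ok_args = {a for a in domains[arg]
                       if any((h, rel, a) in allowed for h in domains[head])}
            if ok_heads != domains[head] or ok_args != domains[arg]:
                domains[head], domains[arg] = ok_heads, ok_args
                changed = True
    return domains

remaining = eliminate_senses(SENSES, RELATIONS, ALLOWED)
for word, options in remaining.items():
    print(word, sorted(options))
```

Each pass costs roughly the number of relations times the number of sense pairs per relation, rather than the product of all sense counts. Eliminations also propagate: once "throw_host" is ruled out by the target relation, "ball_dance" loses its only support and is removed on the next pass. In general such pruning may leave some residual ambiguity, which would then have to be enumerated.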
During or after the parse?
Can apply constraints during parse, blocking the creation of partial parses if they do not satisfy selectional constraints
Can significantly reduce the number of partial parses, but makes each parsing step slower.
Syntactic vs. semantic grammars:
Semantic grammars provide a simple approach to limited sublanguages (capture both syntactic and semantic constraints in a single component)
- convenient for constructs which fall outside general language syntax
But they lose the power of syntactic generalization: each semantic pattern must appear in each of its syntactic forms (active, passive, question, …), so they are cumbersome for broad-coverage systems.
Introducing semantic classes into Jet: the concept hierarchy.
For restricted sublanguages (esp. technical domains), selectional restrictions may be quite ‘sharp’ and can be captured by manual text analysis
For broader coverage, capturing selectional constraints is very difficult
Can be acquired from tree banks (hand-parsed corpora), or
Learned from unambiguous examples in machine-parsed corpora
Corpus-trained approaches allow one to gather statistics on selection, making them selectional preferences rather than strict constraints
- compute probability of each head-relation-argument triple
- can generalize using a semantic hierarchy (e.g., WordNet, thesaurus)
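The two points above can be sketched with simple relative-frequency estimates. The triples, the word-to-class mapping, and the function names `p_arg` and `p_class` are hypothetical; a real system would take the triples from a parsed corpus and the classes from a resource such as WordNet, and would typically add smoothing.

```python
from collections import Counter

# Hypothetical head-relation-argument triples, as if extracted from a parsed corpus.
TRIPLES = [
    ("eat", "object", "apple"),
    ("eat", "object", "apple"),
    ("eat", "object", "bread"),
    ("eat", "object", "idea"),   # rare or erroneous usage, kept to show low probability
    ("drink", "object", "water"),
]

# Hypothetical one-level semantic hierarchy: word -> class.
WORD_CLASS = {
    "apple": "food", "bread": "food", "pear": "food",
    "water": "beverage", "idea": "abstraction",
}

triple_counts = Counter(TRIPLES)
context_counts = Counter((head, rel) for head, rel, _ in TRIPLES)
class_counts = Counter((head, rel, WORD_CLASS[arg]) for head, rel, arg in TRIPLES)

def p_arg(head, rel, arg):
    """Relative-frequency estimate of P(arg | head, rel)."""
    total = context_counts[(head, rel)]
    return triple_counts[(head, rel, arg)] / total if total else 0.0

def p_class(head, rel, arg):
    """Estimate of P(class of arg | head, rel), which generalizes to unseen words."""
    total = context_counts[(head, rel)]
    return class_counts[(head, rel, WORD_CLASS.get(arg))] / total if total else 0.0

print(p_arg("eat", "object", "apple"))    # 0.5
print(p_arg("eat", "object", "pear"))     # 0.0, "pear" never seen with "eat"
print(p_class("eat", "object", "pear"))   # 0.75, via the class "food"
print(p_class("eat", "object", "idea"))   # 0.25
```

Because the result is a probability rather than a yes/no decision, an unusual argument such as "idea" is dispreferred rather than rejected outright, which is the sense in which these are preferences and not strict constraints.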
Corpus-trained methods can also be used to acquire statistics for word-sense disambiguation (J&M sec. 17.1, 17.2)
- largely dependent on corpora hand-tagged with word senses