Keiji Shinzato and Kentaro Torisawa. Extracting Hyponyms of Prespecified Hypernyms from Itemizations and Headings in Web Documents. COLING 2004.
Systems for capturing hyponymy generally combine two
approaches: collecting the relations directly and augmenting them
with words similar to the hyponyms (to improve coverage).
We saw examples already in the work of Snow et al.
Distributional similarity: based on the idea that words that occur
in similar contexts have related meanings ("distributional hypothesis").
Create a context vector for each
word, using as contexts either the adjacent words on each side or the governor in a
dependency tree. Typical vector values are based on pointwise mutual
information. Compute vector similarity, e.g. using the cosine metric.
(Word similarity measures are discussed in Jurafsky and Martin, section 20.7, pp. 658-667.)
(Patrick Pantel et al. 2009: Web-scale distributional similarity and entity set expansion. Proc. EMNLP 2009.)
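The distributional-similarity recipe above (context vectors, PMI-based values, cosine) can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names and the toy corpus are invented here, contexts are adjacent words within a fixed window (not dependency governors), and negative PMI values are clipped to zero (positive PMI), which is one common variant and not necessarily the weighting used in the cited work.

```python
import math
from collections import Counter, defaultdict

def context_counts(sentences, window=2):
    """Count, for each word, the words seen within `window` positions on either side."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

def ppmi_vectors(counts):
    """Turn raw co-occurrence counts into sparse positive-PMI vectors (assumed weighting)."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    ctx_totals = Counter()
    for ctx in counts.values():
        ctx_totals.update(ctx)
    vectors = {}
    for w, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log2(n * total / (word_totals[w] * ctx_totals[c]))
            if pmi > 0:  # keep only positive associations
                vec[c] = pmi
        vectors[w] = vec
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v[c] for c, x in u.items() if c in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus, invented for illustration only.
sentences = [
    "we drink red wine".split(),
    "we drink cold beer".split(),
    "they drink red wine".split(),
    "they drink cold beer".split(),
    "we drive fast cars".split(),
]
vectors = ppmi_vectors(context_counts(sentences))
print(cosine(vectors["wine"], vectors["beer"]))  # shares the context "drink"
print(cosine(vectors["wine"], vectors["cars"]))  # no shared contexts
```

In this toy corpus "wine" and "beer" come out as more similar than "wine" and "cars" because both occur near "drink"; on real data the vectors would be built from a much larger corpus.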
Lexico-semantic patterns: extract co-ordinate terms from text
using patterns such as "X, Y, and/or Z"
(Sarmento, L.; Jijkoun, V.; de Rijke, M.;
and Oliveira, E. “More
like these”: growing entity classes from seeds. In Proceedings of
Lexical resources (WordNet,
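As an illustration of the co-ordination pattern "X, Y, and/or Z" mentioned above, the following sketch pulls co-ordinate terms out of raw text with a regular expression. It is a deliberately crude approximation and not the method of the cited paper: terms are assumed to be single words, at least three co-ordinated items are required, and the names `TERM`, `COORD`, `coordinate_groups` and `coordinate_terms_of` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a term is a single word (letters, digits, hyphens).
TERM = r"[A-Za-z][\w-]*"
# "X, Y[, ...][,] and|or Z": two or more comma-separated terms, then a final conjunct.
COORD = re.compile(rf"\b({TERM}(?:, {TERM})+),? (?:and|or) ({TERM})")

def coordinate_groups(text):
    """Return one list of co-ordinated terms per pattern match."""
    groups = []
    for match in COORD.finditer(text):
        groups.append(match.group(1).split(", ") + [match.group(2)])
    return groups

def coordinate_terms_of(seed, groups):
    """Count how often each term is co-ordinated with the seed."""
    counts = Counter()
    for group in groups:
        if seed in group:
            counts.update(term for term in group if term != seed)
    return counts

# Toy text, invented for illustration only.
text = ("We visited Paris, London, and Rome last year. "
        "Flights to Rome, Madrid or Paris are cheap in winter.")
groups = coordinate_groups(text)
print(groups)
print(coordinate_terms_of("Paris", groups).most_common())
```

Terms that are repeatedly co-ordinated with a seed are candidates for the same class. A realistic system would match noun phrases, not single words, and would filter the noisy matches.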
Google Sets (labs.google.com/sets) is a patented
approach (patent 7,350,187) for searching lists on
the web to find related items. When the user inputs seeds, the
collected lists are searched for the seed items; the matching lists
are then weighted and the items on them merged.
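The search, weight and merge steps just described can be sketched roughly as below. The weighting scheme is an assumption made purely for illustration (each list is weighted by the fraction of seeds it contains, and an item's score is the sum of the weights of the lists it appears on); the notes do not specify the actual formula, and the function name `expand_from_lists` and the toy lists are hypothetical.

```python
from collections import defaultdict

def expand_from_lists(seeds, lists):
    """Rank non-seed items by the total weight of the seed-bearing lists they occur on."""
    seeds = set(seeds)
    scores = defaultdict(float)
    for items in lists:
        items = set(items)
        overlap = len(seeds & items)
        if overlap == 0:
            continue  # list mentions no seed: ignore it
        weight = overlap / len(seeds)  # assumed weighting: fraction of seeds covered
        for item in items - seeds:
            scores[item] += weight
    # Highest score first; ties broken alphabetically for a stable ordering.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy "web lists", invented for illustration only.
lists = [
    ["red", "green", "blue", "yellow"],
    ["red", "blue", "orange"],
    ["blue", "jazz", "rock"],
    ["apple", "pear"],
]
ranked = expand_from_lists(["red", "blue"], lists)
print(ranked)
```

Items from lists that contain both seeds (the colour lists) outrank items from a list that contains only one of them, and lists with no seed contribute nothing.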
SEAL (Richard Wang and
William Cohen. Language-Independent
Set Expansion of Named Entities using the Web. ICDM 2007) sought
to generalize and improve on Google Sets. It begins by using the
seeds as search terms to retrieve a set of web pages. It then
builds a page-specific wrapper for the list items (so it is not
dependent on a specific list-markup format), and uses the wrapper to
extract the items from each page. These are then combined by a
ASIA (Richard Wang and William W. Cohen. Automatic Set Instance Extraction using the Web. ACL-IJCNLP 2009) combines an improved version of SEAL with Hearst-style patterns for finding hyponyms.
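The idea of a page-specific wrapper, as described for SEAL above, can be illustrated with a much simplified sketch. Here a wrapper is assumed to be just a pair of character strings: the longest left context and the longest right context shared by the first occurrence of every seed on the page. This is a simplification for illustration, not the published algorithm; the function names and the toy page are hypothetical.

```python
import os
import re

def common_suffix(strings):
    """Longest string that every input ends with."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]

def induce_wrapper(page, seeds):
    """Return (left, right) contexts shared by the first occurrence of each seed, or None."""
    lefts, rights = [], []
    for seed in seeds:
        i = page.find(seed)
        if i < 0:
            return None  # a seed is missing from this page
        lefts.append(page[:i])
        rights.append(page[i + len(seed):])
    left, right = common_suffix(lefts), os.path.commonprefix(rights)
    if not left or not right:
        return None
    return left, right

def apply_wrapper(page, wrapper):
    """Extract every string bracketed by the wrapper's left and right contexts."""
    left, right = wrapper
    # The right context is a lookahead so it can overlap the next item's left context.
    pattern = re.compile(re.escape(left) + r"(.+?)(?=" + re.escape(right) + ")", re.DOTALL)
    return [m.group(1) for m in pattern.finditer(page)]

# Toy page, invented for illustration only.
page = (
    "<ul>"
    "<li><b>Ford</b> (1903)</li>"
    "<li><b>Toyota</b> (1937)</li>"
    "<li><b>Honda</b> (1948)</li>"
    "<li><b>Nissan</b> (1933)</li>"
    "</ul>"
)
wrapper = induce_wrapper(page, ["Ford", "Honda"])
print(wrapper)
print(apply_wrapper(page, wrapper))
```

Because the contexts are learned from the page itself at the character level, nothing in the sketch depends on the items being marked up as an HTML list; the same code would work for table cells or any other repeated template. The contexts can also overfit to the seeds chosen, as the learned right context on this toy page shows.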
Marius Pasca and Benjamin Van Durme.
Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs.