Lecture 7: Specialized Search Engines

Wei Xu's presentation on WordNet and other online NLP tools

Specialized Search Engines

Advantages of specialized search engines include:


Collective business site

Collective business web sites of all kinds (bookfinder.com, expedia.com, Google Shopping, etc.)

Merchant sends database (and thereafter updates) in a uniform format to site. Site collates, enables a uniform search engine.

Similarly for engines for restricted categories of merchandise. (Cars, real estate, books, etc.) Restricting the categories enables domain-specific query attributes, drop-down value lists, appropriate presentation and interaction, greater precision.

More of a database than a search engine, but does support keyword matching.
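The collate-then-search idea above can be sketched as follows. This is only an illustration under stated assumptions: the field names (title, category, price), the merchant names, and the functions collate and search are hypothetical, not the format of any real site.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    merchant: str
    title: str
    category: str
    price: float


def collate(feeds):
    """Merge per-merchant feeds (lists of dicts in the shared format) into one catalog."""
    catalog = []
    for merchant, records in feeds.items():
        for r in records:
            catalog.append(
                Listing(merchant, r["title"], r["category"], float(r["price"]))
            )
    return catalog


def search(catalog, keywords="", category=None, max_price=None):
    """Filter on structured attributes, then require every keyword
    to occur (as a substring) in the title. Cheapest results first."""
    words = keywords.lower().split()
    hits = []
    for item in catalog:
        if category is not None and item.category != category:
            continue
        if max_price is not None and item.price > max_price:
            continue
        title = item.title.lower()
        if all(w in title for w in words):
            hits.append(item)
    return sorted(hits, key=lambda item: item.price)


# Hypothetical feeds from two merchants, already in the uniform format.
feeds = {
    "shop_a": [
        {"title": "Introduction to Algorithms", "category": "books", "price": "80.00"},
        {"title": "Used Honda Civic", "category": "cars", "price": "9500"},
    ],
    "shop_b": [
        {"title": "Introduction to Algorithms, 3rd ed.", "category": "books", "price": "65.50"},
    ],
}
catalog = collate(feeds)
results = search(catalog, keywords="algorithms", category="books", max_price=100)
for r in results:
    print(r.merchant, r.title, r.price)
```

The category and max_price arguments stand in for the drop-down value lists; the keyword test is the only part that resembles a conventional search engine.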


CiteSeerX

Precursor to Google Scholar, but has the advantage that its methods have been published. Collects CS research papers; structures them by citation.

Collection: Uses a search engine with keywords like "publications" or "papers" as the starting point for a crawl. Also uses known online journals and proceedings.

Single document processing
Converts PostScript and PDF to plain text. Translates to various formats.
Extracts index terms.
Extracts fields: Author, Title, Date, Pub, Abstract.
Identifies bibliography and references.
Identifies citations in text.
Locates author home pages.
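Two of the steps above (identifying the bibliography, identifying citations in text) can be sketched with regular expressions. This is a toy illustration, not CiteSeerX's actual method: it assumes the text has already been converted to plain text, that the bibliography starts at a line reading "References" or "Bibliography", and that entries and citations use the numeric "[n]" style. The function names and the sample paper are made up.

```python
import re


def split_bibliography(text):
    """Split plain text at the last 'References'/'Bibliography' heading line.
    Returns (body, entries) where entries maps reference number -> entry text."""
    headings = list(
        re.finditer(r"(?im)^[ \t]*(references|bibliography)[ \t]*$", text)
    )
    if not headings:
        return text, {}
    last = headings[-1]
    body, bib = text[: last.start()], text[last.end():]
    # Assumption: one entry per line, written as "[1] Author. Title. Year."
    entries = re.findall(r"(?m)^\[(\d+)\][ \t]*(.+)$", bib)
    return body, {int(n): entry.strip() for n, entry in entries}


def find_citations(body):
    """Return (reference_number, preceding_context) for each numeric
    citation such as [1] or [2, 3] in the body text."""
    cites = []
    for m in re.finditer(r"\[(\d+(?:\s*,\s*\d+)*)\]", body):
        context = body[max(0, m.start() - 40): m.start()].strip()
        for n in re.split(r"\s*,\s*", m.group(1)):
            cites.append((int(n), context))
    return cites


# Made-up sample document.
sample = """A Sample Paper
Graph search is well studied [1]. Later work [2, 3] extends it.

References
[1] A. Author. Graph Search. 1990.
[2] B. Writer. More Search. 1995.
[3] C. Scholar. Even More. 1999.
"""
body, refs = split_bibliography(sample)
cited_numbers = [n for n, _ in find_citations(body)]
print(cited_numbers)
print(refs[2])
```

Real papers use many citation styles (author-year, superscripts, footnotes), so a working system needs far more than one pattern.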

Cross-document processing
Identifies each reference with the document it cites. Note: the form of a reference varies widely, and references contain typos and errors.
Identifies common references to an external document (i.e. one not online).
Textual similarity between documents.
Co-citation similarity between documents.
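A minimal sketch of two of these steps: matching references that differ in form or contain typos, and counting co-citations. The approach shown (normalize the strings, then compare with the standard library's difflib) and the 0.8 threshold are assumptions chosen for illustration, not the published CiteSeerX algorithm. The reference strings and paper ids are made up.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def normalize(ref):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    ref = re.sub(r"[^a-z0-9 ]+", " ", ref.lower())
    return " ".join(ref.split())


def same_work(ref_a, ref_b, threshold=0.8):
    """Guess whether two reference strings denote the same document.
    The threshold is an arbitrary illustrative value."""
    matcher = SequenceMatcher(None, normalize(ref_a), normalize(ref_b), autojunk=False)
    return matcher.ratio() >= threshold


def cocitation_counts(citing):
    """citing maps a paper id to the set of document ids it cites.
    Returns, for each unordered pair of documents, the number of
    papers that cite both."""
    counts = Counter()
    for cited in citing.values():
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


# Two forms of the same made-up reference (the second has a typo) and an unrelated one.
r1 = "A. Author and B. Writer. Searching Structured Catalogs. Proc. Example Conf., 1998."
r2 = "A Author & B Writer, Searching structurd catalogs, Proc Example Conf 1998"
r3 = "C. Scholar. Crawling the web for papers. Example Journal, 2001."
print(same_work(r1, r2), same_work(r1, r3))

citing = {"p1": {"x", "y", "z"}, "p2": {"x", "y"}, "p3": {"y", "z"}}
pairs = cocitation_counts(citing)
print(pairs[("x", "y")], pairs[("x", "z")])
```

Two documents with a high co-citation count are likely related even when their texts share few words, which is why it complements textual similarity.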

Query results
Quotation from document with snippet.
Order by decreasing number of citations.

Document summary
Title, date, authors, abstract, citations to the paper, similar docs at sentence level, bibliography, similar docs based on text, related docs from co-citation, histogram by year of the number of citing articles.
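The citation-count ordering and the per-year histogram can both be computed from a simple citation table. The sketch below assumes hypothetical inputs: citations maps a document id to the list of papers citing it, and year_of maps a citing paper to its publication year.

```python
from collections import Counter


def rank_by_citations(hits, citations):
    """Order matching documents by decreasing number of citing papers."""
    return sorted(hits, key=lambda doc: len(citations.get(doc, [])), reverse=True)


def citing_histogram(doc, citations, year_of):
    """Count the papers citing doc, grouped by publication year (ascending)."""
    counts = Counter(year_of[c] for c in citations.get(doc, []))
    return dict(sorted(counts.items()))


# Made-up citation data.
citations = {"d1": ["a", "b", "c"], "d2": ["a"], "d3": []}
year_of = {"a": 1998, "b": 1999, "c": 1999}

ranked = rank_by_citations(["d2", "d3", "d1"], citations)
histogram = citing_histogram("d1", citations, year_of)
print(ranked)
print(histogram)
```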

Other Specialized Search