Lecture 8: Retrieving non-text
Invisible web (conclusion)
Grey line between invisible web and surface web.
If user makes a query to search engine, and then saves a link to the URL
of the result in a surface web page, then a crawler can follow that
link, even if the page is dynamically generated.
Pages that are too deep in a site may not be indexed, even if they are
high-quality, static pages reachable through standard hyperlinks. E.g.
of 248,706 pages of "Open
Directory", AltaVista indexed only 17,833 = 7.2%; Fast and Northern Light
had substantially fewer.
Method 1: Keyword search. Most standard search engines.
Captions, URLs, anchors.
Content-based image retrieval
Spiders crawling for images
Analysis in terms of image features
Analysis carried out for six subimages: whole, center, 4 quadrants
Color and Texture analysis
Image mapped to a 3-D color space with psychological support (i.e. perceptually motivated).
Each dimension divided into 4, so 64 (=4x4x4) bins.
Texture characterized in terms of 16 parameters, each divided
into 4 bins.
Tamura's (1978) visual texture properties (not used in ImageRover, so far as I
Each image is a point in 768-dimensional space
(= 6 subimages * (64 color bins + 64 texture bins)).
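
A minimal sketch of the binning arithmetic above, not the ImageRover code. It assumes each pixel is given as three 8-bit coordinates in whatever color space is used; the names color_bin, color_histogram and feature_dimension are made up for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # three 8-bit color coordinates (an assumption)

BINS_PER_AXIS = 4    # each color dimension divided into 4
TEXTURE_PARAMS = 16  # texture parameters, each divided into 4 bins
SUBIMAGES = 6        # whole, center, 4 quadrants


def color_bin(pixel: Pixel) -> int:
    """Map a pixel to one of 4*4*4 = 64 bins by quantizing each axis into 4."""
    a, b, c = (min(v * BINS_PER_AXIS // 256, BINS_PER_AXIS - 1) for v in pixel)
    return (a * BINS_PER_AXIS + b) * BINS_PER_AXIS + c


def color_histogram(pixels: List[Pixel]) -> List[float]:
    """Normalized 64-bin color histogram for one subimage."""
    hist = [0.0] * BINS_PER_AXIS ** 3
    for p in pixels:
        hist[color_bin(p)] += 1.0
    total = len(pixels) or 1  # avoid division by zero on an empty subimage
    return [h / total for h in hist]


def feature_dimension() -> int:
    """Length of the full feature vector: 6 * (64 color + 64 texture)."""
    color = BINS_PER_AXIS ** 3
    texture = TEXTURE_PARAMS * BINS_PER_AXIS
    return SUBIMAGES * (color + texture)
```

Concatenating the color and texture histograms of the six subimages gives the single vector whose length feature_dimension() reports.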
Data structure: approximate k-d tree.
Given distance D, can retrieve all instances closer than D and exclude
all instances farther than D(1+e) in polylogarithmic time.
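
The guarantee above can be stated as a checkable contract. The sketch below is not a k-d tree; it is a brute-force checker (hypothetical names classify and is_valid_answer) that says whether a returned set of points is an acceptable answer to an approximate range query. Points between D and D(1+e) may go either way.

```python
import math
from typing import Sequence, Set

Point = Sequence[float]


def classify(query: Point, point: Point, d: float, eps: float) -> str:
    """Label a point according to the approximate range-query guarantee."""
    dist = math.dist(query, point)  # Euclidean distance (Python 3.8+)
    if dist < d:
        return "must_return"
    if dist > d * (1 + eps):
        return "must_exclude"
    return "either"  # in the slack band between D and D(1+e)


def is_valid_answer(answer: Set[int], points: Sequence[Point],
                    query: Point, d: float, eps: float) -> bool:
    """True if the index set `answer` satisfies the guarantee for this query."""
    for i, p in enumerate(points):
        label = classify(query, p, d, eps)
        if label == "must_return" and i not in answer:
            return False
        if label == "must_exclude" and i in answer:
            return False
    return True
```

The slack band is what lets the tree prune subtrees cheaply: it never has to decide exactly which side of D a borderline point lies on.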
Relevance feedback. User shown random images, asked for most similar.
Shown similar images, feedback on most relevant, etc.
Indexes by: keyword, BW/color, image dimension, number of faces,
size of largest face.
Extremely unsystematic. Sporadic surprisingly strong results.
Unlike words, there are no easily computed features of an image
that approximate semantic content.
Characterize Web services for purpose of
No attempt to do this automatically; extensive, manually written
descriptions. (Presumably, business web service providers have both
the incentive and the manpower to do this.)
- execution monitoring
All XML based
Two major directions of research.
Both characterized by immensely detailed and rather abstract standards
and remarkably verbose languages.
Web Services Languages
Various consortia of companies; primarily IBM and Microsoft.
Progressively more abstract descriptions.
Advantage: Working implementations
Disadvantage: Far from semantic content.
- Semantic Web
Begin with problem of semantics.
Advantage: Serious consideration of semantics
Disadvantage: Limited implementation
Web Services Languages
- SOAP (Simple Object Access Protocol).
XML-based protocol for remote procedure call.
WSDL (Web Service Description Language)
Web Services Essentials, Chapter 6: WSDL Essentials,
by Ethan Cerami.
Characterizes one functionality of a Web service. Layered on SOAP.
- UDDI (Universal Description, Discovery, and Integration Service)
Registry (Yellow pages) for web services. Layered on SOAP
Generally considered a failure (so far)
- Survey found 2/3 of UDDI entries unusable. (Invalid SOAP, non-existent
- Too elaborate.
- WSIL -- Web Services Inspection Language
Each company posts its own self-description at "inspection.wsil" on
Web site. Collected by standard search engines. "Lightweight" and extensible.
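
SOAP is the base layer for the languages listed above, so a concrete message may help. A minimal sketch, using only the Python standard library, of building a SOAP 1.1 request envelope. The service namespace urn:example:quotes, the operation GetPrice and the parameter item are invented for illustration and belong to no real service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:quotes"  # hypothetical service namespace


def build_request(operation: str, params: dict) -> str:
    """Wrap a remote procedure call as Envelope/Body/operation/params."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the call element.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("GetPrice", {"item": "car"}))
```

Even this one-argument call needs an envelope, a body and two namespaces, which gives some feel for the verbosity noted earlier.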
- Types -- container for data type definitions
- Message -- abstract, typed definition of data being communicated.
- Operation -- abstract description of action supported by service
- Port Type -- abstract set of operations supported by one or more
- Binding -- concrete protocol and data format specification for
- Port -- single endpoint = binding + network address.
- Service -- collection of related endpoints.
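
The WSDL elements above nest from abstract to concrete. A rough sketch of that nesting as Python dataclasses, assuming a simplified model in which types are plain names; the field names and the helper endpoints_for are illustrative, not part of WSDL.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    name: str
    parts: Dict[str, str]  # part name -> type name (stands in for Types)


@dataclass
class Operation:
    name: str
    input: Message
    output: Message


@dataclass
class PortType:
    name: str
    operations: List[Operation]  # abstract set of operations


@dataclass
class Binding:
    name: str
    port_type: PortType
    protocol: str  # concrete protocol, e.g. "SOAP"


@dataclass
class Port:
    name: str
    binding: Binding
    address: str  # endpoint = binding + network address


@dataclass
class Service:
    name: str
    ports: List[Port] = field(default_factory=list)

    def endpoints_for(self, operation_name: str) -> List[str]:
        """Addresses of all ports whose port type offers the operation."""
        return [p.address for p in self.ports
                if any(op.name == operation_name
                       for op in p.binding.port_type.operations)]
```

Everything down to PortType is abstract; only Binding and Port commit to a protocol and an address, so one abstract interface can be exposed at several endpoints.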
WSIL -- Web Services Inspection Language
Introduction to WSIL
WSIL: Do we need another Web Services Specification? Tarak Modi
DAML -- DARPA Agent Markup Language
OIL --- Ontology Interchange Language
Built on DAML+OIL.
DAML-S: Web Service Description for the Semantic Web
Language for Semantic Web Services: DAML-S
- Service Profile: What does the service require of agents and provide
- Service Model: How does it work?
- Service Grounding: Connection to WSDL representation.
- Description of service (for indexing and retrieval)
- intendedPurpose "high-level description of what constitutes successful
- requestedBy -- Actor
- providedBy -- Actor: physicalAddress, WebURL, name, phone, email, and fax.
- functional behavior -- rather important, but no information.
- functional attributes
- serviceParameter (e.g. averageResponseTime)
- communicationThru (e.g. SOAP)
- serviceCategory (e.g. Products, Information Services)
- qualityGuarantees (e.g. "response within 3 minutes")
- qualityRating (e.g. "Dun and Bradstreet Rating")
Service Model -- Process
Process operators: Sequence, Concurrent, Split, Split+Join, Unordered,
Choice, If-Then-Else, Repeat-Until, Repeat-While.
Abstraction operators: Collapse (compound process -> blackbox atomic
process); Expand (the inverse).
by Paolucci et al.
Characterize service advertisement by types of input demanded, type
of output provided.
Characterize request by types of input supplied, type of output needed.
- Every input demanded by advertisement must match input supplied by
request.
- Every output needed by request must be provided by advertisement.
- Match can be exact, or approximate (superset or subset).
- Matching advertisements listed in decreasing order of quality of
match. Match score on outputs more important than match score on inputs.
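
The matching rules above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the algorithm as published by Paolucci et al.: the class hierarchy PARENT is a toy, superset and subset matches are collapsed into a single APPROX score, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy class hierarchy (an assumption): child -> parent.
PARENT = {"station wagon": "car", "car": "vehicle", "price": "money"}

EXACT, APPROX, FAIL = 2, 1, 0


def ancestors(cls: str) -> List[str]:
    """The class itself followed by its chain of superclasses."""
    chain = [cls]
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def degree(a: str, b: str) -> int:
    """EXACT if equal, APPROX if one class subsumes the other, else FAIL."""
    if a == b:
        return EXACT
    if a in ancestors(b) or b in ancestors(a):
        return APPROX
    return FAIL


@dataclass
class Signature:
    name: str
    inputs: List[str]
    outputs: List[str]


def cover(needed: List[str], available: List[str]) -> int:
    """Worst, over the needed types, of the best match among available types."""
    score = EXACT
    for n in needed:
        best = max((degree(n, a) for a in available), default=FAIL)
        score = min(score, best)
    return score


def match(request: Signature, advert: Signature) -> Optional[Tuple[int, int]]:
    out = cover(request.outputs, advert.outputs)  # outputs needed by request
    inp = cover(advert.inputs, request.inputs)    # inputs demanded by advert
    if out == FAIL or inp == FAIL:
        return None
    return (out, inp)  # output score first, so it dominates the ordering


def rank(request: Signature, adverts: List[Signature]) -> List[str]:
    scored = [(match(request, a), a.name) for a in adverts]
    kept = [(s, n) for s, n in scored if s is not None]
    kept.sort(key=lambda sn: sn[0], reverse=True)
    return [n for _, n in kept]
```

Because the score is the pair (output, input) compared lexicographically, an advertisement with an exact output and an approximate input outranks one with an approximate output and an exact input.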
Major Problem: Only classes, no relations.
E.g. "car" matches "station wagon" OK.
But no way to distinguish a request for "Mother's maiden name" from
"My last name".
No representation of relation between input and output.
E.g. if input is "car" and output is "money", representation does not
distinguish between price, rental price, lease price, average cost
of maintenance per year, average cost of gas per mile ...
Minor problem: requestor required to predict every damn fool piece of
information that a service might ask for.