Web Images

Image Retrieval from the World Wide Web: Issues, Techniques, and Systems M.L. Kherfi, D. Ziou, and A. Bernardi

General issues

Categorize images

This can largely be done with reasonable accuracy on the basis of easily determined image characteristics. For example, trademarks tend to have regions of simple structure, uniform color, and high contrast. Icons are small, almost by definition. There are reasonably accurate filters for nude photographs based on color distribution and shapes. Etc.

Duplicate images or image parts

Efficient Near-Duplicate Detection and Sub-Image Detection

Find duplicate images despite changes in format, resolution, cropping, merging, or geometric transformation.

Method: Compute transformation-invariant image features of subregions of the image. Use "locality sensitive hashing" for approximate similarity retrieval.
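As a rough illustration of the hashing step, here is a minimal sketch of one common locality sensitive hashing scheme (random hyperplanes, which approximates cosine similarity). It is not necessarily the scheme used in the paper. The class name LSHIndex, the parameters n_bits and n_tables, and the idea that each image region has already been reduced to a feature vector are all assumptions made for the example.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


class LSHIndex:
    """Hypothetical index: several hash tables, each with its own hyperplanes."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        self.tables = [
            (make_hyperplanes(dim, n_bits, seed + t), defaultdict(list))
            for t in range(n_tables)
        ]

    def add(self, key, vec):
        """Store key in the bucket selected by vec's signature in every table."""
        for planes, buckets in self.tables:
            buckets[signature(vec, planes)].append(key)

    def candidates(self, vec):
        """Keys sharing a bucket with vec in at least one table."""
        found = set()
        for planes, buckets in self.tables:
            found.update(buckets.get(signature(vec, planes), []))
        return found
```

Only the candidates returned by the index need to be compared exactly against the query features, which is what makes the retrieval approximate but fast.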

Text-based query

Associate text with image:

Keywords for a given image file from:

Weight by "closeness" of text to image, and text characteristics (e.g. font size).

Google generally does well with this, but reasonably often makes mistakes that can only be understood if you look at the embedding page.

Category specific image search

E.g. an early version of DIOGENES searches for celebrities' faces (Diogenes: a web search agent for person images, Y.A. Aslandogan and C.T. Yu). Method: submit a text search to Google, collect the pages with images, keep the images that are pictures of faces, and check that each image is associated with the query term in the text of the document.

Recent Work

Recent work on web images tends to be characterized by



Matching Words and Pictures K. Barnard et al.

User studies show a large disparity between user needs and what technology supplies (Armitage and Enser 1997, Enser 1993, 1995). This work makes hair-raising reading; an example is a request to a stock photo library for "Pretty girl doing something active, sporty in a summery setting, beach -- not wearing lycra, exercise clothes -- more relaxed in tee-shirt. Feature is about deodorant, so girl should look active -- not sweaty, but happy, healthy, carefree -- nothing too posed or set up -- nice and natural looking."
The paper cites various studies of requests to image collections.

80 Million Tiny Images: A Large Dataset for Non-parametric Object and Scene Recognition A. Torralba, R. Fergus, and W. Freeman

With overwhelming amounts of data, many problems can be solved without the need for sophisticated algorithms.

32 x 32 color pictures are generally recognizable. Lower resolution does not work. Each image is a vector of 3072 dimensions (1024 pixels x 3 colors), i.e. 3072 bytes per image.

General idea: Collect from the web a vast collection of annotated images, and use nearest neighbors to classify.

Data Set Collection

Nearest neighbors.
D(I_1, I_2) = \sum_{x,y,c} [I_1(x,y,c) - I_2(x,y,c)]^2.
D_{Warp}(I_1, I_2) = \min_T D(I_1, T(I_2)), where T is a combination of translation, scaling, and horizontal mirror.
D_{Shift}(I_1, I_2) further allows shifts in X and Y of individual pixels by 5 pixels.
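The first two distances can be sketched as follows. This is a simplified illustration, not the authors' implementation: images are assumed to be nested lists img[y][x] = (r, g, b), the search over T covers only small translations and the horizontal mirror (scaling is omitted), and the max_shift parameter and the edge-repeating boundary rule are assumptions.

```python
def ssd(img1, img2):
    """Sum of squared differences over all pixels and color channels (D above)."""
    return sum(
        (a - b) ** 2
        for row1, row2 in zip(img1, img2)
        for px1, px2 in zip(row1, row2)
        for a, b in zip(px1, px2)
    )


def mirror(img):
    """Horizontal mirror: reverse every row."""
    return [row[::-1] for row in img]


def translate(img, dx, dy):
    """Shift the image by (dx, dy), repeating edge pixels at the border."""
    h, w = len(img), len(img[0])
    return [
        [img[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
         for x in range(w)]
        for y in range(h)
    ]


def warp_distance(img1, img2, max_shift=1):
    """Minimum ssd over mirrored and translated versions of img2.

    Scaling, which the notes also list as part of T, is left out of this sketch.
    """
    best = ssd(img1, img2)
    for candidate in (img2, mirror(img2)):
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                best = min(best, ssd(img1, translate(candidate, dx, dy)))
    return best
```

Because the identity transformation is always among the candidates, warp_distance can never exceed ssd for the same pair of images.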

Use of WordNet
Convert WordNet into a tree of terms by taking the most common meaning of each word and using the hypernym (supercategory) relationship. Then, when searching for a category, you can include all words that are subcategories; e.g. if looking for "person", include "artist", "politician", "kid" etc.
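The subcategory expansion can be sketched with a toy child-to-parent table. The table TOY_HYPERNYM below is invented for illustration and is not taken from WordNet; the function names are likewise hypothetical.

```python
# Toy hypernym table (child -> parent). Illustrative only, not real WordNet data.
TOY_HYPERNYM = {
    "artist": "person",
    "politician": "person",
    "kid": "person",
    "painter": "artist",
    "person": "organism",
    "dog": "animal",
    "animal": "organism",
}


def children_map(hypernym):
    """Invert the child -> parent table into parent -> list of children."""
    children = {}
    for child, parent in hypernym.items():
        children.setdefault(parent, []).append(child)
    return children


def subcategories(term, hypernym):
    """Return term together with every term below it in the tree."""
    children = children_map(hypernym)
    result, stack = [], [term]
    while stack:
        node = stack.pop()
        result.append(node)
        stack.extend(children.get(node, []))
    return result
```

A query for "person" would then be issued for every term in subcategories("person", TOY_HYPERNYM).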

Annotation. Collect the nearest neighbors. Each neighbor "votes" for its label plus all of that label's supercategories.
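A minimal sketch of the voting step, assuming the neighbors' labels have already been retrieved and that the term tree is available as a child-to-parent table. The table TOY_HYPERNYM and the function names are made up for the example.

```python
from collections import Counter

# Toy hypernym table (child -> parent). Illustrative only, not real WordNet data.
TOY_HYPERNYM = {
    "artist": "person",
    "kid": "person",
    "person": "organism",
    "dog": "animal",
    "animal": "organism",
}


def ancestors(label, hypernym):
    """The label itself followed by all of its supercategories, up to the root."""
    chain = [label]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain


def vote(neighbor_labels, hypernym):
    """Each neighbor casts one vote for its label and for every supercategory."""
    votes = Counter()
    for label in neighbor_labels:
        votes.update(ancestors(label, hypernym))
    return votes
```

Votes accumulate toward the root, so a general term such as "person" can win even when the neighbors disagree on the specific subcategory.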

Person detection
Find the 80 nearest neighbors and see how many are labelled "person" or (more usually) a subcategory of "person". Note: this works better for pictures where the person is large, (a) because such pictures are easier to match and (b) because the label is more likely to refer to the person.

Person localization. Extract multiple crops of the picture, renormalize each to 32x32, and see which crops match.

Scene recognition. Collect votes among nearest neighbors for subcategories of "location" (e.g. "landscape", "workplace", "city" etc.)

Image colorization. Given a grey scale image, find nearest neighbors in grey, apply average color.
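A rough sketch of the colorization idea, under several assumptions: images are nested lists with img[y][x] = (r, g, b), grey level is taken as the plain mean of the three channels, and the function returns the per-pixel average color of the k nearest neighbors. How that average is then combined with the query's own brightness is left open here.

```python
def to_grey(img):
    """Grey level as the mean of the three channels (an assumed conversion)."""
    return [[sum(px) / 3.0 for px in row] for row in img]


def grey_ssd(g1, g2):
    """Sum of squared differences between two grey images."""
    return sum(
        (a - b) ** 2
        for row1, row2 in zip(g1, g2)
        for a, b in zip(row1, row2)
    )


def average_color_of_neighbors(grey_query, color_dataset, k=2):
    """Per-pixel mean color of the k dataset images closest to the query in grey."""
    ranked = sorted(
        color_dataset,
        key=lambda img: grey_ssd(grey_query, to_grey(img)),
    )
    nearest = ranked[:k]
    h, w = len(grey_query), len(grey_query[0])
    return [
        [
            tuple(sum(img[y][x][c] for img in nearest) / len(nearest)
                  for c in range(3))
            for x in range(w)
        ]
        for y in range(h)
    ]
```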

Image orientation. Try all rotations, find the orientation with the best match.
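The orientation search can be sketched as below. The sketch assumes square grey-level images stored as nested lists, considers only the four rotations by multiples of 90 degrees, and scores each rotation by its distance to the single closest dataset image. All function names are hypothetical.

```python
def rotate90(img):
    """Rotate a row-major square image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def ssd(img1, img2):
    """Sum of squared differences between two grey images."""
    return sum(
        (a - b) ** 2
        for row1, row2 in zip(img1, img2)
        for a, b in zip(row1, row2)
    )


def best_orientation(query, dataset):
    """Number of clockwise quarter turns that gives the closest dataset match."""
    best_dist, best_turns = None, 0
    candidate = [list(row) for row in query]
    for quarter_turns in range(4):
        dist = min(ssd(candidate, ref) for ref in dataset)
        if best_dist is None or dist < best_dist:
            best_dist, best_turns = dist, quarter_turns
        candidate = rotate90(candidate)
    return best_turns
```

This relies on the dataset being dominated by upright images, so that the correctly oriented version of the query finds closer neighbors than the rotated versions.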

Training image classifiers from images collected off the web

Fergus et al.
As many as 85% of the returned images may be visually unrelated to the intended category, perhaps arising from polysemes (e.g. "iris" can be iris-flower, iris-eye, Iris-Murdoch). Even the 15% subset which do correspond to the category are substantially more demanding than images in typical training sets: the number of objects in each image is unknown and variable, and the pose (visual aspect) and scale are uncontrolled.

Animals on the Web T. Berg and D. Forsyth

Animal images are particularly hard to identify (a) because they can adopt multiple poses, and are often seen from odd angles (b) because they have evolved to be camouflaged.

Learning Object Categories from Google's Image Search R. Fergus et al.

Harvesting Image Databases from the Web F. Schroff, A. Criminisi, A. Zisserman

18 categories: Airplane, beaver, bike, boat, camel, car, dolphin, elephant, giraffe, guitar, horse, kangaroo, motorbike, penguin, shark, tiger, wrist watch, zebra.

Compare three downloading methods:

Filter out non-photographs based on image characteristics. Overall precision goes from 29% to 35%, number of in-class examples goes from 13,000 to 10,000. (Varies considerably across categories.)

Rank images in each category using surrounding text plus meta-data. Naive Bayes on various text features (file name, words within 10 words of the image link, etc.)
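To make the ranking step concrete, here is a minimal Bernoulli naive Bayes sketch with add-one smoothing. Each feature is assumed to be a boolean saying whether the query word occurs in some text source; the feature names used in the usage note (in_filename, in_context) are hypothetical, and the paper's exact feature set and model may differ.

```python
import math


def train_naive_bayes(examples):
    """examples: list of (features, label) pairs.

    features maps a feature name to a bool; label is True for in-class images.
    """
    counts = {True: 0, False: 0}
    present_counts = {True: {}, False: {}}
    names = set()
    for features, label in examples:
        counts[label] += 1
        for name, present in features.items():
            names.add(name)
            if present:
                present_counts[label][name] = present_counts[label].get(name, 0) + 1
    total = len(examples)
    model = {"names": sorted(names), "prior": {}, "likelihood": {}}
    for label in (True, False):
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        model["prior"][label] = (counts[label] + 1) / (total + 2)
        model["likelihood"][label] = {
            name: (present_counts[label].get(name, 0) + 1) / (counts[label] + 2)
            for name in names
        }
    return model


def log_odds(model, features):
    """Log of P(in-class | features) / P(not in-class | features)."""
    score = math.log(model["prior"][True]) - math.log(model["prior"][False])
    for name in model["names"]:
        p_true = model["likelihood"][True][name]
        p_false = model["likelihood"][False][name]
        if features.get(name, False):
            score += math.log(p_true) - math.log(p_false)
        else:
            score += math.log(1 - p_true) - math.log(1 - p_false)
    return score
```

Images would then be sorted by decreasing log_odds, for example with features such as {"in_filename": True, "in_context": False}.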

Train on visual features (similar to Fergus') using SVM.

Results: overall precision of 86% at 15% recall.
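For reference, precision at a fixed recall level can be computed from a ranked list as in the sketch below. It assumes the ranking is given as a list of 0/1 relevance flags, best-ranked first; the function name is hypothetical.

```python
def precision_at_recall(ranked_relevance, recall_level):
    """Precision at the first rank where recall reaches recall_level.

    ranked_relevance: 0/1 flags in ranked order (1 = image is in-class).
    recall_level: target recall, greater than 0 and at most 1.
    """
    if not 0 < recall_level <= 1:
        raise ValueError("recall_level must be in (0, 1]")
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        raise ValueError("no relevant items in the ranking")
    hits = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        hits += relevant
        if hits / total_relevant >= recall_level:
            return hits / rank
    # Not reached: recall is 1.0 once every relevant item has been seen.
    raise AssertionError("unreachable")
```

At 15% recall the cutoff falls near the top of the ranking, so this measure reflects how clean the highest-ranked images are rather than the quality of the whole list.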

Scene Completion using Millions of Photographs James Hayes and Alexei Efros

Note the 4th example in Figure 6 of the paper, where the algorithm has actually removed the scaffolding.

Evaluation: Subjects judged doctored photos to be real 37% of the time. Note, however, that subjects judged real photos to be real only 87% of the time. 34% of doctored photos were marked as fake within 10 seconds (as opposed to 3% of real photos).

LabelMe: a database and web-based tool for image annotation B. Russell et al. A tool for users on the web to label images and parts of images. The objective is to get a large corpus of images with (reasonably) high-quality textual labels.

Searching For Multimedia: An Analysis Of Audio, Video, And Image Web Queries B. Jansen, A. Goodrum, A. Spink. A study of how users search for multimedia.

Clustering Art K. Barnard, P. Duygulu, D. Forsyth. Cluster images on the San Francisco Art Museum web site by image characteristics and text labels.