## Retrieving 3D Models

The problem of matching geometric models is hard because:
• 1. There are many different models for the same shape. These are structurally very different, even though they describe the same geometry.
• 2. We want matching to be invariant under translation and under rotation around a vertical axis. Ideally, you would like the option to match or reject changes in scale, 3D rotation, or reflection, depending on the application.
• 3. There is no standard measure of the "difference" between two spatial regions. Candidates include:
• Area of the symmetric difference.
• Hausdorff distance.
• Hausdorff distance between the boundaries.
None of these are easy to compute in the standard modelling representations.
• 4. The geometric differences in (3) do not satisfy any of the invariances in (2). To adapt them so that they do, you have to use the measure
d'(R,S) = min over T in the space of transformations of d(T(R),S)
which is even harder to compute.
• 5. Natural categories (e.g. chair) do not correspond to clusters in the space of geometric regions.
• 6. Actual models are often topologically incorrect: they represent boundaries with overlaps, gaps, topological degeneracies, etc. Since models are used 99% of the time for rendering, which only requires the surface, they are adequate for their purpose, but many kinds of geometric calculation become impossible.
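The minimization in (4) can be brute-forced directly, at least in a toy setting. A minimal Python sketch, assuming the two regions are given as boundary point samples, using the Hausdorff distance from (3) and restricting the transformations to rotations about the vertical axis (the function names and the sampling resolution are my own choices):

```python
import numpy as np

def hausdorff(A, B):
    # Symmetric Hausdorff distance between two point sets (n x 3 arrays).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def min_over_z_rotations(A, B, steps=360):
    # d'(R,S) = min over transformations T of d(T(R), S), restricted here
    # to rotations about the vertical (z) axis, sampled at 1-degree steps.
    best = np.inf
    for t in np.linspace(0, 2 * np.pi, steps, endpoint=False):
        c, s = np.cos(t), np.sin(t)
        Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        best = min(best, hausdorff(A @ Rz.T, B))
    return best
```

Even this crude version costs O(steps * |A| * |B|) per comparison, which illustrates why such measures are almost never computed directly in this literature.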
In practice, researchers don't even bother considering the kinds of geometric difference in (3); you almost never see any discussion of these in this literature. Rather, they look for a signature S(M) -- i.e. a vector of numbers computable from the model -- and a matching function over signatures d(S1,S2) with the following properties:
• S(M) is efficiently computable from M.
• S(M) can be used effectively as an index in a database.
• S(M) is fairly compact.
• If M1 and M2 are models for the same shape, then d(S(M1),S(M2)) should be small.
• d(S(M1),S(M2)) should generally be an increasing function of the geometric "differentness" (rarely formally defined) of M1, M2.
• S(M) should be defined and stable, even if M is topologically inconsistent, in the ways that occur in practice.
The signature function may be invertible or not, either:
• A. In principle. That is, if S(M1)=S(M2) then M1 must be geometrically very close to M2. (In such cases, S is generally the first part of an infinite series that uniquely defines M.)
• B. In practice. It is computationally feasible to reconstruct M from S(M).
If the signature function is not invertible, then there exist M1, M2 that are substantially different such that S(M1)=S(M2). The invariances mentioned in (4) above are achieved (when desired) as follows:

Invariance under translation: Choose the origin of the coordinate system to be a uniquely defined point relative to the model (e.g. the center of mass; the center of mass of the boundary; the center of the minimal circumscribing sphere, etc.) These are reasonably stable, and easily computed.

Invariance under scale. Define the unit length in terms of some characteristic dimension of the model, e.g. the diameter. (Funkhouser uses the median distance from the center to a point on the boundary, because it is less sensitive to outliers: do these occur much?)
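These two normalizations can be sketched together, assuming the model boundary is given as sample points (a simplification of my own; real systems work from the mesh):

```python
import numpy as np

def normalize_pose(points):
    # Translation invariance: move the center of mass of the boundary
    # samples to the origin.
    p = points - points.mean(axis=0)
    # Scale invariance: take the unit length to be the median distance
    # from the center to a boundary sample (robust to outliers).
    return p / np.median(np.linalg.norm(p, axis=1))
```

After this step the centroid is at the origin and the median radius is exactly 1, so two models of the same shape at different positions and scales map to (nearly) the same point set.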

Invariance under rotation. Two methods:

• 1. Choose some axes characteristic of the shape and align these along the coordinate axes. For instance, the principal axes: take the X axis to be the principal axis with the highest moment of inertia, the Y axis the principal axis with the middle moment of inertia, and the Z axis the principal axis with the lowest moment of inertia. Choose the positive orientation of the X axis so that the standard deviation of the positive X coordinates is greater than the standard deviation of the negative X coordinates. (Modified from Paquet and Rioux.)

[A is a principal axis of region R if, in outer space, R could spin around A. Every region has at least one set of mutually orthogonal principal axes; most regions, unless they are radially symmetric, have exactly one such set.]
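Method 1 can be sketched as a principal-axis (PCA) alignment, again assuming the shape is given as sample points. Note that this sketch resolves each axis sign independently, which can introduce a reflection; the Paquet-Rioux procedure the text modifies may handle signs differently:

```python
import numpy as np

def align_principal_axes(points):
    # Center at the centroid, then rotate so the principal axes of the
    # point set lie along the coordinate axes.
    p = points - points.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(p.T))  # eigenvalues ascending
    # Low variance along an axis = high moment of inertia about it, so
    # with ascending eigenvalues the X axis gets the highest-inertia axis
    # and the Z axis the lowest, as in the convention above.
    q = p @ evecs
    # Disambiguate each axis sign: flip so the spread of the positive
    # coordinates exceeds the spread of the negative coordinates.
    for i in range(3):
        pos, neg = q[q[:, i] > 0, i], q[q[:, i] < 0, i]
        if pos.size and neg.size and pos.std() < neg.std():
            q[:, i] = -q[:, i]
    return q
```

After alignment the covariance matrix of the points is diagonal, with variances increasing from X to Z.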

The problem is that any function from a region to a set of three axes that is invariant under rotation is necessarily discontinuous. More strongly: any function that maps a region onto the same region under some standard orientation is discontinuous. (A conjecture, but I'm pretty sure of it.) So this method is not compatible with the desired stability.

• 2. Use a signature that is inherently invariant under rotation. (Example further on.) These, however, tend to be non-invertible, and therefore at least in principle prone to mismatches.

Is there a signature that is rotationally invariant, stable, and invertible? Yes, at least in a trivial sense. Let S1(M) be any stable, invertible signature (e.g. the Fourier components or the occupancy array). Let S(M) = { S1(Ti(M)) } where T1, T2 ... is a sequence of rotations covering the space of rotations.

Is there a non-trivial signature ...? I don't know of any, but I don't know of a formal definition here of "non-trivial".

Invariance under reflection

• Just index the reflected version and compare query with both versions.
• Find a way to define the natural "handedness" of the model. If the model is naturally left-handed then reflect it. It is altogether unlikely that anyone does this. It has all the problems associated with finding the standard orientation of an object, and then some.
• If you're already using a rotation-invariant signature, it's probably reflection-invariant as well.

### Data size

Funkhouser's system has collected about 20,000 solid models off the Web. (Compare, of course, the 2 billion documents indexed by Google.) In the object models provided by Viewpoint, the mean number of triangles per model is 3504 (the median is 1536). The number of vertices is half the number of triangles = 1752 on average. Figure that each coordinate of each vertex is a 32-bit = 4-byte floating point number, hence 12 bytes per vertex. Connectivity data: each triangle needs two bytes for each of its three vertices. Overall, an average of 42 KBytes per model; 800 MBytes of shape description for the whole collection.
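The arithmetic behind the estimate, checked as a back-of-the-envelope script (all inputs are the numbers from the text; vertices = triangles/2 is the usual closed-mesh approximation):

```python
# Back-of-the-envelope check of the storage estimate above.
triangles = 3504                  # mean triangles per model (Viewpoint)
vertices = triangles // 2         # = 1752 on average
vertex_bytes = vertices * 3 * 4   # three 4-byte float coordinates per vertex
conn_bytes = triangles * 3 * 2    # three 2-byte vertex indices per triangle
per_model = vertex_bytes + conn_bytes
total = per_model * 20_000        # whole collection
print(per_model, total)           # 42048 bytes/model, ~840 MBytes overall
```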

Color information adds an additional 3 or 4 bytes per triangle = about half as much information again = a total of roughly 1.2 GBytes.

### Comparing 3D models to text documents.

• Models are much longer and harder to construct than documents.
• Hence many fewer models than documents.
• Quality of model is an issue. Authority of a model is not an issue.
• Model signature analogous to vector model of document.
• Definition of model signature much more complex and computation much harder than document vector:
• Because we are taking structure much more seriously in the models than in the documents. If we were to apply NLP techniques to documents, the analysis would take much longer than it does for the 3D models.
• So the question really is: why can we get away with this cheap method on documents but not on models? Small elements of text are much more indicative of its subject than small elements of a solid model. E.g. finding "Thoreau" in a document is quite suggestive, while finding the coordinate "2.670548" in a model is not at all suggestive. (There should be something deeper to say about this, but I can't find it.)
• No analogue in documents of theoretically correct distance.
• Are there analogies in documents to invariance principles?
e.g. The meaning is invariant under permutation of the paragraphs? (Not for dialogue, of course, but for most expository text.)
• A query model is much, much harder to construct than a verbal query. There is no equivalent of the one-word or two-word query.

## Princeton System

A Search Engine for 3D Models by Thomas Funkhouser
The search engine can be found at Princeton 3D Models Search Engine

A repertory of 20,000 solid models collected from around the Web. Searchable by (1) solid model; (2) 2D hand-drawn sketch; (3) text.

Components

• Spider.
• Compute spherical harmonic descriptors and index.
• Compute 2D sketches and index.
• Index by keywords

#### Spider

Seed: Results returned by Google and other search engines for queries such as "3D and (models or meshes)".

Guided search, with score(P) computed as follows:
• If P is a 3D model, then score = log(number of triangles).
• If P is an HTML page, then score = the count of keywords in the title and text that suggest a relation to 3D modelling.
• If P is unvisited, then score is a weighted sum of:
1. the distance-weighted average of the scores of documents linking to it;
2. the distance-weighted average of the scores of models nearby in the link graph;
3. a site score reflecting the proportion of documents retrieved from the site that are models.

#### Searching by shape

Either upload your own model from a file, or use library shape and click on "Similar Shape".

#### 3D shape representation

• Take "shape of object O" to be boundary patches.
(More reliable than interior. Many solid models have topological errors that make computing the "inside" an ill-defined operation. Since the models are mostly used for rendering, all you really need is the boundary.)
• Discretize in a 64x64x64 voxel grid.
Voxel = 1 if the voxel is within 1 unit of boundary(O).
Place the center of the grid at the center of mass.
Set the unit distance so that the average distance from the center to a non-zero voxel is 16.
Let V be this discretized shape.
• For r = 1 ... 32, construct the sphere Sr of radius r. Let Vr = V intersect Sr.
• Calculate the top 16 "spherical harmonic descriptors" for Vr in Sr
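The normalization and discretization steps can be sketched as follows, assuming the boundary is given as a cloud of sample points. As a simplification of my own, this marks only the voxel containing each sample rather than every voxel within 1 unit of the boundary:

```python
import numpy as np

def voxelize(points, grid=64, target_radius=16.0):
    # Center the grid at the center of mass of the boundary samples.
    p = points - points.mean(axis=0)
    # Set the unit distance so the average distance from the center
    # to an occupied voxel is target_radius (16, per the text).
    p = p * (target_radius / np.linalg.norm(p, axis=1).mean())
    # Shift into grid coordinates and mark occupied voxels.
    v = np.zeros((grid, grid, grid), dtype=bool)
    idx = np.round(p + grid / 2).astype(int)
    idx = idx[(idx >= 0).all(axis=1) & (idx < grid).all(axis=1)]
    v[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return v
```

For a spherical boundary, the occupied voxels form a shell at radius roughly 16 around the grid center, which is exactly the kind of input the spherical decomposition below expects.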

Spherical Harmonic Descriptor
Spherical harmonic analysis is basically Fourier analysis on the surface of a sphere. The Kth descriptor is basically the energy associated with the Kth Fourier component. The following account is not mathematically precise, but it gives the flavor:

The 0th descriptor is just the area of Vr.

For K > 0,
Let P, Q be two random points on Sr such that the distance from P to Q on the sphere is PI*r/K.
(Thus, the angle from P to the center of the grid to Q is PI/K.)
Let C(P) and C(Q) be the circles in Sr centered at P and Q of spherical radius PI*r/2K.
Let D(P,Q) = abs(area(C(P) intersect V) - area(C(Q) intersect V))
Let QK,r be the average of D(P,Q) for all such P,Q.
Then QK,r is roughly the Kth spherical harmonic descriptor of Vr.
(I think. Don't quote me on this until I've had a chance to work through the math more carefully.)

Correct features of this rough account:

• The number is invariant under rotation. (Also under reflection, which has its pros and cons.)
• The computation does smoothing at the scale of PI*r/2K; it does a difference at the scale of PI*r/K; and it does an average over the sphere.
• The descriptors do not uniquely determine the actual shape. The shape cannot be recovered from the descriptors.
The advantages of the actual harmonic analysis are
• It has a sounder mathematical basis.
• It can be computed (comparatively) quickly.
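The actual harmonic computation decomposes each shell function into spherical harmonics and records the energy per degree, which is what makes the value rotation-invariant. A sketch of that energy computation, assuming the shell is given as samples f on a (theta, phi) grid with quadrature weights; note that scipy's `sph_harm` takes theta as the azimuthal angle and phi as the polar angle:

```python
import numpy as np
from scipy.special import sph_harm

def sh_energy_signature(f, theta, phi, weights, max_l=16):
    # For each degree l, the energy: sum over m of |<f, Y_lm>|^2.
    # Rotating the sphere mixes the m components within a degree but
    # leaves each degree's total energy unchanged.
    energies = np.zeros(max_l)
    for l in range(max_l):
        for m in range(-l, l + 1):
            Y = sph_harm(m, l, theta, phi)  # scipy: theta=azimuth, phi=polar
            coef = np.sum(f * np.conj(Y) * weights)
            energies[l] += np.abs(coef) ** 2
    return energies
```

For a constant function on the sphere, only the degree-0 energy is non-zero, matching the statement above that the 0th descriptor is just the area.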

#### Indexing and Retrieval

A model is indexed by the 512 (= 32 spheres * 16 descriptors) values of QK,r.
(Actually, it seems very unlikely that they compute higher-order descriptors for small spheres, as they would be meaningless. So probably more like 256 values.)

Note that this is 1 KByte per model, as opposed to the average of 42 KBytes quoted before.

The differentness of models M1 and M2 is taken to be the Euclidean distance between the two vectors of descriptors.
The best matches for M are just the K nearest neighbors in the database.
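Retrieval thus reduces to nearest-neighbor search in descriptor space. A minimal sketch (a real system would use an index structure rather than this linear scan):

```python
import numpy as np

def nearest_models(query_sig, signatures, k=5):
    # Differentness = Euclidean distance between descriptor vectors;
    # retrieval = the indices of the k nearest neighbors in the database.
    d = np.linalg.norm(signatures - query_sig, axis=1)
    return np.argsort(d)[:k]
```

With ~20,000 models and 512-dimensional vectors, even the linear scan is fast, which is consistent with the sub-second search times reported below.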

#### Sources of error:

• Discretization.
• Two shapes may be close in terms of the vector of descriptors, but actually far apart in a geometric sense.
• Two shapes of the same conceptual type may be far apart in geometric terms.

#### 2D-shape representation

1. Compute the silhouette of the 3D model from 13 view directions: four corners of the coordinate cube, three face centers, and six edge centers. (Note that the silhouette from the antipodal direction is just the reflection.)
Thus, any view of the object is within 22.5 degrees of one of these standard views.

2. Compute the distance transform -- value at each pixel = distance to nearest boundary element.

3. Compute circular harmonic descriptors: exactly analogous to the spherical harmonic descriptors.
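Step 2 can be sketched with scipy's Euclidean distance transform. The boundary-extraction rule here (a filled pixel with at least one empty 4-neighbor) is my own simplification:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_distance_transform(silhouette):
    # Boundary pixels: filled pixels with at least one empty 4-neighbor.
    s = silhouette.astype(bool)
    interior = np.zeros_like(s)
    interior[1:-1, 1:-1] = (s[1:-1, 1:-1] & s[:-2, 1:-1] & s[2:, 1:-1]
                            & s[1:-1, :-2] & s[1:-1, 2:])
    boundary = s & ~interior
    # distance_transform_edt measures distance to the nearest zero,
    # so pass the complement of the boundary mask.
    return distance_transform_edt(~boundary)
```

The resulting image has value 0 on the silhouette boundary and grows with distance from it, on both the inside and the outside.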

Comment: I don't quite see how this works for user sketches:

• What do you do about interior lines?
• What do you do about non-closed boundaries?
• Doesn't deal with systematic distortions.

#### Text queries

Extract all the text you can from the model (e.g. filename, captions, informational fields, etc.) and from the anchors, and match queries in the usual way. Stemming, synonyms for file name.

#### Multimodal queries

Allowed, either initially or as successive feedback.

#### Shape matching

Test database: 1890 models, between 120 and 120,000 triangles.
85 classes of sizes between 5 and 153. 610 models not in any class.
E.g. 5 classes of chairs: 153 dining room chairs, 10 desk chairs, 5 directors chairs, 25 living room chairs, 6 lounge chairs
Wide range of shapes: 8 forks, 5 cannons, 6 hearts, 17 plates of food.

Shape matching algorithms:

• Random.
• Moments: the integral of x^p y^q z^r dx dy dz over the object, for various p, q, r. The object is oriented so that the center of mass is at the origin and the principal axes lie along the coordinate axes.
• Extended Gaussian images (EGI): for each direction N, the area of the surface region where the normal = N. The object is oriented using principal axes.
• Shape histograms: For radius R, area of intersection of sphere radius R with object.
• D2 shape distributions: Distribution of distance(x,y) where x,y range over surface of object.
(As mentioned above, the principal axis calculation is inherently unstable.)
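Of these, the D2 distribution is the easiest to sketch, assuming the model surface is given as sample points (the pair count and bin count are arbitrary choices of mine):

```python
import numpy as np

def d2_distribution(points, n_pairs=20000, bins=32, rng=None):
    # D2 shape distribution: histogram of the distance between random
    # pairs of surface sample points.  Invariant under rotation and
    # reflection; scale-invariant only after normalization.
    if rng is None:
        rng = np.random.default_rng(0)
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    hist, edges = np.histogram(d, bins=bins, density=True)
    return hist, edges, d.mean()
```

For points uniformly distributed on a unit sphere, for instance, the mean pairwise distance converges to 4/3, so the histogram acts as a cheap rotation-invariant fingerprint of the shape.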

Results: Recall-precision curves very much better for 3D harmonics than for others. Over all classes, at recall = 40%, precision = 30% for 3D harmonics; 20% for D2 and shape histograms; 10% for EGI; 5% for moments. Over living-room chairs, at recall = 40%, precision for 3D harmonics = 65%.

Search time of less than 0.25 seconds in database of 17,500 models.

#### Sketch Interface Experiment

43 students. Shown model rotating for 15 seconds. Asked to
A. Provide a query of up to 5 words for retrieving model. B. Sketch model from top, front, and side.

Query / sketch / query and sketch used for retrieval.

Results: very variable, for different types of objects. It is not even the case that both combined did better than either singly. Two different evaluation measures -- median rank of target object, and percentage of queries where target object in top 16 -- give rather different results.

#### Interactive Search Results

Two versions created:
(1) "Similar shape" button retrieves by model match;
(2) "Similar shape" button retrieves by text match.

Task: to find a particular model. Users are constrained to use one or the other version of the search engine.

Text search: 48 seconds search time, 2.8 iterations, 60% find on first query, 77% find within first 10 iterations.

3D shape search: 40 seconds search time, 2.4 iterations, 54% find on first query, 89% find within first 10 iterations.

### Other work on indexing 3D models

A Web-based Retrieval System for 3D Polygonal Models Motofumi Suzuki

"Nefertiti: a query by content system for three-dimensional model and image databases management," Eric Paquet and Marc Rioux, Image and Vision Computing, Vol. 17, No. 2, pp. 157-166.

Both of the above use color information as well as shape information.

3D Model Retrieval Dejan V. Vranic and D. Saupe
Uses principal component analysis.