- 1. There are many different models for the same shape. These may be structurally very different, even though they describe the same geometry.
- 2. We want the match to be invariant under translation and rotation around a vertical axis. Ideally, you would like the option to match or reject changes in scale, 3D rotation, or reflection, depending on the application.
- 3. There is no standard measure of "difference" between two spatial
regions:
- Area of symmetric difference.
- Hausdorff distance
- Hausdorff distance between the boundaries.

- 4. The geometric differences in (3) do not satisfy any of the invariances in (2). To adapt them so that they do, you have to use the measure

d'(R,S) = min_{T in space of transformations} d(T(R), S)

which is even harder to compute.
- 5. Natural categories (e.g. chair) do not correspond to clusters in the space of geometric regions.
- 6. Actual models are often topologically incorrect. They represent boundaries with overlaps, gaps, topological degeneracies, etc. Since models are used 99% of the time for rendering, which only requires the surface, they're adequate for their purposes, but many kinds of geometric calculations become impossible.

- S(M) is efficiently computable from M.
- S(M) can be used effectively as an index in a database.
- S(M) is fairly compact.
- If M1 and M2 are models for the same shape, then d(S(M1),S(M2)) should be small.
- d(S(M1),S(M2)) should generally be an increasing function of the geometric "differentness" (rarely formally defined) of M1, M2.
- S(M) should be defined and stable, even if M is topologically inconsistent, in the ways that occur in practice.

- A. In principle. That is, if S(M1)=S(M2) then M1 must be geometrically very close to M2. (In such cases, S is generally the first part of an infinite series that uniquely defines M.)
- B. In practice. It is computationally feasible to reconstruct M from S(M).

**Invariance under translation:**
Choose the origin of the coordinate system
to be a uniquely defined point relative to the model
(e.g. the center of mass; the center of mass of the boundary; the
center of the minimal circumscribing sphere, etc.) These are reasonably
stable, and easily computed.

**Invariance under scale.** Define the unit length in terms of
some characteristic dimension of the model e.g. diameter.
(Funkhouser uses the median distance from the center to a point on
the boundary, because it is less sensitive to outliers: do these occur
much?)
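The translation and scale normalizations above can be condensed into a minimal numpy sketch, assuming the model's boundary is given as a set of sample points (`normalize_pose` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def normalize_pose(points):
    """Translate so the centroid is at the origin, then rescale so the
    median distance from the origin to a boundary sample is 1 (the
    median is less outlier-sensitive than the diameter)."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)                  # translation invariance
    scale = np.median(np.linalg.norm(pts, axis=1))
    return pts / scale                            # scale invariance
```

The result is unchanged (up to floating-point error) by any translation or uniform scaling of the input, which is exactly the invariance being asked for.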

**Invariance under rotation.** Two methods:

- 1.
Choose some axes characteristic of the shape
and align these along the coordinate axes. For instance,
principal axes. Take the X axis to be the principal axis with
highest moment of inertia; Y axis to be principal axis with middle
moment of inertia; Z axis to be principal axis with lowest moment of inertia.
Choose the positive orientation of the X axis so that the standard deviation of the positive X coordinates is greater than the standard deviation of the negative X coordinates. (Modified from Paquet and Rioux.)
[A is a principal axis of region R if, in outer space, R could spin around A. Every region has at least one set of mutually orthogonal principal axes; most regions, unless they are radially symmetric, have exactly one such set.]
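Method 1 can be sketched with an eigendecomposition of the covariance matrix of boundary samples, a point-sample stand-in for true moments of inertia; the sign rule follows the modified Paquet-Rioux rule above. A sketch under those assumptions:

```python
import numpy as np

def align_principal_axes(points):
    """Rotate a boundary point sample so its principal axes lie along the
    coordinate axes: X gets the axis with highest moment of inertia
    (= least variance along it), Z the lowest, per the rule above."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)
    cov = pts.T @ pts / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    # moment of inertia about axis i = trace(cov) - eigvals[i], so the
    # ascending eigenvalue order is already descending inertia: X, Y, Z
    aligned = pts @ eigvecs
    # sign rule: flip an axis if its negative side has the larger spread
    for i in range(3):
        pos, neg = aligned[:, i] > 0, aligned[:, i] < 0
        if pos.any() and neg.any() and aligned[pos, i].std() < aligned[neg, i].std():
            aligned[:, i] = -aligned[:, i]
    return aligned
```

Note that this inherits the instability discussed next: a small perturbation of a nearly symmetric shape can swap eigenvalue order and flip the output discontinuously.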

The problem is that *any* function from a region to a set of three axes that is invariant under rotation is necessarily discontinuous. More: any function that maps a region into the same region under some standard orientation is discontinuous. (Conjecture, but I'm pretty sure of it.) So this method is not compatible with the desired stability.

- 2. Use a signature that is inherently invariant under rotation. (Example further on.) These, however, tend to be non-invertible, and therefore at least in principle prone to mismatches.
Is there a signature that is rotationally invariant, stable, and invertible? Yes, at least in a trivial sense. Let S1(M) be any stable, invertible signature (e.g. the Fourier components or the occupancy array). Let S(M) = { S1(T_{i}(M)) } where T_{1}, T_{2}, ... is a sequence of rotations covering the space of rotations. Is there a non-trivial signature ...? I don't know of any, but I don't know of a formal definition here of "non-trivial".

**Invariance under reflection**

- Just index the reflected version and compare query with both versions.
- Find a way to define the natural "handedness" of the model. If the model is naturally left-handed then reflect it. It is altogether unlikely that anyone does this. It has all the problems associated with finding the standard orientation of an object, and then some.
- If you're already using a rotation-invariant signature, it's probably reflection-invariant as well.

Color information will add an additional 3 or 4 bytes per triangle = about half as much information again = a total of 1.2 GBytes.

- Models are much longer and harder to construct than documents.
- Hence many fewer models than documents.
- Quality of model is an issue. Authority of a model is not an issue.
- Model signature analogous to vector model of document.
- Definition of model signature much more complex and computation much
harder than document vector:
- Because we are considering structure much more seriously with the models than with the documents. If we were to apply NLP techniques to documents, they would take much longer than the 3D models.
- So the question really is, why can we get away with this cheap method on documents but not in models? Small elements of text are much more indicative of its subject than small elements of a solid model. E.g. if you find "Thoreau" in a document, that is quite suggestive, while finding coordinate "2.670548" in a model is not at all suggestive. (There should be something deeper to say about this, but I don't find it.)

- No analogue in documents of theoretically correct distance.
- Are there analogies in documents to invariance principles?

e.g. The meaning is invariant under permutation of the paragraphs? (Not for dialogue, of course, but for most expository text.)

- Query model is much, much harder to construct than a verbal query. No equivalent of the one- or two-word query.

A Search Engine for 3D Models by Thomas Funkhouser

The search engine can be found at
Princeton 3D Models Search Engine

Repertory of 20,000 solid models collected from around the web. Searchable by (1) solid model; (2) 2D hand-drawn sketch; (3) text.

Components

- Spider.
- Compute spherical harmonic descriptors and index.
- Compute 2D sketches and index.
- Index by keywords

Guided search, score(P) computed as follows:
If P is a 3D model, then score = log(number of triangles).

If P is an HTML page then score = count of keywords in title and text
that suggest a relation to 3D modelling.

If P is unvisited, then score is a weighted sum of

1. distance-weighted average of scores of documents linking to it.

2. distance-weighted average of scores of models nearby in link graph

3. site score reflecting proportion of documents retrieved from site
that are models.
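The scoring rule can be condensed into a sketch. The weights and the precomputed link-graph averages here are assumptions for illustration, not Funkhouser's actual values:

```python
import math

# Hypothetical weights for the unvisited-page case; the actual crawler
# uses distance-weighted averages over the link graph.
W_LINKS, W_MODELS, W_SITE = 0.5, 0.3, 0.2

def score(page):
    """Guided-search priority for a crawled page (sketch)."""
    if page["kind"] == "model":
        return math.log(page["triangles"])
    if page["kind"] == "html":
        return page["modeling_keywords"]   # keyword count in title and text
    # unvisited: weighted sum of link-graph evidence
    return (W_LINKS * page["avg_inlink_score"]
            + W_MODELS * page["avg_nearby_model_score"]
            + W_SITE * page["site_model_fraction"])
```

The crawler would then visit pages in decreasing order of this score, so pages near known models get fetched first.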

- Take "shape of object O" to be boundary patches.

(More reliable than interior. Many solid models have topological errors that make computing the "inside" an ill-defined operation. Since the models are mostly used for rendering, all you really need is the boundary.)

- Discretize in a 64x64x64 voxel grid.

Voxel = 1 if the voxel is within 1 unit of boundary(O).

Place the center of the grid at the center of mass.

Set the unit distance so that the average distance from the center to a non-zero voxel is 16.

Let V be this discretized shape.

- For r = 1 ... 32, construct the sphere S_{r} of radius r. Let V_{r} = V intersect S_{r}.
- Calculate the top 16 "spherical harmonic descriptors" for V_{r} in S_{r}.
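The discretization steps can be sketched in numpy, assuming the boundary is given as a set of sample points (the real system rasterizes the triangles themselves, and marks all voxels within 1 unit of the boundary rather than just the nearest; `voxelize` is a hypothetical helper):

```python
import numpy as np

N, TARGET = 64, 16.0   # grid size and target mean radius, per the scheme above

def voxelize(boundary_points):
    """Rasterize boundary samples into an N^3 occupancy grid, centered at
    the center of mass and scaled so the average distance from the
    center to an occupied voxel is TARGET."""
    pts = np.asarray(boundary_points, dtype=float)
    pts = pts - pts.mean(axis=0)                        # grid center = center of mass
    pts *= TARGET / np.linalg.norm(pts, axis=1).mean()  # mean radius = 16 voxels
    V = np.zeros((N, N, N), dtype=np.uint8)
    idx = np.round(pts + N // 2).astype(int)            # nearest voxel per sample
    idx = idx[((idx >= 0) & (idx < N)).all(axis=1)]     # drop out-of-grid samples
    V[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return V
```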

**Spherical Harmonic Descriptor**

Spherical harmonic analysis is basically Fourier analysis on the
surface of a sphere. The Kth descriptor is basically the energy associated
with the Kth Fourier component. The following account is not mathematically
precise, but it gives the flavor:

The 0th descriptor is just the area of V_{r}.

For K > 0,

Let P, Q be two random points on S_{r} such that the distance
from P to Q on the sphere is PI*r/K.

(Thus, the angle from P to the center of the grid to Q is PI/K.)

Let C(P) and C(Q) be the circles in S_{r} centered at P and Q of
spherical radius PI*r/2K.

Let D(P,Q) = abs(area(C(P) intersect V) - area(C(Q) intersect V))

Let Q_{K,r} be the average of D(P,Q) for all such P,Q.

Then Q_{K,r} is roughly the Kth spherical harmonic descriptor of
V_{r}.

(I think. Don't quote me on this until I've had a chance to work through
the math more carefully.)

Correct features of this rough account:

- The number is invariant under rotation. (Also under reflection, which has its pros and cons.)
- The computation does smoothing at the scale of PI*r/2K; it does a difference at the scale of PI*r/K; and it does an average over the sphere.
- The descriptors do not uniquely determine the actual shape. The shape cannot be recovered from the descriptors.
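The rotation invariance can be checked numerically for the exact construction (one energy per harmonic degree): under a rotation about the z-axis, the degree-l coefficients a_{l,m} only pick up phases e^{imθ}, so the per-degree energies are unchanged. A numpy sketch with random coefficients:

```python
import numpy as np

def degree_energies(coeffs):
    """coeffs[l] is the complex vector (a_{l,-l}, ..., a_{l,l}) of spherical
    harmonic coefficients; the descriptor keeps one energy per degree l."""
    return np.array([np.linalg.norm(a) for a in coeffs])

rng = np.random.default_rng(0)
coeffs = [rng.normal(size=2*l+1) + 1j * rng.normal(size=2*l+1) for l in range(16)]

# Rotating the function by angle theta about z multiplies a_{l,m} by e^{i m theta}
theta = 0.7
rotated = [a * np.exp(1j * np.arange(-l, l + 1) * theta)
           for l, a in enumerate(coeffs)]

assert np.allclose(degree_energies(coeffs), degree_energies(rotated))
```

This only demonstrates invariance under z-rotations; full rotation invariance follows because a general rotation mixes the coefficients within each degree unitarily, which also preserves the norm.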

- It has a sounder mathematical basis.
- It can be computed (comparatively) quickly.

(Actually, it seems very unlikely that they compute higher-order descriptors for small spheres, as they would be meaningless. So probably more like 256 values.)

Note that this is 1 KByte per model, as opposed to the average of 42 Kbytes quoted before.

The differentness of models M1 and M2 is taken to be the Euclidean distance
between the two vectors of descriptors.

The best matches for M are just the K nearest neighbors in the database.
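Retrieval then reduces to nearest neighbors under Euclidean distance on the descriptor vectors. A brute-force sketch (the real system uses an index for speed):

```python
import numpy as np

def best_matches(query, database, k=5):
    """Return indices of the k database models whose descriptor vectors
    are closest to the query in Euclidean distance."""
    d = np.linalg.norm(database - query, axis=1)
    return np.argsort(d)[:k]
```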

- Discretization.
- Two shapes may be close in terms of the vector of descriptors, but actually far in a geometric sense.
- Two shapes of the same conceptual type may be far apart in geometric terms.

Thus, any view of the object is within 22.5 degrees of one of these standard views.

2. Compute the distance transform -- value at each pixel = distance to nearest boundary element.

3. Compute circular harmonic descriptors: exactly analogous to the spherical harmonic descriptors.
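Steps 2-3 can be sketched with scipy's Euclidean distance transform and an FFT over angle on concentric rings. This is a rough stand-in for the circular harmonic descriptors; the ring sampling and the number of harmonics kept are assumptions:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def sketch_descriptor(boundary, radii=range(1, 32), K=8, samples=64):
    """boundary: 2D boolean image of boundary pixels. Returns, per radius,
    the magnitudes of the first K circular (Fourier) harmonics of the
    distance-transform values sampled on that ring."""
    # value at each pixel = distance to nearest boundary element
    dist = distance_transform_edt(~boundary)
    cy, cx = np.array(boundary.shape) // 2
    angles = 2 * np.pi * np.arange(samples) / samples
    rows = []
    for r in radii:
        ys = np.clip(np.round(cy + r * np.sin(angles)).astype(int), 0, boundary.shape[0] - 1)
        xs = np.clip(np.round(cx + r * np.cos(angles)).astype(int), 0, boundary.shape[1] - 1)
        ring = dist[ys, xs]
        # Fourier magnitudes are invariant under rotation of the image
        rows.append(np.abs(np.fft.rfft(ring))[:K])
    return np.array(rows)
```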

Comment: I don't quite see how this works for user sketches:

- What do you do about interior lines?
- What do you do about non-closed boundaries?
- Doesn't deal with systematic distortions.

85 classes of sizes between 5 and 153. 610 models not in any class.

E.g. 5 classes of chairs: 153 dining room chairs, 10 desk chairs, 5 directors chairs, 25 living room chairs, 6 lounge chairs

Wide range of shapes: 8 forks, 5 cannons, 6 hearts, 17 plates of food.

Shape matching algorithms:

- Random.
- Moments: integral( x^{p} y^{q} z^{r} dx dy dz ). Object oriented so that the center of mass is at the origin, principal axes along the coordinate axes.
- Extended Gaussian images: for each vector N, area of the region where the normal = N. Object oriented using principal axes.
- Shape histograms: For radius R, area of intersection of sphere radius R with object.
- D2 shape distributions: Distribution of distance(x,y) where x,y range over surface of object.
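The D2 shape distribution can be sketched by sampling random pairs of surface points and histogramming their distances (surface sampling is assumed given; bin count and pair count are arbitrary choices here):

```python
import numpy as np

def d2_distribution(surface_points, n_pairs=10000, bins=32, rng=None):
    """Histogram of distance(x, y) for random pairs x, y of surface
    samples, normalized to sum to 1."""
    rng = rng or np.random.default_rng(0)
    pts = np.asarray(surface_points, dtype=float)
    i = rng.integers(len(pts), size=n_pairs)
    j = rng.integers(len(pts), size=n_pairs)
    d = np.linalg.norm(pts[i] - pts[j], axis=1)
    hist, _ = np.histogram(d, bins=bins)
    return hist / hist.sum()
```

Note the bin edges here are set by the maximum sampled distance, which makes the descriptor scale-invariant but means two models' histograms should be compared only after a common normalization.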

Results: Recall-precision curves very much better for 3D harmonics than for others. Over all classes, at recall = 40%, precision = 30% for 3D harmonics; 20% for D2 and shape histograms; 10% for EGI; 5% for moments. Over living-room chairs, at recall = 40%, precision for 3D harmonics = 65%.

Search time of less than 0.25 seconds in database of 17,500 models.

A. Provide a text query of up to 5 words for retrieving the model. B. Sketch the model from top, front, and side.

Query / sketch / query and sketch used for retrieval.

Results: very variable, for different types of objects. It is not even the case that both combined did better than either singly. Two different evaluation measures -- median rank of target object, and percentage of queries where target object in top 16 -- give rather different results.

(1) "Similar shape" button retrieves by model match;

(2) "Similar shape" button retrieves by text match.

Task: to find a particular model. Users are constrained to use one or the other version of the search engine.

Text search: 48 seconds search time, 2.8 iterations, 60% find on first query, 77% find within first 10 iterations.

3D shape search: 40 seconds search time, 2.4 iterations, 54% find on first query, 89% find within first 10 iterations.

"A Web-based Retrieval System for 3D Polygonal Models" by Motofumi Suzuki

"Nefertiti: a query by content system for three-dimensional model and image databases management" by Eric Paquet and Marc Rioux, *Image and Vision Computing*, Vol. 17, No. 2, pp. 157-166.

Both of the above use color information as well as shape information.

"3D Model Retrieval" by Dejan V. Vranic and D. Saupe

Uses principal component analysis.