Given a set of points embedded in a Euclidean space, how should one
assign binary codes to each point so that the Hamming distances between
codes approximates the Euclidean distance between points as closely as
possible?
We show that the problem of finding the best set of codes for a given
dataset is closely related to the problem of graph partitioning and can
be shown to be NP hard. By relaxing the original problem, we obtain a
spectral method whose solutions are simply a subset of thresholded
eigenvectors of the graph Laplacian. By utilizing recent results on
convergence of graph Laplacian eigenvectors to the Laplace-Beltrami
eigenfunctions of manifolds, we show how to efficiently calculate the
code of a novel datapoint. Taken together, both learning the code and
applying it to a novel point are extremely simple. Our experiments show
that our codes outperform the state-of-the art.
Joint work with Yair Weiss (Hebrew U.) and Antonio Torralba (MIT)