Computer Science Colloquium

Riemannian Geometry and Text Classification

Guy Lebanon

Friday, April 22, 2005 11:30 A.M.
Room 1302 Warren Weaver Hall
251 Mercer Street
New York, NY 10012-1185

Colloquium Information:
Richard Cole, (212) 998-3119

With the growth of textual databases such as the World Wide Web, the task of classifying text documents is becoming increasingly important. However, popular text classification models assume, either explicitly or implicitly, a Euclidean geometry on text documents. We demonstrate the inapplicability of this assumption and derive an axiomatic geometry for text documents. By generalizing popular algorithms such as logistic regression and support vector machines to use the canonical geometry of text documents, we show a dramatic increase in performance over the state of the art.

If time permits, I will also discuss conditional models for ranked data, including a generalization of the Mallows model that addresses the problem of combining multiple search engines.
