Computer Science Colloquium
Riemannian Geometry and Text Classification
Friday, April 22, 2005 11:30 A.M.
Room 1302 Warren Weaver Hall
251 Mercer Street
New York, NY 10012-1185
Colloquium Information: http://cs.nyu.edu/csweb/Calendar/colloquium/index.html
Richard Cole email@example.com, (212) 998-3119
With the growth of textual databases such as the World Wide Web, the task of
classifying text documents is becoming increasingly important. However,
popular text classification models assume, either explicitly or implicitly,
a Euclidean geometry on text documents. We demonstrate the inapplicability
of this assumption and derive an axiomatic geometry for text documents. By
generalizing popular algorithms such as logistic regression and support
vector machines to use the canonical geometry of text documents, we show a
dramatic increase in performance over the state of the art.
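
The abstract does not name the geometry that is derived. As an illustration
only, the sketch below assumes the Fisher information metric on the
multinomial simplex, a standard Riemannian geometry for normalized
term-frequency vectors, and compares its geodesic distance with the Euclidean
one. The function names are hypothetical and are not taken from the talk.

```python
import math


def term_frequencies(counts):
    """Normalize raw term counts into a point on the probability simplex."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("document must contain at least one term")
    return [c / total for c in counts]


def fisher_geodesic_distance(p, q):
    """Geodesic distance under the Fisher information metric (assumed geometry).

    On the multinomial simplex this is 2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    affinity = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point rounding before acos.
    affinity = min(1.0, max(0.0, affinity))
    return 2.0 * math.acos(affinity)


def euclidean_distance(p, q):
    """Ordinary Euclidean distance between two term-frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    doc_a = term_frequencies([8, 1, 1])
    doc_b = term_frequencies([1, 8, 1])
    print("geodesic :", fisher_geodesic_distance(doc_a, doc_b))
    print("euclidean:", euclidean_distance(doc_a, doc_b))
```

Under this assumed metric, two documents with no terms in common are always
at the maximal distance pi, whereas their Euclidean distance depends on how
the term mass is spread.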
If time permits, I will also discuss conditional models for ranked data,
including a generalization of the Mallows model that addresses the problem
of combining multiple search engines.
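
For orientation, here is a minimal sketch of the standard Mallows model, not
the generalization mentioned above, whose details the abstract does not give.
It assigns a ranking pi the probability proportional to
exp(-theta * d(pi, sigma)), where sigma is a central ranking and d is taken
here to be the Kendall tau distance. The function names and the brute-force
normalization are illustrative assumptions, practical only for short rankings.

```python
import itertools
import math


def kendall_tau_distance(ranking, reference):
    """Count the item pairs that the two rankings order differently."""
    position = {item: i for i, item in enumerate(reference)}
    seq = [position[item] for item in ranking]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def mallows_pmf(center, theta):
    """Return {ranking: probability} under a Mallows model.

    center: the central ranking sigma, given as a tuple of items.
    theta:  non-negative dispersion parameter; larger values concentrate
            probability mass around the central ranking.
    """
    rankings = list(itertools.permutations(center))
    weights = [
        math.exp(-theta * kendall_tau_distance(r, center)) for r in rankings
    ]
    normalizer = sum(weights)  # brute force over all n! rankings
    return {r: w / normalizer for r, w in zip(rankings, weights)}


if __name__ == "__main__":
    pmf = mallows_pmf(("a", "b", "c"), theta=1.0)
    for ranking, prob in sorted(pmf.items(), key=lambda kv: -kv[1]):
        print(ranking, round(prob, 4))
```

In the search-engine setting, each engine's result list would play the role
of an observed ranking and the central ranking that of the combined result.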