Computer Science Colloquium
Computational Foundations for Statistical Learning: Enabling Massive Science
Alexander Gray
CMU
Friday, March 11, 2005 11:30 A.M.
Room 1302 Warren Weaver Hall
251 Mercer Street
New York, NY 10012-1185
Directions: http://cs.nyu.edu/csweb/Location/directions.html
Colloquium Information: http://cs.nyu.edu/csweb/Calendar/colloquium/index.html
Hosts:
Richard Cole, cole@cs.nyu.edu, (212) 998-3119
Abstract
The data sciences (statistics and, more recently, machine learning) have
always underpinned the natural sciences.
'Massive datasets' represent potentially unprecedented capabilities in
a growing number of fields, but most of this potential remains
untapped, due to the computational intractability of the most powerful
statistical learning methods. The computational problems underlying
many of these methods are related to some of the hardest problems of
applied mathematics, but have unique properties which make classical
solution classes inappropriate. I will describe the beginnings of a
unified framework for a large class of problems, which I call
generalized N-body problems. The resulting algorithms, which I call
multi-tree methods, appear to be the fastest practical algorithms to
date for several foundational problems. I will describe four examples:
all-nearest-neighbors, kernel density estimation, distribution-free
Bayes classification, and spatial correlation functions, and touch on
two more recent projects, kernel matrix-vector multiplication and
high-dimensional integration. I'll conclude by showing examples where
these algorithms are enabling previously intractable data analyses at
the heart of major modern scientific questions in cosmology and
fundamental physics.
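As an illustration (not part of the announcement itself), kernel density estimation, one of the four examples above, has the generalized N-body structure: the naive computation sums a kernel contribution from every reference point for every query point, costing O(NM) work, which is exactly the cost that tree-based methods aim to beat. A minimal sketch of the naive computation, assuming a Gaussian kernel and one-dimensional data:

```python
import math

def naive_kde(queries, references, h):
    """Naive kernel density estimate with a Gaussian kernel of
    bandwidth h: for each query point, sum the kernel over every
    reference point -- O(len(queries) * len(references)) work."""
    norm = 1.0 / (len(references) * h * math.sqrt(2.0 * math.pi))
    densities = []
    for q in queries:
        total = sum(math.exp(-((q - r) ** 2) / (2.0 * h * h))
                    for r in references)
        densities.append(norm * total)
    return densities

# Estimate the density at one query point from four reference points.
refs = [0.0, 0.5, 1.0, 1.5]
print(naive_kde([0.75], refs, h=0.5))
```

The doubly nested sum is the pairwise interaction that makes this an N-body-style problem; multi-tree methods approximate such sums by pruning whole groups of distant reference points at once.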
