Computer Science Colloquium

Computational Foundations for Statistical Learning: Enabling Massive Science

Alexander Gray
CMU

Friday, March 11, 2005, 11:30 A.M.
Room 1302 Warren Weaver Hall
251 Mercer Street
New York, NY 10012-1185

Directions: http://cs.nyu.edu/csweb/Location/directions.html
Colloquium Information: http://cs.nyu.edu/csweb/Calendar/colloquium/index.html

Host:

Richard Cole cole@cs.nyu.edu, (212) 998-3119

Abstract

The data sciences (statistics and, more recently, machine learning) have always underpinned the natural sciences. "Massive datasets" offer potentially unprecedented capabilities in a growing number of fields, but most of this potential remains unlocked because the most powerful statistical learning methods are computationally intractable. The computational problems underlying many of these methods are related to some of the hardest problems of applied mathematics, but have unique properties that make classical solution classes inappropriate. I will describe the beginnings of a unified framework for a large class of such problems, which I call generalized N-body problems. The resulting algorithms, which I call multi-tree methods, appear to be the fastest practical algorithms to date for several foundational problems. I will describe four examples (all-nearest-neighbors, kernel density estimation, distribution-free Bayes classification, and spatial correlation functions) and touch on two more recent projects, kernel matrix-vector multiplication and high-dimensional integration. I will conclude by showing examples where these algorithms are enabling previously intractable data analyses at the heart of major modern scientific questions in cosmology and fundamental physics.

