Speaker: Miroslav Dudik, Carnegie Mellon University
Location: Warren Weaver Hall 1302
Date: April 12, 2010, 11:30 a.m.
Host: Richard Cole
The maximum entropy approach (maxent), equivalent to maximum likelihood, is a widely used density-estimation technique. However, when trained on small datasets, maxent places too much confidence in too little data (a phenomenon known as "overfitting"), and when trained over large sample spaces, naive implementations of maxent are intractable. To prevent overfitting, we propose a relaxed version of maxent, which turns out to be equivalent to L1-regularized log-likelihood maximization. We prove strong statistical guarantees for L1-regularized maxent and show how it can be generalized to estimation in the presence of sample-selection bias and to simultaneous estimation of multiple densities. To address the computational challenges, we propose an approach based on sampling and coordinate descent.
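To make the connection concrete, here is a minimal sketch (not the speaker's code) of L1-regularized maxent over a small finite sample space, fit by coordinate-wise proximal updates; the function names, step size, and toy features are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty for a scalar coordinate."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def l1_maxent(features, mu_hat, beta=0.05, lr=0.3, iters=2000):
    """Fit a Gibbs distribution p(x) ∝ exp(λ·f(x)) over a finite sample
    space by maximizing λ·μ̂ - log Z(λ) - β·||λ||_1 with cyclic
    coordinate-wise proximal gradient steps (illustrative sketch only).

    features : (n_points, n_feats) array, feature values f(x)
    mu_hat   : empirical feature expectations from the sample
    beta     : L1 regularization strength (the "relaxation" width)
    """
    n, d = features.shape
    lam = np.zeros(d)
    for _ in range(iters):
        for j in range(d):
            # current model distribution p ∝ exp(f(x)·λ)
            logits = features @ lam
            logits -= logits.max()          # numerical stability
            p = np.exp(logits)
            p /= p.sum()
            # gradient of the smooth part w.r.t. λ_j: μ̂_j - E_p[f_j]
            g = mu_hat[j] - p @ features[:, j]
            lam[j] = soft_threshold(lam[j] + lr * g, lr * beta)
    return lam
```

At the optimum, each model feature expectation matches its empirical counterpart only up to β, i.e. |E_p[f_j] - μ̂_j| ≤ β, which is exactly the relaxed maxent constraint that prevents overfitting on small samples.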
I discuss two applications of maxent: statistical modeling of distributions of biological species and game-theoretic modeling of human negotiation, focusing mainly on the former. In species distribution modeling, statistical properties of regularized maxent are key in obtaining state-of-the-art performance on small data sets. In game-theoretic modeling, the coordinate descent algorithm and sampling allow our approach to solve negotiation scenarios an order of magnitude larger than previous techniques.
Based on joint work with Rob Schapire, Steven Phillips, Geoff Gordon, Dave Blei and others.
Miroslav Dudik received his PhD in Computer Science from Princeton University in 2007. Currently, he is a postdoctoral fellow at Carnegie Mellon University. His interests are in theoretical and applied aspects of machine learning, both statistical and algorithmic. He focuses on small-sample density estimation and game-theoretic modeling.
Refreshments will be offered starting 15 minutes prior to the scheduled start of the talk.