Computer Science Colloquium

Representation, Modeling and Computation: Opportunities and Challenges of Modern Datasets

Alekh Agarwal, Microsoft Research

March 07, 2014 11:30AM
Warren Weaver Hall, 1302
251 Mercer Street
New York, NY 10012

Denis Zorin


Machine learning from modern datasets presents novel
opportunities and challenges. Larger and more diverse datasets enable
us to answer more complex statistical questions, but present
computational challenges in designing algorithms that can scale. In
this talk I will present two results, the first one about
computational challenges and the second about an opportunity enabled
by modern datasets in the context of representation learning.

I will start by presenting a distributed machine learning system we
developed to address the computational scalability problem. Our system
obtains state-of-the-art computational results on many common
classification and regression tasks. I will discuss both the
communication and computational components of the system, along with
an experimental evaluation on industry-scale data as well as large
datasets from the academic literature.

In the second part of my talk, I will present my recent work on
dictionary learning, also known as sparse coding. The goal here is to
efficiently learn a basis such that each data point is a combination
of only a small number of basis elements. Applications arise in
signal processing as well as machine learning. I will present an
efficient algorithm which is guaranteed to recover the true
dictionary, given enough data samples. This is the first recovery
result for overcomplete dictionaries, and comes with an
easy-to-implement algorithm.
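
The sparse coding setting described above can be illustrated with a
small synthetic example. This is only a sketch of the data model, not
the recovery algorithm from the talk: the function name, the
dimensions, and the Gaussian sampling choices are all assumptions made
for illustration. "Overcomplete" here means the dictionary has more
basis elements than the data has dimensions.

```python
import numpy as np


def sample_sparse_data(n_dim=8, n_atoms=16, sparsity=3, n_samples=100, seed=0):
    """Draw synthetic data Y = A @ X with sparse columns in X.

    Hypothetical helper for illustration only; all defaults are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Overcomplete dictionary: more basis elements (columns) than dimensions.
    A = rng.standard_normal((n_dim, n_atoms))
    A /= np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm basis elements

    # Each data point uses only `sparsity` basis elements.
    X = np.zeros((n_atoms, n_samples))
    for j in range(n_samples):
        support = rng.choice(n_atoms, size=sparsity, replace=False)
        X[support, j] = rng.standard_normal(sparsity)

    Y = A @ X  # observed data points, one per column
    return A, X, Y
```

Dictionary learning is the inverse problem: given only Y, recover A
(and the sparse coefficients X).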


Alekh is a post-doctoral researcher at the New York lab of
Microsoft Research, where his research is primarily focused on machine
learning, statistics and convex optimization. Prior to that, he
obtained his PhD from UC Berkeley under the supervision of Peter
Bartlett and Martin Wainwright. He received the MSR PhD Fellowship in
2009 and the Google PhD Fellowship in 2011.

