Modern learning problems in computer vision, natural language processing, computational biology, and other areas are often based on large data sets of tens of thousands to millions of training instances. But how do we scale standard learning algorithms, such as Support Vector Machines, Kernel Ridge Regression, Spectral Clustering, and Kernel Principal Component Analysis, to such orders of magnitude? In this talk, I will discuss solutions to this problem using sampling-based matrix approximation techniques. I will address several fundamental theoretical, algorithmic, and empirical questions associated with this class of algorithms: Which approximation should be used? How should sampling be performed? How well do these approximations work in theory? How well do they work in practical applications? I will also report the results of extensive experiments with large-scale data sets.
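As a concrete illustration of the kind of sampling-based matrix approximation referred to above, the sketch below implements a basic Nyström-style low-rank approximation of an RBF kernel matrix using uniformly sampled landmark columns. This is only a minimal example of the general technique, not necessarily the specific algorithms or sampling schemes discussed in the talk; the function names, the kernel choice, and all parameter values are assumptions made for demonstration.

```python
# Illustrative sketch (assumption, not from the talk): Nystrom-style low-rank
# approximation of an RBF kernel matrix via uniform column sampling.
import numpy as np


def rbf_kernel(A, B, gamma=0.1):
    """RBF (Gaussian) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)


def nystrom_approximation(X, m, gamma=0.1, rng=None):
    """Approximate the n x n kernel matrix K from m sampled columns.

    Returns a factor L with K ~ L @ L.T, at O(n * m^2) cost instead of O(n^2).
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # uniform landmark sampling
    C = rbf_kernel(X, X[idx], gamma)             # n x m block of K
    W = C[idx, :]                                # m x m block among landmarks
    # Pseudo-inverse square root of W yields K ~ C W^+ C^T = (C W^{-1/2})(C W^{-1/2})^T
    U, s, _ = np.linalg.svd(W)
    W_inv_sqrt = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ U.T
    return C @ W_inv_sqrt


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(2000, 20))
    L = nystrom_approximation(X, m=200, gamma=0.05, rng=0)
    K_approx = L @ L.T                           # never formed explicitly in practice
    K_exact = rbf_kernel(X, X, gamma=0.05)
    err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
    print(f"relative Frobenius error: {err:.3f}")
```

In practice the factor L would be used directly inside a downstream solver (e.g., in place of the full kernel matrix in kernel ridge regression), so the full n x n matrix is never materialized.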