Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh

Tutorial Description

Kernel methods are widely used in statistical learning. Positive definite symmetric (PDS) kernels implicitly specify an inner product in a Hilbert space where large-margin techniques are used for learning and estimation. They can be combined with algorithms such as support vector machines (SVMs) or other kernel-based algorithms to form powerful learning techniques.

But the choice of the kernel, which is critical to the success of
these algorithms, is typically left to the user. To limit the risk of
a poor choice of kernel, a number of publications over the last decade
or so have investigated the idea of *learning the kernel* from data.
Rather than requiring the user to commit to a specific kernel, which
may not be optimal, in particular if the user's prior knowledge about
the task is poor, learning kernel methods require the user only to
supply a family of kernels. The task of selecting (or learning) a
kernel out of that family is then left to the learning algorithm
which, as for standard kernel-based methods, must also use the data to
choose a hypothesis in the reproducing kernel Hilbert space (RKHS)
associated with the selected kernel.
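As a rough illustration of what "supplying a family of kernels" can mean in practice, the sketch below takes a family of Gaussian kernels with different bandwidths and forms a convex combination of them, weighting each base kernel by its centered alignment with the labels. This is only one simple heuristic chosen for brevity, not necessarily an algorithm presented in the tutorial; the function names, the bandwidth values, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def gaussian_gram(X, gamma):
    """Gram matrix of the Gaussian kernel exp(-gamma * ||x - x'||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center(K):
    """Center a Gram matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K, y):
    """Centered alignment between a Gram matrix K and the label kernel y y^T."""
    Kc = center(K)
    Yc = center(np.outer(y, y))
    denom = np.linalg.norm(Kc) * np.linalg.norm(Yc)  # Frobenius norms
    return float(np.sum(Kc * Yc) / denom) if denom > 0 else 0.0

def alignment_weights(grams, y):
    """Non-negative weights summing to 1, proportional to each alignment."""
    scores = np.array([max(alignment(K, y), 0.0) for K in grams])
    total = scores.sum()
    if total == 0.0:
        # No base kernel aligns with the labels: fall back to uniform weights.
        return np.full(len(grams), 1.0 / len(grams))
    return scores / total

def combine(grams, weights):
    """Convex combination of Gram matrices (still PDS since weights >= 0)."""
    return sum(w * K for w, K in zip(weights, grams))

# Hypothetical data: 40 points in the plane, labeled by the sign of the
# first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)

# The user-supplied family: Gaussian kernels with three candidate bandwidths.
gammas = [0.01, 1.0, 100.0]
grams = [gaussian_gram(X, g) for g in gammas]

weights = alignment_weights(grams, y)
K = combine(grams, weights)
print(dict(zip(gammas, np.round(weights, 3))))
```

The combined matrix `K` can then be passed to any kernel-based learner that accepts a precomputed Gram matrix, which would then choose a hypothesis in the RKHS associated with the learned kernel.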

This tutorial describes the main theoretical, algorithmic, and empirical results on learning kernels obtained in the last decade, including the progress made on each of these fronts in the last few years. It will also introduce the audience to software libraries and packages that implement several of the most effective learning kernel algorithms, and indicate how to use these algorithms in applications to improve performance.

Learning kernels is a fundamental topic for kernel methods and machine learning in general. The question of selecting the appropriate kernel has been raised since the beginning of kernel methods, in particular for SVMs. Significant improvements in this area will both reduce the demands placed on users when applying machine learning techniques and help achieve better performance. Additionally, the methods used for learning kernels, including the formulation and solution of the optimization problems, the algorithms, and the theoretical insights, can be useful in other areas of machine learning, such as learning problems with data-dependent hypotheses, feature selection or feature reweighting, distance learning, transfer learning, and many others. Finally, there are many interesting research questions in this area that have not yet been sufficiently explored. This tutorial will provide a convenient introduction to both standard and advanced material in this area, which will help interested researchers investigate these questions.

Location and Time

ICML 2011, Bellevue, Washington

Grand-AB

June 28, 2011.

4:30-7:30PM.

Targeted Audience

The tutorial is meant for a broad audience, including students and researchers interested in machine learning in general and, in particular, in kernel methods, feature selection, transfer learning, manifold learning, and many other areas. No specific background is required: the tutorial is self-contained, and most fundamental concepts are introduced during the presentation. No knowledge of programming languages is required.

Participants will learn about the fundamental aspects of the main techniques for learning kernels, including a concise presentation of the main algorithms, the most recent theoretical results, and an overview of empirical results. They will also become familiar with software libraries and packages that help them experiment with several effective learning kernel algorithms. These algorithms can be used in a variety of applications to improve performance.

Lectures

Part I: Introduction to kernel methods.

Part II: Learning kernel algorithms.

Part III: Theoretical guarantees.

Part IV: Software tools.

[tarball of all PDFs]

References

A list of relevant papers for further reading is also provided within the lecture slides of each section.