A black-box approach to machine learning
Monday, March 10, 2003
Host: Richard Cole, firstname.lastname@example.org, 212-998-3119
I will present a new approach to the study and development of machine learning algorithms.
The goal of a learning algorithm is to produce a good decision algorithm for a repetitive decision task. A decision algorithm receives as input an instance (sensory data) and outputs a decision (an action). After the decision has been made, there is a measurable outcome. We assume the existence of a loss function which quantifies the utility of a decision with respect to an outcome.
My main focus is on binary classification tasks. In this case, the decision is binary, the outcome is binary, and the loss is 1 if the decision and outcome don't match and 0 if they do.
Given these definitions and a source of instances and outcomes, we can evaluate the performance of any decision algorithm. In this way we can ignore the internal workings of the decision algorithm and treat it as a "black box".
The practical advantage of the black-box approach is that it gives us a measuring stick for comparing all types of decision algorithms, regardless of how they are constructed or analyzed. By extension, it gives us a way of comparing all types of learning algorithms.
From a theoretical point of view, the task is to develop and analyze learning algorithms that combine several black boxes into a new box with better performance. The resulting box can then be used as a component in other combining algorithms.
The theoretical guarantees that we prove are relative rather than absolute. Absolute guarantees, which are the norm in theoretical analysis of adaptive algorithms, make detailed assumptions about the data distribution. Instead, we prove bounds which relate the performance of the learning algorithm to the performance of the black-box algorithms it combines. These learning algorithms include boosting, online combining of expert advice, and pseudo-Bayesian averaging.
In this talk I will outline the approach, describe some of my work on the theory and practice of learning algorithms, and suggest some of the implications of this work on the foundations of statistical inference and on the development of adaptive software.