Speaker: Tyler Neylon
Title: Unlabeled Compression Schemes for Maximum Classes (based on a paper by Manfred K. Warmuth and Dima Kuzmin)
Date: January 10th, 2006.

Abstract: In a 1995 paper, Manfred Warmuth and Sally Floyd published a proof that, for any maximum concept class of VC dimension d, every training set consistent with the class can be effectively compressed to a subsample of size at most d (and they conjectured the same holds for every class of finite VC dimension d).  They did this by presenting a scheme such that, for any set of labeled data (i.e., any sample) which is consistent with some hypothesis in the concept class, their sample compression scheme would pick out at most d of those data points, from which it could reconstruct a hypothesis consistent with the entire original set.  For example, to compress the concept class of axis-aligned rectangle interiors in the plane, a compression scheme could simply keep the leftmost, rightmost, topmost, and bottommost points that are labeled positively (i.e., inside the rectangle to be learned) in any given sample.
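
For concreteness, here is a minimal Python sketch of that rectangle scheme; the data format (a list of ((x, y), label) pairs) and the function names are illustrative choices, not anything taken from the paper:

    def compress(sample):
        """Keep at most four positively labeled points: the leftmost,
        rightmost, topmost, and bottommost ones."""
        positives = [p for p, label in sample if label == 1]
        if not positives:
            return []                                       # all-negative sample: keep nothing
        return list({min(positives, key=lambda p: p[0]),    # leftmost
                     max(positives, key=lambda p: p[0]),    # rightmost
                     min(positives, key=lambda p: p[1]),    # bottommost
                     max(positives, key=lambda p: p[1])})   # topmost

    def reconstruct(kept):
        """Return the smallest axis-aligned rectangle containing the kept
        points (the always-negative hypothesis if nothing was kept)."""
        if not kept:
            return lambda p: 0
        xs, ys = [p[0] for p in kept], [p[1] for p in kept]
        return lambda p: int(min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys))

    # Sanity check: the reconstructed hypothesis agrees with the full sample.
    sample = [((0, 0), 1), ((2, 3), 1), ((1, 1), 1), ((5, 1), 0), ((-1, 2), 0)]
    h = reconstruct(compress(sample))
    assert all(h(p) == label for p, label in sample)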

In a forthcoming paper (the one to be presented), Manfred Warmuth returns with Dima Kuzmin to demonstrate that we can do even better in the case of maximum concept classes (those of the maximum possible size given their VC dimension): the sample can be compressed in such a way that the labels are thrown away entirely, keeping only at most d unlabeled data points as the compressed sample.  As an auxiliary benefit, it is conjectured that any such algorithm will also perform optimally from the perspective of PAC learning.  Some tantalizing related questions remain open.
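
To give a feel for what an unlabeled compression scheme looks like, here is a toy Python sketch for the simplest maximum class, one-dimensional thresholds h_t(x) = 1 iff x >= t (VC dimension 1); this only illustrates the idea of keeping unlabeled points, and is not the general construction from the Kuzmin-Warmuth paper:

    def compress(sample):
        """sample: list of (x, label) pairs consistent with some threshold.
        Keep at most one *unlabeled* point: the smallest positive example."""
        positives = [x for x, label in sample if label == 1]
        return {min(positives)} if positives else set()

    def reconstruct(kept):
        """Map the kept unlabeled point (or the empty set) back to a hypothesis."""
        if not kept:
            return lambda x: 0              # all-negative concept
        (p,) = kept
        return lambda x: int(x >= p)        # threshold placed at the kept point

    # Sanity check: the reconstruction is consistent with the whole sample.
    sample = [(-2.0, 0), (0.5, 0), (1.0, 1), (3.7, 1)]
    h = reconstruct(compress(sample))
    assert all(h(x) == label for x, label in sample)

Because the kept point alone determines the reconstructed hypothesis, the labels are never needed at reconstruction time; that is the property the paper establishes, with at most d kept points, for every maximum class of VC dimension d.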