Computer Science Colloquium

Classification Problems with Heterogeneous Information

Gert Lanckriet

Wednesday, April 20, 2005 11:15 A.M.
Room 1302 Warren Weaver Hall
251 Mercer Street
New York, NY 10012-1185

Colloquium Information:


Richard Cole, (212) 998-3119


An important challenge for machine learning is to leverage the diversity of information available in large-scale learning problems, where different sources of information often capture different aspects of the data. Beyond classical vectorial data, information in the form of graphs, trees, strings, and other structured formats has become widely available (e.g., the link structure of web pages, or the amino acid sequences describing proteins). In this talk I introduce a principled computational and statistical framework for integrating data from heterogeneous information sources in a flexible and unified way. The approach is formulated within the unifying framework of kernel methods and applied to classification. The resulting problem is a semidefinite program (SDP). Although SDPs can be solved in polynomial time, many real-life problems are too large for general-purpose SDP solvers. Using tools from conic duality and convex analysis, I derive a dedicated algorithm that is significantly more efficient than generic SDP methods in this setting. Finally, I present applications to computational biology, showing that classification performance can be improved by integrating diverse genome-wide information sources.
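To make the kernel-based integration idea concrete, here is a minimal sketch in Python. It assumes two hypothetical information sources per object: an amino acid string, compared with a k-mer spectrum kernel, and a small real-valued feature vector, compared with a Gaussian RBF kernel. Each Gram matrix is rescaled to unit trace. The two are then combined as a nonnegative weighted sum K = mu_1 K_1 + mu_2 K_2, which remains positive semidefinite and is therefore again a valid kernel. The weights (0.7, 0.3), the toy data, and all function names are illustrative assumptions. In the framework described in the talk, the combination is learned by solving an optimization problem (an SDP) rather than fixed by hand, and this sketch does not reproduce that step.

```python
from collections import Counter
import math


def spectrum_kernel(s, t, k=2):
    """k-mer spectrum kernel: inner product of the k-mer count vectors of two strings."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(cs[m] * ct[m] for m in cs))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel on real-valued feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def gram(items, kernel):
    """Gram matrix with K[i][j] = kernel(items[i], items[j])."""
    return [[kernel(a, b) for b in items] for a in items]


def normalize_trace(K, c=1.0):
    """Rescale K so that trace(K) == c, putting different sources on a common scale."""
    tr = sum(K[i][i] for i in range(len(K)))
    return [[c * v / tr for v in row] for row in K]


def combine(kernels, weights):
    """Weighted sum of Gram matrices; nonnegative weights keep the result PSD."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: three objects, each described by a sequence and a feature vector.
seqs = ["MKVLA", "MKVLV", "GGSTA"]
feats = [[0.1, 1.0], [0.2, 0.9], [1.5, -0.3]]

K_seq = normalize_trace(gram(seqs, spectrum_kernel))
K_vec = normalize_trace(gram(feats, rbf_kernel))

# Hand-picked weights (an assumption); the talk's framework would learn these instead.
K = combine([K_seq, K_vec], [0.7, 0.3])
for row in K:
    print(["%.3f" % v for v in row])
```

Because both inputs have unit trace and the weights sum to 1, the combined matrix also has unit trace. The result can be handed to any kernel classifier that accepts a precomputed Gram matrix, such as an SVM.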
