Speaker: Arun Kumar, University of Wisconsin-Madison
Location: Warren Weaver Hall 1302
Date: March 25, 2016, 11:30 a.m.
Host: Subhash Khot
Advanced analytics -- the analysis of large and complex data using machine learning (ML) -- is becoming ubiquitous, with growing demand for advanced analytics tools in enterprise domains. However, several challenging bottlenecks remain in the end-to-end process of building and deploying advanced analytics applications. My research focuses on abstractions, algorithms, and systems that mitigate such bottlenecks and accelerate advanced analytics from a data management standpoint.
In this talk, I will focus on my work on mitigating one such pervasive bottleneck in the process of feature engineering for ML -- joins of multiple tables. Many real-world datasets are multi-table, connected by key-foreign key relationships, but almost all ML toolkits expect single-table inputs. This forces data scientists to join all tables and materialize a single table that collects all features. Alas, such joins often cause the output to blow up in size, which slows down ML, increases costs, and leads to data maintenance headaches. In my work, I show how it is possible to mitigate these issues by "avoiding joins physically," i.e., pushing ML down through joins. This reduces runtime without affecting accuracy. Going further, I apply statistical learning theory to show how it is often possible to also "avoid joins logically," i.e., ignore entire tables outright, losing little accuracy while achieving significant runtime gains.
Arun Kumar is a Ph.D. candidate at the University of Wisconsin-Madison. His primary research interests are in data management and its intersection with machine learning. He is co-advised by Jeffrey Naughton and Jignesh M. Patel, and has also worked closely with Christopher Re and Xiaojin Zhu. Systems and ideas from his research have been shipped in products by EMC, Oracle, Cloudera, and IBM. A paper he co-authored was accorded the Best Paper Award at ACM SIGMOD 2014. He was awarded the Anthony C. Klug NCR Fellowship in Database Systems in 2015. He received his M.S. from UW-Madison in 2011 and his B.Tech. from IIT Madras in 2009.
Refreshments will be offered starting 15 minutes prior to the scheduled start of the talk.