Speaker: Azza Abouzied, Yale University
Location: Warren Weaver Hall 1302
Date: November 16, 2012, 11:30 a.m.
Host: Dennis Shasha
As Hal Varian, Chief Economist of Google, points out, while data is essentially ubiquitous and free, the human capacity to extract value from data is a scarce resource. For decades, database systems have been the system of choice for data storage and analysis. Although they are engineered for high performance, databases can no longer keep up with today’s Big Data. Traditional database systems face at least three challenges.
First, databases favor performance over scalability. As we move from terabytes to exabytes, databases need to redesign their fault-tolerance models and sacrifice some of their performance for scalability. This is the central idea behind the HadoopDB system and its successful startup, Hadapt.
Second, databases enforce rigid constraints on the structure and organization of data. Transforming loosely structured data sets to fit into databases is a serious time drain. As a result, data scientists are forgoing the performance benefits of database systems for the immediate gratification of processing data directly in the file system with MapReduce systems. The invisible loading scheme of HadoopDB manages the loading of data from the file system into the database systems without user intervention. Users get the immediate gratification of processing data with MapReduce while gradually gaining the performance benefits of storing data in efficient databases.
Third, database interfaces favor query expressiveness over usability. To meet the increasing demand for data scientists, we need to simplify our data querying interfaces so that a less technically skilled workforce can analyze data. DataPlay promotes new user-interaction techniques that simplify database query specification.
In this talk, I describe these three works: HadoopDB, Invisible Loading, and DataPlay. Each work employs techniques from different research fields, including Systems, HCI, and Learning Theory. The diversity of strategies required to solve the different problems demonstrates the complexity of dealing with today’s Big Data challenges.
Azza Abouzied researches database architecture and query interface design at Yale University. She plans to defend her doctoral research in Spring 2013. In 2011, she was a Visiting Scholar at UC Berkeley. Azza believes in interdisciplinary and collaborative research. At Yale, she works with Avi Silberschatz, Dana Angluin, and Daniel Abadi. She also collaborates with Joseph Hellerstein and Christos Papadimitriou at UC Berkeley. In 2008, Azza was awarded the prestigious Canada Graduate Scholarship to start her doctoral studies. Azza is also one of the co-founders of the successful data analytics company Hadapt.
Refreshments will be offered starting 15 minutes prior to the scheduled start of the talk.