Colloquium Details

Mosaics of Big Data: Database Systems and Information Management -- Trends and a Vision

Speaker: Volker Markl

Location: 60 Fifth Avenue Room 150

Date: October 13, 2023, 11 a.m.



The global database research community has greatly impacted the functionality and performance of
data storage and processing systems along the dimensions that define “big data”, i.e., volume, velocity, variety,
and veracity. Locally, over the past five years, we have also been working on various fronts. Among our
contributions are: (1) establishing a vision for a database-inspired big data analytics system, which unifies the
best of database and distributed systems technologies, and augments it with concepts drawn from compilers
(e.g., iterations) and data stream processing, as well as (2) forming a community of researchers and institutions
to create the Stratosphere platform to realize our vision. One major result from these activities was Apache
Flink, an open-source big data analytics platform, together with its thriving global community of developers and
production users. Although much progress has been made, a major challenge for the database research community
remains when looking at the overall big data stack: how to maintain ease of use despite the
increasing heterogeneity and complexity of data analytics, involving specialized engines for various aspects of
an end-to-end data analytics pipeline, including, among others, graph-based, linear algebra-based, and
relational-based algorithms, and the underlying, increasingly heterogeneous hardware and computing
infrastructure. At TU Berlin, DFKI, and the Berlin Institute for the Foundations of Learning and Data (BIFOLD), we
currently aim to advance research in this field via the NebulaStream and Agora projects. Our goal is to remedy
some of the heterogeneity challenges that hamper developer productivity and limit the use of data science
technologies to just the privileged few, who are coveted experts. In this talk, we will outline how state-of-the-art
stream processing engines (SPEs) have to change to exploit the new capabilities of the IoT and showcase how we tackle IoT challenges in
our own system, NebulaStream. We will also present our vision for Agora, an asset ecosystem that provides the
technical infrastructure for offering and using data and algorithms, as well as physical infrastructure.

Speaker Bio:

Volker Markl is a German Professor of Computer Science. He leads the Chair of Database Systems and
Information Management at TU Berlin and the Intelligent Analytics for Massive Data Research Department at
DFKI. In addition, he is Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD). He is
a database systems researcher, conducting research at the intersection of distributed systems, scalable data
processing, and machine learning. Volker led the Stratosphere project, which resulted in the creation of Apache
Flink. Volker has received numerous honors and prestigious awards, including best paper awards at ACM
SIGMOD, VLDB, and ICDE as well as the ACM SIGMOD Systems Award. In 2014, he was elected one of
Germany's leading “Digital Minds” (Digitale Köpfe) by the German Informatics Society and is a member of the
Berlin-Brandenburg Academy of Sciences. He was elected an ACM Fellow for his contributions to query
optimization, scalable data processing, and data programmability. He served as President of the VLDB
Endowment, and serves as advisor to academic institutions, governmental organizations, and technology
companies. Volker holds eighteen patents and has been co-founder and mentor to several startups.


In-person attendance is available only to those with active NYU ID cards.
