Computer Science Colloquium

Big Data Analytics with All-or-Nothing Parallel Jobs

Ganesh Ananthanarayanan, University of California, Berkeley

March 01, 2013 11:30AM
Warren Weaver Hall, 1302
251 Mercer Street
New York, NY 10012



Denis Zorin


Extensive data analysis has become the enabler for diagnostics and
decision making in many modern systems. These analyses have both competitive
and social benefits. To cope with a deluge of data that is growing
faster than Moore's law, computation frameworks have resorted to massive
parallelization of analytics jobs into many fine-grained tasks. These
frameworks promised to provide efficient and fault-tolerant execution of
these tasks. However, meeting this promise in clusters spanning hundreds of
thousands of machines is challenging and a key departure from earlier work
on parallel computing.

A simple but key aspect of parallel jobs is the all-or-nothing property:
unless all tasks of a job see an equal improvement, the job itself does not
complete any faster. This talk will demonstrate how the
all-or-nothing property impacts replacement algorithms in distributed caches
for parallel jobs. Our coordinated caching system, PACMan, makes global
caching decisions and employs a provably optimal cache replacement
algorithm. A highlight of our evaluation using workloads from Facebook and
Bing datacenters is that PACMan's replacement algorithm outperforms even
Belady's MIN (which uses an oracle) in speeding up jobs. Along the way, I
will also describe how we broke the myth of the importance of disk locality
in datacenter computing and present solutions to mitigate straggler tasks.
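
The all-or-nothing property described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not PACMan's algorithm: it assumes a job whose tasks all run in parallel in a single wave, so the job finishes when its slowest task finishes, and it assumes a hypothetical 10x speedup for a task whose input is cached. The function names and task durations are made up for illustration.

```python
def job_completion_time(task_times):
    """Assumed model: a single-wave parallel job ends when its slowest task ends."""
    return max(task_times)


def run_with_cache(task_times, cached, speedup=10.0):
    """Return per-task durations when tasks whose index is in `cached` read from memory.

    The 10x `speedup` is a hypothetical figure chosen for illustration.
    """
    return [t / speedup if i in cached else t for i, t in enumerate(task_times)]


# Hypothetical job: four tasks that each take 10 seconds reading from disk.
disk_times = [10.0, 10.0, 10.0, 10.0]

baseline = job_completion_time(disk_times)
# Caching the inputs of three of the four tasks leaves one slow task behind.
partial = job_completion_time(run_with_cache(disk_times, cached={0, 1, 2}))
# Caching every input speeds up the slowest task too.
full = job_completion_time(run_with_cache(disk_times, cached={0, 1, 2, 3}))

print(baseline, partial, full)  # prints: 10.0 10.0 1.0
```

In this model, caching three of the four inputs uses cache space without shortening the job at all, while caching all four gives the full speedup. That is why a replacement policy for parallel jobs may prefer to evict or keep a job's inputs together.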


Ganesh Ananthanarayanan is a PhD candidate at the University of California,
Berkeley, working with Prof. Ion Stoica in the AMP Lab. His research
interests are in systems and networking, with a focus on cloud computing and
large-scale data analytics systems. Prior to joining Berkeley, he worked for
two years at Microsoft Research's Bangalore office. More details about
Ganesh's work can be found here:
