Fast Algorithms for Burst Detection
Candidate: Xin Zhang
Advisor: Dennis Shasha

Abstract

Events occur in every aspect of our lives.

An unexpectedly large number of events occurring within some certain measurement (e.g. within some time duration or a spatial region) is called a {\em burst}, suggesting unusual behaviors or activities. Bursts come up in many natural and social processes. It is a challenging task to monitor the occurrence of bursts whose lasting duration is unknown in a fast data stream environment.

This work describes efficient data structures and algorithms for high performance burst detection under different settings. Our view is that bursts, as an unusual phenomenon, constitute a useful preliminary primitive in a knowledge discovery hierarchy. Our intent is to build a high performance primitive detection algorithm to support high-level data mining tasks.

The work starts with an algorithmic framework including a family of data structures and a heuristic optimization algorithm to choose an efficient data structure given the inputs. The advantage of this framework is that it's adaptive to different inputs. Experiments on both synthetic data and real world data show the new framework significantly outperforms existing techniques over a variety of inputs.

Furthermore, we present a greedy dynamic detection algorithm which handles the changing data. It evolves the structure to adapt to the incoming data. It achieves better performance in both synthetic and real data streams than a static algorithm in most cases.

We have applied this framework to different real world applications in physics, stock trading and website traffic monitoring. All the case studies show our framework has superb performance.

We extend this framework to multi-dimensional data and use it in an epidemiology simulation to detect infectious disease outbreak and spread.