StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time

NYU-CS TR2002-827,June 2002

Yunyue Zhu,Dennis Shasha

Consider the problem of monitoring tens of thousands of time series data
streams in an online fashion and making decisions based on them. In addition
to single stream statistics such as average and standard deviation, we also
want to find high correlations among all pairs of streams. A stock market
trader might use such a tool to spot arbitrage opportunities. This paper
proposes efficient methods for solving this problem based on Discrete
Fourier Transforms and a three level time interval hierarchy. Extensive
experiments on synthetic data and real world financial trading data show
that our algorithm beats the direct computation approach by several orders
of magnitude. It also improves on previous Fourier Transform approaches by
allowing the efficient computation of time-delayed correlation over any size
sliding window and any time delay. Correlation also lends itself to an
efficient grid-based data structure.The result is the first algorithm that 
we know of to compute correlations over thousands of data streams in real time. 
The algorithm is incremental,has fixed response time, and can monitor the 
pairwise correlations of 10,000 streams on a single PC. The algorithm is 
embarrassingly parallelizable.