STATStream: a Toolkit for High Speed Statistical Time Series Analysis


Data streams (sequences of data arriving in time order) are important in many applications, such as telecommunications, network monitoring, environmental monitoring, and financial analysis. Such data are difficult to process in set-oriented data management systems. Our system's goal is to compute the statistics for multi-stream analysis problems in near-constant time. The core function we compute is Pearson correlation over sliding windows. We do this using Fourier transforms and random-vector-based sketches, depending primarily on how "cooperative" the data is.
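To give a feel for why Fourier transforms can help with sliding-window correlation, here is a minimal sketch of one standard identity. It is an illustration under stated assumptions, not the STATStream implementation. If two windows are normalized to zero mean and unit length, their Pearson correlation equals 1 minus half their squared Euclidean distance, and a unitary DFT preserves that distance. Keeping only the first k coefficients can only shrink the distance, so it yields an upper bound on the correlation that can be used to discard pairs cheaply. The function names and the naive DFT below are hypothetical and chosen for clarity, not speed.

```python
import cmath
import math


def normalize(window):
    """Shift a window to zero mean and scale it to unit Euclidean norm."""
    n = len(window)
    mean = sum(window) / n
    centered = [v - mean for v in window]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("constant window has undefined correlation")
    return [c / norm for c in centered]


def dft_coefficients(values, k):
    """First k coefficients of the unitary DFT (naive O(n*k) evaluation)."""
    n = len(values)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(v * cmath.exp(-2j * math.pi * f * t / n)
                    for t, v in enumerate(values))
        for f in range(k)
    ]


def pearson(x, y):
    """Exact Pearson correlation as the dot product of normalized windows."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def correlation_upper_bound(x, y, k):
    """Upper bound on the Pearson correlation from the first k DFT coefficients.

    The partial squared distance over k coefficients never exceeds the full
    squared distance, so 1 - partial/2 never falls below the true correlation.
    """
    cx = dft_coefficients(normalize(x), k)
    cy = dft_coefficients(normalize(y), k)
    partial_dist_sq = sum(abs(a - b) ** 2 for a, b in zip(cx, cy))
    return 1 - partial_dist_sq / 2
```

If the bound for a pair is already below the correlation threshold, the pair cannot qualify and the exact computation can be skipped. With k equal to the window length, the bound coincides with the exact correlation up to floating-point error.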

An alternative method, based on research by doctoral student Carl Bosley and undergraduate Jiexun Xu, predicts the results of a single time series based on that time series and possibly others. We call this method FPS (fast prediction via sparsity).

To use our software, you specify the size of the sliding window, how frequently the correlations should be reported, and the correlation threshold, in addition to information about the data set. At every reporting timepoint, the system reports back the stream pairs whose absolute correlation is greater than or equal to the threshold. For a pictorial explanation, see this brief tutorial.
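The reporting semantics described above can be sketched with a naive reference computation. This is only an illustration of what gets reported, not of how STATStream computes it: it recomputes every pair from scratch, so it is far from near-constant time. The function and parameter names are hypothetical, and the sketch assumes that the reporting interval is measured in timepoints and that the first report happens once a full window is available.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length windows, or None if undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a constant window has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def report_correlated_pairs(streams, window_size, report_interval, threshold):
    """Yield (timepoint, hits) at each reporting timepoint.

    streams maps a stream name to its list of values. hits lists the
    (name_a, name_b, correlation) triples whose absolute correlation over the
    most recent window_size values is at least threshold.
    """
    length = min(len(values) for values in streams.values())
    names = sorted(streams)
    for end in range(window_size, length + 1, report_interval):
        hits = []
        for a, b in combinations(names, 2):
            r = pearson(streams[a][end - window_size:end],
                        streams[b][end - window_size:end])
            if r is not None and abs(r) >= threshold:
                hits.append((a, b, r))
        yield end, hits


# Hypothetical usage: three short streams, window of 4, report every 2 timepoints.
example_streams = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],
    "c": [1, -1, 1, -1, 1, -1],
}
for timepoint, hits in report_correlated_pairs(example_streams, 4, 2, 0.9):
    print(timepoint, [(a, b) for a, b, _ in hits])
```

Because the test is on the absolute value, strongly negatively correlated pairs are reported as well as strongly positively correlated ones.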

This web page describes the installation, use, and semantics of the software based on these algorithms, which you may use for research purposes. The theory behind our algorithms is presented in the book High Performance Discovery in Time Series: Techniques and Case Studies by Shasha and Zhu, Springer-Verlag, Monographs in Computer Science, June 2004, ISBN 0387008578, 270 pages.

STATStream is based upon work supported by the U.S. National Science Foundation under grants IIS-0414763, DBI-0445666, N2010 IOB-0519985, N2010 DBI-0519984, DBI-0421604, and MCB-0209754. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This support is greatly appreciated.

Maintained by

Last Updated Nov. 29, 2005