**STATStream:
a Toolkit for High Speed Statistical Time Series Analysis**


Data streams (sequences of data arriving in time order) are important in many applications
such as telecommunications, network monitoring, environmental monitoring, and
financial analysis. It is difficult to process these data in set-oriented
data management systems. Our system's goal is to compute, in near constant
time, the statistics for multi-stream analysis problems. The core function
we compute is Pearson correlation over sliding windows. We
do this using Fourier transforms and random-vector-based
sketches, with the choice depending primarily
on how "cooperative" the data is.
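To make the core quantity concrete, here is a minimal Python sketch of the exact Pearson correlation of one pair of streams over a sliding window, maintained with running sums so that each new point costs constant time. It is only an illustration of what is being computed: it does not use the Fourier-transform or sketch techniques that the toolkit relies on, and the class name `SlidingPearson` and its methods are hypothetical, not part of the software.

```python
from collections import deque
from math import sqrt


class SlidingPearson:
    """Exact Pearson correlation of two streams over the last `window` points.

    Hypothetical illustration only; not the toolkit's API or algorithm.
    """

    def __init__(self, window):
        self.window = window
        self.pairs = deque()  # the (x, y) points currently inside the window
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        # Add the newest point to the running sums.
        self.pairs.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        # Drop the oldest point once the window is over-full.
        if len(self.pairs) > self.window:
            ox, oy = self.pairs.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self):
        # Return None until a full window has been seen, or when either
        # stream is constant over the window (correlation is undefined).
        n = len(self.pairs)
        if n < self.window:
            return None
        cov = self.sxy - self.sx * self.sy / n
        var_x = self.sxx - self.sx * self.sx / n
        var_y = self.syy - self.sy * self.sy / n
        if var_x <= 0 or var_y <= 0:
            return None
        return cov / sqrt(var_x * var_y)


sp = SlidingPearson(window=4)
for x in range(1, 7):
    sp.update(x, 2 * x)  # the second stream is a linear function of the first
print(sp.correlation())
```

Running sums are simple but can accumulate floating-point error over long streams, so a production system would need to recompute them periodically or use a more stable update.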

An alternative method based on research by doctoral student Carl Bosley and undergraduate Jiexun Xu predicts the results of a single time series based on that time series and possibly others. We call that FPS (fast prediction via sparsity).

To use our software, you will specify the size of the sliding window, how frequently the correlations should be reported, and the correlation threshold, in addition to information about the data set. At every reporting timepoint, our system will report back to you the stream pairs whose correlation has an absolute value greater than or equal to the threshold. For a pictorial explanation, see this brief tutorial.
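The reporting behavior described above can be sketched in a few lines of Python. This is a brute-force illustration of the semantics only, under the assumptions that a reporting timepoint is any time `t` that is a multiple of the reporting interval and that each report covers the most recent `window` points. The function names and the parameters `window`, `report_every`, and `threshold` are hypothetical and are not the toolkit's actual interface.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Direct Pearson correlation; None if either sequence is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def report_correlated_pairs(streams, window, report_every, threshold):
    """Return (t, name_a, name_b, r) for every reporting timepoint t and every
    stream pair with |r| >= threshold over the last `window` points.

    `streams` maps a stream name to a list of values (hypothetical layout).
    """
    length = min(len(values) for values in streams.values())
    reports = []
    for t in range(window, length + 1):
        if t % report_every != 0:
            continue  # assumed: report only at multiples of the interval
        for a, b in combinations(sorted(streams), 2):
            r = pearson(streams[a][t - window:t], streams[b][t - window:t])
            if r is not None and abs(r) >= threshold:
                reports.append((t, a, b, r))
    return reports


streams = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],   # moves with "a"
    "c": [1, -1, 1, -1, 1, -1],  # weakly related to the others
    "d": [6, 5, 4, 3, 2, 1],     # moves against "a" and "b"
}
for report in report_correlated_pairs(streams, window=4, report_every=2, threshold=0.9):
    print(report)
```

Because the test is on the absolute value, strongly negatively correlated pairs such as `a` and `d` are reported alongside positively correlated ones. This version recomputes every pair from scratch at each reporting timepoint, which is exactly the cost the toolkit's algorithms are designed to avoid.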

This web page
describes the installation, use, and semantics of the software based on these
algorithms, which you may use for research purposes.
You can find the underlying theory behind our algorithms in the
book *High Performance Discovery in Time Series: Techniques and Case Studies*
by Shasha and Zhu,
Springer Verlag Publishers, Monographs in Computer Science, June 2004,
ISBN 0387008578, 270 pages.

There have been several other closely related publications:

- Xiaojian Zhao's 2006 dissertation High Performance Algorithms for Multiple Streaming Time Series
- Zhihua Wang's 2006 dissertation Time Series Matching: a Multi-filter Approach
- Tyler Neylon's 2006 dissertation Sparse Solutions for Linear Prediction Problems
- Xin Zhang's 2006 dissertation Fast Algorithms for Burst Detection

Other relevant publications include

- "Better Burst Detection," IEEE International Conference on Data Engineering, April 2006, p. 146ff. Xin Zhang and Dennis Shasha.
- "Fast Window Correlations Over Uncooperative Time Series," Richard Cole, Dennis Shasha, and Xiaojian Zhao, ACM Knowledge and Data Discovery 2005, pp. 743-749.
- "Incremental Methods for Simple Problems in Time Series: Algorithms and Experiments," Xiaojian Zhao, Xin Zhang, Tyler Neylon, Dennis Shasha, International Database Engineering and Applications Symposium 2005, pp. 3-14.
- "Searching dynamic point sets in spaces with bounded doubling dimension," STOC 2006, pp. 574-583. Richard Cole, Lee-Ad Gottlieb. You can find the paper here
- "Improved algorithms for fully dynamic geometric spanners and geometric routing," Lee-Ad Gottlieb, Liam Roditty, SODA 2008. You can find the paper here

Less relevant papers that we like are:

- "An optimal dynamic spanner for doubling metric spaces," Lee-Ad Gottlieb, Liam Roditty. To appear in ESA 2008. You can find the paper here
- "Matrix sparsification and the sparse null space problem," Lee-Ad Gottlieb, Tyler Neylon. Submitted. You can find the paper here

Maintained by shasha@cs.nyu.edu

Last Updated Nov. 29, 2005