STATStream: a Toolkit for High Speed Statistical Time Series Analysis

Motivation:

Data streams (sequences of data arriving in time order) are important in many applications, such as telecommunications, network monitoring, environmental monitoring and financial analysis. Such data are difficult to process in set-oriented data management systems. Our system's goal is to compute, in near constant time per update, the statistics needed for multi-stream analysis problems. The core function we compute is Pearson correlation over sliding windows.

To use our software, you specify the size of the sliding window, how frequently the correlations should be reported, and the correlation threshold, in addition to information about the data set. At every reporting timepoint, the system reports the stream pairs whose correlation has an absolute value greater than or equal to the threshold. For a pictorial explanation see this brief tutorial.
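The reporting semantics described above can be illustrated with a minimal brute-force sketch in Python. This is not the STATStream algorithm and makes no attempt at near-constant-time updates; it simply recomputes the Pearson correlation for every pair at each reporting timepoint. The function names, the parameter names (`window`, `report_every`, `threshold`), and the choice to place the first report at the moment the first full window is available are all assumptions made for illustration, not part of the software's interface.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant window has no defined correlation
    return cov / sqrt(var_x * var_y)


def report_correlated_pairs(streams, window, report_every, threshold):
    """Yield (timepoint, name_a, name_b, r) for pairs with |r| >= threshold.

    streams: dict mapping a stream name to a list of values (equal lengths).
    Assumption: the first report happens once `window` values have arrived,
    and later reports follow every `report_every` timepoints.
    """
    length = len(next(iter(streams.values())))
    for t in range(window, length + 1):
        if (t - window) % report_every != 0:
            continue  # not a reporting timepoint
        for a, b in combinations(sorted(streams), 2):
            # Correlate the most recent `window` values of both streams.
            r = pearson(streams[a][t - window:t], streams[b][t - window:t])
            if r is not None and abs(r) >= threshold:
                yield t, a, b, r
```

Because the threshold is applied to the absolute value, strongly anti-correlated pairs are reported alongside strongly correlated ones. The cost of this sketch grows with the window size and with the square of the number of streams, which is the overhead a high-speed toolkit is meant to avoid.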

This web page describes the installation, use, and semantics of the software based on these algorithms, which you may use for research purposes.

STATStream is sponsored by the US National Science Foundation under grants NSF IIS-9988345, N2010-0115586, MCB-0209754 and CCR-0105678. This support is greatly appreciated.

Maintained by shasha@cs.nyu.edu

Last Updated Nov. 29, 2005