The algorithms have two components. The first is parameter space
optimization; the second is the streaming correlation algorithm, which uses the
parameters computed in the first step. Both require the libraries base.k and sketch.k.

1. Parameter space optimization
The principle is to search the parameter space at a coarse scale first, and
then to perform a finer local search around the good coarse parameter groups
found in the first step. This is justified by the continuity of recall and
precision in the parameter space.
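
The coarse-then-fine idea can be sketched as follows. This is a minimal
illustration in Python, not the K code in \paraTest: the function names, the
two-parameter box, and the toy score function are all assumptions made for the
example. In the real system the score would come from the measured recall and
precision.

```python
import itertools

def grid(lo, hi, steps):
    """Return `steps` evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]

def coarse_to_fine(score, bounds, coarse_steps=5, fine_steps=5, keep=3):
    """Two-stage grid search: coarse sweep, then a finer sweep near the best points."""
    # Stage 1: coarse sweep over the whole parameter box.
    axes = [grid(lo, hi, coarse_steps) for lo, hi in bounds]
    coarse = sorted(itertools.product(*axes), key=score, reverse=True)[:keep]

    # Stage 2: finer sweep within one coarse cell around each kept point.
    best = coarse[0]
    for point in coarse:
        local_axes = []
        for x, (lo, hi) in zip(point, bounds):
            half = (hi - lo) / (coarse_steps - 1)
            local_axes.append(grid(max(lo, x - half), min(hi, x + half), fine_steps))
        candidate = max(itertools.product(*local_axes), key=score)
        if score(candidate) > score(best):
            best = candidate
    return best

def toy_score(params):
    """Hypothetical smooth score with its peak at c=0.33, f=0.71."""
    c, f = params
    return -((c - 0.33) ** 2 + (f - 0.71) ** 2)

best = coarse_to_fine(toy_score, [(0.0, 1.0), (0.0, 1.0)])
print(best)
```

The fine stage only helps if the score varies smoothly, which is the
continuity assumption stated above.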

Four parameters are to be computed. 
c: distance factor
f: parameter group fraction
N: sketch size
g: group size (written R in the directory names and steps below)

The files are in the directory \paraTest.
(1) testgrid.k and grid_para.k
Objective: Compute the parameters at a coarse scale
Input: Test data
Output: postSets.l and preSets.l (see the comments in the file for an explanation)
(2) benchmark.k
Objective: Compute the correlation by naive pairwise comparison
Input: Test data
Output: benchmarkDist.l
(3) rec_prec.k
Objective: Compute the recall and precision
Input: postSets.l, preSets.l and benchmarkDist.l
Output: recall_precision.l
(4) local/R2/testGridlocal.k and local/R2/grid_para.k
Objective: Pick out the good coarse parameter groups from recall_precision.l, around which
           the finer parameter groups are used in the experiments. It returns both the
           false and true positives (preLocalSets) and the true positives alone (postLocalSets).
           These are used by rec_prec.k to compute recall and precision.
Input: Test data, recall_precision.l
Output: preLocalSets.l, postLocalSets.l and goodpara.l
Note: Because of the complexity of combining tests for different group sizes (R=2,3,4) in one
      file, they are split into R2, R3 and R4. The only difference is the parameter setting R=2
      in grid_para.k, so only R2 is explained here.
(5) local/R2/rec_prec.k
Objective: Compute the recall and precision of the finer parameter groups
Input: preLocalSets.l, postLocalSets.l, goodpara.l and benchmarkDist.l
Output: recall_precision_local.l
(6) stable/R2/putalltogether.k
Objective: Test the stability of the parameters by bootstrapping
Input: benchmarkDist.l, recall_precision_local.l and test data
Output: recall_precision.l
(7) stable/R2/grid_para.k
Similar to local/R2/grid_para.k
(8) stable/std.k
Objective: Compute the mean and standard deviation (std) of the recall and precision of each
           parameter group, and pick out the good groups for use (see the proposal or paper
           for details).
Input: recall_precision.l from the subdirectory
Output: Mean and std of the recall and precision for each parameter group appearing in
        recall_precision.l
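
To make the roles of benchmark.k and rec_prec.k concrete, here is a small
Python sketch of the same idea. It is an illustration under assumptions, not a
port of the K files: the function names are hypothetical, Pearson correlation
with a fixed threshold stands in for whatever measure benchmark.k writes to
benchmarkDist.l, and the four short rows are toy data. The candidate set plays
the role of the true plus false positives, and the naive pairwise result is
the ground truth.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def naive_correlated_pairs(rows, threshold):
    """Ground truth: compare every pair of rows directly."""
    return {(i, j) for i, j in combinations(range(len(rows)), 2)
            if pearson(rows[i], rows[j]) >= threshold}

def recall_precision(candidates, truth):
    """Recall and precision of a candidate pair set against the ground truth."""
    true_positives = len(candidates & truth)
    recall = true_positives / len(truth) if truth else 1.0
    precision = true_positives / len(candidates) if candidates else 1.0
    return recall, precision

rows = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4]]
truth = naive_correlated_pairs(rows, threshold=0.9)
candidates = {(0, 1), (0, 3)}   # toy output of an approximate method
recall, precision = recall_precision(candidates, truth)
print(truth, recall, precision)
```

A recall of 1.0 with low precision means the approximate method misses nothing
but reports extra pairs that must be filtered by an exact check afterwards.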

2. Streaming correlation algorithm
The files are streamingbw.k and grid.k.
Please see the comments in the files for details on usage.
streamingbw.k takes in the data streams and computes their sketches, which are
placed into the grid structure (grid.k or gridhash.k) to compute the
correlation.
Note: gridHash.k is the hash version of grid.k. It uses a hash structure
to save space, which, however, may result in conflicts (collisions).
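
The following Python sketch illustrates the general sketch-and-grid idea and
why a hashed grid can produce conflicts. It is not the logic of streamingbw.k,
grid.k or gridHash.k: the function names, the fixed +1/-1 projection vectors
(which would normally be random), the cell width, and the rule "share a cell in
at least a fraction of the groups" are all assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import combinations

def sketch(series, vectors):
    """Project a series onto each +1/-1 vector (one inner product per vector)."""
    return [sum(x * r for x, r in zip(series, vec)) for vec in vectors]

def grid_candidates(sketches, group_size, cell_width, fraction, table_size=None):
    """Return stream pairs that share a grid cell in enough sketch groups.

    With table_size set, cells are stored under hash(cell) % table_size,
    which saves space but lets unrelated cells collide.
    """
    n_sketch = len(next(iter(sketches.values())))
    n_groups = math.ceil(n_sketch / group_size)
    shared = defaultdict(int)
    for start in range(0, n_sketch, group_size):
        cells = defaultdict(list)
        for stream_id, sk in sketches.items():
            cell = tuple(math.floor(v / cell_width)
                         for v in sk[start:start + group_size])
            key = hash(cell) % table_size if table_size else cell
            cells[key].append(stream_id)
        for members in cells.values():
            for pair in combinations(sorted(members), 2):
                shared[pair] += 1
    needed = max(1, math.ceil(fraction * n_groups))
    return {pair for pair, count in shared.items() if count >= needed}

# Fixed vectors keep the example reproducible.
vectors = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
streams = {"a": [1, 2, 3, 4], "b": [1, 2, 3, 4.5], "c": [100, 200, 400, 800]}
sketches = {name: sketch(s, vectors) for name, s in streams.items()}

exact = grid_candidates(sketches, group_size=2, cell_width=5, fraction=1.0)
hashed = grid_candidates(sketches, group_size=2, cell_width=5, fraction=1.0,
                         table_size=1)
print(exact, hashed)
```

With the exact grid only the two similar streams are paired. With a
deliberately tiny hash table every cell collides, so the unrelated stream is
reported as well; such false candidates cost extra verification work.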

3. Library
base.k and sketch.k contain utilities such as dft, mean, etc. Users
are encouraged to read the comments in the files.

4. Final comments
Before using the algorithms, please pay attention to the library links and data generation.
(1) Users need to create a symbolic link to base.k and sketch.k in each directory.
(2) The data format is a 2-dimensional matrix. Each row contains one time series. You may
    generate it in K or load real data. See the comments for details.
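
For readers who want to prepare test data outside K, the matrix layout can be
sketched in Python as below. The function is a hypothetical analogue of a
random-walk generator, not a port of any K routine: the Gaussian step
distribution, the seed argument, and the JSON file used to keep the data for
later runs are all assumptions.

```python
import json
import os
import random
import tempfile

def gen_random_walk_matrix(n_stream, size, seed=None):
    """Return n_stream rows, each a random walk of length `size`."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_stream):
        value, row = 0.0, []
        for _ in range(size):
            value += rng.gauss(0.0, 1.0)   # assumed step distribution
            row.append(value)
        matrix.append(row)
    return matrix

ts = gen_random_walk_matrix(n_stream=3, size=5, seed=42)

# Save the matrix so that later steps can run on exactly the same data.
path = os.path.join(tempfile.mkdtemp(), "testdata.json")
with open(path, "w") as handle:
    json.dump(ts, handle)
with open(path) as handle:
    reloaded = json.load(handle)
print(len(reloaded), len(reloaded[0]))
```

Saving the generated matrix matters because the approximate run and the naive
benchmark must see the same streams for recall and precision to be meaningful.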

5. An example
Here an example is given to demonstrate how the system works.

(1) Go to \paraTest\testgrid.
(2) Our goal is to test and choose a few good parameter groups. If users know which parameter
groups they want to test, they should put them in the same directory. Otherwise the system
automatically samples the parameter space uniformly. We assume here that system-generated
parameter groups are used.
(3) Users can either load real data or simulate by generating data with
ts:genRandomWalkMatrix[n_stream;size]. The latter can be done by removing the comment mark before
the function in testgrid.k. If users choose the simulation, please save the generated data for
later use in benchmark.k.
(4) Run testgrid.k to output preSets.l and postSets.l.
(5) Run benchmark.k on the random data generated in step (3) to output benchmarkDist.l.
(6) Run rec_prec.k to output recall_precision.l.
(7) Copy recall_precision.l to \local\R2.
(8) The goal of this directory is to screen out the bad parameter groups and perform a finer
local search around the good parameter groups.
(9) testgridlocal.k is quite similar to testgrid.k in step (4). If users chose to load real
data in step (3), put the same data set in this directory. Otherwise generate random
data or reuse the data generated above.
(10) To rule out the bad parameters, users can specify the thresholds for recall and precision
in testgridlocal.k. The defaults are recall=0.99 and precision=0.01.
(11) Run testgridlocal.k to output preLocalSets.l, postLocalSets.l and goodpara.l.
(12) Run rec_prec.k to output recall_precision_local.l.
(13) Copy recall_precision_local.l to \paraTest\stable\R2 and enter this directory.
(14) The goal of this directory is to test the stability of the finer parameter groups computed
in the previous steps.
(15) Again we need to remove the apparently bad parameters based on their recall and
precision. This is done in putalltogether.k. Users are again required to specify the recall
and precision thresholds. The default values are recall=0.99 and precision=0.04.
(16) The data are obtained in the same way as in steps (3) and (9).
(17) Run putalltogether.k to output recall_precision.l, which contains a set of recall and
precision pairs for each parameter group.
(18) Copy recall_precision.l to the parent directory.
(19) Run std.k to output the mean and std of each parameter group.
(20) Pick out the parameter group with low std as well as high average recall and precision.
(21) We are now in a position to run the main system by putting the selected parameters f, c and
R (2, 3 or 4) in \sketch\grid.k. Before running the program, set the correlation threshold in
\sketch\streamingbw.k.
(22) Run \sketch\streamingbw.k. The highly correlated pairs are output to the screen. We are done.
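
The final selection, low std together with high average recall and precision,
can be sketched in Python as follows. This is only an illustration of the
selection rule, not the code in std.k: the function names, the cut-off values,
the (c, f, N, R) tuple keys and the recall/precision pairs are made-up toy
values.

```python
from statistics import mean, stdev

def summarize(runs):
    """Mean and sample std of recall and precision over repeated runs."""
    recalls = [r for r, _ in runs]
    precisions = [p for _, p in runs]
    return {
        "recall_mean": mean(recalls), "recall_std": stdev(recalls),
        "precision_mean": mean(precisions), "precision_std": stdev(precisions),
    }

def pick_stable(results, min_recall=0.99, max_std=0.02):
    """Keep stable, high-recall groups; order them by mean precision."""
    stats = {group: summarize(runs) for group, runs in results.items()}
    stable = [group for group, s in stats.items()
              if s["recall_mean"] >= min_recall
              and s["recall_std"] <= max_std
              and s["precision_std"] <= max_std]
    return sorted(stable, key=lambda g: stats[g]["precision_mean"], reverse=True)

# Toy bootstrap results: parameter group -> list of (recall, precision) pairs.
results = {
    (0.5, 0.8, 30, 2): [(1.0, 0.10), (0.99, 0.11), (1.0, 0.09)],
    (0.7, 0.6, 30, 2): [(1.0, 0.30), (0.90, 0.05), (1.0, 0.25)],
    (0.4, 0.9, 30, 2): [(1.0, 0.06), (1.0, 0.05), (1.0, 0.07)],
}
chosen = pick_stable(results)
print(chosen)
```

The second group is dropped even though it sometimes reaches the highest
precision, because its recall and precision vary too much between runs.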




