A brief explanation of our model

We demonstrate how our model works with the following simple example. Note that this is only a demonstration of the basic framework of our model; the actual implementation in our software package does not strictly follow it.
Our model finds the x that minimizes |Ax-b|_2^2+mu*|x|_1 and then uses this x for prediction. A and b are determined by the data and the size of the sliding window. If the solver is not 'path-finding', mu is either input by the user or trained by train_model.m. If the solver is 'path-finding', mu is ignored.
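
To make the objective concrete, here is a minimal Python sketch that evaluates |Ax-b|_2^2+mu*|x|_1 for a given x. The function name lasso_objective and the list-of-rows layout for A are assumptions made for illustration; they are not part of the software package.

```python
def lasso_objective(A, b, x, mu):
    """Return |Ax - b|_2^2 + mu * |x|_1, with A given as a list of rows."""
    residual_sq = 0.0
    for row, target in zip(A, b):
        # One row of A times x gives the fitted value for that time step.
        fitted = sum(a_ij * x_j for a_ij, x_j in zip(row, x))
        residual_sq += (fitted - target) ** 2
    # The l1 penalty is what pushes entries of x to exactly zero.
    l1_penalty = sum(abs(x_j) for x_j in x)
    return residual_sq + mu * l1_penalty
```

A larger mu makes nonzero entries of x more expensive, so the minimizer becomes sparser.
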
Suppose we have 20 time steps of price differential data for 5 stocks. We are interested in predicting the differential price (which is close to predicting the return) of stock 1, starting at time step 21. We use a sliding window of size 10. That is, we use the values of the time series from time step i-10 to i to predict the price difference of stock 1 at time step i+1. We always use mu=0.03, and we use the solver 'l1_ls', which solves the minimization problem above. We do not remove the target series (i.e., stock 1) from the predictors, because we assume that stock 1 also depends linearly on a shifted copy of itself.

Now we initialize our model. We first make a copy of stock 1 (the stock we want to predict). Then we shift this copy backwards by one time step, discarding its first value. From now on, we call this shifted copy the target series, or stock 0. The value of stock 0 at time step i is therefore the value of stock 1 at time step i+1. In particular, the value of stock 0 at the last time step (time step 20) is the value of stock 1 at time step 21, which is of course unknown. But we can use the solver to predict the value of stock 0 at time step 20, and this prediction is also the predicted value of stock 1 at time step 21.
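
The shift described above can be sketched in a few lines of Python. The helper name make_target and the use of None for the unknown last value are illustrative assumptions, not the package's own conventions.

```python
def make_target(series):
    """Build stock 0 from stock 1: target[t] = series[t + 1].

    The first value of the copy is discarded. The last entry is not
    known yet, so it is stored as None until the next observation arrives.
    """
    return series[1:] + [None]
```

For a stock 1 series of length 20, the result also has length 20, and its last entry is the value we want to predict.
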

Next, we feed the matrix A, the vector b, and mu to the solver. We take the values of Stock 1 through Stock 5 from time step 20-10=10 to time step 20-1=19. This 10-by-5 matrix is our A. Then we take the values of Stock 0 over the same time steps, 10 to 19. This vector is b. The value of mu is fixed in our example (although in practice mu should be retrained frequently). We feed these parameters to the "Black Box" and get back a vector x. Notice that the number of time steps (rows) in A equals the sliding window size, 10.
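
The following Python sketch shows one way to assemble A and b for this step and to obtain x. It does not call 'l1_ls'; as a stand-in for the "Black Box" it uses plain cyclic coordinate descent with soft-thresholding, which minimizes the same objective |Ax-b|_2^2+mu*|x|_1. The names build_window, soft_threshold and solve_l1_sketch, the 1-based last_step argument, and the fixed number of sweeps are all assumptions made for illustration.

```python
def build_window(stocks, target, last_step, window):
    """Return (A, b) for time steps last_step-window .. last_step-1 (1-based).

    stocks is a list of series (Stock 1..5); target is Stock 0.
    """
    steps = range(last_step - window, last_step)
    A = [[series[t - 1] for series in stocks] for t in steps]
    b = [target[t - 1] for t in steps]
    return A, b


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def solve_l1_sketch(A, b, mu, sweeps=200):
    """Stand-in solver: coordinate descent on |Ax-b|_2^2 + mu*|x|_1."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = list(b)  # equals b - Ax while x is all zeros
    for _ in range(sweeps):
        for j in range(n_cols):
            col = [A[i][j] for i in range(n_rows)]
            z = sum(a * a for a in col)
            if z == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlate column j with the residual that excludes x[j].
            rho = sum(a * (r + a * x[j]) for a, r in zip(col, residual))
            # Threshold is mu/2 because the squared error has no 1/2 factor.
            new_xj = soft_threshold(rho, mu / 2.0) / z
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - a * delta for a, r in zip(col, residual)]
                x[j] = new_xj
    return x
```

With the numbers from the example, build_window(stocks, stock0, last_step=20, window=10) returns a 10-by-5 matrix A and a vector b of length 10, and solve_l1_sketch(A, b, 0.03) returns a vector x with one coefficient per stock.
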

In our model, we assume that there exists a sparse linear correlation between Stock 0 and Stock 1 through Stock 5. The previous step gives us the vector x that describes such a sparse linear correlation, so in this step we use x to predict the value of Stock 0 at time step 20. This is simple: treat the values of Stock 1 through Stock 5 at time step 20 as a vector. The dot product of this vector with x is the predicted value of Stock 0 at time step 20. This value is also the predicted value of Stock 1 at time step 21.
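
The prediction step is a single dot product. A minimal Python sketch, where the name predict_next and the sample numbers are made up for illustration:

```python
def predict_next(x, latest_values):
    """Dot product of x with the Stock 1..5 values at the latest time step."""
    return sum(x_j * v_j for x_j, v_j in zip(x, latest_values))


# Hypothetical sparse x: only Stock 1 and Stock 5 have nonzero weight.
x = [0.5, 0.0, 0.0, 0.0, -0.25]
values_at_step_20 = [0.4, 0.1, -0.3, 0.2, 0.4]
prediction = predict_next(x, values_at_step_20)
```

Because x is sparse, only the stocks with nonzero coefficients contribute to the prediction.
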

Finally, after we receive the values of the stocks at time step 21, we update our model. We discard the values of Stock 0 through Stock 5 at time step 9+1=10. We append the new values to the end of Stock 1 through Stock 5. We also make a copy of the new value of Stock 1 and append it to the end of Stock 0, where it becomes the value of Stock 0 at time step 20. The next time we predict, the goal is the value of Stock 1 at time step 22 (that is, the value of Stock 0 at time step 21). We take the values of Stock 1 through Stock 5 from time step 11 to time step 20 as A, the values of Stock 0 from time step 11 to time step 20 as b, and the mu input by the user, and repeat the above procedure.
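
The update can be sketched in Python with fixed-length queues. This layout is an assumption made for illustration, not the package's data structure: feature_rows holds one row per time step for Stock 1 through Stock 5 (window + 1 rows, the last one being the latest step used for prediction), and targets holds the window known values of Stock 0.

```python
from collections import deque


def slide_window(feature_rows, targets, new_row, target_index=0):
    """Advance the model by one time step, in place.

    feature_rows: deque(maxlen=window + 1) of rows for Stock 1..5.
    targets: deque(maxlen=window) of Stock 0 values, aligned with all
             rows of feature_rows except the last one.
    new_row: newly observed values of Stock 1..5.
    """
    # A full deque drops its oldest entry automatically on append.
    feature_rows.append(list(new_row))
    # Stock 0 gains a copy of the new Stock 1 value.
    targets.append(new_row[target_index])
```

After the call, all rows of feature_rows except the last form the new A, targets is the new b, and the last row of feature_rows is the vector used for the next prediction.
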


Maintained by jjx203@cs.nyu.edu

Last Updated Dec 17th, 2009