Appendix: Informal Review of Statistical Concepts
-
Recall that the goal of probability theory is to determine the likelihood
of an event under a given probability distribution (e.g. how likely
is it to get 5,300 heads in 10,000 flips of a fair coin?).
The goal of statistics is to determine a probability distribution
from a series of observations, or at least
to disprove a null hypothesis (e.g. is a fair coin a reasonable
model if I get 8,000 heads in 10,000 flips?).
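The coin question can be answered by direct counting. The sketch below is one way to do it, assuming "get 5,300 heads" means "at least 5,300 heads"; the function name fair_coin_tail is made up for illustration.

```python
from math import comb

def fair_coin_tail(flips, heads):
    """Exact P(at least `heads` heads in `flips` flips of a fair coin)."""
    # Start with the number of sequences that have exactly `heads` heads.
    count = comb(flips, heads)
    favorable = 0
    for k in range(heads, flips + 1):
        favorable += count
        # C(n, k+1) = C(n, k) * (n - k) / (k + 1); the division is exact.
        count = count * (flips - k) // (k + 1)
    # Each of the 2**flips sequences is equally likely for a fair coin.
    return favorable / 2 ** flips

if __name__ == "__main__":
    print(fair_coin_tail(10_000, 5_300))
```

The answer for 5,300 heads is tiny, and the answer for 8,000 heads is too small to represent as a floating point number, which is the informal reason to reject the fair-coin model in the second example.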
-
In parametric statistics, one knows the form of the target probability
distribution but not the value of certain parameters, e.g.
coin flips are binomial but the probability of a head may be unknown.
In non-parametric statistics, one does not know the form
of the target probability distribution.
In finance, most models are parametric (autoregression, option pricing).
When models aren't, people use queries and eyeballs to figure
out what to do.
-
Stationary process: one whose statistics (mean and variance)
do not vary with time.
Stationarity is a fundamental assumption of pairs trading
and options pricing.
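A crude way to see whether a series looks stationary is to compare the mean and variance over successive windows. This is only an eyeball check under that assumption, not a formal test, and window_stats is a hypothetical helper name.

```python
def window_stats(series, window):
    """Return (mean, variance) for each consecutive non-overlapping window."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        mean = sum(chunk) / window
        # Population variance of the values inside this window.
        variance = sum((x - mean) ** 2 for x in chunk) / window
        stats.append((mean, variance))
    return stats

if __name__ == "__main__":
    # Made-up data: an oscillating series and a trending one.
    print(window_stats([1, 2, 1, 2, 1, 2, 1, 2], 4))
    print(window_stats([1, 2, 3, 4, 5, 6, 7, 8], 4))
```

If the per-window means or variances drift, as they do for the trending series, the stationarity assumption is suspect.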
-
Correlation: a measure of the association between two series,
e.g. the option open interest and the price of a security
5 days later.
If cov(x,y) represents the covariance between x and y
and sigma(x) is the standard deviation of x, then
correlation(x,y) = cov(x,y)/(sigma(x)*sigma(y))
so it is symmetric in x and y and always lies between -1 and 1.
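The formula translates almost word for word into code. A minimal sketch, assuming two equal-length lists of numbers and using the population (divide by n) form of the covariance:

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Population covariance of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """cov(x,y) / (sigma(x) * sigma(y)); note that sigma(x)**2 == cov(x,x)."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

if __name__ == "__main__":
    print(correlation([1, 2, 3, 4], [2, 4, 5, 9]))
```

Whether one divides by n or by n-1 does not matter here, because the factor cancels between numerator and denominator.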
-
Partial correlation:
suppose you are looking at the one-day returns of Merck and Pfizer
(two drug companies). You can look at them as raw data, or you
can subtract out the market influence via a least squares estimate
and use the correlation of the residuals.
That correlation of residuals is the partial correlation.
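The residual recipe can be sketched as follows. The function names are hypothetical and the three return series in the usage lines are made-up numbers, not real Merck, Pfizer or market data.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

def residuals(ys, xs):
    """What is left of ys after a least squares fit ys ~ a + b*xs."""
    b = covariance(ys, xs) / covariance(xs, xs)
    a = mean(ys) - b * mean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def partial_correlation(xs, ys, market):
    """Correlation of xs and ys after removing the market's influence."""
    return correlation(residuals(xs, market), residuals(ys, market))

if __name__ == "__main__":
    market = [0.010, -0.005, 0.007, 0.002, -0.012, 0.004]  # made-up returns
    stock_a = [0.012, -0.004, 0.009, 0.001, -0.015, 0.006]
    stock_b = [0.008, -0.007, 0.006, 0.004, -0.010, 0.002]
    print(correlation(stock_a, stock_b))
    print(partial_correlation(stock_a, stock_b, market))
```

Two stocks can look highly correlated in the raw data simply because both follow the market; the partial correlation shows what remains once that common factor is taken out.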
-
Volatility: a measure of the standard deviation
of the value of a variable over a specific time,
e.g. the annualized standard deviation of the returns.
The return at time t is ln(p(t)/p(t-1)), where p(t) is the price at time t.
Volatility is a critical parameter in options pricing, because it determines
the probability that a price will move outside a certain price range.
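As a sketch, annualized volatility can be computed from a list of prices as below. The figure of 252 periods per year assumes daily prices and a common trading-day convention; scaling by its square root additionally assumes the returns are independent from one period to the next. Neither assumption comes from the text above.

```python
from math import log, sqrt

def log_returns(prices):
    """ln(p(t)/p(t-1)) for each consecutive pair of prices."""
    return [log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def annualized_volatility(prices, periods_per_year=252):
    """Sample standard deviation of log returns, scaled to one year."""
    returns = log_returns(prices)
    avg = sum(returns) / len(returns)
    variance = sum((r - avg) ** 2 for r in returns) / (len(returns) - 1)
    # Under independence, variance grows linearly with the number of periods.
    return sqrt(variance) * sqrt(periods_per_year)

if __name__ == "__main__":
    print(annualized_volatility([100, 101, 99, 102, 103]))  # made-up prices
```

A price that grows by the same percentage every period has zero volatility by this measure, however steep the growth.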
-
Alpha, Beta, and Regression: suppose we estimate the relationship
between the percentage change in price of some stock S vs.
the percentage change in some market index M using a best fit
(least squares)
linear relationship:
s = a + bm
Then the parameter alpha (a) is the change in S independent of M
and beta (b) is the slope of the best fit line.
A riskless investment has a positive alpha and a zero beta,
but most investments have a zero alpha and a positive beta.
If beta is greater than 1, then for a given change in the market,
you can expect a greater change in S.
If beta is negative, then S moves in the opposite direction
from the market.
Note that beta differs from correlation (and can
be arbitrarily large or small) because it is not symmetric:
beta = cov(S,M)/(sigma(M)*sigma(M))
that is, the covariance divided by the variance of M alone.
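A minimal sketch of the least squares fit s = a + bm, assuming s and m are equal-length lists of percentage changes. The name alpha_beta is made up for illustration.

```python
def alpha_beta(s, m):
    """Least squares fit s = a + b*m; returns (alpha, beta)."""
    mean_s = sum(s) / len(s)
    mean_m = sum(m) / len(m)
    cov_sm = sum((x - mean_s) * (y - mean_m) for x, y in zip(s, m)) / len(s)
    var_m = sum((y - mean_m) ** 2 for y in m) / len(m)
    beta = cov_sm / var_m            # slope: cov(S,M) / sigma(M)**2
    alpha = mean_s - beta * mean_m   # intercept: the part independent of M
    return alpha, beta

if __name__ == "__main__":
    market = [0.01, -0.02, 0.03, 0.00]        # made-up index changes
    stock = [0.025, -0.02, 0.055, 0.01]       # made-up stock changes
    print(alpha_beta(stock, market))
```

Swapping the two arguments gives a different slope, which is the asymmetry that distinguishes beta from correlation.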
-
ANOVA: analysis of variance in cases when there is no missing data.
This is used to model situations in which several factors can play
a role and one wants to tease out a probabilistic model that describes
their interaction.
For example, product, location and customer income may be factors
that influence buying behavior.
ANOVA helps to figure out how to weight each one.
Important related techniques include
principal components analysis
and factor analysis.
In finance, one might use one of these to figure out what determines
the price movement of a stock (perhaps half general market movement,
one third interest rates, etc.).
In psychology, one can ask a person 100 questions and then
categorize the person according to a weighted sum of a few questions.
-
Autoregression: a statistical model which predicts future
values from one or more previous ones.
This generalizes trend forecasting as used to predict sales.
Financial traders use this sparingly since models that look
at the recent past often just follow a short term trend.
As one trader put it:
"they
follow a trend and are always a day late and many dollars short."
In general, regression of y on x is a determination
of how y depends on x.
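The simplest autoregression predicts each value from the one before it: x(t) = c + phi*x(t-1). The sketch below fits c and phi by least squares on the lagged pairs; it is one illustrative choice, and the names fit_ar1 and forecast_next are hypothetical.

```python
def fit_ar1(series):
    """Least squares fit of x(t) = c + phi * x(t-1); returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    phi = (sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
           / sum((p - mean_prev) ** 2 for p in prev))
    c = mean_curr - phi * mean_prev
    return c, phi

def forecast_next(series):
    """One-step-ahead prediction from the fitted model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]

if __name__ == "__main__":
    print(forecast_next([0, 1, 1.5, 1.75, 1.875]))
```

This is just the regression of the series on a lagged copy of itself, so it is a special case of regressing y on x.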
-
Maximum likelihood method: suppose you are given a training set
consisting of observations and the categories to which the observations
belong.
The maximum likelihood method selects the probability distribution
that best explains the training set.
For example, if you toss a coin 10,000 times and observe
that heads comes up 8,000 times, you assign the probability of heads
that maximizes the probability of this event.
That probability is 8,000/10,000 = 0.8, well above 1/2.
In finance, the maximum likelihood method is often used for forecasting
based on previously seen patterns.
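For the coin example the maximizing probability can be found by brute force. The sketch below scans a grid of candidate values of p and keeps the one with the highest log-likelihood; the grid search is an illustrative choice, since for a coin the answer is known in closed form as heads/flips.

```python
from math import log

def log_likelihood(p, heads, flips):
    """Log of p**heads * (1-p)**(flips-heads), dropping the constant factor."""
    return heads * log(p) + (flips - heads) * log(1 - p)

def mle_grid(heads, flips, steps=1000):
    """Candidate p in (0, 1) with the highest likelihood on a uniform grid."""
    candidates = [i / steps for i in range(1, steps)]
    return max(candidates, key=lambda p: log_likelihood(p, heads, flips))

if __name__ == "__main__":
    print(mle_grid(8_000, 10_000))
```

Working with the logarithm avoids the underflow that the raw likelihood would hit for 10,000 flips, and it does not change where the maximum lies.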
-
Regularization: a technique for smoothing a function
to give it nice mathematical properties such as differentiability.
Moving averages are an example of regularization.
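A simple (unweighted) moving average can be sketched as below, assuming a trailing window of fixed length; the output is shorter than the input by window - 1 values.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

A longer window gives a smoother result but reacts more slowly to changes in the underlying series.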
-
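Bootstrapping: (i) Divide the training set (set
of (observation, category) pairs) into
pieces. (ii) Infer the model from some pieces.
(iii) Test it
on the other pieces.
(In the statistics literature this procedure is usually called
hold-out validation or cross-validation; there, bootstrapping
refers to resampling the data with replacement.)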
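The three steps above can be sketched as follows. Everything here is illustrative: the helper names are made up, one piece is held out for testing, and the "model" simply predicts the most common category in the training pieces.

```python
import random
from collections import Counter

def split_into_pieces(pairs, pieces, seed=0):
    """Step (i): shuffle the pairs and deal them into `pieces` groups."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::pieces] for i in range(pieces)]

def fit_majority(training_pairs):
    """Step (ii), toy version: always predict the most common category."""
    counts = Counter(category for _, category in training_pairs)
    most_common = counts.most_common(1)[0][0]
    return lambda observation: most_common

def holdout_accuracy(pairs, fit, pieces=5, seed=0):
    """Step (iii): train on all pieces but the first, test on the first."""
    parts = split_into_pieces(pairs, pieces, seed)
    held_out = parts[0]
    training = [pair for part in parts[1:] for pair in part]
    model = fit(training)
    hits = sum(model(obs) == category for obs, category in held_out)
    return hits / len(held_out)

if __name__ == "__main__":
    data = [(i, "up" if i % 3 else "down") for i in range(30)]  # made-up pairs
    print(holdout_accuracy(data, fit_majority))
```

Any function that takes training pairs and returns a predictor can be passed in place of fit_majority.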