FOM: A problem in the foundations of statistics
JoeShipman@aol.com
Wed Apr 24 16:48:58 EDT 2002
If, in m=a+b trials, a successes and b failures have been observed, what is the chance that the next n=c+d trials will contain c successes and d failures?
The following assumption is reasonable:
There is an underlying binomial process with parameter p and the trials are independent instances of that process.
The simplest procedure is to assume that p=a/m and calculate the quantity
(n choose c)(p^c)((1-p)^d) =
n!(a^c)(b^d)
------------
c!d!(m^n)
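As a quick numerical sketch (in Python, with a hypothetical function name), here is the plug-in calculation and the certainty problem it creates:

```python
from math import comb

def plugin_prediction(a, b, c, d):
    """Probability of c successes and d failures in the next n = c + d
    trials, taking p = a/m at face value (the naive plug-in estimate)."""
    m, n = a + b, c + d
    p = a / m
    return comb(n, c) * p**c * (1 - p)**d

# With 4 successes in 4 trials, the plug-in method is *certain*
# that the next trial succeeds:
print(plugin_prediction(4, 0, 1, 0))  # 1.0
```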
but this assumes too much. In particular, if the first m trials were all successes then this method expresses *certainty* that p=1 and that the next n trials will all be successes. Professional statisticians prefer to be more conservative in their predictions.
This is truly a foundational problem. There is no obvious "right answer". One common solution is to assume a "prior" uniform distribution for p -- that is, p is equally likely to be anywhere between 0 and 1. Then the observation of a successes and b failures leads to a "posterior" distribution for p according to Bayes's theorem, from which may be obtained a probability to be associated with particular values of c and d.
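The Bayesian route can be sketched concretely. The uniform prior is Beta(1,1), so after a successes and b failures the posterior on p is Beta(a+1, b+1), and integrating the binomial probability against it gives the predictive probability (n choose c)B(a+c+1, b+d+1)/B(a+1, b+1), where B is the Beta function. A minimal Python sketch (function names are my own):

```python
from math import comb, lgamma, exp

def log_beta(x, y):
    # log of the Beta function B(x, y) = Gamma(x)Gamma(y)/Gamma(x+y)
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def bayes_prediction(a, b, c, d):
    """Posterior-predictive probability of c successes and d failures
    in the next n = c + d trials, under a uniform prior on p.
    Uniform = Beta(1,1), so the posterior is Beta(a+1, b+1)."""
    n = c + d
    return comb(n, c) * exp(log_beta(a + c + 1, b + d + 1)
                            - log_beta(a + 1, b + 1))

# After 4 successes in 4 trials, the next trial is NOT certain to succeed:
print(bayes_prediction(4, 0, 1, 0))  # 5/6, about 0.8333
```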
But is this a reasonable way to do it?
Other solutions have also been proposed. Even the case a=m, b=0, c=n=1, d=0 is interesting. Having seen m successes in a row, what odds would you be willing to give that the next trial will be a success, assuming no cheating?
The Bayesian calculation gives an interesting answer, but it's not the only reasonable one.
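For the special case a=m, b=0, c=n=1, d=0, the uniform-prior calculation reduces to the closed form (m+1)/(m+2) — the classical rule of succession usually credited to Laplace. A one-line check:

```python
def rule_of_succession(m):
    """After m successes in m trials, the uniform-prior Bayesian
    probability that the next trial is also a success: (m+1)/(m+2)."""
    return (m + 1) / (m + 2)

# Even a long unbroken run never yields certainty:
for m in (1, 4, 100):
    print(m, rule_of_succession(m))
```

Note that the answer tends to 1 as m grows but never reaches it, in contrast to the plug-in method's immediate certainty.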
-- Joe Shipman