FOM: A problem in the foundations of statistics
JoeShipman@aol.com
Wed Apr 24 16:48:58 EDT 2002
If, in m=a+b trials, a successes and b failures have been observed, what is the chance that the next n=c+d trials will contain c successes and d failures?
The following assumption is reasonable:
There is an underlying binomial process with parameter p and the trials are independent instances of that process.
The simplest procedure is to assume that p=a/m and calculate the quantity
(n choose c)(p^c)((1-p)^d) =
n!(a^c)(b^d)
------------
c!d!(m^n)
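As a quick numerical sketch (in Python, with a hypothetical function name), here is the plug-in calculation and the certainty problem it creates:

```python
from math import comb

def plugin_prediction(a, b, c, d):
    """Probability of c successes and d failures in the next n = c + d
    trials, taking p = a/m at face value (the naive plug-in estimate)."""
    m, n = a + b, c + d
    p = a / m
    return comb(n, c) * p**c * (1 - p)**d

# With 4 successes in 4 trials, the plug-in method is *certain*
# that the next trial succeeds:
print(plugin_prediction(4, 0, 1, 0))  # 1.0
```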
but this assumes too much. In particular, if the first m trials were all successes then this method expresses *certainty* that p=1 and that the next n trials will all be successes. Professional statisticians prefer to be more conservative in their predictions.
This is truly a foundational problem. There is no obvious "right answer". One common solution is to assume a "prior" uniform distribution for p -- that is, p is equally likely to be anywhere between 0 and 1. Then the observation of a successes and b failures leads to a "posterior" distribution for p according to Bayes's theorem, from which may be obtained a probability to be associated with particular values of c and d.
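The Bayesian route can be sketched concretely. The uniform prior is Beta(1,1), so after a successes and b failures the posterior on p is Beta(a+1, b+1), and integrating the binomial probability against it gives the predictive probability (n choose c)B(a+c+1, b+d+1)/B(a+1, b+1), where B is the Beta function. A minimal Python sketch (function names are my own):

```python
from math import comb, lgamma, exp

def log_beta(x, y):
    # log of the Beta function B(x, y) = Gamma(x)Gamma(y)/Gamma(x+y)
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def bayes_prediction(a, b, c, d):
    """Posterior-predictive probability of c successes and d failures
    in the next n = c + d trials, under a uniform prior on p.
    Uniform = Beta(1,1), so the posterior is Beta(a+1, b+1)."""
    n = c + d
    return comb(n, c) * exp(log_beta(a + c + 1, b + d + 1)
                            - log_beta(a + 1, b + 1))

# After 4 successes in 4 trials, the next trial is NOT certain to succeed:
print(bayes_prediction(4, 0, 1, 0))  # 5/6, about 0.8333
```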
But is this a reasonable way to do it?
Other solutions have also been proposed. Even the case a=m, b=0, c=n=1, d=0 is interesting. Having seen m successes in a row, what odds would you be willing to give that the next trial will be a success, assuming no cheating?
The Bayesian calculation gives an interesting answer, but it's not the only reasonable one.
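For the special case a=m, b=0, c=n=1, d=0, the uniform-prior calculation reduces to the closed form (m+1)/(m+2) — the classical rule of succession usually credited to Laplace. A one-line check:

```python
def rule_of_succession(m):
    """After m successes in m trials, the uniform-prior Bayesian
    probability that the next trial is also a success: (m+1)/(m+2)."""
    return (m + 1) / (m + 2)

# Even a long unbroken run never yields certainty:
for m in (1, 4, 100):
    print(m, rule_of_succession(m))
```

Note that the answer tends to 1 as m grows but never reaches it, in contrast to the plug-in method's immediate certainty.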
-- Joe Shipman