Title: Forgetfulness of Balls and Bins
Speaker: Will Perkins, NYU
Abstract:
We throw m balls into n labeled bins at random, but we do it in two
different ways: under the Standard distribution we throw each ball
into a bin uniformly at random; under the Planted distribution we
plant the first n balls in a specific arrangement and then throw the
remaining (m-n) balls uniformly at random. For a given initial
planting we want to know how large m must be as a function of n for
the total variation distance between the two distributions to go to 0.
In other words, given a planting of the first n balls, how many balls
do we have to throw on top before we can't tell that we started with a
planting. This is somewhat analogous to a mixing time problem in the
theory of Markov chains.
The two extreme cases are starting with n balls planted in one bin and
starting with one ball in each bin. In each case there is a
particular statistic that distinguishes between the two distributions
- the number of balls that end up in the planted bin in the first
case, and the number of pairs of balls in each bin in the second. We
show how such a statistic gives a lower bound to TV distance and how
to show that it is the 'right' statistic. We then give the TV limit
for any initial planting and describe three statistics in three
different regimes that separate the two distributions.
The unlabeled case is more complicated but we can find the answer for
some specific plantings. The general case raises some interesting
questions in pure probability.