The Hypothesis Space in Gweon, Tenenbaum, and Schulz (2010)

Ernest Davis, Dept. of Computer Science, New York University
Gary Marcus, Dept. of Psychology, New York University

Most of this was written April 22, 2014; some additional thoughts were added later, as indicated below.

Gweon, Tenenbaum, and Schulz (2010) (henceforth GTS) carried out the following experiment: 15-month-old infants were shown a box containing blue balls and yellow balls. In condition 1 of the experiment, 3/4 of the balls were blue; in condition 2, 3/4 were yellow. In both conditions, the experimenter took out three blue balls in sequence and demonstrated that all three squeaked when squeezed (phase 1 of the experiment). The experimenter then took out an inert yellow ball and handed it to the baby (phase 2). The experimental finding was that, in condition 1, 80% of the babies squeezed the yellow ball to try to make it squeak, whereas in condition 2, only 33% of the babies squeezed the ball.

The explanation of this finding given by GTS is as follows. The babies are considering two possible hypotheses about the relation of color to squeakiness: hypothesis A is that all balls squeak; hypothesis B is that all and only blue balls squeak. (The obvious third alternative, that only yellow balls squeak, is ruled out by the observation and can therefore be ignored.) Thus if A is true, then the yellow ball will squeak; if B is true, it will not. The babies are also considering two possible hypotheses about the experimenter's selection rule for the first three balls. Hypothesis C is that the experimenter is picking at random from the set of all balls; hypothesis D is that she is picking at random from the set of all balls that squeak.

It is assumed that A and B are independent of C and D, and that A, B, C, and D all have prior probability 1/2.
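
As a rough illustration of how such a two-by-two hypothesis space yields a prediction, the sketch below computes the posterior probability of hypothesis A (all balls squeak) after three squeaky blue balls have been drawn. It is not GTS's implementation: the likelihoods assume draws with replacement from a box with a fixed fraction of blue balls, and the function and label names are ours ("all" and "blue_only" stand for A and B; "any_ball" and "squeaky_ball" stand for C and D).

```python
from fractions import Fraction
from itertools import product

def posterior_all_squeak(frac_blue, n_draws=3):
    """P(all balls squeak | n_draws squeaky blue balls were drawn).

    Assumptions (ours, for illustration only): draws are made with
    replacement, the two dimensions are independent, and every
    hypothesis has prior 1/2, so each joint hypothesis has prior 1/4.
    """
    prior = Fraction(1, 4)
    weights = {}
    for squeak, rule in product(("all", "blue_only"),
                                ("any_ball", "squeaky_ball")):
        if squeak == "blue_only" and rule == "squeaky_ball":
            # Only blue balls squeak and only squeaky balls are drawn,
            # so every draw is certain to be a squeaky blue ball.
            likelihood = Fraction(1)
        else:
            # In the other three cases the pool is the whole box, so
            # each draw is blue with probability frac_blue.
            likelihood = frac_blue ** n_draws
        weights[(squeak, rule)] = prior * likelihood
    total = sum(weights.values())
    favourable = sum(w for (squeak, _), w in weights.items()
                     if squeak == "all")
    return favourable / total

mostly_blue = posterior_all_squeak(Fraction(3, 4))    # condition 1
mostly_yellow = posterior_all_squeak(Fraction(1, 4))  # condition 2
print(mostly_blue, float(mostly_blue))
print(mostly_yellow, float(mostly_yellow))
```

Under these assumptions the posterior for "all balls squeak" is higher when 3/4 of the balls are blue than when 3/4 are yellow, which is the direction of the reported effect. The numerical values depend entirely on the assumed likelihoods and priors.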

The model thus posits that the babies are considering a hypothesis space where there are two dimensions and two alternatives in each dimension.

This seems to us entirely arbitrary. It seems to us that, from the point of view of the babies' observations, it would be just as plausible to posit four dimensions.

Moreover, in each category there are further hypotheses that are just as plausible as those that GTS consider.

Thus, it seems to us that the following Bayesian model would be motivated (we exclude hypotheses that are inconsistent with the observations).

Dimension A: Relation of squeak to color.

Dimension B: Selection criterion in phase 1.

Dimension C: Selection criterion in phase 2.

Dimension D:

There could be additional options in D e.g.

But that is arguably more complicated and therefore reasonably excluded.

There would thus be 144 combinations; however, not all of these are possible or distinct. In particular, assuming that the experimenter knows the truth of dimension A, there are the following logical constraints:

A.1 is inconsistent with C.3, C.6, and D.2

If A.1 is true, B.1 and B.2 are identical; B.3 and B.4 are identical; C.1 and C.2 are identical; and C.4 and C.5 are identical

A.2 is inconsistent with C.2 and C.5.

If A.2 is true, B.2, B.3, and B.4 are identical; and C.3, C.4, and C.6 are identical.

C.2 and C.5 are inconsistent with D.2.

If C.3 or C.6 is true, then D.1 and D.2 are identical.

There remain 43 logically distinct possibilities (B* stands for any of B.1 through B.4, so each entry containing B* counts four times):

[A1,B1,C1,D1], [A1,B1,C4,D1], [A1,B3,C1,D1], [A1,B3,C4,D1],
[A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2], [A2,B2,C1,D1], [A2,B2,C1,D2], [A2,B2,C3,D1],
[A3,B*,C1,D1], [A3,B*,C1,D2], [A3,B*,C2,D1], [A3,B*,C3,D1], [A3,B*,C4,D1], [A3,B*,C4,D2], [A3,B*,C5,D1], [A3,B*,C6,D1]
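
The counts can be checked mechanically. The sketch below expands the list above, reading B* as "any of B1 to B4" (our reading of the notation), and also computes the raw number of combinations. The dimension sizes 3, 4, 6, and 2 are inferred from the option labels used in the text (A.1 to A.3, B.1 to B.4, C.1 to C.6, D.1 to D.2).

```python
import math
import re

# The list of hypotheses exactly as given in the text.
listed = """
[A1,B1,C1,D1], [A1,B1,C4,D1], [A1,B3,C1,D1], [A1,B3,C4,D1],
[A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2],
[A2,B2,C1,D1], [A2,B2,C1,D2], [A2,B2,C3,D1],
[A3,B*,C1,D1], [A3,B*,C1,D2], [A3,B*,C2,D1], [A3,B*,C3,D1],
[A3,B*,C4,D1], [A3,B*,C4,D2], [A3,B*,C5,D1], [A3,B*,C6,D1]
"""

# Number of options per dimension (inferred from the labels in the text).
sizes = {"A": 3, "B": 4, "C": 6, "D": 2}
raw_combinations = math.prod(sizes.values())

expanded = set()
for entry in re.findall(r"\[(.*?)\]", listed):
    parts = entry.split(",")
    if "B*" in parts:
        # Assumption: B* is shorthand for each of B1..B4 in turn.
        for i in range(1, sizes["B"] + 1):
            expanded.add(tuple("B%d" % i if p == "B*" else p for p in parts))
    else:
        expanded.add(tuple(parts))

print(raw_combinations)  # combinations before applying the constraints
print(len(expanded))     # hypotheses after expanding B*
```

This only counts the entries as listed; it does not re-derive them from the logical constraints.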

It is by no means clear how best to assign priors to these.

The truth is A.2, B.2=B.3=B.4, C.3=C.4=C.6, and D.1=D.2. In GTS's models, the babies consider A.1 vs A.2 and B.1 vs B.2; they assume C.4 and D.1. We do not see any principled reason, from the babies' point of view, why any of the other alternatives should a priori be considered less likely, let alone considered impossible.

One can also wonder about the independence of the assumptions between categories, aside from the logical constraints. In particular, if the distinction between the two phases of the experiment is not very clear to the babies (and it is not obvious that it would be), then they might assign a higher probability to the combinations where the same rule is being used in both phases (i.e. either B.1 and C.1 or B.2 and C.2), and they might assign a higher probability to D.2, which applies uniformly to both phases, than to D.1, which creates an arbitrary distinction between the two phases.

This paragraph added October 2014

A Bayesian model is a non-empty set of hypotheses; for instance, the model used in GTS is the set { [A1,B1,C1,D1], [A1,B1,C4,D1], [A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2], [A2,B2,C1,D1], [A2,B2,C1,D2], [A2,B2,C3,D1] }. In principle, therefore, there could be as many as 2^43 - 1 different Bayesian models (about 8.8 trillion) that a theorist might consider; however, most of these are entirely unmotivated and quite implausible. A reasonable Bayesian model, let us say, is one where one considers one possible set of choices for A, one for B, and so on. To compute a lower bound on the size of this collection, let us consider only hypothesis spaces in which A.3 and D.1 are included. If A.3 and D.1 are true, then all the hypotheses in B and C are distinct and consistent. Therefore, any set of hypotheses of the form ``[some subset of the A's containing A.3] and [some non-empty subset of the B's] and [some non-empty subset of the C's] and [some subset of the D's containing D.1]'' is a reasonable Bayesian model; and these are all distinct. There are therefore more than 4 * 15 * 63 * 2 = 7560 reasonable distinct Bayesian models. There are also additional models, either not including A.3 or not including D.1; but the number of these is probably quite small.
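
The two counts in this paragraph follow from standard subset-counting formulas: a set of n options has 2^n - 1 non-empty subsets and 2^(n-1) subsets that contain one particular option. A minimal sketch, with helper names of our own choosing and the dimension sizes (3, 4, 6, 2) taken from the option labels used in the text:

```python
def nonempty_subsets(n_options):
    """Number of non-empty subsets of a set with n_options elements."""
    return 2 ** n_options - 1

def subsets_containing_one(n_options):
    """Number of subsets that include one fixed element (e.g. A.3)."""
    return 2 ** (n_options - 1)

# Every non-empty set of the 43 distinct hypotheses is a candidate model.
all_models = nonempty_subsets(43)

# Lower bound on "reasonable" models: A must contain A.3, D must contain
# D.1, and B and C may be any non-empty subsets of their options.
reasonable_models = (
    subsets_containing_one(3)    # A: 4 subsets containing A.3
    * nonempty_subsets(4)        # B: 15 non-empty subsets
    * nonempty_subsets(6)        # C: 63 non-empty subsets
    * subsets_containing_one(2)  # D: 2 subsets containing D.1
)

print(all_models)         # roughly 8.8 trillion
print(reasonable_models)
```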

More models

This section added November 2015.

There are also other considerations that the baby might bring to bear. For instance, suppose we modify A.3 to a model in which some fraction of blue balls squeak and some other fraction of yellow balls squeak; these fractions are presumably known to adults, though not to the baby. A baby might conjecture that, since the people who put together boxes of toys for babies are clearly benevolent people, they should prefer to put in more toys of the squeakier color. Alternatively the baby might conjecture that balls of the less common color are clearly ``special'', and therefore should be squeakier.

This can be incorporated into the above model as follows:

  1. Change A.3 to "There is a fraction Y of yellow balls that are squeaky, and a fraction B of blue balls that are squeaky."
  2. Add another dimension E, ``Principle of constructing the box'' with three options;

To complete a probabilistic model, one would have to specify in A.3 probability distributions for Y and B and in E.2 and E.3 a specific dependence of the fraction of blue balls on Y and B. For instance, one could specify that Y and B are independent and uniformly distributed between 0 and 1; that in E.2 the fraction of blue balls in the box is B/(B+Y); and that in E.3 the fraction of blue balls in the box is Y/(B+Y).
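
The example specification just given can be written down directly. The sketch below samples Y and B uniformly and computes the fraction of blue balls in the box under E.2 and E.3. The function names and the rule labels as strings are ours. Only the two rules whose formulas are stated above are implemented.

```python
import random

def blue_fraction(rule, y, b):
    """Fraction of blue balls in the box, given squeak fractions y and b.

    Implements only the two example rules stated in the text:
    E.2 gives b / (b + y) and E.3 gives y / (b + y).
    """
    if b + y == 0:
        raise ValueError("undefined when neither color ever squeaks")
    if rule == "E.2":
        return b / (b + y)
    if rule == "E.3":
        return y / (b + y)
    raise ValueError("only E.2 and E.3 are specified in this sketch")

def sample_box(rule, rng):
    """Draw Y and B independently and uniformly, then build the box."""
    y = rng.random()  # fraction of yellow balls that squeak
    b = rng.random()  # fraction of blue balls that squeak
    return y, b, blue_fraction(rule, y, b)

rng = random.Random(0)  # fixed seed so that the run is repeatable
for rule in ("E.2", "E.3"):
    y, b, frac = sample_box(rule, rng)
    print(rule, round(y, 3), round(b, 3), round(frac, 3))
```

For any given Y and B, the blue fractions under E.2 and E.3 sum to 1, so the two rules make opposite predictions about which color is more common in the box.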

To estimate the number of Bayesian models in the new theory, note that if A.3 and D.1 are true, then there are no logical dependencies between any of the other hypotheses. Therefore, any set of hypotheses of the form ``[some subset of the A's containing A.3] and [some non-empty subset of the B's] and [some non-empty subset of the C's] and [some subset of the D's containing D.1] and [some non-empty subset of the E's]'' is a reasonable Bayesian model; and these are all distinct. There are therefore more than 4 * 15 * 63 * 2 * 7 = 52,920 reasonable distinct Bayesian models.
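
This product can also be checked by brute force, by enumerating every admissible choice of subsets rather than using the closed-form counts. The sketch below does so; the dimension sizes are those implied by the text (three options in A, four in B, six in C, two in D, three in E), and the variable names are ours. The enumeration gives 4 * 15 * 63 * 2 * 7 = 52,920.

```python
from itertools import combinations, product

def nonempty_subsets(options):
    """All non-empty subsets of a list of option labels."""
    return [frozenset(c)
            for r in range(1, len(options) + 1)
            for c in combinations(options, r)]

dims = {"A": 3, "B": 4, "C": 6, "D": 2, "E": 3}
required = {"A": "A.3", "D": "D.1"}  # options every model must include

choices = []
for name, size in dims.items():
    options = ["%s.%d" % (name, i) for i in range(1, size + 1)]
    subsets = nonempty_subsets(options)
    if name in required:
        # Keep only the subsets that contain the required option.
        subsets = [s for s in subsets if required[name] in s]
    choices.append(subsets)

# A model is one admissible subset per dimension.
model_count = sum(1 for _ in product(*choices))

print([len(c) for c in choices])
print(model_count)
```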


Gweon, H., Tenenbaum, J. B., & Schulz, L. E. (2010). Infants consider both the sample and the sampling process in inductive generalization. Proceedings of the National Academy of Sciences, USA, 107, 9066-9071.