The Hypothesis Space in Gweon, Tenenbaum, and Schulz (2010)

Ernest Davis, Dept. of Computer Science, New York University
Gary Marcus, Dept. of Psychology, New York University

April 22, 2014

Gweon, Tenenbaum, and Schulz (2010) (henceforth GTS) carried out the following experiment: 15-month-old infants were shown a box containing blue balls and yellow balls. In one condition of the experiment (condition 1), 3/4 of the balls were blue; in the other (condition 2), 3/4 were yellow. In both conditions, the experimenter took out three blue balls in sequence and demonstrated that all three balls squeaked when squeezed (phase 1 of the experiment). The experimenter then took out an inert yellow ball and handed it to the baby (phase 2). The experimental finding was that, in condition 1, 80% of the babies squeezed the yellow ball to try to make it squeak, whereas in condition 2, only 33% of the babies squeezed the ball.

The explanation of this finding given by GTS is as follows. The babies are considering two possible hypotheses about the relation of color to squeakiness: hypothesis A is that all balls squeak; hypothesis B is that all and only blue balls squeak. (The obvious third alternative, that only yellow balls squeak, is ruled out by the observation and can therefore be ignored.) Thus if A is true, then the yellow ball will squeak; if B is true, it will not. The babies are also considering two possible hypotheses about the experimenter's selection rule for the first three balls: hypothesis C is that the experimenter is picking at random from the set of all balls; hypothesis D is that she is picking at random from the set of all balls that squeak.

It is assumed that A and B are independent of C and D, and that A, B, C, and D all have prior probability 1/2.
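To make the structure of this model concrete, the following is a minimal sketch of the posterior computation; it is not GTS's own implementation. It assumes that balls are drawn with replacement, so that the probability of drawing a blue ball equals the proportion of blue balls in the box, and that the likelihoods take the simple form noted in the comments. The function name and the hypothesis labels are ours.

```python
from fractions import Fraction


def posterior_all_squeak(p_blue, n_draws=3):
    """Posterior probability of hypothesis A (all balls squeak) after
    n_draws blue, squeaking balls have been drawn.

    Assumptions (ours, not stated by GTS): draws are with replacement,
    and each of the four joint hypotheses has prior 1/2 * 1/2 = 1/4.
    """
    prior = Fraction(1, 4)
    random_draws = p_blue ** n_draws  # chance of n_draws blue balls at random

    # Likelihood of the observation under each (color rule, selection rule).
    likelihood = {
        ("A_all_squeak", "C_from_all"): random_draws,
        # If every ball squeaks, the squeaky balls are all the balls.
        ("A_all_squeak", "D_from_squeaky"): random_draws,
        ("B_only_blue", "C_from_all"): random_draws,
        # If only blue balls squeak, a squeaky draw is always blue.
        ("B_only_blue", "D_from_squeaky"): Fraction(1),
    }

    joint = {h: prior * value for h, value in likelihood.items()}
    evidence = sum(joint.values())
    mass_on_a = sum(v for (rule, _), v in joint.items() if rule == "A_all_squeak")
    return mass_on_a / evidence


condition_1 = posterior_all_squeak(Fraction(3, 4))  # 3/4 of the balls are blue
condition_2 = posterior_all_squeak(Fraction(1, 4))  # 1/4 of the balls are blue
print(condition_1, float(condition_1))  # 54/145, about 0.37
print(condition_2, float(condition_2))  # 2/67, about 0.03
```

Under these assumptions the posterior that the yellow ball squeaks is higher in condition 1 than in condition 2, which is the same direction as the observed behavior; the absolute values depend on the simplifications above and should not be read as predictions of the squeezing rates.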

The model thus posits that the babies are considering a hypothesis space where there are two dimensions and two alternatives in each dimension.

This seems to us entirely arbitrary. It seems to us that, from the point of view of the babies' observations, it would be just as plausible to posit four dimensions.

Moreover, in each category, there are further hypotheses that are just as plausible as those that GTS consider.

Thus, it seems to us that the following Bayesian model would be motivated (we exclude hypotheses that are inconsistent with the observations).

Dimension A: Relation of squeak to color.
Hypotheses:

Dimension B: Selection criterion in phase 1.
Hypotheses:

Dimension C: Selection criterion in phase 2.
Hypotheses:

Dimension D:

There could be additional options in D e.g.

But that is arguably more complicated and therefore reasonably excluded.

There would thus be 144 combinations; however, not all of these are possible or distinct. In particular, assuming that the experimenter knows the truth of dimension A, there are the following logical constraints:

A.1 is inconsistent with C.3, C.6, and D.2

If A.1 is true, B.1 and B.2 are identical; B.3 and B.4 are identical; C.1 and C.2 are identical; and C.4 and C.5 are identical

A.2 is inconsistent with C.2 and C.5.

If A.2 is true, B.2, B.3, and B.4 are identical; and C.3, C.4, and C.6 are identical.

C.2 and C.5 are inconsistent with D.2.

If C.3 or C.6 is true, then D.1 and D.2 are identical.

There remain 43 logically distinct possibilities:

[A1,B1,C1,D1], [A1,B1,C4,D1], [A1,B3,C1,D1], [A1,B3,C4,D1],
[A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2], [A2,B2,C1,D1], [A2,B2,C1,D2], [A2,B2,C3,D1],
[A3,B*,C1,D1], [A3,B*,C1,D2], [A3,B*,C2,D1], [A3,B*,C3,D1], [A3,B*,C4,D1], [A3,B*,C4,D2], [A3,B*,C5,D1], [A3,B*,C6,D1]
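The two counts above can be checked mechanically. The sketch below does not model the hypotheses or the logical constraints themselves; it only counts labels. It assumes that the dimensions have 3, 4, 6, and 2 alternatives respectively (inferred from the highest indices mentioned: A.3, B.4, C.6, D.2) and that "B*" in the list stands for any of B1 to B4. All variable and function names are ours.

```python
from itertools import product

# Labels per dimension; sizes inferred from the highest index in the text.
DIMENSIONS = {
    "A": [f"A{i}" for i in range(1, 4)],
    "B": [f"B{i}" for i in range(1, 5)],
    "C": [f"C{i}" for i in range(1, 7)],
    "D": [f"D{i}" for i in range(1, 3)],
}

# Every combination before any constraint is applied.
all_combinations = list(product(*DIMENSIONS.values()))

# The surviving tuples exactly as listed, with "B*" as a wildcard.
LISTED = """
A1,B1,C1,D1 A1,B1,C4,D1 A1,B3,C1,D1 A1,B3,C4,D1
A2,B1,C1,D1 A2,B1,C1,D2 A2,B1,C3,D1 A2,B1,C3,D2
A2,B2,C1,D1 A2,B2,C1,D2 A2,B2,C3,D1
A3,B*,C1,D1 A3,B*,C1,D2 A3,B*,C2,D1 A3,B*,C3,D1
A3,B*,C4,D1 A3,B*,C4,D2 A3,B*,C5,D1 A3,B*,C6,D1
"""


def expand(entry):
    """Expand one listed entry, replacing the B* wildcard by B1..B4."""
    a, b, c, d = entry.split(",")
    b_values = DIMENSIONS["B"] if b == "B*" else [b]
    return [(a, b_value, c, d) for b_value in b_values]


distinct = {combo for entry in LISTED.split() for combo in expand(entry)}

print(len(all_combinations))  # 144
print(len(distinct))          # 43
```

The 43 comes out as 4 tuples with A1, 7 with A2, and 8 x 4 = 32 with A3 once the wildcard is expanded.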

It is by no means clear what is the best way to assign priors to these.

The truth is A.2, B.2=B.3=B.4, C.3=C.4=C.6, and D.1=D.2. In GTS's models, the babies consider A.1 vs A.2 and B.1 vs B.2; they assume C.4 and D.1. We do not see any principled reason, from the babies' point of view, why any of the other alternatives should a priori be considered less likely, let alone considered impossible.

One can also wonder about the independence of the assumptions between categories, aside from the logical constraints. In particular, if the distinction between the two phases of the experiment is not very clear to the babies --- and it is not obvious that it would be --- then they might assign a higher probability to the combinations where the same rule is being used in both phases (i.e. either B.1 and C.1 or B.2 and C.2), and they might assign a higher probability to D.2, which applies uniformly to both phases, than to D.1, which creates an arbitrary distinction between the two phases.

A Bayesian model is a non-empty set of hypotheses; for instance, the model used in GTS is the set { [A1,B1,C1,D1], [A1,B1,C4,D1], [A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2], [A2,B2,C1,D1], [A2,B2,C1,D2], [A2,B2,C3,D1] }. In principle, therefore, there could be as many as 2^43 - 1 different Bayesian models (about 8.8 trillion) that a theorist might consider; however, most of these are entirely unmotivated and quite implausible. To compute a lower bound we can consider only dimensions B and C, which do not interact, and consider models formed by the cross-product of non-empty subsets in those dimensions; this yields 15 * 63 = 945 different Bayesian models. Taking into account the choices in dimensions A and D as well would give a figure that is almost certainly at least 3 or 4 times as large.
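The arithmetic in the preceding paragraph can be reproduced with a short sketch. It assumes 43 distinct hypotheses, 4 alternatives in dimension B, and 6 in dimension C, as the text implies; the helper name is ours.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Return every non-empty subset of items as a tuple."""
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


# Upper bound: any non-empty subset of the 43 distinct hypotheses.
upper_bound = 2**43 - 1

# Lower bound: cross-products of non-empty subsets of B and of C.
b_subsets = nonempty_subsets(["B1", "B2", "B3", "B4"])
c_subsets = nonempty_subsets(["C1", "C2", "C3", "C4", "C5", "C6"])
lower_bound = len(b_subsets) * len(c_subsets)

print(upper_bound)                    # 8796093022207, about 8.8 trillion
print(len(b_subsets), len(c_subsets)) # 15 63
print(lower_bound)                    # 945
```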

Reference

Gweon, H., Tenenbaum, J. B., & Schulz, L. E. (2010). Infants consider both the sample and the sampling process in inductive generalization. Proceedings of the National Academy of Sciences, USA, 107, 9066-9071.