The Hypothesis Space in Gweon, Tenenbaum, and Schulz (2010)

Ernest Davis, Dept. of Computer Science, New York University
Gary Marcus, Dept. of Psychology, New York University

April 22, 2014; final paragraph added October 2014.

Gweon, Tenenbaum, and Schulz (2010) (henceforth GTS) carried out the following experiment: 15-month-old infants were shown a box containing blue balls and yellow balls. In condition 1, 3/4 of the balls were blue; in condition 2, 3/4 were yellow. In both conditions, the experimenter took out three blue balls in sequence and demonstrated that all three squeaked when squeezed (phase 1 of the experiment). The experimenter then took out an inert yellow ball and handed it to the baby (phase 2). The experimental finding was that, in condition 1, 80% of the babies squeezed the yellow ball to try to make it squeak, whereas in condition 2, only 33% of the babies squeezed the ball.

The explanation of this finding given by GTS is as follows. The babies are considering two possible hypotheses about the relation of color to squeakiness: hypothesis A is that all balls squeak; hypothesis B is that all and only blue balls squeak. (The obvious third alternative, that only yellow balls squeak, is ruled out by the observations and can therefore be ignored.) Thus if A is true, the yellow ball will squeak; if B is true, it will not. The babies are also considering two possible hypotheses about the experimenter's selection rule for the first three balls. Hypothesis C is that the experimenter is picking at random from the set of all balls; hypothesis D is that she is picking at random from the set of all balls that squeak.

It is assumed that A and B are independent of C and D, and that A, B, C, and D all have prior probability 1/2.
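
To make the structure of this model concrete, the following is a minimal sketch of the posterior it implies; it is not a reproduction of GTS's own computation. It assumes, as simplifications of ours, that balls are drawn with replacement (so that three blue draws from the whole box have probability f^3, where f is the fraction of blue balls) and that under B combined with D every draw is certain to be blue. The function name and the likelihood table are ours.

```python
from fractions import Fraction

def posterior_all_squeak(blue_fraction):
    """Posterior probability of hypothesis A (all balls squeak) after three
    blue, squeaking balls are drawn, under the assumptions stated above."""
    half = Fraction(1, 2)
    p_three_blue = blue_fraction ** 3  # three blue draws from the whole box

    # Likelihood of the phase 1 observations under each joint hypothesis.
    likelihood = {
        ("A", "C"): p_three_blue,  # all squeak, drawn from all balls
        ("A", "D"): p_three_blue,  # all squeak, so the squeaky set is the whole box
        ("B", "C"): p_three_blue,  # only blue squeak, drawn from all balls
        ("B", "D"): Fraction(1),   # only blue squeak, drawn from squeaky balls
    }
    # Independent priors of 1/2 on each dimension give 1/4 per combination.
    joint = {h: half * half * lik for h, lik in likelihood.items()}
    evidence = sum(joint.values())
    return (joint[("A", "C")] + joint[("A", "D")]) / evidence

condition_1 = posterior_all_squeak(Fraction(3, 4))  # box is 3/4 blue
condition_2 = posterior_all_squeak(Fraction(1, 4))  # box is 1/4 blue
print(condition_1, condition_2)
```

Under these assumptions the posterior on A is 54/145 (about 0.37) in condition 1 and 2/67 (about 0.03) in condition 2, which is the direction of the reported difference in squeezing.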

The model thus posits that the babies are considering a hypothesis space with two dimensions and two alternatives in each dimension.

This seems to us entirely arbitrary. From the point of view of the babies' observations, it would be just as plausible to posit four dimensions:

A. The relation between color and squeakiness.
B. The selection criterion for phase 1 of the experiment.
C. The selection criterion for phase 2 of the experiment.
D. The rule governing whether or not the experimenter squeezes the ball in phase 2. (The rule governing the experimenter's decision to squeeze in phase 1 turns out not to matter, since she always squeezes and it always squeaks.)

Moreover, in each dimension there are further hypotheses that are just as plausible as those that GTS consider.

Thus, it seems to us that the following Bayesian model would be motivated (we exclude hypotheses that are inconsistent with the observations).

Dimension A: Relation of squeak to color.
Hypotheses:

1. All balls squeak.
2. All and only blue balls squeak.
3. Balls squeak randomly (by default with probability 1/2).

Dimension B: Selection criterion in phase 1.
Hypotheses:

1. The experimenter chooses toys at random.
2. The experimenter chooses randomly among squeaky toys.
3. The experimenter chooses randomly among blue toys.
4. The experimenter chooses randomly among squeaky blue toys.

Dimension C: Selection criterion in phase 2.
Hypotheses:

1. The experimenter chooses a toy at random.
2. The experimenter chooses randomly among squeaky toys.
3. The experimenter chooses randomly among non-squeaky toys.
4. The experimenter chooses randomly among yellow toys.
5. The experimenter chooses randomly among squeaky yellow toys.
6. The experimenter chooses randomly among non-squeaky yellow toys.

Dimension D: Rule governing whether the experimenter squeezes the toy.
Hypotheses:

1. The experimenter always squeezes the toy in phase 1 and never squeezes in phase 2.
2. In both phases, the experimenter squeezes the toy only if it squeaks.

There could be additional options in D, e.g.:

3. If the toy in phase 2 squeaks, the experimenter squeezes it; otherwise she chooses randomly whether to squeeze it.

But that is arguably more complicated and therefore reasonably excluded.

There would thus be 3 * 4 * 6 * 2 = 144 combinations; however, not all of these are possible or distinct. In particular, assuming that the experimenter knows the truth of dimension A, there are the following logical constraints:

A.1 is inconsistent with C.3, C.6, and D.2.
If A.1 is true, B.1 and B.2 are identical; B.3 and B.4 are identical; C.1 and C.2 are identical; and C.4 and C.5 are identical.
A.2 is inconsistent with C.2 and C.5.
If A.2 is true, B.2, B.3, and B.4 are identical; and C.3, C.4, and C.6 are identical.
C.2 and C.5 are inconsistent with D.2.
If C.3 or C.6 is true, then D.1 and D.2 are identical.

There remain 43 logically distinct possibilities (B* stands for any of B1 through B4):

[A1,B1,C1,D1], [A1,B1,C4,D1], [A1,B3,C1,D1], [A1,B3,C4,D1],
[A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2], [A2,B2,C1,D1], [A2,B2,C1,D2], [A2,B2,C3,D1],
[A3,B*,C1,D1], [A3,B*,C1,D2], [A3,B*,C2,D1], [A3,B*,C3,D1], [A3,B*,C4,D1], [A3,B*,C4,D2], [A3,B*,C5,D1], [A3,B*,C6,D1]
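
The two counts above can be checked mechanically. The sketch below computes the size of the unconstrained product and tallies the list exactly as written, reading B* as any of B1 through B4 (our reading of the notation); it does not re-derive the list from the logical constraints. Variable names are ours.

```python
from itertools import product

# Number of hypotheses in each dimension, as listed above.
sizes = {"A": 3, "B": 4, "C": 6, "D": 2}
all_combinations = list(product(*(range(1, n + 1) for n in sizes.values())))

# The listed possibilities, transcribed as (A, B, C, D); "*" means any B.
listed = [
    (1, 1, 1, 1), (1, 1, 4, 1), (1, 3, 1, 1), (1, 3, 4, 1),
    (2, 1, 1, 1), (2, 1, 1, 2), (2, 1, 3, 1), (2, 1, 3, 2),
    (2, 2, 1, 1), (2, 2, 1, 2), (2, 2, 3, 1),
    (3, "*", 1, 1), (3, "*", 1, 2), (3, "*", 2, 1), (3, "*", 3, 1),
    (3, "*", 4, 1), (3, "*", 4, 2), (3, "*", 5, 1), (3, "*", 6, 1),
]

# Expand each "*" entry into one tuple per B hypothesis.
expanded = set()
for a, b, c, d in listed:
    b_values = range(1, sizes["B"] + 1) if b == "*" else [b]
    for b_value in b_values:
        expanded.add((a, b_value, c, d))

print(len(all_combinations), len(expanded))
```

The tally breaks down as 4 entries with A1, 7 with A2, and 8 * 4 = 32 with A3.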

It is by no means clear how best to assign priors to these.

The truth is A.2, B.2=B.3=B.4, C.3=C.4=C.6, and D.1=D.2. In GTS's models, the babies consider A.1 vs A.2 and B.1 vs B.2; they assume C.4 and D.1. We do not see any principled reason, from the babies' point of view, why any of the other alternatives should a priori be considered less likely, let alone considered impossible.

One can also wonder about the independence of the assumptions between categories, aside from the logical constraints. In particular, if the distinction between the two phases of the experiment is not very clear to the babies (and it is not obvious that it would be), then they might assign a higher probability to the combinations where the same rule is being used in both phases (i.e., either B.1 and C.1 or B.2 and C.2), and they might assign a higher probability to D.2, which applies uniformly to both phases, than to D.1, which creates an arbitrary distinction between the two phases.

A Bayesian model is a non-empty set of hypotheses; for instance, the model used in GTS is the set { [A1,B1,C1,D1], [A1,B1,C4,D1], [A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2], [A2,B2,C1,D1], [A2,B2,C1,D2], [A2,B2,C3,D1] }. In principle, therefore, there could be as many as 2⁴³-1 different Bayesian models (about 8.8 trillion) that a theorist might consider; however, most of these are entirely unmotivated and quite implausible. A reasonable Bayesian model, let us say, is one where one considers one possible set of choices for A, one for B, and so on. To compute a lower bound on the size of this collection, let us consider only hypothesis spaces in which A.3 and D.1 are included. If A.3 and D.1 are true, then all the hypotheses in B and C are distinct and consistent. Therefore, any set of hypotheses of the form "[some subset of the A's containing A.3] and [some non-empty subset of the B's] and [some non-empty subset of the C's] and [some subset of the D's containing D.1]" is a reasonable Bayesian model, and these are all distinct. There are therefore more than 4 * 15 * 63 * 2 = 7560 reasonable distinct Bayesian models. There are also additional models, either not including A.3 or not including D.1; but the number of these is probably quite small.
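
The arithmetic in this paragraph can be checked by brute-force enumeration of the subsets involved. This is only a sanity check of the counts; the helper name is ours.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of items (including the empty set) as a tuple."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)

A = ["A1", "A2", "A3"]
B = ["B1", "B2", "B3", "B4"]
C = ["C1", "C2", "C3", "C4", "C5", "C6"]
D = ["D1", "D2"]

a_choices = [s for s in subsets(A) if "A3" in s]  # must contain A.3
b_choices = [s for s in subsets(B) if s]          # non-empty
c_choices = [s for s in subsets(C) if s]          # non-empty
d_choices = [s for s in subsets(D) if "D1" in s]  # must contain D.1

lower_bound = len(a_choices) * len(b_choices) * len(c_choices) * len(d_choices)
upper_bound = 2 ** 43 - 1  # non-empty subsets of the 43 listed possibilities

print(lower_bound, upper_bound)
```

The four factors come out as 4, 15, 63, and 2, giving 7560, and 2⁴³-1 is 8,796,093,022,207.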

Reference

Gweon, H., Tenenbaum, J. B., & Schulz, L. E. (2010). Infants consider both the sample and the sampling process in inductive generalization. Proceedings of the National Academy of Sciences, USA, 107, 9066-9071.