The Hypothesis Space in Gweon, Tenenbaum, and Schulz (2010)

Ernest Davis, Dept. of Computer Science, New York University
Gary Marcus, Dept. of Psychology, New York University

Most of this was written April 22, 2014; some additional thoughts were added later, as indicated below.

Gweon, Tenenbaum, and Schulz (2010) (henceforth GTS) carried out the following experiment: 15-month-old infants were shown a box containing blue balls and yellow balls. In condition 1 of the experiment, 3/4 of the balls were blue; in condition 2, 3/4 were yellow. In both conditions, the experimenter took out three blue balls in sequence and demonstrated that all three squeaked when squeezed (phase 1 of the experiment). The experimenter then took out an inert yellow ball and handed it to the baby (phase 2). The experimental finding was that, in condition 1, 80% of the babies squeezed the yellow ball to try to make it squeak, whereas in condition 2, only 33% of the babies squeezed the ball.

The explanation of this finding given by GTS is as follows. The babies are considering two possible hypotheses about the relation of color to squeakiness: Hypothesis A is that all balls squeak; hypothesis B is that all and only blue balls squeak. (The obvious third alternative, that only yellow balls squeak, is ruled out by the observation and can therefore be ignored.) Thus if A is true, then the yellow ball will squeak; if B is true, it will not. The babies are also considering two possible hypotheses about the experimenter's selection rule for the first three balls. Hypothesis C is that the experimenter is picking at random from the set of all balls; hypothesis D is that she is picking at random from the set of all balls that squeak.

It is assumed that A and B are independent of C and D, and that A, B, C, and D all have prior probability 1/2.
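The two-by-two space just described can be made concrete with a small calculation. The following is only a minimal sketch under simplifying assumptions that are ours, not necessarily those of GTS: balls are drawn with replacement, the three observed squeaks have probability 1 under every hypothesis consistent with the draws, and the function name `posterior_all_squeak` is hypothetical.

```python
from fractions import Fraction

def posterior_all_squeak(frac_blue, n_draws=3):
    """Posterior probability of hypothesis A (all balls squeak) after
    n_draws blue, squeaking balls, in the 2x2 space {A, B} x {C, D}."""
    # Probability of drawing n_draws blue balls when sampling from all
    # balls (with replacement: a simplifying assumption of this sketch).
    p_blue_run = Fraction(frac_blue) ** n_draws
    likelihood = {
        ("A", "C"): p_blue_run,   # all squeak; sample from all balls
        ("A", "D"): p_blue_run,   # all squeak, so squeaky balls = all balls
        ("B", "C"): p_blue_run,   # only blue squeak; sample from all balls
        ("B", "D"): Fraction(1),  # only blue squeak; squeaky balls = blue balls
    }
    prior = Fraction(1, 4)  # independent priors of 1/2 on each dimension
    joint = {h: prior * lik for h, lik in likelihood.items()}
    total = sum(joint.values())
    return sum(p for (a, _), p in joint.items() if a == "A") / total

print(float(posterior_all_squeak(Fraction(3, 4))))  # condition 1: mostly blue
print(float(posterior_all_squeak(Fraction(1, 4))))  # condition 2: mostly yellow
```

Under these assumptions the posterior for "all balls squeak" is 54/145 (about 0.37) in condition 1 and 2/67 (about 0.03) in condition 2, so the direction of the difference matches the experimental finding; the specific numbers depend entirely on the assumptions stated above.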

The model thus posits that the babies are considering a hypothesis space where there are two dimensions and two alternatives in each dimension.

This seems to us entirely arbitrary. From the point of view of the babies' observations, it would be just as plausible to posit four dimensions:

A. The relation between color and squeakiness.
B. The selection criterion for phase 1 of the experiment.
C. The selection criterion for phase 2 of the experiment.
D. The rule governing whether or not the experimenter squeezes the ball in phase 2. (The rule governing the experimenter's decision to squeeze in phase 1 turns out not to matter, since she always squeezes and it always squeaks.)

Moreover, in each category there are further hypotheses that are just as plausible as those that GTS consider.

Thus, it seems to us that the following Bayesian model would be motivated (we exclude hypotheses that are inconsistent with the observations).

Dimension A: Relation of squeak to color.
Hypotheses:

1. All balls squeak.
2. All and only blue balls squeak.
3. Balls squeak randomly (by default with probability 1/2).

Dimension B: Selection criterion in phase 1.
Hypotheses:

1. The experimenter chooses toys at random.
2. The experimenter chooses randomly among squeaky toys.
3. The experimenter chooses randomly among blue toys.
4. The experimenter chooses randomly among squeaky blue toys.

Dimension C: Selection criterion in phase 2.
Hypotheses:

1. The experimenter chooses a toy at random.
2. The experimenter chooses randomly among squeaky toys.
3. The experimenter chooses randomly among non-squeaky toys.
4. The experimenter chooses randomly among yellow toys.
5. The experimenter chooses randomly among squeaky yellow toys.
6. The experimenter chooses randomly among non-squeaky yellow toys.

Dimension D: Rule governing whether the experimenter squeezes the toy.
Hypotheses:

1. The experimenter always squeezes the toy in phase 1 and never squeezes in phase 2.
2. In both phases, the experimenter squeezes the toy only if it squeaks.

There could be additional options in D, e.g.:

3. If the toy in phase 2 squeaks, the experimenter squeezes it; otherwise she chooses randomly whether to squeeze it.

But that is arguably more complicated and therefore reasonably excluded.

There would thus be 144 combinations; however, not all of these are possible or distinct. In particular, assuming that the experimenter knows the truth of dimension A, there are the following logical constraints:

A.1 is inconsistent with C.3, C.6, and D.2.
If A.1 is true, B.1 and B.2 are identical; B.3 and B.4 are identical; C.1 and C.2 are identical; and C.4 and C.5 are identical.
A.2 is inconsistent with C.2 and C.5.
If A.2 is true, B.2, B.3, and B.4 are identical; and C.3, C.4, and C.6 are identical.
C.2 and C.5 are inconsistent with D.2.
If C.3 or C.6 is true, then D.1 and D.2 are identical.
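The raw count of 144 and the effect of the three inconsistency constraints listed above can be checked by brute-force enumeration. This is only a sketch: it encodes each hypothesis by its number within its dimension, the function name `consistent` is ours, and it deliberately does not attempt the further step of merging hypotheses that the constraints declare identical.

```python
from itertools import product

# Hypotheses are encoded by their number within each dimension.
A = range(1, 4)  # A.1 .. A.3
B = range(1, 5)  # B.1 .. B.4
C = range(1, 7)  # C.1 .. C.6
D = range(1, 3)  # D.1 .. D.2

def consistent(a, b, c, d):
    """Apply only the inconsistency constraints, not the identity ones."""
    if a == 1 and (c in (3, 6) or d == 2):
        return False  # A.1 is inconsistent with C.3, C.6, and D.2
    if a == 2 and c in (2, 5):
        return False  # A.2 is inconsistent with C.2 and C.5
    if c in (2, 5) and d == 2:
        return False  # C.2 and C.5 are inconsistent with D.2
    return True

all_combos = list(product(A, B, C, D))
consistent_combos = [h for h in all_combos if consistent(*h)]

print(len(all_combos))         # 144
print(len(consistent_combos))  # 88
```

The 88 surviving tuples are an intermediate figure: many of them describe the same situation under the identity constraints, so the number of logically distinct possibilities is smaller.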

There remain 43 logically distinct possibilities:

[A1,B1,C1,D1], [A1,B1,C4,D1], [A1,B3,C1,D1], [A1,B3,C4,D1],
[A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2], [A2,B2,C1,D1], [A2,B2,C1,D2], [A2,B2,C3,D1],
[A3,B*,C1,D1], [A3,B*,C1,D2], [A3,B*,C2,D1], [A3,B*,C3,D1], [A3,B*,C4,D1], [A3,B*,C4,D2], [A3,B*,C5,D1], [A3,B*,C6,D1]

It is by no means clear what is the best way to assign priors to these.

The truth is A.2, B.2=B.3=B.4, C.3=C.4=C.6, and D.1=D.2. In GTS's models, the babies consider A.1 vs A.2 and B.1 vs B.2; they assume C.4 and D.1. We do not see any principled reason, from the babies' point of view, why any of the other alternatives should a priori be considered less likely, let alone considered impossible.

One can also wonder about the independence of the assumptions between categories, aside from the logical constraints. In particular, if the distinction between the two phases of the experiment is not very clear to the babies --- and it is not obvious that it would be --- then they might assign a higher probability to the combinations where the same rule is being used in both phases (i.e. either B.1 and C.1 or B.2 and C.2), and they might assign a higher probability to D.2, which applies uniformly to both phases, than to D.1, which creates an arbitrary distinction between the two phases.

This paragraph added October 2014

A Bayesian model is a non-empty set of hypotheses; for instance the model used in GTS is the set { [A1,B1,C1,D1], [A1,B1,C4,D1], [A2,B1,C1,D1], [A2,B1,C1,D2], [A2,B1,C3,D1], [A2,B1,C3,D2], [A2,B2,C1,D1], [A2,B2,C1,D2], [A2,B2,C3,D1] }. In principle, therefore, there could be as many as 2⁴³-1 different Bayesian models (about 8.8 trillion) that a theorist might consider; however, most of these are entirely unmotivated and quite implausible. A reasonable Bayesian model, let us say, is one where one considers one possible set of choices for A, one for B, and so on. To compute a lower bound on the size of this collection, let us consider only hypothesis spaces in which A.3 and D.1 are included. If A.3 and D.1 are true, then all the hypotheses in B and C are distinct and consistent. Therefore, any set of hypotheses of the form "[some subset of the A's containing A.3] and [some non-empty subset of the B's] and [some non-empty subset of the C's] and [some subset of the D's containing D.1]" is a reasonable Bayesian model; and these are all distinct. There are 4 subsets of the A's containing A.3, 15 non-empty subsets of the B's, 63 non-empty subsets of the C's, and 2 subsets of the D's containing D.1. There are therefore more than 4 * 15 * 63 * 2 = 7560 reasonable distinct Bayesian models. There are also additional models, either not including A.3 or not including D.1; but the number of these is probably quite small.
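The counting in this paragraph is simple subset arithmetic, which the following sketch spells out. The helper names `nonempty_subsets` and `subsets_containing` are ours, introduced only for illustration.

```python
def nonempty_subsets(n_options):
    """Number of non-empty subsets of a set with n_options elements."""
    return 2 ** n_options - 1

def subsets_containing(n_options):
    """Number of subsets of n_options elements that include one fixed element."""
    return 2 ** (n_options - 1)

# Every non-empty set of the 43 distinct hypotheses is, in principle, a model.
total_models = nonempty_subsets(43)

# Lower bound on "reasonable" models: A must contain A.3, D must contain D.1,
# and the choices for B and C are arbitrary non-empty subsets.
lower_bound = (
    subsets_containing(3)    # subsets of {A.1, A.2, A.3} containing A.3: 4
    * nonempty_subsets(4)    # non-empty subsets of the B's: 15
    * nonempty_subsets(6)    # non-empty subsets of the C's: 63
    * subsets_containing(2)  # subsets of {D.1, D.2} containing D.1: 2
)

print(total_models)  # 8796093022207, about 8.8 trillion
print(lower_bound)   # 7560
```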

More models

This section added November 2015.

There are also other considerations that the baby might bring to bear. For instance, suppose we modify A.3 to a model in which some fraction of blue balls squeak and some other fraction of yellow balls squeak; these fractions are presumably known to adults, though not to the baby. A baby might conjecture that, since the people who put together boxes of toys for babies are clearly benevolent people, they should prefer to put in more toys of the squeakier color. Alternatively, the baby might conjecture that balls of the less common color are clearly "special", and therefore should be squeakier.

This can be incorporated into the above model as follows:

Change A.3 to "There is a fraction Y of yellow balls that are squeaky, and a fraction B of blue balls that are squeaky."
Add another dimension E, "Principle of constructing the box", with three options:
- E.1 The fraction of blue balls in the box is independent of Y and B.
- E.2 The box has more of the squeakier color (benevolent box filler).
- E.3 The box has fewer of the squeakier color (special balls).

To complete a probabilistic model, one would have to specify in A.3 probability distributions for Y and B and in E.2 and E.3 a specific dependence of the fraction of blue balls on Y and B. For instance, one could specify that Y and B are independent and uniformly distributed between 0 and 1; that in E.2 the fraction of blue balls in the box is B/(B+Y); and that in E.3 the fraction of blue balls in the box is Y/(B+Y).
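The example specification above can be written out as a small sampler. This is a hedged sketch only: the function names and the string labels "E1", "E2", "E3" are ours, and for E.1 the text says only that the fraction of blue balls is independent of Y and B, so drawing it uniformly from [0, 1) is an additional assumption made here for illustration.

```python
import random

def fraction_blue(rule, y, b):
    """Fraction of blue balls in the box, given the squeaky fractions
    y (yellow) and b (blue), under options E.2 and E.3."""
    if y + b == 0:
        raise ValueError("fraction is undefined when y + b == 0")
    if rule == "E2":   # benevolent box filler: more of the squeakier color
        return b / (b + y)
    if rule == "E3":   # special balls: fewer of the squeakier color
        return y / (b + y)
    raise ValueError(f"no formula for rule {rule!r}")

def sample_box(rule, rng):
    """Draw Y and B independently and uniformly, then the box composition."""
    y, b = rng.random(), rng.random()
    if rule == "E1":
        frac = rng.random()  # assumption: independent and uniform
    else:
        frac = fraction_blue(rule, y, b)
    return y, b, frac

rng = random.Random(0)
for rule in ("E1", "E2", "E3"):
    print(rule, sample_box(rule, rng))
```

For any given Y and B, the E.2 and E.3 fractions sum to 1, which reflects the fact that the two options are mirror images of each other.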

To estimate the number of Bayesian models in the new theory, note that if A.3 and D.1 are true, then there are no logical dependencies between any of the other hypotheses. Therefore, any set of hypotheses of the form "[some subset of the A's containing A.3] and [some non-empty subset of the B's] and [some non-empty subset of the C's] and [some subset of the D's containing D.1] and [some non-empty subset of the E's]" is a reasonable Bayesian model; and these are all distinct. There are therefore more than 4 * 15 * 63 * 2 * 7 = 52,920 reasonable distinct Bayesian models.

Reference

Gweon, H., Tenenbaum, J. B., & Schulz, L. E. (2010). Infants consider both the sample and the sampling process in inductive generalization. Proceedings of the National Academy of Sciences, USA, 107, 9066-9071.