Due: April 15.

Amy and Barbara are separately researching the question of what very rich people like to drink. They are both working from the same data set. An instance in this data set is a person. There are three attributes:

1. The make of the most expensive car the person owns.
2. Whether or not the person owns a yacht.
3. The person's favorite beverage.

Amy divides the beverages into two categories: non-alcoholic and everything
else. She applies Naive Bayes and concludes that, given that a person owns a
BMW and a yacht, the probability is more than 90 per cent that his/her
favorite drink is non-alcoholic.

Barbara divides the beverages into two categories: champagne and everything
else. She applies Naive Bayes in the same way. The result of her calculation
is that, given that a person owns a BMW and a yacht, the probability is more
than 90 per cent that his/her favorite drink is champagne.

Describe a data set that supports both of these conclusions. (The kind of "description" we're looking for would be something like, "There are 100 instances of people who own a BMW, own a yacht and drink champagne; 3 instances of people who don't own a BMW, own a yacht, and drink beer;" etc.)

**Answer:**
Let B be the event "Owns a BMW"; Y, the event "Owns a yacht"; N, the event
"Prefers a non-alcoholic drink"; C, the event "Prefers champagne"; and A, the
event "Prefers an alcoholic drink other than champagne."

Let s=Freq(B|N)*Freq(Y|N)*Freq(N),

t=Freq(B|~N)*Freq(Y|~N)*Freq(~N),

u=Freq(B|C)*Freq(Y|C)*Freq(C),

v=Freq(B|~C)*Freq(Y|~C)*Freq(~C).

Amy computes Prob(N|B,Y) = s/(s+t) > 0.9; therefore s > 9t.

Barbara computes Prob(C|B,Y) = u/(u+v) > 0.9; therefore u > 9v.

Of course, Freq(~N) > Freq(C) and Freq(~C) > Freq(N), so these priors work
against the desired conclusions and must be offset by a large gap in the
conditional frequencies. With a little experimentation, one can find the
following solution (not unique, of course, but characteristic).

Attributes | Number of instances |
---|---|
C,B,Y | 1 |
N,B,Y | 1 |
A,~B,~Y | 9 |

Then Freq(B|N) = Freq(Y|N) = Freq(B|C) = Freq(Y|C) = 1;

Freq(B|~N) = Freq(Y|~N) = Freq(B|~C) = Freq(Y|~C) = 1/10;

Freq(N) = Freq(C) = 1/11; and Freq(~N) = Freq(~C) = 10/11.

So s = u = 1\*1\*(1/11) = 1/11 and t = v = (1/10)\*(1/10)\*(10/11) = 1/110, so s/t = u/v = 10 > 9, as required. (With 8 instances of A,~B,~Y instead, the ratio would be exactly 9, making each probability exactly 0.9 rather than more than 0.9.)
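The arithmetic above can be double-checked numerically. The sketch below is illustrative, not part of the assignment: the function name `posterior` and the parameter `n_other` (the number of A,~B,~Y instances) are introduced here, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def posterior(n_other):
    """Naive Bayes estimate of Prob(N | B, Y) -- identical to Prob(C | B, Y)
    by symmetry -- for the data set with one (C,B,Y) instance, one (N,B,Y)
    instance, and n_other instances of (A,~B,~Y)."""
    total = 2 + n_other
    # Target class (N, or symmetrically C): a single instance that owns
    # both a BMW and a yacht, so Freq(B|class) = Freq(Y|class) = 1.
    s = Fraction(1, total)                     # 1 * 1 * Freq(class)
    # Complement class: n_other + 1 instances, exactly one of which owns
    # a BMW and a yacht.
    comp = n_other + 1
    t = Fraction(1, comp) * Fraction(1, comp) * Fraction(comp, total)
    return s / (s + t)

print(posterior(8))  # 9/10  -- exactly 0.9, just short of "more than 90%"
print(posterior(9))  # 10/11 -- strictly greater than 0.9
```

In general `posterior(n)` works out to (n+1)/(n+2), so any n ≥ 9 in the last row yields a probability strictly above 0.9.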

Consider the following data set, where P, Q, and R are the predictive attributes and C is the classification:

P | Q | R | C | Number of instances |
---|---|---|---|---|
Y | Y | 1 | N | 20 |
Y | Y | 2 | Y | 1 |
Y | Y | 3 | Y | 2 |
Y | N | 1 | Y | 8 |
Y | N | 2 | Y | 2 |
Y | N | 3 | Y | 0 |
N | Y | 1 | N | 12 |
N | Y | 2 | Y | 2 |
N | Y | 3 | Y | 1 |
N | N | 1 | Y | 25 |
N | N | 2 | Y | 1 |
N | N | 3 | Y | 4 |

Trace the execution of the ID3 algorithm, and show the decision tree that it outputs. At each stage, you should compute the average entropy AVG\_ENTROPY(A,C,T) for each attribute A. (The book calls this "Remainder(A)" (p. 660).)

**Answer:**

Let T0 be the entire table.

ENTROPY(C,ST(P,Y,T0)) = -((13/33) log(13/33) + (20/33) log(20/33)) = 0.9674

ENTROPY(C,ST(P,N,T0)) = -((33/45) log(33/45) + (12/45) log(12/45)) = 0.8368

AVG_ENTROPY(P,C,T0) = (33/78)\*0.9674 + (45/78)\*0.8368 = 0.8921

ENTROPY(C,ST(Q,Y,T0)) = -((32/38) log(32/38) + (6/38) log(6/38)) = 0.6293

ENTROPY(C,ST(Q,N,T0)) = -((40/40) log(40/40) + (0/40) log(0/40)) = 0

AVG_ENTROPY(Q,C,T0) = (38/78)\*0.6293 + (40/78)\*0 = 0.3065

ENTROPY(C,ST(R,1,T0)) = -((32/65) log(32/65) + (33/65) log(33/65)) = 0.99987

ENTROPY(C,ST(R,2,T0)) = -((6/6) log(6/6) + (0/6) log(0/6)) = 0

ENTROPY(C,ST(R,3,T0)) = -((7/7) log(7/7) + (0/7) log(0/7)) = 0

AVG_ENTROPY(R,C,T0) = (65/78)\*0.99987 + (6/78)\*0 + (7/78)\*0 = 0.8332

ENTROPY(C,T0) = -((32/78) log(32/78) + (46/78) log(46/78)) = 0.9767

Q has the smallest average entropy, so decide to split T0 on attribute Q. Let T1 = SUBTABLE(Q,Y,T0) and T2 = SUBTABLE(Q,N,T0).

In T1:

ENTROPY(C,ST(P,Y,T1)) = -((20/23) log(20/23) + (3/23) log(3/23)) = 0.5587

ENTROPY(C,ST(P,N,T1)) = -((12/15) log(12/15) + (3/15) log(3/15)) = 0.7219

AVG_ENTROPY(P,C,T1) = (23/38)\*0.5587 + (15/38)\*0.7219 = 0.6231

ENTROPY(C,ST(R,1,T1)) = -((32/32) log(32/32) + (0/32) log(0/32)) = 0

ENTROPY(C,ST(R,2,T1)) = -((3/3) log(3/3) + (0/3) log(0/3)) = 0

ENTROPY(C,ST(R,3,T1)) = -((3/3) log(3/3) + (0/3) log(0/3)) = 0

AVG_ENTROPY(R,C,T1) = 0

So split T1 on R; each resulting subtable is pure in C and becomes a leaf.

C is Y on all instances in T2, so T2 falls under base case 3: do not split it.

Final tree:

    Q ---- (Y) ---- R ---- (1) ---- C=N
    |               |
    |               |-- (2) ---- C=Y
    |               |
    |               |-- (3) ---- C=Y
    |
    |-- (N) ---- C=Y
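The entropy bookkeeping above is easy to get wrong by hand, so it can be reproduced with a short script. This is a sketch: the names `ROWS`, `COL`, `entropy`, and `avg_entropy` are introduced here, with `avg_entropy` playing the role of AVG_ENTROPY (the book's Remainder).

```python
from math import log2

# (P, Q, R, C, number of instances), copied from the table above.
ROWS = [("Y", "Y", 1, "N", 20), ("Y", "Y", 2, "Y", 1), ("Y", "Y", 3, "Y", 2),
        ("Y", "N", 1, "Y", 8),  ("Y", "N", 2, "Y", 2), ("Y", "N", 3, "Y", 0),
        ("N", "Y", 1, "N", 12), ("N", "Y", 2, "Y", 2), ("N", "Y", 3, "Y", 1),
        ("N", "N", 1, "Y", 25), ("N", "N", 2, "Y", 1), ("N", "N", 3, "Y", 4)]

COL = {"P": 0, "Q": 1, "R": 2, "C": 3}

def entropy(rows):
    """Entropy of the classification C over a weighted set of rows."""
    total = sum(r[4] for r in rows)
    if total == 0:
        return 0.0
    ps = [sum(r[4] for r in rows if r[3] == v) / total for v in ("Y", "N")]
    return -sum(p * log2(p) for p in ps if p > 0)

def avg_entropy(attr, rows):
    """Remainder(attr): weighted average entropy after splitting on attr."""
    i = COL[attr]
    total = sum(r[4] for r in rows)
    result = 0.0
    for v in sorted({r[i] for r in rows}):
        sub = [r for r in rows if r[i] == v]
        result += sum(r[4] for r in sub) / total * entropy(sub)
    return result

for a in ("P", "Q", "R"):
    print(a, round(avg_entropy(a, ROWS), 4))
# P ~ 0.892, Q ~ 0.307, R ~ 0.833 -- Q minimizes the remainder, so ID3
# splits the full table on Q first.
T1 = [r for r in ROWS if r[1] == "Y"]
print(round(avg_entropy("R", T1), 4))  # 0.0 -- so T1 is split on R next
```

(Exact values differ from the hand trace in the fourth decimal place because the trace rounds intermediate entropies.)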