## Problem Set 5

Assigned: April 1
Due: April 15

### Problem 1

Machine learning algorithms can be very sensitive to how values of attributes are grouped together. Naive Bayes is one of the most sensitive.

Amy and Barbara are separately researching the question of what very rich people like to drink. They are both working from the same data set. An instance in this data set is a person. There are three attributes:

1. The make of the most expensive car the person owns.
2. Whether or not the person owns a yacht.
3. The person's favorite beverage.

Amy divides the beverages into two categories: alcoholic and non-alcoholic. She then applies the Naive Bayes algorithm to predict the category of drink for an individual with the predictive values "Owns a BMW" and "Owns a yacht." The result of her calculation is that, given that a person owns a BMW and a yacht, the probability is more than 90 per cent that his/her favorite drink is non-alcoholic.

Barbara divides the beverages into two categories: champagne and everything else. She applies Naive Bayes in the same way. The result of her calculation is that, given that a person owns a BMW and a yacht, the probability is more than 90 per cent that his/her favorite drink is champagne.

Describe a data set that supports both of these conclusions. (The kind of "description" you're looking for would be something like, "There are 100 instances of people who own a BMW, own a yacht and drink champagne; 3 instances of people who don't own a BMW, own a yacht, and drink beer;" etc.)

Answer: Let B be the event "Owns a BMW"; Y, the event "Owns a yacht"; N, the event "Prefers non-alcoholic"; C, the event "Prefers champagne"; and A, the event "Prefers an alcoholic drink other than champagne."

Let s=Freq(B|N)*Freq(Y|N)*Freq(N),
t=Freq(B|~N)*Freq(Y|~N)*Freq(~N),
u=Freq(B|C)*Freq(Y|C)*Freq(C),
v=Freq(B|~C)*Freq(Y|~C)*Freq(~C).

Amy computes Prob(N|B,Y) = s/(s+t) > 0.9; therefore s > 9t.
Barbara computes Prob(C|B,Y) = u/(u+v) > 0.9; therefore u > 9v.
Of course, Freq(~N) > Freq(C) and Freq(~C) > Freq(N), so we have to make up for these gaps by a large difference in the relative frequencies. With a little experimentation, one can find the following solution (not unique, of course, but characteristic):

Attributes   Number of instances
C,B,Y        1
N,B,Y        1
A,~B,~Y      9

and the remaining rows are 0.

Then Freq(B|N) = Freq(Y|N) = Freq(B|C) = Freq(Y|C) = 1;
Freq(B|~N) = Freq(Y|~N) = Freq(B|~C) = Freq(Y|~C) = 1/10;
Freq(N) = Freq(C) = 1/11; and Freq(~N) = Freq(~C) = 10/11.

So s = u = 1*1*(1/11) = 1/11 and t = v = (1/10)*(1/10)*(10/11) = 1/110, so s/t = u/v = 10 > 9, as desired; hence Prob(N|B,Y) = Prob(C|B,Y) = 10/11 > 0.9.
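This arithmetic can be checked mechanically. Below is a short Python sketch (the function name `nb_posterior` is ours, not part of the assignment) that computes the Naive Bayes posterior with exact fractions for data sets of this shape: one (C,B,Y) instance, one (N,B,Y) instance, and k instances of (A,~B,~Y):

```python
from fractions import Fraction

def nb_posterior(k):
    """Naive Bayes posterior Prob(N | B,Y) for a data set with one
    (C,B,Y) instance, one (N,B,Y) instance, and k (A,~B,~Y) instances.
    By symmetry, Prob(C | B,Y) comes out the same."""
    total = k + 2
    # s = Freq(B|N) * Freq(Y|N) * Freq(N): the single N instance owns both.
    s = Fraction(1) * Fraction(1) * Fraction(1, total)
    # t = Freq(B|~N) * Freq(Y|~N) * Freq(~N): of the k+1 instances in ~N,
    # only the (C,B,Y) one owns a BMW, and only it owns a yacht.
    t = Fraction(1, k + 1) ** 2 * Fraction(k + 1, total)
    return s / (s + t)

for k in (8, 9, 20):
    print(k, nb_posterior(k))
```

The posterior simplifies to (k+1)/(k+2): it equals exactly 9/10 with eight (A,~B,~Y) instances and strictly exceeds 0.9 from nine on, which is why a strict 90-per-cent threshold needs at least nine.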

### Problem 2

A. Let D be a data set with three predictive attributes, P, Q, and R, and one classification attribute, C. Attributes P, Q, and C are Boolean. Attribute R has three values: 1, 2, and 3. The data is as follows:

| P | Q | R | C | Number of instances |
|---|---|---|---|---------------------|
| Y | Y | 1 | N | 20 |
| Y | Y | 2 | Y | 1 |
| Y | Y | 3 | Y | 2 |
| Y | N | 1 | Y | 8 |
| Y | N | 2 | Y | 2 |
| Y | N | 3 | Y | 0 |
| N | Y | 1 | N | 12 |
| N | Y | 2 | Y | 2 |
| N | Y | 3 | Y | 1 |
| N | N | 1 | Y | 25 |
| N | N | 2 | Y | 1 |
| N | N | 3 | Y | 4 |

Trace the execution of the ID3 algorithm, and show the decision tree that it outputs. At each stage, you should compute the average entropy AVG\_ENTROPY(A,C,T) for each attribute A. (The book calls this "Remainder(A)" (p. 660).)

```
Let T0 be the entire table.

ENTROPY(C,ST(P,Y,T0))= -(13/33 log(13/33) + 20/33 log(20/33)) = 0.9674
ENTROPY(C,ST(P,N,T0))= -(33/45 log(33/45) + 12/45 log(12/45)) = 0.8368
AVG_ENTROPY(P,C,T0) = (33/78)*0.9674 + (45/78)*0.8368 = 0.8921

ENTROPY(C,ST(Q,Y,T0))= -(32/38 log(32/38) + 6/38 log(6/38)) = 0.6293
ENTROPY(C,ST(Q,N,T0))= -(40/40 log(40/40) + 0/40 log(0/40)) = 0.
AVG_ENTROPY(Q,C,T0) = (38/78) * 0.6293 + (40/78) * 0 =  0.3065

ENTROPY(C,ST(R,1,T0))= -(32/65 log(32/65) + 33/65 log(33/65)) = 0.9998
ENTROPY(C,ST(R,2,T0))= -(6/6 log(6/6) + 0/6 log(0/6)) = 0.
ENTROPY(C,ST(R,3,T0))= -(7/7 log(7/7) + 0/7 log(0/7)) = 0.
AVG_ENTROPY(R,C,T0) = (65/78) * 0.9998 + (6/78) * 0 + (7/78) * 0 = 0.8332

ENTROPY(C,T0) = -((32/78) log(32/78) + (46/78) log(46/78)) = 0.9767

Decide to split T0 on attribute Q.

Let T1=SUBTABLE(Q,Y,T0), T2=SUBTABLE(Q,N,T0).

ENTROPY(C,ST(P,Y,T1)) = -((20/23) log(20/23) + (3/23) log(3/23)) = 0.5587
ENTROPY(C,ST(P,N,T1)) = -((12/15) log(12/15) + (3/15) log(3/15)) = 0.7219
AVG_ENTROPY(P,C,T1) = (23/38) * 0.5587  + (15/38) * 0.7219      =  0.6231

ENTROPY(C,ST(R,1,T1)) = -((32/32) log(32/32) + (0/32) log(0/32)) = 0
ENTROPY(C,ST(R,2,T1)) = -((3/3) log(3/3) + (0/3) log(0/3)) = 0
ENTROPY(C,ST(R,3,T1)) = -((3/3) log(3/3) + (0/3) log(0/3)) = 0
AVG_ENTROPY(R,C,T1) = 0.

Split T1 on R.

C is Y on all instances in T2, so it is within base case 3. Do not split.

Final tree:

Q ---- (Y) ---- R ---- (1) ---- C=N
|               |
|               |-- (2) ---- C=Y
|               |
|               |-- (3) ---- C=Y
|
|-- (N) ---- C=Y
```
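The entropy bookkeeping in this trace can also be verified with a short script. Here is a hypothetical Python sketch (the helper names `entropy`, `subtable`, and `avg_entropy` are ours) that recomputes AVG_ENTROPY for each attribute directly from the table:

```python
from collections import Counter
from math import log2

# The 12-row table above; each tuple is (P, Q, R, C, number of instances).
DATA = [
    ("Y","Y",1,"N",20), ("Y","Y",2,"Y",1),  ("Y","Y",3,"Y",2),
    ("Y","N",1,"Y",8),  ("Y","N",2,"Y",2),  ("Y","N",3,"Y",0),
    ("N","Y",1,"N",12), ("N","Y",2,"Y",2),  ("N","Y",3,"Y",1),
    ("N","N",1,"Y",25), ("N","N",2,"Y",1),  ("N","N",3,"Y",4),
]
ATTR = {"P": 0, "Q": 1, "R": 2, "C": 3}

def entropy(rows):
    """ENTROPY(C, rows): entropy of the class label, weighted by counts."""
    counts = Counter()
    for row in rows:
        counts[row[3]] += row[4]
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values() if c > 0)

def subtable(attr, val, rows):
    """SUBTABLE(attr, val, rows): the rows on which attr takes value val."""
    return [r for r in rows if r[ATTR[attr]] == val]

def avg_entropy(attr, rows):
    """AVG_ENTROPY(attr, C, rows), a.k.a. Remainder(attr): the entropy of
    each subtable, weighted by the subtable's share of the instances."""
    n = sum(r[4] for r in rows)
    values = {r[ATTR[attr]] for r in rows}
    return sum(sum(r[4] for r in subtable(attr, v, rows)) / n
               * entropy(subtable(attr, v, rows)) for v in values)

for a in ("P", "Q", "R"):
    print(a, round(avg_entropy(a, DATA), 4))
```

Running it reproduces the ordering Q < R < P on the full table (so ID3 splits on Q first) and gives AVG_ENTROPY(R,C,T1) = 0 on the Q = Y subtable, matching the trace.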