Problem Set 8
Assigned: Nov. 28
Due: Dec. 12
Let D be a data set with three predictive attributes, P, Q, and R, and one
classification attribute, C. Attributes P, Q, and R are Boolean.
Attribute C has three values:
1, 2, and 3. There are 40 instances in total. The data is as follows:
Line  P  Q  R  C  Number of instances
 1.   Y  Y  Y  1  2
 2.   Y  Y  Y  2  6
 3.   Y  Y  Y  3  2
 4.   Y  Y  N  1  2
 5.   Y  Y  N  2  6
 6.   Y  Y  N  3  0
 7.   Y  N  Y  1  3
 8.   Y  N  Y  2  0
 9.   Y  N  Y  3  1
10.   Y  N  N  1  0
11.   Y  N  N  2  0
12.   Y  N  N  3  0
13.   N  Y  Y  1  0
14.   N  Y  Y  2  5
15.   N  Y  Y  3  1
16.   N  Y  N  1  0
17.   N  Y  N  2  0
18.   N  Y  N  3  0
19.   N  N  Y  1  7
20.   N  N  Y  2  1
21.   N  N  Y  3  0
22.   N  N  N  1  0
23.   N  N  N  2  1
24.   N  N  N  3  3


Problem 1:
Suppose that one computes the "nearest neighbors"
prediction as follows: Given a new instance X, collect all the instances
that agree with X on all three attributes and give each instance 2 votes,
and collect all instances that agree with X on two attributes and
give each instance 1 vote. What is the prediction for P=Y, Q=N, R=N?
What is the prediction for P=Y, Q=N, R=Y?
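
The voting scheme described in Problem 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not a prescribed solution: "agree on two attributes" is read as agreeing on exactly two, ties are broken in favor of the lowest class number, and the names COUNTS, votes, and predict are hypothetical. The table is encoded as a dictionary from the attribute values (P, Q, R) to the instance counts for C = 1, 2, 3.

```python
# Instance counts from the table: the key gives the values of P, Q, R;
# the tuple gives the number of instances with C = 1, 2, 3.
COUNTS = {
    "YYY": (2, 6, 2), "YYN": (2, 6, 0), "YNY": (3, 0, 1), "YNN": (0, 0, 0),
    "NYY": (0, 5, 1), "NYN": (0, 0, 0), "NNY": (7, 1, 0), "NNN": (0, 1, 3),
}

def votes(x):
    """Tally the votes per class for a query x such as "YNN"."""
    tally = {1: 0, 2: 0, 3: 0}
    for attrs, per_class in COUNTS.items():
        matches = sum(a == b for a, b in zip(attrs, x))
        # Assumption: 2 votes for agreement on all three attributes,
        # 1 vote for agreement on exactly two, 0 votes otherwise.
        weight = {3: 2, 2: 1}.get(matches, 0)
        for c, n in enumerate(per_class, start=1):
            tally[c] += weight * n
    return tally

def predict(x):
    """Return the class with the most votes (assumed tie-break: lowest class)."""
    tally = votes(x)
    return max(tally, key=tally.get)

if __name__ == "__main__":
    for query in ("YNN", "YNY"):
        print(query, votes(query), predict(query))
```

Under a different reading of the scheme (for example, if exact matches also receive the 1-vote share), only the weight line would change.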
Problem 2:
What rule is output by the 1R algorithm? What is its prediction for
a new instance with P=Y, Q=N, R=N?
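
One way to check a hand computation for Problem 2 is the following minimal sketch of 1R, under the usual formulation: for each attribute, map each value to its majority class, count the training errors of that rule, and keep the attribute with the fewest errors. The names COUNTS, class_counts, and one_r are hypothetical, and tie-breaking (lowest class number, first attribute in the order P, Q, R) is an assumption.

```python
# Instance counts from the table: the key gives the values of P, Q, R;
# the tuple gives the number of instances with C = 1, 2, 3.
COUNTS = {
    "YYY": (2, 6, 2), "YYN": (2, 6, 0), "YNY": (3, 0, 1), "YNN": (0, 0, 0),
    "NYY": (0, 5, 1), "NYN": (0, 0, 0), "NNY": (7, 1, 0), "NNN": (0, 1, 3),
}
ATTRS = "PQR"

def class_counts(attr_index, value):
    """Class counts [C=1, C=2, C=3] over instances with the given attribute value."""
    totals = [0, 0, 0]
    for attrs, per_class in COUNTS.items():
        if attrs[attr_index] == value:
            for i, n in enumerate(per_class):
                totals[i] += n
    return totals

def one_r():
    """Return (attribute name, rule, training errors) for the best one-attribute rule."""
    best = None
    for i, name in enumerate(ATTRS):
        rule, errors = {}, 0
        for value in "YN":
            totals = class_counts(i, value)
            # Assumption: ties between classes go to the lowest class number.
            rule[value] = totals.index(max(totals)) + 1
            errors += sum(totals) - max(totals)
        # Strict "<" keeps the earlier attribute when error counts tie.
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best

if __name__ == "__main__":
    name, rule, errors = one_r()
    query = "YNN"
    print(name, rule, errors, rule[query[ATTRS.index(name)]])
```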
Problem 3:
What is the prediction of the Naive Bayes algorithm for a new instance with
P=Y, Q=Y, R=Y? What is its prediction for P=Y, Q=N, R=N?
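
The Naive Bayes computation in Problem 3 can be sketched as below. The sketch assumes plain frequency estimates with no smoothing (the problem does not say which is intended), so the score for class c is Prob(C=c) * Prob(P=p|C=c) * Prob(Q=q|C=c) * Prob(R=r|C=c). Exact fractions are used so that close scores are compared without rounding error. The names COUNTS, naive_bayes_scores, and predict are hypothetical.

```python
from fractions import Fraction

# Instance counts from the table: the key gives the values of P, Q, R;
# the tuple gives the number of instances with C = 1, 2, 3.
COUNTS = {
    "YYY": (2, 6, 2), "YYN": (2, 6, 0), "YNY": (3, 0, 1), "YNN": (0, 0, 0),
    "NYY": (0, 5, 1), "NYN": (0, 0, 0), "NNY": (7, 1, 0), "NNN": (0, 1, 3),
}

def naive_bayes_scores(x):
    """Unnormalized Naive Bayes score per class for a query x such as "YYY"."""
    class_totals = [sum(pc[c] for pc in COUNTS.values()) for c in range(3)]
    n = sum(class_totals)
    scores = {}
    for c in range(3):
        score = Fraction(class_totals[c], n)  # prior Prob(C = c+1)
        for i, value in enumerate(x):
            # Number of class-c instances whose i-th attribute equals value.
            agree = sum(pc[c] for attrs, pc in COUNTS.items() if attrs[i] == value)
            # Assumption: no smoothing, so a zero count gives a zero score.
            score *= Fraction(agree, class_totals[c])
        scores[c + 1] = score
    return scores

def predict(x):
    """Return the class with the highest score (assumed tie-break: lowest class)."""
    scores = naive_bayes_scores(x)
    return max(scores, key=scores.get)

if __name__ == "__main__":
    for query in ("YYY", "YNN"):
        scores = naive_bayes_scores(query)
        print(query, {c: float(s) for c, s in scores.items()}, predict(query))
```

With Laplace or another form of smoothing, the conditional estimates would change, and closely matched scores could come out in a different order.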
Problem 4:
What is the root node in the decision tree output by the ID3 algorithm?
In computing the root node,
what is the value of AVG_ENTROPY(A,C,T) for each predictive attribute A?
(The book calls this "Remainder(A)" (p. 660).) When the top-level
call to ID3 calls itself recursively, what are the subtables that
are passed as parameters? (It suffices to describe these in terms
of the line numbers and attributes from the original table; you do
not have to show the whole table.)
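
The quantity AVG_ENTROPY(A,C,T) in Problem 4 can be computed with a short sketch like the one below. It assumes the standard definition: split the table T on the values of A, compute the entropy (in bits) of C within each part, and average those entropies weighted by the fraction of instances in each part. ID3 picks the attribute with the smallest value as the root. The names COUNTS, entropy, and avg_entropy are hypothetical.

```python
from math import log2

# Instance counts from the table: the key gives the values of P, Q, R;
# the tuple gives the number of instances with C = 1, 2, 3.
COUNTS = {
    "YYY": (2, 6, 2), "YYN": (2, 6, 0), "YNY": (3, 0, 1), "YNN": (0, 0, 0),
    "NYY": (0, 5, 1), "NYN": (0, 0, 0), "NNY": (7, 1, 0), "NNN": (0, 1, 3),
}
ATTRS = "PQR"

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Classes with zero count contribute nothing (0 * log 0 is taken as 0).
    return -sum(n / total * log2(n / total) for n in counts if n > 0)

def avg_entropy(attr_index):
    """Weighted average entropy of C after splitting on the given attribute."""
    n = sum(sum(pc) for pc in COUNTS.values())
    result = 0.0
    for value in "YN":
        totals = [
            sum(pc[c] for attrs, pc in COUNTS.items() if attrs[attr_index] == value)
            for c in range(3)
        ]
        result += sum(totals) / n * entropy(totals)
    return result

if __name__ == "__main__":
    values = {name: avg_entropy(i) for i, name in enumerate(ATTRS)}
    for name, value in values.items():
        print(name, round(value, 4))
    print("root:", min(values, key=values.get))
```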