Problem Set 8

Assigned: Nov. 28
Due: Dec. 12

Let D be a data set with three predictive attributes, P, Q, and R, and one classification attribute, C. Attributes P, Q, and R are Boolean. Attribute C has three values: 1, 2, and 3. There are 40 instances in total. The data is as follows:

Line P Q R C Number of instances
1. Y Y Y 1 2
2. Y Y Y 2 6
3. Y Y Y 3 2
4. Y Y N 1 2
5. Y Y N 2 6
6. Y Y N 3 0
7. Y N Y 1 3
8. Y N Y 2 0
9. Y N Y 3 1
10. Y N N 1 0
11. Y N N 2 0
12. Y N N 3 0
13. N Y Y 1 0
14. N Y Y 2 5
15. N Y Y 3 1
16. N Y N 1 0
17. N Y N 2 0
18. N Y N 3 0
19. N N Y 1 7
20. N N Y 2 1
21. N N Y 3 0
22. N N N 1 0
23. N N N 2 1
24. N N N 3 3

Problem 1:

Suppose that one computes the "nearest neighbors" prediction as follows: Given a new instance X, collect all the instances that agree with X on all three attributes and give each instance 2 votes, and collect all instances that agree with X on two attributes and give each instance 1 vote. What is the prediction for P=Y, Q=N, R=N? What is the prediction for P=Y, Q=N, R=Y?
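The voting scheme described above can be sketched in Python as a way to check hand calculations. This is only an illustration: the names COUNTS, TABLE, votes, and predict are hypothetical, and the sketch assumes that the predicted class is the one with the most votes, with no special handling of ties.

```python
from collections import Counter
from itertools import product

# Instance counts copied from the table, in line order 1..24
# (classes 1, 2, 3 for each P/Q/R combination).
COUNTS = [2, 6, 2,  2, 6, 0,  3, 0, 1,  0, 0, 0,
          0, 5, 1,  0, 0, 0,  7, 1, 0,  0, 1, 3]

# Rebuild the rows as ((P, Q, R), C, n).
# product("YN", repeat=3) yields YYY, YYN, YNY, YNN, NYY, NYN, NNY, NNN,
# which matches the order of the table.
KEYS = [(pqr, c) for pqr in product("YN", repeat=3) for c in (1, 2, 3)]
TABLE = [(pqr, c, n) for (pqr, c), n in zip(KEYS, COUNTS)]


def votes(x):
    """Return a Counter mapping class -> total votes for query x = (P, Q, R)."""
    tally = Counter()
    for pqr, c, n in TABLE:
        agree = sum(a == b for a, b in zip(pqr, x))
        if agree == 3:
            tally[c] += 2 * n   # full match: 2 votes per instance
        elif agree == 2:
            tally[c] += n       # match on two attributes: 1 vote per instance
    return tally


def predict(x):
    # Assumption: the class with the most votes wins; ties are not
    # resolved in any principled way here.
    return votes(x).most_common(1)[0][0]
```

For example, votes(("N", "Y", "Y")) returns the per-class vote totals for the instance P=N, Q=Y, R=Y, and predict(("N", "Y", "Y")) returns the winning class.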

Problem 2:

What rule is output by the 1R algorithm? What is its prediction for a new instance with P=Y,Q=N,R=N?

Problem 3:

What is the prediction of the Naive Bayes algorithm for a new instance with P=Y,Q=Y,R=Y? What is its prediction for P=Y,Q=N,R=N?

Problem 4:

What is the root node in the decision tree output by the ID3 algorithm? In computing the root node, what is the value of AVG_ENTROPY(A,C,T) for each predictive attribute A? (The book calls this "Remainder(A)" (p. 660).) When the top-level call to ID3 calls itself recursively, what are the sub-tables that are passed as parameters? (It suffices to describe these in terms of the line numbers and attributes from the original table; you do not have to show the whole table.)
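As an aid for checking the arithmetic, here is a minimal Python sketch of the average-entropy computation. It assumes the usual definition: for attribute A, split the table by the value of A, compute the base-2 entropy of C within each part, and average those entropies weighted by the fraction of instances in each part. The names COUNTS, TABLE, entropy, and avg_entropy are hypothetical, and the signature differs from AVG_ENTROPY(A,C,T) in that the classification attribute is fixed to C.

```python
from itertools import product
from math import log2

ATTRS = ("P", "Q", "R")

# Instance counts copied from the table, in line order 1..24
# (classes 1, 2, 3 for each P/Q/R combination).
COUNTS = [2, 6, 2,  2, 6, 0,  3, 0, 1,  0, 0, 0,
          0, 5, 1,  0, 0, 0,  7, 1, 0,  0, 1, 3]

# Rows as ((P, Q, R), C, n); product("YN", repeat=3) matches the table order.
KEYS = [(pqr, c) for pqr in product("YN", repeat=3) for c in (1, 2, 3)]
TABLE = [(pqr, c, n) for (pqr, c), n in zip(KEYS, COUNTS)]


def entropy(class_counts):
    """Base-2 entropy of a distribution given as raw counts (zeros are skipped)."""
    total = sum(class_counts)
    return -sum(n / total * log2(n / total) for n in class_counts if n > 0)


def avg_entropy(attr, table):
    """Weighted average entropy of C after splitting table on attr."""
    i = ATTRS.index(attr)
    total = sum(n for _, _, n in table)
    result = 0.0
    for value in "YN":
        # Count instances of each class among rows where attr == value.
        by_class = {}
        for pqr, c, n in table:
            if pqr[i] == value:
                by_class[c] = by_class.get(c, 0) + n
        size = sum(by_class.values())
        if size:  # skip empty parts to avoid dividing by zero
            result += size / total * entropy(by_class.values())
    return result
```

Calling avg_entropy("P", TABLE), avg_entropy("Q", TABLE), and avg_entropy("R", TABLE) gives the three values to compare. Because avg_entropy takes the table as a parameter, it can also be applied to a filtered list of rows standing in for a sub-table.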