Problem Set 6

Assigned: Apr. 5
Due: Apr. 26

Problem 1

Let A, B, C, D be Boolean random variables.

Given that

A and B are (absolutely) independent.
C is independent of B given A.
D is independent of C given A and B.
Prob(A=T) = 0.3
Prob(B=T) = 0.6
Prob(C=T|A=T) = 0.8
Prob(C=T|A=F) = 0.4
Prob(D=T|A=T,B=T) = 0.7
Prob(D=T|A=T,B=F) = 0.8
Prob(D=T|A=F,B=T) = 0.1
Prob(D=T|A=F,B=F) = 0.2
Compute the following quantities:
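
Any query over A, B, C, and D can be answered from the joint distribution. By the chain rule and the three independence statements above, the joint factorizes as Prob(A,B,C,D) = Prob(A) Prob(B) Prob(C|A) Prob(D|A,B). The following is a minimal Python sketch for checking hand calculations by brute-force enumeration. The helper names `bern`, `joint`, and `prob` are illustrative assumptions, not part of the assignment.

```python
from itertools import product

# Parameters copied from the problem statement.
P_A = 0.3                                   # Prob(A=T)
P_B = 0.6                                   # Prob(B=T)
P_C_GIVEN_A = {True: 0.8, False: 0.4}       # Prob(C=T | A)
P_D_GIVEN_AB = {(True, True): 0.7, (True, False): 0.8,
                (False, True): 0.1, (False, False): 0.2}  # Prob(D=T | A, B)


def bern(p_true, value):
    """Probability that a Boolean variable with Prob(T) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c, d):
    """Prob(A=a, B=b, C=c, D=d) = Prob(a) Prob(b) Prob(c|a) Prob(d|a,b)."""
    return (bern(P_A, a) * bern(P_B, b)
            * bern(P_C_GIVEN_A[a], c) * bern(P_D_GIVEN_AB[(a, b)], d))


def prob(query, given=None):
    """Prob(query | given); both are dicts such as {"C": True}."""
    given = given or {}
    num = den = 0.0
    # Enumerate all 16 assignments and sum the joint over the matching ones.
    for values in product([True, False], repeat=4):
        world = dict(zip("ABCD", values))
        if all(world[k] == v for k, v in given.items()):
            p = joint(*values)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den
```

For example, `prob({"D": True})` gives a marginal and `prob({"A": True}, {"D": True})` gives a conditional probability. Enumeration is feasible here only because there are four Boolean variables.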

Problem 2

Let D be a data set with three predictive attributes, P, Q, and R, and one classification attribute, C. Attributes P, Q, and C are Boolean. Attribute R has three values: 1, 2, and 3. The data is as follows:

P  Q  R  C  Number of instances
Y  Y  1  N    2
Y  Y  2  Y   10
Y  Y  3  Y   15
Y  N  1  Y   20
Y  N  2  N    3
Y  N  3  N    4
N  Y  1  N   40
N  Y  2  Y    6
N  Y  3  Y    5
N  N  1  Y    4
N  N  2  Y    2
N  N  3  Y    4

A. What rule is output by the 1R algorithm? What is its prediction for a new instance with P=Y,Q=Y,R=1?
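
As a way to check a hand computation for part A, here is a minimal Python sketch of 1R. For each attribute it predicts the majority class for each value, counts the misclassified instances, and keeps the attribute with the fewest errors. The names `DATA`, `ATTRS`, and `one_r` are illustrative, and breaking ties in favor of the first attribute examined is an assumption of this sketch.

```python
from collections import Counter, defaultdict

# (P, Q, R, C, number of instances), copied from the table above.
DATA = [
    ("Y", "Y", 1, "N", 2), ("Y", "Y", 2, "Y", 10), ("Y", "Y", 3, "Y", 15),
    ("Y", "N", 1, "Y", 20), ("Y", "N", 2, "N", 3), ("Y", "N", 3, "N", 4),
    ("N", "Y", 1, "N", 40), ("N", "Y", 2, "Y", 6), ("N", "Y", 3, "Y", 5),
    ("N", "N", 1, "Y", 4), ("N", "N", 2, "Y", 2), ("N", "N", 3, "Y", 4),
]
ATTRS = {"P": 0, "Q": 1, "R": 2}   # attribute name -> column index
CLASS_COL, COUNT_COL = 3, 4


def one_r(data):
    """Return (attribute, rule, errors) for the attribute with the fewest errors."""
    best = None
    for name, col in ATTRS.items():
        # Tally class counts separately for each value of this attribute.
        tallies = defaultdict(Counter)
        for row in data:
            tallies[row[col]][row[CLASS_COL]] += row[COUNT_COL]
        # Predict the majority class for each value; everything else is an error.
        rule = {v: c.most_common(1)[0][0] for v, c in tallies.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in tallies.items())
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best


attribute, rule, errors = one_r(DATA)
print(attribute, rule, errors)
```

The prediction for a new instance is then `rule[value]`, where `value` is the instance's value for the chosen attribute.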

B. What is the prediction of the Naive Bayes algorithm for a new instance with P=Y,Q=Y,R=1? What probability is assigned to the prediction?
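
For part B, a minimal Python sketch of Naive Bayes follows. It estimates Prob(C) and each Prob(attribute=value | C) by relative frequency, multiplies them, and normalizes over the two classes. It assumes no smoothing (for example, no Laplace correction) and assumes that the "probability assigned" means the normalized posterior; the course's convention may differ. The names `DATA`, `ATTRS`, and `naive_bayes` are illustrative.

```python
from collections import Counter

# (P, Q, R, C, number of instances), copied from the table above.
DATA = [
    ("Y", "Y", 1, "N", 2), ("Y", "Y", 2, "Y", 10), ("Y", "Y", 3, "Y", 15),
    ("Y", "N", 1, "Y", 20), ("Y", "N", 2, "N", 3), ("Y", "N", 3, "N", 4),
    ("N", "Y", 1, "N", 40), ("N", "Y", 2, "Y", 6), ("N", "Y", 3, "Y", 5),
    ("N", "N", 1, "Y", 4), ("N", "N", 2, "Y", 2), ("N", "N", 3, "Y", 4),
]
ATTRS = {"P": 0, "Q": 1, "R": 2}   # attribute name -> column index


def naive_bayes(data, instance):
    """Return (predicted class, normalized posterior for each class)."""
    class_totals = Counter()
    for *_, c, n in data:
        class_totals[c] += n
    total = sum(class_totals.values())

    scores = {}
    for c, n_c in class_totals.items():
        score = n_c / total                      # prior Prob(C=c)
        for name, value in instance.items():
            col = ATTRS[name]
            match = sum(row[4] for row in data
                        if row[3] == c and row[col] == value)
            score *= match / n_c                 # Prob(attr=value | C=c), unsmoothed
        scores[c] = score

    norm = sum(scores.values())
    posterior = {c: s / norm for c, s in scores.items()}
    return max(posterior, key=posterior.get), posterior


prediction, posterior = naive_bayes(DATA, {"P": "Y", "Q": "Y", "R": 1})
print(prediction, posterior)
```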

C. Trace the execution of the ID3 algorithm, and show the decision tree that it outputs. At each stage, you should compute the average entropy AVG_ENTROPY(A,C,T) for each attribute A. (The book calls this "Remainder(A)" (p. 660).)
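
For part C, the sketch below may help verify a hand trace. It takes AVG_ENTROPY(A,C,T) to be the entropy of C within each subset of T sharing a value of A, weighted by the fraction of instances in that subset, and it splits on the attribute that minimizes this quantity. The function names, the nested-dictionary tree representation, and the tie-breaking rule (the first attribute listed wins) are assumptions of this sketch.

```python
from collections import Counter
from math import log2

# (P, Q, R, C, number of instances), copied from the table above.
DATA = [
    ("Y", "Y", 1, "N", 2), ("Y", "Y", 2, "Y", 10), ("Y", "Y", 3, "Y", 15),
    ("Y", "N", 1, "Y", 20), ("Y", "N", 2, "N", 3), ("Y", "N", 3, "N", 4),
    ("N", "Y", 1, "N", 40), ("N", "Y", 2, "Y", 6), ("N", "Y", 3, "Y", 5),
    ("N", "N", 1, "Y", 4), ("N", "N", 2, "Y", 2), ("N", "N", 3, "Y", 4),
]
ATTRS = {"P": 0, "Q": 1, "R": 2}   # attribute name -> column index


def class_counts(rows):
    """Number of instances of each class in `rows`."""
    counts = Counter()
    for row in rows:
        counts[row[3]] += row[4]
    return counts


def entropy(rows):
    """Entropy (in bits) of the class attribute over `rows`."""
    counts = class_counts(rows)
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values() if n)


def avg_entropy(rows, attr):
    """Weighted average of the class entropy over the values of `attr`."""
    col = ATTRS[attr]
    total = sum(row[4] for row in rows)
    result = 0.0
    for value in {row[col] for row in rows}:
        subset = [row for row in rows if row[col] == value]
        result += sum(row[4] for row in subset) / total * entropy(subset)
    return result


def id3(rows, attrs, depth=0):
    """Build the tree as nested dicts, printing AVG_ENTROPY at each node."""
    counts = class_counts(rows)
    if len(counts) == 1 or not attrs:
        return counts.most_common(1)[0][0]       # leaf: the (majority) class
    scores = {a: avg_entropy(rows, a) for a in attrs}
    print("  " * depth + ", ".join(f"{a}: {s:.4f}" for a, s in scores.items()))
    best = min(scores, key=scores.get)           # lowest average entropy
    rest = [a for a in attrs if a != best]
    col = ATTRS[best]
    return {(best, value): id3([r for r in rows if r[col] == value], rest, depth + 1)
            for value in sorted({r[col] for r in rows})}


tree = id3(DATA, ["P", "Q", "R"])
print(tree)
```

Each key of the returned dictionary is an (attribute, value) pair, and each value is either a class label or a subtree of the same form.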