## Problem Set 5

Assigned: April 6
Due: April 20

### Problem 1

Consider the following Bayesian network:

*[Figure omitted: a network over the Boolean variables A, B, C, D, E with edges A → C, B → C, B → D, and C → E, as reflected in the conditional probability tables below.]*

Assume that all the random variables are Boolean, and that you have the following probabilities:
- P(A=T) = 0.9, P(A=F) = 0.1
- P(B=T) = 0.5, P(B=F) = 0.5
- P(C=T | A=T, B=T) = 1.0, P(C=F | A=T, B=T) = 0.0
- P(C=T | A=T, B=F) = 0.8, P(C=F | A=T, B=F) = 0.2
- P(C=T | A=F, B=T) = 0.5, P(C=F | A=F, B=T) = 0.5
- P(C=T | A=F, B=F) = 0.1, P(C=F | A=F, B=F) = 0.9
- P(D=T | B=T) = 0.7, P(D=F | B=T) = 0.3
- P(D=T | B=F) = 0.2, P(D=F | B=F) = 0.8
- P(E=T | C=T) = 0.1, P(E=F | C=T) = 0.9
- P(E=T | C=F) = 0.8, P(E=F | C=F) = 0.2
Compute the following quantities:

- A. P(C=T)
- B. P(C=T | A=T)
- C. P(C=T, D=T)
- D. P(A=T | C=T)
- E. P(A=T | B=T, C=T)
- F. P(E=T | A=T)
- G. P(D=T | C=T)
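All of these can be checked mechanically by inference by enumeration: sum the joint distribution over the unobserved variables and normalize. Below is a minimal Python sketch, assuming the network structure A → C, B → C, B → D, C → E implied by the CPTs; the helper names (`joint`, `prob`) are ours, not part of the problem.

```python
from itertools import product

# CPTs from the problem statement. Conditional tables store P(X=T | parents);
# the complementary probability is 1 minus the stored value.
P_A = {True: 0.9, False: 0.1}
P_B = {True: 0.5, False: 0.5}
P_C = {(True, True): 1.0, (True, False): 0.8,   # P(C=T | A, B)
       (False, True): 0.5, (False, False): 0.1}
P_D = {True: 0.7, False: 0.2}                    # P(D=T | B)
P_E = {True: 0.1, False: 0.8}                    # P(E=T | C)

def joint(a, b, c, d, e):
    """P(A=a, B=b, C=c, D=d, E=e) via the chain rule on the network."""
    p = P_A[a] * P_B[b]
    p *= P_C[(a, b)] if c else 1 - P_C[(a, b)]
    p *= P_D[b] if d else 1 - P_D[b]
    p *= P_E[c] if e else 1 - P_E[c]
    return p

def prob(query, given=lambda w: True):
    """P(query | given): sum the joint over all worlds, then normalize."""
    worlds = [dict(zip("ABCDE", vals))
              for vals in product([True, False], repeat=5)]
    num = sum(joint(*[w[x] for x in "ABCDE"])
              for w in worlds if given(w) and query(w))
    den = sum(joint(*[w[x] for x in "ABCDE"])
              for w in worlds if given(w))
    return num / den

# Part A: P(C=T); the remaining parts follow the same pattern.
print(prob(lambda w: w["C"]))
# Part D: P(A=T | C=T)
print(prob(lambda w: w["A"], given=lambda w: w["C"]))
```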

### Problem 2

A standard method for applying an ML classification method designed for discrete data to a data set with numerical attributes is to discretize the values; that is, to divide the range of values into K intervals labeled with discrete values such as "High" and "Low" for K=2, or "High", "Medium", and "Low" for K=3, by choosing K-1 cutoff points. For instance, if the attribute is an integer between 0 and 100, one might decide that "Low" is 0-24, "Medium" is 25-50, and "High" is 51-100. Of course, this can be done for the predictive attributes, the classification attribute, or both. In parts A, B, and C, assume that the classification attribute is discrete and that we are concerned with discretizing one or more of the predictive attributes. In part D, the classification attribute is numeric and there are two Boolean predictive attributes.
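To make the mechanics concrete, here is a minimal sketch of discretization by fixed cutoffs, using the Low/Medium/High example above. Representing a K-way discretization as K-1 sorted cutoffs is one of several reasonable choices; the function name is ours.

```python
import bisect

def discretize(value, cutoffs=(25, 51), labels=("Low", "Medium", "High")):
    """Map a numeric value to a label: K labels require K-1 sorted cutoffs.
    With cutoffs (25, 51): 0-24 -> Low, 25-50 -> Medium, 51-100 -> High."""
    return labels[bisect.bisect_right(cutoffs, value)]

print(discretize(0), discretize(24))   # Low Low  (treated identically)
print(discretize(30), discretize(75))  # Medium High
```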

There are a number of difficulties with discretization:

A. (Open ended --- there is no right or wrong answer) One problem is that one has to choose the number of discrete categories and the cutoff points and that this is generally somewhat arbitrary. Describe an algorithm or a criterion that takes as input a training data set, with a combination of numerical and discrete predictive attributes and a discrete classification attribute, and returns a discretization for each numerical attribute. Discuss the different objectives that arise in choosing a discretization, and whether and how your algorithm addresses them.
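As one possible starting point (not the intended answer, which is open ended), here is a sketch of a supervised, entropy-based criterion that picks a single cutoff for one numeric attribute by minimizing the weighted class entropy of the resulting bins. A full algorithm would apply this recursively to each bin with a stopping rule (for example, an MDL criterion) to determine K; the function names are ours.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cutoff(values, labels):
    """Return the cutoff that minimizes the size-weighted class entropy of
    the two resulting bins. Candidate cutoffs are midpoints between
    consecutive distinct sorted values."""
    pairs = sorted(zip(values, labels))
    best_score, best_cut = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v < cut]
        right = [l for v, l in pairs if v >= cut]
        score = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_score, best_cut = score, cut
    return best_cut

# Toy use: the classes separate cleanly around 40, so the cutoff lands there.
print(best_cutoff([10, 20, 30, 50, 60, 70], list("XXXYYY")))  # 40.0
```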

B. (Open ended) One particular choice that has to be made is whether or not the algorithm should consider the associated values of the classification attribute in deciding how to discretize the predictive attribute. What are the pros and cons of this? Discuss arguments in both directions.

C. (Open ended). A second problem is that discretization throws away a lot of information. Obviously, one throws away information by lumping all the values within a range into a single category; in the example above, we are treating a 0 and a 24 as the same. Slightly less obviously, one throws away information because the ordering relation between categories is lost. For instance, "High" is closer to "Medium" than it is to "Low", but once the values have been discretized, these are just names, and the learning algorithms do not know this.

Describe a data set in which the first issue is particularly problematic and one in which the second issue is particularly problematic. In both cases, you are trying to design a data set in which the numerical attribute is in fact usefully predictive of the classification attribute, but a large part of this predictive power is lost when the attribute is discretized due to one or the other of these issues.
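As a hint at the flavor of answer expected for the first issue, here is a toy sketch with invented numbers: the true class boundary (at 10) falls strictly inside the "Low" bin (0-24), so after discretization the bin is a mixture of both classes and nearly all of the attribute's predictive power disappears.

```python
import random

# Every x below falls in the "Low" bin (0-24), yet x predicts the class
# perfectly before discretization: pos iff x >= 10.
random.seed(0)
data = [(x, "pos" if x >= 10 else "neg")
        for x in (random.randint(0, 24) for _ in range(1000))]

# After discretization, all examples share the label "Low", so the best a
# learner can do is the base rate within the bin (about 0.6 here).
low_bin = [label for _, label in data]
print(low_bin.count("pos") / len(low_bin))
```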

D. (Not open ended). A third problem is that the results of applying an ML algorithm to a discretized data set may depend very sensitively on the discretization that is used.

Consider a classification problem with two predictive attributes A and B and a numeric classification attribute C with values between 0 and 100. Let D1 be the discretization 0-25 => Low, 26-50 => Medium, 51-100 => High. Let D2 be the discretization 0-50 => Low, 51-75 => Medium, 76-100 => High. Let X be a test example with X.A = X.B = TRUE. Construct a training set T such that, if D1 is applied, then Naive Bayes predicts with high confidence that X.C is Low, and if D2 is applied then Naive Bayes predicts with high confidence that X.C is High.
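Constructing T is the exercise, but the following sketch of the Naive Bayes computation may help in checking a candidate T. The helper `nb_predict` and the representation of T as (a, b, c_numeric) triples are our own assumptions, and add-one smoothing is omitted for brevity (so a class with a zero-frequency feature value simply scores zero).

```python
from collections import Counter

def nb_predict(T, disc, a, b):
    """Naive Bayes P(class | A=a, B=b) over the discretized class, as a dict.
    T is a list of (a, b, c_numeric) triples; disc maps c_numeric to a label."""
    classes = Counter(disc(c) for _, _, c in T)
    scores = {}
    for cls, n_cls in classes.items():
        rows = [(x, y) for x, y, c in T if disc(c) == cls]
        p_a = sum(1 for x, _ in rows if x == a) / n_cls
        p_b = sum(1 for _, y in rows if y == b) / n_cls
        scores[cls] = (n_cls / len(T)) * p_a * p_b  # prior times likelihoods
    z = sum(scores.values())  # note: no smoothing, so z can be 0 in edge cases
    return {cls: s / z for cls, s in scores.items()}

def D1(c):
    return "Low" if c <= 25 else "Medium" if c <= 50 else "High"

def D2(c):
    return "Low" if c <= 50 else "Medium" if c <= 75 else "High"

# Plug in a constructed training set T and compare the two discretizations:
# print(nb_predict(T, D1, True, True))
# print(nb_predict(T, D2, True, True))
```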