Problem Set 5

Assigned: Oct. 27
Due: Nov. 3.

Problem 1

Suppose that the training set for a Naive Bayes learner is Mitchell's Table 3.2, on the first page of the handout. What are the probabilities of Play=Yes and of Play=No given the instance Sunny, Cool, High, Weak? Be sure to report normalized probabilities; that is, the two probabilities should sum to 1.0.
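For concreteness, here is a minimal Python sketch of the kind of computation Problem 1 asks for: estimate the class prior and the per-attribute likelihoods by counting rows, multiply them together, and then normalize so the two posteriors sum to 1.0. The rows in TABLE below are placeholders, not Mitchell's Table 3.2; substitute the 14 rows from the handout before reading anything into the output.

    from collections import Counter

    # Placeholder rows: (Outlook, Temperature, Humidity, Wind, Play).
    # Replace these with the 14 rows of Mitchell's Table 3.2 from the handout.
    TABLE = [
        ("Sunny",    "Hot",  "High",   "Weak",   "No"),
        ("Sunny",    "Cool", "High",   "Weak",   "Yes"),
        ("Overcast", "Cool", "Normal", "Strong", "Yes"),
        ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ]

    def naive_bayes_posteriors(table, query):
        """Return normalized P(Play = v | query) for every class value v.

        `query` lists attribute values in the same order as the table columns,
        e.g. ("Sunny", "Cool", "High", "Weak").
        """
        class_counts = Counter(row[-1] for row in table)
        scores = {}
        for cls, n_cls in class_counts.items():
            score = n_cls / len(table)         # prior P(Play = cls)
            for i, value in enumerate(query):  # likelihoods P(attr_i = value | cls)
                matches = sum(1 for row in table
                              if row[-1] == cls and row[i] == value)
                score *= matches / n_cls
            scores[cls] = score
        total = sum(scores.values())
        # Normalize so that the two posteriors sum to 1.0, as the problem requires.
        return {cls: s / total for cls, s in scores.items()}

    print(naive_bayes_posteriors(TABLE, ("Sunny", "Cool", "High", "Weak")))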

Problem 2

Given a data set with numeric attributes, one common way of applying Naive Bayes learning is to discretize those attributes. However, there are many possible discretizations, and different choices can lead to very different results.

Suppose that your original data set has two attributes. The predictive attribute is "Outlook", which has values "Sunny", "Overcast", and "Rainy"; the classification attribute is "Temperature", which takes integer values in degrees Fahrenheit. Thus, a typical instance (row) in the data set might be "Sunny; 72". You are considering two different discretization schemes:

Construct a data set of outlooks and integer temperatures with the following property: if the temperatures are discretized using S1, then Naive Bayes, given "Sunny", will predict "Hot"; but if they are discretized using S2, then Naive Bayes, given "Sunny", will predict "Freezing".
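As a point of reference, here is a rough Python sketch of the discretize-then-classify pipeline. The cutoffs written into S1 and S2 below are hypothetical stand-ins (the real definitions come from the schemes in the problem statement), and the toy rows are only there to show the mechanics; they are not a solution to the exercise.

    from collections import Counter

    # Hypothetical cutoffs; the real definitions of S1 and S2 come from the
    # problem statement.  A temperature maps to the first label whose upper
    # bound it falls below.
    S1 = [(32, "Freezing"), (70, "Mild"), (float("inf"), "Hot")]
    S2 = [(50, "Freezing"), (85, "Mild"), (float("inf"), "Hot")]

    def discretize(temp, scheme):
        """Map an integer temperature to its bucket label under a scheme."""
        for upper, label in scheme:
            if temp < upper:
                return label

    def predict_temperature(rows, scheme, outlook):
        """Single-attribute Naive Bayes: argmax over temperature classes of
        P(class) * P(Outlook = outlook | class), with probabilities estimated
        by counting the discretized rows."""
        labeled = [(o, discretize(t, scheme)) for o, t in rows]
        class_counts = Counter(lbl for _, lbl in labeled)
        scores = {}
        for cls, n_cls in class_counts.items():
            prior = n_cls / len(labeled)
            likelihood = sum(1 for o, lbl in labeled
                             if lbl == cls and o == outlook) / n_cls
            scores[cls] = prior * likelihood
        return max(scores, key=scores.get)

    # Toy rows (Outlook, Temperature), just to show the pipeline;
    # they are not an answer to the exercise.
    ROWS = [("Sunny", 75), ("Sunny", 72), ("Rainy", 40), ("Overcast", 20)]

    print(predict_temperature(ROWS, S1, "Sunny"))   # prediction under scheme S1
    print(predict_temperature(ROWS, S2, "Sunny"))   # prediction under scheme S2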

Extra credit: Suppose that we add a second discrete predictive attribute -- say "Day of the Week", with values "Sunday", "Monday", etc. Describe a data set over "Outlook", "Day", and "Temperature" with the following property: given "Sunny" and "Monday", if the temperatures are discretized using S1, then Naive Bayes will predict with 99% confidence that the temperature is Hot; but if they are discretized using S2, then Naive Bayes will predict with 99% confidence that the temperature is Freezing.

Morals:
1. Discretizing is a crude way to deal with numeric data.
2. You have to be very careful before accepting a probabilistic argument at face value.