Problem Set 5

Assigned: Oct. 27
Due: Nov. 3.

Problem 1

Suppose that the training set for a Naive Bayes learner is Mitchell's Table 3.2, on the first page of the handout. What are the probabilities that Play=Yes or Play=No given Sunny, Cool, High, Weak? Be sure to use normalized probabilities; that is, the two probabilities should add up to 1.0.

Answer: According to the Naive Bayes formula,

Prob(Play=Yes | Sunny, Cool, High, Weak) is proportional to
Prob(Play=Yes) Prob(Sunny|Play=Yes) Prob(Cool|Play=Yes) Prob(High|Play=Yes) Prob(Weak|Play=Yes) =
(9/14) * (2/9) * (3/9) * (3/9) * (6/9) = 0.0106

Prob(Play=No | Sunny, Cool, High, Weak) is proportional to
Prob(Play=No) Prob(Sunny|Play=No) Prob(Cool|Play=No) Prob(High|Play=No) Prob(Weak|Play=No) =
(5/14) * (3/5) * (1/5) * (4/5) * (2/5) = 0.0137

Thus Play=No is more likely than Play=Yes. To find the normalizing factor (the constant of proportionality) and recover the true probabilities, add the two products together, giving 0.0243, and divide each product by this sum, giving

Prob(Play=Yes | Sunny, Cool, High, Weak) = 0.4355
Prob(Play=No | Sunny, Cool, High, Weak) = 0.5645
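
The arithmetic above can be checked with a short script. This is only a sketch: it takes the prior and conditional probabilities directly from the two products in the answer rather than recomputing them from Mitchell's table, and the names `factors`, `scores`, and `posterior` are illustrative choices, not part of the assignment.

```python
from fractions import Fraction as F
from math import prod

# Prior followed by the conditional probabilities for
# Sunny, Cool, High, Weak, copied from the answer above.
factors = {
    "Yes": [F(9, 14), F(2, 9), F(3, 9), F(3, 9), F(6, 9)],
    "No": [F(5, 14), F(3, 5), F(1, 5), F(4, 5), F(2, 5)],
}

# Unnormalized Naive Bayes score for each class.
scores = {label: prod(fs) for label, fs in factors.items()}

# Normalize so that the two probabilities add up to 1.
total = sum(scores.values())
posterior = {label: s / total for label, s in scores.items()}

for label in ("Yes", "No"):
    print(label, float(scores[label]), float(posterior[label]))
```

Using exact fractions avoids the rounding error that creeps in when the rounded products are divided by their rounded sum; the exact posteriors are 125/287 for Play=Yes and 162/287 for Play=No.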

Problem 2

Given a data set with numeric attributes, one common way of applying Naive Bayes learning is to discretize the attribute. However, there are always many different ways of doing the discretization, and different discretizations can lead to very different results.

Suppose that your original data set has two attributes. The predictive attribute is "Outlook", which has values "Sunny", "Overcast", and "Rainy"; the classification attribute is "Temperature", which has integer values in degrees Fahrenheit. Thus, a typical instance (row) in the data set might be "Sunny; 72".

You are considering two different discretization schemes:

Construct a data set of outlooks and integer temperatures with the following property: If the temperatures are discretized using S1, then Naive Bayes, given "Sunny", will predict "Hot"; but if they are discretized using S2, then Naive Bayes, given "Sunny", will predict "Freezing".

Answer: Suppose that the data set is

Temperature   Outlook
100           Sunny
90            Sunny
80            Sunny
77            Sunny
30            Sunny
20            Sunny
10            Sunny

If we discretize using S1, then we have
Prob(Hot | Sunny) = Prob(Sunny | Hot) Prob(Hot) / Prob(Sunny) = 1 * (4/7) / 1 = 4/7.
Prob(Cold | Sunny) = Prob(Sunny | Cold) Prob(Cold) / Prob(Sunny) = 1 * (3/7) / 1 = 3/7.
and the probabilities of the other temperature ranges are all 0. So the prediction is Hot.

If we discretize using S2, then we have
Prob(Broiling | Sunny) = Prob(Sunny | Broiling) Prob(Broiling) / Prob(Sunny) = 1 * (2/7) / 1 = 2/7.
Prob(Warm | Sunny) = Prob(Sunny | Warm) Prob(Warm) / Prob(Sunny) = 1 * (2/7) / 1 = 2/7.
Prob(Freezing | Sunny) = Prob(Sunny | Freezing) Prob(Freezing) / Prob(Sunny) = 1 * (3/7) / 1 = 3/7.
and the probabilities of the other temperature ranges are all 0. So the prediction is Freezing.
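
The two predictions can be reproduced by counting. The sketch below assumes a bin label for each temperature in the data set; these labels are inferred from the counts in the answer (4 Hot and 3 Cold under S1; 2 Broiling, 2 Warm and 3 Freezing under S2), not from the scheme definitions themselves. The dictionaries `s1` and `s2` and the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Every instance in the data set is Sunny; only the temperature varies.
temperatures = [100, 90, 80, 77, 30, 20, 10]

# ASSUMPTION: bin labels inferred from the counts in the answer,
# not taken from the definitions of S1 and S2.
s1 = {100: "Hot", 90: "Hot", 80: "Hot", 77: "Hot",
      30: "Cold", 20: "Cold", 10: "Cold"}
s2 = {100: "Broiling", 90: "Broiling", 80: "Warm", 77: "Warm",
      30: "Freezing", 20: "Freezing", 10: "Freezing"}


def posterior_given_sunny(labels):
    """Return Prob(bin | Sunny) for a data set in which every row is Sunny."""
    counts = Counter(labels[t] for t in temperatures)
    n = len(temperatures)
    # Prob(Sunny | bin) = 1 and Prob(Sunny) = 1, so Bayes' rule
    # reduces to the prior Prob(bin).
    return {b: Fraction(c, n) for b, c in counts.items()}


def predict(labels):
    """Return the bin with the highest posterior probability."""
    post = posterior_given_sunny(labels)
    return max(post, key=post.get)


print(predict(s1), posterior_given_sunny(s1))
print(predict(s2), posterior_given_sunny(s2))
```

The flip happens because S2 splits the four warm instances into two bins of two, while the three cold instances stay together in one bin.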

Extra credit: Suppose that we add a second discrete predictive attribute, say "Day of the Week", with values "Sunday", "Monday", etc. Describe a data set over "Outlook", "Day" and "Temperature" with the following property: Given "Sunny" and "Monday", if the temperature is discretized using S1, then Naive Bayes will predict that it is 99% sure that the temperature is Hot, but if the temperature is discretized using S2, then Naive Bayes will predict that it is 99% sure that the temperature is Freezing.

This is trickier. One data set is as follows:

Temperature   Day of Week   Outlook    Number of instances
30            Monday        Sunny      100
35            Tuesday       Overcast   1,000,000
90            Monday        Sunny      100
90            Tuesday       Overcast   9,900

Discretizing using S1, we get
Prob(Hot | Monday, Sunny) is proportional to
Prob(Hot) Prob(Monday | Hot) Prob(Sunny | Hot) = (10,000/1,010,100) * (100 / 10,000) * (100 / 10,000) $\approx 10^{-6}$.

Prob(Cold | Monday, Sunny) is proportional to
Prob(Cold) Prob(Monday | Cold) Prob(Sunny | Cold) = (1,000,100/1,010,100) * (100 / 1,000,100) * (100 / 1,000,100) $\approx 10^{-8}$.

The probability of all other temperature ranges is 0. To normalize, we divide by $10^{-6}+10^{-8}$ giving
Prob(Hot | Monday, Sunny) = 0.99
Prob(Cold | Monday, Sunny) = 0.01

On the other hand, if we discretize using S2, we get

Prob(Broiling | Monday, Sunny) is proportional to
Prob(Broiling) Prob(Monday | Broiling) Prob(Sunny | Broiling) = (10,000/1,010,100) * (100 / 10,000) * (100 / 10,000) $\approx 10^{-6}$.

Prob(Freezing | Monday, Sunny) is proportional to
Prob(Freezing) Prob(Monday | Freezing) Prob(Sunny | Freezing) = (100/1,010,100) * 1 * 1 $\approx 10^{-4}$.

The probability of all other temperature ranges is 0. To normalize, we divide by $10^{-4}+10^{-6}$ giving
Prob(Freezing | Monday, Sunny) = 0.99
Prob(Broiling | Monday, Sunny) = 0.01
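
The extra-credit computation can be verified exactly with a small weighted Naive Bayes sketch. As before, the bin label given to each temperature is an assumption inferred from the answer: under S1, 30 and 35 are both Cold and 90 is Hot; under S2, 30 is Freezing, 90 is Broiling, and 35 falls in some other bin, called "Other" here as a placeholder name. The function `naive_bayes_posterior` is a hypothetical helper, not part of the assignment.

```python
from collections import defaultdict
from fractions import Fraction

# (temperature, day, outlook, number of instances), from the data set above.
rows = [
    (30, "Monday", "Sunny", 100),
    (35, "Tuesday", "Overcast", 1_000_000),
    (90, "Monday", "Sunny", 100),
    (90, "Tuesday", "Overcast", 9_900),
]

# ASSUMPTION: bin labels inferred from the answer, not from the scheme
# definitions; "Other" is a placeholder for the S2 bin containing 35.
s1 = {30: "Cold", 35: "Cold", 90: "Hot"}
s2 = {30: "Freezing", 35: "Other", 90: "Broiling"}


def naive_bayes_posterior(labels, day, outlook):
    """Return normalized Prob(bin | day, outlook) under Naive Bayes."""
    total = sum(row[3] for row in rows)
    class_count = defaultdict(int)    # instances in each bin
    day_count = defaultdict(int)      # instances in each bin on `day`
    outlook_count = defaultdict(int)  # instances in each bin with `outlook`
    for temp, d, o, n in rows:
        c = labels[temp]
        class_count[c] += n
        if d == day:
            day_count[c] += n
        if o == outlook:
            outlook_count[c] += n
    # Prob(bin) * Prob(day | bin) * Prob(outlook | bin), unnormalized.
    scores = {
        c: Fraction(k, total) * Fraction(day_count[c], k) * Fraction(outlook_count[c], k)
        for c, k in class_count.items()
    }
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


for name, labels in (("S1", s1), ("S2", s2)):
    post = naive_bayes_posterior(labels, "Monday", "Sunny")
    print(name, {c: float(p) for c, p in post.items()})
```

With exact arithmetic the winning posterior is 10001/10101 for Hot under S1 and 100/101 for Freezing under S2, both just over 0.99.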