## Problem Set 5

Assigned: Oct. 27

Due: Nov. 3

### Problem 1

Suppose that the training set for a Naive Bayes learner is Mitchell's
Table 3.2, on the first page of the handout.
What are the probabilities of Play=Yes and of Play=No given the instance
Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Weak? Be sure to use
*normalized* probabilities; that is, the two probabilities should add up to 1.0.
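
Normalizing means dividing each class's unnormalized Naive Bayes score by the sum of the scores over both classes. A minimal sketch in Python follows. The function name `normalize` and the two input scores are hypothetical placeholders chosen for illustration; they are not the values computed from Table 3.2.

```python
def normalize(scores):
    """Rescale unnormalized class scores so that they sum to 1.0."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all scores are zero; nothing to normalize")
    return {label: s / total for label, s in scores.items()}

# Hypothetical unnormalized scores, each of the form
# P(class) * product of P(attribute value | class).
# These are placeholders, NOT the values for Table 3.2.
unnormalized = {"Yes": 0.125, "No": 0.375}

posterior = normalize(unnormalized)
print(posterior)  # {'Yes': 0.25, 'No': 0.75}
```
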

### Problem 2

Given a data set with numeric attributes, one common way of applying
Naive Bayes learning is to discretize the attributes. However, there
are always many different ways of doing the discretization, and different
discretizations can lead to very different results.

Suppose that your original data set has two attributes. The predictive
attribute is "Outlook", which has values "Sunny", "Overcast", and "Rainy";
the classification attribute is "Temperature", which has integer values
in degrees Fahrenheit. Thus, a typical instance (row) in the data set might be
"Sunny; 72".

You are considering two different discretization schemes:

- Scheme S1 divides the temperature into "Cold" (below 40);
  "Cool" (40-59); "Temperate" (60-75); and "Hot" (above 75).
- Scheme S2 divides the temperature into "Freezing" (below 32);
  "Chilly" (32-55); "Mild" (56-70); "Warm" (71-85); and "Broiling" (above 85).

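The two schemes can be written down as lookup functions, which makes it easy to try out candidate temperatures. This is only a sketch: the names `S1`, `S2`, and `discretize` and the cut-point representation are choices made here, and the code assumes integer temperatures as stated in the problem. Note how the same reading can land in differently named bins, for example 80 is "Hot" under S1 but "Warm" under S2.

```python
from bisect import bisect_right

# Each scheme is (lower bound of every bin after the first, labels of all bins).
S1 = ([40, 60, 76], ["Cold", "Cool", "Temperate", "Hot"])
S2 = ([32, 56, 71, 86], ["Freezing", "Chilly", "Mild", "Warm", "Broiling"])

def discretize(temperature, scheme):
    """Map an integer Fahrenheit temperature to its bin label under a scheme."""
    cuts, labels = scheme
    # bisect_right counts how many lower bounds are <= temperature,
    # which is exactly the index of the bin the temperature falls in.
    return labels[bisect_right(cuts, temperature)]

for t in (31, 39, 72, 80):
    print(t, discretize(t, S1), discretize(t, S2))
```
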
Construct a data set of outlooks and integer temperatures with the following
property: If the temperatures are discretized using S1, then Naive Bayes,
given "Sunny", will predict "Hot"; but if they are discretized using S2,
then Naive Bayes, given "Sunny", will predict "Freezing".
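
To check a candidate data set, it may help to compute the Naive Bayes posterior directly. The sketch below assumes maximum-likelihood estimates with no smoothing, which may differ from the convention used in class. The function `naive_bayes_posterior`, the two-bin discretizer `toy_bins`, and the sample rows are all hypothetical and deliberately unrelated to S1 and S2, so the example is not a solution to the problem.

```python
from collections import Counter

def naive_bayes_posterior(rows, discretize, outlook):
    """Return normalized P(temperature bin | outlook) under Naive Bayes.

    rows is a list of (outlook, integer temperature) pairs and
    discretize maps a temperature to a bin label.
    """
    labeled = [(o, discretize(t)) for o, t in rows]
    class_counts = Counter(c for _, c in labeled)
    joint_counts = Counter(labeled)
    n = len(labeled)

    scores = {}
    for c, n_c in class_counts.items():
        prior = n_c / n                                # P(class)
        likelihood = joint_counts[(outlook, c)] / n_c  # P(outlook | class)
        scores[c] = prior * likelihood

    total = sum(scores.values())
    if total == 0:
        raise ValueError("outlook value never appears in the data")
    return {c: s / total for c, s in scores.items()}

# Hypothetical two-bin discretizer and data, for illustration only.
def toy_bins(t):
    return "Above50" if t >= 50 else "Below50"

rows = [("Sunny", 80), ("Sunny", 70), ("Sunny", 30), ("Rainy", 20), ("Rainy", 60)]
posterior = naive_bayes_posterior(rows, toy_bins, "Sunny")
prediction = max(posterior, key=posterior.get)
print(prediction, posterior)
```

With a single predictive attribute, the product of prior and likelihood reduces to the fraction of all rows that are both "Sunny" and in the given bin, so the prediction is simply the most common bin among the "Sunny" rows.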

**Extra credit:** Suppose that we add a second discrete predictive
attribute, say "Day of the Week", with values "Sunday", "Monday", etc.
Describe a data set over "Outlook", "Day" and "Temperature" with the
following property: Given "Sunny" and "Monday", if the temperature is
discretized using S1, then Naive Bayes will predict that it is 99%
sure that the temperature is Hot, but if the temperature is discretized
using S2, then Naive Bayes will predict that it is 99% sure that the
temperature is Freezing.
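
For reference, with two predictive attributes the confidence that Naive Bayes reports is the normalized product below. This is the standard Naive Bayes formula; the symbol $t$ is notation introduced here and ranges over the temperature bins of whichever scheme is in use.

$$
P(t \mid \text{Sunny}, \text{Monday}) =
\frac{P(t)\, P(\text{Sunny} \mid t)\, P(\text{Monday} \mid t)}
     {\sum_{t'} P(t')\, P(\text{Sunny} \mid t')\, P(\text{Monday} \mid t')}
$$

Under this reading, being "99% sure" of a bin means that this ratio is 0.99 for that bin. Because the two conditional probabilities are estimated separately, the ratio need not match the actual fraction of sunny Mondays in the data that fall in that bin.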

Morals:

1. Discretizing is a crude way to deal with numeric data.

2. You have to be *very careful* before accepting a probabilistic
argument at face value.