Sample problems from the second half of the course
Let me emphasize that this is just a collection of sample problems,
not a sample final exam.
Multiple choice problems
Bayes' Law states that
- A. Prob(P|Q) = Prob(P) / Prob(Q).
- B. Prob(P|Q) = Prob(Q|P)
- C. Prob(P|Q) = Prob(Q|P) / Prob(Q)
- D. Prob(P|Q) = Prob(P) * Prob(Q|P) / Prob(Q)
- E. Prob(P|Q) = Prob(Q) * Prob(Q|P) / Prob(P)
In Naive Bayes learning, we make the assumption that
- A. The classification attribute is independent of the predictive
attributes.
- B. The classification attribute depends on only one predictive attribute.
- C. The predictive attributes are absolutely independent.
- D. The predictive attributes are conditionally independent given
the classification attribute.
A support vector machine finds a linear separator that maximizes the "margin",
which is
- A. The number of misclassified data points.
- B. The sum over all misclassified points of the distance from the point
to the separator.
- C. The sum over all misclassified points of the distance from the point
to the separator squared.
- D. The minimum distance from any point to the separator.
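Several of the options above refer to "the distance from the point to the separator". As a minimal sketch of that one quantity, assuming the separator is written as the hyperplane w.x + b = 0 (the function name, the weights, and the sample points are all hypothetical, chosen only for illustration):

```python
import math

def distance_to_separator(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

# Hypothetical 2-D separator: 3x + 4y - 10 = 0
w, b = [3.0, 4.0], -10.0

d_far = distance_to_separator(w, b, [2.0, 6.0])  # a point off the line
d_on = distance_to_separator(w, b, [2.0, 1.0])   # a point lying on the line
print(d_far, d_on)
```

The sketch only computes the distance for a single point; how those distances are combined into the "margin" is what the question asks.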
In the problem of tagging elements E_1 ... E_N with tags T_1 ... T_N, the
K-gram assumption is the assumption that
- A. E_I is independent of E_{I-K}
- B. T_I is independent of T_{I-K}
- C. E_I is conditionally independent of
E_1 ... E_{I-K} given E_{I+1-K} ... E_{I-1}
- D. T_I is conditionally independent of
T_1 ... T_{I-K} given T_{I+1-K} ... T_{I-1}
Learning takes place in a back-propagation network by
- A. Propagating activation levels from the input layer to the output layer.
- B. Propagating activation levels from the output layer to the input layer.
- C. Propagating modification to weights on the arcs from the input layer to the output layer.
- D. Propagating modification to weights on the arcs from the output layer to the input layer.
- E. Adding nodes and links in the hidden layers.
- F. Both adding and deleting nodes and links in the hidden layers.
Long Answer Problems
A. What conditional probabilities are recorded in the above Bayesian
network?
B. For each of the following statements, say whether it is true or false
in the above network:
B and C are independent absolutely.
B and C are independent given A.
B and C are independent given D.
A and D are independent absolutely.
A and D are independent given B.
A and D are independent given B and C.
C. Assuming that all the random variables are Boolean, show how Prob(B=T)
can be calculated in terms of the probabilities
recorded in the above network.
Datasets often contain instances with null values in some of the attributes.
Some classification learning algorithms are able to use such instances
in the training set; other algorithms must discard them.
- A. Can Naive Bayes make use of instances with null values in the
training set? Explain your answer.
- B. Can K-Nearest Neighbors make use of instances with null values in the
training set? Explain your answer.
The version of the ID3 algorithm in the class handout includes a test
"If AVG_ENTROPY(AS,C,T) is not substantially smaller than ENTROPY(C,T)"
then the algorithm constructs a leaf corresponding to the current state of T
and does not recur. "Substantially smaller" here, of course, is rather vague.
Is overfitting more likely to occur if this condition is changed
to require that "AVG_ENTROPY(AS,C,T) is much smaller than ENTROPY(C,T)" or
if the condition is changed to "AVG_ENTROPY(AS,C,T) is at all smaller than
ENTROPY(C,T)"? Explain your answer.
What is the disadvantage of eliminating the test?
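For reference, here is a rough sketch of the two quantities compared in that test. It is not the code from the class handout. It assumes that ENTROPY(C,T) is the usual Shannon entropy of the classification attribute C over the table T, and that AVG_ENTROPY is the size-weighted average of that entropy over the subtables obtained by splitting T on one attribute. The table layout (a list of dictionaries) and the attribute names "windy" and "play" are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(c, table):
    """Shannon entropy (in bits) of attribute c over the rows of table."""
    counts = Counter(row[c] for row in table)
    n = len(table)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def avg_entropy(a, c, table):
    """Average entropy of c after splitting table on attribute a,
    weighting each subtable by its share of the rows."""
    groups = defaultdict(list)
    for row in table:
        groups[row[a]].append(row)
    n = len(table)
    return sum(len(g) / n * entropy(c, g) for g in groups.values())

# Hypothetical toy table with one predictive attribute and one class.
table = [
    {"windy": "yes", "play": "no"},
    {"windy": "yes", "play": "no"},
    {"windy": "no", "play": "yes"},
    {"windy": "no", "play": "no"},
]

h = entropy("play", table)
h_split = avg_entropy("windy", "play", table)
print(h, h_split)
```

How large the gap between the two numbers must be before the algorithm splits is exactly the threshold that the problem asks you to reason about.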
A. What is the sparse data problem in using Naive Bayes for
classifying text? How is it solved?
B. What is the sparse data problem in using the k-gram model for
tagging text? How is it solved?
The most common measure of the quality of a classifier is in terms of the
accuracy of its predictions. Explain why this is not always the best
measure and describe an alternative measure.
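To pin down the measure named in the question, here is a small sketch of accuracy as the fraction of predictions that match the true labels. The function name and the "spam"/"ham" labels are hypothetical, used only for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of instances whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels for five test instances.
actual = ["spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham", "spam", "ham", "ham"]

acc = accuracy(predicted, actual)
print(acc)
```

Note that this single number treats every kind of error the same way, which is a useful starting point for the discussion the problem asks for.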