Sample Exam Problems from 2nd Half of Course

The final exam will be given Monday, May 7 from 10:00 to 11:50 in Warren Weaver, room 101. It is closed book and closed notes.

The exam is cumulative, covering the material from the entire course. Roughly 1/3 of the exam will be drawn from the first half of the course, and the other 2/3 will be drawn from the second half. Some sample problems from the second half of the course are given below.

Problem 1:

What is the result of doing alpha-beta pruning in the game tree shown below?

Problem 2:

Name three conditions that must hold on a game for the technique of MIN-MAX game-tree evaluation to be applicable.

Answer: It must be (i) zero-sum; (ii) perfect knowledge; (iii) deterministic; (iv) two-player; (v) discrete.
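
For reference, here is a minimal Python sketch of MIN-MAX evaluation with alpha-beta pruning (as in Problem 1), for a game tree represented as nested lists of leaf values. The representation and the example tree are illustrative; they are not the tree from Problem 1.

# Minimal sketch of minimax with alpha-beta pruning.
# A game tree is given as nested lists; a leaf is a number (its value for MAX).

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if not isinstance(node, list):        # leaf: return its static value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:             # prune the remaining children
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:             # prune the remaining children
                break
        return value

# Example: MAX to move at the root of a depth-2 tree.
print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]], True))   # prints 3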

Problem 3:

A. Give an example of a decision tree with two internal nodes (including the root), and explain how it classifies an example.

This is a deterministic decision tree for a data set with two Boolean predictive attributes, A and B, and a Boolean classification C. The tree first tests an example X on attribute A. If X.A is T, then the tree tests X on attribute B. If X.B is T, then the tree predicts that X.C is T; otherwise it predicts that X.C is F. If, in the first test, X.A is F, then the tree predicts that X.C is F.

B. Give a high-level description of the ID3 algorithm to construct decision trees from training data. You need not give the definition of entropy or expected entropy.

Answer: The ID3 algorithm builds the decision tree recursively from the top down. At each stage of the recursion, an internal node N is passed a subtable containing those instances that would reach N in the decision procedure, together with those attributes that have not already been tested. If the table has more than one classification value, has some predictive attributes left to test, and is larger than a specified minimum splitting size, then the algorithm chooses the best attribute to split on, based on the expected entropy, and recursively passes the various partitions of the table to each of the subnodes.
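
For concreteness, here is a minimal Python sketch of this recursion. The representation of a table as a list of dicts, and the function names, are illustrative choices, not part of the course code.

import math
from collections import Counter

# Minimal ID3 sketch. A table is a list of dicts mapping attribute names to
# values; 'target' is the name of the classification attribute.

def entropy(table, target):
    counts = Counter(row[target] for row in table)
    total = len(table)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def expected_entropy(table, attr, target):
    total = len(table)
    ent = 0.0
    for value in {row[attr] for row in table}:
        part = [row for row in table if row[attr] == value]
        ent += (len(part) / total) * entropy(part, target)
    return ent

def id3(table, attrs, target, min_split=1):
    classes = {row[target] for row in table}
    # Stop if the node is pure, no attributes remain, or the node is too small;
    # in that case predict the majority classification.
    if len(classes) == 1 or not attrs or len(table) <= min_split:
        return Counter(row[target] for row in table).most_common(1)[0][0]
    # Choose the attribute that minimizes expected entropy after splitting.
    best = min(attrs, key=lambda a: expected_entropy(table, a, target))
    tree = {}
    for value in {row[best] for row in table}:
        part = [row for row in table if row[best] == value]
        tree[value] = id3(part, [a for a in attrs if a != best], target, min_split)
    return (best, tree)

Choosing the attribute with the lowest expected entropy is the same as choosing the one with the highest information gain.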

C. What kinds of techniques can be used to counter the problem of over-fitting in decision trees?

Answer: Prepruning techniques prune the decision tree as it is being built; e.g. by enforcing a minimum splitting size on nodes. Postpruning techniques build the entire decision tree, and then eliminate tests that do not seem to be useful.

Problem 4:

Consider the following data set with three Boolean predictive attributes, W, X, Y, and a Boolean classification C.
  W   X   Y   C
----------------
  T   T   T   T
  T   F   T   F
  T   F   F   F
  F   T   T   F
  F   F   F   T
We now encounter a new example: W=F, X=T, Y=F. If we apply the Naive Bayes method, what probability is assigned to the two values of C?

Answer: We wish to compute

argmax_c Prob(C=c | W=F, X=T, Y=F)
      = (by Bayes' Law)
argmax_c Prob(W=F, X=T, Y=F | C=c) Prob(C=c) / Prob(W=F, X=T, Y=F)
      = (dropping the normalizing factor, which does not depend on c)
argmax_c Prob(W=F, X=T, Y=F | C=c) Prob(C=c)
      = (assuming conditional independence)
argmax_c Prob(W=F|C=c) Prob(X=T|C=c) Prob(Y=F|C=c) Prob(C=c)
Using the frequencies in the table to estimate the probabilities: for c=T, this product evaluates to (1/2) * (1/2) * (1/2) * (2/5) = 1/20; for c=F, it evaluates to (1/3) * (1/3) * (1/3) * (3/5) = 1/45. The Bayesian estimate, therefore, is that C=T is more likely than C=F by a factor of 2.25. To be exact, the estimated probability that C=T is (1/20)/((1/20)+(1/45)) = 9/13; the estimated probability that C=F is 4/13.
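
The same arithmetic can be checked mechanically. A small Python sketch (the data layout and variable names are my own; probabilities are estimated directly from frequencies, with no smoothing, as in the answer above):

from collections import Counter

# Data set from the problem: each row is (W, X, Y, C).
data = [('T','T','T','T'),
        ('T','F','T','F'),
        ('T','F','F','F'),
        ('F','T','T','F'),
        ('F','F','F','T')]

query = {'W': 'F', 'X': 'T', 'Y': 'F'}
attrs = ['W', 'X', 'Y']

scores = {}
class_counts = Counter(row[3] for row in data)
for c, nc in class_counts.items():
    rows_c = [row for row in data if row[3] == c]
    p = nc / len(data)                        # prior Prob(C=c)
    for i, a in enumerate(attrs):             # product of Prob(attr=value | C=c)
        p *= sum(1 for row in rows_c if row[i] == query[a]) / nc
    scores[c] = p

total = sum(scores.values())
for c in scores:
    print(c, scores[c], scores[c] / total)    # 1/20 and 1/45; normalized 9/13 and 4/13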

Problem 5:

"Local minima can cause difficulties for a feed-forward, back-propagation neural network." Explain. Local minima of what function of what arguments? Why do they create difficulties?

Answer: Local minima of the error function over the corpus of labelled examples, as a function of the weights on the links. The learning algorithm is in effect doing a hill-climbing search for the values of the weights that minimize this error function. If the hill-climbing search finds a local minimum of the function, it will get stuck at that suboptimal state, and will not find the true solution.
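
A tiny illustration of the phenomenon, using gradient descent on a made-up one-variable error function that has both a global and a merely local minimum. This is not an actual network, but the dependence on the starting point is the same issue:

# The search ends up in different minima depending on where it starts.

def error(w):                            # global minimum near w = -1,
    return (w**2 - 1)**2 + 0.3 * w       # merely local minimum near w = +1

def gradient(w):
    return 4 * w * (w**2 - 1) + 0.3

def descend(w, rate=0.01, steps=5000):
    for _ in range(steps):
        w -= rate * gradient(w)
    return w

w1 = descend(-2.0)
w2 = descend(+2.0)
print(w1, error(w1))   # converges near the global minimum, w close to -1
print(w2, error(w2))   # stuck at the local minimum, w close to +1, higher error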

Problem 6:

Which of the following describes the process of task execution (classifying an input signal) in a feed-forward, back-propagation neural network? Which describes the process of learning? (One answer is correct for each.)

Answer: Task execution is carried out through process (a). Learning is carried out through process (d).

Problem 7

Explain briefly (2 or 3 sentences) the use of a training set and a test set in evaluating learning programs.

Answer: When you wish to evaluate a learning program using a fixed corpus of examples, it is common to divide the corpus into a training set and a test set. The program is first trained by running it on the training set. Then the learning module is turned off, and the quality of the current state of the execution module is tested by running it on the test set. In this way you guard against the possibility of overlearning, in which the program simply learns the examples in the corpus, rather than the underlying concept.
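
A minimal Python sketch of this protocol. The 2/3 - 1/3 split is an arbitrary choice, and the names train and accuracy are placeholders for whatever learning program is being evaluated:

import random

# Split a fixed corpus of labelled examples into a training set and a test set,
# train on the former, and score the resulting model on the latter.

def evaluate(corpus, train, accuracy, train_fraction=2/3, seed=0):
    examples = corpus[:]
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * train_fraction)
    training_set, test_set = examples[:cut], examples[cut:]
    model = train(training_set)           # learning module runs only here
    return accuracy(model, test_set)      # execution module scored on unseen data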

Problem 8

Explain how the minimum description length (MDL) learning theory justifies the conjecture of
A. perfect classification hypotheses (i.e. classification hypotheses that always give the correct classification, given the values of the predictive attributes) for nominal classifications.
B. imperfect classification hypotheses (i.e. hypotheses that do better than chance) for nominal classifications.
C. approximate classification hypotheses for numeric classifications. (i.e. hypotheses that give answers that are nearly correct.)

Answer:
A. The original data can be encoded by storing the predictive attributes for each example plus the hypothesis; the classifications themselves need not be stored, since the hypothesis predicts them exactly. If the space saved by not recording the classifications is greater than the space required for the hypothesis, then the overall description length is reduced.
B. The original data can be encoded by storing the predictive attributes plus the hypothesis for all the cases where the hypothesis works correctly, and then enumerating all the exceptions separately. If the space saved by not recording the classification on the correct instances is greater than the space required for the hypothesis, then the overall description length is reduced.
C. Let A, B, C be the predictive attributes; let D be the classification; let f(A,B,C) be the hypothesis. For each example X, record X.A, X.B, X.C and the error X.D - f(X.A, X.B, X.C). If f is a useful hypothesis, then the range of values of the error will be much less than the range of values of X.D; hence the error will require fewer bits, for a given level of precision, than X.D. If the space thus saved is more than the space required to store f, then the description length is reduced.
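
A back-of-the-envelope version of the arithmetic in part C, with invented numbers for the ranges, precision, and hypothesis size:

import math

# Suppose X.D ranges over [0, 1000] and values are stored to a precision of 1,
# so a raw classification costs about log2(1000) bits. If the hypothesis f
# predicts D to within +/- 10, an error value ranges over [-10, 10] and costs
# only about log2(20) bits. All of these numbers are invented for illustration.

n_examples = 500
bits_raw   = n_examples * math.log2(1000)   # store X.D directly
bits_error = n_examples * math.log2(20)     # store X.D - f(X.A, X.B, X.C)
bits_hypothesis = 200                       # invented cost of describing f

saved = bits_raw - bits_error
print(saved, bits_hypothesis, saved > bits_hypothesis)
# If the bits saved exceed the cost of storing f, the total description is
# shorter, which is MDL's justification for conjecturing f.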

Problem 9

Give an example of a Datalog program and a query that never returns an answer if the interpreter uses a depth-first search, but that will return an answer if the interpreter is modified to use an iterative deepening search. (Hint: there is an example that uses one rule and one fact.)

Answer:

Program: 
p(X) :- p(X).
p(a).

Query:
?- p(a).

With depth-first search, the query p(a) unifies with the head of the rule p(X) :- p(X) first, generating the identical subgoal p(a) again, so the interpreter recurses down this branch forever and never reaches the fact p(a). Iterative deepening bounds the depth of the search, so within the first depth bound it finds the one-step proof that uses the fact directly.

Problem 10

Describe the use of edge detection and of thresholding in low-level computer vision.

Answer: In edge detection, one searches for places in the image where the intensity varies rapidly. Such an edge may indicate a significant boundary between perceived surfaces in the image. In thresholding, one uses thresholds of brightness intensity to segment the image into regions of similar brightness. This division into regions may, at least in part, correspond to significant partitions of the scene.
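
A minimal sketch of both operations on a gray-scale image stored as a NumPy array; the intensity values and thresholds here are arbitrary:

import numpy as np

# A small gray-scale image: a 2-D array of intensities with a dark region on
# the left and a bright region on the right.
image = np.array([[ 10,  12,  11, 200, 205],
                  [ 11,  10,  13, 198, 202],
                  [ 12,  11,  12, 201, 199]], dtype=float)

# Thresholding: segment the image into regions of similar brightness.
regions = image > 100             # True = bright region, False = dark region

# Edge detection: mark places where intensity changes rapidly, here by taking
# horizontal and vertical differences and thresholding their magnitude.
dx = np.abs(np.diff(image, axis=1))
dy = np.abs(np.diff(image, axis=0))
edges_x = dx > 50                 # large horizontal change
edges_y = dy > 50                 # large vertical change

print(regions.astype(int))
print(edges_x.astype(int))        # marks the boundary between columns 2 and 3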

Problem 11

Describe briefly two ways in which texture can be used in vision systems.

Answer: (i) An abrupt change in the form of the texture can signal a change of surface. (ii) A variation in the size of textural elements is assumed to correspond to a change in the distance of the object. (iii) A change in a textural element that compresses it in one dimension, such as a change from a circle to an ellipse or from a square to a rectangle or parallelogram, can correspond to a change in the orientation of the surface from a head-on view to a sidelong view.