The final exam will be given Thursday, May 7 from 5:00 to 6:50 in Warren Weaver, room 101. It is closed book and closed notes. You will not need a calculator.

- Search
  - Blind search --- R+N chap. 3 through sec. 3.5.
  - Informed search --- R+N sec. 4.3.

- Game playing --- R+N chapter 6 through 6.4.
- Automated reasoning --- R+N chaps. 7 through 7.6; chapter 8 through 8.3; chapter 9; handouts.
- Probability.
- Machine Learning
  - 1R, nearest neighbors, Naive Bayes --- handout.
  - Decision trees --- R+N sec. 18.3.
  - Perceptrons/back-propagation networks --- R+N sec. 20.5.
  - Evaluation --- handout.
  - Minimum description length learning --- handout.

You should understand the following algorithms well, though I would not ask you to carry them out on an exam: GSAT; simulated annealing; ID3 (called "DECISION-TREE-LEARNING" in R+N, p. 658); perceptron learning; feed-forward back-propagation neural networks.
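As a refresher on the last two items, the perceptron learning rule can be sketched as follows. This is a minimal illustration, not the R+N pseudocode; the threshold activation, learning rate of 1, and AND training data are my own choices.

```python
# Minimal sketch of perceptron learning (threshold unit, learning rate 1).
# The AND training data below is illustrative, not from the course.

def predict(weights, x):
    # Threshold activation: fire if the weighted sum (with bias) is positive.
    s = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if s > 0 else 0

def train_perceptron(examples, epochs=20):
    # weights[0] is the bias; remaining entries match the input features.
    weights = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for x, target in examples:
            error = target - predict(weights, x)
            # Perceptron rule: nudge the weights toward the target on each error.
            weights[0] += error
            for i, xi in enumerate(x):
                weights[i + 1] += error * xi
    return weights

# Logical AND is linearly separable, so the rule converges.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(data)
print([predict(w, x) for x, _ in data])  # → [0, 0, 0, 1]
```

A single perceptron cannot represent functions that are not linearly separable (e.g. XOR); that limitation is what motivates the multi-layer back-propagation networks also listed above.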

s(X,L) --- Person X speaks language L.
c(X,Y) --- Persons X and Y can communicate.
i(W,X,Y) --- Person W can serve as an interpreter between persons X and Y.
j, p, e, f --- Constants: Joe, Pierre, English, and French respectively.

A. Express the following statements in L:

- i. Joe speaks English.
- ii. Pierre speaks French.
- iii. If X and Y both speak L, then X and Y can communicate.
- iv. If W can communicate both with X and with Y, then W can serve as an interpreter between X and Y.
- v. For any two languages L and M, there is someone who speaks both L and M.
- vi. There is someone who can interpret between Joe and Pierre.
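As a check on the expected notation, here is one possible rendering of statements (i) and (iii); your quantifier conventions may differ from the ones used in class.

```latex
% (i) Joe speaks English.
s(j, e)

% (iii) If X and Y both speak L, then X and Y can communicate.
\forall X \, \forall Y \, \forall L \;
  \bigl( s(X,L) \wedge s(Y,L) \bigr) \Rightarrow c(X,Y)
```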

Prob(A=T) = 0.8. Prob(B=T | A=T) = 0.5. Prob(B=T | A=F) = 1. Prob(C=T | B=T) = 0.1. Prob(C=T | B=F) = 0.5. A and C are conditionally independent given B.

Evaluate the following terms. (If you wish, you can give your answer as an arithmetic expression such as "0.8*0.5 / (0.8*1 + 0.5*0.1)".)

- i. Prob(B=T).
- ii. Prob(A=T | B=T)
- iii. Prob(C=T | A=F).
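These three quantities can be checked mechanically by enumerating the joint distribution implied by the numbers above. A quick sanity-check script (not part of the exam):

```python
from itertools import product

# Enumerate the joint distribution P(A, B, C) implied by the given numbers:
# A and C are conditionally independent given B.
p_a = {True: 0.8, False: 0.2}
p_b_given_a = {True: 0.5, False: 1.0}   # P(B=T | A)
p_c_given_b = {True: 0.1, False: 0.5}   # P(C=T | B)

joint = {}
for a, b, c in product([True, False], repeat=3):
    pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
    pc = p_c_given_b[b] if c else 1 - p_c_given_b[b]
    joint[(a, b, c)] = p_a[a] * pb * pc

def prob(pred):
    # Sum the joint probability over the events satisfying the predicate.
    return sum(p for event, p in joint.items() if pred(*event))

p_b = prob(lambda a, b, c: b)                       # i.   P(B=T)
p_a_given_b = prob(lambda a, b, c: a and b) / p_b   # ii.  P(A=T | B=T)
p_c_given_not_a = (prob(lambda a, b, c: c and not a)
                   / prob(lambda a, b, c: not a))   # iii. P(C=T | A=F)
print(p_b, p_a_given_b, p_c_given_not_a)  # ≈ 0.6, 0.667, 0.1
```

For (iii), note that A=F forces B=T with probability 1, so P(C=T | A=F) collapses to P(C=T | B=T) = 0.1, which the enumeration confirms.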

B. Describe the ID3 algorithm to construct decision trees from training data.

C. What is the entropy of a classification C in table T? What is the expected entropy of classification C if table T is split on predictive attribute A?
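The two quantities in question C can be computed directly; here is a small sketch (the tiny example table and attribute names are made up for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H(C) = -sum_v P(C=v) * log2 P(C=v), over the class values in the table.
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def expected_entropy(rows, attr, cls):
    # Weighted average of the entropy of each subtable T_a obtained by
    # splitting T on attribute A: sum_a (|T_a| / |T|) * H(C in T_a).
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[cls])
    return sum(len(g) / n * entropy(g) for g in groups.values())

# Illustrative table: attribute "a" splits the class perfectly,
# so the expected entropy after the split is 0.
table = [{"a": 0, "c": "yes"}, {"a": 0, "c": "yes"},
         {"a": 1, "c": "no"},  {"a": 1, "c": "no"}]
print(entropy([r["c"] for r in table]))   # → 1.0
print(expected_entropy(table, "a", "c"))  # → 0.0
```

ID3 (question B) greedily chooses, at each node, the attribute whose split minimizes this expected entropy, i.e. maximizes information gain.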

D. What kinds of techniques can be used to counter the problem of over-fitting in decision trees?

Consider the following table of training data:

W  X  Y  C
----------
T  T  T  T
T  F  T  F
T  F  F  F
F  T  T  F
F  F  F  T

We now encounter a new example: W=F, X=T, Y=F. If we apply the Naive Bayes method, what probability is assigned to the two values of C?
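For reference, the Naive Bayes arithmetic on this table can be scripted. This sketch uses straight relative-frequency estimates with no smoothing, which matches the handout's basic method but is an assumption on my part:

```python
# The training table from the problem: rows are (W, X, Y, C).
rows = [("T", "T", "T", "T"), ("T", "F", "T", "F"), ("T", "F", "F", "F"),
        ("F", "T", "T", "F"), ("F", "F", "F", "T")]

def naive_bayes(query):
    # Unnormalized score for each class c:
    #   P(C=c) * prod_i P(attr_i = query_i | C=c),
    # with all probabilities estimated by relative frequency (no smoothing).
    scores = {}
    for c in {r[-1] for r in rows}:
        subset = [r for r in rows if r[-1] == c]
        score = len(subset) / len(rows)
        for i, v in enumerate(query):
            score *= sum(r[i] == v for r in subset) / len(subset)
        scores[c] = score
    # Normalize so the two scores form a probability distribution over C.
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# The new example W=F, X=T, Y=F.
print(naive_bayes(("F", "T", "F")))  # C=T : C=F = 9 : 4, i.e. ≈ 0.692 vs. 0.308
```

By hand: for C=T the unnormalized score is (2/5)(1/2)(1/2)(1/2) = 0.05; for C=F it is (3/5)(1/3)(1/3)(1/3) ≈ 0.022; normalizing gives 9/13 and 4/13.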

- a. Activation levels are propagated from the inputs through the hidden layers to the outputs.
- b. Activation levels are propagated from the outputs through the hidden layers to the inputs.
- c. Weights on the links are modified based on messages propagated from input to output.
- d. Weights on the links are modified based on messages propagated from output to input.
- e. Connections in the network are modified, gradually shortening the path from input to output.
- f. Weights at the input level are compared to the weights at the output level, and modified to reduce the discrepancy.
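To connect these options to the algorithm itself: in a feed-forward back-propagation network, activation levels flow from input to output on the forward pass, while weight updates are driven by an error signal propagated from the output back toward the inputs. A minimal one-hidden-layer sketch (illustrative only; the network size, learning rate, and XOR data are my choices, not from R+N):

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(w_hid, w_out, x):
    # Forward pass: activations flow input -> hidden -> output.
    hid = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
           for w in w_hid]
    out = sigmoid(w_out[0] + sum(wo * h for wo, h in zip(w_out[1:], hid)))
    return hid, out

def mse(w_hid, w_out, data):
    # Mean squared error of the network over a data set.
    return sum((t - forward(w_hid, w_out, x)[1]) ** 2
               for x, t in data) / len(data)

def train(data, n_hid=3, rate=0.5, epochs=10000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
             for _ in range(n_hid)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hid + 1)]
    for _ in range(epochs):
        for x, t in data:
            hid, out = forward(w_hid, w_out, x)
            # Backward pass: the error signal starts at the output and is
            # propagated back toward the inputs; the weights on the links
            # are modified along the way.
            d_out = (t - out) * out * (1 - out)
            d_hid = [d_out * w_out[j + 1] * hid[j] * (1 - hid[j])
                     for j in range(n_hid)]
            w_out[0] += rate * d_out
            for j in range(n_hid):
                w_out[j + 1] += rate * d_out * hid[j]
                w_hid[j][0] += rate * d_hid[j]
                for i, xi in enumerate(x):
                    w_hid[j][i + 1] += rate * d_hid[j] * xi
    return w_hid, w_out

# XOR: the classic function a single perceptron cannot represent.
xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
w_hid, w_out = train(xor)
print(mse(w_hid, w_out, xor))  # mean squared error after training
```

Note that the network's structure (the connections) is fixed throughout; only the weights change, which is what distinguishes the correct options above from the distractors.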

A. perfect classification hypotheses (i.e. classification hypotheses that always give the correct classification, given the values of the predictive attributes) for nominal classifications.

B. imperfect classification hypotheses (i.e. hypotheses that do better than chance) for nominal classifications.

C. approximate classification hypotheses (i.e. hypotheses that give answers that are nearly correct) for numeric classifications.