## Sample Exam Problems

### Problem 1:

Consider the following collection of sentences in propositional logic.
1. P => (Q <=> R)
2. Q <=> ~(R ^ W)
3. Q => (P ^ W)

A. Convert this set to CNF. (You need not show the intermediate steps.)

```
1A. ~P V ~Q V R
1B. ~P V Q V ~R
2A. ~Q V ~R V ~W
2B. Q V R
2C. Q V W
3A. ~Q V P
3B. ~Q V W
```

B. Show how the Davis-Putnam algorithm finds a satisfying assignment. (Assume that, when a branch point is reached, the algorithm chooses the first atom alphabetically and tries TRUE before FALSE.)

```
S0:
1A. ~P V ~Q V R
1B. ~P V Q V ~R
2A. ~Q V ~R V ~W
2B. Q V R
2C. Q V W
3A. ~Q V P
3B. ~Q V W

Try P=T. Delete ~P from 1A and 1B; delete clause 3A.
S1:
1A. ~Q V R
1B. Q V ~R
2A. ~Q V ~R V ~W
2B. Q V R
2C. Q V W
3B. ~Q V W

Try Q=T. Delete ~Q from 1A, 2A, 3B; delete clauses 1B, 2B, 2C.

1A. R
2A. ~R V ~W
3B. W

R and W are singleton clauses. Set R=T, W=T. Delete 1A, 3B. Delete ~R and
~W from 2A, leaving the null clause.

Backtrack to S1. Try Q=F. Delete 1A, 2A, 3B. Delete Q from 1B, 2B, 2C.

1B. ~R
2B. R
2C. W

R and W are singleton clauses. Set R=T, W=T. Delete 2B, 2C. Delete ~R from 1B.
1B is now the null clause; fail.

Backtrack to S0. Try Q=F... wait, rather: try P=F. Delete clauses 1A and 1B; delete P from 3A.

2A. ~Q V ~R V ~W
2B. Q V R
2C. Q V W
3A. ~Q
3B. ~Q V W

3A is a singleton clause. Set Q=F. Delete clauses 2A, 3A, 3B; delete Q from 2B and 2C.

2B. R
2C. W

2B, 2C are singleton clauses. Set R=T, W=T. Delete 2B, 2C. No clauses left,
succeed. Return P=F, Q=F, R=T, W=T.
```
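The hand trace above can be reproduced by a short Davis-Putnam (DPLL) sketch. The encoding below is illustrative, not part of the expected answer: each literal is an (atom, polarity) pair, singleton clauses are propagated first, and branching takes the first atom alphabetically with TRUE before FALSE.

```python
# A minimal sketch of the Davis-Putnam procedure traced above (illustrative).
def dpll(clauses, assignment):
    # Simplify every clause under the current assignment.
    simplified = []
    for clause in clauses:
        if any(assignment.get(atom) == pol for atom, pol in clause):
            continue                          # clause satisfied: delete it
        rest = [(atom, pol) for atom, pol in clause if atom not in assignment]
        if not rest:
            return None                       # null clause: fail
        simplified.append(rest)
    if not simplified:
        return assignment                     # no clauses left: succeed
    # Singleton (unit) clauses force their literal.
    for clause in simplified:
        if len(clause) == 1:
            atom, pol = clause[0]
            return dpll(simplified, {**assignment, atom: pol})
    # Branch point: first atom alphabetically, TRUE before FALSE.
    atom = min(a for clause in simplified for a, _ in clause)
    return (dpll(simplified, {**assignment, atom: True})
            or dpll(simplified, {**assignment, atom: False}))

cnf = [
    [('P', False), ('Q', False), ('R', True)],    # 1A. ~P V ~Q V R
    [('P', False), ('Q', True),  ('R', False)],   # 1B. ~P V Q V ~R
    [('Q', False), ('R', False), ('W', False)],   # 2A. ~Q V ~R V ~W
    [('Q', True),  ('R', True)],                  # 2B. Q V R
    [('Q', True),  ('W', True)],                  # 2C. Q V W
    [('Q', False), ('P', True)],                  # 3A. ~Q V P
    [('Q', False), ('W', True)],                  # 3B. ~Q V W
]
print(dpll(cnf, {}))  # {'P': False, 'Q': False, 'R': True, 'W': True}
```

Running it returns the same model as the trace: P=F, Q=F, R=T, W=T.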

### Problem 2:

What is the result of doing alpha-beta pruning in the game tree shown below?

### Problem 3:

Consider a domain where the individuals are people and languages. Let L be the first-order language with the following primitives:
```
s(X,L) --- Person X speaks language L.
c(X,Y) --- Persons X and Y can communicate.
i(W,X,Y) --- Person W can serve as an interpreter between persons X and Y.
j,p,e,f --- Constants: Joe, Pierre, English, and French respectively.
```
A. Express the following statements in L:
• i. Joe speaks English.
• ii. Pierre speaks French.
• iii. If X and Y both speak L, then X and Y can communicate.
• iv. If W can communicate both with X and with Y, then W can serve as an interpreter between X and Y.
• v. For any two languages L and M, there is someone who speaks both L and M.
• vi. There is someone who can interpret between Joe and Pierre.

Answer:

• i. s(j,e).
• ii. s(p,f).
• iii. forall(X,Y,L) s(X,L) ^ s(Y,L) => c(X,Y).
• iv. forall(W,X,Y) c(X,W) ^ c(Y,W) => i(W,X,Y).
• v. forall(L,M) exists(W) s(W,L) ^ s(W,M).
• vi. exists(W) i(W,j,p).

B. Show how (vi) can be proven from (i)---(v) using resolution. You must show the Skolemized form of each statement, and every resolution that is used in the final proof. You need not show the intermediate stages of Skolemization, or show resolutions that are not used in the final proof.

Answer: The Skolemized forms of (i)-(v) are

• 1. s(j,e).
• 2. s(p,f).
• 3. not s(X,L) V not s(Y,L) V c(X,Y).
• 4. not c(X,W) V not c(Y,W) V i(W,X,Y).
• 5. s(sk1(L,M),L).
• 6. s(sk1(L,M),M).

(sk1(L,M) is a Skolem function mapping L and M to a person who speaks both L and M.)

The Skolemized form of the negation of (vi) is

• 7. not i(W,j,p).

The proof is then as follows:

• 8. not c(j,W) V not c(p,W). (7+4)
• 9. not s(j,L) V not s(W,L) V not c(p,W). (8+3)
• 10. not s(W,e) V not c(p,W). (9+1).
• 11. not s(W,e) V not s(p,L) V not s(W,L). (10+3)
• 12. not s(W,e) V not s(W,f). (11+2)
• 13. not s(sk1(e,M),f). (12+5)
• 14. null clause (13+6)
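The proof can also be replayed mechanically on its ground instances. The sketch below is my own encoding, not part of the expected answer: the Skolem term sk1(e,f) is written as the string `sk1(e,f)`, each ground literal is a (sign, atom) pair, and every step of the proof is one binary resolution.

```python
# A hypothetical checker that replays the resolution proof on ground clauses.
def resolve(c1, c2):
    """Return all binary resolvents of two ground clauses."""
    out = []
    for sign, atom in c1:
        if ((not sign), atom) in c2:
            out.append((c1 - {(sign, atom)}) | (c2 - {((not sign), atom)}))
    return out

F = frozenset
k = 'sk1(e,f)'                     # the Skolem witness for languages e, f
c1  = F({(True,  's(j,e)')})                                        # clause 1
c2  = F({(True,  's(p,f)')})                                        # clause 2
c3a = F({(False, 's(j,e)'), (False, f's({k},e)'), (True, f'c(j,{k})')})
c3b = F({(False, 's(p,f)'), (False, f's({k},f)'), (True, f'c(p,{k})')})
c4  = F({(False, f'c(j,{k})'), (False, f'c(p,{k})'), (True, f'i({k},j,p)')})
c5  = F({(True,  f's({k},e)')})                                     # clause 5
c6  = F({(True,  f's({k},f)')})                                     # clause 6
c7  = F({(False, f'i({k},j,p)')})                                   # negated goal

c8  = resolve(c7, c4)[0]    # ~c(j,k) V ~c(p,k)
c9  = resolve(c8, c3a)[0]   # ~s(j,e) V ~s(k,e) V ~c(p,k)
c10 = resolve(c9, c1)[0]    # ~s(k,e) V ~c(p,k)
c11 = resolve(c10, c3b)[0]  # ~s(k,e) V ~s(p,f) V ~s(k,f)
c12 = resolve(c11, c2)[0]   # ~s(k,e) V ~s(k,f)
c13 = resolve(c12, c5)[0]   # ~s(k,f)
c14 = resolve(c13, c6)[0]   # null clause
print(c14 == frozenset())   # True
```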

### Problem 4:

Let A, B, C be Boolean random variables. Assume that
```
Prob(A=T) = 0.8
Prob(B=T | A=T) = 0.5
Prob(B=T | A=F) = 1
Prob(C=T | B=T) = 0.1
Prob(C=T | B=F) = 0.5
A and C are conditionally independent given B.
```
Evaluate the following terms. (If you wish, you can give your answer as an arithmetic expression such as "0.8*0.5 / (0.8*1 + 0.5*0.1)")
• i. Prob(B=T).
• ii. Prob(A=T | B=T)
• iii. Prob(C=T | A=F).
Answer:

• i. Prob(B=T).
Prob(B=T) = Prob(B=T|A=T) Prob(A=T) + Prob(B=T|A=F) Prob(A=F) = 0.5*0.8 + 1*0.2 = 0.6.
• ii. Prob(A=T | B=T)
Prob(A=T|B=T) = Prob(B=T|A=T) Prob(A=T)/Prob(B=T) = 0.5*0.8/0.6 = 2/3.
• iii. Prob(C=T | A=F).
Prob(C=T|A=F) = Prob(C=T,B=T|A=F) + Prob(C=T,B=F|A=F) =
Prob(C=T|B=T,A=F) Prob(B=T|A=F) + Prob(C=T|B=F,A=F) Prob(B=F|A=F) =
Prob(C=T|B=T) Prob(B=T|A=F) + Prob(C=T|B=F) Prob(B=F|A=F) = 0.1*1 + 0.5 * 0 = 0.1.
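The three computations can be checked numerically. This is an illustrative sketch using the given table entries (variable names are my own):

```python
# A quick numeric check of parts (i)-(iii).
pA = 0.8                                  # Prob(A=T)
pB_given_A = {True: 0.5, False: 1.0}      # Prob(B=T | A)
pC_given_B = {True: 0.1, False: 0.5}      # Prob(C=T | B)

# (i) Marginalize over A.
pB = pB_given_A[True] * pA + pB_given_A[False] * (1 - pA)

# (ii) Bayes' rule.
pA_given_B = pB_given_A[True] * pA / pB

# (iii) Sum over B; C is conditionally independent of A given B.
pB_given_notA = pB_given_A[False]
pC_given_notA = (pC_given_B[True] * pB_given_notA
                 + pC_given_B[False] * (1 - pB_given_notA))

print(round(pB, 3), round(pA_given_B, 3), round(pC_given_notA, 3))
# 0.6 0.667 0.1
```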

### Problem 5:

Consider the following data set with three Boolean predictive attributes X, Y, Z and Boolean classification C.
```
 X   Y   Z   C   number of instances
------------------------------------
 T   T   T   T   7
 T   T   F   F   1
 T   F   T   T   2
 F   T   T   F   1
 F   F   T   F   8
 F   F   F   T   1
```
A. What classifier does the 1R algorithm output?

Answer: The best rule based on X would be "if (X==T) then predict C=T else predict C=F", with accuracy 18/20.
The best rule based on Y would be "if (Y==T) then predict C=T else predict C=F", with accuracy 15/20.
No rule based on Z gets more than half right for either value of Z.
Hence, return the rule based on X: "if (X==T) then predict C=T else predict C=F", with accuracy 18/20.
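The scores above can be verified by a short sketch of the 1R scoring loop (the row encoding is my own; the last element of each row is the instance count):

```python
# A sketch of 1R scoring on this data set (illustrative only).
data = [  # (X, Y, Z, C, count)
    (True,  True,  True,  True,  7),
    (True,  True,  False, False, 1),
    (True,  False, True,  True,  2),
    (False, True,  True,  False, 1),
    (False, False, True,  False, 8),
    (False, False, False, True,  1),
]
total = sum(row[4] for row in data)

def one_r_score(attr):
    """Correct predictions of the best single-attribute rule:
    for each value of the attribute, predict the majority class."""
    correct = 0
    for value in (True, False):
        hits = {True: 0, False: 0}
        for row in data:
            if row[attr] == value:
                hits[row[3]] += row[4]
        correct += max(hits.values())
    return correct

for name, attr in (('X', 0), ('Y', 1), ('Z', 2)):
    print(name, one_r_score(attr), '/', total)  # X: 18/20, Y: 15/20, Z: 10/20
```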

B. What classification does Naive Bayes predict for a test instance with X=F, Y=T, Z=F, and what probability does it give to that prediction?

Answer: Let A = P(X=F,Y=T,Z=F). Then
P(C=T|X=F,Y=T,Z=F) = (1/A) * Freq(C=T) * Freq(X=F|C=T) * Freq(Y=T|C=T) * Freq(Z=F|C=T) = (1/A) * (1/2) * (1/10) * (7/10) * (1/10) = 0.0035/A.

P(C=F|X=F,Y=T,Z=F) = (1/A) * Freq(C=F) * Freq(X=F|C=F) * Freq(Y=T|C=F) * Freq(Z=F|C=F) = (1/A) * (1/2) * (9/10) * (2/10) * (1/10) = 0.009/A.

Since the two probabilities must sum to 1, A = 0.0035 + 0.009 = 0.0125.
Predict C=F with probability 0.009/0.0125 = 0.72.
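The arithmetic can be checked directly. This is an illustrative sketch; the frequencies are read off the table, and True/False encode T/F:

```python
# Checking the Naive Bayes arithmetic for the test instance X=F, Y=T, Z=F.
freq_C = {True: 10/20, False: 10/20}      # Freq(C)
cond = {                                   # Freq(X=F|C) * Freq(Y=T|C) * Freq(Z=F|C)
    True:  (1/10) * (7/10) * (1/10),
    False: (9/10) * (2/10) * (1/10),
}
unnorm = {c: freq_C[c] * cond[c] for c in (True, False)}
alpha = sum(unnorm.values())               # the normalizing constant A
posterior_F = unnorm[False] / alpha
print(round(unnorm[True], 4), round(unnorm[False], 4), round(posterior_F, 2))
# 0.0035 0.009 0.72
```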

C. If the ID3 algorithm is run on this, the top-level node of the decision tree will be a test of X. Show the entire decision tree. (The remaining entropy comparisons are trivial.)

```
             (X)
      T ------|------ F
     (Z)             (Z)
   T--|--F         T--|--F
   |     |         |     |
  C=T   C=F       C=F   C=T
```
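As an optional check that X is indeed the top-level test, one can compare the expected ("remaining") entropy of C after splitting on each attribute. This sketch is my own, not part of the expected answer:

```python
# Remaining entropy of C after a root split on each attribute (illustrative).
from math import log2

data = [  # (X, Y, Z, C, count)
    (True,  True,  True,  True,  7),
    (True,  True,  False, False, 1),
    (True,  False, True,  True,  2),
    (False, True,  True,  False, 1),
    (False, False, True,  False, 8),
    (False, False, False, True,  1),
]

def entropy(pos, neg):
    if pos == 0 or neg == 0:
        return 0.0
    p = pos / (pos + neg)
    return -p * log2(p) - (1 - p) * log2(1 - p)

def remaining_entropy(attr):
    """Weighted average entropy of C over the two branches of the split."""
    total = sum(row[4] for row in data)
    rem = 0.0
    for value in (True, False):
        pos = sum(row[4] for row in data if row[attr] == value and row[3])
        neg = sum(row[4] for row in data if row[attr] == value and not row[3])
        rem += (pos + neg) / total * entropy(pos, neg)
    return rem

for name, attr in (('X', 0), ('Y', 1), ('Z', 2)):
    print(name, round(remaining_entropy(attr), 3))  # X is the smallest
```

Splitting on Z leaves entropy 1.0 (each branch is an even split), while X leaves the least, so ID3 puts X at the root.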
D. Give an argument that no linear discriminator can perfectly classify this data set. (Hint: Compare the effect of attribute Z on C in going from the first line to the second line against its effect in going from the 5th line to the 6th line.)

Answer: The only difference between line 1 and line 2 in the predictive attributes is that Z changes from T to F, causing a change of C from T to F. Therefore in a perfect linear discriminator, Z must have a positive weight. The only difference between line 5 and line 6 in the predictive attributes is that Z changes from T to F, causing a change of C from F to T. Therefore in a perfect linear discriminator, Z must have a negative weight. They can't both be true.
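A brute-force companion to this argument: no weight vector in a small integer grid classifies all six rows correctly. (The grid search is only illustrative; the sign contradiction is the actual proof, since it rules out all real-valued weights.)

```python
# Illustrative grid search for a linear discriminator w.x + b > 0 (none exists).
rows = [((1, 1, 1), 1), ((1, 1, 0), 0), ((1, 0, 1), 1),
        ((0, 1, 1), 0), ((0, 0, 1), 0), ((0, 0, 0), 1)]

def classifies_all(w, b):
    """True iff the hyperplane w.x + b = 0 separates all rows correctly."""
    return all((w[0]*x + w[1]*y + w[2]*z + b > 0) == bool(c)
               for (x, y, z), c in rows)

found = any(classifies_all((wx, wy, wz), b)
            for wx in range(-5, 6) for wy in range(-5, 6)
            for wz in range(-5, 6) for b in range(-5, 6))
print(found)  # False
```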

### Problem 6

Explain briefly (2 or 3 sentences) the use of a training set, a validation set and a test set in evaluating learning programs.

Answer: A training set is used by the base level learning algorithm to output a classifier. A validation set is used multiple times by a higher level process to compare different classifiers and to choose the best learning algorithm for the task or the best tuning for parameters in the learning algorithm. A test set is (ideally) used once to evaluate the quality of a classifier.