1. P => (Q <=> R).

2. Q <=> ~(R^W).

3. Q => (P^W).

A. Convert this set to CNF. (You need not show the intermediate steps.)

**Answer:**

1A. ~P V ~Q V R. 1B. ~P V Q V ~R.
2A. ~Q V ~R V ~W. 2B. Q V R. 2C. Q V W.
3A. ~Q V P. 3B. ~Q V W.
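As a sanity check, a brute-force truth-table comparison (a sketch of my own, not part of the required answer) confirms that these seven clauses are jointly equivalent to the three original sentences:

```python
# Compare the original sentences and the CNF on all 16 truth assignments.
from itertools import product

def iff(a, b):  # a <=> b
    return a == b

for P, Q, R, W in product([True, False], repeat=4):
    original = ((not P) or iff(Q, R)) and iff(Q, not (R and W)) and ((not Q) or (P and W))
    cnf = (((not P) or (not Q) or R) and ((not P) or Q or (not R)) and      # 1A, 1B
           ((not Q) or (not R) or (not W)) and (Q or R) and (Q or W) and    # 2A, 2B, 2C
           ((not Q) or P) and ((not Q) or W))                               # 3A, 3B
    assert original == cnf
print("CNF matches the original sentences on all 16 assignments.")
```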

B. Show how the Davis-Putnam algorithm finds a satisfying assignment. (Assume that, when a branch point is reached, the algorithm chooses the first atom alphabetically and tries TRUE before FALSE.)

S0: 1A. ~P V ~Q V R. 1B. ~P V Q V ~R. 2A. ~Q V ~R V ~W. 2B. Q V R. 2C. Q V W. 3A. ~Q V P. 3B. ~Q V W.

Try P=T. Delete ~P from 1A and 1B; delete clause 3A (it contains P).

S1: 1A. ~Q V R. 1B. Q V ~R. 2A. ~Q V ~R V ~W. 2B. Q V R. 2C. Q V W. 3B. ~Q V W.

Try Q=T. Delete ~Q from 1A, 2A, 3B; delete clauses 1B, 2B, 2C.

1A. R. 2A. ~R V ~W. 3B. W.

R and W are singleton clauses. Set R=T, W=T. Delete clauses 1A and 3B. Delete ~R and ~W from 2A, leaving the null clause. Fail; backtrack to S1.

Try Q=F. Delete clauses 1A, 2A, 3B. Delete Q from 1B, 2B, 2C.

1B. ~R. 2B. R. 2C. W.

R and W are singleton clauses. Set R=T, W=T. Delete clauses 2B and 2C. Delete ~R from 1B, leaving the null clause. Fail; backtrack to S0.

Try P=F. Delete clauses 1A and 1B; delete P from 3A.

2A. ~Q V ~R V ~W. 2B. Q V R. 2C. Q V W. 3A. ~Q. 3B. ~Q V W.

3A is a singleton clause. Set Q=F. Delete clauses 2A, 3A, 3B; delete Q from 2B and 2C.

2B. R. 2C. W.

2B and 2C are singleton clauses. Set R=T, W=T. Delete clauses 2B and 2C. No clauses are left; succeed. Return P=F, Q=F, R=T, W=T.
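For comparison, here is a minimal DPLL sketch (my own illustration, not the course's code) that follows the same policy as the trace above — unit propagation first, then branching on the alphabetically first atom, trying TRUE before FALSE — and reaches the same assignment:

```python
# Clauses are sets of literals; "~X" is the negation of atom "X".
def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def dpll(clauses, assignment):
    # Unit propagation: repeatedly satisfy singleton clauses.
    while True:
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        assignment = assignment | {lit}
        clauses = [c - {neg(lit)} for c in clauses if lit not in c]
        if set() in clauses:          # null clause produced
            return None
    if not clauses:
        return assignment
    # Branch on the alphabetically first atom, trying TRUE before FALSE.
    atom = min(l.lstrip("~") for c in clauses for l in c)
    for lit in (atom, "~" + atom):
        reduced = [c - {neg(lit)} for c in clauses if lit not in c]
        if set() not in reduced:
            result = dpll(reduced, assignment | {lit})
            if result is not None:
                return result
    return None

S0 = [set(c) for c in (["~P","~Q","R"], ["~P","Q","~R"], ["~Q","~R","~W"],
                       ["Q","R"], ["Q","W"], ["~Q","P"], ["~Q","W"])]
print(dpll(S0, set()))   # e.g. {'~P', '~Q', 'R', 'W'}, i.e. P=F, Q=F, R=T, W=T
```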

- s(X,L) --- Person X speaks language L.
- c(X,Y) --- Persons X and Y can communicate.
- i(W,X,Y) --- Person W can serve as an interpreter between persons X and Y.
- j, p, e, f --- Constants: Joe, Pierre, English, and French respectively.

A. Express the following statements in L:

- i. Joe speaks English.
- ii. Pierre speaks French.
- iii. If X and Y both speak L, then X and Y can communicate.
- iv. If W can communicate both with X and with Y, then W can serve as an interpreter between X and Y.
- v. For any two languages L and M, there is someone who speaks both L and M.
- vi. There is someone who can interpret between Joe and Pierre.

**Answer:**

- i. s(j,e).
- ii. s(p,f).
- iii. forall(X,Y,L) s(X,L) ^ s(Y,L) => c(X,Y)
- iv. forall(W,X,Y) c(X,W) ^ c(Y,W) => i(W,X,Y).
- v. forall(L,M) exists(W) s(W,L) ^ s(W,M)
- vi. exists(W) i(W,j,p)

B. Using resolution, prove statement (vi) from statements (i)-(v).

**Answer:** The Skolemized forms of (i)-(v) are

- 1. s(j,e).
- 2. s(p,f).
- 3. not s(X,L) V not s(Y,L) V c(X,Y).
- 4. not c(X,W) V not c(Y,W) V i(W,X,Y).
- 5. s(sk1(L,M),L).
- 6. s(sk1(L,M),M).

(sk1(L,M) is a Skolem function mapping L and M to a person who speaks both L and M.)

The Skolemized form of the negation of (vi) is

- 7. not i(W,j,p).

The proof is then as follows:

- 8. not c(j,W) V not c(p,W). (7+4)
- 9. not s(j,L) V not s(W,L) V not c(p,W). (8+3)
- 10. not s(W,e) V not c(p,W). (9+1).
- 11. not s(W,e) V not s(p,L) V not s(W,L). (10+3)
- 12. not s(W,e) V not s(W,f). (11+2)
- 13. not s(sk1(e,M),f). (12+5)
- 14. null clause (13+6)
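The unification behind each resolution step can be made concrete with a small sketch (my own, using a simple term representation and omitting the occurs check), shown here for the first step: resolving clause 7 against the i(W,X,Y) literal of clause 4.

```python
# Variables are capitalized strings, constants are lowercase strings,
# compound terms are tuples such as ('sk1', 'L', 'M').
def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def unify(x, y, subst):
    """Return a substitution extending `subst` that unifies x and y, or None."""
    if subst is None:
        return None
    if x == y:
        return subst
    if is_var(x):
        return unify_var(x, y, subst)
    if is_var(y):
        return unify_var(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            subst = unify(xi, yi, subst)
        return subst
    return None

def unify_var(v, t, subst):
    if v in subst:
        return unify(subst[v], t, subst)
    return {**subst, v: t}

# Literal i(W,j,p) from clause 7 vs. literal i(W,X,Y) from clause 4:
print(unify(('i', 'W', 'j', 'p'), ('i', 'W', 'X', 'Y'), {}))
# -> {'X': 'j', 'Y': 'p'}; applying this to the remaining literals of clause 4
#    gives clause 8: not c(j,W) V not c(p,W).
```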

Prob(A=T) = 0.8. Prob(B=T | A=T) = 0.5. Prob(B=T | A=F) = 1. Prob(C=T | B=T) = 0.1. Prob(C=T | B=F) = 0.5. A and C are conditionally independent given B.

Evaluate the following terms. (If you wish, you can give your answer as an arithmetic expression such as "0.8*0.5 / (0.8*1 + 0.5*0.1)")

- i. Prob(B=T).
- ii. Prob(A=T | B=T)
- iii. Prob(C=T | A=F).

**Answer:**

- i. Prob(B=T) = Prob(B=T|A=T) Prob(A=T) + Prob(B=T|A=F) Prob(A=F) = 0.5*0.8 + 1*0.2 = 0.6.

- ii. Prob(A=T|B=T) = Prob(B=T|A=T) Prob(A=T) / Prob(B=T) = 0.5*0.8/0.6 = 2/3.

- iii. Prob(C=T|A=F) = Prob(C=T,B=T|A=F) + Prob(C=T,B=F|A=F) = Prob(C=T|B=T,A=F) Prob(B=T|A=F) + Prob(C=T|B=F,A=F) Prob(B=F|A=F) = Prob(C=T|B=T) Prob(B=T|A=F) + Prob(C=T|B=F) Prob(B=F|A=F) = 0.1*1 + 0.5*0 = 0.1.
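These three numbers can also be checked mechanically; the sketch below (mine, with variable names chosen for readability) just re-evaluates the same expressions from the given conditional probabilities:

```python
# CPT entries given in the problem.
p_A = 0.8
p_B_given_A = {True: 0.5, False: 1.0}
p_C_given_B = {True: 0.1, False: 0.5}

p_B = p_B_given_A[True] * p_A + p_B_given_A[False] * (1 - p_A)       # i:   0.6
p_A_given_B = p_B_given_A[True] * p_A / p_B                          # ii:  2/3 (Bayes' rule)
p_C_given_notA = (p_C_given_B[True] * p_B_given_A[False]             # iii: 0.1 (sum over B,
                  + p_C_given_B[False] * (1 - p_B_given_A[False]))   #      using C indep. of A given B)
print(p_B, p_A_given_B, p_C_given_notA)
```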

| X | Y | Z | C | Number of instances |
|---|---|---|---|---------------------|
| T | T | T | T | 7 |
| T | T | F | F | 1 |
| T | F | T | T | 2 |
| F | T | T | F | 1 |
| F | F | T | F | 8 |
| F | F | F | T | 1 |

A. What classifier does the 1R algorithm output?

**Answer:**

The best rule based on X would be "if (X==T) then predict C=T else predict C=F", with accuracy 18/20.

The best rule based on Y would be "if (Y==T) then predict C=T else predict C=F", with accuracy 15/20.

No rule based on Z gets more than 1/2 right for either value of Z.

Hence, return the rule based on X: "if (X==T) then predict C=T else predict C=F", with accuracy 18/20.

B. What classification does Naive Bayes predict for a test instance with X=F, Y=T, Z=F, and what probability does it give to that prediction?

**Answer:**

Let A = P(X=F,Y=T,Z=F). Then P(C=T|X=F,Y=T,Z=F) = (1/A) * Freq(C=T) * Freq(X=F|C=T) * Freq(Y=T|C=T) * Freq(Z=F|C=T) = (1/A) * (1/2) * (1/10) * (7/10) * (1/10) = 0.0035/A.

P(C=F|X=F,Y=T,Z=F) = (1/A) * Freq(C=F) * Freq(X=F|C=F) * Freq(Y=T|C=F) * Freq(Z=F|C=F) = (1/A) * (1/2) * (9/10) * (2/10) * (1/10) = 0.009/A.

Since these two probabilities must sum to 1, A = 0.009 + 0.0035 = 0.0125.

Predict C=F with probability 0.009/0.0125 = 0.72.
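The same counts can be recomputed mechanically; the sketch below (mine, repeating the hard-coded table so it runs on its own) reproduces the 0.0035 and 0.009 scores and the 0.72 posterior:

```python
# Rows are (X, Y, Z, C, count); the query instance is X=F, Y=T, Z=F.
rows = [("T","T","T","T",7), ("T","T","F","F",1), ("T","F","T","T",2),
        ("F","T","T","F",1), ("F","F","T","F",8), ("F","F","F","T",1)]
query = {"X": "F", "Y": "T", "Z": "F"}
total = sum(r[4] for r in rows)

scores = {}
for c in ("T", "F"):
    class_count = sum(r[4] for r in rows if r[3] == c)
    score = class_count / total                       # Freq(C=c)
    for idx, attr in enumerate("XYZ"):
        match = sum(r[4] for r in rows if r[3] == c and r[idx] == query[attr])
        score *= match / class_count                  # Freq(attr = query value | C=c)
    scores[c] = score

A = sum(scores.values())                              # = 0.0035 + 0.009 = 0.0125
print(scores, scores["F"] / A)                        # predict C=F with probability 0.72
```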

C. If the ID3 algorithm is run on this data set, the top-level node of the decision tree will be a test of X. Show the entire decision tree. (The remaining entropy comparisons are trivial.)

**Answer:**

- If X=T: test Z.
    - If Z=T, predict C=T.
    - If Z=F, predict C=F.
- If X=F: test Z.
    - If Z=T, predict C=F.
    - If Z=F, predict C=T.

D. Give an argument that no linear discriminator can perfectly classify this data set. (Hint: Compare the effect of attribute Z on C in going from the first line to the second line against its effect in going from the 5th line to the 6th line.)

**Answer:**
The only difference between line 1 and line 2 in the predictive attributes
is that Z changes from T to F, causing a change of C from T to F. Therefore
in a perfect linear discriminator, Z must have a positive weight.
The only difference between line 5 and line 6 in the predictive attributes
is that Z changes from T to F, causing a change of C from F to T. Therefore
in a perfect linear discriminator, Z must have a negative weight. They
can't both be true.
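The same conclusion can be reached computationally: encoding "correct on every row with a margin of 1" as a linear program and testing feasibility (a sketch of my own; it assumes scipy is available) reports that no such weights and threshold exist.

```python
from scipy.optimize import linprog

# (X, Y, Z) with T=1, F=0, and the class C as 1/0, one entry per distinct row.
rows = [((1,1,1), 1), ((1,1,0), 0), ((1,0,1), 1),
        ((0,1,1), 0), ((0,0,1), 0), ((0,0,0), 1)]

# Require w.x + b >= 1 on C=T rows and w.x + b <= -1 on C=F rows;
# with this margin, feasibility is equivalent to strict linear separability.
A_ub, b_ub = [], []
for x, c in rows:
    sign = -1 if c == 1 else 1
    A_ub.append([sign * xi for xi in x] + [sign])   # unknowns: (wX, wY, wZ, b)
    b_ub.append(-1)

res = linprog(c=[0, 0, 0, 0], A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 4)
print(res.status)   # 2 = infeasible: no linear discriminator classifies all rows correctly
```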

Explain the difference between a training set, a validation set, and a test set.

**Answer:**
A training set is used by the base-level learning algorithm to output a classifier. A validation set is used, possibly many times, by a higher-level process to compare different classifiers and to choose the best learning algorithm for the task or the best settings of the parameters of the learning algorithm. A test set is (ideally) used once, to evaluate the quality of a classifier.