Artificial Intelligence Midterm Exam: Solutions

Problem 1 (35 points)

Suppose that the following facts and rules have been entered into Prolog:

p(a, [a,a,a]).
p(a, [a,b]).
p(b, [b,a]).
p(b, [a,b]).
p(c, [a,c]).

q(X,M) :- p(X, [X,Y|L]), !, p(Y, M).
q(X,M) :- p(X, [M,X]).

For each of the following queries, list all the answers that Prolog will give on successive backtrackings. (For example, for the query "?- p(a,L)." the answer would be "First Prolog returns L=[a,a,a]; then L=[a,b]; then fails.")

A. ?- p(X, [X|L]).
B. ?- p(X, [X,Y|L]).
C. ?- p(X, [a|X]).
D. ?- q(c,Z).
E. ?- q(S,T).

Answer:

A. Succeeds with X=a, L=[a,a]; then with X=a, L=[b]; then with X=b, L=[a]; then fails.
B. Succeeds with X=a, Y=a, L=[a]; then with X=a, Y=b, L=[]; then with X=b, Y=a, L=[]; then fails.
C. Fails.
D. Succeeds with Z=a (using the second rule for q); then fails.
E. Succeeds with S=a, T=[a,a,a]; then with S=a, T=[a,b]; then fails. (The cut in the first rule for q commits Prolog to the first solution of p(X,[X,Y|L]), so neither the remaining p facts nor the second rule for q is ever tried.)

Problem 2

A. We wish to write a Prolog predicate "int(N)" that returns successive values of the integers on successive backtrackings. That is, the query "?- int(N)" first succeeds with N=0; then on backtracking succeeds with N=1; then with N=2; and so on. Which of the following is the correct code for int? (Only one is correct.)

i. int(N) :- 0.
   int(N) :- int(M) + 1.

ii. int(0).
    int(N) :- int(M), N is M+1.

iii. int(0).
     int(N) :- int(N-1).

iv. int(0).
    int(N) :- int(N), N is N+1.

v. int(0).
   int(N) :- M is N-1, int(M).

Answer: ii.

B. Suppose we have the following Prolog definition:

w(N) :- int(I), int(J), N is (10 * J) + I.

What values are returned for the query "?- w(N)" on successive backtrackings?

Answer: 0, 10, 20, 30, ... Note that, since the call int(J) backtracks forever, Prolog never backtracks to the call int(I); I stays at 0 while J runs through 0, 1, 2, ...
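The behavior in Problem 2B can be mimicked in Python (a sketch for illustration, not part of the exam): a conjunction of two infinite backtracking goals corresponds to two nested loops, and since the inner loop never exhausts, the outer variable never advances.

```python
from itertools import count, islice

def ints():
    """Analogue of int/1: succeeds with 0, 1, 2, ... on successive backtrackings."""
    yield from count(0)

def w():
    """Analogue of w(N) :- int(I), int(J), N is (10 * J) + I.

    Backtracking re-tries the most recent choice point first, i.e. the
    inner loop; since int(J) never runs out of solutions, int(I) is never
    re-tried and I stays at 0."""
    for i in ints():        # outer choice point: never revisited
        for j in ints():    # inner choice point: backtracks forever
            yield 10 * j + i

print(list(islice(w(), 5)))  # [0, 10, 20, 30, 40]
```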
Problem 3 (40 points)

Consider the following simple context-free grammar:

S -> NP VP
NP -> <pronoun> | {<determiner>} <adjective>* <noun> PP*
VP -> {<modal>} <verb> {NP} {NP} PP*
PP -> <preposition> NP

<adjective> : cold
<determiner> : the, a
<modal> : can
<noun> : fish, river, swim, cold
<preposition> : for, in
<pronoun> : I
<verb> : can, fish, swim, caught

(Here {X} means that X is optional, and X* means zero or more occurrences of X.)

(Note: "can" as a verb means "put in a can"; e.g. "I am busy canning tomatoes today." "Swim" is a noun, as in "John went for a swim.")

A. Show all the parse trees for "I can fish".
B. The natural reading of "I can fish" is "I am able to fish", rather than "I put fish into cans". What strategy could a program use to disambiguate this sentence correctly?
C. Show the two parse trees for "I caught a cold fish".
D. What strategy could a program use to choose the correct parse tree for "I caught a cold fish"?
E. In "I caught a fish", the word "caught" means "captured". In "I caught a cold", the word "caught" means "became ill with". What strategy could be used to find the correct meaning of "caught" in each sentence (supposing that these were the only two possibilities)?

Answer:

A. There are two parse trees:

S ---> NP ---> <pronoun> ---> I
|
|-> VP ---> <modal> ---> can
    |
    |-> <verb> ---> fish

S ---> NP ---> <pronoun> ---> I
|
|-> VP ---> <verb> ---> can
    |
    |-> NP ---> <noun> ---> fish

B. Simple lexical frequency is best here: "can" is very much more often used as the modal meaning "able to" than as the verb meaning "put in cans". (Incidentally, in spoken English, the two are distinguished. If the meaning is "I am able to fish", the word "can" would be unstressed and the vowel would be likely to shift to "uh". If the meaning were "put into cans", the word would have to get at least equal stress and the vowel enunciated clearly.)

C.

S ---> NP ---> <pronoun> ---> I
|
|-> VP ---> <verb> ---> caught
    |
    |-> NP ---> <determiner> ---> a
        |
        |-> <adjective> ---> cold
        |
        |-> <noun> ---> fish

S ---> NP ---> <pronoun> ---> I
|
|-> VP ---> <verb> ---> caught
    |
    |-> NP ---> <determiner> ---> a
    |   |
    |   |-> <noun> ---> cold
    |
    |-> NP ---> <noun> ---> fish

D. The English construction with two NP's following the verb is used in sentences like "John gave Mary a rose" or "John told Mary a story".
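The ambiguity counts in parts A and C can be checked mechanically. The sketch below is a brute-force parser for a reconstruction of the exam grammar (the RULES/LEX encoding is ours, not part of the exam): {X} is encoded as "X?", X* as "X*", and the VP rule's two optional NPs are expanded into three alternatives so that a verb with one object is not counted twice.

```python
# Reconstructed exam grammar; suffix "?" = optional, "*" = zero or more.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["pronoun"], ["determiner?", "adjective*", "noun", "PP*"]],
    "VP": [["modal?", "verb", "PP*"],
           ["modal?", "verb", "NP", "PP*"],
           ["modal?", "verb", "NP", "NP", "PP*"]],
    "PP": [["preposition", "NP"]],
}
LEX = {
    "cold": {"adjective", "noun"}, "the": {"determiner"}, "a": {"determiner"},
    "can": {"modal", "verb"}, "fish": {"noun", "verb"}, "river": {"noun"},
    "swim": {"noun", "verb"}, "for": {"preposition"}, "in": {"preposition"},
    "I": {"pronoun"}, "caught": {"verb"},
}

def derive(sym, words, i):
    """Yield one end position j per way of deriving words[i:j] from sym."""
    if sym in RULES:
        for rhs in RULES[sym]:
            yield from seq(rhs, words, i)
    elif i < len(words) and sym in LEX.get(words[i], ()):
        yield i + 1

def seq(symbols, words, i):
    """Yield one end position per way of deriving a prefix from the sequence."""
    if not symbols:
        yield i
        return
    head, rest = symbols[0], symbols[1:]
    if head.endswith("?"):            # optional: skip it, or take it once
        yield from seq(rest, words, i)
        yield from seq([head[:-1]] + rest, words, i)
    elif head.endswith("*"):          # zero or more repetitions
        yield from seq(rest, words, i)
        for j in derive(head[:-1], words, i):
            yield from seq(symbols, words, j)
    else:
        for j in derive(head, words, i):
            yield from seq(rest, words, j)

def parses(sentence):
    """Number of distinct parse trees spanning the whole sentence."""
    words = sentence.split()
    return sum(1 for j in derive("S", words, 0) if j == len(words))

print(parses("I can fish"))            # 2, as in part A
print(parses("I caught a cold fish"))  # 2, as in part C
```

Exhaustive enumeration like this is exponential in the worst case; it is only meant to confirm the hand-drawn trees above, not to stand in for a chart parser.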
Here, the second NP is the object of the verb, and the first NP is the indirect object, the recipient of the action. (I did mention this in class, but I should have reminded you of it on the exam. My apologies.) One can use "catch" in this construction, e.g. "I caught Mary a fish for breakfast," but there is a selectional restriction: the indirect object has to be the kind of thing that can possess or own the direct object, either animate or an organization. One cannot catch fish for a cold, because a cold is not animate and cannot own fish. This was probably too hard a problem for this setting, so I have been lenient in accepting reasonable answers.

E. The word "caught" meaning "seized" has a selectional restriction that it takes a physical object as its object. The word "caught" meaning "became ill with" has a selectional restriction that it takes a contagious disease as its object.

Problem 4 (10 points)

A. Let w[1], w[2], ..., w[k] be a sequence of k words. Which of the following formulas expresses the trigram assumption? (Only one is correct.) Explain the formula in words.

i. P(w[I]) = P(w[I] | w[1] w[2] ... w[I])
ii. P(w[I]) = P(w[I] | w[1] w[2] ... w[I-1])
iii. P(w[I]) = P(w[I] | w[I-2] w[I-1])
iv. P(w[I] | w[1] w[2] ... w[I-1]) = P(w[I] | w[I-1])
v. P(w[I] | w[1] w[2] ... w[I-1]) = P(w[I] | w[I-2] w[I-1])

Answer: v. The probability of finding a specified word w[I] after a specified sequence w[1] ... w[I-1] depends only on the two most recent words, w[I-2] and w[I-1].

(Incidentally, the meaning of formula iii would be this: the probability of finding word w[I] after words w[I-2] and w[I-1] is the same as the probability of finding word w[I] given no information about the other words in the sentence. In other words, this formula would mean that w[I] is independent of the two previous words in the sentence.)

B.
In reading a new text, it is quite likely that you will encounter a triple of consecutive words that you have never encountered anywhere else. Why is this a problem for building a usable trigram model of natural language? Describe a solution that is commonly used for this problem.

Answer: The problem is that, if you estimate the probability P(w[I] | w[I-2] w[I-1]) as the frequency ratio

freq(w[I] | w[I-2] w[I-1]) = #(w[I-2] w[I-1] w[I]) / #(w[I-2] w[I-1])

and the sequence w[I-2] w[I-1] w[I] has never been encountered before, then it will be assigned a probability of zero. That is, it will be treated as impossible, and any other reading, no matter how implausible, will be preferred.

The solution is to use smoothing, where the probability P(w[I] | w[I-2] w[I-1]) is estimated as a weighted sum, combining the unigram model, the bigram model, and the trigram model:

P(w[I] | w[I-2] w[I-1]) = L1*freq(w[I]) + L2*freq(w[I] | w[I-1]) + L3*freq(w[I] | w[I-2] w[I-1])

where the weights L1, L2, L3 are positive and sum to 1. That is, each word w[I] has a certain probability of turning up in an entirely new context, and each word has a certain probability of following a word w[I-1] that it has been seen with before.
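The interpolation formula above can be sketched in a few lines of Python. The toy corpus and the weight values are ours, chosen only for illustration; in practice the weights would be tuned on held-out data.

```python
from collections import Counter

# Toy corpus; its counts stand in for those of a large training text.
corpus = "the cat sat on the mat the cat ate the rat".split()

uni = Counter(corpus)                          # #(w)
bi  = Counter(zip(corpus, corpus[1:]))         # #(v w)
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))  # #(u v w)
N = len(corpus)

def p_interp(w, u, v, l1=0.1, l2=0.3, l3=0.6):
    """Interpolated estimate of P(w | u v).

    Weighted sum of the unigram, bigram, and trigram relative
    frequencies; l1 + l2 + l3 = 1. A trigram (u, v, w) never seen in
    the corpus still gets nonzero probability from the lower-order terms."""
    p1 = uni[w] / N
    p2 = bi[(v, w)] / uni[v] if uni[v] else 0.0
    p3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
    return l1 * p1 + l2 * p2 + l3 * p3

# "cat sat" is never followed by "ate" in the corpus, yet the estimate is > 0:
print(p_interp("ate", "cat", "sat") > 0)   # True
```

The raw trigram estimate would assign "ate" probability zero in this context; the unigram term keeps the smoothed estimate positive, which is exactly the point of the weighted sum.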