Note: As always, it is important to use good test-taking strategy: first work on the problems you feel confident about, then tackle the ones you are not sure about. In particular, problem 3.D, though not actually difficult once you have completed 3.C, could take you off on a wild goose chase if you take a wrong turn. If you are not confident about backward chaining, you should probably leave it until last. There is no need to have the answers in sequential order in the exam booklet.

~(P^Q) <=> R.

**Answer:**

P V R.

Q V R.

~R V ~P V ~Q.
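The conversion is small enough to verify by exhaustive truth-table checking. A quick sketch (Python written for this answer key, not part of the exam):

```python
from itertools import product

# Check that the three CNF clauses are jointly equivalent
# to the original formula ~(P^Q) <=> R.
def original(p, q, r):
    return (not (p and q)) == r

def cnf(p, q, r):
    return (p or r) and (q or r) and ((not r) or (not p) or (not q))

# compare the two truth tables over all 8 valuations
assert all(original(p, q, r) == cnf(p, q, r)
           for p, q, r in product([True, False], repeat=3))
```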

- What is the state space? That is, what is a state, and what are the neighbors of a given state?

**Answer:** A state is a valuation on the atoms. A neighbor of state S is a valuation that differs from S on a single atom.

- What is the error function?

**Answer:** The number of unsatisfied clauses.

- What is the major limitation of this algorithm as compared to Davis-Putnam?

**Answer:** The algorithm is only likely to give you an answer; it is not guaranteed to find one. In particular, if the algorithm does not find a satisfying valuation, you don't know whether that is because no such valuation exists or just because the algorithm has been unlucky.
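The state space, neighbor relation, and error function above can be sketched as a greedy local search. Everything concrete here (representing a clause as a list of (atom, polarity) pairs, the tie-breaking, the flip budget) is an illustrative assumption, not the exam's prescribed algorithm:

```python
import random

def unsatisfied(clauses, state):
    # a clause is satisfied if some literal (atom, polarity) holds
    return [c for c in clauses
            if not any(state[a] == pol for a, pol in c)]

def hill_climb(clauses, atoms, max_flips=1000, seed=0):
    rng = random.Random(seed)
    state = {a: rng.choice([True, False]) for a in atoms}
    for _ in range(max_flips):
        if not unsatisfied(clauses, state):
            return state              # satisfying valuation found
        def error_after_flip(a):      # error of the neighbor that flips a
            state[a] = not state[a]
            e = len(unsatisfied(clauses, state))
            state[a] = not state[a]
            return e
        best = min(atoms, key=error_after_flip)
        state[best] = not state[best]
    return None                       # may fail even if a solution exists

# the CNF clauses for ~(P^Q) <=> R from the previous problem
clauses = [[('P', True), ('R', True)],
           [('Q', True), ('R', True)],
           [('R', False), ('P', False), ('Q', False)]]
sol = hill_climb(clauses, ['P', 'Q', 'R'])
```

Note the limitation from the answer above: on a hard instance the loop can exhaust `max_flips` and return `None` without telling you whether the clauses are unsatisfiable.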

b(E,F) -- E occurs before F. That is, E completes before F begins.

m(E,F) -- E meets F. That is, E completes just as F begins.

d(E,F) -- E occurs during F. That is, first F starts, then E starts, then E ends, then F ends.

f(E,F) -- E finishes F. That is, F starts before E but they end together.

q(E,F) -- E and F occur at the same time; that is, they start together and end together.

Constants: g (Hamlet's encounter with the ghost); a1 (act 1); a2 (act 2); p (Hamlet's encounter with the players); y (performance of the play).

- A. Represent the following statements in L:
- i. If E meets F and G occurs during F, then E occurs before G.

**Answer:** forall(E,F,G) m(E,F) ^ d(G,F) => b(E,G).

- ii. If E occurs before F and F occurs before G, then E occurs before G.

**Answer:** forall(E,F,G) b(E,F) ^ b(F,G) => b(E,G).

- iii. If E finishes F and F meets G then E meets G.

**Answer:** forall(E,F,G) f(E,F) ^ m(F,G) => m(E,G).

- iv. For any event E there exists an event F such that F occurs during E.

**Answer:** forall(E) exists(F) d(F,E).

- v. If E meets G and F meets G, then either E finishes F, or F finishes E, or E and F occur at the same time.

**Answer:** forall(E,F,G) [m(E,G) ^ m(F,G)] => [f(E,F) V f(F,E) V q(E,F)].

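Answers i-iii can be sanity-checked by modeling an event as a numeric pair (start, end) — an assumption of this sketch, not part of the problem. Note that rule i is checked in the form m(E,F) ^ d(G,F) => b(E,G), which is the reading the unification steps in part C require:

```python
import random

# Interval model of the relations (e = (start, end), start < end).
def b(e, f): return e[1] < f[0]                    # E before F
def m(e, f): return e[1] == f[0]                   # E meets F
def d(e, f): return f[0] < e[0] and e[1] < f[1]    # E during F
def f_(e, f): return f[0] < e[0] and e[1] == f[1]  # E finishes F

rng = random.Random(0)
def event():
    s = rng.randrange(0, 20)
    return (s, s + rng.randrange(1, 10))

# check rules i-iii on random triples of events
for _ in range(10000):
    E, F, G = event(), event(), event()
    if m(E, F) and d(G, F):          # rule i
        assert b(E, G)
    if b(E, F) and b(F, G):          # rule ii
        assert b(E, G)
    if f_(E, F) and m(F, G):         # rule iii
        assert m(E, G)
```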
- B. Statements iv and v cannot be expressed in Datalog. Why not?

**Answer:** Statement iv involves an existential quantifier, and statement v involves disjunction.

- C. Given the above and the facts "Hamlet encounters the Ghost at the end of Act 1", "Act 1 meets Act 2", "Hamlet encounters the players during Act 2", and "The encounter with the players precedes the performance of the play", show how "Hamlet's encounter with the ghost precedes the performance of the play" may be inferred by Datalog forward chaining. You need only show the steps that lead to the desired conclusion.
**Answer:** Given i, ii, and iii above and given the facts

vi. f(g,a1).

vii. m(a1,a2).

viii. d(p,a2).

ix. b(p,y).

Unifying vi and vii with the left side of iii under the binding E=g, F=a1, G=a2 gives (x) m(g,a2).

Unifying x and viii with the left side of i under the binding E=g, F=a2, G=p gives (xi) b(g,p).

Unifying xi and ix with the left side of ii under the binding E=g, F=p, G=y gives (xii) b(g,y), which is the desired result.

- D. Show how the same inference is carried out using Datalog backward chaining. Again, you need only show the steps that lead to the desired conclusion. (Very important!)
**Answer:**

Goal G0: b(g,y). Unifying with the head of ii under the bindings E=g, G=y creates the two subgoals G1: b(g,F1) and G2: b(F1,y).

Goal G1: b(g,F1). Unifying with the head of i under the binding E=g, G=F1 creates the two subgoals G3: m(g,F2) and G4: d(F1,F2).

Goal G3: m(g,F2). Unifying with the head of iii under the binding E=g, G=F2 creates the two subgoals G5: f(g,F3) and G6: m(F3,F2).

Goal G5: f(g,F3). Unifying with vi under the binding F3=a1 succeeds.

Goal G6: m(a1,F2). Unifying with vii under the binding F2=a2 succeeds. Goal G3 succeeds.

Goal G4: d(F1,a2). Unifying with viii under the binding F1=p succeeds. Goal G1 succeeds.

Goal G2: b(p,y). Unifying with ix succeeds. Goal G0 succeeds.
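The forward-chaining derivation in part C can be reproduced by a toy Datalog engine. This is a sketch written for this answer key (strings beginning with an uppercase letter are variables); rule i is encoded with body d(G,F), the reading the unification steps require:

```python
facts = {('f', 'g', 'a1'), ('m', 'a1', 'a2'),
         ('d', 'p', 'a2'), ('b', 'p', 'y')}
rules = [
    ([('m', 'E', 'F'), ('d', 'G', 'F')], ('b', 'E', 'G')),  # i
    ([('b', 'E', 'F'), ('b', 'F', 'G')], ('b', 'E', 'G')),  # ii
    ([('f', 'E', 'F'), ('m', 'F', 'G')], ('m', 'E', 'G')),  # iii
]

def is_var(t):
    return t[0].isupper()

def match(pattern, fact, binding):
    # extend binding so that pattern matches fact, or return None
    if pattern[0] != fact[0]:
        return None
    binding = dict(binding)
    for p, v in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if binding.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return binding

def forward_chain(facts, rules):
    changed = True
    while changed:                    # iterate to a fixpoint
        changed = False
        for body, head in rules:
            bindings = [{}]
            for pat in body:          # join the body patterns against facts
                bindings = [b2 for b in bindings for fact in facts
                            if (b2 := match(pat, fact, b)) is not None]
            for b in bindings:
                new = (head[0],) + tuple(b[t] if is_var(t) else t
                                         for t in head[1:])
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

closure = forward_chain(set(facts), rules)
```

Running it derives m(g,a2), then b(g,p), then the goal b(g,y), matching steps x-xii above.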

**Answer:**

Assume that all the random variables are Boolean.

- A. What absolute and conditional probabilities are recorded
in the network?
**Answer:** Prob(A=T/F), Prob(B=T/F), Prob(C=T/F | A=T/F, B=T/F), Prob(D=T/F | A=T/F, B=T/F).

- B. For each of the following, is it true or false in the network? More than one may be true:
- i. A and B are independent absolutely.
- ii. A and B are conditionally independent given C and D.
- iii. A and C are independent absolutely.
- iv. A and C are conditionally independent given B and D.
- v. C and D are independent absolutely.
- vi. C and D are conditionally independent given A and B.

**Answer:** (i) and (vi) are true; the rest are false.

- C. How is Prob(C=T|A=F) computed from the values in this table?

**Answer:** Prob(C=T|A=F) = Prob(C=T,B=T|A=F) + Prob(C=T,B=F|A=F) = Prob(C=T|B=T,A=F) Prob(B=T|A=F) + Prob(C=T|B=F,A=F) Prob(B=F|A=F) = Prob(C=T|B=T,A=F) Prob(B=T) + Prob(C=T|B=F,A=F) Prob(B=F), where each factor in the last expression is recorded in the network.
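With made-up numbers standing in for the exam's table (which is not reproduced here), the computation in part C looks like this:

```python
# Hypothetical CPT entries; the actual exam values are not shown above.
P_B = {True: 0.3, False: 0.7}            # Prob(B); B is independent of A
P_C_given_AB = {(False, True): 0.9,      # Prob(C=T | A, B) keyed by (A, B)
                (False, False): 0.2,
                (True, True): 0.5,
                (True, False): 0.1}

# Prob(C=T | A=F) = sum over B of Prob(C=T | B, A=F) * Prob(B)
p = sum(P_C_given_AB[(False, b)] * P_B[b] for b in (True, False))
# here: 0.9 * 0.3 + 0.2 * 0.7 = 0.41
```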

NG -> Noun

NG -> NG NG

Assume that "NYU", "computer", "science", "department", and "office" are all labelled nouns in the lexicon.

- A. Show all the parse trees for the NG "NYU computer science department". (You may abbreviate the leaves by their initial letters "n", "c", "s", "d".) Which is the correct one?

**Answer:**

```
1. NG ---> NG ---> Noun ---> n
       |-> NG ---> NG ---> Noun ---> c
              |-> NG ---> NG ---> Noun ---> s
                     |-> NG ---> Noun ---> d

2. NG ---> NG ---> Noun ---> n
       |-> NG ---> NG ---> NG ---> Noun ---> c
              |      |-> NG ---> Noun ---> s
              |-> NG ---> Noun ---> d

3. NG ---> NG ---> NG ---> Noun ---> n
       |      |-> NG ---> Noun ---> c
       |-> NG ---> NG ---> Noun ---> s
              |-> NG ---> Noun ---> d

4. NG ---> NG ---> NG ---> Noun ---> n
       |      |-> NG ---> NG ---> Noun ---> c
       |             |-> NG ---> Noun ---> s
       |-> NG ---> Noun ---> d

5. NG ---> NG ---> NG ---> NG ---> Noun ---> n
       |      |      |-> NG ---> Noun ---> c
       |      |-> NG ---> Noun ---> s
       |-> NG ---> Noun ---> d
```

(2) is the correct tree.

- B. Suppose that the sentence
"Pat works for the NYU computer science department" is input to a chart
parser, which contains the above as part of its grammar. Assume
that the chart parser generates the dotted edge [4,4,NG -> * NG NG]. For each
of the following edges, identify which of the parse trees in your answer to
(A) corresponds to the edge. (You should number the parse trees in (A) and
answer here with those numbers. An edge may correspond to more than one tree.
Note: 4 is the space between "for" and "NYU"; 5 is the space between "NYU"
and "computer" etc.)
- i. [5,6, NG -> Noun *].

**Answer:** All.

- ii. [5,7, NG -> NG NG *].

**Answer:** Trees 2 and 4.

- iii. [5,7, NG -> NG * NG].

**Answer:** Trees 1, 2, and 4.
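The count of five trees in part A can be checked mechanically: with the grammar NG -> Noun and NG -> NG NG, the number of parses of k nouns is the Catalan number C(k-1). A sketch written for this answer key:

```python
from functools import lru_cache

@lru_cache(None)
def parses(words):
    # enumerate all parse trees of a tuple of nouns under
    # NG -> Noun and NG -> NG NG
    if len(words) == 1:
        return [('NG', ('Noun', words[0]))]
    trees = []
    for i in range(1, len(words)):          # every binary split point
        for left in parses(words[:i]):
            for right in parses(words[i:]):
                trees.append(('NG', left, right))
    return trees

trees = parses(('n', 'c', 's', 'd'))
# four nouns give Catalan(3) = 5 trees, matching the answer above
```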


Prof. Jones has determined that the coin dates from the time of Julius Caesar, when it was about the price of a loaf of bread.

- A. Identify two instances of lexical ambiguity, one of syntactic
ambiguity, and one of anaphoric ambiguity.
**Answer:**(These are not all the possible answers:)

Lexical: "determined"="figured out" / "decided" / "made inevitable".

"dates" = "was created at a specified time" / "assigns to a particular time" / "accompanies on romantic outings".

"about" = "approximately" / "concerning"

"from", "time", "of", "price" and "bread" are also lexically ambiguous.Syntactic: "from the time" can attach to "dates" or "determined".

"when ..." can attach to "Julius Caesar", "time", "dates", or "determined". Anaphoric: "it" can refer to "coin" or "time", or even to Prof. Jones' determination.**Answer:** - B. Describe how one possible misreading can be excluded. Be specific.
**Answer:** (Many answers are possible.) The reading "dates" = "accompanies on romantic outings" can be excluded by selectional restriction: the subject of such a reading must be human, but "coin" cannot be human.

Prob(

where C is the training corpus and

What would be the problem if you use the set of weights
w_1 = 0.98,

**Answer:** This effectively eliminates the K-gram part of the model,
and makes the choice of tag depend only on the corresponding element.

What would be the problem if you use the set of weights
w_1 = 0.01,

**Answer:** This is subject to the sparse data problem; if the trigram
does not occur in the corpus, it will be judged as extremely unlikely.
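The two failure modes can be illustrated with a hypothetical linear interpolation. The exam's exact formula is elided above, so the functional form, the term names, and the numbers here are all assumptions made for illustration:

```python
# Assumed smoothed estimate: a weighted mix of a lexical (per-word)
# term and lower/higher-order K-gram terms.
def interpolated(p_word, p_bigram, p_trigram, w1, w2, w3):
    return w1 * p_word + w2 * p_bigram + w3 * p_trigram

# a tag whose trigram context never occurs in the training corpus:
# its maximum-likelihood trigram estimate is 0
unseen = dict(p_word=0.6, p_bigram=0.4, p_trigram=0.0)

# w3 near 1: the unseen trigram drives the estimate to nearly zero
sparse = interpolated(**unseen, w1=0.01, w2=0.01, w3=0.98)
# w1 near 1: context is all but ignored; only the lexical term matters
lexical = interpolated(**unseen, w1=0.98, w2=0.01, w3=0.01)
```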

By Bayes' law,

Prob(X.C=V | X.P=p, X.Q=q, X.R=r) = (a) Prob(X.P=p, X.Q=q, X.R=r | X.C=V) Prob(X.C=V) / Prob(X.P=p, X.Q=q, X.R=r)

We may ignore the factor (b) 1 / Prob(X.P=p, X.Q=q, X.R=r) because (c) it is the same for all values of V (it is a normalizing factor). We assume that attributes (d) P, Q, and R are conditionally independent given (e) C, giving us the equation

(f) Prob(X.P=p, X.Q=q, X.R=r | X.C=V) = (g) Prob(X.P=p | X.C=V) * Prob(X.Q=q | X.C=V) * Prob(X.R=r | X.C=V)

Thus, we conclude that the value of V that maximizes Prob(X.C=V | X.P=p, X.Q=q, X.R=r) is the same as the value of V that maximizes

(h) Prob(X.P=p | X.C=V) * Prob(X.Q=q | X.C=V) * Prob(X.R=r | X.C=V) * Prob(X.C=V),

and the numbers in this last expression can all be estimated from the training data.
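The final expression (h) is straightforward to compute from counts. A toy classifier over a made-up training set (the attribute values and data are invented for illustration):

```python
from collections import Counter

# made-up training set: predictive attributes P, Q, R and
# classification attribute C
train = [dict(P='a', Q='x', R='u', C='yes'),
         dict(P='a', Q='y', R='u', C='yes'),
         dict(P='b', Q='x', R='v', C='no'),
         dict(P='b', Q='y', R='v', C='no'),
         dict(P='a', Q='x', R='v', C='yes')]

def prob(attr, val, c):
    # frequency estimate of Prob(X.attr = val | X.C = c)
    rows = [t for t in train if t['C'] == c]
    return sum(t[attr] == val for t in rows) / len(rows)

def classify(p, q, r):
    classes = Counter(t['C'] for t in train)
    n = len(train)
    # argmax over V of Prob(P|C) Prob(Q|C) Prob(R|C) Prob(C) -- item (h)
    return max(classes, key=lambda c:
               prob('P', p, c) * prob('Q', q, c) * prob('R', r, c)
               * classes[c] / n)
```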

- A. Can Naive Bayes make use of instances with null values in the
training set? Explain your answer.
**Answer:** Yes. Suppose that P and Q are predictive attributes, C is the classification attribute, and an instance I in the training set has a value recorded for I.P but not for I.Q. Then I can be used in computing Prob(X.P|X.C) even though it must be ignored in computing Prob(X.Q|X.C).

- B. Can K-Nearest Neighbors make use of instances with null values in the training set? Explain your answer.

**Answer:** No. The distance from a new instance X to a training instance Y is some function of the differences over all the attributes, and there is no way to compute that if one of the attributes is missing.

**Answer:** If you require that AVG_ENTROPY(AS,C,T) be much smaller
than ENTROPY(C,T), then you will often fail to split on an attribute
that is actually informative about C. If you eliminate the test, then
you may end up splitting on attributes whose relation to the classification
attribute is actually completely random, and thus end up with a tree that
overfits the data.
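For concreteness, here is how the two quantities can be computed for a made-up table T (the function names mirror the answer; the data and attribute names are invented):

```python
from math import log2

def entropy(labels):
    # ENTROPY(C,T): entropy of the classification labels
    n = len(labels)
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))

def avg_entropy(rows, attr):
    # AVG_ENTROPY(A,C,T): expected entropy of C after splitting on attr,
    # weighted by the fraction of rows in each branch
    n = len(rows)
    total = 0.0
    for v in set(r[attr] for r in rows):
        sub = [r['C'] for r in rows if r[attr] == v]
        total += len(sub) / n * entropy(sub)
    return total

T = [dict(A='l', C='yes'), dict(A='l', C='yes'),
     dict(A='r', C='no'),  dict(A='r', C='yes')]
# the information gain of splitting on A; requiring this to be large
# rejects the split, while requiring nothing at all invites overfitting
gain = entropy([r['C'] for r in T]) - avg_entropy(T, 'A')
```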