Advanced Artificial Intelligence: Final Exam. Dec. 1998

Problem 1:

(15 points)

Let A,B,C,D be Boolean random variables. Consider the following Bayesian network:

How can the following probabilities be calculated from the quantities in the above net?
• A. Prob(C=T)
• B. Prob(A=T | C=T)
• C. Prob(D=T | A=T, C=T)

A. P(C=T) =
P(C=T | A=T, B=T) P(A=T) P(B=T) + P(C=T | A=T, B=F) P(A=T) P(B=F) +
P(C=T | A=F, B=T) P(A=F) P(B=T) + P(C=T | A=F, B=F) P(A=F) P(B=F)

B. P(A=T | C=T) = (by Bayes' Law)
P(C=T | A=T) P(A=T) / P(C=T).
The denominator P(C=T) is given above in (A).
Now, P(C=T | A=T) = P(C=T,B=T | A=T) + P(C=T, B=F | A=T) =
P(C=T | B=T, A=T) P(B=T | A=T) + P(C=T | B=F, A=T) P(B=F | A=T) =
(since B is independent of A)
P(C=T | A=T, B=T) P(B=T) + P(C=T | A=T,B=F) P(B=F).

Putting this all together we get
P(A=T | C=T) =
[P(C=T | A=T, B=T) P(A=T) P(B=T) + P(C=T | A=T, B=F) P(A=T) P(B=F)] /
[P(C=T | A=T, B=T) P(A=T) P(B=T) + P(C=T | A=T, B=F) P(A=T) P(B=F) +
P(C=T | A=F, B=T) P(A=F) P(B=T) + P(C=T | A=F, B=F) P(A=F) P(B=F) ]

C.
P(D=T | A=T, C=T) =
P(D=T,B=T | A=T,C=T) + P(D=T,B=F | A=T,C=T) =
P(D=T | B=T,A=T,C=T) P(B=T | A=T,C=T) + P(D=T | B=F,A=T,C=T) P(B=F | A=T,C=T).

Now P(D=T | B=T, A=T, C=T) = P(D=T | B=T, C=T) since D is independent of A given B and C.

P(B=T | A=T, C=T) = (by Bayes' law)
P(C=T | B=T, A=T) P(B=T | A=T) / P(C=T | A=T) =
P(C=T | B=T, A=T) P(B=T) /
[P(C=T | B=T, A=T) P(B=T) + P(C=T | B=F, A=T) P(B=F)]

Putting all this together gives
P(D=T | A=T, C=T) =
[P(D=T | B=T, C=T) P(C=T | B=T, A=T) P(B=T) +
P(D=T | B=F, C=T) P(C=T | B=F, A=T) P(B=F)] /
[P(C=T | B=T, A=T) P(B=T) + P(C=T | B=F, A=T) P(B=F)]

Problem 2:

(5 points) Let A and B be Boolean random variables. Suppose that you are given a formula for unnormalized probabilities involving A and B. Calculating with this formula, you obtain
```Prob(B = T | A = T) = 3
Prob(B = F | A = T) = 5
Prob(B = T | A = F) = 1
Prob(B = F | A = F) = 4
```
What are the actual probabilities?

```Prob(B = T | A = T) = 3/8
Prob(B = F | A = T) = 5/8
Prob(B = T | A = F) = 1/5
Prob(B = F | A = F) = 4/5
```

Problem 3

(15 points) Consider the sentence, "The oldest sculptures in the museum and some recent copies are on display."

A. The attachment of "and some recent copies" is syntactically ambiguous. What strategy can be used to determine whether it attaches to "sculptures" or to "museum"?

Answer: World knowledge: A sculpture is much more likely to be in a museum than in a copy of a sculpture. (This assumes that the ambiguity in (B) is resolved correctly.)

B. There is also a somewhat subtle anaphoric ambiguity in the sentence. What is it, and how can it be resolved?

Answer: The ambiguity is whether "copies" means "copies of sculptures" or "copies of a museum." The resolution again involves world knowledge: A sculpture is much more likely to be copied than a museum.

C. The semantics of a superlative like ``oldest'' are rather complex. What is the difference in semantic structure between "The oldest sculptures in the museum" and "The marble sculptures in the museum"?

Answer: The word "marble" simply states a property of the sculptures: these sculptures happen to be marble. But it is nonsense to say that these sculptures happen to be oldest. Rather, the meaning of "oldest" is dependent on the qualification "in the museum"; these are sculptures such that no other sculptures in this museum are older.

Problem 4

(20 points)

A. What is the k-gram model, in natural language interpretation? Describe one way it is applied (2 or 3 sentences).

Answer: The k-gram model is a probabilistic model in which it is assumed that the probability of a "word" (loosely interpreted) occuring in a sentence depends only on the previous k-1 words, and, given those k-1 words, is independent of all previous words. "Word" here can be (a) an actual word; (b) a part of speech; (c) a phoneme; (d) a particular meaning of a particular word; among others.

B. How can the k-gram model be viewed as a Markov model?

Answer: The states of the Markov model are tuples of k-1 words. If word wk follows tuple w1 ... wk-1 with probability p in the k-gram model, this will be reflected in the Markov model by an arc from the state w1 ... wk-1 to state w2 ... wk with probability p.

Problem 5

(15 points) Briefly (2 or 3 paragraphs) discuss the comparative advantages and disadvantages of the knowledge-based, linguistic approach to natural language processing versus the statistical, corpus-based approach.

Answer: The main points I wanted you to touch on were: As an advantage of the knowledge-based approach, the fact that, ultimately, a complete natural language system will have to deal with all the issues that arise in this approach. Also that research in this approach can draw on the work in linguistics.

As a disadvantage of the statistical approach, the "sparse data" and the "long tail" problems; the facts that no NL text of plausible size will contain all the low-level combinations that are possible.

As disadvantages to the knowledge-based approach: the fact that many of the strategies proposed are extremely ill-defined, and, in our current state of knowledge, unimplementable except in toy illustrations. The interaction between different kinds of information and the use of plausible inference is particularly poorly understood. The fact that this is a highly labor-intensive approach.

As an advantage of the statistical approach: the fact that working systems can often be made to work quite effectively by compiling statistics over existing online texts with a small fraction of the work that would be needed in the knowledge-based approach. The fact that this system in some respects allows many different types of information --- syntactic, semantic, pragmatic --- to be combined effortlessly.

Problem 6

(15 points) In computer vision,

A. What is segmentation?

Answer: The division of an image into regions corresponding to different surfaces in the actual scene.

B. What is the general viewpoint assumption?

Answer: The assumption that significant qualitative features of the image are stable under sufficiently small changes of viewpoint.

C. What is the difference between an affine and a perspective transformation? (You need not give the formula for the coordinate transformation.) Under what circumstances are the two classes of transformations applicable?

Answer: In an affine transformation, parallel lines remain parallel. In a perspective transformation, a set of parallel lines may turn into a set of lines that meet at a single point. Affine transformations are applicable in cases where the objects in the scene are far from the eye as compared to their size, but their orientation relative to the line of sight varies. Perspective transformations apply when the distance from the eye to an object is small compared to the size of the object, and so some parts of the object are much closer to the eye than others. than

Problem 7

(15 points)

Consider the following microworld, a trivial form of Chinese checkers:

You have an arrangement of locations and a smaller number of pieces. Each piece sits at one location. Each location holds at most one piece.

If piece P1 is at location L1, L1 is next to L2, and L2 is empty, then you can move P1 to L2. If piece P1 is at L1; P2 is at L2; L3 is empty; and L1, L2, and L3 are in sequence on a line; then you can jump P1 over P2 and put it in L3.

This microworld can be described in the situation calculus using the following language L:

```Sorts: Pieces and locations.

Atemporal:       next(L1,L2) : Predicate.
line(L1,L2,L3) : Predicate.

Boolean fluents: at(P,L) : Function.
empty(L) : Function

Actions:         move(P,L1,L2) : Function
jump(P1,P2,L1,L2,L3) : Function

Temporal:        holds(S,F).
result(S,A).
poss(S,A).
```

A. State in L the domain constraint that a location L1 is empty if and only if there is no piece in L1.

Answer: holds(S,empty(L1)) iff not exists(P) holds(S,at(P,L1))

B. State the causal axiom for move.