## Final Exam: Artificial Intelligence December 1998

### Problem 1

(20 points) Consider the problem of lexical disambiguation; that is, determining the meaning of particular words. Suppose that you have a on-line corpus of text in which each word has been tagged with an indication of which of its meanings apply. For instance, suppose the word "bat" has four meanings in the dictionary: (1) noun meaning a club; (2) noun meaning a flying mammal; (3) verb meaning to hit with a club; (4) verb meaning to discuss. Then in the sentence, "The bat is asleep" the word "bat" would be tagged with the label (2). In the sentence "McGwire will bat third", the word "bat" would be tagged with the label (3). Each word in the corpus is tagged similarly.

A. Given such a corpus, how could you automatically set up a trigram model to address the problem of lexical disambiguation for a standard (untagged) text?

Answer: Each element of the trigram model will be a particular meaning of a particular word, such as "bat-3". The model will record the probability that word-meaning w3 follows word-meanings w1, w2. To avoid the sparse data problem, this is estimated as the sum
P(w3 | w1,w2) = L1 Freq(w3 | w1,w2) + L2 Freq(w3 | w2) + L3 Freq(w3),
where L1, L2, L3 are positive weights adding to 1.

B. What is the difference between the trigram model in (A) and the disambiguation strategy, "Frequency in context"? Give an example of a sentence that could be disambiguated using the trigram model but not using frequency in context. Give an example of a sentence that could be disambiguated using frequency in context but not using the trigram model.

Answer: The trigram model in (A) picks the meaning of the word by considering the last two words. The strategy "Frequency in context" picks the meaning of the word by consider the general context of discussion. For example, in the sentence "Last night's baseball game was interrupted by a flock of bats'', the trigram model would correctly interpret ``bats'' to mean ``flying mammals'' as much more likely to follow the words ``flock of''. The ``frequency in context'' heuristic, on the other hand, would get this wrong: the context would be taken to be "baseball game", and "bat" would be interpreted as "club". In the sentences, ``In the zoo, there is a bat", or "One thing you need for a baseball game is a bat", the trigram model provides no guidance, since the two preceding words "is a" do not affect the likelihood of the meanings of "bat". However, frequency in context gets these right; the contexts is "zoo" and "baseball", and the correct meanings are chosen.

### Problem 2

(10 points) What are the inputs and output to the syntactic parser and the semantic analyzer in a natural language understanding system? Give an example of (a) a sentence; (b) the output of the syntactic parser (c) the output of the semantic analysis. (A simple sentence will be fine.)

Answer: The input to the syntactic parser is a morphological analysis of the sentence. The output is a syntax tree. The input to the semantic analyzer is the syntax tree. The output is a representation of the meaning in some standard representation of knowledge. For example, if the sentence is "John likes Mary", the syntax tree would be

```S ---> NP ---> name ---> John
|
|-> VP ---> verb ---> likes
|
|-> NP ---> name ---> Mary
```
and the output of the semantic analysis might be something like "likes(john, mary)".

### Problem 3

(25 points)

Consider the following problem: You are given a directed graph G. Every vertex has an associated positive numerical value. The total value of a path through G is the sum of the values of the vertices on the path. If a path goes more than once through a vertex, it picks up points each time it passes through. For instance, in the graph below, the path ACBACD has value 2+1+3+2+1+5 = 14.

The problem is, given such a graph, a starting vertex, and a quota to be attained, find the shortest path whose value is greater than or equal to the quota. For example, in the above graph, if the starting vertex is A and the quota is 12, then path ADEG, with length 4, is the optimal solution.

The problem can be addressed using search in the following state space: (State space search is not actually the best approach to this problem, but never mind that.)

• A state is a path, such as ACBA.
• An operator on state S is to add an arc out of the last vertex in S. For example, there are four possible operators on state "ACBA": (1) Add arc A->C to get ACBAC. (2) Add arc A->D to get ACBAD. (3) Add arc A->F to get ACBAF. (4) Add arc A->H to get ACBAH.
• The starting state is the path containing just the starting vertex.
• A goal state is a path whose value exceeds the quota.

Let V be the number of vertices in G. Let E be the number of edges. Let O be the maximum out-degree of any vertex; that is, O is the maximum number of outarcs from any vertex. Let M be the minimum value of any vertex. For instance in the above graph V=8, E=15, O=4 (vertex A has 4 out-arcs), M=1.

A Answer: Yes, it is a tree. For any state such as ACBA, there is only one way to construct it from the starting state A: "Add C", "Add B", "Add A"

B. For the above example, with starting vertex A, draw the portion of the state space within two steps of the starting state.

```A ---> AC ---> ACD
|       |
|       |-> ACB
|
|
|-> AF ---> AFE
|
|-> AH ---> AHA
|
|-> AHF
```

C. Give an upper bound on (i) the branching factor; (ii) the depth of the state space; (iii) the size of the state space. Your answer should be a general one, in terms of V, E, I, M, and the quota Q, not an answer specific to the above graph.

Answer: (i) The branching factor is at most O. (ii) The depth is at most Q/M. (iii) The size of the state space is therefore bounded by OQ/M.

D. Suppose that the state space is structured so that the children of a given state are ordered alphabetically. What is the first goal state discovered by a depth-first search? By iterative deepening?

Answer: The first goal discovered by depth-first search will be ACBACBA. The first goal discovered by iterative deepening will be the shortest solution ADEG.

### Problem 4:

(10 points) "Local minima can cause difficulties for a feed-forward, back-propagation neural network." Explain. Local minima of what function of what arguments? Why do they create difficulties?

Answer: Local minima of the error function over the corpus of labelled examples, as a function of the weights on the links. The learning algorithm is in effect doing a hill-climbing search for the value of the weights that gives the minimal error function. If the hill-climbing search finds a local mininum of this function, it will get stuck at that suboptimal state, and will not find the true solution.

### Problem 5:

(15 points) In inductive learning of the definition of a concept from labelled examples, what is a false positive? What is a false negative? Give one example of each.

Answer: An example E is a false positive of rule R for concept C if R accepts E but in fact E is not a instance of C. It is a false negative if if R rejects E but in fact E is a instance of C.

For example, let C be the category "Makes a good pet", and let R be the conjecture, "X makes a good pet if and only if X is a mammal." A canary would be a false negative. A tiger would be a false positive.

### Problem 6:

(10 points) What is the result of doing alpha-beta pruning in the game tree shown below?

### Problem 7:

(10 points) Suppose that you have a corpus of data consisting of pairs of lists, such as the following:
```<[a,b,c,a,d],         [c,a,d]>
<[a,b,c,a,d],         [d]>
<[x,y,a,d,f,a,g,h],   [a,g,h]>
<[x,y,a,d,f,a,g,h],   [d,f,a,g,h]>
<[x,y,a,d,f,a,g,h],   [g,h]>
...
```
These pairs are related by the constraint that the second element in the pair is always a tail of the first element. This relation can be defined as follows in Prolog:
```/* k_tail(L,M,K) succeeds if M is the Kth tail of L.  For instance,
k_tail([a,b,c,a,d],M,1) returns with M bound to [a,b,c,a,d];
k_tail([a,b,c,a,d],M,2) returns with M bound to [b,c,a,d];  etc. */

k_tail(L,L,1).
k_tail([_|L],M,K) :- K1 is K-1, k_tail(L,M,K1).
```
The theory of minimum description length (MDL) learning explains in what sense it is appropriate to learn this program from this data. What is this explanation?

Answer: If there are many lists in the corpus, or if the lists are long, then it is possible to encode the data more compactly by

• 1. Giving the program for k-tail.
• 2. Stating the general rule "k-tail(X,Y,K)", where X and Y are the two columns of the data.
• 3. Giving a table with the X and K values rather than the X and Y values. Since the representation of K is considerably more compact than the representation of Y, this will save space.