Assigned: Oct. 29
Due: Nov. 12
Consider the following scenario. There is a blocks world, with two blocks A and B on a table. The actions are as usual, except that any attempt at an action has a 1/3 probability of failing. If an attempt fails, then the position of the blocks remains the same.
You are playing the following game. Initially, block B is on block A. At each step, you have the option of attempting to move the blocks, or of quitting the game. Each attempt to move the blocks costs $1. When you quit, if B is on A, then you collect $0; if both blocks are on the table, then you collect $1; if A is on B, then you collect $5. (Assume that utility is the same as dollar amount, which is valid for small sums like these.)
A. Explain how this can be construed as a Markov Decision Problem. What is the optimal policy? What is the value of the starting state?
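As a numerical sanity check for part A, the following sketch runs value iteration over the three reachable block configurations. The state names, and the simplification that from the A-on-B state you only quit (moving away from the $5 state can never help), are my own modeling choices, not part of the problem statement.

```python
# Value iteration for the three-state blocks-world game.
# States: 'B_on_A' (start), 'table' (both blocks on the table), 'A_on_B'.
# Quitting yields $0, $1, and $5 respectively; each move attempt costs $1
# and succeeds with probability 2/3 (fails with probability 1/3, leaving
# the blocks where they were).

QUIT_PAYOFF = {'B_on_A': 0.0, 'table': 1.0, 'A_on_B': 5.0}
NEXT = {'B_on_A': 'table', 'table': 'A_on_B'}  # effect of a successful move
P_SUCCESS = 2.0 / 3.0

def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in QUIT_PAYOFF}
    while True:
        delta = 0.0
        for s in QUIT_PAYOFF:
            best = QUIT_PAYOFF[s]                 # option 1: quit now
            if s in NEXT:                         # option 2: attempt a move
                move = -1.0 + P_SUCCESS * V[NEXT[s]] + (1 - P_SUCCESS) * V[s]
                best = max(best, move)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
print(V)
```

The iteration converges geometrically (the self-loop on failure contributes a factor of 1/3 per sweep), so the fixed point it reaches is the value function of the optimal policy.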
B. We now consider a modification of the above game. Suppose that you do not find out whether an attempted action has succeeded or failed. (An attempt at an impossible action always fails.) What is the optimal strategy now, and what is its expected value?
C. Suppose that, after each action, you are given a report of whether the action succeeded or failed, but that this report has only a 2/3 chance of being correct. (This is a partially observable Markov decision problem (POMDP).) What is the value of the following plan:
repeat put(B,table) until report success;
repeat put(A,B) until report success;
quit.

(If you are having trouble summing the infinite series involved, then it will suffice to estimate the value to 3 decimal places.)
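Before summing the series in part C by hand, it can help to estimate the plan's value by simulation. The sketch below is my own encoding of the game: it carries over the rule from part B that an attempt at an impossible action always fails, and it ends each "repeat ... until report success" loop the first time the (possibly incorrect) report says success.

```python
import random

random.seed(0)   # fixed seed so repeated runs give the same estimate

P_ACT = 2 / 3      # probability a possible action succeeds when attempted
P_REPORT = 2 / 3   # probability the success/failure report is correct

def attempt(state, action):
    """Attempt an action; return (new_state, reported_success)."""
    possible = (action == 'put(B,table)' and state == 'B_on_A') or \
               (action == 'put(A,B)' and state == 'table')
    succeeded = possible and random.random() < P_ACT
    if succeeded:
        state = 'table' if action == 'put(B,table)' else 'A_on_B'
    correct = random.random() < P_REPORT
    reported = succeeded if correct else not succeeded
    return state, reported

def play_once():
    """Run the plan once; return net payoff (collected dollars minus costs)."""
    state, cost = 'B_on_A', 0
    for action in ('put(B,table)', 'put(A,B)'):
        while True:              # repeat action until report success
            cost += 1            # every attempt costs $1
            state, reported = attempt(state, action)
            if reported:
                break
    payoff = {'B_on_A': 0, 'table': 1, 'A_on_B': 5}[state]
    return payoff - cost

n = 200_000
est = sum(play_once() for _ in range(n)) / n
print(round(est, 2))
```

A Monte Carlo estimate of this kind will not be accurate to 3 decimal places, but it gives a check on the sign and rough magnitude of the exact series answer.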
A. Suppose that Prob(A|B) = 0.9 and Prob(B|C) = 0.9. What is the range of possible values of Prob(A|C)?
B. Suppose that Prob(A|C) = 0.9 and Prob(B|C) = 0.9. What is the range of possible values of Prob(A^B|C)?
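These two parts can be explored mechanically. The sketch below uses a joint distribution of my own construction (the specific masses are not from the problem) to show for part A that the given constraints do not force Prob(A|C) above 0; a symmetric construction pushes it to 1. For part B it evaluates the inclusion-exclusion (Fréchet) bounds on Prob(A^B|C).

```python
from fractions import Fraction as F

# Part A: one joint distribution over the 8 atoms of A, B, C (only the
# nonzero atoms listed; numbers are my own) in which Prob(A|B) = 0.9 and
# Prob(B|C) = 0.9, yet Prob(A|C) = 0. Conditional probabilities do not
# chain. Atoms are (in A?, in B?, in C?) -> probability mass.
p = {
    (1, 1, 0): F(450, 1000),
    (0, 1, 1): F(45, 1000),
    (0, 1, 0): F(5, 1000),
    (0, 0, 1): F(5, 1000),
    (0, 0, 0): F(495, 1000),
}
assert sum(p.values()) == 1

def prob(event, given):
    """Conditional probability of `event` given `given` under p."""
    num = sum(m for atom, m in p.items() if event(atom) and given(atom))
    den = sum(m for atom, m in p.items() if given(atom))
    return num / den

in_A = lambda t: t[0] == 1
in_B = lambda t: t[1] == 1
in_C = lambda t: t[2] == 1

print(prob(in_A, in_B))   # 9/10
print(prob(in_B, in_C))   # 9/10
print(prob(in_A, in_C))   # 0

# Part B: with Prob(A|C) = Prob(B|C) = 0.9 both fixed, inclusion-exclusion
# bounds Prob(A^B|C) between max(0, 0.9 + 0.9 - 1) and min(0.9, 0.9).
pa = pb = F(9, 10)
print(max(F(0), pa + pb - 1), min(pa, pb))   # 4/5 9/10
```

Exact rational arithmetic (`fractions.Fraction`) avoids any floating-point doubt about whether the constraints are met exactly.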