Fundamental Algorithms, Fall 2002
Solutions for Homework 3, Questions 1-3
written by Ofer H. Gill

(1) Joe is faced with a multiple choice question where he gets 10 points if correct, -8 if wrong, and -2 if left unanswered. He must choose one out of three possible answers, and he happens to have no clue about the question. What should be Joe's strategy?
(a) Randomly choose one of three answers.
(b) Leave the question unanswered.
(c) Doesn't matter what he does.

Note: In spite of the email discussions, you don't need to worry about variance in this question (I don't). However, students that correctly account for variance in selecting a strategy for Joe are still entitled to full credit.

Answer: (In two horizontal lines)
If Joe guesses, he is correct with 1/3 probability, and his expected score = 10*(1/3) + (-8)*(2/3) = -6/3 = -2
This is the same score Joe gets if he doesn't answer. Joe's strategy is thus choice (c).

(2) Prove by real induction that when T(n) = T(n/2) + T(n/4) + 1, then T(n) = O(n^a) where a = 0.694... is the solution to the equation:
1/(2^a) + 1/(4^a) = 1

Answer: I claim T(n) <= C*n^a - 1 for all n. I'll prove this by real induction. I'm going to assume that T(2) = 1 as a base for the T function, and, for all x < 2, T(x) = 0.

Basis: For all x such that x <= 2, T(x) <= 1 <= C*x^a - 1 assuming that C >= 2.

Induction Hypothesis: Assume for all x < y - delta, T(x) <= C*x^a - 1.
Which delta value should we pick? We need a delta that, for every y >= 2, guarantees the hypothesis covers both T(y/4) and T(y/2). A choice of delta = (1/4)*y would do, but delta must be a constant. So, we plug in the smallest value y = 2, which gives delta = 1/2. (Notice that when y >= 2, we have y/4 < y/2 < y - 1/2.)
So now, we assume for all x < y - 1/2 that T(x) <= C*x^a - 1.

Induction Step: T(y) = T(y/2) + T(y/4) + 1, and we can safely assume T(y/2) <= C*(y/2)^a - 1, and T(y/4) <= C*(y/4)^a - 1.
So, T(y) = T(y/2) + T(y/4) + 1
        <= (C*(y/2)^a - 1) + (C*(y/4)^a - 1) + 1
         = C*(y/2)^a + C*(y/4)^a - 1
         = C*(y^a)*(1/(2^a) + 1/(4^a)) - 1
         = C*(y^a)*1 - 1
         = C*(y^a) - 1
And we're done! Thus, T(n) <= C*(n^a) - 1, where C >= 2.

(3) (i) Given the relation: T(n) = T(n/5) + T(7n/10) + cn where c > 0, based on the Standard Select (I call it the American Flag Select), find the c based on the text.
(ii) Give the best C > 0 such that T(n) <= C*n where n is large. Note C will depend on c.
(iii) Joe suggests simplifying the median algorithm by using groups of 3 instead of 5. Why is this impossible?

Note: Depending on how you interpret the text, and how long you assume operations like partitioning the array around an element take, your value for c might differ from mine, but this is ok.

Answer:
(i) Based on the textbook description, Steps 1, 2 and 4 are the steps that dictate the c used.

Combining steps 1 and 2, we find the median of the first five elements, then the median of the next five elements, then the median of the next five elements, and so on... (I sometimes like to refer to each collection of 5 elements as a "stripe".) Assume we find the median of five elements by insert-sorting the five elements and returning the third largest element. Assuming each key-comparison-and-swap takes at most 1 unit of time, insert-sorting five elements takes time 1+1+2+3+4 = 11. Returning the third largest element from a sorted list of five takes 1 unit of time. So, finding the median of five elements takes a total of 11 + 1 = 12 time. We find the median of five elements a total of n/5 times (we're ignoring the painful math inflicted by using floor and ceiling functions). Therefore, steps 1 and 2 take time = 12 * (n/5) <= 12n/5

Step 4 involves partitioning the entire array around the median of medians. (I sometimes call the medians found in steps 1 and 2 the "stars". So step 4 partitions the array around the median of the stars.)
Again, assuming each key-comparison-and-swap takes 1 unit of time, and using the same partition algorithm as for Quicksort, comparing each entry to the median of stars and possibly doing a swap takes at most 1 unit of time. For all entries, this gives us time n-1. After this, we do a final swap of the median of stars' location so that it is now located to the right of the guys less than it, and to the left of the guys greater than it. This takes 1 unit of time. So, the total time for step 4 is at most n.

So, the time for steps 1, 2, and 4 is 12n/5 + n = 17n/5. So, I'm assuming c = 17/5.

(ii) I'm going to use a proof by Real Induction assuming that T(1) = 1 and for all x < 1, T(x) = 0. Our goal is to show T(n) <= C*n.

Basis: For all x <= 1, T(x) <= 1 <= C*x assuming C >= 1.

Induction Hypothesis: Assume for all x < y - delta that T(x) <= C*x.
My choice for delta is 2/10 = 1/5, since then (and considering that y >= 1), y - delta = y - 1/5 >= y - y/5 = 4y/5 = 8y/10. And 8y/10 is larger than 7y/10 and y/5. (My train of thought is similar to that used in my answer for question 2.)

Induction Step: (Note that C and c are different!)
T(y) = T(y/5) + T(7y/10) + cy
    <= C*(y/5) + C*(7y/10) + cy
     = C*(9y/10) + cy
And, C*(9y/10) + cy <= C*y will work, assuming:
C*(9y/10) + cy <= C*y
iff cy <= C*(y/10)
iff c <= C/10
iff 10c <= C
Thus, T(n) <= C*n when C >= max{1, 10c}
(For my choice of c = 17/5, we get 10c = 34. Thus we'd need C >= 34 in order for the proof to work.)

(iii) If we make the "stripes" from groups of 3 instead of the usual groups of 5, then steps 1, 2, and 4 (as mentioned in the book) will still take linear time. But step 3, finding the median of stars, takes time T(n/3). And the median of stars is always smaller than half of the other stars AND all elements from the groups of 3 larger than those stars. Thus, the median of stars is always smaller than at least 2*((1/2) * n/3) = n/3 elements.
This means the median of stars is always larger than at most n - n/3 = 2n/3 elements. (And using a similar argument, the median of stars is always smaller than at most 2n/3 elements, and larger than at least n/3 elements.) Thus, in step 5, we proceed recursively on at most 2n/3 elements.

Thus, using groups of 3, the runtime can be expressed as:
T(n) = T(n/3) + T(2n/3) + dn, where d is some constant > 0.

Joe claims this is linear. I'm going to prove him wrong by showing that T(n) >= C*n*log n, and will use real induction. (Assume when I say log, I mean a log with base 3. This will make my proof simpler...) I'll assume T(1) = 1, and for all x < 1, T(x) = 0. Now for the proof.

Basis: For all x <= 1 (and for any C > 0), T(x) >= 0 >= C*x*log x.

Induction Hypothesis: Assume for all x < y - delta that T(x) >= C*x*log x
I will select delta = 1/4. Since y >= 1, y - delta = y - 1/4 >= y - y/4 = 3y/4, and 3y/4 is larger than 2y/3 and y/3.

Induction Step:
T(y) = T(y/3) + T(2y/3) + dy
    >= C*(y/3)*log(y/3) + C*(2y/3)*log(2y/3) + dy
     = Cy/3*(log y - 1) + 2Cy/3*(log y + log 2 - 1) + dy
     = Cy*log y - Cy + (2Cy * log 2)/3 + dy
And, Cy*log y - Cy + (2Cy * log 2)/3 + dy >= Cy*log y, assuming:
Cy*log y - Cy + (2Cy * log 2)/3 + dy >= Cy*log y
iff - Cy + (2/3)*(log 2)*Cy + dy >= 0
iff dy >= Cy - (2/3)*(log 2)*Cy
iff d >= C - (2/3)*(log 2)*C
iff d >= (1 - (2/3)*log 2)*C
Since log 2 (base 3) is about .63, we have 1 - (2/3)*log 2 <= .58. So it suffices that d >= .58 * C, that is, C <= (1/.58)*d, and in particular any C <= 1.72*d works.
Thus, T(n) >= C*n*log n for any choice of C such that 0 < C <= 1.72*d
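
As a sanity check on the groups-of-3 recurrence (this is not part of the proof), here is a minimal Python sketch that evaluates T(n) = T(n/3) + T(2n/3) + dn numerically, using the base values T(1) = 1 and T(x) = 0 for x < 1 assumed in the proof. The function names T_groups_of_3 and log3 are made up for illustration, d = 1 is an arbitrary choice, and C = 1.72*d is one value satisfying d >= (1 - (2/3)*log 2)*C.

```python
import math
from functools import lru_cache


def T_groups_of_3(n, d=1.0):
    """Evaluate T(n) = T(n/3) + T(2n/3) + d*n, with T(1) = 1 and T(x) = 0 for x < 1."""

    @lru_cache(maxsize=None)
    def T(k, j):
        # Every argument reached from n has the form n * 2^j / 3^k,
        # so memoizing on (k, j) keeps the number of distinct calls small.
        x = n * 2**j / 3**k
        if x < 1:
            return 0.0
        if x == 1:
            return 1.0
        # T(k + 1, j) is T(x/3); T(k + 1, j + 1) is T(2x/3).
        return T(k + 1, j) + T(k + 1, j + 1) + d * x

    return T(0, 0)


def log3(x):
    """Logarithm base 3, matching the convention used in the proof."""
    return math.log(x) / math.log(3)


if __name__ == "__main__":
    d = 1.0          # assumed value of the constant d
    C = 1.72 * d     # assumed C, chosen so that d >= (1 - (2/3)*log3(2)) * C
    for n in (10, 1000, 10**5):
        t = T_groups_of_3(n, d)
        # Print n, the ratio T(n)/n, and whether the lower bound holds.
        print(n, t / n, t >= C * n * log3(n))
```

If T(n) were linear, the ratio T(n)/n printed in the middle column would stay bounded as n grows; under the assumptions above it keeps increasing, which is consistent with the n*log n lower bound.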