\documentclass[11pt]{article}
\usepackage{graphicx}
\pagestyle{empty}
\newcommand{\sig}{\sigma}
\newcommand{\ra}{\rightarrow}
\newcommand{\eps}{\epsilon}
\newcommand{\gam}{\gamma}
\newcommand{\ah}{\alpha}
\newcommand{\ben}{\begin{enumerate}}
\newcommand{\een}{\end{enumerate}}
\newcommand{\hsa}{\hspace*{1cm}}
\newcommand{\hsb}{\hspace*{2cm}}
\newcommand{\So}{\\ {\tt Solution:}}
\begin{document}
\begin{center} {\Large\bf Fundamental Algorithms, Assignment 8}
\\ Solutions
\end{center}
\ben
\item Suppose, in the Activity Selector problem, we
instead select the last activity to start that is compatible with all previously
selected activities. Describe how this
approach works, write a program for it (psuedocode allowed) and prove that it
yields an optimal algorithm.
\So This approach is symmetric to the one presented in the textbook.
It is a greedy solution in that at each point, we're
selecting the last activity to start, and recursing down to the single
subproblem of finding the optimal solution for all remaining activities
compatible with the ones already scheduled. We can give the recursive
algorithm as follows:
\begin{enumerate}
\item[1] RECURSIVE-ACTIVITY-SELECTOR$(s, f, i)$
\item[2] $m \gets i - 1$
\item[3] \textbf{while } $m > 0$ and $s_i < f_m$
\item[4] \textbf{ do } $m \gets m - 1$
\item[5] \textbf{if } $m > 0$
\item[6] \textbf{ then} \textbf{return} $\{a_m\} \cup $ RECURSIVE-ACTIVITY-SELECTOR $(s, f, m)$
\item[7] \textbf{ else} \textbf{return} $\emptyset$
\end{enumerate}
We initially run this algorithm with $i = n$ where $n$ is the number of tasks.
This is a greedy algorithm, in that we're decomposing the problem recursively into a single
optimal subproblem. We can rigorously prove that this gives us an optimal solution by induction
on the number of activities. But it is easier to note that a dynamic programming solution
to this problem would yield an optimal result, and, as we saw in the book, there are two key
observations we can use on the general recurrence of this problem that show that the greedy
solution is equivalent to the dynamic programming solution. The recurrence for this solution
is given in the book. We note the following two observations (and prove them): \\
\begin{center}
Consider any nonempty subproblem $S_{ij}$, and let $a_m$ be the activity in $S_{ij}$ with with
the latest start time:
\begin{eqnarray*}
s_m = \max \{s_k: a_k \in S_{ij} \}
\end{eqnarray*}
\begin{enumerate}
\item[1] Activity $a_m$ is used in some maximum-size subset of mutually compatible activities of $S_{ij}$
\subitem[Pf] Suppose that $A_{ij}$ is a maximum-size subset of mutually compatible activities of $S_{ij}$. We also
suppose that the activities in $A_{ij}$ are ordered in monotonically increasing order of starting time. Let $a_k$ be
the last activity in $A_{ij}$. If $a_k = a_m$, we're done, since we've shown that $a_m$ is used in constructing the
schedule. Otherwise, we construct the subset $A\prime_{ij} = A_{ij} - \{a_k\} \cup \{a_m\}$. We know the activities in the subset
are disjoint, since $a_k$ is the last activity to start, and $s_m \geq s_k$. The number of activities in the subset are the
same, so it is a maximum-size subset of activities that includes $a_m$.
\item[2] The subproblem $S_{mj}$ is empty, so that choosing $a_m$ leaves the subproblem $S_{im}$ as the only one that may be nonempty
\subitem[Pf] Suppose that $S_{mj}$ is nonempty, so there is some activity $a_k$ such that $f_m \leq s_k < f_k \leq s_j < f_j$. Then,
$a_k$ is also in $S_{ij}$, which has a later start time than $a_m$, contradicting our initial assumption.
\end{enumerate}
\end{center}
\item Students (professors too!) often come up with very clever ideas for
optimization programs. The problem (often!) is that they (sometimes, but that
is enough) give the wrong answer. Here are three approaches and {\em your} problem,
in each case, is to give an example where it yields the wrong answer.
\ben
\item Pick the activity of the shortest duration
from amongst those which do not overlap previously selected activities.
\item Pick the activity which overlaps the fewest other remaining activities
from amongst those which do not overlap previously selected activities.
\item Pick the activity with the earliest start time
from amongst those which do not overlap previously selected activities.
\een
\noindent {\tt Solution:} One example to show that the approach of selecting the activity of least
duration does not yield an optimal solution is the set of tasks $a_1 = (5,7)$, $a_2 = (1,6)$,
$a_3 = (6,10)$. $a_1$ is selected first, but this locks out the other two, which
clearly comprise the optimal solution. \\
An example to show that the strategy of choosing the activities that overlap the fewest
other remaining activities is the following set of tasks: $a_1 = (0,1)$, $a_2 = (1,3)$,
$a_3 = (3,5)$, $a_4=(5,6)$, $a_5=(0,2)$, $a_6=(0,2)$, $a_7=(2,4)$, $a_8=(4,6)$, $a_9=(4,6)$.
$a_1$, $a_4$, and $a_6$ have two overlapping activities, while the others have three.
This means that $a_1$, $a_4$, and $a_6$ will be the activities selected, but the optimal
solution is $a_1$, $a_2$, $a_3$, $a_4$, \\
An example to show that the strategy of choosing the compatible remaining activities with
the earliest start time is $a_1 = (1,4)$, $a_2 = (4,5)$, $a_3 = (2,3)$, $a_4 = (3,4)$. $a_1$
will be selected first, followed by $a_2$, when clearly, the optimal solution consists of
$a_3$, $a_4$, and $a_2$.
\ben \item What is an optimal Huffman code for the following code when the
frequencies are the first eight Fibonacci number?
\[ a:1, b:1, c:2, d:3, e:5, f:8, g:13, h:21 \]
\item
The Fibonacci sequence is defined by initial values $0,1$
with each further term the sum of the previous two terms. Generalize
the previous answer to find the optimal code when there are $n$ letters with
frequencies the first $n$ (excluding the $0$) Fibronacci numbers.
\een
\noindent {\tt Solution:} The Huffman encoding tree is as follows:
\begin{center}
\includegraphics{sol9.eps}
\end{center}
In general, the encoding for the character with the first Fibonacci number will be $h_1 = 1^{n-1}$.
For the character with the $k$th Fibonacci number will be $h_k = 1^{n-k}0$
\item Suppose that in implementing the Huffman code we weren't
so clever as to use Min-Heaps. Rather, at each step we found
the two letters of minimal frequency and replaced them by
a new letter with frequency their sum. How long would that
algorithm take, in Thetaland, as a function of the initial number of
letters $n$.
\So To find the letter of minimal frequency takes time $O(n)$, doing
it twice, adding frequencies, and replacing them by a new letter all
takes $O(n)$. We do this $n$ times so the total time is $O(n^2)$.
It is actually $\Theta(n^2)$ -- as the number of letters decreases
the time decreases but the first $n/2$ times there are at least $n/2$
letters and so finding the minimum takes $\Omega(n)$ and so the total
time for the first $n/2$ times (that is, starting with $n$ letters
until there are $n/2$ letters left) is $\Omega((n/2)\cdot(n/2))=
\Omega(n^2)$. We've sandwiched upper and lower bounds so this gives
total time $\Theta(n^2)$.
\item Give a set of frequencies
on $\{a,b,\ldots,z\}$ so that the Huffman Code will have codelengths
$1$ for $a$, $2$ for $b$,\ldots, $24$ for $x$ and $25$ for $y$ and $z$.
\So We can use the Fibonacci sequence as frequencies, where $z$'s frequency
is the first Fibonacci number, $y$'s is the second, and so on, with $a$'s
frequency is the 26th Fibonacci number. [There are many possible answers here.]
\item Suppose that the Huffman Code for $\{a,b,c,d,e\}$ give
$0$ as the codeword for $a$. What is the smallest that the frequency
of $a$ can be?
(In the answer there will be ``ties" among the frequencies
at some points and you are allowed to break the ties in any way. This
is a surprisingly tricky question. If, say, $f[a]=0.51$ then the Huffman
Code for $a$ will definitely have length one. But the answer is {\em
not} one half! Here is an example with a smaller $f[a]$: $f[a]=.48$,
$f[b]=0.15$,$f[c]=0.14$, $f[d]=0.12$,$f[e]=0.11$. Enjoy!)
\So The smallest would be $1/3$. Consider when there are $3$ letters
left which we can call (as otherwise $a$ would have been joined it
would not end up with codeword $0$) $a,\ah,\beta$.
If $f[a]<1/3$ then $a$ will not
have the maximal frequency so it will be joined with $\ah$ or $\beta$.
Conversely we could have frequencies, say, $1/3,1/6,1/6,1/6,1/6$ which
would yield $a,\ah,\beta$ with $1/3,1/3,1/3$ -- this would be a tie
and we could join $\ah,\beta$ which would yield code words
$0,100,101,110,111$. If $a$ has frequency slighty bigger than $1/3$
we can avoid ties, for example, starting with $0.34, 0.1652,0.1651,
0.1649,0.1648$ we would join $d,e$ to make $\ah$, $c,d$ to make
$\beta$, $\ah,\beta$ to make $\gamma$ and finally $a,\gamma$.
\een
\end{document}