\documentclass[11pt]{article}
\usepackage{graphicx}
\pagestyle{empty}
\newcommand{\sig}{\sigma}
\newcommand{\ra}{\rightarrow}
\newcommand{\eps}{\epsilon}
\newcommand{\gam}{\gamma}
\newcommand{\ah}{\alpha}
\newcommand{\ben}{\begin{enumerate}}
\newcommand{\een}{\end{enumerate}}
\newcommand{\hsa}{\hspace*{1cm}}
\newcommand{\hsb}{\hspace*{2cm}}
\newcommand{\So}{\\ {\tt Solution:}}
\begin{document}
\begin{center} {\Large\bf Fundamental Algorithms, Assignment 8}
\\ Solutions
\end{center}
\ben
\item Suppose that the Huffman Code for $\{v,w,x,y,z\}$ has $0$ or $1$ as the
code word for $z$. {\em Prove} that the frequency for $z$ cannot be less than
$\frac{1}{3}$. {\em Give an example} where the frequency for $z$ is $0.36$ and
$z$ does get code word $0$ or $1$.
\So {\tt First Part.} If $z$ has code word $0$ or $1$ then $z$ survives until the final
join, so at the penultimate step there are three nodes $z,\ah,\beta$ whose frequencies sum to $1$.
If $z$ had frequency less than $\frac{1}{3}$ then, since $\ah,\beta$ together would have
frequency more than $\frac{2}{3}$, at least one of them would have frequency greater than
$\frac{1}{3}$, and in particular greater than $z$'s. Thus the penultimate step would
``join'' $z$ with the smaller of $\ah,\beta$, and so $z$ would not at the end get a
code word of length one. {\tt Second Part.} There are many correct answers here but
perhaps the easiest is if $v,w,x,y$ all have frequency $0.16$. In the first three
steps $v,w$ create $\ah$ with frequency $0.32$; $x,y$ create $\beta$ with frequency
$0.32$ and, as $0.36 > 0.32$, $\ah,\beta$ create $\gam$ with frequency $0.64$ and
finally $\gam,z$ join.
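Both parts are easy to check mechanically. The sketch below (a standard heap-based Huffman build, not part of the required solution) uses integer frequencies $16,16,16,16,36$ for the example above, and $17,17,17,17,32$ for a case where $z$'s frequency falls below $\frac{1}{3}$:

```python
import heapq
from itertools import count

def huffman_lengths(freqs):
    """Return {symbol_index: codeword_length} for the given frequencies,
    merging the two smallest weights at each step (standard Huffman build)."""
    ids = count()  # tiebreaker so tied weights never compare the leaf lists
    heap = [(w, next(ids), [(i, 0)]) for i, w in enumerate(freqs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # every leaf under the merged node moves one level deeper
        heapq.heappush(heap, (w1 + w2, next(ids),
                              [(s, d + 1) for s, d in a + b]))
    return dict(heap[0][2])

# z (index 4) with frequency 0.36: codeword of length 1
print(huffman_lengths([16, 16, 16, 16, 36])[4])   # 1
# z with frequency 0.32 < 1/3: codeword of length at least 2
print(huffman_lengths([17, 17, 17, 17, 32])[4])   # 2
```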
\item Suppose, in the Activity Selector problem, we
instead select the last activity to start that is compatible with all previously
selected activities. Describe how this
approach works, write a program for it (pseudocode allowed) and prove that it
yields an optimal solution.
\So This approach is symmetric to the one presented in the textbook.
It is a greedy solution in that at each point, we're
selecting the last activity to start, and recursing down to the single
subproblem of finding the optimal solution for all remaining activities
compatible with the ones already scheduled. We can give the recursive
algorithm as follows:
\begin{enumerate}
\item[1] RECURSIVE-ACTIVITY-SELECTOR$(s, f, i)$
\item[2] $m \gets i - 1$
\item[3] \textbf{while } $m > 0$ and $s_i < f_m$
\item[4] \textbf{ do } $m \gets m - 1$
\item[5] \textbf{if } $m > 0$
\item[6] \textbf{ then} \textbf{return} $\{a_m\} \cup $ RECURSIVE-ACTIVITY-SELECTOR $(s, f, m)$
\item[7] \textbf{ else} \textbf{return} $\emptyset$
\end{enumerate}
We assume the activities are sorted into increasing order of start time, and we initially
run this algorithm with $i = n+1$, where $n$ is the number of activities, using a sentinel
activity with $s_{n+1} = \infty$; the first call then selects $a_n$, the last activity to start.
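The same backwards greedy can be written iteratively; here is a minimal Python sketch (the function name is ours, and input is taken as a list of $(s, f)$ pairs in no particular order):

```python
def select_backwards(activities):
    """Greedy: repeatedly take the latest-starting activity that is
    compatible with everything already chosen."""
    acts = sorted(activities, key=lambda a: a[0])  # increasing start time
    chosen = []
    bound = float('inf')   # earliest start time among chosen activities
    for s, f in reversed(acts):
        if f <= bound:     # finishes before everything chosen starts
            chosen.append((s, f))
            bound = s
    chosen.reverse()       # report in increasing start-time order
    return chosen

print(select_backwards([(1, 4), (4, 5), (2, 3), (3, 4)]))
# [(2, 3), (3, 4), (4, 5)]
```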
This is a greedy algorithm, in that we're decomposing the problem recursively into a single
optimal subproblem. We can rigorously prove that this gives us an optimal solution by induction
on the number of activities. But it is easier to note that a dynamic programming solution
to this problem would yield an optimal result and that, as we saw in the book, two key
observations about the general recurrence for this problem show that the greedy
solution is equivalent to the dynamic programming solution. The recurrence itself
is given in the book. We state the two observations and prove them: \\
\begin{center}
Consider any nonempty subproblem $S_{ij}$, and let $a_m$ be the activity in $S_{ij}$ with
the latest start time:
\begin{eqnarray*}
s_m = \max \{s_k: a_k \in S_{ij} \}
\end{eqnarray*}
\begin{enumerate}
\item[1] Activity $a_m$ is used in some maximum-size subset of mutually compatible activities of $S_{ij}$
\subitem[Pf] Suppose that $A_{ij}$ is a maximum-size subset of mutually compatible activities of $S_{ij}$. We also
suppose that the activities in $A_{ij}$ are ordered in monotonically increasing order of starting time. Let $a_k$ be
the last activity in $A_{ij}$. If $a_k = a_m$, we're done, since we've shown that $a_m$ is used in constructing the
schedule. Otherwise, we construct the subset $A'_{ij} = (A_{ij} - \{a_k\}) \cup \{a_m\}$. The activities in this subset
are mutually compatible: every other activity in $A_{ij}$ finishes by $s_k$, since $a_k$ is the last activity to start,
and $s_m \geq s_k$. The number of activities in the subset is the
same, so it is a maximum-size subset of activities that includes $a_m$.
\item[2] The subproblem $S_{mj}$ is empty, so that choosing $a_m$ leaves the subproblem $S_{im}$ as the only one that may be nonempty
\subitem[Pf] Suppose that $S_{mj}$ is nonempty, so there is some activity $a_k$ with $f_m \leq s_k < f_k \leq s_j$. Then
$a_k$ is also in $S_{ij}$, and $s_k \geq f_m > s_m$, so $a_k$ has a later start time than $a_m$, contradicting the choice of $a_m$.
\end{enumerate}
\end{center}
\item Students (professors too!) often come up with very clever ideas for
optimization programs. The problem (often!) is that they (sometimes, but that
is enough) give the wrong answer. Here are three approaches and {\em your} problem,
in each case, is to give an example where it yields the wrong answer.
\ben
\item Pick the activity of the shortest duration
from amongst those which do not overlap previously selected activities.
\item Pick the activity which overlaps the fewest other remaining activities
from amongst those which do not overlap previously selected activities.
\item Pick the activity with the earliest start time
from amongst those which do not overlap previously selected activities.
\een
\noindent {\tt Solution:} One example to show that the approach of selecting the activity of least
duration does not yield an optimal solution is the set of tasks $a_1 = (5,7)$, $a_2 = (1,6)$,
$a_3 = (6,10)$. $a_1$ is selected first, but this locks out the other two, which
clearly comprise the optimal solution. \\
An example to show that the strategy of choosing the activities that overlap the fewest
other remaining activities is the following set of tasks: $a_1 = (0,1)$, $a_2 = (1,3)$,
$a_3 = (3,5)$, $a_4=(5,6)$, $a_5=(0,2)$, $a_6=(0,2)$, $a_7=(2,4)$, $a_8=(4,6)$, $a_9=(4,6)$.
$a_1$, $a_4$, and $a_7$ each overlap two other activities, while the rest each overlap three.
This means that $a_1$, $a_4$, and $a_7$ will be the activities selected, giving a schedule of size three, but the optimal
solution is $a_1$, $a_2$, $a_3$, $a_4$, of size four. \\
An example to show that the strategy of choosing the compatible remaining activities with
the earliest start time is $a_1 = (1,4)$, $a_2 = (4,5)$, $a_3 = (2,3)$, $a_4 = (3,4)$. $a_1$
will be selected first, followed by $a_2$, when clearly, the optimal solution consists of
$a_3$, $a_4$, and $a_2$.
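The first and third counterexamples can be checked with a small sketch (the helper below is ours; it handles any greedy rule whose key is fixed per activity, so the fewest-overlaps rule, whose counts change as activities are removed, is left to the hand argument above):

```python
def greedy_by(activities, key):
    """Scan activities in increasing `key` order, taking each one that is
    compatible with everything already chosen (intervals are [s, f))."""
    chosen = []
    for s, f in sorted(activities, key=key):
        if all(f <= cs or s >= cf for cs, cf in chosen):
            chosen.append((s, f))
    return chosen

# Shortest duration first: picks (5,7), blocking both others (optimum: 2)
print(len(greedy_by([(5, 7), (1, 6), (6, 10)], key=lambda a: a[1] - a[0])))  # 1
# Earliest start first: picks (1,4) then (4,5) (optimum: 3)
print(len(greedy_by([(1, 4), (4, 5), (2, 3), (3, 4)], key=lambda a: a[0])))  # 2
```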
\ben \item What is an optimal Huffman code for the following set of letters when the
frequencies are the first eight Fibonacci numbers?
\[ a:1, b:1, c:2, d:3, e:5, f:8, g:13, h:21 \]
\item
The Fibonacci sequence is defined by initial values $0,1$
with each further term the sum of the previous two terms. Generalize
the previous answer to find the optimal code when there are $n$ letters with
frequencies the first $n$ (excluding the $0$) Fibonacci numbers.
\een
\noindent {\tt Solution:} The Huffman encoding tree is as follows:
\begin{center}
\includegraphics{sol8.eps}
\end{center}
In general, the codeword for the letter whose frequency is the first Fibonacci number will be $h_1 = 1^{n-1}$,
and the codeword for the letter with the $k$th Fibonacci frequency, $2 \leq k \leq n$, will be $h_k = 1^{n-k}0$.
This is because the sum of the first $k$ Fibonacci numbers is $F_{k+2}-1 < F_{k+2}$, so each Huffman merge
joins the accumulated subtree with the next smallest letter, producing a completely lopsided tree.
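As a sanity check on this lopsided shape, the heap-based build below (a standard Huffman construction, not part of the required answer) reports the codeword lengths for the frequencies $1,1,2,3,5,8,13,21$:

```python
import heapq
from itertools import count

def huffman_lengths(freqs):
    """Codeword length of each symbol under a Huffman code, built by
    repeatedly merging the two smallest weights on a min-heap."""
    ids = count()  # tiebreaker so tied weights never compare the leaf lists
    heap = [(w, next(ids), [(i, 0)]) for i, w in enumerate(freqs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(ids),
                              [(s, d + 1) for s, d in a + b]))
    # return depths in symbol order (symbol i had frequency freqs[i])
    return [d for _, d in sorted(heap[0][2])]

print(huffman_lengths([1, 1, 2, 3, 5, 8, 13, 21]))
# [7, 7, 6, 5, 4, 3, 2, 1]
```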
\item Suppose that in implementing the Huffman code we weren't
so clever as to use Min-Heaps. Rather, at each step we found
the two letters of minimal frequency and replaced them by
a new letter with frequency their sum. How long would that
algorithm take, in Thetaland, as a function of the initial number of
letters $n$?
\So To find the letter of minimal frequency takes time $O(n)$; doing
it twice, adding the frequencies, and replacing the two letters by a new letter all
takes $O(n)$. We do this $n-1$ times so the total time is $O(n^2)$.
It is actually $\Theta(n^2)$ -- as the number of letters decreases
the time decreases but the first $n/2$ times there are at least $n/2$
letters and so finding the minimum takes $\Omega(n)$ and so the total
time for the first $n/2$ times (that is, starting with $n$ letters
until there are $n/2$ letters left) is $\Omega((n/2)\cdot(n/2))=
\Omega(n^2)$. We've sandwiched upper and lower bounds so this gives
total time $\Theta(n^2)$.
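The count can be made exact: with $k$ letters remaining, the two linear scans examine $k + (k-1)$ letters, and $\sum_{k=2}^{n}(2k-1) = n^2 - 1$. A sketch of the heapless build that tallies this (illustrative only):

```python
def naive_huffman_work(freqs):
    """Huffman merging without a heap: each round finds the two smallest
    frequencies by linear scans, counting the letters examined in total."""
    letters = list(freqs)
    examined = 0
    while len(letters) > 1:
        examined += len(letters)   # first scan, over all remaining letters
        x = letters.pop(min(range(len(letters)), key=letters.__getitem__))
        examined += len(letters)   # second scan, over one fewer letter
        y = letters.pop(min(range(len(letters)), key=letters.__getitem__))
        letters.append(x + y)      # the new combined letter
    return examined

print(naive_huffman_work([1] * 100))   # 9999 == 100**2 - 1
```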
\item Give a set of frequencies
on $\{a,b,\ldots,z\}$ so that the Huffman Code will have codelengths
$1$ for $a$, $2$ for $b$,\ldots, $24$ for $x$ and $25$ for $y$ and $z$.
\So We can use the Fibonacci sequence as frequencies, where $z$'s frequency
is the first Fibonacci number, $y$'s is the second, and so on, with $a$'s
frequency being the $26$th Fibonacci number. [There are many possible answers here.]
\een
\end{document}