The Heap Sort
By Samuel Marateck (c) 20001
The definition of a heap is a complete binary tree in which the value
stored in the parent is greater than or equal to that stored in each
of its children. Although a tree is used to explain how a heap works,
the program uses an array to represent the heap. All operations
are done on the array.
The heap sort starts with a tree that's a heap. We will work with the
following initial example with N, the number of nodes equal to 6:
r
/ \
p m
/ \ /
o k c
The initial array configuration corresponding to this is: rpmokc.
We exchange the root with the last node on the tree, c.
c
/ \
p m
/ \ /
o k r
The r is now in its proper place in the sorted array: cpmokr. So we
set N to 5, therefore working only with cpmok, and the tree
representing the heap becomes:
c
/ \
p m
/ \
o k
The tree, however, is no longer a heap. We see which of c's children
is the larger, here it's p, and swap it with c. So the tree is now:
p
/ \
c m
/ \
o k
It's still not a heap. We continue down the subtree which has c as its
root. The reason we don't have to worry about the subtree with m as
its root is that that subtree is already a heap. We see which of c's
children is the larger, here it's o, and swap it with c. So the tree
is now:
p
/ \
o m
/ \
c k
This process is called reheapification. The tree is now a heap and the
corresponding array, including the r is pomckr. Next we exchange the
root p with k
k
/ \
o m
/ \
c p
The array is now komcpr. The p and r are now in their proper place in
the array. So we set N to 4 and the tree becomes
k
/ \
o m
/
c
Reheapification yields
o
/ \
k m
/
c
We exchange the root o with the last node on the heap and get
c
/ \
k m
/
o
The array is now ckmopr. Now the opr are in their proper place in the
array. So we set N to 3 and deal with
c
/ \
k m
Reheapification yields
m
/ \
k c
After the exchange we get
c
/ \
k m
The array becomes ckmopr and mopr are in their proper place in the
array. Set N to 2 and the tree becomes:
c
/
k
Reheapification yields
k
/
c
The exchange produces
c
/
k
and the array becomes ckmopr and the sort is finished.
The process we have just described is called "shifting down".
The time analysis for reheapification is O(logN) since it depends on
the number of levels in the tree. Since there are N items in the list,
an upper limit for heap sort is O(NLogN). Because of this, its advantageous
to use the heap sort for the Huffman coding algorithm. But now you have to use
a reverse heap since the parent should be less than or equal to either
of its children. Here's how the algorithm works.
You remove the top node, reheapify and then remove the new top node, thus
getting the two nodes with the lowest frequencies. You combine their
frequencies creating a new node that is a treelet with its left and right
children being the two constituent nodes. Next insert the combined node at the
top and reheapify. You do this until you have one node that represents the
Huffman tree.