The Heap Sort By Samuel Marateck (c) 20001 The definition of a heap is a complete binary tree in which the value stored in the parent is greater than or equal to that stored in each of its children. Although a tree is used to explain how a heap works, the program uses an array to represent the heap. All operations are done on the array. The heap sort starts with a tree that's a heap. We will work with the following initial example with N, the number of nodes equal to 6: r / \ p m / \ / o k c The initial array configuration corresponding to this is: rpmokc. We exchange the root with the last node on the tree, c. c / \ p m / \ / o k r The r is now in its proper place in the sorted array: cpmokr. So we set N to 5, therefore working only with cpmok, and the tree representing the heap becomes: c / \ p m / \ o k The tree, however, is no longer a heap. We see which of c's children is the larger, here it's p, and swap it with c. So the tree is now: p / \ c m / \ o k It's still not a heap. We continue down the subtree which has c as its root. The reason we don't have to worry about the subtree with m as its root is that that subtree is already a heap. We see which of c's children is the larger, here it's o, and swap it with c. So the tree is now: p / \ o m / \ c k This process is called reheapification. The tree is now a heap and the corresponding array, including the r is pomckr. Next we exchange the root p with k k / \ o m / \ c p The array is now komcpr. The p and r are now in their proper place in the array. So we set N to 4 and the tree becomes k / \ o m / c Reheapification yields o / \ k m / c We exchange the root o with the last node on the heap and get c / \ k m / o The array is now ckmopr. Now the opr are in their proper place in the array. So we set N to 3 and deal with c / \ k m Reheapification yields m / \ k c After the exchange we get c / \ k m The array becomes ckmopr and mopr are in their proper place in the array. Set N to 2 and the tree becomes: c / k Reheapification yields k / c The exchange produces c / k and the array becomes ckmopr and the sort is finished. The process we have just described is called "shifting down". The time analysis for reheapification is O(logN) since it depends on the number of levels in the tree. Since there are N items in the list, an upper limit for heap sort is O(NLogN). Because of this, its advantageous to use the heap sort for the Huffman coding algorithm. But now you have to use a reverse heap since the parent should be less than or equal to either of its children. Here's how the algorithm works. You remove the top node, reheapify and then remove the new top node, thus getting the two nodes with the lowest frequencies. You combine their frequencies creating a new node that is a treelet with its left and right children being the two constituent nodes. Next insert the combined node at the top and reheapify. You do this until you have one node that represents the Huffman tree.