## Lecture 17: Linear time sorting algorithms

Not in DJW.

Sorting time better than O(N * log(N)) is possible if one can use the values directly rather than doing comparisons.

### Bin sort

Assume that the input A is an array of objects with a field value, which is an integer between 0 and m-1, and that you are sorting by value.

Create an array bin, where bin[i] is a linked list of the items with value i. Each list keeps a pointer to its last element, so that adding to the end takes constant time.

```
for (b=0; b < m; b++) bin[b]=emptyList();
for (i=0; i < A.length; i++)
    add A[i] to the end of bin[A[i].value];
s=bin[0];   // s is a linked list.
for (b=1; b < m; b++) add bin[b] to the end of s;
```
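The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: it uses Python lists in place of linked lists with tail pointers (appending to a list is also constant time on average), and the `Item` record type and the `key` parameter are assumptions introduced only for illustration.

```python
from collections import namedtuple

# Hypothetical record type: `value` is the integer sort key, `name` is payload.
Item = namedtuple("Item", ["value", "name"])

def bin_sort(A, m, key=lambda item: item.value):
    """Return a new list with the items of A ordered by key(item), in 0..m-1."""
    bins = [[] for _ in range(m)]   # one empty bin per possible key value
    for item in A:                  # appending preserves input order within a bin
        bins[key(item)].append(item)
    result = []
    for b in bins:                  # concatenate the bins in increasing key order
        result.extend(b)
    return result

items = [Item(2, "a"), Item(0, "b"), Item(2, "c"), Item(1, "d"), Item(0, "e")]
sorted_items = bin_sort(items, 3)
print([it.name for it in sorted_items])   # ['b', 'e', 'd', 'a', 'c']
```

Items with equal values ("b" and "e", "a" and "c") come out in their input order, because each item is added to the end of its bin.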

Suppose that you are sorting an array A of objects by value. Each value is itself an array of integers keys[] of length K. You want to sort lexicographically; that is:
• Overall the items are sorted by the first key
• Within items with equal first key, the items are sorted by the second key
• etc.
Like alphabetical order, where the keys are the letters.

Then you can sort an array A of objects using repeated bin sorts, from the least important key to the most important key. This method is known as radix sort.

```
for (i=K-1; i>= 0; i--)
    binSort A on the ith key
```
The reason this works is that bin sort is a stable sort.
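As a rough illustration of the loop above, here is a small Python sketch. It assumes each item is simply a tuple of K integer keys, each between 0 and M-1; the function names `bin_sort_on_key` and `radix_sort` are made up for this example.

```python
def bin_sort_on_key(A, i, M):
    """Stable bin sort of A by the i-th key of each item (keys in 0..M-1)."""
    bins = [[] for _ in range(M)]
    for item in A:
        bins[item[i]].append(item)   # add to the end of the bin: keeps it stable
    return [item for b in bins for item in b]

def radix_sort(A, K, M):
    """Sort tuples of K integer keys lexicographically."""
    for i in range(K - 1, -1, -1):   # least important key first
        A = bin_sort_on_key(A, i, M)
    return A

pairs = [(1, 2), (0, 1), (1, 0), (0, 2), (1, 1), (0, 0)]
result = radix_sort(pairs, K=2, M=3)
print(result)   # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

After the pass on the second key, the items are ordered by that key. The pass on the first key then groups items by first key, and because it is stable, the earlier ordering by second key survives inside each group.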

#### Stable sorts

A sorting algorithm is stable if items with equal keys remain in the same order as in the input.

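The O(N^2) sorts are stable (for selection sort, provided the minimum is moved into place by shifting the intervening items rather than by a long-distance swap). Mergesort can be made stable if, when comparing two equal items, you are careful to take the one from the earlier part of the array. Bin sort is stable, as long as you add to the end of the linked list. Quicksort and heapsort are not stable.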

Of course, you can make any sorting algorithm stable by adding an additional field for the original position, and using that when you have two elements whose primary key is equal. But that requires extra memory.
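The position trick can be sketched as follows. This is only an illustration under stated assumptions: the `quicksort` below is a plain textbook version (last element as pivot), chosen because quicksort is not stable, and the records are made-up (name, number) pairs sorted by number.

```python
def quicksort(A, key, lo=0, hi=None):
    """In-place quicksort by key(item), using the last element as pivot."""
    if hi is None:
        hi = len(A) - 1
    if lo >= hi:
        return
    pivot = key(A[hi])
    i = lo
    for j in range(lo, hi):
        if key(A[j]) < pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[hi] = A[hi], A[i]        # put the pivot between the two parts
    quicksort(A, key, lo, i - 1)
    quicksort(A, key, i + 1, hi)

records = [("a", 2), ("b", 2), ("c", 1), ("d", 2)]

# Sorting on the number alone: items with equal numbers may be reordered.
plain = list(records)
quicksort(plain, key=lambda r: r[1])

# Attach the original position and use it to break ties.
decorated = [(rec, pos) for pos, rec in enumerate(records)]
quicksort(decorated, key=lambda p: (p[0][1], p[1]))
stable = [rec for rec, pos in decorated]

print([name for name, _ in plain])    # ['c', 'd', 'b', 'a']
print([name for name, _ in stable])   # ['c', 'a', 'b', 'd']
```

The extra memory mentioned above is the `pos` field stored with every record.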

```
A = [PAT,SOY,CAR,SAY,COT,PAR,SIT,POT,PAY,PIT,COY,SAT]

Binsort by last letter
R:  CAR, PAR
T:  PAT, COT, SIT, POT, PIT, SAT
Y:  SOY, SAY, PAY, COY

A = [CAR,PAR,PAT,COT,SIT,POT,PIT,SAT,SOY,SAY,PAY,COY] // sorted by last letter

Binsort by second letter:
A: CAR, PAR, PAT, SAT, SAY, PAY
I: SIT, PIT
O: COT, POT, SOY, COY

A = [CAR,PAR,PAT,SAT,SAY,PAY,SIT,PIT,COT,POT,SOY,COY] // sorted by last 2

Binsort by first letter:
C: CAR,COT,COY
P: PAR,PAT,PAY,PIT,POT
S: SAT,SAY,SIT,SOY

A = [CAR,COT,COY,PAR,PAT,PAY,PIT,POT,SAT,SAY,SIT,SOY] // sorted
```
Running time: suppose there are N objects, each value has K keys, and each key is a number between 0 and M-1. Each of the K passes is a bin sort costing O(M+N), so the total running time is O(K*(M+N)).

#### When does this pay?

Suppose you have 1,000,000 numbers, each between 0 and 999,999. You can view each 6-digit number as a sequence of six 1-digit keys, so N=1,000,000, M=10, K=6. This gives a running time of about K*(N+M) = 6,000,000. But it is faster just to use a bin sort with the entire number as the key. Then M=1,000,000, so the time is about N+M = 2,000,000. (Obviously this depends on the constants associated with N and M, which is a little tricky.)

However, suppose you have 1000 numbers of at most 6 digits, i.e. less than 1,000,000. Then you can view each number as a pair of 3-digit numbers; e.g. view the number 76,721 as the pair [076,721]. Now if you do a radix sort based on those keys, you have N=1000, M=1000, K=2, so the total time is about K*(N+M) = 4000. Note that N*log(N) is about 10,000, so you are still ahead of a comparison sort.

By the way, for really efficient code, you would not divide by factors of 1000; instead you would use a power of two. For instance, you could write a number between 0 and 999,999 as a pair of numbers between 0 and 2^10-1 = 1023; e.g. 76,721 would be represented by the pair [74,945], since 76,721 = 74*1024 + 945. Why is this better?
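One common answer is that splitting by a power of two needs only a shift and a bit mask, which are typically cheaper than a division and a remainder. A minimal sketch, where the names `BITS`, `MASK`, and `split_keys` are assumptions made for this example:

```python
BITS = 10                 # each key has 2**10 = 1024 possible values
MASK = (1 << BITS) - 1    # 1023: the ten low-order bits set to one

def split_keys(x):
    """Split 0 <= x < 2**20 into a (high, low) pair of keys in 0..1023."""
    return (x >> BITS, x & MASK)

print(split_keys(76721))  # (74, 945)
```

The pair agrees with the arithmetic in the text: 74*1024 + 945 = 76,721.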

On the other hand, suppose you have 20 numbers of at most 6 digits, i.e. less than 1,000,000. Then if you view the numbers as pairs of 3-digit numbers, you have N=20, M=1000, K=2, so the running time is about K*(N+M) = 2000. Here you do better just to use an O(N*log(N)) comparison sort: for N=20, N*log(N) is about 100.
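The estimates in this section can be recomputed with a few lines of Python. This is only back-of-the-envelope arithmetic: it ignores the constant factors mentioned above, takes logarithms base 2, and the function names are made up for the example.

```python
import math

def radix_cost(N, M, K):
    """Rough operation count for K bin-sort passes: K*(N+M)."""
    return K * (N + M)

def comparison_cost(N):
    """Rough operation count for a comparison sort: N*log2(N)."""
    return N * math.log2(N)

# (description, N, M, K) for the cases discussed in the text
cases = [
    ("1,000,000 numbers, six 1-digit keys", 1_000_000, 10, 6),
    ("1,000,000 numbers, whole number as key", 1_000_000, 1_000_000, 1),
    ("1000 numbers, two 3-digit keys", 1000, 1000, 2),
    ("20 numbers, two 3-digit keys", 20, 1000, 2),
]
for description, N, M, K in cases:
    print(description, radix_cost(N, M, K), round(comparison_cost(N)))
```

For N=1000 the radix estimate (4000) is below N*log(N); for N=20 the estimate (2040) is far above it, because the M term dominates.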

### Bucket sort

Suppose that you are sorting N objects with keys that are floating point numbers between lower bound L and upper bound U. Carry out the following algorithm:
```
let DELTA = (U-L)/N.
divide the range [L,U] into N buckets: [L,L+DELTA), [L+DELTA,L+2*DELTA),
    ... [L+(N-1)*DELTA,U]
create an array BUCKETS[N] of linked lists of objects
for (i=0; i < N; i++) {
    j = min(floor((A[i].value-L)/DELTA), N-1);   // a value equal to U goes in the last bucket
    Add A[i] to the list at BUCKETS[j];
}
S = null;
for (j=0; j < N; j++) {
    Sort the linked list at BUCKETS[j]
    Append the result to the end of S;
}
return S
```
If the values are distributed uniformly at random over the range [L,U], then very few of the linked lists will have any substantial length, so sorting the individual lists takes little time. The expected running time is O(N).
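A minimal Python sketch of the algorithm above, under a few assumptions: it sorts plain floats rather than objects with a value field, it uses Python lists and the built-in `list.sort` for each bucket instead of linked lists, and it clamps the bucket index so that a value equal to U falls in the last bucket.

```python
import random

def bucket_sort(values, L, U):
    """Sort floats in the range [L, U] using N equal-width buckets."""
    N = len(values)
    if N == 0 or U == L:
        return list(values)          # nothing to distribute
    delta = (U - L) / N
    buckets = [[] for _ in range(N)]
    for v in values:
        j = min(int((v - L) / delta), N - 1)   # clamp: v == U goes in the last bucket
        buckets[j].append(v)
    result = []
    for b in buckets:
        b.sort()                     # each bucket is expected to be very short
        result.extend(b)
    return result

random.seed(0)
data = [random.uniform(0.0, 1.0) for _ in range(1000)]
print(bucket_sort(data, 0.0, 1.0) == sorted(data))   # True
```

If the input is far from uniform, for example if most values fall into one bucket, the cost is dominated by sorting that bucket and the O(N) expectation no longer applies.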

### Summary of sorting routines

Insertion sort
Values: Comparable.
Running time:
Worst case: O(N^2)
Best case: O(N)
Average case: O(N^2)
If input is already sorted: O(N)
In place: Yes.
Stable: Yes.
Comment: Fastest for small collections.

Selection sort
Values: Comparable.
Running time:
All cases: O(N^2)
Category: Greedy algorithm.
In place: Yes.
Stable: Yes, if the minimum is moved into place by shifting; the usual swap-based array version is not stable.
Comment: Most obvious; therefore, most generalizable.

Heapsort
Values: Comparable.
Running time:
All cases: O(N*log(N)) (assuming distinct keys; if all keys are equal it runs in O(N))
Category: Clever data structure
In place: Yes.
Stable: No.
Comment: Best for extracting the K smallest elements in order when K is much less than N (O(N + K*log(N))). The data structure can also be used for a priority queue. An in-place, worst-case O(N*log(N)) sorting algorithm (the only practical one I know of).

Mergesort
Values: Comparable.
Running time:
All cases: O(N*log(N))
Category: Conceptually divide and conquer. Modified for implementation.
In place: No.
Stable: Yes.
Comment: Good for sorting in external storage (disk or tape).

Quicksort
Values: Comparable.
Running time:
Worst case: O(N^2)
Best case: O(N*log(N))
Average case: O(N*log(N))
Category: Divide and conquer
In place: Yes.
Stable: No.
Comment: In practice, the fastest of the comparison sorts on average.

Sorting using unbalanced binary search tree
Values: Comparable.
Running time:
Worst case: O(N^2)
Best case: O(N*log(N))
Average case: O(N*log(N))
If input is already sorted: O(N^2)
Category: Clever data structure
In place: No.
Stable: No.

Sorting using balanced binary search tree
Values: Comparable.
Running time:
All cases: O(N*log(N))
Category: Clever data structure
In place: No.
Stable: No.
Comment: I have not taught any of these data structures yet.

Binsort
Values: Integer between 0 and M-1 (or equivalent, e.g. character).
Running time:
All cases: O(N+M)
In place: No.
Stable: Yes.