## Lecture 13: Running Times of Algorithms: Order of Magnitude, Worst Case, Asymptotic Analysis

### The toString() method for linked lists

There is one method that runs slowly in all our linked list definitions: toString().

To illustrate, run the code at TestSlowToString.java.
This uses the definition of singly linked lists in MyList1.java.
The code here:

• Reads a number N from the command line. (In a command-line oriented system, type "java TestSlowToString 1000"; this binds the argument args in main to an array of Strings of length 1, where args[0] = "1000". To do this in an IDE, look up its instructions for "command line arguments".)
• Creates a linked list with values from 1 to N.
• Constructs the string representation of the linked list, and prints out the number of characters.

If you run this with argument 10,000, the sum prints out immediately but there is a noticeable delay before it computes toString(). With argument 20,000, there is a delay of several seconds, and with argument 100,000 it takes a long time. Meanwhile the sum continues to print out immediately, as you would expect; adding 100,000 numbers on a 1 GHz machine takes about 100 microseconds. What is the problem with toString()?

To answer that we have to look under the hood, as they say.

• A String in Java is an immutable object. You cannot change its value, you can only create a new one.
• Therefore, when you append two strings with a command like S = S + " ", Java has to create a new array and copy both S and " " into it. If S is long, this is slow, even though it is a small change.
• Suppose that N = 20,000. By the time you get to the 10,000th node, the string S is already about 40,000 characters long. Therefore executing the statement S = S + A.value.toString() + " "; over the remaining 10,000 elements takes more than 400,000,000 operations.
• For general N, once you have reached the (N/2)th element, the string has length about (N/2) * log_10(N) characters. So executing the final N/2 iterations of the loop takes at least (N/2) * (N/2) * log_10(N) = N^2 * log_10(N) / 4 operations. (It turns out that a more accurate estimate is N^2 * log_10(N) / 2 operations.)
• There is also a problem of garbage. You are discarding 10,000 arrays of more than 40,000 bytes each; thus more than 400 MBytes. This has to be reclaimed by the memory manager. When N=100,000, you end up discarding 20 GBytes. The memory manager has to be called over and over.
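
To make the cost concrete, here is a minimal self-contained sketch of the quadratic pattern. It is illustrative only; the class and field names are assumptions, not the actual code in MyList1.java or TestSlowToString.java:

```java
// A minimal singly linked list with the slow, quadratic toString().
// Illustrative sketch only; MyList1.java / TestSlowToString.java may differ.
class SlowList {
    private static class Node {
        Integer value; Node next;
        Node(Integer value, Node next) { this.value = value; this.next = next; }
    }
    private Node head;

    void addFirst(Integer v) { head = new Node(v, head); }

    // Each execution of s = s + ... copies the whole string built so far,
    // so the total work is about 1 + 2 + ... + C = O(C^2) character copies,
    // where C is the length of the final string.
    @Override
    public String toString() {
        String s = "";
        for (Node a = head; a != null; a = a.next)
            s = s + a.value.toString() + " ";
        return s;
    }

    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);   // e.g. java SlowList 20000
        SlowList list = new SlowList();
        for (int i = n; i >= 1; i--)         // build the list 1, 2, ..., n
            list.addFirst(i);
        System.out.println(list.toString().length());
    }
}
```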

The solution (Azam pointed this out to me) is to use a StringBuffer rather than a String. A StringBuffer is a modifiable array of characters that may be up to half empty; it keeps a count of the number of characters being used in the array.

Here's the code: TestFastToString.java.

When you execute the statement S.append(A.getValue().toString()); it just copies the characters of A.getValue().toString() into the empty space in S, as long as there is room. When you run out of room, the system allocates an array twice as long for S and copies the old value of S into it. That is a slow operation, but if you end up with an array of C characters, the doubling only has to be done log_2(C) times. For N = 100,000, the number of characters C is about 400,000, so the doubling only has to be done log_2(400,000) ≈ 19 times. Other than at the doubling steps, each character is copied only once, into the array holding S. It turns out that the total time is proportional to C = N * log_10(N).
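
As a sketch, the fast version is a drop-in replacement for the toString() in the illustrative class above (again, the actual code in TestFastToString.java may differ):

```java
// Linear-time toString() using a StringBuffer: append() copies only the
// new characters, plus occasional doubling of the underlying array.
@Override
public String toString() {
    StringBuffer s = new StringBuffer();
    for (Node a = head; a != null; a = a.next) {
        s.append(a.value.toString());   // copies only the new characters
        s.append(" ");
    }
    return s.toString();                // one final copy of all C characters
}
```

(A StringBuilder would work identically here; it is the usual choice when no thread safety is needed.)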

### Running time for algorithms

How do you measure how long an algorithm takes to run?

#### Experimentally

Program the algorithm. Put together a collection of typical sample problems. Run the program, and measure the time.

Difficulties: The measured time depends on

• The machine, the programming language, the compiler.
• The size of the problems. Small problems run faster than large problems.
• The choice of problems. What is a "natural" choice of problems?

#### Theoretically

Consider how the computation time grows as a function of the size of the problem. Consider only the general growth pattern; let N be the size of the problem. The running time might be:

• Bounded by a constant.
• Proportional to N: 2*N, 100*N, 1,000,000*N are all considered the same thing.
• Proportional to log(N).
• Proportional to N^2.
• etc.

Difficulties:

• Ignoring the constant may be unrealistic. There are algorithms that are in principle good but are never used in practice because the constant factor is too large.
• Which problems of size N do you consider?
  • Average case. Problems:
    • What do you mean by average? For instance, suppose you are computing whether element x is in linked list L. You can ask for the average time to answer when x is in the list; but what is the probability that x is not in the list at all?
    • It is often very hard to compute.
  • Worst case. Problem: Unduly pessimistic. There are algorithms that run badly in the worst case but are used all the time in practice, because the worst case is very rare.

### Standard measure for algorithm A

For any N, consider all problems of size N. Let f(N) be the time that A takes on the problem of size N on which it runs slowest. Describe the growth rate of f as a function of N.

"Order of magnitude" --- The general growth rate, ignoring constant factors.
"asymptotic" --- as N gets large.
"worst case" --- the worst problem of size N.
analysis.

Advantage: It is the same for all programming languages, all compilers, (practically) all machines (abstracting away finite memory), and (practically) all computational models.

Exceptions: Models with arbitrary amounts of parallelism. Quantum computers.

### Mathematical notation
Assume f(n) and g(n) are functions that are always positive.

f(n) is O(g(n)) means: there is a constant c such that f(n) <= c*g(n) for all n >= 1.

Examples:

If f(n) = 100n^2 and g(n) = 2n^2, then f(n) < c*g(n) for c = 51 or higher, so f(n) is O(g(n)).

If f(n) = 100,000n and g(n) = n^3, then f(n) <= 100,000*g(n), so f(n) is O(g(n)).

If f(n) = n^3 and g(n) = 100,000n, then f(n) is not O(g(n)). Proof: Choose any value of c >= 1, and let n = 1000*c. Then f(n)/g(n) = n^2/100,000 = 1,000,000*c^2/100,000 = 10*c^2 > c, so f(n) > c*g(n).

Rules: If f or g is a sum, ignore all but the fastest-growing term. Ignore any constant factor.
Example: If f(n) = 5n^3 + 2n^2 + 242, just treat it as n^3.

Powers of n go like the exponent.
Example: n^{1/2} is O(n^2) but not vice versa.

Exponentials go like the base.
Example: 2^n is O(3^n) but not vice versa.

Logarithms and powers of logarithms grow more slowly than any power of n.
Example: (log n)^2 is O(n^{1/2}) but not vice versa.

n log n is between n and n^2 (and much closer to n).
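
Putting these rules together, the growth rates above line up in a single chain. Writing f ≺ g to mean "f is O(g) but g is not O(f)":

```latex
1 \prec \log n \prec (\log n)^2 \prec n^{1/2} \prec n \prec n \log n
  \prec n^2 \prec n^3 \prec 2^n \prec 3^n
```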

The notation is used in the form "<running time> is O(<mathematical function>)".

Examples.

• The time to add an item to the front of a linked list is O(1).
• The time to find an item in a linked list is O(n).
• The time to find an item in an ordered array, using binary search, is O(log n); see the sketch below.
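
The O(log n) bound for an ordered array comes from binary search: each probe halves the range that could contain the item, so about log_2(n) probes suffice. A minimal sketch (a hypothetical helper, not from the course code):

```java
// Binary search in a sorted array. The range [lo, hi] halves on each
// iteration, so the loop runs at most about log_2(n) times: O(log n).
static int find(int[] sorted, int x) {
    int lo = 0, hi = sorted.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;            // midpoint, written to avoid overflow
        if (sorted[mid] == x) return mid;
        else if (sorted[mid] < x) lo = mid + 1;  // x can only be to the right
        else hi = mid - 1;                       // x can only be to the left
    }
    return -1;  // not found
}
```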

### What is the "size" of a problem?

Computation theory: The number of bits.

Usual usage: Some reasonable, relevant measure of size. The length of a linked list. The size of a set. The length of a string. etc.

There may be more than one size parameter.
E.g. The time to compute the intersection of two ordered lists L and M is O(|L|+|M|).
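
A sketch of why that bound holds: walk down both ordered sequences in step, always advancing past the smaller current element; every iteration discards at least one element from one input, so there are at most |L| + |M| iterations. (A hypothetical helper using arrays for brevity, not the actual course code:)

```java
import java.util.ArrayList;
import java.util.List;

public class Intersect {
    // Intersection of two sorted arrays in O(|L| + |M|) time: each loop
    // iteration advances i or j (or both), so it runs at most |L| + |M| times.
    static List<Integer> intersect(int[] l, int[] m) {
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.length && j < m.length) {
            if (l[i] < m[j]) i++;                 // l[i] cannot be in m; skip it
            else if (l[i] > m[j]) j++;            // m[j] cannot be in l; skip it
            else { result.add(l[i]); i++; j++; }  // common element
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new int[]{1, 3, 5, 7}, new int[]{3, 4, 5, 6}));  // [3, 5]
    }
}
```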