To illustrate, run the code at
TestSlowToString.java.

This uses the definition of singly linked lists in
MyList1.java.

The code here:

- Reads a number N from the command line. (In a command-line oriented
system, type "java TestSlowToString 1000". This binds the argument
`args` in `main` to an array of `String`s of length 1, where `args[0] = "1000"`. To do this in a JDK or IDE, consult the instructions for "command line arguments".)
- Creates a linked list with values from 1 to N.
- Adds up the elements in the linked list, for comparison.
- Constructs the string representation of the linked list, and prints out the number of characters.

If you run this with argument 10000, the sum prints out immediately, but there is a noticeable delay before toString() completes. With argument 20000 the delay is several seconds, and with argument 100,000 it takes a long time. Meanwhile, the sum continues to print out immediately (as you would expect; adding 100,000 numbers on a 1 GHz machine takes about 100 microseconds). What is the problem with toString()?

To answer that we have to look under the hood, as they say.

- A String in Java is an immutable object. You cannot change its value, you can only create a new one.
- Therefore, when you append two strings with a command like `S = S + " "`, Java has to create a new array and copy both `S` and `" "` into it. If `S` is long, this is slow, even though the change is small.
- Suppose that `N = 20,000`. By the time you get to the 10,000th node, the string S is already about 40,000 characters long. Therefore executing the statement `S = S + A.value.toString() + " ";` over the remaining 10,000 elements takes more than 400,000,000 operations.
- For general `N`, once you have reached the `N/2`-th element, the string has length about `(N/2) * log_10(N)` characters. So executing the final `N/2` iterations of the loop takes at least `(N/2) * (N/2) * log_10(N) = N^2 * log_10(N) / 4` operations. (It turns out that a more accurate estimate is `N^2 * log_10(N) / 2` operations.)
- There is also a problem of garbage. You are discarding 10,000 arrays of more than 40,000 bytes each, thus more than 400 MBytes in total. This has to be reclaimed by the memory manager. When N = 100,000, you end up discarding 20 GBytes. The memory manager has to be called over and over.
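As a sketch of the pattern (this is a minimal stand-in, not the actual MyList1.java or TestSlowToString.java code), the quadratic behavior comes from a loop like this:

```java
// Minimal sketch of the slow pattern: building the string representation
// of a linked list by repeated String concatenation.
public class SlowConcat {

    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Build a linked list holding 1..n.
    static Node build(int n) {
        Node head = null;
        for (int i = n; i >= 1; i--) head = new Node(i, head);
        return head;
    }

    // Each concatenation copies the entire string built so far into a
    // fresh array, so the total work over N nodes is quadratic in N.
    static String slowToString(Node head) {
        String s = "";
        for (Node a = head; a != null; a = a.next) {
            s = s + a.value + " ";   // allocates a brand-new String every time
        }
        return s;
    }

    public static void main(String[] args) {
        int n = (args.length > 0) ? Integer.parseInt(args[0]) : 10;
        System.out.println(slowToString(build(n)).length() + " characters");
    }
}
```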

The solution (Azam pointed this out to me) is to use a `StringBuffer`
rather than a `String`. A `StringBuffer`
is an array of characters that may be up to
half empty and is modifiable. It keeps a count of the number of characters
being used in the array.

Here's the code:
TestFastToString.java.

When you execute the statement
`S.append(A.getValue().toString());`, it just copies
`A.getValue().toString()` into the empty space in `S`,
as long as there is room. When you run out of room, the system allocates
an array twice as long for `S` and copies the old value of
`S` into it. That is a slow operation, but if you end
up with an array of `C` characters, the doubling only has to be
done `log_2(C)` times, and the total work of all the copying is proportional to `C`.
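A sketch of the fast version (again a stand-in, not the actual TestFastToString.java; this sketch uses `StringBuilder`, the unsynchronized counterpart of `StringBuffer`, which grows the same way):

```java
// Sketch of the fast pattern: append into a growable character buffer.
// StringBuilder, like StringBuffer, keeps spare capacity and grows the
// underlying array geometrically when it fills up.
public class FastConcat {

    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    static Node build(int n) {
        Node head = null;
        for (int i = n; i >= 1; i--) head = new Node(i, head);
        return head;
    }

    // Each append copies only the new characters; the occasional
    // reallocation happens only O(log C) times for a final length of C,
    // so the whole loop is O(total characters).
    static String fastToString(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node a = head; a != null; a = a.next) {
            sb.append(a.value).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = (args.length > 0) ? Integer.parseInt(args[0]) : 100000;
        System.out.println(fastToString(build(n)).length() + " characters");
    }
}
```

With this version, the argument 100,000 that made the slow version crawl finishes essentially instantly.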

**Difficulties:** The measured running time depends on:

- The machine, the programming language, the compiler.
- The size of the problems. Small problems run faster than large problems.
- The choice of problems. What is a "natural" choice of problems?

- Consider only the general growth pattern. Let N be the size of the problem.
- Bounded by a constant.
- Proportional to N: 2*N, 100*N, 1,000,000*N are all considered the same thing.
- Proportional to log(N).
- Proportional to N^{2}.
- Etc.

- Which problems of size N?
- Average case. Problems:
  - What do you mean by average? For instance, suppose you are computing whether element x is in linked list L. You can ask what the average time to answer is if x *is* in the list. But what is the probability that x is *not* in the list?
  - Often very hard to compute.
- Worst case. Problem: Unduly pessimistic. There are algorithms that run badly in the worst case but are often used in practice, because the worst case is very rare.

"Order of magnitude" --- The general growth rate, ignoring constant factors.

"asymptotic" --- as N gets large.

"worst case" --- the worst problem of size N.

Putting these together: asymptotic worst-case order-of-magnitude analysis.

Advantage: Same for all programming languages, all compilers, (practically) all machines (abstracting away finite memory), (practically) all computational models.

Exceptions: Models with arbitrary amounts of parallelism. Quantum computers.

Mathematical notation.

Assume f(n) and g(n) are functions that are always positive. We say that f(n) is O(g(n)) if there is a constant c such that f(n) <= c*g(n) for all sufficiently large n.

if f(n) = 100n^{2}
and g(n) = 2n^{2}
then f(n) < c*g(n) for c=51 or higher, so f(n) is O(g(n)).

if f(n) = 100,000n and g(n)=n^{3} then f(n) <= 100,000*g(n) so f(n) is O(g(n)).

if f(n) = n^{3} and g(n) = 100,000n then f(n) is not O(g(n)).
Proof: Choose any value of c. Let n = 1000*c. Then f(n)/g(n) =
n^{3} / 100,000n = n^{2}/100,000 = 10c^{2} > c, so f(n) > c*g(n).
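A quick numeric sanity check of this claim (a sketch; the witness n = 1000*c is just one convenient choice):

```java
// Numeric check that f(n) = n^3 is not O(g(n)) for g(n) = 100,000*n:
// for any proposed constant c, picking n = 1000*c gives
// f(n)/g(n) = n^2/100,000 = 10*c^2, which exceeds c.
public class NotBigOCheck {

    static boolean fExceedsCTimesG(long c) {
        long n = 1000 * c;              // witness value of n
        double f = (double) n * n * n;  // f(n) = n^3
        double g = 100000.0 * n;        // g(n) = 100,000 n
        return f > c * g;
    }

    public static void main(String[] args) {
        for (long c = 1; c <= 1_000_000; c *= 100) {
            System.out.println("c = " + c + ": f(1000c) > c*g(1000c) is "
                               + fExceedsCTimesG(c));
        }
    }
}
```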

Rules: If f or g is a sum ignore all but the fastest growing term. Ignore
any constant factor.

Example: If f(n)=5n^{3} + 2n^{2} + 242, just treat it as
n^{3}

Powers of n go like the exponent.

Example: n^{1/2} is O(n^{2}) but not vice versa.

Exponentials go like the base.

Example: 2^{n} is O(3^{n}) but not vice versa.

Logarithms and powers of logarithms grow more slowly than any power of n.
Example: log(n)^{2} is O(n^{1/2}) but not vice versa.

n log n is between n and n^{2} (and much closer to n).
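A small computed table makes these orderings concrete (base 2 is an arbitrary choice, since logarithms in different bases differ only by a constant factor):

```java
// Print log2(n), sqrt(n), n, n*log2(n), and n^2 for a few values of n,
// illustrating the orderings above: log powers < roots < n < n log n < n^2.
public class GrowthTable {

    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        System.out.printf("%10s %10s %10s %14s %16s%n",
                          "n", "log2 n", "sqrt n", "n log2 n", "n^2");
        for (long n : new long[] {10, 1000, 1_000_000}) {
            System.out.printf("%10d %10.1f %10.1f %14.0f %16.0f%n",
                              n, log2(n), Math.sqrt(n),
                              n * log2(n), (double) n * n);
        }
    }
}
```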

Used in the form "< Running time > is O( < mathematical function > )"

Examples.

- The time to add an item to the front of a linked list is O(1).
- The time to find an item in a linked list is O(N).
- The time to find an item in an ordered array (by binary search) is O(log n).
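For the ordered-array case, a binary search that counts its probes shows the O(log n) behavior directly (a sketch; the probe counter is added here purely for illustration):

```java
// Binary search over a sorted array, counting probes to show that the
// number of elements examined is at most about log2(n) + 1.
public class BinarySearchCount {

    static int probes;   // number of elements examined in the last search

    static int find(int[] a, int x) {
        probes = 0;
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;   // fine here; a.length stays small
            probes++;
            if (a[mid] == x) return mid;
            else if (a[mid] < x) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;                     // not found
    }

    public static void main(String[] args) {
        int n = 1 << 20;                          // about a million elements
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 2 * i; // sorted
        find(a, 2 * (n - 1));
        System.out.println("probes for n = " + n + ": " + probes);
    }
}
```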

Usual usage: Some reasonable, relevant measure of size. The length of a linked list. The size of a set. The length of a string. etc.

There may be more than one size parameter.

E.g. The time to compute the intersection of two ordered lists L and M
is O(|L|+|M|).
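A sketch of that computation: walk both sorted sequences once, advancing whichever side holds the smaller element, so every iteration consumes at least one element of L or M.

```java
import java.util.ArrayList;
import java.util.List;

// Intersection of two sorted arrays in a single pass: each loop
// iteration advances i or j (or both), so the time is O(|L| + |M|).
public class SortedIntersection {

    static List<Integer> intersect(int[] l, int[] m) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.length && j < m.length) {
            if (l[i] < m[j]) i++;              // l[i] cannot appear in m
            else if (l[i] > m[j]) j++;         // m[j] cannot appear in l
            else { out.add(l[i]); i++; j++; }  // common element
        }
        return out;
    }

    public static void main(String[] args) {
        int[] l = {1, 3, 5, 7, 9};
        int[] m = {3, 4, 5, 6, 9};
        System.out.println(intersect(l, m));   // prints [3, 5, 9]
    }
}
```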