We are interested in designing good

**algorithms** (a step-by-step procedure for performing
some task in a finite amount of time) and good

**data structures** (a systematic way of organizing and
accessing data).

Unlike v22.102, however, we wish to determine rigorously just
**how good** our algorithms and data structures really
are and whether **significantly better** algorithms are
possible.

We will be primarily concerned with the speed (*time
complexity*) of algorithms.

- Sometimes the *space complexity* is studied.
- The time depends on the input, most often on the size of the input.
- We can run experiments.
  - Must choose *sufficiently many, representative* inputs.
  - Must use identical hardware to compare algorithms.
  - Must *implement* the algorithm.
We will emphasize instead an analytic framework that is independent of input and hardware, and does not require an implementation. The disadvantage is that we can only estimate the time required.

- Often we ignore multiplicative constants and small input values.
- So we consider `f(x)=x^{3}-20x^{2}` equivalent to `g(x)=10x^{3}+10x^{2}`. Huh??
- Easy to see that for, say, `x > 100`, `f(x) < 10 g(x)` and `g(x) < 20 f(x)`.

**Homework:** Unless otherwise stated homework
problems are from the last section in the current book chapter.
R-1.1 and R-1.2.

Pseudo-code is designed for human understanding. Suppress unimportant details and describe some parts in natural language (English in this course).

The key difference from reality is the assumption of a very simple memory model: Accessing any memory element takes a constant amount of time. This ignores caching and paging for example. It also ignores the word-size of a computer (any size number can be stored in one word and accessed in one operation time).

The time required is simply a count of the **primitive
operations** executed. Primitive operations include

- Assign a value to a variable (independent of the size of the value; but the variable must be a scalar).
- Method invocation, i.e., calling a function or subroutine.
- Performing a (simple) arithmetic operation (divide is OK, logarithm is not).
- Indexing into an array (for now just one dimensional; scalar access is free).
- Following an object reference.
- Returning from a method.

Let's start with a simple algorithm (the book does a different simple algorithm, maximum).

    Algorithm innerProduct
    Input: Non-negative integer n and two integer arrays A and B of size n.
    Output: The inner product of the two arrays
    prod ← 0
    for i ← 0 to n-1 do
        prod ← prod + A[i]*B[i]
    return prod

- Line 1 is one op (assigning a value).
- Loop initializing is one op (assigning a value).
- Line 3 is five ops per iteration (mult, add, 2 array refs, assign).
- Line 3 is executed n times; total is 5n.
- Loop incrementation is two ops (an addition and an assignment).
- Loop incrementation is done n times; total is 2n.
- Loop termination test is one op (a comparison i<n) each time.
- Loop termination is done n+1 times (n successes, one failure); total is n+1.
- Return is one op.

The total is thus `1+1+5n+2n+(n+1)+1 = 8n+4`.
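The tally above can be checked mechanically. Here is a Python sketch (mine, not the book's) that runs innerProduct while charging primitive operations exactly as counted above; returning the op count alongside the product is my addition for illustration.

```python
def inner_product(A, B):
    """innerProduct from the notes, instrumented to count primitive operations."""
    ops = 0
    prod = 0; ops += 1                    # Line 1: assign a value
    i = 0; ops += 1                       # loop initialization: assign
    while True:
        ops += 1                          # termination test i < n
        if not (i < len(A)):
            break
        prod = prod + A[i] * B[i]; ops += 5   # mult, add, 2 array refs, assign
        i = i + 1; ops += 2               # incrementation: add, assign
    ops += 1                              # return
    return prod, ops

# op count matches the closed form 8n+4 (here n=3, so 28)
print(inner_product([1, 2, 3], [4, 5, 6]))   # (32, 28)
```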

Let's improve it (a very little bit)

    Algorithm innerProductBetter
    Input: Non-negative integer n and two integer arrays A and B of size n.
    Output: The inner product of the two arrays
    prod ← A[0]*B[0]
    for i ← 1 to n-1 do
        prod ← prod + A[i]*B[i]
    return prod

The cost is `4+1+5(n-1)+2(n-1)+n+1 = 8n-1`

**THIS ALGORITHM IS WRONG!!**

If n=0, we access A[0] and B[0], which do not exist. The original version returns zero as the inner product of empty arrays, which is arguably correct. The best fix is perhaps to change "Non-negative" to "Positive". Let's call this algorithm innerProductBetterFixed.

What about if statements?

    Algorithm countPositives
    Input: Non-negative integer n and an integer array A of size n.
    Output: The number of positive elements in A
    pos ← 0
    for i ← 0 to n-1 do
        if A[i] > 0 then
            pos ← pos + 1
    return pos

- Line 1 is one op.
- Loop initialization is one op
- Loop termination test is n+1 ops
- The if test is performed n times; each is 2 ops
- Return is one op
- The update of pos is 2 ops but is done ??? times.
- What do we do?

Let `U` be the number of updates done.

- The total number of steps is `1+1+(n+1)+2n+1+2U = 4+3n+2U`.
- The **best case** occurs when `U=0` (i.e., no numbers are positive) and gives an answer of 4+3n.
- The **worst case** occurs when `U=n` (i.e., all numbers are positive) and gives an answer of 4+5n.
- The **average case** is much harder to determine, as it requires knowing the input distribution (i.e., how likely positive numbers are) and requires probability theory.
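The dependence on U can be seen by instrumenting countPositives; this Python sketch (my addition, not from the book) charges operations per the tally above and returns the count alongside the answer.

```python
def count_positives_steps(A):
    """countPositives from the notes, instrumented with the step tally above.

    Returns (pos, ops) where ops = 4 + 3n + 2U and U is the number of updates."""
    n = len(A)
    ops = 1 + 1 + (n + 1) + 2 * n + 1   # assign, loop init, tests, if tests, return
    pos = 0
    for x in A:
        if x > 0:
            pos = pos + 1
            ops += 2                     # the update: an addition and an assignment
    return pos, ops

# best case U=0 gives 4+3n; worst case U=n gives 4+5n (here n=5)
assert count_positives_steps([-1] * 5) == (0, 4 + 3 * 5)
assert count_positives_steps([1] * 5) == (5, 4 + 5 * 5)
```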

Consider a recursive version of innerProduct. If the arrays are of size 1, the answer is clearly A[0]B[0]. If n>1, we recursively get the inner product of the first n-1 terms and then add in the last term.

    Algorithm innerProductRecursive
    Input: Positive integer n and two integer arrays A and B of size n.
    Output: The inner product of the two arrays
    if n=1 then
        return A[0]*B[0]
    return innerProductRecursive(n-1,A,B) + A[n-1]*B[n-1]

How many steps does the algorithm require? Let T(n) be the number of steps required.

- If n=1 we do a comparison, two fetches, a product, and a return.
- So T(1)=5.
- If n>1, we do a comparison, a subtraction, a method call, the recursive computation, two fetches, a product, a sum and a return.
- So T(n) = 1 + 1 + 1 + T(n-1) + 2 + 1 + 1 + 1 = T(n-1)+8.
- This is called a **recurrence equation**. In general these are quite difficult to solve in **closed form**, i.e., without T on the right-hand side.
- For this simple recurrence, one can see that T(n)=8n-3 is the solution.
- We will learn more about recurrences later.
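The closed form is easy to check against the recurrence itself; a quick Python sketch (my addition):

```python
def T(n):
    """Step count of innerProductRecursive, straight from the recurrence."""
    if n == 1:
        return 5                 # comparison, two fetches, product, return
    return T(n - 1) + 8          # eight additional ops per recursive level

# the claimed closed-form solution T(n) = 8n - 3
assert all(T(n) == 8 * n - 3 for n in range(1, 100))
```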

**Homework:** I should have given some last time.
It is listed in the notes (search for homework). Also some will be
listed this time. **BUT**, due to the Jewish holiday,
none is officially assigned. You can get started if you wish since
all will eventually be assigned, but none will be collected next class.

Now we are going to be less precise and worry only about approximate answers for large inputs.

Big-Oh Notation

**Definition**: Let f(n) and g(n) be real-valued
functions of a single non-negative integer argument.
We write `f(n)` is `O(g(n))` if there is a positive real
number `c` and a positive integer `n_{0}`
such that `f(n) ≤ c·g(n)` for all `n ≥ n_{0}`.

What does this mean?

For large inputs (`n ≥ n_{0}`), f(n) is at most a constant multiple of g(n).

Examples to do on the board

- `3n-6` is `O(n)`. Some less common ways of saying the same thing follow.
- `3x-6` is `O(x)`.
- If `f(y)=3y-6` and `id(y)=y`, then `f(y)` is `O(id(y))`.
- `3n-6` is `O(2n)`.
- `9n^{4}+12n^{2}+1234` is `O(n^{4})`.
- `innerProduct` is `O(n)`.
- `innerProductBetter` is `O(n)`.
- `innerProductBetterFixed` is `O(n)`.
- `countPositives` is `O(n)`.
- `n+log(n)` is `O(n)`.
- `log(n)+5log(log(n))` is `O(log(n))`.
- `12345^{54321}` is `O(1)`.
- `3/n` is `O(1)`. True but not the best.
- `3/n` is `O(1/n)`. Much better.
- `innerProduct` is `O(100n+log(n)+34.5)`. True, but awful.

A few theorems give us rules that make calculating big-Oh easier.

**Theorem** (arithmetic): Let `d(n)`,
`e(n)`, `f(n)`, and `g(n)` be nonnegative
real-valued functions of a nonnegative integer argument and assume
`d(n)` is `O(f(n))` and `e(n)` is
`O(g(n))`. Then

- `ad(n)` is `O(f(n))` for any nonnegative a.
- `d(n)+e(n)` is `O(f(n)+g(n))`.
- `d(n)e(n)` is `O(f(n)g(n))`.

**Theorem** (transitivity): Let `d(n)`,
`f(n)`, and `g(n)` be nonnegative real-valued functions
of a nonnegative integer argument and assume `d(n)` is
`O(f(n))` and `f(n)` is `O(g(n))`. Then
`d(n)` is `O(g(n))`.

**Theorem** (special functions): (Only `n` varies)

- If `f(n)` is a polynomial of degree d, then `f(n)` is `O(n^{d})`.
- `n^{x}` is `O(a^{n})` for any `x>0` and `a>1`.
- `log(n^{x})` is `O(log(n))` for any `x>0`.
- `(log(n))^{x}` is `O(n^{y})` for any `x>0` and `y>0`.

**Homework:** R-1.19 R-1.20

**Definitions**: (Common names)

- If a function is `O(log(n))` we call it **logarithmic**.
- If a function is `O(n)` we call it **linear**.
- If a function is `O(n^{2})` we call it **quadratic**.
- If a function is `O(n^{k})` with `k≥1`, we call it **polynomial**.
- If a function is `O(a^{n})` with `a>1`, we call it **exponential**.

**Homework:** R-1.10 and R-1.12.

R-1.13: The outer (i) loop is done 2n times. For each outer
iteration the inner loop is done i times. Each inner iteration is a
constant number of steps, so each inner loop is O(i), which is the time
for the ith iteration of the outer loop. So the entire outer loop is
∑O(i) for i from 0 to 2n, which is O(n^{2}).
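The sum can be confirmed by simulating the loop structure and counting inner iterations. This Python sketch is my own; I am assuming the loop bounds described above (outer loop i = 0 to 2n-1, inner loop running i times), since the book's exact pseudocode for R-1.13 is not reproduced here.

```python
def inner_iterations(n):
    """Count inner-loop iterations when the outer loop runs i = 0..2n-1
    and the inner loop runs i times on outer iteration i (assumed bounds)."""
    count = 0
    for i in range(2 * n):
        for _ in range(i):
            count += 1
    return count

# sum of 0..2n-1 is n(2n-1), which is Theta(n^2)
assert all(inner_iterations(n) == n * (2 * n - 1) for n in range(1, 40))
```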

Relatives of the Big-Oh

Recall that f(n) is O(g(n)) if, for large n, f is not much bigger
than g. That is, g is some sort of **upper** bound on f.
How about a definition for the case when g is (in the same sense) a
**lower** bound for f?

**Definition**: Let f(n) and g(n) be real valued
functions of an integer value. Then **f(n) is
Ω(g(n))** if g(n) is O(f(n)).

**Remarks**:

- We pronounce f(n) is Ω(g(n)) as "f(n) is big-Omega of g(n)".
- What the last definition says is that we say f(n) is not much smaller than g(n) if g(n) is not much bigger than f(n), which sounds reasonable to me.
- What if f(n) and g(n) are about equal, i.e., neither is much bigger than the other?

**Definition**: We write
**f(n) is Θ(g(n))** if **both**
f(n) is O(g(n)) **and** f(n) is Ω(g(n)).

**Remarks**
We pronounce f(n) is Θ(g(n)) as "f(n) is big-Theta of g(n)"

Examples to do on the board.

- 2x^{2}+3x is Θ(x^{2}).
- 2x^{3}+3x is **not** Θ(x^{2}).
- 2x^{3}+3x is Ω(x^{2}).
- innerProductRecursive is Θ(n).
- binarySearch is Θ(log(n)). Unofficial for now.
- If f(n) is Θ(g(n)), then f(n) is Ω(g(n)).
- If f(n) is Θ(g(n)), then f(n) is O(g(n)).

**Homework:** R-1.6

Recall that big-Oh captures the idea that for large n, f(n) is not much bigger than g(n). Now we want to capture the idea that, for large n, f(n) is tiny compared to g(n).

If you remember limits from calculus, what we want is that f(n)/g(n)→0 as n→∞. However, the definition we give does not use limits (it essentially has the definition of a limit built in).

**Definition**: Let f(n) and g(n) be real-valued
functions of an integer variable. We say
**f(n) is o(g(n))**
if for **any** c>0,
there is an n_{0} such that
f(n)≤c·g(n) for all n>n_{0}.
This is pronounced as "f(n) is little-oh of g(n)".

**Definition**: Let f(n) and g(n) be real-valued
functions of an integer variable. We say
**f(n) is ω(g(n))** if
g(n) is o(f(n)). This is pronounced as "f(n) is little-omega of g(n)".

**Examples**: log(n) is o(n) and n^{2} is
ω(nlog(n)).

**Homework:** R-1.4. R-1.22

**Remark**: I changed my mind about homework. Too many
to have each one really graded. We now have homeworks and problem
sets as explained here.

If the asymptotic time complexity is bad, say n^{5}, or
horrendous, say 2^{n}, then for large n, the algorithm will
definitely be slow. Indeed for exponential algorithms even modest n's
(say n=50) are hopeless.

Algorithms that are o(n) (i.e., faster than linear, a.k.a. sub-linear), e.g. logarithmic algorithms, are very fast and quite rare. Note that such algorithms do not even inspect most of the input data once. Binary search has this property. When you look up a name in the phone book you do not even glance at a majority of the names present.

Linear algorithms (i.e., Θ(n)) are also fast. Indeed, if the time complexity is O(nlog(n)), we are normally quite happy.

Low degree polynomial algorithms (e.g., Θ(n^{2}),
Θ(n^{3}), Θ(n^{4})) are interesting. They
are certainly not fast but speeding up a computer system by a factor
of 1000 (feasible today with parallelism) means that a
Θ(n^{3}) algorithm can solve a problem 10 times larger.
Many science/engineering problems are in this range.

It really is true that if algorithm A is o(algorithm B) then for
large problems A will take **much** less time than B.

**Definition**: If (the number of operations in)
algorithm A is o(algorithm B), we call A
**asymptotically faster** than B.

**Example**: The following sequence of functions is
ordered by **growth rate**, i.e., each function is
little-oh of the subsequent function.

log(log(n)), log(n), (log(n))^{2}, n^{1/3},
n^{1/2}, n, nlog(n), n^{2}/(log(n)), n^{2},
n^{3}, 2^{n}.

Modest multiplicative constants (as well as immodest additive constants) don't cause too much trouble. But there are algorithms (e.g., the AKS sorting network) in which the multiplicative constants are astronomical and hence, despite their wonderful asymptotic complexity, they are not used in practice.

See table 1.10 on page 20.

**Homework:** R-1.7

This is hard to type in using html. The book is fine and I will write the formulas on the board.

**Definition**:
The sigma notation: ∑f(i) with i going from a to b.

**Theorem**: Assume 0<a≠1.
Then ∑a^{i} for i from 0 to n = (1-a^{n+1})/(1-a).

**Proof**:
Cute trick. Multiply by a and subtract.
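The closed form can be checked numerically (a quick sketch of my own, not a substitute for the multiply-and-subtract proof):

```python
def geometric_sum(a, n):
    """Sum of a^i for i from 0 to n, computed term by term."""
    return sum(a**i for i in range(n + 1))

# integer case: exact agreement with (1 - a^(n+1)) / (1 - a)
assert geometric_sum(2, 10) == (1 - 2**11) // (1 - 2) == 2047
# fractional case: agreement up to floating-point error
assert abs(geometric_sum(0.5, 20) - (1 - 0.5**21) / (1 - 0.5)) < 1e-12
```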

**Theorem**:
∑i from 1 to n = n(n+1)/2.

**Proof**:
Pair the 1 with the n, the 2 with the (n-1), etc. This gives a bunch
of (n+1)s. For n even it is clearly n/2 of them. For odd it is the
same (look at it).
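The pairing argument translates directly into code; this Python sketch (mine) pairs 1 with n, 2 with n-1, and so on, handling the leftover middle element when n is odd.

```python
def sum_by_pairing(n):
    """Sum 1..n by pairing 1 with n, 2 with n-1, ...: each pair sums to n+1."""
    total = 0
    i, j = 1, n
    while i < j:
        total += i + j        # one (n+1) pair
        i, j = i + 1, j - 1
    if i == j:                # middle element, present only when n is odd
        total += i
    return total

assert all(sum_by_pairing(n) == n * (n + 1) // 2 for n in range(1, 200))
```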

Recall that log_{b}a = c means that b^{c}=a.
b is called the base and c is called the exponent.

What is meant by log(n) when we don't specify the base?

- Some people use base 10 by default.
- Mathematicians use base e.
- We will use base 2 (common in computer science)

I assume you know what a^{b} is. (Actually this is not so
obvious. Whatever 2 raised to the square root of 3 means it is
**not** writing 2 down the square root of 3 times and
multiplying.) So you also know that
a^{x+y}=a^{x}a^{y}.

**Theorem**:
Let a, b, and c be positive real numbers. To ease writing, I will use
base 2 often. This is not needed. Any base would do.

- log(ac) = log(a) + log(c)
- log(a/c) = log(a) - log(c)
- log(a^{c}) = c log(a)
- log_{c}(a) = log(a)/log(c): consider a = c^{log_{c}a} and take the log of both sides.
- c^{log(a)} = a^{log(c)}: take the log of both sides.
- (b^{a})^{c} = b^{ac}
- b^{a}b^{c} = b^{a+c}
- b^{a}/b^{c} = b^{a-c}

- log(2nlog(n)) = 1 + log(n) + log(log(n)), which is Θ(log(n)).
- log(log(sqrt(n))) = log(.5log(n)) = log(.5) + log(log(n)) = -1 + log(log(n)), which is Θ(log(log(n))).
- log(2^{n}) = n = 2^{log(n)}

**Homework:** C-1.12

⌊x⌋ is the greatest integer not greater than x. ⌈x⌉ is the least integer not less than x.

⌊5⌋ = ⌈5⌉ = 5

⌊5.2⌋ = 5 and ⌈5.2⌉ = 6

⌊-5.2⌋ = -6 and ⌈-5.2⌉ = -5
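Python's `math.floor` and `math.ceil` implement exactly these definitions, so the examples above can be replayed directly (note the behavior on negatives, where floor rounds away from zero):

```python
import math

assert math.floor(5) == math.ceil(5) == 5
assert math.floor(5.2) == 5 and math.ceil(5.2) == 6
assert math.floor(-5.2) == -6 and math.ceil(-5.2) == -5   # negatives: floor goes down
```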

To prove the claim that **there is an n** greater than
1000, we merely have to note that 1001 is greater than 1000.

To refute the claim that **all n are** greater than
1000, we merely have to note that 999 is not greater than 1000.

"P implies Q" is the same as "not Q implies not P". So to show that no prime is a square we note that "prime implies not square" is the same as "not (not square) implies not prime", i.e. "square implies not prime", which is obvious.

Assume what you want to prove is **false** and derive
a contradiction.

**Theorem**:
There are an infinite number of primes.

**Proof**:
Assume not. Let the primes be p_{1} up to p_{k} and
consider the number
A=p_{1}p_{2}…p_{k}+1.
A has remainder 1 when divided by any p_{i} so cannot have any
p_{i} as a factor. Factor A into primes. None can be
p_{i} (A may or may not be prime). But we assumed that all
the primes were p_{i}. Contradiction. Hence our assumption
that we could list all the primes was false.

The goal is to show the truth of some statement for all integers n≥1. It is enough to show two things.

- The statement is true for n=1.
- **IF** the statement is true for all k<n, then it is true for n.

**Theorem**:
A complete binary tree of height h has 2^{h}-1 nodes.

**Proof**:
We write NN(h) to mean the number of nodes in a complete binary tree
of height h.
A complete binary tree of height 1 is just a root so NN(1)=1 and
2^{1}-1 = 1.
Now we assume NN(k)=2^{k}-1 nodes for all k<h and consider a complete
binary tree of height h. It is just a complete binary tree of height
h-1 with new leaf nodes added. How many new leaves?

Ans. 2^{h-1} (this could be proved by induction as a lemma, but
is fairly clear without induction).

Hence NN(h)=NN(h-1)+2^{h-1} =
(2^{h-1}-1)+2^{h-1} =
2(2^{h-1})-1=2^{h}-1.

**Homework:** R-1.9

Very similar to induction. Assume we have a loop with controlling
variable i. For example a "`for i←0 to n-1`" loop. We then
associate with the loop a statement S(j) depending on j such that

- S(0) is true (just) before the loop begins.
- **IF** S(j-1) holds before iteration j begins, then S(j) will hold when iteration j ends.

I favor having array and loop indexes starting at zero. However, here it causes us some grief. We must remember that iteration j occurs when i=j-1.

**Example**:
Recall the countPositives algorithm

    Algorithm countPositives
    Input: Non-negative integer n and an integer array A of size n.
    Output: The number of positive elements in A
    pos ← 0
    for i ← 0 to n-1 do
        if A[i] > 0 then
            pos ← pos + 1
    return pos

Let S(j) be "pos equals the number of positive values in the first j elements of A".

Just before the loop starts S(0) is true
**vacuously**. Indeed that is the purpose of the first
statement in the algorithm.

Assume S(j-1) is true before iteration j, then iteration j (i.e., i=j-1) checks A[j-1] which is the jth element and updates pos accordingly. Hence S(j) is true after iteration j finishes.

Hence we conclude that S(n) is true when iteration n concludes, i.e. when the loop terminates. Thus pos is the correct value to return.

Skipped for now.

We trivially improved innerProduct (same asymptotic complexity before and after). Now we will see a real improvement. For simplicity I do a slightly simpler algorithm, prefix sums.

    Algorithm partialSumsSlow
    Input: Positive integer n and a real array A of size n
    Output: A real array B of size n with B[i]=A[0]+…+A[i]
    for i ← 0 to n-1 do
        s ← 0
        for j ← 0 to i do
            s ← s + A[j]
        B[i] ← s
    return B

The update of s is performed 1+2+…+n times. Hence the
running time is Ω(1+2+…+n)=Ω(n^{2}).
In fact it is easy to see that the time is Θ(n^{2}).

    Algorithm partialSumsFast
    Input: Positive integer n and a real array A of size n
    Output: A real array B of size n with B[i]=A[0]+…+A[i]
    s ← 0
    for i ← 0 to n-1 do
        s ← s + A[i]
        B[i] ← s
    return B

We just have a single loop and each statement inside is O(1), so the algorithm is O(n) (in fact Θ(n)).
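Both versions translate directly into Python; a sketch of my own, mainly useful for confirming that the fast version computes the same array as the slow one.

```python
def partial_sums_slow(A):
    """Theta(n^2): recompute each prefix sum from scratch."""
    B = []
    for i in range(len(A)):
        s = 0
        for j in range(i + 1):
            s += A[j]
        B.append(s)
    return B

def partial_sums_fast(A):
    """Theta(n): carry the running sum s forward."""
    B, s = [], 0
    for x in A:
        s += x
        B.append(s)
    return B

A = [3, 1, 4, 1, 5]
assert partial_sums_slow(A) == partial_sums_fast(A) == [3, 4, 8, 9, 14]
```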

**Homework:** Write partialSumsFastNoTemps, which is also
Θ(n) time but avoids the use of s (it still uses i so my name is not
great).

Often we have a data structure supporting a number of operations that will be applied many times. For some data structures, the worst-case running time of the operations may not give a good estimate of how long a sequence of operations will take.

If we divide the running time of the sequence by the number of
operations performed, we get the average time for each operation in the
sequence, which is called the **amortized running
time**.

Why amortized?

Because the cost of the occasional expensive application is
amortized over the numerous cheap applications (I think).

**Example**: (From the book.) The clearable table.
This is essentially an array. The table is initially empty (i.e., has
size zero). We want to support three operations.

- Add(e): Add a new entry to the table at the end (extending its size).
- Get(i): Return the ith entry in the table.
- Clear(): Remove all the entries by setting each entry to zero (for security) and setting the size to zero.

The obvious implementation is to use a large array A and an integer s indicating the current size of A. More precisely A is (always) of size N (large) and s indicates the extent of A that is currently in use.

We are ignoring a number of error cases.

We start with a size zero table and assume we perform n (legal) operations. Question: What is the worst-case running time for all n operations?

One possibility is that the sequence consists of n-1 add(e)
operations followed by one Clear(). The Clear() takes Θ(n),
which is the worst-case time for any operation (assuming n operations
in total). Since there are n operations and the worst-case is
Θ(n) for one of them, we might think that the worst-case
sequence would take Θ(n^{2}).

But this is wrong.

It is easy to see that Add(e) and Get(i) are Θ(1).

The **total** time for all the Clear() operations is
O(n) since in total O(n) entries were cleared (since at most n entries
were added).

Hence, the amortized time for each operation in the clearable ADT (abstract data type) is O(1), in fact Θ(1).

Overcharge for cheap operations and undercharge expensive so that the excess charged for the cheap (the profit) covers the undercharge (the loss). This is called in accounting an amortization schedule.

Assume the get(i) and add(e) really cost one "cyber-dollar", i.e., there is a constant K so that they each take fewer than K primitive operations, and we let a "cyber-dollar" be K primitive operations. Similarly, assume that clear() costs P cyber-dollars when the table has P elements in it.

We charge 2 cyber-dollars for every operation. So we have a profit of 1 on each add(e), and we see that the profit is enough to cover the next clear() since, if we clear P entries, we had P add(e)s.

All operations cost 2 cyber-dollars so n operations cost 2n. Since we have just seen that the real cost is no more than the cyber-dollars spent, the total cost is Θ(n) and the amortized cost is Θ(1).
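The accounting argument can be exercised concretely. This Python sketch (my own; the fixed backing size N and the op counter are illustrative assumptions) implements the clearable table, counts cyber-dollars actually spent, and confirms that a charge of 2 per operation covers the real cost.

```python
class ClearableTable:
    """Clearable table from the notes; ops counts cyber-dollars actually spent."""
    def __init__(self, N=1000):
        self.A = [0] * N      # fixed backing array of size N
        self.s = 0            # current size: A[0..s-1] is in use
        self.ops = 0

    def add(self, e):
        self.A[self.s] = e
        self.s += 1
        self.ops += 1         # one cyber-dollar per add

    def get(self, i):
        self.ops += 1         # one cyber-dollar per get
        return self.A[i]

    def clear(self):
        for i in range(self.s):
            self.A[i] = 0     # zero for security, per the spec
            self.ops += 1     # P cyber-dollars to clear P entries
        self.s = 0

# a sequence of bursts of adds, each followed by a clear
t, n = ClearableTable(), 0
for burst in range(10):
    for e in range(burst):
        t.add(e); n += 1
    t.clear(); n += 1

# charging 2 cyber-dollars per operation covers the real cost
assert t.ops <= 2 * n
```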

Very similar to the accounting method. Instead of banking money, you increase the potential energy. I don't believe we will use this method so we are skipping it.

Want to let the size of an array grow dynamically (i.e., during execution). The implementation is quite simple. Copy the old array into a new one twice the size. Specifically, on an array overflow instead of signaling an error perform the following steps (assume the array is A and the size is N)

- Allocate a new array B of size 2N
- For i←0 to N-1 do B[i]←A[i]
- Make A refer to B (this is A=B in C and java).
- Deallocate the old A (automatic in java; error prone in C)

The cost of this growing operation is Θ(N).

**Theorem**:
Given an extendable array A that is initially empty and of size N, the
amortized time to perform n add(e) operations is Θ(1).

**Proof**:
Assume one cyber dollar is enough for an add w/o the grow and that N
cyber-dollars are enough to grow from N to 2N.
Charge 2 cyber dollars for each add; so a profit of 1 for each add w/o
growing.
When you must do a grow, you had N adds so have N dollars banked.
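The doubling strategy and the amortized bound can both be demonstrated in a few lines. A Python sketch of my own (the `copies` counter is instrumentation, not part of the data structure): over n adds the total copying stays below 2n, so each add is amortized Θ(1).

```python
class ExtendableArray:
    """Array that doubles on overflow; copies counts elements moved while growing."""
    def __init__(self):
        self.A, self.N, self.s, self.copies = [None], 1, 0, 0

    def add(self, e):
        if self.s == self.N:              # overflow: grow from N to 2N
            B = [None] * (2 * self.N)
            for i in range(self.N):
                B[i] = self.A[i]          # Theta(N) copying
                self.copies += 1
            self.A, self.N = B, 2 * self.N
        self.A[self.s] = e
        self.s += 1

arr = ExtendableArray()
n = 1000
for e in range(n):
    arr.add(e)

# total copying over n adds is below 2n (here 1+2+4+...+512 = 1023)
assert arr.copies < 2 * n and arr.A[999] == 999
```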

Book is quite clear. I have little to add.

You might want to know

- Average running time.
- Compare two algorithms for speed.
- Determine the running time dependence of parameters of the algorithm.
- For algorithms that generate approximations, test how close they come to the correct value.

- Memory references (increasingly important--unofficial hardware comment).
- Comparisons (for sorting, searching, etc).
- Arithmetic ops (for numerical problems).

Assume you believe the running time t(n) of an algorithm is
Θ(n^{d}) for some specific d and you want to both
verify your assumption and find the multiplicative constant.

Make a plot of (n, t(n)/n^{d}). If you are right the
points should tend toward a horizontal line and the height of this
line is the multiplicative constant.

**Homework:** R-1.29

What if you believe it is polynomial but don't have a guess for d?

Ans: Use ...

Plot (n, t(n)) on log log paper. If t(n) is
Θ(n^{d}), say t(n) approaches bn^{d}, then
log(t(n)) approaches log(b)+d(log(n)).

So when you plot (log(n), log(t(n))) (i.e., when you use log log paper), you will see the points approach (for large n) a straight line whose slope is the exponent d and whose y intercept is log(b), the log of the multiplicative constant.
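Instead of reading the slope off graph paper, it can be fit numerically. This Python sketch (my own; the least-squares fit and the synthetic timings are illustrative assumptions, not a measurement of a real algorithm) recovers d from (log n, log t(n)) pairs.

```python
import math

def estimate_exponent(ns, ts):
    """Least-squares slope of (log n, log t(n)).

    If t(n) approaches b*n^d, the slope approaches d and the
    y intercept approaches log(b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# synthetic timings t(n) = 3n^2: the fitted slope recovers d = 2
ns = [10, 100, 1000, 10000]
assert abs(estimate_exponent(ns, [3 * n**2 for n in ns]) - 2.0) < 1e-9
```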

**Homework:** R-1.30