Monday and Wednesday 11-12:15

CIWW 102

I start at 0 so that when we get to chapter 1, the numbering will agree with the text.

- gottlieb@nyu.edu (best method)
- http://allan.ultra.nyu.edu/~gottlieb (**two** el's in allan)
- 715 Broadway, Room 712

There is a web site for the course. You can find it from my home page.

- You can find these lecture notes on the course home page. Please let me know if you can't find them.
- I mirror my home page on the CS web site.
- I also mirror the course pages on the CS web site. **But**, the official site is allan.ultra.nyu.edu. It is the one I personally manage.
- The notes will be updated as bugs are found.
- I will also produce a separate page for each lecture after the lecture is given. These individual pages might not get updated as quickly as the large page.

The course text is Goodrich and Tamassia: "Algorithm Design: Foundations, Analysis, and Internet Examples".

- Available in bookstore.
- I expect to cover most of part I and some of part II

- You are entitled to a computer account; please get it asap.
- Sign up for the Mailman mailing list for the course. http://www.cs.nyu.edu/mailman/listinfo/v22_0310_002_fa03
- If you want to send mail just to me, use gottlieb@nyu.edu not the mailing list.
- Questions about the lectures or homeworks should go to the mailing list. You may answer questions posed on the list as well.
- I will respond to all questions; if another student has answered the question before I get to it, I will confirm if the answer given is correct.

The major components of the grade will be the midterm, the final, and problem sets. I will post (soon) the weights for each.

We will have a midterm. As the time approaches we will vote in class for the exact date. Please do not schedule any trips during days when the class meets until the midterm date is scheduled.

If you had me for 202, you know that in systems courses I also
assign labs. Basic algorithms is **not** a systems
course; there are no labs. There are homeworks and problem sets,
very few if any of which will require the computer. There is a
distinction between homeworks and problem sets.

Problem sets are

- *Required*.
- Due several lectures later (date given on assignment).
- Graded and form part of your final grade.
- Penalized for lateness, up to one week.

Homeworks are

- *Optional*.
- Due the beginning of *next* lecture.
- Not accepted late.
- Mostly from the book.
- Collected and returned.
- Able to help, but not hurt, your grade.

I run a recitation session on Tuesdays from 2-3:15. I believe there is another recitation section. You need attend only one.

Good methods for obtaining help include

- Asking me during office hours (see web page for my hours).
- Asking the mailing list.
- Asking another student, but ...

**Your homeworks and problem sets must be your own**.

I use the upper left board for homework assignments and announcements. I should never erase that board. Viewed as a file it is group readable (the group is those in the room), appendable by just me, and (re-)writable by no one. If you see me start to erase an announcement, let me know.

It is university policy that a student's request for an incomplete be granted only in exceptional circumstances and only if applied for in advance. Naturally, the application must be before the final exam.

We are interested in designing good **algorithms** (a step-by-step
procedure for performing some task in a finite amount of time) and
good **data structures** (a systematic way of organizing and
accessing data).

Unlike v22.102, however, we wish to determine rigorously just
**how good** our algorithms and data structures really
are and whether **significantly better** algorithms are
possible.

We will be primarily concerned with the speed (*time complexity*)
of algorithms.

- Sometimes the *space complexity* is studied.
- The time depends on the input, often on just the amount of input. For example, the time required to sum N numbers depends on N but not on the numbers themselves (we assume all values fit in one "word").
- One could run experiments in order to determine the time complexity.
  - Must choose *sufficiently many, representative* inputs.
  - Must use identical hardware to compare algorithms.
  - Must *implement* the algorithm.
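As an illustration of the experimental approach just described, here is a minimal Python sketch that times the summing of N numbers. The function name `time_sum`, the `trials` parameter, and the choice of input are assumptions made for this example. The measured times depend on the machine, the inputs chosen, and the implementation.

```python
import time


def time_sum(n, trials=5):
    """Return the best wall-clock time (in seconds) over `trials` runs
    of summing the n numbers 0, 1, ..., n-1."""
    data = list(range(n))
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        total = 0
        for x in data:
            total += x
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


if __name__ == "__main__":
    # The absolute numbers printed here vary from machine to machine.
    for n in (1_000, 10_000, 100_000):
        print(n, time_sum(n))
```

Note that the sketch had to commit to a language, an input, and a machine before producing a single number.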

We will emphasize instead an analytic framework that is independent of input and hardware, and does not require an implementation. The disadvantage is that we can only estimate the time required.

- Often we ignore multiplicative constants and small input values.
- So we consider $f(x)=x^3-20x^2$ equivalent to $g(x)=10x^3+10x^2$.
- Huh??
- Easy to see that for, say, $x > 100$, $f(x) < 10\,g(x)$ and $g(x) < 20\,f(x)$.
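A quick numeric check may make the idea less mysterious. The Python sketch below (the helper name `ratio` and the sample points are chosen here for illustration) computes g(x)/f(x) for several values of x > 100. The ratio equals (10x+10)/(x-20), so it stays between the fixed constants 10 and 13 for all such x. That is the sense in which the two functions are treated as equivalent.

```python
def f(x):
    return x**3 - 20 * x**2


def g(x):
    return 10 * x**3 + 10 * x**2


def ratio(x):
    """g(x)/f(x); only meaningful for x > 20, where f(x) is positive."""
    return g(x) / f(x)


if __name__ == "__main__":
    for x in (101, 1_000, 10_000, 1_000_000):
        print(x, ratio(x))
```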

**Homework:** R-1.1 and R-1.2
(Unless otherwise stated, homework
problems are from the last section in the current book chapter.)

Pseudo-code is designed for human understanding. It suppresses unimportant details and describes some parts in natural language (English in this course).

The key difference between the RAM model and a real computer is the assumption of a very simple memory model: Accessing any memory element takes a constant amount of time. This ignores caching and paging for example. (It also assumes the word-size of a computer is large enough to hold any address, which is generally valid for modern-day computers, but was not always the case.)

The time required is simply a count of the **primitive
operations** executed. There are several different possible
sets of primitive operations. For this course we will use

- Assigning a value to a variable (independent of the size of the value; but the variable must be a scalar).
- Method invocation, i.e., calling a function or subroutine.
- Performing a (simple) arithmetic operation (divide is OK, logarithm is not).
- Indexing into an array (for now just one dimensional; scalar access is free).
- Following an object reference.
- Returning from a method.

Let's start with a simple algorithm (the book does a different simple algorithm, maximum).

    Algorithm innerProduct
        Input: Non-negative integer n and two integer arrays A and B of size n.
        Output: The inner product of the two arrays.

    prod ← 0
    for i ← 0 to n-1 do
        prod ← prod + A[i]*B[i]
    return prod

- Line 1 is one op (assigning a value).
- Loop initialization is one op (assigning a value).
- Line 3 is five ops per iteration (mult, add, 2 array refs, assign).
- Line 3 is executed n times; total is 5n.
- Loop incrementation is two ops (an addition and an assignment).
- Loop incrementation is done n times; total is 2n.
- Loop termination test is one op (a comparison i<n).
- Loop termination is done n+1 times (n successes, one failure); total is n+1.
- Return is one op.

The total is thus `1+1+5n+2n+(n+1)+1 = 8n+4`.
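The count can be checked mechanically. The following Python sketch (the function name `inner_product_ops` and the `ops` counter are illustrative, not part of the course material) writes the loop out explicitly and charges each primitive operation according to the list above.

```python
def inner_product_ops(A, B):
    """Return (inner product, primitive-op count) for arrays A and B of
    equal size, using the per-line charges from the analysis above."""
    n = len(A)
    ops = 0

    prod = 0
    ops += 1              # line 1: one assignment

    i = 0
    ops += 1              # loop initialization: one assignment
    while True:
        ops += 1          # termination test i < n (runs n+1 times)
        if not i < n:
            break
        prod = prod + A[i] * B[i]
        ops += 5          # two array refs, multiply, add, assign
        i = i + 1
        ops += 2          # incrementation: add, assign

    ops += 1              # return
    return prod, ops
```

For every n the returned count should agree with 8n+4, including n=0, where only the four fixed operations are charged.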

**Homework:** Perform a similar analysis for
the following algorithm

    Algorithm tripleProduct
        Input: Non-negative integer n and three integer arrays A, B, and C each of size n.
        Output: A[0]*B[0]*C[0] + ... + A[n-1]*B[n-1]*C[n-1]

    prod ← 0
    for i ← 0 to n-1 do
        prod ← prod + A[i]*B[i]*C[i]
    return prod

Let's speed up innerProduct (a very little bit).

    Algorithm innerProductBetter
        Input: Non-negative integer n and two integer arrays A and B of size n.
        Output: The inner product of the two arrays

    prod ← A[0]*B[0]
    for i ← 1 to n-1 do
        prod ← prod + A[i]*B[i]
    return prod

The cost is `4+1+5(n-1)+2(n-1)+n+1 = 8n-1`.

**THIS ALGORITHM IS WRONG!!**

If n=0, we access A[0] and B[0], which do not exist. The original
version returns zero as the inner product of empty arrays, which is
arguably correct. The best fix is perhaps to change "Non-negative"
to "Positive" in the Input specification.
Let's call this algorithm innerProductBetterFixed.
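The failure is easy to reproduce. In the Python sketch below, `inner_product_better` is a direct transcription of the pseudocode. The explicit check in `inner_product_better_fixed` is an assumption added for this example, since the fix described above only changes the stated precondition and does not add a runtime test.

```python
def inner_product_better(A, B):
    """innerProductBetter as written: seed prod with the first term."""
    n = len(A)
    prod = A[0] * B[0]            # raises IndexError when n == 0
    for i in range(1, n):
        prod = prod + A[i] * B[i]
    return prod


def inner_product_better_fixed(A, B):
    """Same algorithm with the precondition 'n is positive' enforced."""
    if len(A) == 0:
        raise ValueError("n must be positive")
    return inner_product_better(A, B)
```

Calling `inner_product_better([], [])` fails on the very first line, whereas the original innerProduct would simply return 0.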

What about if statements?

    Algorithm countPositives
        Input: Non-negative integer n and an integer array A of size n.
        Output: The number of positive elements in A

    pos ← 0
    for i ← 0 to n-1 do
        if A[i] > 0 then
            pos ← pos + 1
    return pos

- Line 1 is one op.
- Loop initialization is one op.
- Loop termination test is n+1 ops.
- The if test is performed n times; each is 2 ops.
- Return is one op.
- The update of pos is 2 ops but is done ??? times.
- What do we do?

Let `U` be the number of updates done.

- The total number of steps is `1+1+(n+1)+2n+1+2U = 4+3n+2U`.
- The **best case** (i.e., lowest complexity) occurs when `U=0` (i.e., no numbers are positive) and gives a complexity of 4+3n.
- The **worst case** occurs when `U=n` (i.e., all numbers are positive) and gives a complexity of 4+5n.
- Determining the **average case** result is much harder as it requires knowing the input distribution (i.e., are positive numbers likely) and requires probability theory.
- We will primarily study worst case complexity.
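To see how the step count varies with the input, here is a small Python sketch. The helper `steps` is hypothetical: it plugs the number of updates U into the formula 4+3n+2U from the text and does not count operations itself.

```python
def count_positives(A):
    """Return the number of positive elements in A."""
    pos = 0
    for x in A:
        if x > 0:
            pos += 1
    return pos


def steps(A):
    """Step count from the formula 4 + 3n + 2U, where U is the number of
    times pos is updated. U equals the final value of pos."""
    n = len(A)
    U = count_positives(A)
    return 4 + 3 * n + 2 * U
```

For n=3, an input with no positive entries such as `[-1, -2, 0]` gives the best case 4+3n, an all-positive input such as `[1, 2, 3]` gives the worst case 4+5n, and a mixed input falls in between.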

Consider a recursive version of innerProduct. If the arrays are of size 1, the answer is clearly A[0]B[0]. If n>1, we recursively get the inner product of the first n-1 terms and then add in the last term.

    Algorithm innerProductRecursive
        Input: Positive integer n and two integer arrays A and B of size n.
        Output: The inner product of the two arrays

    if n=1 then
        return A[0]B[0]
    return innerProductRecursive(n-1,A,B) + A[n-1]B[n-1]

How many steps does the algorithm require? Let T(n) be the number of steps required.

- If n=1 we do a comparison, two (array) fetches, a product, and a return.
- So T(1)=5.
- If n>1, we do a comparison, a subtraction, a method call, the recursive computation, two fetches, a product, a sum and a return.
- So T(n) = 1 + 1 + 1 + T(n-1) + 2 + 1 + 1 + 1 = T(n-1)+8.
- This is called a **recurrence equation**. In general these are quite difficult to solve in **closed form**, i.e. without T on the right hand side.
- For this simple recurrence, one can see that T(n)=8n-3 is the solution.
- We will learn more about recurrences later.
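One way to see the closed form is to unroll the recurrence: each of the n-1 recursive levels adds 8 to T(1)=5, so T(n) = 5 + 8(n-1) = 8n-3. The Python sketch below (function names are illustrative) evaluates the recurrence directly so it can be compared with the closed form. It also transcribes the recursive algorithm itself.

```python
def T(n):
    """Step count defined by the recurrence T(1) = 5, T(n) = T(n-1) + 8."""
    if n == 1:
        return 5
    return T(n - 1) + 8


def inner_product_recursive(n, A, B):
    """Inner product of the first n entries of A and B, for n >= 1."""
    if n == 1:
        return A[0] * B[0]
    return inner_product_recursive(n - 1, A, B) + A[n - 1] * B[n - 1]


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, T(n), 8 * n - 3)
```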

**Problem Set** #1, Problem 1.

The problem set will be officially assigned a little later, but the first
problem in the set is R-1.27.