class: center, middle, title-slide

# CSCI-UA 102
## Data Structures

## Introduction to the Class

.author[
Instructor: Joanna Klukowska
]

.license[
Copyright 2020 Joanna Klukowska. Unless noted otherwise all content is released under a
[Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).
Background image by Stewart Weiss
]

---
layout:true
template: default
name: section
class: inverse, middle, center

---
layout:true
template: default
name: poll
class: inverse, middle

## Poll
---
layout:true
template: default
name: breakout
class: breakout

---
layout:true
template:default
name:slide
class: slide

.bottom-left[© Joanna Klukowska. CC-BY-SA.]

---
template: section

# The Course

---
template: slide

## The Instructional Stuff

__Instructor:__

- Joanna Klukowska

--

__Section Leaders:__

- Kevin Peter
- Vincent Xu

--

__1 Course Assistant__

--

__4 Graders__

--

__6 Tutors__

---

## About This Course

- NYU Brightspace: https://brightspace.nyu.edu
    - course info and syllabus
    - links to slides, labs, assignments, readings, resources
    - Zoom links for any remote meetings
    - your grades and course progress
- Recitations (required):
    - you should attend the recitation for which you are registered
    - if you need to miss your scheduled recitation, contact your section leader, make sure to complete the weekly lab, and plan to attend office hours and tutoring hours to resolve any questions you may have
    - in some cases, you may be able to attend an alternative recitation session on the same day, but check with the instructors first
- Course tools:
    - [Brightspace](https://brightspace.nyu.edu) - NYU's learning management system (LMS)
    - [Ed Discussion](https://us.edstem.org/) - our discussion forum and tool for ALL course-related communications
    - [Ed Workspaces and Lessons](https://us.edstem.org/) - tools for collaboration, completing labs and other assessments
    - [Gradescope](https://www.gradescope.com/) - autograder and submission tool for programming projects
    - PollEverywhere - regular in-class polls and quizzes
    - ...
---

## Textbook(s)

- there is no one required book, but you do __have to__ use one or two resources to supplement the content that we discuss in class
- there are recommendations on the syllabus
- you can access them in digital format through NYU Libraries at no extra cost
- keep in mind that no single resource will cover everything that you need; you should plan on using multiple books/resources

---
template:poll

### As you start this course (and this semester), what one word pops into your mind?

- go to http://pollev.com/joannakl
- enter a word (or more) to answer the question above

NOTE: if you enter a phrase consisting of multiple words, the words will be used individually.

---
template:slide

## Data Structures and Algorithms

### What are they?

--

A __data structure__ is a collection of data items in the memory of a running program, organized in a way that allows items to be stored and retrieved by some fixed methods.

An __algorithm__ is a logical sequence of discrete steps that describes a complete solution to a given problem, computable in a finite amount of time and space.

--

.left-column2[
.smaller[
Examples of data structures from CSCI.UA 101:

- __array__: data items are stored one after another in memory, and they can be accessed by their index number,
- __stack__: collection of items with access (add/remove operations) only at the _top_ of the stack (think of a stack of plates).
]]

.right-column2[
.smaller[
Data structures covered in CSCI.UA 102:

- __list__
- __stack__
- __queue__
- __tree__
- __binary search tree__
- __hash table__
- __some graphs__
- ...
]]

---
template: section

# Why Learn to Write Code If AI Can Do It?

---
template:slide

## Understanding Is Fundamental

### .purple[AI Can’t Learn for You]

AI can generate code, but it can’t **teach you what the code means**.

If you don’t understand *why* a hash table is faster than a linked list for lookups, or what causes a binary search tree to degrade to a linked list, you won’t be able to:

- Debug it when something breaks
- Adapt it to a new use case
- Choose the right data structure for the problem

---

## Coding Is Thinking Through the Problem at Hand

**Programming is not just typing - it is thinking.**

--

When you write code yourself, you practice:

- Breaking down complex problems into smaller ones
- Thinking step-by-step
- Translating ideas into logic and syntax

These skills are central to **software design, algorithms, systems, and research**.

--

These skills are also central to **instructing AI to code solutions to more complex problems**.
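
---

## A Stack, Sketched in Java

To make the __stack__ from the data structures slide concrete, here is a minimal sketch of one possible implementation: an array that grows when it fills up, with every addition and removal happening at the _top_. The class name `ArrayStack` and its method names are hypothetical choices for this illustration, not an API used in the course; in real code, Java's `java.util.ArrayDeque` already provides `push`, `pop`, and `peek`.

```Java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical array-backed stack, for illustration only (not a course-provided API).
// The top of the stack is items[size - 1].
class ArrayStack<E> {
    private Object[] items = new Object[4]; // initial capacity chosen arbitrarily
    private int size = 0;                   // number of items currently stored

    // Add an item to the top, doubling the array first if it is full.
    public void push(E item) {
        if (size == items.length) {
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = item;
    }

    // Remove and return the top item.
    @SuppressWarnings("unchecked")
    public E pop() {
        if (size == 0) {
            throw new NoSuchElementException("stack is empty");
        }
        E top = (E) items[--size];
        items[size] = null; // drop the reference so the object can be garbage collected
        return top;
    }

    // Return the top item without removing it.
    @SuppressWarnings("unchecked")
    public E peek() {
        if (size == 0) {
            throw new NoSuchElementException("stack is empty");
        }
        return (E) items[size - 1];
    }

    public boolean isEmpty() { return size == 0; }

    public int size() { return size; }

    public static void main(String[] args) {
        ArrayStack<String> plates = new ArrayStack<>();
        plates.push("red");
        plates.push("green");
        plates.push("blue");
        System.out.println(plates.pop());  // prints blue (last in, first out)
        System.out.println(plates.peek()); // prints green
        System.out.println(plates.size()); // prints 2
    }
}
```

Doubling the array only when it is full means the copying happens rarely, so `push` stays fast on average; this is one common design choice, not the only one.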
---

## You Can’t Debug What You Don’t Understand

Even if AI writes the code, you need to:

- Read and understand it
- Fix bugs
- Test it to make sure that the code does what it claims to do
- Optimize performance

To do that, you must understand how data structures and algorithms work under the hood, and you must know the programming language that you are working with.

---

## Foundational Skills Enable Advanced Learning

- You do not learn advanced calculus without first understanding and mastering the basics of arithmetic.
- You do not write a best-selling novel without first learning how to write the letters of the alphabet.
- You do not win the Tour de France without first learning how to ride a kiddie bike.

--

Understanding data structures and the basics of programming languages unlocks:

- Systems programming
- Algorithm design
- Machine learning
- Cybersecurity
- Software architecture
- ... and many more

Without this foundation, your learning and career options will be limited.

--

**After all, AI can do the basic things, and someone with the above skills will need to implement the AI systems of tomorrow.**

---

## Employers and Interviews Expect It

Many technical interviews still require:

- Writing code on the spot
- Explaining time and space complexity
- Implementing or using basic data structures

**In real jobs, you’ll often need to build custom solutions, not just copy and paste from AI. That's because these solutions may be new and unique to a given situation, and AI does not _know_ about them yet.**

---

## AI Is a Great Assistant, Not a Replacement

AI helps most **when you already understand the problem**.

Think of it like a calculator in math:

> It’s powerful, but useless if you don’t know what formula to apply.

Use AI to:

- Get feedback and ideas
- Explore alternatives
- Speed up boilerplate coding (but only once you know how to code it with ease)

But don’t skip **building core skills**.

**Do not ask AI to complete tasks that you would not be able to complete on your own.**

---

## Final Thought

> Use AI to *learn*, not to *replace* learning.

If you master the fundamentals, AI makes you faster and better.

If you don’t, it just makes you faster at being confused, and then you become easily replaceable by AI.

---
template: section

# Why do data structures matter?

---
template:default
background-image: url("img/back3.jpg")
class: middle

[**Linus Torvalds**](https://en.wikiquote.org/wiki/Linus_Torvalds), 2006:

> .Large["I will, in fact, claim that the difference between a bad programmer and a good one is whether he(/she) considers his(/her) code or his(/her) data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships."]

---
name: demo1a

## Demo 1: Create a string with lots of 'a's.

.left-column2-larger[
```Java
String generateAString1(int number) {
    String result = "";
    for (int i = 0; i < number; i++) {
        result += "a";
    }
    return result;
}
```

```Java
String generateAString2(int number) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < number; i++) {
        sb.append('a');
    }
    return sb.toString();
}
```
]

---
template: demo1a
name: demo1b

.right-column2-smaller[
- When `number` is set to 500,000:
    - the first function takes about __20,000 milliseconds__ (i.e., about 20 seconds) to execute
    - the second function takes only __~2.5 milliseconds__ (i.e., you do not even notice that any time passed)
]

--

.purple[.big[Why such a big difference?]]

--

.below-column2[
- we'll need to look at _how_ the `String` and `StringBuilder` classes are implemented
- we need to understand what data structures and algorithms each class uses
]

---
template: demo1b

.right-column2-large[
BUT WAIT! I have another version:

```Java
String generateAString3(int number) {
    return "a".repeat(number);
}
```
]

--

.right-column2-large[
And this takes only __~0.5 milliseconds__.
]

---

## Demo 2: Iterating over a list of numbers

.left-column2-large[
Program 1:

```Java
LinkedList<Integer> numbers = new LinkedList<>();
LinkedList<Integer> twiceNumbers = new LinkedList<>();
populateList(numbers, size);

for (int i = 0; i < size; i++) {
    twiceNumbers.addLast(numbers.get(i) * 2);
}
```

Program 2:

```Java
LinkedList<Integer> numbers = new LinkedList<>();
LinkedList<Integer> twiceNumbers = new LinkedList<>();
populateList(numbers, size);

for (int num : numbers) {
    twiceNumbers.addLast(num * 2);
}
```
]

--

.right-column2-small[
In this program we have a list of numbers. We want to iterate over the list, double each number, and add it to a new list.

What is the difference between these two loops?

----

{{content}}
]

--

When `size` is set to 100,000, program 1 takes __3,391 milliseconds__ to execute and program 2 finishes in __~4 milliseconds__.

---

## Demo 3, part 1:

### Creating and searching in collections of `Circle` objects

.left-column2-large[
Program 1: store `Circle` objects as a tree

```Java
Collection<Circle> tree = new TreeSet<>();
// create lots of Circle objects in this collection
createCircles(tree, size);
```

Program 2: store `Circle` objects as a list

```Java
Collection<Circle> list = new LinkedList<>();
// create lots of Circle objects in this collection
createCircles(list, size);
```

.smaller[
Common to both: This function creates `size` many `Circle` objects and adds them all to the specified collection.

```Java
public static void createCircles(Collection<Circle> col, int size) {
    for (int i = 0; i < size; i++) {
        col.add(new Circle((int) (Math.random() * Integer.MAX_VALUE)));
    }
}
```
]
]

--

.right-column2-small[
In this program we first create a tree and a list of `Circle` objects. How is the _shape_ of the collection going to affect the running time?

----

{{content}}
]

--

When `size` is set to 1,000,000, program 1 takes __468 milliseconds__ to execute and program 2 finishes in __161 milliseconds__.

---

## Demo 3, part 2:

### Creating and searching in collections of `Circle` objects

.left-column2-large[
Program 1:

```Java
Collection<Circle> tree = new TreeSet<>();
// create lots of Circle objects
createCircles(tree, size);
// search for a specific circle
Circle c = new Circle(102);
tree.contains(c);
```

Program 2:

```Java
Collection<Circle> list = new LinkedList<>();
// create lots of Circle objects
createCircles(list, size);
// search for a specific circle
Circle c = new Circle(102);
list.contains(c);
```
]

--

.right-column2-small[
Once the two collections of `Circle` objects are created, we want to find out if a circle with radius 102 got created.

----

{{content}}
]

--

When `size` is set to 1,000,000, program 1 takes __0.01 milliseconds__ to search and program 2 takes __5 milliseconds__ to search.

---

## Demo 4: Calculating Fibonacci Numbers

.left-column2[
Calculate the N'th Fibonacci number recursively:

```Java
public static long fibonacci1(long number) {
    // base case
    if (number == 0) return 0;
    else if (number == 1) return 1;
    // recursive case
    else return fibonacci1(number - 1) + fibonacci1(number - 2);
}
```
]

.right-column2[
Calculate the N'th Fibonacci number iteratively:

```Java
public static long fibonacci2(long number) {
    // base case
    if (number == 0) return 0;
    else if (number == 1) return 1;
    // iteratively compute the result
    else {
        long tmp1 = 0;
        long tmp2 = 1;
        long result = 0;
        long counter = 2;
        while (counter <= number) {
            result = tmp1 + tmp2;
            tmp1 = tmp2;
            tmp2 = result;
            counter++;
        }
        return result;
    }
}
```
]

--

.smaller[
- When we try to compute the 40<sup>th</sup> Fibonacci number:
    - the code on the left takes __430 milliseconds__
    - the code on the right takes __~0.003 milliseconds__
    - BUT, the code on the left appears so much simpler
]

---

## Technical Interviews

And if Linus Torvalds and my demos did not convince you that understanding data structures is important and exciting, there are always technical job interviews:

--

- most technical interview questions are based on data structures and algorithms
- examples:
    - design a min-stack that performs typical stack operations and is able to retrieve the smallest element at any time quickly
    - describe an algorithm that can parse a mathematical expression and validate that all brackets and parentheses are correctly ordered and matched
    - find all anagrams of a given string that are also words in a given dictionary
    - determine if a given binary tree is a binary search tree

--

- why?

--

.left-column2[
- because for a software developer / programmer / computer scientist, data structures and algorithms are essential tools (just like a wrench is for a plumber; would you hire a plumber without a proper toolbox?)
]

.right-column2[
.small[
Plumber in a green uniform working on a sink. (official US Government [EPA] produced image, public domain.) Source: https://openclipart.org/detail/195900/friendly-plumber
]
]

---
template: breakout

### Group Discussion: Why do we care so much about programs being fast?

- Turn to the people around you and introduce yourself.
- In groups of 2-4 people, discuss why you think it is important for programs to run fast.
- You should think about it from two different perspectives:
    - why do **you** want a program on your computer to work fast?
    - why is it beneficial to **society** if programs run fast? (this one is a bit more challenging)
- After ~5 minutes, some groups will get a chance to report on what they came up with.

---
template: section

# How to succeed?

---

## Tips for the Course

- __Understand the concepts and reasons for doing things__. Do not use a particular solution/approach just because a tutor or TA (or even the instructor) said it was a good idea. Do not memorize things if you do not know what they are doing. Question things! Be curious!

--

- __Use pen and paper__ (or the modern equivalents). Draw things out and experiment with the ideas. It often helps to draw out the variables and objects of a piece of code that you are analyzing and update the picture as you go through the code. This type of hand-eye interaction stimulates different parts of your brain and helps you learn!

--

- __Try solving problems yourself__. I mean really work them out BEFORE asking for a solution, looking at a solution posted by others, or searching the web.

--

- __Write code solutions yourself__. Don't just read the code or watch a video showing the code. Don't copy and paste; type it up yourself and then compare it to the code we wrote in class or the code that is in a book.

--

- __Treat mistakes and bugs as your friends__. The more problems you encounter, the easier it is to recognize them in the future. Your brain actually rewires itself when you make a mistake and fix it - this is how learning happens.

--

- __Make friends__. Make sure that you meet at least a few other people in the class and get to know them. These friendships will benefit you and other people in the class, and, in some cases, may last a lifetime.

---
template: section

# Course Syllabus