Module #6

Lists are the primary "sequence object" in Python; a list lets you store many values (as many as memory allows) under a single identifier. This module introduces the basics of using lists in Python and shows a number of problems that can be solved easily with list-specific functions and methods. Lastly, the module introduces two additional data types: tuples and sets.


List Basics

Sequence Objects

The programs that we have written so far have been designed to operate using textual data (strings), logical data (Booleans) and numeric data (integers and floating point numbers). Each of these data types represents and stores one piece of data at a time. For example:

x = 5			# integer
y = 5.0			# floating point number
z = 'hello'		# string
q = True		# boolean

However, there are times when we need to write a program that keeps track of many values at the same time. For example, if we wanted to write a program to keep track of final exam scores for a group of 50 students in a class we would need to create 50 different variables, like this:

test_01 = 95.45
test_02 = 89.35
test_03 = 76.43
...
...
test_50 = 97.11

One way to solve this problem is to use a "sequence" data type, which has the ability to hold multiple values within the same identifier. In many programming languages we call these "arrays," but in Python we refer to this data type as a list.

List Basics

An easy way to think about a list in Python is to think about how a book works in the real world. A book is a single object (e.g. "Harry Potter and the Chamber of Secrets") that can contain any number of sub-items (its pages).

You can create a list in Python by using square brackets, which almost look like the covers of a book in the real world. For example:

my_book = ["Page 1", "Page 2", "Page 3"]

The above code will create a new list in Python that holds three strings – "Page 1", "Page 2" and "Page 3" – in that order.

Lists can contain any data type that we have covered so far - for example:

my_list = [100, 200, 300]

Lists can also mix data types.

my_list = ['Craig', 5.0, True, 67]

You can print the value of a list using the print function. This will print all of the values stored in the list in the order in which they are represented:

my_list = ["a", "b", "c"]
print (my_list)

Just like with a string, you can use the repetition operation (*) to ask Python to repeat a list. For example:

my_list = [1, 2] * 3
print (my_list)

# >> [1, 2, 1, 2, 1, 2]

You can use the concatenation operation (+) to ask Python to combine lists, much like how you would combine two strings. For example:

my_list = [1, 2] + [99, 100]
print (my_list)

# >> [1, 2, 99, 100]

Indexing List Elements

In a book, you can reference a page by its page number and in a list you can reference an element stored in a list using its index. Indexes are integer values that represent the position of an item within a list. Indexes always start at zero (the same way a string index begins at zero). For example:

my_list = ["Apple", "Pear", "Peach"]
print (my_list[0])

# >> Apple

You will raise an exception (an IndexError) if you attempt to access an element outside the range of a list. For example:

my_list = ["Apple", "Pear", "Peach"]
print (my_list[4]) # IndexError: valid indexes are 0, 1 and 2

Lists are "mutable" data types, which means that they can be changed once they have been created (unlike strings). If you know the index of an item you wish to change, you can simply use the assignment operator to update the value of the item at that position in the list. For example:

my_list = [1, 2, 3]
print (my_list)
# >> [1, 2, 3]

my_list[0] = 99
print (my_list)
# >> [99, 2, 3]

Sample program: This program demonstrates list creation, repetition, concatenation and accessing individual elements within a list.
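
A minimal sketch of such a program might look like the following. The list contents and variable names are illustrative assumptions, not taken from the course materials.

```python
# Create lists with square brackets, repetition (*) and concatenation (+)
fruits = ["Apple", "Pear", "Peach"]
zeros = [0] * 3
combined = fruits + ["Plum", "Fig"]

print(fruits)
print(zeros)      # [0, 0, 0]
print(combined)

# Access individual elements by index (indexes start at 0)
print(combined[0])   # Apple
print(combined[4])   # Fig

# Lists are mutable, so an element can be replaced in place
combined[0] = "Mango"
print(combined)
```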



Working with Lists

Iterating over a List

When working with lists you will often need to access many or all of the elements within a list to solve a certain problem. For example, imagine that you had the following list of price values:

prices = [1.10, 0.99, 5.75]

If you wanted to compute 7% sales tax on each price you would need to access each item in the list and multiply it by 0.07. For a list with three elements this is pretty easy:

tax_0 = prices[0] * 0.07
tax_1 = prices[1] * 0.07
tax_2 = prices[2] * 0.07

However, as your lists become larger this technique becomes unmanageable (imagine you had 1,000 prices in the list!). The solution to this problem is to use a repetition structure to iterate over the contents of the list.

The simplest way to iterate over a list is to use a for loop. When you do this, the target variable of your loop takes on the value of each element of the list, one at a time and in order.

Sample Program: The program below demonstrates how to quickly iterate over a list using a for loop.
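
As a rough sketch, assuming the prices list and the 7% tax rate from the example above (the names TAX_RATE and total_tax are made up for illustration), a for loop over the list could look like this:

```python
prices = [1.10, 0.99, 5.75]
TAX_RATE = 0.07  # 7% sales tax, as in the example above

total_tax = 0.0
for price in prices:
    # price takes on each value stored in the list, in order
    tax = price * TAX_RATE
    total_tax += tax
    print("Price:", price, "Tax:", round(tax, 2))

print("Total tax:", round(total_tax, 2))
```

The same loop works unchanged whether the list holds three prices or a thousand.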


Programming Challenge: Given the list below, write a program that counts the number of A's (scores between 90 and 100). Extension: Count the number of B's, C's, D's and F's.


As you can see, a for loop is a convenient way to iterate through a list sequentially. The target variable in a for loop takes on the value of the current item in the list as you iterate. However, the target variable isn't very helpful if you want to change the value of an item in a list: assigning a new value to the target variable does not change the list itself. For example:

Sample Program: The list below remains unchanged because we are not modifying the values stored in the list.
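
One way to see this behavior is the short sketch below (the list contents are an illustrative assumption):

```python
prices = [1.10, 0.99, 5.75]

for price in prices:
    # This rebinds the loop variable only; the list is not touched
    price = price * 2

print(prices)  # [1.1, 0.99, 5.75]
```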


In order to change a list item you need to know the index of the item you wish to change. You can then use that index value to change an item at a given position in the list. For example:

Sample Program: The list below does change because we are using index notation to change a value at a particular position in the list.
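
A minimal sketch of changing an item through its index (again with illustrative values):

```python
prices = [1.10, 0.99, 5.75]

# Assigning through an index changes the element stored at that position
prices[1] = 1.25

print(prices)  # [1.1, 1.25, 5.75]
```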


There are two main techniques for iterating over the index values of a list:

  • Setting up a counter variable outside the loop and continually updating the variable as you move to the next position in the list
  • Using the range function to create a custom range that represents the size of your list

If you set up a counter variable outside of your loop, you can use it to keep track of where you are in a list. For example:

Sample Program: Using a counter variable to keep track of our current position in a list.
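
A sketch of this technique, assuming a three-element list of scores (the names scores and counter are illustrative):

```python
scores = [70, 80, 90]

counter = 0          # index of the element we are currently visiting
while counter < 3:   # hard-coded: the list has exactly 3 elements
    scores[counter] = scores[counter] + 5
    counter += 1     # move on to the next position

print(scores)  # [75, 85, 95]
```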


To improve upon this example, we can use the len function to determine the size of our list (rather than just hard coding our loop to end after 3 iterations). The len function can take a list as an argument and will return the integer value of the size of the list. Example:

Sample Program: Using the len function to count the number of elements in a list and then using that result to control our loop.
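
The same idea can be sketched with len controlling the loop, so the code keeps working if the list grows or shrinks (list contents are illustrative):

```python
scores = [70, 80, 90, 100]

counter = 0
while counter < len(scores):   # len(scores) is 4 for this list
    scores[counter] = scores[counter] + 5
    counter += 1

print(len(scores))  # 4
print(scores)       # [75, 85, 95, 105]
```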


You can also use the range function to construct a custom range that represents all of the indexes in a list. This technique can be a bit cleaner to implement since you don't need to worry about setting up and maintaining a counter variable:

Sample Program: Use the range function to create a custom range that represents all valid positions in a list.
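
A minimal sketch of the range-based approach (values are illustrative):

```python
scores = [70, 80, 90]

# range(len(scores)) produces the indexes 0, 1 and 2
for i in range(len(scores)):
    scores[i] = scores[i] * 2

print(scores)  # [140, 160, 180]
```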


Programming Challenge: Given the following list of student test scores, apply a class "curve" to each score. The class curve is as follows:

  • 90 or above: no curve
  • 80 up to (but not including) 90: +2 points
  • 70 up to (but not including) 80: +5 points
  • Lower than 70: +8 points


Creating Lists

You can create an empty list with no elements using the following syntax:

mylist = []

With an empty list, you cannot place content into the list using index notation since there are no "slots" available to be used in the list. You can, however, append values to the list using the concatenation operator, like this:

mylist = []
mylist += ["hello"]
mylist += ["world"]
print (mylist)

# >> ['hello', 'world']

Since you cannot access an element outside of the range of a list it is sometimes necessary to set up a correctly sized list before you begin working with it. For example:

# create a list of 7 zeros
daily_sales = [0] * 7

Programming Challenge: Write a program that asks the user for daily sales figures for a full week (Sunday – Saturday). Store these values in a list and print them out at the end of your program. Here's a sample running of your program:

Enter sales for Day #1: 100
Enter sales for Day #2: 200
Enter sales for Day #3: 300
Enter sales for Day #4: 400
Enter sales for Day #5: 500
Enter sales for Day #6: 600
Enter sales for Day #7: 700

Sales for the week:  [100,200,300,400,500,600,700]



Slicing Lists

Sometimes you need to extract multiple items from a list. Python provides slice notation that makes it easy to "slice" out a portion of a list. The syntax for list slicing is identical to the syntax for string slicing: you supply "slice indexes" that tell Python which elements you want to extract. Example:

new_list = old_list[start:end]

Python will copy out the elements from the list on the right side of the assignment operator based on the start and end indexes provided. It will then return the result set as a new list. Note that slice indexes work just like the range function – you will grab items up until the end index, but you will not grab the end index itself. Here's an example:

list_1 = ['zero', 'one', 'two', 'three', 'four', 'five']
list_2 = list_1[1:3]
print (list_1)
print (list_2)

# >> ['zero', 'one', 'two', 'three', 'four', 'five']
# >> ['one', 'two']

If you omit the start index in a slice operation, Python will start at the first element of the list. If you omit the end index, Python will go through the last element of the list. If you supply a third index, Python will use it as a step value. This works the same as the step value you would pass to the range function.
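
The short sketch below illustrates each of these cases on a made-up list of numbers:

```python
nums = [10, 20, 30, 40, 50, 60]

first_two = nums[:2]       # start omitted: begins at the first element
last_two = nums[4:]        # end omitted: runs through the last element
every_other = nums[1:5:2]  # step of 2: indexes 1 and 3
copy_of_all = nums[:]      # both omitted: a new list with every element

print(first_two)    # [10, 20]
print(last_two)     # [50, 60]
print(every_other)  # [20, 40]
print(copy_of_all)  # [10, 20, 30, 40, 50, 60]
```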

Programming Challenge: Given the following list, write a program that does the following:

  • Extract the first 3 elements of the list into a new list
  • Extract the characters b, c, and d into a new list
  • Extract the last 4 characters into a new list



Finding items in a list

You can easily test to see if a particular item is in a list by using the in operator. Here’s an example:

my_list = ['pie', 'cake', 'pizza']

if 'cake' in my_list:
	print ("I found cake!")
else:
	print ("No cake found.")

The in operator lets you search for any item in a list. It will return a Boolean value that indicates whether the item exists somewhere in the list.

Programming Challenge: Given the following lists, write a program that lets the user type in the name of a product. If the product name exists in our inventory, you should print out that it is in our inventory. Otherwise you should print out that the product is not found. Ensure that your program is case insensitive (e.g. searches for "Apple" or "apple" or "APPLE" should all succeed).


Programming Challenge: Given these two lists, write a program that finds all elements that exist in both lists (e.g. the integer 2 exists in both lists). Store your results in a list and print it out to the user. The expected answer is:

[1, 2, 3]




List Functions

Appending Items to a List

You have already seen a few ways in which you can add items to lists:

  • Repeat the list using the * operator
  • Concatenate the list using the + operator

Another way to add items to a list is to use the append method. The append method is a function that is built into the list datatype. It allows you to add items to the end of a list. Example:

mylist = ['Christine', 'Jasmine', 'Renee']
mylist.append('Kate')
print (mylist)

Programming Challenge: Write a program that continually prompts a user to enter a series of first names. The user can elect to stop entering names by supplying the string "end". Store these first names in a list and print them out at the end of your program. Extension: Prevent the user from entering duplicate names (hint: use the in operator).


Removing Items from a List

You can remove an item from a list by using the remove method. Here’s an example:

prices = [3.99, 2.99, 1.99]
prices.remove(2.99)
print (prices)

Note that you will raise an exception (a ValueError) if you try to remove an item that is not in the list. To avoid this, test whether the item is in the list first using the in operator, or use a try / except block to catch the error.
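
Both approaches can be sketched as follows. The value 9.99 is an arbitrary price chosen because it is not in the list.

```python
prices = [3.99, 2.99, 1.99]

# Option 1: check membership before removing
if 2.99 in prices:
    prices.remove(2.99)

# Option 2: attempt the removal and handle the failure
try:
    prices.remove(9.99)   # not in the list, so a ValueError is raised
except ValueError:
    print("9.99 was not found")

print(prices)  # [3.99, 1.99]
```

Keep in mind that remove deletes only the first matching item, so a list that holds duplicates will still contain the later copies.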

Programming Challenge: Continually ask the user for a product name. Next, see if that product name is included in the inventory list below. If it is, remove the product from the list and then print the current list of products to the user. If the product is not on the list you should alert the user that we do not currently carry the product in question. You can end the program when the list of products is exhausted or when the user types the string "end".


Sometimes you want to delete an item from a particular position in a list. You can do this using the del keyword. For example, say you wanted to delete the item in position #0 in the following list:

Sample Program: Using the del keyword to remove an item from a particular position in a list.
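
A minimal sketch, using an illustrative list of colors:

```python
colors = ['red', 'green', 'blue']

del colors[0]   # removes the item at index 0; later items shift left

print(colors)   # ['green', 'blue']
```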


Re-ordering List Items

You can have Python sort items in a list using the sort method. Here’s an example:

Sample Program: This program sorts a list in ascending alphabetical order.
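
As a sketch (the names in the list are illustrative), sorting might look like this. Note that sort changes the list in place rather than returning a new one.

```python
names = ['Renee', 'Christine', 'Kate', 'Jasmine']

result = names.sort()   # sorts the list in place

print(names)    # ['Christine', 'Jasmine', 'Kate', 'Renee']
print(result)   # None: the sort method does not return the sorted list
```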


Python can also reverse the items in a list using the reverse method. This method will swap the elements in the list from beginning to end (i.e. [1,2,3] becomes [3,2,1]) - note that this method does not sort the list at all. It simply reverses the values in the list. Here's an example:

Sample Program: This program reverses a list.
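
A short sketch with an unsorted list makes the "reverse is not sort" point visible:

```python
nums = [3, 1, 2]

nums.reverse()   # flips the order in place; no sorting takes place

print(nums)      # [2, 1, 3]
```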


Getting the Location of an Item in a List

You can use the index method to ask Python to tell you the index of where an item can be found in a list. The index method takes one argument (an element to search for) and returns an integer: the index of the first position where that element can be found. Caution: The index method will raise an exception (a ValueError) if it cannot find the item in the list. Here's an example:

Sample Program: Demonstration of how to find the location of an item in a list using the index method.
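
One possible sketch, which guards the call with the in operator so that no exception is raised (the list and the name position are illustrative):

```python
foods = ['pie', 'cake', 'pizza']

if 'cake' in foods:
    position = foods.index('cake')
    print("cake is at index", position)   # cake is at index 1
else:
    print("No cake found.")
```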


Programming Challenge: The lists below are organized in such a way that the item at position 0 in the first list matches with the item at position 0 in the second list. With this in mind, write a product price lookup program that works as follows:

Enter a product:  peanut butter
This product costs 3.99



Getting the Largest and Smallest Items in a List

Python has two built-in functions that let you get the lowest and highest values in a list. They are called min and max. Here's an example:

Sample Program: Demonstration of how to find the largest and smallest items in a list.
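
A minimal sketch with illustrative prices:

```python
prices = [3.99, 2.99, 1.99, 10.50]

lowest = min(prices)    # smallest value in the list
highest = max(prices)   # largest value in the list

print("Lowest price:", lowest)     # Lowest price: 1.99
print("Highest price:", highest)   # Highest price: 10.5
```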


Type Annotations for Lists, Default Values

To add a type annotation for lists, you'll have to bring in the List type by importing the typing module:

from typing import List

After you've imported the type, List, you can use it like you would with any other type annotations:

planets: List = ['mercury', 'venus', 'earth', 'mars']

It's common to store the same types of values within a list. For example, you may want to store a bunch of floats to represent the values coming from an ambient light sensor on your phone... or you may want a collection of strings representing all of the verbs in a text. In fact, types analogous to lists in some other languages are restricted so that they can only contain one kind of value.

If your intent is to store only one type of value in your list, you may also want to specify that in your type annotation. This can be done by placing square brackets ([ and ], like indexing) after List, and inserting the type name between the brackets. Here's a list that stores only integers:

nums: List[int] = [1, 2, 3, 4]

PyCharm's type checker will warn you if you have an item that doesn't conform to your list type annotation:

nums: List[int] = ['a', 'b']
# this will show as a type error

Mixed types (for example, [1, 'b'] annotated as List[int]) should also trigger an error, but it may depend on which type checker you're using. At the time of this writing, PyCharm's type checker does not correctly identify this as an issue.

⚠️ If you're using a list as a parameter, do not give it a default value!

Here's an example... imagine that we have a function that accepts a list as an argument, but we default the argument to an empty list in case the function is called without a value passed in. What do you think will be printed out?

Sample Program: We have a function that uses a list, a mutable type, as a default value for a parameter. What do you think the output will be?



What 🤯? Why does this happen? Python doesn't create a new default value on each call of the function. The default is created just once, when the function is defined, so any changes made to it accumulate across calls. So, DO NOT USE A LIST AS A DEFAULT VALUE FOR A FUNCTION PARAMETER.
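
The sketch below reproduces this behavior with a hypothetical function, add_item, and then shows one commonly used workaround: default the parameter to None and create the list inside the function. Both function names are made up for illustration.

```python
def add_item(item, items=[]):      # pitfall: one list is shared by every call
    items.append(item)
    return items

first = add_item('a')
second = add_item('b')
print(second)            # ['a', 'b'], not ['b']
print(first is second)   # True: both names refer to the single default list


def add_item_safe(item, items=None):
    # None stands in for "no list given"; a fresh list is made on each call
    if items is None:
        items = []
    items.append(item)
    return items

print(add_item_safe('a'))  # ['a']
print(add_item_safe('b'))  # ['b']
```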


Tuples

Overview

A tuple is a sequence type (just like lists and strings). That is, it's a compound data type (a type that is made up of other values), and it's composed of an ordered sequence of values. Like lists, they're a collection of any kind of values (the values that make up a tuple can be any type). With that said, though, there are a few major differences between tuples and other sequence types:

  1. syntax
  2. mutability
  3. semantics

Syntax, Operators/Functions, and Type Annotations

To create a tuple, you simply need comma separated values. This is a tuple:

t = 1, 2, 3
print(type(t))
# prints out type, tuple!

The commas used in a function definition's parameters do not create a tuple (they are there only to specify separate parameters). If there's ambiguity in the intent of using commas, then a tuple should be surrounded with parentheses. For example, if a tuple were used as an argument in a function call, then it has to be within parentheses so that the tuple is taken as a single argument rather than each element being used as a separate argument:

# this prints out 3 integers, 1, 2 and 3.
print(1, 2, 3)  

# but... what if we wanted to print out the tuple: 1, 2, 3
# surround it with parentheses!
print((1, 2, 3))

A tuple can also be created from other sequence types... that is, you can convert a list or a string into a tuple. Just like type conversion we've seen before, use a function named after the type we would like to convert to. In this case: tuple.

number_list = [1, 2, 3]
t = tuple(number_list)  # voila, new tuple!
print(t)  # prints out (1, 2, 3)
print(type(t))  # prints out <class 'tuple'>

Because tuples are sequences, operators and functions that you've seen used on strings and lists can also be used with tuples. The operations and functions include:

  1. indexing and slicing
  2. concatenation and repetition
  3. looping, membership and length
  4. comparisons

Sample Program: A tuple is a sequence type, so common sequence operators and functions will work on tuples too! Note that the online editor will not surround tuples with parentheses when printed out.
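
A sketch of these operations on an illustrative tuple follows. The output comments show what standard Python 3 prints, parentheses included.

```python
t = (10, 20, 30)

print(t[0])              # indexing: 10
print(t[1:])             # slicing: (20, 30)
print(t + (40, 50))      # concatenation: (10, 20, 30, 40, 50)
print(t * 2)             # repetition: (10, 20, 30, 10, 20, 30)
print(20 in t)           # membership: True
print(len(t))            # length: 3
print(t < (10, 20, 31))  # comparison, element by element: True

for value in t:          # looping
    print(value)
```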



Finally, if you'd like to specify that a variable is a tuple, you'll have to bring in the typing module so that you can use Tuple as a type annotation. Just like using List, you can specify the types within your tuple. There are a couple of details that make type annotations for tuples a little tricky:

  1. the tuple must be surrounded by parentheses (this language syntax design decision was to help avoid ambiguity in syntax)
  2. the type of each element in a tuple must also be declared (unlike lists, tuples are immutable, and consequently a tuple's length is constant... so each element's type can be specified)

from typing import Tuple

t: Tuple[int, int, int] = (1, 2, 3)

From this, we can see that tuples are clearly syntactically different from other sequence types -- lists and strings.

Mutability

As mentioned above, tuples are immutable. This makes them similar to strings: they cannot be changed. Another way of thinking about a tuple is that it's like an immutable list.

This means that you can't:

  1. add elements to a tuple
  2. remove elements from a tuple
  3. change a tuple's values (well, this depends!)

⚠️ You will get a runtime error if you try to change a tuple. All of the following will cause an exception:

numbers = 1, 2, 3

# AttributeError (tuples don't have an append method)
numbers.append(4) 

# TypeError (items in tuples can't be deleted)
del numbers[0]

# TypeError (an item in a tuple can't be reassigned)
numbers[0] = 'can i change this?'

There's one catch, though. If an element in a tuple is mutable, then that element can be changed... but the tuple itself cannot. For example, if one of the elements in a tuple is a list, that list can be modified... but it cannot be replaced entirely:

t = ([1, 2, 3], 4)
t[0][0] = 'omg, changed!' # ok!

# but, we still cannot reassign the first element entirely
# t[0] = 'will not work'

Semantics

When deciding between using a tuple and a list, intention should also be taken into account:

  • tuples are typically used to store a collection of heterogeneous values (that is, the values are different from one another, at least in terms of semantics)... lists on the other hand are meant to be a homogeneous collection of values (that is, the items are the same in meaning)
  • for example, we may have a list of ints [80, 90, 87]... which is literally just a list of ints (they're all the same thing - maybe test scores or temperatures)
  • but we may have a date represented by a tuple (1969, 7, 20) ... it looks like just a bunch of numbers, but each "slot" has a meaning: year, month, day... or a book represented by ("Frankenstein", 1818, "Mary", "Shelley")
  • lists are more likely to be iterated over... whereas a tuple is more likely to be indexed into to use individual parts of the tuple

These are just semantics - the language has no way of enforcing this usage (though other languages do - for example, in Java, the data structures that are analogous to Python lists must contain elements all of the same type).

Summary of Tuples Compared to Other Sequence Types

Based on the material above, we can see that tuples, while similar to other sequence types through shared operations and functions, are actually quite different from lists and strings:

  1. syntactically, a tuple is represented by comma separated values, and -- when appropriate -- parentheses
  2. unlike a list, a tuple is immutable; it cannot be changed
  3. in terms of semantics, tuples are typically meant to represent a collection of unlike elements to be used individually

Programming Challenge: Now that you know about some basic tuple syntax and behavior, write a program that uses tuples:

  1. create a tuple containing three values (these can be any values that you like) and assign it to a variable, t
  2. create a new variable that contains the tuple t repeated twice, call the variable repeated_t
  3. create a new variable that contains repeated_t added to the tuple, ('penultimate', 'ultimate'); name the variable added_t
  4. print out the last 4 elements of added_t by retrieving a "sub" tuple (hint: slice!)
  5. print out the first element of added_t
  6. print out the number of elements contained in added_t
  7. print out the type of added_t

Use tuple operators or functions to do this! See the expected output below (again, note that the online editor may have slightly different output... such as not wrapping tuples with parentheses and using type instead of class when printing out the type of a value).

# if your tuple contained the values 1, 2, and 3...
(2, 3, 'penultimate', 'ultimate')
1
8
<class 'tuple'>




Methods

Tuples only support two methods:

  • count(val) - returns the number of times val occurs in the tuple
  • index(val) - returns the index of the first occurrence of val within the tuple (ValueError occurs if val is not in the tuple)

Unpacking

Sequence types can be unpacked. That is, if you have multiple comma separated variable names on the left-hand side and a sequence type on the right-hand side, then each variable name will be bound to the element that matches positionally in the sequence.

city, state = ['brooklyn', 'ny']
print(city, state)

If the number of variable names and the number of elements do not match, a runtime error (a ValueError) will occur.

You can unpack any sequence, but due to semantics (tuples are meant to pack different values together that you'll later use individually), you'll most often see unpacking used with tuples:

t = (7, 20, 1969)
m, d, y = t

We've actually seen tuple unpacking before! Remember multiple assignment 🤔?

x, y = 0, 0

Yup! That's a tuple on the right-hand side. Also, remember returning multiple values from a function? Also tuple unpacking:

def f():
    return 0, 0 
x, y = f()

One other place that we'll see unpacking is within a for loop. If a list of tuples is iterated over, the loop variable can be unpacked into individual loop variables:

names = [('ang', 'alice'), ('benson', 'bob')]
for last, first in names:
   n = first + ' ' + last
   print(n)

Just like regular tuple unpacking, in order for this to work, the number of elements in each tuple within the object being iterated over must be the same... and must match the number of loop variables.

This comes in handy when you need to iterate over a list and get each element as well as its index. To do this, use the built-in function enumerate, which produces an iterable object composed of tuples. Each tuple contains the index and value of one element in the original list.

result = enumerate(['alice', 'bob', 'carol'])
# result is _like_ [(0, 'alice'), (1, 'bob'), (2, 'carol')]

When you loop over an enumerated list, you can unpack each tuple into separate loop variables: the index and the value.

Sample Program: This program will print out the index and value of every element in the list. It uses the built-in function enumerate to transform a list into an iterable object composed of tuples... which are unpacked in the for loop.
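
A minimal sketch of such a program, reusing the names from the enumerate example above:

```python
names = ['alice', 'bob', 'carol']

# Each tuple produced by enumerate is unpacked into index and name
for index, name in enumerate(names):
    print(index, name)

# Converting to a list shows the tuples that the loop unpacks
pairs = list(enumerate(names))
print(pairs)  # [(0, 'alice'), (1, 'bob'), (2, 'carol')]
```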



Programming Challenge: Given the list provided, numbers, double every number in the list. The enumerate function and tuple unpacking must be used in the solution. The original list must be modified. Here's what the resulting output should be.

[2, 200, 2000]




As *args

Finally ... remember *args for a variable number of arguments? What does that actually do? Well, it takes all of the positional arguments passed in and packs them into a single parameter called args. The type of args is actually a tuple!

Sample Program: Let's look at what *args does in more detail...
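
One way to inspect *args is sketched below. The function name describe is hypothetical.

```python
def describe(*args):
    # args collects every positional argument into one tuple
    print(type(args))   # <class 'tuple'>
    print(len(args))
    for value in args:
        print(value)
    return args

result = describe(1, 'two', 3.0)
print(result)        # (1, 'two', 3.0)

empty = describe()   # no arguments: args is an empty tuple
print(empty)         # ()
```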




Sets

Overview

A set is "an unordered collection of distinct hashable objects". Let's dissect this definition:

  1. "an unordered collection" ... This means that a set is a compound type so it's capable of containing multiple values, like lists, strings, and tuples. However, unlike lists, strings, and tuples, a set is unordered... so a set is not a sequence type. Consequently, some set operators and functions behave differently from those that you may be familiar with from sequences (for example, sets do not support repetition, *)
  2. "distinct" ... Sets do not contain duplicates, so any operation or function used with them, from set creation to set intersection, results in a set of unique elements
  3. "hashable objects" ... This basically means that the values in sets must be immutable

One last thing not apparent in the definition... sets are mutable, that is they can be modified (elements can be added, removed, changed).

So, in Python, sets are:

  1. unordered
  2. distinct
  3. mutable, but elements contained must be immutable

Throughout the module, we'll be printing out sets. However, the example output may not match up exactly with the output that you get when running the same code. This is because sets are unordered, and consistent ordering is not guaranteed!

Syntax and Creation

To create a set:

  • start with curly braces, and then within the curly braces, place values... comma separated if more than one
    s1 = {1, 2, 3}
    s2 = {1}
    print(s1, s2) # {1, 2, 3} {1}
    print(type(s1), type(s2)) # <class 'set'> <class 'set'>
  • an alternative is to use the set function; pass in an iterable object, such as a list, string, or tuple, as the argument ... and each element will become a distinct element in the set
    s1 = set([1, 2, 3])
    s2 = set('abc')
    print(s1, s2) # {1, 2, 3} {'a', 'b', 'c'}
  • an empty set can be created by calling the set function without any arguments... ⚠️ do not use two curly braces without elements as an empty set - it actually represents a different empty data type (a dictionary)!
    empty = set()

Regardless of how you create a set, your set will be composed of distinct elements - that is, there will be no duplicates in your set:

s1 = {1, 1, 1, 1, 1}
print(s1) # {1}
s2 = set("settttttttt!!!!!")
print(s2) # {'s', 'e', 't', '!'}

When working with sets (creating, adding elements, etc.) the elements that make up the set must be immutable. If you try to use a mutable type with a set, you'll get a runtime error ☹️:

uh_oh = {1, 2, []} # TypeError!

Operations and Basic Methods

Python sets support the following methods / operators:

  • union, | - returns new set composed of all elements in one combined with elements from another set(s)
  • intersection, & - returns new set composed of all elements that are common between one set and another set(s)
  • difference, - - returns a new set composed of elements that are in one set but not in another
  • issubset/issuperset, <=/>= - determines if one set is either a subset or superset of another set

Note that these operations can be used by working with operators or by calling methods. Let's take a look at operators first. This demonstrates five basic set operators: | (union), & (intersection), - (difference), <= (subset), and >= (superset)...

s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
print(s1 | s2) # union: {1, 2, 3, 4, 5, 6}
print(s1 & s2) # intersection: {3, 4}
print(s1 - s2) # difference: {1, 2}
print({1, 3} <= s1) # is subset: True
print({1, 2, 3, 100} >= s1) # is superset: False

Both the <= (subset) and >= (superset) operators also return True if both operands are equal. If you want a "proper" subset or superset (that is, one where equal sets do not count as subsets or supersets of each other), use < and >.
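
The difference between the two kinds of comparison can be sketched with a small illustrative set:

```python
a = {1, 2}

print(a <= {1, 2})     # True: equal sets count as subsets
print(a < {1, 2})      # False: a proper subset must be strictly smaller
print(a < {1, 2, 3})   # True
print({1, 2, 3} > a)   # True: proper superset
```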

The union, intersection and difference operators can be chained (note that union and intersection are commutative; the order of the operands does not matter):

print({1} | {2, 3} | {4} | {5} | {6})  # {1, 2, 3, 4, 5, 6}
print({2} & {1, 2, 3} & {2, 3, 4, 5} & {1, 2}) # {2}
print({1, 2, 3, 4}  - {3, 4} - {2}) # {1}

All of these operators must have a set for each operand... otherwise, a runtime error will occur:

{1, 2, 3} | [4, 5] # TypeError!

Python also supports these operations as methods. Each operation has a corresponding method named after it: .union, .intersection, .difference, .issubset, and .issuperset. Each method either returns a new set or, in the case of .issubset and .issuperset, returns a boolean. Neither the original set nor the argument is modified. Additionally, these methods accept iterable types other than sets as the "other operand".

s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}

print(s1.union(s2))  # {1, 2, 3, 4, 5, 6}
print(s1.intersection(s2)) # {3, 4}
print(s1.difference(s2)) # {1, 2}
print(s1.issubset({1, 2, 3, 4, 5})) # True

# note that iterables can be passed in as well!
print(s1.union('abc')) # {1, 2, 3, 4, 'a', 'b', 'c'}
print(s1.issuperset([1, 2, 2, 2])) #  True

More Operators

As we saw above, sets have operators and methods that are different from those of sequence types like lists, strings and tuples. However, because a set is still a collection of elements, there are some similarities: you can retrieve the number of elements in a set (len), test for membership in a set (in, not in), and iterate over a set (with for), just like lists, strings and tuples.

Sample Program: The program below demonstrates in, len and for loops used with sets.
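
The interactive program isn't reproduced in this text, so here is a minimal sketch of what such a program might look like. The set colors and its contents are made up for illustration:

```python
colors = {'red', 'green', 'blue'}

# len gives the number of elements
print(len(colors))           # 3

# in / not in test for membership
print('red' in colors)       # True
print('pink' not in colors)  # True

# a for loop visits every element, but sets are unordered,
# so the order of the printed colors is not guaranteed
for color in colors:
    print(color)
```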



Modifying a Set

Even though the elements in a set are immutable, a set itself is mutable; it can be changed! Sets can be changed by using "augmented assignment operators" (much like incrementing with +=), and there are also several methods that can be called to change a set.

Augmented assignment operators

|=, &=, and -= can all be used to modify a set through union, intersection and difference. They all modify the set on the left hand side of the augmented assignment operator used:

s = {1, 2, 3}
s |= {4, 5}
print(s) # {1, 2, 3, 4, 5}
s &= {3, 4, 5, 6, 7}
print(s) # {3, 4, 5}
s -= {3, 4}
print(s) # {5}

Methods

There are several methods that can be called on sets that add, remove or change elements.

  1. Adding
    • .add(ele) - add a new element, ele, to the set; does not return a value
  2. Removing
    • .remove(ele) - removes element, ele, from the set; does not return a value... error if ele isn't in the set
    • .discard(ele) - removes element, ele, from the set; does not return a value... no error if ele isn't found
    • .pop() - removes an arbitrary element from the set and returns it... error if there are no elements in the set
  3. Modifying
    • .update(update_set) - updates the elements in the set using the elements in update_set (similar to union); does not return a value

Sample Program: The following code uses add and remove to modify a set.
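
The interactive program isn't reproduced in this text. A minimal sketch along the same lines, using a made-up set called pets (and also showing discard and update for comparison), might look like this:

```python
pets = {'cat', 'dog'}

pets.add('fish')        # adds a new element
pets.add('cat')         # 'cat' is already there, so nothing changes
print(len(pets))        # 3

pets.remove('dog')      # 'dog' is in the set, so this works
pets.discard('dragon')  # not in the set, but discard doesn't complain

try:
    pets.remove('dragon')  # remove raises a KeyError if the element is missing
except KeyError:
    print('no dragon to remove')

pets.update(['bird', 'cat'])  # update accepts any iterable
print(sorted(pets))     # ['bird', 'cat', 'fish']
```

The call to sorted is only there to make the printed order predictable, since a set by itself has no guaranteed order.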



All Together

Programming Challenge: Write a program that asks the user for 4 words. Then the program should continually ask the user for words that the user remembers; prompting the user to continue after taking in a remembered word. At the end, the program will list out all of the correctly remembered words without duplicates. See the example interaction below (text after a colon, :, is entered by the user).

Give me 4 words
Word 1, plz: foo
Word 2, plz: foo
Word 3, plz: bar
Word 4, plz: baz
How many words can you remember?
Give me a word you remember:baz
Do you remember any more words (Y to give more words): Y
Give me a word you remember:WAT????
Do you remember any more words (Y to give more words): Y
Give me a word you remember:foo
Do you remember any more words (Y to give more words): N
These are the words you remembered correctly:
* foo
* baz




Immutable Sets

If you want an immutable set, you can use the frozenset type. Immutable sets are created by calling the built-in frozenset (you can pass in a set... or any other iterable... to create the new frozenset). Most regular set operators will work with a frozenset, but methods that mutate the set (like add, remove, etc.) won't.

Again, frozensets support regular set operations:

s1 = frozenset({1, 2})
s2 = frozenset({2, 3})
print(s1 | s2) # frozenset({1, 2, 3})

But you can't change them, so they don't have methods like add:

s = frozenset([1, 2])
s.add(3) # AttributeError!

Oddly, augmented assignment operators (like |=) work, but what they're actually doing is creating a new frozenset and binding the variable name to it (rather than mutating the original). So, while the behavior seems similar to regular sets, there's actually a big difference behind the scenes.
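
One way to see this difference is to keep a second name pointing at the original object and check whether it changes. The variable names below are made up for this sketch:

```python
fs = frozenset([1, 2])
original = fs            # a second name for the same frozenset

fs |= {3}                # builds a new frozenset and rebinds the name fs
print(fs)                # frozenset({1, 2, 3})
print(original)          # frozenset({1, 2}), the original is unchanged
print(fs is original)    # False

s = {1, 2}
same = s                 # a second name for the same regular set
s |= {3}                 # mutates the existing set in place
print(s is same)         # True
print(same)              # {1, 2, 3}
```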

Motivations

We now know quite a few compound data types: strings, lists, tuples, sets, and even range objects. Why would you use a set over these other types? Use a set when you want to take advantage of its characteristics and behavior:

  1. when you want to create a collection of elements that will never have duplicates
  2. when you want to use common set operations, like union, intersection, and difference
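
Both motivations can show up in the same few lines. As a rough sketch with made-up guest lists:

```python
guests_saturday = ['ana', 'bo', 'cy', 'ana']
guests_sunday = ['bo', 'dee', 'bo']

# converting a list to a set drops the duplicates
unique_saturday = set(guests_saturday)
print(len(unique_saturday))   # 3

# intersection finds the guests who show up on both days
both_days = unique_saturday & set(guests_sunday)
print(both_days)              # {'bo'}
```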

Type Annotations

Type annotations can be added for a set. Just like lists and tuples, you'll have to bring in a type from the typing module... and you can specify the types contained within a set (in Python 3.9 and later, the built-in set can also be used directly, as in set[str]):

from typing import Set
aliens: Set[str] = {'zim', 'alf', 'jet', 'et'}

Bonus: Hashable Objects

The definition of a set included the term hashable objects. What does that mean?

A hash is a function that transforms or maps a value into some other value:

  • the original value can be some arbitrary size (in terms of storage)
  • but the resulting value will be a fixed size (for example, a 32-bit integer)
  • an example may be a function that takes the string "potato" and maps it to the value 123
  • some hash functions may be designed to make it very unlikely that two different input values will cause the same output value (that is, the function is collision resistant)
  • if two inputs are the same, they should produce the same output

The fact that hashing produces the same output if both inputs are the same could be useful when comparing values... especially when comparing values that would be cumbersome to compare otherwise (think of comparing two tuples, each with 100 values vs comparing two numbers).

There's a built-in function in Python called hash... and it will return the hashed value, an integer, of the object passed into it (note that the resulting integer from hashing the same value, such as a string, may differ between program runs):

# the following print statements should produce
# two different integers...
print(hash("potato"))
print(hash(24))

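If two objects are equal, then their hashes are the same! (The reverse isn't guaranteed: two different objects can occasionally end up with the same hash, which is called a collision.)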

meal_1 = "breakfast"
meal_2 = "breakfast"

print(hash(meal_1) == hash(meal_2)) # True
print(meal_1 == meal_2) # True
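
One caveat is worth a quick illustration: equal objects always hash to the same value, but matching hashes alone don't prove that two objects are equal. The second half of this sketch relies on an implementation detail of CPython (the standard Python interpreter), so treat it as an assumption rather than a rule of the language:

```python
# equal objects always have equal hashes
print(1 == 1.0)              # True
print(hash(1) == hash(1.0))  # True

# ...but equal hashes don't mean the objects are equal.
# Assumption: this particular collision is specific to CPython
# and may not happen on other Python implementations.
print(hash(-1) == hash(-2))  # True on CPython
print(-1 == -2)              # False
```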

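Python's built-in immutable objects (numbers, strings, and tuples that contain only hashable elements) can be hashed. Built-in mutable objects like lists and sets cannot be hashed (if an object could change after being hashed, its hash would no longer be a reliable way to find it or compare it with something else!).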

# a list is mutable, so it can't be hashed
hash(['toast', 'jam']) # TypeError!

Aaaaannnd... back to our definition of a set: an unordered collection of hashable objects. That just means that every element in a set must be immutable (otherwise, the element would not be hashable). Why is it important that sets have hashable elements? It allows for quicker look-up of elements in a set:

  • imagine that the elements of a set are stored in buckets, each with a label (this data structure is called a hash table)
  • the label of the bucket is the hash of the element in the bucket
  • to retrieve an element, hash it to create a label
  • ...and retrieve the element at that label

This is a very high-level explanation of how sets work, but it's adequate to show that it's faster than going through every element in the set, and checking to see if it's equal to the element you're trying to find.
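
The bucket idea can be sketched in a few lines of Python. This is only a toy model for illustration, not how Python's set is actually implemented; the names NUM_BUCKETS, bucket_label, toy_add and toy_contains are all made up:

```python
NUM_BUCKETS = 8  # arbitrary size chosen for this sketch

def bucket_label(element):
    # turn the element's hash into one of the bucket labels (0 through 7)
    return hash(element) % NUM_BUCKETS

# each bucket is a small list; the bucket's index is its label
buckets = [[] for _ in range(NUM_BUCKETS)]

def toy_add(element):
    bucket = buckets[bucket_label(element)]
    if element not in bucket:  # no duplicates, just like a set
        bucket.append(element)

def toy_contains(element):
    # only the one bucket with the matching label is searched,
    # not every element that has been stored
    return element in buckets[bucket_label(element)]

for word in ['toast', 'jam', 'eggs', 'jam']:
    toy_add(word)

print(toy_contains('jam'))     # True
print(toy_contains('cereal'))  # False
```

Two different elements can land in the same bucket, which is why each bucket here is a list that still gets checked with in.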


List, Set and Dictionary Comprehensions

Note that the video and material below also discuss dictionary comprehensions. We have not worked with dictionaries yet, so those sections can be skipped until the module that covers dictionaries is completed.

Mapping and Filtering

Two operations that are commonly used to create new lists are map and filter:

  • map transforms all of the elements in an old list in some way, and it puts all of the resulting new values in a new list.
  • filter creates a new list composed of the elements from an old list that pass a certain test

Sample Program: Map values to a new list: in this example, we adjust a list of test scores so that they're all incremented by 5. This is implemented by using an accumulator, a normal for loop, and incrementing.
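
The interactive program isn't reproduced in this text. A minimal sketch of the pattern, with made-up scores, might look like this:

```python
scores = [80, 75, 92, 60]

adjusted_scores = []  # the accumulator starts out empty
for score in scores:
    # transform each old value and add the result to the new list
    adjusted_scores.append(score + 5)

print(adjusted_scores)  # [85, 80, 97, 65]
print(scores)           # [80, 75, 92, 60], the original list is unchanged
```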



Sample Program: Another common task we have is creating a new list out of elements in a list that meet a certain condition. This is called filtering. Here's an example of that, again using an accumulator, regular iteration with a for loop... and an if statement to conditionally include the original element.

We start off with a list of musicians:

artists = ['billie eilish', 'the cure', 'lil uzi vert', 'roy orbison', 'lil wayne', 'the knife']

The program below picks out a list of all of the artists that have a name that starts with the string "lil"... and puts them into a new list. We can use the same pattern as map, but this time, just add an if statement:
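
The interactive program isn't reproduced in this text. A minimal sketch of it, assuming the startswith string method is used for the test, might look like this:

```python
artists = ['billie eilish', 'the cure', 'lil uzi vert',
           'roy orbison', 'lil wayne', 'the knife']

lil_artists = []  # the accumulator starts out empty
for artist in artists:
    # only elements that pass the test are added to the new list
    if artist.startswith('lil'):
        lil_artists.append(artist)

print(lil_artists)  # ['lil uzi vert', 'lil wayne']
```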



Note that we could also combine both patterns to filter and map at the same time.
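
As a rough sketch of the combined pattern, the loop below keeps only the artists whose names start with "lil" and converts each kept name to uppercase (the name loud_lil_artists is made up for this example):

```python
artists = ['billie eilish', 'the cure', 'lil uzi vert',
           'roy orbison', 'lil wayne', 'the knife']

loud_lil_artists = []
for artist in artists:
    if artist.startswith('lil'):                   # filter
        loud_lil_artists.append(artist.upper())    # map

print(loud_lil_artists)  # ['LIL UZI VERT', 'LIL WAYNE']
```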

List Comprehensions

Map and filter are so common that Python has a concise syntax to create lists by transforming elements and / or filtering elements without having to write 3 or 4 lines of code. Python allows us to create new lists by manipulating elements in an iterable object... all in one line. This language feature is called a list comprehension - an expression that gives back a new list by mapping and / or filtering values in an existing iterable object (such as a list, range, tuple, set, etc.)

Going back to our original map example pattern with a for loop and accumulator, we can make a generic version:

iterable_object = ...
accumulator = []
for element in iterable_object:
  accumulator.append(expression that uses element to create new_element)

Where:

  • iterable_object is the value that is looped over
  • accumulator is the variable where we put the new list
  • element is the loop variable
  • expression transforms the loop variable, element, with the result being added to the accumulator

With a list comprehension, we can do the same, but in a single line! Here's the generic version first:

accumulator = [expression for element in iterable_object]

Taken step-by-step:

  1. start a list comprehension by using square brackets:
    • []
  2. define the loop variable and iteration
    • [for element in iterable_object]
  3. write an expression using the loop variable, the result will be added to the accumulator
    • [expression for element in iterable_object]
    • expression may be something like: element * 2
    • [element * 2 for element in iterable_object]

You can also filter elements:

accumulator = [element for element in iterable_object if some_condition]

...where some_condition controls whether or not an element is included in the accumulator. If the condition is True, then the element is added. Typically, you'll use the loop variable, element, in some_condition:

accumulator = [element for element in iterable_object if len(element) > 3]

Lastly, you can filter and map in the same list comprehension:

accumulator = [element * 2 for element in iterable_object if len(element) > 3]

Now let's see this in action using our original sample programs:

Sample Program: Convert the two examples from earlier into list comprehensions:

  • take a list of scores and create a new list composed of all of the old scores incremented by 5...
  • take a list of musicians and create a new list composed only of the musicians from the original list that have a name that starts with "lil"

At the end, there's an example of filtering and mapping at the same time... taking all of the musicians whose names start with "lil" and converting their names to uppercase.
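
The interactive program isn't reproduced in this text. A minimal sketch of the three comprehensions, using made-up scores and assuming startswith is used for the test, might look like this:

```python
scores = [80, 75, 92, 60]
artists = ['billie eilish', 'the cure', 'lil uzi vert',
           'roy orbison', 'lil wayne', 'the knife']

# map: every score incremented by 5
adjusted_scores = [score + 5 for score in scores]
print(adjusted_scores)  # [85, 80, 97, 65]

# filter: only the artists whose names start with "lil"
lil_artists = [artist for artist in artists if artist.startswith('lil')]
print(lil_artists)      # ['lil uzi vert', 'lil wayne']

# filter and map at the same time
loud_lil_artists = [artist.upper() for artist in artists
                    if artist.startswith('lil')]
print(loud_lil_artists)  # ['LIL UZI VERT', 'LIL WAYNE']
```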



Programming Challenge: Write code to...

  1. create a list of the square root of all of the numbers 1 through 100 (inclusive)
  2. given this list, greetings = ['hi', 'hello', 'hola'], create a new list based off of greetings where, for every element in the original list, all occurrences of 'h' are replaced with 'j' and a number of exclamation points equal to the number of characters in the string is added to the end
  3. create a list of unicode code points of every character in the string 'Made in NY' that's lowercase (note that islower doesn't work in the online editor, so use an IDE or compare unicode code points; 97 through 122 are lowercase letters)

Print out each list created. See the expected output below.

[1.0, 1.4142135623730951, 1.7320508075688772, 2.0, … 10.0]
['ji!!', 'jello!!!!!', 'jola!!!!']
[97, 100, 101, 105, 110]




Set and Dictionary Comprehensions

A similar syntax can be used to create sets and dictionaries. (There are no tuple comprehensions, though!)

For sets, wrap your set comprehension in curly braces instead of square brackets. The code below takes a sentence (from a Quora question about sentences with repeated words) and gives a set of the words that appear in the sentence:

  • duplicates are removed because we use a set comprehension
  • lower normalizes casing on the words
  • the conditional filters out articles (a, an, and the) from the resulting set

sentence = "You cannot end a sentence with because because because is a conjunction"
words = {w.lower() for w in sentence.split(' ') if w.lower() not in ('a', 'an', 'the')}
print(words)

For a dictionary, also use curly braces. However, the expression at the beginning of the dictionary comprehension must specify both a key and a value, separated by a colon.

The following code makes a new dictionary with keys that are each letter in the string, "abc"... and the values all initialized to 0:

d = {k: 0 for k in 'abc'}
print(d) # {'a': 0, 'b': 0, 'c': 0}

Quiz

Now that you've completed this module, please visit our NYU Classes site and take the corresponding quiz for this module. These quizzes are worth 5% of your total grade and are a great way to test your Python skills! You may also use the following scratch space to test out any code you want.


Copyright 2014-2018