Collection¶
People often need to deal with a collection of objects in a consistent way.
Python provides three kinds of language constructs to support collection operations:
- built-in collection types that can contain multiple elements
- constructs that create new collections from existing ones
- classes that contain a data collection
Data Types¶
- sequence
  - str: an immutable sequence of characters.
  - list: a mutable sequence of any objects.
  - tuple: an immutable sequence of objects.
- set
  - set: a mutable unordered collection of unique objects.
- mapping
  - dict: a mutable mapping of key and value pairs.
Creating New Collections From Existing Ones¶
Python provides two convenient constructs to create new collections:
- List/set/dictionary comprehension
- Generator expression
These constructs often make a program simpler and more efficient.
1 String¶
A string is a sequence of characters.
A string is an immutable object.
A programmer creates a string literal by surrounding text with single or double quotes, such as 'MARY', "MARY", '41', or "41".
An empty string is a sequence with 0 elements, created with a pair of single or double quotes. Ex: my_str = "" or my_str = ''.
Python uses the backslash \ to escape special characters. For example, "\n" represents a newline character. The backslash is also used to escape a backslash or a quotation mark. For example: "a backslash \\ and an escaped \" double quotation mark".
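The escape rules above can be sketched as follows. This is a minimal illustration; the variable names and sample strings are made up for the example.

```python
# Escape sequences inside string literals (sample values are illustrative).
two_lines = "first\nsecond"       # \n is one newline character
path = "C:\\temp"                 # \\ is one backslash character
quoted = "She said \"hi\""        # \" is a double quote inside double quotes
same_quoted = 'She said "hi"'     # no escape needed inside single quotes

print(two_lines)               # prints two lines: first, then second
print(path)                    # C:\temp
print(len("\n"))               # 1
print(quoted == same_quoted)   # True
```

Choosing the other quote style, as in same_quoted, often avoids the need to escape quotation marks at all.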
1.1 Built-in String Operations¶
A programmer can access a character at a specific index using alphabet[index]. The index starts from 0 and ends with len(alphabet) - 1.
The len() built-in function can be used to find the length of a string (and any other sequence type).
Use + to concatenate two strings.
Use in to determine if a character or a substring exists in a string. It returns True or False.
Use a for loop to iterate over every character.
text = "Hello World"
print(len(text))  # 11
print(text + " to every one.")  # Hello World to every one.
print("H" in text, "hi" in text)  # True False
for char in text:
    print(char, end=" ")
# H e l l o   W o r l d
1.2 Slicing¶
Python has a special slicing syntax sequence[start:stop:step] to get a subset of a sequence.
- start: optional, the starting index of the slice, with a default of 0.
- stop: optional, the index at which the slice ends (exclusive), with a default of len(sequence).
- step: optional, the step value, with a default of 1.
The defaults for start and stop given here apply when step is positive.
You can slice a string to get another string.
Slicing works similarly on other sequence types, such as lists and tuples.
text = "hello world"
print(text[2]) # regular index: "l"
print(text[2:]) # from index 2 to end: "llo world"
print(text[0:3]) # first three: "hel"
print(text[::4]) # every fourth character: "hor"
print(text[3::2]) # from index 3, every other character: "l ol"
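The same slicing syntax applies to lists and tuples, and it also accepts negative indexes and a negative step. A small sketch with made-up sample data:

```python
letters = ["a", "b", "c", "d", "e"]
point = (10, 20, 30)

middle = letters[1:4]        # a list slice is a new list
last_two = letters[-2:]      # negative start counts from the end
backwards = letters[::-1]    # a step of -1 walks the sequence in reverse
first_two = point[:2]        # a tuple slice is a new tuple

print(middle)       # ['b', 'c', 'd']
print(last_two)     # ['d', 'e']
print(backwards)    # ['e', 'd', 'c', 'b', 'a']
print(first_two)    # (10, 20)
```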
1.3 Common String Methods¶
Because a string is an immutable object, any method that appears to change the string content returns a new string instead.
Following are some examples:
hi = "Hi"
text = "hello world"
# find the index of the first occurrence of a substring, -1 if not found
print(text.find('o')) # 4
print(text.find("alice")) # -1
# Lower case, upper case and title case
print(hi.lower()) # hi
print(text.upper()) # HELLO WORLD
print(text.title()) # Hello World
# split a string into a list by the specified separator string
# default is white space
print(text.split()) # ["hello", "world"]
print(text.split("ll")) # ["he", "o world"]
# replace a substr with another substr
print(hi.replace('i', 'a')) # Ha
print(text.replace("world", "alice")) # hello alice
# join a list of strings together with the desired separator string
print(", ".join(["alice", "bob", "cindy"])) # alice, bob, cindy
2 List¶
A list is a sequence container (similar to a string) that holds a number of objects. The objects in a list are called elements or items of the list.
It is a mutable object whose value can be changed in place.
List elements can be of different types, but it is a best practice to put same-type elements into a list.
2.1 Creating A List¶
There are two common ways to create a list.
First, you can create a list literal by listing elements in brackets, separated by commas.
Second, Python has a built-in list() function that can convert any iterable object, such as a range or a string, to a list.
# create a list from literals
some_numbers = [3, 5, 7]
names = ['Alice', 'Bob', 'Cindy']
# elements can be of different types - not recommended
some_data = [3, 'Alice', 12.5]
# use list() function
generated_numbers = list(range(3, 8, 2))
letters = list('abc')
# print can print a list directly
print(generated_numbers)  # [3, 5, 7]
print(letters)  # ['a', 'b', 'c']
2.2 Basic Operations¶
A list has a sequence of elements. A basic requirement is to access one element, all elements or some elements.
- one element: index
- all elements: loop
- some elements: slice
2.2.1 Index¶
Each element in a list has an index associated with it, starting from 0.
The first element has an index of 0, the second element has an index of 1, and so on.
The last element has an index of the list length minus 1.
If the index is out of range, Python raises an IndexError exception.
You can use negative index numbers to access elements from the end of the list. For example, -1 identifies the last element in a list, -2 identifies the next-to-last element, and so on.
The index syntax is to put an index in a pair of brackets, right after the list variable name.
numbers = [3, 5, 7]
print(numbers[0], numbers[1], numbers[2])  # 3 5 7
print(numbers[-1], numbers[-2], numbers[-3])  # 7 5 3
# oops, IndexError if the index is out of range
print(numbers[5])
2.2.2 Unpack a List¶
You can unpack a list and assign its elements to different variables.
You can prefix one variable with a * to collect any number of remaining elements into a list.
Use _ as the variable name for elements you don't need.
numbers = [1, 2, 3, 4, 5]
first, second, *rest = numbers
print(first, second, rest)  # 1 2 [3, 4, 5]
first, _, third, *_ = numbers
print(first, third)  # 1 3
first, second, *_ = numbers
print(first, second)  # 1 2
first, *_, last = numbers
print(first, last)  # 1 5
2.2.3 Accessing All Elements¶
You can loop (iterate) over a list to access all its elements.
Both for and while loops can be used, but the Pythonic way is to use for because it is simpler and less error-prone than a while loop.
When you need the element index, the Pythonic way is to use the built-in enumerate() function.
numbers = list(range(3, 8, 2))
# use for
for number in numbers:
    doubled = number * 2
    print(f'Double element {number} is {doubled}')
# use enumerate if you need the index
for index, number in enumerate(numbers):
    print(f'Index: {index}, Value: {number}')
# Not recommended
length = len(numbers)
for index in range(length):
    print(f'Index: {index}, Value: {numbers[index]}')
# Not recommended
index = 0
while index < len(numbers):
    print(f'Index: {index}, Value: {numbers[index]}')
    index += 1
2.2.4 Slicing a List¶
A slice is a span of items taken from a list. It is used to select some elements from a list. To slice a list, you use list_name[start : end : step] to specify the start index, the end index, and the step. Like the range syntax, a slice doesn't include the end index. Following are some examples:
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# index from 0 to 5, excluding 5
weekday = days[0:5]
print(f'Weekdays are: {weekday}')
# default start is 0
weekday2 = days[:5]
print(f'Weekdays version 2 are: {weekday2}')
weekends = days[5:7]
print(f'Weekends are {weekends}')
# default end is the length
weekends2 = days[5:]
print(f'Weekends version 2 are {weekends2}')
# a step of 2 takes every other day: the 1st, 3rd, 5th and 7th
odd_days = days[::2]
print(f'Odd days are {odd_days}')
2.3 List and Built-in Operations¶
Python has several built-in operators and functions that work with a list.
- Operators
  - in: check if an item is a list element.
  - not in: check if an item is not a list element.
  - +: combine two lists.
  - *: repeat a list a number of times.
  - del: delete an element from a list (strictly speaking, del is a statement rather than an operator).
- Functions
  - len: get the length of a list.
  - min: get the minimum element of a list.
  - max: get the maximum element of a list.
  - sum: get the sum of the numeric elements of a list.
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
weekday = days[:5]
today = 'Thu'
is_weekday = today in weekday
print(f'Today is {today}. Is weekday: {is_weekday}')
is_weekend = today not in weekday
print(f'Today is {today}. Is weekend: {is_weekend}')
tens = [10, 20, 30]
hundreds = [500, 600, 700]
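# avoid naming the variable all, which would shadow the built-in all() function
combined = tens + hundreds
print(combined)  # [10, 20, 30, 500, 600, 700]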
repeated_tens = tens * 3
print(repeated_tens)
numbers = [3, 5, 7]
length = len(numbers)
smallest = min(numbers)
biggest = max(numbers)
total = sum(numbers)
print(f'Length: {length}, Min: {smallest}, Max: {biggest}, Sum: {total}')
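The del statement is listed above but not exercised in the example. A minimal sketch with made-up sample numbers, showing that del works with a single index, a negative index, or a slice:

```python
numbers = [10, 20, 30, 40, 50]

del numbers[0]       # delete the first element
print(numbers)       # [20, 30, 40, 50]

del numbers[-1]      # delete the last element
print(numbers)       # [20, 30, 40]

del numbers[0:2]     # delete a slice of elements
print(numbers)       # [40]
```

Unlike + and *, del modifies the list in place and does not produce a value.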
2.4 List Methods¶
Lists have numerous methods that you can use to manipulate a list.
You use list_name.method_name() to call a method that works on a list.
It is important to differentiate an in-place modification from an operation that returns a new list. Both + and slicing return a new list.
Method Examples¶
- list_name.append(element): add an element to the end of the list.
- list_name.index(element): find the first index of an element; raise a ValueError if the item is not found. To avoid the exception, use element in list_name to check for existence first.
- list_name.insert(index, element): insert an item at the specified index.
- list_name.sort(): sort the items in the list in place.
You can find more list methods in the Python list documentation.
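numbers = [3, 5, 7]
n2 = numbers      # another name for the same list object
n3 = numbers[:2]  # a slice is a new list
numbers.append(42)
print(numbers)  # [3, 5, 7, 42]
if 5 in numbers:
    print(numbers.index(5))  # 1
numbers.insert(1, 50)
print(numbers)  # [3, 50, 5, 7, 42]
numbers.sort()
print(numbers)  # [3, 5, 7, 42, 50]
print(n2)  # [3, 5, 7, 42, 50], because n2 and numbers are the same object
print(n3)  # [3, 5], because the slice was not affected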
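The difference between an in-place modification and an operation that returns a new list shows up clearly when sorting. The sketch below contrasts the sort() method with the built-in sorted() function; the sample scores are made up.

```python
scores = [42, 7, 19]

ordered = sorted(scores)    # sorted() returns a new list
print(ordered)              # [7, 19, 42]
print(scores)               # [42, 7, 19], the original is unchanged

result = scores.sort()      # sort() reorders the list in place
print(scores)               # [7, 19, 42]
print(result)               # None, so don't write scores = scores.sort()
```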
2.5 Nested List¶
A list can have other lists as its elements. Nested lists need no special syntax: you chain indexes to reach an element inside an inner list. For example:
numbers = [1, [2, 3], [4, 5, 6]]
numbers[1].append(42)
del numbers[2][2]
print(numbers)  # [1, [2, 3, 42], [4, 5]]
2.6 List as Mutable Argument¶
Be careful when you pass a list as an argument to a function because the list is a mutable object.
If the function changes the list in its body, the caller's object changes too, because a parameter is just another name for the same object.
It is a best practice not to change a mutable argument in a function body. If there is a need, make a copy of the object first.
However, list.copy() makes a shallow copy. If elements of a list are mutable, for example nested lists, you need a deep copy. Check the Python copy module documentation for more details.
def report_sum(numbers):
    numbers.append(10)  # modifies the caller's list
    print(sum(numbers))

scores = [3, 5, 7]
print(f"before call scores are {scores}")  # [3, 5, 7]
report_sum(scores)  # 25
print(f"after call scores are {scores}")  # [3, 5, 7, 10]

def report_sum2(numbers):
    new_numbers = numbers.copy()  # work on a copy instead
    new_numbers.append(10)
    print(sum(new_numbers))

scores = [3, 5, 7]
print(f"before call scores are {scores}")  # [3, 5, 7]
report_sum2(scores)  # 25
print(f"after call scores are {scores}")  # [3, 5, 7]
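For nested lists, the difference between a shallow copy and a deep copy can be sketched as follows. The grid data is made up for illustration; copy.deepcopy comes from the standard copy module.

```python
import copy

grid = [[1, 2], [3, 4]]
shallow = grid.copy()         # new outer list, but the inner lists are shared
deep = copy.deepcopy(grid)    # new outer list and new inner lists

grid[0].append(99)            # change an inner list through the original

print(shallow[0])   # [1, 2, 99], the shallow copy sees the change
print(deep[0])      # [1, 2], the deep copy is independent
```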
2.7 A Stack¶
A stack is a data structure that stores elements in a last in, first out (LIFO) manner. For example, the Python runtime uses a stack, named the call stack, to manage function calls. A stack supports two basic methods:
- append, which adds an element to the top of a stack. The operation is often called push.
- pop, which removes and returns the element at the top of a stack.
When using a list to implement a stack, the top is the end of the list. You can append (push) an element and pop an element. Both operate at the end of a list.
numbers = [1, 2, 3]
numbers.append(37)
numbers.append(42)
top = numbers.pop()
print(top)  # 42, the element pushed last
print(numbers)  # [1, 2, 3, 37]
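As one possible use of a list-based stack, the sketch below checks whether brackets in a text are balanced. The function name is_balanced and the sample inputs are hypothetical, chosen only to illustrate push and pop.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for char in text:
        if char in "([{":
            stack.append(char)   # push an opening bracket
        elif char in pairs:
            # a closing bracket must match the most recent opening bracket
            if not stack or stack.pop() != pairs[char]:
                return False
    return not stack             # leftover opening brackets mean unbalanced


print(is_balanced("[(1 + 2) * {3}]"))   # True
print(is_balanced("(]"))                # False
print(is_balanced("(("))                # False
```

The last opening bracket pushed is the first one that must be closed, which is exactly the LIFO order a stack provides.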
3 Tuple¶
A tuple consists of a number of values separated by commas.
A tuple is an immutable object.
A tuple is often used to represent a short sequence of data of different types. For example, a function can return multiple values as a tuple.
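Returning multiple values as a tuple can be sketched like this. The function name min_max is hypothetical and used only for illustration.

```python
def min_max(numbers):
    """Return the smallest and the largest value as a tuple."""
    return min(numbers), max(numbers)   # the comma builds a tuple


result = min_max([3, 5, 7])
print(result)                 # (3, 7)

smallest, largest = result    # unpack the tuple into two variables
print(smallest, largest)      # 3 7
```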
Tuple Operations¶
Tuple operations are similar to list operations, except that a tuple doesn't support any write operation because it is immutable.
You access tuple elements by index or slicing.
You can unpack a tuple and assign its elements to different variables.
You can use built-in operators (such as + and ==) and built-in functions (such as len() and max()) with a tuple.
# create a tuple
person1 = "Alice", 19
print(person1[0]) # Alice
# you can create a tuple in parentheses
person2 = ("Bob", 20)
print(person1 == person2)  # False
name, age = person1
print(name, age)  # Alice 19
# an empty tuple
empty = ()
# a tuple with one element
one = 1,
numbers = 1, 2, 3, 4, 5
print(numbers[0]) # 1
print(numbers[::2]) # (1, 3, 5)
print(max(numbers)) # 5
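first, second, *rest = numbers
print(first, second, rest)  # 1 2 [3, 4, 5]
# use *_ to ignore the remaining elements; first, _ = numbers would raise
# a ValueError because the tuple has five elements
first, *_ = numbers
print(first)  # 1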
Tuple Is Immutable, But¶
A tuple is a collection of elements.
More precisely, it holds a collection of references to elements. The number and the position of the references are immutable.
However, if a reference points to a mutable object such as a list, the data in that list can be changed.
So a tuple is an immutable structure, but the values it refers to may change.
alice = "Alice", [3, 4, 5]
alice[1].append(42)
print(alice) # ('Alice', [3, 4, 5, 42])
# however, you cannot use assignment to replace a tuple element
# each of the following lines raises
# TypeError: 'tuple' object does not support item assignment
alice[0] = "Bob"
alice[1] = [42]
4 Set¶
A set is an unordered collection of unique elements. Sets have the following properties:
- unordered: Elements in the set do not have a position or index.
- unique: No elements in the set share the same value.
- immutable elements: all elements must be immutable (more precisely, hashable).
Built-in functions such as len(), min(), and max() work with a set as they do with a list or tuple.
You can use a for loop to iterate over all elements of a set, though the order might change between runs.
But you can't access an individual set element by index as you do with a list or a tuple.
You often use the in operator to check membership. A set also supports mathematical set operations such as union, intersection, difference, and symmetric difference.
4.1 Create a Set¶
There are two ways to create a set:
- using {} to enclose a sequence of objects separated by ,
- using set() to create a set from another iterable such as a list or a tuple.
You can only use set() to create an empty set because {} creates an empty dictionary.
fruits = {"apple", "orange", "banana"}
numbers = {1, 2, 3, 2, 3, 5}  # a set removes duplicated elements
print(numbers)  # {1, 2, 3, 5}
odds = set(range(1, 6, 2))
print(odds)  # {1, 3, 5}
4.2 Immutable Elements in Set¶
A set is kind of special because it is a mutable object but all its elements must be immutable.
For example, the following code raises a TypeError because a list is a mutable object. Technically, set elements must be hashable, and Python's built-in mutable containers such as list do not support the hash operation.
The tuple version works because a tuple is immutable (as long as its own elements are hashable).
odds = [1, 3, 5]
evens = [2, 4]
# TypeError: unhashable type: 'list', because a list is mutable
my_set = {odds, evens}
# now it works
odds_2 = (1, 3, 5)
evens_2 = (2, 4)
my_set = {odds_2, evens_2}
4.3 Set Operations¶
Use the in or not in operator to check membership.
You can add or remove elements.
A set supports mathematical set operations such as union, intersection, difference, and symmetric difference.
fruits = {"apple", "orange", "banana"}
print("apple" in fruits) # True
print("kiwi" in fruits) # False
fruits.add('kiwi')
fruits.add("apple") # redundant, ignored
fruits.remove("apple")
more_fruits = {"kiwi", "pear"}
print(fruits) # {'banana', 'kiwi', 'orange'}
print(more_fruits) # {'pear', 'kiwi'}
# union
print(fruits | more_fruits) # {'orange', 'banana', 'pear', 'kiwi'}
# difference
print(fruits - more_fruits) # {'orange', 'banana'}
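Intersection and symmetric difference are mentioned above but not shown. A minimal sketch using the & and ^ operators, with sample sets that mirror the fruit example:

```python
fruits = {"banana", "kiwi", "orange"}
more_fruits = {"kiwi", "pear"}

common = fruits & more_fruits             # intersection: elements in both sets
either_not_both = fruits ^ more_fruits    # symmetric difference

print(common)                     # {'kiwi'}
print(sorted(either_not_both))    # ['banana', 'orange', 'pear']

# the method forms accept any iterable, not only sets
print(fruits.intersection(["kiwi", "plum"]))   # {'kiwi'}
```

sorted() is used for the second print only to get a stable display order, since a set has no defined order.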
5 Dictionary¶
A dictionary is a collection of elements where each element has two parts: a key and a value. In other words, an element is a key-value pair. Since Python 3.7, a dictionary keeps its elements in insertion order, but you look up elements by key, not by position.
The key can be any object as long as it is immutable (more precisely, hashable). Common key types include int and str.
People use dictionaries to store key-value pairs so that it is easy to look up a value by its key. For example, you use a student_id to retrieve a student object.
5.1 Create a Dictionary¶
You use {} to create a dictionary. An empty {} creates an empty dictionary. You can use a dictionary variable as a boolean expression to check if it is empty: an empty dictionary is false and a non-empty one is true. To create elements, write a sequence of key: value pairs separated by , inside the braces.
Another approach to create a dictionary is using the built-in dict() function. The argument is a sequence of key-value pairs. If the keys are simple strings, you can pass them as keyword arguments.
empty_dict = {}
print(empty_dict)
students = {90: 'Alice', 27: 'Bob', 50: 'Cindy'}
print(students)
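# a repeated key keeps only the last value, so 90 maps to 'Cindy'
more_students = {90: 'Alice', 27: 'Bob', 90: 'Cindy', 200: 'Mike'}
print(more_students)  # {90: 'Cindy', 27: 'Bob', 200: 'Mike'}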
# use the dict() built-in function
students = dict([(90, 'alice'), (27, 'bob')])
print(students)
my_dict = dict(A='alice', B='Bob')
print(my_dict)
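Using a dictionary as a boolean expression to check for emptiness might look like the sketch below. The phone_book name and the sample entry are made up for illustration.

```python
phone_book = {}

if not phone_book:
    print("The phone book is empty")

phone_book["alice"] = "555-0100"    # made-up sample entry

if phone_book:
    print(f"The phone book has {len(phone_book)} entry")
```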
5.2 Read or Write a Dictionary Element¶
You use dictionary_name[key] to access an individual element.
You can read or update the value in the key-value pair. There is no way to change a key in place because it is immutable; instead, delete the pair and add a new one.
Be careful, there are two cases that could go wrong when using dictionaries:
- A nonexistent key raises a KeyError exception. To avoid it, use the get method with a specified default value. For example: students.get(42, 'Unknown')
- When dictionary_name[key] is on the left-hand side of an assignment, you set a new value for an existing key or create a new key-value pair if the key doesn't exist. A typo in the key silently creates a new pair instead of raising an error, which can be a hard-to-find bug.
students = {90: 'Alice', 27: 'Bob', 50: 'Cindy'}
# read a value for a key
name_with_id_90 = students[90]
print(name_with_id_90)
# change a value for a key
students[90] = 'Mike'
print(students[90])
# add a new key-value pair because 97 doesn't exist
students[97] = 'Bill'
print(students)
# reading a value for a nonexistent key raises a KeyError exception
name_nobody = students[404]
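The get method avoids the KeyError shown above. A minimal sketch with sample student data:

```python
students = {90: 'Alice', 27: 'Bob'}

known = students.get(90, 'Unknown')       # key exists: its value is returned
unknown = students.get(404, 'Unknown')    # key missing: the default is returned
no_default = students.get(404)            # without a default, None is returned

print(known)        # Alice
print(unknown)      # Unknown
print(no_default)   # None
```

Note that get never adds the missing key to the dictionary; it only reads.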
5.3 Other Operations¶
The built-in len function tells how many elements are in a dictionary.
The in and not in operators test whether a key exists in a dictionary.
The del statement deletes a key-value pair from a dictionary if the specified key exists; otherwise, it raises a KeyError exception. The syntax is del dictionary_name[key]. To avoid the exception, use in to make sure the key is there before del.
month_days = {'Jan': 31, 'Apr': 30, 'Jul': 31}
print(f'It has {len(month_days)} elements')
if 'Jan' in month_days:
    print('Jan is in the dictionary')
if 'Feb' not in month_days:
    print('Feb is not in the dictionary')
if 'Jan' in month_days:
    del month_days['Jan']
print(month_days)
# raises a KeyError exception because the key no longer exists
del month_days['Jan']
print(month_days)
5.4 Iterate a Dictionary¶
- You can use for key in dictionary_name: to iterate over all keys of a dictionary. Then you use dictionary_name[key] to access each value.
- The items() method returns a view of the key-value pairs. Therefore, you can use for key, value in dictionary_name.items(): to iterate over a dictionary.
- The values() method returns all values. Since Python 3.7 they come in insertion order; don't assume they are sorted.
- The keys() method returns all keys.
month_days = {"Jan": 31, "Apr": 30, "Jul": 31}
for month in month_days:
    print(f'{month} has {month_days[month]} days')
for month, days in month_days.items():
    print(f'{month} has {days} days')
days_sequence = month_days.values()
for days in days_sequence:
    print(days, end=' ')
print()
for key in month_days.keys():
    print(f'Month key is {key}', end='; ')
print()
5.5 More Methods¶
The dictionary has more methods. The following is a list of commonly used methods. Try them.
- clear: remove all elements.
- pop: remove the key-value pair and return its value. For example: month_days.pop("Jan").
- popitem: remove the most recently inserted element from the dictionary and return it as a (key, value) tuple. For example: month_days.popitem().
You can also use the built-in del statement to remove a key-value pair without a return value. For example: del month_days["Jan"]
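The removal methods can be tried out as in the sketch below, which reuses the month_days sample data. Passing a default as the second argument of pop is shown as one way to avoid a KeyError for a missing key.

```python
month_days = {"Jan": 31, "Apr": 30, "Jul": 31}

jan_days = month_days.pop("Jan")         # removes 'Jan' and returns 31
missing = month_days.pop("Feb", None)    # key absent: returns the default
last_item = month_days.popitem()         # removes the last inserted pair

print(jan_days)     # 31
print(missing)      # None
print(last_item)    # ('Jul', 31)
print(month_days)   # {'Apr': 30}

month_days.clear()
print(month_days)   # {}
```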
Exercise: please write a phone book program that lets users input and query phone book entries by first name or phone number. The search should be case-insensitive.
6 Creating New Collections From Existing Ones¶
Python provides two convenient constructs to create new collections:
- List/set/dictionary comprehension: create a list/set/dictionary from an iterable object.
- Generator expression: create a lazy iterable object from another iterable object.
An iterable object is a collection object or anything else that you can loop over with a for loop. For example, range(5) is an iterable object.
Some iterables, such as range objects and generators, are lazy: they produce one value at a time. range(1_000_000_000) doesn't create one billion numbers when it is initialized; it produces one number in each iteration of a for loop. Thus it doesn't need memory for all the numbers.
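One way to observe this laziness is to compare the memory footprint of a range object with that of a list. The sketch below uses sys.getsizeof from the standard library; the exact byte counts depend on the Python implementation and version, so they are not shown.

```python
import sys

lazy_numbers = range(1_000_000)         # one million values, none stored
stored_numbers = list(range(1_000))     # only a thousand values, all stored

lazy_size = sys.getsizeof(lazy_numbers)
stored_size = sys.getsizeof(stored_numbers)

print(lazy_size)      # small, and the same for any range length
print(stored_size)    # grows with the number of elements
print(lazy_size < stored_size)   # True

# a range still computes any element on demand
print(lazy_numbers[999_999])     # 999999
```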
6.1 Motivation¶
When you want to create a list from an iterable with a simple computation, a list comprehension can simplify the code.
For example, to create a list of the square roots of the first 5 non-negative integers without a comprehension, you can use either of the following code snippets:
import math
roots = []
for number in range(5):
    roots.append(math.sqrt(number))
print(roots)

import math
numbers = range(5)
# you need to use list() to get the list result
roots = list(map(math.sqrt, numbers))
print(roots)
6.2 List Comprehension¶
Python lets you use a list comprehension to simplify the code. A list comprehension has the syntax [expression for member in iterable].
import math
roots = [math.sqrt(number) for number in range(5)]
print(roots)
6.3 Filtering Elements¶
You can add an if condition clause to a list comprehension to keep only the elements that satisfy the condition.
lower_letters = [char for char in "Hello World" if char.islower()]
print(lower_letters) # ['e', 'l', 'l', 'o', 'o', 'r', 'l', 'd']
6.4 Set and Dictionary Comprehensions¶
Similar to a list comprehension, you can create a set or a dictionary from an iterable object.
Just replace the square brackets [] with curly braces {}. A dictionary comprehension uses a key: value expression.
lower_letters = {char for char in "Hello World" if char.islower()}
print(lower_letters) # {'d', 'e', 'r', 'l', 'o'}
squares = { number: number * number for number in range(5)}
print(squares) # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
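A dictionary comprehension can also build a new dictionary from an existing one. The sketch below swaps keys and values of a sample students dictionary and collects short names into a set; the variable names are illustrative.

```python
students = {90: "Alice", 27: "Bob", 50: "Cindy"}

# swap keys and values: look up an id by name
ids_by_name = {name: student_id for student_id, name in students.items()}
print(ids_by_name)    # {'Alice': 90, 'Bob': 27, 'Cindy': 50}

# a set comprehension with a filter
short_names = {name for name in students.values() if len(name) <= 3}
print(short_names)    # {'Bob'}
```

Swapping keys and values like this only works cleanly when the values are unique and hashable.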
7 Generator Object¶
A generator object is an iterable object that you can use in a for loop.
Python has a keyword yield. A function whose body contains yield is a generator function: calling it returns a generator object, which produces one value in each iteration of the for loop.
The following function is a generator function that returns a generator object.
def squares(size):
    for number in range(size):
        yield number * number

five_squares = [square for square in squares(5)]
print(five_squares)  # [0, 1, 4, 9, 16]
7.1 Why Generator?¶
The reason for having generator functions and generator objects is so-called lazy computation.
Suppose that you have one billion data records. It is impossible or inefficient to load all records into the computer memory. Lazy computation, which loads and processes one record at a time, has many benefits:
- you can show some result/progress as soon as the first item is processed.
- you don't need a large memory to hold all records.
- you can stop processing the rest of the data under certain conditions.
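The benefits listed above can be sketched with a generator that stands in for a large data source. The read_records function and its record layout are hypothetical; a real program would read from a file or a database instead.

```python
def read_records(count):
    """Hypothetical record source that yields one record at a time."""
    for record_id in range(count):
        print(f"loading record {record_id}")
        yield {"id": record_id, "value": record_id * 10}


found = None
for record in read_records(1_000_000_000):
    if record["value"] >= 30:
        found = record
        break    # the remaining records are never loaded

print(found)     # {'id': 3, 'value': 30}
```

Only four records are loaded before the loop stops, even though the source nominally holds one billion.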
7.2 Generator Expression¶
Instead of using a generator function, you can use a generator expression to create a generator object from an iterable object.
The syntax is similar to a list comprehension; just use () in place of [].
A generator expression always produces a generator object, whose values are computed one at a time. If the source iterable is itself lazy, such as another generator object, the whole pipeline stays lazy.
squares = (number * number for number in range(5))
for square in squares:
    print(square, end=", ")
# output: 0, 1, 4, 9, 16,
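Two practical points about generator objects are worth a short sketch: they can be passed directly to functions such as sum(), and they can be consumed only once.

```python
squares = (number * number for number in range(5))

total = sum(squares)        # consumes the generator
print(total)                # 30

leftover = list(squares)    # the generator is now exhausted
print(leftover)             # []

# the extra parentheses can be omitted when the generator expression
# is the only argument of a function call
total_again = sum(number * number for number in range(5))
print(total_again)          # 30
```

If you need the values more than once, build a list with a list comprehension instead.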