Python¶
The first sentence of The Python Tutorial
Python is an easy to learn, powerful programming language.
Part of the reason for this claim is that Python Data Model defines the interfaces of the building blocks (constructs) of the language itself.
Why? What? How?
Python Data Model¶
- Overview
- Special Methods
- Scalar Value: Number and Boolean
- Collection
- Syntax: Callable and Loop
- meta-programming
- Attribute
- Class
Overview¶
- Motivation
- Language Constructs
- What is Python Data Model?
- Pythonic Style
Motivation¶
There are three basic requirements in the design of a programming language:
- Consistent: it is easy to predict the behaviors of a construct.
- Composable: constructs can be selected and assembled in various combinations to enable desired behaviors.
- Open: developers can create new types that work in the same way as the built-in types and standard library types.
Language Constructs¶
- Data
- Atomic types and composite types (sequence, mapping, and set)
- Built-in types and new types (Classes)
- Operations
- built-in operators such as
+
,-
,>=
, list index[]
, function call()
, and so on. - built-in functions such as
len()
,repr()
,bool()
, etc. - user defined functions
- built-in operators such as
- Program Control Structure: specific syntax for statements, branches and loops. For example:
for
loop statement orwith
context manager statement.- work with user defined types
- Meta programming: write code that generate code
What is Python Data Model?¶
Python data model is the set of APIs that defines the interfaces of the language constructs that satisfies the three basic requirements:
- consistent: it is standardized by Python language specification and PEPs.
- Composable: the APIs work well with each other.
- Open: new objects fit well with the Python language syntax.
It is defined in Python Language Reference Data Model.
Pythonic Style¶
"There should be one-- and preferably only one --obvious way to do it." - The Zen of Python
Python promotes an idiomatic coding style, the so-called Pythonic style, that leverages Python data model and demonstrates idiomatic language features.
For example, to find an object's length, you use the built-in len()
function, not a function like length()
/size()
, or a method like my_object.len()
or my_object.size()
.
Every Python developers should be familiar with common Python idioms. Following are two resources:
Common Python Idioms: List Comprehension and Generator Expression¶
As described in Python Docs, two common operations on an iterator’s output are
- performing some operation on every element
- selecting a subset of elements that meet some condition.
List comprehensions and generator expressions (short form: listcomp
and genexp
) are a concise notation for such operations, borrowed from the functional programming language Haskell
.
- use
liscomp
for small data collection that can be created in one call. - use
genexps
for large data sets or you don't need all the values at once.
# list comprehension: create a list all at once
# to create a list of squares of numbers from 0 to 9
result1 = []
for i in range(10):
result1.append(i * i)
# using list comprehension - a common idiom in python
result2 = [i * i for i in range(10)]
# using a generator expression: computes the values as necessary (lazy evaluation)
weights = [0.1, 0.2, 0]
inputs = [8.5, 0.65, 1.2]
bias = 2
sum1 = 0
for weight, input in zip(weights, inputs):
sum1 += weight * input
sum1 += bias
# using a generator expression, generate a pair each time when needed
# good for large data sets or you don't need all the values at once
wi_in_gen = (weight * input for weight, input in zip(weights, inputs))
# now all values are needed, one by one
sum2 = sum(wi_in_gen, bias)
print(sum1, sum2) # 2.98 2.98
import random
TEN_MILLION = 10**7
# Generator expression to create one million random numbers between 1 and 100
# just a generator expression, no numbers are generated yet
random_numbers = (random.randint(1, 100) for _ in range(TEN_MILLION))
# Display the first 10 numbers - only generate the first 10 numbers
first_ten = [next(random_numbers) for _ in range(10)]
print(first_ten)
Python Data Model: Special Methods¶
The Python data model is a set of APIs. The APIs are defined as a set of standard special methods.
All special methods follow a special naming style: starting and ending with double underscores: __*__
. They are known as dunder
(double underscore) methods.
Developers should not create or use any dunder
identifier not standardized by the language reference because they are subject to breakage without warning in future Python versions.
In a Python interpreter, built-in functions, operators, and special syntax invoke these special class methods to perform data operations.
Build-in Functions¶
len()
invokes the __len__()
method to get the length/size of an object.
repr()
invokes the __repr__()
method to compute the string representation (serialized string) of an object. It is used by developer for debugging purpose.
str()
, format()
, and print()
invokes __str__()
method to compute a user friendly representation of an object. The default implementation of object
calls object.__repr__()
.
Built-in Operators¶
+
invokes the __add__()
method on its left operand. If the first operand doesn't define the __add__()
method, it invokes __radd__()
method of the right operand. If both are not defined, it returns NotImplemented
exception.
==
invokes the __eq__()
method. By default, object implements eq() by using is
that checks if two references point to the same object. In most cases, this is not what you want and you should implement the __eq__()
method.
self[key]
invokes self.__getitem__(self, key)
for sequence type where key
is an integer and mapping type where key
is any immutable value.
Method call self(...)
invokes self.__call__(self, ...)
. If a class defines __call__(self, ...)
method, its instances are callable using syntax instance(...)
.
Special Syntax¶
for
statement uses __iter__
method to loop over items of a collection.
with
statement uses __enter__
and __exit__
methods to manage object context. Classes deal with file, database, and network should use the two methods to manage resources.
Built-in Types and New Types¶
Python defines a set of built-in types. Each type has a set of valid operations. New types are defined using class
to emulate the built-in type behaviors. Following sections will give examples emulating built-in types and customizing new type creation.
- Number and Bool
- Collection
- Class and Metaclass
Emulating Numeric Operations¶
Python is a high level programming language that has built-in numeric operators and functions such as +
, -
(unary negation or binary subtraction), *
, /
, //
(floor division), %
, **
, <
, <=
, ==
, !=
, >
, >=
, abs
, &
(bitwise And), ~
(bitwise inversion), ^
(bitwise XOR), |
(bitwise Or), and so on.
Each of these operators or functions has one or more corresponding special methods. For example, +
invokes either __add__
of its left operand or __radd__
of its second operand if the __add__
is not defined by the left operand.
If a new type defines the corresponding methods, an instance of the type can be operands of the built-in operators and functions
+
and *
Operators¶
As an example, the following code defines a Value
type that works well with +
and *
. It also defines __repr__
to have a better string representation of the data.
class Value:
def __init__(self, data):
self.data = data
def __repr__(self):
return f"Value(data={self.data})"
def __add__(self, other):
out = self.data + other.data
return Value(out)
def __mul__(self, other):
out = self.data * other.data
return Value(out)
a = Value(2.0)
b = Value(3.0)
c = Value(-2.0)
print(a + b * c)
Boolean Value and Operation with Scalar Values¶
Any Python object can be used in a boolean context or be an operand of built-in bool()
function. Boolean context include conditions in if
or while
statement, or as operands of and
, or
, and not
logical operators. Every object is either truthy
or falsy
in a boolean context.
By default, any instance of a new type is truthy
unless either __bool__()
or __len__()
method is defined in the type. In a boolean context or a call of bool()
, the __bool()__
method is called. If the __bool__()
method is not defined, Python calls __len__()
method. If the result is 0, it is falsy
(False
). Otherwise, it is truthy
(True
).
The Value
type defines a __bool__()
method in the following code.
Additionally it let Value
with scalar numbers.
class Value:
def __init__(self, data):
self.data = data
def __repr__(self):
return f"Value(data={self.data})"
def __add__(self, other):
other = other if isinstance(other, Value) else Value(other)
out = self.data + other.data
return Value(out)
def __mul__(self, other):
other = other if isinstance(other, Value) else Value(other)
out = self.data * other.data
return Value(out)
def __bool__(self):
return bool(self.data)
a = Value(0)
b = Value(3)
print(bool(a), bool(b), (a + 1) * b) # False True Value(data=3.0)
# how about 2 + a, 3 * b, a - b
class Value:
def __init__(self, data):
self.data = data
def __repr__(self):
return f"Value(data={self.data})"
def __add__(self, other):
other = other if isinstance(other, Value) else Value(other)
out = Value(self.data + other.data)
return out
def __mul__(self, other):
other = other if isinstance(other, Value) else Value(other)
out = Value(self.data * other.data)
return out
def __rmul__(self, other): # other * self
return self * other
def __neg__(self): # -self
return self * -1
def __sub__(self, other): # self - other
return self + (-other)
def __radd__(self, other): # other + self
return self + other
a = Value(0)
b = Value(3)
print(bool(a), bool(b), 2 * (1 + a) - b) # False True Value(data=6.0)
Collection¶
Python built-in collection types include str
, list
, tuple
, range
, set
, dict
, and so on. They all have three special methods:
__len__()
to support built-inlen()
function orbool()
function.__iter__()
to supportfor
, unpacking, and other iteration operations.__contains__
to supportin
operator.
Except set
, all collection types support getting a value by a key (an index or any immutable object) using syntax obj[key]
.
- It is equivalent to
type(obj).__getitem__(obj, key)
. - The
__getitem__
can also be used to support iteration without defining__iter__
method.
By implementing the corresponding special methods, a new type can emulate a built-in collection type the works well in a Pythonic style.
from collections import namedtuple
Item = namedtuple("Item", "name quantity")
class ShoppingList:
def __init__(self, items):
self.items = items
self.names = [name for (name, _) in items]
def __getitem__(self, index):
return self.items[index]
def __len__(self):
return len(self.items)
def __contains__(self, name):
return name in self.names
milk = Item("Milk", 1)
banana = Item("Banana", 5)
bread = Item("Bread", 2)
my_list = ShoppingList([milk, banana, bread])
print(my_list[0]) # Item(name='Milk', quantity=1)
for item in my_list:
print(item) # print each item
print(f"There are {len(my_list)} items.") # There are 3 items.
has_milk = "Milk" in my_list
has_chip = "Chip" in my_list
print(has_milk, has_chip) # True False
# to improve for loop
from collections import namedtuple
Item = namedtuple("Item", "name quantity")
class ShoppingList:
def __init__(self, items):
self.items = items
self.names = [name for (name, _) in items]
# __getitem__ method is for indexing
def __getitem__(self, index):
return self.items[index]
# more efficient than __getitem__ for iteration
def __iter__(self):
return iter(self.items)
def __len__(self):
return len(self.items)
def __contains__(self, name):
return name in self.names
Callable¶
Both functions, classes, and methods are callable in Python: you append a pair of parentheses after the name of a function or a class. Python allows instances of a class to be callable like functions by define a __call__
method in the class.
It is often used to implement function-like behavior for a class instance. For example, the following class allows each instance to have a different count start and step.
class Counter:
def __init__(self, start=0):
self._count = start
def __call__(self):
self._count += 1
return self._count
counter = Counter()
print(counter()) # Output: 1
print(counter()) # Output: 2
counter = Counter(10)
print(counter()) # Output: 11
print(counter()) # Output: 12
Meta-programming: Code That Generates Code¶
Python has a set of special methods that can be used to customize the class definition. These methods include:
- Customize instance creation
- Instance creation
- Attribute management
- Advanced: customize class creation
- Class creation:
__init_subclass__
- Class decorator
metaclass
- Class creation:
In meta-programming, instances and classes are objects that are created and customized at runtime.
Python tools/frameworks such as PyTorch
and Django
use meta-programming to make it easy to develop application. An application developer don't use them often but it is better to know the concepts.
Why Meta Programming¶
- Simple: writing less code
- Powerful: expressive code
For example, in data analysis, one needs a simple way to create classes that mainly hold data. The data class should have good string representation and be able to tell their equality.
Without `@dataclass``,¶
The following code use type hints for better readability.
class Person:
def __init__(self, name: str, age: int):
self.name = name
self.age = age
def __repr__(self):
return f"Person(name={self.name}, age={self.age})"
def __eq__(self, other):
if not isinstance(other, Person):
return False
return self.name == other.name and self.age == other.age
With @dataclass
¶
# both __repr__ and __eq__ methods are defined
from dataclasses import dataclass
@dataclass
class Person:
name: str
age: int
alice = Person("Alice", 30)
bob = Person("Bob", 30)
alice2 = Person("Alice", 30)
print(alice) # Person(name=Alice, age=30)
print(alice == bob) # false
Instance and Attributes¶
- Instance creation:
__new__
. You use it to customize instance creation and initialization. - Attribute management:
__init__
,__getattribute__
,__getattr__
,__setattr__
,property
and descriptor. You use these methods to control the.
attribute access behavior of a class and its instances.
The __init__(self, ...)
Method¶
Most classes only define the __init__(self, ...)
method to set attributes of an instance of the class. There are some questions for this method:
- When is it called? You might know the answer: it is called when you create an instance by calling a
cls(...)
wherecls
is a class name. You almost never call this method directly. - The
__init__(self, ...)
returns nothing, how could Python create the instance? It is not clear. - Where does the first argument
self
come from? It is not clear.
There is more to explain about the instance creation.
Instance Creation¶
There are two steps to create an instance of a class:
__new__()
static method creates a new instance. It is the constructor method. Its first argument is a class, often named ascls
. It returns a new instance that is passed to the next initialization step.__init()__
instance initializer method that set the attributes of an instance. Its first argument is the newly created instance, often named asself
. It has no return value.
All classes are subclass of the object
base class. If a class doesn't define any of the method, Python calls the default implementation defined in the object
base class.
__new__()
Use Cases¶
You rarely need to define __new__()
. It is often used to develop frameworks or libraries.
__new__()
is mainly used to define subclasses of immutable types (like int
, str
, or tuple
) to customize instance attributes because it is too late to change anything once the instance is created.
It is also used in a metaclass in order to customize class creation.
Following is an example that create a Name
instance that has titled string. The first char is an uppercase one. Because str
is immutable, you cannot change it in __init__()
method.
class Name:
def __new__(cls, name):
instance = super().__new__(cls)
instance.name = name.title()
return instance
def __repr__(self) -> str:
return self.name
name1 = Name("alice")
name2 = Name("BOB")
print(name1, name2) # Alice Bob
Property¶
A property let you use a set of methods as a normal data attribute. It is Pythonic way to implement getter
and setter
methods to customize the attribute behavior.
The advantage of using methods behind a data attribute are:
- Uniform access: you can use a simple data attribute or methods without change its usage.
- Getter and setter control: you can validate, transform the attribute access behaviors.
- Computed properties: a method allows to calculate and/or cache the result.
import math
class Circle:
def __init__(self, radius):
self._radius = radius
@property # this is the getter method
def radius(self):
return self._radius
@radius.setter # this is the setter method
def radius(self, value):
if value < 0:
raise ValueError("Radius cannot be negative")
self._radius = value
@property # calculated read-only value
def area(self):
return math.pi * self._radius**2
circle = Circle(1)
radius = circle.radius
area = circle.area
print(f"Radius {radius} has an area of {area}")
# output: Radius 1 has an area of 3.141592653589793
circle.radius *= 10
radius = circle.radius
area = circle.area
print(f"Radius {radius} has an area of {area}")
# output: Radius 10 has an area of 314.1592653589793
circle.radius = -1 # ValueError: Radius cannot be negative
Descriptor¶
Descriptors are used to customize the behavior of getting, setting, or deleting an attribute's value. Descriptors provide a general mechanism to control attribute access in Python classes.
A Python property is actually a specific implementation of the descriptor that often used to decorate class methods. It can also be used as a typical descriptor that is defined as a class attribute.
One descriptor can be used by multiple attributes.
Descriptor Implementation¶
To create a descriptor, you typically define one or more of the following methods within a class:
__set_name__(self, owner, attribute_name)
: this method store the attribute name in each descriptor instance.__get__(self, instance, owner)
: This method is called when you access the attribute's value. The parameters are descriptor instance, the instance of the object it's accessed on, and the class of that object. You should return the value you want to provide for the attribute.__set__(self, instance, value)
: This method is called when you set the attribute's value. The parameters are the descriptor instance, the instance of the object it's set on, and the new value. You can implement custom logic to handle the setting of the value.__delete__(self, instance)
: This method is called when you delete the attribute. It is rarely used.
class PositiveNumber:
def __set_name__(self, owner, attribute_name):
# different attribute has different storage name
self.storage_name = attribute_name
def __get__(self, instance, owner):
# this is a low level function that
# must directly manipulate object mapping.
return instance.__dict__[self.storage_name]
def __set__(self, instance, value):
if value < 0:
raise ValueError("Value cannot be negative")
instance.__dict__[self.storage_name] = value
class Size:
# define a class attribute with a descriptor instance
# the attribute is actually stored/accessed in an instance of Size
width = PositiveNumber()
length = PositiveNumber()
def __init__(self, width, length):
self.width = width
self.length = length
@property
def area(self):
return self.width * self.length
desk_size = Size(2, 5)
print(desk_size.area)
desk_size.length = 10
print(desk_size.area)
desk_size.length = -1 # ValueError: Value cannot be negative
Dynamic Attributes¶
Dynamic attributes are useful when you need to store additional data with an object, but the attribute names are not known until runtime.
All defined attributes are usually stored in the object's __dict__
that is a dictionary.
The special method __getattr__(self, name)
is called when the name
attribute is not found in the current object. Therefore, it can be used to define dynamic (computed) attribute value or raise AttributeError
if the requested name is invalid.
The __setattr__(self, name, value)
method is called whenever an attribute is assigned a value on an instance. You can define this method in your class to control and customize the behavior of attribute assignment.
Both property and descriptor use these attribute management methods to create the read/write attributes. If possible, you should use property to define dynamic attributes because it is the simplest. Descriptor is the choice if multiple classes/attributes have the same logic. Django models
is a descriptor.
class DynamicConfig:
def __init__(self, config_dict):
self.config_dict = config_dict
def __getattr__(self, name):
if name in self.config_dict:
return self.config_dict[name]
else:
raise AttributeError(f"Config attribute '{name}' not found")
# Create a dynamic config object
config = DynamicConfig({"database": "mysql", "host": "localhost", "port": 3306})
# Access dynamic attributes
print(config.database) # Output: mysql
print(config.host) # Output: localhost
print(config.port) # Output: 3306
# Try accessing a non-existent attribute
try:
print(config.username)
except AttributeError as e:
print(e) # Output: Config attribute 'username' not found
Too Much Redundancy¶
Python is famous for its simplicity. But the following object-oriented programming code
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def __repr__(self):
return f"Person(name={self.name}, age={self.age})"
# Create an __eq__ method that compares the attributes
def __eq__(self, other):
return (self.name == other.name) and (self.age == other.age)
is horrible because
- each attribute is repeated THREE times
- each attributed is repeated the fourth time in
__repr__
- each attributed is repeated the fifth time in
__eq__
- both
__repr__
and__eq__
have the same logic for all data classes: print and compare each attribute.
Class Attributes Have Simple Syntax¶
Class is an also object
in Python - it has class attributes that are typed ONCE.
class Person:
name: str # better syntax using type hint
age: int # type hint
def __repr__(self) -> str:
return f"Person(name={self.name}, age={self.age})"
However
- Class attributes are shared by all its instances.
- Assigning the same attribute to an instance creates a new instance attribute
# all instances share the same class attributes
alice = Person()
bob = Person()
Person.name = "Alice"
Person.age = 30
bob.name = "Bob" # it creates a new instance attribute
print(alice, bob) # Person(name=Alice, age=30) Person(name=Bob, age=30)
The Solution: Creating Instance Attributes from the Class Attributes¶
Python provides three approaches to customize class/instance creation:
- allow a base class to customize its subclass behaviors.
- class decorator: it takes a class as an argument and returns a - decorated class with desired behavior.
- metaclass: define the behavior and structure of other classes.
Solution 1: Base Class¶
The special method _init_subclass__
is defined in a base class to customize the creation of its subclasses.
However, it is not recommended because inheritance a strong relationship - the chid class inherits everything from its parent class.
class DataClassBase:
def __init_subclass__(cls) -> None:
attributes = [attr for attr in cls.__annotations__]
def _init(self, *args, **kwargs):
for attr, value in zip(attributes, args):
setattr(self, attr, value)
for attr, value in kwargs.items():
setattr(self, attr, value)
def _repr(self):
attrs = ", ".join(f"{attr}={getattr(self, attr)}" for attr in attributes)
return f"{cls.__name__}({attrs})"
def _eq(self, value: object) -> bool:
if not isinstance(value, self.__class__):
return False
return self.__dict__ == value.__dict__
setattr(cls, "__init__", _init)
setattr(cls, "__repr__", _repr)
setattr(cls, "__eq__", _eq)
class Person(DataClassBase):
name: str
age: int
class Point(DataClassBase):
x: float
y: float
alice = Person("Alice", 30)
bob = Person("Bob", 25)
print(alice, bob, alice == bob)
p1 = Point(1.0, 2.0)
p2 = Point(30.0, 50.0)
Solution 2: Class Decorator¶
A class decorator takes a class as an argument and returns a new class to replace the decorated class. Because it can be applied to any class, it is more flexible and more complex than the base class approach. @dataclass
is a class decorator defined in standard library. It customizes class attributes such as instance attributes, __init__()
, __repr__()
, __eq__()
, and so on.
For the case of the Vector
class, the class decorator logic is similar to the VectorBase
.
def MyDataClass(cls):
attributes = [attr for attr in cls.__annotations__]
def _init(self, *args, **kwargs):
for attr, value in zip(attributes, args):
setattr(self, attr, value)
for attr, value in kwargs.items():
setattr(self, attr, value)
def _repr(self):
attrs = ", ".join(f"{attr}={getattr(self, attr)}" for attr in attributes)
return f"{cls.__name__}({attrs})"
def _eq(self, value: object) -> bool:
if not isinstance(value, self.__class__):
return False
return self.__dict__ == value.__dict__
setattr(cls, "__init__", _init)
setattr(cls, "__repr__", _repr)
setattr(cls, "__eq__", _eq)
return cls
@MyDataClass
class Person:
name: str
age: int
@MyDataClass
class Point:
x: float
y: float
alice = Person("Alice", 30)
bob = Person("Bob", 25)
print(alice, bob, alice == bob)
p1 = Point(1.0, 2.0)
p2 = Point(30.0, 50.0)
print(p1, p2, p1 == p2)
Solution 3: Metaclass¶
Metaclass is the most advanced and most capable approach to customize class creation. However, it is the most complex one that should be avoided if other approaches work for you. If you are not sure whether you need it, you don't.
A metaclass is a class whose instances are classes -- a class' class, thus the name metaclass. It is essentially a class factory.
A meta class is used in the metaclass
argument class MyClass(metaclass=MyMetaClass): ...
__new__()
and __init__()
¶
When Python sees a class definition like class MyClass(metaclass=MyMetaClass): ...
, it first calls MyMetaClass.__new__()
to create a new class. Then it calls MyMetaClass.__init__()
to set the new class attributes.
The MyMetaClass.__init__()
method has the following arguments:
cls
: the new class created by the__new__()
method.name
: the name of the new class.bases
: a tuple consists of base classes of the new class.attributes
: a mapping represents the attributes of the new class.
For the simple purpose of reducing boilerplate code of the Person
class, the logic is similar to other examples. Again, it is for demo purpose, you hardly need to use it in your application development or data analysis career.
class DataClassMeta(type):
def __init__(cls, name, bases, dct):
fields = cls.__annotations__
def __init__(self, *args, **kwargs):
for field_name in fields:
if field_name in kwargs:
setattr(self, field_name, kwargs[field_name])
else:
if args:
setattr(self, field_name, args[0])
args = args[1:]
def __repr__(self):
field_values = ", ".join(
f"{field_name}={repr(getattr(self, field_name))}"
for field_name in fields
)
return f"{name}({field_values})"
def __eq__(self, other):
if not isinstance(other, cls):
return False
return all(
getattr(self, field_name) == getattr(other, field_name)
for field_name in fields
)
# Attach the __init__, __repr__, and __eq__ methods to the new class
setattr(cls, "__init__", __init__)
setattr(cls, "__repr__", __repr__)
setattr(cls, "__eq__", __eq__)
class Person(metaclass=DataClassMeta):
name: str
age: int
class Point(metaclass=DataClassMeta):
x: float
y: float
alice = Person("Alice", 30)
bob = Person("Bob", 25)
print(alice, bob, alice == bob)
p1 = Point(1.0, 2.0)
p2 = Point(30.0, 50.0)
print(p1, p2, p1 == p2)
Summary¶
Python languages constructs are consistent, composable and open. You can define new data types that work seamlessly with
- built-in operators such as
+
,-
,>=
, list index[]
, function call()
, and so on. - built-in functions such as
len()
,repr()
,bool()
, etc. - specific syntax such as
for
loop statement andwith
context manager statement.
You can customize new types using meta-programming that make your code simple and powerful.
- For attributes:
@property
, descriptor, dynamic attributes - For class: Base class, class decorator, metaclass.