You're computer scientists, so you know how to code, and Python is so intuitive that you can pick most of it up by looking at example code. This notebook is a quick review of standard Python syntax. The only distinctive bits are section 3.5 on Comprehensions and section 4.1 on Functions. For the rest, please just skim through, and then try the (unassessed) warmup exercises in ex0.
We can use Python interactively like a calculator. Here are some simple expressions and their values. Try entering these yourself, in your own notebook, then press shift+enter or choose Cell | Run Cells from the menu.
3+8
1.618 * 1e5
x = 3
y = 2.2
z = 1
x * y + z
If we want to type in a very long line, we can split it using a backslash.
"Perhaps the immobility of the things that surround us is forced " \
+ "upon them by our conviction that they are themselves, and not " \
+ "anything else, and by the immobility of our conceptions of them. "
Jupyter will only show the output from the last expression in a cell. If we want to see multiple values, print them explicitly. Alternatively, let the last expression be a tuple.
print(x * y + z)
print(x * (y + z))
"A tuple of results:", x*y+z, x*(y+z)
Python does its best to print out helpful error messages. When something goes wrong, look first at the last line of the error message to see what type of error it was, then look back up to see where it happened. If your code isn't working and you ask for help in the Q&A forum on Moodle, please include the error message!
x = 'hello'
y = x + 5
y
      1 x = 'hello'
----> 2 y = x + 5
      3 y

TypeError: can only concatenate str (not "int") to str
7 / 3 # floating point division
7 // 3 # integer division (rounds down)
min(3,4), max(3,4), abs(-10)
round(7.4), round(-7.4), round(3.4567, 2)
3**2 # power
5 <<1, 5 >> 2 # bitwise shifting
7 & 1, 6 | 1 # bitwise operations
(3+4j).real, (3+4j).imag, abs(3+4j) # complex numbers
The usual logical operators work too, though the syntax is wordier than in other languages. Python's truth values are True and False.
3**2 + 4**2 == 5**2 # use == to test if values are equal
(x,y,z) = (5, 12, False)
x < y or y < 10 # precedence: (x < y) or (y < 10)
x < y and not y < 15 # precedence: (x < y) and (not (y < 15))
(x == y) == z
'lower' if x < y else 'higher' # same as Java's (x < y) ? 'lower' : 'higher'
Some useful maths functions are found in the math module. To use them you need to run import math. (It’s common to put your import statements at the top of the notebook, as they only need to be run once per session, but they can actually appear anywhere.)
import math
math.floor(-3.4), math.ceil(-3.4)
math.pow(9, 0.5), math.sqrt(9)
math.exp(2), math.log(math.e), math.log(101, 10)
math.sin(math.pi*1.3), math.atan2(3,4)
import cmath # for functions on complex numbers
cmath.sqrt(-9)
cmath.exp(math.pi * 1j) + 1
import random # for generating random numbers
random.random(), random.random()
Python strings can be enclosed by either single quotes or double quotes. Strings (like everything else in Python) are objects, and they have methods for various string-processing tasks. See the String Methods documentation for a full list.
"shout".upper() # "SHOUT"
"hitchhiker".replace('hi', 'ma') # "matchmaker"
'i' in 'team' # False
x = '''
Also, a multi-line string can be
entered with triple-quotes.
'''
A handy way to splice values into strings is with f-strings, i.e. strings with f before the opening quote. Each chunk of the string enclosed in {⋅} is evaluated, and the result is spliced back into the string. The chunk can also specify the output format. The documentation describes more format specifiers.
name,age = 'Zaphod', 27
f"My name is {name} and I will be {age+1} next year"
f"The value of π to 3 significant figures is {math.pi:.3}"
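Here are a few more format specifiers in action. This is only a small sample, and the variable names (price, count, label) are made up for illustration.

```python
import math

price, count, label = 1234.5678, 42, 'mass'   # illustrative values

fixed = f"{math.pi:.3f}"      # fixed-point with 3 decimal places
grouped = f"{price:,.2f}"     # thousands separator, 2 decimal places
padded = f"{count:05d}"       # integer, zero-padded to width 5
aligned = f"[{label:>8}]"     # right-aligned in a field of width 8
percent = f"{0.256:.1%}"      # shown as a percentage with 1 decimal place

print(fixed, grouped, padded, aligned, percent)
```

Note the difference between .3 (three significant figures) and .3f (three decimal places).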
If you do any serious data processing in Python, you will likely find yourself needing regular expressions. The supplementary notebooks show how to use regular expressions for data cleanup.
import re
s = 'In 2024 there will be an election'
re.search(r'(\d+)', s)[0] # '2024'
re.sub(r'a(n?) (\w+)ion', 'calamity', s) # 'In 2024 there will be calamity'
Python has four common types for storing collections of values: tuples, lists, dictionaries, and sets. In IA courses on OCaml and Java we learnt about lists versus arrays. In those courses, and in IA Algorithms, we study the efficiency of various implementation choices. In Python, you shouldn’t think about these things, at least not in the first instance. The Pythonic style is to just go ahead and code, and only worry about efficiency after we have working code. As the famous computer scientist Donald Knuth said,
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
Only when we have special requirements should we switch to a dedicated collection type, such as a deque or a heap or the specialized numerical types we’ll learn about in section 2.
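As a rough sketch of what those dedicated types look like, here are a deque (from the standard collections module) and a heap (via the standard heapq module, which works on an ordinary list). The values are arbitrary.

```python
from collections import deque
import heapq

# A deque supports fast appends and pops at both ends.
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
first = queue.popleft()          # removes and returns the leftmost item

# heapq keeps an ordinary list arranged as a min-heap.
heap = [5, 1, 4]
heapq.heapify(heap)
heapq.heappush(heap, 2)
smallest = heapq.heappop(heap)   # removes and returns the smallest item

print(first, list(queue), smallest, sorted(heap))
```

In keeping with the advice above, an ordinary list is usually the right starting point; we would reach for these only once a list turns out to be the bottleneck.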
Python lists and Python tuples are both used to store sequences of elements. They both support iterating over the elements, concatenation, random access, and so on. They’re a bit like lists, and a bit like arrays.
a = [1, 2, 'buckle my shoe'] # a list
b = (3, 4, 'knock at the door') # a tuple
len(a), len(b)
a[0], a[1], b[2] # indexes start at 0
a[-1], a[-2] # negative indexes count from the end
3 in a, 3 in b # is this item contained in the collection?
a + list(b) # ℓ1+ℓ2 concatenates two lists
tuple(a) + b # t1+t2 concatenates two tuples
list(zip(a,b)) # zip(ℓ1,ℓ2) gives [(ℓ1[0],ℓ2[0]), (ℓ1[1],ℓ2[1]), ...]
As you see, both lists and tuples can hold mixed types, including other lists or tuples. You can convert a list to a tuple and vice versa, and extract elements. The difference is that lists are mutable, whereas tuples are immutable.
a[0] = 5
a.append('then')
a.extend(b)
a # [5, 2, 'buckle my shoe', 'then', 3, 4, 'knock at the door']
b[0] = 5
----> 1 b[0] = 5

TypeError: 'tuple' object does not support item assignment
To sort a list, we have a choice between sorting in-place or returning a new sorted list without changing the original.
names = ['bethe', 'alpher', 'gamov']
sorted(names) # ['alpher', 'bethe', 'gamov'], returns a new list
names # ['bethe', 'alpher', 'gamov'], unchanged from before
names.sort()
names # ['alpher', 'bethe', 'gamov'], sorted in-place
Another common operation is to concatenate a list of strings. Python’s syntax for this is unusual:
', '.join(names) + ' wrote a famous paper on nuclear physics'
x = list(range(10)) # [0,1,2,3,4,5,6,7,8,9]
x[1:3] # start is inclusive and end is exclusive, so x[1:3] == [x[1],x[2]]
x[:2] # first two elements
x[2:] # everything after the first two
x[-3:] # last three elements
x[:-3] # everything prior to the last three
x[::4] # every fourth element
We can assign into slices.
x[::4] = [None, None, None]
The other very useful data type is the dictionary, what Java calls a Map or HashMap.
room_alloc = {'Adrian': None, 'Laura': 32, 'John': 31}
room_alloc['Guarav'] = 19 # add or update an item
del room_alloc['John'] # remove an item
room_alloc['Laura'] # get an item
room_alloc.get('Alexis', 1) # get item if it exists, else default to 1
'Alexis' in room_alloc # does this dictionary contain the key 'Alexis'?
To iterate over items in a dictionary, see the next example …
Python supports the usual control flow statements: for, while, continue, break, if, else.
To iterate over items in a list,
for item in items:
    ... # do something with item
To iterate over items and their positions in the list together,
for i, name in enumerate(['bethe', 'alpher', 'gamov']):
    print(f"Person {name} is in position {i}")
To just do something a number of times, if we don't care about the index, it's conventional to call the loop variable _.
x = 2
for _ in range(5):
    x *= 2
To iterate over two lists simultaneously, zip them.
for x,y in zip(['apple','orange','grape'], ['cheddar','wensleydale','brie']):
    print(f"{x} goes with {y}")
We can also iterate over (key,value) pairs in a dictionary. Suppose we're given a dictionary of room allocations and we want to find the occupants of each room.
room_alloc = {'adrian': 10, 'chloe': 5, 'guarav': 10, 'shay': 11,
              'alexis': 11, 'rebecca': 10, 'zubin': 5}
occupants = {}
for name, room in room_alloc.items(): # iterate over keys and values
    if room not in occupants:
        occupants[room] = []
    occupants[room].append(name)
for room, occupants_here in occupants.items():
    ns = ', '.join(occupants_here)
    print(f'Room {room} has {ns}')
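The "create an empty list if the key is missing" step is common enough that the standard library offers a shortcut. Here is one possible variant of the grouping code using collections.defaultdict. It is an alternative sketch rather than a replacement for the plain-dictionary version.

```python
from collections import defaultdict

room_alloc = {'adrian': 10, 'chloe': 5, 'guarav': 10, 'shay': 11,
              'alexis': 11, 'rebecca': 10, 'zubin': 5}

# defaultdict(list) creates an empty list the first time a key is accessed
occupants = defaultdict(list)
for name, room in room_alloc.items():
    occupants[room].append(name)

for room, occupants_here in sorted(occupants.items()):
    print(f"Room {room} has {', '.join(occupants_here)}")
```

Sorting the items is optional; it is done here only so that rooms are printed in numerical order.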
Python has a distinctive piece of syntax called a comprehension for creating lists. It’s a very common pattern to write code that transforms lists, e.g.
ℓ = ... # start with some list [ℓ0, ℓ1, ...]
f = ... # some function we want to apply, to get [f(ℓ0), f(ℓ1), ...]
res = []
for i in range(len(ℓ)):
    x = ℓ[i]
    y = f(x)
    res.append(y)
This is so common that Python has special syntax for it,
res = [f(x) for x in ℓ]
There’s also a version which only keeps elements that meet a criterion,
res = [f(x) for x in ℓ if t]
Here's a concrete example:
xs = range(10)
[x**2 for x in xs if x % 2 == 0]
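To make the correspondence explicit, here is the same computation written as an explicit loop and as a comprehension. The last two lines show that the same syntax with curly brackets builds dictionaries and sets. The names res_loop, res_comp, squares_by_x and remainders are just for illustration.

```python
xs = range(10)

# Explicit loop: keep the even numbers and square them
res_loop = []
for x in xs:
    if x % 2 == 0:
        res_loop.append(x**2)

# The equivalent list comprehension
res_comp = [x**2 for x in xs if x % 2 == 0]
print(res_loop == res_comp, res_comp)   # True [0, 4, 16, 36, 64]

# Dictionary and set comprehensions use curly brackets
squares_by_x = {x: x**2 for x in xs if x % 2 == 0}
remainders = {x % 3 for x in xs}
print(squares_by_x, remainders)
```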
This section of the notes compares and contrasts the Python language with what you have learnt in the courses so far using OCaml and Java. It is here for your general interest, and it’s not needed for the Scientific Computing course, apart from section 1.4.1 on defining functions.
The development of the Python language is documented in Python Enhancement Proposals (PEPs). Significant changes in the language, or in the standard libraries, are discussed in mailing lists and written up for review as a PEP. They typically suggest several ways a feature might be implemented, and give the reason for choosing one of them. If consensus is not reached to accept the PEP, then the reasons for its rejection are also documented. They are fascinating reading if you are interested in programming language design.
The code snippet below shows how we define a function in Python. There are several things to note:
The function is defined with a default argument, c=0. You can invoke it by either roots(2,3,1) or roots(2,3).
Functions can be called with named arguments, roots(b=3, a=2), in which case they can be provided in any order.
In scientific computing, we’ll come across many functions that accept 10 or more arguments, all of them with sensible defaults, and typically we’ll only specify a few of the arguments. This is why defaulting and named arguments are so useful.
import math
def roots(a, b, c=0):
    """Return a list with the real roots of c*(x**2) + b*x + a == 0"""
    if b == 0 and c == 0:
        raise Exception("This polynomial is constant")
    if c == 0:
        return [-a/b]
    elif a == 0:
        return [0] + roots(b=c, a=b)
    else:
        discr = b**2 - 4*c*a
        if discr < 0:
            return []
        else:
            return [(-b+s*math.sqrt(discr))/2/c for s in [-1,1]]
Some more notes:
This function either returns a value, or it throws an exception i.e. generates an error message and finishes. If your function finishes without an explicit return statement, it will return None. Unlike Java, it’s possible for different branches of your function to return values of different types — at risk to your sanity.
This function returns a single variable, namely a list. If you want to return several variables, return them in a tuple, and unpack the tuple using multiple assignment as shown in section 1.1.
It’s conventional to document your function by providing a documentation string as the first line.
You can see help for a function with ?. If we run ?roots we’re shown
Signature: roots(a, b, c=0)
Docstring: Return a list with the real roots of c*(x**2) + b*x + a == 0
File: /path_to_notebook/<ipython-input-53-6cf3a0af9585>
Type: function
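The note above about returning several values can be sketched as follows. The function min_max_mean and its ndigits parameter are hypothetical, invented for this illustration. The function returns a tuple, and the caller unpacks it by multiple assignment.

```python
def min_max_mean(values, ndigits=2):
    """Return (smallest, largest, mean) of a non-empty sequence of numbers."""
    mean = round(sum(values) / len(values), ndigits)
    return min(values), max(values), mean   # packed into a tuple

# Unpack the returned tuple with multiple assignment
lo, hi, avg = min_max_mean([3, 1, 4, 1, 5])

# Use _ for the parts we don't need, and a named argument to override the default
_, _, rough = min_max_mean([3, 1, 4, 1, 5], ndigits=0)

print(lo, hi, avg, rough)   # 1 5 2.8 3.0
```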
In Python as in OCaml, functions can be returned as results, assigned, put into lists, passed as arguments to other functions, and so on.
import random
def noisifier(σ):
    def add_noise(x):
        return x + random.uniform(-σ, σ)
    return add_noise
fs = [noisifier(σ) for σ in [0.1, 1, 5]]
[f(1.5) for f in fs]
In the example above, noisifier is a function that returns another function. The inner function ‘remembers’ the value of σ under which it was defined; this is known as a closure.
We can use lambda to define anonymous functions, i.e. functions without names. This is often used to fill in arguments.
def illustrate_func(f, xs):
    for x in xs:
        print(f"f({x}) = {f(x)}")
illustrate_func(lambda b: roots(1,b,2), xs = range(5))
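Another way to fill in arguments, besides lambda, is functools.partial from the standard library. The sketch below uses a made-up helper function, power, to compare the two approaches. Which one reads better is largely a matter of taste.

```python
from functools import partial

def power(base, exponent):
    """Hypothetical helper used only for this illustration."""
    return base ** exponent

square_lambda = lambda x: power(x, 2)         # fill in exponent with a lambda
square_partial = partial(power, exponent=2)   # fill in exponent with partial

print([square_lambda(x) for x in range(5)])   # [0, 1, 4, 9, 16]
print([square_partial(x) for x in range(5)])  # [0, 1, 4, 9, 16]
```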
A generator (or lazy list, or sequence) is a list where the elements are only computed on demand. This lets us implement infinite sequences. In Python, we can create them by defining a function that uses the yield statement:
def fib():
    x,y = 1,1
    while True:
        yield x
        x,y = (y, x+y)
fibs = fib()
[next(fibs) for _ in range(10)]
When we call next(fibs), the fib code runs through until it reaches the next yield statement, then it emits a value and pauses. Think of fibs as an execution pointer and a call stack: it remembers where it is inside the fib function, and calling next tells it to resume executing until the next time it hits yield.
We can also transform generators using syntax a bit like list comprehension:
even_fibs = (x for x in fib() if x % 2 == 0)
[next(even_fibs) for _ in range(10)]
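Instead of calling next repeatedly, we can take a finite prefix of a generator with itertools.islice from the standard library. Here is a small sketch, repeating the definition of fib so that the cell runs on its own.

```python
from itertools import islice

def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

# islice(g, n) lazily takes the first n items from g
first_ten = list(islice(fib(), 10))
even_five = list(islice((x for x in fib() if x % 2 == 0), 5))

print(first_ten)   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(even_five)   # [2, 8, 34, 144, 610]
```

Never call list directly on an infinite generator such as fib(): it would try to compute every element and never finish.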
It’s often handy for functions to be able to return either a value, or a marker that there is no value. For example, head(list) should return a value unless the list is empty, in which case there’s nothing to return. A common pattern in a language like OCaml is to have a datatype that explicitly supports this; for example we’d define head to return an enumeration datatype None | Some['a]. This forces everyone who uses head to check whether or not the answer is None.
In Python, the return type of a function isn’t constrained. It’s a common convention to return None if you have nothing to return, and a value otherwise, and to trust that the person who called you will do the appropriate checks.
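A minimal sketch of this convention, using a hand-written head function (not a built-in) and a caller that does the check:

```python
def head(xs):
    """Return the first element of xs, or None if xs is empty."""
    return xs[0] if len(xs) > 0 else None

for xs in [[3, 1, 4], []]:
    h = head(xs)
    if h is None:              # the caller is trusted to check
        print(f"{xs} is empty")
    else:
        print(f"{xs} starts with {h}")
```

One weakness of the convention is that it becomes ambiguous if None is itself a legitimate element: head([None]) and head([]) both return None.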
Enumeration types are also used for type restriction, e.g. to limit what can be placed in a list. When we actually do want to achieve this, Python isn’t much help. It does have an add-on library for enumeration types but it’s a lot of work for little benefit.
One situation where enumeration types are very useful is when working with categorical values in data. When working with data, the levels of the enumeration are decided at runtime (by the contents of the data we load in), so pre-declared types are no use anyway.
Python uses dynamic typing, which means that values are tagged with their types during execution and checked only then. To illustrate, consider the functions
def double_items(xs):
    return [x*2 for x in xs]
def goodfunc():
    return double_items([1,2,[3,4]]) + double_items("hello world")
def badfunc():
    return double_items(10)
We won’t be told of any errors until badfunc() is invoked, even though it’s clear when we define it that badfunc will fail.
Python programmers are encouraged to use duck typing, which means that you should test values for what they can do rather than what they’re tagged as. “If it walks like a duck, and it quacks like a duck, then it’s a duck”. In this example, double_items(xs) iterates through xs and applies *2 to every element, so it should apply to any xs that supports iteration and whose elements all support *2. These operations mean different things to different types: iterating over a list returns its elements, while iterating over a string returns its characters; doubling a number is an arithmetical operation, doubling a string or list repeats it. Python does allow you to test the type of a value with e.g. if isinstance(x, list): ..., but programmers are encouraged not to do this.
Python’s philosophy is that library designers are providing a service, and programmers are adults. If a library function uses comparison and addition, and if the end-user programmer invents a new class that supports comparison and addition, then why on earth shouldn’t the programmer be allowed to use the library function? (I’ve found this useful for simulators: I replaced ‘numerical timestamp’ with ‘rich timestamp class that supports auditing, listing which events depended on which other events’, and I didn’t have to change a single line of the simulator body.) Some statically typed languages like Haskell and Scala support this via dynamic type classes, but their syntax is rather heavy.
To make duck typing useful, Python has a long list of special method names so that you can create custom classes supporting the same operations as numbers, or as lists, or as dictionaries. For example, if you define a new class with the method __iter__ then your new class can be iterated over just like a list. (The special methods are sometimes called dunder methods, for "double underscore".)
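As a small sketch of this, here is a made-up class Countdown that defines __iter__ (and, for good measure, __len__). Because it supports iteration, the double_items function from earlier accepts it without any changes. The function is repeated here so that the cell runs on its own.

```python
class Countdown:
    """Hypothetical class: iterating over Countdown(n) gives n, n-1, ..., 1."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):          # called by for loops, comprehensions, list(...)
        return iter(range(self.n, 0, -1))
    def __len__(self):           # called by len(...)
        return self.n

def double_items(xs):
    return [x*2 for x in xs]

c = Countdown(3)
print(list(c), len(c), double_items(c))   # [3, 2, 1] 3 [6, 4, 2]
```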
Example: trees. Suppose we want to define a tree whose leaves are integers and whose branches can have an arbitrary number of children. Actually, in Python, there’s nothing to define: we can just start using it, using a list to denote a branch node.
x = [1,[[2,4,3],9],[5,[6,7],8]]
To flatten a list like this we can use duck typing: given a node n, try to iterate over its children, and if this fails then the node must be a leaf so just return [n].
def flatten(n):
    try:
        return [y for child in n for y in flatten(child)]
    except TypeError:
        return [n]
flatten(x)
This would work perfectly well for trees containing arbitrary types — unless the end-user programmer puts in leaves which are themselves iterable, in which case the duck typing test doesn’t work — unless that is the user’s intent all along, to be able to attach new custom sub-branches …
A solution is to define a custom class for branch nodes, and use isinstance to test each element to see if it’s a branch node. This is not very different to the OCaml solution, which is to declare nodes to be of type ‘either leaf or branch’, except that Python would still allow leaves of arbitrary mixed type.
Python is an object-oriented programming language. Every value is an object. You can see the class of an object by calling type(x). For example,
x = 10
type(x) # reports int
dir(x) # gives a list of x’s methods and attributes
It supports inheritance and multiple inheritance, and static methods, and class variables, and so on. It doesn’t support interfaces, because they don’t make sense in a duck typing language.
Here’s a quick look at a Python object, and at how it might be used for the flatten function earlier.
class Branch(object):
    def __init__(self, children):
        self.children = children
def flatten(n):
    if isinstance(n, Branch):
        return [y for child in n.children for y in flatten(child)]
    else:
        return [n]
x = Branch([10,Branch([3,2]),"hello"])
flatten(x)
[10, 3, 2, 'hello']
Every method takes as its first argument a variable referring to the current object (conventionally named self), like this in Java. Python doesn’t support private and protected access modifiers, except by convention: the convention is that attributes and functions whose names begin with an underscore are considered private, and may be changed in future versions of the library.
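Here is a small sketch of both points, using a made-up Account class: the explicit first argument, and an underscore-prefixed attribute that is private only by convention.

```python
class Account:
    """Hypothetical class illustrating self and the underscore convention."""
    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance    # leading underscore: private by convention only

    def deposit(self, amount):
        self._balance += amount
        return self._balance

acct = Account('Zaphod')
acct.deposit(10)               # Python passes acct as the first argument
Account.deposit(acct, 5)       # the same kind of call, with that argument written out
print(acct._balance)           # 15: nothing actually stops outside access
```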
The next lines of code are surprising. You can ‘monkey patch’ an object, after it has been created, to change its attributes or give it new attributes. Like so many language features in Python, this is sometimes tremendously handy, and sometimes the source of infuriating bugs.
y = Branch([])
y.my_label = "added an attribute"