Using Python

You're computer scientists, so you know how to code, and Python is so intuitive that you can just about pick it all up by looking at example code. This notebook is a quick review of standard Python syntax. The only distinctive bits are section 3.5 on Comprehensions and section 4.1 on Functions. For the rest, please just skim through, and then try the (unassessed) warmup exercises in ex0.

1. A first session

We can use Python interactively like a calculator. Here are some simple expressions and their values. Try entering these yourself, in your own notebook, then press shift+enter or choose Cell | Run Cells from the menu.
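For instance, a few expressions of the kind one might try first. The particular values and variable names here are only illustrative.

```python
# In a notebook, the value of the last expression in a cell is displayed.
total = 3 + 8 * 2                     # 19: multiplication binds tighter than addition
power = 2 ** 10                       # 1024
greeting = 'hello' + ' ' + 'world'    # strings concatenate with +
greeting
```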

If we want to type in a very long line, we can split it using a backslash.
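A minimal sketch of a line continuation, with made-up numbers:

```python
# A backslash at the very end of a line continues the expression on the next line.
total = 1 + 2 + 3 + 4 \
        + 5 + 6
total   # 21
```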

Jupyter will only show the output from the last expression in a cell. If we want to see multiple values, print them explicitly. Alternatively, let the last expression be a tuple.
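Both approaches can be sketched as follows, using hypothetical variables x and y:

```python
x = 7
y = 3 * x
print(x)            # printed explicitly
print(y)            # printed explicitly
summary = (x, y, x + y)
summary             # a tuple as the last expression: (7, 21, 28)
```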

Python does its best to print out helpful error messages. When something goes wrong, look first at the last line of the error message to see what type of error it was, then look back up to see where it happened. If your code isn't working and you ask for help in the Q&A forum on Moodle, please include the error message!

      1 x = 'hello'
----> 2 y = x + 5
      3 y

TypeError: can only concatenate str (not "int") to str

2. Basic Python expressions

2.1 MATHS AND LOGIC

All the usual mathematical operators work, though watch out for division which uses different syntax to Java.
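A short sketch of the arithmetic operators, with the two forms of division side by side (the numbers are arbitrary):

```python
true_div = 7 / 2      # 3.5: / is true division, even for two integers
floor_div = 7 // 2    # 3: // rounds down to a whole number
remainder = 7 % 2     # 1
cube = 7 ** 3         # 343: ** is exponentiation (^ is bitwise xor, not power)
```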

The usual logical operators work too, though the syntax is wordier than other languages. Python's truth values are True and False.
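For example, with an arbitrary value of x:

```python
x = 5
in_range = (x > 0) and (x < 10)       # True
outside = (not in_range) or (x == 0)  # False
chained = 0 < x < 10                  # True: comparisons can be chained
```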

Some useful maths functions are found in the math module. To use them you need to run import math. (It’s common to put your import statements at the top of the notebook, as they only need to be run once per session, but they can actually appear anywhere.)
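A few calls from that module, chosen here purely as illustration:

```python
import math

root = math.sqrt(16)          # 4.0
rounded_down = math.floor(2.7)  # 2
area = math.pi * 2 ** 2       # area of a circle of radius 2, roughly 12.57
```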

2.2 STRINGS AND FORMATTING

Python strings can be enclosed by either single quotes or double quotes. Strings (like everything else in Python) are objects, and they have methods for various string-processing tasks. See the String Methods documentation for a full list.
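A small sample of string methods, applied to a made-up string:

```python
s = "Hello, World"
upper = s.upper()                        # 'HELLO, WORLD'
swapped = s.replace('World', 'Python')   # 'Hello, Python'
parts = s.split(', ')                    # ['Hello', 'World']
starts = s.startswith('Hello')           # True
n = len(s)                               # 12
```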

A handy way to splice values into strings is with f-strings, i.e. strings with f before the opening quote. Each chunk of the string enclosed in {⋅} is evaluated, and the result is spliced back into the string. The chunk can also specify the output format. The documentation describes more format specifiers.
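A sketch of f-strings with and without format specifiers. The names and values are invented for the example.

```python
name = 'Ada'
x = 3.14159
# {x:.2f} formats x as a float with two decimal places
msg = f'{name} has value {x:.2f} and twice that is {2 * x:.3f}'
# 'Ada has value 3.14 and twice that is 6.283'

# width and alignment: right-align, left-align, zero-pad
padded = f'{42:>6}|{42:<6}|{42:06d}'
# '    42|42    |000042'
```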

If you do any serious data processing in Python, you will likely find yourself needing regular expressions. The supplementary notebooks show how to use regular expressions for data cleanup.

3. Collections and control flow

Python has four common types for storing collections of values: tuples, lists, dictionaries, and sets. In IA courses on OCaml and Java we learnt about lists versus arrays. In those courses, and in IA Algorithms, we study the efficiency of various implementation choices. In Python, you shouldn’t think about these things, at least not in the first instance. The Pythonic style is to just go ahead and code, and only worry about efficiency after we have working code. As the famous computer scientist Donald Knuth said,

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

Only when we have special requirements should we switch to a dedicated collection type, such as a deque or a heap or the specialized numerical types we’ll learn about in section 2.

3.1 LISTS AND TUPLES

Python lists and Python tuples are both used to store sequences of elements. They both support iterating over the elements, concatenation, random access, and so on. They’re a bit like lists, and a bit like arrays.
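A minimal sketch with a hypothetical list a and tuple b:

```python
a = [1, 'two', [3.0, 4]]      # a list, written with square brackets
b = (1, 'two', [3.0, 4])      # a tuple, written with round brackets

same_as_tuple = tuple(a) == b   # True: convert list to tuple
same_as_list = list(b) == a     # True: convert tuple to list
second = b[1]                   # 'two': random access by index
longer = a + [5, 6]             # concatenation gives a new list

a[0] = 5                        # fine: lists can be modified in place
```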

As you see, both lists and tuples can hold mixed types, including other lists or tuples. You can convert a list to a tuple and vice versa, and extract elements. The difference is that lists are mutable, whereas tuples are immutable.

----> 1 b[0] = 5

TypeError: 'tuple' object does not support item assignment

To sort a list, we have a choice between sorting in-place or returning a new sorted list without changing the original.
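The two options, sketched on a made-up list of names:

```python
names = ['charlie', 'alice', 'bob']

ordered = sorted(names)     # returns a new list; names is unchanged
names.sort(reverse=True)    # sorts in place and returns None
```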

Another common operation is to concatenate a list of strings. Python’s syntax for this is unusual:
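The unusual part is that join is a method of the separator string, not of the list. A short sketch with invented data:

```python
words = ['to', 'be', 'or', 'not']
sentence = ' '.join(words)                    # 'to be or not'

# join needs strings, so convert other values first
csv = ','.join(str(n) for n in [1, 2, 3])     # '1,2,3'
```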

3.2 SLICING

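We can pick out subsequences using the slice notation, x[start:end:step]. The start index is included, the end index is excluded, and any of the three parts can be omitted.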
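Some slices of a small example list:

```python
x = list(range(10))    # [0, 1, 2, ..., 9]

middle = x[2:5]        # [2, 3, 4]: start included, end excluded
first3 = x[:3]         # [0, 1, 2]
last2 = x[-2:]         # [8, 9]: negative indexes count from the end
every3 = x[::3]        # [0, 3, 6, 9]
backwards = x[::-1]    # the whole list reversed
```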

3.3 DICTIONARIES

The other very useful data type is the dictionary, what Java calls a Map or HashMap.
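A sketch of basic dictionary operations. The names and room codes are hypothetical.

```python
room = {'adrian': 'FS11', 'chloe': 'FN14'}

room['guarav'] = 'FS12'                 # add (or overwrite) an entry
adrians = room['adrian']                # 'FS11'; a missing key raises KeyError
fallback = room.get('zoe', 'unknown')   # 'unknown': get() takes a default
has_chloe = 'chloe' in room             # True: membership tests look at keys
del room['chloe']                       # remove an entry
```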

To iterate over items in a dictionary, see the next example …

3.4 CONTROL FLOW

Python supports the usual control flow statements: for, while, continue, break, if, else.

To iterate over items in a list,

for item in items:
    # do something with item

To iterate over items and their positions in the list together, use enumerate.
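A minimal sketch using the built-in enumerate, on a made-up list:

```python
names = ['alice', 'bob', 'charlie']
labelled = []
for i, name in enumerate(names):     # i counts from 0
    labelled.append(f'{i}: {name}')
```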

To just do something a number of times, if we don't care about the index, it's conventional to call the loop variable _.
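For example:

```python
stars = ''
for _ in range(3):     # the loop variable is never used, hence the name _
    stars += '*'
```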

To iterate over two lists simultaneously, zip them.
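A short sketch with two hypothetical lists of equal length:

```python
names = ['alice', 'bob', 'charlie']
ages = [31, 25, 19]
lines = []
for name, age in zip(names, ages):   # zip stops at the end of the shorter list
    lines.append(f'{name} is {age}')
```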

We can also iterate over (key,value) pairs in a dictionary. Suppose we're given a dictionary of room allocations and we want to find the occupants of each room.
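One way this might be written, assuming a dictionary room_alloc that maps each person to a room. The names and rooms are invented.

```python
room_alloc = {'adrian': 'FS11', 'chloe': 'FN14', 'guarav': 'FS11',
              'shay': 'FN14', 'alexis': 'FS12'}

occupants = {}
for name, room in room_alloc.items():   # .items() yields (key, value) pairs
    if room not in occupants:
        occupants[room] = []
    occupants[room].append(name)
```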

3.5 COMPREHENSIONS

Python has a distinctive piece of syntax called a comprehension for creating lists. It’s a very common pattern to write code that transforms lists, e.g.

ℓ = ... # start with some list [ℓ0, ℓ1, ...]
f = ... # some function we want to apply, to get [f(ℓ0), f(ℓ1), ...]
res = []
for i in range(len(ℓ)):
    x = ℓ[i]
    y = f(x)
    res.append(y)

This is so common that Python has special syntax for it,

res = [f(x) for x in ℓ]

There’s also a version which only keeps elements that meet a criterion,

res = [f(x) for x in ℓ if t]   # t is some test involving x

Here's a concrete example:
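The following sketch squares the numbers 0 to 9, with and without a filter. The choice of function and test is arbitrary.

```python
squares = [x**2 for x in range(10)]
even_squares = [x**2 for x in range(10) if x % 2 == 0]
```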

4. Python as a programming language

This section compares and contrasts the Python language with what you have learnt so far in the courses using OCaml and Java. It is here for your general interest, and it’s not needed for the Scientific Computing course, apart from section 4.1 on defining functions.

The development of the Python language is documented in Python Enhancement Proposals (PEPs). Significant changes in the language, or in the standard libraries, are discussed in mailing lists and written up for review as a PEP. They typically suggest several ways a feature might be implemented, and give the reason for choosing one of them. If consensus is not reached to accept the PEP, then the reasons for its rejection are also documented. They are fascinating reading if you are interested in programming language design.

4.1 FUNCTIONS AND FUNCTIONAL PROGRAMMING

The code snippet below shows how we define a function in Python. There are several things to note:

In scientific computing, we’ll come across many functions that accept 10 or more arguments, all of them with sensible defaults, and typically we’ll only specify a few of the arguments. This is why defaulting and named arguments are so useful.
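A minimal sketch of default and named arguments, using a hypothetical function describe (not one from any library):

```python
def describe(name, greeting='Hello', punctuation='!', repeat=1):
    """Build a greeting; every argument except name has a default."""
    return ' '.join([f'{greeting}, {name}{punctuation}'] * repeat)

a = describe('Ada')                            # 'Hello, Ada!'
b = describe('Ada', punctuation='?')           # 'Hello, Ada?'
c = describe('Ada', repeat=2, greeting='Hi')   # 'Hi, Ada! Hi, Ada!'
```

Named arguments can be given in any order, and any that are left out take their default values.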

Some more notes:

In Python as in OCaml, functions can be returned as results, assigned, put into lists, passed as arguments to other functions, and so on.
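The original code cell is not reproduced here. As a rough sketch of what a function-returning function could look like, here is an assumed noisifier that adds Gaussian noise of standard deviation σ; the body is a guess for illustration only.

```python
import random

def noisifier(σ):
    # The inner function refers to σ from the enclosing call.
    def add_noise(x):
        return x + random.gauss(0, σ)
    return add_noise            # a function is returned as a result

f = noisifier(σ=0.1)            # functions can be assigned to variables
g = noisifier(σ=0)              # with σ=0 no noise is added
funcs = [f, g, abs]             # ... and put into lists
results = [h(-3) for h in funcs]
```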

In the example above, noisifier is a function that returns another function. The inner function ‘remembers’ the value of σ under which it was defined; this is known as a closure.

We can use lambda to define anonymous functions, i.e. functions without names. This is often used to fill in arguments.
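For example, a lambda can supply the key argument of sorted, or the function argument of map. The data here is made up.

```python
pairs = [('bob', 25), ('alice', 31), ('charlie', 19)]

# sort by the second element of each pair
by_age = sorted(pairs, key=lambda p: p[1])

doubled = list(map(lambda x: 2 * x, [1, 2, 3]))
```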

4.2 GENERATORS

A generator (or lazy list, or sequence) is a list where the elements are only computed on demand. This lets us implement infinite sequences. In Python, we can create them by defining a function that uses the yield statement:
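A sketch of an infinite Fibonacci generator. The names fib and fibs follow the surrounding text, but the body is an assumption, not the original cell.

```python
def fib():
    x, y = 1, 1
    while True:          # never terminates on its own
        yield x          # emit a value, then pause here
        x, y = y, x + y

fibs = fib()             # nothing runs yet; fibs is a generator object
first_five = [next(fibs) for _ in range(5)]   # [1, 1, 2, 3, 5]
sixth = next(fibs)                            # 8: resumes where it left off
```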

When we call next(fibs), the fib code runs through until it reaches the next yield statement, then it emits a value and pauses. Think of fibs as an execution pointer and a call stack: it remembers where it is inside the fib function, and calling next tells it to resume executing until the next time it hits yield.

We can also transform generators using syntax a bit like list comprehension:
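A generator expression looks like a list comprehension with round brackets, and it is evaluated lazily. The sketch below redefines an assumed fib generator so that it is self-contained.

```python
import itertools

def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

# Lazily keep only the even Fibonacci numbers; still an infinite sequence.
even_fibs = (x for x in fib() if x % 2 == 0)

# islice takes a finite prefix of a (possibly infinite) iterator.
first_three = list(itertools.islice(even_fibs, 3))   # [2, 8, 34]
```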

4.3 NONE AND MAYBE, AND ENUMERATION TYPES

It’s often handy for functions to be able to return either a value, or a marker that there is no value. For example, head(list) should return a value unless the list is empty in which case there’s nothing to return. A common pattern in a language like OCaml is to have a datatype that explicitly supports this, for example we’d define head to return an enumeration datatype None | Some[’a]. This forces everyone who uses head to check whether or not the answer is None.

In Python, the return type of a function isn’t constrained. It’s a common convention to return None if you have nothing to return, and a value otherwise, and to trust that the person who called you will do the appropriate checks.
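This convention might look as follows, with head written as a plain Python function (a sketch, not a library routine):

```python
def head(xs):
    """Return the first element of xs, or None if xs is empty."""
    return xs[0] if len(xs) > 0 else None

h = head([])
# It is up to the caller to check; use `is None` rather than `== None`.
message = 'empty' if h is None else f'first item is {h}'
```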

Enumeration types are also used for type restriction, e.g. to limit what can be placed in a list. When we actually do want to achieve this, Python isn’t much help. It does have a library for enumeration types (the enum module in the standard library), but it’s a lot of work for little benefit.

One situation where enumeration types are very useful is when working with categorical values in data. When working with data, the levels of the enumeration are decided at runtime (by the contents of the data we load in), so pre-declared types are no use anyway.

4.4 DYNAMIC TYPING

Python uses dynamic typing, which means that values are tagged with their types during execution and checked only then. To illustrate, consider the functions

def double_items(xs):
    return [x*2 for x in xs]
def goodfunc():
    return double_items([1,2,[3,4]]) + double_items("hello world")
def badfunc():
    return double_items(10)

We won’t be told of any errors until badfunc() is invoked, even though it’s clear when we define it that badfunc will fail.

Python programmers are encouraged to use duck typing, which means that you should test values for what they can do rather than what they’re tagged as. “If it walks like a duck, and it quacks like a duck, then it’s a duck”. In this example, double_items(xs) iterates through xs and applies *2 to every element, so it should apply to any xs that supports iteration and whose elements all support *2. These operations mean different things to different types: iterating over a list returns its elements, while iterating over a string returns its characters; doubling a number is an arithmetical operation, doubling a string or list repeats it. Python does allow you to test the type of a value with e.g. if isinstance(x, list): ..., but programmers are encouraged not to do this.
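To see these different meanings concretely, here is double_items applied to a few kinds of argument. The inputs are chosen for illustration.

```python
def double_items(xs):
    return [x*2 for x in xs]

nums = double_items([1, 2, [3, 4]])   # [2, 4, [3, 4, 3, 4]]
chars = double_items("abc")           # ['aa', 'bb', 'cc']
keys = double_items({'k': 1})         # ['kk']: iterating a dict yields its keys

try:
    double_items(10)                  # an int does not support iteration
except TypeError as e:
    error = str(e)                    # "'int' object is not iterable"
```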

Python’s philosophy is that library designers are providing a service, and programmers are adults. If a library function uses comparison and addition, and if the end-user programmer invents a new class that supports comparison and addition, then why on earth shouldn’t the programmer be allowed to use the library function? (I’ve found this useful for simulators: I replaced ‘numerical timestamp’ with ‘rich timestamp class that supports auditing, listing which events depended on which other events’, and I didn’t have to change a single line of the simulator body.) Some statically typed languages like Haskell and Scala support this via dynamic type classes, but their syntax is rather heavy.

To make duck typing useful, Python has a long list of special method names so that you can create custom classes supporting the same operations as numbers, or as lists, or as dictionaries. For example, if you define a new class with the method __iter__ then your new class can be iterated over just like a list. (The special methods are sometimes called dunder methods, for "double underscore".)
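A minimal sketch of a custom class that supports iteration and len. The class Countdown is hypothetical.

```python
class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Called by for loops, list(), comprehensions, and so on.
        return iter(range(self.start, 0, -1))

    def __len__(self):
        # Called by len().
        return self.start

items = list(Countdown(3))                # [3, 2, 1]
doubled = [x * 2 for x in Countdown(3)]   # [6, 4, 2]
size = len(Countdown(3))                  # 3
```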

Example: trees. Suppose we want to define a tree whose leaves are integers and whose branches can have an arbitrary number of children. Actually, in Python, there’s nothing to define: we can just start using it, using a list to denote a branch node.

To flatten a list like this we can use duck typing: given a node n, try to iterate over its children, and if this fails then the node must be a leaf so just return [n].
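A sketch of this approach, assuming integer leaves. The tree shown is invented for the example.

```python
def flatten(n):
    try:
        children = iter(n)        # succeeds only if n supports iteration
    except TypeError:
        return [n]                # not iterable, so treat n as a leaf
    result = []
    for child in children:
        result.extend(flatten(child))
    return result

tree = [1, [2, [3, 4]], 5]        # lists denote branch nodes
flat = flatten(tree)              # [1, 2, 3, 4, 5]
```

Note that this version would recurse forever on a string leaf, since iterating over a one-character string yields that same string.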

This would work perfectly well for trees containing arbitrary types — unless the end-user programmer puts in leaves which are themselves iterable, in which case the duck typing test doesn’t work — unless that is the user’s intent all along, to be able to attach new custom sub-branches …

A solution is to define a custom class for branch nodes, and use isinstance to test each element to see if it’s a branch node. This is not very different to the OCaml solution, which is to declare nodes to be of type ‘either leaf or branch’ — except that Python would still allow leaves of arbitrary mixed type.

4.5 OBJECT-ORIENTED PROGRAMMING

Python is an object-oriented programming language. Every value is an object. You can see the class of an object by calling type(x). For example,
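A few illustrative calls, with the values chosen here:

```python
t_int = type(3)            # <class 'int'>
t_str = type('hello')      # <class 'str'>
t_list = type([1, 2])      # <class 'list'>
t_type = type(type(3))     # <class 'type'>: classes are objects too
```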

It supports inheritance and multiple inheritance, and static methods, and class variables, and so on. It doesn’t support interfaces, because they don’t make sense in a duck typing language.

Here’s a quick look at a Python object, and at how it might be used for the flatten function earlier.
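The original cell is not reproduced here. The following is a sketch under the assumption that branch nodes are instances of a small custom class, here called Branch, and that anything else is a leaf.

```python
class Branch:
    def __init__(self, children):
        # self refers to the object being initialised
        self.children = list(children)

    def __repr__(self):
        return f'Branch({self.children!r})'

def flatten(node):
    if isinstance(node, Branch):
        return [leaf for child in node.children for leaf in flatten(child)]
    return [node]     # anything that is not a Branch is a leaf

tree = Branch([1, Branch(['two', Branch([3, 4])]), [5, 6]])
flat = flatten(tree)  # [1, 'two', 3, 4, [5, 6]]
```

With this test, leaves that happen to be iterable (the string and the plain list above) are left intact.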

Every method takes as its first argument a variable referring to the current object (conventionally named self), like this in Java. Python doesn’t support private and protected access modifiers, except by convention: the convention is that attributes and functions whose names begin with an underscore are considered private, and may be changed in future versions of the library.

The next lines of code are surprising. You can ‘monkey patch’ an object, after it has been created, to change its attributes or give it new attributes. Like so many language features in Python, this is sometimes tremendously handy, and sometimes the source of infuriating bugs.
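A sketch of monkey patching on a hypothetical Point class:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)

p = Point(3, -4)
before = p.norm1()          # 7

p.label = 'corner'          # give this object a brand-new attribute
p.x = 10                    # change an existing attribute
p.norm1 = lambda: 42        # replace a method, on this one object only

after = p.norm1()           # 42
other = Point(1, 1).norm1() # 2: other Point objects are unaffected
```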