Tick 1: numpy

We'll model a population of $N$ individuals, each of them either Susceptible, or Infected, or Recovered. Once Recovered, they do not become Infected again. Each day, each person goes to a locus. If there are any Infected people in a locus, each Susceptible person at that locus has a probability prob_infect of becoming Infected. At the end of the day, each person who was Infected at the start of the day Recovers with probability prob_recover. (A simple probability calculation shows that the average number of days a person is infected is 1/prob_recover.)

The spread of the epidemic depends on the number of people gathering in each locus. If some loci have many people, the epidemic will spread quicker. If loci are limited to a small number of people, the epidemic will spread slower. We'll model loci as having random numbers of people, we'll assign people randomly to loci, and we'll generate a new set of loci each day.

This assignment tests your vectorized thinking. You will be asked to run simulations on a population of tens of thousands of individuals, in tens of thousands of locations, over many timesteps. YOUR CODE MUST USE NUMPY VECTORIZED OPERATIONS to iterate over the population or over loci. Do not use Python 'for' loops or list comprehensions, and do not use np.vectorize either. For any other iteration, e.g. over simulated timesteps, or over simulation parameters, it's a good idea to use 'for' loops or list comprehension.

Numpy warmup exercises - not assessed

These are optional warmup exercises, to get you used to numpy. Use the following autograder settings:

import ucamcl
GRADER = ucamcl.autograder('https://markmy.solutions', course='scicomp').subsection('ex1')

For these exercises, there is no answer for you to submit. Instead, the autograder shows a model answer. Use it like this:

q = GRADER.fetch_question('q1')
print(q)

Question (q1) from section 2.2. Here is some standard Python code:

import math, random
x = random.uniform(-1, 1)
y = random.uniform(-1, 1)
d = math.sqrt(x**2 + y**2)

We'd like to repeat this a million times, and find the mean and standard deviation of the d values. Implement this using numpy vectorized code.

Question (q2) from section 2.3. Look up the help for np.argmax. Generate the vectors

x = np.linspace(.1,10,1000)
a = x**2
b = 0.01*(np.exp(x)-1)

and find the first x value at which a<b.

Question (q3) from section 2.3. I have a list of names which I can sort alphabetically using np.argsort,

names = np.array(['alexis','chloe','guarav','shay','adrian','rebecca'])
i = np.argsort(names)
names_rank = np.char.add(names[i], np.arange(1,7).astype(str))

How can I put names_rank back in the original order?

Question (q4) from section 3. For a numpy matrix a, what is the relationship between a.shape and len(a)?

Question (q5) from section 3. Look up the numpy help for np.arange and reshape, and use these functions to produce the $3\times5$ matrix $$ b = \left( \begin{matrix} 1 & 2 & 3 & 4 & 5\\ 6 & 7 & 8 & 9 & 10\\ 11 & 12 & 13 & 14 & 15 \end{matrix} \right) $$

Look up the help for np.sum, and compute the length-5 vector of column sums and the length-3 vector of row sums.

Question (q6) from section 3. Find two different ways to use numpy to create the column vector [[1],[2],...,[n]].

Question (q7) from section 3. A permutation matrix is a square matrix of 0s and 1s, where each row contains exactly one 1, and each column likewise. (The code snippet in section 2.3 of notes, for 'advanced indexing', creates a $3\times3$ permutation matrix.)

Write code to generate a random $n\times n$ permutation matrix.

Question (q8) from section 2.2. In a Exercise ex0 Question q6 you wrote a Pythonic simulator for a queue, based on the recursion $$ q_{t+1} = \max(q_t + a_t - C, 0). $$ It can be proved that another way to get the same answer is with the formula $$ q_t = q_0 + x_t - \min(0, y_t) $$ where $$ x_t = \sum_{u=0}^{t-1} (a_u-C) \quad\text{and}\quad y_t = \min_{1 \leq u \leq t} (q_0 + x_u). $$ Given a vector $a=[a_0,a_1,\dots,a_{t-1}]$,

Questions 1a - worth 1 mark

GRADER = ucamcl.autograder('https://markmy.solutions', course='scicomp').subsection('tick1a')

Question (q1). Here is code to generate a vector of $N$ sizes for loci.

def random_sizes(N, avg_size): return np.random.geometric(1/avg_size, size=N)

We don't actually want a vector of $N$ sizes, we want a vector of sizes that adds up to $N$. Write a function trunc_sizes(s) that truncates the vector returned by random_sizes, and reduces the size of the final locus if necessary.

q = GRADER.fetch_question('q1')
ans = trunc_sizes(q.s)
GRADER.submit_answer(q, np.array(ans))

Question (q2). Let loc_sizes be a vector of locus sizes that sums up to $N$, and let it have length $M$. Write a function person_in(loc_sizes) that computes a length-$N$ vector, containing the index of the locus that each person is in, as in the diagram below. Write another function first_occ(loc_sizes) that computes a length-$M$ vector, containing the index of the first person in each locus.

q = GRADER.fetch_question('q2')
ans = {'person_in': person_in(q.loc_sizes), 'first_occ': first_occ(q.loc_sizes)}
GRADER.submit_answer(q, ans)

Question (q3). Let infected be a vector of length $N$ consisting of 0s and 1s. Write a function exposure(infected,loc_sizes) that returns a vector of length $N$, where each entry $i$ counts the number of infected people in the locus that person $i$ is in. Assume that people are assigned to loci sequentially, as in the diagram above—do not implement shuffling at this stage.

q = GRADER.fetch_question('q3')
ans = exposure(np.array(q.infected), q.loc_sizes)
GRADER.submit_answer(q, ans)

Question (q4). Write a function

sim(N, T, n0, avg_loc_size=2.1, prob_infect=0.07, prob_recover=1/10)

which runs the simulator on a population of size $N$ for $T$ days, where $n_0$ is the number of people who were initially infected, prior to the first simulated day. Remember that people are assigned randomly to loci.

Your simulator should return a matrix with $T$ rows and 4 columns, which record respectively for each day (1) the number of new infections that day, (2) the number of recoveries that day, (3) the total number currently infected at the end of that day, (4) the total number recovered by the end of that day.

q = GRADER.fetch_question('q4')
x = sim(N=q.N, T=q.T, ...) # fill in the rest from q
ans = x[-1,3]/q.N
GRADER.submit_answer(q, ans)

Question (q5). Simulate a population of $N=50,000$ over $T=200$ days with $n_0=200$. Plot the percentage of the population that been infected, and that has recovered, with day number on the x-axis. Also plot the number of currently infected people, and the number of new infections on each day.

You don't have to submit your plot to the autograder, but you must include it in your submitted notebook. Your plot should look something like this. In your plot, you should pay attention to (1) making sure the plots share the same x-axis, (2) colouring the lines and showing a legend. Don't worry about fine-grained control of the plot, such as legend placement or plot size or colour scheme.

Questions 1b - worth 1 mark

GRADER = ucamcl.autograder('https://markmy.solutions', course='scicomp').subsection('tick1b')

Question (q6). A simple way to estimate the R number is R=days*N/D where N is the total number of new infections, where D is total number of infected-person-days, i.e. what you get by summing over days the number of infected people at the start of each day, and where days=1/prob_recover is the average number of days a person is infected. Write a function

r(X, t0, t1, days)

to compute the R number, where X is the matrix produced by your simulator, and N and D are computed over days t0,t0+1,...,t1-1.

q = GRADER.fetch_question('q6')
ans = r(np.array(q.X), t0=q.t0, t1=q.t1, days=q.days)
GRADER1b.submit_answer(q, ans)

Question (q7). We wish to model household bubbles separately from public loci. We'll treat household bubbles as just another set of loci, but let each person go to the same household-bubble locus every day. Bubble sizes can be generated using the distribution of household size in the UK:

def bubble_sizes(N, num_households):
    return np.random.choice([1,2,3,4,5,6], p=[.29,.35,.15,.14,.05,.02], size=(num_households,N)).sum(0)

Modify your simulator so that the $n_0$ initially infected people are the first in the list (to reflect the fact that infections are likely to be clustered in households), and thereafter, on each day,

  1. people go to random public loci as before, and may become infected (infection probability prob_infect)
  2. people go to their home bubbles and may become infected (infection probability prob_infect_home)
  3. some people may recover, as before
q = GRADER1b.fetch_question('q7')
x = sim2(N=q.N, T=q.T, n0=q.n0, ...) # fill in the rest from q
ans = x[-1,3] / q.N
GRADER1b.submit_answer(q, ans)

Question (q8). Simulate a population of $N=100,000$ over $T=200$ days with parameters $n_0=2000$, prob_infect_home=.2, prob_infect=.01, avg_loc_size=2.1, and prob_recover=1/10, for bubble sizes 1,2,3,4. Run five different simulations for each bubble size. Plot the percentage of the population that has been infected, and the number of currently infected people, on each day.

You don't have to submit your plot to the autograder, but you must include it in your submitted notebook. Your plot should look something like this. In your plot, you should pay attention to (1) making sure the plots share the same x-axis, (2) colouring the lines and showing a legend. Don't worry about fine-grained control of the plot, such as legend placement or plot size or colour scheme.

Question (q9). Run your simulator for bubble sizes 1, 2, 3, 4, and compute the R number as measured over timespan t0=10 to t1=25 to capture the initial wave.

ans = ... # array of four numbers, the R number of bubble sizes [1,2,3,4]
GRADER1b.submit_answer(GRADER1b.fetch_question('q9'), ans)

Investigation - not assessed

For 1a: Try different values for average locus size, and plot the R number as a function of average locus size. Try different distributions of locus size, and again plot R as a function of average locus size. Can you suggest an approximate formula for how R depends on average locus size? How about how R depends on average locus size and on prob_infect?

For 1b: Can you suggest an approximate formula for how R depends on average number of people in a bubble, and on prob_infect_home?