In this tick we'll implement a simple epidemic simulator.
In our simulator, each person's state is either Susceptible, or Infected, or Protected (protected either by vaccine or by having recovered from infection). We will implement an agent-based simulator in which we keep track of the state of every person; this is in contrast to aggregate equation-based simulators in which we only keep track of the total number of people in each state.
Our simulator will be a discrete-time simulator, one timestep per day. Each day,
prob_socialize
) or to isolate.prob_infect_susceptible
or prob_infect_protected
depending on their state.prob_recover
, and become Protected.A simple probability calculation shows that the average number of days a person is Infected is days = 1/prob_recover
. The average number of people that a single infected individual would infect during the course of their infection, in an entirely Susceptible population, is R0 = days * prob_socialize * prob_infect_susceptible
.
In tick 1a we will use this simulator to look at how vaccination rate affects the R number. Then in tick 1b we'll modify the simulator slightly, to use location-based grouping rather than random grouping, in order to see how restricting travel affects the R number.
np.vectorize
either. For any other iteration, e.g. over simulated timesteps, or over simulation parameters, it's a good idea to use 'for' loops or list comprehension.
This epidemic simulator is a very simple model!
There are all sorts of ways we might make it more complex: we could allow larger groups, and multiple interactions per day; we could imagine that each person has their own proclivities for socializing and group size; we could allow infectivity to vary from person to person or day to day; and so on. But it's generally not a good idea to make a model more complex than it has to be to answer the question at hand. The more complex the model, the more parameters there are to pick values for, and the harder it is to (a) fit the parameters to the real world, (b) come up with a useful answer. And a great deal of useful real-world epidemiology is done with even simpler models, namely aggregate equation-based simulators.
Our simulator is just about the simplest possible simulator that will let us investigate the question of restricting travel. The goal of a simple simulator like this isn't to make precise numerical forecasts. It's to learn qualitatively which factors are important and which factors are unimportant — in other words, to learn which factors it would be useful to incorporate into a more complex simulator and which factors we can leave out.
Each question comes with instructions on how to submit your answer. Here's a video explanation, if you prefer videos.
from IPython.display import YouTubeVideo
YouTubeVideo('2Tud7FXhYP4', width=560, height=315, start=303)
You should follow Marie Kondo's advice: "If a line of code does not spark joy, delete it."
You are free to structure your code however you like — you don't have to give your functions the names and arguments that this notebook describes. Indeed, as you declutter your notebook, you may find it helpful to rewrite it as a single simulator function that can be used for both tick 1a and 1b.
from IPython.display import YouTubeVideo
YouTubeVideo('2Tud7FXhYP4', width=560, height=315, start=490)
Question (q1). The model needs us to randomly group the people who socialize into pairs. Suppose there are $M$ of them. We can group them into pairs
as follows. First, randomly permuting the vector $[0,1,\dots,M-1]$. Let m1
consist of the first $M//2$ items and m2
consist of the next $M//2$ items. Interpret this as "m1[i]
is paired with m2[i]
". We're using $M//2$ rather than $M/2$ in case $M$ is odd, in which case there's one person left over who doesn't join in a pair.
Write a function pairs(M)
that returns the tuple (m1,m2)
. For example if you run pairs(6)
you might get the output
(array[(3, 0, 1)], array([2, 4, 5]))
To submit your answer,
import ucamcl
GRADER = ucamcl.autograder('https://markmy.solutions', course='scicomp')
GRADER1a = GRADER.subsection('tick1a')
q = GRADER1a.fetch_question('q1')
m1,m2 = pairs(q.n)
ans = {'n': len(np.unique(np.concatenate([m1,m2]))), 's': np.std(np.abs(m1-m2))}
GRADER1a.submit_answer(q, ans)
Question (q2). Let popn
be a vector with one entry per person in the population, with values 0
for Susceptible, 1
for Protected, 2
for Infected. Write a function is_exposed(popn, m1, m2)
which returns a boolean vector of the same length as popn
, indicating whether each person was in a group with an Infected person. For example,
is_exposed(np.array([0,0,2,0,0,1]), np.array([3,0,1]), np.array([2,4,5]))
should return
array([False, False, True, True, False, False])
To submit your answer,
q = GRADER1a.fetch_question('q2')
ans = is_exposed(q.popn, q.m1, q.m2)
GRADER1a.submit_answer(q, ans)
Question (q3). Write a function update(popn, prob)
to simulate one day. Here prob
is a dictionary, e.g.
prob = {'socialize': 0.8, 'infect_susceptible': 0.6, 'infect_protected': 0.1, 'recover': 0.1}
Your function should return a pair (popn, new_infected)
where popn
is the updated population vector and new_infected
is the number of people who became infected this day.
To submit your answer,
q = GRADER1a.fetch_question('q3')
popn = np.where(np.arange(q.n) < q.i0, 2, 0)
popn2,new_infected = update(popn, q.prob)
ans = {'i':np.sum(popn2==2), 'ni':new_infected}
GRADER1a.submit_answer(q, ans)
Question (q4). Write a function sim(N, T, i0, p0, prob)
to simulate T
days on a population of size N
, of whom i0
are Infected and p0
are Protected at the beginning of day 0. It should return a matrix with T
rows and 4 columns. The four columns of row $i$ should be (1) the number of
Susceptible, (2) Protected, and (3) Infected people at the end of day $i$, as well as (4) the number of new infections that day.
To submit your answer,
q = GRADER1a.fetch_question('q4')
x = sim(N=q.N, T=q.T, i0=q.i0, p0=q.p0, prob=q.prob)
ans = x[-1,1] / q.N
GRADER1a.submit_answer(q, ans)
Question (q5). Simulate a population of N=50000
over T=200
days with i0=100
and p0=30000
, with the probabilities as in question (q3). Plot the percentage of the population in each of the three states, with day number on the horizontal axis. Also plot the number of new infections each day.
You don't have to submit your plot to the autograder, but you must include it in your submitted notebook. Your plots should look something like this. In your plots, you should pay attention to (1) making sure the two plots share the same x-axis, (2) colouring the lines and showing a legend. Don't worry about fine-grained control of the plot, such as legend placement or plot size or colour schemes.
Question (q6). Repeat the simulation from question (q5), but with p0=0
, with p0=15000
, and with p0=30000
. Run five simulations at each value of p0
. Plot the percentage of the population infected.
Again, you don't have to submit your plot to the autograder, but you must include it in your submitted notebook. Your plots should look something like this.
Question (q7). A simple way to estimate the R number is R=days*N/D
where N
is the total number of new infections in a window of time, where D
is total number of infected-person-days, i.e. what you get by summing the number of infected people at the start of each day in the window, and
where
days=1/prob_recover
is the average number of days a person is infected.
Write a function
r(x, t0, t1, days)
to compute the R number, where x
is the matrix produced by your simulator, and N
and D
are computed over days t0,t0+1,...,t1-1
.
To submit your answer,
GRADER1b = GRADER.subsection('tick1b')
q = GRADER1b.fetch_question('q7')
ans = r(np.array(q.x), t0=q.t0, t1=q.t1, days=q.days)
GRADER1b.submit_answer(q, ans)
Question (q8). Compute the average R number for each of the three scenarios in question (q6), averaged over the five simulation runs, over the window t0=7, t1=14
.
There is nothing to submit for this question. You code should produce roughly these R numbers.
p0 |
R |
---|---|
0 | 4.36 |
15000 | 3.45 |
30000 | 2.34 |
Your simulator is random,
so you'll get slightly different answers each time you run it. Your answers should lie in these 95% confidence intervals:
4.36 $\pm$ 0.04, 3.45 $\pm$ 0.05, 2.35 $\pm$ 0.07.
Question (q9). In these simulations, the epidemic grows exponentially, reaches a peak, then dies down. In the UK however, over summer 2021, COVID levels seemed to reach a level and fluctuate around that level. What might explain this? One possible answer is that it's due to limiting social interactions: if people mainly socialize in a small social circle, then the epidemic will burn through the population steadily rather than explosively — like a candle rather than like gunpowder.
Write a function localpairs(M, σ)
which works as follows. Let each person $i$ go to location $i + X_i$ where $X_i$ is a Gaussian random variable with mean 0 and standard deviation σ
. (Such a random variable can be generated using np.random.normal(loc=0, scale=σ)
. Generate new random variables each day.) Then sort people in order of location. The first two people in this sorted list form a pair, the second two form a pair, and so on. Thus, if σ
is small then people will mostly socialize with the same people, and if σ
is very large then it will be like the full random mixing from tick 1a.
To submit your answer,
q = GRADER1b.fetch_question('q9')
m1,m2 = localpairs(q.n, q.σ)
ans = {'n': len(np.unique(np.concatenate([m1,m2]))), 's': np.std(np.abs(m1-m2))}
GRADER1b.submit_answer(q, ans)
Question (q10). Modify the simulator to use localpairs
. Run simulations with the same parameters as in question (q5) but with p0=0
and σ
either 5, or 20, or 100. Run five simulations for each value of σ
. Plot the percentage of the population infected.
Again, you don't have to submit your plot to the autograder, but you must include it in your submitted notebook. Your plots should look something like this.
Question (q11). Compute the R number for a simulation run.
To submit your answer,
q = GRADER1b.fetch_question('q11')
x = sim_with_locality(N=q.N, T=q.t1, i0=q.i0, p0=q.p0, prob=q.prob, σ=q.σ)
ans = r(x, t0=q.t0, t1=q.t1, days=1/q.prob['recover'])
GRADER1b.submit_answer(q, ans)