Stoat-finding particle filter

Our friend Data Stoat has gone missing! The GPS sensor that they normally carry has stopped working. But Data still has a low-res camera with mobile uplink, so we know what sort of scenery they're in. Can you help find Data Stoat?

Submit your answer on Moodle, in the form of a greyscale png file the same dimensions as the map. Your png should be a heatmap, where each pixel is coloured according to the probability that Data Stoat is in that location, white = highest probability, black = zero probability. I will normalize the pixels to form a proper probability distribution, and your score will be the probability you assign to Data Stoat's true location.

Hidden Markov models

The underlying probability model is a hidden Markov model, where $X_n$ is the location at timestep $n$ and $Y_n$ is the noisy observation. We wish to compute the distribution of $X_n$ given observations $(y_0,\dots,y_n)$.

$$ \begin{array}{ccccccc} X_0 & \to & X_1 & \to & X_2 & \to & \cdots\\ \downarrow & & \downarrow & & \downarrow & & \\ Y_0 & & Y_1 & & Y_2 & & \end{array} $$

We can find this distribution iteratively. Suppose we know the distribution of $(X_{n-1} | y_0,\dots,y_{n-1})$. Then we can find the distribution of $(X_n | y_0,\dots,y_{n-1})$ using the law of total probability and memorylessness,

$$ \Pr(x_n|y_0,\dots,y_{n-1}) = \sum_{x_{n-1}} \Pr(x_n |x_{n-1}) \Pr(x_{n-1}|y_0,\dots,y_{n-1}) $$

and then we can find the distribution of $(X_n | y_0,\dots,y_n)$ using Bayes's rule and memorylessness,

$$ \Pr(x_n|y_0,\dots,y_{n-1},y_n) = \text{const}\times\Pr(x_n|y_0,\dots,y_{n-1}) \Pr(y_n|x_n). $$

The method is laid out in Example Sheet 4, which you should attempt first, before coding. The example sheet asks you to solve the equations exactly, producing a vector of probabilities $\Pr(x_n|y_0,\dots,y_n)$, one for each location $x_n$ on the map. But when the map is large, it's not practical to compute either the sum or the normalizing constant. Instead, we can use an empirical approximation.

Empirical approximation

The idea of empirical approximation is that, instead of working with probability distributions, we work with weighted samples, and we choose the weights so as to approximate the distribution we're interested in. Formally, if we want to approximate the distribution of a random variable $Z$, and we have a collection of points $z_1,\dots,z_n$, we want weights such that $$ \mathbb{P}(Z\in A) \approx \sum_{i=1}^n w_i 1_{z_i\in A}. $$ We've used weighted samples in two ways in the course.

This practical will use an empirical approximation technique called a particle filter. In the particle filter, we'll use weighted samples to approximate the distributions of $(X_n | y_0,\dots,y_{n-1})$ and of $(X_n|y_0,\dots,y_n)$. Each sample is called a particle.

Data import

We have data from several animals which are wandering over a terrain. The animals are equipped with GPS and with cameras, but the GPS on animal 0 has stopped working. We would like to find out where this animal is.

Here is the terrain and the GPS+camera data. The camera records roughly the rgb values of the animal's current location, though there is some noise.

Coding note. Given an image map_image, we can treat it as a simple array, and get at the pixels with map_image[i,j]. I'll interpret this as x-coordinate i and y-coordinate j. However, the plotting routine ax.imshow interprets the first coordinate as y and the second as x, so I need to transpose the array before plotting it.
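For example, here is a minimal sketch of that convention, assuming map_image has been loaded as a NumPy array of shape (width, height, 3):

```python
import numpy as np
import matplotlib.pyplot as plt

# map_image[i,j] is the pixel at x-coordinate i, y-coordinate j.
# ax.imshow wants the first axis to be y, so swap the two spatial axes
# before plotting; the axes argument (1,0,2) leaves the colour channel alone.
fig, ax = plt.subplots()
ax.imshow(np.transpose(map_image, (1, 0, 2)))
plt.show()
```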

0. Empirical approximation of $X_0$

We'll start with a prior belief that the animal's location $X_0$ is uniformly distributed over the map.

The first step is to create a sample from this initial distribution. We'll store it as an $M\times 3$ array, one row per particle, with columns for $x$ coordinate, $y$ coordinate, and weight $w$. The prior distribution is uniform, so $w=1/M$. We'll call this matrix δ0.
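For instance, a minimal sketch, again assuming the map is in a NumPy array map_image, and using $M=2000$ particles (the sample size mentioned later in this practical):

```python
import numpy as np

rng = np.random.default_rng()
M = 2000   # number of particles

δ0 = np.column_stack([
    rng.uniform(0, map_image.shape[0], size=M),   # x coordinate
    rng.uniform(0, map_image.shape[1], size=M),   # y coordinate
    np.full(M, 1/M),                              # uniform prior weight
])
```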

Here's also a handy function, show_particles, to visualize the particles. A sketch of what it might look like (the marker scaling is an arbitrary choice):
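```python
import matplotlib.pyplot as plt

def show_particles(particles, ax=None):
    # particles: an M×3 array of (x, y, weight)
    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(np.transpose(map_image, (1, 0, 2)))
    # scale the marker area by weight, so heavier particles stand out
    w = particles[:, 2]
    ax.scatter(particles[:, 0], particles[:, 1],
               s=20 * w / w.mean(), color='red', marker='.')
    return ax

show_particles(δ0)
```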

1. Empirical approximation of $(X_0|y_0)$: updated weights

Let's first find the distribution of $(X_0|y_0)$, where $y_0$ is the first observation. Bayes's rule tells us the distribution:

$$ \operatorname{Pr}_{X_0}(x|y_0)=\text{const}\times \operatorname{Pr}_{X_0}(x)\Pr(y_0|X_0=x). $$

We've seen the empirical approximation for this, in the lecture notes on computational Bayes's rule. We take a sample from $X_0$, and we set the weight of sample point $p$ to be proportional to $\Pr(y_0|X_0=p)$. We already built ourselves a sample from $X_0$ in the previous section. We'll call our new weighted particles π0.

To apply Bayes's rule, we need a probability model for $Y_0$ given a particle's location. A reasonable guess is that $Y_0$ is a noisy version of the colour of the patch around the supposed location. A handy utility to extract the average colour of a patch might look like this sketch (the window half-width size=3 is an arbitrary choice):
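```python
def patch(im, loc, size=3):
    # Average colour of a square window of half-width `size` around loc,
    # clipped so the window stays inside the image.
    x, y = int(loc[0]), int(loc[1])
    x0, x1 = max(x - size, 0), min(x + size + 1, im.shape[0])
    y0, y1 = max(y - size, 0), min(y + size + 1, im.shape[1])
    return im[x0:x1, y0:y1].mean(axis=(0, 1))
```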

#TODO: TASK 1

Implement a pr(y,loc) function to compute the probability of observing y if the true location is loc.

It's up to you to invent the probability model. A reasonable probability model is that the observed rgb values in y are independent Gaussian random variables, with mean patch(map_image, loc), and with a standard deviation that you should pick. If you want to be cleverer, consider a probability model based on closeness in HSL colour space.
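A minimal sketch of the Gaussian version (σ = 20 is a placeholder; choosing it well is part of the task):

```python
σ = 20.0   # rgb noise level, in pixel-value units; an assumption to tune

def pr(y, loc):
    # Probability, up to a constant, of observing rgb triple y if the true
    # location is loc, treating the three channels as independent Gaussians
    # centred on the local patch colour. Constant factors don't matter,
    # because the weights get rescaled anyway.
    μ = patch(map_image, loc)
    return np.exp(-np.sum((np.asarray(y) - μ) ** 2) / (2 * σ ** 2))
```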

Now we can set the weights of all the particles, and thereby obtain a sample of $(X_0|y_0)$.
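Concretely, assuming the first camera observation is stored in a variable y0:

```python
π0 = δ0.copy()
π0[:, 2] = [pr(y0, loc) for loc in π0[:, :2]]   # weight ∝ Pr(y0 | X0 = loc)
π0[:, 2] /= π0[:, 2].sum()                      # rescale to sum to one
```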

2. Empirical approximation of $(X_1|y_0)$: wandering particles

The next step is to find the distribution of the animal's location, after a timestep. Formally, we want to find the distribution of $(X_1|y_0)$.

We believe that the animal's location follows a Markov chain; in other words, $X_1$ is generated from $X_0$ according to some random transition. It's easy to generate a sample of $X_1$ values: just take a sample of $X_0$ values, and apply a random transition to each of the particles. Likewise, if we have a weighted sample representing $(X_0|y_0)$, and we apply a random transition to each particle, leaving the weights unchanged, we'll get a weighted sample representing $(X_1|y_0)$.

We'll now compute a weighted sample for $(X_1|y_0)$, and call it δ1. To do this, we need a probability model for how the animal moves in each timestep.

#TODO: TASK 2

Implement a walk(loc) function to simulate the animal's movement in one timestep.

A reasonable probability model is that the animal chooses a direction uniformly in the range $[0,2\pi)$, and then chooses a random distance, for example an Exponential random variable with mean 5. And then truncate the position to ensure it lies on the map — otherwise the patch function won't work.
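Here is a sketch implementing exactly that model (it reuses the rng generator from earlier):

```python
def walk(loc):
    # One timestep: uniform random direction, Exponential(mean 5) distance.
    θ = rng.uniform(0, 2 * np.pi)
    d = rng.exponential(5)
    x = loc[0] + d * np.cos(θ)
    y = loc[1] + d * np.sin(θ)
    # Truncate to keep the position on the map, so that patch() still works.
    x = min(max(x, 0), map_image.shape[0] - 1)
    y = min(max(y, 0), map_image.shape[1] - 1)
    return np.array([x, y])
```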

Now we can apply this movement to all the particles, and thereby obtain a sample of $(X_1|y_0)$.
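In code, leaving the weights untouched:

```python
δ1 = π0.copy()
δ1[:, :2] = [walk(loc) for loc in π0[:, :2]]   # move; weights stay as they were
```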

3. Iterate

Now we can apply these two update steps iteratively, updating the sample based on successive observations. The two steps of the iteration are

  1. Given a weighted sample of $(X_{n-1}|y_0,\dots,y_{n-1})$, get a weighted sample of $(X_n|y_0,\dots,y_{n-1})$ by making each particle take a random step.
  2. Given a weighted sample of $(X_n|y_0,\dots,y_{n-1})$, get a weighted sample of $(X_n|y_0,\dots,y_n)$ by multiplying the weight of each particle $p$ by $\Pr(y_n|X_n=p)$ and then rescaling the weights.

(Step 2 is a generalization of the Bayes update rule. The simple Bayes update rule is for when we start with a uniformly-weighted sample from the prior, and this version applies when we start with a weighted sample from the prior.)
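Putting the two steps together, a sketch of the loop (here observations is a hypothetical list holding the camera readings $y_1,y_2,\dots$ for animal 0):

```python
particles = π0.copy()   # weighted sample of (X0 | y0), from above
for y in observations:
    # step 1: every particle takes a random step; weights unchanged
    particles[:, :2] = [walk(loc) for loc in particles[:, :2]]
    # step 2: multiply each weight by Pr(y_n | X_n = p), then rescale
    particles[:, 2] *= [pr(y, loc) for loc in particles[:, :2]]
    particles[:, 2] /= particles[:, 2].sum()
```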

When you run this, you will likely find that the output is completely useless! The problem is numerical instability. We're only using 2000 samples, and many of them get assigned weights that are almost zero, so we effectively end up with a tiny sample.

#TODO: TASK 3

Plot a histogram of particle weights, after 0, 1, 5, 50, and 100 timesteps.
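If you keep a copy of the weight vector at each iteration of the loop, the plots themselves are straightforward. A sketch, with weight_history as a hypothetical list of saved weight vectors, one per timestep:

```python
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for ax, n in zip(axes, [0, 1, 5, 50, 100]):
    ax.hist(weight_history[n], bins=50)
    ax.set_title(f'after {n} timesteps')
plt.show()
```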

4. Particle hygiene

Empirical approximations work better when the weights are reasonably well distributed. There's nothing stopping us from modifying our set of particles to balance out the weights, as long as we don't mess up the empirical approximation. For example, we could take a particle with excessively high weight, and split it into two particles each with half the weight. After the next walk step, those two particles will diverge, and so we'll hopefully get a good spread.

#TODO: TASK 4

Implement a function prune_spawn(particles). This should delete the lowest-weighted 20% of the particles. Then it should take the highest-weighted 20% of the particles, and split them in two. In other words, it should create a duplicate at the same location, and give both the original and the duplicate half the weight. The two versions will diverge in the future, as they take different steps.
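Here's a sketch of one way to implement it. The particle count stays the same, since we delete as many as we spawn; the total weight dips slightly below one, but the next rescaling step fixes that.

```python
def prune_spawn(particles):
    order = np.argsort(particles[:, 2])   # sort by weight, lightest first
    particles = particles[order]
    k = len(particles) // 5               # 20% of the particles
    particles = particles[k:]             # prune the lowest-weighted 20%
    # Spawn: halve the weights of the top 20%, then append duplicates
    # of those rows, giving two half-weight copies at each location.
    particles[-k:, 2] /= 2
    return np.vstack([particles, particles[-k:]])
```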

Apply this function every iteration, and show an animation of the result.

After processing 100 observations, you should see something like the picture in the cell below.

#TODO: TASK 5

Smooth your predictions in whatever way you think is appropriate. Create a greyscale png file the same dimensions as the map. Your png should be a heatmap, where each pixel is coloured according to the probability distribution of the final hidden state $X_N$, white = highest probability, black = zero probability.

(For scoring, the pixel values will be normalized to sum to one, so it doesn't matter what value you assign to the color white.)
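One possible approach, sketched below: deposit the final particle weights into a pixel grid, blur, and save. The Gaussian blur and its width are my choices, not part of the task statement.

```python
from scipy.ndimage import gaussian_filter
from PIL import Image

heat = np.zeros(map_image.shape[:2])
for x, y, w in particles:
    heat[int(x), int(y)] += w             # deposit each particle's weight
heat = gaussian_filter(heat, sigma=5)     # smooth; sigma=5 is arbitrary
heat = heat / heat.max() * 255            # white = highest probability
# transpose so the first axis is y, matching image-file conventions
Image.fromarray(heat.T.astype(np.uint8), mode='L').save('heatmap.png')
```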