Exercise: probabilistic neural networks

In this assignment you will first investigate a simple regression model, and then you will use an autoencoder to learn the structure of the MNIST dataset.

What to submit. Your answers should contain an explanation of what you do, and 2–4 central commands to achieve it. The focus of your answer should be interpretation: explain what the numerical values and graphs you produce mean, and why they are as they are. The text of your answer to each question should be no more than a paragraph or two. Marks will be awarded based on the clarity and insight in your explanations.

DO NOT SUBMIT FULL SOURCE CODE, unless it is included as an appendix. Do not repeat the question text in your answers. If you submit your answers as a Jupyter notebook, structure the notebook in two sections: a section at the top for the examiner to read with just your answers and trimmed code snippets, and a section at the bottom with all your working code.

What to optimize. This coursework is about the probabilistic interpretation of neural networks, not about neural network design. The networks described here can be trained reasonably well in ten minutes on a low-end laptop. (Also see the appendix for how to load and save your models.) You don't need to optimize your neural network designs, you don't need to run cross-validation for hyperparameters, and you don't need to implement early-stopping etc.

DATASETS

Questions (a) and (b) use the xkcd dataset. Each datapoint is a pair $(x,y)$ of real numbers.

Questions (c)–(f) use the MNIST dataset. This can be obtained with the Python package torchvision. Each datapoint is a pair [numpy_array, int] representing an image and its label.

Many PyTorch building blocks assume that the data comes in batches. The DataLoader converts the full mnist list [(img,lbl), ...] into a sequence of batches [(img_batch,lbl_batch), ...], where each img_batch is an array with an extra batch dimension prepended.
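For example, here is one way to obtain the dataset and batch it (a sketch only: the root path, transform, and batch size are choices, not requirements, and with ToTensor the images come back as torch tensors rather than numpy arrays):

import torch.utils.data
import torchvision

# Download MNIST; each item becomes (1x28x28 tensor, int label)
mnist = torchvision.datasets.MNIST(
    root='data', download=True, train=True,
    transform=torchvision.transforms.ToTensor())

# Batch it: each img_batch has shape [batch_size, 1, 28, 28]
mnist_batched = torch.utils.data.DataLoader(mnist, batch_size=100, shuffle=True)

imgs, lbls = next(iter(mnist_batched))
print(imgs.shape, lbls.shape)   # torch.Size([100, 1, 28, 28]) torch.Size([100])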

QUESTIONS

Question (a)

Question (b)

I find that my fitted heteroscedastic model achieves a higher training log likelihood than the RWiggle model, but I also observe that during training its log likelihood frequently dips much lower.

Question (c)

In the appendix is code for the autoencoder described in the lecture notes. The generator is for Bernoulli images, and the encoder is Gaussian. Train it on MNIST images.
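As a reminder (the notation here may differ slightly from the lecture notes): with a diagonal Gaussian encoder $q_\phi(z\mid x)$ and a Bernoulli generator producing pixel probabilities $\pi_\theta(z)$, training maximizes the evidence lower bound
$$\log p_\theta(x) \;\ge\; \mathbb{E}_{z\sim q_\phi(z\mid x)}\Big[\sum_i x_i\log\pi_{\theta,i}(z) + (1-x_i)\log\big(1-\pi_{\theta,i}(z)\big)\Big] \;-\; \mathrm{KL}\big(q_\phi(z\mid x)\,\|\,\mathcal{N}(0,I)\big),$$
where the KL term has a closed form and the expectation is approximated by sampling $z$ from the encoder.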

Question (d)

Modify the autoencoder code so that it uses 100 samples for Monte Carlo approximation. (This isn't helpful for training, but it's a good idea if we want to actually know the log likelihood of a particular datapoint.)

Show some likely and some unlikely images. Can you identify any features that make an image likely or unlikely?
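One possible shape for this estimate (a sketch only: the method names encode and decode are placeholders for whatever the appendix code calls them) is to keep the closed-form KL term and average the reconstruction term over K samples from the encoder:

import torch

def elbo_multisample(model, img, K=100):
    # Hypothetical interface: model.encode(img) returns the mean and standard
    # deviation of q(z|x); model.decode(z) returns Bernoulli pixel probabilities.
    mu, sigma = model.encode(img)
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions
    kl = 0.5 * torch.sum(sigma**2 + mu**2 - 1 - 2*torch.log(sigma))
    recon = 0.0
    for _ in range(K):
        z = mu + sigma * torch.randn_like(sigma)   # reparameterized sample
        probs = model.decode(z)
        recon = recon + torch.sum(img*torch.log(probs) + (1-img)*torch.log(1-probs))
    return recon / K - kl

Ranking a sample of test images by this per-image score is one way to pick out likely and unlikely examples.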

Question (e)

The autoencoder in the appendix defaults to using 4 dimensions for the latent space. Train another version with 20 dimensions. Evaluate them. Explain carefully your grounds for comparison.

Don't worry about making a "fair comparison" in the sense of e.g. putting in equal training effort; simply train for a fixed number of epochs, and treat the resulting neural networks as given. The focus of your answer should be on evaluation metrics.
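For instance (a sketch reusing the per-image estimator sketched in question (d); model4, model20, and test_imgs are placeholder names for your two trained models and a held-out set), one natural metric is the average held-out log likelihood estimate under each model:

import numpy as np
import torch

def avg_score(model, imgs, K=100):
    with torch.no_grad():
        return np.mean([elbo_multisample(model, img, K).item() for img in imgs])

print(avg_score(model4, test_imgs), avg_score(model20, test_imgs))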

Optional. Also compare to the PyTorch example autoencoder.

Question (f)

Consider the generative model $X=f(Z,y)$ where $y\in\{0,1,\dots,9\}$ is the image label, and $Z$ is a latent random variable. The hope is that $Z$ should capture the "style" of the digit: thus we could generate a stylistically similar set of digits by fixing $Z$ and varying $y$, or we could generate random samples of a single digit by fixing $y$ and varying $Z$.
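A minimal sketch of one way the generator might take the label as input (the class name and layer sizes here are illustrative, not a required design): concatenate $z$ with a one-hot encoding of $y$ and decode as before.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalDecoder(nn.Module):
    # Maps (z, y) to Bernoulli pixel probabilities; sizes are illustrative.
    def __init__(self, latent_dim=4, num_classes=10):
        super().__init__()
        self.num_classes = num_classes
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 400),
            nn.ReLU(),
            nn.Linear(400, 28*28),
            nn.Sigmoid())
    def forward(self, z, y):
        y_onehot = F.one_hot(y, num_classes=self.num_classes).float()
        return self.net(torch.cat([z, y_onehot], dim=-1)).reshape(-1, 1, 28, 28)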

Appendix

Wiggly line

Autoencoder

Interactive training

You may find it helpful to save and load trained models:

torch.save(mymodel.state_dict(), 'filename.pt')

mymodel = MyModel()
mymodel.load_state_dict(torch.load('filename.pt'))

I find it helpful to be able to train, interrupt, inspect the state, and then resume training. For this purpose, I set up the model and the data cycler in one cell

mymodel = ...
iter_mnist = enumerate_cycle(mnist_batched, shuffle=False)

and I do the optimization in a second cell

optimizer = optim.Adam(mymodel.parameters())
with Interruptable() as check_interrupted:
    for (epoch,batch_num),(imgs,lbls) in iter_mnist:
        check_interrupted()
        ... # optimization step

This way I can interrupt (Kernel | Interrupt), inspect the state, and resume, and it will pick up exactly where it left off.
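The helpers enumerate_cycle and Interruptable used above come with the appendix code. In case you want to see roughly what they do, here is one possible minimal implementation (the provided versions may differ in detail):

import random
import signal

class Interruptable:
    # Turns Kernel|Interrupt (SIGINT) into a flag, so the loop stops at a safe point.
    class Interrupted(Exception):
        pass
    def __enter__(self):
        self.interrupted = False
        self.orig_handler = signal.signal(signal.SIGINT, self._handle)
        return self._check
    def __exit__(self, exc_type, exc_value, traceback):
        signal.signal(signal.SIGINT, self.orig_handler)
        return exc_type is Interruptable.Interrupted   # swallow our own exception
    def _handle(self, signum, frame):
        self.interrupted = True
    def _check(self):
        if self.interrupted:
            raise Interruptable.Interrupted()

def enumerate_cycle(batches, shuffle=False):
    # Yields ((epoch, batch_num), batch) forever, restarting the data each epoch.
    epoch = 0
    while True:
        items = list(enumerate(batches))
        if shuffle:
            random.shuffle(items)
        for batch_num, batch in items:
            yield (epoch, batch_num), batch
        epoch += 1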