Coursework 3: probabilistic neural networks

In this assignment, you will use an autoencoder to learn the structure of the MNIST dataset.

What to submit. Your answers should contain an explanation of what you do, and 2–4 central commands to achieve it. Complete listings are unnecessary. The focus of your answer should be interpretation: explain what the numerical values and graphs you produce mean, and why they are as they are. The text of your answer to each question should be no more than a paragraph or two.

What to optimize. This coursework is about the probabilistic interpretation of neural networks, not about neural network design. The networks described here can be trained reasonably well in 5 epochs, which takes tens of minutes on a low-end laptop. (Also see the appendix for how to save and load your models.) You don't need to optimize your neural network designs, you don't need to run cross-validation for hyperparameters, and you don't need to implement early stopping, etc.

Data import

The MNIST dataset can be obtained with the Python package torchvision. Each datapoint is a pair (numpy_array, int) representing an image and its label.
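For example, one way to fetch the dataset (here using the ToTensor transform, which returns each image as a 1×28×28 tensor of values in [0,1]; converting to a numpy array instead works equally well):

import torchvision

# download MNIST; each datapoint is then an (image, label) pair
mnist = torchvision.datasets.MNIST(
    root='data', download=True, train=True,
    transform=torchvision.transforms.ToTensor())
img, lbl = mnist[0]   # first image and its integer label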

Many PyTorch building blocks assume that the data comes in batches. The DataLoader converts the full mnist list [(img,lbl),...] into batches [(img_batch,lbl_batch), ...], where each img_batch is an array with an extra batch dimension prepended.
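For instance (the batch size of 100 is illustrative):

from torch.utils.data import DataLoader

# each img_batch has shape [100, 1, 28, 28] and each lbl_batch has shape [100]
mnist_batched = DataLoader(mnist, batch_size=100, shuffle=True)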

Question (a)

In the appendix is code for the autoencoder described in the lecture notes. The generator is for Bernoulli images, and the encoder is Gaussian. Train it on the MNIST images.
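For reference, a single optimization step might look like the following, assuming the model exposes a per-image ELBO via a loglik_lb method as in the sketch in the appendix (the method name is an assumption); this would be the body of the training loop shown under "Interactive training":

optimizer.zero_grad()
elbo = mymodel.loglik_lb(imgs).mean()   # average ELBO over the batch
(-elbo).backward()                      # maximize the ELBO by minimizing its negative
optimizer.step()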

Question (b)

Modify the autoencoder code so that it uses 100 samples for the Monte Carlo approximation. (This isn't helpful for training, but it's a good idea if we want to actually know the log likelihood of a particular datapoint.)

Show some likely and some unlikely images. Can you identify any features that make an image likely or unlikely?
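One way to use the 100 samples is an importance-weighted estimate of log p(x), a tighter alternative to simply averaging 100 one-sample ELBO estimates. Here is a hedged sketch, assuming the interface from the appendix sketch (model(x) returns the parameters of q(z|x), and model.f.loglik gives the Bernoulli log-likelihood); sorting the test images by this estimate and displaying the two extremes gives the likely and unlikely images asked for:

import math
import torch

def estimate_loglik(model, x, K=100):
    # importance-weighted estimate of log p(x), one value per image:
    # log p(x) ≈ logsumexp_k[log p(x|z_k) + log p(z_k) - log q(z_k|x)] - log K
    mu, logsigma2 = model(x)                   # parameters of q(z|x) from the encoder
    sigma = (0.5 * logsigma2).exp()
    terms = []
    for _ in range(K):
        eps = torch.randn_like(mu)
        z = mu + sigma * eps                   # sample z_k from q(z|x)
        logp_x_given_z = model.f.loglik(x, z)  # Bernoulli decoder log-likelihood
        logp_z = (-0.5*z**2 - 0.5*math.log(2*math.pi)).sum(dim=1)                  # N(0,I) prior
        logq_z = (-0.5*eps**2 - 0.5*logsigma2 - 0.5*math.log(2*math.pi)).sum(dim=1)
        terms.append(logp_x_given_z + logp_z - logq_z)
    return torch.logsumexp(torch.stack(terms), dim=0) - math.log(K)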

Question (c)

The autoencoder in the appendix defaults to using 4 dimensions for the latent space. Train another version with 20 dimensions. Evaluate them. Explain carefully your grounds for comparison.

Don't worry about making a "fair comparison" in the sense of e.g. putting in equal training effort; simply train for a fixed number of epochs, and treat the resulting neural networks as given. The focus of your answer should be on evaluation metrics.

Optional. Also compare to the PyTorch example autoencoder.
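One concrete basis for comparison, assuming the estimate_loglik sketch from Question (b), with model4, model20, and mnist_test_batched as hypothetical names for the two trained models and a test-set DataLoader: compare the average estimated log-likelihood per held-out image (higher is better).

with torch.no_grad():
    ll4  = torch.cat([estimate_loglik(model4,  imgs, K=100) for imgs, _ in mnist_test_batched])
    ll20 = torch.cat([estimate_loglik(model20, imgs, K=100) for imgs, _ in mnist_test_batched])
print(ll4.mean().item(), ll20.mean().item())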

Question (d)

Consider the generative model $X=f(Z,y)$ where $y\in\{0,1,\dots,9\}$ is the image label, and $Z$ is a latent random variable. The hope is that $Z$ should capture the "style" of the digit: thus we could generate a stylistically similar set of digits by fixing $Z$ and varying $y$, or we could generate random samples of a single digit by fixing $y$ and varying $Z$.
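This is not the handout's code, but a sketch of how the generator f(Z, y) might take the label as input, e.g. by concatenating z with a one-hot encoding of y (the class name and layer sizes are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalGenerator(nn.Module):
    # decoder for X = f(Z, y): condition on the label by concatenating
    # a one-hot encoding of y onto the latent vector z
    def __init__(self, d=4):
        super().__init__()
        self.d = d
        self.f = nn.Sequential(
            nn.Linear(d + 10, 128), nn.ReLU(),
            nn.Linear(128, 28*28), nn.Sigmoid())
    def forward(self, z, y):
        y_onehot = F.one_hot(y, num_classes=10).float()
        return self.f(torch.cat([z, y_onehot], dim=1)).reshape(-1, 1, 28, 28)

# stylistically similar digits: fix z (the "style") and vary the label y
gen = ConditionalGenerator(d=4)
z = torch.randn(1, 4).expand(10, 4)
digits = gen(z, torch.arange(10))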

Question (e)

Explain how you might implement an autoencoder for random sequences, such as the letters of a name. You need only describe the design; you do not need to implement it.

This is an open-ended question, and there are diverse opinions about the right way to do it. In your answer you should describe the issues, but you shouldn't write more than two paragraphs.

Appendix

Autoencoder
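The full autoencoder code from the handout is not reproduced here. A minimal sketch consistent with the description in Question (a), i.e. a Gaussian encoder and a Bernoulli image generator with a 4-dimensional latent space by default, might look as follows; the class names, layer sizes, and the loglik_lb method are illustrative assumptions rather than the course's own code:

import torch
import torch.nn as nn

class BernoulliImageGenerator(nn.Module):
    # decoder: maps a latent vector z to per-pixel Bernoulli probabilities
    def __init__(self, d=4):
        super().__init__()
        self.d = d
        self.f = nn.Sequential(
            nn.Linear(d, 128), nn.ReLU(),
            nn.Linear(128, 28*28), nn.Sigmoid())
    def forward(self, z):
        return self.f(z).reshape(-1, 1, 28, 28)
    def loglik(self, x, z):
        # log Pr(x|z) under the Bernoulli model, one value per image
        xr = self(z)
        return (x*torch.log(xr) + (1-x)*torch.log(1-xr)).sum(dim=(1,2,3))

class GaussianEncoder(nn.Module):
    # encoder: maps an image to the mean and log-variance of q(z|x)
    def __init__(self, decoder):
        super().__init__()
        self.f = decoder
        self.d = decoder.d
        self.g = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 128), nn.ReLU(),
            nn.Linear(128, 2*self.d))
    def forward(self, x):
        mu, logsigma2 = self.g(x).chunk(2, dim=1)
        return mu, logsigma2
    def loglik_lb(self, x):
        # one-sample Monte Carlo estimate of the ELBO, one value per image
        mu, logsigma2 = self(x)
        kl = 0.5 * (mu**2 + logsigma2.exp() - logsigma2 - 1).sum(dim=1)
        eps = torch.randn_like(mu)
        z = mu + (0.5*logsigma2).exp() * eps    # reparameterization trick
        return self.f.loglik(x, z) - kl

mymodel = GaussianEncoder(BernoulliImageGenerator(d=4))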

Interactive training

You may find it helpful to save and load trained models:

torch.save(mymodel.state_dict(), 'filename.pt')

mymodel = MyModel()
mymodel.load_state_dict(torch.load('filename.pt'))

I find it helpful to be able to train, interrupt, inspect, and then resume training. For this purpose, I set up the model and the data cycler in one cell

mymodel = ...
iter_mnist = enumerate_cycle(mnist_batched, shuffle=False)

and I do the optimization in a second cell

optimizer = optim.Adam(mymodel.parameters())
with Interruptable() as check_interrupted:
    for (epoch,batch_num),(imgs,lbls) in iter_mnist:
        check_interrupted()
        ... # optimization step

This way I can interrupt (Kernel | Interrupt), inspect the state, and resume, and it will pick up exactly where it left off.
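The enumerate_cycle and Interruptable helpers used above are not part of PyTorch. If they are not supplied with the course materials, implementations along the following lines (an assumption, consistent with how they are used above) would work:

import signal

def enumerate_cycle(iterable, shuffle=False):
    # yield ((epoch, batch_num), item) forever, cycling through the iterable
    # (with shuffle=True one might reshuffle the order of batches each epoch)
    epoch = 0
    while True:
        for batch_num, item in enumerate(iterable):
            yield (epoch, batch_num), item
        epoch += 1

class Interruptable:
    # context manager that turns Kernel|Interrupt (SIGINT) into a flag,
    # so the training loop can stop cleanly at the next check_interrupted()
    class Interrupted(Exception):
        pass
    def __enter__(self):
        self.interrupted = False
        self.orig_handler = signal.signal(signal.SIGINT, self._handle)
        return self._check
    def __exit__(self, exc_type, exc_value, traceback):
        signal.signal(signal.SIGINT, self.orig_handler)
        return exc_type is Interruptable.Interrupted   # swallow our own exception
    def _handle(self, signum, frame):
        self.interrupted = True
    def _check(self):
        if self.interrupted:
            raise Interruptable.Interrupted()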