Modelling and machine learning¶

Coursework: variational autoencoder¶

DATA¶

This assessment uses the MNIST dataset, which can be obtained with the Python package torchvision. Each datapoint is an (image, label) pair: with the ToTensor transform used below, the image is a 1×28×28 PyTorch tensor and the label is an int.

Many PyTorch building blocks assume that the data comes in batches. The DataLoader converts the full mnist list [(img,lbl),...] into batches [(img_batch,lbl_batch), ...], where each img_batch is a tensor with an extra batch dimension prepended.

In [2]:
import torchvision

mnist = torchvision.datasets.MNIST(
    root = 'pytorch-data/',  # where to put the files
    download = True,         # if files aren't here, download them
    train = True,            # whether to import the test or the train subset
    # PyTorch uses PyTorch tensors internally, not numpy arrays, so convert them.
    transform = torchvision.transforms.ToTensor()
)


# Images can be plotted with matplotlib imshow
import torch
import matplotlib.pyplot as plt
show = [mnist[i] for i in [59289, 28001, 35508, 43876, 23627, 14028]]
show = torch.stack([img for img,lbl in show])
x = torchvision.utils.make_grid(show, nrow=6, pad_value=1)
fig,ax = plt.subplots()
ax.imshow(x.numpy().transpose((1,2,0)))
ax.axis('off')
plt.show()

mnist_batched = torch.utils.data.DataLoader(mnist, batch_size=100)
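
As a quick sanity check of the batching described above (shapes below assume batch_size=100 and the ToTensor transform):

In [ ]:
img_batch, lbl_batch = next(iter(mnist_batched))
print(img_batch.shape)   # torch.Size([100, 1, 28, 28]) -- batch dimension prepended
print(lbl_batch.shape)   # torch.Size([100])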

QUESTIONS¶

Question (a) [3 marks]¶

The appendix contains code for the variational autoencoder described in the lecture notes: the generator produces Bernoulli images, and the encoder is Gaussian. Train it on the MNIST images.

  • For each of images 59289, 28001, 35508, and 43876, show the original image together with two sampled reconstructions.

  • Add varying amounts of noise to image 35508, and reconstruct the resulting noisy images.

  • Generate 6 random images.

  • Take two images, 28001 and 43876, and create a sequence of images that interpolates between them, using simple linear interpolation in the latent space (a rough sketch of this follows the list).
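
One rough sketch of the latent-space interpolation, assuming `model` is a trained GaussianEncoder from the appendix wrapping a BernoulliImageGenerator (the function name and number of steps are illustrative):

In [ ]:
def interpolate(model, img_a, img_b, steps=8):
    # Encode both images and use the posterior means as their latent codes,
    # then decode points on the straight line between them.
    with torch.no_grad():
        μa,_ = model(img_a.unsqueeze(0))    # shape [1,d]
        μb,_ = model(img_b.unsqueeze(0))
        ts = torch.linspace(0, 1, steps)
        zs = torch.stack([(1-t)*μa[0] + t*μb[0] for t in ts])  # [steps,d]
        return model.f(zs)                  # decoded images, [steps,1,28,28]

# e.g. interpolate(model, mnist[28001][0], mnist[43876][0])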

[Note: This assessment is about the probabilistic interpretation of neural networks, not about neural network design. The autoencoder in the appendix can be trained reasonably well in ten minutes on a low-end laptop. You don't need to optimize the neural network designs, you don't need to run cross-validation for hyperparameters, and you don't need to implement early-stopping etc.]

Question (b) [4 marks]¶

Modify the autoencoder code so that it uses 100 samples for Monte Carlo approximation. (This isn't helpful for training, but it's a good idea if we want to actually know the log likelihood of a particular datapoint.)
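
One reading of this modification, as a minimal sketch (num_samples is an illustrative argument name): replace the single-sample loglik_lb in the appendix so that the reconstruction term is averaged over many draws of ε.

In [ ]:
# inside GaussianEncoder, replacing the appendix's single-sample loglik_lb
def loglik_lb(self, x, num_samples=100):
    μ,σ = self(x)
    kl = 0.5 * (μ**2 + σ**2 - torch.log(σ**2) - 1).sum(1)
    # Average the reconstruction term over num_samples draws of ε
    ll = torch.stack([self.f.loglik(x, z=μ + σ*torch.randn_like(σ))
                      for _ in range(num_samples)]).mean(0)
    return ll - kl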

Show some likely and some unlikely images. Can you identify any features that make an image likely or unlikely?

[Note: Parts (b) and (c) are invitations to investigate. Masters-level questions are the jumping-off point for an investigation, unlike undergraduate questions, which just ask for a solution. If you answer exactly the questions as they are spelled out in the assignment, and don’t think of any other angles, you won’t get full marks in a masters course. It’s not appropriate to conduct investigations for every single question, but in this assignment parts (b) and (c) invite you to go further.]

Question (c) [4 marks]¶

The autoencoder in the appendix defaults to using 4 dimensions for the latent space. Train another version with 20 dimensions. Evaluate the two models, and explain carefully your grounds for comparison.

[Note: Don't worry about making a "fair comparison" in the sense of e.g. putting in equal training effort; simply train for a fixed number of epochs, and treat the resulting neural networks as given. The focus of your answer should be on evaluation metrics.]

Optional. Also compare to the PyTorch example autoencoder.
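
One starting point for an evaluation metric, as a hedged sketch (model4 and model20 are illustrative names for the two trained models): compare the average per-image lower bound on log likelihood over held-out test images.

In [ ]:
mnist_test = torchvision.datasets.MNIST(
    root='pytorch-data/', download=True, train=False,
    transform=torchvision.transforms.ToTensor())
test_batched = torch.utils.data.DataLoader(mnist_test, batch_size=100)

def avg_loglik_lb(model, loader):
    # Mean per-image lower bound on log p(x) over a held-out set
    total, n = 0.0, 0
    with torch.no_grad():
        for imgs,_ in loader:
            total += model.loglik_lb(imgs).sum().item()
            n += len(imgs)
    return total / n

# e.g. avg_loglik_lb(model4, test_batched) vs avg_loglik_lb(model20, test_batched)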

Question (d) [9 marks]¶

Consider the generative model $X=f(Z,y)$ where $y\in\{0,1,\dots,9\}$ is the image label, and $Z$ is a latent random variable. The hope is that $Z$ should capture the "style" of the digit: thus we could generate a stylistically similar set of digits by fixing $Z$ and varying $y$, or we could generate random samples of a single digit by fixing $y$ and varying $Z$.

  • Given $x$ and $Y=y$, what is the perfect sampling distribution $\tilde{Z}$?

  • Design an autoencoder for this generative model, implement it, and train it. Illustrate its output by showing four different stylistically similar sets of digits (a minimal sketch of one possible conditional generator follows this list).
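
A minimal sketch of one possible conditional generator (layer sizes and names are illustrative, not prescribed): concatenate the latent code with a one-hot encoding of the label and decode the result, in the same spirit as the appendix decoder.

In [ ]:
import torch.nn as nn
import torch.nn.functional as F

class ConditionalGenerator(nn.Module):
    # Illustrative sketch of X = f(Z, y), with y supplied as an integer label
    def __init__(self, d=4, num_classes=10):
        super().__init__()
        self.d = d
        self.f = nn.Sequential(
            nn.Linear(d + num_classes, 128),
            nn.LeakyReLU(),
            nn.Linear(128, 28*28),
            nn.Sigmoid(),
            nn.Unflatten(1, (1,28,28))
        )

    def forward(self, z, y):
        yhot = F.one_hot(y, num_classes=10).float()   # [B,10]
        return self.f(torch.cat([z, yhot], dim=1))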

[Note: Explain concisely how your code works. You must give enough detail that the reader can reproduce what you did, but you should not give boilerplate code or details that a reader can be expected to know. Do not expect the reader to read your code.]

Appendix¶

Autoencoder¶

In [ ]:
import torch
import torch.nn as nn

class BernoulliImageGenerator(nn.Module):
    def __init__(self, d=4):
        super().__init__()
        self.d = d
        self.f = nn.Sequential(
            nn.Linear(d, 128),
            nn.LeakyReLU(),
            nn.Linear(128, 1728),
            nn.LeakyReLU(),
            nn.Unflatten(1, (12,12,12)), # -> [B×12×12×12]
            nn.Conv2d(12, 36, 3, 1),     # -> [B×36×10×10]
            nn.LeakyReLU(),
            nn.Flatten(1),               # -> [B×3600]
            nn.Unflatten(1, (4,30,30)),  # -> [B×4×30×30]
            nn.Conv2d(4, 4, 3, 1),       # -> [B×4×28×28]
            nn.LeakyReLU(),
            nn.Conv2d(4, 1, 1, 1),       # -> [B×1×28×28]
            nn.Sigmoid()
        )

    def forward(self, z):
        return self.f(z)

    def loglik(self, x, z):
        # Bernoulli log-likelihood of image x given latent z, summed over pixels
        xr = self(z)
        return (x*torch.log(xr) + (1-x)*torch.log(1-xr)).sum((1,2,3))


class GaussianEncoder(nn.Module):
    def __init__(self, decoder):
        super().__init__()
        self.d = decoder.d
        self.f = decoder
        self.g = nn.Sequential(
            nn.Conv2d(1, 32, 3, 1),
            nn.LeakyReLU(),
            nn.Conv2d(32, 64, 3, 1),
            nn.MaxPool2d(2),
            nn.Flatten(1),
            nn.Linear(9216, 128),
            nn.LeakyReLU(),
            nn.Linear(128, self.d*2)
        )

    def forward(self, x):
        μτ = self.g(x)                        # encoder outputs mean and log-variance
        μ,τ = μτ[:,:self.d], μτ[:,self.d:]
        return μ, torch.exp(τ/2)              # (mean, standard deviation)

    def loglik_lb(self, x):
        # Lower bound on log p(x): reconstruction term (one sample) minus KL term
        μ,σ = self(x)
        kl = 0.5 * (μ**2 + σ**2 - torch.log(σ**2) - 1).sum(1)  # KL(N(μ,σ²) ‖ N(0,1))
        ε = torch.randn_like(σ)
        ll = self.f.loglik(x, z=μ+σ*ε)        # reparameterization trick
        return ll - kl
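
The appendix does not include a training loop; one minimal sketch (the optimizer choice and epoch count are illustrative assumptions, not part of the handout) is:

In [ ]:
model = GaussianEncoder(BernoulliImageGenerator(d=4))
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(5):                          # epoch count is illustrative
    for imgs,_ in mnist_batched:
        optimizer.zero_grad()
        loss = - model.loglik_lb(imgs).mean()   # maximize the lower bound
        loss.backward()
        optimizer.step()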