This assessment uses the MNIST dataset, which can be obtained with the Python package torchvision. Each datapoint is a pair [numpy_array, int] representing images and labels.
Many PyTorch building blocks assume that the data comes in batches. The DataLoader converts the full mnist list [(img,lbl),...] into batches [(img_batch,lbl_batch), ...] where each img_batch is an array with an extra dimension prepended.
import torchvision
mnist = torchvision.datasets.MNIST(
    root = 'pytorch-data/',  # where to put the files
    download = True,         # if files aren't here, download them
    train = True,            # whether to import the test or the train subset
    # PyTorch uses PyTorch tensors internally, not numpy arrays, so convert them.
    transform = torchvision.transforms.ToTensor()
)
# Images can be plotted with matplotlib imshow
import torch
import matplotlib.pyplot as plt
show = [mnist[i] for i in [59289, 28001, 35508, 43876, 23627, 14028]]
show = torch.stack([img for img,lbl in show])
x = torchvision.utils.make_grid(show, nrow=6, pad_value=1)
fig,ax = plt.subplots()
ax.imshow(x.numpy().transpose((1,2,0)))
ax.axis('off')
plt.show()
mnist_batched = torch.utils.data.DataLoader(mnist, batch_size=100)
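To see what the DataLoader produces, the following minimal sketch batches a small synthetic dataset and prints the shape of each batch. The random tensors are a stand-in for MNIST (an assumption, so that the example runs without a download); the names `fake_mnist` and `batched` are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST: 250 random "images" of shape 1×28×28 with integer labels.
# (Synthetic data is an assumption, used so that no download is needed.)
images = torch.rand(250, 1, 28, 28)
labels = torch.randint(0, 10, (250,))
fake_mnist = TensorDataset(images, labels)

batched = DataLoader(fake_mnist, batch_size=100)

# Each item is a pair (img_batch, lbl_batch); the batch dimension comes first.
shapes = [(tuple(img.shape), tuple(lbl.shape)) for img, lbl in batched]
for s in shapes:
    print(s)
```

The last batch is smaller whenever the dataset size is not a multiple of batch_size, so code that consumes batches should not hard-code the batch dimension.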
In the appendix is code for the autoencoder described in lecture notes. The generator is for Bernoulli images, and the encoder is Gaussian. Train it on MNIST images.
For images [59289, 28001, 35508, 43876]
, show the image, together with a sample of two reconstructed images from each of them.
To image 35508, add noise in varying amounts. Take these noisy images and reconstruct them.
Generate 6 random images.
Take two images, 28001 and 43876. Create a sequence of images that interpolates between these two, using simple linear interpolation in the latent space.
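The tensor operations behind the tasks above can be sketched as follows. This is only an illustration under stated assumptions: the assignment does not specify the kind of noise, so additive Gaussian noise clamped to [0, 1] is one possible choice; the helper names `add_gaussian_noise` and `interpolate` are hypothetical; and random tensors stand in for real images and for encoded latent vectors. Random generation draws from a standard normal, which is the prior that the KL term in the appendix code compares against.

```python
import torch

def add_gaussian_noise(img, scale):
    """Add N(0, scale^2) noise to each pixel and clamp back to the range [0, 1]."""
    return (img + scale * torch.randn_like(img)).clamp(0.0, 1.0)

def interpolate(z0, z1, steps):
    """Return a [steps × d] tensor that moves linearly from z0 to z1."""
    t = torch.linspace(0.0, 1.0, steps).unsqueeze(1)  # [steps × 1]
    return (1 - t) * z0 + t * z1                      # broadcasts over d

torch.manual_seed(0)

# Noise in varying amounts (img is a stand-in for a real MNIST image tensor).
img = torch.rand(1, 28, 28)
noisy = [add_gaussian_noise(img, s) for s in (0.0, 0.1, 0.3, 0.6)]

# Linear interpolation between two latent vectors (stand-ins for encoder outputs).
d = 4
z0, z1 = torch.randn(d), torch.randn(d)
path = interpolate(z0, z1, steps=8)                   # [8 × 4]

# Six latent vectors drawn from the standard normal prior, ready to be decoded.
z_random = torch.randn(6, d)                          # [6 × 4]
```

Each row of `path` or `z_random` would then be passed through the trained generator to obtain an image.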
[Note: This assessment is about the probabilistic interpretation of neural networks, not about neural network design. The autoencoder in the appendix can be trained reasonably well in ten minutes on a low-end laptop. You don't need to optimize the neural network designs, you don't need to run cross-validation for hyperparameters, and you don't need to implement early-stopping etc.]
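A plain training loop is enough here. The sketch below assumes only that the model exposes a method `loglik_lb(x)` returning one value per datapoint, as the appendix encoder does. `ToyModel` is a hypothetical stand-in (independent Bernoulli pixels, no latent variable) so that the example runs in a second or two, and the choice of Adam and of the learning rate is an assumption, not something the assignment prescribes.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class ToyModel(nn.Module):
    """Hypothetical stand-in exposing loglik_lb(x) -> [B], like the appendix encoder."""
    def __init__(self):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(1, 28, 28))
    def loglik_lb(self, x):
        p = torch.sigmoid(self.logit)
        return (x * torch.log(p) + (1 - x) * torch.log(1 - p)).sum((1, 2, 3))

def train(model, batches, epochs=1, lr=1e-3):
    """Maximize the mean log-likelihood lower bound by minimizing its negative."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        for x, _label in batches:
            optimizer.zero_grad()
            loss = -model.loglik_lb(x).mean()
            loss.backward()
            optimizer.step()
            history.append(loss.item())
    return history

torch.manual_seed(0)
# Synthetic binary "images" in which each pixel is on with probability 0.2.
x = (torch.rand(200, 1, 28, 28) < 0.2).float()
y = torch.zeros(200, dtype=torch.long)
batches = DataLoader(TensorDataset(x, y), batch_size=50)

model = ToyModel()
history = train(model, batches, epochs=20, lr=0.05)
print(f"first loss {history[0]:.1f}, last loss {history[-1]:.1f}")
```

With the appendix classes, the same `train` function would be called on a GaussianEncoder wrapping a BernoulliImageGenerator, with `mnist_batched` as the batches.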
Modify the autoencoder code so that it uses 100 samples for Monte Carlo approximation. (This isn't helpful for training, but it's a good idea if we want to actually know the log likelihood of a particular datapoint.)
Show some likely and some unlikely images. Can you identify any features that make an image likely or unlikely?
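One reading of the 100-sample modification is to average the log-likelihood term over many reparameterized draws instead of one. The sketch below does this for generic inputs; the function name `loglik_lb_mc` and the toy likelihood are hypothetical, and other estimators (for example an importance-weighted one) are equally valid interpretations.

```python
import torch

def loglik_lb_mc(mu, sigma, loglik, n_samples=100):
    """Lower bound with the expectation averaged over n_samples draws.

    mu, sigma: [B × d] encoder outputs; loglik: maps z of shape [B × d] to [B].
    """
    kl = 0.5 * (mu**2 + sigma**2 - torch.log(sigma**2) - 1).sum(1)
    ll = torch.stack([
        loglik(mu + sigma * torch.randn_like(sigma))
        for _ in range(n_samples)
    ]).mean(0)                                        # [B]
    return ll - kl

def toy_loglik(z):
    """Hypothetical likelihood term; E[-|z|^2] = -(|mu|^2 + sum of sigma^2)."""
    return -(z**2).sum(1)

torch.manual_seed(0)
mu = torch.tensor([[1.0, -2.0], [0.0, 0.5]])
sigma = torch.tensor([[0.5, 1.0], [1.0, 2.0]])
estimate = loglik_lb_mc(mu, sigma, toy_loglik, n_samples=100)
print(estimate)
```

Because the toy term has a known expectation, the estimate can be checked against the closed form, which is a useful sanity test before plugging in the real generator.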
[Note: Parts (b) and (c) are invitations to investigate. Masters-level questions are the jumping off point for an investigation, unlike undergraduate questions which are just looking for a solution. If you answer exactly all the questions as they are spelled out in the assignment, and don’t think of any other angles, you won’t get full marks in a masters course. It’s not appropriate to conduct investigations for every single question; but in this assignment parts (b) and (c) invite you to go further.]
The autoencoder in the appendix defaults to using 4 dimensions for the latent space. Train another version with 20 dimensions. Evaluate them. Explain carefully your grounds for comparison.
[Note: Don't worry about making a "fair comparison" in the sense of e.g. putting in equal training effort; simply train for a fixed number of epochs, and treat the resulting neural networks as given. The focus of your answer should be on evaluation metrics.]
Optional. Also compare to the PyTorch example autoencoder.
Consider the generative model $X=f(Z,y)$ where $y\in\{0,1,\dots,9\}$ is the image label, and $Z$ is a latent random variable. The hope is that $Z$ should capture the "style" of the digit: thus we could generate a stylistically similar set of digits by fixing $Z$ and varying $y$, or we could generate random samples of a single digit by fixing $y$ and varying $Z$.
Given $x$ and $Y=y$, what is the perfect sampling distribution $\tilde{Z}$?
Design an autoencoder for this generative model, and implement it, and train it. Illustrate its output by showing four different stylistically-similar sets of digits.
[Note: Explain concisely how your code works. You must give enough detail that the reader can reproduce what you did, but you should not give any boilerplate code nor details that a reader can be expected to know. Do not expect the reader to read your code.]
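As a starting point only, one common way to feed a label into a network alongside a latent vector is to concatenate a one-hot encoding of $y$ with $Z$. The sketch below shows that tensor manipulation; the helper name `with_label` is hypothetical, and this is one possible design, not the expected answer.

```python
import torch
import torch.nn.functional as F

def with_label(z, y, num_classes=10):
    """Concatenate latent z [B × d] with one-hot labels y [B] -> [B × (d + num_classes)]."""
    return torch.cat([z, F.one_hot(y, num_classes).float()], dim=1)

torch.manual_seed(0)
d = 4

# Fix a single "style" z and vary the label over 0..9.
z = torch.randn(1, d).expand(10, d)
y = torch.arange(10)
decoder_input = with_label(z, y)                      # [10 × 14]
print(decoder_input.shape)
```

A generator taking this input would need its first linear layer widened from d to d + 10 inputs.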
import torch
import torch.nn as nn

class BernoulliImageGenerator(nn.Module):
    def __init__(self, d=4):
        super().__init__()
        self.d = d
        self.f = nn.Sequential(
            nn.Linear(d, 128),
            nn.LeakyReLU(),
            nn.Linear(128, 1728),
            nn.LeakyReLU(),
            nn.Unflatten(1, (12,12,12)), # -> [B×12×12×12]
            nn.Conv2d(12, 36, 3, 1),     # -> [B×36×10×10]
            nn.LeakyReLU(),
            nn.Flatten(1),               # -> [B×3600]
            nn.Unflatten(1, (4,30,30)),  # -> [B×4×30×30]
            nn.Conv2d(4, 4, 3, 1),       # -> [B×4×28×28]
            nn.LeakyReLU(),
            nn.Conv2d(4, 1, 1, 1),       # -> [B×1×28×28]
            nn.Sigmoid()
        )

    def forward(self, z):
        return self.f(z)

    def loglik(self, x, z):
        xr = self(z)
        return (x*torch.log(xr) + (1-x)*torch.log(1-xr)).sum((1,2,3))
class GaussianEncoder(nn.Module):
    def __init__(self, decoder):
        super().__init__()
        self.d = decoder.d
        self.f = decoder
        self.g = nn.Sequential(
            nn.Conv2d(1, 32, 3, 1),
            nn.LeakyReLU(),
            nn.Conv2d(32, 64, 3, 1),
            nn.MaxPool2d(2),
            nn.Flatten(1),
            nn.Linear(9216, 128),
            nn.LeakyReLU(),
            nn.Linear(128, self.d*2)
        )

    def forward(self, x):
        μτ = self.g(x)
        μ,τ = μτ[:,:self.d], μτ[:,self.d:]
        return μ, torch.exp(τ/2)

    def loglik_lb(self, x):
        μ,σ = self(x)
        kl = 0.5 * (μ**2 + σ**2 - torch.log(σ**2) - 1).sum(1)
        ε = torch.randn_like(σ)
        ll = self.f.loglik(x, z=μ+σ*ε)
        return ll - kl
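For reference, the quantity returned by loglik_lb can be written out as follows. This is a reading of the code above, not a statement taken from the lecture notes: it assumes the prior is $Z \sim N(0, I)$ and the sampling distribution is $\tilde{Z} \sim N(\mu(x), \operatorname{diag}(\sigma(x)^2))$, where $\mu$ and $\sigma$ are the encoder outputs and $d$ is the latent dimension.

```latex
\log \Pr(x) \;\ge\;
  \mathbb{E}\bigl[\log \Pr(x \mid \tilde{Z})\bigr]
  \;-\; \mathrm{KL}\bigl(\tilde{Z} \,\|\, Z\bigr),
\qquad
\mathrm{KL}\bigl(\tilde{Z} \,\|\, Z\bigr)
  = \frac{1}{2} \sum_{i=1}^{d}
    \bigl(\mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1\bigr).
```

The code approximates the expectation with a single draw $z = \mu + \sigma \varepsilon$, $\varepsilon \sim N(0, I)$, and evaluates the Bernoulli log likelihood $\sum_{\text{pixels}} \bigl[x \log \hat{x} + (1-x) \log(1-\hat{x})\bigr]$ at that $z$, where $\hat{x}$ is the generator output.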