Computer Laboratory


Teaser figure
Graphical illustration of the two phases of our loss. Phase 1 denotes the adversarial training of the discriminators. The generated image is produced by the scale-specific generator G_k, which takes as input the upscaled output of the previous level with the task-specific distortions z_k added to it. For SISR, no distortions are added (z_k = 0). The levels are trained sequentially, from the coarsest scale to the finest. In Phase 2, the discriminators are frozen and used as feature extractors: an L2 distance is measured between their outputs for the ground-truth training image x_i and for the restoration output x̃_i. The distance is computed between the two images at every scale k and at intermediate layers of each discriminator.
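The Phase 2 distance described in the caption can be sketched as a sum of mean squared feature differences over scales and discriminator layers. This is a minimal illustration, not our released implementation: the halving of resolution between scales, the average-pool downsampling, the scale indexing (k = 0 finest) and the toy "discriminator" used in the example are all assumptions made for the sketch.

```python
from typing import Callable, List, Sequence

import numpy as np

# Assumed interface: a frozen per-scale discriminator maps an image to a list
# of intermediate feature maps (one array per layer).
FeatureExtractor = Callable[[np.ndarray], List[np.ndarray]]


def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a 2-D image by an integer factor (crops any remainder)."""
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    img = img[:h, :w]
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def mdf_style_loss(x: np.ndarray, x_tilde: np.ndarray,
                   discriminators: Sequence[FeatureExtractor]) -> float:
    """Sum of L2 (mean squared) feature distances over all scales and layers.

    discriminators[k] is applied at scale k; k = 0 is the finest scale and each
    coarser scale halves the resolution (an assumption of this sketch).
    """
    total = 0.0
    for k, disc in enumerate(discriminators):
        factor = 2 ** k
        x_k, x_tilde_k = downsample(x, factor), downsample(x_tilde, factor)
        # Compare the reference and restored features layer by layer.
        for f_ref, f_out in zip(disc(x_k), disc(x_tilde_k)):
            total += float(np.mean((f_ref - f_out) ** 2))
    return total


def toy_disc(img: np.ndarray) -> List[np.ndarray]:
    """Hypothetical stand-in for a trained, frozen discriminator:
    raw pixels plus horizontal and vertical finite differences."""
    return [img, np.diff(img, axis=0), np.diff(img, axis=1)]


rng = np.random.default_rng(0)
x = rng.random((32, 32))
print(mdf_style_loss(x, x, [toy_disc] * 3))  # identical images give 0.0
```

In the actual method the feature extractors are the discriminators trained adversarially in Phase 1, so the distance is specialized to the distortions that the generators learned to produce.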


Central to the application of neural networks in image restoration problems, such as single image super resolution, is the choice of a loss function that encourages natural and perceptually pleasing results. A popular choice for a loss function is a pre-trained network, such as VGG and LPIPS, which is used as a feature extractor for computing the difference between restored and reference images. However, such an approach has multiple drawbacks: it is computationally expensive, requires regularization and hyper-parameter tuning, and involves a large network trained on an unrelated task. In this work, we explore the question of what makes a good loss function for an image restoration task. First, we observe that a single natural image is sufficient to train a lightweight feature extractor that outperforms state-of-the-art loss functions in single image super resolution, denoising, and JPEG artefact removal. We propose a novel Multi-Scale Discriminative Feature (MDF) loss comprising a series of discriminators, trained to penalize errors introduced by a generator. Second, we show that an effective loss function does not have to be a good predictor of perceived image quality, but instead needs to be specialized in identifying the distortions for a given restoration method.



We provide a comprehensive comparison of qualitative results for different loss functions across different applications. To begin with, we show results for two Single Image Super-Resolution (SISR) networks, namely, Enhanced Deep Super-Resolution (EDSR) and Super-Resolution ResNet (SR-ResNet). Further, we show the results for the applications of image denoising and JPEG artefact removal.

Single Image Super-Resolution (SISR)

Image denoising

JPEG artefact removal

We compare the performance of different losses at two JPEG compression quality levels.

Hyper-parameter tuning for VGG and LPIPS

To find the best weighting, we conduct a hyper-parameter search over the weight that balances the MSE loss and the VGG/LPIPS feature-wise loss in their weighted sum: MSE + weight * VGG/LPIPS.
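A minimal sketch of the combined loss and a grid search over its weight follows. The perceptual term and the validation function here are hypothetical stand-ins: in practice the perceptual term would be a VGG or LPIPS feature distance, and each validation call would train a restoration network with the given weight and return its validation error.

```python
from typing import Callable, Dict, Iterable

import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Pixel-wise mean squared error."""
    return float(np.mean((a - b) ** 2))


def combined_loss(x: np.ndarray, x_tilde: np.ndarray,
                  perceptual: Callable[[np.ndarray, np.ndarray], float],
                  weight: float) -> float:
    """MSE + weight * perceptual, the form whose weight is searched over."""
    return mse(x, x_tilde) + weight * perceptual(x, x_tilde)


def grid_search(weights: Iterable[float],
                validate: Callable[[float], float]) -> float:
    """Return the candidate weight with the lowest validation score.

    `validate` is hypothetical: it stands for training with the given
    weight and measuring validation error (lower is better).
    """
    scores: Dict[float, float] = {w: validate(w) for w in weights}
    return min(scores, key=scores.get)


def toy_perceptual(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical stand-in for VGG/LPIPS: MSE of horizontal gradients."""
    return mse(np.diff(a, axis=1), np.diff(b, axis=1))


# Toy validation curve with its minimum at weight 0.1 (illustration only).
best = grid_search([0.01, 0.1, 1.0], lambda w: (w - 0.1) ** 2)
print(best)  # 0.1
```

A grid over logarithmically spaced weights is a common choice because the MSE and feature-space terms can differ in magnitude by orders of magnitude.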


Please contact Aamir Mustafa with any questions regarding the method.


This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement N◦ 725253–EyeCode).