DISCOVERING DISTINCTIVE "SEMANTICS" IN SUPER-RESOLUTION NETWORKS

ABSTRACT

Image super-resolution (SR) is a representative low-level vision problem. Although deep SR networks have achieved extraordinary success, we still know little about their working mechanisms. Specifically, do SR networks learn semantic information, or do they merely fit complex mapping functions? What hinders SR networks from generalizing to real-world data? These questions not only raise our curiosity, but also influence how SR networks are developed. In this paper, we make a first attempt to answer these fundamental questions. After comprehensively analyzing the feature representations (via dimensionality reduction and visualization), we discover distinctive "semantics" in SR networks, i.e., deep degradation representations (DDR), which relate to image degradation instead of image content. We show that a well-trained deep SR network is naturally a good descriptor of degradation information. Our experiments also reveal two key factors (adversarial learning and global residual) that influence the extraction of such semantics. We further apply DDR to several interesting applications (such as distortion identification, blind SR and generalization evaluation) and achieve promising results, demonstrating the correctness and effectiveness of our findings.

1. INTRODUCTION

The emergence of deep convolutional neural networks (CNNs) has given birth to a large number of new solutions to low-level vision tasks (Dong et al., 2014; Zhang et al., 2017). Among these advances, image super-resolution (SR) has enjoyed a great performance leap. Compared with traditional methods (e.g., interpolation (Keys, 1981) and sparse coding (Yang et al., 2008)), SR networks achieve better performance with improved efficiency.



Figure 1: Distributions of the deep representations of classification and super-resolution networks. For classification networks, the semantics of the deep feature representations are artificially predefined according to the training data (category labels). For SR networks, however, the learned deep representations carry a different kind of "semantics" from classification. During training, SR networks are only provided with downsampled clean LR images; there is no supervision signal related to image degradation. Surprisingly, we find that the deep representations of SR networks are spontaneously discriminative with respect to different degradations. Notably, not every SR network has this property. In Sec. 4.3, we reveal two factors that facilitate SR networks in extracting such degradation-related representations, i.e., adversarial learning and global residual.
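The analysis alluded to above (projecting deep feature representations into a low-dimensional space to see whether they cluster by degradation) can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: it assumes feature maps have already been extracted from some intermediate SR-network layer, stands in synthetic arrays for them, pools them into feature vectors, and projects to 2-D with PCA (the paper's figures may use a different reduction method, such as t-SNE).

```python
import numpy as np

def gap_features(feature_maps):
    # Global average pooling: (N, C, H, W) feature maps -> (N, C) vectors,
    # one descriptor per input image.
    return feature_maps.mean(axis=(2, 3))

def pca_2d(X):
    # Project N x C feature vectors to 2-D via PCA (SVD on centered data).
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Toy stand-in for deep features of images under two degradation types.
# A degradation-specific bias is mimicked by shifting the mean of one group.
rng = np.random.default_rng(0)
clean_feats = rng.normal(0.0, 1.0, size=(50, 64, 8, 8))
noisy_feats = rng.normal(0.5, 1.0, size=(50, 64, 8, 8))

feats = gap_features(np.concatenate([clean_feats, noisy_feats]))
emb = pca_2d(feats)  # (100, 2): scatter-plotting emb reveals two clusters,
                     # i.e., the features are discriminative to degradation.
```

In practice one would replace the synthetic arrays with activations hooked from a trained SR network fed with differently degraded versions of the same image content; if the 2-D embedding clusters by degradation rather than content, the representations behave as degradation descriptors.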

