KNOWLEDGE DISTILLATION BASED DEGRADATION ESTIMATION FOR BLIND SUPER-RESOLUTION

Abstract

Blind image super-resolution (Blind-SR) aims to recover a high-resolution (HR) image from its corresponding low-resolution (LR) input image with unknown degradations. Most existing works design an explicit degradation estimator for each degradation to guide SR. However, it is infeasible to provide concrete labels for every combination of multiple degradations (e.g., blur, noise, JPEG compression) to supervise the degradation estimator training. In addition, these special designs for certain degradations, such as blur, impede the models from generalizing to handle other degradations. It is therefore necessary to design an implicit degradation estimator that can extract a discriminative degradation representation for all degradations without relying on the supervision of degradation ground-truth. In this paper, we propose a Knowledge Distillation based Blind-SR network (KDSR). It consists of a knowledge distillation based implicit degradation estimator network (KD-IDE) and an efficient SR network. To learn the KDSR model, we first train a teacher network KD-IDE_T: it takes paired HR and LR patches as inputs and is optimized jointly with the SR network. Then, we further train a student network KD-IDE_S, which only takes LR images as input and learns to extract the same implicit degradation representation (IDR) as KD-IDE_T. In addition, to fully use the extracted IDR, we design a simple, strong, and efficient IDR based dynamic convolution residual block (IDR-DCRB) to build the SR network. We conduct extensive experiments under classic and real-world degradation settings. The results show that KDSR achieves SOTA performance and can generalize to various degradation processes. The code is available on GitHub.

1. INTRODUCTION

Single image super-resolution (SISR) aims to recover the details of a high-resolution (HR) image from its low-resolution (LR) counterpart and has a variety of downstream applications (Dong et al., 2014; Zhang et al., 2019; Xia et al., 2022d; Fritsche et al., 2019; Xia et al., 2022c;b). State-of-the-art methods (Kim et al., 2016; Lim et al., 2017; Lai et al., 2017; Xia et al., 2022a; Wang et al., 2018b) usually assume that an ideal bicubic downsampling kernel generates the LR images. However, this simple degradation differs from the more complex degradations in real-world LR images, and the degradation mismatch leads to severe performance drops. To address this issue, blind super-resolution (Blind-SR) methods have been developed. Some Blind-SR works (Wang et al., 2021a; Luo et al., 2022) use the classical image degradation process given by Eq. 1. Recently, some works (Cai et al., 2019; Bulat et al., 2018) attempted to develop new and complex degradation processes to better cover the real-world degradation space, forming a variant of Blind-SR called real-world super-resolution (Real-SR). Representative works include BSRGAN (Zhang et al., 2021) and Real-ESRGAN (Wang et al., 2021b), which introduce comprehensive degradation operations such as blur, noise, down-sampling, and JPEG compression, and control the severity of each operation by randomly sampling the respective hyper-parameters. To better simulate the complex degradations of the real world, they also apply random shuffling of degradation orders (Zhang et al., 2021) and second-order degradation (Wang et al., 2021b), respectively. Since Blind-SR faces an almost infinite set of degradations, introducing prior degradation information to SR networks can help constrain the solution space and boost SR performance. As shown in Fig.
1, the ways to obtain degradation information can be divided into three categories: (1) Several non-blind SR methods (Zhang et al., 2018a; Shocher et al., 2018; Zhang et al., 2020; Soh et al., 2020; Xu et al., 2020) directly take the known degradation information as a prior (Fig. 1 (a)).

In this paper, we aim to design an efficient implicit degradation representation (IDR) learning SR framework that can easily adapt to any degradation process. To this end, we develop a novel knowledge distillation based Blind-SR network (KDSR).

The classic degradation process is

y = (x ⊗ k) ↓_s + n, (1)

where ⊗ denotes the convolution operation, x and y are the HR and corresponding LR images respectively, k is the blur kernel, n is additive white Gaussian noise, and ↓_s is the downsampling operation with scale factor s. The severities of blur and noise are unknown; the respective hyper-parameters are randomly sampled, forming an almost infinite degradation space. Given an input LR image y and the applied blur kernel k, classic Blind-SR methods (Gu et al., 2019; Luo et al., 2020; 2022) pretrain an explicit degradation estimator to estimate the blur kernel applied to y under the supervision of the ground-truth k. Then, their SR networks use the estimated blur kernel to perform SR on the LR image y. The SR network is trained with the loss function

L_rec = ∥I_HR − I_SR∥_1, (2)

where I_HR and I_SR are the real HR and super-resolved images respectively. Real-world Blind-SR is a variant of classic Blind-SR in which more complicated degradations are adopted. Real-world Blind-SR approaches (Wang et al., 2021b; Zhang et al., 2021) introduce comprehensive degradation operations such as blur, noise, down-sampling, and JPEG compression, and control the severity of each operation by randomly sampling the respective hyper-parameters. Moreover, they apply random shuffling of degradation orders and second-order degradation to increase the degradation complexity.

There are two training stages for KDSR: teacher KDSR_T and student KDSR_S training. We first train KDSR_T: we input the paired HR and LR images to KD-IDE_T and obtain the implicit degradation representations D_T and D'_T, where D_T is used to guide the SR. After that, we move on to KDSR_S training.
We initialize KDSR_S with KDSR_T's parameters and train KDSR_S to directly extract from LR images a D'_S that matches D'_T.
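As a concrete illustration, the classic degradation model of Eq. 1 (blur, downsample, add white Gaussian noise) can be sketched in a few lines of NumPy. The function name and arguments below are illustrative, not from the paper's released code.

```python
import numpy as np

def classic_degrade(x, k, s, sigma_n, rng=None):
    """Apply y = (x conv k) downsample_s + n to a single-channel HR image.

    x: HR image, (H, W) float array; k: blur kernel, (kh, kw), summing to 1;
    s: integer scale factor; sigma_n: std of the additive Gaussian noise.
    All names here are illustrative assumptions.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)), mode="edge")
    # Direct 2-D convolution with the blur kernel k.
    blurred = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            blurred += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    y = blurred[::s, ::s]                       # downsample by striding
    y = y + rng.normal(0.0, sigma_n, y.shape)   # additive white Gaussian noise
    return y
```

With a delta kernel and zero noise, this reduces to plain strided downsampling, which is a quick sanity check on the implementation.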

3.2. KNOWLEDGE DISTILLATION BASED IMPLICIT DEGRADATION ESTIMATOR

Most Blind-SR methods elaborately design an explicit degradation estimator for each degradation type and process. There are several limitations of explicit degradation estimators: (1) These special designs for specific degradation processes make the explicit estimator hard to transfer to other degradation settings. (2) It is complex to provide various degradation labels for explicit degradation estimator training, especially for random combinations of multiple degradations (Wang et al., 2021b). Therefore, we develop a KD based implicit degradation estimator (KD-IDE), which can distinguish various degradations accurately without the supervision of degradation ground-truth. As shown in Fig. 2 (c), KD-IDE can be divided into several parts: (1) We take the LR images and the concatenation of LR and HR images as input for KD-IDE_S and KD-IDE_T, respectively. Specifically, KD-IDE_T (Fig. 2 (d)) can easily extract the degradation that turns HR into LR images, since it is provided with paired HR and LR images and jointly optimized with the SR network. Since there is a spatial size difference between HR and LR images, we perform a Pixel-Unshuffle operation on the HR images I_HR ∈ R^{3×4H×4W} to obtain I'_HR ∈ R^{48×H×W} and then concatenate it with the LR images to obtain an input I ∈ R^{51×H×W}. (2) The input passes through the first convolution to become feature maps; note that the input channels of the first convolution are 3 and 51 for KD-IDE_S and KD-IDE_T, respectively. (3) After that, we use several residual blocks to further extract features and obtain a rough degradation vector via an average pooling operation. (4) We use two linear layers to refine the degradation vector and obtain the IDR D' ∈ R^{4C}, which is used for KD. (5) Although D' has 4C channels to accurately represent the degradation and gives the student network richer degradation information to learn from, it would consume substantial computational resources if used directly in IDR-DDC.
Hence, we further compress it with a linear layer to obtain another IDR D ∈ R^C to guide SR. More details of the KD training are given in Sec. 3.4.
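The KD-IDE described above can be sketched as follows. The channel width C, the number of residual blocks, and all layer names are assumptions for illustration, not the paper's released architecture; only the input channel counts (3 for the student, 3 + 3·4·4 = 51 for the teacher) and the 4C/C dimensions of D' and D follow the text.

```python
import torch
import torch.nn as nn

class KDIDE(nn.Module):
    """Sketch of the implicit degradation estimator (sizes are assumptions).

    teacher=True takes cat(LR, PixelUnshuffle(HR, 4)) with 51 channels;
    teacher=False (student) takes the 3-channel LR image alone.
    """
    def __init__(self, C=64, n_blocks=5, teacher=False):
        super().__init__()
        in_ch = 51 if teacher else 3
        self.head = nn.Conv2d(in_ch, C, 3, padding=1)
        # Plain conv stack; one residual connection over the whole body for brevity.
        self.body = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(C, C, 3, padding=1), nn.ReLU(inplace=True),
                          nn.Conv2d(C, C, 3, padding=1))
            for _ in range(n_blocks)])
        self.pool = nn.AdaptiveAvgPool2d(1)          # rough degradation vector
        self.mlp = nn.Sequential(nn.Linear(C, 4 * C), nn.ReLU(inplace=True),
                                 nn.Linear(4 * C, 4 * C))  # IDR D' in R^{4C}, used for KD
        self.compress = nn.Linear(4 * C, C)          # IDR D in R^C, guides SR

    def forward(self, x):
        f = self.head(x)
        f = f + self.body(f)
        v = self.pool(f).flatten(1)
        d_prime = self.mlp(v)     # D'
        d = self.compress(d_prime)  # D
        return d, d_prime
```

The teacher's input can be built with `torch.nn.functional.pixel_unshuffle(hr, 4)` concatenated with the LR image along the channel axis.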

3.3. IMAGE SUPER-RESOLUTION NETWORK

For the design of the SR network, we consider three points: (1) After obtaining the IDR, it is important to design an SR network that can fully use the estimated degradation prior for SR. (2) An ideal Blind-SR network should be practical, so its structure should be simple; we therefore form the network from a single type of simple yet strong module. (3) Heavy computation usually limits the application of models, especially on edge devices; it is thus necessary to design an efficient model.

Specifically, to fully use the estimated IDR, as displayed in Fig. 2 (a), we generate specific convolution weights according to the IDR D. However, generating ordinary convolution weights would incur a large computational cost and hurt the efficiency of the network. Thus, we further introduce depthwise convolution (Howard et al., 2017), which consumes only about 1/C of the computation and parameters of an ordinary convolution. The IDR-DDC can be mathematically expressed as

W = Reshape(ϕ(D)),
F_out[i, :, :] = F_in[i, :, :] ⊗ W[i, :, :, :], i ∈ [0, C),

where ϕ(·) denotes two linear layers and ⊗ is the convolution operation. Then, D_T is used to generate degradation-specific weights for the dynamic convolution, and the degradation-specific SR network restores the LR images. By jointly optimizing the teacher SR network and KD-IDE_T with the L1 loss L_rec (Eq. 2), KD-IDE_T can effectively extract accurate IDR to guide the SR network. (2) After finishing KDSR_T training, we move on to train KDSR_S. As shown in Fig. 2 (e), different from KD-IDE_T, we only input the LR images to KD-IDE_S, obtaining the IDRs D_S and D'_S. The other steps are the same as in KDSR_T training except for the adopted loss functions. Specifically, we introduce a knowledge distillation (KD) loss (Eq. 6) to enforce that KD-IDE_S directly extracts the same accurate IDR as KD-IDE_T from LR images.
In addition, for the classic degradation model (Eq. 1), following previous Blind-SR works (Gu et al., 2019; Wang et al., 2021a), we adopt L_rec (Eq. 2) and set the total loss function as L_classic (Eq. 7). For more complex degradation processes (Real-SR), following L_vis (Eq. 3) of Real-ESRGAN (Wang et al., 2021b), we propose L_real (Eq. 8). More details are given in the appendix. In the above, D ∈ R^C indicates the IDR, ϕ(D) ∈ R^{C·K_h·K_w} is the output of ϕ(·), and W ∈ R^{C×1×K_h×K_w} contains the weights of the dynamic convolution. The loss functions are

L_kl = Σ_{j ∈ [0, 4C)} D'_{T,norm}(j) log( D'_{T,norm}(j) / D'_{S,norm}(j) ), (6)

L_classic = λ_rec L_rec + λ_kl L_kl, (7)

L_real = λ_rec L_rec + λ_kl L_kl + λ_per L_per + λ_adv L_adv, (8)

where D'_{T,norm} and D'_{S,norm} are D'_T and D'_S normalized with a softmax operation, respectively; L_per and L_adv are the perceptual and adversarial losses; and λ_kl, λ_per, and λ_adv denote the balancing parameters.
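A minimal sketch of L_rec (Eq. 2), L_kl (Eq. 6), and their combination L_classic (Eq. 7) in PyTorch. The function name is illustrative; the default weights follow the λ_rec = 1 and λ_kl = 0.15 reported in the experimental settings.

```python
import torch
import torch.nn.functional as F

def kd_losses(d_t, d_s, sr, hr, lam_rec=1.0, lam_kl=0.15):
    """Sketch of L_classic = lam_rec * L_rec + lam_kl * L_kl (names illustrative).

    d_t, d_s: teacher/student IDRs D'_T, D'_S of shape (B, 4C);
    sr, hr: super-resolved and ground-truth HR images.
    """
    l_rec = (hr - sr).abs().mean()        # L_rec, an L1 reconstruction loss (Eq. 2)
    p_t = F.softmax(d_t, dim=-1)          # D'_T normalized with softmax
    log_s = F.log_softmax(d_s, dim=-1)    # log of softmax-normalized D'_S
    # KL(D'_T_norm || D'_S_norm), Eq. 6; small eps guards the log
    l_kl = (p_t * (torch.log(p_t + 1e-12) - log_s)).sum(dim=-1).mean()
    return lam_rec * l_rec + lam_kl * l_kl
```

When the student exactly reproduces the teacher's IDR and the SR output matches the HR image, the loss vanishes, which is the intended optimum of the distillation.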

4. EXPERIMENTS

4.1. SETTINGS

We train and test our method on classic and real-world degradation settings. For the classic degradation, following previous works (Gu et al., 2019; Luo et al., 2022), we combine 800 images in DIV2K (Agustsson & Timofte, 2017) and 2,650 images in Flickr2K (Timofte et al., 2017) as the DF2K training set. For the real-world degradation, in Sec. 4.4, similar to Real-ESRGAN (Wang et al., 2021b), we adopt the DF2K and OutdoorSceneTraining (Wang et al., 2018a) datasets for training. We set the learning rate of KDSR_T to 2 × 10^-4 and pre-train it with only Eq. 2 for 1000K iterations. Then, we optimize KDSR_S with Eq. 7 for 1000K iterations and continue to train it with Eq. 8 for 400K iterations. The learning rate is fixed at 10^-4. For optimization, we use Adam with β_1 = 0.9, β_2 = 0.99. In both training stages, we set the batch size to 48 and the input patch size to 64.

4.2. EVALUATION WITH ISOTROPIC GAUSSIAN KERNELS

We first evaluate our KDSR on degradations with isotropic Gaussian kernels. We compare KDSR with several SR methods, including RCAN (Zhang et al., 2018c), ZSSR (Shocher et al., 2018), IKC (Gu et al., 2019), DAN (Luo et al., 2020), AdaTarget (Jo et al., 2021), and DASR (Wang et al., 2021a). The quantitative results are shown in Tab. 1. We can see that our KDSR_S-M surpasses DASR by 0.6dB, 0.39dB, 0.67dB, and 1.24dB on the Set5, Set14, Urban100, and Manga109 datasets, respectively. In addition, compared with the Blind-SR method DANv2, our KDSR_S-M achieves better performance while consuming only 21% of the FLOPs of DANv2. This is because DANv2 uses an iterative strategy to estimate accurate explicit blur kernels, which requires heavy computation. Besides, compared with the SOTA Blind-SR method DCLS, our KDSR_S-L achieves better performance on almost all datasets while consuming less time. It is notable that DCLS specially designs an explicit degradation estimator for blur kernels, while the KD-IDE in our KDSR is simple and can adapt to any degradation process. The qualitative results are shown in Fig. 3. We can see that our KDSR_S-L recovers clearer textures than other methods, and our KDSR_S-M also achieves better visual results than DANv2.

On the real-world benchmarks, the quantitative results are shown in Tab. 3. Compared with the recent real-world SR method MM-RealSR, our KDSR_S-GAN performs better while consuming only about 50% of the runtime. In addition, KDSR_S-GAN outperforms the SOTA real-world SR method Real-ESRGAN on LPIPS, PSNR, and SSIM, consuming only 75% of its FLOPs. Furthermore, we provide qualitative results in Fig. 5. We can see that our KDSR_S-GAN produces more visually promising results with clearer details and textures. More qualitative results are provided in the appendix.

5. ABLATION STUDY

Knowledge Distillation Based Blind-SR Network. In this part, we validate the effectiveness of the components in KDSR, such as KD and IDR-DDC (Tab. 4). KDSR_S4 is actually the KDSR_S-M adopted in Tab. 1.

Knowledge Distillation Function. We define three classic KD functions: (1) we use the Kullback-Leibler divergence to measure distribution similarity (L_kl, Eq. 6); (2) we define L_1 for optimization (Eq. 9); (3) motivated by the KD loss in SR model compression (Gao et al., 2018), we define L_2 (Eq. 10):

L_1 = (1 / 4C) Σ_{i=1}^{4C} |D'_S(i) − D'_T(i)|, (9)

L_2 = (1 / 4C) Σ_{i=1}^{4C} (D'_S(i) − D'_T(i))^2, (10)

where D'_T and D'_S ∈ R^{4C} are the IDRs extracted by KDSR_T-M and KDSR_S-M respectively. We apply these three loss functions on KDSR_S-M separately to learn the IDR from KDSR_T-M. Then, we evaluate them on 4× Urban100 with Gaussian8 kernels. The results are shown in Tab. 5. We can see that the performance of L_kl is better than that of L_1 and L_2. This suggests that the degradation information is mainly contained in the distribution of the IDR rather than in its absolute values.

Visualization of KD-IDE. To further validate the effectiveness of our KD-IDE, we use t-SNE (Van der Maaten & Hinton, 2008) to visualize the distribution of the extracted IDR. Specifically, we generate LR images from BSD100 (Martin et al., 2001) with different isotropic Gaussian kernels and feed them to KDSR_T, KDSR_S, KDSR_S without KD, and DASR (Wang et al., 2021a) to generate the IDR D for Fig. 6 (a), (b), (c), and (d), respectively. We can see from Fig. 6 (a) and (b) that KDSR_T can distinguish different degradations, and KDSR_S learns this ability from KDSR_T well. In addition, comparing Fig. 6 (b) and (c), we can see that KDSR_S, which obtains IDR extraction knowledge from KDSR_T, distinguishes various degradations better than KDSR_S without KD. This further demonstrates the effectiveness of our KD-IDE. Furthermore, we compare KDSR_S and DASR (Fig. 6 (b) and (d)); the results show that KDSR_S distinguishes various degradations more clearly than DASR, which shows the superiority of KD based IDE over metric learning based IDE.
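The three KD functions of Eqs. 6, 9, and 10 can be written compactly as below (function names are illustrative). Note that L_kl is invariant to a constant shift of the IDR, since softmax normalization discards absolute magnitudes while L_1 and L_2 penalize them, which is consistent with the finding that degradation information lives mainly in the distribution of the IDR.

```python
import torch
import torch.nn.functional as F

def l1_kd(dp_s, dp_t):
    """L_1 KD variant (Eq. 9): mean absolute difference over the 4C-dim IDRs."""
    return (dp_s - dp_t).abs().mean()

def l2_kd(dp_s, dp_t):
    """L_2 KD variant (Eq. 10): mean squared difference."""
    return ((dp_s - dp_t) ** 2).mean()

def kl_kd(dp_s, dp_t):
    """L_kl (Eq. 6): KL divergence between softmax-normalized IDRs."""
    p_t = F.softmax(dp_t, dim=-1)
    return (p_t * (F.log_softmax(dp_t, dim=-1)
                   - F.log_softmax(dp_s, dim=-1))).sum(-1).mean()
```

For example, a student IDR equal to the teacher IDR plus a constant yields zero L_kl but a large L_1/L_2, illustrating that only the KL variant focuses purely on the distribution.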

6. CONCLUSION

Most Blind-SR methods tend to elaborately design an explicit degradation estimator for a specific type of degradation to guide SR. Nevertheless, it is difficult to provide labels for combinations of multiple degradations to train explicit degradation estimators, and these specific designs for certain degradations make them hard to transfer to other degradation processes. To address these issues, we develop a knowledge distillation based Blind-SR network (KDSR), consisting of a KD-IDE and an efficient SR network built by stacking IDR-DCRBs. We use KD to make KD-IDE_S directly extract the same accurate IDR as KD-IDE_T from LR images. The IDR-DCRBs of the SR network use IDR based depthwise dynamic convolution to fully and efficiently utilize the extracted IDR to guide SR. Extensive experiments on classic and complex real-world degradation processes demonstrate that the proposed KDSR achieves general state-of-the-art Blind-SR performance.



Figure 1: The illustration of different degradation estimators. (a) Non-blind SR methods directly use known degradation information to guide SR networks, such as SRMD (Zhang et al., 2018a). (b) Many Blind-SR methods estimate the explicit degradation with the supervision of ground-truth degradation. (c) Several methods use metric learning to distinguish degradations roughly. (d) Our knowledge distillation (KD) based implicit degradation estimator can estimate an accurate implicit degradation representation to guide SR without ground-truth degradation supervision.

(2) Blind-SR methods (Gu et al., 2019; Luo et al., 2020; Wang et al., 2021a; Liang et al., 2022; Luo et al., 2022) adopt explicit degradation estimators, which are trained with ground-truth degradations (Fig. 1 (b)). However, these explicit degradation estimators are elaborately designed for specific degradation processes, and this specialization makes them hard to transfer to other degradation processes. In addition, it is challenging to annotate precise ground-truth labels representing combinations of multiple degradations (Zhang et al., 2021; Wang et al., 2021b) for supervised degradation learning. Therefore, developing implicit degradation representation (IDR) based methods is important. (3) Recently, as shown in Fig. 1 (c), DASR (Wang et al., 2021a) and MM-RealSR (Mou et al., 2022) use metric learning to estimate IDR and to quantize degradation severity, respectively. However, metric learning methods only roughly distinguish degradations by pushing away or pulling close features, which is unstable and cannot fully capture discriminative degradation characteristics for Blind-SR.

Figure 2: The overview of our proposed knowledge distillation based Blind-SR network (KDSR), which consists of a KD based implicit degradation estimator network (KD-IDE) and an SR network mainly formed by the IDR based depthwise dynamic convolution (IDR-DDC).

Since the degradation is complex and specific degradation labels cannot be provided, these methods directly use SR networks without degradation estimators. Their SR networks emphasize visual quality and are trained with

L_vis = λ_rec L_rec + λ_per L_per + λ_adv L_adv, (3)

where L_per and L_adv are the perceptual (Johnson et al., 2016) and adversarial (Wang et al., 2021b) losses. As shown in Fig. 2, we propose KDSR, consisting of a KD-IDE and an efficient SR network. Different from previous explicit degradation estimation based Blind-SR methods (Gu et al., 2019; Luo et al., 2020; 2022; Liang et al., 2022), our KD-IDE does not require degradation labels for training and can generalize to any degradation process. Moreover, the design of our SR network is neat and efficient, which is practical and can fully use the degradation information for SR.

As shown in Fig. 2 (a), (b), and (d), our SR network can be divided into three hierarchies. (1) We first propose a convolution unit called the IDR based Depthwise Dynamic Convolution (IDR-DDC). Motivated by UDVD (Xu et al., 2020), we adopt dynamic convolution so that the IDR can guide SR.
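A possible PyTorch sketch of the IDR-DDC, assuming ϕ(·) is two linear layers mapping D ∈ R^C to C·K_h·K_w kernel entries, which are reshaped and applied as a grouped (depthwise) convolution; all layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IDRDDC(nn.Module):
    """IDR-based depthwise dynamic convolution (sketch; sizes are assumptions)."""
    def __init__(self, C=64, k=3):
        super().__init__()
        self.C, self.k = C, k
        # phi(.): two linear layers mapping D in R^C to C*k*k kernel entries
        self.phi = nn.Sequential(nn.Linear(C, C), nn.ReLU(inplace=True),
                                 nn.Linear(C, C * k * k))

    def forward(self, f_in, d):
        B, C, H, W = f_in.shape
        # W in R^{C x 1 x Kh x Kw}, one kernel per channel and per sample
        w = self.phi(d).reshape(B * C, 1, self.k, self.k)
        # grouped conv applies each channel's own kernel (depthwise, per sample)
        f = f_in.reshape(1, B * C, H, W)
        out = F.conv2d(f, w, padding=self.k // 2, groups=B * C)
        return out.reshape(B, C, H, W)
```

Folding the batch into the group dimension is a common trick for per-sample dynamic kernels; each of the B·C groups sees exactly one input channel and one generated kernel.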

F_in and F_out ∈ R^{C×H×W} are the input and output feature maps, respectively. (2) As shown in Fig. 2 (b), motivated by EDSR (Lim et al., 2017), we develop the IDR based Dynamic Convolution Residual Block (IDR-DCRB) to build a deep model. For the first convolution of IDR-DCRB, we use the IDR-DDC to utilize the degradation information. However, IDR-DDC lacks interaction between different channels; thus, we adopt an ordinary convolution as the second convolution. (3) For simplicity, as shown in Fig. 2 (d) or (e), we mainly stack IDR-DCRBs to form the SR network.

3.4. TRAINING PROCESS

KDSR has a two-stage training process. (1) As shown in Fig. 2 (d), we first train the teacher KDSR_T: we input the paired LR and HR images to KD-IDE_T and obtain the IDRs D_T and D'_T.
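Under the same assumptions, an IDR-DCRB might look as follows: a dynamic depthwise convolution conditioned on D, an ordinary 3×3 convolution for channel interaction, and an EDSR-style residual connection. The single-linear-layer kernel generator here is a simplification of the two-layer ϕ(·); all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IDRDCRB(nn.Module):
    """Sketch of an IDR-based dynamic convolution residual block (sizes assumed)."""
    def __init__(self, C=64, k=3):
        super().__init__()
        self.C, self.k = C, k
        self.gen = nn.Linear(C, C * k * k)       # kernel generator (simplified phi)
        self.conv = nn.Conv2d(C, C, 3, padding=1)  # ordinary conv: channel interaction
        self.act = nn.ReLU(inplace=True)

    def forward(self, f, d):
        B, C, H, W = f.shape
        # depthwise dynamic conv conditioned on the IDR d (shape (B, C))
        w = self.gen(d).reshape(B * C, 1, self.k, self.k)
        h = F.conv2d(f.reshape(1, B * C, H, W), w,
                     padding=self.k // 2, groups=B * C)
        h = self.act(h.reshape(B, C, H, W))
        h = self.conv(h)
        return f + h                              # residual connection (EDSR-style)
```

Stacking several such blocks, plus an upsampling tail, would give the overall SR network shape described in the text.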

Figure 3: Visual comparison (4×) of Blind-SR methods on isotropic Gaussian kernels.

The batch sizes are set to 64, and the LR patch sizes are 64×64. We use the Adam optimizer with β_1 = 0.9, β_2 = 0.99. We train both the teacher and student networks for 600 epochs, setting the initial learning rate to 10^-4 and halving it every 150 epochs. The loss coefficients λ_rec and λ_kl are set to 1 and 0.15, respectively. The SR results are evaluated with PSNR and SSIM on the Y channel in the YCbCr space. (1) In Sec. 4.2, we train and test on isotropic Gaussian kernels following the setting in Gu et al. (2019). Specifically, the kernel sizes are fixed to 21×21. In training, the kernel width σ range is set to [0.2, 4.0] for scale factor 4, and we uniformly sample the kernel width in this range. For testing, we adopt the Gaussian8 (Gu et al., 2019) kernel setting to generate evaluation datasets: Gaussian8 uniformly chooses 8 kernels from the range [1.80, 3.20] for scale 4. The LR images are obtained by blurring and downsampling the HR images with the selected kernels. (2) In Sec. 4.3, we also validate our method on anisotropic Gaussian kernels and noise following the setting in (Wang et al., 2021a). Specifically, we set the kernel size to 21×21 for scale factor 4. In training, we use additive Gaussian noise with noise level σ = 25 and adopt anisotropic Gaussian kernels characterized by the Gaussian probability density function N(0, Σ) with zero mean and varying covariance matrix Σ. The covariance matrix Σ is determined by two random eigenvalues λ_1, λ_2 ∼ U(0.2, 4) and a random rotation angle θ ∼ U(0, π).
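The anisotropic kernel construction described above (Σ = R diag(λ_1, λ_2) Rᵀ with rotation angle θ, kernel taken as the N(0, Σ) density on a 21×21 grid) can be sketched in NumPy; the function and parameter names are illustrative.

```python
import numpy as np

def anisotropic_gaussian_kernel(size=21, lam1=2.0, lam2=1.0, theta=0.5):
    """Anisotropic Gaussian blur kernel (sketch of the Sec. 4.1 construction).

    Sigma = R diag(lam1, lam2) R^T with rotation angle theta; the kernel is the
    zero-mean N(0, Sigma) density on a size x size grid, normalized to sum to 1.
    """
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Sigma = R @ np.diag([lam1, lam2]) @ R.T
    inv = np.linalg.inv(Sigma)
    r = (size - 1) / 2.0
    # grid of (x, y) offsets from the kernel center
    xy = np.stack(np.meshgrid(np.arange(size) - r, np.arange(size) - r), axis=-1)
    k = np.exp(-0.5 * np.einsum("...i,ij,...j->...", xy, inv, xy))
    return k / k.sum()
```

Sampling lam1, lam2 ~ U(0.2, 4) and theta ~ U(0, π) per training example then reproduces the randomized kernel distribution described in the settings.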

Figure 4: 4× visual comparison. Noise levels are set to 10 and 20 for these two images, respectively.

Note that RCAN is a state-of-the-art SR method for bicubic degradation. For a fair comparison across different model sizes, we develop KDSR_S-M and KDSR_S-L by adjusting the depth and channels of the network. We apply the Gaussian8 (Gu et al., 2019) kernel setting on five datasets, including Set5 (Bevilacqua et al., 2012), Set14 (Zeyde et al., 2010), B100 (Martin et al., 2001), Urban100 (Huang et al., 2015), and Manga109 (Matsui et al., 2017), to generate evaluation datasets.

Figure 5: 4× visual comparison on real-world SR competition benchmarks.

4.4. EVALUATION ON REAL-WORLD SR

We further validate the effectiveness of KDSR on real-world datasets. As described in Sec. 4.1, we introduce GAN (Goodfellow et al., 2014) and perceptual (Johnson et al., 2016) losses to train our network with the same high-order complex degradation process as Real-ESRGAN (Wang et al., 2021b), obtaining KDSR_S-GAN. We compare our method with state-of-the-art GAN-based SR methods, including Real-ESRGAN, BSRGAN (Zhang et al., 2021), MM-RealSR (Mou et al., 2022), and ESRGAN (Wang et al., 2018b). We evaluate all methods on the datasets provided in the Real-World Super-Resolution challenges: AIM19 Track2 (Lugmayr et al., 2019) and NTIRE2020 Track1 (Lugmayr et al., 2020). Since the AIM19 and NTIRE2020 datasets provide paired validation sets, we use LPIPS (Zhang et al., 2018b), PSNR, and SSIM for evaluation.

(a) IDR extracted by KDSR_T. (b) IDR extracted by KDSR_S. (c) IDR extracted by KDSR_S without KD. (d) IDR extracted by DASR.

Figure 6: Visualization of IDR with different isotropic Gaussian blur kernels σ on various methods.

For the training process, we first input HR and LR images to the teacher KD-IDE_T, which is optimized together with the SR network. Given the paired HR and LR images, the teacher KD-IDE_T can easily extract the latent degradation information in the LR images. Then, we use a student KD-IDE_S that learns to directly extract the same IDR as KD-IDE_T from LR images alone. Extensive experiments demonstrate the effectiveness of the proposed KDSR. Our main contributions are threefold:

• We propose KDSR, a strong, simple, and efficient baseline for Blind-SR that can generalize to any degradation process, addressing the weakness of explicit degradation estimation.

• We propose a novel KD based implicit degradation representation (IDR) estimator. To the best of our knowledge, the design of IDR estimation has received little attention so far. Besides, we propose an efficient IDR based SR network to fully utilize the IDR to guide SR.

• Extensive experiments show that the proposed KDSR achieves excellent Blind-SR performance in degradation settings from simple to complex.

Table 1: 4× SR quantitative comparison on datasets with Gaussian8 kernels. The bottom three methods, marked in rose, use IDR to guide blind SR. The FLOPs and runtime are computed based on an LR size of 180 × 320. The best and second best performance are in red and blue, respectively.

Table 2: PSNR results achieved on Set14 (Zeyde et al., 2010) under anisotropic Gaussian blur and noise. The bottom two methods, marked in rose, use IDR to guide blind SR. The best results are marked in bold. The runtime is measured on an LR size of 180 × 320.

4.3. EVALUATION WITH ANISOTROPIC GAUSSIAN KERNELS AND NOISES

We evaluate our KDSR on degradations with anisotropic Gaussian kernels and noise by adopting 9 typical blur kernels and different noise levels. We compare our KDSR with SOTA Blind-SR methods, including RCAN (Zhang et al., 2018c), IKC (Gu et al., 2019), DCLS (Luo et al., 2022), and DASR (Wang et al., 2021a). Since RCAN, IKC, and DCLS cannot deal with noise degradation, we use DnCNN (Zhang et al., 2017), a SOTA denoising method, to denoise images for them. The quantitative results are shown in Tab. 2. Compared with the SOTA explicit degradation estimation based Blind-SR method DCLS, our KDSR_S surpasses it by over 1 dB under almost all degradation settings while consuming 29.4% of its parameters and 5.1% of its runtime. Furthermore, at σ = 20, our KDSR_S surpasses DASR by about 1 dB with fewer parameters and less runtime. This shows the superiority of knowledge distillation based IDR estimation and of the efficient SR network structure. In addition, we provide a visual comparison in Fig. 4. We can see that KDSR_S produces sharper edges, more realistic details, and fewer artifacts than other methods. More visual results are given in the appendix.

Table 3: 4× SR quantitative comparison on real-world SR competition benchmarks. The FLOPs and runtime are computed based on an LR size of 180 × 320. The best results are marked in bold.

KDSR_T4 is the teacher network corresponding to KDSR_S4. (1) We directly input the degradation blur kernels into KDSR_S4, obtaining KDSR_S3. Compared with KDSR_S3, KDSR_S4 achieves similar performance by estimating the IDR, which demonstrates that KDSR_S4 can estimate quite accurate IDR to guide Blind-SR. (2) We remove the KD in KDSR_S4 to obtain KDSR_S2, which means that KDSR_S2 cannot learn the IDR extraction from KDSR_T. Comparing KDSR_S4 and KDSR_S2, we can see that the KD scheme brings a 0.42dB improvement for KDSR_S4, which demonstrates that KD effectively helps KDSR_S4 learn the IDR extraction ability from KDSR_T. (3) Based on KDSR_S2, we replace the IDR-DDC in IDR-DCRB with ordinary convolution to obtain KDSR_S1. KDSR_S2 is 0.17dB higher than KDSR_S1, which demonstrates the effectiveness of IDR-DDC. (4) Besides, KDSR_S4 is 0.2dB lower than its teacher KDSR_T, which means KDSR_S4 cannot completely learn the IDR extraction ability from KDSR_T. Different from previous KD works, our KD-IDE_S takes the IDR D'_T ∈ R^{4C} as its learning object to learn the ability to extract IDR from LR images. Therefore, we cannot directly apply the experiences from previous works to our model.

PSNR results evaluated on Urban100 with Gaussian8 (Gu et al., 2019) kernels for 4× SR. The FLOPs and runtime are both measured on an LR size of 180 × 320.

ACKNOWLEDGMENTS

This work was partly supported by the Alexander von Humboldt Foundation, the National Natural Science Foundation of China (No. 62171251), the Natural Science Foundation of Guangdong Province (No. 2020A1515010711), the Special Foundations for the Development of Strategic Emerging Industries of Shenzhen (Nos. JCYJ20200109143010272 and CJGJZD20210408092804011), and the Oversea Cooperation Foundation of Tsinghua.

