SELECTIVE FREQUENCY NETWORK FOR IMAGE RESTORATION

Abstract

Image restoration aims to reconstruct the latent sharp image from its corrupted counterpart. Besides dealing with this long-standing task in the spatial domain, a few approaches seek solutions in the frequency domain in consideration of the large discrepancy between spectra of sharp/degraded image pairs. However, these works commonly utilize transformation tools, e.g., wavelet transform, to split features into several frequency parts, which is not flexible enough to select the most informative frequency component to recover. In this paper, we exploit a multi-branch and content-aware module to decompose features into separate frequency subbands dynamically and locally, and then accentuate the useful ones via channel-wise attention weights. In addition, to handle large-scale degradation blurs, we propose an extremely simple decoupling and modulation module to enlarge the receptive field via global and window-based average pooling. Integrating the two developed modules into a U-Net backbone, the proposed Selective Frequency Network (SFNet) performs favorably against state-of-the-art algorithms on five image restoration tasks, including single-image defocus deblurring, image dehazing, image motion deblurring, image desnowing, and image deraining.

1. INTRODUCTION

Image restoration aims to recover a high-quality image by removing degradations, e.g., noise, blur, and snowflakes. In view of its important role in surveillance, self-driving techniques, and remote sensing, image restoration has gathered considerable attention from industrial and academic communities. However, due to its ill-posed nature, many conventional approaches address this problem based on various assumptions (Zhang et al., 2022; Yang et al., 2020b) or hand-crafted features (Karaali & Jung, 2017), which are incapable of generating faithful results in real-world scenarios.

Recently, deep neural networks have driven rapid progress in image restoration and obtained favorable performance compared to conventional methods. A flurry of convolutional neural network (CNN) based methods have been developed for diverse image restoration tasks by inventing or borrowing advanced modules, including dilated convolution (Luo et al., 2022; Zou et al., 2021), U-Net (Ronneberger et al., 2015), residual learning (Zhang et al., 2017), multi-stage pipelines (Zhang et al., 2019b), and attention mechanisms (Liu et al., 2019). However, with convolution units, these methods have limited receptive fields and are thus not capable of capturing long-range dependencies. Such dependencies are essential for restoration tasks, since a single pixel needs information from its surrounding region to be recovered. More recently, many researchers have tailored the Transformer (Vaswani et al., 2017) for image restoration tasks, such as motion deblurring (Tsai et al., 2022), dehazing (Guo et al., 2022; Song et al., 2022), and desnowing (Chen et al., 2022b; c). Nonetheless, the above-mentioned methods mainly conduct restoration in the spatial domain and do not sufficiently leverage the frequency discrepancies between sharp/degraded image pairs.
To this end, a few works utilize transformation tools, e.g., wavelet transform or Fourier transform, to decompose features into different frequency components and then treat the separate parts individually to reconstruct the corresponding feature (Selesnick et al., 2005; Yang & Fu, 2019; Zou et al., 2021; Mao et al., 2021). Nevertheless, wavelet transform decouples the feature map into different subbands in a fixed manner, and thus it cannot distinguish the most informative frequency components to enhance from the useless ones to suppress. In addition, these methods need the corresponding inverse Fourier/wavelet transform, leading to additional computation overhead.

To overcome the above drawbacks and select the most informative frequency component to reconstruct, we propose a novel decoupling and recalibration module for image restoration tasks, named Multi-branch Dynamic Selective Frequency module (MDSF). Specifically, we utilize multi-branch learnable filters to generate high- and low-frequency maps dynamically and locally, and then leverage a channel-wise attention mechanism, modified from (Li et al., 2019), to emphasize or suppress the resulting frequency components. Our module has two key advantages. Firstly, according to the input and task, the decoupling step dynamically generates filters to decompose feature maps. Secondly, our module does not introduce an extra inverse transform.

Receptive field is another critical factor for image restoration tasks due to the various sizes of degradation blurs (Suin et al., 2020; Son et al., 2021). To complement the above dynamic module, MDSF, which processes features locally, we further propose a simple yet effective module, dubbed Multi-branch Compact Selective Frequency module (MCSF), to enhance the helpful frequency signals based on multiple and relatively global receptive fields.
Specifically, we utilize global and window-based average pooling techniques to attain disparate frequency maps, and then use learnable parameters to modulate the resulting maps without resorting to any convolution layers. Compared to MDSF, besides the enlarged receptive fields, MCSF is lightweight enough to be embedded in multiple positions of the backbone.

The main contributions of this study are summarized as follows:

• We propose a multi-branch dynamic selective frequency module (MDSF) that is capable of decoupling feature maps into different frequency components dynamically via theoretically proven filters, and selecting the most informative components to recover.
• We develop a multi-branch compact selective frequency module (MCSF) that performs frequency decoupling and recalibration using multi-scale average pooling operations to pursue a large receptive field for large-scale degradation blurs.
• Incorporating MDSF and MCSF into a U-shaped backbone, the proposed selective frequency network (SFNet) achieves state-of-the-art results on five image restoration tasks, including image defocus/motion deblurring, dehazing, deraining, and desnowing.

2. RELATED WORK

Image Restoration. Prior to the deep learning era, a great number of methods were proposed for image restoration problems based on various assumptions and hand-crafted features (Sezan & Tekalp, 1990; Kundur & Hatzinakos, 1996; Calvetti et al., 1999). In recent years, with the rapid development of deep learning, a flurry of approaches have been investigated utilizing convolutional neural networks for image motion deblurring (Zamir et al., 2021; Yuan et al., 2020; Cui et al., 2023; Purohit et al., 2021), defocus deblurring (Ruan et al., 2022; Abuolaim & Brown, 2020; Son et al., 2021), desnowing (Chen et al., 2021c; 2020a), dehazing (Dong et al., 2020; Liu et al., 2019; Ren et al., 2016; Zhang et al., 2018), and deraining (Yang et al., 2020b; Wang et al., 2019). More recently, to capture long-range dependencies, many works have borrowed the Transformer (Vaswani et al., 2017) from the natural language processing field for image restoration (Chen et al., 2021a; Liang et al., 2021; Zamir et al., 2022; Wang et al., 2022) and for specific tasks such as image motion deblurring (Tsai et al., 2022), dehazing (Song et al., 2022), and desnowing (Chen et al., 2022b). In this study, instead of exploiting a more advanced backbone for image restoration, we pay more attention to the frequency selection mechanism based on an efficient CNN.

Frequency Based Image Restoration. Many algorithms have been developed to address various low-level vision problems from a frequency perspective. Specifically, Chen et al. (Chen et al., 2021c) propose a hierarchical desnowing network based on the dual-tree complex wavelet representation (Selesnick et al., 2005). Yang et al. (Yang & Fu, 2019)

3. METHODOLOGY

We first describe the overall architecture of SFNet (Fig. 1(a)). Then we present the proposed modules, MDSF (Fig. 1(d, e)) and MCSF. The loss functions follow in the final part.

3.1. OVERALL ARCHITECTURE

Our network adopts the encoder-decoder architecture to learn hierarchical representations. Specifically, SFNet consists of a three-scale encoder and a three-scale decoder. Each scale comprises a ResBlock (Fig. 1(c)). MDSF is only deployed in the last residual block of each ResBlock, while MCSF exists in all blocks. Following previous methods (Cho et al., 2021; Mao et al., 2021; Tu et al., 2022), multi-input and multi-output mechanisms are used to ease training difficulty. Specifically, input images of reduced sizes are merged into the main path via the shallow layer (Fig. 1(b)), and the predicted images are produced by 3 × 3 convolutional layers after each scale of the decoder. In addition, we adopt feature-level and image-level skip connections to assist training. In Fig. 1, we only show the top-level image skip connection for clarity. The up-sampling and down-sampling layers are implemented by transposed and strided convolutions, respectively.

3.2. MULTI-BRANCH DYNAMIC SELECTIVE FREQUENCY MODULE (MDSF)

To select the most informative frequency components for reconstruction, MDSF contains two main elements: a frequency decoupler (Fig. 1(d)) and a modulator (Fig. 1(e)). The decoupler dynamically decomposes features into separate frequency parts based on learned filters, and the modulator then uses channel-wise attention to accentuate the useful frequencies. Additionally, to provide various local receptive fields, MDSF splits features along the channel dimension and applies different filter sizes to the separate parts. We only show the one-branch case in Fig. 1(d) for simplicity.

To dynamically decompose feature maps, we utilize a learnable and theoretically proven low-pass filter (refer to Appendix B for the proof) and the corresponding high-pass filter to generate low- and high-frequency maps. The learned filters are shared across the group dimension to strike a balance between complexity and feature diversity. Specifically, given any feature map $X \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels and $H \times W$ denotes the spatial dimension, we first leverage the filter-generating layer to produce the low-pass filter for each group of the input, formulated as

$$F^{l} = \mathrm{Softmax}(\mathcal{B}(W(\mathrm{GAP}(X)))) \tag{1}$$

where $F^{l} \in \mathbb{R}^{k^{2}g \times 1 \times 1}$; $k \times k$ is the kernel size of the low-pass filter; $g$ denotes the number of groups; and $\mathcal{B}$, $W$, and GAP are Batch Normalization (Ioffe & Szegedy, 2015), the parameters of the convolution, and global average pooling, respectively. The group-based operation has fewer parameters and lower complexity than generating filters for each pixel. The number of groups is discussed in Sec. 4.3. To attain the high-pass filter, we subtract the resulting low-pass filter from the identity kernel, whose central value is one and whose other entries are zero.
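The filter construction described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the logits are made-up numbers standing in for the output of the filter-generating layer for a single group, the function names `make_filters` and `apply_filter` are hypothetical, and zero padding at the borders is an assumption, since the text does not specify the padding.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_filters(logits, k=3):
    """Build a k x k low-pass kernel and its complementary high-pass kernel.

    `logits` stands in for the k*k outputs of the filter-generating layer
    for one group (hypothetical values here, not learned ones).
    """
    assert len(logits) == k * k
    flat = softmax(logits)  # non-negative weights that sum to one
    low = [flat[r * k:(r + 1) * k] for r in range(k)]
    centre = k // 2
    # High-pass = identity kernel (1 at the centre, 0 elsewhere) - low-pass.
    high = [[(1.0 if (r == centre and c == centre) else 0.0) - low[r][c]
             for c in range(k)] for r in range(k)]
    return low, high


def apply_filter(feature, kernel):
    """Slide `kernel` over a 2D list `feature` (zero padding is an assumption)."""
    height, width = len(feature), len(feature[0])
    half = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for h in range(height):
        for w in range(width):
            acc = 0.0
            for p in range(-half, half + 1):
                for q in range(-half, half + 1):
                    hh, ww = h + p, w + q
                    if 0 <= hh < height and 0 <= ww < width:
                        acc += kernel[p + half][q + half] * feature[hh][ww]
            out[h][w] = acc
    return out


if __name__ == "__main__":
    logits = [0.3, -0.1, 0.2, 0.0, 1.0, 0.4, -0.5, 0.1, 0.2]  # made-up values
    low_k, high_k = make_filters(logits, k=3)
    feature = [[float((3 * r + c) % 5) for c in range(5)] for r in range(5)]
    low_map = apply_filter(feature, low_k)
    high_map = apply_filter(feature, high_k)
    print("low-pass kernel sum:", sum(map(sum, low_k)))
    print("high-pass kernel sum:", sum(map(sum, high_k)))
```

Because the two kernels add up to the identity kernel, the low- and high-frequency maps in this sketch add up to the input feature, so the decomposition loses no information and needs no inverse transform.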
Next, for each group feature $X_i \in \mathbb{R}^{C_i \times H \times W}$, where $i$ is the group index and $C_i = C/g$, its low- and high-frequency components are obtained with the corresponding reshaped filters $F^{L}$ and $F^{H}$ ($\in \mathbb{R}^{g \times k \times k}$):

$$X^{l}_{i,c,h,w} = \sum_{p,q} F^{L}_{i,p,q}\, X_{i,c,h+p,w+q}; \qquad X^{h}_{i,c,h,w} = \sum_{p,q} F^{H}_{i,p,q}\, X_{i,c,h+p,w+q} \tag{2}$$

where $c$ is the channel index; $h$ and $w$ denote spatial coordinates; and $p, q \in \{-1, 0, 1\}$.

After decoupling the feature map into different frequency components, we leverage the frequency modulator to emphasize the genuinely useful part for reconstruction, as illustrated in Fig. 1(e). The modulator works along the channel dimension and is based on a modified SKNet (Li et al., 2020). Formally, given two frequency maps, $X^{l}$ and $X^{h}$, we first generate the fused feature by

$$Z = W_{fc}(\mathrm{GAP}(X^{l} + X^{h})) \tag{3}$$

where $W_{fc}$ denotes the parameters of a fully connected layer. To attain channel-wise weights, we use two other fully connected layers followed by concatenation and a Softmax function, formulated as:

$$[W^{l}, W^{h}]_{c} = \frac{e^{[W_{l}(Z),\, W_{h}(Z)]_{c}}}{\sum_{j=1}^{2C} e^{[W_{l}(Z),\, W_{h}(Z)]_{j}}} \tag{4}$$

where $W^{l}$ and $W^{h}$ are the channel-wise attention weights for the two frequency parts; $W_{l}$ and $W_{h}$ are the parameters of the fully connected layers; $[\cdot, \cdot]$ denotes concatenation; and $c$ is the channel index of the concatenated features. Compared to SKNet (Li et al., 2020), which performs Softmax on each channel as $W^{l}_{c} = \frac{e^{W_{l}(Z)_{c}}}{e^{W_{l}(Z)_{c}} + e^{W_{h}(Z)_{c}}}$, we take all channels into consideration to facilitate interactions between different channels of the two maps. The final weights are then obtained by a split operation.

Based on the above one-branch case, the multiple branches with varied filter sizes can be expressed as:

$$X = [M_{1}(D_{1}(X_{1})), \ldots, M_{m}(D_{m}(X_{m}))] \tag{5}$$

where $D$ and $M$ denote the decoupler and modulator, respectively, and $X_{m}$ is the equally split feature.
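The difference between the joint Softmax of Eq. (4) and the per-channel Softmax of SKNet can be seen in a small sketch. The logits below are made-up numbers standing in for the outputs of the two fully connected layers, and both function names are hypothetical; the sketch only computes the attention weights and does not model how they are applied to the frequency maps.

```python
import math


def joint_softmax_weights(logits_low, logits_high):
    """Softmax over the concatenated 2C logits, then split back (Eq. 4 style)."""
    concat = list(logits_low) + list(logits_high)
    peak = max(concat)
    exps = [math.exp(v - peak) for v in concat]
    total = sum(exps)
    weights = [e / total for e in exps]
    channels = len(logits_low)
    return weights[:channels], weights[channels:]


def pairwise_softmax_weights(logits_low, logits_high):
    """SKNet-style variant: softmax over the two branches, channel by channel."""
    w_low, w_high = [], []
    for a, b in zip(logits_low, logits_high):
        peak = max(a, b)
        ea, eb = math.exp(a - peak), math.exp(b - peak)
        w_low.append(ea / (ea + eb))
        w_high.append(eb / (ea + eb))
    return w_low, w_high


if __name__ == "__main__":
    low_logits = [0.0, 1.0, -0.5]   # made-up logits for the low-frequency map
    high_logits = [2.0, -1.0, 0.5]  # made-up logits for the high-frequency map
    print("joint:   ", joint_softmax_weights(low_logits, high_logits))
    print("pairwise:", pairwise_softmax_weights(low_logits, high_logits))
```

In the pairwise variant each channel's two weights sum to one regardless of the other channels, whereas in the joint variant all 2C weights compete for a single unit of mass, which is what lets channels of the two maps interact.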

3.3. MULTI-BRANCH COMPACT SELECTIVE FREQUENCY MODULE (MCSF)

Since the receptive field plays a critical role in image restoration, where degradation blurs vary in size (Son et al., 2021; Mao et al., 2021), we develop MCSF to efficiently enlarge the receptive field of SFNet. MCSF has two branches with different receptive fields, i.e., the global branch and the window-based branch. Since these branches share a similar paradigm, we only detail the window-based one, which is inspired by window-based attention (Liu et al., 2021). Specifically, given the split feature $X \in \mathbb{R}^{\frac{C}{2} \times H \times W}$, it is partitioned into four windows, giving a feature of size $2C \times \frac{H}{2} \times \frac{W}{2}$ (four windows of $\frac{C}{2}$ channels each). To get the low-frequency part, global average pooling is applied to the resulting windows (refer to Appendix C for analyses of this option). The corresponding high-frequency part is obtained by subtracting the low-frequency map from the partitioned feature. To select the useful frequency subbands, we rescale these two maps by learnable weights, which are directly optimized by backpropagation. Finally, the updated frequency maps are restored to the original resolution. The global branch has a similar pipeline, yet with a global receptive field. Compared to MDSF, besides the enlarged receptive field, MCSF performs frequency decoupling and modulation without any convolution layers, resulting in fewer parameters and lower complexity (see Tab. 9 for details). Hence, MCSF can be embedded in multiple positions.
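The pooling-based decoupling above can be sketched for a single channel in plain Python. This is an illustration under stated assumptions rather than the authors' code: the function name `pooled_branch` is hypothetical, the two learnable weights are reduced to scalars `w_low` and `w_high`, and the rescaled low- and high-frequency maps are assumed to be summed when the windows are merged back, which the text does not state explicitly.

```python
def pooled_branch(feature, w_low, w_high, grid=2):
    """Average-pooling frequency split on a 2D list `feature`.

    grid=2 partitions the map into four windows (window-based branch);
    grid=1 treats the whole map as one window (global branch).
    """
    height, width = len(feature), len(feature[0])
    assert height % grid == 0 and width % grid == 0
    win_h, win_w = height // grid, width // grid
    out = [[0.0] * width for _ in range(height)]
    for gi in range(grid):
        for gj in range(grid):
            rows = range(gi * win_h, (gi + 1) * win_h)
            cols = range(gj * win_w, (gj + 1) * win_w)
            # Low-frequency part: the average of the window.
            mean = sum(feature[r][c] for r in rows for c in cols) / (win_h * win_w)
            for r in rows:
                for c in cols:
                    high = feature[r][c] - mean  # high-frequency residual
                    # Summing the rescaled parts is an assumption of this sketch.
                    out[r][c] = w_low * mean + w_high * high
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0, 5.0, 6.0],
         [3.0, 4.0, 7.0, 8.0],
         [0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
    print(pooled_branch(x, w_low=1.0, w_high=2.0, grid=2))  # window-based branch
    print(pooled_branch(x, w_low=1.0, w_high=2.0, grid=1))  # global branch
```

With both weights equal to one the branch returns its input unchanged, and the only trainable quantities are the rescaling weights, which is consistent with the low parameter count claimed for MCSF.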

3.4. LOSS FUNCTION

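To facilitate the frequency selection process, we adopt the $L_1$ loss in both the spatial and frequency domains:

$$L_{spatial} = \sum_{r=1}^{3} \frac{1}{S_r} \| \hat{X}_r - Y_r \|_1; \qquad L_{frequency} = \sum_{r=1}^{3} \frac{1}{S_r} \| \mathcal{F}(\hat{X}_r) - \mathcal{F}(Y_r) \|_1 \tag{6}$$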
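As a rough illustration of Eq. (6), the sketch below evaluates both terms on tiny single-channel images with a naive discrete Fourier transform from the standard library. Several choices are assumptions of the sketch rather than details from the paper: the function names are hypothetical, the L1 norm of the complex spectrum difference is taken as the sum of complex moduli (an implementation could instead treat real and imaginary parts separately), the normalizer is the pixel count at each scale, and the weight `lam` defaults to the value 0.1 that the paper gives for λ.

```python
import cmath


def dft2(image):
    """Naive 2D discrete Fourier transform of a small 2D list."""
    height, width = len(image), len(image[0])
    spectrum = []
    for u in range(height):
        row = []
        for v in range(width):
            acc = 0j
            for h in range(height):
                for w in range(width):
                    angle = -2.0 * cmath.pi * (u * h / height + v * w / width)
                    acc += image[h][w] * cmath.exp(1j * angle)
            row.append(acc)
        spectrum.append(row)
    return spectrum


def mean_abs_diff(a, b):
    """Sum of |a - b| over all entries divided by the number of entries."""
    count = len(a) * len(a[0])
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count


def sfnet_style_loss(outputs, targets, lam=0.1):
    """Spatial L1 plus lam times frequency L1, summed over the given scales."""
    spatial = sum(mean_abs_diff(o, t) for o, t in zip(outputs, targets))
    frequency = sum(mean_abs_diff(dft2(o), dft2(t))
                    for o, t in zip(outputs, targets))
    return spatial + lam * frequency


if __name__ == "__main__":
    target = [[0.0, 1.0], [2.0, 3.0]]
    output = [[v + 0.5 for v in row] for row in target]
    # One scale only; the paper sums over three resolutions.
    print(sfnet_style_loss([output], [target]))
```

A practical implementation would use a fast Fourier transform on tensors; the quadratic-time transform here is only meant to keep the example dependency-free.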

4.1. SETTINGS

We evaluate the proposed SFNet on five restoration tasks: image motion/defocus deblurring, image deraining, image dehazing, and image desnowing. More details of the datasets and training settings for each task are provided in Appendix A. FLOPs are computed on a patch size of 256 × 256. We train separate models for different tasks. Unless mentioned otherwise, the following parameters are adopted. The batch size is set to 4 with a patch size of 256 × 256. Each patch is randomly flipped horizontally for data augmentation. The initial learning rate is 1e-4 and is gradually reduced to 1e-6 with cosine annealing (Loshchilov & Hutter, 2016). Adam (β1 = 0.9, β2 = 0.999) is used for training. N is set to 15 in Fig. 1(c). MDSF has two branches with filter kernel sizes of 3 × 3 and 5 × 5, respectively, and the number of groups is 8. We use PyTorch to implement our models on an NVIDIA Tesla V100 GPU.

Image dehazing results. We perform dehazing experiments on the synthetic benchmark RESIDE (Li et al., 2018) and the real hazy dataset Dense-Haze (Ancuti et al., 2019). For RESIDE, we train our models for the indoor and outdoor scenarios separately, and then evaluate on the corresponding SOTS-Indoor and SOTS-Outdoor test sets. The quantitative results are shown in Tab. 2. Our method obtains the highest scores on all metrics. Particularly on the outdoor scene, our network generates a substantial gain of 4.87 dB PSNR over DeHamer (Guo et al., 2022). Compared to the recent works DehazeFormer-L (Song et al., 2022) and PMNet (Ye et al., 2022), our method achieves 0.19 dB and 2.83 dB higher PSNR on the SOTS-Indoor test set, respectively. Additionally, we validate the performance of our approach on the real hazy dataset Dense-Haze (Ancuti et al., 2019). The results are shown in Tab. 4. As we can see, SFNet exhibits a superior ability to deal with the real-world dehazing problem, achieving a gain of 0.84 dB over DeHamer (Guo et al., 2022). The dehazed results in Fig. 3 illustrate that SFNet is more effective in removing haze than other methods.


Image motion deblurring results. We evaluate our method on both synthetic and real-world datasets. The numerical comparisons on the synthetic GoPro (Nah et al., 2017) and HIDE (Shen et al., 2020) [...] (Nah et al., 2017). [...] demonstrates stronger generalization capability to the HIDE dataset than Stripformer on all metrics. In addition to the synthetic datasets, we further evaluate the effectiveness of our network on a real-world dataset. Tab. 5 shows quantitative comparisons on the newly proposed RSBlur (Rim et al., 2022) dataset. SFNet sets a new state of the art on this dataset, providing a substantial gain of 0.37 dB PSNR over the previous best method, Uformer-B (Wang et al., 2022). Fig. 4 illustrates that SFNet produces more visually pleasant results than competing algorithms.

Table 7: Deraining comparisons with previous methods on five deraining datasets: Rain100H (Yang et al., 2017), Rain100L (Yang et al., 2017), Test100 (Zhang et al., 2019a), Test1200 (Zhang & Patel, 2018) and Test2800 (Fu et al., 2017).

Image desnowing results. We compare our method on the CSD (Chen et al., 2021b) dataset with existing state-of-the-art methods (Chen et al., 2022a; Valanarasu et al., 2022; Chen et al., 2022c). As shown in Tab. 6, our framework yields a 4.66 dB PSNR improvement over the Transformer model MSP-Former (Chen et al., 2022c). The visual results in Fig. 5 show that our method is more effective in removing spatially varying snowflakes than competitors. More results on SRRS (Chen et al., 2020a) and Snow100K (Liu et al., 2018) are provided in Appendix E.

Image deraining results. Following recent works (Jiang et al., 2020; Purohit et al., 2021; Tu et al., 2022), we compare PSNR/SSIM scores on the Y channel in YCbCr color space. Tab. 7 shows that our method achieves the best average PSNR among the competing approaches. Moreover, on the Test100 dataset (Zhang et al., 2019a), the proposed SFNet obtains a performance boost of 0.30 dB PSNR over the MLP model MAXIM-2S (Tu et al., 2022). Visual results shown in Fig. 6 illustrate that our model recovers more fine details without artifacts.

Computational overhead comparisons. In Tab. 8, we evaluate the computational costs of five motion deblurring methods on the GoPro test set (Nah et al., 2017). Evaluated on full-resolution images, our method achieves the fastest speed among the compared state-of-the-art algorithms while achieving comparable performance with fewer parameters.

4.3. ABLATION STUDIES

In this section, we first demonstrate the effectiveness of the proposed modules, and then investigate the effects of different designs for each module. Finally, we delve into the mechanism of MDSF to demonstrate its validity. Following the recent method (Tu et al., 2022), all models are trained on the GoPro (Nah et al., 2017) dataset for 1000 epochs, and N is set to 7 in Fig. 1.

Figure 7: Variance (left) and mean (right) differences between the ground truth and the input/results of three methods on the SOTS-Outdoor test set. Models are trained on the OTS dataset (Li et al., 2018).

Influence of each module. Tab. 9 shows that MDSF and MCSF yield performance gains of 0.22 and 0.25 dB over the baseline model, respectively, with little additional computational burden. Deployed only in a single position in each scale, MDSF achieves performance similar to that of MCSF, demonstrating the effectiveness of the dynamic frequency selection mechanism. Furthermore, in Fig. 7, we plot the statistical differences between the ground truth and the results of three methods on dehazing. With the frequency selection mechanism, the statistics of our results are closer to those of the ground truth.

Design choices for MCSF. We study the influence of the number of MCSF modules in Tab. 10, where "2 MCSF" means that we employ the proposed MCSF in the last two residual blocks of each ResBlock. As can be seen, using more MCSF modules consistently increases performance from 31.22 to 31.45 dB PSNR while only introducing 0.01 M parameters and 0.04 G FLOPs. Due to its few introduced parameters and low complexity, we insert MCSF in each residual block for frequency learning.

Design choices for MDSF. To understand the impact of the number of groups in MDSF, we test various configurations in Tab. 11. Generally, increasing the number of groups leads to higher PSNR, demonstrating the effectiveness of filter diversity. However, the accuracy saturates at 16 groups, which is probably caused by overfitting. We finally pick 8 groups for better performance.
Alternatives for MDSF. To examine the advantage of our design, we compare our decoupler with several alternatives in Tab. 12. We first substitute learning-based and fixed frequency separation methods for our decoupler. We form the Conv method (Tab. 12a) by using strided convolution to generate different frequency parts with reduced resolution (Pang et al., 2020). The Octconv (Tab. 12b) version (Chen et al., 2020b) shares a similar idea with Conv, utilizing down-sampling to reduce network redundancy. These variants only introduce an extra low-frequency signal to the network. We further utilize fixed separation methods to replace the proposed decoupler. Gaussian (Tab. 12c) and Wavelet (Tab. 12d) produce similar results, which are much lower than those of our MDSF. Additionally, Wavelet needs more parameters to deal with its multiple branches. Since our filter kernel is generated by learning, we further compare our MDSF with two attention approaches to verify the validity of the proposed selection mechanism. Specifically, we utilize the widely used window-based self-attention (Wang et al., 2022) [...]

We first verify the properties of the claimed low-/high-pass filters in MDSF. To this end, we iteratively apply the produced filters to the image. The variance and corresponding spectral features of the intermediate images are provided in Fig. 8. Taking the low-pass filter as an example, as the number of iterations increases, the variance of the image decreases steadily, and the high-frequency signals in the spectral features are reduced drastically. The high-pass filter exhibits the opposite properties. These results demonstrate the effectiveness of our filters. It is notable that the high-pass filter produces a large variance within fewer iterations, and hence it is more effective than the low-pass filter. As a result, it is easy for MDSF to introduce more high-frequency signals into the network for reconstruction.
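The iterative check described above can be mimicked on a toy image. The sketch below repeatedly applies a Softmax-normalized 3 × 3 kernel and records the variance after each pass. It is only an illustration: the logits are made-up rather than learned, the function names are hypothetical, and circular padding is an assumption chosen so that the mean is preserved and the variance cannot grow.

```python
import math


def softmax_kernel(logits, k=3):
    """Turn k*k raw logits into a k x k kernel of non-negative weights summing to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    flat = [e / total for e in exps]
    return [flat[r * k:(r + 1) * k] for r in range(k)]


def circular_filter(image, kernel):
    """Apply `kernel` to a 2D list with wrap-around (circular) padding."""
    height, width = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for h in range(height):
        for w in range(width):
            acc = 0.0
            for p in range(-half, half + 1):
                for q in range(-half, half + 1):
                    acc += kernel[p + half][q + half] * image[(h + p) % height][(w + q) % width]
            out[h][w] = acc
    return out


def variance(image):
    """Population variance of all pixel values."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_trace(image, kernel, steps):
    """Variance of the image before filtering and after each of `steps` passes."""
    trace = [variance(image)]
    current = image
    for _ in range(steps):
        current = circular_filter(current, kernel)
        trace.append(variance(current))
    return trace


if __name__ == "__main__":
    kernel = softmax_kernel([0.2, 0.0, -0.3, 0.5, 1.0, 0.1, 0.0, 0.4, -0.2])
    image = [[float((7 * r + 3 * c) % 11) for c in range(6)] for r in range(6)]
    for step, var in enumerate(variance_trace(image, kernel, steps=5)):
        print(step, round(var, 4))
```

Under these assumptions the recorded variance never increases from one pass to the next, which is the smoothing behavior the text attributes to the low-pass filter; the sketch does not model the high-pass case.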
In MDSF, we generate different filters for each group to enhance the diversity of frequency features. To delve into this mechanism, we visualize the group-wise spectral features in Fig. 9 . As expected, different groups focus on the learning of disparate low-/high-frequency signals, enriching the diversity of frequency representations for selection. We further compare the feature maps before and after our MDSF in Fig. 10 . Using the attained filters, the decoupler of MDSF produces different frequency components. The high-frequency feature contains much edge information. The resulting feature after modulator recovers more details of the number plate that is blurry in the initial feature.

5. CONCLUSION

We present an image restoration framework, SFNet, which is built on frequency selection mechanism. We develop two key modules, MDSF and MCSF, to conduct frequency decomposition and recalibration with different receptive fields. Specifically, our multi-branch dynamic selective frequency module (MDSF) builds a dynamic filter to decompose feature maps into various frequency parts and utilizes channel attention to perform accentuation, thus effectively selecting the most informative frequency to recover. Furthermore, the proposed multi-branch compact selective frequency module (MCSF) introduces a simple yet effective manner to enlarge the receptive field and conduct frequency selection. With both designed modules, SFNet achieves state-of-the-art results on five image restoration tasks, demonstrating the validity of our frequency selection mechanism. 



Figure 1: (a) Overall architecture of the proposed SFNet. (b) Shallow layer extracts the shallow feature for low-resolution images. (c) ResBlock contains the proposed modules: MDSF (Decoupler (d) and Modulator (e)) and MCSF. MDSF is shown in the one-branch case for clarity. Invert depicts the operation of subtracting the low-pass filter from the identity filter.

integrate both high- and low-frequency residual information. Yoo et al. (Yoo et al., 2018) complete image restoration based on the estimation of the DCT coefficient distribution. In this work, we pursue a dynamic and efficient manner to select the useful frequency part at multiple receptive fields.

Figure 2: Single-image defocus deblurring results on the DPDD dataset (Abuolaim & Brown, 2020).

In Eq. (6), $r$ denotes the index of input/output images of different resolutions; $\mathcal{F}$ represents the fast Fourier transform; $S_r$ is the number of elements for normalization; and $\hat{X}_r$ and $Y_r$ are the output and target images, respectively. The final loss function is given by $L = L_{spatial} + \lambda L_{frequency}$, where $\lambda$ is set as 0.1.

Figure 3: Image dehazing results on the SOTS-Indoor dataset (Li et al., 2018).

Figure 4: Image motion deblurring results on the GoPro dataset (Nah et al., 2017).

Figure 5: Image desnowing results on the CSD dataset (Chen et al., 2021c).

Figure 6: Image deraining results on the Rain100H dataset (Yang et al., 2017).

Figure 8: The variance and discrete Fourier transform of the resulting images as we iteratively impose the produced filters on the image. Left: The low-pass filter. Right: The high-pass filter.

[Panel labels: Group1, Group2, Group3, Group4, Group5, Group6, Group7, Group8]

Figure 10: The internal features of MDSF. With our frequency selection mechanism, MDSF produces more fine details than the initial feature, e.g., the number plate. Zoom in for the best view.

develop the wavelet-based U-Net to replace up-sampling and down-sampling. Zou et al. (Zou et al., 2021) utilize a wavelet-transform-based module to help recover texture details. Yang et al. (Yang et al., 2020a) devise a wavelet structure similarity loss function for training. Mao et al. (Mao et al., 2021) use Fourier transform to

Table 1: Quantitative comparisons with previous leading single-image defocus deblurring methods on the DPDD test set (Abuolaim & Brown, 2020), which contains 39 outdoor and 37 indoor scenes.

Method      | Indoor scenes               | Outdoor scenes              | Combined
            | PSNR   SSIM   MAE    LPIPS  | PSNR   SSIM   MAE    LPIPS  | PSNR   SSIM   MAE    LPIPS
[...]       | [...]  0.772  0.040  0.297  | 21.25  0.599  0.058  0.373  | 23.45  0.683  0.049  0.336
DMENet      | 25.50  0.788  0.038  0.298  | 21.43  0.644  0.063  0.397  | 23.41  0.714  0.051  0.349
JNB         | 26.73  0.828  0.031  0.273  | 21.10  0.608  0.064  0.355  | 23.84  0.715  0.048  0.315
DPDNet      | 26.54  0.816  0.031  0.239  | 22.25  0.682  0.056  0.313  | 24.34  0.747  0.044  0.277
KPAC        | 27.97  0.852  0.026  0.182  | 22.62  0.701  0.053  0.269  | 25.22  0.774  0.040  0.227
IFAN        | 28.11  0.861  0.026  0.179  | 22.76  0.720  0.052  0.254  | 25.37  0.789  0.039  0.217
Restormer   | 28.87  0.882  0.025  0.145  | 23.24  0.743  0.050  0.209  | 25.98  0.811  0.038  0.178
SFNet       | 29.16  0.878  0.023  0.168  | 23.45  0.747  0.049  0.244  | 26.23  0.811  0.037  0.207

Table 2: Image dehazing comparisons on the synthetic dehazing datasets: SOTS-Outdoor and SOTS-Indoor (Li et al., 2018).

Table 3: Image motion deblurring results on GoPro (Nah et al., 2017) and HIDE (Shen et al., 2020) datasets.

Table 5: Image motion deblurring results on RSBlur dataset (Rim et al., 2022).

Deraining comparisons (data of Table 7), reported as PSNR/SSIM per dataset:

Method     | Test100       | Rain100H      | Rain100L      | Test2800      | Test1200      | Average
           | PSNR   SSIM   | PSNR   SSIM   | PSNR   SSIM   | PSNR   SSIM   | PSNR   SSIM   | PSNR   SSIM
[...]      | [...]  0.810  | 14.92  0.592  | 27.03  0.884  | 24.31  0.861  | 23.38  0.835  | 22.48  0.796
SEMI       | 22.35  0.788  | 16.56  0.486  | 25.03  0.842  | 24.43  0.782  | 26.05  0.822  | 22.88  0.744
DIDMDN     | 22.56  0.818  | 17.35  0.524  | 25.23  0.741  | 28.13  0.867  | 29.65  0.901  | 24.58  0.770
UMRL       | 24.41  0.829  | 26.01  0.832  | 29.18  0.923  | 29.97  0.905  | 30.55  0.910  | 28.02  0.880
RESCAN     | 25.00  0.835  | 26.36  0.786  | 29.80  0.881  | 31.29  0.904  | 30.51  0.882  | 28.59  0.857
PreNet     | 24.81  0.851  | 26.77  0.858  | 32.44  0.950  | 31.75  0.916  | 31.36  0.911  | 29.42  0.897
MSPFN      | 27.50  0.876  | 28.66  0.860  | 32.40  0.933  | 32.82  0.930  | 32.39  0.916  | 30.75  0.903
MAXIM-2S   | 31.17  0.922  | 30.81  0.903  | 38.06  0.977  | 33.80  0.943  | 32.37  0.922  | 33.24  0.933
SFNet      | 31.47  0.919  | 31.90  0.908  | 38.21  0.974  | 33.69  0.937  | 32.55  0.911  | 33.56  0.929

Table 8: Overall comparisons between motion deblurring methods on the GoPro (Nah et al., 2017) test set.

Table 9: Ablation studies for individual proposed modules.

Table 10: Ablation study for the number of MCSF.

Table 12: Alternatives for MDSF.

ACKNOWLEDGEMENT

This work was supported by the National Natural Science Foundation of China under Grant (62172409), Shenzhen Science and Technology Program (JCYJ20220818102012025, JCYJ20220530145209022), and Beijing Nova Program (Z201100006820074).

