SELECTIVE FREQUENCY NETWORK FOR IMAGE RESTORATION

Abstract

Image restoration aims to reconstruct the latent sharp image from its corrupted counterpart. Besides dealing with this long-standing task in the spatial domain, a few approaches seek solutions in the frequency domain in consideration of the large discrepancy between the spectra of sharp/degraded image pairs. However, these works commonly utilize transformation tools, e.g., wavelet transform, to split features into several frequency parts, which is not flexible enough to select the most informative frequency components to recover. In this paper, we exploit a multi-branch and content-aware module to decompose features into separate frequency subbands dynamically and locally, and then accentuate the useful ones via channel-wise attention weights. In addition, to handle large-scale degradation blurs, we propose an extremely simple decoupling and modulation module that enlarges the receptive field via global and window-based average pooling. Integrating the two developed modules into a U-Net backbone, the proposed Selective Frequency Network (SFNet) performs favorably against state-of-the-art algorithms on five image restoration tasks: single-image defocus deblurring, image dehazing, image motion deblurring, image desnowing, and image deraining.

1. INTRODUCTION

Image restoration aims to recover a high-quality image by removing degradations, e.g., noise, blur, and snowflakes. In view of its important role in surveillance, self-driving techniques, and remote sensing, image restoration has gathered considerable attention from both industrial and academic communities. However, due to its ill-posed nature, many conventional approaches address this problem based on various assumptions (Zhang et al., 2022; Yang et al., 2020b) or hand-crafted features (Karaali & Jung, 2017), which are incapable of generating faithful results in real-world scenarios. Recently, image restoration has witnessed rapid development driven by deep neural networks, which obtain favorable performance compared to conventional methods. A flurry of convolutional neural network (CNN) based methods have been developed for diverse image restoration tasks by inventing or borrowing advanced modules, including dilated convolution (Luo et al., 2022; Zou et al., 2021), U-Net (Ronneberger et al., 2015), residual learning (Zhang et al., 2017), multi-stage pipelines (Zhang et al., 2019b), and attention mechanisms (Liu et al., 2019). However, built on convolution units, these methods have limited receptive fields and thus cannot capture long-range dependencies, which are essential for restoration tasks, since a single pixel needs information from its surrounding region to be recovered. More recently, many researchers have tailored the Transformer (Vaswani et al., 2017) for image restoration tasks, such as motion deblurring (Tsai et al., 2022), dehazing (Guo et al., 2022; Song et al., 2022), and desnowing (Chen et al., 2022b;c). Nonetheless, the above-mentioned methods mainly conduct restoration in the spatial domain, which does not sufficiently leverage frequency discrepancies between sharp/degraded image pairs.
To this end, a few works utilize transformation tools, e.g., the wavelet or Fourier transform, to decompose features into different frequency components and then treat the separate parts individually to reconstruct the corresponding feature (Selesnick et al., 2005; Yang & Fu, 2019; Zou et al., 2021; Mao et al., 2021). Nevertheless, the wavelet transform decouples the feature map into different subbands in a fixed manner, and thus it is not capable of distinguishing the most informative or useless frequency components to enhance or suppress. In addition, these methods need the corresponding inverse Fourier/wavelet transform, leading to additional computation overhead. To overcome the above drawbacks and select the most informative frequency components to reconstruct, we propose a novel decoupling and recalibration module for image restoration tasks, named Multi-branch Dynamic Selective Frequency module (MDSF). Specifically, we utilize multi-branch learnable filters to generate high- and low-frequency maps dynamically and locally, and then leverage a channel-wise attention mechanism, modified from (Li et al., 2019), to emphasize or suppress the resulting frequency components. Our module has two key advantages. First, the decoupling step dynamically generates filters to decompose feature maps according to the input and task. Second, our module does not introduce an extra inverse transform.

The receptive field is another critical factor for image restoration tasks due to the various sizes of degradation blurs (Suin et al., 2020; Son et al., 2021). To complement the above dynamic module, MDSF, which processes features locally, we further propose a simple yet effective module, dubbed Multi-branch Compact Selective Frequency module (MCSF), to enhance the helpful frequency signals based on multiple and relatively global receptive fields.
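As a concrete illustration of such a decoupling-and-recalibration scheme, the following is a minimal PyTorch sketch: a 1×1 convolution predicts a per-pixel, softmax-normalized (hence low-pass) kernel, the high-frequency map is taken as the residual, and SK-style channel attention (Li et al., 2019) reweights the two branches. All names (`DynamicFreqDecouple`, etc.), the single-branch setting, and the kernel/attention sizes are assumptions for illustration, not the exact MDSF design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFreqDecouple(nn.Module):
    """Illustrative sketch of dynamic frequency decoupling + channel-wise
    selection; not the authors' exact MDSF implementation."""

    def __init__(self, channels, kernel_size=3, reduction=4):
        super().__init__()
        self.k = kernel_size
        # Predict one k*k kernel per spatial location (shared across channels).
        self.kernel_pred = nn.Conv2d(channels, kernel_size * kernel_size, 1)
        # SK-style attention producing per-channel weights for the two branches.
        hidden = max(channels // reduction, 8)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 2 * channels))

    def forward(self, x):
        b, c, h, w = x.shape
        # Softmax makes each predicted kernel non-negative and sum to one,
        # i.e., a valid averaging (low-pass) filter.
        kernel = self.kernel_pred(x).softmax(dim=1)            # (b, k*k, h, w)
        patches = F.unfold(x, self.k, padding=self.k // 2)     # (b, c*k*k, h*w)
        patches = patches.view(b, c, self.k * self.k, h * w)
        low = (patches * kernel.view(b, 1, -1, h * w)).sum(2).view(b, c, h, w)
        high = x - low                                         # complementary high-pass
        # Channel-wise selection between the low- and high-frequency branches.
        s = (low + high).mean(dim=(2, 3))                      # (b, c)
        weights = self.fc(s).view(b, 2, c, 1, 1).softmax(dim=1)
        return weights[:, 0] * low + weights[:, 1] * high
```

Because the kernels are predicted from the input itself, the split between low and high frequencies adapts per image and per location, and no inverse transform is needed since the high-frequency part is simply the residual.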
Specifically, MCSF utilizes global and window-based average pooling techniques to attain disparate frequency maps, and then uses learnable parameters to modulate the resulting maps without resorting to any convolution layers. Compared to MDSF, besides the enlarged receptive fields, MCSF is lightweight enough to be embedded at multiple positions of the backbone.

The main contributions of this study are summarized as follows:
• We propose a multi-branch dynamic selective frequency module (MDSF) that is capable of decoupling feature maps into different frequency components dynamically via theoretically proven filters, and selecting the most informative components to recover.
• We develop a multi-branch compact selective frequency module (MCSF) that performs frequency decoupling and recalibration using multi-scale average pooling operations to pursue a large receptive field for large-scale degradation blurs.
• Incorporating MDSF and MCSF into a U-shaped backbone, the proposed Selective Frequency Network (SFNet) achieves state-of-the-art results on five image restoration tasks, including image defocus/motion deblurring, dehazing, deraining, and desnowing.
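The pooling-based decoupling and modulation behind MCSF can likewise be sketched in a few lines of PyTorch: global and window-based average pooling split a feature map into bands that sum back to the input, and learnable per-channel weights rescale each band with no convolution at all. The class name, the three-band split, and the window size are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompactFreqModulate(nn.Module):
    """Illustrative sketch of compact frequency modulation via multi-scale
    average pooling; not the authors' exact MCSF implementation."""

    def __init__(self, channels, window=4):
        super().__init__()
        self.window = window
        # One learnable weight per channel and per band: low (global pooling),
        # mid (window pooling minus global), high (residual). Initialized to 1
        # so the module starts as an identity mapping.
        self.w = nn.Parameter(torch.ones(3, channels, 1, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        g = x.mean(dim=(2, 3), keepdim=True)           # global low-pass
        win = F.avg_pool2d(x, self.window)             # window-based low-pass
        win = F.interpolate(win, size=(h, w), mode='nearest')
        low, mid, high = g, win - g, x - win           # bands sum to x
        return self.w[0] * low + self.w[1] * mid + self.w[2] * high
```

With the weights at their initial value of one, the three bands recombine to the input exactly; training then learns to amplify or suppress individual bands per channel, at negligible parameter cost, which is what makes such a module cheap enough to insert at many points of a U-Net backbone.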

2. RELATED WORK

Image Restoration. Prior to the deep learning era, a great number of methods were proposed for image restoration problems based on various assumptions and hand-crafted features (Sezan & Tekalp, 1990; Kundur & Hatzinakos, 1996; Calvetti et al., 1999). In recent years, with the rapid development of deep learning, a flurry of approaches have been investigated utilizing convolutional neural networks for image motion deblurring (Zamir et al., 2021; Yuan et al., 2020; Cui et al., 2023; Purohit et al., 2021), defocus deblurring (Ruan et al., 2022; Abuolaim & Brown, 2020; Son et al., 2021), desnowing (Chen et al., 2021c; 2020a), dehazing (Dong et al., 2020; Liu et al., 2019; Ren et al., 2016; Zhang et al., 2018), and deraining (Yang et al., 2020b; Wang et al., 2019). More recently, to capture long-range dependencies, many works have borrowed the Transformer (Vaswani et al., 2017) from the natural language processing field into general image restoration (Chen et al., 2021a; Liang et al., 2021; Zamir et al., 2022; Wang et al., 2022) and specific tasks such as image motion deblurring (Tsai et al., 2022), dehazing (Song et al., 2022), and desnowing (Chen et al., 2022b). In this study, instead of exploiting a more advanced backbone for image restoration, we pay more attention to the frequency selection mechanism based on an efficient CNN.

Frequency-Based Image Restoration. Many algorithms have been developed to address various low-level vision problems from a frequency perspective. Specifically, Chen et al. (2021c) propose a hierarchical desnowing network based on the dual-tree complex wavelet representation (Selesnick et al., 2005). Yang & Fu (2019) develop a wavelet-based U-Net to replace up-sampling and down-sampling. Zou et al. (2021) utilize a wavelet-transform-based module to help recover texture details. Yang et al. (2020a) devise a wavelet structure similarity loss function for training. Mao et al. (2021) use Fourier transform to

