A SIMPLE BUT EFFECTIVE AND EFFICIENT GLOBAL MODELING PARADIGM FOR IMAGE RESTORATION

Abstract

Global modeling-based image restoration frameworks (e.g., transformer-like architectures) have gained popularity. Despite the remarkable advancement, their success may come at the cost of model parameters and FLOPs while the intrinsic characteristics (e.g., task-specific degradation) are ignored. The objective of our work is orthogonal to previous studies: we tailor a simple yet effective and efficient global modeling paradigm for image restoration. The key insights that motivate our study are two-fold: 1) the Fourier transform is capable of disentangling the image degradation and content components, serving as an image degradation prior embedded into the image restoration framework; 2) the Fourier domain innately embraces the global property, where each pixel in Fourier space involves all spatial pixels. We obey the de facto global modeling rule "spatial interaction + channel evolution" of previous studies. Differently, we customize the core designs: Fourier spatial interaction modeling and Fourier channel evolution. Equipped with the above designs, our image restoration paradigm is verified on mainstream image restoration tasks including image de-raining, image enhancement, image de-hazing, and guided image super-resolution. Extensive experiments suggest that our paradigm achieves competitive performance with fewer computational resources. Our main focus is not to beat previous frameworks but to provide an alternative customized global modeling-based image restoration framework with an efficient structure. Code will be publicly available.

1. INTRODUCTION

Image restoration aims to recover the latent clear image from its given degraded version. It is a highly ill-posed and challenging problem, as there exist infinitely many feasible results for a single degraded image. Representative image restoration tasks include image de-raining, image de-hazing, low-light enhancement, guided image super-resolution, etc. In the past decades, numerous research efforts have been devoted to solving the single image restoration problem, which can be classified into two categories: traditional optimization methods and deep learning-based methods (Zhang et al., 2018; Ren et al., 2018; Zhang et al., 2018; Ren et al., 2016b; Fu et al., 2021; Zhang et al., 2020; Liu et al., 2021a). Traditional image restoration methods formulate the restoration process as an optimization problem and develop various image priors of the expected latent clear image to constrain the solution space, e.g., the dark channel prior for image de-hazing (Dark, 2009), the histogram distribution prior for underwater image enhancement (Li et al., 2016), the non-local mean prior for image de-noising (Dixit & Phadke, 2013), the sparse image prior for guided image super-resolution (Kim & Kwon, 2010), as well as the commonly-used local and non-local smoothness prior (Chen et al., 2013) and low-rank prior (Ren et al., 2016a). However, these image priors are difficult to develop, and the traditional methods involve iterative optimization, thus consuming huge computational resources and hindering their usage. In short, the consensus is to explore potential image priors to relieve the optimization difficulty of ill-posed image restoration. On the line of deep learning-based methods, convolutional neural networks (CNNs) have received widespread attention and achieved promising improvements over traditional methods in image restoration tasks (Liu et al., 2020; Ma et al., 2021; Zhang et al., 2021a; Zhou et al., 2021; 2022b).
More recently, transformer- and multi-layer perceptron (MLP)-based global modeling paradigms have swept the image restoration field and significantly surpassed CNN-based methods. Despite the remarkable advancement, they are arbitrarily applied to image restoration tasks while ignoring the intrinsic characteristics of the specific task. Their success may come at the huge cost of computational resources, limiting their practical applications, especially on resource-limited devices. We therefore wonder: "Can we provide a customized global modeling image restoration paradigm in a simple but effective and efficient manner?" To this end, motivated by our observations on the Fourier transform for image restoration tasks in Figure 1, we tailor a simple yet effective and efficient global modeling paradigm, which is orthogonal to previous studies and customized for image restoration. The core insights of our work are two-fold: 1) a general image restoration prior: the Fourier transform is capable of disentangling the image degradation and content components, serving as an image degradation prior embedded into the image restoration framework; 2) global modeling: the Fourier domain innately embraces the global property, where each pixel in Fourier space involves all spatial pixels. As shown in Figure 2, existing global modeling paradigms (e.g., transformer and MLP-Mixer) follow the de facto global modeling rule "spatial interaction + channel evolution". Similarly, we obey this rule but customize the core designs: Fourier spatial interaction and Fourier channel evolution. These designs differ from previous works and provide new insights on global modeling network structures for image restoration. Equipped with the above designs, our paradigm tailored for image restoration is described in Figure 3.
Extensive experiments are conducted on mainstream image restoration tasks including image de-raining, image enhancement, image de-hazing, and guided image super-resolution. Experimental results suggest that our paradigm achieves competitive performance with fewer computational resources. To emphasize, our main focus is not to beat previous frameworks but to provide an alternative customized global modeling-based image restoration framework with an efficient structure. Our contributions are summarized as follows: (1) We contribute the first global modeling paradigm for image restoration in a simple but effective and efficient manner. (2) We implicitly embed the Fourier-based general image degradation prior into our core structures, Fourier spatial modeling and Fourier channel evolution, which provides new insights into the design of global modeling-based image restoration networks. (3) Our proposed paradigm achieves competitive performance on several mainstream image restoration tasks with fewer computational resources.

2. RELATED WORK

Image restoration. Image restoration aims to restore an image degraded by factors such as rain, haze, noise, or low light to its clear counterpart, and has been studied for a long time. Traditional image restoration methods are usually formulated as optimization problems that incorporate specific priors of the latent clear image to constrain the solution space (Dark, 2009; Li et al., 2016; Dixit & Phadke, 2013; Kim & Kwon, 2010). For example, the dark channel prior (Dark, 2009) is proposed for image dehazing, and the histogram distribution prior (Li et al., 2016) is developed for underwater image enhancement. These methods involve iterative optimization, thus consuming huge computational resources and limiting their application. Recently, deep learning-based methods have achieved impressive performance in a data-driven manner; among them, most algorithms are designed with CNN-based architectures. Early works stack deep convolution layers to improve model representation ability, such as VDSR (Kim et al., 2016), DnCNN (Zhang et al., 2017), and ARCNN (Dong et al., 2015). Building on them, advanced methods have adopted more powerful architectural designs, such as residual blocks (Tai et al., 2017; Ehrlich & Davis, 2019) and dense blocks (Zhang et al., 2020; Dong et al., 2020). Besides, attention mechanisms (Zhang et al., 2018; 2021c) and multi-stage mechanisms (Zamir et al., 2021; Chen et al., 2021c) have been brought into image restoration algorithms to elevate performance. However, the locality of the convolution operation limits the perception of global information, which is critical for image restoration (Dixit & Phadke, 2013; Berman et al., 2016).
Global modeling. In recent years, global modeling techniques have gained much popularity in the computer vision community. One line of these methods is based on the transformer (Vaswani et al., 2017), which has been adapted to numerous vision tasks such as visual recognition (Liu et al., 2021b; Xia et al., 2022) and segmentation (Chen et al., 2021b; Cao et al., 2021). Different from CNN-based architectures, the transformer learns long-range dependencies between image patch sequences for global-aware modeling (Dosovitskiy et al., 2020). Owing to this characteristic, various transformer-based image restoration algorithms have been proposed in recent years, achieving superior performance in restoration tasks such as image dehazing (Chun-Le Guo, 2022), image deraining (Xiao et al., 2022), and low-light image enhancement (Xu et al., 2022). Among them, the pioneering work IPT directly applies vanilla transformers to image patches (Chen et al., 2021a), while Uformer (Wang et al., 2022b) and SwinIR (Liang et al., 2021) apply efficient window-based local attention models to several image restoration tasks. However, the huge computational cost and parameter count of transformer frameworks limit their practical application. As another line of global modeling paradigms, multi-layer perceptron (MLP)-based methods have attracted attention in vision problems (Tolstikhin et al., 2021). To adapt this architecture to image restoration, MAXIM adopts a multi-axis MLP-based mechanism to perceive information with a global receptive field (Tu et al., 2022b). Nevertheless, it still costs enormous computational resources and is thus hard to deploy on compact devices. In total, none of the above architectures fully explores priors specific to image restoration tasks, which are important for lifting performance. Recently, the Fourier transform has demonstrated its effectiveness for global modeling (Chi et al., 2019; 2020).
Instead of further exploring the efficacy of Fourier-based global modeling in high-level tasks such as image classification, video action classification, and human keypoint detection (Chi et al., 2019), our work is the first to focus on customized image restoration framework designs. The work in Chi et al. (2019) pays more attention to the global property, whereas our framework further exploits the intrinsic prior tailored for image restoration. In addition, different from existing Fourier techniques Chi et al. (2020) that emphasize the micro basic operator with a global receptive field, our work focuses on the macro framework design: we incorporate the restoration prior with the Fourier transform to conduct effective global modeling that is efficient for practical application. Different from existing transformer-based Wang et al. (2022a); Zamir et al. (2022) and MLP-based methods Tu et al. (2022a), which do not encode intrinsic knowledge about image restoration tasks and focus only roughly on global operator designs, our framework is the first to explore a customized global modeling paradigm for image restoration. Unlike these works that only consider global modeling, our efficient structure also meets the requirement of image restoration on edge devices with limited computational resources. In short, our framework incorporates the advantages of both the global modeling mechanism and the general image degradation prior introduced by the Fourier transform, thus achieving better performance.

3. METHOD

In this section, we first revisit the properties of the Fourier transform for images and then present an overview of the proposed global modeling paradigm, as illustrated in Figure 3. We further provide details of the fundamental building block of our method. Finally, we delve into the new loss functions proposed in our work.

3.1. PRELIMINARY OF FOURIER TRANSFORMATION FOR IMAGE

As recognized, the Fourier transform is widely used to analyze the frequency content of images. For images with multiple color channels, the Fourier transform is computed on each channel separately; for simplicity, we omit the channel notation in the formulas. Given an image $x \in \mathbb{R}^{H \times W \times C}$, the Fourier transform $\mathcal{F}$ converts it to Fourier space as the complex component $\mathcal{F}(x)$, which is expressed as
$$\mathcal{F}(x)(u, v) = \frac{1}{\sqrt{HW}} \sum_{h=0}^{H-1} \sum_{w=0}^{W-1} x(h, w)\, e^{-j2\pi\left(\frac{h}{H}u + \frac{w}{W}v\right)},$$
and $\mathcal{F}^{-1}(x)$ defines the inverse Fourier transform accordingly. Both the Fourier transform and its inverse can be efficiently implemented by FFT/IFFT algorithms (Frigo & Johnson, 1998). The amplitude component $\mathcal{A}(x)(u, v)$ and phase component $\mathcal{P}(x)(u, v)$ are expressed as
$$\mathcal{A}(x)(u, v) = \sqrt{\mathcal{R}^2(x)(u, v) + \mathcal{I}^2(x)(u, v)}, \quad \mathcal{P}(x)(u, v) = \arctan\!\left[\frac{\mathcal{I}(x)(u, v)}{\mathcal{R}(x)(u, v)}\right],$$
where $\mathcal{R}(x)$ and $\mathcal{I}(x)$ represent the real and imaginary parts respectively. Note that the Fourier transform and its inverse are computed independently on each channel of the feature maps. Targeting image restoration, we employ the Fourier transform to conduct a detailed frequency analysis by revisiting the properties of the phase and amplitude components, as shown in Figure 1. It can be observed that the degradation effect is transferred (mainly in the amplitude component) when swapping the amplitude and phase components of a degraded image and its clear version. This phenomenon indicates that the Fourier transform is capable of disentangling the image degradation and content components, and that the degradation mainly lies in the amplitude component. This motivates us to leverage the Fourier transform as an image degradation prior embedded into the image restoration framework.
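As a concrete check of these definitions, the amplitude/phase decomposition and its exact inverse can be sketched in a few lines of NumPy (an illustrative sketch, not the authors' code; `norm="ortho"` supplies the $1/\sqrt{HW}$ factor in the formula above, and `np.angle` uses atan2 for a quadrant-correct phase):

```python
import numpy as np

def fourier_amp_phase(x):
    """Decompose a single-channel image into Fourier amplitude and phase.

    Mirrors A(x) = sqrt(R^2 + I^2) and P(x) = arctan(I / R) from the text.
    """
    f = np.fft.fft2(x, norm="ortho")  # "ortho" gives the 1/sqrt(HW) scaling
    return np.abs(f), np.angle(f)

def recompose(amp, pha):
    """Rebuild the image from amplitude and phase via the inverse FFT."""
    f = amp * np.exp(1j * pha)
    return np.fft.ifft2(f, norm="ortho").real

x = np.random.rand(8, 8)
amp, pha = fourier_amp_phase(x)
assert np.allclose(recompose(amp, pha), x)  # the decomposition is lossless
```

For multi-channel images, the same pair of calls is simply applied per channel, matching the channel-independent treatment described above.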

3.2. FRAMEWORK

Structure flow. Our main goal is to develop a simple but effective and efficient global modeling paradigm for image restoration within a U-shaped hierarchical architecture, detailed in Figure 3.

Optimization flow. Besides the network designs for image restoration, we also introduce a new loss function that enables better optimization, thus reconstructing more pleasing results in both the spatial and frequency domains. In detail, it consists of two parts: a spatial domain loss and a frequency domain loss. In contrast to existing methods that usually adopt pixel-level losses with local guidance in the spatial domain, we additionally propose a frequency domain supervision loss, computed via the Fourier transform on the global frequency components. Motivated by the spectral convolution theorem, directly emphasizing the frequency content better reconstructs global information, thus improving restoration performance. Let $H_O$ and $GT$ denote the network output and the corresponding ground truth respectively. We propose a joint spatial-frequency domain loss for supervising the network training. In the spatial domain, we adopt the L1 loss
$$\mathcal{L}_{spa} = \|H_O - GT\|_1. \quad (3)$$
In the frequency domain, we first employ the DFT to convert $H_O$ and $GT$ into Fourier space, where the amplitude and phase components are calculated. Then, the L1 norms of the amplitude difference and phase difference between $H_O$ and $GT$ are summed to produce the total frequency loss
$$\mathcal{L}_{fre} = \|\mathcal{A}(H_O) - \mathcal{A}(GT)\|_1 + \|\mathcal{P}(H_O) - \mathcal{P}(GT)\|_1.$$
Finally, the overall loss function is formulated as
$$\mathcal{L} = \mathcal{L}_{spa} + \lambda \mathcal{L}_{fre},$$
where $\lambda$ is a weight factor set to 0.1 empirically.

Figure 4: Details of the Fourier Prior Embedded Block. Our block follows the global modeling rule "spatial interaction + channel evolution" but with new designs: Fourier spatial interaction modeling and Fourier channel evolution.
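The joint objective above can be sketched as follows (a NumPy sketch for a single channel; the function name and the mean reduction standing in for the L1 norm are our assumptions, not the authors' implementation):

```python
import numpy as np

def spatial_frequency_loss(out, gt, lam=0.1):
    """Joint loss sketch: L = L_spa + lambda * L_fre.

    L_spa is the L1 distance in the spatial domain; L_fre sums the L1
    distances of the amplitude and phase spectra. Assumes single-channel
    arrays; the paper applies the loss per channel.
    """
    l_spa = np.mean(np.abs(out - gt))
    f_out, f_gt = np.fft.fft2(out), np.fft.fft2(gt)
    l_amp = np.mean(np.abs(np.abs(f_out) - np.abs(f_gt)))     # amplitude term
    l_pha = np.mean(np.abs(np.angle(f_out) - np.angle(f_gt)))  # phase term
    return l_spa + lam * (l_amp + l_pha)
```

By construction the loss is zero when the output equals the ground truth, and the `lam=0.1` default mirrors the empirical weight factor stated above.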

3.3. FOURIER PRIOR EMBEDDED BLOCK

As shown in Figure 4, the fundamental building block, dubbed the Fourier prior embedded block, contains two key elements: (a) Fourier spatial interaction and (b) Fourier channel evolution.

Fourier spatial interaction. For multi-channel feature maps, the Fourier transform is performed independently over each channel. The Fourier prior embedded block takes the feature maps as input and performs the Fourier transform to convert the spatial features into real and imaginary components. Suppose the features are denoted $X \in \mathbb{R}^{H \times W \times B}$; the corresponding Fourier transform is expressed as
$$X_I^{(b)}, X_R^{(b)} = \mathcal{F}(X^{(b)}), \quad b = 1, \dots, B,$$
where $X_I^{(b)}$ and $X_R^{(b)}$ indicate the imaginary and real parts respectively. We then perform the spatial interaction with a stack of depth-wise convolutions with kernel size $3 \times 3$ and the ReLU function, where all positions within a channel share the common depth-wise operator while different channels are processed independently:
$$S_I^{(b)} = \sigma(DW^{(b)}(X_I^{(b)})), \quad S_R^{(b)} = \sigma(DW^{(b)}(X_R^{(b)})),$$
where $\sigma$ and $DW$ indicate the ReLU function and depth-wise convolution respectively. Next, we apply the inverse DFT to transform the filtered frequency components $S_I^{(b)}$ and $S_R^{(b)}$ back to the spatial domain. According to the spectral convolution theorem in Fourier theory, processing information in Fourier space captures the global frequency representation. Finally, we merge the Fourier spatially interacted feature $X_S$ by concatenating each component $X_S^{(b)}$ with the spatial features processed by the half-instance normalization block, thus generating the output $S_X$.

Fourier channel evolution. Following the spatial interaction, Fourier channel evolution performs point-wise channel interaction. Similarly, we first transform the previous step output $S_X$ into real and imaginary components $C_R$ and $C_I$ and then employ a stack of convolutions with kernel size $1 \times 1$ and the ReLU function for channel interaction, where the operator is shared across every position in frequency space:
$$CX_I = \sigma(conv(cat[C_I^1, \dots, C_I^B])), \quad CX_R = \sigma(conv(cat[C_R^1, \dots, C_R^B])),$$
where $conv$ indicates the convolution with kernel size $1 \times 1$. Next, we apply the inverse DFT to transform the filtered frequency components $CX_I$ and $CX_R$ back to the spatial domain. Finally, we perform a merging process similar to the first step, thus achieving global modeling along both the spatial and channel dimensions.
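The two elements can be sketched in NumPy as follows (an illustrative sketch only: the naive 3x3 filter standing in for the depth-wise convolution, the weight shapes, and all function names are our assumptions; the half-instance normalization branch and the merging step are omitted):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dw3x3(x, k):
    """Naive 'same' 3x3 filtering of one channel (stand-in for DW conv)."""
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
    return out

def fourier_spatial_interaction(feats, kernels):
    """Per channel: FFT -> DW filter + ReLU on real/imag parts -> inverse FFT."""
    out = np.empty_like(feats)
    for b in range(feats.shape[-1]):
        f = np.fft.fft2(feats[..., b])
        s_r = relu(dw3x3(f.real, kernels[b]))   # each channel has its own kernel
        s_i = relu(dw3x3(f.imag, kernels[b]))
        out[..., b] = np.fft.ifft2(s_r + 1j * s_i).real
    return out

def fourier_channel_evolution(feats, w_r, w_i):
    """1x1 conv (a B x B matrix over channels) + ReLU on real/imag parts."""
    f = np.fft.fft2(feats, axes=(0, 1))
    c_r = relu(f.real @ w_r)  # (H, W, B) @ (B, B): shared at every position
    c_i = relu(f.imag @ w_i)
    return np.fft.ifft2(c_r + 1j * c_i, axes=(0, 1)).real
```

The sketch makes the key contrast concrete: the spatial interaction applies one operator per channel across all frequency positions, while the channel evolution applies one matrix across channels at every position.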

4. EXPERIMENT

To demonstrate the efficacy of our proposed customized image restoration paradigm, we conduct extensive experiments on multiple computer vision tasks, including image de-raining, image enhancement, image dehazing, and guided image super-resolution. More results can be found in the Appendix.

4.1. EXPERIMENTAL SETTINGS

Low-light image enhancement. We evaluate our paradigm on two popular benchmarks: LOL (Chen Wei, 2018) and Huawei (Hai et al., 2021). The LOL dataset consists of 500 low-/normal-light image pairs, of which we use 485 for training and 15 for testing. The Huawei dataset contains 2480 paired images, of which we use 2200 for training and 280 for testing. Further, we compare our paradigm with the following state-of-the-art low-light image enhancement methods: SRIE (Fu et al., 2016), RetinexNet (Chen Wei, 2018), MBLLEN (Lv et al., 2018), EnlightenGAN (Jiang et al., 2021), GLADNet (Wang et al., 2018), Xu et al. (Xu et al., 2020), TBEFN (Lu & Zhang, 2020), KinD (Zhang et al., 2019), Zero-DCE++ (Li et al., 2021), DRBN (Yang et al., 2020), RetinexDIP (Zhao et al., 2021), RUAS (Liu et al., 2021a), KinD++ (Zhang et al., 2021b), and URetinex (Wu et al., 2022). Image de-raining. Following the work (Zamir et al., 2021), our proposed paradigm is trained on 13,712 clean-rain image pairs gathered from multiple synthetic datasets. With this single trained model, we perform evaluation on Rain100H and Rain100L. Further, we report the performance comparison between our designed paradigm and several representative state-of-the-art methods: DerainNet (Yang et al., 2017b), SEMI (Wei et al., 2019), DIDMDN (Zhang & Patel, 2018), UMRL (Yasarla & Patel, 2019), RESCAN (Li et al., 2018b), PReNet (Ren et al., 2019), MSPFN (Jiang et al., 2020), MPRNet (Zamir et al., 2021), and HINet (Chen et al., 2021c). Image dehazing. We evaluate the proposed method on synthetic and real-world datasets. For synthetic scenes, we employ the RESIDE dataset (Li et al., 2018a). Its Indoor Training Set (ITS) contains a total of 13,990 hazy indoor images generated from 1399 clear images, and its Synthetic Objective Testing Set (SOTS) consists of 500 indoor hazy images and 500 outdoor ones.
In addition, we adopt two real-world datasets, Dense-Haze (Ancuti et al., 2019) and NH-HAZE (Ancuti et al., 2020), to evaluate generalization; each consists of 55 paired images. We compare our paradigm with the promising methods DCP (He et al., 2010), DehazeNet (Cai et al., 2016), AOD-Net (Li et al., 2017), GridDehazeNet (Liu et al., 2019), FFA-Net (Qin et al., 2020), MSBDN (Dong et al., 2020), and AECR-Net (Wu et al., 2021). Guided image super-resolution. Following (Zhou et al., 2022a; Yan et al., 2022), we adopt pan-sharpening, a representative guided image super-resolution task, for evaluation. The WorldView II, WorldView III, and GaoFen2 datasets from (Zhou et al., 2022a; Yan et al., 2022) are used. To verify the effectiveness of our paradigm, we choose the following representative pan-sharpening methods for comparison: 1) six state-of-the-art deep learning-based methods, including PNN (Masi et al., 2016), PANNET (Yang et al., 2017a), MSDCNN (Yuan et al., 2018), SRPPNN (Cai & Huang, 2021), GPPNN (Xu et al., 2021b), and INNformer (Zhou et al., 2022a); 2) five promising traditional methods, namely SFIM (Liu., 2000), Brovey (Gillespie et al., 1987), GS (Laben & Brower, 2000), IHS (Haydn et al., 1982), and GFPCA (Liao et al., 2017). Several widely-used image quality assessment (IQA) metrics are employed to evaluate performance, including the relative dimensionless global error in synthesis (ERGAS) (Alparone et al., 2007), the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and the spectral angle mapper (SAM) (J. R. H. Yuhas & Boardman, 1992).
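For reference, the two less common metrics, SAM and ERGAS, can be computed from their textbook definitions as below (a sketch; the exact reductions and the 1/4 resolution ratio are assumptions that may differ from the cited evaluation protocols):

```python
import numpy as np

def sam(pred, gt, eps=1e-8):
    """Spectral angle mapper in degrees, averaged over pixels.

    pred, gt: (H, W, C) arrays. Per pixel, the angle between the two
    spectral vectors; lower is better.
    """
    dot = np.sum(pred * gt, axis=-1)
    norm = np.linalg.norm(pred, axis=-1) * np.linalg.norm(gt, axis=-1)
    angle = np.arccos(np.clip(dot / (norm + eps), -1.0, 1.0))
    return float(np.degrees(angle).mean())

def ergas(pred, gt, ratio=0.25):
    """Relative dimensionless global error in synthesis; lower is better.

    ratio: low- to high-resolution ratio (assumed 1/4, typical for
    pan-sharpening).
    """
    rmse = np.sqrt(np.mean((pred - gt) ** 2, axis=(0, 1)))
    mean = np.mean(gt, axis=(0, 1))
    return float(100.0 * ratio * np.sqrt(np.mean((rmse / mean) ** 2)))
```

Both metrics vanish when the prediction matches the ground truth, which makes them convenient sanity checks alongside PSNR and SSIM.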

4.2. COMPARISON AND ANALYSIS

We perform quantitative performance comparisons on the mainstream image restoration tasks in Table 1, Table 2, Table 3, and Table 4, where the best results are highlighted in bold. From the results, it can be observed that our proposed paradigm achieves competitively promising performance with a lower computational burden against the baselines across all testing datasets on mainstream tasks, suggesting the effectiveness of our designs. For example, for pan-sharpening, our paradigm obtains 0.17dB, 0.18dB, and 0.06dB PSNR gains over the state-of-the-art method on the WorldView-II, WorldView-III, and GaoFen2 datasets, respectively. In addition, for image enhancement, our paradigm achieves results comparable to the transformer-based SNRformer with a huge reduction in model parameters and FLOPs. Consistent conclusions can be drawn for the other tasks.

4.3. ABLATION STUDIES

To investigate the contribution of the key components, we conduct comprehensive ablation studies on the WorldView-II satellite dataset of the pan-sharpening task in terms of the number of network architecture stages and the frequency loss function. More ablation studies can be found in the Appendix. Impact of the hierarchical number. To explore the impact of the hierarchical number, i.e., the number of down-sampling stages K in our U-shaped network, we experiment with the proposed network under varying K. The quantitative comparison for K from 1 to 4 is reported in Table 5. The results show that model performance obtains considerable improvements at the cost of computation (i.e., a larger hierarchical number). To balance performance and computational complexity, we set K = 4 as the default setting for pan-sharpening in this paper. Effectiveness of the frequency loss. The new frequency loss directly emphasizes the optimization of global frequency information. In Table 6, we remove it to examine its effectiveness. The results demonstrate that removing it degrades all metrics dramatically, indicating its significant role in our network.

5. LIMITATIONS

First, more comprehensive experiments on broader computer vision tasks (e.g., image de-noising and image de-blurring) have not been explored. Second, our proposed global modeling paradigm still follows the underlying rule "spatial interaction + channel evolution" of previous transformer-based or MLP-like architectures for general vision tasks. This de facto global modeling rule may be suboptimal for image restoration and thus needs to be further investigated. In addition, our proposed paradigm has not achieved the best performance. Note that the objective of our work is orthogonal to previous studies: we tailor a simple yet effective and efficient global modeling paradigm for image restoration. We hope this work will spark further research into customized global modeling image restoration frameworks, thus promoting practical application.

6. CONCLUSION

In this paper, we propose a theoretically grounded global modeling paradigm for image restoration. We revisit the existing global modeling paradigms for general vision tasks and identify the underlying design rule "spatial interaction + channel evolution". In addition, we revisit the inborn characteristics of the Fourier prior for image restoration and find its prevailing property of decomposing the image degradation and content components. Based on the above analysis, we customize the core designs: Fourier spatial modeling and Fourier channel evolution. Equipped with these designs, our image restoration paradigm is verified on mainstream image restoration tasks and achieves competitive performance with fewer computational resources.

B DISCUSSION ON THE REASON OF OUR EFFECTIVENESS

Image restoration is essentially an ill-posed optimization problem. For traditional image restoration algorithms, the common practice is to explore intrinsic knowledge and image priors to constrain the solution space and thus obtain a good solution. Besides, the effectiveness of global modeling for image restoration has been demonstrated in existing works. Our proposed framework incorporates the advantages of both global modeling and the general image degradation prior introduced by the Fourier transform, thus achieving better performance. Some recent works Dai et al. (2022); Yu et al. (2022) have confirmed that "spatial interaction + channel evolution" is the core contributor to the effectiveness of the transformer. Our work builds on this principle with new designs in Fourier space, thus achieving better results. Image restoration aims to remove the degradation effect and restore a clear image; it can be treated as an image filtering process. In our work, we conduct extensive analysis in Fourier space and infer that the Fourier transform is capable of disentangling the image degradation and content components and that the degradation mainly lies in the amplitude component. To this end, our method first transforms the spatial representation into Fourier space as amplitude and phase, and then employs convolutions to perform the filtering over the amplitude and phase, thus achieving clear reconstruction. The Fourier prior is embedded in this procedure and follows the principle of frequency filtering that is common in digital image processing, which further yields performance gains. The proposed general image degradation prior achieves the disentanglement of degradation and content, which alleviates the difficulty of network optimization.

C MORE COMPARISONS

In this section, we provide more visual comparisons with state-of-the-art methods over the reported tasks. As can be seen in Fig. 9, Fig. 10, Fig. 13, Fig. 14, Fig. 11 and Fig. 12 , our proposed method achieves the best performance against other state-of-the-art algorithms. 



Figure 1: Motivations. Analysis of the discrete Fourier transform (DFT) over mainstream image restoration tasks. In (a) and (d), we respectively swap the amplitude and phase components of a degraded image and its clear version. It can be observed that the degradation effect is transferred, indicating that the Fourier transform is capable of disentangling the image degradation and content components and that the degradation mainly lies in the amplitude component. To further verify our observation, we also swap the amplitude and phase components of a degraded image and an irrelevant image in (b). The degradation is still mainly related to the amplitude component, such as the darkness for image enhancement. Similarly, a low-resolution image and its high-resolution counterpart differ in the amplitude component in (c). These observations motivate us to leverage the Fourier transform as an image degradation prior embedded into the image restoration framework. More analysis and results can be found in the Appendix.
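The swapping experiment described in this caption can be reproduced with a few lines of NumPy (an illustrative sketch; for color images the recombination is applied per channel):

```python
import numpy as np

def swap_amplitude(a, b):
    """Recombine the amplitude of image `a` with the phase of image `b`.

    If `a` is degraded and `b` is clear, the result inherits the
    degradation (carried mainly by the amplitude) on the content of `b`.
    """
    fa, fb = np.fft.fft2(a), np.fft.fft2(b)
    mixed = np.abs(fa) * np.exp(1j * np.angle(fb))
    return np.fft.ifft2(mixed).real
```

As a sanity check, recombining an image's amplitude with its own phase returns the original image exactly.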

Figure 2: The underlying rule of existing global model paradigm: spatial interaction + channel evolution.

Figure 3: Overview of the proposed customized global modeling paradigm for image restoration.


Figure 6: Analysis of the discrete Fourier transform (DFT) for the low-light image enhancement task. In detail, we swap the amplitude and phase components of the degraded image and a clear version with the same or different contents. It can be observed that the degradation effect is transferred along with the amplitude component, indicating that the Fourier transform is capable of disentangling the degradation and content components and that the degradation mainly lies in the amplitude component. This motivates us to leverage the Fourier transform as an image degradation prior embedded into the image restoration framework.

Figure 8: Analysis of the discrete Fourier transform (DFT) for the image de-raining task. In detail, we swap the amplitude and phase components of the degraded image and the clear version with or without the same content. It can be observed that the degradation effect is transferred along with the amplitude component, indicating that the Fourier transform is capable of disentangling the image degradation and content components and that the degradation mainly lies in the amplitude component.

Figure 9: The visual comparison on image de-hazing task.

Figure 10: The visual comparison on the guided image super-resolution task.

Figure 11: The visual comparison on image enhancement task (LOL dataset).

Figure 12: The visual comparison over low-light image enhancement task (LOL dataset).

Figure 13: The visual comparison over low-light image enhancement task (Huawei dataset).

Figure 14: The visual comparison over low-light image enhancement task (Huawei dataset).

Quantitative comparison of image de-hazing.

Quantitative comparison of image de-raining.

Quantitative comparison of image enhancement.

Quantitative comparison of guided image super-resolution.

Ablation studies for the hierarchical number K.

Ablation studies for frequency loss.

Appendix

In this appendix, we provide additional details and results. In Sec. A, we present further discussion on our motivation. In Sec. B, we present the discussion on the reason of our effectiveness. In Sec. C, we show more comparison results between our method and existing methods on multiple image tasks.

A MOTIVATION

Referring to previous works. As pointed out in Oppenheim et al. (1979), the motivation comes from a well-known property of the Fourier transform: the Fourier phase spectrum preserves high-level semantics, while the amplitude spectrum contains low-level features. From Xu et al. (2021a), the amplitude and phase components of Fourier space correspond to the style and semantic information of an image, respectively. To further validate our observation, as shown in Fig. 7, we apply the inverse Fast Fourier Transform (iFFT) to the phase and amplitude components to visualize them in the spatial domain. The appearance of the phase representation is more similar to the structure representation, and the distribution of the phase component is less affected by lightness. Hence, the phase component is more related to structures that are less affected by lightness in the spatial domain.

Motivation on image de-raining. Recall the property of the Fourier transform noted above: the phase spectrum preserves high-level semantics, while the amplitude spectrum contains low-level features (Oppenheim et al., 1979). Fig. 8 shows the results of swapping the Fourier amplitude and phase spectra of rainy/clean images. For images with or without the same content, most rain streak information is preserved in the amplitude spectrum of rainy images. This indicates that the phase of rainy images keeps similar background structures as the ground truth. In this way, the Fourier prior is exploited by learning the transformations of the amplitude and phase spectra separately.

