AN UNSUPERVISED DEEP LEARNING APPROACH FOR REAL-WORLD IMAGE DENOISING

Abstract

Designing an unsupervised image denoising approach in practical applications is a challenging task due to the complicated data acquisition process. In the realworld case, the noise distribution is so complex that the simplified additive white Gaussian (AWGN) assumption rarely holds, which significantly deteriorates the Gaussian denoisers' performance. To address this problem, we apply a deep neural network that maps the noisy image into a latent space in which the AWGN assumption holds, and thus any existing Gaussian denoiser is applicable. More specifically, the proposed neural network consists of the encoder-decoder structure and approximates the likelihood term in the Bayesian framework. Together with a Gaussian denoiser, the neural network can be trained with the input image itself and does not require any pre-training in other datasets. Extensive experiments on real-world noisy image datasets have shown that the combination of neural networks and Gaussian denoisers improves the performance of the original Gaussian denoisers by a large margin. In particular, the neural network+BM3D method significantly outperforms other unsupervised denoising approaches and is competitive with supervised networks such as DnCNN, FFDNet, and CBDNet.

1. INTRODUCTION

Noise always exists during the process of image acquisition and its removing is important for image recovery and vision tasks, e.g., segmentation and recognition. Specifically, the noisy image y is modeled as y = x + n, where x denotes the clean image, n denotes the corrupted noise and image denoising aims at recovering x from y. Over the past two decades, this problem has been extensively explored and many works have been proposed. Among these works, one typical kind of model assumes that the image is corrupted by additive white Gaussian noise (AWGN), i.e., n ∼ N (0, σ 2 I) where N (0, 1) is the standard Gaussian distribution. Representative Gaussian denoising approaches include block matching and 3D filtering (BM3D) (Dabov et al., 2007b) , non-local mean method (NLM) (Buades et al., 2005) , K-SVD (Aharon et al., 2006) and weighted nuclear norm minimization (WNNM) (Gu et al., 2014) , which perform well on AWGN noise removal. However, the AWGN assumption seldom holds in practical applications as the noise is accumulated during the whole imaging process. For example, in typical CCD or CMOS cameras, the noise depends on the underlying context (daytime or nighttime, static or dynamic, indoor or outdoor, etc.) and the camera settings (shutter speed, ISO, white balance, etc.). In Figure 1 , two real noisy images captured by Samsung Galaxy S6 Edge and Google Pixel smartphones are chosen from Smartphone Image Denoising Dataset (SIDD) (Abdelhamed et al., 2018) and three 40 × 40 patches are chosen for illustration of noisy distribution. It is clear that real noise distribution is content dependent and noise in each patch has different statistical properties which can be non-Gaussian. Due to the violation of the AWGN assumption, the performance of the Gaussian denoiser deteriorates significantly (Figure 1 (d) ). Thus, it is crucial to characterize the noise distribution and adapt the noise models to the denoiser in real-world image denoising. In recent years, deep learning based methods have achieved remarkable performance with careful architecture design, good training strategies, a large number of noisy and clean image pairs. However, there are two main drawbacks of these approaches from the perspective of practical applications. One is the high dependency on the quality and the size of the training dataset. et al., 2020) by developing a novel dropout technique for image denoising. Thus, unsupervised deep learning approaches with accurate noise models are important for solving real-world image denoising problems, yet current solutions are unsatisfactory. Such approach deserves to be studied and is a challenging problem as it needs a good combination of traditional methods and deep learning based methods such that the benefits of both methods are fully explored. 1.1 THE SUMMARY OF IDEAS AND CONTRIBUTIONS Motivated by the above analysis, the goal of this paper is to propose an unsupervised deep learning method that boosts the performance of existing Gaussian denoisers when solving real-world image denoising problems. The basic idea is to find a latent image z associated with the input noisy image y such that z|x satisfies the AWGN assumption, and thus we can obtain the clean image x



Figure 1: Two real noisy images. (a) Clean images. (b) Noisy Images. (c) Noisy distribution in red, green and yellow patches. (d) BM3D results (PSNR: 26.55 (top) and 29.41 (bottom)). (e) NN+BM3D (PSNR: 27.53 (top) and 30.05 (bottom)).

Therefore, to reduce the dependency of the training set, single-image based image denoising approaches deserved to be studied and have both practical and scientific value. It is worth mentioning that a recent unsupervised learning work(Ulyanov et al.,  2018)  uses a deep image prior to the general image recovery problem but its denoising results are inferior to some typical Gaussian denoisers, e.g., BM3D. The other drawback is the generalization ability of a trained network. When the noisy distribution is complicated and not contained in the training set, the results of the deep learning method can be deteriorated significantly, even worse than non-learning based methods. To alleviate this problem, some recent works are proposed by further consideration of noise estimation in the network design, e.g., Guo et al. (2019); Yue et al. (2019); Zhang et al. (2017). Despite their good performance in blind Gaussian denoising(Guo et al.,  2019; Zhang et al., 2017)  and real-world denoising problem(Yue et al., 2019), a large number of noisy and clean image pairs are needed and the generalization problem remains when the imaging system is complicated. Very recently, a single image based method has been proposed in (Quan

