GAN2GAN: GENERATIVE NOISE LEARNING FOR BLIND DENOISING WITH SINGLE NOISY IMAGES

Abstract

We tackle a challenging blind image denoising problem, in which only single distinct noisy images are available for training a denoiser, and no information about the noise is known except that it is zero-mean, additive, and independent of the clean image. In such a setting, which often occurs in practice, it is not possible to train a denoiser with the standard discriminative training or with the recently developed Noise2Noise (N2N) training; the former requires the underlying clean image for each given noisy image, and the latter requires two independently realized noisy images of each clean image. To that end, we propose the GAN2GAN (Generated-Artificial-Noise to Generated-Artificial-Noise) method, which first learns a generative model that can 1) simulate the noise in the given noisy images and 2) generate rough, noisy estimates of the clean images, and then 3) iteratively trains a denoiser with noisy image pairs synthesized by this generative model (as in N2N). In our experiments, the denoiser trained with GAN2GAN achieves an impressive denoising performance on both synthetic and real-world datasets in the blind denoising setting; it nearly matches the performance of the standard discriminatively-trained or N2N-trained models that have access to more information than ours, and it significantly outperforms a recent baseline for the same setting, Noise2Void, as well as a more conventional yet strong one, BM3D.

1. INTRODUCTION

Image denoising is one of the oldest problems in image processing and low-level computer vision, yet it still attracts much attention due to its fundamental nature. A vast number of algorithms have been proposed over the past several decades, and recently, CNN-based methods, e.g., Cha & Moon (2019); Zhang et al. (2017); Tai et al. (2017); Liu et al. (2018), have come to dominate in terms of PSNR performance. Most CNN-based denoisers apply the discriminative learning framework to (clean, noisy) image pairs under an assumed, known noise distribution. While effective, this framework has a couple of limitations that become critical in practice: the assumed noise distribution may be mismatched to the actual noise in the data, and obtaining noise-free clean target images is not always possible or can be very expensive, e.g., in medical imaging (CT or MRI) or astrophotography. Several attempts have been made to resolve the above issues. For the noise uncertainty, so-called blind training has been proposed: a denoiser is trained with a composite training set containing images corrupted with multiple, pre-defined noise levels or distributions, and such blindly trained denoisers, e.g., DnCNN-B in Zhang et al. (2017), were shown to alleviate mismatch scenarios to some extent. However, the second limitation, i.e., the requirement of clean images for building the training set, still remains. To address it, Lehtinen et al. (2018) recently proposed the Noise2Noise (N2N) method, showing that a denoiser can be trained with negligible performance loss and without clean target images, as long as two independent noisy realizations of each underlying clean image are available.
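The key observation behind N2N can be illustrated with a toy regression (a hedged sketch of the idea, not the paper's experiment): under squared loss, replacing a clean target x with a second noisy realization y2 = x + n2, where n2 is zero-mean and independent of the input, leaves the minimizer essentially unchanged, since E[y2 | y1] = E[x | y1].

```python
import numpy as np

rng = np.random.default_rng(0)

# One million scalar "pixels": clean values plus two independent
# zero-mean Gaussian noise realizations of the same clean signal.
clean = rng.uniform(0.0, 1.0, size=1_000_000)
noisy1 = clean + rng.normal(0.0, 0.3, size=clean.shape)
noisy2 = clean + rng.normal(0.0, 0.3, size=clean.shape)

# Fit the simplest possible "denoiser", an affine map a*y + b, by least
# squares: once against the clean targets, once against the second noisy
# realization (the N2N surrogate target).
A = np.stack([noisy1, np.ones_like(noisy1)], axis=1)
w_clean, *_ = np.linalg.lstsq(A, clean, rcond=None)
w_n2n, *_ = np.linalg.lstsq(A, noisy2, rcond=None)

# Up to sampling error, both targets yield the same minimizer.
print(np.allclose(w_clean, w_n2n, atol=1e-2))  # True
```

The same argument carries over from this affine map to a CNN denoiser, which is why N2N incurs only a negligible performance loss relative to training on clean targets.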
Despite its effectiveness, N2N's requirement of two independently realized noisy images per clean image, which are hardly available in practice, is a critical limiting factor. In this paper, we consider a setting in which neither of the above approaches is applicable, namely, the purely unsupervised blind denoising setting where only single distinct noisy images are available for training. That is, nothing is known about the noise other than it being zero-mean, additive, and independent of the clean image, and neither the clean target images for blind training nor the noisy image pairs for N2N training are available. While some recent work, e.g., Krull et al. (2019); Batson & Royer (2019); Laine et al. (2019), took a self-supervised learning (SSL) approach for the same setting, we take a generative learning approach. The crux of our method is to first learn a Wasserstein GAN (Arjovsky et al., 2017)-based generative model that can 1) learn and simulate the noise in the given noisy images and 2) generate rough, initially denoised images. Using this generative model, we then synthesize noisy image pairs by corrupting each initially denoised image twice with the simulated noise and use them to train a CNN denoiser as in N2N training (i.e., Noisy N2N). We further show that iterative N2N training with refined denoised images significantly improves the final denoising performance. We dub our method GAN2GAN (Generated-Artificial-Noise to Generated-Artificial-Noise) and show that the denoiser trained with it can match (and sometimes even outperform) the performance of standard supervised-trained or N2N-trained blind denoisers in the white Gaussian noise case. Furthermore, for mixture/correlated noise or the real-world noise in microscopy/CT images, whose exact distributions are hard to know a priori, our denoiser significantly outperforms those standard blind denoisers, which are mismatch-trained with white Gaussian noise, as well as other baselines that operate under the same conditions as ours: the SSL baseline N2V (Krull et al., 2019) and the more conventional BM3D (Dabov et al., 2007).
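The iterative loop just described can be sketched end-to-end on toy 1-D data. This is a minimal stand-in, not the paper's implementation: `sample_noise` plays the role of the learned W-GAN noise generator (here just a Gaussian sampler), `rough_estimate` the rough initial denoiser (a moving average), and `train_denoiser` the N2N training step (an affine least-squares fit); all three names and their bodies are hypothetical stubs.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_noise(shape):
    # Stand-in for the learned noise generator (hypothetical stub):
    # here it just draws i.i.d. zero-mean Gaussian noise.
    return rng.normal(0.0, 0.2, size=shape)

def rough_estimate(noisy):
    # Stand-in for the rough initial denoiser (hypothetical stub):
    # a crude 3-tap moving average over a 1-D "image".
    return np.convolve(noisy, np.ones(3) / 3.0, mode="same")

def train_denoiser(pairs):
    # Stand-in for N2N training: fit a single affine map a*y + b by
    # least squares over all synthesized (input, target) pairs.
    xs = np.concatenate([p[0] for p in pairs])
    ys = np.concatenate([p[1] for p in pairs])
    A = np.stack([xs, np.ones_like(xs)], axis=1)
    w, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return lambda y: w[0] * y + w[1]

def gan2gan_iterations(noisy_images, n_iters=3):
    denoise = rough_estimate                   # start from rough estimates
    for _ in range(n_iters):
        # 1) refine the clean-image estimates with the current denoiser
        estimates = [denoise(y) for y in noisy_images]
        # 2) corrupt each estimate twice with simulated noise -> a pair
        pairs = [(e + sample_noise(e.shape), e + sample_noise(e.shape))
                 for e in estimates]
        # 3) N2N-train the next denoiser on the synthetic pairs
        denoise = train_denoiser(pairs)
    return denoise

# Toy usage: 32 one-dimensional "noisy images" of length 64.
noisy = [rng.uniform(0.0, 1.0, 64) + rng.normal(0.0, 0.2, 64)
         for _ in range(32)]
denoiser = gan2gan_iterations(noisy)
```

In the actual method, the stubs are replaced by the learned generative model and a CNN denoiser; the structure of the loop, i.e., re-estimating the clean images with the current denoiser before synthesizing the next round of pairs, is what drives the iterative improvement.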

2. RELATED WORK

Several works have attempted to overcome the limitations of vanilla supervised-learning-based denoising. As mentioned above, Noise2Self (N2S) (Batson & Royer, 2019) and Noise2Void (N2V) (Krull et al., 2019) recently applied a self-supervised learning (SSL) approach to train a denoiser with only single noisy images. Their settings exactly coincide with ours, but we show later that our GAN2GAN significantly outperforms them. More recently, Laine et al. (2019) improved N2V by incorporating specific noise likelihood models within a Bayesian framework; however, their method requires knowing the exact noise model and cannot be applied to more general, unknown noise settings. Similarly, Soltanayev & Chun (2018) proposed a SURE (Stein's Unbiased Risk Estimator)-based denoiser that can also be trained with single noisy images, but it works only with Gaussian noise. Their work was extended in Zhussip et al. (2019), but it requires noisy image pairs as in N2N as well as the Gaussian noise constraint. Chen et al. (2018) devised the GCBD method, which learns and generates the noise in the given noisy images using W-GAN (Arjovsky et al., 2017) and utilizes unpaired clean images to build a supervised training set. Our GAN2GAN is related to Chen et al. (2018), but we significantly improve their noise learning step and do not use clean data at all. Additionally, there are recently published papers on blind image denoising, but these also differ from ours. Anwar & Barnes (2019); Zhang et al. (2018) suggest effective CNN architectures for denoising; however, they only consider the setting in which clean images are available for training. Zamir et al. (2020) consider the denoising of specific camera settings, and their method also requires clean sRGB images as well as knowledge of the noise level; thus, it cannot be applied to a completely blind setting such as ours, in which no information on the specific noise distribution or clean images is available. Table 1 summarizes and compares the settings of the above-mentioned recent baselines; only our GAN2GAN and N2V do not utilize any "sidekicks" that the other methods use.




Table 1: Summary of different settings among the recent baselines.

