GENERATIVE RECORRUPTED-TO-RECORRUPTED: AN UNSUPERVISED IMAGE DENOISING NETWORK FOR ARBITRARY NOISE DISTRIBUTION

Anonymous

Abstract

With the great breakthroughs of supervised learning in the field of denoising, more and more works focus on training denoisers end to end. Such methods are effective only when sufficient training data is available, but in practice it is particularly difficult to obtain clean labels for the training data. To this end, a number of unsupervised denoisers have emerged in recent years; however, these methods are effective only when the noise model is known in advance, which limits the practical use of unsupervised denoising. In addition, inaccurate noise priors produced by noise estimation algorithms lead to low denoising accuracy. We therefore design a more practical denoiser that requires neither clean images as training labels nor assumptions on the noise model. Our method still relies on a noise model; the difference is that the model is generated from a residual image and a random mask during network training. The input and target of the network are then generated from a single noisy image and this noise model, and an unsupervised module and a pseudo-supervised module are trained at the same time. Extensive experiments demonstrate the effectiveness of our framework, which can even surpass the accuracy of supervised denoising.

1. INTRODUCTION

Image denoising is a traditional topic in the field of image processing, and it is the basis for the success of many other vision tasks. A noisy image can be represented by y = x + n, where x is the clean image and n is the noise, and the task is to design a denoiser that removes the noise from the noisy image. The denoising convolutional neural network DNCNN Zhang et al. (2017) can be considered a benchmark for the use of deep learning in image denoising; it introduced residual learning and batch normalization, which speed up the training process as well as boost the denoising performance. The fast and flexible denoising neural network FFDNET Zhang K. & Zhang (2018) treated the noise model as a prior, such that it can effectively handle a wide range of noise levels. The convolutional blind denoising network CBDNET Guo et al. (2019) went further than FFDNET Zhang K. & Zhang (2018) and aimed at real photographs, though both synthesized and real images were used in training. A common trait of the above methods is that they all need noisy-clean image pairs for training. However, in some scenarios such as medical and biological imaging, clean images are often unavailable, making the above methods infeasible. To this end, the Noise2Noise (N2N) method Lehtinen et al. (2018) was the first to reveal that deep neural networks (DNNs) can be trained with pairs of noisy-noisy images instead of noisy-clean images; in other words, training can be conducted with only two noisy images captured independently of the same scene. Subsequent research concluded that it is still possible to train the network without using clean images if the noise in each region of the image is independent. Among them, the Neighbor2Neighbor (NBR2NBR) method Huang et al. (2021) proposed a new sampling scheme to achieve better denoising effects with a single noisy image.
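The additive model y = x + n and the residual-learning idea of DNCNN (predict the noise, then subtract it from the input) can be illustrated with a toy sketch. Only the identities y = x + n and "denoised = y − predicted noise" come from the text; the oracle "network" below is a stand-in for a trained CNN and exists purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Additive noise model: y = x + n
x = rng.uniform(0.0, 1.0, size=(32, 32))   # "clean" image (toy data)
n = rng.normal(0.0, 0.1, size=x.shape)     # zero-mean Gaussian noise
y = x + n                                  # observed noisy image

# Residual learning (as in DNCNN): the network predicts the noise n,
# and the denoised estimate is y - f(y). Here an oracle stand-in
# returns the true residual, just to show the arithmetic.
def f_oracle(noisy):
    return noisy - x                       # pretend-perfect noise prediction

denoised = y - f_oracle(y)
err = float(np.abs(denoised - x).max())
print(err)                                 # 0.0 for the oracle
```

A real denoiser only approximates the residual, so `err` would be small but nonzero; the point is that the network's target is the noise, not the clean image.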
The advantage of such single-image approaches is that they do not need a noise model prior; however, methods like the Recorrupted-to-Recorrupted (R2R) Pang et al. (2021) are valid under the assumption that the noise on each pixel is independent of the others, which means they are not as effective as supervised denoising in dealing with noise from real scenes. To deal with more complex noise, some unsupervised methods have been proposed. In order to solve the above problems, we propose a new denoising mode that achieves unsupervised denoising without requiring a noise prior, noise model assumptions, or any clean images. In one epoch of our training network, we first obtain a residual image containing noise by taking the difference between the network input and output, and use a random mask image to reduce the influence of natural image information in the residual image. The generative noise model is obtained by the above operation. In the second step, we feed the model into the Pseudo Supervised module and the Recorrupted-to-Recorrupted module to train the same network. At this point, one epoch of the entire training ends. After many iterations of network training, we gradually obtain a more realistic noise model and a better denoiser. More details of our proposed Generative Recorrupted-to-Recorrupted framework can be found in Figure 1. The remainder of the paper is organized as follows. In Section 2, we introduce the related work. The details of our method are given in Section 3, followed by the experiments in Section 4, with conclusions drawn in Section 5.
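The two steps of one training epoch described above can be sketched in numpy. This is a heavily simplified illustration, not the paper's implementation: `denoiser` is a one-parameter stand-in for the network, and the shuffling of masked residual samples is an assumed, minimal way to "draw" noise from the generated model.

```python
import numpy as np

rng = np.random.default_rng(1)

def denoiser(img, w):
    """Stand-in 'network': a single learnable scale (illustrative only)."""
    return w * img

y = rng.normal(0.5, 0.1, size=(16, 16))    # single noisy training image
w = 0.9                                    # toy network parameter

# Step 1: residual image = network input - network output.
out = denoiser(y, w)
residual = y - out

# A random binary mask suppresses leftover natural-image content in the
# residual; what survives approximates samples from the noise model.
mask = rng.integers(0, 2, size=residual.shape)
noise_samples = residual * mask

# Step 2: recorrupt the input with noise drawn from the generated model
# (here: a random permutation of the masked residual values) to form the
# data consumed by the pseudo-supervised and R2R modules.
drawn = rng.permutation(noise_samples.ravel()).reshape(y.shape)
recorrupted = y + drawn
print(recorrupted.shape)
```

Repeating these two steps each epoch is what lets the noise model and the denoiser improve together.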

2. RELATED WORK

2.1. SUPERVISED TRAINING

With the rapid development of deep learning, many supervised learning methods have been applied to image denoising. DNCNN Zhang et al. (2017) successfully applied a 17-layer deep neural network to image denoising by introducing residual learning. After that, a series of more efficient and more complex neural networks succeeded in denoising tasks. Unlike DNCNN Zhang et al. (2017), FFDNET Zhang K. & Zhang (2018) was more efficient in denoising, and CBDNET Guo et al. (2019) can handle more complex real noise. Without considering constraints, the above supervised denoising methods can be expressed by the following formula:

argmin_θ L(f_θ(y), x), (1)

where y, x, f and L are the noisy image, the clean image, the denoising model and the loss function, respectively. These methods all require clean images as training targets; the parameter θ is optimized by minimizing the gap between the network output and the target, so as to obtain a better denoising model. The N2N Lehtinen et al. (2018) revealed that the noisy/clean image pairs used to train the DNN can be replaced by noisy/noisy image pairs. The corrupted pair is represented by y and z, where the noises n_1 and n_2 are uncorrelated. There are two main reasons why N2N can successfully train a network with paired noisy images: first, the optimal solution obtained by network training is a mean-value solution; second, the mean value of most noise is close to zero. Thus the gradient generated by the network for a single corrupted target is incorrect, but the gradient corresponding to the average of all corrupted targets is correct, which can be expressed by the following formula:

argmin_θ L(f_θ(y), z), where y = x + n_1, z = x + n_2. (2)
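The second principle above, that zero-mean corrupted targets average out to the clean image (which is why the MSE gradient is correct in expectation), can be checked numerically. This toy demo is not from the paper; the image size, noise level, and sample count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

x = rng.uniform(0.0, 1.0, size=(8, 8))     # clean image (never used for training)
K = 20000                                  # number of corrupted targets

# Each target z_k = x + n_k with zero-mean noise. Their average tends to x,
# so the average MSE gradient over noisy targets matches the clean-target
# gradient, which is the core N2N argument.
targets = x + rng.normal(0.0, 0.2, size=(K,) + x.shape)
avg_target = targets.mean(axis=0)

err = float(np.abs(avg_target - x).max())
print(err)                                 # small: average target ≈ clean image
```

With K = 20000 samples of σ = 0.2 noise, the per-pixel error of the mean is on the order of 0.2/√K ≈ 0.0014, i.e. the noisy targets are, collectively, as informative as the clean one.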



2.2. UNSUPERVISED TRAINING

The N2N has been applied in many other tasks Ehret et al. (2019); Hariharan et al. (2019); Wu et al. (2019); Zhang et al. (2019), since it creatively addressed the dependency on clean images. Unfortunately, pairs of corrupted images are still difficult to obtain in dynamic scenes with deformations and image quality variations. To further bring the N2N into practice, some research Krull et al. (2019); Batson & Royer (2019); Krull et al. (2020); Huang et al. (2021); Pang et al. (2021) showed that a network can still be trained from single noisy images whose noise is spatially independent.

Among them, the NBR2NBR Huang et al. (2021) neither requires a noise model prior, like the Recorrupted-to-Recorrupted (R2R) Pang et al. (2021), nor loses image information, like the Noise2Void (N2V) self-supervised method Krull et al. (2019). However, Krull et al. (2019); Batson & Royer (2019); Krull et al. (2020); Huang et al. (2021); Pang et al. (2021) are all valid under the assumption that the noise on each pixel is independent of the others.
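For Gaussian noise, the recorruption idea behind R2R can be sketched numerically. The construction below (ŷ = y + αz, ỹ = y − z/α, with z drawn from the same Gaussian as the observation noise) is a simplified scalar form of the Pang et al. (2021) scheme, stated here from general knowledge rather than from this text; the check shows the two induced noises are uncorrelated, so (ŷ, ỹ) behaves like an N2N noisy/noisy pair.

```python
import numpy as np

rng = np.random.default_rng(3)

sigma, alpha, N = 0.1, 1.0, 200_000
n = rng.normal(0.0, sigma, size=N)   # noise on the observation y = x + n
z = rng.normal(0.0, sigma, size=N)   # freshly drawn auxiliary noise

# Recorrupted pair (scalar form):
#   y_hat   = y + alpha * z   -> carries noise  n + alpha * z
#   y_tilde = y - z / alpha   -> carries noise  n - z / alpha
n_hat = n + alpha * z
n_tilde = n - z / alpha

# Empirical covariance of the two new noises: near zero, since
# Cov = Var(n) - Var(z) = sigma^2 - sigma^2 = 0.
cov = float(np.mean(n_hat * n_tilde) - n_hat.mean() * n_tilde.mean())
print(abs(cov))                      # ≈ 0
```

Note the construction needs the noise variance σ² (a noise model prior) to draw z, which is exactly the requirement the proposed framework removes by generating the noise model during training.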

Noise2Grad (N2G) Lin et al. (2021) extracted noise by exploiting the similar properties of noise gradients and noisy-image gradients, and then added the extracted noise to unpaired clean images to form paired training data. Wang et al. (2022a) constructed a new unsupervised denoising approach through optimal transport theory. It is worth noting that although Lin et al. (2021); Wang et al. (2022a) no longer require paired clean-noisy images for end-to-end learning, they still need to collect many clean images.
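The N2G idea of extracting noise from a noisy image and pasting it onto an unpaired clean image can be sketched as follows. This is an assumption-laden illustration: the box filter stands in for N2G's learned network, and `clean_unpaired` is an arbitrary stand-in for the collected clean set.

```python
import numpy as np

rng = np.random.default_rng(4)

noisy = (rng.uniform(0.2, 0.8, size=(16, 16))
         + rng.normal(0.0, 0.05, size=(16, 16)))

def rough_denoise(img):
    """Stand-in smoother (3x3 box filter); N2G uses a learned network."""
    padded = np.pad(img, 1, mode="edge")
    return sum(padded[i:i + 16, j:j + 16]
               for i in range(3) for j in range(3)) / 9.0

# Noise estimate = noisy image minus its smoothed version ...
noise_est = noisy - rough_denoise(noisy)

# ... pasted onto an *unpaired* clean image to synthesize a training pair
# (synthetic_noisy, clean_unpaired) for ordinary supervised training.
clean_unpaired = rng.uniform(0.2, 0.8, size=(16, 16))
synthetic_noisy = clean_unpaired + noise_est
print(synthetic_noisy.shape)
```

The pairing constraint is gone, but a pool of clean images is still required, which is the limitation the proposed method targets.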

