FULLY UNSUPERVISED DIVERSITY DENOISING WITH CONVOLUTIONAL VARIATIONAL AUTOENCODERS

Abstract

Deep Learning based methods have emerged as the indisputable leaders for virtually all image restoration tasks. Especially in the domain of microscopy images, various content-aware image restoration (CARE) approaches are now used to improve the interpretability of acquired data. Naturally, there are limitations to what can be restored in corrupted images, and as for all inverse problems, many potential solutions exist, and one of them must be chosen. Here, we propose DIVNOISING, a denoising approach based on fully convolutional variational autoencoders (VAEs), overcoming the problem of having to choose a single solution by predicting a whole distribution of denoised images. First, we introduce a principled way of formulating the unsupervised denoising problem within the VAE framework by explicitly incorporating imaging noise models into the decoder. Our approach is fully unsupervised, only requiring noisy images and a suitable description of the imaging noise distribution. We show that such a noise model can either be measured, bootstrapped from noisy data, or co-learned during training. If desired, consensus predictions can be inferred from a set of DIVNOISING predictions, leading to competitive results with other unsupervised methods and, on occasion, even with the supervised state-of-the-art. DIVNOISING samples from the posterior enable a plethora of useful applications. We (i) show denoising results for 13 datasets, (ii) discuss how optical character recognition (OCR) applications can benefit from diverse predictions, and (iii) demonstrate how instance cell segmentation improves when using diverse DIVNOISING predictions.

1. INTRODUCTION

The goal of scientific image analysis is to analyze pixel-data and measure the properties of objects of interest in images. Pixel intensities are subject to undesired noise and other distortions, motivating an initial preprocessing step called image restoration. Image restoration is the task of removing unwanted noise and distortions, giving us clean images that are closer to the true but unknown signal. In recent years, Deep Learning (DL) has enabled tremendous progress in image restoration (Mao et al., 2016; Zhang et al., 2017b; Zhang et al., 2017; Weigert et al., 2018). Supervised DL methods use corresponding pairs of clean and distorted images to learn a mapping between the two quality levels. The utility of this approach is especially pronounced for microscopy image data of biological samples (Weigert et al., 2017; 2018; Ouyang et al., 2018; Wang et al., 2019), where quantitative downstream analysis is essential. More recently, unsupervised content-aware image restoration (CARE) methods (Lehtinen et al., 2018; Krull et al., 2019; Batson & Royer, 2019; Buchholz et al., 2019) have emerged. They can, enabled by sensible assumptions about the statistics of imaging noise, learn a mapping from noisy to clean images without ever seeing clean data during training. Some of these methods additionally include a probabilistic model of the imaging noise (Krull et al., 2020; Laine et al., 2019; Prakash et al., 2020; Khademi et al., 2020) to further improve their performance. Note that such denoisers can directly be trained on a given body of noisy images. All existing approaches share a common flaw: distortions degrade some of the information content in images, generally making it impossible to fully recover the desired clean signal with certainty. Even an ideal method cannot know which of many possible clean images really has given rise to the degraded observation at hand.
Hence, any restoration method has to make a compromise between possible solutions when predicting a restored image. Generative models, such as VAEs, are a canonical choice when a distribution over a set of variables needs to be learned. Still, so far VAEs have been overlooked as a method to solve unsupervised image denoising problems. This might also be due to the fact that vanilla VAEs (Kingma & Welling, 2014; Rezende et al., 2014) show sub-par performance on denoising problems (see Section 6). Here we introduce DIVNOISING, a principled approach to incorporate explicit models of the imaging noise distribution in the decoder of a VAE. Such noise models can either be measured or derived (bootstrapped) from the noisy image data alone (Krull et al., 2020; Prakash et al., 2020). Additionally, we propose a way to co-learn a suitable noise model during training, rendering DIVNOISING fully unsupervised. We show on 13 datasets that fully convolutional VAEs, trained with our proposed DIVNOISING framework, yield competitive results, in 8 cases actually setting a new state-of-the-art (see Fig. 2 and Table 1). Still, the key benefit of DIVNOISING is that the method does not need to commit to a single prediction, but is instead capable of generating diverse samples from an approximate posterior of possible true signals. (Note that point estimates can still be inferred if desired, as shown in Fig. 4.) Other unsupervised denoising methods only provide a single solution (point estimate) of that posterior (Krull et al., 2019; Lehtinen et al., 2018; Batson & Royer, 2019) or predict an independent posterior distribution of intensities per pixel (Krull et al., 2020; Laine et al., 2019; Prakash et al., 2020; Khademi et al., 2020). Hence, DIVNOISING is the first method that learns to approximate the posterior over meaningful structures in a given body of images.
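To make the idea of "incorporating the noise model into the decoder" concrete: for a pixel-wise Gaussian noise model of known standard deviation, the training objective is the usual negative ELBO, except that the reconstruction term scores the noisy input under the noise model likelihood p_NM(x_i|s_i), evaluated at the decoded signal, instead of using a generic L2 loss. The following NumPy sketch illustrates this objective; the function names and the choice of a Gaussian noise model are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def gaussian_noise_loglik(x, s, sigma):
    # log p_NM(x_i | s_i): pixel-wise Gaussian noise model with known sigma.
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - (x - s) ** 2 / (2.0 * sigma**2)

def kl_std_normal(mu, log_var):
    # Closed-form KL( N(mu, exp(log_var)) || N(0, 1) ), per latent dimension.
    return 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)

def neg_elbo(x, s, mu, log_var, sigma=1.0):
    # DivNoising-style objective: the reconstruction term is the negative
    # noise-model log-likelihood of the noisy image x given the decoded
    # signal s; the KL term regularises the encoder distribution q(z|x).
    rec = -gaussian_noise_loglik(x, s, sigma).sum()
    kl = kl_std_normal(mu, log_var).sum()
    return rec + kl
```

Note that this structure makes the decoder output a clean signal estimate s by construction: any residual noise in s would be doubly penalised, once by the noise model and once by the KL term.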
We believe that DIVNOISING will be hugely beneficial for computational biology applications in biomedical imaging, where noise is typically unavoidable and huge datasets need to be processed on a daily basis. Here, DIVNOISING enables unsupervised, diverse, state-of-the-art (SOTA) denoising while requiring comparatively few computational resources, rendering our approach particularly practical. Finally, we discuss the utility of diverse denoising results for OCR and showcase it for a ubiquitous analysis task in biology: the instance segmentation of cells in microscopy images (see Fig. 5). Hence, DIVNOISING has the potential to be useful for many real-world applications and will not only generate SOTA restored images, but also enrich quantitative downstream processing.



Figure 1: Training and prediction/inference with DIVNOISING. (top) A DivNoising VAE can be trained fully unsupervised, using only noisy data and a (measured, bootstrapped, or co-learned) pixel noise model p_NM(x_i|s_i) (see main text for details). (bottom) After training, the encoder can be used to sample multiple z_k ~ q_φ(z|x), giving rise to diverse denoised samples s_k. These samples can further be used to infer consensus point estimates such as an MMSE or a MAP solution.
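The inference procedure in the bottom row of the figure can be sketched in a few lines: draw each z_k from the approximate posterior q_φ(z|x) via the reparameterisation trick, decode every sample into a denoised image s_k, and average the decoded images to approximate the MMSE consensus estimate. The NumPy sketch below uses a stand-in decode function as an assumption; in the actual method, decode is the trained convolutional decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_denoised(decode, mu, log_var, n_samples=100):
    # Draw z_k ~ q_phi(z|x) via the reparameterisation trick
    # (z = mu + std * eps) and decode each z_k into a denoised sample s_k.
    std = np.exp(0.5 * log_var)
    samples = []
    for _ in range(n_samples):
        z = mu + std * rng.standard_normal(mu.shape)
        samples.append(decode(z))
    return np.stack(samples)

def mmse_estimate(samples):
    # The pixel-wise mean over decoded samples approximates the
    # minimum mean squared error (MMSE) consensus solution.
    return samples.mean(axis=0)
```

A MAP estimate, in contrast, cannot be obtained by simple averaging; it requires locating a mode of the sample distribution, e.g. via a density estimate over the drawn samples.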

