EVALUATING UNSUPERVISED DENOISING REQUIRES UNSUPERVISED METRICS

Abstract

Unsupervised denoising is a crucial challenge in real-world imaging applications. Unsupervised deep-learning methods have demonstrated impressive performance on benchmarks based on synthetic noise. However, no metrics are available to evaluate these methods in an unsupervised fashion. This is highly problematic for the many practical applications where ground-truth clean images are not available. In this work, we propose two novel metrics: the unsupervised mean squared error (MSE) and the unsupervised peak signal-to-noise ratio (PSNR), which are computed using only noisy data. We provide a theoretical analysis of these metrics, showing that they are asymptotically consistent estimators of the supervised MSE and PSNR. Controlled numerical experiments with synthetic noise confirm that they provide accurate approximations in practice. We validate our approach on real-world data from two imaging modalities: videos in raw format and transmission electron microscopy. Our results demonstrate that the proposed metrics enable unsupervised evaluation of denoising methods based exclusively on noisy data.

1. INTRODUCTION

Image denoising is a fundamental challenge in image and signal processing, as well as a key preprocessing step for computer vision tasks. Convolutional neural networks achieve state-of-the-art performance for this problem when trained using databases of clean images corrupted with simulated noise Zhang et al. (2017a). However, in real-world imaging applications such as microscopy, noiseless ground-truth videos are often not available. This has motivated the development of unsupervised denoising approaches that can be trained using only noisy measurements Lehtinen et al. (2018); Xie et al. (2020); Laine et al. (2019); Sheth et al. (2021); Huang et al. (2021). These methods have demonstrated impressive performance on natural-image benchmarks, essentially on par with the supervised state of the art.
However, to the best of our knowledge, no unsupervised metrics are currently available to evaluate them using only noisy data. Reliance on supervised metrics makes it very challenging to create benchmark datasets using real-world measurements, because obtaining the ground-truth clean images required by these metrics is often either impossible or very constraining. In practice, clean images are typically estimated through temporal averaging, which suppresses dynamic information that is often crucial in scientific applications. Consequently, quantitative evaluation of unsupervised denoising methods is currently almost completely dominated by natural-image benchmark datasets with simulated noise Lehtinen et al. (2018); Xie et al. (2020); Laine et al. (2019); Sheth et al. (2021); Huang et al. (2021), which are not always representative of the signal and noise characteristics that arise in real-world imaging applications.
The lack of unsupervised metrics also limits the application of unsupervised denoising techniques in practice. In the absence of quantitative metrics, domain scientists must often rely on visual inspection to evaluate performance on real measurements. This is particularly restrictive for deep-learning approaches, because it makes it impossible to perform systematic hyperparameter optimization and model selection on the data of interest.
In this work, we propose two novel unsupervised metrics to address these issues: the unsupervised mean-squared error (uMSE) and the unsupervised peak signal-to-noise ratio (uPSNR), which are computed exclusively from noisy data. These metrics build upon existing unsupervised denoising

