EARLY STOPPING FOR DEEP IMAGE PRIOR

Abstract

Deep image prior (DIP) and its variants have shown remarkable potential for solving inverse problems in computational imaging (CI), needing no separate training data. Practical DIP models are often substantially overparameterized. During the fitting process, these models first learn the desired visual content and then pick up the potential modeling and observational noise, i.e., they overfit. Thus, the practicality of DIP hinges on early stopping (ES) that can capture the transition period. In this regard, most previous DIP works on CI tasks only demonstrate the potential of their models: they report the peak performance against the ground truth, but provide no clue as to how to operationally obtain near-peak performance without access to the ground truth. In this paper, we set out to break this practicality barrier of DIP and propose an efficient ES strategy that consistently detects near-peak performance across several CI tasks and DIP variants. Based simply on the running variance of DIP intermediate reconstructions, our ES method not only outperforms existing ones, which work only in very narrow regimes, but also remains effective when combined with methods that aim to mitigate overfitting.

1. INTRODUCTION

Inverse problems (IPs) are prevalent in computational imaging (CI), ranging from basic image denoising, super-resolution, and deblurring, to advanced 3D reconstruction and major tasks in scientific and medical imaging (Szeliski, 2022). Despite the disparate settings, all these problems take the form of recovering a visual object x from y = f(x), where f models the forward process that produces the observation y. Typically, these visual IPs are underdetermined: x cannot be uniquely determined from y. This is exacerbated by potential modeling noise (e.g., a linear f approximating a nonlinear process) and observational noise (e.g., Gaussian or shot noise), i.e., y ≈ f(x). To overcome the nonuniqueness and improve noise stability, people often encode a variety of problem-specific priors on x when formulating IPs. Traditionally, IPs are phrased as regularized data-fitting problems:

    min_x ℓ(y, f(x)) + λ R(x),    (1)

where ℓ(y, f(x)) is the data-fitting loss, R(x) is the regularizer, and λ is the regularization parameter. Here, the loss ℓ is often chosen according to the noise model, and the regularizer R encodes priors on x. The advent of deep learning (DL) has revolutionized how IPs are solved: on the radical side, deep neural networks (DNNs) are trained to directly map any given y to an x; on the mild side, pretrained or trainable DL models are taken to replace certain nonlinear mappings in numerical algorithms for solving Eq. (1) (e.g., plug-and-play and algorithm unrolling). Recent surveys on these developments include Ongie et al. (2020); Janai et al. (2020). These DL methods trust large training sets {(y_i, x_i)} to adequately represent the underlying priors and/or noise distributions.

This paper concerns another family of striking ideas that require no separate training data. Deep image prior (DIP) Ulyanov et al. (2018) proposes parameterizing x as x = G_θ(z), where G_θ is a trainable DNN parametrized by θ and z is a trainable or frozen random seed. No separate training data other than y are used! Putting this reparametrization into Eq. (1), we obtain

    min_θ ℓ(y, f ∘ G_θ(z)) + λ R ∘ G_θ(z).

G_θ is often "overparameterized", containing substantially more parameters than the size of x, and "structured", e.g., consisting of convolutional networks that encode structural priors in natural visual objects. The resulting optimization problem is solved via standard first-order methods for modern DL (e.g., (adaptive) gradient descent). When x has multiple components with different physical


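The abstract states that the proposed ES criterion is based simply on the running variance of intermediate DIP reconstructions. The sketch below illustrates that general idea, not the paper's exact algorithm: the window size, the patience-based stopping rule, and all names here are our own assumptions. It tracks the variance of each pixel across a sliding window of recent reconstructions and suggests stopping once that windowed variance has not reached a new minimum for a while.

```python
import numpy as np

def windowed_variance(recon_history, window):
    """Mean per-pixel variance across the last `window` reconstructions.

    recon_history: list of same-shape arrays (intermediate DIP outputs).
    A small value means the reconstruction has stopped changing; a rebound
    is a plausible sign that the model has begun fitting noise.
    """
    stack = np.stack(recon_history[-window:])      # (window, *image_shape)
    return float(stack.var(axis=0).mean())

class VarianceEarlyStopper:
    """Assumed stopping rule: stop when the windowed variance has not hit
    a new minimum for `patience` consecutive iterations."""

    def __init__(self, window=30, patience=50):
        self.window, self.patience = window, patience
        self.history = []          # bounded buffer of recent reconstructions
        self.best_var = np.inf
        self.best_iter = None      # iterate to keep as the final output
        self.stale = 0

    def step(self, recon, it):
        """Feed one intermediate reconstruction; return True to stop."""
        self.history.append(recon)
        if len(self.history) < self.window:
            return False           # not enough iterates to fill a window yet
        self.history = self.history[-self.window:]
        v = windowed_variance(self.history, self.window)
        if v < self.best_var:
            self.best_var, self.best_iter, self.stale = v, it, 0
        else:
            self.stale += 1
        return self.stale >= self.patience
```

In a DIP loop, one would call `stopper.step(G_theta(z).detach().cpu().numpy(), it)` after each optimizer update and keep the reconstruction from `best_iter`, which is the detected near-peak iterate.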