LOOP UNROLLED SHALLOW EQUILIBRIUM REGULARIZER (LUSER): A MEMORY-EFFICIENT INVERSE PROBLEM SOLVER

Abstract

In inverse problems we aim to reconstruct some underlying signal of interest from potentially corrupted and often ill-posed measurements. Classical optimization-based techniques proceed by optimizing a data consistency metric together with a regularizer. Current state-of-the-art machine learning approaches draw inspiration from such techniques by unrolling the iterative updates of an optimization-based solver and then learning a regularizer from data. This loop unrolling (LU) method has shown tremendous success, but often requires a deep model for the best performance, leading to high memory costs during training. Thus, to balance computational cost and network expressiveness, we propose an LU algorithm with shallow equilibrium regularizers (LUSER). These implicit models are as expressive as deeper convolutional networks, but far more memory-efficient during training. The proposed method is evaluated on image deblurring, computed tomography (CT), and single-coil magnetic resonance imaging (MRI) tasks, and shows similar or even better performance while requiring up to 8× fewer computational resources during training when compared against a more typical LU architecture with feed-forward convolutional regularizers.

1. INTRODUCTION

In an inverse problem we face the task of reconstructing some data or parameters of an unknown signal from indirect observations. The forward process, or the mapping from the data to observations, is typically well known, but ill-posed or non-invertible. More formally, we consider the task of recovering some underlying signal x from measurements y taken via some forward operator A according to

x̂ = Ax + η, (1)

where η represents noise. The forward operator can be nonlinear, but to simplify the notation, we illustrate the idea in linear form throughout this paper. A common approach to recover the signal is via an iterative method based on the least squares loss:

x̂ = arg min_x ∥y − Ax∥². (2)

For many problems of interest, A is ill-posed and does not have full column rank. Thus, attempting to solve (2) does not yield a unique solution. To address this, we can extend (2) by including a regularizing term that biases the inversion towards solutions with favorable properties. Common examples of regularization include ℓ₂, ℓ₁, and total variation (TV). Each regularizer encourages certain properties in the estimated signal x (e.g., smoothness, sparsity, piece-wise constancy) and is often chosen based on task-specific prior knowledge.

Recent works (Ongie et al., 2020) tackle inverse problems using more data-driven methods. Unlike typical supervised learning tasks that attempt to learn a mapping purely from examples, deep learning methods for inverse problems have access to the forward operator, which can guide the learning process towards more accurate reconstructions. One popular approach to incorporating knowledge of the forward operator is termed loop unrolling (LU). These methods are heavily inspired by standard iterative inverse problem solvers, but rather than use a hand-tuned regularizer, they learn the update with a parameterized model.
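To make the setup in (1) and (2) concrete, the following sketch builds a toy ill-posed linear forward operator and recovers the signal with an ℓ₂-regularized least squares solve. All names and dimensions here are illustrative, not taken from the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ill-posed setup: a wide forward operator A (fewer measurements than
# unknowns), so plain least squares has no unique solution.
n, m = 50, 30                                   # signal dim > measurement dim
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = A @ x_true + 0.01 * rng.standard_normal(m)  # y = Ax + eta, as in Eq. (1)

def solve_l2(A, y, gamma):
    """Minimize ||y - Ax||^2 + gamma * ||x||^2 (Tikhonov regularization).

    The closed-form minimizer is (A^T A + gamma I)^{-1} A^T y; the
    regularizer makes the normal equations invertible even though
    rank(A^T A) <= m < n.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + gamma * np.eye(n), A.T @ y)

x_hat = solve_l2(A, y, gamma=0.1)
# With gamma = 0, A^T A is singular and the solve would fail.
```

The same template applies to other regularizers from the text (ℓ₁, TV), except that those no longer admit a closed-form solution and require the iterative updates discussed in Section 2.1.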
LU methods tend to have a fixed number of iterations (typically around 5-10) due to computational constraints. Gilton et al. (2021) propose an interesting alternative, which we refer to as DEQ4IP, that takes advantage of deep equilibrium (DEQ) models (Bai et al., 2019; 2020; Fung et al., 2021; El Ghaoui et al., 2021). Equilibrium models are designed to recursively iterate on their input until a "fixed point" is found (i.e., the input no longer changes after passing through the model). DEQ4IP extends this principle to the LU method, iterating until convergence rather than for a fixed budget.

Our Contributions. We propose a novel alternative architecture for solving inverse problems called Loop Unrolled Shallow Equilibrium Regularizer (LUSER). It incorporates knowledge of the forward model by adopting the principles of LU architectures while reducing memory consumption by using a shallow (relative to existing feed-forward models) DEQ as the learned regularizer update. Unlike DEQ4IP, which converts the entire LU architecture into a single DEQ model, we only convert the learned regularizer at each stage. This simplifies the learning task for DEQ models, which can be unstable to train in practice. To our knowledge, this is the first use of multiple sequential DEQ models within a single architecture for solving inverse problems. Our proposed architecture (i) reduces the number of forward/adjoint operations compared to the work of Gilton et al. (2021), and (ii) reduces the memory footprint during training without loss of expressiveness, as demonstrated by our experiments. We empirically demonstrate better reconstruction across multiple tasks than LU alternatives with a comparable number of parameters, while reducing computational memory costs during training by a factor of up to 8×.

The remainder of the paper is organized as follows. Section 2 reviews related works in solving inverse problems.
Section 3 introduces the proposed LUSER, which we compare with other baseline methods in image deblurring, CT, and MRI tasks in Section 4. We conclude in Section 5 with a brief discussion.
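As background for the DEQ models discussed above, the core fixed-point iteration can be sketched in a few lines. This is a minimal illustration, not the paper's architecture: the layer f below is a hand-crafted contractive map standing in for a learned network, and the solver is plain iteration rather than the accelerated root-finders used in practice:

```python
import numpy as np

def deq_fixed_point(f, x0, tol=1e-8, max_iter=500):
    """Iterate z <- f(z, x) until the output stops changing (a fixed point).

    This is the core idea of deep equilibrium models: instead of stacking
    L explicit layers, a single layer f is applied repeatedly until
    convergence, so the effective "depth" adapts to the input.
    """
    z = np.zeros_like(x0)
    for _ in range(max_iter):
        z_next = f(z, x0)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

# Illustrative "layer": z -> tanh(Wz + x). W is scaled so its spectral norm
# is well below 1, making the map contractive and the iteration convergent.
rng = np.random.default_rng(1)
W = 0.2 * rng.standard_normal((8, 8)) / np.sqrt(8)
x = rng.standard_normal(8)
z_star = deq_fixed_point(lambda z, x: np.tanh(W @ z + x), x)
# z_star satisfies z_star ~= tanh(W @ z_star + x)
```

Training such a model does not require storing the iterates: gradients can be obtained by implicit differentiation at the fixed point, which is the source of the memory savings that LUSER inherits.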

2. RELATED WORKS

2.1. LOOP UNROLLING

As noted above, a common approach to tackling an inverse problem is to cast it as an optimization problem consisting of the sum of a data consistency term and a regularization term:

min_x ∥y − Ax∥²₂ + γ r(x), (3)

where r is a regularization function mapping from the domain of the parameters of interest to a real number and γ ≥ 0 is a well-tuned coefficient. The regularization function is chosen for specific classes of signals to exploit any potential structure, e.g., ∥x∥₂ for smooth signals and ∥x∥₀ or ∥x∥₁ for sparse signals. When r is differentiable, a solution of (3) can be obtained in an iterative fashion via gradient descent. For some step size λ at iteration k = 1, 2, . . . , K, we apply the update:

x_{k+1} = x_k + λA⊤(y − Ax_k) − λγ∇r(x_k). (4)

For non-differentiable r, the more general proximal gradient algorithm can be applied with the following update, where τ is a well-tuned hyperparameter related to the proximal operator:

x_{k+1} = prox_{τ,r}(x_k + λA⊤(y − Ax_k)). (5)

The loop unrolling (LU) method performs the update in (4) or (5), but replaces λγ∇r or the proximal operator with a learned neural network. The overall architecture repeats the neural-network-based update for a pre-determined number of iterations, fixing the overall computational budget. Note that the network only implicitly learns the regularizer: in practice, it learns an update step, which can be thought of as denoising or a projection onto the manifold of the data. LU is typically trained end-to-end. While end-to-end training is easier to perform and encourages faster convergence, it requires all intermediate activations to be stored in memory. Thus, the maximum number of iterations is always kept small compared to classical iterative inverse problem solvers. Due to this memory limitation, there is a trade-off between the depth of an LU architecture and the richness of each regularization network.
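The unrolled update (5) can be sketched as follows. In a real LU network the denoiser would be a learned CNN trained end-to-end across the K stages; here, purely for illustration, it is replaced by soft-thresholding (the proximal operator of the ℓ₁ norm), which turns the loop into a fixed-budget ISTA:

```python
import numpy as np

def loop_unrolled_solve(A, y, denoiser, K=10, lam=None):
    """Unrolled proximal-gradient iterations, Eq. (5):

        x_{k+1} = denoiser(x_k + lam * A^T (y - A x_k))

    `denoiser` stands in for the learned network of an LU architecture.
    """
    if lam is None:
        # Conservative step size based on the spectral norm of A, so the
        # data-consistency gradient step is non-expansive.
        lam = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(K):                        # fixed iteration budget, as in LU
        x = x + lam * A.T @ (y - A @ x)       # gradient step on ||y - Ax||^2
        x = denoiser(x)                       # learned update / prox step
    return x

def soft_threshold(x, t=0.05):
    """Proximal operator of t*||x||_1: shrink each entry towards zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Illustrative sparse recovery problem (names and sizes are hypothetical).
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 80)) / np.sqrt(40)
x_true = np.zeros(80)
x_true[rng.choice(80, 5, replace=False)] = 1.0   # 5-sparse signal
y = A @ x_true
x_hat = loop_unrolled_solve(A, y, soft_threshold, K=10)
```

Each of the K stages stores its activations for backpropagation when the denoiser is a trained network, which is exactly the memory bottleneck the text describes and that LUSER's shallow equilibrium regularizers are designed to relieve.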
Intuitively, one can raise the network performance by increasing the number of loop unrolled iterations. For example, Gilton et al. (2021) extends the LU model to

