CONVEX REGULARIZATION BEHIND NEURAL RECONSTRUCTION

Abstract

Neural networks have shown tremendous potential for reconstructing high-resolution images in inverse problems. The non-convex and opaque nature of neural networks, however, hinders their utility in sensitive applications such as medical imaging. To cope with this challenge, this paper advocates a convex duality framework that makes a two-layer fully-convolutional ReLU denoising network amenable to convex optimization. The convex dual network not only enables globally optimal training with convex solvers, but also facilitates interpreting training and prediction. In particular, it shows that training neural networks with weight decay regularization induces path sparsity, while prediction amounts to piecewise linear filtering. A range of experiments with the MNIST and fastMRI datasets confirms the efficacy of the dual network optimization problem.

1. INTRODUCTION

In the age of AI, image reconstruction has witnessed a paradigm shift that impacts several applications, ranging from natural image super-resolution to medical imaging. Compared with traditional iterative algorithms, AI has delivered significant improvements in speed and image quality, making learned reconstruction based on neural networks widely adopted in clinical scanners and personal devices. The non-convex and opaque nature of deep neural networks, however, raises serious concerns about the authenticity of the predicted pixels in domains as sensitive as medical imaging. It is thus crucial to understand what trained neural networks represent, and to interpret their per-pixel reconstruction on unseen images. Reconstruction is typically cast as an inverse problem, where neural networks are used in different ways to create effective priors; see e.g., (Ongie et al., 2020; Mardani et al., 2018b) and references therein. An important class of methods are denoising networks, which, given natural data corrupted by some noise process y, aim to regress the ground-truth, noise-free data x* (Gondara, 2016; Vincent et al., 2010). These networks are generally learned in a supervised fashion, such that a mapping f : Y → X is learned from inputs {y_i}_{i=1}^n to outputs {x*_i}_{i=1}^n, and can then be used in the inference phase on a new sample ŷ to generate the prediction x̂ = f(ŷ). The scope of supervised denoising networks is general enough to cover more structured inverse problems appearing, for example, in compressed sensing. In this case one can easily form a rough (linear) estimate of the ground-truth image that is noisy, and then reconstruct it via an end-to-end denoising network (Mardani et al., 2018b; Mousavi et al., 2015).
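As a minimal illustration of the supervised denoising setup above, the following sketch builds a two-layer fully-convolutional ReLU denoiser and evaluates the weight-decay-regularized squared loss. All names, shapes, and the 1-D signal setting are hypothetical simplifications for exposition, not the paper's implementation.

```python
import numpy as np

def conv1d(x, w):
    """'Same' 1-D convolution (zero-padded) of signal x with filter w."""
    k = len(w)
    xp = np.pad(x, (k // 2, k - 1 - k // 2))
    return np.array([xp[i:i + k] @ w for i in range(len(x))])

def denoise(y, W1, w2):
    """Two-layer fully convolutional ReLU denoiser:
    hidden maps h_j = ReLU(y * W1[j]); output = sum_j (h_j * w2[j])."""
    hidden = [np.maximum(conv1d(y, w), 0.0) for w in W1]   # conv + ReLU layer
    return sum(conv1d(h, w) for h, w in zip(hidden, w2))   # linear conv layer

def loss(y, x_true, W1, w2, beta=1e-3):
    """Squared reconstruction error plus weight decay on both layers."""
    r = denoise(y, W1, w2) - x_true
    decay = sum(np.sum(w**2) for w in W1) + sum(np.sum(w**2) for w in w2)
    return 0.5 * np.sum(r**2) + 0.5 * beta * decay

rng = np.random.default_rng(0)
x_true = np.sin(np.linspace(0, 4 * np.pi, 64))         # clean signal
y = x_true + 0.1 * rng.standard_normal(64)             # noisy observation
W1 = [0.1 * rng.standard_normal(5) for _ in range(8)]  # 8 hidden filters, width 5
w2 = [0.1 * rng.standard_normal(5) for _ in range(8)]
print(loss(y, x_true, W1, w2))
```

Training would minimize this loss over (W1, w2) on pairs {(y_i, x*_i)}; the paper's point is that this non-convex problem admits an equivalent convex formulation.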
This approach has proven quite effective on tasks such as medical image reconstruction (Mardani et al., 2018b;a; Sandino et al., 2020; Hammernik et al., 2018), and significantly outperforms sparsity-inducing convex denoising methods, such as total-variation (TV) and wavelet regularization (Candès et al., 2006; Lustig et al., 2008; Donoho, 2006), in terms of both quality and speed. Despite these encouraging results and growing use in clinical settings (Sandino et al., 2020; Hammernik et al., 2018; Mousavi et al., 2015), little work has explored the interpretation of supervised training of over-parameterized neural networks for inverse problems. Whereas robustness guarantees exist for inverse problems solved by minimizing convex sparsity-inducing objectives (Oymak & Hassibi, 2016; Chandrasekaran et al., 2012), no such guarantees exist for the predictions of non-convex denoising neural networks based on supervised training. While the forward pass of a network has been interpreted as layered basis pursuit from sparse dictionary learning, this approach lacks an optimization perspective on such networks and neglects the solutions to which they actually converge (Papyan et al., 2017). In fact, it has been demonstrated empirically that deep neural networks for image reconstruction can be unstable; i.e., small perturbations in the input can cause severe artifacts in the reconstruction that mask structural features important for medical image interpretation (Antun et al., 2020). The main challenge in explaining these effects emanates from the non-linear and non-convex structure of deep neural networks, which are heuristically optimized via first-order stochastic gradient descent (SGD) based solvers such as Adam (Kingma & Ba, 2014). As a result, it is hard to interpret the inference phase and to understand how the training samples alter the predictions for unseen images.
In other applications, Neural Tangent Kernels (NTK) have become popular for understanding the behavior of neural networks (Jacot et al., 2018). They however rely strongly on the oversimplifying infinite-width assumption for the network, which is not practical and, as pointed out by prior work (Arora et al., 2019), cannot explain the success of neural networks in practice. To cope with these challenges, we present a convex-duality framework for two-layer finite-width denoising networks with fully convolutional (conv.) layers, ReLU activation, and a representation shared among all output pixels. In essence, inspired by the analysis of Pilanci & Ergen (2020), the zero duality gap yields a convex bi-dual formulation of the original non-convex objective that requires only polynomially many variables. The benefits of the convex dual are three-fold. First, one can leverage off-the-shelf convex solvers that are guaranteed to converge to the global optimum in polynomial time, and obtain robustness guarantees for reconstruction. Second, it interprets training with weight decay regularization as implicit regularization with path sparsity, a form of capacity control for neural networks (Neyshabur et al., 2015). Third, the convex dual interprets CNN-based denoising as first dividing the input image patches into clusters based on their latent representation, and then applying linear filtering to the patches in each cluster. A range of experiments is performed with MNIST and fastMRI reconstruction that confirm the zero duality gap, interpretability, and practicality of the convex formulation. All in all, the main contributions of this paper are summarized as follows:

• We, for the first time, formulate a convex program with polynomial complexity for neural image reconstruction, which is provably identical to a two-layer fully-conv. ReLU network.

• We provide novel interpretations of the training objective with weight decay as path-sparsity regularization, and of prediction as patch-based clustering followed by linear filtering.

• We present extensive experiments on MNIST and fastMRI reconstruction showing that our convex dual coincides with the non-convex neural network, and we interpret the learned dual networks.
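The clustering-then-linear-filtering interpretation above follows from ReLU piecewise linearity and can be checked numerically. The sketch below uses a deliberately simplified two-layer network (1-D patches, a 1×1 second layer; all names and shapes are illustrative assumptions, not the paper's architecture): for a fixed ReLU activation pattern, the output at each pixel is an exact linear filter of its input patch, so patches sharing a pattern are filtered identically.

```python
import numpy as np

def patches(y, k):
    """Extract all zero-padded length-k patches of a 1-D signal (one per pixel)."""
    yp = np.pad(y, (k // 2, k - 1 - k // 2))
    return np.stack([yp[i:i + k] for i in range(len(y))])

rng = np.random.default_rng(1)
k, m = 5, 8                       # filter width, number of hidden filters
W1 = rng.standard_normal((m, k))  # first-layer conv filters
w2 = rng.standard_normal(m)       # second layer: 1x1 conv (one weight per filter)
y = rng.standard_normal(32)

P = patches(y, k)                 # (32, k): one patch per output pixel
pre = P @ W1.T                    # pre-activations, per patch and filter
mask = (pre > 0).astype(float)    # ReLU activation pattern of each patch
out = np.maximum(pre, 0) @ w2     # network output per pixel

# For each patch, the output is a *linear* filter determined only by its
# activation pattern: out_i = p_i @ (W1.T @ (mask_i * w2)).
out_linear = np.array([P[i] @ (W1.T @ (mask[i] * w2)) for i in range(len(y))])
assert np.allclose(out, out_linear)

# Patches with identical activation patterns fall in the same "cluster"
# and are processed by the same linear filter.
n_clusters = len({tuple(row) for row in mask})
print(n_clusters, "activation-pattern clusters among", len(y), "patches")
```

This makes the prediction auditable per pixel: identify the patch's cluster via its activation pattern, then inspect the single linear filter applied within that cluster.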

2. RELATED WORK

This paper lies at the intersection of two lines of work, namely convex neural networks and deep learning for inverse problems. Convex neural networks were introduced in (Bach, 2017; Bengio et al., 2006), and developed further in (Pilanci & Ergen, 2020; Ergen & Pilanci, 2020a;b). The most relevant to our work are (Pilanci & Ergen, 2020; Ergen & Pilanci, 2020b), which put forth a convex duality framework for two-layer ReLU networks with a single output. These works present a convex alternative, in a higher-dimensional space, to the non-convex and finite-dimensional neural network. They are however restricted to scalar-output networks, and consider either fully-connected networks (Pilanci & Ergen, 2020) or CNNs with average pooling (Ergen & Pilanci, 2020b). Our work instead focuses on fully convolutional denoising with an output dimension as large as the number of image pixels, where the pixels share the same hidden representation. This is quite different from the setting considered in (Pilanci & Ergen, 2020) and demands a different treatment. It is also worth noting that (Amos et al., 2017; Chen et al., 2019) customize the network architecture for convex inference, but they still require non-convex training.

