NEURAL NETWORK APPROXIMATION OF LIPSCHITZ FUNCTIONS IN HIGH DIMENSIONS WITH APPLICATIONS TO INVERSE PROBLEMS

Abstract

The remarkable successes of neural networks in a huge variety of inverse problems have fueled their adoption in disciplines ranging from medical imaging to seismic analysis over the past decade. However, the high dimensionality of such inverse problems has simultaneously left current theory, which predicts that networks should scale exponentially in the dimension of the problem, unable to explain why the seemingly small networks used in these settings work as well as they do in practice. To reduce this gap between theory and practice, a general method for bounding the complexity required for a neural network to approximate a Lipschitz function on a high-dimensional set with low-complexity structure is provided herein. The approach is based on the observation that the existence of a linear Johnson-Lindenstrauss (JL) embedding A ∈ R^{d×D} of a given high-dimensional set S ⊂ R^D into a low-dimensional cube [-M, M]^d implies that for any Lipschitz function f : S → R^p, there exists a Lipschitz function g : [-M, M]^d → R^p such that g(Ax) = f(x) for all x ∈ S. Hence, if one has a neural network which approximates g : [-M, M]^d → R^p, then a layer implementing the JL embedding A can be prepended to obtain a neural network which approximates f : S → R^p. Pairing JL embedding results with results on the approximation of Lipschitz functions by neural networks then yields bounds on the complexity required for a neural network to approximate Lipschitz functions on high-dimensional sets. The end result is a general theoretical framework which can then be used to better explain the observed empirical successes of smaller networks in a wider variety of inverse problems than current theory allows.
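To make the construction concrete, the following is a minimal sketch (not code from the paper) of the architecture the abstract describes: a fixed random Gaussian matrix plays the role of the JL embedding A, and a small trainable MLP plays the role of the approximant of g. All dimensions and widths here are illustrative assumptions.

```python
import torch
import torch.nn as nn

D, d, p = 10_000, 64, 16              # extrinsic dim, embedding dim, output dim

class JLNet(nn.Module):
    """f(x) ≈ g(Ax): a fixed JL layer followed by a trainable MLP for g."""
    def __init__(self, D: int, d: int, p: int, width: int = 256):
        super().__init__()
        # Fixed (untrained) Gaussian matrix; for d large enough relative to the
        # complexity of S ⊂ R^D, such a matrix is a JL embedding of S w.h.p.
        self.register_buffer("A", torch.randn(d, D) / d ** 0.5)
        # Trainable part: approximates g : [-M, M]^d → R^p.
        self.g = nn.Sequential(
            nn.Linear(d, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, p),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The first (linear, non-trainable) layer implements the embedding A.
        return self.g(x @ self.A.T)

net = JLNet(D, d, p)
x = torch.randn(8, D)                 # stand-ins for points of S ⊂ R^D
print(net(x).shape)                   # torch.Size([8, 16])
```

Note that only the d-dimensional MLP is trained; the JL layer contributes a single fixed matrix multiply, which is why the trainable parameter count is governed by d rather than D.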

1. INTRODUCTION

At present, various network architectures (NNs, CNNs, ResNets) achieve state-of-the-art performance in a broad range of inverse problems, including matrix completion (Zheng et al., 2016; Monti et al., 2017; Dziugaite & Roy, 2015; He et al., 2017), image deconvolution (Xu et al., 2014; Kupyn et al., 2018), low-dose CT reconstruction (Nah et al., 2017), electric and magnetic inverse problems (Coccorese et al., 1994), seismic analysis, and electromagnetic scattering. However, since these problems are very high dimensional, classical universal approximation theory for such networks provides very pessimistic estimates of the network sizes required to learn such inverse maps (i.e., sizes much larger than what standard computers can store, much less train). As a result, a gap still exists between the widely observed successes of networks in practice and the network size bounds provided by current theory in many inverse problem applications. The purpose of this paper is to provide a refined bound on the size of networks in a wide range of such applications and to show that the network size is indeed affordable in many inverse problem settings. In particular, the bound developed herein depends on the model complexity of the domain of the forward map instead of the domain's extrinsic input dimension, and is therefore much smaller in a wide variety of model settings.

To be more specific, recall that in most inverse problems one aims to recover some signal x from its measurements y = F(x). Here y and x could both be high-dimensional vectors, or even matrices or tensors, and F, which is called the forward map/operator, could be either linear or nonlinear with various regularity conditions depending on the application. In all cases, however, recovering x from y amounts to inverting F. In other words, one aims to find the operator F^{-1} that sends every measurement y back to the signal x satisfying F(x) = y.
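As a rough illustration of how pessimistic the classical estimates are, suppose (in the spirit of standard Lipschitz approximation rates, ignoring constants and logarithmic factors, and not the paper's precise statement) that achieving accuracy ε on a D-dimensional domain requires on the order of ε^{-D} network parameters, while the framework above only pays for the embedding dimension d. The numbers below are assumptions chosen purely for illustration.

```python
import math

eps = 0.1                  # target approximation accuracy (assumed)
D, d = 10_000, 20          # extrinsic dimension vs. assumed embedding dimension

# Work in log10 to avoid float overflow: eps^(-D) here has 10,000 digits.
log10_classical = -D * math.log10(eps)   # parameters ~ eps^(-D)
log10_embedded  = -d * math.log10(eps)   # parameters ~ eps^(-d)

print(f"classical bound: ~1e{log10_classical:.0f} parameters")  # ~1e10000
print(f"embedded bound:  ~1e{log10_embedded:.0f} parameters")   # ~1e20
```

The point is the scaling rather than the absolute numbers: the exponent drops from the ambient dimension D to the (much smaller) dimension d associated with the model complexity of the signal class, which is what makes a refined bound meaningful in practice.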

