ON THE UNIVERSAL APPROXIMATION PROPERTY OF DEEP FULLY CONVOLUTIONAL NEURAL NETWORKS

Abstract

We study the approximation of shift-invariant or shift-equivariant functions by deep fully convolutional networks from a dynamical-systems perspective. We prove that deep residual fully convolutional networks, as well as their continuous-layer counterparts, achieve universal approximation of these symmetric functions at constant channel width. Moreover, we show that the same holds for non-residual variants with at least 2 channels in each layer and a convolutional kernel size of at least 2. In addition, we show that these requirements are necessary: networks with fewer channels or smaller kernels fail to be universal approximators.

1. INTRODUCTION

Convolutional Neural Networks (CNN) are widely used as fundamental building blocks of modern deep learning architectures, since they can extract key data features with far fewer parameters, lowering both memory requirements and computational cost. When the input data has spatial structure, such as pictures or videos, this parsimony often does not hurt performance. This is particularly interesting in the case of fully convolutional neural networks (FCNN) (Long et al., 2015), built by composing convolution, nonlinear activation, and summing (averaging) layers, with the last layer being a permutation-invariant pooling operator; see Figure 1. Consequently, a prominent feature of FCNN is that when the input data indices are shifted (e.g., for pictures, videos, or other higher-dimensional spatial data), the output remains the same. This property is called shift invariance. An example application of FCNN is image classification, where the class label (or probability, under the softmax activation) of an image remains the same when the image is translated (i.e., its pixels are shifted). A variant of FCNN applies to problems where the output data has the same size as the input data, e.g., pixel-wise segmentation of images (Badrinarayanan et al., 2017). In this case, simply stacking the fully convolutional layers suffices. We call this type of CNN an equivariant fully convolutional neural network (eq-FCNN), since when the input data indices are shifted, the output data indices shift by the same amount. This property is called shift equivariance. It is believed that the success of these convolutional architectures hinges on shift invariance or equivariance, which capture intrinsic structure in spatial data. From an approximation theory viewpoint, this
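The two symmetries above can be checked numerically. The following is a minimal sketch (not taken from the paper): a single-channel 1-D circular convolution followed by a ReLU is shift-equivariant, and composing it with a permutation-invariant mean pooling yields a shift-invariant scalar. The kernel and input here are arbitrary random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def circ_conv(x, w):
    """Circular (periodic) 1-D convolution: one convolutional channel."""
    n, k = len(x), len(w)
    return np.array([sum(w[j] * x[(i + j) % n] for j in range(k))
                     for i in range(n)])

def relu(x):
    return np.maximum(x, 0.0)

x = rng.standard_normal(8)   # input signal of length 8 (placeholder data)
w = rng.standard_normal(3)   # kernel of size 3 (placeholder weights)
s = 2                        # shift amount

# Shift equivariance: shifting the input shifts the layer output by
# the same amount, since circular convolution commutes with np.roll
# and ReLU acts elementwise.
out_of_shifted_input = relu(circ_conv(np.roll(x, s), w))
shifted_output = np.roll(relu(circ_conv(x, w)), s)
assert np.allclose(out_of_shifted_input, shifted_output)

# Shift invariance: a permutation-invariant pooling (here, the mean)
# on top of an equivariant layer gives the same scalar for any shift.
assert np.isclose(relu(circ_conv(np.roll(x, s), w)).mean(),
                  relu(circ_conv(x, w)).mean())
```

Stacking several such equivariant layers preserves equivariance, which is why an eq-FCNN needs no pooling layer, while appending the pooling step turns the whole network into a shift-invariant map, as in the FCNN classification setting described above.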

