UNIVERSAL APPROXIMATION THEOREM FOR EQUIVARIANT MAPS BY GROUP CNNS

Anonymous

Abstract

Group symmetry is inherent in a wide variety of data distributions. Data processing that preserves symmetry is described by an equivariant map and is often effective for achieving high performance. Convolutional neural networks (CNNs) are known to be equivariant models and have been shown to approximate equivariant maps for some specific groups. However, universal approximation theorems for CNNs have so far been derived separately, with individual techniques for each group and setting. This paper provides a unified method for obtaining universal approximation theorems for equivariant maps by CNNs in various settings. A significant advantage of our method is that it can handle non-linear equivariant maps between infinite-dimensional spaces for non-compact groups.

1. INTRODUCTION

Deep neural networks have been widely used as models to approximate underlying functions in various machine learning tasks. The expressive power of fully-connected deep neural networks was first mathematically guaranteed by the universal approximation theorem of Cybenko (1989), which states that any continuous function on a compact domain can be approximated to any precision by an appropriate neural network of sufficient width and depth. Beyond this classical result, several variants of the universal approximation theorem have been investigated under different conditions. Among the wide variety of deep neural networks, convolutional neural networks (CNNs) have achieved impressive performance in real applications; in particular, almost all state-of-the-art models for image recognition are based on CNNs. These successes are closely related to the property that CNNs commute with translations of the pixel coordinates. That is, CNNs preserve translation symmetry in image data. In general, this kind of property is known as equivariance, which is a generalization of invariance. When a data distribution has some symmetry and the task to be solved is related to that symmetry, data processing should be equivariant with respect to the symmetry. In recent years, different types of symmetry have been considered for different tasks, and it has been proven that CNNs can approximate arbitrary equivariant data processing for specific symmetries. These results are mathematically captured as universal approximation theorems for equivariant maps and establish the theoretical validity of using CNNs.

To handle symmetric structures in a theoretically correct way, we have to carefully consider the structure of the data space on which data distributions are defined. For example, in image recognition tasks, image data are often supposed to have translation symmetry.
When an image is acquired, the image sensor has only finitely many pixels, so the image is represented by a finite-dimensional vector in a Euclidean space R^d, where d is the number of pixels. However, the finiteness of pixels stems from the limitations of the image sensor, and the raw scene behind the image is better modelled by an element of R^S with continuous spatial coordinates S, where R^S is the set of functions from S to R. The element of R^S is then regarded as a functional representation of the image data in R^d. In this paper, in order to formulate data symmetry appropriately, we treat both the typical data representation in finite-dimensional settings and the functional representation in infinite-dimensional settings in a unified manner.
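One way to make the functional representation and its symmetry precise is the following (the notation here is ours and may differ from the paper's):

```latex
% An observed image x \in \mathbb{R}^d samples an underlying scene
% f \in \mathbb{R}^S at pixel locations s_1, \dots, s_d \in S:
\[
  x_i = f(s_i), \qquad i = 1, \dots, d .
\]
% The translation group acts on \mathbb{R}^S by shifting the argument,
\[
  (T_v f)(s) = f(s - v), \qquad v, s \in S ,
\]
% and a map F \colon \mathbb{R}^S \to \mathbb{R}^S is translation-equivariant if
\[
  F \circ T_v = T_v \circ F \qquad \text{for all } v \in S .
\]
```

In the finite-dimensional setting, S is a finite grid and T_v reduces to a cyclic shift of coordinates; in the infinite-dimensional setting, S is a continuous space and F acts on function spaces, which is the regime where non-compact groups arise.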

