GROUP EQUIVARIANT CONDITIONAL NEURAL PROCESSES

Abstract

We present the group equivariant conditional neural process (EquivCNP), a meta-learning method that is permutation invariant over data sets, as in conventional conditional neural processes (CNPs), and additionally equivariant under transformations of the data space. Incorporating group equivariance, such as rotation and scaling equivariance, provides a way to exploit the symmetry of real-world data. We give a decomposition theorem for permutation-invariant and group-equivariant maps, which leads us to construct EquivCNPs with an infinite-dimensional latent space that handles group symmetries. For practical implementation, we build the architecture with Lie group convolutional layers. We show that EquivCNP with translation equivariance achieves performance comparable to conventional CNPs on a 1D regression task. Moreover, we demonstrate that, by selecting an appropriate Lie group equivariance, EquivCNP is capable of zero-shot generalization on an image-completion task.

1. INTRODUCTION

Data symmetry has played a significant role in deep neural networks. In particular, the convolutional neural network, which is central to many recent achievements of deep learning, has translation equivariance: it preserves the symmetry of the translation group. From the same point of view, many studies have aimed to incorporate various group symmetries into neural networks, especially into the convolution operation (Cohen et al., 2019; Defferrard et al., 2019; Finzi et al., 2020). As example applications, some works have introduced Hamiltonian dynamics to solve dynamics-modeling problems (Greydanus et al., 2019; Toth et al., 2019; Zhong et al., 2019). Similarly, Quessard et al. (2020) estimated group actions by assuming symmetry in the latent space inferred by a neural network. Incorporating the data structure (symmetries) into models as an inductive bias can reduce model complexity and improve generalization. In terms of inductive bias, meta-learning, or learning to learn, provides a way to select an inductive bias from data. Meta-learning uses past experience to adapt quickly to a new task T ∼ p(T) sampled from some task distribution p(T). In supervised meta-learning in particular, a task is described as predicting a set of unlabeled data (target points) given a set of labeled data (context points). Various works have proposed supervised meta-learning methods from different perspectives (Andrychowicz et al., 2016; Ravi & Larochelle, 2016; Finn et al., 2017; Snell et al., 2017; Santoro et al., 2016; Rusu et al., 2018). In this study, we are interested in neural processes (NPs) (Garnelo et al., 2018a;b), meta-learning models with an encoder-decoder architecture (Xu et al., 2019). The encoder is a permutation-invariant function on the context points that maps the contexts into a latent representation.
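The permutation invariance of the encoder can be illustrated with a minimal sketch: a toy DeepSets-style sum-pooling encoder that embeds each context pair independently and then pools. The fixed random linear embedding and the function name `encode` are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def encode(context_x, context_y):
    """Toy permutation-invariant encoder (DeepSets-style sketch):
    embed each (x, y) pair independently, then sum-pool.
    The per-point embedding is a fixed random linear map, purely
    for illustration."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((2, 4))                    # hypothetical embedding weights
    pairs = np.stack([context_x, context_y], axis=-1)  # shape (C, 2)
    embeddings = np.tanh(pairs @ W)                    # shape (C, 4), per-point features
    return embeddings.sum(axis=0)                      # sum-pooling discards ordering

x = np.array([0.1, 0.5, 0.9])
y = np.array([1.0, -0.3, 0.7])
perm = [2, 0, 1]
r1 = encode(x, y)
r2 = encode(x[perm], y[perm])   # same context set, different order
assert np.allclose(r1, r2)      # latent representation is unchanged
```

Because the pooling operation is a sum, any reordering of the context points yields the identical latent vector, which is exactly the invariance property the encoder must satisfy.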
The decoder is a function that produces the conditional predictive distribution of the targets given the latent representation. The objective of NPs is to learn the encoder and the decoder so that the predictive model generalizes well to new tasks after observing a few points of each task. To achieve this, an NP must learn the information shared among the training tasks T ∼ p(T): the data knowledge (Lemke et al., 2015). Each task T is represented by one dataset, and multiple datasets are provided for training NPs to tackle a meta-task. For example, consider a meta-task of completing the missing pixels in a given image. Images are often taken under the same conditions within each dataset. While the datasets contain images of identical subjects (e.g., cars or apples), the size and angle of the subjects may differ; that is, the datasets have group symmetries such as scaling and rotation. Therefore, pre-constraining NPs to be group equivariant is expected to improve their performance on such datasets. In this paper, we investigate the group equivariance of NPs. Specifically, we try to answer two questions: (1) can NPs represent equivariant functions? (2) can we explicitly induce group equivariance in NPs? To answer these questions, we introduce a new family of NPs, EquivCNP, and show theoretically and empirically that EquivCNP is a permutation-invariant and group-equivariant function. Most relevant to EquivCNP, ConvCNP (Gordon et al., 2019) shows theoretically and experimentally that using an ordinary convolution operation yields translation equivariance; however, it does not consider incorporating other groups. First, we introduce a decomposition theorem for permutation-invariant and group-equivariant maps. The theorem suggests that, to preserve the data symmetry, the encoder should map the context points into a latent variable that is a functional representation.
Thereafter, we construct EquivCNP following the theorem. For practical implementation, we adopt LieConv (Finzi et al., 2020). We tackle a 1D synthetic regression task (Garnelo et al., 2018a;b; Kim et al., 2019; Gordon et al., 2019) to show that EquivCNP with translation equivariance is comparable to conventional NPs. Furthermore, we design a 2D image-completion task to investigate the potential of EquivCNP with several group equivariances. As a result, we demonstrate that EquivCNP enables zero-shot generalization by incorporating not translation but scaling equivariance.
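The translation equivariance that ConvCNP and the 1D regression experiments rely on can be verified concretely: for a (circular) convolution, shifting the input signal and then convolving gives the same result as convolving first and then shifting the output. The sketch below uses a hand-rolled circular cross-correlation as a stand-in for a convolutional layer; the names are illustrative.

```python
import numpy as np

def conv1d_circular(signal, kernel):
    """'Same'-size circular cross-correlation; stands in for a CNN layer.
    Circular (wrap) padding keeps the equivariance exact at the borders."""
    pad = len(kernel) // 2
    padded = np.pad(signal, pad, mode='wrap')
    return np.array([np.dot(padded[i:i + len(kernel)], kernel)
                     for i in range(len(signal))])

signal = np.array([0., 1., 2., 3., 2., 1., 0., 0.])
kernel = np.array([0.25, 0.5, 0.25])
shift = 2

# Equivariance: convolving the shifted input equals shifting the output.
lhs = conv1d_circular(np.roll(signal, shift), kernel)
rhs = np.roll(conv1d_circular(signal, kernel), shift)
assert np.allclose(lhs, rhs)
```

With zero padding instead of circular padding the equality would hold only up to boundary effects, which is why the sketch wraps the signal.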

2.1. NEURAL NETWORKS WITH GROUP EQUIVARIANCE

Our work builds upon recent advances in group equivariant convolution operations for deep neural networks. The first approach is group convolution, introduced in Cohen & Welling (2016), where standard convolutional kernels are used and either the kernels or the outputs are transformed with respect to the group. Group convolution induces exact equivariance, but only to the actions of discrete groups. In contrast, for exact equivariance to continuous groups, some works employ harmonic analysis to find a basis of equivariant functions and then parameterize the convolutional kernels in that basis (Weiler & Cesa, 2019). Although this approach can be applied to general types of data (Anderson et al., 2019; Weiler & Cesa, 2019), it is limited in application to compact, unimodular groups. To address these issues, LieConv (Finzi et al., 2020) and other works (Huang et al., 2017; Bekkers, 2019) use Lie groups. EquivCNP adopts LieConv for simplicity of implementation. There are several works that study deep neural networks using data symmetry. In some works, to solve machine-learning problems such as sequence prediction or reinforcement learning, neural networks attempt to learn the symmetry of a physical system directly from noisy observations (Greydanus et al., 2019; Toth et al., 2019; Zhong et al., 2019; Sanchez-Gonzalez et al., 2019). While both these studies and EquivCNP handle data symmetries, EquivCNP is not limited to specific domains such as physics. Furthermore, Quessard et al. (2020) endowed the latent space into which a neural network maps data with group equivariance and estimated the parameters of the data symmetries. In using group equivariance in the latent space, EquivCNP is similar to that study, but differs in being able to use various group equivariances.
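The exact equivariance that group convolution provides for discrete groups can be illustrated with a simpler construction from the same family of ideas: group averaging over the 90-degree rotation group C4. Averaging an arbitrary map f over the group, F(x) = (1/4) Σₖ r⁻ᵏ f(rᵏ x), yields a C4-equivariant map for any f. This is a generic symmetrization sketch, not the paper's LieConv layer; the function names are assumptions.

```python
import numpy as np

def f(img):
    """An arbitrary, deliberately non-equivariant map standing in for a layer."""
    return img ** 2 + np.roll(img, 1, axis=0)

def symmetrize_c4(f):
    """Make f equivariant to 90-degree rotations by group averaging:
    F(x) = (1/4) * sum_k rot^{-k}( f( rot^k(x) ) )."""
    def F(img):
        out = np.zeros_like(img, dtype=float)
        for k in range(4):
            out += np.rot90(f(np.rot90(img, k)), -k)
        return out / 4.0
    return F

rng = np.random.default_rng(1)
img = rng.standard_normal((6, 6))
F = symmetrize_c4(f)

# Equivariance check: rotating the input rotates the output.
assert np.allclose(F(np.rot90(img)), np.rot90(F(img)))
```

The substitution j = k + 1 in the sum shows why this works: rotating the input merely relabels the group elements being averaged over, pulling one rotation out front. Group convolution achieves the same effect more efficiently by transforming the kernels rather than averaging the whole map.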

2.2. FAMILY OF NEURAL PROCESSES

NPs (Garnelo et al., 2018a;b) are deep generative models for regression functions that map an input x_i ∈ ℝ^{d_x} to an output y_i ∈ ℝ^{d_y}. In particular, given an arbitrary number of observed data points (x_C, y_C) := {(x_i, y_i)}_{i=1}^C, NPs model the conditional distribution of the target values y_T at some new,

