META-LEARNING SYMMETRIES BY REPARAMETERIZATION

Abstract

Many successful deep learning architectures are equivariant to certain transformations in order to conserve parameters and improve generalization: most famously, convolution layers are equivariant to shifts of the input. This approach only works when practitioners know the symmetries of the task and can manually construct an architecture with the corresponding equivariances. Our goal is an approach for learning equivariances from data, without needing to design custom task-specific architectures. We present a method for learning and encoding equivariances into networks by learning corresponding parameter sharing patterns from data. Our method can provably represent equivariance-inducing parameter sharing for any finite group of symmetry transformations. Our experiments suggest that it can automatically learn to encode equivariances to common transformations used in image processing tasks. We provide our experiment code at https://github.com/AllanYangZhou/metalearning-symmetries.

1. INTRODUCTION

In deep learning, the convolutional neural network (CNN) (LeCun et al., 1998) is a prime example of exploiting equivariance to a symmetry transformation to conserve parameters and improve generalization. In image classification (Russakovsky et al., 2015; Krizhevsky et al., 2012) and audio processing (Graves and Jaitly, 2014; Hannun et al., 2014) tasks, we may expect the layers of a deep network to learn feature detectors that are translation equivariant: if we translate the input, the output feature map is also translated. Convolution layers satisfy translation equivariance by definition, and produce remarkable results on these tasks. The success of convolution's "built in" inductive bias suggests that we can similarly exploit other equivariances to solve machine learning problems. However, building in inductive biases poses substantial challenges: identifying the correct biases is difficult, and even when we know them, it is often hard to encode them into a neural network. Practitioners commonly avoid this issue by "training in" desired equivariances (usually the special case of invariances) using data augmentation. However, data augmentation is impractical in many problem settings, and we would prefer to build the equivariance into the network itself. For example, sim2real transfer approaches in robotics train agents that are robust to varying conditions by randomizing the simulated environment dynamics (Song et al., 2020). But this type of augmentation is not possible once the agent leaves the simulator and must learn or adapt to a new task in the real world. Additionally, building in incorrect biases may actually be detrimental to final performance (Liu et al., 2018b). In this work we aim for an approach that can automatically learn and encode equivariances into a neural network.
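The translation equivariance of convolution described above can be verified numerically. Below is a minimal sketch using a circular 1-D convolution (cross-correlation, as is conventional in deep learning); the helper name `conv1d_circular` is illustrative, not from the paper:

```python
import numpy as np

def conv1d_circular(x, w):
    """Circular 1-D convolution (cross-correlation) of signal x with filter w."""
    n, k = len(x), len(w)
    return np.array([sum(w[j] * x[(i + j) % n] for j in range(k)) for i in range(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # input signal
w = rng.standard_normal(3)   # filter

# Equivariance: convolving a shifted input equals shifting the convolved output.
lhs = conv1d_circular(np.roll(x, 2), w)
rhs = np.roll(conv1d_circular(x, w), 2)
assert np.allclose(lhs, rhs)
```

The identity holds for every shift amount, which is exactly the translation equivariance that convolution provides "for free."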
This would free practitioners from having to design custom equivariant architectures for each task, and allow them to transfer any learned equivariances to new tasks. Neural network layers can achieve various equivariances through parameter sharing patterns, such as the spatial parameter sharing of standard convolutions. In this paper we reparameterize network layers to learnably represent sharing patterns. We leverage meta-learning to learn the sharing patterns that help a model generalize on new tasks. The primary contribution of this paper is an approach to automatically learn equivariance-inducing parameter sharing, instead of using custom-designed equivariant architectures. We show theoretically that reparameterization can represent networks equivariant to any finite symmetry group. Our experiments show that meta-learning can recover various convolutional architectures from data, and learn invariances to common data augmentation transformations.
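To make the idea of reparameterization concrete: one can write a weight matrix as W_ij = Σ_p U_ijp v_p, where a pattern tensor ties weights together and a small vector holds the free parameter values. The sketch below hand-codes a pattern that recovers a circulant (circular-convolution) matrix; the names `U` and `v` and the tensor form are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

n, k = 6, 3  # layer size, number of free filter parameters

# Hypothetical sharing pattern U: U[i, j, p] = 1 wherever weight W[i, j]
# should equal filter parameter v[p]. This particular pattern ties the
# weights into a circulant (circular-convolution) matrix.
U = np.zeros((n, n, k))
for i in range(n):
    for p in range(k):
        U[i, (i + p) % n, p] = 1.0

v = np.array([1.0, 2.0, 3.0])          # filter parameter values
W = np.einsum('ijp,p->ij', U, v)       # reparameterized weight matrix

# Every row of W is a shifted copy of the filter: a convolution layer.
assert np.allclose(W[0, :3], v)
assert np.allclose(W[1, 1:4], v)
```

With this factorization, learning the pattern tensor amounts to learning *which* weights are shared, while the filter values remain free parameters that can be fit per task.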

2. RELATED WORK

A number of works have studied designing layers with equivariances to certain transformations such as permutation, rotation, reflection, and scaling (Gens and Domingos, 2014; Cohen and Welling, 2016; Zaheer et al., 2017; Worrall et al., 2017; Cohen et al., 2019; Weiler and Cesa, 2019; Worrall and Welling, 2019). These approaches focus on manually constructing layers analogous to standard convolution, but for other symmetry groups. Rather than building symmetries into the architecture, data augmentation (Beymer and Poggio, 1995; Niyogi et al., 1998) trains a network to satisfy them. Diaconu and Worrall (2019) use a hybrid approach that pre-trains a basis of rotated filters in order to define roto-translation equivariant convolution. Unlike these works, we aim to automatically build in symmetries by acquiring them from data. Our approach is motivated in part by theoretical work characterizing the nature of equivariant layers for various symmetry groups. In particular, the analysis of our method as learning a certain kind of convolution is inspired by Kondor and Trivedi (2018), who show that under certain conditions all linear equivariant layers are (generalized) convolutions. Shawe-Taylor (1989) and Ravanbakhsh et al. (2017) analyze the relationship between desired symmetries in a layer and symmetries of the weight matrix. Ravanbakhsh et al. (2017) show that a layer can be made equivariant to the permutation representation of any discrete group through a corresponding parameter sharing pattern in the weight matrix. From this perspective, our reparameterization is a way of representing possible parameter sharing patterns, and the training procedure aims to learn the pattern that achieves a desired equivariance. Prior work on automatically learning symmetries includes methods for learning invariances in Gaussian processes (van der Wilk et al., 2018) and learning symmetries of physical systems (Greydanus et al., 2019; Cranmer et al., 2020).
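The permutation case gives a concrete picture of the symmetry-to-sharing correspondence: tying all diagonal entries of a weight matrix to one parameter and all off-diagonal entries to another (the two-parameter pattern of Zaheer et al., 2017) yields a layer equivariant to every permutation of the input. A minimal numerical check, with parameter names chosen for illustration:

```python
import numpy as np

n = 5
lam, gam = 2.0, 0.5   # the layer's only two free parameters

# Two-parameter sharing pattern: one shared weight on the diagonal,
# another shared weight everywhere off the diagonal.
W = lam * np.eye(n) + gam * np.ones((n, n))

rng = np.random.default_rng(1)
P = np.eye(n)[rng.permutation(n)]      # random permutation matrix
x = rng.standard_normal(n)

# The sharing pattern makes W commute with every permutation: P W P^T = W,
# hence W(Px) = P(Wx) -- permutation equivariance.
assert np.allclose(W @ P @ x, P @ W @ x)
```

The same logic generalizes: for any discrete group acting by permutations, a weight matrix left unchanged by the group's conjugation action gives an equivariant layer, and that invariance is exactly a parameter sharing pattern.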
Another very recent line of work has shown that more general Transformer-style architectures (Vaswani et al., 2017) can match or outperform traditional CNNs on image tasks, without baking in translation symmetry (Dosovitskiy et al., 2020). Their results suggest that Transformer architectures can automatically learn symmetries and other inductive biases from data, but typically only with very large training datasets. One can also consider automatic data augmentation strategies (Cubuk et al., 2018; Lorraine et al., 2019) as a way of learning symmetries, though the symmetries are not embedded into the network in a transferable way. Concurrent work by Benton et al. (2020) aims to learn invariances from data by learning distributions over transformations of the input, similar to learned data augmentation. Our method instead aims to learn parameter sharing of the layer weights, which induces equivariance. Additionally, our objective for learning symmetries is driven directly by generalization error (in a meta-learning framework), while the objective in Benton et al. (2020) adds a regularizer to the training loss to encourage symmetry learning. Our work is related to neural architecture search (Zoph and Le, 2016; Brock et al., 2017; Liu et al., 2018a; Elsken et al., 2018), which also aims to automate part of the model design process. Although architecture search methods are varied, they are generally not designed to exploit symmetry or learn equivariances. Evolutionary methods for learning both network weights and topology (Stanley and Miikkulainen, 2002; Stanley et al., 2009) are also not motivated by symmetry considerations. Our method learns to exploit symmetries that are shared by a collection of tasks, a form of meta-learning (Thrun and Pratt, 2012; Schmidhuber, 1987; Bengio et al., 1992; Hochreiter et al., 2001).
We extend gradient based meta-learning (Finn et al., 2017; Li et al., 2017; Antoniou et al., 2018) to separately learn parameter sharing patterns (which enforce equivariance) and actual parameter values. Separately representing network weights in terms of a sharing pattern and parameter values is a form of reparameterization. Prior work has used weight reparameterization in order to "warp" the loss surface (Lee and Choi, 2018; Flennerhag et al., 2019) and to learn good latent spaces (Rusu et al., 2018) for optimization, rather than to encode equivariance. HyperNetworks (Ha et al., 2016; Schmidhuber, 1992) generate network layer weights using a separate smaller network, which can be viewed as a nonlinear reparameterization, albeit not one that encourages learning equivariances. Modular meta-learning (Alet et al., 2018) is a related technique that aims to achieve combinatorial generalization on new tasks by stacking meta-learned "modules," each of which is a neural network.
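The split described above can be sketched as a toy first-order, one-inner-step training loop: the sharing pattern is updated in the outer loop across tasks, while the parameter values adapt per task in the inner loop. Everything here (the linear-regression task family, step sizes, single inner step, and names `U`, `v`) is an illustrative assumption, not the paper's exact algorithm:

```python
import numpy as np

n, k = 4, 2
rng = np.random.default_rng(0)

# Ground-truth sharing pattern defining the task family: circulant (conv-like).
U_true = np.zeros((n, n, k))
for i in range(n):
    for p in range(k):
        U_true[i, (i + p) % n, p] = 1.0

U = rng.standard_normal((n, n, k)) * 0.1   # meta-learned sharing pattern

def grads(U, v, X, Y):
    """Exact gradients of the mean squared error through W = U.v."""
    W = np.einsum('ijp,p->ij', U, v)
    err = X @ W.T - Y
    gW = err.T @ X / err.size
    gv = np.einsum('ijp,ij->p', U, gW)     # chain rule: dL/dv
    gU = np.einsum('ij,p->ijp', gW, v)     # chain rule: dL/dU
    return gU, gv

for step in range(200):                    # outer loop over sampled tasks
    v_task = rng.standard_normal(k)        # each task: a random filter
    W_task = np.einsum('ijp,p->ij', U_true, v_task)
    X = rng.standard_normal((16, n))
    Y = X @ W_task.T
    v = np.zeros(k)
    _, gv = grads(U, v, X, Y)              # inner loop: adapt values v only
    v = v - 0.5 * gv
    gU, _ = grads(U, v, X, Y)              # outer loop: update sharing pattern
    U = U - 0.05 * gU                      # (first-order approximation)
```

Real gradient-based meta-learning would take multiple inner steps and differentiate through them; the point of the sketch is only the division of labor between the two loops.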

