EQUIVARIANCE-AWARE ARCHITECTURAL OPTIMIZATION OF NEURAL NETWORKS

Abstract

Incorporating equivariance to symmetry groups as a constraint during neural network training can improve performance and generalization for tasks exhibiting those symmetries, but such symmetries are often neither perfectly nor explicitly present. This motivates algorithmically optimizing the architectural constraints imposed by equivariance. We propose the equivariance relaxation morphism, which preserves functionality while reparametrizing a group equivariant layer to operate with equivariance constraints on a subgroup, as well as the [G]-mixed equivariant layer, which mixes layers constrained to different groups to enable within-layer equivariance optimization. We further present evolutionary and differentiable neural architecture search (NAS) algorithms that utilize these mechanisms respectively for equivariance-aware architectural optimization. Experiments across a variety of datasets show the benefit of dynamically constrained equivariance for finding effective architectures with approximate equivariance.

1. INTRODUCTION

Constraining neural networks to be equivariant to symmetry groups present in the data can improve their task performance, efficiency, and generalization capabilities (Bronstein et al., 2021), as shown by translation-equivariant convolutional neural networks (Fukushima & Miyake, 1982; LeCun et al., 1989) for image-based tasks (LeCun et al., 1998). Seminal works have developed general theories and architectures for equivariance in neural networks, providing a blueprint for equivariant operations on complex structured data (Cohen & Welling, 2016; Ravanbakhsh et al., 2017; Kondor & Trivedi, 2018; Weiler et al., 2021). However, these works design model constraints based on an explicit equivariance property. Furthermore, their architectural assumption of full equivariance in every layer may be overly constraining; e.g., in handwritten digit recognition, full equivariance to 180° rotation may lead to misclassifying samples of "6" and "9". Weiler & Cesa (2019) found that local equivariance from a final subgroup convolutional layer improves performance over full equivariance. If appropriate equivariance constraints are instead learned, the benefits of equivariance could extend to applications where the data may have unknown or imperfect symmetries.

Learning approximate equivariance has recently been approached via novel layer operations (Wang et al., 2022; Finzi et al., 2021; Zhou et al., 2020; Yeh et al., 2022; Basu et al., 2021). Separately, the field of neural architecture search (NAS) aims to optimize full neural network architectures (Zoph & Le, 2017; Real et al., 2017; Elsken et al., 2017; Liu et al., 2018; Lu et al., 2019). Existing NAS methods have not yet explicitly optimized equivariance, although partial or soft equivariant approaches such as Romero & Lohit (2022) and van der Ouderaa et al. (2022) approach custom equivariant architectures.
An important aspect of NAS is network morphisms: function-preserving architectural changes (Wei et al., 2016) which can be used during training to change the loss landscape and gradient descent trajectory while immediately maintaining the current functionality and loss value (Maile et al., 2022). Developing tools for searching over a space of architectural representations of equivariance would permit NAS algorithms to be applied towards architectural optimization of equivariance.

Contributions. First, we present two mechanisms towards equivariance-aware architectural optimization. The equivariance relaxation morphism for group convolutional layers partially expands the representation and parameters of the layer to enable less constrained learning with a prior on symmetry. The [G]-mixed equivariant layer parametrizes a layer as a weighted sum of layers equivariant to different groups, permitting the learning of architectural weighting parameters. Second, we implement these concepts within two algorithms for architectural optimization of partially-equivariant networks. Evolutionary Equivariance-Aware NAS (EquiNAS_E) utilizes the equivariance relaxation morphism in a greedy evolutionary algorithm, dynamically relaxing constraints throughout the training process. Differentiable Equivariance-Aware NAS (EquiNAS_D) implements [G]-mixed equivariant layers throughout a network to learn the appropriate approximate equivariance of each layer, in addition to their optimized weights, during training. Finally, we analyze the proposed mechanisms via their respective NAS approaches on multiple image classification tasks, investigating how the dynamically learned approximate equivariance affects training and performance relative to baseline models and other approaches.
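As a rough sketch of the [G]-mixed equivariant layer idea, the snippet below mixes candidate layers, each constrained to a different group, using softmax-normalized architectural weights learnable by gradient descent. The function names and the softmax parametrization are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def g_mixed_layer(x, candidate_layers, arch_logits):
    """[G]-mixed equivariant layer (illustrative sketch).

    candidate_layers: callables, each a layer constrained to a
                      different symmetry group G_i.
    arch_logits:      unconstrained architectural parameters; the
                      softmax turns them into mixing weights that can
                      be learned jointly with the layer weights.
    """
    w = softmax(arch_logits)
    # Output is the convex combination of the candidate layers' outputs.
    return sum(wi * layer(x) for wi, layer in zip(w, candidate_layers))
```

With equal logits, the mixture reduces to an unweighted average of the candidates; during search, the logits would shift mass toward the equivariance constraint that best fits the data.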

2. RELATED WORKS

Approximate equivariance. Although no other works on approximate equivariance explicitly study architectural optimization, some approaches are architectural in nature. We compare our contributions with the most conceptually similar works to our knowledge.

Neural architecture search. Neural architecture search (NAS) aims to optimize both the architecture and its parameters for a given task. Liu et al. (2018) approach this difficult bi-level optimization by creating a large super-network containing all possible elements and continuously relaxing the discrete architectural parameters to enable search by gradient descent. Other NAS approaches include evolutionary algorithms (Real et al., 2017; Lu et al., 2019; Elsken et al., 2017) and reinforcement learning (Zoph & Le, 2017), which search over discretely represented architectures.

3. BACKGROUND

We assume familiarity with group theory (see Appendix A.1). For a discrete group $G$, the $l$-th $G$-equivariant group convolutional layer (Cohen & Welling, 2016) of a group convolutional neural network (G-CNN) convolves¹ the feature map $f : G \to \mathbb{R}^{C_{l-1}}$ from the previous layer with a filter of kernel size $k$, represented as learnable parameters $\psi : G \to \mathbb{R}^{C_l \times C_{l-1}}$. For each output channel $c' \le C_l$ and group element $g \in G$:

$$[f \star \psi_{c'}](g) = \sum_{h \in G} \sum_{c=1}^{C_{l-1}} f_c(h)\, \psi_{c',c}(g^{-1}h).$$

¹We identify the correlation and convolution operators, as they differ only in where the inverse group element is placed, and refer to both as "convolution" throughout this work.
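To make the group convolution concrete, here is a minimal sketch for the cyclic group $\mathbb{Z}_n$, where group elements are residues mod $n$ and $g^{-1}h = (h - g) \bmod n$. The function name and array layout are illustrative choices, not from the paper.

```python
import numpy as np

def group_conv_cyclic(f, psi, n):
    """Discrete group convolution over the cyclic group Z_n.

    f:   feature map of shape (C_in, n),       f_c(h) for each h in Z_n
    psi: filter of shape (C_out, C_in, n),     psi_{c',c}(g) for each g in Z_n
    Returns the output feature map of shape (C_out, n).
    """
    C_out, _, _ = psi.shape
    out = np.zeros((C_out, n))
    for g in range(n):
        for h in range(n):
            # g^{-1} h = (h - g) mod n for the cyclic group
            out[:, g] += psi[:, :, (h - g) % n] @ f[:, h]
    return out
```

Cyclically shifting the input feature map produces the same cyclic shift of the output, which is exactly the equivariance property the layer encodes.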



The main contributions of Basu et al. (2021) and Agrawal & Ostrowski (2022) are similar to our proposed equivariance relaxation morphism. Basu et al. (2021) also utilizes subgroup decomposition but instead algorithmically builds up equivariances from smaller groups, while our work focuses on relaxing existing constraints. Agrawal & Ostrowski (2022) presents theoretical contributions towards network morphisms for group-invariant shallow neural networks; in comparison, our work focuses on deep group convolutional architectures and implements the morphism in a NAS algorithm.

The main contributions of Wang et al. (2022) and Finzi et al. (2021) are similar to our proposed [G]-mixed equivariant layer. Wang et al. (2022) also uses a weighted sum of filters, but uses the same group for each filter and defines the weights over the domain of group elements. Finzi et al. (2021) uses an equivariant layer in parallel to a linear layer with weighted regularization, thus only using two layers in parallel and weighting them by regularization rather than parametrization. Mouli & Ribeiro (2021) also progressively relaxes equivariance constraints, but with regularized rather than parametrized constraints. In more diverse approaches, Zhou et al. (2020) and Yeh et al. (2022) represent symmetry-inducing weight sharing via learnable matrices. Romero & Lohit (2022) and van der Ouderaa et al. (2022) learn partial or soft equivariances for each layer.
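The function-preserving character of a relaxation morphism can be illustrated with a deliberately tiny toy example (this is not the paper's coset-based construction for group convolutions): untying a weight shared across symmetry-related terms into independent copies, initialized equal, leaves the output unchanged at the moment of the morphism while allowing later training to break the symmetry.

```python
def tied_layer(x1, x2, w):
    # Weight tying enforces the symmetry constraint: one filter w
    # is shared across the two symmetry-related inputs.
    return w * x1 + w * x2

def relax(w):
    # Morphism: expand the shared filter into per-branch copies.
    # Initializing both copies to w preserves the function exactly.
    return w, w

def relaxed_layer(x1, x2, w1, w2):
    # After relaxation, w1 and w2 are free to diverge under gradient
    # descent, i.e. the constraint is lifted without a loss jump.
    return w1 * x1 + w2 * x2
```

The same principle underlies the equivariance relaxation morphism: the relaxed layer starts from the constrained layer's function and loss value, then trains with more free parameters.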

