EQUIVARIANCE-AWARE ARCHITECTURAL OPTIMIZATION OF NEURAL NETWORKS

Abstract

Incorporating equivariance to symmetry groups as a constraint during neural network training can improve performance and generalization for tasks exhibiting those symmetries, but such symmetries are often neither perfectly nor explicitly present. This motivates algorithmically optimizing the architectural constraints imposed by equivariance. We propose the equivariance relaxation morphism, which preserves functionality while reparametrizing a group equivariant layer to operate with equivariance constraints on a subgroup, as well as the [G]-mixed equivariant layer, which mixes layers constrained to different groups to enable within-layer equivariance optimization. We further present evolutionary and differentiable neural architecture search (NAS) algorithms that use these two mechanisms, respectively, for equivariance-aware architectural optimization. Experiments across a variety of datasets show the benefit of dynamically constrained equivariance in finding effective architectures with approximate equivariance.

1. INTRODUCTION

Constraining neural networks to be equivariant to symmetry groups present in the data can improve their task performance, efficiency, and generalization capabilities (Bronstein et al., 2021), as shown by translation-equivariant convolutional neural networks (Fukushima & Miyake, 1982; LeCun et al., 1989) for image-based tasks (LeCun et al., 1998). Seminal works have developed general theories and architectures for equivariance in neural networks, providing a blueprint for equivariant operations on complex structured data (Cohen & Welling, 2016; Ravanbakhsh et al., 2017; Kondor & Trivedi, 2018; Weiler et al., 2021). However, these works design model constraints based on an explicit equivariance property. Furthermore, their architectural assumption of full equivariance in every layer may be overly constraining; e.g., in handwritten digit recognition, full equivariance to 180° rotation may lead to misclassifying samples of "6" and "9". Weiler & Cesa (2019) found that local equivariance from a final subgroup convolutional layer improves performance over full equivariance.

If appropriate equivariance constraints are instead learned, the benefits of equivariance could extend to applications where the data may have unknown or imperfect symmetries. Learning approximate equivariance has recently been approached via novel layer operations (Wang et al., 2022; Finzi et al., 2021; Zhou et al., 2020; Yeh et al., 2022; Basu et al., 2021). Separately, the field of neural architecture search (NAS) aims to optimize full neural network architectures (Zoph & Le, 2017; Real et al., 2017; Elsken et al., 2017; Liu et al., 2018; Lu et al., 2019). Existing NAS methods have not yet explicitly optimized equivariance, although partial or soft equivariant approaches such as Romero & Lohit (2022) and van der Ouderaa et al. (2022) move toward custom equivariant architectures.
An important aspect of NAS is network morphisms: function-preserving architectural changes (Wei et al., 2016) which can be used during training to change the loss landscape and gradient descent trajectory while immediately maintaining the current functionality and loss value (Maile et al., 2022). Developing tools for searching over a space of architectural representations of equivariance would permit NAS algorithms to be applied towards architectural optimization of equivariance.
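
To make the idea of a function-preserving change concrete, the following is a minimal NumPy sketch, not the method proposed in this paper. It assumes a one-dimensional circular convolution as the equivariant layer (equivariant to cyclic shifts) and expands its shared kernel into an unconstrained dense weight matrix. The names `circular_conv`, `relax_to_dense`, and `is_shift_equivariant` are hypothetical and chosen only for illustration.

```python
import numpy as np

def circular_conv(kernel, x):
    """Circular convolution: y[i] = sum_j kernel[(i - j) % n] * x[j]."""
    return np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(x)))

def relax_to_dense(kernel):
    """Illustrative (hypothetical) morphism: copy the shared kernel into a
    dense matrix W with W[i, j] = kernel[(i - j) % n]. W computes the same
    function as the convolution, but its entries are no longer tied."""
    n = kernel.shape[0]
    i, j = np.indices((n, n))
    return kernel[(i - j) % n]

def is_shift_equivariant(layer, x, shift=1):
    """Check layer(shift(x)) == shift(layer(x)) for one cyclic shift."""
    return bool(np.allclose(layer(np.roll(x, shift)), np.roll(layer(x), shift)))

rng = np.random.default_rng(0)
kernel = rng.normal(size=8)
x = rng.normal(size=8)

# Apply the morphism: the output (and hence the loss) is unchanged.
W = relax_to_dense(kernel)
same_function = bool(np.allclose(W @ x, circular_conv(kernel, x)))
equivariant_before = is_shift_equivariant(lambda v: W @ v, x)

# Stand-in for a later gradient step on the now-unconstrained weights:
# perturbing a single entry breaks the weight sharing.
W_trained = W.copy()
W_trained[0, 1] += 1.0
equivariant_after = is_shift_equivariant(lambda v: W_trained @ v, x)
```

In this sketch the dense layer reproduces the convolution exactly at the moment of the change, so `same_function` and `equivariant_before` should hold, while the perturbed weights are free to leave the equivariant subspace. Relaxing all the way to a dense layer is only the simplest case chosen for the example; relaxing to a nontrivial subgroup would keep part of the weight sharing.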

