THE SURPRISING EFFECTIVENESS OF EQUIVARIANT MODELS IN DOMAINS WITH LATENT SYMMETRY

Abstract

Extensive work has demonstrated that equivariant neural networks can significantly improve sample efficiency and generalization by enforcing an inductive bias in the network architecture. These applications typically assume that the domain symmetry is fully described by explicit transformations of the model inputs and outputs. However, many real-life applications contain only latent or partial symmetries which cannot be easily described by simple transformations of the input. In these cases, it is necessary to learn symmetry in the environment instead of imposing it mathematically on the network architecture. We discover, surprisingly, that imposing equivariance constraints that do not exactly match the domain symmetry is very helpful in learning the true symmetry in the environment. We differentiate between extrinsic and incorrect symmetry constraints and show that while imposing incorrect symmetry can impede the model's performance, imposing extrinsic symmetry can actually improve performance. We demonstrate that an equivariant model can significantly outperform non-equivariant methods on domains with latent symmetries both in supervised learning and in reinforcement learning for robotic manipulation and control problems.

1. INTRODUCTION

Recently, equivariant learning has shown great success in various machine learning domains like trajectory prediction (Walters et al., 2020), robotics (Simeonov et al., 2022), and reinforcement learning (Wang et al., 2022c). Equivariant networks (Cohen & Welling, 2016; 2017) can improve generalization and sample efficiency during learning by encoding task symmetries directly into the model structure. However, this requires the problem symmetries to be perfectly known and modeled at design time, which is sometimes problematic. It is often the case that the designer knows that a latent symmetry is present in the problem but cannot easily express how that symmetry acts in the input space. For example, Figure 1b is a rotation of Figure 1a. However, it is not a rotation of the image; it is a rotation of the objects present in the image, which are viewed from an oblique angle. In order to model this rotational symmetry, the designer must know the viewing angle and somehow transform the data or encode projective geometry into the model. This is difficult, and it makes the entire approach less attractive. In this situation, the conventional wisdom would be to discard the model structure altogether, since it is not fully known, and to use an unconstrained model. Instead, we explore whether it is possible to benefit from equivariant models even when the way a symmetry acts on the problem input is not precisely known. We show empirically that this is indeed the case and that an inaccurate equivariant model is often better than a completely unstructured model. For example, suppose we want to model a function with the object-wise rotation symmetry expressed in Figures 1a and 1b. Notice that whereas it is difficult to encode the object-wise symmetry, it is easy to encode an image-wise symmetry, because it involves simple image rotations.
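Image-wise rotational symmetry is both easy to impose and easy to verify numerically. As a toy illustration (a hypothetical NumPy sketch, not this paper's architecture), any map h can be made C4-equivariant by group averaging, f(x) = (1/|G|) Σ_{g∈G} g·h(g⁻¹·x); the resulting f satisfies the same commutation property f(g·x) = g·f(x) that equivariant network layers enforce by construction:

```python
import numpy as np

def h(x):
    # an arbitrary, non-equivariant map: mix each pixel with its left neighbour
    return x + 0.5 * np.roll(x, shift=1, axis=1)

def f(x):
    # C4 group averaging: f(x) = 1/4 * sum_k rot90^k( h( rot90^{-k}(x) ) )
    return sum(np.rot90(h(np.rot90(x, -k)), k) for k in range(4)) / 4.0

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))

# equivariance: f(g.x) == g.f(x) for a 90-degree rotation g
assert np.allclose(f(np.rot90(x)), np.rot90(f(x)))
# the unconstrained map h does not commute with rotation
assert not np.allclose(h(np.rot90(x)), np.rot90(h(x)))
```

This image-wise constraint is exactly what is cheap to encode in an architecture; the object-wise symmetry of Figure 1b admits no such simple pixel-space transformation.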
Although the image-wise symmetry model is imprecise in this situation, our experiments indicate that this imprecise model is still a much better choice than a completely unstructured model. This paper makes three contributions. First, we define three different relationships between problem symmetry and model symmetry: correct equivariance, incorrect equivariance, and extrinsic equivariance. Correct equivariance means the model correctly models the problem symmetry; incorrect equivariance means the model symmetry interferes with the problem symmetry; and extrinsic equivariance means the model symmetry transforms in-distribution input data to out-of-distribution data. We theoretically derive an upper bound on the performance of an incorrectly constrained equivariant model. Second, we empirically compare extrinsic and incorrect equivariance in a supervised learning task and show that a model with extrinsic equivariance can improve performance compared with an unconstrained model. Finally, we explore this idea in a reinforcement learning context and show that an extrinsically constrained model can outperform state-of-the-art conventional CNN baselines. Supplementary video and code are available at https://pointw.github.io/extrinsic_page/.

2. RELATED WORK

Sample Efficient Reinforcement Learning. One traditional approach to improving sample efficiency is to create additional samples using data augmentation (Krizhevsky et al., 2017). Recent works show that simple image augmentations like random crop (Laskin et al., 2020b; Yarats et al., 2022) or random shift (Yarats et al., 2021) can improve the performance of reinforcement learning. Such image augmentation can be combined with contrastive learning (Oord et al., 2018) to achieve better performance (Laskin et al., 2020a; Zhan et al., 2020). Recently, many prior works have shown that equivariant methods can achieve very high sample efficiency in reinforcement learning (van der Pol et al., 2020; Mondal et al., 2020; Wang et al., 2021; 2022c) and enable on-robot reinforcement learning (Zhu et al., 2022; Wang et al., 2022a). However, these works are limited to fully equivariant domains. This paper extends prior work by applying equivariant reinforcement learning to tasks with latent symmetries.

Figure 1: Object vs. image transforms. The object transform rotates the object itself (b), while the image transform rotates the whole image (c). We propose to use the image transform to help model the object transform.

Equivariant Neural Networks. Equivariant networks were first introduced as G-Convolutions (Cohen & Welling, 2016) and Steerable CNNs (Cohen & Welling, 2017; Weiler & Cesa, 2019; Cesa et al., 2021). Equivariant learning has been applied to various types of data including images (Weiler & Cesa, 2019), spherical data (Cohen et al., 2018), point clouds (Dym & Maron, 2020), sets (Maron et al., 2020), and meshes (De Haan et al., 2020), and has shown great success in tasks including molecular dynamics (Anderson et al., 2019), particle physics (Bogatskiy et al., 2020), fluid dynamics (Wang et al., 2020), trajectory prediction (Walters et al., 2020), robotics (Simeonov et al., 2022; Zhu et al., 2022; Huang et al., 2022), and reinforcement learning (Wang et al., 2021; 2022c). Compared with these prior works, which assume the domain symmetry is perfectly known, this work studies the effectiveness of equivariant networks in domains with latent symmetries.

Symmetric Representation Learning. Since a latent symmetry is not expressible as a simple transformation of the input, equivariant networks cannot be used in the standard way. Several works have therefore turned to learning equivariant features which can be easily transformed. Park et al. (2022) learn an encoder which maps inputs to equivariant features which can be used by downstream equivariant layers. Quessard et al. (2020), Klee et al. (2022), and Marchetti et al. (2022) map 2D image inputs to elements of various groups including SO(3), allowing for disentanglement and equivariance constraints. Falorsi et al. (2018) use a homeomorphic VAE to perform the same task in an unsupervised manner. Dangovski et al. (2021) consider equivariant representations learned in a self-supervised manner using losses that encourage sensitivity or insensitivity to various symmetries. Our method may be considered an example of symmetric representation learning which, unlike the above methods, uses an equivariant neural network as an encoder. Zhou et al. (2020) and Dehmamy et al. (2021) assume no prior knowledge of the structure of symmetry in the domain and learn the symmetry transformations on inputs and latent features end-to-end with the task function. In comparison, our work assumes that the latent symmetry is known but how it acts on the input is unknown.
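For concreteness, the lifting layer at the heart of a C4 G-Convolution (Cohen & Welling, 2016) can be sketched in plain NumPy. This is a simplified single-channel, "valid"-correlation illustration with helper names of our own, not code from any cited work: the input is correlated with four rotated copies of one filter, and rotating the input by 90° rotates each of the four output maps and cyclically permutes them.

```python
import numpy as np

def corr2d(x, w):
    # plain 'valid' cross-correlation of image x with filter w
    H, W = x.shape
    k = w.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def lift_conv(x, w):
    # C4 lifting layer: one feature map per rotated copy of the filter
    return np.stack([corr2d(x, np.rot90(w, k)) for k in range(4)])

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))

y = lift_conv(x, w)
y_rot = lift_conv(np.rot90(x), w)
# rotating the input rotates each map and cyclically permutes the four maps
for k in range(4):
    assert np.allclose(y_rot[k], np.rot90(y[(k - 1) % 4]))
```

The permutation-plus-rotation structure of the output is what downstream group-equivariant layers exploit; when the true symmetry is only latent (as in this paper), the same constraint becomes extrinsic rather than correct.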

