LIE ALGEBRA CONVOLUTIONAL NETWORKS WITH AUTOMATIC SYMMETRY EXTRACTION

Abstract

Existing methods for incorporating symmetries into neural network architectures require prior knowledge of the symmetry group. We propose instead to learn the symmetries during the training of a group-equivariant architecture. Our model, the Lie algebra convolutional network (L-conv), is based on the infinitesimal generators of continuous groups and does not require discretization or integration over the group. We show that L-conv can approximate any group convolutional layer by composition of layers. We demonstrate how CNNs, Graph Convolutional Networks, and fully-connected networks can all be expressed as an L-conv with appropriate groups. By allowing the infinitesimal generators to be learnable, L-conv can learn potential symmetries. We also show how, in linear settings, the symmetries are related to the statistics of the dataset: we find an analytical relationship between the symmetry group and a subgroup of an orthogonal group preserving the covariance of the input. Our experiments show that L-conv with trainable generators performs well on problems with hidden symmetries. Due to parameter sharing, L-conv also uses far fewer parameters than fully-connected layers.



Many machine learning (ML) tasks involve data from unfamiliar domains, which may or may not have hidden symmetries. While much of the work on equivariant neural networks focuses on equivariant architectures, the ability of an architecture to discover symmetries in a given dataset is less studied. Convolutional Neural Networks (CNNs) (LeCun et al., 1989; 1998) incorporate translation symmetry into the architecture. Recently, more general ways to construct equivariant architectures have been introduced (Cohen & Welling, 2016a;b; Cohen et al., 2018; Kondor & Trivedi, 2018). Encoding equivariance into an ML architecture can reduce data requirements and improve generalization, while significantly reducing the number of model parameters via parameter sharing (Cohen et al., 2019; Cohen & Welling, 2016b; Ravanbakhsh et al., 2017; Ravanbakhsh, 2020). As a result, many other symmetries, such as discrete rotations in 2D (Veeling et al., 2018; Marcos et al., 2017) and 3D (Cohen et al., 2018; Cohen & Welling, 2016a) as well as permutations (Zaheer et al., 2017), have been incorporated into the architecture of neural networks. Many existing works on equivariant architectures use finite groups, such as permutations in Hartford et al. (2018) and Ravanbakhsh et al. (2017), or discrete subgroups of continuous groups, such as 90 degree rotations in Cohen et al. (2018) or dihedral groups D_N in Weiler & Cesa (2019). Ravanbakhsh (2020) also proved a universal approximation theorem for single hidden layer equivariant neural networks for Abelian and finite groups. General principles for constructing group convolutional layers were introduced in Cohen & Welling (2016b), Kondor & Trivedi (2018), and Cohen et al. (2019), including for continuous groups. A challenge for implementation is having to integrate over the group manifold.
This has been remedied either by generalizing Fast Fourier Transforms (Cohen et al., 2018), or by using irreducible representations (irreps) (Weiler et al., 2018a), either directly as spherical harmonics as in Worrall et al. (2017) or via more general Clebsch-Gordan coefficients (Kondor et al., 2018). Other approaches include discretizing the group as in Weiler et al. (2018a;b) and Cohen & Welling (2016a), solving constraints for equivariant irreps as in Weiler & Cesa (2019), or approximating the integral by sampling (Finzi et al., 2020). The limitations of all of the approaches above are that: 1) they rely on knowing the symmetry group a priori, and 2) they require encoding the whole group into the architecture. For a continuous group, it is not possible to encode all elements, and we have to resort to discretization or a truncated sum over irreps. Our work attempts to resolve the issues with continuous groups by using the Lie algebra (the linearization of the group near its identity) instead of the group itself. Unlike the Lie group, which is infinite, the Lie algebra usually has a finite basis (a notable exception being the Kac-Moody Lie algebras of 2D Conformal Field Theories (Belavin et al., 1984) in physics). Additionally, we show that the Lie algebra basis can be learned during training, or through a separate optimization process. Hence, our architecture, which generalizes a group convolutional layer, is potentially capable of learning symmetries in data without imposing inductive biases. Learning symmetries in data has been tackled in restricted settings: mostly commutative Lie groups as in Cohen & Welling (2014), 2D rotations and translations in Rao & Ruderman (1999) and Sohl-Dickstein et al. (2010), or permutations (Anselmi et al., 2019). However, the symmetries learned by the architecture are not necessarily familiar spatial symmetries.
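The key observation above, that a Lie algebra has a finite basis even though the Lie group it generates is infinite, can be made concrete with a minimal numerical sketch. The example below (an illustration, not part of the paper's architecture) uses the single generator of so(2): exponentiating it with scipy's `expm` produces any 2D rotation, and near the identity the group element is linear in the generator, which is the infinitesimal view that L-conv exploits.

```python
import numpy as np
from scipy.linalg import expm

# so(2) has a single basis generator; exponentiating it sweeps out
# the entire (infinite) group of 2D rotations SO(2).
L = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def rotation(theta):
    """Group element exp(theta * L): an exact 2D rotation matrix."""
    return expm(theta * L)

theta = 0.3
R = rotation(theta)
expected = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
assert np.allclose(R, expected)

# Near the identity, group elements are linear in the generator:
# exp(eps * L) ~ I + eps * L -- the "infinitesimal generator" picture.
eps = 1e-4
assert np.allclose(rotation(eps), np.eye(2) + eps * L, atol=1e-7)
```

A single d x d generator thus stands in for a continuum of group elements, which is why working at the Lie algebra level avoids discretizing or integrating over the group manifold.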
As we show in the case of linear regression, the symmetries may correspond to transformations preserving the statistics of the data. Specifically, we show a general relation between the symmetries of linear regression and a deformed orthogonal group preserving the covariance matrix. Such symmetries of the probability distribution, and ways to incorporate them into the architecture, were also discussed in Bloem-Reddy & Teh (2019). The work that is closest in spirit and setup to ours is Zhou et al. (2020), which uses meta-learning to automatically learn symmetries. Although the weight-sharing scheme of Zhou et al. (2020) and their encoding of the symmetry generators are different, their construction does bear some resemblance to ours, and we will discuss this after introducing our architecture.
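The notion of a "deformed orthogonal group preserving the covariance matrix" can be illustrated with a small numerical check. The derivation itself appears later in the paper; the sketch below only verifies the defining property, assuming transformations of the form A = Sigma^{1/2} Q Sigma^{-1/2} with Q orthogonal, which satisfy A Sigma A^T = Sigma.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy covariance matrix (symmetric positive definite).
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Symmetric square root of Sigma via its eigendecomposition.
w, V = np.linalg.eigh(Sigma)
S_half = V @ np.diag(np.sqrt(w)) @ V.T
S_half_inv = np.linalg.inv(S_half)

# A random orthogonal matrix Q, conjugated by Sigma^{1/2}, gives a
# "deformed" orthogonal transformation A = Sigma^{1/2} Q Sigma^{-1/2}.
Q, _ = np.linalg.qr(rng.normal(size=(2, 2)))
A = S_half @ Q @ S_half_inv

# The defining property: A preserves the covariance matrix.
assert np.allclose(A @ Sigma @ A.T, Sigma)
```

Such an A generally differs from an ordinary rotation (A is not orthogonal unless Sigma is proportional to the identity), yet it leaves the second-order statistics of the data unchanged.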

Contributions Our main contributions can be summarized as follows:

• We propose a group-equivariant architecture based on the Lie algebra, introducing the Lie algebra convolutional layer (L-conv).
• The Lie algebra generators in L-conv can be trained to discover symmetries, and L-conv outperforms CNNs on domains with hidden symmetries, such as rotated and scrambled images.
• Group convolutional layers on connected Lie groups can be approximated by multi-layer L-conv, and fully-connected, CNN, and graph convolutional networks are special cases of L-conv.
• In linear regression, we show analytical relations between the symmetries of the problem and orthogonal groups preserving the covariance of the data.

1. EQUIVARIANCE IN SUPERVISED LEARNING

Consider the functional mapping y_i = f(x_i) of inputs X = (x_1, ..., x_n) to outputs Y = (y_1, ..., y_n). We assume each input x ∈ R^{d×m}, where R^d is the "space" dimension and R^m the "channels", and y ∈ R^c (or Z_2^c for categorical variables). We assume a group G acts only on the space factor of x (R^d, shared among channels) through a d-dimensional representation T_d : G → GL_d(R) mapping each g to an invertible d × d matrix. The map T_d must be continuous and satisfy T_d(u)T_d(v) = T_d(uv) for all u, v ∈ G (Knapp, 2013, IV.1). Similarly, let G act on y via a c-dimensional representation T_c. To simplify notation, we will denote the representations simply as u_α ≡ T_α(u). A function f solving y_i = f(x_i) is said to be equivariant under the action of a group G by representations T_c, T_d if

u_c y = u_c f(x) = f(u_d x)  ∀u ∈ G  ⇔  f(x) = u_c f(u_d^{-1} x).

Lie Groups and Lie Algebras The full group of invertible d × d matrices over R is the general linear group, denoted GL_d(R). It follows that every real d-dimensional group representation satisfies T_d(G) ⊂ GL_d(R). If T is a "faithful" representation (i.e. T(u) ≠ T(v) whenever u ≠ v), then G ⊂ GL_d(R). In our problem, we only know the group G through its linear action on X, or on the output h_l of layer l in a neural network. Therefore we may assume the representation T_{d_l} acting on the h_l with largest d_l is faithful. GL_d(R) is a Lie group, and all of its continuous subgroups G are also Lie groups. Next, we will first briefly review the Lie algebra of a Lie group and then use it to introduce our equivariant architecture.

Notation Unless stated or obvious, a in A^a is an index, not an exponent. We write matrix products as A • B ≡ Σ_a A^a B_a. Recall that x ∈ R^{d×m}. For a linear transformation A : R^{d_1} → R^{d_2}
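The equivariance condition u_c f(x) = f(u_d x) can be checked numerically on the most familiar example: circular convolution, which is equivariant to cyclic shifts (the discrete analogue of the translation symmetry CNNs encode). The sketch below is illustrative only; here the input and output representations coincide, u_c = u_d = a cyclic shift on R^d.

```python
import numpy as np

d = 8
rng = np.random.default_rng(1)
x = rng.normal(size=d)   # input on the "space" dimension R^d
w = rng.normal(size=3)   # a small convolution filter

def f(x):
    """Circular cross-correlation: a translation-equivariant map R^d -> R^d."""
    return np.array([sum(w[k] * x[(i + k) % d] for k in range(len(w)))
                     for i in range(d)])

def shift(v, s):
    """Group action u_d: cyclic shift by s positions (representation of Z_d)."""
    return np.roll(v, s)

s = 3
# Equivariance: u_c f(x) = f(u_d x), with u_c = u_d = shift by s.
assert np.allclose(shift(f(x), s), f(shift(x, s)))
```

A fully-connected layer with an arbitrary weight matrix would fail this assertion; it is exactly the weight-sharing of the convolution (the same filter w at every position) that makes the map commute with the group action.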

