LEARNING IRREDUCIBLE REPRESENTATIONS OF NONCOMMUTATIVE LIE GROUPS

Abstract

Recent work has constructed neural networks that are equivariant to continuous symmetry groups such as 2D and 3D rotations. This is accomplished using explicit group representations to derive the equivariant kernels and nonlinearities. We present two contributions motivated by frontier applications of equivariance beyond rotations and translations. First, we relax the requirement for explicit Lie group representations, presenting a novel algorithm that finds irreducible representations of noncommutative Lie groups given only the structure constants of the associated Lie algebra. Second, we demonstrate that Lorentz-equivariance is a useful prior for object-tracking tasks and construct the first object-tracking model equivariant to the Poincaré group.

1. INTRODUCTION

Many tasks in machine learning exactly or approximately obey a continuous symmetry such as 2D rotations. An ML model is said to be equivariant to such a symmetry if the model respects it automatically (without training). Equivariant models have been applied to tasks ranging from computer vision to molecular chemistry, leading to a generalization of equivariance techniques beyond 2D rotations to other symmetries such as 3D rotations. This generalization is enabled by known mathematical results about each new set of symmetries: specifically, explicit group representation matrices for each new symmetry group are required. For many important symmetries, formulae are readily available to produce these representations. For other symmetries we are not so lucky, and the representations may be difficult to find explicitly. In the worst cases, the classification of the group representations is an open problem in mathematics. For example, in the important case of the homogeneous Galilean group, which we define in section 2, the classification of the finite-dimensional representations is a so-called "wild algebraic problem" for which we have only partial solutions (De Montigny et al., 2006; Niederle & Nikitin, 2006; Levy-Leblond, 1971).

To construct an equivariant network without prior knowledge of the group representations, novel approaches are needed. In this work, we propose an algorithm, LearnRep, that finds the representation matrices with high precision. We validate that LearnRep succeeds for the Poincaré group, a set of symmetries governing phenomena from particle physics to object tracking. We further validate LearnRep on two additional sets of symmetries where formulae are known. We apply the Poincaré group representations obtained by LearnRep to construct SpacetimeNet, a Poincaré-equivariant object-tracking model. As far as we are aware, LearnRep is the first automated solver which can find explicit representation matrices for sets of symmetries forming noncompact, noncommutative Lie groups. Further, SpacetimeNet is the first object-tracking model with a rigorous guarantee of Poincaré group equivariance.

1.1. GROUP REPRESENTATIONS AND EQUIVARIANT MACHINE LEARNING

Group theory provides the mathematical framework for describing symmetries and building equivariant ML models. Informally, a symmetry group G is a set of invertible transformations $\alpha, \beta \in G$ which can be composed together using a product operation $\alpha\beta$. We are interested in continuous symmetries for which G is a Lie group. Prior constructions of Lie group-equivariant models require group representations. For a group G, an n-dimensional (real) group representation $\rho : G \to \mathbb{R}^{n \times n}$ is a mapping from each element $\alpha \in G$ to an $n \times n$ matrix $\rho(\alpha)$, such that for any two elements $\alpha, \beta \in G$, we have $\rho(\alpha)\rho(\beta) = \rho(\alpha\beta)$.

Two parallel techniques have been developed for implementing Lie group equivariant neural networks. The first approach was described in general by Cohen et al. (2019). The second approach, taken by Thomas et al. (2018); Anderson et al. (2019); Bogatskiy et al. (2020), performs convolutions and nonlinearities directly on the irreducible representations of the group, which we define in section 2.4. A common thread in these works has been to utilize existing formulas derived for the matrix elements of these irreducible representations. However, such formulas are only available for specific Lie groups where the representation theory is well understood. A more convenient approach for extending equivariance to novel Lie groups would utilize an automated computational technique to obtain the required representations. The primary contribution of this work is such a technique.

1.2. CONTRIBUTIONS

In this work, we automate the generation of explicit group representation matrices of Lie groups using an algorithm called LearnRep. LearnRep poses an optimization problem defined by the Lie algebra associated with a Lie group, whose solutions are the representations of the algebra. A penalty term is used to prevent the formation of trivial representations. Gradient descent on the resulting loss function produces nontrivial representations upon convergence. We apply LearnRep to three noncommutative Lie groups for which the finite-dimensional representations are well understood, allowing us to verify that the representations produced are irreducible by computing their Clebsch-Gordan coefficients and applying Schur's Lemma.

One of the Lie groups where LearnRep performs well is the Lorentz group of special relativity. Prior work has applied Lorentz-equivariant models to particle physics. In this work we explain that the Lorentz group, along with the larger Poincaré group, also governs everyday object-tracking tasks. We construct a Poincaré-equivariant neural network architecture called SpacetimeNet and demonstrate that it can learn to solve a 3D object-tracking task subject to "motion equivariance," where the inputs are a time series of points in space. In summary, our contributions are:

• LearnRep, an algorithm which can find irreducible representations of a noncompact and noncommutative Lie group.
• SpacetimeNet, a Poincaré group-equivariant neural network applied to object-tracking tasks.

Our work contributes towards a general framework and toolset for building neural networks equivariant to novel Lie groups, and motivates further study of Lorentz equivariance for object tracking.

1.3. ORGANIZATION

We summarize all necessary background and terminology in section 2. We describe the LearnRep algorithm in section 3.1 and SpacetimeNet in section 3.2. We summarize related work in section 4. We present our experimental results in section 5: our experiments in learning irreducible Lie group representations with LearnRep in section 5.1 and the performance of our Poincaré-equivariant SpacetimeNet model on a 3D object tracking task in section 5.2.

2. TECHNICAL BACKGROUND

We explain the most crucial concepts here and defer to Appendix A.1 for a derivation of the representation theory of the Lorentz group.

2.1. SYMMETRY GROUPS SO(n) AND SO(m, n)

A 3D rotation may be defined as a matrix $A \in \mathbb{R}^{3 \times 3}$ which satisfies the following properties, in which $\langle u, v \rangle = \sum_{i=1}^{3} u_i v_i$: (i) $\det A = 1$; (ii) $\forall\, u, v \in \mathbb{R}^3$, $\langle Au, Av \rangle = \langle u, v \rangle$. These properties imply that the set of 3D rotations forms a group under matrix multiplication, and this group is denoted SO(3). This definition generalizes directly to the n-dimensional rotation group SO(n). For $n \geq 3$, the group SO(n) is noncommutative, meaning there are elements $A, B \in SO(n)$ such that $AB \neq BA$. Allowing for rotations and translations of n-dimensional space gives the n-dimensional special Euclidean group SE(n).

SO(n) is generalized by a family of groups denoted SO(m, n), with SO(n) = SO(n, 0). For integers $m, n \geq 0$, we define $\langle u, v \rangle_{m,n} = \sum_{i=1}^{m} u_i v_i - \sum_{i=m+1}^{m+n} u_i v_i$. The group SO(m, n) is the set of matrices $A \in \mathbb{R}^{(m+n) \times (m+n)}$ satisfying (i)-(ii) below: (i) $\det A = 1$; (ii) $\forall\, u, v \in \mathbb{R}^{m+n}$, $\langle Au, Av \rangle_{m,n} = \langle u, v \rangle_{m,n}$. These properties imply that SO(m, n) is also a group under matrix multiplication. While the matrices in SO(n) form a compact manifold for any n, the elements of SO(m, n) form a noncompact manifold whenever $m, n \geq 1$. For this reason SO(n) and SO(m, n) are called compact and noncompact Lie groups respectively. The representations of compact Lie groups are fairly well understood, see Bump (2004); Cartan (1930).

2.2. ACTION OF SO(m, n) ON SPACETIME

We now explain the physical relevance of the groups SO(m, n) by reviewing spacetime. We refer to Feynman et al. (2011) (ch. 15) for a pedagogical overview. Two observers who are moving at different velocities will in general disagree on the coordinates $\{(t_i, \vec{u}_i)\} \subset \mathbb{R}^4$ of events in spacetime. Newton and Galileo proposed that the observers could reconcile their coordinates by applying a spatial rotation and translation (i.e., an element of SE(3)), a temporal translation (synchronizing their clocks), and finally a transformation of the following form:

$$t_i \to t_i, \qquad \vec{u}_i \to \vec{u}_i + \vec{v}\, t_i, \qquad (1)$$

in which $\vec{v}$ is the relative velocity of the observers. The transformation of equation 1 is called a Galilean boost. The set of all Galilean boosts along with 3D rotations forms the homogeneous Galilean group, denoted HG(1, 3). Einstein argued that equation 1 must be corrected by adding terms dependent on $\|\vec{v}\|_2 / c$, in which $c$ is the speed of light and $\|\vec{v}\|_2$ is the $\ell_2$ norm of $\vec{v}$. The resulting coordinate transformation is called a Lorentz boost, and an example of its effect is shown in figure 1. The set of 3D rotations along with Lorentz boosts is exactly the group SO(3, 1). In the case of 2 spatial dimensions, the corresponding group is SO(2, 1). Including spacetime translations along with the Lorentz group SO(n, 1) gives the larger Poincaré group $P_n$ with n spatial dimensions. The Poincaré group $P_3$ is the group of coordinate transformations between different observers in special relativity.

Consider an object-tracking task with input data consisting of a spacetime point cloud with n dimensions of space and 1 of time, and corresponding outputs consisting of object class along with location and velocity vectors. A perfectly accurate object-tracking model must respect the action of $P_n$ on the input: given the spacetime points in any observer's coordinate system, the perfect model must give the correct outputs in that coordinate system. Therefore the model should be $P_n$-equivariant. For low velocities the symmetries of the homogeneous Galilean groups HG(n, 1) provide a good approximation to SO(n, 1) symmetries, so Galilean equivariance may be sufficient for some tasks.
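For concreteness, the following minimal NumPy sketch (ours; the boost velocity is an arbitrary illustrative value) checks properties (i)-(ii) for a Lorentz boost along the x axis, with coordinates ordered (x, y, z, t) so that $\langle \cdot, \cdot \rangle_{3,1}$ takes the form above:

```python
import numpy as np

def minkowski(u, v, m=3, n=1):
    """<u, v>_{m,n}: sum of the first m products minus the sum of the last n."""
    return u[:m] @ v[:m] - u[m:] @ v[m:]

beta = 0.6                                  # illustrative boost velocity |v|/c
gamma = 1.0 / np.sqrt(1.0 - beta**2)

# Lorentz boost along x, acting on coordinates (x, y, z, t).
B = np.array([
    [gamma,        0, 0, gamma * beta],
    [0,            1, 0, 0],
    [0,            0, 1, 0],
    [gamma * beta, 0, 0, gamma],
])

assert np.isclose(np.linalg.det(B), 1.0)                      # property (i)
u, v = np.random.randn(4), np.random.randn(4)
assert np.isclose(minkowski(B @ u, B @ v), minkowski(u, v))   # property (ii)
```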
Unfortunately, the representations of HG(n, 1) are not entirely understood (De Montigny et al., 2006; Niederle & Nikitin, 2006; Levy-Leblond, 1971).

2.3. LIE GROUPS AND LIE ALGEBRAS

Here we give an intuitive summary of Lie groups and Lie algebras, deferring to Bump (2004) for a rigorous technical background. A Lie group G gives rise to a Lie algebra A as its tangent space at the identity. This is a vector space V along with a bilinear product called the Lie bracket, $[a, b]$, which must behave like[1] the commutator for an associative ring R with multiplication operation $\times_R$:

$$[a, b] = a \times_R b - b \times_R a.$$

The Lie algebra of SO(3), denoted so(3), has a basis $\{J_1, J_2, J_3\}$ satisfying

$$[J_i, J_j] = \epsilon_{ijk} J_k, \qquad (2)$$

in which $\epsilon_{ijk} \in \{\pm 1, 0\}$ is the totally antisymmetric Levi-Civita symbol.[2] Intuitively, the Lie bracket shows how group elements near the identity fail to commute. For example, the matrices $R_x, R_y, R_z$ for rotations about the x, y, and z axes by a small angle $\theta$ satisfy $R_x R_y - R_y R_x = R_z + O(\theta^2)$; more generally, the Lie bracket of equation 2 is satisfied to first order in $\theta$. The Lie algebra so(3, 1) of the Lorentz group SO(3, 1) also satisfies equation 2 for the generators $J_1, J_2, J_3$ of its subalgebra isomorphic to so(3). It has 3 additional generators denoted $K_1, K_2, K_3$, which satisfy:

$$[J_i, K_j] = \epsilon_{ijk} K_k, \qquad [K_i, K_j] = -\epsilon_{ijk} J_k. \qquad (3)$$

These $K_i$ correspond to the Lorentz boosts in the same way that the $J_i$ correspond to the rotations. In general, if A is a t-dimensional Lie algebra with generators $T_1, \ldots, T_t$ such that

$$[T_i, T_j] = \sum_{k=1}^{t} A_{ijk} T_k, \qquad (4)$$

we call the tensor $A_{ijk}$ the structure constants of A. For connected matrix Lie groups such as SO(m, n), the structure constants $A_{ijk}$ are easily obtained. For example, one may apply the matrix logarithm to several elements of the group to obtain elements of the algebra, then find a complete basis for the algebra and write the commutator of all basis elements in this basis.
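As a minimal illustration of the last point, the sketch below (ours) recovers the structure constants of so(3) from an explicit matrix basis; the projection by a trace relies on this particular basis being orthogonal in the Frobenius inner product, an assumption specific to this example.

```python
import numpy as np

# Levi-Civita symbol eps[i, j, k].
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0
    eps[i, k, j] = -1.0

# Basis of so(3): (J_k)_{ij} = -eps_{ijk} (real antisymmetric matrices).
J = [-eps[:, :, k] for k in range(3)]

def bracket(a, b):
    return a @ b - b @ a

# Recover A_{ijk} by projecting [J_i, J_j] onto the basis, using that this
# basis is Frobenius-orthogonal with tr(J_k^T J_k) = 2.
A = np.zeros((3, 3, 3))
for i in range(3):
    for j in range(3):
        for k in range(3):
            A[i, j, k] = np.trace(bracket(J[i], J[j]).T @ J[k]) / 2.0

assert np.allclose(A, eps)   # [J_i, J_j] = eps_{ijk} J_k, as in equation 2
```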

2.4. GROUP REPRESENTATIONS AND THE TENSOR PRODUCT

Let G be a Lie group and $\rho : G \to \mathbb{R}^{n \times n}$ be a representation of G as defined in section 1.1. Then $\rho$ defines a group action on $\mathbb{R}^n$: given a vector $u \in \mathbb{R}^n$ and a group element $\alpha \in G$, we define $\alpha *_\rho u := \rho(\alpha) u$ using the matrix product. We then say that $\rho$ is irreducible if it leaves no nontrivial subspace invariant: for every subspace $V \subset \mathbb{R}^n$ with $0 < \dim V < n$, there exist $\alpha \in G$ and $v \in V$ such that $\alpha *_\rho v \notin V$. Given two G-representations $\rho_1 : G \to \mathbb{R}^{n_1 \times n_1}$ and $\rho_2 : G \to \mathbb{R}^{n_2 \times n_2}$, we define their tensor product $\rho_1 \otimes \rho_2 : G \to \mathbb{R}^{n_1 n_2 \times n_1 n_2}$ by

$$(\rho_1 \otimes \rho_2)(\alpha) = \rho_1(\alpha) \otimes \rho_2(\alpha),$$

in which $\otimes$ on the right-hand side denotes the usual tensor (Kronecker) product of matrices. It is easy to check that $\rho_1 \otimes \rho_2$ is also a representation of G, using the fact that for matrices $A_1, A_2 \in \mathbb{R}^{n_1 \times n_1}$ and $B_1, B_2 \in \mathbb{R}^{n_2 \times n_2}$, $(A_1 \otimes B_1)(A_2 \otimes B_2) = (A_1 A_2) \otimes (B_1 B_2)$. For $\rho_1, \rho_2$ as above we also define their direct sum as

$$(\rho_1 \oplus \rho_2)(\alpha) = \begin{pmatrix} \rho_1(\alpha) & 0 \\ 0 & \rho_2(\alpha) \end{pmatrix}.$$

For two groups H, G we say that H is isomorphic to G and write $H \cong G$ if there exists a bijection $f : H \to G$ such that $f(\alpha\beta) = f(\alpha)f(\beta)$. For $\rho_1, \rho_2$ as above, their images $\rho_i(G)$ form groups, and we say that $\rho_1$ and $\rho_2$ are isomorphic, written $\rho_1 \cong \rho_2$, if these groups are isomorphic, i.e. $\rho_1(G) \cong \rho_2(G)$. Some familiar representations of SO(3) act on scalars ($\in \mathbb{R}$), vectors ($\in \mathbb{R}^3$), and tensors (e.g., the Cauchy stress tensor); these representations are all nonisomorphic. For many Lie groups such as SO(n, 1) and SO(n), a property called complete reducibility guarantees that any representation is either irreducible or isomorphic to a direct sum of irreducible representations. For such groups it suffices to identify the irreducible representations in order to understand all other representations and construct equivariant models.
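A minimal numeric check (ours) of the mixed-product property, taking $\rho_1 = \rho_2$ to be the vector representation of SO(3) and two rotations with illustrative angles:

```python
import numpy as np

# rho1 = rho2 = the vector representation of SO(3); alpha, beta are
# rotations about the z and x axes.
def Rz(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0,          0,         1]])

def Rx(t):
    return np.array([[1, 0,          0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t),  np.cos(t)]])

a, b = Rz(1.1), Rx(0.3)

# Mixed-product property implies
# (rho1 (x) rho2)(a) (rho1 (x) rho2)(b) = (rho1 (x) rho2)(ab):
assert np.allclose(np.kron(a, a) @ np.kron(b, b), np.kron(a @ b, a @ b))
```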

2.5. CLEBSCH-GORDAN COEFFICIENTS AND TENSOR-PRODUCT NONLINEARITIES

Clebsch-Gordan Coefficients: Let G be a completely reducible Lie group, and let $\rho_1, \rho_2, \rho_3$ be irreducible G-representations on the vector spaces $\mathbb{R}^{n_1}, \mathbb{R}^{n_2}, \mathbb{R}^{n_3}$. Consider the tensor product representation $\rho_1 \otimes \rho_2$. Since G is completely reducible, there exists a set S of irreducible representations such that $\rho_1 \otimes \rho_2 \cong \bigoplus_{\rho \in S} \rho$. Suppose that $\rho_3 \in S$. Then there exists a matrix $C \in \mathbb{R}^{n_3 \times (n_1 n_2)}$ which projects the space of the $n_3$-dimensional group representation $\rho_3$ out of the tensor product space $\mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2}$. That is, $\forall (\alpha, u, v) \in G \times \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$,

$$C\, (\rho_1(\alpha) \otimes \rho_2(\alpha))\, (u \otimes v) = \rho_3(\alpha)\, C\, (u \otimes v) \;\Rightarrow\; C\, (\rho_1(\alpha) \otimes \rho_2(\alpha)) = \rho_3(\alpha)\, C. \qquad (5)$$

The matrices C satisfying equation 5 for various $\rho_3$ are called the Clebsch-Gordan coefficients. For each $\alpha$, equation 5 imposes $n_1 n_2 n_3$ linear constraints on C, so this is a well-posed homogeneous linear program (LP) for C. The entries of C may be found numerically by sampling several distinct $\alpha \in G$ and concatenating the linear constraints (equation 5) to form the final LP. The solutions for C form a linear subspace of $\mathbb{R}^{n_3 \times (n_1 n_2)}$, given by the nullspace of a matrix we denote $\hat{C}[\rho_1, \rho_2, \rho_3]$.

Tensor Product Nonlinearities: Tensor product nonlinearities, including norm nonlinearities, use the Clebsch-Gordan coefficients defined above to compute equivariant quadratic functions of multiple G-representations within a G-equivariant model. This was demonstrated for the case of SE(3) by Thomas et al. (2018); Kondor et al. (2018) and for SO(3, 1) by Bogatskiy et al. (2020).
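The sketch below (ours) illustrates this procedure for SO(3) with $\rho_1 = \rho_2 = \rho_3$ all equal to the vector representation; the choice of example and the row-major vectorization convention are implementation details of the sketch. For three copies of the vector representation, the recovered C is, up to scale, the cross product, the unique equivariant bilinear map $\mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}^3$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Vector representation of SO(3), built from its so(3) generators.
J = [np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]]),
     np.array([[0., 0., 1.], [0., 0., 0.], [-1., 0., 0.]]),
     np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])]

def sample_rotation():
    return expm(sum(b * Ji for b, Ji in zip(rng.standard_normal(3), J)))

n1 = n2 = n3 = 3                         # C has shape n3 x (n1 n2) = 3 x 9
blocks = []
for _ in range(8):                       # several distinct alpha in G
    R = sample_rotation()
    M = np.kron(R, R)                    # (rho1 (x) rho2)(alpha)
    # C M = rho3(alpha) C  <=>  (I (x) M^T - R (x) I) vec(C) = 0,
    # with vec(C) the row-major flattening of C.
    blocks.append(np.kron(np.eye(n3), M.T) - np.kron(R, np.eye(n1 * n2)))
K = np.vstack(blocks)                    # the stacked constraint matrix

_, sv, Vt = np.linalg.svd(K)
print(int(np.sum(sv < 1e-8)))            # nullspace dim: 1 (spin 1 occurs once)
C = Vt[-1].reshape(n3, n1 * n2)          # the Clebsch-Gordan coefficients

# C(u (x) v) is proportional to the cross product u x v.
u, v = rng.standard_normal(3), rng.standard_normal(3)
assert np.allclose(np.cross(C @ np.kron(u, v), np.cross(u, v)), 0, atol=1e-6)
```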

3.1. LEARNING LIE GROUP REPRESENTATIONS

For a matrix $M \in \mathbb{R}^{n \times n}$ we denote its Frobenius and $L_1$ norms by $|M|_F^2 = \sum_{1 \le i,j \le n} |M_{ij}|^2$ and $|M|_1 = \sum_{1 \le i,j \le n} |M_{ij}|$. The approach of LearnRep is to first learn a Lie algebra representation and then obtain its corresponding group representation through the matrix exponential. Fix a t-dimensional Lie algebra A with structure constants $A_{ijk}$ as defined in equation 4, and fix a positive integer n as the dimension of the representation of A. Then let the matrices $T_1, \ldots, T_t \in \mathbb{R}^{n \times n}$ be optimization variables, and define the following loss function on the $T_i$:

$$L[T_1, \ldots, T_t] = \underbrace{\max\!\Big(1,\ \max_{1 \le i \le t} \frac{1}{|T_i|_F^2}\Big)}_{N[T_i]^{-1}} \times \sum_{1 \le i \le j \le t} \Big|\, [T_i, T_j] - \sum_k A_{ijk} T_k \,\Big|_1. \qquad (6)$$

This is the magnitude of violation of the structure constants of A, multiplied by a norm penalty term $N[T_i]^{-1}$ (this penalty is plotted separately in figure 2). The purpose of the norm penalty is to avoid convergence to a solution where $T_i = 0_{n \times n}$ for any i, which would act trivially when restricted to the nontrivial subgroup $\{e^{t T_i} : t \in \mathbb{R}\}$. We pose the optimization problem

$$\min_{T_1, \ldots, T_t \in \mathbb{R}^{n \times n}} L[T_1, \ldots, T_t].$$

The generators were initialized with entries from the standard normal distribution. Gradient descent was performed in PyTorch with the Adam optimizer (Kingma & Ba, 2014) with initial learning rate 0.1. The learning rate decreased exponentially when the loss plateaued. The results are shown in figure 2.
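The following PyTorch sketch shows one way to implement equation 6 and the optimization loop. It is a minimal reconstruction, not our full implementation: the exponential learning-rate decay and the restart logic of section 3.1.2 are omitted, and the step count is an arbitrary choice.

```python
import torch

def learnrep_loss(T, A):
    """LearnRep loss (equation 6).

    T: (t, n, n) candidate generators T_i (the optimization variables).
    A: (t, t, t) structure constants A_{ijk} of the target Lie algebra.
    """
    t = T.shape[0]
    # Norm penalty max(1, max_i 1/|T_i|_F^2): discourages T_i -> 0 and
    # clips to 1 once every generator has Frobenius norm at least 1.
    norms_sq = T.flatten(1).pow(2).sum(dim=1)
    penalty = torch.clamp(1.0 / norms_sq.min(), min=1.0)
    # | [T_i, T_j] - sum_k A_{ijk} T_k |_1, summed over i <= j.
    comm = torch.einsum('iab,jbc->ijac', T, T) - torch.einsum('jab,ibc->ijac', T, T)
    target = torch.einsum('ijk,kab->ijab', A, T)
    viol = (comm - target).abs().sum(dim=(-2, -1))
    iu = torch.triu_indices(t, t)
    return penalty * viol[iu[0], iu[1]].sum()

# Example: learn a 3-dimensional representation of so(3), whose structure
# constants are the Levi-Civita symbol (equation 2).
A = torch.zeros(3, 3, 3)
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    A[i, j, k], A[j, i, k] = 1.0, -1.0

T = torch.randn(3, 3, 3, requires_grad=True)   # standard normal init
opt = torch.optim.Adam([T], lr=0.1)            # initial learning rate 0.1
for step in range(10000):
    opt.zero_grad()
    loss = learnrep_loss(T, A)
    loss.backward()
    opt.step()
```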

3.1.1. VERIFYING IRREDUCIBILITY OF LEARNED REPRESENTATIONS

Suppose we have converged to $T_1, \ldots, T_t$ such that $L[T_i] = 0$. Then the $T_1, \ldots, T_t$ are a nonzero n-dimensional representation of the Lie algebra A. The groups considered here are covered by the exponential map applied to their Lie algebras, so for each $\alpha \in G$ there exist $b_1, \ldots, b_t \in \mathbb{R}$ such that $\rho(\alpha) = \exp\big(\sum_{i=1}^{t} b_i T_i\big)$, where $\rho$ is an n-dimensional representation of G and exp is the matrix exponential. This $\rho : G \to \mathbb{R}^{n \times n}$ is then a representation of the Lie group; throughout this section, $\rho$ denotes this representation.

In general $\rho$ may leave some nontrivial subspace invariant. In this case it is reducible and splits as a direct sum of lower-dimensional irreducible representations $\rho_i$ as explained in section 2.4: $\rho \cong \rho_1 \oplus \cdots \oplus \rho_\ell$. Recall that any representation may be obtained as such a direct sum of irreducible representations with dimensions $n_1, \ldots, n_\ell$ satisfying $n = \sum_{i=1}^{\ell} n_i$. If n is set to the minimum dimension of a nontrivial irreducible representation, the only permissible partitions of n have $\ell = 1$ or $\ell = n$; as the latter representation is trivial, equation 6 diverges on it, so LearnRep can only converge to an irreducible n-dimensional representation.[3]

It is nonetheless important to verify that the learned $\rho$ is indeed irreducible with $\ell = 1$. To validate this, LearnRep computes the tensor product structure of $\rho$ and compares it with the expected structure. Specifically, it computes the Clebsch-Gordan coefficients for the direct-sum decomposition of the tensor product of the learned representation $\rho$ with several other known representations $\rho_1, \ldots, \rho_r$. Section 2.5 defines these coefficients and explains how they are computed from the nullspace of the matrix $\hat{C} = \hat{C}[\rho, \rho_1, \rho_2]$, in which $\rho_2$ appears in the decomposition of $\rho \otimes \rho_1$. Denoting the singular values of $\hat{C}$ in increasing order by $SV_1(\hat{C}) \le SV_2(\hat{C}) \le \cdots$, we define the diagnostic ratio

$$r(\hat{C}) := SV_2(\hat{C}) / SV_1(\hat{C}), \qquad (7)$$

which diverges only if the nullspace is one-dimensional, corresponding to a unique solution for C. The number of expected solutions is known (e.g., it may be computed using the same technique applied to the formulae for the irreducible representations). Therefore, if $r(\hat{C})$ diverges for exactly the choices of $\rho_1, \rho_2$ where the theory indicates that unique nonzero Clebsch-Gordan coefficients exist, this is consistent with our having learned an irreducible representation of the group G. Clearly the tensor product with the trivial representation $\rho_1 = 1$ is $\rho \otimes 1 = \rho$. In this case, the permissible C correspond to G-linear maps $\mathbb{R}^n \to \mathbb{R}^{n_2}$. By a result of Schur (1905) (Schur's Lemma), the only such nonzero maps are isomorphisms. Therefore a divergent value of $r(\hat{C})$ when $\rho_1 = 1$ indicates that $\rho \cong \rho_2$. This is shown in the top row of figure 3 and discussed further in section 5.1.
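In code, the two ingredients of this check might look as follows (a sketch; the function names are ours):

```python
import numpy as np
from scipy.linalg import expm

def group_element(T, b):
    """rho(alpha) = exp(sum_i b_i T_i): lift learned generators T_i to a
    group representation via the matrix exponential."""
    return expm(sum(bi * Ti for bi, Ti in zip(b, T)))

def diagnostic_ratio(K):
    """r = SV_2 / SV_1 (equation 7), singular values in increasing order.

    K is the stacked constraint matrix C-hat[rho, rho_1, rho_2] of
    section 2.5; r is large exactly when the nullspace is 1-dimensional,
    i.e. when the Clebsch-Gordan coefficients are unique up to scale.
    """
    sv = np.linalg.svd(K, compute_uv=False)   # returned in decreasing order
    return sv[-2] / sv[-1]
```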

3.1.2. STOPPING CONDITION

Similar to Rao & Ruderman (1999), LearnRep restarts gradient descent several times from random initialization points. A restart is triggered if the loss plateaus while the learning rate is smaller than the loss by a factor of $10^{-4}$ (i.e., lr $< 10^{-4} \times$ loss). The tensor product structure is computed upon convergence to a loss under $10^{-9}$; a restart is triggered if the divergences of $r(\hat{C})$ do not agree with the theoretical prediction, indicating a reducible representation.
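A sketch of this control flow (ours; the plateau detection and the tensor-product check are assumed to be computed elsewhere):

```python
def learnrep_status(loss, lr, plateaued, tensor_structure_ok):
    """Restart/stopping logic per section 3.1.2 (sketch)."""
    if loss < 1e-9:
        # Converged: accept only if the divergences of r(C-hat) match the
        # theoretical prediction; otherwise the representation is reducible.
        return 'converged' if tensor_structure_ok else 'restart'
    if plateaued and lr < 1e-4 * loss:
        return 'restart'
    return 'continue'
```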

3.2. SPACETIMENET ARCHITECTURE

We obtain all Clebsch-Gordan coefficients through the procedure explained in section 2.5 and place them in a tensor $C_{g,qr,ls,mt}$. This notation corresponds to taking the tensor product of an element of the l-th group representation space (indexed by s) with an element of the m-th group representation space (indexed by t), and projecting it onto the q-th group representation space (indexed by r). The space of possible Clebsch-Gordan coefficients can be multidimensional;[4] we use an index g to carry the dimension within this space.

The trainable weights in SpacetimeNet are complex-valued filter weights denoted $f^k_{qg}$ and channel-mixing weights denoted $W^k_{qcgd}$. Each layer builds a collection of equivariant convolutional filters $F^k_{xijqr}$ from the geometry of the point cloud. Let $q_0$ denote the index of the group representation in which the points are embedded. Let $X_{xir}$ denote the point coordinates, in which x indexes the batch dimension, i indexes the points, and r indexes the $q_0$ group representation space. Define the (globally) translation-invariant quantity $\Delta X_{xijr} := X_{xjr} - X_{xir}$. The equivariant filters at layer k are:

$$F^k_{xijqr} = \delta_{q q_0}\, \Delta X_{xijr} + \sum_{s,t,g} C_{g,qr,q_0 s,q_0 t}\, f^k_{qg}\, \Delta X_{xijs}\, \Delta X_{xijt}. \qquad (8)$$

The forward pass consists of tensor product nonlinearities between equivariant filters and activations. The input and activations for the k-th layer of the network are defined on a tensor $V^k_{ximct}$, where x is the batch dimension, i indexes the points, m is the group representation index, c is the channel index, and t indexes the group representation space. The mixing weights $W^k_{qcgd}$ then define the layer update rule:

$$V^{k+1}_{xiqcr} = \sum_{g,l,s,m,t,d,j} C_{g,qr,ls,mt}\, F^k_{xijls}\, V^k_{xjmdt}\, W^k_{qcgd}. \qquad (9)$$

A proof that SpacetimeNet is $P_n$-equivariant is given in Appendix A.2.
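A dense-tensor sketch of equations 8 and 9 using torch.einsum is given below. For simplicity it assumes real weights, a common padded dimension R for all representation spaces, and equal input and output channel counts; in practice the representations have different dimensions and C is sparse, so this is illustrative rather than an account of our implementation.

```python
import torch

def spacetimenet_layer(dX, V, C, f, W, q0):
    """One SpacetimeNet layer (equations 8 and 9), dense-tensor sketch.

    Shapes (all representation spaces padded to a common dimension R):
      dX: (X, I, I, R)           pairwise differences Delta X_{xijr}
      V:  (X, I, Q, D, R)        activations V^k_{ximct}
      C:  (G, Q, R, Q, R, Q, R)  Clebsch-Gordan tensor C_{g,qr,ls,mt}
      f:  (Q, G)                 filter weights f^k_{qg}
      W:  (Q, D, G, D)           channel-mixing weights W^k_{qcgd}
      q0: index of the representation in which the points are embedded
    """
    X, I, Q, D, R = V.shape
    # Equation 8: equivariant filters F^k_{xijqr}.
    F = torch.zeros(X, I, I, Q, R, dtype=V.dtype, device=V.device)
    F[:, :, :, q0, :] = dX                              # delta_{q q0} term
    Cq0 = C[:, :, :, q0, :, q0, :]                      # C_{g,qr,q0 s,q0 t}
    F = F + torch.einsum('gqrst,qg,xijs,xijt->xijqr', Cq0, f, dX, dX)
    # Equation 9: tensor-product nonlinearity plus channel mixing.
    return torch.einsum('gqrlsmt,xijls,xjmdt,qcgd->xiqcr', C, F, V, W)
```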

4. RELATED WORK

Prior work has constructed neural networks equivariant to a variety of Lie groups (Weiler et al., 2018; Cohen et al., 2019; Kondor et al., 2018; Thomas et al., 2018; Cohen et al., 2018; Kondor, 2018; Gao et al., 2020; Anderson et al., 2019; Fuchs et al., 2020; Eismann et al., 2020), as well as to the group of Galilean boosts (Zhu et al., 2019). The work by Thomas et al. (2018); Kondor et al. (2018); Anderson et al. (2019); Bogatskiy et al. (2020) used Clebsch-Gordan coefficients in their equivariant neural networks. Weiler et al. (2018), generalized by Cohen et al. (2019), showed that all equivariant linear maps are convolutions whose kernels satisfy certain linear constraints. In our work we obtain Clebsch-Gordan coefficients from similar linear constraints (equation 5) and use them to show that the learned representations are irreducible; we also use them in SpacetimeNet. Griffiths & Griffiths (2005) provide an introductory exposition of Clebsch-Gordan coefficients and Gurarie (1992) provides a more general one.

One of the first constructions that addressed spatiotemporal symmetries was by Zhu et al. (2019). They introduce motion-equivariant networks to handle the linear optical flow of an observer moving at a fixed speed. They use a canonical coordinate system in which optical flow manifests as a translation, as described for general one-dimensional Lie groups by Tai et al. (2019). This allows them to use the translation equivariance of CNNs to produce Galilean boost-equivariance. However, this gives up equivariance to translation in the original coordinate system. To maintain approximate translation-equivariance, the authors apply a spatial transformer network (Jaderberg et al., 2015) to predict a landmark position in each example. This is similar to the work of Esteves et al. (2018), which achieved equivariance to 2D rotation and scale, and approximate equivariance to translation.

The first mention of Poincaré-equivariant networks appears to be a work by Cheng et al. (2019) on the link between covariance in ML and physics. Concurrently to our work, Bogatskiy et al. (2020) constructed a Lorentz-equivariant model which operates on irreducible representations of the Lorentz group, derived similarly to Appendix A.1. That work also made use of the Clebsch-Gordan coefficients, and the model was applied to experimental particle physics rather than object tracking. Another concurrent work, by Finzi et al. (2020), proposed a framework for building models equivariant to arbitrary Lie groups. It also made use of the exponential and logarithm maps between Lie algebra and group, but it does not provide a technique for identifying the Lie algebra representations. Our ideas complement this line of work by providing an algorithm (LearnRep) that solves for the representations numerically.

5.1. CONVERGENCE OF LEARNREP TO IRREDUCIBLE REPRESENTATIONS

We apply LearnRep to SO(3), SO(2, 1), and SO(3, 1) to learn 3-, 3-, and 4-dimensional irreducible representations respectively. The loss function converges arbitrarily close to 0 with the penalty term bounded above by a constant. We exponentiate the resulting algebra representation matrices to obtain group representations and calculate the tensor product structure as described in section 3.1.1. The details of this calculation are given in Appendix A.4 and shown in figure 3. The results indicate that the learned representations are irreducible representations of the associated Lie algebras to within numerical error of about $10^{-6}$. Schur's Lemma, in the special case of the tensor product with the trivial representation, indicates the isomorphism class of each learned group representation.

5.2. POINCARÉ-EQUIVARIANT OBJECT-TRACKING NETWORKS

We created MNIST-Live, a benchmark dataset of spacetime point clouds sampled from digits of the MNIST dataset moving uniformly through space. Each sample consists of 64 points with uniformly random times $t \in [-1/2, 1/2]$ and spatial coordinates sampled from a 2D probability density function proportional to the pixel intensity. Using instances of the 0 and 9 classes, we train on examples with zero velocity and evaluate on examples with random velocity and orientation. This dataset is analogous to data from an event camera (see Orchard et al. (2015)) or a LIDAR system. We train 3-layer SO(2, 1)- and SO(3, 1)-equivariant SpacetimeNet models with 3 channels and batch size 16 on 4096 MNIST-Live examples and evaluate on a dev set of 124 examples. We obtain dev accuracy of 80 ± 5% as shown in figure 4 of the Appendix. This suggests that Lorentz-equivariance is a useful prior for object-tracking tasks.

With a treatment of bandlimiting and resampling as in Worrall et al. (2017); Weiler et al. (2018), our work could be extended to build Poincaré-equivariant networks for volumetric data. More broadly, understanding the representations of noncompact and noncommutative Lie groups may enable the construction of networks equivariant to new sets of symmetries such as the Galilean group. Since the representation theory of these groups is not entirely understood, automated techniques such as LearnRep could play a beneficial role.

A.1 ANALYTICAL FORMULAE FOR LIE ALGEBRA REPRESENTATIONS

To compare our learned group representations with those obtained through prior methods, we require analytical formulae for representations of the Lie algebras so(3), so(3, 1), and so(2, 1). The case of so(3) has a well-known solution (see Griffiths & Griffiths (2005)). If complex matrices are permissible, the library QuTiP (Johansson et al., 2013) has a function "jmat" that readily gives the representation matrices. A formula for real-valued representation matrices is given in Pinchon & Hoggan (2007), and a software implementation is available from Cohen et al. (2020). The three-dimensional Lie algebra $so(2, 1) = \mathrm{span}\{K_x, K_y, J_z\}$ has structure constants given by equation 3. In fact, these three generators $K_x, K_y, J_z$ may be rescaled so that they satisfy equation 2 instead; this reflects the isomorphism between the complexifications of so(3) and so(2, 1). Specifically, letting $\{L_x, L_y, L_z\}$ denote a Lie algebra representation of so(3) and defining

$$K_x := -i L_x, \qquad K_y := -i L_y, \qquad J_z := L_z,$$

it may easily be checked that $K_x, K_y, J_z$ satisfy the applicable commutation relations from equation 3. This reflects the physical intuition that time behaves like an imaginary dimension of space. The final Lie algebra for which we require explicit representation matrix formulae is so(3, 1). Following Weinberg (1995), we define new generators $A_i, B_i$ as

$$A_i := \tfrac{1}{2}(J_i + i K_i), \qquad B_i := \tfrac{1}{2}(J_i - i K_i),$$

so that the so(3, 1) commutators of equations 2 and 3 become $[A_i, A_j] = \epsilon_{ijk} A_k$, $[B_i, B_j] = \epsilon_{ijk} B_k$, $[A_i, B_j] = 0$. Therefore $so(3, 1) \cong so(3) \oplus so(3)$, and the irreducible algebra representations of so(3, 1) may be obtained from pairs of irreducible algebra representations of so(3).

A.2 PROOF THAT SPACETIMENET IS POINCARÉ-EQUIVARIANT

Consider an arbitrary Poincaré group transformation $\alpha \in P_n$, and write $\alpha = \beta t$, in which $\beta \in SO(n, 1)$ and t is a translation. Suppose we apply this $\alpha$ to the inputs of equation 8 through the representation indexed by $q_0$, $\rho_{q_0}(\alpha)_{st}$, in which s, t index the representation matrices. Then, since the translation t leaves $\Delta X$ invariant, the resulting filters will be

$$\begin{aligned}
\tilde{F}^k_{xijqr} &= \delta_{q q_0} \sum_{r'} \rho_{q_0}(\beta)_{rr'}\, \Delta X_{xijr'} + \sum_{s,t,g} C_{g,qr,q_0 s,q_0 t}\, f^k_{qg} \sum_{s',t'} \rho_{q_0}(\beta)_{ss'}\, \Delta X_{xijs'}\; \rho_{q_0}(\beta)_{tt'}\, \Delta X_{xijt'} \\
&= \delta_{q q_0} \sum_{r'} \rho_{q_0}(\beta)_{rr'}\, \Delta X_{xijr'} + \sum_{g,s',t'} \Big( \sum_{s,t} C_{g,qr,q_0 s,q_0 t}\, \rho_{q_0}(\beta)_{ss'}\, \rho_{q_0}(\beta)_{tt'} \Big) f^k_{qg}\, \Delta X_{xijs'}\, \Delta X_{xijt'} \\
&= \delta_{q q_0} \sum_{r'} \rho_{q_0}(\beta)_{rr'}\, \Delta X_{xijr'} + \sum_{s',t',g,r'} \big( \rho_q(\beta)_{rr'}\, C_{g,qr',q_0 s',q_0 t'} \big) f^k_{qg}\, \Delta X_{xijs'}\, \Delta X_{xijt'} \\
&= \sum_{r'} \rho_q(\beta)_{rr'} \Big( \delta_{q q_0}\, \Delta X_{xijr'} + \sum_{s',t',g} C_{g,qr',q_0 s',q_0 t'}\, f^k_{qg}\, \Delta X_{xijs'}\, \Delta X_{xijt'} \Big) \\
&= \sum_{r'} \rho_q(\beta)_{rr'}\, F^k_{xijqr'},
\end{aligned}$$

where we have used equation 5. The network will be equivariant if each layer update is equivariant. Recall the layer update rule of equation 9:

$$V^{k+1}_{xiqcr} = \sum_{g,l,s,m,t,d,j} C_{g,qr,ls,mt}\, F^k_{xijls}\, V^k_{xjmdt}\, W^k_{qcgd}.$$

Suppose, for the same transformation $\alpha = \beta t$ above, that $V^k$ and $\Delta X$ are transformed by $\alpha$. Because the activations associated with each point are representations of SO(n, 1), they are invariant to the global translation t of the point cloud, and we have

$$\begin{aligned}
\tilde{V}^{k+1}_{xiqcr} &= \sum_{g,l,s,m,t,d,j} \Big( \sum_{s',t'} C_{g,qr,ls',mt'}\, \rho_l(\beta)_{s's}\, \rho_m(\beta)_{t't} \Big) F^k_{xijls}\, V^k_{xjmdt}\, W^k_{qcgd} \\
&= \sum_{g,l,s,m,t,d,j,r'} \big( \rho_q(\beta)_{rr'}\, C_{g,qr',ls,mt} \big) F^k_{xijls}\, V^k_{xjmdt}\, W^k_{qcgd} \\
&= \sum_{r'} \rho_q(\beta)_{rr'}\, V^{k+1}_{xiqcr'},
\end{aligned}$$

where again we applied equation 5.
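The constructions of Appendix A.1 can be verified numerically. The sketch below (ours) uses spin-1/2 so(3) generators for brevity and realizes the two commuting copies of so(3) inside so(3, 1) via Kronecker products, one convenient realization of the direct-sum algebra; the generator conventions follow equations 2 and 3.

```python
import numpy as np

def bracket(a, b):
    return a @ b - b @ a

# so(3) generators in the convention of equation 2, [L_i, L_j] = eps_{ijk} L_k,
# realized as anti-Hermitian spin-1/2 matrices L_k = -i sigma_k / 2.
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
L = [-0.5j * s for s in sigma]

# so(2,1) from so(3): K_x = -i L_x, K_y = -i L_y, J_z = L_z.
Kx, Ky, Jz = -1j * L[0], -1j * L[1], L[2]
assert np.allclose(bracket(Jz, Kx), Ky)        # [J_z, K_x] = K_y
assert np.allclose(bracket(Kx, Ky), -Jz)       # [K_x, K_y] = -J_z

# so(3,1) from two commuting copies of so(3): A_i and B_i act on the two
# tensor factors, and J_i = A_i + B_i, K_i = -i (A_i - B_i).
I2 = np.eye(2)
A = [np.kron(l, I2) for l in L]
B = [np.kron(I2, l) for l in L]
Js = [a + b for a, b in zip(A, B)]
Ks = [-1j * (a - b) for a, b in zip(A, B)]
assert np.allclose(bracket(Js[0], Js[1]), Js[2])    # equation 2
assert np.allclose(bracket(Js[0], Ks[1]), Ks[2])    # equation 3
assert np.allclose(bracket(Ks[0], Ks[1]), -Js[2])   # equation 3
```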

A.3 EQUIVARIANT CONVOLUTIONS

Consider data on a point cloud consisting of a finite set of spacetime points $\{x_i\} \subset \mathbb{R}^4$, a representation $\rho_0 : SO(3, 1) \to \mathbb{R}^{4 \times 4}$ of the Lorentz group defining its action upon spacetime, and feature maps $\{u_i\} \subset \mathbb{R}^m$, $\{v_i\} \subset \mathbb{R}^n$ associated with representations $\rho_u : SO(3, 1) \to \mathbb{R}^{m \times m}$ and $\rho_v : SO(3, 1) \to \mathbb{R}^{n \times n}$. A convolution of this feature map can be written as

$$v_i = \sum_j \kappa(x_j - x_i)\, u_j,$$

in which $\kappa : \mathbb{R}^4 \to \mathbb{R}^{n \times m}$, a matrix-valued function of spacetime, is the filter kernel. $P_3$-equivariance dictates that for any $\alpha \in SO(3, 1)$,

$$\rho_v(\alpha) \sum_j \kappa(x_j - x_i)\, u_j = \sum_j \kappa\big(\rho_0(\alpha)(x_j - x_i)\big)\, \rho_u(\alpha)\, u_j \;\Rightarrow\; \kappa(\Delta x) = \rho_v(\alpha^{-1})\, \kappa(\rho_0(\alpha) \Delta x)\, \rho_u(\alpha). \qquad (12)$$

Therefore a single kernel matrix in $\mathbb{R}^{n \times m}$ may be learned for each orbit of spacetime under the action of SO(3, 1). The orbits are indexed by the invariant $t^2 - x^2 - y^2 - z^2$. The kernel may then be obtained at an arbitrary point $x \in \mathbb{R}^4$ from equation 12 by computing an $\alpha$ that relates it to the orbit representative $x_0$: $x = \rho_0(\alpha) x_0$. A natural choice of orbit representatives for SO(3, 1) acting upon $\mathbb{R}^4$ is the set of points $\{(t, 0, 0, 0) : t \in \mathbb{R}^+\} \cup \{(0, x, 0, 0) : x \in \mathbb{R}^+\} \cup \{(t, ct, 0, 0) : t \in \mathbb{R}^+\}$.

A.4 TENSOR PRODUCT STRUCTURE OF LEARNED SO(3), SO(2, 1), AND SO(3, 1) GROUP REPRESENTATIONS

We quantify the uniqueness of each set of Clebsch-Gordan coefficients in terms of the diagnostic ratio $r(\hat{C})$ defined in equation 7. Recall that the value of r becomes large only if there is a nondegenerate nullspace corresponding to a unique set of Clebsch-Gordan coefficients. For SO(3) and SO(2, 1), the irreducible group representations are labeled by an integer which is sometimes called the spin. We label learned group representations with a primed integer (i'). For the case of SO(3, 1), the irreducible group representations are obtained from two irreducible group representations of so(3) as explained in section A.1, and we label these representations with both spins, i.e. $(s_1, s_2)$. We again label the learned group representations of SO(3, 1) with primed spins, i.e. $(s_1', s_2')$. The tensor product structures of the representations are shown in figure 3.

We have produced a software library titled Lie Algebraic Networks (LAN), built on PyTorch, which derives all Clebsch-Gordan coefficients and computes the forward pass of Lie group equivariant neural networks. LAN also handles Lie algebra representations, allowing for operations such as taking the tensor product of multiple group representations. Figure 5 demonstrates the LAN library: starting from several representations of a Lie algebra, LAN can automatically construct a neural network equivariant to the associated Lie group with the desired number of layers and channels. We present our experimental results training SO(2, 1)- and SO(3, 1)-equivariant object-tracking networks in section 5.2.



Footnotes:
[1] Specifically, the Lie bracket must satisfy the Jacobi identity and $[a, a] = 0$.
[2] The symbol $\epsilon_{ijk}$ simply expresses in equation 2 that $[J_1, J_2] = J_3$, $[J_2, J_3] = J_1$, $[J_3, J_1] = J_2$.
[3] This applies to our experiments learning SO(3) representations, with n = 3.
[4] This is common if a group representation is itself obtained via tensor product.



Figure 1: Activations of an SO(2, 1)-equivariant neural network constructed using our framework. The arrows depict elements of the 3-dimensional representation space, embedded at their associated points within the point cloud. This point cloud is from the MNIST-Live dataset, generated with digits embedded in the x-t plane (the y axis is suppressed). The left plot depicts the "original" activations (with the digit at rest). The right plots show what happens if we transform the point cloud with a Lorentz boost in the ±x direction before feeding it through the network. As dictated by Lorentz-equivariance, the activation vectors generated by the network transform in the same way as the input point cloud.



Figure 5: Our Lie Algebraic Networks (LAN) module handles Lie algebra and Lie group representations, derives Clebsch-Gordan coefficients for the equivariant layer update, and computes the forward pass. This makes it simple to build an equivariant point cloud network once the representations are obtained.

References

Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, and Gabriel J. Brostow. Harmonic networks: Deep translation and rotation equivariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5028-5037, 2017.

Alex Zihao Zhu, Ziyun Wang, and Kostas Daniilidis. Motion equivariant networks for event cameras with the temporal normalization transform. arXiv preprint arXiv:1902.06820, 2019.



Figure 2: LearnRep training curves for SO(3), SO(2, 1), and SO(3, 1). The multiplicative norm penalty is plotted in each lower subplot, and demonstrates that this penalty is important early on in preventing the learning of a trivial representation, but for later iterations stays at its clipped value of 1. Loss is plotted on each upper subplot.


Figure 3: Tensor product structure of the learned group representations $\rho$ with several known (analytically derived) group representations $\rho_1$ for the groups SO(3), SO(2, 1), and SO(3, 1). Each column is for the group indicated at the bottom, each row is for a different choice of $\rho_1$ for that group, and the horizontal axis indicates the $\rho^{(i)}$ onto which we project the tensor product $\rho \otimes \rho_1 \cong \oplus_{i \in I}\, \rho^{(i)}$. The diagnostic r (defined in equation 7) is plotted on the y-axis with a log scale for each subfigure. The labeling of group representations is explained in section 5.1; recall that the primed integers indicate learned representations. The first row demonstrates by Schur's Lemma that, to within numerical error of about $10^{-6}$, the learned SO(3) group representation denoted 1' is isomorphic to the spin-1 irreducible group representation obtained from known formulae, i.e. $1' \cong 1$. The first row also indicates that $1'_{SO(2,1)} \cong 1_{SO(2,1)}$, and similarly for the learned SO(3, 1) representation. The remaining rows indicate that the tensor product structure of the learned group representations matches that of the known irreducible group representations.

