A WIGNER-ECKART THEOREM FOR GROUP EQUIVARIANT CONVOLUTION KERNELS

Abstract

Group equivariant convolutional networks (GCNNs) endow classical convolutional networks with additional symmetry priors, which can lead to a considerably improved performance. Recent advances in the theoretical description of GCNNs revealed that such models can generally be understood as performing convolutions with G-steerable kernels, that is, kernels that satisfy an equivariance constraint themselves. While the G-steerability constraint has been derived, it has to date only been solved for specific use cases; a general characterization of G-steerable kernel spaces is still missing. This work provides such a characterization for the practically relevant case of G being any compact group. Our investigation is motivated by a striking analogy between the constraints underlying steerable kernels on the one hand and spherical tensor operators from quantum mechanics on the other hand. By generalizing the famous Wigner-Eckart theorem for spherical tensor operators, we prove that steerable kernel spaces are fully understood and parameterized in terms of 1) generalized reduced matrix elements, 2) Clebsch-Gordan coefficients, and 3) harmonic basis functions on homogeneous spaces.

1. INTRODUCTION

Undoubtedly, symmetries play a central role in the formulation of physical theories. Any imposed symmetry greatly reduces the set of admissible physical laws and dynamics. Specifically in quantum mechanics, the Hilbert space of a system is equipped with a group representation which specifies the transformation law of system states. Quantum mechanical operators, which map between different states, are required to respect these transformation laws. That is, any symmetry transformation of a state on which they act should lead to a corresponding transformation of the resulting state after their action. This requirement imposes a symmetry constraint on the operators themselves: only specific operators can map between a given pair of states.

The situation in equivariant deep learning is remarkably similar to that in physics. Instead of a physical system, one considers in this case some learning task subject to symmetries. For instance, image segmentation is usually assumed to be translationally symmetric: a shift of the input image should lead to a corresponding shift of the predicted segmentation mask. Convolutional networks guarantee this property via their inherent translation equivariance. In equivariant deep learning, the role of the quantum states is taken by the features in each layer, which, due to the enforced equivariance, are endowed with some transformation law. The analog of quantum mechanical operators, mapping between states, is the neural connectivity, mapping between features of consecutive layers. As in the case of operators, there is a symmetry (equivariance) constraint on the neural connectivity: only specific connectivity patterns guarantee a correct transformation law of the resulting features.

In this work we consider group equivariant convolutional networks (GCNNs), which are convolutional networks that are equivariant w.r.t. symmetries of the space on which the convolution is performed.
Typical examples are isometry equivariant CNNs on Euclidean spaces (Weiler & Cesa, 2019) or spherical CNNs (Cohen et al., 2018). Many different formulations of GCNNs have been proposed; however, it has recently been shown that H-equivariant GCNNs on homogeneous spaces H/G can in a fairly general setting be understood as performing convolutions with G-steerable kernels (Cohen et al., 2019b). Convolutional weight sharing hereby guarantees the equivariance under "translations" of the space, while G-steerability is a constraint on the convolution kernel that ensures its equivariance under the action of the stabilizer subgroup G < H. Although the space of G-steerable kernels has been characterized for specific choices of groups G and feature transformation laws, i.e., group representations ρ (see Section 5), no general solution was known so far. This work characterizes the solution space for arbitrary compact groups G.

Our solution is motivated by the close resemblance of the G-steerability kernel constraint to the defining constraint of spherical tensor operators (or more general representation operators (Jeevanjee, 2011)) in quantum mechanics. The famous Wigner-Eckart theorem describes the general structure of these operators by Clebsch-Gordan coefficients, with the degrees of freedom given by reduced matrix elements. By generalizing this theorem, we find a general characterization and parameterization of G-steerable kernel spaces. For specific examples, like G = SO(3) or compact subgroups of O(2), our kernel space solution specializes to earlier work, e.g., Worrall et al. (2016); Thomas et al. (2018); Weiler & Cesa (2019). Our main contributions are the following:

• We present a generalized Wigner-Eckart theorem (Theorem 4.1) for G-steerable kernels. It describes the general structure of equivariant kernels in terms of 1) endomorphism bases, which generalize reduced matrix elements, 2) Clebsch-Gordan coefficients, and 3) harmonic basis functions on a suitable homogeneous space. In contrast to the usual formulation, we cover any compact group G and both real and complex representations.

• Corollary 4.2 explains how to parameterize G-steerable kernels and thus GCNNs.

• We apply the theorem exemplarily to solve for the kernel spaces of the symmetry groups SO(2), Z/2, SO(3), and O(3), considering both real and complex representations. Thereby we demonstrate that the endomorphism bases, Clebsch-Gordan coefficients, and harmonic basis functions can usually be determined for practically relevant symmetry groups.

2. SYMMETRY-CONSTRAINED OPERATORS AND THEIR MATRIX ELEMENTS

To motivate our generalized Wigner-Eckart theorem, we review quantum mechanical representation operators and G-steerable kernels with an emphasis on the similarity of their underlying symmetry constraints. Due to their symmetries, the matrix elements of such operators and kernels are fully specified by a comparatively small number of reduced matrix elements or learnable parameters, respectively. For representation operators, this reduction is described by the Wigner-Eckart theorem. For clarity, we discuss this theorem in its most popular form, i.e., for spherical tensor operators (SO(3)-representation operators transforming under irreducible representations).

The Representation Operator Constraint   Consider a quantum mechanical system with a symmetry under the action of some group G, for instance rotations. The action of this symmetry group on quantum states is modeled by some unitary G-representation $U: G \to \mathrm{U}(\mathcal{H})$ on the Hilbert space $\mathcal{H}$. More specifically, G acts on kets according to $|\psi\rangle \mapsto |\psi'\rangle := U(g)\,|\psi\rangle$ and on bras according to $\langle\psi| \mapsto \langle\psi'| := \langle\psi|\,U(g)^\dagger$, where $U(g)^\dagger$ is the adjoint of $U(g)$. Observables of the system correspond to self-adjoint operators $A = A^\dagger$. The expectation value of such an observable in some quantum state $|\psi\rangle$ is given by $\langle\psi|A|\psi\rangle \in \mathbb{R}$. The transformation behaviors of states and observables need to be consistent with each other.

As an example, consider a system consisting of a single free particle in $\mathbb{R}^3$, which is (among other symmetries) symmetric under rotations $G = \mathrm{SO}(3)$. The momentum of the particle in the direction of the three frame axes is measured by the three momentum operators $(P_1, P_2, P_3)$. Since the momentum of a classical particle transforms geometrically like a vector, one needs to demand the same for the momentum observable expectation values.
If we denote by $p_i := \langle\psi|P_i|\psi\rangle$ the expected momentum in the $i$-direction, this means that the expected momentum of a rotated system is given by $p_i' = \sum_j R_{ij}\, p_j = \sum_j R_{ij}\, \langle\psi|P_j|\psi\rangle$, where $R \in \mathrm{SO}(3)$ is an element of the rotation group. This result should agree with the expectation values for rotated system states, that is, $p_i' = \langle\psi'|P_i|\psi'\rangle = \langle\psi|U(R)^\dagger P_i\, U(R)|\psi\rangle$. As this argument is independent of the particular choice of state $|\psi\rangle$, and making use of the linearity of the operations, this implies the consistency constraint $\sum_j R_{ij} P_j = U(R)^\dagger P_i\, U(R)$, which identifies the collection $(P_1, P_2, P_3)$ as a vector operator.

Other geometric quantities are required to satisfy similar constraints: for instance, energy is a scalar (i.e., invariant) quantity and the Hamilton operator $H$ is a scalar operator, satisfying $H = U(R)^\dagger H\, U(R)$. Similarly, any matrix-valued classical quantity corresponds to a rank (1,1) Cartesian tensor operator $(M_{ij})_{i,j=1,2,3}$ subject to $\sum_{kl} R_{ik} M_{kl} (R^{-1})_{lj} = U(R)^\dagger M_{ij}\, U(R)$. The overarching framework to study such situations is the notion of a representation operator, which we define as a family of operators $(A_1, \ldots, A_N)$ that satisfy the constraint

$\sum_{j=1}^{N} \pi(g)_{ij}\, A_j = U(g)^\dagger A_i\, U(g) \qquad \forall\, g \in G,$  (1)

where $\pi: G \to \mathrm{U}(\mathbb{C}^N)$ is some unitary representation of the symmetry group under consideration. The examples above correspond to specific choices of representations, namely the trivial representation $\pi(R) = 1$ for scalars, the "standard" representation $\pi(R) = R$ for vectors, and the tensor product representation $\pi(R) = R \otimes (R^{-1})^\top$ for matrices. Spherical tensor operators, discussed below, correspond to the irreps (irreducible representations) of SO(3).

The Steerable Kernel Constraint   Convolution kernels of group equivariant CNNs are required to satisfy a constraint very similar to that in Eq. (1). Before coming to such GCNNs, consider the case of conventional CNNs, processing image-like signals on a Euclidean space $\mathbb{R}^d$.
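The vector operator constraint can be checked numerically. As an illustration (a NumPy sketch, not taken from the paper): the Pauli matrices form a vector operator for the spin-1/2 representation of SU(2), i.e., they satisfy the constraint of Eq. (1) with $\pi(R) = R$ the standard rotation matrix and $U$ the spin-1/2 rotation:

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sigma = [sx, sy, sz]

n = np.array([1.0, 2.0, -0.5])
n /= np.linalg.norm(n)   # rotation axis
theta = 1.234            # rotation angle

# Spin-1/2 representation U = exp(-i theta n.sigma / 2)
ns = n[0] * sx + n[1] * sy + n[2] * sz
U = np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * ns

# Corresponding SO(3) rotation matrix via Rodrigues' formula
Kmat = np.array([[0, -n[2], n[1]], [n[2], 0, -n[0]], [-n[1], n[0], 0]])
R = np.eye(3) + np.sin(theta) * Kmat + (1 - np.cos(theta)) * (Kmat @ Kmat)

# Representation operator constraint, Eq. (1), with pi(R) = R:
for i in range(3):
    lhs = sum(R[i, j] * sigma[j] for j in range(3))
    rhs = U.conj().T @ sigma[i] @ U
    assert np.allclose(lhs, rhs)
```

Here the axis and angle are arbitrary illustrative values; the check passes for any choice.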
Such signals are formalized as $c$-channel feature maps $f: \mathbb{R}^d \to \mathbb{K}^c$ that assign a $c$-dimensional feature vector $f(x) \in \mathbb{K}^c$ to each point $x \in \mathbb{R}^d$, where we allow $\mathbb{K}$ to be either the real or complex numbers $\mathbb{R}$ or $\mathbb{C}$. Each CNN layer maps its input feature map $f_{in}: \mathbb{R}^d \to \mathbb{K}^{c_{in}}$ via a convolution to an output feature map $f_{out} := K \ast f_{in}: \mathbb{R}^d \to \mathbb{K}^{c_{out}}$. Since the convolution maps $c_{in}$ input channels to $c_{out}$ output channels, the kernel $K: \mathbb{R}^d \to \mathbb{K}^{c_{out} \times c_{in}}$ is matrix-valued. Conventional CNNs are translation equivariant; however, it is often desirable that the convolution is equivariant w.r.t. a larger symmetry group, for instance the isometries $\mathrm{E}(d)$ of $\mathbb{R}^d$ (Weiler & Cesa, 2019). For simplicity, we consider semidirect product groups of the form $(\mathbb{R}^d, +) \rtimes G$, where $G \leq \mathrm{GL}(d)$ is any compact group. Group elements $tg \in (\mathbb{R}^d, +) \rtimes G$ are uniquely split into a translation $t \in (\mathbb{R}^d, +)$ and an element $g \in G$, stabilizing the origin. They act on $\mathbb{R}^d$ according to $x \mapsto (tg) \cdot x := gx + t$.

The equivariance of a GCNN, which is the analog of the symmetry of a quantum mechanical system, requires the feature spaces to be endowed with a group action of the symmetry group. A natural choice is to model the feature spaces as spaces of feature fields, for instance scalar, vector, or tensor fields (Cohen & Welling, 2016b). Such feature fields are defined as functions $f: \mathbb{R}^d \to V$, where the difference to conventional feature maps is that the space $V \cong \mathbb{K}^c$ of feature vectors is equipped with a group representation $\rho: G \to \mathrm{GL}(V)$ of the stabilizer $G$. The full symmetry group acts on feature fields according to $f \mapsto (tg) \cdot f := \rho(g) \circ f \circ (tg)^{-1}$, which is known as the induced representation of $\rho$. As proven in (Weiler et al., 2018a), the most general linear and equivariant map from an input field $f_{in}: \mathbb{R}^d \to V_{in}$ to an output field $f_{out}: \mathbb{R}^d \to V_{out}$ is a convolution with a $G$-steerable kernel $K: \mathbb{R}^d \to \mathrm{Hom}_{\mathbb{K}}(V_{in}, V_{out}) \cong \mathbb{K}^{c_{out} \times c_{in}}$.
Such kernels take values in the space of linear operators from $V_{in}$ to $V_{out}$ and are required to satisfy the $G$-steerability (equivariance) constraint

$K(gx) = \rho_{out}(g) \circ K(x) \circ \rho_{in}(g)^{-1} \qquad \forall\, g \in G,\ x \in \mathbb{R}^d.$  (2)

One can easily check that a convolution with a $G$-steerable kernel $K$ is indeed equivariant, i.e., satisfies $K \ast ((tg) \cdot f) = (tg) \cdot (K \ast f)$ for any $tg \in (\mathbb{R}^d, +) \rtimes G$. This result was later generalized to feature fields on homogeneous spaces $H/G$ of unimodular locally compact groups $H$ (Cohen et al., 2019b) and on Riemannian manifolds with structure group $G$ (Cohen et al., 2019a). That the equivariance of the convolutional network requires $G$-steerable kernels in any of these settings underlines the great practical relevance of our results. The two constraints, Eq. (1) and Eq. (2), are remarkably similar: the left-hand sides are in both cases given by a $G$-transformation of the operator or kernel itself, while the right-hand sides are given by pre- and post-composition of the operator or kernel with unitary representations. More details on this comparison can be found in Appendix C.1.3.

The Wigner-Eckart Theorem for Spherical Tensor Operators   All information about a linear operator $A: \mathcal{H} \to \mathcal{H}$ is encoded by its matrix elements $A_{\mu\nu} := \langle\mu|A|\nu\rangle \in \mathbb{C}$ relative to a given basis, where $|\nu\rangle \in \mathcal{H}$ and $\langle\mu| \in \mathcal{H}^*$ denote basis elements of the Hilbert space and its dual. Similarly, all information about a convolution kernel $K$ is encoded by its matrix elements $K_{\mu\nu}(x) := \langle\mu|K(x)|\nu\rangle \in \mathbb{K}$, where $|\nu\rangle \in V_{in}$ and $\langle\mu| \in V_{out}^*$ are elements of chosen bases for the input representation and dual output representation. Considering general operators and kernels, i.e., ignoring the symmetry constraints in Eqs. (1) and (2), all matrix elements are independent degrees of freedom. In the case of convolution kernels, they correspond directly to the $c_{out} \cdot c_{in}$ learnable parameters for every point of the kernel.
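The steerability constraint can be made concrete with a small numerical example (a NumPy sketch under illustrative choices, not code from the paper): for $G = \mathrm{SO}(2)$ with real irreps $\rho_k(\theta) = \mathrm{rot}(k\theta)$, a kernel of the form $K(x) = e^{-r^2}\,\mathrm{rot}((J - l)\,\varphi)$ in polar coordinates $(r, \varphi)$ satisfies the constraint of Eq. (2):

```python
import numpy as np

def rot(a):
    """2x2 rotation matrix; rot(k*a) is the real frequency-k irrep of SO(2)."""
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

J, l = 3, 1  # output / input irrep frequencies (illustrative choice)

def kernel(x):
    """Candidate steerable kernel: radial profile times rot((J-l)*phi)."""
    r, phi = np.hypot(x[0], x[1]), np.arctan2(x[1], x[0])
    return np.exp(-r**2) * rot((J - l) * phi)

x = np.array([0.3, -1.2])
theta = 0.7
gx = rot(theta) @ x  # action of the group element g on the base space

# G-steerability, Eq. (2): K(g x) = rho_J(g) K(x) rho_l(g)^{-1}
lhs = kernel(gx)
rhs = rot(J * theta) @ kernel(x) @ rot(l * theta).T
assert np.allclose(lhs, rhs)
```

Any radial profile works in place of $e^{-r^2}$, reflecting that the constraint only couples kernel values along each orbit.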
However, if $A$ is a representation operator, or if $K$ is a $G$-steerable kernel, the symmetry constraints couple the matrix elements to each other such that they cannot be chosen freely anymore. For representation operators, this statement is made precise by the Wigner-Eckart theorem. The Wigner-Eckart theorem is best known in its classical form, which applies specifically to spherical tensor operators. These operators are the representation operators for the irreps of SO(3), i.e., the Wigner D-matrices $D^j: \mathrm{SO}(3) \to \mathrm{U}(\mathbb{C}^{2j+1})$. As such, spherical tensor operators of rank $j$ are defined as families $T_j = (T_j^{-j}, \ldots, T_j^{j})$ of $2j+1$ operators $T_j^m$ that satisfy the constraint $\sum_{n=-j}^{j} D^j_{mn}(g)\, T_j^n = U(g)^\dagger T_j^m\, U(g)$ for any $g \in \mathrm{SO}(3)$.

In order to express the operators $T_j^m$ in terms of matrix elements, we need to fix a basis of $\mathcal{H}$. Due to the SO(3)-symmetry of $T_j$, a natural choice are the angular momentum eigenstates $|ln\rangle$, where $l \in \mathbb{N}_{\geq 0}$ and $n = -l, \ldots, l$. For fixed quantum numbers $j$, $l$, and $J$, there are $2j+1$ components $T_j^m$ of $T_j$, $2l+1$ basis kets $|ln\rangle$, and $2J+1$ basis bras $\langle JM|$. This implies that there are $(2J+1)(2j+1)(2l+1)$ different matrix elements $\langle JM|T_j^m|ln\rangle \in \mathbb{C}$ for these quantum numbers. According to the Wigner-Eckart theorem, all of these matrix elements are fully specified by one single number (Jeevanjee, 2011):

Theorem 2.1 (Wigner-Eckart Theorem for Spherical Tensor Operators). Let $j, l, J \in \mathbb{N}_{\geq 0}$ and let $T_j$ be a spherical tensor operator of rank $j$. Then there is a unique complex number, the reduced matrix element $\lambda \in \mathbb{C}$ (often written $\langle J \| T_j \| l \rangle \in \mathbb{C}$), that completely determines any of the $(2J+1)(2j+1)(2l+1)$ matrix elements $\langle JM|T_j^m|ln\rangle$ by the relation $\langle JM|T_j^m|ln\rangle = \lambda \cdot \langle JM|jm; ln\rangle$.

The coupling coefficients $\langle JM|jm; ln\rangle$, known as Clebsch-Gordan coefficients, are given by the projection of the tensor product basis $|jm; ln\rangle := |jm\rangle \otimes |ln\rangle$ onto $|JM\rangle$.
They are purely algebraic and therefore independent of the spherical tensor operator $T_j$. This result generalizes to arbitrary representation operators of the form in Eq. (1) (Agrawala, 1980). The similarities between representation operators and $G$-steerable kernels suggest that a similar statement might hold for the matrix elements of $G$-steerable kernels as well. As proven below, this is indeed the case: our generalized Wigner-Eckart theorem separates their independent degrees of freedom from purely algebraic relations between mutually dependent matrix elements. It therefore gives an explicit parameterization of the space of $G$-steerable kernels.
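Theorem 2.1 can be checked numerically for a familiar rank-1 spherical tensor operator: the spherical components of the angular momentum operator itself. The following sketch (using NumPy and SymPy's Clebsch-Gordan coefficients; the spin-1 setting and basis ordering are illustrative choices, not from the paper) verifies that all nonvanishing matrix elements are determined by a single reduced matrix element:

```python
import numpy as np
from sympy.physics.quantum.cg import CG

jspin = 1          # spin-1 representation, basis |1, m> with m = 1, 0, -1
ms = [1, 0, -1]
dim = len(ms)

# Angular momentum matrices in the |j, m> basis (descending m)
Jz = np.diag(ms).astype(complex)
Jp = np.zeros((dim, dim), dtype=complex)   # raising operator J+
for a, m in enumerate(ms):
    if m + 1 in ms:
        Jp[ms.index(m + 1), a] = np.sqrt(jspin * (jspin + 1) - m * (m + 1))
Jm = Jp.conj().T                           # lowering operator J-

# Spherical components of J form a rank-1 spherical tensor operator
T = {+1: -Jp / np.sqrt(2), 0: Jz, -1: Jm / np.sqrt(2)}

# Wigner-Eckart: <j M|T^q_1|j n> = lambda * <j M|1 q; j n> with one single lambda
ratios = []
for q, Tq in T.items():
    for a, M in enumerate(ms):
        for b, n in enumerate(ms):
            cg = float(CG(1, q, jspin, n, jspin, M).doit())
            if abs(cg) > 1e-12:
                ratios.append(Tq[a, b] / cg)
assert np.allclose(ratios, ratios[0])   # one shared reduced matrix element
```

Matrix elements with vanishing Clebsch-Gordan coefficient vanish as well, so the whole $3 \times 3 \times 3$ table of matrix elements is indeed fixed by one number.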

3. BUILDING BLOCKS OF STEERABLE KERNELS

This chapter gives a brief introduction to the mathematical concepts required to formulate our Wigner-Eckart theorem for $G$-steerable kernels. The first two of the following paragraphs explain why it is w.l.o.g. possible to restrict attention to steerable kernels on homogeneous spaces and to irreducible representations. The following three paragraphs discuss the building blocks of steerable kernels, which are endomorphisms, harmonic basis functions described by the Peter-Weyl theorem, and tensor product representations and their Clebsch-Gordan decomposition. An illustration of the concepts introduced in this chapter is given in Appendix A.

The Restriction to Homogeneous Spaces   Convolution kernels are usually defined on a Euclidean space $\mathbb{R}^d$, i.e., they are functions $K: \mathbb{R}^d \to \mathrm{Hom}_{\mathbb{K}}(V_{in}, V_{out})$. The $G$-steerability constraint in Eq. (2) relates kernel values $K(x)$ at $x$ to kernel values $K(gx)$ at all other points $gx$ on the orbit $Gx := \{gx \mid g \in G\}$ of $x$. To solve the constraint, it is therefore w.l.o.g. sufficient to consider restrictions of kernels to the individual orbits, from which the full solution on $\mathbb{R}^d$ can be assembled (Weiler et al., 2018a). By construction, the orbits have the structure of a homogeneous space:

Definition 3.1 (Homogeneous Space, Transitive Action). Let $\cdot: G \times X \to X$ be a continuous action of a compact group $G$ on a topological space $X$. Then $X$ is called a homogeneous space w.r.t. $G$ if $X \neq \emptyset$ and if for all $x, y \in X$ there is a $g \in G$ such that $gx = y$. The action is then called transitive.

We will in the following w.l.o.g. consider steerable kernels $K: X \to \mathrm{Hom}_{\mathbb{K}}(V_{in}, V_{out})$ on such homogeneous spaces $X$.

Restriction to Irreducible Unitary Representations   The theorems below apply specifically to unitary representations, that is, representations for which the automorphisms $\rho(g)$ preserve distances (Knapp, 2002).
As asserted by Theorem B.20, this is not really a restriction, as every finite-dimensional linear representation can be considered as being unitary. Thus, we assume $\rho: G \to \mathrm{U}(V)$, where $\mathrm{U}(V)$ is the unitary group, i.e., the group of distance-preserving linear functions on $V$. In the case of $\mathbb{K} = \mathbb{R}$ we say orthogonal instead of unitary and write $\mathrm{O}(V)$. Additionally, prior research has shown that it is sufficient to solve the kernel constraint in Eq. (2) for irreducible (unitary) input and output representations instead of arbitrary finite-dimensional representations (Weiler & Cesa, 2019). This is possible due to the linearity of the constraint and the fact that any finite-dimensional unitary representation decomposes by Proposition B.38 into an orthogonal direct sum of irreps. The solution for general representations can thus be recovered from the solutions for irreps. More details on these considerations can be found in Section D.1.3.

If two unitary irreps are related by an isometric intertwiner, they are isomorphic; see Definition B.18. The set of isomorphism classes of unitary irreps of $G$ is denoted by $\widehat{G}$. We assume that for each isomorphism class $j \in \widehat{G}$ we have picked a representative irrep $\rho_j: G \to \mathrm{U}(V_j)$. We denote by $d_j$ the dimension of $V_j$, so that we have $V_j \cong \mathbb{K}^{d_j}$. Overall, we can w.l.o.g. replace $\mathbb{R}^d$ with $X$ and $\rho_{in}$ and $\rho_{out}$ by $\rho_l: G \to \mathrm{U}(V_l)$ and $\rho_J: G \to \mathrm{U}(V_J)$, where $X$ is a homogeneous space and $\rho_l$ and $\rho_J$ are (representatives of isomorphism classes of) irreducible unitary representations of $G$. This leads to our working definition of steerable kernels, to which we restrict from now on:

Definition 3.2 (Steerable Kernel on a Homogeneous Space w.r.t. Unitary Irreps). Let $X$ be a homogeneous space of $G$ and let $\rho_l: G \to \mathrm{U}(V_l)$ and $\rho_J: G \to \mathrm{U}(V_J)$ be representatives of isomorphism classes of irreducible unitary representations of $G$. A $G$-steerable kernel (on a homogeneous space and w.r.t.
unitary irreps) is any function $K: X \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$ such that the following $G$-steerability constraint holds:

$K(gx) = \rho_J(g) \circ K(x) \circ \rho_l(g)^{-1} \qquad \forall\, g \in G,\ x \in X.$  (3)

Endomorphisms   An important concept, underlying the reduced matrix elements in the Wigner-Eckart theorem for spherical tensor operators, is that of endomorphisms of linear representations.

Definition 3.3 (Endomorphism of a Linear Representation). Let $\rho: G \to \mathrm{GL}(V)$ be a linear representation. An endomorphism of $\rho$ is a linear map $c: V \to V$ which satisfies $c \circ \rho(g) = \rho(g) \circ c$ for all $g \in G$. The space of all endomorphisms of $\rho$ is written $\mathrm{End}_{G,\mathbb{K}}(V)$.

Endomorphisms play a central role in our generalized Wigner-Eckart theorem for steerable kernels. To get an insight why this is the case, consider a given steerable kernel $K: X \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$. The post-composition $(c \circ K)(x) := c \circ (K(x))$ of this kernel with any endomorphism $c \in \mathrm{End}_{G,\mathbb{K}}(V_J)$ is obviously still steerable, i.e., satisfies Eq. (3). A basis of the space of steerable kernels is therefore partly explained by bases of the endomorphism spaces, and thus occurs in our general solution. In the following, we write $\{c_r \mid r = 1, \ldots, E_J\}$ for a basis of $\mathrm{End}_{G,\mathbb{K}}(V_J)$, where $E_J := \dim(\mathrm{End}_{G,\mathbb{K}}(V_J))$ is the dimension of the endomorphism space.

The Peter-Weyl Theorem and Harmonic Basis Functions   A cornerstone in our proof of the Wigner-Eckart theorem for steerable kernels is Theorem C.7. It states that the space of steerable kernels, which are $G$-equivariant maps $K: X \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$, is isomorphic to the space of linear $G$-equivariant maps of the form $\hat{K}: L^2_{\mathbb{K}}(X) \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$. We are therefore interested in the representation theory of $L^2_{\mathbb{K}}(X)$, which is described by the Peter-Weyl theorem.

Theorem 3.4 (Peter-Weyl Theorem, Existence of Harmonic Basis Functions). Let $G$ be a compact group and $X$ a homogeneous space. Let $\widehat{G}$ be the set of isomorphism classes of irreducible representations.
For $j \in \widehat{G}$, let $\rho_j: G \to \mathrm{U}(V_j)$ be a representative with dimension $d_j = \dim(V_j)$. Then there are multiplicities $m_j \in \mathbb{N}_{\geq 0}$ with $m_j \leq d_j$, and for each $i = 1, \ldots, m_j$ there are harmonic basis functions $Y^m_{ji}: X \to \mathbb{K}$, $m = 1, \ldots, d_j$, such that the following three properties hold:

1. The $Y^m_{ji}$, for fixed $j$ and $i$, are steerable (Freeman & Adelson, 1991; Hel-Or & Teo, 1998), i.e., transformation via $g \in G$ can be expressed by shifting basis coefficients with $\rho_j$: $Y^m_{ji}(g^{-1}x) = \sum_{m'=1}^{d_j} \rho_j^{m'm}(g)\, Y^{m'}_{ji}(x)$.

2. Any square-integrable function $f: X \to \mathbb{K}$ can be uniquely expanded in terms of harmonic basis functions, i.e., $f = \sum_{j \in \widehat{G}} \sum_{i=1}^{m_j} \sum_{m=1}^{d_j} \lambda_{jim}\, Y^m_{ji}$ with coefficients $\lambda_{jim} \in \mathbb{K}$.

3. The $Y^m_{ji}$ form an orthonormal system with respect to the scalar product given by integration: $\int_X \overline{Y^m_{ji}(x)}\, Y^{m'}_{j'i'}(x)\, dx = \delta_{jj'}\, \delta_{ii'}\, \delta_{mm'}$.

Note the similarity of these properties to those encountered in usual Fourier analysis. Indeed, the Peter-Weyl theorem can be viewed as describing the harmonic analysis on arbitrary compact groups and their homogeneous spaces.

Tensor Products and Clebsch-Gordan Coefficients   The last ingredients that we need to discuss are tensor product representations and Clebsch-Gordan coefficients. They appear, roughly speaking, in the following way: the kernel $K$ can be thought of as being built from harmonic basis functions $Y^m_{ji}$ which transform according to the corresponding irrep $\rho_j$. When a harmonic kernel component of type $\rho_j$ acts on an input feature field of type $\rho_l$, the combination will transform according to their tensor product $\rho_j \otimes \rho_l$. If the convolution should map to an output field of type $\rho_J$, not any harmonic component $Y^m_{ji}$ is admissible, but only those for which $\rho_J$ appears as a subrepresentation in the tensor product $\rho_j \otimes \rho_l$. The Clebsch-Gordan coefficients encode whether $\rho_j \otimes \rho_l$ contains $\rho_J$, and, if it does, in which way and how often $\rho_J$ is embedded in the tensor product.
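For $G = \mathrm{SO}(2)$ acting on $X = S^1$ with real irreps, the circular harmonics $(\cos j\varphi, \sin j\varphi)$ make the steerability and orthonormality properties of Theorem 3.4 concrete. A small numerical check (a NumPy sketch under these illustrative choices, not code from the paper):

```python
import numpy as np

j = 2            # irrep frequency
theta = 0.9      # group element g = rotation by theta

def Y(phi):
    """Real circular harmonics of frequency j on X = S^1."""
    return np.array([np.cos(j * phi), np.sin(j * phi)])

def rho(a):
    """Real two-dimensional irrep rho_j of SO(2)."""
    return np.array([[np.cos(j * a), -np.sin(j * a)],
                     [np.sin(j * a),  np.cos(j * a)]])

phi = 1.3
# Steerability (property 1), componentwise Y^m(g^{-1}x) = sum_{m'} rho^{m'm}(g) Y^{m'}(x):
assert np.allclose(Y(phi - theta), rho(theta).T @ Y(phi))

# Orthonormality (property 3), up to the normalization of the measure on S^1;
# a uniform Riemann sum is exact for these trigonometric polynomials:
N = 4000
phis = np.linspace(0, 2 * np.pi, N, endpoint=False)
gram = (Y(phis) @ Y(phis).T) * (2 * np.pi / N) / np.pi
assert np.allclose(gram, np.eye(2))
```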
For more details on the definitions in this section see Appendix D.1.

Definition 3.5 (Tensor Product Representation). Let $\rho: G \to \mathrm{U}(V)$ and $\tilde{\rho}: G \to \mathrm{U}(\tilde{V})$ be unitary representations. Then their tensor product $\rho \otimes \tilde{\rho}: G \to \mathrm{U}(V \otimes \tilde{V})$ is defined by $(\rho \otimes \tilde{\rho})(g)\,(v \otimes \tilde{v}) = \rho(g)(v) \otimes \tilde{\rho}(g)(\tilde{v})$.

The tensor product $\rho_j \otimes \rho_l$ of two irreps is in general not irreducible anymore. However, as it is again a unitary representation, it splits by Proposition B.38 into a direct sum of irreducible unitary subrepresentations. Thus, there is an equivariant isomorphism

$CG_{jl}: V_j \otimes V_l \to \bigoplus_{J \in \widehat{G}} \bigoplus_{s=1}^{[J(jl)]} V_J.$  (4)

The integer $[J(jl)]$ is the multiplicity of $V_J$ in $V_j \otimes V_l$, which is zero for all but finitely many $J$. The matrix elements of $CG_{jl}$ are denoted as Clebsch-Gordan coefficients:

Definition 3.6 (Clebsch-Gordan Coefficients). Let $Y^m_j \otimes Y^n_l$ be the basis tensors in $V_j \otimes V_l$
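For real irreps of SO(2), the Clebsch-Gordan decomposition of Eq. (4) can be written down explicitly: $\mathrm{rot}(ja) \otimes \mathrm{rot}(la) \cong \mathrm{rot}((j+l)a) \oplus \mathrm{rot}((j-l)a)$. The following sketch (NumPy; the basis ordering and signs of the change-of-basis matrix Q are one possible convention, not taken from the paper) verifies this decomposition numerically:

```python
import numpy as np

def rot(a):
    """2x2 rotation matrix."""
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

j, l = 2, 1  # frequencies of the two real SO(2) irreps
a = 0.63     # group element: rotation angle

# Orthonormal Clebsch-Gordan matrix CG_{jl}; columns are the coupled basis
# vectors, written in the tensor basis e1(x)e1, e1(x)e2, e2(x)e1, e2(x)e2.
Q = np.array([[ 1, 0, 1,  0],
              [ 0, 1, 0, -1],
              [ 0, 1, 0,  1],
              [-1, 0, 1,  0]]) / np.sqrt(2)

M = np.kron(rot(j * a), rot(l * a))   # tensor product representation
block = Q.T @ M @ Q                   # representation in the coupled basis
expected = np.zeros((4, 4))
expected[:2, :2] = rot((j + l) * a)   # frequency j+l subrepresentation
expected[2:, 2:] = rot((j - l) * a)   # frequency j-l subrepresentation
assert np.allclose(block, expected)
```

For $j \neq l$ both summands are irreducible, illustrating that each multiplicity $[J(jl)]$ is at most one in this example.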

4. A WIGNER-ECKART THEOREM FOR G-STEERABLE KERNELS

Now that we have discussed all of the required ingredients, we are ready to state our main theorem. Intuitively, our Wigner-Eckart theorem identifies exactly those combinations of harmonics, Clebsch-Gordan coefficients, and endomorphisms that, when assembled together, yield a $G$-steerable kernel $K: X \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$. The kernel will thereby comprise all those harmonics $Y^m_{ji}$ for which the tensor product $V_j \otimes V_l$ contains $V_J$ as a factor. The number of possible combinations therefore depends on the number of different isomorphism classes $j \in \widehat{G}$ for which $V_J$ appears as a factor in the tensor product, the multiplicity $[J(jl)]$ with which it occurs, and the multiplicities $m_j$ of harmonics $Y^m_{ji}$ in the Peter-Weyl decomposition that transform according to $\rho_j$. In addition, each individual combination can subsequently be composed with an endomorphism in $\mathrm{End}_{G,\mathbb{K}}(V_J)$, which increases the number of combinations by a factor of $E_J = \dim(\mathrm{End}_{G,\mathbb{K}}(V_J))$ to a total of $\Lambda_{Jl} := E_J \cdot \sum_{j \in \widehat{G}} [J(jl)] \cdot m_j$. This number is finite, as we explain in Remark D.18.

How are such assembled steerable kernels parameterized? The learnable parameters correspond to the degrees of freedom in the individual components from which the kernel is built. While the Clebsch-Gordan coefficients and harmonic basis functions are fixed, the endomorphisms are elements of the $E_J$-dimensional vector spaces $\mathrm{End}_{G,\mathbb{K}}(V_J)$. The degrees of freedom of a $G$-steerable kernel are therefore identified with the choice of endomorphisms. This gives a total of $\Lambda_{Jl}$ parameters which take values in $\mathbb{K}$. Note that the choice of endomorphisms corresponds directly to the choice of reduced matrix elements of spherical tensor operators.

For a kernel $K: X \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$, we write $\langle JM|K(x)|ln\rangle$ for the matrix elements of $K(x) \in \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$ with indices $n \leq d_l$ and $M \leq d_J$, see also Definition D.9. Similarly, endomorphisms $c \in \mathrm{End}_{G,\mathbb{K}}(V_J)$ have matrix elements $\langle JM|c|JM'\rangle$ with $M, M' \leq d_J$.
We furthermore write $\langle i, jm|x\rangle := Y^m_{ji}(x)$. Finally, we denote the space of $G$-steerable kernels by $\mathrm{Hom}_G(X, \mathrm{Hom}_{\mathbb{K}}(V_l, V_J))$. Our main result is the following Wigner-Eckart theorem for $G$-steerable kernels. Other versions at different levels of abstraction can be found in Theorems D.13 and D.16.

Theorem 4.1 (Wigner-Eckart Theorem for Steerable Kernels). There is a vector space isomorphism

$\mathrm{GKer}: \bigoplus_{j \in \widehat{G}} \bigoplus_{i=1}^{m_j} \bigoplus_{s=1}^{[J(jl)]} \mathrm{End}_{G,\mathbb{K}}(V_J) \to \mathrm{Hom}_G(X, \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)).$  (5)

A general steerable kernel $K = \mathrm{GKer}((c_{jis})_{jis})$ with $c_{jis} \in \mathrm{End}_{G,\mathbb{K}}(V_J)$ has matrix elements

$\underbrace{\langle JM|K(x)|ln\rangle}_{\text{kernel matrix elements}} = \sum_{j \in \widehat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \sum_{m=1}^{d_j} \sum_{M'=1}^{d_J} \underbrace{\langle JM|c_{jis}|JM'\rangle}_{\text{endomorphisms}} \cdot \underbrace{\langle s, JM'|jm; ln\rangle}_{\text{Clebsch-Gordan}} \cdot \underbrace{\langle i, jm|x\rangle}_{\text{harmonics}}.$  (6)

Proof. We shortly sketch a proof of this theorem. We use the notation $\mathrm{Hom}_{G,\mathbb{K}}$ to denote linear equivariant maps. The space of steerable kernels can be progressively transformed as follows:

$\mathrm{Hom}_G(X, \mathrm{Hom}_{\mathbb{K}}(V_l, V_J))$
$\overset{(1)}{\cong} \mathrm{Hom}_{G,\mathbb{K}}(L^2_{\mathbb{K}}(X),\ \mathrm{Hom}_{\mathbb{K}}(V_l, V_J))$
$\overset{(2)}{\cong} \mathrm{Hom}_{G,\mathbb{K}}\big(\bigoplus_{j \in \widehat{G}} \bigoplus_{i=1}^{m_j} V_{ji},\ \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)\big)$
$\overset{(3)}{\cong} \bigoplus_{j \in \widehat{G}} \bigoplus_{i=1}^{m_j} \mathrm{Hom}_{G,\mathbb{K}}(V_j,\ \mathrm{Hom}_{\mathbb{K}}(V_l, V_J))$
$\overset{(4)}{\cong} \bigoplus_{j \in \widehat{G}} \bigoplus_{i=1}^{m_j} \mathrm{Hom}_{G,\mathbb{K}}(V_j \otimes V_l,\ V_J)$
$\overset{(5)}{\cong} \bigoplus_{j \in \widehat{G}} \bigoplus_{i=1}^{m_j} \mathrm{Hom}_{G,\mathbb{K}}\big(\bigoplus_{J' \in \widehat{G}} \bigoplus_{s=1}^{[J'(jl)]} V_{J'},\ V_J\big)$
$\overset{(6)}{\cong} \bigoplus_{j \in \widehat{G}} \bigoplus_{i=1}^{m_j} \bigoplus_{s=1}^{[J(jl)]} \mathrm{Hom}_{G,\mathbb{K}}(V_J, V_J)$
$\overset{(7)}{\cong} \bigoplus_{j \in \widehat{G}} \bigoplus_{i=1}^{m_j} \bigoplus_{s=1}^{[J(jl)]} \mathrm{End}_{G,\mathbb{K}}(V_J)$

In (1), we linearize the kernels such that they become representation operators, as detailed in Theorem C.7. Step (2) applies the representation-theoretic version of the Peter-Weyl Theorem B.22 to decompose $L^2_{\mathbb{K}}(X)$ into harmonic basis functions. Step (3) makes use of the well-known fact that linear maps can be described on each direct summand individually; topological details are explained in Lemma D.20. In (4), we use the hom-tensor adjunction, Proposition D.23. In (5), we use the Clebsch-Gordan decomposition, Eq. (4), which provides us with Clebsch-Gordan coefficients.
In (6), we use that nontrivial linear equivariant maps from $V_{J'}$ to $V_J$ exist by Schur's Lemma B.29 only for $J' = J$ and, once again, that we can describe linear maps on each direct summand individually. Finally, in (7) we note that $\mathrm{Hom}_{G,\mathbb{K}}(V_J, V_J) = \mathrm{End}_{G,\mathbb{K}}(V_J)$ is the space of endomorphisms. The formula for the matrix elements, Eq. (6), is fully proven in Theorem D.13 by carefully tracing back all the isomorphisms above.

Technically, step (1) is the main gap that we had to bridge: it establishes that non-linear kernels on $X$ can be seen as linear representation operators on $L^2_{\mathbb{K}}(X)$. Steps (2) to (7) orient at the proof of the Wigner-Eckart theorem for representation operators by Agrawala (1980). However, our proof differs non-trivially from the reference by a) allowing the operator to be non-injective, b) topological considerations, since $L^2_{\mathbb{K}}(X)$ is not simply a direct sum of irreps but its topological closure, and c) the possibility to allow for real representations, which is why we end up with endomorphisms.

We obtain the following corollary, which clarifies how steerable kernels can be parameterized:

Corollary 4.2. The space $\mathrm{Hom}_G(X, \mathrm{Hom}_{\mathbb{K}}(V_l, V_J))$ of steerable kernels is spanned by basis kernels $\{K_{jisr}: X \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J) \mid j \in \widehat{G},\ i \leq m_j,\ s \leq [J(jl)],\ r \leq E_J\}$ with matrix elements

$\langle JM|K_{jisr}(x)|ln\rangle = \sum_{m=1}^{d_j} \sum_{M'=1}^{d_J} \langle JM|c_r|JM'\rangle \cdot \langle s, JM'|jm; ln\rangle \cdot \langle i, jm|x\rangle.$  (7)

A matrix expression of the basis kernels from Eq. (7) is given in Eq. (22). Here, $c_r$ is one of the $E_J$ basis endomorphisms of $\mathrm{End}_{G,\mathbb{K}}(V_J)$. This means that a general steerable kernel $K: X \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$ is of the form $K = \sum_{j \in \widehat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \sum_{r=1}^{E_J} \lambda_{jisr} \cdot K_{jisr}$ with a total of $\Lambda_{Jl} = E_J \cdot \sum_{j \in \widehat{G}} [J(jl)] \cdot m_j$ learnable parameters $\lambda_{jisr} \in \mathbb{K}$. Overall, the kernel space can therefore be parameterized with an isomorphism $\mathrm{GKer}: \mathbb{K}^{\Lambda_{Jl}} \to \mathrm{Hom}_G(X, \mathrm{Hom}_{\mathbb{K}}(V_l, V_J))$.

Proof.
We simply choose $K_{jisr} := \mathrm{GKer}((c^{jisr}_{j'i's'})_{j'i's'})$ with $c^{jisr}_{j'i's'} = \delta_{j'j} \cdot \delta_{i'i} \cdot \delta_{s's} \cdot c_r$. Clearly, the $c^{jisr}$ are a basis of $\bigoplus_{j \in \widehat{G}} \bigoplus_{i=1}^{m_j} \bigoplus_{s=1}^{[J(jl)]} \mathrm{End}_{G,\mathbb{K}}(V_J)$, and since $\mathrm{GKer}$ is an isomorphism, the $K_{jisr}$ form a basis of steerable kernels.

Remark 4.3. The matrix elements $\langle JM|c_{jis}|JM'\rangle$ relate to the reduced matrix elements $\lambda \in \mathbb{C}$ of spherical tensor operators as follows: in the case of spherical tensor operators one deals with complex irreps, whose endomorphism spaces are according to Schur's Lemma D.8 generated by the identity. Consequently, such endomorphisms $c$ have matrix elements $\langle JM|c|JM'\rangle = \lambda\, \delta_{MM'}$ for some scaling factor $\lambda \in \mathbb{C}$. This $\lambda$ is denoted as the reduced matrix element of the spherical tensor operator. The analog of $\lambda$ in our Wigner-Eckart theorem are the learnable parameters $\lambda_{jisr} \in \mathbb{K}$.
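As a minimal end-to-end illustration of Corollary 4.2 (a NumPy sketch under illustrative choices, not code from the paper), the following assembles a steerable kernel for $G = \mathrm{SO}(2)$ with real irreps on $X = S^1$ from the three ingredients: harmonics of frequency $j$, the Clebsch-Gordan matrix of $\mathrm{rot}(ja) \otimes \mathrm{rot}(la)$, and an endomorphism of $V_J$ (for real SO(2) irreps of nonzero frequency, $E_J = 2$, spanned by the identity and a rotation by $\pi/2$), and then checks $G$-steerability:

```python
import numpy as np

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

# Setting: input irrep l = 1, output irrep J = 2, harmonic frequency j = 1,
# chosen such that V_J appears in V_j (x) V_l (since j + l = J).
j, l, J = 1, 1, 2

# Clebsch-Gordan matrix for rot(j*a) (x) rot(l*a) = rot((j+l)a) + rot((j-l)a);
# rows indexed by tensor basis pairs (m, n), columns by the coupled basis.
Q = np.array([[ 1, 0, 1,  0],
              [ 0, 1, 0, -1],
              [ 0, 1, 0,  1],
              [-1, 0, 1,  0]]) / np.sqrt(2)

# A "learnable" endomorphism of V_J, c = 0.8 * id + 0.5 * rot(pi/2)
c = 0.8 * np.eye(2) + 0.5 * rot(np.pi / 2)

def Y(phi):
    """Harmonic basis functions of frequency j on S^1."""
    return np.array([np.cos(j * phi), np.sin(j * phi)])

def kernel(phi):
    """Assemble <JM|K(phi)|ln> = sum_{m,M'} c_{MM'} <JM'|jm;ln> Y^m(phi), Eq. (7)."""
    K = np.zeros((2, 2))
    for M in range(2):
        for n in range(2):
            for m in range(2):
                for Mp in range(2):
                    # Clebsch-Gordan coefficient <J Mp | j m; l n> = Q[2m + n, Mp]
                    K[M, n] += c[M, Mp] * Q[2 * m + n, Mp] * Y(phi)[m]
    return K

# Verify G-steerability: K(g.x) = rho_J(g) K(x) rho_l(g)^{-1}
phi, theta = 0.4, 1.1
assert np.allclose(kernel(phi + theta),
                   rot(J * theta) @ kernel(phi) @ rot(l * theta).T)
```

Varying the endomorphism $c$ over its two-dimensional space, and repeating the construction for the other admissible frequency $j = J + l$, sweeps out a basis of the full steerable kernel space in this setting.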

5. RELATED WORK

Harmonic convolution kernels date back to at least the early '80s (Hsu & Arsenault, 1982; Rosen & Shamir, 1988). The term steerable filter was coined in Freeman & Adelson (1991). Hel-Or & Teo (1998) generalized steerable filters to Lie groups. Reisert & Burkhardt (2007) proposed matrix-valued steerable kernels between representation spaces, which are similar to our G-steerable kernels. Steerable CNNs formulate GCNNs in the language of representation theory and feature fields. This design was proposed by Cohen & Welling (2016b), who specifically considered finite groups, for which the kernel constraint can be solved numerically. Weiler et al. (2018a) introduced the G-steerability constraint in the form of Eq. (2) for G = SO(3). The authors chose a slightly different approach to solve the constraint, in which they decompose the space $\mathrm{Hom}_{\mathbb{R}}(V_l, V_J) \cong V_l^* \otimes V_J$ instead of $V_j \otimes V_l$ via Clebsch-Gordan coefficients. An essentially equivalent design was simultaneously proposed by Thomas et al. (2018), who decomposed $V_j \otimes V_l$ as in the present work, see Appendix E.5. The case of complex-valued irreps of SO(2) was investigated by Worrall et al. (2016) and Wiersma et al. (2020); see Appendix E.1. Weiler & Cesa (2019) solve the constraint for any, not necessarily irreducible, representation of the groups O(2), SO(2), $D_N$, and $C_N$. Their solution strategy is based on an expansion of the kernel in the Fourier basis of $L^2_{\mathbb{R}}(S^1)$ and solving for the Fourier coefficients satisfying the constraint. This is a special case of the strategy that we employ in the proof of our Wigner-Eckart theorem. de Haan et al. (2020) solve for SO(2)-steerable kernels by viewing them as invariants of the tensor product representation $L^2_{\mathbb{R}}(S^1)^* \otimes V_l^* \otimes V_J$. As they use real-valued irreps, they can use that the duals are isomorphic to their original counterparts. Our Wigner-Eckart theorem unifies all of these results in one general framework.
To which use cases does the proposed kernel space solution apply? As argued by Cohen et al. (2019b), any H-equivariant convolutional network on a homogeneous space H/G needs to satisfy a G-steerability constraint, provided that H is locally compact and unimodular. While these works proved the necessity of steerable kernels, they did not solve the constraint, a gap which is filled by our Wigner-Eckart theorem for compact groups G; see also Remark D.15. This framework includes in particular the popular group convolutions on flat spaces (Cohen & Welling, 2016a) and homogeneous spaces of compact groups (Kondor & Trivedi, 2018) and Lie groups (Bekkers, 2020), including for instance the sphere (Cohen et al., 2018). Specifically, if ρ_in and ρ_out are chosen to be regular representations L²_K(G), steerable convolutions are equivalent to group convolutions (Weiler & Cesa, 2019). A related line of work are Clebsch-Gordan networks (Kondor et al., 2018; Kondor, 2018; Anderson et al., 2019; Bogatskiy et al., 2020). They apply bilinear equivariant nonlinearities which compute the tensor products of global irrep features. A subsequent Clebsch-Gordan decomposition disentangles the product features back into irrep features. Note that in this network design, the Clebsch-Gordan coefficients are used in the nonlinear part, which differs from our use of these coefficients in the construction of steerable basis kernels, i.e., in the linear part of the network.

6. EXAMPLE APPLICATIONS

Cohen et al. (2019b) showed in a fairly general setting that every GCNN is based on G-steerable kernels. In practice, a basis for the space of G-steerable kernels needs to be determined for parameterizing GCNNs.
This work determines the general structure of these basis kernels for compact (point-)symmetry groups G and their homogeneous spaces X: Corollary 4.2 explains that one needs to determine 1) the irreps ρ_l of G, 2) harmonic basis functions Y^m_ji in L²_K(X) according to the Peter-Weyl Theorem 3.4, 3) the Clebsch-Gordan decomposition of V_j ⊗ V_l, given by the Clebsch-Gordan coefficients ⟨s, JM|jm; ln⟩, and 4) a basis of endomorphisms c_r ∈ End_{G,K}(V_J) for each irrep label J. Given these ingredients, they can in a fifth step be put together according to Eq. (7) to obtain a complete, Λ_Jl-dimensional basis of G-steerable kernels K_jisr : X → Hom_K(V_l, V_J). Appendix E demonstrates this for the examples of G being SO(2), SO(3), O(3), and Z/2, considering both real and complex irreps. In each of these cases, we derive the kernel bases following exactly the five steps outlined above. This procedure can easily be applied to further compact groups, for instance SU(2) or SU(3), which play an important role in physics applications of deep learning.
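As a sanity check of this recipe in the simplest setting, the following sketch verifies the G-steerability constraint numerically for G = SO(2) acting on the circle S¹, using the real irreps ρ_l (rotation matrices of frequency l). All function and variable names are illustrative, not taken from an existing library. Since S¹ is a single SO(2)-orbit with trivial stabilizers, every steerable kernel takes the form K(x) = ρ_J(x) A ρ_l(x)^{-1} for an arbitrary matrix A:

```python
import numpy as np

def rho(freq, phi):
    """Real irrep of SO(2) with frequency freq (illustrative helper)."""
    if freq == 0:
        return np.eye(1)
    c, s = np.cos(freq * phi), np.sin(freq * phi)
    return np.array([[c, -s], [s, c]])

# On the single orbit S^1 (a torsor of SO(2)), a steerable kernel
# K : S^1 -> Hom(V_l, V_J) is determined by its value A = K(0):
#   K(x) = rho_J(x) @ A @ rho_l(x)^{-1},  with rho_l(x)^{-1} = rho_l(x)^T.
J, l = 3, 1
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))
K = lambda x: rho(J, x) @ A @ rho(l, x).T

# Steerability constraint: K(phi + x) = rho_J(phi) K(x) rho_l(phi)^{-1}.
phi, x = 0.7, 1.9
assert np.allclose(K(phi + x), rho(J, phi) @ K(x) @ rho(l, phi).T)
```

For general homogeneous spaces the kernel is no longer free at the base point, which is where the Clebsch-Gordan machinery of the five steps above becomes necessary.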

7. CONCLUSIONS AND FUTURE WORK

Prior work revealed that group equivariant convolutions generally rely on G-steerable kernels. Our Wigner-Eckart theorem for G-steerable kernels characterizes them for the practically relevant case of G being any compact group. The degrees of freedom, that is, the learnable parameters, thereby correspond precisely to the choice of endomorphisms. This mirrors the situation in quantum mechanics, where the degrees of freedom of spherical tensor operators are given by reduced matrix elements. It would be desirable to extend this result to non-compact groups, for which the Peter-Weyl theorem does not hold anymore. One alternative might be Pontryagin duality (Reiter, 1968), which describes the Fourier transform on locally compact abelian groups. Furthermore, for many non-compact, non-abelian groups, one can often find a direct integral decomposition of L²_C(G). This generalization of the Peter-Weyl theorem can be found in Segal (1950) and Mautner (1955). Such generalizations of our Wigner-Eckart theorem might lead to a better theoretical understanding of several recent works (Worrall & Welling, 2019; Bekkers, 2020; Sosnovik et al., 2020; Shutty & Wierzynski, 2020). Finally, we hope that the analogies between steerable kernels and representation operators appearing in physics inspire further research in this fascinating cross-disciplinary domain. This could lead to applications of GCNNs for learning tasks with physical symmetries.

APPENDIX

This appendix contains a detailed and rigorous treatment of the Wigner-Eckart theorem for steerable kernels, including background knowledge, proofs, and many example applications. In Chapter A, we briefly look at the simple example SO(2) to motivate the concepts and results of Section 3. Everything afterwards, starting with Chapter B, can be read independently of the main paper and is a self-contained treatment of our investigations. In Chapter B, we start with the foundations of the representation theory of compact groups. We formulate the Peter-Weyl Theorem B.22, which tells us how to decompose the space of square-integrable functions on a homogeneous space into irreducible representations, leading to harmonic basis functions. In the second half, we include a proof of the more algebraic parts of this theorem. We do this since the theorem is usually only proven for complex representations in the literature, but we need it for real representations as well. In Chapter C, we investigate steerable kernels and show their similarities to representation operators from physics and representation theory. In Theorem C.7, we then prove a precise isomorphism between steerable kernels and representation operators on the space of square-integrable functions on a homogeneous space. We call the latter kernel operators. In Chapter D, we then formulate and prove Theorem D.13, the Wigner-Eckart theorem for steerable kernels of general compact groups. The proof makes essential use of the Peter-Weyl Theorem and Theorem C.7, and additionally of Schur's Lemma B.29, the hom-tensor adjunction of Proposition D.23, and the Clebsch-Gordan decomposition of tensor products. In Chapter E, we then look at specific example applications of our theory. In these examples, we consider specific compact transformation groups G, specific relevant homogeneous spaces X of the group, and one of the fields R or C.
For this combination we derive a basis for the space of steerable kernels between arbitrary irreducible input and output representations of the group. Specifically, we look at harmonic networks (Worrall et al., 2016), SO(2)-equivariant networks for real representations (Weiler & Cesa, 2019), Z/2-equivariant networks for real representations, SO(3)-equivariant networks for both real and complex representations (Weiler et al., 2018a; Thomas et al., 2018), and O(3)-equivariant networks for both real and complex representations. The investigation of Z/2-equivariant CNNs will additionally show that our result is consistent with group convolutional CNNs for the regular representation (Cohen & Welling, 2016a). In Chapter F, we summarize some important notions and results from the theory of topological spaces, metric spaces, normed vector spaces, and (pre-)Hilbert spaces that we use throughout this appendix. Chapters B, C, and D contain the bulk of the theoretical work. We recommend the reader to first only read the first halves of these chapters, Sections B.1, C.1 and D.1, since they contain the formulation of the most important results and the main intuitions, whereas the second halves of these chapters, i.e., Sections B.2, C.2 and D.2, mainly contain detailed proofs that can be skipped when going over the material for the first time.

NOTATION

V_l – vector space on which ρ_l acts
v^i_l or Y^n_l – fixed chosen orthonormal basis vector of V_l

VECTOR SPACES AND HILBERT SPACES

dim(V) – dimension of the K-vector space V
V ⊥ W – V and W are perpendicular
V ≅ W – V and W are isomorphic with respect to their structures
V ≇ W – V and W are not isomorphic with respect to their structures
⟨f|g⟩ – bra-ket notation for a scalar product on a Hilbert space


⟨y|f|x⟩ – equivalent to ⟨y|f(x)⟩ for a function f
null(f) – null space of f
im(f) – image of f
f* – adjoint of the operator f
id_V – identity function on V

(HILBERT) SPACE CONSTRUCTIONS FROM OTHER SPACES

Hom_K(V, W) – space of K-linear functions from V to W
GL(V) – group of invertible K-linear functions from V to itself, sometimes written GL(V, K) in the literature
Hom_{G,K}(V, W) – space of intertwiners from V to W
Hom_G(X, W) – space of G-equivariant continuous maps from X to W, for a homogeneous space X
End_{G,K}(V) – space of endomorphisms of V, i.e., intertwiners from V to V
V ⊗ W – tensor product of two vector spaces over their common field; also denotes the tensor product of pre-Hilbert spaces
⊕_{i∈I} V_i – (orthogonal) direct sum of all V_i
⊕_{i∈I} V_i (with closure bar) – topological closure of the (orthogonal) direct sum of all V_i
span_K(M) – vector subspace of a K-vector space spanned by M
V^⊥ – orthogonal complement of V
E_λ(φ) – eigenspace of φ for eigenvalue λ

TOPOLOGICAL SPACES, METRIC SPACES, NORMED SPACES

T – topology
U_x – open neighborhood of x ∈ X
𝒰_x – set of all open neighborhoods of x ∈ X
lim_{U∈𝒰_x} – limit over the directed set of open neighborhoods of x
lim_{k→∞} x_k – limit of the sequence (x_k)_k
Ā – topological closure of A ⊆ X
‖x‖ – norm of x
|x| – absolute value of x
d(x, x′) – distance of x and x′ according to the metric d
B_ε(x) – ε-ball around x according to some metric d

HOMOGENEOUS SPACES AND THE PETER-WEYL THEOREM

X – a homogeneous space of G
x* ∈ X – arbitrary point
S^n – n-dimensional sphere in (n+1)-dimensional space
µ – a measure on a compact group G or its homogeneous space X
∫_X – integral on a space X with respect to its measure
L²_K(X), L²_K(G) – Hilbert spaces of square-integrable functions on X and G with values in K
λ – unitary representation on L²_K(X) or L²_K(G)
g(x) – arbitrary lift of x with respect to the projection π : G → X, g ↦ gx*
av(f) – average of f : G → K along cosets
π* – lift of functions L²_K(X) → L²_K(G)
δ_x – Dirac delta function at the point x
δ_U – approximated Dirac delta function for a nonempty open set U
ρ^{ij}_l – matrix coefficient of ρ_l with respect to orthonormal basis vectors v^i, v^j ∈ V_l
E – linear span of all matrix coefficients of irreducible unitary representations
E_l – linear span of all matrix coefficients of ρ_l
E^j_l – linear span of all matrix coefficients ρ^{ij}_l with varying i but fixed j
n_l, m_l – multiplicities of l in the orthogonal decompositions of L²_K(G) and L²_K(X), respectively
V_li – copy of V_l appearing in the Peter-Weyl decomposition of L²_K(X)
p_li – canonical projections p_li : L²_K(X) → V_li and p_li : ⊕_{l′i′} V_{l′i′} → V_li
sin_m, cos_m – the functions x ↦ sin(mx) and x ↦ cos(mx)
Y^n_l, rY^n_l – complex- and real-valued versions of a spherical harmonic
D_l, rD_l – complex- and real-valued versions of the Wigner D-matrix

KERNELS AND REPRESENTATION OPERATORS

K – kernel K : X → Hom_K(V_in, V_out)
K ⋆ f – convolution of the kernel K with an input f
𝒦 – kernel operator or, more generally, representation operator 𝒦 : T → Hom_K(U, V)
𝒦 – kernel operator 𝒦 : L²_K(X) → Hom_K(V_in, V_out) corresponding to a kernel K
𝒦|_X – kernel 𝒦|_X : X → Hom_K(V_in, V_out) corresponding to a kernel operator 𝒦
𝒦 (under the hom-tensor adjunction) – for a representation operator 𝒦 : T → Hom_K(U, V), this denotes the corresponding map T ⊗ U → V

THE WIGNER-ECKART THEOREM

ρ_l, ρ_J – input and output representations on the spaces V_l and V_J
Y^m_j, Y^n_l, Y^M_J – fixed chosen orthonormal basis vectors of the abstract irreducible representations V_j, V_l, V_J
⟨JM|K(x)|ln⟩ – matrix element of K(x) for a kernel K and x ∈ X
d_l – dimension of the l-th irrep V_l as a K-vector space
m_j – number of times V_j appears in the Peter-Weyl decomposition of L²_K(X)
[J(jl)] – number of times V_J appears in the direct sum decomposition of V_j ⊗ V_l
c, c_jis – endomorphisms, mostly of V_J; the c_jis are endomorphisms appearing in the Wigner-Eckart theorem for steerable kernels
c_r – basis endomorphisms of ρ_J, indexed by r = 1, …, E_J
⟨JM′|c|JM⟩ – matrix element at indices M′, M of an endomorphism c
l_s, l_jis – linear equivariant isometric embeddings l_s : V_J → V_j ⊗ V_l and l_jis : V_J → V_ji ⊗ V_l
p_jis – projection p_jis : V_ji ⊗ V_l → V_J corresponding to l_jis

A THE EXAMPLE SO(2)

SO(2)-steerable kernels K : R² → Hom_R(V_in, V_out) ≅ R^{c_out×c_in} allow for rotation equivariant convolutions. For instance, a convolution with an SO(2)-steerable kernel on R² is guaranteed to be SE(2) = (R², +) ⋊ SO(2) equivariant, while a convolution with an SO(2)-steerable kernel on S² will be SO(3)-equivariant.

Homogeneous Spaces SO(2) acts on the kernel's domain R² by rotating it. The orbits of the action are therefore given by 1) the origin {0} and 2) circles of arbitrary radius. We know that the kernel constraint can be solved on each orbit individually, so we can restrict to looking at those. Since {0} is rather trivial, we specifically consider the circle S¹ as a more interesting homogeneous space. Example A.1. Consider the circle S¹ and the rotation group SO(2). For convenience, we reparameterize both: we view SO(2) as the group of angles φ ∈ R/2πZ ≅ [0, 2π]/(0∼2π) and S¹ as the space R/2πZ as well. Then the action of SO(2) on S¹ is given by φ • x := (φ + x) mod 2π. It is easy to see that this action is transitive, which makes the circle a homogeneous space of SO(2).

Irreducible Representations As it is sufficient to solve the kernel constraint for irreducible orthogonal input and output representations, we now state a classification of those up to isomorphism. Example A.2. The irreducible orthogonal representations ρ_l : SO(2) → O(V_l) of SO(2) are labeled by indices ("quantum numbers") l ∈ N_{≥0}. For l = 0, one has the trivial representation with V_0 = R and ρ_0(φ) = id_R. For l ≥ 1, one has V_l = R² and

ρ_l(φ) = (cos(lφ) −sin(lφ); sin(lφ) cos(lφ)),

i.e., rotation matrices of "frequency l". The isomorphism classes of irreducible orthogonal representations are thus indexed by N_{≥0}.
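The homomorphism and orthogonality properties of these rotation matrices are easy to confirm numerically; the sketch below (plain NumPy, all names illustrative) checks both for a few frequencies:

```python
import numpy as np

def rho(l, phi):
    # Irreducible orthogonal representation of SO(2) of frequency l.
    if l == 0:
        return np.eye(1)
    c, s = np.cos(l * phi), np.sin(l * phi)
    return np.array([[c, -s], [s, c]])

phi1, phi2 = 0.4, 1.1
for l in range(4):
    d = rho(l, 0.0).shape[0]
    # Homomorphism property: rho_l(phi1) rho_l(phi2) = rho_l(phi1 + phi2).
    assert np.allclose(rho(l, phi1) @ rho(l, phi2), rho(l, phi1 + phi2))
    # Orthogonality: rho_l(phi)^T rho_l(phi) = id.
    assert np.allclose(rho(l, phi1).T @ rho(l, phi1), np.eye(d))
```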
We are thus in the following considering SO(2)-steerable kernels of the form K : S¹ → Hom_R(V_l, V_J), where l, J ≥ 0. Endomorphisms Remember that if c : V_J → V_J is an endomorphism, i.e., commutes with ρ_J, then c • K is steerable as well. Thus, we now look at a classification of the endomorphisms of the irreducible orthogonal representations: Example A.3. Let G = SO(2) with the irreducible representations ρ_J as in Example A.2. Clearly, the endomorphism space End_{SO(2),R}(V_0) is 1-dimensional, i.e., E_0 = 1. For all J ≥ 1, the endomorphism space is two-dimensional (E_J = 2) and given by combinations of scalings and rotations on V_J = R². A basis of this space is given by the following two matrices:

c_1 = id_{R²} = (1 0; 0 1), c_2 = (0 −1; 1 0), span_R{c_1, c_2} = End_{SO(2),R}(V_J).

That c_1 is an endomorphism of ρ_J for J ≥ 1 is immediately clear. That the same holds for c_2 is checked by the following simple calculation:

c_2 ρ_J(φ) = (0 −1; 1 0)(cos(Jφ) −sin(Jφ); sin(Jφ) cos(Jφ)) = (−sin(Jφ) −cos(Jφ); cos(Jφ) −sin(Jφ)) = (cos(Jφ) −sin(Jφ); sin(Jφ) cos(Jφ))(0 −1; 1 0) = ρ_J(φ) c_2.

The proof that there are no other endomorphisms is sketched in Proposition E.5. Peter-Weyl and Harmonic Basis Functions Another ingredient that we need to construct SO(2)-steerable kernels on S¹ is the decomposition of L²_R(S¹) into its irreducible subrepresentations V_ji, which the Peter-Weyl theorem guarantees to exist. Less abstractly, we are interested in an orthonormal set of harmonic (steerable) basis functions on S¹ that span L²_R(S¹), which corresponds to the usual Fourier series on S¹. Example A.4. As in Example A.1, we assume G = SO(2) and X = S¹. A standard result in harmonic analysis says that square-integrable functions f : S¹ → R, i.e., f ∈ L²_R(S¹), can be uniquely written as an infinite sum of sine and cosine terms,

f(x) = a_0 + Σ_{j=1}^∞ (a_j cos(jx) + b_j sin(jx)),   (8)

where a_0 and a_j, b_j for j ≥ 1 are real-valued expansion coefficients.
How does this result relate to the harmonic basis functions in the Peter-Weyl Theorem 3.4? As stated above, the isomorphism classes of irreps of SO(2) are labeled by N_{≥0}, with representatives ρ_j. A comparison of the Fourier series in Eq. (8) with property 2 of the Peter-Weyl Theorem 3.4 suggests the following identification of harmonic basis functions Y^m_j and coefficients λ_jm:

Y^1_0 = cos_0 = 1, λ_01 = a_0 for j = 0,
Y^1_j = cos_j, λ_j1 = a_j for j ≥ 1,
Y^2_j = sin_j, λ_j2 = b_j for j ≥ 1,

where we introduced the shorthand notations cos_j(x) := cos(jx) and sin_j(x) := sin(jx). Note that we dropped the index i = 1, …, m_j since m_j = 1 for every irrep label j. As expected, we have the index m = 1 for j = 0 with d_j = dim(V_0) = 1 and indices m = 1, 2 for j ≥ 1 with d_j = dim(V_j) = 2. The orthogonality relations in property 3 of the Peter-Weyl theorem hold up to a simple normalization of these basis functions and are easily checked by explicitly computing the scalar products. Property 1, i.e., the SO(2)-steerability of the harmonic bases, is trivial for j = 0. For j ≥ 1, the standard angle summation formulas for cosines and sines lead to the following expressions for harmonics that are translated by φ ∈ SO(2):

cos_j(x − φ) = cos_j(x) cos_j(−φ) − sin_j(x) sin_j(−φ) = (ρ^11_j(φ) cos_j + ρ^21_j(φ) sin_j)(x),
sin_j(x − φ) = cos_j(x) sin_j(−φ) + sin_j(x) cos_j(−φ) = (ρ^12_j(φ) cos_j + ρ^22_j(φ) sin_j)(x),

which is just property 1 in the Peter-Weyl theorem. Reading (cos_j, sin_j) as a row vector of functions, this is concisely summarized by

(cos_j, sin_j)(φ^{−1} • x) = (cos_j, sin_j)(x − φ) = ((cos_j, sin_j) ρ_j(φ))(x),

which shows that the basis functions Y^1_j = cos_j and Y^2_j = sin_j span an invariant subspace V_j of L²_R(S¹) under rotations. From a more abstract viewpoint, the Peter-Weyl theorem just states that L²_R(S¹) splits into the orthogonal direct sum ⊕_j V_j over all irrep labels j.
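The translated-harmonics identities can likewise be checked numerically; in the following sketch (illustrative NumPy, not library code), R[m-1, n-1] plays the role of the matrix element ρ^mn_j(φ):

```python
import numpy as np

def rho(j, phi):
    # Real irrep of SO(2) of frequency j.
    c, s = np.cos(j * phi), np.sin(j * phi)
    return np.array([[c, -s], [s, c]])

j, phi = 3, 0.8
x = np.linspace(0, 2 * np.pi, 50)
cos_j, sin_j = np.cos(j * x), np.sin(j * x)
R = rho(j, phi)

# Translated harmonics expand in the untranslated ones, with
# coefficients given by the matrix elements of rho_j(phi):
assert np.allclose(np.cos(j * (x - phi)), R[0, 0] * cos_j + R[1, 0] * sin_j)
assert np.allclose(np.sin(j * (x - phi)), R[0, 1] * cos_j + R[1, 1] * sin_j)
```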
Tensor Products and Clebsch-Gordan Coefficients Finally, we need to investigate the tensor products of irreducible representations and their decomposition via Clebsch-Gordan coefficients. They will be used to correctly assemble harmonic basis functions into steerable kernels. Example A.5. Remember the irreducible representations ρ_l : SO(2) → O(V_l) given in Example A.2. As we prove in Proposition E.4, including a description of the Clebsch-Gordan coefficients, the tensor products decompose as follows:

V_0 ⊗ V_0 ≅ V_0, V_j ⊗ V_0 ≅ V_j, V_0 ⊗ V_l ≅ V_l, V_j ⊗ V_l ≅ V_|j−l| ⊕ V_{j+l},

where the last isomorphism only holds if j, l ≥ 1 and j ≠ l. If j, l ≥ 1 and j = l, then we instead obtain V_l ⊗ V_l ≅ (V_0)² ⊕ V_2l, i.e., V_0 here appears twice in the decomposition of a tensor product of irreducible representations. We therefore have multiplicities [J(jl)] which are 1 for [0(00)], [j(j0)], [l(0l)], [|j−l|(jl)], [j+l(jl)] and [2l(ll)], while [0(ll)] = 2. Any other multiplicity is zero. Wigner-Eckart theorem for SO(2)-steerable kernels With these ingredients one can then determine all SO(2)-steerable kernels. This is explained in Proposition E.6.
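For j ≠ l, the decomposition V_j ⊗ V_l ≅ V_|j−l| ⊕ V_{j+l} can be made explicit by a change of basis. The sketch below verifies this numerically; the matrix C is one possible choice of real Clebsch-Gordan matrix, written down here for illustration and not taken from Proposition E.4:

```python
import numpy as np

def rho(f, phi):
    # Real irrep of SO(2) of frequency f (frequency may also be negative).
    c, s = np.cos(f * phi), np.sin(f * phi)
    return np.array([[c, -s], [s, c]])

j, l = 2, 5  # j != l, both >= 1

# Orthogonal change of basis on V_j (x) V_l; rows are the new basis
# vectors expressed in e1(x)e1, e1(x)e2, e2(x)e1, e2(x)e2.
C = np.array([
    [1., 0., 0., 1.],   # frequency |j - l| block
    [0., 1., -1., 0.],
    [1., 0., 0., -1.],  # frequency j + l block
    [0., 1., 1., 0.],
]) / np.sqrt(2)

phi = 1.3
block = np.block([
    [rho(l - j, phi), np.zeros((2, 2))],
    [np.zeros((2, 2)), rho(j + l, phi)],
])
# The conjugated tensor product representation is block diagonal:
assert np.allclose(C @ np.kron(rho(j, phi), rho(l, phi)) @ C.T, block)
```

The rows of C encode the product-to-sum trigonometric identities; e.g. the first row says that cos(jx)cos(lx) + sin(jx)sin(lx) = cos((j−l)x) transforms with frequency |j−l|.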

B REPRESENTATION THEORY OF COMPACT GROUPS

In this chapter, we outline the main ingredients of the representation theory of compact groups that we need for our applications to steerable CNNs. Usually, this theory is only developed for representations over the complex numbers. However, since we want to apply it also to steerable CNNs using real representations, we need to be a bit more careful. In particular, we need to make sure that the Peter-Weyl theorem is correctly stated and proven. The outline is as follows: In Section B.1, we start by stating all the important definitions and concepts from group theory and representation theory of (unitary) representations that are needed for formulating the Peter-Weyl theorem. After defining Haar measures both for compact groups and their homogeneous spaces and shortly discussing their square-integrable functions, we formulate the Peter-Weyl Theorem B.22. In Section B.2, then, we give a proof of this version of the Peter-Weyl theorem, carefully making sure to not use properties that are only true over C. In some essential steps, mainly the density of the matrix coefficients in the regular representation, we refer to the literature, since the proof clearly does not make use of C per se. While we initially only give the proof for the regular representation, i.e., the space of square-integrable functions on the group itself, we end this section with a discussion of general unitary representations and, in particular, the space of square-integrable functions for an arbitrary homogeneous space. In the whole chapter, let K be the field of real or complex numbers.

B.1.1 PRELIMINARIES OF TOPOLOGICAL GROUPS AND THEIR ACTIONS

In this section, we define preliminary concepts from topological groups and their actions. This material can, for example, be found in detail in Arkhangel'skii & Tkachenko (2008). For the topological concepts that we use, we refer to Chapter F.1. Definition B.1 (Group, Abelian Group). A group G = (G, •, (•)^{-1}, e), most often simply written G, consists of the following data: 1. A set G of group elements g ∈ G.

2. A multiplication

• : G × G → G, (g, h) ↦ g • h. 3. An inversion (•)^{-1} : G → G, g ↦ g^{-1}. 4. A distinguished unit element e ∈ G, also called the neutral element. These are assumed to have the following properties for all g, h, k ∈ G: 1. The multiplication is associative: g • (h • k) = (g • h) • k. 2. The unit element is neutral with respect to multiplication: e • g = g = g • e. 3. The inversion of an element multiplied with itself gives the neutral element: g • g^{-1} = g^{-1} • g = e. A group is called abelian if, additionally, the multiplication is commutative: g • h = h • g for all g, h ∈ G. If this is the case, a group is often written as G = (G, +, −(•), 0). If we consider several groups at once, say G and H, then we often do not distinguish their multiplications, inversions, and neutral elements in notation. It will be clear from the context which group an operation belongs to. Definition B.2 (Subgroup). Let G be a group and H ⊆ G a subset. H is called a subgroup if: 1. For all h, h′ ∈ H we have h • h′ ∈ H. 2. For all h ∈ H we have h^{-1} ∈ H. 3. The neutral element e ∈ G is in H. Consequently, H is also a group with the restrictions of the multiplication and inversion of G to H. Definition B.3 (Group Homomorphism). Let G and H be groups. A function f : G → H is called a group homomorphism if it respects the multiplication, inversion, and neutral element, i.e., for all g, h ∈ G: 1. f(g • h) = f(g) • f(h). 2. f(g^{-1}) = f(g)^{-1}. 3. f(e) = e. The second and third properties follow automatically from the first and so do not need to be verified in order to prove that a given function is a group homomorphism. Definition B.4 (Topological Group, Compact Group). Let G be a group and T a topology on the underlying set of G. Then G = (G, T) is called a topological group (Arkhangel'skii & Tkachenko, 2008) if both the multiplication G × G → G, (x, y) ↦ x • y and the inversion G → G, x ↦ x^{-1} are continuous maps. Additionally, we always assume the topology to be Hausdorff.
A topological group is called compact if the underlying topological space is compact. From now on, all groups considered are compact topological groups. Furthermore, whenever G is a finite group, we assume that it is a topological group with the discrete topology, i.e., the topology with respect to which all subsets of G are open. We will need the following definition in order to define homogeneous spaces: Definition B.5 (Group Action). Let G be a compact group and X a topological space. Then a group action of G on X is a continuous function • : G × X → X with the following properties: 1. (g • h) • x = g • (h • x) for all g, h ∈ G and x ∈ X. 2. e • x = x for all x ∈ X. We will often simply write gx instead of g • x. Also, note that the multiplication within G is denoted by the same symbol as the group action on the space X. Definition B.6 (Orbit). Let • : G × X → X be a group action and let x ∈ X. Then its orbit, denoted G • x, is given by the set G • x := {g • x | g ∈ G} ⊆ X. Definition B.7 (Transitive Action, Homogeneous Space). Let • : G × X → X be a group action. This action is called transitive if for all x, y ∈ X there exists g ∈ G such that gx = y. Equivalently, each orbit is equal to X, that is, for all x ∈ X we have G • x = X. X is called a homogeneous space (with respect to the action) if the action is transitive, X is Hausdorff, and X ≠ ∅. The Hausdorff condition and non-emptiness in the definition of homogeneous spaces are needed for Lemma B.21, which is necessary to even define a normalized Haar measure on a homogeneous space. Some texts in the literature may define homogeneous spaces without these conditions. Definition B.8 (Stabilizer Subgroup). Let • : G × X → X be a group action and let x ∈ X. The stabilizer subgroup G_x is the subgroup of G given by G_x := {g ∈ G | gx = x} ⊆ G. Example B.9. The multiplication of the group G is a group action of G on itself. G is a homogeneous space with this action.
Furthermore, for each g ∈ G the stabilizer G_g is the trivial subgroup {e}. In general, homogeneous spaces with the property that all stabilizers are trivial are called torsors or principal homogeneous spaces. Principal homogeneous spaces are topologically indistinguishable from the group itself.
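Orbits, stabilizers, transitivity, and the torsor property are all finitely checkable for small examples. A minimal sketch (illustrative code, not from the paper) with G = Z/4 acting on the four vertices of a square by 90-degree rotations:

```python
# Orbits and stabilizers (Definitions B.5-B.8) for a small finite example:
# G = Z/4 acting on the four vertices of a square by 90-degree rotations.
G = range(4)
X = range(4)
act = lambda g, x: (g + x) % 4

orbit = {x: {act(g, x) for g in G} for x in X}
stab = {x: {g for g in G if act(g, x) == x} for x in X}

# The action is transitive (every orbit is all of X), so X is a
# homogeneous space; all stabilizers are trivial, so X is even a torsor.
assert all(orb == set(X) for orb in orbit.values())
assert all(s == {0} for s in stab.values())
# Orbit-stabilizer theorem: |orbit(x)| * |stab(x)| = |G|.
assert all(len(orbit[x]) * len(stab[x]) == len(G) for x in X)
```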

B.1.2 LINEAR AND UNITARY REPRESENTATIONS

In this section, we define many of the foundational concepts about linear and unitary representations (Knapp, 2002; Kowalski, 2014). Whenever we consider linear or unitary representations of compact groups, we want those representations to be continuous. This requires that the vector spaces on which our groups act themselves carry a topology. Prototypical examples of such vector spaces are (pre-)Hilbert spaces. They are the main examples of vector spaces considered in this work. Foundational concepts about (pre-)Hilbert spaces can be found in Chapter F.3. The most important difference between how we view pre-Hilbert spaces and how they are often treated in the literature is that in this work, scalar products are antilinear in the first component and linear in the second. This is the convention usually chosen in physics. For a vector space V over K, let GL(V) be the group of invertible linear functions from V to V. Sometimes in the literature, this is also written GL(V, K). The multiplication is given by function composition and the neutral element by the identity function id_V on V. Definition B.10 (Linear Representation). Let G be a compact group and V a K-vector space carrying a topology, for example a (pre-)Hilbert space. Then a linear representation of G on V is a group homomorphism ρ : G → GL(V) which is continuous in the following sense: for all v ∈ V, the function ρ_v : G → V, g ↦ ρ_v(g) := ρ(g)(v) is continuous. From the definition we obtain ρ(e) = id_V, ρ(g • h) = ρ(g) • ρ(h), and ρ(g^{-1}) = ρ(g)^{-1} for all g, h ∈ G. For simplicity, we also just say representation or G-representation instead of linear representation. Instead of denoting the representation by ρ, we often denote it by V if the function ρ is clear from the context. Note that in this definition, V can be any abstract K-vector space with a topology and does not need to be a space K^n or anything similar.
Consequently, we usually do not view the functions ρ(g) as matrices, but as abstract linear automorphisms of V. Definition B.11 (Intertwiner). Let ρ : G → GL(V) and ρ′ : G → GL(V′) be two representations of the same group G. An intertwiner between them is a linear function f : V → V′ that is additionally continuous and equivariant with respect to ρ and ρ′. Equivariance means that for all g ∈ G one has f • ρ(g) = ρ′(g) • f, i.e., the square formed by f, ρ(g), ρ′(g), and f commutes. Definition B.12 (Equivalent Representations). Let ρ : G → GL(V) and ρ′ : G → GL(V′) be two representations. They are called equivalent if there is an intertwiner f : V → V′ that has an inverse, i.e., there exists an intertwiner f′ : V′ → V such that f′ • f = id_V and f • f′ = id_{V′}. In categorical terms, equivalent representations are isomorphic in the category of linear representations. The reason we do not call them isomorphic is that there is a stronger notion of isomorphism between representations which we will use later, namely isomorphisms of unitary representations. Definition B.13 (Invariant Subspace, Subrepresentation, Closed Subrepresentation). Let ρ : G → GL(V) be a representation. An invariant subspace W ⊆ V is a linear subspace of V such that ρ(g)(w) ∈ W for all g ∈ G and w ∈ W. Consequently, the restriction ρ|_W : G → GL(W), g ↦ ρ(g)|_W : W → W is a representation as well, called a subrepresentation of ρ. A subrepresentation is called closed if W is closed in the topology of V. Definition B.14 (Irreducible Representation). A representation ρ : G → GL(V) is called irreducible if V ≠ 0 and if the only closed subrepresentations of V are 0 and V itself. An irreducible representation is also called an irrep for short. Definition B.15 (Unitary Group). Let V be a pre-Hilbert space. The unitary group U(V) of V is defined as the group of all linear invertible maps f : V → V that respect the inner product, i.e., ⟨f(x)|f(y)⟩ = ⟨x|y⟩ for all x, y ∈ V.
It is a group with respect to the usual composition and inversion of invertible linear maps. Note that if the field K is the real numbers, then what we call "unitary" is usually called orthogonal, and the group would be denoted O(V). However, the mathematical properties are essentially the same, and since the term "unitary" is more widely used (as representations over the complex numbers are normally considered), we stick with "unitary". More generally, we have the following: Definition B.16 (Unitary Transformation). Let V, V′ be two pre-Hilbert spaces. A unitary transformation f : V → V′ is a bijective linear function such that ⟨f(x)|f(y)⟩ = ⟨x|y⟩ for all x, y ∈ V. These can be regarded as isomorphisms between pre-Hilbert spaces. Note that unitary transformations are in particular isometries, i.e., they preserve the distances of vectors with respect to the metric defined by the scalar product. For the definition of this metric, see the discussion before and after Definition F.14. Definition B.17 (Unitary Representation). Let V be a pre-Hilbert space and G a group. Then a representation ρ : G → GL(V) is called a unitary representation if ρ(g) ∈ U(V) for all g ∈ G. We then write ρ : G → U(V). In this whole chapter, the space V of a unitary representation is supposed to be a Hilbert space, instead of just a pre-Hilbert space. Only in Chapter D will we consider unitary representations on pre-Hilbert spaces. Note that all finite-dimensional pre-Hilbert spaces are already complete by Proposition F.47, so in these cases there is no difference. The same proposition also shows that for finite-dimensional unitary representations, we can ignore the topological closedness condition when checking whether they are irreducible. It will later turn out that all irreducible representations of a compact group are automatically finite-dimensional anyway, see Proposition B.31, which further simplifies our considerations.
As before with the unitary group, a unitary representation is usually called an "orthogonal representation" when the field is the real numbers R, and U(V) is then replaced by O(V). We again stick with U(V) whenever the field K is not specified. Definition B.18 (Isomorphism of Unitary Representations). Let ρ : G → U(V), ρ′ : G → U(V′) be unitary representations and f : V → V′ an intertwiner. f is called an isomorphism (of unitary representations) if, additionally, f is a unitary transformation. The representations are then called isomorphic. For this, we write ρ ≅ ρ′ or V ≅ V′, depending on whether we want to emphasize the representations or the underlying vector spaces. We note the following, which we will frequently use: due to the unitarity of ρ(g) for a unitary representation ρ, we have ρ(g)* = ρ(g)^{-1}, i.e., the adjoint is the inverse. Adjoints are defined in Definition F.42, and this statement is proven more generally in Proposition F.44. Overall, this means that ⟨ρ(g)(v)|w⟩ = ⟨v|ρ(g)^{-1}(w)⟩ for all v, w and g. In the end, it will turn out that the Peter-Weyl theorem which we aim at is exclusively a statement about unitary representations. One may then wonder whether this is too restrictive. After all, the representations that we consider for steerable CNNs (with precise definitions given in Section C.1) are not necessarily unitary, and so it is not immediately obvious how the Peter-Weyl theorem can help for those. However, as it turns out, all linear representations on finite-dimensional spaces can be considered as unitary, and so the theory applies. We will discuss this in Proposition B.20 once we understand Haar measures on compact groups.
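This unitarization claim can be illustrated numerically by the standard averaging argument (a sketch assuming the usual Weyl averaging trick; the actual proof of Proposition B.20 is not reproduced here). A non-orthogonal representation of Z/3 becomes orthogonal after switching to a group-averaged inner product:

```python
import numpy as np

# A deliberately non-orthogonal real representation of Z/3: a rotation
# conjugated by some invertible matrix S.
rng = np.random.default_rng(1)
S = rng.normal(size=(2, 2)) + 3 * np.eye(2)

def R(phi):
    return np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])

rho = {k: S @ R(2 * np.pi * k / 3) @ np.linalg.inv(S) for k in range(3)}

# Averaged Gram matrix M = (1/|G|) sum_g rho(g)^T rho(g); it defines a
# G-invariant inner product <x, y>_M = x^T M y.
M = sum(rho[k].T @ rho[k] for k in range(3)) / 3
for k in range(3):
    assert np.allclose(rho[k].T @ M @ rho[k], M)  # invariance of M

# With M = L L^T (Cholesky), the conjugated representation
# tau(g) = L^T rho(g) L^{-T} is orthogonal ("unitary" over the reals).
L = np.linalg.cholesky(M)
Linv_T = np.linalg.inv(L).T
for k in range(3):
    tau = L.T @ rho[k] @ Linv_T
    assert np.allclose(tau.T @ tau, np.eye(2))
```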

B.1.3 THE HAAR MEASURE, THE REGULAR REPRESENTATION AND THE PETER-WEYL THEOREM

Now that we have introduced many notions in the representation theory of compact groups, we can formulate the most important result, the Peter-Weyl theorem, which we will use throughout this work. In the next section, we will then go through a step-by-step proof of this theorem. The material in this section is based on Nachbin & Bechtolsheim (1965); Kowalski (2014) and Knapp (2002). We thank Stefan Dawydiak for a discussion about the Peter-Weyl theorem over the real numbers (Dawydiak, 2020). We assume that the reader knows what a measure is (Tao, 2013).

Let G be a compact group. A standard result is that there exists a measure µ on G, called a Haar measure, that, among other properties, fulfills the following:
1. µ(S) can be evaluated for all Borel sets S ⊆ G. Here, the Borel sets form the smallest so-called σ-algebra that contains all the open sets.
2. In particular, we can evaluate µ(S) for all open or closed sets S ⊆ G.
3. The Haar measure is normalized: µ(G) = 1.
4. µ is left and right invariant: µ(gS) = µ(S) = µ(Sg) for all g ∈ G and S measurable.
5. µ is inversion invariant: µ(S⁻¹) = µ(S) for all S measurable.

These properties then translate into properties of the associated Haar integral: let f : G → K be integrable with respect to µ; then we obtain:
1. ∫_G 1 dg = 1 for the constant function with value 1.
2. ∫_G f(hg) dg = ∫_G f(g) dg = ∫_G f(gh) dg for all h ∈ G.
3. ∫_G f(g⁻¹) dg = ∫_G f(g) dg.

Example B.19 (Finite Groups). If G is a finite group with n elements, then the Haar measure is just the normalized counting measure, which assigns µ({g}) = 1/n for all g ∈ G. Each function f : G → K is then integrable, and its integral is given by ∫_G f(g) dg = (1/n) Σ_{g∈G} f(g). In this special case, one can easily verify all properties of Haar measures and Haar integrals stated above.

With this measure defined, we can already understand why all linear representations on finite-dimensional spaces can be considered as unitary:

Proposition B.20.
Let ρ : G → GL(V) be a linear representation on a finite-dimensional space V. Then there exists a scalar product ⟨·|·⟩_ρ : V × V → K that makes (V, ⟨·|·⟩_ρ) a Hilbert space and such that ρ becomes a unitary representation with respect to this scalar product.

Proof. Since V is finite-dimensional, there is an isomorphism of vector spaces to some Kⁿ. Consequently, there is some scalar product ⟨·|·⟩ : V × V → K that makes V a Hilbert space. However, this scalar product does not necessarily make ρ a unitary representation. We can instead define ⟨·|·⟩_ρ : V × V → K by

⟨v|w⟩_ρ := ∫_G ⟨ρ(g)(v)|ρ(g)(w)⟩ dg.

That this integral exists is due to the continuity of linear representations and since the scalar product is also continuous by Proposition F.38. It can easily be checked that this construction makes V a Hilbert space. And due to the right invariance of the Haar measure, ρ is a unitary representation with respect to this scalar product. Namely, for arbitrary g′ ∈ G we have:

⟨ρ(g′)(v)|ρ(g′)(w)⟩_ρ = ∫_G ⟨ρ(g)ρ(g′)(v)|ρ(g)ρ(g′)(w)⟩ dg = ∫_G ⟨ρ(gg′)(v)|ρ(gg′)(w)⟩ dg = ∫_G ⟨ρ(g)(v)|ρ(g)(w)⟩ dg = ⟨v|w⟩_ρ.

Now, for a measure space Y with corresponding measure µ, we can consider the space of square-integrable functions on Y with values in K, denoted L²_K(Y) (the measure is omitted in the notation since there is usually no ambiguity). In these spaces, functions are identified if they coincide outside a set of measure 0. L²_K(Y) is clearly a vector space over K, but it turns out that it can even be considered a Hilbert space with the scalar product

⟨f|g⟩ := ∫_Y \overline{f(y)} g(y) dy.

Here, the overline means complex conjugation. The Hilbert space properties are easily verified. In particular, one can consider the space L²_K(G) of square-integrable functions on the group G itself. Now the claim is that L²_K(G) can actually be equipped with a prototypical structure as a unitary representation over G which makes this space, in some sense, "universal among unitary representations".
This works with the following canonical representation, called the regular representation:

λ : G → U(L²_K(G)), [λ(g)(f)](g′) := f(g⁻¹g′).

The continuity of this map is non-trivial and is, for example, shown in Knapp (2002). However, the more algebraic properties of being a unitary representation are easy to appreciate. First of all, we clearly see that λ is a group homomorphism mapping each group element to a linear automorphism. And finally, the unitarity of this representation can be understood as a direct consequence of the properties of the Haar measure, where we notably make use only of the left invariance:

⟨λ(g)(f)|λ(g)(h)⟩ = ∫_G \overline{[λ(g)(f)](g′)} · [λ(g)(h)](g′) dg′ = ∫_G \overline{f(g⁻¹g′)} · h(g⁻¹g′) dg′ = ∫_G \overline{f(g′)} h(g′) dg′ = ⟨f|h⟩.

We saw in Example B.9 that G is a homogeneous space with respect to its action on itself. We can now ask whether these constructions also work if X is an arbitrary homogeneous space of G. This requires us to define a suitable measure on X, which is indeed possible. For a fixed element x* ∈ X, denote the stabilizer subgroup by H = G_{x*} ⊆ G. Then the Hausdorff property of X allows us to write down a homeomorphism between X and G/H, which in turn will allow us to use a canonical measure on G/H that we study below. We denote cosets gH ∈ G/H by [g].

Lemma B.21. Let X be a homogeneous space of the compact group G and H the stabilizer subgroup of a fixed element x* ∈ X. Then the map ϕ : G/H → X, [g] ↦ gx* is a homeomorphism. Furthermore, H is topologically closed.

Proof. Let φ : G → X, g ↦ gx*. This map is equal to the composition of the maps G → G × X, g ↦ (g, x*) and G × X → X, (g, x) ↦ gx. Both of these are continuous, and thus φ is continuous as well. Furthermore, note that if g⁻¹g′ ∈ H, then there is h ∈ H such that g′ = gh, and thus φ(g′) = φ(gh) = (gh)x* = g(hx*) = gx* = φ(g), which means that by Proposition F.12, the map ϕ : G/H → X, [g] ↦ gx* is a well-defined continuous map.
It is surjective since the action is transitive by definition of a homogeneous space. Furthermore, it is injective, since gx* = g′x* implies x* = (g⁻¹g′)x* and thus g⁻¹g′ ∈ H, which means [g] = [g′]. Overall, ϕ is a continuous bijective map from G/H to X. Furthermore, G/H is compact since it is the image of the compact group G under the continuous projection G → G/H, see Proposition F.8. Since X is Hausdorff by definition of homogeneous spaces, ϕ is a homeomorphism according to Proposition F.9. Now, since X is Hausdorff and ϕ is a homeomorphism, it follows that G/H is Hausdorff as well. Then, necessarily, H is a topologically closed subgroup of G, see Bourbaki (1998), Chapter III, Section 2.5, Proposition 13.

Every space G/H where H is topologically closed admits a measure µ with similar properties to those of G (Nachbin & Bechtolsheim, 1965). Since the stabilizer H is closed and X ≅ G/H by Lemma B.21, we can carry out these constructions for X as well, as we outline now. The only properties that we now lose are the right invariance and the inversion invariance: we simply cannot ask for them, since G does not naturally act on X from the right and since we cannot invert elements of X. But left invariance does hold, and this means that

λ : G → U(L²_K(X)), [λ(g)(f)](x) := f(g⁻¹x)

makes L²_K(X) a unitary representation over G, as can be shown in exactly the same way as for L²_K(G). Let Ĝ be the set of isomorphism classes of irreducible unitary representations over G. Furthermore, let ρ_l : G → U(V_l) be a fixed representative of such an isomorphism class l ∈ Ĝ. We write isomorphism classes as "l" (and later also j and J) in order to bring to mind the quantum numbers used in quantum mechanics.
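For a finite group, the regular representation and its unitarity can be checked directly. Below is a minimal sketch, assuming G = Z/4 with the normalized counting measure (a running example of our own choosing, not from the text); each λ(g) is then a permutation matrix on L²(G) = R⁴:

```python
import numpy as np

# Regular representation [lam(g) f](g') = f(g^{-1} g') of Z/4 on R^4.
n = 4
def lam(g):
    L = np.zeros((n, n))
    for gp in range(n):            # [lam(g) f](gp) = f((gp - g) mod n)
        L[gp, (gp - g) % n] = 1.0
    return L

# Homomorphism property: lam(g) lam(h) = lam(g + h mod n).
for g in range(n):
    for h in range(n):
        assert np.allclose(lam(g) @ lam(h), lam((g + h) % n))

# Unitarity for <f|h> = (1/n) sum_g conj(f(g)) h(g): permutation
# matrices are orthogonal, which mirrors the left invariance argument.
for g in range(n):
    assert np.allclose(lam(g).T @ lam(g), np.eye(n))
```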
Recall from linear algebra that a countable sum of subspaces of a vector space is called direct if no nontrivial subspace of any of the considered spaces is contained in the sum of all the other considered spaces. Furthermore, recall that two subspaces U, W ⊆ V of a Hilbert space V are called perpendicular or orthogonal if ⟨u|w⟩ = 0 for all u ∈ U and w ∈ W. We then write U ⊥ W. We can now formulate the Peter-Weyl theorem. Intuitively, it says that L²_K(X) splits into an orthogonal direct sum of irreducible unitary representations, where each irreducible unitary representation appears at most as often as its own dimension (and may not appear at all):

Theorem B.22 (Peter-Weyl Theorem). Let G be a compact group and X a homogeneous space of G. There are numbers m_l ∈ N_{≥0} for all l ∈ Ĝ and closed invariant subspaces V_li ⊆ L²_K(X) for all l ∈ Ĝ and i ∈ {1, ..., m_l} such that the following hold:
1. V_li ≅ V_l as unitary representations for all i and l.
2. m_l ≤ dim(V_l) < ∞ for all l.
3. V_li ⊥ V_l′j whenever l ≠ l′ or i ≠ j.
4. Σ_{l∈Ĝ} Σ_{i=1}^{m_l} V_li is topologically dense in L²_K(X); we write L²_K(X) = ⊕_{l∈Ĝ} ⊕_{i=1}^{m_l} V_li for this.

Now additionally consider G as a homogeneous space of itself. Then the same holds for L²_K(G) as well, with numbers n_l ≤ dim(V_l) < ∞. We additionally have the following:
1. m_l ≤ n_l.
2. If K = C, then n_l = dim(V_l).

Note that the representative V_l is not assumed to be embedded in L²_K(X). It is just isomorphic, as a unitary representation, to each of the V_li ⊆ L²_K(X).

Example B.23. For G = SO(2) and K = C, we have L²_C(SO(2)) = ⊕_{l∈Z} V_l1, and all irreducible representations V_l are one-dimensional.

Published as a conference paper at ICLR 2021

For G = SO(2) and K = R, we obtain L²_R(SO(2)) = ⊕_{l≥0} V_l1, where all irreducible representations V_l with l ≥ 1 are two-dimensional, whereas V_0 is one-dimensional.
Thus, here we see an example where the multiplicity of most irreducible representations in the regular representation is 1 and therefore smaller than their dimension, which cannot happen for representations over the complex numbers. Both of these results are standard in harmonic analysis. These examples are discussed in more detail, especially with respect to their applications in deep learning, in Sections E.1 and E.2.
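Example B.23 can be verified numerically on a grid. The sketch below checks, for an assumed frequency l = 3 and rotation angle a = 0.7 (arbitrary choices of ours), that the real span of {cos(lt), sin(lt)} is invariant under rotation, with the rotation acting by a 2x2 rotation matrix of angle l·a, while over C the single exponential e^{ilt} already spans an invariant line:

```python
import numpy as np

l, a = 3, 0.7
t = np.linspace(0, 2 * np.pi, 1000, endpoint=False)

# Regular representation of SO(2): [lam(a) f](t) = f(t - a).
shifted_cos = np.cos(l * (t - a))
shifted_sin = np.sin(l * (t - a))

# Angle-sum identities = action by the rotation matrix R(l*a) on the
# 2-dimensional real irrep V_l spanned by cos(l t) and sin(l t):
c, s = np.cos(l * a), np.sin(l * a)
assert np.allclose(shifted_cos,  c * np.cos(l * t) + s * np.sin(l * t))
assert np.allclose(shifted_sin, -s * np.cos(l * t) + c * np.sin(l * t))

# Over C: e^{il(t-a)} = e^{-ila} e^{ilt}, a 1-dimensional invariant subspace.
assert np.allclose(np.exp(1j * l * (t - a)),
                   np.exp(-1j * l * a) * np.exp(1j * l * t))
```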

B.2 A PROOF OF THE PETER-WEYL THEOREM

This section presents a proof of the Peter-Weyl theorem as formulated in Theorem B.22. We mostly skip the analytical parts of the proof, since they are well presented in the literature and clearly work over both the real and the complex numbers. However, the more algebraic parts of the proof usually make use of the fact that the complex numbers are algebraically closed, which does not hold for the real numbers. This is usually invoked both in the proof of a version of Schur's lemma and in proving Schur's orthogonality. We therefore carefully adapt the proof of the Peter-Weyl theorem from the literature so that it also works over the real numbers, and formulate and prove versions of Schur's lemma (Lemma B.29) and Schur's orthogonality (Proposition B.30) that hold in general. This section can be skipped if the interest is mainly in the applications of the Peter-Weyl theorem; in this case, the reader is advised to move directly on to Chapter C. We note the following convention for this section: for all unitary representations ρ : G → U(V) that we consider here, V is a Hilbert space (instead of just a pre-Hilbert space).

B.2.1 DENSITY OF MATRIX COEFFICIENTS

An important ingredient in the construction of the spaces V_li that appear in the formulation of the Peter-Weyl Theorem B.22 are matrix coefficients, which together generate those spaces in the case of the regular representation on L²_K(G).

Definition B.24 (Matrix Coefficients). Let ρ : G → U(V) be a unitary representation. A matrix coefficient is any function of the form ρ_uv : G → K, g ↦ \overline{⟨u|ρ(g)(v)⟩} for arbitrary u, v ∈ V, where the overline again denotes complex conjugation.

The term "matrix coefficient" comes from the analogy to matrix elements of linear maps between pre-Hilbert spaces with fixed orthonormal bases. Later, in Definition D.9, we will also define the notion of "matrix elements" separately; the term "matrix coefficient" only applies to unitary representations.

Remark B.25. By definition of linear representations, the function g ↦ ρ(g)(v) is continuous. Thus, since scalar products of Hilbert spaces are also continuous as functions on V × V, see Proposition F.38, every matrix coefficient ρ_uv : G → K is continuous. As a continuous function on a compact space, it is of course also square-integrable, i.e., ρ_uv ∈ L²_K(G). The Peter-Weyl theorem essentially asserts that these matrix coefficients can be considered the building blocks of all square-integrable functions.

One may wonder why there is a complex conjugation in the definition. The reason is that, otherwise, the isomorphism that we construct in Proposition B.35 would not be linear but conjugate linear. The reason why this can nevertheless be called a matrix coefficient is that it actually is a matrix coefficient (without complex conjugation) on a conjugate Hilbert space, as explained in the next proposition, which we took from Williams (1991).

Proposition B.26. Let ρ : G → U(V) be a unitary representation on a Hilbert space V with scalar multiplication ·_V and scalar product ⟨·|·⟩_V. We have the following:
1. Ṽ := V (equality as abelian groups) with α ·_Ṽ v := \overline{α} ·_V v and ⟨u|v⟩_Ṽ := \overline{⟨u|v⟩_V} is again a Hilbert space, the so-called conjugate Hilbert space of V.
2. ρ̃ : G → U(Ṽ) with ρ̃(g) := ρ(g) is again a unitary representation.
3. For the matrix coefficients, we have ρ̃_uv(g) = \overline{ρ_uv(g)}.

Proof. All these assertions are easy to check. As a demonstration, we do 3: ρ̃_uv(g) = \overline{⟨u|ρ̃(g)(v)⟩_Ṽ} = ⟨u|ρ(g)(v)⟩_V = \overline{ρ_uv(g)}. That is what we wanted to show.

As a consequence of this proposition, the matrix coefficient ρ_uv(g) is equal to ⟨u|ρ̃(g)(v)⟩_Ṽ, thus being a "matrix coefficient without complex conjugation above the scalar product" of the conjugate unitary representation.

Theorem B.27 (Density of Matrix Coefficients). The linear span of the matrix coefficients of finite-dimensional, unitary, irreducible representations of G is dense in L²_K(G) for every compact group G.

Proof. For K = C, this is shown in Knapp (2002). The same proof, without adaptations, also works for K = R. Note that the cited proof uses a definition of matrix coefficients without the complex conjugation. However, Proposition B.26 shows that those span the same space, and thus we can apply it to our situation.
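For a finite abelian group, the density statement of Theorem B.27 becomes an exact spanning statement and can be checked directly. A sketch, assuming G = Z/6 (an arbitrary choice of ours): all irreps are 1-dimensional, their matrix coefficients (with the conjugation of Definition B.24) are the conjugated characters, and they form an orthonormal basis of L²(G), i.e., the discrete Fourier basis:

```python
import numpy as np

# Irreps of Z/6 over C: rho_l(k) = exp(2 pi i l k / 6).  The single
# matrix coefficient of rho_l is conj(rho_l), matching Definition B.24.
n = 6
k = np.arange(n)
coeffs = [np.conj(np.exp(2j * np.pi * l * k / n)) for l in range(n)]

def inner(f, h):                  # <f|h> = (1/n) sum_g conj(f(g)) h(g)
    return np.vdot(f, h) / n

# Orthonormality: the matrix coefficients span all of L^2(G) exactly.
for l1 in range(n):
    for l2 in range(n):
        expected = 1.0 if l1 == l2 else 0.0
        assert abs(inner(coeffs[l1], coeffs[l2]) - expected) < 1e-12
```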

B.2.2 SCHUR'S LEMMA, SCHUR'S ORTHOGONALITY AND CONSEQUENCES

In this section, we state and prove versions of Schur's lemma and Schur's orthogonality (Knapp, 2002) that are valid for both K = R and K = C.

Lemma B.28. Let ρ : G → U(V) and ρ′ : G → U(V′) be unitary representations, and let f : V → V′ be an intertwiner. Then the adjoint f* : V′ → V is also an intertwiner.

Proof. The adjoint f* : V′ → V is the unique continuous linear function from V′ to V such that ⟨f(v)|v′⟩ = ⟨v|f*(v′)⟩ for all v ∈ V and v′ ∈ V′. It always exists according to Definition F.42. Since f is an intertwiner and the representations are unitary, we obtain for all g ∈ G, v ∈ V and v′ ∈ V′:

⟨v|ρ(g)f*(v′)⟩ = ⟨ρ(g⁻¹)(v)|f*(v′)⟩ = ⟨f(ρ(g⁻¹)(v))|v′⟩ = ⟨ρ′(g⁻¹)f(v)|v′⟩ = ⟨f(v)|ρ′(g)(v′)⟩ = ⟨v|f*ρ′(g)(v′)⟩,

from which we deduce ρ(g)f* = f*ρ′(g) for all g ∈ G by Proposition F.45, i.e., f* is an intertwiner.

Lemma B.29 (Schur's Lemma for Unitary Representations). Assume ρ : G → U(V) and ρ′ : G → U(V′) are irreducible unitary representations with V finite-dimensional, and assume that f : V → V′ is an intertwiner. Then either f = 0 or there is µ ∈ R_{>0} such that µf is an isomorphism.

Proof. For this proof, we follow the exposition of Tao (2011). We thank Terence Tao for confirming, in the discussion below his blog post, that this lemma can also be proven over the real numbers. Let f* : V′ → V be the adjoint of f, which is also an intertwiner by Lemma B.28. Now set ϕ := f* ∘ f : V → V. As a composition of intertwiners, ϕ is also an intertwiner. Furthermore, for arbitrary composable continuous linear functions between Hilbert spaces one always has (g ∘ h)* = h* ∘ g* and (g*)* = g, which easily follows from the definition and uniqueness of adjoints. Consequently, we have ϕ* = (f* ∘ f)* = f* ∘ (f*)* = f* ∘ f = ϕ, and so ϕ is self-adjoint. Thus, ⟨ϕ(v)|w⟩ = ⟨v|ϕ(w)⟩ for all v, w ∈ V, from which we conclude that the matrix of ϕ with respect to any orthonormal basis of V is Hermitian or, if K = R, even symmetric.
Such an orthonormal basis exists by Proposition F.41. From the spectral theorem for Hermitian or symmetric matrices (Horn & Johnson, 2012), we conclude that ϕ is unitarily (or, for real matrices, orthogonally) diagonalizable with only real eigenvalues. Thus, there is an orthogonal decomposition of V into eigenspaces: V = ⊕_{λ eigenvalue} E_λ(ϕ). Let E_λ(ϕ) be any eigenspace. We now claim that it is an invariant subspace of ρ. Indeed, for all g ∈ G and v ∈ E_λ(ϕ) we have, since ϕ is an intertwiner: ϕ(ρ(g)(v)) = ρ(g)(ϕ(v)) = ρ(g)(λv) = λρ(g)(v). Since V is finite-dimensional, E_λ(ϕ) is topologically closed by Proposition F.47, and since V is irreducible, we necessarily have E_λ(ϕ) = 0 or E_λ(ϕ) = V. Since not all eigenspaces can be zero, we conclude that there is an eigenvalue λ with E_λ(ϕ) = V, meaning ϕ = λ id_V.

Assume f ≠ 0. We now claim that λ > 0. Indeed, note that for all v ∈ V we have λ‖v‖² = ⟨ϕ(v)|v⟩ = ⟨f*f(v)|v⟩ = ⟨f(v)|f(v)⟩ = ‖f(v)‖². Thus, if v ∈ V is any vector with f(v) ≠ 0, then we obtain λ = ‖f(v)‖²/‖v‖² > 0. Now define g : V → V′ as g := λ^{-1/2} f. Clearly, g is still an intertwiner. It is moreover an isometry: ⟨g(v)|g(w)⟩ = λ⁻¹⟨f(v)|f(w)⟩ = λ⁻¹⟨ϕ(v)|w⟩ = λ⁻¹λ⟨v|w⟩ = ⟨v|w⟩. Note that since V′ is irreducible and f(V) ⊆ V′ is topologically closed due to V being finite-dimensional, f is necessarily surjective. Thus, we have shown that µf with µ := λ^{-1/2} is an isomorphism of unitary representations.

Proposition B.30 (Schur's Orthogonality). Let ρ : G → U(V) and ρ′ : G → U(V′) be non-isomorphic irreducible unitary representations of the compact group G, of which at least one is finite-dimensional. Let ρ_uv and ρ′_u′v′ be matrix coefficients of them, which are functions in L²_K(G) due to their continuity. Then they are orthogonal, i.e., ⟨ρ_uv|ρ′_u′v′⟩ = 0.

Proof. Without loss of generality, we can assume V to be finite-dimensional. Assume that l : V′ → V is any linear function.
We can associate to it the function f : V′ → V given by

f(w′) := ∫_G ρ(g) l ρ′(g)⁻¹ (w′) dg.

For all h ∈ G we have

ρ(h) f ρ′(h)⁻¹ = ∫_G ρ(h)ρ(g) l ρ′(g)⁻¹ρ′(h)⁻¹ dg = ∫_G ρ(hg) l ρ′(hg)⁻¹ dg = ∫_G ρ(g) l ρ′(g)⁻¹ dg = f,

and thus ρ(h)f = fρ′(h), which means that f is an intertwiner. In this derivation, ρ(h) could be pulled inside the integral since ρ(h) is continuous and an integral is a limit of finite sums, which commutes with the continuous ρ(h). By Schur's Lemma B.29, we necessarily have f = 0, since ρ and ρ′ are non-isomorphic. Now look at the specific linear function l : V′ → V given by l(w′) := ⟨v′|w′⟩ v with the fixed vectors v, v′ corresponding to the matrix coefficients. We obtain f = 0 for f defined as before, and thus:

0 = ⟨u|f(u′)⟩ = ⟨u|∫_G ρ(g) l ρ′(g)⁻¹(u′) dg⟩ = ∫_G ⟨u|ρ(g) l ρ′(g)⁻¹(u′)⟩ dg = ∫_G ⟨u|ρ(g)(v)⟩ · ⟨v′|ρ′(g)⁻¹(u′)⟩ dg = ∫_G ⟨u|ρ(g)(v)⟩ · \overline{⟨u′|ρ′(g)(v′)⟩} dg = ∫_G \overline{ρ_uv(g)} ρ′_u′v′(g) dg = ⟨ρ_uv|ρ′_u′v′⟩.

In this derivation, the integral could be pulled out of the scalar product since the scalar product is continuous, see Proposition F.38, and since integrals are certain limits of finite sums, with which the scalar product commutes. Note that there are more general Schur orthogonality relations in the case K = C, see Knapp (2002), Corollary 4.10. These also engage with the matrix coefficients of one and the same representation. This, together with a version of Schur's lemma that only holds over C, leads to the strengthening of the Peter-Weyl theorem which shows that the multiplicities n_l are given by dim(V_l).

Proposition B.31. All irreducible unitary representations of a compact group G are finite-dimensional.

Proof. Suppose ρ : G → U(V) were an irreducible unitary representation on an infinite-dimensional space V. Let ρ_uv be any of its matrix coefficients.
By Proposition B.30, and since an infinite-dimensional representation can never be isomorphic to a finite-dimensional one, ρ_uv is perpendicular to all matrix coefficients of finite-dimensional irreducible unitary representations. Due to the linearity of the scalar product, ρ_uv is perpendicular to the whole linear span of these matrix coefficients, and thus to the topological closure of this span; the last step follows from the continuity of the scalar product, see Proposition F.38. By Theorem B.27 this closure is the whole space L²_K(G). Therefore, ρ_uv is even perpendicular to itself, and thus ρ_uv = 0. Overall, for arbitrary u, v ∈ V and g ∈ G we obtain 0 = ρ_uv(g) = \overline{⟨u|ρ(g)(v)⟩} and thus (by setting u = ρ(g)(v)) ρ(g)(v) = 0, and consequently ρ(g) = 0. We obtain ρ = 0, a contradiction, since each ρ(g) has to be invertible. Thus infinite-dimensional irreducible unitary representations cannot exist.

As a consequence, we mention that the finiteness conditions in Schur's lemma and Schur's orthogonality did not need to be stated, since all irreducible unitary representations are finite-dimensional anyway. We obtain from this and from Schur's Lemma B.29 that isomorphism classes and equivalence classes of irreducible unitary representations are one and the same.
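Schur's orthogonality (Proposition B.30) can be observed numerically for the smallest nonabelian compact group, S3 ≅ D3, realized here by explicit matrices of our own choosing: its 2-dimensional real irrep and its 1-dimensional sign irrep are non-isomorphic, so all their matrix coefficients are orthogonal in L²(G):

```python
import numpy as np

# D3: three rotations by multiples of 120 degrees and three reflections.
c, s = np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)
r = np.array([[c, -s], [s, c]])           # rotation by 120 degrees
f = np.array([[1.0, 0.0], [0.0, -1.0]])   # a reflection
rots = [np.linalg.matrix_power(r, k) for k in range(3)]
G2 = rots + [f @ R for R in rots]         # the six elements, in the 2-dim irrep
sign = [1, 1, 1, -1, -1, -1]              # the sign irrep, same element order

# Schur orthogonality: <rho_ij | sign> = (1/6) sum_g rho(g)[i,j] sign(g) = 0
# for every matrix coefficient rho_ij of the 2-dim irrep (over R the
# complex conjugation in Definition B.24 is trivial).
for i in range(2):
    for j in range(2):
        val = sum(R[i, j] * e for R, e in zip(G2, sign)) / 6
        assert abs(val) < 1e-12
```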

B.2.3 A PROOF OF THE PETER-WEYL THEOREM FOR THE REGULAR REPRESENTATION

In this section, we engage with the Peter-Weyl theorem for the regular representation on L²_K(G). The case of L²_K(X) for a homogeneous space X will be dealt with in Section B.2.4. The core arguments in the proofs of this section are adapted from Williams (1991). As before, let Ĝ be the set of isomorphism classes of irreducible representations of G, and for l ∈ Ĝ let ρ_l be a representative of the isomorphism class l. Furthermore, for each ρ_l : G → U(V_l), let v¹_l, ..., v^{dim(V_l)}_l be an arbitrary orthonormal basis, which exists due to Proposition F.41 (mostly written without the index l, i.e., as v¹, v², ..., if the corresponding isomorphism class is clear). Denote by ρ_l^{ij} := (ρ_l)_{v^i v^j} the corresponding matrix coefficients. Remember that matrix coefficients of unitary representations are continuous by Remark B.25, and thus functions in L²_K(G). Then, let E ⊆ L²_K(G) be the linear span of the matrix coefficients of all irreducible unitary representations. In the next lemma, we show that E is already spanned by the matrix coefficients corresponding to the representatives of isomorphism classes and their orthonormal bases:

Lemma B.32. We have E = span_K { ρ_l^{ij} | l ∈ Ĝ, i, j ∈ {1, ..., dim(V_l)} }.

Proof. First, we show that isomorphic representations do not add distinct matrix coefficients. Thus, let ρ ≅ ρ_l and let f : V → V_l be the corresponding isomorphism. Then we have ρ_l(g) ∘ f = f ∘ ρ(g) and thus, since f is a unitary transformation, ρ(g) = f* ∘ ρ_l(g) ∘ f for all g ∈ G, see Proposition F.44. Now let u, v ∈ V be arbitrary. We obtain

ρ_uv(g) = \overline{⟨u|ρ(g)(v)⟩} = \overline{⟨u|f* ρ_l(g) f(v)⟩} = \overline{⟨f(u)|ρ_l(g)(f(v))⟩} = (ρ_l)_{f(u)f(v)}(g),

which proves the first claim. Now we want to show that we only need to consider the ρ_l^{ij}. Thus, let u, v ∈ V_l be arbitrary. They can be written as linear combinations u = Σ_i λ_i v^i, v = Σ_j µ_j v^j with coefficients λ_i, µ_j ∈ K.
We obtain:

(ρ_l)_{uv}(g) = \overline{⟨u|ρ_l(g)(v)⟩} = \overline{Σ_i Σ_j \overline{λ_i} µ_j ⟨v^i|ρ_l(g)(v^j)⟩} = Σ_i Σ_j λ_i \overline{µ_j} ρ_l^{ij}(g),

thus showing that (ρ_l)_{uv} is in the linear span of the matrix coefficients corresponding to the orthonormal basis. This concludes the proof.

For an isomorphism class l ∈ Ĝ, let E_l := span{ ρ_l^{ij} | i, j ∈ {1, ..., dim(V_l)} } ⊆ L²_K(G) be the linear subspace of E generated by the matrix coefficients corresponding to l. Furthermore, for each j let E_l^j ⊆ E_l be the subspace generated by all ρ_l^{ij} for i ∈ {1, ..., dim(V_l)}. In the next lemma, we prove that these are actually closed subrepresentations of the regular representation.

Lemma B.33. For j ∈ {1, ..., dim(V_l)}, E_l^j is a closed invariant subspace of L²_K(G). In particular, E_l is a closed invariant subspace of L²_K(G).

Proof. Closedness follows immediately since this space is finite-dimensional and thus complete, see Proposition F.47. We need to show that λ(g)ρ_l^{ij} ∈ E_l^j for all g ∈ G and all i, j. We can compute this directly:

[λ(g)ρ_l^{ij}](g′) = ρ_l^{ij}(g⁻¹g′) = \overline{⟨v^i|ρ_l(g⁻¹g′)(v^j)⟩} = \overline{⟨ρ_l(g)(v^i)|ρ_l(g′)(v^j)⟩} = Σ_{i′} ⟨v^{i′}|ρ_l(g)(v^i)⟩ · \overline{⟨v^{i′}|ρ_l(g′)(v^j)⟩} = Σ_{i′} ρ_l^{ii′}(g⁻¹) ρ_l^{i′j}(g′),

where the coefficients ρ_l^{ii′}(g⁻¹) do not depend on g′. Consequently, λ(g)ρ_l^{ij} ∈ E_l^j.

Lemma B.34. Let ρ : G → U(V) and ρ′ : G → U(V′) be unitary representations, ρ being irreducible and V′ ≠ 0. Furthermore, assume that f : V → V′ is a surjective intertwiner. Then V′ is also irreducible and f an equivalence.

Proof. Assume by contradiction that V′ is reducible. Then there is a nontrivial closed invariant subspace 0 ⊊ W ⊊ V′. Now the following can easily be checked:
1. 0 ⊊ f⁻¹(W) ⊊ V.
2. f⁻¹(W) is an invariant subspace of V.
3. f⁻¹(W) is a closed subset of V.
Once we have this, we have a contradiction to the fact that V is irreducible. Points 1 and 2 can be checked by the reader, and 3 follows since V is, as an irreducible representation, finite-dimensional by Proposition B.31, and thus every subspace is closed by Proposition F.47. Therefore, we know that V′ is irreducible. Now use Schur's Lemma B.29 to conclude that f, being nonzero, is necessarily an equivalence.
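The key computation of Lemma B.33, λ(g)ρ_l^{ij} = Σ_{i′} ρ_l^{ii′}(g⁻¹) ρ_l^{i′j}, reduces over K = R (where the conjugation is trivial and ρ_l^{ij}(g) is just the matrix entry ρ_l(g)_{ij}) to the entrywise identity ρ_l(g⁻¹g′) = ρ_l(g⁻¹)ρ_l(g′). A sketch for the 2-dimensional real irrep of Z/4 given by 90-degree rotations (an example of our own choosing):

```python
import numpy as np

# 2-dimensional real irrep of Z/4: rho(k) = R^k, R = rotation by 90 degrees.
R = np.array([[0.0, -1.0], [1.0, 0.0]])
rho = {k: np.linalg.matrix_power(R, k) for k in range(4)}

# For all g, g': rho(g^{-1} g') = rho(g^{-1}) rho(g'), i.e. entrywise
# [lam(g) rho_ij](g') = sum_i' rho_{i i'}(g^{-1}) rho_{i' j}(g'),
# so for fixed j the span of {rho_0j, rho_1j} (the space E_l^j) is
# invariant under the regular representation.
for g in range(4):
    for gp in range(4):
        left = rho[(gp - g) % 4]            # rho(g^{-1} g'), all entries at once
        right = rho[(-g) % 4] @ rho[gp]     # rho(g^{-1}) rho(g')
        assert np.allclose(left, right)
```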

Proposition B.35.

There is an equivalence of representations f_l^j : V_l → E_l^j given on the orthonormal basis by f_l^j(v^i) := ρ_l^{ij}. Consequently, there is an isomorphism V_l ≅ E_l^j of unitary representations.

Proof. We need to show that f_l^j is equivariant. Using the computation from the proof of Lemma B.33, we obtain

f_l^j(ρ_l(g)(v^i)) = f_l^j( Σ_{i′} ⟨v^{i′}|ρ_l(g)(v^i)⟩ v^{i′} ) = Σ_{i′} ⟨v^{i′}|ρ_l(g)(v^i)⟩ f_l^j(v^{i′}) = Σ_{i′} ρ_l^{ii′}(g⁻¹) ρ_l^{i′j} = λ(g)ρ_l^{ij} = λ(g)(f_l^j(v^i)),

so f_l^j ∘ ρ_l(g) = λ(g) ∘ f_l^j for all g ∈ G, which is what we wanted to show. That f_l^j is an intertwiner also requires continuity: this follows since V_l is finite-dimensional, and so all linear functions on it are continuous. That f_l^j is even an equivalence now follows from Lemma B.34 by noting that E_l^j ≠ 0. Indeed, if it were zero, then we would have ρ_l^{ij}(g) = 0 for all i and g, and thus ρ_l(g) would not be invertible, in contrast to it being a unitary automorphism. Thus, there is even an isomorphism V_l ≅ E_l^j by Schur's Lemma B.29.

Lemma B.36. Let ρ : G → U(V) be a unitary representation and V₁ ⊆ V a subrepresentation. Then the orthogonal complement V₁^⊥ is a subrepresentation as well.

Proof. We have ⟨v|v₁⟩ = 0 for all v ∈ V₁^⊥ and all v₁ ∈ V₁. Now, let g ∈ G be arbitrary. From the unitarity of ρ we obtain ⟨ρ(g)(v)|v₁⟩ = ⟨v|ρ(g⁻¹)(v₁)⟩ = 0. The last step follows from ρ(g⁻¹)(v₁) ∈ V₁, which holds since V₁ is a subrepresentation. Overall, this shows ρ(g)(v) ∈ V₁^⊥ as well, and so V₁^⊥ is a subrepresentation.

Lemma B.37. Let ρ : G → U(V) be a finite-dimensional unitary representation. Furthermore, assume that W₁, W₂ are irreducible subrepresentations. If they are not isomorphic, then they are perpendicular, i.e., ⟨w₁|w₂⟩ = 0 for all w₁ ∈ W₁ and w₂ ∈ W₂.

Proof.
Let P : V → W₁ be the orthogonal projection from V to W₁, defined as the adjoint of the canonical inclusion i : W₁ → V, i.e., defined by the property ⟨w₁|P(v)⟩ = ⟨i(w₁)|v⟩ = ⟨w₁|v⟩ for all v ∈ V and w₁ ∈ W₁, see also Proposition F.46. We now show that P is equivariant. For all g ∈ G, v ∈ V and w₁ ∈ W₁ we have:

⟨w₁|P(ρ(g)(v))⟩ = ⟨w₁|ρ(g)(v)⟩ = ⟨ρ(g⁻¹)(w₁)|v⟩ = ⟨ρ(g⁻¹)(w₁)|P(v)⟩ = ⟨w₁|ρ(g)(P(v))⟩,

where we used in the third step that W₁ is a subrepresentation. Since this holds for all w₁ ∈ W₁, we obtain P(ρ(g)(v)) = ρ(g)(P(v)) by Proposition F.45, and overall that P is equivariant. In particular, the restriction P|_{W₂} : W₂ → W₁ is also equivariant. Since W₁ and W₂ are not isomorphic, we obtain by Schur's Lemma B.29 that P|_{W₂} = 0, i.e., for all w₁ ∈ W₁ and w₂ ∈ W₂ we have ⟨w₁|w₂⟩ = ⟨w₁|P|_{W₂}(w₂)⟩ = ⟨w₁|0⟩ = 0. Thus, W₁ and W₂ are perpendicular as claimed.

Proposition B.38. Let ρ : G → U(V) be any finite-dimensional unitary representation. Then V decomposes into an orthogonal direct sum V = ⊕_{i=1}^n V_i such that the V_i ⊆ V are irreducible subrepresentations of ρ.

Proof. Let V₁ be any irreducible subrepresentation of V. This can be obtained by noting that if V is not already irreducible (in which case V₁ = V), then we find a nontrivial subrepresentation 0 ⊊ W ⊊ V. By iteratively proceeding with W, we eventually reach an irreducible representation since V is finite-dimensional. Now, let V₁^⊥ be the orthogonal complement of V₁. From Lemma B.36 we know that this is a subrepresentation of V. By induction on the dimension of V, and since V₁^⊥ has strictly smaller dimension, we can assume that V₁^⊥ already splits into an orthogonal direct sum of irreducible subrepresentations V₁^⊥ = ⊕_{i=2}^n V_i, and overall, V = ⊕_{i=1}^n V_i is the decomposition we were looking for.
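The decomposition of Proposition B.38 can be computed explicitly for a finite group with the standard character projectors P_l = (1/|G|) Σ_g \overline{χ_l(g)} λ(g), a classical tool that is not developed in this text and is used here only for illustration. A sketch for the regular representation of Z/3 over C, where each of the three 1-dimensional irreps appears exactly once:

```python
import numpy as np

n = 3
def lam(g):                               # regular representation of Z/3
    L = np.zeros((n, n))
    for gp in range(n):
        L[gp, (gp - g) % n] = 1.0
    return L

# Character projectors onto the isotypic components of L^2(G).
projs = []
for l in range(n):
    chi = np.exp(2j * np.pi * l * np.arange(n) / n)
    P = sum(np.conj(chi[g]) * lam(g) for g in range(n)) / n
    projs.append(P)

for P in projs:
    assert np.allclose(P @ P, P)                    # idempotent projector
    assert np.linalg.matrix_rank(P) == 1            # each component is 1-dim
for l1 in range(n):
    for l2 in range(n):
        if l1 != l2:                                # mutually orthogonal
            assert np.allclose(projs[l1] @ projs[l2], np.zeros((n, n)))
assert np.allclose(sum(projs), np.eye(n))           # they exhaust L^2(G)
```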
The following proposition will not be used now, but we make use of it later when showing that there are only finitely many basis kernels in a steerable CNN for a compact group:

Proposition B.39 (Krull-Remak-Schmidt Theorem). In the situation of Proposition B.38, the orthogonal direct sum decomposition is essentially unique. That is, the types and multiplicities of the irreducible direct summands are always the same.

Proof. If one has a decomposition of V in which an irreducible representation U does not appear, then U cannot appear in any decomposition, since U would be perpendicular to all the irreps in the decomposition of V by Lemma B.37 and thus zero. Therefore, the types of irreducible representations are always the same. That the multiplicities are always the same follows by the same argument and for dimension reasons.

We can now finally prove the Peter-Weyl Theorem B.22 for the case X = G:

Proof. By Proposition B.38 and Lemma B.33 there is some orthogonal decomposition E_l = ⊕_{i=1}^{n_l} V_li into irreducible invariant subspaces. Now assume that there is an i such that V_li ≇ V_l. By Proposition B.35 this means that V_li ≇ E_l^j for all j = 1, ..., dim(V_l). By Lemma B.37 we obtain V_li ⊥ E_l^j for all j and thus, since Σ_j E_l^j = E_l, we obtain V_li ⊥ E_l and overall V_li = 0, a contradiction. Thus, the assumption was wrong and all V_li in the orthogonal direct sum are isomorphic to V_l. Now let l ≠ l′ and i, j be arbitrary. We have E_l ⊥ E_l′ by Proposition B.30, and thus in particular V_li ⊥ V_l′j. Furthermore, we have n_l ≤ dim(V_l) since E_l = Σ_{j=1}^{dim(V_l)} E_l^j = ⊕_{i=1}^{n_l} V_li, and dim(V_l) < ∞ by Proposition B.31. Moreover, we have Σ_{l∈Ĝ} Σ_{i=1}^{n_l} V_li = Σ_{l∈Ĝ} E_l = E, which is topologically dense in L²_K(G) by Theorem B.27. Finally, that n_l = dim(V_l) if K = C follows by invoking a stronger version of Schur's orthogonality than we have developed here, which works only over the complex numbers (Knapp, 2002).

B.2.4 A PROOF OF THE PETER-WEYL THEOREM FOR GENERAL L²_K(X)

Now let X be a homogeneous space of G. Then, as mentioned in Section B.1.3, there is a measure µ on X which is left-G-invariant (Nachbin & Bechtolsheim, 1965) in the sense that for all g ∈ G and all square-integrable functions f ∈ L²_K(X):

∫_X f(g·x) dx = ∫_X f(x) dx.

Furthermore, let π : G → X be the projection given by g ↦ gx* for a fixed element x* ∈ X. One important result is that there is a Fubini-like theorem for the evaluation of integrals on G using the invariant measure on X. Namely, for arbitrary x ∈ X, let g(x) ∈ G be any lift, i.e., any element of G with π(g(x)) = x; this exists since the action is transitive. Let H := G_{x*} ⊆ G be the stabilizer subgroup. For a square-integrable function f : G → K, we can then construct the average av(f) : X → K by

av(f)(x) := ∫_H f(g(x)h) dh,

where we integrate using the Haar measure on H. If it is hard to see why this is called an average, note that X ≅ G/H, i.e., points in X can be interpreted as cosets of G, and the average then averages over cosets.

This construction is well-defined, i.e., it does not depend on the specific choice of the lift g(x). Indeed, let g̃(x) be another lift of x. Then g̃(x) = g(x)h′ for some h′ ∈ H, since H is the stabilizer subgroup. Consequently, using the invariance of the Haar measure, we see:

∫_H f(g̃(x)h) dh = ∫_H f(g(x)h′h) dh = ∫_H f(g(x)h) dh,

and thus the well-definedness of the average av(f) : X → K. Integration of f over the whole of G is a "complete" average, and thus we can hope that averaging av(f) yields this complete integral. This is indeed the case, i.e., av(f) is square-integrable on X and one has (Nachbin & Bechtolsheim, 1965)

∫_G f(g) dg = ∫_X av(f)(x) dx.

We will use this important result later in order to see that L²_K(X) embeds with good properties into L²_K(G). We now want to prove the Peter-Weyl theorem for L²_K(X).
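The averaging construction and the Fubini-like identity ∫_G f dg = ∫_X av(f) dx can be checked directly for a finite example of our own choosing: G = Z/6, stabilizer H = {0, 3}, and X = G/H identified with Z/3 via the lift g(x) = x:

```python
import numpy as np

# G = Z/6, H = {0, 3}, X = G/H = Z/3 with projection pi(g) = g mod 3.
rng = np.random.default_rng(0)
f = rng.standard_normal(6)                          # an arbitrary f in L^2(G)

def av(f, x):                                       # lift g(x) = x, average over coset xH
    return (f[x] + f[(x + 3) % 6]) / 2

int_G = f.mean()                                    # (1/6) sum_{g in G} f(g)
int_X = sum(av(f, x) for x in range(3)) / 3         # (1/3) sum_{x in X} av(f)(x)
assert abs(int_G - int_X) < 1e-12                   # the Fubini-like identity

# Well-definedness: the average does not depend on the chosen lift,
# since the other preimage of x is x + 3.
for x in range(3):
    other = (x + 3) % 6
    assert abs(av(f, x) - (f[other] + f[(other + 3) % 6]) / 2) < 1e-12
```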
We first present a general argument showing an orthogonal decomposition of $L^2_K(X)$ into irreducible subspaces, and then use a specific argument to deduce that the multiplicities of irreducible subrepresentations are necessarily bounded by the multiplicities in $L^2_K(G)$.

Proposition B.40. Let $\rho: G \to U(V)$ be any unitary representation. Then there is a dense subrepresentation which splits as an orthogonal direct sum of irreducible subrepresentations.

Proof. We sketch the proof in Kowalski (2014), Corollary 5.4.2. In this book, the proof is done only for the complex numbers $\mathbb{C}$, but each step carries over without any changes to arbitrary $K \in \{\mathbb{R}, \mathbb{C}\}$. The rough steps are as follows:

1. From $\rho$ one builds a function $\hat{\rho}: L^2_K(G) \to \mathrm{Hom}_K(V, V)$, given by $\hat{\rho}(\varphi)(v) = \int_G \varphi(g)\rho(g)(v)\,dg$. This is analogous to our construction of kernel operators (special representation operators) from kernels, which we handle in the next chapter, see Theorem C.7.

2. Given a fixed $v \in V$, one obtains the function $\hat{\rho}_v: L^2_K(G) \to V$, $\varphi \mapsto \hat{\rho}(\varphi)(v)$. One can easily check that this is an intertwiner.

3. For each finite-dimensional subrepresentation $E \subseteq L^2_K(G)$, the image $\hat{\rho}_v(E) \subseteq V$ is a finite-dimensional subrepresentation of $V$.

4. For $v \neq 0$, using analytical arguments and the Peter-Weyl theorem for $L^2_K(G)$, one can prove that there is an $E$ such that $\hat{\rho}_v(E) \subseteq V$ is not zero. Having that, one can use Proposition B.38 in order to deduce that $\hat{\rho}_v(E)$ contains an irreducible subrepresentation, and so does $V$.

With this at hand, one can proceed inductively: given an irreducible subrepresentation $V_1 \subseteq V$, one can consider the orthogonal complement $V_1^\perp$, which is by Lemma B.36 again a subrepresentation of $V$. Thus, it also has, by the same argument as above, an irreducible subrepresentation $V_2$, and so on. By induction (or better: using Zorn's Lemma), one can then "fill up" $V$ with orthogonal irreducible subrepresentations, deducing the result.

Consequently, since $L^2_K(X)$ carries a unitary representation of $G$ by $[\lambda(g)(\varphi)](x) := \varphi(g^{-1}x)$, we can deduce that it contains a dense subrepresentation which splits as an orthogonal direct sum of irreducible subrepresentations. But we would like to know more details about this decomposition, in particular the multiplicities of the irreps. For this to work, we want to embed $L^2_K(X)$ into $L^2_K(G)$ and thus deduce a more specific result from the decomposition of $L^2_K(G)$.

Let, as before, $x^* \in X$ be an arbitrary point and let $\pi: G \to X$ be the projection given by $\pi(g) := gx^*$. Consider the function $\pi^*: L^2_K(X) \to L^2_K(G)$ given by $\pi^*(\varphi) := \varphi \circ \pi$. It is unclear a priori whether this is well-defined: for example, it might be that an $f: X \to K$ which is zero outside a measure zero set gets lifted to $\pi^*(f): G \to K$ which does not have this property, and thus $\pi^*$ would not be an actual function. Thus, we need some lemmas:

Published as a conference paper at ICLR 2021

Lemma B.41. Let $f: X \to K$ be square-integrable. Then we have $\mathrm{av}(\pi^*(f)) = f$.

Proof. Using Eq.
(9) and that $H$ is the stabilizer subgroup, we compute:
$$\mathrm{av}(\pi^*(f))(x) = \int_H \pi^*(f)(g(x)h)\,dh = \int_H f(\pi(g(x)h))\,dh = \int_H f(\pi(g(x)))\,dh = \int_H f(x)\,dh = f(x)\int_H 1\,dh = f(x)\mu(H) = f(x).$$

Lemma B.42. Let $A \subseteq X$ be any measurable set and let $\mathbb{1}_A: X \to \{0,1\} \subseteq K$ be its indicator function. Then $\pi^*(\mathbb{1}_A) = \mathbb{1}_{\pi^{-1}(A)}$.

Proof. This can easily be checked.

Lemma B.43. Let $\varphi: X \to K$ be zero outside a measure zero set $A$. Then $\pi^*(\varphi)$ is zero outside $\pi^{-1}(A)$, which is also a measure zero set.

Proof. If $g \notin \pi^{-1}(A)$ then $\pi(g) \notin A$ and thus $0 = \varphi(\pi(g)) = \pi^*(\varphi)(g)$, which proves the first statement. The second is shown as follows, using both Lemmas B.41 and B.42 and Eq. (9):
$$\mu(\pi^{-1}(A)) = \int_G \mathbb{1}_{\pi^{-1}(A)}(g)\,dg = \int_G \pi^*(\mathbb{1}_A)(g)\,dg = \int_X \mathrm{av}(\pi^*(\mathbb{1}_A))(x)\,dx = \int_X \mathbb{1}_A(x)\,dx = \mu(A) = 0,$$
thus showing what was claimed.

Thus, our concern about well-definedness as a function is invalid, and we can now prove an embedding result:

Proposition B.44. $\pi^*: L^2_K(X) \to L^2_K(G)$ is a well-defined intertwiner and a unitary transformation, i.e., for all $\varphi, \psi \in L^2_K(X)$ we have $\langle \pi^*(\varphi) \mid \pi^*(\psi) \rangle_{L^2_K(G)} = \langle \varphi \mid \psi \rangle_{L^2_K(X)}$.

Proof. For well-definedness, we still need to show that $\pi^*(\varphi)$ is again square-integrable for square-integrable $\varphi: X \to K$. This is indeed the case due to Eq. (9). Namely, consider $|\pi^*(\varphi)|^2: G \to K$ and its average $\mathrm{av}(|\pi^*(\varphi)|^2)$. Clearly, we have $|\pi^*(\varphi)|^2 = \pi^*(|\varphi|^2)$ and thus, using Lemma B.41, $\mathrm{av}(|\pi^*(\varphi)|^2) = |\varphi|^2$. We obtain:
$$\int_G |\pi^*(\varphi)|^2(g)\,dg = \int_X \mathrm{av}(|\pi^*(\varphi)|^2)(x)\,dx = \int_X |\varphi(x)|^2\,dx < \infty.$$
Thus, $\pi^*$ is not only well-defined but even fulfills $\|\pi^*(\varphi)\|_{L^2_K(G)} = \|\varphi\|_{L^2_K(X)}$, which also shows the continuity of $\pi^*$. With similar arguments, we show that $\pi^*$ respects the whole scalar product, i.e., is a unitary transformation:
$$\langle \pi^*(\varphi) \mid \pi^*(\psi) \rangle_{L^2_K(G)} = \int_G \left(\overline{\pi^*(\varphi)} \cdot \pi^*(\psi)\right)(g)\,dg = \int_X \mathrm{av}\!\left(\overline{\pi^*(\varphi)} \cdot \pi^*(\psi)\right)(x)\,dx = \int_X \overline{\varphi(x)}\,\psi(x)\,dx = \langle \varphi \mid \psi \rangle_{L^2_K(X)}.$$
The third equality follows as before by noting that $\overline{\pi^*(\varphi)} \cdot \pi^*(\psi) = \pi^*(\overline{\varphi} \cdot \psi)$ and invoking Eq. (9) and Lemma B.41 again. The linearity of $\pi^*$ is obvious, and the equivariance is shown as follows: note that for arbitrary $g, g' \in G$ we have $\pi(g^{-1}g') = (g^{-1}g')x^* = g^{-1}(g'x^*) = g^{-1}\pi(g')$ and therefore:
$$[\pi^*(\lambda(g)\varphi)](g') = (\lambda(g)\varphi)(\pi(g')) = \varphi(g^{-1}\pi(g')) = \varphi(\pi(g^{-1}g')) = \pi^*(\varphi)(g^{-1}g') = [\lambda(g)\pi^*(\varphi)](g').$$
Thus, we have shown everything there was to show.

Hence, $\pi^*: L^2_K(X) \to L^2_K(G)$ is an embedding which even preserves the scalar product. We can therefore view $L^2_K(X)$ as a subspace: $L^2_K(X) \subseteq L^2_K(G)$. We can finally complete the proof of the Peter-Weyl Theorem B.22:

Proof of Theorem B.22. Assume that $\bigoplus_{l \in \hat{G}} \bigoplus_{i=1}^{m_l} V_{li} \subseteq L^2_K(X) \subseteq L^2_K(G)$ is a dense subspace such that the direct sum is orthogonal, where $V_{li} \cong V_l$ for all $l, i$. This exists by Proposition B.40. Remember that $n_l$ denotes the multiplicity of $V_l$ as a subrepresentation of $L^2_K(G)$. We now want to show that $m_l \leq n_l$. Since $V_{li}$ is perpendicular to all $E_{l'}$ with $l' \neq l$ by Lemma B.37, $V_{li}$ must be contained in the orthogonal complement of $\bigoplus_{l' \neq l} E_{l'}$. This complement is exactly $E_l$, which we show in a final lemma after this proof. So $V_{li} \subseteq E_l$ for all $i$, and we obtain the result $m_l \leq n_l$ for dimension reasons. This was all that was left to show.

Lemma B.45. We have $E_l = \left(\bigoplus_{l \neq l' \in \hat{G}} E_{l'}\right)^\perp$.

Proof. We already know $E_l \subseteq \left(\bigoplus_{l \neq l' \in \hat{G}} E_{l'}\right)^\perp$ from Proposition B.30. Now, assume this inclusion is not an equality. Then there is $v \notin E_l$ such that $v \in \left(\bigoplus_{l \neq l' \in \hat{G}} E_{l'}\right)^\perp$. The space $\mathrm{span}_K(v, E_l)$ contains an orthonormal basis by Proposition F.41, where Gram-Schmidt orthonormalization allows starting with an orthonormal basis of $E_l$ and filling it up to one of the whole space $\mathrm{span}_K(v, E_l)$. Thus, we can assume $v \in E_l^\perp$ as well.
Overall, $v \in \left(\bigoplus_{l' \in \hat{G}} E_{l'}\right)^\perp$, and by taking the topological closure and using that the scalar product is continuous by Proposition F.38, we obtain $v \in \left(\overline{\bigoplus_{l' \in \hat{G}} E_{l'}}\right)^\perp = \left(L^2_K(G)\right)^\perp$ by the Peter-Weyl theorem for the regular representation. This means $v = 0 \in E_l$, a contradiction to $v \notin E_l$. Thus, our assumption was wrong and such a vector $v$ cannot exist. We obtain the equality as desired.
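The key properties of $\pi^*$ established in this subsection, that it preserves the scalar product and intertwines the two $\lambda$-actions, can be verified numerically in the same finite toy setting used informally above ($G = \mathbb{Z}_6$, $H = \{0,3\}$, an illustrative choice of ours). A minimal sketch, assuming `numpy`:

```python
import numpy as np

# G = Z_6, H = {0, 3}; the coset of g in X = G/H is labeled by pi(g) = g mod 3,
# since the cosets are {0,3}, {1,4}, {2,5}.
pi = lambda g: g % 3

rng = np.random.default_rng(1)
phi = rng.standard_normal(3)                              # functions on X
psi = rng.standard_normal(3)

pull = lambda f: np.array([f[pi(g)] for g in range(6)])   # pi^*(f) = f . pi

# Unitarity: <pi^* phi | pi^* psi>_{L2(G)} = <phi | psi>_{L2(X)}
# with normalized counting measures on both sides.
assert np.isclose(np.mean(pull(phi) * pull(psi)), np.mean(phi * psi))

# Equivariance: pi^*(lambda(g) phi) = lambda(g) pi^*(phi), where
# [lambda(g) f](y) = f(g^{-1} y) on either space (additively: y - g).
for g in range(6):
    lam_X = np.array([phi[(x - g) % 3] for x in range(3)])
    lam_G = np.array([pull(phi)[(gp - g) % 6] for gp in range(6)])
    assert np.allclose(pull(lam_X), lam_G)
print("pi^* is an isometric intertwiner in this finite example")
```

The isometry works out because each coset has the same number of elements, so pulling back a function and integrating over $G$ double-counts every value with exactly compensating measure weights.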

C THE CORRESPONDENCE BETWEEN STEERABLE KERNELS AND REPRESENTATION OPERATORS

In this chapter, we formulate and prove Theorem C.7, which gives a precise one-to-one correspondence between steerable kernels on the one hand, and certain representation operators which we call kernel operators on the other hand. Representation operators are a representation-theoretic abstraction of the scalar, vector and tensor operators from physics, that were explained in Section 2. The correspondence will allow us to prove a Wigner-Eckart theorem for steerable kernels in Chapter D and, ultimately, to obtain a complete description of steerable kernel bases. We formulate the correspondence in Section C.1, while Section C.2 gives a detailed and rigorous proof of it. As in Chapter B, K is either of the two fields R or C.

C.1 FUNDAMENTALS OF THE CORRESPONDENCE

In Section C.1, we formulate the correspondence between steerable kernels and the special representation operators that we name kernel operators. We do this by first studying steerable CNNs and the kernel constraint in Section C.1.1, which progressively leads us to consider steerable kernels on homogeneous spaces of general compact groups in Section C.1.2. This abstract formulation of steerable kernels shows apparent similarities to the concept of representation operators, which we discuss in Section C.1.3 and study in purely representation-theoretic terms in Section C.1.4. However, the two concepts differ in one important respect: steerable kernels are not linear, whereas representation operators are, and this is a difference that we need to bridge. Finally, after defining kernel operators as special representation operators, we give the formulation of the correspondence in Theorem C.7 in Section C.1.5 and briefly give some intuition for why it is true.

C.1.1 STEERABLE KERNELS AND THE RESTRICTION TO HOMOGENEOUS SPACES

The concept of steerable CNNs outlined here follows (Weiler et al., 2018a; Weiler & Cesa, 2019). In a nutshell, they work as follows: The network is supposed to process feature fields $f: \mathbb{R}^d \to K^c$ with $d \in \mathbb{N}$, where $c$ is the dimension of the features themselves, i.e., the number of channels. For example, planar RGB images correspond to the case $d = 2$ and $c = 3$. Feature fields transform under the induced representation, given for translations $t \in \mathbb{R}^d$ and $g \in G$ by
$$\left[\mathrm{Ind}_G^{\mathbb{R}^d \rtimes G}\, \rho\,(tg) \cdot f\right](x) := \rho(g) \cdot f\!\left(g^{-1}(x - t)\right).$$
Let the kernel that "maps" between the layers by convolution be given by a function $K: \mathbb{R}^d \to K^{c_\mathrm{out} \times c_\mathrm{in}}$. That is, for an input $f_\mathrm{in}: \mathbb{R}^d \to K^{c_\mathrm{in}}$, the output $f_\mathrm{out}: \mathbb{R}^d \to K^{c_\mathrm{out}}$ is given by
$$f_\mathrm{out}(x) = [K \star f_\mathrm{in}](x) = \int_{\mathbb{R}^d} K(y)\,f_\mathrm{in}(x + y)\,dy,$$
where $K(y) \in K^{c_\mathrm{out} \times c_\mathrm{in}}$ acts for any $y \in \mathbb{R}^d$ as a linear transformation from $K^{c_\mathrm{in}}$ to $K^{c_\mathrm{out}}$. The goal is now to find kernels $K$ such that convolution with these kernels commutes with the induced actions on the input and output fields. That is, for all input fields $f_\mathrm{in}$ and for all $t \in \mathbb{R}^d$ and $g \in G$, we want the following property:
$$K \star \left[\mathrm{Ind}_G^{\mathbb{R}^d \rtimes G}\, \rho_\mathrm{in}(tg) \cdot f_\mathrm{in}\right] = \mathrm{Ind}_G^{\mathbb{R}^d \rtimes G}\, \rho_\mathrm{out}(tg) \cdot \left(K \star f_\mathrm{in}\right).$$
It was shown in Weiler et al. (2018a) that a kernel $K$ has this equivariance property if and only if it satisfies a certain constraint. We rederive it here for convenience. Writing out both sides, we obtain the following equality, which needs to hold for all $f_\mathrm{in}$ and all $x, t \in \mathbb{R}^d$:
$$\int_{\mathbb{R}^d} K(y)\,\rho_\mathrm{in}(g)\,f_\mathrm{in}\!\left(g^{-1}(x + y - t)\right)dy = \rho_\mathrm{out}(g)\int_{\mathbb{R}^d} K(y)\,f_\mathrm{in}\!\left(g^{-1}(x - t) + y\right)dy.$$
Substituting $y = g\tilde{y}$ (i.e., $\tilde{y} = g^{-1}y$) on the left side, using $|\det g| = 1$ due to the compactness of $G$, and putting $\rho_\mathrm{out}(g)$ inside the integral on the right side, which is possible due to linearity, we obtain:
$$\int_{\mathbb{R}^d} K(g\tilde{y})\,\rho_\mathrm{in}(g)\,f_\mathrm{in}\!\left(g^{-1}x - g^{-1}t + \tilde{y}\right)d\tilde{y} = \int_{\mathbb{R}^d} \rho_\mathrm{out}(g)\,K(y)\,f_\mathrm{in}\!\left(g^{-1}x - g^{-1}t + y\right)dy.$$
Since this needs to hold for all fields $f_\mathrm{in}$, we necessarily have $K(gx)\rho_\mathrm{in}(g) = \rho_\mathrm{out}(g)K(x)$ for all $x \in \mathbb{R}^d$ and all $g \in G$, and obtain the kernel constraint
$$K(gx) = \rho_\mathrm{out}(g) \circ K(x) \circ \rho_\mathrm{in}(g)^{-1}. \tag{10}$$
This work develops a general theory for solving this kernel constraint, i.e., for finding a parameterization of the space of all kernels that fulfill it. We now explain how to make this problem more tractable. Formally, the action of $G$ on $\mathbb{R}^d$ is a group action as in Definition B.5. However, it cannot be transitive as in Definition B.7, since $G$ is compact and $\mathbb{R}^d$ is not. Thus, $\mathbb{R}^d$ splits into a disjoint union of orbits (Definition B.6) of the action: $\mathbb{R}^d = \bigsqcup_{k \in \mathcal{K}} X_k$. That this union is disjoint can be explained as follows: define the relation $\sim$ on $\mathbb{R}^d$ by $x \sim x'$ if $gx = x'$ for some $g \in G$. This is an equivalence relation, and so $\mathbb{R}^d$ splits into a disjoint union of equivalence classes. One can then show that these equivalence classes are precisely the orbits of the group action. For example, such orbits take the form of spheres $S^{d-1}$ if $G = \mathrm{SO}(d)$ or $G = \mathrm{O}(d)$, and the form of a finite set of points if $G = C_N$ or $G = D_N$.

The idea is now that the kernel constraint (10) only constrains the behavior of the kernel on each orbit individually, and thus solutions on the individual orbits can be "patched together" to a solution on the whole of $\mathbb{R}^d$. Indeed, assume that the $K_k: X_k \to K^{c_\mathrm{out} \times c_\mathrm{in}}$ individually fulfill the kernel constraint, which means that for all $x_k \in X_k$ and $g \in G$ we have $K_k(gx_k) = \rho_\mathrm{out}(g) \circ K_k(x_k) \circ \rho_\mathrm{in}(g)^{-1}$. Then, define the patch of these orbit-kernels $K: \mathbb{R}^d \to K^{c_\mathrm{out} \times c_\mathrm{in}}$ by $K(x) = K_k(x)$ if $x \in X_k$. This is well-defined since each $x$ lies in precisely one orbit, and clearly $K$ satisfies the kernel constraint (10). Moreover, each kernel $K$ which fulfills the kernel constraint emerges from such a construction, since we can simply set $K_k := K|_{X_k}$. Overall, we see that we can restrict our attention to orbits. In Weiler et al. (2018b) and later Weiler et al. (2018a), a discretized implementation is used where the kernel is discretized into finitely many orbits with a smooth Gaussian radial profile.
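As a quick numerical sanity check of the kernel constraint on a single orbit, consider the standard rotation representation of $\mathrm{SO}(2)$ on $\mathbb{R}^2$ as both $\rho_\mathrm{in}$ and $\rho_\mathrm{out}$, and the orbit $S^1$. The kernel $K(x) = x x^\top$ (our own illustrative choice, not a construction from the text) is then steerable, since $K(Rx) = Rx(Rx)^\top = R\,K(x)\,R^{-1}$ for orthogonal $R$. A minimal sketch, assuming `numpy`:

```python
import numpy as np

def rot(a):
    """Standard 2D rotation matrix, the representation rho(a) of SO(2)."""
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

# Candidate steerable kernel on the orbit S^1: the outer product K(x) = x x^T.
K = lambda x: np.outer(x, x)

rng = np.random.default_rng(2)
x = rng.standard_normal(2)
x /= np.linalg.norm(x)                    # a point on the orbit S^1

for a in rng.uniform(0, 2 * np.pi, size=5):
    g = rot(a)
    # Kernel constraint (10): K(g x) = rho_out(g) K(x) rho_in(g)^{-1}
    assert np.allclose(K(g @ x), g @ K(x) @ np.linalg.inv(g))
print("K(x) = x x^T satisfies the SO(2) kernel constraint on S^1")
```

Note that the same check fails for a generic, non-equivariant choice of $K$, which is exactly why the constraint cuts the space of admissible kernels down so drastically.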
We will come back to these practical questions of parameterization in Remark D.19, once we have fully developed the theory of steerable CNNs.

C.1.2 AN ABSTRACT DEFINITION OF STEERABLE KERNELS

Motivated by the discussion in the last section, we now define steerable kernels in precise terms and will stick to that definition throughout this work. The definition is more abstract than usual in the deep learning community, but we are rewarded since such an abstract definition makes it easier to apply representation-theoretic results. Without loss of generality, we will in the rest of this work only consider kernels on orbits. Thus, let $X := G \cdot x$ be an arbitrary orbit. We consider steerable kernels $K: X \to K^{c_\mathrm{out} \times c_\mathrm{in}}$. Note that the restriction of the action $G \times \mathbb{R}^d \to \mathbb{R}^d$ to $X$, written $G \times X \to X$, makes $X$ a homogeneous space of $G$, see Definition B.7. Thus, instead of viewing $X$ as a subset of $\mathbb{R}^d$, we view $X$ as an arbitrary homogeneous space of an arbitrary compact group $G$. Notably, this framework is more general than what is usually studied in the context of steerable CNNs on $\mathbb{R}^d$, since we also allow groups that are not Lie groups, homogeneous spaces which are not naturally embedded in some $\mathbb{R}^d$, and finite homogeneous spaces of finite groups, all at the same time. Furthermore, we replace $K^{c_\mathrm{in}}$ and $K^{c_\mathrm{out}}$ by coordinate-independent $K$-vector spaces $V_\mathrm{in}$ and $V_\mathrm{out}$, and therefore $K^{c_\mathrm{out} \times c_\mathrm{in}}$ by the space of linear functions from $V_\mathrm{in}$ to $V_\mathrm{out}$, written $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$. We assume there are linear representations $\rho_\mathrm{in}: G \to \mathrm{GL}(V_\mathrm{in})$ and $\rho_\mathrm{out}: G \to \mathrm{GL}(V_\mathrm{out})$. Overall, this means that steerable kernels are certain maps $K: X \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$. The only property they need to fulfill is the kernel constraint $K(gx) = \rho_\mathrm{out}(g) \circ K(x) \circ \rho_\mathrm{in}(g)^{-1}$ for all $g \in G$ and $x \in X$. This can be viewed in representation-theoretic terms by defining the Hom-representation:

Definition C.1 (Hom-Representation). Let $\rho_\mathrm{in}: G \to \mathrm{GL}(V_\mathrm{in})$ and $\rho_\mathrm{out}: G \to \mathrm{GL}(V_\mathrm{out})$ be two finite-dimensional $G$-representations over the field $K$.
The space $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ of $K$-linear (not necessarily $G$-equivariant) functions from $V_\mathrm{in}$ to $V_\mathrm{out}$ also carries an induced $G$-representation, with action $[\rho_\mathrm{Hom}(g)](f) := \rho_\mathrm{out}(g) \circ f \circ \rho_\mathrm{in}(g)^{-1}$. We call this the Hom-representation.

Remark C.2. Of course, one needs to check that this is indeed a linear representation. Continuity follows from the continuity of $\rho_\mathrm{in}$ and $\rho_\mathrm{out}$ as follows: the topology on $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ is just the Euclidean topology of $K^{c_\mathrm{out} \times c_\mathrm{in}}$ coming from bases of $V_\mathrm{in}$ and $V_\mathrm{out}$. In these bases, $\rho_\mathrm{in}(g)$ and $\rho_\mathrm{out}(g)$ are given by matrices, and all matrix coefficients are continuous by Remark B.25. Now, in order to show that $\rho_\mathrm{Hom}$ is continuous, pick a fixed element $f \in K^{c_\mathrm{out} \times c_\mathrm{in}}$. One needs to show that the map $\rho^f_\mathrm{Hom}: G \to K^{c_\mathrm{out} \times c_\mathrm{in}}$, $g \mapsto \rho_\mathrm{out}(g) \circ f \circ \rho_\mathrm{in}(g^{-1})$ is continuous. Since all matrix coefficients are continuous, and since also the inversion $G \to G$, $g \mapsto g^{-1}$ is continuous by the definition of a topological group, the map $\rho^f_\mathrm{Hom}$ is basically just a stacked linear combination of continuous functions and thus continuous itself. The linearity of each $\rho_\mathrm{Hom}(g)$ is also clear. So what remains to be checked is that $\rho_\mathrm{Hom}$ is a group homomorphism. And indeed it is, exploiting the corresponding property of $\rho_\mathrm{in}$ and $\rho_\mathrm{out}$:
$$[\rho_\mathrm{Hom}(gg')](f) = \rho_\mathrm{out}(gg') \circ f \circ \rho_\mathrm{in}(gg')^{-1} = \rho_\mathrm{out}(g) \circ \rho_\mathrm{out}(g') \circ f \circ \rho_\mathrm{in}(g')^{-1} \circ \rho_\mathrm{in}(g)^{-1} = [\rho_\mathrm{Hom}(g)]\left([\rho_\mathrm{Hom}(g')](f)\right) = [\rho_\mathrm{Hom}(g) \circ \rho_\mathrm{Hom}(g')](f),$$
and so the claim follows.

With this definition in mind, steerable kernels $K: X \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ are just functions with the property $K(gx) = [\rho_\mathrm{Hom}(g)](K(x))$. Summarizing, we have the following abstract definition of steerable kernels (differently from Definition 3.2, we here also allow input and output representations that are not irreducible, and make explicit reference to the Hom-representation):

Definition C.3 (Steerable Kernel). Let $G$ be any compact group and $X$ be any homogeneous space of $G$.
Furthermore, let $\rho_\mathrm{in}: G \to \mathrm{GL}(V_\mathrm{in})$ and $\rho_\mathrm{out}: G \to \mathrm{GL}(V_\mathrm{out})$ be finite-dimensional representations of $G$. We assume that $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ is equipped with the Hom-representation $\rho_\mathrm{Hom}$. A $G$-steerable kernel is an equivariant function $K: X \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$, i.e., a function such that $K(gx) = [\rho_\mathrm{Hom}(g)](K(x))$ for all $g \in G$ and $x \in X$. We denote the vector space of all these kernels by
$$\mathrm{Hom}_G(X, \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})) = \{K: X \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out}) \mid K \text{ is steerable}\}.$$
Notably, steerable kernels are not linear in a meaningful sense with respect to their input. That the space of steerable kernels forms a vector space, as claimed in this definition, can easily be checked.
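For a finite group and a finite orbit, the space of steerable kernels from Definition C.3 can be computed directly as the null space of a linear system. The following sketch (our own illustrative setup, assuming `numpy`) takes $G = C_4$ acting freely on an orbit of 4 points, with $\rho_\mathrm{in} = \rho_\mathrm{out}$ the rotation by $90°$ on $\mathbb{R}^2$, and solves the constraint $K(gx) = \rho_\mathrm{out}(g)\,K(x)\,\rho_\mathrm{in}(g)^{-1}$ for the 16 entries of $(K(0), \dots, K(3))$:

```python
import numpy as np

R = np.array([[0., -1.], [1., 0.]])                 # rotation by 90 degrees
rho = [np.linalg.matrix_power(R, k) for k in range(4)]

# With row-major flattening, vec(B K C) = (B kron C^T) vec(K), so the
# constraint K(gx) - rho(g) K(x) rho(g)^{-1} = 0 becomes linear in the
# stacked vector (vec K(0), ..., vec K(3)) of length 16.
rows = []
for g in range(4):
    T = np.kron(rho[g], np.linalg.inv(rho[g]).T)
    for x in range(4):
        tgt = (x + g) % 4                           # g sends orbit point x to x+g
        row = np.zeros((4, 16))
        row[:, 4 * tgt: 4 * tgt + 4] += np.eye(4)   # vec(K(gx))
        row[:, 4 * x: 4 * x + 4] -= T               # minus transformed vec(K(x))
        rows.append(row)
A = np.concatenate(rows)

# The steerable kernel space is the null space of A.  Since C_4 acts freely
# on the orbit, K is determined by an arbitrary 2x2 matrix K(0), so we
# expect a 4-dimensional solution space.
s = np.linalg.svd(A, compute_uv=False)
dim = int(np.sum(s < 1e-10))
assert dim == 4
print("dimension of the steerable kernel space:", dim)
```

This brute-force null-space computation is exactly what the Wigner-Eckart theory of the later chapters replaces with a closed-form basis; for large groups and representations the linear system quickly becomes impractical.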

C.1.3 MORE DETAILS ON THE COMPARISON OF REPRESENTATION OPERATORS AND STEERABLE KERNELS

Steerable kernels satisfy the constraint
$$K(gx) = \rho_\mathrm{out}(g) \circ K(x) \circ \rho_\mathrm{in}(g)^{-1}, \tag{12}$$
whereas, as we saw in Section 2, representation operators are collections $(A_1, \dots, A_N)$ of operators $A_i: H \to H$ that satisfy the constraint
$$\sum_{j=1}^N \pi(g)_{ij}\, A_j = U(g)^\dagger A_i\, U(g) \quad \forall\, g \in G.$$
Hereby, $U: G \to \mathrm{U}(H)$ and $\pi: G \to \mathrm{U}(\mathbb{C}^N)$ are unitary representations. Unfortunately, these equations still look somewhat different from each other. We can make them more similar by inverting $g$ and using the unitarity of $\pi$ (note the swap of $j$ and $i$ and the complex conjugation):
$$\sum_{j=1}^N \pi(g)_{ji}\, A_j = U(g)\, A_i\, U(g)^\dagger \quad \forall\, g \in G. \tag{13}$$
In order to make the analogy to steerable kernels stronger, we would like to interpret a representation operator as one object $A$ instead of separate operators $A_i$, in the same way as a kernel $K$ is one single object and not just a disjoint collection of linear functions in $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$. For this, we interpret $A$ as a function that assigns an operator to each vector in $\mathbb{C}^N$. Namely, let $\{e_i\}$ be the standard basis of $\mathbb{C}^N$. We then define $A$ as the unique linear map which is given on basis elements by $A: e_i \mapsto A_i$. We can then deduce the following, where we use the linearity of $A$ in the second step, the definition of the $A_j$ in the third and fifth step, and Eq. (13) in the fourth step:
$$A\big(\pi(g)(e_i)\big) = A\Big(\sum_j \pi(g)_{ji}\, e_j\Big) = \sum_j \pi(g)_{ji}\, A(e_j) = \sum_j \pi(g)_{ji}\, A_j = U(g)\, A_i\, U(g)^\dagger = U(g)\, A(e_i)\, U(g)^\dagger. \tag{14}$$
If now $v = \sum_i \lambda_i e_i$ is an arbitrary vector in $\mathbb{C}^N$, not necessarily a standard basis vector, then from the linearity of $A$ and Eq. (14) we obtain
$$A\big(\pi(g)(v)\big) = U(g)\, A(v)\, U(g)^{-1}. \tag{15}$$
This equation is essentially the starting point for the definition of a representation operator as it can be found in Jeevanjee (2011). This, finally, really looks like Eq. (12). In this comparison, the action of the group $G$ on $\mathbb{R}^d$ in deep learning is replaced by the action of $G$ via $\pi$ on the space $\mathbb{C}^N$. The main difference is that steerable kernels are not necessarily linear.
This difference will be bridged in Theorem C.7.
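Equation (15) can be made tangible with a small finite example of our own choosing (not from the text): take $G = S_3$, let both $U$ and $\pi$ be the permutation representation on $\mathbb{C}^3$, and let $A_i = E_{ii}$ be the $i$-th diagonal matrix unit. The linear map $A: e_i \mapsto A_i$ then sends a vector to the diagonal matrix with that vector on the diagonal, and conjugating by a permutation matrix permutes the diagonal accordingly. A minimal sketch, assuming `numpy`:

```python
import numpy as np
from itertools import permutations

N = 3
A_ops = [np.zeros((N, N)) for _ in range(N)]
for i in range(N):
    A_ops[i][i, i] = 1.0                    # A_i = E_ii, the diagonal matrix unit

def A(v):
    """Linear extension of e_i -> A_i; here A(v) = diag(v)."""
    return sum(v[i] * A_ops[i] for i in range(N))

def perm_matrix(p):
    """Permutation matrix with P e_i = e_{p(i)}."""
    P = np.zeros((N, N))
    for i in range(N):
        P[p[i], i] = 1.0
    return P

rng = np.random.default_rng(3)
v = rng.standard_normal(N)
for p in permutations(range(N)):
    U = perm_matrix(p)                      # here U(g) = pi(g) = permutation rep
    # Eq. (15): A(pi(g) v) = U(g) A(v) U(g)^dagger (real, so dagger = transpose)
    assert np.allclose(A(U @ v), U @ A(v) @ U.T)
print("(A_1, ..., A_N) forms a representation operator for S_3")
```

The check succeeds because conjugating a diagonal matrix by a permutation matrix permutes its diagonal entries in exactly the same way as $\pi(g)$ permutes the coefficients of $v$.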

C.1.4 REPRESENTATION OPERATORS AND KERNEL OPERATORS

Now that we have a clear abstract idea of what steerable kernels are and have seen strong analogies to representation operators, we can begin to formulate precise theoretical connections. In this section, we therefore begin by formulating a purely representation-theoretic and more abstract working definition of representation operators, and will then formulate the main theorem of this chapter, Theorem C.7.

We come to the main definition, which is directly motivated by Eq. (15). It differs from (Jeevanjee, 2011) by allowing the input and output representations to differ. We furthermore restrict to finite-dimensional input and output representations due to our specific applications. As explained in Section C.1.3, this new definition also somewhat differs from the one given in Section 2, since we now view a representation operator as one object instead of a collection of several linear operators.

Definition C.4 (Representation Operator). Let $\rho_\mathrm{in}: G \to \mathrm{GL}(V_\mathrm{in})$ and $\rho_\mathrm{out}: G \to \mathrm{GL}(V_\mathrm{out})$ be finite-dimensional $G$-representations. Let $\lambda: G \to \mathrm{GL}(T)$ be a third $G$-representation, not necessarily finite-dimensional. Then a representation operator is an intertwiner $K: T \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$, where the right space is equipped with the Hom-representation as in Definition C.1. We denote the vector space of all these representation operators by
$$\mathrm{Hom}_{G,K}(T, \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})) = \{K: T \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out}) \mid K \text{ is an intertwiner}\}.$$
Note that representation operators are by definition linear, which is a requirement for the standard Wigner-Eckart theorem. We clearly see strong similarities between this definition and the formalization of steerable kernels in Definition C.3. The main difference is that we assume representation operators to be linear. In notation, this is captured by the subscript $K$ that we put in the corresponding Hom-space.
One may think that there is another difference, namely that intertwiners are by definition continuous with respect to the topologies involved. Two things need to be said about this:

1. First, one may wonder what continuity for representation operators actually means. This can be clarified as follows: by assumption, $G$-representations are always on vector spaces with topologies, and thus $T$ has a topology. Furthermore, in Remark C.2 we clarified the topology on $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$. Being continuous then just means, as always, being continuous with respect to the topologies of these two spaces.

2. Second, this apparent difference in the requirement of continuity for steerable kernels and representation operators is actually non-existent. This is explained by the following proposition, which says that steerable kernels are automatically continuous. Note that this is not true for steerable kernels defined on the domain $\mathbb{R}^d$: in that case, continuity is only guaranteed when restricting to orbits.

Proposition C.5. Let $K: X \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ be a steerable kernel. Then $K$ is continuous.

Proof. For brevity, denote $V := \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ and $\rho := \rho_\mathrm{Hom}$. Let $x^* \in X$ be any point and $H := G_{x^*}$ the stabilizer corresponding to the action of $G$ on $X$. Remember the homeomorphism $\varphi: G/H \to X$, $[g] \mapsto gx^*$ from Lemma B.21. Since this is a homeomorphism, the kernel $K$ is continuous if and only if the composition $K \circ \varphi$ is continuous, since then $K = (K \circ \varphi) \circ \varphi^{-1}$ is a composition of continuous functions. Thus, we evaluate $K \circ \varphi$:
$$(K \circ \varphi)([g]) = K(\varphi([g])) = K(gx^*) = \rho(g)(K(x^*)),$$
where in the last step we used the equivariance of $K$. Thus, if we set $v^* := K(x^*) \in V$, then we obtain the simple relation $(K \circ \varphi)([g]) = \rho(g)(v^*)$. This is by definition just the unique map on the quotient, $G/H \to V$, coming from $\rho_{v^*}: G \to V$, $g \mapsto \rho(g)(v^*)$. This last map is continuous by definition of a linear representation.
The universal property of quotients, Proposition F.12, then shows that $K \circ \varphi$ is continuous as well, and so we are done. All of this fits into a commutative diagram: with $q: G \to G/H$, $g \mapsto [g]$ the canonical projection, the map $(\cdot) \cdot x^*: G \to X$ factors as $\varphi \circ q$, and $\rho_{v^*} = K \circ \varphi \circ q = K \circ ((\cdot) \cdot x^*): G \to V$.

Thus, the only difference between steerable kernels and representation operators is indeed the linearity. We now look at the special representation operators that play the main role in this work:

Definition C.6 (Kernel Operator). Let $\rho_\mathrm{in}: G \to \mathrm{GL}(V_\mathrm{in})$ and $\rho_\mathrm{out}: G \to \mathrm{GL}(V_\mathrm{out})$ be finite-dimensional $G$-representations. Let $\lambda: G \to \mathrm{U}(L^2_K(X))$ be the standard unitary representation on the space of square-integrable functions on a homogeneous space $X$, given, as in Section B.1.3, by $[\lambda(g)(\varphi)](x) = \varphi(g^{-1}x)$. A kernel operator is a representation operator $K: L^2_K(X) \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$. We denote the space of these by
$$\mathrm{Hom}_{G,K}(L^2_K(X), \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})) = \{K: L^2_K(X) \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out}) \mid K \text{ is an intertwiner}\}.$$
Notably, kernel operators are $K$-linear in their input.

C.1.5 FORMULATION OF THE CORRESPONDENCE BETWEEN STEERABLE KERNELS AND KERNEL OPERATORS

The following theorem lies at the heart of our investigations and establishes that steerable kernels can be considered as kernel operators, which we defined as special representation operators. More precisely, we give an explicit isomorphism between the space of steerable kernels and the space of kernel operators.

We briefly explain why the theorem is useful. First, using the Wigner-Eckart theorem for kernel operators that we prove in Theorem D.13, one can explicitly describe a basis of the space of kernel operators $\mathrm{Hom}_{G,K}(L^2_K(X), \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out}))$. Then, since we have an isomorphism of vector spaces to the space of steerable kernels, one can "carry over" this basis to a basis of the space of steerable kernels, namely $\mathrm{Hom}_G(X, \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out}))$. This basis then has a convenient explicit form, which we establish in Theorem D.16, and is exactly what we need in order to parameterize an equivariant neural network layer. We now come to a precise formulation of the theorem:

Theorem C.7 (Kernel-Operator Correspondence). Let $\rho_\mathrm{in}: G \to \mathrm{GL}(V_\mathrm{in})$ and $\rho_\mathrm{out}: G \to \mathrm{GL}(V_\mathrm{out})$ be finite-dimensional $G$-representations and $X$ be a homogeneous space of $G$. Then there is an isomorphism
$$\mathrm{Hom}_G(X, \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})) \cong \mathrm{Hom}_{G,K}(L^2_K(X), \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out}))$$
between the space of steerable kernels on the left and the space of kernel operators on the right, given by the mutually inverse maps $\widehat{(\cdot)}$ and $(\cdot)|_X$ defined as follows:

1. For a steerable kernel $K: X \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$, the extension $\widehat{K}: L^2_K(X) \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ is given by $\widehat{K}(f) := \int_X f(x)K(x)\,dx$.

2. For a kernel operator $K: L^2_K(X) \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$, the restriction $K|_X: X \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ is given by $K|_X(x) := \lim_{U \in \mathcal{U}_x} K(\delta_U)$. Hereby, $\mathcal{U}_x$ is the directed set of open neighborhoods of $x$, see Example F.27, and $\delta_U: X \to K$ is the approximate Dirac delta function with $\delta_U(y) = \frac{1}{\mu(U)}$ if $y \in U$ and $\delta_U(y) = 0$ otherwise. The limit is a limit of nets as in Definition F.29.

This theorem requires some explanation. First of all, $\widehat{K}$ is supposed to be a kernel operator, i.e., a map $L^2_K(X) \to \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$. Thus, $\widehat{K}(f)$ should be a linear function $V_\mathrm{in} \to V_\mathrm{out}$, and the formal expression can indeed be considered as such:
$$\widehat{K}(f) = \int_X f(x)K(x)\,dx: \quad v_\mathrm{in} \mapsto \int_X f(x)\,[K(x)](v_\mathrm{in})\,dx \in V_\mathrm{out}. \tag{16}$$
Due to the continuity of $K$ proven in Proposition C.5 and the integrability of $f$, the function $X \to V_\mathrm{out}$, $x \mapsto f(x)[K(x)](v_\mathrm{in})$ is also integrable, meaning the expression in Eq. (16) can be evaluated. This explains the meaning of the map $\widehat{(\cdot)}$ in Theorem C.7.

For the map $(\cdot)|_X$ in the other direction, we want to briefly explain the intuition in a more informal way. For this, we consider Dirac delta functions $\delta_x$ for $x \in X$. Such a "function" $\delta_x: X \to K$ for a point $x \in X$ can be imagined as taking the value infinity at $x$ and zero elsewhere. It is characterized by the property that $\int_X \delta_x(x')f(x')\,dx' = f(x)$ for any function $f \in L^2_K(X)$. We think of $\delta_x$ as being a function in $L^2_K(X)$, even though technically it is not in this space, since $\infty \notin K$. Now, informally, we can think of the limit $K|_X(x) = \lim_{U \in \mathcal{U}_x} K(\delta_U)$ as being given by $K(\delta_x)$, the value that $K$ takes at the Dirac delta function $\delta_x$, since the limit of nets progressively "shrinks down" the open neighborhood $U$ of $x$. Of course, $K(\delta_x)$ is not really well-defined since $\delta_x \notin L^2_K(X)$, but we can pretend that it is for gaining intuition. Now that we have understood the formulation of the theorem, we might wonder: why should such a theorem be true?
A first intuition comes from an analogy with linear algebra: assume $B$ is a basis of a $K$-vector space $V$ and $W$ is any other vector space. Then linear maps $f: V \to W$ are in one-to-one correspondence with (not necessarily linear) functions $f: B \to W$, and this isomorphism is given by restriction and linear extension:
$$\mathrm{Hom}(B, W) \cong \mathrm{Hom}_K(V, W).$$
Thus, we can think of the homogeneous space $X$ as a "continuous basis" of the space of square-integrable functions. Sums are then replaced by integrals, and evaluations at basis elements by evaluations at Dirac delta functions of elements in $X$. For the actual proof of Theorem C.7, informally, one direction seems fairly clear from the properties of the Dirac delta:
$$\widehat{K}|_X(x) = \widehat{K}(\delta_x) = \int_X \delta_x(x')K(x')\,dx' = K(x).$$
But the other direction is less obvious: it seems as if the space of kernel operators were considerably larger than the space of steerable kernels, since kernel operators are defined on a larger space, and so it is hard to believe that the constructions are also inverse in the other order. However, it pays off to ponder a bit more what the Dirac delta construction does. Basically, we "embed" $X$ into $L^2_K(X)$ by means of the Dirac delta functions, i.e., $x \mapsto \delta_x$, and as such view $X$ as a subset of $L^2_K(X)$ (albeit a subset that lies only approximately in that space). Steerable kernels are then "partial" kernel operators in the sense that they are only defined on this subset $X \subseteq L^2_K(X)$. What then needs to be understood is why there is only one unique extension of each steerable kernel $K$ to a kernel operator $\widehat{K}$ on the whole of $L^2_K(X)$: if this is understood, then the space of kernel operators cannot be larger than the space of steerable kernels. And indeed, if there is an extension of $K$ to $\widehat{K}$ on $L^2_K(X)$, it has to be unique: each $f \in L^2_K(X)$ can be approximated by finite linear combinations of scaled indicator functions.
Then, by linearity of the kernel operator $\widehat{K}$, we can evaluate $\widehat{K}(f)$ by knowing $\widehat{K}(\delta_U)$ for scaled indicator functions $\delta_U$ on small measurable sets $U$. And these values approximate $K(x) = \widehat{K}(\delta_x)$ for $x \in U$ arbitrarily well by construction. This determines the behavior of $\widehat{K}$. The details of all of this can be found in the next section.
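On a finite homogeneous space, the extension-restriction round trip of Theorem C.7 can be traced explicitly, because the Dirac deltas are exact rather than approximate: for $X$ discrete with normalized counting measure, $\delta_{\{x\}}$ takes the value $|X|$ at $x$ and $0$ elsewhere. A minimal sketch of the correspondence (our own toy instance, assuming `numpy`):

```python
import numpy as np

# X = {0,1,2,3} with normalized counting measure mu({x}) = 1/4.
X = range(4)
mu = 1.0 / 4.0
rng = np.random.default_rng(4)
K = {x: rng.standard_normal((2, 2)) for x in X}   # an arbitrary kernel X -> 2x2

def Khat(f):
    """Extension: Khat(f) = integral of f(x) K(x) dx over X."""
    return sum(mu * f[x] * K[x] for x in X)

# Restriction via Dirac deltas: delta_U with U = {x} has value 1/mu(U) at x.
for x in X:
    delta_x = np.zeros(4)
    delta_x[x] = 1.0 / mu
    assert np.allclose(Khat(delta_x), K[x])       # round trip recovers K(x)

# Linearity of Khat is what makes the extension unique.
f, g = rng.standard_normal(4), rng.standard_normal(4)
assert np.allclose(Khat(f + g), Khat(f) + Khat(g))
print("extension followed by restriction is the identity on kernels")
```

The limit of nets in the theorem is exactly what replaces the line `delta_x[x] = 1.0 / mu` in the continuous case, where no single function plays the role of $\delta_x$.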

C.2 A PROOF OF THE CORRESPONDENCE BETWEEN STEERABLE KERNELS AND KERNEL OPERATORS

Here, we give a step-by-step proof of Theorem C.7. The details of this investigation will not be needed later, and so a reader who is mainly interested in the applications to steerable CNNs can safely skip reading this section and go on reading Chapter D.

C.2.1 A REDUCTION TO UNITARY IRREDUCIBLE REPRESENTATIONS

In this section, we make the proof more manageable by reducing $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ to an irreducible representation. First, remember that Proposition B.20 shows that there is a scalar product on $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ such that its Hom-representation becomes unitary. Since all norms on finite-dimensional spaces are equivalent, as is well known, this does not change the topology. Then, we can decompose $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$ into an orthogonal direct sum of irreducible unitary representations by Proposition B.38. Let $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out}) \cong \bigoplus_{i=1}^n V_i$ be such a decomposition. We get canonical isomorphisms
$$\mathrm{Hom}_G(X, \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})) \cong \bigoplus_{i=1}^n \mathrm{Hom}_G(X, V_i)$$
and
$$\mathrm{Hom}_{G,K}(L^2_K(X), \mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})) \cong \bigoplus_{i=1}^n \mathrm{Hom}_{G,K}(L^2_K(X), V_i).$$
Thus, we can prove Theorem C.7 by proving it for irreducible unitary representations in place of $\mathrm{Hom}_K(V_\mathrm{in}, V_\mathrm{out})$. Overall, we have reduced our theorem to the following, simpler statement:

Theorem C.8 (Kernel-Operator Correspondence, Restated). Let $\rho: G \to \mathrm{U}(V)$ be an irreducible unitary representation and $X$ a homogeneous space of $G$. Then there is an isomorphism
$$\mathrm{Hom}_G(X, V) \cong \mathrm{Hom}_{G,K}(L^2_K(X), V),$$
which is given as follows: for $K \in \mathrm{Hom}_G(X, V)$ we set $\widehat{K}(f) = \int_X f(x)K(x)\,dx$, and for $K \in \mathrm{Hom}_{G,K}(L^2_K(X), V)$ we set $K|_X(x) = \lim_{U \in \mathcal{U}_x} K(\delta_U)$, with $\delta_U$ being an approximate Dirac delta function as before.

From now on, we assume that $X$ and $\rho: G \to \mathrm{U}(V)$ are fixed as in the formulation of Theorem C.8.

C.2.2 WELL-DEFINEDNESS OF $\widehat{(\cdot)}$

Lemma C.9. The map $\widehat{(\cdot)}: \mathrm{Hom}_G(X, V) \to \mathrm{Hom}_{G,\mathbb{K}}(L^2_{\mathbb{K}}(X), V)$ is well-defined, i.e., for an equivariant function $K: X \to V$, the function $\widehat{K}: L^2_{\mathbb{K}}(X) \to V$ is linear, equivariant and continuous.

Proof. Linearity of $\widehat{K}$ is clear. Equivariance can be proven using the equivariance of $K$ and the left invariance of the Haar measure on the homogeneous space $X$:
$$\widehat{K}(\lambda(g)f) = \int_X (\lambda(g)f)(x) K(x)\, dx = \int_X f(g^{-1} \cdot x) K(x)\, dx = \int_X f(x) K(g \cdot x)\, dx = \int_X f(x)\, [\rho(g)(K(x))]\, dx = \rho(g)\left[\int_X f(x) K(x)\, dx\right] = \rho(g)\big[\widehat{K}(f)\big].$$
The action of $\rho(g)$ could be pulled out of the integral since $\rho(g)$ is linear and continuous, and since integrals can be approximated by finite sums. Regarding continuity: by Proposition F.18, we only need to show continuity at $0$. Thus, let $(f_k)_k$ be a sequence of functions $f_k \in L^2_{\mathbb{K}}(X)$ with $\lim_{k\to\infty} \|f_k\|_{L^2} = 0$. Then we obtain
$$\big\|\widehat{K}(f_k)\big\|_V = \left\| \int_X f_k(x) K(x)\, dx \right\|_V \le \int_X |f_k(x)| \cdot \|K(x)\|_V\, dx \le \max_{x'} \|K(x')\|_V \cdot \int_X |f_k(x)|\, dx,$$
where the continuity of $K$ proven in Proposition C.5 was used. For the last factor, the Cauchy-Schwarz inequality (Proposition F.34) yields
$$\int_X |f_k(x)|\, dx = \int_X |f_k(x)| \cdot 1\, dx = \big|\langle |f_k| \mid 1\rangle\big| \le \|f_k\|_{L^2} \cdot \|1\|_{L^2} = \|f_k\|_{L^2}.$$
So, overall, if $\lim_{k\to\infty} \|f_k\|_{L^2} = 0$, then $\lim_{k\to\infty} \|\widehat{K}(f_k)\|_V = 0$ as well, which proves continuity.

C.2.3 WELL-DEFINEDNESS OF $(\cdot)|_X$

While it is clear that the limit $\lim_{U \in \mathcal{U}_x} \widehat{K}(\delta_U)$ from Theorem C.8 is unique if it exists (Conway, 2014), it is somewhat unclear why it exists in the first place. For this, we need to better understand the properties of the (approximated) Dirac deltas. The most important one is the following, which we hinted at already in the intuitions we gave before this section: basically, Dirac deltas help with evaluating continuous functions at specific points.

Lemma C.10. For each $x \in X$ and continuous $Y: X \to \mathbb{K}$ we have $\lim_{U \in \mathcal{U}_x} \langle \delta_U \mid Y\rangle = Y(x)$.

Proof. We have
$$\big| \langle \delta_U \mid Y\rangle - Y(x) \big| = \left| \int_X \delta_U(x') Y(x')\, dx' - \mu(U) \cdot \tfrac{1}{\mu(U)} Y(x) \right| = \left| \int_U \tfrac{1}{\mu(U)} Y(x')\, dx' - \int_U \tfrac{1}{\mu(U)} Y(x)\, dx' \right| = \left| \int_U \tfrac{1}{\mu(U)} \big(Y(x') - Y(x)\big)\, dx' \right| \le \int_U \tfrac{1}{\mu(U)} \big|Y(x') - Y(x)\big|\, dx'.$$
Let $\epsilon > 0$. Since $Y$ is continuous at $x$, there is $U_\epsilon \in \mathcal{U}_x$ such that $Y(x') \in B_\epsilon(Y(x))$ for all $x' \in U_\epsilon$ or, equivalently, $|Y(x') - Y(x)| < \epsilon$. Thus, for all $U \subseteq U_\epsilon$, i.e., all $U$ beyond $U_\epsilon$ in the net $\mathcal{U}_x$, we obtain
$$\big| \langle \delta_U \mid Y\rangle - Y(x) \big| \le \int_U \tfrac{1}{\mu(U)} \big|Y(x') - Y(x)\big|\, dx' \le \int_U \tfrac{1}{\mu(U)}\, \epsilon\, dx' = \epsilon \cdot \mu(U) \cdot \tfrac{1}{\mu(U)} = \epsilon,$$
and consequently $\lim_{U \in \mathcal{U}_x} \langle \delta_U \mid Y\rangle = Y(x)$.

Before we can show the well-definedness of $\widehat{K}|_X$, we first want to get a better description of $\widehat{K}$. For this, recall from the Peter-Weyl theorem that $L^2_{\mathbb{K}}(X) = \overline{\bigoplus_{l \in \hat{G}} \bigoplus_{i=1}^{m_l} V_{li}}$. With this at our disposal, we can formulate the following lemma on the form of intertwiners on $L^2_{\mathbb{K}}(X)$:

Lemma C.11. Let $\widehat{K}: L^2_{\mathbb{K}}(X) \to V$ be an intertwiner. Let $l \in \hat{G}$ be the unique index such that $V \cong V_{li}$ for all $i = 1, \ldots, m_l$. Let $Y^n_{li}$, $n = 1, \ldots, d_l$, be an orthonormal basis of $V_{li}$, where $d_l = \dim(V_l)$. Then
$$\widehat{K}(f) = \sum_{i=1}^{m_l} \sum_{n=1}^{d_l} \langle Y^n_{li} \mid f\rangle\, \widehat{K}(Y^n_{li}) \quad \text{for all } f \in L^2_{\mathbb{K}}(X).$$
Proof. According to the discussion after Definition F.40, we can write $f \in L^2_{\mathbb{K}}(X)$ as $f = \sum_{l' \in \hat{G}} \sum_{i=1}^{m_{l'}} \sum_{n=1}^{d_{l'}} \langle Y^n_{l'i} \mid f\rangle\, Y^n_{l'i}$. Note that $\widehat{K}|_{V_{l'i}}: V_{l'i} \to V$ is an intertwiner as well, and so by Schur's Lemma B.29 it is necessarily zero unless $l' = l$ is the unique index such that $V_{li} \cong V$.
Due to its continuity and linearity, $\widehat{K}$ commutes with infinite sums and we obtain
$$\widehat{K}(f) = \sum_{l' \in \hat{G}} \sum_{i=1}^{m_{l'}} \sum_{n=1}^{d_{l'}} \langle Y^n_{l'i} \mid f\rangle\, \widehat{K}(Y^n_{l'i}) = \sum_{l' \in \hat{G}} \sum_{i=1}^{m_{l'}} \sum_{n=1}^{d_{l'}} \langle Y^n_{l'i} \mid f\rangle\, \widehat{K}|_{V_{l'i}}(Y^n_{l'i}) = \sum_{i=1}^{m_l} \sum_{n=1}^{d_l} \langle Y^n_{li} \mid f\rangle\, \widehat{K}(Y^n_{li}).$$

Corollary C.12. We have $\widehat{K}|_X(x) = \sum_{i=1}^{m_l} \sum_{n=1}^{d_l} \overline{Y^n_{li}(x)}\, \widehat{K}(Y^n_{li})$. In particular, the defining limit exists.

Proof. By the proof of the Peter-Weyl theorem, the $Y^n_{li}$ lie in the finite-dimensional space $E_l$ spanned by matrix coefficients of the irreducible representation $\rho_l: G \to U(V_l)$, and these matrix coefficients are continuous by Remark B.25; hence the $Y^n_{li}$, as finite linear combinations of them, are also continuous functions. Thus, from Lemmas C.10 and C.11 together we obtain:
$$\widehat{K}|_X(x) = \lim_{U \in \mathcal{U}_x} \widehat{K}(\delta_U) = \lim_{U \in \mathcal{U}_x} \sum_{i=1}^{m_l} \sum_{n=1}^{d_l} \langle Y^n_{li} \mid \delta_U\rangle\, \widehat{K}(Y^n_{li}) = \sum_{i=1}^{m_l} \sum_{n=1}^{d_l} \lim_{U \in \mathcal{U}_x} \langle Y^n_{li} \mid \delta_U\rangle\, \widehat{K}(Y^n_{li}) = \sum_{i=1}^{m_l} \sum_{n=1}^{d_l} \overline{Y^n_{li}(x)}\, \widehat{K}(Y^n_{li}).$$
The complex conjugation came into play since the order in the scalar product is swapped compared to Lemma C.10. Thus, since we now know that $\widehat{K}|_X$ makes sense as a function, we can finally prove the well-definedness of $\widehat{K} \mapsto \widehat{K}|_X$.

Lemma C.13. The map $(\cdot)|_X: \mathrm{Hom}_{G,\mathbb{K}}(L^2_{\mathbb{K}}(X), V) \to \mathrm{Hom}_G(X, V)$ is well-defined, that is: for a linear, equivariant and continuous function $\widehat{K}: L^2_{\mathbb{K}}(X) \to V$, the restriction $\widehat{K}|_X: X \to V$ is equivariant.

Proof. We have
$$\widehat{K}|_X(g \cdot x) = \lim_{U \in \mathcal{U}_{gx}} \widehat{K}(\delta_U) = \lim_{U \in \mathcal{U}_x} \widehat{K}(\delta_{gU}) = \lim_{U \in \mathcal{U}_x} \widehat{K}(\lambda(g)\delta_U) = \lim_{U \in \mathcal{U}_x} \rho(g)\big[\widehat{K}(\delta_U)\big] = \rho(g)\Big[\lim_{U \in \mathcal{U}_x} \widehat{K}(\delta_U)\Big] = \rho(g)\big[\widehat{K}|_X(x)\big],$$
where the steps are justified as follows: the first step is just the definition of $\widehat{K}|_X$. The second step uses that the open neighborhoods of $gx$ are precisely the $g$-translates of the open neighborhoods of $x$, since $g: X \to X$ is a homeomorphism. The third step is easy to check. The fourth step uses the equivariance of $\widehat{K}$. The fifth step uses the continuity of $\rho(g)$, which follows since $\rho(g)$ is a unitary transformation. The last step is again the definition of $\widehat{K}|_X$.
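Corollary C.12 can be illustrated numerically in the simplest setting. The sketch below (not from the paper; it assumes $G = \mathrm{SO}(2)$, $X = S^1$, harmonic basis $Y_n(x) = e^{inx}$, and the illustrative kernel $K(x) = e^{ix}$) expands the kernel operator over the harmonic basis and checks both that the reconstruction $\widehat{K}|_X(x) = \sum_n \overline{Y_n(x)}\, \widehat{K}(Y_n)$ recovers $K$, and that, as predicted by Schur's lemma in Lemma C.11, only a single mode contributes:

```python
import numpy as np

xs = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
K_vals = np.exp(1j * xs)  # the steerable kernel K(x) = exp(i x)

def K_op(f_vals):
    # K_hat(f) = ∫ f(x) K(x) dμ(x), normalized measure on S^1
    return np.mean(f_vals * K_vals)

# Evaluate the operator on harmonic basis functions Y_n(x) = exp(i n x).
coeffs = {n: K_op(np.exp(1j * n * xs)) for n in range(-3, 4)}

# Reconstruction K|_X(x) = sum_n conj(Y_n(x)) * K_hat(Y_n).
recon = sum(np.conj(np.exp(1j * n * xs)) * c for n, c in coeffs.items())
assert np.allclose(recon, K_vals)

# Schur's lemma: only the single compatible mode survives.
assert all(abs(c) < 1e-10 for n, c in coeffs.items() if n != -1)
```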

C.2.4 $\widehat{(\cdot)}$ AND $(\cdot)|_X$ ARE INVERSE TO EACH OTHER

We can now finish the proof of Theorem C.8 and consequently of Theorem C.7:

Proof of Theorem C.8. After all the preparation, we only need to show that the maps $\widehat{(\cdot)}$ and $(\cdot)|_X$ are inverse to each other. For $\widehat{K}|_X = K$, i.e., the injectivity of the map $K \mapsto \widehat{K}$ and the surjectivity of the map $\widehat{K} \mapsto \widehat{K}|_X$, we compute:
$$\widehat{K}|_X(x) = \lim_{U \in \mathcal{U}_x} \widehat{K}(\delta_U) = \lim_{U \in \mathcal{U}_x} \int_X \delta_U(x') K(x')\, dx' = K(x).$$
The last step follows from Lemma C.10 by identifying $V = V_l$ with $\mathbb{K}^{d_l}$ and viewing $K$ as consisting of continuous component functions $K^n: X \to \mathbb{K}$, $n \in \{1, \ldots, d_l\}$. The continuity of $K$ was shown in Proposition C.5. For showing $\widehat{\widehat{K}|_X} = \widehat{K}$, we do a computation using the description of $\widehat{K}$ from Lemma C.11 and the description of $\widehat{K}|_X$ from Corollary C.12:
$$\widehat{\widehat{K}|_X}(f) = \int_X f(x)\, \widehat{K}|_X(x)\, dx = \int_X f(x) \sum_{i=1}^{m_l} \sum_{n=1}^{d_l} \overline{Y^n_{li}(x)}\, \widehat{K}(Y^n_{li})\, dx = \sum_{i=1}^{m_l} \sum_{n=1}^{d_l} \left[\int_X f(x)\, \overline{Y^n_{li}(x)}\, dx\right] \widehat{K}(Y^n_{li}) = \sum_{i=1}^{m_l} \sum_{n=1}^{d_l} \langle Y^n_{li} \mid f\rangle\, \widehat{K}(Y^n_{li}) = \widehat{K}(f).$$
This finally finishes the proof.

D A WIGNER-ECKART THEOREM FOR STEERABLE KERNELS OF GENERAL COMPACT GROUPS

In Chapter C we have seen the most important theoretical insight of this work: steerable kernels on a homogeneous space $X$ correspond one-to-one to kernel operators (certain representation operators) on the space of square-integrable functions $L^2_{\mathbb{K}}(X)$. In this chapter, we develop the most important consequence of this correspondence: a Wigner-Eckart theorem for steerable kernels and, consequently, a description of a basis for steerable kernels. This works for both fields $\mathbb{R}$ and $\mathbb{C}$, for an arbitrary compact group $G$, an arbitrary homogeneous space $X$ and arbitrary finite-dimensional input and output fields. Additionally, it covers the general theory of equivariant CNNs on homogeneous spaces developed in Cohen et al. (2019b). In Section D.1 we work towards formulating the main theorems. Since these involve tensor products, we start by defining and studying tensor products of pre-Hilbert spaces and (unitary) representations. Afterward, we define the Clebsch-Gordan coefficients, which relate a tensor product of irreducible representations to the irreducible subrepresentations of this tensor product. This leads to a formulation of the original Wigner-Eckart theorem similar to how it appears in quantum mechanics, including a proof. The original Wigner-Eckart theorem is a statement about representation operators on irreducible representations, whereas we consider kernel operators on $L^2_{\mathbb{K}}(X)$, which is not irreducible. Furthermore, different from the original theorem, we also consider representations over the real numbers, which leads to reduced matrix elements being replaced by endomorphisms. We therefore formulate a generalization of the original theorem. Then, using the correspondence between kernel operators and steerable kernels from Theorem C.7, we transform this into a Wigner-Eckart theorem for steerable kernels and, ultimately, a statement about a basis of the space of steerable kernels.
We conclude with some remarks about how to use the basis kernels in practice. Afterward, in Section D.2, we give the remaining proof of the Wigner-Eckart theorem for kernel operators, which we omitted in the preceding section. First, we reduce the statement to the dense subspace of $L^2_{\mathbb{K}}(X)$ which is a direct sum of all irreducible subrepresentations. We then describe a correspondence between representation operators and intertwiners on a certain tensor product, the so-called hom-tensor adjunction. Finally, we finish with the full proof of the Wigner-Eckart theorem. As always, let $\mathbb{K}$ be either of the two fields $\mathbb{R}$ and $\mathbb{C}$, and let $G$ be a compact topological group. $X$ is any homogeneous space of $G$.

D.1 A WIGNER-ECKART THEOREM FOR STEERABLE KERNELS AND THEIR KERNEL BASES

D.1.1 TENSOR PRODUCTS OF PRE-HILBERT SPACES AND UNITARY REPRESENTATIONS

In order to state the Wigner-Eckart theorem, we need the notion of representations on tensor products. This is defined similarly to Hom-representations, see Definition C.1. For this, we first need to discuss the notion of a tensor product of vector spaces:

Definition D.1 (Tensor Product). Let $V$ and $V'$ be two vector spaces over $\mathbb{K}$. Then $V \otimes V'$, the tensor product of $V$ and $V'$, is a vector space over $\mathbb{K}$ with the following properties:

1. There is a bilinear function $\otimes: V \times V' \to V \otimes V'$, $(v, v') \mapsto v \otimes v'$, and $V \otimes V'$ is generated by elements of the form $v \otimes v'$.
2. It has the following universal property: for any bilinear function $\beta: V \times V' \to P$ into a vector space $P$, there is a unique linear function $\overline{\beta}: V \otimes V' \to P$ given on elements of the form $v \otimes v'$ by $\overline{\beta}(v \otimes v') = \beta(v, v')$. In other words, $\overline{\beta} \circ \otimes = \beta$.
3. If $V$ and $V'$ are finite-dimensional with bases $\{v_1, \ldots, v_n\} \subseteq V$ and $\{v'_1, \ldots, v'_m\} \subseteq V'$, then $\{v_i \otimes v'_j\}_{i,j} \subseteq V \otimes V'$ is a basis of $V \otimes V'$. In particular, the dimension of $V \otimes V'$ is $n \cdot m$.

Property 3 follows from 1 and 2 and would therefore not strictly be needed in the definition. The explicit construction of tensor products does not matter for our purposes, since the properties above characterize them up to isomorphism. The second property is of particular importance, since it tells us how to define linear functions on $V \otimes V'$: if we have a candidate for such a function $\varphi: V \otimes V' \to P$ (of which we do not yet know whether its "assignment rule" is well-defined), then we just need to test whether the function $\phi: V \times V' \to P$ given by $\phi(v, v') := \varphi(v \otimes v')$ is bilinear. If it is, then $\varphi$ is a well-defined linear function. We will soon use this in the following context: assume $f: V \to W$ and $g: V' \to W'$ are linear functions. Then we would like to define a function $f \otimes g: V \otimes V' \to W \otimes W'$ by $(f \otimes g)(v \otimes v') := f(v) \otimes g(v')$. For this to work, we need to test whether the assignment $(v, v') \mapsto f(v) \otimes g(v')$ is a bilinear function $V \times V' \to W \otimes W'$. Clearly it is, and so $f \otimes g$ is a well-defined linear function! We use this in Definition D.3 in order to define the tensor product of representations. Since we actually deal with Hilbert spaces most of the time, we would like to build tensor products of Hilbert spaces. However, their definition is not completely straightforward, since one cannot just take the tensor product of the underlying vector spaces, but additionally needs to build the completion of the resulting space (Kadison & Ringrose, 1997). Since this complicates the considerations related to a correspondence we later formulate in Proposition D.23, we go a slightly different route: instead of the tensor product of Hilbert spaces, we describe the tensor product of pre-Hilbert spaces, which does not require a completion step. Recall from Definition F.3 that a pre-Hilbert space is basically a Hilbert space that is not necessarily complete.

Definition D.2 (Tensor Product of Pre-Hilbert Spaces). Let $V, V'$ be two pre-Hilbert spaces with scalar products $\langle\cdot \mid \cdot\rangle$ and $\langle\cdot \mid \cdot\rangle'$. Then the tensor product of vector spaces $V \otimes V'$ can be made into a pre-Hilbert space using the scalar product given on generators by
$$\langle v \otimes v' \mid w \otimes w'\rangle_\otimes := \langle v \mid w\rangle \cdot \langle v' \mid w'\rangle'.$$
This is extended anti-linearly in the first (i.e., "bra") and linearly in the second (i.e., "ket") component. One can show that this makes $V \otimes V'$ a pre-Hilbert space. For simplicity, we will from now on not notationally distinguish the different scalar products involved. With this preparation, we can come to the notion of tensor product representations:

Definition D.3 (Tensor Product Representation). Let $\rho: G \to GL(V)$ and $\rho': G \to GL(V')$ be two linear representations, where $V$ and $V'$ are pre-Hilbert spaces.
Then, on the tensor product $V \otimes V'$ of pre-Hilbert spaces, we can define the tensor product representation $\rho \otimes \rho'$ by $\rho \otimes \rho': G \to GL(V \otimes V')$, $g \mapsto \rho(g) \otimes \rho'(g)$, where $\rho(g) \otimes \rho'(g): V \otimes V' \to V \otimes V'$ is given on generators by $(\rho(g) \otimes \rho'(g))(v \otimes v') := \rho(g)(v) \otimes \rho'(g)(v')$.

Lemma D.4. The map $\rho \otimes \rho': G \to GL(V \otimes V')$ defined above is a linear representation.

Proof. Clearly, each $(\rho \otimes \rho')(g)$ is linear and we have $(\rho \otimes \rho')(gg') = (\rho \otimes \rho')(g) \circ (\rho \otimes \rho')(g')$. Thus, to show that it is a linear representation, we need to show that it is continuous. Assume we already knew the continuity of all maps $(\rho \otimes \rho')_{v \otimes v'}: G \to V \otimes V'$, $g \mapsto [(\rho \otimes \rho')(g)](v \otimes v')$. Then, for linear combinations $\xi = \sum_{i=1}^n \lambda_i (v_i \otimes v'_i)$, we obtain, using the linearity of $(\rho \otimes \rho')(g)$:
$$(\rho \otimes \rho')_\xi(g) = [(\rho \otimes \rho')(g)](\xi) = [(\rho \otimes \rho')(g)]\Big(\sum_{i=1}^n \lambda_i (v_i \otimes v'_i)\Big) = \sum_{i=1}^n \lambda_i\, [(\rho \otimes \rho')(g)](v_i \otimes v'_i) = \sum_{i=1}^n \lambda_i\, (\rho \otimes \rho')_{v_i \otimes v'_i}(g).$$
Now, since scalar multiplication and addition in topological vector spaces are continuous, and since pre-Hilbert spaces are special topological vector spaces, the continuity of $(\rho \otimes \rho')_\xi$ follows from that of all $(\rho \otimes \rho')_{v \otimes v'}$. What is left is proving the continuity of functions of the form $(\rho \otimes \rho')_{v \otimes v'}$. For notational simplicity, write $f = \rho_v: G \to V$ and $f' = \rho'_{v'}: G \to V'$, which are both continuous since $\rho$ and $\rho'$ are linear representations. We want to show that $f \otimes f': G \to V \otimes V'$, $g \mapsto f(g) \otimes f'(g)$, is continuous as well. We can test continuity at each point $g_0 \in G$ separately by Definition F.6. For each $g \in G$ we then obtain, with $\mathrm{Re}$ being the real part of a complex number:
$$\begin{aligned}
\big\|(f \otimes f')(g) - (f \otimes f')(g_0)\big\|^2 &= \big\|[f(g) \otimes f'(g) - f(g) \otimes f'(g_0)] + [f(g) \otimes f'(g_0) - f(g_0) \otimes f'(g_0)]\big\|^2 \\
&= \big\|f(g) \otimes [f'(g) - f'(g_0)] + [f(g) - f(g_0)] \otimes f'(g_0)\big\|^2 \\
&= \big\|f(g) \otimes [f'(g) - f'(g_0)]\big\|^2 + \big\|[f(g) - f(g_0)] \otimes f'(g_0)\big\|^2 + 2\,\mathrm{Re}\,\big\langle f(g) \otimes [f'(g) - f'(g_0)] \,\big|\, [f(g) - f(g_0)] \otimes f'(g_0)\big\rangle \\
&= \|f(g)\|^2 \cdot \|f'(g) - f'(g_0)\|^2 + \|f(g) - f(g_0)\|^2 \cdot \|f'(g_0)\|^2 + 2\,\mathrm{Re}\,\big(\langle f(g) \mid f(g) - f(g_0)\rangle \cdot \langle f'(g) - f'(g_0) \mid f'(g_0)\rangle\big).
\end{aligned}$$
All in all, we see the following: if $g$ is sufficiently close to $g_0$, then, due to the continuity of $f$, $f'$, the scalar product, multiplication in $\mathbb{K}$ and the real part, $\|(f \otimes f')(g) - (f \otimes f')(g_0)\|^2$ gets arbitrarily close to $0$. This shows the continuity of $f \otimes f'$ and we are done.

Lemma D.5. Let $\rho: G \to U(V)$ and $\rho': G \to U(V')$ be unitary representations on pre-Hilbert spaces. Then $\rho \otimes \rho': G \to U(V \otimes V')$ is a well-defined unitary representation as well.

Proof. According to Lemma D.4, we only need to check that all $\rho(g) \otimes \rho'(g)$ are unitary transformations. This follows immediately from the unitarity of $\rho(g)$ and $\rho'(g)$.
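In coordinates, the tensor product of linear maps discussed above is just the Kronecker product of their matrices. The following sketch (an illustration with arbitrary dimensions, not from the paper) verifies the defining identity $(f \otimes g)(v \otimes v') = f(v) \otimes g(v')$ numerically:

```python
import numpy as np

# In the tensor product basis {v_i ⊗ v'_j}, the matrix of f ⊗ g is the
# Kronecker product of the matrices of f and g.
rng = np.random.default_rng(0)
f = rng.standard_normal((3, 2))   # f : V  -> W,  dim V  = 2, dim W  = 3
g = rng.standard_normal((4, 3))   # g : V' -> W', dim V' = 3, dim W' = 4
v = rng.standard_normal(2)
vp = rng.standard_normal(3)

lhs = np.kron(f, g) @ np.kron(v, vp)  # (f ⊗ g)(v ⊗ v')
rhs = np.kron(f @ v, g @ vp)          # f(v) ⊗ g(v')
assert np.allclose(lhs, rhs)
assert np.kron(f, g).shape == (12, 6)  # dim(W ⊗ W') x dim(V ⊗ V') = n·m
```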

D.1.2 THE CLEBSCH-GORDAN COEFFICIENTS AND THE ORIGINAL WIGNER-ECKART THEOREM

In this section, we describe the Clebsch-Gordan coefficients and the original Wigner-Eckart theorem. Except for the proof, we roughly follow Jeevanjee (2011). For the proof, we follow the more general treatment in Agrawala (1980). For our aims, let $\rho_j: G \to U(V_j)$ and $\rho_l: G \to U(V_l)$ be representatives of isomorphism classes of irreducible unitary representations. Then consider their tensor product representation $\rho_j \otimes \rho_l: G \to U(V_j \otimes V_l)$, which is again a unitary representation according to Lemma D.5. If $V_j$ and $V_l$ are of dimension $d_j$ and $d_l$, respectively, then $V_j \otimes V_l$ is of dimension $d_j \cdot d_l$. Since it is a finite-dimensional unitary representation, it is itself an orthogonal direct sum of finitely many irreducible unitary representations by Proposition B.38:
$$V_j \otimes V_l \cong \bigoplus_{J \in \hat{G}} \bigoplus_{s=1}^{[J(jl)]} V_J.$$
Here, $\hat{G}$ is, as before, the set of isomorphism classes of irreducible unitary representations, and $[J(jl)]$ is the number of times that $\rho_J: G \to U(V_J)$ appears in the direct sum decomposition of $V_j \otimes V_l$. Note that for most $J$ we have $[J(jl)] = 0$, and for some $J$ we may have $[J(jl)] > 1$; see Section E.2, where it turns out that $\rho_0$ is contained twice in $\rho_m \otimes \rho_m$. Now, choose, once and for all, orthonormal bases of all involved irreps, which exist according to Proposition F.41:
$$\{Y^m_j \mid m = 1, \ldots, d_j\} \subseteq V_j, \quad \{Y^n_l \mid n = 1, \ldots, d_l\} \subseteq V_l, \quad \{Y^M_J \mid M = 1, \ldots, d_J\} \subseteq V_J.$$
This notation is supposed to be reminiscent of spherical harmonics, since those form bases of the irreducible representations of the group SO(3). But as mentioned in the footnote, we do not consider these basis elements to be functions here. Furthermore, let $\iota_s: V_J \to V_j \otimes V_l$ be the linear, equivariant and isometric (i.e., scalar product preserving) embeddings that correspond to the direct sum decomposition of $V_j \otimes V_l$ into irreps, where $s$ ranges over $\{1, \ldots, [J(jl)]\}$.
With this in mind, we can define the Clebsch-Gordan coefficients:

Definition D.6 (Clebsch-Gordan Coefficients). The Clebsch-Gordan coefficients are given by
$$\langle s, JM \mid jm; ln\rangle := \big\langle \iota_s(Y^M_J) \,\big|\, Y^m_j \otimes Y^n_l\big\rangle.$$
Note that in the literature, one usually only considers Clebsch-Gordan coefficients of the specific groups SO(3), SU(2), SU(3) or similar groups appearing in physics. Also note that in the physics context, there is only one linear, equivariant, isometric embedding $\iota_s$, which follows directly from Schur's Lemma D.8. It is therefore sensible that the embedding is usually not part of the notation of these coefficients. In our case, however, when considering real representations, there can be several such embeddings $\iota_s$. This happens if the endomorphism space of $V_J$ is nontrivial. An example is given by the two-dimensional irreducible representations of SO(2) over the real numbers, which we discuss in Section E.2. Since we do not want to depart too much from the notation usually employed in physics, we also omit the embedding from the notation. The index $s$, however, needs to be present in order to index the possibly different appearances of $V_J$ in $V_j \otimes V_l$. With this preparation, we can explain the Wigner-Eckart theorem the way it is usually considered in physics, as a prelude for the generalization that we consider in the next section. In this (and only this!) section, we assume that our field is $\mathbb{C}$, since this is the case considered in physics. The Wigner-Eckart theorem aims to obtain a description of all possible representation operators $K: V_j \to \mathrm{Hom}_{\mathbb{C}}(V_l, V_J)$. This is, for example, useful for describing state transitions of the electron in hydrogen atoms. To motivate the generalization in the next section, we briefly explain the derivation: we can consider the equivalent function $K: V_j \otimes V_l \to V_J$ on the tensor product, given by $K(v_j \otimes v_l) := [K(v_j)](v_l)$ and denoted by the same letter, as the meaning is clear from the argument.
As one can compute, and as we will see in more generality in Proposition D.23, $K: V_j \otimes V_l \to V_J$ is an intertwiner, where on the left we consider the tensor product representation. We assume, as is the case for $G = SO(3)$ or $G = SU(2)$ in the usual physics applications, that $V_J$ appears exactly once as a direct summand of $V_j \otimes V_l$. Then, since by Schur's Lemma B.29 there cannot be nontrivial equivariant linear maps between nonisomorphic irreps, the restriction of $K$ to each direct summand of $V_j \otimes V_l$ vanishes, except for the one isomorphic to $V_J$. More precisely, assume that $V_j \otimes V_l \cong V_J \oplus \bigoplus_{l'} V_{l'}$ is a decomposition of $V_j \otimes V_l$ into copies of irreducible representations, where each $V_{l'}$ is nonisomorphic to $V_J$. Then the information contained in $K$ is equal to the information contained in the restriction $K|_{V_J}: V_J \to V_J$. Since this is an intertwiner from a representation to itself, it deserves a special name. We state the following definition for arbitrary $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$, since it will be of crucial importance in our generalization of the Wigner-Eckart theorem:

Definition D.7 (Endomorphism). Let $\rho: G \to GL(V)$ be a linear representation. An intertwiner from $V$ to $V$ is called an endomorphism. The vector space of endomorphisms is written $\mathrm{End}_{G,\mathbb{K}}(V) := \mathrm{Hom}_{G,\mathbb{K}}(V, V)$.

A version of Schur's lemma gives a simple description of the endomorphisms of irreducible representations in the case that the underlying field is the complex numbers $\mathbb{C}$. It makes use of the fact that the complex numbers are algebraically closed:

Lemma D.8 (Schur's Lemma). Let $\rho: G \to GL(V)$ be an irreducible representation. If the underlying field is the complex numbers $\mathbb{C}$, then the set of endomorphisms, i.e., intertwiners from $V$ to $V$, consists only of the complex multiples of the identity: $\mathrm{End}_{G,\mathbb{C}}(V) = \{c \cdot \mathrm{id}_V \mid c \in \mathbb{C}\} \cong \mathbb{C}$.

Proof. See Jeevanjee (2011).

This means that $K|_{V_J} = c \cdot \mathrm{id}_{V_J}$ for some complex number $c \in \mathbb{C}$. Now, if we let $p: V_j \otimes V_l \to V_J$ be the projection corresponding to the direct sum decomposition of $V_j \otimes V_l$, then we obtain $K = K|_{V_J} \circ p = (c \cdot \mathrm{id}_{V_J}) \circ p = c \cdot p$. That is, we have just found out that a single complex number $c$ completely characterizes the map on the tensor product and consequently the representation operator $K$! This is basically already the Wigner-Eckart theorem. However, it is useful to find a formulation that describes $K$ with respect to bases of the different irreducible representations. For this, we define matrix elements of representation operators. Before we come to the definition, we introduce some notation: if $f: V \to V'$ is a linear continuous map between Hilbert spaces, we set $\langle y|f|x\rangle := \langle y \mid f(x)\rangle$ for each $x \in V$ and $y \in V'$. The symmetry of this notation is supposed to remind one of the fact that $f$ has an adjoint, see Definition F.42, and can thus be applied to $y$ just as well as to $x$; we will, however, not make use of this fact.

Definition D.9 (Matrix Element). Let $T$, $V_l$ and $V_J$ be unitary representations with orthonormal bases $\{Y^m_j\} \subseteq T$ (with $j$ possibly also varying), $\{Y^n_l\} \subseteq V_l$ and $\{Y^M_J\} \subseteq V_J$, respectively. Let $K: T \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$ be a representation operator. Then its matrix elements are given by the scalars
$$\big\langle JM \,\big|\, K^m_j \,\big|\, ln\big\rangle := \big\langle Y^M_J \,\big|\, K(Y^m_j) \,\big|\, Y^n_l\big\rangle.$$
In the same way, if $f: V_l \to V_J$ is any linear (not necessarily equivariant) map, then its matrix elements are given by the scalars $\langle JM|f|ln\rangle := \langle Y^M_J | f | Y^n_l\rangle$.

Remark D.10. We briefly explain this term. Usually, in linear algebra, one deals with linear functions $f: V \to V'$ between vector spaces carrying bases $\{v_j\} \subseteq V$ and $\{v'_i\} \subseteq V'$. For each basis element $v_j \in V$, one can then find coefficients $A_{ij} \in \mathbb{K}$ such that $f(v_j) = \sum_i A_{ij} v'_i$. The $A_{ij}$ are called the matrix elements of $f$ and characterize $f$ completely. Now, if the bases are orthonormal bases as in Definition F.40, then the coefficients are given by $A_{ij} = \langle v'_i \mid f(v_j)\rangle = \langle v'_i | f | v_j\rangle$.
In a similar way, we can understand the matrix elements of a representation operator, only that the linear function itself depends on a chosen basis vector of $V_j$. As for linear functions, the matrix elements of a representation operator characterize it completely. One last remark: since in this section $V_J$ appears only once as a direct summand of $V_j \otimes V_l$, we omit the additional "quantum number" $s$ in the notation for the Clebsch-Gordan coefficients. With this preparation, we can formulate and prove the original version of the Wigner-Eckart theorem. Remember that there is a unique complex number $c$ such that $K$, viewed on the tensor product, is given by $K = c \cdot p$ for the projection $p: V_j \otimes V_l \to V_J$. We now denote this number by $\langle J \| K \| l\rangle := c$.

Theorem D.11 (Wigner-Eckart Theorem). The matrix elements of the representation operator $K: V_j \to \mathrm{Hom}_{\mathbb{C}}(V_l, V_J)$ are given by
$$\big\langle JM \,\big|\, K^m_j \,\big|\, ln\big\rangle = \langle J \| K \| l\rangle \cdot \langle JM \mid jm; ln\rangle,$$
with the $\langle JM \mid jm; ln\rangle$ being the Clebsch-Gordan coefficients (which are independent of the representation operator $K$).

Proof. Let $\iota: V_J \to V_j \otimes V_l$ be the embedding corresponding to the direct sum decomposition of $V_j \otimes V_l$. It is an adjoint of the projection $p: V_j \otimes V_l \to V_J$ according to the proof of Proposition F.46. By what we argued above, there exists some $c \in \mathbb{C}$ such that:
$$\big\langle JM \,\big|\, K^m_j \,\big|\, ln\big\rangle = \big\langle Y^M_J \,\big|\, [K(Y^m_j)](Y^n_l)\big\rangle = \big\langle Y^M_J \,\big|\, K(Y^m_j \otimes Y^n_l)\big\rangle = \big\langle Y^M_J \,\big|\, c \cdot p(Y^m_j \otimes Y^n_l)\big\rangle = c \cdot \big\langle Y^M_J \,\big|\, p(Y^m_j \otimes Y^n_l)\big\rangle = c \cdot \big\langle \iota(Y^M_J) \,\big|\, Y^m_j \otimes Y^n_l\big\rangle = \langle J \| K \| l\rangle \cdot \langle JM \mid jm; ln\rangle.$$
As a short explanation: in the fifth step it was used that $\iota$ and $p$ are adjoint to each other, so that we pass from the scalar product in $V_J$ to the one in $V_j \otimes V_l$. In the last step, the definition of the Clebsch-Gordan coefficients was used, together with the notation $\langle J \| K \| l\rangle := c$ that we introduced before the theorem. The index $s$ is absent everywhere since $V_J$ appears only once in $V_j \otimes V_l$. This finishes the proof.

Definition D.12 (Reduced Matrix Element).
The unique number $c = \langle J \| K \| l\rangle \in \mathbb{C}$ in this theorem is called the reduced matrix element. To reiterate, it characterizes the representation operator completely.
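The factorization of Theorem D.11 can be verified numerically in the textbook case $G = SU(2)$ with $j = l = J = 1$. The sketch below (an illustration, not from the paper) builds the rank-1 spherical tensor operator $T^1_q$ from the spin-1 angular momentum matrices and checks that all of its nonvanishing matrix elements are proportional to the corresponding Clebsch-Gordan coefficients with a single constant of proportionality, the reduced matrix element:

```python
import numpy as np
from sympy import S
from sympy.physics.quantum.cg import CG

# Spin-1 angular momentum matrices in the basis ordered m = 1, 0, -1.
s2 = np.sqrt(2.0)
Jz = np.diag([1.0, 0.0, -1.0])
Jp = np.array([[0.0, s2, 0.0], [0.0, 0.0, s2], [0.0, 0.0, 0.0]])
Jm = Jp.T
# Spherical components of the rank-1 tensor operator built from J.
T = {+1: -Jp / s2, 0: Jz, -1: Jm / s2}

ms = [1, 0, -1]
ratios = set()
for q, Tq in T.items():
    for a, mp in enumerate(ms):       # row index: bra <1 m'|
        for b, m in enumerate(ms):    # column index: ket |1 m>
            cg = float(CG(S(1), m, S(1), q, S(1), mp).doit())
            if cg != 0.0:
                ratios.add(round(Tq[a, b] / cg, 10))

# Wigner-Eckart: one reduced matrix element for all (m, q, m').
assert len(ratios) == 1
```

The single surviving ratio is exactly the reduced matrix element $\langle J \| T \| l\rangle$ (up to the normalization convention chosen for the Clebsch-Gordan coefficients).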

D.1.3 REDUCTION TO IRREDUCIBLE UNITARY REPRESENTATIONS

Let $G$ be any compact group and $X$ any homogeneous space of $G$. Before we state the Wigner-Eckart theorem for steerable kernels in the next section, we first want to explain why we can restrict to the case of irreducible unitary input and output representations. Our explanations are adapted from Weiler & Cesa (2019). Thus, let $\rho_{\mathrm{in}}: G \to GL(V_{\mathrm{in}})$ and $\rho_{\mathrm{out}}: G \to GL(V_{\mathrm{out}})$ be general finite-dimensional input and output representations. We consider the task of finding a basis for the space of steerable kernels $\mathrm{Hom}_G(X, \mathrm{Hom}_{\mathbb{K}}(V_{\mathrm{in}}, V_{\mathrm{out}}))$. By Theorem B.20 and Proposition B.38, there are equivalences of representations (i.e., linear isomorphisms that intertwine the representations)
$$Q_{\mathrm{in}}: V_{\mathrm{in}} \to \bigoplus_{\mu \in I_{\mathrm{in}}} V_\mu, \qquad Q_{\mathrm{out}}: V_{\mathrm{out}} \to \bigoplus_{\nu \in I_{\mathrm{out}}} V_\nu,$$
where the $\rho_\mu: G \to U(V_\mu)$ and $\rho_\nu: G \to U(V_\nu)$ are irreducible unitary representations. Both in the input and in the output representation, the same irrep can appear several times, i.e., there can be $\mu \ne \mu'$ such that $\rho_\mu \cong \rho_{\mu'}$. Now, notice that the map
$$\Phi_{Q_{\mathrm{out}}, Q_{\mathrm{in}}}: \mathrm{Hom}_G\Big(X, \mathrm{Hom}_{\mathbb{K}}\Big(\bigoplus_{\mu \in I_{\mathrm{in}}} V_\mu, \bigoplus_{\nu \in I_{\mathrm{out}}} V_\nu\Big)\Big) \to \mathrm{Hom}_G\big(X, \mathrm{Hom}_{\mathbb{K}}(V_{\mathrm{in}}, V_{\mathrm{out}})\big),$$
given for all $x \in X$ by $[\Phi_{Q_{\mathrm{out}}, Q_{\mathrm{in}}}(K)](x) := Q_{\mathrm{out}}^{-1} \circ K(x) \circ Q_{\mathrm{in}}$, is clearly an isomorphism. Thus, once a basis for the first kernel space is known, we just need to postcompose and precompose each basis kernel with $Q_{\mathrm{out}}^{-1}$ and $Q_{\mathrm{in}}$, respectively, in order to get a basis for the space we actually care about. Furthermore, the map
$$\Psi: \bigoplus_{\nu \in I_{\mathrm{out}}} \bigoplus_{\mu \in I_{\mathrm{in}}} \mathrm{Hom}_G\big(X, \mathrm{Hom}_{\mathbb{K}}(V_\mu, V_\nu)\big) \to \mathrm{Hom}_G\Big(X, \mathrm{Hom}_{\mathbb{K}}\Big(\bigoplus_{\mu \in I_{\mathrm{in}}} V_\mu, \bigoplus_{\nu \in I_{\mathrm{out}}} V_\nu\Big)\Big),$$
given by $[\Psi((K_{\nu\mu})_{\nu,\mu})(x)]((v_\mu)_\mu) := \big(\sum_{\mu \in I_{\mathrm{in}}} K_{\nu\mu}(x)(v_\mu)\big)_\nu \in \bigoplus_{\nu \in I_{\mathrm{out}}} V_\nu$, where $x \in X$ and $(v_\mu)_\mu \in \bigoplus_{\mu \in I_{\mathrm{in}}} V_\mu$ are arbitrary, is also clearly an isomorphism. It expresses that we can take a collection of steerable kernels $(K_{\nu\mu})_{\nu,\mu}$ and build from it a block matrix, which is steerable again, as can easily be checked.
Accordingly, if we have basis kernels for a space $\mathrm{Hom}_G(X, \mathrm{Hom}_{\mathbb{K}}(V_\mu, V_\nu))$ for some $\mu, \nu$, then we can, by applying $\Psi$, map them to block basis kernels which are zero outside the block with indices $\nu$ and $\mu$. By doing this for all $\mu, \nu$, we thus recover a full basis of the space $\mathrm{Hom}_G(X, \mathrm{Hom}_{\mathbb{K}}(\bigoplus_{\mu \in I_{\mathrm{in}}} V_\mu, \bigoplus_{\nu \in I_{\mathrm{out}}} V_\nu))$. By applying the base change $\Phi_{Q_{\mathrm{out}}, Q_{\mathrm{in}}}$ from above, we then obtain a basis of $\mathrm{Hom}_G(X, \mathrm{Hom}_{\mathbb{K}}(V_{\mathrm{in}}, V_{\mathrm{out}}))$. In summary, knowing a basis of steerable kernels for irreducible unitary input and output representations gives us one for all finite-dimensional input and output representations. Finally, note that the transformation of basis kernels via $\Phi_{Q_{\mathrm{out}}, Q_{\mathrm{in}}}$ and $\Psi$ can be done at network initialization and does not need to be performed in each forward pass.
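The block-assembly procedure described by $\Psi$ followed by the base change $\Phi_{Q_{\mathrm{out}}, Q_{\mathrm{in}}}$ can be sketched concretely. The following minimal implementation (hypothetical shapes and helper names, not from the paper; kernels are represented by their value at one fixed point $x$) embeds per-irrep basis kernels as blocks and conjugates them back to the original representations:

```python
import numpy as np

def block_basis_kernels(per_pair_bases, in_dims, out_dims, Q_in, Q_out):
    """per_pair_bases[(nu, mu)]: list of (out_dims[nu] x in_dims[mu]) arrays,
    each a basis kernel for the irrep pair (mu, nu) evaluated at a fixed x."""
    in_off = np.concatenate(([0], np.cumsum(in_dims)))
    out_off = np.concatenate(([0], np.cumsum(out_dims)))
    basis = []
    for (nu, mu), kernels in per_pair_bases.items():
        for k in kernels:
            # Psi: embed k as the (nu, mu) block of an otherwise zero matrix.
            K = np.zeros((out_off[-1], in_off[-1]))
            K[out_off[nu]:out_off[nu + 1], in_off[mu]:in_off[mu + 1]] = k
            # Phi_{Q_out, Q_in}: conjugate back to the original representations.
            basis.append(np.linalg.inv(Q_out) @ K @ Q_in)
    return basis

# Tiny usage example: two input irreps (dims 1 and 2), one output irrep (dim 2),
# with trivial changes of basis Q_in, Q_out.
bases = {(0, 0): [np.ones((2, 1))], (0, 1): [np.eye(2)]}
B = block_basis_kernels(bases, [1, 2], [2], np.eye(3), np.eye(2))
assert len(B) == 2 and B[0].shape == (2, 3)
```

As noted in the text, this assembly runs once at initialization; the forward pass only ever sees the assembled basis.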

D.1.4 THE WIGNER-ECKART THEOREM FOR STEERABLE KERNELS

Now that we have seen the Wigner-Eckart theorem in a version similar to how it usually appears in physics, it is time to state the version that we need in this work for applications in deep learning. The treatment is similar to the formulation in Agrawala (1980), which presents a generalization of the Wigner-Eckart theorem to the case that $V_J$ may appear several times as a direct summand in the decomposition of the tensor product. However, that paper still only considers the Wigner-Eckart theorem over the complex numbers $\mathbb{C}$. If we allow the real numbers as well, we cannot be sure that endomorphisms of irreducible representations are given by just one number. This is a complication we deal with below by allowing matrix elements of general endomorphisms. Furthermore, we deal with topological considerations that did not play a role in Agrawala (1980). And lastly, we transport the theorem into the nonlinear realm of steerable kernels. As discussed in the last section, we can restrict the considerations to (representatives of isomorphism classes of) irreducible unitary input and output representations. Thus, assume the input representation to be the irrep $\rho_l: G \to U(V_l)$ and the output representation to be the irrep $\rho_J: G \to U(V_J)$. The idea is now that kernel operators $\widehat{K}: L^2_{\mathbb{K}}(X) \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$ can be described on each direct summand of the domain individually, and that on each of these summands, arguments similar to those for the original Wigner-Eckart theorem apply. According to the Peter-Weyl Theorem B.22, the space $L^2_{\mathbb{K}}(X)$ has a dense subspace which is a direct sum of irreducible unitary representations: $L^2_{\mathbb{K}}(X) = \overline{\bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} V_{ji}}$. Each $V_{ji}$ is, as a subrepresentation of $L^2_{\mathbb{K}}(X)$, isomorphic to $V_j$; $V_j$ itself is not assumed to be embedded in $L^2_{\mathbb{K}}(X)$. For arbitrary $j \in \hat{G}$, fix once and for all orthonormal bases $\{Y^m_{ji}\} \subseteq V_{ji}$ corresponding to the basis $\{Y^m_j\}$ of $V_j$.
Furthermore, assume that for all $s = 1, \ldots, [J(jl)]$, $p_{jis}: V_{ji} \otimes V_l \to V_J$ is a projection which is an adjoint of the linear, equivariant, isometric embedding $\iota_{jis}: V_J \to V_{ji} \otimes V_l$. These are assumed to be aligned with the embeddings $V_J \to V_j \otimes V_l$ with respect to the isomorphisms $V_j \cong V_{ji}$ underlying the correspondence of basis elements $Y^m_j \sim Y^m_{ji}$. What this means is that the Clebsch-Gordan coefficients with respect to all of these embeddings, for all $i$, coincide:
$$\big\langle \iota_{jis}(Y^M_J) \,\big|\, Y^m_{ji} \otimes Y^n_l\big\rangle = \langle s, JM \mid jm; ln\rangle.$$
Now we state and prove the Wigner-Eckart theorem, which gives an explicit description of representation operators $\widehat{K}: L^2_{\mathbb{K}}(X) \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$ in terms of endomorphisms of $V_J$, and then transfers this statement to a statement about steerable kernels $K: X \to \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)$. Before stating the theorem, we briefly explain what to expect: in the derivation of the original Wigner-Eckart theorem in Section D.1.2, we saw that a representation operator could be expressed as a map $V_j \otimes V_l \to V_J$ on the tensor product, which in turn was equal to $c \circ p$ for an endomorphism $c: V_J \to V_J$ and the projection $p$ corresponding to the appearance of $V_J$ in the direct sum decomposition of $V_j \otimes V_l$. This time, however, $V_J$ appears many times in $L^2_{\mathbb{K}}(X) \otimes V_l$, namely:

1. for each isomorphism class of irreps $j \in \hat{G}$,
2. for each appearance $i = 1, \ldots, m_j$ of the irrep $V_j$ in $L^2_{\mathbb{K}}(X)$, and
3. for each appearance $s = 1, \ldots, [J(jl)]$ of the irrep $V_J$ in the tensor product representation $V_j \otimes V_l$; $[J(jl)]$ can be zero, which means that $j$ does not contribute.

We therefore expect $\widehat{K}$ to be a whole sum of compositions of endomorphisms with projections, one for each combination of valid $j$, $i$ and $s$. Furthermore, the specific structure of $L^2_{\mathbb{K}}(X)$ is exploited as well, by using the orthogonal projections from $L^2_{\mathbb{K}}(X)$ onto the summands $V_{ji}$. Overall, we hope this sufficiently motivates the theorem:

Theorem D.13 (Wigner-Eckart Theorem for Steerable Kernels).
We state the theorem in three parts:

1. (Basis-independent Wigner-Eckart for Kernel Operators) There is an isomorphism of vector spaces
$$\mathrm{Rep}: \bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} \bigoplus_{s=1}^{[J(jl)]} \mathrm{End}_{G,\mathbb{K}}(V_J) \to \mathrm{Hom}_{G,\mathbb{K}}\big(L^2_{\mathbb{K}}(X), \mathrm{Hom}_{\mathbb{K}}(V_l, V_J)\big),$$
which is given by
$$\big[\mathrm{Rep}\big((c_{jis})_{jis}\big)(\varphi)\big](v_l) := \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \sum_{m=1}^{d_j} \langle Y^m_{ji} \mid \varphi\rangle \cdot c_{jis}\big(p_{jis}(Y^m_{ji} \otimes v_l)\big),$$
where $(c_{jis})_{jis}$ is a tuple of endomorphisms, $\varphi: X \to \mathbb{K}$ is any square-integrable function and $v_l \in V_l$ is any element.

2. (Basis-independent Wigner-Eckart for Steerable Kernels) There is an isomorphism of vector spaces

$$\operatorname{GKer} : \bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} \bigoplus_{s=1}^{[J(jl)]} \operatorname{End}_{G,K}(V_J) \to \operatorname{Hom}_{G}\big(X, \operatorname{Hom}_K(V_l, V_J)\big),$$
which is given by
$$[\operatorname{GKer}((c_{jis})_{jis})(x)](v_l) := \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \sum_{m=1}^{d_j} \langle i, jm \mid x \rangle \cdot c_{jis}\big(p_{jis}(Y^m_{ji} \otimes v_l)\big),$$
where $(c_{jis})_{jis}$ is a tuple of endomorphisms, $x \in X$ is any point and $v_l \in V_l$ is any element. Here, $\langle i, jm \mid x \rangle := \lim_{U \in \mathcal{U}_x} \langle Y^m_{ji}, \delta_U \rangle$, which by Proposition C.10 equals $\overline{Y^m_{ji}(x)}$.

3. (Basis-dependent Wigner-Eckart for Steerable Kernels) Let $K = \operatorname{GKer}((c_{jis})_{jis})$ be the steerable kernel corresponding to the tuple of endomorphisms $(c_{jis})_{jis}$ according to the isomorphism above. Then the matrix elements of $K(x) \in \operatorname{Hom}_K(V_l, V_J)$ are explicitly given by
$$\langle JM \mid K(x) \mid ln \rangle = \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \sum_{m=1}^{d_j} \sum_{M'=1}^{d_J} \langle JM \mid c_{jis} \mid JM' \rangle \cdot \langle s, JM' \mid jm; ln \rangle \cdot \langle i, jm \mid x \rangle.$$

Remark D.14. Before we come to the proof, we make some remarks about this theorem: 1. In line with the usual convention, we call the $\langle JM \mid c_{jis} \mid JM' \rangle$ the generalized reduced matrix elements of the representation operator $K$. Different from the situation in physics, these can depend nontrivially on the specific basis indices $M$ and $M'$. If the space of endomorphisms is 1-dimensional, as is the case when considering representations over $\mathbb{C}$, then each $c_{jis}$ is a scalar multiple of the identity matrix, meaning that it is characterized by only one complex number, for simplicity with the same name $c_{jis}$. Then one has $\langle JM \mid c_{jis} \mid JM' \rangle = \delta_{MM'} \cdot c_{jis}$ and the sum over $M'$ disappears. What this means for the matrix form of basis kernels of steerable CNNs will be discussed in Corollary D.17. 2. The coefficients $\langle s, JM' \mid jm; ln \rangle$ are, as before, the Clebsch-Gordan coefficients. Note that the input $x$ of $K$ appears only in $\langle i, jm \mid x \rangle$. These two parts of the right-hand side of the formula are always the same, independent of the kernel $K$. 3. The Clebsch-Gordan coefficients are traditionally defined with respect to isometric embeddings $l_{jis} : V_J \to V_j \otimes V_l$, since this makes them less ambiguous.
However, we mention that being isometric is not required for the construction of Clebsch-Gordan coefficients or for the proof of the Wigner-Eckart theorem; being equivariant and linear is sufficient. This then means that the copies $l_s(Y^M_J)$ no longer form an orthonormal basis. We will use this relaxation in the example in Section E.2, where we do not want to be bothered with obtaining isometric embeddings. 4. The names for the isomorphisms in the theorem are meant as follows: $\operatorname{Rep}$ maps a tuple of endomorphisms to a kernel operator, which is a special representation operator. $\operatorname{GKer}$ maps a tuple of endomorphisms to a $G$-steerable kernel. It is not meant as notation for a kernel in the sense of a nullspace in linear algebra. 5. Furthermore, a reader with a background in abstract algebra may wonder why we build the direct sum of spaces of endomorphisms instead of the direct product. The reason is that, a posteriori, only finitely many $j$ turn out to contribute nontrivially, so the direct sum equals the direct product. For a proof of the finiteness, see Remark D.18 below. 6. As a last remark, we want to mention that part 1 of the theorem is not the most general version possible. We chose to formulate the Wigner-Eckart theorem for $L^2_K(X)$ specifically since this is the space we use it for. However, an analogous isomorphism can probably be formulated for any unitary representation instead of $L^2_K(X)$, except that direct sums then need to be replaced by direct products if the index sets on the left side are infinite. Additionally, $V_l$ and $V_J$ could be replaced by arbitrary finite-dimensional representations, and an appropriate adaptation of the theorem would apply. Whether $V_l$ and $V_J$ could also be replaced by infinite-dimensional unitary representations remains to be explored, but an extension to such a case seems possible. Proof of Theorem D.13.
The proof of part 1 is deferred to Section D.2 since it requires some work. However, the proofs of parts 2 and 3 are relatively straightforward once we believe part 1, and so we give them here. From part 1 we know that $\operatorname{Rep}$ is an isomorphism. Furthermore, from Theorem C.7 we know that
$$(\cdot)|_X : \operatorname{Hom}_{G,K}\big(L^2_K(X), \operatorname{Hom}_K(V_l, V_J)\big) \to \operatorname{Hom}_{G}\big(X, \operatorname{Hom}_K(V_l, V_J)\big)$$
is an isomorphism as well, given by $K|_X(x) := \lim_{U \in \mathcal{U}_x} K(\delta_U)$, where the limit is taken over the directed set of open neighborhoods of $x$. We define the isomorphism $\operatorname{GKer}$ simply as the composition $\operatorname{GKer} := (\cdot)|_X \circ \operatorname{Rep}$. Abbreviating $\sum_{jism} := \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \sum_{m=1}^{d_j}$, this isomorphism is explicitly given by:
$$\begin{aligned}
[\operatorname{GKer}((c_{jis})_{jis})(x)](v_l) &= [\operatorname{Rep}((c_{jis})_{jis})|_X(x)](v_l) = \lim_{U \in \mathcal{U}_x} [\operatorname{Rep}((c_{jis})_{jis})(\delta_U)](v_l) \\
&= \lim_{U \in \mathcal{U}_x} \sum_{jism} \langle Y^m_{ji}, \delta_U \rangle \cdot c_{jis}\big(p_{jis}(Y^m_{ji} \otimes v_l)\big) \\
&= \sum_{jism} \lim_{U \in \mathcal{U}_x} \langle Y^m_{ji}, \delta_U \rangle \cdot c_{jis}\big(p_{jis}(Y^m_{ji} \otimes v_l)\big) \\
&= \sum_{jism} \langle i, jm \mid x \rangle \cdot c_{jis}\big(p_{jis}(Y^m_{ji} \otimes v_l)\big).
\end{aligned}$$
This already proves part 2. Now, in the following computation, we use that $c_{jis} \circ p_{jis} = c_{jis} \circ \operatorname{id}_{V_J} \circ p_{jis}$ and that, inspired by notation from physics, we can write the identity on $V_J$ as $\operatorname{id}_{V_J} = \sum_{M'=1}^{d_J} Y^{M'}_J \langle Y^{M'}_J, \cdot \rangle$. For part 3, we then compute
$$\begin{aligned}
\langle JM \mid K(x) \mid ln \rangle &= \big\langle Y^M_J, K(x)(Y^n_l) \big\rangle = \big\langle Y^M_J, [\operatorname{GKer}((c_{jis})_{jis})(x)](Y^n_l) \big\rangle \\
&= \sum_{jism} \langle i, jm \mid x \rangle \cdot \big\langle Y^M_J, (c_{jis} \circ p_{jis})(Y^m_{ji} \otimes Y^n_l) \big\rangle \\
&= \sum_{jism} \sum_{M'=1}^{d_J} \langle i, jm \mid x \rangle \cdot \big\langle Y^M_J, c_{jis}(Y^{M'}_J) \big\rangle \cdot \big\langle Y^{M'}_J, p_{jis}(Y^m_{ji} \otimes Y^n_l) \big\rangle \\
&= \sum_{jism} \sum_{M'=1}^{d_J} \langle JM \mid c_{jis} \mid JM' \rangle \cdot \langle s, JM' \mid jm; ln \rangle \cdot \langle i, jm \mid x \rangle.
\end{aligned}$$
In the last step, we used the Clebsch-Gordan coefficients, see Definition D.6, and, as mentioned before, that $p_{jis}$ is adjoint to the embedding $l_{jis} : V_J \to V_{ji} \otimes V_l$. Remark D.15. Here, we want to argue that our kernel space solution also covers that of general equivariant CNNs on homogeneous spaces (Cohen et al., 2019b).
One definition of the kernel space in that setting is
$$\operatorname{Hom}_{G_{\mathrm{in}} \times G_{\mathrm{out}}}\big(H, \operatorname{Hom}_K(V_{\mathrm{in}}, V_{\mathrm{out}})\big) = \big\{ K : H \to \operatorname{Hom}_K(V_{\mathrm{in}}, V_{\mathrm{out}}) \;\big|\; K(g_{\mathrm{out}} h g_{\mathrm{in}}) = \rho_{\mathrm{out}}(g_{\mathrm{out}}) \circ K(h) \circ \rho_{\mathrm{in}}(g_{\mathrm{in}}) \big\},$$
where $H$ is a locally compact group and $G_{\mathrm{in}}, G_{\mathrm{out}} \subseteq H$ are subgroups with input and output representations $\rho_{\mathrm{in}} : G_{\mathrm{in}} \to \operatorname{GL}(V_{\mathrm{in}})$ and $\rho_{\mathrm{out}} : G_{\mathrm{out}} \to \operatorname{GL}(V_{\mathrm{out}})$. For compact groups $G_{\mathrm{in}}$ and $G_{\mathrm{out}}$, this is covered by our setting as follows: we define $G := G_{\mathrm{out}} \times G_{\mathrm{in}}$ and $g := (g_{\mathrm{out}}, g_{\mathrm{in}})$. We can define the left action of $G$ on $H$ by $g \cdot h := g_{\mathrm{out}} h g_{\mathrm{in}}^{-1}$. Furthermore, we can reformulate the representations of $G_{\mathrm{in}}$ and $G_{\mathrm{out}}$ as representations of the group $G$ by setting $\rho_{\mathrm{in}} : G \to \operatorname{GL}(V_{\mathrm{in}})$ with $\rho_{\mathrm{in}}(g) := \rho_{\mathrm{in}}(g_{\mathrm{in}})$, and similarly for $\rho_{\mathrm{out}}$. We furthermore notice that in Eq. (19) we could also have inverted $g_{\mathrm{in}}$, since the constraint needs to hold for all elements of $G_{\mathrm{in}}$. Thus, the kernel space can be equivalently defined as
$$\operatorname{Hom}_{G_{\mathrm{in}} \times G_{\mathrm{out}}}\big(H, \operatorname{Hom}_K(V_{\mathrm{in}}, V_{\mathrm{out}})\big) = \big\{ K : H \to \operatorname{Hom}_K(V_{\mathrm{in}}, V_{\mathrm{out}}) \;\big|\; K(g \cdot h) = \rho_{\mathrm{out}}(g) \circ K(h) \circ \rho_{\mathrm{in}}(g)^{-1} \big\},$$
which is precisely the kernel constraint of steerable CNNs in Eq. (2). Thus, if we restrict to a homogeneous space of the action of $G$ on $H$, we recover steerable kernels as in Definition 3.2 and can apply Theorem 4.1.

D.1.5 GENERAL STEERABLE KERNEL BASES

Now that we have a Wigner-Eckart theorem for steerable kernels, which gives a one-to-one correspondence between steerable kernels and tuples of endomorphisms, we can finally describe what a basis of the space of steerable kernels looks like. For this, in addition to the notation of the last section, we assume that $\{c_r \mid r = 1, \dots, E_J\}$ is a basis of $\operatorname{End}_{G,K}(V_J)$.

Theorem D.16 (Steerable Kernel Bases). A basis of the space of steerable kernels $\operatorname{Hom}_G(X, \operatorname{Hom}_K(V_l, V_J))$ is given by
$$\big\{ K_{jisr} : X \to \operatorname{Hom}_K(V_l, V_J) \;\big|\; j \in \hat{G},\ i \in \{1, \dots, m_j\},\ s \in \{1, \dots, [J(jl)]\},\ r \in \{1, \dots, E_J\} \big\},$$
where the basis kernels $K_{jisr}$ have matrix elements
$$\langle JM \mid K_{jisr}(x) \mid ln \rangle = \sum_{m=1}^{d_j} \sum_{M'=1}^{d_J} \langle JM \mid c_r \mid JM' \rangle \cdot \langle s, JM' \mid jm; ln \rangle \cdot \langle i, jm \mid x \rangle.$$
Now, for each $M' \in \{1, \dots, d_J\}$, let $\mathrm{CG}^{M'}_{J(jl)s}$ be the $d_j \times d_l$-matrix of Clebsch-Gordan coefficients $\langle s, JM' \mid jm; ln \rangle$, with only $m$ and $n$ varying. Furthermore, let $\langle i, j \mid x \rangle$ be the row vector with entries $\langle i, jm \mid x \rangle$ for $m = 1, \dots, d_j$. In matrix notation with respect to the bases $\{Y^M_J\} \subseteq V_J$ and $\{Y^n_l\} \subseteq V_l$, we can then express the basis kernel $K_{jisr}(x) : V_l \to V_J$ as follows:
$$K_{jisr}(x) = c_r \cdot \begin{pmatrix} \langle i, j \mid x \rangle \cdot \mathrm{CG}^1_{J(jl)s} \\ \vdots \\ \langle i, j \mid x \rangle \cdot \mathrm{CG}^{d_J}_{J(jl)s} \end{pmatrix}.$$
In this formula, all "dots" mean conventional matrix multiplication, and $c_r$ is, by abuse of notation, the matrix of the endomorphism $c_r$.

Proof. For the first statement, note that a basis for $\bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} \bigoplus_{s=1}^{[J(jl)]} \operatorname{End}_{G,K}(V_J)$ is given by all the tuples $t_{jisr} := (0, \dots, c_r, \dots, 0)$ that have $c_r$ at position $jis$, for all combinations of $j$, $i$, $s$ and $r$. Thus, from the isomorphism $\operatorname{GKer}$ in the second part of Theorem D.13 we obtain that all $K_{jisr} := \operatorname{GKer}(t_{jisr})$ together form a basis for the space of steerable kernels $\operatorname{Hom}_G(X, \operatorname{Hom}_K(V_l, V_J))$. When applying the basis-dependent form in part 3 of that theorem to $K_{jisr}$, the first three sums in Eq. (18) disappear, since $t_{jisr}$ is zero in all but one position.
Furthermore, $c_{jis}$ is replaced by the basis endomorphism $c_r$. We obtain the claimed result. For the final statement on the matrix representation, note that
$$\begin{aligned}
\langle JM \mid K_{jisr}(x) \mid ln \rangle &= \sum_{m=1}^{d_j} \sum_{M'=1}^{d_J} \langle JM \mid c_r \mid JM' \rangle \cdot \langle s, JM' \mid jm; ln \rangle \cdot \langle i, jm \mid x \rangle \\
&= \sum_{M'=1}^{d_J} \langle JM \mid c_r \mid JM' \rangle \sum_{m=1}^{d_j} \langle i, jm \mid x \rangle \cdot \langle s, JM' \mid jm; ln \rangle \\
&= c^M_r \cdot \Big( \sum_{m=1}^{d_j} \langle i, jm \mid x \rangle \cdot \langle s, JM' \mid jm; ln \rangle \Big)_{M'=1}^{d_J} = c^M_r \cdot \Big( \big( \langle i, j \mid x \rangle \cdot \mathrm{CG}^{M'}_{J(jl)s} \big)_n \Big)_{M'=1}^{d_J}.
\end{aligned}$$
Here, $c^M_r$ is the $M$-th row of the matrix $c_r$. The result follows by dropping the indices $M$ and $n$.

The next corollary shows that the endomorphisms can be ignored if the space of endomorphisms is 1-dimensional, which is in particular the case if $K = \mathbb{C}$.

Corollary D.17. Assume that $\dim(\operatorname{End}_{G,K}(V_J)) = 1$. Then a basis of steerable kernels $K : X \to \operatorname{Hom}_K(V_l, V_J)$ is given by all $K_{jis}$ with matrices
$$K_{jis}(x) = \begin{pmatrix} \langle i, j \mid x \rangle \cdot \mathrm{CG}^1_{J(jl)s} \\ \vdots \\ \langle i, j \mid x \rangle \cdot \mathrm{CG}^{d_J}_{J(jl)s} \end{pmatrix}.$$
In particular, this is the case if $K = \mathbb{C}$.

Proof. In this case, a basis for the space of endomorphisms is given by the single endomorphism $c = \operatorname{id}_{V_J}$. Postcomposition with the identity does not change the matrix, and so the result follows. For $K = \mathbb{C}$ we have $\dim(\operatorname{End}_{G,\mathbb{C}}(V_J)) = 1$ by Schur's Lemma D.8, and thus the result follows.

We end with two remarks regarding the parameterization of steerable CNNs. The first remark considers the case of steerable kernels of the form $K : X \to \operatorname{Hom}_K(V_l, V_J)$ on a homogeneous space $X$. The second remark connects this back to the case that $X$ is an orbit embedded in $\mathbb{R}^d$.

Remark D.18 (Parameterization in the abstract). First of all, we want to show that there are only finitely many basis kernels $K_{jisr}$. To this end, note that the index sets for $i$, $s$, and $r$ are finite for all $j$, and thus we need to understand the range of $j$. A priori, $j$ can run over the whole set $\hat{G}$, which can be infinite.
But, as we argue now, only finitely many $j \in \hat{G}$ can have $V_J$ in a direct sum decomposition of $V_j \otimes V_l$, which rescues the finiteness: namely, $V_J$ is in the direct sum decomposition of $V_j \otimes V_l$ if and only if the vector space $\operatorname{Hom}_{G,K}(V_j \otimes V_l, V_J)$ is nonzero, by Schur's Lemma B.29. By the hom-tensor adjunction that we will show in Proposition D.23 in more generality, this is the case if and only if $\operatorname{Hom}_{G,K}(V_j, \operatorname{Hom}_K(V_l, V_J))$ is nonzero. And finally, this is the case if and only if $V_j$ is in a direct sum decomposition of the representation $\operatorname{Hom}_K(V_l, V_J)$, again by Schur's lemma. Now, since $\operatorname{Hom}_K(V_l, V_J)$ is finite-dimensional, this can only happen for finitely many $j$, and so we are done.

Overall, this means the following: to parameterize an equivariant neural network, one needs arbitrary parameters $w_{jisr} \in K$ for all combinations of $j \in \hat{G}$, $i \in \{1, \dots, m_j\}$, $s \in \{1, \dots, [J(jl)]\}$ and $r \in \{1, \dots, E_J\}$. A general steerable kernel $K : X \to \operatorname{Hom}_K(V_l, V_J)$ then takes the form
$$K = \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \sum_{r=1}^{E_J} w_{jisr} K_{jisr},$$
with the basis kernels $K_{jisr}$ as in Theorem D.16.

Remark D.19 (Parameterization in practice). Remember that our original motivation for the use of homogeneous spaces in Section C.1.1 was that $\mathbb{R}^d$ splits as a disjoint union of homogeneous spaces, on which the kernel constraint acts separately. For simplicity, we assume that the compact group acting on $\mathbb{R}^d$ is either $G = \operatorname{SO}(d)$ or $G = \operatorname{O}(d)$, but the general ideas also hold for the finite transformation groups in $\mathbb{R}^d$; the only difference is that in these finite cases, the set of representatives of orbits becomes larger. Thus, $\mathbb{R}^d$ splits into orbits $\mathbb{R}^d = \bigcup_{r \geq 0} S^{d-1}(r)$, where $S^{d-1}(r)$ is the sphere of radius $r$ (with $S^{d-1}(0) = \{0\}$ being a single point). We will discuss the orbit $X_0 = \{0\}$, the origin, separately below. But note that all other orbits are necessarily homeomorphic to each other and thus can be treated on an equal footing.
Therefore, let $S^{d-1}$ be the standard sphere of radius 1 and let $K_{jisr} : S^{d-1} \to \operatorname{Hom}_K(V_l, V_J)$ be basis kernels for this choice. Then for a general steerable kernel $K : \mathbb{R}^d \to \operatorname{Hom}_K(V_l, V_J)$ there are arbitrary functions $w_{jisr} : \mathbb{R}_{>0} \to K$ such that, for all $x \in \mathbb{R}^d \setminus \{0\}$, we have:
$$K(x) = \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \sum_{r=1}^{E_J} w_{jisr}(\lVert x \rVert) \cdot K_{jisr}\Big(\frac{x}{\lVert x \rVert}\Big).$$
For $x = 0$, we could use our heavy machinery to solve the kernel constraint, but it is more illuminating to do it from scratch since this case is so simple: we have $K(0) : V_l \to V_J$, and the kernel constraint takes the form $K(0) = K(g \cdot 0) = \rho_J(g) \circ K(0) \circ \rho_l(g)^{-1}$ for all $g \in G$, which is equivalent to $K(0) \circ \rho_l(g) = \rho_J(g) \circ K(0)$ for all $g \in G$. This just means that $K(0) : V_l \to V_J$ is an intertwiner, and by Schur's Lemma B.29 it is either $0$ if $l \neq J$ or an arbitrary endomorphism $V_J \to V_J$ if $l = J$. Thus, assuming $l = J$ and choosing basis endomorphisms $c_r : V_J \to V_J$, there are coefficients $w_r \in K$ such that $K(0) = \sum_{r=1}^{E_J} w_r \cdot c_r$. The reader may find it interesting to check that this solution is precisely what is also predicted by our theory, using that $L^2_K(\{0\}) \cong K$ is isomorphic to the trivial representation of $G$. All in all, we now know what the most general steerable kernels look like. In practice, one needs to choose the functions $w_{jisr} : \mathbb{R}_{>0} \to K$. For representations over the real numbers, i.e., with $K = \mathbb{R}$, one choice is to consider only finitely many radii and Gaussian radial profiles around them. Then, instead of learning the whole function $w_{jisr}$, one learns finitely many real parameters that determine "how activated" a basis kernel $K_{jisr}$ is at a certain radius. This is, for example, the route taken in Weiler et al. (2018b;a); Weiler & Cesa (2019). If one deals with complex representations, one usually goes the same route, except that the parameters determining how "activated" the basis kernels are will then be complex numbers.
One can either parameterize them as $a + ib$ with real part $a$ and imaginary part $b$. Intuitively, $a$ activates the standard version of the kernel $K_{jisr}$, whereas $b$ activates the kernel $iK_{jisr}$, which can be imagined as a version of the kernel rotated by $90°$. Another possibility is to parameterize a complex number as $\alpha \cdot e^{i\beta}$ with a scaling factor $\alpha > 0$ and a phase shift $\beta$. This is the route chosen in Worrall et al. (2016). In Chapter E we will look at examples of determining the basis kernels $K_{jisr}$, which will hopefully further illuminate the theorem. In the next section, we go back to the theory and prove the remaining parts of the Wigner-Eckart theorem.
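The assembly of a full kernel on $\mathbb{R}^d \setminus \{0\}$ from radial profiles and angular basis kernels, as described in Remark D.19, can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the cited works; the names (`rings`, `sigma`, the toy angular basis built from first-order circular harmonics) are our own assumptions.

```python
import numpy as np

# Sketch: K(x) = sum_{ring k, basis b} w[k, b] * g_k(|x|) * K_b(x / |x|),
# with one Gaussian radial bump g_k per sampled ring radius.
def gaussian_profiles(r, rings, sigma=0.5):
    """One Gaussian bump per ring radius; returns shape (n_rings,)."""
    return np.exp(-((r - rings) ** 2) / (2 * sigma ** 2))

def kernel_value(x, rings, w, angular_basis):
    """Evaluate the assembled kernel at a point x in R^2, x != 0."""
    r = np.linalg.norm(x)
    u = x / r                                          # point on the unit circle
    ang = np.stack([K_b(u) for K_b in angular_basis])  # (n_basis, d_J, d_l)
    rad = gaussian_profiles(r, rings)                  # (n_rings,)
    # Weighted sum over rings k and angular basis kernels b.
    return np.einsum("kb,k,bMn->Mn", w, rad, ang)

# Toy angular basis: two 1x1 "kernels" built from first circular harmonics.
angular_basis = [lambda u: np.array([[u[0]]]), lambda u: np.array([[u[1]]])]
rings = np.array([0.5, 1.0])                  # sampled radii
w = np.array([[1.0, 0.0], [0.0, 2.0]])        # learned weights, one per (ring, basis)

K = kernel_value(np.array([0.6, 0.8]), rings, w, angular_basis)
```

Learning then amounts to optimizing the finitely many entries of `w`, while the angular basis kernels stay fixed.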

D.2 PROOF OF THE WIGNER-ECKART THEOREM FOR KERNEL OPERATORS

In this section, we prove the first part of Theorem D.13, the Wigner-Eckart theorem for kernel operators, which we skipped in the last section. It is not necessary to read this section, and the reader may wish to go directly to the chapter on examples, Chapter E. We will make frequent use of topological concepts from Chapter F.1 in this section. The strategy is the following: in Section D.2.1, we show that
$$\operatorname{Hom}_{G,K}\big(L^2_K(X), \operatorname{Hom}_K(V_l, V_J)\big) \cong \operatorname{Hom}_{G,K}\Big( \bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} V_{ji}, \operatorname{Hom}_K(V_l, V_J) \Big),$$
which basically means that we can ignore the topological closure of the direct sum, which is dense in $L^2_K(X)$. This works, intuitively, since kernel operators are continuous, and so they are determined by what they do on a dense subset. Then, in Section D.2.2, we show that
$$\operatorname{Hom}_{G,K}\Big( \bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} V_{ji}, \operatorname{Hom}_K(V_l, V_J) \Big) \cong \operatorname{Hom}_{G,K}\Big( \bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} V_{ji} \otimes V_l, V_J \Big),$$
which is the main step needed in order to make use of the Clebsch-Gordan coefficients, namely when we decompose the tensor product. Finally, in Section D.2.3, we finish the proof of Theorem D.13.

D.2.1 REDUCTION TO A DENSE SUBSPACE OF $L^2_K(X)$

In this section, we reduce the statement to representation operators on $\bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} V_{ji}$. For simplicity, we write this double direct sum from now on as $\bigoplus_{ji}$. Furthermore, remember that $V_l$ and $V_J$ are finite-dimensional, and thus $\operatorname{Hom}_K(V_l, V_J)$ can be identified with matrices in $K^{d_J \times d_l}$. This space is a Euclidean space and thus has a scalar product and consequently also a norm, see Chapter F.1. Consequently, each kernel operator is a continuous map between normed vector spaces, which we will use in the following. A short terminological note: kernel operators are just representation operators on $L^2_K(X)$ and only have their name due to the relation to steerable kernels. Thus, the terminological difference to representation operators in the following reduction result has no further meaning:

Lemma D.20. The restriction map
$$\operatorname{Hom}_{G,K}\big(L^2_K(X), \operatorname{Hom}_K(V_l, V_J)\big) \to \operatorname{Hom}_{G,K}\Big( \bigoplus_{ji} V_{ji}, \operatorname{Hom}_K(V_l, V_J) \Big),$$
given by $K \mapsto K|_{\bigoplus_{ji} V_{ji}}$, between kernel operators on the left and representation operators on the right, is an isomorphism.

Proof. First of all, the kernel operators on the left are actually uniformly continuous by Proposition F.18. Thus, by Lemma F.22, the restriction map is an injection into the uniformly continuous representation operators on $\bigoplus_{ji} V_{ji}$. The set of all these maps is equal to the set of all representation operators, by Proposition F.18 again. Thus, in order to be finished, we only need to see that the unique extension of a representation operator $K : \bigoplus_{ji} V_{ji} \to \operatorname{Hom}_K(V_l, V_J)$ to a continuous function $\overline{K} : L^2_K(X) \to \operatorname{Hom}_K(V_l, V_J)$ is a kernel operator, which means it is linear and equivariant. For linearity, let $a \in K$ and $f \in L^2_K(X)$, and let $(f_k)_k$ be a sequence in $\bigoplus_{ji} V_{ji}$ that converges to $f$. Using the continuity of $\overline{K}$ and the linearity of $K$ we obtain:
$$\overline{K}(a \cdot f) = \overline{K}\big(\lim_{k \to \infty} a \cdot f_k\big) = \lim_{k \to \infty} \overline{K}(a \cdot f_k) = \lim_{k \to \infty} K(a \cdot f_k) = \lim_{k \to \infty} a \cdot K(f_k) = a \cdot \lim_{k \to \infty} K(f_k) = a \cdot \overline{K}\big(\lim_{k \to \infty} f_k\big) = a \cdot \overline{K}(f).$$
Linearity with respect to addition can be shown similarly. For equivariance, we argue in the same way, except that we additionally need the continuity of the representations $\lambda : G \to \operatorname{U}(L^2_K(X))$ and $\rho_{\operatorname{Hom}} : G \to \operatorname{GL}(\operatorname{Hom}_K(V_l, V_J))$.

D.2.2 THE HOM-TENSOR ADJUNCTION

Lemma D.21. Let $K : \bigoplus_{ji} V_{ji} \to V$ be linear and equivariant, where $V$ is an irrep. Then $K$ is continuous.

Proof. By Schur's Lemma D.8, we know that $K$ factors through the irreducible summands that are isomorphic to $V$. That is, let $V_j$ be the irrep class isomorphic to $V$ and let $p_{ji} : \bigoplus_{j'i'} V_{j'i'} \to V_{ji}$ be the canonical projections. Then there are intertwiners $c_i : V_{ji} \to V$ such that $K = \sum_i c_i \circ p_{ji}$. Each $c_i$ is continuous since it is a linear function between finite-dimensional normed vector spaces. Since summation on normed vector spaces is also continuous, we only need to show that the projections $p_{ji}$ are continuous.

D.2.3 PROOF OF THEOREM D.13

After the work done in the prior sections, we are ready to complete the proof of Theorem D.13!

Proof of Theorem D.13. Only the first part of the theorem still needs to be proven. We have the following string of isomorphisms, which we explain below:
$$\begin{aligned}
\operatorname{Hom}_{G,K}\big(L^2_K(X), \operatorname{Hom}_K(V_l, V_J)\big) &\cong \operatorname{Hom}_{G,K}\Big( \bigoplus_{ji} V_{ji}, \operatorname{Hom}_K(V_l, V_J) \Big) \\
&\cong \operatorname{Hom}_{G,K}\Big( \Big( \bigoplus_{ji} V_{ji} \Big) \otimes V_l, V_J \Big) \\
&\cong \operatorname{Hom}_{G,K}\Big( \bigoplus_{ji} (V_{ji} \otimes V_l), V_J \Big) \\
&\cong \bigoplus_{ji} \operatorname{Hom}_{G,K}(V_{ji} \otimes V_l, V_J) \\
&\cong \bigoplus_{ji} \bigoplus_{s=1}^{[J(jl)]} \operatorname{Hom}_{G,K}(V_J, V_J) = \bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} \bigoplus_{s=1}^{[J(jl)]} \operatorname{End}_{G,K}(V_J).
\end{aligned}$$
The steps are justified as follows: 1. For the first step, use Lemma D.20. 2. For the second step, use Proposition D.23. 3. For the third step, use the natural isomorphism $\big(\bigoplus_{ji} V_{ji}\big) \otimes V_l \cong \bigoplus_{ji} (V_{ji} \otimes V_l)$. 4. For the fourth step, use that linear equivariant maps can be described on each direct summand individually (and that we do not need to worry about continuity, due to Lemma D.21). 5. For the fifth step, precompose with the linear equivariant isometric embeddings $l_{jis} : V_J \to V_{ji} \otimes V_l$ and use, again, that linear equivariant maps can be described on each direct summand individually. Furthermore, use Schur's Lemma B.29 to see that the other summands disappear. 6. The last step is just a reformulation.

Now, we call the composition of these isomorphisms, read from right to left,
$$\operatorname{Rep} : \bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} \bigoplus_{s=1}^{[J(jl)]} \operatorname{End}_{G,K}(V_J) \to \operatorname{Hom}_{G,K}\big(L^2_K(X), \operatorname{Hom}_K(V_l, V_J)\big),$$
and we are only left with verifying that it is actually given by Eq. (17). For this, we take a tuple $(c_{jis})_{jis}$ of endomorphisms and explicitly trace back where it comes from. As in Lemma D.21, let $p_{ji} : \bigoplus_{j'i'} V_{j'i'} \to V_{ji}$ be the canonical projection, which by Proposition F.46 is explicitly given by $p_{ji}(\varphi) = \sum_{m=1}^{d_j} \langle Y^m_{ji}, \varphi \rangle \, Y^m_{ji}$. Furthermore, let $p_{jis} : V_{ji} \otimes V_l \to V_J$ be the projections corresponding to the embeddings $l_{jis}$.
Then, from bottom to top, $(c_{jis})_{jis}$ gets transformed as follows:
$$(c_{jis})_{jis} \;\mapsto\; \Big( \sum_{s=1}^{[J(jl)]} c_{jis} \circ p_{jis} \Big)_{ji} \;\mapsto\; \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} c_{jis} \circ p_{jis} \circ (p_{ji} \otimes \operatorname{id}_{V_l}) \;\mapsto\; \operatorname{Rep}\big((c_{jis})_{jis}\big).$$
In the last step, the hom-tensor adjunction of Proposition D.23 is used, but in the other direction. As an illustration, the composition of functions over which we sum is the chain
$$\Big( \bigoplus_{j'i'} V_{j'i'} \Big) \otimes V_l \xrightarrow{\; p_{ji} \otimes \operatorname{id}_{V_l} \;} V_{ji} \otimes V_l \xrightarrow{\; p_{jis} \;} V_J \xrightarrow{\; c_{jis} \;} V_J.$$
We obtain:
$$\begin{aligned}
[\operatorname{Rep}((c_{jis})_{jis})(\varphi)](v_l) &= \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \big( c_{jis} \circ p_{jis} \circ (p_{ji} \otimes \operatorname{id}_{V_l}) \big)(\varphi \otimes v_l) \\
&= \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} (c_{jis} \circ p_{jis})\big(p_{ji}(\varphi) \otimes v_l\big) \\
&= \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} (c_{jis} \circ p_{jis})\Big( \sum_{m=1}^{d_j} \langle Y^m_{ji}, \varphi \rangle \, Y^m_{ji} \otimes v_l \Big) \\
&= \sum_{j \in \hat{G}} \sum_{i=1}^{m_j} \sum_{s=1}^{[J(jl)]} \sum_{m=1}^{d_j} \langle Y^m_{ji}, \varphi \rangle \cdot c_{jis}\big( p_{jis}(Y^m_{ji} \otimes v_l) \big).
\end{aligned}$$
That, finally, finishes the proof.

E EXAMPLE APPLICATIONS

In this chapter, we develop some relevant examples of the theory outlined in prior chapters. All of these examples are applications of Theorem D.16 and Corollary D.17. These examples are concerned with the following question: given a specific field $K \in \{\mathbb{R}, \mathbb{C}\}$, a compact transformation group $G$ and a homogeneous space $X$ of $G$, how can a basis of steerable kernels $K : X \to \operatorname{Hom}_K(V_l, V_J)$ for given irreducible representations $\rho_l : G \to \operatorname{U}(V_l)$ and $\rho_J : G \to \operatorname{U}(V_J)$ be determined? The theorems outline what needs to be done in order to succeed in this task, and the steps are always as follows:

1. For each $l \in \hat{G}$, a representative of the isomorphism class of irreducible representations $l$ needs to be determined. That is, one needs to determine $\rho_l : G \to \operatorname{U}(V_l)$ and an orthonormal basis $\{Y^n_l \mid n \in \{1, \dots, d_l\}\}$. We omit the index $n$ if there is only one basis element. Usually, we have $V_l = K^{d_l}$ and the orthonormal basis is just the standard basis.

2. The Peter-Weyl Theorem B.22 gives the existence statement for a decomposition of $L^2_K(X)$ into irreducible subrepresentations. We need such a decomposition explicitly, i.e., we need to find multiplicities $m_j$, irreducible subrepresentations $V_{ji} \cong V_j$ for $i \in \{1, \dots, m_j\}$, and basis functions $Y^m_{ji} \in V_{ji} \subseteq L^2_K(X)$ corresponding to the $Y^m_j$, such that $L^2_K(X) = \overline{\bigoplus_{j \in \hat{G}} \bigoplus_{i=1}^{m_j} V_{ji}}$.

3. For each combination of $j$, $l$ and $J$ in $\hat{G}$, one needs to find the number of times $[J(jl)]$ that $V_J$ appears in a direct sum decomposition of $V_j \otimes V_l$. Then, for each $s \in \{1, \dots, [J(jl)]\}$ and for all basis indices $M$, $m$ and $n$, one needs to determine the Clebsch-Gordan coefficients $\langle s, JM \mid jm; ln \rangle$. We omit the index $s$ if $V_J$ appears only once in the direct sum decomposition of $V_j \otimes V_l$.

4. For each $J$, one needs to determine a basis $\{c_r \mid r = 1, \dots, E_J\}$ of the space of endomorphisms of $V_J$, namely $\operatorname{End}_{G,K}(V_J)$.

Once all of this is done, one can simply write down the basis kernels according to Eq.
(22) or, in case the space of endomorphisms is 1-dimensional, Eq. (23). The ingredients determined above are purely representation-theoretic information about the situation at hand, which hopefully makes the reader appreciate the results even more: we do not simply determine basis kernels; along the way, we understand in detail the representation theory of the group and homogeneous space. Note that we are not concerned with practical considerations of how fine-grained to do this in practice (for example, if the space on which the kernels operate splits into infinitely many orbits). For such questions, we refer back to Remark D.19. In the following sections, we discuss harmonic networks (SO(2)-equivariant CNNs with complex representations), SO(2)-equivariant CNNs with real representations, reflection-equivariant networks, SO(3)-equivariant CNNs with both complex and real representations, and O(3)-equivariant CNNs with both complex and real representations. For each of these examples, we go through the four steps outlined above. We recommend looking at the first example in detail: we conduct it in the greatest detail, and it is the easiest to understand and thus serves as a nice introduction.
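Once the four ingredients are in hand, writing down a basis kernel is mechanical: row $M$ of $K(x)$ is the row vector of harmonics at $x$ multiplied by the $M$-th Clebsch-Gordan matrix. The following sketch implements this matrix form and checks steerability on a concrete instance; the concrete harmonics and Clebsch-Gordan numbers anticipate the real SO(2) example of Section E.2 and are stated here as assumptions.

```python
import numpy as np

# Sketch of the matrix form of Corollary D.17 (trivial endomorphisms):
# row M of K(x) is <j|x> @ CG^M.
def basis_kernel(harmonics_at_x, cg):
    """harmonics_at_x: row vector <j|x>, shape (d_j,);
    cg: Clebsch-Gordan tensor, shape (d_J, d_j, d_l)."""
    return np.stack([harmonics_at_x @ cg[M] for M in range(cg.shape[0])])

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Assumed real SO(2) ingredients: irreps j = 3, l = 1, output irrep J = j - l,
# harmonics (cos jx, sin jx), and the CG tensor for V_{j-l} inside V_j (x) V_l.
j, l = 3, 1
J = j - l
cg = np.array([[[1., 0.], [0., 1.]],
               [[0., -1.], [1., 0.]]])

def K(x):  # steerable kernel on S^1 = R/2piZ, parameterized by an angle x
    return basis_kernel(np.array([np.cos(j * x), np.sin(j * x)]), cg)

# Steerability check: K(x + phi) = rho_J(phi) K(x) rho_l(phi)^{-1}.
x, phi = 0.4, 1.1
assert np.allclose(K(x + phi), rot(J * phi) @ K(x) @ rot(l * phi).T)
```

The same routine works for any group, once the harmonics and Clebsch-Gordan tensors from steps 2 and 3 are supplied.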

E.1 SO(2)-STEERABLE KERNELS FOR COMPLEX REPRESENTATIONS - HARMONIC NETWORKS

Here, we explain how the kernel constraint of harmonic networks (Worrall et al., 2016) can be solved using our theory. In the case of harmonic networks, we have $K = \mathbb{C}$, $G = \operatorname{SO}(2)$, $X = S^1$. As in most examples that follow, we ignore the solution of the kernel constraint at the origin, since it is usually easy to solve. To simplify the formulas, we employ the isomorphism
$$\operatorname{SO}(2) \xrightarrow{\;\sim\;} \operatorname{U}(1), \quad \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \mapsto a + ib,$$
and always write $\operatorname{U}(1)$ instead of $\operatorname{SO}(2)$. Here, $\operatorname{U}(1)$ is the group of rotations of $\mathbb{C}$, i.e., the group of elements in $\mathbb{C}$ with absolute value 1. It is also called the circle group, since the group elements lie on a circle in the complex plane. Note that the change from $\operatorname{SO}(2)$ to the isomorphic group $\operatorname{U}(1)$ is done purely for convenience, and $\operatorname{SO}(2)$ could be used just as well. We now go through the four steps outlined above. Our statements about the representation theory of the circle group can be found in Kowalski (2014), Chapter 5.
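The isomorphism above can be checked numerically: sending a unit complex number to its rotation matrix turns complex multiplication into matrix multiplication. This is a quick sanity check, not part of the paper's formal development.

```python
import numpy as np

# The map a + ib  <->  [[a, -b], [b, a]] is a group isomorphism U(1) ~ SO(2):
# multiplying the matrices corresponds to multiplying the complex numbers.
def mat(z):
    a, b = z.real, z.imag
    return np.array([[a, -b], [b, a]])

z1 = np.exp(1j * 0.4)   # two rotations, written as unit complex numbers
z2 = np.exp(1j * 1.9)
assert np.allclose(mat(z1) @ mat(z2), mat(z1 * z2))
```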

E.1.1 CONSTRUCTION OF THE IRREDUCIBLE REPRESENTATIONS OF U(1)

We have $\widehat{\operatorname{U}(1)} = \mathbb{Z}$, and for $l \in \mathbb{Z}$ we can construct a representative $\rho_l : \operatorname{U}(1) \to \operatorname{U}(V_l)$ as follows: $V_l = \mathbb{C}$ is just the canonical 1-dimensional $\mathbb{C}$-vector space, and $\rho_l$ is given by $[\rho_l(g)](z) := g^l \cdot z$, where $g$ is regarded as an element of $\mathbb{C}$. One can easily check that this is an irreducible representation. The orthonormal basis element of each such representation is just given by $1 \in \mathbb{C} = V_l$. This already answers step 1 of the outline above.

E.1.2 THE PETER-WEYL THEOREM FOR $L^2_{\mathbb{C}}(S^1)$

For step 2, we need to determine the Peter-Weyl decomposition of $L^2_{\mathbb{C}}(S^1)$, where we regard $S^1$ as a subset of $\mathbb{C}$. Let $Y_{l1} : S^1 \to \mathbb{C}$ be given by $Y_{l1}(z) = z^{-l}$, and let $V_{l1} \subseteq L^2_{\mathbb{C}}(S^1)$ be its span: $V_{l1} = \operatorname{span}_{\mathbb{C}}(Y_{l1})$. We want to see that this is a subrepresentation of $L^2_{\mathbb{C}}(S^1)$. To see this, remember that the unitary representation on $L^2_{\mathbb{C}}(S^1)$ is given by $\lambda : \operatorname{U}(1) \to \operatorname{U}(L^2_{\mathbb{C}}(S^1))$ with $[\lambda(g)\varphi](z) = \varphi(g^{-1}z)$. We have
$$[\lambda(g)Y_{l1}](z) = Y_{l1}(g^{-1}z) = (g^{-1}z)^{-l} = g^l \cdot z^{-l} = g^l \cdot Y_{l1}(z)$$
and thus $\lambda(g)Y_{l1} = g^l Y_{l1} \in V_{l1}$, which is what we claimed. Since the $V_{l1}$ are 1-dimensional, they are necessarily irreducible for dimension reasons. Now, an important result from Fourier analysis is that the $Y_{l1}$ for $l \in \mathbb{Z}$ actually form an orthonormal basis of $L^2_{\mathbb{C}}(S^1)$ and that, consequently, the Peter-Weyl decomposition of $L^2_{\mathbb{C}}(S^1)$ looks as follows:
$$L^2_{\mathbb{C}}(S^1) = \overline{\bigoplus_{l \in \mathbb{Z}} V_{l1}}.$$
From this we see that the multiplicities $m_l$ are all given by 1. What is missing is the connection to the irreps $\rho_l : \operatorname{U}(1) \to \operatorname{U}(V_l)$, but we have already indicated this in the notation. Namely, the map $f_l : V_l \to V_{l1}$ given by $z \mapsto z \cdot Y_{l1}$ is clearly an isomorphism of vector spaces, and due to Eq. (24) even an isomorphism of representations:
$$f_l\big(\rho_l(g)(z)\big) = f_l(g^l \cdot z) = (g^l \cdot z) \cdot Y_{l1} = z \cdot (g^l \cdot Y_{l1}) = z \cdot \big(\lambda(g)(Y_{l1})\big) = \lambda(g)(z \cdot Y_{l1}) = \lambda(g)\big(f_l(z)\big).$$
Thus, $f_l \circ \rho_l(g) = \lambda(g) \circ f_l$ for all $g \in \operatorname{U}(1)$ and, as claimed, $f_l$ is an isomorphism.
This finishes step 2 of the outline above.
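Both facts used in step 2, the orthonormality of the $Y_{l}(z) = z^{-l}$ under the normalized measure and the transformation law $\lambda(g)Y_l = g^l Y_l$, can be verified numerically on a discrete grid. This is a sanity check on our conventions, not a proof.

```python
import numpy as np

# Sample S^1 on a uniform grid; means over the grid approximate the
# normalized integral (1/2pi) * integral over the circle (exactly, for
# the trigonometric polynomials involved here).
N = 2048
theta = 2 * np.pi * np.arange(N) / N
z = np.exp(1j * theta)

def Y(l):
    return z ** (-l)

def inner(f, h):
    # <f, h> with the left entry complex conjugated.
    return np.mean(np.conj(f) * h)

# Orthonormality of the Fourier basis functions.
assert abs(inner(Y(3), Y(3)) - 1) < 1e-12
assert abs(inner(Y(3), Y(5))) < 1e-12

# Transformation law: [lambda(g) Y_l](z) = Y_l(g^{-1} z) = g^l Y_l(z).
g, l = np.exp(1j * 0.9), 4
assert np.allclose((g ** -1 * z) ** (-l), g ** l * Y(l))
```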

E.1.3 THE CLEBSCH-GORDAN DECOMPOSITION

For step 3, we proceed as follows: the map $f : V_j \otimes V_l \to V_{j+l}$, $z_j \otimes z_l \mapsto z_j \cdot z_l$ is clearly well-defined and linear by the universal property of tensor products, see Definition D.1. Furthermore, it is an isometry: namely, since the scalar product in $\mathbb{C}$ is just the usual multiplication (with the left entry being complex conjugated), we obtain
$$\big\langle f(z_j \otimes z_l), f(z'_j \otimes z'_l) \big\rangle = \langle z_j z_l, z'_j z'_l \rangle = \overline{z_j z_l} \cdot z'_j z'_l = \overline{z_j} z'_j \cdot \overline{z_l} z'_l = \langle z_j, z'_j \rangle \cdot \langle z_l, z'_l \rangle = \langle z_j \otimes z_l, z'_j \otimes z'_l \rangle.$$
In the last step, we used the definition of the scalar product on the tensor product, Definition D.2. Thus, $f$ is an isomorphism of Hilbert spaces. Finally, it also respects the representations, since
$$f\big([(\rho_j \otimes \rho_l)(g)](z_j \otimes z_l)\big) = f\big([\rho_j(g)](z_j) \otimes [\rho_l(g)](z_l)\big) = f\big(g^j z_j \otimes g^l z_l\big) = g^j z_j \cdot g^l z_l = g^{j+l} \cdot (z_j z_l) = [\rho_{j+l}(g)]\big(f(z_j \otimes z_l)\big),$$
and thus $f \circ (\rho_j \otimes \rho_l)(g) = \rho_{j+l}(g) \circ f$ for all $g \in \operatorname{U}(1)$. Finally, the basis vectors correspond in the simplest possible way, since $f(1 \otimes 1) = 1$. Overall, what we have shown is the following: $V_J$ is a direct summand of $V_j \otimes V_l$ if and only if $J = j + l$. If this is the case, we have $[J(jl)] = 1$ and can thus omit the index $s$. The only Clebsch-Gordan coefficient is then given by $\langle J1 \mid j1; l1 \rangle = 1$, since the basis elements directly correspond.

E.1.4 ENDOMORPHISMS OF $V_J$

This is the simplest part: since we are considering representations over $\mathbb{C}$, Schur's Lemma D.8 tells us that $\operatorname{End}_{\operatorname{U}(1),\mathbb{C}}(V_J)$ is 1-dimensional for each irrep $J$, and thus we can ignore the endomorphisms altogether.
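Since all irreps of $\operatorname{U}(1)$ are 1-dimensional, both claims of step 3 reduce to scalar identities and can be spot-checked numerically; the particular numbers below are arbitrary test values.

```python
import numpy as np

# V_j (x) V_l ~ V_{j+l}: the characters multiply.
j, l = 2, -5
g = np.exp(1j * 1.1)
assert np.isclose(g ** j * g ** l, g ** (j + l))

# Isometry of f(z_j (x) z_l) = z_j * z_l: the scalar product factorizes,
# <z_j z_l, z_j' z_l'> = <z_j, z_j'> * <z_l, z_l'>.
zj, zl = 0.3 + 0.4j, -1.2 + 0.5j
zjp, zlp = 0.7 - 0.2j, 0.1 + 0.9j
lhs = np.conj(zj * zl) * (zjp * zlp)
rhs = (np.conj(zj) * zjp) * (np.conj(zl) * zlp)
assert np.isclose(lhs, rhs)
```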

E.1.5 BRINGING EVERYTHING TOGETHER

We now show that a basis of steerable kernels $K : S^1 \to \operatorname{Hom}_{\mathbb{C}}(V_l, V_J)$ of the group $\operatorname{U}(1)$ is given, when expressed as a $1 \times 1$-matrix parameterized by $S^1$, by the basis function $Y_{l-J} : S^1 \to \mathbb{C}$. We drop the index "1" from the basis function to reduce clutter. How can we see this result, using Eq. (23)? Note that $V_J$ can only appear as a direct summand of $V_j \otimes V_l$ if $j = J - l$, by what we have shown above. The "matrix" of Clebsch-Gordan coefficients $\mathrm{CG}_{J((J-l)l)}$ is then just the number 1. We can omit the vacuous indices $i$ and $s$ and obtain that the only basis kernel is given by
$$K_{J-l}(x) = \langle J-l \mid x \rangle = \overline{Y_{J-l}(x)} = \overline{x^{-(J-l)}} = x^{-(l-J)} = Y_{l-J}(x).$$
This result precisely matches the one obtained in the original paper (Worrall et al., 2016). This concludes our investigation of harmonic networks.
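The resulting angular frequency, and in particular the role of the complex conjugation in $\langle J-l \mid x \rangle$, can be checked directly against the kernel constraint $K(gx) = \rho_J(g)\,K(x)\,\rho_l(g)^{-1}$. The concrete values of $l$ and $J$ below are arbitrary.

```python
import numpy as np

# The basis kernel of harmonic networks as a 1x1 matrix on S^1:
# K(x) = Y_{l-J}(x) = x^{J-l}, with rho_l(g) z = g^l z on V_l = C.
l, J = 1, 3
K = lambda x: x ** (J - l)

x = np.exp(1j * 0.7)   # a point on the circle
g = np.exp(1j * 0.9)   # a group element of U(1)

# Steerability: K(g x) = g^J K(x) g^{-l}.
assert np.isclose(K(g * x), g ** J * K(x) * g ** (-l))

# Dropping the conjugation, i.e. using x^{-(J-l)} = x^{l-J}, violates
# the constraint (the two sides differ by g^{2(J-l)}).
assert not np.isclose((g * x) ** (l - J), g ** J * x ** (l - J) * g ** (-l))
```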

E.2 SO(2)-STEERABLE KERNELS FOR REAL REPRESENTATIONS

In this section, we look at the case $K = \mathbb{R}$, $G = \operatorname{SO}(2)$ and $X = S^1$. In the following sections, we again determine, step by step, the representation-theoretic ingredients that we need for the application of our theorem. Compared to Chapter A, which focuses more on the components themselves and how they relate to the general situation, this section has a stronger focus on actually determining the final kernels, which also involves explicitly determining the Clebsch-Gordan coefficients. We remark that the resulting kernels are not new, since Weiler & Cesa (2019) have already solved for this kernel basis. However, we want to emphasize again that with our method, we learn more about the representation theory of $\operatorname{SO}(2)$ and thus get an overall better conceptual understanding of how the kernels arise. Since it helps the presentation of our results, we set $\operatorname{SO}(2) = \mathbb{R}/2\pi\mathbb{Z}$, i.e., we view $\operatorname{SO}(2)$ as a group of angles. We also set $S^1 = \mathbb{R}/2\pi\mathbb{Z}$, i.e., we take the interval $[0, 2\pi]$ as the space on which our functions are defined. Consequently, since we want our Haar measure to be normalized, we have to put the factor $\frac{1}{2\pi}$ in front of all of our integrals, different from what we did in our treatment of $\operatorname{SO}(2)$ over $\mathbb{C}$. Note that since we now consider representations over the real numbers, unitary representations become orthogonal, and we write $\operatorname{O}(V)$ instead of $\operatorname{U}(V)$.

E.2.1 CONSTRUCTION OF THE IRREDUCIBLE REPRESENTATIONS OF SO(2)

The irreps of SO(2) over R are given by ρ_l : SO(2) → O(V_l), l ∈ N_{≥0}. For l = 0, we have V_0 = R with the trivial action. For l ≥ 1, V_l = R^2 as a vector space, and the action is given by

ρ_l(φ)(v) = [cos(lφ), -sin(lφ); sin(lφ), cos(lφ)] · v for φ ∈ SO(2) = R/2πZ.

In both cases, the orthonormal basis is just given by the standard basis vectors.

E.2.2 THE PETER-WEYL THEOREM FOR L^2_R(S^1)

Now we look at the square-integrable functions L^2_R(S^1), which we now assume to take real values. As before, SO(2) acts on this space by (λ(φ)f)(x) = f(x - φ). For notational simplicity, we write cos_l for the function that maps x to cos(lx), and analogously sin_l. One can then show the following, which is a standard result in Fourier analysis:

Proposition E.1. The functions cos_l, sin_l, l ≥ 1, span an irreducible invariant subspace of L^2_R(S^1) of dimension 2, explicitly given by span_R(cos_l, sin_l) = {α cos_l + β sin_l | α, β ∈ R}, which is isomorphic as an orthogonal representation to V_l via √2 cos_l ↦ [1; 0] and √2 sin_l ↦ [0; 1]. Furthermore, sin_0 = 0 and cos_0 = 1 are constant functions; their span is 1-dimensional and equivariantly isomorphic to V_0 via cos_0 ↦ 1. Finally, the functions √2 cos_l, √2 sin_l form an orthonormal basis of L^2_R(S^1), i.e., every function can be written uniquely as a (possibly infinite) linear combination of these basis functions.

Setting V_{l1} = span_R(cos_l, sin_l), we thus obtain a decomposition L^2_R(S^1) = ⊕_{l≥0} V_{l1}. Thus, we have m_l = 1 for all l ∈ N. All in all, we know everything there is to know about the Peter-Weyl theorem in our situation.
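The isomorphism in Proposition E.1 can be verified numerically: shifting a function in span(cos_l, sin_l) by φ acts on its coordinate vector exactly by the rotation matrix ρ_l(φ). A minimal NumPy sketch (our own illustration):

```python
import numpy as np

def rho(l, phi):
    """Real irrep of SO(2): rotation by the angle l * phi."""
    c, s = np.cos(l * phi), np.sin(l * phi)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(1)
xs = rng.uniform(0, 2 * np.pi, 100)  # sample points on S^1

for l in [1, 2, 5]:
    phi = rng.uniform(0, 2 * np.pi)
    # a function f = a*cos_l + b*sin_l with coordinate vector v = (a, b)
    v = rng.normal(size=2)
    f_shifted = v[0] * np.cos(l * (xs - phi)) + v[1] * np.sin(l * (xs - phi))
    # claim: (lambda(phi) f) has coordinate vector rho_l(phi) @ v
    w = rho(l, phi) @ v
    g = w[0] * np.cos(l * xs) + w[1] * np.sin(l * xs)
    assert np.allclose(f_shifted, g)
print("span(cos_l, sin_l) transforms like V_l")
```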

E.2.3 THE CLEBSCH-GORDAN DECOMPOSITION

We now carry out the explicit decomposition of V_j ⊗ V_l into irreps, which will give us the Clebsch-Gordan coefficients that we need. Instead of doing the decomposition in terms of V_j and V_l themselves, in the proofs we actually use their isomorphic images V_{j1} and V_{l1} in L^2_R(S^1). For doing so, we first need some trigonometric formulas at our disposal:

Lemma E.2. The sine and cosine functions fulfill the following rules:
1. cos_{j+l} = cos_j cos_l - sin_j sin_l.
2. sin_{j+l} = sin_j cos_l + cos_j sin_l.
3. cos_{j-l} = cos_j cos_l + sin_j sin_l.
4. sin_{j-l} = sin_j cos_l - cos_j sin_l.

Proof. The first two are well known, and the last two follow directly from the first two using sin_{-j} = -sin_j and cos_{-j} = cos_j.

We will also need the following general lemma:

Lemma E.3. Let f : V → V' be an intertwiner between representations ρ : G → GL(V) and ρ' : G → GL(V'). Then null(f) = {v ∈ V | f(v) = 0} is an invariant linear subspace of V.

Proof. This can easily be checked by the reader.

As a remark on notation for the following proposition: we write the Clebsch-Gordan coefficients CG_{J(jl)s} of irreps V_J, V_j and V_l with dimensions d_J, d_j and d_l as a d_J × (d_j × d_l)-tensor. That is, it consists of d_J "rows", each of which is a d_j × d_l-matrix. Below, we display such a tensor as a list of its d_J rows, each row being a matrix written as [a, b; c, d] (with ";" separating the matrix rows). If V_J appears only once in the tensor product, we omit the index s as before.

Proposition E.4. We have the following decomposition results:
1. For j = l = 0 we have V_0 ⊗ V_0 ≅ V_0 and Clebsch-Gordan coefficients CG_{0(00)} = ([1]).
2. For j = 0, l > 0 we have V_0 ⊗ V_l ≅ V_l and Clebsch-Gordan coefficients CG_{l(0l)} = ([1, 0], [0, 1]).
3. For j > 0, l = 0 we get V_j ⊗ V_0 ≅ V_j and Clebsch-Gordan coefficients CG_{j(j0)} = ([1; 0], [0; 1]).
4. For j > l > 0 we get V_j ⊗ V_l ≅ V_{j-l} ⊕ V_{j+l}. The Clebsch-Gordan coefficients are given by CG_{j-l,(jl)} = ([1, 0; 0, 1], [0, -1; 1, 0]) and CG_{j+l,(jl)} = ([1, 0; 0, -1], [0, 1; 1, 0]).
5. For l > j > 0 we get V_j ⊗ V_l ≅ V_{l-j} ⊕ V_{j+l}.
The Clebsch-Gordan coefficients are given by CG_{l-j,(jl)} = ([1, 0; 0, 1], [0, 1; -1, 0]) and CG_{j+l,(jl)} = ([1, 0; 0, -1], [0, 1; 1, 0]).
6. For j = l > 0 we get an isomorphism V_l ⊗ V_l ≅ V_0^2 ⊕ V_{2l}. We obtain the Clebsch-Gordan coefficients CG_{0(ll)1} = ([1, 0; 0, 1]), CG_{0(ll)2} = ([0, ∓1; ±1, 0]), and CG_{2l,(ll)} = ([1, 0; 0, -1], [0, 1; 1, 0]), the last one being the same as the Clebsch-Gordan coefficients CG_{j+l,(jl)} from above. In CG_{0(ll)1} and CG_{0(ll)2}, a fourth index is present, namely 1 and 2, respectively. This is the index "s" that was missing in all the prior examples, since this is the first time an irrep appears more than once in a tensor product decomposition. Note that CG_{0(ll)2} contains exactly one positive and one negative entry; both sign choices are equally valid and mirror the lower halves of CG_{j-l,(jl)} from part 4 and CG_{l-j,(jl)} from part 5.

Proof. In the proof, instead of working directly with the irreps ρ_j : SO(2) → O(V_j), we use the isomorphic copies V_{j1} in L^2_R(S^1) given in Proposition E.1. Since carrying the index "1" through all computations would not aid understanding, we omit it. Parts 1, 2, and 3 are clear. For 4 and 5, consider the (unnormalized) basis {cos_j ⊗ cos_l, cos_j ⊗ sin_l, sin_j ⊗ cos_l, sin_j ⊗ sin_l} of V_j ⊗ V_l. Our goal is to express these basis elements with respect to basis elements of invariant subspaces. We do this by explicitly constructing an isomorphism onto a direct sum of irreps. To that end, let p : V_j ⊗ V_l → L^2_R(S^1) be given by f ⊗ g ↦ f · g, which is clearly a well-defined intertwiner. As image of p we get im(p) = span_R(p(cos_j ⊗ cos_l), p(cos_j ⊗ sin_l), p(sin_j ⊗ cos_l), p(sin_j ⊗ sin_l)) = span_R(cos_j · cos_l, cos_j · sin_l, sin_j · cos_l, sin_j · sin_l).
From Lemma E.2 we obtain:
(a) p(cos_j ⊗ cos_l) - p(sin_j ⊗ sin_l) = cos_{j+l},
(b) p(cos_j ⊗ sin_l) + p(sin_j ⊗ cos_l) = sin_{j+l},
(c) p(cos_j ⊗ cos_l) + p(sin_j ⊗ sin_l) = cos_{j-l},
(d) p(sin_j ⊗ cos_l) - p(cos_j ⊗ sin_l) = sin_{j-l}. (25)

Since the right-hand sides are linearly independent basis functions of L^2_R(S^1), we obtain im(p) = span_R(cos_{j+l}, sin_{j+l}, cos_{j-l}, sin_{j-l}) = V_{|j-l|} ⊕ V_{j+l}. Note for the last step that, due to symmetry, cos_{j-l} = cos_{l-j} and sin_{j-l} = -sin_{l-j}. We now specialize to case 4, i.e., j > l > 0. In this case, V_{|j-l|} = V_{j-l}, and the basis is given by cos_{j-l} and sin_{j-l}, as in the right-hand sides of Eq. (25) (c) and (d). Consequently, Eq. (25) is already the expansion of the new basis elements in terms of the old, and the coefficients are consequently the Clebsch-Gordan coefficients. More precisely, if we want to compute, for example, CG_{j-l,(jl)}, then we observe from (c) that cos_{j-l} = +1 · p(cos_j ⊗ cos_l) + 0 · p(cos_j ⊗ sin_l) + 0 · p(sin_j ⊗ cos_l) + 1 · p(sin_j ⊗ sin_l), from which we can already read off the upper half of CG_{j-l,(jl)} as the coefficients in this equation (which we conveniently arranged visually in the right way). For the lower half, we proceed in the same way for sin_{j-l}, using (d). Then, for CG_{j+l,(jl)}, we proceed exactly the same, using parts (a) and (b). That proves 4. For 5, we have l > j > 0. In this case, V_{|j-l|} = V_{l-j}, i.e., the basis is given by cos_{l-j} = cos_{j-l} and sin_{l-j} = -sin_{j-l}. The latter means that in part (d) of Eq. (25), we need to replace sin_{j-l} by sin_{l-j} and thus change the signs on the left-hand side. This change means that CG_{j+l,(jl)} remains the same as in 4, the upper half of CG_{l-j,(jl)} remains the same as the upper half of CG_{j-l,(jl)} from part 4 since the cosine in part (c) of Eq. (25) is symmetric, and the lower half flips the signs. This fully proves 5. Finally, we prove 6. We have j = l and still consider the same function p.
Note that p(cos_j ⊗ cos_l) + p(sin_j ⊗ sin_l) = 1 and p(sin_j ⊗ cos_l) - p(cos_j ⊗ sin_l) = 0 are constant functions whose span is the 1-dimensional trivial representation. Thus, we see that p is a surjection p : V_l ⊗ V_l → V_0 ⊕ V_{2l} with null space spanned by sin_j ⊗ cos_l - cos_j ⊗ sin_l. Such a null space is automatically an invariant subspace as well (Lemma E.3), and since it is one-dimensional, it must be isomorphic to the trivial representation. Overall, this gives us an isomorphism V_l ⊗ V_l ≅ V_0^2 ⊕ V_{2l}. From this, we can read off the Clebsch-Gordan coefficients as before. The only thing that changes is that parts (c) and (d) of Eq. (25) now correspond to two different copies of V_0, which means that the Clebsch-Gordan coefficients CG_{0(ll)} now split up into two parts CG_{0(ll)1} and CG_{0(ll)2}. Note that in the trivial representation, the isomorphism that sends the basis vector to its negative is clearly equivariant, which means that both sign choices that we give in the final formula for CG_{0(ll)2} are valid.
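The decomposition in part 4 can be checked numerically. In the sketch below (our own illustration, not part of the original derivation), each Clebsch-Gordan tensor is flattened into a 2 × 4 matrix acting on coordinates with respect to the product basis, and we verify that it intertwines ρ_j ⊗ ρ_l with ρ_{j-l} and ρ_{j+l}, respectively:

```python
import numpy as np

def rho(l, phi):
    """Real irrep of SO(2): rotation by the angle l * phi."""
    c, s = np.cos(l * phi), np.sin(l * phi)
    return np.array([[c, -s], [s, c]])

# Clebsch-Gordan tensors from Proposition E.4, case 4, flattened to 2x4
# matrices acting on coordinates w.r.t. the product basis
# (cos_j cos_l, cos_j sin_l, sin_j cos_l, sin_j sin_l):
CG_minus = np.array([[1, 0, 0, 1],    # row for cos_{j-l}, Eq. (25)(c)
                     [0, -1, 1, 0]])  # row for sin_{j-l}, Eq. (25)(d)
CG_plus = np.array([[1, 0, 0, -1],   # row for cos_{j+l}, Eq. (25)(a)
                    [0, 1, 1, 0]])   # row for sin_{j+l}, Eq. (25)(b)

rng = np.random.default_rng(2)
for j, l in [(2, 1), (3, 1), (5, 2)]:  # instances of case 4: j > l > 0
    for _ in range(10):
        phi = rng.uniform(0, 2 * np.pi)
        tensor = np.kron(rho(j, phi), rho(l, phi))  # rho_j ⊗ rho_l
        # the CG matrices intertwine rho_j ⊗ rho_l with rho_{j-l}, rho_{j+l}
        assert np.allclose(CG_minus @ tensor, rho(j - l, phi) @ CG_minus)
        assert np.allclose(CG_plus @ tensor, rho(j + l, phi) @ CG_plus)
print("V_j ⊗ V_l ≅ V_{j-l} ⊕ V_{j+l} verified")
```

Note that `np.kron` orders the product basis exactly as (cc, cs, sc, ss), matching the convention of the proof.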

E.2.4 ENDOMORPHISMS OF V J

We now describe the endomorphisms of the irreducible representations, our last ingredient:

Proposition E.5. We have End_{SO(2),R}(V_0) ≅ R, i.e., multiplication with any real number is a valid endomorphism of V_0. For l ≥ 1, we get End_{SO(2),R}(V_l) = {[a, -b; b, a] | a, b ∈ R}, which is the set of all scaled rotations of R^2. Identifying R^2 ≅ C, we can also view these transformations as multiplications with arbitrary complex numbers. As a consequence, id_R is a basis for End_{SO(2),R}(V_0), and [1, 0; 0, 1], [0, -1; 1, 0] form a basis for End_{SO(2),R}(V_l) for l ≥ 1.

Proof Sketch. An endomorphism of V_l is a matrix commuting with all ρ_l(φ). For l ≥ 1 the matrices ρ_l(φ) range over all rotations of R^2, and a direct computation shows that the matrices commuting with all rotations are exactly the scaled rotations above.

Remark (on the normalization of the Clebsch-Gordan coefficients). Over the real numbers, the scalar product is symmetric, and these coefficients are thus also the expansion coefficients when expressing {c_i} in terms of {b_j}. This is why we do not have to rearrange the expressions in Eq. (25): it simply does not matter which of the two bases is expanded. Note, however, that our bases are not normalized, and so the Clebsch-Gordan coefficients differ by a constant if the equation is rearranged. This constant does not matter for us, since we are only interested in a basis of the space of steerable kernels, and constant multiples of bases are still bases.
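The claim of Proposition E.5 is easy to test numerically: a matrix is an endomorphism of V_l precisely if it commutes with every rotation ρ_l(φ). A small sketch (our own illustration):

```python
import numpy as np

def rho(l, phi):
    """Real irrep of SO(2): rotation by the angle l * phi."""
    c, s = np.cos(l * phi), np.sin(l * phi)
    return np.array([[c, -s], [s, c]])

def commutes_with_all_rotations(M, l, n_samples=100):
    """Check M @ rho_l(phi) == rho_l(phi) @ M on a grid of angles."""
    phis = np.linspace(0, 2 * np.pi, n_samples)
    return all(np.allclose(M @ rho(l, p), rho(l, p) @ M) for p in phis)

l = 3
a, b = 1.7, -0.4
scaled_rotation = np.array([[a, -b], [b, a]])     # element of End(V_l)
reflection = np.array([[1.0, 0.0], [0.0, -1.0]])  # not equivariant

assert commutes_with_all_rotations(scaled_rotation, l)
assert not commutes_with_all_rotations(reflection, l)
```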

E.2.5 BRINGING EVERYTHING TOGETHER

Now we have completed all the necessary preparation and can solve the kernel constraint explicitly, using the matrix form of the Wigner-Eckart theorem for steerable kernels, Theorem D.16. This is, as mentioned before, a new derivation of the results in Weiler & Cesa (2019); one can compare with Table 8 in their appendix, which differs only by (irrelevant) constants.

Proposition E.6. We consider steerable kernels K : S^1 → Hom_R(V_l, V_J), where V_l and V_J are irreducible representations of SO(2). Then the following holds:
1. For l = J = 0, we get K(x) = a · (1) for every x ∈ S^1, with an arbitrary real number a ∈ R independent of x.
2. For l = 0, J > 0, a basis for the steerable kernels is given by the column vectors [cos_J; sin_J] and [-sin_J; cos_J].
3. For l > 0 and J = 0, a basis for the steerable kernels is given by the row vectors [cos_l, sin_l] and [sin_l, -cos_l].
4. For l, J > 0, a basis for the steerable kernels is given by [cos_{J-l}, -sin_{J-l}; sin_{J-l}, cos_{J-l}], [-sin_{J-l}, -cos_{J-l}; cos_{J-l}, -sin_{J-l}], [cos_{J+l}, sin_{J+l}; sin_{J+l}, -cos_{J+l}], and [-sin_{J+l}, cos_{J+l}; cos_{J+l}, sin_{J+l}].

Proof. Part 1 is clear. For 2, note that V_J can only appear in V_j ⊗ V_0 if j = J. The relevant Clebsch-Gordan coefficients are therefore, by Proposition E.4, CG_{J(J0)} = ([1; 0], [0; 1]). Furthermore, the orthonormal basis of V_{j1} = V_{J1} is, by Proposition E.1 and up to constants, given by {cos_J, sin_J}, which we have to write as a row vector according to Theorem D.16; we can thereby ignore the complex conjugation, since we work over the real numbers. Our final ingredient is the endomorphism basis of V_J, which by Proposition E.5 is given by c_1 = id_{R^2} and c_2 = [0, -1; 1, 0]. Overall, the basis kernels are given by

c_i ∘ [[cos_J, sin_J] · [1; 0]; [cos_J, sin_J] · [0; 1]] = c_i ∘ [cos_J; sin_J],

and the result follows. For 3, we find V_0 in V_j ⊗ V_l only if j = l, and then even twice. The relevant Clebsch-Gordan coefficients are therefore, by Proposition E.4, CG_{0(ll)1} = ([1, 0; 0, 1]) and CG_{0(ll)2} = ([0, -1; 1, 0]).
The basis functions of V_{j1} = V_{l1} are, by Proposition E.1 and up to constants, {cos_l, sin_l}, again written as a row vector. Finally, V_J = V_0 has only id_R as basis endomorphism by Proposition E.5, so the endomorphisms can be ignored altogether by Corollary D.17. We obtain the following basis for the steerable kernels:

[cos_l, sin_l] · [1, 0; 0, 1] = [cos_l, sin_l], [cos_l, sin_l] · [0, -1; 1, 0] = [sin_l, -cos_l].

For 4, we consider only the case l < J. By Proposition E.4 we have V_{J-l} ⊗ V_l ≅ V_{|2l-J|} ⊕ V_J and V_{l+J} ⊗ V_l ≅ V_J ⊕ V_{2l+J}, i.e., j = J - l and j = l + J lead to tensor product decompositions containing V_J, but no other j does. Thus, the relevant Clebsch-Gordan coefficients are, by Proposition E.4, the tensors CG_{J,(J-l,l)} and CG_{J,(l+J,l)}. We first consider the case j = J - l. The Clebsch-Gordan coefficients are CG_{J,(J-l,l)} = CG_{j+l,(jl)} = ([1, 0; 0, -1], [0, 1; 1, 0]). The basis functions of V_{(J-l)1} are, by Proposition E.1, given by {cos_{J-l}, sin_{J-l}}. Finally, V_J again has the two basis endomorphisms c_1 = id_{R^2} and c_2 from above. Thus, for c_1 we obtain the basis kernel

c_1 ∘ [[cos_{J-l}, sin_{J-l}] · [1, 0; 0, -1]; [cos_{J-l}, sin_{J-l}] · [0, 1; 1, 0]] = [cos_{J-l}, -sin_{J-l}; sin_{J-l}, cos_{J-l}]. (26)

For c_2 as the basis endomorphism, we postcompose with c_2 and get

[0, -1; 1, 0] · [cos_{J-l}, -sin_{J-l}; sin_{J-l}, cos_{J-l}] = [-sin_{J-l}, -cos_{J-l}; cos_{J-l}, -sin_{J-l}]. (27)

These are half of the basis kernels. For the other half, we need to look at the case j = l + J. The Clebsch-Gordan coefficients are, by part 4 of Proposition E.4, given by CG_{J,(l+J,l)} = CG_{j-l,(jl)} = ([1, 0; 0, 1], [0, -1; 1, 0]). The basis functions of V_{(l+J)1} are, by Proposition E.1, given by {cos_{J+l}, sin_{J+l}}. For the basis endomorphism c_1 we thus get the basis kernel

c_1 ∘ [[cos_{J+l}, sin_{J+l}] · [1, 0; 0, 1]; [cos_{J+l}, sin_{J+l}] · [0, -1; 1, 0]] = [cos_{J+l}, sin_{J+l}; sin_{J+l}, -cos_{J+l}]. (28)
For c_2 as the basis endomorphism, we again postcompose with c_2 and get

[0, -1; 1, 0] · [cos_{J+l}, sin_{J+l}; sin_{J+l}, -cos_{J+l}] = [-sin_{J+l}, cos_{J+l}; cos_{J+l}, sin_{J+l}]. (29)

Overall, for the case l < J we have determined all four basis kernels in Eqs. (26), (27), (28), and (29). The cases l = J and l > J can be treated analogously, where in each case the correct Clebsch-Gordan coefficients have to be picked. Using cos_{l-J} = cos_{J-l} and sin_{l-J} = -sin_{J-l}, this will in the end always lead to the same final formulas. This result is consistent with Table 8 in Weiler & Cesa (2019).
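All four basis kernels of Proposition E.6, part 4, can be checked directly against the steerability constraint. The sketch below (our own illustration) uses the convention K(x + φ) = ρ_J(φ) K(x) ρ_l(φ)^{-1}, where φ acts on S^1 by translating the angle:

```python
import numpy as np

def rho(l, phi):
    """Real irrep of SO(2): rotation by the angle l * phi."""
    c, s = np.cos(l * phi), np.sin(l * phi)
    return np.array([[c, -s], [s, c]])

def basis_kernels(l, J, x):
    """The four basis kernels of Proposition E.6 (case l, J > 0) at x in S^1."""
    a, b = J - l, J + l
    K1 = np.array([[np.cos(a*x), -np.sin(a*x)], [np.sin(a*x),  np.cos(a*x)]])
    K2 = np.array([[-np.sin(a*x), -np.cos(a*x)], [np.cos(a*x), -np.sin(a*x)]])
    K3 = np.array([[np.cos(b*x),  np.sin(b*x)], [np.sin(b*x), -np.cos(b*x)]])
    K4 = np.array([[-np.sin(b*x), np.cos(b*x)], [np.cos(b*x),  np.sin(b*x)]])
    return [K1, K2, K3, K4]

rng = np.random.default_rng(3)
for l, J in [(1, 2), (2, 5), (3, 1), (2, 2)]:
    for _ in range(10):
        x, phi = rng.uniform(0, 2 * np.pi, size=2)
        for K_x, K_shift in zip(basis_kernels(l, J, x),
                                basis_kernels(l, J, x + phi)):
            # steerability: K(phi . x) = rho_J(phi) K(x) rho_l(phi)^{-1};
            # rho_l(phi)^{-1} = rho_l(phi).T since the irreps are orthogonal
            assert np.allclose(K_shift, rho(J, phi) @ K_x @ rho(l, phi).T)
print("all four basis kernels are SO(2)-steerable")
```

The loop also covers l ≥ J, confirming the remark that the same formulas remain valid in those cases.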

E.3 Z 2 -STEERABLE KERNELS FOR REAL REPRESENTATIONS

In this section, we discuss steerable CNNs that use the finite group Z_2, which we identify with ({-1, +1}, ·), for their symmetries. We let this group act on the plane R^2 by vertical reflections, though other choices are possible as well: x · (a, b)^T = (xa, b)^T. This example is simple, and one may see it as contrived to apply our relatively heavy theory to it. We include it mainly as a demonstration that our results also apply to finite (and hence non-smooth) groups as instances of compact groups. Furthermore, we will fully recover the relationship to the original group convolutional CNNs from Cohen & Welling (2016a) and thereby demonstrate that all the different developed theories are consistent with each other.

E.3.1 THE IRREDUCIBLE REPRESENTATIONS OF Z_2 OVER THE REAL NUMBERS

Let ρ : Z_2 → GL(V) be an irreducible real representation. Note that ρ(-1) ∘ ρ(-1) = ρ((-1) · (-1)) = ρ(1) = id_V, and thus ρ(-1) is an involution, satisfying the equation ρ(-1)^2 - id_V = 0. It is well known from linear algebra that involutions are diagonalizable, and thus ρ(-1) leaves 1-dimensional subspaces invariant. By irreducibility of ρ, this means that V itself needs to be 1-dimensional. Consequently, we can assume V = R without loss of generality. The computation above means that we have (ρ(-1) - id_R) ∘ (ρ(-1) + id_R) = 0, and thus ρ(-1) - id_R = 0 or ρ(-1) + id_R = 0. It follows that ρ(-1) = id_R or ρ(-1) = -id_R. Overall, these investigations show that there are precisely two irreducible representations of Z_2 up to equivalence. We call them ρ_+ : Z_2 → O(V_+) and ρ_- : Z_2 → O(V_-), where ρ_+(-1) = id_R, ρ_-(-1) = -id_R, and V_+ = V_- = R.

E.3.2 THE PETER-WEYL THEOREM FOR L^2_R(X)

Here we carry out the Peter-Weyl decomposition for L^2_R(X), where X is one of the two homogeneous spaces X = {-1, 1} and X = {0}, with the obvious actions coming from the group Z_2. This time, we also discuss orbits with only one point, since we later want a description of kernels on the whole of R^2 for comparison with group convolutional CNNs. We start with X = {-1, 1}. Note that the measure on X is just the normalized counting measure, and thus all functions f : X → R are square-integrable. We define the two functions f_+ : X → R, f_+(x) = 1 for all x ∈ X, and f_- : X → R, f_-(x) = x for all x ∈ X. We then define V_{+1} = span_R(f_+) and V_{-1} = span_R(f_-). This gives a decomposition L^2_R(X) = V_{+1} ⊕ V_{-1}, since for all f ∈ L^2_R(X) we have

f = ((f(1) + f(-1))/2) · f_+ + ((f(1) - f(-1))/2) · f_-.

Furthermore, the maps 1 ↦ f_+ and 1 ↦ f_- give isomorphisms of representations V_+ ≅ V_{+1} and V_- ≅ V_{-1}, respectively. Now assume that X = {0} with the trivial action of Z_2.
Then L^2_R(X) = V_{+1} is generated by the function f_+ : X → R, f_+(0) = 1. As before, 1 ↦ f_+ gives an isomorphism V_+ ≅ V_{+1}. This concludes our investigation of the Peter-Weyl theorem.
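Concretely, the decomposition L^2_R(X) = V_{+1} ⊕ V_{-1} for X = {-1, 1} is just the splitting of a function into its even and odd parts, as the following small sketch (our own illustration) shows:

```python
# The decomposition L^2(X) = V_{+1} ⊕ V_{-1} for X = {-1, +1} amounts to
# splitting a function into its even and odd parts:
def decompose(f):
    """Coefficients of f : {-1,+1} -> R w.r.t. the basis {f_+, f_-}."""
    c_plus = (f(1) + f(-1)) / 2   # coefficient of f_+ (f_+(x) = 1)
    c_minus = (f(1) - f(-1)) / 2  # coefficient of f_- (f_-(x) = x)
    return c_plus, c_minus

f = lambda x: 3.0 if x == 1 else -7.0
c_plus, c_minus = decompose(f)
for x in (-1, 1):
    assert f(x) == c_plus * 1 + c_minus * x  # f = c_+ f_+ + c_- f_-
```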

E.3.3 THE CLEBSCH-GORDAN DECOMPOSITION

We have the following four isomorphisms of representations: V_+ ⊗ V_+ ≅ V_+, V_+ ⊗ V_- ≅ V_-, V_- ⊗ V_+ ≅ V_-, V_- ⊗ V_- ≅ V_+, each given simply by a ⊗ b ↦ ab. It can easily be checked that these are isomorphisms. In Section E.6.3, the reader can find a proof of similar, sign-dependent isomorphisms in the case that the group is O(3). For each such isomorphism, there is precisely one Clebsch-Gordan coefficient, and it is just given by 1. Thus, as in the case of harmonic networks in Section E.1.5, we can simply ignore the Clebsch-Gordan coefficients altogether in the final formulas for our basis kernels.

E.3.4 ENDOMORPHISMS OF V_+ AND V_-

Since V_+ and V_- are 1-dimensional, the endomorphism spaces are necessarily 1-dimensional as well, given by arbitrary 1 × 1-matrices, i.e., arbitrary scalings. As in the example of harmonic networks, we can therefore ignore the endomorphisms as well.

E.3.6 GROUP CONVOLUTIONAL CNNS FOR Z 2

We now investigate what all this means if we consider regular representations instead of irreducible representations, corresponding to group convolutional kernels as in Cohen & Welling (2016a). In this case, we will see an interesting "twist" in the kernel, which makes this example more interesting than one might initially think. The twist emerges as follows. For regular representations, we consider steerable kernels K : R^2 → Hom_R(L^2_R(Z_2), L^2_R(Z_2)). Now, there are two relatively canonical bases we can choose in the left and the right space. We already know from above that {f_+, f_-} is the basis to choose if we want to express steerable kernels corresponding to irreducible representations. However, for vanilla group convolutional CNNs, the basis usually chosen is {e_{+1}, e_{-1}}, where e_{+1}(x) = δ_{+1,x} and e_{-1}(x) = δ_{-1,x}. We then obtain the following four base change relations: f_+ = e_{+1} + e_{-1}, f_- = e_{+1} - e_{-1}, e_{+1} = (1/2)f_+ + (1/2)f_-, e_{-1} = (1/2)f_+ - (1/2)f_-. Thus, the base change matrices are given by B = [1, 1; 1, -1] and B^{-1} = [1/2, 1/2; 1/2, -1/2]. Now, assume that K : R^2 → Hom_R(L^2_R(Z_2), L^2_R(Z_2)) ≅ R^{2×2} is expressed with respect to the basis {f_+, f_-}. If we write K as a matrix K = [K_11, K_12; K_21, K_22], then we know that K_11 and K_22 map between equal-sign representations, and K_12 and K_21 between unequal-sign representations. Consequently, from what we found above, K_11 and K_22 are symmetric, whereas K_12 and K_21 are antisymmetric. What we now want to figure out is how exactly this translates into a property of the kernel expressed in the basis {e_{+1}, e_{-1}}. Thus, let K' be this corresponding kernel. Using the base change matrices above, we obtain

K' = [K'_11, K'_12; K'_21, K'_22] = B · K · B^{-1} = [1, 1; 1, -1] · [K_11, K_12; K_21, K_22] · [1/2, 1/2; 1/2, -1/2] = (1/2) [K_11 + K_12 + K_21 + K_22, K_11 - K_12 + K_21 - K_22; K_11 + K_12 - K_21 - K_22, K_11 - K_12 - K_21 + K_22].

What symmetry properties does this kernel obey?
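Before answering, the base change computation itself can be verified numerically; the following sketch (our own illustration) checks both B · B^{-1} = id and the stated entrywise formula for a random kernel matrix:

```python
import numpy as np

B = np.array([[1., 1.], [1., -1.]])
B_inv = np.array([[0.5, 0.5], [0.5, -0.5]])
assert np.allclose(B @ B_inv, np.eye(2))

rng = np.random.default_rng(4)
K = rng.normal(size=(2, 2))  # kernel matrix w.r.t. the basis {f_+, f_-}
K_prime = B @ K @ B_inv      # same kernel w.r.t. the basis {e_+1, e_-1}

K11, K12, K21, K22 = K[0, 0], K[0, 1], K[1, 0], K[1, 1]
expected = 0.5 * np.array([[K11 + K12 + K21 + K22, K11 - K12 + K21 - K22],
                           [K11 + K12 - K21 - K22, K11 - K12 - K21 + K22]])
assert np.allclose(K_prime, expected)
```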
To understand this, we use the following convention: for y ∈ R^2, we set -y = (-y_1, y_2)^T, i.e., the vertically flipped version of y. Using the symmetry and antisymmetry of the entries of the original kernel K, we then have

K'_21(y) = (1/2)[K_11 + K_12 - K_21 - K_22](y) = (1/2)[K_11 - K_12 + K_21 - K_22](-y) = K'_12(-y),
K'_22(y) = (1/2)[K_11 - K_12 - K_21 + K_22](y) = (1/2)[K_11 + K_12 + K_21 + K_22](-y) = K'_11(-y).

Thus the second row of K' is basically the same as the first, only that the kernels swap with each other and are internally flipped. This is a special case of the outcome in Cohen & Welling (2016a), which is also described clearly in Weiler et al. (2018b): in group convolutional kernels which are steerable with respect to finite groups, the kernels get copied and applied in all orientations demanded by the group. What we would still like to understand is whether we can also reverse the direction. That is, assume that we start with a group convolutional kernel K' of which we know that K'_22(-y) = K'_11(y) and K'_21(-y) = K'_12(y) for all y ∈ R^2. If we then perform the base change, we would like to know whether the resulting kernel consists of symmetric and antisymmetric entries. Namely, set

K = [K_11, K_12; K_21, K_22] = B^{-1} · K' · B = [1/2, 1/2; 1/2, -1/2] · [K'_11, K'_12; K'_21, K'_22] · [1, 1; 1, -1] = (1/2) [K'_11 + K'_12 + K'_21 + K'_22, K'_11 - K'_12 + K'_21 - K'_22; K'_11 + K'_12 - K'_21 - K'_22, K'_11 - K'_12 - K'_21 + K'_22].

The reader can easily check that K_11 and K_22 are then symmetric and that K_12 and K_21 are antisymmetric. We have thus fully shown the equivalence of the kernel solutions in the setting of steerable CNNs and in the setting of group convolutional CNNs for the specific group Z_2.

E.4 SO(3)-STEERABLE KERNELS FOR COMPLEX REPRESENTATIONS

In the first two sections, we discussed SO(2)-equivariant kernels (i.e., SE(2)-equivariant neural networks) both over C and over R. The situation over R was considerably more complicated and required new arguments. In this section, we discuss SO(3)-equivariant kernels (i.e., SE(3)-equivariant neural networks) for complex representations.
In Section E.5 we will then look at the real case, which will give essentially the same results, thus differing somewhat from the considerations about SO(2). Unlike in the earlier sections, we will from now on be less explicit and care more about the general properties of the different functions and coefficients we consider. SO(3)-equivariant networks with real representations have been implemented before in Weiler et al. (2018a) and Thomas et al. (2018), among others.

E.4.1 THE IRREDUCIBLE REPRESENTATIONS OF SO(3) OVER THE COMPLEX NUMBERS

In this section, we state the complex irreducible representations of SO(3). We will not state the matrices explicitly, since the matrix elements are considerably more complicated than in the earlier examples. For each l ∈ N_{≥0}, there is one irreducible unitary representation D_l : SO(3) → U(V_l) with V_l = C^{2l+1}, given by the well-known Wigner D-matrices. We note that, by general convention, the indices for the dimensions in C^{2l+1} are -l, -l+1, . . . , l-1, l.

E.4.2 THE PETER-WEYL THEOREM FOR L^2_C(S^2) AS A REPRESENTATION OF SO(3)

Here, we describe how L^2_C(S^2), considered as a unitary representation via λ : SO(3) → U(L^2_C(S^2)) with [λ(g)φ](x) = φ(g^{-1}x), densely contains a direct sum of irreducible representations. For doing so, we proceed by first describing the spherical harmonics without formulas, stating their orthonormality properties, and then stating how they transform under rotation. This then yields the result. Note that we do not need explicit formulas for the spherical harmonics, which are again somewhat complicated, since we are more interested in their structural properties. As mentioned in the last section, the endomorphisms are trivial, which is why we also do not need the index r. Overall, we see that we simply have basis kernels K_j : S^2 → Hom_C(V_l, V_J) for all j with |l - J| ≤ j ≤ l + J. They are explicitly given, as in Theorem D.16, by contracting the Clebsch-Gordan coefficients CG_{J(jl)} with the complex-conjugated spherical harmonics \overline{Y_j^m}. This ends the discussion.
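A quick consistency check of the selection rule |l - J| ≤ j ≤ l + J is the dimension count behind the Clebsch-Gordan decomposition of SO(3): the dimensions 2J + 1 of the irreps V_J appearing in V_j ⊗ V_l must add up to (2j + 1)(2l + 1). A small sketch (our own illustration):

```python
# Dimension count behind the selection rule |l - J| <= j <= l + J:
# V_j ⊗ V_l decomposes into one copy of V_J for each J in that range,
# and the dimensions (dim V_J = 2J + 1) must add up.
def dim(j):
    return 2 * j + 1

for j in range(6):
    for l in range(6):
        total = sum(dim(J) for J in range(abs(j - l), j + l + 1))
        assert total == dim(j) * dim(l)
print("sum_{J=|j-l|}^{j+l} (2J+1) = (2j+1)(2l+1) holds")
```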

E.5 SO(3)-STEERABLE KERNELS FOR REAL REPRESENTATIONS

In this section, we want to argue why the results of the last section transfer to the real case as well. Most of the investigations in this section are probably well known. However, we were not able to find sources that explicitly explain the representation theory of SO(3) over the real numbers, and so we develop much of it here from scratch. We thereby make use of the theory over C, some results about real spherical harmonics, and the general theory of real and quaternionic representations outlined in Bröcker & Dieck (2003). We need to somewhat reverse the order of development in this section. Therefore, we first investigate the Peter-Weyl theorem, then look at the endomorphism spaces of the appearing irreducible representations, and afterward, as a consequence, show that the representations appearing in the decomposition of L^2_R(S^2) are already exhaustive.

E.5.1 THE PETER-WEYL THEOREM FOR L^2_R(S^2) AS A REPRESENTATION OF SO(3)

The most important finding is the following, which is taken from Gallier & Quaintance (2020): one can perform a base change on the spherical harmonics to obtain real versions of them. Namely, let

rY_l^n = (i/√2)(Y_l^n - (-1)^n Y_l^{-n}) if n < 0,
rY_l^n = Y_l^0 if n = 0,
rY_l^n = (1/√2)(Y_l^{-n} + (-1)^n Y_l^n) if n > 0. (31)

One can then show that these functions are real-valued continuous functions, and therefore rY_l^n ∈ L^2_R(S^2). Furthermore, they form an orthonormal basis of this space. We can then, as before, set rV_{l1} as the span of the rY_l^n for fixed l and obtain a decomposition L^2_R(S^2) = ⊕_{l≥0} rV_{l1}. We need to understand the transformation properties of these real-valued spherical harmonics under rotation. To make this explicit, we let B_l ∈ C^{(2l+1)×(2l+1)} be the (complex) base change matrix between the complex and real spherical harmonics. Its entries are given according to Eq. (31) such that the following relation holds for all n = -l, . . .
, l:

rY_l^n = Σ_{n'=-l}^{l} B_l^{nn'} · Y_l^{n'}.

Since, for given l, both the complex and the real spherical harmonics are linearly independent, the matrix B_l is invertible; let B_l^{-1} be its inverse. It is then generally known from linear algebra that the real spherical harmonics transform under rotations with the base-changed matrices D_l^r(g) := B_l D_l(g) B_l^{-1}, which are real-valued since they map real functions to real functions. In order to compare real and complex representations, we need to define two functors between them:

Definition E.12 (Restriction and Extension). Let cρ : G → GL(cV) be a complex representation, and let rρ : G → GL(rV) be a real representation. Then we define their restriction and extension as follows:
1. Set r(cV) as the R-vector space that has the same underlying abelian group as cV, with scalar multiplication from R given by restricting the multiplication from C. The restriction r(cρ) : G → GL(r(cV)) is defined as the exact same map as cρ, only that r(cρ)(g) : r(cV) → r(cV) is now viewed as an automorphism of real vector spaces.
2. We define the extension by e(rV) := C ⊗_R rV, where C is regarded as an R-vector space. This construction becomes a C-vector space via the scalar multiplication z · (z' ⊗ v) := (zz') ⊗ v. We can then define e(rρ) : G → GL(e(rV)) by setting e(rρ)(g) := id_C ⊗ (rρ(g)).

Note that the extension operation doubles the R-dimension, whereas for the restriction it stays equal. Therefore, we cannot hope that these operations are inverse to each other. However, we have the following, almost as nice, statement:

Proposition E.13. For each real representation ρ : G → GL(V) there is a natural isomorphism r(e(V)) ≅ V ⊕ V of R-representations.

Proof. This is the first statement in Bröcker & Dieck (2003), Proposition II.6.1.

The following definition is actually not the definition that Bröcker & Dieck (2003) formulate. However, it is an equivalent characterization that follows from their Proposition II.6.6 (vii), (viii) and (ix) and is more convenient for our needs:

Definition E.14 (Real Type Complex Representation).
Let ρ : G → GL(V) be a complex irreducible representation. Then ρ is called of real type if there is an isomorphism of real representations r(V) ≅ U ⊕ U, where
1. ρ_U : G → GL(U) is an irreducible real representation, and
2. r(ρ) : G → GL(r(V)) is the restriction of ρ, as defined in Definition E.12.

Proposition E.15. Assume G is a compact group such that all complex irreducible representations are of real type. Then all real irreducible representations are of real type as well.

Proof. This follows from Bröcker & Dieck (2003), Proposition II.6.6 (ii) and (iii).

Proposition E.16. Let ρ : G → GL(V) be an irreducible real representation of real type. Then its extension e(ρ) : G → GL(e(V)), given as in Definition E.12, is an irreducible complex representation (also of real type).

Proof. This is precisely Bröcker & Dieck (2003), Proposition II.6.6 (i).
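Returning to the base change matrix B_l between complex and real spherical harmonics from Section E.5.1, a small numerical sketch (our own illustration; the index shift n ↦ n + l is merely an implementation convention) confirms that B_l, built from Eq. (31), is unitary:

```python
import numpy as np

def B(l):
    """Base change from complex to real spherical harmonics, Eq. (31).
    Rows and columns are indexed by n, n' = -l, ..., l."""
    d = 2 * l + 1
    M = np.zeros((d, d), dtype=complex)
    idx = lambda n: n + l  # shift the index range -l..l to 0..2l
    M[idx(0), idx(0)] = 1.0
    for n in range(1, l + 1):
        # rY^{-n} = (i/sqrt(2)) (Y^{-n} - (-1)^n Y^{n})
        M[idx(-n), idx(-n)] = 1j / np.sqrt(2)
        M[idx(-n), idx(n)] = -1j * (-1) ** n / np.sqrt(2)
        # rY^{n} = (1/sqrt(2)) (Y^{-n} + (-1)^n Y^{n})
        M[idx(n), idx(-n)] = 1 / np.sqrt(2)
        M[idx(n), idx(n)] = (-1) ** n / np.sqrt(2)
    return M

for l in range(5):
    Bl = B(l)
    assert np.allclose(Bl @ Bl.conj().T, np.eye(2 * l + 1))  # B_l is unitary
print("B_l is unitary for l = 0, ..., 4")
```

Unitarity of B_l is consistent with the fact that both the complex and the real spherical harmonics form orthonormal bases.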

E.5.4 THE IRREDUCIBLE REPRESENTATIONS OF SO(3) OVER THE REAL NUMBERS

The rough strategy is to use the fact that the D_l^r, viewed as complex irreducible representations, form an exhaustive list of all the complex irreps. Then, using the restriction and extension operators r and e between real and complex representations, we can show that in the specific case of SO(3), there cannot be any other real irreducible representations than the D_l^r, viewed as real representations.

Lemma E.17. All complex irreducible representations of SO(3) are of real type.

Earlier, we already considered tensor product representations of one and the same group. A related notion is that of tensor product representations of two different groups:

Definition E.21 (Tensor Product Representation). Let G and H be two compact groups, and let ρ_G : G → GL(V_G) and ρ_H : H → GL(V_H) be representations of G and H. Then the tensor product representation is given by

ρ_G ⊗ ρ_H : G × H → GL(V_G ⊗ V_H), [(ρ_G ⊗ ρ_H)(g, h)](v_G ⊗ v_H) := ρ_G(g)(v_G) ⊗ ρ_H(h)(v_H).

This is again a linear representation.

Proposition E.22. Representatives of the isomorphism classes of irreducible representations of G × H are given precisely by all ρ_G ⊗ ρ_H, where ρ_G and ρ_H run through representatives of the isomorphism classes of irreducible representations of G and H, respectively.

Proof. This is proven in Chapter II, Propositions 4.14 and 4.15 of Bröcker & Dieck (2003).

It is important to note that the proof of the above proposition uses in crucial steps the property of the complex numbers of being algebraically closed, and it is therefore unclear what exactly a generalization to representations over the real numbers looks like. Therefore, we will not use the above proposition in our later considerations for real representations of O(3). However, in our current situation, we can apply it without problems. This proposition, together with Lemma E.20, suggests that we should understand the irreducible representations of Z_2.
We already saw this for real representations before, and we essentially obtain the same result:

Lemma E.23. The irreducible representations of Z_2 over C are, up to equivalence, precisely the two representations ρ_+ and ρ_- with ρ_±(-1) = ±1, which we state for simplicity only on the generator.

The irreducible representations of O(3) ≅ Z_2 × SO(3) are then, for s ∈ Z_2 and g ∈ SO(3), given by

D_{l+}(sg) = D_l(g) and D_{l-}(sg) = s D_l(g).

Proof. Recall from Section E.4.1 that the irreducible representations of SO(3) are given by the Wigner D-matrices D_l. From Lemma E.23 we know that the irreducible representations of Z_2 are given by ρ_+ and ρ_-. From the isomorphism O(3) ≅ Z_2 × SO(3) from Lemma E.20 and from Proposition E.22, we thus obtain that the irreducible representations of O(3) are precisely given by all ρ_+ ⊗ D_l and ρ_- ⊗ D_l. We now show that ρ_- ⊗ D_l is equivalent to D_{l-}: we have

ρ_- ⊗ D_l : O(3) → GL(C ⊗ V_l), [(ρ_- ⊗ D_l)(sg)](z ⊗ v) = sz ⊗ [D_l(g)](v).

Now consider the linear isomorphism f : C ⊗ V_l → V_{l-}, z ⊗ v ↦ zv. We only need to check that it is equivariant, and are then done:

f([(ρ_- ⊗ D_l)(sg)](z ⊗ v)) = f(sz ⊗ [D_l(g)](v)) = sz · [D_l(g)](v) = [sD_l(g)](zv) = [D_{l-}(sg)](f(z ⊗ v)).

The statement about D_{l+} can be shown using the exact same map f.

F TOPOLOGICAL PRELIMINARIES

Let in the following X and Y be topological spaces.

Definition F.2 (Open Neighborhood). Let x ∈ X. An open set U ⊆ X is called an open neighborhood of x if x ∈ U.

Definition F.3 (Hausdorff Space). X is called a Hausdorff space if two distinct points can always be separated by open sets, i.e., for all x ≠ y ∈ X there exist open sets U_x, U_y such that x ∈ U_x, y ∈ U_y, and U_x ∩ U_y = ∅. In this work, all topological spaces are assumed to be Hausdorff.

Definition F.4 (Subspace). Assume A ⊆ X is a subset. Then the set T_A := {U ∩ A | U ∈ T} is a topology on A and thus makes A a topological space as well. It is called a subspace of X. Whenever we consider a subset of a topological space, it is viewed as a topological space with this construction.

Definition F.5 (Closure, Density).
For A ⊆ X, its closure Ā is defined as the smallest closed subset of X that contains A. Equivalently, it is the intersection of all closed subsets of X containing A, which is closed by the axioms of a topology. A is called dense in X if Ā = X. A homeomorphism is a continuous bijective function with a continuous inverse. Note that compositions of continuous functions are continuous as well. Definition F.7 (Open Cover, Compact Space). An open cover of X is a family of open sets {U i } i∈I that cover X, i.e., X = ∪ i∈I U i . X is called compact if all open covers have a finite subcover, that is: For all open covers {U i } i∈I there exists a finite subset J ⊆ I such that {U i } i∈J is still an open cover of X. Proposition F.8. If X is compact and f : X → Y is continuous, then f (X) ⊆ Y is compact as well. In particular, if f is surjective, then Y is compact. Proof. See Sutherland (1975), Proposition 13.15. Proposition F.9. Let f : X → Y be a continuous bijection and assume that X is compact and that Y is Hausdorff. Then the inverse f -1 is continuous as well and thus f is a homeomorphism. Proof. See Sutherland (1975), Proposition 13.26. Definition F.10 (Product Topology). The product topology on X × Y is the coarsest (i.e., smallest in terms of inclusion) topology that makes both projections p X : X × Y → X and p Y : X × Y → Y continuous. If Z is a third topological space and we have two continuous functions f X : Z → X and f Y : Z → Y , then the function f X × f Y : Z → X × Y , z → (f X (z), f Y (z)), is continuous as well. Definition F.11 (Quotient Map, Quotient Space). A continuous function f : X → Y is called a quotient map if f is surjective and if U ⊆ Y is open if and only if f -1 (U ) ⊆ X is open. Let ∼ be any equivalence relation on X and X/∼ be the quotient set formed by identifying equivalent elements. Let q : X → X/∼ be the canonical function sending each element to its equivalence class. We define U ⊆ X/∼ to be open if q -1 (U ) ⊆ X is open.
Then q is a quotient map and X/∼ is called a quotient space. Proposition F.12 (Universal Property of Quotient Maps). Let q : X → X/∼ be a quotient map as constructed above and f : X → Y be any continuous function such that f (x) = f (x') whenever x ∼ x'. Then there is a unique continuous function f̄ : X/∼ → Y such that f = f̄ ∘ q, i.e., such that the corresponding triangle diagram commutes. The following is a result we use several times in the main text: Proposition F.18. Let f : V → V' be a linear function between normed vector spaces. Then the following are equivalent: 1. f is uniformly continuous. 2. f is continuous. 3. f is continuous in 0. Proof. Trivially, 1 implies 2, which in turn implies 3. Now assume 3, i.e., f is continuous in 0. Let ε > 0. Then by continuity in 0, there exists δ > 0 such that for all v ∈ V with ‖v‖ = ‖v − 0‖ < δ we obtain ‖f (v)‖ = ‖f (v) − f (0)‖ < ε. Now let v, v' ∈ V be arbitrary with ‖v − v'‖ < δ. Then by the linearity of f we obtain: ‖f (v) − f (v')‖ = ‖f (v − v')‖ < ε, which is exactly what we wanted to show. Sometimes, sequences look like they converge since their elements get ever closer to each other. However, not all such sequences need to converge. Therefore, there is the following notion: Definition F.19 (Cauchy Sequence). Let Y be a metric space. A sequence (y k ) k in Y is a Cauchy sequence if for all ε > 0 there is k 0 ∈ N such that for all k, k' > k 0 we have d(y k , y k' ) < ε. For example, one can consider the metric space R \ {0} together with the usual metric. Then the sequence (1/k) k is a Cauchy sequence but does not converge, since the limit (in R!), which would be 0, is not in R \ {0}. Thus, the following notion is useful: Definition F.20 (Complete Metric Space). A metric space Y is called complete if every Cauchy sequence converges. Corollary F.25 (Extreme Value Theorem). Let f : X → R be continuous, where X is any nonempty compact topological space. Then f has a maximum and a minimum. Proof. By Proposition F.8, f (X) ⊆ R is compact.
By Theorem F.24 this means that f (X) is closed and bounded. Boundedness means that the supremum is finite and closedness means that the supremum must lie in f (X), and consequently it is a maximum. For the minimum, the same arguments apply.



Footnotes:
- Unitary representations are explained in Section 3. The notation U for the operator is distinct from the notation U of the unitary group U(H).
- The system could in general have further quantum numbers, which we suppress here for simplicity.
- For K = C, Schur's Lemma D.8 tells us that the endomorphism spaces of irreducible representations are always 1-dimensional, generated by the identity. For K = R, however, one can show that they have either 1, 2, or 4 dimensions, see Bröcker & Dieck (2003), Theorem II.6.3.
- Usually, the Peter-Weyl theorem uses G itself as the homogeneous space and is formulated for complex representations (Knapp, 2002). However, generalizations to arbitrary homogeneous spaces and real representations are possible, as we explain in Appendix B.2.
- From a representation theoretic viewpoint, the functions Y m ji for fixed j and i span an irreducible subrepresentation V ji of the unitary representation λ : G → U(L 2 K (X)) given by [λ(g)f ](x) := f (g -1 x). L 2 K (X) then splits into an orthogonal direct sum L 2 K (X) = ⊕_{j∈Ĝ} ⊕_{i=1}^{m_j} V ji . This viewpoint is explained in the equivalent, more representation theoretic formulation of the Peter-Weyl theorem in Theorem B.22.
- This statement is made precise by the isomorphism GKer, defined in Eq. (5) in Theorem 4.1.
- In Eq. (6) we see terms ⟨i, jm|x⟩ which are not present in the original Wigner-Eckart Theorem 2.1. They appear through the linearization of steerable kernels by Theorem C.7.
- Another way to imagine this is to identify V J with the complex plane C. Then an endomorphism is given by multiplication with an arbitrary complex number.
- For a vector space V and subspaces (U i ) i∈I , their sum Σ_{i∈I} U i is the set of sums Σ_{i∈J} u i with J ⊆ I finite and u i ∈ U i for all i. It is itself a subspace of V .
- I.e., those parts that deal with approximations of square-integrable functions by matrix elements.
- Such a Haar measure exists since H ⊆ G is a topologically closed subgroup of a compact group by Proposition B.21 and thus compact itself by standard topological results (Conway, 2014). Note that this measure fulfills µ(H) = 1 and is thus not the same as the restriction of the measure on G to H.
- Here, G/H is the set of equivalence classes in G with respect to the equivalence relation g ∼ g' if g -1 g' ∈ H, which has a quotient topology as explained in Definition F.11. The equivalence classes are given by the cosets gH for g ∈ G.
- Remember that functions in L 2 K (X) for any measurable space X are identified if they agree outside a set of measure 0.
- In this notation, we suppress that this embedding depends on the specific base point x* which was chosen. For another base point, the embedding differs by a unitary automorphism on L 2 K (G), as the reader may want to check.
- We will study some of these groups in the Examples in Chapter E.
- The semidirect product R d ⋊ G can be imagined as the smallest subgroup of the group of all isometries of R d that contains both the translations R d and the transformations G. It is not important to know the abstract definition of a semidirect product in our context.
- The operation is actually a so-called "correlation", but the term "convolution" is more widespread in the deep learning context and we follow this convention.
- This means that all matrix elements of K(x) for chosen bases of V in and V out are continuous.
- "Canonical" once the decompositions into irreducible representations are already chosen.
- Since K is continuous on X, which is compact by Proposition F.8 as an image of the compact group G, it has a maximum by Corollary F.25.
- It is more general in that it considers arbitrary groups and the situation that the considered irreducible representation appears several times in a tensor product representation instead of just once.
- Those are a priori not assumed to be embedded in a space of square-integrable functions. For such embedded representations, we write V ji instead.
- i is like an additional quantum number in physics.
- Of course, for this argument, we need the uniqueness of direct sum decompositions. But this follows if we assume the Hom-representation to be unitary, which works by Proposition B.20, and then using the Krull-Remak-Schmidt Theorem, Proposition B.39.
- Schur's lemma applies since it is a statement about irreducible representations, which are necessarily finite-dimensional. This means that the continuity condition in the definition of intertwiners is vacuous and thus we don't need to worry about K not being continuous a priori.
- Note that we have a subtraction now instead of a multiplicative inversion. This is because we view our group as additive.
- acts as a normalization.
- Note that for two orthonormal bases in a Hilbert space, when expressing one basis {b i } with respect to another basis {c i }, the expansion coefficients are given by the scalar products ⟨c j |b i ⟩.
- Since we work over
- Here, the letter "D" stands for "Darstellung", which is the German term for "representation".
- We saw that V J is a direct summand of V j ⊗ V l if and only if |l − j| ≤ J ≤ l + j. By doing case distinctions, one can show that this is the case if and only if |l − J| ≤ j ≤ l + J.
- We only define these functors on objects and not on morphisms. The reason is that we will never explicitly use their definitions on morphisms. More details on this can be found in Bröcker & Dieck (2003), including other functors which are needed in the general theory.
- The reader should not worry if he or she does not know what a functor is.
- The reader does not need to know what a functor is if he or she believes these statements.
- The reason for this is that the standard basis vectors in C k which are used for the Clebsch-Gordan coefficients are exactly the standard basis vectors in R k ⊆ C k by definition of this embedding.
- It is not a direct generalization due to the presence of two different group elements being applied.



and let the basis element Y M Js be the copy of Y M J with index s in ⊕_{J∈Ĝ} ⊕_{s=1}^{[J(jl)]} V J . Then the Clebsch-Gordan coefficients are the matrix elements of CG jl relative to these bases, ⟨s, JM | jm; ln⟩ := ⟨Y M Js | CG jl (Y m j ⊗ Y n l )⟩, i.e., the scalar product of CG jl (Y m j ⊗ Y n l ) and Y M Js .

group G (Definition B.4) is considered that acts on R d , for example, the special orthogonal group SO(d), the orthogonal group O(d), or, for d = 2, the finite groups C N or D N . Then for each layer, the input and output features have a certain type, i.e., representation, which may differ from layer to layer. That is, the input (and output as well) consists of a function f : R d → K c , and G acts on K c with a linear representation ρ, see Definition B.10. This action induces an action of the semidirect product (R d , +) ⋊ G on the space of all signals, where t ∈ (R d , +) and g ∈ G:

For l ≥ 1 and an arbitrary 2 × 2 matrix E = ( a b ; c d ) that commutes with all rotation matrices ρ l (φ), i.e., E · ρ l (φ) = ρ l (φ) · E, one can easily show the constraints a = d and b = -c, from which the result follows.
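The constraint on E can be checked numerically. The following sketch (my own illustration, not from the paper) stacks the linear conditions E ρ l (φ) = ρ l (φ) E for a few sample angles, computes the nullspace by SVD, and verifies that it is 2-dimensional with every solution satisfying a = d and b = -c:

```python
import numpy as np

def rho(l, phi):  # the 2x2 real irrep of SO(2) of frequency l
    c, s = np.cos(l * phi), np.sin(l * phi)
    return np.array([[c, -s], [s, c]])

# Stack the linear conditions rho_l(phi) E - E rho_l(phi) = 0 for a few angles,
# using the row-major identity vec(R E - E R) = (kron(R, I) - kron(I, R^T)) vec(E).
A = np.vstack([np.kron(rho(1, p), np.eye(2)) - np.kron(np.eye(2), rho(1, p).T)
               for p in (0.3, 1.1, 2.5)])

sing, Vt = np.linalg.svd(A)[1:]
null = Vt[sing < 1e-10]          # rows spanning the space of commuting matrices E
assert null.shape[0] == 2        # the commutant is 2-dimensional ...
for a, b, c, d in null:          # ... and every element satisfies a = d, b = -c
    assert np.isclose(a, d) and np.isclose(b, -c)
```

The two-dimensional solution space is spanned by the identity and the 90-degree rotation, which is consistent with the endomorphism spaces of the 2-dimensional real irreps of SO(2) being isomorphic to C.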

K'22(-y) = 1/2 ( K11(-y) - K12(-y) - K21(-y) + K22(-y) ) = 1/2 ( K11(y) + K12(y) + K21(y) + K22(y) ) = K'11(y), K'21(-y) = 1/2 ( K11(-y) + K12(-y) - K21(-y) - K22(-y) ) = 1/2 ( K11(y) - K12(y) + K21(y) - K22(y) ) = K'12(y).
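The mechanism behind this computation can be verified numerically. The sketch below (my own illustration; the notation K' = BKB for the kernel in the changed basis is an assumption, not from the paper) takes a kernel K whose diagonal entries are even and whose off-diagonal entries are odd in y — the constraint for the decomposed representation diag(1, -1) of Z 2 — and checks that the base-changed kernel K' = BKB satisfies the constraint K'(-y) = ρ reg (-1) K'(y) ρ reg (-1) for the regular representation, whose generator acts by swapping the two components:

```python
import numpy as np

B = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)  # base change, B = B^{-1}
S = np.array([[0.0, 1.0], [1.0, 0.0]])                  # rho_reg(-1), the swap matrix

def K(y):
    # sample kernel: even diagonal entries, odd off-diagonal entries
    return np.array([[np.cos(y), np.sin(y)], [y ** 3, y ** 2]])

def Kp(y):
    # the same kernel expressed in the changed basis
    return B @ K(y) @ B

for y in (0.2, 0.7, 1.9):
    # entrywise: K'_22(-y) = K'_11(y), K'_21(-y) = K'_12(y), and so on
    assert np.allclose(Kp(-y), S @ Kp(y) @ S)
```

The check works because the swap matrix diagonalizes to diag(1, -1) under conjugation by B, so the two equivariance constraints are exchanged by the base change.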

D l : SO(3) → U(V l ), where V l = C 2l+1 . The matrices D l (g) for g ∈ SO(3) are called the Wigner D-matrices. There are, up to equivalence, no other irreducible representations of SO(3) over C. A reference for all this is the original work Wigner (1944).

all x ∈ S 2 . Remembering that ⟨jm|x⟩ = Y m j (x), the individual matrix elements of K j (x) are then given by ⟨JM |K j (x)|ln⟩ = Σ_{m=-j}^{j} ⟨JM |jm; ln⟩ · Y m j (x).
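These matrix elements can be written down explicitly with a computer algebra system. The sketch below (my own illustration; sympy's Clebsch-Gordan and spherical-harmonic conventions may differ from the paper's in ordering or phase) builds the sum above and confirms that, because the Clebsch-Gordan coefficient ⟨JM|jm; ln⟩ vanishes unless M = m + n, at most one summand survives:

```python
from sympy import Ynm, symbols, simplify
from sympy.physics.quantum.cg import CG

theta, phi = symbols('theta phi', real=True)

def kernel_entry(J, M, j, l, n):
    # <JM| K^j(x) |ln> = sum_m <JM|jm;ln> * Y_j^m(x), with x = (theta, phi)
    return sum(CG(j, m, l, n, J, M).doit() * Ynm(j, m, theta, phi)
               for m in range(-j, j + 1))

# only the term with m = M - n can contribute, here m = 1
e = kernel_entry(J=2, M=1, j=1, l=1, n=0)
assert simplify(e - CG(1, 1, 1, 0, 2, 1).doit() * Ynm(1, 1, theta, phi)) == 0
```

This selection rule is exactly the sparsity pattern familiar from the Wigner-Eckart theorem in quantum mechanics.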

Lemma E.23. The irreducible representations of Z 2 are, up to equivalence, precisely the following two, which we state for simplicity only on the generator: ρ + : Z 2 → GL(C), ρ + (-1) = id C , and ρ - : Z 2 → GL(C), ρ - (-1) = -id C . Proof. This can be shown in exactly the same way as in Section E.3.1. Thus we are ready to state our result about the irreducible representations of O(3): Proposition E.24. The irreducible representations of O(3) are up to equivalence given as follows: for each l ∈ N ≥0 there are precisely two representations D l+ : O(3) → U(V l+ ) and D l- : O(3) → U(V l- ) with V l+ = C 2l+1 = V l- , given by D l+ (sg) = D l (g) and D l- (sg) = sD l (g) for s ∈ Z 2 and g ∈ SO(3).

Definition F.6 (Continuous Function, Homeomorphism). A function f : X → Y is called continuous if preimages of open sets are always open. Equivalently, for each point x 0 ∈ X and each open neighborhood V of f (x 0 ) there is an open neighborhood U of x 0 such that f (U ) ⊆ V .

Definition F.21 (Completion). Let Y be a metric space. A completion of Y is a complete metric space Ȳ which contains Y as a dense subspace. Proposition F.22 (Universal Property of Completions). Assume that Y ⊆ Ȳ is a pair of metric spaces, where Ȳ is a completion of Y . Then the following universal property holds: Let Z be any complete metric space and f : Y → Z be any uniformly continuous function. Then there is a unique continuous function f̄ : Ȳ → Z that extends f , i.e., such that f̄ | Y = f . Furthermore, f̄ is also uniformly continuous. This can be expressed by a commutative triangle diagram f = f̄ ∘ i, where i : Y → Ȳ is the canonical inclusion. Proof. See, for example, Kaplansky (2001). Definition F.23 (Boundedness). Let Y be a metric space. A subset A ⊆ Y is called bounded if there is a constant C > 0 such that d(a, b) ≤ C for all a, b ∈ A. Theorem F.24 (Heine-Borel Theorem). A subset A ⊆ K d is compact if and only if it is closed and bounded. Proof. See Conway (2014), Theorem 1.4.8.



ACKNOWLEDGMENTS

We thank Lucas Lang for discussions on the Wigner-Eckart Theorem and observables in physics and Patrick Forré for discussions on the link between steerable kernels and representation operators. Additionally, we are grateful for discussions with Gabriele Cesa on the connection between real and complex representations of compact groups. Furthermore, we thank Stefan Dawydiak and Terence Tao for online discussions on aspects surrounding a real version of the Peter-Weyl theorem. Finally, we thank Roberto Bondesan, Miranda Cheng, Tom Lieberum, and Rupert McCallum for feedback on different aspects of our work.

APPENDIX

This follows from the following fact on how the norm on ⊕ li V li is composed from the norms on each V li : For an element f = Σ li f li ∈ ⊕ li V li with f li ∈ V li , we have ‖f ‖² = Σ li ‖f li ‖². The reason for this is that the V li are pairwise perpendicular. Consequently, if (f k ) k with f k ∈ ⊕ li V li converges to 0, then also (p ji (f k )) k = (f k,ji ) k converges to 0, which shows the continuity of p ji in 0 and thus general continuity by Proposition F.18. Remark D.22. Note the curious fact that we cannot get rid of the equivariance condition in the preceding Lemma. I.e., if we have a linear function K : ⊕ l V l → V , then we cannot deduce that K is continuous. We omit the index i for simplicity. If equivariance is no requirement, then we only deal with plain vector spaces, which are in general isomorphic to spaces of (possibly infinite) tuples of elements in K. Thus, consider the function K : ⊕_{l∈N} K → K given by K(a) := Σ l l · a l , which is a finite sum since only finitely many a l are nonzero. This is linear but not continuous in 0. The latter can be seen by considering the sequence (a k ) k with a k = (0, . . . , 0, 1/k, 0, . . . ) that has value 1/k at position k and otherwise only zeros. This sequence converges to the 0-sequence in norm. However, we have K(a k ) = 1 for all k, thus the images do not converge to 0 = K(0). From the preceding lemma, we are able to obtain the following alternative description of representation operators: Proposition D.23 (Hom-tensor Adjunction). The map Φ : Hom G,K (⊕ ji V ji , Hom K (V l , V J )) → Hom G,K ((⊕ ji V ji ) ⊗ V l , V J ), Φ(K)(f ⊗ v) := K(f )(v), is an isomorphism. Proof. For continuity, note the following: by straightforward extensions of Lemma D.21, all linear and equivariant maps ⊕ ji V ji → Hom K (V l , V J ) and ⊕ ji V ji ⊗ V l → V J are necessarily continuous, and thus we can ignore continuity altogether. The rest of the proof can be done as in Agrawala (1980). For illustrating the most important part, one shows that Φ(K) is actually equivariant. Remark D.24. Some readers may wonder why this is called an adjunction.
Removing some of the notation in the Proposition, one has Hom G,K (T, Hom K (U, V )) ∼= Hom G,K (T ⊗ U, V ). Now, for notational clarity, set F := Hom K (U, ·) and H := (·) ⊗ U and drop the subscripts. Then the formula can be written as Hom(T, F (V )) ∼= Hom(H(T ), V ). Replacing the Hom-spaces with a scalar product, and the isomorphism sign with equality, this reads as follows: ⟨T |F (V )⟩ = ⟨H(T )|V ⟩. Similar to adjoints in Hilbert spaces, we can then view F and H as adjoint to each other. In categorical terms, they are a pair of adjoint functors, see Mac Lane (1998).

E.3.5 BRINGING EVERYTHING TOGETHER

Different from the other examples, in this section we will not only derive the final steerable kernels on homogeneous spaces but also discuss how these assemble to kernels defined on the whole plane R 2 . In the end, we will also discuss how kernels for the regular representation would look. But first, we engage with the homogeneous spaces. We start with X = {-1, 1} and consider steerable kernels K : X → Hom R (V in , V out ) for irreducible V in and V out . There are four possibilities for the input and output representations: STEERABLE KERNELS K : X → Hom R (V + , V + ): V + can only be in a tensor product V ⊗ V + if the sign of V is positive as well. Such a space appears precisely once in L 2 R (X) according to Section E.3.2. Since endomorphisms and Clebsch-Gordan coefficients do not appear by what we've shown before, and since complex conjugation doesn't do anything over the real numbers, a basis for steerable kernels is just given by the one kernel K + = f + itself. Here, we identify Hom R (V + , V + ) with R since it only consists of 1 × 1-matrices.
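The steerability constraint on the two-point space is simple enough to check exhaustively. The following sketch (my own illustration, not from the paper) encodes kernels K : {-1, 1} → R, the trivial representation ρ + and the sign representation ρ - , and verifies which of the two basis functions f + (constant) and f - (odd) is steerable for which sign combination:

```python
def steerable(K, rho_in, rho_out):
    # the constraint K(s.x) = rho_out(s) K(x) rho_in(s)^{-1} for all s, x in {-1, 1}
    return all(K[s * x] == rho_out(s) * K[x] / rho_in(s)
               for s in (-1, 1) for x in (-1, 1))

rho_p = lambda s: 1.0       # rho_+, the trivial representation
rho_m = lambda s: float(s)  # rho_-, the sign representation

f_plus = {-1: 1.0, 1: 1.0}    # the constant function f_+
f_minus = {-1: -1.0, 1: 1.0}  # the odd function f_-

assert steerable(f_plus, rho_p, rho_p)    # equal signs: the constant kernel works
assert steerable(f_minus, rho_p, rho_m)   # mixed signs: only the odd kernel works
assert not steerable(f_plus, rho_p, rho_m)
```

This reproduces the pattern derived in the text: equal input and output signs admit f + , mixed signs admit f - .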

STEERABLE KERNELS

K : X → Hom R (V + , V - ): By the same arguments, a basis is given by the one kernel K - = f - . K : X → Hom R (V - , V + ): Again, a basis for steerable kernels is given by K - = f - . K : X → Hom R (V - , V - ): A basis is given by K + = f + . Finally, we also need to engage with the case that X = {0} consists only of a single point. Similarly to above, in the "even" case that the signs of input and output representations agree, a basis is given by K + = f + with f + (0) = 1. If, however, the signs do not agree, then only K = 0 fulfills the constraint and the basis is empty. Now, we assemble this to kernels on the whole of R 2 . We saw above that we only need to distinguish two cases, namely (a) the case that the signs of input and output representation agree and (b) that they do not. For case (a), let K : R 2 → R be a steerable kernel, where R is isomorphic to the Hom-space between equal-sign representations. R 2 splits disjointly into orbits, namely {(a, b), (-a, b)} for all a ∈ R ≥0 and b ∈ R. If a = 0, then the orbit is just a single point, which means that we have a vertical line of single-point orbits. The solution above showed that on each orbit, the kernel needs to be constant (since f + is constant), and overall this just translates to K(-a, b) = K(a, b) for all a ≥ 0 and b ∈ R. Consequently, K is just an arbitrary left-right symmetric kernel. In the case that the input and output representations do not share their sign, by the same arguments we see that K : R 2 → R is an arbitrary left-right antisymmetric kernel which is zero on the vertical line {(0, b) | b ∈ R}. Other than these left-right restrictions, the kernel can be freely learned. Overall, this means that we learn one "half" of the kernel and recover the other half by the symmetry property derived above. We now discuss the spherical harmonics in relation to Hilbert space theory and representation theory. A reference for all this is MacRobert (1947). The spherical harmonics are continuous functions Y n l : S 2 → C for l ∈ N ≥0 and n = -l, . . . , l. Thus, they are elements of L 2 C (S 2 ). They have the following properties: 1.
⟨Y n l | Y n' l' ⟩ = δ ll' δ nn' for all l, l', n, n'. 2. The linear span of the spherical harmonics is dense in L 2 C (S 2 ). 3. They transform as follows under rotation: λ(g)(Y n l ) = Σ_{n'=-l}^{l} D l (g) n'n Y n' l , where the D l (g) n'n are the matrix elements of the Wigner D-matrices defined in Section E.4.1. Properties 1 and 2 together imply that the spherical harmonics form an orthonormal basis of L 2 C (S 2 ). Let V l1 := span(Y n l | n = -l, . . . , l). Then we already obtain L 2 C (S 2 ) = ⊕_{l≥0} V l1 . Now, let e n ∈ C 2l+1 be the n'th standard basis vector, for n = -l, . . . , l. Then property 3 means that the linear map given on basis vectors by f : C 2l+1 → V l1 , e n → Y n l , is an isomorphism of unitary representations. More precisely, f is clearly a unitary transformation and a linear isomorphism, and it is furthermore equivariant on basis vectors since f (D l (g) e n ) = Σ_{n'} D l (g) n'n Y n' l = λ(g)(Y n l ) = λ(g)(f (e n )). General equivariance then follows from equivariance on basis vectors. This concludes this section.
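Property 1 can be verified symbolically for small l. The sketch below (my own illustration, using sympy's spherical-harmonic convention, which may differ from the paper's by phase) integrates products of spherical harmonics over the sphere with the measure sin(θ) dθ dφ:

```python
from sympy import Ynm, conjugate, integrate, sin, pi, symbols, simplify

theta, phi = symbols('theta phi', real=True)

def inner(l1, m1, l2, m2):
    # L^2(S^2) inner product <Y_{l1}^{m1} | Y_{l2}^{m2}>
    f = Ynm(l1, m1, theta, phi).expand(func=True) \
        * conjugate(Ynm(l2, m2, theta, phi).expand(func=True)) * sin(theta)
    return simplify(integrate(f, (theta, 0, pi), (phi, 0, 2 * pi)))

assert inner(1, 0, 1, 0) == 1   # normalized
assert inner(1, 1, 1, 1) == 1
assert inner(1, 0, 0, 0) == 0   # orthogonal across degrees l
assert inner(1, 1, 1, -1) == 0  # orthogonal across orders n
```

Orthogonality across different orders n already follows from the φ-integral of e^{i(n−n')φ}, while orthogonality across degrees l comes from the associated Legendre functions.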

E.4.3 THE CLEBSCH-GORDAN DECOMPOSITION

Explicit formulas for the Clebsch-Gordan coefficients of SO(3) are given in Bohm & Löwe (1993). The most important fact is the following: There is a decomposition V j ⊗ V l ∼= ⊕_{J=|j-l|}^{j+l} V J of representations. Furthermore, the Clebsch-Gordan coefficients ⟨JM |jm; ln⟩ are all real numbers, a fact that we will use in Section E.5.
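Both facts can be spot-checked with sympy (my own illustration, not from the paper): the dimensions on both sides of the decomposition agree, and the Clebsch-Gordan coefficients in sympy's Condon-Shortley convention are real:

```python
from sympy.physics.quantum.cg import CG

# dimension count: dim(V_j (x) V_l) = sum of dim(V_J) over |j-l| <= J <= j+l
for j in range(3):
    for l in range(3):
        dims = sum(2 * J + 1 for J in range(abs(j - l), j + l + 1))
        assert dims == (2 * j + 1) * (2 * l + 1)

# realness of the coefficients <JM|jm;ln> for a sample pair (j, l)
j, l = 1, 2
for J in range(abs(j - l), j + l + 1):
    for M in range(-J, J + 1):
        for m in range(-j, j + 1):
            n = M - m  # only M = m + n can contribute
            if -l <= n <= l:
                assert CG(j, m, l, n, J, M).doit().is_real
```

The realness is convention-dependent; it holds in the Condon-Shortley convention, which is the one the paper's Section E.5 relies on.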

E.4.4 ENDOMORPHISMS OF V J

As in the case of harmonic networks, this is again simple: we are considering representations over C, and so Schur's Lemma D.8 tells us that End SO(3) (V J ) is 1-dimensional for each irrep J. We can therefore ignore the endomorphisms once again.

E.4.5 BRINGING EVERYTHING TOGETHER

Now, with all this prior work, let us determine the equivariant kernels K : S 2 → Hom C (V l , V J ) for the irreducible representations D l : SO(3) → U(V l ) and D J : SO(3) → U(V J ). For this, we use Eq. (23). Since each V j appears only once in the direct sum decomposition of L 2 C (S 2 ) according to Section E.4.2, and since V J can only appear once in the direct sum decomposition of a tensor product V j ⊗ V l according to Section E.4.3, we do not need the indices i and s.

E.5 SO(3)-STEERABLE KERNELS FOR REAL REPRESENTATIONS

The real spherical harmonics r Y n l are obtained from the complex spherical harmonics Y n l by a unitary change of basis; we also obtain the inverse relation, which expresses the complex spherical harmonics in terms of the real ones. Using both these relations and the rotation properties of the complex spherical harmonics from Section E.4.2, we obtain the following rotation property for the real spherical harmonics: λ(g)( r Y n l ) = Σ_{n'=-l}^{l} D r l (g) n'n · r Y n' l , (32) which is analogous to the one in Section E.4.2. Lemma E.7. D r l (g) n'n ∈ R for all l ≥ 0, n', n = -l, . . . , l and g ∈ SO(3). Proof. Note that since r Y n l is a real-valued function, the rotation λ(g)( r Y n l ) is real-valued as well. Thus, it is in the space L 2 R (S 2 ). The real spherical harmonics are a basis of this space, which means that the coefficients when expanding λ(g)( r Y n l ) in this basis are necessarily real as well. These coefficients are precisely given by the D r l (g) n'n according to Eq. (32). Now, we have the choice to view D r l as either a real or a complex representation, but first we take the complex viewpoint and see it as a function D r l : SO(3) → GL(C 2l+1 ). Notation-wise, the following is important: the "r" in D r l indicates that the elements in this matrix are real but does not tell us on which space it acts. This will always be clarified by the context. We have the following: Lemma E.8. D r l (g) = B l D l (g) B l -1 for the unitary base change matrix B l between the bases (Y n l ) n and ( r Y n l ) n ; in particular, D r l is a unitary complex representation of SO(3) that is equivalent to D l and therefore irreducible. Proof. First of all, it is an actual linear representation since D r l (gh) = B l D l (gh) B l -1 = B l D l (g) B l -1 · B l D l (h) B l -1 = D r l (g) D r l (h), where we used that D l is a linear representation. Now since Y n l and r Y n l are both orthonormal bases of L 2 C (S 2 ), the base change matrix B l needs to be a unitary matrix.
Consequently, D r l (g) is a unitary matrix for each g ∈ SO(3) by Lemma E.8, and since its matrix elements are real by Lemma E.7, it automatically is an orthogonal matrix. If it were reducible, then there would be a real base change matrix that brings D r l into a nontrivial block-diagonal shape. However, this base change would in particular be complex, meaning that we would conclude that the complex version of the representation D r l is reducible. But it is not, due to Lemma E.8. Now, remember that L 2 R (S 2 ) = ⊕_{l≥0} r V l1 and that r V l1 is generated from the real spherical harmonics. Also, remember that the real spherical harmonics transform as in Eq. (32). Thus, with the same arguments as in Eq. (30) we obtain r V l1 ∼= r V l , which is by the preceding lemmas an irreducible orthogonal representation. Thus, we have found the Peter-Weyl decomposition of L 2 R (S 2 ). In the next section, we will show that the D r J : SO(3) → O( r V J ) already give an exhaustive list of the irreducible representations of SO(3) over the real numbers. In this section, we first describe their endomorphism spaces since this will help in showing that there cannot be any other irreducible representations. Fortunately, the situation is again simple: Proposition E.10. Every equivariant endomorphism of r V J is a real multiple of the identity. Proof. Let f : r V J → r V J be an equivariant linear map, i.e., f ∘ D r J (g) = D r J (g) ∘ f for all g ∈ SO(3). Now note that as a real matrix, f is in particular a complex matrix, i.e., f ∈ C (2J+1)×(2J+1) . Also, remember that we can view D r J also as a complex irreducible representation, whose endomorphism space is isomorphic to C by Schur's Lemma D.8. Thus, f is a complex multiple of the identity. Since f is a real matrix, it is thus a real multiple of the identity. The result follows.

E.5.3 GENERAL NOTES ON THE RELATION BETWEEN REAL AND COMPLEX REPRESENTATIONS

In the next section we show that there can, up to isomorphism, not be other irreducible representations than the D r l : SO(3) → O( r V l ). In order to do so, we first need to better understand the relationship between real and complex representations of compact groups. These investigations will carry over to the investigations for O(3) that we do in Section E.6 as well. The following definition of a classification of real irreducible representations of a compact group G can be found in Bröcker & Dieck (2003), Theorem II.6.7. In this book, it is a theorem, since the authors give an independent but equivalent definition of these notions. Definition E.11 (Real, Complex, and Quaternionic Type Irreducible Representations). Let ρ : G → O(V ) be a real irreducible representation of a compact group G. Then ρ is said to be of real, complex, or quaternionic type if End G (V ) is isomorphic to R, C, or H, respectively, where H are the quaternions. Here, these isomorphisms respect both addition and multiplication. The multiplication in the endomorphism spaces is thereby given by composition of functions. Furthermore, Bröcker & Dieck (2003) shows in Theorem II.6.3 that there is no other possibility for an irreducible real representation, i.e., they can be completely categorized by being of real, complex or quaternionic type. Additionally, since R, C and H already differ in their R-dimension, it is enough to check whether the R-dimension of an endomorphism space is 1, 2 or 4 in order to do the classification. Proof (of Lemma E.17). From Section E.4.1 and Lemma E.8 we know that the D r l : SO(3) → U(C 2l+1 ) give us, up to equivalence, all the complex irreducible representations of SO(3). According to Definition E.14, we now need to show that the restriction of each of them splits into the direct sum of twice the same irreducible real representation. We do this as follows: We can write r(C 2l+1 ) = R 2l+1 ⊕ (iR) 2l+1 = r V l ⊕ i r V l , which is a decomposition of C 2l+1 when viewed as an R-vector space.
Then, we can note that both D r l : SO(3) → GL( r V l ) and D r l : SO(3) → GL(i · r V l ) are well-defined R-representations, which follows from the fact that the matrix elements are all real. Furthermore, the first map is actually an irreducible real representation by Lemma E.9. The second one is isomorphic to the first since one can show that r V l → i · r V l , v → iv, is an isomorphism of real SO(3)-representations. This gives us precisely the splitting of r(C 2l+1 ) as a representation that we were looking for. Corollary E.18. All irreducible real representations of SO(3) are of real type. Proof. This follows directly from Lemma E.17 and Proposition E.15. Proposition E.19. The D r l : SO(3) → O( r V l ) are, up to equivalence, precisely all irreducible real representations of SO(3). Proof. Assume that ρ : SO(3) → GL(V ) is an irreducible real representation of SO(3). It is of real type by Corollary E.18. By Proposition E.16, the extension e(ρ) : G → GL(e(V )) is an irreducible complex representation. Since the D r l give us all complex irreducible representations up to equivalence by Section E.4.1 and Lemma E.8, there is an equivalence of complex SO(3)-representations e(V ) ∼= C 2l+1 for some l. Since functors respect isomorphisms (and equivalences are isomorphisms in the categories of G-representations) and the restriction operation is a functor, and using Proposition E.13 as well as the proof of Lemma E.17, we obtain: V ⊕ V ∼= r(e(V )) ∼= r(C 2l+1 ) ∼= r V l ⊕ r V l . Using the Krull-Remak-Schmidt Theorem B.39, we see that there is an isomorphism of SO(3)-representations V ∼= r V l . This finishes the proof.
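The classification by endomorphism dimension can be illustrated numerically. The sketch below (my own illustration, not from the paper) computes the R-dimension of the commutant of a few sample group elements for three standard real irreps and recovers the dimensions 1, 2, and 4 of the three types:

```python
import numpy as np

def commutant_dim(mats):
    # R-dimension of {E : E R = R E for all R in mats}, via the SVD nullspace of the
    # stacked system (row-major identity: vec(RE - ER) = (kron(R,I) - kron(I,R^T)) vec(E))
    d = mats[0].shape[0]
    A = np.vstack([np.kron(R, np.eye(d)) - np.kron(np.eye(d), R.T) for R in mats])
    return int(np.sum(np.linalg.svd(A, compute_uv=False) < 1e-10))

def rot2(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def rot3(t, axis):  # rotation of R^3 about a coordinate axis
    i, j = [(1, 2), (0, 2), (0, 1)][axis]
    R = np.eye(3)
    R[i, i] = R[j, j] = np.cos(t)
    R[i, j], R[j, i] = -np.sin(t), np.sin(t)
    return R

# left multiplication by the unit quaternions i and j on H = R^4 (basis 1, i, j, k)
Li = np.array([[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]], float)
Lj = np.array([[0, 0, -1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, -1, 0, 0]], float)

assert commutant_dim([rot3(0.4, 0), rot3(1.3, 2)]) == 1  # SO(3) on R^3: real type
assert commutant_dim([rot2(0.4), rot2(1.3)]) == 2        # SO(2) on R^2: complex type
assert commutant_dim([Li, Lj]) == 4                      # Sp(1) on R^4: quaternionic type
```

A few generic group elements suffice here because the commutant of the listed elements already equals the commutant of the full group in these examples.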

E.5.5 THE CLEBSCH-GORDAN DECOMPOSITION

We are almost there. The only thing left to understand is the Clebsch-Gordan decomposition. Remember the following from Section E.4.3: For the complex irreducible representations there are decompositions V j ⊗ V l ∼= ⊕_{J=|j-l|}^{j+l} V J , where on each space, the representations D j , D l and D J are given by the Wigner D-matrices. Furthermore, the Clebsch-Gordan coefficients are all real. Now, we know that D r l is, as a complex representation, isomorphic to D l by Lemma E.8, and such a representation then acts on C 2l+1 as well. Consequently, we also get the corresponding decomposition C 2j+1 ⊗ C 2l+1 ∼= ⊕_{J=|j-l|}^{j+l} C 2J+1 of the complex representations D r j and D r l . Obviously, the Clebsch-Gordan coefficients can be chosen to be exactly the same as before, and thus they are again real. Let the above isomorphism be called f . Now, we can view all involved vector spaces as R-vector spaces as well. Furthermore, we have subspaces r V j = R 2j+1 , r V l = R 2l+1 and r V J = R 2J+1 which are also invariant under the representations D r j , D r l and D r J . Consequently, we can just restrict the isomorphism above to a map r V j ⊗ r V l → ⊕_{J=|j-l|}^{j+l} r V J , which is well-defined since the Clebsch-Gordan coefficients are real. It needs to be injective, since it is a restriction of an isomorphism. For dimension reasons, the restriction then needs to be an isomorphism, and obviously, it has the exact same Clebsch-Gordan coefficients as the original map f .

E.5.6 BRINGING EVERYTHING TOGETHER

By what we've shown in the last sections, we see that the situation is basically the same as in Section E.4.5. The only thing that changes is that we now use the real spherical harmonics, and therefore the complex conjugation disappears. What this overall means is the following: let D r l : SO(3) → O( r V l ) and D r J : SO(3) → O( r V J ) be the representations determining the input and output fields. Then a basis for steerable kernels K : S 2 → Hom R ( r V l , r V J ) is given by kernels K j : S 2 → Hom R ( r V l , r V J ) for all |l - J| ≤ j ≤ l + J. The matrix elements are given by ⟨JM |K j (x)|ln⟩ = Σ_{m=-j}^{j} ⟨JM |jm; ln⟩ · r Y m j (x).

E.6 O(3)-STEERABLE KERNELS FOR COMPLEX REPRESENTATIONS

In this section, we deal with O(3)-equivariant kernels for complex representations and then, in the next section, transport the results over to real representations. In the earlier examples, we saw that the Peter-Weyl decomposition of L 2 K (X) always contained each irreducible representation of the symmetry group exactly once. The example of O(3) is the first in which this is not the case: parity will play a role in determining which irreducible representations make their way into the space of square-integrable functions and which do not. Overall, we hope that the example of O(3) is a sufficient justification for our use of the multiplicities m j of irreducible representations that we considered in all our theorems. O(3)-equivariant networks are, to the best of our knowledge, not described in any published work yet.

E.6.1 THE IRREDUCIBLE REPRESENTATIONS OF O(3)

The most important observation is the following, after which we can deduce the irreducible representations of O(3) from those of SO(3): Lemma E.20. Let Z 2 := ({-1, +1}, ·) be the group with two elements. Then the map Z 2 × SO(3) → O(3), (s, g) → s · g, is an isomorphism of groups. Proof. It is a group homomorphism since s ∈ {-1, +1} can be represented by a multiple of the identity matrix, and as such it commutes with every matrix g. That the map is an isomorphism follows since all matrices in O(3) have determinant either 1 or -1. The matrices with determinant 1 form SO(3) and are the image of {+1} × SO(3). The matrices with determinant -1 are the image of {-1} × SO(3). Note the fact that for g ∈ SO(3), -g has determinant -1, which we used in the proof. This only holds for g ∈ SO(d) with d odd. Therefore, the above lemma is not true for even d. In the even case, we obtain a semidirect product and the story complicates somewhat.

Published as a conference paper at ICLR 2021

The considerations in this section follow almost entirely from Section E.4.2. There we saw that, as a representation over SO(3), we have a decomposition L 2 C (S 2 ) = ⊕_{l≥0} V l1 , with the spaces V l1 being spanned by the spherical harmonics Y n l , n = -l, . . . , l. We immediately see that in L 2 C (S 2 ), viewed as a representation over O(3), there is not enough space for all the irreducible representations, since they appear in pairs as shown in Proposition E.24. Thus, we need to figure out which irreducible representations are present and which are not. The core of this question is answered by the following proposition: Lemma E.25 (Parity in spherical harmonics). The spherical harmonics obey the following parity rule: Y n l (-x) = (-1) l Y n l (x) for all x ∈ S 2 . Proof.
This is a well-known property of the spherical harmonics.

Thus, together with Section E.4.2, we get the following transformation behavior of spherical harmonics under the group O(3), where s ∈ Z_2 and g ∈ SO(3): the element (s, g) acts on the span of the order-l harmonics as s^l times the action of g, i.e., (s, g) · Y_l^n = s^l · (g · Y_l^n). Thus, we obtain the following decomposition of L^2_C(S^2) as an O(3)-representation:

L^2_C(S^2) ≅ ⊕_{l even} V_l^{1+} ⊕ ⊕_{l odd} V_l^{1-}.

Here, V_l^{1+} (for even l) and V_l^{1-} (for odd l) are spanned by the spherical harmonics of order l, and we have V_l^{1+} ≅ V_{l+} and V_l^{1-} ≅ V_{l-} as representations, according to the transformation behavior we saw above.
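Since the parity rule Y_l^n(-x) = (-1)^l · Y_l^n(x) drives the whole O(3) analysis, it can be sanity-checked numerically. A minimal sketch using SciPy's spherical harmonics (in SciPy's convention, `theta` is the azimuthal and `phi` the polar angle, so the antipodal map sends (theta, phi) to (theta + pi, pi - phi)):

```python
import numpy as np
from scipy.special import sph_harm  # sph_harm(m, l, theta, phi): theta = azimuth, phi = polar

rng = np.random.default_rng(0)
for l in range(5):
    for m in range(-l, l + 1):
        theta, phi = rng.uniform(0, 2 * np.pi), rng.uniform(0, np.pi)
        y = sph_harm(m, l, theta, phi)
        # Antipode -x of the point x(theta, phi) on the sphere:
        y_antipode = sph_harm(m, l, theta + np.pi, np.pi - phi)
        # Parity rule: Y_l^m(-x) = (-1)^l Y_l^m(x)
        assert np.allclose(y_antipode, (-1) ** l * y)
print("parity rule verified for l = 0, ..., 4")
```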

E.6.3 THE CLEBSCH-GORDAN DECOMPOSITION

Remember from Section E.4.3 that we have a decomposition of SO(3)-representations

V_j ⊗ V_l ≅ ⊕_{J=|j-l|}^{j+l} V_J,

given by real Clebsch-Gordan coefficients. Now for O(3), remember that as vector spaces we have V_j = V_{j-} = V_{j+} for all j (and equally for l and J), and so we guess that in the isomorphism above, we just need to figure out the correct signs in order to be compatible with the corresponding representations.^35 The idea is that "multiplying the signs on the left" should lead to the "sign on the right", and this paradigm leads us to believe that there are the following isomorphisms:

V_{j+} ⊗ V_{l+} ≅ ⊕_J V_{J+},   V_{j+} ⊗ V_{l-} ≅ ⊕_J V_{J-},
V_{j-} ⊗ V_{l+} ≅ ⊕_J V_{J-},   V_{j-} ⊗ V_{l-} ≅ ⊕_J V_{J+},

where all direct sums run over J = |j-l|, ..., j+l. We just show the lower-left isomorphism since the arguments are always the same. So, assume that f : V_j ⊗ V_l → ⊕_{J=|j-l|}^{j+l} V_J is an isomorphism and thus in particular intertwines the given representations. Now, we take the exact same map f : V_{j-} ⊗ V_{l+} → ⊕_{J=|j-l|}^{j+l} V_{J-} and only need to figure out that it is equivariant with respect to the given representations, using the same property for the original isomorphism we started with: for (s, g) ∈ Z_2 × SO(3) ≅ O(3), we compute

f((s, g) · (v ⊗ w)) = f(s(g · v) ⊗ (g · w)) = s · f((g · v) ⊗ (g · w)) = s · (g · f(v ⊗ w)) = (s, g) · f(v ⊗ w).

This shows the claim. From these considerations, it also follows that the Clebsch-Gordan coefficients do not in any way depend on the signs of the spaces V_j, V_l, V_J. Thus, we write them generically as ⟨JM|jm; ln⟩.

^35 With this, we mean the following: the irreducible representations of SO(3) already cover L^2_C(S^2). O(3) has even more irreducible representations than SO(3), so it is a priori clear that they cannot all fit into L^2_C(S^2).
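The range J = |j-l|, ..., j+l of the decomposition is visible in the vanishing pattern of the Clebsch-Gordan coefficients themselves. A small sketch using SymPy, which computes the standard SO(3) coefficients ⟨JM|jm; ln⟩ (shown here for j = 1, l = 2 only):

```python
from sympy.physics.quantum.cg import CG  # CG(j1, m1, j2, m2, j3, m3) = <j3 m3 | j1 m1; j2 m2>

j, l = 1, 2
# <J M | j m; l n> vanishes unless |j - l| <= J <= j + l (and M = m + n).
for J in range(0, 5):
    coupled = any(
        CG(j, m, l, n, J, m + n).doit() != 0
        for m in range(-j, j + 1)
        for n in range(-l, l + 1)
        if abs(m + n) <= J
    )
    expected = abs(j - l) <= J <= j + l
    assert coupled == expected
print("selection rule |j-l| <= J <= j+l verified for j=1, l=2")
```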

E.6.4 ENDOMORPHISMS OF V J

As always over C, Schur's Lemma D.8 shows that the endomorphism spaces are 1-dimensional, and thus we can ignore endomorphisms.

E.6.5 BRINGING EVERYTHING TOGETHER

Now we can finally compute the basis for steerable kernels. The section on the Clebsch-Gordan decomposition suggests that a case distinction is needed: the possible kernels depend on the signs of V_l and V_J. The results follow analogously to those in Section E.4.5.

STEERABLE KERNELS

Case V_{l+} → V_{J+}: The space V_{J+} can only appear in a tensor product V_{j±} ⊗ V_{l+} if the sign of j is positive. Spaces V_j^{1+} appear in the decomposition of L^2_C(S^2) precisely for even j, according to Section E.6.2. Thus, a basis for steerable kernels is given by all K^j with even j ∈ {|l - J|, ..., l + J}, with matrix elements exactly as in Section E.4.5.

Case V_{l+} → V_{J-}: Analogously, a basis for steerable kernels is given by all K^j with odd j ∈ {|l - J|, ..., l + J}.

Case V_{l-} → V_{J+}: Again, a basis for steerable kernels is given by all K^j with odd j ∈ {|l - J|, ..., l + J}.

Case V_{l-} → V_{J-}: As in the first case, a basis for steerable kernels is given by all K^j with even j ∈ {|l - J|, ..., l + J}.

Thus, we have determined all kernel bases for the group O(3) over the complex numbers. Compared to SO(3), we see that the kernel spaces are roughly halved. The reason for this is that with a bigger symmetry group, the kernel needs to obey more constraints, which means that the kernel constraint has fewer solutions.
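The four case distinctions reduce to one sign rule: the harmonic of order j spans V_{j,(-1)^j}, and V_{J,σ_J} appears in V_{j,σ_j} ⊗ V_{l,σ_l} only for σ_J = σ_j · σ_l. A minimal sketch enumerating the allowed filter orders j (signs encoded as ±1; the function name is ad hoc):

```python
def allowed_orders(l, sign_l, J, sign_J):
    """Orders j of spherical harmonics contributing to O(3)-steerable
    kernels mapping V_{l, sign_l} to V_{J, sign_J}."""
    return [
        j for j in range(abs(l - J), l + J + 1)
        # the harmonic of order j spans V_{j, (-1)^j}, and the signs
        # must satisfy sign_J == sign_j * sign_l:
        if (-1) ** j * sign_l == sign_J
    ]

# For l = 1, J = 2, SO(3) would allow all j in {1, 2, 3}; over O(3)
# each sign combination retains roughly half of them:
assert allowed_orders(1, +1, 2, +1) == [2]      # even j only
assert allowed_orders(1, +1, 2, -1) == [1, 3]   # odd j only
assert allowed_orders(1, -1, 2, +1) == [1, 3]
assert allowed_orders(1, -1, 2, -1) == [2]
```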

E.7 O(3)-STEERABLE KERNELS FOR REAL REPRESENTATIONS

Basically, we can argue exactly as in Section E.5.4 in order to transport the results for complex representations to the real setting. We briefly sketch the procedure and the outcome. As we know from Section E.3.1, the trivial representation ρ_+ : Z_2 → O(1) and the sign representation ρ_- : Z_2 → O(1) are the only irreducible real representations of Z_2. Thus, for each l ≥ 0 we obtain two irreducible real representations D_{l+} and D_{l-}. As before, they also act on complex vector spaces and are as such isomorphic to the complex irreducible representations of O(3). One can then show as in Lemma E.17 that all complex irreducible representations are of real type, since they split into two copies of the real version of this representation. Thus, by Corollary E.18, all real irreducible representations are of real type, and this means that we can proceed exactly as in Proposition E.19 in order to show that the D_{l±} are, up to isomorphism, all irreducible real representations of O(3).

For the Peter-Weyl decomposition of L^2_R(S^2), we only need to note that the real spherical harmonics emerge from the complex ones by a base change, as seen in Eq. (31), and thus fulfill the same parity rules as the complex spherical harmonics. This gives us a decomposition

L^2_R(S^2) ≅ ⊕_{l even} V_{l+} ⊕ ⊕_{l odd} V_{l-}.

For the Clebsch-Gordan coefficients, we again get decompositions of the tensor products V_{j,σ_j} ⊗ V_{l,σ_l}, where the signs on the left must "multiply to" the sign on the right, as in Section E.6.3. Finally, the endomorphism spaces must be 1-dimensional since the endomorphism spaces of the complex versions are 1-dimensional. Overall, we obtain the same kernels as in Section E.6.5, only that we need to use the real spherical harmonics as our steerable filters and can get rid of the complex conjugation.
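Since the real spherical harmonics are linear combinations of complex harmonics of the same order l, they inherit the parity (-1)^l regardless of the exact base change. A quick numerical check, using one common real-harmonic convention (rescaled real and imaginary parts of Y_l^m; the base change in Eq. (31) may differ from this by signs, which does not affect parity):

```python
import numpy as np
from scipy.special import sph_harm  # sph_harm(m, l, theta, phi): theta = azimuth, phi = polar

def real_sph_harm(m, l, theta, phi):
    """One common convention for real spherical harmonics; sign choices
    may differ from Eq. (31), but parity is unaffected."""
    if m > 0:
        return np.sqrt(2) * (-1) ** m * sph_harm(m, l, theta, phi).real
    if m < 0:
        return np.sqrt(2) * (-1) ** m * sph_harm(-m, l, theta, phi).imag
    return sph_harm(0, l, theta, phi).real

rng = np.random.default_rng(1)
for l in range(4):
    for m in range(-l, l + 1):
        theta, phi = rng.uniform(0, 2 * np.pi), rng.uniform(0, np.pi)
        y = real_sph_harm(m, l, theta, phi)
        # Antipodal point: (theta, phi) -> (theta + pi, pi - phi)
        y_antipode = real_sph_harm(m, l, theta + np.pi, np.pi - phi)
        assert np.isclose(y_antipode, (-1) ** l * y)
print("real spherical harmonics obey the parity rule (-1)^l")
```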

F MATHEMATICAL PRELIMINARIES

In this chapter, we state mathematical preliminaries that are used throughout the earlier chapters. Throughout this chapter, K is one of the two fields R or C.

F.1 TOPOLOGICAL SPACES, NORMED SPACES, AND METRIC SPACES

Since in this work we want to develop the theory of representations of compact groups, and since compactness is a topological property, we need to formulate some topological concepts (Conway, 2014). Additionally, the vector spaces on which our compact groups act also carry a topology, mostly coming from their Hilbert space structure.

Definition F.1 (Topological Space, Open Sets, Closed Sets). A topological space (X, T) consists of a set X and a set T of subsets of X, called the open sets, such that ∅, X ∈ T, arbitrary unions of open sets are open, and finite intersections of open sets are open. A subset of X is called closed if its complement is open.

Proof. See Conway (2014), Proposition 2.8.7.

It can be shown that all quotient maps are equivalent to a construction of the form q : X → X/∼. Namely, for a quotient map f : X → Y, the induced map X/∼ → Y, with x ∼ x' if and only if f(x) = f(x'), is a well-defined continuous map by the universal property of quotient maps, Proposition F.12. One can show that this is a homeomorphism. Thus, for a quotient map f : X → Y, we also call Y a quotient space.

Our route for defining concrete topologies is in most cases through the existence of inner products on Hilbert spaces, which will be defined in detail in Definition F.32. Namely, inner products define norms, which define metrics (Kaplansky, 2001), which in turn define topologies. For this, we need some definitions:

Definition F.13 (Norm). Let V be a K-vector space. A norm on V is a map ‖·‖ : V → R_{≥0} with the following properties for all λ ∈ K and v, w ∈ V:
1. ‖λv‖ = |λ| · ‖v‖ (absolute homogeneity),
2. ‖v + w‖ ≤ ‖v‖ + ‖w‖ (triangle inequality),
3. ‖v‖ = 0 implies v = 0 (definiteness).

Additionally, we need notions of convergence in this work. Since we will deal with them mostly in the context of metric spaces (with normed vector spaces and Hilbert spaces being special cases, as explained above), we focus on these notions for metric spaces.

Definition F.15 (Convergent Sequence). Let Y be a metric space. A sequence (y_k)_{k∈N} in Y converges to y ∈ Y if for every ε > 0 there is a k_0 ∈ N such that d_Y(y_k, y) < ε for all k ≥ k_0.

With this in mind, one can give an equivalent definition of continuity that applies to metric spaces: f : Y → Z is continuous in y ∈ Y if and only if for every sequence (y_k)_k converging to y, the sequence (f(y_k))_k converges to f(y).
This can be understood in terms of the function "commuting with limits": f(lim_k y_k) = lim_k f(y_k). Equivalently, the following holds: f : Y → Z is continuous in y ∈ Y if and only if for all ε > 0 there is a δ > 0 such that f(B_δ(y)) ⊆ B_ε(f(y)).

Definition F.17 (Uniform Continuity). A function f : Y → Z between metric spaces is called uniformly continuous if for each ε > 0 there is a δ > 0 such that for all y, y' ∈ Y with d_Y(y, y') < δ we obtain d_Z(f(y), f(y')) < ε.

F.2 LIMITS OF NETS AND APPROXIMATED DIRAC DELTA FUNCTIONS

In this section, we discuss "limits of nets", where a net can be imagined as a sequence over an index set which may be "too big to be handled as a sequence over the natural numbers". They appear in the formulation of Theorem C.7. This material can, for example, be found in Conway (2014).

Definition F.26 (Partially Ordered Set, Directed Set). Let I be an index set and ≤ a relation on it. I = (I, ≤) is a partially ordered set if:
1. ≤ is reflexive, i.e., i ≤ i for all i ∈ I.
2. ≤ is antisymmetric, that is: i ≤ j and j ≤ i together imply i = j.
3. ≤ is transitive, that is: i ≤ j and j ≤ k together imply i ≤ k.
A partially ordered set I is called directed if for all i, j ∈ I there exists k ∈ I such that i ≤ k and j ≤ k.

Example F.27. Clearly, the natural numbers together with the standard order relation form a directed set. An important example for our purposes is the following: let Z be any topological space (for example, a homogeneous space X of a compact group G) and x ∈ Z be any point. Furthermore, define U_x as the set of open neighborhoods of x, i.e., open sets U ⊆ Z such that x ∈ U. On this set, we define U ≤ V if U ⊇ V, i.e., by reversed inclusion. Then (U_x, ≤) is a directed set: for U, V ∈ U_x, the intersection U ∩ V is again an open neighborhood of x with U ≤ U ∩ V and V ≤ U ∩ V. Note that U_x is usually not totally ordered, i.e., there are usually U, V ∈ U_x such that neither U ⊇ V nor V ⊇ U.

Definition F.28 (Net). Let Z be any topological space and I a directed set. Then a net in Z is a function x : I → Z. We write a net as (x_i)_{i∈I}, in analogy to sequences.

Definition F.29 (Convergence of Nets). Let (x_i)_{i∈I} be a net in a topological space Z. Let x ∈ Z. We say that (x_i)_{i∈I} converges to x, written lim_{i∈I} x_i = x, if the following holds: for all open neighborhoods U of x there is an i_0 ∈ I such that for all i ≥ i_0 we have x_i ∈ U.

Now we define the approximated Dirac delta for the special case that X is a homogeneous space of a compact group G. Remember that there is a Haar measure μ on X.

Definition F.30 (Approximated Dirac Delta).
For ∅ ≠ U ⊆ X open, we define the approximated Dirac delta δ_U : X → K by

δ_U(x) := 1/μ(U) if x ∈ U, and δ_U(x) := 0 otherwise.

A priori, it is unclear that open sets have positive measure, which is needed for the well-definedness of this construction, since otherwise we divide by zero. Thus, we need the following lemma:

Lemma F.31. Let ∅ ≠ U ⊆ X be an open set. Then μ(U) > 0.

Proof. Consider the family of open sets (gU)_{g∈G}. That all of these sets are open follows since the action G × X → X is continuous, and thus, by the definition of a group action, each g ∈ G induces a homeomorphism X → X, x ↦ gx. Now, since the action is transitive, (gU)_{g∈G} is an open cover of X, and since X is compact, see Definition F.7, it has a finite subcover (g_i U)_{i=1}^n with g_i ∈ G. Note that μ(g_i U) = μ(U) for all i, since the measure μ on X is by definition left invariant under the action of G. Overall, we obtain

1 = μ(X) ≤ Σ_{i=1}^n μ(g_i U) = n · μ(U),

and thus μ(U) ≥ 1/n > 0.

F.3 PRE-HILBERT SPACES AND HILBERT SPACES

Here, we state foundational concepts in the theory of Hilbert spaces (Debnath & Mikusinski, 2005).

Definition F.32 (Pre-Hilbert Space, Hilbert Space). A pre-Hilbert space V = (V, ⟨·|·⟩) consists of the following data:
1. A vector space V over K.
2. An inner product ⟨·|·⟩ : V × V → K with the following properties for all x, x', y, y' ∈ V and λ ∈ K:
   1. The inner product is conjugate linear in the first component: ⟨x + x'|y⟩ = ⟨x|y⟩ + ⟨x'|y⟩ and ⟨λx|y⟩ = λ̄⟨x|y⟩, where λ̄ is the complex conjugate of λ.
   2. The inner product is linear in the second component: ⟨x|y + y'⟩ = ⟨x|y⟩ + ⟨x|y'⟩ and ⟨x|λy⟩ = λ⟨x|y⟩.
   3. The inner product is conjugate symmetric: ⟨x|y⟩ = ⟨y|x⟩̄.
   4. The inner product is positive definite: ⟨x|x⟩ ≥ 0, with equality if and only if x = 0.
If additionally the following statement holds, then V is called a Hilbert space:
   5. V, together with the norm ‖·‖ : V → R_{≥0} induced from the inner product by ‖x‖ := √⟨x|x⟩, and consequently the metric defined by d(x, y) := ‖x - y‖, is a complete metric space as in Definition F.20.

Remark F.33. Of course, all Hilbert spaces are pre-Hilbert spaces, and so all propositions about pre-Hilbert spaces in the following apply to Hilbert spaces just as well. Note that the first property follows from the second and third.
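The point of the approximated Dirac delta is that integrating a continuous function f against δ_U recovers f(x) as the neighborhoods U shrink toward x (this is how it enters Theorem C.7). A minimal numerical sketch on the circle S^1 with normalized arc length as the (assumed normalized) Haar measure:

```python
import numpy as np

def delta_average(f, x, eps, n=200_000):
    """Approximate the integral of delta_U * f over S^1, where U is the
    arc (x - eps, x + eps) and mu is normalized arc length, mu(S^1) = 1.
    Then mu(U) = 2*eps / (2*pi) and delta_U = 1_U / mu(U)."""
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    mu_U = 2 * eps / (2 * np.pi)
    # geodesic (wrap-around) distance on the circle:
    in_U = np.minimum(np.abs(t - x), 2 * np.pi - np.abs(t - x)) < eps
    # np.mean over the uniform grid approximates the normalized integral:
    return np.mean(np.where(in_U, f(t) / mu_U, 0.0))

f, x = np.cos, 1.0
errors = [abs(delta_average(f, x, eps) - f(x)) for eps in (1.0, 0.1, 0.01)]
# Integrating against delta_U approaches evaluation at x as U shrinks:
assert errors[2] < errors[0]
assert errors[2] < 1e-2
```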
We also mention that usually, inner products on Hilbert spaces are assumed to be linear in the first and conjugate linear in the second component, in contrast to our convention. The reason for our choice is that our work is inspired by connections to physics, where our convention is more common: it is basically the bra-ket convention. Furthermore, note that if K = R, then conjugate linear maps are linear, and thus the inner product is linear in both components; additionally, it is symmetric instead of only conjugate symmetric.

Proposition F.34 (Cauchy-Schwarz Inequality). For any two elements v, w in a pre-Hilbert space V, we have |⟨v|w⟩| ≤ ‖v‖ · ‖w‖, with equality if and only if v and w are linearly dependent.

Proof. See Debnath & Mikusinski (2005), Theorem 3.2.9.

Definition F.35 (Orthogonality). Two vectors v, w in a pre-Hilbert space V are called orthogonal, written v ⊥ w, if ⟨v|w⟩ = 0. Obviously, being orthogonal is a symmetric relation.

Definition F.36 (Orthogonal Complement). Let V be a pre-Hilbert space and W ⊆ V a subset. The orthogonal complement of W, denoted W^⊥, is the set of all vectors in V that are orthogonal to all elements of W.

Proposition F.37 (Closedness of Complements). Let W ⊆ V be a subset of a pre-Hilbert space V. Then W^⊥ is a topologically closed linear subspace of V.

Proof. See Debnath & Mikusinski (2005), Theorem 3.6.2.

Proposition F.38 (Continuity of the Scalar Product). For any pre-Hilbert space V, the scalar product ⟨·|·⟩ : V × V → K is continuous.

Proof. See Debnath & Mikusinski (2005), Theorem 3.3.12.

Definition F.39 (Orthonormal System). A family (v_i)_{i∈I} of elements in a pre-Hilbert space is called an orthonormal system if ‖v_i‖ = 1 for all i ∈ I and v_i ⊥ v_j for all i ≠ j.

Definition F.40 (Orthonormal Basis). An orthonormal system (v_i)_{i∈I} in a Hilbert space V is called an orthonormal basis if the linear span of {v_i}_{i∈I} is dense in V. If this is the case, then each v ∈ V can be uniquely written as v = Σ_{i∈I} α_i v_i with only countably many α_i ∈ K being nonzero.
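The bra-ket convention (conjugate linear in the first slot) and the Cauchy-Schwarz inequality are easy to probe numerically. A small illustrative sketch with random complex vectors, not tied to any particular space in the text; note that NumPy's `vdot` conjugates its first argument, matching our convention:

```python
import numpy as np

rng = np.random.default_rng(2)

def inner(x, y):
    # Bra-ket convention: conjugate linear in the FIRST argument.
    return np.vdot(x, y)  # np.vdot conjugates its first argument

v = rng.normal(size=5) + 1j * rng.normal(size=5)
w = rng.normal(size=5) + 1j * rng.normal(size=5)

# Cauchy-Schwarz: |<v|w>| <= ||v|| ||w||, with equality for dependent vectors:
assert abs(inner(v, w)) <= np.linalg.norm(v) * np.linalg.norm(w)
assert np.isclose(abs(inner(v, 3j * v)), np.linalg.norm(v) * np.linalg.norm(3j * v))

# Conjugate symmetry and conjugate linearity in the first slot:
assert np.isclose(inner(v, w), np.conj(inner(w, v)))
assert np.isclose(inner(2j * v, w), np.conj(2j) * inner(v, w))
```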
The coefficients are given by α_i = ⟨v_i|v⟩. We stress that while the index set I can be uncountably infinite, the expansion of each element of V has only countably many nonzero entries. It is obvious from the Peter-Weyl Theorem B.22 and this definition that the basis functions appearing there form an orthonormal basis of L^2_K(X).

Proposition F.41 (Gram-Schmidt Orthonormalization). For every linearly independent sequence (y_k)_k with N ∈ N ∪ {∞} elements in a pre-Hilbert space V, one can find an orthonormal sequence (v_k)_k in V such that the following holds: for all n ∈ N with n ≤ N, the progressive linear span stays the same: span_K(v_1, ..., v_n) = span_K(y_1, ..., y_n). In particular, since every finite-dimensional Hilbert space has a vector space basis, it necessarily also has an orthonormal basis.

Proof. See Debnath & Mikusinski (2005), page 110.

Definition F.42 (Adjoint of an Operator). Let f : V → V' be a continuous linear function between Hilbert spaces. Then there is a unique continuous linear function f* : V' → V such that for all v ∈ V and v' ∈ V' one has:

⟨f*(v')|v⟩ = ⟨v'|f(v)⟩.

The existence of adjoints is, for example, discussed in Debnath & Mikusinski (2005), page 158. This book only considers the case of operators from a Hilbert space to itself, but these considerations generalize to the setting with two different Hilbert spaces. One has the following:

Proposition F.43. Let f : V → V' and g : V' → V'' be continuous linear functions between Hilbert spaces. Then (f*)* = f and (g ∘ f)* = f* ∘ g*.

Proof. All of these properties follow directly from the uniqueness of adjoints.

Proposition F.44. Let f : V → V' be a unitary transformation between Hilbert spaces, i.e., an invertible linear function such that ⟨f(v)|f(w)⟩ = ⟨v|w⟩ for all v, w ∈ V. Then the adjoint is the inverse, i.e., f* = f^{-1}.

Proof. First of all, the inverse f^{-1} is again continuous due to the unitarity of f. Furthermore, due to the unitarity, we obtain

⟨f^{-1}(v')|v⟩ = ⟨f(f^{-1}(v'))|f(v)⟩ = ⟨v'|f(v)⟩

for all v ∈ V and v' ∈ V'.
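In coordinates, with f(v) = Av for a complex matrix A, the defining property ⟨f*(v')|v⟩ = ⟨v'|f(v)⟩ is satisfied by the conjugate transpose, and the statements of Propositions F.43 and F.44 can be checked numerically. A sketch with random matrices (a unitary map is obtained from a QR decomposition):

```python
import numpy as np

rng = np.random.default_rng(3)
inner = np.vdot  # conjugate linear in the first argument (bra-ket convention)

A = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))  # f : V -> V'
B = rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))  # g : V' -> V''
v = rng.normal(size=3) + 1j * rng.normal(size=3)
vp = rng.normal(size=4) + 1j * rng.normal(size=4)

# Defining property of the adjoint, with f* realized as A.conj().T:
assert np.isclose(inner(A.conj().T @ vp, v), inner(vp, A @ v))

# (g o f)* = f* o g*:
assert np.allclose((B @ A).conj().T, A.conj().T @ B.conj().T)

# For unitary f, the adjoint is the inverse: f* = f^{-1}:
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
assert np.allclose(Q.conj().T @ Q, np.eye(4))
assert np.allclose(Q.conj().T, np.linalg.inv(Q))
```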
Due to the uniqueness of adjoints, we obtain f^{-1} = f*.

The following proposition is sometimes used in the main text:

Proposition F.45. Let v, w ∈ V be two elements in a pre-Hilbert space such that ⟨v|u⟩ = ⟨w|u⟩ for all u ∈ V. Then v = w.

Proof. We have ⟨v - w|u⟩ = ⟨v|u⟩ - ⟨w|u⟩ = 0 for all u ∈ V. In particular, setting u = v - w we obtain ⟨v - w|v - w⟩ = 0 and thus v - w = 0, i.e., v = w.

Proposition F.46 (Orthogonal Projection Operators). Let W ⊆ V be a topologically closed subspace of a Hilbert space. Then there is a continuous linear function P : V → W such that for all v ∈ V and w ∈ W we have ⟨P(v)|w⟩ = ⟨v|w⟩. Furthermore, if W is finite-dimensional and w_1, ..., w_n is an orthonormal basis of W, then P is given explicitly by

P(v) = Σ_{i=1}^n ⟨w_i|v⟩ w_i.

Proof. That W is topologically closed means that W, with the scalar product inherited from V, is a complete metric space. Thus, W is a Hilbert space as well. Therefore, the continuous linear embedding i : W → V given by w ↦ w has an adjoint i* : V → W by Definition F.42. Set P := i*. For arbitrary v ∈ V and w ∈ W we obtain: ⟨P(v)|w⟩ = ⟨i*(v)|w⟩ = ⟨v|i(w)⟩ = ⟨v|w⟩.

For the second statement, note that for all j ∈ {1, ..., n} we have, using that the w_i are orthonormal:

⟨Σ_{i=1}^n ⟨w_i|v⟩ w_i | w_j⟩ = Σ_{i=1}^n ⟨w_i|v⟩̄ ⟨w_i|w_j⟩ = ⟨w_j|v⟩̄ = ⟨v|w_j⟩,

so the explicit formula satisfies the defining property of P.

Proposition F.47. Let (V, ⟨·|·⟩) be a finite-dimensional pre-Hilbert space. Then this space is already complete and thus a Hilbert space. In particular, all finite-dimensional subspaces of Hilbert spaces are topologically closed.

Proof. The proof of the Gram-Schmidt orthonormalization in Proposition F.41 does not make use of the completeness of the Hilbert space, and thus it holds for pre-Hilbert spaces as well. Consequently, V, being finite-dimensional, has an orthonormal basis. It is thus isomorphic to K^n together with the standard scalar product, which is well known to be complete. Thus, V is a Hilbert space. Now, let W ⊆ V be a finite-dimensional subspace of a Hilbert space V, which may be infinite-dimensional. Then W is a pre-Hilbert space and, by what was just shown, a Hilbert space.
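The explicit projection formula P(v) = Σ_i ⟨w_i|v⟩ w_i from Proposition F.46 can be checked directly. A sketch projecting onto a random 2-dimensional subspace of C^5, with an orthonormal basis produced by a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(4)
inner = np.vdot  # conjugate linear in the first argument

# Orthonormal basis w_1, w_2 of a 2-dimensional subspace W of C^5:
M = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))
W, _ = np.linalg.qr(M)  # columns of W are orthonormal

def project(v):
    # P(v) = sum_i <w_i|v> w_i
    return sum(inner(W[:, i], v) * W[:, i] for i in range(W.shape[1]))

v = rng.normal(size=5) + 1j * rng.normal(size=5)
Pv = project(v)

# Defining property: <P(v)|w> = <v|w> for all w in W
# (enough to check on the basis vectors):
for j in range(2):
    assert np.isclose(inner(Pv, W[:, j]), inner(v, W[:, j]))

# P restricts to the identity on W, hence is idempotent:
assert np.allclose(project(Pv), Pv)
```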
Consequently, any sequence in W that converges in V is in particular Cauchy and thus, by the completeness of W, converges to a limit in W; by the uniqueness of limits, the original limit already lies in W. This shows that W is topologically closed.

