EFFICIENT GENERALIZED SPHERICAL CNNS

Abstract

Many problems across computer vision and the natural sciences require the analysis of spherical data, for which representations may be learned efficiently by encoding equivariance to rotational symmetries. We present a generalized spherical CNN framework that encompasses various existing approaches and allows them to be leveraged alongside each other. The only existing non-linear spherical CNN layer that is strictly equivariant has complexity $\mathcal{O}(C^2 L^5)$, where $C$ is a measure of representational capacity and $L$ the spherical harmonic bandlimit. Such a high computational cost often prohibits the use of strictly equivariant spherical CNNs. We develop two new strictly equivariant layers with reduced complexity $\mathcal{O}(C L^4)$ and $\mathcal{O}(C L^3 \log L)$, making larger, more expressive models computationally feasible. Moreover, we adopt efficient sampling theory to achieve further computational savings. We show that these developments allow the construction of more expressive hybrid models that achieve state-of-the-art accuracy and parameter efficiency on spherical benchmark problems.

1. INTRODUCTION

Many fields involve data that live inherently on spherical manifolds, e.g. 360° photo and video content in virtual reality and computer vision, the cosmic microwave background radiation from the Big Bang in cosmology, topographic and gravitational maps in planetary sciences, and molecular shape orientations in molecular chemistry, to name just a few. Convolutional neural networks (CNNs) have been tremendously effective for data defined on Euclidean domains, such as the 1D line, 2D plane, or nD volumes, thanks in part to their translational equivariance properties. However, these techniques are not effective for data defined on spherical manifolds, which have a very different geometric structure to Euclidean spaces (see Appendix A). To transfer the remarkable success of deep learning to data defined on spherical domains, deep learning techniques defined inherently on the sphere are required.

Recently, a number of spherical CNN constructions have been proposed. Existing CNN constructions on the sphere fall broadly into three categories: fully real (i.e. pixel) space approaches (e.g. Boomsma & Frellsen, 2017; Jiang et al., 2019; Perraudin et al., 2019; Cohen et al., 2019); combined real and harmonic space approaches (Cohen et al., 2018; Esteves et al., 2018; 2020); and fully harmonic space approaches (Kondor et al., 2018). Real space approaches can often be computed efficiently, but they necessarily provide an approximate representation of spherical signals, and the connection to the underlying continuous symmetries of the sphere is lost. Consequently, such approaches cannot fully capture rotational equivariance.

Other constructions take a combined real and harmonic space approach (Cohen et al., 2018; Esteves et al., 2018; 2020), where sampling theorems (Driscoll & Healy, 1994; Kostelec & Rockmore, 2008) are exploited to connect with underlying continuous signal representations and so capture the continuous symmetries of the sphere. However, in these approaches non-linear activation functions are computed pointwise in real space, which induces aliasing errors that break strict rotational equivariance.

Fully harmonic space spherical CNNs have been constructed by Kondor et al. (2018). By using harmonic signal representations throughout, this approach maintains a connection with the underlying continuous signal representations at every layer. Consequently, this is the only approach exhibiting strict rotational equivariance. However, strict equivariance comes at great computational cost, which can often prohibit usage.

In this article we present a generalized framework for CNNs on the sphere (and rotation group), which encompasses and builds on the influential approaches of Cohen et al. (2018), Esteves et al. (2018) and Kondor et al. (2018) and allows them to be leveraged alongside each other. We adopt a harmonic signal representation in order to retain the connection with underlying continuous representations and thus capture all symmetries and geometric properties of the sphere. We construct new fully harmonic (non-linear) spherical layers that are strictly rotationally equivariant, are parameter-efficient, and dramatically reduce computational cost compared to similar approaches. This is achieved by a channel-wise structure, constrained generalized convolutions, and an optimized degree mixing set determined by a minimum spanning tree. Furthermore, we adopt efficient sampling theorems on the sphere (McEwen & Wiaux, 2011) and rotation group (McEwen et al., 2015a) to improve efficiency compared to the sampling theorems used in existing approaches (Driscoll & Healy, 1994; Kostelec & Rockmore, 2008). We demonstrate state-of-the-art performance on all spherical benchmark problems considered, both in terms of accuracy and parameter efficiency.

2. GENERALIZED SPHERICAL CNNS

We first review the theoretical underpinnings of the spherical CNN frameworks introduced by Cohen et al. (2018), Esteves et al. (2018), and Kondor et al. (2018), which make a connection to underlying continuous signals through harmonic representations. For more in-depth treatments of the underlying harmonic analysis we recommend Esteves (2020), Kennedy & Sadeghi (2013) and Gallier & Quaintance (2019). We then present a generalized spherical layer that encompasses these and other existing frameworks, allowing them to be easily integrated and leveraged alongside each other in hybrid networks. Throughout the following we consider a network composed of $S$ rotationally equivariant layers $\mathcal{A}^{(1)}, \ldots, \mathcal{A}^{(S)}$, where the $i$-th layer $\mathcal{A}^{(i)}$ maps an input activation $f^{(i-1)} \in \mathcal{H}^{(i-1)}$ onto an output activation $f^{(i)} \in \mathcal{H}^{(i)}$. We focus on the case where the network input space $\mathcal{H}^{(0)}$ consists of spherical signals (but note that input signals on the rotation group may also be considered).

2.1. SIGNALS ON THE SPHERE AND ROTATION GROUP

Let $L^2(\Omega)$ denote the space of square-integrable functions over domain $\Omega$. A signal $f \in L^2(\Omega)$ on the sphere ($\Omega = \mathbb{S}^2$) or rotation group ($\Omega = \mathrm{SO}(3)$) can be rotated by $\rho \in \mathrm{SO}(3)$ by defining the action of rotation on signals by $\mathcal{R}_\rho f(\omega) = f(\rho^{-1}\omega)$ for $\omega \in \Omega$. An operator $\mathcal{A} : L^2(\Omega_1) \to L^2(\Omega_2)$, where $\Omega_1, \Omega_2 \in \{\mathbb{S}^2, \mathrm{SO}(3)\}$, is then equivariant to rotations if $\mathcal{R}_\rho(\mathcal{A}(f)) = \mathcal{A}(\mathcal{R}_\rho f)$ for all $f \in L^2(\Omega_1)$ and $\rho \in \mathrm{SO}(3)$, i.e. rotating the function before application of the operator is equivalent to application of the operator first, followed by a rotation.

A spherical signal $f \in L^2(\mathbb{S}^2)$ admits a harmonic representation $(\hat{f}^0, \hat{f}^1, \ldots)$, where $\hat{f}^\ell \in \mathbb{C}^{2\ell+1}$ are the harmonic coefficients given by the inner products $\langle f, Y^\ell_m \rangle$, and $Y^\ell_m$ are the spherical harmonic functions of degree $\ell$ and order $|m| \leq \ell$. Likewise, a signal $f \in L^2(\mathrm{SO}(3))$ on the rotation group admits a harmonic representation $(\hat{f}^0, \hat{f}^1, \ldots)$, where $\hat{f}^\ell \in \mathbb{C}^{(2\ell+1) \times (2\ell+1)}$ are the harmonic coefficients with $(m, n)$-th entry $\langle f, D^\ell_{mn} \rangle$ for integers $|m|, |n| \leq \ell$, and $D^\ell : \mathrm{SO}(3) \to \mathbb{C}^{(2\ell+1) \times (2\ell+1)}$ is the unique $(2\ell+1)$-dimensional irreducible group representation of $\mathrm{SO}(3)$ on $\mathbb{C}^{2\ell+1}$. The rotation $f \mapsto \mathcal{R}_\rho f$ of a signal $f \in L^2(\Omega)$ can be described in harmonic space by $\hat{f}^\ell \mapsto D^\ell(\rho) \hat{f}^\ell$.

A signal on the sphere or rotation group is said to be bandlimited at $L$ if, respectively, $\langle f, Y^\ell_m \rangle = 0$ or $\langle f, D^\ell_{mn} \rangle = 0$ for $\ell \geq L$. Furthermore, a signal on the rotation group is said to be azimuthally bandlimited at $N$ if, additionally, $\langle f, D^\ell_{mn} \rangle = 0$ for $|n| \geq N$. Bandlimited signals therefore admit finite harmonic representations $(\hat{f}^0, \ldots, \hat{f}^{L-1})$. In practice real-world signals can be accurately represented by suitably bandlimited signals; henceforth, we assume signals are bandlimited.
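To make the size of these finite harmonic representations concrete, the following minimal Python sketch counts the coefficients implied by the definitions above: a vector of length $2\ell+1$ per degree on the sphere, and a $(2\ell+1) \times (2\ell+1)$ matrix per degree on the rotation group, with only the columns $|n| < N$ retained under an azimuthal bandlimit $N$. The function names are hypothetical and chosen purely for illustration; they are not taken from any library or from the implementation accompanying this article.

```python
def sphere_coefficient_count(L):
    """Number of harmonic coefficients of a spherical signal bandlimited at L."""
    # Degree l contributes a vector of length 2l + 1, for l = 0, ..., L - 1.
    return sum(2 * l + 1 for l in range(L))


def rotation_group_coefficient_count(L, N=None):
    """Number of harmonic coefficients of a signal on SO(3) bandlimited at L.

    If N is given, the signal is also assumed azimuthally bandlimited at N,
    so only columns with |n| < N can be non-zero.
    """
    total = 0
    for l in range(L):
        rows = 2 * l + 1  # orders m with |m| <= l
        cols = 2 * l + 1  # orders n with |n| <= l
        if N is not None:
            # Keep only the orders n with |n| < N (at most 2N - 1 of them).
            cols = min(cols, max(2 * N - 1, 0))
        total += rows * cols
    return total


if __name__ == "__main__":
    L = 4
    print(sphere_coefficient_count(L))             # 1 + 3 + 5 + 7 = 16
    print(rotation_group_coefficient_count(L))     # 1 + 9 + 25 + 49 = 84
    print(rotation_group_coefficient_count(L, 1))  # 1 + 3 + 5 + 7 = 16
```

The counts grow as $L^2$ on the sphere and as $L(2L-1)(2L+1)/3$ on the rotation group, which gives a rough sense of why signals on the rotation group are more costly to store and process, and why a small azimuthal bandlimit $N$ reduces that cost.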

2.2. CONVOLUTION ON THE SPHERE AND ROTATION GROUP

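A standard definition of convolution between two signals $f, \psi \in L^2(\Omega)$ on either the sphere ($\Omega = \mathbb{S}^2$) or rotation group ($\Omega = \mathrm{SO}(3)$) is given by

$$(f \star \psi)(\rho) = \langle f, \mathcal{R}_\rho \psi \rangle = \int_\Omega \mathrm{d}\mu(\omega)\, f(\omega)\, \psi^*(\rho^{-1}\omega), \tag{1}$$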

