LAPLACIAN EIGENSPACES, HOROCYCLES AND NEURON MODELS ON HYPERBOLIC SPACES

Abstract

We use the hyperbolic Poisson kernel to construct the horocycle neuron model on hyperbolic spaces, a spectral generalization of the classical neuron model. We prove a universal approximation theorem for horocycle neurons. As a corollary, we obtain a state-of-the-art result on the expressivity of $f^1_{a,p}$, a function used in hyperbolic multiple linear regression. Our experiments achieve state-of-the-art results on the Poincaré-embedding subtree classification task and on the classification accuracy of two-dimensional visualizations of images.

1. INTRODUCTION

Conventional deep learning techniques use architectures based on compositions of simple functions to learn representations of Euclidean data (LeCun et al., 2015). They have achieved remarkable successes in a wide range of applications (Hinton et al., 2012; He et al., 2016). Geometric deep learning, a niche field that has caught the attention of many authors, attempts to generalize conventional learning techniques to non-Euclidean spaces (Bronstein et al., 2017; Monti et al., 2017). There has been growing interest in using hyperbolic spaces in machine learning tasks because they are well suited for representing tree-like data (Ontrup & Ritter, 2005; Alanis-Lobato et al., 2016; Nickel & Kiela, 2017; Chamberlain et al., 2018; Nickel & Kiela, 2018; Sala et al., 2018; Ganea et al., 2018b; Tifrea et al., 2019; Chami et al., 2019; Liu et al., 2019; Balazevic et al., 2019; Yu & Sa, 2019; Gulcehre et al., 2019; Law et al., 2019). Many authors have introduced hyperbolic analogs of classical learning tools (Ganea et al., 2018a; Cho et al., 2019; Nagano et al., 2019; Grattarola et al., 2019; Mathieu et al., 2019; Ovinnikov, 2020; Khrulkov et al., 2020; Shimizu et al., 2020). Spectral methods are successful in machine learning, from nonlinear dimensionality reduction (Belkin & Partha, 2002) to clustering (Shi & Malik, 2000; Ng et al., 2002), hashing (Weiss et al., 2009), graph CNNs (Bruna et al., 2014), spherical CNNs (Cohen et al., 2018), and inference networks (Pfau et al., 2019). Spectral methods have been applied to learning tasks on spheres (Cohen et al., 2018) and graphs (Bruna et al., 2014), but not yet on hyperbolic spaces. This paper studies a spectral generalization of the FC (affine) layer on hyperbolic spaces.

Before presenting this generalization, we introduce some notation. Let $(\cdot,\cdot)_E$ be the Euclidean inner product, $|\cdot|$ the Euclidean norm, and $\rho$ an activation function. The Poincaré ball model of the hyperbolic space $H^n$ ($n \ge 2$) is the manifold $\{x \in \mathbb{R}^n : |x| < 1\}$ equipped with the Riemannian metric $ds^2_{H^n} = \sum_{i=1}^{n} 4(1-|x|^2)^{-2}\, dx_i^2$. The boundary of $H^n$ under its canonical embedding in $\mathbb{R}^n$ is the unit sphere $S^{n-1}$.

The classical neuron $y = \rho((x,w)_E + b)$ has input $x \in \mathbb{R}^n$ and output $y \in \mathbb{R}$, with trainable parameters $w \in \mathbb{R}^n$, $b \in \mathbb{R}$. An affine layer $\mathbb{R}^n \to \mathbb{R}^m$ is a concatenation of $m$ neurons. An alternative representation of the neuron $x \mapsto \rho((x,w)_E + b)$ is given by

$x \in \mathbb{R}^n \mapsto \rho(\lambda (x,\omega)_E + b), \quad \omega \in S^{n-1},\ \lambda, b \in \mathbb{R}.$   (1)

(If $w \neq (0, \dots, 0)$, one can take $\omega = w/|w|$ and $\lambda = |w|$; otherwise, one can take $\lambda = 0$ and any $\omega \in S^{n-1}$.) This neuron is constant over any hyperplane perpendicular to the fixed direction $\omega$.

In $H^n$, a horocycle is an $(n-1)$-dimensional sphere (with one point deleted) that is tangent to $S^{n-1}$. Horocycles are the hyperbolic counterparts of hyperplanes (Bonola, 2012). Horocyclic waves $\langle x, \omega \rangle_H := \frac{1}{2} \log \frac{1-|x|^2}{|x-\omega|^2}$ are constant over any horocycle tangent to $S^{n-1}$ at $\omega$. We therefore consider

$x \in H^n \mapsto \rho(\lambda \langle x, \omega \rangle_H + b), \quad \omega \in S^{n-1},\ \lambda, b \in \mathbb{R},$   (2)

which generalizes the classical neuron model (1); likewise, a concatenation of finitely many neurons (2) generalizes the FC (affine) layer. We call (2) a horocycle neuron. Figure 1 (middle) shows an example on $H^2$.
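For concreteness, the following is a minimal NumPy sketch of the horocycle neuron (2). The function names horocycle_wave and horocycle_neuron are ours; this is an illustration of the definition, not the paper's implementation.

```python
# Minimal sketch of the horocycle neuron (2); illustrative only.
import numpy as np

def horocycle_wave(x, omega):
    """Horocyclic wave <x, omega>_H = (1/2) log((1 - |x|^2) / |x - omega|^2).

    x     : point in the Poincare ball H^n (|x| < 1)
    omega : boundary point on the unit sphere S^{n-1} (|omega| = 1)
    """
    diff = x - omega
    return 0.5 * np.log((1.0 - np.dot(x, x)) / np.dot(diff, diff))

def horocycle_neuron(x, omega, lam, b, rho=np.tanh):
    """Horocycle neuron (2): x -> rho(lam * <x, omega>_H + b)."""
    return rho(lam * horocycle_wave(x, omega) + b)

# Example on H^2 with omega = (1, 0) and rho = tanh, as in Figure 1 (middle).
x = np.array([0.3, -0.2])
omega = np.array([1.0, 0.0])
print(horocycle_neuron(x, omega, lam=1.0, b=0.0))
```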
The neuron models (1, 2) are related to spectral theory because $(\cdot,\omega)_E$ (respectively $\langle \cdot,\omega \rangle_H$) are building blocks of the Euclidean (respectively hyperbolic) Laplacian eigenspace. Moreover, many $L^2$ spaces have a basis given by Laplacian eigenfunctions (Einsiedler & Ward, 2017). On one side, all Euclidean (respectively hyperbolic) eigenfunctions are some kind of "superposition" of $(\cdot,\omega)_E$ (respectively $\langle \cdot,\omega \rangle_H$). On the other side, neural networks based on (1) (respectively (2)) represent functions that are another kind of "superposition" of $(\cdot,\omega)_E$ (respectively $\langle \cdot,\omega \rangle_H$). This heuristically explains why the universal approximation property is likely to hold for networks constructed from (1) and (2). Using the Hahn-Banach theorem, an injectivity theorem of Helgason, and an integral formula, we prove that finite sums of horocycle neurons (2) are universal approximators (Theorem 2).

Let $p \in H^n$, let $T_p(H^n)$ be the tangent space of $H^n$ at $p$, let $a \in T_p(H^n)$, and let $\oplus$ be the Möbius addition (Ungar, 2008). We remind the reader that the functions

$f^1_{a,p}(x) = \frac{2|a|}{1-|p|^2} \sinh^{-1}\!\left(\frac{2(-p \oplus x,\, a)_E}{(1 - |{-p} \oplus x|^2)\,|a|}\right)$   (3)

are building blocks of many hyperbolic learning tools (Ganea et al., 2018a; Mathieu et al., 2019; Shimizu et al., 2020). Figure 1 illustrates examples of the different neuron models (1, 2, 3) on $H^2$. In Lemma 1, we present a close relationship between (2) and (3). Using this relationship and Theorem 2, we obtain a novel result on the expressivity of $f^1_{a,p}$ (Corollary 1).

This article contributes to hyperbolic learning. We are the first to apply spectral methods, such as the horocycle, to hyperbolic deep learning. We prove results on the expressivity of horocycle neurons (2) and of $f^1_{a,p}$ (3). With horocycle neurons, we obtain state-of-the-art results on the Poincaré-embedding subtree classification task and on the classification accuracy of two-dimensional visualizations of images in our experiments.
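To fix ideas before turning to related work, here is a minimal NumPy sketch of (3), assuming curvature -1 and the standard Möbius addition formula of Ungar (2008). The names mobius_add and f1 are ours for illustration; this is not the authors' code.

```python
# Minimal sketch of Mobius addition and f^1_{a,p} in (3); illustrative only.
import numpy as np

def mobius_add(u, v):
    """Mobius addition u (+) v in the Poincare ball (curvature -1)."""
    uv = np.dot(u, v)
    u2, v2 = np.dot(u, u), np.dot(v, v)
    num = (1.0 + 2.0 * uv + v2) * u + (1.0 - u2) * v
    den = 1.0 + 2.0 * uv + u2 * v2
    return num / den

def f1(x, a, p):
    """f^1_{a,p}(x) from (3); requires a != 0 and x, p in the Poincare ball."""
    y = mobius_add(-p, x)                     # -p (+) x
    norm_a = np.linalg.norm(a)
    arg = 2.0 * np.dot(y, a) / ((1.0 - np.dot(y, y)) * norm_a)
    return (2.0 * norm_a / (1.0 - np.dot(p, p))) * np.arcsinh(arg)

# Values matching Figure 1 (right): a = (1, 0), p = (0.5, 0), rho = tanh.
x = np.array([0.3, -0.2])
print(np.tanh(f1(x, a=np.array([1.0, 0.0]), p=np.array([0.5, 0.0]))))
```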

2. RELATED WORK

Universal approximation There is a vast literature on universal approximation (Cybenko, 1989; Hornik et al., 1989; Funahashi, 1989; Leshno et al., 1993). Cybenko (1989)'s existential approach uses the Hahn-Banach theorem and the Fourier transform of Radon measures. To prove Theorem 2, we also use the Hahn-Banach theorem, together with an integral formula (7) and an injectivity theorem of Helgason (Theorem 1). Generalizing integral formulas and injectivity theorems is easier than generalizing the Fourier transform of Radon measures to most non-Euclidean spaces. Carroll & Dickinson (1989) use the inverse Radon transform to prove universal approximation theorems. Their method relates to ours, as injectivity theorems are akin to inverse Radon transforms. However, using the injectivity theorem is an existential approach, while using the inverse Radon transform is a constructive one.

Spectral methods Bronstein et al. (2017); Bruna et al. (2014); Cohen et al. (2018) use a basis of $L^2(X)$ given by eigenfunctions, where $X$ is a finite graph or the sphere. Because $L^2(H^n)$ has no basis of eigenfunctions, our approach is different from theirs.

Hyperbolic deep learning One part of hyperbolic learning concerns embedding data into hyperbolic space (Nickel & Kiela, 2017; Sala et al., 2018). Another part concerns learning architectures that take hyperbolic data as input (Ganea et al., 2018a; Cho et al., 2019). Ganea et al. (2018a) propose two ways to generalize the affine layer to hyperbolic spaces: one replaces the linear and bias parts of an affine map with equations (25, 26) of their paper; the other uses a concatenation of $f^1_{a,p}$ (3), as in hyperbolic multiple linear regression.






Figure 1: (Left) $\rho((\cdot, \omega)_E)$; (middle) $\rho(\langle \cdot, \omega \rangle_H)$; (right) $\rho(f^1_{a,p}(\cdot))$. In this figure, $\omega = (1, 0)$, $a = (1, 0)$, $p = (0.5, 0)$, and $\rho$ is $\tanh$. The colorbar represents function values.


