BRINGING ROBOTICS TAXONOMIES TO CONTINUOUS DOMAINS VIA GPLVM ON HYPERBOLIC MANIFOLDS

Abstract

Robotic taxonomies have appeared as high-level hierarchical abstractions that classify how humans move and interact with their environment. They have proven useful to analyse grasps, manipulation skills, and whole-body support poses. Despite the efforts devoted to designing their hierarchy and underlying categories, their use in application fields remains scarce. This may be attributed to the lack of computational models that fill the gap between the discrete hierarchical structure of the taxonomy and the high-dimensional heterogeneous data associated with its categories. To overcome this problem, we propose to model taxonomy data via hyperbolic embeddings that capture the associated hierarchical structure. To do so, we formulate a Gaussian process hyperbolic latent variable model and enforce the taxonomy structure through graph-based priors on the latent space and distance-preserving back constraints. We test our model on the whole-body support pose taxonomy to learn hyperbolic embeddings that comply with the original graph structure. We show that our model properly encodes unseen poses from existing or new taxonomy categories, that it can be used to generate trajectories between the embeddings, and that it outperforms its Euclidean counterparts.

1. INTRODUCTION

Roboticists are often inspired by biological insights to create robotic systems that exhibit human- or animal-like capabilities (Siciliano & Khatib, 2016). In particular, it is first necessary to understand how humans move and interact with their environment in order to then generate biologically-inspired motions and behaviors of robotic hands, arms, or humanoids. In this endeavor, researchers proposed to structure and categorize human hand postures and body poses into hierarchical classifications known as taxonomies. Their structure depends on the variables considered to categorize human motions and their interactions with the environment, as well as on associated qualitative measures. Different taxonomies have been proposed in the area of human and robot grasping (Cutkosky, 1989; Feix et al., 2016; Abbasi et al., 2016; Stival et al., 2019). Feix et al. (2016) introduced a taxonomy of hand grasps whose structure was mainly defined by the hand pose and the type of contact with the object. Later, Stival et al. (2019) claimed that the taxonomy designed in (Feix et al., 2016) heavily depended on subjective qualitative measures, and proposed a quantitative tree-like taxonomy of hand grasps based on muscular and kinematic patterns. A similar data-driven approach was used to design a grasp taxonomy based on sensed contact forces in (Abbasi et al., 2016). Robotic manipulation also gave rise to various taxonomies. Bullock et al. (2013) introduced a hand-centric manipulation taxonomy that classifies manipulation skills according to the type of contact with the objects and the object motion imparted by the hand. A different strategy was developed in (Paulius et al., 2019), where a manipulation taxonomy was designed based on a categorization of contacts and motion trajectories. Humanoid robotics also made significant efforts to analyze human motions, thus proposing taxonomies as high-level abstractions of human motion configurations. Borràs et al.
(2017) analyzed the contacts of the human limbs with the environment and designed a taxonomy of whole-body support poses. In addition to being used for analysis purposes in robotics or biomechanics, some of the aforementioned taxonomies were leveraged for modeling grasp actions (Romero et al., 2010; Lin & Sun, 2015), for planning contact-aware whole-body pose sequences (Mandery et al., 2016), and for learning manipulation skill embeddings (Paulius et al., 2020). However, although most taxonomies carry a well-defined hierarchical structure, this structure is often overlooked. First, these taxonomies are usually exploited for classification tasks whose target classes are mainly the tree leaves, disregarding the full taxonomy structure (Feix et al., 2016; Abbasi et al., 2016). Second, the discrete representation of the taxonomy categories hinders their use for motion generation (Romero et al., 2010). We believe that the difficulty of leveraging robotic taxonomies is due to the lack of computational models that exploit (i) the domain knowledge encoded in the hierarchy, and (ii) the information of the high-dimensional data associated with the taxonomy categories. We tackle this problem from a representation learning perspective by modeling taxonomy data as embeddings that capture the associated hierarchical structure.

Figure 1: Left: Illustration of the Lorentz L^2 and Poincaré P^2 models of the hyperbolic manifold. The former is depicted as the gray hyperboloid, while the latter is represented by the blue circle. Both models show a geodesic between two points x_1 and x_2. The vector u lies on the tangent space of x_1 such that u = Log_{x_1}(x_2). Right: Subset of the whole-body support pose taxonomy (Borràs et al., 2017) used in our experiments. Each node is a support pose defined by the type of contacts (foot F, hand H, knee K). The lines represent graph transitions between the taxonomy nodes. Contacts are depicted by grey dots.
Inspired by recent advances on word embeddings (Nickel & Kiela, 2017; 2018; Mathieu et al., 2019), we propose to leverage the hyperbolic manifold (Ratcliffe, 2019) to learn such embeddings. An important property of the hyperbolic manifold is that distances grow exponentially when moving away from the origin, and shortest paths between distant points tend to pass through the origin, resembling a continuous hierarchical structure. Therefore, we hypothesize that the geometry of the hyperbolic manifold allows us to learn embeddings that comply with the original graph structure of robotic taxonomies. Specifically, we propose a Gaussian process hyperbolic latent variable model (GPHLVM) to learn embeddings of taxonomy data on the hyperbolic manifold. To do so, we impose a hyperbolic geometry on the latent space of the well-known GPLVM (Lawrence, 2003; Titsias & Lawrence, 2010). This requires reformulating the Gaussian distribution, the kernel, and the optimization process of the vanilla GPLVM to account for the geometry of the hyperbolic latent space. To this end, we leverage the hyperbolic wrapped Gaussian distribution (Nagano et al., 2019), and provide a positive-definite-guaranteed approximation of the hyperbolic kernel proposed by McKean (1970). Moreover, we resort to Riemannian optimization (Absil et al., 2007; Boumal, 2022) to optimize the GPHLVM parameters. We enforce the taxonomy graph structure in the learned embeddings through graph-based priors on the latent space and via graph-distance-preserving back constraints (Lawrence & Quiñonero Candela, 2006; Urtasun et al., 2008). Our GPHLVM is conceptually similar to the GPLVM for Lie groups introduced in (Jensen et al., 2020), which also imposes geometric properties on the GPLVM latent space. However, our formulation is specifically designed for the hyperbolic manifold and fully built on tools from Riemannian geometry.
Moreover, unlike (Tosi et al., 2014) and (Jørgensen & Hauberg, 2021), where the latent space was endowed with a pullback Riemannian metric learned via the GPLVM mapping, we impose the hyperbolic geometry on the GPHLVM latent space as an inductive bias adapted to our targeted applications. We test our approach on graphs extracted from the whole-body support pose taxonomy (Borràs et al., 2017). The proposed GPHLVM learns hyperbolic embeddings of the body support poses that comply with the original graph structure, and properly encodes unseen poses from existing or new taxonomy nodes. Moreover, we show how we can exploit the continuous geometry of the hyperbolic manifold to generate trajectories between different pairs of embeddings, which comply with the taxonomy graph structure. To the best of our knowledge, this paper is the first to leverage the hyperbolic manifold for robotic applications.

2. BACKGROUND

Gaussian Process Latent Variable Models: A GPLVM defines a generative mapping from latent variables {x_n}_{n=1}^N, x_n ∈ R^Q, to observations {y_n}_{n=1}^N, y_n ∈ R^D, by modeling the corresponding non-linear transformation with Gaussian processes (GPs) (Lawrence, 2003). The GPLVM is described as

y_{n,d} ∼ N(y_{n,d}; f_{n,d}, σ_d²) with f_{n,d} ∼ GP(m_d(x_n), k_d(x_n, x_n')) and x_n ∼ N(0, I), (1)

where y_{n,d} denotes the d-th dimension of the observation y_n, m_d(·): R^Q → R and k_d(·,·): R^Q × R^Q → R are the GP mean and kernel function, respectively, and σ_d² is a hyperparameter. Classically, the hyperparameters and latent variables of the GPLVM were optimized using maximum likelihood or maximum a posteriori (MAP) estimates. As this does not scale gracefully to large datasets, contemporary methods use inducing points and variational approximations of the evidence (Titsias & Lawrence, 2010). Compared to neural-network-based generative models, GPLVMs are data efficient and provide automatic uncertainty quantification.

Riemannian geometry: To understand the hyperbolic manifold, it is necessary to first define some basic concepts of Riemannian geometry (Lee, 2018). To begin with, consider a Riemannian manifold M, which is a locally Euclidean topological space with a globally-defined differential structure. For each point x ∈ M, there exists a tangent space T_x M, a vector space consisting of the tangent vectors of all the possible smooth curves passing through x. A Riemannian manifold is equipped with a Riemannian metric, which permits defining curve lengths in M. Shortest-path curves, called geodesics, can be seen as the generalization of straight lines in Euclidean space to Riemannian manifolds, as they are minimum-length curves between two points in M. To operate with Riemannian manifolds, it is common practice to exploit the Euclidean tangent spaces.
To do so, we resort to mappings back and forth between T_x M and M: the exponential and logarithmic maps. The exponential map Exp_x(u): T_x M → M maps a point u in the tangent space of x to a point y on the manifold, so that y lies on the geodesic starting at x in the direction u, and the geodesic distance d_M(x, y) equals the norm ∥u∥. The inverse operation is the logarithmic map Log_x(y): M → T_x M. Finally, the parallel transport P_{x→y}(u): T_x M → T_y M allows tangent vectors to be moved between the tangent spaces of different points while remaining consistent with the manifold geometry.
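For concreteness, these operations admit simple closed forms on the Lorentz model used later in the paper. The following numpy sketch (our own minimal illustration, not code from the paper) implements the Lorentzian inner product, the geodesic distance, and the exponential and logarithmic maps on L^d:

```python
import numpy as np

def lorentz_inner(u, v):
    """Lorentzian inner product <u, v>_L = -u0*v0 + u1*v1 + ... + ud*vd."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def dist(x, y):
    """Geodesic distance on the Lorentz model: arccosh(-<x, y>_L)."""
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

def exp_map(x, u):
    """Exp_x(u): map a tangent vector u in T_x L^d onto the manifold."""
    nrm = np.sqrt(max(lorentz_inner(u, u), 0.0))
    if nrm < 1e-12:
        return x
    return np.cosh(nrm) * x + np.sinh(nrm) * u / nrm

def log_map(x, y):
    """Log_x(y): inverse of the exponential map."""
    d = dist(x, y)
    if d < 1e-12:
        return np.zeros_like(x)
    v = y + lorentz_inner(x, y) * x  # project y onto T_x L^d
    return d * v / np.sqrt(lorentz_inner(v, v))
```

As stated above, `exp_map` and `log_map` are mutually inverse along the geodesic through x, and `dist(x, exp_map(x, u))` equals the Lorentzian norm of u.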

Hyperbolic manifold:

The hyperbolic space H^d is the unique simply-connected complete d-dimensional Riemannian manifold with constant negative sectional curvature (Ratcliffe, 2019). There are several isometric models for the hyperbolic space, in particular the Poincaré ball P^d and the Lorentz (hyperboloid) model L^d (see Fig. 1-left). The latter representation is chosen here as it is numerically more stable than the former, and thus better suited for Riemannian optimization. However, the Poincaré model provides a more intuitive representation and is used here for visualization. This is easily achieved by leveraging the isometric mapping between both models (see App. A for details). An important property of the hyperbolic manifold is the exponential rate of the volume growth of a ball with respect to its radius. In other words, distances in H^d grow exponentially when moving away from the origin, and shortest paths between distant points on the manifold tend to pass through the origin, resembling a continuous hierarchical structure. Because of this, the hyperbolic manifold is often exploited to embed hierarchical data such as trees or graphs (Nickel & Kiela, 2017; Chami et al., 2020). Although its potential to embed discrete data structures into a continuous space is well known in the machine learning community, its application in robotics is presently scarce.

Hyperbolic wrapped Gaussian distribution: Probabilistic models on Riemannian manifolds demand probability distributions that consider the manifold geometry. We use the hyperbolic wrapped distribution (Nagano et al., 2019), which builds on a Gaussian distribution on the tangent space at the origin µ_0 = (1, 0, …, 0)^T of H^d, which is then projected onto the hyperbolic space after transporting the tangent space to the desired location.
Intuitively, the construction of this wrapped distribution is as follows: (1) sample a point ṽ ∈ R^d from the Euclidean normal distribution N(0, Σ); (2) transform ṽ into an element of T_{µ0} H^d ⊂ R^{d+1} by setting v = (0, ṽ)^T; (3) apply the parallel transport u = P_{µ0→µ}(v); and (4) project u onto H^d via Exp_µ(u). The resulting probability density function is

log N_{H^d}(x; µ, Σ) = log N(v; 0, Σ) − (d − 1) log(sinh(∥u∥_L)/∥u∥_L), (2)

where v = P_{µ→µ0}(u), u = Log_µ(x), and ∥u∥_L = √⟨u, u⟩_µ. A more general expression of the hyperbolic wrapped distribution (Nagano et al., 2019) is given in (Skopek et al., 2020).
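The four construction steps above can be mirrored directly in code. The sketch below is our own illustration (the parallel-transport expression is the standard closed form for the Lorentz model) and draws one sample from N_{H^d}(µ, Σ):

```python
import numpy as np

def linner(u, v):
    """Lorentzian inner product."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def exp_map(x, u):
    """Exponential map on the Lorentz model."""
    nrm = np.sqrt(max(linner(u, u), 0.0))
    return x if nrm < 1e-12 else np.cosh(nrm) * x + np.sinh(nrm) * u / nrm

def transport(x, y, v):
    """Parallel transport of v from T_x L^d to T_y L^d."""
    return v + linner(y, v) / (1.0 - linner(x, y)) * (x + y)

def sample_wrapped_normal(mu, Sigma, rng):
    """One sample from the hyperbolic wrapped Gaussian N_{H^d}(mu, Sigma)."""
    d = len(mu) - 1
    mu0 = np.zeros(d + 1)
    mu0[0] = 1.0                                           # manifold origin
    v_tilde = rng.multivariate_normal(np.zeros(d), Sigma)  # step (1)
    v = np.concatenate([[0.0], v_tilde])                   # step (2): lift to T_{mu0}
    u = transport(mu0, mu, v)                              # step (3)
    return exp_map(mu, u)                                  # step (4)
```

Every sample produced this way satisfies the hyperboloid constraint ⟨x, x⟩_L = −1 by construction.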

3. GAUSSIAN PROCESS HYPERBOLIC LATENT VARIABLE MODEL

We present the proposed GPHLVM, which extends the GPLVM to hyperbolic latent spaces. A GPHLVM defines a generative mapping from the hyperbolic latent space H^Q to the observation space, e.g., the data associated with the taxonomy, based on GPs. By considering independent GPs across the observation dimensions, the GPHLVM is formally described as

y_{n,d} ∼ N(y_{n,d}; f_{n,d}, σ_d²) with f_{n,d} ∼ GP(m_d(x_n), k^{H^Q}_d(x_n, x_n')) and x_n ∼ N_{H^Q}(µ_0, αI), (3)

where y_{n,d} denotes the d-th dimension of the observation y_n ∈ R^D and x_n ∈ H^Q is the corresponding latent variable. Our GPHLVM is built on hyperbolic GPs, characterized by a mean function m_d(·): H^Q → R (usually set to 0), and a kernel k^{H^Q}_d(·,·): H^Q × H^Q → R. These kernels encode similarity information in the latent hyperbolic manifold and should reflect its geometry to perform effectively, as detailed in §3.1. Also, the latent variable x ∈ H^Q is assigned a hyperbolic wrapped Gaussian prior N_{H^Q}(µ_0, αI) based on Eq. 2, where µ_0 is the origin of H^Q, and the parameter α controls the spread of the latent variables in H^Q. As with Euclidean GPLVMs, our GPHLVM can be trained by finding a MAP estimate or via variational inference. However, special care must be taken to guarantee that the latent variables belong to the hyperbolic manifold, as explained in §3.2.

3.1. HYPERBOLIC KERNELS

For GPs in Euclidean spaces, the squared exponential (SE) and Matérn kernels are standard choices (Rasmussen & Williams, 2006). In the modern machine learning literature, these were generalized to non-Euclidean spaces such as manifolds (Borovitskiy et al., 2020; Jaquier et al., 2021) or graphs (Borovitskiy et al., 2021). The generalized SE kernels may be connected to the much-studied heat kernels. These are given (cf. Grigoryan & Noguchi (1998)) by

k_{H^2}(x, x′) = (σ²/C_∞) ∫_ρ^∞ s e^{−s²/(2κ²)} / (cosh(s) − cosh(ρ))^{1/2} ds,  k_{H^3}(x, x′) = (σ²/C_∞) (ρ/sinh(ρ)) e^{−ρ²/(2κ²)}, (4)

where ρ = dist_{H^d}(x, x′) denotes the geodesic distance between x, x′ ∈ H^d, κ and σ² are the kernel lengthscale and variance, and C_∞ is a normalizing constant. To the best of our knowledge, no closed-form expression for the H^2 kernel is known. To approximate the kernel in this case, a discretization of the integral is performed. One appealing option is a Monte Carlo approximation based on the truncated Gaussian density. Unfortunately, such approximations easily fail to be positive semidefinite if the number of samples is not very large. We address this via the alternative Monte Carlo approximation

k_{H^2}(x, x′) ≈ (σ²/(C′_∞ L)) Σ_{l=1}^L s_l tanh(π s_l) e^{(2 s_l i + 1)⟨x_P, b_l⟩} conj(e^{(2 s_l i + 1)⟨x′_P, b_l⟩}), (5)

where ⟨x_P, b⟩ = ½ log[(1 − |x_P|²)/|x_P − b|²] is the hyperbolic outer product, with x_P being the representation of x as a point on the Poincaré disk P^2 = D; i and conj(·) denote the imaginary unit and complex conjugation, respectively; b_l ∼ U(T) i.i.d., with T the unit circle; and s_l i.i.d. with density proportional to e^{−s²κ²/2} 1_{[0,∞)}(s). The distributions of b_l and s_l are easy to sample from: the former is sampled by applying x ↦ e^{2πix} to x ∼ U([0, 1]), and the latter is (proportional to) a truncated normal distribution. Importantly, the right-hand side of Eq. 5 is easily recognized to be an inner product in the space C^L, which immediately implies its positive semidefiniteness (see App. B for the development of Eq. 5). Note that hyperbolic kernels for H^Q with Q > 3 are generally defined as integrals of the kernels in Eq. 4 (Grigoryan & Noguchi, 1998). Analogs of Matérn kernels for H^Q are obtained as integrals of the SE kernel of the same dimension (Jaquier et al., 2021).
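A sketch of the approximation Eq. 5 is given below. This is our own illustration: the normalizing constant C′_∞ is absorbed by rescaling the Gram matrix so that k(x, x) = σ², a rescaling that preserves positive semidefiniteness.

```python
import numpy as np

def poincare_inner(x_p, b):
    """Hyperbolic outer product <x_P, b> on the Poincaré disk (complex inputs)."""
    return 0.5 * np.log((1.0 - np.abs(x_p) ** 2) / np.abs(x_p - b) ** 2)

def heat_kernel_h2(X_p, kappa=1.0, sigma2=1.0, L=2000, seed=0):
    """Monte Carlo feature-map approximation of the H^2 heat (SE) kernel.

    X_p: array of N points on the Poincaré disk, encoded as complex numbers.
    Returns an N x N Gram matrix, PSD by construction (Gram of features)."""
    rng = np.random.default_rng(seed)
    b = np.exp(2j * np.pi * rng.uniform(size=L))      # b_l ~ U(unit circle)
    s = np.abs(rng.normal(0.0, 1.0 / kappa, size=L))  # s_l ~ truncated normal
    # Features Phi[n, l] = sqrt(s_l tanh(pi s_l)) e^{(2 s_l i + 1) <x_P, b_l>}
    inner = poincare_inner(X_p[:, None], b[None, :])
    Phi = np.sqrt(s * np.tanh(np.pi * s)) * np.exp((2j * s + 1.0) * inner)
    G = np.real(Phi @ Phi.conj().T) / L
    d = np.sqrt(np.diag(G))                           # normalize so k(x, x) = sigma2
    return sigma2 * G / np.outer(d, d)
```

Because the kernel is an inner product of explicit features, the resulting Gram matrix is positive semidefinite for any number of samples L, unlike the naive discretization discussed above.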

3.2. MODEL TRAINING

Similarly to the Euclidean case, training the GPHLVM is equivalent to finding an optimal set of latent variables {x_n}_{n=1}^N and hyperparameters Θ = {θ_d}_{d=1}^D, with x_n ∈ H^Q and θ_d the hyperparameters of the d-th GP. For small datasets, the GPHLVM can be trained by maximizing the log posterior of the model, i.e., L_MAP = log p(Y|X)p(X) with Y = (y_1 … y_N)^T and X = (x_1 … x_N)^T. For large datasets, the GPHLVM can be trained, similarly to the so-called Bayesian GPLVM (Titsias & Lawrence, 2010), by maximizing the marginal likelihood of the data, i.e., L_MaL = log p(Y) = log ∫ p(Y|X)p(X) dX. As this quantity is intractable, it is approximated via variational inference by adapting the methodology introduced in (Titsias & Lawrence, 2010) to hyperbolic latent spaces, as explained next.

Variational inference: We approximate the posterior p(X|Y) by a variational distribution q(X) defined as a hyperbolic wrapped normal distribution over the latent variables, i.e., q_ϕ(X) = ∏_{n=1}^N N_{H^Q}(x_n; µ_n, Σ_n), with variational parameters ϕ = {µ_n, Σ_n}_{n=1}^N, where µ_n ∈ H^Q and Σ_n ∈ T_{µ_n} H^Q. Similarly to the Euclidean case (Titsias & Lawrence, 2010), this variational distribution allows the formulation of the lower bound

log p(Y) ≥ E_{q_ϕ(X)}[log p(Y|X)] − KL(q_ϕ(X) ∥ p(X)). (6)

The KL divergence KL(q_ϕ(X) ∥ p(X)) between two hyperbolic wrapped normal distributions can easily be evaluated via Monte Carlo sampling (see App. C.1 for details). Moreover, the expectation E_{q_ϕ(X)}[log p(Y|X)] can be decomposed into individual terms for each observation dimension as

E_{q_ϕ(X)}[log p(Y|X)] = Σ_{d=1}^D E_{q_ϕ(X)}[log p(y_d|X)], (7)

where y_d is the d-th column of Y. For large datasets, each term can be evaluated via a variational sparse GP approximation (Titsias, 2009; Hensman et al., 2015).
To do so, we introduce M inducing inputs {z_{d,m}}_{m=1}^M, with z_{d,m} ∈ H^Q, for each observation dimension d, whose corresponding inducing variables {u_{d,m}}_{m=1}^M are defined as noiseless observations of the GP in Eq. 3, i.e., u_d ∼ GP(m_d(z_d), k^{H^Q}_d(z_d, z_d')). Similarly to (Hensman et al., 2015), we can write

log p(y_d|X) ≥ E_{q_λ(f_d)}[log N(y_d; f_d(X), σ_d²)] − KL(q_λ(u_d) ∥ p(u_d|Z_d)), (8)

where we defined q_λ(f_d) = ∫ p(f_d|u_d) q_λ(u_d) du_d with the variational distribution q_λ(u_d) = N(u_d; µ̃_d, Σ̃_d) and variational parameters λ = {µ̃_d, Σ̃_d}_{d=1}^D. Note that the inducing variables u_{d,m} are Euclidean, i.e., the variational distribution q_λ(u_d) is a Euclidean Gaussian and the KL divergence in Eq. 8 has a closed-form solution. In this case, the training parameters of the GPHLVM are the set of inducing inputs {z_{d,m}}_{m=1}^M, the variational parameters ϕ and λ, and the hyperparameters Θ (see App. C.2 for the full derivation of the GPHLVM variational inference process).

Optimization: Several training parameters of the GPHLVM belong to H^Q, namely the latent variables x_n for the MAP estimation, and the inducing inputs z_{d,m} and means µ_n for variational inference. To account for the hyperbolic geometry of these parameters, we leverage Riemannian optimization methods (Absil et al., 2007; Boumal, 2022) to train the GPHLVM. Each step of a first-order (stochastic) Riemannian optimization method is generally of the form

η_t ← h(grad L(x_t), τ_{t−1}),  x_{t+1} ← Exp_{x_t}(−α_t η_t),  τ_t ← P_{x_t→x_{t+1}}(η_t). (9)

The update η_t ∈ T_{x_t}M is first computed as a function h of the Riemannian gradient grad L(x_t) of the loss and of τ_{t−1}, the previous update parallel-transported to the tangent space of the new estimate x_t. The estimate x_t is then updated by projecting the update η_t, scaled by a learning rate α_t, onto the manifold using the exponential map.
The function h is equivalent to computing the update of the Euclidean algorithm, e.g., η_t ← grad L(x_t) for simple gradient descent. Notice that Eq. 9 is applied on a product of manifolds when several parameters are optimized. In this paper, we use the Riemannian Adam (Bécigneul & Ganea, 2019) implemented in Geoopt (Kochurov et al., 2020) to optimize the GPHLVM parameters.
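While we rely on Riemannian Adam from Geoopt in practice, the structure of Eq. 9 is easy to illustrate with plain Riemannian gradient descent on the Lorentz model, i.e., h = identity and no momentum (so no parallel transport of past updates is needed). The sketch and toy objective below are our own, chosen only to exercise the update:

```python
import numpy as np

def linner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def exp_map(x, u):
    nrm = np.sqrt(max(linner(u, u), 0.0))
    return x if nrm < 1e-12 else np.cosh(nrm) * x + np.sinh(nrm) * u / nrm

def egrad_to_rgrad(x, egrad):
    """Euclidean gradient -> Riemannian gradient on the Lorentz model:
    flip the sign of the time component, then project onto T_x L^d."""
    h = egrad.copy()
    h[0] = -h[0]
    return h + linner(x, h) * x

def riemannian_gd(x, egrad_fn, lr=0.05, n_steps=500):
    """Plain Riemannian gradient descent: each step computes the update in
    the tangent space and maps it back to the manifold via Exp (cf. Eq. 9)."""
    for _ in range(n_steps):
        eta = egrad_to_rgrad(x, egrad_fn(x))
        x = exp_map(x, -lr * eta)
    return x
```

For example, minimizing the (hypothetical) ambient objective f(x) = x_1² + x_2² drives the iterate toward the manifold origin (1, 0, 0), with every iterate staying exactly on the hyperboloid.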

4. INCORPORATING TAXONOMY KNOWLEDGE INTO GPHLVM

While we are now able to learn hyperbolic embeddings of the data associated with a taxonomy using our GPHLVM, these embeddings do not necessarily follow the graph structure of the taxonomy. In other words, the manifold distances between pairs of embeddings do not necessarily match the graph distances. To overcome this, we introduce graph-distance information as an inductive bias for learning the embeddings. To do so, we leverage two well-known techniques from the GPLVM literature: priors on the embeddings and back constraints (Lawrence & Quiñonero Candela, 2006; Urtasun et al., 2008). Both are reformulated to preserve the taxonomy graph structure in the hyperbolic latent space as a function of the node-to-node shortest paths.

Graph-distance priors: As shown by Urtasun et al. (2008), the structure of the latent space can be modified by adding priors of the form p(X) ∝ e^{−ϕ(X)/σ_ϕ²} to the GPLVM, where ϕ(X) is a function that we aim at minimizing. Incorporating such a prior may also be understood as augmenting the GPLVM loss L with a regularization term −ϕ(X). Therefore, we propose to augment the loss of the GPHLVM with a distance-preserving graph-based regularizer. Several such losses have been proposed in the literature; see (Cruceru et al., 2021) for a review. Specifically, we define ϕ(X) as the stress loss

L_stress(X) = Σ_{i<j} (dist_G(c_i, c_j) − dist_{H^Q}(x_i, x_j))², (10)

where c_i denotes the taxonomy node to which the observation y_i belongs, and dist_G and dist_{H^Q} are the taxonomy graph distance and the geodesic distance on H^Q, respectively. The loss Eq. 10 encourages the preservation of all distances of the taxonomy graph in H^Q. It therefore acts globally, allowing the complete taxonomy structure to be reflected by the GPHLVM. Notice that Cruceru et al. (2021) also survey a distortion loss that encourages the distance of the embeddings to match the graph distance by considering their ratio.
We note, however, that this distortion loss is only properly defined when the embeddings x_i and x_j correspond to different classes c_i ≠ c_j. Moreover, our empirical results using this loss were lackluster and numerically unstable (see App. E).
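The stress regularizer of Eq. 10 can be sketched directly (our own minimal illustration; in the actual model this term is differentiated and minimized jointly with the GPHLVM loss rather than merely evaluated):

```python
import numpy as np

def lorentz_dist(x, y):
    """Geodesic distance on the Lorentz model, arccosh(-<x, y>_L)."""
    return np.arccosh(np.clip(x[0] * y[0] - x[1:] @ y[1:], 1.0, None))

def stress_loss(X, classes, graph_dist):
    """L_stress of Eq. 10: squared mismatch between taxonomy graph distances
    and geodesic distances of the hyperbolic embeddings, summed over pairs.

    X: (N, Q+1) latent points on the Lorentz model.
    classes: taxonomy node label c_i of each observation.
    graph_dist: mapping (c_i, c_j) -> node-to-node shortest-path distance."""
    loss = 0.0
    N = len(X)
    for i in range(N):
        for j in range(i + 1, N):
            d_g = graph_dist[classes[i], classes[j]]
            loss += (d_g - lorentz_dist(X[i], X[j])) ** 2
    return loss
```

The loss is zero exactly when every pair of embeddings is separated by the same geodesic distance as its pair of taxonomy nodes in the graph.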

Back-constraints:

The back-constrained GPLVM (Lawrence & Quiñonero Candela, 2006) defines the latent variables as a function of the observations, i.e., x_{n,q} = g_q(y_1, …, y_N; w_q) with parameters {w_q}_{q=1}^Q. This allows us to incorporate new observations into the latent space after training, while preserving local similarities between observations in the embeddings. To incorporate graph-distance information into the GPHLVM and ensure that the latent variables lie on the hyperbolic manifold, we propose the back-constraint mapping

x_n = Exp_{µ0}(x̃_n) with x̃_{n,q} = Σ_{m=1}^N w_{q,m} k_{R^D}(y_n, y_m) k_G(c_n, c_m). (11)

The mapping Eq. 11 not only expresses the similarities between data in the observation space via the kernel k_{R^D}, but also encodes the relationships between data belonging to nearby taxonomy nodes via k_G. In other words, similar observations associated with the same (or nearby) taxonomy nodes will be close to each other in the resulting latent space. The kernel k_G is a Matérn kernel on the taxonomy graph following the formulation introduced in (Borovitskiy et al., 2021), which accounts for the graph geometry (see also App. D). We use a Euclidean SE kernel for k_{R^D}. Notice that the back constraints only incorporate local information into the latent embedding. Therefore, to preserve the global graph structure, we pair the proposed back-constrained GPHLVM with the stress prior Eq. 10. Note that both kernels are required in Eq. 11: if the mapping were defined as a function of the graph kernel only, the observations of each taxonomy node would be encoded by a single latent point; if only the observation kernel were used, dissimilar observations of the same taxonomy node would be distant in the latent space, despite the additional stress prior, as k_{R^D}(y_n, y_m) ≈ 0.
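The tangent-space part of the mapping Eq. 11 (i.e., x̃_n, before the projection through Exp_{µ0}) can be sketched as follows. This is our own illustration: the graph Matérn kernel is built from the graph Laplacian in the spirit of Borovitskiy et al. (2021), with our own choice of normalization, and all function names are hypothetical.

```python
import numpy as np

def graph_matern_kernel(A, kappa=1.0, nu=2.5):
    """Matern-type kernel on a graph with adjacency matrix A, computed as
    K = (2 nu / kappa^2 I + Laplacian)^(-nu), rescaled to unit mean variance."""
    Lap = np.diag(A.sum(axis=1)) - A
    evals, evecs = np.linalg.eigh(Lap)
    K = (evecs * (2 * nu / kappa**2 + evals) ** (-nu)) @ evecs.T
    return K / np.mean(np.diag(K))

def back_constraints(Y, nodes, W, A, lengthscale=1.0):
    """x_tilde_n of Eq. 11, combining an observation SE kernel with a graph
    kernel over taxonomy nodes, before mapping through Exp_{mu_0}.

    Y: (N, D) observations; nodes: (N,) node index of each observation;
    W: (Q, N) back-constraint weights; A: node adjacency matrix."""
    sq = ((Y[:, None] - Y[None, :]) ** 2).sum(-1)
    K_y = np.exp(-0.5 * sq / lengthscale**2)            # Euclidean SE kernel
    K_g = graph_matern_kernel(A)[np.ix_(nodes, nodes)]  # graph Matern kernel
    return (K_y * K_g) @ W.T                            # (N, Q) tangent coords
```

Applying Exp_{µ0} row-wise to the output then yields latent variables that lie on H^Q by construction, as required by Eq. 11.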

5. EXPERIMENTS

We test the proposed GPHLVM on data of the whole-body support pose taxonomy (Borràs et al., 2017). Each node of the taxonomy graph (see Fig. 1-right) is a support pose defined by its contacts, so that the distance between nodes can be viewed as the number of contact changes required to go from one support pose to another. We use standing and kneeling poses of the datasets in (Mandery et al., 2016) and (Langenstein, 2020). The former were extracted from recordings of a human walking without hand support, or using support from a handrail or from a table on one side or on both sides. The latter were obtained from a human standing up from a kneeling position. Each pose is identified with a node of the graph of Fig. 1-right. We test our approach on three different datasets: an unbalanced dataset (i.e., 100 poses composed of 72 standing and 28 kneeling poses); a balanced dataset (i.e., only 60 standing poses); and a joint-space dataset (i.e., the same 60 standing poses represented as joint configurations). For the first two datasets, each pose is represented as a vector y_n = [y_LF, y_RF, y_LH, y_RH]^T ∈ R^12 corresponding to the positions of the human's feet and hands. Instead, for the last dataset, each pose is represented by a vector of joint angles y_n ∈ R^44. Last but not least, we also test our approach on an augmented version of the whole-body support pose taxonomy, which explicitly distinguishes between left and right contacts. The main results are analyzed in the sequel, while additional experimental details and results are given in App. F and G. For each dataset, we test the model without regularization, with stress prior, and with back constraints coupled with stress prior (see App. F.2 for the training parameters). Figs. 2a-2c show the learned embeddings alongside distance matrices, which are to be compared with the graph distances in Fig. 3. As shown in Fig.
2a, the models without regularization do not encode any meaningful distance structure in the latent space. In contrast, the models with stress prior result in embeddings that comply with the taxonomy graph structure: the embeddings are grouped and organized according to the taxonomy nodes, and the geodesic distances match the graph ones, arguably more so in the hyperbolic case (see Figs. 2b-2c). This is further reflected in the stress values of the latent embeddings with respect to the graph distances (see Table 1). Interestingly, the hyperbolic models also outperform Euclidean models with 3-dimensional latent spaces (see App. G.1). This is due to the fact that the geometry of the hyperbolic manifold leads to exponentially-increasing distances w.r.t. the origin, which provides an increased volume to match the graph structure when compared to Euclidean spaces, thus resulting in better low-dimensional representations of taxonomy data. Our GPHLVM also outperformed vanilla and hyperbolic versions of variational autoencoders (VAEs) in encoding meaningful taxonomy information in the latent space (see App. G.4). In general, the tested VAEs only captured a global structure that separates standing from kneeling poses. Moreover, the average stress of the VAEs' latent embeddings is higher compared to the GPHLVM's. Finally, notice that the back constraints further organize the embeddings within a class according to the similarity between their observations (Fig. 2c).

Taxonomy expansion and unseen pose encoding: An advantage of back-constrained GPLVMs is their ability to embed new observations into the latent space after training. We test the GPHLVM's ability to place unseen poses or taxonomy classes into the latent space, hypothesizing that their respective embeddings will be placed at meaningful distances w.r.t. the rest of the latent points. First, we consider a back-constrained GPHLVM with stress prior previously trained on example poses from the taxonomy (i.e., the model of Fig.
2c) and embed unseen poses. Fig. 2d shows how these new poses land close to their respective class clusters. Second, we train a new GPHLVM while withholding all poses corresponding to the F1H1 class. We then encode these poses and find that they are located at sensible distances when compared to the model trained on the full dataset. Although both models accomplish this, the GPHLVM displays lower stress values (see Table 1).

Trajectory generation via geodesics: The geometry of the GPHLVM latent space can also be exploited to generate trajectories in the latent space by following the geodesic, i.e., the shortest path, between two embeddings. In other words, our GPHLVM intrinsically provides a mechanism to plan motions via geodesics in the low-dimensional latent space. Examples of geodesics between two poses are shown in Figs. 2b-2c, with the colors along the trajectory matching the class corresponding to the closest hyperbolic latent point. Importantly, the geodesics in our GPHLVM latent space follow the transitions between classes defined in the taxonomy. In other words, the shortest paths in the hyperbolic embedding correspond to the shortest paths in the taxonomy graph. For instance, the geodesic from F to F2H2 follows F → F2 → F2H → F2H2, while the geodesic from FH to K2H follows FH → F2H → FKH → KH → K2H. In contrast, straight lines in the Euclidean embeddings often do not match the graph shortest path, resulting in transitions that do not exist in the taxonomy, e.g., F → F2H2 or F2 → FKH in the Euclidean latent space of Figs. 2b-2c (see also App. F.4). Fig. 4 shows examples of motions resulting from geodesic interpolation in the GPHLVM latent space. As expected, the resulting trajectories do not correspond to direct interpolations between the given initial and final poses. This is due to the lack of information about the object locations and the types of contact in the considered poses.
Therefore, poses with very different feet and hand positions may belong to the same class; e.g., a two-feet contact with a left-hand contact on the handrail and a two-feet contact with a right-hand contact on the table both belong to F2H. This results in artifacts throughout the interpolations, which are alleviated by augmenting the taxonomy to differentiate between left and right contacts, as described next. However, it is interesting that the motions are still consistent with the observed transitions; e.g., the hand positions vary little along a path involving only foot and knee contacts.

Augmented taxonomy for enhanced trajectory generation: Here, we aim at improving the quality of the generated motions by augmenting the whole-body support pose taxonomy with additional contact information. To do so, we consider an augmented whole-body support pose taxonomy that explicitly distinguishes between left and right contacts by adapting the nodes and transitions of Fig. 1-right. For instance, the 1-foot contact node F is separated into left-foot (F_l) and right-foot (F_r) contact nodes. To facilitate motion planning and to test the GPHLVM's ability to deal with high-dimensional spaces, we represent each pose as a vector y_n ∈ R^44 of joint angles instead of a vector of hand and foot positions. A video of the resulting motions accompanies this paper. We embed the 60 standing poses described in App. G.2 into 3-dimensional hyperbolic and Euclidean spaces using the GPHLVM and GPLVM, respectively. For each approach, we test the model without regularization, with stress prior, and with back constraints coupled with stress prior (see App. G.3 and F.2 for detailed results and training parameters). Fig. 5 shows examples of motions planned by following geodesics in the GPHLVM latent space. We observe that the motions generated by considering the augmented taxonomy result in more realistic interpolations between the given initial and final poses than the trajectories of Fig. 4.
Moreover, the previously observed artifacts are drastically reduced. This is because the augmented taxonomy differentiates between left and right contacts, thus allowing very different poses to be placed far apart in the latent space. For example, poses corresponding to FlHr and FrHl in the augmented taxonomy belonged to the same FH node in the original taxonomy and were embedded close together. It is also interesting to notice that considering joint angles instead of end-effector positions results in more realistic poses. Such poses may also be obtained by considering both end-effector positions and orientations as observations, which would require an extension of the GPHLVM to handle observations on Riemannian manifolds.
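The latent-space motion planning described in this section boils down to discretizing a geodesic between two embeddings. The following is a minimal sketch on the Lorentz model (our own illustrative code using the standard exponential/logarithmic maps, not the authors' implementation; all function names are ours):

```python
import numpy as np

def inner(u, v):
    # Lorentzian inner product: -u0*v0 + sum_i ui*vi
    return -u[0] * v[0] + u[1:] @ v[1:]

def exp_map(x, u):
    # Exponential map on the Lorentz model
    n = np.sqrt(max(inner(u, u), 0.0))
    return x if n < 1e-12 else np.cosh(n) * x + np.sinh(n) * u / n

def log_map(x, y):
    # Logarithmic map (inverse of the exponential map)
    a = inner(x, y)
    d = np.arccosh(np.clip(-a, 1.0, None))
    return d * (y + a * x) / np.sqrt(max(a * a - 1.0, 1e-12))

def geodesic(x, y, num=10):
    # gamma(t) = Exp_x(t * Log_x(y)), discretized for t in [0, 1]
    v = log_map(x, y)
    return np.stack([exp_map(x, t * v) for t in np.linspace(0.0, 1.0, num)])
```

Decoding each intermediate point through the GP mapping then yields the interpolated poses.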

6. CONCLUSIONS

Inspired by the recent developments of taxonomies in different robotics fields, we proposed the GPHLVM, a computational model that leverages two types of domain knowledge: the structure of a human-designed taxonomy, and a hyperbolic geometry on the latent space that complies with the taxonomy's intrinsic hierarchical structure. Our GPHLVM allows us to learn hyperbolic embeddings of the features of the taxonomy nodes while capturing the associated hierarchical structure. To achieve this, our model exploits the curvature of the hyperbolic manifold and the graph-distance information as inductive biases. We showed that these two forms of inductive bias are essential to: learn taxonomy-aware embeddings, encode unseen data, and potentially expand the learned taxonomy. Moreover, we reported that vanilla Euclidean approaches underperform in all the foregoing cases. Finally, we introduced a mechanism to generate taxonomy-aware motions in the hyperbolic latent space. It is important to emphasize that our geodesic motion generation does not use explicit knowledge on how physically feasible the generated trajectories are. We plan to investigate how to include physics constraints or explicit contact data into the GPHLVM to obtain physically-feasible motions that can be executed on real robots. Moreover, we will work on alleviating the computational cost of the hyperbolic kernel in H^d. This could be tackled by using a different sampling strategy: instead of sampling from a Gaussian distribution for the approximation Eq. 5, we could sample from the Rayleigh distribution. This is because complex numbers whose real and imaginary components are i.i.d. Gaussian have an absolute value that is Rayleigh-distributed. As our current experimental study focused on testing our model on different graphs extracted from the whole-body support pose taxonomy (Borràs et al., 2017), we plan to test it with datasets used to design other robotic taxonomies.
Finally, we plan to investigate other types of manifold geometries that may accommodate more complex structures coming from highly-heterogeneous graphs (Giovanni et al., 2022) .

A.1 EQUIVALENCE OF POINCARÉ AND LORENTZ MODELS

As pointed out in the main text (§ 2), it is possible to map points from the Lorentz model to the Poincaré ball via an isometric mapping. Formally, such an isometry is defined as the mapping f : L^d → P^d with

f(x) = (x_1, ..., x_d)^T / (x_0 + 1),  (12)

where x ∈ L^d has components x_0, x_1, ..., x_d. The inverse mapping f^{-1} : P^d → L^d is defined as

f^{-1}(y) = (1 + ∥y∥², 2y_1, ..., 2y_d)^T / (1 − ∥y∥²),

where y ∈ P^d has components y_1, ..., y_d. Notice that we used the mapping Eq. 12 to represent the hyperbolic embeddings in the Poincaré disk throughout the paper, as well as in the computation of the kernel k_{H²} Eq. 4.
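The two mappings are straightforward to implement and verify; the sketch below is our own illustration (function names are ours):

```python
import numpy as np

def lorentz_to_poincare(x):
    # f(x) = (x_1, ..., x_d) / (x_0 + 1)
    return x[1:] / (x[0] + 1.0)

def poincare_to_lorentz(y):
    # f^{-1}(y) = (1 + ||y||^2, 2*y_1, ..., 2*y_d) / (1 - ||y||^2)
    sq = y @ y
    return np.concatenate(([1.0 + sq], 2.0 * y)) / (1.0 - sq)
```

A quick check confirms that the lifted point satisfies the hyperboloid constraint ⟨x, x⟩_L = −1 and that the two maps are mutual inverses.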

A.2 MANIFOLD OPERATIONS

As mentioned in the main text (§ 2), we resort to the exponential and logarithmic maps to operate with Riemannian manifold data. The exponential map Exp_x(u) : T_xM → M maps a point u in the tangent space of x to a point y on the manifold, while the logarithmic map Log_x(y) : M → T_xM performs the corresponding inverse operation. In some settings, it is necessary to work with data lying on different tangent spaces of the manifold. In this case, one needs to operate with all data on a single tangent space, which can be achieved by leveraging the parallel transport P_{x→y}(v) : T_xM → T_yM. All the aforementioned operators are defined in Table 2 for the Lorentz model L^d. Moreover, we introduce the inner product ⟨u, v⟩_x on L^d, which is used to compute the geodesic distance d_M(u, v) and all the foregoing operations in the Lorentz model, as shown in Table 2.

Operation | Formula
⟨u, v⟩_x | −u_0 v_0 + Σ_{i=1}^d u_i v_i
d_M(u, v) | arcosh(−⟨u, v⟩_x)
Exp_x(u) | cosh(∥u∥_L) x + sinh(∥u∥_L) u / ∥u∥_L, with ∥u∥_L = √⟨u, u⟩_x
Log_x(y) | (d_M(x, y) / √(α² − 1)) (y + αx), with α = ⟨x, y⟩_x
P_{x→y}(v) | v + (⟨y, v⟩_x / (1 − ⟨x, y⟩_x)) (x + y)

Table 2: Principal operations on the Lorentz model L^d.
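To make Table 2 concrete, the snippet below (our own illustrative code) implements these operations and checks two defining properties of parallel transport: the transported vector is tangent at the target point, and its Lorentzian norm is preserved.

```python
import numpy as np

def inner(u, v):
    # Lorentzian inner product <u, v>_x
    return -u[0] * v[0] + u[1:] @ v[1:]

def dist(x, y):
    # geodesic distance arcosh(-<x, y>_x)
    return np.arccosh(np.clip(-inner(x, y), 1.0, None))

def exp_map(x, u):
    n = np.sqrt(max(inner(u, u), 0.0))
    return x if n < 1e-12 else np.cosh(n) * x + np.sinh(n) * u / n

def log_map(x, y):
    a = inner(x, y)
    return dist(x, y) * (y + a * x) / np.sqrt(max(a * a - 1.0, 1e-12))

def transport(x, y, v):
    # P_{x -> y}(v) = v + <y, v>_x / (1 - <x, y>_x) * (x + y)
    return v + inner(y, v) / (1.0 - inner(x, y)) * (x + y)
```

The checks below take a tangent vector at the origin of L², push it along the manifold, and transport it.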

B HYPERBOLIC KERNELS

As mentioned in the main text (§ 3.1), following the developments on kernels on manifolds of Borovitskiy et al. (2020) and Jaquier et al. (2021), we may identify the generalized squared exponential kernel with the heat kernel, an important object studied on its own in the mathematical literature. Due to this, we can obtain the expressions Eq. 4. The expression for the case of H² requires discretizing an integral, which may lead to an approximation that is not positive semidefinite. We address this by suggesting another approximation guaranteed to be positive semidefinite. Reversing the derivation in (Chavel, 1984, p. 246), we obtain

k^{H²}_{∞,κ,σ²}(x, x′) = (σ²/C′_∞) ∫_0^∞ exp(−κ²s²/2) P_{−1/2+is}(cosh(ρ)) s tanh(πs) ds,  (14)

where ρ = dist_{H²}(x, x′) denotes the geodesic distance between x, x′ ∈ H², κ and σ² are the kernel lengthscale and variance, C′_∞ is a normalizing constant, and P_α are Legendre functions (Abramowitz & Stegun, 1964). We now prove that these Legendre functions are connected to the spherical functions, special functions closely tied to the geometry of the hyperbolic space that possess a very important property:

P_{−1/2+is}(cosh(2ρ)) = ∫_T e^{(2si+1)⟨z,b⟩} db = ∫_T e^{(2si+1)⟨z₁,b⟩} \overline{e^{(2si+1)⟨z₂,b⟩}} db,  (15)

where z ∈ D is such that ρ = dist_{H²}(z, 0), and z₁, z₂ ∈ D are such that ρ = dist_{H²}(z₁, z₂). Here i denotes the imaginary unit and the overline denotes complex conjugation.

Proof. Let θ denote the angle between z and b, and note the following simple identities:

|z − b|² = |z|² + 1 − 2|z| cos(θ) = tanh(ρ)² + 1 − 2 tanh(ρ) cos(θ),  (16)
1 − |z|² = 1 − tanh(ρ)² = cosh(ρ)^{−2}.  (17)

Then, we write

e^{(2si+1)⟨z,b⟩} = (|z − b|² / (1 − |z|²))^{−si−1/2}
= (cosh(ρ)² (tanh(ρ)² + 1 − 2 tanh(ρ) cos(θ)))^{−si−1/2}
= (sinh(ρ)² + cosh(ρ)² − 2 sinh(ρ) cosh(ρ) cos(θ))^{−si−1/2}
= (cosh(2ρ) − sinh(2ρ) cos(θ))^{−si−1/2}.

On the other hand, by (Lebedev et al., 1965, Eq.
7.4.3), we have

P_a(cosh(x)) = (1/π) ∫_0^π (cosh(x) + sinh(x) cos(θ))^a dθ,

hence

P_{−1/2+is}(cosh(2ρ)) = (1/π) ∫_0^π (cosh(2ρ) + sinh(2ρ) cos(θ))^{−1/2+is} dθ
= (1/2π) ∫_{−π}^{π} (cosh(2ρ) + sinh(2ρ) cos(θ))^{−1/2+is} dθ
= ∫_T e^{(−2si+1)⟨z,b⟩} db = ϕ_{−2s}(z),

where the last equality follows from the identities above after the substitution θ ↦ π − θ. This computation roughly follows Cohen & Lifshits (2012, Section 4.3.4). Now, by Cohen & Lifshits (2012, Section 3.5), we have ϕ_{−2s}(z) = ϕ_{2s}(z), which proves the first identity. Finally, Lemma 3.5 from Cohen & Lifshits (2012) proves the second identity.

By combining the expressions Eq. 14 and Eq. 15, we get the following Monte Carlo approximation:

k^{H²}_{∞,κ,σ²}(x, x′) ≈ (σ²/C′_∞) (1/L) Σ_{l=1}^L s_l tanh(πs_l) e^{(2s_l i+1)⟨x_P, b_l⟩} \overline{e^{(2s_l i+1)⟨x′_P, b_l⟩}},

where b_l ~ U(T) i.i.d. and s_l ~ e^{−s²κ²/2} 1_{[0,∞)}(s) i.i.d. This gives the approximation used in the main text (see § 3.1). Having established a way to evaluate or approximate the heat kernel, analogs of Matérn kernels can be defined by

k_{ν,κ,σ²}(x, x′) = (σ²/C_ν) ∫_0^∞ u^{ν−1} e^{−(2ν/κ²)u} k̃_{∞,√(2u),σ²}(x, x′) du,

where k̃_{∞,√(2u),σ²} is the same as k_{∞,√(2u),σ²} but with the normalizing constant σ²/C_∞ dropped for simplicity. Here C_ν is the normalizing constant ensuring that k_{ν,κ,σ²}(x, x) = σ² for all x.
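The Monte Carlo approximation above can be prototyped directly from these formulas. In the sketch below (our own code, not the paper's implementation), ⟨z, b⟩ is the quantity satisfying e^{(2si+1)⟨z,b⟩} = (|z−b|²/(1−|z|²))^{−si−1/2}, i.e., ⟨z, b⟩ = ½ log((1−|z|²)/|z−b|²). The estimate is positive semidefinite by construction because it is (the real part of) a Gram matrix of random features, and we simply rescale so that the diagonal is approximately σ² rather than computing C′_∞ explicitly.

```python
import numpy as np

def busemann(z, b):
    # <z, b> = 0.5 * log((1 - |z|^2) / |z - b|^2), z in the open disk, b on the unit circle
    return 0.5 * np.log((1.0 - np.abs(z) ** 2) / np.abs(z - b) ** 2)

def heat_kernel_mc(Z, kappa=1.0, sigma2=1.0, L=20000, seed=0):
    # Z: array of complex points in the Poincare disk
    rng = np.random.default_rng(seed)
    b = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, L))  # b_l ~ U(T)
    s = np.abs(rng.normal(0.0, 1.0 / kappa, L))        # s_l ~ exp(-s^2 kappa^2 / 2) on [0, inf)
    w = s * np.tanh(np.pi * s)                         # spectral weights
    B = busemann(Z[:, None], b[None, :])               # (N, L), real-valued
    phi = np.sqrt(w)[None, :] * np.exp((2j * s + 1.0)[None, :] * B)
    K = (phi @ phi.conj().T).real / L                  # Gram matrix of features => PSD
    return sigma2 * K / np.mean(np.diag(K))            # rescale so k(x, x) ~ sigma2 on average
```

With enough samples, the estimate is symmetric, positive semidefinite, and decays with hyperbolic distance.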

C GPHLVM VARIATIONAL INFERENCE

As mentioned in § 3.2, when training our GPHLVM on large datasets, we resort to variational inference as originally proposed in (Titsias & Lawrence, 2010). Here we provide the mathematical details about the changes that are needed to train our model via variational inference.

C.1 KL DIVERGENCE BETWEEN HYPERBOLIC WRAPPED DISTRIBUTIONS

As mentioned in § 3.2, we approximate the KL divergence between two hyperbolic wrapped distributions via Monte Carlo sampling. Namely, given two hyperbolic wrapped distributions q_ϕ(x) and p(x), we write

KL(q_ϕ(x) || p(x)) = ∫ q_ϕ(x) log (q_ϕ(x) / p(x)) dx ≈ (1/K) Σ_{k=1}^K log (q_ϕ(x_k) / p(x_k)),

where we used K independent Monte Carlo samples x_k drawn from q_ϕ(x) to approximate the KL divergence. These samples are obtained via the procedure described in § 2, i.e., by sampling an element of the tangent space of the origin µ_0 = (1, 0, ..., 0)^T of H^d via a Euclidean normal distribution, and then applying the parallel transport operation and the exponential map to project it onto H^d.
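The sampling procedure just described (tangent-space Gaussian at the origin, parallel transport, exponential map) can be sketched as follows; this is our own illustrative code using the Lorentz-model operations of Table 2, not the authors' implementation:

```python
import numpy as np

def inner(u, v):
    # Lorentzian inner product
    return -u[0] * v[0] + u[1:] @ v[1:]

def exp_map(x, u):
    n = np.sqrt(max(inner(u, u), 0.0))
    return x if n < 1e-12 else np.cosh(n) * x + np.sinh(n) * u / n

def sample_wrapped_normal(mu, Sigma, n_samples, seed=0):
    # 1) v ~ N(0, Sigma) in R^d, lifted to the tangent space at mu0 = (1, 0, ..., 0)
    # 2) parallel transport mu0 -> mu, 3) exponential map onto H^d
    rng = np.random.default_rng(seed)
    d = len(mu) - 1
    mu0 = np.zeros(d + 1)
    mu0[0] = 1.0
    samples = []
    for v in rng.multivariate_normal(np.zeros(d), Sigma, n_samples):
        u = np.concatenate(([0.0], v))                              # tangent at mu0
        u = u + inner(mu, u) / (1.0 - inner(mu0, mu)) * (mu0 + mu)  # transport to mu
        samples.append(exp_map(mu, u))
    return np.array(samples)
```

Every returned sample lies on the upper sheet of the hyperboloid, as expected.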

C.2 DETAILS OF THE VARIATIONAL PROCESS

As mentioned in the main text, the marginal likelihood p(Y) is approximated via variational inference by approximating the posterior p(X|Y) with the hyperbolic variational distribution q_ϕ(X) defined in Eq. 6. The lower bound Eq. 7 is then obtained, similarly as in (Titsias & Lawrence, 2010), as

log p(Y) = log ∫ p(Y|X) p(X) dX
= log ∫ (p(Y|X) p(X) / q_ϕ(X)) q_ϕ(X) dX = log E_{q_ϕ(X)}[p(Y|X) p(X) / q_ϕ(X)]
≥ E_{q_ϕ(X)}[log (p(Y|X) p(X) / q_ϕ(X))]
= ∫ q_ϕ(X) log p(Y|X) dX − ∫ q_ϕ(X) log (q_ϕ(X) / p(X)) dX
= E_{q_ϕ(X)}[log p(Y|X)] − KL(q_ϕ(X) || p(X)),

following Jensen's inequality in the third line. As mentioned in § 3.2, the expectation E_{q_ϕ(X)}[log p(Y|X)] can be decomposed into individual terms for each observation dimension as Σ_{d=1}^D E_{q_ϕ(X)}[log p(y_d|X)], where y_d is the d-th column of Y. We then define the inducing inputs Z_d and inducing variables u_d the same way as the noiseless observations f_d, so that the joint distribution of f_d and u_d can be written as

p(f_d, u_d) = N( [m_d(X); m_d(Z_d)], [k_d(X, X), k_d(X, Z_d); k_d(Z_d, X), k_d(Z_d, Z_d)] ).

The lower bound Eq.
8 is then obtained for each dimension, similarly as in (Hensman et al., 2015), as

log p(y_d|X) = log ∫ p(y_d|X, u_d) p(u_d) du_d
= log E_{q_λ(u_d)}[p(y_d|X, u_d) p(u_d) / q_λ(u_d)]
≥ E_{q_λ(u_d)}[log (p(y_d|X, u_d) p(u_d) / q_λ(u_d))]
= E_{q_λ(u_d)}[log p(y_d|X, u_d)] − KL(q_λ(u_d) || p(u_d))
≥ E_{q_λ(u_d)}[E_{p(f_d|u_d)}[log p(y_d|f_d(X))]] − KL(q_λ(u_d) || p(u_d))
= E_{q_λ(f_d)}[log p(y_d|f_d(X))] − KL(q_λ(u_d) || p(u_d|Z_d))
= E_{q_λ(f_d)}[log N(y_d; f_d(X), σ_d²)] − KL(q_λ(u_d) || p(u_d|Z_d)),

where we defined q_λ(f_d) = ∫ p(f_d|u_d) q_λ(u_d) du_d with the Euclidean variational distribution q_λ(u_d) = N(u_d; μ̃_d, Σ̃_d), and wrote p(u_d|Z_d) = p(u_d) for simplicity. The second inequality is Jensen's inequality, while the third is shown in (Titsias, 2009). Finally, substituting this bound into the bound on log p(Y) above results in the following bound on the marginal likelihood:

log p(Y) ≥ Σ_{n=1}^N Σ_{d=1}^D E_{q_ϕ(x_n)}[E_{q_λ(f_{n,d})}[log N(y_{n,d}; f_{n,d}(x_n), σ_d²)]] − Σ_{d=1}^D KL(q_λ(u_d) || p(u_d|Z_d)) − Σ_{n=1}^N KL(q_ϕ(x_n) || p(x_n)).
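As a sanity check on this kind of derivation, the bound can be verified numerically in a toy conjugate model where everything is available in closed form: a scalar Gaussian prior x ~ N(0, 1), likelihood y|x ~ N(x, σ²), and a Gaussian variational posterior q = N(m, s²). This is our own illustration, not part of the paper; the ELBO is always below log p(y) and is tight exactly when q equals the true posterior.

```python
import numpy as np

def log_evidence(y, sig2):
    # log p(y) = log N(y; 0, 1 + sig2) for the model x ~ N(0, 1), y|x ~ N(x, sig2)
    var = 1.0 + sig2
    return -0.5 * (np.log(2.0 * np.pi * var) + y ** 2 / var)

def elbo(y, sig2, m, s2):
    # E_q[log p(y|x)] - KL(q || p) for q = N(m, s2), prior p = N(0, 1)
    exp_ll = -0.5 * np.log(2.0 * np.pi * sig2) - ((y - m) ** 2 + s2) / (2.0 * sig2)
    kl = 0.5 * (m ** 2 + s2 - 1.0 - np.log(s2))
    return exp_ll - kl
```

The exact posterior has mean y/(1+σ²) and variance σ²/(1+σ²), at which the bound becomes an equality.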

D MATÉRN KERNELS ON TAXONOMY GRAPHS

As explained in § 4 of the main paper, we leverage the Matérn kernel on graphs proposed by Borovitskiy et al. (2021) to design a kernel for our back-constrained GPHLVM that accounts for the geometry of the taxonomy graph. Here we provide the main equations of this kernel, and refer the reader to (Borovitskiy et al., 2021) for further details. Formally, let us define a graph G = (V, E) with vertices V and edges E, and the graph Laplacian ∆ = D − W, where W is the graph adjacency matrix and D its corresponding diagonal degree matrix, with D_ii = Σ_j W_ij. The eigendecomposition U Λ U^T of the Laplacian ∆ is then used to formulate both the SE and Matérn kernels on graphs as

k^G_{∞,κ}(c_n, c_m) = [U e^{−(κ²/2)Λ} U^T]_{n,m}, and k^G_{ν,κ}(c_n, c_m) = [U ((2ν/κ²) I + Λ)^{−ν} U^T]_{n,m},  (42)

where κ is the lengthscale (i.e., it controls how distances are measured) and ν is the smoothness parameter determining the mean-squared differentiability of the associated Gaussian process (GP). Note that the graph kernel expressions in Eq. 42 are obtained by considering the connection between Matérn GPs and stochastic partial differential equations, originally proposed by Whittle (1963) and later extended to Riemannian manifolds in (Borovitskiy et al., 2020). This connection establishes that SE and Matérn GPs satisfy

e^{(κ²/4)∆} f = W, and ((2ν/κ²) I + ∆)^{ν/2} f = W,

respectively, where W ∼ N(0, I) and f : V → R, which leads to the definition of graph GPs (Borovitskiy et al., 2021).
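These kernels are cheap to compute for a small taxonomy graph via an eigendecomposition of the Laplacian. The sketch below is our own illustration; as a simplification, it omits the exact normalizing constant and instead rescales so that the average prior variance per node is σ².

```python
import numpy as np

def graph_matern_kernel(W, kappa=1.0, nu=2.5, sigma2=1.0):
    # W: symmetric adjacency matrix; graph Laplacian = D - W
    Lap = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(Lap)
    spec = (2.0 * nu / kappa ** 2 + lam) ** (-nu)  # Matern spectral transform of Eq. 42
    K = (U * spec) @ U.T                           # U diag(spec) U^T
    return sigma2 * K / np.mean(np.diag(K))        # rescale the prior variance
```

On a chain graph, the resulting kernel is positive semidefinite and correlations decay with graph distance, which is the behavior the back constraints rely on.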

E DISTORTION LOSS

As explained in the paper, we focus on two ways of embedding the graph in hyperbolic space: a global approach using a stress regularization which matches graph distances with geodesic distances, and a combination of this stress regularization with back constraints (see § 4). However, the literature on graph embeddings also surveys a distortion loss (Cruceru et al., 2021) given by

L_distortion(X) = Σ_{i<j} (dist_{H^Q}(x_i, x_j)² / dist_G(c_i, c_j)² − 1)²,  (45)

which tries to match the graph and manifold distances by driving their ratio to 1. We found that our problem is more subtle than usual graph embeddings, given that several points in our dataset may correspond to the same graph node (e.g., two different poses in which the left foot is the only limb in contact). Indeed, notice that Eq. 45 is ill-defined for pairs with dist_G(c_i, c_j)² = 0: it assumes all embedded nodes are distinct, whereas in our setup several x_i may correspond to the exact same class in the taxonomy. Our first attempt to remedy this was to add a small regularizer ε = 10^{−1} to the denominator. However, this caused the loss to give more weight to the pairs where dist_G(c_i, c_j)² = 0 (see Figs. 6a-6b for the outcome of training a GPHLVM with this type of regularization). We then considered an alternative definition of distortion in which the term inside the sum is given by

L̃_distortion(x_i, x_j) = λ₁ dist_{H^Q}(x_i, x_j) if x_i and x_j's classes are identical, and λ₂ L_distortion(x_i, x_j) otherwise,

where λ₁, λ₂ ∈ R₊ are hyperparameters: λ₁ governs how much we encourage latent codes of the same class to collapse into a single point, while λ₂ weights how much the geodesic distance should match the graph distance. After manual hyperparameter tuning, we obtained the latent space and distance matrix portrayed in Figs. 6c-6d.
As can be seen in both cases, the distortion loss produced lackluster results and failed to properly match the latent space distances with those of the graph. For these experiments, we used a loss scale of 50, λ₁ = 0.01, and λ₂ = 10, meaning that we strongly encouraged the distances between non-identical classes to match in ratio.
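Both losses are easy to state in code. The sketch below (our own, taking precomputed pairwise-distance matrices as hypothetical inputs) also illustrates the pathology discussed above: with the ε-regularized denominator, pairs at graph distance 0 still contribute even when latent and graph distances agree perfectly, whereas the stress loss vanishes.

```python
import numpy as np

def stress_loss(latent_d, graph_d):
    # Stress: sum of squared mismatches between graph and latent distances
    i, j = np.triu_indices(len(graph_d), k=1)
    return np.sum((graph_d[i, j] - latent_d[i, j]) ** 2)

def distortion_loss(latent_d, graph_d, eps=1e-1):
    # Distortion (Eq. 45) with a regularized denominator
    # (ill-defined for graph distance 0 otherwise)
    i, j = np.triu_indices(len(graph_d), k=1)
    ratio = latent_d[i, j] ** 2 / (graph_d[i, j] ** 2 + eps)
    return np.sum((ratio - 1.0) ** 2)

# graph distances for a 3-node chain
g = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
```

Even with latent distances exactly equal to the graph distances, the regularized distortion stays strictly positive, while the stress is zero.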

F ADDITIONAL DETAILS ON THE EXPERIMENTS OF § 5

F.1 DATA

Table 3 describes the data of the whole-body support pose taxonomy used in the experiments reported in § 5. Each pose is identified with a support pose category, i.e., a node of the graph in Fig. 1-right, and with a set of associated contacts. As shown in the table, some support poses include several sets of contacts. For example, the support pose F groups all types of support poses where only one foot is in contact with the environment. Notice that some sets of contacts are not represented in the data and thus do not appear in Table 3.

Table 3: Description of the poses extracted from the whole-body support pose taxonomy (Borràs et al., 2017) used in § 5 and App. G.

F.2 TRAINING PARAMETERS AND PRIORS

Table 4 describes the hyperparameters used for the experiments reported in § 5 and App. G. We used the hyperbolic kernels defined in § 3.1 for the GPHLVMs, and the classical SE kernel for the Euclidean models. For the back-constraints mapping Eq. 11, we defined k_{R^D}(y_n, y_m) as the product of a Euclidean SE kernel with lengthscale κ_{R^D}, and k_G(c_n, c_m) as a graph Matérn kernel with smoothness ν = 2.5 and lengthscale κ_G. We additionally scaled the product of kernels with a variance σ_{R^D,G}. For training the back-constrained GPHLVM and GPLVM, we used a Gamma prior Gamma(α, β) with shape α and rate β on the lengthscale κ of the kernels. The embeddings of the Euclidean models were initialized with PCA. For the GPHLVMs, the initial embeddings ṽ obtained via PCA were transformed to elements of the tangent space T_{µ0}H^Q at the origin µ_0 by setting v = (0, ṽ)^T, and then projected onto the hyperbolic manifold using the exponential map. All models were trained by maximizing the loss L = L_MAP − γ L_stress, where L_MAP denotes the log posterior of the model, L_stress is the stress-based regularization loss defined in Eq. 10, and γ is a parameter balancing the two losses. The optimization was conducted using the Riemannian Adam optimizer (Bécigneul & Ganea, 2019) implemented in Geoopt (Kochurov et al., 2020) with a learning rate of 0.05. For the first part of the experiments on taxonomy expansion, we encoded unseen poses of each class for the back-constrained GPLVM and GPHLVM with a stress regularization using the models presented in Table 4. For the second part of the experiments, we left the class FH out during training and embedded it using the back-constraints mapping. The newly-trained models also followed the same hyperparameters presented in Table 4.

F.3 MARGINAL LOG-LIKELIHOOD

Table 5 shows the marginal log-likelihood (MLL) of the GPHLVM and GPLVM described in § 5. We observe that the hyperbolic models achieve a higher likelihood than their Euclidean counterparts.

F.4 FURTHER DETAILS ON TRAJECTORY GENERATION VIA GEODESICS

Table 6 describes the transitions between support poses obtained by following the geodesic trajectories of the back-constrained GPHLVM and GPLVM with stress prior depicted in Fig. 2c. In contrast to the GPHLVM, the Euclidean GPLVM often results in transitions that do not exist in the taxonomy. Interestingly, it also often uses more transitions than originally needed. Notice that similar results are observed for the GPHLVM and GPLVM with stress prior depicted in Fig. 2b.

Start | End | Transitions in H² | Transitions in R²
F | F2H | F → FH → F2H | F ↛ FH2 → FH → F2H
F | F2H2 | F → FH → F2H → F2H2 | F ↛ FH2 → F2H2
F | FH2 | F → FH → FH2 | F ↛ FH2
F2H | FH2 | F2H → FH → FH2 | F2H → FH → FH2
F | FK | F → F2 → FK | F ↛ FH2 → FH → F2H → F2 ↛ FKH → FK
F2 | K2 | F2 → FK → K2 | F2 ↛ FKH → FK → FKH → FKH2 → KH2 ↛ K2
FH | K2H | FH → F2H ↛ FH2 ↛ FKH → KH → K2H | FH → F2H → F2 ↛ FKH → FK → FKH → FKH2 ↛ K2H

Table 6: Transitions (→) between classes of the taxonomy obtained by following the geodesic trajectories depicted in Fig. 2c. The classes and transitions correspond to the colors along the trajectories and match the class corresponding to the closest embedding at each point along the geodesic. Transitions that do not exist in the taxonomy are denoted as ↛.

G ADDITIONAL EXPERIMENTS

G.1 HYPERBOLIC EMBEDDINGS OF SUPPORT POSES IN H 3

In this section, we embed the 100 poses used in § 5 into 3-dimensional hyperbolic and Euclidean spaces to analyze the performance of the proposed models in higher-dimensional latent spaces. Namely, we test the GPHLVM and GPLVM without regularization, with stress prior, and with back constraints coupled with stress prior, similarly to the experiments on 2-dimensional latent spaces reported in the paper. Figs. 7a-7c show the learned embeddings alongside the corresponding distance matrices, which are to be compared with the graph distances in Fig. 3. As expected, and similarly to the 2-dimensional embeddings of Fig. 2a, the models without regularization do not encode any meaningful distance structure in the latent spaces (see Fig. 7a). In contrast, the models with stress prior result in embeddings that comply with the taxonomy graph structure, and the back constraints further organize the embeddings inside a class according to the similarity between their observations (see Figs. 7b-7c). We observed a prominent stress reduction for the Euclidean 3-dimensional latent spaces compared to the 2-dimensional ones (see Table 7), as well as a reduction of non-existing transitions when following geodesic trajectories (see Table 8). This is due to the increase of volume available to match the graph structure in R³ relative to R². However, all Euclidean models are still outperformed by the 2-dimensional hyperbolic embeddings presented in § 5 (see Table 1). This is because the volume of balls in hyperbolic space increases exponentially with the radius of the ball, rather than polynomially as in Euclidean space. In other words, the geometry of the hyperbolic manifold increases the volume available to match the graph structure compared to Euclidean spaces, thus resulting in better low-dimensional representations of taxonomy data.
Notice that the GPHLVM models with 3-dimensional hyperbolic latent space result in a similar or slightly reduced stress compared to their 2-dimensional counterparts (presented in § 5). This indicates that the volume of the 2-dimensional hyperbolic latent space is sufficient to represent the considered data. Moreover, similarly as for the 2-dimensional cases, the back-constrained GPHLVM and GPLVM allow us to properly place unseen poses or taxonomy classes into the latent space (see Figs. 7d-7e ).

Start | End | Transitions in H³ | Transitions in R³
F | F2H | F → FH → F2H | F → FH → F2H
F | F2H2 | F → FH → F2H → F2H2 | F → FH ↛ F2H2
F | FH2 | F → FH → FH2 | F → FH → FH2
F2H | FH2 | F2H → FH → FH2 | F2H → FH → FH2
F | FK | F → F2 → FK | F → F2 → FK
F2 | K2 | F2 → FK → K2 | F2 → FK → K2
FH | K2H | FH → F2H → FKH → KH → K2H | FH → F2H → FKH → FKH2 ↛ K2H

Table 8: Transitions (→) between classes of the taxonomy obtained by following the geodesic trajectories depicted in Fig. 7c. The classes and transitions correspond to the colors along the trajectories and match the class corresponding to the closest embedding at each point along the geodesic. Transitions that do not exist in the taxonomy are denoted as ↛.
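The transition sequences reported in these tables are obtained by labeling each point along a latent trajectory with the class of its nearest embedding and collapsing consecutive repeats. A minimal sketch of this decoding step (our own code; it uses Euclidean nearest-neighbor search for simplicity, whereas the hyperbolic latent spaces would use the geodesic distance):

```python
import numpy as np

def transition_sequence(trajectory, codes, classes):
    # Label each trajectory point with the class of the nearest latent code,
    # then collapse consecutive duplicates into a transition sequence.
    labels = [classes[np.argmin(np.linalg.norm(codes - p, axis=1))]
              for p in trajectory]
    seq = [labels[0]]
    for c in labels[1:]:
        if c != seq[-1]:
            seq.append(c)
    return seq
```

For example, a straight path through three class clusters yields the expected three-class sequence.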

G.2 HYPERBOLIC EMBEDDINGS OF STANDING POSES

In this section, we consider a different subset of the whole-body support pose taxonomy, leading to a different graph. Namely, we use 60 standing poses of the datasets in (Mandery et al., 2016) and (Langenstein, 2020), which correspond to graph nodes of standing support poses (left side of the graph in Fig. 1). Specifically, we use a balanced dataset composed of 5 poses for each of the contact sets of the standing support poses described in Table 3. We embed the 60 poses into 2-dimensional hyperbolic and Euclidean spaces using GPHLVM and GPLVM. For each approach, we test the model without regularization, with stress prior, and with back constraints coupled with stress prior using the parameters described in App. F.2 and Table 4. Figs. 8a-8c show the learned embeddings alongside their corresponding distance matrices, which are to be compared with the graph distances in Fig. 9a. As in the previous experiments, the models with stress prior result in embeddings that comply with the taxonomy graph structure, with additional intra-class organization for the back-constrained models. It is worth noticing that, despite the considered taxonomy graph being smaller than in the previous experiments, all Euclidean GPLVMs remain outperformed by the hyperbolic models, which better match the taxonomy structure (see also Table 9b). Similarly to the experiments reported in § 5, the back-constrained GPHLVM and GPLVM allow us to properly place unseen poses or taxonomy classes into the latent space (see Figs. 8d-8e). As mentioned in the main text, our GPHLVM intrinsically provides a mechanism to plan motions via geodesics in the low-dimensional latent space. Examples of geodesics between two standing poses are shown in Figs. 8b-8c, where the trajectory color matches the class corresponding to the closest latent point. The transitions between standing support poses obtained by following these geodesic trajectories are described in Table 9.
As in our previous experiments, the geodesics, i.e., shortest paths, in the GPHLVM latent space correspond to shortest paths in the taxonomy graph. Due to the smaller size of this taxonomy graph, we observe fewer forbidden (i.e., nonexistent) transitions in the Euclidean models than in the previous experiments. However, as their latent space does not match the taxonomy structure, they often require additional transitions and thus do not follow shortest paths in the taxonomy graph.

Start | End | Transitions in H² | Transitions in R²
F | F2H | F → FH → F2H | F → F2 → F → F2 → FH → F2H
F | F2H2 | F → FH → F2H → F2H2 | F → F2 → F ↛ FH2 → F2H2
F | FH2 | F → FH → FH2 | F → F2 → F ↛ FH2
F2H | FH2 | F2H → FH → FH2 | F2H → FH → F2 → F → F2 → F ↛ FH2 → F2H2 → FH2

Table 9: Embeddings of standing poses: transitions (→) between classes of the taxonomy obtained by following the geodesic trajectories depicted in Fig. 8c. The classes and transitions correspond to the colors along the trajectories and match the class corresponding to the closest embedding at each point along the geodesic. Transitions that do not exist in the taxonomy are denoted as ↛.

G.3 HYPERBOLIC EMBEDDINGS OF STANDING POSES WITH AN AUGMENTED TAXONOMY FOR IMPROVED TRAJECTORY GENERATION

As shown in § 5, geodesics in the hyperbolic latent space of our GPHLVM intrinsically provide a mechanism to plan motions accounting for the underlying taxonomy. However, as discussed in § 5, the whole-body support pose taxonomy (Borràs et al., 2017) lacks information about the type of contact in the considered poses, thus leading to artifacts in the geodesic-generated motions. In the main paper, we showed that the quality of the generated motions is improved by augmenting the whole-body support pose taxonomy with additional contact information. To do so, we considered an augmented whole-body support pose taxonomy which explicitly distinguishes between left and right contacts. In other words, the nodes and transitions of Fig. 1-right are adapted to consider left and right contacts. For instance, the 1-foot contact (F) node is separated into left-foot (Fl) and right-foot (Fr) contact nodes. To facilitate motion planning and to test the GPHLVM's ability to deal with high-dimensional spaces, we represent each pose as a vector y_n ∈ R^44 of joint angles instead of a vector of hand and feet positions. We embed the 60 standing poses described in App. G.2 into 3-dimensional hyperbolic and Euclidean spaces using GPHLVM and GPLVM, respectively. For each approach, we test the model without regularization, with stress prior, and with back constraints coupled with stress prior using the parameters described in App. F.2 and Table 4. Figs. 10a-10c show the learned embeddings alongside their corresponding distance matrices, which are to be compared with the graph distances of the augmented taxonomy in Fig. 11a. Similarly to previous experiments, the models with stress prior result in embeddings complying with the taxonomy graph structure (Fig. 10b), with additional intra-class organization for the back-constrained models (Fig. 10c).
Notice that the embeddings differentiate between left and right contacts according to the augmented taxonomy: For instance, we observe four clusters of orange embeddings corresponding to FlHl, FlHr, FrHl, and FrHr. As shown in Table 11b, the hyperbolic models better represent the taxonomy structure and outperform the Euclidean models. Similarly to previous experiments, the back-constraint mapping introduced in § 4 allows us to properly place unseen poses or taxonomy classes into the latent space (see Figs. 10d-10e). Examples of motions planned by following geodesics between two standing poses in the hyperbolic latent space are displayed in the main paper (Fig. 5). The corresponding geodesics are shown in Fig. 10c, with the colors along the trajectory matching the class corresponding to the closest hyperbolic latent point. The resulting transitions are given in Table 10. As mentioned in the main paper, we observe that, in contrast to the trajectories of Fig. 4, the motions generated by considering the augmented taxonomy (Fig. 5) result in more realistic, human-like interpolations between the given initial and final poses. Moreover, these motions look more realistic than the motions obtained via linear interpolation in the Euclidean latent space of the vanilla back-constrained GPLVM. As shown in Fig. 12, the motions planned in the Euclidean latent space sometimes result in unrealistic joint configurations, and the same posture is associated with different types of contacts (see the middle part of the motions). As shown in Table 10, non-existing transitions arise more frequently when following trajectories generated by the Euclidean model.

Start | End | Transitions in H³ | Transitions in R³
Fl | F2Hr | Fl ↛ F2Hr | Fl → FlHl → F2Hl ↛ F2Hr
Fl | F2H2 | Fl ↛ F2Hl → F2H2 | Fl → FlHl → F2Hl ↛ F2Hr → F2H2
Fr | FrH2 | Fr → FrHr → FrH2 | Fr → FrHl ↛ FrHr → FrH2
F2Hl | FlH2 | F2Hl → F2H2 → FlH2 | F2Hl ↛ F2Hr → FlHr → FlH2

Table 10: Embeddings of standing poses considering the augmented whole-body support pose taxonomy: transitions (→) between classes of the taxonomy obtained by following the geodesic trajectories depicted in Fig. 10. The classes and transitions correspond to the colors along the trajectories and match the class corresponding to the closest embedding at each point along the geodesic. Transitions that do not exist in the taxonomy are denoted as ↛.

G.4 COMPARISON AGAINST VARIATIONAL AUTOENCODERS

Hyperbolic embeddings of support poses: In this section, we compare the trained GPHLVMs of Fig. 2 with two additional baselines: a vanilla variational autoencoder (VAE) and a hyperbolic variant of this VAE in which the latent space is the Lorentz model of hyperbolic geometry (akin to Mathieu et al. (2019)). Both VAEs are designed with 12 input nodes, 6 hidden nodes, a 2-dimensional latent space, and a symmetric decoder. Their encoder specifies the mean and standard deviation of a normal distribution (resp. a wrapped normal for the hyperbolic VAE), and their decoder specifies the mean and standard deviation of the normal distribution that governs the reconstructions. Both models are trained by maximizing an evidence lower bound (ELBO) under similar regimes as the GPHLVMs, i.e., 1000 epochs with a learning rate of 0.05. The KL divergence for the hyperbolic VAE is computed using Monte Carlo estimates. Importantly, the VAE models only seem to capture a global structure that separates standing from kneeling poses (except the vanilla hyperbolic VAE in Fig. 13a). Although adding a stress regularization with the same scale as for the GPHLVM (γ = 6) helps preserve the graph distance structure, the embeddings organization is still not competitive with the one achieved by our GPHLVM models (see Fig. 2). Moreover, when compared to our proposed GPHLVM, all VAE models provide a subpar uncertainty modeling in their latent spaces.

Figure 13: The first and second rows show the latent spaces of the (hyperbolic) VAEs and the distance matrices between the latent codes, respectively. When comparing these distance matrices and encodings with those of our GPHLVMs (see Fig. 2), we notice that our proposed model better preserves the graph distance structure. We argue this is because VAEs enforce latent spaces that follow a unit Gaussian, which is a goal opposite to ours.
Table 11 shows that the average stress of the latent embeddings for the VAE baselines (trained with and without stress regularization) is higher than the average stress of our models (see Table 1). Overall, our proposed GPHLVM consistently outperforms all VAEs in encoding meaningful taxonomy information in the latent space. We argue that VAEs are not the right tool for our target applications: when training VAEs, the Kullback-Leibler term in the ELBO regularizes the latent space to match a unit Gaussian, which is in stark contrast with our goal of separating the embeddings to preserve the taxonomy graph distances.

Hyperbolic embeddings of standing poses with an augmented taxonomy: We further compare our GPHLVM model against the vanilla and hyperbolic VAEs in the experiment described in Sec. G.3. Namely, we consider the augmented whole-body support pose taxonomy which explicitly distinguishes between left and right contacts, and we represent each pose as a vector of joint angles. This increases the dimensionality of the data to 44. We tested the vanilla and hyperbolic VAEs without regularization and with a stress regularization with the same scale as for the GPHLVM (γ = 1.5). Fig. 14 shows the learned embeddings alongside distance matrices, which are to be compared with the GPHLVM models of Figs. 10b-10c and with the ground-truth graph distances of Fig. 11a. Despite the stress regularization, the VAEs' tendency to have unit-normally distributed latent representations hinders the distance matching. This is further quantified by the mean stress presented in Table 12, which shows a higher mean stress (0.34 and 0.44) than our model in the same taxonomy (0.23, see Table 11b).
Table 13: Average runtime (in seconds) for the training and decoding phases of our GPHLVM and the vanilla GPLVM over 5 experiments, using 2- and 3-dimensional latent spaces for both models. Training time was measured over 500 iterations for both models. The implementations are fully developed in Python, and the runtime measurements were taken on a standard laptop with 32 GB RAM, an Intel Xeon CPU E3-1505M v6 processor, and Ubuntu 20.04 LTS.

|          | GPHLVM Q = 2 | GPHLVM Q = 3 | GPLVM Q = 2  | GPLVM Q = 3  |
|----------|--------------|--------------|--------------|--------------|
| Training | 2.5 × 10^3   | 8.91         | 5.9          | 6.3          |
| Decoding | 1.33 × 10^-2 | 1.57 × 10^-5 | 1.16 × 10^-5 | 1.22 × 10^-5 |

Fig. 15 shows examples of motions planned by following geodesics between two standing poses in the latent space of the hyperbolic VAE. Similarly to the motions generated in the latent space of the proposed GPHLVM (Fig. 5), these motions result in realistic interpolations between the given initial and final poses.

G.5 RUNTIME

To assess the computational cost of our approach, we ran a set of experiments measuring the average runtime of the training and decoding phases with 2- and 3-dimensional latent spaces. As a reference, we added the runtime measurements of the Euclidean counterpart, i.e., the vanilla GPLVM. Table 13 reports the measurements. Note that the main computational burden arises in our GPHLVM with a 2-dimensional latent space, in sharp contrast with the experiments using a 3-dimensional latent space. As discussed in the main paper, this increase in computational cost is mainly attributed to the 2-dimensional hyperbolic kernel.
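Runtime measurements of this kind can be reproduced with a simple wall-clock sketch. The workload below is a stand-in: in our setting the timed function would be the GPHLVM training loop (500 iterations) or a single decoding call:

```python
import time

def average_runtime(fn, n_runs=5):
    """Average wall-clock runtime of fn over n_runs executions, in seconds."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Stand-in workload; replace with the model's training or decoding routine.
elapsed = average_runtime(lambda: sum(i * i for i in range(100_000)))
```

Using `time.perf_counter` rather than `time.time` avoids clock adjustments and gives the highest available timer resolution for short decoding calls.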



Figure 2: The first and last two rows respectively show the latent embeddings and examples of interpolating geodesics in P 2 and R 2 , followed by pairwise distance matrices. Embedding colors match those of Fig. 1-right, and background colors indicate the GPLVM uncertainty. Added poses (d) and classes (e) are marked with stars and highlighted in red in the distance matrices.

Hyperbolic embeddings of support poses: We embed the 100 standing and kneeling poses into 2-dimensional hyperbolic and Euclidean spaces using the GPHLVM and GPLVM.

Figure 3: Graph distance between the poses following Fig. 1-right.

Figure 4: Motions obtained via geodesic interpolation in the back-constrained GPHLVM latent space. Left: F to F2. Right: F to FK. The colorbars identify the support pose of the closest pose in the latent space.

Figure 5: Motions obtained via geodesic interpolation in the latent space of the back-constrained GPHLVM trained on the augmented taxonomy (Fig. 10c). Contacts are denoted by gray circles. The colorbars identify the support pose of the closest pose in the latent space.
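The geodesic interpolation underlying such motions can be sketched as follows. This is a minimal NumPy implementation assuming a Poincaré-disk latent space, routed through the Lorentz (hyperboloid) model where the geodesic has a closed form; the paper's implementation may differ in parametrization:

```python
import numpy as np

def disk_to_lorentz(z):
    """Lift a point of the Poincaré disk to the hyperboloid (Lorentz) model."""
    sq = np.dot(z, z)
    return np.concatenate([[1.0 + sq], 2.0 * z]) / (1.0 - sq)

def lorentz_to_disk(x):
    """Project a hyperboloid point back to the Poincaré disk."""
    return x[1:] / (1.0 + x[0])

def geodesic(z1, z2, t):
    """Point at fraction t in [0, 1] along the geodesic from z1 to z2."""
    x, y = disk_to_lorentz(z1), disk_to_lorentz(z2)
    inner = -x[0] * y[0] + np.dot(x[1:], y[1:])  # Lorentzian inner product
    dist = np.arccosh(max(-inner, 1.0))          # hyperbolic distance
    if dist < 1e-9:
        return np.array(z1, dtype=float)
    u = (y + inner * x) / np.sinh(dist)          # unit tangent at x toward y
    return lorentz_to_disk(np.cosh(t * dist) * x + np.sinh(t * dist) * u)

# Discretize the geodesic between two latent codes; each point would then
# be decoded into a pose by the GPHLVM decoder.
path = [geodesic(np.array([0.5, 0.0]), np.array([-0.5, 0.0]), t)
        for t in np.linspace(0.0, 1.0, 5)]
```

By symmetry, the geodesic between the two codes above passes through the origin of the disk at t = 0.5, and its endpoints recover the two latent codes.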

Proposition. Assume the disk model of $H^2$ (i.e., the Poincaré disk). Denote the disk by $D$ and its boundary circle by $T$. Define the hyperbolic outer product by
$$\langle z, b\rangle = \tfrac{1}{2}\log\frac{1-|z|^2}{|z-b|^2}, \qquad z \in D,\ b \in T.$$
Then
$$P_{-1/2+is}(\cosh\rho) = \underbrace{\int_T e^{(2si+1)\langle z,b\rangle}\,db}_{\text{spherical function } \phi_{2s}(z)} = \int_T e^{(2si+1)\langle z_1,b\rangle}\, e^{(2si+1)\langle z_2,b\rangle}\,db.$$
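A quick numerical sanity check of the boundary integral above (a sketch assuming the integral over T is taken with respect to the uniform probability measure on the circle): at the origin the hyperbolic outer product vanishes, so the integral reduces to 1, matching $P_{-1/2+is}(\cosh 0) = P_{-1/2+is}(1) = 1$; away from the origin the integral remains real-valued, as a Legendre (conical) function of real argument must be:

```python
import numpy as np

def hyperbolic_outer(z, b):
    """Hyperbolic outer product <z, b> for z in the disk, b on the unit circle."""
    return 0.5 * np.log((1.0 - np.abs(z) ** 2) / np.abs(z - b) ** 2)

def boundary_integral(z, s, n=200_000):
    """Approximate the integral of exp((2si+1)<z,b>) over T by averaging
    the integrand over a uniform grid on the boundary circle."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    b = np.exp(1j * theta)  # points on the boundary circle T
    return np.mean(np.exp((2.0 * s * 1j + 1.0) * hyperbolic_outer(z, b)))

val_origin = boundary_integral(0.0 + 0.0j, s=0.7)  # should equal 1 exactly
```

Since the integrand is smooth and periodic, the uniform-grid average converges spectrally, so even the imaginary part of the estimate at generic $z$ is negligibly small.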

(a) Reg. with L_distortion (original). (b) Distances for 6a. (c) Reg. with L_distortion (modified). (d) Distances for 6c.

Figure 6: Embeddings learned with distortion regularization. (a) and (c) display the latent embeddings after training our GPHLVM with an added distortion loss L_distortion as originally defined, and with our modified distortion loss, respectively. These embeddings show that both regularizations fail to encode the graph distances (compare the distances in (b) and (d) with Fig. 3).
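For context, a distortion regularizer of the kind compared here can be sketched as follows (a hedged illustration assuming the common relative-error form; the exact definition of the original and modified losses in the paper may differ):

```python
import numpy as np

def distortion_loss(graph_dist, latent_dist, eps=1e-8):
    """Distortion regularizer: average relative deviation of the latent
    pairwise distances from the taxonomy graph distances,
    |d_latent / d_graph - 1|, over pairs i < j.
    Unlike the stress, which penalizes absolute differences (so large graph
    distances dominate), distortion penalizes relative errors, so pairs with
    small graph distance dominate the gradient."""
    iu = np.triu_indices_from(graph_dist, k=1)
    return np.mean(np.abs(latent_dist[iu] / (graph_dist[iu] + eps) - 1.0))

graph = np.array([[0., 1., 2.],
                  [1., 0., 1.],
                  [2., 1., 0.]])
```

This asymmetry between absolute and relative penalties is one plausible reason a distortion loss can organize embeddings very differently from the stress regularizer used in the main experiments.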

Figure 7: The first and last two rows respectively show the latent embeddings and examples of interpolating geodesics in P 3 and R 3 , followed by pairwise distance matrices. Embedding colors match those of Fig. 1-right. Added poses (d) and classes (e) are marked with crosses and highlighted in red in the distance matrices.

Figure 8: Embeddings of standing poses: The first and last two rows respectively show the latent embeddings of bipedal poses and examples of interpolating geodesics in P 2 and R 2 , followed by pairwise distance matrices. Embeddings colors match those of Fig. 1-right, and background colors indicate the GPLVM uncertainty. Added poses (d) and classes (e) are marked with stars and highlighted with red in the distance matrices.

Figure 9: Embeddings of standing poses: (a) shows the graph distance following the left part of Fig. 1-right. (b) shows the stress resulting from the different embeddings of standing poses.

Figure 10: Embeddings of standing poses considering the augmented whole-body support pose taxonomy: The first and last two rows respectively show the latent embeddings and examples of interpolating geodesics in P 3 and R 3 , followed by pairwise distance matrices. Embeddings colors match those of Fig. 1-right. Added poses (d) and classes (e) are marked with crosses and highlighted with red in the distance matrices.

Figure 11: Embeddings of standing poses considering the augmented whole-body support pose taxonomy: (a) shows the graph distance (colors follow Fig. 1-right). (b) shows the stress resulting from the different embeddings of standing poses.

Figure 12: Motions obtained via linear interpolation in the latent space of the vanilla Euclidean back-constrained GPLVM trained on the augmented taxonomy (Fig. 10c). Contacts are denoted by gray circles. The colorbars identify the support pose of the closest pose in the latent space.

Figure 13: Embeddings of the VAE baselines: The first and second rows show the latent spaces of the (hyperbolic) VAEs and the distance matrices between the latent codes, respectively. Comparing these distance matrices and encodings with those of our GPHLVMs (see Fig. 2), we notice that our proposed model better preserves the graph distance structure. We argue this is because VAEs enforce latent spaces that follow a unit Gaussian, an objective opposed to ours.

Figure 14: Embeddings of the VAE baselines considering the augmented whole-body support pose taxonomy: The first and second rows show the latent spaces of the (hyperbolic) VAE and the distance matrix between the latent codes, respectively.

Average stress per geometry and regularization.

Principal operations on H d for the Lorentz model. For more details, see(Bose et al., 2020) and(Peng et al., 2021).

Summary of experiments and list of hyperparameters.

Marginal log-likelihood per geometry and regularization.

Average stress per geometry and regularization.

Average stress per geometry and regularization for VAE baselines trained on the augmented taxonomy (see App. G.3).

Table columns: Experiment | Model | Regularization | Loss scale γ | Prior on κ

