LEARNING HYPERBOLIC REPRESENTATIONS OF TOPOLOGICAL FEATURES

Abstract

Learning task-specific representations of persistence diagrams is an important problem in topological data analysis and machine learning. However, current methods are restricted in terms of their expressivity because they focus on Euclidean representations. Persistence diagrams often contain features of infinite persistence (i.e., essential features), and Euclidean spaces shrink their importance relative to non-essential features because they cannot assign infinite distance to finite points. To address this issue, we propose a method to learn representations of persistence diagrams in hyperbolic space, more specifically on the Poincaré ball. By representing features of infinite persistence infinitesimally close to the boundary of the ball, their distance to non-essential features approaches infinity, thereby preserving their relative importance. This is achieved without resorting to extremely large values for the learnable parameters, so the representation can be fed into downstream optimization methods and trained efficiently in an end-to-end fashion. We present experimental results on graph and image classification tasks and show that the performance of our method is on par with or exceeds that of other state-of-the-art methods.
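The behavior the abstract relies on can be checked numerically with the standard geodesic distance on the Poincaré ball, d(x, y) = arcosh(1 + 2||x − y||² / ((1 − ||x||²)(1 − ||y||²))). The following minimal sketch (an illustration of the geometry, not the paper's implementation) shows that as a point approaches the boundary, its distance to an interior point grows without bound even though every coordinate stays below 1 in magnitude.

```python
import math

def poincare_distance(x, y):
    """Geodesic distance on the Poincare ball:
    d(x, y) = arcosh(1 + 2*||x - y||^2 / ((1 - ||x||^2) * (1 - ||y||^2)))."""
    sq = lambda v: sum(c * c for c in v)
    diff = sq([a - b for a, b in zip(x, y)])
    arg = 1.0 + 2.0 * diff / ((1.0 - sq(x)) * (1.0 - sq(y)))
    return math.acosh(arg)

origin = [0.0, 0.0]
# As the second point approaches the boundary (||y|| -> 1), its distance
# to the origin diverges, while its coordinates remain bounded. This is
# why essential features can be placed near the boundary without using
# extreme parameter values.
for r in [0.9, 0.99, 0.999, 0.9999]:
    print(r, poincare_distance(origin, [r, 0.0]))
```

Note that for a point at radius r on a ray from the origin, the distance simplifies to ln((1 + r)/(1 − r)), so the printed values grow roughly linearly as r gains a decimal digit of proximity to the boundary.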

1. INTRODUCTION

Persistent homology is a topological data analysis tool that tracks how topological features (e.g., connected components, cycles, cavities) appear and disappear as we analyze the data at different scales or in nested sequences of subspaces (1; 2). A nested sequence of subspaces is known as a filtration. As an informal example of a filtration, consider an image of variable brightness: as the brightness is increased, certain features (edges, texture) may become more or less prevalent. The birth of a topological feature refers to the "time" (i.e., the brightness value) at which it appears in the filtration, and the death refers to the "time" at which it disappears. The lifespan of the feature is called its persistence. Persistent homology summarizes these topological characteristics in the form of a multiset called a persistence diagram, which is a highly robust and versatile descriptor of the data. Persistence diagrams enjoy the stability property, which ensures that the diagrams of two similar objects are similar (3). Additionally, under some assumptions, one can approximately reconstruct the input space from a diagram (which is known as solving the inverse problem) (4). However, despite these strengths, the space of persistence diagrams lacks structure, as basic operations such as addition and scalar multiplication are not well defined. The only imposed structure is induced by the Bottleneck and Wasserstein metrics, which are notoriously hard to compute, thereby preventing us from leveraging them for machine learning tasks.

Related Work. To address these issues, several vectorization methods have been proposed. Some of the earliest approaches are based on kernels, i.e., generalized inner products that turn persistence diagrams into elements of a Hilbert space.
Kusano et al. (5) propose a persistence weighted Gaussian kernel which allows them to explicitly control the effect of persistence. Alternatively, Carrière et al. (6) leverage the sliced Wasserstein distance to define a kernel that mimics the distance between diagrams. The approaches by Bubenik (7), based on persistence landscapes, by Reininghaus et al. (8), based on scale-space theory, and by Le et al. (9), based on the Fisher information metric, follow the same line of work. The major drawback of kernel methods is that they suffer from scalability issues, as training scales poorly with the number of samples.
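To make the kernel approach concrete, the sketch below implements a persistence-weighted Gaussian kernel in the spirit of Kusano et al. (5); it is a simplified illustration, not the authors' reference code. Each diagram point (birth, death) is weighted by an arctan of its persistence, so near-diagonal (likely noisy) points contribute little, while the choice of the constants C, p, and the bandwidth sigma (names assumed here for illustration) controls the emphasis on persistence.

```python
import math

def pwgk(diag1, diag2, sigma=1.0, C=1.0, p=1.0):
    """Persistence-weighted Gaussian kernel between two diagrams,
    each given as a list of (birth, death) pairs with finite death.
    Weight w(x) = arctan(C * pers(x)^p) downweights low-persistence points."""
    w = lambda b, d: math.atan(C * (d - b) ** p)
    total = 0.0
    for (b1, d1) in diag1:
        for (b2, d2) in diag2:
            sq_dist = (b1 - b2) ** 2 + (d1 - d2) ** 2
            total += w(b1, d1) * w(b2, d2) * math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total

# Two toy diagrams: D1 has one high-persistence feature and one
# near-diagonal point; D2 has a single feature close to D1's prominent one.
D1 = [(0.0, 2.0), (0.5, 0.6)]
D2 = [(0.1, 1.9)]
print(pwgk(D1, D2))
```

The double loop over diagram points also makes the scalability drawback visible: evaluating the kernel is quadratic in diagram size, and kernel-based training additionally requires a Gram matrix over all pairs of samples.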

