EMP: EFFECTIVE MULTIDIMENSIONAL PERSISTENCE FOR GRAPH REPRESENTATION LEARNING

Abstract

Topological data analysis (TDA) has become increasingly popular in a broad range of machine learning tasks, from anomaly detection and manifold learning to graph classification. Persistent homology (PH), the key approach in TDA, provides a unique topological fingerprint of the data by assessing the evolution of various hidden patterns as a scale parameter varies. Current PH tools are limited to analyzing the data by filtering with a single parameter, while in many applications several relevant parameters are equally important for obtaining much finer information about the data. In this paper, we overcome this problem by introducing the Effective Multidimensional Persistence (EMP) framework, which enables investigating the data by varying multiple scale parameters simultaneously. The EMP framework provides a highly expressive summary of the data by integrating multiple descriptor functions into the process. EMP naturally adapts to many known single-PH summaries and converts them into multidimensional summaries, for example EMP Landscapes, EMP Silhouettes, EMP Images, and EMP Surfaces. These summaries deliver a multidimensional fingerprint of the data as matrices and arrays, which are suitable for various machine learning models. We apply the EMP framework to graph classification tasks and observe that EMP boosts the performance of various single-PH descriptors and outperforms most state-of-the-art methods on benchmark datasets. We further derive theoretical guarantees for the proposed EMP summaries and prove their stability properties.

1. INTRODUCTION

In the past decade, topological data analysis (TDA) has proved to be a powerful machinery for discovering hidden patterns in various forms of data that are otherwise inaccessible with more traditional methods. In particular, for graph machine learning tasks where many traditional methods fail, TDA and, specifically, the tools of persistent homology (PH) have demonstrated a high potential to detect local and global patterns and to produce a unique topological fingerprint for use in various machine learning tasks. This makes PH particularly attractive for capturing characteristics of complex data that may play a key role in learning task performance. In turn, multiparameter persistence, or multipersistence (MP), is a novel idea that further advances the PH machinery by analysing the data in a much finer way, simultaneously along multiple dimensions. However, because of technical problems in commutative algebra arising from its multidimensional structure, MP has not yet been defined in general settings (see Section 2.1). In this paper, we develop an alternative approach that utilizes the multipersistence idea efficiently for various types of data, with a main focus on graph representation. In particular, we bypass the technical issues with MP by obtaining very practical summaries through a structured use of the slicing idea. In turn, we obtain suitable multidimensional topological fingerprints of the data as matrices and arrays, in which ML models can easily detect the hidden patterns developed in the complex data. Our contributions can be summarized as follows:
• We develop a new computationally efficient and highly expressive EMP framework which provides multidimensional topological fingerprints of the data. EMP expands many popular summaries of single persistence to multiple dimensions by adopting an effective slicing direction. As such, our EMP framework provides a practical way to utilize the promising multipersistence approach in real-life applications.
• We illustrate the utility of EMP summaries in various settings and compare our results to state-of-the-art (SOTA) methods. Our numerical experiments demonstrate that EMP summaries outperform SOTA on several benchmark datasets for graph classification tasks.
• We derive theoretical stability guarantees for the new topological summaries.

2. RELATED WORK

2.1. MULTIPARAMETER PERSISTENCE

Persistent homology (PH), a key tool in topological data analysis (TDA), delivers invaluable and complementary information on the intrinsic properties of data that are inaccessible with conventional methods (Chazal & Michel, 2021; Hensel et al., 2021). In the past decade, PH has become quite popular in various ML tasks, ranging from manifold learning to medical image analysis, and from material science to finance (TDA applications library (Giunti, 2022)). One of the key benefits of PH is that it allows us to extract the evolution of subtler patterns in the data shape dynamics at multiple resolution scales, which are not accessible with more conventional, non-topological methods (Wasserman, 2018). Multipersistence (MP) is a highly promising approach to significantly improve the success of single parameter persistence (SP) in applied topological data analysis, but there are some obstacles to converting this novel idea into an effective feature extraction method (see Appendix D.3). Except for some special cases, MP theory suffers from the nonexistence of a barcode decomposition because of the partially ordered structure of the index set {(α_i, β_j)} (Botnan & Lesnick, 2022). Lesnick & Wright (2015) suggested bypassing this issue via a slicing technique that studies one-dimensional fibers of the multiparameter domain: one restricts the multidimensional persistence module to a single direction (slice) and applies single persistence on this one-dimensional slice. Later, building on this idea, Carrière & Blumberg (2020) combined several slicing directions (vineyards) and obtained a vectorization by summarizing the persistence diagrams (PDs) in these directions. There are several promising recent studies in this direction (Botnan et al., 2021; Vipond, 2020), but these approaches are not computationally feasible and cannot provide the expected effectiveness of the MP approach in real-life applications.
Here we develop a highly efficient way to use the MP approach for various forms of data, and provide a multidimensional topological vectorization with EMP summaries.
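To make the slicing idea concrete, the following minimal Python sketch builds a matrix-valued summary for a graph. It is an illustration under stated assumptions, not the authors' implementation: the two node functions f and g, the threshold grids, and the helper names `count_components` and `betti0_grid` are all hypothetical, and the number of connected components (Betti-0) is used as a simple stand-in for a full persistence summary. Each row fixes a threshold on f (one slice) and sweeps the threshold on g along that slice.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def count_components(nodes: Sequence[Node], edges: Sequence[Edge]) -> int:
    """Count connected components (Betti-0) of the subgraph induced by `nodes`."""
    parent = {v: v for v in nodes}

    def find(v: Node) -> Node:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        # Only edges with both endpoints present belong to the induced subgraph.
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components


def betti0_grid(
    f: Dict[Node, float],
    g: Dict[Node, float],
    edges: Sequence[Edge],
    alphas: Sequence[float],
    betas: Sequence[float],
) -> List[List[int]]:
    """Return M with M[i][j] = Betti-0 of the subgraph {v : f(v) <= alphas[i], g(v) <= betas[j]}."""
    grid = []
    for a in alphas:
        # Fix the first parameter: this selects one slice of the two-parameter domain.
        slice_nodes = [v for v in f if f[v] <= a]
        row = []
        for b in betas:
            # Sweep the second parameter along the slice (a sublevel filtration by g).
            nodes = [v for v in slice_nodes if g[v] <= b]
            row.append(count_components(nodes, edges))
        grid.append(row)
    return grid


# Toy example (assumed data): a path graph 0-1-2-3 with f = node degree.
edges = [(0, 1), (1, 2), (2, 3)]
f = {0: 1, 1: 2, 2: 2, 3: 1}
g = {0: 1, 1: 2, 2: 1, 3: 2}
print(betti0_grid(f, g, edges, alphas=[1, 2], betas=[1, 2]))  # prints [[1, 2], [2, 1]]
```

The output is a small matrix indexed by the two threshold grids, which is the kind of array-shaped fingerprint that can be passed to a standard ML model. Replacing the component count with a persistence-based summary computed along each row would be the natural next step.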

2.2. GRAPH REPRESENTATION LEARNING

After the success of convolutional neural networks (CNNs) on image-based tasks, graph neural networks (GNNs) have emerged as a powerful tool for graph-level classification and representation learning. A wide variety of models have been developed based on numerous theories (see Appendix A for further details). However, to the best of our knowledge, most existing approaches do not account for the important topological information on the shapes of the node neighborhoods. While GNNs perform well in many graph learning tasks, they tend to suffer from over-smoothing and are vulnerable to graph perturbations. To address these challenges, TDA provides a computationally efficient alternative to GNNs, and can be used as an effective feature extractor to be combined with the deep learning methods Hofer et al. (2020 [...] Kyriakis et al. (2021) and neural networks Hofer et al. (2019); Carrière et al. (2020) in graph classification tasks. In this work, we apply the MP approach for the first time in this setting, and our EMP model outperforms most deep learning models on benchmark datasets.

3. BACKGROUND

We start by providing the basic background for our framework. Since we mainly focus on graph representation learning in this paper, we explain our construction in the graph setting. Note that

